Tuesday, April 10, 2012

Of Turkeys, Traffic and Testing: Cloud Computing Lessons from My Hometown

You've got backup systems and power in place. What else could you possibly need? Testing routines -- and maybe cloud-based backups for your backups.

Sometimes, blog posts almost write themselves. As an example, let me quote directly from a front-page story from today's edition of my local paper, the Santa Rosa Press Democrat. The headline of the story in the print edition of the paper: "County's 911 system knocked out by turkey."

"A wild turkey that flew into power lines knocked out Sonoma County's high-tech emergency 911 dispatch system Sunday night and crippled operations at the courthouse and county jail Monday.

"The power blackout was compounded when the county's massive and expensive emergency backup power system failed.

"On Monday, with computers out, traffic court was greatly curtailed, court calendars and proceedings had to be recorded by hand and jail inmates missed morning court appearances."

"With the blackout, the [emergency services] dispatchers' computers and every computer connected to the county system went black, officials said. At that point, the county's uninterrupted power supply, or UPS, should have kicked in.

"'The UPS failed,' said Chris Hentz, who supervises tech support for the county's computer dispatch system. 'It's just that simple.'"

The Lesson Here: If your systems are indeed critical, consider renting and configuring some cloud-based computing and communications resources as "warm spare" backups you can activate quickly should your primary systems fail, for whatever reason. (A favorite company of mine, 2600Hz, is doing some interesting things around cloud-based E911 and disaster recovery, for example.) Regular testing of backup systems, including supposedly "uninterruptible" power supplies, is also a good idea. But you're already doing that. Right?

Amazingly, in another section of the same day's paper, this headline: "Traffic control system crashes." The story, adapted from a March 30 post at the Press Democrat's "Road Warrior" blog, goes on to report that on March 23, both the primary and backup servers running the software that coordinates Santa Rosa's traffic signals crashed. "This forces signals at major intersections to revert to a dated program that isn't nearly as efficient as the current one," the story added.

"When the server that housed the software program crashed, the backup files were also corrupted, [city traffic engineer Rob] Sprinkle said. So instead of just rebooting the program on a new server, city staff have spent two weeks essentially rebuilding it, he said. He estimated that the fix is 95 percent complete."

The Lesson Here: At the risk of repeating myself (again/still), if you can't move a critical application to the cloud, or aren't prepared to, at least put a backup version there. And test both your primary and backup solutions, especially the connections that are supposed to make that cloud-based backup readily available. (How many of you are using a cloud-based backup service such as Carbonite, Google Cloud Storage, iCloud or Mozy to back up your personal or business files? And how often have you tested the file-restoration features of your chosen solution?)
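If you have never tested a restore, here is one way you might start. This is only a minimal sketch in Python, not a feature of any of the services named above: it assumes you have already used your backup service's own restore tool to pull files into a scratch folder, and it then compares SHA-256 checksums of the originals against the restored copies. The function names (sha256_of, snapshot, compare_restore) and both folders are hypothetical, made up for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_restore(original: Path, restored: Path) -> dict[str, list[str]]:
    """Report files that are missing, altered, or unexpected after a restore.

    Both arguments are hypothetical folders: the live data, and a scratch
    folder your backup service's restore tool has already filled.
    """
    before, after = snapshot(original), snapshot(restored)
    shared = before.keys() & after.keys()
    return {
        "missing": sorted(before.keys() - after.keys()),
        "corrupted": sorted(name for name in shared if before[name] != after[name]),
        "unexpected": sorted(after.keys() - before.keys()),
    }
```

Call compare_restore with your live folder and the restored folder; three empty lists mean the restored copy matches byte for byte. Files you changed after the backup ran will show up as "corrupted," so this works best against data that has been idle since the last backup.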

The Bottom Line: Don't be a turkey, wild or otherwise. Add cloud-based backups to your technology toolkits. And trust, but verify. Test everything regularly enough to let you and your colleagues sleep well at night.