Hurricane Katrina was one of the deadliest natural disasters ever to confront the United States; at least 1,836 people lost their lives as a result of the storm and subsequent flooding. Katrina was also the costliest hurricane in American history, with estimated damages exceeding $80 billion. Amid the scenes of death and devastation emanating from New Orleans, grave questions regarding the lack of preparedness on the part of state, local, and federal agencies immediately surfaced and continue to the present day.
Yet amid the controversies that have unfolded in the wake of Katrina, and the discussion regarding the steps necessary to lessen the impact of future disasters, several success stories have emerged. One of these involves the mission-critical computer systems used by the Naval Reserve Forces Command (the Reserve Component of the U.S. Navy), many of which were based in and around New Orleans. Thanks to a rigorous program of planning and testing on the part of the Navy Reserve's IT professionals, the systems were switched over to backup mode without affecting service delivery to the thousands of personnel who access these resources daily.
The power of planning—and practice
The Navy Reserve's computer systems house a broad range of tactical, financial, medical, and logistical information related to the activities and deployment of 125,000 reservists. Applications include a payroll system, an order-writing system that generates active duty and training orders, and medical records documenting reservists' medical readiness. Others range from the official Navy Reserve website to a system for tracking cargo and passenger transport. These applications are accessed thousands of times daily by officers responsible for deploying and supporting reservists, who request, approve, and print their orders 24/7 from everywhere in the world.
"If our systems go down, our reservists simply can't provide the operational support that their commanding officers require," says Captain Sam Sumwalt, Deputy Chief of Staff for Information Technology and Chief Technology Officer for the Naval Reserve Forces Command. "For example, just one of our applications is used to generate several hundred thousand sets of active duty orders each year." Having around-the-clock access to this data is essential in the post-9/11 world, where reservists may need to be deployed virtually anywhere on short notice, from Iraq and Afghanistan to U.S.-based assignments.
To understand why the Navy Reserve was able to keep its systems up and running when Katrina struck in 2005, one must go back to 2002, when the Naval Reserve began laying the groundwork for its COOP (continuity of operations) plan. This process—part of the Reserve's effort to achieve compliance with Federal Preparedness Circular 65, a disaster recovery regulation created by the Department of Homeland Security—involved several stages: obtaining funding and commitment from the Reserve's leadership, gathering input from dozens of stakeholders, installing recovery servers, and phasing in the offsite replication of system data that had previously been stored using backup tapes.
In 2002, information replication was established between the initial COOP site at Fort Worth, Texas, and the data center at the Space and Naval Warfare (SPAWAR) Systems Center in New Orleans at the edge of Lake Pontchartrain. The following year, the Reserve established a system for replicating information among the two New Orleans facilities—SPAWAR and the Navy Reserve Headquarters Data Center—and the Fort Worth site. In addition to providing a cohesive plan for multiple recovery systems spanning a breadth of critical functions, the COOP initiative had to accommodate factors ranging from the inclusion of teleworking capabilities for IT staff to integration with the Department of Defense's Secret Internet Protocol Router Network.
While the essential framework for the COOP plan was established by the end of 2003, it was the Reserve's commitment to refining its capability that made all the difference when Katrina struck. "After the system was established, we practiced recovering or switching the COOP site once a quarter," explains Sumwalt. "Each time we ran the procedure, we'd learn something new about how the networks related or how applications could be accessed in some modes but not in others. The most important thing we learned from our experience was that with any disaster recovery system, you must test it, test it again, and then test it yet again."
Calm during the storm
As with most natural disasters, the residents of New Orleans, including Sumwalt's staff, had little time to prepare for Katrina's fury. On Friday, August 26, meteorologists predicted the hurricane would bypass New Orleans on its way to Florida. But by 6:00 on the morning of Saturday, August 27, Katrina had changed course, putting New Orleans directly in its path. That's when Sumwalt and his team decided to switch computer operations from the data center at the Naval Reserve's headquarters to the second center on Lake Pontchartrain. It soon became clear that the lakefront facility would need to be shut down as well, and that the third center, in Fort Worth, would have to function as the sole site for the Reserve's computer operations.
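The sequence described above, moving from headquarters to the lakefront center and then to Fort Worth, amounts to walking an ordered list of sites and choosing the first one that is still usable. A minimal Python sketch of that idea follows. The site labels, the function name, and the ordering are illustrative assumptions drawn from the narrative, not the Reserve's actual tooling.

```python
# Hypothetical site labels, listed in the order the article describes
# operations moving: headquarters, then the lakefront center, then Fort Worth.
FAILOVER_ORDER = ["nola-headquarters", "nola-spawar-lakefront", "fort-worth"]


def select_active_site(failover_order, unavailable):
    """Return the first site in priority order that is not marked unavailable."""
    for site in failover_order:
        if site not in unavailable:
            return site
    raise RuntimeError("no recovery site available")


if __name__ == "__main__":
    # Both New Orleans sites are threatened, so the third site takes over.
    threatened = {"nola-headquarters", "nola-spawar-lakefront"}
    print(select_active_site(FAILOVER_ORDER, threatened))
```

Keeping the order in a single list means the same rule can be exercised in every practice run, which fits the quarterly testing the article describes.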
At the moment of truth, the Reserve's months of testing and preparation paid a huge dividend. "Once we made the decision to switch the operation to the Fort Worth facility, there was no hesitation, no debating over 'Should we do it now?' Everyone did exactly what they had to do, and the procedure went flawlessly. Given the environment we were dealing with, if we hadn't been so well prepared, panic could easily have won out," says Sumwalt.
Katrina made landfall on the morning of August 29. Despite the resulting catastrophic conditions, users of the Reserve's systems saw no evidence of the disaster. "We have thousands of users hitting our site 24 hours a day, worldwide," notes Sumwalt. "When the system doesn't work, our help desk calls go through the roof. They didn't."
Despite the extraordinary level of preparedness achieved by Sumwalt and his staff, a number of contingencies inhibited the operation, a logical consequence of a disaster on the scale of Katrina. "I live about an hour north of New Orleans," says Sumwalt. "My house was safe from flooding, so on Monday morning, I was able to direct operations at the disaster site by landline and cell phone, until the first of what would eventually be five trees smashed through my roof. As a result, I was incommunicado for several days. Even if my home hadn't been affected, there were numerous telecommunication problems with jammed switches and phone systems that were running on emergency generators. As a result, it was virtually impossible to vouch for the safety of our staff, which was anxiety-provoking, to say the least."
A second scare
One of the Reserve's two data centers in New Orleans was completely destroyed. While the second New Orleans facility didn't receive direct flooding, it had no power and no circuit capability. What's more, given the huge infrastructure problems across the city, it was impossible to restore the second facility until two and a half months after Katrina hit—which meant that the Reserve had to function without a reliable backup during that entire time.
The problems didn't stop there. On September 24, Hurricane Rita made landfall along the Gulf of Mexico, threatening the Fort Worth data center. As a result, Sumwalt and his staff had to make tape backups that were flown out of the area, since the New Orleans backup capabilities were totally inaccessible. According to Sumwalt, the Reserve's experience underscores the need for the geographic dispersion of disaster recovery facilities. "Having two data centers near each other protects you from local disasters, such as fires and bombs, but not regional disasters. As a result of Katrina, we've dispersed our storage networks through a three-way COOP plan that extends from New Orleans and Fort Worth to San Diego."
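Sumwalt's point about geographic dispersion can be turned into a rough planning check: measure the great-circle distance between every pair of sites and flag pairs that sit close enough to share a regional disaster. The Python sketch below does this under stated assumptions. The coordinates are approximate, city-level values chosen for illustration, the site labels are hypothetical, and the 500 km threshold is an arbitrary example rather than a figure from the Reserve's plan. Distance alone is only a crude proxy, since a single storm system can threaten sites that are far apart.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Approximate (latitude, longitude) pairs; illustrative, not surveyed locations.
SITES = {
    "nola-headquarters": (29.95, -90.07),
    "nola-spawar-lakefront": (30.03, -90.07),
    "fort-worth": (32.76, -97.33),
    "san-diego": (32.72, -117.16),
}

# Hypothetical minimum separation for sites meant to survive a regional event.
MIN_SEPARATION_KM = 500.0


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def pairs_too_close(sites, min_km):
    """Return site pairs whose separation is below the given threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(sites), 2)
        if haversine_km(sites[a], sites[b]) < min_km
    ]


if __name__ == "__main__":
    for a, b in pairs_too_close(SITES, MIN_SEPARATION_KM):
        print(f"{a} and {b} are within {MIN_SEPARATION_KM:.0f} km of each other")
```

With these illustrative coordinates, only the two New Orleans sites fall under the threshold, which matches the article's observation that nearby data centers protect against local disasters but not regional ones.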
Lessons learned
Another lesson that Sumwalt's group learned from the Katrina experience is the importance of incorporating desktop systems and laptops into disaster recovery planning. "By the time many of our users realized the true threat that Katrina posed, they were unable to physically return to the office," notes Sumwalt. "Many of them hadn't taken their laptops home, since they weren't aware of the scope of the threat. As a result, we lost access to a significant portion of our administrative data after the hurricane. We actually had to send an armed team back to headquarters in a truck driven from Fort Worth to recover some of that mission-critical information. Backing up desktop data, and reinforcing the idea that disasters can happen at any time, are key parts of a successful COOP plan."
Sumwalt also feels that reliable disaster recovery systems enable staff members involved in managing those systems to stay focused on what matters most. "We've all heard the stories of incredible heroism that Katrina inspired," he says. "When I was stranded at home after the storm, a neighbor I didn't even know came by with a backhoe and cut me out so that my car would have access to the street. Many of our staff went to those kinds of lengths to take care of our customers. The fact that we had devoted the time and resources to building and testing a successful COOP capability helped take some of the edge off an extremely stressful situation."
