http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9029881

By Bert Latamore
August 09, 2007
Computerworld

Pushed in part by U.S. business regulations concerning data preservation, financial and other high-end organizations are moving to a three-data-center architecture for disaster recovery, says Wikibon.org community member and data center consultant Josh Krischer. In this architecture, two nearby data centers are linked synchronously, and a third, located farther away, is linked asynchronously.

However, he warned, some data is always lost in a disaster, even when the remote copy is made over a synchronous link. Keeping data loss to a minimum is critical for some applications, but a more important issue is ensuring data consistency and integrity at the recovery site. Inconsistent data at the recovery site usually requires time-consuming recovery processes that may take days.

Speaking at Wikibon.org's weekly Peer Incite teleconference, which is open to all interested parties, Wikibon.org co-founder David Floyer described his experience consulting with one such company. It was considering very high-speed, continuous asynchronous data transfer from its U.S. data centers to its European data center to guard against a potential major loss.

"The company had two data centers, 15 miles apart, synchronously connected so transactional data is written to both simultaneously," he says. "If one goes down, it can recover from the other, theoretically with very little loss of data."

The proximity of the two centers, determined in part by the distance over which a synchronous link can be maintained, also avoided one of the common errors in disaster recovery planning: putting the recovery site too far from the main data center. "Putting them far apart may make you feel safer," says Floyer, "but it actually makes recovery harder and more expensive and may therefore decrease the plan's effectiveness."
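To make the trade-off concrete, here is a deliberately simplified Python model of the three-site layout Krischer describes. Everything in it is hypothetical and for illustration only, including the `ThreeSiteReplicator` class, its method names and the batch-shipping model. It ignores in-flight transactions and consistency checks, both of which matter in practice. What it does show is the basic asymmetry: the synchronous pair survives the loss of either nearby site with no lost writes, but a regional disaster loses whatever had not yet crossed the asynchronous link.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Site:
    """A data center holding the transactions it has committed."""
    name: str
    log: list = field(default_factory=list)


class ThreeSiteReplicator:
    """Hypothetical model: primary + nearby synchronous partner + distant async site."""

    def __init__(self) -> None:
        self.primary = Site("primary")
        self.sync_partner = Site("sync-partner")  # e.g. ~15 miles away
        self.remote = Site("remote")              # e.g. on another continent
        self.pending = deque()                    # writes not yet shipped to remote

    def write(self, txn: str) -> None:
        # Synchronous leg: write() does not return (i.e. the write is not
        # acknowledged) until BOTH nearby sites hold the transaction.
        self.primary.log.append(txn)
        self.sync_partner.log.append(txn)
        # Asynchronous leg: only queue it for the remote site.
        self.pending.append(txn)

    def ship(self, max_batch: int) -> None:
        """Send up to max_batch queued writes, oldest first, to the remote site."""
        for _ in range(min(max_batch, len(self.pending))):
            self.remote.log.append(self.pending.popleft())

    def regional_disaster(self) -> list:
        """Lose both nearby sites; return the writes the remote never received."""
        lost = list(self.pending)
        self.primary.log.clear()
        self.sync_partner.log.clear()
        self.pending.clear()
        return lost


r = ThreeSiteReplicator()
for i in range(10):
    r.write(f"txn-{i}")
r.ship(max_batch=7)             # the async link has caught up on 7 of 10 writes
print(r.regional_disaster())    # ['txn-7', 'txn-8', 'txn-9']
```

Shrinking the backlog in `pending` is what a faster, continuous asynchronous link buys. In this model the backlog can be made small but never zero.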
However, he says, this company was concerned about the possibility of a regionwide disaster that might bring down both data centers. The organization was sending a 2TB incremental backup to its European data center twice daily, but in a regional disaster that schedule could result in the loss of up to 20 hours of transactions. It wanted to invest in an advanced network-based system to create an asynchronous link between the U.S. and European data centers, reducing the maximum potential loss to a few minutes. The implementation and operational costs for this upgrade were estimated at about $25 million over three years.

The business leads

This might seem an extravagant solution to the problem, and Floyer emphasizes that it isn't the answer for everyone. "I worked with a retailer, for instance, who decided that a local backup site was sufficient for their DR needs. If a regional disaster took out both data centers and distribution centers, they expected their business would not survive in any case."

IT organizations (ITOs) can't make the basic decisions on disaster recovery strategy, Floyer says. Those decisions must rest on business judgments about how much data and time a company can afford to lose, how much that loss will cost the organization and how best to mitigate it (e.g., insurance, different technical solutions or accepting the problem as a business risk). Only senior business executives and, in some cases, the board of directors can make those decisions. So rather than going it alone, IT needs to push the business to examine disaster recovery in light of its financial and legal compliance situation.

"ITOs in organizations that talk about disaster recovery but fail to develop a business-led plan should not be seduced by the opportunity to buy more technology or experiment with new products," added Peter Burris, Wikibon.org's co-founder and chief content officer. "Instead, they must act as aggressively as possible to force the business to lead the process."
Triangulating cost

The first responsibility of the business, Krischer says, is to develop a business impact analysis that estimates the cost of the data lost and the damage caused by each minute the business is interrupted in a disaster. This estimate is based on the amount of business that will be lost, as well as on other damage such as loss of reputation. It is clearly a business rather than an IT calculation, and it's often difficult to develop.

One of the most common errors in disaster recovery planning is misestimating the potential cost of a business outage to the corporation. Business impact can be hard to assess and has multiple aspects. Instead of relying on just one estimate (for example, an internal computation of the cost per minute of a business interruption multiplied by the maximum number of minutes before systems can be restored), businesses often seek multiple estimates from different experts who approach the issue from differing perspectives, Burris says. Some companies, for example, will ask their investment banker to estimate the impact of a business interruption on the organization's capitalization. Another alternative, Burris says, is to have a company that specializes in investigating business disasters estimate the potential loss. Such firms can usually also provide a good estimate of the probability of the disaster occurring.

In the case of Floyer's client, the disaster planning team multiplied the average dollar value of a transaction by the average number of transactions per minute to arrive at a basic potential loss per minute of lost data. They also needed to calculate the probability of a regional disaster that would take down both local data centers. Probabilities of various disasters are usually based on historical information (how often these events have happened in the past) and are often publicly available.
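The calculation described above can be sketched in a few lines of Python. This is an assumption-laden illustration, not the team's actual model. The function names and the per-minute inputs in the first example (a $120 average transaction, 5,000 transactions per minute, 20 hours of lost data) are hypothetical. The annualization step uses the figures the article reports for Floyer's client: one regional disaster per decade, and about $2.5 billion lost per event under the existing setup versus about $1 billion after improvements.

```python
def loss_per_minute(avg_txn_value: float, txns_per_minute: float) -> float:
    """Basic potential loss per minute of lost data: value x volume."""
    return avg_txn_value * txns_per_minute


def loss_per_event(per_minute: float, minutes_lost: float) -> float:
    """Loss from one disaster that wipes out `minutes_lost` minutes of transactions."""
    return per_minute * minutes_lost


def expected_annual_loss(event_loss: float, recurrence_years: float) -> float:
    """Annualized expected loss for an event expected once every `recurrence_years`."""
    return event_loss / recurrence_years


# Hypothetical inputs, for illustration only.
per_min = loss_per_minute(120.0, 5_000)        # 600,000.0 dollars per minute
event = loss_per_event(per_min, 20 * 60)       # 720,000,000.0 for 20 hours lost

# Figures reported for Floyer's client (one regional disaster per decade).
current = expected_annual_loss(2.5e9, recurrence_years=10)
improved = expected_annual_loss(1.0e9, recurrence_years=10)
print(f"${current:,.0f} vs ${improved:,.0f} per year")
# prints "$250,000,000 vs $100,000,000 per year"
```

Putting every option on a per-year basis lets it be compared directly with annual alternatives such as an insurance premium or the income forgone on a capital reserve.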
Based on these calculations and the average amount of data that would be lost under its existing twice-daily backup schedule, the team estimated that the company could expect one regional disaster taking down both data centers every decade, for a staggering loss of $2.5 billion a decade, or $250 million per year. The most that improvements to its disaster recovery processes could achieve was to reduce this to about $1 billion a decade, or $100 million per year.

The team then approached an insurer and found that the annual premium to insure against a regional disaster would be at least $100 million. Next, it looked at the annual interest the firm would lose if it self-insured by holding a reserve as required by the International Convergence of Capital Measurement and Capital Standards (Basel II). The annual loss of income from the reserve was well over $100 million. Given these alternatives, the three-data-center solution was the obvious choice. The payback period was seven months, with a net present value over three years of more than $150 million.

Invitations to the table

Burris and Floyer suggested that at least four and possibly seven groups need to be represented at disaster recovery planning sessions:

1. CXO-level corporate management and possibly corporate directors, who must make the final strategic and financial decisions.
2. The head of the line(s) of business the disaster recovery solution will serve.
3. Facilities or operations management, which must assess the relevant external factors that increase the risk probability, such as the data center's proximity to earthquake fault lines, chemical plants or nuclear power plants.
4. IT, which must quantify the potential risks and present the technical disaster recovery options for mitigating them.
5. Corporate auditors, to ensure that auditing procedures are included in the recovery plan.
6. The corporate compliance officer or legal counsel, to discuss regulatory and other potential legal exposure, depending on the nature of the organization's business.
7. Outside consultants, to aid the planning process and ensure that nothing important is missed. This is especially important if the organization lacks depth of internal experience in disaster recovery planning.

Keep it simple

"In a disaster, nothing will work as planned," says Krischer. "So you have to improvise." To allow for that, companies need to keep their plans as simple and flexible as possible. One of his clients focused much of its planning effort on ensuring that key business executives would be reachable in emergencies to make the business decisions on what to do. Discussion centered on what constituted adequate emergency communications. For example, should the disaster recovery budget include satellite phones for those executives, and if it did, would they keep those phones charged and with them at all times?

Also, he says, "Users will accept lower service levels in a disaster," so IT doesn't have to restore all systems to normal service levels immediately.

Practice, practice, practice

Floyer's IT client had a second item on its agenda. IT was testing its disaster recovery plan twice a year, but the CIO had less than complete faith that it would work in a real event. "They were testing an ideal scenario with historical data, and when real disasters happen, a lot of other things go wrong," Floyer says. "The overall testing strategy is one of the most important things that you have to get right." The literature is replete with stories of disaster plan failures.

"They wanted to move operations from one center to another regularly, to make what is essentially a disaster recovery from center A to B or C part of the normal way activities were scheduled." That required an expenditure of time and money, but it is the best way to reduce the risk of major complications in a real disaster.
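A minimal sketch, in Python, of turning planned switchovers into a routine calendar. Each entry moves production from one center to the next, so every scheduled move doubles as a live recovery rehearsal. The round-robin order, the 90-day interval and the `rotation_schedule` helper are all hypothetical. The article does not say how often the client planned to switch or in what order.

```python
from datetime import date, timedelta


def rotation_schedule(sites: list, start: date, interval_days: int,
                      count: int) -> list:
    """Return (date, from_site, to_site) tuples, rotating the primary role round-robin."""
    n = len(sites)
    if n < 2:
        raise ValueError("need at least two sites to rotate between")
    return [
        # Switchover i moves production from site i to site i+1 (wrapping around).
        (start + timedelta(days=i * interval_days), sites[i % n], sites[(i + 1) % n])
        for i in range(count)
    ]


for when, src, dst in rotation_schedule(["A", "B", "C"], date(2007, 9, 1), 90, 3):
    print(f"{when}: {src} -> {dst}")
# 2007-09-01: A -> B
# 2007-11-30: B -> C
# 2008-02-28: C -> A
```

Round-robin ensures that every center, not just the designated recovery site, regularly carries production, which is the point of making recovery part of normal operations.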
Budget and time

Finally, Burris says, "Business management must commit to supporting the plan, not just talking about it. The level of that commitment is expressed in how closely the level of funding they authorize approaches the ideal funding level and in their willingness to commit their own time to planning, testing and other activities that will prepare the organization for the eventual disaster." Without that level of commitment, he says, IT can't hope to develop an adequate disaster response.

-=-

Bert Latamore is a journalist with 10 years' experience in daily newspapers and 25 in the computer industry. He has written for several computer industry and consumer publications. He lives in Linden, Va., with his wife, two parrots and a cat.