http://www.expresscomputeronline.com/20060731/management02.shtml

By Vinita Gupta
31 July 2006

The one important lesson that many organisations learned after last year's deluge in Mumbai is the significance of maintaining a disaster recovery (DR) site. The companies that coped did so because they had a DR site from which to resume operations. The growing acceptance of DR sites also reflects the increasing number of business applications and the increasing dependence of organisations on IT.

Though SBI Mutual Funds (SBIMF) was not affected by the flooding last year (or this year, for that matter), it has nevertheless invested in a DR site and is confident that it is prepared for the worst if disaster hits.

The need

The reasons that compelled SBIMF to go in for a DR site were internal risk management, regulatory guidelines from the Securities and Exchange Board of India (SEBI), and the desire to gain investors' trust. Says Subhojit Roy, SBIMF's Head of IT, "Business Continuity (BC) is very important, especially for the financial services sector. DR is a sub-set of BC. To keep the business running in case of a disaster such as a natural calamity, or the break-down of the primary data centre, DR is a must."

SBIMF has many applications, such as publication of the net asset value (NAV) for investors, whose output needs to be released every day irrespective of disasters. The mutual fund's (MF's) workings are regulated by SEBI guidelines, which stipulate that the MF and its registrar and transfer (R&T) agents and custodians should have an offsite back-up facility and a business contingency plan that is tested and evaluated on a regular basis. The business contingency plan should be comprehensive and should cover IT, infrastructure and personnel requirements.
Since the MF works with intermediaries such as banks, custodians, R&T agents and brokers, how fully and how quickly it can restore normal operations also depends on the DR implementations of its partners.

The need for DR among MF providers also arises because investors want to know the provider's level of preparedness before investing. Notes Roy, "BC has become quite critical in the financial services sector. Apart from regulatory requirements for DR and BC, institutional investors also like to know before investing whether risk management practices are in place or not."

Process and implementation

The process of DR planning began in May 2005, and the final implementation started in February this year; the DR site went live in June.

SBIMF's DR site is at Chennai. The Tamil Nadu capital was chosen because it does not fall in a zone of high seismic activity. Apart from deciding the location of its DR site, SBIMF had to decide on the cost and modalities, i.e. whether the site would be deployed and managed by an in-house IT team or outsourced. Says Roy, "SBI had already set up a complete DR site in Chennai for its core banking and ATM network, and they provided space and infrastructure in their DR data centre to us."

Though the site has well-equipped infrastructure, skilled personnel and BS 7799 certification, SBIMF had to establish its own systems there.

The stages of DR

* Level 1. The first level of DR implementation consisted of planning and implementing policy-based strategic back-up management, back-up strategies, data consolidation and tape vaulting at the offsite facility. Informs Roy, "We are presently taking daily back-up of data of all critical servers. The back-up tapes are stored in a fire-proof cabinet in our office as well as in the bank's locker for offsite storage."

* Level 2. The second step included charting out the critical components and designing a redundancy plan.
Most of the servers and active network components are critical to operations. A single point of failure in any of them raises the risk of disaster and can bring the entire business to a halt. At this level, redundant paths are designed to avoid total disruption. "As a result of risk mitigation, you get different redundancy designs for critical network components. All single points of failure are treated for redundancy planning," adds Roy.

* Level 3. Finally, in the third stage of DR, the primary site is given an alternative site of operation from which business-critical processes can be carried out within the stipulated recovery time objective (RTO) and recovery point objective (RPO). While setting up a DR site, an appropriate data recovery solution is chosen to meet the RTO and RPO.

Applications on the DR site

SBIMF runs business applications such as Mfund, along with its front office and cash management systems, at the DR site. All business-critical systems, such as the Oracle database and the mail, file and print servers, are being replicated.

Roy says, "Based on business impact analysis and the objectives of BC such as RTO and RPO, we have selected these applications and data replication technology. The applications are front office and back office systems (running on Oracle 9i), the cash management system (also running on Oracle 9i), portfolio management system (running on MS-SQL), centralised mailing system (Lotus Domino 6.5.3) and files of mapped drives of all the users in the network of the primary site."

Non-critical applications such as workflow applications are not part of DR.

Technology used

SBIMF has about 50 branch offices, which handle sales and investor servicing. All these branches are connected to the corporate office (at Cuffe Parade in Mumbai) over a WAN. Data from the branches is collated at the centralised server located at the corporate office. The servers are Intel-based and run Windows.
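The RTO and RPO mentioned above constrain how often data has to be copied to the DR site: the RPO is the maximum window of data loss the business will accept, and with periodic replication the worst-case loss is roughly one full replication interval. The following is a minimal sketch of that check. The tier names, intervals and RPO targets are hypothetical and are not SBIMF's figures.

```python
from datetime import timedelta


def worst_case_data_loss(replication_interval: timedelta) -> timedelta:
    # With periodic replication, a failure just before the next cycle
    # loses every change made since the last one: up to a full interval.
    # Simplification: the time the transfer itself takes is ignored.
    return replication_interval


def meets_rpo(replication_interval: timedelta, rpo: timedelta) -> bool:
    # The schedule is acceptable only if the worst-case loss fits in the RPO.
    return worst_case_data_loss(replication_interval) <= rpo


# Hypothetical tiers: (replication interval, RPO target). Illustration only.
tiers = {
    "critical": (timedelta(hours=4), timedelta(hours=6)),
    "less critical": (timedelta(hours=24), timedelta(hours=12)),
}

for name, (interval, rpo) in tiers.items():
    verdict = "meets" if meets_rpo(interval, rpo) else "misses"
    print(f"{name}: replicating every {interval} {verdict} an RPO of {rpo}")
```

A tier that misses its target needs either a shorter replication interval or a relaxed RPO agreed with the business.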
Data can be replicated in two ways: host-based replication and consolidated replication. Host-based replication copies data from one system at the primary site to a similar system at the DR site. It can be done at the application level (as with Oracle Data Guard) or through third-party software. The other way is to consolidate the data from the various servers into a single storage box (such as a SAN or NAS box), and then replicate the data of the different applications from that box to a similar one at the DR site.

SBIMF has chosen the consolidated method. It first consolidates all critical server data onto a Network Appliance fabric-attached storage (FAS) system, and then replicates all the data to a similar FAS device at the DR site. At present, the servers also access the FAS box. Informs Roy, "We have done data consolidation at the primary site, that is, the corporate office. All the critical data of the Oracle, mail and file servers have been migrated into a unified storage box."

SBIMF's critical data is replicated every four hours: whatever data is in the FAS box at the primary site is copied to the FAS box at the DR site. Less critical data is replicated at the end of the day to reduce bandwidth utilisation during working hours.

At the primary site, SBIMF uses a Tandberg autoloader and Veritas back-up software for archival. Earlier, back-ups were written to SDLT tapes without an autoloader.

For connectivity, the primary and DR sites are linked by 2 Mbps leased lines. Because it uses the same DR site as SBI, SBIMF could leverage the bank's existing link. Reveals Roy, "SBI has set up a leased line between its Chennai DR site and the central hub in Mumbai; we too are connected to the central hub of the SBI through a leased line of 2 Mbps.
Because of this we saved on the cost of setting up our own leased line connecting the DR site to Mumbai."

Role of BCP committee

The company's Business Continuity Planning (BCP) committee is the highest-level committee for DR. It takes the final decisions in actual disaster situations, and the BCP team acts on those decisions. Typically, a BCP committee comprises the top management team, members of different functional areas, and the IT team.

"The BCP team is responsible for reviewing the DR / BC plan, testing the DR site periodically through live DR drills with the help of users, and has a specific role to play in case of disaster," says Roy.

The challenges the SBIMF team faced in setting up the DR site were selecting the site, preparing a complete DR / BC manual, involving all departments and functions of the company, planning appropriate technology for the company's DR requirements, continuously reviewing and updating DR / BC processes, and testing DR regularly. The DR site is tested every quarter.

First step to BC

Roy believes that having a DR site is the first step towards BC. If any server component at the primary site goes down, staff can work from the DR site until the primary site's equipment is restored.

"Business impact analysis helps as it gives us a complete picture for setting up an alternate operational site for BC, and also the manpower requirements for BC. It is useful in building adequate redundancy in the present infrastructure, and a complete DR / BC manual by giving everybody clear guidelines for disaster situations."
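As a back-of-the-envelope check on the figures quoted under "Technology used", the 2 Mbps leased line and the four-hour replication cycle together put a ceiling on how much changed data one cycle can carry. The sketch below works that ceiling out. The `efficiency` parameter and the 0.8 value are assumptions for protocol overhead and link sharing, not figures from the article, and the function name is hypothetical.

```python
def max_transfer_bytes(link_mbps: float, window_hours: float,
                       efficiency: float = 1.0) -> float:
    # Theoretical ceiling on bytes moved over a link in a time window.
    # efficiency < 1.0 is an assumed allowance for overhead and sharing.
    bits_per_second = link_mbps * 1_000_000
    seconds = window_hours * 3600
    return bits_per_second * seconds * efficiency / 8


# Figures from the article: 2 Mbps link, four-hour replication cycle.
ceiling = max_transfer_bytes(2, 4)
print(f"Theoretical ceiling per cycle: {ceiling / 1e9:.1f} GB")  # 3.6 GB

# Assumed 80% usable throughput (hypothetical value).
usable = max_transfer_bytes(2, 4, efficiency=0.8)
print(f"With 80% efficiency: {usable / 1e9:.2f} GB")  # 2.88 GB
```

If the data changed in a cycle exceeds this ceiling, replication falls behind, which is consistent with the article's point that less critical data is deferred to the end of the day.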