http://www.expresscomputeronline.com/20060731/management02.shtml
By Vinita Gupta
31 July 2006
One important lesson that many organisations learned after last year's
deluge in Mumbai is the significance of maintaining a disaster recovery
(DR) site. The companies that coped did so because they had a DR site to
resume operations from. The other reason for the growing acceptance of DR
sites is the increasing number of business applications and the
increasing dependence of organisations on IT.
Though SBI Mutual Funds (SBIMF) was not affected by the flooding last year
(or this year, for that matter), they have nevertheless invested in a DR
site and are quite confident that if disaster hits they are prepared for
the worst.
The need
The reasons that compelled SBIMF to go in for a DR site were internal risk
management, regulatory guidelines from the Securities and Exchange Board
of India (SEBI), and the desire to gain investors' trust.
Says Subhojit Roy, SBIMF's Head of IT, "Business Continuity (BC) is very
important, especially for the financial services sector. DR is a sub-set
of BC. To keep the business running in case of a disaster such as a
natural calamity, or the break-down of the primary data centre, DR is a
must." SBIMF has many applications, such as publication of the net asset
value (NAV) for investors, whose output needs to be released every day
irrespective of disasters.
The mutual fund's (MF's) workings are regulated by SEBI guidelines, which
stipulate that the MF and its registrar and transfer (R&T) agents and
custodians should have an offsite back-up facility and business
contingency plan that is tested and evaluated on a regular basis. The
business contingency plan should be comprehensive and should cover IT,
infrastructure and personnel requirements.
Since the MF works with intermediaries such as banks, custodians, R&T
agents and brokers, the level of restoration of normal operations by the
MF and the time taken for different levels of normalcy will depend on the
individual DR implementation of its partners.
The need for DR among MF providers also arises as investors want to know
the level of preparedness of the provider before investing. Notes Roy, "BC
has become quite critical in the financial services sector. Apart from
regulatory requirements for DR and BC, institutional investors also like
to know before investing whether risk management practices are in place or
not."
Process and implementation
The process of DR planning began in May 2005, and the final implementation
started in February this year; the DR site went live in June.
SBIMF's DR site is at Chennai. The Tamil Nadu capital was chosen because
it does not fall in a high seismic activity zone. Apart from deciding the
location of its DR site, SBIMF had to decide on the cost and modalities,
i.e. whether the site would be deployed and managed by an in-house IT team
or outsourced. Says Roy, "SBI had already set up
a complete DR site in Chennai for its core banking and ATM network, and
they provided space and infrastructure in their DR data centre to us."
Though the site has well-equipped infrastructure, skilled personnel and
BS7799 certification, SBIMF had to establish its own systems at the site.
The stages of DR
* Level 1. The first level of DR implementation consisted of planning and
implementing policy-based strategic back-up management, back-up
strategies, data consolidation and tape vaulting at the offsite
facility. Informs Roy, "We are presently taking daily back-up of data of
all critical servers. The back-up tapes are stored in a fire-proof
cabinet in our office as well as in the bank's locker for offsite
storage."
* Level 2. The second step included charting out the critical components
and designing a redundancy plan. Most of the servers and active network
components are critical to the operations. A single point of failure in
such components can raise the risk of disasters and bring the entire
business to a halt. In this level the redundancy path is designed to
avoid total disruption. "As a result of risk mitigation, you get
different redundancy designs for critical network components. All single
points of failure are treated for redundancy planning," adds Roy.
* Level 3. Finally, in the third stage of DR, the primary site is given an
alternative site of operation from which business-critical processes can
be resumed within the stipulated recovery time objective (RTO, the
maximum acceptable downtime) and recovery point objective (RPO, the
maximum acceptable window of data loss). While setting up a DR site, an
appropriate data recovery solution is defined to satisfy the needs of
RTO and RPO.
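
The Level 2 exercise of charting out single points of failure can be
sketched in a few lines of Python. This is only an illustration, not
SBIMF's method: the topology, the node names and the helper functions are
all hypothetical. Each component is removed in turn, and the sketch checks
whether every server can still reach the WAN.

```python
from collections import deque

# Hypothetical topology; node names and links are illustrative only.
LINKS = [
    ("db_server", "core_switch"),
    ("mail_server", "core_switch"),
    ("core_switch", "router_a"),
    ("core_switch", "router_b"),
    ("router_a", "wan"),
    ("router_b", "wan"),
]


def reachable(links, start, removed):
    """Return the nodes reachable from `start` when `removed` is down."""
    graph = {}
    for a, b in links:
        if removed in (a, b):
            continue  # every link touching the failed component is lost
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def single_points_of_failure(links, sources, target):
    """List components whose loss cuts at least one source off from target."""
    nodes = {n for link in links for n in link}
    spofs = []
    for candidate in sorted(nodes - set(sources) - {target}):
        if any(target not in reachable(links, s, candidate) for s in sources):
            spofs.append(candidate)
    return spofs


print(single_points_of_failure(LINKS, ["db_server", "mail_server"], "wan"))
```

In this made-up network the two routers back each other up, so only the
single core switch is reported. Adding a second switch with its own links
is the kind of redundancy design the Level 2 stage aims for.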
Applications on the DR site
SBIMF is running business applications such as Mfund, and front office and
cash management systems, at the DR site. All business-critical systems,
such as the Oracle database and the mail, file and print servers, are
being replicated.
Roy says, "Based on business impact analysis and the objectives of BC such
as RTO and RPO, we have selected these applications and data replication
technology. The applications are front office and back office systems
(running on Oracle 9i), the cash management system (also running on Oracle
9i), portfolio management system (running on MS-SQL), centralised mailing
system (Lotus Domino 6.5.3) and files of mapped drives of all the users in
the network of the primary site." Non-critical applications such as
workflow applications are not part of DR.
Technology used
SBIMF has about 50 branch offices which look after sales and investor
servicing. All these branches are connected to the corporate office (at
Cuffe Parade in Mumbai) through the WAN. Data from the branches is
collated at the centralised server located at the corporate office. The
servers are Intel-Windows-based.
Data can be replicated in two ways: host-based replication and
consolidated replication. Host-based replication means data replication
from one system at the primary site to a similar system at the DR site. It
can be done at the application level (for example, with Oracle Data Guard)
or through third-party software. The other way is to consolidate the data
from the various servers into a single storage box (like a SAN or NAS
box), and then replicate the data of different applications from the
external storage box to another similar box at the DR site.
SBIMF has chosen the consolidated replication method. It first
consolidates all critical server data through storage consolidation. Using
Network Appliance fabric-attached storage (FAS), it then replicates all
the data to a similar FAS device at the DR site. At present, the servers
are also accessing the FAS box.
Informs Roy, "We have done data consolidation at the primary site, that
is, the corporate office. All the critical data of the Oracle, mail and
file servers have been migrated into a unified storage box." Critical
data of SBIMF gets replicated every four hours; this means that whatever
data there is in the FAS box in the primary site gets replicated to the
FAS box at the DR site. The less critical data is replicated at the end of
the day to reduce bandwidth utilisation during working hours.
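
The replication schedule above bounds how stale the DR copy can be, which
is what an RPO measures. The following Python sketch shows the arithmetic
under stated assumptions: the four-hour interval is from the article,
"end of the day" is approximated as a 24-hour interval, and the RPO
targets and transfer times are hypothetical, since the article does not
give them.

```python
from datetime import timedelta

# Four hours is from the article; 24 hours approximates "end of the day".
REPLICATION_INTERVAL = {
    "critical": timedelta(hours=4),
    "less_critical": timedelta(hours=24),
}


def worst_case_data_loss(interval, transfer_time=timedelta(0)):
    """Upper bound on the age of the DR copy.

    Just before a new copy finishes landing, the DR site still holds the
    previous one, taken a full interval plus one transfer time earlier.
    """
    return interval + transfer_time


def meets_rpo(interval, rpo, transfer_time=timedelta(0)):
    """True if the schedule keeps worst-case data loss within the RPO."""
    return worst_case_data_loss(interval, transfer_time) <= rpo


# Hypothetical RPO of six hours and a 30-minute transfer, for illustration.
hypothetical_rpo = timedelta(hours=6)
for tier, interval in REPLICATION_INTERVAL.items():
    ok = meets_rpo(interval, hypothetical_rpo, timedelta(minutes=30))
    print(tier, worst_case_data_loss(interval, timedelta(minutes=30)), ok)
```

Under these assumed figures the four-hourly tier would satisfy a six-hour
RPO and the end-of-day tier would not, which is consistent with reserving
the daily schedule for less critical data.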
At the primary site, SBIMF is using a Tandberg autoloader and Veritas
back-up software for archival. Earlier, back-ups were taken onto SDLT
tapes without an autoloader. The primary and DR sites are connected by
2 Mbps leased lines. Since SBIMF uses the same DR site as SBI, it could
leverage the bank's existing connectivity. Reveals Roy, "SBI has set up a
leased line between its Chennai DR site and the central hub in Mumbai; we
too are connected to the central hub of the SBI through a leased line of 2
Mbps. Because of this we saved on the cost of setting up our own leased
line connecting the DR site to Mumbai."
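
A rough calculation shows what a 2 Mbps link allows within one four-hour
replication window. This Python sketch is an estimate under assumptions,
not a measurement: it uses decimal units (1 Mbps = 1,000,000 bits per
second), ignores protocol overhead, and treats the `utilisation` share
available to replication as a hypothetical parameter.

```python
LINK_MBPS = 2      # leased-line speed, from the article
WINDOW_HOURS = 4   # replication interval for critical data, from the article


def max_transfer_gb(link_mbps, window_hours, utilisation=1.0):
    """Upper bound, in decimal gigabytes, on data moved in one window."""
    bits = link_mbps * 1_000_000 * window_hours * 3600 * utilisation
    return bits / 8 / 1_000_000_000


# Ceiling with the whole link dedicated to replication.
print(max_transfer_gb(LINK_MBPS, WINDOW_HOURS))
# Ceiling if only half the link is available (hypothetical share).
print(max_transfer_gb(LINK_MBPS, WINDOW_HOURS, utilisation=0.5))
```

At full utilisation the ceiling is about 3.6 GB per four-hour window, so
the volume of data changed between replication runs has to stay below
that figure for each run to finish before the next one starts.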
Role of BCP committee
The company's Business Continuity Planning (BCP) committee is the
highest-level committee for DR. This committee takes the final decisions
on actual disaster situations, and based on its decision the BCP team will
act. Typically, a BCP committee comprises the top management team, members
of different functional areas, and the IT team. "The BCP team is
responsible for reviewing the DR / BC plan, testing the DR site
periodically through live DR drills with the help of users, and has a
specific role to play in case of disaster," says Roy.
The challenges faced by the SBIMF team in setting up the DR site were
selection of the site, making a complete DR / BC manual, involving all
departments / functions of the company, planning appropriate technology
for the DR requirements of the company, continuous review and updating of
DR / BC processes, and regular testing of DR. The testing to ensure
accuracy of the DR site is conducted every quarter.
First step to BC
Roy believes that having a DR site is the first step towards BC. If any of
the server components in the primary site is down, they can work from the
DR site till the primary site's equipment is revived. "Business impact
analysis helps as it gives us a complete picture for setting up an
alternate operational site for BC, and also the manpower requirements for
BC. It is useful in building adequate redundancy in the present
infrastructure, and a complete DR / BC manual by giving everybody clear
guidelines for disaster situations."