The ability to keep production running and systems available through any kind of stoppage, from routine maintenance to a power outage or a natural disaster, is a key component of business continuity. One need only recall the Blackout of 2003 and its associated costs to begin to grasp the importance of this issue. Most businesses can afford little, if any, downtime. So when the plug was pulled last year on thousands of businesses across the Northeast and Midwest, why were so many of them unable to maintain normal operations? The answer lies in the complexity of true business continuity, which requires an appropriate solution for each of the different threats an organization faces.
Perhaps the only value businesses can derive from the blackout comes from analyzing how they fared during it and applying that experience to their future business continuity efforts. A survey by Mirifex (an Ohio-based business technology consulting firm) and Case Western Reserve University provides some insight into the losses caused by the blackout and the current state of business continuity preparedness in the affected region:
• More than one-third of the respondents have no risk management or disaster recovery plans in place.
• Two-thirds of the respondents lost at least a full business day due to the blackout.
• Over one-fifth of the respondents lost more than $50,000 per hour of downtime, and nearly four percent lost more than $1 million for each hour of downtime.
• Nearly half of the respondents said that lost employee productivity was the largest contributor to losses due to the blackout.
• Nearly half of the respondents (46 percent) plan to invest more in risk management, business continuity and/or disaster recovery in the future.
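To put the per-hour figures in perspective, the short Python sketch below multiplies the length of an outage by an hourly loss rate. The eight-hour business day is an assumption for illustration, not a figure from the survey, and `downtime_cost` is a hypothetical helper.

```python
# Rough downtime-cost arithmetic using the survey's per-hour loss thresholds.
# ASSUMPTION: a "full business day" is taken to be 8 working hours.
HOURS_PER_BUSINESS_DAY = 8


def downtime_cost(hours_down: float, loss_per_hour: float) -> float:
    """Return the estimated loss for an outage of the given length."""
    if hours_down < 0 or loss_per_hour < 0:
        raise ValueError("hours and hourly loss must be non-negative")
    return hours_down * loss_per_hour


if __name__ == "__main__":
    for rate in (50_000, 1_000_000):
        cost = downtime_cost(HOURS_PER_BUSINESS_DAY, rate)
        print(f"${rate:,} per hour for one business day: ${cost:,.0f}")
```

Under that assumption, a single lost day comes to $400,000 at the $50,000-per-hour threshold and $8,000,000 at the $1 million-per-hour threshold.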
When the Lights Go Out
The first piece of equipment to consider when preparing to combat blackouts and power surges is an uninterruptible power supply (UPS) for every server. A UPS is generally not expensive, and it can provide enough temporary power either to shut down the servers properly during a blackout or to carry the load while the system is transferred from utility power to an on-site electrical power generation system.
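A rough way to check whether a UPS buys enough time for an orderly shutdown is to divide the usable battery energy by the server load. The sketch below assumes a single constant inverter efficiency and ignores battery aging, temperature and the non-linear discharge behavior of real batteries, so it is only a first approximation to be checked against the vendor's runtime charts. The function names, the 1.5× safety margin and the sample figures are all hypothetical.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float, efficiency: float = 0.9) -> float:
    """Approximate runtime: usable battery energy divided by the load it carries.

    ASSUMPTION: constant inverter efficiency; real runtime also depends on
    battery age, temperature and discharge rate.
    """
    if load_w <= 0:
        raise ValueError("load must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return battery_wh * efficiency / load_w * 60


def covers_shutdown(runtime_min: float, shutdown_min: float, margin: float = 1.5) -> bool:
    """True if the runtime exceeds the shutdown time by the chosen safety margin."""
    return runtime_min >= shutdown_min * margin


if __name__ == "__main__":
    runtime = ups_runtime_minutes(battery_wh=1000, load_w=600)
    print(f"Estimated runtime: {runtime:.0f} minutes")
    print("Enough for a 20-minute shutdown:", covers_shutdown(runtime, 20))
```

With these illustrative numbers the estimate is about 90 minutes, comfortably more than 1.5 times a 20-minute shutdown procedure.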
The most effective solution for a blackout is, for obvious reasons, an on-site electrical power generation system. If this is not practical for your organization, and if you have remote offices outside of the affected area that need to access servers at your location, off-site redundant systems are your best alternative.
Off-site redundant systems will not turn the power on at your location, but they will provide failover to a secondary system at a remote location unaffected by the blackout. This allows users at other locations to access the same applications and data they normally access at your location. You will still lose productivity at your location, but employees at other locations can keep working normally. Redundant systems mitigate and localize productivity loss in the same way that blackouts localize power loss.
Off-site redundant systems can be implemented in many ways, depending on the needs of your organization. They may be located at one of your remote offices or at a colocation facility. They may synchronize data from your primary system in near-real time, or the primary system may periodically detect and save recent updates, then send them as batch files to the redundant system for synchronization.
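The batch-file approach can be modeled in a few lines of Python. In this minimal sketch, in-memory dictionaries stand in for the real databases and a function call stands in for the network transfer; `Node`, `write`, `take_batch` and `apply_batch` are hypothetical names, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy key-value store standing in for a primary or secondary system."""
    name: str
    data: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)  # updates not yet shipped

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))  # log the change for the next batch

    def take_batch(self):
        """Detach and return every pending update as one batch."""
        batch, self.pending = self.pending, []
        return batch

    def apply_batch(self, batch):
        """Replay a batch in order, so the latest update to a key wins."""
        for key, value in batch:
            self.data[key] = value


if __name__ == "__main__":
    primary, secondary = Node("primary"), Node("secondary")
    primary.write("order:1001", "shipped")
    primary.write("order:1002", "pending")
    primary.write("order:1002", "shipped")
    secondary.apply_batch(primary.take_batch())  # the periodic batch transfer
    print(secondary.data == primary.data)  # True
```

A near-real-time design would ship each update to the secondary as it happens instead of accumulating a batch; the trade-off is more network traffic in exchange for a smaller window of unsynchronized changes.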
The first step toward implementing a redundant system is to determine which applications and data are most critical to running your business in the event of a blackout or other disruption. Next, determine the time window your organization can tolerate for executing the failover, which may include the time required to synchronize batch files recently uploaded to the secondary system. The final steps are installing the critical applications and data on the secondary system, configuring it to match the primary system's environment and scheduling data synchronizations.
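One way to test a proposed failover window is to add up its stages and compare the total with what the business can tolerate. The sketch below assumes a failover consists of detecting the outage, applying the batch files still waiting on the secondary, and redirecting users; those stage names and the sample minutes are illustrative assumptions, best replaced with figures measured during failover tests.

```python
from dataclasses import dataclass


@dataclass
class FailoverEstimate:
    """Hypothetical breakdown of the time needed to bring the secondary online."""
    detect_min: float           # noticing the outage and deciding to fail over
    pending_batches: int        # batch files uploaded but not yet applied
    apply_min_per_batch: float  # time to synchronize one batch file
    redirect_min: float         # pointing users at the secondary system

    def total_min(self) -> float:
        return (self.detect_min
                + self.pending_batches * self.apply_min_per_batch
                + self.redirect_min)

    def fits(self, tolerated_min: float) -> bool:
        """True if the whole failover completes within the tolerated window."""
        return self.total_min() <= tolerated_min


if __name__ == "__main__":
    estimate = FailoverEstimate(detect_min=10, pending_batches=3,
                                apply_min_per_batch=5, redirect_min=5)
    print(estimate.total_min(), estimate.fits(tolerated_min=30))  # 30 True
```

Because the batch term grows with the number of pending files, the same calculation also shows how a backlog of unsynchronized batches can push a failover past its tolerated window.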
You also need to consider your tolerance for temporary data loss. This applies particularly to redundant systems that store updates in batch files on the primary system and periodically upload them to the secondary system. A disruption can strike after users have made updates but before the primary system has backed them up and uploaded them to the secondary system. In that scenario, the secondary system cannot obtain and synchronize the most recent updates until the primary system is back online. To minimize this temporary data loss, schedule incremental backups, which store only the data changed since the last backup, at the shortest practical intervals.
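The sketch below illustrates both halves of that advice: `incremental_backup` selects only records changed since the previous backup, and `unprotected_window` measures how much recent work is exposed when a disruption strikes. Tracking changes by a per-record modification timestamp is an assumption made for illustration; real backup tools track changes in their own ways, and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def incremental_backup(records, last_backup_at):
    """Select only the records modified after the previous backup.

    `records` maps a key to a (value, last_modified) pair.
    """
    return {key: value
            for key, (value, modified) in records.items()
            if modified > last_backup_at}


def unprotected_window(disruption_at, backup_times):
    """Time since the last backup that completed before the disruption.

    Updates made during this window exist only on the failed system.
    """
    earlier = [t for t in backup_times if t <= disruption_at]
    if not earlier:
        raise ValueError("no backup completed before the disruption")
    return disruption_at - max(earlier)


if __name__ == "__main__":
    start = datetime(2004, 3, 1, 9, 0)
    outage = start + timedelta(minutes=40)
    every_15_min = [start + timedelta(minutes=15 * i) for i in range(8)]
    hourly = [start + timedelta(hours=i) for i in range(8)]
    print(unprotected_window(outage, every_15_min))  # 0:10:00
    print(unprotected_window(outage, hourly))        # 0:40:00
```

The worst-case exposure equals the backup interval, which is why shortening the interval directly shrinks the amount of work that may have to wait for the primary system to come back.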
After a failover, once the primary system is available again, a redundant system can fail back to it by synchronizing the changes that occurred while users were working on the secondary system. The secondary system detects and saves these changes and uploads them to the primary system, where they are synchronized. Users are then transparently redirected to the primary system simply by changing the IP address their computers use to locate it.
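The ordering in a failback matters: changes made on the secondary should reach the primary before users are pointed back at it, or they would briefly see stale data. This minimal sketch models the address clients look up as a hypothetical `Directory` object; the addresses come from the ranges reserved for documentation and are placeholders.

```python
class Directory:
    """Hypothetical lookup that clients consult to find the active system."""

    def __init__(self, address: str):
        self.address = address


def fail_back(directory, primary_data, secondary_changes, primary_address):
    """Synchronize changes made on the secondary, then redirect users."""
    for key, value in secondary_changes:  # replay in the order they occurred
        primary_data[key] = value
    secondary_changes.clear()             # these changes are now on the primary
    directory.address = primary_address   # only now send users back


if __name__ == "__main__":
    directory = Directory("198.51.100.20")  # secondary served users during the outage
    primary_data = {"po:77": "draft"}
    changes_on_secondary = [("po:77", "approved"), ("po:78", "draft")]
    fail_back(directory, primary_data, changes_on_secondary, "192.0.2.10")
    print(directory.address, primary_data)
```

In practice the "directory" might be a DNS record or a load-balancer setting; either way, switching it last keeps the redirect transparent to users.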
The keys to maintaining a redundant system are verifying that synchronizations occur accurately and on schedule, periodically testing the failover and failback procedures, and training personnel in several locations to perform them.
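Both verification checks can be automated. The sketch below compares SHA-256 digests of the two copies to confirm they match and flags a synchronization that has not run within its scheduled interval plus a grace period. It assumes each dataset fits in memory and contains only JSON-serializable values; large systems would compare checksums per table or per block instead. All function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timedelta


def dataset_digest(data: dict) -> str:
    """SHA-256 of the data in canonical (sorted-key) JSON form."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def in_sync(primary: dict, secondary: dict) -> bool:
    """Matching digests indicate the two copies hold the same keys and values."""
    return dataset_digest(primary) == dataset_digest(secondary)


def sync_overdue(last_sync: datetime, now: datetime,
                 interval: timedelta, grace: timedelta) -> bool:
    """True if the scheduled synchronization has not run within its window."""
    return now - last_sync > interval + grace


if __name__ == "__main__":
    print(in_sync({"a": 1, "b": 2}, {"b": 2, "a": 1}))  # True
    print(in_sync({"a": 1}, {"a": 2}))                  # False
```

Sorting the keys before hashing makes the digest independent of insertion order, so two copies that hold the same data compare equal even if they were built in different sequences.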
Other Forms of Failure
While redundant systems may help reduce the cost of a blackout for your organization, they are even more appropriate for other threats to business continuity, such as equipment failure, massive data loss or routine system maintenance. These events occur at every organization regardless of size, complexity or location. The ability to fail over to a secondary system when the primary system or its data is temporarily unavailable is critical for maintaining productivity throughout your organization.
Redundancy can and should also exist within the primary and secondary systems themselves. Disk mirroring, for example, keeps an exact copy of the data on your primary hard disk on a secondary hard disk. If the primary hard disk or related hardware (such as the controller that communicates with it) fails, the system detects the failure and automatically fails over to the secondary hard disk. Without disk mirroring, you would have to either fail over to a secondary system or wait hours for a new hard disk to be installed and its data restored from backup. After the restoration, users would need to re-enter every change made since the last backup, which means lost productivity and the risk of errors in the re-entered data.
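Disk mirroring is commonly known as RAID 1. In practice it is handled by a RAID controller or the operating system's volume manager; the toy model below only illustrates the logic, with every write going to all healthy disks and reads served by the first disk that has not failed. The `Disk` and `MirroredVolume` classes are hypothetical.

```python
class Disk:
    """A toy disk that stores blocks and can be marked as failed."""

    def __init__(self, name: str):
        self.name = name
        self.blocks = {}
        self.failed = False

    def write(self, block: int, data: bytes) -> None:
        if self.failed:
            raise OSError(f"{self.name} has failed")
        self.blocks[block] = data

    def read(self, block: int) -> bytes:
        if self.failed:
            raise OSError(f"{self.name} has failed")
        return self.blocks[block]


class MirroredVolume:
    """RAID 1-style mirror: every write goes to all healthy disks."""

    def __init__(self, primary: Disk, mirror: Disk):
        self.disks = [primary, mirror]

    def write(self, block: int, data: bytes) -> None:
        healthy = [d for d in self.disks if not d.failed]
        if not healthy:
            raise OSError("no healthy disk left in the mirror")
        for disk in healthy:
            disk.write(block, data)

    def read(self, block: int) -> bytes:
        for disk in self.disks:
            if not disk.failed:
                return disk.read(block)  # first healthy disk serves the read
        raise OSError("all disks in the mirror have failed")


if __name__ == "__main__":
    primary, mirror = Disk("disk0"), Disk("disk1")
    volume = MirroredVolume(primary, mirror)
    volume.write(0, b"general ledger")
    primary.failed = True   # simulate a disk or controller failure
    print(volume.read(0))   # b'general ledger', now served by disk1
```

Because both disks receive every write, the surviving disk can take over immediately with no restore step and no re-entered changes.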
Keeping some form of backup is as important as redundant systems. Backup is what you need when the system itself is available but a few files have been lost, corrupted or accidentally deleted; there is no need to fail over to a secondary server just to restore a few files.
Although backup is where business continuity began, it has had to evolve along with the complexity of modern enterprise software. Traditional backup solutions were designed to recover from disk crashes, for which maintaining copies of information seemed adequate. The assumption was that if several copies of the same information were stored in different locations, the information in those files was safe. However, these solutions had no way to check data quality or data integrity: they simply backed up and restored the same data, good or bad.
Such traditional backup solutions are less effective for today’s enterprise systems, which integrate enterprise applications (software that makes information easier to work with) with intellectual capital (automated business processes), on top of relational database management systems that create and maintain relationships and dependencies among documents or objects.
Today’s enterprise systems therefore require backup solutions that understand, back up and restore not only data but also the data relationships and interdependencies within a business process. A modern business continuity system should also check data integrity, so that relationships are confirmed to be intact, or are repaired, before the backup runs. It should also be able to back up and restore individual modified or selected objects incrementally, rather than requiring restoration of the entire system just to recover a few files. A final consideration is the impact on end users: if they must log off the system for a backup or restoration to occur, the result is lost productivity and frustration.
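The sketch below shows what relationship-aware backup might look like in miniature. It assumes a hypothetical repository in which each object lists the IDs it references; it refuses to back up while any reference is broken, and when asked to back up selected objects it also captures everything they depend on, so a restore never brings back a link to a missing item. Repair of broken links is not shown, and both function names are hypothetical.

```python
def dangling_references(objects: dict) -> list:
    """Return (object_id, missing_target) pairs for references to absent objects."""
    return [(obj_id, target)
            for obj_id, obj in objects.items()
            for target in obj["refs"]
            if target not in objects]


def backup_selected(objects: dict, selected_ids) -> dict:
    """Back up the chosen objects plus everything they depend on.

    Refuses to run if the repository contains broken relationships.
    """
    problems = dangling_references(objects)
    if problems:
        raise ValueError(f"integrity check failed: {problems}")
    backup, to_visit = {}, list(selected_ids)
    while to_visit:
        obj_id = to_visit.pop()
        if obj_id in backup:
            continue  # already captured (handles shared or circular links)
        backup[obj_id] = objects[obj_id]
        to_visit.extend(objects[obj_id]["refs"])
    return backup


if __name__ == "__main__":
    repo = {
        "contract": {"refs": ["amendment"]},
        "amendment": {"refs": ["exhibit-a"]},
        "exhibit-a": {"refs": []},
        "memo": {"refs": []},
    }
    print(sorted(backup_selected(repo, ["contract"])))
    # ['amendment', 'contract', 'exhibit-a']
```

Backing up only the selected objects and their dependencies keeps the job small, which is what makes incremental, object-level backup practical without taking the whole system offline.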
Gauging the Viral Threat
Blackouts, physical attacks and natural disasters are destructive at many levels, but they occur far less often than the greatest threats to business continuity and information assets: cyber attacks and information theft.
While the blackout grabbed the headlines and drew attention to business continuity, among other issues, a more insidious, far-reaching and frequent threat to business continuity struck during the same week and received far less attention.
The Blaster worm infected more than 1.4 million computers in four days, and it was followed by multiple variants as well as two of the worst worms in history, Sobig and MyDoom.
According to a 2003 computer crime and security survey of 251 organizations conducted by the Computer Security Institute and the FBI, the actual monetary damages caused by various cyber risks are formidable:
• Total annual losses due to computer crime amounted to almost $202 million.
• Since 1999, the theft of proprietary information has caused the greatest financial loss of any type of cyber-crime (over $70 million was lost in 2003, with an average reported loss of approximately $2.7 million).
• The sources of the attacks included independent hackers, disgruntled employees, U.S. competitors, foreign governments and foreign corporations.
Carnegie Mellon University’s CERT Coordination Center, which has been monitoring computer-security incidents since 1988, reports a steady and dramatic increase in computer attacks, from six in 1988 to 76,404 in the first half of 2003 alone. Unfortunately, the rise in reported attacks has been accompanied by a rise in reported vulnerabilities, from 171 in 1995 to 1,993 in the first half of 2003.
Furthermore, according to a June 2003 presentation to Silicon Valley executives by the Internet Security Alliance (a nonprofit collaboration between the Electronic Industries Alliance and the CERT Coordination Center), the tools used to design and deliver attacks are increasingly sophisticated and widespread, yet require less technical knowledge to operate.
The presentation includes cost estimates for lost productivity and clean-up due to the following viruses:
• Klez: $9 billion
• Love Bug: $8.8 billion
• Code Red: $2.6 billion
• Nimda: $1.2 billion
• Slammer: $1 billion
Defenses against viruses include anti-virus software with continually updated virus-definition tables, installation of software patches that fix vulnerabilities, and training users to identify and properly handle suspicious e-mails and their attachments.
Defenses against network intrusions include firewalls, intrusion-detection software and strong passwords that are not dictionary words and that contain at least one punctuation mark or other symbol, such as an asterisk.
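The two password rules in the text, not a word and at least one symbol, can be checked mechanically. In this sketch the tiny word list is a placeholder assumption (a real check would load a full dictionary), and digits and symbols are stripped before the lookup so that a disguised word such as "password1!" is still caught. Minimum length and other common rules are deliberately left out because the text does not mention them; `meets_policy` is a hypothetical name.

```python
import string

# Placeholder word list; a real check would load a full dictionary file.
DICTIONARY = {"password", "welcome", "dragon", "monkey", "letmein", "sunshine"}


def meets_policy(candidate: str, dictionary=DICTIONARY) -> bool:
    """Apply the two rules from the text: not a word, and at least one symbol."""
    has_symbol = any(ch in string.punctuation for ch in candidate)
    # Strip digits and symbols so "password1!" is still recognized as a word.
    letters_only = "".join(ch for ch in candidate.lower() if ch.isalpha())
    return has_symbol and letters_only not in dictionary


if __name__ == "__main__":
    for pw in ("password", "password1!", "Tr4in*Yard"):
        print(pw, meets_policy(pw))
```

Only "Tr4in*Yard" passes: the other two reduce to a dictionary word once digits and symbols are removed.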
For Your Eyes Only
Today, business continuity is expanding beyond backup, redundant systems, virus protection and intrusion detection. It now also includes protecting an organization’s confidential information from accidental or intentional misuse, which can lead to compliance breaches; loss of productivity, competitive advantage and shareholder wealth; and possible fines and litigation.
Some companies keep confidential documents out of their enterprise systems because there is no way to protect the documents from other users and administrators. There is concern even about authorized end users who need confidential information to do their jobs. The main problem is that, while access to systems and content can be restricted through user authentication, there is no way to restrict what end users do with the information once they have access.
The problem also extends beyond the organization, as companies work with outside parties to achieve business goals such as mergers and acquisitions and outsourced strategic projects. The information shared with outside parties for these purposes is among the most strategic and valuable an organization has, yet until now there was no way to control how those parties used it while still collaborating with them.
Secure collaboration, the newest addition to the business continuity landscape, addresses the need to encrypt and protect content from internal and external users while enabling collaboration, the most fundamental requirement of business. Secure collaboration systems encrypt content, restrict content access via authentication, and provide granular post-access control over activities such as cutting and pasting, printing, saving and forwarding e-mail.
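The post-access control described above amounts to a per-user, per-action permission check that is consulted every time someone tries to copy, print, save or forward content. The sketch below shows only that decision logic, plus a log of every request; encryption and authentication are omitted, as is how a product would actually enforce the decisions. `Action` and `ProtectedDocument` are hypothetical names.

```python
from enum import Enum


class Action(Enum):
    VIEW = "view"
    COPY = "copy"        # cut and paste
    PRINT = "print"
    SAVE = "save"
    FORWARD = "forward"  # e.g. forwarding by e-mail


class ProtectedDocument:
    """Hypothetical wrapper deciding per-user rights after access is granted."""

    def __init__(self, doc_id: str, grants: dict):
        self.doc_id = doc_id
        self.grants = grants  # user -> set of permitted Actions
        self.audit_log = []   # (user, action, allowed) for every request

    def request(self, user: str, action: Action) -> bool:
        allowed = action in self.grants.get(user, set())
        self.audit_log.append((user, action.value, allowed))
        return allowed


if __name__ == "__main__":
    doc = ProtectedDocument("merger-terms", {
        "outside_counsel": {Action.VIEW},
        "cfo": set(Action),  # every action
    })
    print(doc.request("outside_counsel", Action.VIEW))   # True
    print(doc.request("outside_counsel", Action.PRINT))  # False
```

Denying by default (an unknown user or unlisted action gets `False`) is the safer design choice for content shared with outside parties.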
The criteria for evaluating whether a secure-collaboration solution should be part of your business-continuity solution are simple: you should consider secure collaboration if you have sensitive, confidential data that is a part of the business process, that must be used for collaboration internally or externally, or that must be stored or accessed according to privacy laws such as the Gramm-Leach-Bliley Act and the Health Insurance Portability and Accountability Act (HIPAA).
Corporate Compliance
A final boon to business continuity is the recent flurry of laws enacted to combat corporate malfeasance, terrorism, identity theft and misuse of personal information, and to ensure the privacy and portability of healthcare information. Compliance officers are well aware of the high-profile regulations now facing them that have a major impact on information systems, such as HIPAA, SEC Rule 17a-4, NASD Rules 3010 and 3110, the Gramm-Leach-Bliley Act, the Sarbanes-Oxley Act, the USA PATRIOT Act, the California Security Breach Notice Law and the Basel II Accord. While each regulation focuses on a specific area and calls for specific applications for data processing and compliance monitoring, they all have one thing in common: they dramatically increase storage, business-continuity and information-security requirements.
For example, Sarbanes-Oxley, by requiring all public companies to certify financial reporting and internal controls, places great importance on version control to document the evolution of financial reports and their supporting data and to ensure that final documents really are the final versions. These requirements are prompting banks and other businesses to implement enterprise content management (ECM) systems with built-in version control and audit trails indicating who accessed the content and when, and what changes were made. ECM systems may also include configurable workflows that automate content review and approval procedures, and help to ensure compliance with internal processes and controls.
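The core of ECM version control and audit trails can be sketched as a document that never overwrites content and records every check-in and read. This is a simplified, hypothetical model: real ECM systems differ in how they store versions and log events, and the review-and-approval workflow is not shown.

```python
from datetime import datetime, timezone
from typing import Optional


class VersionedDocument:
    """Keeps every version and an audit trail of who did what, and when."""

    def __init__(self, name: str, author: str, content: str):
        self.name = name
        self.versions = []  # versions[0] is version 1
        self.audit = []     # (timestamp, user, event)
        self.check_in(author, content, note="created")

    def check_in(self, user: str, content: str, note: str = "") -> int:
        """Store a new version without touching earlier ones; return its number."""
        self.versions.append(content)
        number = len(self.versions)
        self._record(user, f"checked in v{number} {note}".strip())
        return number

    def read(self, user: str, version: Optional[int] = None) -> str:
        """Return the latest version, or a specific one, and log the access."""
        number = len(self.versions) if version is None else version
        if not 1 <= number <= len(self.versions):
            raise IndexError(f"no version {number}")
        self._record(user, f"read v{number}")
        return self.versions[number - 1]

    def _record(self, user: str, event: str) -> None:
        self.audit.append((datetime.now(timezone.utc), user, event))


if __name__ == "__main__":
    report = VersionedDocument("Q3 results", "analyst", "draft figures")
    report.check_in("controller", "final figures", note="approved")
    print(report.read("auditor"))  # final figures
    for _, user, event in report.audit:
        print(user, event)
```

Because versions are only appended, the final version can always be distinguished from earlier drafts, and the audit trail shows who checked in or read each one.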
While ECM systems aid in compliance with laws such as Sarbanes-Oxley, they require a considerable storage allocation in order to store all versions of the content as well as the audit trails, workflows and relationships to other content items in the system. The storage requirements do not end there, because all critical data that is stored must also be backed up and again stored in a separate location.
SEC Rule 17a-4, NASD Rules 3010 and 3110 and Basel II place even greater demands on storage and backup systems. SEC Rule 17a-4 and NASD Rules 3010 and 3110 require financial-services firms to supervise and record all electronic communications related to their business and to retain them for at least three years, the first two in an easily accessible place, while Basel II requires banks to archive two years of data to prove that they maintain minimum capital adequacy to cover their financial exposure.
The storage and data retention requirements of these and other laws created a need for archiving, the final category of business continuity. Archiving involves moving data at the end of its life cycle from more expensive primary storage to less expensive archival storage media, and usually also protecting it from modification. The requirements for modern backup systems apply to archiving as well: the archiving solution should understand and verify data relationships, and archived data should be readily available when needed. Archiving plays an important role in complying with government mandates for information storage. If regulators ask for content that must be retained for several years and an organization cannot produce it, its ability to do business may be seriously affected.
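The sketch below models the two archiving duties just described: moving aged records out of primary storage into a read-only archive, and deciding when a record has been kept long enough to be purged. The age limit and retention period are parameters, and the values used here are illustrative rather than tied to any regulation. Python's read-only mapping view only prevents accidental edits within this program (and only at the top level of each record); compliance archives typically rely on write-once (WORM) storage for that. All function names are hypothetical.

```python
from datetime import date, timedelta
from types import MappingProxyType


def archive_old_records(primary: dict, archive: dict, today: date, age_limit: timedelta) -> None:
    """Move records older than age_limit from primary storage into the archive."""
    cutoff = today - age_limit
    for key in [k for k, rec in primary.items() if rec["date"] < cutoff]:
        # Store a read-only view of a private copy so the record cannot be edited.
        archive[key] = MappingProxyType(dict(primary.pop(key)))


def retention_ends(start: date, years: int) -> date:
    """Same calendar day `years` later; Feb 29 rolls forward to Mar 1."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return date(start.year + years, 3, 1)


def may_purge(record, today: date, years: int) -> bool:
    """True once the record has been kept for the full retention period."""
    return today >= retention_ends(record["date"], years)


if __name__ == "__main__":
    primary = {
        "msg-1": {"date": date(2002, 1, 10), "body": "trade confirmation"},
        "msg-2": {"date": date(2004, 2, 1), "body": "client e-mail"},
    }
    archive = {}
    archive_old_records(primary, archive, today=date(2004, 3, 1), age_limit=timedelta(days=365))
    print(sorted(primary), sorted(archive))  # ['msg-2'] ['msg-1']
```

Rounding a February 29 start date forward to March 1 errs on the side of keeping a record slightly longer rather than purging it early.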
Continued Improvement
Business continuity and information security will continue to gain prominence in general awareness, high-level meeting agendas and media coverage, and they will keep their new place among the top priorities for national, organizational and even personal security. Information security is now a household concern as well, due to the widespread adoption of personal computers and Internet access and the rampant incidence of viruses, fraud and identity theft. Now that the threat is real and affects every one of us, business continuity and information security will only continue to improve. There is little we will ever be able to do to stop someone from attempting a cyber-attack, or to prevent every disaster, but there is much we can and will continue to do to dramatically reduce their disruption and damage to business, the economy and our lives.
Elaine S. Price is co-founder, president and CEO of CYA Technologies in Trumbull, Connecticut.