Design Article
Single-box high availability is adaptable to network security
Ayikudy K. Srikanth, Director of Software Engineering, Crossbeam Systems, Inc., Concord, Mass.
10/14/2002 10:45 AM EDT
Never has the need for nonstop network security been so important. Viruses, denial of service attacks and hacker intrusions threaten corporate and service provider networks on a constant basis. Because of this critical demand, high availability is imperative in all levels of network security.
Availability "most" of the time just doesn't cut it when threats could destroy vital corporate information and expose sensitive subject matter. That is why the "four nines" (99.99%) reliability and availability currently the norm in most data-centric network centers is not enough. Telecom-level "five nines" availability (99.999%) or better is a critical element of a security infrastructure. Despite its usability advantages, dual-box high availability has inherent shortcomings that necessitate a shift to the newest single-box high availability technologies.
Network failures and security lapses result in losses of hundreds of millions of dollars to businesses each year. These network failures can happen as a result of failure of a single network device or because of a disaster striking the area where the key network devices are housed.
According to a study by a major analyst group, network outages may cost a business up to $200,000 per minute. Even though a user may be satisfied with 99.9 percent availability of the network, 0.1 percent unavailability can be very damaging when an endless stream of data moves daily through the network.
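To put the availability figures in perspective, the short Python sketch below converts an availability percentage into expected minutes of outage per year and multiplies by the worst-case cost cited above. The function names are illustrative, and the calculation assumes a 365-day year and treats every minute of outage as costing the full $200,000 upper bound, which is a simplification.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected minutes of outage per year at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def worst_case_cost(availability_percent: float,
                    cost_per_minute: float = 200_000) -> float:
    """Upper-bound yearly outage cost, using the cited per-minute figure."""
    return downtime_minutes_per_year(availability_percent) * cost_per_minute


if __name__ == "__main__":
    levels = [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]
    for label, pct in levels:
        minutes = downtime_minutes_per_year(pct)
        cost = worst_case_cost(pct)
        print(f"{label} ({pct}%): {minutes:.1f} min/year, up to ${cost:,.0f}")
```

Under these assumptions, 99.9 percent availability allows roughly 526 minutes (almost nine hours) of outage a year, four nines about 53 minutes, and five nines a little over 5 minutes.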
Thanks to a parade of internal and external threats, corporate and service provider networks are more vulnerable. In major enterprises and service providers, everyone shares a common fear: the network is going to be breached. You don't know how, when or why, but it will happen.
The good news: companies are investing in a slew of new software solutions such as firewalls, IDS, anti-virus and URL filtering to stem security breaches. But an increasingly complex network has emerged, one that, even with all the safety measures, is still one step behind the hackers. Each of these security applications requires deploying multiple servers and load-balancers; with the average network technician able to manage no more than 20 individual devices, it becomes necessary to hire and train additional personnel to implement, operate and support this infrastructure.
At best, you're left with a network that may be, for the moment, fairly safe, but is rife with performance bottlenecks, manageability problems and scalability issues. The dilemma is how to create a security solution that scales, is easy to manage and keeps physical space to a minimum. And five-nines reliability isn't a luxury, it's a necessity. There needs to be self-provisioning, so that if part of the security infrastructure fails, traffic is automatically directed elsewhere, without disrupting the stateful inspection of the flow.
The most commonly used way to achieve a redundant security infrastructure is through "dual-box high availability" (one active, one standby). Here, two or more redundant physical devices, such as routers and servers, provide backup for a master device in the network. This ensures that hosts or clients can maintain connectivity to the network, even when one physical device fails, without user intervention to change the configuration. In this scheme, the Virtual Router Redundancy Protocol (VRRP) allows multiple routers to act as a single virtual router on the network. Thus, if one of the routers fails, connectivity is still maintained.
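The election behind that failover can be sketched in a few lines of Python. In outline, VRRP makes the live router with the highest priority the master and breaks ties in favor of the higher IP address. The Router class and elect_master function below are hypothetical names for illustration only; the sketch ignores the advertisement messages and timers a real implementation relies on to detect that the master has gone away.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import List, Optional


@dataclass
class Router:
    name: str
    priority: int          # higher priority wins the election
    address: IPv4Address   # primary interface address, used as tie-breaker
    alive: bool = True


def elect_master(routers: List[Router]) -> Optional[Router]:
    """Pick the live router with the highest (priority, address) pair."""
    candidates = [r for r in routers if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (r.priority, r.address))


if __name__ == "__main__":
    active = Router("box-a", 200, IPv4Address("192.0.2.1"))
    standby = Router("box-b", 100, IPv4Address("192.0.2.2"))
    group = [active, standby]

    print(elect_master(group).name)  # box-a owns the virtual router
    active.alive = False             # simulate failure of the active box
    print(elect_master(group).name)  # box-b takes over
```

Hosts keep pointing at the same virtual router throughout; only the physical device answering for it changes.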
Running VRRP is a key step for providing redundancy in a network, but it is not an all-encompassing solution. This protocol provides excellent backup for the basic routing of network traffic, but needs complementing solutions when there are several applications running that require preservation of state, particularly security applications.
As packet processing has moved from simple IP routing to content-based routing and stateful inspection, the firewall now examines each packet flowing into the network: not just the header information, but also the contents of the packet up through the application layer. This is done to determine more about the packet than just information about its source and destination.
While this is key to keeping the network secure, it also puts a tremendous strain on the network infrastructure. Under a dual-box scheme, when the device running the firewall fails, the application fails over to a redundant device. When that happens, it is nearly impossible to preserve state for the firewall, as well as for the other applications in the security layer, unless each of these applications has its own synchronization mechanism between the two physical devices.
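Why lost state matters can be shown with a toy model. The sketch below is an illustration under simplifying assumptions, not any vendor's mechanism: a stateful firewall is reduced to a table of established flows, mid-stream packets are accepted only if their flow is in that table, and sync_to is a hypothetical stand-in for an application's own synchronization messages.

```python
from typing import Set, Tuple

# (src_ip, src_port, dst_ip, dst_port); a deliberately simplified flow key
Flow = Tuple[str, int, str, int]


class StatefulFirewall:
    def __init__(self) -> None:
        self.established: Set[Flow] = set()

    def open_flow(self, flow: Flow) -> None:
        """Record a flow that passed policy when the connection was set up."""
        self.established.add(flow)

    def accepts(self, flow: Flow) -> bool:
        """Mid-stream packets pass only if their flow is already known."""
        return flow in self.established

    def sync_to(self, standby: "StatefulFirewall") -> None:
        """Copy the flow table to the standby (stand-in for real sync traffic)."""
        standby.established |= self.established


if __name__ == "__main__":
    active, standby = StatefulFirewall(), StatefulFirewall()
    flow = ("10.0.0.5", 40000, "192.0.2.10", 443)
    active.open_flow(flow)

    print(standby.accepts(flow))  # False: failover now would drop the session
    active.sync_to(standby)
    print(standby.accepts(flow))  # True: the session survives failover
```

Every stateful application in the security layer would need its own equivalent of this table and its own way of copying it to the other box.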
However, this means additional management headaches as well as a flood of network traffic, as each of these applications synchronizes between the two boxes with its own specialized messages. This leads to the clustering concept. Wouldn't it be nice if two physical devices could act as one entity to the external world, from both a traffic and a management point of view, yet somehow share all other internal processing knowledge between them?
"Clustering" allows multiple physical entities that do a particular task to appear as though they were a single physical entity for management purposes. The separate physical network devices act as a single network device that shares one IP address. The concept has gained particular mindshare in storage, where, for example, clustered pools of disk media appear as a single storage device.
For firewalls, dual-box high availability in a clustered arrangement has potential. Stateful firewall processing can be done using a cluster of devices whose members are identically configured and can act independently yet appear to the external network as a collective whole. Among the advantages of this approach is that it provides a powerful redundancy mechanism where failover happens with no interruption to the end user.
It is highly scalable, as new members can simply be added to the cluster or defective members removed without having to bring the services down. Hardware and software upgrades can also be done while the services are in operation. Dual boxes allow two physical devices to be housed in different physical locations, which helps in disaster recovery.
However, there are significant disadvantages. First, most clustering solutions are composed of equipment from a single manufacturer. Industry analysts have shown that security breaches are more likely to happen when a network relies on a security infrastructure from just one vendor. Also, clustering multiple layers of security requires a roomful of servers, switches and load balancers that could overtax even the most robust network management team.
One device does it
What if a single device could provide high availability? A new virtual single-box approach involves a chassis equipped internally with network processing, control processing and application processing modules that can dynamically "self-heal" without compromising network security or backing up network traffic.
Network traffic enters through a network-processing module (NPM), composed of physical and virtual network interfaces, that assures policy-based flow classification to application processing modules. The NPM handles load balancing, redundancy and fault tolerance automatically, greatly reducing the number of network devices and configurations previously required.
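One way to picture policy-based flow classification is a two-step lookup: a policy rule picks the application for a flow, and a hash of the flow's addresses and ports picks one module from that application's pool, so every packet of a flow lands on the same module. The Python sketch below is an assumption for illustration; the article does not specify the algorithm, and the policy table, pool layout and module names are all hypothetical.

```python
import zlib
from typing import Dict, List, Tuple

# src_ip, src_port, dst_ip, dst_port, protocol
FiveTuple = Tuple[str, int, str, int, str]

# Hypothetical policy: destination port -> application that inspects the flow
POLICY: Dict[int, str] = {80: "url_filter", 25: "anti_virus"}
DEFAULT_APPLICATION = "firewall"

# Hypothetical pools of application-processing modules per application
APM_POOLS: Dict[str, List[str]] = {
    "firewall": ["apm1", "apm2"],
    "url_filter": ["apm3"],
    "anti_virus": ["apm4"],
}


def classify(flow: FiveTuple) -> str:
    """Return the module that should handle every packet of this flow."""
    application = POLICY.get(flow[3], DEFAULT_APPLICATION)
    pool = APM_POOLS[application]
    # crc32 is deterministic across runs, unlike Python's built-in hash()
    digest = zlib.crc32(repr(flow).encode("utf-8"))
    return pool[digest % len(pool)]


if __name__ == "__main__":
    web = ("10.0.0.5", 40000, "192.0.2.10", 80, "tcp")
    ssh = ("10.0.0.5", 40001, "192.0.2.10", 22, "tcp")
    print(classify(web))  # always apm3, the only url_filter module
    print(classify(ssh))  # apm1 or apm2, but the same one on every call
```

Keeping a flow pinned to one module is what lets stateful inspection work without the modules exchanging per-packet state.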
Application-processing modules (APMs), which are individual processors, run individual security applications. One could run firewall, another IDS, a third anti-virus, for example. Multiple APMs could be pooled to provide greater processing power, should the network require it, for a single application, and policy management would create a hierarchy of importance among the multiple applications. Ideally, the chassis is equipped with at least one "spare" module.
A control processing module (CPM) would provide system-level management functions and system-level redundancy support. The CPM would interact with the NPMs and APMs to assure optimized load distribution by monitoring resource utilization across the system.
The CPM must be able to synchronize and back up other modules in the box, without compromising the security applications. The CPM monitors the performance of the APMs. Should one of them fail, the CPM would initiate the dynamic reprovisioning of the failed APM's application onto a spare, or onto a lower-priority application's APM.
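The reprovisioning decision can be sketched as follows. This is a minimal illustration of the rule just described (spare first, otherwise the lowest-priority application gives up its module), not an actual product's logic; the APM class, the reprovision function and the priority table are hypothetical, and the sketch says nothing about how application state is moved.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class APM:
    slot: int
    application: Optional[str]  # None marks a spare module
    healthy: bool = True


def reprovision(apms: List[APM], failed_slot: int,
                priority: Dict[str, int]) -> Optional[APM]:
    """Move the failed module's application to a spare, or else preempt the
    lowest-priority application. Returns the module now running it, if any."""
    failed = next(a for a in apms if a.slot == failed_slot)
    failed.healthy = False
    app, failed.application = failed.application, None
    if app is None:
        return None  # a spare failed; there is nothing to move

    healthy = [a for a in apms if a.healthy]
    spares = [a for a in healthy if a.application is None]
    if spares:
        target = spares[0]
    else:
        # Only applications ranked strictly below the failed one may be preempted
        lower = [a for a in healthy if priority[a.application] < priority[app]]
        if not lower:
            return None
        target = min(lower, key=lambda a: priority[a.application])
    target.application = app
    return target


if __name__ == "__main__":
    chassis = [APM(1, "firewall"), APM(2, "ids"), APM(3, "url_filter"), APM(4, None)]
    rank = {"firewall": 3, "ids": 2, "url_filter": 1}  # higher is more important
    print(reprovision(chassis, 1, rank).slot)  # spare in slot 4 takes the firewall
    print(reprovision(chassis, 4, rank).slot)  # no spare left: slot 3 is preempted
```

In the second failure the URL filter is sacrificed so that the firewall keeps running, which is the hierarchy of importance that policy management is meant to express.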
For efficient switchover of traffic to another APM, it is ideal to have all of the modules in one box, as opposed to spread throughout the network, as network congestion could potentially slow down failover between the modules. Ideally, the switchover should happen within seconds, without user intervention and without disrupting or slowing network traffic.
Virtual single-box high availability, if implemented as above, solves the major problems related to application high availability. For users concerned with disaster recovery and the need to locate devices in two different physical locations, two single-box high availability devices can further be employed in a VRRP configuration, connected by dedicated redundant links, to complement the single-box high availability features and provide a full high availability solution for application deployment.



