Server Availability and the Data Centre Tier Classifications

Availability of a server room or datacentre is a measure of its designed operational performance: the percentage of time the facility is operational. This percentage is often expressed using the rule of nines, as shown in Table 1. Availability design considers the server room's critical systems, including, but not limited to, cooling and power delivery from the mains supply and UPSs.

The redundancy of these critical systems also needs consideration. While availability is key for larger-scale datacentres, it is not necessarily as critical for a small datacentre. Still, the same planning and design principles are worth adopting. Whatever the datacentre's size, the core critical systems should have a considered maintenance schedule in place to maximise availability.


Availability and the Nine Classifications


Calculating percentage availability (A) for the datacentre design relies on two familiar measurements: Mean Time Between Failures (MTBF) and Mean Time To Repair (MTTR).

This relationship is stated in the following formula:

A% = (1 - (MTTR / MTBF)) x 100

This percentage can then be expressed as downtime per day, week, month and year.
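As a minimal sketch of the calculation, the Python below applies the formula above and converts the result into downtime per period. The MTBF and MTTR values are illustrative assumptions rather than figures from any real installation, and the function names are hypothetical.

```python
# Availability from MTBF and MTTR, using A% = (1 - MTTR/MTBF) x 100.
# The MTBF and MTTR figures used below are illustrative assumptions.

HOURS_PER_PERIOD = {
    "day": 24.0,
    "week": 24.0 * 7,
    "month": 24.0 * 365 / 12,  # average month, assuming a 365-day year
    "year": 24.0 * 365,
}


def availability_percent(mtbf_hours: float, mttr_hours: float) -> float:
    """Return percentage availability per the formula in the text."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return (1.0 - mttr_hours / mtbf_hours) * 100.0


def downtime_hours(availability_pct: float) -> dict[str, float]:
    """Return the expected downtime, in hours, for each period."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return {
        period: hours * unavailable_fraction
        for period, hours in HOURS_PER_PERIOD.items()
    }


if __name__ == "__main__":
    # Assumed example: one failure every 10,000 hours, 10 hours to repair.
    a = availability_percent(mtbf_hours=10_000, mttr_hours=10)
    print(f"Availability: {a:.3f}%")
    for period, hours in downtime_hours(a).items():
        print(f"Downtime per {period}: {hours * 60:.2f} minutes")
```

With these assumed inputs the availability works out at 99.9%, so the same function can be used to explore how a shorter repair time or a longer interval between failures moves the result.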


Table 1



Although 99.9% looks like an achievable availability, it could still mean 1.44 minutes of downtime per day, which may well be a risk too far in a critical server room or datacentre. A way will need to be found to increase this to 99.99% or higher.

Even at 99.9999999% (nine nines), there is still a risk of roughly 86.4 microseconds of downtime per day. Although seemingly small, this may be an issue in certain systems!
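The per-day figures for each number of nines can be checked with a few lines of arithmetic. The sketch below is one possible way to do it, using Python's decimal module so that very high availabilities are not distorted by floating-point rounding. The function names are hypothetical.

```python
# Downtime per day implied by availabilities of two to nine nines.
# Decimal arithmetic keeps the very small fractions exact.
from decimal import Decimal

SECONDS_PER_DAY = Decimal(86_400)


def availability_for_nines(nines: int) -> Decimal:
    """Return the availability percentage with that many nines, e.g. 3 -> 99.9."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    return Decimal(100) - Decimal(100) / (Decimal(10) ** nines)


def downtime_seconds_per_day(availability_pct: Decimal) -> Decimal:
    """Return the downtime per day, in seconds, for an availability percentage."""
    return SECONDS_PER_DAY * (Decimal(100) - availability_pct) / Decimal(100)


if __name__ == "__main__":
    for n in range(2, 10):
        pct = availability_for_nines(n)
        secs = downtime_seconds_per_day(pct)
        print(f"{pct}% -> {secs} seconds per day")
```

For 99.9% this gives 86.4 seconds, which matches the 1.44 minutes per day quoted above.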


Tier Classification Levels


Tier classifications for datacentres were developed by the Uptime Institute more than 25 years ago. 

The Uptime Institute is a globally recognised player in the datacentre industry. It created and now administers the Tier standards, and certifies designs and operations against the Tier levels. Certification ensures that datacentre clients have a strategy for addressing wide-ranging challenges so that downtime is minimised.
 

The Four Datacentre Tier levels

 

Tier 1 


A Tier 1 datacentre is the basic level adopted by computer rooms and many server room operators. Tier 1 is designed to support IT for office environments through a single pathway with no equipment redundancy. Although it protects against disruptions caused by human error, it does not protect against unexpected failures.
Tier 1 guarantees two-nines at 99.671%, allowing no more than 28.8 hours of downtime per year. This is the minimum level for Uptime Institute certification, with no redundancy in power or cooling equipment. The system will have to be shut down for preventative maintenance and repairs. Failure to carry out maintenance regimes will increase the risk of unplanned and sometimes catastrophic system failures.


Tier 1 Datacentre Requirements

 
  • An Uninterruptible Power Supply (UPS) to protect against power black-outs, brown-outs and surges
  • A dedicated cooling system that runs 24/7
  • A back-up power generator for power black-outs
 

Tier 2 


A Tier 2 datacentre has improved availability over Tier 1. This is achieved with redundant capacity within the critical power and cooling systems, giving a greater safety margin against disruptions. Like Tier 1, Tier 2 is not fault-tolerant, so if there is an unexpected shutdown, availability will suffer.
Tier 2 also guarantees two-nines, but at 99.741%, which allows no more than 22 hours of downtime per year. Maintenance is easier than in Tier 1 thanks to partial redundancy, but the facility still has to be shut down for preventative maintenance and repairs. As before, failure to carry out maintenance regimes will increase the risk of unplanned and sometimes catastrophic system failures.


Tier 2 Datacentre Requirements and Redundant Equipment


Requirements as per Tier 1. The components below have N+1 redundancy and can be removed for maintenance without shutting the system down.
  • N+1 redundancy (see Table 2)
  • UPS systems
  • Power generators
  • Energy storage
  • Chillers
  • Cooling units
  • Heat rejection equipment
  • Pumps
  • Fuel tanks and fuel cells
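To illustrate the N+1 idea for components such as UPS modules or cooling units, here is a minimal sizing sketch. It assumes the usual meaning of N+1: N units are needed to carry the load, and one extra unit is installed as a spare. The load and unit ratings are hypothetical figures, as are the function names.

```python
# N+1 sizing sketch: N units carry the load, plus one spare.
# The load and unit ratings below are hypothetical, for illustration only.
import math


def units_required(load_kw: float, unit_capacity_kw: float) -> int:
    """Return N, the minimum number of units needed to carry the load."""
    if load_kw <= 0 or unit_capacity_kw <= 0:
        raise ValueError("load and unit capacity must be positive")
    return math.ceil(load_kw / unit_capacity_kw)


def survives_unit_removal(load_kw: float, unit_capacity_kw: float,
                          installed_units: int) -> bool:
    """Return True if the load is still covered with one unit out of service."""
    return (installed_units - 1) * unit_capacity_kw >= load_kw


if __name__ == "__main__":
    load_kw = 45.0           # assumed critical load
    unit_capacity_kw = 20.0  # assumed rating of each unit
    n = units_required(load_kw, unit_capacity_kw)
    for label, installed in (("N", n), ("N+1", n + 1)):
        ok = survives_unit_removal(load_kw, unit_capacity_kw, installed)
        print(f"{label}: {installed} units, one can be removed: {ok}")
```

With only N units installed, taking one out for maintenance leaves the load uncovered. With N+1, one unit can be removed and the remaining units still carry the full load.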
 

Tier 3 


Tier 3 is the typical level for datacentres that provide public and private internet cloud services. A Tier 3 datacentre is concurrently maintainable with multiple redundant distribution pathways and components. Like Tiers 1 & 2, Tier 3 is not fault-tolerant, so if there is an unexpected shutdown, availability will suffer.                                                                        
Tier 3 guarantees three-nines at 99.982%, allowing no more than 1.6 hours of downtime per year. Tier 3 facilities do not need to be shut down for maintenance or equipment replacement. Again, failure to carry out maintenance regimes will increase the risk of system failures.  


Tier 3 Datacentre Requirements 


Requirements as per Tiers 1 & 2
  • Two delivery pathways
  • A minimum of 72 hours of back-up power from UPSs and standby generators exclusive to the Tier 3 system (i.e. no external power sources)

Tier 4


A Tier 4 datacentre differentiates itself from a Tier 3 datacentre by having multiple independent and physically isolated systems providing redundant equipment capacity and distribution pathways. The separate capacity prevents the system from being compromised when subject to planned or even unplanned disruption.
Tier 4 facilities have added fault tolerance to the critical power and cooling systems, ensuring continuous operation and stable environmental conditions. No single outage or error can disrupt the system.
Tier 4 guarantees four-nines at 99.995%, allowing no more than 26.3 minutes of downtime per year – with maintenance and emergency work not affecting service delivery.

When the redundant equipment or distribution pathways are shut down for maintenance, there will be a higher risk of disruption if a failure occurs in the active paths. Once again, failure to carry out maintenance regimes will increase the risk of system failures.  
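The availability figures quoted for the four tiers can be turned into a simple selection aid. The sketch below converts each percentage into annual downtime and returns the lowest tier that fits a given downtime budget. It assumes a 365-day year, and the function names are hypothetical.

```python
# Tier availability percentages as quoted in the tier descriptions above.
from typing import Optional

TIER_AVAILABILITY_PCT = {1: 99.671, 2: 99.741, 3: 99.982, 4: 99.995}
HOURS_PER_YEAR = 24 * 365  # assumes a 365-day year


def annual_downtime_hours(availability_pct: float) -> float:
    """Return the annual downtime, in hours, implied by an availability."""
    return HOURS_PER_YEAR * (100.0 - availability_pct) / 100.0


def lowest_tier_meeting(max_downtime_hours: float) -> Optional[int]:
    """Return the lowest tier within the downtime budget, or None if none fits."""
    for tier in sorted(TIER_AVAILABILITY_PCT):
        if annual_downtime_hours(TIER_AVAILABILITY_PCT[tier]) <= max_downtime_hours:
            return tier
    return None


if __name__ == "__main__":
    for tier, pct in TIER_AVAILABILITY_PCT.items():
        print(f"Tier {tier}: {pct}% -> {annual_downtime_hours(pct):.2f} hours per year")
    print("Budget of 2 hours per year -> Tier", lowest_tier_meeting(2.0))
```

This only compares the headline percentages. It says nothing about the redundancy, pathway or back-up power requirements that each tier also imposes.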


Tier 4 Datacentre Additional Requirements

 
  • Multiple delivery pathways
  • All equipment must have a fault-tolerant power design
  • Zero failure points for all processes and data pathways
  • A minimum of 96 hours of back-up power from UPSs and standby generators exclusive to the Tier 4 system (i.e. no external power sources)
Table 2

Summary


Smaller computer and server rooms may not have the need or, indeed, the budget for a Tier 3 or Tier 4 datacentre. However, it is good practice to consider availability and risk mitigation and to incorporate some, if not all, of the design principles of Tier 1 and Tier 2 facilities. A low-cost monitoring system can also be added to track environmental factors that could cause system failures and to highlight possible design issues, helping to keep the IT system resilient.

This, along with a regular preventative maintenance regime, will help keep availability in line with the organisation's acceptable downtime levels. The basic requirements would be:
  • A UPS system with a bypass to allow for maintenance. If the budget allows, a modular system with hot-swap modules is even better, as it will provide redundancy
  • Two load-sharing air conditioners for cooling redundancy
  • Rack and room environmental monitoring for temperature and humidity
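As a rough illustration of what rack and room environmental monitoring involves, the sketch below checks temperature and humidity readings against alert thresholds. The threshold values, sensor names and data structure are all assumptions made for this example. Real limits should come from the equipment manufacturer's guidance.

```python
# Threshold check for temperature and humidity readings.
# The ranges below are assumed example values, not figures from this article.
from dataclasses import dataclass

TEMP_RANGE_C = (18.0, 27.0)        # assumed acceptable temperature range
HUMIDITY_RANGE_PCT = (20.0, 80.0)  # assumed acceptable relative humidity range


@dataclass
class Reading:
    sensor: str
    temperature_c: float
    humidity_pct: float


def check_reading(reading: Reading) -> list[str]:
    """Return alert messages; an empty list means all values are in range."""
    alerts = []
    low_t, high_t = TEMP_RANGE_C
    if not low_t <= reading.temperature_c <= high_t:
        alerts.append(
            f"{reading.sensor}: temperature {reading.temperature_c:.1f} C "
            f"outside {low_t}-{high_t} C"
        )
    low_h, high_h = HUMIDITY_RANGE_PCT
    if not low_h <= reading.humidity_pct <= high_h:
        alerts.append(
            f"{reading.sensor}: humidity {reading.humidity_pct:.0f}% "
            f"outside {low_h}-{high_h}%"
        )
    return alerts


if __name__ == "__main__":
    # Hypothetical readings from a rack sensor and a room sensor.
    readings = [Reading("rack-1", 22.0, 45.0), Reading("room", 31.5, 45.0)]
    for reading in readings:
        for alert in check_reading(reading):
            print("ALERT:", alert)
```

In practice the readings would come from the monitoring hardware, and the alerts would be sent to whoever is responsible for the room rather than printed.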

Finally, some form of fire suppression system should be considered. Although a full system may not be feasible budget-wise for a small installation, in-rack systems are worth considering, as they could mean the difference between a minor disaster and catastrophic destruction.