White paper: Power Processor-based systems RAS

Johannesburg, 06 Dec 2019
Reliability generally refers to the infrequency of system and component failures experienced by a server. Availability, broadly speaking, is how well the hardware, firmware, operating systems and application designs handle failures to minimise application outages. Serviceability generally refers to the ability to install and upgrade systems, firmware and applications efficiently and effectively, as well as to diagnose problems and repair faulty components efficiently when required. These interrelated concepts of reliability, availability and serviceability are often referred to as “RAS”.

Within a server environment, all of RAS, but especially application availability, is an end-to-end proposition. Attention to RAS needs to permeate every aspect of application deployment. However, a good foundation for server reliability, whether in a scale-out or scale-up environment, is clearly beneficial. Systems based on Power processors are generally known for designs that emphasise reliability, availability and serviceability.

Previous versions of an RAS white paper discussed general aspects of hardware reliability and the hardware and firmware aspects of availability and serviceability. The focus of this white paper is POWER9 processor-based systems using the PowerVM hypervisor. Systems not using PowerVM are not discussed specifically in this white paper.

