FailSafe

This topic gives an introductory description of Reality FailSafe. It describes the purpose and operation of logging in a FailSafe pair, then extends the description to configurations with multiple databases.

Overview

What is FailSafe?

FailSafe is a software resilience facility that maintains two identical databases on separate systems so that, if one database fails because of a system or application failure, the other database can be used to maintain service to users.

The FailSafe Configuration

The two identical databases in a FailSafe configuration are designated the primary and the secondary. The primary database, located on one system, is the live database to which users log on. The secondary database, located on a second system, is maintained as a duplicate of the primary and operates as a standby. It is used only if the primary database becomes unavailable.

The secondary database is closed to all users except the database owner, system super-user (root), or administrator. Even these users should exercise extreme caution when accessing the secondary, because update operations on the secondary will lead to loss of synchronisation between the two databases.

FailSafe Logging

FailSafe resilience builds on the Transaction Logging concepts discussed in Introduction to Transaction Logging. Changes to the active primary are recorded in an associated clean log file on the primary system. The same changes are transmitted across a dedicated FailSafe LAN to the other system in the FailSafe configuration, where they are applied to the secondary database, maintaining it as a real-time duplicate of the primary, and recorded in a second clean log associated with the secondary database. See Logging Path of a FailSafe Pair.

Before and After images for all primary database updates are written to the raw log on the local system containing the primary database and, via the FailSafe LAN, to the raw log on the remote system containing the secondary database. Refer to the topic The Raw Log for a description of the raw log.

Committed transactions and independent updates in the local and remote raw logs are then written to the clean logs for the primary and secondary databases, respectively. Transactions and independent updates logged to the secondary clean log are also applied to the secondary database so that it shadows the primary.
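The logging path described above can be pictured with a small in-memory model. This is only an illustrative sketch, not Reality's implementation: the System class, the log_update and commit functions, and the dictionary-based logs are all hypothetical names chosen for the example, and the model assumes the primary is updated in place as each image is logged.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """One half of a FailSafe pair (hypothetical in-memory model)."""
    name: str
    database: dict = field(default_factory=dict)
    raw_log: list = field(default_factory=list)    # before and after images
    clean_log: list = field(default_factory=list)  # committed updates only


def log_update(primary, secondary, txn_id, key, after):
    """Write the before/after image to both raw logs and update the primary."""
    image = {"txn": txn_id, "key": key,
             "before": primary.database.get(key), "after": after}
    primary.raw_log.append(image)          # raw log on the local system
    secondary.raw_log.append(dict(image))  # copy sent over the FailSafe LAN
    primary.database[key] = after          # assumption: primary updated in place


def commit(primary, secondary, txn_id):
    """Move a committed transaction from each raw log to its clean log."""
    for system in (primary, secondary):
        committed = [i for i in system.raw_log if i["txn"] == txn_id]
        system.clean_log.extend(committed)
    # The secondary applies what it has clean-logged, shadowing the primary.
    for image in secondary.clean_log:
        if image["txn"] == txn_id:
            secondary.database[image["key"]] = image["after"]
```

In this model the secondary database changes only when commit is called, which mirrors the statement that only committed transactions and independent updates reach the clean logs.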

Logging Path of a FailSafe Pair

Database Recovery

If a primary database becomes unavailable, for example because of a system crash, the secondary database can be converted to the primary without loss of transaction integrity and with minimal loss of data and service to users. The transfer of users to the secondary is a manual operation.

If a secondary database becomes unavailable, the primary continues unaffected as a stand-alone database.

The failed database, whether primary or secondary, can be recovered by restoring the most recent backup of the database and clean log(s). The restored database can then be re-introduced as a secondary and synchronised with the primary without affecting the users. FailSafe operation is then resumed. In the case of primary failure, primary/secondary roles will be reversed after recovery.
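The role changes described in this section can be summarised as a small state model. The sketch below is illustrative only: the Role enumeration and the handle_failure and reintroduce functions are hypothetical names, and the promotion step stands in for what is a manual operation on a real system.

```python
from enum import Enum


class Role(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    FAILED = "failed"


def handle_failure(roles, failed):
    """Return the roles after the database on system `failed` is lost.

    `roles` maps exactly two system names to their current Role.
    """
    survivor = next(name for name in roles if name != failed)
    new_roles = dict(roles)
    new_roles[failed] = Role.FAILED
    # If the primary failed, the secondary is converted to primary (a manual
    # step in practice); if the secondary failed, the primary simply continues.
    new_roles[survivor] = Role.PRIMARY
    return new_roles


def reintroduce(roles, restored):
    """Bring a restored database back into the pair as the secondary."""
    if roles[restored] is not Role.FAILED:
        raise ValueError(f"{restored} has not failed")
    new_roles = dict(roles)
    new_roles[restored] = Role.SECONDARY
    return new_roles
```

Starting from a primary on one system and a secondary on another, a primary failure followed by reintroduce leaves the roles reversed, whereas a secondary failure followed by reintroduce returns the pair to its original roles.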

FailSafe Logging Link

The FailSafe Logging link between the two FailSafe systems is normally via a dedicated Local Area Network (LAN). However, if the dedicated LAN fails, the Transaction Logging link can be re-routed temporarily via the user LAN. The tlmenu administration utility provides the facility to define the preferred route (dedicated LAN) and fall-back route (user LAN) with different communications protocols.
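The preferred and fall-back routing can be thought of as a simple selection rule. The following sketch is an assumption-laden illustration and does not reflect how tlmenu stores or applies its settings: the Route class, the choose_route function, and the protocol labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    """A possible path for the logging link (hypothetical model)."""
    name: str
    protocol: str          # placeholder label, not a real protocol name
    available: bool = True


def choose_route(preferred, fallback):
    """Use the preferred route when it is up, otherwise the fall-back route."""
    if preferred.available:
        return preferred
    if fallback.available:
        return fallback
    raise ConnectionError("no route available for the logging link")
```

For example, with a dedicated LAN marked unavailable and a user LAN available, choose_route returns the user LAN route; once the dedicated LAN is available again, the same call returns the dedicated LAN.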

FailSafe with Multiple Databases

Reality supports multiple databases on one system. It is not necessary for all databases on a system to operate in FailSafe mode: some may operate in FailSafe mode, while others may be stand-alone unresilient databases, with or without Transaction Logging. There are no technical limitations on where the primary and secondary databases in a FailSafe pair are located, except that each half of a pair must be on a different system so that one database remains in service if a system fails. Otherwise, the purpose of FailSafe operation is defeated. Unrelated primary and secondary databases may be located on the same system.
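The placement rule for FailSafe pairs can be expressed as a simple check. This is a minimal sketch under stated assumptions: the misplaced_pairs function, the system names, and the mapping layout are hypothetical and are not part of any Reality utility.

```python
def misplaced_pairs(pairs):
    """Return the names of pairs whose two halves share a system.

    `pairs` maps a database name to a (primary_system, secondary_system)
    tuple. Stand-alone databases are simply left out of the mapping.
    """
    return [name for name, (primary, secondary) in pairs.items()
            if primary == secondary]


# Two pairs running in opposite directions between two systems: the primary
# of A shares a system with the secondary of B, which is permitted.
layout = {
    "A": ("system1", "system2"),
    "B": ("system2", "system1"),
}
```

Here misplaced_pairs(layout) returns an empty list, while a pair defined with both halves on the same system would be reported by name.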

FailSafe Operation with Multiple Databases illustrates FailSafe operation in a multiple database configuration, showing two FailSafe pairs (Databases A and B) and an unresilient database not using transaction boundaries (Database C). Note that FailSafe operation can take place in both directions across the machine-to-machine link. Updates from local and remote primaries are stored in the same raw log.

FailSafe Operation with Multiple Databases