High Availability Clustering

With Oracle Linux instances running on Oracle Cloud Infrastructure (OCI), you can create high availability (HA) clusters that deliver continuous access to applications and services running across multiple nodes. HA clustering minimizes downtime and keeps services available when system components fail.

You can create HA clusters with OCI instances by installing and using Pacemaker, an open source high availability resource manager, and Corosync, an open source cluster engine. For more information about HA clustering and the Pacemaker and Corosync technologies, see Oracle Linux 9 Setting Up High Availability Clustering and Oracle Linux 8 Setting Up High Availability Clustering.

Prerequisite

Before you begin, configure a shared storage device to be accessible from all nodes that you want in the HA cluster. A shared storage device is needed for cluster service and application messaging, and for cluster SBD fencing. For more information about setting up a shared storage device, see Oracle Linux 9 Managing Shared File Systems and Oracle Linux 8 Managing Shared File Systems.
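Before continuing, it's worth confirming that every node addresses the same physical disk. One approach is to collect a stable identifier for the device on each node (for example, with scsi_id over ssh) and compare the values. The following sketch shows only the comparison; the identifier values and the collection command in the comment are placeholders for your environment:

```shell
# all_match: succeed only if every argument equals the first one.
all_match() {
    local first=$1 id
    for id in "$@"; do
        [ "$id" = "$first" ] || return 1
    done
}

# In practice, you might collect one identifier per node, for example:
#   id=$(ssh node1 sudo /lib/udev/scsi_id --whitelisted --device=/dev/sdc)
# The values below are placeholders standing in for two matching IDs.
if all_match "360000000000000001" "360000000000000001"; then
    echo "shared device looks consistent across nodes"
fi
```

If the identifiers differ, the nodes aren't attached to the same shared device and fencing won't work as expected.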

Setting Up High Availability Clustering With OCI Instances

To set up high availability clustering with OCI instances:

  1. Install the Pacemaker software
  2. Create an HA cluster
  3. Configure fencing

For a tutorial on how to set up a highly available Network File System (NFS) service for OCI instances, see Create a Highly Available NFS Service with Gluster and Oracle Linux.

Installing Pacemaker

To create a high availability (HA) cluster with Oracle Cloud Infrastructure (OCI) instances, you must first install the Pacemaker and Corosync packages on each instance, or node, that you want in the cluster. You can then configure each cluster node, and ensure that the Pacemaker service automatically starts and runs on each node at boot time.

Note

The terms OCI instance, node, and cluster node are used interchangeably in the context of HA clustering for OCI.

Best Practice

For each OCI instance that you want in the cluster, open a terminal window and connect to the instance.

For example, if you want two OCI instances to be nodes in the cluster, open two terminal windows, and connect to each instance using ssh:
ssh instance-IP-address

Having a terminal window open for each node prevents the need to repeatedly log in and out of the nodes when configuring the HA cluster.
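As an alternative to juggling several terminals, you can run the same command on every node in a loop. This sketch assumes passwordless ssh access; the hostnames are placeholders, and setting RUN=echo previews the commands locally without connecting:

```shell
# for_each_node CMD...: run the same command once per node over ssh.
# The node names below are placeholders for your instances' hostnames.
nodes=(node1 node2)

# RUN defaults to ssh; set RUN=echo to preview commands without connecting.
RUN=${RUN:-ssh}

for_each_node() {
    local n
    for n in "${nodes[@]}"; do
        "$RUN" "$n" "$@"
    done
}

# Dry-run example: print what would be executed on each node.
RUN=echo
for_each_node sudo systemctl status pcsd
```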

Installing Pacemaker and Corosync

To install the Pacemaker and Corosync packages and configure the HA cluster nodes:

  1. Complete the prerequisite in High Availability Clustering.
  2. On each node, enable the Oracle Linux yum server repository that contains the Pacemaker and Corosync packages.

    Oracle Linux 9:

    sudo dnf config-manager --enable ol9_addons

    Oracle Linux 8:

    sudo dnf config-manager --enable ol8_addons
  3. On each node, install the pcs command shell, the Pacemaker software packages, the resource agents, and the SBD fence agent:

    sudo dnf install pcs pacemaker resource-agents fence-agents-sbd
  4. On each node, configure the firewall so that the cluster components can communicate across the network, then reload the firewall rules to apply the permanent change:

    sudo firewall-cmd --permanent --add-service=high-availability
    sudo firewall-cmd --reload
  5. On each node, set a password for the hacluster user:

    sudo passwd hacluster
  6. On each node, enable the pcsd service so that it starts immediately and at every boot:

    sudo systemctl enable --now pcsd.service
  7. Create an HA cluster using the nodes you have configured. See Creating an HA Cluster.
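After completing the steps above on each node, a quick sanity check can confirm that the services and firewall rule are in place before you move on to cluster creation. This is a sketch, not part of the official procedure; the check helper simply reports PASS or FAIL for each command:

```shell
# check NAME CMD...: run CMD quietly and report PASS or FAIL.
fails=0
check() {
    local name=$1; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS: $name"
    else
        echo "FAIL: $name"
        fails=$((fails + 1))
    fi
}

# These checks assume a node configured per steps 2-6 above.
# sudo -n fails instead of prompting when run non-interactively.
check "pcsd enabled at boot" systemctl is-enabled pcsd.service
check "pcsd running"         systemctl is-active pcsd.service
check "firewall HA service"  sudo -n firewall-cmd --query-service=high-availability

echo "$fails check(s) failed"
```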

Creating an HA Cluster

With the Pacemaker and Corosync software, you can create a high availability (HA) cluster with Linux instances running on Oracle Cloud Infrastructure (OCI).

To create an HA cluster:

  1. Install the Pacemaker and Corosync software packages on each node you want in the cluster. See Installing Pacemaker.
  2. From one of the nodes, authenticate the pcs cluster configuration tool for the hacluster user of each cluster node.

    For example, if you want two nodes to make up the HA cluster, run the following command from one of the cluster nodes:

    sudo pcs host auth node1-resolvable-hostname node2-resolvable-hostname -u hacluster
  3. When prompted, enter the password that you set for the hacluster user on each node in Installing Pacemaker and Corosync.
  4. Create the HA cluster by using the pcs cluster setup command, and specifying the following:

    • Name of the cluster
    • Resolvable host name and IP address of each node that you want in the cluster

    For example, to create an HA cluster with two nodes:

    sudo pcs cluster setup cluster-name node1-resolvable-hostname addr=node1-IP-address node2-resolvable-hostname addr=node2-IP-address
  5. From one of the nodes, start the cluster on all nodes:

    sudo pcs cluster start --all
  6. Configure SBD fencing for the newly created HA cluster. See Configuring Fencing.
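Once the cluster is up, sudo pcs status reports which nodes are online. As a sketch, the Online line of that output can be extracted with sed; the function below is a hypothetical helper, demonstrated on a sample line rather than live cluster output:

```shell
# online_nodes: pull the node names out of the "Online: [ ... ]" line
# that appears in `pcs status` output.
online_nodes() {
    sed -n 's/.*Online: \[ \(.*\) \]/\1/p'
}

# On a live cluster you would pipe real output:
#   sudo pcs status | online_nodes
# A sample line in the same format demonstrates the parsing:
echo "  * Online: [ node1 node2 ]" | online_nodes
```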

Configuring Fencing

STONITH Block Device (SBD) fencing works with the Pacemaker software to protect data when a node in a high availability (HA) cluster becomes unresponsive. Fencing prevents an unresponsive node from accessing shared data until the Pacemaker software takes that node offline.

SBD fencing configuration is the last step in completing the setup of an HA cluster with OCI instances. For information about creating an HA cluster, see Creating an HA Cluster.

Note

To create HA clusters with OCI instances, you must use only the SBD cluster fencing mechanism. Other cluster fencing mechanisms aren't currently supported in this environment.

Configuring SBD Fencing for an HA Cluster

To configure SBD fencing for an HA cluster:

  1. From one of the cluster nodes, enable stonith (Shoot The Other Node In The Head), a fencing technique that's used as part of the SBD fencing strategy.

    sudo pcs property set stonith-enabled=true
  2. From one of the nodes, stop the cluster:

    sudo pcs cluster stop --all
  3. On each node, install the SBD daemon package:

    sudo dnf install sbd
  4. On each node, enable the sbd systemd service:

    sudo systemctl enable sbd
    Note

    When enabled, the sbd systemd service starts and stops automatically as a dependency of the Pacemaker service, so you don't run it independently and can't start or stop it manually. If you try, the service state remains unchanged and an error message indicates that sbd is a dependent service.
  5. On each node, edit the /etc/sysconfig/sbd file and set the SBD_DEVICE parameter to identify the shared storage device. For information about shared storage devices, see Oracle Linux 9 Managing Shared File Systems and Oracle Linux 8 Managing Shared File Systems.

    For example, if the shared storage device is available on /dev/sdc, ensure the /etc/sysconfig/sbd file on each node contains the following line:

    SBD_DEVICE="/dev/sdc"
  6. Continue to edit the /etc/sysconfig/sbd file on each node by setting the watchdog device to /dev/null:

    SBD_WATCHDOG_DEV=/dev/null
  7. From one of the nodes, create the SBD messaging layout on the shared storage device, and confirm that it's in place.

    For example, to set up and verify messaging on the shared storage device at /dev/sdc:

    sudo sbd -d /dev/sdc create 
    sudo sbd -d /dev/sdc list
  8. From one of the nodes, start the cluster and configure the fence_sbd fencing agent for the shared storage device.

    For example, to start the cluster and configure the shared storage device at /dev/sdc:

    sudo pcs cluster start --all 
    sudo pcs stonith create sbd_fencing fence_sbd devices=/dev/sdc
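With fencing in place, you can verify the configuration with sudo pcs stonith status and inspect the SBD slots with sudo sbd -d /dev/sdc list, which prints one line per node in the form slot, node name, state. As a hedged sketch, a small awk filter can summarize those lines into node:state pairs; sample input in that format stands in for live output:

```shell
# slot_states: condense `sbd -d DEVICE list` output into node:state pairs.
slot_states() {
    awk '{ print $2 ":" $3 }'
}

# On a live cluster you would run:
#   sudo sbd -d /dev/sdc list | slot_states
# Sample lines in the same format demonstrate the summary:
printf '0\tnode1\tclear\n1\tnode2\tclear\n' | slot_states
```

Each node should report clear; any other state suggests a pending or delivered fencing message for that node.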