Infrastructure Maintenance

Oracle Cloud Infrastructure performs routine data center maintenance on the physical infrastructure for compute instances. This maintenance includes tasks such as upgrading and replacing hardware or performing maintenance that halts power to the host. This topic provides details about infrastructure maintenance, migration options, and status metrics that you can use to monitor infrastructure maintenance.

You can use compute infrastructure health metrics to monitor the status of your instances during maintenance.

Recovering an Instance During Planned Maintenance

When the underlying infrastructure for an instance needs planned maintenance, Oracle Cloud Infrastructure automatically attempts to recover the instance whenever possible. The maintenance action depends on the type of instance.

  • Virtual machine (VM) instances: When possible, the instance is live migrated to a healthy physical host. If live migration isn't possible, the instance is reboot migrated or rebuilt in place, depending on the shape.
  • Bare metal instances: When possible, the instance is reboot migrated to a healthy physical host. If reboot migration isn't possible, you must manually migrate the instance.
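The fallback order described above can be sketched as a small decision function. This is only an illustration of the documented behavior, not an Oracle Cloud Infrastructure API: the Action enum, the function name, and all of the boolean parameters are hypothetical. In particular, the source says only that the fallback for VMs depends on the shape, so that detail is passed in as a flag.

```python
from enum import Enum


class Action(Enum):
    LIVE_MIGRATION = "live migration"
    REBOOT_MIGRATION = "reboot migration"
    REBUILD_IN_PLACE = "rebuild in place"
    MANUAL_MIGRATION = "manual migration"


def planned_maintenance_action(is_bare_metal, can_live_migrate=False,
                               can_reboot_migrate=False,
                               shape_rebuilds_in_place=False):
    """Return the maintenance action for an instance (hypothetical helper)."""
    if is_bare_metal:
        # Bare metal: reboot migration when possible, otherwise the
        # customer must migrate the instance manually.
        if can_reboot_migrate:
            return Action.REBOOT_MIGRATION
        return Action.MANUAL_MIGRATION
    # VM: live migration is attempted first.
    if can_live_migrate:
        return Action.LIVE_MIGRATION
    # VM fallback depends on the shape (assumed to be known by the caller).
    if shape_rebuilds_in_place:
        return Action.REBUILD_IN_PLACE
    return Action.REBOOT_MIGRATION
```

For example, planned_maintenance_action(is_bare_metal=True) returns Action.MANUAL_MIGRATION, because neither migration flag is set.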

Planned Maintenance for VM Instances

When an infrastructure maintenance event affects VM instances, Oracle Cloud Infrastructure live migrates supported VM instances from the physical VM host that needs maintenance to a healthy VM host with minimal disruption to running instances.

If a VM instance cannot be live migrated or doesn't support live migration, Oracle Cloud Infrastructure schedules a maintenance due date within 14 to 16 days and sends you a notification describing the type of maintenance action that is required, such as reboot migration. A live migration might not succeed if any of the following events happen during the migration: there is too much activity on the instance, a change to the instance is made using the API, or an internal error unrelated to the instance occurs.

If a VM instance is scheduled for maintenance, you can proactively reboot (or stop and start) the instance at any time before the scheduled maintenance due date. Proactively rebooting lets you control how and when your applications experience downtime. If you do not proactively reboot the instance before the due date, the instance is either reboot migrated or rebuilt in place for you, depending on the shape.

Customer-managed maintenance for VM instances is supported on standard and dense I/O instance shapes, including platform images and custom images that were imported from outside of Oracle Cloud Infrastructure.

For standard shapes, you can extend the maintenance due date.

If you choose not to reboot before the scheduled time, Oracle Cloud Infrastructure migrates or rebuilds the instance. After a migration, by default the instance is recovered to the same lifecycle state as before the maintenance event. If you have an alternate process to recover the instance, you can optionally configure the instance to remain stopped after it is reboot migrated to healthy hardware.

Planned Maintenance for Bare Metal Instances

When an infrastructure maintenance event affects bare metal instances, Oracle Cloud Infrastructure reboot migrates supported bare metal instances from the physical host that needs maintenance to a healthy host. Oracle Cloud Infrastructure schedules a maintenance due date within 14 to 16 days and sends you a notification describing the type of maintenance action that is required, such as reboot migration. Within 24 hours after the maintenance due date, the bare metal instance is stopped, migrated to a healthy host, and restarted. A short downtime occurs during the migration.

If a bare metal instance is scheduled for maintenance, you can proactively reboot the instance at any time before the scheduled maintenance due date. Proactively rebooting lets you control how and when your applications experience downtime. If you do not proactively reboot the instance before the due date, the instance is reboot migrated for you.
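As a rough sketch of the timing above, the following helpers compute the 24-hour window after the due date in which the automatic reboot migration takes place, and check whether a planned reboot counts as proactive. The function names are hypothetical, and the due date is assumed to be available as a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def automatic_migration_window(due):
    """Return (start, end) of the 24 hours that follow the due date."""
    return due, due + timedelta(hours=24)


def is_proactive(reboot_at, due):
    """A reboot is proactive only if it happens before the due date."""
    return reboot_at < due


# Example due date (made-up value for illustration).
due = datetime(2024, 5, 20, 10, 0, tzinfo=timezone.utc)
start, end = automatic_migration_window(due)
print("Automatic migration between", start.isoformat(), "and", end.isoformat())
```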

Reboot migration for bare metal instances is supported on standard instance shapes that use Linux-based platform images. It is not supported for instances that use Windows or custom images, for shielded instances, for instances that have secondary VNICs created and configured on the physical NIC with index 1, or for instances that don't use the standard sanboot command in the iPXE script.
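The support conditions above can be written as a checklist. The sketch below is illustrative only: the BareMetalInstance class and its field names are hypothetical and do not correspond to fields in the Oracle Cloud Infrastructure API, so you would need to populate them from your own inventory.

```python
from dataclasses import dataclass


@dataclass
class BareMetalInstance:
    # All fields are hypothetical and filled in by the caller.
    standard_shape: bool
    linux_platform_image: bool   # False for Windows or custom images
    shielded: bool = False
    secondary_vnic_on_nic1: bool = False
    standard_sanboot: bool = True


def reboot_migration_blockers(inst):
    """List the reasons why reboot migration is not supported."""
    blockers = []
    if not inst.standard_shape:
        blockers.append("shape is not a standard shape")
    if not inst.linux_platform_image:
        blockers.append("image is not a Linux-based platform image")
    if inst.shielded:
        blockers.append("instance is a shielded instance")
    if inst.secondary_vnic_on_nic1:
        blockers.append("secondary VNIC is configured on physical NIC 1")
    if not inst.standard_sanboot:
        blockers.append("iPXE script does not use the standard sanboot command")
    return blockers
```

An empty list means that none of the listed restrictions applies. Any other result lists the reasons why the instance is expected to need manual migration instead.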

For standard shapes, you can extend the maintenance due date.

If you choose not to reboot before the scheduled time, then Oracle Cloud Infrastructure migrates or rebuilds the instance. After a migration, by default the instance is recovered to the same lifecycle state as before the maintenance event. If you have an alternate process to recover the instance, then you can optionally configure the instance to remain stopped after it is reboot migrated to healthy hardware.

Identifying Instances with Planned Maintenance

If an instance supports the maintenance actions of live migration, reboot migration, or rebuild in place, a date in the Maintenance reboot field for the instance (available in the Console, CLI, and SDKs) indicates that planned maintenance is scheduled. For instances that only support manual migration, Oracle Cloud Infrastructure sends you a notification, but no date is displayed in the Maintenance reboot field.

To identify the instances that are scheduled for maintenance, do any of the following:

Using the Console: To see which instances in the current compartment are scheduled for maintenance
  1. Open the navigation menu and click Compute. Under Compute, click Instances.

    If the instance has maintenance scheduled and can be proactively rebooted, a warning icon appears next to the instance name.

  2. Click the instance that you're interested in, and then check the Maintenance reboot field for the instance. This field displays the date and start time for the maintenance.

Using the API: To see which instances in a compartment are scheduled for maintenance

Use the ListInstances operation. The timeMaintenanceRebootDue field for the Instance returns the date and start time for the maintenance.
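As a minimal sketch, the following code filters a list of instances, shaped like the JSON that ListInstances returns, down to those with a timeMaintenanceRebootDue value. It assumes that you have already retrieved the instance list (for example, with the CLI or an SDK) and loaded it as Python dictionaries. The helper names and the sample data are made up for illustration.

```python
from datetime import datetime


def parse_timestamp(ts):
    """Parse an RFC 3339 timestamp such as 2024-05-20T10:00:00Z."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def scheduled_for_maintenance(instances):
    """Return (name, due date) pairs, earliest due date first."""
    due = []
    for inst in instances:
        ts = inst.get("timeMaintenanceRebootDue")
        if ts:  # missing or null means no maintenance is scheduled
            name = inst.get("displayName", inst.get("id"))
            due.append((name, parse_timestamp(ts)))
    return sorted(due, key=lambda pair: pair[1])


# Made-up sample data in the shape of ListInstances results.
sample = [
    {"displayName": "web-1", "timeMaintenanceRebootDue": "2024-05-20T10:00:00Z"},
    {"displayName": "web-2", "timeMaintenanceRebootDue": None},
    {"displayName": "db-1", "timeMaintenanceRebootDue": "2024-05-18T08:30:00Z"},
]

for name, when in scheduled_for_maintenance(sample):
    print(name, when.isoformat())
```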

Using Search: To find all instances that are scheduled for maintenance
  1. In the top navigation bar, click Search for resources, services, documentation, and Marketplace, and then select Advanced resource query.
  2. Click Select Sample Query, and then click Query for all instances which have an upcoming scheduled maintenance reboot.
  3. Click Search.

An instance is no longer impacted by a maintenance event when the Maintenance reboot field for the instance is blank.

VM Recovery Due to Infrastructure Failure

When the underlying infrastructure of a VM instance fails because of software or hardware issues, Oracle Cloud Infrastructure automatically attempts to recover the instance.

Standard VM instances are recovered using a reboot migration, which automatically restores the VM on a healthy host, whether that's the original physical host or a different physical host. The VM failure is detected within one minute of occurrence. If the host cannot be recovered immediately, a healthy move occurs, whereby the VM is moved to a different host. In this scenario, the process of migrating to and restarting on a healthy host automatically begins within five minutes. During the reboot, instance properties such as private and ephemeral public IP addresses, attached block volumes, and VNICs are preserved.

Dense I/O VM instances are recovered by rebooting the instance on the same physical host. If recovering a dense I/O instance on the same physical host isn't possible, Oracle Cloud Infrastructure notifies you to delete (terminate) the instance within 14 days. If you don't delete the instance before the deadline, Oracle Cloud Infrastructure disables the instance on the deadline and deletes it within the next seven days. The boot volume and remote attached data volume are preserved.
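The deadlines for a dense I/O instance that cannot be recovered can be worked out from the notification time. The sketch below assumes that the 14-day period starts when the notification is sent, which the text implies but does not state explicitly. The function name and dictionary keys are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def dense_io_failure_timeline(notified_at):
    """Return the key dates that follow a failed dense I/O recovery."""
    deadline = notified_at + timedelta(days=14)
    return {
        "delete_by": deadline,                               # customer deadline
        "disabled_on": deadline,                             # instance disabled
        "deleted_no_later_than": deadline + timedelta(days=7),
    }


# Example notification time (made-up value for illustration).
timeline = dense_io_failure_timeline(datetime(2024, 5, 1, tzinfo=timezone.utc))
for label, when in timeline.items():
    print(label, when.date())
```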

Oracle Cloud Infrastructure notifies you by email or announcements of any VM infrastructure failure events, with the status of the recovery action that was taken. You can also monitor the instance status metric to stay aware of any unexpected reboots.

You can choose not to have your VMs automatically restart by configuring your instances to remain stopped after they are recovered.

Infrastructure Health Metrics

You can use metrics, alarms, and notifications to monitor the maintenance status of the infrastructure that your compute instances run on. The primary metrics to consider for infrastructure maintenance are the infrastructure health metrics:

  • Instance health (up/down) status: The instance_status metric lets you check whether a VM instance is available (up) or unavailable (down) when in the running state. If the instance is unavailable for more than 30 minutes, contact support.
  • Instance maintenance status: The maintenance_status metric lets you monitor whether a VM or bare metal instance is scheduled for planned infrastructure maintenance.
  • Bare metal infrastructure health status: The health_status metric helps you monitor the health of the infrastructure for bare metal instances, including hardware components such as the CPU and memory.
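To illustrate the 30-minute guidance for the instance_status metric, the following sketch scans a series of datapoints for the longest continuous run of "down" values. It makes two assumptions that you should verify against the metric reference: that the datapoints are one minute apart, and that a value of 1 means the instance is unavailable. The function names are hypothetical.

```python
def longest_down_minutes(samples, down_value=1):
    """Longest run of consecutive 'down' datapoints, oldest first.

    Assumes one datapoint per minute and that down_value marks an
    unavailable instance (both are assumptions, not confirmed here).
    """
    longest = current = 0
    for value in samples:
        current = current + 1 if value == down_value else 0
        longest = max(longest, current)
    return longest


def should_contact_support(samples, threshold_minutes=30):
    """True if the instance was down for more than the threshold."""
    return longest_down_minutes(samples) > threshold_minutes
```

A short outage followed by recovery resets the counter, so only a single continuous outage longer than the threshold triggers the check.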

Viewing Instance Status and Maintenance Notifications in the Console

You can view the instance status and maintenance reboot notifications in the Console on the Instance Details page. To see these fields:

  1. Open the navigation menu and click Compute. Under Compute, click Instances.
  2. Click the instance that you're interested in.
  3. On the Instance information tab, in the Instance details section, see the Instance status field and the Maintenance reboot field.
    Note

    The Instance status field is displayed only if the instance was unavailable in the past month.

Maintenance Actions

Oracle Cloud Infrastructure supports several maintenance actions for compute instances: rebuild in place, live migration, reboot migration, and manual migration. The maintenance action depends on characteristics such as the shape that the instance uses.

Rebuild in Place

This maintenance action doesn't move the instance. At the scheduled time, the instance is stopped, rebuilt on the same physical hardware, and restarted. A downtime of several hours occurs during the maintenance process.

A rebuild in place preserves instance properties that are tied to the physical hardware, such as the MAC address or universal identification number. A rebuild in place also lets you retain the locally-attached NVMe-based SSD on a dense I/O instance.

For VMs, if you want to minimize downtime and can delete the locally-attached NVMe-based SSD, you can proactively reboot the instance before the scheduled maintenance time. The instance will be reboot migrated to a healthy host and the SSD will be permanently deleted. A short downtime occurs during the migration.

Migration Maintenance Actions

The other three maintenance actions involve migrating instances. For detailed information about each maintenance action, see Live, Reboot, and Manual Migration: Moving a Compute Instance to a New Host.