Oracle Cloud Infrastructure Documentation

Compute Health Monitoring for Bare Metal Instances

Compute health monitoring for bare metal instances is a feature that provides notifications about hardware issues with your bare metal instances. With the health monitoring feature, you can monitor the health of the hardware for your bare metal instances, including components such as the CPU, motherboard, DIMM, and NVMe drives. You can use the notifications to identify problems, letting you proactively redeploy your instances to improve availability.

Health monitoring notifications are emailed to the tenant administrator within one business day of the error occurring. This warning helps you to take action before any potential hardware failure and redeploy your instances to healthy hardware to minimize the impact on your applications.

You can also use the infrastructure health metrics available in the Monitoring service to create alarms and notifications based on hardware issues. An alarm is the trigger rule and query to evaluate, together with related configuration such as the notification details to use when the trigger is breached. Alarms passively monitor your cloud resources using metrics in Monitoring.
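As a rough illustration of what such an alarm involves, the sketch below assembles an alarm definition as a plain Python dictionary and makes no API call. Several names are assumptions that do not come from this page: the metric namespace oci_compute_infrastructure_health, the metric name health_status, the resourceId dimension, and the example OCIDs. Verify them against the Monitoring metrics reference before relying on them. The helper build_health_alarm is hypothetical.

```python
import json

# ASSUMPTION: namespace and metric name are not stated on this page;
# confirm both in the Monitoring service metrics reference.
NAMESPACE = "oci_compute_infrastructure_health"
METRIC = "health_status"


def build_health_alarm(instance_ocid, topic_ocid, compartment_ocid):
    """Return an alarm definition as a dict. Hypothetical helper; no API call."""
    # Trigger rule: fire when at least one health data point is reported
    # for this instance within a one-minute window (assumed threshold).
    query = f'{METRIC}[1m]{{resourceId = "{instance_ocid}"}}.count() > 0'
    return {
        "displayName": "bare-metal-hardware-fault",  # illustrative name
        "compartmentId": compartment_ocid,
        "metricCompartmentId": compartment_ocid,
        "namespace": NAMESPACE,
        "query": query,
        "severity": "CRITICAL",
        # Notification details: topic to message when the trigger is breached.
        "destinations": [topic_ocid],
        "isEnabled": True,
    }


if __name__ == "__main__":
    alarm = build_health_alarm(
        "ocid1.instance.oc1..example",      # placeholder OCID
        "ocid1.onstopic.oc1..example",      # placeholder OCID
        "ocid1.compartment.oc1..example",   # placeholder OCID
    )
    print(json.dumps(alarm, indent=2))
```

Keeping the definition as data makes it easy to review the query and the notification destination before submitting it through the Console, CLI, or an SDK.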

Error Messages and Troubleshooting

This section describes the most common health monitoring error messages and suggests troubleshooting steps to try for your bare metal instance.

A fault has been detected in one or more CPUs
A fault in the memory subsystem was detected during instance launch or a recent reboot
A fault in the memory subsystem was detected
A fault in the instance management controller has been detected
A fault in the PCI subsystem has been detected
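If you triage notification emails with a script, the messages listed above can be matched to a subsystem. The following is a minimal sketch under two assumptions: that the notification text contains one of these messages verbatim, and that the short labels (cpu, memory-at-boot, and so on) are acceptable names. The labels and the classify_fault helper are hypothetical and are not part of the service.

```python
# Messages copied from the list above; the labels are hypothetical.
KNOWN_FAULTS = {
    "A fault has been detected in one or more CPUs": "cpu",
    "A fault in the memory subsystem was detected during instance launch "
    "or a recent reboot": "memory-at-boot",
    "A fault in the memory subsystem was detected": "memory",
    "A fault in the instance management controller has been detected":
        "management-controller",
    "A fault in the PCI subsystem has been detected": "pci",
}


def classify_fault(notification_text):
    """Return the label of the first known message found, or None."""
    lowered = notification_text.lower()
    # Check longer messages first: the generic memory message is a prefix
    # of the launch/reboot one and would otherwise match it too early.
    for message in sorted(KNOWN_FAULTS, key=len, reverse=True):
        if message.lower() in lowered:
            return KNOWN_FAULTS[message]
    return None


if __name__ == "__main__":
    sample = "Notice: A fault in the PCI subsystem has been detected."
    print(classify_fault(sample))
```

Matching the longest message first matters because the two memory messages share the same opening words.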