Oracle Cloud Infrastructure Documentation

Compute Health Monitoring for Bare Metal Instances

Compute health monitoring for bare metal instances provides notifications about hardware issues with your bare metal instances. With this feature, you can monitor the health of the instance hardware, including components such as the CPU, motherboard, DIMMs, and NVMe drives. You can use the notifications to identify problems so that you can proactively redeploy your instances and improve availability.

Health monitoring notifications are emailed to the tenant administrator within one business day of the error occurring. This warning lets you act before a potential hardware failure and redeploy your instances to healthy hardware, minimizing the impact on your applications.

Guidance on Compute Health Monitoring Error Messages

This section describes the most common health monitoring error messages and provides troubleshooting suggestions for your bare metal instance.

Find the error message below that matches the information in your notification email to locate its troubleshooting section.

1: A fault in the PCI subsystem has been detected
2: A fault in the memory subsystem was detected
3: A fault in the memory subsystem was detected during instance launch or a recent reboot
4: A fault has been detected in one or more CPUs
5: A fault in the instance management controller has been detected
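
If you process notification emails with a script, the matching step above can be sketched as follows. This is a minimal illustration, not an Oracle-provided tool: the function name `match_fault` is hypothetical, and it assumes the email body contains one of the five messages listed above verbatim, apart from case and line wrapping. The actual email layout is not specified here. Because message 2 is a prefix of message 3, the sketch checks longer messages first.

```python
# Known messages, copied from the list above; keys match the list numbers.
KNOWN_FAULTS = {
    1: "A fault in the PCI subsystem has been detected",
    2: "A fault in the memory subsystem was detected",
    3: "A fault in the memory subsystem was detected during instance launch or a recent reboot",
    4: "A fault has been detected in one or more CPUs",
    5: "A fault in the instance management controller has been detected",
}


def match_fault(email_body):
    """Return the list number of the longest known message found in the
    email body, or None if no known message is present.

    Hypothetical helper: assumes the message text appears verbatim in the
    email, ignoring case and line wrapping.
    """
    # Collapse all whitespace (including line breaks) and ignore case.
    text = " ".join(email_body.split()).lower()
    # Check longer messages first, because message 2 is a prefix of message 3.
    by_length = sorted(KNOWN_FAULTS.items(), key=lambda kv: len(kv[1]), reverse=True)
    for number, message in by_length:
        if message.lower() in text:
            return number
    return None
```

For example, an email containing "A fault in the memory subsystem was detected during instance launch or a recent reboot" maps to entry 3 rather than entry 2, which points you to the correct troubleshooting section.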