Infrastructure Health Metrics
You can monitor the health, capacity, and performance of your Compute bare metal instances by using metrics, alarms, and notifications.
This topic describes the metrics emitted by the metric namespace oci_compute_infrastructure_health.
Resources: Bare metal Compute instances.
Overview of Metrics: oci_compute_infrastructure_health
The infrastructure health metrics help you monitor the health of the infrastructure for your bare metal instances, including hardware components such as the CPU, motherboard, DIMM, and NVMe drives. You can use the metrics to identify hardware issues, and proactively take action to minimize the impact on your applications.
Required IAM Policy
To monitor resources, you must be given the required type of access in a policy written by an administrator, whether you're using the Console or the REST API with an SDK, CLI, or other tool. The policy must give you access to the monitoring services as well as the resources being monitored. If you try to perform an action and get a message that you don’t have permission or are unauthorized, confirm with your administrator the type of access you've been granted and which compartment you should work in. For more information on user authorizations for monitoring, see the Authentication and Authorization section for the related service: Monitoring or Notifications.
Available Metrics: oci_compute_infrastructure_health
The metric listed in the following table is automatically available for each bare metal instance that you create. You do not need to enable monitoring on the instance to get this metric.
You can also use the Monitoring service to create custom queries.
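As a rough illustration of a custom query, the following Python sketch assembles a Monitoring Query Language (MQL) expression of the general form metric[interval]{dimension = "value"}.statistic(). The helper `build_mql_query` is hypothetical, and the metric name `health_status`, the dimension key `resourceId`, and the OCID are placeholders that this topic does not confirm. Substitute the names shown for your tenancy in the Console.

```python
def build_mql_query(metric, interval, dimensions=None, statistic="max"):
    """Build a Monitoring Query Language (MQL) expression as a string.

    Hypothetical helper for illustration; it only formats text and does
    not call any Oracle Cloud Infrastructure API.
    """
    selector = ""
    if dimensions:
        # Dimension filters are comma-separated name = "value" pairs in braces.
        pairs = ", ".join(
            f'{name} = "{value}"' for name, value in dimensions.items()
        )
        selector = "{" + pairs + "}"
    return f"{metric}[{interval}]{selector}.{statistic}()"


# ASSUMPTION: metric name, dimension key, and OCID are placeholders,
# not values confirmed by this topic.
query = build_mql_query(
    metric="health_status",
    interval="5m",
    dimensions={"resourceId": "ocid1.instance.oc1..exampleuniqueID"},
    statistic="max",
)
print(query)
```

Taking the maximum over each interval is one reasonable choice here, because any non-zero value within the window is then preserved in the result.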
The metric includes the following dimensions:
- The type of hardware issue:
  - CPU: A fault has been detected in one or more CPUs.
  - MEM-BOOT: A fault in the memory subsystem was detected during instance launch or a recent reboot.
  - MEM-RUNTIME: A fault in the memory subsystem was detected.
  - MGMT-CONTROLLER: A fault in the instance management controller has been detected.
  - PCI: A fault in the PCI subsystem has been detected.
- The friendly name of the instance.
- The OCID of the instance.
Metric display name: Infrastructure Health Status
Description: Number of issues. Any non-zero value indicates a health defect.
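To make the "any non-zero value indicates a health defect" rule concrete, here is a minimal Python sketch that takes the most recent metric value for each hardware issue type and reports the types that currently show a defect. The function `find_health_defects` and the sample values are hypothetical. The fault descriptions are copied from the dimension list in this topic.

```python
# Fault types and descriptions as listed in this topic.
FAULT_DESCRIPTIONS = {
    "CPU": "A fault has been detected in one or more CPUs.",
    "MEM-BOOT": (
        "A fault in the memory subsystem was detected during "
        "instance launch or a recent reboot."
    ),
    "MEM-RUNTIME": "A fault in the memory subsystem was detected.",
    "MGMT-CONTROLLER": (
        "A fault in the instance management controller has been detected."
    ),
    "PCI": "A fault in the PCI subsystem has been detected.",
}


def find_health_defects(latest_values):
    """Return {fault_type: description} for every non-zero metric value."""
    defects = {}
    for fault_type, value in latest_values.items():
        # Any non-zero value indicates a health defect.
        if value != 0:
            defects[fault_type] = FAULT_DESCRIPTIONS.get(
                fault_type, "Unknown fault type."
            )
    return defects


# ASSUMPTION: made-up sample values, not real measurements.
sample_values = {"CPU": 0, "MEM-RUNTIME": 2, "PCI": 0}
defects = find_health_defects(sample_values)
for fault_type, description in defects.items():
    print(f"{fault_type}: {description}")
```

In practice the input values would come from a Monitoring query or an alarm notification; this sketch covers only the interpretation step.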
Using the Console
Using the API
For information about using the API and signing requests, see REST APIs and Security Credentials. For information about SDKs, see Software Development Kits and Command Line Interface.
Use the following APIs for monitoring: