Oracle Cloud Infrastructure Documentation

Infrastructure Health Metrics

You can monitor the health, capacity, and performance of your Compute bare metal instances by using metrics, alarms, and notifications.

This topic describes the metrics emitted by the metric namespace oci_compute_infrastructure_health.

Resources: Bare metal Compute instances.

Overview of Metrics: oci_compute_infrastructure_health

The infrastructure health metrics help you monitor the health of the infrastructure for your bare metal instances, including hardware components such as the CPU, motherboard, DIMM, and NVMe drives. You can use the metrics to identify hardware issues and proactively take action to minimize the impact on your applications.

Required IAM Policy

To monitor resources, you must be given the required type of access in a policy written by an administrator, whether you're using the Console or the REST API with an SDK, CLI, or other tool. The policy must give you access to the monitoring services as well as the resources being monitored. If you try to perform an action and get a message that you don't have permission or are unauthorized, confirm with your administrator the type of access you've been granted and which compartment you should work in. For more information on user authorizations for monitoring, see the Authentication and Authorization section for the related service: Monitoring or Notifications.

Available Metrics: oci_compute_infrastructure_health

The metric listed in the following table is automatically available for each bare metal instance that you create. You do not need to enable monitoring on the instance to get this metric.

You can also use the Monitoring service to create custom queries.
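As a minimal sketch of what a custom query might look like, the following Python helper assembles a Monitoring Query Language (MQL) expression for the health_status metric. The helper name build_health_query, the 1-minute interval, and the max() statistic are illustrative assumptions rather than values prescribed by this topic, and the query syntax should be checked against the Monitoring service documentation. The dimension names used for filtering, such as faultClass, are described later in this topic.

```python
def build_health_query(interval="1m", statistic="max", dimensions=None):
    """Return an MQL expression string for the health_status metric.

    dimensions is an optional mapping of dimension name to required value,
    for example {"faultClass": "CPU"}. This helper is hypothetical and only
    builds a string; it does not call the Monitoring service.
    """
    filters = ""
    if dimensions:
        # Assumed MQL filter form: name = "value", comma-separated inside braces.
        parts = [f'{name} = "{value}"' for name, value in sorted(dimensions.items())]
        filters = "{" + ", ".join(parts) + "}"
    return f"health_status[{interval}]{filters}.{statistic}()"


# Query across all fault classes, then narrowed to CPU faults only.
print(build_health_query())
print(build_health_query(dimensions={"faultClass": "CPU"}))
```

The resulting string can be pasted into the query editor or passed as the query in an API request. Sorting the dimension names keeps the generated query stable across runs.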

The metric includes the following dimensions:

faultClass
The type of hardware issue:
  • CPU: A fault has been detected in one or more CPUs.

  • MEM-BOOT: A fault in the memory subsystem was detected during instance launch or a recent reboot.

  • MEM-RUNTIME: A fault in the memory subsystem was detected.

  • MGMT-CONTROLLER: A fault in the instance management controller has been detected.

  • PCI: A fault in the PCI subsystem has been detected.

resourceDisplayName
The friendly name of the instance.
resourceId
The OCID (Oracle Cloud Identifier) of the instance.
Metric: health_status
Metric Display Name: Infrastructure Health Status
Unit: Issues
Description: Number of issues. Any non-zero value indicates a health defect.
Dimensions: faultClass, resourceDisplayName, resourceId
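Because any non-zero health_status value indicates a health defect, a script that has already retrieved data points can flag affected instances with a simple filter. The sketch below assumes each data point has been flattened into a dictionary with the three dimensions plus a value field. That shape, the function name faults_by_instance, and the sample readings are invented for illustration and do not reflect the Monitoring service's response format.

```python
from collections import defaultdict


def faults_by_instance(datapoints):
    """Map each instance display name to the fault classes reporting issues.

    A reading counts as a defect when its value is non-zero, following the
    metric description. The input shape is a hypothetical, flattened form.
    """
    faults = defaultdict(set)
    for point in datapoints:
        if point["value"] != 0:
            faults[point["resourceDisplayName"]].add(point["faultClass"])
    # Return plain dicts and sorted lists so the result is easy to print or compare.
    return {name: sorted(classes) for name, classes in faults.items()}


# Made-up sample readings, not real service output.
sample = [
    {"resourceDisplayName": "bm-instance-1", "faultClass": "CPU", "value": 0},
    {"resourceDisplayName": "bm-instance-1", "faultClass": "PCI", "value": 1},
    {"resourceDisplayName": "bm-instance-2", "faultClass": "MEM-RUNTIME", "value": 2},
]
print(faults_by_instance(sample))
```

Instances whose readings are all zero are omitted from the result, so an empty dictionary means no defects were reported in the supplied data.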

Using the Console

To view infrastructure health metrics for a single Compute instance
To view infrastructure health metrics for all Compute instances in a compartment

Using the API

For information about using the API and signing requests, see REST APIs and Security Credentials. For information about SDKs, see Software Development Kits and Command Line Interface.
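For orientation, the following sketch builds a JSON request body for querying this namespace over the last hour. The field names (namespace, query, startTime, endTime) are assumptions based on the Monitoring service's SummarizeMetricsData operation. The helper name summarize_request_body and the fixed example timestamp are illustrative only. Request signing, the compartment parameter, and the HTTP call itself are not shown, so verify the details against the API reference before use.

```python
import json
from datetime import datetime, timedelta, timezone

NAMESPACE = "oci_compute_infrastructure_health"


def summarize_request_body(query, end=None, hours=1):
    """Build a dict suitable for JSON-encoding as a metrics query body.

    Hypothetical helper: it only assembles data and makes no network call.
    The time window covers the `hours` leading up to `end` (default: now, UTC).
    """
    end = end or datetime.now(timezone.utc)
    start = end - timedelta(hours=hours)
    return {
        "namespace": NAMESPACE,
        "query": query,
        # Timestamps are emitted in ISO 8601 form with an explicit UTC offset.
        "startTime": start.isoformat(),
        "endTime": end.isoformat(),
    }


# A fixed, arbitrary end time keeps the example output reproducible.
body = summarize_request_body(
    "health_status[1m].max()",
    end=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(json.dumps(body, indent=2))
```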

Use the following APIs for monitoring: