Metrics

You can monitor the health, capacity, and performance of some Data Science resources using metrics, alarms, and notifications.

Data Science monitors these running resources to collect and report metrics:

Job Metrics
  • CPU usage

  • GPU usage

  • Disk usage

  • Memory usage

  • Network bytes in

  • Network bytes out

Model Deployment Metrics
  • CPU usage

  • Memory usage

  • Network bytes

  • Predict request count

  • Predict response

  • Predict latency

  • Predict bandwidth usage

Notebook Session Metrics
  • CPU usage

  • Memory usage

  • Network bytes in

  • Network bytes out

Pipeline Run Metrics
  • CPU usage

  • GPU usage

  • Disk usage

  • Memory usage

  • Network bytes in

  • Network bytes out

Before You Begin

IAM policies:

To monitor resources, you must be granted the required access in an IAM policy. This applies whether you use the Console or the REST API through an SDK, the CLI, or another tool. The policy must give you access both to the monitoring services and to the resources being monitored. If you try to perform an action and get a message that you don't have permission or are unauthorized, ask your administrator which type of access you have been granted and which compartment you should work in. For more information, see Monitoring authentication and authorization or Notifications authentication and authorization.
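As a rough illustration of what such a policy can contain, the sketch below assembles example policy statements for a group that needs to view Data Science resources and their metrics in one compartment. It is a hypothetical helper, not part of any OCI tool: the group and compartment names are placeholders, and the resource types it uses (`data-science-family`, `metrics`, `alarms`, `ons-topics`) and verbs are assumptions to verify against the IAM policy reference before you apply them.

```python
from typing import List

# Hypothetical helper: builds example IAM policy statements for viewing
# Data Science metrics. Group/compartment names are placeholders; the
# resource types and verbs are assumptions to check against the IAM
# policy reference for your tenancy.


def monitoring_policy_statements(group: str, compartment: str,
                                 manage_alarms: bool = False) -> List[str]:
    """Return example policy statements scoped to a single compartment."""
    scope = f"in compartment {compartment}"
    statements = [
        # Read access to the Data Science resources being monitored.
        f"Allow group {group} to read data-science-family {scope}",
        # Read access to metric data in the Monitoring service.
        f"Allow group {group} to read metrics {scope}",
    ]
    if manage_alarms:
        # Optional: create and manage alarms and publish to Notifications topics.
        statements.append(f"Allow group {group} to manage alarms {scope}")
        statements.append(f"Allow group {group} to use ons-topics {scope}")
    return statements


if __name__ == "__main__":
    for statement in monitoring_policy_statements(
            "DataScienceUsers", "ds-projects", manage_alarms=True):
        print(statement)
```

Keeping the alarm and Notifications statements behind a flag reflects the idea that viewing metrics needs less access than creating alarms on them.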

Viewing Metrics from the Monitoring Service

You can use the Monitoring service to view the default metric charts for all the Data Science resources in a compartment, such as notebook sessions, job runs, and model deployments.

  1. Open the navigation menu and click Observability & Management. Under Monitoring, click Service Metrics.
  2. Select the compartment that contains the project with the resources whose metrics you want to view.
  3. For Metric Namespace, select the namespace of the resource type you want to view, for example, oci_datascience, oci_datascience_jobrun, or oci_datascience_modeldeploy.

    The Service Metrics page updates dynamically to show a chart for each metric emitted by the selected metric namespace (see resources with metrics).
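The same data can also be queried outside the Console. The sketch below is one possible approach under several assumptions: it builds a Monitoring Query Language (MQL) expression of the form `metric[interval]{dimension = "value"}.statistic()` and passes it to the OCI Python SDK's `MonitoringClient.summarize_metrics_data`. It assumes the `oci` package is installed and configured in `~/.oci/config`; the metric name `CpuUtilization` and the `resourceId` dimension are illustrative assumptions, so check the metric names that the Service Metrics page shows for your namespace.

```python
import datetime
from typing import Optional

# Sketch only. Assumptions: the "oci" Python SDK is installed and configured,
# and the metric/dimension names passed in exist in the chosen namespace.


def build_mql_query(metric: str, interval: str = "1m", statistic: str = "mean",
                    resource_id: Optional[str] = None) -> str:
    """Build an MQL expression, optionally filtered to one resource OCID."""
    query = f"{metric}[{interval}]"
    if resource_id:
        # Dimension filter; "resourceId" is an assumed dimension name.
        query += f'{{resourceId = "{resource_id}"}}'
    return f"{query}.{statistic}()"


def fetch_metric(compartment_id: str, namespace: str, query: str, hours: int = 1):
    """Run the query for the last `hours` hours (needs credentials and network)."""
    import oci  # imported lazily so build_mql_query works without the SDK

    config = oci.config.from_file()  # default file and DEFAULT profile
    client = oci.monitoring.MonitoringClient(config)
    end = datetime.datetime.now(datetime.timezone.utc)
    details = oci.monitoring.models.SummarizeMetricsDataDetails(
        namespace=namespace,
        query=query,
        start_time=end - datetime.timedelta(hours=hours),
        end_time=end,
    )
    # Returns a list of MetricData objects with aggregated_datapoints.
    return client.summarize_metrics_data(compartment_id, details).data
```

For example, `fetch_metric(compartment_ocid, "oci_datascience_jobrun", build_mql_query("CpuUtilization"))` would request one-minute mean values for the assumed metric name across the job runs in that compartment.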

For more information about monitoring metrics and using alarms, see Overview of Monitoring. For information about notifications for alarms, see Notifications Overview.