Streaming Metrics

You can monitor the health and performance of your streams by using metrics and alarms. For more information, see Monitoring Overview.

This topic describes the metrics emitted by the Streaming service using the metric namespace oci_streaming.

Overview of Streaming Metrics

The Streaming service provides metrics showing how the service is performing. These metrics are automatically available.

You can use these metrics to:

  • Understand the produce/consume latency for a real-time application.
  • Calculate and validate the price of service usage.
  • Monitor changes in throughput over time.
  • Check the time that the last message was consumed.

[Image: example Streaming metrics graphs]

To view a default set of metrics charts in the Console, navigate to the Service Metrics page and then select the oci_streaming metric namespace.

Available Metrics

The following tables describe the available Streaming metrics.

You can also use the Monitoring service to create custom queries.
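As an illustration of a custom query, the following minimal Python sketch assembles query strings in the general Monitoring Query Language shape (metric, interval, optional dimension filter, statistic). The helper name build_query and the placeholder stream OCID are hypothetical, and the exact query syntax accepted by the Monitoring service should be confirmed against the Monitoring documentation.

```python
def build_query(metric, statistic="mean", interval="1m", resource_id=None):
    """Return a query string for one metric in the oci_streaming namespace.

    Hypothetical helper: it only formats text and does not call any service.
    """
    dimension_filter = ""
    if resource_id is not None:
        # Restrict the query to a single stream through the resourceId dimension.
        dimension_filter = '{resourceId = "%s"}' % resource_id
    return f"{metric}[{interval}]{dimension_filter}.{statistic}()"


# Placeholder OCID for illustration only.
STREAM_OCID = "ocid1.stream.oc1..exampleuniqueID"

latency_query = build_query("PutMessagesLatency.Time", resource_id=STREAM_OCID)
throttle_query = build_query("PutMessagesThrottling.Count", statistic="sum")

print(latency_query)
print(throttle_query)
```

The first query averages put latency for one stream in one-minute windows. The second sums throttled records across every stream that the query scope covers.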

Each metric includes the following dimensions:

REGION
The region where the stream resides.
RESOURCEID
The OCID of the stream.

Producers

Metric | Metric Display Name | Unit | Description | Dimensions
------ | ------------------- | ---- | ----------- | ----------
PutMessagesLatency.Time | Put Messages Latency | time (ms) | Time taken for the put messages operation, measured over the time range. | region, resourceId
PutMessagesThroughput.Bytes | Put Messages Total Throughput | Bytes | Bytes pushed to the stream, measured over time. | region, resourceId
PutMessagesThroughput.Count | Put Messages Records/sec | count | Count of messages pushed to the stream, measured over time. | region, resourceId
PutMessagesThrottling.Count | Put Messages Throttled Records/sec | count | Number of put messages throttled because of either volume or requests, measured over time. | region, resourceId
PutMessagesSuccess.Count | Put Messages Success/sec | count | Successful put messages requests per stream, measured over time. | region, resourceId
PutMessagesFault.Count | Put Messages Failure/sec | count | Total failed putMessage requests per stream, measured over time. | region, resourceId
PutMessagesRecords.Count | Put Messages Requests/sec | count | Number of messages published to a stream, measured over time. | region, resourceId

Consumers

Metric | Metric Display Name | Unit | Description | Dimensions
------ | ------------------- | ---- | ----------- | ----------
GetMessagesLatency.Time | Get Messages Latency | time (ms) | Time taken for the get messages operation, measured over the time range. | region, resourceId
GetMessagesThroughput.Bytes | Get Messages Total Throughput | Bytes | Bytes retrieved from the stream, measured over time. | region, resourceId
GetMessagesThroughput.Count | Get Messages Requests/sec | count | Count of messages read from the stream, measured over time. | region, resourceId
GetMessagesThrottling.Count | Get Messages Throttled Requests/sec | count | Number of get messages throttled because of either volume or requests, measured over time. | region, resourceId
GetMessagesSuccess.Count | Get Messages Success/sec | count | Successful get messages requests per stream, measured over time. | region, resourceId
GetMessagesFault.Count | Get Messages Failure/sec | count | Total failed getMessage requests per stream, measured over time. | region, resourceId

Stream Health

A healthy stream is a stream that is active: messages are received and consumed successfully.

Writes to the service are durable. If you can produce to your stream, and if you get a successful response, then the stream is healthy.

After data is ingested, it is accessible to consumers for the configured retention period. If GetMessages API calls return elevated levels of internal server errors, the service isn't healthy.

A healthy stream also has healthy metrics:

  • Put Messages Latency is low.
  • Put Messages Total Throughput is close to 1 MB per second per partition.
  • Put Messages Throttled Records is close to 0.
  • Put Messages Failure is close to 0.
  • Get Messages Latency is low.
  • Get Messages Total Throughput is close to 2 MB per second per partition.
  • Get Messages Throttled Requests is close to 0.
  • Get Messages Failure is close to 0.
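The latency, throttling, and failure checks in the list above can be sketched as a small Python function. This is an illustration only: the StreamSample container, the function names, and both thresholds (500 ms latency, 1% failure ratio) are assumptions chosen for the example, not values published by the service. The sketch also assumes you have already retrieved one interval of metric values for a stream.

```python
from dataclasses import dataclass


@dataclass
class StreamSample:
    """One interval of metric values for a single stream (hypothetical container)."""
    put_latency_ms: float
    put_throttled: int
    put_success: int
    put_fault: int
    get_latency_ms: float
    get_throttled: int
    get_success: int
    get_fault: int


def fault_ratio(success: int, fault: int) -> float:
    """Share of requests that failed; 0.0 when there was no traffic."""
    total = success + fault
    return fault / total if total else 0.0


def health_issues(sample, max_latency_ms=500.0, max_fault_ratio=0.01):
    """Return a list of findings; an empty list means no check failed.

    Both thresholds are illustrative assumptions, not service limits.
    """
    issues = []
    if sample.put_latency_ms > max_latency_ms:
        issues.append("Put Messages Latency is high")
    if sample.put_throttled > 0:
        issues.append("Put Messages Throttled Records is above 0")
    if fault_ratio(sample.put_success, sample.put_fault) > max_fault_ratio:
        issues.append("Put Messages Failure is elevated")
    if sample.get_latency_ms > max_latency_ms:
        issues.append("Get Messages Latency is high")
    if sample.get_throttled > 0:
        issues.append("Get Messages Throttled Requests is above 0")
    if fault_ratio(sample.get_success, sample.get_fault) > max_fault_ratio:
        issues.append("Get Messages Failure is elevated")
    return issues


healthy = StreamSample(20.0, 0, 1000, 0, 35.0, 0, 800, 0)
degraded = StreamSample(20.0, 12, 1000, 0, 35.0, 0, 95, 5)

print(health_issues(healthy))
print(health_issues(degraded))
```

Comparing failures as a ratio of all requests, rather than as a raw count, keeps a busy stream with a handful of failed calls from looking worse than a quiet stream where most calls fail.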

Suggested Alarms

Producers

For producers, consider setting alarms on the following metrics:

  • Put Messages Latency: An increase in latency means that the messages are taking longer to publish, which could indicate network issues.
  • Put Messages Total Throughput:
    • An increase in total throughput could indicate that the 1 MB per second per partition limit will be reached, and that event will trigger the throttling mechanism.
    • A decrease could mean that the client producer is having an issue or is about to stop.
  • Put Messages Throttled Records: It's important to get notified when messages are throttled.
  • Put Messages Failure: It's important to get notified if put messages start failing.
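To judge how close a stream is to the 1 MB per second per partition limit, the summed Put Messages Total Throughput for a window can be converted into a per-partition rate. The sketch below shows the arithmetic under stated assumptions: 1 MB is treated as 1,000,000 bytes, the byte total covers the whole stream with traffic spread evenly across partitions, and the function name put_utilization is hypothetical.

```python
MB = 1_000_000  # assumption: decimal megabyte


def put_utilization(bytes_in_window, window_seconds, partitions,
                    limit_bytes_per_sec=MB):
    """Return the fraction of the per-partition write limit in use.

    Assumes bytes_in_window is the stream-wide total for the window and
    that producers spread traffic evenly across partitions.
    """
    if window_seconds <= 0 or partitions <= 0:
        raise ValueError("window_seconds and partitions must be positive")
    rate_per_partition = bytes_in_window / window_seconds / partitions
    return rate_per_partition / limit_bytes_per_sec


# 144 MB written in 60 seconds to a 3-partition stream:
# 144 / 60 = 2.4 MB/s for the stream, or 0.8 MB/s per partition.
utilization = put_utilization(144 * MB, 60, 3)
print(f"{utilization:.0%} of the per-partition limit")
```

An alarm threshold set somewhat below 100% utilization leaves time to react before throttling starts. If traffic is skewed toward a few partitions, those partitions can reach the limit earlier than this average suggests.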

Consumers

For consumers, consider setting similar alarms based on the following metrics:

  • Get Messages Latency
  • Get Messages Total Throughput
  • Get Messages Throttled Requests
  • Get Messages Failure
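As a rough sketch, the consumer alarms listed above could be written as threshold conditions on the corresponding metrics. The Python below only formats condition strings in the general shape of a Monitoring alarm query. Every threshold is an illustrative assumption. The throughput threshold, for example, is 80% of 2 MB per second over one minute for a single partition, with 1 MB taken as 1,000,000 bytes. The helper name alarm_expression is hypothetical.

```python
# (metric, statistic, operator, threshold); all thresholds are illustrative.
CONSUMER_ALARMS = [
    ("GetMessagesLatency.Time", "mean", ">", 500),
    ("GetMessagesThroughput.Bytes", "sum", ">", 96_000_000),
    ("GetMessagesThrottling.Count", "sum", ">", 0),
    ("GetMessagesFault.Count", "sum", ">", 0),
]


def alarm_expression(metric, statistic, operator, threshold, interval="1m"):
    """Format one alarm condition; this does not create an alarm."""
    return f"{metric}[{interval}].{statistic}() {operator} {threshold}"


expressions = [alarm_expression(*rule) for rule in CONSUMER_ALARMS]
for expression in expressions:
    print(expression)
```

Keeping the rules in one table makes it easy to review thresholds together and to add a matching set for the producer metrics.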

Using the Console

To view default producer metrics
  1. Open the navigation menu. Under Solutions and Platform, click Analytics, and then click Streaming.
  2. Click a stream to view its details.
  3. Under Resources, click Produce Monitoring Charts.

For more information about monitoring metrics and using alarms, see Monitoring Overview. For information about notifications for alarms, see Notifications Overview.

To view default consumer metrics
  1. Open the navigation menu. Under Solutions and Platform, click Analytics, and then click Streaming.
  2. Click a stream to view its details.
  3. Under Resources, click Consume Monitoring Charts.
