You can monitor the health and performance of your streams by using metrics and alarms. For more information, see Monitoring.
This topic describes the metrics emitted by the Streaming
service using the metric namespace oci_streaming.
Overview of Streaming Metrics
The Streaming service provides metrics showing how the
service is performing. These metrics are automatically available.
You can use these metrics to:
Understand the produce/consume latency for a real-time application.
Calculate and validate the price of service usage.
Monitor changes in throughput over time.
Check the time that the last message was consumed.
To view a default set of metrics charts in the Console, navigate to the Service Metrics
page and then select the oci_streaming metric namespace.
Available Metrics
The following tables describe the available Streaming metrics.
You can also use the Monitoring service to create custom
queries.
Each metric includes the following dimensions:
region
The region where the stream resides.
resourceId
The OCID of the stream.
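Custom queries in the Monitoring service are written in Monitoring Query Language (MQL). The following is a minimal sketch of how such a query string could be assembled for a Streaming metric, assuming the usual MQL shape of metric name, interval, optional dimension filter, and statistic. The helper name `build_mql`, its defaults, and the example OCID are hypothetical and not part of any Oracle SDK.

```python
from typing import Optional


def build_mql(metric: str,
              interval: str = "1m",
              statistic: str = "mean",
              resource_id: Optional[str] = None) -> str:
    """Assemble an MQL expression of the form Metric[interval]{filter}.statistic().

    The dimension filter is only added when a stream OCID is supplied.
    """
    # Restrict the query to one stream through the resourceId dimension.
    selector = f'{{resourceId = "{resource_id}"}}' if resource_id else ""
    return f"{metric}[{interval}]{selector}.{statistic}()"


if __name__ == "__main__":
    # Placeholder OCID, for illustration only.
    print(build_mql("PutMessagesLatency.Time",
                    resource_id="ocid1.stream.oc1..example"))
```

The resulting string would be pasted into the Console's query editor or passed to the Monitoring API together with the oci_streaming namespace. Check the exact syntax against the MQL reference before relying on it.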
Producers
| Metric | Metric Display Name | Unit | Description | Dimensions |
| --- | --- | --- | --- | --- |
| PutMessagesLatency.Time | Put Messages Latency | time (ms) | Time taken for the put messages operation, measured over the time range. | region, resourceId |
| PutMessagesThroughput.Bytes | Put Messages Total Throughput | Bytes | Bytes pushed to the stream, measured over time. | region, resourceId |
| PutMessagesThroughput.Count | Put Messages Records/sec | count | Count of messages pushed to the stream, measured over time. | region, resourceId |
| PutMessagesThrottling.Count | Put Messages Throttled Records/sec | count | Number of put messages throttled due to either volume or requests, measured over time. | region, resourceId |
| PutMessagesSuccess.Count | Put Messages Success/sec | count | Successful requests for put messages per stream, measured over time. | region, resourceId |
| PutMessagesFault.Count | Put Messages Failure/sec | count | Total failed putMessage requests per stream, measured over time. | region, resourceId |
| PutMessagesRecords.Count | Put Messages Requests/sec | count | Number of messages published to a stream, measured over time. | region, resourceId |
Consumers
| Metric | Metric Display Name | Unit | Description | Dimensions |
| --- | --- | --- | --- | --- |
| GetMessagesLatency.Time | Get Messages Latency | time (ms) | Time taken for the get messages operation, measured over the time range. | region, resourceId |
| GetMessagesThroughput.Bytes | Get Messages Total Throughput | Bytes | Bytes retrieved from the stream, measured over time. | region, resourceId |
| GetMessagesThroughput.Count | Get Messages Requests/sec | count | Count of messages read from the stream, measured over time. | region, resourceId |
| GetMessagesThrottling.Count | Get Messages Throttled Requests/sec | count | Number of get messages throttled due to either volume or requests, measured over time. | region, resourceId |
| GetMessagesSuccess.Count | Get Messages Success/sec | count | Successful requests for get messages per stream, measured over time. | region, resourceId |
| GetMessagesFault.Count | Get Messages Failure/sec | count | Total failed getMessage requests per stream, measured over time. | region, resourceId |
Stream Health
A healthy stream is a stream that is active: messages are received and consumed
successfully.
Writes to the service are durable. If you can produce to your stream, and if you get a
successful response, then the stream is healthy.
After data is ingested, it is accessible to consumers for the configured retention
period. If GetMessages API calls return elevated levels of internal server errors, the
service isn't healthy.
A healthy stream also has healthy metrics:
Put Messages Latency is low.
Put Messages Total Throughput is close to 1 MB per second per partition.
Put Messages Throttled Records is close to 0.
Put Messages Failure is close to 0.
Get Messages Latency is low.
Get Messages Total Throughput is close to 2 MB per second per partition.
Get Messages Throttled Requests is close to 0.
Get Messages Failure is close to 0.
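The latency, throttling, and failure items in this checklist can be expressed as a small health check. The sketch below is illustrative only: `StreamSnapshot` and `unhealthy_signals` are hypothetical names, the 500 ms latency threshold is an assumption (the text only says latency should be low), and "close to 0" is interpreted strictly as "equal to 0". Throughput is left out because it depends on the partition count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StreamSnapshot:
    """One reading of a stream's metrics (hypothetical container, not an OCI type)."""
    put_latency_ms: float
    put_throttled: int
    put_failures: int
    get_latency_ms: float
    get_throttled: int
    get_failures: int


def unhealthy_signals(s: StreamSnapshot, max_latency_ms: float = 500.0) -> List[str]:
    """Return the display names of the metrics that look unhealthy.

    max_latency_ms is an assumed threshold; tune it for your workload.
    """
    checks = [
        ("Put Messages Latency", s.put_latency_ms > max_latency_ms),
        ("Put Messages Throttled Records", s.put_throttled > 0),
        ("Put Messages Failure", s.put_failures > 0),
        ("Get Messages Latency", s.get_latency_ms > max_latency_ms),
        ("Get Messages Throttled Requests", s.get_throttled > 0),
        ("Get Messages Failure", s.get_failures > 0),
    ]
    # Keep only the checks whose condition fired, in checklist order.
    return [name for name, failed in checks if failed]
```

An empty list means every checked metric is within the assumed bounds. In practice a small tolerance above zero may be more useful than the strict comparison used here.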
Suggested Alarms
Producers
For producers, consider setting alarms on the following metrics:
Put Messages Latency: An increase in latency means that the messages are
taking longer to publish, which could indicate network issues.
Put Messages Total Throughput:
An increase in total throughput could indicate that the 1 MB per
second per partition limit will be reached, and that event will
trigger the throttling mechanism.
A decrease could mean that the client producer is having an issue or
is about to stop.
Put Messages Throttled Records: It's important to get notified when
messages are throttled.
Put Messages Failure: It's important to get notified if put messages
start failing.
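To make the throughput alarm concrete, the following sketch converts a Put Messages Total Throughput reading into a fraction of the 1 MB per second per partition limit and classifies it. Everything here is an assumption for illustration: the function names, the 0.8 and 0.1 thresholds, and the interpretation of 1 MB as 1,000,000 bytes.

```python
# Assumption: 1 MB is taken as 1,000,000 bytes; adjust if your limit is defined differently.
PUT_LIMIT_BYTES_PER_SEC_PER_PARTITION = 1_000_000


def put_utilization(bytes_in_window: float, window_seconds: float, partitions: int) -> float:
    """Return the fraction of the per-partition write limit used, averaged over the window."""
    if window_seconds <= 0 or partitions <= 0:
        raise ValueError("window_seconds and partitions must be positive")
    bytes_per_second = bytes_in_window / window_seconds
    return bytes_per_second / (partitions * PUT_LIMIT_BYTES_PER_SEC_PER_PARTITION)


def classify_throughput(utilization: float, high: float = 0.8, low: float = 0.1) -> str:
    """Map utilization to an alarm state using assumed thresholds."""
    if utilization >= high:
        return "approaching-limit"  # throttling may start soon
    if utilization <= low:
        return "low-traffic"        # the producer may be stalled or stopping
    return "ok"
```

For example, 54,000,000 bytes pushed in 60 seconds to a single-partition stream averages 900,000 bytes per second, or 90% of the limit, which this sketch reports as "approaching-limit". The average assumes that traffic is spread evenly across partitions, so a hot partition can be throttled at a lower overall utilization.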
Consumers
For consumers, consider setting similar alarms based on the following metrics: