The Oracle Cloud Infrastructure Streaming service provides a fully managed, scalable, and durable solution for ingesting and consuming high-volume data streams in real time. Use Streaming for any use case in which data is produced and processed continually and sequentially in a publish-subscribe messaging model.
You can use Streaming for:
Messaging
Use Streaming to decouple the components of large systems. Producers and consumers can use Streaming as an asynchronous message bus and act independently and at their own pace.
Metric and log ingestion
Use Streaming as an alternative to traditional file-scraping approaches to help make critical operational data more quickly available for indexing, analysis, and visualization.
Web or mobile activity data ingestion
Use Streaming for capturing activity from websites or mobile apps, such as page views, searches, or other user actions. You can use this information for real-time monitoring and analytics, and in data warehousing systems for offline processing and reporting.
Infrastructure and apps event processing
Use Streaming as a unified entry point for cloud components to report their lifecycle events for audit, accounting, and related activities.
Streaming Features
Streaming provides the following features:
Fully managed
Streaming is fully managed, from the underlying
infrastructure to its provisioning, deployment, maintenance, security patching,
and replication. Integration with Monitoring and
default metrics make operations easy.
Oracle manages stream partitions, and
consumer groups can track message offsets for you.
Durability and Availability
Messages published to the Streaming service are
synchronously replicated across three availability domains when
available. In regions with a single availability domain, the data is replicated
across multiple fault domains. This ensures that even the failure of an
availability domain or fault domain does not result in data loss. The result is
highly durable data.
Security
Streaming data is encrypted both at rest and in transit, ensuring message
integrity. You can let Oracle manage encryption, or use the Oracle Cloud Infrastructure Vault service to
securely store and manage your own encryption keys if you need to meet
specific compliance or security standards.
Private endpoints restrict access to a specified virtual cloud
network (VCN) within your tenancy so that its streams cannot be accessed
through the internet.
Kafka compatibility
Streaming makes it possible to offload the
setup, maintenance, and management of the infrastructure that hosting your own
Apache Kafka cluster requires.
Streaming is
compatible with most Kafka
APIs, allowing you to use applications written for Kafka to send
messages to and receive messages from the Streaming service without having to rewrite
your code. See Using Kafka APIs for more
information.
Streaming also takes
advantage of the Kafka Connect ecosystem to interface directly with
first-party and third-party products by using out-of-the-box Kafka source
and sink connectors. See Using Kafka Connect for more
information.
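As a rough illustration of the Kafka compatibility described above, the following sketch assembles the connection properties a Kafka client would typically need. The property keys are standard Kafka client settings. The endpoint pattern, the user name format, and the use of an auth token as the password are assumptions made for this example, and `kafka_client_config` is a hypothetical helper; confirm the actual values in Using Kafka APIs before relying on them.

```python
def kafka_client_config(region: str, tenancy: str, user: str,
                        stream_pool_ocid: str, auth_token: str) -> dict:
    """Build Kafka client properties for a stream pool (illustrative only)."""
    # Assumption: the SASL user name combines tenancy, user, and stream pool OCID.
    username = f"{tenancy}/{user}/{stream_pool_ocid}"
    # Standard Kafka JAAS line for the PLAIN mechanism.
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="{username}" password="{auth_token}";'
    )
    return {
        # Assumption: regional bootstrap endpoint pattern and port.
        "bootstrap.servers": f"cell-1.streaming.{region}.oci.oraclecloud.com:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.jaas.config": jaas,
    }


# Placeholder values only; never hard-code real credentials.
config = kafka_client_config(
    region="us-example-1",
    tenancy="exampletenancy",
    user="example.user@example.com",
    stream_pool_ocid="ocid1.streampool.oc1..example",
    auth_token="<auth-token>",
)
for name, value in config.items():
    if name != "sasl.jaas.config":   # avoid printing the credential line
        print(f"{name}={value}")
```

Because the result is a plain dictionary of properties, it can be written to a properties file or passed to whichever Kafka client library the application already uses.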
How Streaming Works
Here's how Streaming works:
A producer publishes messages to a stream, which is an append-only
log. These messages are distributed among Oracle-managed partitions for
scalability.
Partitions allow you to distribute a stream by splitting messages across multiple nodes
(or brokers). Each partition can be placed on a separate machine, allowing multiple
consumers to read a stream in parallel.
A consumer reads messages from one or more partitions. Consumers can read from any
partition regardless of where the partition is hosted. Each message within a partition is
marked with an offset value, so a consumer can pick up where it left off if it is
interrupted. Messages from a partition are guaranteed to be delivered in the same order
they were produced.
Consumers can read messages explicitly by providing the partition and offset, or as a member of a consumer group, which coordinates the consumption of an entire stream by the members of the group.
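The flow described above can be sketched with a small in-memory model. This is not the Streaming service or its SDK: the `Stream` class, its method names, and the key-hashing rule used to pick a partition are all hypothetical, chosen only to show append-only partitions, per-partition offsets, and a consumer resuming from the offset where it stopped.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Toy in-memory stream: a fixed number of append-only partitions."""
    partition_count: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.partition_count)]

    def partition_for(self, key: str) -> int:
        # Hypothetical routing rule: stable hash of the key modulo partition count.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.partition_count

    def publish(self, key: str, value: str) -> tuple:
        """Append a message and return (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition: int, offset: int, limit: int = 10) -> list:
        """Return up to `limit` (offset, key, value) tuples starting at `offset`."""
        log = self.partitions[partition]
        end = min(offset + limit, len(log))
        return [(i, *log[i]) for i in range(offset, end)]


stream = Stream(partition_count=3)
for n in range(5):
    stream.publish("user-42", f"event-{n}")

partition = stream.partition_for("user-42")
first_batch = stream.read(partition, offset=0, limit=2)
next_offset = first_batch[-1][0] + 1   # the consumer remembers where it stopped
second_batch = stream.read(partition, offset=next_offset)
print([value for _, _, value in first_batch + second_batch])
```

All five messages share a key, so they land in one partition and are read back in the order they were produced, even though the consumer stopped after the first two and resumed later.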
The following concepts are essential to understanding and working with Streaming.
stream
A partitioned, append-only log of messages.
stream pool
A grouping that you can use to organize and manage streams, including any
shared Kafka or security settings.
partition
A section of a stream. Partitions allow you to
distribute a stream by splitting messages across multiple nodes. This also
allows multiple consumers to read from a stream in parallel.
cursor
A pointer to a location in a stream. This location could be a pointer to a
specific offset or time in a partition, or to a group's current
location.
message
A Base64-encoded message that is published to a
stream. Streaming is schema-agnostic and accepts
any message format, including XML, JSON, CSV, and even compressed formats such
as gzip. Producers and consumers should agree upon the message format.
producer
An entity that publishes messages to a stream.
consumer
An entity that reads messages from one or more streams.
consumer group
A set of instances that coordinate to consume
messages from all partitions in a stream. At any given time, the messages from a
specific partition can be consumed by only one consumer in the group.
instance
A member of a consumer group. Instances are defined
when a group cursor is created. Group membership is maintained through
interaction; lack of interaction results in a timeout, removing the instance
from the consumer group.
key
An identifier used to group related messages.
offset
The location of a message within a partition. Each
message within the partition is identified by its offset. Consumers can read
messages starting from any chosen offset. You can use the offset to restart
reading from a stream if interrupted.
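To make the message and key concepts concrete, the following sketch Base64-encodes a JSON payload and its key, then decodes them again on the consumer side. The `key` and `value` field names and both helper functions are illustrative assumptions, not a documented request format; the point is only that producer and consumer must agree on the encoding and the payload format.

```python
import base64
import json


def encode_message(key: str, payload: dict) -> dict:
    """Serialize the payload as JSON, then Base64-encode key and value."""
    value = json.dumps(payload).encode("utf-8")
    return {
        "key": base64.b64encode(key.encode("utf-8")).decode("ascii"),
        "value": base64.b64encode(value).decode("ascii"),
    }


def decode_message(message: dict) -> tuple:
    """Reverse encode_message: Base64-decode, then parse the JSON payload."""
    key = base64.b64decode(message["key"]).decode("utf-8")
    payload = json.loads(base64.b64decode(message["value"]))
    return key, payload


encoded = encode_message("user-42", {"action": "page_view", "path": "/home"})
key, payload = decode_message(encoded)
print(encoded["key"], key, payload["action"])
```

The same pattern applies to XML, CSV, or compressed payloads: only the serialization step before the Base64 encoding changes.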
Benefits of Streams
Streams have several advantages over traditional messaging queues,
including:
Configurable message persistence
You control how long your data is retained. Messages in a stream are immutable
and available for the entirety of the stream's configured retention time.
Replay
Because a stream's messages are not removed immediately when processed by
consumers, you can replay any and all messages in the stream at any time within
the configured retention limit.
Message guarantees
Each message is guaranteed to be delivered at least once. In some cases, such as
a consumer's failure to commit messages before going offline, messages may be
delivered multiple times.
Order guarantees
Messages within a stream, per partition, are always delivered in the same order
that they were produced.
Client-side cursors
Your client applications control and track which messages are read and can
move the cursor as needed for maximum flexibility.
Horizontal scale
Partitions let you scale out throughput to meet the needs of
multiple consumers, which can read from a stream in parallel.
Consumer groups
Consumer groups handle all of the coordination that is required to deliver
messages to multiple consumers in a balanced manner. Because this management is
handled by a consumer group on behalf of all of its members, your applications
carry less coordination overhead and are easier to operate.
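The coordination a consumer group performs can be pictured with a simple assignment routine. The source does not specify how the service balances partitions, so the round-robin strategy below is an assumption, and `assign_partitions` and the instance names are hypothetical. The sketch shows the one property the text does state: each partition is owned by a single instance at a time, and ownership is redistributed when an instance leaves the group.

```python
def assign_partitions(partitions: list, instances: list) -> dict:
    """Spread partitions across instances round-robin; one owner per partition."""
    if not instances:
        raise ValueError("a consumer group needs at least one instance")
    assignment = {name: [] for name in instances}
    for index, partition in enumerate(sorted(partitions)):
        owner = instances[index % len(instances)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3, 4, 5]
before = assign_partitions(partitions, ["instance-a", "instance-b", "instance-c"])

# instance-c stops interacting and times out, so the group rebalances.
after = assign_partitions(partitions, ["instance-a", "instance-b"])

print(before)
print(after)
```

With three instances, each one owns two partitions; after the timeout, the remaining two instances own three partitions each, and no partition is left without a reader.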