This information describes using Streaming with Apache Kafka.
Oracle Cloud Infrastructure Streaming lets users of Apache Kafka offload the
setup, maintenance, and infrastructure management that hosting your own Zookeeper and
Kafka cluster requires.
Streaming is compatible with most Kafka APIs, allowing
you to use applications written for Kafka to send messages to and receive messages from
the Streaming service without having to rewrite your
code. See Using Kafka APIs for more
information.
Streaming can also utilize the Kafka Connect ecosystem to
interface directly with external sources like databases, object stores, or any
microservice on the Oracle Cloud. Kafka connectors can easily and automatically create,
publish to, and deliver topics while taking advantage of the Streaming service's high throughput and durability. See
Using Kafka Connect for more information.
Use cases for Streaming and Kafka include:
Move data from Streaming to Autonomous Data
Warehouse via the JDBC Connector to perform advanced analytics and
visualization.
Use the Oracle GoldenGate connector for Big Data to build an event-driven
application.
Move data from Streaming to Oracle Object Storage
via the HDFS/S3 Connector for long term storage, or to run Hadoop/Spark
jobs.
Kafka API Support
Streaming is fully upstream compatible with the latest
versions of Kafka APIs. Streaming supports the following
Kafka APIs:
The implementation of Streaming's Kafka compatibility
results in the following configurations, limitations, and behaviors.
Lossless Configuration 🔗
Streaming only supports lossless Kafka
configurations. Data is replicated three ways. Messages from producers do not
initiate an acknowledgment (ACK) from Streaming
until at least two replicas are in sync.
Unique Stream Names 🔗
If you have streams with the same names in a compartment, you can't use Kafka with
Streaming until you delete the duplicated
streams, unless the streams are in different stream pools. Two
streams with the same name can exist in the same compartment only if the streams are
in different stream pools.
Duplicate stream names otherwise manifest through an "authentication
failed" error. If you do not want to delete your streams, contact the Streaming team so we can rename your streams without
data loss.
Load Balancing Connection Recycling 🔗
Because the Kafka protocol uses long-lived TCP connections, the Streaming Kafka compatibility layer implements a
load balancing mechanism to periodically balance connections between front-end
nodes. This mechanism periodically closes connections to force new ones. Most Kafka
SDKs handle these disconnections automatically when consuming, but producing to Streaming using the Kafka API might raise
disconnection errors. Disconnections can be mitigated by adding retries to your
requests. Retries are part of the Kafka SDK and are automatically enabled, and you
can explicitly configure their behavior.