As a streaming platform, Apache Kafka provides low-latency, high-throughput, fault-tolerant publish-and-subscribe pipelines and excels at processing streams of real-time events. It delivers data reliably, with latencies on the order of milliseconds, to connect downstream systems with real-time data.
In this talk, we will show how easy it is to leverage Kafka and the Elasticsearch connector to keep your indices populated with the latest data from the rest of your enterprise, as it changes.
3. Apache Kafka®
Diagram: Kafka Cluster
A distributed commit log. Publish and subscribe to streams of records. Highly scalable, high throughput. Supports transactions. Persisted data.
Reads are a single seek and writes are append-only.
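To make the "append-only writes, single-seek reads" idea concrete, here is a toy in-memory log in Python. It is only a sketch of the concept: the `CommitLog` class and its method names are made up for illustration and say nothing about how a Kafka broker actually stores data on disk.

```python
class CommitLog:
    """Toy append-only log; a record's offset is its position in the list."""

    def __init__(self):
        self._records = []

    def append(self, record):
        # Writes only ever add to the end; existing records are never modified.
        self._records.append(record)
        return len(self._records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=10):
        # A read jumps straight to the requested offset and scans forward.
        return self._records[offset:offset + max_records]


log = CommitLog()
first = log.append(b"order-created")
log.append(b"order-paid")
print(first, log.read(0))
```

Because consumers address records by offset, each one can read at its own pace without affecting writers or other readers.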
4. Apache Kafka®
Kafka Streams API: Write standard Java applications & microservices to process your data in real-time
Kafka Connect API: Reliable and scalable integration of Kafka with other systems – no coding required.
Diagram labels: Orders, Table, Customers, Kafka Streams API
24. Sink properties: Converters
• JSON, Avro, String, Protobuf, etc.
• Specify the converter in the Kafka Connect configuration, e.g.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
• Kafka Connect uses pluggable converters for both message key and value deserialisation
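As a rough sketch of where the converter settings sit in a complete sink configuration, the Python snippet below assembles an Elasticsearch sink connector config and wraps it in the JSON shape that the Kafka Connect REST API accepts when creating a connector. The connector name `es-sink-orders`, the topic `orders`, and the address `http://localhost:9200` are placeholders rather than values from the talk, and the `schemas.enable` and `schema.ignore` settings assume plain JSON messages without an embedded schema. Check the property names against the connector version you run.

```python
import json


def build_es_sink_config(topics, es_url):
    """Return a Kafka Connect config dict for an Elasticsearch sink (illustrative values)."""
    return {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": ",".join(topics),
        "connection.url": es_url,
        # Converters as on the slide: JSON for both key and value.
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Assumption: messages are plain JSON with no schema/payload envelope.
        "key.converter.schemas.enable": "false",
        "value.converter.schemas.enable": "false",
        # Assumption: let Elasticsearch infer mappings instead of using a Connect schema.
        "schema.ignore": "true",
    }


config = build_es_sink_config(["orders"], "http://localhost:9200")
payload = {"name": "es-sink-orders", "config": config}
print(json.dumps(payload, indent=2))
```

The printed payload could then be sent with an HTTP POST to the `/connectors` endpoint of a Connect worker (port 8083 by default).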
28. Single Message Transform (SMT) -- Extract, TRANSFORM, Load…
• Modify events before storing in Kafka:
  • Mask/drop sensitive information
  • Set partitioning key
  • Store lineage
  • Cast data types
• Modify events going out of Kafka:
  • Direct events to different Elasticsearch indexes
  • Mask/drop sensitive information
  • Cast data types to match destination
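The sink-side transforms listed above might be configured along the following lines. This Python sketch flattens a small description of three transforms into the `transforms.*` properties that Kafka Connect reads, using the `MaskField`, `Cast`, and `RegexRouter` transforms that ship with Apache Kafka. The alias names, the field names `card_number` and `price`, and the `prod-` topic prefix are hypothetical. The routing step relies on the assumption that the Elasticsearch connector derives the index name from the (renamed) topic name.

```python
def smt_properties(transforms):
    """Flatten {alias: {"type": ..., setting: ...}} into Kafka Connect 'transforms.*' properties."""
    # The 'transforms' property lists the aliases in the order they are applied.
    props = {"transforms": ",".join(transforms)}
    for alias, settings in transforms.items():
        for key, value in settings.items():
            props[f"transforms.{alias}.{key}"] = value
    return props


sink_transforms = {
    # Mask a sensitive field in the record value (hypothetical field name).
    "maskCard": {
        "type": "org.apache.kafka.connect.transforms.MaskField$Value",
        "fields": "card_number",
    },
    # Cast a field to the type expected by the destination.
    "castPrice": {
        "type": "org.apache.kafka.connect.transforms.Cast$Value",
        "spec": "price:float64",
    },
    # Rename the topic, e.g. prod-orders becomes orders-index.
    "routeIndex": {
        "type": "org.apache.kafka.connect.transforms.RegexRouter",
        "regex": "prod-(.*)",
        "replacement": "$1-index",
    },
}

props = smt_properties(sink_transforms)
for key, value in props.items():
    print(f"{key}={value}")
```

The resulting properties would be added to the connector configuration alongside the converter settings; transforms run in the order given in the `transforms` list.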
29. Confluent Platform: Enterprise Streaming based on Apache Kafka®
Diagram: data flows through Confluent Platform
• Event sources: Database Changes, Log Events, IoT Data, Web Events, …
• Connected systems: CRM, Data Warehouse, Database, Hadoop, Data Integration, …
• Applications: Monitoring, Analytics, Custom Apps, Transformations, Real-time Applications, …
Confluent Platform components:
• Apache Kafka®: Core | Connect API | Streams API
• Data Compatibility: Schema Registry
• Monitoring & Administration: Confluent Control Center | Security
• Operations: Replicator | Auto Data Balancing
• Development and Connectivity: Clients | Connectors | REST Proxy | CLI
• SQL Stream Processing: KSQL
Legend: Apache Open Source | Confluent Open Source | Confluent Enterprise
30.
https://www.confluent.io/download/
Streaming ETL, powered by Apache Kafka and Confluent Platform
https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
https://docs.confluent.io/current/connect/connect-elasticsearch/docs/