SlideShare une entreprise Scribd logo
1  sur  20
11
Introduction to Apache Kafka and Confluent
... and why they matter!
First Italian Kafka Meetup
Wednesday, November 29th 2017
18:45 – 20:30
PoliHub – Startup District & Incubator
via Durando 39, Milano
https://www.meetup.com/Milano-Kafka-meetup/events/244352352/
22
How Organizations Handle Data Flows: a Giant Mess
Data
Warehouse
Hadoop
NoSQL
Oracle
SFDC
Logging
Bloomberg
…any sink/source
Web Custom Apps Microservices Monitoring Analytics
…and more
OLTP
ActiveMQ
App App
Caches
OLTP OLTPAppAppApp
33
Apache Kafka™: A Distributed Streaming Platform
Apache Kafka
Offline Batch (+1 Hour)Near-Real Time (>100s ms)Real Time (0-100 ms)
Data
Warehouse
Hadoop
NoSQL
Oracle
SFDC
Twitter
Bloomberg
…any sink/source …any sink/source
…and more
Web Custom Apps Microservices Monitoring Analytics
44
Over 35% of Fortune 500’s are using Apache Kafka™
6 of top 10
Travel
7 of top 10
Global banks
8 of top 10
Insurance
9 of top 10
Telecom
55
Industry Trends… and why Apache Kafka matters!
1. From ‘big data’ (batch) to ‘fast data’ (stream processing)
2. Internet of Things (IoT) and sensor data
3. Microservices and asynchronous communication (coordination
messages and data streams) between loosely coupled and fine-
grained services
66
Apache Kafka APIs – A UNIX Analogy
$ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt
Connect APIs
Streams APIs
Producer / Consumer APIs
77
Apache Kafka API – ETL Analogy
Source SinkConnectAPI
ConnectAPI
Streams API
Extract Transform Load
88
The Connect API of Apache Kafka®
 Centralized management and configuration
 Support for hundreds of technologies
including RDBMS, Elasticsearch, HDFS, S3
 Supports CDC ingest of events from RDBMS
 Preserves data schema
 Fault tolerant and automatically load balanced
 Extensible API
 Single Message Transforms
 Part of Apache Kafka, included in
Confluent Open Source
Reliable and scalable integration of Kafka
with other systems – no coding required.
{
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo",
"table.whitelist": "sales,orders,customers"
}
https://docs.confluent.io/current/connect/
99
The Streams API of Apache Kafka®
 No separate processing cluster required
 Develop on Mac, Linux, Windows
 Deploy to containers, VMs, bare metal, cloud
 Powered by Kafka: elastic, scalable,
distributed, battle-tested
 Perfect for small, medium, large use cases
 Fully integrated with Kafka security
 Exactly-once processing semantics
 Part of Apache Kafka, included in
Confluent Open Source
Write standard Java applications and microservices
to process your data in real-time
KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");
KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
.groupByKey()
.count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views");
https://docs.confluent.io/current/streams/
1010
KSQL: a Streaming SQL Engine for Apache Kafka® from Confluent
 No coding required, all you need is SQL
 No separate processing cluster required
 Powered by Kafka: elastic, scalable,
distributed, battle-tested
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
CREATE STREAM vip_actions AS
SELECT userid, page, action
FROM clickstream c
LEFT JOIN users u
ON c.userid = u.userid
WHERE u.level = 'Platinum';
KSQL is the simplest way to process streams of data in real-time
 Perfect for streaming ETL, anomaly detection,
event monitoring, and more
 Part of Confluent Open Source
https://github.com/confluentinc/ksql
1111
Confluent Enterprise: Logical Architecture
Kafka Cluster
Mainframe
Kafka Connect Servers
Kafka ConnectRDBMS
Hadoop
Cassandra
Elasticsearch
Kafka Connect Servers
Kafka Connect
Files
Producer
Application
Consumer
ApplicationZookeeper
Kafka Broker
REST Proxy Servers
REST Proxy
REST Client
Control Center Servers
Control Center
Schema Registry Servers
Schema Registry
Kafka Producer APIs Kafka Consumer APIs
Stream Processing Application 1
Stream Client
Stream Processing Application 2
Stream Client
1212
Confluent Enterprise: Physical Architecture
Rack 1
Kafka Broker #1
ToR Switch
ToR Switch
Schema Registry #1
Kafka Connect #1
Zookeeper #1
REST Proxy #1
Kafka Broker #4
Zookeeper #4
Rack 2
Kafka Broker #2
ToR Switch
ToR Switch
Schema Registry #2
Kafka Connect #2
Zookeeper #2
Kafka Broker #5
Zookeeper #5
Rack 3
Kafka Broker #3
ToR Switch
ToR Switch
Kafka Connect #3
Zookeeper #3
Core Switch Core Switch
REST Proxy #2
Load Balancer Load Balancer
Control Center #1 Control Center #2
1313
Confluent Completes Kafka
Feature Benefit Apache Kafka Confluent Open Source Confluent Enterprise
Apache Kafka
High throughput, low latency, high availability, secure distributed streaming
platform
Kafka Connect API Advanced API for connecting external sources/destinations into Kafka
Kafka Streams API
Simple library that enables streaming application development within the
Kafka framework
Additional Clients Supports non-Java clients; C, C++, Python, .NET and several others
REST Proxy
Provides universal access to Kafka from any network connected device via
HTTP
Schema Registry
Central registry for the format of Kafka data – guarantees all data is always
consumable
Pre-Built Connectors
HDFS, JDBC, Elasticsearch, Amazon S3 and other connectors fully certified
and supported by Confluent
JMS Client
Support for legacy Java Message Service (JMS) applications consuming
and producing directly from Kafka
Confluent Control
Center
Enables easy connector management, monitoring and alerting for a Kafka
cluster
Auto Data Balancer Rebalancing data across cluster to remove bottlenecks
Replicator Multi-datacenter replication simplifies and automates MDC Kafka clusters
Support
Enterprise class support to keep your Kafka environment running at top
performance Community Community 24x7x365
1414
Big Data and Fast Data Ecosystems
Synchronous Req/Response
0 – 100s ms
Near Real Time
> 100s ms
Offline Batch
> 1 hour
Apache Kafka
Stream Data Platform
Search
RDBMS
Apps Monitoring
Real-time
Analytics
NoSQL
Stream
Processing
Apache Hadoop
Data Lake
Impala
DWH
Hive
Spark Map-Reduce
Confluent HDFS Connector
(exactly once semantics)
https://www.confluent.io/blog/the-value-of-apache-kafka-in-big-data-ecosystem/
1515
Building a Microservices Ecosystem with Kafka Streams and KSQL
https://www.confluent.io/blog/building-a-microservices-ecosystem-with-kafka-streams-and-ksql/
https://github.com/confluentinc/kafka-streams-examples/tree/3.3.0-post/src/main/java/io/confluent/examples/streams/microservices
1616
About Confluent and Apache Kafka™
70% of active Kafka
Committers
Founded
September 2014
Technology developed
while at LinkedIn
Founded by the creators of
Apache Kafka
1717
Apache Kafka: PMC members and committers
https://kafka.apache.org/committers
PMC
PMC PMC PMCPMC PMC PMC PMC
PMC PMC PMC
1818
Download Confluent Platform: the easiest way to get you started
https://www.confluent.io/download/
1919
Books: get them all three in PDF format from Confluent website!
https://www.confluent.io/apache-kafka-stream-processing-book-bundle
2020
Discount code: kacom17
Presented by
https://kafka-summit.org/
Presented by

Contenu connexe

Tendances

Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Kai Wähner
 

Tendances (20)

Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Introduction to Kafka connect
Introduction to Kafka connectIntroduction to Kafka connect
Introduction to Kafka connect
 
ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!ksqlDB - Stream Processing simplified!
ksqlDB - Stream Processing simplified!
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
kafka
kafkakafka
kafka
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
An Introduction to Confluent Cloud: Apache Kafka as a Service
An Introduction to Confluent Cloud: Apache Kafka as a ServiceAn Introduction to Confluent Cloud: Apache Kafka as a Service
An Introduction to Confluent Cloud: Apache Kafka as a Service
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB)
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
From Zero to Hero with Kafka Connect
From Zero to Hero with Kafka ConnectFrom Zero to Hero with Kafka Connect
From Zero to Hero with Kafka Connect
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Stream processing using Kafka
Stream processing using KafkaStream processing using Kafka
Stream processing using Kafka
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
Event driven architecture with Kafka
Event driven architecture with KafkaEvent driven architecture with Kafka
Event driven architecture with Kafka
 
Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?Kafka Streams: What it is, and how to use it?
Kafka Streams: What it is, and how to use it?
 
Common issues with Apache Kafka® Producer
Common issues with Apache Kafka® ProducerCommon issues with Apache Kafka® Producer
Common issues with Apache Kafka® Producer
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 

Similaire à Introduction to Apache Kafka and Confluent... and why they matter

Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Kai Wähner
 

Similaire à Introduction to Apache Kafka and Confluent... and why they matter (20)

Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!Introduction to Apache Kafka and Confluent... and why they matter!
Introduction to Apache Kafka and Confluent... and why they matter!
 
Introducing Confluent Cloud: Apache Kafka as a Service
Introducing Confluent Cloud: Apache Kafka as a Service Introducing Confluent Cloud: Apache Kafka as a Service
Introducing Confluent Cloud: Apache Kafka as a Service
 
Introduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - MadridIntroduction to Apache Kafka and why it matters - Madrid
Introduction to Apache Kafka and why it matters - Madrid
 
JHipster conf 2019 - Kafka Ecosystem
JHipster conf 2019 - Kafka EcosystemJHipster conf 2019 - Kafka Ecosystem
JHipster conf 2019 - Kafka Ecosystem
 
Beyond the brokers - Un tour de l'écosystème Kafka
Beyond the brokers - Un tour de l'écosystème KafkaBeyond the brokers - Un tour de l'écosystème Kafka
Beyond the brokers - Un tour de l'écosystème Kafka
 
Beyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka EcosystemBeyond the Brokers: A Tour of the Kafka Ecosystem
Beyond the Brokers: A Tour of the Kafka Ecosystem
 
Beyond the brokers - A tour of the Kafka ecosystem
Beyond the brokers - A tour of the Kafka ecosystemBeyond the brokers - A tour of the Kafka ecosystem
Beyond the brokers - A tour of the Kafka ecosystem
 
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
Apache Kafka vs. Traditional Middleware (Kai Waehner, Confluent) Frankfurt 20...
 
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
Apache Kafka vs. Integration Middleware (MQ, ETL, ESB) - Friends, Enemies or ...
 
Confluent and Syncsort Webinar August 2016
Confluent and Syncsort Webinar August 2016Confluent and Syncsort Webinar August 2016
Confluent and Syncsort Webinar August 2016
 
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
Applying ML on your Data in Motion with AWS and Confluent | Joseph Morais, Co...
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
 
Confluent and Elastic: a Lovely Couple - Elastic Stack in a Day 2018
Confluent and Elastic: a Lovely Couple - Elastic Stack in a Day 2018Confluent and Elastic: a Lovely Couple - Elastic Stack in a Day 2018
Confluent and Elastic: a Lovely Couple - Elastic Stack in a Day 2018
 
Confluent Enterprise Datasheet
Confluent Enterprise DatasheetConfluent Enterprise Datasheet
Confluent Enterprise Datasheet
 
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features)
 
Kafka Basic For Beginners
Kafka Basic For BeginnersKafka Basic For Beginners
Kafka Basic For Beginners
 
Apache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming PlatformApache Kafka - A Distributed Streaming Platform
Apache Kafka - A Distributed Streaming Platform
 
Apache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platformApache kafka-a distributed streaming platform
Apache kafka-a distributed streaming platform
 
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
Apache Kafka + Apache Mesos + Kafka Streams - Highly Scalable Streaming Micro...
 
Kafka Vienna Meetup 020719
Kafka Vienna Meetup 020719Kafka Vienna Meetup 020719
Kafka Vienna Meetup 020719
 

Plus de confluent

Plus de confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Dernier

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Dernier (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Introduction to Apache Kafka and Confluent... and why they matter

  • 1. 11 Introduction to Apache Kafka and Confluent ... and why they matter! First Italian Kafka Meetup Wednesday, November 29th 2017 18:45 – 20:30 PoliHub – Startup District & Incubator via Durando 39, Milano https://www.meetup.com/Milano-Kafka-meetup/events/244352352/
  • 2. 22 How Organizations Handle Data Flows: a Giant Mess Data Warehouse Hadoop NoSQL Oracle SFDC Logging Bloomberg …any sink/source Web Custom Apps Microservices Monitoring Analytics …and more OLTP ActiveMQ App App Caches OLTP OLTPAppAppApp
  • 3. 33 Apache Kafka™: A Distributed Streaming Platform Apache Kafka Offline Batch (+1 Hour)Near-Real Time (>100s ms)Real Time (0-100 ms) Data Warehouse Hadoop NoSQL Oracle SFDC Twitter Bloomberg …any sink/source …any sink/source …and more Web Custom Apps Microservices Monitoring Analytics
  • 4. 44 Over 35% of Fortune 500’s are using Apache Kafka™ 6 of top 10 Travel 7 of top 10 Global banks 8 of top 10 Insurance 9 of top 10 Telecom
  • 5. 55 Industry Trends… and why Apache Kafka matters! 1. From ‘big data’ (batch) to ‘fast data’ (stream processing) 2. Internet of Things (IoT) and sensor data 3. Microservices and asynchronous communication (coordination messages and data streams) between loosely coupled and fine- grained services
  • 6. 66 Apache Kafka APIs – A UNIX Analogy $ cat < in.txt | grep "apache" | tr a-z A-Z > out.txt Connect APIs Streams APIs Producer / Consumer APIs
  • 7. 77 Apache Kafka API – ETL Analogy Source SinkConnectAPI ConnectAPI Streams API Extract Transform Load
  • 8. 88 The Connect API of Apache Kafka®  Centralized management and configuration  Support for hundreds of technologies including RDBMS, Elasticsearch, HDFS, S3  Supports CDC ingest of events from RDBMS  Preserves data schema  Fault tolerant and automatically load balanced  Extensible API  Single Message Transforms  Part of Apache Kafka, included in Confluent Open Source Reliable and scalable integration of Kafka with other systems – no coding required. { "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "connection.url": "jdbc:mysql://localhost:3306/demo?user=rmoff&password=foo", "table.whitelist": "sales,orders,customers" } https://docs.confluent.io/current/connect/
  • 9. 99 The Streams API of Apache Kafka®  No separate processing cluster required  Develop on Mac, Linux, Windows  Deploy to containers, VMs, bare metal, cloud  Powered by Kafka: elastic, scalable, distributed, battle-tested  Perfect for small, medium, large use cases  Fully integrated with Kafka security  Exactly-once processing semantics  Part of Apache Kafka, included in Confluent Open Source Write standard Java applications and microservices to process your data in real-time KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic"); KTable<Windowed<User>, Long> viewsPerUserSession = pageViews .groupByKey() .count(SessionWindows.with(TimeUnit.MINUTES.toMillis(5)), "session-views"); https://docs.confluent.io/current/streams/
  • 10. 1010 KSQL: a Streaming SQL Engine for Apache Kafka® from Confluent  No coding required, all you need is SQL  No separate processing cluster required  Powered by Kafka: elastic, scalable, distributed, battle-tested CREATE TABLE possible_fraud AS SELECT card_number, count(*) FROM authorization_attempts WINDOW TUMBLING (SIZE 5 SECONDS) GROUP BY card_number HAVING count(*) > 3; CREATE STREAM vip_actions AS SELECT userid, page, action FROM clickstream c LEFT JOIN users u ON c.userid = u.userid WHERE u.level = 'Platinum'; KSQL is the simplest way to process streams of data in real-time  Perfect for streaming ETL, anomaly detection, event monitoring, and more  Part of Confluent Open Source https://github.com/confluentinc/ksql
  • 11. 1111 Confluent Enterprise: Logical Architecture Kafka Cluster Mainframe Kafka Connect Servers Kafka ConnectRDBMS Hadoop Cassandra Elasticsearch Kafka Connect Servers Kafka Connect Files Producer Application Consumer ApplicationZookeeper Kafka Broker REST Proxy Servers REST Proxy REST Client Control Center Servers Control Center Schema Registry Servers Schema Registry Kafka Producer APIs Kafka Consumer APIs Stream Processing Application 1 Stream Client Stream Processing Application 2 Stream Client
  • 12. 1212 Confluent Enterprise: Physical Architecture Rack 1 Kafka Broker #1 ToR Switch ToR Switch Schema Registry #1 Kafka Connect #1 Zookeeper #1 REST Proxy #1 Kafka Broker #4 Zookeeper #4 Rack 2 Kafka Broker #2 ToR Switch ToR Switch Schema Registry #2 Kafka Connect #2 Zookeeper #2 Kafka Broker #5 Zookeeper #5 Rack 3 Kafka Broker #3 ToR Switch ToR Switch Kafka Connect #3 Zookeeper #3 Core Switch Core Switch REST Proxy #2 Load Balancer Load Balancer Control Center #1 Control Center #2
  • 13. 1313 Confluent Completes Kafka Feature Benefit Apache Kafka Confluent Open Source Confluent Enterprise Apache Kafka High throughput, low latency, high availability, secure distributed streaming platform Kafka Connect API Advanced API for connecting external sources/destinations into Kafka Kafka Streams API Simple library that enables streaming application development within the Kafka framework Additional Clients Supports non-Java clients; C, C++, Python, .NET and several others REST Proxy Provides universal access to Kafka from any network connected device via HTTP Schema Registry Central registry for the format of Kafka data – guarantees all data is always consumable Pre-Built Connectors HDFS, JDBC, Elasticsearch, Amazon S3 and other connectors fully certified and supported by Confluent JMS Client Support for legacy Java Message Service (JMS) applications consuming and producing directly from Kafka Confluent Control Center Enables easy connector management, monitoring and alerting for a Kafka cluster Auto Data Balancer Rebalancing data across cluster to remove bottlenecks Replicator Multi-datacenter replication simplifies and automates MDC Kafka clusters Support Enterprise class support to keep your Kafka environment running at top performance Community Community 24x7x365
  • 14. 1414 Big Data and Fast Data Ecosystems Synchronous Req/Response 0 – 100s ms Near Real Time > 100s ms Offline Batch > 1 hour Apache Kafka Stream Data Platform Search RDBMS Apps Monitoring Real-time Analytics NoSQL Stream Processing Apache Hadoop Data Lake Impala DWH Hive Spark Map-Reduce Confluent HDFS Connector (exactly once semantics) https://www.confluent.io/blog/the-value-of-apache-kafka-in-big-data-ecosystem/
  • 15. 1515 Building a Microservices Ecosystem with Kafka Streams and KSQL https://www.confluent.io/blog/building-a-microservices-ecosystem-with-kafka-streams-and-ksql/ https://github.com/confluentinc/kafka-streams-examples/tree/3.3.0-post/src/main/java/io/confluent/examples/streams/microservices
  • 16. 1616 About Confluent and Apache Kafka™ 70% of active Kafka Committers Founded September 2014 Technology developed while at LinkedIn Founded by the creators of Apache Kafka
  • 17. 1717 Apache Kafka: PMC members and committers https://kafka.apache.org/committers PMC PMC PMC PMCPMC PMC PMC PMC PMC PMC PMC
  • 18. 1818 Download Confluent Platform: the easiest way to get you started https://www.confluent.io/download/
  • 19. 1919 Books: get them all three in PDF format from Confluent website! https://www.confluent.io/apache-kafka-stream-processing-book-bundle
  • 20. 2020 Discount code: kacom17 Presented by https://kafka-summit.org/ Presented by