Delivering Fast Data Systems with Kafka
LANDOOP
www.landoop.com
Antonios Chalkiopoulos
18/1/2017
@chalkiopoulos
Open Source contributor
Big Data projects in Media, Betting, Retail and Investment Banks in London
Books
Author, Programming MapReduce with Scalding


Founder of Landoop
DevOps Big Data Scala
Automation Distributed Systems Monitoring
Hadoop Fast Data / Streams Kafka
KAFKA CONNECT
a bit of context
KAFKA CONNECT
“a common framework
for allowing stream data flow
between kafka and other systems”
Data is produced by a source and consumed by a sink.
[Diagram: Data Source -> Kafka Connect -> KAFKA -> Kafka Connect -> Data Sink, with stream processing attached to KAFKA; the final build labels the pipeline E T L]
Developers don’t care about:

Move data to/from sink/source
Support delivery semantics
Offset Management
Serialization / de-serialization
Partitioning / Scalability
Fault tolerance / fail-over
Schema Registry integration
Developers care about:

Domain specific transformations
CONNECTORS
Kafka Connect’s framework allows developers to create connectors that copy data to/from other systems, and lets users run them just by writing configuration files and submitting them to Connect, with no code necessary.
Connector configurations are key-value mappings:
name             connector’s unique name
connector.class  connector’s Java class
tasks.max        maximum tasks to create
topics           list of topics (to source or sink data)
Introducing a query language for the connectors:
name             connector’s unique name
connector.class  connector’s Java class
tasks.max        maximum tasks to create
topics           list of topics (to source or sink data)
query            KCQL query that specifies fields/actions for the target system
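As a rough illustration of such a key-value mapping, the Scala sketch below builds a configuration with the five keys from the slide and renders it as properties-style lines. Every value (connector name, class, topic, target) is a hypothetical placeholder, and the `query` key is taken from the slide as written; a real connector may expect a different, connector-specific property name.

```scala
object ConnectorConfigSketch {
  // Keys are the ones listed on the slide; all values are hypothetical placeholders.
  val config: Map[String, String] = Map(
    "name"            -> "demo-sink",                             // hypothetical connector name
    "connector.class" -> "com.example.HypotheticalSinkConnector", // hypothetical class
    "tasks.max"       -> "1",
    "topics"          -> "DeviceTopic",                           // hypothetical topic
    "query"           -> "INSERT INTO measurements SELECT * FROM DeviceTopic"
  )

  // Render the mapping as key=value lines, sorted by key for stable output.
  def toProperties(cfg: Map[String, String]): String =
    cfg.toSeq.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString("\n")

  def main(args: Array[String]): Unit = println(toProperties(config))
}
```

The point of the sketch is only that a connector instance is fully described by flat string pairs, which is why it can be submitted to Connect without writing connector code.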
KCQL
Kafka Connect Query Language
is a SQL-like syntax allowing streamlined configuration of Kafka Sink Connectors, and then some more.
Example:
Project fields, rename or ignore them and further customise in plain text
INSERT INTO transactions SELECT field1 AS column1, field2 AS column2, field3 FROM TransactionTopic;
INSERT INTO audits SELECT * FROM AuditsTopic;
INSERT INTO logs SELECT * FROM LogsTopic AUTOEVOLVE;
INSERT INTO invoices SELECT * FROM InvoiceTopic PK invoiceID;
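To make the shape of these statements concrete, here is a toy Scala sketch that pulls the target, the projected fields and the topic out of the core `INSERT INTO ... SELECT ... FROM ...` form. It is not the KCQL library's parser: the names `KcqlSketch`, `Field` and `Mapping` are invented for illustration, and trailing keywords such as `PK` or `AUTOEVOLVE` are accepted but ignored.

```scala
object KcqlSketch {
  // One projected field: its name in the Kafka record and its name in the target.
  final case class Field(source: String, target: String)
  final case class Mapping(target: String, fields: Seq[Field], topic: String, selectAll: Boolean)

  // Matches only the core "INSERT INTO <target> SELECT <fields> FROM <topic>" shape;
  // anything after the topic name is ignored by this sketch.
  private val Core =
    """(?i)\s*INSERT\s+INTO\s+(\S+)\s+SELECT\s+(.+?)\s+FROM\s+([^\s;]+).*""".r

  def parse(kcql: String): Option[Mapping] = kcql match {
    case Core(target, projection, topic) =>
      val parts = projection.split(",").map(_.trim).filter(_.nonEmpty).toSeq
      val fields = parts.filter(_ != "*").map { p =>
        p.split("(?i)\\s+AS\\s+") match {
          case Array(src, alias) => Field(src.trim, alias.trim) // "field1 AS column1"
          case _                 => Field(p, p)                 // no alias: keep the name
        }
      }
      Some(Mapping(target, fields, topic, selectAll = parts.contains("*")))
    case _ => None
  }
}
```

For the first statement above, `KcqlSketch.parse` yields the target `transactions`, the topic `TransactionTopic`, and three fields, two of them renamed.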
So while integrating Kafka with in-memory data grids, key-value and document stores, NoSQL, search and other systems...
INSERT INTO $TARGET
SELECT *|columns(i.e col1,col2 | col1 AS column1,col2)
FROM $TOPIC_NAME
[ IGNORE columns ]
[ AUTOCREATE ]
[ PK columns ]
[ AUTOEVOLVE ]
[ BATCH = N ]
[ CAPITALIZE ]
[ INITIALIZE ]
[ PARTITIONBY cola[,colb] ]
[ DISTRIBUTEBY cola[,colb] ]
[ CLUSTERBY cola[,colb] ]
[ TIMESTAMP cola|sys_current ]
[ STOREAS $YOUR_TYPE([key=value, .....]) ]
[ WITHFORMAT TEXT|AVRO|JSON|BINARY|OBJECT|MAP ]
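The optional clauses in this grammar can be thought of as switches on a statement. The Scala sketch below models a small subset of them (IGNORE, AUTOCREATE, PK, AUTOEVOLVE, BATCH) and renders a statement string, emitting clauses in the order the grammar lists them. The `Kcql` case class and its field names are assumptions made for illustration, not part of any connector API, and whether a given connector accepts every clause is not something this sketch checks.

```scala
object KcqlBuilderSketch {
  // Hypothetical model of a KCQL statement covering a subset of the grammar.
  final case class Kcql(
      target: String,
      topic: String,
      fields: Seq[String] = Seq("*"),
      ignore: Seq[String] = Nil,
      autoCreate: Boolean = false,
      primaryKeys: Seq[String] = Nil,
      autoEvolve: Boolean = false,
      batch: Option[Int] = None
  ) {
    // Emits optional clauses in the order the grammar lists them.
    def render: String = {
      val optional = Seq(
        if (ignore.nonEmpty) Some("IGNORE " + ignore.mkString(",")) else None,
        if (autoCreate) Some("AUTOCREATE") else None,
        if (primaryKeys.nonEmpty) Some("PK " + primaryKeys.mkString(",")) else None,
        if (autoEvolve) Some("AUTOEVOLVE") else None,
        batch.map(n => s"BATCH = $n")
      ).flatten
      val core = Seq(s"INSERT INTO $target", s"SELECT ${fields.mkString(",")}", s"FROM $topic")
      (core ++ optional).mkString(" ")
    }
  }
}
```

For example, `Kcql("invoices", "InvoiceTopic", primaryKeys = Seq("invoiceID")).render` reproduces the invoices statement shown earlier, without the trailing semicolon.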
KCQL
What does it look like?
Topic to target mapping
Field selection
Auto creation
Auto evolution
Error policies
Multiple KCQLs / topic
- Field extraction
- Access to Key & Metadata
Why KCQL?
KCQL
Advanced Features Examples
KCQL |
{ "sensor_id": "01" , "temperature": 52.7943, "ts": 1484648810 }
{ "sensor_id": "02" , "temperature": 28.8597, "ts": 1484648810 }
Example Kafka topic with IoT data
INSERT INTO sensor_ringbuffer
SELECT sensor_id, temperature, ts
FROM coap_sensor_topic
WITHFORMAT JSON
STOREAS RING_BUFFER

INSERT INTO sensor_reliabletopic
SELECT sensor_id, temperature, ts
FROM coap_sensor_topic
WITHFORMAT AVRO
STOREAS RELIABLE_TOPIC

INSERT INTO FXSortedSet
SELECT symbol, price
FROM yahooFX-topic
STOREAS SortedSet(score=ts)

SELECT price
FROM yahooFX-topic
PK symbol
STOREAS SortedSet(score=ts)

KCQL |
{ "symbol": "USDGBP" , "price": 0.7943, "ts": 1484648810 }
{ "symbol": "EURGBP" , "price": 0.8597, "ts": 1484648810 }
Example Kafka topic with FX data
B:1 A:2 D:3 C:20
Sorted Set -> { value : score }
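The "value : score" idea can be sketched in a few lines of Scala. This is an in-memory stand-in written for illustration, not the target store's implementation: `ScoredSet` is a hypothetical name, scores are modelled as `Long` because the examples use `score=ts`, and ties are broken by member name purely to keep the ordering deterministic.

```scala
object SortedSetSketch {
  // Each member has exactly one score; re-adding a member replaces its score,
  // and reads come back ordered by score.
  final case class ScoredSet(scores: Map[String, Long] = Map.empty) {
    def add(member: String, score: Long): ScoredSet =
      copy(scores = scores.updated(member, score))

    // Ascending by score, then by member name for equal scores.
    def ordered: Seq[(String, Long)] =
      scores.toSeq.sortBy { case (member, score) => (score, member) }
  }

  def main(args: Array[String]): Unit = {
    // Insert in arbitrary order; the set still reads back ordered by score.
    val set = ScoredSet().add("C", 20).add("A", 2).add("D", 3).add("B", 1)
    println(set.ordered.map { case (m, s) => s"$m:$s" }.mkString(" "))
  }
}
```

With `score=ts`, each FX record's timestamp plays the role of the score, so entries read back in time order.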
Stream reactor connectors support KCQL
kafka-connect-blockchain
kafka-connect-bloomberg
kafka-connect-cassandra
kafka-connect-coap
kafka-connect-druid
kafka-connect-elastic
kafka-connect-ftp
kafka-connect-hazelcast
kafka-connect-hbase
kafka-connect-influxdb
kafka-connect-jms
kafka-connect-kudu
kafka-connect-mongodb
kafka-connect-mqtt
kafka-connect-redis
kafka-connect-rethink
kafka-connect-voltdb
kafka-connect-yahoo
Source: https://github.com/datamountaineer/stream-reactor
Integration Tests: http://coyote.landoop.com/connect/
DEMO
Kafka Connect InfluxDB
We’ll need:
• Zookeeper
• Kafka Broker
• Schema Registry
• Kafka Connect Distributed
• Kafka REST Proxy
We’ll also use:
• StreamReactor connectors
• Landoop Fast Data Web Tools
docker run --rm -it \
  -p 2181:2181 -p 3030:3030 -p 8081:8081 \
  -p 8082:8082 -p 8083:8083 -p 9092:9092 \
  -e ADV_HOST=192.168.99.100 \
  landoop/fast-data-dev
case class DeviceMeasurements(
  deviceId: Int,
  temperature: Int,
  moreData: String,
  timestamp: Long)
We’ll generate some Avro messages
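A minimal sketch of what generating such messages could start from, assuming nothing beyond the Scala standard library: it produces reproducible sample `DeviceMeasurements` values and carries a hand-written Avro record schema that mirrors the case class. The namespace, value ranges and the `generate` helper are hypothetical, and actual Avro serialization and the Kafka producer are deliberately left out.

```scala
import scala.util.Random

object DeviceMeasurementsSketch {
  // Same shape as the case class on the slide.
  final case class DeviceMeasurements(
      deviceId: Int,
      temperature: Int,
      moreData: String,
      timestamp: Long)

  // Hand-written Avro record schema mirroring the case class.
  // The namespace is a hypothetical placeholder.
  val avroSchema: String =
    """{
      |  "type": "record",
      |  "name": "DeviceMeasurements",
      |  "namespace": "com.example.demo",
      |  "fields": [
      |    {"name": "deviceId", "type": "int"},
      |    {"name": "temperature", "type": "int"},
      |    {"name": "moreData", "type": "string"},
      |    {"name": "timestamp", "type": "long"}
      |  ]
      |}""".stripMargin

  // Generates n pseudo-random readings; a fixed seed keeps runs reproducible.
  def generate(
      n: Int,
      seed: Long = 42L,
      now: Long = System.currentTimeMillis()): Seq[DeviceMeasurements] = {
    val rnd = new Random(seed)
    (0 until n).map { i =>
      DeviceMeasurements(
        deviceId = rnd.nextInt(10) + 1,     // device ids 1..10 (arbitrary)
        temperature = rnd.nextInt(41) + 10, // 10..50 (arbitrary range)
        moreData = s"reading-$i",
        timestamp = now + i
      )
    }
  }
}
```

In a real pipeline each generated value would be converted to an Avro record against this schema and sent through a producer configured with the Schema Registry.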
DEMO
Kafka Development Environment
@ Fast-data-dev docker image
https://hub.docker.com/r/landoop/fast-data-dev/
DEMO
Integration testing with Coyote
for connectors & infrastructure
https://github.com/Landoop/coyote
Schema Registry UI
https://github.com/Landoop/schema-registry-ui
Kafka Topics UI
https://github.com/Landoop/kafka-topics-ui
Kafka Connect UI
https://github.com/Landoop/kafka-connect-ui
Connectors Performance
Monitoring & Alerting
via JMX
Deployment apps
Containers (mesos - kubernetes)
Hadoop integration
* state-less apps = container-friendly (schema registry, kafka connect)
How do I IT?
Available features: 

Kafka ecosystem
StreamReactor
Connectors
Landoop web tools
Monitoring & Alerting
Security features
Wrap up
- KCQL
- Connectors
- Kafka Web Tools
- Automation & Integrations
Coming up
- Kafka backend
enhanced UIs | Timetravel
$ locate
https://github.com/Landoop
https://hub.docker.com/r/landoop/
https://github.com/datamountaineer/stream-reactor
http://www.landoop.com
Thank you ;)

London Apache Kafka Meetup (Jan 2017)

AI as an Interface for Commercial Buildings
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

London Apache Kafka Meetup (Jan 2017)

  • 1. Delivering Fast Data Systems with Kafka LANDOOP www.landoop.com Antonios Chalkiopoulos 18/1/2017
  • 2. @chalkiopoulos
       - Open Source contributor
       - Big Data projects in Media, Betting, Retail and Investment Banks in London
       - Books: Author, Programming MapReduce with Scalding
       - Founder of Landoop
  • 3. DevOps Big Data Scala Automation Distributed Systems Monitoring Hadoop Fast Data / Streams Kafka
  • 4. KAFKA CONNECT a bit of context
  • 5. KAFKA CONNECT “a common framework for allowing stream data flow between kafka and other systems”
  • 6. Data is produced from a source and consumed to a sink.
       [Diagram: Data Source -> Kafka Connect -> Kafka -> Kafka Connect -> Data Sink, with stream processing attached to Kafka]
  • 8. Developers don’t care about:
       - Move data to/from sink/source
       - Support delivery semantics
       - Offset Management
       - Serialization / de-serialization
       - Partitioning / Scalability
       - Fault tolerance / fail-over
       - Schema Registry integration
       Developers care about:
       - Domain specific transformations
  • 9. CONNECTORS Kafka Connect’s framework allows developers to create connectors that copy data to/from other systems just by writing configuration files and submitting them to Connect with no code necessary
  • 10. Connector configurations are key-value mappings
       name              connector’s unique name
       connector.class   connector’s java class
       tasks.max         maximum tasks to create
       topics            list of topics (to source or sink data)
  • 11. Introducing a query language for the connectors
       name              connector’s unique name
       connector.class   connector’s java class
       tasks.max         maximum tasks to create
       topics            list of topics (to source or sink data)
       query             KCQL query specifies fields/actions for the target system
  • 12. KCQL
       Kafka Connect Query Language is a SQL like syntax allowing streamlined configuration of Kafka Sink Connectors and then some more..
       Example: Project fields, rename or ignore them and further customise in plain text
         INSERT INTO transactions SELECT field1 AS column1, field2 AS column2, field3 FROM TransactionTopic;
         INSERT INTO audits SELECT * FROM AuditsTopic;
         INSERT INTO logs SELECT * FROM LogsTopic AUTOEVOLVE;
         INSERT INTO invoices SELECT * FROM InvoiceTopic PK invoiceID;
  • 13. KCQL: what does it look like?
       So while integrating Kafka with in-memory data grid, key-value, document stores, NoSQL, search etc systems..
         INSERT INTO $TARGET
         SELECT *|columns (i.e col1,col2 | col1 AS column1,col2)
         FROM $TOPIC_NAME
         [ IGNORE columns ]
         [ AUTOCREATE ]
         [ PK columns ]
         [ AUTOEVOLVE ]
         [ BATCH = N ]
         [ CAPITALIZE ]
         [ INITIALIZE ]
         [ PARTITIONBY cola[,colb] ]
         [ DISTRIBUTEBY cola[,colb] ]
         [ CLUSTERBY cola[,colb] ]
         [ TIMESTAMP cola|sys_current ]
         [ STOREAS $YOUR_TYPE([key=value, .....]) ]
         [ WITHFORMAT TEXT|AVRO|JSON|BINARY|OBJECT|MAP ]
  • 14. Why KCQL ?
       - Topic to target mapping
       - Field selection
       - Auto creation
       - Auto evolution
       - Error policies
       - Multiple KCQLs / topic
         - Field extraction
         - Access to Key & Metadata
  • 16. KCQL | Example Kafka topic with IoT data
         { "sensor_id": "01" , "temperature": 52.7943, "ts": 1484648810 }
         { "sensor_id": "02" , "temperature": 28.8597, "ts": 1484648810 }

         INSERT INTO sensor_ringbuffer
         SELECT sensor_id, temperature, ts
         FROM coap_sensor_topic
         WITHFORMAT JSON
         STOREAS RING_BUFFER

         INSERT INTO sensor_reliabletopic
         SELECT sensor_id, temperature, ts
         FROM coap_sensor_topic
         WITHFORMAT AVRO
         STOREAS RELIABLE_TOPIC
  • 17. KCQL | Example Kafka topic with FX data
         { "symbol": "USDGBP" , "price": 0.7943, "ts": 1484648810 }
         { "symbol": "EURGBP" , "price": 0.8597, "ts": 1484648810 }

         INSERT INTO FXSortedSet
         SELECT symbol, price
         FROM yahooFX-topic
         STOREAS SortedSet(score=ts)

         SELECT price
         FROM yahooFX-topic
         PK symbol
         STOREAS SortedSet(score=ts)

       Sorted Set -> { value : score }
         B:1 A:2 D:3 C:20
  • 18. Stream reactor connectors support KCQL
       - kafka-connect-blockchain
       - kafka-connect-bloomberg
       - kafka-connect-cassandra
       - kafka-connect-coap
       - kafka-connect-druid
       - kafka-connect-elastic
       - kafka-connect-ftp
       - kafka-connect-hazelcast
       - kafka-connect-hbase
       - kafka-connect-influxdb
       - kafka-connect-jms
       - kafka-connect-kudu
       - kafka-connect-mongodb
       - kafka-connect-mqtt
       - kafka-connect-redis
       - kafka-connect-rethink
       - kafka-connect-voltdb
       - kafka-connect-yahoo
       Source: https://github.com/datamountaineer/stream-reactor
       Integration Tests: http://coyote.landoop.com/connect/
  • 19. DEMO Kafka Connect InfluxDB
       We’ll need:
       - Zookeeper
       - Kafka Broker
       - Schema Registry
       - Kafka Connect Distributed
       - Kafka REST Proxy
       We’ll also use:
       - StreamReactor connectors
       - Landoop Fast Data Web Tools

         docker run --rm -it -p 2181:2181 -p 3030:3030 -p 8081:8081 -p 8082:8082 -p 8083:8083 -p 9092:9092 -e ADV_HOST=192.168.99.100 landoop/fast-data-dev

       We’ll generate some Avro messages:

         case class DeviceMeasurements(
           deviceId: Int, temperature: Int, moreData: String, timestamp: Long)
  • 20. DEMO Kafka Development Environment @ Fast-data-dev docker image https://hub.docker.com/r/landoop/fast-data-dev/
  • 21. DEMO Integration testing with Coyote for connectors & infrastructure https://github.com/Landoop/coyote
  • 27. How do I IT?
       - Deployment apps
       - Containers: mesos - kubernetes
       - Hadoop integration
       * state-less apps = container-friendly (schema registry, kafka connect)
       Available features:
       - Kafka ecosystem
       - StreamReactor Connectors
       - Landoop web tools
       - Monitoring & Alerting
       - Security features
  • 28.
  • 29.
  • 30. Wrap up - KCQL - Connectors - Kafka Web Tools - Automation & Integrations
  • 31. Coming up - Kafka backend enhanced UIs | Timetravel
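
The connector configuration keys listed on slides 10 and 11 can be sketched as a request body for the Kafka Connect REST API, which accepts a JSON object with a "name" and a "config" map on POST /connectors (port 8083 in the docker command of slide 19). This is only an illustration under assumptions: the connector name, the class name and the topic are hypothetical placeholders, and the slide's "query" key is passed as a parameter because the real property name depends on the connector.

```python
import json


def build_connector_request(name, connector_class, topics, kcql,
                            tasks_max=1, kcql_key="query"):
    """Build the JSON body for creating a connector through the Connect REST API."""
    config = {
        "connector.class": connector_class,   # connector's java class
        "tasks.max": str(tasks_max),          # maximum tasks to create
        "topics": ",".join(topics),           # list of topics to source or sink
        kcql_key: kcql,                       # KCQL statement; key name is an assumption
    }
    return {"name": name, "config": config}


request_body = build_connector_request(
    name="influx-sink-demo",                              # hypothetical name
    connector_class="com.example.InfluxSinkConnector",    # hypothetical class
    topics=["device-measurements"],                       # hypothetical topic
    kcql="INSERT INTO measurements SELECT * FROM device-measurements",
)

# The printed JSON is what would be sent as the body of POST /connectors.
print(json.dumps(request_body, indent=2))
```

Keeping the KCQL statement as one more key in the same map is what lets a connector be reconfigured without touching code: only the text of the statement changes.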

Editor's notes

  1. Thank you very much for coming today. I will be delivering a talk about building Fast Data systems with Kafka
  2. My name is Antonios. I’ve been involved with open-source projects on distributed systems of the Hadoop eco-system, and currently Apache Kafka has a place in my heart :) I have authored a book on MapReduce using Scalding and co-authored another one
  3. Landoop is a company starting-up and focusing on DevOps, Distributed Systems and particularly Apache Kafka
  4. Today I’d like to start the presentation with Kafka Connect. I guess most of you are already familiar with it, so I will give a quick overview
  5. Kafka Connect was introduced almost one year ago, as a feature of Apache Kafka 0.9+ with the narrow (although very important) scope of copying streaming data from and to a Kafka cluster. I found the concept really interesting and decided to experiment with it to see what this framework introduces. Kafka Connect is part of the Apache Kafka project, open source under the Apache license, and ships with Kafka. It’s a framework for building connectors between other data systems and Kafka, and the associated runtime to run these connectors in a distributed, fault tolerant manner at scale. The announcement by Confluent: https://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines/
  6. And this is how Kafka Connect fits into the picture on a Kafka based system. You would normally use a stream processing framework to transform your data streams, e.g. Spark, Kafka Streams, etc.
  7. And what Kafka Connect offers is the separation of concerns. It can simplify the key stages of the ETL process, and using simple tools, we can build and maintain distributed streaming data pipelines. The E (the extraction) and the L (the load) can be taken care of for you, and then as a developer you can focus on the T (the transformations). By combining Kafka Connect and stream processing engines we can perform streaming ETLs. Each does the job it is best at, and Kafka acts as the underlying data storage layer that supports them and allows simple integration with a variety of other applications.
  8. By using a robust framework that delivers scalability and fault tolerance out of the box, we can then focus on extracting value in a transformation layer. [...] deployments to deliver fault tolerance and automated load balancing. As you can see here, the basic pattern is to use Kafka Connect to perform Extraction of the data and load it into Kafka as a temporary, scalable, fault tolerant streaming data store. While you can do this with other, more generic data copy tools, you’ll commonly lose important semantics such as at-least-once delivery of data. Once the data is extracted, you use stream processing engines to perform Transformation, and either this is the endpoint for the data or you can deliver it back into Kafka. Finally, Load the data with another Kafka connector into the destination system. Obviously this is a simplified picture: your pipeline will grow much more complex and have multiple stages of transformation (especially if the intermediate data is useful for reuse by multiple applications, including anything downstream that may not be processed by stream processing engines).
  9. Most configurations are connector dependent, but there are a few settings common to all connectors
  10. What we are introducing to all our Kafka connectors is the KCQL query
  11. Let’s look at some of the more advanced features of KCQL - and in particular regarding some sinks.
  12. Hazelcast, for example, supports the Ring Buffer data structure, which is quite popular from the Disruptor pattern. Data can be pushed into a fixed-size buffer with a particular retention period. If the buffer gets filled, an eviction policy is triggered to either evict the oldest records or deny the addition of new records. So to write some IoT data from a Kafka topic into a Ring Buffer, we can use the STOREAS keyword. On the right side, you can see how we can store the same data into a RELIABLE TOPIC, another Hazelcast data structure. *Hazelcast requires data to be serializable, and JSON and Avro are supported.
  13. Redis provides the Sorted Set data structure. This structure allows only unique elements to be added, and each element is required to have a score, which enforces ordering. This data structure is often used to preserve time-series data, as Redis allows running time-range queries. So if we have a Kafka topic with Foreign Exchange data, we can either:
       - store all the messages into a single SortedSet (the one with the blue colour), or
       - create a new SortedSet for each symbol (one SortedSet for each currency rate) using the PK syntax on the right
  14. So this is a list of Apache 2.0 licensed Kafka Connectors that we have been working on. Blockchain, Bloomberg, the Cassandra connector that is certified by DataStax, a Constrained Application Protocol connector, Elastic Search, JMS, MQTT and others are some of the connectors already available, and released against the 2 latest releases of Apache Kafka.
  15. https://github.com/Landoop/fast-data-dev
  16. So let’s see a DEMO in real-time http://fast-data-dev.demo.landoop.com
  17. So let’s see a DEMO in real-time https://coyote.landoop.com/connect/
  18. So let’s see a DEMO in real-time http://schema-registry-ui.landoop.com
  19. So let’s see a DEMO in real-time http://kafka-topics-ui.landoop.com
  20. So let’s see a DEMO in real-time http://kafka-connect-ui.landoop.com
  21. Connectors look simple overall, and I know a number of people in this room are already using them in production. So what does performance look like? The image above shows that, depending on the sink system, we can sink 50K records / sec using:
       - 20 partitions
       - 3 connect tasks
       - 5 GB RAM / connector
       - less than 2 CPUs
      In the bottom-left corner we can see that we have saturated 50% of the available network bandwidth. Depending on the number of tasks and partitions, we can easily increase sink performance to more than 100K records / sec. The lessons regarding performance are:
       - Kafka Connect can scale really well
       - It requires quite some memory and quite some CPU, especially when batching writes
  22. We have also sent pull requests to the Prometheus team to enable GZIP compression and minimise any impact on the running system, something that has significantly decreased network I/O. We then provide pre-built dashboards on Grafana. We are using Grafana version 4.0, released a few months ago, which adds alerting: a really revolutionary feature, as it transforms Grafana from a visualisation tool into a truly mission-critical monitoring tool. We’ll have a demo, but before going into it ..
  23. Before doing a live presentation, I’d like to answer a question: how do I ship such a complex infrastructure that can easily grow into hundreds of running services? We preferably use:
       - Deployment apps such as Ansible
       - Docker based technologies for state-less micro-services
       - CDH based integration with Cloudera Manager for CDH Hadoop clusters
  24. https://docs.landoop.com/
  25. CDH docs - https://docs.landoop.com/
  26. More connectors are added monthly
  27. Time-Travel in Kafka topics and KCQL queries and real-time
  28. http://www.landoop.com https://github.com/Landoop https://github.com/datamountaineer/stream-reactor https://hub.docker.com/r/landoop/
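
The two Redis layouts described in note 13 can be sketched with a small in-memory stand-in for a sorted set (unique members, each with a score, queryable by score range). This is not the connector's implementation and does not talk to Redis: the SortedSet class, the JSON encoding of members and the third message are assumptions made purely for illustration.

```python
import json
from collections import defaultdict


class SortedSet:
    """Tiny in-memory stand-in for a Redis sorted set: unique members with scores."""

    def __init__(self):
        self._scores = {}  # member -> score

    def add(self, member, score):
        # Re-adding an existing member only updates its score, so members stay unique.
        self._scores[member] = score

    def range_by_score(self, low, high):
        # Members with low <= score <= high, ordered by score (ties ordered by member).
        hits = [(s, m) for m, s in self._scores.items() if low <= s <= high]
        return [m for _, m in sorted(hits)]


messages = [
    {"symbol": "USDGBP", "price": 0.7943, "ts": 1484648810},
    {"symbol": "EURGBP", "price": 0.8597, "ts": 1484648810},
    {"symbol": "USDGBP", "price": 0.7950, "ts": 1484648820},  # hypothetical later tick
]

# Layout 1: a single sorted set holding symbol and price, scored by ts.
fx_all = SortedSet()
for msg in messages:
    member = json.dumps({"symbol": msg["symbol"], "price": msg["price"]}, sort_keys=True)
    fx_all.add(member, msg["ts"])

# Layout 2: one sorted set per symbol (the PK variant), holding only the price.
fx_by_symbol = defaultdict(SortedSet)
for msg in messages:
    fx_by_symbol[msg["symbol"]].add(json.dumps({"price": msg["price"]}), msg["ts"])

# Time-range query: USDGBP prices with a timestamp in [1484648815, 1484648830].
usd_window = fx_by_symbol["USDGBP"].range_by_score(1484648815, 1484648830)
print(usd_window)
```

With one set per symbol, a time-range query for a single currency pair never has to scan or filter the other pairs, which is the practical difference between the two statements on slide 17.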