This document provides an overview and agenda for a presentation on Confluent, streaming, and KSQL. The presentation includes: an introduction to Confluent and Apache Kafka; an explanation of why streaming platforms are useful; an overview of the Confluent Platform and its components; key concepts in streaming and Kafka; a demonstration of Kafka Streams, Kafka Connect, and KSQL; and resources for further information. The presentation aims to explain streaming concepts, demonstrate Confluent tools, and allow for a question and answer session.
4. About Confluent and Apache Kafka™
• Founded September 2014 by the creators of Apache Kafka
• Technology developed while at LinkedIn
• 70% of active Kafka committers
• Leadership: Jay Kreps (CEO), Neha Narkhede (CTO, VP Engineering), Cheryl Dalrymple (CFO), Luanne Dauber (CMO), Simon Hayes (Head of Corporate & Business Development), Todd Barnett (VP WW Sales), Sarah Sproehnle (VP Customer Success)
5. Why a Streaming Platform?
All your data
Real-time
Fault tolerant
Secure
6. Confluent Platform: Enterprise Streaming based on Apache Kafka
Sources: Database Changes | Log Events | IoT Data | Web Events | …
Destinations and applications: CRM | Data Warehouse | Database | Hadoop | Data Integration | Monitoring | Analytics | Custom Apps | Transformations | Real-time Applications | …
Confluent Platform components (licensed as Apache Open Source, Confluent Open Source, or Confluent Enterprise):
• Apache Kafka®: Core | Connect API | Streams API
• Data Compatibility: Schema Registry
• SQL Stream Processing: KSQL (Streams API)
• Monitoring & Administration: Confluent Control Center | Security
• Operations: Replicator | Auto Data Balancing | JMS Client | JMS Connectors
• Development and Connectivity: Clients | Connectors | REST Proxy | CLI
20. Apache Kafka™ Connect API – Streaming Data Capture
Connectors stream data through the Kafka pipeline between Kafka and external systems such as JDBC, Mongo, MySQL, Elastic, Cassandra, and HDFS, on both the source and sink side.
• Fault tolerant
• Manage hundreds of data sources and sinks
• Preserves data schema
• Part of the Apache Kafka project
• Integrated within Confluent Platform’s Control Center
Flexible | Integrated | Reliable | Compatible
Connect any source to any target system
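As an illustration, a source connector is driven purely by configuration. This is a minimal sketch following the Confluent JDBC source connector's property names; the connection URL, credentials, and topic prefix are made-up example values:

```properties
name=mysql-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:mysql://localhost:3306/demo?user=connect&password=connect
# Poll tables incrementally using a monotonically increasing column
mode=incrementing
incrementing.column.name=id
# Each table is streamed into a topic named <prefix><table>
topic.prefix=mysql-
```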
21. Single Message Transforms
Modify events before storing in Kafka:
• Mask sensitive information
• Add identifiers
• Tag events
• Lineage/provenance
• Remove unnecessary columns
Modify events going out of Kafka:
• Route high-priority events to faster data stores
• Direct events to different Elasticsearch indexes
• Cast data types to match destination
• Remove unnecessary columns
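For example, the "mask sensitive information" case can be handled declaratively with the built-in MaskField transform in a connector's configuration; the transform alias `maskSsn` and field name `ssn` are made-up examples:

```properties
transforms=maskSsn
transforms.maskSsn.type=org.apache.kafka.connect.transforms.MaskField$Value
transforms.maskSsn.fields=ssn
```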
22. But…Easy to Implement
/**
 * Single message transformation for Kafka Connect record types.
 *
 * Connectors can be configured with transformations to make lightweight
 * message-at-a-time modifications.
 */
public interface Transformation<R extends ConnectRecord<R>> extends Configurable, Closeable {

    /**
     * Apply the transformation to the {@code record} and return another record object.
     *
     * The implementation must be thread-safe.
     */
    R apply(R record);

    /** Configuration specification for this transformation. */
    ConfigDef config();

    /** Signal that this transformation instance will no longer be used. */
    @Override
    void close();
}
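To make the contract above concrete, here is a pure-JDK toy that mirrors the message-at-a-time, stateless style of a masking transform. It does not use the real Connect classes (`ConnectRecord`, `ConfigDef` are omitted); a real implementation would implement `org.apache.kafka.connect.transforms.Transformation`, and the record shape here is just a `Map` for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class MaskFieldDemo {

    // Mirrors Transformation#apply: takes one record, returns a new record.
    // Stateless, so it is trivially thread-safe, as the contract requires.
    static Map<String, Object> mask(Map<String, Object> record, String field) {
        Map<String, Object> out = new HashMap<>(record);
        if (out.containsKey(field)) {
            out.put(field, "****"); // replace the sensitive value
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> event = new HashMap<>();
        event.put("userid", "u42");
        event.put("ssn", "123-45-6789");

        Map<String, Object> masked = mask(event, "ssn");
        System.out.println(masked); // ssn is masked, userid untouched
    }
}
```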
25. KSQL for Data Exploration
SELECT status, bytes
FROM clickstream
WHERE user_agent =
'Mozilla/5.0 (compatible; MSIE 6.0)';
An easy way to inspect data in a running cluster
26. KSQL for Streaming ETL
• Kafka is popular for data pipelines.
• KSQL enables easy transformations of data within the pipe.
• Transforming data while moving from Kafka to another system.
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
27. KSQL for Anomaly Detection
CREATE TABLE possible_fraud AS
SELECT card_number, count(*)
FROM authorization_attempts
WINDOW TUMBLING (SIZE 5 SECONDS)
GROUP BY card_number
HAVING count(*) > 3;
Identifying patterns or anomalies in real-time data,
surfaced in milliseconds
28. Once again
KSQL implements for you the Kafka Streams
application you would have implemented yourself, if you had
• ... the time
• ... the experience
• ... KSQL as a spec
• ... the willingness to write the boring code
30. Where is KSQL not such a great fit?
BI reports (Tableau etc.)
• No indexes
• No JDBC (most BI tools are not good with continuous results!)
Ad-hoc queries
• Limited span of time usually retained in Kafka
• No indexes
35. Demo ... less fun
https://github.com/framiere/a-kafka-story/tree/master/step19
Change Data Capture
Docker
Producer
Consumer
Kafka Streams
KSQL
Event sourcing
InfluxDB
Grafana
S3
docker run --rm -it --name dcv -v $(pwd):/input pmsipilot/docker-compose-viz \
  render --horizontal --output-format image --force docker-compose.yml
36. Demo ... less fun
... without Confluent Control Center links
37. Confluent 4.2 - Nested Types
SELECT userid, address.city
FROM users
WHERE address.state = 'CA'
https://github.com/confluentinc/ksql/pull/1114
38. Confluent 4.2 - Remaining joins
SELECT orderid, shipmentid
FROM orders o INNER JOIN shipments s
ON o.orderid = s.shipmentid;
39. Where to go from here
● KSQL project page
○ https://www.confluent.io/product/ksql
● Confluent blog
○ http://blog.confluent.io/
● Formula 1 game blog post
○ https://www.confluent.io/blog/taking-ksql-spin-using-real-time-device-data/
● KSQL github repo
○ https://github.com/confluentinc/ksql
● CP-Demo
○ https://github.com/confluentinc/cp-demo
● A-Kafka-Story
○ https://github.com/framiere/a-kafka-story
● A tour of the Kafka environment (talk in French)
○ https://www.youtube.com/watch?v=BBo-rqmhpDM
● KSQL Recipes
○ https://github.com/bluemonk3y/ksql-recipe-fraudulent-txns/