SlideShare a Scribd company logo
1 of 18
Download to read offline
Real-Time Data Processing Pipeline &
Visualization with Docker, Spark, Kafka
and Cassandra
Roberto G. Hashioka – 2016-10-04 – TIAD – Paris
Personal Information
• Roberto Gandolfo Hashioka
• @rogaha (Github) e @rhashioka (Twitter)
• Finance -> Software Engineer
• Growth & Data Engineer at Docker
Summary
• Background / Motivation
• Project Goals
• How to build it?
• DEMO
Background
• Gather of data from multiple sources and process them in “real-time”
• Transform raw data into meaningful and useful information used to enable more effective
decision-making process
• Provide more visibility into trends on: 1) user behavior 2) feature engagement 3) opportunities
for future investments
• Data transparency and standardization
Project Goals
• Create a data processing pipeline that can handle a huge amount of events per second
• Automate the development environment — Docker compose.
• Automate the remote machines management — Docker for AWS / Machine.
• Reduce the time to market / time to development — New hires / new features.
Project / Language Stack
How to build it?
• Step 1: Install Docker for Mac/Win and dockerize all the applications
link: https://www.docker.com/products/docker
Exemplo de Dockerfile
-----------------------------------------------------------------------------------------------------------
FROM ubuntu:14.04
MAINTAINER Roberto Hashioka (roberto@docker.com)
RUN apt-get update && apt-get install -y nginx
RUN echo “Hello World! #TIAD” > /usr/share/nginx/html/index.html
EXPOSE 80
------------------------------------------------------------------------------------------------------------
$ docker build –t rogaha/web_demotiad2016 .
$ docker run –d –p 80:80 –-name web_demotiad2016 rogaha/web_demotiad2016
How to build it?
• Step 2: Define your services stack with a docker-compose file
Docker Compose
containers:
web:
build: .
command: python app.py
ports:
- "5000:5000"
volumes:
- .:/code
links:
- redis
environment:
- PYTHONUNBUFFERED=1
redis:
image: redis:latest
command: redis-server --appendonly yes
How to build it?
• Step 3: Test the applications locally from your laptop using containers
How to build it?
How to build it?
• Step 4: Provision your remote servers and deploy your containers
How to build it?
How to build it?
• Step 5: Scale your services with Docker swarm
DEMO
source code: https://github.com/rogaha/data-processing-pipeline
Open Source Projects Used
• Docker (https://github.com/docker/docker)
• An open platform for distributed applications for developers and sysadmins
• Apache Spark / Spark SQL (https://github.com/apache/spark)
• A fast, in-memory data processing engine. Spark SQL lets you query structured data as a resilient distributed dataset (RDD)
• Apache Kafka (https://github.com/apache/kafka)
• A fast and scalable pub-sub messaging service
• Apache Zookeeper (https://github.com/apache/zookeeper)
• A distributed configuration service, synchronization service, and naming registry for large distributed systems
• Apache Cassandra (https://github.com/apache/cassandra)
• Scalable, high-available and distributed columnar NoSQL database
• D3 (https://github.com/mbostock/d3)
• A JavaScript visualization library for HTML and SVG.
Thanks!
Questions?
@rhashioka

More Related Content

What's hot

RedisConf17 - Redis Graph
RedisConf17 - Redis GraphRedisConf17 - Redis Graph
RedisConf17 - Redis GraphRedis Labs
 
GIT presentation
GIT presentationGIT presentation
GIT presentationNaim Latifi
 
Zero Downtime Deployment
Zero Downtime DeploymentZero Downtime Deployment
Zero Downtime DeploymentJoel Dickson
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Brian Brazil
 
MySQL Operator for Kubernetes
MySQL Operator for KubernetesMySQL Operator for Kubernetes
MySQL Operator for KubernetesKenny Gryp
 
Learning git
Learning gitLearning git
Learning gitSid Anand
 
Getting started with Docker
Getting started with DockerGetting started with Docker
Getting started with DockerRavindu Fernando
 
NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?Anton Zadorozhniy
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxRomanKhavronenko
 
Monolith vs Microservices with Golang at practice - Ivan Kutuzov
Monolith vs Microservices with Golang at practice  -  Ivan Kutuzov Monolith vs Microservices with Golang at practice  -  Ivan Kutuzov
Monolith vs Microservices with Golang at practice - Ivan Kutuzov Kuberton
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaArvind Kumar G.S
 
Dual write strategies for microservices
Dual write strategies for microservicesDual write strategies for microservices
Dual write strategies for microservicesBilgin Ibryam
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingDataWorks Summit
 
Handling eventual consistency in a transactional world with Matteo Cimini and...
Handling eventual consistency in a transactional world with Matteo Cimini and...Handling eventual consistency in a transactional world with Matteo Cimini and...
Handling eventual consistency in a transactional world with Matteo Cimini and...HostedbyConfluent
 
CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022
CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022
CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022HostedbyConfluent
 

What's hot (20)

RedisConf17 - Redis Graph
RedisConf17 - Redis GraphRedisConf17 - Redis Graph
RedisConf17 - Redis Graph
 
GIT presentation
GIT presentationGIT presentation
GIT presentation
 
Zero Downtime Deployment
Zero Downtime DeploymentZero Downtime Deployment
Zero Downtime Deployment
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)Prometheus and Docker (Docker Galway, November 2015)
Prometheus and Docker (Docker Galway, November 2015)
 
MySQL Operator for Kubernetes
MySQL Operator for KubernetesMySQL Operator for Kubernetes
MySQL Operator for Kubernetes
 
Learning git
Learning gitLearning git
Learning git
 
Getting started with Docker
Getting started with DockerGetting started with Docker
Getting started with Docker
 
NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?
 
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptxGrafana Mimir and VictoriaMetrics_ Performance Tests.pptx
Grafana Mimir and VictoriaMetrics_ Performance Tests.pptx
 
Monolith vs Microservices with Golang at practice - Ivan Kutuzov
Monolith vs Microservices with Golang at practice  -  Ivan Kutuzov Monolith vs Microservices with Golang at practice  -  Ivan Kutuzov
Monolith vs Microservices with Golang at practice - Ivan Kutuzov
 
Monitoring using Prometheus and Grafana
Monitoring using Prometheus and GrafanaMonitoring using Prometheus and Grafana
Monitoring using Prometheus and Grafana
 
Dual write strategies for microservices
Dual write strategies for microservicesDual write strategies for microservices
Dual write strategies for microservices
 
Ansible 101
Ansible 101Ansible 101
Ansible 101
 
GitHub Presentation
GitHub PresentationGitHub Presentation
GitHub Presentation
 
Prometheus Storage
Prometheus StoragePrometheus Storage
Prometheus Storage
 
Apache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data ProcessingApache Tez - A unifying Framework for Hadoop Data Processing
Apache Tez - A unifying Framework for Hadoop Data Processing
 
Handling eventual consistency in a transactional world with Matteo Cimini and...
Handling eventual consistency in a transactional world with Matteo Cimini and...Handling eventual consistency in a transactional world with Matteo Cimini and...
Handling eventual consistency in a transactional world with Matteo Cimini and...
 
CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022
CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022
CI/CD with an Idempotent Kafka Producer & Consumer | Kafka Summit London 2022
 
12-Factor Apps
12-Factor Apps12-Factor Apps
12-Factor Apps
 

Viewers also liked

TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, S...
TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, S...TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, S...
TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, S...The Incredible Automation Day
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelinesLars Albertsson
 
[246] foursquare데이터라이프사이클 설현준
[246] foursquare데이터라이프사이클 설현준[246] foursquare데이터라이프사이클 설현준
[246] foursquare데이터라이프사이클 설현준NAVER D2
 
Large scale data processing pipelines at trivago
Large scale data processing pipelines at trivago Large scale data processing pipelines at trivago
Large scale data processing pipelines at trivago Clemens Valiente
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSAmazon Web Services
 
[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민NAVER D2
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Stormviirya
 
[115] clean fe development_윤지수
[115] clean fe development_윤지수[115] clean fe development_윤지수
[115] clean fe development_윤지수NAVER D2
 
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Brandon O'Brien
 
[211]대규모 시스템 시각화 현동석김광림
[211]대규모 시스템 시각화 현동석김광림[211]대규모 시스템 시각화 현동석김광림
[211]대규모 시스템 시각화 현동석김광림NAVER D2
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkDataWorks Summit
 
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민NAVER D2
 
[112]rest에서 graph ql과 relay로 갈아타기 이정우
[112]rest에서 graph ql과 relay로 갈아타기 이정우[112]rest에서 graph ql과 relay로 갈아타기 이정우
[112]rest에서 graph ql과 relay로 갈아타기 이정우NAVER D2
 
[236] 카카오의데이터파이프라인 윤도영
[236] 카카오의데이터파이프라인 윤도영[236] 카카오의데이터파이프라인 윤도영
[236] 카카오의데이터파이프라인 윤도영NAVER D2
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkGuido Schmutz
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Spark Summit
 

Viewers also liked (18)

TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, S...
TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, S...TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, S...
TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, S...
 
Test strategies for data processing pipelines
Test strategies for data processing pipelinesTest strategies for data processing pipelines
Test strategies for data processing pipelines
 
[246] foursquare데이터라이프사이클 설현준
[246] foursquare데이터라이프사이클 설현준[246] foursquare데이터라이프사이클 설현준
[246] foursquare데이터라이프사이클 설현준
 
Large scale data processing pipelines at trivago
Large scale data processing pipelines at trivago Large scale data processing pipelines at trivago
Large scale data processing pipelines at trivago
 
Building a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWSBuilding a Data Processing Pipeline on AWS
Building a Data Processing Pipeline on AWS
 
[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민[225]yarn 기반의 deep learning application cluster 구축 김제민
[225]yarn 기반의 deep learning application cluster 구축 김제민
 
Real-time Big Data Processing with Storm
Real-time Big Data Processing with StormReal-time Big Data Processing with Storm
Real-time Big Data Processing with Storm
 
[115] clean fe development_윤지수
[115] clean fe development_윤지수[115] clean fe development_윤지수
[115] clean fe development_윤지수
 
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
Real Time Data Processing With Spark Streaming, Node.js and Redis with Visual...
 
[211]대규모 시스템 시각화 현동석김광림
[211]대규모 시스템 시각화 현동석김광림[211]대규모 시스템 시각화 현동석김광림
[211]대규모 시스템 시각화 현동석김광림
 
Real-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache FlinkReal-time Stream Processing with Apache Flink
Real-time Stream Processing with Apache Flink
 
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
[125]react로개발자2명이플랫폼4개를서비스하는이야기 심상민
 
[112]rest에서 graph ql과 relay로 갈아타기 이정우
[112]rest에서 graph ql과 relay로 갈아타기 이정우[112]rest에서 graph ql과 relay로 갈아타기 이정우
[112]rest에서 graph ql과 relay로 갈아타기 이정우
 
[236] 카카오의데이터파이프라인 윤도영
[236] 카카오의데이터파이프라인 윤도영[236] 카카오의데이터파이프라인 윤도영
[236] 카카오의데이터파이프라인 윤도영
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 
Real-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache SparkReal-Time Analytics with Apache Cassandra and Apache Spark
Real-Time Analytics with Apache Cassandra and Apache Spark
 
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
Building a Dataset Search Engine with Spark and Elasticsearch: Spark Summit E...
 

Similar to Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra

Intro to R and H2O with Spencer Aiello
Intro to R and H2O with Spencer AielloIntro to R and H2O with Spencer Aiello
Intro to R and H2O with Spencer AielloSri Ambati
 
Docker Container As A Service - Mix-IT 2016
Docker Container As A Service - Mix-IT 2016Docker Container As A Service - Mix-IT 2016
Docker Container As A Service - Mix-IT 2016Patrick Chanezon
 
MongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local Austin 2018: MongoDB Ops Manager + KubernetesMongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local Austin 2018: MongoDB Ops Manager + KubernetesMongoDB
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITStijn Wijndaele
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPSACA IT-Solutions
 
Building a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and DockerBuilding a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and DockerWellington Marinho
 
MongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local DC 2018: MongoDB Ops Manager + KubernetesMongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local DC 2018: MongoDB Ops Manager + KubernetesMongoDB
 
Docker engine - Indroduc
Docker engine - IndroducDocker engine - Indroduc
Docker engine - IndroducAl Gifari
 
Getting started with Docker sandboxes for MariaDB
Getting started with Docker sandboxes for MariaDBGetting started with Docker sandboxes for MariaDB
Getting started with Docker sandboxes for MariaDBMariaDB plc
 
Dayta AI Seminar - Kubernetes, Docker and AI on Cloud
Dayta AI Seminar - Kubernetes, Docker and AI on CloudDayta AI Seminar - Kubernetes, Docker and AI on Cloud
Dayta AI Seminar - Kubernetes, Docker and AI on CloudJung-Hong Kim
 
Cloud-native .NET Microservices mit Kubernetes
Cloud-native .NET Microservices mit KubernetesCloud-native .NET Microservices mit Kubernetes
Cloud-native .NET Microservices mit KubernetesQAware GmbH
 
The App Developer's Kubernetes Toolbox
The App Developer's Kubernetes ToolboxThe App Developer's Kubernetes Toolbox
The App Developer's Kubernetes ToolboxNebulaworks
 
betterCode Workshop: Effizientes DevOps-Tooling mit Go
betterCode Workshop:  Effizientes DevOps-Tooling mit GobetterCode Workshop:  Effizientes DevOps-Tooling mit Go
betterCode Workshop: Effizientes DevOps-Tooling mit GoQAware GmbH
 
Into The Box 2018 Going live with commandbox and docker
Into The Box 2018 Going live with commandbox and dockerInto The Box 2018 Going live with commandbox and docker
Into The Box 2018 Going live with commandbox and dockerOrtus Solutions, Corp
 
Going live with BommandBox and docker Into The Box 2018
Going live with BommandBox and docker Into The Box 2018Going live with BommandBox and docker Into The Box 2018
Going live with BommandBox and docker Into The Box 2018Ortus Solutions, Corp
 
Containers as a Service with Docker
Containers as a Service with DockerContainers as a Service with Docker
Containers as a Service with DockerDocker, Inc.
 
Docker Container As A Service - March 2016
Docker Container As A Service - March 2016Docker Container As A Service - March 2016
Docker Container As A Service - March 2016Patrick Chanezon
 
Docker Enterprise Workshop - Technical
Docker Enterprise Workshop - TechnicalDocker Enterprise Workshop - Technical
Docker Enterprise Workshop - TechnicalPatrick Chanezon
 
Deploying applications to Windows Server 2016 and Windows Containers
Deploying applications to Windows Server 2016 and Windows ContainersDeploying applications to Windows Server 2016 and Windows Containers
Deploying applications to Windows Server 2016 and Windows ContainersBen Hall
 
Docker Timisoara: Dockercon19 recap slides, 23 may 2019
Docker Timisoara: Dockercon19 recap slides, 23 may 2019Docker Timisoara: Dockercon19 recap slides, 23 may 2019
Docker Timisoara: Dockercon19 recap slides, 23 may 2019Radulescu Adina-Valentina
 

Similar to Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra (20)

Intro to R and H2O with Spencer Aiello
Intro to R and H2O with Spencer AielloIntro to R and H2O with Spencer Aiello
Intro to R and H2O with Spencer Aiello
 
Docker Container As A Service - Mix-IT 2016
Docker Container As A Service - Mix-IT 2016Docker Container As A Service - Mix-IT 2016
Docker Container As A Service - Mix-IT 2016
 
MongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local Austin 2018: MongoDB Ops Manager + KubernetesMongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local Austin 2018: MongoDB Ops Manager + Kubernetes
 
Docker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-ITDocker and Cloud - Enables for DevOps - by ACA-IT
Docker and Cloud - Enables for DevOps - by ACA-IT
 
'DOCKER' & CLOUD: ENABLERS For DEVOPS
'DOCKER' & CLOUD:  ENABLERS For DEVOPS'DOCKER' & CLOUD:  ENABLERS For DEVOPS
'DOCKER' & CLOUD: ENABLERS For DEVOPS
 
Building a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and DockerBuilding a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and Docker
 
MongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local DC 2018: MongoDB Ops Manager + KubernetesMongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
MongoDB.local DC 2018: MongoDB Ops Manager + Kubernetes
 
Docker engine - Indroduc
Docker engine - IndroducDocker engine - Indroduc
Docker engine - Indroduc
 
Getting started with Docker sandboxes for MariaDB
Getting started with Docker sandboxes for MariaDBGetting started with Docker sandboxes for MariaDB
Getting started with Docker sandboxes for MariaDB
 
Dayta AI Seminar - Kubernetes, Docker and AI on Cloud
Dayta AI Seminar - Kubernetes, Docker and AI on CloudDayta AI Seminar - Kubernetes, Docker and AI on Cloud
Dayta AI Seminar - Kubernetes, Docker and AI on Cloud
 
Cloud-native .NET Microservices mit Kubernetes
Cloud-native .NET Microservices mit KubernetesCloud-native .NET Microservices mit Kubernetes
Cloud-native .NET Microservices mit Kubernetes
 
The App Developer's Kubernetes Toolbox
The App Developer's Kubernetes ToolboxThe App Developer's Kubernetes Toolbox
The App Developer's Kubernetes Toolbox
 
betterCode Workshop: Effizientes DevOps-Tooling mit Go
betterCode Workshop:  Effizientes DevOps-Tooling mit GobetterCode Workshop:  Effizientes DevOps-Tooling mit Go
betterCode Workshop: Effizientes DevOps-Tooling mit Go
 
Into The Box 2018 Going live with commandbox and docker
Into The Box 2018 Going live with commandbox and dockerInto The Box 2018 Going live with commandbox and docker
Into The Box 2018 Going live with commandbox and docker
 
Going live with BommandBox and docker Into The Box 2018
Going live with BommandBox and docker Into The Box 2018Going live with BommandBox and docker Into The Box 2018
Going live with BommandBox and docker Into The Box 2018
 
Containers as a Service with Docker
Containers as a Service with DockerContainers as a Service with Docker
Containers as a Service with Docker
 
Docker Container As A Service - March 2016
Docker Container As A Service - March 2016Docker Container As A Service - March 2016
Docker Container As A Service - March 2016
 
Docker Enterprise Workshop - Technical
Docker Enterprise Workshop - TechnicalDocker Enterprise Workshop - Technical
Docker Enterprise Workshop - Technical
 
Deploying applications to Windows Server 2016 and Windows Containers
Deploying applications to Windows Server 2016 and Windows ContainersDeploying applications to Windows Server 2016 and Windows Containers
Deploying applications to Windows Server 2016 and Windows Containers
 
Docker Timisoara: Dockercon19 recap slides, 23 may 2019
Docker Timisoara: Dockercon19 recap slides, 23 may 2019Docker Timisoara: Dockercon19 recap slides, 23 may 2019
Docker Timisoara: Dockercon19 recap slides, 23 may 2019
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra

  • 1. Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra Roberto G. Hashioka – 2016-10-04 – TIAD – Paris
  • 2. Personal Information • Roberto Gandolfo Hashioka • @rogaha (Github) e @rhashioka (Twitter) • Finance -> Software Engineer • Growth & Data Engineer at Docker
  • 3. Summary • Background / Motivation • Project Goals • How to build it? • DEMO
  • 4. Background • Gather of data from multiple sources and process them in “real-time” • Transform raw data into meaningful and useful information used to enable more effective decision-making process • Provide more visibility into trends on: 1) user behavior 2) feature engagement 3) opportunities for future investments • Data transparency and standardization
  • 5. Project Goals • Create a data processing pipeline that can handle a huge amount of events per second • Automate the development environment — Docker compose. • Automate the remote machines management — Docker for AWS / Machine. • Reduce the time to market / time to development — New hires / new features.
  • 7. How to build it? • Step 1: Install Docker for Mac/Win and dockerize all the applications link: https://www.docker.com/products/docker
  • 8. Exemplo de Dockerfile ----------------------------------------------------------------------------------------------------------- FROM ubuntu:14.04 MAINTAINER Roberto Hashioka (roberto@docker.com) RUN apt-get update && apt-get install -y nginx RUN echo “Hello World! #TIAD” > /usr/share/nginx/html/index.html EXPOSE 80 ------------------------------------------------------------------------------------------------------------ $ docker build –t rogaha/web_demotiad2016 . $ docker run –d –p 80:80 –-name web_demotiad2016 rogaha/web_demotiad2016
  • 9. How to build it? • Step 2: Define your services stack with a docker-compose file
  • 10. Docker Compose containers: web: build: . command: python app.py ports: - "5000:5000" volumes: - .:/code links: - redis environment: - PYTHONUNBUFFERED=1 redis: image: redis:latest command: redis-server --appendonly yes
  • 11. How to build it? • Step 3: Test the applications locally from your laptop using containers
  • 13. How to build it? • Step 4: Provision your remote servers and deploy your containers
  • 15. How to build it? • Step 5: Scale your services with Docker swarm
  • 17. Open Source Projects Used • Docker (https://github.com/docker/docker) • An open platform for distributed applications for developers and sysadmins • Apache Spark / Spark SQL (https://github.com/apache/spark) • A fast, in-memory data processing engine. Spark SQL lets you query structured data as a resilient distributed dataset (RDD) • Apache Kafka (https://github.com/apache/kafka) • A fast and scalable pub-sub messaging service • Apache Zookeeper (https://github.com/apache/zookeeper) • A distributed configuration service, synchronization service, and naming registry for large distributed systems • Apache Cassandra (https://github.com/apache/cassandra) • Scalable, high-available and distributed columnar NoSQL database • D3 (https://github.com/mbostock/d3) • A JavaScript visualization library for HTML and SVG.