TIAD 2016 : Real-Time Data Processing Pipeline & Visualization with Docker, Spark, Kafka and Cassandra

The Incredible Automation Day

Talk @ TIAD, 04/10/2016, Paris by Roberto Hashioka, Docker. #tiadparis

Real-Time Data Processing Pipeline &
Visualization with Docker, Spark, Kafka
and Cassandra
Roberto G. Hashioka – 2016-10-04 – TIAD – Paris

Personal Information
• Roberto Gandolfo Hashioka
• @rogaha (GitHub) and @rhashioka (Twitter)
• Finance -> Software Engineer
• Growth & Data Engineer at Docker

Summary
• Background / Motivation
• Project Goals
• How to build it?
• DEMO

Background
• Gather data from multiple sources and process it in “real-time”
• Transform raw data into meaningful, useful information that enables a more effective
decision-making process
• Provide more visibility into trends in: 1) user behavior, 2) feature engagement, 3) opportunities
for future investments
• Data transparency and standardization

Project Goals
• Create a data processing pipeline that can handle a large number of events per second
• Automate the development environment with Docker Compose
• Automate remote machine management with Docker for AWS / Docker Machine
• Reduce time to market and development time for new hires and new features

Project / Language Stack
(diagram slide; image not included)

How to build it?
• Step 1: Install Docker for Mac/Win and dockerize all the applications
link: https://www.docker.com/products/docker

Dockerfile example
-----------------------------------------------------------------------------------------------------------
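FROM ubuntu:14.04
MAINTAINER Roberto Hashioka (roberto@docker.com)
RUN apt-get update && apt-get install -y nginx
RUN echo "Hello World! #TIAD" > /usr/share/nginx/html/index.html
EXPOSE 80
# Run nginx in the foreground so the detached container keeps running
CMD ["nginx", "-g", "daemon off;"]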
------------------------------------------------------------------------------------------------------------
$ docker build -t rogaha/web_demotiad2016 .
$ docker run -d -p 80:80 --name web_demotiad2016 rogaha/web_demotiad2016

How to build it?
• Step 2: Define your services stack with a docker-compose file

Docker Compose
version: "2"
services:
  web:
    build: .
    command: python app.py
    ports:
      - "5000:5000"
    volumes:
      - .:/code
    links:
      - redis
    environment:
      - PYTHONUNBUFFERED=1
  redis:
    image: redis:latest
    command: redis-server --appendonly yes
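
The compose file starts the web service with `python app.py`, but the slides do not show that file. The following is a minimal sketch of what such an application could look like, using only the Python standard library. It listens on port 5000 and increments a counter in the linked Redis container. The key name `hits`, the response text, and the hand-written Redis protocol encoding are assumptions made for illustration; a real application would more likely use a Redis client library.

```python
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

REDIS_HOST = "redis"  # hostname made available by the `links: - redis` entry
REDIS_PORT = 6379     # default Redis port


def encode_command(*args):
    """Encode a Redis command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode("utf-8")
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def parse_integer_reply(line):
    """Parse a RESP integer reply such as b':3\\r\\n'."""
    if not line.startswith(b":"):
        raise ValueError("unexpected Redis reply: %r" % line)
    return int(line[1:].strip())


def incr_hits(host=REDIS_HOST, port=REDIS_PORT):
    """Send INCR for the (hypothetical) key `hits` and return the new value."""
    with socket.create_connection((host, port), timeout=2) as conn:
        conn.sendall(encode_command("INCR", "hits"))
        reply = conn.makefile("rb").readline()
    return parse_integer_reply(reply)


class HitCounter(BaseHTTPRequestHandler):
    def do_GET(self):
        text = "Hello! This page has been seen %d times.\n" % incr_hits()
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def main():
    # Bind to all interfaces so the published port 5000 reaches the server.
    HTTPServer(("0.0.0.0", 5000), HitCounter).serve_forever()
```

To use this as `app.py`, call `main()` from the usual `if __name__ == "__main__":` guard. Inside the `web` container the hostname `redis` resolves to the linked Redis container, which is why no IP address appears in the code.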

How to build it?
• Step 3: Test the applications locally from your laptop using containers
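
One way to run such a local test is a small smoke check against a published container port. The sketch below assumes the nginx container from Step 1 is running with port 80 published on the laptop; the function name `page_contains` and the URL in the usage note are illustrative, not part of the talk.

```python
import urllib.request


def page_contains(url, expected, timeout=5.0):
    """Return True if `url` answers with status 200 and its body contains `expected`.

    Connection failures and HTTP error statuses raise urllib.error.URLError
    (HTTPError is a subclass), so the caller sees why the check failed.
    """
    # Bypass any configured HTTP proxy: the target is a local container.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as response:
        status = response.status
        body = response.read().decode("utf-8", errors="replace")
    return status == 200 and expected in body
```

With the Step 1 container up, `page_contains("http://localhost:80/", "Hello World! #TIAD")` should return `True`.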

How to build it?
• Step 4: Provision your remote servers and deploy your containers

How to build it?
• Step 5: Scale your services with Docker Swarm
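
The slides do not show the scaling commands. As a rough sketch, assuming swarm mode as shipped with Docker 1.12 and later (the slides do not say which Swarm variant was used), a small Python helper could assemble and run the relevant `docker` CLI calls. The service name, image, and replica counts are placeholders.

```python
import subprocess


def swarm_commands(service, image, replicas, published_port=80, target_port=80):
    """Build the CLI calls that initialise a swarm and create a replicated service."""
    return [
        ["docker", "swarm", "init"],
        [
            "docker", "service", "create",
            "--name", service,
            "--replicas", str(replicas),
            "--publish", f"{published_port}:{target_port}",
            image,
        ],
    ]


def scale_command(service, replicas):
    """Build the CLI call that changes the replica count of an existing service."""
    return ["docker", "service", "scale", f"{service}={replicas}"]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_all(swarm_commands("web", "rogaha/web_demotiad2016", 3))` followed by `run_all([scale_command("web", 5)])` would start three replicas of the Step 1 image and then scale the service to five.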

DEMO
source code: https://github.com/rogaha/data-processing-pipeline

Open Source Projects Used
• Docker (https://github.com/docker/docker)
  • An open platform for developers and sysadmins to build, ship, and run distributed applications
• Apache Spark / Spark SQL (https://github.com/apache/spark)
  • A fast, in-memory data processing engine. Spark SQL lets you query structured data inside Spark programs using SQL or the DataFrame API
• Apache Kafka (https://github.com/apache/kafka)
  • A fast and scalable publish-subscribe messaging service
• Apache ZooKeeper (https://github.com/apache/zookeeper)
  • A distributed configuration service, synchronization service, and naming registry for large distributed systems
• Apache Cassandra (https://github.com/apache/cassandra)
  • A scalable, highly available, distributed wide-column NoSQL database
• D3 (https://github.com/mbostock/d3)
  • A JavaScript visualization library for HTML and SVG
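
To make the roles of these components concrete, here is a toy, in-memory sketch of the data flow. It is not the code from the demo repository. `Topic` stands in for a Kafka topic, `aggregate` for a Spark micro-batch job, and a plain dictionary for a Cassandra table. All class, function, and field names are hypothetical.

```python
from collections import Counter


class Topic:
    """In-memory stand-in for a Kafka topic: an append-only log read by offset."""

    def __init__(self):
        self._log = []

    def publish(self, event):
        self._log.append(event)

    def read(self, offset, max_events):
        """Return up to `max_events` events starting at `offset`, plus the next offset."""
        batch = self._log[offset:offset + max_events]
        return batch, offset + len(batch)


def aggregate(batch):
    """Micro-batch step (the role Spark plays): count events per type."""
    return Counter(event["type"] for event in batch)


def upsert_counts(table, counts):
    """Sink step (the role Cassandra plays): add batch counts to the stored totals."""
    for event_type, count in counts.items():
        table[event_type] = table.get(event_type, 0) + count


def run_pipeline(topic, table, batch_size=2):
    """Consume the topic in micro-batches until no unread events remain."""
    offset = 0
    while True:
        batch, offset = topic.read(offset, batch_size)
        if not batch:
            return table
        upsert_counts(table, aggregate(batch))
```

A visualization layer such as D3 would then read the aggregated totals from the table rather than the raw event stream.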

Thanks! Questions? @rhashioka