SlideShare une entreprise Scribd logo
1  sur  63
Building a Data Product using
apache Zeppelin (incubating)
NFLabs for ApacheCon ’15 EU
Alexander Bezzubov
Data engineer @NFLabs
based in Seoul, South Korea
bzz@apache.org
So you want to build a data
product…
What do you need?
To build a data product we need:
Idea
Data
Software (to process the data)
Hardware (to run the software)
…and a brain
Idea
Data
Software (to process the data)
Hardware (to run the software)
…and a brain
To build a data product we need:
Software
Software
Software
… …
Software
… …
Zeppelin
Zeppelin
Opensouce analytical environment,
with pluggable backend data-processing systems,
with notebook-style GUI for visualisations
ASF Incubation12.2014
08.2013 NFLabs Internal project Hive/Shark
http://zeppelin.incubator.apache.org
12.2012 Commercial App using AMP Lab Shark 0.5
10.2013 Prototype Hive/Shark
Zeppelin History
Zeppelin History
Zeppelin History
Zeppelin History
Zeppelin History
Zeppelin Current Status
1 Release
63 Contributors worldwide
689 Stars on GH
~300/900 Emails at users/dev
@i.a.o
http://zeppelin.incubator.apache.org
Zeppelin Architecture
Hardware
Hardware
.
.
.
.
.
.
Idea
An Idea
Would not it be cool if …
An Idea
Would not it be cool if …Would not it be cool if …
you could have your own Google
Analytics?
An Idea
Would not it be cool if …Would not it be cool if …
you could have your own Google
Analytics?
sorry, we already saw it in eCG
talk..
ok, let’s pick something else
An Idea
you could be the first to know when
there is a new interesting*
opensouce project
Would not it be cool if …
Data
Data: Github archive
https://www.githubarchive.org• Github logs, hosted in the
cloud
• Collaboration between Github
and Google engineers
• 20+ events, 250+Gb since
2012
• Proprietary software
• available on BigQuery
But what if you could:
analyse this data independently, without asking
permission or paying anybody?
But what if you could:
analyse this data independent, without asking
permission or paying anybody?
well, with ASF you can!
Let’s build a data product for ourselves!
Building a product
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join logs w/ more data from Github API calls
• Shows simple HTML template, to visualise the
list
• Sends email notifications
We are going to build a Notebook that:
basically, sends you digest emails:
Start Zeppelin
./bin/zeppelin-daemon.sh start
& create a new notebook
http://zeppelin.incubator.apache.org/download.html
Zeppelin Architecture
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join logs w/ more data from Github API calls
• Shows simple HTML template, to visualise the
list
• Sends email notifications
Load Dependency
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join logs w/ more data from Github API calls
• Shows simple HTML template, to visualise the
list
• Sends email notifications
Download Data
In serial, sample, using shell interpreter
Download Data
In serial, whole day, using shell interpreter
Don’t need this as we have data
prepared
Download Data
In parallel, using Spark interpreter
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template to visualise the
list
• Sends email notifications
Read Data
Explore Data #1
Pie Chart of Event types
Explore Data #2
Top 10 organisations by event type
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template to visualise the
list
• Sends email notifications
Filter: cleanup
Only organisations open sourcing
repositories
Filter: interesting companies
Only organisations open sourcing
repositories
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
• Shows simple HTML template to visualize the
list
• Sends email notifications
Call Github API
Getting more information about
repository
GitHub personal access token to rise rate-limit
github.com/<username> => Edit Profile => Personal access tokens => Generate new
token
Join orgs and repos
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template
• Sends email notifications
HTML Preview
To generate a template
HTML Preview
To output the results
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template to visualise the
list
• Sends email notifications
Send Email
Send Email
Send Email
We are going to build a Notebook that:
• Downloads the latest data from GitHub Archive
• Read & explore the dataset
• Imports, filters the PublicEvent
• Join on external information through remote API
call
• Shows simple HTML template to visualise the
list
• Sends email notifications
Schedule
Kylin: interpreter setup
Kylin: consume data
Alexander Bezzubov
bzz@apache.org
http://s.apache.org/zeppelin-workshop

Contenu connexe

Tendances

Airflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesAirflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesDataWorks Summit
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin IntroductionLuke Han
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!Cloudera, Inc.
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsXiao Li
 
The Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke HanThe Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke HanLuke Han
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanLuke Han
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at TwitterPrasad Wagle
 
Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingLuke Han
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...DataWorks Summit/Hadoop Summit
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupLuke Han
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingLuke Han
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @ShanghaiLuke Han
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirLuciano Resende
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development KitJen Aman
 
Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)Prasad Wagle
 
Spark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache SparkSachin Aggarwal
 
Apache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big DataApache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big DataLuke Han
 

Tendances (20)

Airflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesAirflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data Pipelines
 
Apache Kylin Introduction
Apache Kylin IntroductionApache Kylin Introduction
Apache Kylin Introduction
 
Hw09 Hadoop Applications At Yahoo!
Hw09   Hadoop Applications At Yahoo!Hw09   Hadoop Applications At Yahoo!
Hw09 Hadoop Applications At Yahoo!
 
Koalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIsKoalas: Unifying Spark and pandas APIs
Koalas: Unifying Spark and pandas APIs
 
The Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke HanThe Apache Way - Building Open Source Community in China - Luke Han
The Apache Way - Building Open Source Community in China - Luke Han
 
The Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke HanThe Evolution of Apache Kylin by Luke Han
The Evolution of Apache Kylin by Luke Han
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
 
Spark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub HavaSpark Summit EU talk by Jakub Hava
Spark Summit EU talk by Jakub Hava
 
Apache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 BeijingApache Kylin Open Source Journey for QCon2015 Beijing
Apache Kylin Open Source Journey for QCon2015 Beijing
 
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
Open Source Ingredients for Interactive Data Analysis in Spark by Maxim Lukiy...
 
Adding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark MeetupAdding Spark support to Kylin at Bay Area Spark Meetup
Adding Spark support to Kylin at Bay Area Spark Meetup
 
Apache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 BeijingApache kylin - Big Data Technology Conference 2014 Beijing
Apache kylin - Big Data Technology Conference 2014 Beijing
 
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
6. Apache Kylin Roadmap and Community - Apache Kylin Meetup @Shanghai
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
 
Spark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar CastanedaSpark Summit EU talk by Oscar Castaneda
Spark Summit EU talk by Oscar Castaneda
 
Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)Zeppelin at twitter (sf data science meetup, july 2016)
Zeppelin at twitter (sf data science meetup, july 2016)
 
Spark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean WamplerSpark Summit EU talk by Dean Wampler
Spark Summit EU talk by Dean Wampler
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache Spark
 
Apache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big DataApache Kylin Extreme OLAP Engine for Big Data
Apache Kylin Extreme OLAP Engine for Big Data
 

Similaire à 4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai

Make an Instant Website with Webhooks
Make an Instant Website with WebhooksMake an Instant Website with Webhooks
Make an Instant Website with WebhooksAnne Gentle
 
IBM Connections REST API Klompendans
IBM Connections REST API KlompendansIBM Connections REST API Klompendans
IBM Connections REST API KlompendansHenning Schmidt
 
CICD Pipeline Using Github Actions
CICD Pipeline Using Github ActionsCICD Pipeline Using Github Actions
CICD Pipeline Using Github ActionsKumar Shìvam
 
Docs as Part of the Product - Open Source Summit North America 2018
Docs as Part of the Product - Open Source Summit North America 2018Docs as Part of the Product - Open Source Summit North America 2018
Docs as Part of the Product - Open Source Summit North America 2018Den Delimarsky
 
Dd13.2013.milano.open ntf
Dd13.2013.milano.open ntfDd13.2013.milano.open ntf
Dd13.2013.milano.open ntfUlrich Krause
 
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsDevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsAmazon Web Services
 
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...Neo4j
 
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13Dominopoint - Italian Lotus User Group
 
Learn Electron for Web Developers
Learn Electron for Web DevelopersLearn Electron for Web Developers
Learn Electron for Web DevelopersKyle Cearley
 
Machine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaMachine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaLINE Corporation
 
DEVNET-2002 Coding 201: Coding Skills 201: Going Further with REST and Python...
DEVNET-2002	Coding 201: Coding Skills 201: Going Further with REST and Python...DEVNET-2002	Coding 201: Coding Skills 201: Going Further with REST and Python...
DEVNET-2002 Coding 201: Coding Skills 201: Going Further with REST and Python...Cisco DevNet
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web APISammy Fung
 
Getting started with sbt
Getting started with sbtGetting started with sbt
Getting started with sbtikenna4u
 
XPages -Beyond the Basics
XPages -Beyond the BasicsXPages -Beyond the Basics
XPages -Beyond the BasicsUlrich Krause
 
Guidelines for Working with Contract Developers in Evergreen
Guidelines for Working with Contract Developers in EvergreenGuidelines for Working with Contract Developers in Evergreen
Guidelines for Working with Contract Developers in Evergreenloriayre
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Mandi Walls
 
An Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamAn Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamMikkel Flindt Heisterberg
 
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...Amazon Web Services
 
How to Play at Work - A Play Framework Tutorial
How to Play at Work - A Play Framework TutorialHow to Play at Work - A Play Framework Tutorial
How to Play at Work - A Play Framework TutorialAssistSoftware
 

Similaire à 4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai (20)

Make an Instant Website with Webhooks
Make an Instant Website with WebhooksMake an Instant Website with Webhooks
Make an Instant Website with Webhooks
 
IBM Connections REST API Klompendans
IBM Connections REST API KlompendansIBM Connections REST API Klompendans
IBM Connections REST API Klompendans
 
CICD Pipeline Using Github Actions
CICD Pipeline Using Github ActionsCICD Pipeline Using Github Actions
CICD Pipeline Using Github Actions
 
Docs as Part of the Product - Open Source Summit North America 2018
Docs as Part of the Product - Open Source Summit North America 2018Docs as Part of the Product - Open Source Summit North America 2018
Docs as Part of the Product - Open Source Summit North America 2018
 
Dd13.2013.milano.open ntf
Dd13.2013.milano.open ntfDd13.2013.milano.open ntf
Dd13.2013.milano.open ntf
 
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer ToolsDevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
DevOps on AWS: Accelerating Software Delivery with the AWS Developer Tools
 
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
GraphDay Paris - Intégrer des flux de données dans Neo4j avec l'ETL Open Sour...
 
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
The Latest and Greatest from OpenNTF and the IBM Social Business Toolkit, #dd13
 
Learn Electron for Web Developers
Learn Electron for Web DevelopersLearn Electron for Web Developers
Learn Electron for Web Developers
 
Machine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE FukuokaMachine Learning Platform in LINE Fukuoka
Machine Learning Platform in LINE Fukuoka
 
DEVNET-2002 Coding 201: Coding Skills 201: Going Further with REST and Python...
DEVNET-2002	Coding 201: Coding Skills 201: Going Further with REST and Python...DEVNET-2002	Coding 201: Coding Skills 201: Going Further with REST and Python...
DEVNET-2002 Coding 201: Coding Skills 201: Going Further with REST and Python...
 
Open Data and Web API
Open Data and Web APIOpen Data and Web API
Open Data and Web API
 
Getting started with sbt
Getting started with sbtGetting started with sbt
Getting started with sbt
 
INLS461_day14a.ppt
INLS461_day14a.pptINLS461_day14a.ppt
INLS461_day14a.ppt
 
XPages -Beyond the Basics
XPages -Beyond the BasicsXPages -Beyond the Basics
XPages -Beyond the Basics
 
Guidelines for Working with Contract Developers in Evergreen
Guidelines for Working with Contract Developers in EvergreenGuidelines for Working with Contract Developers in Evergreen
Guidelines for Working with Contract Developers in Evergreen
 
Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017Habitat Workshop at Velocity London 2017
Habitat Workshop at Velocity London 2017
 
An Introduction to Working With the Activity Stream
An Introduction to Working With the Activity StreamAn Introduction to Working With the Activity Stream
An Introduction to Working With the Activity Stream
 
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
AWS re:Invent 2016: DevOps on AWS: Accelerating Software Delivery with the AW...
 
How to Play at Work - A Play Framework Tutorial
How to Play at Work - A Play Framework TutorialHow to Play at Work - A Play Framework Tutorial
How to Play at Work - A Play Framework Tutorial
 

Plus de Luke Han

Augmented OLAP for Big Data
Augmented OLAP for Big DataAugmented OLAP for Big Data
Augmented OLAP for Big DataLuke Han
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainLuke Han
 
Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsLuke Han
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSILuke Han
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanLuke Han
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @ShanghaiLuke Han
 
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @ShanghaiLuke Han
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015Luke Han
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine TourLuke Han
 
Actuate presentation 2011
Actuate presentation   2011Actuate presentation   2011
Actuate presentation 2011Luke Han
 

Plus de Luke Han (10)

Augmented OLAP for Big Data
Augmented OLAP for Big DataAugmented OLAP for Big Data
Augmented OLAP for Big Data
 
Apache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data SpainApache Kylin and Use Cases - 2018 Big Data Spain
Apache Kylin and Use Cases - 2018 Big Data Spain
 
Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics Products
 
Building Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSIBuilding Enterprise OLAP on Hadoop for FSI
Building Enterprise OLAP on Hadoop for FSI
 
Apache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and JapanApache Kylin Use Cases in China and Japan
Apache Kylin Use Cases in China and Japan
 
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
3. Apache Tez Introducation - Apache Kylin Meetup @Shanghai
 
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
5. Apache Kylin的金融大数据应用场景 - Apache Kylin Meetup @Shanghai
 
ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015ApacheKylin_HBaseCon2015
ApacheKylin_HBaseCon2015
 
Kylin OLAP Engine Tour
Kylin OLAP Engine TourKylin OLAP Engine Tour
Kylin OLAP Engine Tour
 
Actuate presentation 2011
Actuate presentation   2011Actuate presentation   2011
Actuate presentation 2011
 

Dernier

Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdfAndrey Devyatkin
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogueitservices996
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identityteam-WIBU
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfmaor17
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shardsChristopher Curtin
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 

Dernier (20)

Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
2024-04-09 - From Complexity to Clarity - AWS Summit AMS.pdf
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Ronisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited CatalogueRonisha Informatics Private Limited Catalogue
Ronisha Informatics Private Limited Catalogue
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Post Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on IdentityPost Quantum Cryptography – The Impact on Identity
Post Quantum Cryptography – The Impact on Identity
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Zer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdfZer0con 2024 final share short version.pdf
Zer0con 2024 final share short version.pdf
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards2024 DevNexus Patterns for Resiliency: Shuffle shards
2024 DevNexus Patterns for Resiliency: Shuffle shards
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 

4.Building a Data Product using apache Zeppelin - Apache Kylin Meetup @Shanghai

Notes de l'éditeur

  1. short intro designed as a workshop - go home with some practical skills
  2. Go though some history of Zeppelin project and then I’ll guide through a hands-on example of one data product
  3. any “product”, based on data
  4. Easy, because of Apache &BigData
  5. ASF is a leader in BigData ecosystem one of the biggest foundations, 200+ projects gives you the power to crunch the data that only big companies have interesting engineering challenges, job
  6. a lot computing options some of them do have a web ui to interact with, but are project-secific
  7. with Z - you have visualisation options, agnostic to the backend allows you to be flexible with your stack Spark is the most advanced interpreter now - historical reasons
  8. * what is Zeppelin? and some history
  9. started at NFLabs in Seoul, South Korea open-source (free, as a beer) since 2013 when mature enough, to provide a value to community -> ASF
  10. comercial app
  11. prototype OSS free as a beer
  12. visualisations 2 different backend Hive and Shark
  13. with pluggable backend systems, thought ‘interpreters’
  14. so, what exactly is Zeppelin that’s how it look like in 10 months under ASF
  15. interpreters abstracted was very important
  16. if you want to build a data product - you better be prepared for scale quality of the product depends on the amount of data
  17. * plan for both: a cluster (cloud) and a laptop (dev prod) for free * …and you have those, thanks to ASF!
  18. good example of data product using web logs!
  19. talked about Apache a lot (Kylin, Tez..) we all love Opensource Github is the biggest opensource code repository problem: you cannot follow organisation on Github!
  20. oss is awesome, so many things happen - hard to keep up problem: you cannot follow organisation on Github! say eBay, or any other tech company you are interested in OSS something now have software\hardware and an Idea for data product. Something missing
  21. Github opensources their log data may companies - log-generating machines, but GH pushes it further * available to crunch, using tones of closed-sourced engineering effort
  22. Would not that be cool?
  23. Hands-on part
  24. break-down: 8 steps
  25. build a missing feature definition of “interesting” may vary i.e not just “new from big companies”, but intelligent suggestions based on
  26. going to be using different interpreters: shell, spark, kylin
  27. break-down: 8 steps
  28. break-down: 8 steps
  29. sequential download, does not utilise full channel bandwidth
  30. * simplicity! * in 24 parallel threads