SlideShare une entreprise Scribd logo
1  sur  30
Télécharger pour lire hors ligne
To Have Own Data
Analytics Platform,
Or NOT To
青山エンジニア勉強交流会 April 24, 2017
Satoshi Tagomori (@tagomoris)
Satoshi "Moris" Tagomori
(@tagomoris)
Fluentd, MessagePack-Ruby, Norikra, ...
Treasure Data, Inc.
http://tsuchinoko.dmmlabs.com/?p=1770
At Feb 23, 2015
• To Have Own Data Analytics Platform, Or NOT To,
In Startup Companies:
• "NOT To, in general"
• Data analytics services:
• AWS EMR, Redshift
• Google BigQuery
• Treasure Data
Options In 2017
• On Premise
• Cloudera CDH, Hortonworks HDP, ...
• Services
• AWS EMR, Redshift, Athena, Kinesis Analytics, ...
• Google BigQuery, Cloud Dataflow, Cloud
Dataproc, ...
• MS Azure SQL Data Warehouse, Stream Analytics,
Data Lake Analytics, ...
• Treasure Data
TO HAVE
OR
NOT TO HAVE
?
DO NOT
😝
Anyway,
NO FINE CONCLUSION
IN THIS PRESENTATION
On Premise Platform In Past
• 2011-2014: On-premise Hadoop&Presto cluster
• w/ Fluentd stream processing cluster
• w/ Norikra stream processing
• w/ Web UI (Shib)
https://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon-2014-taiwan
To Be Considered
• Distributed Processing Platform
• Data Management
• Process Management
• Platform Management
• Visualization and BI
• Connecting Data
Distributed Processing Platform
• Hadoop, Presto, Spark, Flink, Storm, ...
• + Servers
• EMR, Redshift, Dataproc, ...
• Cost per instances
• BigQuery, Athena, Treasure Data, ....
• Cost per data/queries/...
Data Management
• How to collect data?
• How to ingest data?
• How to manage schema?
• How to move data from here to there?
Process Management
• How to run queries on schedule?
• How to build workflow between queries?
• How to run queries after data ingestion?
• How to move data from the platform to elsewhere
after queries?
Platform Management
• How to upgrade software?
• How to add nodes?
• How to manage failures / downtime?
• How to replace hardware?
• How to switch platforms?
• How to provide compatibility for queries?
Visualization and BI
• How to show query results graphically?
• How to show relations between data graphically?
• How to query data interactively?
Connecting Data
• How to join logs and master data?
• How to join logs and user list?
• How to join logs and CRM data?
• How to push query results to marketing tools/
services?
• How to send notifications using query results?
Additional Topics
• Stream Processing Platform
• Machine Learning Platform
• AI(?) Services
In My Past Case:
• Distributed Processing Platform
• Hadoop & Presto (& Norikra)
• Data Management
• Hive schema & Custom made UI (Shib)
• Managed by engineers of each services
• Process Management
• Custom made query scheduler (ShibUI)
• Platform Management
• By tagomoris
• Visualization, BI: N/A
• Connecting Data: N/A
About Treasure Data
• Distributed Processing Platform: Hive, Presto
• Data Management: Fluentd & Schema-less DB
• Process Management: Digdag / Treasure Workflow
• Platform Management: Automatic
• Visualization and BI: Treasure BI
• Connecting Data: Embulk / Data Connector
😝
Recent Improvements around Data Analytics
• Improvements of CDH/HDP to manage clusters
• Online Upgrade
• Support many processing frameworks
• Many new data processing software/frameworks
• Apache Flink, Apache Arrow, Apache Beam, ...
• Many new services available
• Stream processing, Machine learning, ...
MONEY
• Saving money is important - it's true.
MONEY
• Saving money introduces many issues - it's true!
MONEY
• Money solves many problems - is it true?
Complexity
• Connecting data / processing with applications
• Connecting data / processing with services
• Connecting data / processing with people
Chasing the World
• Many new software / services / platform /
paradigm, day by day
• Data sizes are growing day by day
• Complexity is growing day by day
• A data platform CANNOT live as-is 5 years!
Finding Treasure From Data
• "Data Processing" is:
• NOT the purpose
• just a tool to get something great
• Use developers and their time to find treasures!
TBD
Thank you!
@tagomoris

Contenu connexe

Tendances

Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoSadayuki Furuhashi
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Treasure Data, Inc.
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQLSATOSHI TAGOMORI
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015N Masahiro
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkVinoth Chandar
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Sadayuki Furuhashi
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at TwitterBill Graham
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableShu Ting Tseng
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016kbajda
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupWojciech Biela
 

Tendances (20)

Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -Plazma - Treasure Data’s distributed analytical database -
Plazma - Treasure Data’s distributed analytical database -
 
Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Lambda Architecture Using SQL
Lambda Architecture Using SQLLambda Architecture Using SQL
Lambda Architecture Using SQL
 
Norikra Recent Updates
Norikra Recent UpdatesNorikra Recent Updates
Norikra Recent Updates
 
Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015Treasure Data and AWS - Developers.io 2015
Treasure Data and AWS - Developers.io 2015
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Presto
PrestoPresto
Presto
 
tdtechtalk20160330johan
tdtechtalk20160330johantdtechtalk20160330johan
tdtechtalk20160330johan
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
 
Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014Presto - Hadoop Conference Japan 2014
Presto - Hadoop Conference Japan 2014
 
Presto at Twitter
Presto at TwitterPresto at Twitter
Presto at Twitter
 
Fluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Presto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop MeetupPresto for the Enterprise @ Hadoop Meetup
Presto for the Enterprise @ Hadoop Meetup
 

En vedette

Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkThamme Gowda
 
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...Spark Summit
 
Dokkuの活用と内部構造
Dokkuの活用と内部構造Dokkuの活用と内部構造
Dokkuの活用と内部構造修平 富田
 
Sparkler - Spark Crawler
Sparkler - Spark Crawler Sparkler - Spark Crawler
Sparkler - Spark Crawler Thamme Gowda
 
ひとつひとつ組み上げてみていくビデオチャットアプリのアーキテクチャ
ひとつひとつ組み上げてみていくビデオチャットアプリのアーキテクチャひとつひとつ組み上げてみていくビデオチャットアプリのアーキテクチャ
ひとつひとつ組み上げてみていくビデオチャットアプリのアーキテクチャKoki Kumagai
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldSATOSHI TAGOMORI
 
Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"SATOSHI TAGOMORI
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and ThenSATOSHI TAGOMORI
 
20160730 fluentd meetup in matsue slide
20160730 fluentd meetup in matsue slide20160730 fluentd meetup in matsue slide
20160730 fluentd meetup in matsue slidecosmo0920
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In RubySATOSHI TAGOMORI
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersSATOSHI TAGOMORI
 
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05都元ダイスケ Miyamoto
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage SystemsSATOSHI TAGOMORI
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsSATOSHI TAGOMORI
 

En vedette (14)

Clustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache SparkClustering output of Apache Nutch using Apache Spark
Clustering output of Apache Nutch using Apache Spark
 
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
Sparkler—Crawler on Apache Spark: Spark Summit East talk by Karanjeet Singh a...
 
Dokkuの活用と内部構造
Dokkuの活用と内部構造Dokkuの活用と内部構造
Dokkuの活用と内部構造
 
Sparkler - Spark Crawler
Sparkler - Spark Crawler Sparkler - Spark Crawler
Sparkler - Spark Crawler
 
ひとつひとつ組み上げてみていくビデオチャットアプリのアーキテクチャ
ひとつひとつ組み上げてみていくビデオチャットアプリのアーキテクチャひとつひとつ組み上げてみていくビデオチャットアプリのアーキテクチャ
ひとつひとつ組み上げてみていくビデオチャットアプリのアーキテクチャ
 
Modern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real WorldModern Black Mages Fighting in the Real World
Modern Black Mages Fighting in the Real World
 
Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"Fighting API Compatibility On Fluentd Using "Black Magic"
Fighting API Compatibility On Fluentd Using "Black Magic"
 
Fluentd Overview, Now and Then
Fluentd Overview, Now and ThenFluentd Overview, Now and Then
Fluentd Overview, Now and Then
 
20160730 fluentd meetup in matsue slide
20160730 fluentd meetup in matsue slide20160730 fluentd meetup in matsue slide
20160730 fluentd meetup in matsue slide
 
How To Write Middleware In Ruby
How To Write Middleware In RubyHow To Write Middleware In Ruby
How To Write Middleware In Ruby
 
The Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and ContainersThe Patterns of Distributed Logging and Containers
The Patterns of Distributed Logging and Containers
 
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
AWSにおけるバッチ処理の ベストプラクティス - Developers.IO Meetup 05
 
Ruby and Distributed Storage Systems
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
 
Fluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API DetailsFluentd v0.14 Plugin API Details
Fluentd v0.14 Plugin API Details
 

Similaire à To Have Own Data Analytics Platform, Or NOT To

Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesSATOSHI TAGOMORI
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experienceVitaliy Bashun
 
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectHadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectSoftServe
 
SPS Chevy Chase Tips on migrating to Office 365
SPS Chevy Chase Tips on migrating to Office 365SPS Chevy Chase Tips on migrating to Office 365
SPS Chevy Chase Tips on migrating to Office 365Mike Maadarani
 
Text Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & AzureText Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & AzureSanil Mhatre
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final VersionJanani Eshwaran
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final VersionJanani Eshwaran
 
Power BI - 2016 - Public
Power BI - 2016 - PublicPower BI - 2016 - Public
Power BI - 2016 - PublicJulian Payne
 
SKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISequelGate
 
SharePoint Tips and Tricks to avoid migration headaches
SharePoint Tips and Tricks to avoid migration headachesSharePoint Tips and Tricks to avoid migration headaches
SharePoint Tips and Tricks to avoid migration headachesMike Maadarani
 
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...Perficient, Inc.
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenChristopher Whitaker
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Vantara
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist ToolboxAndrei Savu
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics PlatformN Masahiro
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...Mark Rittman
 
CCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysisCCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysiswalk2talk srl
 
POWER BI Training From SQL SchoolV2.pptx
POWER BI Training From SQL SchoolV2.pptxPOWER BI Training From SQL SchoolV2.pptx
POWER BI Training From SQL SchoolV2.pptxSequelGate
 
Text Mining & Sentiment Analysis made easy, with Azure and Power BI
Text Mining & Sentiment Analysis made easy, with Azure and Power BIText Mining & Sentiment Analysis made easy, with Azure and Power BI
Text Mining & Sentiment Analysis made easy, with Azure and Power BISanil Mhatre
 

Similaire à To Have Own Data Analytics Platform, Or NOT To (20)

Data-Driven Development Era and Its Technologies
Data-Driven Development Era and Its TechnologiesData-Driven Development Era and Its Technologies
Data-Driven Development Era and Its Technologies
 
Pacemaker hadoop infrastructure and soft serve experience
Pacemaker   hadoop infrastructure and soft serve experiencePacemaker   hadoop infrastructure and soft serve experience
Pacemaker hadoop infrastructure and soft serve experience
 
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data ArchitectHadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
Hadoop Infrastructure and SoftServe Experience by Vitaliy Bashun, Data Architect
 
SPS Chevy Chase Tips on migrating to Office 365
SPS Chevy Chase Tips on migrating to Office 365SPS Chevy Chase Tips on migrating to Office 365
SPS Chevy Chase Tips on migrating to Office 365
 
Text Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & AzureText Mining & Sentiment Analysis with Power BI & Azure
Text Mining & Sentiment Analysis with Power BI & Azure
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
 
MS Azure with IoT - Final Version
MS Azure with IoT - Final VersionMS Azure with IoT - Final Version
MS Azure with IoT - Final Version
 
Power BI - 2016 - Public
Power BI - 2016 - PublicPower BI - 2016 - Public
Power BI - 2016 - Public
 
SKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BISKill Up Yourself with SQL, Azure, Power BI
SKill Up Yourself with SQL, Azure, Power BI
 
SharePoint Tips and Tricks to avoid migration headaches
SharePoint Tips and Tricks to avoid migration headachesSharePoint Tips and Tricks to avoid migration headaches
SharePoint Tips and Tricks to avoid migration headaches
 
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
Hybrid Analytics in Healthcare: Leveraging Power BI and Office 365 to Make Sm...
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
Hitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop SolutionHitachi Data Systems Hadoop Solution
Hitachi Data Systems Hadoop Solution
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
 
Technologies for Data Analytics Platform
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
 
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...Riga dev day 2016   adding a data reservoir and oracle bdd to extend your ora...
Riga dev day 2016 adding a data reservoir and oracle bdd to extend your ora...
 
CCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysisCCI2018 - Real-time dashboard whatif analysis
CCI2018 - Real-time dashboard whatif analysis
 
POWER BI Training From SQL SchoolV2.pptx
POWER BI Training From SQL SchoolV2.pptxPOWER BI Training From SQL SchoolV2.pptx
POWER BI Training From SQL SchoolV2.pptx
 
Text Mining & Sentiment Analysis made easy, with Azure and Power BI
Text Mining & Sentiment Analysis made easy, with Azure and Power BIText Mining & Sentiment Analysis made easy, with Azure and Power BI
Text Mining & Sentiment Analysis made easy, with Azure and Power BI
 

Plus de SATOSHI TAGOMORI

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speedSATOSHI TAGOMORI
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsSATOSHI TAGOMORI
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of RubySATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)SATOSHI TAGOMORI
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script ConfusingSATOSHI TAGOMORI
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubySATOSHI TAGOMORI
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsSATOSHI TAGOMORI
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the WorldSATOSHI TAGOMORI
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamSATOSHI TAGOMORI
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDSATOSHI TAGOMORI
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Engineer as a Leading Role
Engineer as a Leading RoleEngineer as a Leading Role
Engineer as a Leading RoleSATOSHI TAGOMORI
 

Plus de SATOSHI TAGOMORI (14)

Ractor's speed is not light-speed
Ractor's speed is not light-speedRactor's speed is not light-speed
Ractor's speed is not light-speed
 
Good Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/OperationsGood Things and Hard Things of SaaS Development/Operations
Good Things and Hard Things of SaaS Development/Operations
 
Maccro Strikes Back
Maccro Strikes BackMaccro Strikes Back
Maccro Strikes Back
 
Invitation to the dark side of Ruby
Invitation to the dark side of RubyInvitation to the dark side of Ruby
Invitation to the dark side of Ruby
 
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)Hijacking Ruby Syntax in Ruby (RubyConf 2018)
Hijacking Ruby Syntax in Ruby (RubyConf 2018)
 
Make Your Ruby Script Confusing
Make Your Ruby Script ConfusingMake Your Ruby Script Confusing
Make Your Ruby Script Confusing
 
Hijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in RubyHijacking Ruby Syntax in Ruby
Hijacking Ruby Syntax in Ruby
 
Lock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive OperationsLock, Concurrency and Throughput of Exclusive Operations
Lock, Concurrency and Throughput of Exclusive Operations
 
Data Processing and Ruby in the World
Data Processing and Ruby in the WorldData Processing and Ruby in the World
Data Processing and Ruby in the World
 
Planet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
 
Fluentd 101
Fluentd 101Fluentd 101
Fluentd 101
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TD
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Engineer as a Leading Role
Engineer as a Leading RoleEngineer as a Leading Role
Engineer as a Leading Role
 

Dernier

cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptrcbcrtm
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 

Dernier (20)

cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 

To Have Own Data Analytics Platform, Or NOT To

  • 1. To Have Own Data Analytics Platform, Or NOT To 青山エンジニア勉強交流会 April 24, 2017 Satoshi Tagomori (@tagomoris)
  • 2. Satoshi "Moris" Tagomori (@tagomoris) Fluentd, MessagePack-Ruby, Norikra, ... Treasure Data, Inc.
  • 3.
  • 5. At Feb 23, 2015 • To Have Own Data Analytics Platform, Or NOT To, In Startup Companies: • "NOT To, in general" • Data analytics services: • AWS EMR, Redshift • Google BigQuery • Treasure Data
  • 6. Options In 2017 • On Premise • Cloudera CDH, Hortonworks HDP, ... • Services • AWS EMR, Redshift, Athena, Kinesis Analytics, ... • Google BigQuery, Cloud Dataflow, Cloud Dataproc, ... • MS Azure SQL Data Warehouse, Stream Analytics, Data Lake Analytics, ... • Treasure Data
  • 11. NO FINE CONCLUSION IN THIS PRESENTATION
  • 12. On Premise Platform In Past • 2011-2014: On-premise Hadoop&Presto cluster • w/ Fluentd stream processing cluster • w/ Norikra stream processing • w/ Web UI (Shib) https://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon-2014-taiwan
  • 13. To Be Considered • Distributed Processing Platform • Data Management • Process Management • Platform Management • Visualization and BI • Connecting Data
  • 14. Distributed Processing Platform • Hadoop, Presto, Spark, Flink, Storm, ... • + Servers • EMR, Redshift, Dataproc, ... • Cost per instances • BigQuery, Athena, Treasure Data, .... • Cost per data/queries/...
  • 15. Data Management • How to collect data? • How to ingest data? • How to manage schema? • How to move data from here to there?
  • 16. Process Management • How to run queries on schedule? • How to build workflow between queries? • How to run queries after data ingestion? • How to move data from the platform to elsewhere after queries?
  • 17. Platform Management • How to upgrade software? • How to add nodes? • How to manage failures / downtime? • How to replace hardware? • How to switch platforms? • How to provide compatibility for queries?
  • 18. Visualization and BI • How to show query results graphically? • How to show relations between data graphically? • How to query data interactively?
  • 19. Connecting Data • How to join logs and master data? • How to join logs and user list? • How to join logs and CRM data? • How to push query results to marketing tools/ services? • How to send notifications using query results?
  • 20. Additional Topics • Stream Processing Platform • Machine Learning Platform • AI(?) Services
  • 21. In My Past Case: • Distributed Processing Platform • Hadoop & Presto (& Norikra) • Data Management • Hive schema & Custom made UI (Shib) • Managed by engineers of each services • Process Management • Custom made query scheduler (ShibUI) • Platform Management • By tagomoris • Visualization, BI: N/A • Connecting Data: N/A
  • 22. About Treasure Data • Distributed Processing Platform: Hive, Presto • Data Management: Fluentd & Schema-less DB • Process Management: Digdag / Treasure Workflow • Platform Management: Automatic • Visualization and BI: Treasure BI • Connecting Data: Embulk / Data Connector 😝
  • 23. Recent Improvements around Data Analytics • Improvements of CDH/HDP to manage clusters • Online Upgrade • Support many processing frameworks • Many new data processing software/frameworks • Apache Flink, Apache Arrow, Apache Beam, ... • Many new services available • Stream processing, Machine learning, ...
  • 24. MONEY • Saving money is important - it's true.
  • 25. MONEY • Saving money introduces many issues - it's true!
  • 26. MONEY • Money solves many problems - is it true?
  • 27. Complexity • Connecting data / processing with applications • Connecting data / processing with services • Connecting data / processing with people
  • 28. Chasing the World • Many new software / services / platform / paradigm, day by day • Data sizes are growing day by day • Complexity is growing day by day • A data platform CANNOT live as-is 5 years!
  • 29. Finding Treasure From Data • "Data Processing" is: • NOT the purpose • just a tool to get something great • Use developers and their time to find treasures!