Apache Kudu - Updatable Analytical Storage #rakutentech
Cloudera Japan
https://rakutentechnologyconference2017.sched.com/speaker/shoshimauchi
1.
© Cloudera, Inc. All rights reserved.
Apache Kudu
Updatable Analytical Storage for Modern Data Platform
Sho Shimauchi | Sales Engineer | Cloudera
2.
Who Am I?
Sho Shimauchi
Sales Engineer / Technical Evangelist
Joined Cloudera in 2011
The first employee in Cloudera APJ
Email: sho@cloudera.com
Twitter: @shiumachi
3.
• Founded in 2008
• 1600+ Clouderans
• Machine learning and analytics platform
• Shared data experience
• Cloud-native and cloud-differentiated
• Open-source innovation and efficiency
4.
Rakuten Card + Cloudera
Rakuten Card replaced its mainframe with Cloudera Enterprise in 2017.
Apache Spark improved the performance of the batch processes by more than 2x.
Please join Cloudera World Tokyo 2017 to see Kobayashi-san's keynote! www.clouderaworldtokyo.com
5.
Why Kudu?
Use Cases and Motivation
6.
The modern platform for machine learning and analytics optimized for the cloud
[Diagram: Cloudera Enterprise. Labels: Extensible Services; Core Services; Data Engineering; Operational Database; Analytic Database; Data Science; Data Catalog; Ingest & Replication; Security; Governance; Workload Management; New Offerings. Storage Services: Amazon S3, Microsoft ADLS, HDFS, Kudu.]
7.
Filling the Analytic Gap
[Chart: Pace of Data vs. Pace of Analysis. Systems shown: HDFS, HBase, Kudu. Region labels: Fast Scans, Analytics and Processing of Stored Data; Fast On-Line Updates & Data Serving; Arbitrary Storage (Active Archive); Fast Analytics (on fast-changing or frequently-updated data). Scale labels: Unchanging; Append-Only; Fast Changing, Frequent Updates; Real-Time. The region between the existing systems is marked "Analytic Gap".]
Kudu fills the gap.
Modern analytic applications often require complex data flow and difficult integration work to move data between HBase and HDFS.
8.
Apache Kudu: Scalable and fast structured storage
Scalable
• Tested up to 300+ nodes (PB-scale cluster)
• Designed to scale to 1000s of nodes and tens of PBs
Fast
• Multiple GB/second read throughput per node
• Millions of read/write operations per second across the cluster
Tabular
• Represents data in structured tables like a relational database
• Strict schema, finite column count, no BLOBs
• Individual record-level access to 100+ billion row tables
9.
Apache Kudu Community
10.
Why Kudu?
Time Series Data
• Can you insert time series data in real time?
• How long does it take to prepare it for analysis?
• Can you get results and act fast enough to change outcomes?
Machine Data Analytics
• Can you handle large volumes of machine-generated data?
• Do you have the tools to identify problems or threats?
• Can your system do machine learning?
Online Reporting
• How fast can you add data to your data store?
• Are you trading off the ability to do broad analytics for the ability to make updates?
• Are you retaining only part of your data?
11.
Next generation hardware
Solid-state Storage
• Cheaper and faster every year. Persistent memory (3D XPoint™).
• Kudu can take advantage of SSD and NVM using Intel's NVM Library.
Cheaper, Bigger Memory
• RAM is cheaper and bigger every day.
• Kudu runs smoothly with huge RAM. Written in C++ to avoid GC issues.
Efficiency on Modern CPUs
• Modern CPUs are adding cores and SIMD width, not GHz.
• Kudu takes advantage of SIMD instructions and concurrent data structures.
12.
How it Works
Replication and Fault Tolerance
13.
Tables, tablets, and tablet servers
• Each table is horizontally partitioned into tablets
• Range or hash partitioning
• PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
• Each tablet has N replicas (3 or 5) with Raft consensus
• Automatic fault tolerance
• MTTR (mean time to repair): ~5 seconds
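As a rough illustration of the hash partitioning described on this slide, the sketch below maps a timestamp to one of 100 buckets. The hash function (CRC32) and the helper names `bucket_for` and `group_by_bucket` are assumptions chosen for illustration; the slide does not say which hash function or key encoding Kudu uses.

```python
import zlib

NUM_BUCKETS = 100  # matches "INTO 100 BUCKETS" on the slide


def bucket_for(timestamp: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a timestamp to a bucket (tablet) index.

    zlib.crc32 is a stand-in hash used for illustration only;
    it is not claimed to be the hash Kudu applies internally.
    """
    key_bytes = str(timestamp).encode("utf-8")
    return zlib.crc32(key_bytes) % num_buckets


def group_by_bucket(timestamps):
    """Group rows (represented here by their timestamps) by target bucket."""
    buckets = {}
    for ts in timestamps:
        buckets.setdefault(bucket_for(ts), []).append(ts)
    return buckets
```

Because the bucket depends only on the hashed column, rows with consecutive timestamps tend to spread across tablets instead of all landing on one, which is the usual motivation for hashing a monotonically increasing key.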
14.
Metadata
Replicated master
• Acts as a tablet directory
• Acts as a catalog (which tables exist, etc.)
• Acts as a load balancer (tracks TS liveness, re-replicates under-replicated tablets)
• Caches all metadata in RAM for high performance
Client configured with master addresses
• Asks master for tablet locations as needed and caches them
15.
Client: Hey Master! Where is the row for 'tlipcon' in table "T"?
Master: It's part of tablet 2, which is on servers {Z,Y,X}. BTW, here's info on other tablets you might care about: T1, T2, T3, …
Client: UPDATE tlipcon SET col=foo
Meta Cache: T1: … T2: … T3: …
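The exchange above (ask the master once, then reuse the cached answer) can be sketched as follows. `FakeMaster` and `Client` are hypothetical classes written for this illustration and are not part of any Kudu client API. The cache here is keyed by exact row key for simplicity, whereas a real tablet directory would map key ranges to tablets.

```python
class FakeMaster:
    """Hypothetical stand-in for the master's tablet directory."""

    def __init__(self, locations):
        # (table, row_key) -> (tablet_id, list of tablet servers)
        self._locations = locations
        self.lookups = 0  # counts how often the master was contacted

    def locate(self, table, row_key):
        self.lookups += 1
        return self._locations[(table, row_key)]


class Client:
    """Hypothetical client that caches tablet locations (the 'meta cache')."""

    def __init__(self, master):
        self._master = master
        self._meta_cache = {}

    def find_tablet(self, table, row_key):
        cache_key = (table, row_key)
        if cache_key not in self._meta_cache:
            # Cache miss: ask the master and remember the answer.
            self._meta_cache[cache_key] = self._master.locate(table, row_key)
        return self._meta_cache[cache_key]
```

With this shape, only the first access to a given row contacts the master; later reads and writes go straight to the tablet servers named in the cache.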
16.
Raft consensus
[Diagram: Client; TS A hosts Tablet 1 (LEADER); TS B and TS C each host Tablet 1 (FOLLOWER); each tablet server has its own WAL.]
1a. Client->Leader: Write() RPC
2a. Leader->Followers: UpdateConsensus() RPC
2b. Leader writes local WAL
3. Follower: write WAL
4. Follower->Leader: success
5. Leader has achieved majority
6. Leader->Client: Success!
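The write sequence on this slide can be approximated with a small simulation. This is a deliberately simplified sketch: the function `replicate_write` and its arguments are hypothetical, the "RPCs" are plain list appends, and leader election, terms, and log repair are left out entirely.

```python
def replicate_write(entry, leader_wal, follower_wals, follower_up):
    """Simulate one replicated write and report whether it reached a majority.

    leader_wal / follower_wals are plain lists standing in for write-ahead logs.
    follower_up[i] says whether follower i is reachable for this write.
    """
    leader_wal.append(entry)      # step 2b: leader writes its local WAL
    acks = 1                      # the leader's own WAL write counts as one vote

    for wal, is_up in zip(follower_wals, follower_up):
        if is_up:                 # step 2a: UpdateConsensus() reaches the follower
            wal.append(entry)     # step 3: follower writes its WAL
            acks += 1             # step 4: follower reports success

    replicas = 1 + len(follower_wals)
    return acks > replicas // 2   # steps 5-6: success only with a majority
```

With three replicas, one unreachable follower still leaves two acknowledgements out of three, so the write succeeds; with both followers unreachable it does not.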
17.
How it Works
Columnar Storage
18.
Row Storage
Columns: tweet_id, user_name, created_at, text
{23059873, newsycbot, 1442865158, Visual exp…}
{22309487, RideImpala, 1442828307, Introducing …}
…
Scans have to read all the data, no encodings
19.
Columnar Storage
tweet_id: {25059873, 22309487, 23059861, 23010982}
user_name: {newsycbot, RideImpala, fastly, llvmorg}
created_at: {1442865158, 1442828307, 1442865156, 1442865155}
text: {Visual exp…, Introducing .., Missing July…, LLVM 3.7….}
20.
Columnar Storage
SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot';
tweet_id (1GB): {25059873, 22309487, 23059861, 23010982}
user_name (2GB): {newsycbot, RideImpala, fastly, llvmorg}
created_at (1GB): {1442865158, 1442828307, 1442865156, 1442865155}
text (200GB): {Visual exp…, Introducing .., Missing July…, LLVM 3.7….}
Only read 1 column: the query touches user_name (2GB) and skips the rest.
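To make the saving concrete, the sketch below adds up the column sizes quoted on this slide and runs the same count over a tiny in-memory columnar table built from the four sample rows. The helper names are hypothetical, and the `text` column is omitted from the sample table because its values are truncated on the slide.

```python
# Column sizes as quoted on the slide, in GB.
COLUMN_SIZES_GB = {"tweet_id": 1, "user_name": 2, "created_at": 1, "text": 200}


def gb_scanned(columns_needed, sizes=COLUMN_SIZES_GB):
    """Total data read when only the named columns are touched."""
    return sum(sizes[name] for name in columns_needed)


# A row store has to read every column; a column store reads only user_name.
row_store_gb = gb_scanned(COLUMN_SIZES_GB.keys())
column_store_gb = gb_scanned(["user_name"])

# Tiny columnar table using the sample values from the slide.
tweets = {
    "tweet_id": [25059873, 22309487, 23059861, 23010982],
    "user_name": ["newsycbot", "RideImpala", "fastly", "llvmorg"],
    "created_at": [1442865158, 1442828307, 1442865156, 1442865155],
}


def count_where_user(table, user):
    """COUNT(*) ... WHERE user_name = user, touching a single column."""
    return sum(1 for name in table["user_name"] if name == user)
```

With the slide's numbers, the row-oriented scan reads 204 GB while the columnar scan reads 2 GB, roughly 1% of the data.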
21.
Columnar Compression
created_at: {1442825158, 1442826100, 1442827994, 1442828527}

created_at      diff(created_at)
1442825158      n/a
1442826100      942
1442827994      1894
1442828527      533
64 bits each    11 bits each

Many columns can compress to a few bits per row! Especially:
• Timestamps
• Time series values
• Low-cardinality strings
Massive space savings and throughput increase!
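The table above can be reproduced with a few lines of delta encoding. This is a minimal sketch under the assumption that values are non-decreasing, so every difference is non-negative; the function names are illustrative and the slide does not specify the exact encoding Kudu applies.

```python
def delta_encode(values):
    """Return the first value and the list of successive differences."""
    deltas = [later - earlier for earlier, later in zip(values, values[1:])]
    return values[0], deltas


def delta_decode(first, deltas):
    """Rebuild the original values from the first value and the differences."""
    values = [first]
    for delta in deltas:
        values.append(values[-1] + delta)
    return values


def bits_needed(deltas):
    """Bits required to store the largest (non-negative) difference."""
    return max(delta.bit_length() for delta in deltas)


created_at = [1442825158, 1442826100, 1442827994, 1442828527]
first, deltas = delta_encode(created_at)
```

For the slide's values the differences are 942, 1894 and 533. The largest, 1894, is below 2^11 = 2048, which is where the "11 bits each" figure comes from, compared with 64 bits for each raw timestamp.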
22.
How it Works
Write and Read Paths
23.
LSM vs Kudu
LSM: Log-Structured Merge (Cassandra, HBase, etc.)
• Inserts and updates all go to an in-memory map (MemStore) and later flush to on-disk files (SSTable, HFile)
• Reads perform an on-the-fly merge of all on-disk HFiles
Kudu
• Shares some traits (memstores, compactions)
• More complex. Slower writes in exchange for faster reads (especially scans)
24.
LSM Insert Path
INSERT -> MemStore: Row=r1 col=c1 val="blah"; Row=r1 col=c2 val="1"
flush -> HFile 1: Row=r1 col=c1 val="blah"; Row=r1 col=c2 val="1"
25.
LSM Insert Path
INSERT -> MemStore: Row=r1 col=c1 val="blah2"; Row=r1 col=c2 val="2"
flush -> HFile 2: Row=r2 col=c1 val="blah2"; Row=r2 col=c2 val="2"
HFile 1: Row=r1 col=c1 val="blah"; Row=r1 col=c2 val="1"
26.
LSM Update path
UPDATE -> MemStore: Row=r2 col=c1 val="newval"
HFile 1: Row=r1 col=c1 val="blah"; Row=r1 col=c2 val="2"
HFile 2: Row=r2 col=c1 val="v2"; Row=r2 col=c2 val="5"
Note: all updates are "fully decoupled" from reads. A random-write workload is transformed into a fully sequential one!
27.
LSM Read Path
MemStore:
Row=r2 col=c1 val="newval"
HFile 1:
Row=r1 col=c1 val="blah"
Row=r1 col=c2 val="2"
HFile 2:
Row=r2 col=c1 val="v2"
Row=r2 col=c2 val="5"
Merge based on string row keys:
R1: c1=blah c2=2
R2: c1=newval c2=5
….
• CPU intensive!
• Must always read rowkeys
• Any given row may exist across multiple HFiles: must always merge!
• The more HFiles there are to merge, the slower the read
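The merge on this slide can be sketched with `heapq.merge` over sorted stores. The tuple layout and the `lsm_read` helper are assumptions for illustration, with the stores passed oldest first so that a newer cell overrides an older one for the same row and column.

```python
import heapq

# Each store is a list of (row, col, val) cells sorted by (row, col).
hfile1 = [("r1", "c1", "blah"), ("r1", "c2", "2")]
hfile2 = [("r2", "c1", "v2"), ("r2", "c2", "5")]
memstore = [("r2", "c1", "newval")]


def lsm_read(stores):
    """Merge stores (oldest first) by row key; the newest cell wins."""
    # Tag every cell with the age of its store so ties sort oldest to newest.
    tagged = [
        [(row, col, age, val) for row, col, val in store]
        for age, store in enumerate(stores)
    ]
    rows = {}
    for row, col, _age, val in heapq.merge(*tagged):
        # A later (newer) cell for the same (row, col) overwrites the earlier one.
        rows.setdefault(row, {})[col] = val
    return rows


result = lsm_read([hfile1, hfile2, memstore])
print(result)
```

Every read compares row keys across all stores, which is why the slide calls this path CPU intensive and why it slows down as the number of HFiles grows.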
28.
Kudu storage - Inserts and Flushes
INSERT("todd", "$1000", "engineer") into MemRowSet
flush to DiskRowSet 1 (columns: name, pay, role; base data)
• Multiple files: each column is stored separately
• Latest version of data
29.
Kudu storage - Inserts and Flushes
INSERT("doug", "$1B", "Hadoop man") into MemRowSet
flush to DiskRowSet 2 (columns: name, pay, role; base data)
DiskRowSet 1 (columns: name, pay, role; base data)
30.
Kudu storage - Updates
MemRowSet (in memory)
DiskRowSet 1 (columns: name, pay, role): base data (on disk), DeltaMemStore (in memory)
DiskRowSet 2 (columns: name, pay, role): base data (on disk), DeltaMemStore (in memory)
31.
Kudu storage - Updates
UPDATE set pay="$1M" WHERE name="todd"
• Is the row in DiskRowSet 2? (check bloom filters) Bloom says: no!
• Is the row in DiskRowSet 1? (check bloom filters) Bloom says: maybe!
• Search key column to find offset: rowid = 150
• Write to the DeltaMemStore of DiskRowSet 1: 150: col 1=$1M
The base data is left unchanged.
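The lookup sequence on this slide can be sketched as follows. The `BloomFilter` and `DiskRowSet` classes, the hash scheme, and the `update` function are all hypothetical stand-ins written for this example; they only mirror the steps on the slide (bloom check, key search, delta write) and are not Kudu's implementation. With one row per rowset the row id here is 0 rather than the slide's 150.

```python
import bisect
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all((self.bits >> p) & 1 for p in self._positions(key))


class DiskRowSet:
    """Base data stored column by column, sorted by key, plus a delta store."""

    def __init__(self, rows):
        rows = sorted(rows)
        self.name = [r[0] for r in rows]  # key column
        self.pay = [r[1] for r in rows]
        self.role = [r[2] for r in rows]
        self.bloom = BloomFilter()
        for key in self.name:
            self.bloom.add(key)
        self.delta_mem_store = {}  # rowid -> {column: new value}

    def find_rowid(self, key):
        """Binary-search the key column; return the row offset or None."""
        i = bisect.bisect_left(self.name, key)
        return i if i < len(self.name) and self.name[i] == key else None


def update(rowsets, key, column, value):
    """Record an update as a delta; the base data is left untouched."""
    for rs in rowsets:
        if not rs.bloom.might_contain(key):
            continue  # Bloom says: no!
        rowid = rs.find_rowid(key)  # Bloom says: maybe! Confirm via key column.
        if rowid is not None:
            rs.delta_mem_store.setdefault(rowid, {})[column] = value
            return rs, rowid
    return None


drs1 = DiskRowSet([("todd", "$1000", "engineer")])
drs2 = DiskRowSet([("doug", "$1B", "Hadoop man")])
hit = update([drs2, drs1], "todd", "pay", "$1M")
print(hit[1], drs1.delta_mem_store)
```

A false positive from the Bloom filter only costs an extra key search; correctness comes from the key-column lookup.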
32.
Kudu storage - Delta flushes
MemRowSet
DiskRowSet 1 (columns: name, pay, role): base data, DeltaMemStore
DiskRowSet 2 (columns: name, pay, role): base data, DeltaMemStore
DeltaMemStore entry "0: pay=foo" is flushed to a REDO DeltaFile
A REDO delta indicates how to transform between the 'base data' (columnar) and a later version
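One way to picture a REDO delta is as a record that is replayed on top of the base column to reach a later version. In the sketch below the `(timestamp, rowid, new_value)` record shape, the timestamps, the base value, and the `apply_redo` helper are assumptions for illustration; the slide only specifies the entry "0: pay=foo".

```python
def apply_redo(base_column, redo_deltas, as_of):
    """Replay REDO deltas with timestamp <= as_of on top of the base column.

    redo_deltas: list of (timestamp, rowid, new_value), sorted by timestamp.
    """
    column = list(base_column)  # never modify the base data in place
    for ts, rowid, new_value in redo_deltas:
        if ts <= as_of:
            column[rowid] = new_value
    return column


base_pay = ["$1000"]             # hypothetical base data for the "pay" column
redo_pay = [(5, 0, "foo")]       # rowid 0: pay=foo, written at assumed time 5

before = apply_redo(base_pay, redo_pay, as_of=4)
after = apply_redo(base_pay, redo_pay, as_of=5)
print(before, after)
```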
33.
Kudu storage - Minor delta compaction
DiskRowSet (pre-compaction), columns name, pay, role:
base data
Delta MS
REDO DeltaFile, REDO DeltaFile, REDO DeltaFile, REDO DeltaFile
34.
Kudu storage - Major delta compaction
DiskRowSet, columns name, pay, role:
base data
Delta MS
REDO DeltaFile, REDO DeltaFile, REDO DeltaFile
Unmerged REDO DeltaFile
UNDO Records
pay: compaction can be performed on only the frequently updated columns
UNDO stores previous versions of data
35.
Kudu storage - RowSet Compactions
Before compaction:
DRS 1 (32MB), range A-Z: [PK=alice], [PK=iris], [PK=linda], [PK=zach]
DRS 2 (32MB), range A-Z: [PK=bob], [PK=jon], [PK=mary], [PK=zeke]
DRS 3 (32MB), range A-Z: [PK=carl], [PK=julie], [PK=omar], [PK=zoe]
Writes for "chris" have to perform bloom lookups on all 3 RS
After compaction:
DRS 4 (32MB), range A-I: [alice, bob, carl, iris] ("chris" is in this range!)
DRS 5 (32MB), range J-M: [jon, julie, linda, mary]
DRS 6 (32MB), range O-Z: [omar, zach, zeke, zoe]
Reorganize rows to avoid rowsets with overlapping key ranges
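The effect of this compaction can be reproduced with a short sketch that merges the three overlapping rowsets and cuts the result into rowsets of four keys each. The `compact` and `candidates` helpers are illustrative assumptions; the real compaction policy is budgeted and chooses which rowsets to merge.

```python
import heapq


def compact(rowsets, rows_per_rowset):
    """Merge sorted rowsets and split the result into non-overlapping rowsets."""
    merged = list(heapq.merge(*rowsets))
    return [
        merged[i:i + rows_per_rowset]
        for i in range(0, len(merged), rows_per_rowset)
    ]


def candidates(rowsets, key):
    """Rowsets whose [min key, max key] range could contain `key`."""
    return [rs for rs in rowsets if rs[0] <= key <= rs[-1]]


before = [
    ["alice", "iris", "linda", "zach"],  # DRS 1
    ["bob", "jon", "mary", "zeke"],      # DRS 2
    ["carl", "julie", "omar", "zoe"],    # DRS 3
]
after = compact(before, rows_per_rowset=4)

lookups_before = len(candidates(before, "chris"))
lookups_after = len(candidates(after, "chris"))
print(after, lookups_before, lookups_after)
```

Before compaction all three key ranges cover "chris", so a write must consult every rowset; afterwards only the first rowset can contain it.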
36.
Kudu Storage - Compactions
Main idea: always be compacting!
• Compactions run continuously to prevent IO storms
• "Budgeted" RS compactions: what is the best way to spend X MB of IO?
• Physical/logical decoupling: different replicas run compactions at different times
37.
Kudu storage - Read path
MemRowSet
DiskRowSet 1 (columns: name, pay, role): base data, DeltaMemStore (150: pay=$1M)
DiskRowSet 2 (columns: name, pay, role): base data, DeltaMemStore
Just need to read this DiskRowSet!
38.
Kudu storage - Time Travel Read
DiskRowSet (columns: name, pay, role): base data, Delta MS, REDO DeltaFiles, UNDO records (pay)
T=0: a query starts to read "pay" in another DiskRowSet
T=10: a major delta compaction happens! The base file is updated, and UNDO records are created
T=20: the query starts to read "pay" in this DiskRowSet, but reads the T=0 version from the UNDO records
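The timeline on this slide can be sketched by rolling the compacted base data back with UNDO records. The `(timestamp, rowid, previous_value)` record shape, the update time of 5, and the `read_as_of` helper are assumptions for illustration only.

```python
def read_as_of(base_column, undo_records, snapshot_ts):
    """Roll the base column back to its state at snapshot_ts.

    undo_records: list of (timestamp, rowid, previous_value), where
    previous_value is what the row held before the change at `timestamp`.
    """
    column = list(base_column)
    # Walk backwards in time, undoing every change newer than the snapshot.
    for ts, rowid, previous in sorted(undo_records, key=lambda r: r[0], reverse=True):
        if ts > snapshot_ts:
            column[rowid] = previous
    return column


# After the major delta compaction at T=10 the base file holds the new value,
# and an UNDO record remembers the old one (update assumed to occur at T=5).
base_pay = ["$1M"]
undo_pay = [(5, 0, "$1000")]

old_query = read_as_of(base_pay, undo_pay, snapshot_ts=0)   # query started at T=0
new_query = read_as_of(base_pay, undo_pay, snapshot_ts=10)  # query started at T=10
print(old_query, new_query)
```

The query that started at T=0 still sees the value from its own snapshot even though it reaches this DiskRowSet after the compaction.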
39.
Takeaways
40.
Getting Started
On the web: https://www.cloudera.com/documentation/kudu/latest.html, https://www.cloudera.com/downloads.html, https://blog.cloudera.com/?s=Kudu, kudu.apache.org
• Apache project user mailing list: user@kudu.apache.org
• Quickstart VM
  • Easiest way to get started
  • Impala and Kudu in an easy-to-install VM
• CSD and Parcels
  • For installation on a Cloudera Manager-managed cluster
Training classes available: https://www.cloudera.com/more/training.html
41.
Cloudera World Tokyo 2017
Tue, Nov 7, 2017, ANA Intercontinental Hotel
Estimated attendees: 1000
E-1: Apache Kudu on Analytical Data Platform
Register now! www.clouderaworldtokyo.com
42.
Thank you
sho@cloudera.com