20161215 python pandas-spark miscellany (四方山話)
Ryuji Tamagawa
Slides from the talk I gave at Insight Technology's Sanmoku-kai (三木会) meetup on 2016/12/15. Python, Pandas, Spark, and so on.
1. Python, Pandas, Spark 2.0 Sky
2.
3. Python 2000 (**) • db tech showcase MongoDB • FB: Ryuji Tamagawa • Twitter: tamagawa_ryuji
4.
5. 2017
6. Python Spark
7. Python / Pandas • Spark 2.0
8. Part 1 :
9. csv
10. Python Pandas Python Jupyter Notebook Jenkins Spark 2.0
11. Spark API RDD ~1.3 DataFrame / DataSet 1.4~ • DataFrame API RDD API Python Spark
12. DataFrame • RDB / • R Pandas Spark Spark R / Pandas Spark +
13. Part 2 :
14. CSV zip RDB Parquet Excel CSV Feather Spark Pandas / Spark
15. CPU • Pandas read_csv zip CSV Pandas
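As a minimal sketch of the point slide 15 touches on (Pandas `read_csv` reading a zipped CSV directly), the snippet below builds a tiny archive and loads it without unpacking it first. The file name, column names, and values are made up for illustration; the only assumption about Pandas is that the archive holds exactly one CSV file, which is what `read_csv` expects for zip input.

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a small zipped CSV so the example is self-contained.
# The file name and columns are hypothetical.
tmp_dir = tempfile.mkdtemp()
zip_path = os.path.join(tmp_dir, "sample.zip")
with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("sample.csv", "id,value\n1,10\n2,20\n3,30\n")

# read_csv decompresses while reading; the archive must contain a single file.
df = pd.read_csv(zip_path, compression="zip")
print(df)
```

Reading the compressed file trades some CPU time for less disk I/O, which is the balance the surrounding slides appear to discuss.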
16. 2 • CSV CPU Pandas zip CSV CPU … • Parquet !
17. : Parquet I/O • Spark Parquet • Python Parquet
18. HDFS / S3 Parquet Parquet
19. SSD Parquet Parquet
20. Parquet No No Yes HDD
21. I/O Pandas • Spark • DataFrame Pandas → Spark Spark → Pandas Pandas → Spark • Apache Arrow
22. CPU ~2010 2010~ SSD CPU
23. Apache Spark 2.0 • 1.x • 2.0 1.x • DataFrame API Python • databricks http://go.databricks.com/mastering-apache-spark-2.0
24. Spark 2.0 • CPU • CPU • SQL DataFrame • + SSD • CSV zip Pandas read_csv
25. Python + Spark • Python serialize • DataFrame API UDF UDF Scala/Java • http://www.slideshare.net/dragan10/performant-data-processing-with-pyspark-sparkr-and-dataframe-api Executor JVM DataFrame, Cached Python lambda items: items[0] == 'abc' transfer DataFrame, result transfer Driver