Ibis: Scaling the Python Data Experience
Wes McKinney
Delivered at Data Science Summit, July 20, 2015. See http://ibis-project.org for more information.
1.
© Cloudera, Inc. All rights reserved.
Ibis: Scaling the Python Data Experience
Wes McKinney, Marcel Kornacker, Justin Erickson, Silvius Rus
2.
Wes McKinney
• A key person in building today's open source Python data community
• Creator of pandas, a standard Python data wrangling and analytics toolkit used by data scientists
• Author of the best-selling canonical text Python for Data Analysis (2012)
• Formerly Founder/CEO of DataPad (acquired by Cloudera in 2014)
3.
Python is popular…
• Python has become a standard language of data science
• Why is it popular?
  • Maximizes productivity for data engineers and data scientists
  • Build robust software and do interactive data analysis with 100% Python code
  • Easy to learn; makes for happy, productive data teams
  • Large, diverse open source development community
  • Comprehensive libraries: data wrangling, ML, visualization, etc.
• Main use case: a data science and engineering Swiss Army knife for small-to-medium-sized data
4.
…but Python does not scale today
• The Python ecosystem is confined to single-node analysis
  • Great for smaller data sets
  • Requires sampling or aggregations for larger data
  • Distributed tools compromise in various ways
• Extracting samples or aggregations for larger data means:
  • "Scales" by losing more fidelity
  • Additional ETL overhead to extract samples/aggregations
  • Loss of productivity with multiple languages, tools, etc.
  • Blocks certain analyses and use cases
5.
Ibis: Same Python, now at scale
• Target users:
  • Data scientists and data engineers ("Python data users")
• Goals:
  • Mirror the single-node Python experience
  • Scale to any number of nodes and any data size
  • No compromise in functionality or usability
  • Interactive experience at native hardware speeds
6.
What's announced?
• First public release of Ibis
  • http://ibis-project.org
• Beta release to Cloudera Labs
• Inviting usage and community development
• Apache-licensed open source
7.
Ibis's Vision
• Uncompromised Python experience
  • 100% Python end-to-end user workflows
  • Enable integration with the existing Python data ecosystem (pandas, scikit-learn, NumPy, etc.)
• Interactive at big data scale
  • Full-fidelity analysis without extractions
• Scalability for big data
  • Native hardware speeds for a broad set of use cases
9.
Advantages of our approach
• Analyze big data 100% in Python, with the same ease as small/medium data on the local filesystem
• Full-fidelity data access
• Familiar Python experience and integration with existing Python data libraries
• Provide a means for Python high-performance computing tools to be leveraged at Hadoop scale
10.
Beta 0.3 release
• High-level Python API for describing analytics and ETL that can be executed by Impala
  • Familiar API for users of pandas
  • Comprehensive coverage of operations expressible as relational data flows
• Integrated tools for managing data in HDFS
  • Simple workflows to query data files in several formats (Parquet, Avro, text)
• pandas data interchange
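The beta release describes a high-level Python API whose expressions are executed by Impala rather than inside the Python process. A minimal sketch of that deferred-execution idea follows, assuming a toy expression class: `TableExpr`, its methods, and the `events` table are hypothetical illustrations, not the Ibis API. Chained calls only record operations, and `compile()` turns them into one SQL string that an engine such as Impala could run.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class TableExpr:
    """Hypothetical deferred table expression (NOT the real Ibis API).

    Method calls only record operations; nothing executes until compile().
    """
    name: str
    predicates: Tuple[str, ...] = ()
    keys: Tuple[str, ...] = ()
    metrics: Tuple[Tuple[str, str], ...] = ()  # (alias, SQL aggregate)
    row_limit: Optional[int] = None

    def filter(self, predicate: str) -> "TableExpr":
        # Return a new expression; the original stays unchanged.
        return replace(self, predicates=self.predicates + (predicate,))

    def group_by(self, *keys: str) -> "TableExpr":
        return replace(self, keys=self.keys + keys)

    def aggregate(self, **metrics: str) -> "TableExpr":
        # Keyword order is preserved, so output columns follow call order.
        return replace(self, metrics=self.metrics + tuple(metrics.items()))

    def limit(self, n: int) -> "TableExpr":
        return replace(self, row_limit=n)

    def compile(self) -> str:
        """Translate the recorded operations into a single SQL string."""
        select = list(self.keys) + [f"{expr} AS {alias}" for alias, expr in self.metrics]
        sql = f"SELECT {', '.join(select) or '*'} FROM {self.name}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(self.predicates)
        if self.keys:
            sql += " GROUP BY " + ", ".join(self.keys)
        if self.row_limit is not None:
            sql += f" LIMIT {self.row_limit}"
        return sql


expr = (
    TableExpr("events")
    .filter("year = 2015")
    .group_by("country")
    .aggregate(n="COUNT(*)", revenue="SUM(amount)")
    .limit(10)
)
print(expr.compile())
# SELECT country, COUNT(*) AS n, SUM(amount) AS revenue FROM events WHERE year = 2015 GROUP BY country LIMIT 10
```

Because each call returns a new immutable expression, intermediate expressions can be reused safely, and only the final SQL crosses over to the cluster. The actual Ibis API differs and is far richer; consult its documentation for real usage.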
11.
Ibis/Impala Joint Roadmap
• More natural data modeling
  • Complex types support
• Integration with the full Python data ecosystem
  • Advanced analytics + machine learning
• Enable use of performance computing tools
  • User extensibility with native performance
  • In-memory columnar format
  • Python-to-LLVM IR compilation
• Workflow and usability tools
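The roadmap lists an in-memory columnar format. The general layout idea can be sketched with the standard-library `array` module, assuming a toy two-column table (the `rows` data and `to_columns` helper are hypothetical): each column lives in its own contiguous typed buffer, so scanning one column does not touch the others. This illustrates the concept only, not the format Ibis or Impala actually adopted.

```python
from array import array
from typing import Dict, List, Tuple

# Hypothetical row-oriented records: (user_id, score)
rows: List[Tuple[int, float]] = [(1, 0.5), (2, 1.5), (3, 2.0), (4, 4.0)]


def to_columns(records: List[Tuple[int, float]]) -> Dict[str, array]:
    """Pivot row tuples into one contiguous typed buffer per column."""
    ids = array("q")     # signed 64-bit integers
    scores = array("d")  # C doubles (64-bit floats)
    for user_id, score in records:
        ids.append(user_id)
        scores.append(score)
    return {"user_id": ids, "score": scores}


cols = to_columns(rows)
# Aggregating one column scans a single contiguous buffer
# instead of visiting every field of every row.
mean_score = sum(cols["score"]) / len(cols["score"])
print(mean_score)  # 2.0
```

Contiguous, typed column buffers are also what make zero-copy handoff between engines and native code (such as LLVM-compiled functions) practical, which is why columnar layout and compilation appear together on the roadmap.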
12.
Benefits of Ibis
• Maximize developer productivity
  • Mirrors the single-node Python experience
  • Solve big data problems without leaving Python
  • Leverage Python skills, ecosystem, and tools
• Python as a first-class language for Hadoop
  • Full-fidelity analysis without extractions
• Python analysis at any scale
  • Native hardware speeds for a broad set of use cases
13.
Thank you
wes@cloudera.com