Scalable Scientific Computing with Dask

Dask tutorial at PyCon.DE / PyData Karlsruhe 2018. These are the introductory slides; they mainly contain the link to Matthew Rocklin's Dask workshop at PyData NYC 2018, on which this workshop was based.

Published in: Data & Analytics

  1. Scalable Scientific Computing with Dask. Uwe L. Korn, PyCon.DE / PyData Karlsruhe 2018
  2. About me: • Senior Data Scientist at Blue Yonder (@BlueYonderTech) • Apache {Arrow, Parquet} PMC • Data Engineer and Architect with a heavy focus on Pandas • xhochy • mail@uwekorn.com
  3. What is Dask? • A parallel computing library that scales the existing Python ecosystem • Definition and execution of task graphs • Scales down to your laptop • Scales up to a cluster
  4. More than a single CPU: • Multi-core and distributed parallel execution • Low-level: task schedulers for computation graphs • High-level: Array, Bag and DataFrame
  5. What about Spark? Dask is: • More lightweight • In Python; operates well with C/C++/Fortran/LLVM or other natively compiled code • Part of the Python ecosystem
  6. What about Spark? Spark is: • Written in Scala and works well within the JVM • Python support is very limited • Brings its own ecosystem • Able to provide more high-level optimizations
  7. https://github.com/mrocklin/pydata-nyc-2018-tutorial
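
The slides describe Dask's low-level layer as task schedulers for computation graphs. The sketch below illustrates that idea using only the Python standard library. It is a toy, not Dask's scheduler: `run_graph`, `dependencies` and `execute` are hypothetical helper names, and the dict-of-tuples layout is only modeled on the graph format described in Dask's documentation, with the simplifying assumption that graph keys are always strings.

```python
from concurrent.futures import ThreadPoolExecutor
from operator import add, mul


def is_task(value):
    """A task is a non-empty tuple whose first element is callable."""
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def dependencies(graph, key):
    """Return the graph keys that the task stored under `key` depends on."""
    value = graph[key]
    if not is_task(value):
        return []  # literal values have no dependencies
    return [a for a in value[1:] if isinstance(a, str) and a in graph]


def execute(value, results):
    """Run one task, substituting already computed results for graph keys."""
    if not is_task(value):
        return value
    func, *args = value
    resolved = [results[a] if isinstance(a, str) and a in results else a
                for a in args]
    return func(*resolved)


def run_graph(graph, target, max_workers=4):
    """Toy scheduler (assumption: string keys only, no nested tasks)."""
    # Collect only the keys that `target` transitively depends on.
    needed, stack = set(), [target]
    while stack:
        key = stack.pop()
        if key not in needed:
            needed.add(key)
            stack.extend(dependencies(graph, key))

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while target not in results:
            # Every task whose dependencies are finished can run in parallel.
            ready = [k for k in needed
                     if k not in results
                     and all(d in results for d in dependencies(graph, k))]
            if not ready:
                raise ValueError("cycle detected in task graph")
            futures = {k: pool.submit(execute, graph[k], results)
                       for k in ready}
            finished = {k: f.result() for k, f in futures.items()}
            results.update(finished)
    return results[target]


graph = {
    "x": 1,
    "y": 2,
    "sum": (add, "x", "y"),            # 1 + 2
    "double": (mul, "sum", 2),         # (1 + 2) * 2
    "result": (add, "sum", "double"),  # 3 + 6
}

print(run_graph(graph, "result"))  # prints 9
```

Separating the graph definition from its execution is what lets the same graph run on a thread pool on a laptop or, with a different scheduler, on a cluster. The high-level collections named in the slides (Array, Bag, DataFrame) build such graphs for the user.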