
Light Up Your Dark Data


Presentation from Lance Ransom, Product Manager at Continuum Analytics and a former Partner and CTO of Schonfeld Group, at QuantCon 2016 in NYC.

Quants face a complex data environment. Data is everywhere, and it is increasingly challenging to analyze, explore, and evaluate it all in one language and one environment. Quants need a unified environment in which they can write expressions and push computations down to the data, without moving the data, and deploy anywhere, anytime. Organizations need to marshal their data better and gain the visibility needed for clean transformations. This session discusses how businesses can gain a better understanding of their data, leading to better results. In the FinServ industry, fluidity in understanding the data helps create better risk models and trading strategies. Ransom discusses how organizations address these challenges and future-proof their work.




  1. QuantCon: “Light Up Your Dark Data”, April 2016
  2. What is dark data? (Diagram labels: SQL, CSV, REST, JSON)
  3. Example Datasets: Trade History, Signal History, Clearing Data, Log Files, Ref Data, Corp Actions, Market Data, Models. Categories: Firm Generated, Vendor Generated
  4. Compounding Challenges: Accumulates Quickly; Disparate Storage; Different Vendors; Format Changes; Ad-hoc Usage; Urgent!
  5. Workflow: Find Data, Ad-Hoc ETL, Store / Copy, Analysis, Report
  6. Sample Environment: Oracle, MySQL, MSSQL, KDB, ZIP, CSV, REST, SQL, Python, DSL, R, Matlab, C++, Java. Layers: Storage, ETL, Analysis
  7. Independent First Class Citizens: Expression, Compute, Data
  8. Datashape: Structured data description language. http://datashape.pydata.org
  9. Datashape Example
         daily_bars: var * {
           date: string,
           symbol: string,
           open: float64,
           high: float64,
           low: float64,
           close: float64,
           volume: int64,
         }
     Language, compute, and storage independent
  10. Blaze: Write expressions independent of storage system; Push computations to the data; Lazy evaluation; Pandas-like API
  11. Blaze: http://blaze.pydata.org/
  12. Blaze Expressions
  13. Flat File Repositories: Many directories and files; Dictated structure; Naming convention part of dataset; Requires one-off ad-hoc scripts
  14. Vendor - directory structure
          /daily/us/nasdaq stocks/
          /daily/us/nasdaq stocks/1/
          /daily/us/nasdaq stocks/2/
              osn.us.txt
              ostk.us.txt
              …
              zyne.us.txt
          /daily/us/nyse etfs/
          /daily/us/nyse stocks/1/
          /daily/us/nyse stocks/2/
      Contains ~8400 individual files
  15. Vendor - file contents (/daily/us/nasdaq stocks/1/aaap.us.txt)
          Date,Open,High,Low,Close,Volume,OpenInt
          20151111,18.5,25.9,18,24.5,1584600,0
          20151112,24.25,27.12,22.5,25,83000,0
          20151113,25.47,26.2,24.55,25.26,67300,0
          20151116,25.01,26.19,24.13,25.02,16900,0
          20151117,24.46,25.51,24.38,24.62,25900,0
          20151118,24.62,26.31,24.06,25,111100,0
          20151119,24.85,26,24.71,25.9,113100,0
          …
      Symbol is not contained within the individual data files
  16. Lux
          source: "lux://global-equities/data/daily/us/nasdaq stocks"
          extractor: "{}/{Symbol}.{Region}.txt"

          Date,Open,High,Low,Close,Volume,OpenInt,Symbol,Region
          20151111,18.5,25.9,18,24.5,1584600,0,aaap,us
          20151112,24.25,27.12,22.5,25,83000,0,aaap,us
          20151113,25.47,26.2,24.55,25.26,67300,0,aaap,us
          …
          20160322,11.56,11.98,10.8894,11.09,517604,0,zyne,us
          20160323,11.3,11.72,9.5,9.75,489743,0,zyne,us
          20160324,9.5,10.24,9.22,9.64,188512,0,zyne,us
      One dataset with ~5.5 million rows
  17. Lux Benefits: Combines individual files; No separate ETL or storage; Names become part of data; Optimized compute
  18. Anaconda Mosaic: Interactive exploration; Intuitive interface; Advanced visualizations; Catalog of datasets and expressions; Provenance and Governance
  19. Live Walkthrough
  20. Project References
      • Anaconda Mosaic - http://know.continuum.io/Anaconda-Mosaic
      • Blaze Ecosystem - http://blaze.pydata.org
      • Bokeh - http://bokeh.pydata.org
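
The idea on slides 13 to 17 (an extractor pattern such as "{}/{Symbol}.{Region}.txt" turns file names into columns, so many per-symbol files read as one dataset) can be illustrated with a minimal standard-library sketch. This is not Lux's implementation, which the slides do not show. The helpers `compile_extractor` and `combine` are hypothetical names, the in-memory `files` mapping stands in for the vendor directory tree, and treating `{}` as "exactly one directory level" is an assumption.

```python
import csv
import io
import re


def compile_extractor(pattern):
    """Hypothetical helper: turn '{}/{Symbol}.{Region}.txt' into a regex.

    '{}' is assumed to match exactly one directory name; '{Name}' becomes
    a named group that captures one dot-free piece of the file name.
    """
    regex = ""
    for part in re.split(r"(\{\w*\})", pattern):
        if part == "{}":
            regex += r"[^/]+"
        elif part.startswith("{") and part.endswith("}"):
            regex += rf"(?P<{part[1:-1]}>[^/.]+)"
        else:
            regex += re.escape(part)  # literal text such as '/', '.', '.txt'
    return re.compile(regex)


def combine(files, pattern):
    """Yield one dict per data row, with the path-derived fields appended.

    files maps a path (relative to the source directory) to CSV text.
    """
    extractor = compile_extractor(pattern)
    for path, text in sorted(files.items()):
        match = extractor.fullmatch(path)
        if match is None:
            continue  # file does not follow the naming convention
        for row in csv.DictReader(io.StringIO(text)):
            row.update(match.groupdict())
            yield row


HEADER = "Date,Open,High,Low,Close,Volume,OpenInt\n"
# Sample rows copied from the slides; the directory layout is illustrative.
files = {
    "1/aaap.us.txt": HEADER
    + "20151111,18.5,25.9,18,24.5,1584600,0\n"
    + "20151112,24.25,27.12,22.5,25,83000,0\n",
    "2/zyne.us.txt": HEADER + "20160322,11.56,11.98,10.8894,11.09,517604,0\n",
}

rows = list(combine(files, "{}/{Symbol}.{Region}.txt"))
for row in rows:
    print(",".join(row.values()))
```

All values stay as strings here. A fuller version might apply a schema like the `daily_bars` record on slide 9 to convert each column to its declared type.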
