Exploratory Data Analysis

  1. Exploratory Data Analysis. Aditya Laghate. Twitter: @thinrhino
  2. Who am I?
     • A pseudo geek
     • Freelance software consultant
     • Wildlife photographer
  3. Agenda
     • Data gathering
     • Data cleaning
     • Usage of classic Unix tools
     • Data analysis
  4. Data Gathering
     • Public data websites
       o data.gov.in
       o databank.worldbank.org
     • Social websites
       o facebook.com
       o twitter.com
     • Blogs, websites, etc. via scraping
  5. Data cleaning
     • E.g. OpenRefine
       o OpenRefine (ex-Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase
       o openrefine.org
  6. Classic Unix Tools
     • sed / awk
     • Shell scripts
     • GNU parallel
       o Examples:
       o cat rands20M.txt | awk '{s+=$1} END {print s}'
       o cat rands20M.txt | parallel --pipe awk '{s+=$1} END {print s}' | awk '{s+=$1} END {print s}'
       o wc -l bigfile.txt
       o cat bigfile.txt | parallel --pipe wc -l | awk '{s+=$1} END {print s}'
  7. Data Analysis
  8. Questions
     • @thinrhino
     • me@adityalaghate.in
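
The GNU parallel examples on the Classic Unix Tools slide rely on one idea: sums and line counts can be computed per chunk and the partial results added up afterwards. The sketch below illustrates that idea only; it does not use GNU parallel itself, but emulates the chunking with `split` and a sequential loop, so the chunks are not actually processed in parallel. The sample file `nums.txt` (integers 1 to 1000) and the 100-line chunk size are assumptions made for illustration, standing in for `rands20M.txt` and `bigfile.txt`.

```shell
#!/bin/sh
# Hypothetical sample data: integers 1..1000, one per line
# (a small stand-in for rands20M.txt / bigfile.txt from the slide).
tmpdir=$(mktemp -d)
seq 1 1000 > "$tmpdir/nums.txt"

# Serial sum, as in the first example on the slide.
serial=$(awk '{s+=$1} END {print s}' "$tmpdir/nums.txt")

# Chunked sum: split the file into 100-line blocks, sum each block,
# then sum the partial sums with a final awk.
split -l 100 "$tmpdir/nums.txt" "$tmpdir/chunk."
chunked=$(for f in "$tmpdir"/chunk.*; do
  awk '{s+=$1} END {print s}' "$f"
done | awk '{s+=$1} END {print s}')

# The same pattern for line counts: count per chunk, then add the counts.
lines_serial=$(wc -l < "$tmpdir/nums.txt" | awk '{print $1}')
lines_chunked=$(for f in "$tmpdir"/chunk.*; do
  wc -l < "$f"
done | awk '{s+=$1} END {print s}')

echo "sum:   serial=$serial chunked=$chunked"
echo "lines: serial=$lines_serial chunked=$lines_chunked"

rm -rf "$tmpdir"
```

Both pairs of values should match, which is why the final `awk '{s+=$1} END {print s}'` stage in the slide's pipelines is enough to combine the per-chunk output.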
