4. ...and the 5 R's of 21st Century Literacy
⇨Reading
⇨wRiting
⇨aRithmetic
⇨pRobability
⇨R
Source: Joe BlitzStein, Harvard
5. "data scientists should take a page
from social scientists, who have a
long history of asking where the
data they're working with comes
from, what methods were used to
gather and analyze it, and what
cognitive biases they might bring to
its interpretation."
Kate Crawford, Microsoft Research/MIT
11. Data Scientists have more fun
Source: How to Engage and Retain Analytical Talent
By Elizabeth Craig, Jeanne G. Harris and Henry Egan
January 2010
12. How Do I Become A Data Scientist?
⇨ Learn about matrix factorizations
⇨ Learn about distributed computing
⇨ Learn about statistical analysis
⇨ Learn about optimization
⇨ Learn about machine learning
⇨ Learn about information retrieval
⇨ Learn about signal detection and estimation
⇨ Master algorithms and data structures
⇨ Practice
⇨ Study Engineering
Source: http://www.quora.com/Career-Advice/How-do-I-become-a-data-scientist
13. 6 levels of expertise needed
Data wranglingStatistics
Data mining Visualization
Communication
Data
Science*
Domain & Business Expertise
* a bit of programming
skills doesn't hurt either
30. Want to get your feet wet?
Tableau Public
http://www.tableausoftware.com/public/
SAS Visual Analytics
http://www.sas.com/software/visual-analytics
31. Where to go from here?
⇨ Read 'Competing on Analytics'
⇨ Move on to 'Data Analysis Using SQL and Excel'
⇨ Then buy 'Handbook of Statistical Analysis & Data Mining
Applications'
⇨ Statistics for business:
⇨
http://home.ubalt.edu/ntsbarsh/Business-stat/opre504.htm
⇨ Data Mining:
⇨ www.rapid-i.com (RapidMiner)
⇨
http://www.thearling.com
⇨ http://www.autonlab.org/tutorials/
⇨ For free text books, search www.scribd.com
⇨ Enter http://www.coursera.org
32. More Resources to Get You Started
Books:
⇨ DataMiningTechniques:ForMarketing,SalesandCustomerSupport,MichaelJ.BarryandGordonLinoff
⇨
DataPreparationforDataMining,DorianPyle
⇨ DataMiningAlgorithms,ElbeFrank,IanWitten,JimGray
⇨
AnIntroductiontoInformationRetrieval,ChristopherD.Manning,PrabhakarRaghavan,HinrichSchütze
⇨ InformationRetrieval,C.J.vanRijsbergen
⇨
TheVisualDisplayofQuantitativeInformation,EdwardR.Tufte
Journals,Newsletters,WebSites:
⇨
SIGKDDExplorations,NewsletteroftheACMSIGonKnowledgeDiscoveryandDataMining
⇨ IEEETransactionsonPatternAnalysisandMachineIntelligence
⇨
SASKnowledgeExchange: www.sas.com/knowledge-exchange/business-analytics
⇨ KDNuggetsdataminingresources: www.kdnuggets.com
⇨
FlowingData,visualizationresources: http://flowingdata.com/
⇨ Infoaesthetics,visualdesignresources: http://infosthetics.com/
⇨
VisualComplexity,visualizationresources: www.visualcomplexity.com/vc/index.cfm
⇨ Recommendationsystemsresources:
http://www.deitel.com/ResourceCenters/Web20/RecommenderSystems/tabid/1229/Default.aspx
⇨
TheImpoverishedSocialScientist'sGuidetoFreeStatisticalSoftwareandResources: http://maltman.hmdc.harvard.edu/socsci.shtml
33. Free Stuff So You Can Work Cheaply
⇨
WEKA http://www.cs.waikato.ac.nz/ml/weka/
⇨ IND decision tree software http://opensource.arc.nasa.gov/software/ind/
⇨
Clustering http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/
⇨ Parallel Sets http://eagereyes.org/parallel-sets#download
⇨
RapidMiner http://rapid-i.com/content/blogcategory/38/69/
⇨ Knime http://www.knime.org/
⇨ Orange http://www.ailab.si/Orange/
⇨
R statistics software http://www.r-project.org/
⇨ ARC statistics software http://www.stat.umn.edu/arc/software.html
⇨
Octave numerical and matrix computation http://www.gnu.org/software/octave/
⇨ Processing http://www.processing.org/
⇨
Circos http://mkweb.bcgsc.ca/circos/
⇨
Treemap http://www.cs.umd.edu/hcil/treemap/
⇨ Many Eyes http://manyeyes.alphaworks.ibm.com/manyeyes/
⇨ Dutch Students: SAS & SPSS Academic Licenses (e.g. SurfSpot.nl)
34.
35. Web: www.sas.com
Email: jos.vandongen<at>sas.com
Phone: +31-(0)6-10172008
Skype: tholis.jos
LinkedIn: jvdongen
Twitter: josvandongen
Delicious: jvdongen
Jos van Dongen
In BI since 1991
Principal Consultant @ SAS
Author/Speaker/Analyst