Slides of the course on big data by Clement Levallois from EMLYON Business School.
For business students. Check the online video connected with these slides.
-> The definition and profile of a data scientist are presented: hacker, math person and domain specialist.
1. MK99 – Big Data
Big data & cross-platform analytics
MOOC lectures, Prof. Clement Levallois
What is a data scientist? [or, a guide for businesses to spot good ones and recruit them!]
Source: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
+ Math and stats knowledge
•Maths and stats are excellent foundations
•But a data scientist has a different mindset
–Focuses on accuracy of prediction, not causality
–Even if this is not “elegant” in terms of formal models
–Ready to use any bit of information available in the data (text, networks, …)
–See the slide deck on “Machine Learning” for details
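To make the "accuracy of prediction" mindset concrete, here is a minimal sketch of judging a model by its error on data it has not seen during fitting. The toy data, the split point, and the function names (`fit_line`, `mean_absolute_error`) are all assumptions made for illustration; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(predictions, actual):
    """Average absolute gap between predicted and observed values."""
    return sum(abs(p - a) for p, a in zip(predictions, actual)) / len(actual)


# Toy data (hypothetical): roughly y = 2x + 1 with small fixed noise.
xs = list(range(10))
noise = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1]
ys = [2 * x + 1 + e for x, e in zip(xs, noise)]

# Fit on the first 7 points, evaluate on the 3 held-out points.
train_x, test_x = xs[:7], xs[7:]
train_y, test_y = ys[:7], ys[7:]

slope, intercept = fit_line(train_x, train_y)
line_predictions = [slope * x + intercept for x in test_x]

# Baseline: always predict the average of the training outcomes.
baseline = sum(train_y) / len(train_y)
baseline_predictions = [baseline] * len(test_x)

line_error = mean_absolute_error(line_predictions, test_y)
baseline_error = mean_absolute_error(baseline_predictions, test_y)
print(f"held-out error, fitted line: {line_error:.3f}")
print(f"held-out error, baseline:    {baseline_error:.3f}")
```

The comparison says nothing about why y moves with x; it only shows which model predicts unseen cases better, which is the criterion this slide emphasizes.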
+ Hacking skills
•Ability to think “out of the box”
–As an econometrician, a computer scientist, a computational linguist and a network analyst!
–Concerned with scale and speed
–Not dependent on packaged software
•Aware of, and contributing to developments in open source
•Following current developments in different academic fields
+ Substantive expertise
•Substantive expertise = grasp of the business logic
–Many leaps in optimization come from a good knowledge of the specifics of the domain
–These domains can be quite complex!
–Data scientists must be able to understand these business specifics and translate them into their data models
A data scientist should be able to…
–Discover interesting angles in the dataset
•You see worthless metadata? I see gold!
–Choose from a wide range of techniques across social and natural sciences
•Statistics, machine learning, network analysis, natural language processing, etc.
•From economics, physics, psychology, linguistics, computational science, genomics, neuroscience, etc.
–Implement these techniques, possibly on large datasets
•Can you implement them in your programming language of choice?
•Can you deal with large datasets (what if the data doesn’t fit in memory?)
•Can you be quick (and not ask for a couple of nights to run a script)?
•Can you be cheap (buying more hardware is not always a solution you can afford)?
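One common answer to "what if it doesn't fit in memory?" is to process the data as a stream, one record at a time. The sketch below uses Welford's online algorithm to compute a mean and sample variance in a single pass with constant memory. The in-memory `fake_file` is a hypothetical stand-in for a large file on disk, and the function names are assumptions for illustration.

```python
import io


def read_numbers(lines):
    """Yield one float per non-empty line, without loading everything at once."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


def streaming_mean_variance(values):
    """Welford's online algorithm: one pass over the data, constant memory."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / (count - 1) if count > 1 else 0.0
    return count, mean, variance


# Hypothetical stand-in for a file too large to load into memory.
fake_file = io.StringIO("2\n4\n4\n4\n5\n5\n7\n9\n")
count, mean, variance = streaming_mean_variance(read_numbers(fake_file))
print(count, mean, variance)
```

With a real file, `open(path)` can replace `fake_file`, since file objects are also read line by line. The same pattern also addresses the "cheap" question: memory use stays flat however large the input grows.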
How to hire and keep a data scientist in your business?
1. Find them where they hang out: Stack Overflow, GitHub, specialized communities on Twitter. Good profiles are PhD students near graduation and/or leading developers of open source projects.
2. Allow plenty of time for their personal development
–Contributing to open source projects, attending conferences, working on personal projects during working hours
3. Treat them not as mere executors of orders, but as business co-developers
This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com)
Contact Clement Levallois (levallois [at] em-lyon.com) for more information.