Data Science Retreat
Berlin, Mar 2014
http://datascienceretreat.com/
Introduction to the first Data Science school
in Europe
Plus advice for upcoming data scientists
Who Am I
Twitter: @quesada
Before: Consulting on predictive
models of ecommerce (CLV),
data scientist at GetYourGuide
What this talk is about: Two problems
• Making the jump from junior to senior data
scientist is hard (solution: Data Science Retreat)
• Acquiring the skillset, even with killer online
courses, is hard (solution: meerkat method)
Contents
• Data Science retreat
• The meerkat method
• Data Science retreat for companies
• Hiring
• Getting tailored courses at your location
• Advice to anyone on their path to be a data scientist
• Advice to companies growing a data team
Problem: Making the jump from junior to
senior data scientist is hard
It’s too hard for companies to find data scientists
“It takes 150 phone interviews
to find someone who is good
enough to bring in to continue
on-site”
Alex Kagoshima, Pivotal,
Berlin
People applying to data scientist jobs have no experience
• Vincent Granville:
“There is no shortage of data scientists. For every LinkedIn
job, there are several hundred applications on average”
Data scientists need to program (5 years of experience)
Stefan Schmidt (Amazon Berlin):
“It takes us months to fill our positions; we hire world-wide
for Berlin openings. Most profiles cannot program at the level
we need. We have engineers, but the data scientist needs to be
able to understand large projects and commit code”
Truth is, data scientist is a senior role
• Often advising the CEO directly
• This is why so many people with strong profiles and
lots of Coursera courses cannot find jobs
The gap from junior to senior
• Junior:
• Has a technical degree
• Has done some courses online
• Has never worked with data that generates value to
companies
• Can apply ‘recipes’, but not think creatively about data
sources and algorithms
This profile has no practical value for most companies
Data Science Retreat
The story
TEAM: Chief-Data-Scientist-level mentors
Formulating the analytical problem
• Finding the question
• Translating something vague into a
dependent measure and an actual set of
predictors
• What generates business value?
• The Business Model Canvas to design a data
product
• Key performance indicators; examples,
measurement, improvement
• Most business problems are not very well
defined. How do we make them actionable?
• Analyzing big success stories in data science
• Getting Buy-in
Getting data (APIs, feature engineering)
• Using APIs
• Using databases
• Parsing HTML; web scraping
• Transforming data (reshape)
• Finding APIs
• Feature engineering
• Avoiding autocorrelation
• Removing features with low variance
• Detecting outliers
• Exploratory analyses
• Measuring predictor importance
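Two of the items above, removing features with low variance and detecting outliers, can be sketched in a few lines. This is a minimal illustration in standard-library Python, not course material: the column names, the variance threshold, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
from statistics import pvariance, quantiles


def drop_low_variance(columns, threshold=1e-8):
    """Keep only the columns whose population variance exceeds the threshold."""
    return {name: values for name, values in columns.items()
            if pvariance(values) > threshold}


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical toy data: 'constant' carries no information,
# 'price' contains one extreme value.
data = {
    "constant": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
    "price": [10.0, 12.0, 11.0, 13.0, 12.0, 11.0, 10.0, 95.0],
}
kept = drop_low_variance(data)
outliers = iqr_outliers(data["price"])
```

Here `kept` retains only the `price` column, and `outliers` flags the single extreme price. Whether a flagged value is an error or a genuine signal is a judgment the analyst still has to make.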
Finding insights, making predictions
• Regression
• Linear regression, penalized
models
• non-linear regression
• SVM
• K-nearest neighbors
• regression trees + rule-based
models (random forests)
Finding insights, making predictions
• Classification
• Logistic regression, linear
classification
• nonlinear classification models
• Classification trees + rule-based models (random forests)
• SVM
• K-nearest neighbors
• Naive Bayes
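To make one of the listed classifiers concrete, here is a minimal k-nearest-neighbors sketch in standard-library Python. The training points, labels, and the choice of k = 3 are hypothetical, and a real project would more likely use an established library than this hand-rolled version.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy training set of (features, label) pairs.
train = [
    ((1.0, 1.0), "low"),
    ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"),
    ((5.0, 5.0), "high"),
    ((5.2, 4.9), "high"),
    ((4.8, 5.1), "high"),
]
prediction = knn_predict(train, (1.1, 1.0))
```

The same function works for regression if the majority vote is replaced by an average of the neighbors' target values.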
R
• R language fundamentals
• data structures (including
data.table)
• subsetting
• input/output
• functions/control flow
• vectorization
• split-apply-combine
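The split-apply-combine pattern named above is not specific to R. As a rough sketch of the idea only (the course teaches it in R, and the function name and sample rows here are invented for illustration), it can be written in standard-library Python like this:

```python
from collections import defaultdict
from statistics import mean


def split_apply_combine(rows, key, value, func):
    """Group rows by `key`, apply `func` to each group's values, collect results."""
    groups = defaultdict(list)
    for row in rows:                      # split: one bucket per key
        groups[row[key]].append(row[value])
    # apply `func` to each bucket, combine into a single mapping
    return {name: func(values) for name, values in groups.items()}


# Hypothetical order data.
orders = [
    {"country": "DE", "amount": 40.0},
    {"country": "DE", "amount": 60.0},
    {"country": "ES", "amount": 30.0},
]
average_by_country = split_apply_combine(orders, "country", "amount", mean)
```

Swapping `mean` for `sum`, `max`, or `len` changes the summary without touching the grouping logic, which is the main appeal of the pattern.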
R
• advanced R
• functional programming in
R
• Profiling
• object systems
• packaging
• Rcpp
data at scale
• MapReduce
• MapReduce, Google 2004.
• Applications, extensions. Beyond
MapReduce.
• Big Data analysis
• Preparation and configuration
• Hadoop cluster overview.
• Practice: Uploading / downloading
/ moving files around, executing
jobs, checking for completion /
failure, etc.
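The MapReduce model listed above can be illustrated without a cluster. The following is a single-process sketch of the map, shuffle, and reduce steps using the classic word-count example; the function names and input strings are assumptions, and a real Hadoop job would distribute these steps across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in the document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse the values for one key into a single count."""
    return key, sum(values)


# Hypothetical input documents.
documents = ["big data is big", "data at scale"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the calls can run in parallel on different machines, which is what makes the model scale.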
data at scale
• Hive / Pig
• Defining a Hive table, querying a Hive table.
• Integrating R with Hive.
• An introduction to Pig.
• Mahout
• Executing clustering tasks. Visualizing the
results with R.
• Executing an item-to-item recommender.
• Cascading / Pattern
• Data flow modeling using PyCascading.
• Executing Machine Learning "Pattern"
algorithms.
Location: Microsoft Ventures Berlin
Methodology: portfolio project
• Ten students per batch
• Pair programming and code reviews with mentors (guild model)
• Datasets come from companies (non-NDA only)
• Portfolio project, where the fellow demonstrates what he can do
end-to-end to deliver value
• Weekly presentation training to improve communication to
non-technical stakeholders (video feedback)
Who we are looking for
• Passion for generating insights from data
• Familiarity with trends in data growth, open-source platforms, and public
data sets.
• From familiarity to strong knowledge of statistical methods
• Some experience with statistical languages and packages, including Mahout, R
or Python with pandas
• Some familiarity with visualization software and techniques (including
Tableau)
• Preferably, experience working hands-on with large-scale data sets
• Excellent written and verbal communication skills
Acquiring the skillset, even with killer online
courses, is hard
[Figure: diagrams relating Problem, Project, and Exercise]
Concepts in meerkat method
• Learner
• Mentor
• Scorpion
• Maiming the scorpion
[Figure: Problem, Project, Exercise]
Advantages
• No need to find the right tutorial/book/whatever
• Spend more time at the border of your capability
• You save time by skipping exercises that would be too easy
Advantages (cont)
• Higher project completion rates: all projects must have a
concrete output, so you will see your own progress in
tangible ways
• You will have an easier time demonstrating progress to
yourself and to others (the Mentor vouches for the
Learner).
• You will get more hands-on training than in other methods
Interested in being a learner?
Apply
Data Science Retreat for companies
Sponsors
Meet Stefan, he's the Chief data scientist of
bigCompany.
They wanted to go 'all in' with big data, but
needed people capable of taming it.
He realized it's not easy to find Data Scientists.
What do you mean ‘sub-second queries by Monday’?
What do you mean ‘improve predictions two standard deviations’?
€?
Then Laura, Stefan’s friend,
pointed him to
Data Science Retreat
… an intensive course helping selected fellows
ramp-up fast for a career in data science. “Tell
me more…”. Stefan was very interested.
Stefan could interview ten data scientists that were as good as Ben.
He hired three, and they jumped into their roles with little training.
Stefan was ecstatic!
How it works
• As a sponsor you pay 7000€ in advance + 3000€ after the
data scientist has worked on-site for 3 months and you know you
want to keep him.
• Students who take the sponsorship agree to work for a reduced
salary (50%) for the first 3 months. The salary savings during the
internship should cover the cost of the sponsorship. When the
students finish the program, no one has any obligations.
How it works: an example
• You prepay 7000€ and become a sponsor.
• At this point, you don't know the students.
• But as a committed sponsor you participate actively
during the retreat, see the students' presentations, go
out for lunch with them, etc.
• Thanks to these activities, you have now developed
strong relationships and know more about the students
than what would come out in interviews.
How it works: an example (contd)
• You have set your sights on a killer candidate: Klaas. You
make an offer, he accepts, and he starts working at your
location
• Klaas gets paid 50% of his negotiated salary. If Klaas's
salary is 60000€/year, that is 5000€/month at full pay. Paying
half saves 2500€ * 3 months = 7500€, which covers your
initial investment of 7000€.
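The arithmetic in this example can be checked for other salaries with a short calculation. The sketch below is only an illustration: the function and field names are made up, the defaults mirror the figures on the slides (7000€ prepaid, 50% pay for 3 months), and the 3000€ follow-up payment is deliberately left out because the slide compares the savings to the initial investment only.

```python
def sponsorship_summary(annual_salary, upfront_fee=7000, reduced_pay=0.5, months=3):
    """Compare salary savings during the reduced-pay period with the prepaid fee."""
    monthly_salary = annual_salary / 12
    # Savings = the unpaid share of the salary over the reduced-pay months.
    savings = monthly_salary * (1 - reduced_pay) * months
    return {
        "monthly_salary": monthly_salary,
        "salary_savings": savings,
        "covers_upfront_fee": savings >= upfront_fee,
        "surplus_over_upfront_fee": savings - upfront_fee,
    }


# Klaas's negotiated salary from the example above.
summary = sponsorship_summary(60000)
```

With these defaults the savings scale linearly with salary, so the break-even point for the prepaid fee is an annual salary of 56000€.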
Data Science Retreat
Contact:
Jose Quesada, PhD,
Director, Data Science Retreat Berlin
jose@datascienceretreat.com
Do you want to be a sponsor?
Advice to anyone on their path to be a data scientist
• Try to find a mentor
• Spend as much time as possible at the border of your ability
• Practice communication
• Having a culture that can integrate such individuals is as
hard as finding them. Interview the companies you consider joining
• How do you move from being a junior person to being the
'CEO whisperer'? Spend time with people who are
Getting tailored courses at your location
• We listen to the people you need to train before we
design the course
• We will start with a dataset that is important for your
company. Lacking that, we’ll bring a public dataset that is
relevant
• Enterprise courses have a reputation for being ineffective
Hire somebody who’s better at engineering and teach him data science or
hire somebody who’s better with data and teach him engineering?
• Is your culture ready? Because if you manage to attract
someone senior enough, they will sense if it's not
• The problems you have must be a good match for the data
scientists. People are extremely specialized, more so after
PhDs. If you have say graph theory/recommendation
problems, and hire someone with a time series background,
things will take a while no matter how good he is in his
field
And Stefan’s CEO?
He’s optimistic about this
new big-data deployment!
Thank you for your attention
Jose Quesada: jose@datascienceretreat.com
@quesada
@datascienceret
Sponsors

Data Science Retreat Helps Companies Find Talent

  • 1. Data Science Retreat Berlin, Mar 2014 http://datascienceretreat.com/ Introduction to the first Data Science school in Europe Plus advice for upcoming data scientists
  • 2. Who Am I Twitter: @quesada Before: Consulting on predictive models of ecommerce (CLV), data scientist at GetYourGuide
  • 3. What this talk is about: Two problems • Making the jump from junior to senior data science is hard (solution: data science retreat) • Acquiring the skillset, even with killer online courses, is hard (solution: meerkat method)
  • 4. contents • Data Science retreat • The meerkat method • Data Science retreat for companies • Hiring • Getting tailored courses at your location • Advice to anyone on their path to be a data scientist • Advice to companies growing a data team
  • 5. Problem: Making the jump from junior to senior data science is hard
  • 6. It’s too hard for companies to find data scientists “It takes 150 phone interviews to find someone who is good enough to bring in to continue on-site” Alex Kagoshima, Pivotal, Berlin
  • 7. People applying to Data scientist jobs have no experience • Vincent Granville: “There is no shortage of data scientists. For every linkedin Job, there are several hundreds applications on average”
  • 8. Data scientists need to program (5 year experience) Stefan Schmidt (Amazon Berlin): “It takes us months to fill our positions; we hire world-wide for Berlin openings. Most profiles cannot program at the level we need. We have engineers, but the data scientist needs to be able to understand large projects and commit code”
  • 9. Truth is, data scientist is a senior role • Often, advising to the CEO directly • This is why so many people with strong profiles and lots of coursera courses cannot find jobs
  • 10. The gap from junior to senior • Junior: • Has a technical degree • Has done some courses online • Has never worked with data that generates value to companies • Can apply ‘recipes’, but not think creatively about data sources and algorithms
  • 11. The gap from junior to senior • Junior: • Has a technical degree • Has done some courses online • Has never worked with data that generates value to companies • Can apply ‘recipes’, but not think creatively about data sources and algorithms This profile has no practical value for most companies
  • 15. Formulating the analytical problem • Finding the question • Translating something vague into a dependent measure and an actual set of predictors • What generates business value? • The Business Model Canvas to design a data product • Key performance indicators; examples, measurement, improvement • Most business problems are not very well defined. How do we make them actionable? • Analyzing big success stories in data science • Getting Buy-in
  • 16. Getting data (APIs, feature engineering) • Using APIs • Using databases • Parsing html; web scrapping • Transforming data (reshape) • Finding APIs • Feature engineering • Avoiding autocorrelation • Removing features with low variance • Detecting outliers • Exploratory analyses • Measuring predictor importance
  • 17. Finding insights, making predictions • Regression • Linear regression, penalized models • non-linear regression • SVM • K-nearest neighbors • regression trees + rule-based models (random forests)
  • 18. Finding insights, making predictions • Classification • Logistic regression, linear classification • nonlinear classification models • Classification trees + rule- bases models (random forests) • SVM • K-nearest neighbors • naive bayes
  • 19. R • R language fundamentals • data structures (including data.table) • subsetting • input/output • functions/control flow • vectorization • split-apply-combine
  • 20. R • advanced R • functional programming in R • Profiling • object systems • packaging • Rcpp
  • 21. data at scale • MapReduce • MapReduce, Google 2004. • Applications, extensions. Beyond MapReduce. • Big Data analysis • Preparation and configuration • Hadoop cluster overview. • Practice: Uploading / downloading / moving files around, executing jobs, checking for completion / failure, etc.
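As a rough illustration of the MapReduce model named on this slide, here is a single-process word count in plain Python. It only mimics the map, shuffle, and reduce stages; it does not use Hadoop or any cluster API, and the function names and sample documents are assumptions made for this sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big plans", "data science"])
print(counts)
```

On a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data over the network; the sketch keeps only the data flow.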
  • 22. data at scale • Hive / Pig • Defining a Hive table, querying a Hive table. • Integrating R with Hive. • An introduction to Pig. • Mahout • Executing clustering tasks. Visualizing the results with R. • Executing an item-to-item recommender. • Cascading / Pattern • Data flow modeling using PyCascading. • Executing Machine Learning "Pattern" algorithms.
  • 24. Methodology: portfolio project • Ten students per batch • Pair programming and code reviews with mentors (guild model) • Datasets come from companies (non-NDA only) • Portfolio project, where the fellow demonstrates what he can do end-to-end to deliver value • Weekly presentation training to improve communication to non-technical stakeholders (video feedback)
  • 25. Who we are looking for • Passion for generating insights from data • Familiarity with trends in data growth, open-source platforms, and public data sets. • From familiarity to strong knowledge of statistical methods • Some experience with statistical languages and packages, including Mahout, R or Python with pandas • Some familiarity with visualization software and techniques (including Tableau) • Preferably, experience working hands-on with large-scale data sets • Excellent written and verbal communication skills
  • 26. Acquiring the skillset, even with killer online courses, is hard
  • 34. Concepts in meerkat method • Learner • Mentor • Scorpion • Maiming the scorpion
  • 37. Advantages • No need to find the right tutorial/book/whatever • Spend more time at the border of your capability • You save the time you would otherwise spend on exercises that are too easy
  • 38. Advantages (cont) • Higher project completion rates: all projects must have a concrete output, so you will see your own progress in tangible ways • You will have an easier time demonstrating progress to yourself and to others (the Mentor vouches for the Learner). • You will get more hands-on training than in other methods
  • 39. Interested in being a learner? Apply
  • 40. Data Science Retreat for companies
  • 42. Meet Stefan, he's the Chief data scientist of bigCompany. They wanted to go 'all in' with big data, but needed people capable of taming it.
  • 43. He realized it's not easy to find Data Scientists.
  • 44. What do you mean ‘sub- second queries by Monday’?
  • 45. What do you mean ‘improve predictions two standard deviations’?
  • 46. €?
  • 49. Then Laura, Stefan’s friend, pointed him to Data Science Retreat … an intensive course helping selected fellows ramp-up fast for a career in data science. “Tell me more…”. Stefan was very interested.
  • 50. Stefan could interview ten data scientists who were as good as Ben. He hired three, and they jumped into their roles with little training. Stefan was ecstatic!
  • 51. How it works • As a sponsor, you pay 7000€ in advance + 3000€ after the data scientist has worked on-site for 3 months and you know you want to keep him. • Students who take the sponsorship agree to work for a reduced salary (50%) for the first 3 months. The salary savings during the internship should cover the cost of the sponsorship. When the students finish the program, no one has any obligations.
  • 52. How it works: an example • You prepay 7000€ and become a sponsor. • At this point, you don't know the students. • But as a committed sponsor you participate actively during the retreat, see the students' presentations, go out for lunch with them, etc. • Thanks to these activities, you have now developed strong relationships and know more about the students than what would come out in interviews.
  • 53. How it works: an example (contd) • You have set your sights on a killer candidate: Klaas. You make an offer, he accepts, and he starts working at your location • Klaas gets paid 50% of his negotiated salary for the first 3 months. If Klaas's salary is 60000€/year, the full cost would be 5000€/month; paying half saves 2500€/month, so 2500€ * 3 months = 7500€ in savings, which covers your initial investment of 7000€.
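The sponsorship arithmetic from the Klaas example can be checked with a short sketch. The function and parameter names are hypothetical; the figures (7000€ prepayment, 50% pay for 3 months, 60000€/year salary) come from the slides. The further 3000€ due after three months is left out, as it is in the slide's own calculation.

```python
def sponsorship_balance(annual_salary_eur, prepaid_fee_eur=7000,
                        reduced_months=3, pay_fraction=0.5):
    """Return (monthly_saving, total_saving, balance) for a sponsor.

    balance is the salary saved during the reduced-pay period minus the
    prepaid sponsorship fee; a positive value means the fee is covered.
    """
    monthly_salary = annual_salary_eur / 12
    monthly_saving = monthly_salary * (1 - pay_fraction)
    total_saving = monthly_saving * reduced_months
    return monthly_saving, total_saving, total_saving - prepaid_fee_eur


monthly_saving, total_saving, balance = sponsorship_balance(60000)
print(monthly_saving, total_saving, balance)
```

Under these assumptions the prepayment is covered for any negotiated salary of 56000€/year or more, since 56000 / 12 * 0.5 * 3 = 7000.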
  • 54. Data Science Retreat Contact: Jose Quesada, PhD, Director, Data Science Retreat Berlin jose@datascienceretreat.com Do you want to be a sponsor?
  • 55. Advice to anyone on their path to be a data scientist • Try to find a mentor • Spend as much time as possible at the border of your ability • Practice communication • Having a culture that can integrate such individuals is as hard as finding them. Interview your companies • How do you move from being a junior person to being the 'CEO whisperer'? Spend time with people who are
  • 56. Getting tailored courses at your location • We listen to the people you need to train before we design the course • We will start with a dataset that is important for your company. Lacking that, we’ll bring a public dataset that is relevant • Enterprisey courses have a reputation for being ineffective
  • 57. Hire somebody who’s better at engineering and teach him data science, or hire somebody who’s better with data and teach him engineering? • Is your culture ready? Because if you manage to attract someone senior enough, they will sense if it's not • The problems you have must be a good match for the data scientists. People are extremely specialized, more so after PhDs. If you have, say, graph theory/recommendation problems, and hire someone with a time series background, things will take a while no matter how good he is in his field
  • 59. He’s optimistic about this new big-data deployment!
  • 60. Thank you for your attention Jose Quesada: jose@datascienceretreat.com @quesada @datascienceret