#Datacareer - AI Guild workshop on data roles in industry with Adam Green

#DATACAREER
FIRST AND SECOND #DATAROLES IN THE INDUSTRY
AND STARTUPS
SPECIAL EDITION WITH ADAM GREEN
FROM AIGUILD.EVENTBRITE.COM

#DATACAREER
“No matter who you are, self-improvement is
one of the most important and most
overlooked attributes of young AI talent. It
only takes four years of experience to become
a senior AI researcher, or five years of
experience to lead an entire institute. The
determination and discipline to improve both
the hard and soft skills continually will be the
deciding factor in an AI researcher’s career.”
Jean-François Gagné

Dânia Meira
Founding member, AI Guild
ML models for predictive analytics
Former bootcamp teacher
#datacareer since 2012
LinkedIn

Adam Green
Senior data scientist
Former bootcamp director
Focus on energy industry
LinkedIn

Chris Armbruster
10,000 Data Scientists for Europe
Former bootcamp director
#datacareer coaching since 2017
LinkedIn

#DATACAREER
WORKSHOP OUTLINE
AI Guild career coaching
#dataroles specialization
#dataroles upgrading
#datacareer orientation

AI GUILD CAREER COACHING
Running for junior and senior practitioners since early 2019
Runs monthly for AI Guild members
Coaching capacity per year: 240 participants

INSIGHTS FROM CAREER COACHING
The search for the first as well as the second role may take more than six months
Upgrading inside a company may be easier
Job advertisements may be misleading and confusing
The role ‘in real life’ may not match the talent's expectations

OBSERVING THE MARKET
Specialization and differentiation of roles
Rising value of domain expertise
The experimental phase of proof-of-concept (PoC) projects is ending
Increasing focus on deployment

ANECDOTAL EVIDENCE FOR DIGITAL ADOPTION AND BEHAVIOR INCREASING 10X
… but the labor market is admittedly very difficult

#DATACAREER
WORKSHOP OUTLINE
#dataroles specialization
#dataroles upgrading
#datacareer orientation

PRODUCTIONIZING MACHINE LEARNING
[Figure: components of a production ML system (ML models, data collection, data quality, infrastructure, process management tools, machine resource management, monitoring, configuration, feature extraction, analysis, data preprocessing, parameter configuration, offline validation, business logic, A/B testing), annotated with the #dataroles Data Engineer, Data Scientist, Data Analyst, ML Engineer, and AI Researcher]
See also: “Hidden Technical Debt in Machine Learning Systems” by Sculley et al., Google Inc., 2015

#DATAROLES
Data Scientist
Task: Understand the business case, build features, and train predictive models to address such use cases
Skill: Statistics, SQL, programming (e.g. Python, R), ML & DL techniques
Data Analyst
Task: Business and data understanding to report on what happens
Skill: Descriptive analytics, SQL, statistics, dashboarding and visualization tools
Data Engineer
Task: Build and maintain infrastructure and pipelines to collect, clean, and preprocess data
Skill: Distributed systems, databases, software engineering
Machine Learning Engineer
Task: Optimize, deploy, and maintain machine learning models in production
Skill: Software engineering, DevOps, and systems architecture
AI Researcher
Task: Build new machine learning algorithms, find custom scientific solutions
Skill: Research, presenting at conferences, writing publications

'COOKING' DATA: EXPLAINING SPECIALIZATION
[Figure: the production ML system diagram retold as a kitchen: sowing, harvesting, preparing ingredients, choosing a recipe, using utensils, trying combinations of appliances and recipes, kitchen tasting, customers tasting, and the kitchen space with its available appliances]
See also: “Understanding a Machine Learning Workflow Through Food” by Daniel Godoy

UNDERSTANDING #DATAROLES
Build kitchen appliances
Create and use recipes to cook
Check quality of ingredients and recipes
Process ingredients at scale
Turn a recipe into many dishes served efficiently
Roles: Data Engineer, Data Scientist, Data Analyst, ML Engineer, AI Researcher

ADAM’S CASE: COMING INTO DATA FROM INDUSTRY
Chemical engineering and black-box modelling
Working as an energy engineer with spreadsheets and linear programming
From bootcamp graduate to director

ADAM’S TAKE ON #DATAROLES
THE DATA ANALYST ENRICHES DATA
THE DATA SCIENTIST MAKES PREDICTIONS
THE DATA ENGINEER ENABLES ACCESS TO DATA

WHAT IS GREAT ABOUT BEING A DATA SCIENTIST
• A never-ending story of learning
• Tooling is free
• Lots of freely accessible data
• The leverage the technology gives you
• The variety of non-traditional and interdisciplinary routes into the field
• Future-proof
• People are excited and interested in what you do
• Many interesting life lessons

WHAT IS GETTING EASIER
• Tooling
• Putting code into production
• Differentiation of roles

WHAT IS STILL DIFFICULT
• Knowing where to stop learning
• Mastering new algorithms
• Keeping up with research
• Dealing with impostor syndrome
• Access to simulators
• APIs and libraries for reinforcement learning

#DATACAREER
WORKSHOP OUTLINE
#dataroles upgrading
#datacareer orientation

LET'S START FROM YOUR 'USERS' AND 'CUSTOMERS'
• Hiring managers
• Human resources
• Recruiters
• Network of friends and colleagues
• Company leaders

DATA ENGINEER
SQL, Bash, Java, Scala, Python
Hadoop: Hive, Pig, Spark
Databases, e.g. Microsoft SQL Server, PostgreSQL, MongoDB
Platforms: AWS, Google Cloud Platform, Microsoft Azure, Linux
Tools: git, docker, airflow, Jenkins
Language-specific skills are important, including for ETL and databases. Certifications from AWS, Google, or Cloudera may be relevant.
Key topics include data pipelines, algorithms and data structures, and an understanding of system design.
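To make "data pipelines" a little more concrete, here is a minimal sketch in Python (standard library only, Python 3.9+ for `graphlib`) that models a pipeline as dependent tasks: extract, clean, preprocess, load. The task names, the raw sample values, and the min-max scaling step are illustrative assumptions; in practice a data engineer would typically hand orchestration to a tool such as Airflow, whose API is not shown here.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "preprocess": {"clean"},
    "load": {"preprocess"},
}


def extract(_):
    # Stand-in for reading from a source system: raw, messy string values.
    return [" 12.5", "n/a", "7.0 ", "", "3.25"]


def clean(raw):
    # Strip whitespace and drop entries that cannot be parsed as numbers.
    values = []
    for item in raw:
        try:
            values.append(float(item.strip()))
        except ValueError:
            continue
    return values


def preprocess(values):
    # Min-max scale to [0, 1]; a constant column is mapped to all zeros.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def load(values):
    # Stand-in for writing to a warehouse table: return a summary record.
    return {"rows": len(values), "data": values}


TASKS = {"extract": extract, "clean": clean, "preprocess": preprocess, "load": load}


def run_pipeline():
    # static_order() yields each task after its dependencies. Because this
    # pipeline is a linear chain, each task can consume the previous output.
    result = None
    for name in TopologicalSorter(PIPELINE).static_order():
        result = TASKS[name](result)
    return result


print(run_pipeline())
```

Declaring dependencies separately from the task functions is what lets an orchestrator reorder, retry, or parallelize steps; a branching pipeline would need each task to receive the outputs of its specific parents rather than just the previous result.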

DATA ANALYST
SQL, Excel
Visualization tools like Tableau
Python/R packages like matplotlib, seaborn, ggplot2
Key topics include statistical knowledge, data analysis, data interpretation, and a logical approach.
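As a small illustration of the SQL side of descriptive analytics, the sketch below uses Python's built-in `sqlite3` module to summarize a hypothetical sales table by region. The table name, columns, and figures are invented for illustration; the same `GROUP BY` pattern carries over to other SQL databases.

```python
import sqlite3


def revenue_by_region(rows):
    # In-memory database so the example needs no server or files.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # Descriptive statistics per group: count, total, and mean.
        query = """
            SELECT region, COUNT(*) AS n, SUM(amount) AS total, AVG(amount) AS mean
            FROM sales
            GROUP BY region
            ORDER BY total DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


# Hypothetical sample data: (region, sale amount).
sample = [("north", 120.0), ("south", 80.0), ("north", 60.0),
          ("south", 40.0), ("east", 90.0)]

for region, n, total, mean in revenue_by_region(sample):
    print(f"{region}: {n} sales, total {total:.1f}, mean {mean:.1f}")
```

The aggregated result would typically feed a dashboard or a visualization tool such as Tableau or matplotlib.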

DATA SCIENTIST
SQL, Bash
R: dplyr, sqldf, tidyr, lubridate, shiny, ggplot2, MLR, ranger, xgboost
Python: numpy, pandas, matplotlib, scikit-learn, keras
Hadoop: Hive, Pig, Spark
Databases: Microsoft SQL Server, PostgreSQL, MongoDB
Tools: git, jupyter notebook, docker
Models & algorithms: Statistical models and distributions, linear and logistic regression, random forest,
backpropagation, ARIMA, Natural Language Processing, Computer Vision
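Linear regression is the simplest of the models listed. The following stdlib-only sketch fits an ordinary least-squares line to toy data and uses it to predict a new value. The function names `fit_line` and `predict` are hypothetical; in day-to-day work a data scientist would more likely use scikit-learn's `LinearRegression`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    # Apply the fitted line to a new input.
    slope, intercept = model
    return slope * x + intercept


# Toy data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
model = fit_line(xs, ys)
print(model, predict(model, 5))
```

Because the toy points lie exactly on a line, the fit recovers slope 2 and intercept 1; with real data the residuals would be non-zero, and validating the model offline before deployment becomes the interesting part.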

WHERE ARE WE TODAY?
ML IS WIDELY DEPLOYED AND THE PRACTICE IS DEVELOPING CREATIVELY
MORE AND MORE INDUSTRIES ARE PROGRESSING FROM DIGITAL TO DATA AND ARTIFICIAL INTELLIGENCE
VALIDATING THE BUSINESS CASE IS KEY

#DATACAREER
WORKSHOP OUTLINE
#datacareer orientation

KEY INDUSTRY CHALLENGES*
• Data volume, accessibility, and quality
• Trust of customers, stakeholders, and employees, including governance, compliance, and reputation
• Competence of employees, management, and company
*Based on the 2019 PwC report “Künstliche Intelligenz in Unternehmen”, p. 12

SOME STARTUP CHALLENGES
• Data volume, accessibility, and quality
• Company funding and runway
• Expertise levels and team size

[Chart: …among early AI adopters in the United States]
[Chart: …among AI players in Germany]
[Chart: …among early adopters in the USA]
[Chart: …among AI players in Germany]
[Chart: …prospective use cases in Germany]

WRAPPING UP
Keep observing the market
Look for matches between employers’
needs and your skills profile
Scan the industry and startups for the
most promising #aiusecase

THANK YOU
Join at
theguild.ai/community
