Based on AI Guild career coaching, this workshop looks at roles such as Data Analyst, Data Scientist, and Data Engineer in industry and startups. We discuss emerging specialization and how to upgrade your competence profile. Also included are tips and tricks from practitioners on how to find your next role.
Please find the event series on aiguild.eventbrite.com
#Datacareer - AI Guild workshop on data roles in industry with Adam Green
1. #DATACAREER
FIRST AND SECOND #DATAROLES IN THE INDUSTRY
AND STARTUPS
SPECIAL EDITION WITH ADAM GREEN
FROM AIGUILD.EVENTBRITE.COM
2. #DATACAREER
“No matter who you are, self-improvement is
one of the most important and most
overlooked attributes of young AI talent. It
only takes four years of experience to become
a senior AI researcher, or five years of
experience to lead an entire institute. The
determination and discipline to improve both
the hard and soft skills continually will be the
deciding factor in an AI researcher’s career.”
Jean-François Gagné
4. Dânia Meira
Founding member, AI Guild
ML models for predictive analytics
Former bootcamp teacher
#datacareer since 2012
LinkedIn
5. Adam Green
Founding member, AI Guild
Senior data scientist
Former bootcamp director
Focus on energy industry
LinkedIn
6. Chris Armbruster
Founding member, AI Guild
10,000 Data Scientists for Europe
Former bootcamp director
#datacareer coaching since 2017
LinkedIn
9. AI GUILD CAREER
COACHING
Running for junior and for senior
practitioners since early 2019
Runs monthly for AI Guild members
Coaching capacity per year: 240
participants
10. INSIGHTS FROM
CAREER
COACHING
The search for the first as well as the second role may
take more than 6 months
Upgrading inside a company may be easier
Job advertisements may be misleading and
confusing
The role ‘in real life’ may not match the talent’s
expectations
11. OBSERVING THE
MARKET
Specialization and differentiation of roles
Rising value of domain expertise
Experimental phase with PoC plays ending
Increasing focus on deployment
14. PRODUCTIONIZING MACHINE LEARNING
[Diagram: components of a production ML system, mapped to #dataroles]
Components: ML Models, Data Collection, Data Quality, Infrastructure, Process Management Tools, Machine Resource Management, Monitoring, Configuration, Feature Extraction, Analysis, Data Preprocessing, Parameter Configuration, Offline Validation, Business Logic, A/B Testing
Roles: Data Engineer, Data Scientist, Data Analyst, ML Engineer, AI Researcher
See also: “Hidden Technical Debt in Machine Learning Systems” by Sculley et al., Google Inc., 2015
15. #DATAROLES
Data Scientist
Task: Understand the business case, build features to train predictive models to address such use cases
Skill: Statistics, SQL, programming (e.g. Python, R), ML & DL techniques
Data Analyst
Task: Business and data understanding to report on what happens
Skill: Descriptive analytics, SQL, statistics, dashboarding and visualization tools
Data Engineer
Task: Build and maintain infrastructure and pipelines to collect, clean and pre-process data
Skill: Distributed systems, databases, software engineering
Machine Learning Engineer
Task: Optimize, deploy and maintain machine learning models in production
Skill: Software engineering, DevOps and systems architecture
AI Researcher
Task: Build new machine learning algorithms, find custom scientific solutions
Skill: Research, presenting at conferences, writing publications
16. ‘COOKING’ DATA: EXPLAINING SPECIALIZATION
[Diagram: the production ML components from slide 14, overlaid with cooking analogies]
Cooking analogies: Sowing, Harvesting, Choose recipe, Prepare ingredients, Customers tasting, Kitchen tasting, Use utensils, Try combinations of appliances and recipes, Kitchen space and available appliances
See also: “Understanding a Machine Learning Workflow Through Food” by Daniel Godoy
17. UNDERSTANDING #DATAROLES
Build Kitchen Appliances
Create and use recipes to cook
Check quality of ingredients and recipes
Process ingredients at scale
Turn a recipe into many dishes served efficiently
Data Engineer
Data Scientist
Data Analyst
ML Engineer
AI Researcher
18. ADAM’S CASE: COMING INTO DATA FROM INDUSTRY
Chemical engineering and black
box modelling
Working as energy engineer with
spreadsheets and linear
programming
From bootcamp graduate to
director
19. ADAM’S TAKE ON #DATAROLES
THE DATA ANALYST
ENRICHES DATA
THE DATA SCIENTIST MAKES
PREDICTIONS
THE DATA ENGINEER
ENABLES ACCESS TO DATA
20. WHAT IS GREAT
ABOUT BEING A
DATA SCIENTIST
• A never-ending story of learning
• Tooling is free
• Lots of freely accessible data
• Leverage of the technology
• The variety of non-traditional and
interdisciplinary routes into the field
• Future proof
• People are excited and interested in
what you do
• Many interesting life lessons
22. WHAT IS STILL
DIFFICULT
• Knowing where to stop learning
• Mastering new algorithms
• Keeping up with research
• Dealing with impostor
syndrome
• Access to simulators
• APIs and libraries for
Reinforcement Learning
24. LET’S START FROM
YOUR ‘USERS’ AND
‘CUSTOMERS’
• Hiring managers
• Human resources
• Recruiters
• Network of friends and
colleagues
• Company leaders
25. DATA ENGINEER
SQL, Bash, Java, Scala, Python
Hadoop: Hive, Pig, Spark
Databases: e.g. Microsoft SQL, PostgreSQL, MongoDB
Platforms: AWS, Google Cloud Platform, Microsoft Azure, Linux
Tools: git, docker, airflow, Jenkins
Language-specific skills are important, also for ETL and databases. Certifications with AWS, Google, or Cloudera may be relevant.
Key topics include data pipelines, algorithms and data structures, and an understanding of system design.
26. DATA ANALYST
SQL, Excel
Visualization tools like Tableau
Python/R packages like matplotlib, seaborn, ggplot2
Key topics include statistical knowledge, data analysis, data interpretation, and a logical approach.
27. DATA SCIENTIST
SQL, Bash
R: dplyr, sqldf, tidyr, lubridate, shiny, ggplot2, MLR, ranger, xgboost
Python: numpy, pandas, matplotlib, scikit-learn, keras
Hadoop: Hive, Pig, Spark
Databases: Microsoft SQL, PostgreSQL, MongoDB
Tools: git, jupyter notebook, docker
Models & algorithms: Statistical models and distributions, linear and logistic regression, random forest,
backpropagation, ARIMA, Natural Language Processing, Computer Vision
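One way to use the three skill lists above when comparing your own profile against a role is a simple overlap count. The following is a minimal sketch, not part of the workshop material: the skill sets are abridged from the slides, while the function name `rank_roles`, the example profile, and the "share of listed skills covered" score are illustrative assumptions.

```python
# Skill sets abridged from the Data Engineer, Data Analyst and Data Scientist slides.
# The scoring rule (share of listed skills covered) is an assumption for illustration.
ROLE_SKILLS = {
    "Data Engineer": {"sql", "bash", "java", "scala", "python",
                      "spark", "docker", "airflow", "git"},
    "Data Analyst": {"sql", "excel", "tableau", "matplotlib",
                     "seaborn", "ggplot2"},
    "Data Scientist": {"sql", "bash", "numpy", "pandas", "matplotlib",
                       "scikit-learn", "keras", "git", "docker"},
}


def rank_roles(profile):
    """Return (role, share of listed skills covered, missing skills), best match first."""
    known = {skill.lower() for skill in profile}
    rows = []
    for role, skills in ROLE_SKILLS.items():
        covered = skills & known          # skills you already have
        missing = sorted(skills - known)  # skills still to learn
        rows.append((role, len(covered) / len(skills), missing))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical profile, for demonstration only.
my_profile = ["SQL", "Excel", "Python", "pandas", "matplotlib", "git"]
for role, share, missing in rank_roles(my_profile):
    print(f"{role}: {share:.0%} covered, missing: {', '.join(missing)}")
```

The list of missing skills per role is probably the more useful output: it gives a rough starting point for upgrading a competence profile, although a real job advertisement will weight skills differently than this flat count does.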
28. WHERE ARE WE TODAY?
ML IS WIDELY DEPLOYED AND THE
PRACTICE DEVELOPING CREATIVELY
MORE AND MORE INDUSTRIES ARE
PROGRESSING FROM DIGITAL TO
DATA AND ARTIFICIAL INTELLIGENCE
VALIDATING THE BUSINESS CASE IS
KEY
32. KEY INDUSTRY
CHALLENGES*
• Data volume, accessibility, and
quality
• Trust of customers,
stakeholders, and employees,
including governance,
compliance, and reputation
• Competence of employees,
management, and company
*Based on the 2019 PwC report “Künstliche
Intelligenz in Unternehmen”, p. 12
33. SOME STARTUP
CHALLENGES
• Data volume, accessibility, and
quality
• Company funding and runway
• Expertise levels and team size
44. WRAPPING UP
Keep observing the market
Look for matches between employers’
needs and your skills profile
Scan the industry and startups for the
most promising #aiusecase