Data Science Popup Austin: Data Do's and Dont's: Lessons From The Front Line

DATA
SCIENCE
POP UP
AUSTIN
Data Do's and Dont's: Lessons
From the Front Line
Ryan Orban
VP of Product and Strategy,  
Data Scientist, Galvanize
ryanorban

#datapopupaustin
April 13, 2016
Galvanize, Austin Campus

Data Do’s and Dont’s:
Lessons from the Frontline

Co-Founder & CEO
Zipfian Academy
Ryan Orban
@ryanorban
EVP of Product and Strategy
Galvanize

We believe an opportunity belongs to anyone with aptitude and ambition.

Galvanize 2015
NODES ON THE NETWORK
COLORADO (BOULDER, DENVER, FORT COLLINS)
SEATTLE, WA
SAN FRANCISCO, CA
AUSTIN, TX (OPENING Q1 2016)
Programs: Full Stack Immersive, Data Science Immersive, Entrepreneurship
Entrepreneurship
Programs: Full Stack Immersive, Data Science Immersive, Data Engineering Immersive, Masters of Science in Data Science, Entrepreneurship
Entrepreneurship

5 PROGRAMS
• Full Stack Immersive
• Data Science Immersive
• Data Engineering Immersive
• Master of Science in Data Science (University of New Haven)
• Startup Membership
Projected: over 500 student member graduates in 2015
Currently over 1,500 members

PLACEMENT STATS
FULL STACK IMMERSIVE
• Pre-program Salary: $43K
• Average Starting Salary: $77K
• Placement Rate*: 97%
DATA SCIENCE IMMERSIVE
• Pre-program Salary: $72K
• Average Starting Salary: $114K
• Placement Rate*: 94%
*Galvanize is a founding member of NESTA (New Economy Skills Training Association), a trade organization founded to regulate the new “bootcamp” market. This placement rate is more rigorous than that requested by state licensure agencies. The placement rate is calculated 6 months after graduation.

How These Things Relate to Each Other
[Diagram relating Software Engineering, Data Science, Data Analysis, Data Engineering, and Machine Learning to skills including Java, Linux/UNIX, Mobile Development, Objective C, C/C++/C#, Web Development, Ruby on Rails, JavaScript, Front-end, PHP, Full-Stack, Excel, Python, SQL, NLP, Hadoop, Databases, Network Analysis, Assembly, Statistics, and R.]
The orange words are the most important things we teach.
Full-Stack Web Development and Data Science are in gray circles.

DATA SCIENCE IMMERSIVE
Week 1 - Exploratory Data Analysis and Software Engineering Best Practices
Week 2 - Statistical Inference, Bayesian Methods, A/B Testing, Multi-Armed Bandit
Week 3 - Regression, Regularization, Gradient Descent
Week 4 - Supervised Machine Learning: Classification, Validation, Ensemble Methods
Week 5 - Clustering, Topic Modeling (NMF, LDA), NLP
Week 6 - Network Analysis, Matrix Factorization, and Time Series
Week 7 - Hadoop, Hive, and MapReduce
Week 8 - Data Visualization with D3.js, Data Products, and Fraud Detection Case Study
Weeks 9-10 - Capstone Projects
Week 12 - Onsite Interviews

Data Manipulation → Model Creation → Prediction

Don’t
• Assume your data is friendly
  • ETL and feature engineering are largely opaque to others (and to yourself after enough time away)
Do
• Automate cleaning and transformation pipelines
  • Jupyter and RStudio are great for EDA, but have issues with collaboration and version control
  • Build functional code to be reused; export it into plain code files and track them with Git
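
The advice to automate cleaning and to keep reusable functions in plain code files could look roughly like the sketch below. It is only an illustration: the file name `clean.py`, the `customer` and `amount` fields, and every function name are hypothetical and do not come from the talk.

```python
"""clean.py (hypothetical): cleaning steps kept in a plain file tracked with Git."""


def strip_whitespace(record):
    """Trim surrounding whitespace from every string value in a record."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def parse_amount(record):
    """Convert the assumed 'amount' field to float; use None if it cannot be parsed."""
    cleaned = dict(record)
    try:
        cleaned["amount"] = float(str(record.get("amount", "")).replace(",", ""))
    except ValueError:
        cleaned["amount"] = None
    return cleaned


def drop_missing_amount(records):
    """Discard rows whose amount could not be parsed."""
    return [r for r in records if r["amount"] is not None]


def run_pipeline(raw_records):
    """Apply every cleaning step in one fixed, repeatable order."""
    cleaned = [parse_amount(strip_whitespace(r)) for r in raw_records]
    return drop_missing_amount(cleaned)


# Unfriendly input: stray spaces, thousands separators, a non-numeric amount.
raw = [
    {"customer": "  alice ", "amount": "1,200.50"},
    {"customer": "bob", "amount": "n/a"},
    {"customer": "carol ", "amount": " 75 "},
]
clean = run_pipeline(raw)
```

Because each step is a small function in an ordinary module, a notebook can import `run_pipeline` for EDA while the code itself is reviewed and versioned like any other source file.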

Don’t
• Use accuracy as your main metric
  • You can have 99% accuracy but 0% predictive power
  • Unbalanced classes; sampling
Do
• Use metrics like precision and recall
  • Aggregate metrics like F1-score, AUC/AIC/BIC are also good
  • Remember that the models with the highest scores are not always the ones you need; choose permissive vs. conservative based on use case
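
The "99% accuracy but 0% predictive power" point can be checked with a little arithmetic. The sketch below uses made-up numbers (1,000 transactions, 10 of them positive) and hypothetical function names. Setting precision, recall, and F1 to 0.0 when they are undefined is a convention chosen here, not something stated in the talk.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_report(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 (0.0 where undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Illustrative unbalanced data: 10 positives out of 1,000 examples.
y_true = [1] * 10 + [0] * 990

# A "model" that always predicts the majority class.
always_negative = [0] * 1000
report_naive = classification_report(y_true, always_negative)

# A model that finds 8 of the 10 positives but raises 20 false alarms.
y_pred_model = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 970
report_model = classification_report(y_true, y_pred_model)
```

Here the always-negative predictor has the higher accuracy (0.99 versus 0.978) yet a recall of zero, while the second model trades some accuracy for a recall of 0.8. Which of the two errors matters more is the permissive vs. conservative question, and it depends on the use case.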

Don’t
• Start with the most complicated models first (deep learning, gradient boosting, SVMs, etc.)
• Focus on the algorithm
  • “More data always beats better algorithms”
  • But better features usually beat better algorithms*
Do
• Start with a baseline model, then continuously “close the loop”
  • Create a base case to optimize against
  • Does a 1% greater F1-score outweigh a 10x training time in production? Not usually, unless you’re Google-scale.
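
One way to picture "create a base case to optimize against" is a deliberately simple rule whose score every later model has to beat by a meaningful margin. The sketch below is an assumption-laden illustration: the one-feature threshold rule, the toy data, and the `min_gain` and `max_slowdown` cut-offs are all invented for this example.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels where 1 is the positive class (0.0 if undefined)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def fit_threshold_baseline(X_train, y_train):
    """Baseline: predict 1 when the first feature exceeds the midpoint of the class means."""
    pos = [x[0] for x, y in zip(X_train, y_train) if y == 1]
    neg = [x[0] for x, y in zip(X_train, y_train) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x[0] > cut else 0


def worth_switching(baseline_f1, candidate_f1, baseline_secs, candidate_secs,
                    min_gain=0.05, max_slowdown=3.0):
    """Hypothetical rule: accept a candidate only for a clear gain at tolerable cost."""
    gain = candidate_f1 - baseline_f1
    slowdown = candidate_secs / baseline_secs
    return gain >= min_gain and slowdown <= max_slowdown


# Toy data, purely illustrative.
X_train = [(1.0,), (2.0,), (3.0,), (8.0,), (9.0,), (10.0,)]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [(2.5,), (4.0,), (6.0,), (7.0,), (9.5,)]
y_test = [0, 0, 0, 1, 1]

baseline = fit_threshold_baseline(X_train, y_train)
baseline_f1 = f1_score(y_test, [baseline(x) for x in X_test])

# A candidate that is 1 point better in F1 but takes 10x longer to train.
decision_small_gain = worth_switching(0.80, 0.81, baseline_secs=60, candidate_secs=600)
# A candidate that is 10 points better and takes 2x longer to train.
decision_large_gain = worth_switching(0.80, 0.90, baseline_secs=60, candidate_secs=120)
```

With these invented thresholds the first candidate is rejected and the second accepted. The exact cut-offs are a business decision. What matters is that the comparison against the baseline is written down and repeatable.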

Don’t
• Assume your cross-validation metrics will hold up against real-life data
Do
• Separate your application and prediction code
  • Fast iteration cycles are key. Create a “scoring service” that is decoupled from application code.
  • APIs and service-oriented architectures typically work best
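
A "scoring service" kept apart from application code can be as small as one HTTP endpoint that accepts features and returns a score. The sketch below uses only Python's standard-library `wsgiref`. The `predict` function, its weights, the feature names, and the port are placeholders invented for illustration, not the speaker's implementation.

```python
import json
from wsgiref.simple_server import make_server


def predict(features):
    """Placeholder model: a hypothetical weighted sum clipped to the range [0, 1]."""
    weights = {"amount": 0.002, "num_items": 0.1}  # made-up weights
    score = sum(weights.get(name, 0.0) * float(value) for name, value in features.items())
    return min(max(score, 0.0), 1.0)


def scoring_app(environ, start_response):
    """WSGI app: POST a JSON object of features, receive a JSON score."""
    headers = [("Content-Type", "application/json")]
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", headers)
        return [b'{"error": "POST required"}']
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        features = json.loads(environ["wsgi.input"].read(length) or b"{}")
        body = json.dumps({"score": predict(features)}).encode("utf-8")
        status = "200 OK"
    except (ValueError, TypeError, AttributeError):
        # Malformed JSON, a non-object payload, or non-numeric feature values.
        body = b'{"error": "invalid features"}'
        status = "400 Bad Request"
    start_response(status, headers)
    return [body]


def serve(port=8000):
    """Run the service locally; the application only needs to know this URL."""
    with make_server("127.0.0.1", port, scoring_app) as server:
        server.serve_forever()
```

The application talks to the service only through the JSON contract, so the model behind `predict` can be retrained or replaced without redeploying the application. That separation is what keeps iteration cycles short.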

Don’t
• Focus on the “how”, i.e. cover every trial and tribulation
Do
• Cut to the chase
  • After a presentation, I always ask the class two questions:
    • What is one sentence that describes what the speaker learned?
    • Why do I care?

TALENT
• Early Access to Students
• Candidate Matching
• Curriculum Development
• Corporate Student Sponsorship
• Diversity

ACCESS
• Membership
• Organic Relationships
• Course Content
• Mentorship
• Community
• Events

EXPERTISE
• Galvanize Experts
• Capstone Projects
• Internship
• Corporate Training

THANK YOU
RYAN ORBAN | EVP, STRATEGY
ryan.orban@galvanize.com
@ryanorban
www.galvanize.com

@datapopup
#datapopupaustin
