Thinkful DC - Intro to Data Science


  1. bit.ly/data-science-dc • network: 1875ConfRoom • password: vornado1875
  2. March 2017 • Intro to Data Science
  3. Me • TJ Stalcup • Lead DC Mentor @ Thinkful • API Evangelist @ WealthEngine • GitHub: tjstalcup • Twitter: @tjstalcup
  4. You • I already have a career in data • I’m serious about switching into a career in data • I’m curious about switching into a career in data • I just want to see what all the fuss is about
  5. Today’s Goals • What is a data scientist and what do they do? • How and why has the field emerged? • How can one become a data scientist?
  6. Why do we care? • “The United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings.” - @McKinsey
  7. Why do we care? • Also… average salaries are $115,000 a year
  8. Nate Silver, FiveThirtyEight.com • “I think data-scientist is a sexed up term for a statistician”
  9. Example: LinkedIn 2006 • “[LinkedIn] was like arriving at a conference reception and realizing you don’t know anyone. So you just stand in the corner sipping your drink, and you probably leave early.” - LinkedIn Manager, June 2006
  10. Enter: Data Scientist (Jonathan Goldman) • Joined LinkedIn in 2006, only 8M users (450M in 2016) • Started experiments to predict people’s networks • Engineers were dismissive: “you can already import your address book”
  11. The Result
  12. Other Examples • Uber: where drivers should hang out • Netflix: $1M movie recommendations contest • Ebola: mobile mapping in Senegal to fight disease
  13. Big Data: datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze
  14. Big Data - History • Trend “started” in 2005 (Hadoop!) • Web 2.0: majority of content is created by users • Mobile accelerates this: data per person skyrockets
  15. Hadoop? • HDFS • MapReduce
  16. Hadoop Distributed File System • File is too big… Distribute! • Too many files… Distribute! • Yahoo has over 10,000 servers running Hadoop
  17. MapReduce • Data + Processing Software • Distributed Processing • Map all of the data, reduce it
  18. MapReduce
  19. Big Data • 90% of the data in the world today has been created in the last two years alone - IBM, May 2013
  20. Big Data
  21. Data Scientists - We Can Be Heroes
  22. Data Scientists - Jack of all Trades
  23. The Process • Frame the question • Collect the raw data • Process the data • Explore the data • Communicate results
  24. Case: Frame the Question • What questions do we want to answer?
  25. Case: Frame the Question • What connections (type and number) lead to higher user engagement? • Which connections do people want to make but are currently limited from making? • How might we predict these types of connections with limited data from the user?
  26. Case: Collect the Data • What data do we need to answer these questions?
  27. Case: Collect the Data • Connection data (who is connected to whom?) • Demographic data (what is the profile of each connection?) • Retention data (how do people stay or leave?) • Engagement data (how do they use the site?)
  28. Case: Process the Data • How is the data “dirty” and how can we clean it?
  29. Case: Process the Data • User input - 80/20 • Redundancies - 2 emails • Feature changes • Data model changes
  30. Case: Explore the Data • What are the meaningful patterns in the data?
  31. Case: Explore the Data • Triangle closing • Time overlaps • Geographic clustering
  32. Case: Communicate Findings • How do we communicate this? To whom?
  33. Case: Communicate Findings • Tell the story at the right technical level for each audience • Make sure to focus on What’s In It For You (WIIFY!) • Be objective, don’t lie with statistics • Be visual! Show, don’t just tell
  34. Tools • SQL Queries • Business Analytics Software • Machine Learning Algorithms
  35. #1: SQL Queries • SQL is the standard querying language to access and manipulate databases
  36. #1: SQL Queries
      Table: friends
      id | full_name     | age
      1  | Dan Friedman  | 24
      2  | Tyler Brewer  | 27
      3  | David Coulter | 22
      4  | TJ Stalcup    | 33
      Query: SELECT full_name FROM friends WHERE age > 22
  37. #2: Visualization Software • Business analytics software for your database enabling you to easily find and communicate insights visually
  38. #2: Visualization Software
  39. #3: Machine Learning Algorithms • Machine learning algorithms provide computers with the ability to learn without being explicitly programmed: “programming by example”
  40. Iris Data Set
  41. Iris Data Set
  42. Use Cases for Machine Learning • Classification: predict categories • Regression: predict values • Anomaly detection: find unusual occurrences • Clustering: discover structure
  43. It’s not easy but someone has to do it
  44. That someone might be you • Knowledge of statistics, algorithms, & software • Comfort with languages & tools (Python, SQL, Tableau) • Inquisitiveness and intellectual curiosity • Strong communication skills • It’s all teachable!
  45. Data Science Bootcamp • Syllabus: Python Toolkit, Statistics & Probability, Experimentation, Machine Learning, Communicating Data, Algorithms and Big Data
  46. or Web Development Bootcamp • Syllabus: Beginner and Intermediate Frontend Development, Backend Development, CS Fundamentals, Product Engineering
  47. What is Thinkful? • Online skills bootcamp with 1-on-1 mentorship: learn anytime & anywhere & get a job, guaranteed. • Anyone who’s committed can learn to code.
  48. 1-on-1 Mentorship is the best way to learn
  49. Our Results: Job Guarantee • Bhaumik • Liz
  50. Special Prep Course Offer • Three-week program, includes six mentor sessions • Covers Python programming, Data Science Toolkit, Stats Refresher • Option to continue into data science bootcamp • Prep course costs $500 (can apply to cost of full bootcamp) • Talk to us about special 50% discount (available until the end of the week).
  51. Thanks! • tj@thinkful.com
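
As a rough illustration of the "map all of the data, reduce it" idea from slide 17, here is a minimal single-process sketch of a word count in Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample text are assumptions made up for this example, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(chunks):
    """Run map over every chunk, group by word, then reduce each group."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


# Each string stands in for a block of a file stored on a different server.
counts = word_count(["big data big", "data science"])
print(counts)  # {'big': 2, 'data': 2, 'science': 1}
```

Because each chunk is mapped independently and each key is reduced independently, both steps could in principle be spread across servers, which is the point the slide makes.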
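
Slide 31 lists "triangle closing" as a pattern in the connection data. One common reading of that term is: if two people share a mutual connection but are not yet connected, suggest that they connect. The sketch below ranks such candidates by the number of mutual connections. The function name `suggest_connections`, the ranking rule, and the five-person graph are all hypothetical, chosen only to illustrate the idea; the deck does not describe how LinkedIn actually implemented it.

```python
from collections import Counter


def suggest_connections(graph, user):
    """Return (candidate, mutual_count) pairs for people two hops from `user`.

    `graph` maps each person to the set of people they are connected to.
    Candidates are sorted by mutual connections (descending), then by name.
    """
    direct = graph.get(user, set())
    mutual = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the user themself and anyone already connected.
            if candidate != user and candidate not in direct:
                mutual[candidate] += 1
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical network: ana knows ben and cam; both of them know dee.
graph = {
    "ana": {"ben", "cam"},
    "ben": {"ana", "dee", "eli"},
    "cam": {"ana", "dee"},
    "dee": {"ben", "cam"},
    "eli": {"ben"},
}
print(suggest_connections(graph, "ana"))  # [('dee', 2), ('eli', 1)]
```

Here dee ranks first because connecting ana and dee would close two open triangles (through ben and through cam), while eli would close only one.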
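
The query on slide 36 can be tried without installing a database server. The sketch below loads the slide's `friends` table into an in-memory SQLite database using Python's standard `sqlite3` module and runs the same `SELECT`. The helper name `older_than`, the column types, and the `ORDER BY id` clause are assumptions added here so the example is self-contained and its output order is predictable.

```python
import sqlite3

# Rows copied from the friends table on slide 36: (id, full_name, age).
FRIENDS = [
    (1, "Dan Friedman", 24),
    (2, "Tyler Brewer", 27),
    (3, "David Coulter", 22),
    (4, "TJ Stalcup", 33),
]


def older_than(min_age):
    """Return the full_name of every friend whose age is strictly above min_age."""
    conn = sqlite3.connect(":memory:")  # throwaway database that lives only in RAM
    conn.execute(
        "CREATE TABLE friends (id INTEGER PRIMARY KEY, full_name TEXT, age INTEGER)"
    )
    conn.executemany("INSERT INTO friends VALUES (?, ?, ?)", FRIENDS)
    rows = conn.execute(
        "SELECT full_name FROM friends WHERE age > ? ORDER BY id", (min_age,)
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]


print(older_than(22))  # ['Dan Friedman', 'Tyler Brewer', 'TJ Stalcup']
```

David Coulter is excluded because the condition is `age > 22`, not `age >= 22`. Passing the threshold as a `?` parameter rather than formatting it into the SQL string is the usual way to avoid injection problems.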