An Introduction to Data Science on Azure

Azure provides some great functionality for exploring data and building advanced machine learning solutions. This presentation introduces Azure Notebooks and Azure Machine Learning Studio.

  1. Nick Wienholt: An Introduction to Data Science on Azure
  2. Nick Wienholt (@NickWienholt), ex-Microsoft MVP. SSW Solution Architect. Author of a number of books on .NET performance and C#. Loves DevOps, machine learning, and large-scale development.
  3. Table of Contents: Intro to Data Science; Azure Notebooks; Intro to Machine Learning; Azure ML Studio
  4. History of Data Science. 1957: Fortran; 1984: MATLAB; 1987: Excel; 1991: Python; 1993: R; 1996: "data science" coined as a term
  5. History of Data Science (continued). Early 2000s: Google and SaaS show the importance of data crunching; 2006: Hadoop; 2008: pandas (data analysis library for Python) released; 2011: IPython (later Jupyter) notebooks; 2015: Azure Notebook support
  6. About you. Power BI? Python? R? Azure Machine Learning? IBM Watson? AWS? Google? MATLAB? Other?
  7. Steps in a Data Science Project. Step 0: Understand stakeholder motivations. Step 1: Acquiring data. Step 2: Pre-processing and cleaning data. Step 3: Analyzing data. Step 4: Communicating results. Step 5: Turning insights into action (reporting and machine learning).
  8. What is data science? Software engineering meets statistics. Reporting vs. data science: what’s the difference? Involves exploring data in a structured way. Supported by an increasing array of tools.
  9. Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. It is an interdisciplinary subfield of computer science.
  10. Table of Contents: Intro to Data Science; Azure Notebooks; Intro to Machine Learning; Azure ML Studio
  11. Data Analysis Tools: Excel limitations • Rows are limited (1,048,576 per worksheet) • 2D data only • Rich client (mostly) • Lack of advanced algorithms • Hidden calculations!
  12. Greece and Austerity: Reinhart, Rogoff, and the Excel Error That Changed History (https://www.bloomberg.com/news/articles/2013-04-18/faq-reinhart-rogoff-and-the-excel-error-that-changed-history)
  13. New Socialism: Excel in the 21st century (https://www.breakingviews.com/considered-view/piketty-spreadsheets-set-bad-excel-example/)
  14. Jupyter (formerly IPython) notebooks • Open-source web application (http://jupyter.org) • Create and share documents • Live code, equations, visualizations and explanatory text • Data cleaning and transformation, numerical simulation, statistical modelling, machine learning and more • Many languages supported
  15. Jupyter Quick Intro
  16. Azure Jupyter Notebooks • Supports R, Python and F# • Supports Python and R data-processing libraries • Currently in preview • Supports Microsoft’s Cognitive Toolkit (CNTK)
  17. Steps in a Data Science Project (worked example). Step 0: Understand stakeholder motivations. Q: What drives SSW.TV views and likes? Step 1: Acquiring data. Download the analytics CSV from https://analytics.youtube.com and place it at a public URL (Dropbox). Step 2: Pre-processing and cleaning data. Open https://notebooks.azure.com, import the data (from Dropbox), remove noise, and handle missing data (set to zero, remove rows, or infer missing cells).
  18. Steps in a Data Science Project (continued). Step 3: Analyzing data. Graphs and statistical calculations. Step 4: Communicating results. Add annotations and commentary. Step 5: Turning insights into action (reporting and machine learning). N/A.
  19. Correlation
  20. http://www.tylervigen.com/spurious-correlations
  21. Azure Notebook Summary • Great tool for data cleansing, analysis and presentation • Code and commentary inline • Can be cloned by others for modification and confirmation
  22. Table of Contents: Intro to Data Science; Azure Notebooks; Intro to Machine Learning; Azure ML Studio
  23. How many have used machine learning?
  24. Machine Learning History. 1952: first learning program written to play checkers; 1967: “nearest neighbor” developed for image recognition; 1970s: first AI winter; 1979: self-navigating robot developed at Stanford; 1980s: back-propagation rediscovered
  25. Machine Learning History (continued). 1990s: computing power enabled a shift from knowledge-based systems to data-based systems; 1997: Deep Blue beats Garry Kasparov; 2006: Netflix Prize; 2010: Microsoft Kinect, Kaggle; 2011: IBM Watson wins Jeopardy!; 2015: Google’s AlphaGo wins against a professional Go player
  26. What Is Machine Learning? y = mx + c
  27. What Is Machine Learning?
  28. Machine Learning: The Five Uses. Classification: ‘does this patient have a disease?’
  29. Machine Learning: The Five Uses. Regression: ‘what will the temperature be tomorrow?’
  30. Machine Learning: The Five Uses. Anomaly detection: ‘is this credit card transaction fraudulent?’
  31. Machine Learning: The Five Uses. Grouping of data (unsupervised): ‘can these customers be placed in distinct groups?’
  32. Machine Learning: The Five Uses. Recommender systems, e.g. Amazon
  33. https://www.youtube.com/watch?v=DDqrfCmIPxI
  34. Machine Learning: The Five Uses. Classification: ‘does this patient have a disease?’ Regression: ‘what will the temperature be tomorrow?’ Anomaly detection: ‘is this credit card transaction fraudulent?’ Grouping of data (unsupervised): ‘can these customers be placed in distinct groups?’ Recommender systems, e.g. Amazon. Data collection and cleansing. Choosing an algorithm can be hard. Lots of training data is required. Lots of computing power. Risk of overfitting. Not a form of generalised intelligence.
  35. Why Is Machine Learning So Popular? • More demand for data analytics • Lots of data: more data collected, more data available • Lots of compute power • Lots of connectivity
  36. Lifecycle of a Machine Learning Project • Define the business problem • Identify, collect and cleanse the data • Develop the model • Deploy the model • Monitor
  37. Table of Contents: Intro to Data Science; Azure Notebooks; Intro to Machine Learning; Azure ML Studio
  38. Azure Machine Learning • Data Input • Data Collection • Data Splitting • Training • Score Model
  39. Resources • SSW TV: Peter Meyers on Machine Learning • Understanding Machine Learning From Scratch (Coursera) • Predictive Analytics with Microsoft Azure Machine Learning, 2nd Edition • Microsoft Data Science Virtual Machine • Microsoft Professional Program for Data Science
  40. Summary. Machine learning ≈ parameter estimation for a model. Model selection and data prep are hard. Machine learning is becoming easier.
  41. Table of Contents: Intro to Data Science; Azure Notebooks; Intro to Machine Learning; Azure ML Studio
  42. Thank you! info@ssw.com.au | www.ssw.com.au | Sydney | Melbourne | Brisbane | Adelaide
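
Slide 17 names three ways to handle missing data: set to zero, remove rows, or infer the missing cells. The following is a minimal sketch of those three options in plain Python, not the notebook used in the talk. The sample rows, the function names, and the choice of the mean as the inference rule are all assumptions made for illustration; in a Jupyter notebook this would more likely be done with a data-frame library.

```python
from statistics import mean

# Hypothetical sample: (video title, views); None marks a missing cell.
rows = [("Video A", 120), ("Video B", None), ("Video C", 80),
        ("Video D", None), ("Video E", 100)]


def set_missing_to_zero(rows):
    """Option 1: replace each missing value with 0."""
    return [(title, 0 if views is None else views) for title, views in rows]


def remove_missing_rows(rows):
    """Option 2: drop every row that has a missing value."""
    return [(title, views) for title, views in rows if views is not None]


def infer_missing_from_mean(rows):
    """Option 3: infer each missing value as the mean of the known values."""
    known = [views for _, views in rows if views is not None]
    if not known:
        raise ValueError("cannot infer values when every cell is missing")
    average = mean(known)
    return [(title, average if views is None else views) for title, views in rows]


if __name__ == "__main__":
    print(set_missing_to_zero(rows))
    print(remove_missing_rows(rows))
    print(infer_missing_from_mean(rows))
```

Each option distorts the data differently: zeros pull averages down, dropping rows shrinks the sample, and mean inference understates the spread, so the choice depends on the question being asked.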
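
Slides 19 and 20 deal with correlation and the risk of reading too much into it. As an illustration of what a "statistical calculation" in step 3 might look like, here is a small sketch of the Pearson correlation coefficient in plain Python. The function name and the sample series (video length against views) are hypothetical and do not come from the talk's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical data: video length in minutes and number of views.
length_minutes = [2, 4, 6, 8, 10]
views = [150, 130, 90, 70, 40]

if __name__ == "__main__":
    print(round(pearson(length_minutes, views), 3))
```

A coefficient near -1 or 1 only says the two series move together; as the spurious-correlations link on slide 20 suggests, it says nothing about whether one causes the other.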
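
Slide 26 introduces machine learning with y = mx + c, and slide 40 sums it up as parameter estimation for a model. A rough sketch of that idea follows: it estimates m and c by ordinary least squares, using the split, train and score steps listed on slide 38. It is plain Python rather than Azure ML Studio, the data is synthetic (points on the line y = 2x + 1), and all function and variable names are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Estimate slope m and intercept c of y = mx + c by least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    if sum_xx == 0:
        raise ValueError("cannot fit a line when every x is the same")
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sum_xy / sum_xx
    c = mean_y - m * mean_x
    return m, c


def mean_squared_error(m, c, xs, ys):
    """Score the model: average squared gap between prediction and truth."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data lying exactly on y = 2x + 1 (an assumption for illustration).
xs = list(range(10))
ys = [2 * x + 1 for x in xs]

# Data splitting: first 7 points for training, last 3 held back for scoring.
train_x, test_x = xs[:7], xs[7:]
train_y, test_y = ys[:7], ys[7:]

# Training: estimate the parameters from the training set only.
m, c = fit_line(train_x, train_y)

# Scoring: measure the error on data the model has not seen.
test_error = mean_squared_error(m, c, test_x, test_y)

if __name__ == "__main__":
    print(m, c, test_error)
```

Real data would normally be shuffled before splitting and would contain noise, so the held-back error would not be zero; a training error much lower than the held-back error is one sign of the overfitting risk mentioned on slide 34.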

Editor's notes

  • Good day everyone. My name is Danijel Malik and I’m a Microsoft MVP for Visual Studio ALM. I joined the ALM Rangers a few months ago and I do a fair bit of work on Web as Solution Architect at SSW.
