Python for Data Science - Python Brasil 11 (2015)

This talk demonstrates a complete data science process, covering obtaining, scrubbing, exploring, modeling and interpreting data with tools from the Python ecosystem such as IPython Notebook, Pandas, Matplotlib, NumPy, SciPy and Scikit-learn.

Published in: Data & Analytics

  1. PYTHON FOR DATA SCIENCE. Gabriel Moreira, Machine Learning Engineer, @gspmoreira. PythonBrasil 2015
  2. Why so much buzz?
  3. https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century
  4. WHAT IS DATA SCIENCE http://drewconway.com
  5. TYPES OF ANALYTICS: Investigative Analytics (consumers: humans) and Operational Analytics (consumers: machines) http://blog.cloudera.com/blog/2014/03/why-apache-spark-is-a-crossover-hit-for-data-scientists/ https://hbr.org/2014/08/the-question-to-ask-before-hiring-a-data-scientist/
  6. DATA SCIENCE IS IOSEMN: Inquire, Obtain, Scrub, Explore, Model, iNterpret [Hilary Mason, Data Scientist]
  7. PYTHON IS IOSEMN: Inquire, Obtain, Scrub, Explore, Model, iNterpret js Outsider
  8. ANALYTICS CASE: CORPORATE SOCIAL NETWORKS
  9. Full data analysis demo available as an IPython Notebook: bit.ly/python4ds_nb
  10. Investigative Analytics. Consumers: Humans
  11. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  12. INQUIRE
      1. Which communities are more popular?
      2. Is the user engagement increasing?
      3. What is the distribution of user interactions?
      4. Is there a relationship between publishing hour and number of interactions?
  13. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  14. OBTAIN
      • Download data from another location (e.g., a web page or server)
      • Query data from a database (e.g., MySQL or Oracle)
      • Extract data from an API (e.g., Twitter, Facebook)
      • Extract data from another file (e.g., an HTML file or spreadsheet)
      • Generate data yourself (e.g., reading sensors or taking surveys)
  15. READING INTERACTIONS FROM CSV
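The code shown on this slide is not part of the transcript. As a minimal sketch of the step, assuming a hypothetical interactions file with `post_id`, `user_id` and `type` columns (illustrative names, not taken from the talk's dataset), the standard library can load CSV rows into dictionaries. With Pandas, `pandas.read_csv` is the usual one-call equivalent.

```python
import csv
import io

# Hypothetical sample standing in for an interactions CSV file;
# the column names are assumptions made for illustration only.
SAMPLE_CSV = """post_id,user_id,type
p1,u1,like
p1,u2,comment
p2,u1,like
"""


def read_interactions(handle):
    """Return every CSV row as a dict keyed by the header names."""
    return list(csv.DictReader(handle))


# io.StringIO lets the sketch run without touching the file system;
# a real file would be passed as open(path, newline="").
interactions = read_interactions(io.StringIO(SAMPLE_CSV))
print(len(interactions), interactions[0]["type"])
```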
  16. READING POSTS FROM JSON LINES
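JSON Lines stores one JSON object per line, so a file can be parsed line by line. The sketch below assumes hypothetical post fields (`id`, `community`, `text`) chosen only for illustration; with Pandas, `pandas.read_json(path, lines=True)` reads the same format.

```python
import io
import json

# Hypothetical sample: one JSON object per line, with a blank line
# thrown in to show that empty lines are skipped.
SAMPLE_JSONL = "\n".join([
    '{"id": "p1", "community": "python", "text": "Hello"}',
    "",
    '{"id": "p2", "community": "agile", "text": "Hi"}',
])


def read_json_lines(handle):
    """Parse each non-empty line of the handle as a JSON object."""
    records = []
    for line in handle:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


posts = read_json_lines(io.StringIO(SAMPLE_JSONL))
print([post["id"] for post in posts])
```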
  17. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  18. SCRUB
  19. SCRUB
  20. SCRUB
  21. SCRUB: Dealing with nulls
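The transcript does not say how nulls were handled in the talk. One common approach, sketched here under assumed field names and an assumed policy, is to drop records that lack an identifier and fill the remaining gaps with defaults. In Pandas, `DataFrame.dropna` and `DataFrame.fillna` cover the same two operations.

```python
# Hypothetical records; None marks a missing value.
raw_posts = [
    {"id": "p1", "community": "python", "likes": 3},
    {"id": "p2", "community": None, "likes": None},
    {"id": None, "community": "agile", "likes": 1},
]


def scrub(records):
    """Drop records without an id and fill the other missing fields."""
    cleaned = []
    for record in records:
        if record["id"] is None:
            continue  # a post without an id cannot be joined to anything
        fixed = dict(record)  # copy so the raw data stays untouched
        if fixed["community"] is None:
            fixed["community"] = "unknown"
        if fixed["likes"] is None:
            fixed["likes"] = 0
        cleaned.append(fixed)
    return cleaned


clean_posts = scrub(raw_posts)
print(clean_posts)
```

Whether a missing count should become 0 or stay missing depends on the question being asked, so the defaults above are a choice, not a rule.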
  22. SCRUB
  23. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  24. 1 - WHICH COMMUNITIES ARE MORE POPULAR?
  25. 1 - WHICH COMMUNITIES ARE MORE POPULAR?
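One way to answer this question, assuming popularity is measured as the number of interactions per community (the transcript does not give the talk's exact metric), is a simple frequency count. The data below is invented for illustration.

```python
from collections import Counter

# Hypothetical (community, interaction type) pairs.
community_interactions = [
    ("python", "like"),
    ("java", "comment"),
    ("python", "comment"),
    ("agile", "like"),
    ("java", "like"),
    ("python", "like"),
]


def community_popularity(rows):
    """Count interactions per community, most popular first."""
    counts = Counter(community for community, _ in rows)
    return counts.most_common()


ranking = community_popularity(community_interactions)
print(ranking)
```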
  26. 2 - IS USER ENGAGEMENT INCREASING?
  27. 2 - IS USER ENGAGEMENT INCREASING?
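A trend question like this is usually answered by counting interactions per time period. The sketch below assumes ISO-formatted dates and monthly buckets, both of which are illustrative choices; in Pandas, `DataFrame.resample` on a datetime index serves the same purpose.

```python
from collections import Counter

# Hypothetical interaction dates in ISO format (YYYY-MM-DD).
interaction_dates = [
    "2015-01-10", "2015-01-25",
    "2015-02-03", "2015-02-14", "2015-02-27",
    "2015-03-01",
]


def interactions_per_month(dates):
    """Count interactions per calendar month, in chronological order."""
    counts = Counter(date[:7] for date in dates)  # "YYYY-MM" prefix
    return sorted(counts.items())


monthly = interactions_per_month(interaction_dates)
print(monthly)
```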
  28. 3 - WHAT IS THE DISTRIBUTION OF USER INTERACTIONS?
  29. 3 - WHAT IS THE DISTRIBUTION OF USER INTERACTIONS?
  30. 3 - WHAT IS THE DISTRIBUTION OF USER INTERACTIONS?
  31. 4 - RELATIONSHIP BETWEEN PUBLISHING TIME AND NUMBER OF INTERACTIONS?
  32. 4 - RELATIONSHIP BETWEEN PUBLISHING TIME AND NUMBER OF INTERACTIONS?
  33. 4 - RELATIONSHIP BETWEEN PUBLISHING TIME AND NUMBER OF INTERACTIONS?
  34. 4 - RELATIONSHIP BETWEEN PUBLISHING TIME AND NUMBER OF INTERACTIONS? http://viverdeblog.com/melhoresahorarios-para-postar-nas-redes-sociais/
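A first look at this relationship can come from averaging the number of interactions per publishing hour. This is a sketch on invented data, assuming each post carries an ISO timestamp and an interaction count; it is not the analysis shown in the talk.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (publishing timestamp, number of interactions) pairs.
post_stats = [
    ("2015-03-02T09:15:00", 12),
    ("2015-03-02T09:50:00", 8),
    ("2015-03-03T14:05:00", 3),
    ("2015-03-04T22:30:00", 1),
]


def mean_interactions_by_hour(rows):
    """Average the interaction count of posts published in each hour."""
    totals = defaultdict(lambda: [0, 0])  # hour -> [sum, post count]
    for timestamp, interactions in rows:
        hour = datetime.fromisoformat(timestamp).hour
        totals[hour][0] += interactions
        totals[hour][1] += 1
    return {hour: s / n for hour, (s, n) in sorted(totals.items())}


by_hour = mean_interactions_by_hour(post_stats)
print(by_hour)
```

An average per hour only suggests a pattern; hours with few posts can produce misleading values, so the post count per bucket is worth inspecting as well.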
  35. Operational Analytics. Consumers: Machines
  36. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  37. Operational Analytics task example: Find Related Posts
      1. Discover the most relevant words in the posts
      2. Find related posts with similar content
  38. 1 - RELEVANT WORDS IN A POST. TF-IDF: the more "relevant" terms in a document are those that are frequent in that document and rare in other documents
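The definition above can be made concrete with a small computation. This sketch uses the plain textbook weighting, tf(t, d) × log(N / df(t)), on invented documents; it is an illustration of the idea rather than the talk's code, and Scikit-learn's `TfidfVectorizer` applies a smoothed variant of this formula, so its numbers will differ.

```python
import math
from collections import Counter

# Hypothetical tokenized posts.
documents = [
    "jmeter tests in ruby".split(),
    "performance tests with jmeter".split(),
    "agile planning tips".split(),
]


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term / number of terms in the document
    idf = log(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


weights = tf_idf(documents)
# "ruby" appears in one document only, "jmeter" in two, so "ruby"
# receives the higher weight within the first document.
print(sorted(weights[0].items(), key=lambda item: -item[1]))
```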
  39. 1 - RELEVANT WORDS IN A POST
  40. 1 - RELEVANT WORDS IN A POST
  41. 1 - RELEVANT WORDS IN A POST
  42. BONUS - GLOBAL RELEVANT TERMS [ALL POSTS]
  43. 2 - SIMILAR POSTS. Cosine Similarity: a measure of similarity between two vectors, defined as the cosine of the angle between them.
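Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below represents each post as a sparse `{term: weight}` dictionary, which is an assumption made for illustration; the sample weights are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical term weights for two posts sharing one term.
post_a = {"jmeter": 1.0, "ruby": 1.0}
post_b = {"jmeter": 1.0}
similarity = cosine_similarity(post_a, post_b)
print(round(similarity, 4))
```

With non-negative weights such as TF-IDF scores, the result ranges from 0 (no shared terms) to 1 (same direction), independent of document length.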
  44. 2 - SIMILAR POSTS
  45. 2 - SIMILAR POSTS
      Original post: Did you ever wonder how great it would be if you could write your jmeter tests in ruby ?This projects aims to do so. If you use it on your project just let me now. On the Architecture Academy you can read how jmeter can be used to validate your Architecture. modulo 13 arch definition architecture validation | academia de arquitetura
      Most similar post (cosine similarity = 0.30): Foram disponibilizados no site Enterprise Architecture, na parte de Knowledge Base de performance, alguns how-tos relacionados a testes de performance.Entre eles, como definir os requisitos (throughput, cálculo de threads para o JMeter etc.), utilização do JMeter, geração de massa de dados e monitoramento. planning and executing performance testing | enterprise architecture - how to identify performance acceptance criteria | enterprise architecture - how to geracao de massa de dados | enterprise architecture - how to jmeter | enterprise architecture - how to monitoramento | enterprise architecture
  46. SIMILAR PEOPLE!
  47. Inquire, Obtain, Scrub, Explore, Model, iNterpret
  48. INTERPRET
      • Drawing conclusions from your data
      • Evaluating what your results mean
      • Communicating your results
  49. DATA PRODUCTS “If information has context and the context is interactive, insights are not predictable.” [Agile Data Science, O’Reilly, 2014]
  50. SENTIMENT ANALYSIS: Analytical Dashboard bit.ly/eleicoes2014debatesbt
  51. SENTIMENT ANALYSIS: Analytical Dashboard bit.ly/eleicoes2014debatesbt
  52. NETWORK ANALYSIS https://linkedjazz.org/network/ js
  53. What about Python for Big Data?
  54. PYTHON FOR BIG DATA: Streaming, HADOOPY, Pig UDFs in Jython
  55. DATA SCIENCE COURSES
      • Introduction to Data Science (Univ. of Washington)
      • Data Science specialization (Johns Hopkins)
      • Intro to Hadoop and MapReduce (Cloudera)
      • Machine Learning (Stanford)
      • Statistical Learning (Stanford)
      • Mining Massive Datasets (Stanford)
      • Scalable Machine Learning (Berkeley)
      http://workingsweng.com.br/2014/04/cursos-mooc-e-especializacoes-em-data-science/
  56. BOOKS
  57. Happy data geeking!
  58. Thank you! Gabriel Moreira, @gspmoreira, http://about.me/gspmoreira. PYTHON FOR DATA SCIENCE. Slides: http://bit.ly/python4ds_pybr11. PythonBrasil 2015
