Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Jupyter Notebooks: Interactive Visualization Approaches

33 vues

Publié le

Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2VaT3O6.

Chakri Cherukuri talks about how to understand and visualize machine learning models using interactive widgets. He introduces the widget libraries and walks through the code of a simple example to show how to assemble and link these widgets. He looks at models like regressions, clustering and finally a wizard for building and training deep learning models with diagnostic plots. Filmed at qconsf.com.

Chakri Cherukuri is a senior researcher in the Quantitative Financial Research group at Bloomberg. His research interests include quantitative portfolio management, algorithmic trading strategies and applied machine learning. Previously, he built analytical tools for the trading desks at Goldman Sachs and Lehman Brothers. He has extensive experience in numerical computing and software development.

Publié dans : Technologie
  • Soyez le premier à commenter

  • Soyez le premier à aimer ceci

Jupyter Notebooks: Interactive Visualization Approaches

  1. 1. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. QCon 2018 San Francisco November 07, 2018 Chakri Cherukuri Senior Researcher Quantitative Financial Research Group Jupyter Notebooks: Interactive Visualization Approaches
  2. 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ ml-models-jupyter
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  4. 4. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Outline • Introduction • Overview of interactive widgets • Example with code walk-through — Normal Distribution • Case studies — Server log dashboard — Twitter sentiment analysis — Tools for deep learning • Q & A
  5. 5. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Interactive widgets Visual Representation • Graphics, Interaction • JavaScript implementation hidden from the user Visual RepresentationPython Object • Interface the user sees • Its attributes (traits) can send events when updated IntSlider(description=‘slider, value=50)
  6. 6. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Interactive widgets • ipywidgets: core UI controls (text boxes, sliders, button etc.) • bqplot: 2D plotting widgets (built on top of the ipywidgets framework) • Key takeaways: — ipywidgets and bqplot can be combined to build interactive visualizations — Attributes of widgets (traits) can be linked using callbacks
  7. 7. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Server Logs Dashboard
  8. 8. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Server Logs • Information we can glean: — Timestamp of requests/events — IP address — HTTP status codes — Agent — URL • URLs can be parsed to obtain search parameters: — Product IDs — Category IDS
  9. 9. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Dashboard Components • Time series plots of events — Trends — Seasonality — Outliers • Daily/hourly aggregation of events • Plots of events broken down by — Products — Categories — Status codes
  10. 10. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Server log dashboard
  11. 11. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Performance/Scaling • Data ingestion from streaming sources like Kafka • Notebook servers running on spark/hadoop clusters • Streaming data frames • dask dataframes • pandas with chunking
  12. 12. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Twitter Sentiment Analysis
  13. 13. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. News/Twitter Sentiment • Social sentiment from the raw news story or tweet. Data is: — Unstructured — Highly time-sensitive • Story-level sentiment • Company-level sentiment • Sentiment score can be used as a trading signal
  14. 14. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Twitter Sentiment Classification Problem statement: Predict the sentiment (negative, neutral, positive) of a tweet for a company Ex: “$CTIC Rated strong buy by three WS analysts. Increased target from $5 to $8.” = Positive Three way classification problem — Input: raw tweets — Output: sentiment label ∑ {negative, neutral, positive}
  15. 15. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Methodology • We are given labeled training and test data sets • Train classifier on training data set • Predict labels on test data and evaluate performance
  16. 16. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. One vs. rest Logistic Regression • Train three binary classifiers for each label — Model 1: Negative vs. Not Negative — Model 2: Positive vs. Not Positive — Model 3: Neutral vs. Not Neutral • Get probabilities (measures of confidence) for each label • Output the label associated with the highest probability
  17. 17. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Classifier performance analysis • Look at misclassifications — Confusion Matrix • Understand model predicted probabilities — Triangle visualization • Fix data issues
  18. 18. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Confusion Matrix K x K matrix where K = number of classes Cell[i, j] = number of samples whose: Actual label = i Predicted label = j Diagonal entries: correct predictions Off diagonal entries: misclassifications
  19. 19. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Triangle Visualization • Model returns 3 probabilities (which sum to 1) • How can we visualize these 3 numbers? — Points inside an equilateral triangle Not sure Very positive Negative / Neutral
  20. 20. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Performance analysis dashboard Use the dashboard to: — Analyze misclassifications (using confusion matrix) — Improve model by adding more features (by looking at model coefficients) — Fix data issues (using triangle and lasso)
  21. 21. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Analyze Misclassifications
  22. 22. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Analyze Misclassifications
  23. 23. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Analyze Misclassifications
  24. 24. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Use Lasso To Find Data Issues
  25. 25. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Use Lasso To Find Data Issues
  26. 26. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Tools for Deep Learning
  27. 27. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Deep Learning Tools Graphical Wizard — Select network parameters — Build network architecture — Training plots • Real time loss/accuracy curves • Distribution of weights/biases/activations — Diagnostic plots • Residual vs. Predicted Values • Confusion Matrix
  28. 28. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Network Parameters
  29. 29. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Network Architecture
  30. 30. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Loss/Accuracy Curves
  31. 31. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Distributions of Weights/Biases/Activations
  32. 32. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Diagnostic Plots
  33. 33. © 2018 Bloomberg Finance L.P. All rights reserved. Resources • Widget libraries used to build the applications: ipywidgets: https://github.com/jupyter-widgets/ipywidgets bqplot: https://github.com/bloomberg/bqplot • Machine Learning libraries scikit-learn: https://http://scikit-learn.org tensorflow: https://www.tensorflow.org keras: https://keras.io • Link to the notebooks/code: https://github.com/ChakriCherukuri/qcon_2018 • Tech At Bloomberg blog: www.TechAtBloomberg.com
  34. 34. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ ml-models-jupyter

×