
VSSML17 Review. Summary Day 2 Sessions

Valencian Summer School in Machine Learning 2017 - Day 2
Lecture Review: Summary Day 2 Sessions. By Mercè Martín Prats (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017


  1. Class summary
  2. Day 2 – Morning sessions
  3. Basic transformations (Poul Petersen). Expectation: any data is always ML-ready. Reality: ML-ready data needs work! What does ML-ready mean?
     ● Machine Learning algorithms consume instances of the question that you want to model: each row must describe one of the instances and each column a property of the instance.
     ● Fields can be: already present in your data, derived from your data, or generated using other fields.
  4. Basic transformations: ML-ready steps
     ● Define your goal and select the right model for the problem you want to solve: classification, regression, cluster analysis, anomaly detection, association discovery.
     ● Perform cleansing, denormalizing, aggregating, pivoting, and other data-wrangling tasks to generate a collection of instances relevant to the problem at hand, and write the result in a widely supported output format such as CSV.
     ● Choose the right format to store each type of feature in a field.
     ● Feature engineering: using domain knowledge and Machine Learning expertise, generate explicit features that help to better represent the instances (with Flatline; see the sketch below).
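     A minimal sketch of that last step, assuming hypothetical "sales.csv" data with "price" and "sqft" fields: a Flatline expression derives a new field through the BigML Python bindings.

         # Deriving a Flatline feature with the BigML Python bindings.
         # "sales.csv", "price" and "sqft" are hypothetical.
         from bigml.api import BigML

         api = BigML()  # credentials are read from the environment

         source = api.create_source("sales.csv")
         api.ok(source)                      # wait until the source is ready
         dataset = api.create_dataset(source)
         api.ok(dataset)

         # Flatline s-expression computing a new feature from existing ones
         extended = api.create_dataset(dataset, {
             "new_fields": [{"name": "price_per_sqft",
                             "field": "(/ (f \"price\") (f \"sqft\"))"}]})
         api.ok(extended)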
  5. Basic transformations: preprocessing data
     ● Cleansing: homogenize missing values and different types in the same feature, fix input errors, correct semantic issues, etc.
     ● Denormalizing: data is usually normalized in relational databases; ML-ready datasets need the information denormalized into a single file/dataset.
     ● Aggregation: when data is stored as individual transactions, as in log files, an aggregation may be needed to reconstruct the entity.
     ● Pivoting: different values of a feature are pivoted to new columns in the resulting dataset.
     ● Regular time windows: create new features using values over different periods of time.
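     A minimal pandas sketch of the aggregation and pivoting steps, assuming a hypothetical transaction log with "user", "action" and "amount" columns:

         # Aggregating and pivoting a transaction log into one row per entity.
         # The file and column names are hypothetical.
         import pandas as pd

         logs = pd.read_csv("transactions.csv")  # one row per transaction

         # Aggregation: one row per entity (here, per user)
         per_user = logs.groupby("user").agg(total=("amount", "sum"),
                                             n_events=("amount", "count"))

         # Pivoting: one new column per distinct value of "action"
         pivot = logs.pivot_table(index="user", columns="action",
                                  values="amount", aggfunc="sum", fill_value=0)

         ml_ready = per_user.join(pivot).reset_index()
         ml_ready.to_csv("ml_ready.csv", index=False)  # CSV as output format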
  6. Basic transformations: preparation tasks
     ● Define a clear idea of the goal.
     ● Understand what ML tasks will achieve the goal.
     ● Understand the data structure needed to perform those ML tasks.
     ● Find out what kind of data you have and make it ML-ready: where is it and how is it stored? What are the features? Can you access it programmatically?
     ● Feature engineering: transform the data you have into the data you actually need.
     ● Evaluate: try it on a small scale.
     ● Accept that you might have to start over…
     ● But when it works, automate it!
  7. Feature engineering: adding domain knowledge to your data by creating new predicates from the existing features to help the ML algorithms. What do ML algorithms know about your fields?
     ● Numeric: sequences of numbers (no idea about odd/even, prime, etc.)
     ● Date-time: a timestamp (no idea about weekends, special holidays or seasons)
     ● Categorical: an enumeration of values (no relations between them)
     ● Text/Items: terms (no relations between them)
     Features can be useless to the algorithm if they are not correlated with the objective to be predicted, or if their values change their meaning when combined with other features. For ML algorithms to work there must be some kind of statistical relation between some of the features and the objective; sometimes you must transform the available features to find such relations.
  8. Feature engineering: when do you need it?
     ● When the relationship between the feature and the objective is mathematically unsatisfying.
     ● When the relationship of a function of two or more features with the objective is far more relevant than that of the original features.
     ● When there is missing data.
     ● When the data is a time series, especially when the previous time period’s objective is known.
     ● When the data can’t be used for machine learning in the obvious way (e.g., timestamps, text data).
  9. Feature engineering transformations
     For numeric features (some are sketched below):
     – Discretization: percentiles, within percentiles, groups
     – Replacement of missings
     – Normalization
     – Exponentiation, logarithms, etc.
     – Casting to categorical, integer or real
     – Statistics
     – Shocks (speed of change compared to the standard deviation)
     For text features:
     – Misspellings
     – Length
     – Number of subordinate sentences
     – Language
     – Levenshtein distance
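     A minimal numpy sketch of some of the numeric transformations above; the values are made up:

         # Normalization, logarithms, discretization and shocks in numpy.
         import numpy as np

         x = np.array([2.0, 15.0, 7.0, 40.0, 11.0])

         normalized = (x - x.mean()) / x.std()    # normalization
         logged = np.log(x)                       # logarithm
         # Discretization into quartile groups (bin edges at the percentiles)
         quartile = np.digitize(x, np.percentile(x, [25, 50, 75]))
         # Shock: speed of change compared to the standard deviation
         shock = np.diff(x) / x.std()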
  10. Feature engineering: date-time and text features
     Date-time features:
     ● Cannot be used “as is” in a model; a timestamp is really a collection of features. BigML is able to decompose them automatically when they are provided in the most usual formats; with Flatline, you can decompose them all.
     ● Date-time predicates that the computer does not know (some of them domain-dependent): working hours? Daylight? Rush hour? (See the sketch below.)
     Text features:
     ● Bag of words: a new feature is associated with each word in the document (built-in in BigML).
     ● Tokenization: how do we select tokens? Do we want n-grams? What about numbers?
     ● Stemming: grouping forms of the same word into a unique term.
     ● Length.
     ● Text predicates: dollar amounts? Dates? Salutations? Please and thank you?
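     A minimal sketch of date-time decomposition in plain Python; BigML performs the standard decomposition automatically, while domain-dependent predicates such as the rush-hour one below are up to you:

         # Decomposing a timestamp into model-friendly predicates.
         from datetime import datetime

         ts = datetime.fromisoformat("2017-09-14 08:45:00")
         features = {
             "year": ts.year,
             "month": ts.month,
             "weekday": ts.weekday(),                    # 0 = Monday
             "is_weekend": ts.weekday() >= 5,
             "is_rush_hour": ts.hour in (8, 9, 17, 18),  # domain-dependent
         }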
  11. Feature engineering: time-series transformations (sketched below)
     ● A better objective (percent change instead of absolute values)
     ● Deltas from previous reference time points
     ● Deltas from a moving average (time windows)
     ● Recent volatility…
     Problem: an exponential explosion of possible transformations.
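     A minimal sketch of these transformations on a made-up series:

         # Percent change, deltas and deltas from a moving average.
         prices = [100.0, 102.0, 101.5, 105.0, 104.0]

         pct_change = [(b - a) / a for a, b in zip(prices, prices[1:])]
         deltas = [b - a for a, b in zip(prices, prices[1:])]

         window = 3  # deltas from a moving average over a time window
         moving_avg = [sum(prices[i - window:i]) / window
                       for i in range(window, len(prices) + 1)]
         delta_from_ma = [p - m
                          for p, m in zip(prices[window - 1:], moving_avg)]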
  12. Ensembles and logistic regressions: Logistic Regression is a classification ML algorithm. How come we use a regression to classify?
     ● Regressions are typically used to relate two numeric variables.
     ● But using the proper function we can relate discrete variables too.
  13. Ensembles and logistic regressions. Assumption: the output is linearly related to the predictors.
     ● We should use feature engineering to transform raw features into linearly related predictors, if needed.
     ● The ML algorithm searches for the coefficients that solve the problem by transforming it into a linear regression problem. In general, the algorithm will find a coefficient per feature plus a bias coefficient and a missing coefficient (see the sketch below).
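     A minimal sketch of the underlying idea: a linear combination of the predictors is squashed by the logistic function into a class probability. The coefficients are made up:

         # The logistic function applied to a linear combination of predictors.
         import math

         def logistic(x, coefficients, bias):
             z = bias + sum(c * xi for c, xi in zip(coefficients, x))
             return 1.0 / (1.0 + math.exp(-z))  # P(class = positive)

         p = logistic([1.2, 0.7], coefficients=[0.8, -1.5], bias=0.3)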
  14. Ensembles and logistic regressions: configuration parameters
     ● Default numeric: replaces missing numeric values.
     ● Missing numeric: adds a field for missing numerics.
     ● Bias: allows an intercept term. Important if P(x=0) != 0.
     ● Regularization: L1 prefers zeroing individual coefficients; L2 (the default) prefers pushing all coefficients towards zero.
     ● Strength “C”: higher values reduce regularization.
     ● EPS: the minimum error between steps at which to stop.
     ● Auto-scaling: ensures that all features contribute equally. Recommended, unless there is a specific need not to auto-scale.
  15. Ensembles and logistic regressions: extending the domain for the algorithm
     ● Multi-class LR: each class has its own LR, computed as a binary problem (one-vs-the-rest); a set of coefficients is computed for each class.
     ● Non-numeric predictors: as LR works on numeric predictors, the algorithm needs to encode the non-numeric features to be able to use them. These are the field encodings (see the sketch below):
       – Categorical: one-hot, dummy encoding, contrast encoding
       – Text and items: frequencies of terms
     ● Curvilinear LR: adding quadratic terms as new features.
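     A hedged sketch of setting these parameters and encodings through the BigML Python bindings; the dataset id and the "color" field coding are hypothetical:

         # Configuring a logistic regression via the BigML Python bindings.
         from bigml.api import BigML

         api = BigML()
         logistic = api.create_logistic_regression("dataset/59b0f8c7...", {
             "regularization": "l1",  # prefer zeroing individual coefficients
             "c": 10,                 # higher value => weaker regularization
             "bias": True,            # allow an intercept term
             "eps": 0.001,            # stopping threshold
             # Hypothetical dummy encoding for a categorical field
             "field_codings": [{"field": "color", "coding": "dummy",
                                "dummy_class": "red"}]})
         api.ok(logistic)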
  16. Ensembles and logistic regressions: logistic regressions versus decision trees
     Logistic regression:
     ● Expects a “smooth” linear relationship with the predictors.
     ● Is concerned with the probability of a discrete outcome.
     ● Lots of parameters to get wrong: regularization, scaling, codings.
     ● Slightly less prone to over-fitting.
     ● Because it fits a shape, it might work better when less data is available, provided the data fulfils the expected linear relationship.
     Decision trees:
     ● Adapt well to ragged non-linear relationships.
     ● No concern: classification, regression and multi-class are all fine.
     ● Virtually parameter-free.
     ● Slightly more prone to over-fitting.
     ● Prefer surfaces parallel to the parameter axes, but given enough data will discover any shape.
  17. Time series and deepnets: deepnets are also a classifier (supervised learning); the goal is again predicting a classification. Compared to the other classifiers, deepnets:
     ● Share the massive predictive power of decision trees and ensembles.
     ● Handle smooth, multivariate functions without problems (as in LR).
     ● Can improve on some of their weaknesses.
     But they need massive data to learn every coefficient in a massive parameter space.
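     A minimal sketch of training a deepnet with the Python bindings; the dataset id and the prediction inputs are hypothetical:

         # Training a deepnet and predicting with it via the Python bindings.
         from bigml.api import BigML

         api = BigML()
         deepnet = api.create_deepnet("dataset/59b0f8c7...")
         api.ok(deepnet)
         prediction = api.create_prediction(deepnet,
                                            {"age": 35, "salary": 52000})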
  18. Time series and deepnets: deepnet cons
     ● Low efficiency: the right structure for given data is not easily found, and most structures are bad.
     ● Difficult interpretability: nothing like the interpretability of trees.
     When are they not so useful?
     ● Small data.
     ● Problems that need quick iteration.
     ● Problems that are easy or not so performance-demanding.
  19. Time series and deepnets: time series are supervised learning models able to extrapolate into the future the patterns learnt from past data.
     ● The time series model solves a forecast problem.
     ● The training data must be a temporal sequence of identically distributed data (so the order of the rows is important!).
     ● The goal is predicting numeric properties in the future based on past behaviour.
  20. Time series and deepnets: the resulting family of models uses exponential smoothing to fit the past training data and generate the different components of the solution:
     ● Trend: the slope between two consecutive points in time.
     ● Seasonality: a periodically recurrent pattern of variation.
     ● Error: variations that cannot be described by trend or seasonality.
     Each of these can contribute in an additive or multiplicative way to a particular model.
  21. Time series and deepnets: each additive or multiplicative combination of these components generates a different model. Which is the best? There are some error metrics:
     ● AIC: Akaike Information Criterion
     ● AICc: Corrected Akaike Information Criterion
     ● BIC: Schwarz Bayesian Information Criterion
     ● R-squared
     And finally, the models can be evaluated. Watch out! You need linear train/test splits to maintain the sequence order.
  22. Time series and deepnets: time series outputs
     ● A forecast of one or many numeric features for a user-given horizon using all possible ETS models (see the sketch below).
     ● The error intervals associated with these forecasts.
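     A hedged sketch of training a time series model and requesting a forecast with the Python bindings; the dataset id, objective field id ("000001") and horizon are hypothetical:

         # Training a time series model and forecasting the next 10 points.
         from bigml.api import BigML

         api = BigML()
         time_series = api.create_time_series("dataset/59b0f8c7...")
         api.ok(time_series)
         forecast = api.create_forecast(time_series,
                                        {"000001": {"horizon": 10}})
         api.ok(forecast)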
  23. Day 2 – Evening sessions
  24. REST API, bindings and basic workflows (jao, José Antonio Ortega). What do Machine Learning workflows look like in academia and in the real world? We need high-level tools to face real-world workflows by growing in:
     ● Automation
     ● Abstraction
  25. REST API, bindings and basic workflows: the foundations
     ● REST API: the first level of abstraction, following the standards of software development.
     Client-side tools:
     ● Web UI: sits on top of the REST API. Human-friendly access and visualizations for all the Machine Learning resources. Workflows must be defined and executed step by step. Second level of abstraction.
     ● Bindings: sit on top of the REST API. Fine-grained accessors for the REST API calls. Workflows must be defined and executed step by step (see the sketch below). Second level of abstraction.
     ● BigMLer: relies on the bindings. High-level syntax; entire workflows can be created in a single command line. Third level of abstraction.
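     A minimal sketch of such a step-by-step workflow at the bindings level, assuming a hypothetical data/iris.csv file:

         # Source -> dataset -> model -> prediction with the Python bindings.
         from bigml.api import BigML

         api = BigML()
         source = api.create_source("data/iris.csv")
         api.ok(source)
         dataset = api.create_dataset(source)
         api.ok(dataset)
         model = api.create_model(dataset)
         api.ok(model)
         prediction = api.create_prediction(model, {"petal length": 4.2})
         api.ok(prediction)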
  26. REST API, bindings and basic workflows: BigMLer automation
     ● Basic 1-click workflows in one command line.
     ● Rich parameterized workflows: feature selection, cross-validation, etc.
     ● Models are downloaded once to your laptop, tablet, cell phone, etc. and can be used offline to create predictions: great for local predictions.
     Still…
  27. REST API, bindings and basic workflows: problems of client-side solutions
     ● Complexity: lots of details outside the problem domain.
     ● Reuse: no inter-language compatibility.
     ● Scalability: client-side workflows are hard to optimize.
     ● Extensibility: BigMLer hides complexity at the cost of flexibility.
     ● Not enough abstraction.
  28. REST API, bindings and basic workflows. Solution: WhizzML brings automation and abstraction to the server side.
     ● A DSL for ML workflow automation.
     ● A framework for scalable, remote execution of ML workflows:
       – Sophisticated server-side optimization
       – Out-of-the-box scalability
       – Client-server brittleness removed
       – Infrastructure for creating and sharing ML scripts and libraries
  29. REST API, bindings and basic workflows. WhizzML’s new REST API resources:
     ● Scripts: executable code that describes an actual workflow, taking a list of typed inputs and producing a list of outputs.
     ● Executions: given a script and a complete set of inputs, the workflow can be executed and its outputs generated.
     ● Libraries: collections of WhizzML definitions that can be imported by other libraries or scripts.
  30. REST API, bindings and basic workflows: scripts
     Creating scripts:
     ● Usable from any binding (from any language).
     ● Built-in parallelization.
     ● BigML resource management as primitives of the language.
     ● A complete programming language for workflow definition.
     Using scripts: from the Web UI, the bindings, BigMLer, or WhizzML itself (see the sketch below).
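     A hedged sketch of creating and executing a toy WhizzML script through the Python bindings; the workflow, the input/output declarations and the resource ids are illustrative:

         # Creating a WhizzML script and executing it via the Python bindings.
         from bigml.api import BigML

         api = BigML()

         whizzml = """(define dataset
                        (create-and-wait-dataset {"source" source-id}))
                      (define model
                        (create-and-wait-model {"dataset" dataset}))"""
         script = api.create_script(whizzml, {
             "inputs": [{"name": "source-id", "type": "source-id"}],
             "outputs": [{"name": "model", "type": "model-id"}]})
         api.ok(script)

         execution = api.create_execution(script, {
             "inputs": [["source-id", "source/59b0f8c7..."]]})
         api.ok(execution)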
  31. Advanced WhizzML workflows (Charles Parker). WhizzML offers:
     ● Primitives for all ML resources (datasets, models, clusters, etc.).
     ● A complete programming language to compose these ML resources at will.
     ● Built-in parallelization and scalability.
     This empowers the user to benefit from:
     ● Automated feature engineering: best-first feature selection.
     ● Automated configuration choice: randomized parameter optimization, SMACdown.
     ● Complex algorithms as 1-click: stacked generalization, boosting.
     All of them can be shared, reproduced and reused as one more BigML resource in a language-agnostic way.
  32. Advanced WhizzML workflows: best-first feature selection. [Diagram: feature subsets are scored iteratively; when the following iterations don’t improve the score for the model with the selected fields (f5, f7), the process stops.]
  33. Advanced WhizzML workflows: stacked generalization. [Diagram-only slide.]
  34. Advanced WhizzML workflows: randomized parameter optimization. [Diagram: the process stops when the expected performance or the user-given iteration limit is reached.]
  35. Advanced WhizzML workflows. [Diagram-only slide.]
  36. Advanced WhizzML workflows: boosting. [Diagram: models T0…T8 are built iteratively from the errors F0…F8 of their predecessors; the final model is an ensemble of models. See the sketch below.]
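     A hedged sketch of the 1-click flavour of boosting: BigML exposes boosted trees through the ensemble resource, so a single request suffices. The dataset id and iteration count are made up:

         # Building a boosted ensemble via the Python bindings.
         from bigml.api import BigML

         api = BigML()
         boosted = api.create_ensemble("dataset/59b0f8c7...", {
             "boosting": {"iterations": 10}})
         api.ok(boosted)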
  37. Advanced WhizzML workflows: script it once, for everybody, anywhere.
     ● Publish scripts in the gallery.
     ● Add scripts to your menus.
