1. How to do Predictive Analytics with Limited Data
Ulrich Rueckert
© 2013 Datameer, Inc. All rights reserved.
3. Predictive Modeling

Traditional Supervised Learning
• Promotion on a bookseller's web page
• Customers can rate books.
• Will a new customer like this book?
• Training set: observations on previous customers
• Test set: new customers
• What happens if only a few customers rate a book?

[Table: training data with attributes Age and Income and the target label LikesBook (+ or -), e.g. Age 22 / Income 67K / LikesBook +. The test data lists new customers whose LikesBook label is still unknown (?); a model learned from the training data predicts these missing labels. A small code sketch of this train/predict loop follows below.]
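To make the supervised setup concrete, here is a minimal sketch (not from the original deck) that fits a classifier on a toy version of the bookseller table and predicts LikesBook for new customers. The feature values and the choice of scikit-learn's LogisticRegression are illustrative assumptions.

```python
# Minimal supervised-learning sketch on a toy bookseller table.
# Feature values and the choice of classifier are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Training data: attributes (Age, Income in K) and target label LikesBook.
X_train = np.array([[22, 67], [39, 41], [24, 60], [65, 80]], dtype=float)
y_train = np.array(["+", "-", "+", "-"])

# Test data: new customers whose labels are unknown.
X_test = np.array([[25, 38], [43, 75]], dtype=float)

model = LogisticRegression().fit(X_train, y_train)
print(model.predict(X_test))   # predicted LikesBook labels for the new customers
```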
6. Supervised Learning

Classification
• For now: each data instance is a point in a 2D coordinate system
• Color denotes the target label
• Model is given as a decision boundary

What's the correct model?
• In theory: no way to tell
• All learning systems have underlying assumptions
• Smoothness assumption: similar instances have similar labels (see the nearest-neighbor sketch below)
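As a concrete illustration of the smoothness assumption, the following sketch (an addition, not from the deck) labels each test point like its closest training point, the simplest model that gives similar instances similar labels.

```python
# Illustrative sketch only: a 1-nearest-neighbor classifier is the simplest
# model built on the smoothness assumption (similar instances, similar labels).
import numpy as np

def predict_1nn(X_train, y_train, X_test):
    preds = []
    for x in X_test:
        distances = np.linalg.norm(X_train - x, axis=1)  # similarity via Euclidean distance
        preds.append(y_train[np.argmin(distances)])      # copy the closest instance's label
    return np.array(preds)

X_train = np.array([[1.0, 1.0], [2.0, 1.5], [8.0, 8.0], [9.0, 7.5]])
y_train = np.array(["red", "red", "blue", "blue"])
print(predict_1nn(X_train, y_train, np.array([[1.5, 1.2], [8.5, 7.8]])))  # -> ['red' 'blue']
```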
11. Semi-Supervised Learning

Can we make use of the unlabeled data?
• In theory: no
• ... but we can make assumptions

Popular Assumptions
• Clustering assumption
• Low density assumption
• Manifold assumption
13. The Clustering Assumption

Clustering
• Partition instances into groups (clusters) of similar instances
• Many different algorithms: k-Means, EM, DBSCAN, etc.
• Available e.g. in Mahout

Clustering Assumption
• The two classification targets are distinct clusters
• Simple semi-supervised learning: cluster, then perform a majority vote (see the sketch below)
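A minimal sketch of "cluster, then majority vote", assuming scikit-learn's KMeans as the clustering algorithm; the toy data and function names are illustrative and not from the deck.

```python
# Hedged sketch: semi-supervised "cluster, then majority vote" under the
# clustering assumption. Uses scikit-learn's KMeans; data is made up.
import numpy as np
from sklearn.cluster import KMeans

def cluster_then_vote(X_labeled, y_labeled, X_unlabeled, n_clusters=2):
    """Cluster all instances, then label each cluster by majority vote
    over the labeled instances that fall into it."""
    X_all = np.vstack([X_labeled, X_unlabeled])
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_all)

    labeled_clusters = km.labels_[: len(X_labeled)]
    unlabeled_clusters = km.labels_[len(X_labeled):]

    # Majority vote of known labels within each cluster.
    cluster_label = {}
    for c in range(n_clusters):
        votes = y_labeled[labeled_clusters == c]
        # Fall back to the global majority if a cluster got no labeled points.
        pool = votes if len(votes) > 0 else y_labeled
        values, counts = np.unique(pool, return_counts=True)
        cluster_label[c] = values[np.argmax(counts)]

    return np.array([cluster_label[c] for c in unlabeled_clusters])

# Toy usage: two labeled customers, several unlabeled ones.
X_lab = np.array([[22, 67], [39, 41]], dtype=float)   # e.g. Age, Income (in K)
y_lab = np.array(["+", "-"])
X_unl = np.array([[24, 60], [65, 80], [25, 38], [43, 75]], dtype=float)
print(cluster_then_vote(X_lab, y_lab, X_unl))
```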
17. Generative Models

Mixture of Gaussians
• Assumption: the data in each cluster is generated by a normal distribution
• Find the most probable location and shape of the clusters given the data

Expectation-Maximization
• Two-step optimization procedure
• Each step is one MapReduce job
• Keeps estimates of cluster assignment probabilities for each instance
• Might converge to a local optimum (an EM sketch follows below)
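A compact, illustrative Expectation-Maximization sketch for a two-component Gaussian mixture, showing the E step (cluster assignment probabilities) and the M step (re-estimating the clusters). A production version, e.g. Mahout's or one E/M pair per MapReduce job as on the slide, would add convergence checks and numerical safeguards.

```python
# Hedged sketch of Expectation-Maximization for a two-component Gaussian
# mixture (spherical covariances for brevity). Illustrative only.
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X[rng.choice(n, size=2, replace=False)]   # initial means: two random points
    var = np.array([X.var(), X.var()])             # initial (spherical) variances
    pi = np.array([0.5, 0.5])                      # mixing weights

    for _ in range(n_iter):
        # E step: responsibilities = cluster assignment probabilities per instance.
        dens = np.column_stack([
            pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=var[k] * np.eye(d))
            for k in range(2)
        ])
        resp = dens / dens.sum(axis=1, keepdims=True)

        # M step: re-estimate weights, means and variances from the responsibilities.
        nk = resp.sum(axis=0)
        pi = nk / n
        mu = (resp.T @ X) / nk[:, None]
        for k in range(2):
            diff = X - mu[k]
            var[k] = (resp[:, k] * (diff ** 2).sum(axis=1)).sum() / (d * nk[k])
    return pi, mu, var, resp
```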
23. Beyond Mixtures of Gaussians

Expectation-Maximization
• Can be adjusted to all kinds of mixture models
• E.g. use Naive Bayes as the mixture model for text classification

Self-Training
• Learn a model on the labeled instances only
• Apply the model to the unlabeled instances
• Learn a new model on all instances
• Repeat until convergence (see the sketch below)
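A minimal self-training sketch. Any base learner can be plugged in; scikit-learn's LogisticRegression is assumed here purely for illustration.

```python
# Hedged sketch of self-training: train on labeled data, pseudo-label the
# unlabeled data, retrain on everything, repeat until the labels settle.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unl, n_rounds=10):
    model = LogisticRegression().fit(X_lab, y_lab)   # learn on labeled instances only
    pseudo = model.predict(X_unl)                    # apply to unlabeled instances
    for _ in range(n_rounds):
        X_all = np.vstack([X_lab, X_unl])
        y_all = np.concatenate([y_lab, pseudo])
        model = LogisticRegression().fit(X_all, y_all)   # learn new model on all instances
        new_pseudo = model.predict(X_unl)
        if np.array_equal(new_pseudo, pseudo):       # converged: pseudo-labels stopped changing
            break
        pseudo = new_pseudo
    return model
```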
24. The Low Density Assumption

Assumption
• The area between the two classes has low density
• Does not assume any specific form of cluster

Support Vector Machine
• Decision boundary is linear
• Maximizes the margin to the closest instances
• Can be learned in one MapReduce step (stochastic gradient descent); a hinge-loss SGD sketch follows below
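A hedged sketch of learning a linear SVM with stochastic gradient descent on the hinge loss (a Pegasos-style update), in plain NumPy with the MapReduce distribution of the pass left out.

```python
# Sketch: linear SVM trained by stochastic gradient descent on the hinge loss.
import numpy as np

def svm_sgd(X, y, lam=0.01, n_epochs=10, seed=0):
    """X: (n, d) features, y: labels in {-1, +1}. Returns weights and bias."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(n_epochs):
        for i in rng.permutation(n):          # random order over the data
            t += 1
            eta = 1.0 / (lam * t)             # step size shrinks over time
            if y[i] * (X[i] @ w + b) < 1:     # margin violated: move the classifier a bit
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:
                w = (1 - eta * lam) * w       # otherwise only the regularization shrinkage
    return w, b
```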
30. The Low Density Assumption

Semi-Supervised Support Vector Machine
• Minimize distance to labeled and unlabeled instances
• Parameter to fine-tune the influence of unlabeled instances
• Additional constraint: keep the class balance correct

Implementation
• Objective (regularizer, then the margins of the n labeled instances, then the margins of the m unlabeled instances):

  \min_{w} \; \|w\|^2 \;+\; C \sum_{i=1}^{n} \ell\big(y_i (w^\top x_i + b)\big) \;+\; C^{*} \sum_{i=n+1}^{n+m} \ell\big(|w^\top x_i + b|\big)

• Simple extension of the SVM
• But a non-convex optimization problem (a code sketch of the objective follows below)
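The objective above can be written directly as code. The following sketch evaluates it for given weights; the parameter names C and C_star are illustrative stand-ins for the two trade-off constants.

```python
# Hedged sketch of the semi-supervised SVM objective from the slide:
# hinge loss on labeled margins, "hat" loss on unlabeled margins, plus the
# regularizer. C and C_star trade off the two parts.
import numpy as np

def s3vm_objective(w, b, X_lab, y_lab, X_unl, C=1.0, C_star=0.1):
    margins_lab = y_lab * (X_lab @ w + b)          # margins of labeled instances
    margins_unl = np.abs(X_unl @ w + b)            # margins of unlabeled instances
    hinge_lab = np.maximum(0.0, 1.0 - margins_lab)
    hinge_unl = np.maximum(0.0, 1.0 - margins_unl) # pushes the boundary away from unlabeled points
    return w @ w + C * hinge_lab.sum() + C_star * hinge_unl.sum()
```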
31. Semi-Supervised SVM

Stochastic Gradient Descent
• One run over the data in random order
• Steps get smaller over time
• Each misclassified or unlabeled instance moves the classifier a bit

Implementation on Hadoop
• Mapper: send the data to the reducer in random order
• Reducer: update the linear classifier for unlabeled or misclassified instances
• Many random runs to find the best one (a sketch of the reducer-side update follows below)
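A plain-Python sketch of the reducer-side update: one stochastic gradient pass over the semi-supervised objective, with shrinking steps and a small move for every misclassified or unlabeled instance. This is an illustration under stated assumptions, not the deck's actual Hadoop code; the mapper is assumed to have shuffled the instances into random order.

```python
# Hedged sketch of one SGD pass over the S3VM objective sketched above.
import numpy as np

def s3vm_sgd_pass(stream, d, lam=0.01, C=1.0, C_star=0.1):
    """stream yields (x, y) pairs with y in {-1, +1}, or y = None for unlabeled."""
    w, b, t = np.zeros(d), 0.0, 0
    for x, y in stream:
        t += 1
        eta = 1.0 / (lam * t)                          # steps get smaller over time
        w *= (1 - eta * lam)                           # regularization shrinkage
        if y is None:
            y_hat = 1.0 if x @ w + b >= 0 else -1.0    # use the current prediction as the label
            if y_hat * (x @ w + b) < 1:                # unlabeled point inside the margin
                w += eta * C_star * y_hat * x
                b += eta * C_star * y_hat
        elif y * (x @ w + b) < 1:                      # labeled point misclassified / in margin
            w += eta * C * y * x
            b += eta * C * y
    return w, b
```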
36. The Manifold Assumption

The Assumption
• Training data is (roughly) contained in a low-dimensional manifold
• One can perform learning in a more meaningful low-dimensional space
• Avoids the curse of dimensionality

Similarity Graphs
• Idea: compute similarity scores between instances
• Create a network where the nearest neighbors are connected (a graph-construction sketch follows below)
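A brute-force sketch of building the k-nearest-neighbor similarity graph with Gaussian edge weights. The distance measure, k and sigma are illustrative choices; the nearest-neighbor join slides below describe how to scale this step out.

```python
# Hedged sketch: build a symmetric k-NN similarity graph (O(n^2) brute force).
import numpy as np

def knn_graph(X, k=3, sigma=1.0):
    """Return a symmetric (n, n) weight matrix connecting nearest neighbors."""
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)    # squared distances
    W = np.zeros((n, n))
    for i in range(n):
        neighbors = np.argsort(d2[i])[1:k + 1]                   # skip the point itself
        W[i, neighbors] = np.exp(-d2[i, neighbors] / (2 * sigma ** 2))
    return np.maximum(W, W.T)                                     # symmetrize the graph
```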
40. Label Propagation

Main Idea
• Propagate label information to neighboring instances
• Then repeat until convergence
• Similar to PageRank

Theory
• Known to converge under weak conditions
• Equivalent to a matrix inversion (an iterative sketch follows below)
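An iterative label-propagation sketch over the similarity graph: unlabeled nodes repeatedly take the weighted average of their neighbors' scores while labeled nodes stay clamped. Function and variable names are illustrative.

```python
# Hedged sketch of iterative label propagation on a similarity graph W.
import numpy as np

def label_propagation(W, y, labeled_mask, n_iter=100, tol=1e-6):
    """y: +1/-1 for labeled nodes, 0 elsewhere. Returns a score per node."""
    # Row-normalize the weights so each node averages over its neighbors.
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    f = y.astype(float).copy()
    for _ in range(n_iter):
        f_new = P @ f                            # propagate labels to neighbors
        f_new[labeled_mask] = y[labeled_mask]    # clamp the known labels
        if np.abs(f_new - f).max() < tol:        # converged
            return f_new
        f = f_new
    return f

# Usage sketch: the predicted label is the sign of the propagated score.
# W = knn_graph(X); scores = label_propagation(W, y, labeled_mask)
# predictions = np.sign(scores)
```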
44. Nearest Neighbor Join

Block Nested Loop Join
• 1st MR job: partition the data into blocks, compute nearest neighbors between blocks
• 2nd MR job: filter out the overall nearest neighbors

Smarter Approaches
• Use spatial information to avoid unnecessary comparisons
• R-trees, space-filling curves, locality-sensitive hashing
• Example implementations are available online

[Figure: the data is split into blocks 1-4; reducers process every pair of blocks (1 1, 1 2, ..., 4 4), and the second job combines the per-pair candidates into the overall nearest-neighbor result. A simulated sketch of the two jobs follows below.]
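A plain-Python simulation of the two-job block nested loop join (no actual MapReduce): "job 1" computes candidate nearest neighbors for every pair of blocks, "job 2" filters them down to the overall nearest neighbors. The round-robin partitioning is a toy assumption.

```python
# Hedged sketch simulating the block nested loop k-NN join from the slide.
import numpy as np
from itertools import product

def blocked_knn_join(X, n_blocks=4, k=3):
    n = len(X)
    block_of = np.arange(n) % n_blocks              # toy round-robin partitioning into blocks
    blocks = {b: np.where(block_of == b)[0] for b in range(n_blocks)}

    # "Job 1": one reducer per block pair emits, for each point in block a,
    # its k nearest neighbors within block b.
    candidates = {i: [] for i in range(n)}
    for a, b in product(range(n_blocks), repeat=2):
        for i in blocks[a]:
            dists = sorted((np.linalg.norm(X[i] - X[j]), j) for j in blocks[b] if j != i)
            candidates[i].extend(dists[:k])

    # "Job 2": per point, filter the per-pair candidates down to the overall k nearest.
    return {i: [j for _, j in sorted(cands)[:k]] for i, cands in candidates.items()}
```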
52. Label Propagation on Hadoop

Iterative Procedure
• Mapper: emit neighbor-label pairs
• Reducer: collect the incoming labels per node and combine them into a new label
• Repeat until convergence (a mapper/reducer sketch of one iteration follows below)

Improvement
• Cluster the data and perform within-cluster propagation locally
• Sometimes it is more efficient to perform the matrix inversion instead
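A sketch of one label-propagation iteration expressed as a map step and a reduce step, simulated in plain Python; in the deck's setup each iteration would be one MapReduce job, repeated until convergence. Data structures and names are illustrative.

```python
# Hedged sketch: one label-propagation iteration as map and reduce steps.
# graph: {node: [(neighbor, weight), ...]}, labels: {node: score}.
from collections import defaultdict

def lp_map(graph, labels):
    """Mapper: for every edge (node -> neighbor), emit (neighbor, weighted label, weight)."""
    for node, score in labels.items():
        for neighbor, weight in graph[node]:
            yield neighbor, weight * score, weight

def lp_reduce(emitted, old_labels, clamped):
    """Reducer: per node, combine incoming labels into a new label (weighted average);
    nodes with known labels keep them."""
    sums, weights = defaultdict(float), defaultdict(float)
    for node, ws, w in emitted:
        sums[node] += ws
        weights[node] += w
    new_labels = dict(old_labels)
    for node in sums:
        if node not in clamped:
            new_labels[node] = sums[node] / weights[node]
    return new_labels

# One iteration; repeat until the labels stop changing:
# labels = lp_reduce(lp_map(graph, labels), labels, clamped_nodes)
```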
57. Conclusion

Semi-Supervised Learning
• Only a few training instances have labels
• Unlabeled instances can still provide a valuable signal

Different assumptions lead to different approaches
• Cluster assumption: generative models
• Low density assumption: semi-supervised support vector machines
• Manifold assumption: label propagation

Code available: https://github.com/Datameer-Inc/lesel