CTR Prediction using Spark Machine Learning Pipelines

Presented at Austin Data Science meetup, Jan 13th 2016.
Predict the click through rate in online advertising using Spark Machine Learning Pipelines

Published in: Data & Analytics

  1. Predict whether an online ad will be clicked using Spark MLlib pipelines
       Presenter: Manisha Sule, IBM Spark Enablement
       © 2015 IBM Corporation
  2. Outline
       • Online advertising and click through rate prediction
       • Data set and features
       • Spark MLlib and the Pipeline API
       • MLlib pipeline for Click Through Rate Prediction
       • Questions
  3. Online Advertising
  4. Online advertising
  5. Online advertising on search engines
  6. Social Media advertising
       • Facebook's "Sponsored Stories"
       • LinkedIn's "Sponsored Updates"
       • Twitter's "Promoted Tweets"
  7. Online advertising is big business
  8. Types of online advertising
       • Retargeting: using cookies, track whether a user left a webpage without making a purchase, then retarget the user with ads from that site
       • Behavioral targeting: data about the user's online activity is collected from multiple websites, building a detailed picture of the user's interests to deliver more targeted advertising
       • Contextual advertising: display ads related to the content of the webpage
       • Geo targeting: ads are presented based on the user's suspected geography
  9. Online Advertising - The Players
       Publishers:
       • Earn revenue by displaying ads on their sites
       • Google, Wall Street Journal, Twitter
       Advertisers:
       • Pay for their ads to be displayed on publisher sites
       • Goal is to increase business via advertising
       Matchmakers:
       • Match publishers with advertisers
       • Interaction with advertisers and publishers occurs in real time
  10. Revenue models from online advertising
       • CPM (Cost-Per-Mille): an inventory-based pricing model in which the price is set per 1,000 impressions
       • CPC (Cost-Per-Click): a performance-based model in which the publisher is paid only when (and if) a user clicks on an ad
       • CPA (Cost-Per-Action): the lowest-risk model for advertisers, because they pay for media only when it results in a sale
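The three pricing models on slide 10 come down to simple arithmetic. A minimal sketch in Python follows; the function names and the example prices are illustrative assumptions, not figures from the talk.

```python
def cpm_cost(impressions: int, price_per_thousand: float) -> float:
    """Cost under CPM: the advertiser pays per 1,000 impressions."""
    return impressions / 1000 * price_per_thousand


def cpc_cost(clicks: int, price_per_click: float) -> float:
    """Cost under CPC: the advertiser pays only for clicks."""
    return clicks * price_per_click


def cpa_cost(actions: int, price_per_action: float) -> float:
    """Cost under CPA: the advertiser pays only for completed actions (e.g. sales)."""
    return actions * price_per_action


# Hypothetical campaign: 200,000 impressions, 1,000 clicks, 20 sales.
print(cpm_cost(200_000, 2.0))   # 400.0
print(cpc_cost(1_000, 0.5))     # 500.0
print(cpa_cost(20, 30.0))       # 600.0
```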
  11. Click-through rate (CTR)
       • Ratio of users who click on an ad to the total number of users who view the ad
       • Used to measure the success of an online advertising campaign
       • The very first online ad, shown for AT&T on the HotWired website, had a 44% click-through rate
       • Today, a typical click-through rate is less than 1%
       • In a pay-per-click setting, revenue can be maximized by choosing to display the ads that have the maximum CTR, hence the problem of CTR prediction
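The CTR definition on slide 11, and the idea of showing the ad with the highest CTR, can be sketched in a few lines of Python. The ad identifiers and counts below are made up for illustration; in practice the CTR of a new impression is unknown, which is why it has to be predicted.

```python
def click_through_rate(clicks: int, impressions: int) -> float:
    """CTR = users who clicked / users who viewed the ad."""
    if impressions == 0:
        return 0.0
    return clicks / impressions


def best_ad_by_ctr(stats: dict) -> str:
    """Return the ad id with the highest CTR; stats maps ad id -> (clicks, impressions)."""
    return max(stats, key=lambda ad: click_through_rate(*stats[ad]))


# Hypothetical counts, for illustration only.
stats = {"ad_a": (5, 1000), "ad_b": (12, 1500), "ad_c": (3, 200)}
print(best_ad_by_ctr(stats))  # ad_c
```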
  12. Predict CTR - a Scalable Machine Learning success story
       Predict the conditional probability that the ad will be clicked by the user, given the predictive features of ads
       Predictive features are:
       • Ad's historical performance
       • Advertiser and ad content info
       • Publisher info
       • User info (e.g., search/click history)
       Data set is high dimensional, sparse and skewed:
       • Hundreds of millions of online users
       • Millions of unique publisher pages to display ads
       • Millions of unique ads to display
       • Very few ads get clicked by users
  13. Dataset and features
  14. About the dataset used
       • Sample of the dataset used for the Display Advertising Challenge hosted by Kaggle: https://www.kaggle.com/c/criteo-display-ad-challenge/
       • Consists of a portion of Criteo's traffic over a period of 7 days
  15. Dataset Features
       Data fields
       • Label - Target variable that indicates if an ad was clicked (1) or not (0)
       • I1-I13 - A total of 13 columns of integer features (mostly count features)
       • C1-C26 - A total of 26 columns of categorical features. The values of these features have been hashed onto 32 bits for anonymization purposes
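To make the field layout on slide 15 concrete, here is a hedged Python sketch that parses one record into a dictionary keyed by column name. It assumes a tab-separated line in the order Label, I1..I13, C1..C26, with empty fields standing for missing values; the slides do not state the file format, so treat that as an assumption. The sample record is synthetic.

```python
INT_COLS = [f"I{i}" for i in range(1, 14)]   # I1..I13
CAT_COLS = [f"C{i}" for i in range(1, 27)]   # C1..C26


def parse_line(line: str) -> dict:
    """Parse one tab-separated record into a dict keyed by column name.

    Assumes the order Label, I1..I13, C1..C26 and that empty fields mean "missing".
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 1 + len(INT_COLS) + len(CAT_COLS):
        raise ValueError(f"expected 40 fields, got {len(fields)}")
    row = {"Label": int(fields[0])}
    for name, raw in zip(INT_COLS, fields[1:14]):
        row[name] = int(raw) if raw else None
    for name, raw in zip(CAT_COLS, fields[14:]):
        row[name] = raw if raw else None
    return row


# Synthetic record: label 1, I2 missing, categorical values are made-up hashes.
ints = ["3", "", "7"] + ["0"] * 10
cats = ["68fd1e64"] + ["a1b2c3d4"] * 25
row = parse_line("\t".join(["1"] + ints + cats))
print(row["Label"], row["I1"], row["I2"], row["C1"])  # 1 3 None 68fd1e64
```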
  16. Spark MLlib and Pipeline API
  17. Spark MLlib
       • Spark's machine learning (ML) library
       • Goal is to make ML scalable and easy
       • Includes common learning algorithms and utilities, like classification, regression, clustering, collaborative filtering, dimensionality reduction
       • Includes lower-level optimization primitives and higher-level pipeline APIs
       • Divided into two packages
           - spark.mllib: original API built on top of RDDs
           - spark.ml: provides a higher-level API built on top of DataFrames for constructing ML pipelines
  18. ML pipelines for complex ML workflows
       • DataFrame: Equivalent to a relational table in Spark SQL
       • Transformer: An algorithm that transforms one DataFrame into another
       • Estimator: An algorithm that can be fit on a DataFrame to produce a Transformer
       • Pipeline: Chains multiple transformers and estimators to produce an ML workflow
       • Parameter: Common API for specifying parameters
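The relationship between the abstractions on slide 18 can be illustrated without Spark. The sketch below is plain Python, not the spark.ml API: a "DataFrame" is stood in for by a list of dict rows, and `MeanCenter` is a hypothetical estimator invented for the example. It shows only the contract: an Estimator's `fit` returns a Transformer, and a Pipeline fits its stages in order.

```python
class Transformer:
    """Turns one table (here: a list of dict rows) into another."""
    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """Learns from a table and returns a fitted Transformer."""
    def fit(self, rows):
        raise NotImplementedError


class MeanCenterModel(Transformer):
    def __init__(self, col, mean):
        self.col, self.mean = col, mean

    def transform(self, rows):
        return [{**r, self.col + "_centered": r[self.col] - self.mean} for r in rows]


class MeanCenter(Estimator):
    def __init__(self, col):
        self.col = col

    def fit(self, rows):
        mean = sum(r[self.col] for r in rows) / len(rows)
        return MeanCenterModel(self.col, mean)


class PipelineModel(Transformer):
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Chains stages; fitting runs each stage in order on the transformed data."""
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


model = Pipeline([MeanCenter("x")]).fit([{"x": 1.0}, {"x": 3.0}])
print(model.transform([{"x": 5.0}]))  # [{'x': 5.0, 'x_centered': 3.0}]
```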
  19. MLlib Pipeline for Click Through Rate Prediction
  20. Overview of ML pipeline for Click Through Rate Prediction
  21. Step 1 - Parse Data into Spark SQL DataFrames
       • Spark SQL converts an RDD of Row objects into a DataFrame
       • Rows are constructed by passing key/value pairs, where the keys define the column names of the table
       • The data types are inferred by looking at the data
  22. Step 2 - Feature Transformer using StringIndexer
       • Encodes a string column to a column of indices
       • Indices run from 0 up to (but not including) the number of distinct string values, ordered by frequency, so the most frequent value gets index 0
       • If the column is numeric, it is cast to string and the string values are indexed
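The indexing behavior described on slide 22 can be mimicked in plain Python. This is a sketch of the idea, not Spark's StringIndexer itself; in particular, breaking ties alphabetically is an assumption made here so the output is deterministic.

```python
from collections import Counter


def fit_string_indexer(values):
    """Map each distinct value to an index, most frequent first (index 0).

    Ties are broken alphabetically here so the result is deterministic;
    that tie-breaking rule is an assumption of this sketch.
    """
    counts = Counter(str(v) for v in values)   # numeric values are cast to strings
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    return {label: i for i, label in enumerate(ordered)}


column = ["b", "a", "b", "c", "b", "a"]
mapping = fit_string_indexer(column)
print(mapping)                          # {'b': 0, 'a': 1, 'c': 2}
print([mapping[v] for v in column])     # [0, 1, 0, 2, 0, 1]
```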
  23. Step 3 - Feature Transformer using One Hot Encoding
       • Maps a column of label indices to a column of binary vectors
       • Allows algorithms that expect continuous features to use categorical features
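One-hot encoding, as described on slide 23, turns a category index into a binary vector with a single 1. The plain-Python sketch below uses dense lists for readability. Spark's own encoder emits sparse vectors and, by default, drops the last category; this sketch does neither.

```python
def one_hot(index: int, num_categories: int) -> list:
    """Return a binary vector with a single 1 at position `index`."""
    if not 0 <= index < num_categories:
        raise ValueError("index out of range")
    vec = [0.0] * num_categories
    vec[index] = 1.0
    return vec


print(one_hot(0, 3))  # [1.0, 0.0, 0.0]
print(one_hot(2, 3))  # [0.0, 0.0, 1.0]
```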
  24. Step 4 - Feature Selector using Vector Assembler
       • Transformer that combines a given list of columns into a single vector column
       • Useful for combining raw features and features generated by different feature transformers into a single feature vector
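What the assembler on slide 24 does to a single row can be sketched as a concatenation. The column names below (`I1`, `I2`, `C1_vec`) are hypothetical, and the function is a plain-Python stand-in rather than Spark's implementation.

```python
def assemble(row: dict, input_cols: list) -> list:
    """Concatenate scalar and vector-valued columns into one flat feature vector."""
    features = []
    for col in input_cols:
        value = row[col]
        if isinstance(value, (list, tuple)):
            features.extend(float(x) for x in value)
        else:
            features.append(float(value))
    return features


# Two raw integer features plus one one-hot encoded categorical feature.
row = {"I1": 3, "I2": 7, "C1_vec": [0.0, 1.0, 0.0]}
print(assemble(row, ["I1", "I2", "C1_vec"]))  # [3.0, 7.0, 0.0, 1.0, 0.0]
```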
  25. Step 5 - Train a model using Estimator Logistic Regression
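Slide 25 names logistic regression as the estimator, but the transcript carries no detail. As a rough illustration of what such a model learns, here is a small plain-Python version trained with full-batch gradient descent. This is a swapped-in teaching technique, not the optimizer Spark uses, and the learning rate, epoch count, and toy data are arbitrary assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of a click for feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy data: the first feature is on for clicked ads, the second for unclicked ones.
X = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
print(predict_proba(w, b, [1.0, 0.0]) > 0.5)  # True
print(predict_proba(w, b, [0.0, 1.0]) < 0.5)  # True
```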
  26. Apply the pipeline to make predictions
  27. Demo
  28. Parameter Tuning using CrossValidator or TrainValidationSplit
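Slide 28 names two tuning strategies without elaborating. The general idea behind k-fold cross-validation over a parameter grid can be sketched in plain Python as below; a single train/validation split is the special case of evaluating each candidate once instead of k times. `score_fn` is a hypothetical callback standing in for "train on these rows, evaluate on those rows", and nothing here is Spark's API.

```python
def k_fold_indices(n: int, k: int):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, valid
        start += size


def grid_search(param_grid, score_fn, n, k=3):
    """Return the parameter value with the best mean validation score."""
    best_param, best_score = None, float("-inf")
    for param in param_grid:
        scores = [score_fn(param, tr, va) for tr, va in k_fold_indices(n, k)]
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_param, best_score = param, mean
    return best_param


# Hypothetical scorer: pretends a regularization value of 0.1 is best.
score = lambda reg, train_idx, valid_idx: -abs(reg - 0.1)
print(grid_search([0.01, 0.1, 1.0], score, n=10))  # 0.1
```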
  29. Some alternatives:
       • Use hashed features instead of OHE
       • Use log loss evaluation or ROC to evaluate Logistic Regression
       • Perform feature selection
       • Use Naïve Bayes or other binary classification algorithms
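Two of the alternatives on slide 29, hashed features and log loss, are easy to show in miniature. The sketch below is plain Python under stated assumptions: the bucket count is arbitrary, md5 is used only because it gives a hash that is stable across runs, and the column values are made up.

```python
import hashlib
import math


def hash_features(row: dict, num_buckets: int) -> dict:
    """Hashing trick: map each "column=value" string to a bucket and count hits.

    Returns a sparse vector as {bucket_index: count}. md5 is used only to get a
    hash that is stable across runs; any stable hash function would do.
    """
    vec = {}
    for col, value in row.items():
        digest = hashlib.md5(f"{col}={value}".encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % num_buckets
        vec[bucket] = vec.get(bucket, 0.0) + 1.0
    return vec


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


vec = hash_features({"C1": "68fd1e64", "C2": "a1b2c3d4"}, num_buckets=16)
print(sum(vec.values()))                       # 2.0
print(round(log_loss([1, 0], [0.5, 0.5]), 4))  # 0.6931
```

Unlike one-hot encoding, the hashed vector has a fixed width chosen up front, at the cost of occasional collisions between unrelated values.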
  30. Thank you!
  31. References:
       • Wikipedia: https://en.wikipedia.org/wiki/Online_advertising
       • BerkeleyX: CS190.1x Scalable Machine Learning (https://courses.edx.org/)
       • Big Data University: Spark Fundamentals 1 (https://bigdatauniversity.com/courses/spark-fundamentals/)
       • Big Data University: Spark Fundamentals 2 (http://bigdatauniversity.com/courses/spark-fundamentals-ii/)
       • Learning Spark: Lightning-Fast Big Data Analysis (book by Andy Konwinski, Holden Karau, and Patrick Wendell)
       • Advanced Analytics with Spark: Patterns for Learning from Data at Scale (book by Josh Wills, Sandy Ryza, Sean Owen, and Uri Laserson)
       • Spark Programming Guide (http://spark.apache.org/docs/latest/programming-guide.html)
