Superweek 2019 - Digital Analytics Meets Data Science

Past attendees of Superweek have ridden along with Tim as he explored R, and then as he dove deeper into some of the fundamental concepts of statistics. In this presentation, he provides the latest update on that journey: how he is putting his exploration into the various dimensions of data science to use with real data and real clients. The statistical methods are real, the code is R (and available on GitHub -- see http://bit.ly/ga-and-r), and the data is only lightly obfuscated.

  1. 1. Digital Analytics Meets Data Science Tim Wilson Superweek 2019
  2. 2. 2017
  3. 3. 2018 The journey continued!
  4. 4. 2018 Levitation through R!
  5. 5. 2018
  6. 6. 2018
  7. 7. 2018
  8. 8. 2018
  9. 9. 2018 All marketers inherently operate under conditions of uncertainty. There is a cost to reduce uncertainty. Uncertainty cannot be eliminated.
  10. 10. “We have a problem in [marketing] with thinking probabilistically.” - Annie Duke Source: Annie Duke
  11. 11. @tgwilson What is data science?
  12. 12. @tgwilson What is data science? https://www.kdnuggets.com/2016/10/battle-data-science-venn-diagrams.html
  13. 13. @tgwilson An 85% Confidence Venn Diagram: Computer Science, Statistics, Subject Matter Expertise
  14. 14. Applied… Data Science?
  15. 15. 1 Completely Uninteresting Example, 2-3 Mildly Useful Thought-Provoking Examples, 1 Amazing Example!
  16. 16. IDEA / CONCEPT EXPLORATION A TREAT FOR LATER!
  17. 17. IDEA / CONCEPT EXPLORATION A TREAT FOR LATER!
  18. 18. Some (Self-Imposed) Constraints Bas Fre Detailed Data (BigQuery) > Aggregated Data ( )
  19. 19. IDEA: TIME-NORMALIZED PAGEVIEWS Page Traffic: Launch vs. Lifecycle
  20. 20. IDEA / CONCEPT vs. TIME-NORMALIZED PAGEVIEWS
  21. 21. IDEA / CONCEPT vs. TIME-NORMALIZED PAGEVIEWS NO STATS REQUIRED!!!
  22. 22. @tgwilson Consider Two Blog Posts
  23. 23. @tgwilson We Can “Time-Normalize” Them to Launch Date
  26. 26. IDEA: PAGE-LEVEL METRIC CORRELATION Exploring Winners and Losers with Simple Page Metrics
  27. 27. IDEA / CONCEPT PAGE-LEVEL METRIC CORRELATION [Chart axes: traffic (less to more) vs. metric (bad to good)]
  28. 28. IDEA / CONCEPT PAGE-LEVEL METRIC CORRELATION [Chart axes: traffic (less to more) vs. metric (bad to good)] !!!
  29. 29. IDEA / CONCEPT PAGE-LEVEL METRIC CORRELATION
  30. 30. IDEA / CONCEPT PAGE-LEVEL METRIC CORRELATION
  31. 31. IDEA: EXPLORING SITE SEARCH Throwback to 2017… plus Topic Modeling!
  32. 32. IDEA: EXPLORING SITE SEARCH Throwback to 2017… plus Topic Modeling! Sébastien Brodeur Nancy Koons Julia Silge
  33. 33. IDEA / CONCEPT SITE SEARCH = VOICE OF THE CUSTOMER
  34. 34. (Some) Users type in complete questions.
  35. 35. SITE SEARCH + (LIGHT) TEXT MINING Unnest the Terms: "routing number" (42), "rates" (68) → "rates" (68), "routing" (42), "number" (42)
  36. 36. SITE SEARCH + (LIGHT) TEXT MINING Make All Terms Lowercase: Routing, routing, ROUTING → routing, routing, routing
  37. 37. SITE SEARCH + (LIGHT) TEXT MINING Word Stemming: routing, route, routes → rout, rout, rout → routing, routing, routing
  38. 38. SITE SEARCH + (LIGHT) TEXT MINING Remove Stopwords i, me, my, myself, we, our, ours, ourselves, you, your, yours, yourself, yourselves, he, him, his, himself, she, her, hers, herself, it, its, itself, they, them, their, theirs, themselves, what, which, who, whom, this, that, these, those, am, is, are, was, were, be, been, being, have, has, had, having, do, does, did, doing, would, should, could, ought, i'm, you're, he's, she's, it's, we're, they're, i've, you've, we've, they've, i'd, you'd, he'd, she'd, we'd, they'd, i'll, you'll, he'll, she'll, we'll, they'll, isn't, aren't, wasn't, weren't, hasn't, haven't, hadn't, doesn't, don't, didn't, won't, wouldn't, shan't, shouldn't, can't, cannot, couldn't, mustn't, let's, that's, who's, what's, here's, there's, when's, where's, why's, how's, a, an, the, and, but, if, or, because, as, until, while, of, at, by, for, with, about, against, between, into, through, during, before, after, above, below, to, from, up, down, in, out, on, off, over, under, again, further, then, once, here, there, when, where, why, how, all, any, both, each, few, more, most, other, some, such, no, nor, not, only, own, same, so, than, too, very, will
  40. 40. ORIGINAL DATA TIDIED UP DATA
  41. 41. ORIGINAL DATA TERM-FREQUENCY MATRIX
  42. 42. ORIGINAL DATA TERM-FREQUENCY MATRIX
  43. 43. A (Clean) Word Cloud + selective removal of overly dominant terms
  44. 44. Topic Modeling! 2-6 Groupings of Search Terms Using Latent Dirichlet Allocation (LDA)
  45. 45. Twitter followers, perhaps?
  46. 46. IDEA: QUANTIFY IMPACT OF DAY OF WEEK ...as an introduction to regression with nominal variables
  47. 47. IDEA / CONCEPT LINEAR REGRESSION WITH DAY OF WEEK y = mx + b
  48. 48. ?
  49. 49. Sessions = 52 + 203 × MON + 193 × TUE + 177 × WED + 194 × THU + 158 × FRI
  50. 50. Sessions = 52 + 203 × MON + 193 × TUE + 177 × WED + 194 × THU + 158 × FRI
  51. 51. 193 x TUE?!!! 47 + TACOS?!!!
  52. 52. Start by Pulling Some Daily Data
  53. 53. Regression = The Formula for a Line…? y = mx + b
  54. 54. Our dependent variable is obvious. y = mx + b
  55. 55. But our independent variable is…? y = mx + b
  56. 56. Let’s start with our basic plot.
  57. 57. Or we could (box)plot it a different way Thi t no re ed, bu he s h ab he a li l fe t .
  58. 58. We need some dummies!
  59. 59. We need some dummies!
  60. 60. We need some dummies!
  61. 61. We need some dummies! SAT ?!
  62. 62. We need some dummies!
  63. 63. Just a little bit of code...
      Step 1: Use all (six) variables to build a best-fit model.
          # Fit the full model
          full_model <- lm(Sessions ~ ., data = analysis_data)
      Step 2: Perform a "stepwise" operation to see which subset of variables provides the best fit.
          # Build a model using stepwise regression
          library(MASS)
          step_model <- stepAIC(full_model, direction = "both", trace = FALSE)
  64. 64. p-value: 4.354e-11 (4.354 × 10^-11 ≈ 0.0000 < 0.01). The model is statistically significant at a 99% confidence level (this is good!).
  65. 65. Adjusted R-squared: 0.4584. 46% of the variation in sessions from day to day is explained by the model!
  66. 66. Sessions = 52 + 203 × MON + 193 × TUE + 177 × WED + 194 × THU + 158 × FRI
  75. 75. ?
  76. 76. !!!
  77. 77. @tgwilson | analyticshour.io So...now what?
  78. 78. @tgwilson | analyticshour.io bit.ly/ga-and-r analyticshour.io tim.wilson@searchdiscovery.com
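
Slides 22-25 describe "time-normalizing" two blog posts to their launch dates. The following is a minimal base R sketch of that idea, not the code from the talk: the data frame, its column names, and the traffic numbers are all made up for illustration, and the launch date is assumed to be the first date on which a page recorded traffic.

```r
# Hypothetical daily pageviews for two blog posts launched on different dates
pageviews <- data.frame(
  page = c(rep("post-a", 4), rep("post-b", 3)),
  date = as.Date(c("2019-01-01", "2019-01-02", "2019-01-03", "2019-01-04",
                   "2019-01-10", "2019-01-11", "2019-01-12")),
  pageviews = c(120, 80, 40, 20, 200, 90, 30)
)

# Make sure rows are in date order within each page
pageviews <- pageviews[order(pageviews$page, pageviews$date), ]

# Assumption: launch date = first date on which the page recorded traffic
launch_date <- ave(as.numeric(pageviews$date), pageviews$page, FUN = min)

# Days since launch puts every page on the same x-axis (day 1 = launch day)
pageviews$days_since_launch <- as.numeric(pageviews$date) - launch_date + 1

# Cumulative pageviews per page, ready to plot against days_since_launch
pageviews$cumulative_pageviews <- ave(pageviews$pageviews, pageviews$page,
                                      FUN = cumsum)

print(pageviews)
```

Plotting `cumulative_pageviews` against `days_since_launch` (rather than against calendar date) lets posts launched weeks apart be compared on the same axis.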
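
The text-mining steps on slides 35-42 (unnest, lowercase, remove stopwords, build term frequencies) can be sketched in base R as below. This is an illustration under assumptions, not the talk's code: the search phrases and counts are invented, the stopword list is a tiny stand-in for the long list on slide 38, and the stemming step from slide 37 is skipped because base R does not include a stemmer.

```r
# Hypothetical site-search export: one row per search phrase with its count
searches <- data.frame(
  phrase = c("Routing Number", "rates", "what is my ROUTING number"),
  searches = c(42, 68, 5),
  stringsAsFactors = FALSE
)

# A tiny illustrative stopword list (slide 38 shows a much longer one)
stopwords <- c("what", "is", "my", "the", "a")

# 1. Unnest: split each phrase into terms; each term inherits the phrase count
terms <- strsplit(searches$phrase, "\\s+")
unnested <- data.frame(
  term = unlist(terms),
  searches = rep(searches$searches, lengths(terms)),
  stringsAsFactors = FALSE
)

# 2. Make all terms lowercase so "Routing" and "ROUTING" match "routing"
unnested$term <- tolower(unnested$term)

# 3. Remove stopwords
unnested <- unnested[!unnested$term %in% stopwords, ]

# 4. Term frequency: total searches per term, most frequent first
term_freq <- aggregate(searches ~ term, data = unnested, FUN = sum)
term_freq <- term_freq[order(-term_freq$searches), ]

print(term_freq)
```

A table like `term_freq` is the kind of input a word cloud or a topic model would start from.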
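
Slides 58-63 build dummy variables for day of week and then fit a stepwise regression. The sketch below reproduces that flow end to end on simulated data so it can be run as-is. The `lm()` and `stepAIC()` calls mirror slide 63; everything else is an assumption for illustration: the simulated sessions, the weekday lifts used to generate them, and the choice of Sunday as the day without a dummy column (suggested by, but not stated in, "all (six) variables").

```r
library(MASS)  # provides stepAIC(); ships with standard R distributions

set.seed(42)

# Simulated data: 26 weeks of daily sessions with a weekday lift (made up)
dates <- seq(as.Date("2018-01-01"), by = "day", length.out = 182)
day_labels <- c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN")
day <- day_labels[as.integer(format(dates, "%u"))]  # %u: 1 = Monday ... 7 = Sunday
true_lift <- c(MON = 200, TUE = 190, WED = 180, THU = 190, FRI = 160,
               SAT = 0, SUN = 0)
sessions <- 50 + unname(true_lift[day]) + rnorm(length(dates), sd = 40)

# Dummy variables: six 0/1 columns; the seventh day (SUN) is the reference
analysis_data <- data.frame(Sessions = sessions)
for (d in day_labels[1:6]) {
  analysis_data[[d]] <- as.integer(day == d)
}

# Fit the full model using all six dummy variables
full_model <- lm(Sessions ~ ., data = analysis_data)

# Stepwise selection keeps the subset of variables with the best AIC
step_model <- stepAIC(full_model, direction = "both", trace = FALSE)

print(round(coef(step_model)))
print(summary(step_model)$adj.r.squared)
```

Only six dummies are needed for seven days: a day with all six columns at 0 is the reference day, and its expected sessions are captured by the intercept.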
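
The fitted equation on slide 49 can be read as a lookup: each day's dummy is 1 on that day and 0 otherwise, so the prediction is the intercept plus that day's coefficient. The short sketch below works through the arithmetic with the coefficients from the slide. The helper name `predict_sessions` is hypothetical, and treating days without a coefficient (Saturday and Sunday) as intercept-only is an interpretation of the equation rather than something the slides state outright.

```r
# Coefficients as shown on slide 49
coefs <- c(Intercept = 52, MON = 203, TUE = 193, WED = 177, THU = 194, FRI = 158)

# Hypothetical helper: predicted sessions for a given day label
predict_sessions <- function(day) {
  # Days with no term in the equation contribute nothing beyond the intercept
  lift <- if (day %in% names(coefs)[-1]) coefs[[day]] else 0
  coefs[["Intercept"]] + lift
}

predictions <- sapply(c("MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"),
                      predict_sessions)
print(predictions)
# MON = 255, TUE = 245, WED = 229, THU = 246, FRI = 210, SAT = 52, SUN = 52
```

Read this way, "203 × MON" simply means a typical Monday is expected to bring about 203 more sessions than a typical weekend day.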