SlideShare une entreprise Scribd logo
1  sur  54
Learning from Web Activity

                                              Jake Hofman

                                  Joint work with Irmak Sirer & Sharad Goel
                                              Yahoo! Research


                                             August 1, 2011




Jake Hofman   (hofman@yahoo-inc.com)        Learning from Web Activity        JSM, 2011.08.01   1 / 42
Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity   JSM, 2011.08.01   2 / 42
“... applies social sciences techniques, including
                algorithmic game theory, economics, network analysis,
                psychology, ethnography, and mechanism design, to
                online situations.”


                                       http://research.yahoo.com




Jake Hofman   (hofman@yahoo-inc.com)         Learning from Web Activity   JSM, 2011.08.01   3 / 42
possible to estimate accurately the age of ori-      biodiversification event in the Paleogene (65 to    8. M. Foote, in Evolutionary Patterns, J. B. C. Jackson et al.,
                                                                                                                     Eds. (Univ. of Chicago Press, Chicago, IL, 2001), vol. 245,
         gin of almost all extant genera. It is then possi-   23 million years ago) that is perhaps not yet          pp. 245–295.
         ble to plot a backward survivorship curve (8)        captured in Alroy et al.’s database (5, 7). The     9. M. D. Spalding et al., Bioscience 57, 573 (2007).
         for each of the 27 global bivalve provinces (9).     jury is still out on what may have caused this     10. S. M. Stanley, Paleobiology 33, 1 (2007).
                                                                                                                 11. M. J. Benton, B. C. Emerson, Palaeontology 50, 23 (2007).
             On the basis of these curves, Krug et al. find   event. But we should not lose sight of the fact
         that origination rates of marine bivalves in-        that the steep rise to prominence of many mod-                                         10.1126/science.1169410




         SOCIAL SCIENCE
                                                                                                                 A field is emerging that leverages the
                                                                                                                 capacity to collect and analyze data at a
         Computational Social Science                                                                            scale that may reveal patterns of individual
                                                                                                                 and group behaviors.
         David Lazer,1 Alex Pentland,2 Lada Adamic,3 Sinan Aral,2,4 Albert-László Barabási,5
         Devon Brewer,6 Nicholas Christakis,1 Noshir Contractor,7 James Fowler,8 Myron Gutmann,3
         Tony Jebara,9 Gary King,1 Michael Macy,10 Deb Roy,2 Marshall Van Alstyne2,11



         W“... a computational social science is emerging that
                     e live life in the network. We check     ment agencies such as the U.S. National Secur-     critiqued or replicated. Neither scenario will
                     our e-mails regularly, make mobile       ity Agency. Computational social science could     serve the long-term public interest of accumu-
                     phone calls from almost any loca-        become the exclusive domain of private com-        lating, verifying, and disseminating knowledge.
         tion, swipe transit cards to use public trans-       panies and government agencies. Alternatively,         What value might a computational social
                   leverages the capacity to collect and analyze data with an
         portation, and make purchases with credit
         cards. Our movements in public places may be
                                                              there might emerge a privileged set of aca-
                                                              demic researchers presiding over private data
                                                                                                                 science—based in an open academic environ-
                                                                                                                 ment—offer society, by enhancing understand-
                   unprecedented breadth and depth and scale ...”
         captured by video cameras, and our medical
         records stored as digital files. We may post blog
                                                              from which they produce papers that cannot be      ing of individuals and collectives? What are the

         entries accessible to anyone, or maintain friend-
         ships through online social networks. Each of
         these transactions leaves digital traces that can
                        http://sciencemag.org/content/323/5915/721
         be compiled into comprehensive pictures of
         both individual and group behavior, with the
         potential to transform our understanding of our
         lives, organizations, and societies.
              The capacity to collect and analyze massive
         amounts of data has transformed such fields as
         biology and physics. But the emergence of a
         data-driven “computational social science” has
         been much slower. Leading journals in eco-
         nomics, sociology, and political science show
         little evidence of this field. But computational
         social science is occurring—in Internet compa-
         nies such as Google and Yahoo, and in govern-
Jake Hofman    (hofman@yahoo-inc.com)                             Learning from Web Activity                                                JSM, 2011.08.01                     4 / 42
possible to estimate accurately the age of ori-      biodiversification event in the Paleogene (65 to    8. M. Foote, in Evolutionary Patterns, J. B. C. Jackson et al.,
                                                                                                                     Eds. (Univ. of Chicago Press, Chicago, IL, 2001), vol. 245,
         gin of almost all extant genera. It is then possi-   23 million years ago) that is perhaps not yet          pp. 245–295.
         ble to plot a backward survivorship curve (8)        captured in Alroy et al.’s database (5, 7). The     9. M. D. Spalding et al., Bioscience 57, 573 (2007).
         for each of the 27 global bivalve provinces (9).     jury is still out on what may have caused this     10. S. M. Stanley, Paleobiology 33, 1 (2007).
                                                                                                                 11. M. J. Benton, B. C. Emerson, Palaeontology 50, 23 (2007).
             On the basis of these curves, Krug et al. find   event. But we should not lose sight of the fact
         that origination rates of marine bivalves in-        that the steep rise to prominence of many mod-                                         10.1126/science.1169410




         SOCIAL SCIENCE
                                                                                                                 A field is emerging that leverages the
                                                                                                                 capacity to collect and analyze data at a
         Computational Social Science                                                                            scale that may reveal patterns of individual
                                                                                                                 and group behaviors.
         David Lazer,1 Alex Pentland,2 Lada Adamic,3 Sinan Aral,2,4 Albert-László Barabási,5
         Devon Brewer,6 Nicholas Christakis,1 Noshir Contractor,7 James Fowler,8 Myron Gutmann,3
         Tony Jebara,9 Gary King,1 Michael Macy,10 Deb Roy,2 Marshall Van Alstyne2,11



         W“... shares with other nascent interdisciplinary fields
                     e live life in the network. We check     ment agencies such as the U.S. National Secur-     critiqued or replicated. Neither scenario will
                     our e-mails regularly, make mobile       ity Agency. Computational social science could     serve the long-term public interest of accumu-
                     phone calls from almost any loca-        become the exclusive domain of private com-        lating, verifying, and disseminating knowledge.
         tion, swipe transit cards to use public trans-       panies and government agencies. Alternatively,         What value might a computational social
                   (e.g., sustainability science) the need to develop a
         portation, and make purchases with credit
         cards. Our movements in public places may be
                                                              there might emerge a privileged set of aca-
                                                              demic researchers presiding over private data
                                                                                                                 science—based in an open academic environ-
                                                                                                                 ment—offer society, by enhancing understand-
                   paradigm for training new scholars ...”
         captured by video cameras, and our medical
         records stored as digital files. We may post blog
                                                              from which they produce papers that cannot be      ing of individuals and collectives? What are the

         entries accessible to anyone, or maintain friend-
         ships through online social networks. Each of
         these transactions leaves digital traces that can
                        http://sciencemag.org/content/323/5915/721
         be compiled into comprehensive pictures of
         both individual and group behavior, with the
         potential to transform our understanding of our
         lives, organizations, and societies.
              The capacity to collect and analyze massive
         amounts of data has transformed such fields as
         biology and physics. But the emergence of a
         data-driven “computational social science” has
         been much slower. Leading journals in eco-
         nomics, sociology, and political science show
         little evidence of this field. But computational
         social science is occurring—in Internet compa-
         nies such as Google and Yahoo, and in govern-
Jake Hofman    (hofman@yahoo-inc.com)                             Learning from Web Activity                                                JSM, 2011.08.01                     4 / 42
The clean real story


                   “ We have a habit in writing articles published in
                   scientific journals to make the work as finished as
                   possible, to cover all the tracks, to not worry about the
                   blind alleys or to describe how you had the wrong idea
                   first, and so on. So there isn’t any place to publish, in
                   a dignified manner, what you actually did in order to
                   get to do the work ...”

                                                                      -Richard Feynman
                                                                    Nobel Lecture 1 , 1965




              1
                  http://bit.ly/feynmannobel
Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity              JSM, 2011.08.01   5 / 42
Outline




              • The clean story
              • The real story
              • Lessons learned




Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity   JSM, 2011.08.01   6 / 42
Demographic diversity on the Web
  with Irmak Sirer and Sharad Goel




                                       The clean story
                                        (covering our tracks)




Jake Hofman   (hofman@yahoo-inc.com)      Learning from Web Activity   JSM, 2011.08.01   7 / 42
Motivation




          Previous work is largely survey-based and focuses and group-level
                              differences in online access



Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity   JSM, 2011.08.01   8 / 42
Motivation



                “As of January 1997, we estimate that 5.2 million
                African Americans and 40.8 million whites have ever used
                the Web, and that 1.4 million African Americans and
                20.3 million whites used the Web in the past week.”

                                                               -Hoffman & Novak (1998)




Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity           JSM, 2011.08.01   8 / 42
Motivation




        Chang, et. al., ICWSM (2010)
Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity   JSM, 2011.08.01   9 / 42
Motivation

                                 Focus on activity instead of access




                                       How diverse is the Web?

              To what extent do online experiences vary across demographic
                                        groups?


Jake Hofman   (hofman@yahoo-inc.com)      Learning from Web Activity   JSM, 2011.08.01   10 / 42
Data




              • Representative sample of 265,000 individuals in the US, paid
                   via the Nielsen MegaPanel2
              • Log of anonymized, complete browsing activity from June
                   2009 through May 2010 (URLs viewed, timestamps, etc.)
              • Detailed individual and household demographic information
                   (age, education, income, race, sex, etc.)




              2
                  Special thanks to Mainak Mazumdar
Jake Hofman   (hofman@yahoo-inc.com)     Learning from Web Activity   JSM, 2011.08.01   11 / 42
Data



              • Transform all demographic attributes to binary variables
                e.g., Age → Over/Under 25, Race → White/Non-White,
                Sex → Female/Male




Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity   JSM, 2011.08.01   12 / 42
Data



              • Transform all demographic attributes to binary variables
                e.g., Age → Over/Under 25, Race → White/Non-White,
                Sex → Female/Male
              • Normalize pageviews to at most three domain levels, sans www
                e.g. www.yahoo.com → yahoo.com,
                us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com




Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity   JSM, 2011.08.01   12 / 42
Data



              • Transform all demographic attributes to binary variables
                e.g., Age → Over/Under 25, Race → White/Non-White,
                Sex → Female/Male
              • Normalize pageviews to at most three domain levels, sans www
                e.g. www.yahoo.com → yahoo.com,
                us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com
              • Restrict to top 100k (out of 9M+ total) most popular sites
                (by unique visitors)




Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity   JSM, 2011.08.01   12 / 42
Data



              • Transform all demographic attributes to binary variables
                e.g., Age → Over/Under 25, Race → White/Non-White,
                Sex → Female/Male
              • Normalize pageviews to at most three domain levels, sans www
                e.g. www.yahoo.com → yahoo.com,
                us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com
              • Restrict to top 100k (out of 9M+ total) most popular sites
                (by unique visitors)
              • Aggregate activity at the site, group, and user levels




Jake Hofman   (hofman@yahoo-inc.com)   Learning from Web Activity   JSM, 2011.08.01   12 / 42
Site-level skew

                                   How diverse are site audiences?


         • For each site and attribute,
              calculate the skew in visitors
              (e.g., 93% of pageviews on




                                                            Density
              foxnews.com are by White
              users)
         • For each attribute, plot the
              distribution of visitor skew
              across all sites
                                                                      0.0   0.2     0.4     0.6     0.8    1.0
                                                                            Proportion White Visitors




Jake Hofman   (hofman@yahoo-inc.com)      Learning from Web Activity                         JSM, 2011.08.01     13 / 42
Site-level skew
              Density




                                                                                           Density




                                                                                                                                                                                         Density
                         0.0    0.2    0.4             0.6         0.8         1.0                         0.0         0.2         0.4             0.6           0.8         1.0                         0.0         0.2         0.4    0.6   0.8   1.0
                               Proportion Female Visitors                                                          Proportion White Visitors                                                               Proportion College Educated Visitors




                                                                                                                                         Density
                                             Density




                                                                                                                                                         0.0           0.2         0.4             0.6         0.8         1.0
                                                             0.0         0.2         0.4             0.6         0.8         1.0                                  Proportion of Visitors With
                                                                         Proportion Adult Visitors                                                             Household Incomes Under $50,000




Jake Hofman             (hofman@yahoo-inc.com)                                                         Learning from Web Activity                                                                                                      JSM, 2011.08.01    14 / 42
Site-level skew



              Many sites have skew close the overall mean, but there also
                             popular, highly-skewed sites

                                                                                              Greater Than 90%               Less Than 10%

                                                                                                   youravon.com              coveritlive.com
                                                             Female
                                                                                               collectionsetc.com              needlive.com
                                                                                                   foxnews.com               blackplanet.com
                                                              White
                                                                                                wunderground.com            mediatakeout.com
                                                                                                 news.google.com            slumz.boxden.com
                                                  College Educated
                                                                                                   nytimes.com                  sythe.com
                                                                                                  mail.yahoo.com              nanowrimo.org
                                                 Over 25 Years Old
                                                                                                apps.facebook.com                cbox.ws
                                                 Household Income                                 scarleteen.com              opentable.com
                                                  Under $50,000                               boards.adultswim.com           marketwatch.com


              Table 1: A selection of popular sites that are homogeneous along various demographic dimensions.


                                                                                                          visually apparent from Figure 5, there are significant differ-
                                                           College     Female
                                         70
                                              Non−White       !
                                                                                Over 25
                                                                                                          ences in how groups distribute their time on the web. These
                                                                         !         !

                                         60
                                                                                                          differences—which, as mentioned above, hold for highly fre-
                    r−Capita Pageviews




                                                 !
                                                                                                          quented sites such as Facebook and YouTube—are in some
                                                White     No College
                                         50
                                                                                                          cases even more pronounced for lower traffic sites. For in-
                                         40                             Male                              stance, the gaming site pogo.com accounts for less than 1%
                      30
                                                                                                          of pageviews among both low and high income users, but
Jake Hofman   (hofman@yahoo-inc.com)                                            Under 25
                                                                                           Learning from Web Activityusers spend almost twice as much of their time 15 / 42
                                                                                                          low income                             JSM, 2011.08.01
Site-level skew
          This skew persists even when we restrict attention to the top 10k
                                     or 1k sites
                                                1.0                                                                                                                 1.0                                                                                                                     1.0




                                                                                                                                                                                                                                                     Proportion College Educated Visitors
              Proportion Female Site Visitors




                                                                                                                                   Proportion White Site Visitors
                                                0.8                                                                                                                 0.8                                                                                                                     0.8



                                                0.6                                                                                                                 0.6                                                                                                                     0.6



                                                0.4                                                                                                                 0.4                                                                                                                     0.4



                                                0.2                                                                                                                 0.2                                                                                                                     0.2



                                                0.0                                                                                                                 0.0                                                                                                                     0.0

                                                      101   102        103                                        104    105                                              101    102        103                                     104    105                                                    101     102        103    104   105
                                                                  Site Rank                                                                                                            Site Rank                                                                                                                Site Rank


                                                                                                           1.0                                                                                                                1.0




                                                                                                                                                                                            Household Incomes Under $50,000
                                                                       Proportion of Adult Site Visitors




                                                                                                           0.8                                                                                                                0.8

                                                                                                                                                                                               Proportion of Visitors With
                                                                                                           0.6                                                                                                                0.6



                                                                                                           0.4                                                                                                                0.4



                                                                                                           0.2                                                                                                                0.2



                                                                                                           0.0                                                                                                                0.0

                                                                                                                 101    102        103                                     104    105                                                101   102                              103                     104    105
                                                                                                                              Site Rank                                                                                                          Site Rank




Jake Hofman                                     (hofman@yahoo-inc.com)                                                                                               Learning from Web Activity                                                                                                                      JSM, 2011.08.01    16 / 42
Sites vs. ZIPs



               How do diversity of the online and offline worlds compare?

                                            Sites                                                   Sites
                                            ZIPs                                                    ZIPs
                            Density




                                                                                    Density
                                      0.0    0.2        0.4     0.6     0.8   1.0             0.0    0.2       0.4     0.6     0.8   1.0
                                                    Proportion Female                                       Proportion White




Jake Hofman   (hofman@yahoo-inc.com)                              Learning from Web Activity                                               JSM, 2011.08.01   17 / 42
Sites vs. ZIPs


               How do diversity of the online and offline worlds compare?

                                            Sites                                                   Sites
                                            ZIPs                                                    ZIPs
                            Density




                                                                                    Density
                                      0.0    0.2        0.4     0.6     0.8   1.0             0.0    0.2       0.4     0.6     0.8   1.0
                                                    Proportion Female                                       Proportion White




          As expected, neighborhoods are more gender-balanced than sites




Jake Hofman   (hofman@yahoo-inc.com)                              Learning from Web Activity                                               JSM, 2011.08.01   17 / 42
Sites vs. ZIPs


               How do diversity of the online and offline worlds compare?

                                            Sites                                                   Sites
                                            ZIPs                                                    ZIPs
                            Density




                                                                                    Density
                                      0.0    0.2        0.4     0.6     0.8   1.0             0.0    0.2       0.4     0.6     0.8   1.0
                                                    Proportion Female                                       Proportion White




              But sites typically have more racially diverse audiences than
                              neighborhoods have residents



Jake Hofman   (hofman@yahoo-inc.com)                              Learning from Web Activity                                               JSM, 2011.08.01   17 / 42
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity
Learning from Web Activity

Contenu connexe

En vedette (20)

Reuters2
Reuters2Reuters2
Reuters2
 
National geog
National geogNational geog
National geog
 
La nuit
La nuitLa nuit
La nuit
 
7강 기업교육론 20110413
7강 기업교육론 201104137강 기업교육론 20110413
7강 기업교육론 20110413
 
Maine Tobacco Control Timeline, 1897-2008
Maine Tobacco Control Timeline, 1897-2008Maine Tobacco Control Timeline, 1897-2008
Maine Tobacco Control Timeline, 1897-2008
 
Pay italie-lombardie
Pay italie-lombardiePay italie-lombardie
Pay italie-lombardie
 
Istanbul
Istanbul Istanbul
Istanbul
 
The bestof
The bestofThe bestof
The bestof
 
Neechadi TechConsultants
Neechadi TechConsultantsNeechadi TechConsultants
Neechadi TechConsultants
 
ομορσιες σος καναδα
ομορσιες σος καναδαομορσιες σος καναδα
ομορσιες σος καναδα
 
Why states should fund comprehensive tc programs
Why states should fund comprehensive tc programsWhy states should fund comprehensive tc programs
Why states should fund comprehensive tc programs
 
Presentation1
Presentation1Presentation1
Presentation1
 
Camtasia getting started guide
Camtasia getting started guideCamtasia getting started guide
Camtasia getting started guide
 
May26 arxiu notebook
May26   arxiu notebookMay26   arxiu notebook
May26 arxiu notebook
 
Bca 2010 draft
Bca 2010 draftBca 2010 draft
Bca 2010 draft
 
Mobile Analytics
Mobile AnalyticsMobile Analytics
Mobile Analytics
 
Canada beauty
Canada beautyCanada beauty
Canada beauty
 
46 Proven Ways to Make Money Online
46 Proven Ways to Make Money Online46 Proven Ways to Make Money Online
46 Proven Ways to Make Money Online
 
Powet point lana bloggerako 2
Powet point lana bloggerako 2Powet point lana bloggerako 2
Powet point lana bloggerako 2
 
Unusual churches
Unusual churchesUnusual churches
Unusual churches
 

Similaire à Learning from Web Activity

NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Sciencejakehofman
 
Visualization: ACS Sp 2010 CINF Keynote
Visualization: ACS Sp 2010 CINF KeynoteVisualization: ACS Sp 2010 CINF Keynote
Visualization: ACS Sp 2010 CINF KeynoteLiz Dorland
 
Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"summersocialwebshop
 
Accomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsAccomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsDereck Downing
 
Micah Allen: Zombies or Cyborgs: Is Facebook eating your brain?
Micah Allen: Zombies or Cyborgs: Is Facebook eating your brain?Micah Allen: Zombies or Cyborgs: Is Facebook eating your brain?
Micah Allen: Zombies or Cyborgs: Is Facebook eating your brain?Seismonaut
 
A Future for Education: Some Core Thoughts
A Future for Education: Some Core ThoughtsA Future for Education: Some Core Thoughts
A Future for Education: Some Core ThoughtsJack Park
 
Towards Supporting Data-Intensive Research
Towards Supporting Data-Intensive ResearchTowards Supporting Data-Intensive Research
Towards Supporting Data-Intensive ResearchJano van Hemert
 
Advanced Data Mining and Integration Research for Europe (ADMIRE)
Advanced Data Mining and Integration Research for Europe (ADMIRE)Advanced Data Mining and Integration Research for Europe (ADMIRE)
Advanced Data Mining and Integration Research for Europe (ADMIRE)Jano van Hemert
 
Enlisting the Use of Educated Volunteers at a Distance -- or, why Crowdsourci...
Enlisting the Use of Educated Volunteers at a Distance -- or, why Crowdsourci...Enlisting the Use of Educated Volunteers at a Distance -- or, why Crowdsourci...
Enlisting the Use of Educated Volunteers at a Distance -- or, why Crowdsourci...andrea thomer
 
The Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicThe Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicDavid De Roure
 
Biotracker Presentation-Technology Mediated Social Participation
Biotracker Presentation-Technology Mediated Social ParticipationBiotracker Presentation-Technology Mediated Social Participation
Biotracker Presentation-Technology Mediated Social ParticipationBiotrackers
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBryan Heidorn
 
There is No Intelligent Life Down Here
There is No Intelligent Life Down HereThere is No Intelligent Life Down Here
There is No Intelligent Life Down HerePhilip Bourne
 
Discover Data Portal
Discover Data PortalDiscover Data Portal
Discover Data PortalTom Loughran
 
Data Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or PerishData Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or PerishCarly Strasser
 
EarthCube Stakeholder Alignment Survey Introduction to the Data by Joel Cutch...
EarthCube Stakeholder Alignment Survey Introduction to the Data by Joel Cutch...EarthCube Stakeholder Alignment Survey Introduction to the Data by Joel Cutch...
EarthCube Stakeholder Alignment Survey Introduction to the Data by Joel Cutch...EarthCube
 
Understanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceUnderstanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceAndrew Sallans
 

Similaire à Learning from Web Activity (20)

NYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social ScienceNYC Data Science Meetup: Computational Social Science
NYC Data Science Meetup: Computational Social Science
 
Dynamic Nature of Knowledge
Dynamic Nature of KnowledgeDynamic Nature of Knowledge
Dynamic Nature of Knowledge
 
Visualization: ACS Sp 2010 CINF Keynote
Visualization: ACS Sp 2010 CINF KeynoteVisualization: ACS Sp 2010 CINF Keynote
Visualization: ACS Sp 2010 CINF Keynote
 
Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"Elizabeth Churchill, "Data by Design"
Elizabeth Churchill, "Data by Design"
 
Accomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In BioinformaticsAccomplishments And Challenges In Bioinformatics
Accomplishments And Challenges In Bioinformatics
 
Micah Allen: Zombies or Cyborgs: Is Facebook eating your brain?
Micah Allen: Zombies or Cyborgs: Is Facebook eating your brain?Micah Allen: Zombies or Cyborgs: Is Facebook eating your brain?
Micah Allen: Zombies or Cyborgs: Is Facebook eating your brain?
 
A Future for Education: Some Core Thoughts
A Future for Education: Some Core ThoughtsA Future for Education: Some Core Thoughts
A Future for Education: Some Core Thoughts
 
Towards Supporting Data-Intensive Research
Towards Supporting Data-Intensive ResearchTowards Supporting Data-Intensive Research
Towards Supporting Data-Intensive Research
 
Advanced Data Mining and Integration Research for Europe (ADMIRE)
Advanced Data Mining and Integration Research for Europe (ADMIRE)Advanced Data Mining and Integration Research for Europe (ADMIRE)
Advanced Data Mining and Integration Research for Europe (ADMIRE)
 
Enlisting the Use of Educated Volunteers at a Distance -- or, why Crowdsourci...
Enlisting the Use of Educated Volunteers at a Distance -- or, why Crowdsourci...Enlisting the Use of Educated Volunteers at a Distance -- or, why Crowdsourci...
Enlisting the Use of Educated Volunteers at a Distance -- or, why Crowdsourci...
 
The Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicThe Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and Music
 
Biotracker Presentation-Technology Mediated Social Participation
Biotracker Presentation-Technology Mediated Social ParticipationBiotracker Presentation-Technology Mediated Social Participation
Biotracker Presentation-Technology Mediated Social Participation
 
Biodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary ChallengeBiodiversity Informatics: An Interdisciplinary Challenge
Biodiversity Informatics: An Interdisciplinary Challenge
 
There is No Intelligent Life Down Here
There is No Intelligent Life Down HereThere is No Intelligent Life Down Here
There is No Intelligent Life Down Here
 
E research overview gahegan bioinformatics workshop 2010
E research overview gahegan bioinformatics workshop 2010E research overview gahegan bioinformatics workshop 2010
E research overview gahegan bioinformatics workshop 2010
 
Discover Data Portal
Discover Data PortalDiscover Data Portal
Discover Data Portal
 
Data Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or PerishData Publication for UC Davis Publish or Perish
Data Publication for UC Davis Publish or Perish
 
Cybersociety
CybersocietyCybersociety
Cybersociety
 
EarthCube Stakeholder Alignment Survey Introduction to the Data by Joel Cutch...
EarthCube Stakeholder Alignment Survey Introduction to the Data by Joel Cutch...EarthCube Stakeholder Alignment Survey Introduction to the Data by Joel Cutch...
EarthCube Stakeholder Alignment Survey Introduction to the Data by Joel Cutch...
 
Understanding the Big Picture of e-Science
Understanding the Big Picture of e-ScienceUnderstanding the Big Picture of e-Science
Understanding the Big Picture of e-Science
 

Plus de jakehofman

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2jakehofman
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1jakehofman
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networksjakehofman
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classificationjakehofman
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationjakehofman
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1jakehofman
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in Rjakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overviewjakehofman
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systemsjakehofman
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayesjakehofman
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scalejakehofman
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Countingjakehofman
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studiesjakehofman
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classificationjakehofman
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regressionjakehofman
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experimentsjakehofman
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wranglingjakehofman
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIjakehofman
 

Plus de jakehofman (20)

Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
Modeling Social Data, Lecture 12: Causality & Experiments, Part 2
 
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
Modeling Social Data, Lecture 11: Causality and Experiments, Part 1
 
Modeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: NetworksModeling Social Data, Lecture 10: Networks
Modeling Social Data, Lecture 10: Networks
 
Modeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: ClassificationModeling Social Data, Lecture 8: Classification
Modeling Social Data, Lecture 8: Classification
 
Modeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalizationModeling Social Data, Lecture 7: Model complexity and generalization
Modeling Social Data, Lecture 7: Model complexity and generalization
 
Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1Modeling Social Data, Lecture 6: Regression, Part 1
Modeling Social Data, Lecture 6: Regression, Part 1
 
Modeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at ScaleModeling Social Data, Lecture 4: Counting at Scale
Modeling Social Data, Lecture 4: Counting at Scale
 
Modeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in RModeling Social Data, Lecture 3: Data manipulation in R
Modeling Social Data, Lecture 3: Data manipulation in R
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: OverviewModeling Social Data, Lecture 1: Overview
Modeling Social Data, Lecture 1: Overview
 
Modeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation SystemsModeling Social Data, Lecture 8: Recommendation Systems
Modeling Social Data, Lecture 8: Recommendation Systems
 
Modeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive BayesModeling Social Data, Lecture 6: Classification with Naive Bayes
Modeling Social Data, Lecture 6: Classification with Naive Bayes
 
Modeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at ScaleModeling Social Data, Lecture 3: Counting at Scale
Modeling Social Data, Lecture 3: Counting at Scale
 
Modeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to CountingModeling Social Data, Lecture 2: Introduction to Counting
Modeling Social Data, Lecture 2: Introduction to Counting
 
Modeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case StudiesModeling Social Data, Lecture 1: Case Studies
Modeling Social Data, Lecture 1: Case Studies
 
Computational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: ClassificationComputational Social Science, Lecture 13: Classification
Computational Social Science, Lecture 13: Classification
 
Computational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: RegressionComputational Social Science, Lecture 11: Regression
Computational Social Science, Lecture 11: Regression
 
Computational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online ExperimentsComputational Social Science, Lecture 10: Online Experiments
Computational Social Science, Lecture 10: Online Experiments
 
Computational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data WranglingComputational Social Science, Lecture 09: Data Wrangling
Computational Social Science, Lecture 09: Data Wrangling
 
Computational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part IIComputational Social Science, Lecture 08: Counting Fast, Part II
Computational Social Science, Lecture 08: Counting Fast, Part II
 

Dernier

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 

Dernier (20)

Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 

Learning from Web Activity

  • 1. Learning from Web Activity Jake Hofman Joint work with Irmak Sirer & Sharad Goel Yahoo! Research August 1, 2011 Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 1 / 42
  • 2. Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 2 / 42
  • 3. “... applies social sciences techniques, including algorithmic game theory, economics, network analysis, psychology, ethnography, and mechanism design, to online situations.” http://research.yahoo.com Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 3 / 42
  • 4. possible to estimate accurately the age of ori- biodiversification event in the Paleogene (65 to 8. M. Foote, in Evolutionary Patterns, J. B. C. Jackson et al., Eds. (Univ. of Chicago Press, Chicago, IL, 2001), vol. 245, gin of almost all extant genera. It is then possi- 23 million years ago) that is perhaps not yet pp. 245–295. ble to plot a backward survivorship curve (8) captured in Alroy et al.’s database (5, 7). The 9. M. D. Spalding et al., Bioscience 57, 573 (2007). for each of the 27 global bivalve provinces (9). jury is still out on what may have caused this 10. S. M. Stanley, Paleobiology 33, 1 (2007). 11. M. J. Benton, B. C. Emerson, Palaeontology 50, 23 (2007). On the basis of these curves, Krug et al. find event. But we should not lose sight of the fact that origination rates of marine bivalves in- that the steep rise to prominence of many mod- 10.1126/science.1169410 SOCIAL SCIENCE A field is emerging that leverages the capacity to collect and analyze data at a Computational Social Science scale that may reveal patterns of individual and group behaviors. David Lazer,1 Alex Pentland,2 Lada Adamic,3 Sinan Aral,2,4 Albert-László Barabási,5 Devon Brewer,6 Nicholas Christakis,1 Noshir Contractor,7 James Fowler,8 Myron Gutmann,3 Tony Jebara,9 Gary King,1 Michael Macy,10 Deb Roy,2 Marshall Van Alstyne2,11 W“... a computational social science is emerging that e live life in the network. We check ment agencies such as the U.S. National Secur- critiqued or replicated. Neither scenario will our e-mails regularly, make mobile ity Agency. Computational social science could serve the long-term public interest of accumu- phone calls from almost any loca- become the exclusive domain of private com- lating, verifying, and disseminating knowledge. tion, swipe transit cards to use public trans- panies and government agencies. Alternatively, What value might a computational social leverages the capacity to collect and analyze data with an portation, and make purchases with credit cards. Our movements in public places may be there might emerge a privileged set of aca- demic researchers presiding over private data science—based in an open academic environ- ment—offer society, by enhancing understand- unprecedented breadth and depth and scale ...” captured by video cameras, and our medical records stored as digital files. We may post blog from which they produce papers that cannot be ing of individuals and collectives? What are the entries accessible to anyone, or maintain friend- ships through online social networks. Each of these transactions leaves digital traces that can http://sciencemag.org/content/323/5915/721 be compiled into comprehensive pictures of both individual and group behavior, with the potential to transform our understanding of our lives, organizations, and societies. The capacity to collect and analyze massive amounts of data has transformed such fields as biology and physics. But the emergence of a data-driven “computational social science” has been much slower. Leading journals in eco- nomics, sociology, and political science show little evidence of this field. But computational social science is occurring—in Internet compa- nies such as Google and Yahoo, and in govern- Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 4 / 42
  • 5. possible to estimate accurately the age of ori- biodiversification event in the Paleogene (65 to 8. M. Foote, in Evolutionary Patterns, J. B. C. Jackson et al., Eds. (Univ. of Chicago Press, Chicago, IL, 2001), vol. 245, gin of almost all extant genera. It is then possi- 23 million years ago) that is perhaps not yet pp. 245–295. ble to plot a backward survivorship curve (8) captured in Alroy et al.’s database (5, 7). The 9. M. D. Spalding et al., Bioscience 57, 573 (2007). for each of the 27 global bivalve provinces (9). jury is still out on what may have caused this 10. S. M. Stanley, Paleobiology 33, 1 (2007). 11. M. J. Benton, B. C. Emerson, Palaeontology 50, 23 (2007). On the basis of these curves, Krug et al. find event. But we should not lose sight of the fact that origination rates of marine bivalves in- that the steep rise to prominence of many mod- 10.1126/science.1169410 SOCIAL SCIENCE A field is emerging that leverages the capacity to collect and analyze data at a Computational Social Science scale that may reveal patterns of individual and group behaviors. David Lazer,1 Alex Pentland,2 Lada Adamic,3 Sinan Aral,2,4 Albert-László Barabási,5 Devon Brewer,6 Nicholas Christakis,1 Noshir Contractor,7 James Fowler,8 Myron Gutmann,3 Tony Jebara,9 Gary King,1 Michael Macy,10 Deb Roy,2 Marshall Van Alstyne2,11 W“... shares with other nascent interdisciplinary fields e live life in the network. We check ment agencies such as the U.S. National Secur- critiqued or replicated. Neither scenario will our e-mails regularly, make mobile ity Agency. Computational social science could serve the long-term public interest of accumu- phone calls from almost any loca- become the exclusive domain of private com- lating, verifying, and disseminating knowledge. tion, swipe transit cards to use public trans- panies and government agencies. Alternatively, What value might a computational social (e.g., sustainability science) the need to develop a portation, and make purchases with credit cards. Our movements in public places may be there might emerge a privileged set of aca- demic researchers presiding over private data science—based in an open academic environ- ment—offer society, by enhancing understand- paradigm for training new scholars ...” captured by video cameras, and our medical records stored as digital files. We may post blog from which they produce papers that cannot be ing of individuals and collectives? What are the entries accessible to anyone, or maintain friend- ships through online social networks. Each of these transactions leaves digital traces that can http://sciencemag.org/content/323/5915/721 be compiled into comprehensive pictures of both individual and group behavior, with the potential to transform our understanding of our lives, organizations, and societies. The capacity to collect and analyze massive amounts of data has transformed such fields as biology and physics. But the emergence of a data-driven “computational social science” has been much slower. Leading journals in eco- nomics, sociology, and political science show little evidence of this field. But computational social science is occurring—in Internet compa- nies such as Google and Yahoo, and in govern- Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 4 / 42
  • 6. The clean real story “ We have a habit in writing articles published in scientific journals to make the work as finished as possible, to cover all the tracks, to not worry about the blind alleys or to describe how you had the wrong idea first, and so on. So there isn’t any place to publish, in a dignified manner, what you actually did in order to get to do the work ...” -Richard Feynman Nobel Lecture 1 , 1965 1 http://bit.ly/feynmannobel Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 5 / 42
  • 7. Outline • The clean story • The real story • Lessons learned Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 6 / 42
  • 8. Demographic diversity on the Web with Irmak Sirer and Sharad Goel The clean story (covering our tracks) Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 7 / 42
  • 9. Motivation Previous work is largely survey-based and focuses and group-level differences in online access Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 8 / 42
  • 10. Motivation “As of January 1997, we estimate that 5.2 million African Americans and 40.8 million whites have ever used the Web, and that 1.4 million African Americans and 20.3 million whites used the Web in the past week.” -Hoffman & Novak (1998) Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 8 / 42
  • 11. Motivation Chang, et. al., ICWSM (2010) Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 9 / 42
  • 12. Motivation Focus on activity instead of access How diverse is the Web? To what extent do online experiences vary across demographic groups? Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 10 / 42
  • 13. Data • Representative sample of 265,000 individuals in the US, paid via the Nielsen MegaPanel2 • Log of anonymized, complete browsing activity from June 2009 through May 2010 (URLs viewed, timestamps, etc.) • Detailed individual and household demographic information (age, education, income, race, sex, etc.) 2 Special thanks to Mainak Mazumdar Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 11 / 42
  • 14. Data • Transform all demographic attributes to binary variables e.g., Age → Over/Under 25, Race → White/Non-White, Sex → Female/Male Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 12 / 42
  • 15. Data • Transform all demographic attributes to binary variables e.g., Age → Over/Under 25, Race → White/Non-White, Sex → Female/Male • Normalize pageviews to at most three domain levels, sans www e.g. www.yahoo.com → yahoo.com, us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 12 / 42
  • 16. Data • Transform all demographic attributes to binary variables e.g., Age → Over/Under 25, Race → White/Non-White, Sex → Female/Male • Normalize pageviews to at most three domain levels, sans www e.g. www.yahoo.com → yahoo.com, us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com • Restrict to top 100k (out of 9M+ total) most popular sites (by unique visitors) Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 12 / 42
  • 17. Data • Transform all demographic attributes to binary variables e.g., Age → Over/Under 25, Race → White/Non-White, Sex → Female/Male • Normalize pageviews to at most three domain levels, sans www e.g. www.yahoo.com → yahoo.com, us.mg2.mail.yahoo.com/neo/launch → mail.yahoo.com • Restrict to top 100k (out of 9M+ total) most popular sites (by unique visitors) • Aggregate activity at the site, group, and user levels Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 12 / 42
  • 18. Site-level skew How diverse are site audiences? • For each site and attribute, calculate the skew in visitors (e.g., 93% of pageviews on Density foxnews.com are by White users) • For each attribute, plot the distribution of visitor skew across all sites 0.0 0.2 0.4 0.6 0.8 1.0 Proportion White Visitors Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 13 / 42
  • 19. Site-level skew Density Density Density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Proportion Female Visitors Proportion White Visitors Proportion College Educated Visitors Density Density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Proportion of Visitors With Proportion Adult Visitors Household Incomes Under $50,000 Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 14 / 42
  • 20. Site-level skew Many sites have skew close the overall mean, but there also popular, highly-skewed sites Greater Than 90% Less Than 10% youravon.com coveritlive.com Female collectionsetc.com needlive.com foxnews.com blackplanet.com White wunderground.com mediatakeout.com news.google.com slumz.boxden.com College Educated nytimes.com sythe.com mail.yahoo.com nanowrimo.org Over 25 Years Old apps.facebook.com cbox.ws Household Income scarleteen.com opentable.com Under $50,000 boards.adultswim.com marketwatch.com Table 1: A selection of popular sites that are homogeneous along various demographic dimensions. visually apparent from Figure 5, there are significant differ- College Female 70 Non−White ! Over 25 ences in how groups distribute their time on the web. These ! ! 60 differences—which, as mentioned above, hold for highly fre- r−Capita Pageviews ! quented sites such as Facebook and YouTube—are in some White No College 50 cases even more pronounced for lower traffic sites. For in- 40 Male stance, the gaming site pogo.com accounts for less than 1% 30 of pageviews among both low and high income users, but Jake Hofman (hofman@yahoo-inc.com) Under 25 Learning from Web Activityusers spend almost twice as much of their time 15 / 42 low income JSM, 2011.08.01
  • 21. Site-level skew This skew persists even when we restrict attention to the top 10k or 1k sites 1.0 1.0 1.0 Proportion College Educated Visitors Proportion Female Site Visitors Proportion White Site Visitors 0.8 0.8 0.8 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0.0 0.0 0.0 101 102 103 104 105 101 102 103 104 105 101 102 103 104 105 Site Rank Site Rank Site Rank 1.0 1.0 Household Incomes Under $50,000 Proportion of Adult Site Visitors 0.8 0.8 Proportion of Visitors With 0.6 0.6 0.4 0.4 0.2 0.2 0.0 0.0 101 102 103 104 105 101 102 103 104 105 Site Rank Site Rank Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 16 / 42
  • 22. Sites vs. ZIPs How do diversity of the online and offline worlds compare? Sites Sites ZIPs ZIPs Density Density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Proportion Female Proportion White Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 17 / 42
  • 23. Sites vs. ZIPs How do diversity of the online and offline worlds compare? Sites Sites ZIPs ZIPs Density Density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Proportion Female Proportion White As expected, neighborhoods are more gender-balanced than sites Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 17 / 42
  • 24. Sites vs. ZIPs How do diversity of the online and offline worlds compare? Sites Sites ZIPs ZIPs Density Density 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Proportion Female Proportion White But sites typically have more racially diverse audiences than neighborhoods have residents Jake Hofman (hofman@yahoo-inc.com) Learning from Web Activity JSM, 2011.08.01 17 / 42