Detecting, Modeling, & Predicting
       User Temporal Intention
            in Social Media
                               Hany M. SalahEldeen
                              Old Dominion University
                                    Department of Computer Science



                              Advisor: Dr. Michael L. Nelson
                          TPDL ‘12 Doctoral Consortium
Hany SalahEldeen & Michael Nelson                              Doctoral Consortium
Let’s break down the title first…



   Detecting, Modeling, & Predicting
       User Temporal Intention
            in Social Media


Scenario 1:
   Jenny reading Jeff’s tweets



Michael Jackson Dies




                          Snapshot on: June 25th 2009
       http://web.archive.org/web/20090625232522/http://www.cnn.com/
Jeff tweets about it…




                                Published on: June 25th 2009
                      https://twitter.com/mdnitehk/status/2333993907
Jenny is off the grid…

  Jeff’s friend Jenny was on vacation in Hawaii for a month.

Jenny starts catching up a month later

  When she came back, she checked Jeff’s tweets and was shocked!

                      Read on: July 26th 2009
                      https://twitter.com/mdnitehk/status/2333993907

Jenny follows the link on July 26th
  She quickly clicked on the link in the tweet…

  CNN page on: July 26th 2009
  http://web.archive.org/web/20090726234411/http://www.cnn.com/

Jenny is confused!

• Implication:
  • Jenny thought Jeff was joking about her favorite singer, and she got
    mad at him.

• Problem:
  • The tweet and the resource it links to have become unsynchronized.

Scenario 2:
   The Egyptian Revolution



The Egyptian Revolution Jan 2011




Reading about it on Storify.com a year later, in March 2012

                     http://storify.com/maq4sure/egypts-revolution

I noticed some shared images are missing




                         http://storify.com/maq4sure/egypts-revolution
Some tweets are still intact




              https://twitter.com/miss_amy_qb/status/32477898581483521
…and some lost their meaning with the disappearance of the images


                   https://twitter.com/aishes/status/32485352102952960
                                                                                Missing ?




               https://twitter.com/omar_chaaban/status/32203697597452289

The tweet remains but the shared image disappeared…




                             http://yfrog.com/h5923xrvbqqvgzj
Cairo… we have a problem!
• Implication:
  • The reader cannot understand what the
    author of the tweet meant because the image
    is not available.

• Problem:
  • The post is available but the linked resource
    (image) is completely missing.


…back to the title



   Detecting, Modeling, & Predicting
       User Temporal Intention
            in Social Media


The Anatomy of a Tweet

  A social post (tweet) is made up of:
  • Author’s username
  • Other user mention
  • Tweet body
  • Hashtag
  • Shortened URL to the resource
  • Shared resource
  • Publishing timestamp
  • Interaction options

3 URIs = 3 Chances to fail




Explanation in MJ’s example
  [Timeline figure: t1, t2, t3, …, tn]

User’s Temporal Intention

  [Figure: intention at share time and at click time, each either implicit or explicit]
  • The focus of our research
  • Out of our scope: purview of Facebook, Twitter, Google, etc.
  • Engineering problem, solved by providing tools:
      – Instrumented shortener
      – Instrumented web client

Sometimes you want a previous version

  The Correct Temporal Intention:
  CNN.com at the closest time to the tweet: 25th June 2009 ~ 7pm
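
As a rough illustration of "the closest time to the tweet", the sketch below picks the nearest archived snapshot and builds its URL. It only assumes the URL pattern visible in the two web.archive.org links on the earlier slides (a 14-digit YYYYMMDDhhmmss timestamp followed by the original URL). The function names and the exact tweet time are hypothetical, chosen for the example; this is not the authors' tooling.

```python
from datetime import datetime

# URL prefix as it appears in the archive links on the earlier slides.
WAYBACK_PREFIX = "http://web.archive.org/web/"


def parse_snapshot_time(stamp: str) -> datetime:
    """Parse a 14-digit YYYYMMDDhhmmss snapshot timestamp."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


def closest_snapshot(stamps: list[str], target: datetime) -> str:
    """Return the snapshot timestamp nearest in time to `target`."""
    if not stamps:
        raise ValueError("no snapshots to choose from")
    return min(stamps, key=lambda s: abs(parse_snapshot_time(s) - target))


def wayback_url(stamp: str, original_url: str) -> str:
    """Build an archive URL from a snapshot timestamp and the original URL."""
    return f"{WAYBACK_PREFIX}{stamp}/{original_url}"


# The two CNN snapshots cited in Scenario 1.
snapshots = ["20090625232522", "20090726234411"]
# Assumed tweet time for illustration only (the slide says "~ 7pm").
tweet_time = datetime(2009, 6, 25, 23, 0, 0)

best = closest_snapshot(snapshots, tweet_time)
print(wayback_url(best, "http://www.cnn.com/"))
# prints http://web.archive.org/web/20090625232522/http://www.cnn.com/
```

A reader like Jenny, clicking a month later, would then be offered the June 25th snapshot instead of the July 26th page.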
Sometimes you want the current version

  The Correct Temporal Intention:
  In this case, the current state of the press releases page

Research Question

          Can we estimate the users’
        intention at the time of posting
           and reading to predict and
        maintain temporal consistency?



Research Goals
  • Detect the temporal intention of:
        1.     The author at sharing time
        2.     The reader at dereferencing time
  • Model this intention as a function of time, nature of the resource,
       and its context.
  • Predict how resources change with time and the intention behind
       sharing them to minimize inconsistency.
  • Implement the prediction model to automatically preserve
       vulnerable social content that is prone to change or loss.
  • Create an environment implementing this framework that
       provides a smooth temporal navigation of the social web.

Related Work
  • User’s Web Search Intention
      – A. Ashkan ECIR ’09
      – C. Lee AINA ’05
      – A. Loser IRSW ’08
      – L. Azzopardi ECIR ’09
      – R. Baeza-Yates SPIRE ’06
      – N. Dai HT ’11
  • Commercial Intention
      – Q. Guo SIGIR ’10
      – A. Benczur AIRWeb ’07
  • Sentiment Analysis
      – G. Mishne AAAI ’06
      – J. Bollen JCS ’11
  • Access to Archives
      – H. Van de Sompel OR ’09
  • Persistence of shared resources
      – M. Nelson D-Lib ’02
      – R. Sanderson OR ’11
      – F. McCown JCDL ’07
  • URL Shortening
      – D. Antoniades WWW ’11
  • Tweeting, Micro-blogging and Popularity
      – S. Wu WWW ’11
      – A. Java SNA-KDD ’07
      – H. Kwak WWW ’10
  • Social Networks Growth and Evolution
      – B. Meeder WWW ’11

Dissertation Plan
  BEGIN
    • Read Literature
    • Collect Datasets
    • Analyze Archives Coverage
    • Analyze Shortened URIs
    • Prototype Application
    • Analyze Shared Resources Persistence and Coverage
    • Analyze Contextual Intention   ← Current State
    • Create Intention-based dataset
    • Extract Intention Features
    • Train a Parametric Model to predict intention
    • Evaluate, test, cross-validate the model
    • Create a mockup application
    • Extend the model to induce preservation
    • Finish Writing the Dissertation
  PhD Defense

Estimating Web Archiving Coverage
  • Goal: Estimate how much of the public web is present in the public archives
    and how many copies are available.
  • Action:
     – Collect 4 datasets from 4 different sources:
              •   Search engine indices
              •   Bit.ly
              •   DMOZ
              •   Delicious
  • Results: [table not reproduced; table courtesy of Ahmed AlSum, JCDL 2011]

  • Publications:
        – How much of the web is archived? JCDL '11

Shortened URI analysis
  •    Goal: Better understand URI shortening and resolution, the effect of
       time on this process, and the correlation between a page’s features
       and characteristics and its resolution.

  •    Action:
        – Collect freshly created bit.ly links
        – Get hourly click logs, rate of change, social networking spread, and other
          contextual information
        – Conduct a longitudinal study

  •    Evaluation:
        – Compare results with the frequency-of-change analysis of Cho and
          Garcia-Molina.
        – Compare results with Antoniades et al., WWW 2011.

Estimating Loss of Shared Resources in Social Media
•   Goal: Estimate how many of the resources shared on social media are lost from
    the current web, and how many are preserved in the public archives.
•   Action:
     – Sample shared resources from 6 public events
     – Events span 3 years
     – Check existence on the current web
     – Check existence in the public archives
     – Find the relation with time
•   Results:
     – After the first year, ~11% are lost
     – After that, we continue to lose about 0.02% per day
•   Publications:
     – A year after the Egyptian revolution, 10% of the social media documentation is gone.
       http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
     – Losing my revolution: How Many Resources Shared on Social Media Have Been
       Lost? TPDL '12
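
The two result figures above can be turned into a back-of-the-envelope estimate. The sketch below is only an illustration: the slide gives the loss at the one-year mark (~11%) and the daily rate afterwards (0.02%), so the straight-line ramp during the first year is an assumption made here, as are the function and constant names.

```python
FIRST_YEAR_LOSS_PCT = 11.0   # from the slide: ~11% lost after the first year
DAILY_LOSS_PCT = 0.02        # from the slide: 0.02% lost per day after that
DAYS_PER_YEAR = 365


def estimated_loss_pct(days_since_share: int) -> float:
    """Estimate the percentage of shared resources lost after a number of days.

    Assumption (not stated on the slide): loss grows linearly during the
    first year until it reaches FIRST_YEAR_LOSS_PCT.
    """
    if days_since_share < 0:
        raise ValueError("days_since_share must be non-negative")
    if days_since_share <= DAYS_PER_YEAR:
        loss = FIRST_YEAR_LOSS_PCT * days_since_share / DAYS_PER_YEAR
    else:
        extra_days = days_since_share - DAYS_PER_YEAR
        loss = FIRST_YEAR_LOSS_PCT + DAILY_LOSS_PCT * extra_days
    return min(loss, 100.0)  # a percentage cannot exceed 100


# Estimated loss two years after sharing: 11 + 0.02 * 365.
print(round(estimated_loss_pct(2 * DAYS_PER_YEAR), 2))
```

Under these assumptions, roughly 18% of shared resources would be gone two years after sharing.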
User Intention Analysis
  •    Goal: Better understand user intention and the factors that affect it,
       and create a new training and testing set.

  •    Action:
        –   Get a sample set of tweets selected at random
        –   Extract the URIs
        –   Get the closest Memento
        –   Download the snapshot and the current version
        –   Use Amazon’s Mechanical Turk to choose the best version

  •    Evaluation:
        – Measure inter-rater agreement and confidence.
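
One simple way to look at the Mechanical Turk judgments is sketched below: take the majority label per tweet and measure how often pairs of raters agree. The slides do not say which agreement statistic is used, so pairwise percent agreement is an assumption here, and the labels, tweet IDs, and function names are made up for the example.

```python
from collections import Counter
from itertools import combinations


def majority_label(labels: list[str]) -> str:
    """Return the most frequent label among the raters for one item."""
    if not labels:
        raise ValueError("at least one rating is required")
    return Counter(labels).most_common(1)[0][0]


def pairwise_agreement(labels: list[str]) -> float:
    """Fraction of rater pairs that chose the same label for one item."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        raise ValueError("at least two ratings are required")
    return sum(a == b for a, b in pairs) / len(pairs)


# Hypothetical ratings: five workers per tweet choose which version of the
# linked page best matches the tweet ("archived" snapshot or "current" page).
ratings = {
    "tweet-1": ["archived", "archived", "archived", "current", "archived"],
    "tweet-2": ["current", "current", "archived", "current", "current"],
}

for tweet_id, labels in ratings.items():
    print(tweet_id, majority_label(labels), pairwise_agreement(labels))
```

Items with low agreement could be dropped or re-rated before they enter the training set; a chance-corrected statistic would be a natural refinement.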



Proposed Work
  •    Data Gathering
  •    Feature Extraction
  •    Modeling the intention engine
  •    Evaluation
  •    Application: Prediction and Preservation




Possible Solution for Jenny

  The resource has changed since the last time it was shared.
  Do you wish to see the version the author intended or the current version?

  [ Current Version ]   [ Intended Version ]

Proposed Framework

  Feature Extraction → Classifier → Archived Version or Current Version

  Example Features:
  - Tweet Content
  - Click Logs
  - Other Tweets
  - Shared Resource
  - Timemaps
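
To make the shape of the framework concrete, here is a toy sketch of the two stages. Everything in it is hypothetical: the `SharedLink` fields, the cue-word list, and the hand-written rule standing in for the classifier are placeholders, since the slides propose training a model on a labeled dataset and do not specify features or rules at this level.

```python
import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SharedLink:
    """Hypothetical inputs for one shared link (not the authors' schema)."""
    tweet_text: str
    shared_at: datetime
    clicked_at: datetime
    memento_count: int  # archived copies listed in the resource's timemap


# Made-up cue words suggesting the tweet refers to a moment in time.
TIME_SENSITIVE = re.compile(r"\b(breaking|just|now|today|dies|died)\b", re.IGNORECASE)


def extract_features(link: SharedLink) -> dict[str, float]:
    """Stage 1: turn the tweet and its context into numeric features."""
    age = link.clicked_at - link.shared_at
    return {
        "time_sensitive_words": float(len(TIME_SENSITIVE.findall(link.tweet_text))),
        "age_days": age.total_seconds() / 86400.0,
        "has_memento": 1.0 if link.memento_count > 0 else 0.0,
    }


def classify(features: dict[str, float]) -> str:
    """Stage 2: placeholder rule standing in for a trained classifier."""
    wants_past = (
        features["has_memento"] > 0
        and features["time_sensitive_words"] > 0
        and features["age_days"] >= 1
    )
    return "archived" if wants_past else "current"


# Illustrative example loosely based on Scenario 1; the tweet text is invented.
link = SharedLink(
    tweet_text="Breaking: Michael Jackson dies http://example.com/short",
    shared_at=datetime(2009, 6, 25, 23, 0),
    clicked_at=datetime(2009, 7, 26, 23, 0),
    memento_count=1,
)
print(classify(extract_features(link)))  # prints archived
```

In the proposed work, the rule in `classify` would be replaced by a model trained and cross-validated on the intention dataset.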




Extra Slides


Archive Shortener Application




Estimating Shared Resources Loss in Social Media




My Publications
 •    S. G. Ainsworth, A. Alsum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How much
      of the web is archived? In Proceedings of the 11th annual international ACM/IEEE
      joint conference on Digital libraries, JCDL '11, pages 133-136, 2011.

 •    H. SalahEldeen and M. L. Nelson. Losing my revolution: How many resources shared on
      social media have been lost? Accepted at TPDL 2012.

 •    H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian
      revolution, 10% of the social media documentation is gone.
      http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html




References
 •    D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short
      urls. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 715-724, New York, NY, USA,
      2011. ACM.
 •    A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31st
      European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 578-586, Berlin, Heidelberg, 2009.
      Springer-Verlag.
 •    L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In
      Proceedings DIR-2006, 2006.
 •    R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and
      M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages
      98-109. Springer Berlin / Heidelberg, 2006. doi:10.1007/11880561_9.
 •    A. Benczur, I. Biro, K. Csalogany, and T. Sarlos. Web spam detection via commercial intent analysis. In Proceedings of the 3rd
      international workshop on Adversarial information retrieval on the web, AIRWeb '07, pages 89-92, New York, NY, USA, 2007.
      ACM.
 •    J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010.
 •    N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM
      conference on Hypertext and hypermedia, HT '11, pages 17-26, New York, NY, USA, 2011. ACM.
 •    N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference
      companion on World wide web, WWW '11, pages 29-30, New York, NY, USA, 2011. ACM.
 •    K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques
      coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in
      Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187-206. Springer Berlin /
      Heidelberg, 2007. doi:10.1007/978-3-540-77485-3_11.



 •    Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd
      international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130-137, New York, NY, USA,
      2010. ACM.
 •    A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th
      WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, pages 56-65, New York, NY,
      USA, 2007. ACM.
 •    H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international
      conference on World wide web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM.
 •    C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced
      Information Networking and Applications - Volume 2, AINA '05, pages 535-540, Washington, DC, USA, 2005. IEEE Computer Society.
 •    A. Loser, W. M. Barczynski, and F. Brauer. What's the intention behind your query? a few observations from a large developer community.
      In IRSW, 2008.
 •    F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL '07: Proceedings of
      the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39-48, 2007.
 •    B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times
      in twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 517-526, New York, NY, USA, 2011.
      ACM.
 •    G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing
      Weblogs (AAAI-CAAW), 2006.
 •    M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002.
 •    R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento. CoRR,
      abs/1105.3459, 2011.
 •    H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web. CoRR,
      abs/0911.1112, 2009.
 •    S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international
      conference on World wide web, WWW '11, pages 705-714, New York, NY, USA, 2011. ACM.



Hany SalahEldeen & Michael Nelson                                                              Doctoral Consortium

Contenu connexe

Similaire à Tpdl Doctoral consortium 2012

Social Media Recruiting
Social Media RecruitingSocial Media Recruiting
Social Media RecruitingJoseph Fung
 
Facilitating user research - being aware of bias and techniques to overcome it
Facilitating user research - being aware of bias and techniques to overcome itFacilitating user research - being aware of bias and techniques to overcome it
Facilitating user research - being aware of bias and techniques to overcome itNicola Dobiecka
 
Reading the Correct History? Modeling Temporal Intention in Resource Sharing
Reading the Correct History? Modeling Temporal Intention in Resource SharingReading the Correct History? Modeling Temporal Intention in Resource Sharing
Reading the Correct History? Modeling Temporal Intention in Resource Sharingheinestien
 
Kelleher s power_point
Kelleher s power_pointKelleher s power_point
Kelleher s power_pointskelleher2011
 
21st Century Influencer: Finding the Vital Behaviors to Flatten Your Classroom
21st Century Influencer: Finding the Vital Behaviors to Flatten Your Classroom21st Century Influencer: Finding the Vital Behaviors to Flatten Your Classroom
21st Century Influencer: Finding the Vital Behaviors to Flatten Your ClassroomVicki Davis
 
Creativity presentation 2015
Creativity presentation 2015Creativity presentation 2015
Creativity presentation 2015tosheilajones
 
Herdsa.2013.wicked issues.bubbles
Herdsa.2013.wicked issues.bubblesHerdsa.2013.wicked issues.bubbles
Herdsa.2013.wicked issues.bubblesMerilyn Childs
 
Social Media and Faith Organisations
Social Media and Faith OrganisationsSocial Media and Faith Organisations
Social Media and Faith OrganisationsBex Lewis
 
TED Talks for Transformative Teaching Part 2
TED Talks for Transformative Teaching Part 2TED Talks for Transformative Teaching Part 2
TED Talks for Transformative Teaching Part 2Lisa Rubenstein
 
Launch to New Heights Using Social Media
Launch to New Heights Using Social MediaLaunch to New Heights Using Social Media
Launch to New Heights Using Social MediaStephanie Schierholz
 
Facing up to Frustration: Taking Control of Learning
Facing up to Frustration: Taking Control of LearningFacing up to Frustration: Taking Control of Learning
Facing up to Frustration: Taking Control of LearningLaura Sagert
 
The Social Media Triage - Maximising your presence
The Social Media Triage - Maximising your presenceThe Social Media Triage - Maximising your presence
The Social Media Triage - Maximising your presenceCharles Darwin University
 
Facilitating the Adult Learner
Facilitating the Adult LearnerFacilitating the Adult Learner
Facilitating the Adult LearnerMarian Willeke
 
Deep learning in the Age of Distraction
Deep learning in the Age of DistractionDeep learning in the Age of Distraction
Deep learning in the Age of DistractionAlec Couros
 
2012 NWA Social Media Boot Camp Sponsored by The Weather Channel
2012 NWA Social Media Boot Camp Sponsored by The Weather Channel2012 NWA Social Media Boot Camp Sponsored by The Weather Channel
2012 NWA Social Media Boot Camp Sponsored by The Weather ChannelTiffany Sunday
 
CNW Presents... The New PR: Creating & Curating Trusted Content from @CraigSi...
CNW Presents... The New PR: Creating & Curating Trusted Content from @CraigSi...CNW Presents... The New PR: Creating & Curating Trusted Content from @CraigSi...
CNW Presents... The New PR: Creating & Curating Trusted Content from @CraigSi...CNW Group
 
Jasmine townsendsece275powerpoint
Jasmine townsendsece275powerpointJasmine townsendsece275powerpoint
Jasmine townsendsece275powerpointCMoondog
 

Similaire à Tpdl Doctoral consortium 2012 (20)

Social Media Recruiting
Social Media RecruitingSocial Media Recruiting
Social Media Recruiting
 
Facilitating user research - being aware of bias and techniques to overcome it
Facilitating user research - being aware of bias and techniques to overcome itFacilitating user research - being aware of bias and techniques to overcome it
Facilitating user research - being aware of bias and techniques to overcome it
 

TPDL Doctoral Consortium 2012

  • 1. Detecting, Modeling, & Predicting User Temporal Intention in Social Media Hany M. SalahEldeen Old Dominion University Department of Computer Science Advisor: Dr. Michael L. Nelson TPDL ‘12 Doctoral Consortium Hany SalahEldeen & Michael Nelson Doctoral Consortium
  • 2. Let’s break down the title first… Detecting, Modeling, & Predicting User Temporal Intention in Social Media
  • 5. Scenario 1: Jenny reading Jeff’s tweets
  • 6. Michael Jackson Dies Snapshot on: June 25th 2009 http://web.archive.org/web/20090625232522/http://www.cnn.com/
  • 7. Jeff tweets about it… Published on: June 25th 2009 https://twitter.com/mdnitehk/status/2333993907
  • 8. Jenny is off the grid… Jeff’s friend Jenny was on a vacation in Hawaii for a month
  • 9. Jenny starts catching up a month later. When she came back, she checked Jeff’s tweets and was shocked! Read on: July 26th 2009 https://twitter.com/mdnitehk/status/2333993907
  • 10. Jenny follows the link on July 26th. She quickly clicked on the link in the tweet… CNN page on: July 26th 2009 http://web.archive.org/web/20090726234411/http://www.cnn.com/
  • 11. Jenny is confused!
      • Implication: Jenny thought Jeff was making a joke about her favorite singer, and she got mad at him.
      • Problem: The tweet and the resource the tweet links to have become unsynchronized.
  • 12. Scenario 2: The Egyptian Revolution
  • 13. The Egyptian Revolution Jan 2011
  • 14. Reading about it in Storify.com a year later in March 2012 http://storify.com/maq4sure/egypts-revolution
  • 15. I noticed some shared images are missing http://storify.com/maq4sure/egypts-revolution
  • 16. Some tweets are still intact https://twitter.com/miss_amy_qb/status/32477898581483521
  • 17. …and some lost their meaning with the disappearance of the images https://twitter.com/aishes/status/32485352102952960 Missing ? https://twitter.com/omar_chaaban/status/32203697597452289
  • 18. The tweet remains but the shared image disappeared… http://yfrog.com/h5923xrvbqqvgzj
  • 19. Cairo… we have a problem!
      • Implication: The reader cannot understand what the author of the tweet meant because the image is not available.
      • Problem: The post is available, but the linked resource (the image) is completely missing.
  • 20. …back to the title Detecting, Modeling, & Predicting User Temporal Intention in Social Media
  • 22. The Anatomy of a Tweet
  • 23. The Anatomy of a Tweet [annotated tweet figure; labels: Author’s username; Other user mention; Social Post; Tweet Body; Interaction options; Publishing timestamp; Shortened URL to resource; Hash Tag; Shared Resource]
  • 24. 3 URIs = 3 Chances to fail
  • 25. Explanation in MJ’s example [timeline figure with time points t1, t2, …, tn]
  • 26. User’s Temporal Intention [diagram; labels: The Focus of our research; Share time (Implicit, Explicit), Instrumented shortener; Click time (Implicit, Explicit), Instrumented web client; Out of our scope; Purview of Facebook, Twitter, Google, …etc; Engineering problem, Solved by providing tools]
  • 27. Sometimes you want a previous version The Correct Temporal Intention CNN.com at the closest time to the tweet: 25th June 2009 ~ 7pm
  • 28. Sometimes you want the current version The Correct Temporal Intention In this case the current state of the press releases page
  • 29. Research Question Can we estimate the users’ intention at the time of posting and reading to predict and maintain temporal consistency?
  • 30. Research Goals
      • Detect the temporal intention of:
          1. The author at sharing time
          2. The reader at dereferencing time
      • Model this intention as a function of time, nature of the resource, and its context.
      • Predict how resources change with time and the intention behind sharing them to minimize inconsistency.
      • Implement the prediction model to automatically preserve vulnerable social content that is prone to change or loss.
      • Create an environment implementing this framework that provides a smooth temporal navigation of the social web.
  • 31. Related Work
      • User’s Web Search Intention
          – A. Ashkan ECIR ’09
          – C. Lee AINA ’05
          – A. Loser IRSW ’08
          – L. Azzopardi ECIR ’09
          – R. Baeza-Yates SPIR ’06
          – N. Dai HT ’11
      • Commercial Intention
          – Q. Guo SIGIR ’10
          – A. Benczur AIRWeb ’07
      • Sentiment Analysis
          – G. Mishne AAAI ’06
          – J. Bollen JCS ’11
      • Social Networks Growth and Evolution
          – B. Meeder WWW ’11
      • Persistence of shared resources
          – M. Nelson D-Lib ’02
          – R. Sanderson OR ’11
          – F. McCown JCDL ’07
      • URL Shortening
          – D. Antoniades WWW ’11
      • Tweeting, Micro-blogging and Popularity
          – S. Wu WWW ’11
          – A. Java SNA-KDD ’07
          – H. Kwak WWW ’10
      • Access to Archives
          – H. Van de Sompel OR ’09
  • 32. Dissertation Plan
      – BEGIN
      – Read Literature
      – Collect Datasets
      – Analyze Archives Coverage
      – Analyze Shortened URIs
      – Prototype Application
      – Analyze Shared Resources Persistence and Coverage
      – Analyze Contextual Intention
      – Create Intention-based dataset
      – Extract Intention Features
      – Train a Parametric Model to predict intention
      – Evaluate, test, cross-validate the model
      – Create a mockup application
      – Extend the model to induce preservation
      – Finish Writing the Dissertation
      – PhD Defense
      – Current State marker: shown beside “Analyze Shared Resources Persistence and Coverage” and “Analyze Contextual Intention”
  • 34. Estimating Web Archiving Coverage
      • Goal: Estimate how much of the public web is present in the public archives and how many copies are available.
      • Action: Get 4 different datasets from 4 different sources:
          – Search Engines Indices
          – Bit.ly
          – DMOZ
          – Delicious
      • Results: [table not captured in the transcript; table courtesy of Ahmed AlSum, JCDL 2011]
      • Publications:
          – How much of the web is archived? JCDL '11
  • 36. Shortened URI analysis
      • Goal: Better understand URI shortening and resolving, the effect of time on this process, and the correlation between a page’s features and characteristics and its resolution.
      • Action:
          – Fresh Bit.lys
          – Get hourly clicklogs, rate of change, social networking spread, and other contextual information
          – Longitudinal study
      • Evaluation:
          – Compare results with the frequency-of-change analysis of Cho and Garcia-Molina.
          – Compare results with Antoniades et al., WWW 2011.
  • 38. Estimating Loss of Shared Resources in Social Media
      • Goal: Estimate how much of the public web is present in the public archives and how many copies are available.
      • Action:
          – Sampling from 6 public events
          – Events spanning 3 years
          – Existence in the current web
          – Existence in the public archives
          – Find relation with time
      • Results:
          – After the 1st year, ~11% will be lost
          – After that, we continue to lose 0.02% daily
      • Publications:
          – A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
          – Losing my revolution: How Many Resources Shared on Social Media Have Been Lost? TPDL '12
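The two result figures on slide 38 can be combined into a rough back-of-the-envelope estimate. The sketch below is only an illustration: it assumes the loss after the first year grows linearly at the quoted daily rate, and it declines to extrapolate inside the first year because the slide gives no rate for that period. The function name `estimated_loss_percent` and the constants are hypothetical names introduced here, not the authors' model.

```python
# Figures quoted on slide 38; everything else here is an assumption of this sketch.
FIRST_YEAR_LOSS_PERCENT = 11.0  # "~11% will be lost" after the first year
DAILY_LOSS_PERCENT = 0.02       # "0.02% daily" after the first year
DAYS_IN_FIRST_YEAR = 365


def estimated_loss_percent(days_since_sharing: int) -> float:
    """Rough estimate of the share of resources lost, in percent.

    Assumes a linear continuation after the first year and caps the result
    at 100%. Raises ValueError for ages under one year, since the slide
    does not state how loss accumulates within the first year.
    """
    if days_since_sharing < DAYS_IN_FIRST_YEAR:
        raise ValueError("the slide gives no figure for the first year")
    extra_days = days_since_sharing - DAYS_IN_FIRST_YEAR
    estimate = FIRST_YEAR_LOSS_PERCENT + DAILY_LOSS_PERCENT * extra_days
    return min(100.0, estimate)
```

For example, `estimated_loss_percent(730)` adds one more year of daily loss on top of the first-year figure.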
  • 39. Dissertation Plan
      – BEGIN
      – Read Literature
      – Collect Datasets
      – Analyze Archives Coverage
      – Analyze Shortened URIs
      – Prototype Application
      – Analyze Shared Resources Persistence and Coverage
      – User Intention Analysis
      – Create Intention-based dataset
      – Extract Intention Features
      – Train a Parametric Model to predict intention
      – Evaluate, test, cross-validate the model
      – Create a mockup application
      – Extend the model to induce preservation
      – Finish Writing the Dissertation
      – PhD Defense
  • 40. User Intention Analysis
      • Goal: Better understand user intention and the factors that affect it, and create a new testing and training set.
      • Action:
          – Get a sample set of tweets selected at random
          – Extract the URIs
          – Get the closest Memento
          – Download the snapshot and the current version
          – Use Amazon’s Mechanical Turk to choose the best version
      • Evaluation:
          – Measure cross-rater agreement and confidence.
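The "get the closest Memento" step on slide 40 can be sketched as a nearest-timestamp lookup. This is a minimal illustration, assuming the candidate snapshot timestamps have already been collected (for instance from a TimeMap) as 14-digit strings like those in the web.archive.org URLs on slides 6 and 10, and that the tweet time is expressed in the same time zone. `closest_memento` is a hypothetical helper, not part of any Memento client library, and no network access is involved.

```python
from datetime import datetime

# 14-digit timestamp layout used in web.archive.org snapshot URLs.
WAYBACK_FORMAT = "%Y%m%d%H%M%S"


def closest_memento(snapshot_timestamps, tweet_time):
    """Return the snapshot timestamp nearest in time to tweet_time.

    snapshot_timestamps: iterable of 14-digit strings (assumed pre-fetched).
    tweet_time: naive datetime, assumed to share the snapshots' time zone.
    """
    candidates = list(snapshot_timestamps)
    if not candidates:
        raise ValueError("no snapshots to choose from")
    return min(
        candidates,
        key=lambda ts: abs(datetime.strptime(ts, WAYBACK_FORMAT) - tweet_time),
    )


# Usage with the two CNN snapshots that appear in the slides.
snapshots = ["20090625232522", "20090726234411"]
tweet_time = datetime(2009, 6, 25, 19, 0, 0)  # "25th June 2009 ~ 7pm" (slide 27)
chosen = closest_memento(snapshots, tweet_time)
archived_url = f"http://web.archive.org/web/{chosen}/http://www.cnn.com/"
```

With a reader arriving on July 26th, passing the click time instead of the tweet time would select the later snapshot, which is the mismatch Scenario 1 describes.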
  • 41. Proposed Work • Data Gathering • Feature Extraction • Modeling the intention engine • Evaluation • Application: Prediction and Preservation
  • 42. Possible Solution for Jenny
  • 43. Possible Solution for Jenny: “The resource has changed since the last time it was shared. Do you wish to see the version the author intended or the current version?” Options: Current Version, Intended Version
  • 44. Proposed Framework [diagram; components: Feature Extraction; Classifier; Archived Version; Current Version]
      • Example Features:
          – Tweet Content
          – Click Logs
          – Other Tweets
          – Shared Resource
          – Timemaps
  • 46. Extra Slides
  • 47. Archive Shortener Application
  • 48. Estimating Shared Resources Loss in Social Media
  • 50. My Publications
      • S. G. Ainsworth, A. Alsum, H. SalahEldeen, M. C. Weigle, and M. L. Nelson. How much of the web is archived? In Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries, JCDL '11, pages 133-136, 2011.
      • H. SalahEldeen and M. L. Nelson. Losing my revolution: How much social media content has been lost? Accepted in TPDL 2012.
      • H. SalahEldeen and M. L. Nelson. Losing my revolution: A year after the Egyptian revolution, 10% of the social media documentation is gone. http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
  • 51. References
      • D. Antoniades, I. Polakis, G. Kontaxis, E. Athanasopoulos, S. Ioannidis, E. P. Markatos, and T. Karagiannis. we.b: the web of short urls. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 715-724, New York, NY, USA, 2011. ACM.
      • A. Ashkan, C. L. Clarke, E. Agichtein, and Q. Guo. Classifying and characterizing query intent. In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 578-586, Berlin, Heidelberg, 2009. Springer-Verlag.
      • L. Azzopardi and M. de Rijke. Query intention acquisition: A case study on automatically inferring structured queries. In Proceedings DIR-2006, 2006.
      • R. Baeza-Yates, L. Calderon-Benavides, and C. Gonzalez-Caro. The intention behind web queries. In F. Crestani, P. Ferragina, and M. Sanderson, editors, String Processing and Information Retrieval, volume 4209 of Lecture Notes in Computer Science, pages 98-109. Springer Berlin / Heidelberg, 2006. 10.1007/11880561_9.
      • A. Benczur, I. Biro, K. Csalogany, and T. Sarlos. Web spam detection via commercial intent analysis. In Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, AIRWeb '07, pages 89-92, New York, NY, USA, 2007. ACM.
      • J. Bollen, H. Mao, and X.-J. Zeng. Twitter mood predicts the stock market. CoRR, abs/1010.3003, 2010.
      • N. Dai, X. Qi, and B. D. Davison. Bridging link and query intent to enhance web search. In Proceedings of the 22nd ACM conference on Hypertext and hypermedia, HT '11, pages 17-26, New York, NY, USA, 2011. ACM.
      • N. Dai, X. Qi, and B. D. Davison. Enhancing web search with entity intent. In Proceedings of the 20th international conference companion on World wide web, WWW '11, pages 29-30, New York, NY, USA, 2011. ACM.
      • K. Durant and M. Smith. Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In O. Nasraoui, M. Spiliopoulou, J. Srivastava, B. Mobasher, and B. Masand, editors, Advances in Web Mining and Web Usage Analysis, volume 4811 of Lecture Notes in Computer Science, pages 187-206. Springer Berlin / Heidelberg, 2007. 10.1007/978-3-540-77485-3_11.
  • 52. References
      • Q. Guo and E. Agichtein. Ready to buy or just browsing?: detecting web searcher goals from interaction data. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval, SIGIR '10, pages 130-137, New York, NY, USA, 2010. ACM.
      • A. Java, X. Song, T. Finin, and B. Tseng. Why we twitter: understanding microblogging usage and communities. In Proceedings of the 9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis, WebKDD/SNA-KDD '07, pages 56-65, New York, NY, USA, 2007. ACM.
      • H. Kwak, C. Lee, H. Park, and S. Moon. What is twitter, a social network or a news media? In Proceedings of the 19th international conference on World wide web, WWW '10, pages 591-600, New York, NY, USA, 2010. ACM.
      • C.-H. L. Lee and A. Liu. Modeling the query intention with goals. In Proceedings of the 19th International Conference on Advanced Information Networking and Applications - Volume 2, AINA '05, pages 535-540, Washington, DC, USA, 2005. IEEE Computer Society.
      • A. Loser, W. M. Barczynski, and F. Brauer. What's the intention behind your query? a few observations from a large developer community. In IRSW, 2008.
      • F. McCown, N. Diawara, and M. L. Nelson. Factors affecting website reconstruction from the web infrastructure. In JCDL '07: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 39-48, 2007.
      • B. Meeder, B. Karrer, A. Sayedi, R. Ravi, C. Borgs, and J. Chayes. We know who you followed last summer: inferring social link creation times in twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 517-526, New York, NY, USA, 2011. ACM.
      • G. Mishne. Predicting movie sales from blogger sentiment. In AAAI 2006 Spring Symposium on Computational Approaches to Analysing Weblogs (AAAI-CAAW), 2006.
      • M. L. Nelson and B. D. Allen. Object persistence and availability in digital libraries. D-Lib Magazine, 8(1), 2002.
      • R. Sanderson, M. Phillips, and H. Van de Sompel. Analyzing the persistence of referenced web resources with memento. CoRR, abs/1105.3459, 2011.
      • H. Van de Sompel, M. L. Nelson, R. Sanderson, L. Balakireva, S. Ainsworth, and H. Shankar. Memento: Time travel for the web. CoRR, abs/0911.1112, 2009.
      • S. Wu, J. M. Hofman, W. A. Mason, and D. J. Watts. Who says what to whom on twitter. In Proceedings of the 20th international conference on World wide web, WWW '11, pages 705-714, New York, NY, USA, 2011. ACM.