Crawling Big Data in a New Frontier
   for Socioeconomic Research:
    Testing with Social Tagging
     JUAN DIEGO BORRERO, jdiego@uhu.es
      ESTRELLA GUALDA, estrella@uhu.es
                University of Huelva



       Seminários CIEO - Universidade do Algarve
                Faro, 31 October, 2012
Table of Contents
• 1. Introduction
• 2. Theoretical perspective
   – Web 2.0 and Collaborative tagging
   – Tagging and Folksonomy
   – The collective knowledge inherent in social tags
   – Tagging and Social networks
   – Social Web and its impact on Information Retrieval (IR) and Recommender Systems (RS)
• 3. Methodology
   – 3.1. Data Collection procedure
   – 3.2. Analysis procedure. SNA
• 4. Results
   – 4.1. Centralization: Authority
   – 4.2. Node Tags: Users producing Tags
• 5. Discussion
   – 5.1. Centrality and Power
   – 5.2. Central Tags: Users producing Tags
• 6. Conclusions and future research
1. Introduction
What puzzles?


 1. The era of Big Data and Social Media has begun!
     E.g., Twitter, Facebook, Tumblr, Delicious, YouTube,
     Flickr, Wikipedia…
 2. Will it transform how we study human communication
    and social relations?
 3. Will it alter what ‘research’ means?
     Some or all of the above?




1. Introduction
What puzzles?

 1. Big Data is notable not because of its size, but
    because of its relationality to other data. Big Data is
    fundamentally networked. Its value comes from the
    patterns that can be derived by making connections
    between pieces of data, about an individual, about
    individuals in relation to others, about groups of
    people, or simply about the structure of information
    itself.
 2. Big Data is important because it refers to an analytic
    phenomenon playing out in academia.
 3. Big data is important because of its popular
    salience.
1. Introduction
Tagging

 • New technologies have made it possible for
   a wide range of people to produce, share,
   interact with, and organize data.
 • People can classify the huge amount of
   information at their disposal in the form of
   tags.
1. Introduction
Tagging in Delicious
Keywords, freely chosen by users or suggested by Delicious, employed to annotate various types of digital content.

Source: www.delicious.com
1. Introduction
  Social Tagging Systems
Many users add metadata in the form of tags
Resulting collective tag structure

Source: http://bvdt.tuxic.nl/index.php/the-wisdom-of-the-crowds-in-the-audiovisual-archive-domain/
Source: http://blog.hubspot.com/blog/tabid/6307/bid/7372/9-Reasons-Why-Your-Social-Media-Strategy-Isn-t-Working.aspx/
Source: http://www.idonato.com/2009/05/27/fun-with-tag-clouds/
1. Introduction
Delicious


Delicious is a free social bookmarking website for storing, sharing and discovering web bookmarks.

Source: www.delicious.com
1. Introduction
Our Assumption

 • Big Data offers the humanistic disciplines a new
   way to work on the quantitative side, and it also
   offers another kind of objective method for analysis.
 • In reality, however, working with Big Data is still
   subjective.
 • Because of this, it is crucial to begin asking questions
   about the analytic assumptions, methodological
   frameworks, and underlying biases embedded in
   the Big Data phenomenon.
1. Introduction
Our Objectives

 1. To propose a methodology for using big data
    from Web 2.0 in social research,
 2. To apply it to automatically extract data from
    the Delicious social bookmarking website, and
 3. To show the type of results that this kind of
    analysis can offer to social scientists.
 4. We focus our study on the community around the
    globalization of agriculture, and pay special
    attention to social network analysis (SNA).
2. Theoretical perspective
Web 2.0… and collaborative tagging
Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform (O’Reilly, 2007).

Collaborative – or social – tagging is the activity in the Web 2.0 of annotating digital resources with keywords - tags (Golder and Huberman, 2006; Trant, 2009).

Source: http://www.laurenwood.org/anyway/2007/11/web-20-buzzwords/
2. Theoretical perspective
… collaborative tagging
Collaborative – or social – tagging is the activity in the
Web 2.0 of annotating digital resources with keywords -
tags (Golder and Huberman, 2006; Trant, 2009).

Webpages, photos, videos…

A collaborative tagging system is mainly composed of three interconnected components:
users, tags, and resources (Smith, 2008)
2. Theoretical perspective
… collaborative tagging and folksonomy


Social tagging systems aggregate the tags of all users and describe the resources in a so-called folksonomy (Vander Wal, 2004).

Problems:
• Synonyms: global warming = climate change
• Term variations: globalization = globalisation; poor = poors
2. Theoretical perspective
… folksonomy and collective knowledge
Bottom-up process…
…the tags of many different users are aggregated and the resulting collective tag structure – such as a tag cloud – depicts the collective knowledge of Web users (Cress et al., 2012)

Source: http://blog.cimmyt.org/?p=6052
2. Theoretical perspective
Tagging and social networks
The structure of social tagging websites can be viewed as a
network of three different node types: the U users, the R
resources (web sites – URLs) and the T tags that the U users
deploy to tag the R web sites.

A particular class of networks is the bipartite network,
whose nodes are divided into two sets, e.g. users and tags.

An opinion network (Maslov and Zhang, 2001; Blattner et
al., 2007) is a network in which users connect to the
objects that they gather.

Figure 1. A Bipartite Network made of three users U=(u,u’,u’’),
three tags T=(t,t’,t’’) and two kinds of links: between users RU
(straight lines), and between users and tags RT (dashed lines)
Source: Authors
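As a minimal sketch of the user-tag network idea, the snippet below stores which tags each user has applied and derives user-user links from shared tags. The toy data and the function name `project_users` are hypothetical, and linking users through shared tags is only one possible way to obtain a user-user network; the slides do not state how the authors built theirs.

```python
from itertools import combinations

# Hypothetical toy data mirroring Figure 1: three users, three tags.
user_tags = {
    "u": {"t", "t'"},
    "u'": {"t'", "t''"},
    "u''": {"t''"},
}

def project_users(user_tags):
    """Project the user-tag bipartite network onto the set of users.

    Two users are linked when they share at least one tag; the edge
    weight is the number of tags they have in common.
    """
    weights = {}
    for a, b in combinations(sorted(user_tags), 2):
        shared = user_tags[a] & user_tags[b]
        if shared:
            weights[(a, b)] = len(shared)
    return weights

user_user = project_users(user_tags)
for (a, b), w in user_user.items():
    print(f"{a} -- {b} (shared tags: {w})")
```

In this toy case u and u'' share no tag, so they are only connected indirectly through u'.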
2. Theoretical perspective
Social web and its impact on Information
Retrieval (IR) and Recommender Systems (RS)

 1. From the Social IR point of view (i.e. IR that uses
    folksonomies), algorithms are created for folksonomies
    in order to identify which information is relevant and to
    match communities to their needs. This paper aims to
    present a methodology for retrieving big data from the
    Web 2.0 environment.
 2. We introduce social tagging as a basis for
    recommendations, focusing on the ternary relation
    between users, resources, and tags, to discover latent
    patterns linked to the activity of collaborative tagging,
    which could be fundamental for providing effective
    recommendations to different actors.
3. Methodology

• Data set from: Delicious – www.delicious.com
• Delicious = social bookmarking system:
  – Content is created, annotated and viewed by its
    users.
  – Non-hierarchical classification system: users can tag
    each of their bookmarks on the Delicious website,
    which provides knowledge about the bookmarked URL.
  – Collective nature: users can
     • view bookmarks added or annotated by other users.
     • organize existing tags into groups (tag bundles).
3.1. Data Collection procedure

Collected annotations made in Social Bookmarking Services.
At least four parts:
• 1. Link to the resource (website…)
• 2. One or more tags
• 3. User who makes the annotation
• 4. Moment/ time when the annotation is made

•     This article focuses more on the co-occurrence of users, resources
      and tags (user, resource, tag).

    Dataset collected: $U = \{u_1, u_2, \ldots, u_K\}$, $R = \{r_1, r_2, \ldots, r_M\}$, and $T = \{t_1, t_2, \ldots, t_N\}$
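A minimal sketch of how the four-part annotation and the sets U, R and T could be represented. The `Annotation` type, its field names and the sample records are hypothetical and only illustrate the structure described above.

```python
from typing import NamedTuple

class Annotation(NamedTuple):
    user: str        # user who makes the annotation
    resource: str    # link to the resource (URL)
    tags: tuple      # one or more tags
    timestamp: str   # moment when the annotation is made (ISO date)

# Hypothetical records, for illustration only.
annotations = [
    Annotation("alice", "http://example.org/a", ("food", "trade"), "2011-04-22"),
    Annotation("bob", "http://example.org/a", ("food",), "2011-04-23"),
    Annotation("alice", "http://example.org/b", ("economics",), "2011-05-01"),
]

# Expand each annotation into (user, resource, tag) co-occurrence triples.
triples = {(a.user, a.resource, t) for a in annotations for t in a.tags}

# The sets U, R and T are the distinct users, resources and tags.
U = {u for u, _, _ in triples}
R = {r for _, r, _ in triples}
T = {t for _, _, t in triples}

print(len(triples), "taggings,", len(U), "users,", len(R), "URLs,", len(T), "tags")
```

Each triple corresponds to one tagging, which is the unit counted in the crawl results.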
3.1. Process to retrieve the data
Figure 2. Data Collection Procedure (Source: Authors)

(A) Starting point. Identify the search attributes.
An authoritative source was used as the baseline to find keywords
connected to the idea of ‘globalization of agriculture’
     –   Wikipedia definition of “critics of globalization”
         (popular, high reputation)
     –   Other starting points (future)
     –   Main concepts selected manually (researcher expertise)
         from the website homepages, tag clouds or topics.
     –   Identified the 5 seed keywords (globalization +
         agriculture, food, organic, and GMO)
     –   Other concepts rejected

(B) Web crawling was carried out with a Perl program,
gathering the sample of users, URLs and tags
     –   For globalization+agriculture; globalization+food;
         globalization+organic; globalization+GMO
     –   Between 22 April 2011 and 21 May 2011 (one complete
         month)
     –   Results: 10,220 taggings that involved 851 users on
         1,077 URLs and 1,720 tags.

(C) A Haskell program reduced the amount of data
by cutting the URLs and using keywords, including the
identification of synonyms, the elimination of words with
capital letters and derivatives such as words in plural.

(D) Dataset for analysis
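Step (C) can be sketched roughly as follows. This is an illustration in Python, not the authors' Haskell program: the function names and the variant table are hypothetical, the two table entries are taken from the examples on the folksonomy slide, and handling capital letters by lowercasing is an assumption (the original program may have treated such tags differently).

```python
from urllib.parse import urlsplit

# Hypothetical variant table; the entries reuse the examples given on the
# folksonomy slide. The table actually used by the authors is not shown.
VARIANTS = {
    "globalisation": "globalization",
    "poors": "poor",
}

def site_of(url):
    """Cut a URL down to its scheme and host, e.g. http://news.bbc.co.uk/."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"

def normalize_tag(tag):
    """Lowercase a tag (assumption) and map known variants to one form."""
    tag = tag.strip().lower()
    return VARIANTS.get(tag, tag)

print(site_of("http://www.nytimes.com/2011/04/22/business/story.html"))
print(normalize_tag("Globalisation"), normalize_tag("Poors"))
```

A lookup table is used instead of a stemming rule because the final tag list still contains words such as "economics" and "politics", which a naive plural-stripping rule would damage.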
Example: final dataset

526 URLs   1,700 tags   851 users
Source: Authors
Table 1. Keywords Used in the topic “Globalization of agriculture”

Search attributes used                   Number of resulting tags (I+II)   Most frequent tags / main tags
Globalization (I) + agriculture (II)     1,116                             Food (268), economics (176), environment (145), politics (85), trade (81), sustainability (70)
Globalization (I) + food (II)            1,682                             Economy (180), economics (171), environment (122), sustainability (78), politics (60)
Globalization (I) + organic (II)         22                                Business (3), fair-trade (3)
Globalization (I) + GMO (II)             54                                Food (13), agriculture (12)

Source: Authors
3.2. Analysis procedure: SNA
Network analysis

•     Node centrality: identification of the nodes that are more “central” than
      others.
      Network-level property = the idea of a node’s social power based on how well it
      “connects” to the network.

•     Degree of a node = number of direct connections that individuals have with
      others in the group.
      Highest degree = exerts influence (or authority).

       In-degree = number of incoming ties, which reflects the popularity of a website. As a
          result, the prominent, well-connected members (those with a high degree of
          centrality) are usually the opinion leaders.

       Out-degree = number of outgoing ties, which determines whether a particular user is an
         active or passive participant within the network.

    Software: Pajek (large data sets). While a Delicious user is simply using Delicious,
       the analysis looks for latent structures and the power that emerges from
       the network…
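The degree measures above can be sketched with plain counting. The (user, URL) pairs below are hypothetical; the authors computed their measures in Pajek, so this only illustrates the definitions.

```python
from collections import Counter

# Hypothetical (user, url) pairs: each pair is one link from a user to a URL.
links = [
    ("ana", "http://www.nytimes.com/"),
    ("ana", "http://news.bbc.co.uk/"),
    ("ben", "http://www.nytimes.com/"),
    ("eva", "http://www.nytimes.com/"),
]

# Out-degree: number of outgoing ties per user (how active the user is).
out_degree = Counter(user for user, _ in links)
# In-degree: number of incoming ties per URL (how popular the website is).
in_degree = Counter(url for _, url in links)

top_url, top_in = in_degree.most_common(1)[0]
top_user, top_out = out_degree.most_common(1)[0]
print("most linked URL:", top_url, top_in)
print("most active user:", top_user, top_out)
```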
Figure 3. Hyperlink Network Energy Kamada-Kawai Map.
                             Bipartite Network user-URL

Source: Authors by Pajek
Results 4.1. Centralization (Authority)
Centralization: userURL

URL’s Indegree: Sum of total inbound links
User’s Outdegree: Sum of the total outbound links

Network highly centralized within a few nodes:

Only 10 URLs out of 526 (1.90%) account for 32.19% of the links to URLs.
These 10 URLs received 3,290 inbound links out of a total of 10,219.

Only 10 users out of 851 (1.17%) account for 14.05% of the links to URLs.
These 10 users produced 1,436 outbound links out of a total of 10,219.

The 10 most central websites: nine of them were media-based (online newspapers such as
   The New York Times, BBC, The Guardian, Washington Post, Financial Times, Reason,
   The Nation, Spiegel and The Economist) (Table 2)

Identification of users with a greater degree of centrality.
   The user Mritiunjoy plays a very important role in the network.
   Mritiunjoy joined Delicious on 12 March 2007 and to date has 10,020 links and
   is following 38 users.
   Mritiunjoy Mohanty is a professor at the Indian Institute of Management Calcutta, India,
   and his research interests are the political economy of growth and development.
Table 2. Top Authoritative Sites in the hyperlink network

Rank   Indegree   URL                               Outdegree   User
  1      1203     http://www.nytimes.com/              433      /mritiunjoy
  2       674     http://news.bbc.co.uk/               195      /laura208
  3       365     http://www.guardian.co.uk/           127      /rd108
  4       186     http://www.washingtonpost.com/       112      /amaah
  5       158     http://www.ft.com/                   111      /thepouncer
  6       154     http://www.reason.com/               100      /anilius
  7       147     http://www.thenation.com/            100      /emmarlyb
  8       137     http://www.spiegel.de/                87      /adorngeography
  9       136     http://www.foodfirst.org/             86      /pagolnari
 10       130     http://www.economist.com/             85      /freemanlc
Source: Authors
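The concentration figures can be re-derived from Table 2 with a few lines of arithmetic. The sketch below copies the indegree and outdegree columns and the reported total of 10,219 links; the helper name `share` is hypothetical.

```python
# Values copied from Table 2 (top 10 URLs by indegree, top 10 users by outdegree).
indegree_top10 = [1203, 674, 365, 186, 158, 154, 147, 137, 136, 130]
outdegree_top10 = [433, 195, 127, 112, 111, 100, 100, 87, 86, 85]
TOTAL_LINKS = 10_219  # total number of links reported in the results

def share(values, total):
    """Percentage of all links accounted for by the given nodes."""
    return 100 * sum(values) / total

url_share = share(indegree_top10, TOTAL_LINKS)
user_share = share(outdegree_top10, TOTAL_LINKS)

print(f"top-10 URLs:  {sum(indegree_top10)} inbound links, {url_share:.2f}% of all links")
print(f"top-10 users: {sum(outdegree_top10)} outbound links, {user_share:.2f}% of all links")
```

The column sums reproduce the 3,290 inbound and 1,436 outbound links reported in the text.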
Figure 4. user-user Unipartite Network Energy Kamada-Kawai Map
                 Degree Cut-off = 1. Size: Degree




Source: Authors by Pajek



Figure 5. user-user Unipartite Network Energy Kamada-Kawai Map

                    Degree Cut-off = 30. Nodes = 211. Size: Betweenness




Source: Authors by Pajek



Figure 6. user-user Unipartite Network Energy Kamada-Kawai Map

                       Degree Cut-off = 30. Nodes = 211. Size: Closeness




Source: Authors by Pajek



Figure 7. user-user Unipartite Network Energy Kamada-Kawai Map

                           Degree Cut-off = 30. Nodes = 211. Size: Degree




Source: Authors by Pajek



Figure 8. Hyperlink Network. 851 users arranged in rank order by
    number of outbound links and 1,077 URLs arranged in rank order
                       by number of inbound links




Source: Authors




       Why are a few users and websites better connected
                              than the majority?
Value of identified nodes (websites) due to:
• The links that they receive (their
  instrumental nature)
• The profile of these organizations
  (newspapers that channel large quantities of
  resources – information) (quality of the
  links) = central URLs with authority.
Results. 4.2. Node Tags: Users producing Tags

• Collective tag structure (excluding the search
  keywords globalization, agriculture, food,
  organic and GMO) produced with Wordle.
• Sizes of the terms in the tag clouds are
  proportional to the weights - the top 25
  highest weighted tags.
• Tag clouds: identifying the topical groupings in
  a tag network
  – Identification of topics around globalization of
    agriculture
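The weights behind such a tag cloud can be sketched as simple frequency counts with the search keywords removed. The function name `top_tags` and the sample list are hypothetical; Wordle itself only needs the resulting tag weights as input.

```python
from collections import Counter

# The five search keywords excluded from the tag cloud.
SEED_KEYWORDS = {"globalization", "agriculture", "food", "organic", "gmo"}

def top_tags(tag_assignments, k=25, exclude=SEED_KEYWORDS):
    """Return the k most frequent tags, skipping the search keywords."""
    counts = Counter(t.lower() for t in tag_assignments)
    for seed in exclude:
        counts.pop(seed, None)
    return counts.most_common(k)

# Hypothetical tag assignments, for illustration only.
sample = ["economics", "Food", "environment", "economics",
          "GMO", "trade", "environment", "economics"]
weights = top_tags(sample, k=2)
print(weights)
```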
Figure 9. Tag Cloud for Agriculture Globalization
         Network Identified on the Delicious Data Set

   Source: Authors by Wordle

The resulting main key topics were economics and the environment:
the main keywords used by users in Delicious to describe or characterise the topic
‘globalization of agriculture’.
The 50 most frequent tags (tags used more than 20 times)
Economics            350    World           47      BBC             30
Environment          274    Global          46      Future          30
Sustainability       153    Capitalism      45      Geography       30
Politics             152    Green           43      Water           30
Economy              144    Research        42      Nutrition       29
Trade                131    Crisis          41      Government      27
Business              99    International   41      Wto             27
Poverty               97    Oil             38      Agribusiness    26
Culture               84    Prices          37      Ecology         25
Farming               84    Activism        35      Europe          25
Africa                83    News            35      Globalwarming   23
Health                78    Science         35      Reference       22
Development           76    Hunger          34      Technology      22
Energy                76    Usa             34      Biofuel         21
India                 65    Inflation       32      Corporations    21
China                 59    History         31      Farmers         21
Policy                55    Local           31
Discussion: 5.1. Centrality and Power
The New York Times surpasses by far the other URLs in this Delicious network on the
  globalization of agriculture (with 1,203 inbound links, followed by the BBC
  website with 674).
        The websites most cited, recommended or considered with regard to a topic
        occupy a central place and have an important role in the process of
        dissemination of news, events, trending topics, ideology, culture, etc.

Identification of key collective actors (represented here through URLs) allows
   a better comprehension of leadership, influence processes, and power-related
   structures.

For social practitioners, it is a good way to identify key informants in a
   community through whom to disseminate useful and important information.

Very unequal distribution of power among the URLs cited by users in the topic
   globalization of agriculture.
    -   Important accumulation of inlinks.

               ADVANTAGES OF THIS TYPE OF KNOWLEDGE
                 FOR RESEARCHING AND INTERVENING
Discussion. 5.1. Centrality and Power
• FOCUS ON users: identification of key actors that
  disseminate and share URLs, such as the previously cited
  Mritiunjoy
   – Determine where the key elements that structure the network
     emerge from.
• Why is ‘that’ actor so important in the network of
  globalization of agriculture?
   – Key actors in this type of network could configure and
     reconfigure the evolution of the network (TIME), and
     structure and even manipulate the type of interchange of
     resources in Delicious or in similar bookmarking sites.
• Is it by chance? Do the most prominent actors on a
  website like Delicious correspond to a profile of very
  active and participative people? Do they usually work
  (or have a hobby) in this area, and is this why they
  accumulate and tag so many URLs in Delicious?
   – Further steps of the research.
5.2. Central Tags: Users producing Tags
•   Tags suggested by the website + new tags added in a creative way
•   ‘Tag cloud’: visual approach to the language used by users
•   From a total of 1700 tags, two words were the main ones.
•   Each user could label a URL with an unlimited number of tags
    (average 12 tags per user, max 433 and min 2).
•   The most frequently used tags were the words ‘economics’ (350 citations
    out of 1700 tags, 20.6%) and ‘environment’ (273, 16%).
•   Other very frequent tags were sustainability (153), politics (152),
    economy (144), trade (131), business (99), poverty (97), culture (84),
    farming (84), africa (83), health (78), and development (76);
    in relative terms, these 13 tags represent one out of four
    labelled tags around the topic (25.9%).

Questions:
• What are the reasons for the prominence of the first two tags around the
  globalization of agriculture?
• Are some of the 1700 tags found used interchangeably?
     – Why is the word economics used sometimes, and economy at other
       times?
     – Are they used in the same way when classifying the URLs?
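One possible way to explore whether two tags such as economics and economy classify the same URLs is to compare the sets of URLs each tag is attached to. The sketch below uses the Jaccard index for this; the measure, the function names and the sample triples are assumptions for illustration and are not part of the study.

```python
from collections import defaultdict

def urls_by_tag(triples):
    """Index (user, url, tag) triples by tag: tag -> set of URLs carrying it."""
    index = defaultdict(set)
    for _user, url, tag in triples:
        index[tag].add(url)
    return index

def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical (user, url, tag) triples, for illustration only.
triples = [
    ("ana", "http://example.org/1", "economics"),
    ("ana", "http://example.org/1", "economy"),
    ("ben", "http://example.org/2", "economics"),
    ("eva", "http://example.org/3", "economy"),
    ("eva", "http://example.org/2", "economy"),
]

index = urls_by_tag(triples)
overlap = jaccard(index["economics"], index["economy"])
print(f"URL overlap between 'economics' and 'economy': {overlap:.2f}")
```

A value close to 1 would suggest the two tags are used interchangeably; a value close to 0 would suggest they classify different URLs.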
Conclusions: achieved goals

• Presenting this methodology for using big data from Web 2.0 in
  socioeconomic research, and its illustration with a social
  bookmarking site (Delicious), is:
• A first step towards the development of empirical techniques
  capable of automatically differentiating groups of
  individuals with common interests, and individuals who
  occupy a more central position.
• A first stone in the difficult process of understanding and
  discovering patterns in the process that characterizes users
  tagging URLs for collaborative reasons.
• Utility: discovering latent patterns = providing effective
  recommendations to different actors.
• Understanding the community of more than a thousand links.
• Retrieval and analysis of information: complex but easy =
  working in interdisciplinary teams
Other topics for Researching: Future
•   Improvements are necessary in retrieval methods and in the
    implementation of Information Retrieval and Recommender Systems
    techniques
•   Influence of first tags on the following ones. Role of innovation and
    creativity in tagging
•   Evolution and usage of language around an issue over time.
•   Ideological and terminological approaches in the national/international
    arena
•   Use of some tags in classifying URLs and the distinction among users in
    the way they use some words/tags
     – Distinction between scientists and other professionals or users?
     – Identify users with the same tagging patterns, or URLs that were similarly
       labelled: study structural equivalences
•   Other possible studies based on retrieving the pages and carrying out content
    analysis
•   Why are some labels present/absent?
•   Are there “traditions”/“fashions” in tagging in the Web 2.0?
•   Comparing results from Delicious with those from other social bookmarking sites
•   Going in depth on users (if possible)
•   And other explorations, other starting points, other bookmarking sites, other
    indicators, complementary to those used in this illustration
Possible Applications
•   Producing and manipulating public opinion (when recommending and
    describing websites) and markets
    – If we know the interests of users belonging to a network, we could also
      make recommendations
•   Recommender Systems: the change to a ternary relation between
    users, resources, and tags is more complex to manage.
•   Important for researchers interested in formulating strategies for
    intervention and mobilisation; practitioners and
    companies could also make use of this.
•   Discovering the central elements in a network (users and
    URLs), together with the tags used by users, could be key to
    designing future strategies for the dissemination of messages and to
    achieving more success in communications, making use of
    important keywords, for instance, to attract more attention, etc.
•   Implementation of Information Retrieval and Recommender Systems
    techniques in social commerce and social media contexts.
•   Applications in advertising, mobilising, etc.
•   Security, social studies, market studies, consumers
•   Time: longitudinal analysis
•   Etc.

Contenu connexe

Tendances

Indexing presentation 2013 06-04
Indexing presentation 2013 06-04Indexing presentation 2013 06-04
Indexing presentation 2013 06-04Louise Spiteri
 
Breaking Down Walls in Enterprise with Social Semantics
Breaking Down Walls in Enterprise with Social SemanticsBreaking Down Walls in Enterprise with Social Semantics
Breaking Down Walls in Enterprise with Social SemanticsJohn Breslin
 
Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...Interlinking Online Communities and Enriching Social Software with the Semant...
Interlinking Online Communities and Enriching Social Software with the Semant...John Breslin
 
Learning with facebook sandra perusch_slideshare
Learning with facebook sandra perusch_slideshareLearning with facebook sandra perusch_slideshare
Learning with facebook sandra perusch_slideshareSandra Sabitzer
 
Aiim Webinar Helen Mitchell Unified Search Final 7 21 2010
Aiim Webinar Helen Mitchell  Unified Search Final 7 21 2010Aiim Webinar Helen Mitchell  Unified Search Final 7 21 2010
Aiim Webinar Helen Mitchell Unified Search Final 7 21 2010Helen Mitchell
 
Web 2.0
Web 2.0Web 2.0
Web 2.0bjornh
 
Pratt Sils LIS653 4 Fall 2007
Pratt Sils LIS653 4 Fall 2007Pratt Sils LIS653 4 Fall 2007
Pratt Sils LIS653 4 Fall 2007PrattSILS
 
Towards enhanced user interaction to qualify web resources for higher-layered...
Towards enhanced user interaction to qualify web resources for higher-layered...Towards enhanced user interaction to qualify web resources for higher-layered...
Towards enhanced user interaction to qualify web resources for higher-layered...Monika Steinberg
 
Summary of my Doctoral Research, Interests
Summary of my Doctoral Research, InterestsSummary of my Doctoral Research, Interests
Summary of my Doctoral Research, InterestsMeena Nagarajan
 
The Social Web : SCAmore
The Social Web : SCAmoreThe Social Web : SCAmore
The Social Web : SCAmoreJISC Netskills
 
DMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekDMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekCarly Strasser
 
Chapter 4 open, social and participatory media
Chapter 4 open, social and participatory mediaChapter 4 open, social and participatory media
Chapter 4 open, social and participatory mediaGrainne Conole
 
Harvesting Intelligence from User Interactions
Harvesting Intelligence from User Interactions Harvesting Intelligence from User Interactions
Harvesting Intelligence from User Interactions R A Akerkar
 
The Social Semantic Web
The Social Semantic WebThe Social Semantic Web
The Social Semantic WebJohn Breslin
 
Data Accessibility and Me: Introducing SIOC, FOAF and the Linked Data Web
Data Accessibility and Me: Introducing SIOC, FOAF and the Linked Data WebData Accessibility and Me: Introducing SIOC, FOAF and the Linked Data Web
Data Accessibility and Me: Introducing SIOC, FOAF and the Linked Data WebJohn Breslin
 
Workshop on Social Bookmarking
Workshop on Social BookmarkingWorkshop on Social Bookmarking
Workshop on Social BookmarkingDaniel Churchill
 
Knowledge Management
Knowledge ManagementKnowledge Management
Knowledge ManagementBarbora P
 
SRS presentation
SRS presentationSRS presentation
SRS presentationslavaxx
 

Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with Social Tagging

  • 1. Crawling Big Data in a New Frontier for Socioeconomic Research: Testing with Social Tagging JUAN DIEGO BORRERO, jdiego@uhu.es ESTRELLA GUALDA, estrella@uhu.es University of Huelva Seminários CIEO - Universidade do Algarve Faro, 31 October, 2012 1
  • 2. Table of Contents • 1. Introduction • 2. Theoretical perspective – Web 2.0 and Collaborative tagging – Tagging and Folksonomy – The collective knowledge inherent in social tags – Tagging and Social networks – Social Web and its impact on Information Retrieval (IR) and Recommender Systems (RS) • 3. Methodology – 3.1. Data Collection procedure – 3.2. Analysis procedure. SNA • 4. Results – 4.1. Centralization: Authority – 4.2. Node Tags: Users producing Tags • 5. Discussion – 5.1. Centrality and Power – 5.2. Central Tags: Users producing Tags • 6. Conclusions and future research
  • 3. 1. Introduction What puzzles? 1. The era of Big Data and Social Media has begun! E.g., Twitter, Facebook, Tumblr, Delicious, YouTube, Flickr, Wikipedia… 2. Will it transform how we study human communication and social relations? 3. Will it alter what ‘research’ means? Some or all of the above?
  • 4. 1. Introduction What puzzles? 1. Big Data is notable not because of its size, but because of its relationality to other data. Big Data is fundamentally networked. Its value comes from the patterns that can be derived by making connections between pieces of data, about an individual, about individuals in relation to others, about groups of people, or simply about the structure of information itself. 2. Big Data is important because it refers to an analytic phenomenon playing out in academia. 3. Big data is important because of its popular salience. 4
  • 5. 1. Introduction Tagging • New technologies have made it possible for a wide range of people to produce, share, interact with, and organize data. • People can classify the huge amount of information at their disposal in the form of tags.
  • 6. 1. Introduction Tagging in Delicious Keywords freely chosen by users employed to annotate various types of digital content, or suggested by Delicious 6 Source: www.delicious.com
  • 7. 1. Introduction Social Tagging Systems Many users add metadata in the form of tags Source: http://bvdt.tuxic.nl/index.php/the-wisdom-of- the-crowds-in-the-audiovisual-archive-domain/ Resulting collective tag structure Source: http://blog.hubspot.com/blog/tabid/6307/bid/7372/9-Reasons-Why- Your-Social-Media-Strategy-Isn-t-Working.aspx/ 7 Source: http://www.idonato.com/2009/05/27/fun-with-tag-clouds/
  • 8. 1. Introduction Delicious Delicious is a free social bookmarking website for storing, sharing and discovering web bookmarks 8 Source: www.delicious.com
  • 9. 1. Introduction Our Assumption • Big Data offers the humanistic disciplines a new way to work on the quantitative side, and it also offers another kind of objective method for analysis. • In reality, however, working with Big Data is still subjective. • Because of this, it is crucial to begin asking questions about the analytic assumptions, methodological frameworks, and underlying biases embedded in the Big Data phenomenon.
  • 10. 1. Introduction Our Objectives 1. Proposing a methodology to use big data from Web 2.0 in social research, 2. Applying it to automatically extract data from the Delicious social bookmarking website, and 3. Showing the type of results that this kind of analysis can offer to social scientists. 4. We focus our study on the globalization of agriculture community and pay special attention to SNA
  • 11. 2. Theoretical perspective Web 2.0… and collaborative tagging Web 2.0 is the business revolution in the computer industry caused by the move to the Internet as platform, and an attempt to understand the rules for success on that new platform (O’Reilly, 2007) Collaborative – or social – tagging is the activity in the Web 2.0 of annotating digital resources with keywords - tags (Golder and Huberman, 2006; Trant, 2009). Source: http://www.laurenwood.org/anyway/2007/11/web-20-buzzwords/ 11
  • 12. 2. Theoretical perspective … collaborative tagging Collaborative – or social – tagging is the activity in the Web 2.0 of annotating digital resources with keywords - tags (Golder and Huberman, 2006; Trant, 2009). Webpages, photos, videos… A collaborative tagging system is mainly composed of three interconnected components users, tags, and resources (Smith, 2008) 12
  • 13. 2. Theoretical perspective … collaborative tagging and folksonomy Social tagging systems aggregate the tags of all users and describe the resources in a so-called folksonomy (Vander Wal, 2004). Problems: synonyms (global warming = climate change); term variations (globalization = globalisation, poor = poors)
  • 14. 2. Theoretical perspective … folksonomy and collective knowledge Bottom-up process… …the tags of many different users are aggregated and the resulting collective tag structure – such as tag cloud – depicts the collective knowledge of Web users (Cress et al., 2012) 14 Source: http://blog.cimmyt.org/?p=6052
  • 15. 2. Theoretical perspective Tagging and social networks The structure of Social tagging websites can be viewed as a network of three different node types: the U users, the R resources (web sites – URLs) and the T tags that the U users deploy to tag the R web sites. Figure 1. A Bipartite Network made of three users U=(u,u’,u’’), three tags T=(t,t’,t’’) and two kinds of links: between users RU (straight lines), and between users and tags RT (dashed lines) A particular class of networks is the bipartite networks, whose nodes are divided into two sets –e.g. users and tags. An opinion network (Maslov and Zhang, 2001; Blattner et al., 2007), is a network in which users connect to the objects that they gather. 15 Source: Authors
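The network view described on this slide can be sketched in a few lines. The following is only an illustration, assuming annotations are available as (user, resource, tag) triples; the sample triples and the `bipartite_edges` helper are hypothetical and are not taken from the study.

```python
from collections import defaultdict

# Hypothetical (user, resource, tag) annotations; not the study's data.
annotations = [
    ("u1", "http://example.org/a", "food"),
    ("u1", "http://example.org/a", "trade"),
    ("u2", "http://example.org/a", "food"),
    ("u2", "http://example.org/b", "economics"),
    ("u3", "http://example.org/b", "economics"),
]

POSITION = {"user": 0, "resource": 1, "tag": 2}


def bipartite_edges(triples, left, right):
    """Collapse triples onto two of the three node sets.

    Returns a dict mapping (left_node, right_node) to the number of
    annotations in which the two nodes co-occur.
    """
    weights = defaultdict(int)
    for triple in triples:
        weights[(triple[POSITION[left]], triple[POSITION[right]])] += 1
    return dict(weights)


# Two bipartite views of the same annotations.
user_tag = bipartite_edges(annotations, "user", "tag")
user_url = bipartite_edges(annotations, "user", "resource")
```

Each view keeps two of the three node sets, which is one way to obtain the user-tag and user-URL bipartite networks mentioned above.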
  • 16. 2. Theoretical perspective Social web and its impact on Information Retrieval (IR) and Recommender Systems (RS) 1. From the Social IR point of view (i.e. IR that uses folksonomies), IT creates algorithms for folksonomies in order to identify which information is relevant and to identify communities according to their needs; this paper aims to present a methodology to retrieve big data from the Web 2.0 environment. 2. We introduce social tagging as a basis for recommendations, focused on the ternary relation between users, resources, and tags, to discover latent patterns linked to the activity of collaborative tagging, which could be fundamental for providing effective recommendations to different actors.
  • 17. 3. Methodology • Data set from: Delicious – www.delicious.com –. • Delicious = social bookmarking system whose – Content is created, annotated and viewed by its users. – Non-hierarchical classification system: users can tag each of their bookmarks on the Delicious website, and provides knowledge about the URL marked – Collective nature: • view bookmarks added or annotated by other users. • organize existing tags into groups (tag bundles). 17
  • 18. 3.1. Data Collection procedure Collected annotations made in Social Bookmarking Services. Each annotation has at least four parts: • 1. Link to the resource (website…) • 2. One or more tags • 3. User who makes the annotation • 4. Moment/time when the annotation is made • This article focuses more on the co-occurrence of users, resources and tags (user, resource, tag). Dataset collected: U = {u1, u2, …, uK}, R = {r1, r2, …, rM}, and T = {t1, t2, …, tN}
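The four-part annotation and the sets U, R and T can be represented as below. This is a minimal sketch: the `Annotation` class, its field names and the sample records are assumptions made for illustration, not the authors' data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Annotation:
    """One bookmark: who tagged which resource, with which tags, and when."""
    user: str
    url: str
    tags: tuple
    timestamp: datetime


def to_triples(annotations):
    """Expand each annotation into one (user, resource, tag) triple per tag."""
    return [(a.user, a.url, tag) for a in annotations for tag in a.tags]


# Hypothetical sample records.
sample = [
    Annotation("u1", "http://example.org/a", ("food", "trade"), datetime(2011, 4, 22)),
    Annotation("u2", "http://example.org/a", ("food",), datetime(2011, 5, 1)),
]

triples = to_triples(sample)
U = {user for user, _, _ in triples}   # users
R = {url for _, url, _ in triples}     # resources
T = {tag for _, _, tag in triples}     # tags
```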
  • 19. 3.1. Process to retrieve the data Figure 2. Data Collection Procedure. Source: Authors (A) Start point. Identify the search attributes. Authoritative source as baseline to find keywords connected to the idea of ‘globalization of agriculture’ – Wikipedia definition of “critics of globalization” (popular, high reputation) – Other starting points (future) – Selected (manually = researcher expertise) main concepts from the website homepages, tag clouds or topics. – Identified the 5 seed keywords (globalization + agriculture, food, organic, and GMO) – Other concepts rejected (B) Web crawling was done with a Perl program, gathering the sample of users, URLs and tags - For globalization+agriculture; globalization+food; globalization+organic; globalization+GMO - 22 April 2011 to 21 May 2011 (one complete month) - Results: 10,220 taggings that involved 851 users on 1,077 URLs and 1,720 tags. (C) Program in Haskell to reduce the amount of data by cutting the URLs and using key words, including the identification of synonyms, the elimination of words with capital letters and derivatives such as words in plural. (D) Dataset for analysis
  • 20. Example: final dataset: 851 users, 526 URLs, 1,700 tags. Source: Authors
  • 21. Table 1. Keywords Used in the topic “Globalization of agriculture”
    Search attributes (I+II) | Number of resulting tags | More frequent tags used (main tags)
    Globalization (I) + agriculture (II) | 1,116 | Food (268), economics (176), environment (145), politics (85), trade (81), sustainability (70)
    Globalization (I) + food (II) | 1,682 | Economy (180), economics (171), environment (122), sustainability (78), politics (60)
    Globalization (I) + organic (II) | 22 | Business (3), fair-trade (3)
    Globalization (I) + GMO (II) | 54 | Food (13), agriculture (12)
    Source: Authors
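Step (C) of the procedure can be illustrated as follows. The authors' program was written in Haskell and is not shown in the slides, so this Python sketch is not their implementation. It assumes that "cutting the URLs" means keeping only scheme and host, and that the capital-letter step amounts to lower-casing. The `VARIANTS` table is hypothetical and holds only the examples from the folksonomy slide.

```python
from urllib.parse import urlsplit

# Hypothetical variant table, seeded with the slide's own examples
# (globalisation = globalization, poors = poor).
VARIANTS = {
    "globalisation": "globalization",
    "poors": "poor",
}


def normalize_tag(tag, variants=VARIANTS):
    """Lower-case a tag and map known spelling or plural variants to one form."""
    key = tag.strip().lower()
    return variants.get(key, key)


def cut_url(url):
    """Reduce a bookmarked URL to its site root (scheme plus host)."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"
```

An explicit variant table is used instead of automatic plural stripping because tags such as "economics" or "news" end in "s" without being plurals.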
  • 22. 3.2. Analysis procedure: SNA Network analysis • Node centrality: identification of the nodes that are more “central” than others Network level property = idea of the node’s social power based on how well it “connects” to the network. • Degree of a node = Number of direct connections individuals have with others in the group Highest degree = exerts influence (or authority). In-degree = number of incoming ties that reflect the popularity of a website. As a result, the prominent, well-connected members (those with a high degree of centrality) are usually the opinion leaders. Out-degree = number of outgoing ties which determine if a particular user is an active or passive participant within the network. Software Pajek (big series of data): Delicious bookmarking system’s user is simply using Delicious, latent structures, power that emerges from the network… 22
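The degree measures defined on this slide can be computed directly from a list of user-to-URL links. The sketch below is illustrative only: the study used Pajek, and the `edges` list here is hypothetical.

```python
from collections import Counter

# Hypothetical directed links: (user, bookmarked URL).
edges = [
    ("u1", "http://a.example/"),
    ("u1", "http://b.example/"),
    ("u2", "http://a.example/"),
    ("u3", "http://a.example/"),
]


def degrees(links):
    """Return (in_degree per URL, out_degree per user) for user -> URL links."""
    out_degree = Counter(user for user, _ in links)
    in_degree = Counter(url for _, url in links)
    return in_degree, out_degree


in_degree, out_degree = degrees(edges)
most_popular_url = in_degree.most_common(1)[0][0]
most_active_user = out_degree.most_common(1)[0][0]
```

In this reading, a URL's in-degree reflects its popularity and a user's out-degree reflects how active that user is.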
  • 23. Figure 3. Hyperlink Network Energy Kamada-Kawai Map. Bipartite Network userurl Source: Authors by Pajek 23
  • 24. Results 4.1. Centralization (Authority) Centralization: user-URL network. URL’s Indegree: sum of total inbound links. User’s Outdegree: sum of the total outbound links. Network highly centralized within a few nodes: Only 10 URLs from 526 (1.90%) account for 32.19% of links to URLs. 10 URLs got 3,290 inbound links from a total of 10,219. Only 10 users from 851 (1.17%) account for 14.05% of links to URLs. These 10 users produced 1,436 outbound links from a total of 10,219. 10 most centralized websites: nine of them were media-based (online newspapers such as The New York Times, BBC, The Guardian, Washington Post, Financial Times, Reason, The Nation, Spiegel and The Economist) (Table 2). Identification of users with a greater degree of centrality: the user Mritiunjoy plays a very important role in the network. Mritiunjoy joined Delicious on 12 March 2007 and to date has 10,020 links and is following 38 users. Mritiunjoy Mohanty is a professor at the Indian Institute of Management Calcutta, India, and his research interests are the political economy of growth and development.
  • 25. Table 2. Top Authoritative Sites in the hyperlink network
    Rank | Indegree | URL | Outdegree | User
    1 | 1203 | http://www.nytimes.com/ | 433 | /mritiunjoy
    2 | 674 | http://news.bbc.co.uk/ | 195 | /laura208
    3 | 365 | http://www.guardian.co.uk/ | 127 | /rd108
    4 | 186 | http://www.washingtonpost.com/ | 112 | /amaah
    5 | 158 | http://www.ft.com/ | 111 | /thepouncer
    6 | 154 | http://www.reason.com/ | 100 | /anilius
    7 | 147 | http://www.thenation.com/ | 100 | /emmarlyb
    8 | 137 | http://www.spiegel.de/ | 87 | /adorngeography
    9 | 136 | http://www.foodfirst.org/ | 86 | /pagolnari
    10 | 130 | http://www.economist.com/ | 85 | /freemanlc
    Source: Authors
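The concentration figures can be recomputed from the Table 2 counts and the total of 10,219 links reported in the results. The sketch below only redoes that arithmetic; the `share` helper is an illustrative name.

```python
# In-degree of the ten top URLs and out-degree of the ten top users (Table 2).
top_url_indegree = [1203, 674, 365, 186, 158, 154, 147, 137, 136, 130]
top_user_outdegree = [433, 195, 127, 112, 111, 100, 100, 87, 86, 85]
TOTAL_LINKS = 10219  # total user -> URL links reported in the results


def share(values, total):
    """Percentage of all links accounted for by the given nodes."""
    return 100 * sum(values) / total


url_share = share(top_url_indegree, TOTAL_LINKS)     # top 10 URLs
user_share = share(top_user_outdegree, TOTAL_LINKS)  # top 10 users
```

The two sums reproduce the 3,290 inbound and 1,436 outbound links quoted in the results.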
  • 26. Figure 4. user-user Unipartite Network Energy Kamada-Kawai Map Degree Cut-off = 1. Size: Degree Source: Authors by Pajek 26
  • 27. Figure 5. user-user Unipartite Network Energy Kamada-Kawai Map Degree Cut-off = 30. Nodes = 211. Size: Betweenness Source: Authors by Pajek
  • 28. Figure 6. user-user Unipartite Network Energy Kamada-Kawai Map Degree Cut-off = 30. Nodes = 211. Size: Closeness Source: Authors by Pajek 28
  • 29. Figure 7. user-user Unipartite Network Energy Kamada-Kawai Map Degree Cut-off = 30. Nodes = 211. Size: Degree Source: Authors by Pajek 29
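The slides do not state how the user-user network in Figures 4 to 7 was derived. One common construction, assumed here purely for illustration, links two users when they bookmarked the same URL and then keeps only users whose degree reaches a cut-off. The sample links and both function names are hypothetical, and Pajek's cut-off semantics may differ from this sketch.

```python
from collections import defaultdict
from itertools import combinations


def project_users(links):
    """One-mode projection: users are neighbours if they share a bookmarked URL."""
    users_by_url = defaultdict(set)
    for user, url in links:
        users_by_url[url].add(user)
    neighbours = defaultdict(set)
    for users in users_by_url.values():
        for a, b in combinations(sorted(users), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def apply_cutoff(neighbours, min_degree):
    """Keep only users with at least min_degree neighbours."""
    return {u: n for u, n in neighbours.items() if len(n) >= min_degree}


# Hypothetical user -> URL links.
links = [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u3", "b"), ("u4", "b")]
neighbours = project_users(links)
core = apply_cutoff(neighbours, min_degree=2)
```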
  • 30. Figure 8. Hyperlink Network. 851 users arranged in rank order by number of outbound links and 1,077 URLs arranged in rank order by number of inbound links Source: Authors Why are a few users and websites better connected than the majority?
  • 31. Value of identified nodes (websites) due to: • The links that they receive (its instrumental nature) • The profile of these organizations (newspapers that channel big quantities of resources – information) (quality of the links) = central URLs with authority. 31
  • 32. Results. 4.2. Node Tags: Users producing Tags • Collective tag structure (excluded the key search words, such as globalization, agriculture, food and organic, and GMO) produced with Wordle. • Sizes of the terms in the tag clouds are proportional to the weights - the top 25 highest weighted tags. • Tag clouds: identifying the topical groupings in a tag network – Identification of topics around globalization of agriculture 32
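The weights behind such a tag cloud can be approximated by counting tag occurrences after dropping the seed keywords. The clouds in the study were produced with Wordle, so the sketch below only illustrates the counting step; the `top_tags` helper and the sample list are hypothetical.

```python
from collections import Counter

# Seed keywords excluded from the cloud, as listed on the slide.
SEED_KEYWORDS = {"globalization", "agriculture", "food", "organic", "gmo"}


def top_tags(tags, n=25, excluded=SEED_KEYWORDS):
    """Return the n most frequent tags (case-insensitive), skipping excluded ones."""
    counts = Counter(
        tag.lower() for tag in tags if tag.lower() not in excluded
    )
    return counts.most_common(n)


# Hypothetical sample of raw tags.
sample_tags = ["economics", "Economics", "food", "environment",
               "economics", "GMO", "environment", "trade"]
weights = top_tags(sample_tags, n=2)
```

Each tag's count can then be used as the font-size weight in the cloud.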
  • 33. Figure 9. Tag Cloud for Agriculture Globalization Network Identified on the delicious Data Set Source: Authors by wordle Resulting main key topics were economics and the environment Main keywords used by users to describe or characterise in Delicious the topic ‘globalization of agriculture’. 33
  • 34. 50 more frequent TAGS. Tags used more than 20 times Economics 350 World 47 BBC 30 Environment 274 Global 46 Future 30 Sustainability 153 Capitalism 45 Geography 30 Politics 152 Green 43 Water 30 Economy 144 Research 42 Nutrition 29 Trade 131 Crisis 41 Government 27 Business 99 International 41 Wto 27 Poverty 97 Oil 38 Agribusiness 26 Culture 84 Prices 37 Ecology 25 Farming 84 Activism 35 Europe 25 Africa 83 News 35 Globalwarming 23 Health 78 Science 35 Reference 22 Development 76 Hunger 34 Technology 22 Energy 76 Usa 34 Biofuel 21 India 65 Inflation 32 Corporations 21 China 59 History 31 Farmers 21 Policy 55 Local 31 34
  • 35. Discussion: 5.1. Centrality and Power New York Times in this network of globalization of agriculture in Delicious surpasses by far other URLs (with 1,203 inbound links, followed by BBC website with 674 ones). Most cited, recommended or considered websites with regards to a topic occupy a central place and have an important role in the process of dissemination of news, events, trending topics, ideology, culture and etcetera. Identification of key collective actors (represented here through URLs) allows a better comprehension of leadership, influence process, and power- related structures. For social practitioners, is a good way to identify key informants in a community through whom disseminating useful and important information. Very inequal distribution of power of the URLs cited by users in the topic globalization of agriculture. - Important accumulation of inlinks. ADVANTAGES OF THIS TYPE OF KNOWLEDGE FOR RESEARCHING AND INTERVENING 35
  • 36. Discussion. 5.1. Centrality and Power • FOCUS ON Users: identification of key actors that disseminate and share URLs, as the previously cited Mritiunjoy – Determine from where key elements that structure the network emerge. • Why ‘that’ so important actor in the network of globalization of agriculture? – Key actors in this type of network could configure and reconfigure the evolution of the network (TIME), and structure and even manipulate the type of interchange of resources in Delicious or in similar bookmarking sites. • Is it by chance? Are most prominent actors in a type of website like Delicious corresponding to a profile of very active and participative people? Do they usually work (or have as hobby) in this area and this is why accumulate and tag so many URLs in Delicious? – Further steps of the research. 36
  • 37. 5.2. Central Tags: Users producing Tags • Tags suggested by the website + Added new tags in a creative way • ‘Tag cloud’: visual approach to the language used by users • From a total of 1700 tags two words were the main ones. • Each user could label a URL with an unlimited number of tags (average 12 tags per user, max 433 and min 2). • Most frequently tags used were the words: ‘economics’ (350 citations out of 1700 tags -20.6%-) and ‘environment’ (273, 16%). • Other very frequent tags were also sustainability (153), politics (152), economy (144), trade (131), business (99), poverty (97), culture (84), farming (84), africa (83), health (78), and development (76), representing these 13 tags in relatives terms one out of four labelled tags around the topic (25,9%). Questions: • Reasons of the prominence of the two first tags around the globalization of agriculture. • Are some of the 1700 found tags used in a interchangeable basis? – Why sometimes the word economics is used sometimes, and why other times is used economy? – Are they used in the same way at classifying the URLs? 37
  • 38. Conclusions: achieved goals • Presenting this methodology for using big data from Web 2.0 in socioeconomic research, illustrated with a social bookmarking site (Delicious), is: • A first step towards the development of empirical techniques capable of automatically differentiating groups of individuals with common interests, and individuals who occupy a more central position. • A first stone in the difficult process of understanding and discovering patterns in the processes that characterize users tagging URLs for collaborative reasons. • Utility: discovering latent patterns = providing effective recommendations to different actors. • Understanding the community of more than a thousand links. • Retrieval and analysis of information: complex but easy = working in interdisciplinary teams
  • 39. Other topics for research: Future • Improvements are necessary in retrieval methods and in the implementation of Information Retrieval and Recommender Systems techniques • Influence of the first tags on the following ones. Role of innovation and creativity in tagging • Evolution and usage of language around an issue over time. • Ideological and terminological approaches in the national/international arena • Use of some tags in classifying URLs, and differences among users in the way they use some words/tags – Distinction between scientists and other professionals or users? – Identify users with the same tagging patterns, or URLs that were similarly labelled: study structural equivalences • Other possible studies based on retrieving the pages and carrying out content analysis • Why are some labels present/absent? • Are there “traditions”/“fashions” in tagging in the Web 2.0? • Comparing results from Delicious with those from other social bookmarking sites • Study users in greater depth (if possible) • And other explorations, other starting points, other bookmarking sites, other indicators, complementary to those used in this illustration
  • 40. Possible Applications • Producing and manipulating public opinion (when recommending and describing websites) and markets – If we know the interests of the users belonging to a network, we could also make recommendations • Recommender Systems: the relation changes into a ternary one between users, resources, and tags, which is more complex to manage. • Important for researchers interested in formulating strategies for intervention and mobilisation; practitioners and companies could also make use of this. • Discovering the central elements in a network (users and URLs), together with the tags used by users, could be key to designing future strategies for disseminating messages and achieving more successful communication, for instance by using important keywords to attract more attention. • Implementation of Information Retrieval and Recommender Systems techniques in social commerce and social media contexts. • Applications in advertising, mobilising, etc. • Security, social studies, market studies, consumers • Time: longitudinal analysis • Etc.
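The ternary relation between users, resources and tags mentioned on this slide can be sketched in Python as a list of (user, URL, tag) triples. The scoring rule below (recommend unseen URLs that others labelled with tags the user already uses) is a deliberately simple assumption for illustration, not a method proposed in the presentation; all names and data are hypothetical.

```python
from collections import defaultdict

# Hypothetical (user, url, tag) triples; NOT data from the study.
triples = [
    ("ana", "url1", "trade"),
    ("ana", "url1", "farming"),
    ("ben", "url2", "trade"),
    ("ben", "url3", "health"),
    ("cai", "url2", "farming"),
    ("cai", "url4", "trade"),
]

def recommend(triples, user):
    """Rank URLs the user has not bookmarked by shared-tag matches."""
    user_tags = {tag for u, _, tag in triples if u == user}
    user_urls = {url for u, url, _ in triples if u == user}
    scores = defaultdict(int)
    for other, url, tag in triples:
        # Count each time another user applied one of this user's tags
        # to a URL this user has not bookmarked yet.
        if other != user and url not in user_urls and tag in user_tags:
            scores[url] += 1
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

suggestions = recommend(triples, "ana")
```

Here `url3` is never suggested to `ana` because its only tag, `health`, is not one she uses; a fuller recommender would also weigh user similarity and tag synonyms such as economics/economy.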