The bX project:
 Federating and Mining Usage Logs from Linking Servers


  Johan Bollen (1), Oren Beit-Arie (2), and Herbert Van de Sompel (1)

  (1) Digital Library Research & Prototyping Team,
      Research Library, Los Alamos National Laboratory
  (2) Ex Libris Inc., Boston, MA




jbollen@lanl.gov , oren@exlibris-usa.com , herbertv@lanl.gov



                       Acknowledgement: Marvin Pollard (CalState)




             The bX Project: Federating and Mining Usage Logs from Linking Servers
                      Johan Bollen, Oren Beit-Arie, Herbert Van de Sompel
                   CNI Fall 2005, December 5th - 6th 2005, Phoenix, Arizona, USA
Outline



1.    Problem statement
2.    Analysis of local usage data
3.    Towards federated usage data
4.    Collaborating on the bX project
5.    Mining federated usage data
6.    Discussion
7.    Conclusion




Scholarly evaluation in an electronic publishing paradigm

•    Scholarly quality evaluated by citation counts
      o    Citable, published literature only
      o    Metrics: citation frequency
      o    Limited resources: what and how we count
•    Electronic paradigm changes everything
      o    New models of communication:
            o  Everything will be published
            o  No central vetting authority
      o    New models of scholarship
            o  Publish multimedia, raw data, software
      o    New metrics of evaluation?

[Diagram: evaluating scholarly quality.
   Paper paradigm: articles, journals; citation data; citation metrics.
   Electronic paradigm: adds IR, pre-print, multimedia, raw data, software, etc.; "? metrics".]



Evaluation of resources: a user-driven revolution
Evaluation of resources (quality, status, prestige) is required on all levels of our digital infrastructure.

Trend:
     1.  author -> user
     2.  frequency -> structure

[Diagram: evaluation methods placed on two axes, authors vs. users and frequentist vs. structural.
   authors, frequentist: IF, citation counts
   authors, structural:  Google’s PR, Technorati
   users, frequentist:   Flickr.org, Slashdot.org
   users, structural:    Amazon.com, collaborative filtering
   Labels: "scholarship evaluation now" (authors side); "novel methods of resource evaluation" and
   "LANL experiments for scholarship evaluation since 1999" (users side).]


Scholarly evaluation: process flow for data analysis




     source -> data -> structure -> metrics -> evaluation


    Usage: user activity that expresses interest or preference
    Access data: particular instance(s) of usage (e.g. request abstract, download full-text)
    Co-access: repeated instances of users accessing same pairs of items (documents)
    Co-access graph: network of co-access data
    Social network metrics: prestige from network structure
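
As an illustration of the step from access data to a co-access graph, here is a minimal Python sketch. It assumes access records are simple (user, document) pairs and counts one co-access per user for every pair of documents that user touched. The slides do not say how sessions or time windows are delimited, so grouping by user alone is an assumption, and the function name `co_access_counts` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_access_counts(events):
    """Count, per unordered document pair, how many users accessed both.

    `events` is an iterable of (user_id, document_id) access records.
    Assumption: all accesses by one user belong together (no time window).
    """
    docs_by_user = {}
    for user, doc in events:
        docs_by_user.setdefault(user, set()).add(doc)

    pairs = Counter()
    for docs in docs_by_user.values():
        # Each unordered pair of distinct documents seen by one user
        # contributes one co-access; sorting gives a canonical pair key.
        for a, b in combinations(sorted(docs), 2):
            pairs[(a, b)] += 1
    return pairs


# Toy access data (made up for illustration only).
events = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "B"), ("u3", "C"),
]
graph = co_access_counts(events)
```

The resulting counter can be read as a weighted, undirected co-access graph: keys are edges, values are edge weights.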




Scholarly evaluation: mining usage data and deriving metrics
     Two essential components to move beyond descriptive usage stats:

     1)  Mine usage patterns for networks of item relationships:
          •  Citation: when A cites B, A and B are related
          •  Usage: when A and B are frequently co-used, they are related




     2)  Structural analysis of resulting networks:
          •  Social network metrics of visibility (in-degree), prestige (PageRank), power
              (betweenness), etc
          •  Mapping techniques: multi-dimensional scaling, self-organizing maps
                                                                   • Kothari (2003). On using page cooccurence …
                                                                   • Kim (2004). A clickstream-based collaborative…
                                                                   • Sarwar (2001). Item-based collaborative filtering
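
The structural metrics named above can be sketched in a few lines of Python. This is a generic power-iteration PageRank and an in-degree count over a weighted directed graph, not the authors' implementation. The damping factor, iteration count, and the toy edge weights are assumptions, and the scores are normalized to sum to 1, which need not match the scaling of the usage PageRank values reported in these slides.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over a weighted directed graph.

    `edges` maps (source, target) to a positive weight.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            # Each node passes its rank along its links, in proportion to weight.
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank


def in_degree(edges):
    """Number of incoming links per node (a simple visibility measure)."""
    degree = {}
    for src, dst in edges:
        degree.setdefault(src, 0)
        degree[dst] = degree.get(dst, 0) + 1
    return degree


# Toy network (made up): B is linked from both A and C.
edges = {("A", "B"): 2.0, ("C", "B"): 1.0, ("B", "A"): 1.0}
scores = pagerank(edges)
visibility = in_degree(edges)
```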


LANL experiments: demonstrating the power of usage data analysis


          •    LANL has been active in this area since early 1999
                o      Early analysis of LANL RL usage data (local) in 1999
                o      Extraction of item networks
                o      Calculation of impact metrics (social network approach)
          •    Preliminary success
                o      Demonstrated valid journal and article networks
                o      Surprising success in ranking of items according to institutional focus
                o      Discovery of hidden interest groups and foci
          •    Next two slides: recent results
                o      February 2004 to April 2005
                o      392,455 usage events: any indication of preferences/interest
                o      5,866 users
                o      330,109 articles
                o      10,695 journals
          •    See publication list at end for more information



A comparison of 2004 LANL usage data and citation Impact Factor

  rank   Usage (PageRank)   IF (2003)   ISSN        Title
  1      60.196             7.035       0031-9007   PHYS REV LETT
  2      37.568             2.950       0021-9606   J CHEM PHYS
  3      34.618             1.179       0022-3115   J NUCL MATER
  4      31.132             2.202       1063-651X   PHYS REV E
  5      30.441             2.171       0021-8979   J APPL PHYS
  6      30.128             30.979      0028-0836   NATURE
  7      29.972             29.781      0036-8075   SCIENCE
  8      27.187             6.516       0002-7863   J AM CHEM SOC
  9      24.602             4.049       0003-6951   APPL PHYS LETT
  10     23.631             2.992       0148-0227   J GEOPHYS RES

                                                                                              Green: convergent
                                                                                              Red: divergent
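
One way to quantify how far the two rankings agree is a rank correlation. The sketch below computes Spearman's rho over only the ten rows shown in the table above. That choice is an assumption made for illustration: the slides do not say which statistic, if any, the authors used, and a value computed on a top-10 excerpt says little about the full journal set. The formula used assumes no tied values, which holds for these ten rows.

```python
def ranks(values):
    """Rank positions (1 = largest value); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman(xs, ys):
    """Spearman rank correlation for equal-length sequences without ties."""
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Values copied from the ten table rows above, in row order.
usage = [60.196, 37.568, 34.618, 31.132, 30.441,
         30.128, 29.972, 27.187, 24.602, 23.631]
impact = [7.035, 2.950, 1.179, 2.202, 2.171,
          30.979, 29.781, 6.516, 4.049, 2.992]

rho = spearman(usage, impact)
```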

Information landscapes




[Figure: two landscape maps, one per data set: LANL 2004 Usage Data and ISI Journal Citation Reports 2003]

       •  Two component model
       •  Principal Component 1: Life vs. natural science
       •  Principal Component 2: Microscopic vs. macroscopic
       •  Z-axis: cluster density


From local usage data to global usage data

•  Local usage is interesting
      •  Informs local collection management
      •  Prominent communities can inform assessments of science trends
      •  Covers wide range of communication items
      •  Immediate availability
•  Global, aggregated usage data is even more interesting
      •  Monitor science as it takes place
      •  Replace/augment/validate proprietary data sets
      •  Allow free-form aggregation:
            •  Clusters of institutions
            •  Focus on sub-domains and communities

[Diagram labels: LANL, ISI core, Scholarly communication]




Local aggregation of usage data: linking servers

[Diagram: a link resolver connected to ISI, publisher sites, ingenta, full-text DBs, Ehost, EJS, British Library, an A&I service, and Google.]

•    Linking servers can record activities across multiple OpenURL-enabled information sources of a specific digital library environment
•    Linking server logs are representative of the activities of a particular user population
•    Allows recording of clickstream data: other methods of log aggregation cannot connect “same user, different system” streams




Global aggregation of usage data

[Diagram: three link resolvers, each with its own log repository (Log Repository 1, 2, 3), send usage logs to an aggregated log DB holding the aggregated usage data.]

•    Aggregation of linking server logs leads to a data set representative of a large sample of the scholarly community
•    Global really means different samples of the scholarly community
      •  Can be fine-tuned for local communities
      •  Possibility of truly global coverage




Analysis and services based on global usage data

[Diagram: three link resolvers, each with its own log repository (Log Repository 1, 2, 3), send usage logs to an aggregated log DB. Data mining of the aggregated usage data yields item relations, which feed metrics and services.]

Services:
•  Recommender Services
•  Analysis services
•  Collection management
•  Trend analysis




bX project: standards-based aggregation of usage data

[Diagram: three link resolvers each write OpenURL ContextObjects (CO) to their own log repository (Log Repository 1, 2, 3). A log harvester at the service provider collects them into an aggregated log DB.]

Usage log aggregation via OAI-PMH

Log Repository properties:
•  OAI-PMH metadata record:
      •  linking server event log for specific document in specific session
      •  expressed using OpenURL XML ContextObject Format
•  OAI-PMH identifier: UUID for event
•  OAI-PMH datestamp: datetime the event was added to the Log Repository
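
To make the harvesting step concrete, the following Python sketch builds an OAI-PMH `ListRecords` request of the kind a log harvester could send to a log repository. The verb and the `metadataPrefix`, `from`, and `resumptionToken` arguments come from the OAI-PMH protocol. The base URL and the `ctxo` prefix value are placeholders invented for this example, not values used by the bX project. Because the datestamp records when an event was added to the repository, a `from` date lets a harvester ask only for events added since its last run.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb may be sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective harvesting: only records with datestamp >= from_date.
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository address and metadata prefix.
url = list_records_url("http://logrepo.example.org/oai", "ctxo", from_date="2005-06-01")
```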




bX project: OpenURL ContextObject to represent usage data

Annotations:
* Event information: event datetime, globally unique event ID
* Referent: identifier, metadata
* Requester: user or user proxy (IP, session, …)
* ServiceType
* Resolver: identifier of linking server

   <?xml version="1.0" encoding="UTF-8"?>
   <ctx:context-object
      timestamp="2005-06-01T10:22:33Z" …
      identifier="urn:UUID:58f202ac-22cf-11d1-b12d-002035b29062" …>
   …
   <ctx:referent>
      <ctx:identifier>info:pmid/12572533</ctx:identifier>
      <ctx:metadata-by-val>
         <ctx:format>info:ofi/fmt:xml:xsd:journal</ctx:format>
         <ctx:metadata>
            <jou:journal xmlns:jou="info:ofi/fmt:xml:xsd:journal"> …
            <jou:atitle>Isolation of common receptor for coxsackie B …
            <jou:jtitle>Science</jou:jtitle>
   …
   </ctx:referent>
   …
      <ctx:requester>
            <ctx:identifier>urn:ip:63.236.2.100</ctx:identifier>
      </ctx:requester>
   …
      <ctx:service-type>
         …
         <full-text>yes</full-text>
         …
      </ctx:service-type>
      …
       Resolver…
       Referrer…
       ….
   </ctx:context-object>
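
A harvested ContextObject can be reduced to a usage event with a few lines of Python. The sketch below parses a deliberately cut-down record that keeps only fields visible on the slide (event timestamp and identifier, referent identifier, requester identifier); it does not reconstruct the elided parts of the original. The `ctx` namespace URI is assumed by analogy with the `jou` namespace shown above, and the function name `parse_event` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the ctx prefix (not shown on the slide).
CTX_NS = {"ctx": "info:ofi/fmt:xml:xsd:ctx"}

# Simplified stand-in record, limited to fields that appear on the slide.
SAMPLE = """
<ctx:context-object xmlns:ctx="info:ofi/fmt:xml:xsd:ctx"
    timestamp="2005-06-01T10:22:33Z"
    identifier="urn:UUID:58f202ac-22cf-11d1-b12d-002035b29062">
  <ctx:referent>
    <ctx:identifier>info:pmid/12572533</ctx:identifier>
  </ctx:referent>
  <ctx:requester>
    <ctx:identifier>urn:ip:63.236.2.100</ctx:identifier>
  </ctx:requester>
</ctx:context-object>
"""


def parse_event(xml_text):
    """Extract the fields needed for co-access analysis from one ContextObject."""
    root = ET.fromstring(xml_text.strip())
    return {
        "event_id": root.get("identifier"),
        "timestamp": root.get("timestamp"),
        "referent": root.findtext("ctx:referent/ctx:identifier", namespaces=CTX_NS),
        "requester": root.findtext("ctx:requester/ctx:identifier", namespaces=CTX_NS),
    }


event = parse_event(SAMPLE)
```

The (requester, referent) pair is the minimum needed to group accesses by user and derive co-access relations.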


bX project: analysis and services based on aggregated usage data

[Diagram: three link resolvers each write OpenURL ContextObjects (CO) to their own log repository (Log Repository 1, 2, 3). A log harvester at the service provider collects them into an aggregated log DB. Data mining of the aggregated logs yields item relations, which feed metrics and services.]

Services:
•  Recommender Services
•  Analysis services
•  Collection management
•  Trend analysis




bX project: analysis and services based on aggregated usage data



    •    Data mining:
          o    Derive document relationships from access sequences
          o    Use common techniques: clickstream data mining and association rule learning
    •    Metrics:
          o    Recommender systems: item-based collaborative filtering and spreading activation
          o    Common social network metrics of impact, prestige, prominence, etc.
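
Item-based collaborative filtering, one of the techniques listed above, can be sketched as follows. This is a generic textbook variant that uses cosine similarity over binary "user accessed document" data; the slides do not specify the similarity measure or weighting the project used, so both are assumptions, as are the function names and the toy data.

```python
from math import sqrt


def item_similarities(docs_by_user):
    """Cosine similarity between documents from binary access data.

    `docs_by_user` maps a user id to the set of documents that user accessed.
    """
    users_by_doc = {}
    for user, docs in docs_by_user.items():
        for doc in docs:
            users_by_doc.setdefault(doc, set()).add(user)

    sims = {}
    docs = sorted(users_by_doc)
    for i, a in enumerate(docs):
        for b in docs[i + 1:]:
            shared = len(users_by_doc[a] & users_by_doc[b])
            if shared:
                score = shared / sqrt(len(users_by_doc[a]) * len(users_by_doc[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims


def recommend(doc, sims, top_n=3):
    """Documents most similar to `doc`, best first (ties broken by name)."""
    scored = [(other, s) for (d, other), s in sims.items() if d == doc]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Toy access data (made up for illustration only).
docs_by_user = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"C", "D"},
}
sims = item_similarities(docs_by_user)
suggestions = recommend("A", sims)
```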




Partners and collaborations: Ex Libris/SFX

•    Launched SFX in March 2001
•    Co-developed the OpenURL
•    About 900 libraries in 36 countries
      o    66% are members of consortia
      o    74 ARL libraries (60%)
      o    Central and Local hosting
      o    Growing usage
•    Extensive usage logs
•    Some relevant features:
      o    Support for Z39.88-2004 (OpenURL 1.0)
             -  SAP1 and SAP2
             -  Internal representation of Context Object
      o    Supports various consortia models
             -  Supports distributive linking environments
•    Involvement in bX:
      o    Enabling role for research and development
      o    Enhanced SFX to facilitate experimentation
      o    Facilitate access to usage data sources


Partners and collaborations: CalState


•    23 campuses and seven off-campus centers
•    409,000 students
•    44,000 faculty and staff
•    SFX live since Fall 2002
•    SFX consortium model: 23 instances (for each of the campuses) + 1 shared (the Chancellor’s Office, for shared resources)
•    Involvement in bX: provided access to usage data for experimentation in framework of bX project




Mining federated usage data: CalState experiments

This is not pie in the sky: we have actually done it!

•  Collaboration with CalState system via Ex Libris:
       •  23 campuses, seven off-campus centers, 409,000 students, and 44,000 faculty and staff
•  CalState collaborator and point of contact:
       •  Marvin Pollard (Chancellor’s office)
•  Recorded usage includes all requests for which a merged SFX menu has been presented:
       •  Full-text requests
       •  Abstract requests
       •  Any expression of user interest
•  Present analysis covers 9 CalState institutions:
       •  Chancellor, CPSLO, Los Angeles, Northridge, Sacramento, San Jose, San Marcos, SDSU, and SFSU
       •  167,204 individuals, 3,507,484 accesses, 2,133,556 documents, Nov. 2003 - Aug. 2005



Some statistics: the academic rhythm
[Figure: usage activity over time, annotated with "Sleep-in" and "Work late", and with "Spring break", "Summer", and "Fall Semester".]


Results: journal ranking

rank   Usage (PageRank)   IF (2003)   ISSN        Title
1      78.565             21.455      0098-7484   JAMA-J AM MED ASSOC
2      71.414             29.781      0036-8075   SCIENCE
3      60.373             30.979      0028-0836   NATURE
4      40.828             3.779       0890-8567   J AM ACAD CHILD PSY
5      39.708             7.157       0002-953X   AM J PSYCHIAT
6      38.113             34.833      0028-4793   NEW ENGL J MED
7      37.492             3.363       0090-0036   AM J PUBLIC HEALTH
8      37.031             2.591       0195-9131   MED SCI SPORT EXER
9      27.248             0.998       0309-2402   J ADV NURS
10     26.987             5.692       0002-9165   AM J CLIN NUTR

                                                                                        Green: convergent
                                                                                        Red: divergent


Comparison of journal usage PageRank and citation Impact Factor




[Figure; labelled region: COMPUTER SCIENCE]




Comparison of journal usage PageRank and citation Impact Factor

[Figure; labels: PSYCHOLOGY, PSYCHIATRY]

Mapping the structure of science

[Figure; labels: PSYCHOLOGY, PSYCHIATRY; NEWS; PUBLIC HEALTH, FAMILY]

Usage-based recommender system

•    Operates on network derived from aggregated usage data
•    Starts from (set of) documents (articles or journals)
•    Scans usage network links for directly and indirectly related documents
•    Results:
      o    Scalable
      o    Highly efficient
      o    Highly relevant results derived from accumulated, aggregated usage data

[Movie: article level recommendations]
[Movie: journal level recommendations]
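The bullets above describe starting from a set of documents and scanning usage-network links for directly and indirectly related ones. The sketch below shows one way such a scan could work. It is not the algorithm used in the project, which the slides do not specify. The network, the document identifiers, the `decay` factor and the function name `recommend` are all assumptions made for illustration.

```python
from collections import defaultdict


def recommend(network, seeds, max_hops=2, decay=0.5, top_n=5):
    """Score documents reachable from the seeds within max_hops links.

    network maps a document to {neighbour: link weight}, where a weight
    stands for how strongly two documents co-occur in usage sessions.
    Links followed at hop h contribute activation * weight * decay**h,
    so indirect relations count for less than direct ones.
    """
    seed_set = set(seeds)
    scores = defaultdict(float)
    frontier = {seed: 1.0 for seed in seed_set}
    for hop in range(max_hops):
        next_frontier = defaultdict(float)
        for node, activation in frontier.items():
            for neighbour, weight in network.get(node, {}).items():
                next_frontier[neighbour] += activation * weight * (decay ** hop)
        for node, value in next_frontier.items():
            if node not in seed_set:  # never recommend the starting documents
                scores[node] += value
        frontier = next_frontier
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical usage network over four documents A-D.
usage_network = {
    "A": {"B": 0.8, "C": 0.2},
    "B": {"A": 0.8, "D": 0.5},
    "C": {"A": 0.2, "D": 0.1},
    "D": {"B": 0.5, "C": 0.1},
}

recommendations = recommend(usage_network, ["A"])
for document, score in recommendations:
    print(f"{document}: {score:.2f}")
```

Starting from A, the directly linked B scores highest. D is only reachable through B and C, yet it still outranks the weakly linked C, which is the kind of indirect relation the slide refers to.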




General issues

•    Privacy and other legal issues involved in large-scale usage recording: user and
     session identification, legal implications of log storage, ownership, retention policies
•    Data validity: usage definition, recording and representation, quality
     benchmarks, falsification issues
•    Metrics: frequency, structure, mappings and trends
•    Aggregation and scalability:
      o    different architectural frameworks: linking server-based, other, scalability,
           anonymization issues
      o    social/economic models of aggregation: trusted log repository, incentives,
           sampling issues
•    Log data processing:
      o    Data mining approaches: support from informetric and bibliometric community,
           grouping, isolating and aggregating useful usage patterns
      o    Cross-validation issues: comparison with and validation against citation data,
           data validity metrics
•    Metrics and services: informetric indicators, interfaces with existing bibliometric
     products, definition of end-user services
•    Advocacy, strategies and policies: implications for IR and OA movement
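As an illustration of the user and session identification and anonymization issues listed above, the sketch below replaces identifiers in a log record with keyed hashes before the record leaves the institution. This is not a procedure described in the slides and not a complete privacy solution. The record layout, the field names `user` and `session`, and the helper names are hypothetical. Only the standard-library `hmac` and `hashlib` modules are used.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA256.

    The same identifier and key always yield the same pseudonym, so
    sessions can still be grouped, but the raw value is not exposed.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def scrub_event(event: dict, secret_key: bytes) -> dict:
    """Return a copy of a usage event with identifying fields pseudonymized."""
    cleaned = dict(event)
    for field in ("user", "session"):  # hypothetical field names
        if field in cleaned:
            cleaned[field] = pseudonymize(str(cleaned[field]), secret_key)
    return cleaned


# Hypothetical key held only by the contributing institution.
local_key = b"local-secret-kept-by-the-institution"

# Hypothetical log record; the ISSN is taken from the ranking table.
raw_event = {
    "user": "jdoe",
    "session": "s-1001",
    "issn": "0098-7484",
    "time": "2005-12-05T10:15:00",
}

scrubbed_event = scrub_event(raw_event, local_key)
print(scrubbed_event)
```

Because each institution would keep its own key, pseudonyms from different sources cannot be matched against each other. Whether that is desirable depends on the aggregation model chosen, which the slide leaves open.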


What’s next?

•    Emerging activities in the realm of applications of usage data:
      o    Mellon Foundation workshop on Usage Data, early 2005
      o    DINI meeting Humboldt-Universität zu Berlin
      o    SUSHI: Standardized Usage Statistics Harvesting Initiative (Harvard,
           Thomson Scientific, Cornell, and others)
      o    IRS: Interoperable Repository Statistics (U. Southampton)

•    LANL and Ex Libris exploring further collaboration in the realm of bX




Conclusion

•    Scholarly communication is going through a revolution
•    Scholarly evaluation will too! Focus will be on
      o    Immediacy
      o    Representativeness
      o    Openness, standards and scalability
      o    Acknowledging structural aspects of prestige and impact in the
           scholarly community
•    User driven evaluation offers an interesting alternative to current
     short-front evaluation methods in a long-tail world

•    Feasibility of usage analysis demonstrated at local and semi-global level
      o    LANL results indicate:
            -  Possibility of local prestige and impact ranking
            -  Additional usage-based services such as recommender systems possible
      o    bX project on aggregated data and analysis:
            -  Large-scale aggregation demonstrated scalability
            -  Use of existing standards ensures openness, ability of all to participate
            -  Possibility of spontaneous emergence of vetting and standardization
               system for usage quality indicators
            -  Enticing community and global recommender services offer further
               incentives to adopt locally and collaborate globally



Some papers:

•    J. Bollen, H. Van de Sompel, J. Smith, and R. Luce. Toward alternative
     metrics of journal impact: a comparison of download and citation data.
     Information Processing and Management, 41(6):1419-1440, 2005.
      o    http://dx.doi.org/10.1016/j.ipm.2005.03.024

•    J. Bollen, R. Luce, S. Vemulapalli, and W. Xu. Detecting research trends in
     digital library readership. In Proceedings of the Seventh European Conference
     on Digital Libraries (LNCS 2769), pages 24-28, Trondheim, Norway, August
     18 2003. Springer-Verlag.
      o    http://www.springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=2769&spage=24

•    J. Bollen, R. Luce, S. Vemulapalli, and W. Xu. Usage analysis for the
     identification of research trends in digital libraries. D-Lib Magazine, 9(5),
     2003.
      o    http://www.dlib.org/dlib/may03/bollen/05bollen.html




                       The bX Project: Federating and Mining Usage Logs from Linking Servers
                                Johan Bollen, Oren Beit-Arie, Herbert Van de Sompel
                             CNI Fall 2005, December 5th - 6th 2005, Phoenix, Arizona, USA

Contenu connexe

Tendances

NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
National Information Standards Organization (NISO)
 
DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
DSNotify - Detecting and Fixing Broken Links in Linked Data SetsDSNotify - Detecting and Fixing Broken Links in Linked Data Sets
DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
Bernhard Haslhofer
 

Tendances (20)

Linked Data and Sevices
Linked Data and SevicesLinked Data and Sevices
Linked Data and Sevices
 
Linked Data as a new environment for Learning Analytics and education
Linked Data as a new environment  for Learning Analytics and educationLinked Data as a new environment  for Learning Analytics and education
Linked Data as a new environment for Learning Analytics and education
 
Semantic Web, Linked Data and Education: A Perfect Fit?
Semantic Web, Linked Data and Education: A Perfect Fit?Semantic Web, Linked Data and Education: A Perfect Fit?
Semantic Web, Linked Data and Education: A Perfect Fit?
 
Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013Hiberlink: Investigating Reference Rot, December 2013
Hiberlink: Investigating Reference Rot, December 2013
 
British Library Seminar: Shared Canvas (September 2011)
British Library Seminar: Shared Canvas (September 2011)British Library Seminar: Shared Canvas (September 2011)
British Library Seminar: Shared Canvas (September 2011)
 
Babouk: Focused Web Crawling for Corpus Compilation and Automatic Terminology...
Babouk: Focused Web Crawling for Corpus Compilation and Automatic Terminology...Babouk: Focused Web Crawling for Corpus Compilation and Automatic Terminology...
Babouk: Focused Web Crawling for Corpus Compilation and Automatic Terminology...
 
Linked data in the digital humanities skills workshop for realising the oppo...
Linked data in the digital humanities  skills workshop for realising the oppo...Linked data in the digital humanities  skills workshop for realising the oppo...
Linked data in the digital humanities skills workshop for realising the oppo...
 
W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2W4 4 marc-alexandre-nolin-v2
W4 4 marc-alexandre-nolin-v2
 
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
NISO Forum, Denver, Sept. 24, 2012: Opening Keynote: The Many and the One: BC...
 
DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
DSNotify - Detecting and Fixing Broken Links in Linked Data SetsDSNotify - Detecting and Fixing Broken Links in Linked Data Sets
DSNotify - Detecting and Fixing Broken Links in Linked Data Sets
 
BHL hardware architecture - storage and clusters
BHL hardware architecture - storage and clustersBHL hardware architecture - storage and clusters
BHL hardware architecture - storage and clusters
 
Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...Linked Data at the Open University: From Technical Challenges to Organization...
Linked Data at the Open University: From Technical Challenges to Organization...
 
Evolving the Web into a Giant Global Database
Evolving the Web into a Giant Global DatabaseEvolving the Web into a Giant Global Database
Evolving the Web into a Giant Global Database
 
Crushing, Blending, and Stretching Transactional Data
Crushing, Blending, and Stretching Transactional DataCrushing, Blending, and Stretching Transactional Data
Crushing, Blending, and Stretching Transactional Data
 
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
NISO Forum, Denver, Sept. 24, 2012: EZID: Easy dataset identification & manag...
 
Distributed Graph Databases and the Emerging Web of Data
Distributed Graph Databases and the Emerging Web of DataDistributed Graph Databases and the Emerging Web of Data
Distributed Graph Databases and the Emerging Web of Data
 
Doing Clever Things with the Semantic Web
Doing Clever Things with the Semantic WebDoing Clever Things with the Semantic Web
Doing Clever Things with the Semantic Web
 
Data management for researchers
Data management for researchersData management for researchers
Data management for researchers
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
 
Interpreting Data Mining Results with Linked Data for Learning Analytics
Interpreting Data Mining Results with Linked Data for Learning AnalyticsInterpreting Data Mining Results with Linked Data for Learning Analytics
Interpreting Data Mining Results with Linked Data for Learning Analytics
 

En vedette

Open Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & ExchangeOpen Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & Exchange
Herbert Van de Sompel
 
Augmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositoriesAugmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositories
Herbert Van de Sompel
 
Attempts at innovation in scholarly communication
Attempts at innovation in scholarly communicationAttempts at innovation in scholarly communication
Attempts at innovation in scholarly communication
Herbert Van de Sompel
 
Motivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationMotivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustration
Herbert Van de Sompel
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
Herbert Van de Sompel
 

En vedette (18)

MESUR: Making sense and use of usage data
MESUR: Making sense and use of usage dataMESUR: Making sense and use of usage data
MESUR: Making sense and use of usage data
 
Open Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & ExchangeOpen Archives Initiative Object Re-Use & Exchange
Open Archives Initiative Object Re-Use & Exchange
 
Augmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositoriesAugmenting interoperability across scholarly repositories
Augmenting interoperability across scholarly repositories
 
The djatoka Image Server
The djatoka Image ServerThe djatoka Image Server
The djatoka Image Server
 
the UPS protoproto project
the UPS protoproto projectthe UPS protoproto project
the UPS protoproto project
 
The Roof is on Fire
The Roof is on FireThe Roof is on Fire
The Roof is on Fire
 
Attempts at innovation in scholarly communication
Attempts at innovation in scholarly communicationAttempts at innovation in scholarly communication
Attempts at innovation in scholarly communication
 
An HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked DataAn HTTP-Based Versioning Mechanism for Linked Data
An HTTP-Based Versioning Mechanism for Linked Data
 
The aDORe Federation Architecture
The aDORe Federation ArchitectureThe aDORe Federation Architecture
The aDORe Federation Architecture
 
The Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communicationThe Web as infrastructure for scholarly research and communication
The Web as infrastructure for scholarly research and communication
 
Motivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustrationMotivation, inspiration and innovation from frustration
Motivation, inspiration and innovation from frustration
 
Memento: Time Travel for the Web
Memento: Time Travel for the WebMemento: Time Travel for the Web
Memento: Time Travel for the Web
 
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDTDBpedia Archive using Memento, Triple Pattern Fragments, and HDT
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT
 
A Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly RecordA Perspective on Archiving the Scholarly Record
A Perspective on Archiving the Scholarly Record
 
PID Signposting Pattern
PID Signposting PatternPID Signposting Pattern
PID Signposting Pattern
 
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the PastMemento: Big Leaps Towards Seamless Navigation of the Web of the Past
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past
 
Towards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication SystemTowards a Machine-Actionable Scholarly Communication System
Towards a Machine-Actionable Scholarly Communication System
 
Untitled I: Challenges ahead
Untitled I: Challenges aheadUntitled I: Challenges ahead
Untitled I: Challenges ahead
 

Similaire à The bX project: Federating and Mining Usage Logs from Linking Servers

Making project data avalialble eNanomapper through Database
Making project data avalialble eNanomapper through  DatabaseMaking project data avalialble eNanomapper through  Database
Making project data avalialble eNanomapper through Database
Nina Jeliazkova
 
Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16
GenomeInABottle
 
Deep behavioral phenotyping in functional MRI for cognitive mapping of the hu...
Deep behavioral phenotyping in functional MRI for cognitive mapping of the hu...Deep behavioral phenotyping in functional MRI for cognitive mapping of the hu...
Deep behavioral phenotyping in functional MRI for cognitive mapping of the hu...
Ana Luísa Pinho
 
Anvita Wisp 2007 Presentation
Anvita Wisp 2007 PresentationAnvita Wisp 2007 Presentation
Anvita Wisp 2007 Presentation
guest6e7a1b1
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
butest
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Spark Summit
 
Presentation Doctoral Consortium EuroITV2009 - Audiovisual cultural heritage:...
Presentation Doctoral Consortium EuroITV2009 - Audiovisual cultural heritage:...Presentation Doctoral Consortium EuroITV2009 - Audiovisual cultural heritage:...
Presentation Doctoral Consortium EuroITV2009 - Audiovisual cultural heritage:...
Guido Ongena
 
Anvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 PresentationAnvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 Presentation
guest6e7a1b1
 

Similaire à The bX project: Federating and Mining Usage Logs from Linking Servers (20)

Making project data avalialble eNanomapper through Database
Making project data avalialble eNanomapper through  DatabaseMaking project data avalialble eNanomapper through  Database
Making project data avalialble eNanomapper through Database
 
Statistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data UsageStatistical Analysis of Web of Data Usage
Statistical Analysis of Web of Data Usage
 
Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16Genome in a Bottle Consortium Workshop Welcome Aug. 16
Genome in a Bottle Consortium Workshop Welcome Aug. 16
 
Crushing, Blending, and Stretching Data
Crushing, Blending, and Stretching DataCrushing, Blending, and Stretching Data
Crushing, Blending, and Stretching Data
 
Deep behavioral phenotyping in functional MRI for cognitive mapping of the hu...
Deep behavioral phenotyping in functional MRI for cognitive mapping of the hu...Deep behavioral phenotyping in functional MRI for cognitive mapping of the hu...
Deep behavioral phenotyping in functional MRI for cognitive mapping of the hu...
 
Anvita Wisp 2007 Presentation
Anvita Wisp 2007 PresentationAnvita Wisp 2007 Presentation
Anvita Wisp 2007 Presentation
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013EarthCube Monthly Community Webinar- Nov. 22, 2013
EarthCube Monthly Community Webinar- Nov. 22, 2013
 
Information Quality in the Web Era
Information Quality in the Web EraInformation Quality in the Web Era
Information Quality in the Web Era
 
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
[CVPR 2018] Utilizing unlabeled or noisy labeled data (classification, detect...
 
Trip Report Seattle
Trip Report SeattleTrip Report Seattle
Trip Report Seattle
 
g-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionalityg-Social - Enhancing e-Science Tools with Social Networking Functionality
g-Social - Enhancing e-Science Tools with Social Networking Functionality
 
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
Escaping Flatland: Interactive High-Dimensional Data Analysis in Drug Discove...
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
Learnometrics: Metrics for Learning Objects
Learnometrics: Metrics for Learning ObjectsLearnometrics: Metrics for Learning Objects
Learnometrics: Metrics for Learning Objects
 
Presentation Doctoral Consortium EuroITV2009 - Audiovisual cultural heritage:...
Presentation Doctoral Consortium EuroITV2009 - Audiovisual cultural heritage:...Presentation Doctoral Consortium EuroITV2009 - Audiovisual cultural heritage:...
Presentation Doctoral Consortium EuroITV2009 - Audiovisual cultural heritage:...
 
NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC
NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYCNYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC
NYC Data Web (static version) - A Semantic, Open Public Data Exchange for NYC
 
STI Summit 2011 - Mlr-sm
STI Summit 2011 - Mlr-smSTI Summit 2011 - Mlr-sm
STI Summit 2011 - Mlr-sm
 
Anvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 PresentationAnvita Ncvpripg 2008 Presentation
Anvita Ncvpripg 2008 Presentation
 

Plus de Herbert Van de Sompel

ResourceSync Overview
ResourceSync OverviewResourceSync Overview
ResourceSync Overview
Herbert Van de Sompel
 

Plus de Herbert Van de Sompel (20)

The web is rotting and what to do about it
The web is rotting and what to do about itThe web is rotting and what to do about it
The web is rotting and what to do about it
 
Researcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized WebResearcher Pod: Scholarly Communication Using the Decentralized Web
Researcher Pod: Scholarly Communication Using the Decentralized Web
 
Persistent Identification: Easier Said than Done
Persistent Identification: Easier Said than DonePersistent Identification: Easier Said than Done
Persistent Identification: Easier Said than Done
 
FAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning IssueFAIR Signposting: A KISS Approach to a Burning Issue
FAIR Signposting: A KISS Approach to a Burning Issue
 
Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)Registration / Certification Interoperability Architecture (overlay peer-review)
Registration / Certification Interoperability Architecture (overlay peer-review)
 
Collecting the organizational scholarly record
Collecting the organizational scholarly recordCollecting the organizational scholarly record
Collecting the organizational scholarly record
 
To the Rescue of Scholarly Orphans
To the Rescue of Scholarly OrphansTo the Rescue of Scholarly Orphans
To the Rescue of Scholarly Orphans
 
Almost two decades at LANL
Almost two decades at LANLAlmost two decades at LANL
Almost two decades at LANL
 
Perseverance on Persistence
Perseverance on PersistencePerseverance on Persistence
Perseverance on Persistence
 
Paul Evan Peters Lecture
Paul Evan Peters LecturePaul Evan Peters Lecture
Paul Evan Peters Lecture
 
Achieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed CollectionsAchieving Link Integrity for Managed Collections
Achieving Link Integrity for Managed Collections
 
Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)Signposting Overview (Version November 2017)
Signposting Overview (Version November 2017)
 
Signposting Overview
Signposting OverviewSignposting Overview
Signposting Overview
 
Interoperability for web based scholarship
Interoperability for web based scholarshipInteroperability for web based scholarship
Interoperability for web based scholarship
 
Reminiscing about interoperability
Reminiscing about interoperabilityReminiscing about interoperability
Reminiscing about interoperability
 
Creating Pockets of Persistence
Creating Pockets of PersistenceCreating Pockets of Persistence
Creating Pockets of Persistence
 
ResourceSync Quick Overview
ResourceSync Quick OverviewResourceSync Quick Overview
ResourceSync Quick Overview
 
Memento 101
Memento 101Memento 101
Memento 101
 
ResourceSync Overview
ResourceSync OverviewResourceSync Overview
ResourceSync Overview
 
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous MappingPersistent Identifiers and the Web: The Need for an Unambiguous Mapping
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
 

Dernier

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 

The bX project: Federating and Mining Usage Logs from Linking Servers

  • 1. The bX project: Federating and Mining Usage Logs from Linking Servers
      Johan Bollen (1), Oren Beit-Arie (2), and Herbert Van de Sompel (1)
      (1) Digital Library Research & Prototyping Team, Research Library, Los Alamos National Laboratory
      (2) Ex Libris Inc., Boston, MA
      jbollen@lanl.gov, oren@exlibris-usa.com, herbertv@lanl.gov
      Acknowledgement: Marvin Pollard (CalState)
      CNI Fall 2005, December 5th - 6th 2005, Phoenix, Arizona, USA
  • 2. Outline
      1. Problem statement
      2. Analysis of local usage data
      3. Towards federated usage data
      4. Collaborating on the bX project
      5. Mining federated usage data
      6. Discussion
      7. Conclusion
  • 4. Scholarly evaluation in an electronic publishing paradigm
      •  Scholarly quality evaluated by citation counts
          o Citable, published literature only
          o Metrics: citation frequency
          o Limited resources: what and how we count
      •  Electronic paradigm changes everything
          •  New models of communication
          •  Everything will be published
          •  No central vetting authority
          o New models of scholarship
          o Publish multimedia, raw data, software
          o New metrics of evaluation?
      Diagram: evaluation of scholarly quality
          Paper paradigm: articles, journals: citation data + citation metrics
          Electronic paradigm: IR, pre-print, multimedia, raw data, software, etc: ? metrics
  • 5. Evaluation of resources: a user-driven revolution
      Evaluation of resources (quality, status, prestige) is required on all levels of our digital infrastructure.
      Trend:
      1. author -> user
      2. frequency -> structure
      Diagram (axes: authors vs. users, frequentist vs. structural):
          Scholarship evaluation now: IF, citation counts
          Google’s PR
          Novel methods of resource evaluation: Technorati, Flickr.org, Amazon.com, Slashdot.org, collaborative filtering
          LANL experiments for scholarship evaluation since 1999
  • 7. Scholarly evaluation: process flow for data analysis
      Diagram labels: source data structure metrics evaluation
      •  Usage: user activity that expresses interest or preference
      •  Access data: particular instance(s) of usage (e.g. request abstract, download full-text)
      •  Co-access: repeated instances of users accessing same pairs of items (documents)
      •  Co-access graph: network of co-access data
      •  Social network metrics: prestige from network structure
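The step from access data to a co-access graph can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's actual pipeline: the event tuples, the identifiers and the function name `co_access_graph` are hypothetical, and co-access is simplified to "the same user accessed both documents at some point".

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical access events: (user_or_session_id, document_id).
events = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "B"), ("u3", "C"),
    ("u3", "B"),  # repeat access by the same user; counted once
]

def co_access_graph(events):
    """Count, for each unordered document pair, how many users accessed both."""
    docs_by_user = defaultdict(set)
    for user, doc in events:
        docs_by_user[user].add(doc)
    weights = Counter()
    for docs in docs_by_user.values():
        # Every pair of documents seen by one user adds 1 to that edge weight.
        for a, b in combinations(sorted(docs), 2):
            weights[(a, b)] += 1
    return weights

graph = co_access_graph(events)
for (a, b), weight in sorted(graph.items()):
    print(a, b, weight)
```

The resulting edge weights form the network on which structural metrics can then be computed. A real analysis would likely restrict pairs to accesses within one session or time window.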
  • 8. Scholarly evaluation: mining usage data and deriving metrics
      Two essential components to move beyond descriptive usage stats:
      1) Datamine usage patterns for networks of item relationships:
          •  Citation: when A cites B, A and B are related
          •  Usage: when A and B are frequently co-used, they are related
      2) Structural analysis of resulting networks:
          •  Social network metrics of visibility (in-degree), prestige (PageRank), power (betweenness), etc
          •  Mapping techniques: multi-dimensional scaling, self-organizing maps
      References:
          •  Kothari (2003). On using page cooccurence …
          •  Kim (2004). A clickstream-based collaborative…
          •  Sarwar (2001). Item-based collaborative filtering
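Two of the structural metrics named above, in-degree and PageRank, can be illustrated on a toy network. This is a minimal sketch, assuming a directed, weighted item network stored as a dictionary; the edge data, the damping factor of 0.85 and the fixed iteration count are illustrative choices and not values taken from the slides.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration.

    edges maps (source, target) to a positive weight.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), w in edges.items():
        out_weight[src] += w
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for (src, dst), w in edges.items():
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        rank = new_rank
    return rank

def in_degree(edges):
    """Number of distinct incoming links per node."""
    degree = {n: 0 for edge in edges for n in edge}
    for (_, dst) in edges:
        degree[dst] += 1
    return degree

# Hypothetical item network: weights could be co-usage counts.
edges = {
    ("A", "B"): 2, ("B", "A"): 2,
    ("A", "C"): 1, ("C", "A"): 1,
    ("B", "C"): 2, ("C", "B"): 2,
    ("D", "B"): 1,
}
ranks = pagerank(edges)
degrees = in_degree(edges)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3), degrees[node])
```

In-degree only counts how many items link to a node, while PageRank also weighs how prominent those linking items are, which is why the slide distinguishes visibility from prestige.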
  • 9. LANL experiments: demonstrating the power of usage data analysis
      •  LANL has been active in this area since early 1999
          o Early analysis of LANL RL usage data (local) in 1999
          o Extraction of item networks
          o Calculation of impact metrics (social network approach)
      •  Preliminary success
          o Demonstrated valid journal and article networks
          o Surprising success in ranking of items according to institutional focus
          o Discovery of hidden interest groups and foci
      •  Next two slides: recent results
          o February 2004 to April 2005
          o 392,455 usage events: any indication of preferences/interest
          o 5,866 users
          o 330,109 articles
          o 10,695 journals
      •  See publication list at end for more information
  • 10. A comparison of 2004 LANL usage data and citation Impact Factor
      Rank  Usage (PageRank)  IF (2003)  ISSN       Title
      1     60.196            7.035      0031-9007  PHYS REV LETT
      2     37.568            2.950      0021-9606  J CHEM PHYS
      3     34.618            1.179      0022-3115  J NUCL MATER
      4     31.132            2.202      1063-651X  PHYS REV E
      5     30.441            2.171      0021-8979  J APPL PHYS
      6     30.128            30.979     0028-0836  NATURE
      7     29.972            29.781     0036-8075  SCIENCE
      8     27.187            6.516      0002-7863  J AM CHEM SOC
      9     24.602            4.049      0003-6951  APPL PHYS LETT
      10    23.631            2.992      0148-0227  J GEOPHYS RES
      Legend: green = convergent, red = divergent
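One way to put a number on how far the usage ranking and the Impact Factor ranking diverge is a Spearman rank correlation over the ten rows of this table. The sketch below is an illustration only: the slides do not say that the authors computed this statistic, and ten journals selected by top usage are not a representative sample, so the result should not be read as a general finding.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes no ties (true for the data below)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

def spearman(x, y):
    """Spearman rank correlation for tie-free data: 1 - 6*sum(d^2) / (n(n^2-1))."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Values copied from the ten rows of the table, in row order.
usage_pagerank = [60.196, 37.568, 34.618, 31.132, 30.441,
                  30.128, 29.972, 27.187, 24.602, 23.631]
impact_factor = [7.035, 2.950, 1.179, 2.202, 2.171,
                 30.979, 29.781, 6.516, 4.049, 2.992]

rho = spearman(usage_pagerank, impact_factor)
print(round(rho, 3))
```

For these ten rows the coefficient comes out negative, which matches the visual impression that journals such as NATURE and SCIENCE lead on Impact Factor while physics titles lead on local usage.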
  • 11. Information landscapes
      Panels: LANL 2004 Usage Data; ISI Journal Citation Reports 2003
      •  Two component model
      •  Principal Component 1: Life vs. natural science
      •  Principal Component 2: Microscopic vs. macroscopic
      •  Z-axis: cluster density
  • 13. From local usage data to global usage data
      •  Local usage is interesting
          •  Informs local collection management
          •  Prominent communities can inform assessments of science trends
          •  Covers wide range of communication items
          •  Immediate availability
      •  Global, aggregated usage data is even more interesting
          •  Monitor science as it takes place
          •  Replace/augment/validate proprietary data sets
          •  Allow free-form aggregation:
              •  Clusters of institutions
              •  Focus on sub-domains and core communities
      Diagram labels: LANL, ISI, Scholarly communication
  • 14. Local aggregation of usage data: linking servers
      •  Linking servers can record activities across multiple OpenURL-enabled information sources of a specific digital library environment
      •  Linking server logs are representative of the activities of a particular user population
      •  Allows recording of clickstream data: other methods of log aggregation can not connect “same user, different system” streams
      Diagram labels: ISI, Publisher sites, ingenta, Full text DBs, Link Resolver, Ehost, EJS, British Library, A&I service, Google
  • 15. Global aggregation of usage data
      •  Aggregation of linking server logs leads to data set representative of large sample of scholarly community
      •  Global really means different samples of scholarly community
      •  Can be finetuned for local communities
      •  Possibility of truly global coverage
      Diagram: Link Resolvers write usage logs to Log Repositories 1, 2 and 3; the aggregated logs feed a Log DB (Aggregated Usage Data)
  • 16. Analysis and services based on global usage data
      Diagram: Link Resolvers write usage logs to Log Repositories 1, 2 and 3; the aggregated logs feed a Log DB (Aggregated Usage Data), followed by data mining, item relations, metrics and services
      Services:
      •  Recommender Services
      •  Analysis services
      •  Collection management
      •  Trend analysis
  • 17. bX project: standards-based aggregation of usage data
      Usage log aggregation via OAI-PMH
      Log Repository properties:
      •  OAI-PMH metadata record:
          •  linking server event log for specific document in specific session
          •  expressed using OpenURL XML ContextObject Format
      •  OAI-PMH identifier: UUID for event
      •  OAI-PMH datestamp: datetime the event was added to the Log Repository
      Diagram: Link Resolvers store OpenURL ContextObjects (CO) in Log Repositories 1, 2 and 3; a Log harvester collects them into the aggregated logs (Log DB) used by the Service provider
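Because each Log Repository is exposed through OAI-PMH, a harvester can collect new events incrementally with a `ListRecords` request restricted by datestamp. The sketch below only builds such a request URL and performs no network access. The base URL and the metadata prefix `ctxo` are assumptions made for illustration; the slides do not name the prefix or the endpoints used in bX. The `verb`, `metadataPrefix`, `from` and `set` parameters are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL for selective harvesting."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Only records whose datestamp is on or after this date.
        params["from"] = from_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)

# Hypothetical repository endpoint and metadata prefix.
url = list_records_url(
    "http://logrepo.example.org/oai",
    "ctxo",
    from_date="2005-06-01",
)
print(url)
```

Since the datestamp records when an event was added to the repository, repeating the request with the date of the previous harvest would fetch only the events logged since then.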
  • 18. bX project: OpenURL ContextObject to represent usage data
      <?xml version="1.0" encoding="UTF-8"?>
      <!-- Event information: event datetime, globally unique event ID -->
      <ctx:context-object
          timestamp="2005-06-01T10:22:33Z" …
          identifier="urn:UUID:58f202ac-22cf-11d1-b12d-002035b29062" …>
        …
        <!-- Referent: identifier, metadata -->
        <ctx:referent>
          <ctx:identifier>info:pmid/12572533</ctx:identifier>
          <ctx:metadata-by-val>
            <ctx:format>info:ofi/fmt:xml:xsd:journal</ctx:format>
            <ctx:metadata>
              <jou:journal xmlns:jou="info:ofi/fmt:xml:xsd:journal">
                …
                <jou:atitle>Isolation of common receptor for coxsackie B …
                <jou:jtitle>Science</jou:jtitle>
                …
        </ctx:referent>
        …
        <!-- Requester: user or user proxy (IP, session, …) -->
        <ctx:requester>
          <ctx:identifier>urn:ip:63.236.2.100</ctx:identifier>
        </ctx:requester>
        …
        <!-- ServiceType -->
        <ctx:service-type>
          …
          <full-text>yes</full-text>
          …
        </ctx:service-type>
        …
        <!-- Resolver: identifier of linking server -->
        Resolver…
        Referrer…
        ….
      </ctx:context-object>
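A harvested ContextObject can be reduced to the few fields needed for usage mining: event ID, timestamp, referent and requester. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`. The sample document is a trimmed, complete variant of the slide's excerpt written for this example, and the namespace URI `info:ofi/fmt:xml:xsd:ctx` is an assumption, since the slide elides the namespace declaration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the ctx prefix (not shown on the slide).
NS = {"ctx": "info:ofi/fmt:xml:xsd:ctx"}

# Trimmed sample event; elided parts of the slide are simply left out.
SAMPLE = """<ctx:context-object xmlns:ctx="info:ofi/fmt:xml:xsd:ctx"
    timestamp="2005-06-01T10:22:33Z"
    identifier="urn:UUID:58f202ac-22cf-11d1-b12d-002035b29062">
  <ctx:referent>
    <ctx:identifier>info:pmid/12572533</ctx:identifier>
  </ctx:referent>
  <ctx:requester>
    <ctx:identifier>urn:ip:63.236.2.100</ctx:identifier>
  </ctx:requester>
</ctx:context-object>
"""

def parse_event(xml_text):
    """Extract the fields of one usage event from a ContextObject."""
    root = ET.fromstring(xml_text)
    return {
        "event_id": root.get("identifier"),
        "timestamp": root.get("timestamp"),
        "referent": root.findtext("ctx:referent/ctx:identifier", namespaces=NS),
        "requester": root.findtext("ctx:requester/ctx:identifier", namespaces=NS),
    }

event = parse_event(SAMPLE)
print(event)
```

The (requester, referent) pairs extracted this way are the kind of input from which co-access relations between documents could be derived.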
  • 19. bX project: analysis and services based on aggregated usage data
      Diagram: Link Resolvers store OpenURL ContextObjects (CO) in Log Repositories 1, 2 and 3; a Log harvester collects them into the aggregated logs (Log DB); the Service provider applies data mining, item relations and metrics to deliver services
      Services:
      •  Recommender Services
      •  Analysis services
      •  Collection management
      •  Trend analysis
  • 20. bX project: analysis and services based on aggregated usage data
      •  Data mining:
          o Derive document relationships from access sequences
          o Use common techniques: clickstream datamining and association rule learning
      •  Metrics:
          o Recommender systems: item-based collaborative filtering and spreading activation
          o Common social network metrics of impact, prestige, prominence, etc
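Item-based collaborative filtering, one of the techniques listed above, can be sketched as follows: two documents are similar when largely the same users accessed them, and recommendations for a document are its most similar neighbours. The cosine measure over binary user sets, the toy events and the function names are illustrative assumptions; the slides do not specify which similarity measure bX uses.

```python
from collections import defaultdict
from math import sqrt

def item_similarities(events):
    """Cosine similarity between items, treating each item as a set of users."""
    users_by_item = defaultdict(set)
    for user, item in events:
        users_by_item[item].add(user)
    sims = {}
    items = sorted(users_by_item)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            shared = len(users_by_item[a] & users_by_item[b])
            if shared:
                score = shared / sqrt(len(users_by_item[a]) * len(users_by_item[b]))
                sims[(a, b)] = sims[(b, a)] = score
    return sims

def recommend(item, sims, top_n=3):
    """Return the items most similar to the given one, best first."""
    scored = [(other, s) for (source, other), s in sims.items() if source == item]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]

# Hypothetical access events: (user_id, document_id).
events = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "B"), ("u3", "C"),
    ("u4", "C"), ("u4", "D"),
]
sims = item_similarities(events)
print(recommend("A", sims))
```

Normalising by the number of users per item keeps very popular documents from dominating every recommendation list, which raw co-access counts would not.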
  • 22. Partners and collaborations: Ex Libris/SFX
      •  Launched SFX in March 2001
      •  Co-developed the OpenURL
      •  About 900 libraries in 36 countries
          o 66% are members of consortia
          o 74 ARL libraries (60%)
          o Central and Local hosting
          o Growing usage
      •  Extensive usage logs
      •  Some relevant features:
          o Support for Z39.88-2004 (OpenURL 1.0)
              - SAP1 and SAP2
              - Internal representation of Context Object
          o Supports various consortia models
              - Supports distributive linking environments
      •  Involvement in bX:
          o Enabling role for research and development
          o Enhanced SFX to facilitate experimentation
          o Facilitate access to usage data sources
  • 23. Partners and collaborations: CalState
      •  23 campuses and seven off-campus centers
      •  409,000 students
      •  44,000 faculty and staff
      •  SFX live since Fall 2002
      •  SFX consortium model: 23 instances (for each of the campuses) + 1 shared (the Chancellor’s Office, for shared resources)
      •  Involvement in bX: provided access to usage data for experimentation in framework of bX project
  • 25. Mining federated usage data: CalState experiments
      This is not pie in the sky: we have actually done it!
      •  Collaboration with CalState system via Ex Libris:
          o 23 campuses, seven off-campus centers, 409,000 students, and 44,000 faculty and staff
      •  CalState collaborator and point of contact: Marvin Pollard (Chancellor's office)
      •  Recorded usage includes all requests for which merged SFX menu has been presented:
          o Full-text requests
          o Abstract requests
          o Any expression of user interest
      •  Present analysis covers 9 CalState institutions:
          o Chancellor, CPSLO, Los Angeles, Northridge, Sacramento, San Jose, San Marcos, SDSU, and SFSU
      •  167,204 individuals, 3,507,484 accesses, 2,133,556 documents, Nov. 2003 - Aug. 2005
  • 26. Some statistics: the academic rhythm
      Figure annotations: Work late, Sleep-in; Fall, Spring, Semester break, Summer
  • 27. Results: journal ranking
      Rank  Usage (PageRank)  IF (2003)  ISSN       Title
      1     78.565            21.455     0098-7484  JAMA-J AM MED ASSOC
      2     71.414            29.781     0036-8075  SCIENCE
      3     60.373            30.979     0028-0836  NATURE
      4     40.828            3.779      0890-8567  J AM ACAD CHILD PSY
      5     39.708            7.157      0002-953X  AM J PSYCHIAT
      6     38.113            34.833     0028-4793  NEW ENGL J MED
      7     37.492            3.363      0090-0036  AM J PUBLIC HEALTH
      8     37.031            2.591      0195-9131  MED SCI SPORT EXER
      9     27.248            0.998      0309-2402  J ADV NURS
      10    26.987            5.692      0002-9165  AM J CLIN NUTR
      Legend: green = convergent, red = divergent
  • 28. Comparison of journal usage PageRank and citation Impact Factor
      Figure label: COMPUTER SCIENCE
  • 29. Comparison of journal usage PageRank and citation Impact Factor
      Figure labels: PSYCHOLOGY, PSYCHIATRY
  • 30. Mapping the structure of science
      Figure labels: PSYCHOLOGY, PSYCHIATRY, NEWS, PUBLIC HEALTH, FAMILY
  • 31. Usage-based recommender system
      •  Operates on network derived from aggregated usage data
      •  Starts from (set of) documents (articles or journals)
      •  Scans usage network links for directly and indirectly related documents
      •  Results:
          o Scalable
          o Highly efficient
          o Highly relevant results derived from accumulated, aggregated usage data
      Embedded media: Movie: article level recommendations; Movie: journal level recommendations
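Scanning a usage network for directly and indirectly related documents can be illustrated with a simple spreading activation pass: activation starts at the seed documents, flows along weighted links, and loses strength at every hop. This is a sketch under assumptions, not the recommender shown in the talk. The graph, the decay factor of 0.5 and the two-step limit are hypothetical choices.

```python
def spreading_activation(graph, seeds, steps=2, decay=0.5):
    """Rank documents by activation received from the seed documents.

    graph maps a node to a dict of neighbor -> link weight.
    """
    activation = {seed: 1.0 for seed in seeds}
    total = dict(activation)
    for _ in range(steps):
        spread = {}
        for node, energy in activation.items():
            neighbors = graph.get(node, {})
            out = sum(neighbors.values())
            if out == 0:
                continue
            # Pass a decayed share of this node's energy along each link.
            for neighbor, weight in neighbors.items():
                share = decay * energy * weight / out
                spread[neighbor] = spread.get(neighbor, 0.0) + share
        for node, energy in spread.items():
            total[node] = total.get(node, 0.0) + energy
        activation = spread
    ranked = [(node, energy) for node, energy in total.items() if node not in seeds]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical usage network with co-usage counts as link weights.
graph = {
    "A": {"B": 2, "C": 1},
    "B": {"A": 2, "C": 2, "D": 1},
    "C": {"A": 1, "B": 2},
    "D": {"B": 1},
}
recommendations = spreading_activation(graph, ["A"])
for node, energy in recommendations:
    print(node, round(energy, 3))
```

Document D has no direct link to the seed A and is reached only through B in the second step, so it appears last: an example of an indirectly related document.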
  • 33. General issues
      •  Privacy and other legal issues involved in large-scale usage recording: user and session identification, legal implications of log storage, ownership, retention policies
      •  Data validity: usage definition, recording and representation, quality benchmarks, falsification issues
      •  Metrics: frequency, structure, mappings and trends
      •  Aggregation and scalability:
          o different architectural frameworks: linking server-based, other, scalability, anonymization issues
          o social/economic models of aggregation: trusted log repository, incentives, sampling issues
      •  Log data processing:
          o Datamining approaches: support from informetric and bibliometric community, grouping, isolating and aggregating useful usage patterns
          o Cross-validation issues: comparison and validation to citation data, data validity metrics
      •  Metrics and services: informetric indicators, interfaces with existing bibliometric products, definition of end-user services
      •  Advocacy, strategies and policies: implications for IR and OA movement
  • 34. What’s next?
      •  Emerging activities in the realm of applications of usage data:
          o Mellon Foundation workshop on Usage Data, early 2005
          o DINI meeting Humboldt-Universität zu Berlin
          o SUSHI: Standardized Usage Statistics Harvesting Initiative (Harvard, Thomson Scientific, Cornell, and others)
          o IRS: Interoperable Repository Statistics (U. Southampton)
      •  LANL and Ex Libris exploring further collaboration in the realm of bX
  • 36. Conclusion
      •  Scholarly communication is going through a revolution
      •  Scholarly evaluation will too! Focus will be on
          o Immediacy
          o Representativeness
          o Openness, standards and scalability
          o Acknowledging structural aspects of prestige and impact in the scholarly community
      •  User driven evaluation offers an interesting alternative to current short-front evaluation methods in a long-tail world
      •  Feasibility of usage analysis demonstrated at local and semi-global level
          o LANL results indicate:
              - Possibility of local prestige and impact ranking
              - Additional usage-based services such as recommender systems possible
          o bX project on aggregated data and analysis:
              - Large-scale aggregation demonstrated scalability
              - Use of existing standards ensures openness, ability of all to participate
              - Possibility of spontaneous emergence of vetting and standardization system for usage quality indicators
              - Enticing community and global recommender services offer further incentives to adopt locally and collaborate globally
  • 37. Some papers:
      •  J. Bollen, H. Van de Sompel, J. Smith, and R. Luce. Toward alternative metrics of journal impact: a comparison of download and citation data. Information Processing and Management, 41(6):1419-1440, 2005.
          o http://dx.doi.org/10.1016/j.ipm.2005.03.024
      •  J. Bollen, R. Luce, S. Vemulapalli, and W. Xu. Detecting research trends in digital library readership. In Proceedings of the Seventh European Conference on Digital Libraries (LNCS 2769), pages 24-28, Trondheim, Norway, August 18 2003. Springer-Verlag.
          o http://www.springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=2769&spage=24
      •  J. Bollen, R. Luce, S. Vemulapalli, and W. Xu. Usage analysis for the identification of research trends in digital libraries. D-Lib Magazine, 9(5), 2003.
          o http://www.dlib.org/dlib/may03/bollen/05bollen.html