ViBRANT
                                                                             Virtual Biodiversity Research




      ‘Wish you were here before!’
Who gains from collaboration between computer science
                and social research?


 Daphne Duin, David King, Peter van den Besselaar

 Dep. of Organization Sciences & Network Institute, VU-University Amsterdam
        Department of Computing, The Open University, Milton Keynes




          Social Science and Digital Research: Interdisciplinary Insights,
                    March 12, 2012, Oxford e-Research Centre




Help! How is this social data?
 Time taken to serve the request (microseconds)
 Host name (equates to Scratchpad)
 "Full URL" (in quotes)
 Origin of request (IP address)
 F5
 Time the request was received (e.g. 01/Apr/2011:11:17:42 +0100)
 "First line of request" (in quotes)
 Status of final request (e.g. 200, 301, etc.)
 Size of the response in bytes
 Remote logname (almost always blank)
 "Referer" (in quotes)

      able.myspecies.info        http://able.myspecies.info/favicon.ico            24.218.227.223 --               [14/Jul/2010:19:54:06
                GET /favicon.ico HTTP/1.1        200              198              -               Mozilla/5.0 (Macintosh; U; Intel Mac OS X
      10.6; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6
      polychaetes.info           http://polychaetes.info/node/add/forum/forum/                     24.229.196.151 --
                [14/Jul/2010:20:16:48            GET /node/add/forum/forum/ HTTP/1.0               301             -
                http://polychaetes.info/node/add/forum/forum/                      Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x
      4.90; Creative)
      ciliateguide.myspecies.info                http://ciliateguide.myspecies.info/node/add/forum/forum/          24.229.196.151 --
                [14/Jul/2010:20:39:14            GET /node/add/forum/forum/ HTTP/1.0               301             -
                http://ciliateguide.myspecies.info/node/add/forum/forum/           Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;
      MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19 SW)
      ciliateguide.myspecies.info                http://ciliateguide.myspecies.info/node/add/forum/forum           24.229.196.151 --
                [14/Jul/2010:20:39:22            GET /node/add/forum/forum HTTP/1.0                200             25219
                http://ciliateguide.myspecies.info/node/add/forum/forum            Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;
      MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19 SW)
      ciliateguide.myspecies.info                http://ciliateguide.myspecies.info/node/add/forum/forum           24.229.196.151 --
                [14/Jul/2010:20:39:37            POST /node/add/forum/forum HTTP/1.0               200             27128
                http://ciliateguide.myspecies.info/node/add/forum/forum            Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;
      MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19 SW)
      ciliateguide.myspecies.info                http://ciliateguide.myspecies.info/node/add/forum/forum           24.229.196.151 --
                [14/Jul/2010:20:39:47            GET /node/add/forum/forum HTTP/1.0                200             25219
                http://ciliateguide.myspecies.info/node/add/forum/forum            Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1;
      MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19 SW)
 26141          wallacefund.info                 http://wallacefund.info/robots.txt                38.101.148.126 --
                [15/Jul/2010:03:48:42            GET /robots.txt HTTP/1.1          200             44              -               Mozilla/5.0
      (compatible; discobot/1.1; +http://discoveryengine.com/discobot.html
      mhp.myspecies.info         http://mhp.myspecies.info/robots.txt              38.101.148.126 --               [15/Jul/2010:03:48:49
                GET /robots.txt HTTP/1.1         200              44               -               Mozilla/5.0 (compatible; discobot/1.1; +
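
To show how records like the ones above could be turned into analysable fields, here is a minimal Python sketch. The field order (host, full URL, client IP, "--", timestamp, request line, status) is an assumption inferred from the sample records, not a documented format, and the names `RECORD` and `parse_record` are hypothetical.

```python
import re

# Assumed layout, inferred from the sample records on the slide:
# host, full URL, client IP, "--", [timestamp, request line, status code.
RECORD = re.compile(
    r"(?P<host>\S+)\s+(?P<url>https?://\S+)\s+"
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\s+--\s+"
    r"\[(?P<time>[^\s\]]+)\s+"
    r"(?P<method>GET|POST|HEAD)\s+(?P<path>\S+)\s+HTTP/\d\.\d\s+"
    r"(?P<status>\d{3})"
)


def parse_record(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = RECORD.search(line)
    return match.groupdict() if match else None


# One record from the slide, joined onto a single line.
sample = (
    "ciliateguide.myspecies.info "
    "http://ciliateguide.myspecies.info/node/add/forum/forum "
    "24.229.196.151 -- [14/Jul/2010:20:39:37 "
    "POST /node/add/forum/forum HTTP/1.0 200 27128"
)
rec = parse_record(sample)
print(rec["host"], rec["ip"], rec["method"], rec["status"])
```

Once host and IP are separated out, the IP address can be mapped to a provider name, which is the starting point for the user identification discussed in the following slides.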




Interdisciplinary work for e-science
 E-science
 1. Application of an e-infrastructure to do science
 2. The study of the design, uptake and use of e-Science

 E-infrastructure: Scratchpads, online platform for
  biodiversity research

 Need: Developing alternative evaluation metrics for
  e-science

 Goal: Identification of different types of users

 Approach: Collaboration between social science and
  computer science valuable for e-science




What is the impact of e-science?

  Question from e-science facility to social scientists


  Identification of different types of users
         Who is visiting the Scratchpads platform?
         Web data (e.g. server log files)
         Identify Internet Service Providers visiting
         Scratchpads
         Cluster Internet Service Providers visiting
         Scratchpads into meaningful categories




Material
Standard web analytics report of Scratchpads
   >300 community sites
   >5,000 registered users (unpaid)
   Public and closed content



Names of 6,728 unique Internet Service Providers
  (ISPs) (6 months)
  natural history museum
  telstra internet
  verizon online llc
  freie universitaet berlin
  queensland department of natural resources and water
  Gemeente maastricht
  national parks board (ministry of national development)
  agriculture and agrifood canada
  Commission europeenne
  u.s. fish and wildlife service irm/bfo hq
  state of nebraska / office of




Social scientists and computer scientists
 First trying alone…
 ….marina|marine|medical|medisch|microsoft|mineral|mining|ministerie|
   ministry|monsanto|museo|museum|national
   park|naval|navy|nerc|news|novartis|observatoire|office….
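
The term list above reads like a regular-expression alternation. A minimal Python sketch of how such a keyword filter might be applied to ISP names follows. Only the terms visible on the slide are used (the full list is elided there), and treating a match as "identifiable organisation" is an assumption; `TERMS` and `is_relevant` are hypothetical names.

```python
import re

# Only the fragment of the hand-built term list that is visible on the slide.
TERMS = [
    "marina", "marine", "medical", "medisch", "microsoft", "mineral",
    "mining", "ministerie", "ministry", "monsanto", "museo", "museum",
    "national park", "naval", "navy", "nerc", "news", "novartis",
    "observatoire", "office",
]

# Join the terms into one case-insensitive alternation, escaping each term.
PATTERN = re.compile("|".join(re.escape(t) for t in TERMS), re.IGNORECASE)


def is_relevant(isp_name):
    """True if any filter term occurs anywhere in the ISP name (assumed meaning)."""
    return PATTERN.search(isp_name) is not None


names = ["natural history museum", "telstra internet", "Commission europeenne"]
kept = [n for n in names if is_relevant(n)]
print(kept)
```

Because the match is a plain substring search, misspellings and names in other languages slip through unless someone adds a term for them, which is one reason a hand-built list grows long.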


  Then a question to the computer scientist
   ...from the social scientists: could you help us to better...
   • collect web data?
   • refine/cluster the data?
   • develop tools/methods for measuring robustness of
       data?




Altmetrics for e-science: a social science and
computer science project

 “To what extent can we improve a human-developed method
    with computational techniques, in order to cluster ISPs into
    meaningful categories representing the various audiences
    using Scratchpads?”




Method: computer scientist
 Identify Internet Service Providers visiting
 Scratchpads, removing noise
        Inductive logic programming (Aleph)



 Cluster Internet Service Providers visiting Scratchpads
 into meaningful categories
       Bayesian classifier
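
The slide names only a "Bayesian classifier". As an illustration of the general idea, here is a minimal naive Bayes sketch in Python with add-one smoothing over the words of an ISP name. It is not the project's implementation: the class `NaiveBayes` is hypothetical, and the sector labels attached to the training names are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tokens(name):
    """Lower-case, whitespace-separated words of an ISP name."""
    return name.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        for name, label in examples:
            self.class_counts[label] += 1
            for word in tokens(name):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokens(name):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# ISP names from the Material slide; the sector labels are assumed here.
training = [
    ("freie universitaet berlin", "research/edu"),
    ("natural history museum", "research/edu"),
    ("gemeente maastricht", "government"),
    ("queensland department of natural resources and water", "government"),
    ("verizon online llc", "industry"),
    ("telstra internet", "industry"),
]
model = NaiveBayes().fit(training)
prediction = model.predict("department of water resources")
print(prediction)
```

Unlike a keyword filter, the classifier learns which words signal which category from labelled examples, so extending it means labelling more names rather than curating more terms.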




Results: Identification of ISPs
Manually built filter (181 terms)
- accuracy 94%
- precision 92%
- recall 97%
       Many hours of work

Computational filter (6 terms)
- accuracy 84%
- precision 98%
- recall 73%
       Couple of minutes

[Chart: Comparison of filters. Series: 6 term filter set, 181 term filter set. Bars: precision, recall, f-measure.]
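
The chart includes an f-measure, but the slide text gives only precision and recall. Assuming the usual balanced F-measure (the harmonic mean of precision and recall; the slides do not state which variant was used), the values can be derived from the reported figures with a short Python sketch:

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported on the slide.
manual = f_measure(0.92, 0.97)          # 181-term hand-built filter
computational = f_measure(0.98, 0.73)   # 6-term computational filter

print(f"181-term filter: F1 = {manual:.3f}")         # 0.944
print(f"6-term filter:   F1 = {computational:.3f}")  # 0.837
```

Under that assumption, the six-term filter gives up roughly ten points of F-measure in exchange for being produced in minutes rather than hours.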




Results: Clustering ISPs in meaningful
categories

Manual method: filter with key words
“university” “research” “school” “museum”
Problematic!

Computational method: classifiers
- 90% accuracy
       Couple of minutes!

[Chart: ISPs by Sector. Legend: government, industry, media/arts, research/edu.]
[Chart: Classifier Accuracy. Legend text: Simple, Bayes. Tiers: Sector, Level, Focus.]




Who gains from collaboration between
computer science and social research?

  •   E-science facilities, e-science uptake and
      implementation
  •   Social Science and
  •   Computer Science




Acknowledgments

  ViBRANT – http://vbrant.eu
  Scratchpads – http://scratchpads.eu/

  Laura Hollink for her help with the raw log files
  Simon Rycroft for his help with the web analytics reports
  Vince Smith for sharing presentation material

Contenu connexe

En vedette

Rethinking the Functions of a Journal - some case studies from PLoS by Mark P...
Rethinking the Functions of a Journal - some case studies from PLoS by Mark P...Rethinking the Functions of a Journal - some case studies from PLoS by Mark P...
Rethinking the Functions of a Journal - some case studies from PLoS by Mark P...
dduin
 
Cmat powerpoint presentation
Cmat powerpoint presentationCmat powerpoint presentation
Cmat powerpoint presentation
JustBryan
 
Optosem presentation
Optosem presentationOptosem presentation
Optosem presentation
xantec
 
Abbi sw 180 chatroom project
Abbi sw 180 chatroom projectAbbi sw 180 chatroom project
Abbi sw 180 chatroom project
AbbiJohnson
 
งานนำเสนอ1
งานนำเสนอ1งานนำเสนอ1
งานนำเสนอ1
fahasholy
 
EDIT & Scientific Publishing in Natural History Institutions
EDIT & Scientific Publishing in Natural History InstitutionsEDIT & Scientific Publishing in Natural History Institutions
EDIT & Scientific Publishing in Natural History Institutions
dduin
 
Open science and scholarly publishing practices by Daphne Duin
Open science and scholarly publishing practices by Daphne DuinOpen science and scholarly publishing practices by Daphne Duin
Open science and scholarly publishing practices by Daphne Duin
dduin
 
Misconception of biology student
Misconception of biology studentMisconception of biology student
Misconception of biology student
mangkibone
 
Misconception of biology student
Misconception of biology studentMisconception of biology student
Misconception of biology student
mangkibone
 
Enhanced Publications by John Doove
Enhanced Publications by John DooveEnhanced Publications by John Doove
Enhanced Publications by John Doove
dduin
 
Acquisition policy and business models of research libraries in a digital era...
Acquisition policy and business models of research libraries in a digital era...Acquisition policy and business models of research libraries in a digital era...
Acquisition policy and business models of research libraries in a digital era...
dduin
 
Pro and con in smart school1
Pro and con in smart school1Pro and con in smart school1
Pro and con in smart school1
mangkibone
 
Global References index to Biodiversity (GRIB), a bibliographic index of EDIT...
Global References index to Biodiversity (GRIB), a bibliographic index of EDIT...Global References index to Biodiversity (GRIB), a bibliographic index of EDIT...
Global References index to Biodiversity (GRIB), a bibliographic index of EDIT...
dduin
 

En vedette (20)

Rethinking the Functions of a Journal - some case studies from PLoS by Mark P...
Rethinking the Functions of a Journal - some case studies from PLoS by Mark P...Rethinking the Functions of a Journal - some case studies from PLoS by Mark P...
Rethinking the Functions of a Journal - some case studies from PLoS by Mark P...
 
Building development trajectories
Building development trajectoriesBuilding development trajectories
Building development trajectories
 
Cmat powerpoint presentation
Cmat powerpoint presentationCmat powerpoint presentation
Cmat powerpoint presentation
 
Writing CouchDB Views using ClojureScript
Writing CouchDB Views using ClojureScriptWriting CouchDB Views using ClojureScript
Writing CouchDB Views using ClojureScript
 
Optosem presentation
Optosem presentationOptosem presentation
Optosem presentation
 
Abbi sw 180 chatroom project
Abbi sw 180 chatroom projectAbbi sw 180 chatroom project
Abbi sw 180 chatroom project
 
งานนำเสนอ1
งานนำเสนอ1งานนำเสนอ1
งานนำเสนอ1
 
EDIT & Scientific Publishing in Natural History Institutions
EDIT & Scientific Publishing in Natural History InstitutionsEDIT & Scientific Publishing in Natural History Institutions
EDIT & Scientific Publishing in Natural History Institutions
 
Open science and scholarly publishing practices by Daphne Duin
Open science and scholarly publishing practices by Daphne DuinOpen science and scholarly publishing practices by Daphne Duin
Open science and scholarly publishing practices by Daphne Duin
 
Assessing social and economic impacts of building materials
Assessing social and economic impacts of building materialsAssessing social and economic impacts of building materials
Assessing social and economic impacts of building materials
 
Misconception of biology student
Misconception of biology studentMisconception of biology student
Misconception of biology student
 
Misconception of biology student
Misconception of biology studentMisconception of biology student
Misconception of biology student
 
Enhanced Publications by John Doove
Enhanced Publications by John DooveEnhanced Publications by John Doove
Enhanced Publications by John Doove
 
Współczesne procesory
Współczesne procesoryWspółczesne procesory
Współczesne procesory
 
Common book
Common bookCommon book
Common book
 
Acquisition policy and business models of research libraries in a digital era...
Acquisition policy and business models of research libraries in a digital era...Acquisition policy and business models of research libraries in a digital era...
Acquisition policy and business models of research libraries in a digital era...
 
Working Together on the Web, Working Well?
Working Together on the Web, Working Well? Working Together on the Web, Working Well?
Working Together on the Web, Working Well?
 
Pro and con in smart school1
Pro and con in smart school1Pro and con in smart school1
Pro and con in smart school1
 
Historia Procesorów
Historia ProcesorówHistoria Procesorów
Historia Procesorów
 
Global References index to Biodiversity (GRIB), a bibliographic index of EDIT...
Global References index to Biodiversity (GRIB), a bibliographic index of EDIT...Global References index to Biodiversity (GRIB), a bibliographic index of EDIT...
Global References index to Biodiversity (GRIB), a bibliographic index of EDIT...
 

Similaire à Wish you were here before!' Who Gains from Collaboration between Computer Science and Social Research?

Similaire à Wish you were here before!' Who Gains from Collaboration between Computer Science and Social Research? (20)

Sinnott Paper
Sinnott PaperSinnott Paper
Sinnott Paper
 
No specimen (software) left behind
No specimen (software) left behindNo specimen (software) left behind
No specimen (software) left behind
 
TDWG_ ViBRANT_301013
TDWG_ ViBRANT_301013TDWG_ ViBRANT_301013
TDWG_ ViBRANT_301013
 
IoT overview 2014
IoT overview 2014IoT overview 2014
IoT overview 2014
 
Roberts leiden110213
Roberts leiden110213Roberts leiden110213
Roberts leiden110213
 
Structural Biology in the Clouds: A Success Story of 10 years
Structural Biology in the Clouds: A Success Story of 10 yearsStructural Biology in the Clouds: A Success Story of 10 years
Structural Biology in the Clouds: A Success Story of 10 years
 
An introduction to ViBRANT: Virtual Biodiversity Research and Access Network ...
An introduction to ViBRANT: Virtual Biodiversity Research and Access Network ...An introduction to ViBRANT: Virtual Biodiversity Research and Access Network ...
An introduction to ViBRANT: Virtual Biodiversity Research and Access Network ...
 
WSO2 Big Data Platform and Applications
WSO2 Big Data Platform and ApplicationsWSO2 Big Data Platform and Applications
WSO2 Big Data Platform and Applications
 
Community web sites: small pieces loosely joined
Community web sites: small pieces loosely joinedCommunity web sites: small pieces loosely joined
Community web sites: small pieces loosely joined
 
20130503 iCore at calipso workshop fia dublin
20130503 iCore at calipso workshop fia dublin20130503 iCore at calipso workshop fia dublin
20130503 iCore at calipso workshop fia dublin
 
OI in the Public Sector by Esteve Almirall
OI in the Public Sector by Esteve AlmirallOI in the Public Sector by Esteve Almirall
OI in the Public Sector by Esteve Almirall
 
Web open standards for linked data and knowledge graphs as enablers of EU dig...
Web open standards for linked data and knowledge graphs as enablers of EU dig...Web open standards for linked data and knowledge graphs as enablers of EU dig...
Web open standards for linked data and knowledge graphs as enablers of EU dig...
 
Software Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The UglySoftware Pipelines: The Good, The Bad and The Ugly
Software Pipelines: The Good, The Bad and The Ugly
 
Scratchpad training
Scratchpad trainingScratchpad training
Scratchpad training
 
Internet of things (IoT) and big data- r.nabati
Internet of things (IoT) and big data- r.nabatiInternet of things (IoT) and big data- r.nabati
Internet of things (IoT) and big data- r.nabati
 
2020_12_11 «Opening Education with Artificial Intelligence» - Mitja Jermol
2020_12_11 «Opening Education with Artificial Intelligence» - Mitja Jermol2020_12_11 «Opening Education with Artificial Intelligence» - Mitja Jermol
2020_12_11 «Opening Education with Artificial Intelligence» - Mitja Jermol
 
Francis da costa rethinks the internet of things zd_net
Francis da costa rethinks the internet of things   zd_netFrancis da costa rethinks the internet of things   zd_net
Francis da costa rethinks the internet of things zd_net
 
Semantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream DataSemantic Sensor Networks and Linked Stream Data
Semantic Sensor Networks and Linked Stream Data
 
Information Engineering in the Age of the Internet of Things
Information Engineering in the Age of the Internet of Things Information Engineering in the Age of the Internet of Things
Information Engineering in the Age of the Internet of Things
 
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
Introduction to Big Data Analytics: Batch, Real-Time, and the Best of Both Wo...
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 

Dernier (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

Wish you were here before!' Who Gains from Collaboration between Computer Science and Social Research?

  • 1. ViBRANT Virtual Biodiversity Research ‘Wish you were here before!’ Who gains from collaboration between computer science and social research? Daphne Duin, David King, Peter van den Besselaar Dep. of Organization Sciences & Network Institute, VU-University Amsterdam Department of Computing, The Open University, Milton Keynes Social Science and Digital Research: Interdisciplinary Insights, March, 12, 2012, Oxford e-Research Centre
  • 2. ViBRANT Virtual Biodiversity Research Help! How is this social data? Time taken to serve the request (microseconds) Host name (equates to Scratchpad) """Full URL"" (in quotes)" Origin of request (IP address) F5 Time the request was received (e#g# (01/Apr/2011:11:17:42 +0100) """First line of request"" (in quotes)" Status of final request (e#g# 200, 301, etc) Size of the response in bytes Remote logname (Almost always blank) """Referer"" (in quotes)" able.myspecies.info http://able.myspecies.info/favicon.ico 24.218.227.223 -- [14/Jul/2010:19:54:06 GET /favicon.ico HTTP/1.1 200 198 - Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6 polychaetes.info http://polychaetes.info/node/add/forum/forum/ 24.229.196.151 -- [14/Jul/2010:20:16:48 GET /node/add/forum/forum/ HTTP/1.0 301 - http://polychaetes.info/node/add/forum/forum/ Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; Creative) ciliateguide.myspecies.info http://ciliateguide.myspecies.info/node/add/forum/forum/ 24.229.196.151 -- [14/Jul/2010:20:39:14 GET /node/add/forum/forum/ HTTP/1.0 301 - http://ciliateguide.myspecies.info/node/add/forum/forum/ Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19 SW) ciliateguide.myspecies.info http://ciliateguide.myspecies.info/node/add/forum/forum 24.229.196.151 -- [14/Jul/2010:20:39:22 GET /node/add/forum/forum HTTP/1.0 200 25219 http://ciliateguide.myspecies.info/node/add/forum/forum Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19 SW) ciliateguide.myspecies.info http://ciliateguide.myspecies.info/node/add/forum/forum 24.229.196.151 -- [14/Jul/2010:20:39:37 POST /node/add/forum/forum HTTP/1.0 200 27128 http://ciliateguide.myspecies.info/node/add/forum/forum Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19 SW) ciliateguide.myspecies.info 
http://ciliateguide.myspecies.info/node/add/forum/forum 24.229.196.151 -- [14/Jul/2010:20:39:47 GET /node/add/forum/forum HTTP/1.0 200 25219 http://ciliateguide.myspecies.info/node/add/forum/forum Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MRA 4.6 (build 01425); MRSPUTNIK 1, 5, 0, 19 SW) 26141 wallacefund.info http://wallacefund.info/robots.txt 38.101.148.126 -- [15/Jul/2010:03:48:42 GET /robots.txt HTTP/1.1 200 44 - Mozilla/5.0 (compatible; discobot/1.1; +http://discoveryengine.com/discobot.html mhp.myspecies.info http://mhp.myspecies.info/robots.txt 38.101.148.126 -- [15/Jul/2010:03:48:49 GET /robots.txt HTTP/1.1 200 44 - Mozilla/5.0 (compatible; discobot/1.1; +
  • 3. ViBRANT Virtual Biodiversity Research Interdisciplinary work for e-science E-science 1. Application of an e-infrastructure to do science 2. The study of the design, uptake and use of e-Science E-infrastructure: Scratchpads, online platform for biodiversity research Need: Developing alternative evaluation metrics for e- science Goal: Identification of different types of users Approach: Collaboration between social science and omputer science valuable for e-science
  • 4. ViBRANT Virtual Biodiversity Research What is the impact of e-science? Question from e-science facility to social scientists Identification of different types of users Who are visiting Scratchpad platform? Web data (eg server log files) Identify Internet Service Providers visiting Scratchpads Cluster Internet Service Providers visiting Scratchpads, into meaningful categories
  • 5. ViBRANT Virtual Biodiversity Research Material Standard web analytics report of Scratchpads >300 community sites > 5,000 registred users (unpaid) Public and closed content Names of 6,728 unique Internet Service Providers (ISPs) (6 months) natural history museum telstra internet verizon online llc freie universitaet berlin queensland department of natural resources and water Gemeente maastricht national parks board (ministry of national development) agriculture and agrifood canada Commission europeenne u.s. fish and wildlife service irm/bfo hqstate of nebraska / office of
  • 6. ViBRANT Virtual Biodiversity Research Social scientists and computer scientists First trying alone… ….marina|marine|medical|medisch|microsoft|mineral|mining|ministerie| ministry|monsanto|museo|museum|national park|naval|navy|nerc|news|novartis|observatoire|office…. Then question to computer scientist ...from social scientists: could you help us to better... • collect web data? • refine/cluster the data ? • develop tools/methods for measuring robustness of data?
  • 7. ViBRANT Virtual Biodiversity Research Altmetrics for e-science: a social science and computer science project “to what extent can we improve a human developed method with computational techniques, in order to cluster ISPs into meaningful categories representing the various audiences using Scratchpads? “
  • 8. ViBRANT Virtual Biodiversity Research Method computer scientist Identify Internet Service Providers visiting Scratchpads, removing noise Inductive logic program, Aleph Cluster Internet Service Providers visiting Scratchpads into meaningful categories Bayesian classifier
  • 9. ViBRANT Virtual Biodiversity Research Results: Identification of ISPs Manually build filter (181 terms) - accuracy 94% - precision 92% - recall 97% Many hours of work Computational filter (6 terms) - accuracy 84% Comparison of filters 6 term filter set 120% 181 term filter set - precision 98% 100% - recall 73% 80% c Couple of minutes 60% 40% 20% 0% precision: recall: f-measure:
  • 10. ViBRANT Virtual Biodiversity Research Results: Clustering ISPs in meaningful categories ISPs by Sector Manual method: filter with key government words industry “university” “research” “school” media/arts “museum” research/edu Problematic! Computational method: classifiers - 90% accuracy Couple of minutes! Classifier Accuracy 100% 90% 80% 70% 60% Simple 50% Bayes 40% 30% 20% 10% 0% Sector Level Focus Tiers
  • 11. ViBRANT Virtual Biodiversity Research Who gains from collaboration between computer science and social research? • E-science facilities, e-science uptake and implementation • Social Science and • Computer Science
  • 12. Acknowledgments. ViBRANT: http://vbrant.eu; Scratchpads: http://scratchpads.eu/. Laura Hollink for her help with the raw log files; Simon Rycroft for his help with the web analytics reports; Vince Smith for sharing presentation material.
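
The filter comparison on slide 9 can be illustrated with a small sketch of a keyword "include" filter and the measures used to score it. This is not the authors' filter: the five include terms, the labelled ISP names and the helper names (`is_specific`, `evaluate`, `f_measure`) are all hypothetical, and the balanced F1 score is assumed to be what the slide's "f-measure" refers to.

```python
import re

# Hypothetical include terms, in the spirit of the pipe-separated list on slide 6.
INCLUDE_TERMS = ["museum", "ministry", "national park", "universit", "research"]
INCLUDE_RE = re.compile("|".join(re.escape(t) for t in INCLUDE_TERMS), re.IGNORECASE)


def is_specific(isp_name: str) -> bool:
    """Return True if the ISP name matches any include term."""
    return INCLUDE_RE.search(isp_name) is not None


def evaluate(predicted, actual):
    """Compute precision, recall and accuracy from two lists of booleans."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(actual) if actual else 0.0
    return precision, recall, accuracy


def f_measure(precision: float, recall: float) -> float:
    """Balanced F1: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented ISP names with hand-assigned labels (True = relevant, "specific" ISP).
LABELLED = [
    ("Natural History Museum", True),
    ("Vrije Universiteit", True),
    ("Example Broadband Ltd", False),
    ("Market Research Hosting Ltd", False),  # matches "research": a false positive
    ("Royal Botanic Garden", True),          # matches no term: a false negative
    ("Example Mobile Network", False),
]

predicted = [is_specific(name) for name, _ in LABELLED]
actual = [label for _, label in LABELLED]
precision, recall, accuracy = evaluate(predicted, actual)
print(f"toy filter: precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")

# F1 derived from the precision and recall figures reported on slide 9.
print(f"181-term filter F1: {f_measure(0.92, 0.97):.2f}")
print(f"6-term filter F1:   {f_measure(0.98, 0.73):.2f}")
```

Under the F1 assumption, the reported precision and recall give roughly 0.94 for the 181-term filter and 0.84 for the 6-term filter. These are derived values, not figures read from the slide's chart.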

Editor's notes

  1. It all started with a 19 GB file with this type of data, which was sent to the social scientists by an e-science facility with the question: “Please tell us who our users are and how they are using our infrastructure”. The file contained transaction logs from users visiting the infrastructure. E-science facilities generate electronic data. These are the digital footprints of users and usage that are stored in the logs of the e-infrastructure and tell us when the infrastructure is accessed and when information is downloaded, uploaded or edited; in other words, how users ‘behaved’. We learnt from this experience that this type of “electronic use” data is characterized by: large data sets; fuzzy (ambiguous) data (example: ‘uni’ spelled wrong). Where did this question come from, what motivated it, and how does this link to different definitions of e-science?
  2. We understand e-science as: (1) the application of an e-infrastructure to do science; (2) the study of the design, uptake and use of e-science. As you can see, the expertise from CS and SS is apparent in the definition of e-science. We'll demonstrate in this presentation how a collaborative project that we set up contributes to the development of Scratchpads. E-infrastructure: Scratchpads are an online platform for scientists, facilitating e-science in the field of biodiversity research. Need: With more and more scientific work moving to the web or into databases, there is a growing need to understand the impact this change has on science, scientists and the users of scientific information. Goal: Crucial in assessing such an impact is the identification of different types of users and use. Approach: Interdisciplinary work of CS and SS; sophisticated data treatment is needed to give the data meaning in the context of evaluation.
  3. So we had a question: who are the visitors of Scratchpads (SPs)? And we had a file with electronic use data. The challenge then was how to analyse the data and to know how robust the data are. Identify: We decided to start with identifying “the users”. Web analytics packages can be used to generate information on the visitors (users), notably through identification of the names of the visiting Internet Service Providers (ISPs). Through the name of the ISP (e.g. ‘Vrije Universiteit’) we may be able to identify the nature and activities of the users. Clusters: In addition to identification, we also wanted to cluster the ISPs into categories that make sense for evaluation purposes. We were particularly interested in the split between academic users, other educational users, and sectors such as government and business, as this could tell us something about the (societal) impact of the e-infrastructure.
  4. We zoom in further on the data we had access to. We left the raw log files for what they were and used standard web analytics reports; we made this decision after consulting a computer scientist. We are looking at 300 websites at once! This generates a long list of ISPs. The list contains ISPs that are clearly part of the community of biodiversity research (BR), government, funders. We call the first examples ‘specific ISPs’ and the rest ‘general ISPs’. We filter the general ISPs out. The others are relevant for ViBRANT and evaluation purposes.
  5. First the social scientists tried to handle the data alone and manually developed a regular expression filter based on 181 ‘include’ terms, which took many hours of work. We ran into several limits, both in what the system technically allowed us to do and in our own skills, and we couldn't work around them. We discussed our problems with David, and it turned out that the questions we had were also interesting questions for CS. This is how we decided to join forces and started a project together. David will now tell you something about the computer science contribution and the outcomes of our work.
  6. The social scientists produced a 181-term filter set after many hours of effort that gave 94% accuracy, whereas the computer scientist produced a 6-term filter set in a couple of minutes that gave 84% accuracy. The computer-aided filter reached a higher precision than the manually developed filter (98% vs 92%), though recall in this initial test favored the manual approach (73% vs 97%).
  7. Meaningful categories in this context are categories that make sense for evaluation purposes. The manual process highlighted a problem with continuing to use keywords to categorize ISPs. Some categories are easily made up from words in the full name of the ISP, such as “university” or “research”, and could be grouped under the tier one category “research & education”. However, this approach is limited. For example, simply categorizing all ISPs whose names contain the terms “health” or “medic*” as “public health” meant that a range of research, educational, governmental and corporate affiliated ISPs were wrongly classified. Therefore, we were encouraged to categorize ISPs using classifiers rather than by extending our work with filters.
  8. (1) Interdisciplinary work of CS and SS will bring to e-science enhanced insights into the actual use of the e-science environment, based on robust (log) data and analysis, in a relatively short amount of time. (2) Social science will benefit from working with CS because of the increased scale and speed of data collection and analysis, and because of CS insight into the technological boundaries and characteristics. (3) CS will benefit because collaboration provides (a) an opportunity to demonstrate their engineering insights (tool building for the e-science facility as well as tools for analyzing social science data sets) and (b) access to large datasets with behavioral/user information, which are nice cases for testing computer science theories. Possible costs: Above we listed several reasons for collaboration between e-science facilities, computer science and social sciences. Nevertheless, every collaboration has costs: it requires time for planning and communication. Furthermore, collaborators often support each other's work at the cost of advancing their own research.
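
The move from keyword filters to classifiers described in note 7 can be sketched in a few lines. The slides name only a "Bayesian classifier" ("Simple Bayes"); the multinomial naive Bayes variant, the word-level tokenisation, the `NaiveBayes` class and every ISP name in the training data below are assumptions made for illustration, not the project's actual model or data. Only the sector labels are taken from the slides.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(name: str):
    """Lower-case an ISP name and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", name.lower())


class NaiveBayes:
    """Minimal multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # names seen per label
        self.word_counts = defaultdict(Counter)  # token counts per label
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for tok in tokenize(name):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, name: str):
        total_names = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_names in self.class_counts.items():
            score = math.log(n_names / total_names)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(name):
                if tok not in self.vocab:
                    continue  # tokens never seen in training carry no evidence
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training names; the sector labels follow the slide's "ISPs by Sector".
TRAINING = [
    ("State University Network", "research/edu"),
    ("Institute for Marine Research", "research/edu"),
    ("City School District", "research/edu"),
    ("University Medical Center", "research/edu"),
    ("Ministry of Health", "government"),
    ("National Park Service", "government"),
    ("Department of Agriculture", "government"),
    ("Acme Medical Devices Inc", "industry"),
    ("Example Mining Corporation", "industry"),
    ("Global Health Products Inc", "industry"),
]

model = NaiveBayes().fit(
    [name for name, _ in TRAINING], [label for _, label in TRAINING]
)

for isp in ["University Health Research Centre", "Northern Medical Devices Inc"]:
    print(isp, "->", model.predict(isp))
```

A single keyword such as “health” would send both a ministry and a university hospital to the same category. The classifier instead weighs every token in the name, so in this toy setup “University Health Research Centre” is assigned to research/edu despite containing “health”.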