WebART project
Web Archive Retrieval Tools
Jaap Kamps, Richard Rogers, Arjen de Vries
Hildelies Balk, René Voorburg
Thaer Samar, Hugo Huurdeman, Sanna Kumpulainen
Flickr: LucViatour

Hugo Huurdeman
University of Amsterdam
huurdeman@uva.nl

Towards Research Engines: Supporting Search Stages in Web Archives
webarchiving.nl
Web Archives as Scholarly Sources conference, Aarhus University, 10 June 2015
Introduction
• Web archives preserve the fast-changing Web
• By now, they contain petabytes of valuable Web data
• This could be a valuable resource; however, archives have not frequently been used for research
• Several underlying reasons exist. Here, the focus is on potential limitations in access
Flickr: laughingsquid
The concept of ‘task-sharing’
• We look at the concept of task-sharing (Beaulieu, 2000)
• That is: how should we design web archive access systems to better facilitate task-sharing between scholar and system?
• Bottom-up approach: looking at scholars’ use of Web data, and at how current systems support scholars’ needs
[Diagram: scholar, research task, system]
1 Scholars’ use of web data & current support
1.1 Study: scholars’ research phases
• Exploratory analysis of scholars’ research tasks (journal papers)
  • scholars using temporal Web data
• Use research phases as a ‘lens’ to analyze these papers
1.1 Background: Research Phases
• Various scholars have defined different stages occurring in research tasks (Bronstein ’07; Chu ’99; Meho & Tibbo ’03)
• Specifically, Brügger (2014) has defined several research phases relevant to web archive research:
1. Corpus creation
2. Analysis
3. Dissemination
1.2 Study: scholars’ research phases
• Method:
  • querying EBSCOhost using the CMMC (Communication & Mass Media Complete) and LISTA (Library, Information Science & Technology Abstracts) databases
  • selecting all journal papers (2007-2015) which contain longitudinal analyses (excluding computer science papers)
1.2 Study: literature corpus overview
• 18 papers (17 distinct first authors)
• Main areas:
  • Information Science
  • Communication
  • New Media
  • Political Science
1.2 Study: literature corpus overview
• Observation: various ways of corpus definition, analysis and dissemination in journal papers
• However, most papers in this literature set did not use Web archives as a data source
• This corresponds to the large gap between the potential community addressed by web archives and the small group actually using them thus far (Dougherty & Meyer, 2014)
1.3.1 Study results: Corpus definition phase
• 1. selecting webpages or websites, e.g. based on authoritative lists (13)
  e.g. a list of insurance companies (Waite and Harrison, 2007)
• 2. querying regular search engines (5)
  e.g. the term ‘informetrics’ (Bar-Ilan, 2009), descriptors of youth movements (Xenos & Bennett, 2007)
• 3. taking a sample of webpages (4)
  e.g. one week per month (Li et al., 2014); to reduce the large size of the corpus, or data bias (John, 2013)
• Often: combination of methods
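The sampling approach listed above (one week per month) can be illustrated with a minimal sketch. The slides do not say which week is taken, so the choice of the first seven days of each month is an assumption, and the function name `one_week_per_month` is hypothetical.

```python
from datetime import date, timedelta


def one_week_per_month(start: date, end: date) -> list[date]:
    """Return the first seven days of every calendar month within [start, end].

    Assumption: the sampled week is the first week of the month; the
    source only states "one week per month".
    """
    sampled = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        week_start = date(year, month, 1)
        for offset in range(7):
            day = week_start + timedelta(days=offset)
            if start <= day <= end:  # clip the week to the study period
                sampled.append(day)
        # Move on to the next calendar month.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return sampled


sample_days = one_week_per_month(date(2014, 1, 1), date(2014, 3, 31))
print(len(sample_days), sample_days[0], sample_days[-1])
```

A sample like this reduces a three-month period to 21 crawl dates, which is one way to keep a longitudinal corpus manageable.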
1.3.1 Study results: Corpus definition phase
[Diagram: query, selection and sample methods, with counts per method and combination]
• Current support:
  • Most: Selecting URLs (Wayback Machine)
  • Many: Querying the contents of the archive
  • Few: Selecting (predefined) categories
  • Very few: Sampling contents of the archive
• Current limitations:
  • Defining, saving & sharing of corpora
  • Document-centric access methods [Hockx-Yu, 2014]
  • Limitations of search [Ben-David & Huurdeman, 2014]
1.3.2 Results: Analysis phase (1/2)
• Content analysis (66.7%)
  • manual coding
  • coding schemes, at times based on existing frameworks
• Content analysis (22.2%)
  • automatic
  • existing or custom-developed tools
• Network analysis (11.1%)
  • Issue Crawler, link classifications
1.3.2 Results: Analysis phase (2/2)
• Level of analysis (based on Brügger, 2013):
  • page element (4) (22%), e.g. mission statements
  • web page (6) (33%), e.g. blog pages
  • web site* (7) (39%), e.g. political actors’ sites
  • web sphere (1) (6%), e.g. youth web sphere
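The percentages in the list above follow from the counts over the 18 papers in the corpus. As a quick check, a minimal sketch (the dictionary and the helper name `shares` are illustrative, not part of the study):

```python
# Counts per level of analysis, as listed on the slide.
level_counts = {
    "page element": 4,
    "web page": 6,
    "web site": 7,
    "web sphere": 1,
}


def shares(counts: dict[str, int]) -> dict[str, int]:
    """Convert raw counts to whole-number percentages of the total."""
    total = sum(counts.values())
    return {level: round(100 * n / total) for level, n in counts.items()}


level_shares = shares(level_counts)
print(sum(level_counts.values()), level_shares)
```

The counts sum to 18, matching the size of the literature corpus, and the rounded shares reproduce the figures on the slide.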
1.3.2 Support: Analysis phase
• Current support:
  • Very few: analysis (n-gram, trends), export options
• Current limitations:
  • Generally not applicable to custom corpora
  • No way to define the granularity of results
  • Often have to resort to script-based analysis tools
  • Lack of integrated content analysis, coding support, etc.
1.3.3 Results: Dissemination phase
• Tables (16)
• Graphs (10)
• Link networks (1)
• Model (1)
1.3.3 Support: Dissemination phase
• Current support:
  • Some visualization options (n-gram, tag clouds)
• Current limitations:
  • Set of visualizations depends on archive
  • Generally not applicable to user-defined corpora
1.4 Summary
• Observation: omissions in current support for corpus creation, analysis and dissemination in a research context
• Opportunities arise to increase task-sharing in future systems
[Diagram: scholar, research task, system]
2 From Search to Research engines
2.1 Supporting the flow (1/2)
• How to combine this varied set of features into an integrated access system?
  • with high usability and without cognitive overload
• Traditional approach: “complex” interface integrating all functionality
[Figure: example interface (Dunne et al., 2012)]
2.1 Supporting the flow (2/2)
• Our approach: divide functionality per (research) stage
• Inspired by ongoing work on supporting the flow of Web and book search in multistage interfaces, based on cognitive models of the search process [Huurdeman & Kamps, 2014; Huurdeman, Kamps, Koolen & Kumpulainen, 2015]
[Diagram: Search combined with the Corpus Creation, Visualization and Analysis stages]
2.2 Current research prototypes: based on the Dutch Web archive
• National Library of the Netherlands (KB)
• Selective Web archive (2007-now)
• 10+ terabytes (25,000+ harvests)
• Idea: modular system
2.2.1 Supporting research phases: corpus creation
• faceted search interface
• different modalities to explore results
• possibility to
  • save (complex) queries
  • save results
  • categorize
[Figure labels: Search, Corpus Creation, Saved queries]
2.2.1 Supporting research phases: corpus creation
• Further customization ‘under the hood’: define search strategy
  • via visual building blocks
  • flexibility in defining a corpus (determine selection, ranking, queries, etc.) [De Vries et al., 2010]
  • see also: spinque.com
[Figure labels: Search, Corpus Creation]
2.2.2 Supporting research phases: analysis
• Analysis interface
  • edit/annotate dataset
  • search & browse dataset
  • analyze
[Figure labels: Search, Analysis]
2.2.3 Supporting research phases: dissemination
• Visualization interface
  • based on RAW (raw.densitydesign.org)
  • visualize datasets (graphs and visualizations)
[Figure labels: Search, Dissemination]
2.3 Caveats & discussion
• Looking at access aspects
  • not at underlying data & its properties
  • next step: contextualizing ‘completeness’ of results [see Huurdeman, Kamps, Samar, De Vries, Ben-David & Rogers, 2015]
• Slightly utopian vision: not all analysis can be supported
  • generic versus specific approaches
  • towards ‘toolmaker’s tools’
• Different archives offer different toolsets
  • Importance of sharing (open-source) and collaboration
2.4 Conclusion
• Exploratory analysis of scholars’ choices related to corpus definition, analysis and dissemination
• These choices revealed a number of limitations of current access interfaces
• Therefore, we propose a more fluid approach, moving from mere search to ‘research engines’
[Diagram: Wayback Machine, search engine, ‘research’ engine]
webarchiving.nl
@webart12
Thanks & Acknowledgements
• The WebART team (’12-’16): Jaap Kamps, Richard Rogers, Arjen de Vries, Thaer Samar, Sanna Kumpulainen; and Anat Ben-David.
• We gratefully acknowledge the collaboration with the Dutch Web Archive of the National Library of the Netherlands.
• This research was supported by the Netherlands Organization for Scientific Research (WebART project, NWO CATCH # 640.005.001).
References
• Beaulieu, M. (2000). Interaction in information searching and retrieval. Journal of Documentation, 56(4), 431–439.
• Ben-David, A., & Huurdeman, H. (2014). Web Archive Search as Research: Methodological and Theoretical Implications. Alexandria, 25(1).
• Bronstein, J. (2007). The role of the research phase in information seeking behaviour of Jewish scholars: a modification of Ellis’s behavioural characteristics. Retrieved April 20, 2015, from http://www.informationr.net/ir/12-3/paper318.html
• Brügger, N. (2014). Concluding Remarks. International Internet Preservation Consortium General Consortium. Paris, France. Retrieved April 19, 2015, from http://netpreserve.org/sites/default/files/attachments/Brugger.ppt
• Brügger, N. (2013). Historical Network Analysis of the Web. Social Science Computer Review, 31(3), 306–321.
• Chu, C. M. (1999). Literary critics at work and their information needs: A research-phases model. Library & Information Science Research, 21(2), 247–273.
• Dunne, C., Shneiderman, B., Gove, R., Klavans, J., & Dorr, B. (2012). Rapid understanding of scientific paper collections: Integrating statistics, text analytics, and visualization. Journal of the American Society for Information Science and Technology, 63(12), 2351–2369.
• Hockx-Yu, H. (2014). Access and Scholarly Use of Web Archives. Alexandria, 25(1-2), 113–127.
• Huurdeman, H., Kamps, J., Samar, T., de Vries, A., Ben-David, A., & Rogers, R. (2015). Finding Pages in the Unarchived Web. International Journal on Digital Libraries.
• Huurdeman, H., Kamps, J., Koolen, M., & Kumpulainen, S. (forthcoming). The Value of Multistage Interfaces for Book Search. CEUR-WS.
• Huurdeman, H., & Kamps, J. (2014). From Multistage Information-seeking Models to Multistage Search Systems. In Proceedings of the 5th Information Interaction in Context Symposium (pp. 145–154). New York, NY, USA: ACM.
• Meho, L. I., & Tibbo, H. R. (2003). Modeling the information-seeking behavior of social scientists: Ellis’s study revisited. Journal of the American Society for Information Science and Technology, 54(6), 570–587.
• Rogers, R. (2013). Digital Methods. MIT Press.
• de Vries, A., Alink, W., & Cornacchia, R. (2010). Search by Strategy. Proc. ESAIR ’10.
Contenu connexe

Tendances

Data Designed for Discovery
Data Designed for DiscoveryData Designed for Discovery
Data Designed for DiscoveryOCLC
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanMicah Altman
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
 
The library in the life of the user
The library in the life of the userThe library in the life of the user
The library in the life of the userlisld
 
An Open Context for Archaeology
An Open Context for ArchaeologyAn Open Context for Archaeology
An Open Context for Archaeologyguest756e05
 
The Future of Finding: Resource Discovery @ The University of Oxford
The Future of Finding: Resource Discovery @ The University of OxfordThe Future of Finding: Resource Discovery @ The University of Oxford
The Future of Finding: Resource Discovery @ The University of OxfordChristine Madsen
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?OCLC
 
Best Practices for Descriptive Metadata
Best Practices for Descriptive MetadataBest Practices for Descriptive Metadata
Best Practices for Descriptive MetadataOCLC
 
Exploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataExploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataShenghui Wang
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceMicah Altman
 
Intro to IIIF and IIIF @NLW
Intro to IIIF and IIIF @NLWIntro to IIIF and IIIF @NLW
Intro to IIIF and IIIF @NLWGlen Robson
 
Let's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library SystemLet's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library SystemWiLS
 
BIBFRAME and OCLC Works: Defining Models and Discovering Evidence
BIBFRAME and OCLC Works: Defining Models and Discovering EvidenceBIBFRAME and OCLC Works: Defining Models and Discovering Evidence
BIBFRAME and OCLC Works: Defining Models and Discovering EvidenceOCLC
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅kulibrarians
 
Shifting ground: scholarly communication in geography
Shifting ground: scholarly communication in geographyShifting ground: scholarly communication in geography
Shifting ground: scholarly communication in geographyElizabeth Yates
 

Tendances (18)

Data Designed for Discovery
Data Designed for DiscoveryData Designed for Discovery
Data Designed for Discovery
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
 
The library in the life of the user
The library in the life of the userThe library in the life of the user
The library in the life of the user
 
An Open Context for Archaeology
An Open Context for ArchaeologyAn Open Context for Archaeology
An Open Context for Archaeology
 
The Future of Finding: Resource Discovery @ The University of Oxford
The Future of Finding: Resource Discovery @ The University of OxfordThe Future of Finding: Resource Discovery @ The University of Oxford
The Future of Finding: Resource Discovery @ The University of Oxford
 
Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?Linked Data Implementations—Who, What and Why?
Linked Data Implementations—Who, What and Why?
 
Best Practices for Descriptive Metadata
Best Practices for Descriptive MetadataBest Practices for Descriptive Metadata
Best Practices for Descriptive Metadata
 
Exploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadataExploring a world of networked information built from free-text metadata
Exploring a world of networked information built from free-text metadata
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 
Intro to IIIF and IIIF @NLW
Intro to IIIF and IIIF @NLWIntro to IIIF and IIIF @NLW
Intro to IIIF and IIIF @NLW
 
Ir1
Ir1Ir1
Ir1
 
Let's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library SystemLet's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library System
 
BIBFRAME and OCLC Works: Defining Models and Discovering Evidence
BIBFRAME and OCLC Works: Defining Models and Discovering EvidenceBIBFRAME and OCLC Works: Defining Models and Discovering Evidence
BIBFRAME and OCLC Works: Defining Models and Discovering Evidence
 
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
20170222 ku-librarians勉強会 #211 :海外研修報告:英国大学図書館を北から南へ巡る旅
 
Lauruhn-5-jun15
Lauruhn-5-jun15Lauruhn-5-jun15
Lauruhn-5-jun15
 
Clark - Metadata is the Message
Clark - Metadata is the MessageClark - Metadata is the Message
Clark - Metadata is the Message
 
Shifting ground: scholarly communication in geography
Shifting ground: scholarly communication in geographyShifting ground: scholarly communication in geography
Shifting ground: scholarly communication in geography
 

Similaire à Towards Research Engines: Supporting Search Stages in Web Archives (2015)

Interrogating the Politics and Performativity of Web Archiving
Interrogating the Politics and Performativity of Web ArchivingInterrogating the Politics and Performativity of Web Archiving
Interrogating the Politics and Performativity of Web ArchivingJessica Ogden
 
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...hsuleslie
 
Data Management for Collaboration, Access, and Interoperability
Data Management for Collaboration, Access, and InteroperabilityData Management for Collaboration, Access, and Interoperability
Data Management for Collaboration, Access, and InteroperabilityPlato L. Smith II
 
A Case Study Of An Open Online Course
A Case Study Of An Open Online CourseA Case Study Of An Open Online Course
A Case Study Of An Open Online CourseSuzan Koseoglu
 
Tracking Social Media Participation: New Approaches to Studying User-Genera...
Tracking Social Media Participation: New Approaches to Studying User-Genera...Tracking Social Media Participation: New Approaches to Studying User-Genera...
Tracking Social Media Participation: New Approaches to Studying User-Genera...Axel Bruns
 
Tracking Social Media Participation: New Approaches to Studying User-Genera...
Tracking Social Media Participation: New Approaches to Studying User-Genera...Tracking Social Media Participation: New Approaches to Studying User-Genera...
Tracking Social Media Participation: New Approaches to Studying User-Genera...Axel Bruns
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behaviorJames Howison
 
When Search becomes Research and Research becomes Search
When Search becomes Research and Research becomes SearchWhen Search becomes Research and Research becomes Search
When Search becomes Research and Research becomes SearchJaap Kamps
 
How Do UK Students, Researchers and Academics use the Internet
How Do UK Students, Researchers and Academics use the InternetHow Do UK Students, Researchers and Academics use the Internet
How Do UK Students, Researchers and Academics use the InternetCaroline Williams
 
Dissemination Information Packages (DIPS) for Information Reuse
Dissemination Information Packages (DIPS) for Information Reuse Dissemination Information Packages (DIPS) for Information Reuse
Dissemination Information Packages (DIPS) for Information Reuse Micah Altman
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesCelia Emmelhainz
 
Relationship status: Libraries and linked data in Europe
Relationship status: Libraries and linked data in EuropeRelationship status: Libraries and linked data in Europe
Relationship status: Libraries and linked data in EuropeDiane Rasmussen Pennington
 
Bibliometric-enhanced Information Retrieval: Connecting IR with Bibliometrics
Bibliometric-enhanced Information Retrieval: Connecting IR with BibliometricsBibliometric-enhanced Information Retrieval: Connecting IR with Bibliometrics
Bibliometric-enhanced Information Retrieval: Connecting IR with BibliometricsGESIS
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypseENUG
 
Managing Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research MethodsManaging Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research MethodsRebecca Grant
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsMartin Donnelly
 
Internationalising South African Scholarly Journals
Internationalising South African Scholarly Journals Internationalising South African Scholarly Journals
Internationalising South African Scholarly Journals KidsintheCloud
 

Similaire à Towards Research Engines: Supporting Search Stages in Web Archives (2015) (20)

Interrogating the Politics and Performativity of Web Archiving
Interrogating the Politics and Performativity of Web ArchivingInterrogating the Politics and Performativity of Web Archiving
Interrogating the Politics and Performativity of Web Archiving
 
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
Sediment Experimentalist Network (SEN): Sharing and reusing methods and data ...
 
Data Management for Collaboration, Access, and Interoperability
Data Management for Collaboration, Access, and InteroperabilityData Management for Collaboration, Access, and Interoperability
Data Management for Collaboration, Access, and Interoperability
 
A Case Study Of An Open Online Course
A Case Study Of An Open Online CourseA Case Study Of An Open Online Course
A Case Study Of An Open Online Course
 
Tracking Social Media Participation: New Approaches to Studying User-Genera...
Tracking Social Media Participation: New Approaches to Studying User-Genera...Tracking Social Media Participation: New Approaches to Studying User-Genera...
Tracking Social Media Participation: New Approaches to Studying User-Genera...
 
Tracking Social Media Participation: New Approaches to Studying User-Genera...
Tracking Social Media Participation: New Approaches to Studying User-Genera...Tracking Social Media Participation: New Approaches to Studying User-Genera...
Tracking Social Media Participation: New Approaches to Studying User-Genera...
 
Studying archives of online behavior
Studying archives of online behaviorStudying archives of online behavior
Studying archives of online behavior
 
When Search becomes Research and Research becomes Search
When Search becomes Research and Research becomes SearchWhen Search becomes Research and Research becomes Search
When Search becomes Research and Research becomes Search
 
How Do UK Students, Researchers and Academics use the Internet
How Do UK Students, Researchers and Academics use the InternetHow Do UK Students, Researchers and Academics use the Internet
How Do UK Students, Researchers and Academics use the Internet
 
Dissemination Information Packages (DIPS) for Information Reuse
Dissemination Information Packages (DIPS) for Information Reuse Dissemination Information Packages (DIPS) for Information Reuse
Dissemination Information Packages (DIPS) for Information Reuse
 
Research Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social SciencesResearch Data Management in the Humanities and Social Sciences
Research Data Management in the Humanities and Social Sciences
 
Relationship status: Libraries and linked data in Europe
Relationship status: Libraries and linked data in EuropeRelationship status: Libraries and linked data in Europe
Relationship status: Libraries and linked data in Europe
 
Bibliometric-enhanced Information Retrieval: Connecting IR with Bibliometrics
Bibliometric-enhanced Information Retrieval: Connecting IR with BibliometricsBibliometric-enhanced Information Retrieval: Connecting IR with Bibliometrics
Bibliometric-enhanced Information Retrieval: Connecting IR with Bibliometrics
 
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
NISO Virtual Conference Scientific Data Management: Caring for Your Instituti...
 
Guy avoiding-dat apocalypse
Guy avoiding-dat apocalypseGuy avoiding-dat apocalypse
Guy avoiding-dat apocalypse
 
Curating Humanities Data: Law, technology and reality
Curating Humanities Data: Law, technology and realityCurating Humanities Data: Law, technology and reality
Curating Humanities Data: Law, technology and reality
 
Managing Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research MethodsManaging Ireland's Research Data - 3 Research Methods
Managing Ireland's Research Data - 3 Research Methods
 
Open Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and SolutionsOpen Access to Research Data: Challenges and Solutions
Open Access to Research Data: Challenges and Solutions
 
Anu digital research literacies
Anu digital research literaciesAnu digital research literacies
Anu digital research literacies
 
Internationalising South African Scholarly Journals
Internationalising South African Scholarly Journals Internationalising South African Scholarly Journals
Internationalising South African Scholarly Journals
 

Plus de TimelessFuture

Webmapping: maps for presentation, exploration & analysis
Webmapping: maps for presentation, exploration & analysisWebmapping: maps for presentation, exploration & analysis
Webmapping: maps for presentation, exploration & analysisTimelessFuture
 
Experiential Interfaces: 

3D reconstructions as entry points for exploration...
Experiential Interfaces: 

3D reconstructions as entry points for exploration...Experiential Interfaces: 

3D reconstructions as entry points for exploration...
Experiential Interfaces: 

3D reconstructions as entry points for exploration...TimelessFuture
 
Step inside the Image: 

Interpretative Interfaces for 
3D Historical Content
Step inside the Image: 

Interpretative Interfaces for 
3D Historical ContentStep inside the Image: 

Interpretative Interfaces for 
3D Historical Content
Step inside the Image: 

Interpretative Interfaces for 
3D Historical ContentTimelessFuture
 
Supporting the Interpretation of Enriched Audiovisual Sources through Tempora...
Supporting the Interpretation of Enriched Audiovisual Sources through Tempora...Supporting the Interpretation of Enriched Audiovisual Sources through Tempora...
Supporting the Interpretation of Enriched Audiovisual Sources through Tempora...TimelessFuture
 
The Multi-Stage Experience: the Simulated Work Task Approach to Studying Info...
The Multi-Stage Experience: the Simulated Work Task Approach to Studying Info...The Multi-Stage Experience: the Simulated Work Task Approach to Studying Info...
The Multi-Stage Experience: the Simulated Work Task Approach to Studying Info...TimelessFuture
 
Op Ontdekkingsreis door het KB Webarchief - Exploratieve Visualisatie in een ...
Op Ontdekkingsreis door het KB Webarchief - Exploratieve Visualisatie in een ...Op Ontdekkingsreis door het KB Webarchief - Exploratieve Visualisatie in een ...
Op Ontdekkingsreis door het KB Webarchief - Exploratieve Visualisatie in een ...TimelessFuture
 
Visualization Lecture - Clariah Summer School 2018
Visualization Lecture - Clariah Summer School 2018Visualization Lecture - Clariah Summer School 2018
Visualization Lecture - Clariah Summer School 2018TimelessFuture
 
Outcomes Visual Navigation Project
Outcomes Visual Navigation ProjectOutcomes Visual Navigation Project
Outcomes Visual Navigation ProjectTimelessFuture
 
KNVI 2017: De collectie in een ander licht - Creatieve inzet van nieuwe techn...
KNVI 2017: De collectie in een ander licht - Creatieve inzet van nieuwe techn...KNVI 2017: De collectie in een ander licht - Creatieve inzet van nieuwe techn...
KNVI 2017: De collectie in een ander licht - Creatieve inzet van nieuwe techn...TimelessFuture
 
Workshop: Inspirational Journeys - Challenges and Solutions for Visual Naviga...
Workshop: Inspirational Journeys - Challenges and Solutions for Visual Naviga...Workshop: Inspirational Journeys - Challenges and Solutions for Visual Naviga...
Workshop: Inspirational Journeys - Challenges and Solutions for Visual Naviga...TimelessFuture
 
“More than Meets the Eye” - Analyzing the Success of User Queries in Oria
“More than Meets the Eye” - Analyzing the Success of User Queries in Oria“More than Meets the Eye” - Analyzing the Success of User Queries in Oria
“More than Meets the Eye” - Analyzing the Success of User Queries in OriaTimelessFuture
 
Not available, or not found? Lessons from user queries in the Oria catalog at...
Not available, or not found? Lessons from user queries in the Oria catalog at...Not available, or not found? Lessons from user queries in the Oria catalog at...
Not available, or not found? Lessons from user queries in the Oria catalog at...TimelessFuture
 
Webarchief & Wetenschap (Dutch)
Towards Research Engines: Supporting Search Stages in Web Archives (2015)

  • 1. WebART project: Web Archive Retrieval Tools. Jaap Kamps, Richard Rogers, Arjen de Vries, Hildelies Balk, René Voorburg, Thaer Samar, Hugo Huurdeman, Sanna Kumpulainen (Image credit: Flickr: LucViatour)
  • 2. Hugo Huurdeman, University of Amsterdam, huurdeman@uva.nl. Towards Research Engines: Supporting Search Stages in Web Archives. webarchiving.nl. Web Archives as Scholarly Sources conference, Aarhus University, 10 June 2015
  • 3. Introduction • Web archives preserve the fast-changing Web • By now containing petabytes of valuable Web data • This could be a valuable resource; however, archives have not frequently been used for research • Several underlying reasons exist. Here, the focus is on potential limitations in access (Image credit: Flickr: laughingsquid)
  • 4. The concept of ‘task-sharing’ • We look at the concept of task-sharing (Beaulieu, 1999) • i.e. how should we design web archive access systems to better facilitate task-sharing between scholar and system? • Bottom-up approach: looking at scholars’ use of Web data, and how current systems support scholars’ needs [Diagram: scholar, research task, system]
  • 5. 1 Scholars’ use of web data & current support
  • 6. 1.1 Study: scholars’ research phases • Exploratory analysis of scholars’ research tasks (journal papers) • scholars using temporal Web data • Use research phases as a ‘lens’ to analyze these papers
  • 7. 1.1 Background: Research Phases • Various scholars have defined different stages occurring in research tasks (Bronstein ’07; Chu ’99; Meho & Tibbo ’03) • Specifically, Brügger (2014) has defined several research phases relevant to web archive research: 1. Corpus creation 2. Analysis 3. Dissemination
  • 8. 1.2 Study: scholars’ research phases • Method: • querying EBSCOhost using the CMMC (Communication & Mass Media Complete) and LISTA (Library, Information Science & Technology Abstracts) databases • selecting all journal papers (2007-2015) which contain longitudinal analyses (excluding computer science papers)
  • 9. 1.2 Study: literature corpus overview • 18 papers (17 distinct first authors) • Main areas: • Information Science • Communication • New Media • Political Science
  • 10. 1.2 Study: literature corpus overview • Observation: various ways of corpus definition, analysis and dissemination in journal papers • However, most papers in this literature set did not use Web archives as a data source • This corresponds to the large gap between the potential community addressed by web archives and the small group actually using them thus far (Dougherty & Meyer, 2014)
  • 11. 1.3.1 Study results: Corpus definition phase • 1. selecting webpages or websites, e.g. based on authoritative lists (13), such as a list of insurance companies (Waite and Harrison, 2007) • 2. querying regular search engines (5), e.g. the term ‘informetrics’ (Bar-Ilan, 2009), descriptors of youth movements (Xenos & Bennet, 2007) • 3. taking a sample of webpages (4), e.g. one week per month (Li et al, 2014); to reduce large size of corpus, or data bias (John, 2013) • Often: combination of methods
  • 12. 1.3.1 Study results: Corpus definition phase [Diagram relating the Query, Selection and Sample methods; values shown: 13, 5, 1, 3, 4]
  • 13. • Current support: • Most: Selecting URLs (Wayback Machine) • Many: Querying the contents of the archive • Few: Selecting (predefined) categories • Very few: Sampling contents of the archive • Current limitations: • Defining, saving & sharing of corpora • Document-centric access methods [Hockx-Yu, 14] • Limitations of search [Ben-David & Huurdeman,14]
  • 14. 1.3.2 Results: Analysis phase (1/2) • Content analysis (66.7%) • manual coding • coding schemes, at times based on existing frameworks • Content analysis (22.2%) • automatic • existing or custom-developed tools • Network analysis (11.1%) • issue crawler, link classifications
  • 15. 1.3.2 Results: Analysis phase (2/2) • Level of analysis (based on Brügger, 2013): • page element (4) (22%) • e.g. mission statements • web page (6) (33%) • e.g. blog pages • web site* (7) (39%) • e.g. political actors’ sites • web sphere (1) (6%) • e.g. youth web sphere [Diagram labels: web sphere (1), website (7), page element (4), webpage (8)]
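The percentages on this slide can be reproduced from the bulleted counts, assuming they are shares of the 18 papers in the corpus, rounded to whole percentages. A small Python check (the variable names are illustrative):

```python
# Counts per level of analysis, as listed in the slide's bullet points.
counts = {
    "page element": 4,
    "web page": 6,
    "web site": 7,
    "web sphere": 1,
}

total = sum(counts.values())  # should equal the 18 papers in the corpus

# Share of each level, rounded to a whole percentage.
shares = {level: round(100 * n / total) for level, n in counts.items()}

for level, pct in shares.items():
    print(f"{level}: {counts[level]} of {total} papers ({pct}%)")
```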
  • 16. 1.3.2 Support: Analysis phase • Current support • Very few: analysis (n-gram, trends), export options • Current limitations: • Generally not applicable to custom corpora • No ways to define granularity of results • Often have to resort to script-based analysis tools • Lack of integrated content analysis, coding support, etc.
  • 17. 1.3.3 Results: Dissemination phase • Tables (16) • Graphs (10) • Link networks (1) • Model (1)
  • 18. 1.3.3 Support: Dissemination phase • Current limitations • Set of visualizations depends on archive • Generally not applicable to user-defined corpora • Current support • some visualization options (n-gram, tag clouds)
  • 19. 1.4 Summary • Observation: omissions in current support for corpus creation, analysis and dissemination in a research context • Opportunities arise to increase task-sharing in future systems [Diagram: scholar, research task, system]
  • 20. 2 From Search to Research engines
  • 21. 2.1 Supporting the flow (1/2) • How to integrate this varied set of features into an integrated access system? • with high usability and without cognitive overload • Traditional approach: “Complex” interface integrating all functionality
  • 23. 2.1 Supporting the flow (2/2) • Our approach: Divide functionality per (research) stage • Inspired by ongoing work on supporting the flow of Web and book search in multistage interfaces, based on cognitive models of the search process [Huurdeman & Kamps, 2014; Huurdeman, Kamps, Koolen & Kumpulainen, 2015] [Diagram labels: Search, Corpus Creation, Visualization, Analysis]
  • 24. 2.2 Current research prototypes: based on the Dutch Web archive • National Library of the Netherlands (KB) • Selective Web archive (2007-now) • 10+ Terabyte (25,000+ harvests) • Idea: modular system
  • 25. 2.2.1 Supporting research phases: corpus creation • faceted search interface • different modalities to explore results • possibility to • save (complex) queries • save results • categorize [Screenshot labels: Search, Corpus Creation, Saved queries]
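The ‘save queries, save results, categorize’ features listed on this slide could be backed by a small data structure such as the following Python sketch. All class, field and method names here (`SavedQuery`, `Corpus`, `save_query`, `categorize`) and the example values are hypothetical; they are not taken from the WebART prototype.

```python
from dataclasses import dataclass, field


@dataclass
class SavedQuery:
    """A (possibly faceted) query a scholar wants to keep. Hypothetical."""
    text: str
    facets: dict = field(default_factory=dict)


@dataclass
class Corpus:
    """A user-defined corpus: saved queries plus categorized results."""
    name: str
    queries: list = field(default_factory=list)
    results: dict = field(default_factory=dict)  # URL -> category label

    def save_query(self, query):
        # Store the query so the corpus definition can be rerun or shared.
        self.queries.append(query)

    def categorize(self, url, category):
        # Save a result and assign (or overwrite) its category.
        self.results[url] = category


corpus = Corpus(name="example corpus")
corpus.save_query(SavedQuery("elections", facets={"year": "2012"}))
corpus.categorize("http://example.org/", "political party")
print(corpus)
```

Keeping the queries next to the saved results means the corpus definition itself can be stored and shared, which addresses the ‘defining, saving & sharing of corpora’ limitation named earlier in the talk.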
  • 26. 2.2.1 Supporting research phases: corpus creation • Further customization ‘under the hood’: define search strategy • via visual building blocks • flexibility in defining a corpus (determine selection, ranking, queries, etc.) [De Vries et al, 2010]; see also: spinque.com
  • 27. 2.2.2 Supporting research phases: analysis • Analysis interface • edit/annotate dataset • search & browse dataset • analyze
  • 28. 2.2.3 Supporting research phases: dissemination • Visualization interface • based on RAW (raw.densitydesign.org) • visualize datasets (graphs and visualizations)
  • 29. 2.3 Caveats & discussion • Looking at access aspects • not at underlying data & its properties • next step: contextualizing ‘completeness’ of results [see Huurdeman, Kamps, Samar, De Vries, Ben-David & Rogers, 2015] • Slightly utopian vision: not all analysis can be supported • generic versus specific approaches • towards ‘toolmaker’s tools’ • Different archives offer different toolsets • Importance of sharing (open-source) and collaboration
  • 30. 2.4 Conclusion • Exploratory analysis of scholars’ choices related to corpus definition, analysis and dissemination • These choices revealed a number of limitations of current access interfaces • Therefore, we propose a more fluid approach, moving from mere search to ‘research engines’ [Diagram: Wayback Machine, search engine, ‘research’ engine]
  • 33. Thanks & Acknowledgements • The WebART team (’12-’16): Jaap Kamps, Richard Rogers, Arjen de Vries, Thaer Samar, Sanna Kumpulainen; and Anat Ben-David. • We gratefully acknowledge the collaboration with the Dutch Web Archive of the National Library of the Netherlands. • This research was supported by the Netherlands Organization for Scientific Research (WebART project, NWO CATCH # 640.005.001).
  • 34. References
 • Beaulieu, M. (2000). Interaction in information searching and retrieval. Journal of Documentation, 56(4), 431–439.
 • Ben-David, A., & Huurdeman, H. (2014). Web Archive Search as Research: Methodological and Theoretical Implications. Alexandria, 25(1).
 • Bronstein, J. (n.d.). The role of the research phase in information seeking behaviour of Jewish scholars: a modification of Ellis’s behavioural characteristics. Retrieved April 20, 2015, from http://www.informationr.net/ir/12-3/paper318.html
 • Brügger, N. (2014). Concluding Remarks. International Internet Preservation Consortium General Consortium. Paris, France. Retrieved April 19, 2015, from http://netpreserve.org/sites/default/files/attachments/Brugger.ppt
 • Brügger, N. (2013). Historical Network Analysis of the Web. Social Science Computer Review, 31(3), 306–321.
 • Chu, C. M. (1999). Literary critics at work and their information needs: A research-phases model. Library & Information Science Research, 21(2), 247–273.
 • Dunne, C., Shneiderman, B., Gove, R., Klavans, J., & Dorr, B. (2012). Rapid understanding of scientific paper collections: Integrating statistics, text analytics, and visualization. Journal of the American Society for Information Science and Technology, 63(12), 2351–2369.
 • Hockx-Yu, H. (2014). Access and Scholarly Use of Web Archives. Alexandria, 25(1-2), 113–127.
 • Huurdeman, H., Kamps, J., Samar, T., de Vries, A., Ben-David, A., & Rogers, R. (2015). Finding Pages in the Unarchived Web. International Journal on Digital Libraries.
 • Huurdeman, H., Kamps, J., Koolen, M., & Kumpulainen, S. (forthcoming). The Value of Multistage Interfaces for Book Search. CEUR-WS.
 • Huurdeman, H., & Kamps, J. (2014). From Multistage Information-seeking Models to Multistage Search Systems. In Proceedings of the 5th Information Interaction in Context Symposium (pp. 145–154). New York, NY, USA: ACM.
 • Meho, L. I., & Tibbo, H. R. (2003). Modeling the information-seeking behavior of social scientists: Ellis’s study revisited. Journal of the American Society for Information Science and Technology, 54(6), 570–587.
 • Rogers, R. (2013). Digital Methods. MIT Press.
 • de Vries, A., Alink, W., & Cornacchia, R. (2010). Search by Strategy. Proc. ESAIR '10.