Project Gutenberg as Information Retrieval System

This document summarizes Project Gutenberg as an information retrieval system. It describes Project Gutenberg as the first digital library project, initiated in 1971, which now offers over 41,000 public domain eBooks. It discusses the intended audience, functionalities, indexing system, searching, browsing, and categorization of the Project Gutenberg website. It also evaluates issues with the website's interface design and its search and browsing features, as well as its failure to make use of available metadata.

Project Gutenberg as an
Information Retrieval System
Kai Li
IST616 Final Assignment
2012.11

Introduction to Project Gutenberg
• The first digital library project in the
world, initiated by the late Michael Hart in
1971.
• Project Gutenberg currently offers more than
41,000 public domain eBooks (in more than
50 languages) as well as other resources (like
scientific data).
• Website: http://www.gutenberg.org/

Intended Audience and Functionalities
• Intended audience: eBook readers and general
users.
• Functionalities: portal of the project, eBook
repository and discovery system.

Mobile Site
• This website has two interfaces, served according to
the device one uses. Only the traditional non-mobile
interface is examined in this presentation, owing to
the limited scope of the assignment.

Issues of Indexing/Tag System
• There is a search box as well as a tag called
“Search Catalog”;
– The search box is too small to be noticed;
– The tag “Search Catalog” actually leads users to a
page that has no search box, only some browsing
options;

• A number of tags are duplicated on the
left-hand bar and at the top of the page;
– For example, the tag “Book Categories”.

Means To Find a Book
• Searching
• Browsing
– By categories

Issues of Searching
• The display differs from most of the
interfaces one sees on the Internet, which
may cause some difficulty for new users;
• Because there is no navigation mechanism and
no way to refine the results by facets, it is
extremely inconvenient to locate a resource
when the result set is large.
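The missing facet refinement noted above can be illustrated with a minimal sketch. It is not how the Project Gutenberg site works; the records, field names, and the helpers `facet_counts` and `refine` are all hypothetical, chosen only to show what "refining a result by facets" means.

```python
from collections import Counter

# Hypothetical search results; the field names are assumptions for illustration.
results = [
    {"title": "Pride and Prejudice", "language": "en", "author": "Austen, Jane"},
    {"title": "Emma", "language": "en", "author": "Austen, Jane"},
    {"title": "Les Misérables", "language": "fr", "author": "Hugo, Victor"},
]


def facet_counts(records, field):
    """Count how many records carry each value of the given field."""
    return Counter(r[field] for r in records)


def refine(records, field, value):
    """Keep only the records whose field equals the chosen facet value."""
    return [r for r in records if r[field] == value]


# A faceted interface would show these counts next to the result list...
language_facet = facet_counts(results, "language")
# ...and let the user click one value to narrow the list.
english_only = refine(results, "language", "en")
```

Showing the counts per facet value is what lets a user shrink a large result set step by step instead of paging through it.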

Precision and Recall
• The retrieval method used by this website is
string matching: the string entered by the user is
matched against the full text of all the
resources.
– Multiple words are combined with an “Or” relationship.

• Because the scope of the index is the full text,
recall is higher than in traditional library catalogs;
however, since it is still a string-matching
method, precision remains low.
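The trade-off described above can be made concrete with a small sketch. It assumes a toy collection and a simple case-insensitive substring match with "Or" semantics; the documents, the relevance judgment, and the function names are hypothetical and do not reproduce the site's actual implementation.

```python
def or_match(query, documents):
    """Return ids of documents whose text contains ANY query word
    (case-insensitive substring match, i.e. an "Or" relationship)."""
    words = query.lower().split()
    return {
        doc_id
        for doc_id, text in documents.items()
        if any(word in text.lower() for word in words)
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Toy full-text collection (hypothetical).
documents = {
    1: "The whale hunted by Captain Ahab",
    2: "A treatise on whale oil lamps",
    3: "Captain Nemo and the submarine",
    4: "A cookbook of garden vegetables",
}
relevant = {1}  # assumption: the user only wants the novel about Ahab

retrieved = or_match("captain whale", documents)
precision, recall = precision_recall(retrieved, relevant)
```

In this toy case the query retrieves documents 1, 2, and 3: the one relevant document is found (recall 1.0), but two of the three hits are irrelevant (precision 1/3), which is the pattern the slide describes.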

Issues of Browsing
• Three search tools are offered on this
page; they belong on the search page
rather than here.
• Only one criterion can be used to limit the
resources at a time, and once a criterion
is chosen, there is no way to narrow
the result further.
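What "further limiting the result" could look like is sketched below. This is only an illustration under assumed data: the catalog records and the `browse` helper are hypothetical, and nothing here describes the site's real browsing code.

```python
# Hypothetical catalog records; field names are assumptions for illustration.
catalog = [
    {"title": "Emma", "author": "Austen, Jane", "language": "en"},
    {"title": "Persuasion", "author": "Austen, Jane", "language": "en"},
    {"title": "Orgueil et Préjugés", "author": "Austen, Jane", "language": "fr"},
    {"title": "Notre-Dame de Paris", "author": "Hugo, Victor", "language": "fr"},
]


def browse(records, **criteria):
    """Keep the records that match every field=value pair given."""
    return [
        r for r in records
        if all(r.get(field) == value for field, value in criteria.items())
    ]


# First criterion: browse by author.
by_author = browse(catalog, author="Austen, Jane")
# Second criterion applied to the previous result: narrow by language.
narrowed = browse(by_author, language="fr")
# Equivalent single step with both criteria at once.
combined = browse(catalog, author="Austen, Jane", language="fr")
```

Because each call returns a plain list, criteria can be chained in any order, which is exactly the step the browsing page does not offer.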

Categories/Classification
• There are two tiers of “classification” on
this website:
– Subcategories: 23
• These subcategories are also called “bookshelves”, which
is confusing.

– Bookshelves: 133
• These can be seen as a level below the subcategories.
However, not every bookshelf is linked to a
subcategory.
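The two-tier structure and the unlinked bookshelves can be modeled in a few lines. The names below are placeholders, not real Project Gutenberg subcategories or bookshelves, and `unlinked_bookshelves` is a hypothetical helper written only to show how such gaps could be detected.

```python
# Placeholder data: upper tier (subcategories) mapped to lower tier (bookshelves).
subcategories = {
    "Subcategory A": ["Shelf 1", "Shelf 2"],
    "Subcategory B": ["Shelf 3"],
}
# Full list of bookshelves, including one that no subcategory points to.
bookshelves = ["Shelf 1", "Shelf 2", "Shelf 3", "Shelf 4"]


def unlinked_bookshelves(subcats, shelves):
    """Return the bookshelves that are not listed under any subcategory."""
    linked = {shelf for members in subcats.values() for shelf in members}
    return [shelf for shelf in shelves if shelf not in linked]


orphans = unlinked_bookshelves(subcategories, bookshelves)
```

A check of this kind would list every bookshelf a user cannot reach by drilling down from the subcategory level.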

Overall Evaluation
• Advantages:
– Mobile functionalities:
• Mobile site
• QR codes

• Disadvantages:
– Poorly organized and designed;
– Fails to display the full richness of the metadata
on the website:
• LoC classification and subject headings

– The interface lacks communication with the users.

Editor's notes

The project has been accepting member-uploaded eBooks that are not protected by US copyright law.
Because this website is also the main page of the whole project, the audience includes not only people who want to get eBooks but also people who are interested in the project itself.
The indexing system is actually very confusing. This slide lists some of the problems.
The search results page: related bookshelves and subjects are displayed ahead of the books; books are ranked by popularity (number of downloads), but one can also sort alphabetically or by release date.
The interface was very unintuitive for me when I first used it. If a book does not rank high alphabetically, by popularity, or by release date, and the result set is large, it is almost impossible to find that specific book. Like traditional library catalogs, this interface does not support finding an unknown book very well.
A string-matching method cannot resolve the problems of one word with multiple meanings or different words with the same meaning.
Methods: by author; by title; by language; by recently added; by popularity. One can also browse the website by LC classification (as well as LCSH). However, these are not listed on this page. LC classification can be found only from the book pages.
Not all bookshelves can be linked with a subcategory. Moreover, some bookshelves contain materials in other languages that fall outside the above system, which indicates that the English classification scheme may not cover all the resources on the website.
Many libraries and other parties have imported the metadata of Gutenberg eBooks into their local systems, which makes the issues of this website less important. But this is still a problem!
