This document summarizes Project Gutenberg as an information retrieval system. It describes Project Gutenberg as the first digital library project, initiated in 1971, which now offers over 41,000 public domain eBooks. It discusses the intended audience, functionalities, indexing system, searching, browsing, and categorization of the Project Gutenberg website. It also evaluates issues with the website's interface design, its search and browsing features, and its failure to use the available metadata.
Project Gutenberg as an Information Retrieval System
1. Project Gutenberg as an
Information Retrieval System
Kai Li
IST616 Final Assignment
2012.11
2. Introduction to Project Gutenberg
• The first digital library project in the
world, initiated by the late Michael Hart in
1971.
• Project Gutenberg currently offers more than
41,000 public domain eBooks (in more than
50 languages) as well as other resources (like
scientific data).
• Website: http://www.gutenberg.org/
3. Intended Audience and Functionalities
• Intended audience: eBook readers and general
users.
• Functionalities: portal of the project, eBook
repository and discovery system.
4. Mobile Site
• The website offers two
interfaces, served
according to the device
one uses. Only the
traditional non-mobile
interface is examined in
this presentation due to
the limited scope of the
assignment.
6. Issues of Indexing/Tag System
• There is a search box as well as a tag called
“Search Catalog”;
– The search box is too small to be noticed;
– The tag “Search Catalog” actually leads users to a
page that has no search box,
but only some browsing options;
• There are a number of duplicated tags on the
left-hand bar and at the top of the page;
– For example, the tag “Book Categories”.
7. Means To Find a Book
• Searching
• Browsing
– By categories
9. Issues of Searching
• The display is different from most of the
interfaces one can see on the Internet, which
may result in some difficulties for new users;
• Because there is no navigation mechanism and
no function to refine the result by facets, it’s
extremely inconvenient to locate a resource if
the result set is large.
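The faceted refinement that the slide finds missing can be sketched roughly as follows. This is a minimal illustration, not a description of how the Project Gutenberg site works: the `Book` record, the `facet_counts` and `refine` helpers, and the sample rows are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    title: str
    author: str
    language: str
    subject: str


# Hypothetical result set; the language and subject values are assigned
# for illustration and are not taken from the Project Gutenberg catalog.
RESULTS = [
    Book("Pride and Prejudice", "Austen, Jane", "en", "Fiction"),
    Book("Emma", "Austen, Jane", "en", "Fiction"),
    Book("Orgueil et Préjugés", "Austen, Jane", "fr", "Fiction"),
    Book("On the Origin of Species", "Darwin, Charles", "en", "Science"),
]


def facet_counts(books, field):
    """Count how many books fall under each value of one facet field."""
    return Counter(getattr(book, field) for book in books)


def refine(books, **selected):
    """Keep only the books that match every selected facet value."""
    return [
        book
        for book in books
        if all(getattr(book, field) == value for field, value in selected.items())
    ]


# Show the available language facet, then narrow by two facets at once.
print(facet_counts(RESULTS, "language"))
narrowed = refine(RESULTS, author="Austen, Jane", language="en")
print([book.title for book in narrowed])
```

The facet counts tell the user how many results each choice would leave, and `refine` accepts several facets together, so a large result set can be narrowed step by step.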
10. Precision and Recall
• The retrieval method used by this website is a
string-matching method, which matches the
string entered by the user against the full text of all
the resources.
– Multiple words are combined with an “OR” relationship.
• Because the scope of the index is the full text, the
recall is higher than that of traditional library catalogs;
however, since it is still a string-matching
method, the precision is still not very good.
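A small sketch may make the precision and recall argument concrete. It assumes plain substring matching with OR semantics over an invented four-document corpus; the function names `or_search` and `precision_recall` and the relevance judgments are hypothetical, and the site's actual implementation is not known from this presentation.

```python
# Toy corpus: document id -> full text (invented for illustration).
CORPUS = {
    "d1": "the bank of the river was covered in reeds",
    "d2": "the bank raised its interest rates",
    "d3": "a loan from a financial institution",
    "d4": "reeds and rushes grow near water",
}


def or_search(query, corpus):
    """Return ids of documents whose text contains ANY query word as a substring."""
    words = query.lower().split()
    return {
        doc_id
        for doc_id, text in corpus.items()
        if any(word in text.lower() for word in words)
    }


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Assumed information need: documents about financial banks (d2 and d3).
relevant = {"d2", "d3"}
retrieved = or_search("bank rates", CORPUS)
precision, recall = precision_recall(retrieved, relevant)
print(sorted(retrieved), precision, recall)
```

In this toy case the river-bank document is retrieved because it contains the string "bank", which lowers precision, while the document about a "financial institution" is missed because it shares no string with the query, which lowers recall.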
12. Issues of Browsing
• There are three search tools offered on this
page, which should have been offered on the
search page rather than this one.
• Only one criterion can be used to limit the
resources at a time. After one
chooses a criterion, there is no
way to narrow the result further.
13. Categories/Classification
• There are two tiers of the “classification” on
this website:
– Subcategories: 23
• These subcategories are called “bookshelf” too, which
is confusing.
– Bookshelves: 133
• These can be seen as a lower level than subcategories.
However, not all bookshelves are linked to a
subcategory.
14. Overall Evaluation
• Advantages:
– Mobile functionalities:
• Mobile site
• QR codes
• Disadvantages:
– Poorly organized and
designed;
– Failing to display the full
richness of the metadata
on the website:
• LoC classification and
subject headings
– The interface lacks
communication with
the users;
The project accepts eBooks uploaded by members, as long as the works are not protected by US copyright law.
Because this website is also the main page of the whole project, the audience includes not only people who want to get the eBooks but also people who are interested in the project itself.
The indexing system is actually very confusing. This slide lists some of the problems.
The search results page: related bookshelves and subjects are displayed ahead of the books; books are ranked by popularity (number of downloads), but one can also choose to sort alphabetically or by release date.
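The three sort orders mentioned in this note could be modeled roughly as below. The rows, the download counts, and the `sort_results` helper are hypothetical, and showing the newest release first is an assumption, since the note does not state the direction.

```python
from datetime import date

# Hypothetical result rows: (title, downloads, release_date). All values invented.
results = [
    ("Title B", 7400, date(1995, 10, 1)),
    ("Title C", 9500, date(2001, 7, 1)),
    ("Title A", 9100, date(1994, 3, 1)),
]

# Each option maps to (key function, reverse flag).
SORT_OPTIONS = {
    "popularity": (lambda row: row[1], True),       # most downloads first
    "title": (lambda row: row[0].lower(), False),   # A to Z
    "release_date": (lambda row: row[2], True),     # newest first (assumed)
}


def sort_results(rows, option="popularity"):
    """Return a new list of rows ordered by the chosen sort option."""
    key, reverse = SORT_OPTIONS[option]
    return sorted(rows, key=key, reverse=reverse)


for option in SORT_OPTIONS:
    print(option, [row[0] for row in sort_results(results, option)])
```

Whichever order is chosen, a book that ranks low under all three stays buried in a large result set, which is the difficulty described in the notes.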
The interface was very unintuitive for me when I first used it. If a book does not rank high alphabetically, by popularity, or by release date, and the result set is large, it is almost impossible to find that specific book. Like traditional library catalogs, this interface doesn’t support finding an unknown book very well.
A string-matching method cannot solve the problems of one word with multiple meanings or of different words bearing the same meaning.
Browsing methods: by author; by title; by language; by recently added; by popularity. One can also browse the website by LC classification (as well as LCSH). However, these options are not listed on this page. LC classification can be reached only from the book pages.
Not all bookshelves can be linked with a subcategory. Moreover, there are also some bookshelves containing materials in other languages that are not inside the above system, which indicates that the classification scheme in English may not cover all the resources on the website.
Many libraries and other parties have imported the metadata of Gutenberg eBooks into their local systems, which makes the issues of this website less important. But this is still a problem!