1. LIS 703 Taverekere Kanti Srikantaiah Chapter 9 Part 1 Małgorzata Kot Subject Analysis
2. Subject Analysis Historically, subject access has been one of the most challenging aspects of organizing information. Even with the most traditional information resources, determining and identifying what an item is about can be difficult and time-consuming. With non-textual, imaginative, or complex materials, the process can be even more demanding.
3. Subject Analysis Despite the difficulties and costs associated with subject analysis, information professionals still see the value inherent in precisely identifying an item’s subject matter (often referred to as aboutness in the library and information science literature), and then carefully choosing the most suitable terms and symbols from a subject language to represent the item’s aboutness in its surrogate record. In recent years, with advances in search engine technology and the high cost of original cataloging, the necessity for subject analysis has been questioned.
4. Some have suggested that information resources no longer need to be analyzed because, when users are searching for information, computers can identify the documents, and therefore the time and money could be diverted to other activities such as digitization projects. Others have suggested that computers can analyze the documents and assign classification numbers and/or descriptors from a list of controlled vocabulary terms.
5. Despite the improvements in search engines and the recent development of user tagging, many professionals are reluctant to turn over all subject analysis to machines.
6. Machines are not yet good at identifying the aboutness of information resources, and they still cannot assign controlled vocabulary and classification with any satisfactory degree of accuracy. While a computer can determine what words are used in a document and the frequency of those words, at this time it cannot understand the nuanced concepts represented by those words. Even the most sophisticated algorithms cannot replace the human mind (consider, for example, automatic translations via Google).
7. Subject analysis (including analyzing content and creating and applying subject headings and classification numbers) is a core function of cataloging; although expensive, it is nonetheless critical. In 2008, the Working Group on the Future of Bibliographic Control, a group organized by the Library of Congress to examine the role of cataloging in the 21st century, reaffirmed the importance of and need for this human-centric task.
8. Determining aboutness requires controlled vocabulary and classification. Computers can count words and the number of times they appear but cannot determine their meaning. Aboutness is dependent upon who is using the document and for what purpose.
9. Questions addressed in the chapter: What is subject analysis? What are some difficulties associated with the subject analysis process? How is the process performed? What bibliographic features are useful in the determination of aboutness? How is subject content determined for nontextual materials?
10. Subject analysis is creating metadata about an information package by determining its aboutness in order to assign controlled vocabulary terms and classification notations. The process includes: conducting a conceptual analysis to determine what the item is about; describing the aboutness in a written statement; and using that aboutness statement to assign controlled vocabulary terms and/or classification notations. (In the third step, authorized terms and notations are taken from Library of Congress Subject Headings, Dewey Decimal Classification, Library of Congress Classification (LCC), Universal Decimal Classification, or another classification scheme.)
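The three steps can be pictured as a small record that is filled in stage by stage. The following is only a minimal sketch: the vocabulary, the names `TOY_VOCABULARY`, `SubjectAnalysis`, and `assign_terms`, and the sample book are all hypothetical and are not taken from LCSH or any real scheme. Steps 1 and 2 are supplied by a person; only the lookup in step 3 is automated here.

```python
from dataclasses import dataclass, field

# Hypothetical toy vocabulary: authorized term -> variant terms it covers.
# This is NOT real LCSH data; it only illustrates the idea of authorized terms.
TOY_VOCABULARY = {
    "Dogs": {"dog", "dogs", "canines", "puppies"},
    "Animal training": {"training", "obedience", "house training"},
}


@dataclass
class SubjectAnalysis:
    concepts: list                  # step 1: result of the conceptual analysis
    aboutness_statement: str        # step 2: written statement of aboutness
    assigned_terms: list = field(default_factory=list)  # step 3: authorized terms


def assign_terms(concepts, vocabulary):
    """Map free-text concepts to authorized terms by simple variant lookup."""
    assigned = []
    for authorized, variants in vocabulary.items():
        if any(concept.lower() in variants for concept in concepts):
            assigned.append(authorized)
    return assigned


# Steps 1 and 2 are done by a human analyst in this sketch.
analysis = SubjectAnalysis(
    concepts=["puppies", "obedience"],
    aboutness_statement="This book is about teaching obedience to puppies.",
)
# Step 3: translate the concepts into the toy vocabulary's authorized terms.
analysis.assigned_terms = assign_terms(analysis.concepts, TOY_VOCABULARY)
print(analysis.assigned_terms)
```

The lookup succeeds only because a person has already decided which concepts matter; the code does nothing to determine aboutness itself.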
11. Subject analysis: Provides meaningful subject access to information. Provides for collocation of information resources of a like nature. Provides a logical location for similar information resources on the shelves. Saves users’ time.
12. Challenges in subject analysis Determining what an information resource is about can be difficult, and not everyone agrees on how it should be done or even where the difficulties lie. With the burgeoning relationships among various fields, topics, and ideas in this increasingly interdisciplinary world, the result can be some very challenging materials to analyze. For example, a highly technical dissertation may be more difficult to analyze than a work of popular science.
13. Cultural Differences There are numerous other factors that influence the conceptual analysis process. Some are related to the nature of the resource being analyzed and others are related to the persons who perform the analysis. An understanding of the place of one’s culture as well as one’s education in determining subject matter is important. George Lakoff has written about research on how the understanding of color depends upon one’s language. There are 11 basic color categories in English, but in some other languages there are fewer categories. (Some languages have only 2 basic color terms: black and white, or cool and warm.) Differences among Western cultures are perhaps not so great as those between Western and non-Western cultures.
15. Consistency Consistency is another challenge associated with the subject analysis process. Evidence of the difficulty in consistently determining and articulating aboutness is found in a number of studies in which people have been asked to list terminology that they would use to search for specific items. In a 1954 study by Oliver Lilley, 340 students looked at 6 books and suggested an average of 62 different terms that could be used to search for each book. This is not a failure of controlled vocabulary; it is a failure of people to determine aboutness. Catalogers using the same controlled vocabulary and the same rules for it will produce consistent subject headings.
16. Nontextual Information The conceptual analysis process for nontextual resources is even less clear-cut than the process for textual ones. For visual resources, several levels of conceptual analysis are possible. In 1939, art historian Erwin Panofsky identified 3 levels of meaning in works of art: the primary or natural subject matter; the secondary or conventional subject matter; and the intrinsic meaning or content.
17. Nontextual Information The primary or natural subject matter. This is the preiconographic or factual level, in which objects and events are identified. Example: this is a painting of 13 long-haired men in robes gathered around a long table for dinner. The secondary or conventional subject matter. This is the iconographic level, in which some cultural knowledge of themes and concepts manifested in stories, images, and allegory is needed. Example: it is not just an image of 13 men gathered for dinner; it is a representation of the Lord’s Supper.
18. The intrinsic meaning or content This is the iconological level, in which the work is interpreted, based on an understanding of the basic attitudes of a nation, a period, a class, a religious or philosophical persuasion – unconsciously qualified by one personality [the artist] and condensed into one work. Example: this painting is Leonardo da Vinci’s Last Supper from 1498, a mural in the Convent of Santa Maria delle Grazie in Milan, Italy. It depicts the internal confusion of the twelve disciples after Jesus announced that one of them would betray him, each one wondering if it would be him. Meaning at this level depends upon an understanding of the two previous levels. This level requires a sophisticated understanding of world cultures, symbolism, and the significance of the work and its context in art history.
20. The intrinsic meaning or content Sara Shatford Layne relates Panofsky’s first level of meaning to the of-ness of the item (what this is an image of) and his second level to aboutness; she states that the third level cannot be used to analyze visual images with any degree of consistency. With musical works, it is even harder to identify concepts or to enumerate what themes and topics are being represented. It is fairly easy to describe how objects look, but identification of intrinsic meaning or iconological significance for any nontextual information resource requires special study and training.
21. Exhaustivity Exhaustivity is the number of concepts that will be considered in the analysis. It is best to know beforehand what level of exhaustivity is required. There are 2 basic degrees of exhaustivity: Summarization – identifies only a dominant, overall subject of the item, recognizing only concepts embodied in the main theme. Depth Indexing – aims to extract all the main concepts addressed in an item, recognizing subtopics and lesser themes.
23. Exhaustivity The difference between the two degrees parallels the difference between document retrieval and information retrieval. Summarization allows for document retrieval, after which many users consult the document’s internal index to retrieve the relevant information they need from the document. Depth indexing allows retrieval at a much more specific level, even to the retrieval of sections or paragraphs in a document.
24. Exhaustivity Exhaustivity affects both precision and recall in retrieval. Precision is the measurement of how many of the documents retrieved are relevant, that is, the proportion of retrieved documents that are relevant. Depth indexing increases precision because more specific terminology is used. Recall is the measurement of how many of the relevant documents in a system are actually retrieved. Summarization is likely to increase recall because the search terms are broader. Search engines perform the ultimate depth indexing because every specific word in a document can bring up results; however, this increases recall while greatly decreasing precision.
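The two measures can be made concrete with a small calculation. This is a minimal sketch using the standard definitions (precision = relevant retrieved / all retrieved; recall = relevant retrieved / all relevant); the document identifiers and the two sample searches are invented for illustration.

```python
def precision(retrieved, relevant):
    """Proportion of the retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved, relevant):
    """Proportion of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical collection: four documents are truly relevant to the query.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow, specific search finds few documents, all of them relevant.
narrow_search = {"d1", "d2"}
# A broad search finds every relevant document, plus four irrelevant ones.
broad_search = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

print(precision(narrow_search, relevant), recall(narrow_search, relevant))  # 1.0 0.5
print(precision(broad_search, relevant), recall(broad_search, relevant))    # 0.5 1.0
```

The narrow search trades recall for precision and the broad search does the reverse, which is the tension described above.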
25. Objectivity Information professionals are expected to remain objective and impartial in all of their work-related activities, but is this realistic? Information professionals and LIS students are only human after all. There is a human tendency to judge the information that we encounter. It is important to be aware of one’s biases, prejudices, and beliefs when conducting the conceptual analysis, and to seek the opinions of others when needed. We should attempt to keep our biases in check as much as possible while performing the process.
26. Differences in Methods Used Not everyone agrees on how to approach the determination of aboutness, so there is no single process that is used by everyone. Langridge’s Approach Langridge views the process as a series of discrete activities. The cataloger or indexer must keep three basic questions in mind in order to determine the aboutness of an information resource. Those questions are: What is it? This is answered by one of the fundamental forms or categories of knowledge. He identifies 12 forms of knowledge (philosophy, natural science, technology, human science, social practice, history, moral knowledge, religion, art, criticism, personal experience, and prolegomena: logic, mathematics, and grammar, the foundations of knowledge). What is it for? This looks at the purpose of the document: why was it created? What is it about? The answer is a topic or multiple topics. Topics are not specific to any one form of knowledge or discipline (for example, clothing).
27. Differences in Methods Used Wilson’s Approaches Wilson described 4 methods; he did not name them, so the authors supplied the names. Purposive Method: one tries to determine the author’s aim or purpose in creating the information resource. Figure-Ground Method: one tries to determine a central figure that stands out from the background of the rest of the information resource. Objective Method: the method used in most attempts at automated conceptual analysis. One tries to be objective by counting references to various items to determine which ones vastly outnumber the others. Cohesion Method: an approach that looks at the unity of the content. One tries to determine what holds the work together, what content has been included, and what has been left out of the treatment of the topic.
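The counting behind the Objective Method can be sketched in a few lines. This is only an illustration of the general idea, not Wilson’s procedure or any particular system’s algorithm; the stopword list, the `term_frequencies` helper, and the sample text are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for this example only.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "with"}


def term_frequencies(text):
    """Count how often each non-stopword appears in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if word not in STOPWORDS)


sample = (
    "Bridges carry roads over rivers. Suspension bridges and arch bridges "
    "differ in how the bridge deck is supported."
)

counts = term_frequencies(sample)
# The most frequent term is taken as the candidate subject.
print(counts.most_common(1))  # [('bridges', 3)]
# The counter treats "bridge" and "bridges" as unrelated strings.
print(counts["bridge"])       # 1
```

The count points to "bridges" as the dominant term, but it cannot tell that "bridge" names the same concept, nor whether the text concerns civil engineering or the card game. That gap is why the slides argue that frequency alone does not establish aboutness.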
28. Use-based Approaches Aboutness can be determined by looking at how a resource could be used or what questions a resource could answer. Lancaster, concerned about users and how the item might be used, suggests asking 3 questions: What is it about? Why has it been added to our collection? What aspects will our users be interested in? There seems to be no one correct way to determine aboutness. In the following section, the authors and Pamela provide an approach to conceptual analysis.