Slides from the Introduction and Theoretical Foundations of New Media course of the Interactive Media and Knowledge Environments master program (Tallinn University).
4. Metadata So, why is metadata relevant? Or… why should we care about metadata? David Lamas, TLU, 2011 4
5. Metadata As a concept, it is not new. Metadata has long been used for managing document collections such as the ones kept by libraries. But the term itself was only coined in 1968, by Philip Bagley, a pioneer of computerized document retrieval.
6. Metadata Literally a set of data that describes and gives information about other data, metadata in our context is machine readable and descriptive, for the purposes of resource discovery, management, delivery, access control, use, re-use and long-term preservation.
7. Metadata Or in other words, metadata allows for the description of the definition, structure and administration of selected resources, with all contents in context, to ease the further use of the resource.
8. MARC Or… Machine-Readable Cataloging. It is still the main metadata standard in the library world, although it is not a full cataloguing scheme, being…
9. UDC, AACR2 and RDA Universal Decimal Classification: a multilingual classification scheme for all fields of knowledge. Available at… http://www.udcc.org/udcsummary/php/index.php Anglo-American Cataloguing Rules: for use in the construction of catalogues. Available at… http://www.aacr2.org/ Resource Description and Access: available at… http://www.rda-jsc.org/rda.html
10. Z39.50, SRW and SRU Z39.50 is a client–server protocol for searching and retrieving information, widely used in library environments. Search & Retrieve Web Service (SRW): an intended standard web-based text-searching interface. Search/Retrieval via URL (SRU): a standard XML-focused search protocol for Internet search queries, which uses the Contextual Query Language (CQL).
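As a rough sketch of what Search/Retrieval via URL can look like in practice, the Python snippet below builds a searchRetrieve request URL that carries a Contextual Query Language query. The endpoint is a hypothetical placeholder, and the `dc.title` index assumes a server that supports the Dublin Core context set; real servers may expose other indexes and protocol versions.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical SRU endpoint, used for illustration only.
BASE_URL = "http://sru.example.org/catalogue"


def build_sru_url(cql_query, max_records=10, version="1.2"):
    """Build an SRU searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    # urlencode percent-encodes the CQL query so it is safe inside a URL.
    return BASE_URL + "?" + urlencode(params)


url = build_sru_url('dc.title = "metadata"')
print(url)

# Decoding the query string recovers the original CQL query.
decoded = parse_qs(urlparse(url).query)
print(decoded["query"][0])
```

The whole search is expressed in the URL, so it can be bookmarked, shared or issued by any HTTP client; the server would answer with an XML document.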
11. But… This should not bother you other than to note that… Metadata tends to get more complicated the longer you think about it
12. As for the web… It was recognized early that finding what you need was going to start getting difficult. We’re talking about the mid nineties, when the web’s size was referred to in terms of tens of thousands. Users, mainly information sciences specialists, began trying to catalogue it by hand. Do you remember Yahoo’s earlier versions?
13. As for the web… The first search engines appeared and authors began to realize that the metadata they embedded into web pages might be important: <html> <head> <title>A web page</title> <meta name="keywords" content="some, key, words" /> <meta name="description" content="a summary" /> </head> <body> …
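To make the idea concrete, here is a minimal sketch of how an early indexer might have read such embedded metadata, using Python's standard `html.parser` module. The `MetaExtractor` class name is an assumption made for this illustration, and the page is the slide's example with the elided body left empty.

```python
from html.parser import HTMLParser

# The slide's example page, with an empty body in place of the elided content.
PAGE = """<html>
<head>
<title>A web page</title>
<meta name="keywords" content="some, key, words" />
<meta name="description" content="a summary" />
</head>
<body></body>
</html>"""


class MetaExtractor(HTMLParser):
    """Collect name/content pairs from <meta> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            if name:
                self.meta[name] = attributes.get("content", "")


extractor = MetaExtractor()
extractor.feed(PAGE)
print(extractor.meta)
```

Nothing here checks whether the declared keywords match the page content, which is exactly the weakness the next slide turns to.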
14. As for the web… Then came Google. And metadata lost some relevance, as Google’s PageRank algorithm takes note of links between pages but places less emphasis on embedded metadata to avoid… Metaspam: <meta name="description" content="a summary" /> Metacrap: <title>put your title here</title>
15. Dublin Core Despite the initial drawbacks, work continued on embedded metadata, and the Dublin Core was and still is one of the main players with its 15 elements… Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, Rights …embedded into web pages or encoded using XML. The initial intention was to improve indexing by search engines. But whereas its promoters forgot about metaspam and metacrap, the search engines didn’t. And so, the main search engines still ignore embedded metadata.
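A minimal sketch of the XML encoding, assuming Python's standard `xml.etree.ElementTree` and the Dublin Core element set namespace `http://purl.org/dc/elements/1.1/`. The `metadata` wrapper element, the `dublin_core_record` helper and the sample values (including the language code) are assumptions for illustration, not part of the standard or of these slides.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 Dublin Core elements, in the order listed on the slide.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def dublin_core_record(values):
    """Build an XML record from a dict of Dublin Core element values."""
    unknown = set(values) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError("not Dublin Core elements: %s" % sorted(unknown))
    record = ET.Element("metadata")  # hypothetical wrapper element
    for name in DC_ELEMENTS:
        if name in values:
            child = ET.SubElement(record, "{%s}%s" % (DC_NS, name))
            child.text = values[name]
    return record


# Sample values chosen for illustration only.
record = dublin_core_record({
    "title": "Metadata and Ontologies",
    "creator": "David Lamas",
    "date": "2011",
    "language": "en",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

All elements are optional and repeatable in Dublin Core, which is why the helper only rejects unknown names rather than requiring a full set.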
17. Metadata Remarkably, there has been fairly widespread adoption of metadata principles, especially in policy terms, namely in government (look into http://www.esd.org.uk/standards/egms/viewer/viewer.aspx for an interesting example). And in: education, health, cultural heritage, environmental agencies, and… libraries, of course.
18. Metadata This resulted in the… Growth of metadata cataloguing rules (although every community has its own rules). Growth in use of additional elements for particular communities (and again, every community’s additions are different). Adoption of application profiles to document the distinct cataloguing rules and additions. Institution of the Dublin Core Metadata Initiative as an organization engaged in the development of interoperable metadata standards that support a broad range of purposes and business models.
19. Metadata But the Dublin Core isn’t alone, far from it. Many other standards were and are being developed, such as these, just to name two: RDF (Resource Description Framework) and LOM (Learning Object Metadata).
20. Resource Description Framework Developed by the W3C, RDF is the envisioned standard for the semantic web. Its goal is to allow software to automatically navigate and reason about web content, thus enabling… A web of (linked) data
22. Learning Object Metadata Learning Object Metadata is a data model. Usually encoded in XML, it is used to describe learning objects and similar digital resources used to support learning.
24. Metadata As said in the beginning… Metadata tends to get more complicated the longer we think about it. The lack of coherence and cohesion in current metadata efforts, both within standards and within communities, is a good example. And that is why we will next look into ontologies. So… do we care about metadata? Why are we interested?
25. Metadata I guess the answer is yes, we care. And yes, we are interested, because metadata is everywhere. Sometimes it is explicitly available, other times it is hidden or not so readily available, but anyway… It would be foolish not to make use of it.
26. Metadata Further, there is increasing pressure to expose metadata on the web for others to mash up, and this is especially true today in settings such as… Education; Research; and Government. And finally, metadata becomes paramount in scenarios where content is data, or where the required information cannot easily be derived from content.
28. Ontologies One way of dealing with the lack of coherence and cohesion in current metadata efforts, both within standards and within communities, is to evolve to an ontology-based metadata approach. But what does this mean?
29. Ontologies An ontology is a logical theory which gives an explicit partial account of a conceptualization: an intensional semantic structure which encodes the implicit rules constraining the structure of a piece of reality. In this light, the aim of an ontology is to define which primitives, provided with their associated semantics, are necessary for knowledge representation in a given context. Thomas R. Gruber (1993). Toward principles for the design of ontologies used for knowledge sharing. Originally in N. Guarino and R. Poli (Eds.), International Workshop on Formal Ontology, Padova, Italy. Revised August 1993. Published in International Journal of Human-Computer Studies, Volume 43, Issue 5-6, Nov./Dec. 1995, Pages 907-928, special issue on the role of formal ontology in the information technology.
30. Ontologies Ontologies are usually characterized by their… Coverage: the extent to which the primitives mobilized by the perceived usage scenarios are covered by the ontology. Specificity: the extent to which ontological primitives are precisely identified. Granularity: the extent to which primitives are precisely and formally defined. Formality: the extent to which primitives are described in a formal language.
31. Ontologies And ontologies are not… taxonomies. But a taxonomy might be perceived as a specific case of an ontology. A taxonomy is a particular classification arranged in a hierarchical structure. Typically it is organized by supertype/subtype relationships, also called generalization/specialization relationships.
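The supertype/subtype structure can be sketched in a few lines of Python. The terms in the hierarchy below are hypothetical examples, and the single-parent mapping is a simplifying assumption: it covers the strictly hierarchical case described on the slide, not the richer relations an ontology can express.

```python
# Each term maps to its direct supertype (hypothetical example hierarchy).
SUPERTYPE = {
    "document": "resource",
    "image": "resource",
    "slide deck": "document",
    "web page": "document",
}


def ancestors(term):
    """Return the chain of supertypes of a term, nearest first."""
    chain = []
    while term in SUPERTYPE:
        term = SUPERTYPE[term]
        chain.append(term)
    return chain


def is_a(term, candidate):
    """True if term equals candidate or specializes it."""
    return term == candidate or candidate in ancestors(term)


print(ancestors("slide deck"))
print(is_a("web page", "resource"))
print(is_a("image", "document"))
```

The only relationship this structure can answer questions about is generalization/specialization, which is what separates a plain taxonomy from a fuller ontology.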
35. Why ontologies? In short, we interpret, machines don’t. As such, an effort must be undertaken in order to support adequate usage of digital resources. So, what’s missing? Among others… The possibility to share a common understanding of the structure of information within a specific domain. The possibility to reuse domain knowledge. The possibility to make domain assumptions explicit. The possibility to analyze domain knowledge.
36. Ontologies and the web It is estimated that by 2010… 70% of public web pages will have some level of metadata, but only 20% will use more extensive semantic web approaches such as ontology-based metadata. But why should we care? http://www.afsg.nl/InformationManagement/images/nieuws/finding%20and%20exploiting%20value%20of%20semantic%20tech%20on%20web.pdf
37. Ontologies and the web An emerging ontological approach is OWL or… Web Ontology Language: a vocabulary extension of the Resource Description Framework, which adds more vocabulary for describing characteristics of properties and classes, or relations between classes.
38. Web Ontology Language OWL enables ontology-based information sharing and manipulation together with RDF and XML. In reverse order… XML allows users to add arbitrary structure to their documents but says nothing about what such structures mean. RDF enables expression of meaning over XML (and other) structures, using subject, verb and object triples. OWL enables machines to comprehend semantic documents and data.
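The subject, verb and object idea can be illustrated without any RDF library. The sketch below is only an assumption-laden toy: the `ex:` identifiers and the `ex:affiliation` property are hypothetical, plain Python tuples stand in for real RDF statements, and the `match` helper is a simple pattern filter rather than a query language.

```python
# Each statement is a (subject, verb, object) triple; the ex: names are made up.
TRIPLES = {
    ("ex:slides", "dc:title", "Metadata and Ontologies"),
    ("ex:slides", "dc:creator", "ex:DavidLamas"),
    ("ex:DavidLamas", "ex:affiliation", "ex:TLU"),
}


def match(subject=None, verb=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (verb is None or t[1] == verb)
        and (obj is None or t[2] == obj)
    )


# Everything stated about the slides.
print(match(subject="ex:slides"))

# Following a link: who created the slides, and where are they affiliated?
creator = match(subject="ex:slides", verb="dc:creator")[0][2]
print(match(subject=creator, verb="ex:affiliation"))
```

Because the object of one triple can be the subject of another, the statements form a graph that software can traverse, which is the sense in which RDF supports a web of linked data.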
40. Ontologies This said, and while addressing some of the weaknesses of current metadata efforts, present-day ontologies still largely depend on explicit human intervention to be useful. And that is why we will next look into folksonomies.
42. Folksonomies Are mainly a bottom-up social classification system: a way to organize and share contents by tagging resources. Synonyms are… Ethno-classification; and Collaborative tagging
43. Folksonomies Folksonomies are created by users and have… No structure, no fixed vocabulary, no explicit relationships between terms, and no authority.
44. Folksonomies Folksonomies also are… Distributed, and collaboratively built and maintained. You can tag items owned by others. You can get instant feedback: all items for the same tag, all tags for the same item. You can adapt your tags to the group norm, but you are never forced.
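The two feedback views mentioned above, all items for the same tag and all tags for the same item, amount to a pair of indexes over user tagging actions. The sketch below is one possible in-memory layout; the `Folksonomy` class, its method names and the sample users and items are all hypothetical.

```python
from collections import defaultdict


class Folksonomy:
    """A toy tag store: no fixed vocabulary, no hierarchy, no authority."""

    def __init__(self):
        self.assignments = set()              # (user, item, tag) actions
        self.items_by_tag = defaultdict(set)  # tag -> items
        self.tags_by_item = defaultdict(set)  # item -> tags

    def tag(self, user, item, tag):
        """Any user may tag any item, including items owned by others."""
        self.assignments.add((user, item, tag))
        self.items_by_tag[tag].add(item)
        self.tags_by_item[item].add(tag)

    def items_for(self, tag):
        return sorted(self.items_by_tag.get(tag, ()))

    def tags_for(self, item):
        return sorted(self.tags_by_item.get(item, ()))


f = Folksonomy()
f.tag("ana", "slides-2011", "metadata")
f.tag("ben", "slides-2011", "ontologies")
f.tag("ben", "owl-primer", "ontologies")

print(f.items_for("ontologies"))
print(f.tags_for("slides-2011"))
```

Both views update the moment a tag is added, which is what makes the instant feedback loop, and the drift toward group norms, possible.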
45. Folksonomies Some of their apparent benefits are… Being cheap and easy to build and use. Being able to adapt very quickly to changes and users’ needs. They scale well. They foster serendipity: semantic browsing instead of searching. They lower the cooperation barriers.
46. Folksonomies But they have limits such as… Semantic ambiguity: polysemy, synonymy, cardinality and the use of acronyms. Syntax free: spaces and multiple words are used without rules. Language: different languages can be used for the same tag. Being eventually shortsighted: they fail to depict the general overview. Lack of (or minimal) structure: no explicit relationships between otherwise related tags.
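The syntax-free limit is the easiest one to see in code. In the sketch below, six hypothetical raw tags collapse to two once case, spaces, underscores and hyphens are ignored; the `normalize` rule is an assumption chosen for illustration, not a standard.

```python
import re
from collections import Counter

# Hypothetical raw tags, as different users might type them.
RAW_TAGS = [
    "new media", "NewMedia", "new_media", "new-media",
    "Semantic Web", "semanticweb",
]


def normalize(tag):
    """Lowercase a tag and drop spaces, underscores and hyphens."""
    return re.sub(r"[\s_\-]+", "", tag.lower())


counts = Counter(normalize(tag) for tag in RAW_TAGS)
print(len(set(RAW_TAGS)), "distinct raw tags")
print(counts)
```

This kind of clean-up only addresses syntax. Polysemy, synonymy and tags in different languages need knowledge about meaning, which simple string rules cannot supply.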
47. Folksonomies and ontologies Folksonomies. Domains: large corpus, informal categories, unstable entities, unclear edges. Participants: naïve cataloguers, no authority, uncoordinated users, amateur users, critical mass needed. Ontologies. Domains: small corpus, formal categories, stable entities, restricted entities, clear edges. Participants: expert cataloguers, authoritative sources of judgment, coordinated users, expert users.
48. Folksonomies and ontologies How do we choose? Folksonomies are useful when all that is needed is the ability to link items to topics. Ontologies are useful when what is needed is to formally define meaning. But… do we need to choose? Not really, at least that is what current research is exploring.
49. Folksonomies and ontologies Research directions include: The combination of the folksonomy and ontology approaches into a hybrid system, where the most consensual constructs would last while others would be forgotten or redefined; an approach that combines the ease and adaptability of a folksonomy with the formality and semantic richness of an ontology. Quantitative tag analysis and qualitative use analysis in current online social networking services: to understand whether tag usage converges or not; to understand how a folksonomy is formed; to… any ideas?
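One very simple way to start on the quantitative side is to track how much of an item's tagging is taken up by its most popular tag. The measure, the function name `top_tag_share` and the sample data are assumptions for this sketch, not a method taken from the research mentioned above.

```python
from collections import Counter


def top_tag_share(tags):
    """Fraction of tag assignments that use the single most common tag."""
    if not tags:
        raise ValueError("no tags to analyse")
    counts = Counter(tags)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(tags)


# Hypothetical tag history for one item, early on and some time later.
early = ["web", "metadata", "semantic", "rdf"]
later = early + ["metadata"] * 6

print(top_tag_share(early))
print(top_tag_share(later))
```

A share that rises over time would hint that users are settling on a common tag, while a flat share would suggest usage is not converging; a real study would need far richer measures than this one.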
51. Semantic Web The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, the structure of the data is not evident to a robot browsing the web. Leaving aside the artificial intelligence problem of training machines to behave like people, the Semantic Web approach instead develops languages for expressing information in a machine processable form.
53. The internet of things The internet of things might be described as a self-configuring wireless network of sensors whose purpose would be to interconnect all things. The concept is attributed to the former Auto-ID Center, founded in 1999 and based at the time at MIT. An alternative view focuses instead on making all things addressable by the existing naming protocols. In the current vision, objects themselves do not interact, but they may now be referred to by other agents, such as centralized servers acting for their human users.
54. Metadata and Ontologies recap Metadata; Ontologies; Folksonomies; The semantic web; The internet of things