Discoverability and the Web
Getting PROV ready for the semantic web
What do I hope you'll take away from this
presentation?
•The web is moving from a web of documents to a web of data
•Making web content machine readable is important for discoverability
•We can also use APIs, web mark-up, and LOD to make our web resources more
discoverable and reusable
•This isn't a fad or a fantasy, it's happening all over the globe right now and we can be a
part of it if we want to at low cost in a timely fashion
Before we get into the heavy stuff
Some Big Bang Theory c/o Google
https://www.youtube.com/watch?v=mmQl6VGvX-c
Who needs all this data and who works with
it?
•Researchers who want data that is as fine-grained as possible on a given topic, i.e. anyone who
isn’t satisfied with just a web page of interpretation (a document) about something
https://en.wikipedia.org/wiki/Vida_Goldstein but would rather supplement it with the granular
details about that thing or person and browse to related data across the web
http://dbpedia.org/page/Vida_Goldstein
•Any organisation that wants to make its web resources as discoverable and usable as possible, e.g.
the BBC, the Smithsonian, the Getty, UK National Archives, Digital NZ, National Archives of Korea,
SLNSW, TROVE, SRNSW, Auckland Museum, or just check out http://bit.ly/1OGYZYJ
•Anyone who wants to help annotate content on the web for the social good. Think of TROVE or
our own WIKI in which 92 tags have been used 457,750 times across more than 50,000 pages!
Here’s just one example http://wiki.prov.vic.gov.au/index.php/Property:Has_keywords
•Software developers who want to build new applications out of this data to make it more
accessible and engaging. (We’ll look at some real life examples in just a moment).
•Anyone who wants to ask, or allow others to ask, sophisticated questions like “Show me all 20th
century painters who were born near Timaru”, “Who were Colin McCahon's contemporaries, and
let me see a chronology of their major paintings”, “Show me all the works in Harvard Library by
Swedish Nobel Prize winners”, “How many people died from tuberculosis in Victoria from 1840 to
1940?”, “List all Parish Plans showing allotments purchased by person X from 1900-1915 for up to
300 pounds only”.
Imagine...
Imagine... a researcher in 10 years’ time who
wants to use research data about the Eureka
Stockade from the Life Sciences, the Humanities
and the decorative arts to examine the
consequences of the event for Victoria’s
economy, environment and art trends from 1854
to 1870. Imagine they have access to a range of
documents but also statistical and other data
from a range of institutions that allows them to
carry this out.
Is this just a fantasy driven by a select few for a niche audience?
In a word...NO. Next slide please...
2014 Linked Open Data
Cloud
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/#toc2
1014 organisations
About 183 gov’t
The Semantic Web
From a web of documents to a web of data
http://dbpedia.org/page/Jerilderie_Letter
PART 1: EXPLAINING SOME KEY CONCEPTS OF
THE SEMANTIC WEB
Linked Open Data refers to the way in which we have moved from the ability to link web pages and
documents over the web to the ability to link data within those web pages and documents over
the web to related data and documents.
What? Here’s a page in Wikipedia about the Jerilderie Letter
https://en.wikipedia.org/wiki/Jerilderie_Letter and here’s the Wikipedia data behind that
page/topic with links to related data http://dbpedia.org/page/Jerilderie_Letter
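The pairing of a Wikipedia page with its DBpedia data page can be sketched in a few lines of Python. This is only an illustration based on the two pairs of addresses shown in these slides; the function name is made up, and the naming pattern is an assumption that may not hold for every article.

```python
from urllib.parse import urlsplit


def dbpedia_page_for(wikipedia_url: str) -> str:
    """Map an English Wikipedia article URL to the DBpedia page with the same title.

    Assumption: DBpedia reuses the Wikipedia article title, as in the
    Jerilderie Letter and Vida Goldstein examples on the slides.
    """
    parts = urlsplit(wikipedia_url)
    prefix = "/wiki/"
    if parts.netloc != "en.wikipedia.org" or not parts.path.startswith(prefix):
        raise ValueError(f"not an English Wikipedia article URL: {wikipedia_url}")
    title = parts.path[len(prefix):]
    return f"http://dbpedia.org/page/{title}"


print(dbpedia_page_for("https://en.wikipedia.org/wiki/Jerilderie_Letter"))
# prints http://dbpedia.org/page/Jerilderie_Letter
```

The document lives at one address and the data about the same topic lives at another, and software can get from one to the other without a human in the loop.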
LODLAM is the acronym for this Linked Open Data process within Libraries, Archives and Museums.
Let’s start with an example to see how this happens...
A real life example!
http://lodlive.it/LodLive-wiki.prov.vic.gov.au/app_en.html?http://wiki.prov.vic.gov.au/index.php/Special:
This graph is showing us all the metadata contained within the PROV wiki page on the famous
Jerilderie Letter http://wiki.prov.vic.gov.au/index.php/Jerilderie_Letter
The LodLive application was written by a developer based in Italy whom I met at the LODLAM 2015
Conference in Sydney recently. He wrote it to help humans browse the
linked open data universe in a visual way. The beauty of it is that you can literally follow your nose
through all the connections between resources on the internet via their metadata. Or, as in this
example, you can just explore the metadata within the wiki page itself. So this is the machine
readable view of the wiki page, which raises the question...
What's the value in machine readable
metadata?
It simply means that as developers come up with new presentation environments for our content
we will be ready to make it accessible to them in a form they can actually use!
Here's a human readable page of the PROV wiki relating to the copy of the famous Ned Kelly
Jerilderie Letter we have in our collection http://wiki.prov.vic.gov.au/index.php/Jerilderie_Letter
Okay so how do we create machine readable
metadata?
Well, we don’t need to. The beauty of the platform that the PROV wiki is built on (Semantic
MediaWiki) is that it automatically creates it for us for every single page we have created metadata
for. The metadata is turned into a standard data model for machines to read called RDF. It’s the
lingua franca of the semantic web, and fortunately there are a lot of smart people out there who
have developed software to transform other data types and models into RDF.
Because our wiki makes all of its contents machine readable using the standard data model of the
semantic web, i.e. RDF, we also offer developers a machine readable version of the same wiki page
for them to consume in whatever applications they build for browsing the semantic web.
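To make the idea of RDF a little more concrete: every statement is a triple of subject, predicate and object. The sketch below models that in plain Python. The property names and values are assumptions made up for illustration and are not a copy of what the wiki actually publishes.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

PAGE = "http://wiki.prov.vic.gov.au/index.php/Jerilderie_Letter"

# Illustrative statements only: property names and values are assumptions.
TRIPLES: List[Triple] = [
    (PAGE, "Has_keywords", "Ned Kelly"),
    (PAGE, "Has_keywords", "Jerilderie"),
    (PAGE, "See_also", "http://dbpedia.org/page/Jerilderie_Letter"),
]


def objects_of(subject: str, predicate: str, triples: List[Triple]) -> List[str]:
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects_of(PAGE, "Has_keywords", TRIPLES))
# prints ['Ned Kelly', 'Jerilderie']
```

Because every statement has the same three-part shape, software written by someone who has never seen our wiki can still ask it questions.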
Does that mean we’re reliant on the Wiki?
No, we can actually turn all of PROV’s Function, Agency and Series metadata into Linked Open
Data because we have something very magical called an API!
An AP What? With the help of the developer behind http://metadata.prov.vic.gov.au/provisualizer
we can use the PROV API developed by Kaz and David Fowler to gather A1 metadata
(http://metadata.prov.vic.gov.au/oai/query?verb=ListSets) consisting of 139 Functions, 2579 Agencies and 15212
Series and turn that into Linked Open Data. It will be inexpensive and fast, and it will take our ACM into the
Semantic Web, as we have already done with the PROV wiki
http://wiki.prov.vic.gov.au/rdf/Public_Record_Office_Victoria_Semantic_Wiki.rdf
When we make Item level data accessible through our API we’ll be able to create Linked Open
Data for it as well, associating it with the archives ontology we deem most appropriate.
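The address on this slide ends in verb=ListSets, which is the OAI-PMH way of asking a repository what sets it holds. As a rough sketch of the "API in, Linked Data out" idea, the Python below builds that request address, reads a response and writes one RDF statement per set. The sample response, the setSpec values and the example.org base address are all assumptions for illustration; this is not the pipeline PROV would actually use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "http://metadata.prov.vic.gov.au/oai/query"  # endpoint shown on the slide
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}  # OAI-PMH 2.0 namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical response body: the setSpec values are invented for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>functions</setSpec><setName>Functions</setName></set>
    <set><setSpec>agencies</setSpec><setName>Agencies</setName></set>
    <set><setSpec>series</setSpec><setName>Series</setName></set>
  </ListSets>
</OAI-PMH>
""".strip()


def request_url(verb, **params):
    """Build an OAI-PMH request address such as ...?verb=ListSets."""
    return OAI_BASE + "?" + urlencode({"verb": verb, **params})


def list_sets(xml_text):
    """Extract (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (node.findtext("oai:setSpec", namespaces=OAI_NS),
         node.findtext("oai:setName", namespaces=OAI_NS))
        for node in root.iterfind(".//oai:set", OAI_NS)
    ]


def to_ntriples(sets, base="http://example.org/prov/set/"):
    """Write one N-Triples line per set (base is a placeholder; no escaping of names)."""
    return [f'<{base}{spec}> <{RDFS_LABEL}> "{name}" .' for spec, name in sets]


print(request_url("ListSets"))
for line in to_ntriples(list_sets(SAMPLE_RESPONSE)):
    print(line)
```

A real conversion would fetch the live response, handle paging and choose a proper archives ontology, but the overall shape stays the same.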
We’re not the first archive to do or think
about this!
http://www.archivesnext.com/?p=3450
•Archives Hub (UK): The Archives Hub provides a gateway to thousands of the UK’s richest archives,
representing over 220 institutions across the country. (http://archiveshub.ac.uk/introduction/)
•Linked Jazz (Pratt Institute): a research project investigating the application of Linked Open Data (LOD)
technologies to digital cultural heritage materials. (https://linkedjazz.org/about-the-project/)
•SNAC (University of Virginia): an aggregate of biographical information about people, both individuals and
groups, who created or are documented in historical resources. Users can search for names of individual
people, organizations, and families; browse featured descriptions; and discover and locate connected
historical resources. Search results can be filtered by occupation and subject.
(http://socialarchive.iath.virginia.edu/snac/search)
•Conal Tuohy (Brisbane-based independent software developer): “I’ve spent a bit of time just recently poking
at the new Web API of Museum Victoria Collections, and making a Linked Open Data service based on their
API. I’m writing this up as an example of one way — a relatively easy way — to publish Linked Data off the
back of some existing API. I hope that some other libraries, archives, and museums with their own API will
adopt this approach and start publishing their data in a standard Linked Data style, so it can be linked up with
the wider web of data.” (http://conaltuohy.com/blog/lod-from-custom-web-api/ ).
And here is an example of the Linked Open Data he created for 1 item from the MV API http://bit.ly/1Zjge5P
And these are the item details from the MV website http://collections.museumvictoria.com.au/items/1411018
Just one of 93,817 man made objects they have in their collection http://collections.museumvictoria.com.au/
all accessible through their API http://collections.museumvictoria.com.au/api
See more applications pertaining to documentary heritage here http://summit2015.lodlam.net/
Time to Re-Cap and Breathe
Way back in our first example of the Jerilderie Letter you'll notice that we serve up some really
useful metadata including image URLs, georeferencing data etc. that can all be consumed by
software (i.e. ‘intelligent agents’, as first described by Sir Tim Berners-Lee).
http://lodlive.it/LodLive-wiki.prov.vic.gov.au/app_en.html?http://wiki.prov.vic.gov.au/index.php/Special:URIResolver/Jerilderie_Letter
So, increased access to and awareness of our cultural collection by other cultural collections, and
links to significant datasets such as DBpedia (the semantic database version of Wikipedia), carry
with them the benefits of increased item count / usage (metrics that feed directly into BP3 stats) and
ultimately continued funding for all the very important work we all continue to do in storing,
preserving and making accessible the State archives to the people of Victoria. However, that’s a
very inward looking view.
The flip side of this is something Richard Lehane from SRNSW touched on in a blog post on APIs in
October 2011, which argues that making their search tool open and accessible to developers via
their API means they can draw on the work of others and inform their own choices around mobile
application development etc., though that’s probably worth a talk by itself for another time
http://data.records.nsw.gov.au/?p=248
More breathing space
LODLAM is all about promoting the free and open use of collection metadata between cultural
institutions around the world in a way that software can parse and use in various applications that
generate increased value for the user and the organisation re access, usage, interoperability. It’s
still in its early stages but has made significant progress in just a few short years. Tim Berners-Lee’s
vision for a web of data as opposed to a web of documents is not impossible to imagine as really
big organisations across the globe get onboard:
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/#toc2
So how might this work for PROV?
Well imagine a situation where a researcher has access to related content across all the archives in
Australia or Australasia simply because that content has been annotated with metadata in a
shared language (i.e. RDF) which means software can parse it and make the necessary connections
for search engines to deliver rich results to complex questions. Not only can a researcher explore
related material within a single archival collection but can broaden this out to multiple collections.
And then what if it is then possible to bring in related content from Libraries, Galleries and
Museums as well, all the time filtering out the irrelevant material you don’t want to see?
This is the vision of the LODLAM community, which brings me to Part 2 of this presentation. Don’t
worry, it’s going to be brief compared to Part 1.
PART 2: LODLAM SYDNEY 29-30 JUNE 2015
http://summit2015.lodlam.net/about/
100 people from around the world meet up for 2 days every year to try and work out how best to
make LODLAM work. It’s a heady mix of digital humanists, developers, data wranglers, curators,
geeks and people with an interest such as me! I first attended a LODLAM Conference in San
Francisco in 2011, funded by the Internet Archive and the Sloan philanthropic foundation (all you
had to do was apply). This year it was held in Sydney and PROV very kindly paid my registration fee
of US$100.00. It’s run according to the unconference format
“where attendees propose sessions on the first day, start those sessions then on the 2nd day the
same thing happens with a degree of socialising, tweeting etc around the discussions. There are no
keynote speakers. The meeting is based on the two primary principles of passion and
responsibility: passion to jump in and play an active role; and responsibility to lead, and follow
through with action. No papers will be submitted or read, no plenaries given, and everyone will
participate.”( https://en.wikipedia.org/wiki/Open_Space_Technology )
So what did I do?
I tried to get to as many sessions as possible but it was hard and there was so much on offer!
https://docs.google.com/spreadsheets/d/19mfLBoztvaaaik20-P2syANn2fjURzokE-xILLMOVQ0/pubhtml
I particularly enjoyed
• A pre-conference presentation by Rachel Frick, Digital Public Library of America
g:provaccess managementprojectslodlamfricksydney.pdf
• So you've got a collection API, now what? merged with How to add LOD publication
functions to existing collection management systems. Lightweight, plug-in approaches
• LODlive graph browser. Diego Valerio Camarda
• archive.schema.org. Richard Wallis
I’ll try and give you a brief overview of what I learned:
What is the DPLA?
“The Digital Public Library of America (DPLA) is an all-digital library that aggregates metadata — or information
describing an item — and thumbnails for millions of photographs, manuscripts, books, sounds, moving images,
and more from libraries, archives, and museums around the United States. DPLA brings together the riches of
America’s libraries, archives, and museums, and makes them freely available to the world.”
It is very much about creating a portal for developers to use the metadata to build tools:
http://dp.la/info/developers/
The DPLA uses a number of hubs that reach out to content partners. These hubs facilitate content migration,
providing guidance and support around content, rights and technical issues that might appear.
The DPLA provides a beautiful segue into the role that APIs play in exposing collection metadata to the world
and allowing others to use it to build tools useful to the collecting organisation and the researcher community.
What is an API?
Basically a way into an organisation’s metadata via a programmatic interface. If you want a really
great definition check http://data.records.nsw.gov.au/?p=248
PROV has 2 APIs that I know of, the ANDS API and the PROV wiki API. The first one feeds directly
into Research Data Australia and uses the metadata schema RIF-CS. The second one is a little more
accessible, e.g. http://www.culturevictoria.com/collection-search/ delivering item level content
using the OpenSearch protocol first developed by Amazon.
While this isn’t LOD, both are a step in the right direction of improving our discoverability.
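For the curious, OpenSearch describes a search service with a URL template containing placeholders such as {searchTerms}, where a trailing question mark marks a placeholder as optional. The Python below is a minimal sketch of how software might fill in such a template. The template address is a made-up example.org placeholder, not the real Culture Victoria or PROV endpoint, and the function name is hypothetical.

```python
import re
from urllib.parse import quote

# Hypothetical template for illustration; not a real PROV or Culture Victoria address.
TEMPLATE = "http://example.org/search?q={searchTerms}&start={startIndex?}"


def fill_template(template: str, **values) -> str:
    """Substitute OpenSearch-style {name} and optional {name?} placeholders."""

    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote(str(values[name]), safe="")  # percent-encode the value
        if optional:
            return ""  # optional placeholders may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{(\w+)(\??)\}", substitute, template)


print(fill_template(TEMPLATE, searchTerms="Jerilderie Letter"))
# prints http://example.org/search?q=Jerilderie%20Letter&start=
```

The point is simply that a developer, or their software, can ask our collection a question by constructing an address rather than by filling in a web form.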
What is mark-up? Schema.org
An initiative launched on 2 June 2011 by Bing, Google and Yahoo! (the operators of the then
world's largest search engines) to create and support a common set of schemas for structured data
mark-up on web pages. At LODLAM in Sydney, Richard Wallis proposed the creation of a
working group to develop an extension to Schema.org to encompass mark-up of web pages
relating directly to archives. An initial model of this has recently been created and, as I
understand it, the NAA will be marking up their pages in the near future. Zoe D’Arcy from the NAA
will keep me informed as to their experience after doing this.
Why bother?
Search engines can deliver richer, more relevant results if they can ‘see’ the context behind web
pages, e.g. a mention of Public Record Office Victoria on our website refers to an archive as
described by the Schema.org extension the working group is developing, as opposed to a string of
characters that could be the name of a rock band or all manner of things!
What might the extension look like?
This diagram shows the basic relationship between the proposed main archive, specific types plus
relevant Schema types in the model.
And how might a web page be ‘marked up’?
@prefix schema: <http://schema.org/> .
# An Archive (Organization)
<http://archive.example.com>
    a schema:Archive ;
    schema:name "The Example Archive" ;
    schema:address "The Old Archive, City Square, Anytown" ;
    schema:email "info@archive.example.com" ;
    # The nested block describes what the archive holds and on what terms
    schema:owns [
        a schema:OwnershipInfo ;
        schema:ownedFrom "1957" ;
        schema:typeOfGood <http://archive.example.com/boolarchive> ;
        schema:ownershipType schema:HasCustodyOwnership
    ] .
# An ArchiveCollection
<http://archive.example.com/boolarchive>
    a schema:ArchiveCollection ;
    schema:name "The Boolean Papers Collection" ;
    schema:creator "Sir Binary Boolean" ;
    schema:accessAndUse "Public view, in archive location, no image reproductions" ;
    schema:itemLocation <http://archive.example.com> .
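On an actual web page, Schema.org mark-up is commonly embedded as JSON-LD inside a script tag. As a rough sketch, the Python below produces such a tag for the example archive. The "Archive" type is the proposed type from the example on this slide and is an assumption here; it may not match what is finally published in the Schema.org vocabulary. The helper name is hypothetical.

```python
import json

# Same example archive as in the slide's mark-up; "Archive" is the proposed type.
archive = {
    "@context": "http://schema.org/",
    "@id": "http://archive.example.com",
    "@type": "Archive",
    "name": "The Example Archive",
    "address": "The Old Archive, City Square, Anytown",
    "email": "info@archive.example.com",
}


def as_script_tag(data: dict) -> str:
    """Wrap a JSON-LD description in the script tag that search engines look for."""
    body = json.dumps(data, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(as_script_tag(archive))
```

The resulting tag can sit in the head of a page without changing anything a human visitor sees.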
Conclusions: Yes it’s the end!
•We all want to make the archives as discoverable as possible
•As long as we’re on the net we might as well be on it well (clumsy I know but you get
the gist)
•There are many pieces to the puzzle...APIs, Linked Open Data, non proprietary
software, marking up web pages for Search Engines e.g. Schema.org
• We have the ability to become highly discoverable right now at low cost and in a way
that is scalable.
•What will the Access the Collection of the future look like?
• All it will take is the ability to join the dots. Many others around the world have
already done this so we’re not alone. We are lucky to have some brilliant minds with
exceptional skills in our own back yard so let’s use them.
2014 Linked Open Data
Cloud
http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/#toc2
1015 organisations
What do I hope you’ve taken away from this
presentation?
•The web is moving from a web of documents to a web of data
•Making web content machine readable is important for discoverability
•We can also use APIs, web mark-up, and LOD to make our web resources more
discoverable and reusable
•This isn't a fad or a fantasy, it's happening all over the globe right now and we can be a
part of it if we want to at low cost in a timely fashion

Lodlam presentation v1.0 final al20151104

  • 1. Discoverability and the Web Getting PROV ready for the semantic web
  • 2. What do I hope you'll take away from this presentation? •The web is moving from a web of documents to a web of data •Making web content machine readable is important for discoverability •We can also use APIs, web mark-up, and LOD to make our web resources more discoverable and reusable •This isn't a fad or a fantasy, it's happening all over the globe right now and we can be a part of it if we want to at low cost in a timely fashion
  • 3. Before we get into the heavy stuff Some Big Bang Theory c/o Google https://www.youtube.com/watch? v=mmQl6VGvX-c
  • 4. Who needs all this data and who works with it? •Researchers who are after data that is as fine-grained as possible on a given topic, i.e. basically anyone who isn’t satisfied with just a web page of interpretation (a document) about something https://en.wikipedia.org/wiki/Vida_Goldstein but would rather supplement this with the granular details about that thing or person and browse to related data across the web http://dbpedia.org/page/Vida_Goldstein •Any organisation that wants to make its web resources as discoverable and usable as possible, e.g. the BBC, the Smithsonian, the Getty, UK National Archives, Digital NZ, National Archives of Korea, SLNSW, TROVE, SRNSW, Auckland Museum, or just check out http://bit.ly/1OGYZYJ •Anyone who wants to help annotate content on the web for the social good. Think of TROVE or our own wiki, in which 92 tags have been used 457,750 times across more than 50,000 pages! Here’s just one example http://wiki.prov.vic.gov.au/index.php/Property:Has_keywords •Software developers who want to build new applications out of this data to make it more accessible and engaging. (We’ll look at some real-life examples in just a moment.) •Anyone who wants to ask, or allow to be asked, sophisticated questions like “Show me all 20th-century painters who were born near Timaru”, “Who were Colin McCahon's contemporaries, and let me see a chronology of their major paintings”, “Show me all the works in Harvard Library by Swedish Nobel Prize winners”, “How many people died from tuberculosis in Victoria from 1840 to 1940?”, “List all Parish Plans showing allotments purchased by person X from 1900 to 1915 for up to 300 pounds only”.
  • 5. Imagine... Imagine a researcher in 10 years’ time who wants to use research data about the Eureka Stockade from the life sciences, the humanities and the decorative arts to examine the consequences of the event for Victoria’s economy, environment and art trends from 1854 to 1870. Imagine they have access to a range of documents, but also to statistical and other data from a range of institutions that allows them to carry this out.
  • 6. Is this just a fantasy driven by a select few for a niche audience? In a word...NO. Next slide please...
  • 7. 2014 Linked Open Data Cloud http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/#toc2 1014 organisations About 183 gov’t
  • 8. The Semantic Web From a web of documents to a web of data http://dbpedia.org/page/Jerilderie_Letter
  • 9. PART 1: EXPLAINING SOME KEY CONCEPTS OF THE SEMANTIC WEB Linked Open Data refers to the way in which we have moved from linking web pages and documents over the web to linking the data within those pages and documents to related data and documents elsewhere on the web. What? Here’s a page in Wikipedia about the Jerilderie Letter https://en.wikipedia.org/wiki/Jerilderie_Letter and here’s the Wikipedia data behind that page/topic, with links to related data http://dbpedia.org/page/Jerilderie_Letter LODLAM is the acronym for Linked Open Data in Libraries, Archives and Museums. Let’s start with an example to see how this happens...
  • 10. A real-life example! http://lodlive.it/LodLive-wiki.prov.vic.gov.au/app_en.html?http://wiki.prov.vic.gov.au/index.php/Special: This graph shows all the metadata contained within the PROV wiki page on the famous Jerilderie Letter http://wiki.prov.vic.gov.au/index.php/Jerilderie_Letter The LodLive application was written by a developer based in Italy whom I met recently at the LODLAM 2015 Conference in Sydney. He wrote it to help humans browse the linked open data universe in a visual way. The beauty of it is that you can literally follow your nose through all the connections between resources on the internet via their metadata. Or, as in this example, you can just explore the metadata within the wiki page itself. So this is the machine-readable view of the wiki page, which raises the question...
  • 11. What's the value in machine-readable metadata? It simply means that as developers come up with new presentation environments for our content, we will be ready to make it accessible to them in a form they can actually use! Here's a human-readable page of the PROV wiki relating to the copy of the famous Ned Kelly Jerilderie Letter we have in our collection http://wiki.prov.vic.gov.au/index.php/Jerilderie_Letter Okay, so how do we create machine-readable metadata? Well, we don’t need to. The beauty of the platform that the PROV wiki is built on (Semantic MediaWiki) is that it automatically creates it for every single page we have added metadata to. The metadata is turned into a standard data model for machines to read, called RDF. It’s the lingua franca of the semantic web, and fortunately there are a lot of smart people out there who have developed software to transform other data types and models into RDF. Because our wiki makes all of its contents machine readable using RDF, we can also offer developers a machine-readable version of the same wiki page to consume in whatever applications they build for browsing the semantic web.
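To make the idea of RDF a little more concrete, here is a minimal sketch of how one page's metadata can be written as subject, predicate, object statements (triples) in the N-Triples syntax. The `Special:URIResolver` base comes from the wiki links in this talk, but the `Property-3A` naming and the keyword values are assumptions made for illustration, not a copy of what the PROV wiki actually exports.

```python
# Hypothetical sketch: the property URI pattern and the keyword values are
# illustrative assumptions, not taken from the live PROV wiki export.
BASE = "http://wiki.prov.vic.gov.au/index.php/Special:URIResolver/"


def to_ntriples(page, properties):
    """Serialise one page's metadata as N-Triples lines (subject predicate object .)."""
    subject = f"<{BASE}{page}>"
    lines = []
    for prop, values in properties.items():
        predicate = f"<{BASE}Property-3A{prop}>"  # assumed naming convention
        for value in values:
            # Escape backslashes and double quotes so the literal stays valid.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} {predicate} "{literal}" .')
    return lines


triples = to_ntriples(
    "Jerilderie_Letter",
    {"Has_keywords": ["Ned Kelly", "Jerilderie"]},
)
for line in triples:
    print(line)
```

Each printed line is one statement a machine can follow: the page, the property, and the value. Software that understands RDF can merge such statements from many sites without knowing anything about how each site stores them.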
  • 12. Does that mean we’re reliant on the wiki? No, we can actually turn all of PROV’s Function, Agency and Series metadata into Linked Open Data because we have something very magical called an API! An AP What? With the help of the developer behind http://metadata.prov.vic.gov.au/provisualizer we can use the PROV API developed by Kaz and David Fowler http://metadata.prov.vic.gov.au/oai/query?verb=ListSets to gather A1 metadata consisting of 139 Functions, 2579 Agencies and 15212 Series and turn that into Linked Open Data. It will be inexpensive and fast, and it will take our ACM into the Semantic Web, similar to what we have already done with the PROV wiki http://wiki.prov.vic.gov.au/rdf/Public_Record_Office_Victoria_Semantic_Wiki.rdf When we make Item-level data accessible through our API, we’ll be able to create Linked Open Data for it as well, associating it with the archives ontology we deem most appropriate.
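The `verb=ListSets` query above follows the OAI-PMH harvesting protocol, so a developer's first step might look roughly like the sketch below: build the request URL, then pull the set names out of the XML that comes back. The endpoint is the one quoted on the slide; the sample response and its set names are invented for illustration so the code runs without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, urlencode

BASE_URL = "http://metadata.prov.vic.gov.au/oai/query"  # endpoint quoted on the slide
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}  # standard OAI-PMH namespace


def oai_url(verb, **params):
    """Build an OAI-PMH request URL such as ...?verb=ListSets."""
    query = urlencode({"verb": verb, **params}, quote_via=quote)
    return f"{BASE_URL}?{query}"


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (
            s.findtext("oai:setSpec", namespaces=OAI_NS),
            s.findtext("oai:setName", namespaces=OAI_NS),
        )
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]


# Invented sample response: the real set names may differ.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>function</setSpec><setName>Functions</setName></set>
    <set><setSpec>agency</setSpec><setName>Agencies</setName></set>
    <set><setSpec>series</setSpec><setName>Series</setName></set>
  </ListSets>
</OAI-PMH>"""

print(oai_url("ListSets"))
print(parse_sets(SAMPLE))
```

In practice the XML would be fetched from the URL rather than read from a string, and each harvested record would then be mapped to RDF.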
  • 13. We’re not the first archive to do or think about this! http://www.archivesnext.com/?p=3450 •Archives Hub (UK): The Archives Hub provides a gateway to thousands of the UK’s richest archives, representing over 220 institutions across the country. (http://archiveshub.ac.uk/introduction/) •Linked Jazz (Pratt Institute): a research project investigating the application of Linked Open Data (LOD) technologies to digital cultural heritage materials. (https://linkedjazz.org/about-the-project/) •SNAC (University of Virginia): an aggregate of biographical information about people, both individuals and groups, who created or are documented in historical resources. Users can search for names of individual people, organizations, and families; browse featured descriptions; and discover and locate connected historical resources. Search results can be filtered by occupation and subject. (http://socialarchive.iath.virginia.edu/snac/search) •Conal Tuohy (Brisbane-based independent software developer): “I’ve spent a bit of time just recently poking at the new Web API of Museum Victoria Collections, and making a Linked Open Data service based on their API. I’m writing this up as an example of one way, a relatively easy way, to publish Linked Data off the back of some existing API. I hope that some other libraries, archives, and museums with their own API will adopt this approach and start publishing their data in a standard Linked Data style, so it can be linked up with the wider web of data.” (http://conaltuohy.com/blog/lod-from-custom-web-api/) And here is an example of the Linked Open Data he created for 1 item from the MV API http://bit.ly/1Zjge5P And these are the item details from the MV website http://collections.museumvictoria.com.au/items/1411018 Just one of the 93,817 man-made objects they have in their collection http://collections.museumvictoria.com.au/ all accessible through their API http://collections.museumvictoria.com.au/api See more applications pertaining to documentary heritage here http://summit2015.lodlam.net/
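The general idea of publishing Linked Data "off the back of" an existing API can be sketched in a few lines. This is not Conal Tuohy's implementation; it is a hypothetical illustration in which the record's field names (`id`, `title`, `description`) and the choice of Schema.org terms are assumptions. Only the item URL pattern is taken from the Museum Victoria link above.

```python
import json

BASE_URI = "http://collections.museumvictoria.com.au/"


def to_json_ld(record):
    """Wrap a plain API record (hypothetical field names) in a JSON-LD envelope."""
    return {
        "@context": {"schema": "http://schema.org/"},
        "@id": f"{BASE_URI}{record['id']}",  # the web address becomes the identifier
        "@type": "schema:CreativeWork",      # assumed type, chosen for illustration
        "schema:name": record["title"],
        "schema:description": record.get("description", ""),
    }


# Invented record standing in for whatever the real API returns.
api_record = {
    "id": "items/1411018",
    "title": "Example item title",
    "description": "Example description",
}

linked = to_json_ld(api_record)
print(json.dumps(linked, indent=2))
```

The key move is that every item gets a stable web address as its identifier and its fields are described with a shared vocabulary, which is what lets other datasets link to it.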
  • 14. Time to Re-Cap and Breathe Way back in our first example of the Jerilderie Letter you'll notice that we serve up some really useful metadata, including image URLs, georeferencing data etc., that can all be consumed by software (i.e. ‘intelligent agents’, as first described by Sir Tim Berners-Lee). http://lodlive.it/LodLive-wiki.prov.vic.gov.au/app_en.html?http://wiki.prov.vic.gov.au/index.php/Special:URIResolver/Jerilderie_Letter So, increased access to and awareness of our cultural collection by other cultural collections, and links to significant datasets such as DBpedia (the semantic database version of Wikipedia), carry with them the benefits of increased item count / usage (metrics that feed directly into BP3 stats) and ultimately continued funding for all the very important work we all continue to do in storing, preserving and making accessible the State archives to the people of Victoria. However, that’s a very inward-looking view. The flip side of this is something Richard Lehane from SRNSW touched on in a blog post on APIs in October 2011, which argues that making their search tool open and accessible to developers via their API means they can garner the work of others and inform their own choices around mobile application development etc., though that’s probably worth a talk by itself for another time http://data.records.nsw.gov.au/?p=248
  • 15. More breathing space LODLAM is all about promoting the free and open use of collection metadata between cultural institutions around the world, in a way that software can parse and use in applications that generate increased value for the user and the organisation in terms of access, usage and interoperability. It’s still in its early stages but has made significant progress in just a few short years. Tim Berners-Lee’s vision for a web of data, as opposed to a web of documents, is not impossible to imagine as really big organisations across the globe get on board: http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/#toc2 So how might this work for PROV? Well, imagine a situation where a researcher has access to related content across all the archives in Australia or Australasia simply because that content has been annotated with metadata in a shared language (i.e. RDF), which means software can parse it and make the necessary connections for search engines to deliver rich results to complex questions. Not only can a researcher explore related material within a single archival collection, they can broaden this out to multiple collections. And what if it is then possible to bring in related content from Libraries, Galleries and Museums as well, all the time filtering out the irrelevant material you don’t want to see? This is the vision of the LODLAM community, which brings me to Part 2 of this presentation. Don’t worry, it’s going to be brief compared to Part 1.
  • 16. PART 2: LODLAM SYDNEY 29-30 JUNE 2015 http://summit2015.lodlam.net/about/ 100 people from around the world meet up for 2 days every year to try and work out how best to make LODLAM work. It’s a heady mix of digital humanists, developers, data wranglers, curators, geeks and people with an interest, such as me! I first attended a LODLAM Conference in San Francisco in 2011, funded by the Internet Archive and the Sloan philanthropic foundation (all you had to do was apply). This year it was held in Sydney and PROV very kindly paid my registration fee of $US100.00. It’s run according to the unconference format “where attendees propose sessions on the first day, start those sessions, then on the 2nd day the same thing happens with a degree of socialising, tweeting etc. around the discussions. There are no keynote speakers. The meeting is based on the two primary principles of passion and responsibility: passion to jump in and play an active role; and responsibility to lead, and follow through with action. No papers will be submitted or read, no plenaries given, and everyone will participate.” (https://en.wikipedia.org/wiki/Open_Space_Technology)
  • 17. So what did I do? I tried to get to as many sessions as possible, but it was hard because there was so much on offer! https://docs.google.com/spreadsheets/d/19mfLBoztvaaaik20-P2syANn2fjURzokE-xILLMOVQ0/pubhtml I particularly enjoyed • A pre-conference presentation by Rachel Frick, Digital Public Library of America g:provaccess managementprojectslodlamfricksydney.pdf • So you've got a collection API, now what? merged with How to add LOD publication functions to existing collection management systems. Lightweight, plug-in approaches • LODlive graph browser. Diego Valerio Camarda • archive.schema.org. Richard Wallis I’ll try and give you a brief overview of what I learned:
  • 18. What is the DPLA? “The Digital Public Library of America (DPLA) is an all-digital library that aggregates metadata (or information describing an item) and thumbnails for millions of photographs, manuscripts, books, sounds, moving images, and more from libraries, archives, and museums around the United States. DPLA brings together the riches of America’s libraries, archives, and museums, and makes them freely available to the world.” It is very much about creating a portal for developers to use the metadata to build tools: http://dp.la/info/developers/ The DPLA uses a number of hubs that reach out to content partners. These hubs facilitate content migration, providing guidance and support around any content, rights and technical issues that might appear. The DPLA provides a beautiful segue into the role that APIs play in exposing collection metadata to the world and allowing others to use it to build tools useful to the collecting organisation and the researcher community.
  • 19. What is an API? Basically a way into an organisation’s metadata via a programmatic interface. If you want a really great definition, check http://data.records.nsw.gov.au/?p=248 PROV has 2 APIs that I know of, the ANDS API and the PROV wiki API. The first one feeds directly into Research Data Australia and uses the metadata schema RIF-CS. The second one is a little more accessible, e.g. http://www.culturevictoria.com/collection-search/ delivering item-level content using the OpenSearch protocol first developed by Amazon. While this isn’t LOD, both are a step in the right direction towards improving our discoverability.
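For anyone curious what "using the OpenSearch protocol" means in practice: an OpenSearch service advertises a URL template with placeholders such as `{searchTerms}`, and a client fills them in. The sketch below shows that substitution step only. The template URL is a made-up example (not the Culture Victoria or PROV endpoint), and the helper `fill_template` is a name introduced here for illustration.

```python
import re
from urllib.parse import quote


def fill_template(template, **values):
    """Fill {name} and optional {name?} placeholders in an OpenSearch URL template."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{(\w+)(\?)?\}", substitute, template)


# Hypothetical template: a real service publishes its own in a description document.
TEMPLATE = "http://example.org/search?q={searchTerms}&page={startPage?}"

url = fill_template(TEMPLATE, searchTerms="Jerilderie Letter")
print(url)
```

Because the template is published in a standard form, the same small client can search any collection that supports the protocol.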
  • 20. What is mark-up? Schema.org is an initiative launched on 2 June 2011 by Bing, Google and Yahoo! (the operators of the world's largest search engines at the time) to create and support a common set of schemas for structured data mark-up on web pages. At LODLAM in Sydney, Richard Wallis proposed the creation of a working group to develop an extension to Schema.org to encompass mark-up of web pages relating directly to archives. An initial model of this has recently been created and, as I understand it, the NAA will be marking up their pages in the near future. Zoe D’Arcy from the NAA will keep me informed of their experience after doing this. Why bother? Search engines can deliver richer, more relevant results if they can ‘see’ the context behind web pages, e.g. that a mention of Public Record Office Victoria on our website refers to an archive as described by the Schema.org extension the working group is developing, as opposed to a string of characters that could be the name of a rock band or all manner of things!
  • 21. What might the extension look like? This diagram shows the basic relationships between the proposed main archive-specific types and the relevant Schema types in the model.
  • 22. And how might a web page be ‘marked up’?
    @prefix schema: <http://schema.org/> .
    # An Archive (Organization)
    <http://archive.example.com> a schema:Archive ;
        schema:name "The Example Archive" ;
        schema:address "The Old Archive, City Square, Anytown" ;
        schema:email "info@archive.example.com" ;
        schema:owns [
            a schema:OwnershipInfo ;
            schema:ownedFrom "1957" ;
            schema:typeOfGood <http://archive.example.com/boolarchive> ;
            schema:ownershipType schema:HasCustodyOwnership
        ] .
    # An ArchiveCollection
    <http://archive.example.com/boolarchive> a schema:ArchiveCollection ;
        schema:name "The Boolean Papers Collection" ;
        schema:creator "Sir Binary Boolean" ;
        schema:accessAndUse "Public view, in archive location, no image reproductions" ;
        schema:itemLocation <http://archive.example.com> .
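Schema.org mark-up is often embedded in a page as JSON-LD inside a script tag rather than as Turtle. As a rough sketch, the archive description from the example could be generated like this. The `Archive` type is the one proposed in the draft extension, so treat it as an assumption rather than a settled Schema.org term, and `as_script_tag` is a helper name introduced here.

```python
import json

# Same example archive as in the Turtle snippet; "Archive" is the proposed
# (not necessarily final) type from the draft extension.
archive = {
    "@context": "http://schema.org/",
    "@id": "http://archive.example.com",
    "@type": "Archive",
    "name": "The Example Archive",
    "address": "The Old Archive, City Square, Anytown",
    "email": "info@archive.example.com",
}


def as_script_tag(data):
    """Render a dict as a JSON-LD script element for a page's HTML."""
    # Escape "</" so the JSON can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


tag = as_script_tag(archive)
print(tag)
```

The resulting tag is pasted into the page's HTML, where search engines can read it without it changing what human visitors see.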
  • 23. Conclusions: Yes, it’s the end! •We all want to make the archives as discoverable as possible •As long as we’re on the net we might as well be on it well (clumsy I know, but you get the gist) •There are many pieces to the puzzle... APIs, Linked Open Data, non-proprietary software, marking up web pages for search engines, e.g. Schema.org •We have the ability to become highly discoverable right now, at low cost and in a way that is scalable. •What will the Access the Collection of the future look like? •All it will take is the ability to join the dots. Many others around the world have already done this, so we’re not alone. We are lucky to have some brilliant minds with exceptional skills in our own back yard, so let’s use them.
  • 24. 2014 Linked Open Data Cloud http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/#toc2 1015 organisations
  • 25. What do I hope you’ve taken away from this presentation? •The web is moving from a web of documents to a web of data •Making web content machine readable is important for discoverability •We can also use APIs, web mark-up, and LOD to make our web resources more discoverable and reusable •This isn't a fad or a fantasy, it's happening all over the globe right now and we can be a part of it if we want to at low cost in a timely fashion

Editor's notes

  1. http://metadata.prov.vic.gov.au/oai/query?verb=GetRecord&metadataPrefix=rif&identifier=PROV%20VPRS%201189