Linked Open Data_mlanet13
1. An Introduction to the Semantic
Web and Linked Open Data
Kristi L. Holmes, PhD
Twitter: @kristiholmes
Layne Mark Johnson, PhD
@LayneJohnson
The day after May the Fourth, 2013
5. “a web of data that can be
processed directly and
indirectly by machines”
-Tim Berners-Lee
6. At its heart, the Semantic
Web is really about
extending standard Web
technologies to better deal
with data on the Web.
If the WWW is for people, the Semantic Web is
for machines
George Thomas and Jim Hendler, http://www.data.gov/communities/node/116/blogs/142
Data modeled as bidirectional relationships
Semantic Web Value
Proposition…
Web-based infrastructure of standards and technologies which
allows for a distributable, machine readable description of data
that allows for stronger data and smart web application linkages
7. How the Semantic Web works
Anakin Skywalker is Luke Skywalker's father.
8. How the Semantic Web works
XML and RDF are at the heart of the Semantic Web.
They give computers a structure in which to look for
information and define relationships between resources.
http://computer.howstuffworks.com/semantic-web
9. An ontology is simply a vocabulary that describes
objects and how they relate to one another. A schema
is a method for organizing information
http://computer.howstuffworks.com/semantic-web
11. Semantic web: describes methods and
technologies to allow machines to
understand the meaning or "semantics"
of information on the web.
-- W3C director Sir Tim Berners-Lee
Ontology: a formal representation of the
knowledge by a set of concepts within a
domain and the relationships between
those concepts.
-- Wikipedia
12. Let’s talk about the data…
The Semantic Web isn't just about
putting data on the web. It is about
making links, so that a person or
machine can explore the web of
data. With linked data, when you have
some of it, you can find other related
data.
http://computer.howstuffworks.com/semantic-web
13. The 5 Stars of Linked Open Data
★
★★
★★★
★★★★
★★★★★
http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/
http://www.w3.org/DesignIssues/LinkedData.html
AVAILABILITY
& VALUE
14. Is your data 5-star??
The 5 Stars of Linked Open Data
http://5stardata.info
17. Models and standards that allow for greater
data exchange (and flexibility!)
It takes layers and layers of
metadata, logic and security to
make the Web machine-
readable.
http://computer.howstuffworks.com/semantic-web
18. Building a web of data
http://geology.com/articles/night-satellite/satellite-photo-of-europe-at-night-lg.jpg
Data Creators, Data Aggregators,
& Data Consumers
Repositories. Tools. Applications.
Workflows
20. Ok! Now let’s dig into a few good
examples of how we can put
these things to work
21. Linked Open Data and
Biomedical Research: A Survey
of Current International Efforts
Kristi L. Holmes, PhD
Twitter: @kristiholmes
Layne Mark Johnson, PhD
@LayneJohnson
May 5, 2013
24. Research Networking
Information about scholars is optimized using a Web-based
infrastructure of standards and technologies which allows for a
distributable, machine readable description of data that allows for
stronger data and smart web application linkages across many
universities, agencies, societies both within the US and abroad.
Why is this important?
Linked data infrastructure allows for
• Visualizations, research and clinical data integration, and deep
semantic searching across multiple types and sources of data
• By breaking data out of traditional database silos, research
networking platforms promote a network effect within a single
site and across multiple sites
– The value of the network increases with the amount of linked data
and applications that are available to consume the linked data.
25. The Semantic Web
& Researcher Networking
• Increasing recognition of the value of semantic web standards
• Increasing momentum in support of semantic web
technologies to facilitate research discovery
• Recommendations for researcher networking recently
endorsed by the CTSA Consortium Steering Committee
represent a new standard in researcher networking.
• Examples of applications that consume these rich data
include: visualizations, enhanced multi-site search. Other
utilities are in development across a wide range of topic
areas.
26. Recommendations and Best Practices
for Research Networking
The Research Networking Recommendations were approved by the CTSA Consortium Executive and
Steering Committee on October 25, 2011.
Recommendations for Research Networking:
• Recommendation: All CTSAs should encourage their institution(s) to implement
research networking tool(s) institution-wide that utilize RDF triples and an ontology
compatible with the VIVO ontology.
• Recommendation: Information in people profiles at institutions should be publicly
available as data as a general principle, specifically as Linked Open Data. To ensure
quality of information, authoritative electronic data sources versus manual entry
should be emphasized. Institutions will vary in the amount of information that they
will include and make publicly available but the value is enhanced by the quality and
quantity of information.
• Recommendation: Monitoring of the research networking landscape, technology,
and tools should continue to be overseen by experts from the CTSA consortium (e.g.,
the Research Networking group of the Informatics KFC).
https://www.ctsacentral.org/recommendations-and-best-practices-research-networking
27. Research Networking Systems
• VIVO, Profiles, SciVal Experts, Stanford’s
CAP, Iowa’s Loki
• Encourage your RN provider to meet the
recommendations for Researcher
networking
– Better visibility
– Enhanced utility
29. VIVO
This work is funded by the National Institutes of Health, U24
VIVO enjoys a robust open source, open
community space to support implementation,
adoption, and development efforts around the
world. See http://vivo.sourceforge.net
30. www.ctsaconnect.org CTSAconnect
Reveal Connections. Realize Potential.
CTSAConnect Project
Goals:
– Identify potential collaborators, relevant resources, and
expertise across scientific disciplines
– Assemble translational teams of scientists to address
specific research questions
Approach:
Create a semantic representation of clinician and basic
science researcher expertise to enable
– Broad and computable representation of translational
expertise
– Publication of expertise as Linked Data (LD) for use in
other applications
31. www.ctsaconnect.org CTSAconnect
Reveal Connections. Realize Potential.
Merging VIVO and eagle-i
eagle-i is an ontology-driven application for collecting and
searching research resources.
VIVO is an ontology-driven application for collecting and
displaying information about people.
Both publish Linked Data. Neither addresses clinical
expertise.
CTSAconnect will produce a single Integrated Semantic
Framework, a modular collection of ontologies that also
includes clinical expertise.
[Diagram labels: eagle-i (Resources); VIVO (People); Coordination; Semantic; Clinical activities]
32. OpenPHACTS
Open PHACTS Project
• To reduce the barriers to drug discovery in industry,
academia and for small businesses, the Open PHACTS
consortium is building the Open PHACTS Discovery
Platform. This will be freely available, integrating
pharmacological data from a variety of information
resources and providing tools and services to question this
integrated data to support pharmacological research.
Guiding principle is open access, open usage, open source
- Key to standards adoption -
http://www.openphacts.org/
33. OpenPHACTS
Open PHACTS Project
• Develop a set of robust standards…
• Implement the standards in a semantic integration hub
• Deliver services to support drug discovery programs in pharma
and public domain
• 22 partners, 8 pharmaceutical companies, 3 biotechs
• 36-month project, through March 2014
37. Search
• VIVOsearch and CTSAsearch
• VIVOsearchlight
• AgriVIVO – FAO of the UN
• Search across
– Land Grant institutions
– CTSA Consortium Schools
– State university systems; Big 10, Big 12, etc.
42. Are you using Linked Open Data?
What are your hopes for this
collection of technologies?
How can you get involved?
43. Open data, open tools, open process
Thank you!
Acknowledgements:
• Carlo Torniai & Melissa Haendel – OHSU
• Tony Williams – OpenPHACTS, RSC
• CTSA Research Networking Affinity workgroup
• VIVO Project
Editor's notes
My favorite tool…
Here we're talking about a different kind of machine: computers.
Describe typical web page, limitations of HTML
Connecting documents, not concepts; not easy to traverse across disparate data sources
From DERI (http://www.deri.ie/about/press/coverage/details/?uid=194&ref=214):
The semantic web is a term coined by World Wide Web inventor and DERI advisory board member Tim Berners-Lee to describe the “web of data” that enables machines to understand the semantics, or meaning, of information on the web.
It involves the insertion of machine-readable metadata into web pages to give information on how they are related to each other, enabling automated agents to access the web more intelligently and perform tasks on behalf of users.
Berners-Lee has defined the semantic web as “a web of data that can be processed directly and indirectly by machines”.
Anakin Skywalker is Luke Skywalker's father.
It's easy for you to figure out what this sentence means -- Anakin and Luke Skywalker are both people, and there is a relationship between them.
You know that a father is a type of parent, and that the sentence also means that Luke is Anakin's son.
But a computer can't figure any of that out without help. To allow a computer to understand what this sentence means, you'd need to add machine-readable information that describes who Anakin and Luke are and what their relationship is.
This starts with two tools -- eXtensible Markup Language (XML) and Resource Description Framework (RDF).
XML is a markup language. It complements HTML by adding tags that describe data. These tags are invisible to the people who read the document but visible to computers.
RDF does exactly what its name indicates -- using XML tags, it provides a framework to describe resources. In RDF terms, pretty much everything in the world is a resource.
To do this, RDF uses triples written as XML tags to express this information as a graph. These triples consist of a subject, property and object, which are like the subject, verb and direct object of a sentence. (Some sources call these the subject, predicate and object.)
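The subject, property and object structure described above can be sketched in a few lines of Python. This is only an illustration, not how any particular RDF library works: the `Triple` type, the `to_ntriples` helper and the `http://example.org/starwars/` URIs are all assumptions made up for this example. It prints the triple in N-Triples, a plain-text RDF serialization, rather than in XML tags.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, property (predicate), object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace; these URIs are invented for illustration only.
EX = "http://example.org/starwars/"

# "Anakin Skywalker is Luke Skywalker's father."
anakin_is_father = Triple(
    subject=EX + "AnakinSkywalker",
    predicate=EX + "fatherOf",
    obj=EX + "LukeSkywalker",
)


def to_ntriples(t: Triple) -> str:
    """Serialize a triple whose three parts are all URIs as one N-Triples line."""
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


print(to_ntriples(anakin_is_father))
```

Because each part of the triple is a URI, another site can refer to the same person or the same property simply by reusing that address.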
So far in this example, the computer knows that there are two objects in this sentence and that there is a relationship between them. But it doesn't know what the objects are or how they relate to one another.
Another obstacle is that computers don't have the kind of vocabulary that people do.
It is difficult for a computer to know the connections between different words and concepts and to infer meanings based on context.
In order to understand what words mean and what the relationships between words are, the computer has to have documents that describe all the words and logic to make the necessary connections.
In the Semantic Web, this comes from schemata and ontologies.
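A rough sketch of what an ontology adds, assuming a tiny hand-written vocabulary: the property names (`fatherOf`, `parentOf`, `childOf`) and the two rule tables are hypothetical, and a real system would express them in a language such as OWL rather than in Python dictionaries. Given only the original statement, the rules let the program derive that Anakin is Luke's parent and that Luke is Anakin's child. Concluding "son" would need a further fact about Luke that is not modeled here.

```python
# Hypothetical mini-ontology; the names are illustrative, not a real vocabulary.
SUBPROPERTY_OF = {"fatherOf": "parentOf"}   # a father is a type of parent
INVERSE_OF = {"parentOf": "childOf", "childOf": "parentOf"}


def infer(triples):
    """Apply the sub-property and inverse rules until nothing new appears."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            if p in SUBPROPERTY_OF:
                new.add((s, SUBPROPERTY_OF[p], o))
            if p in INVERSE_OF:
                new.add((o, INVERSE_OF[p], s))
        if new <= known:          # no new facts were derived
            return known
        known |= new


facts = {("Anakin", "fatherOf", "Luke")}
for triple in sorted(infer(facts)):
    print(triple)
```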
From DERI (http://www.deri.ie/about/press/coverage/details/?uid=194&ref=214):
The Semantic Web involves publishing in languages specifically designed for data: Resource Description Framework (RDF), Web Ontology Language (OWL), and Extensible Markup Language (XML).
HTML describes documents and the links between them. RDF, OWL, and XML, by contrast, can describe arbitrary things such as people, meetings, or airplane parts.
Semantic/ontology definitions (below RDF), Elly in RDF example for visual, point out links.
content from the VIVO team – http://vivoweb.org
Go to http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/ (linked on arrow) and view data levels and examples – emphasize costs and benefits
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to identify things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
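One way to make the star levels concrete is a small checklist function. This is a sketch under the assumption that the stars are cumulative (each level presupposes the ones before it); the `Dataset` class and its field names are invented for illustration and are not part of the 5-star scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset."""
    on_web_open_license: bool      # 1 star
    structured: bool               # 2 stars
    non_proprietary_format: bool   # 3 stars
    uses_uris: bool                # 4 stars
    links_to_other_data: bool      # 5 stars


def star_rating(d: Dataset) -> int:
    """Count criteria met in order, stopping at the first one that fails."""
    criteria = [
        d.on_web_open_license,
        d.structured,
        d.non_proprietary_format,
        d.uses_uris,
        d.links_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


excel_sheet = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)
print(star_rating(excel_sheet), star_rating(csv_file))
```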
What kind of things are available as linked data?
It takes layers and layers of metadata, logic and security to make the Web machine-readable.
Most visual representations of these layers involve a stack -- sort of a tower of blocks that represent all the layers.
The stack changes and evolves as the concepts behind the Semantic Web develop.
We want to use linked open data concepts to provide data as RDF at URIs. This is critically important for building a web of data.
Predicates have addresses, and sites point to objects in other triple stores.
Resolve queries across triple stores, for example “show investigators whose genetic work is implicated in breast cancer.” VIVO won’t have information linkages between breast cancer and disease. Other resources will. But VIVO can link to external sources. “Mike worksOn GeneY”
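The cross-store query can be sketched with two in-memory sets of triples standing in for real triple stores. Only "Mike worksOn GeneY" comes from the note above; the second investigator, the `implicatedIn` property and the contents of the external store are assumptions invented for this example, and a real deployment would more likely use SPARQL against remote endpoints.

```python
# Stand-in for a VIVO triple store: who works on what.
vivo_store = {
    ("Mike", "worksOn", "GeneY"),
    ("Ann", "worksOn", "GeneZ"),        # hypothetical second investigator
}

# Stand-in for an external resource that knows about genes and diseases.
# The "implicatedIn" property is hypothetical.
external_store = {
    ("GeneY", "implicatedIn", "BreastCancer"),
}


def investigators_for_disease(disease, people_store, gene_store):
    """Join the two stores on the shared gene identifier."""
    genes = {s for s, p, o in gene_store
             if p == "implicatedIn" and o == disease}
    return sorted(s for s, p, o in people_store
                  if p == "worksOn" and o in genes)


print(investigators_for_disease("BreastCancer", vivo_store, external_store))
```

The join only works because both stores use the same identifier for the gene, which is the practical reason for giving things URIs and linking to external sources.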
Archives. Data Aggregators. Publishers. Institutional repositories.
So now we turn to tools
This is a simplified version of the ecosystem of information we are creating. Additional elements not depicted are concepts and events.
VIVO enables collaboration and understanding across an institution and among institutions
VIVO harvests much of its data automatically from verified sources so it is accurate and current, reducing the need for manual input.
The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiencies across the institution.
Data is housed and maintained at the local institutions. There it can be updated on a regular basis.
Search results are faceted so information can be located rapidly and with less time spent sorting through information.
Profiles are largely created via automated data feeds, but can be customized to suit the needs of the individual.
Profiles are richer in content than typical [web pages or] social networking sites and will rank higher in general internet searches.
Across institutions, VIVO provides a uniform semantic structure that enables a new class of tools (visualizations, search, discovery, etc.) to use the data to advance science.
Each institution provides its own VIVO system and data. Local governance determines data to be provided.
VIVO structures data in RDF triples using the VIVO ontology. Moreover, the recommendations state that, as a general principle, profile data should be publicly available as Linked Open Data. This announcement demonstrates the CTSA Consortium’s recognition of the value of semantic web standards and the increasing momentum in support of semantic web technologies to facilitate research discovery. Examples of applications that consume these rich data include visualizations (Katy’s viz URL), enhanced multi-site search (VIVO search URL), and VIVO Searchlight (searchlight URL). Other utilities are in development across a wide range of functionalities.
Strong open source development component to the project – this is reflected in part by the top notch applications that were submitted to a recent call for applications by the project
Data are reused and repurposed in a wide array of tools and settings.
Cornell University has done a stellar job of this – using VIVO data to provide current information about faculty and their interests for department and college websites; University of Florida reuses data from their VIVO for their CTSI member database – a move that other institutions are making, as well.