SlideShare une entreprise Scribd logo
1  sur  15
Data.dcs: Converting Legacy Data into Linked Data Matthew Rowe Organisations, Information and Knowledge Group University of Sheffield
Outline Problem Legacy data contained within the Department of Computer Science Motivation Why produce linked data? Converting Legacy Data into Linked Data: Triplification of Legacy Data Coreference  Resolution Linking into the Web of Linked Data Deployment Conclusions
Problem ,[object Object],People Research groups Publications Legacy data is defined as important information which is stored in proprietary formats  Each member of the DCS maintains his/her own web page Heterogeneous formatting Different presentation of content Devoid of any semantic markup Hard to find information due to its distribution and heterogeneity E.g. cannot query for all publications which two given people have worked on in the past year Must manually search and filter through a lot of information ,[object Object],Rarely updated  Contains duplicates and errors Not machine-readable
Motivation Leveraging legacy data from the DCS in a machine-readable and consistent form would allow related information to be linked together People would be linked with their publications Research groups would be linked to their members Co-authors of papers could be found Linking DCS data into the Web of Linked Data would allow additional information to be provided: Listing conferences which DCS members have attended Provide up-to-date publication listings Via external linked datasets The DCS web site use case is indicative of the makeup of the WWW overall: Organisations provide legacy data which is intended for human consumption HTML documents are provided which lack any explicit machine-readable semantics (Mika et al, 2009) presented crawling logs from 2008-2009 where only 0.6% of web documents contained RDFa and 2% contained the hCardmicroformat
Converting Legacy Data into Linked Data The approach is divided into 3 different stages: Triplification Converting legacy data into RDF triples Coreference Resolution Identifying coreferring entities into the RDF dataset Linked to the Web of Linked Data Querying and discovering links into the Linked Data cloud
Triplification of Legacy Data The DCS publication database provides publication listings as XML However, all publication information is contained within the same <description> element (title, author, year, book title): <description>   <![CDATA[Rowe, M. (2009). Interlinking Distributed Social Graphs.    In <i>Proceedings of Linked Data on the Web Workshop, WWW 2009,    Madrid, Spain. (2009)</i>. Madrid, Madrid, Spain.<br>   <br>Edited by Sarah Duffy on Tue, 08 Dec 2009 09:31:30 +0000.]]> </description> DCS web site provides person information and research group listings in HTML documents Information is devoid of markup and is provided in a heterogeneous formats Extracting person information from HTML documents requires the derivation of context windows Portions of a given HTML document which contain person information (e.g. a person’s name together with their email address and homepage)
Triplification of Legacy Data
Triplification of Legacy Data Context windows are generated by identifying portions of a HTML document which contain a person’s name The structure of the HTML DOM is then used to partition the window such that it contains information about a single person HTML markup provides clues as to the segmentation of legacy data within the document Once a name is identified a set of algorithms moves up the DOM tree until a layout element is discovered (e.g. <div>, <li>, <tr>) Provided with a set of context windows, legacy data is then extracted from the window using Hidden Markov Models (HMMs) Context windows are extracted from HTML documents for DCS members and are provided from the publication listings XML HMMs:  Input: sequence of tokens from the context window Output: most likely state labels for each of the tokens States correspond to information which is to be extracted
Triplification of Legacy Data An RDF dataset is built from the extracted legacy data This provides the sourcedataset from which a linked dataset is built For person information triples are formed as follows: <http://data.dcs.shef.ac.uk/person/12025>  rdf:typefoaf:Person ; foaf:name "Matthew Rowe" . <http://www.dcs.shef.ac.uk/~mrowe/publications.html>  foaf:topic <http://data.dcs.shef.ac.uk/person/12025> For publication information triples are formed as follows: <http://data.dcs.shef.ac.uk/paper/239>  rdf:typebib:Entry ; bib:title "Interlinking Distributed Social Graphs." ; bib:hasYear "2009" ;	 bib:hasBookTitle "Proceedings of Linked Data on the Web Workshop, WWW, Madrid, Spain." ; foaf:maker _:a1 . _:a1 rdf:typefoaf:Person ;  foaf:name "Matthew Rowe"
Coreference Resolution The triplification of legacy data contained within the DCS web sites (from ~12,000 HTML documents)  produced 17,896 instances of foaf:Person and 1,088 instances of bib:Entry Contains many equivalent foaf:Person instances Must also assign people to their publications We create information about each research group manually to relate DCS members with their research groups: <http://data.dcs.shef.ac.uk/group/oak>	 rdf:typefoaf:Group ; foaf:name  "Organisations, Information and Knowledge Group" ; foaf:workplaceHomepage  <http://oak.dcs.shef.ac.uk> Querying the source dataset for all the people to appear on each of the group personnel listings pages provides the members of each group This provides a list of people who are members of the DCS For each person equivalent instances are found by Graph smushing: detecting equivalent instances which have the same email address or homepage Co-occurrence queries: detecting equivalent instances by matching coworkers who appear in the same page People are assigned to their publications using queries containing an abbreviated form of their name Applicable in this context due to the localised nature of the dataset
Coreference Resolution
Linking to the Web of Linked Data Our dataset at this stage in the approach is not linked data All links are internal to the dataset To overcome the burden of researchers updating their publications we query the DBLP linked dataset using a Networked Graph SPARQL query: The query detects authored research papers in DBLP based on coauthorship with co-workers The CONSTRUCT clause creates a triple relating DCS members with their papers in DBLP Using foaf:made When the Data.dcs member is later looked up, a follow-your-nose strategy can be used to get all papers from DBLP Can also look up conferences attended, countries visited, etc CONSTRUCT {   ?qfoaf:made ?paper .   ?pfoaf:made ?paper  } WHERE {   ?group foaf:member ?q .   ?group foaf:member ?p .   ?qfoaf:name ?n .   ?pfoaf:name ?c .    GRAPH <http://www4.wiwiss.fu-berlin.de/dblp/>    {     ?paper dc:creator ?x .     ?xfoaf:name ?n .     ?paper dc:creator ?y .     ?yfoaf:name ?c .   }   FILTER (?p != ?q) } <http://data.dcs.shef.ac.uk/person/Fabio-Ciravegna> foaf:made  <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/icml/IresonCCFKL05> ; foaf:made  <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ijcai/BrewsterCW01>
Deployment Data.dcs is now up and running and can be accessed at the following URL: http://data.dcs.shef.ac.uk (please try it!) The data is deployed using Recipe 1 from “How to Publish Linked Data” http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ Recipe 2 for Slash Namespaces from “Best Practices for Publishing RDF Vocabularies” http://www.w3.org/TR/swbp-vocab-pub/ Viewing <http://data.dcs.shef.ac.uk/group/oak> using OpenLinks’sURIBurner
Conclusions Leveraging legacy data requires information extraction components able to handle heterogeneous formats Hidden Markov Models provide a single solution to this problem, however other methods exist which could be explored Presented methods are applicable to other domains, simply requires a different topology and training   Current methods for Linked DCS Data into the Web of Linked Data are conservative: Bespoke SPARQL queries Future work will include the exploration of machine learning classification techniques to perform URI disambiguation This work is now being used as a blueprint for producing linked data from the entire University of Sheffield Covering different faculties and departments Intended to foster and identify possible collaborations across work areas
Twitter: @mattroweshow Web: http://www.dcs.shef.ac.uk/~mrowe Email: m.rowe@dcs.shef.ac.uk Questions? (Mika et al, 2009) - Peter Mika, Edgar Meij, and Hugo Zaragoza. Investigating the semantic gap through query log analysis. In 8th International Semantic Web Conference (ISWC2009), October 2009.

Contenu connexe

Tendances

Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic WebTomek Pluskiewicz
 
RDFa Introductory Course Session 4/4 When RDFa
RDFa Introductory Course Session 4/4 When RDFaRDFa Introductory Course Session 4/4 When RDFa
RDFa Introductory Course Session 4/4 When RDFaPlatypus
 
Microformats Workshop (2009)
Microformats Workshop  (2009)Microformats Workshop  (2009)
Microformats Workshop (2009)Kelley Howell
 
Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosEUCLID project
 
An introduction to Linked (Open) Data
An introduction to Linked (Open) DataAn introduction to Linked (Open) Data
An introduction to Linked (Open) DataAli Khalili
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapubeswcsummerschool
 
Semantic Web 2.0: Creating Social Semantic Information Spaces
Semantic Web 2.0: Creating Social Semantic Information SpacesSemantic Web 2.0: Creating Social Semantic Information Spaces
Semantic Web 2.0: Creating Social Semantic Information SpacesJohn Breslin
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataMatthew Rowe
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in LibrariesCarl Hess
 
Linked data HHS 2015
Linked data HHS 2015Linked data HHS 2015
Linked data HHS 2015Cason Snow
 
Introduction to Linked Data
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked DataJuan Sequeda
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greaterCristina Sarasua
 
Search Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited LectureSearch Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited LectureChris Bizer
 
The Linked Data Lifecycle
The Linked Data LifecycleThe Linked Data Lifecycle
The Linked Data Lifecyclegeoknow
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)Besnik Fetahu
 
Get on the Linked Data Web!
Get on the Linked Data Web!Get on the Linked Data Web!
Get on the Linked Data Web!Armin Haller
 

Tendances (20)

Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
 
RDFa Introductory Course Session 4/4 When RDFa
RDFa Introductory Course Session 4/4 When RDFaRDFa Introductory Course Session 4/4 When RDFa
RDFa Introductory Course Session 4/4 When RDFa
 
Microformats Workshop (2009)
Microformats Workshop  (2009)Microformats Workshop  (2009)
Microformats Workshop (2009)
 
Linked Data In Action
Linked Data In ActionLinked Data In Action
Linked Data In Action
 
Usage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application ScenariosUsage of Linked Data: Introduction and Application Scenarios
Usage of Linked Data: Introduction and Application Scenarios
 
The Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of LeipzigThe Semantic Data Web, Sören Auer, University of Leipzig
The Semantic Data Web, Sören Auer, University of Leipzig
 
An introduction to Linked (Open) Data
An introduction to Linked (Open) DataAn introduction to Linked (Open) Data
An introduction to Linked (Open) Data
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
Semantic Web 2.0: Creating Social Semantic Information Spaces
Semantic Web 2.0: Creating Social Semantic Information SpacesSemantic Web 2.0: Creating Social Semantic Information Spaces
Semantic Web 2.0: Creating Social Semantic Information Spaces
 
Semantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic DataSemantic Technologies: Representing Semantic Data
Semantic Technologies: Representing Semantic Data
 
Linked Data in Libraries
Linked Data in LibrariesLinked Data in Libraries
Linked Data in Libraries
 
Linked data HHS 2015
Linked data HHS 2015Linked data HHS 2015
Linked data HHS 2015
 
Introduction to Linked Data
Introduction to Linked DataIntroduction to Linked Data
Introduction to Linked Data
 
How links can make your open data even greater
How links can make your open data even greaterHow links can make your open data even greater
How links can make your open data even greater
 
Search Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited LectureSearch Joins with the Web - ICDT2014 Invited Lecture
Search Joins with the Web - ICDT2014 Invited Lecture
 
The Linked Data Lifecycle
The Linked Data LifecycleThe Linked Data Lifecycle
The Linked Data Lifecycle
 
euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)euclid_linkedup WWW tutorial (Besnik Fetahu)
euclid_linkedup WWW tutorial (Besnik Fetahu)
 
Get on the Linked Data Web!
Get on the Linked Data Web!Get on the Linked Data Web!
Get on the Linked Data Web!
 
Jarrar: Linked Data
Jarrar: Linked DataJarrar: Linked Data
Jarrar: Linked Data
 
Library Linked Data and the Future of Bibliographic Control
Library Linked Data and the Future of Bibliographic ControlLibrary Linked Data and the Future of Bibliographic Control
Library Linked Data and the Future of Bibliographic Control
 

En vedette

Information extraction systems aspects and characteristics
Information extraction systems  aspects and characteristicsInformation extraction systems  aspects and characteristics
Information extraction systems aspects and characteristicsGeorge Ang
 
Adaptive information extraction
Adaptive information extractionAdaptive information extraction
Adaptive information extractionunyil96
 
Information Extraction from the Web - In today's web
Information Extraction from the Web - In today's webInformation Extraction from the Web - In today's web
Information Extraction from the Web - In today's webBenjamin Habegger
 
Open Calais
Open CalaisOpen Calais
Open Calaisymark
 
Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013Anna Lisa Gentile
 
Predicting Online Community Churners using Gaussian Sequences
Predicting Online Community Churners using Gaussian SequencesPredicting Online Community Churners using Gaussian Sequences
Predicting Online Community Churners using Gaussian SequencesMatthew Rowe
 
Social Computing Research with Apache Spark
Social Computing Research with Apache SparkSocial Computing Research with Apache Spark
Social Computing Research with Apache SparkMatthew Rowe
 

En vedette (9)

Information extraction systems aspects and characteristics
Information extraction systems  aspects and characteristicsInformation extraction systems  aspects and characteristics
Information extraction systems aspects and characteristics
 
Adaptive information extraction
Adaptive information extractionAdaptive information extraction
Adaptive information extraction
 
Information Extraction from the Web - In today's web
Information Extraction from the Web - In today's webInformation Extraction from the Web - In today's web
Information Extraction from the Web - In today's web
 
Information Extraction
Information ExtractionInformation Extraction
Information Extraction
 
Open Calais
Open CalaisOpen Calais
Open Calais
 
Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013Web Scale Information Extraction tutorial ecml2013
Web Scale Information Extraction tutorial ecml2013
 
Predicting Online Community Churners using Gaussian Sequences
Predicting Online Community Churners using Gaussian SequencesPredicting Online Community Churners using Gaussian Sequences
Predicting Online Community Churners using Gaussian Sequences
 
Social Computing Research with Apache Spark
Social Computing Research with Apache SparkSocial Computing Research with Apache Spark
Social Computing Research with Apache Spark
 
Comparing Ontotext KIM and Apache Stanbol
Comparing Ontotext KIM and Apache StanbolComparing Ontotext KIM and Apache Stanbol
Comparing Ontotext KIM and Apache Stanbol
 

Similaire à Data.dcs: Converting Legacy Data into Linked Data

An Implementation of a New Framework for Automatic Generation of Ontology and...
An Implementation of a New Framework for Automatic Generation of Ontology and...An Implementation of a New Framework for Automatic Generation of Ontology and...
An Implementation of a New Framework for Automatic Generation of Ontology and...IJCSIS Research Publications
 
15. session 15 data binding
15. session 15   data binding15. session 15   data binding
15. session 15 data bindingPhúc Đỗ
 
15. session 15 data binding
15. session 15   data binding15. session 15   data binding
15. session 15 data bindingPhúc Đỗ
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data introvafopoulos
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked datavafopoulos
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Stuart Chalk
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data TutorialSören Auer
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET Journal
 
Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Jane Stevenson
 
Linked Data Overview - AGI Technical SIG
Linked Data Overview - AGI Technical SIGLinked Data Overview - AGI Technical SIG
Linked Data Overview - AGI Technical SIGChris Ewing
 
Search Engines After The Semanatic Web
Search Engines After The Semanatic WebSearch Engines After The Semanatic Web
Search Engines After The Semanatic Websamar_slideshare
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commonsJesse Wang
 
20100614 ISWSA Keynote
20100614 ISWSA Keynote20100614 ISWSA Keynote
20100614 ISWSA KeynoteAxel Polleres
 
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...cscpconf
 
A semantic based approach for information retrieval from html documents using...
A semantic based approach for information retrieval from html documents using...A semantic based approach for information retrieval from html documents using...
A semantic based approach for information retrieval from html documents using...csandit
 
Information Intermediaries
Information IntermediariesInformation Intermediaries
Information IntermediariesDave Reynolds
 
Data Portability with SIOC and FOAF
Data Portability with SIOC and FOAFData Portability with SIOC and FOAF
Data Portability with SIOC and FOAFUldis Bojars
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And VisualizationIvan Ermilov
 

Similaire à Data.dcs: Converting Legacy Data into Linked Data (20)

An Implementation of a New Framework for Automatic Generation of Ontology and...
An Implementation of a New Framework for Automatic Generation of Ontology and...An Implementation of a New Framework for Automatic Generation of Ontology and...
An Implementation of a New Framework for Automatic Generation of Ontology and...
 
15. session 15 data binding
15. session 15   data binding15. session 15   data binding
15. session 15 data binding
 
15. session 15 data binding
15. session 15   data binding15. session 15   data binding
15. session 15 data binding
 
2011 05-02 linked data intro
2011 05-02 linked data intro2011 05-02 linked data intro
2011 05-02 linked data intro
 
2011 05-01 linked data
2011 05-01 linked data2011 05-01 linked data
2011 05-01 linked data
 
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
Rule-based Capture/Storage of Scientific Data from PDF Files and Export using...
 
Linked Data Tutorial
Linked Data TutorialLinked Data Tutorial
Linked Data Tutorial
 
IRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description FrameworkIRJET- Data Retrieval using Master Resource Description Framework
IRJET- Data Retrieval using Master Resource Description Framework
 
Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011 Linked Data and Locah, UKSG2011
Linked Data and Locah, UKSG2011
 
Gt ea2009
Gt ea2009Gt ea2009
Gt ea2009
 
The Social Data Web
The Social Data WebThe Social Data Web
The Social Data Web
 
Linked Data Overview - AGI Technical SIG
Linked Data Overview - AGI Technical SIGLinked Data Overview - AGI Technical SIG
Linked Data Overview - AGI Technical SIG
 
Search Engines After The Semanatic Web
Search Engines After The Semanatic WebSearch Engines After The Semanatic Web
Search Engines After The Semanatic Web
 
The Web of data and web data commons
The Web of data and web data commonsThe Web of data and web data commons
The Web of data and web data commons
 
20100614 ISWSA Keynote
20100614 ISWSA Keynote20100614 ISWSA Keynote
20100614 ISWSA Keynote
 
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
A SEMANTIC BASED APPROACH FOR INFORMATION RETRIEVAL FROM HTML DOCUMENTS USING...
 
A semantic based approach for information retrieval from html documents using...
A semantic based approach for information retrieval from html documents using...A semantic based approach for information retrieval from html documents using...
A semantic based approach for information retrieval from html documents using...
 
Information Intermediaries
Information IntermediariesInformation Intermediaries
Information Intermediaries
 
Data Portability with SIOC and FOAF
Data Portability with SIOC and FOAFData Portability with SIOC and FOAF
Data Portability with SIOC and FOAF
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
 

Plus de Matthew Rowe

Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...Matthew Rowe
 
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting RatingsSemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings Matthew Rowe
 
The Semantic Evolution of Online Communities
The Semantic Evolution of Online CommunitiesThe Semantic Evolution of Online Communities
The Semantic Evolution of Online CommunitiesMatthew Rowe
 
From Mining to Understanding: The Evolution of Social Web Users
From Mining to Understanding: The Evolution of Social Web UsersFrom Mining to Understanding: The Evolution of Social Web Users
From Mining to Understanding: The Evolution of Social Web UsersMatthew Rowe
 
Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Matthew Rowe
 
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...Matthew Rowe
 
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...Matthew Rowe
 
Identity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureIdentity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureMatthew Rowe
 
Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMatthew Rowe
 
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Matthew Rowe
 
Attention Economics in Social Web Systems
Attention Economics in Social Web SystemsAttention Economics in Social Web Systems
Attention Economics in Social Web SystemsMatthew Rowe
 
What makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsWhat makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsMatthew Rowe
 
Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research AgendaMatthew Rowe
 
Tutorial: Social Semantics
Tutorial: Social SemanticsTutorial: Social Semantics
Tutorial: Social SemanticsMatthew Rowe
 
Modelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesModelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesMatthew Rowe
 
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsUsing Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsMatthew Rowe
 
Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsMatthew Rowe
 
Forecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeForecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeMatthew Rowe
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebMatthew Rowe
 
PhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataPhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataMatthew Rowe
 

Plus de Matthew Rowe (20)

Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
Transferring Semantic Categories with Vertex Kernels: Recommendations with Se...
 
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting RatingsSemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
SemanticSVD++: Incorporating Semantic Taste Evolution for Predicting Ratings
 
The Semantic Evolution of Online Communities
The Semantic Evolution of Online CommunitiesThe Semantic Evolution of Online Communities
The Semantic Evolution of Online Communities
 
From Mining to Understanding: The Evolution of Social Web Users
From Mining to Understanding: The Evolution of Social Web UsersFrom Mining to Understanding: The Evolution of Social Web Users
From Mining to Understanding: The Evolution of Social Web Users
 
Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...Mining User Lifecycles from Online Community Platforms and their Application ...
Mining User Lifecycles from Online Community Platforms and their Application ...
 
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
From User Needs to Community Health: Mining User Behaviour to Analyse Online ...
 
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
Changing with Time: Modelling and Detecting User Lifecycle Periods in Online ...
 
Identity: Physical, Cyber, Future
Identity: Physical, Cyber, FutureIdentity: Physical, Cyber, Future
Identity: Physical, Cyber, Future
 
Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online Communities
 
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-I...
 
Attention Economics in Social Web Systems
Attention Economics in Social Web SystemsAttention Economics in Social Web Systems
Attention Economics in Social Web Systems
 
What makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositionsWhat makes communities tick? Community health analysis using role compositions
What makes communities tick? Community health analysis using role compositions
 
Existing Research and Future Research Agenda
Existing Research and Future Research AgendaExisting Research and Future Research Agenda
Existing Research and Future Research Agenda
 
Tutorial: Social Semantics
Tutorial: Social SemanticsTutorial: Social Semantics
Tutorial: Social Semantics
 
Modelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online CommunitiesModelling and Analysis of User Behaviour in Online Communities
Modelling and Analysis of User Behaviour in Online Communities
 
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web SystemsUsing Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
Using Behaviour Analysis to Detect Cultural Aspects in Social Web Systems
 
Anticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community ForumsAnticipating Discussion Activity on Community Forums
Anticipating Discussion Activity on Community Forums
 
Forecasting Audience Increase on Youtube
Forecasting Audience Increase on YoutubeForecasting Audience Increase on Youtube
Forecasting Audience Increase on Youtube
 
Predicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic WebPredicting Discussions on the Social Semantic Web
Predicting Discussions on the Social Semantic Web
 
PhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social DataPhD Viva - Disambiguating Identity Web References using Social Data
PhD Viva - Disambiguating Identity Web References using Social Data
 

Dernier

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdfssuserdda66b
 

Dernier (20)

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 

Data.dcs: Converting Legacy Data into Linked Data

  • 1. Data.dcs: Converting Legacy Data into Linked Data Matthew Rowe Organisations, Information and Knowledge Group University of Sheffield
  • 2. Outline Problem Legacy data contained within the Department of Computer Science Motivation Why produce linked data? Converting Legacy Data into Linked Data: Triplification of Legacy Data Coreference Resolution Linking into the Web of Linked Data Deployment Conclusions
  • 3.
  • 4. Motivation Leveraging legacy data from the DCS in a machine-readable and consistent form would allow related information to be linked together People would be linked with their publications Research groups would be linked to their members Co-authors of papers could be found Linking DCS data into the Web of Linked Data would allow additional information to be provided: Listing conferences which DCS members have attended Provide up-to-date publication listings Via external linked datasets The DCS web site use case is indicative of the makeup of the WWW overall: Organisations provide legacy data which is intended for human consumption HTML documents are provided which lack any explicit machine-readable semantics (Mika et al, 2009) presented crawling logs from 2008-2009 where only 0.6% of web documents contained RDFa and 2% contained the hCardmicroformat
  • 5. Converting Legacy Data into Linked Data The approach is divided into 3 different stages: Triplification Converting legacy data into RDF triples Coreference Resolution Identifying coreferring entities into the RDF dataset Linked to the Web of Linked Data Querying and discovering links into the Linked Data cloud
  • 6. Triplification of Legacy Data The DCS publication database provides publication listings as XML However, all publication information is contained within the same <description> element (title, author, year, book title): <description> <![CDATA[Rowe, M. (2009). Interlinking Distributed Social Graphs. In <i>Proceedings of Linked Data on the Web Workshop, WWW 2009, Madrid, Spain. (2009)</i>. Madrid, Madrid, Spain.<br> <br>Edited by Sarah Duffy on Tue, 08 Dec 2009 09:31:30 +0000.]]> </description> DCS web site provides person information and research group listings in HTML documents Information is devoid of markup and is provided in a heterogeneous formats Extracting person information from HTML documents requires the derivation of context windows Portions of a given HTML document which contain person information (e.g. a person’s name together with their email address and homepage)
  • 8. Triplification of Legacy Data Context windows are generated by identifying portions of a HTML document which contain a person’s name The structure of the HTML DOM is then used to partition the window such that it contains information about a single person HTML markup provides clues as to the segmentation of legacy data within the document Once a name is identified a set of algorithms moves up the DOM tree until a layout element is discovered (e.g. <div>, <li>, <tr>) Provided with a set of context windows, legacy data is then extracted from the window using Hidden Markov Models (HMMs) Context windows are extracted from HTML documents for DCS members and are provided from the publication listings XML HMMs: Input: sequence of tokens from the context window Output: most likely state labels for each of the tokens States correspond to information which is to be extracted
  • 9. Triplification of Legacy Data An RDF dataset is built from the extracted legacy data This provides the sourcedataset from which a linked dataset is built For person information triples are formed as follows: <http://data.dcs.shef.ac.uk/person/12025> rdf:typefoaf:Person ; foaf:name "Matthew Rowe" . <http://www.dcs.shef.ac.uk/~mrowe/publications.html> foaf:topic <http://data.dcs.shef.ac.uk/person/12025> For publication information triples are formed as follows: <http://data.dcs.shef.ac.uk/paper/239> rdf:typebib:Entry ; bib:title "Interlinking Distributed Social Graphs." ; bib:hasYear "2009" ; bib:hasBookTitle "Proceedings of Linked Data on the Web Workshop, WWW, Madrid, Spain." ; foaf:maker _:a1 . _:a1 rdf:typefoaf:Person ; foaf:name "Matthew Rowe"
  • 10. Coreference Resolution The triplification of legacy data contained within the DCS web sites (from ~12,000 HTML documents) produced 17,896 instances of foaf:Person and 1,088 instances of bib:Entry Contains many equivalent foaf:Person instances Must also assign people to their publications We create information about each research group manually to relate DCS members with their research groups: <http://data.dcs.shef.ac.uk/group/oak> rdf:typefoaf:Group ; foaf:name "Organisations, Information and Knowledge Group" ; foaf:workplaceHomepage <http://oak.dcs.shef.ac.uk> Querying the source dataset for all the people to appear on each of the group personnel listings pages provides the members of each group This provides a list of people who are members of the DCS For each person equivalent instances are found by Graph smushing: detecting equivalent instances which have the same email address or homepage Co-occurrence queries: detecting equivalent instances by matching coworkers who appear in the same page People are assigned to their publications using queries containing an abbreviated form of their name Applicable in this context due to the localised nature of the dataset
  • 12. Linking to the Web of Linked Data Our dataset at this stage in the approach is not linked data All links are internal to the dataset To overcome the burden of researchers updating their publications we query the DBLP linked dataset using a Networked Graph SPARQL query: The query detects authored research papers in DBLP based on coauthorship with co-workers The CONSTRUCT clause creates a triple relating DCS members with their papers in DBLP Using foaf:made When the Data.dcs member is later looked up, a follow-your-nose strategy can be used to get all papers from DBLP Can also look up conferences attended, countries visited, etc CONSTRUCT { ?qfoaf:made ?paper . ?pfoaf:made ?paper } WHERE { ?group foaf:member ?q . ?group foaf:member ?p . ?qfoaf:name ?n . ?pfoaf:name ?c . GRAPH <http://www4.wiwiss.fu-berlin.de/dblp/> { ?paper dc:creator ?x . ?xfoaf:name ?n . ?paper dc:creator ?y . ?yfoaf:name ?c . } FILTER (?p != ?q) } <http://data.dcs.shef.ac.uk/person/Fabio-Ciravegna> foaf:made <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/icml/IresonCCFKL05> ; foaf:made <http://www4.wiwiss.fu-berlin.de/dblp/resource/record/conf/ijcai/BrewsterCW01>
  • 13. Deployment Data.dcs is now up and running and can be accessed at the following URL: http://data.dcs.shef.ac.uk (please try it!) The data is deployed using Recipe 1 from “How to Publish Linked Data” http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ Recipe 2 for Slash Namespaces from “Best Practices for Publishing RDF Vocabularies” http://www.w3.org/TR/swbp-vocab-pub/ Viewing <http://data.dcs.shef.ac.uk/group/oak> using OpenLinks’sURIBurner
  • 14. Conclusions Leveraging legacy data requires information extraction components able to handle heterogeneous formats Hidden Markov Models provide a single solution to this problem, however other methods exist which could be explored Presented methods are applicable to other domains, simply requires a different topology and training Current methods for Linked DCS Data into the Web of Linked Data are conservative: Bespoke SPARQL queries Future work will include the exploration of machine learning classification techniques to perform URI disambiguation This work is now being used as a blueprint for producing linked data from the entire University of Sheffield Covering different faculties and departments Intended to foster and identify possible collaborations across work areas
  • 15. Twitter: @mattroweshow Web: http://www.dcs.shef.ac.uk/~mrowe Email: m.rowe@dcs.shef.ac.uk Questions? (Mika et al, 2009) - Peter Mika, Edgar Meij, and Hugo Zaragoza. Investigating the semantic gap through query log analysis. In 8th International Semantic Web Conference (ISWC2009), October 2009.