BibBase Linked Data Triplification Challenge 2010 Presentation


  1. BibBase Triplified (http://data.bibbase.org/)
     - Presented by: Reynold S. Xin, UC Berkeley
     - Joint work with: Oktie Hassanzadeh, Yang Yang, Jiang Du, Minghua Zhao, Renee J. Miller, University of Toronto; Christian Fritz, University of Southern California
  2. Outline
     - Goals and Status
     - Duplicate detection
     - Interlinking of data sources
     - Additional features
     - Conclusions and future work
  3. Goals (http://www.bibbase.org)
     - Makes it easy for scientists to maintain publications pages
     - Scientists maintain a bibtex file; BibBase does the rest
     - Publishes them in HTML
  4. Goals (http://data.bibbase.org)
     - Makes it easy for scientists to maintain publications pages
     - Scientists maintain a bibtex file; BibBase does the rest
     - Publishes them in HTML
     - Publishes them in RDF
     - Links entries to the open linked data cloud
     - With incentive, scientists are helping us build a bibliographic database (think DBLP but automated)
     - Invaluable data set for benchmarking duplicate detection and semantic link discovery systems
  5. Some statistics
     - “Beta” went online in June 2010
     - As of yesterday (September 1, 2010)
       - ~100 active users
       - 4520 publications, 4883 authors, 502 journals, 1881 proceedings, 88 keywords
       - 39201 author links, 2768 publication links, 30 keyword links
     - Note that this is before we do any form of “marketing”
  6. Duplicate Detection
     - Examples
       - Authors: “Renee J. Miller” or “R. J. Miller” or “RJ Miller”
       - Publication entries
       - Journal & conferences: “VLDB” or “Very Large Data Base”
     - Solutions
       - Local detection (within a single bibtex file)
       - Global detection (across multiple files)
  7. Local Detection
     - A set of predefined rules to identify duplicates.
       - E.g. within a single file, it is highly likely that “Renee J Miller” is the same as “RJ Miller”.
     - Users can specify a suffix to the name to differentiate them (DBLP approach).
       - E.g. “Min Wang” vs “Min Wang2”
  8. Global Detection
     - Duplicate detection, also known as entity resolution, record linkage, or reference reconciliation, is a well-studied problem and an active research area. [Tutorial-VLDB’05, Tutorial-SIGMOD’06]
     - We use existing declarative techniques [D.App.σ-SIGMOD’07] to detect duplicates across multiple files.
     - Display disambiguation page on HTML interface and rdfs:seeAlso attribute on RDF interface.
     - Also enables user to provide feedback by @string{vldb = Very Large Data Base}
  9. Interlinking of Data Sources
     - Leverages both offline dictionaries and online real-time URL verifications.
     - Some external data sources
       - DBLP
       - DBpedia
       - RKBExplorer
       - Semantic Web Dogfood
       - LOD foaf
  10. Additional Features
      - Storage and publication of provenance information
      - Dynamic grouping of entities (by year, keyword, etc.)
      - RSS feed for notification
      - DBLP scraper to generate bibtex files from DBLP records
      - Statistics on usage
      - Enhancement to existing MIT bibtex ontology file
  11. Conclusion and Future Work
      - BibBase
        - Light-weight publication of bibliographic data
        - Semantic web technologies as a result of complex triplification performed inside the system
        - Invaluable data set
      - Future Work
        - More comprehensive duplicate detection
        - Links to more external data sources
        - Better engineering and service level agreement (99.99%?)
        - Broader user base
  12. Questions?
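
The local detection rule from slide 7 (within one file, “Renee J Miller” is very likely the same author as “RJ Miller”, while a numeric suffix such as “Min Wang2” keeps two people apart) could be sketched roughly as below. This is not BibBase's actual rule set, which the slides do not spell out. The function names `name_key` and `group_local_duplicates` are hypothetical, and the sketch assumes names are written in "First Middle Last" order.

```python
from collections import defaultdict


def name_key(name):
    """Return (last_name, initials) for a 'First Middle Last' author string.

    Assumption: a short all-uppercase token such as "RJ" is a run of initials.
    """
    tokens = name.replace(".", " ").split()
    if not tokens:
        raise ValueError("empty author name")
    # A trailing digit ("Wang2") stays in the key, so the user-chosen
    # suffix keeps two different people apart.
    last = tokens[-1].lower()
    initials = ""
    for tok in tokens[:-1]:
        if tok.isupper() and len(tok) <= 3:
            initials += tok.lower()      # "RJ" -> "rj"
        else:
            initials += tok[0].lower()   # "Renee" -> "r"
    return last, initials


def group_local_duplicates(names):
    """Group the author strings of a single file that share the same key."""
    groups = defaultdict(list)
    for n in names:
        groups[name_key(n)].append(n)
    return list(groups.values())


if __name__ == "__main__":
    authors = ["Renee J. Miller", "R. J. Miller", "RJ Miller", "Min Wang", "Min Wang2"]
    for group in group_local_duplicates(authors):
        print(group)
```

A key built from the last name plus initials is deliberately aggressive. That is reasonable inside a single bibtex file, where name collisions are rare, but it would over-merge across files, which is why the slides treat global detection separately.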

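On the RDF side, slide 8 says that candidate duplicates found across files are exposed through `rdfs:seeAlso`. A minimal sketch of what emitting such links as N-Triples might look like follows. The function name `see_also_triples` and the `example.org` URIs are placeholders, not BibBase's real URI scheme, and linking every pair in both directions is an assumption made here for illustration.

```python
from itertools import permutations

# Standard RDF Schema property IRI.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def see_also_triples(candidate_uris):
    """Return N-Triples lines linking each pair of candidate duplicates.

    Every ordered pair is emitted, so the link can be followed from
    either entity (an assumption, not something the slides state).
    """
    unique = sorted(set(candidate_uris))
    return [
        f"<{s}> <{RDFS_SEE_ALSO}> <{o}> ."
        for s, o in permutations(unique, 2)
    ]


if __name__ == "__main__":
    # Hypothetical URIs for two records that may describe the same author.
    for line in see_also_triples([
        "http://example.org/author/renee-j-miller",
        "http://example.org/author/rj-miller",
    ]):
        print(line)
```

Using `rdfs:seeAlso` rather than a stronger identity statement keeps the link a hint: consumers can inspect the related record without the data set asserting that the two entities are the same.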