Insiders
                                                            January
                                                               2010


                  Using the Web of Data
                            for
                  Information Extraction


    Keywords: SCOOBIE, SPARQL, RDFa, D2R Server, RDF, SQUIN, Epiphany, Linked Data, OBIE




                        Benjamin Adrian
                        http://www.dfki.uni-kl.de/~adrian
Are you still surfing ...

… or overloaded?

A simple question ...

What are the cities of the universities in Rhineland-Palatinate, and
what is the unemployment rate of these cities?


PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurostat: <http://www4.wiwiss.fu-berlin.de/eurostat/resource/eurostat/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX dbpedia_cat: <http://dbpedia.org/resource/Category:>

SELECT ?dbpcity ?cityName ?ur WHERE {
?uni      skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate;
          dbpedia:city                       ?dbpcity .
?dbpcity  owl:sameAs                         ?statcity.
?statcity rdfs:label                         ?cityName ;
          eurostat:unemployment_rate_total ?ur
}
                 http://www.w3.org/TR/rdf-sparql-query/
… and its answer.

         dbpcity                                      cityName          ur

         http://dbpedia.org/resource/Koblenz          Koblenz           8.8
         http://dbpedia.org/resource/Trier            Trier             7.3




Data Sources:
 http://epp.eurostat.ec.europa.eu
 http://www4.wiwiss.fu-berlin.de/eurostat/
 http://wiki.dbpedia.org

Query Engine:    SQUIN - Query the Web of Linked Data
                 http://squin.sourceforge.net/

So much data out there, too much?

What data do you have?

Are you still surfing ...

Agenda

To use the Web of Data for information extraction, you have to understand its basics.
●   RDF on one slide
●   Publish data in RDF with D2R Server
●   Publish RDF as Linked Data
●   Query Linked Data with SPARQL and SQUIN
●   Use RDF for information extraction
●   Bring Linked Data to text via RDFa

Wouldn't this be nice.

Figure (built up over four slides): Data, Text, Extraction Pipeline, User-defined Filter, Extraction Results, and annotated text; arrows labeled enrich, annotate, and populate.

RDF on one slide

@prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix acm: <http://acm.rkbexplorer.com/description/> .

dblp_author:Michael_Gillmann
    foaf:name      "Michael Gillmann" ;
    rdfs:seeAlso   <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ;
    rdf:type       foaf:Agent ;
    owl:sameAs     acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ;
    foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> .

<http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>
    dc:creator dblp_author:Michael_Gillmann ;
    dc:creator dblp_author:Markus_Ebbecke ;
    dc:title   "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .

* From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf

Highlighted in turn on this slide: Vocabularies, URLs / URIs, Subjects, Predicates, Objects.

RDF data is graph data.

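To make the statement "RDF data is graph data" concrete, here is a minimal sketch that stores a few triples from the "RDF on one slide" example as plain Python tuples and groups them into a node-and-edge view. The tuple representation, the helper name `outgoing_edges`, and the shortened `pub:` prefix are assumptions made for illustration; they are not part of any RDF library.

```python
from collections import defaultdict

# A few triples from the "RDF on one slide" example, written as
# (subject, predicate, object) tuples. "pub:" is a made-up abbreviation
# for the publication URI, used only to keep the lines short.
TRIPLES = [
    ("dblp_author:Michael_Gillmann", "foaf:name", '"Michael Gillmann"'),
    ("dblp_author:Michael_Gillmann", "rdf:type", "foaf:Agent"),
    ("pub:SchulzEGAAD09", "dc:creator", "dblp_author:Michael_Gillmann"),
    ("pub:SchulzEGAAD09", "dc:creator", "dblp_author:Markus_Ebbecke"),
]


def outgoing_edges(triples):
    """Group triples by subject: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


GRAPH = outgoing_edges(TRIPLES)
for node, edges in GRAPH.items():
    for label, target in edges:
        print(f"{node} --{label}--> {target}")
```

Subjects and objects become nodes, predicates become labeled edges. A literal such as the name is simply a node without outgoing edges.
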
Publishing relational data in RDF

D2R Server - Publishing Relational Databases on the Semantic Web

   http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/

Two small command line calls:

   # 1. Generate a mapping file from the database schema
   ./generate-mapping \
        -o mydatabase.n3 \
        -b http://projects.dfki.uni-kl.de/mydatabase/ \
        jdbc:mysql://localhost:3306/mydatabase

   # 2. Start the server with that mapping file
   ./d2r-server \
        -p 80 \
        -b http://projects.dfki.uni-kl.de/mydatabase/ \
        mydatabase.n3

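As a rough illustration of what publishing a relational table as RDF means, the sketch below turns each row of a table into triples: one subject URI per row and one predicate per column. This is a conceptual sketch only, not D2R Server's mapping language or implementation. The `customer` table, the `resource/` path segment, and the `vocab:` predicate names are assumptions chosen for the example.

```python
import sqlite3

BASE = "http://projects.dfki.uni-kl.de/mydatabase/"  # base URI from the slide


def rows_to_triples(conn, table, key):
    """Turn each row into triples: one subject URI per row, one predicate per column."""
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        # Assumed URI pattern: <base>resource/<table>/<primary key>
        subject = f"<{BASE}resource/{table}/{record[key]}>"
        for column, value in record.items():
            if column != key and value is not None:
                triples.append((subject, f"vocab:{table}_{column}", f'"{value}"'))
    return triples


# Hypothetical example table, held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'ACME', 'Trier')")

TRIPLES = rows_to_triples(conn, "customer", "id")
for triple in TRIPLES:
    print(*triple, ".")
```

The primary key identifies the resource, and every other non-null column value becomes a literal attached to it.
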
Linked Data: Linking RDF data from different sources

Figure: four datasets (Customer DB, Employees DB, Project DB, DBpedia).

How to interlink these datasets?

Linked Data Principles (Tim Berners-Lee, 2006)

1. Use URIs as names for things (e.g., http://dbpedia.org/resource/Berlin)
2. Use HTTP URIs so that people can look up those names
3. Provide useful information in RDF when someone looks up a URI
4. Include links to other URIs to enable discovery of more information

Example:

<http://dbpedia.org/resource/Berlin>
    owl:sameAs opencyc:en/CityOfBerlinGermany ;
    owl:sameAs opencyc:en/Berlin_StateGermany ;
    owl:sameAs <http://sws.geonames.org/2950159/> ;
    owl:sameAs <http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin> ;
    owl:sameAs freebase:http://dbpedia.org/resource/Berlin .

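The fourth principle is what lets a client discover that URIs from different datasets name the same thing. The following sketch groups URIs that are connected by owl:sameAs links, treating the links as symmetric and transitive. The function name `same_as_groups` and the Trier link are assumptions for illustration; the Berlin URIs are taken from the example above.

```python
def same_as_groups(links):
    """Group URIs connected by owl:sameAs links (treated as symmetric and transitive)."""
    parent = {}

    def find(uri):
        # Union-find lookup with path halving.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for left, right in links:
        parent[find(left)] = find(right)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


LINKS = [
    ("http://dbpedia.org/resource/Berlin", "http://sws.geonames.org/2950159/"),
    ("http://dbpedia.org/resource/Berlin",
     "http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin"),
    # Hypothetical link, added only to show a second group.
    ("http://dbpedia.org/resource/Trier", "http://example.org/city/Trier"),
]

GROUPS = same_as_groups(LINKS)
for group in GROUPS:
    print(sorted(group))
```

Each resulting set collects the names that different sources use for one thing, which is the basis for joining data across those sources.
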
SPARQL: Querying RDF data

SPARQL is the RDF query language.
In contrast to SQL, its data model is graph-oriented rather than set-oriented.

Some examples:

     Resulting in tuples:
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT ?interest ?friend WHERE {
         <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
         ?friend foaf:interest ?interest .
     }

     Resulting in a graph:
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     CONSTRUCT { ?friend foaf:interest ?interest } WHERE {
         <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
         ?friend foaf:interest ?interest .
     }

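The difference between the two result forms can be sketched with a naive matcher for triple patterns. This is only an illustration of the idea, not how a SPARQL engine is implemented. The helper names `match` and `solve` and the `ex:` resources are assumptions.

```python
def match(pattern, triple, binding):
    """Try to extend a binding so that a triple pattern matches a triple."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, triples):
    """Return all variable bindings that satisfy every pattern (a naive join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


TIM = "<http://www.w3.org/People/Berners-Lee/card#i>"
TRIPLES = [  # the ex: resources are made up for this example
    (TIM, "foaf:knows", "ex:alice"),
    (TIM, "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:interest", "ex:rdf"),
]
PATTERNS = [(TIM, "foaf:knows", "?friend"), ("?friend", "foaf:interest", "?interest")]
SOLUTIONS = solve(PATTERNS, TRIPLES)

# SELECT: project each solution onto the requested variables (tuples).
ROWS = [(s["?interest"], s["?friend"]) for s in SOLUTIONS]
# CONSTRUCT: fill a triple template once per solution (a new graph).
GRAPH = {(s["?friend"], "foaf:interest", s["?interest"]) for s in SOLUTIONS}
print(ROWS)
print(GRAPH)
```

Both forms share the same pattern matching step. They differ only in what is built from the solutions: a table of tuples or a set of triples.
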
SPARQL: Query Linked Data from different sources

Figure: four datasets (Customer DB, Employees DB, Project DB, DBpedia).

How to access these datasets with a single SPARQL query?

SQUIN: Query the Web of Linked Data
http://squin.sourceforge.net/

SQUIN follows a Link Traversal approach over HTTP URIs.

Figure: Customer DB, Employees DB, Project DB, and DBpedia, each next to a D2R Server label, with SQUIN in the center.

Remember:

SELECT DISTINCT ?c ?cityName ?ur
WHERE {
  ?u skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate ;
     dbpedia:city ?c .
  ?c owl:sameAs [ rdfs:label ?cityName ;
                  eurostat:unemployment_rate_total ?ur ]
}

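The link traversal idea can be sketched as follows: start from a known URI, look it up, and keep following the URIs that appear in the returned triples. This is a simplified sketch and not SQUIN's implementation or API. The `WEB` dictionary stands in for HTTP lookups, and all URIs in it are made up for illustration.

```python
# Stand-in for the Web: URI -> triples returned when that URI is looked up.
# All URIs below are made up for illustration.
WEB = {
    "ex:uni_trier": [("ex:uni_trier", "dbpedia:city", "dbp:Trier")],
    "dbp:Trier": [("dbp:Trier", "owl:sameAs", "stat:Trier")],
    "stat:Trier": [
        ("stat:Trier", "rdfs:label", "Trier"),
        ("stat:Trier", "eurostat:unemployment_rate_total", "7.3"),
    ],
}


def traverse(seed_uris, lookup, max_lookups=100):
    """Collect triples by looking up URIs and following the URIs found in the results."""
    queue, seen, triples = list(seed_uris), set(), []
    while queue and len(seen) < max_lookups:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.add(uri)
        for triple in lookup(uri):
            triples.append(triple)
            # Every subject and object is a candidate for a further lookup.
            # A real client would only dereference HTTP URIs, not literals.
            queue.extend(term for term in (triple[0], triple[2]) if term not in seen)
    return triples


TRIPLES = traverse(["ex:uni_trier"], lambda uri: WEB.get(uri, []))
for triple in TRIPLES:
    print(triple)
```

In this sketch the whole reachable graph is collected first and could then be queried locally. An actual link traversal engine would typically interleave lookups with query evaluation and only follow URIs that are relevant to the query.
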
Using RDF and Linked Data for Information Extraction

Figure labels: User, Linked Data, Query, Text, Extraction Pipeline, Result Graph; arrows labeled asks, question, about, to, and answers.

What data do we have?

Example RDF data
<http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>
    rdf:type     foaf:Document ;
    dc:creator   dblp_author:Markus_Ebbecke ;
    dc:title     "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .

  Classes:              foaf:Document, foaf:Person
  Instances:            .../SchulzEGAAD09, .../Markus_Ebbecke
  Datatype Properties:  dc:title, foaf:name, foaf:firstName, foaf:surName
  Object Properties:    dc:creator, foaf:knows
  Literals:             "Markus", "Ebbecke",
                        "Seizing the Treasure: Transferring Knowledge in Invoice Analysis"

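The five categories can be read off the triples themselves. The sketch below shows one simple way to do that, assuming that literals are written as quoted strings and that rdf:type marks class membership. The function name `categorize` and these rules are assumptions for illustration, not SCOOBIE's actual preprocessing.

```python
def categorize(triples):
    """Split the terms of an RDF graph into the five categories named above."""
    names = ("classes", "instances", "datatype_properties",
             "object_properties", "literals")
    cats = {name: set() for name in names}
    for subject, predicate, obj in triples:
        if predicate == "rdf:type":
            cats["instances"].add(subject)
            cats["classes"].add(obj)
        elif obj.startswith('"'):
            # Property with a literal value.
            cats["instances"].add(subject)
            cats["datatype_properties"].add(predicate)
            cats["literals"].add(obj)
        else:
            # Property that links two resources.
            cats["object_properties"].add(predicate)
            cats["instances"].update((subject, obj))
    return cats


PUB = "<http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>"
TRIPLES = [
    (PUB, "rdf:type", "foaf:Document"),
    (PUB, "dc:creator", "dblp_author:Markus_Ebbecke"),
    (PUB, "dc:title", '"Seizing the Treasure: Transferring Knowledge in Invoice Analysis"'),
]

CATS = categorize(TRIPLES)
for name, terms in CATS.items():
    print(name, sorted(terms))
```

Literals and instances are what an extraction system can look for in text, while classes and properties describe what kind of facts it can report.
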
SCOOBIE: Domain Adaptation

Figure labels: Structured Data (Vocabulary Data, Instance Data), Text Corpus Data, Patterns and Gazetteers Data; phases: Data Preprocessing & Learning (offline), Information Extraction (online).

SCOOBIE: Ecosystem

Figure labels, by row:
Session Data: Index, Domain Knowledge (Text Corpus, Instances, Ontology, Patterns + Gazetteers), Models (Training Corpus, Models)
Tasks: Preprocess, Train, Extract
API

SCOOBIE: OBIE Pipeline

Normalization                        Text Extraction
                                     Language Detection
Segmentation                         Tokenization
                                     Sentence Extraction
                                     POS-Tagging
Symbolization                        Named Entity Recognition
                                     Structured Entity Recognition
                                     Noun Phrase Chunking
                                     Symbol Recognition
Instantiation                        Instance Recognition
                                     Instance Disambiguation
                                     Chunk Classification
Contextualization                    Fact Extraction
                                     Fact Selection
Population                           Query Answering
Used Machine Learning Models

Semi-Supervised Learning
●   CRF-based Noun Phrase Chunker

Supervised Learning
●   Gazetteer matching statistics (Named Entity Recognition)
●   Regex matching statistics (Structured Entity Recognition)

Unsupervised or Instance-based Learning
●   TF/IDF-based instance re-ranking (Instance Disambiguation)
●   K-Nearest-Neighbor chunk classifier (Chunk Classification)
●   Spreading Activation-based fact ranking (Fact Selection)

Used Machine Learning: Conditional Random Field

CRFs are sequence taggers:

Train it with:   Bill      CAPITALIZED   noun
                 slept     LOWERCASE     non-noun
                 here      LOWERCASE     non-noun

Test it with:    He        CAPITALIZED
                 visited   LOWERCASE
                 London    CAPITALIZED

CRF results:     noun
                 non-noun
                 non-noun

MALLET - MAchine Learning for LanguagE Toolkit
http://mallet.cs.umass.edu/

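The training and test rows on the slide pair each token with a word-shape feature and, for training, a label. The sketch below only shows how such rows could be produced. The CRF training and tagging itself would be left to a toolkit such as MALLET and is not shown. The function names and the exact row layout are assumptions that mirror the slide rather than any toolkit's required input format.

```python
def shape(token):
    """Return the word-shape feature used on the slide."""
    return "CAPITALIZED" if token[:1].isupper() else "LOWERCASE"


def feature_rows(tokens, labels=None):
    """Build one 'token FEATURE [label]' row per token of a sentence."""
    labels = labels or [None] * len(tokens)
    rows = []
    for token, label in zip(tokens, labels):
        row = [token, shape(token)]
        if label is not None:
            row.append(label)  # labels are only known for training data
        rows.append(" ".join(row))
    return rows


TRAIN = feature_rows(["Bill", "slept", "here"], ["noun", "non-noun", "non-noun"])
TEST = feature_rows(["He", "visited", "London"])
print("\n".join(TRAIN))
print("\n".join(TEST))
```

A sequence tagger learns from whole rows in order, so the label of a token can depend on its neighbors as well as on its own features.
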
Bringing Linked Data to Text

Annotate plain text or HTML with RDF data.
   I'm working at DFKI.

RDFa offers an HTML extension:

   I'm working at
   <span about="dbpedia:DFKI" property="rdfs:label">
   DFKI</span>

Now let's generate RDFa automatically ...

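A very small version of automatic RDFa generation is a lookup of known labels in the text, each wrapped in a span like the one above. The sketch below uses exact string matching against a tiny label list. It is an illustration only and far simpler than an extraction pipeline. The function name `annotate` and the `GAZETTEER` dictionary are assumptions.

```python
import html
import re


def annotate(text, gazetteer):
    """Wrap each known label in an RDFa span that points at its resource."""
    if not gazetteer:
        return html.escape(text, quote=False)
    # Longest labels first, so a longer label wins over a shorter one it contains.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b")

    pieces, last = [], 0
    for found in pattern.finditer(text):
        pieces.append(html.escape(text[last:found.start()], quote=False))
        label = found.group(1)
        pieces.append(
            f'<span about="{html.escape(gazetteer[label])}" property="rdfs:label">'
            f"{html.escape(label, quote=False)}</span>"
        )
        last = found.end()
    pieces.append(html.escape(text[last:], quote=False))
    return "".join(pieces)


# Label -> resource, as it might be taken from rdfs:label triples of a data set.
GAZETTEER = {"DFKI": "dbpedia:DFKI"}
ANNOTATED = annotate("I'm working at DFKI.", GAZETTEER)
print(ANNOTATED)
```

Exact matching cannot tell apart two resources that share a label, which is why disambiguation steps are needed in a full pipeline.
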
Do you remember?

Figure (repeated from "Wouldn't this be nice."): Data, Text, Extraction Pipeline, User-defined Filter, Extraction Results, and annotated text; arrows labeled enrich, annotate, and populate.

Insiders
RDF Epiphany                                               January
                                                              2010



                                      Epiphany takes the
                                      original webpage
                                       …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                   38
Insiders
RDF Epiphany                                               January
                                                              2010



                                      Epiphany takes the
                                      original webpage
                                       …
                                      and SCOOBIE initialized
                                      with an RDF data set
                                      …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                   39
Insiders
RDF Epiphany                                                 January
                                                                2010



                                      Epiphany takes the
                                      original webpage
                                       …
                                      and SCOOBIE initialized
                                      with an RDF data set
                                      …
                                      It extracts RDF information
                                      from text and annotates it as
                                      RDFa
                                      …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                     40
RDF Epiphany

Epiphany takes the original web page and SCOOBIE, initialized with an RDF Linked Data set. It extracts RDF information from the text and annotates it as RDFa. Clicking on an RDFa annotation opens further information from the Linked Data set.

At a glance
●   Epiphany is a free web service.
●   Epiphany uses SCOOBIE.
●   Epiphany can be initialized with any RDF Linked Data set.
●   Epiphany generates an RDF document about a web page.
●   Epiphany annotates RDF as RDFa in the web page.


http://projects.dfki.uni-kl.de/epiphany/
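
The last two bullets (an RDF document about a page, plus RDFa annotations inside the page) can be illustrated with a minimal sketch. It is not Epiphany's or SCOOBIE's actual implementation. It assumes plain-text input and a hand-written label-to-URI table that stands in for labels taken from a Linked Data set, and it uses simple string matching where a real extraction pipeline would do far more. The function name `annotate_rdfa` and the choice of `rdfs:label` as the annotated property are assumptions made for illustration; the surrounding page would still have to declare the `rdfs` prefix.

```python
import html
import re

# Hypothetical label-to-URI table; in practice the labels would come from
# the RDF Linked Data set the extractor is initialized with.
LABELS = {
    "Koblenz": "http://dbpedia.org/resource/Koblenz",
    "Trier": "http://dbpedia.org/resource/Trier",
}


def annotate_rdfa(text, labels):
    """Wrap every known label in an RDFa span and collect the matched triples."""
    triples = []

    # Longest labels first so that a longer label wins over its own prefix.
    alternatives = sorted(labels, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")

    def wrap(match):
        label = match.group(1)
        uri = labels[label]
        triples.append((uri, "rdfs:label", label))
        return '<span about="{}" property="rdfs:label">{}</span>'.format(
            html.escape(uri, quote=True), html.escape(label)
        )

    return pattern.sub(wrap, text), triples


annotated, triples = annotate_rdfa("Trier is a city in Rhineland-Palatinate.", LABELS)
print(annotated)   # the text with RDFa markup around "Trier"
for triple in triples:
    print(triple)  # the RDF statements that back the annotations
```

The returned triples play the role of the RDF document about the page, while the returned string is the page text with RDFa added.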


Summary

[Diagram: four data sets (Customer DB, Employees DB, Project DB, DBpedia) with D2R Server instances, joined by SQUIN in the middle. Next to them, Text passes through an Extraction Pipeline with a User-defined Filter and yields Extraction Results. Arrows are labelled populate, enrich and annotate; the annotate arrow leads to the annotated text.]

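
One possible reading of the diagram above is a three-step flow: extract candidate statements from text, keep only those that a user-defined filter accepts, and add the survivors to the data. The sketch below illustrates that reading only. The `(label, URI)` table, the function names and the example filter are assumptions for illustration and are not part of SCOOBIE, SQUIN or D2R Server.

```python
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical (label, URI) pairs standing in for labels read from Linked Data.
KNOWN = [
    ("Berlin", "http://dbpedia.org/resource/Berlin"),
    ("Berlin", "http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin"),
    ("Trier", "http://dbpedia.org/resource/Trier"),
]


def extract(text: str, known: Iterable[Tuple[str, str]]) -> List[Triple]:
    """Toy extractor: one triple per known label that occurs in the text."""
    return [(uri, "rdfs:label", label) for label, uri in known if label in text]


def run_pipeline(
    text: str,
    known: Iterable[Tuple[str, str]],
    keep: Callable[[Triple], bool],
    data: Set[Triple],
) -> List[Triple]:
    """Extract, apply the user-defined filter, then add the survivors to the data."""
    results = [triple for triple in extract(text, known) if keep(triple)]
    data.update(results)
    return results


def only_dbpedia(triple: Triple) -> bool:
    """Example user-defined filter: keep DBpedia resources only."""
    return triple[0].startswith("http://dbpedia.org/resource/")


data: Set[Triple] = set()
results = run_pipeline("A talk in Berlin", KNOWN, only_dbpedia, data)
print(results)
```

Passing the filter as a plain function keeps the extraction step unaware of what the user considers relevant, which is the separation the User-defined Filter box suggests.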
Outlook

[Diagram: the same components as on the Summary slide (Customer DB, Employees DB, Project DB, DBpedia, D2R Server instances, SQUIN, Extraction Pipeline, User-defined Filter, Extraction Results; arrows labelled populate, enrich and annotate), with E-Mail in place of Text and annotated E-Mail in place of annotated text.]

Thank you!

scoobie, sparql, rdfa, D2R server, rdf, squin, epiphany, Linked Data, OBIE

Using the Web of Data for Information Extraction

  • 1. Insiders January 2010 Using the Web of Data for Information Extraction scoobie sparql rdfa D2R server rdf squin epiphany Linked Data OBIE Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 2. Insiders Are you still surfing ... January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 3. Insiders … or overloaded? January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 4. Insiders A simple question ... January 2010 What are the cities of the universities in Rhineland Palatinate and what is the unemployment rate of these cities? Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 5. Insiders A simple question ... January 2010 What are the cities of the universities in Rhineland Palatinate and what is the unemployment rate of these cities? PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX eurostat: <http://www4.wiwiss.fu-berlin.de/eurostat/resource/eurostat/> PREFIX dbpedia: <http://dbpedia.org/ontology/> PREFIX dbpedia_cat: <http://dbpedia.org/resource/Category> SELECT ?dbpcity ?cityName ?ur WHERE { ?uni skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate; dbpedia:city ?dbpcity . ?dbpcity owl:sameAs ?statcity. ?statcity rdfs:label ?cityName ; eurostat:unemployment_rate_total ?ur } http://www.w3.org/TR/rdf-sparql-query/ Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 6. Insiders … and its answer. January 2010 dbpcity cityName ur http://dbpedia.org/resource/Koblenz Koblenz 8.8 http://dbpedia.org/resource/Trier Trier 7.3 Data Sources: http://epp.eurostat.ec.europa.eu http://wiki.dbpedia.org http://www4.wiwiss.fu-berlin.de/eurostat/ Query Engine: SQUIN - Query the Web of Linked Data http://squin.sourceforge.net/ Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 7. So much data out there, Insiders January too much? 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 8. Insiders What data do you have? January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 9. Insiders Are you still surfing ... January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 10. Insiders Agenda January 2010 In order to use Web of Data for information extraction, you have to understand its basics. ● RDF on one slide ● Publish data in RDF with D2R Server ● Publish RDF as Linked Data ● Query Linked Data with SPARQL and Squin ● Use RDF for information extraction ● Bring Linked Data to text via RDFa Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 11. Insiders Wouldn't this be nice. January 2010 Data Benjamin Adrian http://www.dfki.uni-kl.de/~adrian 11
  • 12. Insiders Wouldn't this be nice. January 2010 Data Text User-defined Filter Ex tra ct io n Pi pe l in e Extraction Results enrich Benjamin Adrian http://www.dfki.uni-kl.de/~adrian 12
  • 13. Insiders Wouldn't this be nice. January 2010 annotated Data Text text User-defined Filter Ex annotate tra ct io n Pi pe l in e Extraction Results enrich Benjamin Adrian http://www.dfki.uni-kl.de/~adrian 13
  • 14. Insiders Wouldn't this be nice. January 2010 annotated Data Text text User-defined Filter Ex annotate tra ct io n Pi pe populate l in e Extraction Results enrich Benjamin Adrian http://www.dfki.uni-kl.de/~adrian 14
  • 15. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications//icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 16. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . Vocabularies @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 17. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . URLs / URIs @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 18. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . Subjects @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 19. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . Predicates @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 20. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . Objects @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 21. Insiders RDF data is graph data. January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 22. Publishing relational data in RDF
  • 23. Publishing relational data in RDF
    D2R Server - Publishing Relational Databases on the Semantic Web
    http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/
    Two small command line calls, the first to generate the mapping file and the second to serve it:
    ./generate-mapping -o mydatabase.n3 -b http://projects.dfki.uni-kl.de/mydatabase/ jdbc:mysql://localhost:3306/mydatabase
    ./d2r-server -p 80 -b http://projects.dfki.uni-kl.de/mydatabase/ mydatabase.n3
  • 24. Linked Data: Linking RDF data from different sources
    [Diagram labels: Customer DB, Employees DB, Project DB, DBpedia]
    How to interlink these datasets?
  • 25. Linked Data: Linking RDF data from different sources
    Linked Data Principles (Tim Berners-Lee, 2006)
    1. Use URIs as names for things (e.g., http://dbpedia.org/resource/Berlin)
    2. Use HTTP URIs so that people can look up those names
    3. Provide useful information in RDF when someone looks up a URI
    4. Include links to other URIs to enable discovery of more information
    Example:
    <http://dbpedia.org/resource/Berlin>
        owl:sameAs opencyc:en/CityOfBerlinGermany ;
        owl:sameAs opencyc:en/Berlin_StateGermany ;
        owl:sameAs <http://sws.geonames.org/2950159/> ;
        owl:sameAs <http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin> ;
        owl:sameAs freebase:http://dbpedia.org/resource/Berlin .
  • 26. SPARQL: Querying RDF data
    SPARQL is the RDF query language. In contrast to SQL, its data model is not set-oriented but graph-oriented.
    Some examples:
    Resulting in tuples:
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?interest ?friend WHERE {
        <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
        ?friend foaf:interest ?interest .
    }
    Resulting in a graph:
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    CONSTRUCT { ?friend foaf:interest ?interest } WHERE {
        <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
        ?friend foaf:interest ?interest .
    }
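The difference between the two result forms can be sketched with a toy pattern matcher over an in-memory list of triples. This is a simplified illustration of basic graph pattern matching, not a SPARQL engine; the function names and the `ex:` friends and interests are hypothetical placeholders, and only the Berners-Lee URI and the two triple patterns come from the slide.

```python
TIMBL = "<http://www.w3.org/People/Berners-Lee/card#i>"

# Hypothetical toy data; the ex: names are placeholders, not real resources.
DATA = [
    (TIMBL, "foaf:knows", "ex:alice"),
    (TIMBL, "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:interest", "ex:SemanticWeb"),
    ("ex:bob", "foaf:interest", "ex:Chess"),
    ("ex:carol", "foaf:interest", "ex:Sailing"),
]

# The WHERE clause shared by both example queries.
WHERE = [
    (TIMBL, "foaf:knows", "?friend"),
    ("?friend", "foaf:interest", "?interest"),
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            if extended.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant term does not match
    return extended


def solve(patterns, triples):
    """Join the triple patterns left to right and return all variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def select(variables, patterns, triples):
    """SELECT-style result: one tuple of values per solution."""
    return [tuple(b[v] for v in variables) for b in solve(patterns, triples)]


def construct(template, patterns, triples):
    """CONSTRUCT-style result: a new set of triples built from the template."""
    return {
        tuple(b.get(term, term) for term in t_pattern)
        for b in solve(patterns, triples)
        for t_pattern in template
    }
```

Calling `select(["?interest", "?friend"], WHERE, DATA)` yields a table of tuples, whereas `construct([("?friend", "foaf:interest", "?interest")], WHERE, DATA)` yields triples again, which is why a CONSTRUCT result can be queried or merged like any other RDF graph.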
  • 27. SPARQL: Query Linked Data from different sources
    [Diagram labels: Customer DB, Employees DB, Project DB, DBpedia]
    How to access these datasets with a single SPARQL query?
  • 28. SPARQL: Query Linked Data from different sources
    Squin: Query the Web of Linked Data
    http://squin.sourceforge.net/
    Squin follows a Link Traversal approach over HTTP URIs.
    [Diagram labels: Customer DB, Employees DB, Project DB, DBpedia; four D2R Server nodes around SQUIN]
    Remember:
    SELECT DISTINCT ?c ?cityName ?ur WHERE {
        ?u skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate ;
           dbpedia:city ?c .
        ?c owl:sameAs [ rdfs:label ?cityName ;
                        eurostat:unemployment_rate_total ?ur ]
    }
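The general idea of link traversal can be sketched as a breadth-first walk that dereferences a URI, collects the triples it returns, and then follows the URIs mentioned in those triples. This is a simplified illustration only and not SQUIN's implementation; the `FAKE_WEB` dictionary stands in for HTTP lookups, and all `http://example.org/` URIs and the function names are hypothetical.

```python
from collections import deque

# Stand-in for the Web: URI -> triples returned when the URI is dereferenced.
# All example.org URIs are hypothetical placeholders.
FAKE_WEB = {
    "http://example.org/uni/1": [
        ("http://example.org/uni/1", "dbpedia:city", "http://example.org/city/A"),
    ],
    "http://example.org/city/A": [
        ("http://example.org/city/A", "owl:sameAs", "http://example.org/stats/A"),
    ],
    "http://example.org/stats/A": [
        ("http://example.org/stats/A", "rdfs:label", '"City A"'),
    ],
    "http://example.org/unrelated": [
        ("http://example.org/unrelated", "rdfs:label", '"Never reached"'),
    ],
}


def fake_dereference(uri):
    """Pretend HTTP lookup; a real client would fetch and parse RDF here."""
    return FAKE_WEB.get(uri, [])


def traverse(seeds, dereference, max_documents=10):
    """Collect triples by following HTTP URIs found in retrieved data."""
    seen = set()
    queue = deque(seeds)
    triples = []
    while queue and len(seen) < max_documents:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for subject, predicate, obj in dereference(uri):
            triples.append((subject, predicate, obj))
            # Follow links in subject and object position; literals are skipped.
            for term in (subject, obj):
                if term.startswith("http://") and term not in seen:
                    queue.append(term)
    return triples, seen
```

Starting from the single seed `http://example.org/uni/1`, the walk reaches the city and the statistics document through the links in the data, while the unrelated document is never fetched because nothing points to it.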
  • 29. Using RDF and Linked Data for Information Extraction
    [Diagram labels: User, asks question about, Text, Extraction Pipeline, Result Graph, Linked Data Query, answers]
  • 30. Using RDF and Linked Data for Information Extraction
    What data do we have? Example RDF data:
    <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>
        rdf:type foaf:Document ;
        dc:creator dblp_author:Markus_Ebbecke ;
        dc:title "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .
    Classes: foaf:Document, foaf:Person
    Instances: .../SchulzEGAAD09, .../Markus_Ebbecke
    Datatype Properties: dc:title, foaf:name, foaf:firstName, foaf:surName
    Object Properties: dc:creator, foaf:knows
    Literals: "Markus", "Ebbecke", "Seizing the Treasure: Transferring Knowledge in Invoice Analysis"
  • 31. SCOOBIE Domain Adaption
    [Diagram labels: Structured Data, Text Corpus Data, Patterns and Gazetteers, Vocabulary Data, Instance Data; phases: Preprocessing & Learning (offline), Information Extraction (online)]
  • 32. SCOOBIE Ecosystem
    [Diagram labels: Index, Domain Knowledge, Models, Text Corpus, Training Corpus, Session Data, Instances, Ontology, Patterns + Gazetteers; API tasks: Preprocess, Train, Extract]
  • 33. SCOOBIE OBIE Pipeline
    Normalization: Text Extraction, Language Detection
    Segmentation: Tokenization, Sentence Extraction, POS-Tagging
    Symbolization: Named Entity Recognition, Structured Entity Recognition, Noun Phrase Chunking, Symbol Recognition
    Instantiation: Instance Recognition, Instance Disambiguation, Chunk Classification
    Contextualization: Fact Extraction, Fact Selection
    Population: Query Answering
  • 34. Machine Learning Models Used
    Semi-Supervised Learning:
        CRF-based Noun Phrase Chunker
    Supervised Learning:
        Gazetteer matching statistics (Named Entity Recognition)
        Regex matching statistics (Structured Entity Recognition)
    Unsupervised or Instance-based Learning:
        TF/IDF-based instance re-ranking (Instance Disambiguation)
        K-Nearest-Neighbor chunk classifier (Chunk Classification)
        Spreading Activation-based fact ranking (Fact Selection)
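As an illustration of TF/IDF-based instance re-ranking, the sketch below scores candidate instances for an ambiguous mention by the cosine similarity between TF-IDF vectors of the surrounding text and of each candidate's description. The slides do not say how SCOOBIE computes this, so the weighting scheme, the function names, and the `ex:` candidates with their descriptions are all assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real pipeline would do more."""
    return text.lower().split()


def tfidf_vectors(token_lists):
    """Return one {term: tf * idf} dict per document, with idf = ln(N/df) + 1."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(vec_a, vec_b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(context, candidates):
    """Rank candidate URIs by similarity of their description to the context."""
    uris = list(candidates)
    token_lists = [tokenize(context)] + [tokenize(candidates[u]) for u in uris]
    vectors = tfidf_vectors(token_lists)
    context_vec = vectors[0]
    scored = [(uri, cosine(context_vec, vec)) for uri, vec in zip(uris, vectors[1:])]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates for the ambiguous mention "Paris".
CANDIDATES = {
    "ex:Paris_France": "paris capital city of france",
    "ex:Paris_Texas": "paris city texas usa",
}
RANKED = rerank("the conference takes place in paris france", CANDIDATES)
```

In this toy run the context shares the rarer term "france" with the first candidate, so that candidate is ranked first; terms shared by every document, such as "paris", contribute little to the decision.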
  • 35. Machine Learning Used: Conditional Random Fields
    CRFs are sequence taggers.
    Train it with:
        Bill CAPITALIZED noun
        slept LOWERCASE non-noun
        here LOWERCASE non-noun
    Test it with:
        He CAPITALIZED
        visited LOWERCASE
        London CAPITALIZED
    CRF results:
        noun
        non-noun
        non-noun
    MALLET - MAchine Learning for LanguagE Toolkit: http://mallet.cs.umass.edu/
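The training and test lines on this slide follow a simple "token feature label" layout. A small sketch of how such lines could be produced from tokenized sentences is shown below; the function names are hypothetical, the single capitalization feature mirrors the slide, and the sketch only prepares the input and does not train or run a CRF.

```python
def capitalization_feature(token):
    """Return the one feature used on the slide for a token."""
    return "CAPITALIZED" if token[:1].isupper() else "LOWERCASE"


def to_tagger_lines(tokens, labels=None):
    """Format one sentence as 'token FEATURE [label]' lines.

    With labels the lines are training data; without labels they are test data.
    """
    if labels is not None and len(labels) != len(tokens):
        raise ValueError("tokens and labels must have the same length")
    lines = []
    for index, token in enumerate(tokens):
        parts = [token, capitalization_feature(token)]
        if labels is not None:
            parts.append(labels[index])
        lines.append(" ".join(parts))
    return lines


TRAIN_LINES = to_tagger_lines(
    ["Bill", "slept", "here"], ["noun", "non-noun", "non-noun"]
)
TEST_LINES = to_tagger_lines(["He", "visited", "London"])
```

`TRAIN_LINES` and `TEST_LINES` reproduce the two listings from the slide; a real setup would typically add more features per token, such as suffixes or part-of-speech tags.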
  • 36. Bringing Linked Data to Text
    Annotate plain text or HTML with RDF data.
    I'm working at DFKI.
    RDFa offers an HTML extension:
    I'm working at <span about="dbpedia:DFKI" property="rdfs:label">DFKI</span>
    Now let's generate RDFa automatically ...
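A minimal sketch of automatic annotation, assuming a gazetteer that maps known labels to resource identifiers, could look as follows. It produces the same `span` layout as the example on the slide; the `annotate` function and the one-entry gazetteer are hypothetical, and plain string matching stands in for the extraction pipeline described in the talk.

```python
import html
import re


def annotate(text, gazetteer):
    """Wrap every gazetteer label found in `text` in an RDFa span.

    `gazetteer` maps a label (e.g. "DFKI") to a resource identifier
    (e.g. "dbpedia:DFKI"). Text outside the spans is HTML-escaped.
    """
    if not gazetteer:
        return html.escape(text, quote=False)
    # Longest labels first so that longer names win over their prefixes.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(l) for l in labels) + r")\b")
    parts = []
    position = 0
    for match in pattern.finditer(text):
        label = match.group(1)
        parts.append(html.escape(text[position:match.start()], quote=False))
        parts.append(
            '<span about="{}" property="rdfs:label">{}</span>'.format(
                html.escape(gazetteer[label], quote=True),
                html.escape(label, quote=False),
            )
        )
        position = match.end()
    parts.append(html.escape(text[position:], quote=False))
    return "".join(parts)


ANNOTATED = annotate("I'm working at DFKI.", {"DFKI": "dbpedia:DFKI"})
```

Exact label matching cannot resolve ambiguous names, which is where the disambiguation and classification steps of the pipeline would come in.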
  • 37. Do you remember?
    [Diagram labels: Data, Text, annotated text, User-defined Filter, Extraction Pipeline, Extraction Results; arrows: annotate, populate, enrich]
  • 38. RDF Epiphany
    Epiphany takes the original webpage …
  • 39. RDF Epiphany
    Epiphany takes the original webpage …
    and SCOOBIE initialized with an RDF data set …
  • 40. RDF Epiphany
    Epiphany takes the original webpage …
    and SCOOBIE initialized with an RDF data set …
    It extracts RDF information from text and annotates it as RDFa …
  • 41. RDF Epiphany
    Epiphany takes the original webpage …
    and SCOOBIE initialized with an RDF Linked Data set …
    It extracts RDF information from text and annotates it as RDFa …
    Clicking on RDFa annotations opens further information from the Linked Data set.
  • 42. RDF Epiphany: At a glance
    ● Epiphany is a free web service.
    ● Epiphany uses SCOOBIE.
    ● Epiphany can be initialized with any RDF Linked Data set.
    ● Epiphany generates an RDF document about a web page.
    ● Epiphany annotates RDF as RDFa in the web page.
    http://projects.dfki.uni-kl.de/epiphany/
  • 43. Summary
    [Diagram labels: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server nodes, SQUIN, Text, annotated text, User-defined Filter, Extraction Pipeline, Extraction Results; arrows: annotate, populate, enrich]
  • 44. Outlook
    [Diagram labels: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server nodes, SQUIN, E-Mail, annotated E-Mail, User-defined Filter, Extraction Pipeline, Extraction Results; arrows: annotate, populate, enrich]
  • 45. Thank you!
    scoobie, sparql, rdfa, D2R server, rdf, squin, epiphany, Linked Data, OBIE
    Benjamin Adrian, http://www.dfki.uni-kl.de/~adrian