Linked Enterprise Data
      LEVERAGING THE SEMANTIC WEB STACK
      IN A CORPORATE ENVIRONMENT

      ISWC 2012 – BOSTON
      FABRICE LACROIX – LACROIX@ANTIDOT.NET
Copyright Antidot™
Antidot – who we are

             French-based Software Vendor
             Since 1999 | Paris, Lyon, Aix-en-Provence
             Information access | Data management

             Mission: Provide our customers with innovative
             customizable solutions that help them create
             value with their data, and make their employees
             more aware and efficient.



Clients
             Enterprises | Publishing | E-commerce | Healthcare
Unstructured documents

             files, ECM, collaborative spaces
             intranet, extranet, Web sites
             e-mails, instant messaging




Structured data

             CRM, ERP, directory
             knowledge bases
             business applications (production, support)




Information systems are bloated
             1 practice => 1 need => 1 application => 1 silo
             The information system is driven by processes
             Data is abundant, varied and scattered




Solutions or workarounds?




             BI | MDM | SOA | Search
Solutions and workarounds
             Enterprise Search brings little value to users
             Document oriented
             Does not solve real business problems




      Google-like | Verity-like



What we want

             [Diagram: ERP, CRM, Production, LDAP, ECM, Support, Files]
Changing the paradigm

             Switch from an application-centric view to a
             data-centric way of thinking.




Bring out the implicit

             Build the Giant Enterprise Graph




LED

             Linked Enterprise Data
                application of Semantic Web technologies
                and Linked Data principles to the enterprise
                infrastructure




What works for the Web…

             Federating silos on the Web




                     http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)
…can’t always be used

             in a corporate IS
             Legacy apps can’t be "SPARQL’ed"
             The 80% of data that is unstructured or semi-structured
              doesn’t fit the model as such
             Defining vocabularies/ontologies for silos is too
              complex and expensive
             Users don’t want RDF per se, but valuable information
             External data is available as XML/JSON through Web
              Services
             Staff are trained in relational databases, XML and Web apps
             Zero-risk, stability-first strategy: SemWeb technology
              is considered new and immature
The RDF/storage approach

             Setting up a global RDF repository does not
             work either
             IT departments are scared off by the "RDF everywhere" activists




Semantic Web technology
                      is still the right solution
                     in a corporate environment
                        BUT it is not an end in itself
                            JUST use it
                        as a means
Just do it

             Think of it as a stream paradigm
             build new objects using existing data
             without interfering with the existing infrastructure
             with SemWeb somewhere under the hood




Enterprise Graph HowTo

             Construct the graph
             generate triples from data
             create triples from documents
             Leverage the graph
             enrich
             infer
             Browse the graph
             select resources
             build objects
             Trash the graph
How: extract & normalize

             Harvest and normalize
             as in an ETL
             fetch, clean, transform…
             normalize records (names, IDs) to prepare the
              linking step

             For databases
             db2triples: an RDB2RDF implementation by
              Antidot (open source, W3C validated)
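
The normalization step above could look roughly like the following. This is a minimal sketch, not the deck's implementation: the function names `normalize_name` and `normalize_id` and the specific cleaning rules (accent stripping, separator removal, leading-zero removal) are assumptions chosen for illustration.

```python
import re
import unicodedata


def normalize_name(raw: str) -> str:
    """Return a lowercase, accent-free, single-spaced form of a name."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", raw)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Collapse every run of punctuation or whitespace into one space.
    return re.sub(r"[^a-z0-9]+", " ", plain.lower()).strip()


def normalize_id(raw: str) -> str:
    """Strip separators and leading zeros so IDs compare equal across silos."""
    compact = re.sub(r"[^0-9A-Za-z]", "", raw).upper()
    return compact.lstrip("0") or "0"
```

Two records whose names normalize to the same string, or whose IDs normalize to the same key, become candidates for the linking step.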


How: semantize

             Don’t transform everything into RDF
             cherry-pick a subset of interesting fields for
              each object and create the corresponding RDF
              triples
             interesting == needed for linking or inferring




How: semantize

             Triples generation
             Be smart: avoid upfront ontology design, use
              small vocabularies
             Be pragmatic: transform XML tags and field
              names to predicates
             Be agile: only insert what you need. And when
              you need more, add more.

             Semantic Web fuels the modeling, linking
             and information building process
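
As an illustration of "cherry-pick fields" and "transform field names to predicates", here is a small sketch. The namespace `http://example.org/`, the function `record_to_triples`, and the sample field names are all hypothetical; triples are modeled as plain tuples rather than with an RDF library.

```python
BASE = "http://example.org/"  # hypothetical namespace, not from the deck


def record_to_triples(kind, record, id_field, keep):
    """Turn the selected fields of one record into (s, p, o) tuples."""
    subject = f"{BASE}{kind}/{record[id_field]}"
    triples = []
    for field in keep:  # only the cherry-picked fields become triples
        value = record.get(field)
        if value is None:
            continue
        # The field name is reused directly as the predicate name.
        predicate = f"{BASE}vocab/{field}"
        triples.append((subject, predicate, str(value)))
    return triples


customer = {"id": 42, "name": "Acme", "phone": "555", "account_manager": "jdoe"}
triples = record_to_triples("customer", customer, "id", ["name", "account_manager"])
```

Adding a field later only means extending the `keep` list, which matches the "when you need more, add more" advice.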

How: semantize

             Unstructured documents
             Extract metadata and transform it to RDF as
              needed.
                 ➡ Ex: author => dc:creator


             Use text mining to extract named entities:
              people, organizations, products…
                 ➡ generate the entity lists from the data sources:
                   directory for employees, CRM for companies and
                   people, ERP for products
                 ➡ create triples like <doc_URI> quotes <entity_URI>
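
A dictionary-based entity spotter is one simple way to realize the idea above. The sketch below assumes a gazetteer that maps surface forms (built from the directory, CRM and ERP) to entity URIs; the function `spot_entities`, the `ex:` URIs and the whole-word matching rule are illustrative assumptions, and real text mining would be more robust.

```python
import re


def spot_entities(doc_uri, text, gazetteer):
    """Emit (doc, 'quotes', entity) for each gazetteer entry found in text."""
    triples = []
    for surface, entity_uri in gazetteer.items():
        # Whole-word, case-insensitive match of the surface form.
        pattern = r"\b" + re.escape(surface) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            triples.append((doc_uri, "quotes", entity_uri))
    return triples


gazetteer = {
    "Jane Doe": "ex:employee/jdoe",   # would come from the directory
    "Acme": "ex:company/acme",        # would come from the CRM
    "Widget X": "ex:product/wx",      # would come from the ERP
}
found = spot_entities("ex:doc/1", "Notes: jane doe met ACME about pricing.", gazetteer)
```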

How: semantize

             Unstructured documents
             Compare documents using various dedicated
              algorithms
                 ➡ is the same
                 ➡ is included
                 ➡ is similar
                 ➡ is related
             Generate new triples
                 ➡ create triples like
                 <docA> is_sub_version_of <docB>
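
The deck does not say which comparison algorithms are used. As one possible stand-in, the sketch below compares documents by their sets of word shingles and derives "same", "included" and "similar" relations. The predicate names, the shingle size and the 0.5 similarity threshold are assumptions.

```python
def shingles(text, size=3):
    """Return the set of overlapping word n-grams of a text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def compare(uri_a, text_a, uri_b, text_b, similar_at=0.5):
    """Return at most one relation triple between two documents."""
    a, b = shingles(text_a), shingles(text_b)
    if not a or not b:
        return []
    if a == b:
        return [(uri_a, "is_same_as", uri_b)]
    if a < b:  # every shingle of A also appears in B
        return [(uri_a, "is_included_in", uri_b)]
    jaccard = len(a & b) / len(a | b)
    if jaccard >= similar_at:
        return [(uri_a, "is_similar_to", uri_b)]
    return []
```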


How: enrich

             Enrich the graph
             run specific algorithms to generate more links
              and triples (classifiers, topic detection, …)
             insert external data gathered from Linked Open Data
              (LOD) or other external datasets or APIs




How: infer

             Create new knowledge
             add rules according to your needs




          IF a coworker is quoted in documents
            AND this coworker belongs to a business unit
          THEN the business unit is bound to the documents
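
The rule above can be sketched as a join over two triple patterns. The predicate names `quotes`, `belongs_to` and `bound_to` and the function `infer_unit_bindings` are hypothetical; a production system would more likely express this in a rule language or a SPARQL CONSTRUCT query.

```python
def infer_unit_bindings(triples):
    """Apply: doc quotes person AND person belongs_to unit => unit bound_to doc."""
    # Index each coworker's business units.
    units_of = {}
    for s, p, o in triples:
        if p == "belongs_to":
            units_of.setdefault(s, set()).add(o)
    # Join every 'quotes' triple against that index.
    inferred = set()
    for s, p, o in triples:
        if p == "quotes":
            for unit in units_of.get(o, ()):
                inferred.add((unit, "bound_to", s))
    return inferred


graph = {
    ("doc1", "quotes", "alice"),
    ("doc2", "quotes", "bob"),       # bob has no known business unit
    ("alice", "belongs_to", "sales"),
}
new_triples = infer_unit_bindings(graph)
```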
How: build

             Build
             select the resources that act as object
              seeds (using SPARQL queries)
             for each seed, follow links smartly in order to
              create basic objects
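
To make the build step concrete, here is a minimal sketch over an in-memory list of triples. A plain Python filter on a `type` predicate stands in for the SPARQL seed query, and "following links" is limited to one hop over an explicit set of predicates; the function `build_objects` and all predicate names are assumptions.

```python
def build_objects(triples, seed_type, follow):
    """Build one dict per seed, embedding neighbors reached via `follow`."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    # Seed selection: every resource of the requested type.
    seeds = sorted(s for s, p, o in triples if p == "type" and o == seed_type)
    objects = []
    for seed in seeds:
        obj = {"uri": seed}
        for p, o in by_subject[seed]:
            if p == "type":
                continue
            if p in follow:
                # Follow the link and embed the neighbor's own properties.
                neighbor = {q: v for q, v in by_subject.get(o, []) if q != "type"}
                obj.setdefault(p, []).append({"uri": o, **neighbor})
            else:
                obj[p] = o
        objects.append(obj)
    return objects


graph = [
    ("c1", "type", "Customer"),
    ("c1", "name", "Acme"),
    ("c1", "has_contact", "p1"),
    ("p1", "type", "Person"),
    ("p1", "name", "Jane"),
]
customers = build_objects(graph, "Customer", {"has_contact"})
```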




How: build

             Finalize
             decorate the new knowledge objects with the data
              that was set aside (not loaded into the triplestore)
             now we have rich user-actionable objects
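
One way to picture the finalize step: fields that were never turned into triples are kept in a side store keyed by resource URI and merged back into the built objects. The `finalize` function, the side-store layout and the precedence rule (graph-derived fields win on a name clash) are assumptions made for this sketch.

```python
def finalize(objects, side_store):
    """Merge set-aside fields into each built object, keyed by its URI."""
    finalized = []
    for obj in objects:
        extra = side_store.get(obj["uri"], {})
        # Graph-derived fields take precedence over side-store fields.
        finalized.append({**extra, **obj})
    return finalized


built = [{"uri": "c1", "name": "Acme"}]
side_store = {"c1": {"address": "1 Main St", "name": "stale name"}}
rich_objects = finalize(built, side_store)
```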




How: expose

             Make the new information available to
             users and to the entire IS


             [Diagram: processing stages Harvest, Normalize, Semantize,
              Enrich, Annotate, Classify; outputs to a relational DB, an
              RDF triplestore (Linked Data), and indexation into the AFS
              search engine]
Conclusion

             It works!
             The triples we create and the inference rules
              we add are dictated by the goal / application
                 ➡ usage and value oriented
             We benefit from the lazy-flexible-dynamic
              modeling of RDF-RDFS-OWL
                 ➡ we are agile
             What matters is the graph. But the graph is
              not the triplestore
                 ➡ storage independent


There’s an app for that

             Antidot Information Factory
             a software solution designed specifically
              to leverage structured and unstructured data
             enables large-scale processing of existing data
             automates publishing of enriched or newly
              created information.




             Harvest → Normalize → Semantize → Enrich → Build → Expose
The Giant Enterprise Graph

             Now we have a path to let SemWeb enter
             the enterprise




Discuss
      Understand
      Learn
      Exchange

      www.antidot.net
      info@antidot.net

      THANKS FOR YOUR ATTENTION
      QUESTIONS?

                                  37
Copyright Antidot™

Contenu connexe

Tendances

2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the TrenchesChris Dagdigian
 
The Evolving Role of the Data Engineer - Whitepaper | Qubole
The Evolving Role of the Data Engineer - Whitepaper | QuboleThe Evolving Role of the Data Engineer - Whitepaper | Qubole
The Evolving Role of the Data Engineer - Whitepaper | QuboleVasu S
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsEdward Curry
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Chris Dagdigian
 
Trends from the Trenches: 2019
Trends from the Trenches: 2019Trends from the Trenches: 2019
Trends from the Trenches: 2019Chris Dagdigian
 
The Perfect Storm: The Impact of Analytics, Big Data and Analytics
The Perfect Storm: The Impact of Analytics, Big Data and AnalyticsThe Perfect Storm: The Impact of Analytics, Big Data and Analytics
The Perfect Storm: The Impact of Analytics, Big Data and AnalyticsInside Analysis
 
An Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing ConsumersAn Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing ConsumersEdward Curry
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Chris Dagdigian
 
RDFa: putting RDF on the Web
RDFa: putting RDF on the WebRDFa: putting RDF on the Web
RDFa: putting RDF on the WebBenjamin Heitmann
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stackFlytxt
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...Vladimir Bacvanski, PhD
 
Self-service Linked Government Data
Self-service Linked Government DataSelf-service Linked Government Data
Self-service Linked Government DataFadi Maali
 
Wikipedia (DBpedia): Crowdsourced Data Curation
Wikipedia (DBpedia): Crowdsourced Data CurationWikipedia (DBpedia): Crowdsourced Data Curation
Wikipedia (DBpedia): Crowdsourced Data CurationEdward Curry
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryPaco Nathan
 
Cisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyCisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyArthur_Hansen
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open DataDerilinx
 

Tendances (16)

2021 Trends from the Trenches
2021 Trends from the Trenches2021 Trends from the Trenches
2021 Trends from the Trenches
 
The Evolving Role of the Data Engineer - Whitepaper | Qubole
The Evolving Role of the Data Engineer - Whitepaper | QuboleThe Evolving Role of the Data Engineer - Whitepaper | Qubole
The Evolving Role of the Data Engineer - Whitepaper | Qubole
 
Approximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous EventsApproximate Semantic Matching of Heterogeneous Events
Approximate Semantic Matching of Heterogeneous Events
 
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)Cloud Sobriety for Life Science IT Leadership (2018 Edition)
Cloud Sobriety for Life Science IT Leadership (2018 Edition)
 
Trends from the Trenches: 2019
Trends from the Trenches: 2019Trends from the Trenches: 2019
Trends from the Trenches: 2019
 
The Perfect Storm: The Impact of Analytics, Big Data and Analytics
The Perfect Storm: The Impact of Analytics, Big Data and AnalyticsThe Perfect Storm: The Impact of Analytics, Big Data and Analytics
The Perfect Storm: The Impact of Analytics, Big Data and Analytics
 
An Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing ConsumersAn Environmental Chargeback for Data Center and Cloud Computing Consumers
An Environmental Chargeback for Data Center and Cloud Computing Consumers
 
Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)Bio-IT Trends From The Trenches (digital edition)
Bio-IT Trends From The Trenches (digital edition)
 
RDFa: putting RDF on the Web
RDFa: putting RDF on the WebRDFa: putting RDF on the Web
RDFa: putting RDF on the Web
 
Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
 
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...
 
Self-service Linked Government Data
Self-service Linked Government DataSelf-service Linked Government Data
Self-service Linked Government Data
 
Wikipedia (DBpedia): Crowdsourced Data Curation
Wikipedia (DBpedia): Crowdsourced Data CurationWikipedia (DBpedia): Crowdsourced Data Curation
Wikipedia (DBpedia): Crowdsourced Data Curation
 
Humans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industryHumans in the loop: AI in open source and industry
Humans in the loop: AI in open source and industry
 
Cisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt onlyCisco event 6 05 2014v3 wwt only
Cisco event 6 05 2014v3 wwt only
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 

Similaire à ISWC 2012 - Industry Track - Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment.

Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big DataJean-Marc Desvaux
 
Libera la potenza del Machine Learning
Libera la potenza del Machine LearningLibera la potenza del Machine Learning
Libera la potenza del Machine LearningJürgen Ambrosi
 
Left Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise AnalyticsLeft Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise AnalyticsInside Analysis
 
From open data to API-driven business
From open data to API-driven businessFrom open data to API-driven business
From open data to API-driven businessOpenDataSoft
 
Standard Issue: Preparing for the Future of Data Management
Standard Issue: Preparing for the Future of Data ManagementStandard Issue: Preparing for the Future of Data Management
Standard Issue: Preparing for the Future of Data ManagementInside Analysis
 
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data FunnelA Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data FunnelInside Analysis
 
Impulser la digitalisation et modernisation de la fonction Finance grâce à la...
Impulser la digitalisation et modernisation de la fonction Finance grâce à la...Impulser la digitalisation et modernisation de la fonction Finance grâce à la...
Impulser la digitalisation et modernisation de la fonction Finance grâce à la...Denodo
 
Data Virtualization – Gateway to a Digital Business - Barry Devlin
Data Virtualization – Gateway to a Digital Business - Barry DevlinData Virtualization – Gateway to a Digital Business - Barry Devlin
Data Virtualization – Gateway to a Digital Business - Barry DevlinDenodo
 
MarkLogic Semantic use cases
MarkLogic Semantic use cases MarkLogic Semantic use cases
MarkLogic Semantic use cases Fernando Mesa
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise DataWorks Summit
 
Data APIs as a Foundation for Systems of Engagement
Data APIs as a Foundation for Systems of EngagementData APIs as a Foundation for Systems of Engagement
Data APIs as a Foundation for Systems of EngagementVictor Olex
 
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarial
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarialInteligencia artificial - Quebrando el paradigma de la amnesia empresarial
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarialMarcos Quezada
 
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldDataWorks Summit/Hadoop Summit
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceKaran Sachdeva
 

Similaire à ISWC 2012 - Industry Track - Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment. (20)

Introduction to Big Data An analogy between Sugar Cane & Big Data
Introduction to Big Data An analogy  between Sugar Cane & Big DataIntroduction to Big Data An analogy  between Sugar Cane & Big Data
Introduction to Big Data An analogy between Sugar Cane & Big Data
 
Accelerate Return on Data
Accelerate Return on DataAccelerate Return on Data
Accelerate Return on Data
 
Libera la potenza del Machine Learning
Libera la potenza del Machine LearningLibera la potenza del Machine Learning
Libera la potenza del Machine Learning
 
Left Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise AnalyticsLeft Brain, Right Brain: How to Unify Enterprise Analytics
Left Brain, Right Brain: How to Unify Enterprise Analytics
 
From open data to API-driven business
From open data to API-driven businessFrom open data to API-driven business
From open data to API-driven business
 
Standard Issue: Preparing for the Future of Data Management
Standard Issue: Preparing for the Future of Data ManagementStandard Issue: Preparing for the Future of Data Management
Standard Issue: Preparing for the Future of Data Management
 
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data FunnelA Strategic View of Enterprise Reporting and Analytics: The Data Funnel
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel
 
Impulser la digitalisation et modernisation de la fonction Finance grâce à la...
Impulser la digitalisation et modernisation de la fonction Finance grâce à la...Impulser la digitalisation et modernisation de la fonction Finance grâce à la...
Impulser la digitalisation et modernisation de la fonction Finance grâce à la...
 
Data Virtualization – Gateway to a Digital Business - Barry Devlin
Data Virtualization – Gateway to a Digital Business - Barry DevlinData Virtualization – Gateway to a Digital Business - Barry Devlin
Data Virtualization – Gateway to a Digital Business - Barry Devlin
 
Datumize Deck 2019
Datumize Deck 2019 Datumize Deck 2019
Datumize Deck 2019
 
MarkLogic Semantic use cases
MarkLogic Semantic use cases MarkLogic Semantic use cases
MarkLogic Semantic use cases
 
Infochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey TheoremInfochimps + CloudCon: Infinite Monkey Theorem
Infochimps + CloudCon: Infinite Monkey Theorem
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise Making Hadoop Ready for the Enterprise
Making Hadoop Ready for the Enterprise
 
Data APIs as a Foundation for Systems of Engagement
Data APIs as a Foundation for Systems of EngagementData APIs as a Foundation for Systems of Engagement
Data APIs as a Foundation for Systems of Engagement
 
Semantic Web For Dummies
Semantic Web For DummiesSemantic Web For Dummies
Semantic Web For Dummies
 
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarial
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarialInteligencia artificial - Quebrando el paradigma de la amnesia empresarial
Inteligencia artificial - Quebrando el paradigma de la amnesia empresarial
 
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT IntegrationDenodo DataFest 2016: The Role of Data Virtualization in IoT Integration
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data Science
 

Plus de Antidot

Comment l'intelligence artificielle améliore la recherche documentaire
Comment l'intelligence artificielle améliore la recherche documentaireComment l'intelligence artificielle améliore la recherche documentaire
Comment l'intelligence artificielle améliore la recherche documentaireAntidot
 
Antidot Content Classifier - Valorisez vos contenus
Antidot Content Classifier - Valorisez vos contenusAntidot Content Classifier - Valorisez vos contenus
Antidot Content Classifier - Valorisez vos contenusAntidot
 
Comment l’intelligence artificielle réinvente la fouille de texte
Comment l’intelligence artificielle réinvente la fouille de texteComment l’intelligence artificielle réinvente la fouille de texte
Comment l’intelligence artificielle réinvente la fouille de texteAntidot
 
Antidot Content Classifier
Antidot Content ClassifierAntidot Content Classifier
Antidot Content ClassifierAntidot
 
Cas client CAIJ
Cas client CAIJCas client CAIJ
Cas client CAIJAntidot
 
Du Big Data à la Smart Information : comment valoriser les actifs information...
Du Big Data à la Smart Information : comment valoriser les actifs information...Du Big Data à la Smart Information : comment valoriser les actifs information...
Du Big Data à la Smart Information : comment valoriser les actifs information...Antidot
 
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"Antidot
 
Web sémantique et Web de données, et si on passait à la pratique ?
Web sémantique et Web de données, et si on passait à la pratique ?Web sémantique et Web de données, et si on passait à la pratique ?
Web sémantique et Web de données, et si on passait à la pratique ?Antidot
 
Machine learning, deep learning et search : à quand ces innovations dans nos ...
Machine learning, deep learning et search : à quand ces innovations dans nos ...Machine learning, deep learning et search : à quand ces innovations dans nos ...
Machine learning, deep learning et search : à quand ces innovations dans nos ...Antidot
 
Flyer AFS@Store 2015 FR
Flyer AFS@Store 2015 FRFlyer AFS@Store 2015 FR
ISWC 2012 - Industry Track - Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment.

  • 1. Linked Enterprise Data: LEVERAGING THE SEMANTIC WEB STACK IN A CORPORATE ENVIRONMENT. ISWC 2012 – BOSTON. FABRICE LACROIX – LACROIX@ANTIDOT.NET. Copyright Antidot™
  • 2. Antidot – who we are. French-based software vendor. Since 1999 | Paris, Lyon, Aix-en-Provence. Information access | Data management. Mission: provide our customers with innovative, customizable solutions that help them create value with their data, and make their employees more aware and efficient.
  • 3. Clients: Enterprises, Publishing, E-commerce, Healthcare.
  • 4. Unstructured documents: files, ECM, collaborative spaces; intranet, extranet, Web sites; e-mails, instant messaging.
  • 5. Structured data: CRM, ERP, directory; knowledge bases; business applications (production, support).
  • 6. IS are bloated. 1 practice => 1 need => 1 application => 1 silo. The information system is driven by the process. Data are numerous, various and scattered.
  • 7. Solutions or workarounds? BI, MDM, SOA, Search.
  • 8. Solutions and workarounds. Enterprise Search brings little value to users: it is document oriented and does not solve real business problems. (Google-like, Verity-like.)
  • 9. What we want
  • 10. What we want: ERP, CRM, Production, LDAP, ECM, Support, Files.
  • 11. Changing the paradigm: switching from an application view to a data-centric way of thinking.
  • 12. Bring out the implicit: build the Giant Enterprise Graph.
  • 13. LED (Linked Enterprise Data): application of the Semantic Web technologies and Linked Data principles to the enterprise infrastructure.
  • 14. What works for the Web… Federating silos on the Web. http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)
  • 15. …can’t always be used in corporate IS. Legacy apps can’t be "Sparql’ed"; 80% of the data is un- or semi-structured and doesn’t fit in the model as such; defining vocabularies/ontologies for silos is too complex and expensive; enterprises don’t want RDF per se but valuable information; external data is available in XML/JSON through Web Services; staff is trained for RDB, XML, Web apps; no-risk and stability strategy: SemWeb technology is considered new and immature.
  • 16. The RDF/storage approach. Setting up a global RDF repository does not work either; IT departments are afraid of the "RDF everywhere" activists.
  • 17. Semantic Web technology is still the right solution in a corporate environment, BUT it is not an aim: JUST use it as a means.
  • 18. Just do it. Think of it as a stream paradigm: build new objects using existing data, without interfering with the existing infrastructure, with SemWeb somewhere under the hood.
  • 19. Enterprise Graph HowTo. Construct the graph: generate triples from data, create triples from documents. Leverage the graph: enrich, infer. Browse the graph: select resources, build objects. Trash the graph.
  • 20. How: extract & normalize. Harvest and normalize, as in an ETL: fetch, clean, transform…; normalize records (names, IDs) to prepare the linking step. For databases: db2triples, an RDB2RDF implementation by Antidot (open source, W3C validated).
  • 21. How: semantize. Don’t transform everything into RDF: cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts; interesting == needed for linking or inferring.
  • 22. How: semantize. Triples generation. Be smart: avoid upfront ontology design, use small vocabularies. Be pragmatic: transform XML tags and field names into predicates. Be agile: only insert what you need, and when you need more, add more. Semantic Web fuels the modeling, linking and information building process.
  • 23. Enterprise Graph HowTo. Construct the graph: generate triples from data, create triples from documents. Leverage the graph: enrich, infer. Browse the graph: select resources, build objects. Trash the graph.
  • 24. How: semantize. Unstructured documents. Extract metadata and transform them as needed into RDF (e.g. author => dc:creator). Use text mining to extract named entities: people, organizations, products… Generate those entity lists from the data sources: directory for employees, CRM for companies and people, ERP for products. Create triples like doc_URI quotes entity_URI.
  • 25. How: semantize. Unstructured documents. Compare documents using various dedicated algorithms: is the same, is included, is similar, is related. This generates new triples, like <docA> is_sub_version_of <docB>.
  • 26. Enterprise Graph HowTo. Construct the graph: generate triples from data, create triples from documents. Leverage the graph: enrich, infer. Browse the graph: select resources, build objects. Trash the graph.
  • 27. How: enrich. Enrich the graph: run specific algorithms to generate more links and triples (classifiers, topic detection, …); insert external data gathered from the LOD or other external datasets or APIs.
  • 28. How: infer. Create new knowledge: add rules according to your needs. IF a coworker is quoted in documents AND this coworker belongs to a business unit THEN the business unit is bound to the documents.
  • 29. Enterprise Graph HowTo. Construct the graph: generate triples from data, create triples from documents. Leverage the graph: enrich, infer. Browse the graph: select resources, build objects. Trash the graph.
  • 30. How: build. Build: select resources corresponding to object seeds (using SPARQL queries); for each seed, follow links smartly in order to create basic objects.
  • 31. How: build. Finalize: decorate the new knowledge objects with data set apart (not loaded in the triplestore); now we have rich, user-actionable objects.
  • 32. Enterprise Graph HowTo. Construct the graph: generate triples from data, create triples from documents. Leverage the graph: enrich, infer. Browse the graph: select resources, build objects. Trash the graph.
  • 33. How: expose. Make the new information available to users and to the entire IS. (Diagram labels: Relational DB; Harvest, Normalize, Semantize, Enrich, Classify, Annotate; RDF triplestore (Linked Data); indexation; AFS search engine.)
  • 34. Conclusion: it works! The triples we create and the inference rules we add are dictated by the goal / application ➡ usage and value oriented. We benefit from the lazy-flexible-dynamic modeling of RDF-RDFS-OWL ➡ we are agile. What matters is the graph, but the graph is not the triplestore ➡ storage independent.
  • 35. There’s an app for that. Antidot Information Factory: a software solution designed specifically to leverage structured and unstructured data, enable large-scale processing of existing data, and automate publishing of enriched or newly created information. Harvest, Normalize, Semantize, Enrich, Build, Expose.
  • 36. The Giant Enterprise Graph. Now we have a path to let SemWeb enter the enterprise.
  • 37. Discuss, Understand, Learn, Exchange. www.antidot.net, info@antidot.net. THANKS FOR YOUR ATTENTION. QUESTIONS?
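
The semantize and infer steps (slides 21 to 28) can be illustrated with a small sketch. It is not Antidot's implementation: triples are plain Python tuples rather than a real triplestore, the `http://example.org/` namespace, the `ex:` predicates, the sample records and the substring-based entity spotting are all assumptions made for illustration. Only `dc:creator` and the IF/THEN rule come from the slides.

```python
# Hypothetical records, as they might look after the Harvest and Normalize steps.
EMPLOYEES = [
    {"id": "e42", "name": "Alice Martin", "unit": "bu-sales", "phone": "555-0100"},
]
DOCUMENTS = [
    {"id": "d7", "author": "e42",
     "text": "Meeting notes. Alice Martin presented the forecast."},
]


def uri(kind, key):
    """Mint a URI in a made-up namespace (assumption, not from the slides)."""
    return f"http://example.org/{kind}/{key}"


def semantize(employees, documents):
    """Create triples only for the fields needed for linking or inferring."""
    triples = set()
    for emp in employees:
        # Cherry-pick: the business unit is needed for the rule, the phone is not.
        triples.add((uri("person", emp["id"]), "ex:memberOf", uri("unit", emp["unit"])))
    for doc in documents:
        doc_uri = uri("doc", doc["id"])
        # Metadata mapping from slide 24: author => dc:creator.
        triples.add((doc_uri, "dc:creator", uri("person", doc["author"])))
        # Naive stand-in for text mining: the entity list comes from the directory.
        for emp in employees:
            if emp["name"] in doc["text"]:
                triples.add((doc_uri, "ex:quotes", uri("person", emp["id"])))
    return triples


def infer(triples):
    """IF a doc quotes a coworker AND the coworker belongs to a unit
    THEN the unit is bound to the doc (rule from slide 28)."""
    member_of = {(s, o) for s, p, o in triples if p == "ex:memberOf"}
    inferred = set()
    for doc, p, person in triples:
        if p != "ex:quotes":
            continue
        for who, unit in member_of:
            if who == person:
                inferred.add((unit, "ex:boundTo", doc))
    return triples | inferred


graph = infer(semantize(EMPLOYEES, DOCUMENTS))
for triple in sorted(graph):
    print(triple)
```

The phone number never reaches the graph, which mirrors the "only insert what you need" advice: it would stay in the side store and be reattached during the Build step.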

Editor's notes

  1. Our information system, like any other corporate IS, is teeming with all types of information. Most of this information is unstructured.
  2. And part of it is structured, mostly because of the relational databases underlying business applications. These are the applications we run internally: CRM, ERP, support tracking, …
  3. Many approaches have been developed to solve this problem of isolated silos. Most of them only apply to structured data (BI, MDM). And in most cases they entail a long and costly deployment process and make the system more complex.
  4. Enterprise search is not a solution. And we know that for sure, since we are a leading vendor in the realm of search solutions. The problem lies in the very nature of current search engines: they are document oriented. They read documents, they index documents, they return documents.
  5. This is what we want: agile information, meshed, merged, enriched.
  6. What you see is not a data mashup, not just data put side by side. Some of the information you see here needs advanced processing that cannot be done on the fly.
  7. The solution is to change the paradigm: forget the applications and the APIs. Just look at the data.
  8. We need to create the Enterprise Graph.
  9. There is a solution: one that has been thought out and designed for the Web. If it works for the Web, it should work for you and us.
  10. The architecture for integrating data on the Web from various silos relies on a federated principle where a query is synchronously distributed over the sources through SPARQL endpoints exposed by each of them. This approach presents many scientific and technological challenges, but considering the rationale behind the Web of Data and the need to work in the gigantic open Web space, this seems to be the only reasonable way to make it work.
  11. Though theoretically correct, this approach is not applicable to the corporate IS for a large variety of reasons:
      • The corporate information system is built with numerous legacy or closed applications that cannot be adapted or extended with SPARQL endpoints.
      • 80% of the enterprise information realm is unstructured or semi-structured data that cannot fit in the model as such.
      • Enterprises do not want access to raw data in RDF format. They want to reap valuable information derived from the data, which requires large and complex computations to create these new informational objects.
      • The bottom-up approach of mapping silos and their data to RDF to fit the model requires enormous work to define vocabularies or ontologies for each source, which is too heavy an investment.
      • Companies dream of seamlessly integrating external data to leverage their internal information. But this external data is mostly available in XML or JSON through Web Services, and not yet in RDF, so using SPARQL as a way to query and integrate does not make sense.
      • IT departments have invested heavily in their “relational database for storing / XML for exchanging / Web apps for accessing” infrastructure. Their staff is trained for this paradigm. They lack in-house skills for integrating the graph way of thinking.
      • Stability matters most, and Semantic Web technology is unknown, considered new and immature: CIOs are not ready to take the risk of adding load and technological uncertainty to systems that are critical to the company's daily business operations.
  12. This does not work because of the process (modeling, know-how), because of the technology (performance, scalability), and because enterprises don’t care about technology, especially a new one.
  13. We tailor the Normalize process by aligning field contents in order to mesh data coming from different sources (such as records from a CRM and an ERP). For databases, we use db2triples, a module compliant with R2RML and Direct Mapping.
  14. “Why do we transform only a subpart of the harvested data into RDF, and what do we do with the rest of it?” Leaving aside the fact that text documents are not graph friendly, we only transform a selected part of the structured data into RDF, as stated above. From a technical standpoint, we don’t feel that the technology is mature and stable enough to proceed differently: in industrial projects, millions of seed objects are regularly extracted from the sources (invoices, clients, files, etc.), each having tens of fields, and billions of triples don’t scale well in available triplestores. Transforming only a subpart of the data also largely simplifies the task of choosing the predicates, which reinforces the choice of using many small available vocabularies instead of big ontologies. The data that is not transformed into RDF is stored by Information Factory for later use during the Build step.
  15. Unstructured documents like office files, PDF files or email content don’t fit the RDF formalism and cannot be linked to the graph as such. Extra work is necessary. First, we transform available metadata like document name, author, creation date, sender and receivers for a mail, subject and so forth into RDF. Then, we use text-mining technology to extract named entities like people, organizations, products, etc. from the documents. These entity lists are generated from different sources in the enterprise: directories, CRM or ERP provide people and company names, while products are listed in ERPs or taxonomies.
  16. And last, we run various specific algorithms designed to do document versus document comparison to detect duplicates, different versions of the same document, inclusions, semantically related ones, etc. Each of these relations is inserted in the graph with an appropriate predicate.
  17. It is like cooking: the rules are your own personal touch. Rules depend on the information and knowledge you want to create by inferring on the graph.
  18. We created the graph by inserting basic triples. Then we grew the graph by enriching and inferring. Now it is time to extract the information we need. For this, we first select the resources we are looking for. Then we follow some links to grab the information and create basic objects.
  19. We agree that we would all like to see those technologies invading the information system. We would like to put these stickers on this beautiful zSeries mainframe. But what does it mean? How can we do that?
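
The Build and Finalize steps described in notes 14 and 18 (select seeds, follow links, then decorate with the data that was set apart) could look like the sketch below. This is an illustration under stated assumptions, not the Information Factory code: the graph is a set of Python tuples instead of a triplestore queried with SPARQL, and every identifier, predicate and field in `GRAPH` and `SIDE_STORE` is hypothetical.

```python
# Toy graph of (subject, predicate, object) triples. All names are hypothetical.
GRAPH = {
    ("cust:1", "rdf:type", "ex:Customer"),
    ("cust:1", "ex:hasInvoice", "inv:9"),
    ("cust:1", "ex:hasContact", "pers:3"),
    ("inv:9", "rdf:type", "ex:Invoice"),
    ("inv:9", "ex:mentionedIn", "doc:7"),
    ("pers:3", "rdf:type", "ex:Person"),
}

# Data set apart: harvested fields that were never turned into triples.
SIDE_STORE = {
    "cust:1": {"name": "Acme", "address": "1 Main St"},
    "inv:9": {"amount": 1200},
}


def select_seeds(graph, rdf_class):
    """Select the resources of a given class (stands in for a SPARQL SELECT)."""
    return sorted(s for s, p, o in graph if p == "rdf:type" and o == rdf_class)


def follow(graph, seed, predicates):
    """Follow only the chosen outgoing links of a seed resource."""
    links = {}
    for s, p, o in graph:
        if s == seed and p in predicates:
            links.setdefault(p, []).append(o)
    for targets in links.values():
        targets.sort()  # deterministic order, since sets are unordered
    return links


def build_objects(graph, side_store, rdf_class, predicates):
    """Build basic objects from seeds, then finalize them with side-store data."""
    objects = []
    for seed in select_seeds(graph, rdf_class):
        obj = {"id": seed, "links": follow(graph, seed, predicates)}
        obj.update(side_store.get(seed, {}))  # Finalize: decorate the object
        objects.append(obj)
    return objects


customers = build_objects(
    GRAPH, SIDE_STORE, "ex:Customer", {"ex:hasInvoice", "ex:hasContact"}
)
print(customers)
```

Once the objects are built, the graph itself is no longer needed, which is consistent with the "Trash the graph" step of the HowTo slide.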