SDMX-RDF is a proposed standard for publishing statistical data and metadata according to Linked Data principles, based on SDMX. It aims to disseminate statistics over the web as linked data by providing a high-fidelity representation of statistical information, enabling statistical data to be linked with other information assets, and encouraging the reuse of artifacts. SDMX-RDF builds on the existing SDMX information model and syntaxes by expressing key SDMX concepts like datasets, code lists, and concepts as RDF to make them available on the web. The roadmap for SDMX-RDF involves further developing the specification, tutorials, and converters from existing formats, and engaging with the SDMX user community for feedback.
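To make the representation concrete, here is a minimal sketch (not from the original slides) of a single statistical observation expressed in RDF with Python's rdflib. The qb: and sdmx-* namespaces are the public ones associated with the SDMX-RDF / RDF Data Cube work; the example.org dataset, area URI and population figure are hypothetical.

# Minimal sketch of one statistical observation in RDF Data Cube terms.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, XSD

QB = Namespace("http://purl.org/linked-data/cube#")
SDMX_DIM = Namespace("http://purl.org/linked-data/sdmx/2009/dimension#")
SDMX_MEA = Namespace("http://purl.org/linked-data/sdmx/2009/measure#")
EX = Namespace("http://example.org/statistics/")  # hypothetical namespace

g = Graph()
g.bind("qb", QB)
g.bind("sdmx-dimension", SDMX_DIM)
g.bind("sdmx-measure", SDMX_MEA)

dataset = EX["population"]
obs = EX["population/obs1"]

g.add((dataset, RDF.type, QB.DataSet))
g.add((obs, RDF.type, QB.Observation))
g.add((obs, QB.dataSet, dataset))  # link the observation to its dataset
g.add((obs, SDMX_DIM.refArea, URIRef("http://example.org/area/IE")))  # dimension: reference area
g.add((obs, SDMX_DIM.refPeriod, Literal("2011", datatype=XSD.gYear)))  # dimension: reference period
g.add((obs, SDMX_MEA.obsValue, Literal(4588252, datatype=XSD.integer)))  # the measured value

print(g.serialize(format="turtle"))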
This document discusses the history and modern approaches to knowledge management systems on the desktop. It traces ideas back to Vannevar Bush's 1945 proposal of the memex device. More recent systems like Doug Engelbart's NLS in the 1960s and Ted Nelson's Xanadu aimed to improve linking and navigation of information. Modern semantic desktops take a layered, modular approach and use ontologies and semantic technologies to unlock and integrate desktop data. They provide services like storage, extraction, annotation and inference to enhance existing applications and help users manage information overload. Evaluation of these systems remains a challenge due to their personal, customized nature.
This document discusses enabling linked open government data through the use of linked data principles and vocabularies. It describes open government data and the benefits of publishing data as linked open data using RDF and shared vocabularies. It also discusses using the Data Catalog Vocabulary (DCAT) to describe government data catalogs as linked data and how this can enable federated search across catalogs. Finally, it outlines steps for publishing government datasets as linked open data.
The Briefing Room with Colin White and Composite Software
Live Webcast Feb. 26, 2013
The modern business analyst needs data from all over the place: yes, the data warehouse, but also the Web, big data, production systems, as well as via partners and vendors. In fact, the typical analyst spends more than 50% of the time chasing data, which slows delivery of analytic insights and limits the time available for thorough analysis. Some practitioners refer to this conundrum as "the data problem."
Check out the slides from this episode of The Briefing Room to hear veteran Analyst Colin White of BI Research as he explains why analytical sandboxes and data hubs can be an analyst's best friend. He'll be briefed by Bob Eve of Composite Software who will discuss his company's mature data virtualization platform, which includes a number of capabilities that help organizations leverage agile analytics. He will discuss why time-to-insight is fast becoming the battle cry of analysis-driven organizations.
Visit: http://www.insideanalysis.com
Towards Expertise Modelling for Routing Data Cleaning Tasks within a Community of Knowledge Workers, by Umair ul Hassan
https://www.insight-centre.org/content/towards-expertise-modelling-routing-data-cleaning-tasks-within-community-knowledge-workers
Presented at ICIQ 2012
ABSTRACT:
Applications consuming data have to deal with a variety of data quality issues such as missing values, duplication, and incorrect values. Although automatic approaches can be utilized for data cleaning, the results can remain uncertain, so updates suggested by automatic data cleaning algorithms require further human verification. This paper presents an approach for generating tasks for uncertain updates and routing these tasks to appropriate workers based on their expertise. Specifically, the paper tackles the problem of modelling the expertise of knowledge workers for the purpose of routing tasks within collaborative data quality management. The proposed expertise model represents the profile of a worker against a set of concepts describing the data. A simple routing algorithm leverages the expertise profiles to match data cleaning tasks with workers. The proposed approach is evaluated on a real-world dataset using human workers. The results demonstrate the effectiveness of using concepts for modelling expertise, in terms of the likelihood of receiving responses to tasks routed to workers.
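As a rough illustration of the routing idea in the abstract (this is not the paper's actual algorithm), the sketch below scores each worker's concept profile against the concepts a cleaning task touches and routes the task to the best match; the profiles and weights are invented.

# Illustrative concept-based task routing: each worker profile scores a set
# of concepts, each cleaning task is described by concepts, and the task
# goes to the best-matching worker.

def route_task(task_concepts, worker_profiles):
    """Return the worker whose profile best covers the task's concepts."""
    def score(profile):
        return sum(profile.get(c, 0.0) for c in task_concepts)
    return max(worker_profiles, key=lambda w: score(worker_profiles[w]))

worker_profiles = {
    "alice": {"address": 0.9, "company": 0.2},
    "bob":   {"address": 0.1, "company": 0.8, "revenue": 0.7},
}

# Task: verify an uncertain update touching company and revenue fields.
print(route_task({"company", "revenue"}, worker_profiles))  # -> "bob"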
The Gnowsis Semantic Desktop Approach to Personal Information Management - Dissertation, by leobard
Slides of my PhD dissertation "The Gnowsis Semantic Desktop Approach to Personal Information Management".
Presented on 5 June 2009 at the University of Kaiserslautern for a PhD in engineering science.
Dcat - Machine Accessible Data Catalogues, by Fadi Maali
The document discusses the Data Catalog Vocabulary (DCAT), which is being developed by the W3C Government Linked Data Working Group to facilitate interoperability between data catalogs published on the web. It provides examples of how DCAT can be used to enable advanced queries across multiple catalogs and describes implementations of DCAT by several government organizations to publish metadata about their datasets.
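As a hedged illustration of what such catalog metadata looks like, the following Python sketch uses rdflib to describe a hypothetical catalog, dataset and CSV distribution with DCAT terms; only the dcat: and dct: vocabularies are real, the URIs and titles are made up.

# Sketch of a DCAT catalog entry built with rdflib; all URIs are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, DCTERMS

DCAT = Namespace("http://www.w3.org/ns/dcat#")
EX = Namespace("http://example.gov/catalog/")

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

catalog = EX["main"]
dataset = EX["road-traffic-2012"]
dist = EX["road-traffic-2012/csv"]

g.add((catalog, RDF.type, DCAT.Catalog))
g.add((catalog, DCAT.dataset, dataset))  # the catalog lists the dataset
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Road traffic counts 2012", lang="en")))
g.add((dataset, DCAT.keyword, Literal("transport")))
g.add((dataset, DCAT.distribution, dist))  # a concrete downloadable form
g.add((dist, RDF.type, DCAT.Distribution))
g.add((dist, DCAT.mediaType, Literal("text/csv")))

print(g.serialize(format="turtle"))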
The Evolving Role of the Data Engineer - Whitepaper | Qubole, by Vasu S
A whitepaper about how the evolving data engineering profession helps data-driven companies work smarter and lower cloud costs with Qubole.
https://www.qubole.com/resources/white-papers/the-evolving-role-of-the-data-engineer
Approximate Semantic Matching of Heterogeneous Events, by Edward Curry
Event-based systems have loose coupling within space, time and synchronization, providing a scalable infrastructure for information exchange and distributed workflows. However, event-based systems are tightly coupled, via event subscriptions and patterns, to the semantics of the underlying event schema and values. The high degree of semantic heterogeneity of events in large and open deployments such as smart cities and the sensor web makes it difficult to develop and maintain event-based systems. In order to address semantic coupling within event-based systems, we propose vocabulary-free subscriptions together with the use of approximate semantic matching of events. This paper examines the requirement of event semantic decoupling and discusses approximate semantic event matching and the consequences it implies for event processing systems. We introduce a semantic event matcher and evaluate the suitability of an approximate hybrid matcher based on both thesauri-based and distributional semantics-based similarity and relatedness measures. The matcher is evaluated over a representation of Wikipedia and Freebase events. Initial evaluations show that the approach matches structured events with a maximal combined precision-recall F1 score of 75.89% on average across all experiments, with a subscription set of 7 subscriptions. The evaluation shows how a hybrid approach to semantic event matching outperforms a single similarity measure approach.
Hasan S, O'Riain S, Curry E. Approximate Semantic Matching of Heterogeneous Events. In: 6th ACM International Conference on Distributed Event-Based Systems (DEBS 2012).
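A toy sketch in the spirit of the hybrid matcher (not the authors' implementation): a thesaurus-based score from WordNet is combined with a distributional cosine score. It assumes nltk with the WordNet corpus downloaded and numpy; the word vectors are hand-made stand-ins for a corpus-trained model.

# Toy hybrid similarity: WordNet (thesaurus-based) plus cosine (distributional).
# Requires: pip install nltk numpy; then nltk.download("wordnet") once.
import numpy as np
from nltk.corpus import wordnet as wn

# Hypothetical distributional vectors; in practice these would come from a
# corpus-trained model.
vectors = {
    "car":  np.array([0.9, 0.1, 0.3]),
    "auto": np.array([0.85, 0.15, 0.28]),
}

def wordnet_sim(w1, w2):
    """Best path similarity over all synset pairs, 0.0 if none found."""
    s1, s2 = wn.synsets(w1), wn.synsets(w2)
    if not s1 or not s2:
        return 0.0
    return max(a.path_similarity(b) or 0.0 for a in s1 for b in s2)

def cosine_sim(w1, w2):
    v1, v2 = vectors[w1], vectors[w2]
    return float(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))

def hybrid_sim(w1, w2, alpha=0.5):
    """Weighted combination of the thesaurus and distributional measures."""
    return alpha * wordnet_sim(w1, w2) + (1 - alpha) * cosine_sim(w1, w2)

print(hybrid_sim("car", "auto"))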
Cloud Sobriety for Life Science IT Leadership (2018 Edition), by Chris Dagdigian
Chris Dagdigian provides practical tips for life science IT leadership based on his experience working in bioinformatics. Some key points include:
1) Cloud adoption in life sciences is driven by the need for flexible capabilities and collaboration rather than cost savings alone.
2) Common mistakes include lack of planning, bypassing security reviews, and forcing legacy patterns onto cloud infrastructure.
3) AWS is the leader in cloud capabilities but all providers oversimplify challenges in their marketing. Real-world requirements around networking, security and provisioning need to be considered.
Annual address covering trends, emerging requirements, pain points and infrastructure issues in the "Bio-IT" (life science informatics and HPC) realm. Email me if you want a PDF of this talk: chris@bioteam.net
The Perfect Storm: The Impact of Analytics, Big Data and the Cloud, by Inside Analysis
The document discusses the impacts of analytics, big data, and cloud computing. It notes that these trends are driving rapid business changes by enabling closed-loop businesses, massive information volumes, and increased collaboration. Recent technology advances like mobile apps, analytics, data processing improvements, and social/enterprise tools can help address emerging needs. Key points discussed include the growth of data sources and types, improvements in memory and parallel processing, and the evolving relationship between business and IT.
An Environmental Chargeback for Data Center and Cloud Computing Consumers, by Edward Curry
Government, business, and the general public increasingly agree that the polluter should pay. Carbon dioxide and environmental damage are considered viable chargeable commodities. The net effect of this for data center and cloud computing operators is that they should look to “chargeback” the environmental impacts of their services to the consuming end-users. An environmental chargeback model can have a positive effect on environmental impacts by linking consumers to the indirect impacts of their usage, facilitating clearer understanding of the impact of their actions. In this paper we motivate the need for environmental chargeback mechanisms. The environmental chargeback model is described including requirements, methodology for definition, and environmental impact allocation strategies. The paper details a proof-of-concept within an operational data center together with discussion on experiences gained and future research directions.
Curry, E.; Hasan, S.; White, M.; and Melvin, H. 2012. An Environmental Chargeback for Data Center and Cloud Computing Consumers. In Huusko, J.; de Meer, H.; Klingert, S.; and Somov, A., eds., First International Workshop on Energy-Efficient Data Centers. Madrid, Spain: Springer Berlin / Heidelberg.
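A worked toy allocation, illustrative only since the paper defines its own requirements and allocation strategies: a hypothetical monthly data-center CO2 footprint is charged back to consumers in proportion to their share of metered usage.

# Toy environmental chargeback: apportion a data center's CO2 footprint to
# consumers proportionally to their metered resource usage.

TOTAL_CO2_KG = 12_000            # hypothetical monthly data-center footprint

usage_hours = {                  # hypothetical metered VM-hours per consumer
    "team-analytics": 4_000,
    "team-web":       5_000,
    "team-batch":     1_000,
}

total = sum(usage_hours.values())
chargeback = {c: TOTAL_CO2_KG * h / total for c, h in usage_hours.items()}

for consumer, co2 in sorted(chargeback.items()):
    print(f"{consumer}: {co2:,.0f} kg CO2")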
Bio-IT Trends From The Trenches (digital edition), by Chris Dagdigian
Note: Contact me directly dag@bioteam.net if you would like a PDF download of these slides
This is Chris Dagdigian’s 10th year delivering his no holds barred, candid state of the industry address at BioIT World, and we are not going to let a pandemic stop him.
Instead of his typical talk, five distinguished panelists will join Chris for a spirited discussion on Current Events and Scientific Computing and the impacts of the COVID-19 pandemic.
These slides explain (1) the motivation for using RDFa, for embedding structured data on web pages, (2) RDF as the foundation of RDFa, and (3) RDFa through examples.
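For a flavor of what such markup looks like, here is a small hypothetical RDFa fragment held in a Python string; an RDFa processor would extract dct:title and dct:creator triples from the HTML attributes. The URIs and values are invented.

# RDFa example: structured data embedded in ordinary HTML attributes.
# The snippet is held in a string for illustration; an RDFa distiller
# would extract two triples about the resource identified by "about".
html_with_rdfa = """
<div xmlns:dct="http://purl.org/dc/terms/"
     about="http://example.org/report/42">
  <h1 property="dct:title">Quarterly Traffic Report</h1>
  <span property="dct:creator">Jane Doe</span>
</div>
"""
print(html_with_rdfa)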
The document discusses big data and the open source big data stack. It defines big data as large datasets that are difficult to store, manage and analyze. Every day, roughly 2.5 quintillion bytes of data are created, with 90% created in the last two years. The open source big data stack includes tools like Hadoop, HBase, Hive and Pig that can handle large datasets through distributed computing across multiple servers. The stack provides flexibility, reliability, auditability and fast deployment at low cost compared to proprietary solutions.
How to Crunch Petabytes with Hadoop and Big Data using InfoSphere BigInsights...Vladimir Bacvanski, PhD
This document discusses how to analyze large datasets using Hadoop and BigInsights. It describes how IBM's Watson uses Hadoop to distribute its workload and load information into memory from sources like 200 million pages of text, CRM data, POS data, and social media to provide distilled insights. The document provides two use case examples of how energy companies and global media firms could use big data analytics to analyze weather data and identify unauthorized streaming content.
This document describes a self-service approach to publishing government data as linked open data. It proposes using Google Refine with extensions for exporting to RDF and reconciling entities, along with best practices like the Data Catalog Vocabulary and Data Cube vocabulary. Publishers would provide metadata about their datasets using these standards. Users could then select datasets in Google Refine, map them to RDF schemas, link entities, and export the resulting RDF. This RDF could then be shared on a platform like CKAN along with provenance information about the transformation process. The approach is intended to lower the costs of publishing linked data while maintaining quality.
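A minimal sketch of the kind of tabular-to-RDF mapping this approach automates with Google Refine's RDF extension, done here by hand with rdflib and Python's csv module; the column names, URIs and figures are hypothetical.

# Hand-rolled CSV-to-RDF conversion, mimicking what the RDF export
# extension does from a mapped schema.
import csv
import io
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

EX = Namespace("http://example.gov/school/")  # hypothetical namespace

csv_data = io.StringIO("id,name,pupils\n1,Scoil Iognaid,320\n2,Gaelscoil Dara,210\n")

g = Graph()
g.bind("ex", EX)
for row in csv.DictReader(csv_data):
    school = EX[row["id"]]
    g.add((school, RDF.type, EX.School))
    g.add((school, RDFS.label, Literal(row["name"])))
    g.add((school, EX.pupils, Literal(int(row["pupils"]), datatype=XSD.integer)))

print(g.serialize(format="turtle"))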
Wikipedia (DBpedia): Crowdsourced Data Curation, by Edward Curry
Wikipedia is an openly editable encyclopedia, built collaboratively by a large community of web editors. The success of Wikipedia as one of the most important sources of information available today still challenges existing models of content creation. Despite the fact that the term ‘curation’ is not commonly used by Wikipedia’s contributors, digital curation is the central activity of Wikipedia editors, who have the responsibility for information quality standards.
Wikipedia is already widely used as a collaborative environment inside organizations.
The investigation of the collaboration dynamics behind Wikipedia highlights important features and good practices which can be applied to different organizations. Our analysis focuses on the curation perspective and covers two important dimensions: social organization and artifacts, tools & processes for cooperative work coordination. These are key enablers that support the creation of high quality information products in Wikipedia’s decentralized environment.
Humans in the loop: AI in open source and industry, by Paco Nathan
Nike Tech Talk, Portland, 2017-08-10
https://niketechtalks-aug2017.splashthat.com/
O'Reilly Media gets to see the forefront of trends in artificial intelligence: what the leading teams are working on, which use cases are getting the most traction, previews of advances before they get announced on stage. Through conferences, publishing, and training programs, we've been assembling resources for anyone who wants to learn. An excellent recent example: Generative Adversarial Networks for Beginners, by Jon Bruner.
This talk covers current trends in AI, industry use cases, and recent highlights from the AI Conf series presented by O'Reilly and Intel, plus related materials from Safari learning platform, Strata Data, Data Show, and the upcoming JupyterCon.
Along with reporting, we're leveraging AI in media. This talk dives into O'Reilly's uses of deep learning -- combined with ontology, graph algorithms, probabilistic data structures, and even some evolutionary software -- to help editors and customers alike accomplish more of what they need to do.
In particular, we'll show two open source projects in Python from O'Reilly's AI team:
• pytextrank, built atop spaCy, NetworkX and datasketch, providing graph algorithms for advanced NLP and text analytics (a usage sketch follows this list)
• nbtransom leveraging Project Jupyter for a human-in-the-loop design pattern approach to AI work: people and machines collaborating on content annotation
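A possible pytextrank usage sketch, assuming pytextrank 3.x, spaCy 3.x and the en_core_web_sm model are installed:

import spacy
import pytextrank  # noqa: F401  (importing registers the "textrank" pipeline factory)

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank")  # add graph-based phrase ranking to the spaCy pipeline

doc = nlp("Machines and people collaborate on content annotation "
          "in a human-in-the-loop design pattern.")

# Top-ranked phrases found by the TextRank graph algorithm.
for phrase in doc._.phrases[:5]:
    print(f"{phrase.rank:.3f}  {phrase.text}")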
This document provides an agenda and overview for a presentation on leveraging big data to create value. The agenda includes sessions on Hadoop in the real world, Cisco servers for big data, and breakout brainstorming sessions. The presentation discusses how big data can be a competitive strategy, its financial benefits, and goals for applying it in ways that improve important business metrics. An overview of key big data technologies is presented, including Hadoop, NoSQL databases, and in-memory databases. The big data software stack and how big data expands the traditional data stack is also summarized.
The document provides information about the Digital Enterprise Research Institute (DERI) in Galway, Ireland. It discusses DERI's research areas including semantic web, social networks, and data mining. It also outlines DERI's funding sources and partners. The document then shifts to discussing linked open data, including its key components like RDF and vocabularies. Finally, it provides examples of linked open data projects by DERI and others.
Introduction to Big Data: An Analogy between Sugar Cane & Big Data, by Jean-Marc Desvaux
Big data is large and complex data that exceeds the processing capacity of conventional database systems. It is characterized by high volume, velocity, and variety of data. An enterprise can leverage big data through an analytical use to gain new insights, or through enabling new data-driven products and services. An analogy compares an enterprise's big data architecture to a sugar cane factory that acquires, organizes, analyzes, and generates business intelligence from big data sources to create value for the organization. NoSQL databases are complementary to rather than replacements for relational databases in big data solutions.
The document discusses information management challenges in today's data-intensive world. It highlights how IBM offers a comprehensive vision and single platform to address issues like extreme data growth, complexity, and the need for real-time insights. IBM helps organizations optimize investments, improve customer satisfaction, increase coupon redemption rates, and reduce road congestion through analytics, governance, integration, and other solutions.
A complete cognitive approach comprises three components: a method, an ecosystem and a platform. In this session we will see how to realize this approach with the help of Watson Data Platform, which helps data scientists and business analytics experts put data to work from a cognitive perspective, thereby driving business growth and change. We will focus on analyzing social media data to gauge how the administration is perceived by students, parents, the press, bloggers…
At the heart of the solution are a series of services designed for each business function (developers, data scientists, data engineers, communication / marketing) and the learning capability of cognitive technology, which complete the architecture and help "compose" new business solutions.
Left Brain, Right Brain: How to Unify Enterprise Analytics, by Inside Analysis
The Briefing Room with Robin Bloor and Teradata
Live Webcast on Jan. 29, 2013
Despite its name, effective Data Science requires a certain amount of artistic flair. Analysts must be creative about how and where they find the insights that will drive business value. One classic roadblock to that kind of frictionless process? Programming. Not everyone can code Java, which makes the unstructured domain of Hadoop quite challenging for the average business analyst.
Check out the slides from this episode of the Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how a new generation of analytical platforms will solve the complexity of unifying structured and unstructured data. He'll be briefed by Steve Wooledge of Teradata Aster who will tout his company's Big Data Appliance, which leverages the SQL-H bridge, an innovation designed to connect Hadoop with SQL.
Visit: http://www.insideanalysis.com
David Thoumas, OpenDataSoft CTO, on data API strategy (rich API vs. multiple endpoints) for publishing data and building business
At APIdays 2012, the first European event dedicated to the API world
Standard Issue: Preparing for the Future of Data Management, by Inside Analysis
The Briefing Room with Robin Bloor and Jaspersoft
Slides from the Live Webcast on Sept. 18, 2012
As change continues to sweep across the data management industry, many organizations are looking for ways to prepare their systems and personnel for an unpredictable future. Forces such as Big Data and Cloud Computing are creating new opportunities and significant challenges for a world filled with legacy systems. Information architectures are fundamentally changing, and that's good news for companies that can take advantage of recent innovations.
Check out this episode of The Briefing Room to learn from veteran Analyst Robin Bloor, who will explain why the Information Oriented Architecture provides a stable roadmap for companies looking to harness a new era of corporate computing. He'll be briefed by Mike Boyarski of Jaspersoft, who will tout his company's history of integrating with highly diverse information systems. He'll also discuss Jaspersoft's standards-based, Cloud-ready architecture, and how it enables organizations to embed powerful Business Intelligence capabilities into their existing systems.
http://www.insideanalysis.com
A Strategic View of Enterprise Reporting and Analytics: The Data Funnel, by Inside Analysis
The Briefing Room with Colin White and Jaspersoft
Slides from the Live Webcast on June 12, 2012
As the corporate appetite for analytics and reporting grows, companies must find a way to secure a strategic view of their information architecture. End users with varying degrees of expertise need a wide range of data and reports delivered in a timely fashion. As the audience for analytics expands, that puts pressure on IT infrastructure and staff. And now with the promise of Hadoop and MapReduce, the organization's desire for business insight becomes even more significant.
In this episode of The Briefing Room, veteran Analyst Colin White of BI Research will explain the value of being strategic with enterprise reporting. White will be briefed by Karl Van den Bergh of Jaspersoft, who will tout his company's “data funnel” concept, which is designed to strategically manage an organization's information architecture. By aligning information assets along this funnel, IT can effectively address the spectrum of analytical needs – from simple reporting to complex, ad hoc analysis – without over-taxing personnel and system resources.
Driving the Digitalization and Modernization of the Finance Function with Data Virtualization, by Denodo
See: https://bit.ly/2Oycfnn
In the digital era, the digitalization and modernization of financial services are more necessary than ever, given their key role in decision-making processes and performance management. Finance departments therefore need to deliver reliable, verified information while meeting governance and security requirements. On top of that, their remit now extends to predictive data analysis. Yet this strategic function often faces challenges such as difficult access to data and low task automation.
Data Virtualization increases the added value of the finance function: it is a lever that frees up time for predictive analysis instead of collecting and consolidating data from different sources. Watch this webinar to discover how Data Virtualization makes it possible to:
- Give finance more autonomy from IT, whether for changing settings, modelling business rules, or producing reports…
- Avoid entering information multiple times and performing numerous manual adjustments, and run different simulations.
- Perform multidimensional analyses
- Spend more time on value-added tasks
- Use data from several sources in a single tool
- Focus on analysis rather than on data consolidation
- Guarantee the rigor of institutional reporting
… and much more! The session includes a live demo of this technology applied to predictive analytics.
Data Virtualization – Gateway to a Digital Business - Barry Devlin, by Denodo
Next-Generation Data Management Afternoon
with InfoRoad and Denodo. Presentation by Dr Barry Devlin, Founder and Principal of 9sight Consulting, on data virtualization.
Datumize is a software vendor established in 2014 in Barcelona (Spain) working on data integration technology.
We develop innovative products that allow companies to enjoy actionable insights based on Dark Data - data not stored and therefore not used.
Our secret sauce is a proprietary and powerful data collection engine, Datumize Data Collector (DDC), that gets data from fancy sources that most other vendors do not consider.
The document discusses various semantic use cases including business intelligence and analytics, information management, semantic search, and semantic publishing. It provides examples of companies like McGraw-Hill, ICA, and M*Modal that are using semantic technologies with MarkLogic for applications like healthcare analytics, linking clinical records, and natural language understanding. Semantic layers, ontologies, and knowledge graphs are used to extract meaning from content and provide more intelligent search and analytics capabilities.
Big data expert and Infochimps CEO Jim Kaskade presents the Infinite Monkey Theorem at CloudCon Expo. He provides an energetic, inspiring, and practical perspective on why Big Data is disruptive. It's more than historic data analyzed on Hadoop. It's also more than real-time streaming data stored and queried using NoSQL. Learn more at www.Infochimps.com
Watch the full webinar here: https://bit.ly/2vN59VK
Data virtualization, which started out as the most agile, real-time enterprise data fabric, is proving to go beyond its initial promise and is becoming one of the most important enterprise big data fabrics.
Attend this session to learn:
- What data virtualization really is.
- How it differs from other enterprise data integration technologies.
- Why data virtualization is finding enterprise-wide deployment inside some of the largest organizations.
Open source Apache Hadoop is a great framework for distributed processing of large data sets. But there’s a difference between “playing” with big data versus solving real problems. The reality is that Hadoop alone is not enough. In fact, almost every organization that plans to use Hadoop for production use quickly discovers that it lacks the required features for enterprise use. And, fewer still have the Hadoop specialists on hand to navigate through the complexity to build reliable, robust applications. As a result, many Hadoop projects never make it to production as executives say, “we just don’t have the skills.” In this session, we will discuss these enterprise capabilities and why they’re important: analytics, visualization, security, enterprise integration, developer/admin tools, and more. Additionally, we will share several real-world client examples who have found it necessary to use an enterprise-grade Hadoop platform to tackle some of the most interesting and challenging business problems.
Data APIs as a Foundation for Systems of Engagement, by Victor Olex
APIs have finally crossed over to the world of enterprise software, data analytics and application integration. Spearheaded by Amazon, propagated by internet startups and now adopted by the largest of businesses, including Wall Street's top firm Goldman Sachs, APIs are here to stay. In this presentation we link the facts and examine the opportunities stemming from Resource Oriented Architecture, a holistic approach to API implementation in large organizations.
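A minimal sketch of the resource-oriented style described above: each business entity is addressable as a URL and read with plain HTTP. The endpoint is hypothetical; only the widely used requests library is assumed.

# One resource, one URL: GET it, receive a JSON representation.
import requests

BASE = "https://api.example.com"  # hypothetical data API

resp = requests.get(f"{BASE}/customers/123", timeout=10)
resp.raise_for_status()
customer = resp.json()
print(customer.get("name"))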
The document provides an introduction to the Semantic Web by defining it in multiple ways: a) as a family of Web standards to make data easier to use and reuse, b) as an upgrade to the current Web enabling more intelligent applications, and c) as a collection of metadata technologies to improve business software adaptability and responsiveness. It notes what the Semantic Web is not (e.g. not a better search engine or tagged HTML) and provides examples of how the Semantic Web could benefit individuals by making their lives simpler and businesses by empowering new capabilities and reducing IT costs through standardized metadata linking. Finally, it discusses some early examples and implementations as well as next steps for exploring and prototyping with Semantic Web technologies.
Artificial Intelligence - Breaking the Paradigm of Enterprise Amnesia, by Marcos Quezada
Deep Learning (DL) is the fastest-growing and most rapidly evolving sub-category of Machine Learning in the field of Artificial Intelligence (AI).
Deep Learning uses software-based neural networks to develop analysis patterns that provide predictive capability: in short, Deep Learning is a platform that learns to learn. It is the way to get the most out of your data. Today companies must exploit their data at the same speed at which they produce it. Using Deep Learning we can develop new analytical capabilities, including:
- Computer vision
- Object detection
- Natural language processing
- Anomaly and fraud detection
At the heart of these use cases are sophisticated pattern recognition and classification capabilities, which give rise to revolutionary applications and open a window onto the future.
In this presentation I explain how we are taking Deep Learning beyond its roots in open source frameworks and how the PowerAI platform can help your company put these powerful tools into production right now.
Denodo DataFest 2016: The Role of Data Virtualization in IoT Integration, by Denodo
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/DOrhiA
Connected use cases are gaining momentum! Data integration is the foundation for enabling these connections. In this session, you will experience first-hand our customer case studies and implementation architectures of IoT solutions.
In this session, you will learn:
• The role of data virtualization in enabling IoT use cases
• How our customers have successfully implemented IoT solutions using data virtualization
• How our product complements other IoT technologies
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
This document discusses organizing data in a data lake or "data reservoir". It describes the changing data landscape with multiple platforms for different analytical workloads. It outlines issues with the current siloed approach to data integration and management. The document introduces the concept of a data reservoir - a collaborative, governed environment for rapidly producing information. Key capabilities of a data reservoir include data collection, classification, governance, refinery, consumption, and virtualization. It describes how a data reservoir uses zones to organize data at different stages and uses workflows and an information catalog to manage the information production process across the reservoir.
ICP for Data - Enterprise Platform for AI, ML and Data Science, by Karan Sachdeva
IBM Cloud Private for Data is a platform for AI, ML and Data Science workloads: an integrated analytics platform based on containers and microservices. It works with Kubernetes and Docker, even with Red Hat OpenShift, and delivers a variety of business use cases across industries: financial services, telco, retail, manufacturing, etc.
Similar to ISWC 2012 - Industry Track - Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment
How Artificial Intelligence Improves Document Search, by Antidot
Presentation given by Pierre Col at the Lyon Data Science Meetup on 9 June 2016: artificial intelligence and machine learning, applied to text mining (automatic classification, named entity extraction), make it possible to enrich document collections with metadata that facilitates information retrieval and navigation within documents, which can be linked following the linked data approach.
Antidot Content Classifier - Get More Value from Your Content, by Antidot
How can you semantically analyze and automatically classify millions of documents without having to read (or reread) them?
Antidot makes the latest Machine Learning technologies available to everyone, to:
- Automatically sort, classify and better organize your document management system or intranet: finding a document, or the information in it, finally becomes possible.
- Recommend relevant documents, contextualized according to the user's profile.
- Finely segment paid content and deliver tailor-made subscriptions to your customers
- Alert your users, in a highly targeted way, to new documents useful to their activity
- Automatically route incoming requests according to their subject and urgency.
- Analyze social networks, tweets, e-mails and forum contributions to detect topics and react in a targeted way.
- … and many other use cases
Take advantage of Antidot's innovations to boost your productivity and stay ahead of the pack!
How Artificial Intelligence Reinvents Text Mining, by Antidot
Text mining has already proven its value for extracting meaning from content and enriching it with contextual information, which facilitates navigation, search and, today, automatic information recommendation. However, conventional approaches are complex to implement and costly to operate, and quality is not always there.
Thanks to new statistical approaches from machine learning, automatic document classification and named entity extraction are becoming far more accessible and of much higher quality.
Antidot will present two case studies of these new approaches in operational customer contexts: in legal information with the CAIJ (Centre d'Accès à l'Information Juridique du Québec) and in the press with the weekly Le Point.
La solution la plus performante pour classer vos contenus
Outillé et efficace
Conçu pour les métiers, Antidot Content Classifier est accessible à tous. Il ne nécessite aucune compétence technique. Grâce à ses interfaces et sa méthodologie, il garantit un temps de mise en œuvre très court : un projet de classification d’un corpus de plusieurs millions de documents se réalise en quelques jours.
Précis et exhaustif
Nos clients sont unanimes : le classifieur Antidot est la solution la plus précise du marché. Il met en œuvre les algorithmes les plus pointus de machine learning. Et grâce à sa technologie d’active learning, même si vous ne disposez pas d’un corpus de référence, vous obtenez plus rapidement des résultats extrêmement précis.
Multilabel
Antidot Content Classifier appose sur chaque document tous les tags pertinents, sans limitation de nombre. Il tire profit de tous vos plans de classement (listes ou arborescents), quelles que soient leur largeur et leur profondeur.
Multilingue
Antidot Content Classifier est indépendant de la langue. Il classe les corpus multilingues en détectant automatiquement la langue de chaque document.
Véloce
La classification d’un document se fait en quelques millisecondes. La création d’une base de signatures à partir d’un corpus d’entraînement s’effectue en quelques minutes.
Flexible
Grâce à ses APIs REST compatibles JSON et XML, Antidot Content Classifier s’intègre facilement à toute application web ou solution logicielle métier. Il traite les documents à l’unité ou par lot.
Pourquoi classer ?
Gagnez en agilité
Boostez l’accessibilité : les étiquettes deviennent des filtres qui permettent à l’utilisateur d’affiner sa recherche en quelques clics et de cibler le contenu pertinent.
Avec des documents enrichis, créez des offres de contenus personnalisées qui proposent à chaque utilisateur les documents qui lui sont utiles.
Boostez la découverte des contenus de votre Digital Workplace
Classez automatiquement les documents dans votre GED ou système d’archivage pour les retrouver plus vite.
Recommandez les documents pertinents, contextualisés en fonction du profil de l’utilisateur.
Fluidifiez votre relation client
Aiguillez les demandes entrantes, selon leur sujet, le niveau d’urgence.
Analysez les tweets, e-mails et contributions dans les forums utilisateurs afin de détecter les sujets et de réagir de façon ciblée.
How does Antidot Content Classifier work?
1. Supervised training phase
Fed with a representative sample of already-tagged documents, the artificial intelligence of Antidot Content Classifier automatically learns to detect the characteristics associated with each tag.
2. Industrial classification phase
Once the signature base has been built for all the tags in your classification scheme, you feed the system your entire corpus, which may contain several million documents.
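A minimal sketch of these two phases with a generic text classifier (scikit-learn stands in here for Antidot's own algorithms, and the tiny corpus is made up):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# 1. Training phase: learn from a sample of already-tagged documents.
train_docs = ["contract termination clause", "goal scored in the final minute"]
train_tags = ["legal", "sport"]  # one tag per document, for simplicity

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_docs, train_tags)

# 2. Classification phase: run the whole corpus through the trained model.
corpus = ["the court ruled on the appeal", "the team won the championship"]
print(model.predict(corpus))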
The CAIJ in Québec is transforming access to legal information through machine learning.
Created in 2001, the CAIJ (Centre d'Accès à l'Information Juridique) has the mission of making legal information accessible to all members of the Québec Bar and judiciary. To do so, it operates a network of 40 libraries, offers research and training services, and provides more than 1.6 million resources through its virtual library www.caij.qc.ca. It is the largest source of legal information in Québec.
To optimize access to legal information, the CAIJ needs to classify each text finely. After ruling out a manual approach, which would have taken several years, and evaluating classical text-mining tools without success, the CAIJ chose Antidot Content Classifier. The project was completed in a few weeks and the quality targets were exceeded, for a collection of 1.7 million court decisions growing by 10,000 new documents each month.
Testimonial from Sonia Loubier, Director of Information Technology at the CAIJ
"Antidot understood our needs and supported us step by step throughout our project. Their thorough grasp of our expectations allowed them to guide us in implementing our solution, which is now a key link within our organization."
From Big Data to Smart Information: how to leverage information assets... Antidot
From Big Data to Smart Information: what approach?
- Linked Enterprise Data
What tools to create Smart Information?
- Web of data
- Machine Learning
A variety of examples
Presentation given at IDRAC on March 7, 2016 by Pierre Col
Report on the morning session "B2B e-commerce: the levers of growth" Antidot
On February 4, 2016, Target2Sell organized a round table on the theme "B2B e-commerce: the levers of growth".
Organized in partnership with Decade, iAdvize, Antidot and IBM, the round table sparked an in-depth discussion on several fascinating topics!
The round table was moderated by François Ziserman (CEO, Target2Sell), with the participation of:
• Martin Sauer (Digital Director, Manutan)
• Laurent Gicquel (E-Commerce Manager, Raja)
• Antoine Revillon (E-Business Manager, Orexad)
• Maxime Baumard (Marketing Director, iAdvize)
• Pierre Col (Marketing Director, Antidot)
• Jérome Fraissinet (Technical Director, Decade)
• Patrick Gourdon (Global Client Director for retail industry, IBM)
Semantic Web and Web of data: what if we put them into practice? Antidot
The semantic web, theorized long ago by Tim Berners-Lee, was slow to take off. But the wave is here now, and the first to ride it are the web giants, such as Google with its Knowledge Graph. The standards are now mature, and organizations of all sizes are putting them to work in concrete projects with a real return on investment. Yet building an application with Semantic Web technologies can prove tedious for newcomers to the field. Many questions often remain open. What is the role of ontologies? Why use RDF and SPARQL? What is a triplestore and how do you use one? How can the Web of data enrich your business data with these tools? These are the questions we will try to answer through a concrete example: the data of the cultural institution Les Champs Libres in Rennes.
Machine learning, deep learning and search: when will these innovations reach our... Antidot
Over the last ten years there has been no real evolution in enterprise search engines. And yet the web is buzzing with the machine learning revolution.
These new mathematical approaches are revolutionizing information processing. The web giants seized on them several years ago, and the first results are here. Your web search is more personalized; it predicts more than it finds; it anticipates.
But knowledge workers in traditional companies do not yet have access to these innovations. Have they been forgotten?
Is enterprise information retrieval doomed to run on 20th-century technologies?
William Lesguillier, head of the Data Valorization offering at Antidot, revisits these machine learning approaches to explain what they are for. Through various case studies, we will illustrate what they bring to information retrieval.
Finally, we will open the doors of Antidot's laboratory to present the latest research on relevance algorithms.
AFS@Store: semantic search and automated searchandising.
Increase your conversion rate by 30%.
Quick installation and simplified maintenance.
WISS 2015 - Machine Learning lecture by Ludovic Samper Antidot
Machine Learning Tutorial
- Study a classical task in Machine Learning: text classification
- Show the scikit-learn.org Python machine learning library
- Follow the "Working with text data" tutorial:
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
- Additional material on http://blog.antidot.net/
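For reference, a condensed version of the pipeline that this tutorial builds (20 newsgroups data, bag-of-words features, a naive Bayes classifier):

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

categories = ["alt.atheism", "sci.med"]
train = fetch_20newsgroups(subset="train", categories=categories)

text_clf = Pipeline([
    ("vect", CountVectorizer()),    # tokens -> term counts
    ("tfidf", TfidfTransformer()),  # counts -> tf-idf weights
    ("clf", MultinomialNB()),       # linear classifier on sparse features
])
text_clf.fit(train.data, train.target)

test = fetch_20newsgroups(subset="test", categories=categories)
print("accuracy:", text_clf.score(test.data, test.target))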
Do's and don'ts: internal search for e-commerce sites Antidot
Do you run an e-commerce site built on Prestashop, Magento, OXID eSales… or any other solution or custom development?
While you invest heavily in traffic acquisition, it is essential that your visitors immediately find the product they are looking for on your site. Indeed:
- depending on your business, 20% to 60% of your visitors use your store's internal search engine
- this search engine is involved in more than half of all purchase journeys
- 43% of users abandon their search and leave your site if their first query fails
In this talk, presented by Pierre Col, marketing director at Antidot, you will discover through very concrete examples:
- What searchandising is and what is at stake for your business
- How to let your visitors FIND the right products on your site, quickly and reliably
- How to GUIDE users from typing keywords to navigating the results
- How to INFLUENCE users in their choice
- How to STEER your site's internal search engine to adapt it to your business and to the behavior of your visitors and customers
We will share the experience and best practices of our customers, who include 4 Pieds, Actilev, But, Camaïeu, Casino, Cuisine Addict, Cultura, Damart, Decathlon, Du Pareil Au Même, King Jouet, Magma, Nature & Découvertes, Newpharma, Oreca, Pecheur.com, Petit Bateau, Saint Maclou, Top Office, Truffaut…
Boost your conversion rate and increase your sales thanks to searchandis... Antidot
Do you run an e-commerce site built on Prestashop, Magento, OXID eSales… or any other solution or custom development?
While you invest heavily in traffic acquisition, it is essential that your visitors immediately find the product they are looking for on your site. Indeed:
- depending on your business, 20% to 60% of your visitors use your store's internal search engine
- this search engine is involved in more than half of all purchase journeys
- 43% of users abandon their search and leave your site if their first query fails
In our talk, you will discover through very concrete examples:
- What is searchandising, and what is at stake for your business?
- How can your visitors FIND the right products on your site, quickly and reliably?
- How can you GUIDE users from typing keywords to navigating the results?
- How can you INFLUENCE users in their choice?
- How can you STEER your site's internal search engine to adapt it to your business and to the behavior of your visitors and customers?
We will share the experience of our customers, who include But, Camaïeu, Casino, Cultura, Damart, Decathlon, Du Pareil Au Même, King Jouet, Magma, Nature & Découvertes, Newpharma, Oreca, Pecheur.com, Petit Bateau, Saint Maclou, Top Office, Truffaut…
Synergy between a collaborative intranet and semantic search: the case of the hosp... Antidot
The Fédération des Hôpitaux Vaudois shares its experience implementing its collaborative, social portal with solutions from Jalios and Antidot. How can you get more value from your organization's information? What does semantic search bring across all your data sources? What becomes of document management in a collaborative intranet?
In 2015, what are the best practices of searchandising? Antidot
Discover, through many concrete examples, the best practices of searchandising: intelligently combining your store's internal search engine with automated merchandising increases your conversion rate and grows your sales.
How to take advantage of open public data in a web mashup thanks to... Antidot
"Musées de France", an example of aggregating open data into a web application, awarded a prize in April 2014 by the French Ministry of Culture through the Semanticpedia competition.
Through a real example, online at http://labs.antidot.net/museesdefrance/, we show how to build an application that combines several open data sources. The main design and implementation steps are covered: retrieving or connecting to the various datasets, using web services to enrich the information (geopositioning, adding multimedia objects…), then serving the data as a web application backed by a semantic search engine.
Using Prestashop? Replace your internal search engine to boo... Antidot
Using Prestashop? Replace your internal search engine to boost your conversion rate!
You built your store with Prestashop and have noticed that its internal search engine is not effective? That is normal: Prestashop's standard search engine is quite rudimentary. It cannot tolerate typos or phonetic mistakes, automatically suggest your products on sale, intelligently offer selection filters, take your business vocabulary into account, or give you a precise view of what your visitors are searching for so you can better turn them into customers.
And that is quite regrettable, because it hurts your site's commercial performance… Fortunately, the AFS@Store searchandising engine gives you an industrial-grade solution, for a monthly cost starting at only 200 euros! Available as SaaS and quickly installable in your Prestashop site, by your IT team or by our partner Dream Me Up, AFS@Store optimizes your site's searchandising: intelligent indexing of your catalog, product promotion according to your merchandising, suggestions of products, brands or categories right from the search box with phonetic and spelling tolerance, contextual filtering facets, promotional campaigns in the search results...
Guillaume Grosjean, E-Commerce Manager at Antidot, will explain these best practices of searchandising: you will see concretely how Tous Ergo and 4 Pieds increased the conversion rate of their Prestashop sites!
Boost your conversion rate by taking advantage of the best practices of sea... Antidot
In 2014, the internal search engine of an e-commerce site is used in more than half of all purchase journeys. Visitors who use it convert at least 5 times better than the others. Moreover, each euro invested in optimizing your site is 9 times more profitable than a euro spent on traffic acquisition.
Your site's internal search engine is therefore a decisive lever for your business, and optimizing it is essential!
In this workshop we present a range of best practices to:
- let your customers find the products they are looking for more easily
- reflect your commercial strategy more effectively on your website
This very concrete, hands-on presentation draws on numerous customer case studies and on the testimony of the Soledis group, publisher of the Boost E-Commerce solution, and of Alexis Robert, a seller of professional and consumer tools since 1803.
Improving the searchandising of a specialized site: feedback from Cui... Antidot
Arobases, publisher of a hosted e-commerce platform, wanted to offer its customers an advanced searchandising engine as an optional service.
To that end, Arobases chose AFS@Store and integrated it into its platform.
Cuisine Addict, a site specializing in kitchen utensils and equipment, was among the first Arobases customers to benefit from it and shares its experience.
How to select, qualify and then exploit open data Antidot
"How to select, qualify and then exploit open data": examples through two professional applications, the "Musées de France" mashup and the Ilosport service of L'Équipe.
DataViz and Open Data Day - May 19, 2014, Lyon, Hôtel de la Région Rhône-Alpes
Presentation by Pierre Col, marketing director at Antidot
Best 20 SEO Techniques To Improve Website Visibility In SERP Pixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
UiPath Test Automation using UiPath Test Suite series, part 6 DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into integrating generative AI with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI?
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
HCL Notes and Domino License Cost Reduction in the World of DLAU panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your costs through an optimized configuration and keep them low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Unlock the Future of Search with MongoDB Atlas: Vector Search Unleashed Malak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Main news related to the CCS TSI 2023 (2023/1695) Jakub Marek
An English 🇬🇧 translation of the presentation accompanying the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on communications and signalling systems on railways, held at the Clarion Hotel Olomouc from 7 to 9 November 2023 (konferenceszt.cz). It was attended by around 500 participants and 200 online followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
How to Interpret Trends in the Kalyan Rajdhani Mix Chart Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack shyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Programming Foundation Models with DSPy - Meetup Slides Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
5th LF Energy Power Grid Model Meet-up Slides DanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
- Insightful presentations covering two practical applications of the Power Grid Model.
- An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
- An interactive brainstorming session to discuss and propose new feature requests.
- An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Ivanti’s Patch Tuesday breakdown goes beyond patching your applications and brings you the intelligence and guidance needed to prioritize where to focus your attention first. Catch early analysis on our Ivanti blog, then join industry expert Chris Goettl for the Patch Tuesday Webinar Event. There we’ll do a deep dive into each of the bulletins and give guidance on the risks associated with the newly-identified vulnerabilities.
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of these features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
Project Management Semester Long Project - Acuity jpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
HCL Notes and Domino license cost reduction in the world of DLAU panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the CCB/CCX licensing model have been a hot topic in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We will explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example using a person document instead of a mail-in for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It will give you the tools and the know-how to keep on top of things. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
ISWC 2012 - Industry Track - Linked Enterprise Data: leveraging the Semantic Web stack in a corporate IS environment.
1. Linked Enterprise Data
LEVERAGING THE SEMANTIC WEB STACK IN A CORPORATE ENVIRONMENT
ISWC 2012 – BOSTON
FABRICE LACROIX – LACROIX@ANTIDOT.NET
2. Antidot – who we are
French-based Software Vendor
Since 1999 | Paris, Lyon, Aix-en-Provence
Information access | Data management
Mission: provide our customers with innovative, customizable solutions that help them create value with their data, and make their employees more aware and efficient.
5. Structured data
CRM, ERP, directory
knowledge bases
business applications (production, support)
6. IS are bloated
1 practice => 1 need => 1 application => 1 silo
The information system is driven by the process
Data are numerous, varied and scattered
8. Solutions and workarounds
Enterprise Search brings little value to users
Document oriented
Does not solve real business problems
Google-like | Verity-like
10. What we want
ERP | CRM | Production | LDAP | ECM | Support | Files
11. Changing the paradigm
Switching from an application view to a data-centric way of thinking.
12. Bring out the implicit
Build the Giant Enterprise Graph
13. LED
Linked Enterprise Data
the application of Semantic Web technologies and Linked Data principles to the enterprise infrastructure
14. What works for the Web…
Federating silos on the Web
http://www.w3.org/People/Ivan/CorePresentations/RDFTutorial/Slides.html#(102)
15. …can't always be used in corporate IS
Legacy apps can't be "Sparql'ed"
80% of the data is un- or semi-structured and doesn't fit in the model as such
Defining vocabularies/ontologies for silos is too complex and expensive
Enterprises don't want RDF per se, but valuable information
External data is available in XML/JSON through Web Services
Staff are trained for RDB, XML and Web apps
No-risk and stability strategy: SemWeb technology is considered new and immature
16. The RDF/storage approach
Setting up a global RDF repository does not work either
IT departments are wary of the "RDF everywhere" activists
17. Semantic Web technology is still the right solution in a corporate environment
BUT it is not an aim
JUST use it as a means
18. Just do it
Think of it as a stream paradigm
build new objects using existing data
without interfering with the existing infrastructure
with SemWeb somewhere under the hood
19. Enterprise Graph HowTo
Construct the graph
generate triples from data
create triples from documents
Leverage the graph
enrich
infer
Browse the graph
select resources
build objects
Trash the graph
20. How: extract & normalize
Harvest and normalize, as in an ETL: fetch, clean, transform…
Normalize records (names, IDs) to prepare the linking step
For databases: db2triples, an RDB2RDF implementation by Antidot (open source, W3C validated)
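To make the idea concrete, here is a minimal sketch of direct-mapping a database record to RDF triples. It uses Python and rdflib for illustration (db2triples itself is a Java library), and the base IRI and record are made up:

from rdflib import Graph, Literal, Namespace, URIRef

EX = Namespace("http://example.com/crm/")  # hypothetical base IRI

row = {"id": 42, "name": "ACME Corp", "country": "FR"}  # one CRM record

g = Graph()
subject = URIRef(EX["clients/%d" % row["id"]])  # table + primary key -> IRI
for column, value in row.items():
    if column == "id":
        continue
    # column name -> predicate, cell value -> literal object
    g.add((subject, EX["clients#" + column], Literal(value)))

print(g.serialize(format="turtle"))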
21. How: semantize
Don't transform everything into RDF
Cherry-pick a subset of interesting fields for each object and create their RDF triple counterparts
Interesting == needed for linking or inferring
22. How: semantize
Triples generation:
Be smart: avoid upfront ontology design, use small vocabularies
Be pragmatic: transform XML tags and field names into predicates
Be agile: only insert what you need; when you need more, add more
The Semantic Web fuels the modeling, linking and information-building process
24. How: semantize
Unstructured documents:
Extract metadata and transform them as needed into RDF
➡ Ex: author => dc:creator
Use text mining to extract named entities: people, organizations, products…
➡ generate those entity lists from the data sources: the directory for employees, the CRM for companies and people, the ERP for products
➡ create triples like doc_URI quotes entity_URI
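A rough sketch of what this step produces, again with rdflib (the rel: vocabulary and resource IRIs are illustrative, not Antidot's):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC

EX = Namespace("http://example.com/")        # hypothetical base IRI
REL = Namespace("http://example.com/rel#")   # hypothetical predicates

g = Graph()
doc = URIRef(EX["docs/report-2012.pdf"])

# Metadata mapping, e.g. author => dc:creator
g.add((doc, DC.creator, Literal("Fabrice Lacroix")))

# Named entities found by text mining, resolved against the directory/CRM/ERP
for entity in [EX["people/jdoe"], EX["products/afs"]]:
    g.add((doc, REL.quotes, entity))  # doc_URI quotes entity_URI

print(g.serialize(format="turtle"))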
25. How: semantize
Unstructured documents:
Compare documents using various dedicated algorithms
➡ is the same
➡ is included
➡ is similar
➡ is related
Generate new triples
➡ create triples like <docA> is_sub_version_of <docB>
27. How: enrich
Enrich the graph:
Run specific algorithms to generate more links and triples (classifiers, topic detection, …)
Insert external data gathered from the LOD or from other external datasets or APIs
28. How: infer
Create new knowledge: add rules according to your needs
IF a coworker is quoted in documents
AND this coworker belongs to a business unit
THEN the business unit is bound to the documents
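The example rule above could be written as a SPARQL CONSTRUCT query and materialized back into the graph. A minimal sketch with rdflib, reusing the illustrative rel: predicates from the earlier snippets:

from rdflib import Graph, Namespace

EX = Namespace("http://example.com/")
REL = Namespace("http://example.com/rel#")

g = Graph()
g.add((EX["docs/report.pdf"], REL.quotes, EX["people/jdoe"]))
g.add((EX["people/jdoe"], REL.memberOf, EX["units/sales"]))

rule = """
PREFIX rel: <http://example.com/rel#>
CONSTRUCT { ?unit rel:boundTo ?doc }
WHERE {
    ?doc    rel:quotes   ?person .  # a coworker is quoted in a document
    ?person rel:memberOf ?unit .    # the coworker belongs to a business unit
}
"""
for triple in g.query(rule):
    g.add(triple)  # materialize the inferred triples

print(g.serialize(format="turtle"))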
30. How: build
Build:
Select the resources corresponding to object seeds (using Sparql queries)
For each seed, follow links smartly in order to create basic objects
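A sketch of the seed-selection step, again with rdflib and the illustrative vocabulary (the graph dump file name is hypothetical):

from rdflib import Graph

g = Graph()
g.parse("enterprise-graph.ttl")  # hypothetical dump of the enterprise graph

# Select the seed resources, here every resource typed rel:Client
seeds = g.query("""
    PREFIX rel: <http://example.com/rel#>
    SELECT ?client WHERE { ?client a rel:Client }
""")

for (client,) in seeds:
    # Follow the outgoing links of each seed to assemble a basic object
    obj = {str(p): str(o) for p, o in g.predicate_objects(client)}
    print(client, obj)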
31. How: build
Finalize:
Decorate the new knowledge objects with the data set apart earlier (not loaded in the triplestore)
Now we have rich, user-actionable objects
33. How: expose
Make the new information available to users and to the entire IS
(Diagram: data flows from the relational DB through Harvest, Normalize, Semantize, Classify, Annotate and Enrich into an RDF triplestore (Linked Data), then through indexation into the AFS search engine)
34. Conclusion
It works!
The triples we create and the inference rules we add are dictated by the goal / application
➡ usage and value oriented
We benefit from the lazy, flexible, dynamic modeling of RDF-RDFS-OWL
➡ we are agile
What matters is the graph. But the graph is not the triplestore
➡ storage independent
35. There's an app for that
Antidot Information Factory
A software solution designed specifically to leverage structured and unstructured data, enable large-scale processing of existing data, and automate the publishing of enriched or newly created information.
Harvest → Normalize → Semantize → Enrich → Build → Expose
36. The Giant Enterprise Graph
Now we have a path to let SemWeb enter the enterprise
37. Discuss
Understand
Learn
Exchange
www.antidot.net
info@antidot.net
THANKS FOR YOUR ATTENTION
QUESTIONS?
Editor's notes
Our information system, like any other corporate IS, is overflowing with all types of information. Most of this information is unstructured.
And part of it is structured, mostly thanks to the relational databases underlying business applications. These are applications we run internally: CRM, ERP, support tracking, …
Many approaches have been developed to solve this problem of isolated silos. Most of them only apply to structured data (BI, MDM). And in most cases they entail a long and costly deployment process and make the system more complex.
Enterprise search is not a solution, and we know that for sure, since we are a leading vendor in the realm of search solutions. The problem is related to the very nature of current search engines: they are document oriented. They read documents, they index documents, they return documents.
This is what we want: agile information, meshed, merged, enriched.
What you see is not a data mashup, not just data put side by side. Some of the information you see here needs advanced processing that cannot be done on the fly.
The solution is to change the paradigm: forget the applications and the APIs. Just look at the data.
We need to create the enterprise graph.
There is a solution: one that has been thought out and designed for the Web. If it works for the Web, it should work for you and us.
The architecture for integrating data on the Web from various silos relies on a federated principle, where a query is synchronously distributed over the sources through SPARQL endpoints exposed by each of them. This approach presents many scientific and technological challenges, but considering the rationale behind the Web of Data and the need to work in the gigantic open Web space, this seems to be the only reasonable way to make it work.
Though theoretically correct, this approach is not applicable to the corporate IS for a large variety of reasons:
• The corporate information system is built with numerous legacy or closed applications that cannot be adapted or extended with Sparql endpoints.
• The enterprise information realm is made up at 80% of unstructured or semi-structured data that cannot fit in the model as such.
• Enterprises do not want access to raw data in RDF format. They want to reap valuable information derived from the data, which requires large and complex computations to create these new informational objects.
• The bottom-up approach of mapping silos and their data to RDF to fit the model requires an enormous amount of work to define vocabularies or ontologies for each source, which is too heavy an investment.
• Companies dream of seamlessly integrating external data to leverage their internal information. But this external data is mostly available in XML or JSON through Web Services, and not yet in RDF, so using Sparql as a way to query and integrate does not make sense.
• IT departments have invested heavily in their "relational database for storing / XML for exchanging / Web apps for accessing" infrastructure. Their staffs are trained for this paradigm. They lack in-house skills for integrating the graph way of thinking.
• Stability matters most, and Semantic Web technology is unknown, considered new and immature: CIOs are not ready to take the risk of adding load and technological uncertainty to systems that are critical to the company's daily business operations.
This does not work, for process reasons (modeling, know-how) and for technology reasons (performance, scalability). Enterprises don't care about technology, especially a new one.
We tailor the Normalize process by aligning field contents in order to mesh data coming from different sources (such as records from a CRM and an ERP). db2triples is our R2RML and Direct Mapping compliant module.
"Why do we transform only a subpart of the harvested data into RDF, and what do we do with the rest of it?" Indeed, beyond the fact that text documents are not graph friendly, as stated above we only transform a selected part of the structured data into RDF:
• From a technical standpoint, we don't feel the technology is mature and stable enough to proceed differently. In industrial projects, millions of seed objects are regularly extracted from the sources (invoices, clients, files, etc.), each having tens of fields, and billions of triples don't scale well in available triplestores.
• Transforming only a subpart of the data largely simplifies the task of choosing the predicates, which reinforces the choice of using many small available vocabularies instead of big ontologies.
• The data that is not transformed into RDF is stored by Information Factory for later use during the Build step.
Unstructured documents like office files, PDF files or email contents don't fit the RDF formalism and cannot be linked to the graph as such. Extra work is necessary. First, we transform available metadata, such as document name, author, creation date, sender and receivers for an email, subject and so forth, into RDF. Then we use text-mining technology to extract named entities like people, organizations, products, etc. from the documents. These entity lists are generated using different sources of the enterprise: directories, the CRM and the ERP provide people and company names, while products are listed in ERPs or taxonomies.
And last, we run various specific algorithms designed to do document-versus-document comparison to detect duplicates, different versions of the same document, inclusions, semantically related documents, etc. Each of these relations is inserted into the graph with an appropriate predicate.
It is like cooking: the rules are your own personal touch. Rules depend on the information and knowledge you want to create by inferring on the graph.
We created the graph by inserting basic triples. Then we grew the graph by enriching and inferring. Now it is time to extract the information we need. For this, we first select the resources we are looking for. Then we follow some links to grab the information and create basic objects.
We agree we would all like to see these technologies spread through the information system. We would like to put these stickers on this beautiful zSeries mainframe. But what does that mean? How can we do it?