Text Mining and Knowledge Graphs in the Cloud: The Self-Service Semantic Suite (S4)
A webinar with
Marin Dimitrov, CTO of Ontotext
Feb 26th, 2015
Text Mining & Knowledge Graphs in the Cloud with S4 #1Feb 2015
Today’s Topics
• Semantic technologies for data management
• Self-Service Semantic Suite (S4)
• Text analytics
• RDF data management in the Cloud
• Knowledge graphs
• S4 for developers
• Roadmap
• Q&A session
About Ontotext
• Provides products & solutions for content enrichment and metadata management
– 70 employees, headquartered in Sofia (Bulgaria)
– Sales presence in London, Washington & Boston
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
Some of our clients
Semantic Technologies for Smart Data Management
Typical challenges for our customers
• How can we unlock more insight from text?
• How can we interlink & search across text and structured data sources?
• How can we improve data & content reuse?
• How can we integrate data sources faster?
• How can we reuse external open data sources?
• How can we discover relations between entities?
Ontotext’s vision for smart data management
Graph Database
• Flexible RDF graph data model
• Ontology metadata layer
Semantic Search
• Semantic, exploratory search
• Metadata-driven content
Text Mining & Interlinking
• People, locations, organisations, topics
• Discover implicit relations
• Reuse open knowledge graphs
Ontotext and AstraZeneca
Profile
• Global biopharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents
Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science
Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions
Ontotext and LMI
Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management
Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies
Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches
• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• More than 90% savings in analyst time
• Accurate results
Ontotext and Euromoney
Profile
• Euromoney Institutional Investor PLC, the international online information and events group
Goals
• Create a horizontal platform to serve 100 different publications / 80 business units
• Create a new unified publishing and information platform
Challenges
• Different domains covered
• Sophisticated content analytics incl. relation, template and scenario extraction
• Text analytics of reports and news in various domains
• Extraction of sophisticated macroeconomic views on markets and market conditions
• Triplestore for flexible data integration & reasoning
• Multi-faceted search
The Self-Service Semantic Suite (S4)
Why did we create S4?
• Unlock the value of semantic technologies for SMEs
– Most success stories so far come from bigger companies
• Lower the technology adoption barriers and risks
– Challenge: perceived risks associated with new technology adoption
– Challenge: insufficient resources to implement new technologies
– Challenge: bureaucratic budgeting, procurement & provisioning processes
What is S4?
• Self-service capabilities for text analytics, content enrichment and metadata management
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
– Access to large open knowledge graphs
• Available anytime, anywhere
– Simple RESTful services
• Simple, pay-per-use pricing
– No upfront commitments
S4 benefits
• Utilise semantic technology for smart data applications
– Extract more value hidden in text
– Interlink structured and unstructured data sources
– Semantic search (instead of keyword-based search)
– Reuse open knowledge graphs
• Low adoption cost and risk
• No need for complex planning & procurement
• Pay only for what you use, reduce TCO
S4 benefits
• Enables quick prototyping, shorter time-to-market and increased innovation speed
• Available on-demand in the cloud, no provisioning & operations required
• Based on enterprise-grade semantic technology by Ontotext
• Migration path from S4-based prototypes to customised enterprise solutions with Ontotext technology
S4 for developers
• Instantly available
• Free tier
• Easy to start, shorter learning curve
– Various add-ons, SDKs and demo code
• Simplify the technology stack for smart data applications
• Focus on building applications, don’t worry about infrastructure & operations
• Quicker prototyping, shorter development cycles
Text Analytics
Text analytics with S4
• Text analytics services
– News annotation
– News categorisation
– Biomedical
– Twitter
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
News analytics with S4
• Entity types
– Person
– Organization
– Location
– Relation (affiliation, customer, competitor, partner, acquisition, role, …)
– Keywords and key phrases
• Enterprise-grade technology
– Based on successful text mining solutions for big media & publishing companies
News analytics example
[Figure: S4 result]
# API key pair
API_KEY=s4trm64sb76u
KEY_SECRET=lrcki2kkajslsp6
# REST service
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
# Text to annotate
CONTENT="President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease."
CONTENT_TYPE="text/plain"
# Request structure (inner quotes escaped so the shell keeps them in the JSON)
JSON_REQUEST="{\"document\" : \"$CONTENT\", \"documentType\" : \"$CONTENT_TYPE\"}"
curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"
Request structure:
{
"document" : "President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease." ,
"documentType" : "text/plain"
}
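The curl call above can also be sketched in Python. This is a minimal illustration under stated assumptions, not code from the S4 SDKs: the function names `build_annotation_request` and `annotate` are hypothetical, the key pair and endpoint are the sample values from the slide, and the reply is only assumed to be JSON because the slides promise "simple JSON output" without showing its fields.

```python
import base64
import json
from urllib.request import Request, urlopen

# Sample key pair and endpoint copied from the curl example above.
API_KEY = "s4trm64sb76u"
KEY_SECRET = "lrcki2kkajslsp6"
SERVICE_ENDPOINT = "https://text.s4.ontotext.com/v1/news"


def build_annotation_request(text, content_type="text/plain"):
    """Build (but do not send) a POST request for the news annotation service."""
    # json.dumps escapes quotes and newlines inside the text, which a
    # hand-assembled shell string cannot do safely.
    body = json.dumps({"document": text, "documentType": content_type})
    # Same HTTP Basic credentials that curl derives from key:secret@host.
    credentials = f"{API_KEY}:{KEY_SECRET}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return Request(
        SERVICE_ENDPOINT,
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def annotate(text):
    """Send the request and decode the JSON reply.

    Hypothetical helper: needs network access and a valid key pair.
    """
    with urlopen(build_annotation_request(text)) as response:
        return json.load(response)


# Build a request offline to inspect what would be sent.
request = build_annotation_request('Obama said measles was a "preventable disease".')
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Building the body with `json.dumps` rather than string concatenation is a deliberate choice: text containing double quotes would otherwise produce invalid JSON.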
News classification with S4
• 17 top-level categories from the IPTC Subject Reference System
– Arts / Culture / Entertainment, Crime / Law / Justice, Disaster / Accident, Economy / Business / Finance, Education, Environment, Health, Politics, …
• Enterprise-grade technology
– Based on successful text mining solutions for big media & publishing companies
News classification example
[Figure: S4 result]
# API key pair
API_KEY=s4trm64sb76u
KEY_SECRET=lrcki2kkajslsp6
# REST service
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news-classifier"
# URL of the document to classify
CONTENT_URL="http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river"
CONTENT_TYPE="text/html"
# Request structure (inner quotes escaped so the shell keeps them in the JSON)
JSON_REQUEST="{\"documentUrl\" : \"$CONTENT_URL\", \"documentType\" : \"$CONTENT_TYPE\"}"
curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"
Request structure:
{
"documentUrl" : "http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river" ,
"documentType" : "text/html"
}
Biomedical analytics with S4
• 130 biomedical entity types
– Organism, Virus, Animal, Anatomical Structure, Organ, Tissue, Cell, Genome, Chemical, Lab Result, Clinical Drug, Biologic Function, Organ Function, Disease/Syndrome, …
• Enterprise-grade technology
– Based on successful text mining solutions for big pharmaceuticals and healthcare providers
Biomedical analytics example
[Figure: S4 result]
Twitter analytics with S4
• Entity types
– Person, Location, Organisation, Date, Address, Money
– Hashtag, Emoticon, URL, @UserID
• Based on TwitIE microblog pipeline by GATE / University of Sheffield
Twitter analytics example
RDF Data Management
RDF for smart data management
• Standards compliance
– Based on a mature set of W3C standards: RDF/S, OWL, SPARQL
– Portability & interoperability
• Schema-less data integration, easy querying of diverse data
• Complex & exploratory queries
• Infer implicit relations in the graph
• Reuse open knowledge graphs (Linked Open Data)
A visual view of RDF data
Inference
[Figure labels: sub-properties, sub-classes, transitive relations]
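To make the three kinds of inference named on this slide concrete, here is a small Python sketch that derives implicit triples by naive forward chaining. It is an illustration only, not GraphDB's reasoner or rule language: the `infer` function, the `ex:` names and the sample facts are assumptions made up for this example, and only three RDFS/OWL-style rules are covered.

```python
def infer(triples):
    """Apply three simple rules repeatedly until no new triples appear."""
    closed = set(triples)
    while True:
        # Properties treated as transitive: the two RDFS hierarchy
        # properties plus anything declared an owl:TransitiveProperty.
        transitive = {"rdfs:subClassOf", "rdfs:subPropertyOf"} | {
            s for s, p, o in closed
            if p == "rdf:type" and o == "owl:TransitiveProperty"
        }
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # Transitive relations: a p b, b p c => a p c
                if p == p2 and p in transitive and o == s2:
                    new.add((s, p, o2))
                # Sub-classes: x type C, C subClassOf D => x type D
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # Sub-properties: x p y, p subPropertyOf q => x q y
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))
        if new <= closed:
            return closed
        closed |= new


# Hypothetical sample data for the illustration.
facts = {
    ("ex:Sofia", "rdf:type", "ex:City"),
    ("ex:City", "rdfs:subClassOf", "ex:Location"),
    ("ex:Location", "rdfs:subClassOf", "ex:Place"),
    ("ex:Ontotext", "ex:headquarteredIn", "ex:Sofia"),
    ("ex:headquarteredIn", "rdfs:subPropertyOf", "ex:locatedIn"),
    ("ex:locatedIn", "rdf:type", "owl:TransitiveProperty"),
    ("ex:Sofia", "ex:locatedIn", "ex:Bulgaria"),
}

# Keep only the triples that were not stated explicitly.
inferred = infer(facts) - facts
for triple in sorted(inferred):
    print(triple)
```

None of the printed triples was asserted directly; for instance, the statement that Ontotext is located in Bulgaria follows from the sub-property rule combined with the transitive one.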
GraphDB by Ontotext
• High performance RDF database
• Full SPARQL 1.1 support
• Various reasoning profiles, including custom rules
• Efficient data integration (“sameAs” optimisations)
• Efficient deletion of statements & their inferences
• Geo-spatial indexing & querying with SPARQL
• RDF Rank, full-text search, 3rd party plugins
RDF database in the Cloud with S4
• Ideal for customers who are…
– still evaluating and testing RDF technology
– in the early phase of adoption / POC
• Enterprise-grade RDF database in the Cloud
– No need for upfront payments for licenses & hardware
– Pay only for what you use, when you use it
– Instantly operational within minutes
– No need for complex planning - use as many DB instances for as long as needed
– Timely upgrades to the latest version
• Self-managed and fully managed options
Self-managed database in the Cloud
• Available from AWS Marketplace
• Variety of hardware configurations
– 2 to 8 CPU cores / 8 to 61 GB RAM
– IOPS performance & encryption (EBS)
• Manage large data volumes
• Pay-per-hour pricing
Fully-managed database in the Cloud
• (available in Q2’2015)
• Low-cost DBaaS available 24/7
• Ideal for small & moderate data volumes
• Instantly start new databases when needed
• Zero administration: automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
– database size + number of queries per period
Knowledge Graphs
Knowledge graphs with S4
• SPARQL query endpoint to FactForge knowledge graph
– 500 million entities
– 5 billion triples
• Key LOD datasets integrated
– DBpedia, Freebase, GeoNames, WordNet
– Dublin Core, SKOS, PROTON ontologies and vocabularies
Knowledge graph query example
[Figure: SPARQL query using DBpedia data]
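The query shown on the slide is not preserved in this transcript, so the following Python sketch only illustrates what a query over DBpedia data might look like and how it could be addressed to a SPARQL endpoint. Everything specific here is an assumption: `SPARQL_ENDPOINT` is a placeholder rather than the real FactForge address, `build_sparql_request` is a hypothetical helper, and the query itself was written for this example. The request follows the standard SPARQL 1.1 Protocol "query via GET" form and is built without being sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical placeholder: the slides do not give the endpoint URL.
SPARQL_ENDPOINT = "https://example.org/sparql"

# Illustrative query over DBpedia vocabulary: the five most populous
# cities in Bulgaria.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Bulgaria ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 5
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol query-via-GET request without sending it."""
    # The query text travels URL-encoded in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


sparql_request = build_sparql_request(SPARQL_ENDPOINT, QUERY)
print(sparql_request.full_url)
```

To run the query for real, the placeholder would have to be replaced with the endpoint address from the S4 documentation, along with whatever credentials the service requires.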
For Developers
Getting started in minutes
1. Register a personal account at s4.ontotext.com
2. Generate an API key pair
3. Check out the docs, demos & code at docs.s4.ontotext.com
4. Contact us with questions!
S4 for developers
• Java & C# SDKs
• Sample code
– Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
– Curl examples for the most impatient
• GATE plugin (UIMA plugin in Q2’2015)
• Firefox plugin
• Online documentation
S4 Developers Challenge
• March 1st – 30th, 2015
• Submit a cool text analytics & Linked Data application using S4
• $1,000 for the winning submission
• More details at http://bit.ly/s4-challenge
Roadmap
What to expect in 2015?
• Text analytics
– Multi-lingual text analytics
– Sentiment analytics
– JSON-LD output format
• RDF databases
– Fully managed RDF DBaaS
– Regular updates of the self-managed GraphDB on AWS
• Knowledge Graphs
– Private knowledge graph databases with DBpedia/Wikidata
– 3rd party Linked Data visualisation & exploration tools
What to expect in 2015?
• Pricing plans
– Simple, transparent, usage-based pricing
– Pay only for what you use, when you use it
• For developers
– UIMA plugin for S4
– More SDKs
– More add-ons
– Demos and sample code
– S4 Developers Challenges
Key Takeaways
• Semantic technologies provide good capabilities for smart data management
• Key S4 benefits
– Lowers the risks and costs for semantic technology adoption
– Shortens time-to-market, reduces TCO
– Provides a safe migration path into custom enterprise solutions with Ontotext technology
• Key S4 capabilities
– Various text analytics components (more to come!)
– Self-managed & fully managed RDF DB in the Cloud
– Knowledge graphs with reusable open data
Additional S4 resources
• Online documentation
– http://docs.s4.ontotext.com/
• Sample code & demos on GitHub
– https://github.com/Ontotext-AD/S4
• Helpdesk
– http://support.s4.ontotext.com/
• Twitter
– @Ontotext_S4
Thank you!
Text Mining and Knowledge Graphs in the Cloud:
The Self-Service Semantic Suite
A link to the recording will be sent out shortly
Feb 26th, 2015

Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)

  • 1. Text Mining and Knowledge Graphs in the Cloud: The Self-Service Semantic Suite (S4) A webinar with Marin Dimitrov, CTO of Ontotext Feb 26th, 2015 Text Mining & Knowledge Graphs in the Cloud with S4 #1Feb 2015
  • 2. • Semantic technologies for data management • Self-Service Semantic Suite (S4) • Text analytics • RDF data management in the Cloud • Knowledge graphs • S4 for developers • Roadmap • Q&A session Today’s Topics Text Mining & Knowledge Graphs in the Cloud with S4 #2Feb 2015
  • 3. About Ontotext • Provides products & solutions for content enrichment and metadata management – 70 employees, head quartered in Sofia (Bulgaria) – Sales presence in London, Washington & Boston • Major clients and industries – Media & Publishing – Health Care & Life Sciences – Cultural Heritage & Digital Libraries – Government – Education Text Mining & Knowledge Graphs in the Cloud with S4 #3Feb 2015
  • 4. Some of our clients Text Mining & Knowledge Graphs in the Cloud with S4 #4Feb 2015
  • 5. Semantic Technologies for Smart Data Management Text Mining & Knowledge Graphs in the Cloud with S4 #5Feb 2015
  • 6. • How can we unlock more insight from text? • How can we interlink & search across text and structured data sources? • How can we improve data & content reuse? • How can we integrate data sources faster? • How can we reuse external open data sources? • How can we discover relations between entities? Typical challenges for our customers Text Mining & Knowledge Graphs in the Cloud with S4 #6Feb 2015
  • 7. Ontotext’s vision for smart data management Graph Database • Flexible RDF graph data model • Ontology metadata layer Semantic Search • Semantic, exploratory search • Metadata driven content Text Mining & Interlinking • People, locations, organisations, topics • Discover implicit relations • Reuse open knowledge graphs Text Mining & Knowledge Graphs in the Cloud with S4 #7Feb 2015
  • 8. Ontotext and AstraZeneca Profile • Global, Bio-pharma company • $28 billion in sales in 2012 • $4 billion in R&D across three continents Goals • Efficient design of new clinical studies • Quick access to all of the data • Improved evidence based decision-making • Strengthen the knowledge feedback loop • Enable predictive science Challenges • Over 7,000 studies and 23,000 documents are difficult to obtain • Searches returning 1,000 – 10,000 results • Document repositories not designed for reuse • Tedious process to arrive at evidence based decisions #8Text Mining & Knowledge Graphs in the Cloud with S4 Feb 2015
  • 9. Ontotext and LMI Profile • Established in 1961 to enable federal agencies • Specializes in logistics, financial, infrastructure & information management Goals • Unlock large collections of complex documents • Improve analyst productivity • Create an application they can sell to US Federal agencies Challenges • Analysts taking hours to find, download and search documents, using inaccurate keyword searches • Needed a knowledge base to search quickly and guide the analysts – highly relevant searches • Extracts knowledge from collection of documents • Uses GraphDB to intuitively search and filter • More than 90% savings in analyst time • Accurate results #9Text Mining & Knowledge Graphs in the Cloud with S4 Feb 2015
  • 10. Ontotext and Euromoney Profile • Euromoney Institutional Investor PLC, the international online information and events group Goals • Create a horizontal platform to serve 100 different publications / 80 business units • create a new unified publishing and information platform Challenges • Different domains covered • Sophisticated content analytics incl. relation, template and scenario extraction • Text analytics of reports and news in various domains • Extraction of sophisticated macro economic views on markets and market conditions • Triplestore for flexible data integration & reasoning • Multi-faceted search #10Text Mining & Knowledge Graphs in the Cloud with S4 Feb 2015
  • 11. The Self-Service Semantic Suite (S4) Text Mining & Knowledge Graphs in the Cloud with S4 #11Feb 2015
  • 12. • Unlock the value of semantic technologies to SMEs – Most success stories so far come from bigger companies • Lower the technology adoption barriers and risks – Challenge: perceived risks associated with new technology adoption – Challenge: insufficient resources to implement new technologies – Challenge: bureaucratic budgeting, procurement & provisioning processes Why did we create S4? Text Mining & Knowledge Graphs in the Cloud with S4 #12Feb 2015
  • 13. What is S4?
      • Self-service capabilities for text analytics, content enrichment and metadata management
          – Text analytics for news, life sciences and social media
          – RDF graph database as-a-service
          – Access to large open knowledge graphs
      • Available anytime, anywhere
          – Simple RESTful services
      • Simple, pay-per-use pricing
          – No upfront commitments
  • 14. S4 benefits
      • Utilise semantic technology for smart data applications
          – Extract more value hidden in text
          – Interlink structured and unstructured data sources
          – Semantic search (instead of keyword-based search)
          – Reuse open knowledge graphs
      • Low adoption cost and risk
      • No need for complex planning & procurement
      • Pay only for what you use, reduce TCO
  • 15. S4 benefits
      • Enables quick prototyping & shorter time-to-market, increases innovation speed
      • Available on-demand in the cloud, no provisioning & operations required
      • Based on enterprise-grade semantic technology by Ontotext
      • Migration path from S4-based prototypes to customised enterprise solutions with Ontotext technology
  • 16. S4 for developers
      • Instantly available
      • Free tier
      • Easy to start, shorter learning curve
          – Various add-ons, SDKs and demo code
      • Simplify the technology stack for smart data applications
      • Focus on building applications, don’t worry about infrastructure & operations
      • Quicker prototyping, shorter development cycles
  • 17. Text Analytics
  • 18. Text analytics with S4
      • Text analytics services
          – News annotation
          – News categorisation
          – Biomedical
          – Twitter
      • Entity linking & disambiguation
          – Mappings to DBpedia & GeoNames instances
          – Mappings to biomedical data sources (LinkedLifeData)
      • HTML, MS Word, XML, plain text input
      • Simple JSON output
  • 19. News analytics with S4
      • Entity types
          – Person
          – Organization
          – Location
          – Relation (affiliation, customer, competitor, partner, acquisition, role, …)
          – Keywords and key phrases
      • Enterprise-grade technology
          – Based on successful text mining solutions for big media & publishing companies
  • 20. News analytics with S4
  • 21. News analytics example: S4 result
  • 22. News analytics example
      # API key pair
      API_KEY=s4trm64sb76u
      KEY_SECRET=lrcki2kkajslsp6
      # REST service
      SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
      # Text to annotate
      CONTENT="President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease."
      CONTENT_TYPE="text/plain"
      # Request structure (inner quotes are escaped so the JSON survives shell quoting)
      JSON_REQUEST="{\"document\" : \"$CONTENT\", \"documentType\" : \"$CONTENT_TYPE\"}"
      curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"
      Request structure:
      {
        "document" : "President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease.",
        "documentType" : "text/plain"
      }
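The curl call on this slide can also be issued from Python. The following is a minimal sketch that assumes only what the slide shows: the `https://text.s4.ontotext.com/v1/news` endpoint, HTTP basic authentication with the API key pair, and a JSON body with `document` and `documentType` fields. The function names are illustrative and are not taken from the S4 SDKs. `json.dumps` takes care of the quoting that is easy to get wrong inside a shell string.

```python
import base64
import json
import urllib.request

NEWS_ENDPOINT = "https://text.s4.ontotext.com/v1/news"  # endpoint shown on the slide


def build_news_request(api_key, key_secret, text, endpoint=NEWS_ENDPOINT):
    """Build (but do not send) a POST request for the news annotation service."""
    # json.dumps escapes quotes and newlines inside the document text.
    body = json.dumps({"document": text, "documentType": "text/plain"}).encode("utf-8")
    # The slide embeds the key pair in the URL; this is the equivalent Basic auth header.
    token = base64.b64encode(f"{api_key}:{key_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def annotate_news(api_key, key_secret, text):
    """Send the request and return the parsed JSON response (needs network access)."""
    request = build_news_request(api_key, key_secret, text)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The response is returned as parsed JSON without further interpretation, because its exact structure is not reproduced in this transcript.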
  • 23. News classification with S4
      • 17 top-level categories from the IPTC Subject Reference System
          – Arts / Culture / Entertainment, Crime / Law / Justice, Disaster / Accident, Economy / Business / Finance, Education, Environment, Health, Politics, …
      • Enterprise-grade technology
          – Based on successful text mining solutions for big media & publishing companies
  • 24. News classification example: S4 result
  • 25. News classification example
      # API key pair
      API_KEY=s4trm64sb76u
      KEY_SECRET=lrcki2kkajslsp6
      # REST service
      SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news-classifier"
      # URL of the document to classify
      CONTENT_URL="http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river"
      CONTENT_TYPE="text/html"
      # Request structure (inner quotes are escaped so the JSON survives shell quoting)
      JSON_REQUEST="{\"documentUrl\" : \"$CONTENT_URL\", \"documentType\" : \"$CONTENT_TYPE\"}"
      curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"
      Request structure:
      {
        "documentUrl" : "http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river",
        "documentType" : "text/html"
      }
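The classifier request on this slide passes a `documentUrl` instead of inline text. A small sketch of building that body in Python follows. The helper name and the URL check are assumptions added for illustration, and the default `text/html` follows the request structure shown on the slide.

```python
import json
from urllib.parse import urlparse


def classification_payload(document_url, document_type="text/html"):
    """Return the JSON body for classifying a document that S4 fetches by URL."""
    parts = urlparse(document_url)
    # Reject anything that is not an absolute http(s) URL before calling the service.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {document_url!r}")
    return json.dumps({"documentUrl": document_url, "documentType": document_type})
```

Validating the URL locally catches a truncated or line-wrapped link before it is sent to the service.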
  • 26. Biomedical analytics with S4
      • 130 biomedical entity types
          – Organism, Virus, Animal, Anatomical Structure, Organ, Tissue, Cell, Genome, Chemical, Lab Result, Clinical Drug, Biologic Function, Organ Function, Disease/Syndrome, …
      • Enterprise-grade technology
          – Based on successful text mining solutions for big pharmaceuticals and healthcare providers
  • 27. Biomedical analytics with S4
  • 28. Biomedical analytics example: S4 result
  • 29. Twitter analytics with S4
      • Entity types
          – Person, Location, Organisation, Date, Address, Money
          – Hashtag, Emoticon, URL, @UserID
      • Based on TwitIE microblog pipeline by GATE / University of Sheffield
  • 30. Twitter analytics example
  • 31. RDF Data Management
  • 32. RDF for smart data management
      • Standards compliance
          – Based on a mature set of W3C standards: RDF/S, OWL, SPARQL
          – Portability & interoperability
      • Schema-less data integration, easy querying of diverse data
      • Complex & exploratory queries
      • Infer implicit relations in the graph
      • Reuse open knowledge graphs (Linked Open Data)
  • 33. A visual view of RDF data (diagram labels: sub-properties, sub-classes, transitive relations, inference)
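The labels on this slide (sub-properties, sub-classes, transitive relations, inference) can be illustrated with a toy example. The sketch below is a naive forward-chaining loop over plain Python tuples. It is not how GraphDB implements reasoning, and the `ex:` names and the sample triples are made up for illustration.

```python
def infer(triples):
    """Naive forward chaining to a fixpoint over a few RDFS/OWL-style rules."""
    facts = set(triples)
    while True:
        # Properties declared transitive, plus rdfs:subClassOf, which is transitive by definition.
        transitive = {s for s, p, o in facts
                      if p == "rdf:type" and o == "owl:TransitiveProperty"}
        transitive.add("rdfs:subClassOf")
        new = set()
        for s, p, o in facts:
            for s2, p2, o2 in facts:
                # Transitive relations: x p y, y p z  =>  x p z
                if p == p2 and p in transitive and o == s2:
                    new.add((s, p, o2))
                # Sub-classes: x a A, A subClassOf B  =>  x a B
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # Sub-properties: x p y, p subPropertyOf q  =>  x q y
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))
        if new <= facts:  # nothing new was derived, so the closure is complete
            return facts
        facts |= new


# Hypothetical sample data, written as (subject, predicate, object) tuples.
DATA = {
    ("ex:Sofia", "ex:locatedIn", "ex:Bulgaria"),
    ("ex:Bulgaria", "ex:locatedIn", "ex:Europe"),
    ("ex:locatedIn", "rdf:type", "owl:TransitiveProperty"),
    ("ex:Sofia", "rdf:type", "ex:CapitalCity"),
    ("ex:CapitalCity", "rdfs:subClassOf", "ex:City"),
    ("ex:City", "rdfs:subClassOf", "ex:Location"),
    ("ex:Ontotext", "ex:headquarteredIn", "ex:Sofia"),
    ("ex:headquarteredIn", "rdfs:subPropertyOf", "ex:locatedIn"),
}

closure = infer(DATA)
for triple in sorted(closure - DATA):  # print only the inferred statements
    print(*triple)
```

None of the printed statements were asserted explicitly. For example, `ex:Ontotext ex:locatedIn ex:Europe` follows from one sub-property step and two transitive steps.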
  • 34. GraphDB by Ontotext
      • High performance RDF database
      • Full SPARQL 1.1 support
      • Various reasoning profiles, including custom rules
      • Efficient data integration (“sameAs” optimisations)
      • Efficient deletion of statements & their inferences
      • Geo-spatial indexing & querying with SPARQL
      • RDF Rank, full-text search, 3rd party plugins
  • 35. RDF database in the Cloud with S4
      • Ideal for customers who are…
          – still evaluating and testing RDF technology
          – in the early phase of adoption / POC
      • Enterprise-grade RDF database in the Cloud
          – No need for upfront payments for licenses & hardware
          – Pay only for what you use, when you use it
          – Instantly operational within minutes
          – No need for complex planning: use as many DB instances for as long as needed
          – Timely upgrades to the latest version
      • Self-managed and fully-managed options
  • 36. Self-managed database in the Cloud
      • Available from AWS Marketplace
      • Variety of hardware configurations
          – 2 to 8 CPU cores / 8 to 61 GB RAM
          – IOPS performance & encryption (EBS)
      • Manage large data volumes
      • Pay-per-hour pricing
  • 37. Fully-managed database in the Cloud (available in Q2’2015)
      • Low-cost DBaaS available 24/7
      • Ideal for small & moderate data volumes
      • Instantly start new databases when needed
      • Zero administration: automated operations, maintenance & upgrades
      • Users pay only for the actual database utilisation
          – database size + number of queries per period
  • 38. Knowledge Graphs
  • 39. Knowledge graphs with S4
      • SPARQL query endpoint to FactForge knowledge graph
          – 500 million entities
          – 5 billion triples
      • Key LOD datasets integrated
          – DBpedia, Freebase, GeoNames, WordNet
          – Dublin Core, SKOS, PROTON ontologies and vocabularies
  • 40. Knowledge graph query example: SPARQL query using DBpedia data
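The query shown on the slide is not reproduced in this transcript, so the query below is a stand-in written against DBpedia's public vocabulary (`dbo:birthPlace`, `dbr:Sofia`). The sketch follows the standard SPARQL 1.1 Protocol (a `query` parameter on a GET request, with JSON results requested through the `Accept` header). The endpoint URL is a parameter because the FactForge endpoint address and its authentication scheme are not given here; `https://example.org/sparql` in the usage note is a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Stand-in query: people whose DBpedia birth place is Sofia.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE { ?person dbo:birthPlace dbr:Sofia } LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_select(endpoint, query):
    """Execute the query and return flattened rows (needs network access)."""
    with urllib.request.urlopen(build_sparql_request(endpoint, query)) as response:
        return bindings_to_rows(json.load(response))
```

A call such as `run_select("https://example.org/sparql", QUERY)` would return one dictionary per result row, assuming the endpoint accepts unauthenticated GET requests.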
  • 41. For Developers
  • 42. Getting started in minutes
      1. Register a personal account at s4.ontotext.com
      2. Generate an API key pair
      3. Check out the docs, demos & code at docs.s4.ontotext.com
      4. Contact us with questions!
  • 43. S4 for developers
      • Java & C# SDKs
      • Sample code
          – Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
          – Curl examples for the most impatient
      • GATE plugin (UIMA plugin in Q2’2015)
      • Firefox plugin
      • Online documentation
  • 44. S4 Developers Challenge
      • March 1st – 30th 2015
      • Submit a cool text analytics & Linked Data application using S4
      • $1,000 for the winning submission
      • More details at http://bit.ly/s4-challenge
  • 45. Roadmap
  • 46. What to expect in 2015?
      • Text analytics
          – Multi-lingual text analytics
          – Sentiment analytics
          – JSON-LD output format
      • RDF databases
          – Fully managed RDF DBaaS
          – Regular updates of the self-managed GraphDB on AWS
      • Knowledge Graphs
          – Private knowledge graph databases with DBpedia/Wikidata
          – 3rd party Linked Data visualisation & exploration tools
  • 47. What to expect in 2015?
      • Pricing plans
          – Simple, transparent, usage-based pricing
          – Pay only for what you use, when you use it
      • For developers
          – UIMA plugin for S4
          – More SDKs
          – More add-ons
          – Demos and sample code
          – S4 Developers Challenges
  • 48. Key Takeaways
  • 49. Key Takeaways
      • Semantic technologies provide good capabilities for smart data management
      • Key S4 benefits
          – Lowers the risks and costs for semantic technology adoption
          – Shortens time-to-market, reduces TCO
          – Provides a safe migration path into custom enterprise solutions with Ontotext technology
      • Key S4 capabilities
          – Various text analytics components (more to come!)
          – Self-managed & fully managed RDF DB in the Cloud
          – Knowledge graphs with reusable open data
  • 50. Additional S4 resources
      • Online documentation
          – http://docs.s4.ontotext.com/
      • Sample code & demos on GitHub
          – https://github.com/Ontotext-AD/S4
      • Helpdesk
          – http://support.s4.ontotext.com/
      • Twitter
          – @Ontotext_S4
  • 51. Thank you!
      Text Mining and Knowledge Graphs in the Cloud: The Self-Service Semantic Suite
      A link to the recording will be sent out shortly
      Feb 26th, 2015