This Ontotext webinar summarises the value of Semantic Technology for smarter data management and gives a brief technical introduction to the Self-Service Semantic Suite (S4) by Ontotext, which provides on-demand Cloud capabilities for text analytics, RDF data management and access to knowledge graphs. Featuring Ontotext CTO Marin Dimitrov (Twitter: @marin_dim), the session addresses important questions including:
* How can organizations utilize Semantic Technology for getting more value from data?
* Why did Ontotext create the Self-Service Semantic Suite (S4)?
* What are the main capabilities of S4?
* Practical examples of using various S4 services
* What is the S4 roadmap and what new features will be available throughout the 1st half of 2015?
A video recording of the talk is available at http://info.ontotext.com/s4-webinar-recording
Text Mining and Knowledge Graphs in the Cloud: the Self-Service Semantic Suite (S4)
1. Text Mining and Knowledge Graphs in
the Cloud: The Self-Service Semantic
Suite (S4)
A webinar with
Marin Dimitrov, CTO of Ontotext
Feb 26th, 2015
Text Mining & Knowledge Graphs in the Cloud with S4 #1Feb 2015
2. • Semantic technologies for data management
• Self-Service Semantic Suite (S4)
• Text analytics
• RDF data management in the Cloud
• Knowledge graphs
• S4 for developers
• Roadmap
• Q&A session
Today’s Topics
3. About Ontotext
• Provides products & solutions for content
enrichment and metadata management
– 70 employees, headquartered in Sofia (Bulgaria)
– Sales presence in London, Washington & Boston
• Major clients and industries
– Media & Publishing
– Health Care & Life Sciences
– Cultural Heritage & Digital Libraries
– Government
– Education
4. Some of our clients
6. • How can we unlock more insight from text?
• How can we interlink & search across text and
structured data sources?
• How can we improve data & content reuse?
• How can we integrate data sources faster?
• How can we reuse external open data sources?
• How can we discover relations between entities?
Typical challenges for our customers
7. Ontotext’s vision for smart data management
Graph Database
• Flexible RDF graph data model
• Ontology metadata layer
Semantic Search
• Semantic, exploratory search
• Metadata-driven content
Text Mining & Interlinking
• People, locations, organisations, topics
• Discover implicit relations
• Reuse open knowledge graphs
8. Ontotext and AstraZeneca
Profile
• Global biopharmaceutical company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents
Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science
Challenges
• Over 7,000 studies and 23,000 documents
are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for
reuse
• Tedious process to arrive at
evidence-based decisions
9. Ontotext and LMI
Profile
• Established in 1961 to enable federal
agencies
• Specializes in logistics, financial,
infrastructure & information management
Goals
• Unlock large collections of complex
documents
• Improve analyst productivity
• Create an application they can sell to US
Federal agencies
Challenges
• Analysts taking hours to find, download
and search documents, using inaccurate
keyword searches
• Needed a knowledge base to search
quickly and guide the analysts – highly
relevant searches
• Extracts knowledge from collection of
documents
• Uses GraphDB to intuitively search and filter
• More than 90% savings in analyst time
• Accurate results
10. Ontotext and Euromoney
Profile
• Euromoney Institutional Investor PLC, the
international online information and events
group
Goals
• Create a horizontal platform to serve 100
different publications / 80 business units
• Create a new unified publishing and
information platform
Challenges
• Different domains covered
• Sophisticated content analytics incl.
relation, template and scenario extraction
• Text analytics of reports and news in various
domains
• Extraction of sophisticated macroeconomic
views on markets and market conditions
• Triplestore for flexible data integration &
reasoning
• Multi-faceted search
11. The Self-Service Semantic Suite
(S4)
12. • Unlock the value of semantic technologies to SMEs
– Most success stories so far come from bigger companies
• Lower the technology adoption barriers and risks
– Challenge: perceived risks associated with new
technology adoption
– Challenge: insufficient resources to implement new
technologies
– Challenge: bureaucratic budgeting, procurement &
provisioning processes
Why did we create S4?
13. • Self-service capabilities for text analytics, content
enrichment and metadata management
– Text analytics for news, life sciences and social media
– RDF graph database as-a-service
– Access to large open knowledge graphs
• Available anytime, anywhere
– Simple RESTful services
• Simple, pay-per-use pricing
– No upfront commitments
What is S4?
14. • Utilise semantic technology for smart data
applications
– Extract more value hidden in text
– Interlink structured and unstructured data sources
– Semantic search (instead of keyword-based search)
– Reuse open knowledge graphs
• Low adoption cost and risk
• No need for complex planning & procurement
• Pay only for what you use, reduce TCO
S4 benefits
15. • Enables quick prototyping & shorter time-to-market,
increases innovation speed
• Available on-demand in the cloud, no provisioning
& operations required
• Based on enterprise-grade semantic technology by
Ontotext
• Migration path from S4-based prototypes to
customised enterprise solutions with Ontotext
technology
S4 benefits
16. • Instantly available
• Free tier
• Easy to start, shorter learning curve
– Various add-ons, SDKs and demo code
• Simplify the technology stack for smart data
applications
• Focus on building applications, don’t worry about
infrastructure & operations
• Quicker prototyping, shorter development cycles
S4 for developers
18. • Text analytics services
– News annotation
– News categorisation
– Biomedical
– Twitter
• Entity linking & disambiguation
– Mappings to DBpedia & GeoNames instances
– Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output
Text analytics with S4
19. • Entity types
– Person
– Organization
– Location
– Relation (affiliation, customer, competitor, partner,
acquisition, role, …)
– Keywords and key phrases
• Enterprise-grade technology
– Based on successful text mining solutions for big media
& publishing companies
News analytics with S4
20. News analytics with S4
22. News analytics example
API_KEY=s4trm64sb76u
KEY_SECRET=lrcki2kkajslsp6
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news"
# Keep the text on one line: a JSON string cannot contain raw line breaks
CONTENT="President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease."
CONTENT_TYPE="text/plain"
# Escape the inner double quotes so the shell passes valid JSON to curl
JSON_REQUEST="{\"document\" : \"$CONTENT\", \"documentType\" : \"$CONTENT_TYPE\"}"
curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"
{
"document" : "President Barack Obama is urging parents to get their children vaccinated in the face of a measles outbreak that has infected more than 100 people in the United States. In excerpts from an interview with NBC News that will air on Monday, Obama said measles was a preventable disease.",
"documentType" : "text/plain"
}
Slide callouts: API key pair, REST service, text, request structure
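The same request can be sketched in Python, one of the sample-code languages S4 lists. This is a minimal illustration rather than the official SDK: it only builds the request and does not send it. The endpoint and sample key pair come from the slide; the function name `build_request` is hypothetical, and sending the key pair as an HTTP Basic `Authorization` header is assumed to be equivalent to the `key:secret@host` URL form used with curl.

```python
import base64
import json
import urllib.request

# Endpoint taken from the slide (without the embedded credentials).
SERVICE_ENDPOINT = "https://text.s4.ontotext.com/v1/news"


def build_request(api_key, key_secret, text, content_type="text/plain"):
    """Build (but do not send) a POST request for the news annotation service."""
    # json.dumps escapes quotes and line breaks inside the document text.
    body = json.dumps({"document": text, "documentType": content_type}).encode("utf-8")
    # HTTP Basic auth header built from the API key pair.
    token = base64.b64encode(f"{api_key}:{key_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        SERVICE_ENDPOINT,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


# Sample key pair as shown on the slide.
request = build_request(
    "s4trm64sb76u",
    "lrcki2kkajslsp6",
    'Obama said measles was a "preventable disease".',
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually call the service: urllib.request.urlopen(request).read()
```

Building the body with `json.dumps` avoids the manual quote escaping that the shell version needs.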
23. • 17 top-level categories from the IPTC Subject
Reference System
– Arts / Culture / Entertainment, Crime / Law / Justice,
Disaster / Accident, Economy / Business / Finance,
Education, Environment, Health, Politics, …
• Enterprise-grade technology
– Based on successful text mining solutions for big media
& publishing companies
News classification with S4
25. News classification example
API_KEY=s4trm64sb76u
KEY_SECRET=lrcki2kkajslsp6
SERVICE_ENDPOINT="https://$API_KEY:$KEY_SECRET@text.s4.ontotext.com/v1/news-classifier"
CONTENT_URL="http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river"
# The URL points to an HTML page, so the document type is text/html
CONTENT_TYPE="text/html"
# Escape the inner double quotes so the shell passes valid JSON to curl
JSON_REQUEST="{\"documentUrl\" : \"$CONTENT_URL\", \"documentType\" : \"$CONTENT_TYPE\"}"
curl -X POST -H "Content-Type: application/json" -d "$JSON_REQUEST" "$SERVICE_ENDPOINT"
{
"documentUrl" : "http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river",
"documentType" : "text/html"
}
Slide callouts: API key pair, REST service, URL, request structure
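The two request structures shown in the examples differ only in one field: `document` for inline text and `documentUrl` for content the service fetches itself. A small Python sketch of that choice follows. The helper name `make_payload` and the rule "anything starting with http:// or https:// is a URL" are assumptions made for illustration, not part of the S4 API.

```python
import json


def make_payload(source, content_type):
    """Return the JSON request body for inline text or for a document URL."""
    # Assumption: a source that starts with http:// or https:// is a URL to fetch.
    is_url = source.startswith(("http://", "https://"))
    field = "documentUrl" if is_url else "document"
    return json.dumps({field: source, "documentType": content_type})


# URL input, as in the news classification example.
print(make_payload(
    "http://www.theguardian.com/world/2015/feb/04/taiwan-plane-crash-lands-in-river",
    "text/html",
))
# Inline text input, as in the news analytics example.
print(make_payload("Obama said measles was a preventable disease.", "text/plain"))
```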
26. • 130 biomedical entity types
– Organism, Virus, Animal, Anatomical Structure, Organ,
Tissue, Cell, Genome, Chemical, Lab Result, Clinical Drug,
Biologic Function, Organ Function, Disease/Syndrome, …
• Enterprise-grade technology
– Based on successful text mining solutions for big
pharmaceuticals and healthcare providers
Biomedical analytics with S4
29. • Entity types
– Person, Location, Organisation, Date, Address, Money
– Hashtag, Emoticon, URL, @UserID
• Based on TwitIE microblog pipeline by GATE /
University of Sheffield
Twitter analytics with S4
32. • Standards compliance
– Based on a mature set of W3C standards: RDF/S, OWL,
SPARQL
– Portability & interoperability
• Schema-less data integration, easy querying of
diverse data
• Complex & exploratory queries
• Infer implicit relations in the graph
• Reuse open knowledge graphs (Linked Open Data)
RDF for smart data management
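To make the "schema-less" point concrete, here is a toy Python sketch of RDF-style triples and a wildcard pattern match, which is roughly what a single SPARQL triple pattern does. It is not a real triple store: the `ex:` identifiers and the `match` helper are invented for illustration.

```python
# Triples from two hypothetical sources, merged without a shared schema.
triples = {
    ("ex:Ontotext", "ex:headquarteredIn", "ex:Sofia"),
    ("ex:Sofia", "ex:locatedIn", "ex:Bulgaria"),
    ("ex:Ontotext", "rdf:type", "ex:Company"),
    ("ex:MarinDimitrov", "ex:worksFor", "ex:Ontotext"),
}


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None is a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything stated about ex:Ontotext as a subject, whatever the source.
for triple in match(triples, s="ex:Ontotext"):
    print(triple)
```

Adding a new data source only means adding more triples to the set; no table definitions have to change first.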
33. A visual view of RDF data
Figure labels: sub-properties, sub-classes, transitive relations, inference
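The kinds of inference named on this slide can be sketched with a naive forward-chaining loop in Python. This is an illustration of the idea only, not how a production reasoner works: the `ex:` data is invented, and only three simplified RDFS/OWL-style rules are applied (transitive properties, sub-class typing, sub-property statements).

```python
def infer(triples):
    """Apply a few RDFS/OWL-style rules repeatedly until no new triples appear."""
    closed = set(triples)
    while True:
        # Properties declared transitive, plus the two built-in hierarchies.
        transitive = {"rdfs:subClassOf", "rdfs:subPropertyOf"} | {
            s for s, p, o in closed
            if p == "rdf:type" and o == "owl:TransitiveProperty"
        }
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                # Transitive relations: A p B and B p C imply A p C.
                if p == p2 and p in transitive and o == s2:
                    new.add((s, p, o2))
                # Sub-classes: an instance of a class is an instance of its super-class.
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # Sub-properties: a statement also holds for the super-property.
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))
        if new <= closed:
            return closed
        closed |= new


facts = {
    ("ex:Company", "rdfs:subClassOf", "ex:Organisation"),
    ("ex:Organisation", "rdfs:subClassOf", "ex:Agent"),
    ("ex:ctoOf", "rdfs:subPropertyOf", "ex:worksFor"),
    ("ex:locatedIn", "rdf:type", "owl:TransitiveProperty"),
    ("ex:Ontotext", "rdf:type", "ex:Company"),
    ("ex:MarinDimitrov", "ex:ctoOf", "ex:Ontotext"),
    ("ex:Sofia", "ex:locatedIn", "ex:Bulgaria"),
    ("ex:Bulgaria", "ex:locatedIn", "ex:Europe"),
}

# Print only the implicit triples that were not stated explicitly.
for triple in sorted(infer(facts) - facts):
    print(triple)
```

The printed triples are the "implicit relations" in the graph: none of them was stated explicitly, yet each follows from the stated facts.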
34. • High performance RDF database
• Full SPARQL 1.1 support
• Various reasoning profiles, including custom rules
• Efficient data integration (“sameAs” optimisations)
• Efficient deletion of statements & their inferences
• Geo-spatial indexing & querying with SPARQL
• RDF Rank, full-text search, 3rd party plugins
GraphDB by Ontotext
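One way to picture the data-integration problem behind the “sameAs” bullet: several identifiers from different datasets denote the same thing, and queries should treat them as one node. The Python sketch below groups identifiers with a small union-find structure and rewrites triples to a single representative. It is a conceptual illustration only and makes no claim about how GraphDB implements its optimisation; the identifiers and the `canonical_ids` helper are invented.

```python
def canonical_ids(same_as_pairs):
    """Map every identifier to one representative of its owl:sameAs group."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the lexicographically smaller identifier as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {x: find(x) for x in parent}


# Illustrative owl:sameAs links between three datasets.
pairs = [("dbpedia:Sofia", "geonames:Sofia"), ("geonames:Sofia", "ex:Sofia")]
ids = canonical_ids(pairs)

triples = [
    ("ex:Ontotext", "ex:headquarteredIn", "ex:Sofia"),
    ("geonames:Sofia", "ex:locatedIn", "ex:Bulgaria"),
]
# Rewrite subjects and objects to their representative so the data joins up.
merged = sorted({(ids.get(s, s), p, ids.get(o, o)) for s, p, o in triples})
print(merged)
```

After the rewrite, the two triples share the node `dbpedia:Sofia`, so a query can follow the path from the company to the country without listing every equivalent identifier.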
35. • Ideal for customers who are…
– still evaluating and testing RDF technology
– in the early phase of adoption / POC
• Enterprise-grade RDF database in the Cloud
– No need for upfront payments for licenses & hardware
– Pay only for what you use, when you use it
– Instantly operational within minutes
– No need for complex planning - use as many DB
instances for as long as needed
– Timely upgrades to the latest version
• Self-managed and fully managed options
RDF database in the Cloud with S4
36. • Available from AWS Marketplace
• Variety of hardware configurations
– 2 to 8 CPU cores / 8 to 61 GB RAM
– IOPS performance & encryption (EBS)
• Manage large data volumes
• Pay-per-hour pricing
Self-managed database in the Cloud
37. • (available in Q2’2015)
• Low-cost DBaaS available 24/7
• Ideal for small & moderate data volumes
• Instantly start new databases when needed
• Zero administration: automated operations,
maintenance & upgrades
• Users pay only for the actual database utilisation
– database size + number of queries per period
Fully-managed database in the Cloud
42. Getting started in minutes
1. Register a personal account at s4.ontotext.com
2. Generate an API key pair
3. Check out the docs, demos & code at docs.s4.ontotext.com
4. Contact us with questions!
43. • Java & C# SDKs
• Sample code
– Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
– Curl examples for the most impatient
• GATE plugin (UIMA plugin in Q2’2015)
• Firefox plugin
• Online documentation
S4 for developers
44. • March 1st – 30th 2015
• Submit a cool text analytics & Linked Data
application using S4
• $1,000 for the winning submission
• More details at http://bit.ly/s4-challenge
S4 Developers Challenge
46. • Text analytics
– Multi-lingual text analytics
– Sentiment analytics
– JSON-LD output format
• RDF databases
– Fully managed RDF DBaaS
– Regular updates of the self-managed GraphDB on AWS
• Knowledge Graphs
– Private knowledge graph databases with
DBpedia/Wikidata
– 3rd party Linked Data visualisation & exploration tools
What to expect in 2015?
47. • Pricing plans
– Simple, transparent, usage based pricing
– Pay only for what you use, when you use it
• For developers
– UIMA plugin for S4
– More SDKs
– More add-ons
– Demos and sample code
– S4 Developers Challenges
What to expect in 2015?
49. • Semantic technologies provide good capabilities
for smart data management
• Key S4 benefits
– Lowers the risks and costs for semantic technology
adoption
– Shortens time-to-market, reduces TCO
– Provides a safe migration path into custom enterprise
solutions with Ontotext technology
• Key S4 capabilities
– Various text analytics components (more to come!)
– Self-managed & fully managed RDF DB in the Cloud
– Knowledge graphs with reusable open data
Key Takeaways
50. • Online documentation
– http://docs.s4.ontotext.com/
• Sample code & demos on GitHub
– https://github.com/Ontotext-AD/S4
• Helpdesk
– http://support.s4.ontotext.com/
• Twitter
– @Ontotext_S4
Additional S4 resources
51. Thank you!
Text Mining and Knowledge Graphs in the Cloud:
The Self-Service Semantic Suite
A link to the recording will be sent out shortly
Feb 26th, 2015