Text and content analytics have become a source of competitive advantage, enabling businesses, government agencies, and researchers to extract unprecedented value from unstructured data.
Treparel (Delft, The Netherlands) is an independent provider of text analytics and visualization software. Organizations such as Philips, Bayer, Abbott, and NXP Semiconductors use KMX Text Analytics software to gain faster, more reliable, and more precise insights into large, complex unstructured data sets.
The KMX API allows software and service companies to enhance their unstructured data analysis capabilities by embedding world-class, machine-learning-based clustering, categorization, and visualization.
A recent review by IDC states: “KMX visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.”
2014: Treparel Big Data Text Analytics & Visualization
1. Introducing Treparel: Big Data Text Analytics & Visualization applications
Treparel
Delftechpark 26
2628 XH Delft
The Netherlands
www.treparel.com
Jeroen Kleinhoven
CEO
jeroen@treparel.com
February, 2014
2. Industry Thought Leaders about Treparel
“Treparel KMX’s visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.”
David Schubmehl, IDC
“As we acquire more and more information, we need tools that will guide us through the data maze. Analysts need tools to help them understand patterns and define clusters. Users need to explore data to uncover relationships from scattered sources. Treparel’s KMX serves both these needs with its ability to cluster and categorize collections of data with a high degree of accuracy, and its interactive visualization tools that enable exploration of large data sets.”
Sue Feldman, Synthexis.com (author: The Answer Machine)
Treparel KMX – All Rights Reserved 2013
www.treparel.com
3. Some of our clients & partners
KMX is an integral part of our IP analysis toolbox. It contributes to our capability of making added value IP analyses of technologies and competitors to support strategic decision making.
“We’ve speed up our patent searches from 2 days to 2 hours using KMX technology”
www.fusepool.eu
4. Key Business Problems Treparel KMX solves
Application Area, with business problem and value:
IP & Patent Search
– Business problem: How to improve the time-consuming and costly manual search process of patents.
– Value: Reduce research time, improve precision & recall of relevant documents. Improve legal position and drive more revenue from IP.
Competitive Analysis
– Business problem: How to increase knowledge on competitors by gaining clustered insights from (semi-)public sources.
– Value: Improve competitive advantage by determining international strategy, product roadmap, R&D planning, marketing campaigns and customer sentiment.
Healthcare
– Business problem: How to identify health risks and find correlations in diseases or medical defects.
– Value: Early identification of health risks by cross-discipline analyses of medical records, clinical observations and medical images.
Media & Publishing
– Business problem: How to improve search and content analytics on large volumes of publications.
– Value: Text analytics embedded in publishing improves relevance and accuracy of search and shows previously hidden documents.
5. Key Business Problems Treparel KMX solves - 2
Use Cases, with business problem and value:
Sentiment Analysis / Voice of Customer
– Business problem: How to manage current and future customers and their interactions
– Value: Deriving sentiment from critical customer-based text sources can drive revenue, satisfaction and loyalty
HR
– Business problem: How to manage communications and interactions with employees, managers, subordinates and employment candidates
– Value: Analyzing HR-related information (like CVs and projects) to match demand to supply.
eDiscovery
– Business problem: How to manage and mitigate general litigation risk and cost in large sets of text and emails.
– Value: Text analytics applied to legal trials or in laws and jurisprudence improves accuracy in legal cases and lowers costs.
Predictive Analysis
– Business problem: How to identify early signs of required maintenance that affect customer satisfaction and operational costs
– Value: Use customer satisfaction surveys on food quality to identify airplane ovens requiring maintenance tune-ups
6. Part 1: KMX: Ready to Use Text Analytics
Intuitive Content Clustering, Classification & Visualization
7. KMX Text Analytics Application overview
Diagram labels: Query & Search Tools; Acquire documents; Text Preprocessing and Indexing; Clustering; Classification; Visualization; Semantic Analysis; Taxonomies, Ontologies; Present Results
KMX unique functions:
• Extract concepts in context using clustering and classification of documents
• Use classification to create ranked lists and to tag subsets
• Support of binary and multi-class Classification
• Enterprise edition (server/cloud) & Professional edition (desktop)
• Integration with other applications through KMX API
8. Clustering: User Unsupervised Analytics
Benefits: Get quick insights through automated visual clusters with annotations to enhance the discovery process
1. Analyze the clusters and the relationships in the data
2. Explore outliers in the data
3. Find documents of interest
What it does: A visualization of clusters where the documents are displayed as points and the distance between them shows their similarity.
What KMX delivers: Use KMX to do:
1. Perform text preprocessing (stemming, tokenization, etc.)
2. Calculate a similarity measure between all documents
3. Calculate the visualization (landscape) with automatic annotation
4. Create the visualization
– As a static image
– Or provide interaction where the user can zoom in/out with support for adaptive annotation
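The first two clustering steps (preprocessing, then a similarity measure between all documents) can be illustrated with a minimal sketch. The slides do not say which preprocessing or similarity measure KMX uses, so the plain tokenizer (no stemming) and the cosine similarity on term counts below are assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    """Pairwise similarity between all documents (assumed measure: cosine)."""
    vectors = [Counter(tokenize(d)) for d in documents]
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Illustrative documents, not taken from any KMX data set.
docs = [
    "patent on inhaler for respiratory disease",
    "respiratory inhaler device patent",
    "biomass fuel production process",
]
matrix = similarity_matrix(docs)
```

A landscape view would then place documents with high pairwise similarity close together; the layout step itself is not sketched here.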
9. Classification: User Supervised Analytics
Benefits: Finding small result sets quickly, accurately and precisely, and enabling trend reporting and alerting by reusing predefined categorization models.
1. Obtain a ranked list of the most relevant documents
2. Separate the important documents from the irrelevant documents (noise)
How it works: A list of the relevant documents defined from a user's perspective.
What KMX delivers: Use KMX to do:
1. Tag (label) a small number of relevant and irrelevant documents
– Use search to identify documents that need to be tagged
– Perform manual tagging
– Select documents interactively from the visualization (brushing)
2. Create a Classifier (categorizer) using the tagged documents
3. Automatically perform the classification on all documents
4. Obtain a ranking in which the important documents are ranked high and the irrelevant documents are ranked low
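The tag, train, classify and rank workflow above can be sketched in a few lines. The slides do not describe the learning algorithm behind the KMX Classifier, so this example swaps in a simple centroid (Rocchio-style) scorer purely as a stand-in; the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Term-count vector for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Summed term counts; cosine ignores scale, so no averaging is needed."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def rank_documents(documents, relevant_ids, irrelevant_ids):
    """Score every document against the tagged examples and sort high to low."""
    vectors = [vectorize(d) for d in documents]
    pos = centroid([vectors[i] for i in relevant_ids])
    neg = centroid([vectors[i] for i in irrelevant_ids])
    scored = [(cosine(v, pos) - cosine(v, neg), i) for i, v in enumerate(vectors)]
    return sorted(scored, reverse=True)


# Step 1: tag a small number of documents (index 0 relevant, index 1 irrelevant).
documents = [
    "asthma inhaler for respiratory inflammation",
    "engine oil filter housing",
    "respiratory inflammation treatment compound",
    "oil pump housing assembly",
]
# Steps 2-4: build the scorer, classify all documents, read off the ranking.
ranking = rank_documents(documents, relevant_ids=[0], irrelevant_ids=[1])
ranked_ids = [doc_id for _, doc_id in ranking]
```

Here the untagged document about respiratory inflammation ranks directly below the tagged relevant example, and the untagged oil pump document ranks near the bottom.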
10. Visualization: Discovering Unexpected Insights
Benefits: KMX visualizations support the process of constructing a visual image in the mind to understand the data better.
How it works: KMX offers a visualization framework with various methods for seeing the unseen. It enriches the process of discovery and fosters profound and unexpected insights.
What KMX delivers: Different visualizations or visual pipelines to:
• Comprehend datasets that are too large to grasp by mental imagination
• Discover previously unknown properties of the data set that may not have been anticipated
• Reveal inherent problems of the data, for instance errors and artefacts
• Examine large-scale features of the dataset as well as local features, or see local features within a larger-scale reference
• Let users form hypotheses based on the (newly) observed phenomena or developed insights
11. Add-on servers: Auto Reporting & Batch Classification
• Auto Reporting Server
– Support automated analysis for aggregated results for multiple users
– Pie & bar charts
– Landscape visualizations for overview of subjects
– Enabling rich interaction via web interface
• Classification Batch Server
– High-performance stand-alone text-classification server
– Enables large scale parallel processing
12. Business Value from Content with KMX
• Text Analytics for Anyone and Everyone – Intuitive to use and learn. Designed for every user: business (info consumers) and scientific (info creators).
• Instant Business Insights – Explore all of your unstructured data (text, blogs, email, patents) without limits.
• Rapid Time to Value – Adaptable and customizable to users' needs. No implementation or extensive and expensive modelling or development. Significantly less training and tuning.
• Any size deployment – Meets every business need from a single user to large multilevel type user groups.
• Language independent – Search and analyze most of the world’s languages using machine translation.
• Any kind of deployment – Use it from your desktop or in a (private) cloud. Buy the software-as-a-service or get the output-as-a-service.
• Enterprise-proven, IP & IT friendly – Successfully delivering value to IP, business and markets in multinational companies.
• Integration – Use the KMX API to increase the value of unstructured data in your IP discovery infrastructure
13. Part 2: KMX software: User Interface, key functions & value
14. KMX: Model, Analyse, Discover and Visualize in one view and deploy it to large scale
Screenshot labels: Search and highlighting; Brushing; Filtering; Document text; Landscape visualization; Coloring of classification score
KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’
15. KMX: Optimize Output using Classification Performance Tuning
Screenshot labels: Precision and Recall; Document classification for three classes; Distribution of classification scores
16. Use Case 1: Performing small to large scale SWOT analysis (on AstraZeneca patents)
SWOT analysis example
Start with removing irrelevant patents using Classification and Filtering to determine:
• Who are the important players (assignees, inventors)?
• Where are the important patents filed (countries)?
• What is the trend over time (growth of patents over the years)?
• NB: we used a (very) simple query to find 986 patents filed under AstraZeneca.
Diagram labels: Patent Database; Queries; +10.000 patents; Ranking, Filtering; 986 patents; Ranking, Filtering; 29 patents; Ranking, Filtering; Output; Business User
17. Landscaping and Ranking: From 986 to the most relevant patents
Fig: Using visual selection (brushing) to build a classification model (Classifier) that ranks the full data set and extracts the most relevant patents.
18. Landscaping and Ranking: What are the most relevant Respiratory & Inflammation patents?
Yellow = most important patents (+80% score)
Blue = least relevant patents (for this analysis)
NB: crosshair points to 1 specific patent (full text in left pane)
Fig: Ranked patents using a Classifier for Respiratory & Inflammation patents (in yellow, the selection of 29 absolutely relevant patents to be further analyzed). We used ‘respiratory’ to demonstrate highlighting capabilities.
19. How Reliable & Accurate are the results?
Review your results with advanced performance tools
The quality of the automatic classification (categorization) is shown in the histogram, where a small number of documents with a high classification score are separated from the large number of documents.
Fig: Classification performance 1280 patents on ‘biomass’ (histogram labels: Non relevant documents, Relevant documents)
KMX calculates the Precision and Recall of the results using cross validation.
• Precision is essential for: First analysis & Alerting services
• Recall is crucial for: Freedom to Operate search, Validity search, Patentability search
• Both need to be high for: Patent portfolio landscape analysis, Technology Exploration, Risk Assessments
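As a reminder of what the two measures mean: precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The sketch below computes both and shows one simple way to split data into cross-validation folds. How KMX performs its cross validation is not described in the slides, so the fold scheme and all names here are assumptions.

```python
def precision_recall(relevant, retrieved):
    """Precision = TP / retrieved, recall = TP / relevant."""
    relevant, retrieved = set(relevant), set(retrieved)
    true_positives = len(relevant & retrieved)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def k_fold_splits(n_items, k):
    """Yield (train, test) index lists; item i is held out in fold i % k."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        train = [i for i in indices if i % k != fold]
        yield train, test


# Documents 1-4 are truly relevant; the classifier returned 3, 4 and 5.
precision, recall = precision_recall(relevant={1, 2, 3, 4}, retrieved={3, 4, 5})

# In cross validation the same measures are computed on each held-out fold.
splits = list(k_fold_splits(n_items=6, k=3))
```

In this toy case two of the three retrieved documents are relevant (precision 2/3) and two of the four relevant documents were found (recall 1/2), which shows why a recall-critical task such as a Freedom to Operate search needs a different tuning than an alerting service.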
20. Use Case 2: Concept detection using document classification
Extracting concepts in context from classification of documents
1. Visualization → multiple topic clusters
2. Select cluster → select documents with similar topics
3. Select training documents within the sub-cluster
4. Build Classifier and classify
5. Rank documents → find set of documents with related concepts
6. Extract concepts
KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’
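Step 6 above leaves open how concepts are extracted from the top-ranked documents. One simple possibility, shown here only as an assumption and not as the KMX method, is to pick the terms that appear in a larger share of the selected documents than of the remaining ones.

```python
import re
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each term occurs."""
    counts = Counter()
    for doc in docs:
        counts.update(set(re.findall(r"[a-z]+", doc.lower())))
    return counts


def extract_concepts(selected_docs, other_docs, top_n=2):
    """Terms whose document share is highest in the selected set vs. the rest."""
    selected = document_frequency(selected_docs)
    others = document_frequency(other_docs)
    n_sel = max(len(selected_docs), 1)
    n_oth = max(len(other_docs), 1)
    scores = {t: selected[t] / n_sel - others[t] / n_oth for t in selected}
    # Sort by descending score, then alphabetically to keep the order stable.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_n]


# Illustrative documents: the first list stands for the top-ranked set.
top_ranked = ["sars virus outbreak in hospital", "ebola virus outbreak response"]
remaining = ["hospital budget report", "annual budget planning"]
concepts = extract_concepts(top_ranked, remaining)
```

With these sample documents the shared terms "outbreak" and "virus" come out on top, while "hospital" is discounted because it also occurs outside the selected set.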
21. Part 3: NEW: Content Dashboard (InfoApp)
Integrated SAAS based search, reporting, visualization and analysis
22. Role of KMX in Integrated Information Applications
Diagram labels:
– Information Consumers (+100 users): Client/Server, Mobile, Web; Reporting, Dashboard, Search, Alerting, Visualization, Exploring
– Domain or Market Specific InfoApps (by Partners)
– Management, Development and Integration
– Creators/Data Scientists (1-5 users): Text Mining, Text Prep, Stem/Token, Indexing, Clustering, Classification, Visualize
– Sources: Tweets, Documents, Patent Data, Research Literature, Enterprise Content, Email, Text, Websites
23. Content Dashboard: Content Driven Analytical solution
Easy-to-use access to Search, Reporting & Analysis of content like Patents, Emails, Legislation, Application Notes, websites
24. Content Dashboard: Content analytics beyond key-word search
Interactive taxonomy with multiple coupled views and advanced search in large sets of documents
25. Content Dashboard: Built in analytics & interactive visualizations
Ad-hoc or Standard interactive visualizations leading directly to the underlying documents or notes
26. Part 4: NEW: KMX API for OEM partners:
Put best in class content analytics in your solutions
27. Solutions built on KMX
KMX Empowers InfoApps
(solution partners/OEM/VAR)
Partner solutions:
• IP & Patent Analytics
• Media & Publishing
• HR
• eDiscovery (Law & Legislation)
• Fraud Detection
• National Security & Police
• Sentiment analytics
• CRM/Voice of Customer
• Government
• Sharepoint (Enrich & Migrate)
• Content-based Dashboards
KMX platform
Big Data Text Analytics
(cloud based platform / API)
Fig 1. McKinsey diagram showing the three technology layers of the Big
Data technology stack
28. KMX API for OEM: Embed Advanced Text Analytics in your solution
Clustering
Provides users unsupervised analytics and automatically identifies inherent themes or information clusters. Through a dynamic hierarchical topic view into search results it enables users to quickly focus on annotated subjects rather than scrolling through long results lists.
Classification
Supervised analytics to help users automatically categorize large sets of documents. The Classification process can use a small number of document sets for learn-by-example categorization. By sorting the content of documents by topic, relevancy and keywords users can apply their own models or rules for classification. Terms can be extracted to use in building thesauri or taxonomies.
Visualization
Advanced visual knowledge discovery for displaying, exporting and sharing data results, ranked document lists, labeled and enriched data or interactive visualizations.
KMX API
• XML-RPC and REST (JSON)
• Python Pickle protocol
Server:
• User / Tenant mgt
• User objects mgt (datasets, work spaces, classifiers, stop lists, ...)
• Databases: Oracle, PostgreSQL
Client Application:
• Native Windows (for creating Analysis pipelines)
• Using Qt for GUI
• Using OpenGL for visualizations
Example Application Areas
Advanced Visualizations, Interactive Analytics, Text Disambiguation, Data Enrichment, Clickthrough Optimization, Concept Extraction, Automated Tagging, Semantic Discovery, Named Entity Recognition, Document Overlap Display, SWOT analysis, Sentiment Analysis, Predictive Analytics
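The slide names XML-RPC and REST (JSON) as KMX API transports but gives no method names, endpoints or parameters. The sketch below therefore only shows what a request looks like in each encoding, using Python's standard library; the method name cluster_documents and the payload layout are hypothetical and not taken from KMX documentation.

```python
import json
import xmlrpc.client

# Hypothetical method name and parameters, invented for illustration only.
METHOD = "cluster_documents"
documents = ["first document text", "second document text"]

# XML-RPC: the call is serialized as an XML <methodCall> body.
xmlrpc_body = xmlrpc.client.dumps((documents,), methodname=METHOD)

# REST (JSON): the same request expressed as a JSON payload.
json_body = json.dumps({"method": METHOD, "documents": documents})

# Decode both encodings again to confirm they carry the same data.
params, method = xmlrpc.client.loads(xmlrpc_body)
decoded = json.loads(json_body)
```

An integrating application would send such a body to the server over HTTP; the actual URLs, authentication and method set would have to come from the vendor's API documentation.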
29. KMX enables information and knowledge professionals to gain faster, more reliable, more precise insights into large, complex unstructured data sets, allowing them to make better-informed decisions.
Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization