Introducing Treparel:
Big Data Text Analytics & Visualization applications

Treparel
Delftechpark 26
2628 XH Delft
The Netherlands
www.treparel.com

Jeroen Kleinhoven
CEO
jeroen@treparel.com

February, 2014
Industry Thought Leaders about Treparel

“Treparel KMX’s visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.” David Schubmehl, IDC

“As we acquire more and more information, we need tools that will guide us through the data maze. Analysts need tools to help them understand patterns and define clusters. Users need to explore data to uncover relationships from scattered sources. Treparel’s KMX serves both these needs with its ability to cluster and categorize collections of data with a high degree of accuracy, and its interactive visualization tools that enable exploration of large data sets.” Sue Feldman, Synthexis.com (author: The Answer Machine)

Treparel KMX – All Rights Reserved 2013

www.treparel.com

Some of our clients & partners

KMX is an integral part of our IP analysis toolbox. It contributes to our capability of making added value IP analyses of technologies and competitors to support strategic decision making.

“We’ve sped up our patent searches from 2 days to 2 hours using KMX technology”

www.fusepool.eu

Key Business Problems Treparel KMX solves

Application Area: IP & Patent Search
Business problem: How to improve the time-consuming and costly manual search process for patents.
Value: Reduce research time, improve precision & recall of relevant documents. Improve legal position and drive more revenue from IP.

Application Area: Competitive Analysis
Business problem: How to increase knowledge on competitors by gaining clustered insights from (semi-)public sources.
Value: Improve competitive advantage by determining international strategy, product roadmap, R&D planning, marketing campaigns and customer sentiment.

Application Area: Healthcare
Business problem: How to identify health risks and find correlations in diseases or medical defects.
Value: Early identification of health risks by cross-discipline analyses on medical records, clinical observations and medical images.

Application Area: Media & Publishing
Business problem: How to improve search and content analytics on large volumes of publications.
Value: Text analytics embedded in publishing improves relevance and accuracy of search and shows previously hidden documents.

Key Business Problems Treparel KMX solves - 2

Use Case: Sentiment Analysis
Business problem: How to manage current and future customers and their interactions.
Value: Deriving sentiment from critical customer-based text sources can drive revenue, satisfaction and loyalty.

Use Case: Voice of Customer
Business problem: How to manage communications and interactions with employees, managers, subordinates and employment candidates.
Value: Analyzing HR-related information (like CVs and projects) to match demand to supply.

Use Case: eDiscovery
Business problem: How to manage and mitigate general litigation risk and cost in large sets of text and emails.
Value: Text analytics applied to legal trials or in laws and jurisprudence improves accuracy in legal cases and lowers costs.

Use Case: Predictive Analysis
Business problem: How to identify early signs of required maintenance that affect customer satisfaction and operational costs.
Value: Use customer satisfaction surveys on food quality to identify airplane ovens requiring maintenance tune-ups.

Part 1:
KMX: Ready to Use Text Analytics
Intuitive Content Clustering, Classification & Visualization

KMX Text Analytics Application overview

Diagram components: Query & Search Tools; Acquire documents; Text Preprocessing and Indexing; Clustering; Classification; Visualization; Semantic Analysis; Taxonomies, Ontologies; Present Results

KMX unique functions:
•  Extract concepts in context using clustering and classification of documents
•  Use classification to create ranked lists and to tag subsets
•  Support of binary and multi-class Classification
•  Enterprise edition (server/cloud) & Professional edition (desktop)
•  Integration with other applications through KMX API

Clustering: User Unsupervised Analytics
Benefits: Get quick insights through automated visual clusters with annotations to enhance the discovery process
1.  Analyze the clusters and the relationships in the data
2.  Explore outliers in the data
3.  Find documents of interest
What it does: A visualization of clusters where the documents are displayed as points and the distance between them shows their similarity.

What KMX delivers: Use KMX to do:
1.  Perform text preprocessing (stemming/tokenization etc.)
2.  Calculate a similarity measure between all documents
3.  Calculate visualization (landscape) with automatic annotation
4.  Create the visualization
–  As a static image
–  Or provide interaction where the user can zoom in/out with support for adaptive annotation

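The first two delivery steps above (preprocessing, then a similarity measure between all documents) can be sketched in plain Python. This is only an illustration under stated assumptions: the slides do not say which similarity measure or grouping method KMX uses, so TF-IDF weighting, cosine similarity and threshold-based grouping are stand-ins, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep letter runs (a stand-in for real stemming/tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({t: c * math.log((1 + n) / (1 + df[t])) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either one is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(docs):
    """Similarity between all pairs of documents."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


def group_by_threshold(sim, threshold=0.2):
    """Assumed grouping rule: documents linked by similarity >= threshold share a label."""
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


EXAMPLE_DOCS = [
    "patent for respiratory inhaler device",
    "inhaler device for respiratory treatment",
    "biomass fuel conversion process",
    "process for biomass fuel production",
]
```

Calling `group_by_threshold(similarity_matrix(EXAMPLE_DOCS))` groups the two inhaler texts together and the two biomass texts together. Projecting the similarities to a 2D landscape (step 3) would need a layout method that the slides do not name, so it is left out of this sketch.
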
Classification: User Supervised Analytics
Benefits: Find small, accurate and precise result sets fast, and enable trend reporting and alerting by reusing predefined categorization models.
1.  Obtain a ranked list of the most relevant documents
2.  Separate the important documents from the irrelevant documents (noise)
How it works: A list of the relevant documents defined from a user's perspective.

What KMX delivers: Use KMX to do:
1.  Tag (label) a small number of relevant and irrelevant documents
–  Use search to identify documents that need to be tagged
–  Perform manual tagging
–  Select documents interactively from the visualization (brushing)
2.  Create a Classifier (categorizer) using the tagged documents
3.  Automatically perform the classification on all documents
4.  Obtain the important documents ranked high and the irrelevant documents ranked low

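The tag, train, classify and rank loop above can be illustrated with a very small learn-by-example classifier. The slides do not describe the algorithm behind the KMX Classifier, so this sketch swaps in a Rocchio-style centroid model (mean of relevant vectors minus mean of irrelevant vectors). All names and example texts are hypothetical.

```python
import math
import re
from collections import Counter


def unit_vector(text):
    """Term-frequency vector of a text, scaled to unit length."""
    tf = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in tf.values()))
    return {t: c / norm for t, c in tf.items()} if norm else {}


def centroid(vectors):
    """Average of a list of sparse vectors."""
    total = Counter()
    for vector in vectors:
        for term, weight in vector.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()}


def train(relevant_docs, irrelevant_docs):
    """Build the model from a few tagged documents (steps 1 and 2)."""
    pos = centroid([unit_vector(d) for d in relevant_docs])
    neg = centroid([unit_vector(d) for d in irrelevant_docs])
    return {t: pos.get(t, 0.0) - neg.get(t, 0.0) for t in set(pos) | set(neg)}


def score(model, doc):
    """Classification score: higher means closer to the relevant examples (step 3)."""
    return sum(w * model.get(t, 0.0) for t, w in unit_vector(doc).items())


def rank(model, docs):
    """Documents sorted from most to least relevant (step 4)."""
    return sorted(docs, key=lambda d: score(model, d), reverse=True)


model = train(
    relevant_docs=["respiratory inhaler for asthma", "inflammation of respiratory tract"],
    irrelevant_docs=["biomass fuel process", "engine fuel injection"],
)
ranked = rank(model, ["fuel cell engine", "asthma inhaler dosage",
                      "respiratory inflammation treatment"])
```

With these toy inputs, `ranked` puts the respiratory text first and the fuel cell text last. The point of the sketch is the workflow: a handful of tagged examples produce a score for every document, and sorting by that score separates relevant documents from noise.
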
Visualization: Discovering Unexpected Insights
Benefits: KMX visualizations support the process of constructing a visual image in the mind to understand the data better.
How it works: KMX offers a visualization framework with various methods for seeing the unseen. It enriches the process of discovery and fosters profound and unexpected insights.

What KMX delivers: Different visualizations or visual pipelines to:
•  Comprehend large datasets, datasets that are too large to grasp by mental imagination.
•  Discover previously unknown properties of the data set that may not have been anticipated
•  Reveal inherent problems of the data, for instance errors and artefacts
•  Examine large-scale features of the dataset as well as the local features, or allow the user to see local features in a larger-scale reference
•  Let users form hypotheses based on the (newly) observed phenomena or developed insights

Add-on servers:
Auto Reporting & Batch Classification
•  Auto Reporting Server
–  Support automated analysis for aggregated results for multiple users
–  Pie & bar charts
–  Landscape visualizations for overview of subjects
–  Enabling rich interaction via web interface

•  Classification Batch Server
–  High-performance stand-alone text-classification server
–  Enables large scale parallel processing

Business Value from Content with KMX
•  Text Analytics for Anyone and Everyone – Intuitive to use and learn. Designed for every user: business (info consumers) and scientific (info creators).
•  Instant Business Insights – Explore all of your unstructured data (text, blogs, email, patents) without limits.
•  Rapid Time to Value – Adaptable and customizable to users' needs. No implementation or extensive and expensive modelling or development. Significantly less training and tuning.
•  Any size deployment – Meets every business need from a single user to large multilevel type user groups.
•  Language independent – Search and analyze most of the world’s languages using machine translation.
•  Any kind of deployment – Use it from your desktop or in a (private) cloud. Buy the software-as-a-service or get the output-as-a-service.
•  Enterprise-proven, IP & IT friendly – Successfully delivering value to IP, business and markets in multinational companies.
•  Integration – Use the KMX API to increase the value of unstructured data in your IP discovery infrastructure

Part 2:
KMX software:
User Interface, key functions & value

KMX: Model, Analyse, Discover and Visualize in one view and deploy it to large scale
Screenshot callouts: Search and highlighting; Brushing; Filtering; Document text; Landscape visualization; Coloring of classification score
KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’
KMX: Optimize Output using Classification Performance Tuning
Screenshot callouts: Precision and Recall; Document classification for three classes; Distribution of classification scores

Use Case 1: Performing small to large scale SWOT analysis (on AstraZeneca patents)
SWOT analysis example

Start with removing irrelevant patents using Classification and Filtering to determine:

•  Who are the important players (assignees, inventors)?
•  Where are the important patents filed (countries)?
•  What is the trend over time (growth of patents over the years)?
•  NB: we used a (very) simple query to find 986 patents filed under AstraZeneca.

Diagram labels: Patent Database; Queries; +10.000 patents; Ranking and Filtering (repeated at each stage); 986 patents; 29 patents; Business User; Output

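The three questions above (players, countries, trend over time) come down to counting over the patents that survive filtering. The sketch below shows that counting step on invented records. The field names, company names and scores are hypothetical, and the 0.8 cut-off is only an assumed threshold, not a documented KMX setting.

```python
from collections import Counter

# Hypothetical patent records; every value here is made up for illustration.
PATENTS = [
    {"assignee": "Company A", "country": "US", "year": 2010, "score": 0.91},
    {"assignee": "Company A", "country": "EP", "year": 2011, "score": 0.85},
    {"assignee": "Company B", "country": "US", "year": 2011, "score": 0.40},
    {"assignee": "Company B", "country": "JP", "year": 2012, "score": 0.88},
    {"assignee": "Company A", "country": "US", "year": 2012, "score": 0.95},
    {"assignee": "Company C", "country": "EP", "year": 2012, "score": 0.10},
]


def filter_relevant(patents, min_score=0.8):
    """Drop patents whose classification score is below the cut-off."""
    return [p for p in patents if p["score"] >= min_score]


def top_players(patents, n=3):
    """Who are the important players? Count patents per assignee."""
    return Counter(p["assignee"] for p in patents).most_common(n)


def filings_by_country(patents):
    """Where are the important patents filed? Count patents per country."""
    return Counter(p["country"] for p in patents)


def trend_over_time(patents):
    """What is the trend over time? Patents per year, oldest first."""
    return sorted(Counter(p["year"] for p in patents).items())


relevant = filter_relevant(PATENTS)
```

Running the three helpers on `relevant` gives the per-assignee, per-country and per-year counts that a SWOT chart would be drawn from.
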
Landscaping and Ranking:
From 986 to the most relevant patents

Fig: Using visual selection (brushing) to build a classification model (Classifier) that ranks the full data set and extracts the most relevant patents.

Landscaping and Ranking:
What are most relevant Respiratory & Inflammation patents?

Legend:
Yellow = most important patents (+80% score)
Blue = least relevant patents (for this analysis)
NB: crosshair points to 1 specific patent (full text in left pane)

Fig: Ranked patents using a Classifier for Respiratory & Inflammation patents (in yellow the selection of 29 absolute relevant patents to be further analyzed). We used ‘respiratory’ to demonstrate highlighting capabilities.

How Reliable & Accurate are the results?
Review your results with advanced performance tools
The quality of the automatic classification (categorization) is shown in the histogram, where a small number of documents with a high classification score are separated from the large number of documents.

Fig: Classification performance 1280 patents on ‘biomass’ (histogram regions: Non relevant documents, Relevant documents)

KMX calculates the Precision and Recall of the results using cross validation.

•  Precision is essential for: First analysis & Alerting services
•  Recall is crucial for: Freedom to Operate search, Validity search, Patentability search
•  Both need to be high for: Patent portfolio landscape analysis, Technology Exploration, Risk Assessments

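To make the two measures concrete: precision is the share of documents flagged as relevant that really are relevant, and recall is the share of truly relevant documents that were flagged. The sketch below computes both with a simple k-fold cross validation. The fold assignment and the toy keyword classifier are assumptions for illustration only, since the slides do not describe how KMX runs its cross validation.

```python
def precision_recall(true_labels, predicted_labels):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def k_fold_indices(n, k):
    """Yield (train, test) index lists; item i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


def cross_validate(docs, labels, train_fn, predict_fn, k=3):
    """Predict every document with a model that never saw it during training."""
    predictions = [None] * len(docs)
    for train_idx, test_idx in k_fold_indices(len(docs), k):
        model = train_fn([docs[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = predict_fn(model, docs[i])
    return precision_recall(labels, predictions)


def train_keywords(docs, labels):
    """Toy classifier: words seen only in relevant training documents."""
    relevant, irrelevant = set(), set()
    for doc, label in zip(docs, labels):
        (relevant if label else irrelevant).update(doc.lower().split())
    return relevant - irrelevant


def predict_keywords(model, doc):
    """Flag a document as relevant if it contains any learned keyword."""
    return any(word in model for word in doc.lower().split())


DOCS = ["biomass fuel", "biomass energy", "biomass reactor",
        "car engine", "car wheel", "car door"]
LABELS = [True, True, True, False, False, False]
result = cross_validate(DOCS, LABELS, train_keywords, predict_keywords, k=3)
```

The trade-off in the bullets follows from the definitions: an alerting service wants few false alarms (precision), while a freedom-to-operate search cannot afford to miss a relevant patent (recall).
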
Use Case 2:
Concept detection using document classification
Extracting concepts in context from classification of documents
1.  Visualization → multiple topic clusters
2.  Select cluster → select documents with similar topics
3.  Select training documents within the sub-cluster
4.  Build Classifier and classify
5.  Rank documents → find set of documents with related concepts
6.  Extract concepts

KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’

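Step 6 can be pictured as finding the terms that are common in the top-ranked documents but rare elsewhere. The slides do not say how KMX extracts concepts, so the sketch below uses one simple stand-in: the difference in document frequency between the selected set and a background set. The function names and sample texts are hypothetical.

```python
import re
from collections import Counter


def doc_frequency(docs):
    """For each term, the number of documents it occurs in."""
    return Counter(t for d in docs for t in set(re.findall(r"[a-z]+", d.lower())))


def characteristic_terms(selected, background, top_n=5):
    """Terms over-represented in the selected documents compared with the background."""
    sel_df = doc_frequency(selected)
    bg_df = doc_frequency(background)
    n_sel = max(len(selected), 1)
    n_bg = max(len(background), 1)
    scores = {t: sel_df[t] / n_sel - bg_df[t] / n_bg for t in sel_df}
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, value in ranked[:top_n] if value > 0]


selected_docs = ["ebola virus outbreak", "sars virus outbreak in hospitals", "bird flu virus"]
background_docs = ["patent for engine", "scanner software", "engine fire"]
concepts = characteristic_terms(selected_docs, background_docs, top_n=2)
```

Here `concepts` holds the two terms that best separate the selected documents from the rest. A production system would likely add stemming, stop-word lists and multi-word phrases on top of this.
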
Part 3:
NEW: Content Dashboard (InfoApp)
Integrated SAAS based search, reporting, visualization and analysis

 
Role of KMX in Integrated Information Applications

Diagram labels, top to bottom:
Information Consumers (+100 users): Client/Server, Mobile, Web; Reporting, Dashboard, Search, Alerting, Visualization, Exploring
Domain or Market Specific InfoApps (by Partners)
Management, Development and Integration
Creators/Data Scientists (1-5 users): Text Mining (Text PreP, Stem/Token, Indexing, Clustering, Classification, Visualize)
Content sources: Tweets, Documents, Patent Data, Research Literature, Enterprise Content, Email, Text, Websites

Content Dashboard:
Content Driven Analytical solution

Easy-to-use access to Search, Reporting & Analysis of content like Patents, Emails, Legislation, Application Notes, websites

Content Dashboard:
Content analytics beyond key-word search

Interactive taxonomy with multiple coupled views and advanced search in large sets of documents

Content Dashboard:
Built in analytics & interactive visualizations

Ad-hoc or Standard interactive visualizations leading directly to the underlying documents or notes

Part 4:
NEW: KMX API for OEM partners:
Put best in class content analytics in your solutions

Solutions built on KMX
KMX Empowers InfoApps
(solution partners/OEM/VAR)

Partner solutions:
•  IP & Patent Analytics
•  Media & Publishing
•  HR
•  eDiscovery (Law & Legislation)
•  Fraud Detection
•  National Security & Police
•  Sentiment analytics
•  CRM/Voice of Customer
•  Government
•  Sharepoint (Enrich & Migrate)
•  Content-based Dashboards

KMX platform
Big Data Text Analytics
(cloud based platform / API)

Fig 1. McKinsey diagram showing the three technology layers of the Big
Data technology stack

KMX API for OEM:
Embed Advanced Text Analytics in your solution

Clustering
Provides users unsupervised analytics and automatically identifies inherent themes or information clusters. Through a dynamic hierarchical topic view into search results it enables users to quickly focus on annotated subjects rather than scrolling through long results lists.

Classification
Supervised analytics to help users automatically categorize large sets of documents. The Classification process can use a small number of document sets for learn-by-example categorization. By sorting the content of documents by topic, relevancy and keywords users can apply their own models or rules for classification. Terms can be extracted to use in building thesauri or taxonomies.

Visualization
Advanced visual knowledge discovery for displaying, exporting and sharing data results, ranked document lists, labeled and enriched data or interactive visualizations.

KMX API
XML-RPC and REST (JSON)
Python Pickle protocol

Server: User / Tenant mgt
User objects mgt (datasets, work spaces, classifiers, stop lists, ...)
Databases: Oracle, PostgreSQL

Client Application:
Native Windows (for creating Analysis pipelines)
Using Qt for GUI
Using OpenGL for visualizations

Example Application Areas
Advanced Visualizations, Interactive Analytics, Text Disambiguation, Data Enrichment, Clickthrough Optimization, Concept Extraction, Automated Tagging, Semantic Discovery, Named Entity Recognition, Document Overlap Display, SWOT analysis, Sentiment Analysis, Predictive Analytics
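
Since the slide lists XML-RPC as one of the API protocols, the sketch below shows what an XML-RPC round trip looks like from Python's standard library. It does not use the real KMX API: the method name `classify`, its arguments and its return value are invented, and a small stand-in server on localhost plays the role of the remote service so the example runs on its own.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def classify_stub(texts):
    """Hypothetical server-side method: one made-up relevance score per text."""
    return [1.0 if "patent" in text.lower() else 0.0 for text in texts]


def run_demo():
    """Start a local stand-in server, call it over XML-RPC, and return the scores."""
    # Port 0 asks the operating system for any free port.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(classify_stub, "classify")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # An integration would point this proxy at the vendor's endpoint instead.
        client = ServerProxy(f"http://127.0.0.1:{port}/")
        return client.classify(["A patent on inhalers", "Lunch menu"])
    finally:
        server.shutdown()
        server.server_close()
```

The appeal of XML-RPC for an OEM integration is visible in `run_demo`: the remote call reads like a local function call, and lists of strings and floats are marshalled automatically.
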
KMX enables information and knowledge professionals
to gain faster, reliable, more precise insights in large
complex unstructured data sets allowing them to make
better informed decisions.

Treparel is a leading technology solution provider in
Big Data Text Analytics & Visualization

Contenu connexe

Tendances

IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...Dr. Haxel Consult
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesSơn Còm Nhom
 
Data Mining : Concepts
Data Mining : ConceptsData Mining : Concepts
Data Mining : ConceptsPragya Pandey
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data miningDevakumar Jain
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cyclehktripathy
 
Data Mining With Excel 2007 And SQL Server 2008
Data Mining With Excel 2007 And SQL Server 2008Data Mining With Excel 2007 And SQL Server 2008
Data Mining With Excel 2007 And SQL Server 2008Mark Tabladillo
 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar reportmayurik19
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data miningHadi Fadlallah
 
Data analytics beyond data processing and how it affects Industry 4.0
Data analytics beyond data processing and how it affects Industry 4.0Data analytics beyond data processing and how it affects Industry 4.0
Data analytics beyond data processing and how it affects Industry 4.0Mathieu d'Aquin
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and workAmr Abd El Latief
 
Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CSThanveen
 
Key Principles Of Data Mining
Key Principles Of Data MiningKey Principles Of Data Mining
Key Principles Of Data Miningtobiemuir
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining ConceptsDung Nguyen
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyushastronish
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and predictionDataminingTools Inc
 

Tendances (19)

Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
IC-SDV 2019: Down-to-earth machine learning: What you always wanted your data...
 
Lecture1
Lecture1Lecture1
Lecture1
 
Introduction to Datamining Concept and Techniques
Introduction to Datamining Concept and TechniquesIntroduction to Datamining Concept and Techniques
Introduction to Datamining Concept and Techniques
 
Data Mining : Concepts
Data Mining : ConceptsData Mining : Concepts
Data Mining : Concepts
 
Knowledge discovery thru data mining
Knowledge discovery thru data miningKnowledge discovery thru data mining
Knowledge discovery thru data mining
 
SciBite
SciBiteSciBite
SciBite
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
 
Data Mining With Excel 2007 And SQL Server 2008
Data Mining With Excel 2007 And SQL Server 2008Data Mining With Excel 2007 And SQL Server 2008
Data Mining With Excel 2007 And SQL Server 2008
 
Data mining seminar report
Data mining seminar reportData mining seminar report
Data mining seminar report
 
Introduction to Data mining
Introduction to Data miningIntroduction to Data mining
Introduction to Data mining
 
Data analytics beyond data processing and how it affects Industry 4.0
Data analytics beyond data processing and how it affects Industry 4.0Data analytics beyond data processing and how it affects Industry 4.0
Data analytics beyond data processing and how it affects Industry 4.0
 
Testing
TestingTesting
Testing
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Additional themes of data mining for Msc CS
Additional themes of data mining for Msc CSAdditional themes of data mining for Msc CS
Additional themes of data mining for Msc CS
 
Key Principles Of Data Mining
Key Principles Of Data MiningKey Principles Of Data Mining
Key Principles Of Data Mining
 
Data Mining Concepts
Data Mining ConceptsData Mining Concepts
Data Mining Concepts
 
Data Warehouse By Piyush
Data Warehouse By PiyushData Warehouse By Piyush
Data Warehouse By Piyush
 
Data mining: Classification and prediction
Data mining: Classification and predictionData mining: Classification and prediction
Data mining: Classification and prediction
 

Similaire à 2014: Treparel Big Data Text Analytics & Visualization

Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...Cre-Aid
 
Using a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricUsing a Semantic and Graph-based Data Catalog in a Modern Data Fabric
Using a Semantic and Graph-based Data Catalog in a Modern Data FabricCambridge Semantics
 
Driving Customer Loyalty with Azure Machine Learning
Driving Customer Loyalty with Azure Machine LearningDriving Customer Loyalty with Azure Machine Learning
Driving Customer Loyalty with Azure Machine LearningCCG
 
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...
Enterprise Search White Paper: Beyond the Enterprise Data Warehouse - The Eme...Findwise
 
Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel lt innovate summit june 27, 2013
Treparel lt innovate summit june 27, 2013Treparel
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...PwC
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
Health Plan Survey Paper
Health Plan Survey PaperHealth Plan Survey Paper
Health Plan Survey PaperLisa Olive
 
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...
IRJET- Towards Efficient Framework for Semantic Query Search Engine in Large-...IRJET Journal
 
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...IJTET Journal
 
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIAugmentation, Collaboration, Governance: Defining the Future of Self-Service BI
Augmentation, Collaboration, Governance: Defining the Future of Self-Service BIDenodo
 
lawTechCamp - Knowledge Management Panel
lawTechCamp - Knowledge Management PanellawTechCamp - Knowledge Management Panel
lawTechCamp - Knowledge Management Panellawtechcamp
 
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...
Cis 555 Week 4 Assignment 2 Automated Teller Machine (Atm)...Karen Thompson
 
Futuristic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaFuturistic knowledge management ppt bec bagalkot mba
Futuristic knowledge management ppt bec bagalkot mbaBabasab Patil
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemSemantic Web Company
 
Building Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated DiscoveryBuilding Data Ecosystems for Accelerated Discovery
Building Data Ecosystems for Accelerated Discoveryadamkraut
 
Search Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignSearch Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignMarianne Sweeny
 

2014: Treparel Big Data Text Analytics & Visualization

  • 1. Introducing Treparel: Big Data Text Analytics & Visualization applications. Treparel, Delftechpark 26, 2628 XH Delft, The Netherlands, www.treparel.com. Jeroen Kleinhoven, CEO, jeroen@treparel.com. February 2014.
  • 2. Industry Thought Leaders about Treparel
    “Treparel KMX’s visualization capabilities around its auto-categorization and clustering offer immediate insight into unstructured data sets and appear to be adaptable and customizable to customer needs. Its approach to auto-categorization utilizes statistical principles and machine learning that require significantly less training and tuning on the part of customers than other approaches.” David Schubmehl, IDC
    “As we acquire more and more information, we need tools that will guide us through the data maze. Analysts need tools to help them understand patterns and define clusters. Users need to explore data to uncover relationships from scattered sources. Treparel’s KMX serves both these needs with its ability to cluster and categorize collections of data with a high degree of accuracy, and its interactive visualization tools that enable exploration of large data sets.” Sue Feldman, Synthexis.com (author: The Answer Machine)
    Treparel KMX – All Rights Reserved 2013, www.treparel.com
  • 3. Some of our clients & partners
    “KMX is an integral part of our IP analysis toolbox. It contributes to our capability of making added value IP analyses of technologies and competitors to support strategic decision making.”
    “We’ve sped up our patent searches from 2 days to 2 hours using KMX technology”
    www.fusepool.eu
  • 4. Key Business Problems Treparel KMX solves (application area, business problem, value)
    – IP & Patent Search. Business problem: how to improve the time-consuming and costly manual search process for patents. Value: reduce research time and improve precision and recall of relevant documents; improve legal position and drive more revenue from IP.
    – Competitive Analysis. Business problem: how to increase knowledge of competitors by gaining clustered insights from (semi-)public sources. Value: improve competitive advantage by determining international strategy, product roadmap, R&D planning, marketing campaigns and customer sentiment.
    – Healthcare. Business problem: how to identify health risks and find correlations in diseases or medical defects. Value: early identification of health risks by cross-discipline analyses of medical records, clinical observations and medical images.
    – Media & Publishing. Business problem: how to improve search and content analytics on large volumes of publications. Value: text analytics embedded in publishing improves relevance and accuracy of search and shows previously hidden documents.
  • 5. Key Business Problems Treparel KMX solves - 2 (use case, business problem, value)
    – Sentiment Analysis. Business problem: how to manage current and future customers and their interactions. Value: deriving sentiment from critical customer-based text sources can drive revenue, satisfaction and loyalty.
    – Voice of Customer. Business problem: how to manage communications and interactions with employees, managers, subordinates and employment candidates. Value: analyzing HR-related information (like CVs and projects) to match demand to supply.
    – eDiscovery. Business problem: how to manage and mitigate general litigation risk and cost in large sets of text and emails. Value: text analytics applied to legal trials or in laws and jurisprudence improves accuracy in legal cases and lowers costs.
    – Predictive Analysis. Business problem: how to identify early signs of required maintenance that affect customer satisfaction and operational costs. Value: use customer satisfaction surveys on food quality to identify airplane ovens requiring maintenance tune-ups.
  • 6. Part 1: KMX: Ready to Use Text Analytics. Intuitive Content Clustering, Classification & Visualization
  • 7. KMX Text Analytics Application overview
    Diagram labels: Query & Search Tools; Acquire documents; Text Preprocessing and Indexing; Clustering; Classification; Visualization; Semantic Analysis; Taxonomies, Ontologies; Present Results
    KMX unique functions:
    • Extract concepts in context using clustering and classification of documents
    • Use classification to create ranked lists and to tag subsets
    • Support of binary and multi-class Classification
    • Enterprise edition (server/cloud) & Professional edition (desktop)
    • Integration with other applications through KMX API
  • 8. Clustering: User Unsupervised Analytics
    Benefits: Get quick insights through automated visual clusters with annotations to enhance the discovery process
    1. Analyze the clusters and the relationships in the data
    2. Explore outliers in the data
    3. Find documents of interest
    What it does: A visualization of clusters where the documents are displayed as points and the distance between them shows their similarity.
    What KMX delivers: Use KMX to do:
    1. Perform text preprocessing (stemming, tokenization, etc.)
    2. Calculate a similarity measure between all documents
    3. Calculate the visualization (landscape) with automatic annotation
    4. Create the visualization
       – As a static image
       – Or provide interaction where the user can zoom in/out with support for adaptive annotation
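    Steps 1 and 2 above (preprocessing, then a similarity measure between all documents) can be sketched as follows. This is a minimal illustration that assumes plain term-frequency vectors and cosine similarity; the slides do not say which preprocessing or similarity measure KMX uses, and the function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a crude stand-in for stemming/tokenization)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_vector(text):
    """Sparse term-frequency vector for one document."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(docs):
    """Pairwise similarity between all documents (assumed measure: cosine)."""
    vectors = [tf_vector(d) for d in docs]
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Hypothetical sample documents, for illustration only.
docs = [
    "Ebola virus outbreak in West Africa",
    "SARS virus outbreak in Asia",
    "Patent filing for an airplane oven",
]
sim = similarity_matrix(docs)
```

    A landscape visualization would then place documents with high pairwise similarity close together; that projection step is not shown here.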
  • 9. Classification: User Supervised Analytics
    Benefits: Finding small result sets quickly, accurately and precisely, and enabling trend reporting and alerting by reusing predefined categorization models.
    1. Obtain a ranked list of the most relevant documents
    2. Separate the important documents from the irrelevant documents (noise)
    How it works: A list of the relevant documents defined from a user's perspective.
    What KMX delivers: Use KMX to do:
    1. Tag (label) a small number of relevant and irrelevant documents
       – Use search to identify documents that need to be tagged
       – Perform manual tagging
       – Select documents interactively from the visualization (brushing)
    2. Create a Classifier (categorizer) using the tagged documents
    3. Automatically perform the classification on all documents
    4. Obtain the important documents ranked high and the irrelevant documents ranked low
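    The tag, train, classify and rank workflow above can be sketched in a few lines. This is only an illustration under stated assumptions: it swaps in a simple centroid-based scorer (similarity to the relevant examples minus similarity to the irrelevant ones), which is not claimed to be the classifier KMX uses, and all names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def vec(text):
    """Sparse term-frequency vector for one document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train(tagged):
    """Build a toy classifier from (text, is_relevant) pairs: one summed vector per tag."""
    relevant, irrelevant = Counter(), Counter()
    for text, is_relevant in tagged:
        (relevant if is_relevant else irrelevant).update(vec(text))
    return relevant, irrelevant


def rank(classifier, docs):
    """Score every document and return (score, text) pairs, highest score first."""
    relevant, irrelevant = classifier
    scored = [(cosine(vec(d), relevant) - cosine(vec(d), irrelevant), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Step 1: a small number of tagged documents (hypothetical examples).
tagged = [
    ("inhaler for asthma and respiratory inflammation", True),
    ("treatment of respiratory disease", True),
    ("packaging machine for tablets", False),
]
# Steps 2-4: build the classifier, classify all documents, read off the ranking.
corpus = [
    "tablet packaging line",
    "compound for treating respiratory inflammation",
    "asthma inhaler device",
]
ranking = rank(train(tagged), corpus)
```

    Documents near the top of `ranking` resemble the tagged relevant examples; the packaging document shares vocabulary only with the irrelevant example and sinks to the bottom.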
  • 10. Visualization: Discovering Unexpected Insights
    Benefits: KMX visualizations support the process of constructing a visual image in the mind to understand the data better.
    How it works: KMX offers a visualization framework with various methods for seeing the unseen. It enriches the process of discovery and fosters profound and unexpected insights.
    What KMX delivers: Different visualizations or visual pipelines to:
    • Comprehend large datasets that are too large to grasp by mental imagination.
    • Discover previously unknown properties of the data set that may not have been anticipated
    • Reveal inherent problems of the data, for instance errors and artefacts
    • Examine large-scale features of the dataset as well as local features, or see local features in a larger-scale reference
    • Let users form hypotheses based on the (newly) observed phenomena or developed insights
  • 11. Add-on servers: Auto Reporting & Batch Classification
    • Auto Reporting Server
      – Supports automated analysis for aggregated results for multiple users
      – Pie & bar charts
      – Landscape visualizations for overview of subjects
      – Enables rich interaction via web interface
    • Classification Batch Server
      – High-performance stand-alone text-classification server
      – Enables large scale parallel processing
  • 12. Business Value from Content with KMX
    • Text Analytics for Anyone and Everyone: Intuitive to use and learn. Designed for every user: business (info consumers) and scientific (info creators).
    • Instant Business Insights: Explore all of your unstructured data (text, blogs, email, patents) without limits.
    • Rapid Time to Value: Adaptable and customizable to users' needs. No implementation or extensive and expensive modelling or development. Significantly less training and tuning.
    • Any size deployment: Meets every business need from a single user to large multilevel user groups.
    • Language independent: Search and analyze most of the world’s languages using machine translation.
    • Any kind of deployment: Use it from your desktop or in a (private) cloud. Buy the software-as-a-service or get the output-as-a-service.
    • Enterprise-proven, IP & IT friendly: Successfully delivering value to IP, business and markets in multinational companies.
    • Integration: Use the KMX API to increase the value of unstructured data in your IP discovery infrastructure
  • 13. Part 2: KMX software: User Interface, key functions & value
  • 14. KMX: Model, Analyse, Discover and Visualize in one view and deploy it to large scale
    Screenshot labels: Search and highlighting; Brushing; Filtering; Document text; Landscape visualization; Coloring of classification score
    KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’
  • 15. KMX: Optimize Output using Classification Performance Tuning
    Screenshot labels: Precision and Recall; Document classification for three classes; Distribution of classification scores
  • 16. Use Case 1: Performing small to large scale SWOT analysis (on AstraZeneca patents)
    SWOT analysis example. Start by removing irrelevant patents using Classification and Filtering to determine:
    • Who are the important players (assignees, inventors)?
    • Where are the important patents filed (countries)?
    • What is the trend over time (growth of patents over the years)?
    • NB: we used a (very) simple query to find 986 patents filed under AstraZeneca.
    Funnel diagram: Patent Database and Queries (10,000+ patents), then repeated Ranking and Filtering steps (986 patents, then 29 patents), ending at Business User and Output.
  • 17. Landscaping and Ranking: From 986 to the most relevant patents
    Fig: Using visual selection (brushing) to build a classification model (Classifier) that ranks the full data set and extracts the most relevant patents.
  • 18. Landscaping and Ranking: What are the most relevant Respiratory & Inflammation patents?
    Yellow = most important patents (+80% score). Blue = least relevant patents (for this analysis). NB: the crosshair points to 1 specific patent (full text in left pane).
    Fig: Ranked patents using a Classifier for Respiratory & Inflammation patents (in yellow, the selection of 29 absolutely relevant patents to be further analyzed). We used ‘respiratory’ to demonstrate highlighting capabilities.
  • 19. How Reliable & Accurate are the results? Review your results with advanced performance tools
    The quality of the automatic classification (categorization) is shown in the histogram, where a small number of documents with a high classification score are separated from the large number of remaining documents.
    Fig: Classification performance, 1280 patents on ‘biomass’ (histogram labels: Non relevant documents; Relevant documents)
    KMX calculates the Precision and Recall of the results using cross validation.
    • Precision is essential for: First analysis & Alerting services
    • Recall is crucial for: Freedom to Operate search, Validity search, Patentability search
    • Both need to be high for: Patent portfolio landscape analysis, Technology Exploration, Risk Assessments
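    Precision is the share of documents flagged as relevant that really are relevant; recall is the share of truly relevant documents that were flagged. The sketch below shows one way to estimate both with k-fold cross validation. It is an illustration only: the fold scheme, the toy keyword classifier and the sample data are assumptions, not a description of how KMX computes these figures.

```python
def precision_recall(predicted, actual):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(samples, labels, fit, predict, k=3):
    """Hold out fold i % k in turn, train on the rest, and pool the held-out predictions."""
    predicted = [False] * len(samples)
    for fold in range(k):
        train = [i for i in range(len(samples)) if i % k != fold]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        for i in range(len(samples)):
            if i % k == fold:
                predicted[i] = predict(model, samples[i])
    return precision_recall(predicted, labels)


# Toy classifier (hypothetical): remember words seen only in relevant training texts.
def fit(texts, labels):
    relevant, irrelevant = set(), set()
    for text, label in zip(texts, labels):
        (relevant if label else irrelevant).update(text.split())
    return relevant - irrelevant


def predict(model, text):
    return any(word in model for word in text.split())


# Hypothetical labelled sample; True marks a relevant document.
samples = [
    "biomass gasification reactor",
    "steel bridge cable",
    "biomass pellet boiler",
    "cable car brake",
    "algae fuel cell",
    "bridge deck steel",
]
labels = [True, False, True, False, True, False]
precision, recall = cross_validate(samples, labels, fit, predict, k=3)
```

    On this toy sample the classifier never flags an irrelevant document (high precision) but misses the one relevant document that shares no vocabulary with the others (lower recall), which is the trade-off the bullets above describe.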
  • 20. Use Case 2: Concept detection using document classification
    Extracting concepts in context from classification of documents
    1. Visualization → multiple topic clusters
    2. Select cluster → select documents with similar topics
    3. Select training documents within the sub-cluster
    4. Build Classifier and classify
    5. Rank documents → find set of documents with related concepts
    6. Extract concepts
    KMX Example: ‘Ebola, SARS, Bird flu: How do they relate?’
  • 21. Part 3: NEW: Content Dashboard (InfoApp). Integrated SaaS-based search, reporting, visualization and analysis
  • 22. Role of KMX in Integrated Information Applications
    Diagram labels, by layer:
    – Information Consumers (100+ users): Client/Server, Mobile, Web; Reporting, Dashboard, Search, Alerting, Visualization, Exploring; Domain or Market Specific InfoApps (by Partners)
    – Management, Development and Integration
    – Creators/Data Scientists (1-5 users): Text Mining; Text PreP, Stem/Token, Indexing, Clustering, Classification, Visualize
    – Content: Tweets, Documents, Patent Data, Research Literature, Enterprise Content, Email, Text, Websites
  • 23. Content Dashboard: Content Driven Analytical solution
    Easy-to-use access to Search, Reporting & Analysis of content like Patents, Emails, Legislation, Application Notes and websites
  • 24. Content Dashboard: Content analytics beyond keyword search
    Interactive taxonomy with multiple coupled views and advanced search in large sets of documents
  • 25. Content Dashboard: Built-in analytics & interactive visualizations
    Ad-hoc or standard interactive visualizations leading directly to the underlying documents or notes
  • 26. Part 4: NEW: KMX API for OEM partners: Put best-in-class content analytics in your solutions
  • 27. Solutions built on KMX
    KMX Empowers InfoApps (solution partners/OEM/VAR)
    Partner solutions:
    • IP & Patent Analytics
    • Media & Publishing
    • HR
    • eDiscovery (Law & Legislation)
    • Fraud Detection
    • National Security & Police
    • Sentiment analytics
    • CRM/Voice of Customer
    • Government
    • SharePoint (Enrich & Migrate)
    • Content-based Dashboards
    KMX platform: Big Data Text Analytics (cloud based platform / API)
    Fig 1. McKinsey diagram showing the three technology layers of the Big Data technology stack
  • 28. KMX API for OEM: Embed Advanced Text Analytics in your solution
    – Clustering: Provides users unsupervised analytics and automatically identifies inherent themes or information clusters.
    – Classification: Supervised analytics to help users automatically categorize large sets of documents. Through a dynamic hierarchical topic view into search results it enables users to quickly focus on annotated subjects rather than scrolling through long results lists. The Classification process can use a small number of document sets for learn-by-example categorization.
    – Visualization: Advanced visual knowledge discovery for displaying, exporting and sharing data results, ranked document lists, labeled and enriched data or interactive visualizations.
    – KMX API: XML-RPC and REST (JSON); Python Pickle protocol
    – Server: User / Tenant management; User objects management (datasets, work spaces, classifiers, stop lists, etc.); Databases: Oracle, PostgreSQL
    – Client Application: Native Windows (for creating Analysis pipelines); using Qt for GUI; using OpenGL for visualizations
    By sorting the content of documents by topic, relevancy and keywords, users can apply their own models or rules for classification. Terms can be extracted to use in building thesauri or taxonomies.
    Example Application Areas: Advanced Visualizations, Interactive Analytics, Text Disambiguation, Data Enrichment, Clickthrough Optimization, Concept Extraction, Automated Tagging, Semantic Discovery, Named Entity Recognition, Document Overlap Display, SWOT analysis, Sentiment Analysis, Predictive Analytics
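    The slide names XML-RPC and REST (JSON) as transports but gives no method names or endpoints. The sketch below only shows how one request could be encoded for each transport using the Python standard library. The method name `classify_dataset` and every field in the request are hypothetical placeholders, not part of any documented KMX API.

```python
import json
import xmlrpc.client

# Hypothetical request: the method name and all fields are illustrative only
# and are NOT taken from KMX documentation.
request = {"dataset": "patents-demo", "classifier": "respiratory", "top_n": 29}

# XML-RPC encoding: parameters are passed as a tuple, here a single struct.
xml_payload = xmlrpc.client.dumps((request,), methodname="classify_dataset")

# JSON encoding, as a REST-style body might carry the same request.
json_payload = json.dumps({"method": "classify_dataset", "params": request})

# Decoding the XML-RPC payload recovers the parameters and the method name.
params, method = xmlrpc.client.loads(xml_payload)
```

    No network call is made here; a real integration would send these payloads to whatever endpoint the vendor documents.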
  • 29. KMX enables information and knowledge professionals to gain faster, more reliable and more precise insights into large, complex unstructured data sets, allowing them to make better-informed decisions. Treparel is a leading technology solution provider in Big Data Text Analytics & Visualization.