This presentation gives the Recognos vision for the future of semantic technology in document management. It was created for the SemTech Conference held in November 2011 in Washington, DC.
2. A document management system (DMS) is a computer system (or
set of computer programs) used to track and store electronic
documents and/or images of paper documents. It is usually also
capable of keeping track of the different versions created by different
users (history tracking). The term has some overlap with the concepts
of content management systems. It is often viewed as a component
of enterprise content management (ECM) systems and related to
digital asset management, document imaging, workflow systems and
records management systems.
Make formatted data equivalent to non-formatted data!
November 2011
4. Volume
Labor intensive
The “research project” – 40% to 60% is data
gathering
Metadata independent of content
Shallow Search
Hard to understand by non-experts
5. Natural Language Processing (NLP) –
understand the meaning of documents
(statistical, machine learning, hybrid,
graph-based)
Semantic Search – tagging
Data Integration
Sentiment Analysis
Linked Open Data – Linked Data
Inference - Reasoning
6. Inside – Controlled Environment - TRUST
Inside – Security issues
Same techniques as outside the enterprise
Integrates non-formatted with formatted
data
Easy to measure the effects - ROI
Add on to the existing KM models
Emerging area – Semantic technologies
started on the WWW
7. New features will become commodities in 2-3 years
Compliance
Data Extraction, Comparison, Change
Analysis
Interactivity
Augmentation
Translation
Linking – Relationships
Sentiment Analysis
New Search (Semantic Tagging, Deep Search,
NL Questions)
8. Microsoft: Powerset (Bing), Fast Search, Jinni
Google: Freebase, Needlebase
Apple: SIRI
Etc…
10. Example (email). There is a rule:
Rule 0134C: “Not allowed to mention a percentage as a
profit promise when investing with the firm”
In an email:
“Dear John, Our company has an amazing method to
invest, so that you will make at least 10% profit in 3
months!!!!”
The email was stopped and sent to Compliance with the
message: “Violation of the Rule 0134C”
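A minimal sketch of how a check like Rule 0134C might be approximated with plain pattern matching. This is an illustration only, not the system shown in the presentation, which relies on NLP rather than regular expressions. The function name `check_email`, the patterns, and the list of profit words are all assumptions made for the example.

```python
import re
from typing import Optional

RULE_ID = "0134C"  # rule number taken from the slide

# Hypothetical patterns: a number followed by "%" or "percent",
# and a small set of words that suggest a profit promise.
PERCENT = re.compile(r"\b\d+(?:\.\d+)?\s?(?:%|percent\b)", re.IGNORECASE)
PROFIT_WORDS = re.compile(r"\b(?:profit|return|gain)s?\b", re.IGNORECASE)


def check_email(body: str) -> Optional[str]:
    """Return a violation message if a sentence pairs a percentage
    with a profit word, otherwise None."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", body):
        if PERCENT.search(sentence) and PROFIT_WORDS.search(sentence):
            return f"Violation of the Rule {RULE_ID}"
    return None


email = ("Dear John, Our company has an amazing method to invest, "
         "so that you will make at least 10% profit in 3 months !!!!")
print(check_email(email))
```

A keyword match like this would miss paraphrases ("double your money") and could flag harmless sentences, which is the gap that understanding the meaning of the text is meant to close.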
11. MFIP data extraction
Link to the original document
15. Create Alarm when Trading Policy Changes
Create Alarm when Commissions Change
(fields)
Create Alarm when a Member of the Board
Changes
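The alarms listed above amount to comparing the fields extracted from two versions of a document. The sketch below assumes the extraction step has already produced a dictionary of fields per version; the field names and the `detect_changes` function are hypothetical and chosen only to mirror the three alarms on the slide.

```python
from typing import Dict, List

# Hypothetical field names mirroring the three alarms on the slide.
WATCHED_FIELDS = ("trading_policy", "commissions", "board_members")


def detect_changes(old: Dict[str, object], new: Dict[str, object]) -> List[str]:
    """Compare watched fields in two extracted versions and
    return one alarm message per field whose value differs."""
    alarms = []
    for field in WATCHED_FIELDS:
        if old.get(field) != new.get(field):
            alarms.append(f"ALARM: {field} changed")
    return alarms


previous = {
    "trading_policy": "No short selling",
    "commissions": 0.01,
    "board_members": frozenset({"Ann", "Bob"}),
}
current = {
    "trading_policy": "No short selling",
    "commissions": 0.02,
    "board_members": frozenset({"Ann", "Bob"}),
}
print(detect_changes(previous, current))
```

Board members are stored as a `frozenset` here so that a reordered list of the same names does not raise an alarm.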
21. Google Translate
Great for simple translation – emails,
non-technical documents
Language Weaver
Specialized translation through machine learning
Train the system per domain
34. [Architecture diagram]
WWW sources: Google Alerts, Meltwaters Alerts, Forums / Blogs, Twitter, Facebook, Websites
Exchange Server
External Data Pull: Exchange Adapter, Twitter Adapter, Facebook Adapter, 80legs Adapter, Diffbot Adapter
Internal Message Storage
File Server
Natural Language Processing
Uploaded
ESSEX Taxonomy
Web User Interface
Data Storage: MS SQL Server
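The diagram suggests one adapter per external source, each feeding a common internal message store ahead of the NLP step. A minimal sketch of that adapter pattern follows. The `Message`, `Adapter`, `StaticAdapter`, and `collect` names are hypothetical; real adapters would call the Exchange, Twitter, Facebook, 80legs, or Diffbot services, which are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass
class Message:
    source: str  # e.g. "twitter" or "exchange"
    text: str


class Adapter(Protocol):
    """Anything that can pull messages from one external source."""
    def pull(self) -> Iterable[Message]: ...


class StaticAdapter:
    """Stand-in adapter that returns canned texts instead of
    calling a real service (an assumption for this sketch)."""
    def __init__(self, source: str, texts: List[str]) -> None:
        self.source = source
        self.texts = texts

    def pull(self) -> Iterable[Message]:
        return [Message(self.source, t) for t in self.texts]


def collect(adapters: Iterable[Adapter]) -> List[Message]:
    """Pull from every adapter into one internal message store."""
    store: List[Message] = []
    for adapter in adapters:
        store.extend(adapter.pull())
    return store


store = collect([
    StaticAdapter("twitter", ["New fund launched"]),
    StaticAdapter("exchange", ["Meeting notes", "Quarterly report"]),
])
print(len(store), [m.source for m in store])
```

Keeping every source behind the same `pull` interface means the downstream processing only has to handle one message shape, whichever adapter produced it.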
35. Amdocs AIDA (AMDOCS Intelligent Decision Automation)
44. Interactive - Exists
Search – Semantic Search, Q&A
Semantic Tagging – Summarization
LOD with domains
Linked: People, Companies, Locations,
Specific Terms
Example: a travel book
45. The following technologies were used:
- iQser – GIN
- Clark & Parsia – Spanner, StarDog
- Expert System – NLP
- GATE
- Smart Logic – Enterprise Query Platform – Fast Search – Microsoft
SharePoint 11
- Revelytix
- Cognition
- Franz Systems
- DiffBot
- Ontotext
46. George Roth
President and CEO Recognos Inc.
San Francisco
www.recognos.com
groth@recognos.com
Drew Warren
CEO Recognos Financial
New York
dwarren@recognosfinancial.com
www.recognosfinancial.com