2. 30-70% of Big Data is Unstructured
• Difficult to mine and analyze
• Ergo, largely ignored
• Represents a potential, undiscovered gold mine
• NEED: a seamless, structured representation of unstructured data
3. Text Analytics
• Software and transformational processes that uncover business value in unstructured text
• Uses statistical, linguistic, machine learning, data analysis and visualization techniques
• $2Bn market expected to grow at a 25% CAGR
5. WitnessTree: Text Analytics
Discover
• Boost search accuracy
• Reduce ambiguity
• Contextual analysis
Reduce
• Analyze relevant data
• Identify & define themes
• Content + contextual similarity
Organize
• Dynamic categories
• Named entities (people, places, brands, dates)
• Facets (metadata: real and derived)
6. WT Semantic Analysis Machine (SAM)
Client App/service
Semantic Analysis Machine
Components, each exposed as an API/web service:
• Near Duplicate Detector
• Thread Analyzer
• Topic Explorer
• Search & Facet
• Named Entity Extractor
• Unsupervised Doc Clustering
• Theme Detector
7. WitnessTree hosted solution for legal eDiscovery
How to e-discover 10,000 from 1M?
“Find the Relevant. With intuitive ease.”
Started with 1,000,000 docs
SET-UP
• De-dup: removes duplicates
• Near-Dup: chains near-dups
• Reduce redundant docs by 40% to 60%: 600k unique docs
• Clustering: draw associations with no prior knowledge of docs; labeled cluster tree
• Topic detection: extracts themes from clusters
• Email threading: recreates email threads + identifies missing & inclusive emails
Smart Search
• Search types: concept, example, similarity, paragraph, boolean, proximity, fuzzy
• Categories: create “categories” of search results
• Clustering “on the fly”: dynamic clustering on categories
• Refine Search
Found 10,000 docs: the Few, the Relevant
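The near-duplicate reduction step above can be illustrated with a minimal sketch. It assumes word shingles compared by Jaccard similarity, which is one common approach and not necessarily the method WitnessTree uses; the function names, the shingle size `k=3`, and the `threshold=0.6` cut-off are all hypothetical choices made for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) of a text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (defined as 1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def reduce_near_duplicates(docs, threshold=0.6):
    """Keep the first document of each near-duplicate group.

    Returns (kept, dropped) as lists of indices into docs.
    The threshold is an illustrative assumption, not a tuned value.
    """
    kept, dropped, kept_shingles = [], [], []
    for i, doc in enumerate(docs):
        s = shingles(doc)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            dropped.append(i)
        else:
            kept.append(i)
            kept_shingles.append(s)
    return kept, dropped


docs = [
    "Please review the attached contract before Friday.",
    "Please review the attached contract before Friday.",          # exact duplicate
    "Please review the attached contract before Friday, thanks.",  # near duplicate
    "Lunch menu for the cafeteria next week.",
]
kept, dropped = reduce_near_duplicates(docs)
print("kept:", kept, "dropped:", dropped)
```

This pairwise comparison is quadratic in the number of kept documents, so a collection of a million documents would likely need an approximate scheme such as MinHash with locality-sensitive hashing instead.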
13. Theme Detection
• Detects recurring themes
• Filters based on relevancy ranking
• Search Wide, Dig Deep
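As a rough illustration of the two bullets on recurring themes and relevancy ranking, the sketch below treats a term as a "theme" when it recurs across several documents of a cluster, then ranks the themes by how many documents contain them. This is an assumption made for illustration only: the slides do not say how themes are scored, and the stopword list, `min_docs`, and `top_n` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "on", "for", "is", "was", "with"}


def tokenize(text):
    """Lowercase a text and return its alphabetic tokens, minus stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def detect_themes(docs, min_docs=2, top_n=3):
    """Return up to top_n terms that recur in at least min_docs documents.

    Ranking: document frequency first, total frequency second,
    alphabetical order as a deterministic tie-breaker.
    """
    doc_freq, term_freq = Counter(), Counter()
    for doc in docs:
        tokens = tokenize(doc)
        term_freq.update(tokens)
        doc_freq.update(set(tokens))
    recurring = [t for t, df in doc_freq.items() if df >= min_docs]
    ranked = sorted(recurring, key=lambda t: (-doc_freq[t], -term_freq[t], t))
    return ranked[:top_n]


cluster = [
    "The supply contract was renewed in March.",
    "Renewal of the supply contract is pending.",
    "Contract terms for the cargo supply flights.",
]
themes = detect_themes(cluster)
print(themes)
```

Single-word terms keep the sketch short; multi-word phrases and stemming (so that "renewed" and "renewal" count together) would be natural next steps.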
14. Named Entity Recognition
Identifies:
• People
• Places
• Companies
• Time/Date
• Monetary
Crew members on the ISS will open the hatch Monday and unload 2,780 pounds of supplies and experiments, the news release said.
"From the men and women involved in the design, integration and test, to those who launched the Antares (rocket) and operated the Cygnus, our whole team", said David W. Thompson, president and chief executive officer of Orbital, in a written statement from the company.
It will burn up during re-entry over the Pacific Ocean, officials said.
Orbital has a $1.9 billion contract with NASA to make eight flights to the space station under the space agency's commercial supply program.
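To make the entity types above concrete, here is a toy, rule-based sketch run on an abbreviated version of the sample text. It assumes small hand-written lookup lists (gazetteers) and two regular expressions; production named entity recognition typically relies on statistical or machine-learned models, and nothing here reflects how the WitnessTree extractor is built. The `GAZETTEER` entries and `PATTERNS` are hypothetical.

```python
import re

# Hypothetical mini-gazetteers; a real system would not hard-code names.
GAZETTEER = {
    "Company": ["Orbital"],
    "Place": ["Pacific Ocean"],
}

# Simple patterns for two of the entity types listed on the slide.
PATTERNS = {
    "Monetary": re.compile(r"\$\d[\d,]*(?:\.\d+)?(?:\s(?:million|billion|trillion))?"),
    "Time/Date": re.compile(r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day\b"),
}


def extract_entities(text):
    """Return (label, value) pairs in order of appearance in the text."""
    found = []
    for label, names in GAZETTEER.items():
        for name in names:
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((m.start(), label, m.group()))
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), label, m.group()))
    return [(label, value) for _, label, value in sorted(found)]


text = (
    "Crew members on the ISS will open the hatch Monday and unload "
    "2,780 pounds of supplies. Orbital has a $1.9 billion contract "
    "with NASA. It will burn up during re-entry over the Pacific Ocean."
)
entities = extract_entities(text)
for label, value in entities:
    print(f"{label}: {value}")
```

People are left out on purpose: recognizing a name such as "David W. Thompson" without a lookup list is where rule-based approaches become brittle and learned models tend to do better.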
15. Our Differentiators
Analytics Framework
• Structured and unstructured (text) data
• API or web application
Easy to Use
• Minimal training required
• Web browser + internet connection
Flexibility
• Hosted model, SaaS, licensed in-house
Versatility
• Document classification, visualization, categorization, API
Rich Feature-set
• State-of-the-art feature set, in place
Partnership Models
• OEM, white-label, reseller