Building intelligence through semantics
• Machine Learning
• Sentiment Analysis
• Text Analytics
• Ontology Building
• Context Analysis
About Veda

Who we are
• A semantic technology service provider leveraging its capabilities to provide standardized and bespoke solutions

Awards and references
• One of 5 companies worldwide named as Semantic Application Specialists by Gartner (Who’s Who of Text Analytics, September 2012)

Formation and background
• Started as a JV with the Fraunhofer Institute, Germany
• Earlier part of 3i Infotech, a large listed IT firm. Acquired by the current promoters as part of a management buyout

Location
• Headquartered in Bangalore, India’s software capital, with ready access to critical talent

Team
• Currently a 20-member team, with a sales presence in Chicago, USA. Key members of the technology team each have over a decade’s worth of experience in semantic technology
Enterprise Information Distribution

Unstructured Data (~70%):
• Consists of textual information like contracts, emails, presentations
• 70% of organizations’ information remains in an unstructured form, hence it is not utilized at all

Structured Data (~30%):
• Consists of information from ERP, CRM systems, XML data
• It is organized and manageable
• Currently only 30% of organizations’ information is analysed for decision making

Are we using only structured data for decision making? What are the critical misses that are made as a result?
What is hidden in unstructured data

Examples of unstructured data
• Customer complaints
• Employee feedback
• Brand perception
• Financial data from reports
• Competitive news
• Information
• Facts
• Events etc.
• And many more…

What it contains
• Insights
• Opportunities
• Risks
• Just the things needed for good decision making!
Semantics – making sense of unstructured data
• Semantics is the study of meaning. It focuses on the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for, their denotation. [Wikipedia]
• SEMANTICS = MEANING
• It is about describing things
• In linguistics, semantics is the subfield that is devoted to the study of meaning as inherent at the levels of words, phrases, sentences, and larger units of discourse.
Industry Overview - Need for Semantic Technology

Information overload
• Heterogeneous
• Distributed
• Unorganized

High data volumes
• Increasing numbers
• Increasing sources
• Unmanageable

Inefficient retrieval
• Keyword search is inefficient
• Lack of classification and relevance
• Focus on “Search” rather than “Find”

The definition of ‘Data’, which had been artificially restricted to only numerical data, can now extend to text and other unstructured data as well, providing more insights and richness for decision making.
Top 9 Technology Trends Likely to Impact Information Management in 2013
• Big Data
• Modern information infrastructure
• Semantic technologies
• The logical data warehouse
• NoSQL DBMSs
• In-memory computing
• Chief data officer and other information-centric roles
• Information stewardship applications
• Information valuation / infonomics

Source: Gartner
Broadly, text based offerings can be clubbed under two main heads

Statistical text mining
• Looks for documents based on statistical techniques
• Helps identify high frequency terms or expressions
• Identifies other terms being used in conjunction with them
• Assigns match probability to documents based on mathematical techniques to facilitate searches and knowledge management
• Accuracy could be improved further by using machine learning principles
• Primary applications: Text mining and document matching (e.g. VoC analysis, email analysis, E-Discovery, etc.)

Natural language processing
• Parses a sentence to identify the nature of the words in it
• More relevant for sentence level analysis as opposed to document level analysis
• Principles of English, as opposed to statistical techniques, take precedence in analysis
• Accuracy dependent on strengths of algorithms written
• Primary applications: Named Entity Extraction (knowledge management), Sentiment analysis (VoC analysis, email monitoring, etc.)
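As an illustration of the statistical approach described above, the following is a minimal sketch of document matching, assuming TF-IDF term weighting and cosine similarity as the "mathematical technique". The deck does not say which model Veda uses, so the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) and the weighting formula are illustrative assumptions only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (smoothed TF-IDF, an assumed choice)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors = tfidf_vectors(docs + [query])
    query_vec = vectors[-1]
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors[:-1])]
    return sorted(scores, reverse=True)


if __name__ == "__main__":
    corpus = [
        "The customer complained about late delivery",
        "Invoice for the quarterly audit",
        "Late delivery complaint from a customer",
    ]
    for score, index in rank("customer delivery complaint", corpus):
        print(f"{score:.3f}  {corpus[index]}")
```

The score is a similarity, not a calibrated probability; a production system would typically add stemming, phrase detection and a learned ranking layer on top.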
Industry Overview – usual application areas

Marketing
• Areas: Social media analytics; Better advertising placement; CRM information capture and action
• Technique used: Sentiment Analysis using NLP, coupled with vertical specific taxonomies

Compliance
• Areas: E-Discovery; Auto classification; Forensic analysis
• Technique used: Statistical text mining; Named Entity Recognition (NER); Machine learning

Risk analysis, Fraud detection
• Areas: Pattern analysis; Predictive modelling
• Technique used: Statistical text mining; Named Entity Recognition; coupled with structured data (e.g. frequency of mails, department information, etc.)

Knowledge Management
• Areas: Auto tagging and classification; Discovery (e.g. healthcare information sharing)
• Technique used: NER (for named entities); Statistical text mining; Custom ontologies / semantic networks

Vertical specific use cases
• Areas: Examples: Financial services, Publishing, Pharma, Healthcare, Legal, Insurance, etc.
• Technique used: Various degrees of text mining, NLP and sentiment analysis, and entity extraction techniques
But purely from an R&D perspective, quality thresholds have a very high standard deviation

NLP
• Attaching sentiment to attribute, and attribute to object
• Handling basic keywords (e.g. I like something, vs. something is like another)
• Vertical taxonomies that allow aggregation
• Vertical specific sentiment words (e.g. executing a man vs. executing a transaction, high fuel economy vs. high fuel consumption)

eDiscovery
• High variability in Recall and Precision rates
• Tagging of concepts remains difficult
• Summarization techniques based on basic lexical parsing

Ontology
• Limited use cases
• Often seen as multi-year projects as opposed to quick win areas
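Recall and precision, mentioned above as the quality measures that vary between engines, can be computed as in the sketch below. The function names and the sample document IDs are hypothetical; the formulas are the standard definitions.

```python
def precision_recall(predicted, relevant):
    """Compute precision and recall for a set of retrieved or tagged items.

    predicted: items the engine returned (e.g. document IDs or tagged entities)
    relevant:  items a human reviewer marked as correct
    """
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical review: the engine returned four documents, three are truly relevant.
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"])
    print(f"precision={p:.2f} recall={r:.2f} f1={f1(p, r):.2f}")
```

Measuring both figures on a client's own sample data, rather than on a generic benchmark, is one way to make the variability between engines visible.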
The reason for the quality difference is that, in many cases, client context is not fully understood and the software is not trained on such context
• What is the primary purpose for which the tool will be used: finding trends, better search, forensics, fraud prevention, building predictive models, etc.
• Are certain terms so common that they must be ignored while doing an analysis?
• Are there domain specific words that attain a different meaning than in other domains (e.g. ‘execution’ has a different meaning in financial services than in the news domain)?
• Should weightages assigned to certain kinds of documents / words be increased to improve relevance?
• How will the results be presented: are they to be shown visually and not be connected to other enterprise systems, or should they be an integrated part of the overall BI roadmap of an organization?

Unlike traditional systems, text analytics has a large dependency on context. Consequently, in order to unleash its full potential, the usual bifurcation between consultancy, software development and software implementation must disappear in the case of text analytics. An off-the-shelf product approach will not help, and one must adopt a services model to better serve client needs!
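Two of the context questions above (terms too common to be useful, and words such as ‘execution’ whose meaning depends on the domain) can be sketched as configuration that is supplied per client. Everything in this example is a hypothetical illustration: the stop-word list, the lexicon entries and their polarity values are invented for the sketch and are not taken from any Veda product.

```python
import re

# Hypothetical list of terms judged too common to carry signal for this client.
STOP_WORDS = {"the", "a", "an", "was", "is", "of", "to", "and"}

# Hypothetical per-domain polarity values: the same word scores differently by domain.
DOMAIN_LEXICON = {
    "financial_services": {"execution": 0, "default": -1, "rally": 1},
    "news": {"execution": -1, "default": 0, "rally": 0},
}


def score(text, domain, stop_words=STOP_WORDS):
    """Sum the domain-specific polarity of every non-stop-word token in the text."""
    lexicon = DOMAIN_LEXICON[domain]
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in stop_words]
    return sum(lexicon.get(token, 0) for token in tokens)


if __name__ == "__main__":
    sentence = "The execution was confirmed"
    print(score(sentence, "news"))                # negative in the news domain
    print(score(sentence, "financial_services"))  # neutral for a trade execution
```

Keeping these lists outside the engine is one way to let the same software be retrained for each client context.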

In addition, there is limited focus on client needs and use cases

Technology focused
• Companies mostly founded and run by technology experts

Customer language
• Focus on technology capability and terms as opposed to problems to be solved

Product approach
• Leaves out value to be derived by examining enterprise specific data more closely, or integrating it with structured data for greater insights
An example of our Natural Language Processing capabilities

“The car model looks like the old one”
“I loved the food, but the service was terrible”
“Did anyone like the car?”
“I really luuuuv it”
“The Tokyo office does not like the current prototype of the product. Bob said we should talk to them to find out why they are unhappy. Must close this ASAP to get the launch done by August 2013.”

• Can tag sentiments to attributes, and attributes to products
• Can handle difficult words, e.g. ‘like’, based on context (most engines cannot)
• Can handle anaphora resolution (e.g. pronouns)
• Can handle Named Entity Recognition with high recall and precision

IP protection:
• Patent being filed for clause based sentiment extraction process
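To make the idea of attaching sentiment to an attribute concrete, here is a toy sketch that splits a sentence into clauses and scores each clause separately. It is not Veda's patent-pending process, which the deck does not describe; the word lists, the clause-break pattern and the function name `clause_sentiments` are all assumptions made for illustration, and the sketch does not attempt the context handling of words like ‘like’.

```python
import re

# Hypothetical mini-lexicons; a real engine would use far richer resources.
POSITIVE = {"loved", "love", "great", "good"}
NEGATIVE = {"terrible", "bad", "unhappy", "awful"}
ATTRIBUTES = {"food", "service", "price", "comfort"}

# Clause boundaries: punctuation and a few contrastive connectives.
CLAUSE_BREAK = re.compile(r"\s*(?:,|;|\bbut\b|\bhowever\b|\balthough\b)\s*", re.IGNORECASE)


def clause_sentiments(sentence):
    """Return (attribute, polarity label) pairs, scoring each clause on its own."""
    results = []
    for clause in CLAUSE_BREAK.split(sentence):
        tokens = re.findall(r"[a-z]+", clause.lower())
        polarity = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
        if polarity > 0:
            label = "positive"
        elif polarity < 0:
            label = "negative"
        else:
            label = "neutral"
        for token in tokens:
            if token in ATTRIBUTES:
                results.append((token, label))
    return results


if __name__ == "__main__":
    print(clause_sentiments("I loved the food, but the service was terrible"))
```

Scoring the sentence as a whole would cancel the two opinions out; scoring per clause keeps the positive view of the food separate from the negative view of the service.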
Our Discovery product demonstrates the NLP capability in a powerful manner, making consumer feedback actionable
• In this example about a vehicle, most people care about comfort, and luckily, the product gets mostly positive reviews in this area
• Sentiments are associated with specific aspects of the product
• Though price gets mainly negative reviews, not too many people seem to talk about it. Perhaps a discount scheme could help?
• Actual sentences are displayed, and things to which the sentiments are attached are highlighted
• Clickthrough allows deeper dives into each category
Example of Natural Language Processing in Financial Domain (continuing R&D)
• Extracts economic factors that have been impacted
• Recommendations and predictions help analyze complex financial information in the quickest time
• Helps in predictive analytics
Example of Natural Language Processing in Financial Domain – highlighting outlook by driver (continuing R&D)
• Linguistic rules to extract financial / economic indicators
• Domain specific verbs and nouns to understand movement
Financial markets rebounded strongly in 2006's third quarter .
FINANCE ENT : Financial markets
ACTION : rebounded
TIME : 2006's third quarter
MOVEMENT : UP
By the end of the third quarter , crude oil had fallen over 20 %
from its[crude_oil] July peak , while a similar retreat in natural
gas prices produced the latest high-profile hedge fund debacle .
FINANCE ENT : crude oil
ACTION : had fallen
TIME : the end of the third quarter
QUANTITY : 20 %
MOVEMENT : DOWN
FINANCE ENT : natural gas prices
ACTION : produced the latest high-profile hedge fund debacle
MOVEMENT : DOWN

Prices of longer-dated bonds rallied too : the 10-year U. S.
Treasury bond yield fell over 60 basis points during the third
quarter .
FINANCE ENT : Prices of longer-dated bonds
ACTION : rallied
MOVEMENT : UP
FINANCE ENT : the 10-year U. S. Treasury bond yield
ACTION : fell over 60 basis points
TIME : the third quarter
QUANTITY : 60 basis points
MOVEMENT : DOWN
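The ACTION, QUANTITY and MOVEMENT fields in the output above can be approximated with a very small rule set. The sketch below is an assumption-laden illustration, not the engine that produced that output: the verb list, the quantity pattern and the function name `extract_movement` are hypothetical, and it does not extract the FINANCE ENT or TIME fields, which need real parsing.

```python
import re

# Hypothetical domain verbs mapped to a direction of movement.
MOVEMENT_VERBS = {
    "rebounded": "UP", "rallied": "UP", "rose": "UP", "gained": "UP",
    "fell": "DOWN", "fallen": "DOWN", "declined": "DOWN", "slowed": "DOWN",
}

# A number followed by a percent sign or the words "basis points".
QUANTITY = re.compile(r"\d+(?:\.\d+)?\s*(?:%|basis points)")


def extract_movement(sentence):
    """Return ACTION, MOVEMENT and QUANTITY for the first movement verb found, else None."""
    for token in re.findall(r"[A-Za-z]+", sentence):
        direction = MOVEMENT_VERBS.get(token.lower())
        if direction:
            quantity = QUANTITY.search(sentence)
            return {
                "ACTION": token,
                "MOVEMENT": direction,
                "QUANTITY": quantity.group(0) if quantity else None,
            }
    return None


if __name__ == "__main__":
    print(extract_movement("Financial markets rebounded strongly in 2006's third quarter ."))
    print(extract_movement("crude oil had fallen over 20 % from its July peak"))
```

A keyword rule like this cannot tell that "a similar retreat in natural gas prices" implies a downward movement, which is where the linguistic rules and domain nouns mentioned above come in.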
Example of Natural Language Processing in Financial Domain – extracting Cause and Effect (continuing R&D)
As the fourth quarter begins , financial markets remain supported by
positive earnings and interest rate trends .
FINANCE ENT : financial markets
ACTION : remain supported
TIME : the fourth quarter
CAUSE : positive earnings and interest rate trends
EFFECT : financial markets remain supported

However , the pace of U. S. economic activity will slow further by
year-end as weakness in the housing and automotive sectors becomes
increasingly acute .
FINANCE ENT : the pace of U. S. economic activity
ACTION : will slow
TIME : year-end
MOVEMENT : DOWN
CAUSE : weakness in the housing and automotive sectors becomes
increasingly acute .
EFFECT : the pace of U. S. economic activity will slow year-end

An example of our Enterprise capabilities
• Ontology modeling using RDF and OWL semantic web standards
• Document Matching / Similarity using statistical models and a concept based approach for Patent Search, Knowledge Management, etc.
• Information Extraction using linguistic models for Fraud Detection, analysis of news stories, etc.
• Demonstrated capability for patent search, legal cases, handling survey data
• Machine learning capability allows for precision to be attuned and increased for specific client situations
• Can disambiguate based on domain specific situations, e.g. execution may mean one thing in a news domain, and executing a transaction in the financial services domain
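The ontology modeling bullet above can be illustrated with a small sketch of subject, predicate, object triples and a subclass inference step. Plain Python tuples stand in for RDF triples here so that the example stays self-contained; the `ex:` class names and the helper functions are hypothetical, and a real project would normally use an RDF library and an OWL reasoner instead.

```python
SUBCLASS_OF = "rdfs:subClassOf"
TYPE = "rdf:type"

# Hypothetical mini-ontology for a patent search scenario.
TRIPLES = {
    ("ex:Patent", SUBCLASS_OF, "ex:LegalDocument"),
    ("ex:LegalDocument", SUBCLASS_OF, "ex:Document"),
    ("ex:doc42", TYPE, "ex:Patent"),
}


def superclasses(cls, triples):
    """Collect every class reachable from cls by following subClassOf links."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == SUBCLASS_OF and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def inferred_types(instance, triples):
    """Return the asserted types of an instance plus all their superclasses."""
    direct = {o for s, p, o in triples if s == instance and p == TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return inferred


if __name__ == "__main__":
    print(sorted(inferred_types("ex:doc42", TRIPLES)))
```

The inference is what lets a search for legal documents return a patent that was only ever tagged as a patent.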

Veda Text Mining capability – key features

Input
• Data input in various forms (e.g. txt, doc, etc.)
• Can accept data from public sources (e.g. Facebook, Twitter) apart from Enterprise sources

Preprocessing
• Removal of junk text around emails
• Removal of small emails like “Thanks”
• Removal of forwarded emails attached to the main email from analysis
• Spell checks and autocorrects
• Language parsing for English

Processing
• Natural Language and Statistical Processing techniques
• Extraction of key discussion items from the text, and what is being said in relation to them
• Key themes from messages and semantic chaining. Can be combined with sentiment analysis as well.
• Ability to handle high velocity and high volume data using Big Data infrastructure (Hadoop, Storm, etc.)

Categorization
• Group discussion items into categories and sub categories, while identifying what is being said about them:
  • Automatic for synonyms, singular and plural, etc.
  • Ability to add / delete categories
  • Ability to further analyse sub-categories

UI, editing and export
• Simple, easy custom built UI with filtering and drill down capability
• Machine learning approach where human insight guides further results
• Output not only available in visual format, but exportable to other applications or databases
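The email preprocessing steps listed above (dropping very short messages and cutting forwarded or trailing junk text) could look roughly like the sketch below. The marker patterns, the four-word threshold and the function names are assumptions chosen for the example; they are not a description of how the Veda pipeline is implemented.

```python
import re

# Assumed marker lines that introduce forwarded or quoted content.
FORWARD_MARKER = re.compile(
    r"^-{2,}\s*(?:Forwarded message|Original Message)\s*-{2,}\s*$",
    re.IGNORECASE | re.MULTILINE,
)
# Conventional signature delimiter: a line consisting of two dashes.
SIGNATURE_MARKER = re.compile(r"^--\s*$", re.MULTILINE)

# Assumed threshold: messages shorter than this are treated as noise ("Thanks").
MIN_WORDS = 4


def clean_email(body):
    """Cut the body at the first forwarded-message marker and at the signature."""
    for marker in (FORWARD_MARKER, SIGNATURE_MARKER):
        match = marker.search(body)
        if match:
            body = body[:match.start()]
    return body.strip()


def preprocess(emails):
    """Clean each email and keep only those with enough words left to analyse."""
    cleaned = (clean_email(email) for email in emails)
    return [text for text in cleaned if len(text.split()) >= MIN_WORDS]


if __name__ == "__main__":
    inbox = [
        "Thanks",
        "Please review the attached contract before Friday.\n\n"
        "---------- Forwarded message ----------\nFrom: someone\nOld text",
        "The prototype needs rework on the cooling unit.\n-- \nBob",
    ]
    for text in preprocess(inbox):
        print(text)
```

Spell correction and language parsing, the remaining preprocessing steps, would follow this filter and are left out of the sketch.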
Veda Text Mining capability – screens of analysis in progress
Clustering conversations into categories using semantic analysis.
Example customized outputs
Our Delivery Capabilities
Proof of Concept
Trial & Demonstration
Delivery Methodology

High-level client requirements
Detailed solution requirements
- Define the scope of work
- Delivery framework (core offering + value added services)
- Documented external interfaces with volume and associated recurring cost (if any) information
- User Guide & Training
- Proof of concept
- Methodology (Agile, Waterfall approach or client specified approach)
- Timelines for each deliverable
- Responsibility Matrix
Delivery Methodology
[Diagram of client assignments. Labels: Program Activities, Project Delivery, Support Activities, Machine Learning; Program Initiation, Program Mgmt, Program HR Mgmt, Program Benefits Tracking; Project Kick-off, Business Requirements, Analysis and Design, Development, Test & Verify, Release, Post Release Support, Project Closure; Data Set Creation, Feature Selection; Infrastructure Readiness, Operational Readiness, Change Analysis, Support Delivery, Training]

Taking the next step

For bespoke development, we are prepared to start small, to show clients clear value and RoI

Phase 1
• Veda will solve a business challenge you choose, to demonstrate the power of a semantics based solution in a quick turnaround exercise (typically within a few days)

Phase 2
• Implement for a business function / division / a single geography
• Multiple features of SIS implemented, including cross business solutions, leading to concrete measurable gains

Phase 3
Replicating the success of the previous phase:
• Across larger sections of the enterprise
• Wider data consolidation scope
• Multiple output delivery channels
• Visible long term gains
But ultimately, we believe that clients will benefit considerably by a unified Semantic Information System
[Architecture diagram. Labels include: Veda Collection Processes (Web Crawler, Email Crawler, Files Crawler); Unstructured & Semi-Structured Data (Server, SAN, SAS; Internet public web data; social media chatter); Veda Organising Processes (Natural Language Processing, Semantic Analysis, Ontologies, Knowledge Base, Auto Classification, Visual Segregation, Categorized Data); Structured Data from LOB Applications (Marketing, Purchasing, Payroll, Sales, Operations); Databases, Staging Area, Data Warehouse, Data Mart, Cubes; Reporting, Dashboards, Alerts, Ready insights]
• Collecting unstructured data from disparate sources
• Analyse all collected unstructured data and organize it using rich knowledge representation / domain ontologies
• Insights from unstructured data coupled with analytics from structured data assets (e.g. BI, Big Data)
Veda Approach – COP Framework
Our proprietary Collect – Organize – Present framework and tools allow us to undertake quick bespoke development
• Connectors
Collect

— Collect information from variety of (heterogeneous) sources

• Information Extraction
— Using NLP and semantic analysis

• Semantic Net / Ontology Editor
— Smart knowledge representation of a domain

Organize

• Auto Classifier
— Classify data and tag it to industry specific concepts automatically

• Ontology Reasoning
— Analyze industry knowledge and infer from ontological knowledge

• Analytics
— Identify various patterns and insights from the data

Present

• Semantic Matching
— Provide most relevant information

• Semantic Search and Browsing
— Semantic explorer to retrieve contextual concept-based information

Veda’s Value Proposition

Technology
• Deep understanding of the Semantics space
  • Expertise in both NLP and ontologies / taxonomies, and in standards (RDF / OWL)
  • In the semantic technology space for more than a decade
  • Team has provided services not only to clients, but to other semantic service providers
• Tie up with academia
  • Tie up with leading Indian university in the area
  • Allows for cutting edge R&D
  • High quality talent pipeline

Delivery
• Live - Delivery and Support Turnaround
— The Veda Platform is the core that
— Is a solution accelerator giving a head start to all our assignments (tested and
certified components)
— Allows for lower costs
— Allows for incremental rollouts
Veda’s Value Proposition (contd)

Experience
• Expertise in Multiple Business Domains
  • Healthy mix of business and technology expertise: can provide clear use cases for Semantics and help establish clear RoI metrics
  • Core team members have had experience in Semantic technology since 2003, longer than most other companies
  • Technology team experienced in providing expertise in a wide variety of business domains, leading to speedy and effective solution implementations

Location
• Located in India, with associated inherent advantages
  • Lower cost options for clients with onshore – offshore model
  • 24 hour work cycle
  • Large talent pool
• Tie ups with companies focused on various other related technologies to offer integrated offerings, e.g. full service offering / working with offshore vendor to make outsourced processes more efficient using semantics
Veda’s End-to-End Semantic Expertise
• Text Analytics
  • Analyzing unstructured text, converting to structured data
• Machine learning
  • Statistical techniques resulting in increasing accuracy over time (with more inputs)
• Sentiment Analysis
  • Identifying if the sentiment of a sentence is positive, negative or neutral (and the various shades in between)
• Semantic Information Retrieval
  • More artifacts searched / more accurate: e-mails, documents, spreadsheets, output from existing structured data sources
• Semantic Web Standards
  • Standardized storage and output formats for easier information sharing
Past Experience
Client Profile and Project Description

A global publishing house in legal, tax, finance and healthcare
• Context-based content research platform for tax & legal domain
• Automatic meta-tagging, ontology modeling and ontology driven content reference system

A prominent product manufacturer on inference and reasoning engine
• Leveraged semantics for a supply chain process to integrate systems with heterogeneous data sources and help in automatic decision making in case of any disruptions in the cycle
• Provided ontology modeling and application development services

A reputed university and complex systems research lab in Australia
• Produced a method for organizing and potentially navigating the wide range of web-pages associated with the Murray-Darling river system in a seamless fashion

An analytics software manufacturer in Australia
• Assist investigation of fraud and terrorism: establishing links between entities
• Unstructured data analysis

A premier worldwide online provider of news, information, communication, entertainment and shopping services
• Developed a web analytics platform for analyzing click-stream data in real-time
Some sample use cases mapped to our current technology demonstrators

Legal contracts
• Current situation: Saved in C drives or in DMS, with separate Excel sheets maintained to check on timely renewals, etc. Tough to compare specific clauses across contracts or find the relevant clause as needed
• How Semantics will help: Search for a specific kind of contract and specific clause will throw up (a) master template (b) earlier contracts entered into in the area (c) extracts from the relevant clause
• Mapping to current Veda technology demonstrator: Patent search demonstrator uses similar techniques, allowing the user to also see probabilistic match of documents

Process changes
• Current situation: Dig deep into embedded code to see what departments and areas will get impacted
• How Semantics will help: Ontology based relational steps make it easy to see connected departments, processes, etc. that will be impacted
• Mapping to current Veda technology demonstrator: Tax caselaw and section ontology created

Marketing
• Current situation: Mapping social sentiment and reviews done manually or using dictionary based social monitoring tools
• How Semantics will help: Some social marketing and social listening already being done, though not accurate. A better quality NLP engine allows for more accurate results (e.g. the word ‘like’)
• Mapping to current Veda technology demonstrator: Veda Discovery Engine which has sentiment capabilities

HR
• Current situation: Obtaining right resumes using keyword search remains time consuming. Employee suggestions in open ended surveys not aggregatable. Qualitative comments in employee evaluations not aggregated
• How Semantics will help: Identify key intervention areas at aggregate levels. Map trends in overall ratings to key strength and weakness areas
• Mapping to current Veda technology demonstrator: Veda Discovery for aggregation, Veda Txt for identification of gist of comments

Knowledge management
• Current situation: Metatagging remains a manual process and as a result, searches remain searches, not findings
• How Semantics will help: Automatic metatagging (Persons, Locations, Organizations, concepts, etc.)
• Mapping to current Veda technology demonstrator: Veda Discovery – NER Engine, Veda Legal demonstrator, Veda Msg (for alerts)
Sample use cases by industries

Domain: Publishing, media
• Allows automatic extraction of people, location, dates and events, being extended to themes and concepts. Helps in automatic metatagging.
  • Current tagging process is manual and time consuming. Technology provides clear RoI by reducing this time and manual labour, providing consistent tagging, and allowing easier search for future reference, rather than relying on keywords (e.g. Mahatma vs Gandhi vs Mahatma Gandhi).

Domain: Oil and Gas
• Can make incident monitoring and reporting systems more robust, thereby reducing risk of major accidents
  • For incident reporting, a user need not fill in multiple structured data fields. Text analytics can quickly match data to structured inputs.
  • Witness reports, once converted to text, can be monitored across incidents for patterns that would otherwise have gone unnoticed.
• Helps make process changes easier and allows all linked aspects to be seen at one go
  • Helps determine what other processes and safety regulations are relevant if a sub process is sought to be changed (could also include contractual information etc. if relevant)
• Usually, companies have millions of oil well logs which can be classified by performing named entity extraction and enrichment
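The consistent-tagging point made for publishing above (Mahatma vs Gandhi vs Mahatma Gandhi) can be sketched as a gazetteer lookup that maps every alias to one canonical tag. The gazetteer contents and the function name `tag` are hypothetical, and treating a bare surname as one specific person is a simplification: a real NER engine would disambiguate using context.

```python
import re

# Hypothetical gazetteer: alias (lowercase) -> (canonical name, entity type).
GAZETTEER = {
    "mahatma gandhi": ("Mahatma Gandhi", "PERSON"),
    "mahatma": ("Mahatma Gandhi", "PERSON"),
    "gandhi": ("Mahatma Gandhi", "PERSON"),
    "bangalore": ("Bangalore", "LOCATION"),
}

# Try longer aliases first so "Mahatma Gandhi" is matched as one mention.
_ALIASES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(alias) for alias in _ALIASES) + r")\b",
    re.IGNORECASE,
)


def tag(text):
    """Return the distinct canonical tags found in the text, in order of first mention."""
    tags = []
    for match in _PATTERN.finditer(text):
        canonical = GAZETTEER[match.group(1).lower()]
        if canonical not in tags:
            tags.append(canonical)
    return tags


if __name__ == "__main__":
    print(tag("Gandhi visited Bangalore; Mahatma Gandhi spoke there."))
```

Because every alias resolves to the same tag, a later search finds the article whichever form of the name the author used.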
Sample use cases by industries

Domain: Financial services
• Contract matching (including addendums)
• VoC analysis
  • Churn prediction
  • Highlights capability gaps
• Promotion management
  • Avoids duplication of creation of similar material across divisions / locations. Saving in man hours and resources by leveraging all available material produced earlier
• Risk analysis
  • Manage and gather customer documents from various sources to look for areas of concern
• “Know your customer” analysis
• Competitor analysis
• Financial news analysis for investment managers

Domain: Telecom
• Legal interception and pattern recognition
• SMS analyses for recognizing spam to avoid penalties
• VoC analysis

Domain: Airlines
• Analysis of unstructured problem and safety logs to avoid incidents
Sample use cases by industries

Domain: Healthcare
• Link and compare patient records to obtain insights on:
  • Symptoms, medicines and discharge times, to determine if some medication mixes may be more beneficial than others across a wide set of patient records
  • Why some patients may be re-admitted

Domain: Pharma
• R&D improvement by allowing scientists, who need to refer to papers but may not know exactly what to look for, to see relevant topics (based on automatic metatagging, and linked ontology at the backend)
• Better knowledge management: automatically tag papers, saving scientist time and making search consistent
• Feedback analysis for product from distributors, doctors and end patients

Domain: Insurance
• Broker document analysis to deepen insight on insured risks to improve risk management
Sample functional use cases

Domain: Marketing
• Voice of Customer analysis
• New product ideas
• Competitor analysis
• Complaint monitoring

Domain: HR
• Drawing insights from employee suggestions
• Analysing unstructured inputs in evaluations and improving training efficacy

Domain: Risk
• Internal document monitoring for risk and compliance

Domain: Legal
• Better contract management
Veda Solutions Currently Deployed

Veda for Business Process Workflow
• Configurable to any business requirement across industries
• Sources of content can be structured AND unstructured
• Can be integrated with various business applications: ERP, Content Management, Portals, etc.
• Configurable User Interface with features such as:
  • Saving of search for later reference
  • Tabbed views
  • No. of results to be displayed with sort order

Veda Social Media Analytics
• Registration & log in
• Inputs from Social Media
• Inputs from Blogs, Websites
• Hierarchy & Relevance Analysis
• Sentiment Analysis
• Rich Reporting

Veda Recruiter

Veda Patent Search
• Registration & log in
• Subscription
• Payment Gateway
• Keyword Search
• Semantic Search
• Rich Internet Application
• Saved Search
• Filters

Veda SMS Service
• Crunches judgment text into high relevance words that can be sent through an SMS for immediate access
• Is combined with website service offering full access for relevant cases
• Registration & log in
• Subscription
• Payment Gateway
• Keyword Search
• Semantic Search
• Legal ontology (Indian)
• Filters
Contact details

Veda Semantics Pvt Ltd

www.vedasemantics.com

Contact person:
Rajat Kumar (CEO)
rajat@vedasemantics.com
# +91-9619308745


16     Decision Support and Business Intelligence Systems (9th E.docx
16     Decision Support and Business Intelligence Systems (9th E.docx16     Decision Support and Business Intelligence Systems (9th E.docx
16     Decision Support and Business Intelligence Systems (9th E.docx
 
16     Decision Support and Business Intelligence Systems (9th E.docx
16     Decision Support and Business Intelligence Systems (9th E.docx16     Decision Support and Business Intelligence Systems (9th E.docx
16     Decision Support and Business Intelligence Systems (9th E.docx
 
Artificial Intelligence
Artificial Intelligence  Artificial Intelligence
Artificial Intelligence
 
Content analytics
Content analyticsContent analytics
Content analytics
 
A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...
A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...
A Topic Model of Analytics Job Adverts (Operational Research Society Annual C...
 
A Topic Model of Analytics Job Adverts (The Operational Research Society 55th...
A Topic Model of Analytics Job Adverts (The Operational Research Society 55th...A Topic Model of Analytics Job Adverts (The Operational Research Society 55th...
A Topic Model of Analytics Job Adverts (The Operational Research Society 55th...
 
Start With Why: Build Product Progress with a Strong Data Culture
Start With Why: Build Product Progress with a Strong Data CultureStart With Why: Build Product Progress with a Strong Data Culture
Start With Why: Build Product Progress with a Strong Data Culture
 
Start With Why: Build Product Progress with a Strong Data Culture
Start With Why: Build Product Progress with a Strong Data CultureStart With Why: Build Product Progress with a Strong Data Culture
Start With Why: Build Product Progress with a Strong Data Culture
 
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
 
Unlocking Value from Unstructured Data
Unlocking Value from Unstructured DataUnlocking Value from Unstructured Data
Unlocking Value from Unstructured Data
 
Enabling Success With Big Data - Driven Talent Acquisition
Enabling Success With Big Data - Driven Talent AcquisitionEnabling Success With Big Data - Driven Talent Acquisition
Enabling Success With Big Data - Driven Talent Acquisition
 
Data Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics CapabilitiesData Science - Part I - Sustaining Predictive Analytics Capabilities
Data Science - Part I - Sustaining Predictive Analytics Capabilities
 
Croud Presents: How to Build a Data-driven SEO Strategy Using NLP
Croud Presents: How to Build a Data-driven SEO Strategy Using NLPCroud Presents: How to Build a Data-driven SEO Strategy Using NLP
Croud Presents: How to Build a Data-driven SEO Strategy Using NLP
 
Data Analytics Training in Gurgaon.pdf
Data Analytics Training in Gurgaon.pdfData Analytics Training in Gurgaon.pdf
Data Analytics Training in Gurgaon.pdf
 
Selling Text Analytics to your boss
Selling Text Analytics to your bossSelling Text Analytics to your boss
Selling Text Analytics to your boss
 
Scanning of Business Analysis
Scanning of Business AnalysisScanning of Business Analysis
Scanning of Business Analysis
 
Commercializing Alternative Data
Commercializing Alternative DataCommercializing Alternative Data
Commercializing Alternative Data
 
The Digital Workplace Powered by Intelligent Search
The Digital Workplace Powered by Intelligent SearchThe Digital Workplace Powered by Intelligent Search
The Digital Workplace Powered by Intelligent Search
 
User-Centric Design: How to Leverage Use Cases and User Scenarios to Design S...
User-Centric Design: How to Leverage Use Cases and User Scenarios to Design S...User-Centric Design: How to Leverage Use Cases and User Scenarios to Design S...
User-Centric Design: How to Leverage Use Cases and User Scenarios to Design S...
 
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...Bardess Moderated - Analytics and Business Intelligence - Society of Informat...
Bardess Moderated - Analytics and Business Intelligence - Society of Informat...
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Veda Semantics - introduction document

  • 2. About Veda
      Who we are: A semantic technology service provider leveraging its capabilities to provide standardized and bespoke solutions
      Awards and references: One of 5 companies worldwide named as Semantic Application Specialists by Gartner (Who’s Who of Text Analytics, September 2012)
      Formation and background: Started as a JV with the Fraunhofer Institute, Germany. Earlier part of 3i Infotech, a large listed IT firm. Acquired by current promoters as part of a management buyout
      Location: Headquartered in Bangalore, India’s software capital, with ready access to critical talent
      Team: Currently a 20-member team, with a sales presence in Chicago, USA. Key members of the technology team each have over a decade’s worth of experience in semantic technology
  • 3. Enterprise Information Distribution
      Unstructured data (~70%):
        • Consists of textual information like contracts, emails, presentations
        • 70% of organizations’ information remains in an unstructured form and hence is not utilized at all
      Structured data (~30%):
        • Consists of information from ERP, CRM systems, XML data
        • It is organized and manageable
        • Currently only 30% of organizations’ information is analysed for decision making
      Are we using only structured data for decision making? What are the critical misses that are made as a result?
  • 4. What is hidden in unstructured data
      Examples of unstructured data: customer complaints; employee feedback; brand perception; financial data from reports; competitive news; information; facts; events; and many more
      What it contains: insights, opportunities, and risks. Just the things needed for good decision making!
  • 5. Semantics – making sense of unstructured data
      • Semantics is the study of meaning. It focuses on the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for, their denotation. [Wikipedia]
      • SEMANTICS = MEANING
      • It is about describing things
      • In linguistics, semantics is the subfield that is devoted to the study of meaning as inherent at the levels of words, phrases, sentences, and larger units of discourse.
  • 6. Industry Overview - Need for Semantic Technology
      Information overload: heterogeneous; distributed; unorganized
      High data volumes: increasing numbers; increasing sources; unmanageable
      Inefficient retrieval: keyword search is inefficient; lack of classification and relevance; focus on “Search” rather than “Find”
      The definition of ‘Data’, which had been artificially restricted to only numerical data, can now extend to text and other unstructured data as well, providing more insights and richness for decision making
  • 7. Top 9 Technology Trends Likely to Impact Information Management in 2013 (Source: Gartner)
      • Big Data
      • Modern information infrastructure
      • Semantic technologies
      • The logical data warehouse
      • NoSQL DBMSs
      • In-memory computing
      • Chief data officer and other information-centric roles
      • Information stewardship applications
      • Information valuation / infonomics
  • 8. Broadly, text based offerings can be clubbed under two main heads
      Statistical text mining:
        • Looks for documents based on statistical techniques
        • Helps identify high frequency terms or expressions
        • Identifies other terms being used in conjunction with them
        • Assigns match probability to documents based on mathematical techniques to facilitate searches and knowledge management
        • Accuracy could be improved further by using machine learning principles
        • Primary applications: text mining and document matching (e.g. VoC analysis, email analysis, e-discovery)
      Natural language processing:
        • Parses a sentence to identify the nature of the words in it
        • More relevant for sentence level analysis as opposed to document level analysis
        • Principles of English, as opposed to statistical techniques, take precedence in analysis
        • Accuracy dependent on the strength of the algorithms written
        • Primary applications: Named Entity Extraction (knowledge management), sentiment analysis (VoC analysis, email monitoring, etc.)
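To make the statistical side of the slide above concrete, here is a minimal sketch, assuming plain term-frequency vectors and cosine similarity as the "match probability". The slides do not say which statistical model Veda uses, so the function names and the sample documents are purely illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs only; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(text):
    """High-frequency terms: count how often each token occurs."""
    return Counter(tokenize(text))


def cosine_similarity(a, b):
    """Match score between two term-frequency vectors, from 0.0 to 1.0."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def co_occurring_terms(docs, term):
    """Terms used in conjunction with `term`: count documents they share."""
    counts = Counter()
    for doc in docs:
        tokens = set(tokenize(doc))
        if term in tokens:
            counts.update(tokens - {term})
    return counts


# Illustrative voice-of-customer snippets (invented for this sketch).
docs = [
    "The delivery was late and the package was damaged",
    "Late delivery again, very disappointed with the delivery service",
    "Great price and friendly staff",
]
query = term_frequencies("late delivery")
scores = [cosine_similarity(query, term_frequencies(d)) for d in docs]
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

`ranked` orders the documents by match score for the query. A production system would typically add TF-IDF weighting and a learned ranking step, which is where the machine learning mentioned on the slide would come in.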
  • 9. Industry Overview – usual application areas
      Marketing
        Areas: social media analytics; better advertising placement; CRM information capture and action
        Technique used: sentiment analysis using NLP, coupled with vertical specific taxonomies
      Compliance
        Areas: e-discovery; auto classification; forensic analysis
        Technique used: statistical text mining; Named Entity Recognition (NER); machine learning
      Risk analysis, fraud detection
        Areas: pattern analysis; predictive modelling
        Technique used: statistical text mining; Named Entity Recognition; coupled with structured data (e.g. frequency of mails, department information, etc.)
      Knowledge management
        Areas: auto tagging and classification; discovery (e.g. healthcare information sharing)
        Technique used: NER (for named entities); statistical text mining; custom ontologies / semantic networks
      Vertical specific use cases
        Areas: examples include financial services, publishing, pharma, healthcare, legal, insurance, etc.
        Technique used: various degrees of text mining, NLP and sentiment analysis, and entity extraction techniques
  • 10. But purely from an R&D perspective, quality thresholds have a very high standard deviation
      NLP:
        • Attaching sentiment to attribute, and attribute to object
        • Handling basic keywords (e.g. I like something, vs. something is like another)
        • Vertical taxonomies that allow aggregation
        • Vertical specific sentiment words (e.g. executing a man vs. executing a transaction, high fuel economy vs. high fuel consumption)
      eDiscovery:
        • High variability in recall and precision rates
        • Tagging of concepts remains difficult
        • Summarization techniques based on basic lexical parsing
      Ontology:
        • Limited use cases
        • Often seen as multi-year projects as opposed to quick win areas
  • 11. The reason for the quality difference is that, in many cases, client context is not fully understood and the software is not trained on such context
      • What is the primary purpose for which the tool will be used: finding trends, better search, forensics, fraud prevention, building predictive models, etc.?
      • Are certain terms so common that they must be ignored while doing an analysis?
      • Are there domain specific words that take on a different meaning than in other domains (e.g. ‘execution’ has a different meaning in financial services than in the news domain)?
      • Should the weights assigned to certain kinds of documents / words be increased to improve relevance?
      • How will the results be presented: are they to be shown visually and not be connected to other enterprise systems, or should they be an integrated part of the overall BI roadmap of an organization?
      Unlike traditional systems, text analytics has a large dependency on context. Consequently, in order to unleash its full potential, the usual bifurcation between consultancy, software development and software implementation must disappear in the case of text analytics. An off-the-shelf product approach will definitely not help, and one must adopt a services model to better serve client needs!
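The context questions on the slide above (terms to ignore, domain specific meanings, weights) can be pictured as a small per-client configuration that the analysis consults. The following is a minimal sketch under that assumption; `ClientContext`, its fields, and the sample values are hypothetical and are not a Veda API.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ClientContext:
    """Hypothetical per-client tuning; every field name here is an assumption."""
    ignore_terms: set = field(default_factory=set)    # too common to be informative
    term_weights: dict = field(default_factory=dict)  # boosts for terms that matter more
    senses: dict = field(default_factory=dict)        # domain specific meaning of a word


def weighted_terms(text, ctx):
    """Score each term, skipping ignored terms and applying client weights."""
    scores = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in ctx.ignore_terms:
            continue
        scores[token] += ctx.term_weights.get(token, 1.0)
    return scores


# Two illustrative contexts for the same word, following the slide's 'execution' example.
finance = ClientContext(
    ignore_terms={"the", "of", "was", "bank"},  # 'bank' is everywhere in a bank's own mail
    term_weights={"execution": 3.0},
    senses={"execution": "carrying out a transaction"},
)
news = ClientContext(
    ignore_terms={"the", "of", "was"},
    senses={"execution": "capital punishment"},
)

text = "The execution of the order was delayed by the bank"
finance_scores = weighted_terms(text, finance)
news_scores = weighted_terms(text, news)
```

The same sentence yields different term scores and a different sense for ‘execution’ depending on which context is supplied, which is the dependency on client context that the slide argues for.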
  • 12. In addition, there is limited focus on client needs and use cases
      Technology focused: Companies mostly founded and run by technology experts
      Customer language: Focus on technology capability and terms as opposed to problems to be solved
      Product approach: Leaves out the value to be derived by examining enterprise specific data more closely, or integrating it with structured data for greater insights
  • 13. An example of our Natural Language Processing capabilities
      Example inputs:
        • “The car model looks like the old one”
        • “I loved the food, but the service was terrible”
        • “Did anyone like the car?”
        • “I really luuuuv it”
        • “The Tokyo office does not like the current prototype of the product. Bob said we should talk to them to find out why they are unhappy. Must close this ASAP to get the launch done by August 2013.”
      Capabilities:
        • Can tag sentiments to attributes, and attributes to products
        • Can handle difficult words, e.g. ‘like’ based on context (most engines cannot)
        • Can handle anaphora resolution (e.g. pronouns)
        • Can handle Named Entity Recognition with high recall and precision
      IP protection:
        • Patent being filed for clause based sentiment extraction process
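The example sentences on the slide above can be used to illustrate, very roughly, what clause level sentiment tagging involves. The sketch below is not Veda's patent-pending process; it is a toy, assuming a tiny hand-made lexicon, a hypothetical list of attributes, and a single rule that treats "looks like" and "is like" as comparisons. It does no anaphora resolution or entity recognition.

```python
import re

# Toy lexicons, invented for this sketch.
POSITIVE = {"loved", "love", "luv", "great", "like"}
NEGATIVE = {"terrible", "unhappy", "bad"}
ASPECTS = {"food", "service", "car", "price"}          # hypothetical attribute list
COMPARATIVE_CUES = {"looks", "is", "seems", "sounds"}  # "looks like" is a comparison


def squeeze(word):
    """Collapse runs of a repeated letter: 'luuuuv' -> 'luv'."""
    return re.sub(r"(.)\1+", r"\1", word)


def polarity(token, previous):
    """+1, -1 or 0 for one token, given the token before it."""
    if token == "like" and previous in COMPARATIVE_CUES:
        return 0  # comparison, not an opinion
    for form in (token, squeeze(token)):
        if form in POSITIVE:
            return 1
        if form in NEGATIVE:
            return -1
    return 0


def clause_sentiments(sentence):
    """Return one (aspect or None, score) pair per clause of the sentence."""
    if sentence.strip().endswith("?"):
        return []  # a question asks for an opinion; it does not state one
    results = []
    for clause in re.split(r",|;|\bbut\b", sentence.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        if not tokens:
            continue
        score = sum(polarity(t, p) for p, t in zip([""] + tokens, tokens))
        aspect = next((t for t in tokens if t in ASPECTS), None)
        results.append((aspect, score))
    return results


mixed = clause_sentiments("I loved the food, but the service was terrible")
comparison = clause_sentiments("The car model looks like the old one")
question = clause_sentiments("Did anyone like the car?")
slang = clause_sentiments("I really luuuuv it")
```

Splitting on clause boundaries is what lets the mixed review produce one positive result for food and one negative result for service instead of a single neutral score for the whole sentence.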
  • 14. Our Discovery product demonstrates the NLP capability in a powerful manner, making consumer feedback actionable
      • Sentiments are associated with specific aspects of the product
      • In this example about a vehicle, most people care about comfort, and luckily, the product gets mostly positive reviews in this area
      • Though price gets mainly negative reviews, not too many people seem to talk about it. Perhaps a discount scheme could help?
      • Clickthrough allows deeper dives into each category
      • Actual sentences are displayed, and things to which the sentiments are attached are highlighted
  • 15. Example of Natural Language Processing in Financial Domain (continuing R&D)
      • Extracts economic factors that have been impacted
      • Recommendations and predictions help analyze complex financial information quickly
      • Helps in predictive analytics
  • 16. Example of Natural Language Processing in Financial Domain – highlighting outlook by driver (continuing R&D)
      • Linguistic rules to extract financial / economic indicators
      • Domain specific verbs and nouns to understand movement
      Sentence: Financial markets rebounded strongly in 2006's third quarter.
        FINANCE ENT: Financial markets | ACTION: rebounded | TIME: 2006's third quarter | MOVEMENT: UP
      Sentence: By the end of the third quarter, crude oil had fallen over 20% from its[crude_oil] July peak, while a similar retreat in natural gas prices produced the latest high-profile hedge fund debacle.
        FINANCE ENT: crude oil | ACTION: had fallen | TIME: the end of the third quarter | QUANTITY: 20% | MOVEMENT: DOWN
        FINANCE ENT: natural gas prices | ACTION: produced the latest high-profile hedge fund debacle | MOVEMENT: DOWN
      Sentence: Prices of longer-dated bonds rallied too: the 10-year U.S. Treasury bond yield fell over 60 basis points during the third quarter.
        FINANCE ENT: Prices of longer-dated bonds | ACTION: rallied | MOVEMENT: UP
        FINANCE ENT: the 10-year U.S. Treasury bond yield | ACTION: fell over 60 basis points | TIME: the third quarter | QUANTITY: 60 basis points | MOVEMENT: DOWN
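The tagged output on the slide above suggests rules built from domain specific verbs plus patterns for quantities and time expressions. The following is a minimal sketch of that idea using regular expressions, assuming a tiny verb list and that the entity is simply the text before the verb. It handles single, simple clauses only and is not the engine shown in the deck.

```python
import re

# Hypothetical domain verb lexicon mapping a verb to a direction of movement.
MOVEMENT_VERBS = {
    "rebounded": "UP",
    "rallied": "UP",
    "rose": "UP",
    "fallen": "DOWN",
    "fell": "DOWN",
    "declined": "DOWN",
}
VERB = re.compile(r"\b(" + "|".join(MOVEMENT_VERBS) + r")\b")
QUANTITY = re.compile(r"\d+(?:\.\d+)?\s*(?:%|basis points)")
TIME = re.compile(r"(?:\d{4}'s |the )?(?:first|second|third|fourth) quarter")


def extract_movement(clause):
    """Tag one simple clause with the field names used on the slide."""
    record = {}
    verb = VERB.search(clause)
    if not verb:
        return record
    # Crude assumption: everything before the movement verb is the entity.
    record["FINANCE ENT"] = clause[:verb.start()].strip()
    record["ACTION"] = verb.group(1)
    record["MOVEMENT"] = MOVEMENT_VERBS[verb.group(1)]
    quantity = QUANTITY.search(clause)
    if quantity:
        record["QUANTITY"] = quantity.group(0)
    time = TIME.search(clause)
    if time:
        record["TIME"] = time.group(0)
    return record


markets = extract_movement(
    "Financial markets rebounded strongly in 2006's third quarter ."
)
bond_yield = extract_movement(
    "the 10-year U. S. Treasury bond yield fell over 60 basis points "
    "during the third quarter ."
)
```

Clauses with a leading time phrase or an auxiliary verb, such as the crude oil sentence on the slide, would need real parsing to isolate the entity, which is where the linguistic rules the slide mentions go beyond this sketch.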
  • 17. Example of Natural Language Processing in Financial Domain - extracting Cause and Effect (continuing R&D)
      Sentence: As the fourth quarter begins, financial markets remain supported by positive earnings and interest rate trends.
        FINANCE ENT: financial markets | ACTION: remain supported | TIME: the fourth quarter
        CAUSE: positive earnings and interest rate trends
        EFFECT: financial markets remain supported
      Sentence: However, the pace of U.S. economic activity will slow further by year-end as weakness in the housing and automotive sectors becomes increasingly acute.
        FINANCE ENT: the pace of U.S. economic activity | ACTION: will slow | TIME: year-end | MOVEMENT: DOWN
        CAUSE: weakness in the housing and automotive sectors becomes increasingly acute
        EFFECT: the pace of U.S. economic activity will slow year-end
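One simple way to approximate the cause and effect tags on the slide above is to look for cue phrases such as "as", "because" or "supported by" and split the sentence around them. The sketch below assumes exactly that; the cue list is hypothetical, the match is case sensitive so a sentence-initial temporal "As" is not treated as causal, and the effect span is left untrimmed, unlike the cleaner spans shown on the slide.

```python
import re

# Hypothetical cue patterns, tried in order.
CAUSAL_PATTERNS = [
    # "<effect> as/because <cause>"
    re.compile(r"^(?P<effect>.+?)\s+(?:because|as)\s+(?P<cause>.+)$"),
    # "<effect ... supported/driven/caused> by <cause>"
    re.compile(r"^(?P<effect>.+?\b(?:supported|driven|caused))\s+by\s+(?P<cause>.+)$"),
]


def extract_cause_effect(sentence):
    """Return {'CAUSE': ..., 'EFFECT': ...} or None if no cue phrase is found."""
    text = sentence.strip().rstrip(".").strip()
    for pattern in CAUSAL_PATTERNS:
        match = pattern.match(text)
        if match:
            return {"CAUSE": match.group("cause"), "EFFECT": match.group("effect")}
    return None


slowdown = extract_cause_effect(
    "the pace of U. S. economic activity will slow further by year-end "
    "as weakness in the housing and automotive sectors becomes increasingly acute ."
)
support = extract_cause_effect(
    "As the fourth quarter begins , financial markets remain supported "
    "by positive earnings and interest rate trends ."
)
```

Cue phrases are ambiguous ("as" is often temporal), so a rule set like this needs parsing or domain constraints before its output can be trusted.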
  • 18. An example of our Enterprise capabilities
      • Ontology modeling using RDF and OWL semantic web standards
      • Document matching / similarity using statistical models and a concept based approach for patent search, knowledge management, etc.
      • Information extraction using linguistic models for fraud detection, analysis of news stories, etc.
      • Demonstrated capability for patent search, legal cases, handling survey data
      • Machine learning capability allows for precision to be attuned and increased for specific client situations
      • Can disambiguate based on domain specific situations, e.g. ‘execution’ means one thing in a news domain and another (executing a transaction) in the financial services domain
  • 19. Veda Text Mining capability – key features
      Input:
        • Data input in various forms (e.g. txt, doc, etc.)
        • Can accept data from public sources (e.g. Facebook, Twitter) apart from enterprise sources
      Preprocessing:
        • Removal of junk text around emails
        • Removal of small emails like “Thanks”
        • Removal of forwarded emails attached to the main email from analysis
        • Spell checks and autocorrects
        • Language parsing for English
      Processing:
        • Natural language and statistical processing techniques
        • Extraction of key discussion items from the text, and what is being said in relation to them
        • Key themes from messages and semantic chaining. Can be combined with sentiment analysis as well
        • Ability to handle high velocity and high volume data using Big Data infrastructure (Hadoop, Storm, etc.)
      Categorization:
        • Group discussion items into categories and sub categories, while identifying what is being said about them
        • Automatic for synonyms, singular and plural, etc.
        • Ability to add / delete categories
        • Ability to further analyse sub-categories
      UI, editing and export:
        • Simple, easy, custom-built UI with filtering and drill down capability
        • Machine learning approach where human insight guides further results
        • Output not only available in visual format, but exportable to other applications or databases
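The email preprocessing steps listed on the slide above (dropping forwarded content and very short emails) can be sketched in a few lines. This is an illustration only: the Gmail-style forwarded-message marker, the ">" quote prefix, and the `MIN_WORDS` threshold are assumptions, not details from the deck.

```python
import re

# Assumed markers: a Gmail-style forward banner and ">"-prefixed quoted lines.
FORWARD_MARKER = re.compile(
    r"^-{2,}\s*Forwarded message\s*-{2,}", re.IGNORECASE | re.MULTILINE
)
QUOTED_LINE = re.compile(r"^>.*$", re.MULTILINE)
MIN_WORDS = 3  # hypothetical cut-off for "small" emails such as "Thanks"


def clean_email(body):
    """Return the sender's own text, or None if too little remains to analyse."""
    marker = FORWARD_MARKER.search(body)
    if marker:
        body = body[:marker.start()]       # drop the forwarded email
    body = QUOTED_LINE.sub("", body)       # drop quoted reply lines
    body = re.sub(r"\s+", " ", body).strip()
    if len(body.split()) < MIN_WORDS:
        return None
    return body


forwarded = clean_email(
    "Please review the attached contract.\n\n"
    "---------- Forwarded message ----------\n"
    "From: someone@example.com\n"
    "Older text that should not be analysed again."
)
short = clean_email("Thanks")
quoted = clean_email("I agree with the renewal terms.\n> previous message text")
```

Returning `None` for short emails lets the caller filter them out before the processing and categorization stages.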
  • 20. Veda Text Mining capability – screens of analysis in progress Clustering conversations into categories using semantic analysis. Example customized outputs
  • 21. Our Delivery Capabilities
      Headings shown: Proof of Concept; Trial & Demonstration; Delivery Methodology; High-level client requirements; Detailed solution requirements
      Items listed:
        - Define the scope of work
        - Delivery framework (core offering + value added services)
        - Documented external interfaces with volume and associated recurring cost (if any) information
        - User guide & training
        - Proof of concept
        - Methodology (Agile, Waterfall approach or client specified approach)
        - Timelines for each deliverable
        - Responsibility matrix
  • 22. Delivery Methodology
      [Diagram of client assignments, split into Program Activities, Project Delivery, and Support Activities. Labels shown: Program Mgmt; Program Initiation; Project Kick-off; Feature Selection; Data Set Creation; Business Requirements; Infrastructure Readiness; Program HR Mgmt; Analysis and Design; Operational Readiness; Program Benefits Tracking; Change Analysis; Project Closure; Machine Learning; Development Support; Delivery; Test & Verify; Training; Release; Post Release Support]
  • 23. Taking the next step: for bespoke development, we are prepared to start small, to show clients clear value and RoI
      Phase 1: Veda will solve a business challenge you choose, to demonstrate the power of a semantics based solution in a quick turnaround exercise (typically within a few days)
      Phase 2:
        * Implement for a business function / division / a single geography
        * Multiple features of SIS implemented, including cross business solutions, leading to concrete measurable gains
      Phase 3: Replicating the success of the previous phase
        * Across larger sections of the enterprise
        * Wider data consolidation scope
        * Multiple output delivery channels
        * Visible long term gains
  • 24. But ultimately, we believe that clients will benefit considerably from a unified Semantic Information System
      [Architecture diagram. Captions:]
        * Collecting unstructured data from disparate sources
        * Analyse all collected unstructured data, organize it using rich knowledge representation / domain ontologies
        * Insights from unstructured data coupled with analytics from structured data assets (e.g. BI, Big Data)
      [Labels shown: Veda Collection Processes (web crawler, email crawler, files crawler, social media, public web data, internet); Veda Organising Processes (natural language processing, ontologies, semantic analysis, knowledge base, auto classification, visual segregation); structured data sources (databases, LOB applications: Marketing, Purchasing, Payroll, Sales, Operations); staging area, data warehouse, data marts, cubes; outputs (reporting, dashboards, alerts, ready insights)]
  • 25. Veda Approach – COP Framework
      Our proprietary Collect - Organize - Present framework and tools allow us to undertake quick bespoke development
      Collect:
        • Connectors: collect information from a variety of (heterogeneous) sources
      Organize:
        • Information Extraction: using NLP and semantic analysis
        • Semantic Net / Ontology Editor: smart knowledge representation of a domain
        • Auto Classifier: classify data and tag it to industry specific concepts automatically
        • Ontology Reasoning: analyze industry knowledge and infer from ontological knowledge
      Present:
        • Analytics: identify various patterns and insights from the data
        • Semantic Matching: provide most relevant information
        • Semantic Search and Browsing: semantic explorer to retrieve contextual concept-based information
  • 26. Veda’s Value Proposition
      Technology:
        • Deep understanding of the semantics space
            • Expertise in both NLP and ontologies / taxonomies, and in standards (RDF / OWL)
            • In the semantic technology space for more than a decade
            • Team has provided services not only to clients, but to other semantic service providers
        • Tie up with academia
            • Allows for cutting edge R&D
            • Tie up with leading Indian university in the area
            • High quality talent pipeline
      Delivery:
        • Live - Delivery and Support Turnaround
        • The Veda Platform is the core that
            - Is a solution accelerator giving a head start to all our assignments (tested and certified components)
            - Allows for lower costs
            - Allows for incremental rollouts
  • 27. Veda’s Value Proposition (contd)
      Experience:
        • Expertise in multiple business domains
        • Healthy mix of business and technology expertise: can provide clear use cases for semantics and help establish clear RoI metrics
        • Core team members have had experience in semantic technology since 2003, longer than most other companies
        • Technology team experienced in providing expertise in a wide variety of business domains, leading to speedy and effective solution implementations
      Location:
        • Located in India, with associated inherent advantages
            • Lower cost options for clients with onshore-offshore model
            • 24 hour work cycle
            • Large talent pool
        • Tie ups with companies focused on various other related technologies to offer integrated offerings, e.g. full service offering / working with offshore vendor to make outsourced processes more efficient using semantics
  • 28. Veda’s End-to-End Semantic Expertise
      • Text Analytics: analyzing unstructured text, converting to structured data
      • Machine learning: statistical techniques resulting in increasing accuracy over time (with more inputs)
      • Sentiment Analysis: identifying if the sentiment of a sentence is positive, negative or neutral (and the various shades in between)
      • Semantic Information Retrieval: more artifacts searched, more accurately (e-mails, documents, spreadsheets, output from existing structured data sources)
      • Semantic Web Standards: standardized storage and output formats for easier information sharing
  • 29. Past Experience (client profile, followed by project description)
      A global publishing house in legal, tax, finance and healthcare
        • Context-based content research platform for tax & legal domain
        • Automatic meta-tagging, ontology modeling and ontology driven content reference system
      A prominent product manufacturer on inference and reasoning engine
        • Leveraged semantics for a supply chain process to integrate systems with heterogeneous data sources and help in automatic decision making in case of any disruptions in the cycle
        • Provided ontology modeling and application development services
      A reputed university and complex systems research lab in Australia
        • Produced a method for organizing and potentially navigating the wide range of web pages associated with the Murray-Darling river system in a seamless fashion
      An analytics software manufacturer in Australia
        • Assist investigation of fraud and terrorism: establishing links between entities
        • Unstructured data analysis
      A premier worldwide online provider of news, information, communication, entertainment and shopping services
        • Developed a web analytics platform for analyzing click-stream data in real time
  • 30. Some sample use cases mapped to our current technology demonstrators • Legal contracts. Current situation: Saved in C drives or in DMS, with separate Excel sheets maintained to check on timely renewals, etc. Tough to compare specific clauses across contracts or find the relevant clause as needed. How Semantics will help: Search for a specific kind of contract and specific clause will throw up (a) master template (b) earlier contracts entered into in the area (c) extracts from the relevant clause. Mapping to current Veda technology demonstrator: Patent search demonstrator uses similar techniques, allowing the user to also see probabilistic match of documents. • Process changes. Current situation: Dig deep into embedded code to see what departments and areas will get impacted. How Semantics will help: Ontology based relational steps make it easy to see connected departments, processes, etc. that will be impacted. Mapping to current Veda technology demonstrator: Tax caselaw and section ontology created. • Marketing. Current situation: Mapping social sentiment and reviews done manually or using dictionary based social monitoring tools. How Semantics will help: Some social marketing and social listening already being done, though not accurate. A better quality NLP engine allows for more accurate results (e.g. the word ‘like’). Mapping to current Veda technology demonstrator: Veda Discovery Engine, which has sentiment capabilities. • HR. Current situation: Obtaining the right resumes using keyword search remains time consuming; employee suggestions in open ended surveys not aggregatable; qualitative comments in employee evaluations not aggregated. How Semantics will help: Identify key intervention areas at aggregate levels; map trends in overall ratings to key strength and weakness areas. Mapping to current Veda technology demonstrator: Veda Discovery for aggregation, Veda Txt for identification of gist of comments. • Knowledge management. Current situation: Metatagging remains a manual process and, as a result, searches remain searches, not findings. How Semantics will help: Automatic metatagging (Persons, Locations, Organizations, concepts, etc.). Mapping to current Veda technology demonstrator: Veda Discovery NER Engine, Veda Legal demonstrator, Veda Msg (for alerts).
  • 31. Sample use cases by industries (Domain: Description) • Publishing, media: Allows automatic extraction of people, location, dates and events, being extended to themes and concepts. Helps in automatic metatagging. • Current tagging process is manual and time consuming. Technology provides clear RoI by reducing this time and manual labour, providing consistent tagging, and allowing easier search for future reference, rather than relying on keywords (e.g. Mahatma vs Gandhi vs Mahatma Gandhi). • Oil and Gas: Can make incident monitoring and reporting systems more robust, thereby reducing risk of major accidents • For incident reporting, a user need not fill in multiple structured data fields. Text analytics can quickly match data to structured inputs. • Witness reports, once converted to text, can be monitored across incidents for patterns that would otherwise have gone unnoticed. Helps make process changes easier and allows all linked aspects to be seen at one go • Helps determine what other processes and safety regulations are relevant if a sub process is sought to be changed (could also include contractual information etc. if relevant) Usually, companies have millions of oil well logs which can be classified by performing named entity extraction and enrichment
  • 32. Sample use cases by industries (Domain: Description) • Financial services: Contract matching (including addendums) • VoC analysis (churn prediction; highlights capability gaps) • Promotion management (avoids duplication of creation of similar material across divisions / locations; saving in man hours and resources by leveraging all available material produced earlier) • Risk analysis (manage and gather customer documents from various sources to look for areas of concern) • “Know your customer” analysis • Competitor analysis • Financial news analysis for investment managers • Telecom: Legal interception and pattern recognition • SMS analyses for recognizing spam to avoid penalties • VoC analysis • Airlines: Analysis of unstructured problem and safety logs to avoid incidents
  • 33. Sample use cases by industries (Domain: Description) • Healthcare: Link and compare patient records to obtain insights on: symptoms, medicines and discharge times, to determine if some medication mixes may be more beneficial than others across a wide set of patient records; why some patients may be re-admitted • Pharma: R&D improvement by allowing scientists, who need to refer to papers but may not know exactly what to look for, to see relevant topics (based on automatic metatagging, and linked ontology at the backend) • Better knowledge management: automatically tag papers, saving scientist time and making search consistent • Feedback analysis for product from distributors, doctors and end patients • Insurance: Broker document analysis to deepen insight on insured risks to improve risk management
  • 34. Sample functional use cases (Domain: Description) • Marketing: Voice of Customer analysis; new product ideas; competitor analysis; complaint monitoring • HR: Drawing insights from employee suggestions; analysing unstructured inputs in evaluations and improving training efficacy • Risk: Internal document monitoring for risk and compliance • Legal: Better contract management
  • 35. Veda Solutions Currently Deployed: Veda for Business Process Workflow • Configurable to any business requirement across industries • Sources of content can be structured and unstructured • Can be integrated with various business applications: ERP, Content Management, Portals, etc. • Configurable User Interface with features such as: saving of search for later reference; tabbed views; number of results to be displayed, with sort order
  • 36. Veda Solutions Currently Deployed: Veda Social Media Analytics • Registration & log in • Inputs from Social Media • Inputs from Blogs, Websites • Hierarchy & Relevance Analysis • Sentiment Analysis • Rich Reporting
  • 37. Veda Solutions Currently Deployed: Veda Recruiter
  • 38. Veda Solutions Currently Deployed: Veda Patent Search • Registration & log in • Subscription • Payment Gateway • Keyword Search • Semantic Search • Rich Internet Application • Saved Search • Filters
  • 39. Veda Solutions Currently Deployed: Veda SMS Service • Crunches judgment text into high relevance words that can be sent through an SMS for immediate access • Is combined with website service offering full access for relevant cases • Features: Registration & log in; Subscription; Payment Gateway; Keyword Search; Semantic Search; Legal ontology (Indian); Filters
  • 40. Contact details: Veda Semantics Pvt Ltd • www.vedasemantics.com • Contact person: Rajat Kumar (CEO) • rajat@vedasemantics.com • +91-9619308745