Text Analysis with SAP HANA

.consulting .solutions .partnership

October 2015 | Text Analysis with SAP HANA - SAP Inside Track Munich
1 Motivation
2 Text Analysis with SAP HANA
3 Enhancement Options - Dictionaries and Rules

1 Motivation

Why do we need Text Analysis?
• According to Merrill Lynch, 80-90% of all potentially usable business information may originate in unstructured form
(Structure, Models and Meaning: Is "unstructured" data merely unmodeled?, Intelligent Enterprise, March 1, 2005.)
• The data may originate from:
  - Social networks
  - “Letters” from customers
  - ...
• What is the problem with unstructured data?
• It is unstructured!
  - Not organized
  - No predefined data model
  - No metadata, or a mix of data and metadata
  - We have a lot of business-relevant information, but we cannot access it

How can we solve that issue?
• Text analysis: extracting high-quality information from texts
• Typical process of a text analysis:
  - Parsing of the text
  - Adding features such as linguistic information
  - Entity recognition: is it an organization, a person or a place? This includes domain facts such as requests
  - Sentiment analysis: what attitudinal information is “hidden” in the text?
  - Insertion of the information into the database in a structured manner
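The typical process can be illustrated with a deliberately tiny Python sketch. It is not how SAP HANA implements text analysis: the toy lexicons, the function names and the use of an in-memory SQLite database as the target are all assumptions made for illustration only.

```python
import re
import sqlite3

# Toy lexicons (hypothetical); real text analysis uses full linguistic resources.
ENTITY_LEXICON = {"sap": "ORGANIZATION", "munich": "LOCALITY", "christian": "PERSON"}
SENTIMENT_LEXICON = {"love": 1, "great": 1, "broken": -1, "angry": -1}


def parse(text: str) -> list[str]:
    """Step 1: split the raw text into word tokens."""
    return re.findall(r"\w+", text)


def add_features(tokens: list[str]) -> list[dict]:
    """Step 2: attach simple features (position and lower-cased normal form)."""
    return [{"token": t, "normalized": t.lower(), "position": i} for i, t in enumerate(tokens)]


def recognize_entities(features: list[dict]) -> list[tuple[str, str]]:
    """Step 3: classify tokens that appear in the entity lexicon."""
    return [(f["token"], ENTITY_LEXICON[f["normalized"]])
            for f in features if f["normalized"] in ENTITY_LEXICON]


def sentiment_score(features: list[dict]) -> int:
    """Step 4: sum word polarities (> 0 positive, < 0 negative)."""
    return sum(SENTIMENT_LEXICON.get(f["normalized"], 0) for f in features)


def store(conn: sqlite3.Connection, doc_id: int, text: str) -> None:
    """Step 5: insert the extracted information into structured tables."""
    features = add_features(parse(text))
    conn.execute("CREATE TABLE IF NOT EXISTS entities (doc_id INTEGER, token TEXT, type TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS sentiment (doc_id INTEGER, score INTEGER)")
    conn.executemany(
        "INSERT INTO entities VALUES (?, ?, ?)",
        [(doc_id, token, etype) for token, etype in recognize_entities(features)],
    )
    conn.execute("INSERT INTO sentiment VALUES (?, ?)", (doc_id, sentiment_score(features)))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, 1, "I love the SAP Inside Track in Munich, great talks!")
    print(conn.execute("SELECT token, type FROM entities").fetchall())
    print(conn.execute("SELECT score FROM sentiment").fetchall())
```

For the sample sentence, this stores SAP and Munich as entities and a sentiment score of 2 ("love" and "great"). The point is the shape of the result: free text goes in, queryable rows come out.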

2 Text Analysis with SAP HANA

What has this to do with SAP HANA?
(Figure: © SAP SE)

Fulltext Index - Basics
• Starting point: a database table containing the text (column types such as TEXT, NVARCHAR, BLOB …)
• Create a fulltext index including its options (see system view SYS.FULLTEXT_INDEXES)
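A minimal sketch of the DDL behind the second step, assuming the CREATE FULLTEXT INDEX syntax with the CONFIGURATION and TEXT ANALYSIS ON options as described in the SAP HANA Search Developer Guide (check the exact option list for your release). The Python helper only builds the statement string; the function names and the example schema, table, column and index names are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def fulltext_index_ddl(schema: str, table: str, column: str, index: str,
                       configuration: str = "EXTRACTION_CORE") -> str:
    """Build a CREATE FULLTEXT INDEX statement with text analysis switched on.

    Assumed statement shape; verify against the Search Developer Guide.
    """
    config_literal = configuration.replace("'", "''")  # escape quotes in the string literal
    return (
        f"CREATE FULLTEXT INDEX {quote_identifier(index)} "
        f"ON {quote_identifier(schema)}.{quote_identifier(table)} ({quote_identifier(column)}) "
        f"CONFIGURATION '{config_literal}' "
        "TEXT ANALYSIS ON"
    )


if __name__ == "__main__":
    # Hypothetical names; the settings of the created index can then be
    # inspected in the system view SYS.FULLTEXT_INDEXES.
    print(fulltext_index_ddl("DEMO", "POSTS", "TEXT", "FTI_POSTS"))
```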

Entity Extraction
• To get valuable information out of the data, SAP delivers several text analysis configurations
• These configurations focus on entity and fact extraction, each for a specific use case
• Types of extraction:
  - EXTRACTION_CORE
  - EXTRACTION_CORE_ENTERPRISE
  - EXTRACTION_CORE_PUBLIC_SECTOR
  - EXTRACTION_CORE_VOICEOFCUSTOMER
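With text analysis switched on, the extracted entities land in a result table named $TA_<index name>, which can be queried with plain SQL. The sketch below mimics only a few of its columns (TA_RULE, TA_TOKEN, TA_TYPE) in SQLite so that it runs anywhere; the sample rows and the type names in them are hypothetical, and the real column set and type names should be checked in the Text Analysis Developer Guide.

```python
import sqlite3

# Hypothetical sample rows: (document key, TA_RULE, TA_TOKEN, TA_TYPE).
# Sentiment types like the last one are assumed to come from the
# EXTRACTION_CORE_VOICEOFCUSTOMER configuration.
SAMPLE_ROWS = [
    (1, "Entity Extraction", "SAP", "ORGANIZATION/COMMERCIAL"),
    (1, "Entity Extraction", "Munich", "LOCALITY"),
    (2, "Entity Extraction", "SAP", "ORGANIZATION/COMMERCIAL"),
    (2, "Entity Extraction", "love", "StrongPositiveSentiment"),
]


def ta_table(index_name: str) -> str:
    """Quoted name of the result table for a fulltext index ($TA_ prefix)."""
    return '"$TA_' + index_name.replace('"', '""') + '"'


def load_sample(conn: sqlite3.Connection, index_name: str) -> None:
    """Create a stand-in result table in SQLite and fill it with the sample rows."""
    table = ta_table(index_name)
    conn.execute(f"CREATE TABLE {table} (ID INTEGER, TA_RULE TEXT, TA_TOKEN TEXT, TA_TYPE TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", SAMPLE_ROWS)


def entity_counts(conn: sqlite3.Connection, index_name: str) -> dict[str, int]:
    """Count extracted entities per type with a plain GROUP BY query."""
    sql = (f"SELECT TA_TYPE, COUNT(*) FROM {ta_table(index_name)} "
           "WHERE TA_RULE = 'Entity Extraction' GROUP BY TA_TYPE")
    return dict(conn.execute(sql).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn, "FTI_POSTS")
    print(entity_counts(conn, "FTI_POSTS"))
```

The SELECT itself is standard SQL, so the same aggregation should carry over to the real result table with only the schema name added.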

3 Enhancement Options - Dictionaries and Rules

Custom Dictionary
• In many use cases you need to extend the dictionary to cover your business domain
• Structure of a dictionary:
(Figure: © SAP SE)
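A custom dictionary is an XML file: entity categories contain entity names with a standard form, and each entity name can list variants. The Python sketch below generates such a file and turns it into a lookup table, roughly mirroring what a dictionary contributes to extraction (mapping variants to a standard form and a category). The namespace URI and the category SPORT_FACILITY are assumptions for illustration; verify the element names and the namespace against the Text Analysis Extraction Customization Guide.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sap.com/ta/4.0"  # assumed namespace of TA dictionaries
ET.register_namespace("", NS)      # serialize it as the default namespace


def build_dictionary(categories: dict[str, dict[str, list[str]]]) -> str:
    """Generate dictionary XML from: category -> standard form -> variants."""
    root = ET.Element(f"{{{NS}}}dictionary")
    for category, entities in categories.items():
        cat_el = ET.SubElement(root, f"{{{NS}}}entity_category", name=category)
        for standard_form, variants in entities.items():
            ent_el = ET.SubElement(cat_el, f"{{{NS}}}entity_name", standard_form=standard_form)
            for variant in variants:
                ET.SubElement(ent_el, f"{{{NS}}}variant", name=variant)
    return ET.tostring(root, encoding="unicode")


def lookup_table(xml_text: str) -> dict[str, tuple[str, str]]:
    """Map every lower-cased standard form and variant to (standard form, category)."""
    table: dict[str, tuple[str, str]] = {}
    for cat_el in ET.fromstring(xml_text).findall(f"{{{NS}}}entity_category"):
        for ent_el in cat_el.findall(f"{{{NS}}}entity_name"):
            standard_form = ent_el.get("standard_form")
            hit = (standard_form, cat_el.get("name"))
            table[standard_form.lower()] = hit
            for var_el in ent_el.findall(f"{{{NS}}}variant"):
                table[var_el.get("name").lower()] = hit
    return table


if __name__ == "__main__":
    # Hypothetical category and entries for the CrossFit example.
    xml_text = build_dictionary({"SPORT_FACILITY": {"CrossFit Box": ["box", "CF box"]}})
    print(xml_text)
    print(lookup_table(xml_text)["cf box"])  # ('CrossFit Box', 'SPORT_FACILITY')
```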

Text Analysis with HANA – Workflow of Enhancement
1. Find the extraction configuration that best fits your use case
2. Copy the configuration into the target folder
3. Create a new custom dictionary
4. Reference the dictionary in your configuration copy
5. Recreate the fulltext index using your custom configuration
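Step 5 boils down to two SQL statements: drop the existing index, then create it again with the custom configuration. A sketch, assuming that a custom configuration is referenced by its repository path in the form 'package::name' (check the Extraction Customization Guide for the exact form in your release); the helper names and the package, schema and object names are hypothetical.

```python
def quote_identifier(name: str) -> str:
    """Quote an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def recreate_index_statements(schema: str, table: str, column: str, index: str,
                              config_ref: str) -> list[str]:
    """Return the SQL for step 5: drop the index, then recreate it with the custom configuration."""
    if "::" not in config_ref:
        # Assumption: custom configurations are referenced as 'package::name'.
        raise ValueError(f"expected a repository reference like 'package::name', got {config_ref!r}")
    config_literal = config_ref.replace("'", "''")  # escape quotes in the string literal
    return [
        f"DROP FULLTEXT INDEX {quote_identifier(schema)}.{quote_identifier(index)}",
        f"CREATE FULLTEXT INDEX {quote_identifier(index)} "
        f"ON {quote_identifier(schema)}.{quote_identifier(table)} ({quote_identifier(column)}) "
        f"CONFIGURATION '{config_literal}' TEXT ANALYSIS ON",
    ]


if __name__ == "__main__":
    # Hypothetical package and configuration name for the CrossFit example.
    for statement in recreate_index_statements("DEMO", "POSTS", "TEXT", "FTI_POSTS",
                                               "demo.crossfit::CROSSFIT_CONFIG"):
        print(statement + ";")
```

Rejecting plain names such as EXTRACTION_CORE is a deliberate guard: forgetting to switch the index to the copied configuration is an easy way to end up without the custom entities.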

Text Analysis with HANA – What’s next?
• Assume that we are in an “industry”-specific context or mining for “slang”-like facts and entities
• Sports are a good example of this!
• We use the example of CrossFit® … as there are some funny facts to extract
• Question: How can we extract complex entities from a text?
• Examples:
  - Did somebody attend a CrossFit training?
  - Does somebody want to join a CrossFit box?

Text Analysis with HANA – Text Analysis Extraction Rules
• Extraction rules (CGUL rules): a pattern-based language that combines character- or token-based regular expressions with linguistic attributes to define custom entity types
• Goals of the rule sets:
  - Extract complex facts based on relations between entities and predicates
  - Identify entities in domain-specific language and capture facts expressed in new, popular “slang”
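To make "token-based regular expressions combined with linguistic attributes" concrete, here is a small Python emulation of the two CrossFit questions. It is not CGUL: the rule representation, the stem table and all helper names are hypothetical. CGUL expresses such patterns declaratively (see the Text Analysis Language Reference Guide); the sketch only shows the idea of matching on a linguistic attribute (the stem) plus literal words with a bounded gap in between.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    text: str
    stem: str


# Hypothetical mini stem table; real text analysis derives stems linguistically.
STEMS = {"attended": "attend", "attends": "attend", "attending": "attend",
         "joined": "join", "joins": "join", "joining": "join", "trainings": "training"}

Predicate = Callable[[Token], bool]


def tokenize(sentence: str) -> list[Token]:
    """Split into word tokens and annotate each token with its stem."""
    return [Token(w, STEMS.get(w.lower(), w.lower())) for w in re.findall(r"\w+", sentence)]


def has_stem(stem: str) -> Predicate:
    """Linguistic attribute test: matches any inflection with the given stem."""
    return lambda token: token.stem == stem


def is_word(word: str) -> Predicate:
    """Literal test: matches the word itself, ignoring case."""
    return lambda token: token.text.lower() == word.lower()


def find(tokens: list[Token], head: Predicate, max_gap: int,
         tail: list[Predicate]) -> Optional[str]:
    """Match `head`, then 0..max_gap arbitrary tokens, then the sequence `tail`."""
    for i, token in enumerate(tokens):
        if not head(token):
            continue
        for gap in range(max_gap + 1):
            start = i + 1 + gap
            window = tokens[start:start + len(tail)]
            if len(window) == len(tail) and all(p(t) for p, t in zip(tail, window)):
                return " ".join(t.text for t in tokens[i:start + len(tail)])
    return None


# Hypothetical rules for the two CrossFit questions.
ATTEND_TRAINING = (has_stem("attend"), 3, [is_word("CrossFit"), has_stem("training")])
JOIN_BOX = (has_stem("join"), 3, [is_word("CrossFit"), is_word("box")])

if __name__ == "__main__":
    print(find(tokenize("Yesterday I attended a great CrossFit training"), *ATTEND_TRAINING))
    print(find(tokenize("She wants to join the CrossFit box downtown"), *JOIN_BOX))
```

Because the head matches on the stem, "attended", "attends" and "attending" all trigger the same rule, which is exactly what a plain character regex would struggle with.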

Text Analysis with HANA – Text Analysis Extraction Rules
(Figure: building blocks of an extraction rule: regular expressions, tokens, dictionaries)

Text Analysis with HANA – “Lessons Learned”
• Text Analysis on SAP HANA is extremely powerful
• Besides the delivered content, you have a lot of options to adapt the text analysis so that it extracts the entities and facts you need
• This also means there are a lot of options that you can set the wrong way
• Since SPS09, rules are compiled upon activation (no separate compilation step necessary)
• The documentation is mostly OK but has room for improvement regarding extraction rules
• Creating custom dictionaries and text rules is cumbersome; finding an error (e.g. a typo) is hell
  - No support in the IDE
  - You can usually activate all objects and create the index … but the index remains empty
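Since a typo in a dictionary usually does not raise an error but silently leaves the index without the expected entities, a small pre-activation check can save time. The Python sketch below assumes a dictionary built from dictionary, entity_category, entity_name and variant elements in the namespace http://www.sap.com/ta/4.0; both are assumptions to verify against the Extraction Customization Guide. The real schema may allow further elements, so unknown elements are reported as possible typos rather than hard errors.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sap.com/ta/4.0"  # assumed dictionary namespace
# Assumed subset of the schema; extend it if your dictionaries use more elements.
KNOWN_ELEMENTS = {"dictionary", "entity_category", "entity_name", "variant"}


def lint_dictionary(xml_text: str) -> list[str]:
    """Report likely mistakes in a custom dictionary before activating it."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != f"{{{NS}}}dictionary":
        problems.append(f"unexpected root element {root.tag!r} (wrong name or namespace?)")
    for element in root.iter():
        local_name = element.tag.split("}")[-1]
        if local_name not in KNOWN_ELEMENTS:
            problems.append(f"unknown element {local_name!r} (typo?)")
    owner: dict[str, str] = {}  # lower-cased form -> standard form it belongs to
    for category in root.iter(f"{{{NS}}}entity_category"):
        if not category.get("name"):
            problems.append("entity_category without a name attribute")
        for entity in category.iter(f"{{{NS}}}entity_name"):
            standard_form = entity.get("standard_form")
            if not standard_form:
                problems.append("entity_name without a standard_form attribute")
                continue
            forms = [standard_form] + [v.get("name", "") for v in entity.iter(f"{{{NS}}}variant")]
            for form in forms:
                if not form.strip():
                    problems.append(f"empty variant under {standard_form!r}")
                elif owner.setdefault(form.lower(), standard_form) != standard_form:
                    problems.append(f"{form!r} is listed for both "
                                    f"{owner[form.lower()]!r} and {standard_form!r}")
    return problems
```

An empty list means none of these checks fired; it does not guarantee that activation succeeds or that the index gets filled.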

Q&A

Dr. Christian Lechner
Principal IT Consultant
+49 (0) 171 7617190
christian.lechner@msg-systems.com
http://scn.sap.com/people/christian.lechner
@lechnerc77

Text Analysis with HANA – Resources
• SAP HANA Search Developer Guide (Fulltext Index Options)
help.sap.com -> Search Developer Guide
• SAP HANA Text Analysis Developer Guide:
help.sap.com -> TA Developer Guide
• SAP HANA Text Analysis Language Reference Guide:
help.sap.com -> TA Language Reference Guide
• SAP HANA Text Analysis Extraction Customization Guide:
help.sap.com -> TA Extraction Customization Guide
• YouTube Playlist of SAP HANA Academy:
Text Analysis and Search
