This document discusses multi-language content discovery through entity-driven search. It describes Zaizi's semantic search engine called Sensefy, which enriches unstructured text during indexing using natural language processing and entity linking to a knowledge base. This allows for improved search experiences like autocomplete suggestions of named entities and entity types, semantic search by entities and types, and semantic filtering and recommendations. A live demo is also presented.
5. ZAIZI
Experienced at building and delivering a wide range of enterprise solutions across the whole information life cycle
Alfresco & Ephesoft certified Platinum Partner
Red Hat Enterprise Linux Ready Partner
R&D department specialising in Open Source Search Solutions
Alfresco Partner of the Year 2012 and 2013
7. Zaizi R&D Department
Giving sense to the content
Enriching it semantically
Adding value to ECM/CMS
More structured content, easy to manage, link and search
Improving search
Across different domains, data sources, User Experience
Machine Learning applied research
Content Organization – Recommendation Systems
8. Enterprise Search Problems
Challenge:
Search within big and heterogeneous repositories
Heterogeneous data sources
Filesystems, DB, ECM/CMS, Email, …
Unstructured content in different formats
PDF, plain text, Word, …
Documents not linked between each other
Federated Search
across data sources
preserving permissions
centralized endpoint
9. Sensefy
Semantic Enterprise Search Engine
Federated Search
Evolved User Experience
Based on cutting-edge Open Source Frameworks
11. Entity Driven Search
Moving from keywords to Entities
More understandable to Humans
Process the unstructured text at indexing time
Enrich it
Build specific indexes
Use entities and concepts in searches
• Trying to foresee the concepts the user wants to express
12. What is an Entity in our domain?
Real world concepts
Linked Data resources
RDF/XML structured data
• Unique identifier + properties
Stored in a Knowledge Base (Freebase, DBpedia, custom dataset)
13. Redlink
Semantic Cloud platform
Providing Software as a Service
Text analysis and Entity Linking using Knowledge Bases
Linked Data Publishing
Enterprise Data Linking
Open-Source based components
14. Indexing - NLP & Semantic Enrichment
Apache ManifoldCF custom processors/output connectors
From unstructured to structured
NLP analysis, POS tagging
Named Entity Recognition
Entity Linking using Knowledge Bases
Disambiguation
Indexing in specific Solr Collections
• Primary Index (documents)
• Entity Index
• Entity Types
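The enrichment flow on this slide can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a toy dictionary lookup stands in for the NLP, Named Entity Recognition and disambiguation steps, plain dictionaries stand in for the three Solr collections, and the `kb:` identifiers, field names and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical miniature knowledge base: surface form -> (entity id, label, type).
# A real deployment would call an NLP / entity-linking service instead.
KNOWLEDGE_BASE = {
    "cristiano ronaldo": ("kb:Cristiano_Ronaldo", "Cristiano Ronaldo", "Football Player"),
    "real madrid": ("kb:Real_Madrid", "Real Madrid", "Football Team"),
}


def link_entities(text):
    """Return KB entries whose surface form occurs in the text (toy gazetteer lookup)."""
    lowered = text.lower()
    return [entry for surface, entry in KNOWLEDGE_BASE.items() if surface in lowered]


def index_documents(documents):
    """Build three in-memory 'collections': primary, entity and entity-type indexes."""
    primary = {}
    entities = {}
    entity_types = defaultdict(int)
    for doc_id, text in documents.items():
        linked = link_entities(text)
        # Primary index: the document plus the semantic fields added by enrichment.
        primary[doc_id] = {
            "text": text,
            "entity_ids": [entity_id for entity_id, _, _ in linked],
            "entity_types": sorted({etype for _, _, etype in linked}),
        }
        for entity_id, label, etype in linked:
            # Entity index: one record per entity, counting documents that mention it.
            record = entities.setdefault(
                entity_id, {"label": label, "notable_type": etype, "occurrences": 0}
            )
            record["occurrences"] += 1
            # Entity-type index: how often each type was seen.
            entity_types[etype] += 1
    return primary, entities, dict(entity_types)


docs = {
    "d1": "Cristiano Ronaldo scored twice for Real Madrid.",
    "d2": "Real Madrid announced a new stadium.",
}
primary, entities, entity_types = index_documents(docs)
print(primary["d1"]["entity_ids"])
print(entities["kb:Real_Madrid"]["occurrences"])
print(entity_types)
```

Keeping entities and entity types in their own indexes is what lets the autocomplete and semantic search features query them independently of the documents.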
15. Search - Smart Autocomplete
Multi Phase suggestions
Closer to natural language query formulation
Named Entities
Entity Types
Document Titles
16. Smart Autocomplete – Named Entities
Infix Suggestion (ron → Cristiano Ronaldo)
Fuzzy suggestion (cristinao → Cristiano Ronaldo)
Brief description of the suggested entity
Specific Solr index for the entities
• Schema (label, notable_type, occurrences, ...)
• Edge n-gram token-filtered label field
• Fuzzy queries with variable distance / classic queries on the label suggestion field
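To make the two suggestion modes concrete, here is a rough in-memory sketch. It assumes that edge n-grams are generated per token (so a prefix of any word in the label matches) and that fuzzy matching uses plain Levenshtein distance with a maximum of 2; the class and function names are hypothetical, and an actual Solr setup would rely on its own analysis chain and fuzzy query support rather than this code.

```python
def edge_ngrams(token, min_len=2):
    """All prefixes of a token, mimicking an edge n-gram token filter."""
    return {token[:n] for n in range(min_len, len(token) + 1)}


def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


class EntitySuggester:
    """Toy stand-in for the entity suggestion index (hypothetical helper)."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.ngram_index = {}
        for label in self.labels:
            for token in label.lower().split():
                for gram in edge_ngrams(token):
                    self.ngram_index.setdefault(gram, set()).add(label)

    def suggest(self, typed, max_distance=2):
        typed = typed.lower()
        # Phase 1: classic lookup against the edge n-grams of every label token.
        exact = self.ngram_index.get(typed, set())
        if exact:
            return sorted(exact)
        # Phase 2: fuzzy fallback, comparing the typed text with each label token.
        fuzzy = {
            label
            for label in self.labels
            for token in label.lower().split()
            if levenshtein(typed, token) <= max_distance
        }
        return sorted(fuzzy)


suggester = EntitySuggester(["Cristiano Ronaldo", "Christian Bale"])
print(suggester.suggest("ron"))
print(suggester.suggest("cristinao"))
```

Both calls should surface the "Cristiano Ronaldo" label: the first through the n-grams of "ronaldo", the second through the fuzzy fallback on "cristiano".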
17. Smart Autocomplete – Entity Types
Infix Suggestion (play → Football Player)
Fuzzy suggestion (foobtall → Football Team)
Multi-language (calcia → Calciatore [it] (Football Player [en]))
Multi-phase suggestion through properties (ital → football player nationality italian)
Specific Solr collection for the entity types
• Each Solr document is an entity type (type, occurrences, attributes, type hierarchy, ...)
• Edge n-gram token-filtered type
• Multi-language suggestion highlight
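The multi-language case can be illustrated with a small sketch. It assumes each entity type document carries one label per language and that a suggestion matches when the typed text is a prefix of any token in any label; the data, the field names and `suggest_types` are hypothetical and only mirror the examples on this slide.

```python
# Hypothetical entity-type documents with per-language labels.
ENTITY_TYPES = [
    {"type": "Football Player", "labels": {"en": "Football Player", "it": "Calciatore"}},
    {"type": "Football Team", "labels": {"en": "Football Team", "it": "Squadra di calcio"}},
]


def suggest_types(typed):
    """Return (matched label, language, canonical type) for token-prefix matches."""
    typed = typed.lower()
    hits = []
    for doc in ENTITY_TYPES:
        for lang, label in doc["labels"].items():
            # A hit in any language resolves to the same canonical entity type.
            if any(token.startswith(typed) for token in label.lower().split()):
                hits.append((label, lang, doc["type"]))
    return hits


print(suggest_types("calcia"))
print(suggest_types("play"))
```

Returning the matched label together with its language is what would allow the interface to highlight the suggestion in the language the user typed in.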
18. Smart Autocomplete – configuration
Knowledge base for entity linking and dereference
DBpedia, Freebase, custom dataset
Properties
For each entity type of interest
LDPath is used to identify the property in the graph
Hierarchy
All the sub-types of a type automatically inherit their parent's properties, which eases the configuration
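One way to picture the inheritance rule is the sketch below. The configuration layout, the type names and `resolve_properties` are assumptions made for illustration, and the path strings are placeholders rather than verified LDPath programs.

```python
# Hypothetical configuration: each type names its parent and its own properties.
# The path strings are illustrative placeholders, not verified LDPath programs.
TYPE_CONFIG = {
    "Person": {"parent": None, "properties": {"nationality": "ex:nationality"}},
    "Athlete": {"parent": "Person", "properties": {"sport": "ex:sport"}},
    "Football Player": {"parent": "Athlete", "properties": {"team": "ex:team"}},
}


def resolve_properties(type_name, config=TYPE_CONFIG):
    """Collect a type's own properties plus everything inherited from its ancestors."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = config[current]["parent"]
    resolved = {}
    # Apply the root first so that a sub-type can override an inherited property.
    for name in reversed(chain):
        resolved.update(config[name]["properties"])
    return resolved


print(resolve_properties("Football Player"))
```

With this layout a property such as nationality is configured once on the parent type and becomes available to every type beneath it.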
19. Semantic Search
Search by Named Entity
Ex. Give me all the documents related to Christian Bale
Search by Entity Type
Ex. Give me all the documents about football players
Search by Entity Type + properties
Ex. Give me all the documents about football players whose nationality is British
Query-time join:
Entity / Entity Type collections → primary index
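A query-time join of this kind could look roughly like the following. The `{!join from=... to=... fromIndex=...}` local-params syntax is Solr's join query parser; the field names (`id`, `entity_ids`, `notable_type`, `nationality`), the `entities` collection name and both helper functions are assumptions, since the slides do not show the actual schema. The second function emulates the same join in memory to show the semantics.

```python
def build_join_query(entity_filter, from_field="id", to_field="entity_ids",
                     from_index="entities"):
    """Build a Solr join query: filter the entity index, return matching documents."""
    return f"{{!join from={from_field} to={to_field} fromIndex={from_index}}}{entity_filter}"


def join_in_memory(entities, documents, predicate):
    """Emulate the join: select entities first, then the documents that reference them."""
    matching_ids = {entity["id"] for entity in entities if predicate(entity)}
    return [
        doc_id
        for doc_id, doc in documents.items()
        if matching_ids & set(doc["entity_ids"])
    ]


# Hypothetical sample data for "football players whose nationality is British".
entities = [
    {"id": "kb:David_Beckham", "notable_type": "Football Player", "nationality": "British"},
    {"id": "kb:Cristiano_Ronaldo", "notable_type": "Football Player", "nationality": "Portuguese"},
    {"id": "kb:Christian_Bale", "notable_type": "Actor", "nationality": "British"},
]
documents = {
    "d1": {"entity_ids": ["kb:David_Beckham"]},
    "d2": {"entity_ids": ["kb:Cristiano_Ronaldo"]},
    "d3": {"entity_ids": ["kb:Christian_Bale"]},
}

query = build_join_query('notable_type:"Football Player" AND nationality:British')
print(query)
print(join_in_memory(
    entities,
    documents,
    lambda e: e["notable_type"] == "Football Player" and e["nationality"] == "British",
))
```

The point of joining at query time is that entity properties stay in the entity index and do not have to be copied into every document that mentions the entity.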
20. Semantic Facets
Dynamically calculated semantic facets based on types and entities from documents
Improve the navigation of results
Allow refined search through semantic information
Configurable custom layer on top of Solr faceting component
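As a simplified picture of what such a layer computes, the sketch below counts entity types and entities over a result set. It assumes every result document already carries `entity_types` and `entities` fields from the enrichment step; those field names and `semantic_facets` are hypothetical, and a real deployment would delegate the counting to Solr faceting.

```python
from collections import Counter


def semantic_facets(result_docs, top_n=5):
    """Count, per entity type and per entity, how many result documents mention it."""
    type_counts = Counter()
    entity_counts = Counter()
    for doc in result_docs:
        # Use sets so a document contributes at most once to each facet value.
        type_counts.update(set(doc["entity_types"]))
        entity_counts.update(set(doc["entities"]))
    return {
        "entity_types": type_counts.most_common(top_n),
        "entities": entity_counts.most_common(top_n),
    }


results = [
    {"entity_types": ["Football Player", "Football Team"],
     "entities": ["Cristiano Ronaldo", "Real Madrid"]},
    {"entity_types": ["Football Team"], "entities": ["Real Madrid"]},
    {"entity_types": ["Football Team"], "entities": ["Real Madrid", "Real Madrid"]},
]
facets = semantic_facets(results)
print(facets["entity_types"])
print(facets["entities"])
```

Each facet value can then be offered as a filter that narrows the result set to documents mentioning that entity or type.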
21. Semantic More Like This
Search for similar documents based on Entities and Entity Types
Similarity function based on document meaning
Multi-language: based on concepts, not on text tokens
Solr More Like This on custom fields
Entity Frequency / Inverse Document Frequency
Entity Type Frequency / Inverse Document Frequency
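The entity frequency / inverse document frequency idea can be sketched as follows. This is a simplification under stated assumptions: documents are reduced to lists of entity identifiers, the weighting uses a smoothed IDF, and similarity is cosine similarity, which is not necessarily how Solr's More Like This scores candidates. All names and identifiers are hypothetical.

```python
import math
from collections import Counter


def ef_idf_vectors(docs):
    """Weight each entity in a document by entity frequency times smoothed IDF."""
    n = len(docs)
    document_frequency = Counter()
    for entity_ids in docs.values():
        document_frequency.update(set(entity_ids))
    vectors = {}
    for doc_id, entity_ids in docs.items():
        entity_frequency = Counter(entity_ids)
        vectors[doc_id] = {
            entity: count * (math.log((1 + n) / (1 + document_frequency[entity])) + 1)
            for entity, count in entity_frequency.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(entity, 0.0) for entity, weight in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def more_like_this(doc_id, docs, top_n=3):
    """Rank other documents by the similarity of their entity vectors."""
    vectors = ef_idf_vectors(docs)
    scores = [
        (other, cosine(vectors[doc_id], vectors[other]))
        for other in docs
        if other != doc_id
    ]
    scores = [item for item in scores if item[1] > 0]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Documents in different languages share the same entity identifiers.
docs = {
    "en-1": ["kb:Cristiano_Ronaldo", "kb:Real_Madrid"],
    "it-1": ["kb:Cristiano_Ronaldo", "kb:Real_Madrid", "kb:Serie_A"],
    "en-2": ["kb:Christian_Bale"],
}
similar = more_like_this("en-1", docs)
print(similar)
```

Because the vectors are built from entity identifiers rather than words, the English and Italian documents about the same entities end up close to each other even though they share no text tokens.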
22. Live Demo
Context
Problem
Solution
Demo
What's upcoming