AI-Powered Linguistics and Search with Fusion and Rosette

For a personalized search experience, search curation requires robust text interpretation, data enrichment, relevancy tuning and recommendations. In order to achieve this, language and entity identification are crucial.

For teams working on search applications, advanced language packages allow them to achieve greater recall without sacrificing precision.

Join us for a guided tour of our new Advanced Linguistics packages, available in Fusion, thanks to the technology partnership between Lucidworks and Basistech.

We’ll explore the application of language identification and entity extraction in the context of search, along with practical examples of personalizing search and enhancing entity extraction.

In this webinar, we’ll cover:
-How Fusion uses the Rosette Base Linguistics and Entity Extraction packages
-Tips for improving language identification and treatment as well as data enrichment for personalization
-Speech2 demo modeling Active Recommendation
-How to use Rosette’s packages with Fusion Pipelines to build custom entities for specific domain use cases

Featuring:
-Radu Miclaus, Director of Product, AI and Cloud, Lucidworks
-Robert Lucarini, Senior Software Engineer, Lucidworks
-Nick Belanger, Solutions Engineer, Basis Technology

  1. AI-Powered Linguistics & Search: Rosette for Fusion
  2. Today’s Speakers: Radu Miclaus, Director of Product, AI and Cloud, Lucidworks; Robert Lucarini, Senior Software Engineer, Lucidworks; Nick Belanger, Solutions Engineer, Basis Technology
  3. Agenda: Challenges with Languages in Search Applications; How Fusion Uses Rosette to Address These Challenges; Deeper Dive into Entities Customization
  4. Personalization through Search Experience. Diagram labels: Documents; Search Curation; Personalization; Text Interpretation; Data Enrichment; Relevancy Tuning; “Exactly what I am searching for”; “Guide me to other interesting things”; Recommendations
  5. Challenges with Languages in Search Applications: language identification; character normalization; greater recall without losing precision; metadata extraction, entities, facets, and filters
  6. Fusion + Rosette: Best-in-Class Search using Best-in-Class Linguistics &
  7. Boosting Global Search Quality with Rosette: Essential Elements of Multilingual Search
  8. Lemmatization. What is it? Associates the inflected forms of a single word (child/children; beau/belle/beaux/belles). This is an alternative to stemming, which associates words that look alike once their endings are removed (arsen|ic, arsen|al). Why it matters: In European languages, adjective agreement in gender and number and verb conjugation create multiple word forms; associating the forms of a single word increases search recall. Impact on search: Increases recall of relevant results, especially for European languages. [French examples shown on the slide]
  9. Tokenization. What is it? Divides sentences into words for languages written without spaces between words. Why it matters: The bigram method ignores meaning and essentially does substring matching of one or two characters. Chinese is highly ambiguous: any one character could be a single word, but often isn’t. Impact on search: Greater precision for Chinese, Japanese, and Korean searches.
  10. Chinese Script Conversion. What is it? Converts records or queries between simplified and traditional Chinese. Why it matters: It’s impossible to search all Chinese documents at once unless a user searches twice, once in traditional and once in simplified Chinese. Impact on search: With one query, a user can search both simplified and traditional Chinese documents simultaneously and see results in their preferred script.
  11. Decompounding. What is it? Splits compound nouns. Why it matters: A search for a compound word like Jugendarbeitslosigkeit (German: “youth unemployment”) misses results where the two concepts (“youth” and “unemployment”) are separated (“20% more youth were unemployed this month.”). Impact on search: Greater recall for German, Dutch, and Korean searches. [German examples shown on the slide]
  12. Named Entity Recognition (NER). What is it? Adds structure to your unstructured, multilingual text by automatically identifying people, organizations, locations, dates, products, and much more. Why it matters: Filter results for the ones containing the entities most pertinent to your search. Impact on search: More quickly refine your search, remove noise, and increase search relevance.
  13. How Does Fusion Use Rosette?
  14. Solr and Fusion: Solr support for multilingual tokenization; 35 languages supported; 7 entities supported with OpenNLP integration. Rosette Enhancing Fusion: Solr/Fusion/Rosette Base Linguistics (32 supported languages; sentence tagging; tokenization; lemmatization; part-of-speech tagging; decompounding; Chinese/Japanese readings) and Rosette Entity Extractor (21 supported languages; 29 entity types and 450+ sub-types detected)
  15. Rosette is enhancing Fusion’s capabilities to enrich data for search and personalization. Besides language interpretation, robust entity extraction can enhance search through the use of facets.
  16. Fusion Entities Demo
  17. Entity Extraction Workflow: REX engine for Entity Extraction and Fusion Pipelines
  18. Fusion 5 Sample Architecture
  19. Deeper Dive: Entities Customization
  20. Basis Technology: The Rosette Entity Extraction Workflow. The Rosette Entity Extractor: comes with expertly crafted models; can extract 18 different kinds of entities in more than 20 different languages; is made with high-quality data; is curated by our dedicated data team; is backed by 25 years of NLP expertise.
  21. The Rosette Entity Extraction Workflow. Three components: machine-learned or deep-learned statistical models that identify entities based on context; a high-performance gazetteer that is dynamically updatable; rule-based extraction using regex-style patterns.
  22. Configuration and Customization. Configuration: quick and easy; leverages predefined capabilities; primarily file manipulation. Customization: drastically changes REX capabilities; allows for truly custom approaches; more time-intensive.
  23. Configuration: Gazetteer and Regex. Gazetteer: easy to create, modify, and maintain; create lists of entities to extract; great when the set is limited or well defined; accept and reject. Regex: match any pattern, simple or complex; extract all entities following a pattern; requires technical resources; accept and reject.
  24. Configuration: Model Training and Custom Processors. Model training: customize the ML models directly; train on your genre of text; teach it to recognize new entities; requires a training process. Custom processors: execute custom code in a sandbox; validation, redaction, transformation; create more complex extraction rules; accept and reject.
  25. Takeaways: text interpretation and enrichment are crucial to personalization; robust language and entity support technology is essential for text interpretation and enrichment; the Fusion and Rosette technology stacks are now integrated to provide the best of AI-powered search and AI-powered linguistics; visit the BasisTech booth at Activate.
  26. Questions & Answers
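
Slide 8 contrasts lemmatization with stemming. The sketch below is a toy illustration of that contrast, not Rosette's or Solr's implementation: the lemma table `LEMMAS` and the suffix list `SUFFIXES` are hypothetical stand-ins for real linguistic resources. It shows how a naive suffix stripper conflates "arsenic" and "arsenal", while a dictionary lookup groups only true forms of one word.

```python
# Hypothetical lemma table; a real lemmatizer uses a full morphological lexicon.
LEMMAS = {
    "children": "child",
    "belle": "beau",
    "beaux": "beau",
    "belles": "beau",
}

# Hypothetical suffix list for the naive stemmer.
SUFFIXES = ("ic", "al")


def lemmatize(word: str) -> str:
    """Map an inflected form to its dictionary form, if the form is known."""
    word = word.lower()
    return LEMMAS.get(word, word)


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, ignoring meaning entirely."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    # Forms of one word share a lemma.
    print(lemmatize("belles"), lemmatize("beaux"))
    # Unrelated words collapse to the same stem.
    print(naive_stem("arsenic"), naive_stem("arsenal"))
    # The lemmatizer keeps them apart.
    print(lemmatize("arsenic"), lemmatize("arsenal"))
```

Indexing the lemma alongside the surface form lets a query for "beau" match documents containing "belles" without also matching look-alike words.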
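
Slide 9 says the bigram method "ignores meaning and essentially does substring matching". A minimal sketch of that point follows, assuming a tiny hypothetical lexicon. The greedy longest-match segmenter is a simple stand-in chosen for illustration; it is not how Rosette tokenizes.

```python
from __future__ import annotations


def char_bigrams(text: str) -> list[str]:
    """Return overlapping two-character windows over the text."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i : i + 2] for i in range(len(text) - 1)]


def longest_match_segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match segmentation; unknown characters become single tokens."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i : i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    sentence = "我爱北京"      # "I love Beijing"
    lexicon = {"北京"}         # hypothetical one-word lexicon
    print(char_bigrams(sentence))
    print(longest_match_segment(sentence, lexicon))
```

With bigram indexing, a query for the non-word "爱北" would match this sentence, because that pair of characters is one of the indexed windows. Word-level tokens avoid that kind of false match, which is the precision gain the slide describes.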
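
Slide 10 describes searching simplified and traditional Chinese documents with one query. One way to picture this is to normalize both the query and the documents to a single script before matching. The sketch below assumes a four-character mapping table (`S2T`) made up for the example; real conversion needs a full table and context, because some simplified characters correspond to more than one traditional character.

```python
# Hypothetical, deliberately tiny simplified-to-traditional table.
S2T = str.maketrans({"汉": "漢", "语": "語", "简": "簡", "体": "體"})


def to_traditional(text: str) -> str:
    """Normalize text to traditional script using the toy table."""
    return text.translate(S2T)


def search(query: str, documents: list) -> list:
    """Return documents containing the query, comparing in one normalized script."""
    normalized_query = to_traditional(query)
    return [doc for doc in documents if normalized_query in to_traditional(doc)]


DOCS = ["我学习汉语", "我學習漢語", "English only"]

if __name__ == "__main__":
    # The same two documents come back whichever script the query uses.
    print(search("汉语", DOCS))
    print(search("漢語", DOCS))
```

The original documents are returned unchanged; only the comparison happens in the normalized script.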
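
Slide 11 explains that a compound such as Jugendarbeitslosigkeit hides the words a user might search for. The sketch below shows dictionary-based decompounding under simplifying assumptions: the lexicon is a hypothetical two-word set, and linking elements and inflection are ignored. It is an illustration of the idea, not Rosette's algorithm.

```python
from __future__ import annotations


def decompound(word: str, lexicon: set[str]) -> list[str]:
    """Split a compound into lexicon words; return [word] if no full split exists."""
    word = word.lower()

    def split(rest: str) -> list[str] | None:
        if not rest:
            return []
        # Try the longest known prefix first, then recurse on the remainder.
        for end in range(len(rest), 0, -1):
            head = rest[:end]
            if head in lexicon:
                parts = split(rest[end:])
                if parts is not None:
                    return [head] + parts
        return None

    parts = split(word)
    return parts if parts else [word]


def index_terms(word: str, lexicon: set[str]) -> list[str]:
    """Terms to index: the whole word plus its parts when it is a compound."""
    whole = word.lower()
    parts = decompound(word, lexicon)
    return [whole] if parts == [whole] else [whole] + parts


# Hypothetical lexicon for the example.
LEXICON = {"jugend", "arbeitslosigkeit"}

if __name__ == "__main__":
    print(decompound("Jugendarbeitslosigkeit", LEXICON))
    print(index_terms("Jugendarbeitslosigkeit", LEXICON))
```

Indexing the parts alongside the compound is what lets a query for one concept reach documents that only contain the compound, and the reverse when the query itself is decompounded.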
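
Slides 21 and 23 describe gazetteer and regex extraction, each with accept and reject lists. The sketch below illustrates how those pieces might combine. It does not use the Rosette API: the `Entity` class, the `TICKET_ID` type, the pattern, and all list contents are hypothetical, and the statistical model from slide 21 is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    type: str
    start: int
    end: int


# Hypothetical accept gazetteer: exact names to extract, keyed by entity type.
ACCEPT_GAZETTEER = {"ORGANIZATION": ["Lucidworks", "Basis Technology"]}

# Hypothetical accept regex: anything shaped like an issue key.
ACCEPT_REGEX = {"TICKET_ID": re.compile(r"\b[A-Z]{2,5}-\d+\b")}

# Hypothetical reject list: candidates to suppress even though a rule matched.
REJECT = {"TICKET_ID": {"UTF-8"}}


def extract(text: str) -> list:
    """Collect gazetteer and regex matches, then drop rejected candidates."""
    candidates = []
    for etype, names in ACCEPT_GAZETTEER.items():
        for name in names:
            pattern = re.compile(r"\b" + re.escape(name) + r"\b")
            for m in pattern.finditer(text):
                candidates.append(Entity(m.group(), etype, m.start(), m.end()))
    for etype, pattern in ACCEPT_REGEX.items():
        for m in pattern.finditer(text):
            candidates.append(Entity(m.group(), etype, m.start(), m.end()))
    kept = [e for e in candidates if e.text not in REJECT.get(e.type, set())]
    return sorted(kept, key=lambda e: e.start)


TEXT = "Lucidworks and Basis Technology closed ABC-1234; the export is UTF-8."

if __name__ == "__main__":
    for entity in extract(TEXT):
        print(entity)
```

The gazetteer suits a limited, well-defined set of names, and the regex catches an open-ended family of strings. The reject list removes a predictable false positive ("UTF-8" has the same shape as a ticket key).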
