Custom Analyzer Using Lucene
1. Custom Analyzer in Lucene
Lucene/Solr Meetup
Ganesh.M
http://www.linkedin.com/in/gmurugappan
2. • Introduction to Analyzers
• Why we need a custom analyzer
• Use case / scenario
• Writing a custom analyzer
• Know your analyzer
3. • Analyzer: analyzes the given text and returns tokens, using a Tokenizer and TokenFilters
• Tokenizer: understands the language and breaks the text into tokens.
– WhitespaceTokenizer divides text at whitespace
– LetterTokenizer divides text at non-letter characters
– CJKTokenizer – tokenizer for Chinese, Japanese, and Korean text
• TokenFilter: adds, stems, or deletes tokens
– StopFilter – removes stop words
– PorterStemFilter – stems tokens using the Porter algorithm
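The Tokenizer → TokenFilter pipeline above can be sketched without Lucene at all; the stop-word set and method names below are illustrative only, not Lucene's API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class AnalyzerSketch {
    // Illustrative stop-word list; Lucene's English set is larger.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "to");

    // Tokenizer step: break the text at whitespace (like WhitespaceTokenizer).
    static List<String> tokenize(String text) {
        return new ArrayList<>(Arrays.asList(text.trim().split("\\s+")));
    }

    // TokenFilter steps: lowercase each token, then drop stop words
    // (like LowerCaseFilter followed by StopFilter).
    static List<String> filter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String lower = t.toLowerCase();
            if (!STOP_WORDS.contains(lower)) {
                out.add(lower);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(filter(tokenize("The quick brown fox")));
    }
}
```

In real Lucene the same chaining happens inside an Analyzer: the Tokenizer produces the raw token stream and each TokenFilter wraps the stream before it.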
4. • Let's take the text
"The quick brown fox jumps over lazy dog"
Using StandardAnalyzer, it generates the following tokens
("the" is dropped as a stop word and every token is lowercased):
quick, brown, fox, jumps, over, lazy, dog
5. Know your analyzer
• It is important to choose the best analyzer for your fields.
• If you choose the wrong one, searches may not return the expected results.
• Whenever you are not getting the results you expect, check your Analyzer and query parser.
6. Lucene 3.x: the code below prints the tokens generated by the given analyzer

Analyzer analyzer = new SimpleAnalyzer();
TokenStream ts = analyzer.tokenStream("Field", new StringReader("Hello world-2013 "));
TermAttribute term = ts.getAttribute(TermAttribute.class);
ts.reset();
while (ts.incrementToken()) {
    System.out.println("token: " + term.term());
}
ts.close();

(In Lucene 4.x, use CharTermAttribute and its toString() instead of the removed TermAttribute.)
7. The purpose of a custom analyzer
• Existing analyzers do not always solve our problem; sometimes we need to analyze text in a different way.
• A custom analyzer can reuse the existing built-in filters.
• It can also be used when parsing queries.
8. Use case
• Synonym injection / abbreviation expansion
– Add synonyms at indexing time.
– When parsing a resume, add related content for a keyword: if you find the text "lucene/solr", you could add "information retrieval" and "search engine".
– If you are searching medical documents, chat messages, etc., you need to expand the abbreviations / codes at indexing time.
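Injecting synonyms at index time amounts to expanding the token stream as it passes through a filter. A dependency-free sketch of the idea (the synonym map is a made-up example, not the SynonymAnalyzer shown later in the deck):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SynonymSketch {
    // Hypothetical synonym map for the resume-parsing example above.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "lucene/solr", List.of("information retrieval", "search engine"));

    // For each token, emit the token itself plus any synonyms registered
    // for it, so a search for "search engine" also matches documents
    // that only say "lucene/solr".
    static List<String> inject(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t);
            out.addAll(SYNONYMS.getOrDefault(t.toLowerCase(), List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(inject(List.of("expert", "in", "lucene/solr")));
    }
}
```

In Lucene this expansion would live inside a custom TokenFilter, which can also set position increments so the injected terms occupy the same position as the original.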
9. • Strip XML / HTML tags and index only the content
<Address>
  <Street>123, MG Road</Street>
  <City>Bangalore</City>
  <State>Karnataka</State>
</Address>
10. • Break email IDs / URLs into multiple tokens
– Sachin Tendulkar
<sachin.tendulkar123@gmail.com>
– should be analyzed as
• sachin
• tendulkar
• sachin
• tendulkar123
• gmail
• com
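The address part of the breakdown above is simply the text split at every non-alphanumeric character. A plain-Java sketch (not a Lucene Tokenizer; the class name is illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class EmailTokenSketch {
    // Split an email ID (or URL) at every character that is not a letter
    // or a digit, dropping the separators (".", "@", "/", ...).
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("sachin.tendulkar123@gmail.com"));
    }
}
```

In a real custom analyzer, the same behavior could come from a Tokenizer that treats non-alphanumeric characters as token boundaries.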
11. HTMLAnalyzer in Lucene 4.5

public class HTMLAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        HTMLStripCharFilter htmlFilter = new HTMLStripCharFilter(reader);
        WhitespaceTokenizer tokenizer = new WhitespaceTokenizer(Version.LUCENE_45, htmlFilter);
        TokenStream result = new LowerCaseFilter(Version.LUCENE_45, tokenizer);
        return new TokenStreamComponents(tokenizer, result);
    }
}
13. SynonymAnalyzer
• SynonymAnalyzer injects synonyms into the indexed content (built with Lucene 3.3).
• Check out the code:
https://github.com/geekganesh/SynonymAnalyzer
14. PerFieldAnalyzerWrapper
• IndexWriter / IndexWriterConfig takes only one Analyzer and uses it for all fields.
• When we have multiple fields and each field should be indexed with a specific analyzer, we need PerFieldAnalyzerWrapper.
• PerFieldAnalyzerWrapper holds a different analyzer per field; the wrapper is what gets passed to the IndexWriter.
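Behind the scenes, PerFieldAnalyzerWrapper just looks up an analyzer by field name and falls back to a default when the field has no override. The dispatch idea can be sketched without Lucene (the field names and toy "analyzers" below are illustrative only):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class PerFieldSketch {
    // Default "analyzer": lowercase and split at whitespace.
    static final Function<String, List<String>> DEFAULT =
        text -> Arrays.asList(text.toLowerCase().split("\\s+"));

    // Per-field overrides; any field not listed falls back to the default,
    // mirroring PerFieldAnalyzerWrapper(defaultAnalyzer, fieldAnalyzers).
    static final Map<String, Function<String, List<String>>> PER_FIELD = Map.of(
        // Hypothetical "email" field: break at non-alphanumeric characters.
        "email", text -> Arrays.asList(text.toLowerCase().split("[^a-z0-9]+")));

    static List<String> analyze(String field, String text) {
        return PER_FIELD.getOrDefault(field, DEFAULT).apply(text);
    }

    public static void main(String[] args) {
        System.out.println(analyze("body", "Hello World"));
        System.out.println(analyze("email", "a.b@c.com"));
    }
}
```

With real Lucene, the equivalent wiring is a Map<String, Analyzer> of overrides plus a default Analyzer, wrapped in a PerFieldAnalyzerWrapper that is handed to IndexWriterConfig.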