Technology Drives Business
CUSTOM SOLR TOKENIZER 
FLEXIBLE TOKENIZER WITH JFLEX 
Berlin Buzzwords 2014
Agenda
• Me & SHI
• JFlex Tokenizer
  • Motivation
  • JFlex?!
  • Solr implementation
  • Demo
• Q & A
Markus Klose – Search Consultant
• Expertise in Solr, Lucene, Elasticsearch, FAST ESP
• Certified Apache Solr Trainer
• Speaker, blogger, coder
• Author of “Einführung in Apache Solr” (Introduction to Apache Solr)
• @markus_klose
SHI GmbH & Co KG
• 1994: Foundation. Development of a home-grown information retrieval platform.
• 2000: Embracing open source.
• 2011, 2013, 2014: Partnering with LucidWorks. Delivering mission-critical, data-driven solutions for multiple industries.
OUR MISSION
Vendor-independent IT consulting and software engineering company.
Dedicated to delivering next-generation Semantic Search, Big Data and Exploratory Data Analytics solutions.
Using an Enterprise Data Hub approach for 360° data integration.
Helping customers accelerate (e)business through better technology adoption and data utilization.
CUSTOM TOKENIZER WITH JFLEX 
JFlex-based tokenizer: the idea is not new, but it is a great one
Motivation 1
• In customer projects we very often have to deal with custom "meta" data:
  • IDs
  • Type designations
  • Product descriptions
• How do we tackle that problem? With a PatternTokenizer?
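To see why a plain pattern split struggles with this kind of data, the sketch below splits three spellings of the same type designation (taken from the use case: nymj3x1,5, nym-j 3x1,5, nym-j 3 x 1,5) on runs of whitespace or hyphens using java.util.regex. The split pattern and the class name are assumptions chosen for illustration; a PatternTokenizer configured to split on a similar pattern should behave roughly the same, but this is plain Java, not Solr code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: split on runs of whitespace or hyphens (assumed pattern),
// roughly what a split-mode PatternTokenizer would do.
class PatternSplitDemo {
    private static final Pattern SPLIT = Pattern.compile("[\\s-]+");

    static List<String> split(String input) {
        return Arrays.asList(SPLIT.split(input.trim()));
    }

    public static void main(String[] args) {
        String[] spellings = {"nymj3x1,5", "nym-j 3x1,5", "nym-j 3 x 1,5"};
        for (String s : spellings) {
            // Each spelling of the same product yields a different token set.
            System.out.println(s + " -> " + split(s));
        }
    }
}
```

The three spellings produce [nymj3x1,5], [nym, j, 3x1,5] and [nym, j, 3, x, 1,5]. No token is shared by all three, so a query typed in one spelling misses documents stored in another. A single split pattern cannot express "letters, then numbers, with optional separators in between".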
Motivation 2
• Use and combine existing tools to be more flexible
• Configuration over coding
• JFlex is already used in ClassicTokenizer / StandardTokenizer
Use Case – Type Designation
• Product data
  • nymj3x1,5 / nym-j 3x1,5 / nymj 3x1,5 / nym-j 3 x 1,5
• Search input
  • nymj 3 1,5 / nym-j 3x1,5
• Index
  • nymj315 / nymj / nym / j / 315 / 3 / 15
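The index tokens can be reproduced with a small normalizer. This is a plain-Java sketch, not the talk's tokenizer: it assumes a designation is a letter part (letter runs joined by hyphens or spaces) followed by a numeric part whose numbers are separated by "x" or spaces, and it drops decimal commas so that 1,5 becomes 15. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TypeDesignationTokens {
    // Assumed shape: letter runs joined by '-' or spaces, then the numeric part.
    private static final Pattern DESIGNATION =
            Pattern.compile("([a-z]+(?:[-\\s]+[a-z]+)*)\\s*(.*)");

    static List<String> tokens(String input) {
        String s = input.toLowerCase(Locale.ROOT).trim();
        Matcher m = DESIGNATION.matcher(s);
        if (!m.matches()) {
            return List.of(s); // unexpected shape: keep the input as a single token
        }
        List<String> letterParts = Arrays.asList(m.group(1).split("[-\\s]+"));
        List<String> numberParts = new ArrayList<>();
        for (String part : m.group(2).split("[x\\s]+")) {
            if (!part.isEmpty()) {
                numberParts.add(part.replace(",", "")); // "1,5" -> "15"
            }
        }
        String letters = String.join("", letterParts);
        String numbers = String.join("", numberParts);

        // LinkedHashSet keeps insertion order and drops duplicates.
        Set<String> out = new LinkedHashSet<>();
        out.add(letters + numbers); // e.g. nymj315
        out.add(letters);           // e.g. nymj
        out.addAll(letterParts);    // e.g. nym, j
        if (!numbers.isEmpty()) {
            out.add(numbers);       // e.g. 315
        }
        out.addAll(numberParts);    // e.g. 3, 15
        return new ArrayList<>(out);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"nym-j 3x1,5", "nymj3x1,5", "nymj 3 1,5"}) {
            System.out.println(s + " -> " + tokens(s));
        }
    }
}
```

For nym-j 3x1,5 and nym-j 3 x 1,5 it returns [nymj315, nymj, nym, j, 315, 3, 15], matching the index list; the search input nymj 3 1,5 yields [nymj315, nymj, 315, 3, 15], which shares nymj315 with every product spelling. Note that an unhyphenated "nymj" gives no hint where to split, so nym and j only appear for the hyphenated variants.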
JFlex - The Fast Scanner Generator
• JFlex is a lexical analyzer generator (aka scanner generator)
• Current version at the time of the talk: 1.5.1
• Download: http://jflex.de/download.html
• Mailing lists
• BSD-style license
• CLI, API & GUI
JFlex - The Fast Scanner Generator
• Input: Berlin Buzzword 26.05.2014
• LETTERS -> "Berlin", "Buzzword"
• LETTERS and SPACE -> "Berlin Buzzword"
• DIGITS -> "26", "05", "2014"
• DIGITS and . -> "26.05.2014"
• LETTERS and SPACE, or DIGITS and . -> "Berlin Buzzword", "26.05.2014"
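A JFlex-generated scanner picks, at each input position, the rule with the longest match, and on a tie the rule listed first in the specification. The sketch below imitates that behavior in plain Java so the rule sets from the slide can be tried directly. It is an assumption-laden stand-in, not generated code: each rule is a java.util.regex pattern instead of a compiled DFA (its greedy match equals the longest match for these simple patterns), characters that no rule matches are silently skipped (a real JFlex spec would need an explicit catch-all rule for that), and the class name and regex spellings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchScanner {
    // Rule patterns from the slide (regex spellings are assumptions).
    static final String LETTERS = "[A-Za-z]+";
    static final String LETTERS_AND_SPACE = "[A-Za-z]+(?: [A-Za-z]+)*";
    static final String DIGITS = "[0-9]+";
    static final String DIGITS_AND_DOT = "[0-9]+(?:\\.[0-9]+)*";

    private final List<Pattern> rules = new ArrayList<>();

    LongestMatchScanner(String... regexes) {
        for (String regex : regexes) {
            rules.add(Pattern.compile(regex));
        }
    }

    List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestEnd = pos;
            for (Pattern rule : rules) {
                Matcher m = rule.matcher(input).region(pos, input.length());
                // Only a strictly longer match replaces the current best,
                // mirroring JFlex's "earlier rule wins on a tie".
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestEnd = m.end();
                }
            }
            if (bestEnd > pos) {
                tokens.add(input.substring(pos, bestEnd));
                pos = bestEnd;
            } else {
                pos++; // no rule matches here: skip one character
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Berlin Buzzword 26.05.2014";
        System.out.println(new LongestMatchScanner(LETTERS).scan(text));
        System.out.println(new LongestMatchScanner(LETTERS_AND_SPACE).scan(text));
        System.out.println(new LongestMatchScanner(DIGITS).scan(text));
        System.out.println(new LongestMatchScanner(DIGITS_AND_DOT).scan(text));
        System.out.println(new LongestMatchScanner(LETTERS_AND_SPACE, DIGITS_AND_DOT).scan(text));
    }
}
```

The five runs print the five token lists from the slide. Combining LETTERS with LETTERS_AND_SPACE also shows the longest-match rule at work: at position 0 both rules match, but "Berlin Buzzword" is longer than "Berlin", so it wins.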
Custom Tokenizer – Project Setup
• Java: TokenizerFactory -> typical factory; tokenizer configuration
• Java: Tokenizer -> base class; token manipulation
• JFlex: scanner -> description of the token patterns
• (Java: scanner) -> the generated scanner
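The three layers can be illustrated without Lucene on the classpath. In Lucene/Solr the real base classes are TokenizerFactory and Tokenizer, whose incrementToken() advances to the next token; the sketch below only mimics that division of labor. All type names (RawScanner, SketchTokenizer, SketchTokenizerFactory, LayeringDemo) are hypothetical, a whitespace splitter stands in for the JFlex-generated scanner, and the "lowercase" option is an assumed example of factory configuration. Lucene's actual API (attributes, reset(), end()) differs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Stand-in for the generated scanner: returns the next raw token, or null at end.
interface RawScanner {
    String nextToken();
}

// Tokenizer layer: pulls raw tokens from the scanner and manipulates them.
class SketchTokenizer {
    private final RawScanner scanner;
    private final boolean lowercase;
    private String term;

    SketchTokenizer(RawScanner scanner, boolean lowercase) {
        this.scanner = scanner;
        this.lowercase = lowercase;
    }

    // Same contract idea as incrementToken(): advance, return false when exhausted.
    boolean incrementToken() {
        String raw = scanner.nextToken();
        if (raw == null) {
            return false;
        }
        term = lowercase ? raw.toLowerCase(Locale.ROOT) : raw;
        return true;
    }

    String term() {
        return term;
    }
}

// Factory layer: reads configuration once, then creates tokenizers.
class SketchTokenizerFactory {
    private final boolean lowercase;

    SketchTokenizerFactory(Map<String, String> args) {
        // "lowercase" is a hypothetical option, defaulting to false.
        this.lowercase = Boolean.parseBoolean(args.getOrDefault("lowercase", "false"));
    }

    SketchTokenizer create(RawScanner scanner) {
        return new SketchTokenizer(scanner, lowercase);
    }
}

class LayeringDemo {
    static List<String> run(String text, Map<String, String> config) {
        String[] parts = text.trim().split("\\s+");
        int[] next = {0};
        // Whitespace splitting stands in for a JFlex-generated scanner here.
        RawScanner scanner = () -> next[0] < parts.length ? parts[next[0]++] : null;
        SketchTokenizer tokenizer = new SketchTokenizerFactory(config).create(scanner);
        List<String> terms = new ArrayList<>();
        while (tokenizer.incrementToken()) {
            terms.add(tokenizer.term());
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(run("Berlin Buzzword 2014", Map.of("lowercase", "true")));
    }
}
```

Keeping the token patterns in the scanner and the post-processing in the tokenizer is what allows the patterns to change (by regenerating the scanner from the JFlex file) without touching the factory or the Solr configuration.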
Demo 
ISBN Tokenizer / URL Tokenizer 
https://github.com/scherziglu
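The speaker notes describe the URL demo input as protocol://subdomain.site.domain/directory. As a rough, non-JFlex sketch of the kind of sub-tokens such a tokenizer could emit, java.net.URI can split a URL into scheme, host labels and path segments. The token selection, the example URL and the UrlTokens name are assumptions; the demo code on GitHub may emit a different set.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class UrlTokens {
    // Emits the whole URL plus its parts (hypothetical token selection).
    static List<String> tokens(String url) {
        URI uri = URI.create(url);
        List<String> out = new ArrayList<>();
        out.add(url);
        if (uri.getScheme() != null) {
            out.add(uri.getScheme()); // protocol
        }
        String host = uri.getHost();
        if (host != null) {
            out.add(host); // subdomain.site.domain
            for (String label : host.split("\\.")) {
                out.add(label); // subdomain, site, domain
            }
        }
        String path = uri.getPath();
        if (path != null) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {
                    out.add(segment); // directory names
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("https://www.example.com/downloads"));
    }
}
```

Emitting both the full URL and its parts lets an exact URL query and a query for just the site name hit the same document.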
Resources
• JFlex Tokenizer
  • GitHub (https://github.com/scherziglu)
  • Solr source code (e.g. ClassicTokenizer)
  • @markus_klose / @SHIEngineers
• JFlex website
  • http://jflex.de/
• Q & A
CONTACT 
SHI GmbH & Co KG 
Curt-Frenzel-Str. 12 
86167 Augsburg 
Germany 
info@shi-gmbh.com 
+49.821.74 82 633 0 
mma@shi-gmbh.com / mk@shi-gmbh.com
@markus_klose
dwr@shi-gmbh.com
@SHIEngineers @wrigley_dan


Speaker notes

  1. 9
  2. 36/840 E+P USB-Kabel 000(VE10) 6-30 3S+1Ö M12FR-3L 1x2
  3. JFlex is a lexical analyzer generator (also known as a scanner generator) for Java, written in Java. It is also a rewrite of the very useful tool JLex, which was developed by Elliot Berk at Princeton University. As Vern Paxson states for his C/C++ tool flex: they do not share any code though. JFlex is designed to work together with the LALR parser generator CUP by Scott Hudson, and with BYacc/J, the Java modification of Berkeley Yacc by Bob Jamison. It can also be used together with other parser generators like ANTLR, or as a standalone tool. JFlex has three mailing lists: jflex-announce is low-traffic and read-only, for announcements of new releases; jflex-users is for help and discussions; and jflex-devel is for developer discussions. In short: JFlex creates Java classes from a grammar, and those classes parse the input.
  4. Show the factory and solrconfig.xml. Show the Tokenizer -> incrementToken. Show the JFlex file and its compilation.
  5. Step 1: text only. Step 2: simple combination. Step 3: complex setup (ISBN; URL: protocol://subdomain.site.domain/directory).