Building an easy to
use search solution
(for different languages)

Ivo Lukač @ J.Boye Aarhus 13: Web & Intranet Conference
“Making search work” track
Speaker
• Co-owner of Netgen, a web development agency in Zagreb, Croatia
• Started as a developer 11 years ago
• Now I do a variety of things, but can best be described as International Business Developer

www.netgenlabs.com
So I am still a developer! :)

Use case
• Regulatory reform project: cutting unneeded legislation, laws and/or procedures
• Netgen is the technology implementation partner
• Project led by Sense Consulting
• Croatia, Egypt, Vietnam, Armenia, Iraq - mostly “exotic” countries

We would rather work in Denmark, but it seems that it doesn’t need such a solution :(

How we use search
Solution
• In 2006: a simple filter
• Today: a flexible information architecture powered by eZ Publish CMS, with Solr for search
• Usually 70% common features, 30% customisation
• Aiming for 90%/10%
• If you are interested in tech specifics, ask me later…

Search features
• Simple (default) and advanced search (with filters)
• Full text search on complex data, boosting on attribute level
• Filtering with multilevel tags/taxonomies
• Stopwords
• Search-time spelling based on indexed data
• Sometimes using faceting on result set
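
As a rough illustration of how several of these features map onto a single Solr request, here is a minimal sketch that builds the query string for an eDisMax search. The field names (`title`, `summary`, `body`, `tag_path`), the boost weights and the helper name `build_search_params` are assumptions made for this example, not the configuration used in the project; `spellcheck=true` also only has an effect if a spellcheck component is configured on the request handler.

```python
from urllib.parse import urlencode


def build_search_params(phrase, tag_paths=(), facet_fields=()):
    """Build a Solr query string for a boosted full-text search.

    Field names and boost weights are hypothetical placeholders.
    """
    params = [
        ("q", phrase),
        ("defType", "edismax"),
        # Attribute-level boosting: a match in the title counts more than in the body.
        ("qf", "title^3 summary^2 body^1"),
        # Ask for spelling suggestions built from the indexed data.
        ("spellcheck", "true"),
    ]
    # One filter query per selected tag/taxonomy path.
    for path in tag_paths:
        params.append(("fq", f'tag_path:"{path}"'))
    # Faceting on the result set, only when requested.
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    return urlencode(params)


query_string = build_search_params(
    "business licence",
    tag_paths=["Economy/Trade"],
    facet_fields=["ministry"],
)
print(query_string)
```

The string would be appended to the `select` URL of the Solr core; stopwords are handled in the field type's analyzer chain rather than in the request, so they do not appear here.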

Additional features
• Sometimes using multi search
• Typing suggestions
• Latest search phrase list

Challenges
Characters
• At the beginning we didn’t have Unicode; it was a mess!
• Unicode solved a lot of problems, but not all
• The same character can be stored as more than one byte sequence (for example precomposed vs. combining forms), and this is not normalised by default
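
The normalisation problem can be shown with a small sketch using Python's standard `unicodedata` module. The helper name `normalise` is an assumption for this example, and NFC is just one reasonable choice of form; the key point is to apply the same form at index time and at query time.

```python
import unicodedata


def normalise(text: str) -> str:
    """Return text in Unicode NFC form (composed characters where possible)."""
    return unicodedata.normalize("NFC", text)


composed = "Luka\u010d"      # "č" as a single code point (U+010D)
decomposed = "Lukac\u030c"   # "c" followed by a combining caron (U+030C)

# Both strings render identically but differ at the code point (and byte) level.
print(composed == decomposed)
# After normalisation the two spellings compare equal.
print(normalise(composed) == normalise(decomposed))
```

Without this step, a document indexed in one form will not match a search phrase typed in the other, even though the two look the same on screen.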

Indexing
• Indexing files like Word, PDF or similar proved to be problematic due to character problems
• token delimiter configuration can be language specific
• stemming is sometimes supported, sometimes not

Searching
• search phrase input problems

Blind work
• the biggest challenge is that developers don’t know the language

• first level of testing is very hard
• still can’t trust Google Translate

What vehicle would you
use to transport 10 cases
of Heineken?

How to overcome this?
Main idea
• let’s try to assess search result quality
• use editors for rating (not the public)
• use the most frequently searched terms (we can’t test all)
• rate results above the fold
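
Picking the terms to rate could look like the following minimal sketch, assuming the site keeps a plain list of logged search phrases. The function name `top_search_terms` and the default limit are hypothetical, not part of the tool described in the talk.

```python
from collections import Counter


def top_search_terms(logged_phrases, limit=10):
    """Return the most frequent search phrases as (phrase, count) pairs.

    Phrases are trimmed and lower-cased so that trivial variants are
    counted together; empty entries are ignored.
    """
    cleaned = (phrase.strip().lower() for phrase in logged_phrases)
    counts = Counter(phrase for phrase in cleaned if phrase)
    return counts.most_common(limit)


log = ["Licence", "licence ", "tax", "", "permit", "tax", "licence"]
print(top_search_terms(log, limit=2))
```

The counts are worth keeping alongside the terms, since they can later serve as popularity weights when the ratings are aggregated.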

The tool
• integrated in the public site
• added thumbs up/down buttons for the first X results, shown only to editors

Demo
• imported articles about CMS topics from various sources into a test instance
• rating result quality of 7 search terms
• thumbs up/down for the 3 suggested search results
• test periods are used for framing test data

Rating side
Analysing side
Rate measures
• Discounted Cumulative Gain (DCG) - sum of the ratings, each discounted based on its position in the search results
• Normalised Discounted Cumulative Gain (NDCG) - DCG normalised against the best possible outcome (to get a percentage as the unit)
• Popularity based NDCG - takes into account the popularity of the search term

http://en.wikipedia.org/wiki/Discounted_cumulative_gain
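
The measures above can be sketched in a few lines, assuming a thumbs up is stored as 1 and a thumbs down as 0. `dcg` and `ndcg` follow the standard definitions from the linked article. The slides do not define the popularity based variant, so `popularity_weighted_ndcg` is only one plausible reading: an average of per-term NDCG weighted by how often each term was searched.

```python
import math


def dcg(ratings):
    """Discounted Cumulative Gain: each rating divided by log2(position + 1)."""
    return sum(r / math.log2(pos + 1) for pos, r in enumerate(ratings, start=1))


def ndcg(ratings):
    """DCG divided by the DCG of the same ratings in the best possible order."""
    ideal = dcg(sorted(ratings, reverse=True))
    return dcg(ratings) / ideal if ideal > 0 else 0.0


def popularity_weighted_ndcg(rated_terms):
    """Average NDCG over terms, weighted by search count (an assumed definition).

    rated_terms: iterable of (search_count, ratings) pairs.
    """
    rated_terms = list(rated_terms)
    total = sum(count for count, _ in rated_terms)
    if total == 0:
        return 0.0
    return sum(count * ndcg(ratings) for count, ratings in rated_terms) / total


# Three rated results for one term: up, down, up.
print(dcg([1, 0, 1]))
print(ndcg([1, 0, 1]))
# Two terms: a popular one with ideal ordering, a rarer one with no good result.
print(popularity_weighted_ndcg([(30, [1, 1, 0]), (10, [0, 0, 0])]))
```

Note that a term whose results are all rated down gets an NDCG of 0 here by convention, because there is no ideal ordering to normalise against.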
Known problems
• what if good results are not showing? - something bad is going on with the search engine
• what if there is no good result?
• what about new content added over time?
• at the end of the day, the measurements are good for comparing test periods, but not meaningful by themselves

Improvements
• opening rating to public users
• using clicks as ratings
• implement a “did you find what you were looking for?” feature
• integrate with analytics
• use rating data to boost particular items in search!

Questions now or later
ivo@netgen.hr
ilukac.com/twitter
ilukac.com/facebook
ilukac.com/gplus
ilukac.com/linkedin
