Solr

Peter Svehla

Solr TechSig 18/4/13 slides.

What is it?
• Text search index (engine)
• Open source
• Not a search product
• A tool that allows you to create a search solution

What is it like?
• Google, Google Appliance.
• FAST
• Oracle Secure Enterprise Search
• etc.

Google Appliance:
• Sucks data in
• Can’t really configure
• Stuck with results
• Bonnet is locked

Solr:
• You need to feed data in
• Highly configurable
• Search results can be tuned
• There is no bonnet

Why am I doing a talk?
• Did a course
• LucidWorks content
• Presented by FindWise
• FindWise is a search specialist that uses a range of search engines

Caveats
• Course used Solr 4.1.0; we use 3.6.1 for APVMA
• Course focussed on search, not ingestion or presentation
• Java API recommended for ingestion
• ‘Browse’ interface uses Velocity templates for presentation, but probably isn’t good enough for most projects.

Apache Tika
• Data import handler
• Used to be part of Lucene
• XML
• PDF
• Word
• Excel
• etc.

ManifoldCF
• Apache
• Connector framework
• Used to connect to content repositories (source)
• SharePoint
• Documentum
• CMIS
• JDBC
• RSS

Hydra
• FindWise
• Although Solr supports validation (e.g. ‘required’), don’t use it for data cleanup.
• Validation failure is inconvenient: the whole job fails
• Feed in clean data.
• Use Hydra for cleanup.
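
The "feed in clean data" idea can be sketched in a few lines of Python. This is not Hydra's API, only an illustration of the principle: tidy each record before it reaches Solr and set bad records aside instead of letting one of them fail the whole job. The field names in REQUIRED_FIELDS are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical required fields; a real list would come from your schema.
REQUIRED_FIELDS = ("id", "title")


def clean_records(records: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split records into (clean, rejected) instead of failing the whole batch."""
    clean, rejected = [], []
    for record in records:
        # Convert values to trimmed strings and drop empty ones.
        tidy = {}
        for key, value in record.items():
            if value is None:
                continue
            text = str(value).strip()
            if text:
                tidy[key] = text
        # A record missing a required field is set aside, not sent to Solr.
        if all(field in tidy for field in REQUIRED_FIELDS):
            clean.append(tidy)
        else:
            rejected.append(record)
    return clean, rejected


if __name__ == "__main__":
    batch = [
        {"id": " 1 ", "title": "Solr intro", "author": ""},
        {"id": "2", "title": None},
    ]
    good, bad = clean_records(batch)
    print(good)
    print(len(bad), "record(s) rejected")
```

Only the clean list would then be indexed; the rejected list can be logged and fixed at the source.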

Apache ZooKeeper
• Used for SolrCloud
• Clustering and sharding
• Needs Solr 4.x (not available in 3.6.1)
• Began as a Hadoop subproject
• Used to coordinate Hadoop clusters
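
As a rough picture of what sharding means, the sketch below maps a document id to one of several shards using a stable hash. It is a simplification for illustration only: SolrCloud's own routing uses a different hash function and per-shard hash ranges, and the function name pick_shard is hypothetical.

```python
import zlib


def pick_shard(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number in the range 0..num_shards-1."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    # CRC32 is stable across runs and machines, unlike Python's built-in hash().
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for doc_id in ("doc-1", "doc-2", "doc-3"):
        print(doc_id, "-> shard", pick_shard(doc_id, 2))
```

The point is that the same id always lands on the same shard, so an update to a document goes to the shard that already holds it.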

General Approach
• Design schema
• Prototyping
• Integration

Design Schema
• A data modelling exercise
• schema.xml
• Dynamic fields can be useful in the first pass:
<dynamicField name="*" type="string" indexed="true" />
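
To see why a catch-all dynamic field helps in a first pass, here is a small Python sketch of how a field name might be resolved to a type. It only approximates Solr's matching: the schema contents (the "*_txt" pattern and the "text_general" type) are hypothetical, and the rule of trying longer patterns first is an assumption made for the illustration.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional

# Hypothetical schema: one explicit field plus two dynamic patterns.
EXPLICIT_FIELDS: Dict[str, str] = {"id": "string"}
DYNAMIC_FIELDS: Dict[str, str] = {"*_txt": "text_general", "*": "string"}


def resolve_type(field_name: str) -> Optional[str]:
    """Return the field type a document field would get, or None if unmatched."""
    # Explicitly declared fields win over any dynamic pattern.
    if field_name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[field_name]
    # Assumption: try longer (more specific) patterns before the catch-all "*".
    for pattern in sorted(DYNAMIC_FIELDS, key=len, reverse=True):
        if fnmatchcase(field_name, pattern):
            return DYNAMIC_FIELDS[pattern]
    return None


if __name__ == "__main__":
    for name in ("id", "body_txt", "anything_else"):
        print(name, "->", resolve_type(name))
```

With the "*" pattern in place every incoming field is accepted as a string, which is convenient while exploring the data; the pattern would normally be replaced by explicit fields once the model settles.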

Prototyping
• Get the data in (index)
• CSV, XML, JSON
• post.jar
• URL to search and inspect raw results
• ‘browse’ interface lets the developer understand how the search is working
• solrconfig.xml
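
The two prototyping steps, getting data in and inspecting raw results by URL, can be sketched with the Python standard library. The first function writes documents as a Solr XML <add> message; the second builds a select URL that asks for JSON output. The base URL http://localhost:8983/solr and the field names are assumptions; adjust them to your installation and schema.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode


def build_add_xml(docs: List[Dict[str, str]]) -> str:
    """Serialise documents as a Solr XML <add> update message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


def build_select_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a URL that returns raw JSON results from the select handler."""
    params = {"q": query, "rows": rows, "wt": "json", "indent": "true"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical document with fields "id" and "title".
    print(build_add_xml([{"id": "1", "title": "Solr intro"}]))
    # Assumed base URL of a local Solr instance.
    print(build_select_url("http://localhost:8983/solr", "title:solr"))
```

The XML can be saved to a file and sent to Solr with post.jar, and the URL can be pasted into a browser to look at the raw response.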

Integration
• Not covered
• Content ingestion
• Presentation of results
• Up to you…
