© Copyright 2013
Intro to Search
Grant Ingersoll
CTO, LucidWorks
@gsingers
© 2013 LucidWorks
Search is dead, long live search
• Search is Everywhere!
• The Bar is Raised
- Keyword search is a commodity
• Holistic view of the data AND the users is critical
• Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data
(Diagram: Documents, User Interaction, Access, Content, Relationships)
Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• De-normalized data
- “light” relational
• Top N problems
- Key-value (top 1)
- Recommendations
- “Good enough” classification, clustering
• Faceting, slicing and dicing of enumerated data
• Spatial, spell checking, record linkage, highlighting
• NoSQL
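Fuzzy matching and spell checking both rest on edit distance: the number of single-character insertions, deletions or substitutions needed to turn one term into another. The sketch below is a brute-force illustration over a small, hypothetical vocabulary; Lucene's own fuzzy queries use a far more efficient automaton-based approach over the term dictionary, so treat this only as a picture of the idea.

```python
from typing import Iterable, List


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]


def fuzzy_terms(query_term: str, vocabulary: Iterable[str], max_edits: int = 2) -> List[str]:
    """Return vocabulary terms within max_edits of query_term, closest first."""
    matches = []
    for term in vocabulary:
        distance = levenshtein(query_term, term)
        if distance <= max_edits:
            matches.append((distance, term))
    # Sort by distance, then alphabetically for ties.
    return [term for _, term in sorted(matches)]
```

For example, `fuzzy_terms("serch", ["search", "solr", "sarch"], max_edits=1)` returns `["sarch", "search"]`: both are one edit away, so they are ordered alphabetically.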
Common Use Cases
• eCommerce
- Search + Recs + Analysis of users
• Knowledge Management
- Financial, transportation, pharma
• Fraud detection
• Social media
- Trend monitoring
• Information technology
- Log monitoring, analysis
• Healthcare
- DNA Analysis
http://bit.ly/get-lws
Topics
• Intros
• First 5 Minutes with LucidWorks Search (Solr++)
• Search Concepts
• Demo Deep Dive
• Level Up
• Resources
LucidWorks Overview
› Founded in 2007 to be the go-to company for Lucene/Solr expertise
› 250+ customers (many Fortune 500)
› 100% year-over-year growth
› Over 40% of the active Apache Lucene/Solr committers
› Hosts the fast-growing Lucene/Solr Revolution user conference (400+ attendees)
LucidWorks Product Suite
• Apache Lucene/Solr
- Description: Massively adopted open source search technology
- Version: 4.3, released May 2013
- LucidWorks offering: annual support subscriptions, professional services, training; inside sales model
• LucidWorks Search
- Description: Enterprise search platform built on Lucene/Solr
- Version: 2.5, shipped December 2012
- LucidWorks offering: free trial; on-prem or cloud; inside sales model
• LucidWorks Big Data
- Description: Unified development platform for Big Data applications
- Version: GA version 1.1, released February 2013
- LucidWorks offering: free trial; on-prem or cloud; enterprise sales model
5 Minutes to Search
1. Install LWS
   1. Unpack, double-click to launch the installer
   2. Launch, wait for startup
2. Open http://localhost:8989/
3. Choose “Quick Start”
4. Choose a Data Source
   1. For me: /Users/grantingersoll/Desktop/reading
5. Quick Search
6. Search with Flare
   1. http://localhost:8989/flare/catalog/quickstart
7. Quick Changes:
   1. Add a Facet
   2. Change Display Results
Prepare Deep Dive Demo
1. https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md
2. cd src/main/python
3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data
4. python python.py
Apache Lucene
• Java APIs for building search applications
• Fast, efficient, flexible
• Modules to add functionality:
- Language analysis
- Faceting
- Highlighting, spell checking
- Much more
Apache Solr
• Lucene best practices
• HTTP-based service
- Many client bindings
• Faceting
• Distributed, fault-tolerant
• Many NoSQL features
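Because Solr is an HTTP-based service, any client that can build a URL can query it. A minimal sketch of a `/select` request using standard Solr parameters (`q`, `rows`, `fl`, `wt`, `facet`, `facet.field`); the base URL, port and collection name are hypothetical placeholders and will differ for a LucidWorks Search install.

```python
from typing import Sequence
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and collection name to your install.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_select_url(query: str,
                     rows: int = 10,
                     fields: Sequence[str] = ("id", "score"),
                     facet_fields: Sequence[str] = ()) -> str:
    """Build a Solr /select URL for a query, with optional term facets."""
    params = [
        ("q", query),             # the user's query
        ("rows", str(rows)),      # how many results to return (top N)
        ("fl", ",".join(fields)), # which stored fields to return
        ("wt", "json"),           # ask for a JSON response
    ]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field is repeated once per field to facet on.
        params.extend(("facet.field", field) for field in facet_fields)
    return SOLR_SELECT + "?" + urlencode(params)
```

The resulting URL could then be fetched with `urllib.request.urlopen` and parsed with `json.load`; client bindings such as SolrJ wrap the same HTTP interface.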
LucidWorks Search Goals
• IT Ready Open Source
- Installation, provisioning, monitoring, administration, integration
• Enterprise Grade
- A robust connector framework
» Including a wide assortment of prebuilt connectors to popular data sources
- Enterprise security framework
» Leverages SSL, LDAP, Active Directory
» Document-level access control
• Business Friendly
- Rich graphical administration console
» Speeds up search application development, deployment and management
- Expressive Business Logic
» Processes information through filters for better, more accurate results
- Relevancy Work Bench
• Full power of Apache Lucene and Solr
Reference Architecture
(Diagram components: Shards 1 to N; Search View (documents, users, logs); Document Store; Analytic Services (view into numeric/historic data); Classification, Recommendation, Personalization & Machine Learning Services; Classification Models (in memory, replicated, multi-tenant); Discovery & Enrichment (clustering, classification, NLP, topic identification, search log analysis, user behavior); Content Acquisition (ETL, batch or near real-time); Access APIs; Data (LucidWorks Search connectors, push))
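An index split into shards 1 to N is typically queried scatter-gather style: send the query to every shard, let each return its local top N, then merge. A minimal sketch, assuming each shard is a hypothetical in-memory dict of document ids to term sets and a toy score (the number of matching query terms); real shards live on separate machines and use real relevance scores.

```python
import heapq
from typing import Dict, List, Set, Tuple

Shard = Dict[str, Set[str]]  # hypothetical: doc_id -> set of indexed terms
Hit = Tuple[int, str]        # (score, doc_id)


def search_shard(shard: Shard, query_terms: Set[str], n: int) -> List[Hit]:
    """Score one shard locally (toy score: matching term count); keep its top n."""
    hits = ((len(query_terms & terms), doc_id) for doc_id, terms in shard.items())
    return heapq.nlargest(n, (hit for hit in hits if hit[0] > 0))


def distributed_search(shards: List[Shard], query_terms: Set[str], n: int) -> List[Hit]:
    """Scatter the query to every shard, then gather and merge the global top n."""
    # Each shard returns at most n hits, so the global top n must be among them.
    partial_results = [search_shard(shard, query_terms, n) for shard in shards]
    return heapq.nlargest(n, (hit for hits in partial_results for hit in hits))
```

Asking each shard for only n hits keeps network traffic bounded regardless of how many documents match. One caveat: when scores depend on per-shard term statistics, shards with different data can score the same document differently.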
Basic Vocab
• Documents
- Fields
» Tokens
▪ Payloads
• Query
- Many different kinds: term, phrase, regex, spatial, function
• Facets & Filters
• Collection
- Index
» Shard
▪ Segment
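The Documents → Fields → Tokens → Payloads hierarchy is easiest to see in what an analyzer produces. A minimal sketch, assuming a toy analyzer (lowercasing, word splitting, a small hypothetical stop-word list); real Lucene analyzers are configurable chains of tokenizers and filters.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Illustrative stop-word list (an assumption, not Lucene's default set).
STOP_WORDS = {"a", "an", "and", "is", "of", "the"}


@dataclass
class Token:
    term: str                        # normalized text that gets indexed
    position: int                    # token position, used by phrase queries
    start: int                       # character offsets into the field value,
    end: int                         # used for highlighting
    payload: Optional[bytes] = None  # optional per-token data (unused here)


def analyze(text: str) -> List[Token]:
    """Lowercase, split into word-character runs and drop stop words.

    Stop words still consume a position, leaving a gap, so positions
    keep reflecting the original word order.
    """
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        term = match.group().lower()
        if term not in STOP_WORDS:
            tokens.append(Token(term, position, match.start(), match.end()))
    return tokens


def analyze_document(doc: Dict[str, str]) -> Dict[str, List[Token]]:
    """A document is a set of named fields; each field is analyzed separately."""
    return {field: analyze(value) for field, value in doc.items()}
```

For example, `analyze("The Search is Everywhere")` keeps only `search` (position 1) and `everywhere` (position 3).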
Search Concepts: Indexing
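Indexing turns analyzed documents into an inverted index: for every term, the documents (and positions) where it occurs. A minimal in-memory sketch, assuming a toy lowercase-and-split analyzer and integer document ids; Lucene's actual segment files add term dictionaries, compression and much more.

```python
import re
from collections import defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Toy analyzer: lowercase, then split into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: Dict[int, str]) -> Dict[str, Dict[int, List[int]]]:
    """Build an inverted index mapping term -> {doc_id: [positions]}."""
    index: Dict[str, Dict[int, List[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            # Record every position so phrase queries can check adjacency.
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)
```

Looking up a term is now a single dictionary access that answers "which documents contain this term, and where?" without scanning any document text.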
Search Concepts: Ranking
• Search is optimized for solving top N problems
• Hand Waving Algo:
- Parse query
- For Each Term
» Look up documents containing term
- Rank documents according to similarity
- Return top X
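The hand-waving algorithm above maps almost line for line onto code. A minimal sketch, assuming a toy analyzer and a simplified TF-IDF score (square-root term frequency times 1 + log(N / document frequency)); it is in the spirit of Lucene's classic TF-IDF similarity but omits its normalizations, so its scores will not match Lucene's.

```python
import heapq
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Toy analyzer: lowercase, then split into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def search(docs: Dict[str, str], query: str, top_x: int = 10) -> List[Tuple[float, str]]:
    """Rank docs for query with a simplified TF-IDF score; return the top X."""
    # Index: term -> {doc_id: term frequency in that doc}
    postings: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            postings[term][doc_id] = tf

    n_docs = len(docs)
    scores: Dict[str, float] = defaultdict(float)
    for term in tokenize(query):                   # parse query, for each term
        matches = postings.get(term, {})           # look up docs containing term
        if not matches:
            continue
        idf = 1.0 + math.log(n_docs / len(matches))  # rarer terms weigh more
        for doc_id, tf in matches.items():         # accumulate similarity
            scores[doc_id] += math.sqrt(tf) * idf
    # Return top X without fully sorting every matching document.
    return heapq.nlargest(top_x, ((score, doc_id) for doc_id, score in scores.items()))
```

Using a heap for the final step is what makes this a top N computation: only the X best candidates are kept in order, which matters when millions of documents match.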
Search Concepts: Faceting
• Dynamically slice and dice query results in a variety of ways:
- Term
- Range (date and numeric)
- Pivot
- Function
- Multi-select
• Gather Stats
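Term facets, range facets and stats are all aggregations over the documents that matched a query. A minimal sketch over a hypothetical list of result dicts (field names borrowed from the demo's company data); `range_facet` loosely mirrors the start/end/gap idea of Solr's range faceting, with the last bucket clipped at `end`.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Doc = Dict[str, object]  # hypothetical result document: field name -> value


def term_facet(results: List[Doc], field: str, limit: int = 10) -> List[Tuple[object, int]]:
    """Count each value of a field across matching docs, most common first."""
    counts = Counter(doc[field] for doc in results if field in doc)
    return counts.most_common(limit)


def range_facet(results: List[Doc], field: str,
                start: float, end: float, gap: float) -> Dict[Tuple[float, float], int]:
    """Count docs whose numeric field falls in each [lower, upper) bucket."""
    buckets = {}
    lower = start
    while lower < end:
        upper = min(lower + gap, end)  # clip the final bucket at end
        buckets[(lower, upper)] = sum(
            1 for doc in results if field in doc and lower <= doc[field] < upper)
        lower = upper
    return buckets


def field_stats(results: List[Doc], field: str) -> Optional[Dict[str, float]]:
    """Min, max, count and mean of a numeric field, or None if it never occurs."""
    values = [doc[field] for doc in results if field in doc]
    if not values:
        return None
    return {"min": min(values), "max": max(values),
            "count": len(values), "mean": sum(values) / len(values)}
```

Because facets are computed only over the current result set, each new query or filter click recomputes the counts, which is what makes slicing and dicing feel dynamic.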
Demo Deep Dive
• Application:
- Stock Insights
- Twitter Bootstrap + Python Flask + LWS
- http://localhost:5000
• Goals:
- Explore data sources, scheduling, other features
- Automate setup via script and LWS APIs
• Data:
- Company Info (Symbol, Company, Industry, City, State)
- Twitter, websites
- Historical Stock Prices from Yahoo! Finance
• http://github.com/lucidworks/lws-financial-demo
- README covers setup
Level Up
• Explore our APIs:
- http://bit.ly/lws-apis
• Build your own UI or extend ours
• Write a custom connector
• Customize Solr!
• Scale with SolrCloud
• Explore Solr Marketplace:
- http://bit.ly/solr-market
Where to Next?
• http://www.lucidworks.com
• http://lucene.apache.org/solr
• Training: http://bit.ly/lws-training
• LWS more info: http://bit.ly/lws-more-info
• LWS Documentation: http://bit.ly/lws-docs
• Twitter: @gsingers, @LucidWorks
• Taming Text: http://www.manning.com/ingersoll

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 

Intro to Search

  • 1. Intro to Search
    Grant Ingersoll, CTO, LucidWorks (@gsingers)
    © Copyright 2013
  • 2. Search is dead, long live search
    • Search is Everywhere!
    • The Bar is Raised
      - Keyword search is a commodity
    • Holistic view of the data AND the users is critical
    • Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data
    [Diagram: Documents, User Interaction, Access, Content, Relationships]
    © 2013 LucidWorks
  • 3. Search is good for…
    • Traditional: fast, fuzzy text matching across a large document collection
    • De-normalized data
      - “light” relational
    • Top N problems
      - Key-value (top 1)
      - Recommendations
      - “Good enough” classification, clustering
    • Faceting, slicing and dicing of enumerated data
    • Spatial, spell checking, record linkage, highlighting
    • NoSQL
  • 4. Common Use Cases
    • eCommerce
      - Search + Recs + Analysis of users
    • Knowledge Management
      - Financial, transportation, pharma
    • Fraud detection
    • Social media
      - Trend monitoring
    • Information technology
      - Log monitoring, analysis
    • Healthcare
      - DNA Analysis
  • 6. Topics
    • Intros
    • First 5 Minutes with LucidWorks Search (Solr++)
    • Search Concepts
    • Demo Deep Dive
    • Level Up
    • Resources
  • 7. LucidWorks Overview
    › Founded in 2007 to be the go-to company for Lucene/Solr expertise
    › 250+ customers (many Fortune 500)
    › 100% year-over-year growth
    › Over 40% of the active Apache Lucene/Solr committers
    › Host of the fast-growing Lucene/Solr Revolution user conference (400+ attendees)
  • 8. LucidWorks Product Suite
    • Apache Lucene/Solr: massively adopted open source search technology
      - Version 4.3 released May 2013
      - LucidWorks offering: annual support subscriptions, professional services, training; inside sales model
    • LucidWorks Search: enterprise search platform built on Lucene/Solr
      - Version 2.5 shipped December 2012
      - LucidWorks offering: free trial; on-prem or cloud; inside sales model
    • LucidWorks Big Data: unified development platform for Big Data applications
      - GA version 1.1 released Feb. 2013
      - LucidWorks offering: free trial; on-prem or cloud; enterprise sales model
  • 9. 5 Minutes to Search
    1. Install LWS
       1. Unpack, double-click to launch the installer
       2. Launch, wait for startup
    2. http://localhost:8989/
    3. Choose “Quick Start”
    4. Choose a data source
       1. For me: /Users/grantingersoll/Desktop/reading
    5. Quick Search
    6. Search with Flare
       1. http://localhost:8989/flare/catalog/quickstart
    7. Quick changes:
       1. Add a facet
       2. Change display results
  • 10. Prepare Deep Dive Demo
    1. https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md
    2. cd src/main/python
    3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data
    4. python python.py
  • 11. Lucene / Solr
    Lucene:
    • Java APIs for building search applications
    • Fast, efficient, flexible
    • Modules to add functionality:
      - Language analysis
      - Faceting
      - Highlighting, spell checking
      - Much more
    Solr:
    • Lucene best practices
    • HTTP-based service
      - Many client bindings
    • Faceting
    • Distributed, fault-tolerant
    • Many NoSQL features
  • 12. LucidWorks Search Goals
    • IT Ready Open Source
      - Installation, provisioning, monitoring, administration, integration
    • Enterprise Grade
      - A robust connector framework
        » Including a wide assortment of prebuilt connectors to popular data sources
      - Enterprise security framework
        » Leverages SSL, LDAP, Active Directory
        » Document-level access control
    • Business Friendly
      - Rich graphical administration console
        » Speeds up search application development, deployment and management
      - Expressive business logic
        » Processing information through filters for better, more accurate results
      - Relevancy Work Bench
    • Full power of Apache Lucene and Solr
  • 13. Reference Architecture (diagram)
    • Data: LucidWorks Search connectors; push
    • Content Acquisition: ETL, batch or near real-time
    • Search: shards 1, 2, 3, …, N; views of documents, users, logs
    • Document Store
    • Analytic Services: view into numeric/historic data
    • Classification, Recommendation, Personalization & Machine Learning Services: classification models (in memory, replicated, multi-tenant)
    • Discovery & Enrichment: clustering, classification, NLP, topic identification, search log analysis, user behavior
    • Access: APIs
  • 14. Basic Vocab
    • Documents
      - Fields
        » Tokens
          ▪ Payloads
    • Query
      - Many different kinds: term, phrase, regex, spatial, function
    • Facets & Filters
    • Collection
      - Index
        » Shard
          ▪ Segment
  • 15. Search Concepts: Indexing
  • 16. Search Concepts: Ranking
    • Search is optimized for solving top N problems
    • Hand-waving algorithm:
      - Parse query
      - For each term
        » Look up documents containing term
      - Rank documents according to similarity
      - Return top X
  • 17. Search Concepts: Faceting
    • Dynamically slice and dice query results in a variety of ways:
      - Term
      - Range (date and numeric)
      - Pivot
      - Function
      - Multi-select
    • Gather stats
  • 18. Demo Deep Dive
    • Application:
      - Stock Insights
      - Twitter Bootstrap + Python Flask + LWS
      - http://localhost:5000
    • Goals:
      - Explore data sources, scheduling, other features
      - Automate setup via script and LWS APIs
    • Data:
      - Company info (symbol, company, industry, city, state)
      - Twitter, websites
      - Historical stock prices from Y! Finance
    • http://github.com/lucidworks/lws-financial-demo
      - README covers setup
  • 19. Level Up
    • Explore our APIs:
      - http://bit.ly/lws-apis
    • Build your own UI or extend ours
    • Write a custom connector
    • Customize Solr!
    • Scale with SolrCloud
    • Explore Solr Marketplace:
      - http://bit.ly/solr-market
  • 20. Where to Next?
    • http://www.lucidworks.com
    • http://lucene.apache.org/solr
    • Training: http://bit.ly/lws-training
    • LWS more info: http://bit.ly/lws-more-info
    • LWS Documentation: http://bit.ly/lws-docs
    • Twitter: @gsingers, @LucidWorks
    • Taming Text: http://www.manning.com/ingersoll
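
Slide 3 lists fast, fuzzy text matching and spell checking among the strengths of search. Below is a minimal sketch of the underlying idea, assuming Levenshtein edit distance as the similarity measure. Lucene's fuzzy queries are also edit-distance based, but this is not Lucene's automaton-based implementation. The `suggest` helper and the sample vocabulary are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(term: str, vocabulary: list[str], max_edits: int = 2) -> list[str]:
    """Hypothetical helper: vocabulary terms within max_edits of term, closest first."""
    scored = sorted((levenshtein(term, word), word) for word in vocabulary)
    return [word for dist, word in scored if dist <= max_edits]


if __name__ == "__main__":
    # hypothetical vocabulary, standing in for the terms of an index
    vocab = ["lucene", "solr", "search", "facet", "shard"]
    print(suggest("serach", vocab))
```

Scanning every term is fine for a demo. A real engine avoids comparing the query against the whole term dictionary.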
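
Slide 9's "Add a facet" step happens in the LWS UI. Solr itself, which slide 11 describes as an HTTP-based service, accepts facet requests as plain URL parameters. The sketch below only builds such a request URL. The base URL and collection name are hypothetical, so check your installation's documentation for the real host, port and path. `q`, `rows`, `wt`, `facet`, `facet.field` and `fq` are standard Solr request parameters, while the field names are made up.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and collection to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_facet_query(query, facet_fields, filters=(), rows=10):
    """Build a Solr select URL that runs `query` and requests term facets."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field is repeated once per field to facet on
        params.extend(("facet.field", field) for field in facet_fields)
    # each fq (filter query) narrows the results without changing relevance scores
    params.extend(("fq", fq) for fq in filters)
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # "author", "category" and "year" are hypothetical field names
    print(build_facet_query("solr", ["author", "category"], filters=["year:2013"]))
```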
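
The reference architecture on slide 13 splits search across shards 1 to N. This toy sketch shows the two halves of that idea:

- **Routing:** each document goes to a shard chosen by hashing its id.
- **Scatter-gather:** a query asks every shard for its local top N, then the results are merged.

SolrCloud's default router hashes ids with MurmurHash3 into hash ranges. This sketch swaps in `zlib.crc32` modulo N for simplicity, and scores by raw term frequency. The `Shard` class and helper names are hypothetical.

```python
import heapq
import zlib
from collections import Counter


class Shard:
    """Hypothetical in-memory shard: doc_id -> token counts."""

    def __init__(self):
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = Counter(text.lower().split())

    def search(self, term, n):
        # score = raw term frequency, a stand-in for a real similarity function
        hits = [(counts[term], doc_id)
                for doc_id, counts in self.docs.items() if counts[term] > 0]
        return heapq.nlargest(n, hits)


def route(doc_id, num_shards):
    """Deterministically assign a document to a shard by hashing its id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def add_document(shards, doc_id, text):
    shards[route(doc_id, len(shards))].add(doc_id, text)


def distributed_search(shards, term, n):
    """Scatter: each shard returns its local top n. Gather: merge into a global top n."""
    candidates = []
    for shard in shards:
        candidates.extend(shard.search(term.lower(), n))
    return heapq.nlargest(n, candidates)


if __name__ == "__main__":
    shards = [Shard() for _ in range(3)]
    for doc_id, text in [("d1", "solr solr solr"), ("d2", "solr lucene"),
                         ("d3", "lucene"), ("d4", "Solr solr")]:
        add_document(shards, doc_id, text)
    print(distributed_search(shards, "solr", 2))
```

Asking each shard for only n hits is enough. Any document in the global top n must also be in its own shard's top n.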
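
Slide 15 ("Search Concepts: Indexing") survives only as a title. The structure at the heart of indexing is the inverted index: a map from each term to the documents (postings) that contain it. This minimal sketch assumes a naive lowercase-and-split tokenizer in place of Lucene's configurable analysis chain. It keeps term positions so that phrase matching would be possible. The function names are hypothetical.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\w+")


def analyze(text):
    """Naive analysis: lowercase, then split on non-word characters."""
    return TOKEN_RE.findall(text.lower())


def build_index(docs):
    """Map each term to {doc_id: [positions]}, i.e. a positional inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(position)
    # return a plain dict so lookups of unknown terms don't create empty entries
    return dict(index)


if __name__ == "__main__":
    idx = build_index({"1": "Solr is built on Lucene", "2": "Lucene is a Java library"})
    print(idx["lucene"])
```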
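
Slide 16's hand-waving ranking algorithm maps almost step for step onto code. This sketch assumes a simple TF-IDF similarity. It is loosely modeled on Lucene 4.x's default TF-IDF scoring but is not Lucene's exact formula, and the function names are hypothetical.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_postings(docs):
    """term -> {doc_id: term frequency}"""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            postings[term][doc_id] = tf
    return postings


def search(query, postings, num_docs, top_x=10):
    scores = defaultdict(float)
    # 1. Parse the query into terms
    for term in tokenize(query):
        # 2. For each term, look up the documents containing it
        docs_with_term = postings.get(term, {})
        if not docs_with_term:
            continue
        # rarer terms get a higher inverse document frequency
        idf = 1.0 + math.log(num_docs / len(docs_with_term))
        # 3. Accumulate a similarity score per document
        for doc_id, tf in docs_with_term.items():
            scores[doc_id] += math.sqrt(tf) * idf
    # 4. Return the top X without fully sorting every match
    return heapq.nlargest(top_x, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    docs = {"a": "solr search solr", "b": "lucene search", "c": "cooking recipes"}
    print(search("solr search", build_postings(docs), len(docs)))
```

Using `heapq.nlargest` reflects the slide's point that search is a top N problem. Only the best X scores are kept rather than a full ranking.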
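
Slide 17's term and range facets boil down to counting over the documents that matched a query. This minimal in-memory sketch assumes each result is a dict of field values. The `industry` field echoes the demo's company data on slide 18. The `price` field, the records and the bucket edges are hypothetical.

```python
from collections import Counter


def term_facet(results, field):
    """Count matching documents per distinct value of `field`, most common first."""
    return Counter(doc[field] for doc in results if field in doc).most_common()


def range_facet(results, field, edges):
    """Bucket numeric `field` values into half-open ranges [edges[i], edges[i+1])."""
    counts = {(lo, hi): 0 for lo, hi in zip(edges, edges[1:])}
    for doc in results:
        value = doc.get(field)
        if value is None:
            continue  # documents without the field fall into no bucket
        for lo, hi in counts:
            if lo <= value < hi:
                counts[(lo, hi)] += 1
                break
    return counts


if __name__ == "__main__":
    # hypothetical query results
    results = [
        {"industry": "Finance", "price": 10},
        {"industry": "Tech", "price": 55},
        {"industry": "Finance", "price": 120},
        {"industry": "Energy"},
    ]
    print(term_facet(results, "industry"))
    print(range_facet(results, "price", [0, 50, 100, 200]))
```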

Speaker Notes

  1. The bar is raised: when we first started Lucid, the problems were all around standing up Lucene or Solr or dealing with performance issues. Now the large majority of them are around taking search to the next level: better relevance, personalization, recommendations, etc.
  2. What is Lucene? What is Solr?
  3. Service-Oriented Architecture; Stateless; Failover/Fault Tolerant; Lightweight Coordination and Messaging; Smart about Updates; Document store is Distributed, Scalable; Analysis: Batch, Near Real-Time