Rapid development of 
website search in Python 
PyCon India, 
Bangalore, Sept’ 12 
Chetan Giridhar
For whom?
• If you are:
  – an experienced developer who has implemented search solutions
  – currently getting your hands dirty prototyping website search for your startup
  – dreading having to learn Java
  – just curious
Think web development
• Core functionality
• Design patterns
• Web interface
• Usability
• Scalability
• Performance
• …?
Search
• Often considered "good to have"
• Enhances user experience
• Focused information
• Relevance
• Interaction
• Ranked searching
Typical Search Engine
• Designing a schema
  – Convert your data into Documents and store them in the index
  – A Document is a set of fields
  – A Field is a name=value pair
  – {title = "python", content = "computer", tag = "language"}
• Analyzers
  – "Parse" each field of your data into indexable "tokens" or keywords
  – "Welcome to Pycon" produces the token list ["welcome", "to", "pycon"]
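A minimal sketch of the document and analyzer ideas above, in plain Python rather than any particular engine. The `analyze` function and the `tokens_by_field` name are illustrative assumptions; real analyzers in Whoosh or Lucene typically also handle stop words, stemming and more.

```python
import re

def analyze(text):
    """Split text into lowercase word tokens (a deliberately simple analyzer)."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# A document is a set of fields; each field is a name=value pair.
document = {"title": "python", "content": "computer", "tag": "language"}

# Analyze every field value into index-able tokens.
tokens_by_field = {name: analyze(value) for name, value in document.items()}

print(analyze("Welcome to Pycon"))  # ['welcome', 'to', 'pycon']
print(tokens_by_field["title"])     # ['python']
```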
Typical Search Engine
• Indexing
  – Adding documents to the index
• Query and query parsers
  – Prepare query
  – Parse
  – Analyze
• Searching
  – Look up the index
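The index/query/search cycle can be sketched with a toy inverted index. `TinyIndex` and its methods are hypothetical names for illustration only; this is not how Whoosh or PyLucene store data, just the smallest structure that shows "add documents, then look up the index".

```python
from collections import defaultdict

def analyze(text):
    # Simplest possible analysis: lowercase and split on whitespace.
    return text.lower().split()

class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # token -> ids of documents containing it
        self.documents = {}

    def add_document(self, doc_id, fields):
        self.documents[doc_id] = fields
        for value in fields.values():
            for token in analyze(value):
                self.postings[token].add(doc_id)

    def search(self, query):
        """Return ids of documents that contain every query token."""
        tokens = analyze(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result

index = TinyIndex()
index.add_document(1, {"title": "Python search", "content": "Whoosh is pure Python"})
index.add_document(2, {"title": "Java search", "content": "Lucene is pure Java"})

print(sorted(index.search("pure python")))  # [1]
print(sorted(index.search("search")))       # [1, 2]
```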
Indexing & Committing
[Diagram: input files → schema-based document (Field1, Field2, Field3) → Analyzer → Index Writer → in-memory index → committed index]
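The split between an in-memory index and a committed one can be illustrated with a small buffering writer. `BufferedWriter` is a hypothetical class, and the list standing in for the committed index is an assumption made to keep the sketch self-contained.

```python
class BufferedWriter:
    """Hypothetical writer: documents are buffered in memory until commit()."""

    def __init__(self, committed):
        self.committed = committed  # list standing in for the committed index
        self.pending = []           # in-memory buffer of analyzed documents

    def add_document(self, **fields):
        # "Analysis" here is just lowercasing and splitting each field value.
        analyzed = {name: value.lower().split() for name, value in fields.items()}
        self.pending.append(analyzed)

    def commit(self):
        # Make buffered documents visible, then clear the buffer.
        self.committed.extend(self.pending)
        self.pending = []

committed_index = []
writer = BufferedWriter(committed_index)
writer.add_document(title="Welcome to Pycon", tag="conference")
print(len(committed_index))  # 0: nothing is visible before commit
writer.commit()
print(len(committed_index))  # 1
```

Keeping commit as a separate step is what makes it meaningful to time indexing and committing separately.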
Searching
[Diagram: input query → Query Parser → Analyzer → Index Searcher (reads the index) → results]
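A rough sketch of the search path: the query string is parsed into field/term clauses, each term is analyzed, and the searcher looks the clauses up in the index. The `field:term` syntax, the `parse_query` and `search` helpers, and the dictionary-based index are all assumptions for illustration.

```python
def parse_query(query, default_field="content"):
    """Turn 'title:Python search' into a list of (field, term) clauses."""
    clauses = []
    for part in query.split():
        field, sep, term = part.partition(":")
        if not sep:  # no explicit field, so fall back to the default field
            field, term = default_field, part
        clauses.append((field, term.lower()))  # analysis step: lowercase the term
    return clauses

def search(index, clauses):
    """index maps (field, term) -> set of doc ids; every clause must match."""
    sets = [index.get(clause, set()) for clause in clauses]
    return set.intersection(*sets) if sets else set()

index = {
    ("title", "python"): {1},
    ("content", "search"): {1, 2},
    ("title", "java"): {2},
}

clauses = parse_query("title:Python search")
print(clauses)                         # [('title', 'python'), ('content', 'search')]
print(sorted(search(index, clauses)))  # [1]
```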
Development: Considerations
• Sourcing the input data set
• Handling input queries
• How to search
  – Search engines
• How to display results
• Customization
Development: Options
• Apache Solr: Sunburnt
• Haystack
• Xapian: Xappy
• ElasticSearch
• Whoosh
• Lucene: PyLucene
Talking PyLucene & Whoosh
• Pythonic APIs
• Deployment
  – Large-scale and medium-sized websites
• Rapid
  – Minimal installation
  – Clear documentation
  – Quick setup
  – Ease of integration
PyLucene
• PyLucene: Python wrappers for Lucene
• The de facto standard search engine library
• Lucene: an open-source, pure-Java search engine library
• Embeds a Java VM with Lucene into a Python process
PyLucene
• Simple API
• High-performance indexing
• Scalable to millions of documents
• Efficient and feature-rich search algorithms
• Cross-platform
Whoosh
• Whoosh is a search engine library
• Fast indexing and search
• One of the fastest Python search engines
• 100% Python code
• Extensible code
• No external dependencies
• Active development and support
Whoosh
• Easy to set up
• Neutral to web frameworks
• Powerful query language
• Feature-rich
• Intuitive APIs
PyLucene
• Document
• Field
• IndexWriter
• QueryParser
• Analyzer
• IndexSearcher
Whoosh
• fields.Schema
• index.Index
• qparser.QueryParser
• analysis.Analyzer
• searching.Searcher
Designing search in websites
• Search design should be an independent component:
  – Pluggable
  – Platform independent
  – Assuming minimal external dependencies
  – Easily extensible
  – Seamless to integrate
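One way to keep search pluggable is to have the website code against a small interface and put each engine behind an adapter. The sketch below is an assumption about how that could look: `SearchBackend`, `InMemoryBackend` and `handle_search_request` are hypothetical names, and the in-memory engine is only a stand-in for a Whoosh or PyLucene adapter.

```python
from abc import ABC, abstractmethod

class SearchBackend(ABC):
    """Interface the website codes against; engines plug in behind it."""

    @abstractmethod
    def index(self, doc_id, text): ...

    @abstractmethod
    def search(self, query): ...

class InMemoryBackend(SearchBackend):
    """Stand-in engine; a real adapter would implement the same two methods."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, text):
        self.docs[doc_id] = text.lower()

    def search(self, query):
        # Naive substring match, enough to exercise the interface.
        return sorted(d for d, text in self.docs.items() if query.lower() in text)

def handle_search_request(backend, query):
    # The web layer only sees the SearchBackend interface.
    return backend.search(query)

backend = InMemoryBackend()
backend.index("about.html", "About our startup")
backend.index("blog.html", "Startup search in Python")
print(handle_search_request(backend, "python"))   # ['blog.html']
print(handle_search_request(backend, "startup"))  # ['about.html', 'blog.html']
```

Swapping engines then means writing another `SearchBackend` subclass, without touching the request handler.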
[Diagram: Search.py, fsMgr]
Demo
Comparing Engines
• Basis of comparison
  – Indexing, committing and searching
• Dataset
  – 1 GB of data
  – ~5000 files
  – File sizes ranging from 1 KB to 50 MB
• Setup
  – Intel® Core™2 Duo CPU P8600 @ 2.40GHz × 2
  – 3 GB RAM
  – Ubuntu 12.04 (precise) 32-bit
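A comparison like this needs each phase timed separately. The harness below is a sketch of that idea, not the script used for the talk: `timed` and `build_index` are hypothetical helpers, and the tiny synthetic dataset stands in for the 1 GB corpus.

```python
import time

def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

def build_index(texts):
    # Toy inverted index: token -> set of document ids.
    index = {}
    for doc_id, text in enumerate(texts):
        for token in text.lower().split():
            index.setdefault(token, set()).add(doc_id)
    return index

texts = ["rapid website search in python"] * 1000
index, index_seconds = timed(build_index, texts)
hits, search_seconds = timed(index.get, "python", set())
print(f"index: {index_seconds:.4f}s, search: {search_seconds:.6f}s, hits: {len(hits)}")
```

With a real engine, the same wrapper could be applied to the add-documents, commit and search calls in turn.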
Indexing
[Chart: Time to Index, pylucene vs. whoosh; y-axis: time (s), 0 to 500]
Committing
[Chart: Time to Commit, pylucene vs. whoosh; y-axis: time (s), 0 to 300]
Searching
[Chart: Time to Search, pylucene vs. whoosh; y-axis: time (s), 0 to 0.01]
Recommendations
• Search engine library
  – No one solution fits all problems
  – Search engine abstraction is the key
  – Scalability is critical
  – Rapid to set up, develop and tweak
  – Understand and use
Web development in Python
• Getting more rapid and easier by the day
• Web frameworks
  – Django, Pylons
• HTTP servers
  – Tornado, Gunicorn
• Support for SQL/NoSQL databases
  – MySQL-python, pymongo
• Template engines
  – Cheetah, jinja2
• Search
  – PyLucene, Whoosh
References
• Whoosh
  – https://bitbucket.org/mchaput/whoosh/wiki/Home
• PyLucene
  – http://lucene.apache.org/pylucene/
  – http://lucene.apache.org/core/3_6_1/api/all/index.html
• Xappy
  – http://code.google.com/p/xappy/
• ElasticSearch
  – http://www.elasticsearch.org/guide/reference/api/
References
• Chetan's tech space
  – http://technobeans.com
• Vishal's technical blog
  – http://freethreads.net
Q and A
Backup
Whoosh vs. Haystack vs. Xapian
• Whoosh is suitable for a small project; its scalability for search and indexing is limited
  – A good beginning
• Haystack is appropriate with Django
• Xapian is ultra-fast, but not as feature-rich as Solr
• Lucene is not distributed and has an external dependency
Lucene vs. database search
• There are a number of query types that RDBMSs in general do not support without vendor extensions:
  – Fuzzy queries, in which "fuzzy" and "wuzzy" are considered matches
  – Word-stemming queries, which consider "take," "took," and "taken" to be identical
  – Sound-like queries, which consider "cat" and "kat" to be identical
  – Synonym queries, which consider "jump," "hop," and "leap" to be identical
  – Queries on binary BLOB data types, such as PDF documents, Microsoft Word or Excel documents, or HTML and XML documents
• More disappointingly, SQL search results are not ranked by match-relevance scores. The SQL standard is simply not intended for full-text querying.
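To make the fuzzy-query idea concrete, here is a small sketch using the standard library's `difflib` similarity ratio. This is a swapped-in technique for illustration, not Lucene's own fuzzy-matching implementation; the `fuzzy_match` helper and the 0.75 threshold are assumptions.

```python
from difflib import SequenceMatcher

def fuzzy_match(term, candidate, threshold=0.75):
    """Treat two words as a match when their similarity ratio reaches the threshold."""
    return SequenceMatcher(None, term, candidate).ratio() >= threshold

vocabulary = ["wuzzy", "python", "fuzz", "lucene"]
matches = [word for word in vocabulary if fuzzy_match("fuzzy", word)]
print(matches)  # ['wuzzy', 'fuzz']
```

A plain SQL equality or `LIKE` comparison would not treat "fuzzy" and "wuzzy" as related without extra machinery.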
Typical search engine
• Indexing
  – Convert files to a format for quick lookup
  – Fast random access to stored words
• Searching
  – Specify keywords
• Displaying
  – Look up documents that are relevant
  – Ranking
  – Different types of queries
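Ranking can be illustrated with the crudest possible relevance score: counting how often the query terms occur in each document. This is only a sketch under that assumption; real engines use more refined scoring, and the `rank` function and sample documents are made up for the example.

```python
from collections import Counter

documents = {
    "whoosh.txt": "whoosh is a pure python search library for python projects",
    "lucene.txt": "lucene is a java search library",
    "django.txt": "django is a python web framework",
}

def rank(query, docs):
    """Score each document by how often the query terms occur in it."""
    terms = query.lower().split()
    scores = {}
    for name, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in terms)
        if score:
            scores[name] = score
    # Highest score first; ties are broken by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

print(rank("python search", documents))
# [('whoosh.txt', 3), ('django.txt', 1), ('lucene.txt', 1)]
```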
Advanced Searching
• More like this
• Did you mean
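A "did you mean" feature can be approximated by suggesting the indexed term closest to what the user typed. The sketch below uses `difflib.get_close_matches` from the standard library; the `did_you_mean` helper, the term list and the 0.6 cutoff are assumptions, and the engines discussed here ship their own spelling-suggestion features.

```python
from difflib import get_close_matches

indexed_terms = ["python", "pylucene", "whoosh", "lucene", "search"]

def did_you_mean(word, vocabulary, limit=1):
    """Suggest the closest indexed terms for a possibly misspelled word."""
    return get_close_matches(word.lower(), vocabulary, n=limit, cutoff=0.6)

print(did_you_mean("pyhton", indexed_terms))  # ['python']
print(did_you_mean("whosh", indexed_terms))   # ['whoosh']
```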

PyCon India 2012: Rapid development of website search in python


Editor's notes

  1. Whoosh? If you love Python more than learning Java.