Improving MySQL-
based applications
performance with
Sphinx

Maciej Dobrzański
(Мачей Добжаньски)
Percona, Inc.
INTRODUCTION
Who am I?
  – Consultant at Percona, Inc.
  – What do I do?
     • Performance audits
     • Fix broken systems
     • Design architectures
  – Typically work from home
INTRODUCTION
What is Percona, Inc.?
   – Consulting company
   – Provides services for MySQL applications
   – Develops open-source software
      • Scalability patches for InnoDB
      • XtraDB storage engine for MySQL
      • Xtrabackup – free backup solution for InnoDB/XtraDB
WHAT IS MYSQL?
MySQL is...
   – Open-source relational database management system
   – Popular enough to assume everyone here knows it
WHAT IS SPHINX?
A standalone full-text search engine
   – Consists of two major applications
      • indexer
      • searchd
   – More efficient than MySQL FULLTEXT
      • On larger data sets
WHAT IS SPHINX?
A standalone full-text search engine
   – Can be easily scaled horizontally
      • Sphinx indexes can be distributed across many servers
      • Allows parallel searching
      • One instance becomes a dispatcher
          – Forwards queries to other instances
          – Combines results before sending them back to clients
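The dispatcher role described above can be pictured with a small sketch. It is an illustration only, not Sphinx's actual implementation: the shard contents, the search_shard and dispatch names, and the (weight, doc_id) result shape are all assumptions made for this example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical per-shard data: each shard holds (doc_id, text) pairs.
SHARDS = [
    [(1, "james bond returns"), (2, "a love story")],
    [(3, "bond of brothers"), (4, "the bond girl and james bond")],
]


def search_shard(shard, term):
    """Return (weight, doc_id) matches, best first; weight = term count."""
    hits = [(text.split().count(term), doc_id) for doc_id, text in shard]
    return sorted((hit for hit in hits if hit[0] > 0), reverse=True)


def dispatch(term, limit=3):
    """Query every shard in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), SHARDS))
    # Each partial list is already sorted best-first, so a k-way merge suffices.
    merged = heapq.merge(*partials, reverse=True)
    return [doc_id for _, doc_id in islice(merged, limit)]


print(dispatch("bond"))
```

The client only talks to dispatch(); it never needs to know how many shards exist, which mirrors the "combines results before sending them back" point.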
WHAT IS SPHINX?
Many additional features beyond just full-text search
   – Indexable attributes for non-FTS filtering
      • numerical, multi-value and now also text
      • Example: limit results to rows which have
        article_score>=2
   – Sorting results by an attribute or an expression
      • Example: @weight+(article_score)*0.1
WHAT IS SPHINX?
Many additional features beyond just full-text search
   – Grouping results by an attribute
      • Additional support for timestamp attributes
      • Returns also row count per group – may be approximate
   – Calculating expressions
      • Much faster than in MySQL as per recent benchmarks
WHAT IS SPHINX?
Anything else?
   – On-line re-indexing
   – Live index updates
   – Extensive API available for many programming languages
      •   PHP
      •   Python
      •   Java
      •   many more
WHAT IS SPHINX?
There’s even more!
   – SphinxQL – MySQL server protocol compatible
      • Connect with any MySQL client
         – command line
         – API call, e.g. mysql_connect()
      • Run SQL-like queries
WHAT IS SPHINX?
Example use of SphinxQL
HOW DOES SPHINX WORK WITH MYSQL?
Sphinx is an external application; not part of MySQL
   – Uses own data files
   – Needs memory
   – Has to be queried separately
      • Sphinx API
      • SphinxQL
      • Sphinx Storage Engine for MySQL
HOW DOES SPHINX WORK WITH MYSQL?
Sphinx is an external application; not part of MySQL
   – Updating Sphinx indexes has to be done separately too
      • Periodic data re-indexing with indexer
          – Some information may be outdated for a while
          – Can be optimized through re-indexing the latest changes only
      • Live index updates from applications
          – Applications need to write twice to both MySQL and Sphinx
          – Available only for attributes; full-text updates to come
HOW DOES MYSQL WORK WITH SPHINX?
Example data source for Sphinx index
sql_query = SELECT mi.id, mi.movie_id, t.production_year,
   t.title, mi.info FROM movie_info mi JOIN title t
   ON t.id = mi.movie_id
sql_attr_uint                   = movie_id
sql_attr_uint                   = production_year
• Notice the source can be any valid SQL query
   – Uses joins to denormalize data for Sphinx
• Two integer attributes – movie_id and production_year
HOW DOES SPHINX WORK WITH MYSQL?
Sphinx is not a full database (yet?)
   – It’s primarily a search engine
   – It can return values stored as attributes, e.g:
     movie_id, production_year
   – …but not any full-text searchable columns
   – Results from Sphinx can be used to fetch full details from
     database
IMPORTANT FACTS TO KNOW ABOUT MYSQL
Uses B-TREE indexes to improve search performance
   – Works great for equality operator (=)
   – …and small range lookups: >, >=, <, <=, IN (list), LIKE 'prefix%'
      • Range size relative to table size, not an absolute value
      • Large range often turns into plain scan
IMPORTANT FACTS TO KNOW ABOUT MYSQL
MySQL can use any left-most part of an index
   – INDEX (a, b, c) can fully optimize both:
      (1) SELECT * FROM T WHERE a=9
      (2) SELECT * FROM T WHERE a=9 AND b IN (1,2) AND c=4
     …but not any of:
      (3) SELECT * FROM T WHERE b=7 AND c=1
      (4) SELECT * FROM T WHERE a=9 AND c=2 (may still use index for a=9 only)
   – No good indexes means you may need a new one
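The left-most prefix rule can be sketched as a tiny model. This is a deliberate simplification and not how the MySQL optimizer is implemented: the usable_prefix name is hypothetical, and the model ignores cost-based decisions and the fact that a range condition on one column stops later columns from being used.

```python
def usable_prefix(index_columns, filtered_columns):
    """Return the leading index columns a query can use.

    Simplified model: walk the index left to right and stop at the
    first column that the WHERE clause does not constrain.
    """
    prefix = []
    for column in index_columns:
        if column not in filtered_columns:
            break
        prefix.append(column)
    return prefix


INDEX = ["a", "b", "c"]  # INDEX (a, b, c) from the slide

print(usable_prefix(INDEX, {"a"}))            # query (1)
print(usable_prefix(INDEX, {"a", "b", "c"}))  # query (2)
print(usable_prefix(INDEX, {"b", "c"}))       # query (3)
print(usable_prefix(INDEX, {"a", "c"}))       # query (4)
```

Queries (1) and (2) get every column they filter on from the index; query (3) gets nothing, and query (4) gets only the column a.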
IMPORTANT FACTS TO KNOW ABOUT MYSQL
Each index slows down writes to a table
   – Index is an organized structure, it has to be maintained
   – There can’t be too many or performance will suffer
MySQL can typically use only one index per query
   – There are rare exceptions – index merge optimizations
   – Merges are often not good enough – an observation
IMPORTANT FACTS TO KNOW ABOUT MYSQL
These work great in MySQL
   – Index optimized searching
      • A query which uses indexes efficiently is fast enough
      • B-TREE lookups are typically very efficient
      • FULLTEXT indexes can be the exception
   – Index optimized sorting and grouping
      • Rows are read in the proper order
IMPORTANT FACTS TO KNOW ABOUT MYSQL
These can cause problems in MySQL
   – Full table scans
      • No index is used
      • Query reads entire table row by row checking for matches
   – Large scans related to poor selectivity
      • An index is used, but it is not selective enough
      • MySQL has to read a lot of rows and reject many of them
IMPORTANT FACTS TO KNOW ABOUT MYSQL
These can cause problems in MySQL
   – Search on many combinations of columns in a single table
      • Each combination may require new index
      • Can’t have too many indexes in table at the same time
   – Handling multi-value properties in searches
      • Keywords, tags
      • Such queries often can’t be optimized very well
IMPORTANT FACTS TO KNOW ABOUT MYSQL
These can cause problems in MySQL
   – Sorting or grouping not done through indexes
      • Requires rewriting rows into temporary storage
      • At least one additional pass over results to complete
      • LIMIT does not work until all matches are found and
        sorted/grouped
IMPORTANT FACTS TO KNOW ABOUT MYSQL
Indexes and data may be cached in memory
   – key_buffer and filesystem cache for MyISAM tables
   – innodb_buffer_pool for InnoDB tables
   – No guarantees what is in RAM
      • MySQL has no option to lock certain data in buffers
IMPORTANT FACTS TO KNOW ABOUT MYSQL
Full-text support in MySQL
   – Available through FULLTEXT keys
   – Only supported by MyISAM engine (at the time of this talk; InnoDB gained FULLTEXT in MySQL 5.6)
      • MyISAM uses table level locking
      • May become a showstopper for busy databases
   – Cannot be used together with any other index
      • Even index merge will not work
IMPORTANT FACTS TO KNOW ABOUT SPHINX
Search remembers no more than max_matches results
  | total           | 1000   |
  | total_found     | 2255   |
  –   Other results are ignored before sending them to client
  –   Saves some CPU and RAM
  –   All results are often unnecessary
  –   Accuracy costs
IMPORTANT FACTS TO KNOW ABOUT SPHINX
Grouping is done in fixed memory
   – Results may be approximate
      • When number of matches exceeds max_matches
   – Inaccuracy depends on max_matches setting
      • The larger the more accurate grouping results
      • Growing max_matches can reduce performance
   – Accuracy costs
IMPORTANT FACTS TO KNOW ABOUT SPHINX
MySQL                         Sphinx (uses SphinxQL)
SELECT ..., COUNT(1) _c       SELECT *
   FROM movie_info               FROM movies
WHERE                         WHERE
   MATCH (info)                  MATCH ('@info "story"')
   AGAINST ('"story"'         GROUP BY movie_id
       IN BOOLEAN MODE)       ORDER BY @count DESC LIMIT 4
   GROUP BY movie_id
   ORDER BY _c DESC LIMIT 4
IMPORTANT FACTS TO KNOW ABOUT SPHINX
MySQL                     Sphinx
+----------+----------+   +----------+--------+
| movie_id | COUNT(1) |   | movie_id | @count |
+----------+----------+   +----------+--------+
|    30372 |       15 |   |    30372 |     15 |
|   855624 |       13 |   |   855624 |     13 |
|   590071 |       13 |   |   143384 |     12 |
|   143384 |       12 |   |   590071 |     12 |
+----------+----------+   +----------+--------+
IMPORTANT FACTS TO KNOW ABOUT SPHINX
Full copy of attributes is always kept in RAM
   –   If attribute storage was set to ‘extern’ – the typical use
   –   Preloaded on start
   –   Never read from disk again once Sphinx is up
   –   Guarantees certain performance
   –   Calculate the storage requirements properly
        • Sphinx may want to allocate too much memory
IMPORTANT FACTS TO KNOW ABOUT SPHINX
Sphinx stores rows in blocks
   – 64 rows per block
   – Meta data contains (min, max) range of every attribute
   – Allows quick rejection when filtering by attributes
      • No need to scan every row individually
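The block-level rejection described above can be sketched as follows. This is an illustration under stated assumptions, not Sphinx's real storage code: the build_blocks and range_filter names, the dictionary layout, and the sample data are hypothetical; only the 64-row block size and the (min, max) metadata come from the slide.

```python
BLOCK_SIZE = 64  # rows per block, as stated on the slide


def build_blocks(values, block_size=BLOCK_SIZE):
    """Split attribute values into blocks and record each block's (min, max)."""
    blocks = []
    for start in range(0, len(values), block_size):
        rows = values[start:start + block_size]
        blocks.append({"min": min(rows), "max": max(rows), "rows": rows})
    return blocks


def range_filter(blocks, low, high):
    """Return matching values and the number of blocks actually scanned."""
    matches, scanned = [], 0
    for block in blocks:
        if block["max"] < low or block["min"] > high:
            continue  # whole block rejected from its metadata alone
        scanned += 1
        matches.extend(v for v in block["rows"] if low <= v <= high)
    return matches, scanned


# Hypothetical data: production years in ascending order, 10 rows per year.
years = [1900 + i // 10 for i in range(1280)]
blocks = build_blocks(years)
matches, scanned = range_filter(blocks, 1990, 2000)
print(len(blocks), scanned, len(matches))
```

With this sample data only 2 of the 20 blocks are read row by row. The benefit depends on how the values are laid out: if every block spans the whole value range, nothing can be rejected.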
MYSQL VS. SPHINX PERFORMANCE
FULL-TEXT SEARCH PERFORMANCE

           USES FULL IMDB DATABASE
 IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
FULL-TEXT SEARCH PERFORMANCE
MySQL                        Sphinx (uses SphinxQL)
SELECT COUNT(1)              SELECT *
   FROM movie_info              FROM movies
WHERE                        WHERE
   MATCH (info)                 MATCH ('@info "james
   AGAINST ('"james bond"'      bond"')
       IN BOOLEAN MODE)
FULL-TEXT SEARCH PERFORMANCE
MySQL                     Sphinx
+----------+              +---------------+-------+
| COUNT(1) |              | Variable_name | Value |
+----------+              +---------------+-------+
|     2255 |              | total         | 1000  |
+----------+              | total_found   | 2255  |
1 row in set (0.13 sec)   | time          | 0.003 |
                          ...
SCAN PERFORMANCE

          USES FULL IMDB DATABASE
IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
SCAN PERFORMANCE
MySQL                           Sphinx (uses SphinxQL)
SELECT COUNT(1)                 SELECT *
   FROM title                      FROM titles
WHERE                           WHERE
   production_year >= 1990         production_year >= 1990
   AND                             AND
   production_year <= 2000         production_year <= 2000

No index on `production_year`
SCAN PERFORMANCE
MySQL                     Sphinx
+----------+              +---------------+--------+
| COUNT(1) |              | Variable_name | Value  |
+----------+              +---------------+--------+
|   239203 |              | total         | 1000   |
+----------+              | total_found   | 239203 |
1 row in set (1.09 sec)   | time          | 0.051  |
                          ...
MORE COMPLEX CASE
      SEARCH BY KEYWORDS
          USES FULL IMDB DATABASE
IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
SEARCH BY KEYWORDS
MySQL                             Sphinx (uses SphinxQL)
SELECT t.id FROM title t          SELECT *
   JOIN movie_keyword mk             FROM keywords
   ON mk.movie_id = t.id          WHERE
   JOIN keyword k
   ON k.id = mk.keyword_id           MATCH
                                     ('@keywords
WHERE                                     ("beautiful-woman"|
   k.keyword IN ('beautiful-              "women"|"murder")')
   woman', 'women', 'murder')
                                  ORDER BY production_year DESC
GROUP BY t.id ORDER BY               LIMIT 3
   production_year DESC LIMIT 3
SEARCH BY KEYWORDS
MySQL                      Sphinx
+--------+                 +--------+
| id     |                 | id     |
+--------+                 +--------+
| 561959 |                 | 561959 |
|  74273 |                 |  74273 |
| 344814 |                 | 344814 |
+--------+                 +--------+
3 rows in set (1.84 sec)   time = 0.015
SEARCH BY KEYWORDS
Sphinx returns
   – Values of the indexed attributes
   – Meta information about search and results
   – No text
      • Recent version can actually store and return short strings
      • But only defined as attributes, not full-text searchable
SEARCH BY KEYWORDS
Use that information to fetch full details from MySQL

mysql> SELECT t.id, t.title FROM title t WHERE
        t.id IN(561959, 74273, 344814)
   +--------+---------------------------------------+
   | id     | title                                 |
   +--------+---------------------------------------+
   |  74273 | Blue Silence                          |
   | 344814 | Marvin: The Life Story of Marvin Gaye |
   | 561959 | The Red Man's View                    |
   +--------+---------------------------------------+
SEARCH BY KEYWORDS
MySQL                            Sphinx
+--------+-------------------+   +--------+-----------------+
| id     | title             |   | id     | production_year |
+--------+-------------------+   +--------+-----------------+
|  74273 | Blue Silence      |   | 561959 |            2014 |
| 344814 | Marvin: The Li... |   |  74273 |            2013 |
| 561959 | The Red Man's ... |   | 344814 |            2012 |
+--------+-------------------+   +--------+-----------------+
       Notice MySQL returned rows in different order!
SEARCH BY KEYWORDS
The order in SQL can only be guaranteed with ORDER BY!
What is the solution?
   – Append ORDER BY production_year DESC to the MySQL query
        • applies to only a small number of rows, so it’s probably okay
   or
   – Remember the order of Sphinx results in the application
   – Restore it after receiving data from MySQL
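The second option, remembering the Sphinx order and restoring it in the application, might look like the sketch below. The restore_order name and the dictionary row shape are assumptions for illustration; the ids and titles are the ones shown on the previous slides.

```python
def restore_order(sphinx_ids, mysql_rows):
    """Reorder MySQL rows to match the order of ids returned by Sphinx."""
    by_id = {row["id"]: row for row in mysql_rows}
    # Skip ids MySQL did not return (e.g. rows deleted since the last re-index).
    return [by_id[doc_id] for doc_id in sphinx_ids if doc_id in by_id]


# Order of ids as returned by Sphinx (sorted by production_year DESC).
sphinx_ids = [561959, 74273, 344814]

# Rows in the order MySQL happened to return them.
mysql_rows = [
    {"id": 74273, "title": "Blue Silence"},
    {"id": 344814, "title": "Marvin: The Life Story of Marvin Gaye"},
    {"id": 561959, "title": "The Red Man's View"},
]

for row in restore_order(sphinx_ids, mysql_rows):
    print(row["id"], row["title"])
```

Building the lookup dictionary once keeps the reordering linear in the number of rows, which matters little for three results but avoids a nested scan on larger pages.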
SEARCH BY KEYWORDS
What if „keywords” were numerical identifiers?
   – Create „fake keywords” and index them as text
   – Convert numbers into strings when building index
     sql_query = SELECT t.id,
     GROUP_CONCAT(CONCAT('KEY_', mk.keyword_id))
     FROM title t JOIN movie_keyword mk ON t.id = mk.movie_id
     GROUP BY t.id

   – Run full-text searches using strings such as "KEY_1234"
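On the application side, the same conversion has to be applied to the ids being searched for. A minimal sketch, assuming the concatenated column is exposed as a full-text field named keywords (the slide's query does not name it) and using the hypothetical helpers fake_keyword and build_match:

```python
def fake_keyword(keyword_id, prefix="KEY_"):
    """Turn a numeric keyword id into the text token stored in the index."""
    return f"{prefix}{int(keyword_id)}"


def build_match(keyword_ids, field="keywords"):
    """Build a MATCH() expression that finds rows having any of the ids."""
    terms = "|".join(f'"{fake_keyword(k)}"' for k in keyword_ids)
    return f"@{field} ({terms})"


print(build_match([1234, 42]))
```

The resulting string follows the same pattern as the '@keywords ("beautiful-woman"|"women"|"murder")' query shown earlier and would be passed as the argument of MATCH(). Casting each id through int() keeps arbitrary text out of the query.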
FLEXIBLE SEARCH
A data structure describing user profile
CREATE TABLE `members` (
   `user_id` int(10) unsigned,
   `user_firstname` varchar(50),
   `user_surname` varchar(50),
   `user_dob` date,
   `user_lastvisit` datetime,
   `user_datetime` datetime,
   `user_bio` text,
   `user_hasphoto` tinyint(2) unsigned,
   `user_hasvideo` tinyint(2) unsigned,
   ...
FLEXIBLE SEARCH
Flexible search typically means
   – Search conditions may involve any number of columns in
     any combination
   – Sorting may be done on one of many columns as well

Often impossible to add all necessary indexes in MySQL
FLEXIBLE SEARCH
Many columns may have very low cardinality
   – Example: user_gender
   – MySQL would not even consider using index for such
     column

It may be very difficult to make it work fast in MySQL
   – When tables or traffic are large enough
FLEXIBLE SEARCH
How does Sphinx help?
   –   Scans are optimized
   –   Optimizations apply to all columns
   –   Possibility to use „fake keywords”
   –   Data can be split across several instances
        • Parallel search
        • No extra application logic necessary to combine results
SUMMARY
Sphinx can be of great help to many MySQL-based apps
   – Developed to work better where MySQL performs poorly
      •   Text search
      •   Large scans
      •   Filtering on many combinations of columns
      •   Handling multi-value properties
SUMMARY
Sphinx can be of great help to many MySQL-based apps
   –   Comes with features that can actually replace a database
   –   Easily scalable
   –   Actively developed
   –   You can sponsor development and have features you need
       done soon
        • No need to wait long until some functionality „appears”
Sphinx
http://www.sphinxsearch.com/

Percona Consulting
http://www.percona.com/
THANK YOU!

[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Sphinx new

  • 1. Improving MySQL-based applications performance with Sphinx Maciej Dobrzański (Мачей Добжаньски) Percona, Inc.
  • 2. INTRODUCTION Who am I? – Consultant at Percona, Inc. – What do I do? • Performance audits • Fix broken systems • Design architectures – Typically work from home
  • 3. INTRODUCTION What is Percona, Inc.? – Consulting company – Provides services for MySQL applications – Develops open-source software • Scalability patches for InnoDB • XtraDB storage engine for MySQL • Xtrabackup – free backup solution for InnoDB/XtraDB
  • 5. WHAT IS MYSQL? MySQL is... – Open-source relational database management system – Popular enough to assume everyone here knows it
  • 7. WHAT IS SPHINX? A standalone full-text search engine – Consists of two major applications • indexer • searchd – More efficient than MySQL FULLTEXT • On larger data sets
  • 8. WHAT IS SPHINX? A standalone full-text search engine – Can be easily scaled horizontally • Sphinx indexes can be distributed across many servers • Allows parallel searching • One instance becomes a dispatcher – Forwards queries to other instances – Combines results before sending them back to clients
  • 10. WHAT IS SPHINX? Many additional features beyond just full-text search – Indexable attributes for non-FTS filtering • numerical, multi-value and now also text • Example: limit results to rows which have article_score>=2 – Sorting results by an attribute or an expression • Example: @weight+(article_score)*0.1
  • 11. WHAT IS SPHINX? Many additional features beyond just full-text search – Grouping results by an attribute • Additional support for timestamp attributes • Returns also row count per group – may be approximate – Calculating expressions • Much faster than in MySQL as per recent benchmarks
  • 12. WHAT IS SPHINX? Anything else? – On-line re-indexing – Live index updates – Extensive API available for many programming languages • PHP • Python • Java • many more
  • 13. WHAT IS SPHINX? There’s even more! – SphinxQL – MySQL server protocol compatible • Connect with any MySQL client – command line – API call, e.g. mysql_connect() • Run SQL-like queries
  • 14. WHAT IS SPHINX? Example use of SphinxQL
  • 15. HOW DOES SPHINX WORK WITH MYSQL?
  • 16. HOW DOES SPHINX WORK WITH MYSQL? Sphinx is an external application, not part of MySQL – Uses its own data files – Needs memory – Has to be queried separately • Sphinx API • SphinxQL • Sphinx Storage Engine for MySQL
  • 17. HOW DOES SPHINX WORK WITH MYSQL? Sphinx is an external application, not part of MySQL – Updating Sphinx indexes has to be done separately too • Periodic data re-indexing with indexer – Some information may be outdated for a while – Can be optimized by re-indexing only the latest changes • Live index updates from applications – Applications need to write twice, to both MySQL and Sphinx – Available only for attributes; full-text updates to come
  • 18. HOW DOES MYSQL WORK WITH SPHINX? Example data source for Sphinx index
        sql_query = SELECT mi.id, mi.movie_id, t.production_year, t.title, mi.info FROM movie_info mi JOIN title t ON t.id = mi.movie_id
        sql_attr_uint = movie_id
        sql_attr_uint = production_year
      • Notice the source can be any valid SQL query
         – Uses joins to denormalize data for Sphinx
      • Two integer attributes – movie_id and production_year
  • 19. HOW DOES SPHINX WORK WITH MYSQL? Sphinx is not a full database (yet?) – It’s primarily a search engine – It can return values stored as attributes, e.g. movie_id, production_year – …but not any full-text searchable columns – Results from Sphinx can be used to fetch full details from the database
  • 20. IMPORTANT FACTS TO KNOW ABOUT MYSQL
  • 21. IMPORTANT FACTS TO KNOW ABOUT MYSQL Uses B-TREE indexes to improve search performance – Works great for equality operator (=) – …and small range lookups: >, >=, <, <=, IN (list), LIKE • Range size relative to table size, not an absolute value • Large range often turns into plain scan
  • 22. IMPORTANT FACTS TO KNOW ABOUT MYSQL MySQL can use any left-most part of an index
      – INDEX (a, b, c) can fully optimize both:
          (1) SELECT * FROM T WHERE a=9
          (2) SELECT * FROM T WHERE a=9 AND b IN (1,2) AND c=4
        …but not any of:
          (3) SELECT * FROM T WHERE b=7 AND c=1
          (4) SELECT * FROM T WHERE a=9 AND c=2 (may still use index for a=9 only)
      – No good indexes means you may need a new one
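The left-most prefix rule on this slide can be sketched as a small model. This is only an illustration, not how the MySQL optimizer is implemented: the function name `usable_prefix` is hypothetical, and it assumes every predicate is an equality (or an IN list, treated like an equality as on the slide).

```python
def usable_prefix(index_columns, predicate_columns):
    """Return the leading index columns that the WHERE clause can use.

    index_columns: ordered list of columns in the composite index.
    predicate_columns: set of columns that have an equality/IN predicate.
    The scan stops at the first index column without a predicate.
    """
    prefix = []
    for column in index_columns:
        if column not in predicate_columns:
            break
        prefix.append(column)
    return prefix


index = ["a", "b", "c"]                      # INDEX (a, b, c)
query_1 = usable_prefix(index, {"a"})            # WHERE a=9
query_2 = usable_prefix(index, {"a", "b", "c"})  # WHERE a=9 AND b IN (1,2) AND c=4
query_3 = usable_prefix(index, {"b", "c"})       # WHERE b=7 AND c=1
query_4 = usable_prefix(index, {"a", "c"})       # WHERE a=9 AND c=2
print(query_1, query_2, query_3, query_4)
```

Queries (1) and (2) use every column they filter on, query (3) gets an empty prefix, and query (4) can use the index for `a` only, which matches the note on the slide.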
  • 23. IMPORTANT FACTS TO KNOW ABOUT MYSQL Each index slows down writes to a table – Index is an organized structure, it has to be maintained – There can’t be too many or performance will suffer MySQL can typically use only one index per query – There are rare exceptions – index merge optimizations – Merges are often not good enough – an observation
  • 24. IMPORTANT FACTS TO KNOW ABOUT MYSQL These work great in MySQL – Index optimized searching • A query which uses indexes efficiently is fast enough • B-TREE lookups are typically very efficient • FULLTEXT indexes can be the exception – Index optimized sorting and grouping • Rows are read in the proper order
  • 25. IMPORTANT FACTS TO KNOW ABOUT MYSQL These can cause problems in MySQL – Full table scans • No index is used • Query reads entire table row by row checking for matches – Large scans related to poor selectivity • An index is used, but it is not selective enough • MySQL has to read a lot of rows and reject many of them
  • 26. IMPORTANT FACTS TO KNOW ABOUT MYSQL These can cause problems in MySQL – Search on many combinations of columns in a single table • Each combination may require new index • Can’t have too many indexes in table at the same time – Handling multi-value properties in searches • Keywords, tags • Such queries often can’t be optimized very well
  • 27. IMPORTANT FACTS TO KNOW ABOUT MYSQL These can cause problems in MySQL – Sorting or grouping not done through indexes • Requires rewriting rows into temporary storage • At least one additional pass over results to complete • LIMIT does not work until all matches are found and sorted/grouped
  • 28. IMPORTANT FACTS TO KNOW ABOUT MYSQL Indexes and data may be cached in memory – key_buffer and filesystem cache for MyISAM tables – innodb_buffer_pool for InnoDB tables – No guarantees what is in RAM • MySQL has no option to lock certain data in buffers
  • 29. IMPORTANT FACTS TO KNOW ABOUT MYSQL Full-text support in MySQL – Available through FULLTEXT keys – Only supported by MyISAM engine • MyISAM uses table level locking • May become a showstopper for busy databases – Cannot be used together with any other index • Even index merge will not work
  • 30. IMPORTANT FACTS TO KNOW ABOUT SPHINX
  • 31. IMPORTANT FACTS TO KNOW ABOUT SPHINX Search remembers no more than max_matches results
        | total       | 1000 |
        | total_found | 2255 |
      – Other results are ignored before sending them to client
      – Saves some CPU and RAM
      – All results are often unnecessary
      – Accuracy costs
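The difference between `total` and `total_found` can be illustrated with a bounded result buffer. This is a simplified model under stated assumptions, not Sphinx's internal code: the function `search` and the `(doc_id, weight)` input shape are hypothetical.

```python
import heapq


def search(matches, max_matches):
    """Keep only the max_matches best results while counting all of them.

    matches: iterable of (doc_id, weight) pairs for every matching document.
    """
    total_found = 0
    heap = []  # min-heap of (weight, doc_id), never larger than max_matches
    for doc_id, weight in matches:
        total_found += 1
        if len(heap) < max_matches:
            heapq.heappush(heap, (weight, doc_id))
        elif (weight, doc_id) > heap[0]:
            # The new match beats the weakest one kept so far.
            heapq.heapreplace(heap, (weight, doc_id))
    best = sorted(heap, reverse=True)
    return {
        "total": len(best),
        "total_found": total_found,
        "matches": [doc_id for _, doc_id in best],
    }


# 2255 matching documents, but only 1000 are remembered.
result = search(((i, i % 97) for i in range(2255)), max_matches=1000)
print(result["total"], result["total_found"])
```

Memory use is bounded by `max_matches` regardless of how many documents match, which is the trade-off the slide describes.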
  • 32. IMPORTANT FACTS TO KNOW ABOUT SPHINX
  • 33. IMPORTANT FACTS TO KNOW ABOUT SPHINX Grouping is done in fixed memory – Results may be approximate • When number of matches exceeds max_matches – Inaccuracy depends on max_matches setting • The larger the more accurate grouping results • Growing max_matches can reduce performance – Accuracy costs
  • 34. IMPORTANT FACTS TO KNOW ABOUT SPHINX
        MySQL:
          SELECT ..., COUNT(1) _c
          FROM movie_info
          WHERE MATCH (info) AGAINST ('"story"' IN BOOLEAN MODE)
          GROUP BY movie_id
          ORDER BY _c DESC
          LIMIT 4
        Sphinx (uses SphinxQL):
          SELECT *
          FROM movies
          WHERE MATCH ('@info "story"')
          GROUP BY movie_id
          ORDER BY @count DESC
          LIMIT 4
  • 35. IMPORTANT FACTS TO KNOW ABOUT SPHINX
        MySQL                      Sphinx
        +----------+----------+    +----------+--------+
        | movie_id | COUNT(1) |    | movie_id | @count |
        +----------+----------+    +----------+--------+
        |    30372 |       15 |    |    30372 |     15 |
        |   855624 |       13 |    |   855624 |     13 |
        |   590071 |       13 |    |   143384 |     12 |
        |   143384 |       12 |    |   590071 |     12 |
        +----------+----------+    +----------+--------+
  • 36. IMPORTANT FACTS TO KNOW ABOUT SPHINX Full copy of attributes is always kept in RAM – If attribute storage was set to ‘extern’ – the typical use – Preloaded on start – Never read from disk again once Sphinx is up – Guarantees certain performance – Calculate the storage requirements properly • Sphinx may want to allocate too much memory
  • 37. IMPORTANT FACTS TO KNOW ABOUT SPHINX Sphinx stores rows in blocks – 64 rows per block – Meta data contains (min, max) range of every attribute – Allows quick rejection when filtering by attributes • No need to scan every row individually
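The block-level rejection described on this slide can be sketched as follows. It is a toy model, not Sphinx's storage format: the helper names `build_blocks` and `range_filter` are hypothetical, and a plain Python list stands in for one attribute column.

```python
BLOCK_SIZE = 64  # rows per block, as stated on the slide


def build_blocks(values, block_size=BLOCK_SIZE):
    """Split an attribute column into blocks and record (min, max) per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append((min(chunk), max(chunk), start, chunk))
    return blocks


def range_filter(blocks, low, high):
    """Return matching row numbers and the number of blocks actually scanned."""
    matches = []
    scanned = 0
    for block_min, block_max, start, chunk in blocks:
        if block_max < low or block_min > high:
            continue  # whole block rejected from metadata alone
        scanned += 1
        for offset, value in enumerate(chunk):
            if low <= value <= high:
                matches.append(start + offset)
    return matches, scanned


# 60 blocks; every row in block i has production_year 1950 + i.
years = [1950 + i // BLOCK_SIZE for i in range(BLOCK_SIZE * 60)]
rows, scanned = range_filter(build_blocks(years), 1990, 2000)
print(len(rows), scanned)
```

With this clustered example data only 11 of the 60 blocks are scanned row by row. When values are spread randomly across blocks, fewer blocks can be rejected, so the benefit depends on the data layout.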
  • 38. MYSQL V SPHINX PERFORMANCE
  • 39. FULL-TEXT SEARCH PERFORMANCE USES FULL IMDB DATABASE IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
  • 40. FULL-TEXT SEARCH PERFORMANCE
        MySQL:
          SELECT COUNT(1)
          FROM movie_info
          WHERE MATCH (info) AGAINST ('"james bond"' IN BOOLEAN MODE)
        Sphinx (uses SphinxQL):
          SELECT *
          FROM movies
          WHERE MATCH ('@info "james bond"')
  • 41. FULL-TEXT SEARCH PERFORMANCE
        MySQL                      Sphinx
        +----------+               +---------------+-------+
        | COUNT(1) |               | Variable_name | Value |
        +----------+               +---------------+-------+
        |     2255 |               | total         | 1000  |
        +----------+               | total_found   | 2255  |
        1 row in set (0.13 sec)    | time          | 0.003 |
                                   ...
  • 42. SCAN PERFORMANCE USES FULL IMDB DATABASE IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
  • 43. SCAN PERFORMANCE
        MySQL:
          SELECT COUNT(1)
          FROM title
          WHERE production_year >= 1990 AND production_year <= 2000
        Sphinx (uses SphinxQL):
          SELECT *
          FROM titles
          WHERE production_year >= 1990 AND production_year <= 2000
        No index on `production_year`
  • 44. SCAN PERFORMANCE
        MySQL                      Sphinx
        +----------+               +---------------+--------+
        | COUNT(1) |               | Variable_name | Value  |
        +----------+               +---------------+--------+
        |   239203 |               | total         | 1000   |
        +----------+               | total_found   | 239203 |
        1 row in set (1.09 sec)    | time          | 0.051  |
                                   ...
  • 45. MORE COMPLEX CASE SEARCH BY KEYWORDS USES FULL IMDB DATABASE IMPORTED INTO MYSQL AND INDEXED WITH SPHINX
  • 46. SEARCH BY KEYWORDS
        MySQL:
          SELECT t.id
          FROM title t
          JOIN movie_keyword mk ON mk.movie_id = t.id
          JOIN keyword k ON k.id = mk.keyword_id
          WHERE k.keyword IN ('beautiful-woman', 'women', 'murder')
          GROUP BY t.id
          ORDER BY production_year DESC
          LIMIT 3
        Sphinx (uses SphinxQL):
          SELECT *
          FROM keywords
          WHERE MATCH ('@keywords ("beautiful-woman"|"women"|"murder")')
          ORDER BY production_year DESC
          LIMIT 3
  • 47. SEARCH BY KEYWORDS
        MySQL                      Sphinx
        +--------+                 +--------+
        | id     |                 | id     |
        +--------+                 +--------+
        | 561959 |                 | 561959 |
        |  74273 |                 |  74273 |
        | 344814 |                 | 344814 |
        +--------+                 +--------+
        3 rows in set (1.84 sec)   time = 0.015
  • 48. SEARCH BY KEYWORDS Sphinx returns – Values of the indexed attributes – Meta information about search and results – No text • Recent versions can actually store and return short strings • But only when defined as attributes, which are not full-text searchable
  • 49. SEARCH BY KEYWORDS Use that information to fetch full details from MySQL
        mysql> SELECT t.id, t.title FROM title t WHERE t.id IN(561959, 74273, 344814)
        +--------+---------------------------------------+
        | id     | title                                 |
        +--------+---------------------------------------+
        |  74273 | Blue Silence                          |
        | 344814 | Marvin: The Life Story of Marvin Gaye |
        | 561959 | The Red Man's View                    |
        +--------+---------------------------------------+
  • 50. SEARCH BY KEYWORDS
        MySQL                             Sphinx
        +--------+-------------------+    +--------+-----------------+
        | id     | title             |    | id     | production_year |
        +--------+-------------------+    +--------+-----------------+
        |  74273 | Blue Silence      |    | 561959 |            2014 |
        | 344814 | Marvin: The Li... |    |  74273 |            2013 |
        | 561959 | The Red Man's ... |    | 344814 |            2012 |
        +--------+-------------------+    +--------+-----------------+
        Notice MySQL returned rows in a different order!
  • 51. SEARCH BY KEYWORDS The order in SQL can only be guaranteed with ORDER BY! What is the solution? – Append ORDER BY production_year DESC • applies to only a small number of rows, so it’s probably okay or – Remember the order of Sphinx results in the application – Restore it after receiving data from MySQL
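The second option, remembering the Sphinx order and restoring it in the application, can be sketched like this. The function `restore_order` and the dict-shaped rows are illustrative assumptions; the ids and titles are the ones shown on the previous slides.

```python
def restore_order(sphinx_ids, mysql_rows):
    """Reorder rows fetched from MySQL to follow the order Sphinx returned."""
    rows_by_id = {row["id"]: row for row in mysql_rows}
    # Skip ids that are missing in MySQL, e.g. rows deleted since indexing.
    return [rows_by_id[doc_id] for doc_id in sphinx_ids if doc_id in rows_by_id]


sphinx_ids = [561959, 74273, 344814]  # sorted by production_year DESC in Sphinx
mysql_rows = [                        # MySQL returned them in primary key order
    {"id": 74273, "title": "Blue Silence"},
    {"id": 344814, "title": "Marvin: The Life Story of Marvin Gaye"},
    {"id": 561959, "title": "The Red Man's View"},
]
for row in restore_order(sphinx_ids, mysql_rows):
    print(row["id"], row["title"])
```

The dictionary lookup keeps the reordering linear in the number of rows, which is cheap for the few rows a LIMIT query returns.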
  • 52. SEARCH BY KEYWORDS What if "keywords" were numerical identifiers?
      – Create "fake keywords" and index them as text
      – Convert numbers into strings when building index
        sql_query = SELECT t.id, GROUP_CONCAT(CONCAT('KEY_', mk.keyword_id)) FROM title t JOIN movie_keyword mk ON t.id = mk.movie_id GROUP BY t.id
      – Run full-text searches using strings such as "KEY_1234"
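On the application side, the same "fake keyword" convention has to be applied when building the search expression. A minimal sketch, assuming the `KEY_` prefix from the slide and the `"a"|"b"` OR syntax used on slide 46; the function names are hypothetical.

```python
def fake_keywords(keyword_ids, prefix="KEY_"):
    """Mirror GROUP_CONCAT(CONCAT('KEY_', keyword_id)) for one title."""
    return ",".join(f"{prefix}{keyword_id}" for keyword_id in keyword_ids)


def match_any(keyword_ids, prefix="KEY_"):
    """Build an OR expression over fake keywords for use inside MATCH()."""
    return "|".join(f'"{prefix}{keyword_id}"' for keyword_id in keyword_ids)


print(fake_keywords([12, 1234]))  # text that would be indexed for one title
print(match_any([1234, 7]))       # expression to search for either keyword
```

Keeping the prefix in one place matters because the indexed text and the query terms must use exactly the same strings.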
  • 54. FLEXIBLE SEARCH A data structure describing a user profile
        CREATE TABLE `members` (
          `user_id` int(10) unsigned,
          `user_firstname` varchar(50),
          `user_surname` varchar(50),
          `user_dob` date,
          `user_lastvisit` datetime,
          `user_datetime` datetime,
          `user_bio` text,  -- type missing on the slide; text is an assumption
          `user_hasphoto` tinyint(2) unsigned,
          `user_hasvideo` tinyint(2) unsigned,
          ...
  • 55. FLEXIBLE SEARCH Flexible search typically means – Search conditions may involve any number of columns in any combination – Sorting may be done on one of many columns as well Often impossible to add all necessary indexes in MySQL
  • 56. FLEXIBLE SEARCH Many columns may have very low cardinality – Example: user_gender – MySQL would not even consider using index for such column It may be very difficult to make it work fast in MySQL – When tables or traffic are large enough
  • 57. FLEXIBLE SEARCH How does Sphinx help? – Scans are optimized – Optimizations apply to all columns – Possibility to use "fake keywords" – Data can be split across several instances • Parallel search • No extra application logic necessary to combine results
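What the dispatching instance does when it combines results can be sketched as a merge of per-instance result lists. This is only a model of the idea, not Sphinx's implementation: `merge_shards` is a hypothetical name, the split of rows across two instances is invented for illustration, and each instance is assumed to return its rows already sorted.

```python
import heapq
from itertools import islice


def merge_shards(shard_results, limit):
    """Merge per-instance results, each sorted by production_year DESC."""
    merged = heapq.merge(*shard_results, key=lambda row: row[1], reverse=True)
    return list(islice(merged, limit))


# (id, production_year) pairs as two instances might return them.
shard_a = [(561959, 2014), (344814, 2012)]
shard_b = [(74273, 2013), (1, 1990)]
print(merge_shards([shard_a, shard_b], limit=3))
```

Because each input is already sorted, the merge only has to look at the head of each list, so the combining step stays cheap even with many instances.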
  • 59. SUMMARY Sphinx can be of great help to many MySQL-based apps – Developed to work better where MySQL performs poorly • Text search • Large scans • Filtering on many combinations of columns • Handling multi-value properties
  • 60. SUMMARY Sphinx can be of great help to many MySQL-based apps – Comes with features that can actually replace a database – Easily scalable – Actively developed – You can sponsor development and have the features you need done soon • No need to wait long until some functionality "appears"