Efficient Pagination Using MySQL
                   Surat Singh Bhati (surat@yahoo-inc.com)
                    Rick James (rjames@yahoo-inc.com)


                               Yahoo Inc


Percona Performance Conference 2009
Outline

  1. Overview
     –    Common pagination UI pattern
     – Sample table and typical solution using OFFSET
     – Techniques to avoid large OFFSET
     – Performance comparison
     – Concerns




Common Patterns




Basics


    The first step toward efficient pagination over a large data set:

     – Use index to filter rows (resolve WHERE)
     – Use same index to return rows in sorted order (resolve ORDER)




    Step zero
     – http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html
     – http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html
     – http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html




Using Index


  KEY a_b_c (a, b, c)

  ORDER BY can be resolved using the index:
       –   ORDER BY a
       –   ORDER BY a,b
       –   ORDER BY a, b, c
       –   ORDER BY a DESC, b DESC, c DESC


  WHERE and ORDER BY both resolved using the index:
       –   WHERE a = const ORDER BY b, c
       –   WHERE a = const AND b = const ORDER BY c
       –   WHERE a = const AND b > const ORDER BY b, c

  ORDER BY will not be resolved using the index (filesort):
       –   ORDER BY a ASC, b DESC, c DESC /* mixed sort direction */
       –   WHERE g = const ORDER BY b, c        /* a prefix is missing */
       –   WHERE a = const ORDER BY c           /* b is missing */
       –   WHERE a = const ORDER BY a, d        /* d is not part of index */
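The index rules above can be captured in a small model. The sketch below is a deliberate simplification written for illustration, not MySQL's actual optimizer logic; the function name `order_by_uses_index` and its inputs (index columns in order, the set of columns pinned by `col = const`, and the ORDER BY columns with their direction) are hypothetical. It checks the three conditions the slide's examples rely on: a single sort direction, no gap in the index prefix except for equality-pinned columns, and no ORDER BY column outside the index.

```python
from typing import List, Set, Tuple


def order_by_uses_index(index_cols: List[str],
                        eq_cols: Set[str],
                        order_by: List[Tuple[str, str]]) -> bool:
    """Simplified model: can ORDER BY be read straight off the index order?

    index_cols: index columns in order, e.g. ["a", "b", "c"] for KEY a_b_c
    eq_cols:    columns pinned by WHERE col = const
    order_by:   (column, "ASC" | "DESC") pairs
    """
    if not order_by:
        return True
    # All ORDER BY columns must share one direction (index scanned forward or backward).
    if len({direction for _, direction in order_by}) > 1:
        return False
    wanted = [col for col, _ in order_by]
    matched = 0
    for col in index_cols:
        if matched < len(wanted) and wanted[matched] == col:
            matched += 1   # next ORDER BY column lines up with this index column
        elif col in eq_cols:
            continue       # pinned by an equality, so it cannot disturb the order
        else:
            break          # gap in the index prefix: MySQL would need a filesort
    # Every ORDER BY column must be matched; a column outside the index never is.
    return matched == len(wanted)


idx = ["a", "b", "c"]
print(order_by_uses_index(idx, {"a"}, [("b", "ASC"), ("c", "ASC")]))  # True
print(order_by_uses_index(idx, {"a"}, [("c", "ASC")]))                # False (b is missing)
```

Range conditions such as `b > const` are simply left out of `eq_cols`, which is why `WHERE a = const AND b > const ORDER BY b, c` still passes in this model.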
Sample Schema
  CREATE TABLE `message` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
    `user_id` int(11) NOT NULL,
    `content` text COLLATE utf8_unicode_ci NOT NULL,
    `create_time` int(11) NOT NULL,
    `thumbs_up` int(11) NOT NULL DEFAULT '0', /* Vote Count */
    PRIMARY KEY (`id`),
    KEY `thumbs_up_key` (`thumbs_up`,`id`)
  ) ENGINE=InnoDB

   mysql> show table status like 'message'\G
           Engine: InnoDB
          Version: 10
       Row_format: Compact
             Rows: 50000040    /* 50 Million */
   Avg_row_length: 565
      Data_length: 28273803264 /* 26 GB */
     Index_length: 789577728   /* 753 MB */
        Data_free: 6291456
      Create_time: 2009-04-20 13:30:45

  Two use cases:
  • Paginate by time: most recent messages on page one
  • Paginate by thumbs_up: largest values on page one
Typical Query

  1. Get the total record count
     SELECT count(*) FROM message


  2. Get the current page
     SELECT * FROM message
     ORDER BY id DESC LIMIT 0, 20


  •   http://domain.com/message?page=1
              • ORDER BY id DESC LIMIT 0, 20

  •   http://domain.com/message?page=2
              • ORDER BY id DESC LIMIT 20, 20

  •   http://domain.com/message?page=3
              • ORDER BY id DESC LIMIT 40, 20


  Note: id is AUTO_INCREMENT, so id order matches create_time order. There is no need for an index on create_time, which saves space.
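The page-to-LIMIT mapping above is plain arithmetic on the page number. Here is a minimal sketch, assuming a 1-based `page` parameter and a page size of 20. The helper name `offset_query` is hypothetical.

```python
def offset_query(page: int, page_size: int = 20) -> str:
    """Build the OFFSET-based query for a 1-based page number (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    offset = (page - 1) * page_size
    # Both values are ints, so formatting them into the SQL is safe here;
    # user-supplied strings should go through driver placeholders instead.
    return f"SELECT * FROM message ORDER BY id DESC LIMIT {offset}, {page_size}"


for page in (1, 2, 3):
    print(offset_query(page))
# SELECT * FROM message ORDER BY id DESC LIMIT 0, 20
# SELECT * FROM message ORDER BY id DESC LIMIT 20, 20
# SELECT * FROM message ORDER BY id DESC LIMIT 40, 20
```

The offset grows linearly with the page number. Page 501 already produces `LIMIT 10000, 20`.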



Explain


  mysql> explain SELECT * FROM message
         ORDER BY id DESC
         LIMIT 10000, 20\G
  ***************** 1. row **************
             id: 1
    select_type: SIMPLE
          table: message
           type: index
  possible_keys: NULL
            key: PRIMARY
        key_len: 4
            ref: NULL
           rows: 10020
          Extra:
  1 row in set (0.00 sec)
     – MySQL reads rows with an index scan and stops as soon as it has found the
       required rows.
     – LIMIT 10000, 20 means it has to read 10,020 rows, throw away the first
       10,000, and return the next 20.
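The sketch below makes the read-and-discard cost concrete. It simulates the primary-key index as a sorted Python list and counts how many index entries each strategy touches. This is a toy model, not MySQL internals: an in-memory list replaces the B-tree, and `bisect` stands in for the index seek. The helper names `offset_page` and `keyset_page` are hypothetical. The keyset version is the last_seen technique introduced later in this deck.

```python
from bisect import bisect_left
from typing import List, Tuple


def offset_page(ids_asc: List[int], offset: int, n: int) -> Tuple[List[int], int]:
    """ORDER BY id DESC LIMIT offset, n: scan from the top, discard `offset` rows."""
    rows_read = 0
    page: List[int] = []
    for key in reversed(ids_asc):      # index scan in descending order
        rows_read += 1
        if rows_read > offset:         # the first `offset` rows are thrown away
            page.append(key)
            if len(page) == n:
                break
    return page, rows_read


def keyset_page(ids_asc: List[int], last_seen: int, n: int) -> Tuple[List[int], int]:
    """WHERE id < last_seen ORDER BY id DESC LIMIT n: seek, then read n rows."""
    start = bisect_left(ids_asc, last_seen)   # stands in for the B-tree seek
    page = ids_asc[max(0, start - n):start][::-1]
    return page, len(page)


ids = list(range(1, 50_001))                 # 50,000 ids, ascending like the PK index
page, read = offset_page(ids, 10_000, 20)    # LIMIT 10000, 20
page2, read2 = keyset_page(ids, 40_001, 20)  # 40001 = last id shown on the previous page
print(read, read2)                           # 10020 20
print(page == page2)                         # True
```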


Performance Implications

     – A larger OFFSET increases the active data set: MySQL has to bring data
       into memory that is never returned to the caller.
     – The performance issue is more visible when the database can't fit in
       main memory.
     – Even a small percentage of requests with a large OFFSET can hit disk,
       making disk I/O the bottleneck.
     – To display "21 to 40 of 1,000,000", someone has to count 1,000,000
       rows.




Simple Solution

     – Do not display the total record count. Does the user really care?



     – Do not let users go to deep pages; redirect them to
       http://en.wikipedia.org/wiki/Internet_addiction_disorder after a certain number of
       pages




Avoid Count(*)


  1. Never display the total number of messages; let users see more messages by
     clicking 'next'



  2. Do not count on every request: cache the count and display a stale value. Users
     do not care about 324533 vs. 324633


  3. Display "41 to 80 of Thousands"


  4. Use a precalculated count, incrementing/decrementing it as inserts/deletes
     happen.
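The sketch below shows one way to implement option 2, which caches the count and accepts a stale value. `CachedCount` is a hypothetical helper. `count_fn` is assumed to run something like `SELECT count(*) FROM message`, and the 60-second TTL is an arbitrary example value. The clock is injectable so that the staleness behavior can be demonstrated without waiting.

```python
import time
from typing import Callable, Optional


class CachedCount:
    """Serve a possibly stale row count, recomputing it at most once per `ttl` seconds."""

    def __init__(self, count_fn: Callable[[], int], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._count_fn = count_fn   # assumed to run SELECT count(*) FROM message
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[int] = None
        self._fetched_at = 0.0

    def get(self) -> int:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._ttl:
            self._value = self._count_fn()   # the expensive count, done rarely
            self._fetched_at = now
        return self._value


# Demo with a fake clock and a fake table size.
fake_now = [0.0]
table_size = [324533]
counter = CachedCount(lambda: table_size[0], ttl=60, clock=lambda: fake_now[0])
print(counter.get())   # 324533
table_size[0] = 324633
fake_now[0] = 30.0
print(counter.get())   # 324533 (stale, within the TTL)
fake_now[0] = 61.0
print(counter.get())   # 324633 (refreshed)
```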




Solution to avoid offset

  1. Change User Interface
      – No direct jumps to Nth page



  2. LIMIT N is fine; do not use LIMIT M, N
      – Provide an extra clue about where the given page starts
      – Find the desired records with a more restrictive WHERE based on that clue,
        plus ORDER BY and LIMIT N (no OFFSET)




Find the clue

   150
   111
   102                       Page One
   101
   100   <a href="/forum?page=2&last_seen=100&dir=next">Next</a>

   98    <a href="/forum?page=1&last_seen=98&dir=prev">Prev</a>
   97
   96                        Page Two
   95
   94    <a href="/forum?page=3&last_seen=94&dir=next">Next</a>

   93    <a href="/forum?page=2&last_seen=93&dir=prev">Prev</a>
   92
   91                         Page Three
   90
   89    <a href="/forum?page=4&last_seen=89&dir=next">Next</a>
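The clue in each link comes straight from the rows on the current page. Next carries the smallest id shown, which is the last row. Prev carries the largest id shown, which is the first row. The sketch below implements that rule. It assumes ids sorted in descending order and the `?page=N&last_seen=ID&dir=next|prev` URL form used for the Next/Prev URLs in this deck. `page_links` and the `/forum` base path are illustrative only.

```python
from typing import Dict, List
from urllib.parse import urlencode


def page_links(base: str, page: int, ids_on_page: List[int]) -> Dict[str, str]:
    """Build Prev/Next links carrying the last_seen clue for a page sorted by id DESC."""
    links: Dict[str, str] = {}
    if not ids_on_page:
        return links
    # Next continues below the smallest id shown, i.e. the last row on the page.
    links["next"] = base + "?" + urlencode(
        {"page": page + 1, "last_seen": ids_on_page[-1], "dir": "next"})
    if page > 1:
        # Prev continues above the largest id shown, i.e. the first row on the page.
        links["prev"] = base + "?" + urlencode(
            {"page": page - 1, "last_seen": ids_on_page[0], "dir": "prev"})
    return links


print(page_links("/forum", 2, [98, 97, 96, 95, 94]))
# {'next': '/forum?page=3&last_seen=94&dir=next', 'prev': '/forum?page=1&last_seen=98&dir=prev'}
```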




Solution using clue

  Next Page:
  http://domain.com/forum?page=2&last_seen=100&dir=next

           WHERE id < 100 /* last_seen */
           ORDER BY id DESC LIMIT $page_size /* No OFFSET */


  Prev Page:
  http://domain.com/forum?page=1&last_seen=98&dir=prev

           WHERE id > 98 /* last_seen */
           ORDER BY id ASC LIMIT $page_size /* No OFFSET */


           Reverse the returned rows before sending them to the user
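The two queries above, including the reversal for Prev, can be exercised end to end. This sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL, since the SQL has the same shape. It uses a stripped-down `message` table and the ids from the "Find the clue" example, with a page size of 5. `fetch_page` is a hypothetical helper.

```python
import sqlite3
from typing import List


def fetch_page(conn: sqlite3.Connection, last_seen: int, direction: str,
               page_size: int) -> List[int]:
    """Keyset pagination over message.id, newest first, without OFFSET."""
    if direction == "next":
        sql = "SELECT id FROM message WHERE id < ? ORDER BY id DESC LIMIT ?"
    elif direction == "prev":
        sql = "SELECT id FROM message WHERE id > ? ORDER BY id ASC LIMIT ?"
    else:
        raise ValueError("direction must be 'next' or 'prev'")
    ids = [row[0] for row in conn.execute(sql, (last_seen, page_size))]
    if direction == "prev":
        ids.reverse()  # fetched oldest first; show newest first like every page
    return ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.executemany("INSERT INTO message (id, title) VALUES (?, ?)",
                 [(i, f"message {i}") for i in (150, 111, 102, 101, 100, 98, 97,
                                                96, 95, 94, 93, 92, 91, 90, 89)])

print(fetch_page(conn, 100, "next", 5))  # [98, 97, 96, 95, 94]  (page two)
print(fetch_page(conn, 98, "prev", 5))   # [150, 111, 102, 101, 100]  (page one)
```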


Explain

  mysql> explain
         SELECT * FROM message
         WHERE id < '49999961'
         ORDER BY id DESC LIMIT 20\G
  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: message
           type: range
  possible_keys: PRIMARY
            key: PRIMARY
        key_len: 4
            ref: NULL
           rows: 25000020 /* ignore this */
          Extra: Using where
  1 row in set (0.00 sec)




What about ORDER BY on non-unique values?


                         99
                         99
                         98    Page One
                         98
                         98
                         98
                         98
                         97    Page Two
                         97
                         10
  We can't do:
            WHERE thumbs_up < 98
            ORDER BY thumbs_up DESC /* It would skip the unseen rows with thumbs_up = 98 */

  Can we say this:
            WHERE thumbs_up <= 98
            AND <extra_con>         /* <= 98 alone would repeat rows seen on page one */
            ORDER BY thumbs_up DESC

Add more conditions

  •   Treat thumbs_up as the major number
       – if we have an additional minor number, we can use the combination of major
         and minor as the extra condition




  •   Find an additional column (minor number)
       – we can use the id primary key as the minor number




Solution
                                     First Page
  SELECT thumbs_up, id
  FROM message
  ORDER BY thumbs_up DESC, id DESC
  LIMIT $page_size

  +-----------+----+
  | thumbs_up | id |
  +-----------+----+
  |        99 | 14 |
  |        99 |  2 |
  |        98 | 18 |
  |        98 | 15 |
  |        98 | 13 |
  +-----------+----+
                                     Next Page
  SELECT thumbs_up, id
  FROM message
  WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98)
  ORDER BY thumbs_up DESC, id DESC
  LIMIT $page_size

  +-----------+----+
  | thumbs_up | id |
  +-----------+----+
  |        98 | 10 |
  |        98 |  6 |
  |        97 | 17 |
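The major/minor condition generalizes to any page boundary `(last_thumbs, last_id)`. This sketch uses `sqlite3` as a stand-in for MySQL and the eight rows from the tables above, with a page size of 5. `next_page` is a hypothetical helper. Walking every page and concatenating the results should reproduce the full `ORDER BY thumbs_up DESC, id DESC` order with no duplicates or gaps.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (thumbs_up, id)


def next_page(conn: sqlite3.Connection, last: Optional[Row], page_size: int) -> List[Row]:
    """Paginate by thumbs_up DESC (major) and id DESC (minor) without OFFSET."""
    if last is None:  # first page: no clue yet
        sql = ("SELECT thumbs_up, id FROM message "
               "ORDER BY thumbs_up DESC, id DESC LIMIT ?")
        params: tuple = (page_size,)
    else:
        last_thumbs, last_id = last  # last row shown on the previous page
        sql = ("SELECT thumbs_up, id FROM message "
               "WHERE thumbs_up <= ? AND (id < ? OR thumbs_up < ?) "
               "ORDER BY thumbs_up DESC, id DESC LIMIT ?")
        params = (last_thumbs, last_id, last_thumbs, page_size)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, thumbs_up INTEGER NOT NULL)")
conn.execute("CREATE INDEX thumbs_up_key ON message (thumbs_up, id)")
conn.executemany("INSERT INTO message (id, thumbs_up) VALUES (?, ?)",
                 [(14, 99), (2, 99), (18, 98), (15, 98), (13, 98),
                  (10, 98), (6, 98), (17, 97)])

page1 = next_page(conn, None, 5)
page2 = next_page(conn, page1[-1], 5)
print(page1)  # [(99, 14), (99, 2), (98, 18), (98, 15), (98, 13)]
print(page2)  # [(98, 10), (98, 6), (97, 17)]
```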
Make it better
   Query:


   SELECT * FROM message
   WHERE thumbs_up <= 98
            AND (id < 13 OR thumbs_up < 98)
   ORDER BY thumbs_up DESC, id DESC
   LIMIT 20


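   Can be written as a self-join, so that filtering and sorting run entirely on the covering index thumbs_up_key (thumbs_up, id) and only the 20 matching rows are fetched by primary key: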


   SELECT m2.* FROM message m1, message m2
   WHERE m1.id = m2.id
            AND m1.thumbs_up <= 98
            AND (m1.id < 13 OR m1.thumbs_up < 98)
   ORDER BY m1.thumbs_up DESC, m1.id DESC
   LIMIT 20;
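The self-join is meant to change how MySQL executes the query, not what it returns. Below is a quick equivalence check that uses `sqlite3` as a stand-in, with the same eight sample rows plus a hypothetical `title` column. SQLite's planner differs from MySQL's, so this check says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, "
             "thumbs_up INTEGER NOT NULL, title TEXT NOT NULL)")
conn.execute("CREATE INDEX thumbs_up_key ON message (thumbs_up, id)")
sample = [(14, 99), (2, 99), (18, 98), (15, 98), (13, 98), (10, 98), (6, 98), (17, 97)]
conn.executemany("INSERT INTO message (id, thumbs_up, title) VALUES (?, ?, ?)",
                 [(i, t, f"title {i}") for i, t in sample])

DIRECT = """
SELECT * FROM message
WHERE thumbs_up <= ? AND (id < ? OR thumbs_up < ?)
ORDER BY thumbs_up DESC, id DESC
LIMIT ?"""

# Same filter and sort, evaluated on m1; m2 only fetches the full rows.
DEFERRED_JOIN = """
SELECT m2.* FROM message m1, message m2
WHERE m1.id = m2.id
  AND m1.thumbs_up <= ?
  AND (m1.id < ? OR m1.thumbs_up < ?)
ORDER BY m1.thumbs_up DESC, m1.id DESC
LIMIT ?"""

params = (98, 13, 98, 20)  # last row of the previous page was (thumbs_up=98, id=13)
direct = conn.execute(DIRECT, params).fetchall()
joined = conn.execute(DEFERRED_JOIN, params).fetchall()
print(direct == joined)            # True
print([row[0] for row in joined])  # [10, 6, 17]
```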

Explain

  *************************** 1. row ***************************
             id: 1
    select_type: SIMPLE
          table: m1
           type: range
  possible_keys: PRIMARY,thumbs_up_key
            key: thumbs_up_key /* (thumbs_up,id) */
        key_len: 4
            ref: NULL
           rows: 25000020 /* ignore this, we will read just 20 rows */
          Extra: Using where; Using index /* Cover */
  *************************** 2. row ***************************
             id: 1
    select_type: SIMPLE
          table: m2
           type: eq_ref
  possible_keys: PRIMARY
            key: PRIMARY
        key_len: 4
            ref: forum.m1.id
           rows: 1
          Extra:

Performance Gain (Primary Key Order)




Performance Gain (Secondary Key Order)




Throughput Gain


 •   Throughput gain while hitting the first 30 pages:

      – Using LIMIT OFFSET, N
          • 600 queries/sec




      – Using LIMIT N (no OFFSET)
          • 3.7k queries/sec




Bonus Point

  Product issue with LIMIT M, N


     While a user is reading a page, new records may be added to earlier
     pages.


     Because of inserts and deletes, records move forward or backward across
     pages like a rolling window:
      – The user is reading messages on page 4
      – Meanwhile, one new message is posted (it appears on page one), so every
        page shifts by one message
      – The user clicks on page 5
      – One message from page 4 got pushed onto page 5, so the user has to read
        it again


  No such issue with the new approach
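The rolling-window problem is easy to reproduce in memory. This toy sketch uses hypothetical helpers, a Python list standing in for `ORDER BY id DESC`, and a page size of 5. It opens page 4, inserts one new message at the top, and then fetches page 5 both ways.

```python
from typing import List


def offset_page(ids_desc: List[int], page: int, size: int) -> List[int]:
    """LIMIT M, N on a list already in ORDER BY id DESC order."""
    start = (page - 1) * size
    return ids_desc[start:start + size]


def keyset_page(ids_desc: List[int], last_seen: int, size: int) -> List[int]:
    """WHERE id < last_seen ORDER BY id DESC LIMIT N on the same list."""
    return [i for i in ids_desc if i < last_seen][:size]


size = 5
ids = list(range(30, 0, -1))              # messages 30..1, newest first
page4 = offset_page(ids, 4, size)         # [15, 14, 13, 12, 11]
ids.insert(0, 31)                         # a new message is posted meanwhile
print(offset_page(ids, 5, size))          # [11, 10, 9, 8, 7]  (11 shown again)
print(keyset_page(ids, page4[-1], size))  # [10, 9, 8, 7, 6]
```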

Drawback

  A search engine optimization expert says:
  Let bots reach all your pages with fewer deep dives


  Two solutions:
  •   Read extra rows
       – Read extra rows in advance and construct links for a few previous and next pages


  •   Use a small offset
       – Do not read extra rows in advance; just add links for a few previous and next
         pages carrying the required offset and the last_seen id of the current page
       – Query using the new approach plus the small offset to display the desired page




  Additional concern: the URLs are dynamic, because the last_seen value for a given page changes over time.
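The small-offset variant combines last_seen with an OFFSET that only ever spans a few pages, which keeps the cost of discarded rows bounded. The sketch below assumes `sqlite3` as a stand-in for MySQL, the ids from the "Find the clue" example, and a hypothetical `jump_pages` helper. `LIMIT ? OFFSET ?` is equivalent to the `LIMIT M, N` form used earlier.

```python
import sqlite3
from typing import List


def jump_pages(conn: sqlite3.Connection, last_seen: int, pages_ahead: int,
               page_size: int) -> List[int]:
    """Show the page `pages_ahead` pages after the current one (newest first)."""
    if pages_ahead < 1:
        raise ValueError("pages_ahead must be at least 1")
    # OFFSET only skips the pages in between, so it stays small for nearby links.
    offset = (pages_ahead - 1) * page_size
    sql = "SELECT id FROM message WHERE id < ? ORDER BY id DESC LIMIT ? OFFSET ?"
    return [row[0] for row in conn.execute(sql, (last_seen, page_size, offset))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO message (id) VALUES (?)",
                 [(i,) for i in (150, 111, 102, 101, 100, 98, 97, 96,
                                 95, 94, 93, 92, 91, 90, 89)])

# From page one (last_seen = 100), link directly to pages two and three.
print(jump_pages(conn, 100, 1, 5))  # [98, 97, 96, 95, 94]
print(jump_pages(conn, 100, 2, 5))  # [93, 92, 91, 90, 89]
```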
Thanks




Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Efficient Pagination Using MySQL Without Large OFFSET

  • 1. Efficient Pagination Using MySQL Surat Singh Bhati (surat@yahoo-inc.com) Rick James (rjames@yahoo-inc.com) Yahoo Inc Percona Performance Conference 2009
  • 2. Outline 1. Overview – Common pagination UI pattern – Sample table and typical solution using OFFSET – Techniques to avoid large OFFSET – Performance comparison – Concerns
  • 4. Basics The first step toward efficient pagination over a large data set: – Use an index to filter rows (resolve WHERE) – Use the same index to return rows in sorted order (resolve ORDER BY) Step zero: read the MySQL manual – http://dev.mysql.com/doc/refman/5.1/en/mysql-indexes.html – http://dev.mysql.com/doc/refman/5.1/en/order-by-optimization.html – http://dev.mysql.com/doc/refman/5.1/en/limit-optimization.html
  • 5. Using Index KEY a_b_c (a, b, c) ORDER BY can be resolved using the index: – ORDER BY a – ORDER BY a, b – ORDER BY a, b, c – ORDER BY a DESC, b DESC, c DESC Both WHERE and ORDER BY resolved using the index: – WHERE a = const ORDER BY b, c – WHERE a = const AND b = const ORDER BY c – WHERE a = const AND b > const ORDER BY b, c ORDER BY will not be resolved using the index (filesort): – ORDER BY a ASC, b DESC, c DESC /* mixed sort direction */ – WHERE g = const ORDER BY b, c /* leading column a is missing */ – WHERE a = const ORDER BY c /* b is missing */ – WHERE a = const ORDER BY a, d /* d is not part of the index */
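The index rules above can be checked against a real query planner. The sketch below uses SQLite (Python's built-in `sqlite3` module) as a stand-in for MySQL, so it is only an approximation: SQLite's optimizer is not MySQL's, and it reports a sort step as `USE TEMP B-TREE FOR ... ORDER BY` rather than `Using filesort`. The helper `needs_filesort` and the extra columns `d` and `g` are assumptions made for illustration.

```python
import sqlite3


def needs_filesort(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if SQLite's plan for `sql` contains an explicit sort step.

    SQLite reports a sort as "USE TEMP B-TREE FOR ... ORDER BY", roughly the
    counterpart of "Using filesort" in MySQL's EXPLAIN output.
    """
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return any("TEMP B-TREE" in row[-1] for row in plan)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c INT, d INT, g INT)")
conn.execute("CREATE INDEX a_b_c ON t (a, b, c)")

# Cases the slide says the index can resolve (no sort step expected).
resolved = [
    "SELECT a, b, c FROM t ORDER BY a, b, c",
    "SELECT a, b, c FROM t ORDER BY a DESC, b DESC, c DESC",
    "SELECT * FROM t WHERE a = 1 ORDER BY b, c",
    "SELECT * FROM t WHERE a = 1 AND b = 2 ORDER BY c",
    "SELECT * FROM t WHERE a = 1 AND b > 2 ORDER BY b, c",
]
# Cases the slide says need a filesort.
filesorted = [
    "SELECT a, b, c FROM t ORDER BY a ASC, b DESC, c DESC",  # mixed direction
    "SELECT * FROM t WHERE g = 1 ORDER BY b, c",  # leading column a missing
    "SELECT * FROM t WHERE a = 1 ORDER BY c",  # b missing
    "SELECT * FROM t WHERE a = 1 ORDER BY a, d",  # d not in the index
]
for sql in resolved + filesorted:
    print(needs_filesort(conn, sql), sql)
```

On MySQL itself, the equivalent check is to run EXPLAIN and look for `Using filesort` in the Extra column.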
  • 6. Sample Schema CREATE TABLE `message` ( `id` int(11) NOT NULL AUTO_INCREMENT, `title` varchar(255) COLLATE utf8_unicode_ci NOT NULL, `user_id` int(11) NOT NULL, `content` text COLLATE utf8_unicode_ci NOT NULL, `create_time` int(11) NOT NULL, `thumbs_up` int(11) NOT NULL DEFAULT '0', /* Vote Count */ PRIMARY KEY (`id`), KEY `thumbs_up_key` (`thumbs_up`,`id`) ) ENGINE=InnoDB mysql> show table status like 'message'\G Engine: InnoDB Version: 10 Row_format: Compact Rows: 50000040 /* 50 Million */ Avg_row_length: 565 Data_length: 28273803264 /* 26 GB */ Index_length: 789577728 /* 753 MB */ Data_free: 6291456 Create_time: 2009-04-20 13:30:45 Two use cases: • Paginate by time, most recent messages on page one • Paginate by thumbs_up, largest value on page one
  • 7. Typical Query 1. Get the total number of records: SELECT count(*) FROM message 2. Get the current page: SELECT * FROM message ORDER BY id DESC LIMIT 0, 20 • http://domain.com/message?page=1 • ORDER BY id DESC LIMIT 0, 20 • http://domain.com/message?page=2 • ORDER BY id DESC LIMIT 20, 20 • http://domain.com/message?page=3 • ORDER BY id DESC LIMIT 40, 20 Note: id is AUTO_INCREMENT, so id order is the same as create_time order; there is no need to create an index on create_time, which saves space.
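A minimal sketch of the typical OFFSET approach, assuming SQLite in place of MySQL and a `message` table trimmed to `id` and `title`. `offset_for_page` and `fetch_page_offset` are hypothetical helper names. `LIMIT ? OFFSET ?` means the same as MySQL's `LIMIT offset, count`.

```python
import sqlite3


def offset_for_page(page: int, page_size: int) -> int:
    """Map a 1-based page number to the row offset used in LIMIT offset, count."""
    return (page - 1) * page_size


def fetch_page_offset(conn: sqlite3.Connection, page: int, page_size: int = 20) -> list[int]:
    # Same as MySQL's "ORDER BY id DESC LIMIT offset, page_size".
    return [row[0] for row in conn.execute(
        "SELECT id FROM message ORDER BY id DESC LIMIT ? OFFSET ?",
        (page_size, offset_for_page(page, page_size)),
    )]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")
conn.executemany("INSERT INTO message (title) VALUES (?)",
                 [(f"msg {i}",) for i in range(1, 101)])

total = conn.execute("SELECT count(*) FROM message").fetchone()[0]  # step 1: total
print(total, fetch_page_offset(conn, 3))  # step 2: page=3 -> LIMIT 40, 20
```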
  • 8. Explain mysql> explain SELECT * FROM message ORDER BY id DESC LIMIT 10000, 20\G ***************** 1. row ************** id: 1 select_type: SIMPLE table: message type: index possible_keys: NULL key: PRIMARY key_len: 4 ref: NULL rows: 10020 Extra: 1 row in set (0.00 sec) – MySQL can read the rows with an index scan, and execution stops as soon as it has found the required rows. – LIMIT 10000, 20 means it has to read 10,020 rows, throw away the first 10,000, and return the next 20.
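To make the 10,020-row cost concrete, here is a toy model (not how InnoDB is implemented) in which the index scan is a plain iterator, and we count how many entries are read before the requested page is complete. `scan_with_offset` is a hypothetical name; the id range mirrors the 50-million-row sample table.

```python
from itertools import islice
from typing import Iterable


def scan_with_offset(index_entries: Iterable[int], offset: int, limit: int):
    """Toy model of an index scan serving LIMIT offset, limit.

    Returns (rows returned, index entries read). The scan stops as soon as
    the requested rows are found, but every skipped entry is still read.
    """
    read = 0

    def counted():
        nonlocal read
        for entry in index_entries:
            read += 1
            yield entry

    rows = list(islice(counted(), offset, offset + limit))
    return rows, read


ids_desc = range(50_000_040, 0, -1)  # PRIMARY KEY scanned in DESC order
rows, read = scan_with_offset(ids_desc, 10_000, 20)
print(len(rows), read)
```

Only 20 of the entries read are returned; the other 10,000 are pure overhead, and that overhead grows linearly with the page number.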
  • 9. Performance Implications – A larger OFFSET increases the active data set: MySQL has to bring data into memory that is never returned to the caller. – The problem is more visible when the database does not fit in main memory. – Even a small percentage of requests with a large OFFSET can create a disk I/O bottleneck. – To display “21 to 40 of 1,000,000”, someone has to count 1,000,000 rows.
  • 10. Simple Solution – Do not display the total number of records; does the user really care? – Do not let users go to deep pages; after a certain number of pages, redirect them to http://en.wikipedia.org/wiki/Internet_addiction_disorder
  • 11. Avoid Count(*) 1. Never display the total number of messages; let users see more messages by clicking 'next' 2. Do not count on every request: cache the count and display a stale value, since users do not care about 324533 vs. 324633 3. Display "41 to 80 of thousands" 4. Use a precalculated count, incrementing/decrementing it as inserts/deletes happen
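Option 4 can be sketched with triggers that keep a one-row counter in step with the table, so displaying the total becomes a single-row read. The sketch uses SQLite to stay runnable; the `message_count` table and the trigger names are hypothetical, and MySQL's trigger syntax differs in small details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (id INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Hypothetical one-row counter table holding the precalculated count.
CREATE TABLE message_count (n INTEGER NOT NULL);
INSERT INTO message_count (n) VALUES (0);

-- Keep the counter in step with inserts and deletes.
CREATE TRIGGER message_ai AFTER INSERT ON message FOR EACH ROW
BEGIN
    UPDATE message_count SET n = n + 1;
END;
CREATE TRIGGER message_ad AFTER DELETE ON message FOR EACH ROW
BEGIN
    UPDATE message_count SET n = n - 1;
END;
""")

conn.executemany("INSERT INTO message (title) VALUES (?)",
                 [("hi",), ("there",), ("all",)])
conn.execute("DELETE FROM message WHERE title = 'there'")

# A one-row read instead of count(*) over the whole table.
count = conn.execute("SELECT n FROM message_count").fetchone()[0]
print(count)
```

One trade-off: every insert and delete now updates the same row, which can become a point of contention under heavy write load.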
  • 12. Solutions to avoid OFFSET 1. Change the user interface – No direct jumps to the Nth page 2. LIMIT N is fine; do not use LIMIT M, N – Provide an extra clue about where a given page starts – Find the desired records with a more restrictive WHERE based on that clue, plus ORDER BY and LIMIT N (without OFFSET)
  • 13. Find the clue Page One: 150, 111, 102, 101, 100 <a href="/page=2;last_seen=100;dir=next">Next</a> Page Two: 98, 97, 96, 95, 94 <a href="/page=1;last_seen=98;dir=prev">Prev</a> <a href="/page=3;last_seen=94;dir=next">Next</a> Page Three: 93, 92, 91, 90, 89 <a href="/page=2;last_seen=93;dir=prev">Prev</a> <a href="/page=4;last_seen=89;dir=next">Next</a>
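The clue on each link is just the boundary id of the current page: Next carries the smallest id shown, and Prev carries the largest. The small sketch below builds the links in the URL format used above. `build_links` is a hypothetical helper, and for brevity it always emits Next (a real page would omit it on the last page).

```python
def build_links(page: int, ids_on_page: list[int]) -> dict[str, str]:
    """Build Prev/Next links carrying the clue (last_seen) for one page.

    ids_on_page is in display order (id DESC): Next carries the smallest id
    shown, Prev carries the largest. Page one gets no Prev link.
    """
    links = {"next": f"/page={page + 1};last_seen={ids_on_page[-1]};dir=next"}
    if page > 1:
        links["prev"] = f"/page={page - 1};last_seen={ids_on_page[0]};dir=prev"
    return links


print(build_links(2, [98, 97, 96, 95, 94]))
```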
  • 14. Solution using clue Next Page: http://domain.com/forum?page=2&last_seen=100&dir=next WHERE id < 100 /* last_seen */ ORDER BY id DESC LIMIT $page_size /* No OFFSET */ Prev Page: http://domain.com/forum?page=1&last_seen=98&dir=prev WHERE id > 98 /* last_seen */ ORDER BY id ASC LIMIT $page_size /* No OFFSET */ Reverse the returned rows (fetched in ascending order) before sending them to the user
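Here is a runnable sketch of both directions, assuming SQLite in place of MySQL and a `message` table reduced to its `id` column. `fetch_page` is a hypothetical helper. The data are the ids from the "Find the clue" slide, so you can check the expected pages by eye.

```python
import sqlite3


def fetch_page(conn: sqlite3.Connection, last_seen: int, direction: str, page_size: int) -> list[int]:
    """Keyset pagination over message.id, newest first, with no OFFSET."""
    if direction == "next":
        sql = "SELECT id FROM message WHERE id < ? ORDER BY id DESC LIMIT ?"
    elif direction == "prev":
        # Walk upward from the clue; the result is flipped back below.
        sql = "SELECT id FROM message WHERE id > ? ORDER BY id ASC LIMIT ?"
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    ids = [row[0] for row in conn.execute(sql, (last_seen, page_size))]
    # Prev rows arrive in ascending order; reverse them into display order (id DESC).
    return ids[::-1] if direction == "prev" else ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY)")
# The ids from the "Find the clue" slide (99 is absent).
sample_ids = [150, 111, 102, 101, 100, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89]
conn.executemany("INSERT INTO message (id) VALUES (?)", [(i,) for i in sample_ids])

print(fetch_page(conn, 100, "next", 5))  # page two
print(fetch_page(conn, 93, "prev", 5))   # back to page two from page three
```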
  • 15. Explain mysql> explain SELECT * FROM message WHERE id < '49999961' ORDER BY id DESC LIMIT 20\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: message type: range possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: NULL Rows: 25000020 /* ignore this */ Extra: Using where 1 row in set (0.00 sec)
  • 16. What about ORDER BY on non-unique values? Page One: 99, 99, 98, 98, 98 Page Two: 98, 98, 97, 97, 10 We can't do: WHERE thumbs_up < 98 ORDER BY thumbs_up DESC /* It would skip the unseen 98s on page two */ Nor plain WHERE thumbs_up <= 98, which would return rows already seen on page one. Can we say this: WHERE thumbs_up <= 98 AND <extra_con> ORDER BY thumbs_up DESC
  • 17. Add more conditions • Consider thumbs_up the major number – if we have an additional minor number, we can use the combination of major and minor as the extra condition • Find an additional column (minor number) – we can use the id primary key as the minor number
  • 18. Solution First Page: SELECT thumbs_up, id FROM message ORDER BY thumbs_up DESC, id DESC LIMIT $page_size returns (thumbs_up, id): (99, 14), (99, 2), (98, 18), (98, 15), (98, 13) Next Page: SELECT thumbs_up, id FROM message WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98) ORDER BY thumbs_up DESC, id DESC LIMIT $page_size returns (thumbs_up, id): (98, 10), (98, 6), (97, 17)
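The two queries above can be run against the same sample data. This sketch assumes SQLite in place of MySQL, keeps only the `id` and `thumbs_up` columns, and uses a page size of 5 to match the example. `first_page` and `next_page` are hypothetical helpers. The clue passed to `next_page` is the whole `(thumbs_up, id)` pair of the last row seen.

```python
import sqlite3

PAGE_SIZE = 5

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, thumbs_up INT NOT NULL DEFAULT 0)")
conn.execute("CREATE INDEX thumbs_up_key ON message (thumbs_up, id)")
# The (thumbs_up, id) pairs shown on the slide.
rows = [(99, 14), (99, 2), (98, 18), (98, 15), (98, 13), (98, 10), (98, 6), (97, 17)]
conn.executemany("INSERT INTO message (thumbs_up, id) VALUES (?, ?)", rows)


def first_page(conn, page_size=PAGE_SIZE):
    return conn.execute(
        "SELECT thumbs_up, id FROM message"
        " ORDER BY thumbs_up DESC, id DESC LIMIT ?", (page_size,)).fetchall()


def next_page(conn, last_thumbs_up, last_id, page_size=PAGE_SIZE):
    """Rows strictly after (last_thumbs_up, last_id) in (thumbs_up DESC, id DESC) order."""
    return conn.execute(
        "SELECT thumbs_up, id FROM message"
        " WHERE thumbs_up <= ? AND (id < ? OR thumbs_up < ?)"
        " ORDER BY thumbs_up DESC, id DESC LIMIT ?",
        (last_thumbs_up, last_id, last_thumbs_up, page_size)).fetchall()


page1 = first_page(conn)
page2 = next_page(conn, *page1[-1])  # clue = last row seen, (98, 13)
print(page1)
print(page2)
```

Because id is unique, the pair (thumbs_up, id) defines a total order, so no row is skipped or repeated between pages even when many rows share a thumbs_up value.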
  • 19. Make it better Query: SELECT * FROM message WHERE thumbs_up <= 98 AND (id < 13 OR thumbs_up < 98) ORDER BY thumbs_up DESC, id DESC LIMIT 20 Can be written as: SELECT m2.* FROM message m1, message m2 WHERE m1.id = m2.id AND m1.thumbs_up <= 98 AND (m1.id < 13 OR m1.thumbs_up < 98) ORDER BY m1.thumbs_up DESC, m1.id DESC LIMIT 20; Here m1 can be resolved entirely from the covering index thumbs_up_key (thumbs_up, id), and full rows are read from m2 by primary key only for the 20 rows returned.
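The rewrite only changes how MySQL is expected to execute the query, not which rows come back. The sketch below assumes SQLite in place of MySQL and uses randomly generated sample rows. It checks that both forms return the same page; the clue values (98, 130) are arbitrary. It says nothing about speed, since SQLite may plan the join differently from the MySQL EXPLAIN on the next slide.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, thumbs_up INT NOT NULL, content TEXT NOT NULL)")
conn.execute("CREATE INDEX thumbs_up_key ON message (thumbs_up, id)")
rng = random.Random(42)  # fixed seed so the sample data is reproducible
conn.executemany(
    "INSERT INTO message (id, thumbs_up, content) VALUES (?, ?, ?)",
    [(i, rng.randint(90, 99), f"body {i}") for i in range(1, 201)],
)

direct = conn.execute("""
    SELECT * FROM message
    WHERE thumbs_up <= 98 AND (id < 130 OR thumbs_up < 98)
    ORDER BY thumbs_up DESC, id DESC LIMIT 20""").fetchall()

# Self-join form from the slide: m1 picks the ids in index order,
# m2 supplies the full rows for those ids.
deferred = conn.execute("""
    SELECT m2.* FROM message m1, message m2
    WHERE m1.id = m2.id
      AND m1.thumbs_up <= 98 AND (m1.id < 130 OR m1.thumbs_up < 98)
    ORDER BY m1.thumbs_up DESC, m1.id DESC LIMIT 20""").fetchall()

print(direct == deferred, len(deferred))
```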
  • 20. Explain *************************** 1. row *************************** id: 1 select_type: SIMPLE table: m1 type: range possible_keys: PRIMARY,thumbs_up_key key: thumbs_up_key /* (thumbs_up,id) */ key_len: 4 ref: NULL Rows: 25000020 /*ignore this, we will read just 20 rows*/ Extra: Using where; Using index /* Cover */ *************************** 2. row *************************** id: 1 select_type: SIMPLE table: m2 type: eq_ref possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: forum.m1.id rows: 1 Extra:
  • 21. Performance Gain (Primary Key Order)
  • 22. Performance Gain (Secondary Key Order)
  • 23. Throughput Gain • Throughput while hitting the first 30 pages: – Using LIMIT OFFSET, N • 600 queries/sec – Using LIMIT N (no OFFSET) • 3.7k queries/sec
  • 24. Bonus Point Product issue with LIMIT M, N: While a user is reading a page, records may be added to previous pages. Because of inserts/deletes, records move forward/backward across pages like a rolling window: – User is reading messages on the 4th page – While he is reading, one new message is posted (it appears on page one), and every page shifts by one message – User clicks on page 5 – One message from page 4 has been pushed forward onto page 5, so the user has to read it again No such issue with the new approach
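The rolling-window problem is easy to reproduce. This sketch assumes SQLite in place of MySQL and a page size of 5. To keep the data small, it uses pages one and two instead of four and five. `by_offset` and `by_keyset` are hypothetical helpers.

```python
import sqlite3

PAGE_SIZE = 5

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY AUTOINCREMENT, title TEXT)")
conn.executemany("INSERT INTO message (title) VALUES (?)", [(f"m{i}",) for i in range(20)])


def by_offset(page: int) -> list[int]:
    # Classic LIMIT M, N paging.
    return [r[0] for r in conn.execute(
        "SELECT id FROM message ORDER BY id DESC LIMIT ? OFFSET ?",
        (PAGE_SIZE, (page - 1) * PAGE_SIZE))]


def by_keyset(last_seen: int) -> list[int]:
    # Paging with the last_seen clue and no OFFSET.
    return [r[0] for r in conn.execute(
        "SELECT id FROM message WHERE id < ? ORDER BY id DESC LIMIT ?",
        (last_seen, PAGE_SIZE))]


page1 = by_offset(1)                                        # user reads page one
conn.execute("INSERT INTO message (title) VALUES ('new')")  # a new post arrives
page2_offset = by_offset(2)                                 # every row shifted by one
page2_keyset = by_keyset(page1[-1])                         # clue: last id the user saw

print(page1, page2_offset, page2_keyset)
```

With OFFSET, the first row of page two is the last row the user already read on page one. The keyset query continues exactly where the user left off.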
  • 25. Drawback Search engine optimization experts say: let bots reach all your pages with fewer deep dives Two solutions: • Read extra rows – Read extra rows in advance and construct links for a few previous and next pages • Use a small offset – Do not read extra rows in advance; just add links for a few previous and next pages carrying the required offset and the last_seen id of the current page – Query using the new approach with a small offset to display the desired page Additional concern: dynamic URLs; last_seen is not constant over time.
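For the "use a small offset" option, a link from the current page to a nearby later page can carry the current page's last_seen id plus a small OFFSET that covers only the pages in between. The sketch below assumes SQLite in place of MySQL, 100 sequential ids, and a page size of 5. `jump_forward` is a hypothetical helper.

```python
import sqlite3

PAGE_SIZE = 5

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO message (id) VALUES (?)", [(i,) for i in range(1, 101)])


def jump_forward(current_page: int, last_seen: int, target_page: int,
                 page_size: int = PAGE_SIZE) -> list[int]:
    """Reach a nearby later page from the current page's clue.

    last_seen is the smallest id shown on current_page. Only the pages
    strictly between current and target are skipped via OFFSET, so the
    offset stays small no matter how deep the current page is.
    """
    if target_page <= current_page:
        raise ValueError("target_page must come after current_page")
    skip = (target_page - current_page - 1) * page_size
    return [r[0] for r in conn.execute(
        "SELECT id FROM message WHERE id < ? ORDER BY id DESC LIMIT ? OFFSET ?",
        (last_seen, page_size, skip))]


# On page 5 (ids 80..76), link to page 7 without re-reading pages 1-4.
print(jump_forward(5, 76, 7))
```

The plain OFFSET query for page 7 would skip 30 rows; here only the 5 rows of page 6 are skipped.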
  • 26. Thanks