Optimizing LIMIT
Queries
If you need only a specified number of rows from a result set, use a LIMIT
clause in the query, rather than fetching the whole result set and throwing away the extra data.
MySQL sometimes optimizes a query that has a LIMIT
clause and no row_count
HAVING
clause:
- If you select only a few rows with
LIMIT
, MariaDB uses indexes in some cases when normally it would prefer to do a full table scan. - If you use
LIMIT
withrow_count
ORDER BY
, MariaDB ends the sorting as soon as it has found the firstrow_count
rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without theLIMIT
clause are selected, and most or all of them are sorted, before the firstrow_count
are found. After the initial rows have been found, MariaDB does not sort any remainder of the result set. - When combining
LIMIT
withrow_count
DISTINCT
, MariaDB stops as soon as it findsrow_count
unique rows. - In some cases, a
GROUP BY
can be resolved by reading the key in order (or doing a sort on the key) and then calculating summaries until the key value changes. In this case,LIMIT
does not calculate any unnecessaryrow_count
GROUP BY
values. - As soon as MariaDB has sent the required number of rows to the client, it aborts the query unless you are using
SQL_CALC_FOUND_ROWS
. LIMIT 0
quickly returns an empty set. This can be useful for checking the validity of a query. When using one of the MariaDB APIs, it can also be employed for obtaining the types of the result columns. (This trick does not work in the MariaDB Monitor (the mysql program), which merely displaysEmpty set
in such cases; instead, useSHOW COLUMNS
orDESCRIBE
for this purpose.)- When the server uses temporary tables to resolve the query, it uses the
LIMIT
clause to calculate how much space is required.row_count
As of MariaDB 5.6.2, the optimizer more efficiently handles queries (and subqueries) of the following form:
SELECT ... FROMsingle_table
... ORDER BYnon_index_column
[DESC] LIMIT [M
,]N
;
That type of query is common in web applications that display only a few rows from a larger result set. For example:
SELECT col1, ... FROM t1 ... ORDER BY name LIMIT 10; SELECT col1, ... FROM t1 ... ORDER BY RAND() LIMIT 15;
The sort buffer has a size of sort_buffer_size
. If the sort elements for N
rows are small enough to fit in the sort buffer (M
+N
rows if M
was specified), the server can avoid using a merge file and perform the sort entirely in memory by treating the sort buffer as a priority queue:
- Scan the table, inserting the select list columns from each selected row in sorted order in the queue. If the queue is full, bump out the last row in the sort order.
- Return the first
N
rows from the queue. (IfM
was specified, skip the firstM
rows and return the nextN
rows.)
Previously, the server performed this operation by using a merge file for the sort:
- Scan the table, repeating these steps through the end of the table:
- Select rows until the sort buffer is filled.
- Write the first
N
rows in the buffer (M
+N
rows ifM
was specified) to a merge file.
- Sort the merge file and return the first
N
rows. (IfM
was specified, skip the firstM
rows and return the nextN
rows.)
The cost of the table scan is the same for the queue and merge-file methods, so the optimizer chooses between methods based on other costs:
- The queue method involves more CPU for inserting rows into the queue in order
- The merge-file method has I/O costs to write and read the file and CPU cost to sort it
The optimizer considers the balance between these factors for particular values of N
and the row size.