How does MySQL process ORDER BY and LIMIT in a query? - mysql

I have a query that looks like this:
SELECT article FROM table1 ORDER BY publish_date LIMIT 20
How does ORDER BY work? Will it order all records, then get the first 20, or will it get 20 records and order them by the publish_date field?
If it's the last one, you're not guaranteed to really get the most recent 20 articles.

It will order first, then get the first 20. A database will also process anything in the WHERE clause before ORDER BY.

The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
All details on: http://dev.mysql.com/doc/refman/5.0/en/select.html

Just as #James says, it will order all records, then get the first 20 rows.
As it is so, you are guaranteed to get the 20 first published articles, the newer ones will not be shown.
In your situation, I recommend that you add desc to order by publish_date, if you want the newest articles, then the newest article will be first.
If you need to keep the result in ascending order, and still only want the 10 newest articles you can ask mysql to sort your result two times.
This query below will sort the result descending and limit the result to 10 (that is the query inside the parenthesis). It will still be sorted in descending order, and we are not satisfied with that, so we ask mysql to sort it one more time. Now we have the newest result on the last row.
select t.article
from
(select article, publish_date
from table1
order by publish_date desc limit 10) t
order by t.publish_date asc;
If you need all columns, it is done this way:
select t.*
from
(select *
from table1
order by publish_date desc limit 10) t
order by t.publish_date asc;
I use this technique when I manually write queries to examine the database for various things. I have not used it in a production environment, but now when I bench marked it, the extra sorting does not impact the performance.

You could add [asc] or [desc] at the end of the order by to get the earliest or latest records
For example, this will give you the latest records first
ORDER BY stamp DESC
Append the LIMIT clause after ORDER BY

If there is a suitable index, in this case on the publish_date field, then MySQL need not scan the whole index to get the 20 records requested - the 20 records will be found at the start of the index. But if there is no suitable index, then a full scan of the table will be needed.
There is a MySQL Performance Blog article from 2009 on this.

You can use this code
SELECT article FROM table1 ORDER BY publish_date LIMIT 0,10
where 0 is a start limit of record & 10 number of record

LIMIT is usually applied as the last operation, so the result will first be sorted and then limited to 20. In fact, sorting will stop as soon as first 20 sorted results are found.

Could be simplified to this:
SELECT article FROM table1 ORDER BY publish_date DESC FETCH FIRST 20 ROWS ONLY;
You could also add many argument in the ORDER BY that is just comma separated like: ORDER BY publish_date, tab2, tab3 DESC etc...

Related

MySQL: optimize pagination queries [duplicate]

Scenario in short: A table with more than 16 million records [2GB in size]. The higher LIMIT offset with SELECT, the slower the query becomes, when using ORDER BY *primary_key*
So
SELECT * FROM large ORDER BY `id` LIMIT 0, 30
takes far less than
SELECT * FROM large ORDER BY `id` LIMIT 10000, 30
That only orders 30 records and same eitherway. So it's not the overhead from ORDER BY.
Now when fetching the latest 30 rows it takes around 180 seconds. How can I optimize that simple query?
I had the exact same problem myself. Given the fact that you want to collect a large amount of this data and not a specific set of 30 you'll be probably running a loop and incrementing the offset by 30.
So what you can do instead is:
Hold the last id of a set of data(30) (e.g. lastId = 530)
Add the condition WHERE id > lastId limit 0,30
So you can always have a ZERO offset. You will be amazed by the performance improvement.
It's normal that higher offsets slow the query down, since the query needs to count off the first OFFSET + LIMIT records (and take only LIMIT of them). The higher is this value, the longer the query runs.
The query cannot go right to OFFSET because, first, the records can be of different length, and, second, there can be gaps from deleted records. It needs to check and count each record on its way.
Assuming that id is the primary key of a MyISAM table, or a unique non-primary key field on an InnoDB table, you can speed it up by using this trick:
SELECT t.*
FROM (
SELECT id
FROM mytable
ORDER BY
id
LIMIT 10000, 30
) q
JOIN mytable t
ON t.id = q.id
See this article:
MySQL ORDER BY / LIMIT performance: late row lookups
MySQL cannot go directly to the 10000th record (or the 80000th byte as your suggesting) because it cannot assume that it's packed/ordered like that (or that it has continuous values in 1 to 10000). Although it might be that way in actuality, MySQL cannot assume that there are no holes/gaps/deleted ids.
So, as bobs noted, MySQL will have to fetch 10000 rows (or traverse through 10000th entries of the index on id) before finding the 30 to return.
EDIT : To illustrate my point
Note that although
SELECT * FROM large ORDER BY id LIMIT 10000, 30
would be slow(er),
SELECT * FROM large WHERE id > 10000 ORDER BY id LIMIT 30
would be fast(er), and would return the same results provided that there are no missing ids (i.e. gaps).
I found an interesting example to optimize SELECT queries ORDER BY id LIMIT X,Y.
I have 35million of rows so it took like 2 minutes to find a range of rows.
Here is the trick :
select id, name, address, phone
FROM customers
WHERE id > 990
ORDER BY id LIMIT 1000;
Just put the WHERE with the last id you got increase a lot the performance. For me it was from 2minutes to 1 second :)
Other interesting tricks here : http://www.iheavy.com/2013/06/19/3-ways-to-optimize-for-paging-in-mysql/
It works too with strings
The time-consuming part of the two queries is retrieving the rows from the table. Logically speaking, in the LIMIT 0, 30 version, only 30 rows need to be retrieved. In the LIMIT 10000, 30 version, 10000 rows are evaluated and 30 rows are returned. There can be some optimization can be done my the data-reading process, but consider the following:
What if you had a WHERE clause in the queries? The engine must return all rows that qualify, and then sort the data, and finally get the 30 rows.
Also consider the case where rows are not processed in the ORDER BY sequence. All qualifying rows must be sorted to determine which rows to return.
For those who are interested in a comparison and figures :)
Experiment 1: The dataset contains about 100 million rows. Each row contains several BIGINT, TINYINT, as well as two TEXT fields (deliberately) containing about 1k chars.
Blue := SELECT * FROM post ORDER BY id LIMIT {offset}, 5
Orange := #Quassnoi's method. SELECT t.* FROM (SELECT id FROM post ORDER BY id LIMIT {offset}, 5) AS q JOIN post t ON t.id = q.id
Of course, the third method, ... WHERE id>xxx LIMIT 0,5, does not appear here since it should be constant time.
Experiment 2: Similar thing, except that one row only has 3 BIGINTs.
green := the blue before
red := the orange before

Query next 30 records by alphabetical order

Let say I retrieve my list of data...
First
$Query = mysql_query("SELECT * FROM data ORDER BY name ASC LIMIT 0,30");
From there I am using AJAX/jQuery to to append an additional 30 records. Which I need to call another file where the query will pick up where I left off.
Second
$Query = mysql_query("SELECT * FROM data WHERE name < '".mysql_real_escape_string($_GET['lastRecord'])."' ORDER BY name ASC LIMIT 0,30");
My question is, if I have the first query order the first 30 by name, how can I write the second query to pick up where I left off in that order?
Change the LIMIY like this - LIMIT 30, 30
BTW, you tagged this as jQuery and it isn't.
This option is good for short tables, but in a long tables this will give bad performance.
Every query (like that) you execute will do full table scan with filesort which takes some time.
I think you need to read the table once, and put the result in a temporary table (already sorted) with some index column and then use WHERE index BETWEEN $number AND ($number+30).
You don't need to add a WHERE clause to your query, you just need to use LIMIT to slice out the chunks of results you want. Limit is used to limit your MySQL query results to those that fall within a specified range. You can use it to show the first X number of results, or to show a range from X - Y results. The syntax is Limit X, Y where X is the starting point and Y is the number of records to retrieve.
For example:
$Query = mysql_query("SELECT * FROM data ORDER BY name ASC LIMIT 90,30");
would retrieve 30 records beginning with the 91st (records are zero indexed).
The first number in the LIMIT clause is the offset, starting at 0. So, your first query is correctly LIMIT 0, 30 to get the first 30.
Simply increase the offset by 30 to start at record 31, so to get 30 more do LIMIT 30, 30.
For the third page, it would be LIMIT 60, 30, etc.
If you need a page x of n type display, issue a COUNT(*) first to get the total number of rows, but understand that the actual rows and row count may change if rows are being inserted or deleted between calls.

How does 'LIMIT' parameter work in sql?

I have 4000 rows for example, and I define X limit.
The query stops after it finds X rows? or the query finds all the rows and then takes X rows from the found rows?
Thank you.
From MySQL Reference Manual:
If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as
soon as it has found the first row_count rows of the sorted result,
rather than sorting the entire result. If ordering is done by using an
index, this is very fast. If a filesort must be done, all rows that
match the query without the LIMIT clause must be selected, and most or
all of them must be sorted, before it can be ascertained that the
first row_count rows have been found. In either case, after the
initial rows have been found, there is no need to sort any remainder
of the result set, and MySQL does not do so.
So it looks like it's possible that the entire result set is known before the LIMIT is applied. But MySQL will try everything it can not to do so. And you can help it by providing useful indexes that match your queries.
EDIT: Furthermore, if the set is not sorted it terminates the SELECT operation as soon as it's streamed enough rows to the result set.
SELECT * FROM your_table LIMIT 0, 10
This will display the first 10 results from the database.
SELECT * FROM your_table LIMIT 5, 5
This will show records 6, 7, 8, 9, and 10
It's like telling MySql; I want you to start counting from 5+1 or the 6th record, but Select only upto 5 records
I'm assuming you're thinking about MySQL, in which according to the documentation, the answer is it depends. If you're using a LIMIT (without a HAVING), then:
If you are selecting only a few rows with LIMIT, MySQL uses indexes
in some cases when normally it would prefer to do a full table scan.
As soon as MySQL has sent the required number of rows to the client, it aborts the query unless you are using SQL_CALC_FOUND_ROWS.
There are a few other cases which you should read about in the documentation.
Introduction to MySQL LIMIT clause
The following illustrates the LIMIT clause syntax with two arguments:
SELECT
select_list
FROM
table_name
LIMIT [offset,] row_count;
The offset specifies the offset of the first row to return. The offset of the first row is 0, not 1.
The row_count specifies the maximum number of rows to return.
The following picture illustrates the LIMIT clause:
Therefore, these two clauses are equivalent:
> LIMIT row_count;
> LIMIT 0 , row_count;
The following picture illustrates the evaluation order of the LIMIT clause in the SELECT statement:
It stops after it found the number of rows specified in the LIMIT clause. This can be verified with a large amount of data. It retrieves the result in a time that is not possible if it is getting all the rows of the table and filtering after that.
If you are using MS SQL Server, then you can write it as given below.
Select TOP [x]
*
From MyTable
Hope it helps.
Vamyip

Grab a certain amount of database entries from a table

Is there a way to grab an exact amount of entries from a database example. For example say you had a table that just had an id and total visits for the columns. Say you wanted to grab exactly 20 entries and sort them by total visits. How would you go about this? I know how to sort the whole table, but would like to be able to grab the top twenty total visits and then sort them. Thanks
O and right now I am using sqlite, but I know in the future I will be using mysql also. Thanks
Try with:
SELECT * FROM TableName ORDER BY TotalVisits LIMIT 20
using limit to get the top 20,
and if you want to add another sort, add it after visit column
like :
SELECT * FROM mytable ORDER BY visits DESC
/*here put another order by field like date */
, date
LIMIT 20
Use ORDER - LIMIT clause
SELECT * FROM table ORDER BY field [ASC|DESC] LIMIT 20 OFFSET [offset value]
You need to use LIMIT, but you will need to put the whole thing in a subquery if you intend to re-sort the top 20 based on separate criteria. So
SELECT * from <table> order by <total visits column> LIMIT 20
will get you the top 20, but then to sort within that result you would do something like
SELECT * from
(SELECT * from <table> ORDER BY <total visits column> LIMIT 20)
ORDER BY <other criteria>
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants, with these exceptions:
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
All on: http://dev.mysql.com/doc/refman/5.5/en/select.html better explnation for mysql, however sqlite works same way: http://www.sqlite.org/lang_select.html

mysql limit - a general question

Assuming the query
SELECT * FROM table WHERE a='1' and b='2' and c>'3' and d>'4' and e!='5' and f='6'
returns 10000 results.
My question is, let's say I limit the search to the first 10 results like this:
SELECT * FROM table WHERE a='1' and b='2' and c>'3' and d>'4' and e!='5' and f='6' LIMIT 10
Will mysql search through all the 10000 results or it will stop at the 10th result?
LIMIT will only display the rows specified, based on their position within the resultset. Without an ORDER BY, you're relying on the order the records were inserted.
You'll probably be interested to read about MySQL's ORDER BY/LIMIT performance...
Since there is no ORDER BY, it will stop at the 10th result (after going through as many non-matching rows as necessary). As OMG Ponies says, which 10 rows you get is unspecified.