Good Morning,
I have a table that contains a couple million rows, and I need to view the data ordered by its timestamp column.
When I try to do this:
SELECT * FROM table ORDER BY date DESC LIMIT 200 OFFSET 0
MySQL orders all the data and only then responds with the 200 rows, and this is a performance issue, because it's not wise to sort everything each time I want to scroll the page!
Do you have any idea how we could improve the performance?
Firstly, you need to create an index on the date field. This allows the rows to be retrieved in order without having to sort the entire table every time a request is made.
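A minimal sketch of that index (the table and index names here are placeholders):
ALTER TABLE mytable ADD INDEX date_idx (`date`);
With this in place, ORDER BY `date` DESC LIMIT 200 can walk the index backwards and stop after 200 entries.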
Secondly, paging based on index gets slower the deeper you delve into the result set. To illustrate:
ORDER BY indexedcolumn LIMIT 0, 200 is very fast because it only has to scan 200 rows of the index.
ORDER BY indexedcolumn LIMIT 200, 200 is relatively fast, but requires scanning 400 rows of the index.
ORDER BY indexedcolumn LIMIT 660000, 200 is very slow because it requires scanning 660,200 rows of the index.
Note: even so, this may still be significantly faster than not having an index at all.
You can fix this in a few different ways.
Implement value-based paging, so you're paging based on the value of the last result on the previous page. For example:
WHERE indexedcolumn>[lastval] ORDER BY indexedcolumn LIMIT 200 replacing [lastval] with the value of the last result of the current page. The index allows random access to a particular value, and proceeding forward or backwards from that value.
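For instance, if the last row on the current page had indexedcolumn = 12345 (a made-up value), the next page would be fetched with:
SELECT * FROM mytable WHERE indexedcolumn > 12345 ORDER BY indexedcolumn LIMIT 200
The index lets MySQL seek straight to 12345 and read the next 200 entries, no matter how deep into the table that is.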
Only allow users to view the first X rows (e.g. 1000). This is no good if the value they want is the 2529th one.
Think of some logical way of breaking up your large table, for example by first letter or by year, so users never have to encounter the entire result set of millions of rows. Instead, they drill down into a specific subset first, which will be a smaller set and quicker to sort.
If you're combining a WHERE and an ORDER BY, you'll need to reflect this in the design of your index so MySQL can continue to benefit from it for sorting. For example, if your query is:
SELECT * FROM mytable WHERE year='2012' ORDER BY date LIMIT 0, 200
Then your index will need to be on two columns (year, date) in that order.
If your query is:
SELECT * FROM mytable WHERE firstletter='P' ORDER BY date LIMIT 0, 200
Then your index will need to be on the two columns (firstletter, date) in that order.
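As DDL, those two indexes might look like this (the index names are placeholders):
ALTER TABLE mytable ADD INDEX idx_year_date (`year`, `date`);
ALTER TABLE mytable ADD INDEX idx_letter_date (firstletter, `date`);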
The idea is that an index on multiple columns allows sorting by any column as long as you specify the previous columns as constants (single values) in a condition. So an index on A, B, C, D and E allows sorting by C if you specify A and B to be constants in a WHERE condition. A and B cannot be ranges.
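As a hypothetical illustration: with an index on (A, B, C, D, E), this query can read rows from the index already in sorted order:
SELECT * FROM mytable WHERE A = 1 AND B = 'x' ORDER BY C LIMIT 200
whereas WHERE A > 1 ORDER BY C cannot avoid a sort, because A is a range rather than a constant.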
Related
Say you have a table with n rows, what is the most efficient way to get the first row ever recorded on that table without sorting?
This is guaranteed to work, but becomes slower as the number of records increases:
SELECT * FROM posts ORDER BY created_at ASC LIMIT 1;
UPDATE:
This is even better in case there are multiple records with the same created_at value, but still needs sorting:
SELECT * FROM posts ORDER BY id ASC LIMIT 1;
Imagine a ledger book with 1 million pages and 1 billion lines of records. To get the first ever record, you'd simply turn to the first page and take the one at the very top, right? Regardless of the size of the ledger, you should get the first ever record with the same efficiency. I was hoping I could do the same in MySQL without doing any kind of sorting or ordering. For research purposes. I mean, why not? Why can't MySQL? Is it impossible by design?
This is possible in typical array structures in programming:
array = [1,2,3,4,5]
The first element is in array[0], the second in array[1] and so on. There is no sorting necessary. The last element is array[array_count(array)-1].
I can offer the following two queries to find the oldest record:
SELECT * FROM posts ORDER BY created_at ASC LIMIT 1
and
SELECT *
FROM posts
WHERE created_at = (SELECT MIN(created_at) FROM posts)
Both queries would suffer from performance degradation as the table gets larger, because the sorting operation needed to find the oldest creation date would take more time.
But in both cases, adding the following index should improve the performance of the query:
ALTER TABLE posts ADD INDEX created_idx (created_at)
MySQL can use an index both for the ORDER BY clause and when finding the MIN() (or MAX()) of a column. See the documentation for more information.
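As a quick sanity check (a sketch; EXPLAIN output varies by MySQL version), you can confirm the index is used:
EXPLAIN SELECT * FROM posts ORDER BY created_at ASC LIMIT 1;
With the index in place, the plan reads a single index entry instead of sorting the table; for the MIN() variant, MySQL may even report "Select tables optimized away".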
I have a table with 550,000 records.
SELECT * FROM logs WHERE user = 'user1' ORDER BY date DESC LIMIT 0, 25
This query takes 0.0171 sec; without the LIMIT, there are 3537 results.
SELECT * FROM logs WHERE user = 'user2' ORDER BY date DESC LIMIT 0, 25
This query takes 3.0868 sec; without the LIMIT, there are 13 results.
The table keys are:
PRIMARY KEY (`id`),
KEY `date` (`date`)
When using LIMIT 0, 25, if there are fewer matching records than 25, the query slows down. How can I solve this problem?
Using LIMIT 25 allows the query to stop as soon as it has found 25 rows.
If you have 3537 matching rows out of 550,000, it will, on average (assuming an even distribution), have found 25 rows after examining 550,000 / 3537 × 25 ≈ 3887 rows in a list that is ordered by date (the index on date) or in a list that is not ordered at all.
If you have only 13 matching rows out of 550,000, LIMIT 25 never finds 25 rows, so it has to examine all 550,000 rows (141 times as many), so we expect roughly 0.0171 sec × 141 ≈ 2.4 s. There are obviously other factors that determine runtime too, but the order of magnitude fits.
There is an additional effect. Unfortunately, the index on date does not contain the value of user, so MySQL has to look up that value in the original table, jumping back and forth in that table (because the data itself is ordered by the primary key). This is slower than reading the unordered table directly.
So not using an index at all can actually be faster than using one when you have a lot of rows to read. You can force MySQL to skip the index with e.g. FROM logs IGNORE INDEX (`date`), but then it has to read the whole table in absolutely every case: the last row could be the newest and would then have to be in the result set, because you ordered by date. So this might slow down your first query: reading the full 550,000 rows fast can be slower than reading 3887 rows slowly by jumping back and forth. (MySQL doesn't know this beforehand either, so it makes a choice; for your second query it was obviously the wrong one.)
So how to get faster results?
Have an index that is ordered by user. Then the query for 'user2' can stop after 13 rows, because it knows there are no more. This will now be faster than the query for 'user1', which has to look through 3537 rows and then sort them by date afterwards.
The best index for your query is therefore (user, date): MySQL then knows when to stop looking for further rows AND the list is already ordered the way you want it (and it should beat your 0.0171 s in all cases).
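A sketch of that composite index:
ALTER TABLE logs ADD INDEX user_date (`user`, `date`);
With it, MySQL can seek directly to the given user in the index, read that user's entries already ordered by date (scanning backwards for DESC), and stop after 25 rows or when the user's entries run out.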
Indexes require some resources too (e.g. hdd space and time to update the index when you update your table), so adding the perfect index for every single query might be counterproductive sometimes for the system as a whole.
My website has more than 20,000,000 entries; entries have categories (FK) and tags (M2M). Even for a query like SELECT id FROM table ORDER BY id LIMIT 1000000, 10, MySQL needs to scan 1,000,010 rows, which is unacceptably slow (and PKs, indexes, joins, etc. don't help much here; it is still 1,000,010 rows). So I am trying to speed up pagination by storing the row count and row number with triggers like this:
DELIMITER //
CREATE TRIGGER trigger_name
BEFORE INSERT
ON entry_table FOR EACH ROW
BEGIN
  -- increment the per-category row count and capture it in a user variable
  UPDATE category_table SET row_count = (@rc := row_count + 1)
  WHERE id = NEW.category_id;
  -- assigning to NEW.* requires SET and only works in a BEFORE trigger
  SET NEW.row_number_in_category = @rc;
END //
And then I can simply:
SELECT *
FROM entry_table
WHERE row_number_in_category > 10
ORDER BY row_number_in_category
LIMIT 10
(Now only 10 rows are scanned, so selects are blazing fast. Inserts are slower, but they are rare compared to selects, so that is acceptable.)
Is it a bad approach and are there any good alternatives?
Although I like the solution in the question, it may present some issues if data in the entry_table is changed, perhaps deleted or assigned to different categories over time.
It also limits the ways in which the data can be sorted; the method assumes that data is only sorted by insert order. Covering multiple sort methods requires additional triggers and summary data.
One alternative way of paginating is to pass in the last-seen value of the field you are sorting/paginating by, instead of an offset to the LIMIT clause.
Instead of this:
SELECT id FROM table ORDER BY id LIMIT 1000000, 10
Do this, assuming in this scenario that the last result viewed had an id of 1000000:
SELECT id FROM table WHERE id > 1000000 ORDER BY id LIMIT 0, 10
By tracking the last-seen value during pagination, it can be passed to subsequent queries for data, and it avoids the database sorting rows that are never going to be part of the end result.
If you really only wanted 10 rows out of 20 million, you could go further and guess that the next 10 matching rows will occur within the next 1000 overall results, perhaps with some logic to repeat the query with a larger allowance if this is not the case (see the sketch after the next query).
SELECT id FROM table WHERE id BETWEEN 1000000 AND 1001000 ORDER BY id LIMIT 0, 10
This should be significantly faster because the sort will probably be able to limit the result in a single pass.
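If the window comes back with fewer than 10 rows, the application can widen the allowance and retry; a sketch with an arbitrary tenfold window:
SELECT id FROM table WHERE id BETWEEN 1000000 AND 1011000 ORDER BY id LIMIT 0, 10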
I am testing my database design under load, and I need to retrieve only a fixed number of rows (5000).
I can specify a LIMIT to achieve this; however, it seems that the query builds the result set of all rows that match and then returns only the number of rows specified in the limit. Is that how it is implemented?
Is there a way for MySQL to read one row, then another, and basically stop when it retrieves the 5000th matching row?
MySQL is smart in that if you specify a LIMIT 5000 in your query, and it is possible to produce that result without generating the whole result set first, then it will not build the whole result.
For instance, the following query:
SELECT * FROM table ORDER BY column LIMIT 5000
This query will need to scan the whole table unless there is an index on column, in which case it does the smart thing and uses the index to find the rows with the smallest values of column.
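A sketch of that index, reusing the generic names from the query above:
ALTER TABLE `table` ADD INDEX column_idx (`column`);
MySQL can then read the first 5000 entries of the index in order and stop, without ever building the full result set.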
SELECT * FROM `your_table` LIMIT 0, 5000
This will display the first 5000 results from the database.
SELECT * FROM `your_table` LIMIT 1001, 5000
This will skip the first 1001 rows and show the next 5000 records (offsets 1001 through 6000, counting from 0).
The complexity of such a query is O(offset + limit) (unless you specify an ORDER BY).
That means that if 10,000,000 rows match your query and you specify LIMIT 0, 5000, then the complexity is O(5000): MySQL stops reading as soon as it has produced 5000 rows.
@Jarosław Gomułka is right.
If you use LIMIT with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. In either case, after the initial rows have been found, there is no need to sort any remainder of the result set, and MySQL does not do so.
If the set is not sorted, it terminates the SELECT operation as soon as it has gathered enough rows for the result set.
The exact plan the query optimizer uses depends on your query (what fields are being selected, the LIMIT amount, and whether there is an ORDER BY) and your table (keys, indexes, and number of rows in the table). Selecting an unindexed column and/or ordering by a non-key column is going to produce a different execution plan than selecting a column and ordering by the primary key column. The latter will not even touch the table, and only processes the number of rows specified in your LIMIT.
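A hypothetical comparison (table and column names are made up; plans abbreviated and version-dependent):
EXPLAIN SELECT id FROM mytable ORDER BY id LIMIT 10;
-- key: PRIMARY, no filesort: rows are read straight from the index
EXPLAIN SELECT body FROM mytable ORDER BY body LIMIT 10;
-- Using filesort: the whole table is sorted before the LIMIT applies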
Each database defines its own way of limiting the result set size.
While the SQL:2008 specification defines a standard syntax for limiting a SQL query, MySQL 8 does not support it.
Therefore, on MySQL, you need to use the LIMIT clause to restrict the result set to the Top-N records:
SELECT
title
FROM
post
ORDER BY
id DESC
LIMIT 50
Notice that we are using an ORDER BY clause since, otherwise, there is no guarantee which records would be the first to be included in the returned result set.
I want to run a simple query to get the "n" oldest records in the table. (It has a creation_date column).
How can I get that without using ORDER BY? It is a very big table, and sorting the entire table to get only n records does not seem reasonable.
(Assume n << size of table)
When you are concerned about performance, you should probably not discard the use of ORDER BY too early.
Queries like that can be implemented as a Top-N query supported by an appropriate index. Such a query runs very fast, because it doesn't need to sort the entire table, nor even the selected rows: the data is already sorted in the index.
example:
select *
from table
where A = ?
order by creation_date
limit 10;
Without an appropriate index it will be slow if you have lots of data. However, if you create an index like this:
create index test on table (A, creation_date);
The query will be able to start fetching the rows in the correct order, without sorting, and stop when the limit is reached.
Recipe: put the WHERE columns in the index, followed by the ORDER BY columns.
If there is no WHERE clause, just put the ORDER BY columns into the index. The ORDER BY must match the index definition, especially if there are mixed ASC/DESC orders.
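For example, if the query were ORDER BY creation_date DESC, a matching index might look like this (note that MySQL honours the DESC keyword in index definitions only from version 8.0; earlier versions parse it but ignore it):
create index test_desc on table (A, creation_date desc);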
The indexed Top-N query is the performance king; make sure to use it.
A few links for further reading (all mine):
How to use index efficienty in mysql query
http://blog.fatalmind.com/2010/07/30/analytic-top-n-queries/ (Oracle centric)
http://Use-The-Index-Luke.com/ (not yet covering Top-N queries, but that's to come in 2011).
I haven't tested this concept before, but try creating an index on the creation_date column, which will automatically keep the rows sorted in ascending order. Then your select query can use ORDER BY creation_date DESC with LIMIT 20 to get the first 20 records. The database engine should realize the index has already done the sorting work and won't actually need to sort, because the index was kept sorted on save. All it needs to do is read the last 20 entries from the index.
Worth a try.
Create an index on creation_date and query using ORDER BY creation_date ASC|DESC LIMIT n; the response will be very fast (in fact, it cannot be faster). For the "latest n" scenario you need DESC.
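A minimal sketch for the "oldest n" case, assuming the table is called mytable and n = 20:
CREATE INDEX creation_date_idx ON mytable (creation_date);
SELECT * FROM mytable ORDER BY creation_date ASC LIMIT 20;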
If you want more constraints on this query (e.g. WHERE state='LIVE'), the query may become very slow and you'll need to reconsider your indexing strategy.
You can use GROUP BY if you're grouping some data, and then a HAVING clause to select specific records.
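For illustration only (the names and the cutoff date here are made up, not taken from the question):
SELECT category, MIN(creation_date) AS oldest
FROM mytable
GROUP BY category
HAVING oldest < '2011-01-01';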