MySQL slow query when results are fewer than LIMIT

I have a table with 550,000 records.
SELECT * FROM logs WHERE user = 'user1' ORDER BY date DESC LIMIT 0, 25
This query takes 0.0171 sec. Without LIMIT, there are 3537 results.
SELECT * FROM logs WHERE user = 'user2' ORDER BY date DESC LIMIT 0, 25
This query takes 3.0868 sec. Without LIMIT, there are 13 results.
table keys are:
PRIMARY KEY (`id`),
KEY `date` (`date`)
When using "LIMIT 0,25", the query slows down if there are fewer than 25 matching records. How can I solve this problem?

Using LIMIT 25 allows the query to stop once it has found 25 rows.
If you have 3537 matching rows out of 550,000, then on average, assuming an even distribution, the query will have found 25 matches after examining 550,000 / 3537 * 25 ≈ 3887 rows of a list that is ordered by date (the index on date), or of a list that is not ordered at all.
If you have only 13 matching rows out of 550,000, LIMIT 25 is never satisfied, so the query has to examine all 550,000 rows (141 times as many), and we would expect roughly 0.0171 sec * 141 ≈ 2.4 sec. There are obviously other factors that determine the runtime too, but the order of magnitude fits.
There is an additional effect. Unfortunately the index on date does not contain the value of user, so MySQL has to look that value up in the base table, jumping back and forth through it (because the data itself is ordered by the primary key). This is slower than reading the unordered table sequentially.
So not using an index at all can actually be faster than using one when you have many rows to read. You can force MySQL to skip the index with e.g. FROM logs IGNORE INDEX (date), but this has the effect that it now has to read the whole table in absolutely every case: the last row could be the newest and would then have to be in the result set, because you ordered by date. So it might slow down your first query - reading all 550,000 rows fast can still be slower than reading 3887 rows slowly while jumping back and forth. (MySQL doesn't know this beforehand either, so it had to make a choice - for your second query, obviously the wrong one.)
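As a concrete sketch of that hint, the second query with the index disabled would look like this (same table and columns as above):

```sql
-- Force a full table scan instead of the date-index lookup; faster here
-- only because all 550,000 rows have to be read for 'user2' anyway.
SELECT * FROM logs IGNORE INDEX (`date`)
WHERE user = 'user2'
ORDER BY `date` DESC
LIMIT 0, 25;
```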
So how to get faster results?
Have an index that is ordered by user. Then the query for 'user2' can stop after 13 rows, because it knows there are no more matches. It will now be faster than the query for 'user1', which has to look through 3537 rows and then sort them by date afterwards.
The best index for your query is therefore (user, date): MySQL then knows when to stop looking for further rows AND the list is already ordered the way you want it (beating your 0.0171 s in all cases).
Indexes require some resources too (e.g. hdd space and time to update the index when you update your table), so adding the perfect index for every single query might be counterproductive sometimes for the system as a whole.
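A sketch of the suggested fix (the index name is made up; any name works):

```sql
-- Composite index: MySQL can find each user's rows already sorted by date.
ALTER TABLE logs ADD INDEX idx_user_date (`user`, `date`);

-- Now both queries stop after reading at most 25 index entries:
SELECT * FROM logs WHERE user = 'user2' ORDER BY `date` DESC LIMIT 0, 25;
```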

Related

understanding mysql limit with non-indexed

I have this query, which is very simple, but I don't want to use an index here due to some constraints.
My worry is how to avoid a huge load on the server when filtering on a non-indexed column in the WHERE clause.
The solution, I feel, will be LIMIT.
I am sure the data fits in 1000 rows, so if I use LIMIT I can check the available values.
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
Here the student column is not indexed in MySQL, so my worry is that it will cause a huge load on the server.
I tried EXPLAIN and it seems to be OK, but the problem is the small number of rows currently in the table; as you know, MySQL goes crazy with more data, like millions of rows.
So what are my options? Should I add an index on student?
If I add an index, then I don't need 1000 rows in the LIMIT; one row is sufficient. But as I said, the table is going to grow to several million rows, so an index requires a lot of space. That is why I was thinking of avoiding an index on the student column. My other thought was that 1000 rows ordered descending should not cause load on the server, since id is indexed.
Any help will be great.
You say:
but i dont want to use index here due to some constraints...
and also say:
how to avoid huge load on server...
If you don't use an index, you'll produce a "huge load" on the server. If you want this query to be less resource-intensive, you need to add an index. For the aforementioned query, the ideal index is:
CREATE INDEX idx_student_status_id ON tableA (student, status, id);
This index should make your query very fast, even with millions of rows.
LIMIT 1000 doesn't force the database to search only within the first 1000 rows.
It just stops searching after 1000 matches are found.
So on its own it does not bound the amount of work the server has to do.
In the query below
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
The query will run until it finds 1000 matches (or exhausts the table).
It does not restrict its search to the first 1000 rows.
So this is the behaviour of the above query, as pseudocode:
int nb_rows_matched = 0;
while (nb_rows_matched < 1000 && !end_of_table()) {
    if (next_row_matches()) {
        nb_rows_matched++;
    }
}

Order by / limit execution in SQL

There are lots of threads on the web already; I'm just trying to understand some nuances that had me confused!
Quoting the doc reference
If you combine LIMIT row_count with ORDER BY, MySQL stops sorting as
soon as it has found the first row_count rows of the sorted result,
rather than sorting the entire result. If ordering is done by using an
index, this is very fast.
and a SO thread
It will order first, then get the first 20. A database will also
process anything in the WHERE clause before ORDER BY.
Taking the same query from the question :
SELECT article
FROM table1
ORDER BY publish_date
LIMIT 20
Let's say the table has 2000 rows, of which the query is expected to return 20. Now, looking at the MySQL reference, "...stops sorting as soon as it has found the first row_count rows..." confuses me, as I find it a little ambiguous!
Why does it say "stops sorting"? Isn't the LIMIT clause applied to data that has already been sorted by the ORDER BY clause (assuming a non-indexed column)? Or is my understanding wrong, and SQL limits first and then sorts?
The optimization mentioned in the documentation generally only works if there's an index on the publish_date column. The values are stored in the index in order, so the engine simply iterates through the index of the column, fetching the associated rows, until it has fetched 20 rows.
If the column isn't indexed, the engine will generally need to fetch all the rows, sort them, and then return the first 20 of these.
It's also useful to understand how this interacts with WHERE conditions. Suppose the query is:
SELECT article
FROM table1
WHERE last_read_date > '2018-11-01'
ORDER BY publish_date
LIMIT 20
If publish_date is indexed and last_read_date is not, it will scan the publish_date index in order, test the associated last_read_date against the condition, and add article to the result set if the test succeeds. When there are 20 rows in the result set it will stop and return it.
If last_read_date is indexed and publish_date is not, it will use the last_read_date index to find the subset of all the rows that meet the condition. It will then sort these rows using the publish_date column, and return the first 20 rows from that.
If neither column is indexed it will do a full table scan to test last_read_date, sort all the rows that match the condition, and return the first 20 rows of this.
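You can see which of these three plans MySQL chose by prefixing the query with EXPLAIN (same columns as in the example above):

```sql
EXPLAIN SELECT article
FROM table1
WHERE last_read_date > '2018-11-01'
ORDER BY publish_date
LIMIT 20;
-- The `key` column shows which index (if any) was used; "Using filesort"
-- in the Extra column means the sort could not be satisfied by an index.
```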
MySQL stops sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result
This is actually a very sensible optimisation in MySQL. If you use LIMIT to return 20 rows and MySQL already knows it has found them, why would MySQL (or you) care how exactly the rest of the records are sorted? It doesn't matter, so MySQL stops sorting the remaining rows.
If the ORDER BY is done on an indexed column, MySQL can tell pretty quickly whether it has found the top N records.

MySQL optimization problems with LIMIT keyword

I'm trying to optimize a MySQL query. The below query runs great as long as there are greater than 15 entries in the database for a particular user.
SELECT activityType, activityClass, startDate, endDate, activityNum, count(*) AS activityType
FROM (
SELECT activityType, activityClass, startDate, endDate, activityNum
FROM ActivityX
WHERE user=?
ORDER BY activityNum DESC
LIMIT 15) temp
WHERE startDate=? OR endDate=?
GROUP BY activityType
When there are fewer than 15 entries, the performance is terrible. My timing is roughly 25 ms vs. 4000 ms. (I need "15" to ensure I get all the relevant data.)
I found these interesting sentences:
"LIMIT N" is the keyword and N is any number starting from 0, putting 0 as the limit does not return any records in the query. Putting a number say 5 will return five records. If the records in the specified table are less than N, then all the records from the queried table are returned in the result set. [source: guru99.com]
To get around this problem, I'm using a heuristic to guess if the number of entries for a user is small - if so, I use a different query that takes about 1500 ms.
Is there anything I'm missing here? I cannot use an index since the data is encrypted.
Thanks much,
Jon
I think an index on ActivityX(user, activityNum) will solve your problem.
I am guessing that you have an index on (activityNum) alone and the optimizer is trying to figure out whether to use it. This causes thresholding. The composite index matches the query better.
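A sketch of that suggestion (the index name is hypothetical):

```sql
-- Matches the inner query: equality on user, then ordered by activityNum,
-- so the engine can walk the index backwards and stop after 15 entries.
CREATE INDEX idx_user_activitynum ON ActivityX (user, activityNum);
```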

MySQL paging large data based on a specific order

Good Morning,
I have a table that contains couple million rows and I need to view the data ordered by the TimeStamp.
when I tried to do this
SELECT * FROM `table` ORDER BY `date` DESC LIMIT 200 OFFSET 0
MySQL will order all the data and only then respond with the 200 rows, and this is a performance issue, because it's not wise to order everything every time I scroll the page!
do you have any idea on how we could improve the performance ?
Firstly you need to create an index based on the date field. This allows the rows to be retrieved in order without having to sort the entire table every time a request is made.
Secondly, paging based on index gets slower the deeper you delve into the result set. To illustrate:
ORDER BY indexedcolumn LIMIT 0, 200 is very fast because it only has to scan 200 rows of the index.
ORDER BY indexedcolumn LIMIT 200, 200 is relatively fast, but requires scanning 400 rows of the index.
ORDER BY indexedcolumn LIMIT 660000, 200 is very slow because it requires scanning 660,200 rows of the index.
Note: even so, this may still be significantly faster than not having an index at all.
You can fix this in a few different ways.
Implement value-based paging, so you're paging based on the value of the last result on the previous page. For example:
WHERE indexedcolumn>[lastval] ORDER BY indexedcolumn LIMIT 200 replacing [lastval] with the value of the last result of the current page. The index allows random access to a particular value, and proceeding forward or backwards from that value.
Only allow users to view the first X rows (eg. 1000). This is no good if the value they want is the 2529th value.
Think of some logical way of breaking up your large table, for example by the first letter, the year, etc so users never have to encounter the entire result set of millions of rows, instead they need to drill down into a specific subset first, which will be a smaller set and quicker to sort.
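A minimal sketch of the first option (value-based paging), assuming `date` is indexed and `'lastval-goes-here'` stands in for the date of the last row on the previous page:

```sql
-- Next page: continue from where the previous page stopped, instead of
-- scanning and discarding an ever-growing OFFSET.
SELECT * FROM `table`
WHERE `date` < 'lastval-goes-here'
ORDER BY `date` DESC
LIMIT 200;
```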
If you're combining a WHERE and an ORDER BY you'll need to reflect this in the design of your index to enable MySQL to continue to benefit from the index for sorting. For example if your query is:
SELECT * FROM mytable WHERE year='2012' ORDER BY date LIMIT 0, 200
Then your index will need to be on two columns (year, date) in that order.
If your query is:
SELECT * FROM mytable WHERE firstletter='P' ORDER BY date LIMIT 0, 200
Then your index will need to be on the two columns (firstletter, date) in that order.
The idea is that an index on multiple columns allows sorting by any column as long as you specified previous columns to be constants (single values) in a condition. So an index on A, B, C, D and E allows sorting by C if you specify A and B to be constants in a WHERE condition. A and B cannot be ranges.
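To illustrate that rule with a hypothetical index on (A, B, C):

```sql
-- Can use the index for the sort: A and B are fixed to constants,
-- so the index entries for this (A, B) pair are already in C order.
SELECT * FROM t WHERE A = 1 AND B = 2 ORDER BY C LIMIT 200;

-- Cannot: B is a range, so the matching index entries are no longer
-- in C order and MySQL must sort them separately.
SELECT * FROM t WHERE A = 1 AND B > 2 ORDER BY C LIMIT 200;
```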

MySQL Optimising order by primary key on large table with limit

Sorry for a cryptic title... My issue:
I have a mysql query which in the most simplified form would looks like this:
SELECT * FROM table
WHERE SOME_CONDITIONS
ORDER BY `id` DESC
LIMIT 50
Without the LIMIT clause, the query would return around 50,000 rows, but I am only ever interested in the first 50. Now I realise that because I add the ORDER BY clause, MySQL has to create a temporary table, load all the results into it, order the 50,000 results, and only then can it return the first 50.
When I compare the performance of this query against the same query without ORDER BY, I get a staggering difference: 1.8 seconds vs 0.02 seconds.
Given that id is an auto-incrementing primary key, I thought there should be an elegant workaround for my problem. Is there one?
Are the SOME_CONDITIONS such that you could give the query an ID range? At the very least, you could limit the number of rows being added into the temporary table before the sort.
For example:
SELECT * FROM table
WHERE SOME_CONDITIONS
AND id BETWEEN 1 AND 50;
Alternatively, maybe use a nested query to find the min and max IDs, if the SOME_CONDITIONS prevent you from making this sort of assumption about the range of IDs in your result.
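A sketch of that nested-query idea (the range width of 5000 is a made-up guess; tune it to your data):

```sql
-- Restrict the scan to the newest ids first, assuming the newest
-- 5000 rows contain at least 50 matches for SOME_CONDITIONS.
SELECT * FROM `table`
WHERE SOME_CONDITIONS
  AND id > (SELECT MAX(id) - 5000 FROM `table`)
ORDER BY id DESC
LIMIT 50;
```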
If the performance is really that important, I would denormalize the data by creating another table, or a cache of the first 50 results, and keeping it updated separately.