I'm trying to optimize a MySQL query. The query below runs well as long as there are more than 15 entries in the database for a particular user.
SELECT activityType, activityClass, startDate, endDate, activityNum, count(*) AS activityType
FROM (
SELECT activityType, activityClass, startDate, endDate, activityNum
FROM ActivityX
WHERE user=?
ORDER BY activityNum DESC
LIMIT 15) temp
WHERE startDate=? OR endDate=?
GROUP BY activityType
When there are fewer than 15 entries, the performance is terrible. My timing is roughly 25 ms vs. 4000 ms. (I need the "15" to ensure I get all the relevant data.)
I found these interesting sentences:
"LIMIT N" is the keyword and N is any number starting from 0, putting 0 as the limit does not return any records in the query. Putting a number say 5 will return five records. If the records in the specified table are less than N, then all the records from the queried table are returned in the result set. [source: guru99.com]
To get around this problem, I'm using a heuristic to guess if the number of entries for a user is small - if so, I use a different query that takes about 1500 ms.
Is there anything I'm missing here? I cannot use an index, since the data is encrypted.
Thanks much,
Jon
I think an index on ActivityX(user, ActivityNum) will solve your problem.
I am guessing that you have an index on (ActivityNum) alone, and the optimizer is trying to decide whether it should use that index; that decision flips at a threshold, which is why performance changes so sharply. The composite index matches the query much better.
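A minimal sketch of that index, assuming the table and column names from the question (the index name is just illustrative):
ALTER TABLE ActivityX ADD INDEX idx_user_activitynum (user, activityNum);
With this index in place, MySQL can jump straight to the rows for the given user, already ordered by activityNum, so ORDER BY activityNum DESC LIMIT 15 stops after reading at most 15 index entries.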
If I have a MySQL query with a LIMIT:
SELECT * FROM my_table WHERE date > '2020-12-12' LIMIT 1,16;
Is there a faster way to check and see how many results are left after my limit?
I was trying to do a COUNT with a LIMIT, but that wasn't working, i.e.
SELECT count(ID) AS count FROM my_table WHERE date > '2020-12-12' LIMIT 16,32;
The ultimate goal here is just to determine if there ARE any other rows to be had beyond the current result set, so if there is another faster way to do this that would be fine too.
It's best to do this by counting the rows:
SELECT count(*) AS count FROM my_table WHERE date > '2020-12-12'
That tells you how many total rows match the condition. Then you can compare that to the size of the result you got with your query using LIMIT. It's just arithmetic.
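For example, using the table and condition from the question, and assuming a page size of 16 (an ORDER BY is added here because LIMIT without one returns rows in no guaranteed order):
SELECT COUNT(*) AS total FROM my_table WHERE date > '2020-12-12';
SELECT * FROM my_table WHERE date > '2020-12-12' ORDER BY date LIMIT 0, 16;
If total is greater than your offset plus 16, there are more rows beyond the current page.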
MySQL has a function FOUND_ROWS() (used together with the SQL_CALC_FOUND_ROWS query modifier) which reports how many rows would have matched if you hadn't used LIMIT. But it turns out this has worse performance than running two queries, one to count the rows and one to fetch your limited page. So the feature was deprecated (as of MySQL 8.0.17).
For details read:
https://www.percona.com/blog/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/
https://dev.mysql.com/worklog/task/?id=12615
(You probably want OFFSET 0, not 1.)
It's simple to test whether there ARE more rows. Assuming you want 16 rows, use 1 more:
SELECT ... WHERE ... ORDER BY ... LIMIT 0,17
Then programmatically see whether it returned only 16 rows (no more available) or 17 (there ARE more).
Because it is piggybacking on the fetch you are already doing and not doing much extra work, it is very efficient.
The second 'page' would use LIMIT 16, 17; 3rd: LIMIT 32,17, etc. Each time, you are potentially getting and tossing an extra row.
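Applied to the query from the question, a sketch might look like this (the ORDER BY column is an assumption; paging reliably requires a deterministic order):
SELECT * FROM my_table WHERE date > '2020-12-12' ORDER BY date LIMIT 0, 17;  -- page 1
SELECT * FROM my_table WHERE date > '2020-12-12' ORDER BY date LIMIT 16, 17; -- page 2
If a page comes back with 17 rows, display only 16 and note that at least one more page exists.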
I discuss this and other tricks, and point out the evils of OFFSET, here: Pagination
COUNT(x) checks x for being NOT NULL. This is [usually] unnecessary. The pattern COUNT(*) (or COUNT(1)) simply counts rows; the * or 1 has no significance.
SELECT COUNT(*) FROM t is not free. It will actually do a full index scan, which is slow for a large table. WHERE and ORDER BY are likely to add to that slowness. LIMIT is useless since the result is always 1 row. (That is, the LIMIT is applied to the result, not to the counting.)
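A quick illustration of the difference, using the table from this question (the assumption here is that date is nullable):
SELECT COUNT(*) AS all_rows,      -- counts every row
       COUNT(date) AS dated_rows  -- counts only rows where date IS NOT NULL
FROM my_table;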
I have this query, which is very simple, but I don't want to use an index here due to some constraints.
So my worry is how to avoid a huge load on the server when a non-indexed column appears in the WHERE clause.
The solution, I feel, is LIMIT.
I am sure the matching data is within 1000 rows, so if I use LIMIT I can still see all the available values.
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
Here the student column is not indexed in MySQL, so my worry is that this will cause a huge load on the server.
I tried EXPLAIN and it seems OK, but the problem is that the table currently has few rows, and as you know MySQL goes crazy with more data, like millions of rows.
So what are my options?
Should I add an index on student?
If I add the index, then I don't need 1000 rows in the LIMIT; one row is sufficient. But as I said, the table is going to grow to several million rows, so the index would require a lot of space. That is why I was thinking of avoiding an index on the student column; my other thought was that LIMIT 1000 with ORDER BY id DESC should not cause load on the server, since id is indexed.
Any help will be great.
You say:
but I don't want to use an index here due to some constraints...
and also say:
how to avoid a huge load on the server...
If you don't use an index, you'll produce "huge load" on the server. If you want this query to be less resource intensive, you need to add an index. For the aforementioned query the ideal index is:
CREATE INDEX idx_student_status_id ON tableA (student, status, id);
This index should make your query very fast, even with millions of rows.
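You can check that the index is actually being used with EXPLAIN (the student number here is just a placeholder):
EXPLAIN SELECT * FROM tableA WHERE status='1' AND student='12345' ORDER BY id DESC LIMIT 1;
With the composite index in place, the access type should be ref, and Extra should not show Using filesort.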
LIMIT 1000 doesn't force the database to look at only the first 1000 rows.
It just stops searching after 1000 matches are found.
So, on its own, it does little for performance.
In the query below
SELECT *
from tableA
where status='1' and student='$student_no'
order by id desc
limit 1000
The query will run until it finds 1000 matches.
It is not restricted to examining only the first 1000 rows; in the worst case it scans the entire table.
So this is the behaviour of the above query:
int nb_rows_matched = 0;
// scan rows until 1000 matches are found or the table is exhausted
while (nb_rows_matched < 1000 && rows_remain()) {
    if (next_row_matches())
        nb_rows_matched++;
}
I have a 500,000-record table on my MySQL server. When running a query, execution takes a long time; sometimes it goes beyond a minute.
Below I have added my MySQL machine details.
RAM: 16 GB
Processor: Intel(R) Core(TM) i5-4460M CPU @ 3.20GHz
OS: Windows Server, 64-bit
I know there is no problem with my machine, since it is a standalone machine with no other applications on it.
Maybe the problem is with my query. I have gone through the MySQL site and found that I have used proper syntax. But I don't know the exact reason for the delay in the result.
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY 2 ASC
LIMIT 200 OFFSET 0
Can anyone please let me know whether the duration depends on the table or on the query that I have used?
Note: Even if I try indexing, there is not much difference in the result time.
Thanks
Two approaches to this:
One approach is to create a covering index on the columns needed to satisfy your query. The correct index for your query contains these columns in this order: (SalesChannel, UnitPrice).
Why does this help? For one thing, the index itself contains all data needed to satisfy your query, and nothing else. This means your server does less work.
For another thing, MySQL's indexes are BTREE-organized. That means they're accessible in order. So your query can be satisfied one SalesChannel at a time, and MySQL doesn't need an internal temporary table. That's faster.
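A sketch of that covering index (the index name is just illustrative):
ALTER TABLE samplesalesdata50000 ADD INDEX idx_channel_price (SalesChannel, UnitPrice);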
A second approach involves recognizing that ORDER BY ... LIMIT is a notorious performance antipattern. You require MySQL to sort a big mess of data, and then discard most of it.
You could try this:
SELECT SUM(UnitPrice) UnitPrice,
       SalesChannel Channel
FROM samplesalesdata50000
WHERE SalesChannel IN (
    -- the derived table works around MySQL's "LIMIT & IN subquery" restriction;
    -- DISTINCT makes the inner LIMIT count channels rather than rows
    SELECT SalesChannel FROM (
        SELECT DISTINCT SalesChannel
        FROM samplesalesdata50000
        ORDER BY SalesChannel LIMIT 200 OFFSET 0
    ) AS top_channels
)
GROUP BY SalesChannel
ORDER BY SalesChannel
LIMIT 200 OFFSET 0
If you have an index on SalesChannel (the covering index mentioned above works) this should speed you up a lot, because your aggregate (GROUP BY) query need only consider a subset of your table.
Your problem is with "ORDER BY 2 ASC". Try "ORDER BY Channel" instead.
If it were MS SQL Server, you would use WITH (NOLOCK), and the MySQL equivalent is:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ;
SELECT SUM(`samplesalesdata50000`.`UnitPrice`) AS `UnitPrice`, `samplesalesdata50000`.`SalesChannel` AS `Channel`
FROM `samplesalesdata50000` AS `samplesalesdata50000`
GROUP BY `samplesalesdata50000`.`SalesChannel`
ORDER BY SalesChannel ASC
LIMIT 200 OFFSET 0
COMMIT ;
To improve on OJones's answer, note that
SELECT DISTINCT SalesChannel FROM samplesalesdata50000
ORDER BY SalesChannel LIMIT 200, 1
will quickly (assuming the index given) find the end of the desired list. Then adding this limits the main query to only the rows needed:
WHERE SalesChannel < (that-select)
There is, however, a problem. If there are 200 or fewer distinct channels in the table, the subquery will return nothing.
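Putting the pieces together, a sketch of the combined query might look like this (subject to the caveat above, and assuming the covering index exists):
SELECT SUM(UnitPrice) AS UnitPrice, SalesChannel AS Channel
FROM samplesalesdata50000
WHERE SalesChannel < ( SELECT DISTINCT SalesChannel
                       FROM samplesalesdata50000
                       ORDER BY SalesChannel LIMIT 200, 1 )
GROUP BY SalesChannel
ORDER BY SalesChannel;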
You seem to be setting up for "paginating"? In that case, a similar technique can be used to find the starting value:
WHERE SalesChannel >= ...
AND SalesChannel < ...
This also avoids using the inefficient OFFSET, which has to read, then toss, all the rows being skipped over. More
But the real solution may be to build and maintain a Summary Table of the data. It would contain subtotals for each, say, month. Then run the query against the Summary table -- it might be 10x faster. More
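A sketch of that idea, with hypothetical names and column types (it assumes the base table has a date column, called SaleDate below):
CREATE TABLE sales_summary (
    SaleMonth DATE NOT NULL,               -- first day of the month
    SalesChannel VARCHAR(50) NOT NULL,
    TotalUnitPrice DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (SaleMonth, SalesChannel)
);
INSERT INTO sales_summary
SELECT DATE_FORMAT(SaleDate, '%Y-%m-01'), SalesChannel, SUM(UnitPrice)
FROM samplesalesdata50000
GROUP BY DATE_FORMAT(SaleDate, '%Y-%m-01'), SalesChannel;
Reports then read the small summary table instead of scanning the big one:
SELECT SalesChannel, SUM(TotalUnitPrice) FROM sales_summary GROUP BY SalesChannel;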
I have a table with 550,000 records.
SELECT * FROM logs WHERE user = 'user1' ORDER BY date DESC LIMIT 0, 25
This query takes 0.0171 sec; without the LIMIT, there are 3537 results.
SELECT * FROM logs WHERE user = 'user2' ORDER BY date DESC LIMIT 0, 25
This query takes 3.0868 sec; without the LIMIT, there are 13 results.
table keys are:
PRIMARY KEY (`id`),
KEY `date` (`date`)
when using "LIMIT 0,25" if there are less records than 25, the query slows down. How can I solve this problem?
Using LIMIT 25 allows the query to stop once it has found 25 rows.
If you have 3537 matching rows out of 550,000 then, on average, assuming equal distribution, it will have found 25 rows after examining 550,000/3537*25 = 3887 rows of a list that is ordered by date (the index on date), or of a list that is not ordered at all.
If you have only 13 matching rows out of 550,000, LIMIT 25 is never satisfied, so the query has to examine all 550,000 rows (141 times as many as 3887), and we expect roughly 0.0171 sec * 141 = 2.4 s. There are obviously other factors that determine runtime too, but the order of magnitude fits.
There is an additional effect. Unfortunately the index by date does not contain the value for user, so MySQL has to look up that value in the original table, by jumping back and forth in that table (because the data itself is ordered by the primary key). This is slower than reading the unordered table directly.
So actually, not using an index at all could be faster than using an index, if you have a lot of rows to read. You can force MySQL to not use it by using e.g. FROM logs IGNORE INDEX (date), but this will have the effect that it now has to read the whole table in absolutely every case: the last row could be the newest and thus has to be in the resultset, because you ordered by date. So it might slow down your first query - reading the full 550.000 rows fast can be slower than reading 3887 rows slowly by jumping back and forth. (MySQL doesn't know this either beforehand, so it took a choice - for your second query obviously the wrong one).
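For reference, forcing the table scan for the second query would look like this (whether it helps depends, as described above, on how many rows match):
SELECT * FROM logs IGNORE INDEX (`date`)
WHERE user = 'user2'
ORDER BY `date` DESC
LIMIT 0, 25;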
So how to get faster results?
Have an index that is ordered by user. Then the query for 'user2' can stop after 13 rows, because it knows there are no more rows. And this will now be faster than the query for 'user1', that has to look through 3537 rows and then order them afterwards by date.
The best index for your query would therefore be (user, date): the query then knows when to stop looking for further rows AND the list is already ordered the way you want it (and should beat your 0.0171 s in all cases).
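A sketch of that index (the name is just illustrative):
ALTER TABLE logs ADD INDEX idx_user_date (user, `date`);
With it, either query reads at most 25 index entries, already in the desired order.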
Indexes require some resources too (e.g. hdd space and time to update the index when you update your table), so adding the perfect index for every single query might be counterproductive sometimes for the system as a whole.
I'm experiencing a strange situation when examining my query in MySQL with the "explain" command. I've got a table which has three non-unique single-column indexes, on columns "Period", "X", and "Y". All three of these columns have the same integer datatype. I then examine the following commands:
EXPLAIN SELECT * FROM MyTable WHERE Period = 201208 AND X >= 0 AND Y <= 454;
EXPLAIN SELECT * FROM MyTable WHERE Period = 201304 AND X >= 0 AND Y <= 454;
The first one shows "Using index condition; Using where", but strangely the second only shows "Using where", so it seems like changing one parameter somehow eliminates indexes from query execution.
The table has about 65000 total rows, about 5000 per Period value (so it's balanced); the first query returns about 2000 rows, the second about 500. Also, the latter period value (201304) is not the "last" physically in the table, and the former is not the first either; there are many rows with Period values both less than and greater than these two.
My original table is quite complex with lots of columns, so I cannot paste it here. But the only indexes are these three, and the query is the same one I used during testing, so I hope that does not matter too much.
Could someone give me any tip on what can cause this, and whether I need to take care of something I don't know about? Thank you.