I have the following query, which runs very slowly (almost 50,000 records):
SELECT
news.id,
news.title,
DATE_FORMAT(news.date_online, '%d %m %Y') AS newsNL_Date,
news_categories.parent_id
FROM
news,
news_categories
WHERE
DATE(news.date_online)=2013-02-25
AND news.category_id = news_categories.id
AND news.date_online < NOW()
AND news.activated ='t'
ORDER BY
news.date_online DESC
I have MySQL client version 5.0.96
When I run an EXPLAIN EXTENDED call with this query, this is the result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE news ref category_id,date_online,activated activated 1 const 43072 Using where; Using filesort
1 SIMPLE news_categories eq_ref PRIMARY,id PRIMARY 4 news_category_id 1
I have an index on the following columns
news_id (primary key)
date_online
activated
category_id
When I look at the EXPLAIN EXTENDED result, I see Using where; Using filesort. I know both of them are bad, but I don't know how to fix this.
Try adding an index on date_online (type B-tree). Also, I would try to avoid using NOW() in the WHERE clause; rather, set it in a variable beforehand and use the variable instead (see the sketch below).
I suppose the id fields are keys, so they are already indexed.
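A minimal sketch of that variable idea, using a MySQL user variable (the variable name is illustrative):
-- Capture NOW() once so every row is compared against the same constant.
SET @now := NOW();
SELECT news.id, news.title
FROM news
WHERE news.activated = 't'
AND news.date_online < @now;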
Use a LEFT JOIN and see the difference
SELECT
news.id,
news.title,
DATE_FORMAT(news.date_online, '%d %m %Y') AS newsNL_Date,
news_categories.parent_id
FROM
news
LEFT JOIN news_categories ON news_categories.id = news.category_id
WHERE
DATE(news.date_online) = '2013-02-25'
AND DATE(news.date_online) < DATE(NOW())
AND news.activated ='t'
ORDER BY
news.date_online DESC
Further to the other worthwhile suggestions: you have DATE(news.date_online)=2013-02-25. This would seem to force MySQL to convert news.date_online for every row in the table, which will stop the indexes being useful.
Possibly change it to news.date_online BETWEEN '2013-02-25 00:00:00' AND '2013-02-25 23:59:59', which should (I think) allow the index on date_online to be used.
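Applied to the original query, that might look like this (a sketch; only the date predicate changes):
SELECT
news.id,
news.title,
DATE_FORMAT(news.date_online, '%d %m %Y') AS newsNL_Date,
news_categories.parent_id
FROM
news,
news_categories
WHERE
news.date_online BETWEEN '2013-02-25 00:00:00' AND '2013-02-25 23:59:59'
AND news.category_id = news_categories.id
AND news.date_online < NOW()
AND news.activated = 't'
ORDER BY
news.date_online DESC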
Your query is getting data for a specific day, so in this case you can omit the AND news.date_online < NOW() part (Maybe the query optimizer does this for you).
In the other case, where you want all active news without specifying a date_online, you should also get rid of the NOW(). The problem is that your query cannot be cached by the database, because it contains a part ( NOW() ) that is different for every execution. You can fix this by providing a calculated constant there instead. If you, say, truncate the timestamp to the minute for the comparison, the query can be cached for at most 60 seconds.
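A sketch of that idea: the application computes the cutoff, truncated to the minute, and inlines it as a literal, so the query text stays identical for up to 60 seconds (the timestamp shown is just an example value):
SELECT news.id, news.title
FROM news
WHERE news.activated = 't'
AND news.date_online < '2013-02-25 14:37:00'  -- constant computed by the app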
Next, you should create a multi-column index for the columns that you are using, because I guess category_id, date_online and activated are used in almost every query. This will make inserts a little slower, but from what it looks like (a news application) there will be many more reads than inserts.
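For example (a sketch; the index name and column order are one reasonable choice for the query above):
-- activated is tested for equality, so it goes first; date_online is the
-- range / ORDER BY column; category_id makes the join column available
-- from the index as well.
ALTER TABLE news ADD INDEX idx_act_date_cat (activated, date_online, category_id);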
Related
I have MariaDB 10.1.14. For a long time I have been running the following query without problems (it took about 3 seconds):
SELECT
sum(transaction_total) as sum_total,
count(*) as count_all,
transaction_currency
FROM
transactions
WHERE
DATE(transactions.created_at) = DATE(CURRENT_DATE)
AND transaction_type = 1
AND transaction_status = 2
GROUP BY
transaction_currency
Suddenly, and I'm not sure exactly why, this query takes about 13 seconds.
This is the EXPLAIN:
And these are all the indexes of the transactions table:
What is the reason for the sudden increase in query time? And how can I decrease it?
If you are adding more data to your table, the query time will increase.
But you can do a few things to improve the performance.
Create a composite index on (transaction_type, transaction_status, created_at).
Remove the DATE() function (or any function) from your fields, because wrapping a column in a function prevents the engine from using an index on it. CURRENT_DATE is a constant, so it doesn't matter there, but the DATE() around it isn't necessary anyway, because it already returns a DATE.
If created_at isn't a DATE, you can use
created_at >= CURRENT_DATE AND created_at < CURRENT_DATE + INTERVAL 1 DAY
or create a different field that saves only the date part.
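Put together, the rewritten query might look like this (a sketch using the names from the question):
SELECT
sum(transaction_total) as sum_total,
count(*) as count_all,
transaction_currency
FROM
transactions
WHERE
created_at >= CURRENT_DATE                       -- sargable: no function
AND created_at < CURRENT_DATE + INTERVAL 1 DAY   -- around the column
AND transaction_type = 1
AND transaction_status = 2
GROUP BY
transaction_currency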
+1 to the answer from @JuanCarlosOropeza, but you can go a little further with the index.
ALTER TABLE transactions ADD INDEX (
transaction_type,
transaction_status,
created_at,
transaction_currency,
transaction_total
);
As @RickJames mentioned in comments, the order of columns is important:
First, columns in equality comparisons
Next, you can index one column that is used for a range comparison (which is anything besides equality), or for GROUP BY or ORDER BY. You have both a range comparison and a GROUP BY, but you can only get the index to help with one of these.
Last, other columns needed for the query, if you think you can get a covering index.
I describe index design in more detail in my presentation How to Design Indexes, Really (video: https://www.youtube.com/watch?v=ELR7-RdU9XU).
You're probably stuck with the "using temporary" since you have a range condition and also a GROUP BY referencing different columns. But you can at least eliminate the "using filesort" by this trick:
...
GROUP BY
transaction_currency
ORDER BY NULL
This supposes that it's not important to you which order the rows of the query result are returned in.
I don't know what has made your query slower. More data? Fragmentation? New DB version?
However, I am surprised to see that there is no index really supporting the query. You should have a compound index starting with the column of highest cardinality (the date? Well, you can try different column orders and see which index the DBMS picks for the query).
create index idx1 on transactions(created_at, transaction_type, transaction_status);
If created_at contains a time part, then you may want to create a computed column created_on containing only the date, and index that instead.
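A sketch of that computed column in MariaDB 10.1 syntax (MySQL 5.7 would say STORED instead of PERSISTENT; the column and index names are illustrative):
-- Materialized generated column holding just the date, plus an index,
-- so the query can filter with created_on = CURRENT_DATE directly.
ALTER TABLE transactions
ADD COLUMN created_on DATE AS (DATE(created_at)) PERSISTENT,
ADD INDEX idx_created_on (created_on, transaction_type, transaction_status);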
You can even extend this index to a covering index (where clause fields followed by group by clause fields followed by select clause fields):
create index idx2 on transactions(created_at, transaction_type, transaction_status,
transaction_currency, transaction_total);
We're upgrading our DB systems to MySQL 5.7 coming from MySQL 5.6 and since the upgrade a few queries have been running really slow.
After some investigating we narrowed it down to a few JOIN queries which suddenly don't listen to the WHERE clause anymore when using a 'larger than' (>) or 'smaller than' (<) operator. When using an '=' operator it does work as expected. When querying a large table this causes a constant 100% CPU usage.
The queries have been simplified to explain the issue at hand; when using explain we get the following outputs:
explain
select * from TableA as A
left join
(
select
DATE_FORMAT(created_at,'%H:%i:00') as `time`
FROM
TableB
WHERE
created_at < DATE_ADD(CURDATE(), INTERVAL -3 HOUR)
)
as V ON V.time = A.time
Output
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE A NULL ALL NULL NULL NULL NULL 10080 100.00 NULL
1 SIMPLE TableB NULL index created_at created_at 4 NULL 488389 100.00 Using where; Using index; Using join buffer (Block Nested Loop)
As you can see, it's querying/matching 488389 rows and not using the WHERE clause, since this is the total number of records in that table.
And now running the same query but with a LIMIT 99999999 command or using the '=' operator:
explain
select * from TableA as A
left join
(
select
DATE_FORMAT(created_at,'%H:%i:00') as `time`
FROM
TableB
WHERE
created_at < DATE_ADD(CURDATE(), INTERVAL -3 HOUR) LIMIT 999999999
)
as V ON V.time = A.time
Output
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY A NULL ALL NULL NULL NULL NULL 10080 100.00 NULL
1 PRIMARY <derived2> NULL ALL NULL NULL NULL NULL 244194 100.00 Using where; Using join buffer (Block Nested Loop)
2 DERIVED TableB NULL range created_at created_at 4 NULL 244194 100.00 Using where; Using index
You can see it's suddenly only matching 244194 rows, which is part of the table. Or, with the '=' operator:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE A NULL ALL NULL NULL NULL NULL 10080 100.00 NULL
1 SIMPLE TableB NULL ref created_at created_at 4 const 1 100.00 Using where; Using index
Just 1 row, as expected.
So the question now is: have we been querying in the wrong way and are only finding out now while upgrading, or have things changed since MySQL 5.6? It seems odd that the = operator works, but < and > are ignored for some reason, unless we use a LIMIT?
We've searched around and couldn't find the cause of this issue, and we'd rather not use the limit 9999999 solution in our code for obvious reasons.
Note: when running just the query inside the join, it works as expected as well.
Note: we've also run the same test on MariaDB 10.1, with the same issue.
The EXPLAIN row output is merely a guess at how many rows the step will hit. It is based upon statistical data that was reset by your upgrade. And if I had to guess how many of all your existing rows are older than yesterday 9pm, I too would guess it's closer to "all rows" than to "just some rows". The reason why LIMIT 999999999 displays another row count is the same: MySQL just guesses that the limit will have an effect; in this case, it guesses it will be exactly half of the rows (which would be, if true, a strange coincidence), and of course it doesn't actually look at the limit value, since 999999999 will not limit anything when you only have 500k rows. Even the "1" in the "=" case is just a guess (and might more often be 0 than 1, and maybe sometimes more).
This estimate helps choose the correct execution plan, and being wrong in this guess is only a problem if it leads to choosing the wrong one; your execution plan looks fine though, and there are not many options to do it otherwise. It does exactly as expected: scan the index for all dates using the index on created_at. Since you do a LEFT JOIN, you cannot skip values from TableA even if you started with the inner query, so there is really no alternative execution plan available. (The optimizer actually has been changed in 5.7, but here it doesn't have an effect.)
If that is your actual query, there is no real reason why it should be slower than before (regarding this query only; there are of course a lot of general performance options that might have an indirect effect, like caching strategies, buffer sizes, ..., but with standard options it should not have an effect here).
If not, and you e.g. actually use additional columns from TableB in the subquery (it is often hard to guess which possibly important things have gotten "simplified away" in questions), and thus need access to the actual table, it might depend on how your data is structured (or better: in what order you added it). You might also try OPTIMIZE TABLE TableB to make your table and indexes fresh and new; it can't hurt (but will lock your table for a little while).
With MySQL 5.7 you can now add generated columns, so it might be worth a try to generate a cleaned-up column time as DATE_FORMAT(created_at,'%H:%i:00'), so you don't have to calculate it anymore. And maybe add it to your index, so you don't have to sort anymore, to improve the block nested loop join; but that may depend on your actual query and how often you use it (spamming indexes increases overhead and uses space).
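A sketch of such a generated column in MySQL 5.7 (the column and index names are illustrative):
-- Stored generated column with the pre-formatted time-of-day string,
-- plus an index so the join on it no longer needs any computation.
ALTER TABLE TableB
ADD COLUMN time_of_day CHAR(8) AS (DATE_FORMAT(created_at, '%H:%i:00')) STORED,
ADD INDEX idx_time_of_day (time_of_day, created_at);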
In MySQL 5.7, derived tables (subqueries in the FROM clause) will be merged into the outer query if possible. This is usually an advantage, since one avoids storing the result of the subquery in a temporary table. However, for your query, MySQL 5.6 would create an index on this temporary table that could be used for the join execution.
The problem with the merged query is that the index on TableB.created_at cannot be used when the column is a parameter to a function. If you can change the query so that the transformation is applied to the column on the left side of the join, an index can be used to access the table on the right side. Something like:
select * from TableA as A
left join
(
select created_at as time
FROM TableB
WHERE created_at < DATE_ADD(CURDATE(), INTERVAL -3 HOUR)
)
as V ON V.time = func(A.time)
Alternatively, if you can use inner join instead of left join, MySQL can reverse the join order, so that the index on tableA.time can be used for the join.
If the subquery uses LIMIT, it cannot be merged. Hence, by using LIMIT you will get the same query plan as was used in MySQL 5.6.
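If you'd rather not keep the LIMIT workaround in your code, MySQL 5.7 also lets you turn derived-table merging off through the optimizer_switch (shown here at session scope; a sketch):
-- Disable merging of derived tables for this session, restoring the
-- 5.6-style plan in which the subquery is materialized and indexed.
SET SESSION optimizer_switch = 'derived_merge=off';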
Use JOIN instead of LEFT JOIN unless you need the 'right' table to be optional.
Avoid JOIN ( SELECT ... ). Although 5.6 and 5.7 added some features to handle it, it is usually better to turn the subquery into a simpler JOIN.
Your time expression leads to 9pm yesterday; did you mean "3 hours ago" instead?
See if this gives the desired results and runs faster:
select A.*, DATE_FORMAT(B.created_at,'%H:%i:00') as `time`
from TableA as A
JOIN TableB as B ON B.time = A.time
WHERE B.created_at < NOW() - INTERVAL 3 HOUR -- (assuming "3 hours ago")
As for 5.6 vs 5.7... 5.7 has a new, 'better' optimizer, based on a "cost model". However, your particular query makes it virtually impossible for the optimizer to come up with good costs. I guess that 5.6 happened upon the better EXPLAIN, and 5.7 happened upon a worse one. By simplifying the query, I think both optimizers will have a better chance of performing the query faster.
You do need these indexes:
B: INDEX(time, created_at) -- in that order
A: INDEX(time)
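In concrete DDL, those might be (a sketch; it assumes TableB really has a time column, as in the query above):
-- Composite index so the join on time can also check created_at from the
-- index, plus an index on TableA's join column.
ALTER TABLE TableB ADD INDEX idx_time_created (`time`, created_at);
ALTER TABLE TableA ADD INDEX idx_time (`time`);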
How is
SELECT t.id
FROM table t
JOIN (SELECT(FLOOR(max(id) * rand())) AS maxid FROM table)
AS tt
ON t.id >= tt.maxid
LIMIT 1
faster than
SELECT * FROM `table` ORDER BY RAND() LIMIT 1
I am actually having trouble understanding the first. Maybe if I knew why one is faster than the other I would have a better understanding.
(Original post: Difficult MySQL self-join please explain)
You can use EXPLAIN on the queries, but basically:
In the first, you're getting a random number (which isn't very slow) based on the maximum of an (I presume) indexed field. This is quite quick; I'd say maybe even near-constant time (depends on the implementation of the index?).
Then you're joining on that number and returning only the first row that's greater than it, and because you're using an index again, this is lightning quick.
The second is ordering by some random function. This has to do a FULL TABLE scan (you'll need to look at the EXPLAIN for that) and then return the first row. This is of course VERY expensive. You're not using any indexes, because of that RAND().
(The EXPLAIN will look like this, showing that you're not using keys:)
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE table ALL NULL NULL NULL NULL 14 Using temporary; Using filesort
This query pops up in my slow query logs:
SELECT
COUNT(*) AS ordersCount,
SUM(ItemsPrice + COALESCE(extrasPrice, 0.0)) AS totalValue,
SUM(ItemsPrice) AS totalValue,
SUM(std_delivery_charge) AS totalStdDeliveryCharge,
SUM(extra_delivery_charge) AS totalExtraDeliveryCharge,
this_.type AS y5_,
this_.transmissionMethod AS y6_,
this_.extra_delivery AS y7_
FROM orders this_
WHERE this_.deliveryDate BETWEEN '2010-01-01 00:00:00' AND '2010-09-01 00:00:00'
AND this_.status IN(1, 3, 2, 10, 4, 5, 11)
AND this_.senderShop_id = 10017
GROUP BY this_.type, this_.transmissionMethod, this_.extra_delivery
ORDER BY this_.deliveryDate DESC;
The table is InnoDB and has about 880k rows; the query takes between 9 and 12 seconds to execute. I tried adding the following index: ALTER TABLE orders ADD INDEX _deliverydate_senderShopId_status (deliveryDate, senderShop_id, status, type, transmissionMethod, extra_delivery); with no practical gains. Any help and/or suggestion is welcome.
Here is the query execution plan right now:
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE this_ ref FKC3DF62E57562BA6F 8 const 139894 100.00 Using where; Using temporary; Using filesort
I took the possible_keys value out of the text because I think it listed all the indexes in the table. The key used (FKC3DF62E57562BA6F) looks like:
Keyname Type Unique Packed Field Cardinality Collation Null Comment
FKC3DF62E57562BA6F BTREE No No senderShop_id 4671 A
I'll tell you one thing that you can look at for increasing the speed.
You generally only have NULL values in the data for unknown or non-applicable rows. It appears to me that, since you're treating NULL as 0 anyway, you should think about getting rid of them, and making sure that all extrasPrice values are 0 where they were previously NULL, so that you can get rid of the time penalty of the COALESCE.
In fact, you could go one step further and introduce another column called totalPrice, which you set with an insert/update trigger to the actual value ItemsPrice + extrasPrice (or ItemsPrice + COALESCE(extrasPrice, 0.0) if you still need nullability of extrasPrice).
Then, you can simply use:
SELECT
COUNT(*) AS ordersCount,
SUM(totalPrice) AS totalValue,
SUM(ItemsPrice) AS totalValue2,
:
(I'm not sure whether you should have two output columns with the same name or whether that was a typo; that's going to be, at worst, an error, at best, confusing.)
This moves the cost of the calculation to insert/update time rather than select time and amortises that cost over all the selects - most database tables are read far more often than written. The consistency of the data is maintained due to the trigger and the performance should be better, at the cost of some storage requirements.
But, since the vast majority of database questions are "How can I get more speed?" rather than "How can I use less disk?", that's often a good idea.
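A minimal sketch of that trigger pair (the column and trigger names are illustrative):
-- Pre-computed total, maintained on every insert and update, so the
-- SELECT no longer pays for the COALESCE on each row.
ALTER TABLE orders ADD COLUMN totalPrice DECIMAL(12,2);
CREATE TRIGGER orders_total_bi BEFORE INSERT ON orders
FOR EACH ROW SET NEW.totalPrice = NEW.ItemsPrice + COALESCE(NEW.extrasPrice, 0.0);
CREATE TRIGGER orders_total_bu BEFORE UPDATE ON orders
FOR EACH ROW SET NEW.totalPrice = NEW.ItemsPrice + COALESCE(NEW.extrasPrice, 0.0);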
Another suggestion is to provide a non-composite index on the column that reduces your result set the fastest (high cardinality). In other words, if you store only two weeks worth of data (14 different dates) in your table but 400 different shops, you should have an index on senderShop_id and make sure your statistics are up to date.
This should cause the DBMS execution engine to whittle down the result set using that key so that subsequent operations are faster.
A composite index on deliveryDate,senderShop_id,... will not be able to use senderShop_id to whittle down the results because the key ordering will be senderShop_id within deliveryDate.
I'm trying to speed up a query which I currently have as:
SELECT *
FROM `events`
WHERE (field1 = 'some string' or field1 = 'some string')
and is_current = true
GROUP BY event_id
ORDER BY pub_date
This takes roughly 30 seconds.
field1 is a varchar(150)
I'm currently indexing
field1, is_current, event_id, pub_date
charity, pub_date, is_current
and all the fields individually...
I'm really not sure which fields should be indexed together. When I remove the ORDER BY, the query speeds up to around 8 seconds, and if I remove both the ORDER BY and the GROUP BY, it's less than 1 second...
What exactly should be indexed in this case to speed up the query?
Edit:
I've run EXPLAIN on the modified query (which no longer includes the GROUP BY):
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE events range is_current,field1_2,field1_3,field1_4,field1 field1_3 153 NULL 204336 Using where; Using filesort
Which indicates it's using the key field1_3, which is field1 & is_current,
although it's not using the key which includes those two fields and pub_date (for the ordering?).
It's also using filesort, which seems to be the main problem.
Any ideas why it's using a filesort even though the pub_date field is also indexed (with the other fields)?
Put everything (field1, is_current, event_id, pub_date) in one index; MySQL can only use one index per joined table in a query.
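A sketch of that combined index (the index name is illustrative):
-- One composite index covering the filter, grouping and ordering columns.
ALTER TABLE `events` ADD INDEX idx_field1_current_event_pub (field1, is_current, event_id, pub_date);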
Use EXPLAIN to see what happens when you do.
Also, an aside: as KoolKabin says, SELECT * is rarely a good idea. Sometimes MySQL will copy the rows into a temporary table, and then there are the communication costs. The less you ask of it, the faster things will work.
UPDATE: I was actually wrong, sorry. First off, you can't get full use of indexing if your grouping is different from your ordering. Second, do you have an index where your ordering key (pub_date) comes first? If not, try that and see whether it fixes the ordering issue.
any ideas why it's using a filesort even though the pub_date field is also indexed (with the other fields)?
This is because the MySQL optimizer is trying to use the index on field1 while you want the data ordered by pub_date. If you are using MySQL 5.1 (the following query will give an error in earlier versions), you can force MySQL to use the pub_date index for the ORDER BY, something like this:
SELECT *
FROM `events`
FORCE INDEX FOR ORDER BY (pub_date)
WHERE (field1 = 'some string' or field1 = 'some string')
and is_current = true
GROUP BY event_id
ORDER BY pub_date