MySQL COUNT takes 11s, how to improve

I am trying to count the occurrence of sales.
Here is my query:
SELECT item, COUNT(item) FROM sales_raw
GROUP BY item HAVING (count(item)>=1)
ORDER BY COUNT(item) DESC
This query takes about eleven seconds on a table of about 500,000 rows. When I do an explain, I get:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sales_raw index NULL vendor_id 767 NULL 397431 Using temporary; Using filesort
Why does this query take so long and how can I improve this?

Replace every COUNT(item) with COUNT(*). COUNT(item) has to check each value of item for NULL; COUNT(*) simply counts rows.
If it is still not fast enough, add an index on your item column, which should make your query significantly faster.
Also, the HAVING clause is almost useless: every group contains at least one row, so COUNT(item) can only be 0 for the group where item is NULL. If item is never NULL, you can drop the clause.
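Putting those suggestions together, a minimal sketch might look like the following. The index name idx_item and the alias sales_count are made up for illustration. The WHERE item IS NOT NULL filter assumes you do not want a row for NULL items, which is the only group the original HAVING COUNT(item) >= 1 could exclude.

```sql
-- Hypothetical index name. With an index on item, MySQL can usually
-- group by walking the index instead of scanning an unrelated one.
CREATE INDEX idx_item ON sales_raw (item);

SELECT item, COUNT(*) AS sales_count
FROM sales_raw
WHERE item IS NOT NULL       -- stands in for HAVING COUNT(item) >= 1
GROUP BY item
ORDER BY sales_count DESC;   -- the grouped result still has to be sorted
```

If item is a long string column, the index may need a prefix length; check the EXPLAIN output again after adding it to confirm the new index is actually chosen.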

Related

Laravel Join tables and group by sum query too slow

I am using the Laravel query builder to get the desired results from the database. The following query is working perfectly but takes too much time to return results. Can you please help me with this?
select
`amz_ads_sp_campaigns`.*,
SUM(attributedUnitsOrdered7d) as order7d,
SUM(attributedUnitsOrdered30d) as order30d,
SUM(attributedSales7d) as sale7d,
SUM(attributedSales30d) as sale30d,
SUM(impressions) as impressions,
SUM(clicks) as clicks,
SUM(cost) as cost,
SUM(attributedConversions7d) as attributedConversions7d,
SUM(attributedConversions30d) as attributedConversions30d
from
`amz_ads_sp_product_targetings`
inner join `amz_ads_sp_report_product_targetings` on `amz_ads_sp_product_targetings`.`campaignId` = `amz_ads_sp_report_product_targetings`.`campaignId`
inner join `amz_ads_sp_campaigns` on `amz_ads_sp_report_product_targetings`.`campaignId` = `amz_ads_sp_campaigns`.`campaignId`
where
(
`amz_ads_sp_product_targetings`.`user_id` = ?
and `amz_ads_sp_product_targetings`.`profileId` = ?
)
group by
`amz_ads_sp_product_targetings`.`campaignId`
Result of Explain SQL
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE amz_ads_sp_report_product_targetings ALL campaignId NULL NULL NULL 50061 Using temporary; Using filesort
1 SIMPLE amz_ads_sp_campaigns ref campaignId campaignId 8 pr-amz-ppc.amz_ads_sp_report_product_targetings.ca... 1
1 SIMPLE amz_ads_sp_product_targetings ref campaignId campaignId 8 pr-amz-ppc.amz_ads_sp_report_product_targetings.ca... 33 Using where
Your query could benefit from several indices to cover the WHERE clause as well as the join conditions:
CREATE INDEX idx1 ON amz_ads_sp_product_targetings (
user_id, profileId, campaignId);
CREATE INDEX idx2 ON amz_ads_sp_report_product_targetings (
campaignId);
CREATE INDEX idx3 ON amz_ads_sp_campaigns (campaignId);
The first index idx1 covers the entire WHERE clause, which might let MySQL throw away many records on the initial scan of the amz_ads_sp_product_targetings table. It also includes the campaignId column, which is needed for the first join. The second and third indices cover the join columns of each respective table. This might let MySQL do a more rapid lookup during the join process.
Note that selecting amz_ads_sp_campaigns.* is not valid unless the campaignId of that table is the primary key. Also, there isn't much else we can do to speed up the query, as SUM, by its nature, requires touching every matching record in order to come up with the result.

JOINs being done in weird order; messing up ORDER BY?

Let's say I have three tables - customers, servers and payments. Each customer can have multiple servers and each server can have multiple payments. Let's also say I wanted to find the most recent payments and get info about the servers / customers those payments are attached to. Here's a query that could do this:
SELECT *
FROM payments p
JOIN customers c ON p.custID = c.custID
JOIN servers s ON s.serverID = p.serverID
WHERE c.hold = 0
AND c.archive = 0
ORDER BY p.paymentID DESC
LIMIT 10;
The problem is that when I run EXPLAIN on this query I get this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE c ref PRIMARY,hold_archive hold_archive 3 const,const 28728 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p ref custID custID 5 customers.custID 3 Using where
1 SIMPLE s eq_ref PRIMARY PRIMARY 4 payments.serverID 1 Using index
The problem is that the query takes a while to run. If I remove the ORDER BY it becomes 10x as fast. But I need the ORDER BY. Here's the EXPLAIN when I remove the ORDER BY:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE c ref PRIMARY,hold_archive hold_archive 3 const,const 28728 Using where; Using index
1 SIMPLE p ref custID custID 5 customers.custID 3 Using where
1 SIMPLE s eq_ref PRIMARY PRIMARY 4 payments.serverID 1 Using index
So the big difference here is that "Using temporary" and "Using filesort" are missing from the Extra column.
It seems like the reason, in this case, is that the column I'm doing the ORDER BY on doesn't belong to the first table in the EXPLAIN.
Another observation: if I remove one of the WHERE conditions (whilst keeping the ORDER BY) it speeds up similarly, but I need both conditions. Here's an example EXPLAIN of that:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p index custID,serverID PRIMARY 4 NULL 10 Using where
1 SIMPLE c eq_ref PRIMARY,hold_archive PRIMARY 4 payments.custID 1 Using where
1 SIMPLE s eq_ref PRIMARY PRIMARY 4 payments.serverID 1 Using index
Here the ORDER BY column /is/ in the first table of the EXPLAIN. But why is MySQL rearranging the order in which the tables are JOINed, and how can I stop it from doing that? You can force indexes in MySQL, but it doesn't seem like that would help.
Any ideas?
Why it is 10x faster without the ORDER BY: MySQL can find "any 10 rows" a lot faster than it can "find all possible rows, sort them, then deliver 10".
Having WHERE and ORDER BY hit different columns is hard to optimize.
What percentage of payments have hold=0 and archive=0? It sounds like a small percentage? How many rows in each table?
Does anything else need INDEX(hold, archive)? If not, get rid of it. It seems to be only causing trouble here.
If hold=0 and archive=0 is common, then you would prefer the execution to go like your 3rd EXPLAIN, that is, scan payments in descending order. With most of them matching the WHERE, it will usually need to hit not much more than 10 rows before finding 10 matching rows.
Another solution (other than getting rid of the index) is to change JOIN to STRAIGHT_JOIN in the query. This tells the Optimizer that you know better, and payments should be scanned first, customers second. That works well if my previous paragraph applies.
But the query will screw up (by being slow) if, say, you look for archive=1.
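As a sketch of the STRAIGHT_JOIN suggestion, the original query could be written as below. The only change is the join keyword, which makes MySQL read the tables in the order they are listed (payments, then customers, then servers). Whether that is actually faster depends on how common hold=0 and archive=0 are, so compare the EXPLAIN output before relying on it.

```sql
SELECT *
FROM payments p
STRAIGHT_JOIN customers c ON p.custID = c.custID    -- read after payments
STRAIGHT_JOIN servers s   ON s.serverID = p.serverID
WHERE c.hold = 0
  AND c.archive = 0
ORDER BY p.paymentID DESC   -- can follow the payments primary key backwards
LIMIT 10;
```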

How to improve IF NOT NULL query?

I have the following query:
SELECT * FROM `title_mediaasset`
WHERE upload_id is not null
ORDER BY `upload_date` DESC
It takes almost a second and doesn't use an index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE title_mediaasset ALL upload_id,upload_id_2 NULL NULL NULL 119216 Using where; Using filesort
How can I improve this query?
This table holds about 100k results, and will probably increase to 1M in the next year.
If you need all rows and all columns from the result, you can't re-write the query to make it better. It is probably running slow because you don't have an index on upload_date.
If you don't need all of the rows, use LIMIT and you'll see a decent speed increase on the ORDER BY.
If you don't need all of the columns, use SELECT [columns you need] instead of SELECT *. That way if you really need to optimize the query, you can put the columns you need in your index so that you can read everything directly from the index: index on (upload_id, upload_date, [other columns in select statement]).
If you need all of the columns, or a good number of them, just add index on (upload_id, upload_date).
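A rough sketch of these suggestions follows. The index names and the LIMIT value are invented for illustration. The second index is an assumption that goes beyond the answer above: because IS NOT NULL is a range condition, an index that leads with upload_date may let MySQL read rows already in sorted order and skip the filesort, so it can be worth comparing both with EXPLAIN.

```sql
-- Index as suggested above: filter column first, then sort column.
CREATE INDEX idx_upload_id_date ON title_mediaasset (upload_id, upload_date);

-- Alternative to compare (assumption): sort column first, so that
-- ORDER BY upload_date DESC can walk the index backwards.
CREATE INDEX idx_upload_date_id ON title_mediaasset (upload_date, upload_id);

-- Selecting only indexed columns lets either index cover the query.
SELECT upload_id, upload_date
FROM title_mediaasset
WHERE upload_id IS NOT NULL
ORDER BY upload_date DESC
LIMIT 50;   -- hypothetical page size
```

Keep only the index that EXPLAIN shows being used; every extra index slows down writes to the table.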

Optimization of a Virtuemart Attribute Query

The select query below selects all the products matching a certain attribute from a Virtuemart table. The attribute table is rather large (almost 6000 rows). Is there any way to optimize the query, or is there any other approach that might help? I have already tried adding indexes to one and even two tables.
SELECT DISTINCT `jos_vm_product`.`product_id`,
`jos_vm_product_attribute`.`attribute_name`,
`jos_vm_product_attribute`.`attribute_value`,
`jos_vm_product_attribute`.`product_id`
FROM (`jos_vm_product`)
RIGHT JOIN `jos_vm_product_attribute`
ON `jos_vm_product`.`product_id` = `jos_vm_product_attribute`.`product_id`
WHERE ((`jos_vm_product_attribute`.`attribute_name` = 'Size')
AND ((`jos_vm_product_attribute`.`attribute_value` = '6.5')
OR (`jos_vm_product_attribute`.`attribute_value` = '10')))
GROUP BY `jos_vm_product`.`product_sku`
ORDER BY CONVERT(`jos_vm_product_attribute`.`attribute_value`, SIGNED INTEGER)
LIMIT 0, 24
Here are the results of EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extras
1 SIMPLE jos_vm_product_attribute range idx_product_attribute_name,attribute_value,attribute_name attribute_value 765 NULL 333 Using where; Using temporary; Using filesort
1 SIMPLE jos_vm_product eq_ref PRIMARY PRIMARY 4 shoemark_com_shop.jos_vm_product_attribute.product_id
Any help would be greatly appreciated. Thanks.
Replacing the jos_vm_product_attribute.attribute_name index with a composite index on jos_vm_product_attribute.attribute_name and jos_vm_product_attribute.attribute_value (in that order) should help this query. Currently, it's only using an index in the WHERE condition for jos_vm_product_attribute.attribute_value, but this new index will be usable for both parts of the WHERE condition.
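A minimal sketch of that composite index is shown below. The index name is hypothetical. The prefix lengths in the commented alternative are guesses, included only because two long string columns can exceed the maximum key length on some storage engines.

```sql
-- Hypothetical index name; equality column first, then the column
-- that is matched against several values.
ALTER TABLE jos_vm_product_attribute
  ADD INDEX idx_attr_name_value (attribute_name, attribute_value);

-- If MySQL rejects the index as too long, a prefixed variant is one
-- option (lengths are assumptions, adjust to your data):
-- ALTER TABLE jos_vm_product_attribute
--   ADD INDEX idx_attr_name_value (attribute_name(32), attribute_value(32));
```

Once EXPLAIN shows the new index in the key column, the old single-column index on attribute_name can be dropped.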

Why is this query using where instead of index?

EXPLAIN EXTENDED SELECT `board` . *
FROM `board`
WHERE `board`.`category_id` = '5'
AND `board`.`board_id` = '0'
AND `board`.`display` = '1'
ORDER BY `board`.`order` ASC
The output of the above query is
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE board ref category_id_2 category_id_2 9 const,const,const 4 100.00 Using where
I'm a little confused by this because I have an index that contains the columns that I'm using in the same order they're used in the query...:
category_id_2 BTREE No No
category_id 33 A
board_id 33 A
display 33 A
order 66 A
The output of EXPLAIN can sometimes be misleading.
For instance, filesort has nothing to do with files, using where does not mean you are using a WHERE clause, and using index can show up on the tables without a single index defined.
Using where just means there is some restricting clause on the table (WHERE or ON), so not all records will be returned. Note that LIMIT does not count as a restricting clause (though it can be).
Using index means that all information is returned from the index, without seeking the records in the table. This is only possible if all fields required by the query are covered by the index.
Since you are selecting *, this is impossible. Fields other than category_id, board_id, display and order are not covered by the index and have to be looked up in the table.
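To see the difference, you could run EXPLAIN on a variant that selects only the four columns contained in category_id_2. This is a sketch for comparison, not a replacement for the original query; with every selected column present in the index, the Extra column would be expected to include Using index.

```sql
EXPLAIN
SELECT `category_id`, `board_id`, `display`, `order`  -- all in category_id_2
FROM `board`
WHERE `category_id` = '5'
  AND `board_id` = '0'
  AND `display` = '1'
ORDER BY `order` ASC;
```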
It is actually using index category_id_2.
It's using the index category_id_2 properly, as shown by the key field of the EXPLAIN.
Using where just means that you're selecting only some rows by using the WHERE clause, so you won't get the entire table back ;)