Slow count query with group - mysql

I have a common aggregation query:
SELECT
products.type,
count(products.id)
FROM
products
INNER JOIN product_colors
ON products.id = product_colors.product_id
AND product_colors.is_active = 1
AND product_colors.is_archive = 0
WHERE
(products.is_active = 1
AND product_colors.is_individual = 0
AND product_colors.is_visible = 1)
GROUP BY
type
It takes on the order of 0.1 seconds. The indexes look fine, tmp_table_size = 128M and
max_heap_table_size = 128M. Why is it so slow? Plain SELECTs are fast, but queries with GROUP BY and COUNT are not.
Indexes on products table:
Indexes on product_colors table:
Explain SQL:
EDIT Product indexes:

Your indexing is not optimal for what you are asking. Instead of having an index on each column individually (which can be a big waste), you should have composite indexes that better match what you are trying to query and that cover enough columns to handle any GROUP BY or ORDER BY.
In this case, your primary query is for ACTIVE products, grouped by type. So I would have a SINGLE index on your primary table on (is_active, type, id). This way, your WHERE criterion comes first via is_active, then your grouping via type, and finally the id that qualifies the record. Your query can then get all it needs from the INDEX and does not have to go to the raw data pages.
Now, your secondary table. Similarly, it should have a composite index: first based on the join criteria between the tables, THEN based on the restrictions you are looking for, thus: ( product_id, is_active, is_archive ). Why you have one column for is_active and another for is_archive, I don't know. I would think that if something were in the archives, it would not be active to begin with, but that is just a guess.
Anyhow, the optimized indexes should help.
One last consideration on your count(products.id): do you intend DISTINCT products, or all records found? If one product has 8 colors, do you want the ID counted as 1 or 8?
count(*) would give 8
count( distinct products.id ) would give 1
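The two composite indexes suggested above could be created along these lines. This is only a sketch: the index names are made up for illustration, and the columns are assumed to be exactly those shown in the question.

```sql
-- Hypothetical index names; column names are taken from the question.
ALTER TABLE products
  ADD INDEX idx_products_active_type_id (is_active, type, id);

ALTER TABLE product_colors
  ADD INDEX idx_colors_product_active_archive (product_id, is_active, is_archive);

-- Re-run EXPLAIN afterwards to see whether the optimizer picks the new indexes.
EXPLAIN
SELECT products.type, COUNT(products.id)
FROM products
INNER JOIN product_colors
   ON products.id = product_colors.product_id
  AND product_colors.is_active = 1
  AND product_colors.is_archive = 0
WHERE products.is_active = 1
  AND product_colors.is_individual = 0
  AND product_colors.is_visible = 1
GROUP BY products.type;
```

Note that the product_colors index in this sketch does not contain is_individual or is_visible, so MySQL would still have to read the table rows to check those two conditions.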

Give these a try:
products: INDEX(is_active, type)
product_colors: INDEX(product_id, is_individual, is_visible, is_active, is_archive)
Since products.id cannot be NULL, you may as well say COUNT(*) instead of count(products.id). (Or, as DRapp points out, maybe you need COUNT(DISTINCT products.id).)
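As a rough illustration of this suggestion, the indexes and both counting variants might look like the following. The index names and column aliases are hypothetical; everything else comes from the question.

```sql
-- Hypothetical index names.
ALTER TABLE products
  ADD INDEX idx_products_active_type (is_active, type);

ALTER TABLE product_colors
  ADD INDEX idx_colors_filter
    (product_id, is_individual, is_visible, is_active, is_archive);

-- Both counts side by side, to see which one is actually wanted.
SELECT p.type,
       COUNT(*)             AS matching_color_rows, -- one per (product, color) match
       COUNT(DISTINCT p.id) AS distinct_products    -- each product counted once
FROM products AS p
INNER JOIN product_colors AS pc
   ON pc.product_id = p.id
WHERE p.is_active = 1
  AND pc.is_active = 1
  AND pc.is_archive = 0
  AND pc.is_individual = 0
  AND pc.is_visible = 1
GROUP BY p.type;
```

If every product has at most one matching color row, the two columns will agree and COUNT(*) is enough.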

Related

Is there a way to avoid sorting for a query with WHERE and ORDER BY?

I am looking to understand how a query with both WHERE and ORDER BY can be indexed properly. Say I have a query like:
SELECT *
FROM users
WHERE id IN (1, 2, 3, 4)
ORDER BY date_created
LIMIT 3
With an index on date_created, it seems like the execution plan will prefer to use the PRIMARY key and then sort the results itself. This seems to be very slow when it needs to sort a large amount of results.
I was reading through this guide on indexing for ordered queries which mentions an almost identical example and it mentions:
If the database uses a sort operation even though you expected a pipelined execution, it can have two reasons: (1) the execution plan with the explicit sort operation has a better cost value; (2) the index order in the scanned index range does not correspond to the order by clause.
This makes sense to me but I am unsure of a solution. Is there a way to index my particular query and avoid an explicit sort or should I rethink how I am approaching my query?
The Optimizer is caught between a rock and a hard place.
Plan A: Use an index starting with id; collect however many rows that is; sort them; then deliver only 3. The downside: If the list is large and the ids are scattered, it could take a long time to find all the candidates.
Plan B: Use an index starting with date_created filtering on id until it gets 3 items. The downside: What if it has to scan all the rows before it finds 3.
If you know that the query will always work better with one query plan than the other, you can use an "index hint". But, when you get it wrong, it will be a slow query.
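For reference, an index hint for each plan could be written roughly as below. This assumes id is the primary key (as the question implies) and that an index named idx_date_created exists on users(date_created); that index name is hypothetical.

```sql
-- Plan B: walk the date_created index in order and filter on id.
SELECT *
FROM users FORCE INDEX (idx_date_created)
WHERE id IN (1, 2, 3, 4)
ORDER BY date_created
LIMIT 3;

-- Plan A: fetch by primary key, then sort the candidates.
SELECT *
FROM users FORCE INDEX (PRIMARY)
WHERE id IN (1, 2, 3, 4)
ORDER BY date_created
LIMIT 3;
```

Comparing the two with EXPLAIN and with realistic id lists is a reasonable way to decide whether a hint is worth keeping at all.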
A partial answer... If * contains bulky columns, both approaches may be hauling around stuff that will eventually be tossed. So, let's minimize that:
SELECT u.*
FROM ( SELECT id
FROM users
WHERE id IN (1, 2, 3, 4)
ORDER BY date_created
LIMIT 3 -- not repeated
) AS x
JOIN users AS u USING(id)
ORDER BY date_created; -- repeated
Together with
INDEX(date_created, id),
INDEX(id, date_created)
Hopefully, the Optimizer will pick one of those "covering" indexes to perform the "derived table" (subquery). If so that will be somewhat efficiently performed. Then the JOIN will look up the rest of the columns for the 3 desired rows.
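A sketch of how those two indexes might be added and checked is shown here. The index names are hypothetical, and the table and column names are taken from the question.

```sql
-- Hypothetical index names.
ALTER TABLE users
  ADD INDEX idx_users_date_id (date_created, id),
  ADD INDEX idx_users_id_date (id, date_created);

-- Check the derived-table part on its own.
EXPLAIN
SELECT id
FROM users
WHERE id IN (1, 2, 3, 4)
ORDER BY date_created
LIMIT 3;
```

If the Extra column of the EXPLAIN output shows "Using index", the subquery is being answered from one of the covering indexes without touching the table rows.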
If you want to discuss further, please provide
SHOW CREATE TABLE.
How many ids you are likely to have.
Why you are not already JOINing to another table to get the ids.
Approximately how many rows in the table.
Your best bet might be to write this in a more complicated way:
SELECT u.*
FROM ((SELECT u.*
FROM users u
WHERE id = 1
ORDER BY date_created
LIMIT 3
) UNION ALL
(SELECT u.*
FROM users u
WHERE id = 2
ORDER BY date_created
LIMIT 3
) UNION ALL
(SELECT u.*
FROM users u
WHERE id = 3
ORDER BY date_created
LIMIT 3
) UNION ALL
(SELECT u.*
FROM users u
WHERE id = 4
ORDER BY date_created
LIMIT 3
)
) u
ORDER BY date_created
LIMIT 3;
Each of the subqueries will now use an index on users(id, date_created). The outer query is then sorting at most 12 rows, which should be trivial from a performance perspective.
You could create a composite index on (id, date_created) - that will give the engine the option of using an index for both steps - but the optimiser may still choose not to.
If there aren't many rows in your table or it thinks the resultset will be small it's quicker to sort after the fact than it is to traverse the index tree.
If you really think you know better than the optimiser (which you don't), you can use index hints to tell it what to do, but this is almost always a bad idea.

Optimize MySQL Index Multiple Table JOIN

I have 5 tables in MySQL, and the query I want to run takes too long to execute.
Here is the structure of my tables:
Reciept (row count: 23799640): reciept table structure
reciept_goods (row count: 39398989): reciept_goods table structure
good (row count: 17514): good table structure
good_categories (row count: 121): good_categories table structure
retail_category (row count: 10): retail_category table structure
My indexes:
Date --> reciept.date #1
reciept_goods_index --> reciept_goods.recieptId #1, reciept_goods.shopId #2, reciept_goods.goodId #3
category_id --> good.category_id #1
I have the following SQL query:
SELECT
R.shopId,
sales,
sum(Amount) as sum_amount,
count(distinct R.id) as count_reciept,
RC.id,
RC.name
FROM
reciept R
JOIN reciept_goods RG
ON R.id = RG.RecieptId
AND R.ShopID = RG.ShopId
JOIN good G
ON RG.GoodId = G.id
JOIN good_categories GC
ON G.category_id = GC.id
JOIN retail_category RC
ON GC.retail_category_id = RC.id
WHERE
R.date >= '2018-01-01 10:00:00'
GROUP BY
R.shopId,
R.sales,
RC.id
EXPLAIN for this query gives the following result:
Explain query
and the execution time is 236 sec.
If I use straight_join good ON (good.id = reciept_goods.GoodId ), EXPLAIN gives:
Explain query
and the execution time is 31 sec.
SELECT STRAIGHT_JOIN ... rest of query
I think the problem is in the indexes of my tables, but I don't understand how to fix them. Can someone help me?
With about 2% of the rows in reciept having a matching date, the 2nd execution plan (with straight_join) seems to use the right execution order. You should be able to optimize it by adding the following covering indexes:
reciept(date, sales)
reciept_goods(recieptId, shopId, goodId, amount)
I assume that the column order in your primary key for reciept_goods currently is (goodId, recieptId, shopId) (or (goodId, shopId, recieptId)). You could change that to (recieptId, shopId, goodId) (and judging by e.g. the table name, you may have wanted to do this anyway); in that case, you do not need the 2nd index (at least for this query). I would assume that this primary key made MySQL take the slower execution plan (assuming, of course, that it would be faster), although sometimes it's just bad statistics, especially on a test server.
With those covering indexes, MySQL should take the faster execution plan even without straight_join; if it doesn't, just add it again (although I would like a look at both execution plans then). Also check that those two new indexes are used in the execution plan; otherwise I may have missed a column.
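The two covering indexes could be added roughly like this. The index names are hypothetical, and the sketch assumes Amount is a column of reciept_goods, as the suggested index implies.

```sql
-- Hypothetical index names; `date` is quoted only as a precaution.
ALTER TABLE reciept
  ADD INDEX idx_reciept_date_sales (`date`, sales);

ALTER TABLE reciept_goods
  ADD INDEX idx_reciept_goods_covering (RecieptId, ShopId, GoodId, Amount);
```

With tens of millions of rows in both tables, building these indexes may take a while, so it is worth trying them on a copy of the data first and then comparing the EXPLAIN output with and without STRAIGHT_JOIN.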
It looks like you are depending on walking through a couple of many:many tables? Many people design them inefficiently.
Here I have compiled a list of 7 tips on making mapping tables more efficient. The most important is use of composite indexes.

Most efficient way to join "most recent row"

I know this question has been asked 100 times, and this isn't a "how do I do it", but an efficiency question - a topic I don't know much about.
From my internet reading I have settled on one way of solving the most recent problem that sounds like it's pretty efficient - LEFT JOIN a "max" table (grouped by the matching conditions) and then LEFT JOIN the row that matches the grouped conditions. Something like this:
Select employee.*, evaluation.* from employee
LEFT JOIN (select max(report_date) report_date, employee_id
from evaluation group by employee_id) most_recent_eval
on most_recent_eval.employee_id = employee.id
LEFT JOIN evaluation
on evaluation.employee_id = employee.id and evaluation.report_date = most_recent_eval.report_date
Are there problems with this that I don't know about? Is this doing 2 table scans (one to find the max, and one to find the row)? Does it have to do 2 full scans for every employee?
The reason I'm asking is that I am now looking at joining on 3 tables where I need the most recent row (evaluations, security clearance, and project) and it seems like any inefficiencies are going to be massively multiplied.
Can anyone give me some advice on this?
You should be in pretty good shape with the query pattern you propose.
One possible suggestion, that will help if your evaluation table has its own autoincrementing id column. You may be able to find the latest evaluation for each employee with this subquery:
SELECT MAX(id) id
FROM evaluation
GROUP BY employee_id
Then your join can look like this:
FROM employee
LEFT JOIN (
SELECT MAX(id) id, employee_id
FROM evaluation
GROUP BY employee_id
) most_recent_eval ON most_recent_eval.employee_id=employee.id
LEFT JOIN evaluation ON most_recent_eval.id = evaluation.id
This will work if your id values and your report_date values in your evaluation table are in the same order. Only you know if that's the case in your application. But if it is, this is a very helpful optimization.
Other than that, you may need to add some compound indexes to some tables to speed up your queries. Get them working correctly first. Read http://use-the-index-luke.com/ . Remember that lots of single-column indexes are generally harmful to MySQL query performance unless they're chosen to accelerate particular queries.
If you create a compound index on (employee_id, report_date), this subquery
select max(report_date) report_date, employee_id
from evaluation
group by employee_id
can be satisfied with an astonishingly efficient loose index scan. Similarly, if you're using InnoDB, the query
SELECT MAX(id) id
FROM evaluation
GROUP BY employee_id
can be satisfied by a loose index scan on a single-column index on employee_id. (If you're using MyISAM, you need a compound index on (employee_id, id) because, unlike InnoDB, MyISAM does not implicitly put the primary key column into every index.)
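A minimal sketch of the compound index and a way to check for the loose index scan follows. The index name is hypothetical; the table and columns are the ones used in the question.

```sql
-- Hypothetical index name.
ALTER TABLE evaluation
  ADD INDEX idx_eval_employee_date (employee_id, report_date);

-- Check how the "max" subquery is executed.
EXPLAIN
SELECT employee_id, MAX(report_date) AS report_date
FROM evaluation
GROUP BY employee_id;
```

When MySQL chooses a loose index scan, the Extra column of the EXPLAIN output reads "Using index for group-by". Whether it does so depends on the optimizer's cost estimate, so this is something to verify rather than assume.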

MySQL slow query when combining two very fast queries

I have two MySQL queries that run very fast, but when I combine them the new query is very slow.
Fast (<1 second, 15 results):
SELECT DISTINCT
Id, Name, Company_Id
FROM people
where Company_Id in (5295, 1834)
and match(Locations) against('austin')
Fast (<1 second, 2970 results):
select distinct Company_Id from technologies
where match(Name) against('elastic')
and Company_Id is not null
When I combine the two like this:
SELECT DISTINCT Id, Name, Company_Id
FROM people
where Company_Id in
( select Company_Id from technologies
where match(Name) against('elastic')
and Company_Id is not null
)
and match(Locations) against('austin')
The resulting query takes over 2 minutes to complete. It returns 278 rows.
I've tried rewriting the slow query a few ways. One other example is like this:
SELECT DISTINCT
`Extent1`.`Id`, `Extent1`.`Name`, `Extent1`.`Company_Id`
FROM `people` AS `Extent1`
INNER JOIN `technologies` AS `Extent2`
ON (`Extent1`.`Company_Id` = `Extent2`.`Company_Id`)
WHERE (`Extent1`.`Company_Id` IS NOT NULL)
AND ((match(`Extent1`.`Locations`) against('austin'))
AND (match(`Extent2`.`Name`) against('elastic')))
I'm using MySQL 5.7 on Windows. I have full-text indexes on the Name and Locations columns. My InnoDB Buffer Usage never goes above 40%. I tried to use MySQL Workbench to look at the execution plan, but it shows "Explain data not available for statement".
Please let me know if you see anything I could improve or try. Thank you.
IN ( SELECT ... ) is poorly optimized, at least in older versions of MySQL. What version are you using?
When using a FULLTEXT index (MATCH...), that part is performed first, if possible. This is because nearly always the FT lookup is faster than whatever else is going on.
But when using two fulltext queries, it picks one, then can't use fulltext on the other.
Here's one possible workaround:
Have an extra table for searches that includes both Name and Locations.
Have FULLTEXT(Name, Locations)
MATCH (Name, Locations) AGAINST ('+austin +elastic' IN BOOLEAN MODE)
If necessary, AND that with something to verify that it is not, for example, finding a person named 'Austin'.
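One way this workaround might be laid out is sketched below. The table name, column types, and the choice of one row per person are all assumptions made for illustration; how the table is populated and kept in sync with people and technologies is not shown.

```sql
-- Hypothetical denormalized search table; names and types are assumptions.
CREATE TABLE people_search (
  Person_Id  INT NOT NULL,
  Company_Id INT NOT NULL,
  Name       TEXT,  -- assumed: technology names of the person's company
  Locations  TEXT,  -- assumed: copied from people.Locations
  PRIMARY KEY (Person_Id),
  FULLTEXT KEY ft_name_locations (Name, Locations)
) ENGINE=InnoDB;

-- A single full-text lookup that requires both words.
SELECT Person_Id, Company_Id
FROM people_search
WHERE MATCH (Name, Locations)
      AGAINST ('+austin +elastic' IN BOOLEAN MODE);
```

The column list in MATCH() has to be the same as the column list of the FULLTEXT index, which is why both columns sit in one index here.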
Another possibility:
5.7 (or 5.6?) might be able to optimize this by creating indexes on the subqueries:
SELECT ...
FROM ( SELECT Company_Id FROM ... MATCH(Name) ... ) AS x
JOIN ( SELECT Company_Id FROM ... MATCH(Locations) ... ) AS y
USING(Company_id);
Provide the EXPLAIN; I am hoping to see <auto-key>.
Test that. If it is 'fast', then you may need to add on another JOIN and/or WHERE. (I am unclear what your ultimate query needs to be.)
Write the query with the subquery in the from clause:
select distinct p.Id, p.Name, p.Company_Id
from people p join
(select Company_Id
from technologies
where match(Name) against('elastic') and Company_Id is not null
) t
on p.Company_Id = t.Company_Id
where match(p.Locations) against ('austin');
I suspect that you have a problem with your data structure, though. You should have a CompanyLocations table, rather than storing locations in a list in the table.

Utilizing MySQL multi-column indexes with my JOIN statement

I'm having some trouble utilizing my MySQL multi-column indexes when joining. Maybe a bit of a newbie question, but I appreciate the help.
I have a multi-column index on the notification table across the "type", "status" and "timeSent" columns.
Running this query:
SELECT count(notificationID) FROM notification
WHERE statusID = 2
AND typeID = 1
AND timeSent BETWEEN "2014-01-01" AND "2014-02-01"
This uses my 3 column index and runs fast. If I want to get notifications for a specific client I need to join in my user table.
SELECT COUNT(a.notificationID) FROM notification a
LEFT JOIN user b ON a.userID = b.userID
WHERE a.statusID = 2
AND a.typeID = 1
AND b.clientID = 1
AND a.timeSent BETWEEN "2014-01-01" AND "2014-02-01"
This query ignores my index altogether and screeches to a halt, as there are 15m records in the notification table. I've tried doing a sub-select instead, with the same results.
Any input would be greatly appreciated.
Your notification table is pretty big, so it's not cheap to goof around trying to add indexes. But I will suggest that anyhow.
First of all, COUNT(*) is generally at least as fast as COUNT(someColumn), because it does not have to check the column for NULL. I think you can make that change without changing the meaning of your query.
Second, your (typeID, statusID, timeSent) covering index is perfect for your first query. MySQL can random-access the index to the first date, sequentially scan to the second date, and just count the index entries, and so satisfy your query.
Third, let's take a look at that clientId-finding query. What it actually does is look up a date-range of notifications for particular, single, values of statusID, typeID, and userID. You specify the first two of those explicitly, and the query pulls the third one from the joined table. So, another covering index on (userID, typeID, statusID, timeSent) might very well do the trick for you.
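A sketch of that index, together with the query rewritten to use COUNT(*), might look like this. The index name is hypothetical. The sketch also uses a plain JOIN instead of LEFT JOIN, on the assumption that the condition on b.clientID already discards notifications without a matching user, and it assumes notificationID is never NULL.

```sql
-- Hypothetical index name.
ALTER TABLE notification
  ADD INDEX idx_notif_user_type_status_time (userID, typeID, statusID, timeSent);

-- Check whether the join now uses the new index.
EXPLAIN
SELECT COUNT(*)
FROM notification AS a
JOIN user AS b ON a.userID = b.userID
WHERE a.statusID = 2
  AND a.typeID = 1
  AND b.clientID = 1
  AND a.timeSent BETWEEN '2014-01-01' AND '2014-02-01';
```

Since the notification table has about 15m rows, adding the index is not free, so it is worth testing on a copy before applying it in production.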