Utilizing MySQL multi-column indexes with my JOIN statement

I'm having some trouble utilizing my MySQL multi-column indexes when joining. Maybe a bit of a newbie question, but I appreciate the help.
I have a multi-column index on the notification table across the typeID, statusID and timeSent columns.
Running this query:
SELECT count(notificationID) FROM notification
WHERE statusID = 2
AND typeID = 1
AND timeSent BETWEEN "2014-01-01" AND "2014-02-01"
This uses my 3 column index and runs fast. If I want to get notifications for a specific client I need to join in my user table.
SELECT COUNT(a.notificationID) FROM notification a
LEFT JOIN user b ON a.userID = b.userID
WHERE a.statusID = 2
AND a.typeID = 1
AND b.clientID = 1
AND a.timeSent BETWEEN "2014-01-01" AND "2014-02-01"
This query ignores my index altogether and screeches to a halt, as there are 15m records in the notification table. I've tried a sub-select instead, with the same results.
Any input would be greatly appreciated.

Your notification table is pretty big, so it's not cheap to experiment with adding indexes. But I will suggest that anyhow.
First of all, COUNT(*) is well known to be faster than COUNT(someColumn). I think you can make that change without changing the meaning of your query.
Second, your (typeID, statusID, timeSent) covering index is perfect for your first query. MySQL can random-access the index to the first date, sequentially scan to the second date, and just count the index entries, and so satisfy your query.
Third, let's take a look at that client-filtering query. (Note that the WHERE condition on b.clientID effectively turns your LEFT JOIN into an inner join.) What it actually does is look up a date range of notifications for particular, single values of statusID, typeID, and userID. You specify the first two explicitly, and the query pulls the third from the joined table. So another covering index, on (userID, typeID, statusID, timeSent), might very well do the trick for you.
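A sketch of that suggested index as DDL (the index name is invented; adjust column names to your actual schema):

```sql
-- Equality columns first (userID, typeID, statusID), range column (timeSent)
-- last, so MySQL can seek to the date range and count index entries
-- without ever touching the base rows.
ALTER TABLE notification
  ADD INDEX idx_user_type_status_sent (userID, typeID, statusID, timeSent);
```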

Related

Is there a way to avoid sorting for a query with WHERE and ORDER BY?

I am looking to understand how a query with both WHERE and ORDER BY can be indexed properly. Say I have a query like:
SELECT *
FROM users
WHERE id IN (1, 2, 3, 4)
ORDER BY date_created
LIMIT 3
With an index on date_created, it seems like the execution plan will prefer to use the PRIMARY key and then sort the results itself. This seems to be very slow when it needs to sort a large amount of results.
I was reading through this guide on indexing for ordered queries which mentions an almost identical example and it mentions:
If the database uses a sort operation even though you expected a pipelined execution, it can have two reasons: (1) the execution plan with the explicit sort operation has a better cost value; (2) the index order in the scanned index range does not correspond to the order by clause.
This makes sense to me but I am unsure of a solution. Is there a way to index my particular query and avoid an explicit sort or should I rethink how I am approaching my query?
The Optimizer is caught between a rock and a hard place.
Plan A: Use an index starting with id; collect however many rows that is; sort them; then deliver only 3. The downside: If the list is large and the ids are scattered, it could take a long time to find all the candidates.
Plan B: Use an index starting with date_created, scanning in order and filtering on id until it finds 3 matching rows. The downside: it may have to scan most of the table before it finds 3.
If you know that the query will always work better with one query plan than the other, you can use an "index hint". But, when you get it wrong, it will be a slow query.
A partial answer... If * contains bulky columns, both approaches may be hauling around stuff that will eventually be tossed. So, let's minimize that:
SELECT u.*
FROM ( SELECT id
       FROM users
       WHERE id IN (1, 2, 3, 4)
       ORDER BY date_created
       LIMIT 3            -- not repeated
     ) AS x
JOIN users AS u USING(id)
ORDER BY date_created;    -- repeated
Together with
INDEX(date_created, id),
INDEX(id, date_created)
Hopefully, the Optimizer will pick one of those "covering" indexes to perform the "derived table" (subquery). If so, that part will be performed fairly efficiently. Then the JOIN will look up the rest of the columns for the 3 desired rows.
If you want to discuss further, please provide
SHOW CREATE TABLE.
How many ids you are likely to have.
Why you are not already JOINing to another table to get the ids.
Approximately how many rows in the table.
Your best bet might be to write this in a more complicated way:
SELECT u.*
FROM ( ( SELECT u.*
         FROM users u
         WHERE id = 1
         ORDER BY date_created
         LIMIT 3
       )
       UNION ALL
       ( SELECT u.*
         FROM users u
         WHERE id = 2
         ORDER BY date_created
         LIMIT 3
       )
       UNION ALL
       ( SELECT u.*
         FROM users u
         WHERE id = 3
         ORDER BY date_created
         LIMIT 3
       )
       UNION ALL
       ( SELECT u.*
         FROM users u
         WHERE id = 4
         ORDER BY date_created
         LIMIT 3
       )
     ) u
ORDER BY date_created
LIMIT 3;
Each of the subqueries will now use an index on users(id, date_created). The outer query is then sorting at most 12 rows, which should be trivial from a performance perspective.
You could create a composite index on (id, date_created) - that will give the engine the option of using an index for both steps - but the optimiser may still choose not to.
If there aren't many rows in your table, or it thinks the result set will be small, it's quicker to sort after the fact than to traverse the index tree.
If you really think you know better than the optimiser (which you don't), you can use index hints to tell it what to do, but this is almost always a bad idea.

Optimize MySQL indexes for a multiple-table JOIN

I have 5 tables in MySQL, and when I execute my query it takes too long. Here is the structure of my tables:
reciept (23,799,640 rows)
reciept_goods (39,398,989 rows)
good (17,514 rows)
good_categories (121 rows)
retail_category (10 rows)
(table structure screenshots omitted)
My indexes:
Date --> reciept(date)
reciept_goods_index --> reciept_goods(recieptId, shopId, goodId)
category_id --> good(category_id)
Here is my SQL query:
SELECT
R.shopId,
sales,
sum(Amount) as sum_amount,
count(distinct R.id) as count_reciept,
RC.id,
RC.name
FROM
reciept R
JOIN reciept_goods RG
ON R.id = RG.RecieptId
AND R.ShopID = RG.ShopId
JOIN good G
ON RG.GoodId = G.id
JOIN good_categories GC
ON G.category_id = GC.id
JOIN retail_category RC
ON GC.retail_category_id = RC.id
WHERE
R.date >= '2018-01-01 10:00:00'
GROUP BY
R.shopId,
R.sales,
RC.id
EXPLAIN on this query (screenshot omitted) gives an execution time of 236 sec. If I use STRAIGHT_JOIN good ON (good.id = reciept_goods.GoodId), the plan changes (screenshot omitted) and the execution time drops to 31 sec:
SELECT STRAIGHT_JOIN ... rest of query
I think the problem is in the indexes of my tables, but I don't understand how to fix them. Can someone help me?
With about 2% of the rows in reciept having the matching date, the second execution plan (with STRAIGHT_JOIN) seems to be the right execution order. You should be able to optimize it by adding the following covering indexes:
reciept(date, sales)
reciept_goods(recieptId, shopId, goodId, amount)
I assume that the column order in your primary key for reciept_goods is currently (goodId, recieptId, shopId) (or (goodId, shopId, recieptId)). You could change that to (recieptId, shopId, goodId); judging by the table name, you may have wanted to do this anyway. In that case, you do not need the second index (at least for this query). I would assume this primary key is what made MySQL pick the slower execution plan (because it estimated that plan would be faster) - although sometimes it's just bad statistics, especially on a test server.
With those covering indexes, MySQL should take the faster explain plan even without straight_join, if it doesn't, just add it again (although I would like a look at both executions plans then). Also check that those two new indexes are used in the explain plan, otherwise I may have missed a column.
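Written out as DDL, the two suggested covering indexes might look like this (index names are invented; column names are assumed from the question):

```sql
-- Covers the date filter plus the sales grouping column.
ALTER TABLE reciept
  ADD INDEX idx_date_sales (date, sales);

-- Covers the join and aggregation columns, so the join never touches base rows.
ALTER TABLE reciept_goods
  ADD INDEX idx_reciept_shop_good_amount (recieptId, shopId, goodId, amount);
```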
It looks like you are depending on walking through a couple of many-to-many tables. Many people design them inefficiently.
Here I have compiled a list of 7 tips on making mapping tables more efficient. The most important is use of composite indexes.
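The most important of those tips, sketched as a schema (an illustrative layout with names borrowed from the question, not the poster's actual DDL): a composite primary key in one direction plus a secondary composite index in the other, so lookups from either side are index-only.

```sql
CREATE TABLE reciept_goods (
  recieptId INT UNSIGNED NOT NULL,
  shopId    INT UNSIGNED NOT NULL,
  goodId    INT UNSIGNED NOT NULL,
  amount    DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (recieptId, shopId, goodId),  -- receipt -> goods lookups
  INDEX      (goodId, recieptId, shopId)    -- good -> receipts lookups
) ENGINE=InnoDB;
```

Note there is no surrogate AUTO_INCREMENT id; for a pure mapping table the composite key itself is usually enough, and it keeps both indexes covering.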

Improve indexing to speed up slow query

SELECT SQL_NO_CACHE TIME_FORMAT(ADDTIME(journey.departure
, SEC_TO_TIME(SUM(link2.elapsed))), '%H:%i') AS departure
FROM journey
JOIN journey_day
ON journey_day.journey = journey.code
JOIN pattern
ON pattern.code = journey.pattern
JOIN service
ON service.code = pattern.service
JOIN link
ON link.section = pattern.section
AND link.stop = "370023591"
JOIN link link2
ON link2.section = pattern.section
AND link2.id <= link.id
WHERE journey_day.day = 6
GROUP BY journey.id
ORDER BY journey.departure
The above query takes 1-2 seconds to run. I need to reduce this to roughly 100ms. Please note that I understand the service table hasn't been used in the query, but that is just to simplify the question.
Any ideas how I can speed this up? I can see that the link table is using filesort, is this causing the slowness in the query?
One thought is that you could explicitly optimize the selection of the link-table record with the minimum id value. A temporary table or a materialized WITH (CTE) are two ways to produce that intermediate result set. To pick the minimum id, order by id, assign a windowed ROW_NUMBER(), and select the row numbered 1.
Well-planned indexes can be crucial to performance. From what you have presented, I would start with the following specific indexes. These are all covering indexes that qualify all the joins and criteria you will be working with; covering indexes help because the engine can get all the data that qualifies without having to go to the raw data pages.
Most specifically, starting with your journey table, I would explicitly have the composite index based on all 3 fields in the order I have them... Day first as that is your WHERE criteria, then ID as that is the GROUP BY and finally DEPARTURE for your ORDER BY clause.
The LINK table based on section and stop first as those are the criteria as joined to the journey table. ID next as it is basis of joining to LINK2, and finally ELAPSED for your field criteria selection.
table index
journey (day, id, departure)
link (section, stop, id, elapsed)
pattern (code, service, section)
service (code)
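As DDL, those suggestions might look like the following (index names are invented, and this assumes the column placement the answer proposes):

```sql
ALTER TABLE journey ADD INDEX idx_day_id_departure     (day, id, departure);
ALTER TABLE link    ADD INDEX idx_section_stop_id_elap (section, stop, id, elapsed);
ALTER TABLE pattern ADD INDEX idx_code_service_section (code, service, section);
ALTER TABLE service ADD INDEX idx_code                 (code);
```

After adding them, re-run the query with EXPLAIN and check that each table shows one of these indexes in the key column and, ideally, "Using index" under Extra.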

Optimize mysql query using indexes

I have a problem with this query:
SELECT DISTINCT s.city, pc.start, pc.end
FROM postal_codes pc LEFT JOIN suspects s ON (s.postalcode BETWEEN pc.start AND pc.end)
WHERE pc.user_id = "username"
ORDER BY pc.start
Suspects table has about 340,000 entries; there is an index on postalcode. I have several users, but this individual query takes about 0.5s. When I run this SQL with EXPLAIN, I get something like this: http://my.jetscreenshot.com/7536/20111225-myhj-41kb.jpg - do these NULLs mean that the query isn't using an index? The index is a BTREE, so I think this should run a little faster.
Can you please help me with this? If there are any other informations needed just let me know.
Edit: I have indexes on suspects.postalcode, postal_codes.start, postal_codes.end, postal_codes.user_id.
Basically what I'm trying to achieve: I have a table where each user ID has multiple postalcode ranges assigned, so it looks like:
user_id | start | end
Then I have a table of suspects where each suspect has an address (which contains a postalcode), so in this query I'm trying to get the postalcode range - start and end - and also the name of the city in this range.
Hope this helps.
When a LEFT JOIN is used, all the records of the left table are picked up rather than being selected via an index. I would suggest using an inner join, something like the query below.
select distinct
    s.city,
    pc.start,
    pc.end
from postal_codes pc
join suspects s
  on s.postalcode between pc.start and pc.end
where
    pc.user_id = "username"
order by pc.start
It's using only one index, and not for the fields involved in the join. Try creating an index for the start and end fields, or using >= and <= instead of BETWEEN
Not 100% sure, but this might be relevant:
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.
So try testing with LIMIT, and if it uses the index then, you found your cause.
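A quick way to run that test (a sketch; the postalcode range values here are made up):

```sql
-- Compare the two plans: without LIMIT the optimizer may estimate too many
-- matching rows and choose a full table scan; with LIMIT it may switch to
-- the postalcode index. Look at the "key" column in the EXPLAIN output.
EXPLAIN SELECT s.city FROM suspects s
WHERE s.postalcode BETWEEN 10000 AND 19999;

EXPLAIN SELECT s.city FROM suspects s
WHERE s.postalcode BETWEEN 10000 AND 19999
LIMIT 10;
```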
I have to say I'm a little confused by your table naming convention; I would expect the suspects table to have a user_id, not the postalcode, but you must have your reasons. If you were to leave this query as it is, you can add an index on postal_codes (start, end) to avoid the complete table scan.
I think you can restructure your query like following,
SELECT DISTINCT s.city, pc1.start, pc1.end
FROM (SELECT pc.start, pc.end FROM postal_codes pc WHERE pc.user_id = "username") AS pc1,
     suspects s
WHERE s.postalcode BETWEEN pc1.start AND pc1.end
ORDER BY pc1.start
Your query is not picking up the index on the s table because of the LEFT JOIN and your BETWEEN condition. Having an index on your table doesn't necessarily mean that it will be used in every query.
Try FORCE INDEX.
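The FORCE INDEX syntax, sketched against the question's tables (the index name idx_postalcode is an assumption; use the actual name shown by SHOW INDEX FROM suspects):

```sql
-- Force the optimizer to use the postalcode index on suspects.
SELECT DISTINCT s.city, pc.start, pc.end
FROM postal_codes pc
JOIN suspects s FORCE INDEX (idx_postalcode)
  ON s.postalcode BETWEEN pc.start AND pc.end
WHERE pc.user_id = "username"
ORDER BY pc.start;
```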

How can I improve the performance of this MySQL query?

I have a MySQL query:
SELECT DISTINCT
c.id,
c.company_name,
cd.firstname,
cd.surname,
cis.description AS industry_sector
FROM (clients c)
JOIN clients_details cd ON c.id = cd.client_id
LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
WHERE c.record_type='virgin'
ORDER BY date_action, company_name asc, id desc
LIMIT 30
The clients table has about 60-70k rows and has an index for 'id', 'record_type', 'date_action' and 'company_name' - unfortunately the query still takes 5+ secs to complete. Removing the 'ORDER BY' reduces this to about 30ms since a filesort is not required. Is there any way I can alter this query to improve upon the 5+ sec response time?
See: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Especially:
In some cases, MySQL cannot use indexes to resolve the ORDER BY (..). These cases include the following:
(..)
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
You have an index for id, record_type, date_action. But if you want to order by date_action, you really need an index that has date_action as the first field in the index, preferably matching the exact fields in the order by. Otherwise yes, it will be a slow query.
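Since the query also filters on record_type with an equality, one option (a sketch, not verified against the poster's schema) is a composite index with the constant filter first and the sort column second, so matching rows are read already in date_action order:

```sql
-- record_type is constant in the WHERE clause, so within this index
-- the rows are effectively sorted by date_action.
ALTER TABLE clients
  ADD INDEX idx_record_type_date_action (record_type, date_action);
```

Note the trailing "company_name asc, id desc" tie-breakers mix sort directions, so MySQL versions before 8.0 may still need a sort for those, but only over rows already narrowed by the index.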
Without seeing all your tables and indexes, it's hard to tell. When asking a question about speeding up a query, the query is just part of the equation.
Does clients have an index on id?
Does clients have an index on record_type
Does clients_details have an index on client_id?
Does clients_industry_sectors have an index on id?
These are the minimum you need for this query to have any chance of working quickly.
Thanks so much for the input and suggestions. In the end I've decided to create a new DB table whose sole purpose is to return results for this query, so no joins are required; I just update it when records are added to or deleted from the master clients table. Not ideal from a data-storage point of view, but it solves the problem and means I'm getting results fantastically fast. :)