Rewriting MySQL subquery to be a join

I have a table with a lot of rows, and it's very inefficient to do subqueries on. I can't wrap my head around how to do a join on the data to save time.
Here is what I have:
http://sqlfiddle.com/#!2/6ab0c/3/0

This is a bit long for a comment.
First, I think you are missing an ORDER BY in the subquery; I suspect you want ORDER BY I2.date to get the "next" row.
Second, MySQL doesn't quite offer the functionality you need. One way to speed up the query is to rewrite it using user variables (a sketch follows the query below), but because you don't describe what the query is meant to do, it is hard to be sure such a rewrite would be correct.
Third, this query would be much faster -- and probably fast enough -- with an index on items(location, sku, date). That index is probably all you need.
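In CREATE INDEX terms, that suggestion would be something like this (the index name is made up):
CREATE INDEX idx_items_location_sku_date ON Items (location, sku, date);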

SELECT I1.*, MIN(I2.exit_date)
FROM Items I1
LEFT JOIN (
    SELECT date AS exit_date, location, sku
    FROM Items
    ORDER BY date ASC
) AS I2
    ON I2.exit_date > I1.date
    AND I2.location = I1.location
    AND I2.sku = I1.sku
GROUP BY I1.id
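For the user-variable rewrite mentioned above, here is a heavily hedged sketch, assuming the goal is to attach to each row the date of the next row for the same (location, sku). MySQL documents the evaluation order of expressions that both read and assign user variables in one statement as undefined, so this is a pattern that works in practice on 5.x rather than guaranteed behavior; on MySQL 8.0+ the LEAD() window function is the proper replacement.
SELECT id, location, sku, date, next_date AS exit_date
FROM (
    SELECT
        t.id, t.location, t.sku, t.date,
        -- the previously seen row (same group, newer date) supplies this row's "next" date
        IF(@loc = location AND @sku = sku, @d, NULL) AS next_date,
        @loc := location,
        @sku := sku,
        @d := date
    FROM Items t
    CROSS JOIN (SELECT @loc := NULL, @sku := NULL, @d := NULL) init
    ORDER BY location, sku, date DESC
) x
ORDER BY location, sku, date;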

Related

Mysql Select INNER JOIN with order by very slow

I'm trying to speed up a MySQL query. The Listings table has several million rows. Without the sort I get the result in 0.1 seconds, but once I sort it takes 7 seconds. What can I improve to speed up the query?
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc
ON l.id=lc.list_id
AND lc.cat_id='2058'
INNER JOIN locations loc
ON l.location_id=loc.id
WHERE l.location_id
IN (7841,7842,7843,7844,7845,7846,7847,7848,7849,7850,7851,7852,7853,7854,7855,7856,7857,7858,7859,7860,7861,7862,7863,7864,7865,7866,7867,7868,7869,7870,7871,7872,7873,7874,7875,7876,7877,7878,7879,7880,7881,7882,7883,7884,7885,7886,7887,7888,7889,7890,7891,7892,7893,7894,7895,7896,7897,7898,7899,7900,7901,7902,7903)
ORDER BY date DESC
LIMIT 0,10;
EXPLAIN SELECT: Using Index l=date, loc=primary, lc=primary
Such performance questions are really difficult to answer and depend on the setup, indexes, etc. There will likely not be one and only one solution, and not even clearly correct or incorrect attempts to improve the speed; it is a lot of trial and error. Anyway, some points I noted which often cause performance issues:
Avoid conditions within joins that belong in the WHERE clause instead. A join should contain only the columns being joined, no further conditions, so the "lc.cat_id='2058'" should be put in the WHERE clause.
Using IN is often slow. You could try replacing it with OR (l.location_id = 7841 OR l.location_id = 7842 OR ...)
Open the query execution plan and check whether there is something useful for you.
Try to find out if there are special cases/values within the affected columns which slow down your query
Change "ORDER BY date" to "ORDER BY tablealias.date" and check if this makes a difference in performance. Even if not, it is better to read.
If you can rename the column "date", do this because using SQL keywords as table name or column name is no good idea. I'm unsure if this influences the performance, but it should be avoided if possible.
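For the renaming point, a sketch (the new name listing_date is made up; RENAME COLUMN needs MySQL 8.0, older versions use CHANGE and must repeat the column's full definition):
ALTER TABLE listings RENAME COLUMN date TO listing_date;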
Good luck!
You can try additional indexes to speed up the query, but you'll have a tradeoff when creating/manipulating data.
These combined keys could speed up the query:
listings: date, location_id
listings_categories: cat_id, list_id
Since the plan says it uses the date index, with the new index there would be no need to read the record to check location_id, and the same goes for the join with listings_categories: an index read would be enough.
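In CREATE INDEX form, those combined keys would be (index names are made up):
CREATE INDEX idx_listings_date_location ON listings (date, location_id);
CREATE INDEX idx_lc_cat_list ON listings_categories (cat_id, list_id);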
l: INDEX(location_id, id)
lc: INDEX(cat_id, list_id)
If those don't suffice, try the following rewrite.
SELECT l2.*
FROM
(
SELECT l1.id
FROM listings AS l1
JOIN listings_categories AS lc ON lc.list_id = l1.id
JOIN locations AS loc ON loc.id = l1.location_id
WHERE lc.cat_id='2058'
AND l1.location_id IN (7841, ..., 7903)
ORDER BY l1.date DESC
LIMIT 0,10
) AS x
JOIN listings l2 ON l2.id = x.id
ORDER BY l2.date DESC
With
listings: INDEX(location_id, date, id)
listings_categories: INDEX(cat_id, list_id)
The idea here is to get the 10 ids from the index before reaching into the table itself. Your version is probably shoveling around the whole table before sorting, and only then delivering the 10.

How to make an efficient Count of rows from another table (join)

I have a database of 100,000 names in cemeteries. The cemeteries number around 6,000. I wish to return the number of names in each cemetery.
If i do an individual query, it takes a millisecond
SELECT COUNT(*) FROM tblnames
WHERE tblcemetery_ID = 2
My actual query goes on and on, and I end up killing it so I don't kill our database. Can someone point me at a more efficient method?
select tblcemetery.id,
(SELECT COUNT(*) FROM tblnames
WHERE tblcemetery_ID = tblcemetery.id) AS casualtyCount
from tblcemetery
ORDER BY
fldcemetery
You can rephrase your query to use a join instead of a correlated subquery:
SELECT
t1.id,
COUNT(t2.tblcemetery_ID) AS casualtyCount
FROM tblcemetery t1
LEFT JOIN tblnames t2
ON t1.id = t2.tblcemetery_ID
GROUP BY
t1.id
ORDER BY
t1.id
I have heard that in certain databases, such as Oracle, the optimizer is smart enough to figure out what I wrote above, and would refactor your query under the hood. But the MySQL optimizer might not be smart enough to do this.
One nice side effect of this refactor is that we now see an opportunity to improve performance even more, by adding indices to the join columns. I am assuming that id is the primary key of tblcemetery, in which case it is already indexed. But you could add an index to tblcemetery_ID in the tblnames table for a possible performance boost:
CREATE INDEX cmtry_idx ON tblnames (tblcemetery_ID)
It could even be done without a JOIN, keeping an EXISTS clause to skip empty cemeteries and a correlated subquery for the count (with the index above in place, those lookups are cheap):
SELECT id,
    (SELECT COUNT(*) FROM tblnames WHERE tblcemetery_ID = tblcemetery.id) AS casualtyCount
FROM tblcemetery
WHERE EXISTS (SELECT 1 FROM tblnames WHERE tblcemetery_ID = tblcemetery.id)
ORDER BY id
Or you could read up on GROUP BY and do something like
SELECT tblcemetery_ID, SUM(1) FROM tblnames GROUP BY tblcemetery_ID
You essentially sum up 1 for each name entry that belongs to the cemetery; since you are not interested in the names at all, there is no need to join to the cemetery detail table.
Not sure if SUM(1) or COUNT(*) is better; both should work.
You'll only get cemeteries that have people in them, though.
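If you need the empty cemeteries to show up with a count of 0, here is a sketch that LEFT JOINs this aggregate back to the cemetery table (table and column names taken from the question):
SELECT c.id, COALESCE(n.casualtyCount, 0) AS casualtyCount
FROM tblcemetery c
LEFT JOIN (
    SELECT tblcemetery_ID, COUNT(*) AS casualtyCount
    FROM tblnames
    GROUP BY tblcemetery_ID
) n ON n.tblcemetery_ID = c.id
ORDER BY c.id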

Query Speed Issue with NOT EXISTS condition

I have a query that works, but it is slow. Is there a way to speed this up? Basically I have a table with timecard entries, and a second table with time breakdowns of each entry, related by the TimecardID. What I am looking for is time blocks that have no breakdowns. I thought that cutting the criteria down to 2 months would speed it up. Thanks for your help.
SELECT * FROM Timecards
WHERE NOT EXISTS (SELECT TimeCardID FROM TimecardBreakdown WHERE Timecards.ID = TimecardBreakdown.TimeCardID)
AND Status <> 0
AND DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
It seems you want the Timecards rows whose IDs do not exist in the TimecardBreakdown table, in which case you can use a left outer join.
SELECT a.*
FROM Timecards a
LEFT OUTER JOIN TimecardBreakdown b ON a.ID = b.TimeCardID
WHERE b.TimeCardID IS NULL
This would get rid of the subquery (which is expensive) and use join (which is more efficient).
MySQL stinks at doing correlated subqueries fast. Try to make your subqueries independent and join them. You can use the LEFT JOIN ... IS NULL pattern to replace WHERE NOT EXISTS.
SELECT tc.*
FROM Timecards tc
LEFT JOIN TimecardBreakdown tcb ON tc.ID = tcb.TimeCardId
WHERE tc.DateIn >= CURRENT_DATE() - INTERVAL 2 MONTH
AND tc.Status <> 0
AND tcb.TimeCardId IS NULL
Some optimization points.
First, if you can change tc.Status <> 0 to tc.Status > 0 it makes an index range scan possible on that column.
Second, when you're optimizing stuff, SELECT * is considered harmful. Instead, if you can give the names of just the columns you need, things will be quicker. The database server has to sling around all the data you ask for; it can't tell if you're going to ignore some of it.
Third, this query will be helped by a compound index on Timecards (DateIn, Status, ID). That compound index can be used to do the heavy lifting of satisfying your query conditions.
That's called a covering index; it contains the data needed to satisfy much of your query. If you were to index just the DateIn column, then the query handler would have to bounce back to the main table to find the values of Status and ID. When those columns appear in the index, it saves that extra operation.
If you SELECT a certain set of columns rather than doing SELECT *, including those columns in the covering index can dramatically improve query performance. That's one of several reasons SELECT * is considered harmful.
(Some makes and models of DBMS have ways to specify lists of columns that ride along on indexes without actually being indexed. MySQL requires you to index them. But covering indexes still help.)
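In CREATE INDEX form, the compound index suggested above would be something like this (the index name is illustrative):
CREATE INDEX idx_timecards_datein_status_id ON Timecards (DateIn, Status, ID);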
Read this: http://use-the-index-luke.com/

Query takes too long to run

I am running the below query to retrieve the unique latest result based on a date field within the same table. But this query takes too much time as the table grows. Any suggestion to improve this is welcome.
SELECT t2.*
FROM (
    SELECT (
        SELECT id
        FROM ctc_pre_assets ti
        WHERE ti.ctcassettag = t1.ctcassettag
        ORDER BY ti.createddate DESC
        LIMIT 1
    ) lid
    FROM (
        SELECT DISTINCT ctcassettag
        FROM ctc_pre_assets
    ) t1
) ro,
ctc_pre_assets t2
WHERE t2.id = ro.lid
ORDER BY id
Our table may contain the same row multiple times, but each with a different timestamp. My objective: based on a single column, for example assettag, I want to retrieve a single row for each assettag with the latest timestamp.
It's simpler, and probably faster, to find the newest date for each ctcassettag and then join back to find the whole row that matches.
This does assume that no ctcassettag has multiple rows sharing the same latest createddate; where one does, you can get back more than one row for that ctcassettag.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
INNER JOIN
(
SELECT
ctcassettag,
MAX(createddate) AS createddate
FROM
ctc_pre_assets
GROUP BY
ctcassettag
)
newest
ON newest.ctcassettag = ctc_pre_assets.ctcassettag
AND newest.createddate = ctc_pre_assets.createddate
ORDER BY
ctc_pre_assets.id
EDIT: To deal with multiple rows with the same date.
You haven't actually said how to pick which row you want in the event that multiple rows are for the same ctcassettag on the same createddate. So, this solution just chooses the row with the lowest id from amongst those duplicates.
SELECT
ctc_pre_assets.*
FROM
ctc_pre_assets
WHERE
ctc_pre_assets.id
=
(
SELECT
lookup.id
FROM
ctc_pre_assets lookup
WHERE
lookup.ctcassettag = ctc_pre_assets.ctcassettag
ORDER BY
lookup.createddate DESC,
lookup.id ASC
LIMIT
1
)
This does still use a correlated sub-query, which is slower than a simple nested sub-query (such as my first answer), but it does deal with the "duplicates".
You can change the rules on which row to pick by changing the ORDER BY in the correlated sub-query.
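For example, to prefer the newest duplicate instead (the highest id among rows sharing the latest createddate), only the sub-query's ORDER BY changes:
ORDER BY
    lookup.createddate DESC,
    lookup.id DESC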
It's also very similar to your own query, but with one less join.
Nested queries are known to take longer than conventional queries. Can you prepend EXPLAIN to the query and put your results here? That will help us analyse the exact query/table which is taking longer to respond.
Check if the table has indexes. Unindexed tables are not advisable (unless obviously required to be unindexed) and are alarmingly slow in executing queries.
On the contrary, I think the best case is to avoid writing nested queries altogether. Better, run each of the queries separately and then use the results (in array or list format) in the second query.
First some questions that you should at least ask yourself, but maybe also give us an answer to improve the accuracy of our responses:
Is your data normalized? If yes, maybe you should make an exception to avoid this brutal subquery problem
Are you using indexes? If yes, which ones, and are you using them to the fullest?
Some suggestions to improve the readability and maybe performance of the query:
- Use joins
- Use group by
- Use aggregators
Example (untested, so might not work, but should give an impression):
SELECT t2.*
FROM (
SELECT id AS lid
FROM ctc_pre_assets
GROUP BY ctcassettag
HAVING createddate = max(createddate)
ORDER BY ctcassettag DESC
) ro
INNER JOIN ctc_pre_assets t2 ON t2.id = ro.lid
ORDER BY id
Using normalization is great, but there are a few cases where normalization causes more harm than good. This seems like such a situation, but without your tables in front of me I can't tell for sure.
Using DISTINCT the way you are doing, I can't help but get the feeling you might not get all relevant results - maybe someone else can confirm or deny this?
It's not that subqueries are all bad, but they tend to create massive scalability issues if written incorrectly. Make sure you use them the right way (google it?).
Indexes can potentially save you a lot of time - if you actually use them. It's not enough to set them up; you have to write queries that actually use your indexes (google this as well); see the sketch below.
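As a hedged example for this table, a composite index that the latest-date-per-tag lookups above could actually use (the name is made up; if id is the primary key, InnoDB appends it to secondary indexes automatically):
CREATE INDEX idx_assets_tag_created ON ctc_pre_assets (ctcassettag, createddate);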

MySQL Joins, Group By, and Ordering the Group By Choice

Is it possible to order the results that GROUP BY picks in a MySQL query without using a subquery? I'm finding that, with my large dataset, the subquery adds a significant amount of load time to my query.
Here is a similar situation: how to sort order of LEFT JOIN in SQL query?
This is my code that works, but it takes way too long to load:
SELECT tags.contact_id, n.last
FROM tags
LEFT JOIN ( SELECT * FROM names ORDER BY timestamp DESC ) n
ON (n.contact_id=tags.contact_id)
WHERE tags.tag='$tag'
GROUP BY tags.contact_id
ORDER BY n.last ASC;
I can get a fast result doing a simple join with a table name, but the GROUP BY gives me the first row of the joined table, not the last row.
I'm not really sure what you're trying to do. Here are some of the problems with your query:
selecting n.last, although it is neither in the group by clause, nor an aggregate value. Although MySQL allows this, it's really not a good idea to take advantage of.
needlessly sorting a table before joining, instead of just joining
the subquery isn't really doing anything
I would suggest carefully writing down the desired query results, i.e. "I want the contact id and latest date for each tag" or something similar. It's possible that will lead to a natural, easy-to-write and semantically correct query that is also more efficient than what you showed in the OP.
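For instance, if the goal really is "the contact id and latest date for each tag", a sketch of the natural query (assuming names.timestamp is the date that matters):
SELECT t.contact_id, MAX(n.timestamp) AS latest
FROM tags t
JOIN names n ON n.contact_id = t.contact_id
WHERE t.tag = '$tag'
GROUP BY t.contact_id
ORDER BY latest DESC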
To answer the question "is it possible to order a GROUP BY query": yes, it's quite easy, and here's an example:
select a, b, sum(c) as `c sum`
from <table_name>
group by a,b
order by `c sum`
You are doing a LEFT JOIN on contact ID, which implies you want all tag contacts REGARDLESS of finding a match in the names table. Is that really the case, or will the tags table ALWAYS have a matching contact ID in names? Additionally, about your column n.last: is this the person's last name, or the last time something was done (which I would assume is actually the timestamp)?
So, that being said, I would just do a simple direct join
SELECT DISTINCT
t.contact_id,
n.last
FROM
tags t
JOIN names n
ON t.contact_id = n.contact_id
WHERE
t.tag = '$tag'
ORDER BY
n.last ASC