Best way to write this query? - mysql

I am doing a sub-query join to another table as I wanted to be able to sort the results I got back with it, I only need the first row but I need them ordered in a certain way so I would get the lowest id.
I tried adding LIMIT 1 to this but then the full query returned 0 results; so now it has no limit and in the EXPLAIN I have two rows showing they are using the full 10k+ rows of the auction_media table.
I wrote it this way to avoid having to query the auction_media table for each row separately, but now I'm thinking that this way isn't that great if it has to use the whole auction_media table?
Which way is better? The way I have it or querying the auction_media table separately? ...or is there a better way!?
Here is the code:
SELECT
a.auction_id,
a.name,
media.media_url
FROM
auctions AS a
LEFT JOIN users AS u ON u.user_id=a.owner_id
INNER JOIN ( SELECT media_id,media_url,auction_id
FROM auction_media
WHERE media_type=1
AND upload_in_progress=0
ORDER BY media_id ASC
) AS media
ON a.auction_id=media.auction_id
WHERE a.hpfeat=1
AND a.active=1
AND a.approved=1
AND a.closed=0
AND a.creation_in_progress=0
AND a.deleted=0
AND (a.list_in='auction' OR u.shop_active='1')
GROUP BY a.auction_id;
Edit: Through my testing, using the above query seems like it would be the much faster method overall; however I worry if that will still be the case when the auction_media table grows to like 1M rows or something.

edit: As stated in the comments - DISTINCT is not required because the auctions table can only be associated with (at most) one user table row and one row in the inner query.
You may want to try this. The outer query's GROUP BY is replaced with DISTINCT since you don't have any aggregate function. The inner query, was replaced by a query to find the smallest media_id per auction_id, then JOINed back to get the media_url. (Since I didn't know if the media_id and auction_id were a composite unique key, I used the same WHERE clause to help eliminate potential duplicates.)
SELECT
a.auction_id,
a.name,
auction_media.media_url
FROM auctions AS a
LEFT JOIN users AS u
ON u.user_id=a.owner_id
INNER JOIN (SELECT auction_id, MIN(media_id) AS media_id
FROM auction_media
WHERE media_type=1
AND upload_in_progress=0
GROUP BY auction_id) AS media
ON a.auction_id=media.auction_id
INNER JOIN auction_media
ON auction_media.media_id = media.media_id
AND auction_media.auction_id = media.auction_id
AND auction_media.media_type=1
AND auction_media.upload_in_progress=0
WHERE a.hpfeat=1
AND a.active=1
AND a.approved=1
AND a.closed=0
AND a.creation_in_progress=0
AND a.deleted=0
AND (a.list_in='auction' OR u.shop_active='1');

Related

How can you LEFT JOIN a table to a table being joined ON a sub-select?

I am using the following query to join tables together and it seems to be working although I can't really understand why:
SELECT
listing.name,
biggestLot.price,
FROM Listings listing
LEFT JOIN Lots biggestLot ON biggestLot.lotNumber = (
SELECT lotNumber
FROM Lots
WHERE lotNumber IN
(
SELECT lotNumber
FROM Listings listingInner
WHERE listingInner.listingNumber = listing.listingNumber
)
ORDER BY size DESC
LIMIT 1
)
GROUP BY listing.listingNumber
The context is that each "Listing" is associated with one or more "Lots" through a "lotNumber" field. The tables aren't normalized; i.e. for any given Listing, there may be one or more rows in the "Listings" table that have duplicate values for all fields except for "lotNumber". The intent of the above query is to find the price of the biggest Lot associated with each Listing.
The inner SELECT is getting the largest Lot for a given Listing and I understand how that works. What I don't understand is how the LEFT JOIN merges the biggestLot result with the outer SELECT Listings query. I'm not specifying an ON field to combine the two tables so how does it know how to combine the results?
You could can use window functions along with the JOIN:
SELECT l.name, lo.price,
FROM Listings l LEFT JOIN
(SELECT lo.*,
ROW_NUMBER() OVER (PARITION BY lo.lotNumber ORDER BY lo.size DESC) as seqnum
FROM Lots lo
) lo
USING (lotNmber);

MySQL: Optimizing Sub-queries

I have this query I need to optimize further since it requires too much cpu time and I can't seem to find any other way to write it more efficiently. Is there another way to write this without altering the tables?
SELECT category, b.fruit_name, u.name
, r.count_vote, r.text_c
FROM Fruits b, Customers u
, Categories c
, (SELECT * FROM
(SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r
WHERE b.fruit_id = r.fruit_id
AND u.customer_id = r.customer_id
AND category = "Fruits";
This is your query re-written with explicit joins:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN
(
SELECT * FROM
(
SELECT *
FROM Reviews
ORDER BY fruit_id, count_vote DESC, r_id
) a
GROUP BY fruit_id
) r on r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
CROSS JOIN Categories c
WHERE c.category = 'Fruits';
(I am guessing here that the category column belongs to the categories table.)
There are some parts that look suspicious:
Why do you cross join the Categories table, when you don't even display a column of the table?
What is ORDER BY fruit_id, count_vote DESC, r_id supposed to do? Sub query results are considered unordered sets, so an ORDER BY is superfluous and can be ignored by the DBMS. What do you want to achieve here?
SELECT * FROM [ revues ] GROUP BY fruit_id is invalid. If you group by fruit_id, what count_vote and what r.text_c do you expect to get for the ID? You don't tell the DBMS (which would be something like MAX(count_vote) and MIN(r.text_c)for instance. MySQL should through an error, but silently replacescount_vote, r.text_cbyANY_VALUE(count_vote), ANY_VALUE(r.text_c)` instead. This means you get arbitrarily picked values for a fruit.
The answer hence to your question is: Don't try to speed it up, but fix it instead. (Maybe you want to place a new request showing the query and explaining what it is supposed to do, so people can help you with that.)
Your Categories table seems not joined/related to the others this produce a catesia product between all the rows
If you want distinct resut don't use group by but distint so you can avoid an unnecessary subquery
and you dont' need an order by on a subquery
SELECT category
, b.fruit_name
, u.name
, r.count_vote
, r.text_c
FROM Fruits b
INNER JOIN Customers u ON u.customer_id = r.customer_id
INNER JOIN Categories c ON ?????? /Your Categories table seems not joined/related to the others /
INNER JOIN (
SELECT distinct fruit_id, count_vote, text_c, customer_id
FROM Reviews
) r ON b.fruit_id = r.fruit_id
WHERE category = "Fruits";
for better reading you should use explicit join syntax and avoid old join syntax based on comma separated tables name and where condition
The next time you want help optimizing a query, please include the table/index structure, an indication of the cardinality of the indexes and the EXPLAIN plan for the query.
There appears to be absolutely no reason for a single sub-query here, let alone 2. Using sub-queries mostly prevents the DBMS optimizer from doing its job. So your biggest win will come from eliminating these sub-queries.
The CROSS JOIN creates a deliberate cartesian join - its also unclear if any attributes from this table are actually required for the result, if it is there to produce multiples of the same row in the output, or just an error.
The attribute category in the last line of your query is not attributed to any of the tables (but I suspect it comes from the categories table).
Further, your code uses a GROUP BY clause with no aggregation function. This will produce non-deterministic results and is a bug. Assuming that you are not exploiting a side-effect of that, the query can be re-written as:
SELECT
category, b.fruit_name, u.name, r.count_vote, r.text_c
FROM Fruits b
JOIN Reviews r
ON r.fruit_id = b.fruit_id
JOIN Customers u ON u.customer_id = r.customer_id
ORDER BY r.fruit_id, count_vote DESC, r_id;
Since there are no predicates other than joins in your query, there is no scope for further optimization beyond ensuring there are indexes on the join predicates.
As all too frequently, the biggest benefit may come from simply asking the question of why you need to retrieve every single row in the tables in a single query.

MySql: order by along with group by - performance

I have the performance problem with query that have order by and group by. I have checked similar problems on SO but I did not find the solution to this:(
I have something like this in my db schema:
pattern has many pattern_file belongs to project_template which belongs to project
Now I want to get projects filtered by some data(additional tables that I join) and want to get the result ordered for example by projects.priority and grouped by patterns.id. I have tried many things and to get the desired result I've figured out this query:
SELECT DISTINCT `projects`.* FROM `projects`
INNER JOIN `project_templates` ON `project_templates`.`project_id` = `projects`.`id`
INNER JOIN `pattern_files` ON `pattern_files`.`id` = `project_templates`.`pattern_file_id`
INNER JOIN `patterns` ON `patterns`.`id` = `pattern_files`.`pattern_id`
...[ truncated ]
INNER JOIN (SELECT DISTINCT projects.id FROM `projects` INNER JOIN `project_templates` ON `project_templates`.`project_id` = `projects`.`id`
INNER JOIN `pattern_files` ON `pattern_files`.`id` = `project_templates`.`pattern_file_id`
INNER JOIN `patterns` ON `patterns`.`id` = `pattern_files`.`pattern_id`
...[ truncated ]
WHERE [here my conditions] ORDER BY [here my order]) P
ON P.id = projects.id
WHERE [here my conditions]
GROUP BY patterns.id
ORDER BY [here my order]
From my research I have to INNER JOIN with subquery to conquer the problem "ORDER BY before GROUPing BY" => then I have put the same conditions on the outer query for performance purpose. The order by I had to use again in the outer query too, otherwise the result will be sorted by default.
Now there is real performance problem as I have about 6k projects and when I run this query without any conditions it takes about 15s :/ When I narrow the result by specify the conditions the time drastically dropped down. I've found somewhere that the subquery is run for every outer query row result which could be true when you watch at the execution time :/
Could you please give some advice how I can optimize the query? I do not work much with sql so maybe I do it from the wrong side from the very beginning?
P.S. I have tried WHERE projects.id IN (Select project.id FROM projects ....) and that discarded the performance issue but also discarded the ORDER BY before GROUPing BY
EDIT.
I want to retrieve list of projects, but I want also to filter it and order, and finally I want to get patterns.id unique(that is why I use the group by).
order by in your inner query (p) doesn't make sense (any inner sort will only
have an arbitrary effect).
#Solarflare Unfortunately it does. group by will take first row from grouped result. It preserve the order for join. Well, I believe that it is specific to MySql. Furthermore to keep the order from subquery I could use ORDER BY NULL in outer query :-)
Also, select projects.* ... group by pattern.id is fishy (although MySQL, in contrast to every other dbms, allows you to do this)
so we can assume I retrieve only projects.id, but from docs:
MySQL extends the use of GROUP BY to permit selecting fields that are not mentioned in the GROUP BY clause

mysql joining efficiency - join with where then join with something else

I have a query that looks like this:
select `adverts`.*
from `adverts`
inner join `advert_category` on `advert_category`.`advert_id` = `adverts`.`id`
inner join `advert_location` on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
and `advert_category`.`category_id` = ?
order by `updated_at` desc
The problem here is I have a huge database and this response is absolutely ravaging my database.
What I really need is to do the first join, and then do there where clause. This will whittle down my response from like 100k queries to less than 10k, then I want to do the other join, in order to whittle down the responses again so I can get the advert_location on the category items.
Doing it as is just isn't viable.
So, how do I go about using a join and a where condition, and then after getting that response doing a further join with a where condition?
Thanks
This is your query, written a bit simpler so I can read it:
select a.*
from adverts a inner join
advert_category ac
on ac.advert_id = a.id inner join
advert_location al
on al.advert_id = a.id
where al.location_id = ? and
ac.category_id = ?
order by a.updated_at desc;
I am speculating that advert_category and advert_locations have multiple rows per advert. In that case, you are getting a Cartesian product for each advert.
A better way to write the query uses exists:
select a.*
from adverts a
where exists (select 1
from advert_location al
where al.advert_id = a.id and al.location_id = ?
) and
exists (select 1
from advert_category ac
where ac.advert_id = a.id and ac.category_id = ?
)
order by a.updated_at desc;
For this version, you want indexes on advert_location(advert_id, location_id), advert_category(advert_id, category_id), and probably advert(updated_at, id).
You can write the 1st join in a Derived Table including a WHERE-condition and then do the 2nd join (but a decent optimizer might resolve the Derived Table again and do what he thinks is best based on statistics):
select adverts.*
from
(
select `adverts`.*
from `adverts`
inner join `advert_category`
on `advert_category`.`advert_id` =`adverts`.`id`
where `advert_category`.`category_id` = ?
) as adverts
inner join `advert_location`
on `adverts`.`id` = `advert_location`.`advert_id`
where `advert_location`.`location_id` = ?
order by `updated_at` desc
MySQL will reorder inner joins for you during optimization, regardless of how you wrote them in your query. Inner join is the same in either direction (in algebra this is called commutative), so this is safe to do.
You can see the result of join reordering if you use EXPLAIN on your query.
If you don't like the order MySQL chose for your joins, you can override it with this kind of syntax:
from `adverts`
straight_join `advert_category` ...
https://dev.mysql.com/doc/refman/5.7/en/join.html says:
STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer processes the tables in a suboptimal order.
Once the optimizer has decided on the join order, it always does one join at a time, in that order. This is called the nested join method.
There isn't really any way to "do the join then do the where clause". Conditions are combined together when looking up rows for joined tables. But this is a good thing, because you can then create a compound index that helps match rows based on both join conditions and where conditions.
PS: When asking query optimization question, you should include the EXPLAIN output, and also run SHOW CREATE TABLE <tablename> for each table, and include the result. Then we don't have to guess at the columns and indexes in your table.

How can I make these two queries into one?

I have two tables, one for downloads and one for uploads. They are almost identical but with some other columns that differs them. I want to generate a list of stats for each date for each item in the table.
I use these two queries but have to merge the data in php after running them. I would like to instead run them in a single query, where it would return the columns from both queries in each row grouped by the date. Sometimes there isn't any download data, only upload data, and in all my previous tries it skipped the row if it couldn't find log data from both rows.
How do I merge these two queries into one, where it would display data even if it's just available in one of the tables?
SELECT DATE(upload_date_added) as upload_date, SUM(upload_size) as upload_traffic, SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
SELECT DATE(download_date_added) as download_date, SUM(download_size) as download_traffic, SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC
I want to get result rows like this:
date, upload_traffic, upload_files, download_traffic, download_files
All help appreciated!
Your two queries can be executed and then combined with the UNION cluase along with an extra field to identify Uploads and Downloads on separate lines:
SELECT
'Uploads' TransmissionType,
DATE(upload_date_added) as TransmissionDate,
SUM(upload_size) as TransmissionTraffic,
SUM(upload_files) as TransmittedFileCount
FROM
packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
UNION
SELECT
'Downloads',
DATE(download_date_added),
SUM(download_size),
SUM(download_files)
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC;
Give it a Try !!!
What you're asking can only work for rows that have the same add date for upload and download. In this case I think this SQL should work:
SELECT
DATE(u.upload_date_added) as date,
SUM(u.upload_size) as upload_traffic,
SUM(u.upload_files) as upload_files,
SUM(d.download_size) as download_traffic,
SUM(d.download_files) as download_files
FROM
packages_uploads u, packages_downloads d
WHERE u.upload_date_added = d.download_date_added
AND u.upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY date
ORDER BY date DESC
Without knowing the schema is hard to give the exact answer so please see the following as a concept not a direct answer.
You could try left join, im not sure if the table package exists but the following may be food for thought
SELECT
p.id,
up.date as upload_date
dwn.date as download_date
FROM
package p
LEFT JOIN package_uploads up ON
( up.package_id = p.id WHERE up.upload_date = 'etc' )
LEFT JOIN package_downloads dwn ON
( dwn.package_id = p.id WHERE up.upload_date = 'etc' )
The above will select all the packages and attempt to join and where the value does not join it will return null.
There is number of ways that you can do this. You can join using primary key and foreign key. In case if you do not have relationship between tables,
You can use,
LEFT JOIN / LEFT OUTER JOIN
Returns all records from the left table and the matched
records from the right table. The result is NULL from the
right side when there is no match.
RIGHT JOIN / RIGHT OUTER JOIN
Returns all records from the right table and the matched
records from the left table. The result is NULL from the left
side when there is no match.
FULL OUTER JOIN
Return all records when there is a match in either left or right table records.
UNION
Is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of,
columns The columns must also have similar data types The columns in,
each SELECT statement must also be in the same order.
INNER JOIN
Select records that have matching values in both tables. -this is good for your situation.
INTERSECT
Does not support MySQL.
NATURAL JOIN
All the column names should be matched.
Since you dont need to update these you can create a view from joining tables then you can use less query in your PHP. But views cannot update. And you did not mentioned about relationship between tables. Because of that I have to go with the UNION.
Like this,
CREATE VIEW checkStatus
AS
SELECT
DATE(upload_date_added) as upload_date,
SUM(upload_size) as upload_traffic,
SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
UNION
SELECT
DATE(download_date_added) as download_date,
SUM(download_size) as download_traffic,
SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC
Then anywhere you want to select you just need one line:
SELECT * FROM checkStatus
learn more.