Mysql subquery order when using IN - mysql

Does the order of the results inside a MySQL subquery affect the order of the results of the outer query?
I tried it but did not reach a clear conclusion, because sometimes it seemed to and sometimes it did not.
eg:
SELECT name FROM people WHERE pid IN (SELECT mid FROM member ORDER BY mdate)
Is the "order by"-clause going to affect the order of the results in this case?
Thanks.

No, it can't. If you want to control the order, it is better to use a JOIN.
Something like this:
select name
from people p inner join member m on p.pid = m.mid
order by p.name

Your outer query doesn't have ORDER BY; thus, order is not guaranteed.
I guess the only part that may be affected in this particular case is the optimizer, which might generate a different execution plan depending on how the results of the subquery are sorted...

Irrespective of whether or not the outer query results depend on the ORDER BY clause in the subquery, one should never depend on that order. If you need any particular order of the outer query results, you should explicitly use an ORDER BY clause on the outer query. AFAIK, it makes sense to use an ORDER BY clause in a subquery only if the subquery also limits its rows, for example with a TOP clause in its SELECT (SQL Server) or with LIMIT (MySQL).

It really can't. The data is coming from the from clause. Your subquery is in the where clause. It is just used to filter the rows. If you want the ordering:
select p.name
from people p join
     (select mid, min(mdate) as minmdate
      from member
      group by mid
     ) m
     on p.pid = m.mid
order by m.minmdate;
That is, join the two tables and order by a column from the joined result. I am assuming that a member (mid) could appear more than once in the member table, and that you want the earliest date associated with each member.
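The join-and-order approach above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are invented for illustration; only the table and column names (people, member, pid, mid, mdate, name) come from the question. The extra p.name tiebreaker in the ORDER BY is an addition here, so that rows with equal dates come back in a stable order.

```python
import sqlite3


def names_by_earliest_mdate(conn):
    """Return people names ordered by each member's earliest mdate."""
    sql = """
        SELECT p.name
        FROM people AS p
        JOIN (SELECT mid, MIN(mdate) AS minmdate
              FROM member
              GROUP BY mid) AS m
          ON p.pid = m.mid
        ORDER BY m.minmdate, p.name  -- the outer ORDER BY decides the order
    """
    return [row[0] for row in conn.execute(sql)]


# Hypothetical sample data; SQLite is used only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (pid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE member (mid INTEGER, mdate TEXT);
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO member VALUES (2, '2020-01-01'), (1, '2021-06-01'),
                              (2, '2022-03-01'), (3, '2019-05-01');
""")
ordered_names = names_by_earliest_mdate(conn)
print(ordered_names)
```

Member 2 appears twice in the sample data, which is why the derived table collapses each mid to its earliest date before joining.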

Related

MySql: order by along with group by - performance

I have a performance problem with a query that has ORDER BY and GROUP BY. I have checked similar problems on SO but did not find a solution to this :(
I have something like this in my db schema:
pattern has many pattern_file belongs to project_template which belongs to project
Now I want to get projects filtered by some data(additional tables that I join) and want to get the result ordered for example by projects.priority and grouped by patterns.id. I have tried many things and to get the desired result I've figured out this query:
SELECT DISTINCT `projects`.* FROM `projects`
INNER JOIN `project_templates` ON `project_templates`.`project_id` = `projects`.`id`
INNER JOIN `pattern_files` ON `pattern_files`.`id` = `project_templates`.`pattern_file_id`
INNER JOIN `patterns` ON `patterns`.`id` = `pattern_files`.`pattern_id`
...[ truncated ]
INNER JOIN (SELECT DISTINCT projects.id FROM `projects` INNER JOIN `project_templates` ON `project_templates`.`project_id` = `projects`.`id`
INNER JOIN `pattern_files` ON `pattern_files`.`id` = `project_templates`.`pattern_file_id`
INNER JOIN `patterns` ON `patterns`.`id` = `pattern_files`.`pattern_id`
...[ truncated ]
WHERE [here my conditions] ORDER BY [here my order]) P
ON P.id = projects.id
WHERE [here my conditions]
GROUP BY patterns.id
ORDER BY [here my order]
From my research, I have to INNER JOIN with a subquery to work around the "ORDER BY before GROUP BY" problem, so I put the same conditions on the outer query for performance reasons. I had to repeat the ORDER BY in the outer query too, otherwise the result would be sorted by default.
Now there is a real performance problem: I have about 6k projects, and when I run this query without any conditions it takes about 15s :/ When I narrow the result by specifying conditions, the time drops drastically. I've read somewhere that the subquery is run for every row of the outer query, which could be true judging by the execution time :/
Could you please give some advice on how I can optimize the query? I do not work much with SQL, so maybe I have approached it from the wrong side from the very beginning?
P.S. I have tried WHERE projects.id IN (SELECT projects.id FROM projects ....) and that removed the performance issue, but it also lost the ORDER BY before GROUP BY.
EDIT.
I want to retrieve a list of projects, but I also want to filter and order it, and finally I want patterns.id to be unique (that is why I use the GROUP BY).
The ORDER BY in your inner query (P) doesn't make sense (any inner sort will only have an arbitrary effect).
@Solarflare Unfortunately it does. GROUP BY will take the first row from the grouped result, and the order is preserved for the join. Well, I believe that is specific to MySQL. Furthermore, to keep the order from the subquery I could use ORDER BY NULL in the outer query :-)
Also, select projects.* ... group by pattern.id is fishy (although MySQL, in contrast to every other dbms, allows you to do this)
so we can assume I retrieve only projects.id, but from docs:
MySQL extends the use of GROUP BY to permit selecting fields that are not mentioned in the GROUP BY clause
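One way to avoid relying on which row GROUP BY happens to keep is to state the choice explicitly: compute the top priority per pattern in a derived table, then join back to it. The sketch below assumes a heavily simplified, hypothetical schema (a single project_pattern link table instead of the project_templates and pattern_files chain), assumes that a higher priority value wins, and breaks ties by the smallest project id. It runs on SQLite through Python's sqlite3 module purely for illustration.

```python
import sqlite3

# Hypothetical, simplified schema and data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, priority INTEGER);
    CREATE TABLE project_pattern (project_id INTEGER, pattern_id INTEGER);
    INSERT INTO projects VALUES (1, 5), (2, 9), (3, 9), (4, 1);
    INSERT INTO project_pattern VALUES (1, 10), (2, 10), (3, 10), (4, 20);
""")

# For each pattern: find the highest priority, keep only the projects that
# have it, and pick the smallest project id among those as a tiebreaker.
top_project_per_pattern = conn.execute("""
    SELECT pp.pattern_id, MIN(p.id) AS project_id
    FROM project_pattern AS pp
    JOIN projects AS p ON p.id = pp.project_id
    JOIN (SELECT pp2.pattern_id, MAX(p2.priority) AS top_priority
          FROM project_pattern AS pp2
          JOIN projects AS p2 ON p2.id = pp2.project_id
          GROUP BY pp2.pattern_id) AS best
      ON best.pattern_id = pp.pattern_id
     AND p.priority = best.top_priority
    GROUP BY pp.pattern_id
    ORDER BY pp.pattern_id
""").fetchall()
print(top_project_per_pattern)
```

Every selected column is either grouped or aggregated, so the result does not depend on the MySQL extension quoted above.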

mysql sub query with inner join?

I'm trying to select the last 10 rows from my messages table. I'm also selecting the name and last name from the users table using an inner join.
The thing is I need these rows in ascending order, so I'm trying to use a subquery as in this post's accepted answer.
SELECT * FROM (
SELECT me.id, me.message, us.name1, us.lname1, SUBSTRING(us.lname2,1,1)
FROM messages me INNER JOIN users us on me.rut=us.rut
ORDER BY me.id DESC LIMIT 10
) tmp ORDER BY tmp.me.id ASC;
But it doesn't work, I actually don't know what's the proper way to do this with inner join.
Anyways, how can I make it work?
Note: the query inside the parentheses works; it's just the outer query that doesn't.
In the outer query you will only see a tmp.id, not a tmp.me.id. So your ORDER BY clause should be
ORDER BY id
(As tmp.id is the only id column, you can leave off the tmp. prefix, and ORDER BY implicitly uses ASC.)
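The fix can be checked with a small sketch: the derived table takes the newest rows with ORDER BY ... DESC LIMIT, and the outer query re-sorts them ascending using the derived table's own column name. It uses Python's sqlite3 module in place of MySQL, keeps only a few of the question's columns, and fills the tables with invented rows. The explicit AS aliases are an addition that makes the outer column names unambiguous.

```python
import sqlite3


def last_messages_ascending(conn, limit=10):
    """Return the newest `limit` messages, oldest of them first."""
    sql = """
        SELECT tmp.id, tmp.message, tmp.name1
        FROM (SELECT me.id AS id, me.message AS message, us.name1 AS name1
              FROM messages AS me
              INNER JOIN users AS us ON me.rut = us.rut
              ORDER BY me.id DESC
              LIMIT ?) AS tmp
        ORDER BY tmp.id ASC  -- refer to tmp.id, not tmp.me.id
    """
    return conn.execute(sql, (limit,)).fetchall()


# Hypothetical sample data: one user and fifteen messages.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (rut INTEGER PRIMARY KEY, name1 TEXT)")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, rut INTEGER)"
)
conn.execute("INSERT INTO users VALUES (1, 'Ana')")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, 1)",
    [(i, f"msg {i}") for i in range(1, 16)],
)
rows = last_messages_ascending(conn)
print([row[0] for row in rows])
```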

MySQL ORDER BY only returns one row

This is my code :
SELECT *
FROM Event_list
WHERE interest in
(
SELECT Interest_name
from Interest
where Interest_id in
(
SELECT Interest_id
FROM `User's Interests`
where P_id=Pid and is_canceled=0
)
)
order by count(Eid) desc
I don't use any GROUP BY clause but still get only one row. When I remove the ORDER BY clause I get all the correct rows (but not in the right order).
I'm trying to return a view (named Event_list) sorted by most common Eid (Event id), but I want to see every row without any grouping.
COUNT() is a group function, so using it will automatically result in grouping of rows. This is why you get only one row in your result when you use it in your ORDER BY clause.
Unfortunately, it's not clear what you're trying to do, so I can't tell you how to rewrite your query to get your desired results.
I suspect the query you want is more like this:
SELECT el.*,
(select count(*)
from interest i join
UserInterests ui
on ui.is_canceled = 0 and ui.p_id = i.id
where el.interest = i.interest_name
) as cnt
FROM Event_list el
ORDER BY cnt desc;
It is a bit hard to tell without sample data and a better formed query. Some notes:
Don't use special characters in table and column names. Having to escape the names merely leads to queries that are harder to read, write, and understand.
Qualify column names, so you know what tables columns come from.
Use table aliases -- so queries are easier to write and to read.
The WHERE clause only does filtering. Your description of the problem doesn't seem to involve filtering, only ordering.
Any time you use an aggregation function, the query automatically becomes an aggregation query. Without a group by, exactly one row is returned.
Give foreign keys the same names as primary keys, where possible.
You may try:
SELECT L.*, C.Cnt
FROM Event_list L
LEFT JOIN (
    SELECT E.Eid, COUNT(*) AS Cnt
    FROM Event_list E
    JOIN Interest I
      ON E.interest = I.Interest_name
    JOIN `User's Interests` U
      ON U.Interest_id = I.Interest_id
    WHERE U.P_id = Pid AND U.is_canceled = 0
    GROUP BY E.Eid
) C
  ON L.Eid = C.Eid
ORDER BY C.Cnt DESC
I don't have the tables to test with, so you may need to correct column names and other conditions. This is just to give you the idea.
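The two behaviours discussed in this thread can be seen side by side in a minimal sketch: an aggregate with no GROUP BY collapses the result to one row, while counting in a grouped derived table and joining it back keeps every row and still allows sorting by the count. The example uses Python's sqlite3 module rather than MySQL, and a hypothetical two-column event_list table stands in for the real view, with the interest filtering left out.

```python
import sqlite3

# Hypothetical stand-in for the Event_list view, with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE event_list (eid INTEGER, interest TEXT);
    INSERT INTO event_list VALUES
        (1, 'music'), (2, 'art'), (2, 'film'),
        (3, 'music'), (3, 'art'), (3, 'film');
""")

# An aggregate without GROUP BY turns the query into a one-row aggregation.
collapsed = conn.execute("SELECT COUNT(eid) FROM event_list").fetchall()

# Count per eid in a derived table, then join back so no row is lost.
all_rows = conn.execute("""
    SELECT l.eid, l.interest, c.cnt
    FROM event_list AS l
    JOIN (SELECT eid, COUNT(*) AS cnt
          FROM event_list
          GROUP BY eid) AS c
      ON l.eid = c.eid
    ORDER BY c.cnt DESC, l.eid, l.interest
""").fetchall()

print(collapsed)
print(all_rows)
```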

Best way to write this query?

I am joining to a subquery on another table because I wanted to be able to sort the rows I get back from it. I only need the first row per auction, but I need the rows ordered in a certain way so that I get the lowest id.
I tried adding LIMIT 1 to the subquery, but then the full query returned 0 results. So now it has no limit, and in the EXPLAIN I have two rows showing that the full 10k+ rows of the auction_media table are being used.
I wrote it this way to avoid having to query the auction_media table for each row separately, but now I'm thinking that this way isn't that great if it has to use the whole auction_media table.
Which way is better? The way I have it, or querying the auction_media table separately? Or is there a better way?
Here is the code:
SELECT
a.auction_id,
a.name,
media.media_url
FROM
auctions AS a
LEFT JOIN users AS u ON u.user_id=a.owner_id
INNER JOIN ( SELECT media_id,media_url,auction_id
FROM auction_media
WHERE media_type=1
AND upload_in_progress=0
ORDER BY media_id ASC
) AS media
ON a.auction_id=media.auction_id
WHERE a.hpfeat=1
AND a.active=1
AND a.approved=1
AND a.closed=0
AND a.creation_in_progress=0
AND a.deleted=0
AND (a.list_in='auction' OR u.shop_active='1')
GROUP BY a.auction_id;
Edit: Through my testing, using the above query seems like it would be the much faster method overall; however I worry if that will still be the case when the auction_media table grows to like 1M rows or something.
edit: As stated in the comments - DISTINCT is not required because the auctions table can only be associated with (at most) one user table row and one row in the inner query.
You may want to try this. The outer query's GROUP BY is replaced with DISTINCT since you don't have any aggregate function. The inner query was replaced by a query that finds the smallest media_id per auction_id, which is then JOINed back to auction_media to get the media_url. (Since I didn't know if the media_id and auction_id were a composite unique key, I used the same WHERE conditions in the join to help eliminate potential duplicates.)
SELECT
a.auction_id,
a.name,
auction_media.media_url
FROM auctions AS a
LEFT JOIN users AS u
ON u.user_id=a.owner_id
INNER JOIN (SELECT auction_id, MIN(media_id) AS media_id
FROM auction_media
WHERE media_type=1
AND upload_in_progress=0
GROUP BY auction_id) AS media
ON a.auction_id=media.auction_id
INNER JOIN auction_media
ON auction_media.media_id = media.media_id
AND auction_media.auction_id = media.auction_id
AND auction_media.media_type=1
AND auction_media.upload_in_progress=0
WHERE a.hpfeat=1
AND a.active=1
AND a.approved=1
AND a.closed=0
AND a.creation_in_progress=0
AND a.deleted=0
AND (a.list_in='auction' OR u.shop_active='1');
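The "smallest media_id per auction, then join back for the URL" pattern from this answer can be sketched on a reduced schema. The example below runs on SQLite through Python's sqlite3 module instead of MySQL, keeps only the columns needed for the pattern, and uses invented rows; the users join and the auction status filters from the original query are left out.

```python
import sqlite3

# Reduced, hypothetical version of the auctions / auction_media tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auctions (auction_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE auction_media (
        media_id INTEGER PRIMARY KEY,
        auction_id INTEGER,
        media_url TEXT,
        media_type INTEGER
    );
    INSERT INTO auctions VALUES (1, 'Lamp'), (2, 'Chair');
    INSERT INTO auction_media VALUES
        (10, 1, 'lamp-a.jpg', 1), (11, 1, 'lamp-b.jpg', 1),
        (12, 2, 'chair.mp4', 2), (13, 2, 'chair-a.jpg', 1);
""")

# Step 1 (derived table fm): lowest qualifying media_id per auction.
# Step 2: join back to auction_media to fetch that row's media_url.
first_media = conn.execute("""
    SELECT a.auction_id, a.name, am.media_url
    FROM auctions AS a
    JOIN (SELECT auction_id, MIN(media_id) AS media_id
          FROM auction_media
          WHERE media_type = 1
          GROUP BY auction_id) AS fm
      ON fm.auction_id = a.auction_id
    JOIN auction_media AS am
      ON am.media_id = fm.media_id
    ORDER BY a.auction_id
""").fetchall()
print(first_media)
```

Because the derived table returns at most one row per auction, the outer query needs neither GROUP BY nor DISTINCT.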

What would cause a query to run slowly when used as a subquery, but not when run separately?

I have something similar to the following:
SELECT c.id
FROM contact AS c
WHERE c.id IN (SELECT s.contact_id
FROM sub_table AS s
LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
WHERE c2.phone LIKE '535%')
ORDER BY c.name
The problem is that the query takes a very, very long time (>2 minutes), but if I take the subquery, run it separately, implode the ids and insert them into the main query, it runs in well under 1 second, including the data retrieval and implosion.
I have checked the explains on both methods and keys are being used appropriately and the same ways. The subquery doesn't return more than 200 IDs.
What could be causing the subquery method to take so much longer?
BTW, I know the query above can be written with joins, but the query I have can't be--this is just a simplified version.
Using MySQL 5.0.22.
Sounds suspiciously like MySQL bug #32665: Query with dependent subquery is too slow.
What happens if you try it like this?
SELECT c.id
FROM contact AS c
INNER JOIN (SELECT s.contact_id
FROM sub_table AS s
LEFT JOIN contact_sub AS c2 ON (s.id = c2.sub_field)
WHERE c2.phone LIKE '535%') subq ON subq.contact_id=c.id
ORDER BY c.name
Assuming that the result of s.contact_id is unique. You can add distinct to the subquery if it is not.
I always use uncorrelated subqueries this way rather than using the IN operator in the where clause.
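The uniqueness caveat matters for correctness, not only for speed: IN never duplicates outer rows, while a join repeats a contact once per matching subquery row. The sketch below compares the three forms on invented data, using Python's sqlite3 module in place of MySQL. It only illustrates that the results match once DISTINCT is added; it says nothing about MySQL 5.0 timings.

```python
import sqlite3

# Hypothetical sample data; contact 1 matches the phone filter twice.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sub_table (id INTEGER PRIMARY KEY, contact_id INTEGER);
    CREATE TABLE contact_sub (sub_field INTEGER, phone TEXT);
    INSERT INTO contact VALUES (1, 'Zed'), (2, 'Amy'), (3, 'Kim');
    INSERT INTO sub_table VALUES (100, 1), (101, 1), (102, 2), (103, 3);
    INSERT INTO contact_sub VALUES (100, '5351111'), (101, '5352222'),
                                   (102, '5359999'), (103, '6160000');
""")

# The uncorrelated subquery from the question, reused in each variant.
SUBQUERY = """
    SELECT s.contact_id AS contact_id
    FROM sub_table AS s
    LEFT JOIN contact_sub AS c2 ON s.id = c2.sub_field
    WHERE c2.phone LIKE '535%'
"""

# Variant 1: IN (each contact appears at most once).
in_ids = [row[0] for row in conn.execute(
    f"SELECT c.id FROM contact AS c WHERE c.id IN ({SUBQUERY}) ORDER BY c.name"
)]

# Variant 2: join to the derived table as-is (duplicates leak through).
naive_join_ids = [row[0] for row in conn.execute(
    f"SELECT c.id FROM contact AS c "
    f"INNER JOIN ({SUBQUERY}) AS subq ON subq.contact_id = c.id "
    f"ORDER BY c.name"
)]

# Variant 3: join to a DISTINCT derived table (matches the IN result).
distinct_join_ids = [row[0] for row in conn.execute(
    f"SELECT c.id FROM contact AS c "
    f"INNER JOIN (SELECT DISTINCT contact_id FROM ({SUBQUERY})) AS subq "
    f"ON subq.contact_id = c.id ORDER BY c.name"
)]

print(in_ids, naive_join_ids, distinct_join_ids)
```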
Have you checked the Execution Plan for the query? This will usually show you the problem.
Can't you do another join instead of a subquery?
SELECT c.id
FROM contact AS c
JOIN sub_table AS s on c.id = s.contact_id
LEFT JOIN contact_sub AS cs ON (s.id = cs.sub_field)
WHERE cs.phone LIKE '535%'
ORDER BY c.name
Since the subquery is referring to a field sub_field in the outer select, it has to be run once for each row in the outer table - the results for the inner query will change with each row in the outer table.
It's a correlated subquery. It runs once for each row in the outer select. (I think. You have two tables with the same correlation name, I'm assuming that's a typo. That you say it can't be rewritten as a join means it's correlated. )
OK, I'm going to give you something to try. You say that the subquery is not correlated, and that you still can't join on it. And that if you take the output of the subquery and lexically substitute it for the subquery, the main query runs much faster.
So try this: make the subquery into a view: create view foo followed by the text of the subquery. Then rewrite the main query to get rid of the "IN" clause and instead join to the view.
How's the timing on that?