GROUP BY and ORDER BY optimization - MySQL

I'm trying to optimize a query on a table of 180 000 rows.
SELECT
qid
FROM feed_pool
WHERE (
(feed_pool.uid IN
(SELECT uid_followed
FROM followers
WHERE uid_follower = 123 AND unfollowed = 0) AND feed_pool.uid != 123)
OR feed_pool.tid IN (SELECT tid FROM follow_tags WHERE follow_tags.uid = 123)
)
GROUP BY feed_pool.qid ORDER BY feed_pool.id DESC LIMIT 20
The worst part of this query is not the WHERE clause, it is the GROUP BY and ORDER BY part.
Actually, if I do just the GROUP BY, it's fine. Just the ORDER BY is also fine. The problem is when I use both.
I have tried different indexes, and I'm now using an index on feed_pool.qid and feed_pool.uid.
A good hack is to first SELECT the last 20 rows (ORDER BY), and then do the GROUP BY. But obviously that's not exactly what I want: in some cases I end up with fewer than 20 rows.
I really don't know what to do. I can change my table structure if it speeds up the query (20 sec...). Really, every tip would be appreciated.
Thanks in advance.

Try this:
GROUP BY feed_pool.qid ORDER BY 1 DESC LIMIT 20

Have you heard of JOIN? Subqueries are often bad for performance.
Try something like this:
SELECT feed_pool.qid, followers.uid_followed AS fuid, follow_tags.uid AS ftuid
FROM feed_pool
LEFT JOIN followers
ON feed_pool.uid = followers.uid_followed
AND followers.uid_follower = 123
AND followers.unfollowed = 0
AND feed_pool.uid != 123
LEFT JOIN follow_tags
ON feed_pool.tid = follow_tags.tid
AND follow_tags.uid = 123
WHERE
-- WHERE cannot reference SELECT aliases, so test the joined columns directly
followers.uid_followed IS NOT NULL
OR follow_tags.uid IS NOT NULL
ORDER BY feed_pool.id DESC
LIMIT 20
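One possible reading of the question is "the 20 qids with the most recent feed entry". Under that assumption, ordering by MAX(id) makes the combination of GROUP BY and ORDER BY well defined, because feed_pool.id on its own is not determined once rows are grouped by qid. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the rows are made-up sample data, the helper name latest_qids is hypothetical, and the WHERE clause from the question is left out.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE feed_pool (id INTEGER PRIMARY KEY, qid INTEGER, uid INTEGER, tid INTEGER);
INSERT INTO feed_pool (id, qid, uid, tid) VALUES
  (1, 10, 7, 1), (2, 11, 7, 1), (3, 10, 8, 2),
  (4, 12, 8, 2), (5, 11, 9, 3), (6, 13, 9, 3);
""")


def latest_qids(limit):
    """Return (qid, newest id) pairs, newest first (hypothetical helper)."""
    return conn.execute(
        """
        SELECT qid, MAX(id) AS last_id   -- one row per qid
        FROM feed_pool
        GROUP BY qid
        ORDER BY last_id DESC            -- rank groups by their newest row
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


print(latest_qids(3))  # [(13, 6), (11, 5), (12, 4)]
```

The same SELECT ... GROUP BY qid ORDER BY MAX(id) DESC shape should carry over to MySQL, with the original follower and tag conditions added back in the WHERE clause.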

Related

Order By before Group By in Mysql

I have a problem when using ORDER BY and GROUP BY in a query.
I want the ORDER BY to be applied before the GROUP BY, but it doesn't work.
I searched and found a solution, but it doesn't work for me:
SELECT * FROM
(
SELECT minder_id, service_type_id
FROM minder_service
WHERE minder_id = 238
AND deleted_at is null
ORDER BY service_type_id ASC
) AS t
GROUP BY t.minder_id
Run 1
SELECT minder_id, service_type_id
FROM minder_service
WHERE minder_id = 238
AND deleted_at is null
ORDER BY service_type_id ASC
Result:
Photo for Result 1
Run 2: Full
Photo for Result 2
Please have a look.
Thanks so much.
If you want the lowest service_type_id, you can use the MIN function:
SELECT minder_id, MIN(service_type_id)
FROM minder_service
WHERE minder_id = 238 AND deleted_at IS NULL
GROUP BY minder_id
Also make sure deleted_at is really NULL for the record with service_type_id 1 if you say you expect that record.
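As a rough illustration of the MIN approach and of the deleted_at caveat, here is a small sketch. It runs against SQLite through Python's sqlite3 module rather than MySQL, and the rows, the date, and the helper name lowest_service_type are all invented for the example.

```python
import sqlite3

# SQLite stands in for MySQL; every row below is invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE minder_service (minder_id INTEGER, service_type_id INTEGER, deleted_at TEXT);
INSERT INTO minder_service VALUES
  (238, 3, NULL), (238, 1, NULL), (238, 2, NULL),
  (240, 1, '2016-01-01'), (240, 4, NULL);
""")


def lowest_service_type(minder_id):
    """Lowest non-deleted service_type_id for one minder (hypothetical helper)."""
    return conn.execute(
        """
        SELECT minder_id, MIN(service_type_id)
        FROM minder_service
        WHERE minder_id = ? AND deleted_at IS NULL
        GROUP BY minder_id
        """,
        (minder_id,),
    ).fetchall()


print(lowest_service_type(238))  # [(238, 1)]
# For minder 240 the row with service_type_id 1 is soft-deleted, so 4 wins.
print(lowest_service_type(240))  # [(240, 4)]
```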
A subquery returns an unordered set. The database doesn't have to keep the order when passing the result of a subquery to an outer query, or when performing the group by operation.
You should rethink what you're trying to do.

Mysql: Order by max N values from subquery

I'm about to throw in the towel with this.
Preface: I want to make this work with any N, but for the sake of simplicity, I'll set N to be 3.
I've got a query (MySQL, specifically) that needs to pull in data from a table and sort based on the top 3 values from that table, and after that fall back to other sort criteria.
So basically I've got something like this:
SELECT maintable.id
FROM
tbl1 AS maintable
LEFT JOIN
tbl2 AS othertable
ON
maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC,
maintable.timestamp DESC
Which is all basic textbook stuff. But the issue is that I need the first ORDER BY clause to consider only the three biggest values in othertable.timestamp and then fall back on maintable.timestamp.
Also, doing a LIMIT 3 subquery to othertable and join it is a no go as this needs to work with an arbitrary number of WHERE conditions applied to maintable.
I was almost able to make it work with a user variable based approach like this, but it fails since it doesn't take into account ordering, so it'll take the FIRST three othertable values it finds:
ORDER BY
(
IF(othertable.timestamp IS NULL, 0,
IF(
(@rank:=@rank+1) > 3, null, othertable.timestamp
)
)
) DESC
(with a @rank:=0 preceding the statement)
So... any tips on this? I'm losing my mind with the problem. Another parameter I have for this is that since I'm only altering an existing (vastly complicated) query, I can't do a wrapping outer query. Also, as noted, I'm on MySQL so any solutions using the ROW_NUMBER function are unfortunately out of reach.
Thanks to all in advance.
EDIT. Here's some sample data with timestamps dumbed down to simpler integers to illustrate what I need:
maintable
id timestamp
1 100
2 200
3 300
4 400
5 500
6 600
othertable
id timestamp
4 250
5 350
3 550
1 700
=>
1
3
5
6
4
2
And if for whatever reason we add WHERE NOT maintable.id = 5 to the query, here's what we should get:
1
3
4
6
2
...because now 4 is among the top 3 values in othertable referring to this set.
So as you can see in the first result, the row with id 4 from othertable is not included in the featured ordering because it has only the fourth-highest timestamp value, so it falls back to being ordered by the basic timestamp.
The real world need for this is this: I've got content in "maintable" and "othertable" is basically a marker for featured content with a timestamp of "featured date". I've got a view where I'm supposed to float the last 3 featured items to the top and the rest of the list just be a reverse chronologic list.
Maybe something like this.
SELECT
id
FROM
(SELECT
maintable.id,
CASE WHEN othertable.timestamp IS NULL THEN
0
ELSE
@i := @i + 1
END AS num,
othertable.timestamp as othertimestamp,
maintable.timestamp as maintimestamp
FROM
tbl1 AS maintable
CROSS JOIN (select @i := 0) i
LEFT JOIN tbl2 AS othertable
ON maintable.id = othertable.id
ORDER BY
othertable.timestamp DESC) t
ORDER BY
CASE WHEN num > 0 AND num <= 3 THEN
othertimestamp
ELSE
maintimestamp
END DESC
Modified answer:
select ilv.* from
(select sq.*, @i:=@i+1 rn from
(select @i := 0) i
CROSS JOIN
(select m.*, o.id o_id, o.timestamp o_t
from maintable m
left join othertable o
on m.id = o.id
where 1=1
order by o.timestamp desc) sq
) ilv
order by case when o_t is not null and rn <=3 then rn else 4 end,
timestamp desc
SQLFiddle here.
Amend where 1=1 condition inside subquery sq to match required complex selection conditions, and add appropriate limit criteria after the final order by for paging requirements.
Can you use a union query as below?
(SELECT id,timestamp,1 AS isFeatured FROM tbl2 ORDER BY timestamp DESC LIMIT 3)
UNION ALL
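-- MySQL rejects LIMIT directly inside an IN subquery, hence the extra derived table (top3)
(SELECT id,timestamp,2 AS isFeatured FROM tbl1 WHERE id NOT IN (SELECT id FROM (SELECT id FROM tbl2 ORDER BY timestamp DESC LIMIT 3) AS top3))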
ORDER BY isFeatured,timestamp DESC
This might be somewhat redundant, but it is semantically closer to the question you are asking. This would also allow you to parameterize the number of featured results you want to return.
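For comparison, a different technique from the answers above is to rank each othertable row with a correlated COUNT(*) subquery, which needs neither user variables nor ROW_NUMBER nor a wrapping outer query. The sketch below is an assumption-laden illustration, not a drop-in fix: it runs on SQLite through Python's sqlite3 module instead of MySQL, it uses the sample data from the question, the helper name featured_first is hypothetical, and any extra WHERE conditions on maintable would have to be repeated inside the counting subquery for the ranking to follow the filtered set.

```python
import sqlite3

# SQLite stands in for MySQL; the data is the sample from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE maintable (id INTEGER PRIMARY KEY, timestamp INTEGER);
CREATE TABLE othertable (id INTEGER PRIMARY KEY, timestamp INTEGER);
INSERT INTO maintable VALUES (1,100),(2,200),(3,300),(4,400),(5,500),(6,600);
INSERT INTO othertable VALUES (4,250),(5,350),(3,550),(1,700);
""")


def featured_first(n):
    """Ids with the n newest featured rows on top (hypothetical helper)."""
    rows = conn.execute(
        """
        SELECT m.id
        FROM maintable AS m
        LEFT JOIN othertable AS o ON m.id = o.id
        ORDER BY
          CASE
            -- a row is in the top n if fewer than n featured rows are newer
            WHEN o.timestamp IS NOT NULL
             AND (SELECT COUNT(*) FROM othertable AS o2
                  WHERE o2.timestamp > o.timestamp) < ?
            THEN o.timestamp
          END DESC,            -- NULL keys (not in the top n) sort last
          m.timestamp DESC     -- fallback: plain reverse chronological order
        """,
        (n,),
    ).fetchall()
    return [row[0] for row in rows]


print(featured_first(3))  # [1, 3, 5, 6, 4, 2]
```

The correlated subquery runs once per joined row, so on large tables it may be slower than the variable-based answers; it is shown here only because the ordering does not depend on evaluation order.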

SQL Distinct - Get all values

Thanks for looking, I'm trying to get 20 entries from the database randomly and unique, so the same one doesn't appear twice. But I also have a questionGroup field, which should also not appear twice. I want to make that field distinct, but then get the ID of the field selected.
Below is my script, which is NOT WORKING because it treats the ID as distinct too:
SELECT DISTINCT `questionGroup`,`id`
FROM `questions`
WHERE `area`='1'
ORDER BY rand() LIMIT 20
Any advice is greatly appreciated!
Thanks
Try doing the group by/distinct first in a subquery:
select *
from (select distinct `questionGroup`,`id`
from `questions`
where `area`='1'
) qc
order by rand()
limit 20
I see . . . What you want is to select a random row from each group, and then limit it to 20 groups. This is a harder problem. I'm not sure if you can do this accurately with a single query in mysql, not using variables or outside tables.
Here is an approximation:
select *
from (select q.`questionGroup`,
coalesce(max(case when rand()*num < 1 then q.id end), min(q.id)) as id
from `questions` q join
(select questionGroup, count(*) as num
from questions
group by questionGroup
) qg
on qg.questionGroup = q.questionGroup
where q.`area`='1'
group by q.questionGroup
) qc
order by rand()
limit 20
This uses rand() to select an id: each row passes the rand()*num < 1 test with probability 1/num, so on average one row per grouping is taken (but it is random, so sometimes 0, 1, 2, etc.). It chooses the max() of these. If none appear, then it takes the minimum.
This will be slightly biased away from the maximum id (or minimum, if you switch the min's and max's in the equation). For most applications, I'm not sure that this bias would make a big difference. In other databases that support ranking functions, you can solve the problem directly.
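To make the remark about ranking functions concrete, here is a minimal sketch of the direct solution using ROW_NUMBER() partitioned by questionGroup. It is an illustration under stated assumptions: it runs on SQLite (3.25 or later, for window functions) through Python's sqlite3 module, SQLite's random() plays the role of MySQL's rand(), the data is generated, and the helper name random_one_per_group is hypothetical. MySQL 8.0 and later also support window functions.

```python
import sqlite3

# SQLite stands in for MySQL; window functions need SQLite 3.25 or later.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE questions (id INTEGER PRIMARY KEY, questionGroup INTEGER, area TEXT)"
)
# Generated sample data: 30 groups of 3 questions each, all in area '1'.
conn.executemany(
    "INSERT INTO questions (id, questionGroup, area) VALUES (?, ?, '1')",
    [(i, i % 30) for i in range(1, 91)],
)


def random_one_per_group(limit):
    """One random question per group, then a random subset of groups."""
    return conn.execute(
        """
        SELECT questionGroup, id
        FROM (
          SELECT questionGroup, id,
                 ROW_NUMBER() OVER (
                   PARTITION BY questionGroup ORDER BY random()
                 ) AS rn                 -- random rank inside each group
          FROM questions
          WHERE area = '1'
        ) AS ranked
        WHERE rn = 1                     -- keep exactly one row per group
        ORDER BY random()                -- shuffle the groups
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


picked = random_one_per_group(20)
print(len(picked), len({group for group, _ in picked}))
```

Because every row in a group gets an independent random rank, each id in a group is equally likely to be picked, which avoids the bias discussed above.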
Something like this:
SELECT DISTINCT *
FROM (
SELECT `questionGroup`,`id`
FROM `questions`
WHERE `area`='1'
ORDER BY rand()
) As q
LIMIT 20

SQL join with where and having count() condition

I have 2 tables
Sleep_sessions [id, user_id, (some other values)]
Tones [id, sleep_sessions.id (FK), (some other values)]
I need to select 10 sleep_sessions where user_id = 55 and where each sleep_session record has at least 2 tone records associated with it.
I currently have the following:
SELECT `sleep_sessions`.*
FROM (`sleep_sessions`)
JOIN `tones` ON sleep_sessions.id = `tones`.`sleep_session_id`
WHERE `user_id` = 55
GROUP BY `sleep_sessions`.`id`
HAVING count(tones.id) > 4
ORDER BY `started` desc
LIMIT 10
However, I've noticed that count(tones.id) seems to count the entire tones table rather than only the tones of the sleep_session being joined.
Many thanks for your help,
Andy
I'm not sure what went wrong with your query. Maybe, try
HAVING count(*)
The following query might be a bit more readable (having can be a bit of a pain to understand):
SELECT *
FROM (`sleep_sessions`)
WHERE `user_id` = 55
AND (SELECT count(*) FROM `tones`
WHERE `sleep_sessions`.`id` = `tones`.`sleep_session_id`) > 4
ORDER BY `started` desc
LIMIT 10
The advantage of this is that you avoid the ill-defined semantics your original query creates between its GROUP BY and ORDER BY clauses. Only MySQL would ever accept your original query. Here's some insight:
http://dev.mysql.com/doc/refman/5.6/en/group-by-hidden-columns.html
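As a quick sanity check, the grouped form and the correlated-subquery form can be compared on a tiny data set. The sketch below assumes SQLite via Python's sqlite3 module as a stand-in for MySQL; the rows and the helper names by_having and by_subquery are made up, and the grouped variant lists started in the GROUP BY so that it does not rely on hidden columns.

```python
import sqlite3

# SQLite stands in for MySQL; all rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sleep_sessions (id INTEGER PRIMARY KEY, user_id INTEGER, started INTEGER);
CREATE TABLE tones (id INTEGER PRIMARY KEY, sleep_session_id INTEGER);
INSERT INTO sleep_sessions VALUES (1, 55, 100), (2, 55, 200), (3, 55, 300), (4, 77, 400);
-- session 1 has 5 tones, session 2 has 2, session 3 has 6, session 4 has 5
INSERT INTO tones (sleep_session_id) VALUES
  (1),(1),(1),(1),(1), (2),(2), (3),(3),(3),(3),(3),(3), (4),(4),(4),(4),(4);
""")


def by_having(user_id, min_tones):
    """JOIN + GROUP BY + HAVING, grouping on every selected/ordered column."""
    rows = conn.execute(
        """
        SELECT s.id
        FROM sleep_sessions AS s
        JOIN tones AS t ON s.id = t.sleep_session_id
        WHERE s.user_id = ?
        GROUP BY s.id, s.started
        HAVING COUNT(t.id) > ?
        ORDER BY s.started DESC
        LIMIT 10
        """,
        (user_id, min_tones),
    ).fetchall()
    return [r[0] for r in rows]


def by_subquery(user_id, min_tones):
    """Correlated COUNT(*) subquery, no GROUP BY needed."""
    rows = conn.execute(
        """
        SELECT s.id
        FROM sleep_sessions AS s
        WHERE s.user_id = ?
          AND (SELECT COUNT(*) FROM tones AS t
               WHERE t.sleep_session_id = s.id) > ?
        ORDER BY s.started DESC
        LIMIT 10
        """,
        (user_id, min_tones),
    ).fetchall()
    return [r[0] for r in rows]


print(by_having(55, 4), by_subquery(55, 4))  # [3, 1] [3, 1]
```

On this data both forms count tones per session, not across the whole table, which suggests the per-group count in the original query behaves as intended once the join condition is right.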

Why is my SQL so slow?

My table is reasonably small, around 50,000 rows. My schema is as follows:
DAILY
match_id
user_id
result
round
tournament_id
Query:
SELECT user_id
FROM `daily`
WHERE user_id IN (SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND (result = 'Won' OR result = 'Lost'))
Using the IN keyword in the way you are is a very dangerous [from a performance perspective] thing to do. It can result in the subquery [(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost'))] being run once per outer row, 50,000 times in this case.
You'll want to convert this into a join, something to the effect of:
select user_id from daily a join
(select user_id from daily where round > 25 and tournament_id=24 and (result='Won' or result='Lost')) b on a.user_id = b.user_id
Doing something similar to this will result in only two queries and a join.
As Cybernate pointed out in your specific example you can simply use where clauses, but I went ahead and suggested this in case your query is actually more complex than what you posted.
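One detail worth checking when turning IN into a join: if the derived table can return the same user_id more than once, the join multiplies rows, whereas IN does not. The sketch below illustrates this on made-up rows, using SQLite via Python's sqlite3 module as a stand-in; the helper name user_ids is hypothetical, and adding DISTINCT to the derived table is one way, assumed here, to keep the two forms equivalent.

```python
import sqlite3

# SQLite stands in for the real database; the rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily (match_id INTEGER, user_id INTEGER, result TEXT,
                    round INTEGER, tournament_id INTEGER);
INSERT INTO daily VALUES
  (1, 1, 'Won', 26, 24), (2, 1, 'Lost', 27, 24), (3, 1, 'Won', 10, 24),
  (4, 2, 'Won',  5, 24), (5, 3, 'Lost', 30, 24), (6, 3, 'Won', 30, 23);
""")

FILTER = "round > 25 AND tournament_id = 24 AND result IN ('Won', 'Lost')"


def user_ids(sql):
    """Run a query and return its user_id column as a list."""
    return [row[0] for row in conn.execute(sql).fetchall()]


# Original form: IN with a subquery.
in_rows = user_ids(
    f"SELECT user_id FROM daily WHERE user_id IN "
    f"(SELECT user_id FROM daily WHERE {FILTER}) ORDER BY user_id"
)

# Join against the raw derived table: user 1 matches twice, so rows double.
join_plain = user_ids(
    f"SELECT a.user_id FROM daily a JOIN "
    f"(SELECT user_id FROM daily WHERE {FILTER}) b "
    f"ON a.user_id = b.user_id ORDER BY a.user_id"
)

# Join against a de-duplicated derived table: same rows as the IN form.
join_distinct = user_ids(
    f"SELECT a.user_id FROM daily a JOIN "
    f"(SELECT DISTINCT user_id FROM daily WHERE {FILTER}) b "
    f"ON a.user_id = b.user_id ORDER BY a.user_id"
)

print(in_rows)          # [1, 1, 1, 3, 3]
print(len(join_plain))  # 8
print(join_distinct)    # [1, 1, 1, 3, 3]
```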
First verify and add indexes as suggested earlier.
Also, why use IN at all when you are querying the same table?
Change your query to:
SELECT user_id
FROM daily
WHERE round > 25
AND tournament_id = 24
AND ( result = 'Won'
OR result = 'Lost' )
Your query only needs to be:
SELECT d.user_id
FROM DAILY d
WHERE d.round > 25
AND d.tournament_id = 24
AND d.result IN ('Won', 'Lost')
Indexes should be considered on:
DAILY.round
DAILY.tournament_id
DAILY.result
This should return in a millisecond.
SELECT user_id FROM daily WITH(NOLOCK)
where user_id in (select user_id from daily WITH(NOLOCK) where round > 25 and tournament_id = 24 and (result = 'Won' or result = 'Lost'))
Then make sure there is an index on the filter columns.
CREATE NONCLUSTERED INDEX IX_1 ON daily (round ASC, tournament_id ASC, result ASC)