MySQL : Group By Clause Not Using Index when used with Case - mysql

I'm using MySQL.
I can't change the DB structure, so that's not an option, sadly.
THE ISSUE:
When I use GROUP BY with CASE (as I need to in my situation), MySQL uses
filesort and the delay is huge (approx. 2-3 minutes):
http://sqlfiddle.com/#!9/f97d8/11/0
But when I don't use CASE, just GROUP BY group_id, MySQL easily uses
the index and the result is fast:
http://sqlfiddle.com/#!9/f97d8/12/0
Scenario: DETAILED
Table msgs, containing records of sent messages, with fields:
id,
user_id, (the user who sent the message)
type, (0 means it's a group msg. All msgs sent under a group are marked with its group_id. So if 5 msgs were sent under group_id = 5, the table will have 5 records with group_id = 5 and type = 0. For type > 0, group_id will be NULL, because all other types are individual msgs sent to a single recipient and have no group_id)
group_id (if type = 0, contains the group_id, else NULL)
The table contains approx. 10 million records for user id 50001, with different types (i.e. group as well as individual msgs).
Now the QUERY:
SELECT
msgs.*
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.user_id IN (50111)
AND msgs.type IN (0, 1, 5, 7)
GROUP BY CASE `msgs`.`type` WHEN 0 THEN `msgs`.`group_id` ELSE `msgs`.`id` END
ORDER BY `msgs`.`group_id` DESC
LIMIT 100
I HAVE to get the summary in a single QUERY,
so msgs sent to a group, let's say group 5 (which has 5 records in this table), will be shown as 1 record in the summary (I may show a COUNT later, but that's not an issue).
The individual msgs have NULL as group_id, so I can't just use 'GROUP BY group_id', because that would group all individual msgs into a single record, which is not acceptable.
Sample output can be something like:
id  owner_id  type  group_id  COUNT
1   50001     0     2         5
1   50001     1     NULL      1
1   50001     4     NULL      1
1   50001     0     7         5
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     0     10        5
Now the problem is that the GROUP BY with CASE (which I currently think I have to use, because I only need to group by group_id if type = 0) causes a lot of delay, because it doesn't use the indexes that it does use without CASE (i.e. with just GROUP BY group_id). Please see the SQL Fiddles above for the EXPLAIN results.
Can anyone please give advice on how to optimize it?
UPDATE
I tried a workaround that somewhat works (it drops the INITIAL queries to 1 sec). It uses a UNION to shrink the result set that forces MySQL to write to disk for the filesort (due to the huge result set), by limiting the group msgs and the individual msgs separately (see the query below):
-- The first part of the union retrieves group msgs (those with type 0, which need to be grouped by group_id). It applies a limit to cap the out-of-control result set.
-- The second part retrieves individual msgs (those with type != 0, grouped by msgs.id; not necessary, but just to be safe from duplicate entries due to joins). It applies a limit to cap the out-of-control result set.
-- The outer query combines the two to retrieve the desired result set.
Here's the query:
SELECT
*
FROM
(
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (msgs.user_id = accounts.id)
WHERE 1
AND accounts.id IN (50111 ) AND type = 0
GROUP BY msgs.group_id
ORDER BY msgs.id DESC
LIMIT 40
)
UNION
ALL
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.type != 0
AND accounts.id IN (50111)
GROUP BY msgs.id
ORDER BY msgs.id
LIMIT 40
)
) AS temp
ORDER BY reference_id
LIMIT 20,20
But it has a lot of caveats:
- I need to handle the limit in the inner queries as well. Let's say 20 records per page, and I'm on page 4. For the inner queries, I need to apply LIMIT 0,80, since I'm not sure how many records each of the two parts contributed to the previous 3 pages. So as the records per page and the page number grow, my query gets heavier. With 1k records per page on page 100 or 1,000, the load and the time increase sharply.
- I need to handle ordering in the inner queries and then apply it again to the result set prepared by the union; conditions need to be applied to both inner queries separately (but that's not much of an issue).
- I can't use SQL_CALC_FOUND_ROWS, so I will need to get the count using separate queries.
The main issue is the first one. The higher I go with the pagination, the heavier it gets.

Would this run faster?
SELECT id, user_id, type, group_id
FROM
( SELECT id, user_id, type, group_id, IFNULL(group_id, id) AS foo
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
) AS x
GROUP BY foo
ORDER BY `group_id` DESC
LIMIT 100
It needs INDEX(user_id, type).
Does this give the 'correct' answer?
SELECT DISTINCT *
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY `group_id` DESC
LIMIT 100
(It needs the same index)
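To check what grouping on IFNULL(group_id, id) actually returns, here is a minimal sketch in Python. It runs the query against an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example is self-contained), and the rows are made up. It checks the grouping logic only, not index usage or speed. It also assumes that no individual message id equals a group_id; if they can collide, those rows would land in the same group.

```python
import sqlite3

# In-memory SQLite used as a stand-in for MySQL; sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE msgs (id INTEGER PRIMARY KEY, user_id INT, type INT, group_id INT)"
)
conn.executemany(
    "INSERT INTO msgs VALUES (?, ?, ?, ?)",
    [
        (101, 50111, 0, 2),     # group msg, group 2
        (102, 50111, 0, 2),     # group msg, group 2
        (103, 50111, 1, None),  # individual msg
        (104, 50111, 0, 7),     # group msg, group 7
        (105, 50111, 5, None),  # individual msg
        (106, 50111, 0, 7),     # group msg, group 7
        (107, 50111, 3, None),  # type 3 is filtered out by the WHERE clause
    ],
)

# Group msgs collapse to one row per group_id; individual msgs
# (group_id IS NULL) fall back to their own id and stay separate.
rows = conn.execute(
    """
    SELECT IFNULL(group_id, id) AS grp_key,
           MIN(id)              AS first_id,
           MAX(group_id)        AS gid,
           COUNT(*)             AS cnt
    FROM msgs
    WHERE user_id IN (50111)
      AND type IN (0, 1, 5, 7)
    GROUP BY IFNULL(group_id, id)
    ORDER BY grp_key
    """
).fetchall()

for row in rows:
    print(row)
```

The aggregates MIN(id) and MAX(group_id) are used instead of bare columns so the same SELECT list would also be accepted when MySQL's ONLY_FULL_GROUP_BY mode is on.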

Related

Adding Row Values when there are no results - MySQL

Problem Statement: I need my result set to include records that would not naturally return because they are NULL.
I'm going to put some simplified code here since my code seems to be too long.
Table Scores has Company_type, Company, Score, Project_ID
Select Score, Count(Project_ID)
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
Results in the following:
Score Projects
5 95
4 94
3 215
2 51
1 155
Everything is working fine until I apply a condition to company_type that does not include results in one of the 5 score categories. When this happens, I don't have 5 rows in my result set any more.
It displays like this:
Score Projects
5 5
3 6
1 3
I'd like it to display like this:
Score Projects
5 5
4 0
3 6
2 0
1 3
I need the results to always display 5 rows. (Scores = 1-5)
I tried one of the approaches below by Spencer7593. My simplified query now looks like this:
SELECT i.score AS Score, IFNULL(count(*), 0) AS Projects
FROM (SELECT 5 AS score
UNION ALL
SELECT 4
UNION ALL
SELECT 3
UNION ALL
SELECT 2
UNION ALL
SELECT 1) i
LEFT JOIN Scores ON Scores.score = i.score
GROUP BY Score
ORDER BY i.score DESC
And gives the following results, which is accurate except that the rows with 1 in Projects should actually be 0 because they are derived by the "i". There are no projects with a score of 5 or 2.
Score Projects
5 1
4 5
3 6
2 1
1 3
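Solved! I just needed to make my count look specifically at the project column: COUNT(Project_ID) rather than COUNT(*), since COUNT(*) also counts the NULL-filled row that the LEFT JOIN produces for each score in "i" with no match. This returned the expected results.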
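The difference between the two counts can be reproduced with a small sketch. It is written in Python against an in-memory SQLite database (a stand-in for MySQL, assumed here only to keep the example runnable), with invented rows. Putting the company_type filter in the ON clause rather than in WHERE is also an assumption of this sketch; it keeps the unmatched scores in the result.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores (company_type TEXT, company TEXT, score INT, project_id INT)"
)
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?)",
    [
        ("A", "Acme", 4, 1),
        ("A", "Acme", 4, 2),
        ("A", "Bolt", 3, 3),
        ("A", "Bolt", 1, 4),
    ],
)

query = """
SELECT i.score,
       COUNT(*)            AS count_star,     -- counts the NULL-filled row too
       COUNT(s.project_id) AS count_project   -- ignores NULLs, so 0 when unmatched
FROM (SELECT 5 AS score
      UNION ALL SELECT 4
      UNION ALL SELECT 3
      UNION ALL SELECT 2
      UNION ALL SELECT 1) i
LEFT JOIN Scores s
       ON s.score = i.score
      AND s.company_type = ?
GROUP BY i.score
ORDER BY i.score DESC
"""
rows = conn.execute(query, ("A",)).fetchall()

for score, count_star, count_project in rows:
    print(score, count_star, count_project)
```

Scores 5 and 2 have no projects here, so COUNT(*) reports 1 for them while COUNT(s.project_id) reports 0.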
If you always want your query to return 5 rows, with Score values of 5,4,3,2,1... you'll need a rowsource that supplies those Score values.
One approach would be to use a simple query to return those fixed values, e.g.
SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
Then use that query as an inline view, and outer join it to the results of your current query:
SELECT i.score AS `Score`
, IFNULL(q.projects,0) AS `Projects`
FROM ( SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
) i
LEFT
JOIN (
-- the current query with "missing" Score rows goes here
-- for completeness of this example, without the query
-- we emulate that result with a different query
SELECT 5 AS score, 95 AS projects
UNION ALL SELECT 3, 215
UNION ALL SELECT 1, 155
) q
ON q.score = i.score
ORDER BY i.score DESC
It doesn't have to be an inline view as in this example, but there does need to be a rowsource that the rows can be returned from. You could, for example, have a simple table that contains those five rows, with those five score values.
This is just one example of the general approach. It might be possible to modify your existing query to return the rows you want. But without seeing the query, the schema, and example data, we can't tell.
FOLLOWUP
Based on the edit to the question, showing an example of the current query.
If we are guaranteed that the five values of Score will always appear in the Scores table, we could do conditional aggregation, writing a query like this:
SELECT s.score
, COUNT(IF(s.company_type = :company_type,s.project_id,NULL)) AS projects
FROM Scores s
GROUP BY s.score
ORDER BY s.score DESC
Note that this will require a scan of all the rows, so it may not perform as well. The "trick" is the IF function, which returns a NULL value in place of project_id when the row would have been excluded by the WHERE clause.
If we are guaranteed that project_id is non-NULL, we could use a more terse MySQL shorthand expression to achieve an equivalent result...
, IFNULL(SUM(s.company_type = :company_type),0) AS projects
This works because MySQL returns 1 when the comparison is TRUE, and otherwise returns 0 or NULL.
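As a rough check of the conditional aggregation idea, the sketch below computes both forms side by side. It is Python on an in-memory SQLite database (a stand-in for MySQL, assumed for runnability), with invented rows and a trimmed-down Scores table. CASE WHEN is used in place of MySQL's IF() because it is standard SQL and behaves the same way here.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (company_type TEXT, score INT, project_id INT)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [
        ("A", 5, 1),
        ("B", 4, 2),
        ("A", 3, 3),
        ("A", 3, 4),
        ("B", 2, 5),
        ("A", 1, 6),
        ("B", 1, 7),
    ],
)

# No WHERE clause: every score present in the table yields a row, and the
# company_type condition moves inside the aggregate instead.
query = """
SELECT score,
       COUNT(CASE WHEN company_type = ? THEN project_id END) AS projects_count,
       IFNULL(SUM(company_type = ?), 0)                      AS projects_sum
FROM Scores
GROUP BY score
ORDER BY score DESC
"""
rows = conn.execute(query, ("A", "A")).fetchall()

for row in rows:
    print(row)
```

Both columns agree as long as project_id is never NULL, which matches the condition stated above. Scores that do not appear in the table at all would still be missing from the result.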
Try something like this:
select s.score, ifnull(x.cnt, 0) as cnt
from (
select distinct score from Scores
) s
left outer join (
Select Score, Count(Project_ID) cnt
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
) x
on s.score = x.score
Your posted query would not work without a group by statement. However, even there, if you don't have those particular scores for that company type, it wouldn't work either.
One option is to use an outer join. That would require a little more work though.
Here's another option using conditional aggregation:
select Score, sum(company_type=:company_type)
from Scores
group by Score

Number duplicate records on the MySQL table

I have a table with a schema similar to this:
id control code amount
1 200 12 300
2 400 12 300
3 200 12 300
4 100 10 400
5 100 10 400
6 500 13 500
I'm trying to list the duplicate records on a UI.
Using the following query I can retrieve the duplicate records and show them on the UI.
select * from mwt group by control,code,amount having count(id) > 1;
id control code amount
1 200 12 300
4 100 10 400
Here the records with id 1 and 4 are duplicates of 3 and 5 respectively.
On the UI, the user will click a check-box adjacent to a record, and the corresponding duplicate records should be populated on the UI. To make things easier, I'm trying to populate another column named dup_id. Using this dup_id, it is possible to filter the results, which are in JSON format, on the UI.
How to create a result set similar to the one shown below?
id control code amount dup_id
1 200 12 300 1
2 400 12 300
3 200 12 300 1
4 100 10 400 4
5 100 10 400 4
6 500 13 500
This seems like a simpler solution than that suggested by @kickstarter - but maybe I've misunderstood the requirement...
SELECT x.*
, y.dup_id
FROM my_table x
LEFT
JOIN
( SELECT MIN(id) dup_id
, control
, code
, amount
FROM my_table
GROUP
BY control
, code
, amount
HAVING COUNT(*) > 1
) y
ON y.control = x.control
AND y.code = x.code
AND y.amount = x.amount;
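Here is a minimal sketch of that join, run on the sample rows from the question. It uses Python with an in-memory SQLite database as a stand-in for MySQL (an assumption for runnability) and the table name mwt from the question rather than my_table.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL, loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mwt (id INTEGER PRIMARY KEY, control INT, code INT, amount INT)"
)
conn.executemany(
    "INSERT INTO mwt VALUES (?, ?, ?, ?)",
    [
        (1, 200, 12, 300),
        (2, 400, 12, 300),
        (3, 200, 12, 300),
        (4, 100, 10, 400),
        (5, 100, 10, 400),
        (6, 500, 13, 500),
    ],
)

# The derived table y holds one row per duplicated (control, code, amount)
# combination, tagged with the smallest id in that set. The LEFT JOIN leaves
# dup_id NULL for rows that have no duplicate.
rows = conn.execute(
    """
    SELECT x.id, x.control, x.code, x.amount, y.dup_id
    FROM mwt x
    LEFT JOIN (
        SELECT MIN(id) AS dup_id, control, code, amount
        FROM mwt
        GROUP BY control, code, amount
        HAVING COUNT(*) > 1
    ) y
      ON y.control = x.control
     AND y.code = x.code
     AND y.amount = x.amount
    ORDER BY x.id
    """
).fetchall()

for row in rows:
    print(row)
```

With this data, rows 1 and 3 get dup_id 1, rows 4 and 5 get dup_id 4, and rows 2 and 6 get NULL, which is the result set asked for in the question.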
Depending on how accurate the order has to be, you could do something like this.
This is getting all the unique control / code / amount with a count, to get a flag to know if that is a duplicate row, and ordered by control / code / amount so that they are in order. It does a cross join to initialise a few user variables.
Then it calculates a counter, only incrementing it if any of control / code / amount have changed AND it is a duplicate row. Then sets user variables to store the previous values of control / code / amount.
The outer query then orders the results back in to id order.
SELECT sub3.id,
sub3.control,
sub3.code,
sub3.amount,
sub3.dup_id
FROM
(
SELECT sub2.id,
sub2.control,
sub2.code,
sub2.amount,
@cnt:=IF(@control=control AND @code=code AND @amount=amount AND sub2.id_count IS NOT NULL, @cnt, IF(sub2.id_count IS NULL, @cnt, @cnt + 1)),
@control:=control,
@code:=code,
@amount:=amount,
IF(sub2.id_count IS NULL, NULL, @cnt) AS dup_id
FROM
(
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub1.id_count
FROM mwt
LEFT OUTER JOIN
(
SELECT control, code, amount, COUNT(id) AS id_count
FROM mwt
GROUP BY control,code,amount
HAVING id_count > 1
) sub1
ON mwt.control = sub1.control
AND mwt.code = sub1.code
AND mwt.amount = sub1.amount
ORDER BY mwt.control, mwt.code, mwt.amount
) sub2
CROSS JOIN
(
SELECT @cnt:=0, @control:=0, @code:=0, @amount:=0
) sub0
) sub3
ORDER BY id
Note that this is ordering by control, code and amount, so not an exact match for your required output (which would require getting the first duplicates ordered by id first).
EDIT - Simpler and better way to do it. This gets all the duplicate rows with the min id for those duplicates (ordered by the min id), and uses a user variable to add a sequence number for those. Then LEFT OUTER JOINs that back against the main table to put that sequence number in all the matching rows.
SELECT mwt.id, mwt.control, mwt.code, mwt.amount, sub2.dup_id
FROM mwt
LEFT OUTER JOIN
(
SELECT sub1.id, sub1.control, sub1.code, sub1.amount, @cnt:=@cnt+1 AS dup_id
FROM
(
SELECT MIN(id) AS id, control, code, amount
FROM mwt
GROUP BY control,code,amount
HAVING COUNT(id) > 1
ORDER BY id
) sub1
CROSS JOIN
(
SELECT @cnt:=0
) sub0
) sub2
ON mwt.control = sub2.control
AND mwt.code = sub2.code
AND mwt.amount = sub2.amount
ORDER BY mwt.id
Do you need a dup_id column at all? I think this can be achieved with a simple query like the one below, where the placeholders hold the values of the selected record:
select id
, control
, code
, amount
from mwt
where control = :selected_control
and code = :selected_code
and amount = :selected_amount
and id <> :selected_id
You can omit the last condition if the requirement is to list the duplicates including the selected record.

Advanced mysql query which sort down specific records on result set irrespective of its default sorting?

I have a query that sorts using an ORDER BY clause. I have a table like the following...
user_id user_name user_age user_state user_points
1 Rakul 30 CA 56
2 Naydee 29 NY 144
3 Jeet 40 NJ 43
.....
I have the following query...
select * from users where user_state = 'NY' order by user_points desc limit 50;
This gives me the list of the 50 people with the most points. I want to give the lowest preference to a few people whose ids are known. If I do not have 50 records otherwise, those ids should come last in the list. I do not want users 2 and 3 to come at the top of the list even though they have higher points; those people should come last in the list returned by the query. Is there any way to push specific records to the end of the result set irrespective of the query's sorting?
If you want to move specific records (like user_id = 2 and 3) down the list, you can run the query below:
mysql> select *,IF(user_id=2 or user_id=3,0,1) as list_order from users where user_state = 'NY' order by list_order desc, user_points desc limit 50;
select * from (
select *
from users
where user_state = 'NY'
-- this order by ensures that 2 and 3 are included
order by case when user_id in (2,3) then 1 else 2 end, user_points desc
limit 50
) as top48plus2n3
-- this order by ensures that 2 and 3 are last
order by case when user_id in (2,3) then 2 else 1 end, user_points desc
Edit: changed id to user_id and corrected the outer order by (sorry about that)
On the inner select:
This case expression ensures that the records with ids 2 and 3 are "important" (ordered first by the order by). They receive 1 while the others receive 2 as the order value; only after that are points relevant.
On the outer select:
Records with ids 2 and 3 receive 2 as the order value, while the rest receive 1. So they go last irrespective of the "default" sorting.
Here you have a reduced fiddle http://sqlfiddle.com/#!9/377c1/1
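The ordering trick itself can be sketched in a few lines. The example below is Python on an in-memory SQLite database (a stand-in for MySQL, assumed for runnability), with invented users that are all in 'NY'. It shows only the single-query form, where a CASE expression is the first sort key and user_points the second.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the users below are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, "
    "user_state TEXT, user_points INT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        (1, "Rakul", "NY", 56),
        (2, "Naydee", "NY", 144),
        (3, "Jeet", "NY", 43),
        (4, "Asha", "NY", 90),
        (5, "Omar", "NY", 10),
    ],
)

# First sort key: 0 for normal users, 1 for the de-prioritised ids (2 and 3).
# Second sort key: points, highest first, within each of those two buckets.
rows = conn.execute(
    """
    SELECT user_id, user_points
    FROM users
    WHERE user_state = 'NY'
    ORDER BY CASE WHEN user_id IN (2, 3) THEN 1 ELSE 0 END,
             user_points DESC
    LIMIT 50
    """
).fetchall()

ordered_ids = [user_id for user_id, _ in rows]
print(ordered_ids)
```

User 2 has the most points but still sorts after users 4, 1 and 5, because the CASE bucket is compared before the points.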

MySQL Query to find row duplicates based on condition with limit

I have two tables:
Members:
id username
Trips:
id member_id flag_status created
("YES" or "NO")
I can do a query like this:
SELECT
Trip.id, Trip.member_id, Trip.flag_status
FROM
trips Trip
WHERE
Trip.member_id = 1711
ORDER BY
Trip.created DESC
LIMIT
3
Which CAN give results like this:
id member_id flag_status
8 1711 YES
9 1711 YES
10 1711 YES
My goal is to know if the member's last three trips all had a flag_status = "YES", if any of the three != "YES", then I don't want it to count.
I also want to be able to remove the WHERE Trip.member_id = 1711 clause, and have it run for all my members, and give me the total number of members whose last 3 trips all have flag_status = "YES"
Any ideas?
Thanks!
http://sqlfiddle.com/#!2/28b2d
In that sqlfiddle, when the correct query i'm seeking runs, I should see results such as:
COUNT(Member.id)
2
The two members that should qualify are members 1 and 3. Member 5 fails because one of his trips has flag_status = "NO"
You could use the GROUP_CONCAT function to obtain a list of all of the status flags, ordered by id in descending order:
SELECT
member_id,
GROUP_CONCAT(flag_status ORDER BY id DESC) as status
FROM
trips
GROUP BY
member_id
HAVING
SUBSTRING_INDEX(status, ',', 3) NOT LIKE '%NO%'
and then using SUBSTRING_INDEX you can extract only the last three status flags, and exclude those that contain a NO. Please see fiddle here. I'm assuming that all of your rows are ordered by ID, but if you have a created date it would be better to use:
GROUP_CONCAT(flag_status ORDER BY created DESC) as status
as Raymond suggested. Then, you could also return just the count of the rows returned using something like:
SELECT COUNT(*)
FROM (
...the query above...
) as q
Although I like the simplicity of fthiella's solution, I'm not comfortable with a solution that depends so much on data representation. To avoid depending on it, you can do something like this:
SELECT COUNT(*) FROM (
SELECT member_id FROM (
SELECT
flag_status,
@flag_index := IF(member_id = @member, @flag_index + 1, 1) flag_index,
@member := member_id member_id
FROM trips, (SELECT @member := 0, @flag_index := 1) init
ORDER BY member_id, id DESC
) x
WHERE flag_index <= 3
GROUP BY member_id
HAVING SUM(flag_status = 'NO') = 0
) x
Fiddle here. Note I've slightly modified the fiddle to remove one of the users.
The process basically ranks the trips for each member by id descending and then keeps only the last 3 of them. Then it makes sure that none of the fetched trips has a NO in flag_status. Finally, all the matching members are counted.
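The same rank-then-filter idea can be sketched with a window function instead of user variables. This is a swapped-in technique, not the query from the answer: ROW_NUMBER() requires MySQL 8.0 or later. The example is Python on an in-memory SQLite database (a stand-in for MySQL, which needs SQLite 3.25 or later for window functions), and the trips are invented to mirror the members described in the question.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; the trips below are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trips (id INTEGER PRIMARY KEY, member_id INT, flag_status TEXT)"
)
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        (1, 1, "NO"), (2, 1, "YES"), (3, 1, "YES"), (4, 1, "YES"),
        (5, 3, "YES"), (6, 3, "YES"), (7, 3, "YES"),
        (8, 5, "YES"), (9, 5, "NO"), (10, 5, "YES"),
    ],
)

# Rank each member's trips from newest (rn = 1) to oldest, keep the three
# newest, and keep only members with no 'NO' among them.
query = """
SELECT member_id
FROM (
    SELECT member_id,
           flag_status,
           ROW_NUMBER() OVER (PARTITION BY member_id ORDER BY id DESC) AS rn
    FROM trips
) ranked
WHERE rn <= 3
GROUP BY member_id
HAVING SUM(flag_status = 'NO') = 0
ORDER BY member_id
"""
qualifying = [member_id for (member_id,) in conn.execute(query)]
member_count = len(qualifying)
print(qualifying, member_count)
```

Member 1 has an old NO trip, but it falls outside the three newest, so the member still qualifies. As in the answer's query, a member with fewer than three trips also passes; adding AND COUNT(*) = 3 to the HAVING clause would exclude them if that is not wanted.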

MySQL query not offsetting correctly

Can someone help me understand why the following query is not offsetting correctly?
It's meant to select all records in the games table, and add a column with a value of 0 or 1 based on whether a record exists in another table (wishlists) with the same gameId and the given memberId (in plain English: get me all records from games, and mark any game that exists in the wishlists table under whatever memberId I give you).
SELECT *,
CASE WHEN wishlists.memberid IS NULL THEN 0 ELSE 1 END AS InMembersList
FROM games
INNER JOIN platforms ON games.platformid = platforms.id
LEFT OUTER JOIN wishlists ON games.id = wishlists.gameid and wishlists.memberid = @memberId
WHERE platforms.platformUrlId = @platformUrlId
ORDER BY releaseDate DESC
LIMIT 1,8
When I change the offset from 1 to 2, or 3, or whatever, many of the same records appear, which does not make any sense. Where am I going wrong?
Schema:
platforms(id, platform)
members(id, name)
games(id, platformId, releaseDate)
wishlists(id, memberId, gameId)
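LIMIT 1,8 means start from row number 1 (rows are numbered from 0) and fetch 8 rows. So LIMIT 2,8 will give you 8 rows starting from row 2, seven of which will be the same as with LIMIT 1,8. To page through the results, advance the offset by the page size: LIMIT 0,8 for the first page, LIMIT 8,8 for the second, and so on.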
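The overlap is easy to reproduce. The sketch below is Python on an in-memory SQLite database (a stand-in for MySQL; SQLite accepts the same LIMIT offset, count form), with 20 invented games ordered by id instead of releaseDate. The helper fetch_ids is hypothetical and exists only for this example.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; 20 invented games with ids 1..20.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO games VALUES (?)", [(i,) for i in range(1, 21)])


def fetch_ids(offset, count):
    """Hypothetical helper: return ids for LIMIT offset, count."""
    cursor = conn.execute(
        "SELECT id FROM games ORDER BY id LIMIT ?, ?", (offset, count)
    )
    return [row[0] for row in cursor]


# Moving the offset by 1 shifts the window by a single row.
limit_1_8 = fetch_ids(1, 8)
limit_2_8 = fetch_ids(2, 8)
overlap = len(set(limit_1_8) & set(limit_2_8))

# Moving the offset by the page size gives non-overlapping pages.
page_size = 8
page_1 = fetch_ids(0 * page_size, page_size)
page_2 = fetch_ids(1 * page_size, page_size)

print(limit_1_8, limit_2_8, overlap)
print(page_1, page_2)
```

In general the offset for page n (counting from 1) is (n - 1) * page_size.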