MySQL Query to find row duplicates based on condition with limit

I have two tables:
Members: id, username
Trips: id, member_id, flag_status ("YES" or "NO"), created
I can do a query like this:
SELECT
Trip.id, Trip.member_id, Trip.flag_status
FROM
trips Trip
WHERE
Trip.member_id = 1711
ORDER BY
Trip.created DESC
LIMIT
3
Which CAN give results like this:
id member_id flag_status
8 1711 YES
9 1711 YES
10 1711 YES
My goal is to know whether the member's last three trips all had flag_status = "YES". If any of the three is not "YES", I don't want the member to count.
I also want to be able to remove the WHERE Trip.member_id = 1711 clause, run the query for all my members, and get the total number of members whose last 3 trips all have flag_status = "YES".
Any ideas?
Thanks!
http://sqlfiddle.com/#!2/28b2d
In that sqlfiddle, when the correct query I'm seeking runs, I should see results such as:
COUNT(Member.id)
2
The two members that should qualify are members 1 and 3. Member 5 fails because one of his trips has flag_status = "NO"

You could use the GROUP_CONCAT function to obtain a list of all the statuses for each member, ordered by id in descending order:
SELECT
member_id,
GROUP_CONCAT(flag_status ORDER BY id DESC) as status
FROM
trips
GROUP BY
member_id
HAVING
SUBSTRING_INDEX(status, ',', 3) NOT LIKE '%NO%'
and then, using SUBSTRING_INDEX, you can extract only the last three status flags and exclude the members whose flags contain a NO. Please see fiddle here. I'm assuming that all of your rows are ordered by ID, but if you have a created date it would be better to use:
GROUP_CONCAT(flag_status ORDER BY created DESC) as status
as Raymond suggested. Then, you could also return just the count of the rows returned using something like:
SELECT COUNT(*)
FROM (
...the query above...
) as q

Although I like the simplicity of fthiella's solution, I'm not comfortable with a solution that depends so much on how the data is represented. To avoid that dependency, you can do something like this:
SELECT COUNT(*) FROM (
SELECT member_id FROM (
SELECT
flag_status,
@flag_index := IF(member_id = @member, @flag_index + 1, 1) flag_index,
@member := member_id member_id
FROM trips, (SELECT @member := 0, @flag_index := 1) init
ORDER BY member_id, id DESC
) x
WHERE flag_index <= 3
GROUP BY member_id
HAVING SUM(flag_status = 'NO') = 0
) x
Fiddle here. Note I've slightly modified the fiddle to remove one of the users.
The process basically ranks the trips of each member by id in descending order and keeps only the last 3 of them. Then it makes sure that none of the fetched trips has a NO in flag_status. Finally, all the matching members are counted.
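For anyone who wants to try the "rank, keep the last 3, reject any NO" idea without a MySQL server, here is a minimal sketch. It rests on several assumptions: it runs against SQLite through Python's sqlite3 module rather than MySQL, it replaces the user variables with a correlated COUNT(*) that should also work in MySQL, the rows are made up to mirror the question (members 1 and 3 qualify, member 5 does not), and the helper names make_trips_db and count_members_last_three_yes are hypothetical.

```python
import sqlite3


def make_trips_db():
    """Build an in-memory trips table filled with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (id INTEGER PRIMARY KEY, member_id INTEGER, flag_status TEXT)"
    )
    rows = [
        (1, 1, "NO"), (2, 1, "YES"), (3, 1, "YES"), (4, 1, "YES"),
        (5, 3, "YES"), (6, 3, "YES"), (7, 3, "YES"),
        (8, 5, "YES"), (9, 5, "NO"), (10, 5, "YES"),
    ]
    conn.executemany("INSERT INTO trips VALUES (?, ?, ?)", rows)
    return conn


def count_members_last_three_yes(conn):
    """Count members whose most recent trips (up to three) are all 'YES'."""
    sql = """
        SELECT COUNT(*) FROM (
            SELECT t.member_id
            FROM trips AS t
            -- Keep a trip only if fewer than 3 trips of the same member are newer.
            WHERE (SELECT COUNT(*)
                   FROM trips AS newer
                   WHERE newer.member_id = t.member_id
                     AND newer.id > t.id) < 3
            GROUP BY t.member_id
            -- Reject the member if any of the kept trips is not 'YES'.
            HAVING SUM(t.flag_status <> 'YES') = 0
        ) AS qualifying
    """
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    print(count_members_last_three_yes(make_trips_db()))
```

Like the queries above, this also counts members who have fewer than three trips as long as all of them are YES. Adding AND COUNT(*) = 3 to the HAVING clause would exclude them.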

Related

Query for getting top 5 candidate in every group in single table

I have a table that stores student marks for each subject, and I need a query that returns the top 5 students in every subject, that is, those with the highest marks.
Here is a sample table:
My expected output looks something like:
The top five students in PCM, ART and PCB on the basis of their marks. If two or more students have the same marks, those records also need to be in the list, all with a single query.

Original Answer
Technically, what you want to accomplish is not possible using a single SQL query. Had you only wanted one student per subject you could have achieved that using GROUP BY, but in your case it won't work.
The only way I can think of to get 5 students for each subject would be to write x queries, one for each subject and use UNION to glue them together. Such query will return a maximum of 5x rows.
Since you want to get the top 5 students based on the mark, you will have to use an ORDER BY clause, which, in combination with the UNION clauses will cause an error. To avoid that, you will have to use subqueries, so that UNION and ORDER BY clauses are not on the same level.
Query:
-- Select the 5 students with the highest mark in the `PCM` subject.
(
SELECT *
FROM student
WHERE subject = 'PCM'
ORDER BY studentMarks DESC
LIMIT 5
)
UNION
(
SELECT *
FROM student
WHERE subject = 'PCB'
ORDER BY studentMarks DESC
LIMIT 5
)
UNION
(
SELECT *
FROM student
WHERE subject = 'ART'
ORDER BY studentMarks DESC
LIMIT 5
);
Check out this SQLFiddle to evaluate the result yourself.
Updated Answer
This update aims to allow getting more than 5 students in the scenario that many students share the same grade in a particular subject.
Instead of using LIMIT 5 to get the top 5 rows, we use LIMIT 4,1 to get the fifth highest grade and use that to get all students whose grade in the given subject is greater than or equal to it. However, if there are fewer than 5 students in a subject, the LIMIT 4,1 subquery returns NULL. In that case we essentially want every student, so we fall back to the minimum grade.
To achieve what is described above, you will need to repeat the following piece of code x times, once per subject, and join the copies together using UNION. This solution is therefore practical only for a small handful of subjects; beyond that the query becomes unmaintainable.
Code:
-- Select the students with the top 5 highest marks in the `x` subject.
SELECT *
FROM student
WHERE studentMarks >= (
-- If there are less than 5 students in the subject return them all.
IFNULL (
(
-- Get the fifth highest grade.
SELECT studentMarks
FROM student
WHERE subject = 'x'
ORDER BY studentMarks DESC
LIMIT 4,1
), (
-- Get the lowest grade.
SELECT MIN(studentMarks)
FROM student
WHERE subject = 'x'
)
)
) AND subject = 'x';
Check out this SQLFiddle to evaluate the result yourself.
Alternative:
After some research I found an alternative, simpler query that will yield the same result as the one presented above based on the data you have provided without the need of "hardcoding" every subject in its own query.
In the following solution, we define a couple of variables that help us control the data:
one to cache the subject of the previous row and
one to save an incremental value that differentiates the rows having the same subject.
Query:
-- Select the students having the top 5 marks in each subject.
SELECT studentID, studentName, studentMarks, subject FROM
(
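-- Use an incremented value to differentiate rows with the same subject.
SELECT *, (@n := if(@s = subject, @n +1, 1)) as n, @s:= subject
FROM student
CROSS JOIN (SELECT @n := 0, @s:= NULL) AS b
-- Visit the rows by subject and descending mark, so that n ranks by mark.
ORDER BY subject, studentMarks DESC
) AS a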
WHERE n <= 5
ORDER BY subject, studentMarks DESC;
Check out this SQLFiddle to evaluate the result yourself.
The ideas were taken from the following threads:
Get top n records for each group of grouped results
How to SELECT the newest four items per category?
Select X items from every type
Getting the latest n records for each group

The query below produces almost what I wanted; I hope it helps others in the future.
SELECT a.studentId, a.studentName, a.StudentMarks,a.subject FROM testquery AS a WHERE
(SELECT COUNT(*) FROM testquery AS b
WHERE b.subject = a.subject AND b.StudentMarks >= a.StudentMarks) <= 2
ORDER BY a.subject ASC, a.StudentMarks DESC
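For illustration, here is a minimal sketch of the same correlated-count idea, generalised to the top N and keeping ties at the cut-off. It is built on assumptions: it runs against SQLite through Python's sqlite3 module instead of MySQL, the rows are made up, and the helper names make_student_db and top_n_per_subject are hypothetical. Counting only strictly higher marks (> instead of >=) is a deliberate change from the query above, so that students tied on the last qualifying mark are all returned.

```python
import sqlite3


def make_student_db():
    """Build an in-memory student table filled with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE student (studentName TEXT, studentMarks INTEGER, subject TEXT)"
    )
    rows = [
        ("Ann", 90, "PCM"), ("Bob", 85, "PCM"), ("Cid", 85, "PCM"), ("Dee", 70, "PCM"),
        ("Eve", 60, "ART"), ("Fay", 55, "ART"),
    ]
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)", rows)
    return conn


def top_n_per_subject(conn, n):
    """Return students whose mark is among the top n of their subject, ties included."""
    sql = """
        SELECT a.subject, a.studentName, a.studentMarks
        FROM student AS a
        -- A row qualifies if fewer than n rows of its subject have a higher mark.
        WHERE (SELECT COUNT(*)
               FROM student AS b
               WHERE b.subject = a.subject
                 AND b.studentMarks > a.studentMarks) < ?
        ORDER BY a.subject, a.studentMarks DESC, a.studentName
    """
    return conn.execute(sql, (n,)).fetchall()


if __name__ == "__main__":
    for row in top_n_per_subject(make_student_db(), 2):
        print(row)
```

With n = 2, Bob and Cid share the second-highest PCM mark, so PCM contributes three rows rather than two.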

MySQL : Group By Clause Not Using Index when used with Case

I'm using MySQL.
I can't change the DB structure, so sadly that's not an option.
THE ISSUE:
When I use GROUP BY with CASE (as I need to in my situation), MySQL uses filesort and the delay is huge (approx. 2-3 minutes):
http://sqlfiddle.com/#!9/f97d8/11/0
But when I don't use CASE and just GROUP BY group_id, MySQL easily uses the index and the result is fast:
http://sqlfiddle.com/#!9/f97d8/12/0
Scenario (detailed):
Table msgs, containing records of sent messages, with fields:
id,
user_id, (the guy who sent the message)
type, (0=> means it's group msg. All the msgs sent under this are marked by group_id. So lets say group_id = 5 sent 5 msgs, the table will have 5 records with group_id =5 and type=0. For type>0, the group_id will be NULL, coz all other types have no group_id as they are individual msgs sent to single recipient)
group_id (if type=0, will contain group_id, else NULL)
Table contains approx 10 million records for user id 50001 and with different types (i.e group as well as individual msgs)
Now the QUERY:
SELECT
msgs.*
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.user_id IN (50111)
AND msgs.type IN (0, 1, 5, 7)
GROUP BY CASE `msgs`.`type` WHEN 0 THEN `msgs`.`group_id` ELSE `msgs`.`id` END
ORDER BY `msgs`.`group_id` DESC
LIMIT 100
I HAVE to get the summary in a single QUERY, so the msgs sent to a group, let's say group 5 (which has 5 records in this table), will be shown as 1 record in the summary (I may show a COUNT later, but that's not an issue).
The individual msgs have NULL as group_id, so I can't just use 'GROUP BY group_id', because that would group all individual msgs into a single record, which is not acceptable.
Sample output can be something like:
id  owner_id  type  group_id  COUNT
1   50001     0     2         5
1   50001     1     NULL      1
1   50001     4     NULL      1
1   50001     0     7         5
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     5     NULL      1
1   50001     0     10        5
Now the problem is that the GROUP BY condition with CASE (which I currently think I need, because I only want to group by group_id if type=0) causes a lot of delay because it does not use the index, whereas it does if I don't use CASE (i.e. just GROUP BY group_id). Please view the SQLFiddles above to see the EXPLAIN results.
Can anyone please advise how to optimize this?
UPDATE
I tried a workaround that does work to some extent (it drops the INITIAL queries to 1 sec). It uses a UNION to shrink the result set, since the huge result set is what forces MySQL to write to disk for the filesort: the group msgs and the individual msgs are each limited separately (see the query below).
-- The first part of the union retrieves group msgs (those with type 0, which need to be grouped by group_id). It applies a limit to cap the out-of-control result set.
-- The second part retrieves individual msgs (those with type != 0, grouped by msgs.id; not necessary, but just to be safe from duplicate entries due to joins). It also applies a limit to cap the result set.
-- The outer query joins the two to retrieve the desired result set.
Here's the query:
SELECT
*
FROM
(
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (msgs.user_id = accounts.id)
WHERE 1
AND accounts.id IN (50111 ) AND type = 0
GROUP BY msgs.group_id
ORDER BY msgs.id DESC
LIMIT 40
)
UNION
ALL
(
SELECT
msgs.id as reference_id, user_id, type, group_id
FROM
msgs
INNER JOIN accounts
ON (
msgs.user_id = accounts.id
)
WHERE 1
AND msgs.type != 0
AND accounts.id IN (50111)
GROUP BY msgs.id
ORDER BY msgs.id
LIMIT 40
)
) AS temp
ORDER BY reference_id
LIMIT 20,20
But it has a lot of caveats:
- I need to handle the limit in the inner queries as well. Let's say there are 20 records per page and I'm on page 4. For the inner queries I need to apply LIMIT 0,80, since I don't know how many records each of the two parts contributed to the previous 3 pages. So as the records per page and the number of pages grow, my query grows heavier. With, say, 1k records per page on page 100 or 1,000, the load and the run time keep increasing.
- I need to handle ordering in the inner queries and then apply it again on the result set prepared by the union, and conditions need to be applied to both inner queries separately (but that's not much of an issue).
- I can't use SQL_CALC_FOUND_ROWS, so I will need to get the count using separate queries.
The main issue is the first one. The higher I go with the pagination, the heavier it gets.

Would this run faster?
SELECT id, user_id, type, group_id
FROM
( SELECT id, user_id, type, group_id, IFNULL(group_id, id) AS foo
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
) AS x
GROUP BY foo
ORDER BY `group_id` DESC
LIMIT 100
It needs INDEX(user_id, type).
Does this give the 'correct' answer?
SELECT DISTINCT *
FROM msgs
WHERE user_id IN (50111)
AND type IN (0, 1, 5, 7)
GROUP BY IFNULL(group_id, id)
ORDER BY `group_id` DESC
LIMIT 100
(It needs the same index)
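To check that grouping on IFNULL(group_id, id) collapses group msgs while keeping individual msgs apart, a small sketch may help. The assumptions are: SQLite through Python's sqlite3 module instead of MySQL, made-up rows, hypothetical helper names (make_msgs_db, summarize_msgs), and COALESCE as the portable spelling of IFNULL. It also groups on whether the row is a group msg (type = 0), which is an addition not present in the queries above; without it, a group_id could collide with the id of an unrelated individual msg. The sketch only illustrates the grouping key, not the index or filesort behaviour on a large MySQL table.

```python
import sqlite3


def make_msgs_db():
    """Build an in-memory msgs table filled with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msgs (id INTEGER PRIMARY KEY, user_id INTEGER,"
        " type INTEGER, group_id INTEGER)"
    )
    rows = [
        (1, 50111, 0, 2), (2, 50111, 0, 2),  # two msgs sent to group 2
        (3, 50111, 1, None),                 # individual msg with id 3
        (4, 50111, 0, 7), (6, 50111, 0, 7),  # two msgs sent to group 7
        (5, 50111, 5, None),                 # individual msg
        (7, 50111, 0, 3),                    # group 3: same number as msg id 3
        (8, 999, 1, None),                   # another user, filtered out
    ]
    conn.executemany("INSERT INTO msgs VALUES (?, ?, ?, ?)", rows)
    return conn


def summarize_msgs(conn, user_id):
    """One row per group (type 0) and one row per individual msg."""
    sql = """
        SELECT MIN(id)       AS first_id,
               MIN(type)     AS msg_type,
               MAX(group_id) AS grp,
               COUNT(*)      AS cnt
        FROM msgs
        WHERE user_id = ?
        -- Group msgs share their group_id; individual msgs fall back to their own id.
        GROUP BY (msgs.type = 0), COALESCE(msgs.group_id, msgs.id)
        ORDER BY first_id
    """
    return conn.execute(sql, (user_id,)).fetchall()


if __name__ == "__main__":
    for row in summarize_msgs(make_msgs_db(), 50111):
        print(row)
```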

Mysql Ranking Query on 2 columns

Table
id user_id rank_solo lp
1 1 15 45
2 2 7 79
3 3 17 15
How can I write a ranking query that sorts on rank_solo (which ranges from 0 to 28) and, if two rows have the same rank_solo, uses lp (0-100) to further determine the ranking?
(If lp is also equal, break the tie so that no two rows share a ranking.)
The query should give me the ranking of any given user_id. How does this perform on 5m+ rows?
So
User_id 1 would have ranking 2
User_id 2 would have ranking 3
User_id 3 would have ranking 1

You can get the ranking using variables:
select t.*, (@rn := @rn + 1) as ranking
from t cross join
(select @rn := 0) params
order by rank_solo desc, lp;
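If only one user's ranking is needed, numbering every row is not required: the ranking is one plus the number of rows that sort ahead of that user. The sketch below illustrates this under stated assumptions: SQLite through Python's sqlite3 module instead of MySQL, the three rows from the question, a hypothetical helper name user_ranking, higher lp ranking first when rank_solo is equal, and the lower id winning when both are equal. The last two tie-break rules are guesses; the question does not spell them out.

```python
import sqlite3


def make_rank_db():
    """Build an in-memory table t holding the three rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, user_id INTEGER,"
        " rank_solo INTEGER, lp INTEGER)"
    )
    conn.executemany(
        "INSERT INTO t VALUES (?, ?, ?, ?)",
        [(1, 1, 15, 45), (2, 2, 7, 79), (3, 3, 17, 15)],
    )
    return conn


def user_ranking(conn, user_id):
    """Return the 1-based ranking of user_id, or None if the user is unknown."""
    me = conn.execute(
        "SELECT id, rank_solo, lp FROM t WHERE user_id = ?", (user_id,)
    ).fetchone()
    if me is None:
        return None
    my_id, rank_solo, lp = me
    # Count the rows that sort strictly ahead of this user.
    ahead = conn.execute(
        """
        SELECT COUNT(*) FROM t
        WHERE rank_solo > :r
           OR (rank_solo = :r AND lp > :lp)
           OR (rank_solo = :r AND lp = :lp AND id < :id)
        """,
        {"r": rank_solo, "lp": lp, "id": my_id},
    ).fetchone()[0]
    return ahead + 1


if __name__ == "__main__":
    conn = make_rank_db()
    for uid in (1, 2, 3):
        print(uid, user_ranking(conn, uid))
```

On the sample rows this reproduces the rankings listed in the question (user 3 first, user 1 second, user 2 third).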

You can use ORDER BY to sort your query:
SELECT *
FROM `Table`
ORDER BY rank_solo, lp

I'm not sure I quite understand what you're saying. With that many rows, create an index on the fields you're using to do your selects. For example, in the MySQL client use:
create index RANKINGS on mytablename(rank_solo,lp,user_id);
Depending on what you use in your query to select the data, you may change the index or add another index with a different field combination. This has improved performance on my tables by a factor of 10 or more.
As for the query, if you're selecting a specific user then could you not just use:
select rank_solo from table where user_id={user id}
If you want the highest ranking individual, you could:
select * from yourtable order by rank_solo,lp limit 1
Remove the limit 1 to list them all.
If I've misunderstood, please comment.

An alternative would be to use a 2nd table.
table2 would have the following fields:
rank (auto_increment)
user_id
rank_solo
lp
With the rank field as auto increment, as it's populated, it will automatically populate with values beginning with "1".
Once the 2nd table is ready, just do this when you want to update the rankings:
delete from table2;
insert into table2 (user_id,rank_solo,lp) select user_id,rank_solo,lp from table1 order by rank_solo,lp;
It may not be "elegant" but it gets the job done. Plus, if you create an index on both tables, this query would be very quick since the fields are numeric.

Group by user and show latest in MYSQL not working

I am coding a social network, but it's a bit different from others: in this case only one status per user is ever shown in the feed section.
So I need to sort the statuses by date, with the latest ones on top, but also group them by userID.
Unfortunately I can't seem to get this working.
This is my current query:
SELECT *
FROM status
WHERE userID != "83" #This is just so my statuses don't show
GROUP BY userID
ORDER BY addedDate DESC
LIMIT 10
I expect to see the latest statuses, only one per user. Instead I see each user's first status, so the GROUP BY is working but the ORDER BY is not.
Thanks in advance.

As mentioned in the comments to Robin's answer, that approach is unreliable because MySQL does not guarantee that it will always return the most recent status from each group. You must instead join your table with a subquery that selects the most recent status (based on addedDate).
SELECT *
FROM status
NATURAL JOIN (
SELECT userID, MAX(addedDate) as addedDate
FROM status
GROUP BY userID
) AS mostRecent
ORDER BY addedDate DESC
LIMIT 10
Note that if a user has multiple status updates with the same addedDate, the server will return all of them (whereas Robin's query would return an indeterminate one); if you need control over such a situation, you will need to define how one determines which such status update should be selected.
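A minimal sketch of this join against the per-user MAX(addedDate) follows. It makes several assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, it uses an explicit JOIN ... ON instead of NATURAL JOIN, the body column and all rows are made up, and the helper names make_status_db and latest_status_per_user are hypothetical. It also carries over the question's filter that hides one user's own statuses.

```python
import sqlite3


def make_status_db():
    """Build an in-memory status table filled with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE status (userID INTEGER, body TEXT, addedDate TEXT)")
    rows = [
        (1, "a old", "2013-01-01 10:00:00"),
        (1, "a new", "2013-01-03 09:00:00"),
        (2, "b only", "2013-01-02 12:00:00"),
        (3, "c old", "2013-01-01 08:00:00"),
        (3, "c new", "2013-01-05 08:00:00"),
        (83, "mine", "2013-01-04 08:00:00"),
    ]
    conn.executemany("INSERT INTO status VALUES (?, ?, ?)", rows)
    return conn


def latest_status_per_user(conn, excluded_user_id, limit=10):
    """Return each user's most recent status, newest first."""
    sql = """
        SELECT s.userID, s.body, s.addedDate
        FROM status AS s
        -- The subquery finds the newest date per user; the join keeps only that row.
        JOIN (SELECT userID, MAX(addedDate) AS addedDate
              FROM status
              GROUP BY userID) AS mostRecent
          ON mostRecent.userID = s.userID
         AND mostRecent.addedDate = s.addedDate
        WHERE s.userID <> ?
        ORDER BY s.addedDate DESC
        LIMIT ?
    """
    return conn.execute(sql, (excluded_user_id, limit)).fetchall()


if __name__ == "__main__":
    for row in latest_status_per_user(make_status_db(), 83):
        print(row)
```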

SELECT userID, max(addedDate)
FROM status
WHERE userID != "83" #This is just so my statuses don't show
GROUP BY userID

SELECT *
FROM ( SELECT *
FROM status
WHERE userID != "83"
ORDER BY addedDate DESC) AS h
GROUP BY userID
ORDER BY addedDate DESC
LIMIT 10
You must ORDER BY before GROUP BY'ing.

Determine total amount of top result returned

I would like to determine two things from a single query:
The most prevalent value of a column in a table
The number of times that value occurs in the table
Example Table:
user_id some_field
1 data
2 data
1 data
The above would return user_id # 1 as being the most prevalent in the table, and it would return (2) for the total amount of times that it was located in the table.
I have done my research and I came across two types of queries.
GROUP BY user_id ORDER BY COUNT(*) DESC
SUM
The problem is that I can't figure out how to use these two queries in conjunction with one another. For example, consider the following query which successfully returns the most prevalent column.
$top_user = "SELECT user_id FROM table_name GROUP BY user_id ORDER BY COUNT(*) DESC";
The above query returns "1" based on the example table shown above. Now, I would like to be able to return "2" for the total amount of times the user_id (1) was found in the table.
Is this by any chance possible?
Thanks,
Evan

You can include count(*) in the SELECT list:
SELECT user_id, count(*) as totaltimes from table_name
GROUP BY user_id ORDER BY count(*) DESC;
If you want only the first one:
SELECT user_id, count(*) as totaltimes from table_name
GROUP BY user_id ORDER BY count(*) DESC LIMIT 1;
