Limit each group in group by - mysql

Now I understand that this has been asked several times before, but I have tried to apply different existing solutions to my specific problems for quite a while without success. So I turn here in hope of some guidance.
I have a table called tblanswers, which contains answers linked to different questions in another table. What I want is to get the count for each answer for a specific question ID, but limit it to the n first answers each month.
Sample data from tblanswers:
id qid answer timestamp
72 162 2 1366027324
71 161 4 1343599200
70 162 2 1366014201
69 161 4 1366011700
68 162 2 1366006729
67 161 3 1366010948
66 162 2 1365951084
This is the query I have so far:
SELECT *, COUNT(*) c FROM(
SELECT answer, timestamp, YEAR(FROM_UNIXTIME(timestamp)) yr, MONTH(FROM_UNIXTIME(timestamp)) mo FROM tblanswers
WHERE qid = 161
ORDER BY timestamp ASC
) q GROUP BY YEAR(FROM_UNIXTIME(timestamp)), MONTH(FROM_UNIXTIME(timestamp)), answer
That would give me something like this (the dates and numbers in this sample output are not accurate):
answer yr mo c
1 2013 5 5
2 2013 5 3
3 2013 5 2
1 2013 6 5
2 2013 6 15
3 2013 6 7
Let's say I only want to see the first three answers in a month, then count could never be more than 3. How can I limit each month?
The final data should be a sum of each answer, like this:
answer num_answers
1 2
2 3
3 3
I think one of these solutions could work, but I can't see how to apply it:
http://code.openark.org/blog/mysql/sql-selecting-top-n-records-per-group
http://code.openark.org/blog/mysql/sql-selecting-top-n-records-per-group-another-solution
Any help is appreciated. Thanks!

This solution is based on the top-N-per-group method here
SELECT answer, COUNT(*) num_answers
FROM (SELECT answer, yearmonth,
@rn := CASE WHEN @prevmonth = yearmonth
THEN @rn + 1
ELSE 1
END rn,
@prevmonth := yearmonth
FROM (SELECT @rn := NULL, @prevmonth := NULL) init,
(SELECT answer,
YEAR(FROM_UNIXTIME(timestamp))*100+MONTH(FROM_UNIXTIME(timestamp)) yearmonth
FROM tblanswers
WHERE qid = 220
ORDER BY timestamp) x) y
WHERE rn <= 3
GROUP BY answer
SQLFIDDLE
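For comparison, the same "first n per month" idea can be sketched with a ROW_NUMBER() window function instead of user variables. This is only a sketch under assumptions: it runs the SQL through Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL 8.0, the function name count_first_n_per_month is made up for illustration, and the month key uses SQLite's strftime where MySQL would use YEAR(FROM_UNIXTIME(timestamp))*100+MONTH(FROM_UNIXTIME(timestamp)).

```python
import sqlite3

# Sample rows from the question: (id, qid, answer, timestamp)
SAMPLE_ROWS = [
    (72, 162, 2, 1366027324),
    (71, 161, 4, 1343599200),
    (70, 162, 2, 1366014201),
    (69, 161, 4, 1366011700),
    (68, 162, 2, 1366006729),
    (67, 161, 3, 1366010948),
    (66, 162, 2, 1365951084),
]


def count_first_n_per_month(rows, qid, n):
    """Count each answer among the first n answers of every month (UTC)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tblanswers "
        "(id INTEGER, qid INTEGER, answer INTEGER, timestamp INTEGER)"
    )
    con.executemany("INSERT INTO tblanswers VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT answer, COUNT(*) AS num_answers
        FROM (
            SELECT answer,
                   -- number the answers inside each calendar month, oldest first
                   ROW_NUMBER() OVER (
                       PARTITION BY strftime('%Y%m', timestamp, 'unixepoch')
                       ORDER BY timestamp, id
                   ) AS rn
            FROM tblanswers
            WHERE qid = ?
        ) ranked
        WHERE rn <= ?          -- keep only the first n rows of each month
        GROUP BY answer
        ORDER BY answer
    """
    result = con.execute(sql, (qid, n)).fetchall()
    con.close()
    return result
```

Calling count_first_n_per_month(SAMPLE_ROWS, 161, 3) returns one (answer, num_answers) pair per answer. Filtering on qid before numbering matters: the row numbers then only count answers to that question.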

What about this solution:
SELECT qid, answer, YEAR(FROM_UNIXTIME(timestamp)) yr, MONTH(FROM_UNIXTIME(timestamp)) mo, COUNT(*) no
FROM tblanswers
WHERE qid = 161
GROUP BY answer, yr, mo
HAVING COUNT(*) <= 2
ORDER BY timestamp ASC;
and the fiddle: http://sqlfiddle.com/#!2/1541eb/126

There is no reason to reinvent the wheel and risk ending up with buggy, suboptimal code. Your problem is a trivial extension of the common per-group limit problem (see also the tag limit-per-group). There are already tested and optimized solutions to this problem.

Related

Solving for outlier range, how to calculate on two different rows from same output?

I have query below as:
SELECT
age_quartile,
MAX(age) AS quartile_break
from
(SELECT
full_name,
age,
NTILE(4) OVER (ORDER BY age) AS age_quartile
FROM friends) AS quartiles
WHERE age_quartile IN (1, 3)
GROUP BY age_quartile
This gives me output that looks like:
age_quartile | quartile_break
1 31
3 35
Desired Output:
outlier range
25
41
where 25 = 31-6 and 41 = 35 + 6
How can I extend the query above to produce my desired output? It currently returns the quartile breaks, and I need one additional step to calculate the outlier range. Thanks!
table data looks like:
friends
full_name | age
Ameila Lara 1
Evangeline Griffin 21
Kiara Atkinson 31
Isobel Nieslen 31
Genevuve Miles 32
Jane Jenkins 99
Marie Acevedo null
I don't know whether NTILE is the right function to use here, but one way is to define the age quartiles in a temp table, join it with the age table, and compute the results. This is just an attempt; there may be a better way, and I'm interested to see other answers.
Sample Query:
with friends as
(
select 'user1' as full_name, 31 as age union all
select 'user2' as full_name, 55 as age union all
select 'user3' as full_name, 75 as age
),
quartiles_age as
(
select 1 as quartile, 0 as st_range, 25 as end_range union all
select 2 as quartile, 26 as st_range, 50 as end_range union all
select 3 as quartile, 51 as st_range, 75 as end_range union all
select 4 as quartile, 76 as st_range, 100 as end_range
)
SELECT
fr.full_name,
fr.age,
qrtl_age.quartile,
qrtl_age.end_range - fr.age as diff_age
FROM
friends fr
join quartiles_age qrtl_age on fr.age between qrtl_age.st_range and qrtl_age.end_range
Fiddle URL : (https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c48d209264e90d276cea6ae03f2a7af6)
You can calculate the range using:
MAX(age) - MIN(age) AS age_range
That answers the question you asked. Your sample data has an arbitrary 6 for the calculation, which the question does not explain.
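One possible reading of the 6: it equals 1.5 × (35 − 31), which matches the common "1.5 × IQR" rule for outlier fences. The question does not say this, so treat it as an assumption. Under that assumption, a minimal sketch that pivots the two quartile breaks onto one row and computes both fences could look like this. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL 8.0, and the function name outlier_range is hypothetical.

```python
import sqlite3


def outlier_range(ages):
    """Return (lower_fence, upper_fence), assuming fences = Q1/Q3 -/+ 1.5 * IQR."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE friends (age INTEGER)")
    con.executemany("INSERT INTO friends VALUES (?)", [(a,) for a in ages])
    sql = """
        WITH quartiles AS (
            SELECT age, NTILE(4) OVER (ORDER BY age) AS age_quartile
            FROM friends
            WHERE age IS NOT NULL        -- ignore rows without an age
        ),
        breaks AS (
            -- conditional aggregation puts both breaks on a single row
            SELECT MAX(CASE WHEN age_quartile = 1 THEN age END) AS q1,
                   MAX(CASE WHEN age_quartile = 3 THEN age END) AS q3
            FROM quartiles
        )
        SELECT q1 - 1.5 * (q3 - q1) AS lower_fence,
               q3 + 1.5 * (q3 - q1) AS upper_fence
        FROM breaks
    """
    row = con.execute(sql).fetchone()
    con.close()
    return row
```

With quartile breaks of 31 and 35 this yields 25 and 41, the numbers in the desired output. Once both breaks sit on the same row, any arithmetic between them is a plain expression.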

MySQL DISTINCT call counts by state

I'm trying to get the unique call counts (no dupe calls) by state. For example...
MO 249
OK 220
CA 216
TX 190
KS 158
The following works (no errors), but it's not removing the dupes.
SELECT DISTINCT CallFrom, FromState, count(*) AS cnt
FROM `calls`
WHERE DateCreated >= CURDATE() - INTERVAL 2 YEAR AND
(CallTo = '+15555555555' OR CallTo = '+15555555556' )
GROUP BY FromState
ORDER BY cnt DESC
Any ideas? Thanks in advance.
UPDATE: The following 'calls' table example was requested...
Index CallTo CallFrom FromState
1 +15555555555 18166283100 MO
2 +15555555556 13307059600 OH
3 +15555555555 17722631600 FL
4 +15555555556 16173024800 MA
5 +15555555556 16173024800 MA
6 +15555555556 16175025500 MA
Just realized I forgot to include the DateCreated column, but like I said, everything is working except for deduplicating. The output for this example would be...
MA 2
MO 1
OH 1
FL 1
Your wording is not very clear, but I think you're saying you want to count how many unique CallFrom numbers occurred in each state. There may be better ways to do this, but this will work. First it builds a list of unique CallFrom/State combinations, and then it groups and counts on that list, instead of on the raw data:
SELECT FromState, COUNT(*)
FROM
(SELECT DISTINCT CallFrom, FromState
FROM `calls`
WHERE
(CallTo = '+15555555555' OR CallTo = '+15555555556' )
) c
GROUP BY FromState
Demo:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=d057e3482ec9d5ad4519e58056232e58
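An alternative that avoids the subquery is COUNT(DISTINCT CallFrom), which counts each caller once per state. A minimal sketch using the sample rows from the question follows; it runs through Python's built-in sqlite3 module as a stand-in for MySQL, leaves out the DateCreated filter because the sample has no such column, and the function name unique_callers_by_state is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (CallTo, CallFrom, FromState)
SAMPLE_CALLS = [
    ("+15555555555", "18166283100", "MO"),
    ("+15555555556", "13307059600", "OH"),
    ("+15555555555", "17722631600", "FL"),
    ("+15555555556", "16173024800", "MA"),
    ("+15555555556", "16173024800", "MA"),
    ("+15555555556", "16175025500", "MA"),
]


def unique_callers_by_state(rows):
    """Count distinct CallFrom numbers per state, most callers first."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE calls (CallTo TEXT, CallFrom TEXT, FromState TEXT)")
    con.executemany("INSERT INTO calls VALUES (?, ?, ?)", rows)
    sql = """
        SELECT FromState, COUNT(DISTINCT CallFrom) AS cnt
        FROM calls
        WHERE CallTo IN ('+15555555555', '+15555555556')
        GROUP BY FromState
        ORDER BY cnt DESC, FromState   -- state name breaks ties
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

The reason the original query did not deduplicate is that SELECT DISTINCT is applied to the output rows after grouping, whereas DISTINCT inside COUNT() is applied to the values being counted.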

select top 5 scores for each team [duplicate]

This question already has answers here:
Using LIMIT within GROUP BY to get N results per group?
(14 answers)
Closed 5 years ago.
This is different from the one marked as a duplicate: I want to sum up the top 5 for each team. The duplicate post returns each of the results in separate rows.
I'm using the query from that question now, but it seems that SQL is randomly picking 5 of, for example, 10 rows and summing them up, not the top 5. Does anyone have some input for me?
select team, sum(length) as totalScore
from
(SELECT t.*,
@num_in_group:=case when @team!=team then @num_in_group:=0 else @num_in_group:=@num_in_group+1 end as num_in_group,
@team:=team as t
FROM reg_catches t, (select @team:=-1, @num_in_group:=0) init
ORDER BY team asc) sub
WHERE sub.num_in_group<=4 and competition = 16 and team = 25
GROUP BY team
ORDER BY totalScore DESC;
I'm struggling with a SQL question that I can't get my head around. My result table looks like the one below. I'm trying to sum up the top 5 results for each team and limit the output to the 3 highest-ranked teams. Everything was working as expected until I added my last score to the result table. The output of my SQL is now seemingly random for team 25; I expected it to be 520.
team length competition
----------------------
26 70 16
25 70 16
25 95 16
25 98 16
25 100 16
25 100 16
25 100 16
25 122 16
Output:
team totalScore
---- -----------
25 122
26 70
Wanted output:
team totalScore
---- -----------
25 520
26 70
SELECT team, SUM(length) AS totalScore
FROM(
SELECT team, length
FROM table_result m
WHERE competition = 16 and (
SELECT COUNT(*)
FROM table_result mT
WHERE mT.team = m.team AND mT.length >= m.length
) <= 5) tmp
GROUP BY team
ORDER BY totalScore DESC Limit 3
Does anyone have any ideas for me?
select team, sum(length)
from
(SELECT t.*,
@num_in_group:=case when @team!=team then @num_in_group:=0 else @num_in_group:=@num_in_group+1 end as num_in_group,
@team:=team as t
FROM test.table_result t, (select @team:=-1, @num_in_group:=0) init
ORDER BY team, length desc) sub
WHERE sub.num_in_group<=4
GROUP BY team
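You should use a window function to accomplish this (available in MySQL 8.0 and later). Here's an example query. Note that the competition filter goes in the inner query: the derived table only exposes team, length and rowNumber, and the rows should be filtered before they are numbered.
SELECT team, SUM(length) AS totalScore FROM
(SELECT team,
length,
row_number() OVER (PARTITION BY team ORDER BY length desc) AS rowNumber
FROM table_result
WHERE competition = 16) tmp
WHERE rowNumber <= 5
GROUP BY team
ORDER BY totalScore DESC
LIMIT 3;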
This query has two parts.
The inner query uses the row_number() window function to give every row an extra column that indicates its rank. PARTITION BY team says that the rank should be kept separately for each team, so that you end up being able to select the top n scores for every team.
The outer query uses a GROUP BY on the result of the inner query to take the SUM, per team, of all of the scores whose row number is less than or equal to 5 - in other words, the top 5 scores.
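To check the two-part query against the data in the question, here is a small runnable sketch. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for MySQL 8.0; the function name top_scores_per_team and its parameters are hypothetical.

```python
import sqlite3

# Rows from the question: (team, length, competition)
SAMPLE_RESULTS = [
    (26, 70, 16),
    (25, 70, 16),
    (25, 95, 16),
    (25, 98, 16),
    (25, 100, 16),
    (25, 100, 16),
    (25, 100, 16),
    (25, 122, 16),
]


def top_scores_per_team(rows, competition, per_team=5, teams=3):
    """Sum the best per_team lengths of each team; return the best teams."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_result (team INTEGER, length INTEGER, competition INTEGER)"
    )
    con.executemany("INSERT INTO table_result VALUES (?, ?, ?)", rows)
    sql = """
        SELECT team, SUM(length) AS totalScore
        FROM (
            SELECT team, length,
                   ROW_NUMBER() OVER (
                       PARTITION BY team ORDER BY length DESC
                   ) AS rn
            FROM table_result
            WHERE competition = ?      -- filter before numbering the rows
        ) ranked
        WHERE rn <= ?
        GROUP BY team
        ORDER BY totalScore DESC
        LIMIT ?
    """
    result = con.execute(sql, (competition, per_team, teams)).fetchall()
    con.close()
    return result
```

For team 25 the five highest lengths are 122, 100, 100, 100 and 98, so top_scores_per_team(SAMPLE_RESULTS, 16) gives 520 for team 25 and 70 for team 26.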

Selecting id and value where max occurs

I've solved one issue and run into another. Basically, I want to select question_id, answer and the maximum number of occurrences. I run my query against a basic table that gathers questions and their answers (question_id represents the question, and answer is a value from 0 to 5 that corresponds to a row in another table, but that doesn't matter).
**survey_result**
question_id
answer (int from 0 to 5)
Sample survey_result:
question_id answer
1 3
1 5
1 2
2 2
2 0
2 4
Here's the query; its purpose is to check, for every single question, which answer (from 0 to 5) occurred the most.
select question_id, answer, max(occurence_number) FROM
(select question_id, answer, count(*) as occurence_number
from survey_result
group by question_id, answer
order by question_id asc, occurence_number desc) as results
GROUP BY question_id
So a sub query results in something like this:
question_id answer occurence_number
1 0 12
1 1 20
1 2 34
1 3 5
1 4 9
1 5 15
But main query results something like this:
question_id answer occurence_number
1 0 12
2 0 20
3 0 34
4 0 5
So the problem is that it always shows answer 0, and I want to get the correct answer number.
Sadly a bit redundant due to MySQL's lack of a WITH clause (common table expressions only arrived in MySQL 8.0), but this should do what you want. In case of a tie, it will return the higher answer.
SELECT s1.question_id, MAX(s1.answer) answer, MAX(s1.c) occurrences
FROM
(SELECT question_id, answer, COUNT(*) c
FROM survey_result GROUP BY question_id,answer) s1
LEFT JOIN
(SELECT question_id, answer, COUNT(*) c
FROM survey_result GROUP BY question_id,answer) s2
ON s1.question_id=s2.question_id
AND s1.c < s2.c
WHERE s2.c IS NULL
GROUP BY s1.question_id
An SQLfiddle to play with.
I think you are overcomplicating it, try this:
select question_id, answer, count(*) as occurence_number
from survey_result
group by question_id, answer
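Where window functions are available (MySQL 8.0 or later), picking the most frequent answer per question can also be sketched by counting first and then ranking the counts within each question. The following is a sketch, not a drop-in answer: it runs through Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in, the function name most_common_answers is hypothetical, and ties go to the higher answer, as in the self-join answer.

```python
import sqlite3


def most_common_answers(rows):
    """rows: (question_id, answer) pairs. Returns (question_id, answer, count)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE survey_result (question_id INTEGER, answer INTEGER)")
    con.executemany("INSERT INTO survey_result VALUES (?, ?)", rows)
    sql = """
        WITH counts AS (
            SELECT question_id, answer, COUNT(*) AS occurrences
            FROM survey_result
            GROUP BY question_id, answer
        ),
        ranked AS (
            SELECT question_id, answer, occurrences,
                   -- rank answers inside each question, most frequent first;
                   -- the higher answer wins a tie
                   ROW_NUMBER() OVER (
                       PARTITION BY question_id
                       ORDER BY occurrences DESC, answer DESC
                   ) AS rn
            FROM counts
        )
        SELECT question_id, answer, occurrences
        FROM ranked
        WHERE rn = 1
        ORDER BY question_id
    """
    result = con.execute(sql).fetchall()
    con.close()
    return result
```

For example, most_common_answers([(1, 3), (1, 3), (1, 5), (2, 0), (2, 0), (2, 4), (2, 4)]) picks answer 3 for question 1 and, because of the tie rule, answer 4 for question 2. The original query returned answer 0 because answer was neither grouped nor aggregated in the outer SELECT, so MySQL was free to pick the value from any row in the group.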

Get the last 2 rows of a table while grouping one of the column. MySQL

Consider Facebook. Facebook displays the latest 2 comments of any status. I want to do something similar.
I have a table with e.g. status_id, comment_id, comment and timestamp.
Now I want to fetch the latest 2 comments for each status_id.
Currently I am first doing a GROUP_CONCAT of all columns, group by status_id and then taking the SUBSTRING_INDEX with -2.
This fetches the latest 2 comments, however the GROUP_CONCAT of all the records for a status_id is an overhead.
SELECT SUBSTRING_INDEX(GROUP_CONCAT('~', comment_id,
'~', comment,
'~', timestamp
SEPARATOR '|~|'),
'|~|', -2)
FROM commenttable
GROUP BY status_id;
Can you help me with better approach?
My table looks like this -
status_id comment_id comment timestamp
1 1 xyz1 3 hour
1 2 xyz2 2 hour
1 3 xyz3 1 hour
2 4 xyz4 2 hour
2 6 xyz6 1 hour
3 5 xyz5 1 hour
So I want the output as -
1 2 xyz2 2 hour
1 3 xyz3 1 hour
2 4 xyz4 2 hour
2 6 xyz6 1 hour
3 5 xyz5 1 hour
Here is a great answer I came across here:
select status_id, comment_id, comment, timestamp
from commenttable
where (
select count(*) from commenttable as f
where f.status_id = commenttable.status_id
and f.timestamp > commenttable.timestamp
) < 2;
This is not very efficient (O(n^2)) but it's a lot more efficient than concatenating strings and using substrings to isolate your desired result. Some would say that reverting to string operations instead of native database indexing robs you of the benefits of using a database in the first place.
After some struggle I found this solution -
The following gives me the row number within each status -
SELECT a.status_id,
a.comments_id,
COUNT(*) AS row_num
FROM comments a
JOIN comments b
ON a.status_id = b.status_id AND a.comments_id >= b.comments_id
GROUP BY a.status_id , a.comments_id
ORDER BY row_num DESC
This gives me the total rows per status -
SELECT com.status_id, COUNT(*) total
FROM comments com
GROUP BY com.status_id
In the where clause of the main select -
row_num = total OR row_num = total - 1
This gives the latest 2 rows. You can modify the where clause to fetch more than 2 latest rows.
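On MySQL 8.0 or later, the row number can also come straight from ROW_NUMBER(), which removes the need for the self-join and the separate total. A minimal sketch follows, run through Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in. The integer timestamps are invented for illustration (larger means newer), and the function name latest_comments is hypothetical.

```python
import sqlite3

# (status_id, comment_id, comment, timestamp); timestamps are made-up integers
SAMPLE_COMMENTS = [
    (1, 1, "xyz1", 100),
    (1, 2, "xyz2", 200),
    (1, 3, "xyz3", 300),
    (2, 4, "xyz4", 200),
    (2, 6, "xyz6", 300),
    (3, 5, "xyz5", 300),
]


def latest_comments(rows, n=2):
    """Return the n newest comments of every status, oldest of them first."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE commenttable "
        "(status_id INTEGER, comment_id INTEGER, comment TEXT, timestamp INTEGER)"
    )
    con.executemany("INSERT INTO commenttable VALUES (?, ?, ?, ?)", rows)
    sql = """
        SELECT status_id, comment_id, comment
        FROM (
            SELECT status_id, comment_id, comment, timestamp,
                   -- number comments inside each status, newest first
                   ROW_NUMBER() OVER (
                       PARTITION BY status_id
                       ORDER BY timestamp DESC, comment_id DESC
                   ) AS rn
            FROM commenttable
        ) ranked
        WHERE rn <= ?
        ORDER BY status_id, timestamp
    """
    result = con.execute(sql, (n,)).fetchall()
    con.close()
    return result
```

With the sample rows, latest_comments(SAMPLE_COMMENTS) drops only xyz1, matching the desired output in the question. Changing n fetches more than the latest 2 rows per status.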