In the following, I query the same table twice. The second query is a subquery inside a left join, but it reads the same table. The only difference is the added aggregate COUNT(DISTINCT ...), whose result is used by the outer query. Is there a better way to approach this?
select sm.student_id, sm.marks, smarks.d as d_marks from student_marks as sm
left join(
select m.student_id, count(distinct m.marks) as d from student_marks as m group by m.student_id
) as smarks on smarks.student_id = sm.student_id;
Is it possible to do this in a single query without using a left join?
Yes, an alternative approach is to use window functions. COUNT(DISTINCT ...) can't be used as a window function, but you can get the same result by using DENSE_RANK() twice, once ordering by the column you want a distinct count of ascending and once descending, adding the two ranks together, then subtracting one:
SELECT sm.student_id,
sm.marks,
DENSE_RANK() OVER(PARTITION BY sm.student_id ORDER BY sm.marks DESC) +
DENSE_RANK() OVER(PARTITION BY sm.student_id ORDER BY sm.marks ASC) - 1 AS d_marks
FROM student_marks AS sm
N.B. this is not guaranteed to perform any better just because you are referencing a table one fewer times.
To explain the DENSE_RANK() trick, consider a simple data set:
marks   dense_rank ASC   dense_rank DESC
1       1                3
1       1                3
2       2                2
3       3                1
The two ranks added together will always be one more than the number of distinct items in the set (i.e. 1+3, 2+2, and 3+1 all equal 4), so we just need to subtract one from the result. This gives us the distinct count of items in the set without actually using COUNT(DISTINCT ...), which isn't allowed as a window function (as noted in the restrictions).
ADDENDUM
If marks is nullable (which I had assumed it would not be) and you don't want NULL rows included in the count, then, as noted in the comments, this wouldn't quite work as it is, because DENSE_RANK() ranks NULL as a value of its own. You'd need to subtract one from the total whenever the partition contains a NULL, which can be done by appending:
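To check the trick end to end, here is a minimal sketch that runs both queries against an in-memory database and compares them. It uses Python's sqlite3 module as a stand-in for the original database (this assumes the bundled SQLite is 3.25 or newer, which is when window functions were added), and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data (student_id, marks); not taken from the question.
rows = [(1, 10), (1, 10), (1, 20), (1, 30), (2, 5), (2, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_marks (student_id INTEGER, marks INTEGER)")
conn.executemany("INSERT INTO student_marks VALUES (?, ?)", rows)

# Original approach: join every row to a per-student COUNT(DISTINCT ...).
join_sql = """
SELECT sm.student_id, sm.marks, smarks.d AS d_marks
FROM student_marks AS sm
LEFT JOIN (
    SELECT m.student_id, COUNT(DISTINCT m.marks) AS d
    FROM student_marks AS m
    GROUP BY m.student_id
) AS smarks ON smarks.student_id = sm.student_id
ORDER BY sm.student_id, sm.marks
"""

# Window-function approach: descending rank + ascending rank - 1.
rank_sql = """
SELECT sm.student_id, sm.marks,
       DENSE_RANK() OVER (PARTITION BY sm.student_id ORDER BY sm.marks DESC)
     + DENSE_RANK() OVER (PARTITION BY sm.student_id ORDER BY sm.marks ASC)
     - 1 AS d_marks
FROM student_marks AS sm
ORDER BY sm.student_id, sm.marks
"""

join_result = conn.execute(join_sql).fetchall()
rank_result = conn.execute(rank_sql).fetchall()
print(rank_result)
```

Both result sets should be identical as long as marks contains no NULLs.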
- MAX(CASE WHEN sm.marks IS NULL THEN 1 ELSE 0 END) OVER(PARTITION BY sm.student_id)
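As a rough check of the NULL adjustment, the sketch below appends that term to the full query and compares the result with COUNT(DISTINCT marks), which ignores NULLs. It again uses Python's sqlite3 module as a stand-in engine (an assumption, SQLite 3.25+), and the rows are hypothetical.

```python
import sqlite3

# Hypothetical data: student 1 has one NULL mark, student 2 has none.
rows = [(1, 10), (1, 10), (1, 20), (1, None), (2, 5), (2, 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_marks (student_id INTEGER, marks INTEGER)")
conn.executemany("INSERT INTO student_marks VALUES (?, ?)", rows)

# DENSE_RANK trick with the extra term: subtract 1 more for any
# partition that contains at least one NULL.
null_safe_sql = """
SELECT sm.student_id, sm.marks,
       DENSE_RANK() OVER (PARTITION BY sm.student_id ORDER BY sm.marks DESC)
     + DENSE_RANK() OVER (PARTITION BY sm.student_id ORDER BY sm.marks ASC)
     - 1
     - MAX(CASE WHEN sm.marks IS NULL THEN 1 ELSE 0 END)
           OVER (PARTITION BY sm.student_id) AS d_marks
FROM student_marks AS sm
"""

# Reference values: COUNT(DISTINCT ...) skips NULLs on its own.
distinct_sql = """
SELECT student_id, COUNT(DISTINCT marks)
FROM student_marks
GROUP BY student_id
"""

# Every row of a student carries the same d_marks, so a dict is enough.
adjusted = {sid: d for sid, _marks, d in conn.execute(null_safe_sql)}
expected = dict(conn.execute(distinct_sql).fetchall())
print(adjusted, expected)
```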
Input data:
npost_id   mid   like_count
7          t4    3
21         t11   2
30         t16   2
31         t16   2
32         t18   2
I want the post_id that received the most likes for each person (mid).
I need to pick only one row per mid, satisfying several conditions: the maximum like_count, one row per mid (mid can be duplicated in the table), identified by npost_id (the primary key).
Here's what I've tried:
SELECT npost_id, mid, like_count
FROM feed
WHERE (mid, like_count) IN (SELECT mid, MAX(like_count)
FROM feed
GROUP BY mid)
I can't think of anything other than that query.
In MySQL 8.0, one option to retrieve only one row for each combination of <mid, like_count> is the ROW_NUMBER window function, which numbers the rows within each <mid, like_count> combination (a partition). To keep only one row per partition, it's sufficient to filter out the rows numbered higher than 1 (the rows that repeat a <mid, like_count> combination).
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY mid, like_count ORDER BY npost_id) AS rn
FROM tab
)
SELECT npost_id, mid, like_count
FROM cte
WHERE rn = 1
In MySQL 5.7, you can instead group by each distinct combination of <mid, like_count> and take the smallest npost_id in each group (given that you are willing to accept any npost_id value for the partition).
SELECT MIN(npost_id) AS npost_id,
mid,
like_count
FROM tab
GROUP BY mid, like_count
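Both queries above deduplicate on <mid, like_count>. If a single mid can have posts with different like_count values and only its most-liked post is wanted, one possible variant is to partition by mid alone and order by like_count descending. The sketch below is an assumption about the intent, not the answer's own query: it runs on Python's sqlite3 module (SQLite 3.25+) rather than MySQL, uses the question's table name feed, and adds one hypothetical row (33, t18, 5) so that a mid has two different counts.

```python
import sqlite3

feed_rows = [
    (7, "t4", 3),
    (21, "t11", 2),
    (30, "t16", 2),
    (31, "t16", 2),
    (32, "t18", 2),
    (33, "t18", 5),  # hypothetical extra row, not in the question's data
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE feed (npost_id INTEGER PRIMARY KEY, mid TEXT, like_count INTEGER)"
)
conn.executemany("INSERT INTO feed VALUES (?, ?, ?)", feed_rows)

# Number the posts of each mid from most-liked to least-liked;
# npost_id breaks ties so exactly one row per mid gets rn = 1.
top_post_sql = """
WITH ranked AS (
    SELECT npost_id, mid, like_count,
           ROW_NUMBER() OVER (
               PARTITION BY mid
               ORDER BY like_count DESC, npost_id
           ) AS rn
    FROM feed
)
SELECT npost_id, mid, like_count
FROM ranked
WHERE rn = 1
ORDER BY npost_id
"""

top_posts = conn.execute(top_post_sql).fetchall()
print(top_posts)
```

The tie-breaker on npost_id is a design choice: without it, which of two equally liked posts wins would be left to the engine.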
I have been practising SQL and came across behaviour I couldn't explain. (I am also the one who asked this question: Over() function does not cover all rows in the table. That is a different problem.)
Suppose I have a table like this:
MovieRating table:
movie_id   user_id   rating   created_at
1          1         3        2020-01-12
1          2         4        2020-02-11
1          3         2        2020-02-12
1          4         1        2020-01-01
2          1         5        2020-02-17
2          2         2        2020-02-01
2          3         2        2020-03-01
3          1         3        2020-02-22
3          2         4        2020-02-25
What I am trying to do is rank the movies by rating, for which I have this SQL query:
SELECT
movie_id,
rank() over(partition by movie_id order by avg(rating) desc) as rank_rate
FROM
MovieRating
From my previous question, I learnt that the over() function operates on the window selected by the query, basically the rows this query returns:
SELECT movie_id FROM MovieRating
So I would expect to see at least 3 rows here, for id 1, 2 and 3.
The result is however just one row:
{"headers": ["movie_id", "rank_rate"], "values": [[1, 1]]}
Why is that? Is something wrong with my understanding of how the over() function works?
You need an aggregation query, with the RANK() window function applied to its results:
SELECT movie_id,
AVG(rating) AS average_rating, -- you may remove this line if you don't actually need the average rating
RANK() OVER (ORDER BY AVG(rating) DESC) AS rank_rate
FROM MovieRating
GROUP BY movie_id
ORDER BY rank_rate;
Your query is an aggregation query without a GROUP BY clause, which means that it operates on the whole table and not on each movie_id. Such queries return only 1 row with the result of the aggregation.
When you apply the RANK() window function, it operates on that single row and not on the table.
I think you mean to get one row for each movie, with its average rating.
You should use GROUP BY, not a window function:
SELECT movie_id, AVG(rating) AS avg_rating
FROM MovieRating
GROUP BY movie_id
ORDER BY avg_rating DESC;
https://www.db-fiddle.com/f/o9qLFbJEwhaHDWoTS9Qfwp/1
The reason you only got one row is that when you use an aggregate function like AVG(), that implicitly makes the query into an aggregating query. The result of the query is one row per group.
https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html says:
If you use an aggregate function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
In other words, the whole table is considered one "group" if you use AVG() but don't specify a GROUP BY expression. Because the whole table is a single group, the result is one row.
Windows defined by windowing functions are not the same as groups defined by aggregate functions. The window functions are applied after the rows have been reduced by aggregation. Since there was only one group and therefore one row in your result, the rank was 1.
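The difference can be seen on the question's sample data. The sketch below uses Python's sqlite3 module as a stand-in for MySQL (an assumption, SQLite 3.25+). To stay portable it ranks the per-movie averages from a derived table instead of putting AVG() directly inside OVER(), which is a small rewrite of the query shown above.

```python
import sqlite3

# Sample rows from the question's MovieRating table.
ratings = [
    (1, 1, 3, "2020-01-12"), (1, 2, 4, "2020-02-11"),
    (1, 3, 2, "2020-02-12"), (1, 4, 1, "2020-01-01"),
    (2, 1, 5, "2020-02-17"), (2, 2, 2, "2020-02-01"),
    (2, 3, 2, "2020-03-01"),
    (3, 1, 3, "2020-02-22"), (3, 2, 4, "2020-02-25"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MovieRating "
    "(movie_id INTEGER, user_id INTEGER, rating INTEGER, created_at TEXT)"
)
conn.executemany("INSERT INTO MovieRating VALUES (?, ?, ?, ?)", ratings)

# An aggregate with no GROUP BY treats the whole table as one group,
# so the result is a single row.
whole_table = conn.execute(
    "SELECT COUNT(*), AVG(rating) FROM MovieRating"
).fetchall()

# Aggregate per movie first, then rank the aggregated rows.
ranked_sql = """
SELECT movie_id, avg_rating,
       RANK() OVER (ORDER BY avg_rating DESC) AS rank_rate
FROM (
    SELECT movie_id, AVG(rating) AS avg_rating
    FROM MovieRating
    GROUP BY movie_id
) AS per_movie
ORDER BY rank_rate
"""
ranked = conn.execute(ranked_sql).fetchall()
print(len(whole_table), ranked)
```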
I need each GROUP_CONCAT result to contain exactly 4 values, no more and no less. If a group has fewer than 4, the remaining slots should be filled with false; if it has more, the next 4 should go into another row, and so on until no values remain.
I have this:
cd_table   GROUP_CONCAT
1          A04,A01,A00
2          A01
I need this:
cd_table   GROUP_CONCAT
1          A04,A01,A00,false
2          A01,false,false,false
3          A04,A01,A00,A03
3          A02,false,false,false
Fiddle: http://sqlfiddle.com/#!9/51b601/3/0
There are several components to this:
Limiting the ids to four per row.
Adding additional rows.
Adding the falses.
This can be incorporated into a query:
SELECT cd_guime,
SUBSTRING_INDEX(CONCAT(GROUP_CONCAT(ds_tbcid), ',false,false,false'), ',', 4)
FROM (SELECT guime.cd_guime, ds_tbcid,
ROW_NUMBER() OVER (PARTITION BY guime.cd_guime ORDER BY ds_tbcid) as seqnum
FROM guime JOIN
gucid
ON gucid.cd_guime = guime.cd_guime JOIN
tbcid
ON gucid.cd_tbcid = tbcid.cd_tbcid
) x
GROUP BY cd_guime, CEILING(seqnum / 4.0);
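The three components can be tried out in a simplified form. This is only a sketch under several assumptions: it uses Python's sqlite3 module (SQLite 3.25+) instead of MySQL, a single hypothetical table items in place of the three joined tables, integer division (seqnum - 1) / 4 in place of CEILING(seqnum / 4.0), and SUBSTR on a padding string in place of SUBSTRING_INDEX, which SQLite does not have.

```python
import sqlite3

# Hypothetical flattened data (cd_guime, ds_tbcid) standing in for the join.
rows = [
    (1, "A04"), (1, "A01"), (1, "A00"),
    (2, "A01"),
    (3, "A04"), (3, "A01"), (3, "A00"), (3, "A03"), (3, "A02"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (cd_guime INTEGER, ds_tbcid TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

# 1. Number the values within each cd_guime.
# 2. Group every 4 consecutive numbers into one output row.
# 3. Pad short groups: ',false' is 6 characters, so take 6 characters
#    of the padding string for each missing value.
chunk_sql = """
SELECT cd_guime,
       GROUP_CONCAT(ds_tbcid)
       || SUBSTR(',false,false,false', 1, 6 * (4 - COUNT(*))) AS padded
FROM (
    SELECT cd_guime, ds_tbcid,
           ROW_NUMBER() OVER (PARTITION BY cd_guime ORDER BY ds_tbcid) AS seqnum
    FROM items
) AS x
GROUP BY cd_guime, (seqnum - 1) / 4
ORDER BY cd_guime, MIN(seqnum)
"""

chunks = conn.execute(chunk_sql).fetchall()
for cd_guime, padded in chunks:
    print(cd_guime, padded)
```

Because the rows are numbered by ds_tbcid here, the values are split into chunks in alphabetical order, and the order of values inside one concatenated string is not guaranteed by the engine.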
Problem Statement: I need my result set to include records that would not naturally return because they are NULL.
I'm going to put some simplified code here since my code seems to be too long.
Table Scores has Company_type, Company, Score, Project_ID
Select Score, Count(Project_ID)
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
Results in the following:
Score Projects
5 95
4 94
3 215
2 51
1 155
Everything is working fine until I apply a condition to company_type that does not include results in one of the 5 score categories. When this happens, I don't have 5 rows in my result set any more.
It displays like this:
Score Projects
5 5
3 6
1 3
I'd like it to display like this:
Score Projects
5 5
4 0
3 6
2 0
1 3
I need the results to always display 5 rows. (Scores = 1-5)
I tried one of the approaches below by Spencer7593. My simplified query now looks like this:
SELECT i.score AS Score, IFNULL(count(*), 0) AS Projects
FROM (SELECT 5 AS score
UNION ALL
SELECT 4
UNION ALL
SELECT 3
UNION ALL
SELECT 2
UNION ALL
SELECT 1) i
LEFT JOIN Scores ON Scores.score = i.score
GROUP BY Score
ORDER BY i.score DESC
This gives the following results, which are accurate except that the rows showing 1 in Projects should actually show 0, because those rows come only from "i". There are no projects with a score of 5 or 2.
Score Projects
5 1
4 5
3 6
2 1
1 3
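Solved! I just needed to adjust my count to look specifically at the project column, COUNT(Project_ID) rather than COUNT(*). COUNT(*) counts the row produced by the outer join even when no project matched, whereas COUNT(Project_ID) skips the NULL in that row. This returned the expected results.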
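The difference between the two counts is easy to reproduce. The following is a small sketch with made-up rows, run through Python's sqlite3 module rather than MySQL (an assumption); projects_per_score is a hypothetical helper that only swaps the counting expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores "
    "(company_type TEXT, company TEXT, score INTEGER, project_id INTEGER)"
)
# Hypothetical rows: no project has a score of 5 or 2.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?, ?)",
    [("A", "c1", 4, 101), ("A", "c1", 4, 102),
     ("A", "c2", 3, 103), ("A", "c2", 1, 104)],
)

def projects_per_score(count_expr):
    """Run the outer-join query with the given counting expression.

    count_expr is a trusted literal chosen below, not user input.
    """
    sql = f"""
    SELECT i.score AS Score, {count_expr} AS Projects
    FROM (SELECT 5 AS score UNION ALL SELECT 4 UNION ALL SELECT 3
          UNION ALL SELECT 2 UNION ALL SELECT 1) AS i
    LEFT JOIN Scores ON Scores.score = i.score
    GROUP BY i.score
    ORDER BY i.score DESC
    """
    return conn.execute(sql).fetchall()

# COUNT(*) counts the NULL-extended row that the outer join produces.
with_star = projects_per_score("COUNT(*)")
# COUNT(column) skips rows where that column is NULL.
with_column = projects_per_score("COUNT(Scores.project_id)")
print(with_star)
print(with_column)
```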
If you always want your query to return 5 rows, with Score values of 5,4,3,2,1... you'll need a rowsource that supplies those Score values.
One approach would be to use a simple query to return those fixed values, e.g.
SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
Then use that query as an inline view, and do an outer join to the results from your current query:
SELECT i.score AS `Score`
, IFNULL(q.projects,0) AS `Projects`
FROM ( SELECT 5 AS score
UNION ALL SELECT 4
UNION ALL SELECT 3
UNION ALL SELECT 2
UNION ALL SELECT 1
) i
LEFT
JOIN (
-- the current query with "missing" Score rows goes here
-- for completeness of this example, without the query
-- we emulate that result with a different query
SELECT 5 AS score, 95 AS projects
UNION ALL SELECT 3, 215
UNION ALL SELECT 1, 155
) q
ON q.score = i.score
ORDER BY i.score DESC
The rowsource doesn't have to be the inline view used in this example, but there does need to be some rowsource that the rows can be returned from. You could, for example, have a simple table that contains those five rows, with those five score values.
This is just one example of the general approach. It might be possible to modify your existing query to return the rows you want, but without seeing the query, the schema, and example data, we can't tell.
FOLLOWUP
Based on the edit to the question, showing an example of the current query.
If we are guaranteed that the five values of Score will always appear in the Scores table, we could do conditional aggregation, writing a query like this:
SELECT s.score
, COUNT(IF(s.company_type = :company_type,s.project_id,NULL)) AS projects
FROM Scores s
GROUP BY s.score
ORDER BY s.score DESC
Note that this will require a scan of all the rows, so it may not perform as well. (The "trick" is the IF function, which returns a NULL value in place of project_id when the row would have been excluded by the WHERE clause.)
If we are guaranteed that project_id is non-NULL, we could use a more terse MySQL shorthand expression to achieve an equivalent result...
, IFNULL(SUM(s.company_type = :company_type),0) AS projects
This works because MySQL returns 1 when the comparison is TRUE, and otherwise returns 0 or NULL.
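To compare the filtered query with the two conditional-aggregation forms, here is a sketch on hypothetical rows. It runs on Python's sqlite3 module instead of MySQL (an assumption), so it writes the condition as a portable CASE expression where the answer uses IF(); the SUM shorthand works unchanged because SQLite also evaluates a comparison to 1 or 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Scores (company_type TEXT, score INTEGER, project_id INTEGER)"
)
# Hypothetical rows: every score 1-5 appears, but type 'A' lacks 4 and 2.
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?, ?)",
    [("A", 5, 1), ("A", 3, 2), ("A", 3, 3), ("B", 4, 4),
     ("B", 2, 5), ("B", 1, 6), ("A", 1, 7)],
)
params = {"company_type": "A"}

# Filtering in WHERE removes the rows before grouping,
# so scores 4 and 2 disappear from the result.
filtered = conn.execute(
    """SELECT score, COUNT(project_id) FROM Scores
       WHERE company_type = :company_type
       GROUP BY score ORDER BY score DESC""",
    params,
).fetchall()

# Conditional aggregation keeps every group and counts only matches.
conditional = conn.execute(
    """SELECT s.score,
              COUNT(CASE WHEN s.company_type = :company_type
                         THEN s.project_id END) AS projects
       FROM Scores s
       GROUP BY s.score ORDER BY s.score DESC""",
    params,
).fetchall()

# Shorthand: sum the 1/0 result of the comparison.
shorthand = conn.execute(
    """SELECT s.score,
              IFNULL(SUM(s.company_type = :company_type), 0) AS projects
       FROM Scores s
       GROUP BY s.score ORDER BY s.score DESC""",
    params,
).fetchall()
print(filtered, conditional, shorthand)
```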
Try something like this:
select s.score, coalesce(x.cnt, 0) as projects
from (
select distinct score from Scores
) s
left outer join (
-- per-score counts for the requested company type only
Select Score, Count(Project_ID) cnt
FROM Scores
WHERE company_type= :company_type
GROUP BY Score
) x
on s.score = x.score
order by s.score desc
Your posted query would not work without a GROUP BY clause. However, even with one, if there are no rows with a particular score for that company type, that score would not appear in the results.
One option is to use an outer join. That would require a little more work though.
Here's another option using conditional aggregation:
select Score, sum(company_type=:company_type)
from Scores
group by Score
I have a table with these columns:
id
user_id
player_in
player_out
date
I need to make a report that counts how many times each "player" appears, both in the player_in field and in the player_out field.
For example, suppose I have these 2 rows in the table (in this order):
id user_id player_in player_out
1 1 88 56
2 7 77 88
The result for the player 88 will be 2, and for the players 56 and 77, just 1
Use a subquery that employs union all to get the two columns into one column, then use a standard count(*):
Note: This query includes individual totals for ins and outs, as per a further request in the comments to this answer.
select
player_id,
count(*) as total,
sum(ins) as ins,
sum(outs) as outs
from (
select
player_in as player_id,
1 as ins,
0 as outs
from mytable
union all
select player_out, 0, 1
from mytable
) x
group by player_id
Note: you must use union all (not just union), because union removes duplicates whereas union all does not.
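The effect of the two operators can be compared directly. The sketch below runs the query through Python's sqlite3 module (an assumption, standing in for MySQL) on the question's two rows plus one hypothetical third row that repeats a pairing, since duplicates are what make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(id INTEGER, user_id INTEGER, player_in INTEGER, player_out INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, 1, 88, 56),
        (2, 7, 77, 88),
        (3, 2, 88, 56),  # hypothetical row, repeats 88 in / 56 out
    ],
)

def player_totals(set_operator):
    """Count appearances per player; set_operator is 'UNION ALL' or 'UNION'."""
    sql = f"""
    SELECT player_id,
           COUNT(*) AS total,
           SUM(ins) AS ins,
           SUM(outs) AS outs
    FROM (
        SELECT player_in AS player_id, 1 AS ins, 0 AS outs FROM mytable
        {set_operator}
        SELECT player_out, 0, 1 FROM mytable
    ) AS x
    GROUP BY player_id
    ORDER BY player_id
    """
    return conn.execute(sql).fetchall()

# UNION ALL keeps every row, so repeated appearances are all counted.
correct = player_totals("UNION ALL")
# UNION collapses identical (player, ins, outs) rows and undercounts.
undercounted = player_totals("UNION")
print(correct)
print(undercounted)
```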
You could use a cross-join to a 2-row virtual table to unpivot the player_* columns, then group the results, like this:
SELECT
player,
COUNT(*) AS total_count
FROM (
SELECT
CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
FROM mytable t
CROSS JOIN (SELECT TRUE AS is_in UNION ALL SELECT FALSE) x
) s
GROUP BY
player
;
That is, every row of the original table is essentially duplicated, and each copy of the row supplies either player_in or player_out, depending on whether the derived table's is_in column is TRUE or FALSE, to form a single player column. This method of unpivoting might perform better than the UNION ALL method suggested by @Bohemian, because this way the (physical) table is scanned just once. You would need to test and compare both methods, though, to determine whether there is any substantial benefit to this approach in your particular situation.
To calculate in and out counts, as you have requested in one of your comments to the above mentioned answer, you could extend my original suggestion like this:
SELECT
player,
COUNT( is_in OR NULL) AS in_count,
COUNT(NOT is_in OR NULL) AS out_count,
COUNT(*) AS total_count
FROM (
SELECT
x.is_in,
CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
FROM mytable t
CROSS JOIN (SELECT TRUE AS is_in UNION ALL SELECT FALSE) x
) s
GROUP BY
player
;
As you can see, the derived table now additionally returns the is_in column in its own right, and the column is used in two conditional aggregations for counting how many times a player was in and out. (The OR NULL trick works because TRUE OR NULL evaluates to TRUE while FALSE OR NULL evaluates to NULL, and COUNT ignores NULLs.)
You could also rewrite the COUNT(condition OR NULL) entries as SUM(condition). That would certainly shorten both expressions, and some also find the SUM method of counting clearer or more elegant. In either event, there would likely be no difference in performance, so choose whichever method suits your taste better.
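To see that COUNT(condition OR NULL) and SUM(condition) agree, the sketch below computes both side by side on the question's two sample rows. It runs on Python's sqlite3 module as a stand-in for MySQL (an assumption) and writes the flags as 1 and 0 instead of TRUE and FALSE for compatibility with older SQLite builds; the in_sum and out_sum column names are made up for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable "
    "(id INTEGER, user_id INTEGER, player_in INTEGER, player_out INTEGER)"
)
# The two sample rows from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 1, 88, 56), (2, 7, 77, 88)],
)

# Each table row is paired with is_in = 1 and is_in = 0, which unpivots
# player_in and player_out into a single player column.
unpivot_sql = """
SELECT player,
       COUNT(is_in OR NULL)     AS in_count,
       COUNT(NOT is_in OR NULL) AS out_count,
       SUM(is_in)               AS in_sum,
       SUM(NOT is_in)           AS out_sum,
       COUNT(*)                 AS total_count
FROM (
    SELECT x.is_in,
           CASE WHEN x.is_in THEN t.player_in ELSE t.player_out END AS player
    FROM mytable t
    CROSS JOIN (SELECT 1 AS is_in UNION ALL SELECT 0) x
) s
GROUP BY player
ORDER BY player
"""

counts = conn.execute(unpivot_sql).fetchall()
for row in counts:
    print(row)
```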