I've been working on a HackerRank SQL question, "Challenges".
This is the problem:
Julia asked her students to create some coding challenges. Write a query to print the hacker_id, name, and the total number of challenges created by each student. Sort your results by the total number of challenges in descending order. If more than one student created the same number of challenges, then sort the result by hacker_id. If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result.
This question stumped me, and while searching online to see how other people have solved it, I came across this GitHub repository, whose author proposed a really elegant solution:
SELECT c.hacker_id,
h.name,
COUNT(c.challenge_id) AS cnt
FROM Hackers AS h
JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY c.hacker_id, h.name
HAVING cnt = (SELECT COUNT(c1.challenge_id)
FROM Challenges AS c1
GROUP BY c1.hacker_id
ORDER BY COUNT(*) DESC
LIMIT 1)
OR
cnt NOT IN (SELECT COUNT(c2.challenge_id)
FROM Challenges AS c2
GROUP BY c2.hacker_id
HAVING c2.hacker_id <> c.hacker_id)
ORDER BY cnt DESC, c.hacker_id;
I understand how the poster ensured, with the first condition of the HAVING clause, that we only get the maximum number of challenges, but I don't understand why the second condition works: HAVING c2.hacker_id <> c.hacker_id
Won't c2.hacker_id always equal c.hacker_id?
If you made them equal (c2.hacker_id = c.hacker_id), the subquery would only return the count of the same hacker. With the inequality, the subquery is correlated with the outer query: for each outer row, it returns the challenge counts of every hacker whose hacker_id differs from the outer hacker_id. NOT IN then excludes the outer row if its count appears in that list, i.e. if some other hacker created the same number of challenges.
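To see the correlated subquery at work, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's built-in sqlite3 module. SQLite stands in for MySQL here, the sample rows are invented for illustration, and the HAVING clause repeats COUNT(c.challenge_id) instead of the cnt alias so it does not depend on alias support in HAVING.

```python
import sqlite3

# Hypothetical sample data: hackers 1 and 5 tie at the maximum (3),
# hackers 2 and 3 tie below it (2), hacker 4 has a unique count (1).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hackers (hacker_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Challenges (challenge_id INTEGER PRIMARY KEY, hacker_id INTEGER);
INSERT INTO Hackers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee'), (5, 'Eve');
INSERT INTO Challenges (hacker_id) VALUES
  (1), (1), (1), (2), (2), (3), (3), (4), (5), (5), (5);
""")

query = """
SELECT c.hacker_id, h.name, COUNT(c.challenge_id) AS cnt
FROM Hackers AS h
JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY c.hacker_id, h.name
HAVING COUNT(c.challenge_id) = (SELECT COUNT(c1.challenge_id)
                                FROM Challenges AS c1
                                GROUP BY c1.hacker_id
                                ORDER BY COUNT(*) DESC
                                LIMIT 1)
    OR COUNT(c.challenge_id) NOT IN (SELECT COUNT(c2.challenge_id)
                                     FROM Challenges AS c2
                                     GROUP BY c2.hacker_id
                                     -- correlated: counts of every OTHER hacker
                                     HAVING c2.hacker_id <> c.hacker_id)
ORDER BY cnt DESC, c.hacker_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Ann', 3), (5, 'Eve', 3), (4, 'Dee', 1)]
```

Hackers 2 and 3 disappear because, for each of them, the other one's count of 2 shows up in the NOT IN list.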
select h.hacker_id, name, count(challenge_id) as total from Challenges c inner join Hackers h on h.hacker_id=c.hacker_id group by h.hacker_id, name
having total not in
(select count(challenge_id) as cnt from Challenges c where c.hacker_id!=h.hacker_id
group by c.hacker_id
having cnt!= (select max(count(challenge_id)) from Challenges group by hacker_id))
order by total desc, h.hacker_id
Above is my MySQL code, and I get this error: ERROR 1111 (HY000) at line 1: Invalid use of group function.
I don't know what is wrong with this line: (select max(count(challenge_id)) from Challenges group by hacker_id)
How can I fix this error?
The link of the problem that I want to solve: https://www.hackerrank.com/challenges/challenges/problem
You can't nest one aggregate function inside another, as in MAX(COUNT(...)), within a single GROUP BY.
Instead of
select max(count(challenge_id)) from Challenges group by hacker_id)
you can do
select max(cnt_challenge) from (select count(challenge_id) as cnt_challenge from Challenges group by hacker_id) as t)
The problem is the max(count()). However, I would solve it using LIMIT rather than an additional subquery:
having cnt <> (select count(*)
from challenges
group by hacker_id
order by count(*) desc
limit 1
)
That said, this query is probably better written using window functions. However, without sample data, desired results, and a clear explanation of what the query should be doing, it is hard to make concrete suggestions.
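A minimal sketch of both fixes, again using Python's sqlite3 with made-up rows. SQLite also rejects nested aggregates, though with its own error message rather than MySQL's ERROR 1111. The aliased derived table and the ORDER BY ... LIMIT 1 alternative return the same maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Challenges (challenge_id INTEGER PRIMARY KEY, hacker_id INTEGER);
INSERT INTO Challenges (hacker_id) VALUES (1), (1), (1), (2), (2), (3);
""")

# Nesting one aggregate inside another is rejected.
try:
    conn.execute("SELECT MAX(COUNT(challenge_id)) FROM Challenges GROUP BY hacker_id")
    nested_failed = False
except sqlite3.OperationalError:
    nested_failed = True

# Fix 1: aggregate the per-hacker counts in a derived table (it needs an alias).
max_via_derived = conn.execute("""
    SELECT MAX(cnt_challenge)
    FROM (SELECT COUNT(challenge_id) AS cnt_challenge
          FROM Challenges GROUP BY hacker_id) AS t
""").fetchone()[0]

# Fix 2: sort the per-hacker counts and keep only the first one.
max_via_limit = conn.execute("""
    SELECT COUNT(*) FROM Challenges
    GROUP BY hacker_id ORDER BY COUNT(*) DESC LIMIT 1
""").fetchone()[0]

print(nested_failed, max_via_derived, max_via_limit)  # True 3 3
```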
I have a MySQL table "lessons" from which I want to output a course list for each teacher (see picture).
So far I have tried to use GROUP BY and GROUP_CONCAT for the output, but without success, and I'm out of ideas. Can anyone help me?
If I understand correctly, lessons that are at the same time (based on the last two columns) and for the same teacher and have the same name should be combined as a "course".
If this is the case, you can use two levels of aggregation:
-- level 2: merge classes with the same teacher, name and schedule into one course
select group_concat(ids) as ids,
group_concat(class_id) as class_ids,
teacher_id, name, weekday_hours
-- level 1: one row per class, teacher and lesson name, with its schedule
from (select l.class_id, l.teacher_id, l.name,
group_concat(l.weekday, ':', l.hour order by l.weekday, l.hour) as weekday_hours,
group_concat(l.id order by l.id) as ids
from lessons l
group by l.class_id, l.teacher_id, l.name
) l
group by teacher_id, name, weekday_hours;
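The two levels of aggregation can also be sketched in plain Python, which may make the idea easier to follow: first collapse each class's lessons into a schedule signature, then group classes that share teacher, name and schedule into one course. The column names follow the query; the sample lessons are invented, since the original table was only shown as a picture.

```python
from collections import defaultdict

# Hypothetical rows: (id, class_id, teacher_id, name, weekday, hour)
lessons = [
    (1, "5a", 7, "Math", 1, 2),
    (2, "5b", 7, "Math", 1, 2),
    (3, "5a", 7, "Math", 3, 4),
    (4, "5b", 7, "Math", 3, 4),
    (5, "6a", 7, "Math", 2, 1),
]

# Level 1: one entry per (class, teacher, name), like the inner GROUP BY.
per_class = defaultdict(list)
for lesson_id, class_id, teacher_id, name, weekday, hour in lessons:
    per_class[(class_id, teacher_id, name)].append((weekday, hour, lesson_id))

# Level 2: classes with identical teacher, name and schedule form one course.
courses = defaultdict(lambda: {"ids": [], "class_ids": []})
for (class_id, teacher_id, name), slots in per_class.items():
    slots.sort()  # mirrors ORDER BY weekday, hour
    weekday_hours = ",".join(f"{d}:{h}" for d, h, _ in slots)
    course = courses[(teacher_id, name, weekday_hours)]
    course["ids"].extend(sorted(i for _, _, i in slots))
    course["class_ids"].append(class_id)

for key, course in courses.items():
    print(key, course)
```

Classes 5a and 5b share the schedule "1:2,3:4" and end up in one course, while 6a forms its own.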
I'm using MySQL and am currently stuck trying to understand how a piece of code works. This is regarding the HackerRank SQL question titled "Challenges". The problem statement is as follows:
Julia asked her students to create some coding challenges. Write a query to print the hacker_id, name, and the total number of challenges created by each student. Sort your results by the total number of challenges in descending order. If more than one student created the same number of challenges, then sort the result by hacker_id. If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result.
I found a working MySQL solution on this page that uses the following code:
SELECT c.hacker_id, h.name, COUNT(c.challenge_id) AS cnt
FROM Hackers AS h JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY c.hacker_id, h.name HAVING
cnt = (SELECT COUNT(c1.challenge_id) FROM Challenges AS c1 GROUP BY c1.hacker_id ORDER BY COUNT(*) DESC LIMIT 1) OR
cnt NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id)
ORDER BY cnt DESC, c.hacker_id;
As of now I understand the problem statement up until "If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result." I simply have no clue how to structure a query to solve that statement.
In the code provided above, I understand everything it does until this section
cnt NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id)
Can anyone please help me understand what this line accomplishes and the logic behind it? Specifically, I don't know what c2.hacker_id <> c.hacker_id is supposed to do. I'm guessing the whole line selects the number of challenges created by hackers other than the current one, but I have no clue how that solves the problem.
Suppose you get the following list of hacker ids and challenge counts out of the query when you don't have this clause:
hacker, counter
1, 10
2, 9
3, 9
Hackers 2 and 3 shouldn't be in the result because they're tied on a count below the maximum, so we can implement the rule as excluding anyone whose count is 9.
Conceptually, the database runs the subquery once for every row in the results: when processing hacker 2's row, the subquery gets the list of challenge counts for everyone whose id isn't 2. This means that for hacker 2, the database pulls back the following counts:
10, --it comes from hacker 1
9 --it comes from hacker 3
The database then goes: "I'm processing hacker 2, whose count is 9. I may only include hacker 2 in the results if hacker 2's count (9) is not in the following list of values: 10, 9. Oh, 9 is in the list of banned values, so I'll exclude hacker 2 from the results."
Repeat for hacker 3: this time the 9 comes from hacker 2, so hacker 3 is also excluded. For hacker 1 the list is 9, 9, which doesn't contain 10, so hacker 1 is kept.
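The per-row reasoning above can be mimicked in a few lines of Python. This is only a model of what the correlated NOT IN subquery computes, not how a database actually executes it; the counts are the example's hypothetical 10/9/9.

```python
# Challenge counts per hacker, as in the example above.
counts = {1: 10, 2: 9, 3: 9}
max_count = max(counts.values())

kept = []
for hacker, cnt in counts.items():
    # Correlated subquery: counts of every hacker except the current one.
    others = [c for h, c in counts.items() if h != hacker]
    # HAVING cnt = max OR cnt NOT IN (others)
    if cnt == max_count or cnt not in others:
        kept.append(hacker)

print(kept)  # [1]
```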
Analytic (window) functions help greatly with a question like this, so I will offer a solution for MySQL 8+, which is the database a reader of your question will most likely be using going forward (and HackerRank will at some point also be using MySQL 8+).
WITH cte AS (
SELECT
c.hacker_id,
h.name,
COUNT(c.challenge_id) AS cnt,
RANK() OVER (ORDER BY COUNT(c.challenge_id) DESC) rn,
MIN(c.hacker_id) OVER (PARTITION BY COUNT(c.challenge_id)) hacker_id_min,
MAX(c.hacker_id) OVER (PARTITION BY COUNT(c.challenge_id)) hacker_id_max
FROM Hackers AS h
INNER JOIN Challenges AS c
ON h.hacker_id = c.hacker_id
GROUP BY
c.hacker_id,
h.name
)
SELECT
hacker_id,
name,
cnt
FROM cte
WHERE
rn = 1 OR hacker_id_min = hacker_id_max
ORDER BY
cnt DESC,
hacker_id;
This answer works by ranking the rows in descending order of the count; RANK() assigns rn = 1 to every hacker tied for the highest count. It also computes the min and max hacker_id values for each partition of challenge counts. Records are retained if they belong to the highest count, regardless of ties for first place, and also if the given count is associated with only a single user (the min and max hacker_id of that partition are then equal).
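A variation on the same idea, sketched with Python's sqlite3 (SQLite supports window functions since version 3.25) and invented sample rows: COUNT(*) OVER (PARTITION BY cnt) counts how many hackers share each count, replacing the min/max hacker_id comparison, and MAX(cnt) OVER () supplies the overall maximum. Computing the counts in a first CTE keeps the window functions separate from the aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Hackers (hacker_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Challenges (challenge_id INTEGER PRIMARY KEY, hacker_id INTEGER);
INSERT INTO Hackers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee'), (5, 'Eve');
INSERT INTO Challenges (hacker_id) VALUES
  (1), (1), (1), (2), (2), (3), (3), (4), (5), (5), (5);
""")

query = """
WITH counts AS (
    SELECT c.hacker_id, h.name, COUNT(c.challenge_id) AS cnt
    FROM Hackers AS h
    JOIN Challenges AS c ON h.hacker_id = c.hacker_id
    GROUP BY c.hacker_id, h.name
), flagged AS (
    SELECT hacker_id, name, cnt,
           MAX(cnt) OVER () AS max_cnt,                          -- overall maximum
           COUNT(*) OVER (PARTITION BY cnt) AS hackers_with_cnt  -- ties per count
    FROM counts
)
SELECT hacker_id, name, cnt
FROM flagged
WHERE cnt = max_cnt OR hackers_with_cnt = 1
ORDER BY cnt DESC, hacker_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Ann', 3), (5, 'Eve', 3), (4, 'Dee', 1)]
```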
I have 3 tables:
Courses: courses_id, name
QnAs: qna_id, student_id, courses_id, name, question
Students: student_id, name
Now I'm trying to count how many QnAs there are for each course. How do I write the query?
I've tried this:
SELECT (SELECT COUNT(qna_id) AS Expr1
FROM QnAs) AS Count
FROM QnAs AS QnAs_1 CROSS JOIN
Courses
GROUP BY Courses.courses_id
It does count how many QnAs there are, but not for each course.
The output I get is every course with the overall QnA count, but what I want is the number of QnAs for each course.
It seems you merely want to aggregate QNAs by course ID:
select courses_id, count(*)
from qnas
group by courses_id
order by courses_id;
Along with the course names:
select c.courses_id, c.name, coalesce(q.cnt, 0) as qna_count
from courses c
left join
(
select courses_id, count(*) as cnt
from qnas
group by courses_id
) q on q.courses_id = c.courses_id
order by c.courses_id;
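A quick check of the LEFT JOIN version with Python's sqlite3 and made-up rows. The COALESCE matters for courses that have no QnAs yet: they would otherwise show NULL instead of 0, and they are missing entirely from the plain GROUP BY on qnas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE courses (courses_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE qnas (qna_id INTEGER PRIMARY KEY, student_id INTEGER,
                   courses_id INTEGER, name TEXT, question TEXT);
INSERT INTO courses VALUES (1, 'SQL'), (2, 'Python'), (3, 'Rust');
INSERT INTO qnas (student_id, courses_id, question) VALUES
  (10, 1, 'q1'), (11, 1, 'q2'), (10, 2, 'q3');
""")

rows = conn.execute("""
    SELECT c.courses_id, c.name, COALESCE(q.cnt, 0) AS qna_count
    FROM courses c
    LEFT JOIN (SELECT courses_id, COUNT(*) AS cnt
               FROM qnas
               GROUP BY courses_id) q ON q.courses_id = c.courses_id
    ORDER BY c.courses_id
""").fetchall()
print(rows)  # [(1, 'SQL', 2), (2, 'Python', 1), (3, 'Rust', 0)]
```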
Why not just use GROUP BY?
SELECT q.courses_id, COUNT(qna_id) as cnt
FROM QnAs q
GROUP BY q.courses_id;
This is not an answer, but just an explanation what your query does.
In your own query you first cross join all QnAs with all courses for no apparent reason, thus getting all possible combinations. So with two courses, each with three QNAs (that makes six QNAs in total), you'd construct 2 x 6 = 12 rows.
For each of these rows you select the total number of rows in the QNA table, which is six in above example. So you'd select 12 rows, all showing the number 6.
But then you group by course ID, thus ending up with two rows only in my example. You should apply an aggregate function on your subquery, e.g. MAX or SUM, but you don't, which makes your query invalid (because you are dealing with many rows, but treat this as if it were a single value). MySQL however silently applies ANY_VALUE, so your query becomes:
SELECT
ANY_VALUE( (SELECT COUNT(*) FROM QnAs) ) AS Count
FROM QnAs AS QnAs_1
CROSS JOIN Courses
GROUP BY Courses.courses_id;
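The row arithmetic in this explanation can be reproduced with Python's sqlite3, which, like MySQL without ONLY_FULL_GROUP_BY, accepts the query. The two courses with three QnAs each are invented sample data: the cross join yields 12 rows, and grouping leaves two rows that both still show the table-wide total of 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (courses_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE QnAs (qna_id INTEGER PRIMARY KEY, courses_id INTEGER);
INSERT INTO Courses VALUES (1, 'A'), (2, 'B');
INSERT INTO QnAs (courses_id) VALUES (1), (1), (1), (2), (2), (2);
""")

# Every QnA paired with every course: 6 x 2 rows.
cross_rows = conn.execute(
    "SELECT COUNT(*) FROM QnAs AS QnAs_1 CROSS JOIN Courses").fetchone()[0]

# The original query: the subquery is uncorrelated, so each group shows the total.
grouped = conn.execute("""
    SELECT (SELECT COUNT(qna_id) FROM QnAs) AS Count
    FROM QnAs AS QnAs_1 CROSS JOIN Courses
    GROUP BY Courses.courses_id
""").fetchall()

print(cross_rows, grouped)  # 12 [(6,), (6,)]
```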
I hope this explanation helps you understand how joins and aggregation work. You may want to set ONLY_FULL_GROUP_BY mode (https://dev.mysql.com/doc/...) so that MySQL reports the error instead of silently "fixing" the query by applying ANY_VALUE.
I have read many posts with a solution for this, but none of them works in my case. What am I doing wrong?
This gives me the SUM of scores for every user, which is the first part (aggregated data):
SELECT user_id, sum(score) as total_user_score
FROM (
SELECT comments_proper.user_id, comments_proper.score
FROM assignment_2.comments_proper
) AS rsch
GROUP BY user_id;
However, I want only 2 records which contain the min and the max score values.
What am I doing wrong?
Oh dear, where to begin.
I have read many posts
You should have been paying attention to which ones got up-voted and good answers, and which were down-voted/closed. The former would have included the table structures, examples of input and expected output. And unambiguous questions.
I want only 2 records
Is that from the source data set or from the aggregated data set?
The latter is a slightly tricky problem that has been asked and answered many times here on SO, and there are multiple solutions with different performance characteristics. There's even a chapter in the manual covering just this question. The current content at that link uses subqueries to identify the min/max value; it replaces an earlier version of the documentation that explained the max-concat trick. It's also possible to use variables to identify the right candidate rows in a subquery, or to use sorting.
However, the SQL you've shown us here has very little to do with solving the problem you describe, and is very badly written.
I won't provide examples of every solution, but this will solve your problem...
(SELECT user_id, SUM(score)
FROM assignment_2.comments_proper
GROUP BY user_id
ORDER BY SUM(score) LIMIT 1)
UNION
(SELECT user_id, SUM(score)
FROM assignment_2.comments_proper
GROUP BY user_id
ORDER BY SUM(score) DESC LIMIT 1)
Update:
I hadn't tested the above. I did test this:
SELECT *
FROM (
SELECT user_id, SUM(score)
FROM assignment_2.comments_proper
GROUP BY user_id
ORDER BY SUM(score) LIMIT 0,1
) as lowest
UNION ALL
SELECT *
FROM (
SELECT user_id, SUM(score)
FROM assignment_2.comments_proper
GROUP BY user_id
ORDER BY SUM(score) DESC LIMIT 0,1
) as highest
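A minimal run of the tested query with Python's sqlite3 and a few invented scores. The assignment_2 schema prefix is dropped, LIMIT 1 replaces MySQL's equivalent LIMIT 0,1, and SUM(score) gets an alias for readability. If several users tie on the lowest or highest total, only one of them is returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments_proper (user_id INTEGER, score INTEGER);
INSERT INTO comments_proper VALUES (1, 5), (1, 5), (2, 1), (3, 20), (3, -2);
""")

# Totals: user 1 -> 10, user 2 -> 1, user 3 -> 18.
rows = conn.execute("""
    SELECT * FROM (SELECT user_id, SUM(score) AS total FROM comments_proper
                   GROUP BY user_id ORDER BY SUM(score) LIMIT 1) AS lowest
    UNION ALL
    SELECT * FROM (SELECT user_id, SUM(score) AS total FROM comments_proper
                   GROUP BY user_id ORDER BY SUM(score) DESC LIMIT 1) AS highest
""").fetchall()
print(rows)
```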
Your query has some syntax problems and an overly complex calculation for the aggregated result. Anyway, if you need the min and max rows side by side on the same row, you can use a couple of inner joins against the aggregated result; in the output, cp1.* holds the row with each user's minimum score and cp2.* the row with the maximum:
select cp1.* , cp2.*
from ( SELECT cp.user_id, sum(cp.score), min(cp.score) min_score, max(cp.score) max_score
FROM assignment_2.comments_proper cp
group by cp.user_id ) t
inner join assignment_2.comments_proper cp1 on cp1.user_id = t.user_id
and cp1.score = t.min_score
inner join assignment_2.comments_proper cp2 on cp2.user_id = t.user_id
and cp2.score = t.max_score
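To see what the join version returns, here is a sketch with Python's sqlite3 and invented rows; the schema prefix is dropped and a hypothetical id column is added so the rows are distinguishable. Each user's minimum-score row and maximum-score row end up side by side. If a user has several rows sharing the minimum or maximum score, the joins produce one output row per combination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments_proper (id INTEGER PRIMARY KEY, user_id INTEGER, score INTEGER);
INSERT INTO comments_proper (user_id, score) VALUES (1, 5), (1, 9), (1, 7), (2, 3);
""")

rows = conn.execute("""
    SELECT cp1.*, cp2.*
    FROM (SELECT cp.user_id, SUM(cp.score) AS total,
                 MIN(cp.score) AS min_score, MAX(cp.score) AS max_score
          FROM comments_proper cp
          GROUP BY cp.user_id) t
    INNER JOIN comments_proper cp1 ON cp1.user_id = t.user_id
                                  AND cp1.score = t.min_score
    INNER JOIN comments_proper cp2 ON cp2.user_id = t.user_id
                                  AND cp2.score = t.max_score
    ORDER BY t.user_id
""").fetchall()
print(rows)  # [(1, 1, 5, 2, 1, 9), (4, 2, 3, 4, 2, 3)]
```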
Otherwise, if you want the result in two rows, one for the min and one for the max:
select 'min' , cp1.*
from ( SELECT cp.user_id, sum(cp.score), min(cp.score) min_score, max(cp.score) max_score
FROM assignment_2.comments_proper cp
group by cp.user_id ) t
inner join assignment_2.comments_proper cp1 on cp1.user_id = t.user_id
and cp1.score = t.min_score
union
select 'max' , cp2.*
from ( SELECT cp.user_id, sum(cp.score), min(cp.score) min_score, max(cp.score) max_score
FROM assignment_2.comments_proper cp
group by cp.user_id ) t
inner join assignment_2.comments_proper cp2 on cp2.user_id = t.user_id
and cp2.score = t.max_score