Need help understanding COUNT and <> operators (Hackerrank SQL Challenges question) - mysql

I'm using MySQL and am currently stuck trying to understand how a piece of code works. This is regarding the Hackerrank SQL question titled "Challenges". The problem statement is as follows:
Julia asked her students to create some coding challenges. Write a
query to print the hacker_id, name, and the total number of challenges
created by each student. Sort your results by the total number of
challenges in descending order. If more than one student created the
same number of challenges, then sort the result by hacker_id. If more
than one student created the same number of challenges and the count
is less than the maximum number of challenges created, then exclude
those students from the result.
I found a working MySQL solution, courtesy of this page, that uses the following code:
SELECT c.hacker_id, h.name, COUNT(c.challenge_id) AS cnt
FROM Hackers AS h
JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY c.hacker_id, h.name
HAVING
    cnt = (SELECT COUNT(c1.challenge_id) FROM Challenges AS c1 GROUP BY c1.hacker_id ORDER BY COUNT(*) DESC LIMIT 1)
    OR cnt NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id)
ORDER BY cnt DESC, c.hacker_id;
As of now I understand the problem statement up until "If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result." I simply have no clue how to structure a query to solve that statement.
In the code provided above, I understand everything up to this section:
cnt NOT IN (SELECT COUNT(c2.challenge_id) FROM Challenges AS c2 GROUP BY c2.hacker_id HAVING c2.hacker_id <> c.hacker_id)
Can anyone please help me understand what this line accomplishes and the logic behind it? Specifically I don't know what c2.hacker_id <> c.hacker_id is supposed to do. I'm guessing the whole line selects the number of challenge_ids done by particular hacker_ids who aren't the same person, but I have no clue how that solves the query.

Suppose the query, without this clause, returns the following hacker ids and challenge counts:
hacker, counter
1, 10
2, 9
3, 9
Hackers 2 and 3 shouldn't be in the result because they are tied on a count that is below the maximum, so we can implement the rule as excluding anyone whose count is 9.
Conceptually, the database runs the subquery once for every row in the results. When processing hacker 2's row, the subquery returns the challenge counts of every hacker whose id isn't 2. So when considering hacker 2, the database pulls back the following counts:
10, --it comes from hacker 1
9 --it comes from hacker 3
The database then reasons: "I'm processing hacker 2, whose count is 9. I may only include hacker 2 in the results if hacker 2's count (9) is not in the following list of values: 10, 9. Oh, 9 is in the list of banned values, so I'll exclude hacker 2 from the results."
Repeat for hacker 3: this time a count of 9 comes from hacker 2, so hacker 3 is also excluded. For hacker 1, the other hackers' counts are 9 and 9, and 10 is not among them, so hacker 1 stays.
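The per-row reasoning above can be mimicked outside the database. The following is a minimal Python sketch of the idea, not what MySQL literally executes; the `counts` dictionary holds the sample data from this answer, and the helper name `survives_not_in` is made up for illustration.

```python
# Sample data from the answer above: hacker_id -> number of challenges.
counts = {1: 10, 2: 9, 3: 9}

def survives_not_in(hacker_id, counts):
    """Mimic: cnt NOT IN (counts of every hacker whose id <> hacker_id)."""
    # Correlated subquery: challenge counts of all *other* hackers.
    other_counts = [cnt for other_id, cnt in counts.items() if other_id != hacker_id]
    # The row passes only if nobody else has the same count.
    return counts[hacker_id] not in other_counts

kept = [hacker_id for hacker_id in counts if survives_not_in(hacker_id, counts)]
print(kept)  # [1]
```

Note that in the full query this condition is combined with OR against the "count equals the maximum" condition, so hackers tied for the maximum are still kept.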

Analytic (window) functions help greatly with a question like this, so I will offer a solution for MySQL 8+, which is likely to be the version that future readers of your question will be using (and HackerRank will at some point also be using MySQL 8+).
WITH cte AS (
    SELECT
        c.hacker_id,
        h.name,
        COUNT(c.challenge_id) AS cnt,
        RANK() OVER (ORDER BY COUNT(c.challenge_id) DESC) rn,
        MIN(c.hacker_id) OVER (PARTITION BY COUNT(c.challenge_id)) hacker_id_min,
        MAX(c.hacker_id) OVER (PARTITION BY COUNT(c.challenge_id)) hacker_id_max
    FROM Hackers AS h
    INNER JOIN Challenges AS c
        ON h.hacker_id = c.hacker_id
    GROUP BY
        c.hacker_id,
        h.name
)
SELECT
    hacker_id,
    name,
    cnt
FROM cte
WHERE
    rn = 1 OR hacker_id_min = hacker_id_max
ORDER BY
    cnt DESC,
    hacker_id;
This answer works by computing a rank over the counts in descending order, so every student tied for the highest count gets rank 1 (RANK() is used rather than ROW_NUMBER(), which would assign 1 to only one of the tied rows). It also computes the min and max hacker_id values for each partition of challenge counts. Records are retained if they belong to the highest count, regardless of ties for first place. Records are also retained if the given count is associated with only a single user, that is, when the min and max hacker_id of the partition coincide.
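Whether ties for first place survive depends on the numbering function: a row number is unique per row, while a rank is shared by tied rows. The following is a minimal Python sketch of that difference, using hypothetical sample data with two students tied at the top. The row numbering here breaks ties by hacker_id for determinism; in SQL, a window ordered only by the count may number tied rows in any order.

```python
from collections import Counter

# Hypothetical data: hacker_id -> challenge count, with a tie for first place.
counts = {1: 10, 2: 10, 3: 9, 4: 9, 5: 7}

# Rows sorted by count descending, then hacker_id.
ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# ROW_NUMBER(): unique per row, so only one of the tied leaders gets 1.
row_number = {hacker_id: i for i, (hacker_id, _) in enumerate(ordered, start=1)}

# RANK(): 1 + number of rows with a strictly higher count, shared by ties.
rank = {h: 1 + sum(other > cnt for other in counts.values()) for h, cnt in counts.items()}

# How many hackers share each count (what the min = max hacker_id test detects).
holders = Counter(counts.values())

def keep(position):
    """Keep a row if it is numbered 1 or if its count is unique."""
    return [h for h, cnt in ordered if position[h] == 1 or holders[cnt] == 1]

print(keep(row_number))  # [1, 5]
print(keep(rank))        # [1, 2, 5]
```

With a row number, hacker 2 is dropped even though the problem statement keeps everyone tied at the maximum; with a rank, both leaders are kept.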

Related

SQL query for creating course lists. GROUP BY?

I have a mysql table "lessons" from which I want to output a course list for each teacher (see picture).
Up to now I wanted to use GROUP BY and GROUP_CONCAT for the output, but I failed so far. I have no idea at the moment. Can anyone help me?
If I understand correctly, lessons that are at the same time (based on the last two columns) and for the same teacher and have the same name should be combined as a "course".
If this is the case, you can use two levels of aggregation:
select group_concat(ids) as ids,
       group_concat(class_id) as class_ids,
       teacher_id, name, weekday_hours
from (select l.class_id, l.teacher_id, l.name,
             group_concat(l.weekday, ':', l.hour order by l.weekday, l.hour) as weekday_hours,
             group_concat(l.id order by l.id) as ids
      from lessons l
      group by l.class_id, l.teacher_id, l.name
     ) l
group by teacher_id, name, weekday_hours;
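To see what the two levels of aggregation do, here is a minimal Python sketch of the same idea. The rows are hypothetical (the original table was only shown as a picture); the column names follow the ones used in the query above.

```python
from collections import defaultdict

# Hypothetical rows: (id, class_id, teacher_id, name, weekday, hour).
lessons = [
    (1, "5a", 7, "Math", 1, 2),
    (2, "5a", 7, "Math", 3, 4),
    (3, "5b", 7, "Math", 1, 2),
    (4, "5b", 7, "Math", 3, 4),
    (5, "6a", 7, "Math", 2, 1),
]

# Level 1: per (class, teacher, name), collect the time slots and lesson ids.
per_class = defaultdict(lambda: {"slots": [], "ids": []})
for lesson_id, class_id, teacher_id, name, weekday, hour in lessons:
    group = per_class[(class_id, teacher_id, name)]
    group["slots"].append((weekday, hour))
    group["ids"].append(lesson_id)

# Level 2: classes with the same teacher, name and slots form one course.
courses = defaultdict(lambda: {"class_ids": [], "ids": []})
for (class_id, teacher_id, name), group in per_class.items():
    key = (teacher_id, name, tuple(sorted(group["slots"])))
    courses[key]["class_ids"].append(class_id)
    courses[key]["ids"].extend(sorted(group["ids"]))

for (teacher_id, name, slots), course in courses.items():
    print(teacher_id, name, slots, course["class_ids"], course["ids"])
```

In this sample, classes 5a and 5b share the same schedule and are merged into one course, while 6a stays on its own.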

Meaning of this line: HAVING c2.hacker_id <> c.hacker_id?

I've been working on a HackerRank SQL question, "Challenges".
This is the problem:
Julia asked her students to create some coding challenges. Write a query to print the hacker_id, name, and the total number of challenges created by each student. Sort your results by the total number of challenges in descending order. If more than one student created the same number of challenges, then sort the result by hacker_id. If more than one student created the same number of challenges and the count is less than the maximum number of challenges created, then exclude those students from the result.
This question stumped me, and while searching online to see how other people have solved it, I came across this GitHub page, whose author proposed a really elegant solution.
SELECT c.hacker_id,
       h.name,
       COUNT(c.challenge_id) AS cnt
FROM Hackers AS h
JOIN Challenges AS c ON h.hacker_id = c.hacker_id
GROUP BY c.hacker_id, h.name
HAVING cnt = (SELECT COUNT(c1.challenge_id)
              FROM Challenges AS c1
              GROUP BY c1.hacker_id
              ORDER BY COUNT(*) DESC
              LIMIT 1)
    OR
       cnt NOT IN (SELECT COUNT(c2.challenge_id)
                   FROM Challenges AS c2
                   GROUP BY c2.hacker_id
                   HAVING c2.hacker_id <> c.hacker_id)
ORDER BY cnt DESC, c.hacker_id;
I understand how the poster ensured that we'd only get the maximum number of challenges in the first condition of the HAVING, but as for the second part, I don't really understand. I'm not sure why the second condition of the HAVING query works: HAVING c2.hacker_id <> c.hacker_id
Won't c2.hacker_id = c.hacker_id always?
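If you equated them (c2.hacker_id = c.hacker_id), the subquery would return only the count of the same hacker as in the outer query. With the inequality, the subquery leaves that hacker out: it returns the challenge counts of every hacker whose hacker_id is not equal to the hacker_id of the outer query. NOT IN then excludes the outer row whenever its own count appears among those other hackers' counts, that is, whenever someone else created the same number of challenges.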
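The contrast between = and <> can be checked on a small example. This is a minimal Python sketch under assumed sample data; the function name `subquery_counts` is hypothetical and only mimics what the subquery returns for one outer row.

```python
# Hypothetical data: hacker_id -> challenge count.
counts = {1: 10, 2: 9, 3: 9, 4: 5}

def subquery_counts(outer_id, same_hacker):
    """Counts returned by the subquery for one outer row.

    same_hacker=True mimics HAVING c2.hacker_id = c.hacker_id,
    same_hacker=False mimics HAVING c2.hacker_id <> c.hacker_id.
    """
    return [cnt for hid, cnt in counts.items() if (hid == outer_id) == same_hacker]

# Rows that pass "cnt NOT IN (...)" under each variant of the condition.
passes_equal = [h for h, cnt in counts.items() if cnt not in subquery_counts(h, True)]
passes_not_equal = [h for h, cnt in counts.items() if cnt not in subquery_counts(h, False)]

print(passes_equal)      # []
print(passes_not_equal)  # [1, 4]
```

With =, every hacker's own count is always in the list, so NOT IN never passes. With <>, only hackers whose count is shared with nobody else pass.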

SQL How to Count table for each fields

I have 3 tables:
Courses
    courses_id
    name
QnAs
    qna_id
    student_id
    courses_id
    name
    question
Students
    student_id
    name
Now I'm trying to count how many QnAs there are for each course. How do I write the query?
I've tried this:
SELECT (SELECT COUNT(qna_id) AS Expr1
FROM QnAs) AS Count
FROM QnAs AS QnAs_1 CROSS JOIN
Courses
GROUP BY Courses.courses_id
It does count how many QnAs there are, but not for each course.
The output I get is each course with the total number of QnAs, but what I want is the number of QnAs for each course.
It seems you merely want to aggregate QNAs by course ID:
select courses_id, count(*)
from qnas
group by courses_id
order by courses_id;
Along with the course names:
select c.courses_id, c.name, coalesce(q.cnt, 0) as qna_count
from courses c
left join
(
  select courses_id, count(*) as cnt
  from qnas
  group by courses_id
) q on q.courses_id = c.courses_id
order by c.courses_id;
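The point of the left join with COALESCE is that courses without any QnAs still show up with a count of 0. A minimal sketch to try this out, assuming an in-memory SQLite database as a stand-in for MySQL and hypothetical table contents (column names as in the question):

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses (courses_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE QnAs (qna_id INTEGER PRIMARY KEY, student_id INTEGER,
                       courses_id INTEGER, name TEXT, question TEXT);
    INSERT INTO Courses VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
    INSERT INTO QnAs (student_id, courses_id, name, question) VALUES
        (10, 1, 'q1', '?'), (11, 1, 'q2', '?'), (10, 2, 'q3', '?');
""")

# Aggregate QnAs per course first, then left join so empty courses survive.
rows = conn.execute("""
    SELECT c.courses_id, c.name, COALESCE(q.cnt, 0) AS qna_count
    FROM Courses c
    LEFT JOIN (
        SELECT courses_id, COUNT(*) AS cnt
        FROM QnAs
        GROUP BY courses_id
    ) q ON q.courses_id = c.courses_id
    ORDER BY c.courses_id
""").fetchall()
print(rows)  # [(1, 'Algebra', 2), (2, 'Biology', 1), (3, 'Chemistry', 0)]
```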
Why not just use GROUP BY?
SELECT q.courses_id, COUNT(qna_id) as cnt
FROM QnAs q
GROUP BY q.courses_id;
This is not an answer, just an explanation of what your query does.
In your own query you first cross join all QnAs with all courses for no apparent reason, thus getting all possible combinations. So with two courses, each with three QNAs (that makes six QNAs in total), you'd construct 2 x 6 = 12 rows.
For each of these rows you select the total number of rows in the QNA table, which is six in above example. So you'd select 12 rows, all showing the number 6.
But then you group by course ID, thus ending up with two rows only in my example. You should apply an aggregate function on your subquery, e.g. MAX or SUM, but you don't, which makes your query invalid (because you are dealing with many rows, but treat this as if it were a single value). MySQL however silently applies ANY_VALUE, so your query becomes:
SELECT
ANY_VALUE( (SELECT COUNT(*) FROM QnAs) ) AS Count
FROM QnAs AS QnAs_1
CROSS JOIN Courses
GROUP BY Courses.courses_id;
I hope this explanation helps you understand how joins and aggregation work. You may want to set ONLY_FULL_GROUP_BY mode (https://dev.mysql.com/doc/...) in order to have MySQL report the syntax error instead of silently "fixing" the query by applying ANY_VALUE.
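The numbers in the example above (two courses with three QnAs each) can be reproduced with a small script. This is a minimal sketch that uses an in-memory SQLite database as a stand-in for MySQL; the table contents are the hypothetical ones from the explanation, and only the columns needed here are created.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; two courses with three QnAs each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Courses (courses_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE QnAs (qna_id INTEGER PRIMARY KEY, courses_id INTEGER);
    INSERT INTO Courses VALUES (1, 'A'), (2, 'B');
    INSERT INTO QnAs (courses_id) VALUES (1), (1), (1), (2), (2), (2);
""")

# The cross join pairs every QnA with every course: 6 x 2 = 12 rows.
joined = conn.execute("SELECT COUNT(*) FROM QnAs CROSS JOIN Courses").fetchone()[0]

# The scalar subquery does not depend on the joined row and always yields 6.
total = conn.execute("SELECT COUNT(qna_id) FROM QnAs").fetchone()[0]

# Grouping the QnAs themselves by course gives the per-course counts.
per_course = conn.execute(
    "SELECT courses_id, COUNT(*) FROM QnAs GROUP BY courses_id ORDER BY courses_id"
).fetchall()

print(joined, total, per_course)  # 12 6 [(1, 3), (2, 3)]
```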

How to find the count of workers working on the exact same projects

Given the following table, where both columns make up the key:
WORKS_ON: ID, ProjectNum
Is it possible to output for each ID, the number of other workers (each ID corresponds to a worker) who also work on the exact same projects?
For example, suppose worker 1 works on projects 2 and 3, worker 2 works on project 2, and worker 3 works on projects 2 and 3. I want to output 1 with a count of 1, 2 with a count of 0, and 3 with a count of 1.
I am not sure if this is even possible, I did think of a naive way whereby I select the count of rows for each ID and then order by ProjectNumber before comparing all of the ProjectNumber's one by one. But I am not sure how to then count the number of employees satisfying this or how to formulate it in an acceptable way.
Any guidance is appreciated.
You can express this rather convolutedly using set mechanisms in SQL. Or, you can do it more simply using string aggregation. So, in MySQL, this would look like:
select i.id, count(i2.id) as cnt
from (select id, group_concat(projectnum order by projectnum) as projects
      from works_on
      group by id
     ) i left join
     (select id, group_concat(projectnum order by projectnum) as projects
      from works_on
      group by id
     ) i2
     on i2.projects = i.projects and i2.id <> i.id
group by i.id
order by i.id;
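The string-aggregation idea can be checked against the example from the question. The following is a minimal Python sketch of the same logic, not a replacement for the SQL; the variable names are chosen for illustration.

```python
from collections import defaultdict

# Sample data from the question: (ID, ProjectNum) pairs.
works_on = [(1, 2), (1, 3), (2, 2), (3, 2), (3, 3)]

# Equivalent of group_concat(projectnum order by projectnum) per worker.
projects = defaultdict(list)
for worker_id, project_num in works_on:
    projects[worker_id].append(project_num)
signature = {w: ",".join(map(str, sorted(p))) for w, p in projects.items()}

# Equivalent of the left join on equal project strings and different ids.
same_projects = {
    w: sum(1 for other, sig in signature.items() if other != w and sig == signature[w])
    for w in sorted(signature)
}
print(same_projects)  # {1: 1, 2: 0, 3: 1}
```

Sorting the project numbers before joining them matters: two workers on the same projects must produce the same string regardless of row order.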

MySQL - Get row and average of rows

First of all I'll just warn everyone that I'm something of a rookie with MySQL. Additionally I haven't tested the example queries below so they might not be perfect.
Anyway, I have a table of items, each one with a name, a category and a score. Every 12 hours the top item is taken, used and then removed.
So far I've simply been grabbing the top item with
SELECT * FROM items_table ORDER BY score DESC LIMIT 1
The only issue with this is that some categories are biased and have generally higher scores. I'd like to solve this by sorting by the score divided by the average score instead of simply sorting by the score. Something like
ORDER BY score/(GREATEST(5,averageScore))
I'm now trying to work out the best way to find averageScore. I have another table for categories so obviously I could add an averageScore column to that and run a cronjob to keep them updated and retrieve them with something like
SELECT * FROM items_table, categories_table WHERE items_table.category = categories_table.category ORDER BY items_table.score/(GREATEST(5,categories_table.averageScore)) DESC LIMIT 1
but this feels messy. I know I can find all the averages using something like
SELECT AVG(score) FROM items_table GROUP BY category
What I'm wondering is if there's some way to retrieve the averages right in the one query.
Thanks,
YM
You can join the query that calculates the averages:
SELECT i.*
FROM items_table i JOIN (
SELECT category, AVG(score) AS averageScore
FROM items_table
GROUP BY category
) t USING (category)
ORDER BY i.score/GREATEST(5, t.averageScore) DESC
LIMIT 1
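To sanity-check the ordering expression on a few rows, the same computation can be written out by hand. This is a minimal Python sketch with hypothetical items; it mirrors the derived table of averages and the score/GREATEST(5, averageScore) ordering.

```python
from collections import defaultdict

# Hypothetical items: (name, category, score).
items = [
    ("a", "news", 100),
    ("b", "news", 80),
    ("c", "poems", 12),
    ("d", "poems", 4),
]

# Derived table: average score per category.
scores = defaultdict(list)
for _, category, score in items:
    scores[category].append(score)
average = {category: sum(vals) / len(vals) for category, vals in scores.items()}

# Highest score relative to the category average, with the average floored at 5.
top = max(items, key=lambda item: item[2] / max(5, average[item[1]]))
print(top)  # ('c', 'poems', 12)
```

Item "a" has the highest raw score, but "c" wins once scores are divided by their category average (12 / 8 = 1.5 versus 100 / 90).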