MySQL - Complex COUNT Query

I have a table called user_scores as below:
id | af_id | uid | level | record_date
----------------------------------------
1 | 1.1 | 1 | 3 | 2012-01-01
2 | 1.1 | 1 | 4 | 2012-02-01
3 | 1.2 | 1 | 3 | 2012-01-01
4 | 1.2 | 1 | 5 | 2012-03-01
...
I have another table called user_info as below:
uid | forename | surname | gender
-----------------------------------
1 | Homer | Simpson | M
2 | Marge | Simpson | F
3 | Bart | Simpson | M
4 | Lisa | Simpson | F
...
In user_scores, uid is the ID of a registered user on the system, and af_id identifies a particular test a user submits. A user scores a level between 1 and 5 for each test, and a test can be submitted every month.
My problem is that I need to produce an analysis at the end of the year that COUNTs the number of users who have achieved each level for a particular test. The analysis should show a gender split for male and female.
So, for example, an administrator would select test 1.1 and the system would generate stats that COUNT how many users reached each level as their MAX level for the year, with a gender split.
Any help is much appreciated. Thank you in advance.
-
I think I need to clarify myself a bit. Because a user can complete the test multiple times throughout the year, there will be multiple scores for the same test. The query should take the highest level achieved and include this in the count. An example result would be:
Male Results:
level1 | level2 | level3 | level4 | level5
------------------------------------------
2 | 5 | 10 | 8 | 1

I am not certain I get exactly what you mean, but as always I'll have a go. As I understand it you want to know how many people from each gender reached each level in a certain year.
SELECT MaxLevel,
COUNT(CASE WHEN ui.Gender = 'M' THEN 1 END) AS Males,
COUNT(CASE WHEN ui.Gender = 'F' THEN 1 END) AS Females
FROM User_Info ui
INNER JOIN
( SELECT MAX(Level) AS MaxLevel,
UID
FROM User_Scores us
WHERE af_ID = '1.1'
AND YEAR(Record_Date) = 2012
GROUP BY UID
) AS MaxUs
ON MaxUs.uid = ui.UID
GROUP BY MaxLevel
I've put some sample data on SQL Fiddle so you can see whether it is what you were after.
EDIT
To transpose the data so levels are along the top and Gender in the rows the following will work:
SELECT Gender,
COUNT(CASE WHEN MaxLevel = 1 THEN 1 END) AS Level1,
COUNT(CASE WHEN MaxLevel = 2 THEN 1 END) AS Level2,
COUNT(CASE WHEN MaxLevel = 3 THEN 1 END) AS Level3,
COUNT(CASE WHEN MaxLevel = 4 THEN 1 END) AS Level4,
COUNT(CASE WHEN MaxLevel = 5 THEN 1 END) AS Level5
FROM User_Info ui
INNER JOIN
( SELECT MAX(Level) AS MaxLevel,
UID
FROM User_Scores us
WHERE af_ID = '1.1'
AND YEAR(Record_Date) = 2012
GROUP BY UID
) AS MaxUs
ON MaxUs.uid = ui.UID
GROUP BY Gender
Note that if there are ever more than 5 levels, you will need to add more columns to the SELECT statement, or start building dynamic SQL.
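The transposed query above can be tried end to end with a small script. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows and the helper name gender_split are made up for illustration, and the YEAR() filter is swapped for a half-open date range (which also lets an index on record_date be used).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_info (uid INTEGER PRIMARY KEY, forename TEXT, surname TEXT, gender TEXT);
CREATE TABLE user_scores (id INTEGER PRIMARY KEY, af_id TEXT, uid INTEGER,
                          level INTEGER, record_date TEXT);
INSERT INTO user_info VALUES
  (1, 'Homer', 'Simpson', 'M'), (2, 'Marge', 'Simpson', 'F'),
  (3, 'Bart', 'Simpson', 'M'), (4, 'Lisa', 'Simpson', 'F');
-- Hypothetical sample scores, not taken from the question.
INSERT INTO user_scores (af_id, uid, level, record_date) VALUES
  ('1.1', 1, 3, '2012-01-01'), ('1.1', 1, 4, '2012-02-01'),
  ('1.2', 1, 5, '2012-03-01'),
  ('1.1', 2, 5, '2012-01-01'),
  ('1.1', 3, 2, '2012-01-01'), ('1.1', 3, 4, '2012-06-01'),
  ('1.1', 4, 5, '2011-12-01'), ('1.1', 4, 3, '2012-04-01');
""")

def gender_split(conn, af_id, year):
    """Count users per highest level reached in `year`, one row per gender."""
    start, end = f"{year}-01-01", f"{year + 1}-01-01"
    return conn.execute("""
        SELECT ui.gender,
               COUNT(CASE WHEN m.max_level = 1 THEN 1 END) AS level1,
               COUNT(CASE WHEN m.max_level = 2 THEN 1 END) AS level2,
               COUNT(CASE WHEN m.max_level = 3 THEN 1 END) AS level3,
               COUNT(CASE WHEN m.max_level = 4 THEN 1 END) AS level4,
               COUNT(CASE WHEN m.max_level = 5 THEN 1 END) AS level5
        FROM user_info ui
        JOIN (SELECT uid, MAX(level) AS max_level
              FROM user_scores
              WHERE af_id = ? AND record_date >= ? AND record_date < ?
              GROUP BY uid) m ON m.uid = ui.uid
        GROUP BY ui.gender
        ORDER BY ui.gender
    """, (af_id, start, end)).fetchall()

rows = gender_split(conn, "1.1", 2012)
for row in rows:
    print(row)
```

With this sample data, Lisa's 2011 score is ignored, so her 2012 maximum is level 3, and both male users land in the level 4 column.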

Assuming record_date holds only dates (without time parts):
SELECT
s.maxlevel,
COUNT(NULLIF(gender, 'F')) AS M,
COUNT(NULLIF(gender, 'M')) AS F
FROM user_info u
INNER JOIN (
SELECT
uid,
MAX(level) AS maxlevel
FROM user_scores
WHERE record_date > DATE_SUB(CURDATE(), INTERVAL DAYOFYEAR(CURDATE()) DAY)
AND af_id = '1.1'
GROUP BY
uid
) s ON s.uid = u.uid
GROUP BY
s.maxlevel
That will show you only the maximum levels found in the user_scores table. If you have a Levels table where all possible levels (1 to 5) are listed, you could use that table to get a complete list of levels. If some levels are not present in the requested subset of data, the corresponding rows will show 0s in both columns.
Here's the above script with minor changes to show the complete chart of levels:
SELECT
l.level AS maxlevel,
COUNT(NULLIF(gender, 'F')) AS M,
COUNT(NULLIF(gender, 'M')) AS F
FROM user_info u
INNER JOIN (
SELECT
uid, MAX(level) AS maxlevel
FROM user_scores
WHERE record_date > DATE_SUB(CURDATE(), INTERVAL DAYOFYEAR(CURDATE()) DAY)
AND af_id = '1.1'
GROUP BY
uid
) s ON s.uid = u.uid
RIGHT JOIN Levels l ON s.maxlevel = l.level
GROUP BY
l.level
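The complete chart can be checked with a short script. This is a sketch, assuming a levels table as described above, Python's built-in sqlite3 module in place of MySQL, and made-up sample rows. It starts from levels and uses LEFT JOINs instead of the RIGHT JOIN, which gives the same rows here and also runs on engines without RIGHT JOIN support; the date filter is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE levels (level INTEGER PRIMARY KEY);
CREATE TABLE user_info (uid INTEGER PRIMARY KEY, gender TEXT);
CREATE TABLE user_scores (uid INTEGER, af_id TEXT, level INTEGER);
INSERT INTO levels VALUES (1), (2), (3), (4), (5);
-- Hypothetical sample rows for illustration only.
INSERT INTO user_info VALUES (1, 'M'), (2, 'F'), (3, 'M');
INSERT INTO user_scores VALUES
  (1, '1.1', 3), (1, '1.1', 4), (2, '1.1', 5), (3, '1.1', 4), (3, '1.2', 1);
""")

# Every level appears once; levels nobody reached show 0 in both columns,
# because COUNT ignores the NULLs produced by the outer joins.
chart = conn.execute("""
    SELECT l.level,
           COUNT(CASE WHEN u.gender = 'M' THEN 1 END) AS m,
           COUNT(CASE WHEN u.gender = 'F' THEN 1 END) AS f
    FROM levels l
    LEFT JOIN (SELECT uid, MAX(level) AS maxlevel
               FROM user_scores
               WHERE af_id = '1.1'
               GROUP BY uid) s ON s.maxlevel = l.level
    LEFT JOIN user_info u ON u.uid = s.uid
    GROUP BY l.level
    ORDER BY l.level
""").fetchall()

for row in chart:
    print(row)
```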

Hope this is what you're looking for!
Show the number of records, grouped by user ID and gender, along with the max score for af_id '1.1'.
select count(*), info.uid, info.gender, max(score.level)
from user_info as info
join user_scores as score
on info.uid = score.uid
where score.af_id = '1.1'
group by info.uid, info.gender;

EDITED based on your edit.
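select s.af_id,
count(distinct case when i.gender = 'M' then s.uid end) Male_users,
count(distinct case when i.gender = 'F' then s.uid end) Female_users
from user_scores s
join user_info i on i.uid = s.uid
where s.level = (select max(b.level) from user_scores b where b.uid = s.uid and b.af_id = s.af_id)
group by s.af_id;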
I typed this in a rush. But it should work or at least get you where you need to go. E.G. if you need to specify time frame, add that.

You need something like
SELECT
uid,
MAX(level)
FROM user_scores
WHERE
record_date BETWEEN '2012-01-01' AND '2012-12-31'
AND af_id='1.1'
GROUP BY uid
If you need the gender split then, depending on what stat you need per gender, you can either add a JOIN on the user_info table into this query (to get the MAX per gender) or wrap this as a sub-query and JOIN on the whole thing.

Related

How to get list of students who have enrolled atleast once and then final status is not enrolled?

I have the following table. It holds a date-wise log of students enrolling and unenrolling.
student_id | is_enrolled | created_at
-------------------------------------
1 | 1 | 2020-01-01
2 | 0 | 2020-01-02
3 | 0 | 2020-01-01
1 | 0 | 2020-01-02
4 | 1 | 2020-01-02
1 | 0 | 2020-01-03
3 | 0 | 2020-01-03
4 | 1 | 2020-01-04
As you can see, student 1 enrolled on 2020-01-01 and then unenrolled on 2020-01-02. Students 2 and 3 have never enrolled. Student 4 enrolled multiple times but never unenrolled, and hence is not in the output.
Basically, I want to write a query whose output is students like 1, who have enrolled at least once and whose final status is not enrolled. I was able to get all the enrolled students, but I am stuck after that point.
My queries,
SELECT DISTINCT student_id
FROM student
WHERE is_enrolled = 1
ORDER
BY student_id; # gives me 1 and 4
SQL fiddle
Ideally, a single-query solution without a nested query would be awesome. I'm okay with a multiple-query solution as well.
Note: I was able to get the required output by using for-loops in my code, but I would like to learn whether I can do this with SQL queries alone. I'm not looking for any programming language code.
SELECT DISTINCT x.*
FROM student x
JOIN
( SELECT student_id
, MAX(created_at) created_at
FROM student
GROUP
BY student_id
) y
ON y.student_id = x.student_id
AND y.created_at = x.created_at
JOIN student z
ON z.student_id = x.student_id
AND z.is_enrolled = 1
WHERE x.is_enrolled = 0;
As an aside, never use SELECT *, and in the absence of any aggregating functions, a GROUP BY clause is NEVER appropriate.
I'm not a DBA ( database expert ), but I'll normally use something like this for my MSSQL database:
WITH summary AS (
SELECT
student_id,
is_enrolled,
created_at,
ROW_NUMBER() OVER(PARTITION BY s.student_id ORDER BY s.created_at DESC) AS rk
FROM student s)
SELECT s.*
FROM summary s
WHERE s.rk = 1
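AND s.is_enrolled = 0
AND EXISTS (SELECT 1 FROM student e WHERE e.student_id = s.student_id AND e.is_enrolled = 1)
What I did was add an extra column that numbers each student's rows from newest to oldest. The row with rk = 1 is the latest one, so you check that it has an is_enrolled value of 0, and the EXISTS check confirms that the student enrolled at least once.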
The "With" part is used to define a sub query, with some extra logic in there.
You can use aggregation:
select student_id
from student s
group by student_id
having sum(is_enrolled) >= 1 and
max(created_at) = max(case when is_enrolled = 0 then created_at end);
The first condition checks that the student is enrolled at least once.
The second checks that the latest created_at is the latest created_at for an unenrolled record. That checks that the last status is "unenrolled".
Here is the SQL Fiddle.
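The aggregation approach can be verified against the sample rows from the question. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the original database and dates stored as ISO-format text; the variable name dropped_out is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER, is_enrolled INTEGER, created_at TEXT);
INSERT INTO student VALUES
  (1, 1, '2020-01-01'), (2, 0, '2020-01-02'), (3, 0, '2020-01-01'),
  (1, 0, '2020-01-02'), (4, 1, '2020-01-02'), (1, 0, '2020-01-03'),
  (3, 0, '2020-01-03'), (4, 1, '2020-01-04');
""")

# First HAVING condition: enrolled at least once.
# Second: the newest row overall is also the newest unenrolled row.
# For student 4 the CASE yields only NULLs, so the comparison is NULL
# and the group is filtered out.
dropped_out = [sid for (sid,) in conn.execute("""
    SELECT student_id
    FROM student
    GROUP BY student_id
    HAVING SUM(is_enrolled) >= 1
       AND MAX(created_at) = MAX(CASE WHEN is_enrolled = 0 THEN created_at END)
    ORDER BY student_id
""")]

print(dropped_out)
```

Only student 1 satisfies both conditions: students 2 and 3 never enrolled, and student 4 has no unenrolled row at all.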

Query: I have 4 rows, need to add the results from 3 rows into one, and leave the last row untouched

I have a kind of tricky question for this query. First the code:
SELECT user_type.user_type_description,COUNT(incident.user_id) as Quantity
FROM incident
INNER JOIN user ON incident.user_id=user.user_id
INNER JOIN user_type ON user.user_type=user_type.user_type
WHERE incident.code=2
GROUP BY user.user_type
What am I doing?
For example, I am counting police reports of robbery made by different kinds of users. In my example, "admin" users reported 6 incidents of code "2" (robbery), and so on; the WHERE clause restricts the count to robberies (code 2).
This returns the following result:
+-----------------------+----------+
| user_type_description | Quantity |
+-----------------------+----------+
| Admin | 6 |
| Moderator | 8 |
| Fully_registered_user | 8 |
| anonymous_user | 9 |
+-----------------------+----------+
Basically, Admin, Moderator and Fully_registered_user are properly registered users. I need to add them together so the result looks like:
+--------------+------------+
| Proper_users | Anonymous |
+--------------+------------+
| 22 | 9 |
+--------------+------------+
I am not good with sql. Any help is appreciated. Thanks.
You can use conditional aggregation on top of your current result set: SUM with a CASE WHEN expression.
SELECT SUM(CASE WHEN user_type_description IN ('Admin','Moderator','Fully_registered_user') THEN Quantity END) Proper_users,
SUM(CASE WHEN user_type_description = 'anonymous_user' THEN Quantity END) Anonymous
FROM (
SELECT user_type.user_type_description,COUNT(incident.user_id) as Quantity
FROM incident
INNER JOIN user ON incident.user_id=user.user_id
INNER JOIN user_type ON user.user_type=user_type.user_type
WHERE incident.code=2
GROUP BY user.user_type
) t1
You just need conditional aggregation:
SELECT SUM( ut.user_type_description IN ('Admin', 'Moderator', 'Fully_registered_user') ) as Proper_users,
SUM( ut.user_type_description IN ('anonymous_user') ) as anonymous
FROM incident i INNER JOIN
user u
ON i.user_id = u.user_id INNER JOIN
user_type ut
ON u.user_type = ut.user_type
WHERE i.code = 2;
Notes:
Table aliases make the query easier to write and to read.
This uses a MySQL shortcut for counting: it sums the boolean expressions directly, since MySQL evaluates them as 1 or 0.
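Conditional aggregation over the joined tables can be tried with a small script. This is a sketch under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, the table layouts and sample rows are guesses based on the question, and the portable SUM(CASE ...) form is used in place of the boolean shortcut.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_type (user_type INTEGER PRIMARY KEY, user_type_description TEXT);
CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_type INTEGER);
CREATE TABLE incident (incident_id INTEGER PRIMARY KEY, user_id INTEGER, code INTEGER);
INSERT INTO user_type VALUES
  (1, 'Admin'), (2, 'Moderator'), (3, 'Fully_registered_user'), (4, 'anonymous_user');
-- Hypothetical users and incidents for illustration only.
INSERT INTO user VALUES (10, 1), (20, 2), (30, 3), (40, 4);
INSERT INTO incident (user_id, code) VALUES
  (10, 2), (10, 2), (20, 2), (30, 2), (40, 2), (40, 2), (40, 1), (10, 3);
""")

# One pass over the code-2 incidents; each row adds 1 to exactly one column.
proper_users, anonymous = conn.execute("""
    SELECT SUM(CASE WHEN ut.user_type_description
                         IN ('Admin', 'Moderator', 'Fully_registered_user')
                    THEN 1 ELSE 0 END) AS Proper_users,
           SUM(CASE WHEN ut.user_type_description = 'anonymous_user'
                    THEN 1 ELSE 0 END) AS Anonymous
    FROM incident i
    JOIN user u ON u.user_id = i.user_id
    JOIN user_type ut ON ut.user_type = u.user_type
    WHERE i.code = 2
""").fetchone()

print(proper_users, anonymous)
```

Because there is no GROUP BY, the query returns a single row, which is the shape the question asks for.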
I would solve it with a CTE, but it would be better to have this association in a table.
WITH
user_type_categories
AS
(
SELECT 'Admin' AS [user_type_description] , 'Proper_users' AS [user_type_category]
UNION SELECT 'Moderator' AS [user_type_description] , 'Proper_users' AS [user_type_category]
UNION SELECT 'Fully_registered_user' AS [user_type_description] , 'Proper_users' AS [user_type_category]
UNION SELECT 'anonymous_user' AS [user_type_description] , 'Anonymous' AS [user_type_category]
)
SELECT
SUM(CASE WHEN utc.[user_type_category] = 'Proper_users' THEN 1 ELSE 0 END) AS [Proper_Users_Quantity]
, SUM(CASE WHEN utc.[user_type_category] = 'Anonymous' THEN 1 ELSE 0 END) AS [Anonymous_Quantity]
FROM
[incident]
INNER JOIN [user] ON [incident].[user_id] = [user].[user_id]
INNER JOIN [user_type] ON [user].[user_type] = [user_type].[user_type]
LEFT JOIN user_type_categories AS utc ON utc.[user_type_description] = [user_type].[user_type_description]
WHERE
[incident].[code] = 2

GROUP BY & COUNT with multiple parameters

I have a simple configuration :
2 tables linked in a many-to-many relation, so it gave me 3 tables.
Table author:
idAuthor INT
name VARCHAR
Table publication:
idPublication INT,
title VARCHAR,
date YEAR,
type VARCHAR,
conference VARCHAR,
journal VARCHAR
Table author_has_publication:
Author_idAuthor,
Publication_idPublication
I am trying to get the names of all authors who have published at least 2 papers in conference SIGMOD and in conference PVLDB.
Right now I have achieved this, but I still get a doubled result. My query:
SELECT author.name, publication.journal, COUNT(*)
FROM author
INNER JOIN author_has_publication
ON author.idAuthor = author_has_publication.Author_idAuthor
INNER JOIN publication
ON author_has_publication.Publication_idPublication = publication.idPublication
GROUP BY publication.journal, author.name
HAVING COUNT(*) >= 2
AND (publication.journal = 'PVLDB' OR publication.journal = 'SIGMOD');
returns
+-------+---------+----------+
| name | journal | COUNT(*) |
+-------+---------+----------+
| Renee | PVLDB | 2 |
| Renee | SIGMOD | 2 |
+-------+---------+----------+
As you can see, the result is correct but doubled, as I want the name only once.
Another question: how can I set the minimum count per conference, for example to get all the authors who published at least 3 papers in SIGMOD and at least 1 in PVLDB?
If you don't care about the journal, don't select it; it is splitting your results. Also, plain row filters belong in the WHERE clause, not the HAVING clause:
SELECT author.name, COUNT(*)
FROM author
INNER JOIN author_has_publication
ON author.idAuthor = author_has_publication.Author_idAuthor
INNER JOIN publication
ON author_has_publication.Publication_idPublication =
publication.idPublication
WHERE publication.journal IN('PVLDB','SIGMOD')
GROUP BY author.name
HAVING COUNT(CASE WHEN publication.journal = 'SIGMOD' THEN 1 END) >= 2
AND COUNT(CASE WHEN publication.journal = 'PVLDB' THEN 1 END) >= 2;
For the second question, use this HAVING clause:
HAVING COUNT(CASE WHEN publication.journal = 'SIGMOD' THEN 1 END) >= 3
AND COUNT(CASE WHEN publication.journal = 'PVLDB' THEN 1 END) >= 1;
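Both variants differ only in the thresholds, so they can be parameterised. The following is a sketch, assuming Python's built-in sqlite3 module in place of MySQL; the helper name authors_with_minimums and the sample authors other than Renee are hypothetical, and grouping on idAuthor as well as name is an addition so that two authors sharing a name are not merged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (idAuthor INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE publication (idPublication INTEGER PRIMARY KEY, title TEXT, journal TEXT);
CREATE TABLE author_has_publication (Author_idAuthor INTEGER,
                                     Publication_idPublication INTEGER);
-- Hypothetical sample data for illustration only.
INSERT INTO author VALUES (1, 'Renee'), (2, 'Alice'), (3, 'Bob');
INSERT INTO publication VALUES
  (1, 'p1', 'SIGMOD'), (2, 'p2', 'SIGMOD'), (3, 'p3', 'SIGMOD'),
  (4, 'p4', 'PVLDB'), (5, 'p5', 'PVLDB');
INSERT INTO author_has_publication VALUES
  (1, 1), (1, 2), (1, 3), (1, 4), (1, 5),   -- Renee: 3 SIGMOD, 2 PVLDB
  (2, 1), (2, 2),                           -- Alice: 2 SIGMOD
  (3, 3), (3, 4);                           -- Bob:   1 SIGMOD, 1 PVLDB
""")

def authors_with_minimums(conn, min_sigmod, min_pvldb):
    """Names of authors meeting a separate minimum for each venue."""
    rows = conn.execute("""
        SELECT a.name
        FROM author a
        JOIN author_has_publication ahp ON ahp.Author_idAuthor = a.idAuthor
        JOIN publication p ON p.idPublication = ahp.Publication_idPublication
        WHERE p.journal IN ('SIGMOD', 'PVLDB')
        GROUP BY a.idAuthor, a.name
        HAVING COUNT(CASE WHEN p.journal = 'SIGMOD' THEN 1 END) >= ?
           AND COUNT(CASE WHEN p.journal = 'PVLDB' THEN 1 END) >= ?
        ORDER BY a.name
    """, (min_sigmod, min_pvldb)).fetchall()
    return [name for (name,) in rows]

print(authors_with_minimums(conn, 2, 2))
print(authors_with_minimums(conn, 3, 1))
```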

SQL: count of distinct users with conditions based on many to many table

I have a typical user table in addition to the following feature table
features:
-----------------------
| userId | feature |
-----------------------
| 1 | account |
| 1 | hardware |
| 2 | account |
| 3 | account |
| 3 | hardware |
| 3 | extra |
-----------------------
Basically I am trying to get some counts for reporting purposes. In particular, I am trying to find the number of users with accounts and hardware along with the total number of accounts.
I know I can do the following to get the total number of accounts
SELECT
COUNT(DISTINCT userId) as totalAccounts
FROM features
WHERE feature = "account";
I am unsure as to how to get the number of users with both accounts and hardware though. In this example dataset, the number I am looking for is 2. Users 1 and 3 have both accounts and hardware.
I would prefer to do this in a single query. Possibly using CASE (example for totalAccounts below):
SELECT
COUNT(DISTINCT(CASE WHEN feature = "account" THEN userId END)) as totalAccounts,
COUNT( ? ) as accountsWithHardware
FROM features;
These are two queries - one for the all user count, one for the two-features user count - that you can combine with a cross join:
select
count_all_users.cnt as all_user_count,
count_users_having_both.cnt as two_features_user_count
from
(
select count(distinct userid) as cnt
from features
) count_all_users
cross join
(
select count(*) as cnt
from
(
select userid
from features
where feature in ('account', 'hardware')
group by userid
having count(*) = 2
) users_having_both
) count_users_having_both;
UPDATE: With some thinking, there is a much easier way. Group by user and detect whether feature 1 and feature 2 exist. Then count.
select
count(*) as all_user_count,
count(case when has_account = 1 and has_hardware = 1 then 1 end)
as two_features_user_count
from
(
select
userid,
max(case when feature = 'account' then 1 else 0 end) as has_account,
max(case when feature = 'hardware' then 1 else 0 end) as has_hardware
from features
group by userid
) users;
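The per-user flag approach can be run against the table from the question. This is a sketch, assuming Python's built-in sqlite3 module as a stand-in for the original database. Note one deliberate difference: the first column here counts users who have an account (the totalAccounts figure the question asks for) instead of all users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE features (userId INTEGER, feature TEXT);
INSERT INTO features VALUES
  (1, 'account'), (1, 'hardware'), (2, 'account'),
  (3, 'account'), (3, 'hardware'), (3, 'extra');
""")

# Inner query: one row per user with 0/1 flags for each feature.
# Outer query: count the rows whose flags match each report column.
total_accounts, accounts_with_hardware = conn.execute("""
    SELECT COUNT(CASE WHEN has_account = 1 THEN 1 END),
           COUNT(CASE WHEN has_account = 1 AND has_hardware = 1 THEN 1 END)
    FROM (SELECT userId,
                 MAX(CASE WHEN feature = 'account' THEN 1 ELSE 0 END) AS has_account,
                 MAX(CASE WHEN feature = 'hardware' THEN 1 ELSE 0 END) AS has_hardware
          FROM features
          GROUP BY userId) users
""").fetchone()

print(total_accounts, accounts_with_hardware)
```

Adding another report column, such as users with the extra feature, only needs one more flag in the inner query and one more COUNT in the outer one.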

SQL find team that only contains specified 2 users

I have a table called team_members with this structure and contents:
+---------+---------+
| team_id | user_id |
+---------+---------+
| 1 | 18 |
+---------+---------+
| 1 | 7 |
+---------+---------+
| 3 | 18 |
+---------+---------+
What I am trying to do is find a team that contains only 2 users, where those users are supplied by me (in this case the users with IDs 7 and 18). Unfortunately, I have no idea how to write this query properly. I have tried something like
SELECT a.team_uid
FROM team_members a
INNER JOIN (
SELECT team_uid, user_id, COUNT(*) cnt_team
FROM team_members
GROUP BY team_uid
HAVING COUNT(*) = 2
) b ON a.user_id = b.user_id
Use a CASE expression in the HAVING clause and count only the required user_ids. Try this.
select team_id from team_members
group by team_id
having count(case when user_id=7 then 1 end)=1
and count(case when user_id=18 then 1 end)=1
and count(1)=2
Something to think about (and assuming a PK on team_id,user_id)...
SELECT x.team_id, COUNT(*), SUM(x.user_id IN(7,18)) FROM team_members x GROUP BY x.team_id;
A couple more ways to do this (where $id1 and $id2 are the users in question):
SELECT team_id
FROM team_members
GROUP BY team_id
HAVING COUNT(*) = 2
AND MIN(user_id) = LEAST($id1,$id2)
AND MAX(user_id) = GREATEST($id1,$id2)
See SQL Fiddle Demo here with values of 7 and 18 for $id1 and $id2. I am using LEAST() and GREATEST() in case it's not known which is the higher and which is the lower (for example, if they're coming from user input).
SELECT team_id
FROM team_members
GROUP BY team_id
HAVING GROUP_CONCAT(user_id ORDER BY user_id) = ('7,18')
See SQL Fiddle Demo here. Again, if it isn't known which is the higher and which is the lower, then this might be written as (the ORDER BY in GROUP_CONCAT() would be unnecessary):
SELECT team_id
FROM team_members
GROUP BY team_id
HAVING GROUP_CONCAT(user_id) IN ('$id1,$id2','$id2,$id1')
You can also use:
select team_id
from team_members
group by team_id
having sum(user_id not in(7, 18)) = 0
The example above assumes you want teams with only users 7 or 18 (no others, but not necessarily both).
If you want teams with BOTH users 7 and 18, and no others, you can use:
select team_id
from team_members
group by team_id
having sum(user_id not in(7, 18)) = 0
and sum(user_id in(7, 18)) = 2
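The "both users and no others" check can be wrapped in a small helper. This is a sketch under stated assumptions: Python's built-in sqlite3 module stands in for MySQL, (team_id, user_id) is unique as assumed above, the helper name teams_with_exactly and the extra team 4 row set are made up for illustration, and the two ids passed in are distinct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE team_members (team_id INTEGER, user_id INTEGER,
                           PRIMARY KEY (team_id, user_id));
-- Rows from the question plus a hypothetical three-member team 4.
INSERT INTO team_members VALUES (1, 18), (1, 7), (3, 18), (4, 7), (4, 18), (4, 9);
""")

def teams_with_exactly(conn, id1, id2):
    """Teams whose members are exactly the two given (distinct) users."""
    rows = conn.execute("""
        SELECT team_id
        FROM team_members
        GROUP BY team_id
        HAVING COUNT(*) = 2
           AND COUNT(CASE WHEN user_id IN (?, ?) THEN 1 END) = 2
        ORDER BY team_id
    """, (id1, id2)).fetchall()
    return [team_id for (team_id,) in rows]

print(teams_with_exactly(conn, 7, 18))
```

Team 3 is rejected because it has a single member, and team 4 because it has a third member; the order of the two ids does not matter.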