SQL, group one column and count its attributes - mysql

Say I have a relation grades about students' grades like this:
| ID | semester | Year | course_id | grade |
|------+----------+------+-----------+-------+
| 1018 | Fall | 2002 | 272 | A+ |
| 107 | Fall | 2002 | 274 | B |
| 111 | Fall | 2002 | 123 | C |
/* a lot of data here */
|------+----------+------+-----------+-------+
I wanna group by course_id and count its grades like this:
| course_id | semester | year | A+ | A- | B+ | B- | C+ | D+ | D- | else | sum |
| 1 | Fall | 2009 | 11 | 8 | 10 | 1 | 1 | 1 | 1 | 1 | 34 |
| 2 | Fall | 2009 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 8 |
I already figured out one solution but seems not satisfying to me:
/* use sum function */
select course_id, semester, year,
sum(if(grade = 'A+', 1, 0)) as 'A+',
sum(if(grade = 'A-', 1, 0)) as 'A-',
/* multiple lines */
from grades
group by course_id, semester, year;
I wonder if there is a more built-in way to do it, because my solution above is kind of tricky and not general.
Can anyone offer a better idea?
p.s.: yes it's a school assignment, and I want to seek more solutions:)
Any further hints would be appreciated.

There is no "simpler" way to do this. Well, actually, I would simplify the logic (assuming MySQL) to:
select course_id, semester, year,
sum(grade = 'A+') as `A+`,
sum(grade = 'A-') as `A-`,
/* multiple lines */
from grades
group by course_id, semester, year;
However, that is probably not what you are looking for. A SQL query returns a fixed set of columns, with their names and types determined in advance. If you want the columns to be based on the actual values in the data, then you cannot readily use a single static SQL statement.
You can use dynamic SQL, but that is even more complicated than your current query.
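As a rough illustration of the dynamic approach, here is a minimal sketch that uses Python's sqlite3 module rather than MySQL. It first reads the distinct grade values and then generates one `SUM(grade = ?)` column per value. The helper names `quote_ident` and `grade_pivot`, the `total` column, and the sample rows are made up for the example; in MySQL itself the same idea would typically be built as a string and run with PREPARE and EXECUTE.

```python
import sqlite3


def quote_ident(name):
    """Quote a string for use as an SQL identifier (here: a column alias)."""
    return '"' + name.replace('"', '""') + '"'


def grade_pivot(conn):
    """Return (header, rows) with one count column per grade found in the data."""
    # Step 1: discover which grade values actually occur.
    grades = [row[0] for row in conn.execute(
        "SELECT DISTINCT grade FROM grades "
        "WHERE grade IS NOT NULL ORDER BY grade")]
    # Step 2: generate one SUM(grade = ?) column per value. The values are
    # bound as parameters; only the quoted aliases are spliced into the text.
    select_items = ["course_id", "semester", "year"]
    select_items += ["SUM(grade = ?) AS " + quote_ident(g) for g in grades]
    select_items.append("COUNT(*) AS total")
    sql = ("SELECT " + ", ".join(select_items) + " FROM grades "
           "GROUP BY course_id, semester, year "
           "ORDER BY course_id, semester, year")
    cur = conn.execute(sql, grades)
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


# Made-up sample rows, only to exercise the sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (id INTEGER, semester TEXT, year INTEGER, "
             "course_id INTEGER, grade TEXT)")
conn.executemany("INSERT INTO grades VALUES (?, ?, ?, ?, ?)", [
    (1018, "Fall", 2002, 272, "A+"),
    (107, "Fall", 2002, 272, "B"),
    (111, "Fall", 2002, 272, "A+"),
    (112, "Fall", 2002, 123, "C"),
])
header, rows = grade_pivot(conn)
print(header)
for row in rows:
    print(row)
```

The column list now follows the data, at the cost of running two queries and generating SQL text, which is the extra complexity mentioned above.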

Related

How to check if a group has three consecutive values in a column?

I have a table games with values such as:
+----------+------+
| game | year |
+----------+------+
| Football | 1999 |
| Football | 2000 |
| Football | 2001 |
| Football | 2002 |
| Cricket | 1996 |
| Tennis | 2001 |
| Tennis | 2002 |
| Tennis | 2003 |
| Tennis | 2009 |
| Golf | 1994 |
| Golf | 1996 |
| Golf | 1997 |
+----------+------+
I am trying to see if a game has entries for a minimum of three consecutive years in the table. My expected output is:
+----------+
| game |
+----------+
| Football |
| Tennis |
+----------+
Because:
Football has four entries out of which four are consecutive years => 1999, 2000, 2001, 2002
Tennis has four entries out of which three are consecutive years => 2001, 2002, 2003
In order to find the rows with a minimum of three consecutive entries, I first partitioned the table on game and then checked the difference between the current and the previous row as below:
select game, year, case
when (year - lag(year) over (partition by game order by year)) is null then 1
else year - lag(year) over (partition by game order by year)
end as diff
from games
Output of the above query:
+----------+------+------+
| game | year | diff |
+----------+------+------+
| Football | 1999 | 1 |
| Football | 2000 | 1 |
| Football | 2001 | 1 |
| Football | 2002 | 1 |
| Cricket | 1996 | 1 |
| Tennis | 2001 | 1 |
| Tennis | 2002 | 1 |
| Tennis | 2003 | 1 |
| Tennis | 2009 | 6 |
| Golf | 1994 | 1 |
| Golf | 1996 | 2 |
| Golf | 1997 | 1 |
+----------+------+------+
I am not able to proceed from here to get the output by filtering the data for each game on its difference.
Could anyone let me know if I am on the right track with this implementation? If not, how do I write the query to get the expected output?
You could use a self join approach here:
SELECT DISTINCT g1.Game
FROM games g1
INNER JOIN games g2
ON g2.Game = g1.Game AND g2.Year = g1.Year + 1
INNER JOIN games g3
ON g3.Game = g2.Game AND g3.Year = g2.Year + 1;
The query above requires a matching game to have at least one record whose year is followed by records for both the next year and the year after that.
You can use lag() and lead() and compare them to the current Year:
with u as
(select *, case
when lag(Year) over(partition by Game order by Year) = Year - 1
and lead(Year) over(partition by Game order by Year) = Year + 1
then 1 else 0
end as consec
from games)
select distinct Game
from u
where consec = 1;
Yes, your initial approach is correct. You were actually really close to fully figuring it out yourself.
What I would do is alter LAG a bit:
year - LAG(year, 2) OVER (
PARTITION BY game
ORDER BY year
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
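For each row, this computes the difference between the year in the current row and the year two rows earlier within the same game.
If the row is the third of three consecutive years, the difference is 2. Window functions are not allowed directly in a WHERE clause, so compute the difference in a subquery or CTE and filter on it in the outer query.
If your data contains duplicates, you need to group by game, year first.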
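To make the LAG(year, 2) idea concrete, here is a minimal sketch run through Python's sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added). The subquery aliases `deduped` and `spans`, the column name `span`, and the function name are made up for the example; the rows are the ones from the question.

```python
import sqlite3

# The inner SELECT DISTINCT removes duplicate (game, year) rows first, so a
# difference of exactly 2 to the row two positions back can only mean three
# consecutive years.
QUERY = """
SELECT DISTINCT game
FROM (
    SELECT game,
           year - LAG(year, 2) OVER (PARTITION BY game ORDER BY year) AS span
    FROM (SELECT DISTINCT game, year FROM games) AS deduped
) AS spans
WHERE span = 2
ORDER BY game
"""


def games_with_three_consecutive_years(conn):
    """Return the games that have at least three consecutive years."""
    return [row[0] for row in conn.execute(QUERY)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE games (game TEXT, year INTEGER)")
conn.executemany("INSERT INTO games VALUES (?, ?)", [
    ("Football", 1999), ("Football", 2000), ("Football", 2001),
    ("Football", 2002), ("Cricket", 1996),
    ("Tennis", 2001), ("Tennis", 2002), ("Tennis", 2003), ("Tennis", 2009),
    ("Golf", 1994), ("Golf", 1996), ("Golf", 1997),
])
print(games_with_three_consecutive_years(conn))  # ['Football', 'Tennis']
```

Golf is excluded because its third row gives 1997 - 1994 = 3, not 2.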
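Using a CTE (Common Table Expression) and the ROW_NUMBER window function, this can be solved as a gaps-and-islands problem: within a run of consecutive years, year minus the row number stays constant, so grouping on that difference isolates each run (assuming no duplicate game/year rows).
WITH CTE AS (
select game, year - ROW_NUMBER() OVER (PARTITION BY game order by year) AS grp
from games)
Select Distinct game
from CTE
group by game, grp
having count(*) >= 3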

MYSQL Total required for Query

Hi, I am trying to work out how to get an overall total for a column in my query.
This query gets engineers' names and the number of jobs they have that are out of SLA, i.e. the date by which the job should have been completed has passed and the job has still not been completed.
SELECT Engineer,Job_Status,COUNT(*) as 'Out Of SLA'
FROM import
WHERE (Job_Status = 'P' or Job_Status='P2' or Job_Status='P8')
and (isnull(Job_Completed_Date)
or Job_Completed_Date='0000-00-00')
and (Job_SLA_Due_Date < CURDATE()
)
GROUP BY import.Engineer,Job_Status
The code above produces the following results from the import table.
+----------------+------------+------------+
| Engineer | Job_Status | Out of SLA |
+----------------+------------+------------+
| Andy Beeres | P | 15 |
| Andy Broad | P | 4 |
| Darren Goodwin | P | 6 |
+----------------+------------+------------+
I want to show a total for the Out of SLA column alongside the rest of the table data, something like the table below.
| Engineer | Job_Status | Out of SLA |
|------------- |------------ |------------ |
| Andy Beeres | P | 14 |
| | P2 | 3 |
| | P8 | 1 |
| Total | | 18 |
| Andy Broad | P | 12 |
| | P2 | 2 |
| Total | | 14 |
| Grand Total | | 32 |
Regards
Alan
Use WITH ROLLUP with GROUP BY to get the totals.
According to the MySQL docs:
The GROUP BY clause permits a WITH ROLLUP modifier that causes summary output to include extra rows that represent higher-level (that is, super-aggregate) summary operations. ROLLUP thus enables you to answer questions at multiple levels of analysis with a single query.
SELECT Engineer,Job_Status,COUNT(*) as 'Out Of SLA'
FROM import
WHERE (Job_Status = 'P' or Job_Status='P2' or Job_Status='P8')
and (isnull(Job_Completed_Date)
or Job_Completed_Date='0000-00-00')
and (Job_SLA_Due_Date < CURDATE()
)
GROUP BY import.Engineer,Job_Status WITH ROLLUP
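To show the shape of the rows that WITH ROLLUP adds, here is a small sketch in Python's sqlite3 module. SQLite has no WITH ROLLUP, so this is an emulation with UNION ALL rather than the MySQL feature itself: it produces one subtotal row per engineer and one grand-total row, with NULL in the rolled-up columns, which is how MySQL marks its super-aggregate rows. The table is reduced to two columns, the date filters are left out, and the sample rows and the `rollup_rows` name are made up.

```python
import sqlite3

# Detail rows, then per-engineer subtotals, then the grand total.
# NULL marks the column(s) that were rolled up.
QUERY = """
SELECT engineer, job_status, out_of_sla
FROM (
    SELECT engineer, job_status, COUNT(*) AS out_of_sla
    FROM "import" GROUP BY engineer, job_status
    UNION ALL
    SELECT engineer, NULL, COUNT(*) FROM "import" GROUP BY engineer
    UNION ALL
    SELECT NULL, NULL, COUNT(*) FROM "import"
) AS r
ORDER BY engineer IS NULL, engineer, job_status IS NULL, job_status
"""


def rollup_rows(conn):
    """Return detail, subtotal and grand-total rows in display order."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "import" (engineer TEXT, job_status TEXT)')
conn.executemany('INSERT INTO "import" VALUES (?, ?)', [
    ("Andy Beeres", "P"), ("Andy Beeres", "P"), ("Andy Beeres", "P2"),
    ("Andy Broad", "P"),
])
for engineer, status, count in rollup_rows(conn):
    # Replace the NULL markers with labels when displaying the rows.
    label = engineer if engineer is not None else "Grand Total"
    print(label, status if status is not None else "Total", count)
```

With the real WITH ROLLUP query, the same labelling step applies: a NULL Job_Status marks an engineer's total, and a NULL Engineer marks the grand total.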
One option is to use a subquery which finds the SLA total:
SELECT Engineer,
Job_Status,
COUNT(*) AS `Out Of SLA`,
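-- the subquery repeats the outer filter so that the total counts only out-of-SLA jobs
(SELECT COUNT(*)
FROM import
WHERE (Job_Status = 'P' OR Job_Status='P2' OR Job_Status='P8') AND
(ISNULL(Job_Completed_Date) OR Job_Completed_Date = '0000-00-00') AND
Job_SLA_Due_Date < CURDATE()) AS total_sla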
FROM import
WHERE (Job_Status = 'P' OR Job_Status='P2' OR Job_Status='P8') AND
(ISNULL(Job_Completed_Date) OR Job_Completed_Date = '0000-00-00') AND
Job_SLA_Due_Date < CURDATE()
GROUP BY Engineer,
Job_Status

My sql listing entries which are repeated more than certain value

I have a staff table like this --->
+------+------------------+------+------------+--------+
| EC | Name | Code | Dob | Salary |
+------+------------------+------+------------+--------+
| 2001 | ROBBIE KEANE | VSS1 | 1990-05-16 | 18000 |
| 2002 | ANSUMAN BANERJEE | VSS1 | 1985-05-21 | 18000 |
| 2003 | OMAR GONZALEZ | SACP | 1989-04-16 | 20000 |
| 2004 | ALAN GORDON | IALO | 1989-05-03 | 20000 |
| 2005 | ROBBIE KEANE | IALO | 1988-01-16 | 18000 |
| 2006 | CHANDLER HOFFMAN | BBDP | 1988-07-17 | 22000 |
| 2007 | PAUL POGBA | BHSM | 1990-08-16 | 18000 |
| 2008 | SHINJI KAGAWA | LPDC | 1991-01-20 | 18000 |
+------+------------------+------+------------+--------+
Now I want to list those codes (like VSS1) which have fewer than a specified number of people assigned to them (say, fewer than 2). How can I do this? Please help.
My query up till now is-->
SELECT Code,count(*) as 'Number of Staff' from STAFF where Code IN (SELECT Code from STAFF GROUP BY CODE LIMIT 2);
But this is not working.
You can filter on the row count of each Code group with the HAVING clause:
SELECT Code
, COUNT(*)
FROM STAFF
GROUP BY Code
HAVING COUNT(*) < 2
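For reference, a small runnable sketch of the HAVING filter, using Python's sqlite3 module and the staff rows from the question (reduced to the EC and Code columns). The function name `codes_with_fewer_than` and the `staff_count` alias are made up, and the threshold is passed as a bound parameter.

```python
import sqlite3


def codes_with_fewer_than(conn, limit):
    """Return (code, staff_count) for codes assigned to fewer than `limit` people."""
    return conn.execute(
        "SELECT Code, COUNT(*) AS staff_count "
        "FROM STAFF GROUP BY Code "
        "HAVING COUNT(*) < ? "   # HAVING filters groups after aggregation
        "ORDER BY Code",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STAFF (EC INTEGER, Code TEXT)")
conn.executemany("INSERT INTO STAFF VALUES (?, ?)", [
    (2001, "VSS1"), (2002, "VSS1"), (2003, "SACP"), (2004, "IALO"),
    (2005, "IALO"), (2006, "BBDP"), (2007, "BHSM"), (2008, "LPDC"),
])
print(codes_with_fewer_than(conn, 2))
```

WHERE cannot do this job because it is evaluated per row, before the groups and their counts exist.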
If you need to know the names of the people having this count less than 2 then...
SELECT S.EC, S.Name, S.Code, S.DOB, S.Salary, SC.Code, SC.Cnt
FROM STAFF S
INNER JOIN (SELECT Count(*) cnt, Code FROM STAFF GROUP BY CODE) SC
on S.Code = SC.code
WHERE SC.CNT < 2
This should work in both SQL Server and MySQL, though SQL Server could also use a window function, which would be faster.
If, however, you just need to know the codes having fewer than a certain number of people, the HAVING clause in notulysses' answer should fit the bill.

SQL table join with multiple duplicates - returning count less than X within a date range

My head is spinning. I have been struggling for the past two days to come up with a MySQL query that joins two tables. I've run into several complications.
The first table, we'll call it log, has a log of people who have "signed in". Each entry is timestamped with a datetime. It consists of a student_id and a timestamp. It looks something like this:
-------------------------------------
| student_id | timestamp |
| 1234 | 2014-02-26 21:50:27 |
| 2345 | 2014-02-26 21:54:54 |
| 1234 | 2014-03-03 19:18:18 |
| .....etc. |
-------------------------------------
My second table, we'll call it students has the information on each student.
--------------------------------------------------------
| student_id | name | homeroom |
| 1234 | Charles Reinmuth | Swatosh |
| 2345 | Kathryn Mo | Green |
| 6789 | Emily Salt | Clayborne |
| .....etc. |
--------------------------------------------------------
I want to return students, within a certain datetime range, that have fewer than X entries in the log. INCLUDING students with no entries within that datetime range. My current query works well. But only returns students with >0 entries in the log. I've tried LEFT OUTER and RIGHT OUTER but to no avail. Here is my current query:
SELECT students.name, students.homeroom, COUNT(1) AS cnt
FROM students
INNER JOIN log ON log.student_id = students.student_id
WHERE log.`datetime` between '2014-02-25 00:00:00' and '2014-03-04 00:00:00'
GROUP BY log.student_id HAVING COUNT(1) < 3
This will return, in the case of the example above:
--------------------------------------------------
| Name | Homeroom | Count |
| Charles Reinmuth | Swatosh | 2 |
| Kathryn Mo | Green | 1 |
| .....etc. |
--------------------------------------------------
This is what I am trying to accomplish:
--------------------------------------------------
| Name | Homeroom | Count |
| Charles Reinmuth | Swatosh | 2 |
| Kathryn Mo | Green | 1 |
| Emily Salt | Clayborne | 0 |
| .....etc. |
--------------------------------------------------
To get rows with no matches in the second table, you need to use LEFT JOIN.
SELECT students.name, students.homeroom, COUNT(log.student_id) AS cnt
FROM students
LEFT JOIN log
ON log.student_id = students.student_id
AND log.`datetime` between '2014-02-25 00:00:00' and '2014-03-04 00:00:00'
GROUP BY students.student_id
HAVING cnt < 3
Note the other changes I made:
COUNT(log.student_id) - Otherwise, you'll count the row from the students table, even though there's no log row. COUNT(column) doesn't count null values, which is what you get when there's no match.
The log.datetime test must be moved into the ON clause. If it stayed in the WHERE clause, it would filter out students with no rows in log, because log.datetime is NULL for them and the BETWEEN test then fails.
I changed the HAVING clause to use the cnt alias, to avoid having to repeat COUNT(log.student_id).
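The effect of moving the date test can be checked with a minimal sketch in Python's sqlite3 module, using the three students and three log rows from the question. The names `ON_QUERY`, `WHERE_QUERY` and `run` are made up for the example, and the COUNT expression is repeated in HAVING instead of using the alias, only to keep the sketch portable.

```python
import sqlite3

# Date test in the ON clause: students without matching log rows are kept.
ON_QUERY = """
SELECT students.name, students.homeroom, COUNT(log.student_id) AS cnt
FROM students
LEFT JOIN log
  ON log.student_id = students.student_id
 AND log."datetime" BETWEEN ? AND ?
GROUP BY students.student_id, students.name, students.homeroom
HAVING COUNT(log.student_id) < ?
ORDER BY students.student_id
"""

# Date test in the WHERE clause: rows with a NULL datetime are discarded,
# so the LEFT JOIN behaves like an INNER JOIN again.
WHERE_QUERY = """
SELECT students.name, students.homeroom, COUNT(log.student_id) AS cnt
FROM students
LEFT JOIN log ON log.student_id = students.student_id
WHERE log."datetime" BETWEEN ? AND ?
GROUP BY students.student_id, students.name, students.homeroom
HAVING COUNT(log.student_id) < ?
ORDER BY students.student_id
"""


def run(conn, query, start, end, limit):
    return conn.execute(query, (start, end, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (student_id INTEGER, name TEXT, homeroom TEXT)")
conn.execute('CREATE TABLE log (student_id INTEGER, "datetime" TEXT)')
conn.executemany("INSERT INTO students VALUES (?, ?, ?)", [
    (1234, "Charles Reinmuth", "Swatosh"),
    (2345, "Kathryn Mo", "Green"),
    (6789, "Emily Salt", "Clayborne"),
])
conn.executemany("INSERT INTO log VALUES (?, ?)", [
    (1234, "2014-02-26 21:50:27"),
    (2345, "2014-02-26 21:54:54"),
    (1234, "2014-03-03 19:18:18"),
])
window = ("2014-02-25 00:00:00", "2014-03-04 00:00:00")
with_on = run(conn, ON_QUERY, *window, 3)
with_where = run(conn, WHERE_QUERY, *window, 3)
print(with_on)     # includes Emily Salt with a count of 0
print(with_where)  # Emily Salt is missing
```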

query using inner joins and having

These are my tables:
mysql> select * from professor;
+-------+--------+--------+--------+------+
| empid | name | status | salary | age |
+-------+--------+--------+--------+------+
| 1 | Arun | 1 | 2000 | 23 |
| 2 | Benoy | 0 | 3000 | 25 |
| 3 | Chacko | 1 | 1000 | 36 |
| 4 | Divin | 0 | 5000 | 32 |
| 5 | Edwin | 1 | 2500 | 55 |
| 7 | George | 0 | 1500 | 46 |
+-------+--------+--------+--------+------+
6 rows in set (0.00 sec)
mysql> select * from works;
+----------+-------+---------+
| courseid | empid | classid |
+----------+-------+---------+
| 1 | 1 | 10 |
| 2 | 2 | 9 |
| 3 | 3 | 8 |
| 4 | 4 | 10 |
| 5 | 5 | 9 |
| 6 | 1 | 9 |
| 2 | 3 | 10 |
| 2 | 1 | 7 |
| 4 | 2 | 6 |
| 2 | 4 | 6 |
| 2 | 5 | 2 |
| 7 | 5 | 6 |
| 3 | 5 | 2 |
| 6 | 4 | 10 |
| 2 | 7 | 1 |
+----------+-------+---------+
15 rows in set (0.00 sec)
mysql> select * from course;
+----------+------------+--------+
| courseid | coursename | points |
+----------+------------+--------+
| 1 | Maths | 5 |
| 2 | Science | 1 |
| 3 | English | 6 |
| 4 | Social | 4 |
| 5 | Malayalam | 20 |
| 6 | Arts | 25 |
| 7 | Biology | 20 |
+----------+------------+--------+
7 rows in set (0.00 sec)
The question is :
Return those courses that have been taught by all professors.
The query I tried is:
select course.coursename from
course inner join works
on course.courseid=works.empid
group by works.courseid
having works.empid in (select empid from professor);
I am getting an error like this:
Unknown column 'works.empid' in 'IN/ALL/ANY subquery'
Pls help me out with the query.
http://sqlfiddle.com/#!2/4b197/5
Apart from the reasons why your current query gives you an error, which @GordonLinoff explained in detail, one way to achieve the desired result
Return those courses that have been taught by all professors.
is
SELECT c.*
FROM
(
SELECT courseid
FROM works
GROUP BY courseid
HAVING COUNT(DISTINCT empid) =
(
SELECT COUNT(*)
FROM professor
)
) q JOIN course c
ON q.courseid = c.courseid
Note: thanks to @eggyal, it is worth mentioning that this query assumes referential integrity is intact, meaning the works table has no orphaned records (rows where empid refers to a non-existent row in the professor table). Technically, it returns courses taught by the same number of professors as currently exist in the professor table, which, when referential integrity is intact, are exactly the courses we're looking for.
Output:
| COURSEID | COURSENAME | POINTS |
|----------|------------|--------|
| 2 | Science | 1 |
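The counting approach and its referential-integrity assumption can be tried out with a minimal sketch in Python's sqlite3 module, loaded with the data from the question (tables reduced to the columns the query needs). `COUNT_ONLY` mirrors the query above; `JOINED` is a variation, not part of the original answer, that joins works to professor so that orphaned empid values are not counted. Both constant names and the `courses` helper are made up.

```python
import sqlite3

# Same idea as the query above: compare the per-course professor count
# with the total number of professors.
COUNT_ONLY = """
SELECT c.coursename
FROM (SELECT courseid FROM works
      GROUP BY courseid
      HAVING COUNT(DISTINCT empid) = (SELECT COUNT(*) FROM professor)) q
JOIN course c ON q.courseid = c.courseid
ORDER BY c.coursename
"""

# Variation (an assumption of this sketch): only count empids that exist
# in professor, so orphaned works rows cannot distort the comparison.
JOINED = """
SELECT c.coursename
FROM course c
JOIN works w ON w.courseid = c.courseid
JOIN professor p ON p.empid = w.empid
GROUP BY c.courseid, c.coursename
HAVING COUNT(DISTINCT p.empid) = (SELECT COUNT(*) FROM professor)
ORDER BY c.coursename
"""


def courses(conn, query):
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE professor (empid INTEGER, name TEXT)")
conn.execute("CREATE TABLE course (courseid INTEGER, coursename TEXT)")
conn.execute("CREATE TABLE works (courseid INTEGER, empid INTEGER, classid INTEGER)")
conn.executemany("INSERT INTO professor VALUES (?, ?)", [
    (1, "Arun"), (2, "Benoy"), (3, "Chacko"),
    (4, "Divin"), (5, "Edwin"), (7, "George"),
])
conn.executemany("INSERT INTO course VALUES (?, ?)", [
    (1, "Maths"), (2, "Science"), (3, "English"), (4, "Social"),
    (5, "Malayalam"), (6, "Arts"), (7, "Biology"),
])
conn.executemany("INSERT INTO works VALUES (?, ?, ?)", [
    (1, 1, 10), (2, 2, 9), (3, 3, 8), (4, 4, 10), (5, 5, 9),
    (6, 1, 9), (2, 3, 10), (2, 1, 7), (4, 2, 6), (2, 4, 6),
    (2, 5, 2), (7, 5, 6), (3, 5, 2), (6, 4, 10), (2, 7, 1),
])
before = (courses(conn, COUNT_ONLY), courses(conn, JOINED))

# An orphaned row: empid 99 does not exist in professor.
conn.execute("INSERT INTO works VALUES (2, 99, 1)")
after = (courses(conn, COUNT_ONLY), courses(conn, JOINED))
print(before, after)
```

With the orphaned row present, Science has seven distinct empid values against six professors, so the count-only query no longer returns it, while the joined variation still does.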
First, the query is way off from what you want. I'm only going to address the error.
The error is very interesting. In short, MySQL allows something called hidden columns (which are described below). However, these only work in the having clause when they are included in the select clause. I hadn't known that.
The following two queries parse correctly (I'm skipping the middle part for brevity):
select course.coursename, works.empid
. . .
having works.empid = 1;
select course.coursename, works.empid
. . .
having works.empid = 1;
And yet, the following two fail with the same error:
select course.coursename
. . .
having works.empid = 1;
select course.coursename
. . .
having works.empid = 1;
The only difference is that the column is not mentioned in the select clause.
What is happening is that you are using a MySQL extension to the group by clause, sometimes called "Hidden Columns". You have columns in the select or having clause that are neither aggregation keys (course.courseid) nor surrounded by an aggregation function (say min(works.empid) or group_concat(works.empid)). Apparently, MySQL only recognizes these columns in the having clause when they are already in the select clause. At your stage of learning SQL, you just shouldn't do this. Follow the documentation to turn off this extension and get ANSI standard behavior:
To disable the MySQL GROUP BY extension, enable the ONLY_FULL_GROUP_BY
SQL mode. This enables standard SQL behavior: Columns not named in the
GROUP BY clause cannot be used in the select list or HAVING clause
unless enclosed in an aggregate function.
The way to fix the syntax problem is to use an aggregation function, something like:
select course.coursename
. . .
having min(works.empid) = 1;
select course.coursename
. . .
having min(works.empid) = 1;
This will get you no closer to having a working query, because yours is far from solving the problem. But it will fix the syntactic error.
Make a cross-join between course and professor to obtain every combination of courses and professors;
Make an outer join between that and works to identify which of those (course, professor) combinations actually exist;
Group by course and sum the number of such combinations that do not exist;
Filter the groups for only those where there are no such non-existent combinations.
Therefore:
SELECT course.*
FROM (course, professor) LEFT JOIN works USING (courseid, empid)
GROUP BY courseid
HAVING SUM(works.empid IS NULL) = 0
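The four steps can be sketched end to end with Python's sqlite3 module. The sketch spells the cross join and the outer join out with CROSS JOIN and ON instead of the MySQL-style `(course, professor)` and USING, and it runs on a tiny made-up data set (two professors, two courses); the `QUERY` and `taught_by_all` names are also made up.

```python
import sqlite3

QUERY = """
SELECT c.coursename
FROM course c
CROSS JOIN professor p            -- every (course, professor) combination
LEFT JOIN works w                 -- NULL where the combination is missing
       ON w.courseid = c.courseid AND w.empid = p.empid
GROUP BY c.courseid, c.coursename
HAVING SUM(w.empid IS NULL) = 0   -- keep courses with no missing combination
ORDER BY c.coursename
"""


def taught_by_all(conn):
    """Return the names of courses taught by every professor."""
    return [row[0] for row in conn.execute(QUERY)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE professor (empid INTEGER, name TEXT)")
conn.execute("CREATE TABLE course (courseid INTEGER, coursename TEXT)")
conn.execute("CREATE TABLE works (courseid INTEGER, empid INTEGER)")
conn.executemany("INSERT INTO professor VALUES (?, ?)", [(1, "Arun"), (2, "Benoy")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [(1, "Maths"), (2, "Science")])
conn.executemany("INSERT INTO works VALUES (?, ?)", [(1, 1), (2, 1), (2, 2)])
first = taught_by_all(conn)
print(first)  # ['Science']

# A new professor who teaches nothing yet: no course is taught by all.
conn.execute("INSERT INTO professor VALUES (3, 'Chacko')")
second = taught_by_all(conn)
print(second)  # []
```

Because the comparison is made against the professor table itself, orphaned rows in works do not affect the result.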
Try this:
select c.coursename, w.empid from
course as c inner join works as w
on (c.courseid=w.empid)
group by w.courseid
having w.empid in (select p.empid from professor as p);
select course.coursename,works.courseid,works.empid from
course inner join works
on course.courseid=works.empid
group by works.courseid
having works.empid in (select empid from professor);
It's working.