Grouping all in one tuple in SQL - mysql

I have a table EMP with employee ids and their hire year. I have to get the number of employees hired in, let's say, the years 2002 and 2000. The output table should also contain the total number of employees hired over the whole time.
So the last is easy. I just have to write:
SELECT COUNT(id) AS GLOBELAMOUNT FROM EMP;
But how do I count the amount of hired employees in 2002?
I could write the following:
SELECT COUNT(id) AS HIREDIN2002 FROM EMP WHERE YEAR = 2002;
But how do I combine this in one tuple with the data above?
Maybe I should group the data by hire year first and then count it? But I cannot really imagine how to count the data for several years.
Hope you guys can help me.
Cheers,
Andrej

Use conditional aggregation, e.g.:
SELECT COUNT(id) AS GLOBELAMOUNT,
COUNT(CASE WHEN YEAR=2000 THEN 1 END) AS HIREDIN2000,
COUNT(CASE WHEN YEAR=2002 THEN 1 END) AS HIREDIN2002
FROM EMP;
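To see why this works, here is a minimal sketch that runs the same query from Python. It assumes an in-memory SQLite database as a stand-in for MySQL, and the six sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table EMP.
# The sample rows below are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (id INTEGER PRIMARY KEY, YEAR INTEGER)")
conn.executemany(
    "INSERT INTO EMP (id, YEAR) VALUES (?, ?)",
    [(1, 2000), (2, 2000), (3, 2001), (4, 2002), (5, 2002), (6, 2002)],
)

# CASE yields NULL for rows that do not match, and COUNT skips NULLs,
# so each COUNT(CASE ...) counts only the rows of one year.
row = conn.execute(
    """
    SELECT COUNT(id) AS GLOBELAMOUNT,
           COUNT(CASE WHEN YEAR = 2000 THEN 1 END) AS HIREDIN2000,
           COUNT(CASE WHEN YEAR = 2002 THEN 1 END) AS HIREDIN2002
    FROM EMP
    """
).fetchone()

print(row)  # (6, 2, 3)
```

The table is scanned once, and every additional year only needs one more COUNT(CASE ...) column.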

In Microsoft SQL Server (Transact-SQL) at least, you can use a windowed aggregate function like this:
Select Distinct
Year
,count(Id) over (Partition by Year) as CountHiredInYear
,count(Id) over () as CountTotalHires
From EMP
This gives something like:
Year | CountHiredInYear | CountTotalHires
2005 | 3 | 12
2006 | 4 | 12
2007 | 5 | 12
Another SQL Server specific approach is the With Rollup keyword.
Select Year
,count(Id) as CountHires
From Emp
Group by Year
With Rollup
This adds a summary line for each level of grouping, with the total value for that set of rows. So here, you'd get an extra row where Year was NULL, with the value 12.
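Where WITH ROLLUP is not available, a similar summary row can be approximated with UNION ALL. The sketch below is an emulation, not the SQL Server feature itself. It assumes SQLite via Python's sqlite3 module and made-up rows chosen to match the counts in the sample output above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (Id INTEGER PRIMARY KEY, Year INTEGER)")

# 3 hires in 2005, 4 in 2006, 5 in 2007 (invented data, 12 rows in total).
years = [2005] * 3 + [2006] * 4 + [2007] * 5
conn.executemany("INSERT INTO Emp (Year) VALUES (?)", [(y,) for y in years])

# One row per year, plus one extra row with Year = NULL holding the grand total.
rows = conn.execute(
    """
    SELECT Year, CountHires
    FROM (
        SELECT Year, COUNT(Id) AS CountHires FROM Emp GROUP BY Year
        UNION ALL
        SELECT NULL, COUNT(Id) FROM Emp
    )
    ORDER BY Year IS NULL, Year
    """
).fetchall()

print(rows)  # [(2005, 3), (2006, 4), (2007, 5), (None, 12)]
```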

You could use two (or more) inline queries:
SELECT
(SELECT COUNT(id) FROM EMP) AS GLOBELAMOUNT,
(SELECT COUNT(id) FROM EMP WHERE YEAR = 2002) AS HIREDIN2002
or a CROSS JOIN:
SELECT GLOBELAMOUNT, HIREDIN2002
FROM
(SELECT COUNT(id) AS GLOBELAMOUNT FROM EMP) g CROSS JOIN
(SELECT COUNT(id) AS HIREDIN2002 FROM EMP WHERE YEAR = 2002) h

Related

Display column values and their count on SQL

I just want to ask you please this question on SQL.
Let's consider this EMPLOYEE table :
Employee Department
A 10
A 10
A 11
A 12
B 13
B 13
What I want to display is for each employee, all distinct departments (without duplicates) AND the total number of those distinct departments. So, something like this :
Employee Department total_dept
A 10 3
A 11 3
A 12 3
B 13 1
If possible, I would even prefer something like these :
Employee Department total_dept
A 10 3
A 11 null
A 12 null
B 13 1
I have a very big table (with many columns and many data) so I thought this can be an "optimisation", no ? I mean, there is no need to store the total_dept in all rows. Just put it once it's sufficient. No problem if after this I left the column empty. But I don't know if it's possible to do such thing in SQL.
So, how can I fix this please ? I tried but it seems impossible to combine count(column) with the same column...
Thank you in advance
This might be what you are looking for
SELECT
emp,
dept,
(select count(distinct dept) from TB as tbi where TB.emp = tbi.emp) x
FROM TB
group by emp, dept;
MySQL 8.0 supports windowed COUNT:
SELECT *,COUNT(*) OVER (PARTITION BY Employee) AS total_dept
FROM (SELECT DISTINCT * FROM Employees) e
You could even get the second result set (though I recommend leaving presentation matters to the application layer):
SELECT *, CASE WHEN ROW_NUMBER() OVER(PARTITION BY Employee ORDER BY Department) = 1
THEN COUNT(*) OVER (PARTITION BY Employee) END AS total_dept
FROM (SELECT DISTINCT * FROM Employees) e
ORDER BY Employee, Department;
For the 2nd version:
SELECT
DISTINCT e.Employee, e.Department,
CASE
WHEN e.Department =
(SELECT MIN(Department) FROM Employees WHERE Employees.Employee = e.Employee)
THEN
(SELECT COUNT(DISTINCT Department) FROM Employees WHERE Employees.Employee = e.Employee)
END AS total_dept
FROM Employees e
ORDER BY e.Employee, e.Department;
See the demo
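As a quick check of the correlated-subquery version, the following sketch loads the sample rows from the question and runs that query. It assumes SQLite through Python's sqlite3 module rather than MySQL, and the inner alias i is added only for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employees (Employee TEXT, Department INTEGER)")
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?)",
    [("A", 10), ("A", 10), ("A", 11), ("A", 12), ("B", 13), ("B", 13)],
)

# The count is shown only on the row holding the employee's smallest
# department; every other row gets NULL (None in Python).
rows = conn.execute(
    """
    SELECT DISTINCT e.Employee, e.Department,
           CASE
               WHEN e.Department =
                    (SELECT MIN(i.Department) FROM Employees i
                     WHERE i.Employee = e.Employee)
               THEN (SELECT COUNT(DISTINCT i.Department) FROM Employees i
                     WHERE i.Employee = e.Employee)
           END AS total_dept
    FROM Employees e
    ORDER BY e.Employee, e.Department
    """
).fetchall()

for r in rows:
    print(r)
# ('A', 10, 3)
# ('A', 11, None)
# ('A', 12, None)
# ('B', 13, 1)
```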

Finding missing data in a sequence in MySQL

Is there an efficient way to find missing data not just in one sequence, but many sequences?
This is probably unavoidably O(N**2), so efficient here is defined as relatively few queries using MySQL
Let's say I have a table of temporary employees and their starting and ending months.
employees | start_month | end_month
------------------------------------
Jane 2017-05 2017-07
Bob 2017-10 2017-12
And there is a related table of monthly payments to those employees
employee | paid_month
---------------------
Jane 2017-05
Jane 2017-07
Bob 2017-11
Bob 2017-12
Now, it's clear that we're missing a month for Jane (2017-06) and one for Bob too (2017-10).
Is there a way to somehow find the gaps in their payment record, without lots of trips back and forth?
In the case where there's just one sequence to check, some people generate a temporary table of valid values, and then LEFT JOIN to find the gaps. But here we have different sequences for each employee.
One possibility is that we could do an aggregate query to find the COUNT() of paid_months for each employee, and then check it versus the expected delta of months. Unfortunately the data here is a bit dirty so we actually have payment dates that could be before or after that employee start or end date. But we're verifying that the official sequence definitely has payments.
Form a Cartesian product of employees and months, then left join the actual data to that, then the missing data is revealed when there is no matched payment to the Cartesian product.
You need a list of every month. This might come from a "calendar table" you already have, or it might be possible to use a subquery (if every month is represented in the source data).
e.g.
select
m.paid_month, e.employee
from (select distinct paid_month from payments) m
cross join (select employee from employees) e
left join payments p on m.paid_month = p.paid_month and e.employee = p.employee
where p.employee is null
The subquery m can be substituted by the calendar table or some other technique for generating a series of months. e.g.
select
DATE_FORMAT(m1, '%Y-%m')
from (
select
'2017-01-01'+ INTERVAL m MONTH as m1
from (
select @rownum:=@rownum+1 as m
from (select 1 union select 2 union select 3 union select 4) t1
cross join (select 1 union select 2 union select 3 union select 4) t2
## cross join (select 1 union select 2 union select 3 union select 4) t3
## cross join (select 1 union select 2 union select 3 union select 4) t4
cross join(select @rownum:=-1) t0
) d1
) d2
where m1 < '2018-01-01'
order by m1
The subquery e could contain other logic (e.g. to determine which employees are still currently employed, or that are "temporary employees")
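A variation on this idea is to generate only the months each employee is expected to be paid for, so payments outside the start/end range are ignored. The sketch below is one possible way to do that, not the query from this answer: it assumes SQLite (via Python's sqlite3) and a recursive CTE named expected, which is a hypothetical helper. MySQL 8.0 also supports WITH RECURSIVE, but the date arithmetic would need MySQL functions instead of strftime.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (employee TEXT, start_month TEXT, end_month TEXT);
    CREATE TABLE payments (employee TEXT, paid_month TEXT);
    INSERT INTO employees VALUES ('Jane', '2017-05', '2017-07');
    INSERT INTO employees VALUES ('Bob', '2017-10', '2017-12');
    INSERT INTO payments VALUES ('Jane', '2017-05');
    INSERT INTO payments VALUES ('Jane', '2017-07');
    INSERT INTO payments VALUES ('Bob', '2017-11');
    INSERT INTO payments VALUES ('Bob', '2017-12');
    """
)

# expected: one row per employee per month from start_month to end_month.
# The LEFT JOIN then keeps only expected months that have no payment row.
missing = conn.execute(
    """
    WITH RECURSIVE expected(employee, month, end_month) AS (
        SELECT employee, start_month, end_month FROM employees
        UNION ALL
        SELECT employee, strftime('%Y-%m', month || '-01', '+1 month'), end_month
        FROM expected
        WHERE month < end_month
    )
    SELECT x.employee, x.month
    FROM expected x
    LEFT JOIN payments p
           ON p.employee = x.employee AND p.paid_month = x.month
    WHERE p.employee IS NULL
    ORDER BY x.employee, x.month
    """
).fetchall()

print(missing)  # [('Bob', '2017-10'), ('Jane', '2017-06')]
```

Everything runs in a single query, so there are no round trips per employee.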
First we need to get all the months between start_date and end_date for each employee into a derived table, then do a left outer join with the payments table on paid month, keeping only the non-matching months (payment employee name is null). Note that this example uses Oracle syntax (connect by, add_months, to_char):
select e.employee, e.yearmonth as missing_paid_month from (
with t as (
select e.employee, to_date(e.start_date, 'YYYY-MM') as start_date, to_date(e.end_date, 'YYYY-MM') as end_date from employees e
)
select distinct t.employee,
to_char(add_months(trunc(start_date,'MM'),level - 1),'YYYY-MM') yearmonth
from t
connect by trunc(end_date,'mm') >= add_months(trunc(start_date,'mm'),level - 1)
order by t.employee, yearmonth
) e
left outer join payments p
on p.paid_month = e.yearmonth and p.employee = e.employee
where p.employee is null
output
EMPLOYEE MISSING_PAID_MONTH
Bob 2017-10
Jane 2017-06
SQL Fiddle http://sqlfiddle.com/#!4/2b2857/35

Get the greatest Year value in mysql after grouping by a column

The below table contains an id and a Year and Groups
GroupingTable
id | Year | Groups
1 | 2000 | A
2 | 2001 | B
3 | 2001 | A
Now I want to select the greatest year for each group after grouping by the Groups column
SELECT
id,
Year,
Groups
FROM
GroupingTable
GROUP BY
`Groups`
ORDER BY Year DESC
And below is what I am expecting, even though the query above doesn't work as expected
id | Year | Groups
2 | 2001 | B
3 | 2001 | A
You need to learn how to use aggregate functions.
SELECT
MAX(Year) AS Year,
Groups
FROM
GroupingTable
GROUP BY
`Groups`
ORDER BY Year DESC
When using GROUP BY, only the column(s) you group by are unambiguous, because they have the same value on every row of the group.
Other columns return a value arbitrarily from one of the rows in the group. Actually, this is the behavior of MySQL (when the ONLY_FULL_GROUP_BY SQL mode is disabled; it is enabled by default since 5.7.5) and SQLite, but because of the ambiguity, it's an illegal query in standard SQL and in most other SQL implementations.
For more on this, see my answer to Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause
Your query misuses the heinously confusing nonstandard extension to GROUP BY that's built in to MySQL. Read this and weep. https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
If all you want is the year it's a snap.
SELECT MAX(Year) Year, Groups
FROM GroupingTable
GROUP BY Groups
If you want the id of the row in question, you have to do a bunch of monkey business to retrieve the column id from the above query.
SELECT a.*
FROM GroupingTable a
JOIN (
SELECT MAX(Year) Year, Groups
FROM GroupingTable
GROUP BY Groups
) b ON a.Groups = b.Groups AND a.Year = b.Year
You have to do this because the GROUP BY query yields a summary result set, and you have to join that back to the detail result set to retrieve the ID.
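The join-back pattern can be tried out with the sample rows from the question. This is a minimal sketch assuming SQLite through Python's sqlite3 module. The Groups column is double-quoted here as a precaution, since GROUPS is a keyword in newer SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE GroupingTable (id INTEGER PRIMARY KEY, Year INTEGER, "Groups" TEXT)'
)
conn.executemany(
    "INSERT INTO GroupingTable VALUES (?, ?, ?)",
    [(1, 2000, "A"), (2, 2001, "B"), (3, 2001, "A")],
)

# Subquery b finds the greatest Year per group; joining it back to the
# detail rows recovers the id that belongs to that (group, year) pair.
rows = conn.execute(
    """
    SELECT a.id, a.Year, a."Groups"
    FROM GroupingTable a
    JOIN (
        SELECT MAX(Year) AS Year, "Groups"
        FROM GroupingTable
        GROUP BY "Groups"
    ) b ON a."Groups" = b."Groups" AND a.Year = b.Year
    ORDER BY a.id
    """
).fetchall()

print(rows)  # [(2, 2001, 'B'), (3, 2001, 'A')]
```

If two rows in the same group share the greatest year, both are returned, so a tie-breaker is needed when exactly one row per group is required.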

General time range with SQL

I'm trying to do something in SQL and I just can't figure out how I should do that. I have this table
----------------------------------
|id_visit | visit_date | ssn |
----------------------------------
|1 |1940-01-07 |123125789|
----------------------------------
|2 |1975-03-15 |987743271|
----------------------------------
| ... | ... | ... |
and I need to select SSN's of patients that were visited more than five times within a year. How do I do that? I know it involves a 'HAVING COUNT(id_visit)' but for time part... that's a different story because my goal isn't to select ssn's in a specific time range but within a general range.
Starting from @Gordon Linoff's answer, I modified the query a bit to eliminate repetition in the results and return only the maximum count per SSN.
select p_ssn as SSN, max(visits_within_one_year) as "Maximum number of visits"
from (select t.p_ssn,count(*) as visits_within_one_year
from t join
t tyr
on t.p_ssn = tyr.p_ssn and
tyr.visit_date between t.visit_date and adddate(t.visit_date, 365)
group by t.p_ssn,t.visit_date
having visits_within_one_year > 5) results
group by p_ssn;
Assuming that you mean calendar year, the following query retrieves all SSNs and year combinations where the SSN appears more than five times during the year:
select ssn, year(visit_date) as yr
from t
group by ssn, year(visit_date)
having count(*) > 5;
If the question is about an arbitrary year period, then you can use a self join and aggregation:
select t.ssn, t.visit_date, count(*) as visits_within_one_year
from t join
t tyr
on t.ssn = tyr.ssn and
tyr.visit_date between t.visit_date and adddate(t.visit_date, 365)
group by t.ssn, t.visit_date
having visits_within_one_year > 5;
If you mean to get those SSNs within a calendar year (January to December):
select ssn
from tablename
group by ssn,year(visit_date)
having count(ssn)>5
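The difference between the calendar-year and the rolling-year reading shows up when visits straddle New Year. The following sketch uses made-up visits and assumes SQLite via Python's sqlite3, so date(..., '+365 days') replaces MySQL's adddate and strftime('%Y', ...) replaces year(). The aliases a and b are chosen for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id_visit INTEGER PRIMARY KEY, visit_date TEXT, ssn TEXT)"
)
# Invented data: patient A has six visits spread over two calendar years.
visits = [
    ("2019-10-01", "A"), ("2019-11-01", "A"), ("2019-12-01", "A"),
    ("2020-01-01", "A"), ("2020-02-01", "A"), ("2020-03-01", "A"),
    ("2019-01-01", "B"), ("2019-06-01", "B"),
]
conn.executemany("INSERT INTO t (visit_date, ssn) VALUES (?, ?)", visits)

# Calendar-year reading: count visits per ssn and per year number.
calendar_year = conn.execute(
    """
    SELECT ssn, strftime('%Y', visit_date) AS yr
    FROM t
    GROUP BY ssn, strftime('%Y', visit_date)
    HAVING COUNT(*) > 5
    """
).fetchall()

# Rolling reading: for every visit, count the same patient's visits
# in the 365 days starting at that visit.
rolling_year = conn.execute(
    """
    SELECT DISTINCT a.ssn
    FROM t AS a
    JOIN t AS b
      ON a.ssn = b.ssn
     AND b.visit_date BETWEEN a.visit_date AND date(a.visit_date, '+365 days')
    GROUP BY a.ssn, a.visit_date
    HAVING COUNT(*) > 5
    """
).fetchall()

print(calendar_year)  # []
print(rolling_year)   # [('A',)]
```

Patient A never has more than three visits in one calendar year, but has six within the 365 days starting 2019-10-01, so only the self join reports them.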

How can I write a query that aggregate a single row with latest date among multiple set of rows?

I have a MySQL table where there are many rows for each person, and I want to write a query which aggregates rows with a special constraint (one row per person).
For example, let's say the table consists of the following data.
name date reason
---------------------------------------
John 2013-04-01 14:00:00 Vacation
John 2013-03-31 18:00:00 Sick
Ted 2012-05-06 20:00:00 Sick
Ted 2012-02-20 01:00:00 Vacation
John 2011-12-21 00:00:00 Sick
Bob 2011-04-02 20:00:00 Sick
I want to see the distribution of 'reason' column. If I just write a query like below
select reason, count(*) as count from table group by reason
then I will be able to see number of reasons for this table overall.
reason count
------------------
Sick 4
Vacation 2
However, I am only interested in single reason from each person. The reason that should be counted should be from a row with latest date from the person's records. For example, John's latest reason would be Vacation while Ted's latest reason would be Sick. And Bob's latest reason (and the only reason) is Sick.
The expected result for that query should be like below. (Sum of count will be 3 because there are only 3 people)
reason count
-----------------
Sick 2
Vacation 1
Is it possible to write a query such that single latest reason will be counted when I want to see distribution(count) of reasons?
Here are some facts about the table.
The table has tens of millions of rows
For most of times, each person has one reason.
Some people have multiple reasons, but 99.99% of people have fewer than 5 reasons.
There are about 30 different reasons while there are millions of distinct names.
The table is partitioned based on date range.
SELECT T.REASON, COUNT(*)
FROM
(
SELECT PERSON, MAX(DATE) AS MAX_DATE
FROM TABLE-NAME
GROUP BY PERSON
) A, TABLE-NAME T
WHERE T.PERSON = A.PERSON AND T.DATE = A.MAX_DATE
GROUP BY T.REASON
Try this
select reason, count(*) from
(select reason from table where (name, date) in
(select name, max(date) from table group by name)) t
group by reason
In MySQL before 8.0, it's not very efficient to do this kind of query, since you don't have access to window functions (PARTITION BY) as in SQL Server or Oracle.
You can still emulate it by doing a subquery and retrieve the rows based on the condition you need, here the maximum date :
SELECT t.reason, COUNT(1)
FROM
(
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
) maxDateRows
INNER JOIN aTable t ON maxDateRows.name = t.name
AND maxDateRows.maxDate = t.adate
GROUP BY t.reason
You can see a sample here.
Test this query on your samples, but I'm afraid that it will be slow as hell.
For your information, you can do the same thing in a more elegant and much much faster way in SQL Server :
SELECT reason, COUNT(1)
FROM
(
SELECT name
, reason
, RANK() OVER(PARTITION BY name ORDER BY adate DESC) as Rank
FROM #aTable
) AS rankTable
WHERE Rank = 1
GROUP BY reason
The sample is here
If you are really stuck to MySql, and the first query is too slow, then you can split the problem.
Do a first query creating a table:
CREATE TABLE maxDateRows AS
SELECT name, MAX(adate) AS maxDate
FROM aTable
GROUP BY name
Then create index on both name and maxDate.
Finally, get the results :
SELECT t.reason, COUNT(1)
FROM maxDateRows m
INNER JOIN aTable t ON m.name = t.name
AND m.maxDate = t.adate
GROUP BY t.reason
The solution you are looking for seems to be solved by this query :
select
reason,
count(*)
from (select * from tablename group by name) abc
group by
reason
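It is quite fast and simple, but note that it relies on MySQL's nonstandard GROUP BY extension, so the row picked for each name is arbitrary and not guaranteed to be the one with the latest date. You can view the SQL Fiddle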
Apologies if this answer duplicates an existing one. Maybe I'm suffering from some form of aphasia, but I cannot see it...
SELECT x.reason
, COUNT(*)
FROM absentism x
JOIN
( SELECT name,MAX(date) max_date FROM absentism GROUP BY name) y
ON y.name = x.name
AND y.max_date = x.date
GROUP
BY reason;
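Running this join against the sample data from the question gives the expected distribution. The sketch below assumes SQLite via Python's sqlite3 module in place of MySQL and reuses the table name absentism from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absentism (name TEXT, date TEXT, reason TEXT)")
conn.executemany(
    "INSERT INTO absentism VALUES (?, ?, ?)",
    [
        ("John", "2013-04-01 14:00:00", "Vacation"),
        ("John", "2013-03-31 18:00:00", "Sick"),
        ("Ted", "2012-05-06 20:00:00", "Sick"),
        ("Ted", "2012-02-20 01:00:00", "Vacation"),
        ("John", "2011-12-21 00:00:00", "Sick"),
        ("Bob", "2011-04-02 20:00:00", "Sick"),
    ],
)

# y holds each person's latest date; joining back on (name, date)
# keeps exactly the latest row per person before counting reasons.
rows = conn.execute(
    """
    SELECT x.reason, COUNT(*)
    FROM absentism x
    JOIN (SELECT name, MAX(date) AS max_date
          FROM absentism
          GROUP BY name) y
      ON y.name = x.name AND y.max_date = x.date
    GROUP BY x.reason
    ORDER BY x.reason
    """
).fetchall()

print(rows)  # [('Sick', 2), ('Vacation', 1)]
```

On a table with tens of millions of rows, an index on (name, date) would likely help both the MAX(date) subquery and the join back, though that is untested here.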