I'm trying to do something in SQL and I can't figure out how to approach it. I have this table:
----------------------------------
|id_visit | visit_date | ssn |
----------------------------------
|1 |1940-01-07 |123125789|
----------------------------------
|2 |1975-03-15 |987743271|
----------------------------------
| ... | ... | ... |
and I need to select the SSNs of patients who were visited more than five times within a year. How do I do that? I know it involves a 'HAVING COUNT(id_visit)', but the time part is a different story, because my goal isn't to select SSNs in one specific date range but within any one-year range.
Starting from @Gordon Linoff's answer, I modified the query a bit to eliminate repetition in the results and return only the maximum for each SSN.
select p_ssn as SSN, max(visits_within_one_year) as "Maximum number of visits"
from (select t.p_ssn,count(*) as visits_within_one_year
from t join
t tyr
on t.p_ssn = tyr.p_ssn and
tyr.visit_date between t.visit_date and adddate(t.visit_date, 365)
group by t.p_ssn,t.visit_date
having visits_within_one_year > 5)results
group by p_ssn;
Assuming that you mean calendar year, the following query retrieves all SSNs and year combinations where the SSN appears more than five times during the year:
select ssn, year(visit_date) as yr
from t
group by ssn, year(visit_date)
having count(*) > 5;
If the question is about an arbitrary year period, then you can use a self join and aggregation:
select t.ssn, t.visit_date, count(*) as visits_within_one_year
from t join
t tyr
on t.ssn = tyr.ssn and
tyr.visit_date between t.visit_date and adddate(t.visit_date, 365)
group by t.ssn, t.visit_date
having visits_within_one_year > 5;
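To make the difference between the calendar-year and the sliding-window version concrete, here is a minimal sketch that runs the self join against an in-memory SQLite database from Python. The sample rows are made up, and SQLite's date(x, '+365 days') stands in for MySQL's adddate(x, 365); treat it as an illustration under those assumptions rather than a drop-in query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id_visit INTEGER PRIMARY KEY, visit_date TEXT, ssn TEXT)")

# Hypothetical sample data: patient A has six visits inside one 365-day window
# (five in 2000, one in early 2001); patient B has one visit per year.
rows = [("2000-01-10", "A"), ("2000-03-01", "A"), ("2000-05-20", "A"),
        ("2000-08-15", "A"), ("2000-11-30", "A"), ("2001-01-05", "A")]
rows += [(f"{2000 + i}-06-01", "B") for i in range(6)]
conn.executemany("INSERT INTO t (visit_date, ssn) VALUES (?, ?)", rows)

# For every visit, count the visits of the same patient in the following 365 days.
query = """
SELECT t.ssn, t.visit_date, COUNT(*) AS visits_within_one_year
FROM t
JOIN t AS tyr
  ON tyr.ssn = t.ssn
 AND tyr.visit_date BETWEEN t.visit_date AND date(t.visit_date, '+365 days')
GROUP BY t.ssn, t.visit_date
HAVING COUNT(*) > 5
"""
result = conn.execute(query).fetchall()
print(result)
```

With this data the calendar-year query would return nothing for patient A (only five visits fall in 2000), while the sliding window starting at the first visit catches all six.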
If you mean to get those ssn within a solar year (jan/dec):
select ssn
from tablename
group by ssn,year(visit_date)
having count(ssn)>5
The below table contains an id and a Year and Groups
GroupingTable
id | Year | Groups
1 | 2000 | A
2 | 2001 | B
3 | 2001 | A
Now I want to select the greatest year for each group after grouping by the Groups column
SELECT
id,
Year,
Groups
FROM
GroupingTable
GROUP BY
`Groups`
ORDER BY Year DESC
And below is what I am expecting, even though the query above doesn't work as expected
id | Year | Groups
2 | 2001 | B
3 | 2001 | A
You need to learn how to use aggregate functions.
SELECT
MAX(Year) AS Year,
Groups
FROM
GroupingTable
GROUP BY
`Groups`
ORDER BY Year DESC
When using GROUP BY, only the column(s) you group by are unambiguous, because they have the same value on every row of the group.
Other columns return a value arbitrarily from one of the rows in the group. Actually, this is behavior of MySQL (and SQLite), but because of the ambiguity, it's an illegal query in standard SQL and all other brands of SQL implementations.
For more on this, see my answer to "Reason for Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause".
Your query misuses the heinously confusing nonstandard extension to GROUP BY that's built in to MySQL. Read this and weep. https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
If all you want is the year it's a snap.
SELECT MAX(Year) Year, Groups
FROM GroupingTable
GROUP BY Groups
If you want the id of the row in question, you have to do a bunch of monkey business to retrieve the column id from the above query.
SELECT a.*
FROM GroupingTable a
JOIN (
SELECT MAX(Year) Year, Groups
FROM GroupingTable
GROUP BY Groups
) b ON a.Groups = b.Groups AND a.Year = b.Year
You have to do this because the GROUP BY query yields a summary result set, and you have to join that back to the detail result set to retrieve the ID.
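As a quick check of the join-back approach, here is a small sketch using Python's sqlite3 module with the three sample rows from the question. SQLite is only a stand-in for MySQL here, and the double quotes around "Groups" are my addition to keep the identifier safe as a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE GroupingTable (id INTEGER, Year INTEGER, "Groups" TEXT)')
conn.executemany("INSERT INTO GroupingTable VALUES (?, ?, ?)",
                 [(1, 2000, "A"), (2, 2001, "B"), (3, 2001, "A")])

# Summary query (max Year per group) joined back to the detail rows to recover id.
query = """
SELECT a.id, a.Year, a."Groups"
FROM GroupingTable AS a
JOIN (
    SELECT "Groups", MAX(Year) AS Year
    FROM GroupingTable
    GROUP BY "Groups"
) AS b ON a."Groups" = b."Groups" AND a.Year = b.Year
ORDER BY a.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Note that if two rows in a group share the maximum year, both come back; that is inherent to joining on the value rather than on a unique key.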
I'm attempting to calculate the CURRENT location of a person based on a schedule items table (schedules).
The basic premise is, you can schedule a person to be in an office for a certain period of time (let's say start_date=2015-10-01, end_date=2015-12-31). That is a schedule item. It has a 1toM relationship with a location - that's no problem, I have that part sorted.
However, whilst they're scheduled to be in that office, they may also be scheduled to attend an offsite/client office. So there will be another schedule entry for, say, start_date=2015-12-03, end_date=2015-12-04.
Here's the table structure.
Person table
----------------------------------------------
|person_id |person_name |person_email |
----------------------------------------------
|1 |John |john@example.org |
|2 |Jane |jane@example.org |
----------------------------------------------
Schedule table
--------------------------------------------------------------
|schedule_id |person_id |location_id |start_date |end_date |
--------------------------------------------------------------
|1 |1 |1 |2015-10-01 |2015-12-31 |
|2 |2 |2 |2015-10-15 |2016-01-15 |
|3 |1 |5 |2015-12-03 |2015-12-10 |
|4 |2 |7 |2015-12-04 |2015-12-12 |
--------------------------------------------------------------
When I'm querying a single record, I'm easily able to calculate where the person currently is. It's not so complex.
SELECT * FROM schedules
WHERE person_id = 1 AND start_date <= CURDATE() AND end_date >= CURDATE()
ORDER BY end_date ASC, start_date DESC
LIMIT 0,1
However, when I need to generate a list of all people with their current schedule item, I'm running into issues. I had initially thought of just using a GROUP BY statement in the query, but that will only ever return the earliest schedule item that matches the query.
The problem therein, is that there are MULTIPLE schedule items that will match the query (this is part of the domain logic). However, I will always select the SHORTEST current stint as their CURRENT location.
I've used a groupwise query in the past to calculate the status of a person's employment based on the most recent status entry. However, because the schedule item has some slightly more complex logic in and around it (it has future scheduled items in it) I'm really just talking myself in circles as to the best approach.
A method using a sub query with substring_index. This gets all the schedule ids ordered by the length of time between the end and start dates, then uses SUBSTRING_INDEX to just get the first one. Then joins this against schedules to get the rest of the details.
SELECT *
FROM schedules
INNER JOIN
(
SELECT person_id, SUBSTRING_INDEX(GROUP_CONCAT(schedule_id ORDER BY DATEDIFF(end_date, start_date)), ',', 1) AS best_schedule_id
FROM schedules
WHERE person_id = 1
AND start_date <= CURDATE() AND end_date >= CURDATE()
GROUP BY person_id
) sub0
ON schedules.schedule_id = sub0.best_schedule_id
AND schedules.person_id = sub0.person_id
Note, I have also returned the person id from the sub query. Not strictly necessary as the query is at the moment, but put it in place so if you start to want to bring back multiple people it will need little change.
You want to select all records with a start date before and an end date after the current date. One person can appear multiple times in that result. For each person you want the occurrence with the earliest end date. That means you have to order those records by end date and number them within each person group. Try this:
select * from (
select a.schedule_id
, a.person_id
, a.location_id
, a.start_date
, a.end_date
, row_number() over (partition by a.person_id order by a.end_date) as rn
from schedules a
where getdate() between a.start_date and a.end_date
) tab
where rn=1
I added this afterwards because I realized that the row_number function is not available in MySQL (window functions only arrived in MySQL 8.0). So this is a version for older MySQL releases, using user variables. A bit more complicated but it should work:
select * from (
select @row_num := if(@prev_value=a.person_id,@row_num+1,1) as rn
, a.schedule_id
, a.person_id
, a.location_id
, a.start_date
, a.end_date
, @prev_value := a.person_id as asgmnt
from schedules a,
(select @row_num:=1) x,
(select @prev_value:=0) y
where a.start_date<=curdate() and a.end_date>=curdate()
order by a.person_id, a.end_date
) tab
where rn=1
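For anyone who wants to try the row-numbering idea without a MySQL server, here is a minimal sketch in Python with sqlite3, using the four schedule rows from the question. It assumes an SQLite build with window functions (3.25 or later), and it passes "today" as a parameter (2015-12-05, a date I picked so that both people have two overlapping schedule items) instead of calling CURDATE().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE schedules (
    schedule_id INTEGER PRIMARY KEY, person_id INTEGER, location_id INTEGER,
    start_date TEXT, end_date TEXT)""")
conn.executemany("INSERT INTO schedules VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, "2015-10-01", "2015-12-31"),
    (2, 2, 2, "2015-10-15", "2016-01-15"),
    (3, 1, 5, "2015-12-03", "2015-12-10"),
    (4, 2, 7, "2015-12-04", "2015-12-12"),
])

# Number each person's current schedule items, earliest end date first
# (latest start date breaks ties), then keep only the first one.
query = """
SELECT person_id, schedule_id, location_id
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (
               PARTITION BY s.person_id
               ORDER BY s.end_date ASC, s.start_date DESC
           ) AS rn
    FROM schedules AS s
    WHERE :today BETWEEN s.start_date AND s.end_date
) AS ranked
WHERE rn = 1
ORDER BY person_id
"""
current = conn.execute(query, {"today": "2015-12-05"}).fetchall()
print(current)
```

The ORDER BY inside the window mirrors the ordering of the single-person query in the question, so each person ends up at the offsite location (5 and 7) rather than the long-running office booking.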
I took some motivation from what you gave me and simply decided to do another subquery on the result set, prior to doing a GROUP BY on the output.
SELECT s.schedule_id, s.person_id, s.location_id FROM (
SELECT * FROM schedules
WHERE person_id = 1 AND start_date <= CURDATE() AND end_date >= CURDATE()
ORDER BY end_date ASC, start_date DESC
) AS s GROUP BY s.person_id
This appears to have given me the result set that I was after, unless anybody can think of a reason this would fail?
I have a table EMP with employee ids and their hire year. I have to get the number of employees hired in, let's say, the years 2002 and 2000. The output table should also contain the number of employees hired over the whole period.
So the last is easy. I just have to write:
SELECT COUNT(id) AS GLOBELAMOUNT FROM EMP;
But how do I count the amount of hired employees in 2002?
I could write the following:
SELECT COUNT(id) AS HIREDIN2002 FROM EMP WHERE YEAR = 2002;
But how do I combine this in one tuple with the data above?
Maybe I should group the data by hire year first and then count it? But I can't really imagine how to count the data for several years.
Hope you guys can help me.
Cheers,
Andrej
Use conditional aggregation, e.g.:
SELECT COUNT(id) AS GLOBELAMOUNT,
COUNT(CASE WHEN YEAR=2000 THEN 1 END) AS HIREDIN2000,
COUNT(CASE WHEN YEAR=2002 THEN 1 END) AS HIREDIN2002
FROM EMP;
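Here is a small sketch of the conditional aggregation running against SQLite from Python, with six made-up employees. It relies on COUNT ignoring NULLs: the CASE expression yields NULL for rows outside the year, so only matching rows are counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (id INTEGER PRIMARY KEY, YEAR INTEGER)")
# Hypothetical data: two hires in 2000, one in 2001, three in 2002.
conn.executemany("INSERT INTO EMP (YEAR) VALUES (?)",
                 [(2000,), (2000,), (2001,), (2002,), (2002,), (2002,)])

query = """
SELECT COUNT(id) AS GLOBELAMOUNT,
       COUNT(CASE WHEN YEAR = 2000 THEN 1 END) AS HIREDIN2000,
       COUNT(CASE WHEN YEAR = 2002 THEN 1 END) AS HIREDIN2002
FROM EMP
"""
totals = conn.execute(query).fetchone()
print(totals)
```

Everything comes back in a single row after one pass over the table, which is what the question asked for.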
In Microsoft SQL Server (Transact-SQL) at least, you can use a windowed aggregate function like this:
Select Distinct
Year
,count(Id) over (Partition by Year) as CountHiredInYear
,count(Id) over () as CountTotalHires
From EMP
This gives something like:
Year | CountHiredInYear | CountTotalHires
2005 | 3 | 12
2006 | 4 | 12
2007 | 5 | 12
Another approach, available in SQL Server (and also in MySQL), is the With Rollup keyword.
Select Year
,count(Id) as CountHires
From Emp
Group by Year
With Rollup
This adds a summary line for each level of grouping, with the total value for that set of rows. So here, you'd get an extra row where Year was NULL, with the value 12.
You could use two (or more) inline queries:
SELECT
(SELECT COUNT(id) FROM EMP) AS GLOBELAMOUNT,
(SELECT COUNT(id) FROM EMP WHERE YEAR = 2002) AS HIREDIN2002
or a CROSS JOIN:
SELECT GLOBELAMOUNT, HIREDIN2002
FROM
(SELECT COUNT(id) AS GLOBELAMOUNT FROM EMP) g CROSS JOIN
(SELECT COUNT(id) AS HIREDIN2002 FROM EMP WHERE YEAR = 2002) h
Let's say I have a schools table (cols = "ids (int)") and a users table (cols = "id (int), school_id (int), created_at (datetime)").
I have a list of school ids saved in <school_ids>. I want to group those schools by the yearweek(users.created_at) value for the user at that school with the earliest created_at value, and for each group list the value of yearweek(users.created_at) and the number of schools.
In other words, I want to find the earliest-created user for each school, and then group the schools by the yearweek() result for that created_at date, so that I effectively have the number of schools that signed up their first user in each week.
So, I want results like
| 201301 | 22 | #meaning there are 22 schools where the earliest created_at user
#has yearweek(created_at) = "201301"
| 201302 | 5 | #meaning there are 5 schools where the earliest created_at user
#has yearweek(created_at) = "201302"
etc
As a sanity check, the total of all rows in the second column should equal the size of <school_ids>, ie the number of ids in school_ids.
Does that make sense? I can't quite figure out how to get this without doing several queries and storing values in between. I'm sure there's a one-liner. Thanks! max
You could use a subquery that returns the minimum created_at field for every school_id, and then you can group by yearweek and do the count:
SELECT
yearweek(u.min_created_at) AS yearweek_first_user,
COUNT(*)
FROM
(
SELECT school_id, MIN(created_at) AS min_created_at
FROM users
GROUP BY school_id
) u
GROUP BY
yearweek(u.min_created_at)
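A rough sketch of the same two-step idea in Python with sqlite3, restricted to a list of school ids. The sample users are invented, and since SQLite has no yearweek(), strftime('%Y%W', ...) is used as a stand-in; its week numbering (Monday-based, weeks 00 to 53) is not identical to MySQL's YEARWEEK, so read the week labels as an approximation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, school_id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO users (school_id, created_at) VALUES (?, ?)", [
    (1, "2013-01-08 09:00:00"), (1, "2013-02-01 10:00:00"),
    (2, "2013-01-09 12:30:00"),
    (3, "2013-03-01 08:00:00"), (3, "2013-01-16 17:45:00"),
    (4, "2013-01-02 11:00:00"),  # school 4 is deliberately left out of school_ids
])

school_ids = [1, 2, 3]
placeholders = ", ".join("?" for _ in school_ids)

# Inner query: earliest user per school. Outer query: count schools per week.
query = f"""
SELECT strftime('%Y%W', u.min_created_at) AS yearweek_first_user,
       COUNT(*) AS schools
FROM (
    SELECT school_id, MIN(created_at) AS min_created_at
    FROM users
    WHERE school_id IN ({placeholders})
    GROUP BY school_id
) AS u
GROUP BY yearweek_first_user
ORDER BY yearweek_first_user
"""
weekly = conn.execute(query, school_ids).fetchall()
print(weekly)
```

The counts add up to the number of ids in school_ids, which is the sanity check mentioned in the question (assuming every listed school has at least one user).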
I have a MySQL table where there are many rows for each person, and I want to write a query which aggregates rows with a special constraint (one row per person).
For example, let's say the table consists of the following data.
name date reason
---------------------------------------
John 2013-04-01 14:00:00 Vacation
John 2013-03-31 18:00:00 Sick
Ted 2012-05-06 20:00:00 Sick
Ted 2012-02-20 01:00:00 Vacation
John 2011-12-21 00:00:00 Sick
Bob 2011-04-02 20:00:00 Sick
I want to see the distribution of 'reason' column. If I just write a query like below
select reason, count(*) as count from table group by reason
then I will be able to see number of reasons for this table overall.
reason count
------------------
Sick 4
Vacation 2
However, I am only interested in single reason from each person. The reason that should be counted should be from a row with latest date from the person's records. For example, John's latest reason would be Vacation while Ted's latest reason would be Sick. And Bob's latest reason (and the only reason) is Sick.
The expected result for that query should be like below. (Sum of count will be 3 because there are only 3 people)
reason count
-----------------
Sick 2
Vacation 1
Is it possible to write a query such that single latest reason will be counted when I want to see distribution(count) of reasons?
Here are some facts about the table.
The table has tens of millions of rows
For most of times, each person has one reason.
Some people have multiple reasons, but 99.99% of people have fewer than 5 reasons.
There are about 30 different reasons while there are millions of distinct names.
The table is partitioned based on date range.
SELECT T.REASON, COUNT(*)
FROM
(
SELECT name, MAX(date) AS MAX_DATE
FROM tablename
GROUP BY name
) A, tablename T
WHERE T.name = A.name AND T.date = A.MAX_DATE
GROUP BY T.REASON
Try this
select reason, count(*) from
(select reason from table where (name, date) in
(select name, max(date) from table group by name)) t
group by reason
In MySQL, it's not very efficient to do this kind of query since you don't have access to tools like the partitioning (window) functions in SQL Server or Oracle.
You can still emulate it by doing a subquery and retrieve the rows based on the condition you need, here the maximum date :
SELECT t.reason, COUNT(1)
FROM
(
SELECT name, MAX(adate) AS maxDate
FROM #aTable
GROUP BY name
) maxDateRows
INNER JOIN #aTable t ON maxDateRows.name = t.name
AND maxDateRows.maxDate = t.adate
GROUP BY t.reason
You can see a sample here.
Test this query on your samples, but I'm afraid that it will be slow as hell.
For your information, you can do the same thing in a more elegant and much much faster way in SQL Server :
SELECT reason, COUNT(1)
FROM
(
SELECT name
, reason
, RANK() OVER(PARTITION BY name ORDER BY adate DESC) as Rank
FROM #aTable
) AS rankTable
WHERE Rank = 1
GROUP BY reason
The sample is here
If you are really stuck to MySql, and the first query is too slow, then you can split the problem.
Do a first query creating a table:
CREATE TABLE maxDateRows AS
SELECT name, MAX(adate) AS maxDate
FROM #aTable
GROUP BY name
Then create index on both name and maxDate.
Finally, get the results :
SELECT t.reason, COUNT(1)
FROM maxDateRows m
INNER JOIN #aTable t ON m.name = t.name
AND m.maxDate = t.adate
GROUP BY t.reason
The solution you are looking for seems to be solved by this query :
select
reason,
count(*)
from (select * from tablename group by name) abc
group by
reason
It is quite fast and simple. You can view the SQL Fiddle
Apologies if this answer duplicates an existing one. Maybe I'm suffering from some form of aphasia but I cannot see it...
SELECT x.reason
, COUNT(*)
FROM absentism x
JOIN
( SELECT name,MAX(date) max_date FROM absentism GROUP BY name) y
ON y.name = x.name
AND y.max_date = x.date
GROUP
BY reason;
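To check the join-on-latest-date approach against the sample data from the question, here is a minimal sketch in Python with sqlite3. The table name absences is my own placeholder, and SQLite stands in for MySQL, so this says nothing about performance on tens of millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE absences (name TEXT, date TEXT, reason TEXT)")
conn.executemany("INSERT INTO absences VALUES (?, ?, ?)", [
    ("John", "2013-04-01 14:00:00", "Vacation"),
    ("John", "2013-03-31 18:00:00", "Sick"),
    ("Ted",  "2012-05-06 20:00:00", "Sick"),
    ("Ted",  "2012-02-20 01:00:00", "Vacation"),
    ("John", "2011-12-21 00:00:00", "Sick"),
    ("Bob",  "2011-04-02 20:00:00", "Sick"),
])

# Find each person's latest date, join back to get that row's reason, then count.
query = """
SELECT x.reason, COUNT(*) AS people
FROM absences AS x
JOIN (
    SELECT name, MAX(date) AS max_date
    FROM absences
    GROUP BY name
) AS y ON y.name = x.name AND y.max_date = x.date
GROUP BY x.reason
ORDER BY x.reason
"""
distribution = conn.execute(query).fetchall()
print(distribution)
```

Joining on both name and date matters: matching on the date alone could pick up another person's older row that happens to share a timestamp with someone's latest one.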