MySQL - count by month (including missing records)

MySQL - count by month (including missing records) - mysql

I have this SELECT:
SELECT
DATE_FORMAT(`created`, '%Y-%m') as byMonth,
COUNT(*) AS Total
FROM
`qualitaet`
WHERE
`created` >= MAKEDATE(year(now()-interval 1 year),1) + interval 5 month
AND
`status`=1
GROUP BY
YEAR(`created`), MONTH(`created`)
ORDER BY
YEAR(`created`) ASC
and get this result:
| byMonth | Total |
| 2015-06 | 2 |
| 2015-09 | 12 |
| 2015-10 | 3 |
| 2015-12 | 8 |
| 2016-01 | 1 |
see SQL-Fiddle here
The WHERE clause is important because i need it as current fiscal year starting on June, 1 in my example.
As you can see, i have no records for Jul, Aug and Nov. But i need this records with zero in Total.
So my result should look like this:
| byMonth | Total |
| 2015-06 | 2 |
| 2015-07 | 0 |
| 2015-08 | 0 |
| 2015-09 | 12 |
| 2015-10 | 3 |
| 2015-11 | 0 |
| 2015-12 | 8 |
| 2016-01 | 1 |
is there a way to get this result?

You need to generate all the wanted dates, and then left join your data to the dates. Note also that it is important to put some predicates in the left join's ON clause, and others in the WHERE clause:
SELECT
CONCAT(y, '-', LPAD(m, 2, '0')) as byMonth,
COUNT(`created`) AS Total
FROM (
SELECT year(now()) AS y UNION ALL
SELECT year(now()) - 1 AS y
) `years`
CROSS JOIN (
SELECT 1 AS m UNION ALL
SELECT 2 AS m UNION ALL
SELECT 3 AS m UNION ALL
SELECT 4 AS m UNION ALL
SELECT 5 AS m UNION ALL
SELECT 6 AS m UNION ALL
SELECT 7 AS m UNION ALL
SELECT 8 AS m UNION ALL
SELECT 9 AS m UNION ALL
SELECT 10 AS m UNION ALL
SELECT 11 AS m UNION ALL
SELECT 12 AS m
) `months`
LEFT JOIN `qualitaet` q
ON YEAR(`created`) = y
AND MONTH(`created`) = m
AND `status` = 1
WHERE STR_TO_DATE(CONCAT(y, '-', m, '-01'), '%Y-%m-%d')
>= MAKEDATE(year(now()-interval 1 year),1) + interval 5 month
AND STR_TO_DATE(CONCAT(y, '-', m, '-01'), '%Y-%m-%d')
<= now()
GROUP BY y, m
ORDER BY y, m
How does the above work?
CROSS JOIN creates a cartesian product between all available years and all available months. This is what you want, you want all year-month combinations with no gaps.
LEFT JOIN adds all the qualitaet records to the result (if they exist) and joins them to the year-month cartesian product from before. It is important to put prediactes like the status = 1 predicate here.
COUNT(created) counts only non-NULL values of created, i.e. when the LEFT JOIN produces no rows for any given year-month, we want 0 as a result, not 1, i.e. we don't want to count the NULL value.
A note on performance
The above makes heavy use of string operations and date time arithmetic in your ON and WHERE predicates. This isn't going to perform for lots of data. In that case, you should better pre-truncate and index your year-months in the qualitaet table, and operate only on those values.

Related

MySQL: How to get the Active members by month in year

I have got the previous year working members and subtracted previous year relieving employees, then got the previous month relieving list and subtracted it from the result set. Then added the newly added members in a current month.
SQL Fiddle Link
I am sensing that there lot of improvements we can do to the current query. But right now I am out of ideas, Can someone kindly help on this?

IF I have interpreted your existing query correctly, I suggest the following:
select
mnth.num, count(*)
from (
select 1 AS num union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all
select 7 union all select 8 union all select 9 union all select 10 union all select 11 union all select 12
) mnth
left join (
select
e.emp_id
, case
when e.hired_date < date_format(current_date(), '%Y-01-01') then 1
else month(e.hired_date)
end AS start_month
, case
when es.relieving_date < date_format(current_date(), '%Y-01-01') then 0
when es.relieving_date >= date_format(current_date(), '%Y-01-01') then month(es.relieving_date)
else month(current_date())
end AS end_month
from employee e
left join employee_separation es on e.emp_id = es.emp_id
) emp on mnth.num between emp.start_month and emp.end_month
where mnth.num <= month(current_date())
group by
mnth.num
;
This produced the following result (current_date() on Nov 21 2017
| num | count(*) |
|-----|----------|
| 1 | 6 |
| 2 | 7 |
| 3 | 8 |
| 4 | 9 |
| 5 | 10 |
| 6 | 9 |
| 7 | 10 |
| 8 | 11 |
| 9 | 12 |
| 10 | 13 |
| 11 | 14 |
DEMO
Depending on data volumes adding a where clause in the emp subquery may help, this also affect a case expression:
, case
when es.relieving_date >= date_format(current_date(), '%Y-01-01') then month(es.relieving_date)
else month(current_date())
end AS end_month
from employee e
left join employee_separation es on e.emp_id = es.emp_id
where es.relieving_date >= date_format(current_date(), '%Y-01-01')

I think what you need to do is to get all the employees who are already working from the employee table with:
SELECT * FROM employee WHERE hired_date<= CURRENT_DATE;
Then get the list of employees whose relieving date is still in the future using:
SELECT * FROM employee_separation WHERE relieving_date > CURRENT_DATE;
Then join the two results and group by the month and year of the reliving date as shown below:
SELECT DATE_FORMAT(B.relieving_date, "%Y-%M") RELIEVING_DATE, COUNT(*)
NUMBER_OF_ACTIVE_MEMBERS FROM
(SELECT * FROM employee WHERE hired_date <= CURRENT_DATE) A INNER JOIN
(SELECT * FROM employee_separation WHERE relieving_date > CURRENT_DATE) B
ON A.emp_id=B.emp_id
GROUP BY DATE_FORMAT(B.relieving_date , "%Y-%M");
Here is a Demo on sql fiddle.

Include Missing Dates Group By Date and Running Total

I have a table like below:
transID | date | payee | amount
1 | 7/1/2016 | Balance| 100.00
2 | 7/1/2016 | Bobi | -11.11
3 | 7/4/2016 | Chris | -20.76
4 | 7/7/2016 | Erin | -100.00
5 | 7/11/2016| Tom | -2.11
6 | 7/11/2016| Pay | 500.00
I am having a issue tackling how to group by days, sum, and include the missing date. I am trying to get a select to look like this:
date | balance
7/1/2016 | 88.89
7/2/2016 | 88.89
7/3/2016 | 88.89
7/4/2016 | 68.13
7/5/2016 | 68.13
7/6/2016 | 68.13
7/7/2016 | -31.87
7/8/2016 | -31.87
7/9/2016 | -31.87
7/10/2016 | -31.87
7/11/2016 | 466.02
Basically I am trying to get all the dates between each grouped date and carry the balance. This will operate just like a bank account does from day to day.

Just needed something like this, maybe it's not the right way, but worked for me :)
SELECT
*
FROM
(SELECT
'2016-07-01' + INTERVAL a + b DAY myDate
FROM
(SELECT 0 a UNION SELECT 1 a UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) d, (SELECT 0 b UNION SELECT 10 UNION SELECT 20 UNION SELECT 30 UNION SELECT 40) m
WHERE
'2016-07-01' + INTERVAL a + b DAY < '2016-08-01'
ORDER BY a + b) t1
LEFT JOIN
(SELECT
COUNT(*) AS myCount, DATE(your_date_field) AS myFS
FROM
your_table_that_misses_dates
GROUP BY myFS) t2 ON t1.myDate = t2.myFS order by myDate;
It'll give you something like this:

count the number of rows between intervals

My table is like:
+---------+---------+------------+-----------------------+---------------------+
| visitId | userId | locationId | comments | time |
+---------+---------+------------+-----------------------+---------------------+
| 1 | 3 | 12 | It's a good day here! | 2012-12-12 11:50:12 |
+---------+---------+------------+-----------------------+---------------------+
| 2 | 3 | 23 | very beautiful | 2012-12-12 12:50:12 |
+---------+---------+------------+-----------------------+---------------------+
| 3 | 3 | 52 | nice | 2012-12-12 13:50:12 |
+---------+---------+------------+-----------------------+---------------------+
which records visitors' trajectory and some comments on the places visited.
I want to count the numbers of visitors that visit a specific place (say id=3227) from 0:00 to 23:59, over some interval (ie. 30mins)
I was trying to do this by :
SELECT COUNT(*) FROM visits
GROUP BY HOUR(time), SIGN( MINUTE(time) - 30 )// if they are in the same interval this will yield the same result
WHERE locationId=3227
The problem is that if there is no record that falls in some interval, this will NOT return that interval with count 0. For example, there are no visitors visiting the location from 02:00 to 03:00, this will not give me the intervals of 02:00-02:29 and 02:30-2:59.
I want a result with an exact size of 48 (one for every half hour), how can I do this?

You have to create a table with the 48 rows that you want and use left outer join:
select n.hr, n.hr, coalesce(v.cnt, 0) as cnt
from (select 0 as hr, -1 as sign union all
select 0, 1 union all
select 1, -1 union all
select 1, 1 union all
. . .
select 23, -1 union all
select 23, 1 union all
) left outer join
(SELECT HOUR(time) as hr, SIGN( MINUTE(time) - 30 ) as sign, COUNT(*) as cnt
FROM visits
WHERE locationId=3227
GROUP BY HOUR(time), SIGN( MINUTE(time) - 30 )
) v
on n.hr = v.hr and n.sign = v.sign
order by n.hr, n.hr

MySQL count rows within the same intervals to eachother

I have a table where one column is the date:
+----------+---------------------+
| id | date |
+----------+---------------------+
| 5 | 2012-12-10 10:12:37 |
+----------+---------------------+
| 4 | 2012-12-10 09:09:55 |
+----------+---------------------+
| 3 | 2012-12-09 21:12:35 |
+----------+---------------------+
| 2 | 2012-12-09 20:15:07 |
+----------+---------------------+
| 1 | 2012-12-09 20:01:42 |
+----------+---------------------+
What I need, is to count the rows which are for example whitin 3 hours to each other. In this example I want to join the upper row with the 2nd row, and the 3rd row with the 4th and 5th rows. So my output should be like this:
+----------+---------------------+---------+
| id | date | count |
+----------+---------------------+---------+
| 5 | 2012-12-10 10:12:37 | 2 |
+----------+---------------------+---------+
| 3 | 2012-12-09 21:12:35 | 3 |
+----------+---------------------+---------+
How could I do this?

I think you need a self-join for this:
select t.id, t.date, COUNT(t2.id)
from t left outer join
t t2
on t.date between t2.date - interval 3 hour and t2.date + interval 3 hour
group by t.id, t.date
(This is untested code so it might have a syntax error.)
If you are trying to divide everything into 3-hour intervals, you can do something like:
select max(t.date), t.id, count(*)
from (select t.*,
(date(date)*100 + floor(hour(date)/3)*3) as interval
from t
) t
group by interval

I am not sure how to do this with My SQL but i am able to build a set of queries in SQL Server 2005 which will provide the intended results. Here is the working sample, its very complex and may be overly complex but that's how i was able to get the desired result:
WITH BaseData AS
(
SELECT 5 AS ID, '2012-12-10 10:12:37' AS Date
UNION ALL
SELECT 4 AS ID, '2012-12-10 09:09:55' AS Date
UNION ALL
SELECT 3 AS ID, '2012-12-09 21:12:35' AS Date
UNION ALL
SELECT 2 AS ID, '2012-12-09 20:15:07' AS Date
UNION ALL
SELECT 1 AS ID, '2012-12-09 20:01:42' AS Date
),
BaseDataWithRowNum AS
(
SELECT ID,DATE, ROW_NUMBER() OVER (ORDER BY Date DESC) AS RowNum
FROM BaseData
),
InterRelatedDates AS
(
SELECT B1.RowNum AS RowNum1,B2.RowNum AS RowNum2
FROM BaseDataWithRowNum B1
INNER JOIN BaseDataWithRowNum B2
ON B1.Date BETWEEN B2.Date AND DATEADD(hh,3,B2.Date)
AND B1.RowNum < B2.RowNum
AND B1.ID != B2.ID
),
InterRelatedDatesWithinMultipleGroups AS
(
SELECT G1.RowNum1,G2.RowNum2
FROM InterRelatedDates G1
LEFT JOIN InterRelatedDates G2
ON G1.RowNum2 = G2.RowNum2
AND G1.RowNum1 != G2.RowNum1
)
SELECT BN.ID,
BN.Date,
CountExcludingOriginalGrouppingRecord +1 AS C
FROM
(
SELECT RowNum1 AS RowNum,COUNT(1) AS CountExcludingOriginalGrouppingRecord
FROM
(
-- If a row was used in only one group then it is ok. use as it is
SELECT D1.RowNum1
FROM InterRelatedDatesWithinMultipleGroups AS D1
WHERE D1.RowNum2 IS NULL
UNION ALL
-- In case a row was selected in two groups, choose the one with higher date
SELECT Min(D1.RowNum1)
FROM InterRelatedDatesWithinMultipleGroups AS D1
WHERE D1.RowNum2 IS NOT NULL
GROUP BY D1.RowNum2
) T
GROUP BY RowNum1
) T2
INNER JOIN BaseDataWithRowNum BN
ON BN.RowNum = T2.RowNum

How to include dates with zero messages into the resultset anyway?

I have the following table with messages:
+---------+---------+------------+----------+
| msg_id | user_id | m_date | m_time |
+-------------------+------------+----------+
| 1 | 1 | 2011-01-22 | 06:23:11 |
| 2 | 1 | 2011-01-23 | 16:17:03 |
| 3 | 1 | 2011-01-23 | 17:05:45 |
| 4 | 2 | 2011-01-22 | 23:58:13 |
| 5 | 2 | 2011-01-23 | 23:59:32 |
| 6 | 2 | 2011-01-24 | 21:02:41 |
| 7 | 3 | 2011-01-22 | 13:45:00 |
| 8 | 3 | 2011-01-23 | 13:22:34 |
| 9 | 3 | 2011-01-23 | 18:22:34 |
| 10 | 3 | 2011-01-24 | 02:22:22 |
| 11 | 3 | 2011-01-24 | 13:12:00 |
+---------+---------+------------+----------+
What I want is for each day, to see how many messages each user has sent BEFORE and AFTER 16:00:
SELECT
user_id,
m_date,
SUM(m_time <= '16:00') AS before16,
SUM(m_time > '16:00') AS after16
FROM messages
GROUP BY user_id, m_date
ORDER BY user_id, m_date ASC
This produces:
user_id m_date before16 after16
-------------------------------------
1 2011-01-22 1 0
1 2011-01-23 0 2
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
Because user 1 has written no messages on 2011-01-24, this date is not in the resultset. However, this is undesirable. I have a second table in my database, called "date_range":
+---------+------------+
| date_id | d_date |
+---------+------------+
| 1 | 2011-01-21 |
| 1 | 2011-01-22 |
| 1 | 2011-01-23 |
| 1 | 2011-01-24 |
+---------+------------+
I want to check the "messages" against this table. For each user, all these dates have to be in the resultset. As you can see, none of the users have written messages on 2011-01-21, and as said, user 1 has no messages on 2011-01-24. The desired output of the query would be:
user_id d_date before16 after16
-------------------------------------
1 2011-01-21 0 0
1 2011-01-22 1 0
1 2011-01-23 0 2
1 2011-01-24 0 0
2 2011-01-21 0 0
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-21 0 0
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
How can I link the two tables so that the query result also holds rows with zero values for before16 and after16?
Edit: yes, I have a "users" table:
+---------+------------+
| user_id | user_date |
+---------+------------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+---------+------------+

Test bed:
create table messages (msg_id integer, user_id integer, _date date, _time time);
create table date_range (date_id integer, _date date);
insert into messages values
(1,1,'2011-01-22','06:23:11'),
(2,1,'2011-01-23','16:17:03'),
(3,1,'2011-01-23','17:05:05');
insert into date_range values
(1, '2011-01-21'),
(1, '2011-01-22'),
(1, '2011-01-23'),
(1, '2011-01-24');
Query:
SELECT p._date, p.user_id,
coalesce(m.before16, 0) b16, coalesce(m.after16, 0) a16
FROM
(SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr) p
LEFT JOIN
(SELECT user_id, _date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
ORDER BY user_id, _date ASC) m
ON p.user_id = m.user_id AND p._date = m._date;
EDIT:
Your initial query is left as is, I hope it doesn't requires any explanations;
SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr will return a cartesian or CROSS JOIN of two tables, which will give me all required date range for each user in subject. As I'm interested in each pair only once, I use DISTINCT clause. Try this query with and without it;
Then I use LEFT JOIN on two sub-selects.
This join means: first, INNER join is performed, i.e. all rows with matching fields in the ON condition are returned. Then, for each row in the left-side relation of the join that has no matches on the right side, return NULLs (thus the name, LEFT JOIN, i.e. left relation is always there and right is expected to have NULLs). This join will do what you expect — return user_id + date combinations even if there were no messages in the given date for a given user. Note that I use user_id + date sub-select first (on the left) and messages query second (on the right);
coalesce() is used to replace NULL with zero.
I hope this clarifies how this query works.

Give this a shot:
select u.user_id, u._date,
sum(_time <= '16:00') as before16,
sum(_time > '16:00') as after16
from (
select m.user_id, d._date
from messages m
cross join date_range d
group by m.user_id, d._date
) u
left join messages m on u.user_id=m.user_id
and u._date=m._date
group by u.user_id, u._date
The inner query is just building a set of all possible/desired user-date pairs. It would be more efficient to use a users table, but you didn't mention that you had one, so I won't assume. otherwise, you just need the left join to not remove the non-joined records.
EDIT
--More detailed explanation: taking the query apart.
Start with the innermost query; the goal is to get a list of all desired dates for every user. Since there's a table of users and a table of dates it can look like this:
select distinct u.user_id, d.d_date
from users u
cross join date_range d
The key here is the cross join, taking every row in the users table and associating it with every row in the date_range table. The distinct keyword is really just a shorthand for a group by on all columns, and is here just in case there's duplicated data.
Note that there are several other methods of getting this same result set (like in my original query), but this is probably the simplest from both a logical and computational standpoint.
Really, the only other steps are to add the left join (associating all of the rows we got above to all available data, and not removing anything that doesn't have any data) and the group by and select components which are basically the same as you had before. So, putting everything together it looks like this:
select t.user_id, t.d_date,
sum(m.m_time <= '16:00') as before16,
sum(m.m_time > '16:00') as after16
from (
select distinct u.user_id, d.d_date
from users u
cross join date_range d
) t
left join messages m on t.user_id = m.user_id
and t.d_date = m.m_date
group by t.user_id, t.d_date
Based on some other comments/questions, note the explicit use of prefixes for all uses of all tables and sub-queries (which is pretty straight forward since we're not using any table more than once anymore): u for the users table, d for the date_range table, t for the sub-query containing the dates to use for each user, and m for the message table. This is probably where my first explanation fell a little short, since I used the message table twice, both times with the same prefix. It works there because of the context of both uses (one was in a sub-query), but it probably isn't the best practice.

It is not neat. But if you have a user table. Then maybe something like this:
SELECT
user_id,
_date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
UNION
SELECT
user_id,
date_range,
0 AS before16,
0 AS after16
FROM
users,
date_range
ORDER BY user_id, _date ASC

chezy525's solution works great, I ported it to postgresql and removed/renamed some aliases:
select users_and_dates.user_id, users_and_dates._date,
SUM(case when _time <= '16:00' then 1 else 0 end) as before16,
SUM(case when _time > '16:00' then 1 else 0 end) as after16
from (
select messages.user_id, date_range._date
from messages
cross join date_range
group by messages.user_id, date_range._date
) users_and_dates
left join messages on users_and_dates.user_id=messages.user_id
and users_and_dates._date=messages._date
group by users_and_dates.user_id, users_and_dates._date;
and ran on my machine, worked perfectly

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008

MySQL - count by month (including missing records) - mysql

Related

MySQL: How to get the Active members by month in year

Include Missing Dates Group By Date and Running Total

count the number of rows between intervals

MySQL count rows within the same intervals to eachother

How to include dates with zero messages into the resultset anyway?

Categories

Resources