Grouping by 2 columns - MySQL

I have a query that shows how many messages were sent through my system over the last year, grouped by month. It works perfectly!
The result looks like this:
+------+-------+--------+--------+--------+
| Year | Month | Type 1 | Type 2 | Type 3 |
+------+-------+--------+--------+--------+
| 2013 | 10 | 0 | 2 | 3 |
| 2013 | 11 | 4 | 21 | 56 |
| 2013 | 12 | 1 | 10 | 16 |
| 2014 | 1 | 2 | 10 | 52 |
| 2014 | 2 | 1 | 62 | 118 |
+------+-------+--------+--------+--------+
(Types 1, 2 and 3 are simply different types of users; ignore this.)
However, I'd like to avoid counting the same receiver (msg_receiver) more than once per month in the result set.
So if users 44 and 39 each send a message to user 70 in December, user_id 70 should be counted only ONCE for December. Currently, he is counted twice.
Below is my query:
SELECT
Year(m.msg_date) as year,
Month(m.msg_date) as month,
sum(u.type = '1') as type_1,
Sum(u.type = '2') as type_2,
sum(u.type = '7') as type_3
FROM
messages m
INNER JOIN
users u ON u.user_id = m.msg_sender
WHERE
m.msg_date >= CURDATE() - INTERVAL 1 YEAR
AND month(msg_date) != month(curdate())
GROUP BY
Month(m.msg_date) -- , m.msg_receiver (this does not work, it will no longer group by each month/year).
ORDER BY
msg_date
The logical answer to this would, in my opinion, be to group first by month and then by user_id (or vice versa). But if I do this, the results look strange. See:
Using GROUP BY Month(m.msg_date), u.user_id
+------+-------+--------+--------+--------+
| Year | Month | Type 1 | Type 2 | Type 3 |
+------+-------+--------+--------+--------+
| 2013 | 10 | 0 | 1 | 0 |
| 2013 | 10 | 0 | 0 | 1 |
| 2013 | 10 | 0 | 0 | 1 |
| 2013 | 10 | 0 | 1 | 0 |
| 2013 | 10 | 0 | 0 | 1 |
| 2013 | 11 | 0 | 0 | 19 |
| 2013 | 11 | 0 | 1 | 0 |
| 2013 | 11 | 0 | 1 | 0 |
| 2013 | 11 | 0 | 1 | 0 |
| 2013 | 11 | 0 | 1 | 0 |
| 2013 | 11 | 2 | 0 | 0 |
| 2013 | 11 | 0 | 0 | 11 |
+------+-------+--------+--------+--------+
It does not group by months anymore, as it should.
Any ideas?
EDIT
Just to clarify what exactly I want to achieve, as people have been a bit confused. Imagine this scenario:
It is December 2013.
USER 1 has written 5 messages to USER 2 (this should count as 1 in december)
USER 4 has written 1 message to USER 4 (this should count as 1 in december)
USER 3 has written 2 messages to USER 4 and 2 (this should count as 2 in december).
The total for the month would then be 4, because there have been 4 conversations.
Does that make sense? I often struggle to express myself correctly and understandably.

You can use COUNT(DISTINCT ...) to count each msg_receiver only once per type:
SELECT
Year(m.msg_date) as year,
Month(m.msg_date) as month,
COUNT(DISTINCT CASE WHEN u.type = '1' THEN m.msg_receiver END) as type_1,
COUNT(DISTINCT CASE WHEN u.type = '2' THEN m.msg_receiver END) as type_2,
COUNT(DISTINCT CASE WHEN u.type = '7' THEN m.msg_receiver END) as type_3
FROM
messages m
INNER JOIN
users u ON u.user_id = m.msg_sender
WHERE
m.msg_date >= CURDATE() - INTERVAL 1 YEAR
AND month(msg_date) != month(curdate())
GROUP BY
Year(m.msg_date), Month(m.msg_date)
ORDER BY
Year(m.msg_date), Month(m.msg_date)
N.B. I have added Year(m.msg_date) to your GROUP BY to ensure the results are deterministic.
If the same user receives messages from two different users that have two different types, they will still be counted in both types. If this is not the intended result, you would need to come up with some logic for deciding which type they should be counted in (min, max, mode, median, etc.).
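To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The four users and three messages are invented for illustration, and strftime replaces YEAR()/MONTH() because SQLite lacks them. It compares the original SUM with COUNT(DISTINCT CASE ...) and also shows the same receiver being counted under two types.

```python
import sqlite3

# Hypothetical miniature of the messages/users tables (SQLite standing in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE messages (msg_sender INTEGER, msg_receiver INTEGER, msg_date TEXT);
    INSERT INTO users VALUES (39, '1'), (44, '1'), (50, '2'), (70, '7');
    INSERT INTO messages VALUES
        (39, 70, '2013-12-01'),  -- type-1 sender to user 70
        (44, 70, '2013-12-05'),  -- another type-1 sender to user 70
        (50, 70, '2013-12-09');  -- type-2 sender to user 70
""")

row = conn.execute("""
    SELECT SUM(u.type = '1') AS sum_type_1,
           COUNT(DISTINCT CASE WHEN u.type = '1' THEN m.msg_receiver END) AS type_1,
           COUNT(DISTINCT CASE WHEN u.type = '2' THEN m.msg_receiver END) AS type_2
    FROM messages m
    JOIN users u ON u.user_id = m.msg_sender
    GROUP BY strftime('%Y', m.msg_date), strftime('%m', m.msg_date)
""").fetchone()

# SUM counts both type-1 messages; COUNT(DISTINCT ...) counts receiver 70 once per type.
print(row)  # (2, 1, 1)
```

Receiver 70 appears once under type_1 and once again under type_2, which is the double counting across types described above.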
If, for example, you wanted the minimum user type, you could use:
SELECT
m.year,
m.month,
sum(m.type = '1') as type_1,
Sum(m.type = '2') as type_2,
sum(m.type = '7') as type_3
FROM (
SELECT
Year(m.msg_date) as year,
Month(m.msg_date) as month,
m.msg_receiver,
MIN(u.type) AS type
FROM
messages m
INNER JOIN
users u ON u.user_id = m.msg_sender
WHERE
m.msg_date >= CURDATE() - INTERVAL 1 YEAR
AND month(msg_date) != month(curdate())
GROUP BY
Year(m.msg_date), Month(m.msg_date), m.msg_receiver
) m
GROUP BY
m.Year, m.Month
ORDER BY
m.year, m.month;
EDIT
In response to your updated question: in its current form, my first answer would count your example as only 3 conversations, not 4, as there were only 3 unique recipients. What you really need is to count distinct combinations of sender and receiver, i.e. COUNT(DISTINCT m.msg_sender, m.msg_receiver). MySQL accepts this multi-expression form as an extension, but it is not standard SQL. A portable way to achieve essentially the same thing is to concatenate the two fields, as long as they are separated by a character (or characters) that cannot appear in either, e.g.:
SELECT
Year(m.msg_date) as year,
Month(m.msg_date) as month,
COUNT(DISTINCT CASE WHEN u.type = '1' THEN CONCAT(m.msg_sender, '|', m.msg_receiver) END) as type_1,
COUNT(DISTINCT CASE WHEN u.type = '2' THEN CONCAT(m.msg_sender, '|', m.msg_receiver) END) as type_2,
COUNT(DISTINCT CASE WHEN u.type = '7' THEN CONCAT(m.msg_sender, '|', m.msg_receiver) END) as type_3
FROM
messages m
INNER JOIN
users u ON u.user_id = m.msg_sender
WHERE
m.msg_date >= CURDATE() - INTERVAL 1 YEAR
AND month(msg_date) != month(curdate())
GROUP BY
Year(m.msg_date), Month(m.msg_date)
ORDER BY
Year(m.msg_date), Month(m.msg_date)
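As a quick check of the concatenation idea, the sketch below loads the December scenario from the question's EDIT into an in-memory SQLite database (an assumption made so the example is runnable; SQLite's || operator stands in for MySQL's CONCAT). It also shows why the separator matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (msg_sender INTEGER, msg_receiver INTEGER);
    -- The December scenario from the question's EDIT.
    INSERT INTO messages VALUES
        (1, 2), (1, 2), (1, 2), (1, 2), (1, 2),  -- user 1 -> user 2, five messages
        (4, 4),                                  -- user 4 -> user 4, one message
        (3, 4), (3, 2);                          -- user 3 -> users 4 and 2
""")

# Count distinct (sender, receiver) pairs by concatenating with a separator.
(pairs,) = conn.execute("""
    SELECT COUNT(DISTINCT msg_sender || '|' || msg_receiver) FROM messages
""").fetchone()
print(pairs)  # 4

# Without a separator, (1, 23) and (12, 3) both collapse to '123'.
no_sep, with_sep = conn.execute("""
    SELECT COUNT(DISTINCT a || b), COUNT(DISTINCT a || '|' || b)
    FROM (SELECT 1 AS a, 23 AS b UNION ALL SELECT 12, 3)
""").fetchone()
print(no_sep, with_sep)  # 1 2
```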

You haven't posted a data structure, but it appears that you want to change the INNER JOIN to:
INNER JOIN
users u ON u.user_id = m.msg_receiver

Related

Query based on dates

I've got the following table/data (example)
Users
user_id | email
1 | asd#asd.com
2 | asd2#asd.com
3 | asd3#asd.com
4 | asd4#asd.com
5 | asd5#asd.com
Scheduled_Jobs
job_id | user_id | date
1 | 1 | 05/09/2019
2 | 1 | 05/10/2019
3 | 1 | 05/11/2019
4 | 1 | 05/12/2019
5 | 2 | 07/10/2019
6 | 2 | 07/11/2019
7 | 2 | 07/12/2019
8 | 3 | 11/07/2019
9 | 4 | 13/10/2019
10 | 4 | 13/11/2019
11 | 5 | 10/10/2019
12 | 5 | 10/11/2019
13 | 5 | 10/12/2019
Last_Update
update_id | job_id
1 | 1
2 | 2
3 | 3
4 | 5
5 | 9
6 | 11
When a user is created, a list of scheduled jobs is created too. When a user completes a job, the Last_Update table is updated.
I'm trying to show the number of users who have unfinished jobs, bucketed by how long the job is overdue. For example, 1-30 days delay: x users, 31-60 days delay: y users, etc.
Based on the example above here would be the expected result:
Number of users with no delayed jobs: 2 (users 1 & 4)
1-30 days delay: 2 (users 2 & 5)
31-60 days delay: 0
Over 60 days delay: 1 (user 3)
I'm currently only showing the users that have no delayed jobs:
SELECT u.user_id
FROM users u
LEFT JOIN (
SELECT j.user_id AS completed
FROM jobs j
LEFT JOIN last_update lu
ON lu.job_id = j.job_id
WHERE j.job_date <= CURDATE()
AND lu.update_id IS NULL
) AS cj
ON u.user_id = cj.completed
WHERE cj.completed IS NULL
You can first join the three tables, aggregate by user_id and compute, for each user:
how many unfinished jobs they have
how many unfinished jobs they have within the last 30 days
how many unfinished jobs they have within the last 31-60 days
Then, you can add another level of aggregation and count how many users meet each criterion.
Query:
select
sum(cnt_jobs_unfinished = 0) cnt_users_no_unfinished_jobs,
sum(cnt_jobs_unfinished_30d > 0) cnt_users_unfinished_30d,
sum(cnt_jobs_unfinished_31_60d > 0) cnt_users_unfinished_31_60d
from (
select
u.user_id,
sum(l.job_id is null) cnt_jobs_unfinished,
sum(
l.job_id is null
and j.date >= curdate() - interval 30 day
) cnt_jobs_unfinished_30d,
sum(
l.job_id is null
and j.date < curdate() - interval 30 day
and j.date >= curdate() - interval 60 day
) cnt_jobs_unfinished_31_60d
from users u
inner join scheduled_jobs j
on j.date <= curdate()
and j.user_id = u.user_id
left join last_update l
on l.job_id = j.job_id
group by u.user_id
) t
Demo on DB Fiddle
cnt_users_no_unfinished_jobs | cnt_users_unfinished_30d | cnt_users_unfinished_31_60d
---------------------------: | -----------------------: | --------------------------:
2 | 2 | 1
Note: I had to modify your sample data so that job 8, for user 3, has a date within 30-60 days, as this was not the case in your original data.
You can run the subquery independently to see what it returns:
user_id | cnt_jobs_unfinished | cnt_jobs_unfinished_30d | cnt_jobs_unfinished_31_60d
------: | ------------------: | ----------------------: | -------------------------:
1 | 0 | 0 | 0
2 | 1 | 1 | 0
3 | 1 | 0 | 1
4 | 0 | 0 | 0
5 | 1 | 1 | 0
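The two-level aggregation can also be tried against the question's original sample data. The sketch below is an assumption-laden illustration rather than the answer's exact query: it runs on SQLite through Python's sqlite3, fixes "today" at 2019-11-12 (a date chosen because it reproduces the question's expected buckets), stores dates in ISO format, and adds a hypothetical over_60 column for the "over 60 days" bucket.

```python
import sqlite3

TODAY = "2019-11-12"  # assumed "current date"; fixed so the result is reproducible

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE scheduled_jobs (job_id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
    CREATE TABLE last_update (update_id INTEGER PRIMARY KEY, job_id INTEGER);
    INSERT INTO users VALUES (1), (2), (3), (4), (5);
    INSERT INTO scheduled_jobs VALUES
        (1, 1, '2019-09-05'), (2, 1, '2019-10-05'), (3, 1, '2019-11-05'), (4, 1, '2019-12-05'),
        (5, 2, '2019-10-07'), (6, 2, '2019-11-07'), (7, 2, '2019-12-07'),
        (8, 3, '2019-07-11'),
        (9, 4, '2019-10-13'), (10, 4, '2019-11-13'),
        (11, 5, '2019-10-10'), (12, 5, '2019-11-10'), (13, 5, '2019-12-10');
    INSERT INTO last_update VALUES (1, 1), (2, 2), (3, 3), (4, 5), (5, 9), (6, 11);
""")

buckets = conn.execute("""
    SELECT SUM(unfinished = 0), SUM(d30 > 0), SUM(d31_60 > 0), SUM(over_60 > 0)
    FROM (
        -- Inner level: one row per user with counts of unfinished jobs per delay bucket.
        SELECT u.user_id,
               SUM(l.job_id IS NULL) AS unfinished,
               SUM(l.job_id IS NULL AND j.date >= date(:today, '-30 days')) AS d30,
               SUM(l.job_id IS NULL AND j.date <  date(:today, '-30 days')
                                    AND j.date >= date(:today, '-60 days')) AS d31_60,
               SUM(l.job_id IS NULL AND j.date <  date(:today, '-60 days')) AS over_60
        FROM users u
        JOIN scheduled_jobs j ON j.user_id = u.user_id AND j.date <= :today
        LEFT JOIN last_update l ON l.job_id = j.job_id
        GROUP BY u.user_id
    ) t
""", {"today": TODAY}).fetchone()

# Outer level: number of users per bucket (no delay, 1-30, 31-60, over 60).
print(buckets)  # (2, 2, 0, 1)
```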

Select count date from multiple where conditions for each day

I have a lot of coupons. For each day, I would like to find out how many coupons from each campaign have been received by users. But I cannot work out what to compare assigned_date against in each subselect so that it refers to the current row's date:
SELECT count(id) as number_of_coupons,
DATE_FORMAT(assigned_date, '%d-%m-%Y') as date,
(SELECT COUNT(id) FROM coupon WHERE campaign_id = 1 AND assigned_date=THIS MUST BE SOMETHING) as campaign_1,
(SELECT COUNT(id) FROM coupon WHERE campaign_id = 2 AND assigned_date=THIS MUST BE SOMETHING) as campaign_2
FROM coupon
GROUP BY DATE_FORMAT(assigned_date, '%d-%m-%Y')
order by STR_TO_DATE(date, '%d-%m-%Y') DESC
The result should look something like this. How can I achieve it?
+-------------------+------------+-------------+-----------+
| number of coupons | date | campaign_1 | campaign2 |
+-------------------+------------+-------------+-----------+
| 156 | 12-10-2019 | 6980 | 100 |
| 177 | 11-10-2019 | 6980 | 100 |
| 44 | 10-10-2019 | 6980 | 100 |
| 94 | 09-10-2019 | 6980 | 100 |
| 93 | 08-10-2019 | 6980 | 100 |
+-------------------+------------+-------------+-----------+
Not knowing what your data structure looks like, I can only speculate on what the solution should be. However, my guess is that a query such as the following is what you want:
SELECT COUNT(DISTINCT cv.id) as number_of_coupons,
DATE_FORMAT(cv.assigned_date, '%d-%m-%Y') as date,
SUM(CASE WHEN c.campaign_id = 1 THEN 1 ELSE 0 END) as campaign_1,
SUM(CASE WHEN c.campaign_id = 2 THEN 1 ELSE 0 END) as campaign_2
FROM coupon_vault cv LEFT JOIN
coupons c
ON cv.coupon_id = c.coupon_id
GROUP BY DATE_FORMAT(cv.assigned_date, '%d-%m-%Y')
ORDER BY MIN(cv.assigned_date);
It is quite possible that the COUNT(DISTINCT) is unnecessary and COUNT() would suffice.
Both Postgres and MySQL (your original tag) have reasonable alternatives for the SUM(CASE . . .). However, you have not specified your database, so I'm sticking with the code that works in both databases.
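For a runnable illustration of the SUM(CASE ...) pattern, here is a small sketch on SQLite via Python's sqlite3. It assumes the single coupon table from the question (id, campaign_id, assigned_date) rather than the two-table layout guessed at above, and the five rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: one coupon table, as in the question's own query.
    CREATE TABLE coupon (id INTEGER PRIMARY KEY, campaign_id INTEGER, assigned_date TEXT);
    INSERT INTO coupon (campaign_id, assigned_date) VALUES
        (1, '2019-10-12 09:00:00'), (1, '2019-10-12 17:30:00'), (2, '2019-10-12 10:15:00'),
        (1, '2019-10-11 08:00:00'), (3, '2019-10-11 12:00:00');
""")

rows = conn.execute("""
    SELECT strftime('%d-%m-%Y', assigned_date) AS day,
           COUNT(id) AS number_of_coupons,
           SUM(CASE WHEN campaign_id = 1 THEN 1 ELSE 0 END) AS campaign_1,
           SUM(CASE WHEN campaign_id = 2 THEN 1 ELSE 0 END) AS campaign_2
    FROM coupon
    GROUP BY date(assigned_date)       -- one group per calendar day
    ORDER BY date(assigned_date) DESC  -- newest day first
""").fetchall()

for r in rows:
    print(r)
# ('12-10-2019', 3, 2, 1)
# ('11-10-2019', 2, 1, 0)
```

Because each CASE is evaluated inside the day's group, no correlated subquery on assigned_date is needed.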

Join two table and count, avoid zero if record is not available in second table

I have following tables products and tests.
select id,pname from products;
+----+---------+
| id | pname |
+----+---------+
| 1 | prd1 |
| 2 | prd2 |
| 3 | prd3 |
| 4 | prd4 |
+----+---------+
select pname,testrunid,testresult,time from tests;
+--------+-----------+------------+-------------+
| pname | testrunid | testresult | time |
+--------+-----------+------------+-------------+
| prd1 | 800 | PASS | 2017-10-02 |
| prd1 | 801 | FAIL | 2017-10-16 |
| prd1 | 802 | PASS | 2017-10-02 |
| prd1 | 803 | NULL | 2017-10-16 |
| prd1 | 804 | PASS | 2017-10-16 |
| prd1 | 805 | PASS | 2017-10-16 |
| prd1 | 806 | PASS | 2017-10-16 |
+--------+-----------+------------+-------------+
I would like to count test results per product and, if no result is available for a product, just show a zero for it. Something like the following table:
+--------+------------+-----------+----------------+---------------+
| pname | total_pass | total_fail| pass_lastweek | fail_lastweek |
+--------+------------+-----------+----------------+---------------+
| prd1 | 5 | 1 | 3 | 1 |
| prd2 | 0 | 0 | 0 | 0 |
| prd3 | 0 | 0 | 0 | 0 |
| prd4 | 0 | 0 | 0 | 0 |
+--------+------------+-----------+----------------+---------------+
I have tried different queries like following, which is just working for one product and is incomplete:
SELECT pname, count(*) as pass_lastweek FROM tests where testresult = 'PASS' AND time
>= '2017-10-11' and pname in (select pname from products) group by pname;
+-------------+---------------+
| pname | pass_lastweek |
+-------------+---------------+
| prd1 | 3 |
+-------------+---------------+
It looks so basic, but I am still unable to write it. Any idea?
Use conditional aggregation. COUNT ignores NULL values and returns 0 when nothing matches, so products without test results need no special handling.
select p.pname,
count(case when testresult = 'PASS' then 1 end) as total_pass,
count(case when testresult = 'FAIL' then 1 end) as total_fail,
count(case when testresult = 'PASS' and time >= curdate() - INTERVAL 6 DAY then 1 end) as pass_lastweek ,
count(case when testresult = 'FAIL' and time >= curdate() - INTERVAL 6 DAY then 1 end) as fail_lastweek
from products p
left join tests t on t.pname = p.pname
group by p.id, p.pname
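The sketch below runs this pattern on the question's sample data, using SQLite through Python's sqlite3 as a stand-in for MySQL. The fixed cutoff 2017-10-11 is an assumption borrowed from the question's own attempt, replacing curdate() so the output is reproducible.

```python
import sqlite3

CUTOFF = "2017-10-11"  # assumed start of "last week", taken from the question's attempt

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE tests (pname TEXT, testrunid INTEGER, testresult TEXT, time TEXT);
    INSERT INTO products VALUES (1, 'prd1'), (2, 'prd2'), (3, 'prd3'), (4, 'prd4');
    INSERT INTO tests VALUES
        ('prd1', 800, 'PASS', '2017-10-02'), ('prd1', 801, 'FAIL', '2017-10-16'),
        ('prd1', 802, 'PASS', '2017-10-02'), ('prd1', 803, NULL,   '2017-10-16'),
        ('prd1', 804, 'PASS', '2017-10-16'), ('prd1', 805, 'PASS', '2017-10-16'),
        ('prd1', 806, 'PASS', '2017-10-16');
""")

rows = conn.execute("""
    SELECT p.pname,
           COUNT(CASE WHEN t.testresult = 'PASS' THEN 1 END) AS total_pass,
           COUNT(CASE WHEN t.testresult = 'FAIL' THEN 1 END) AS total_fail,
           COUNT(CASE WHEN t.testresult = 'PASS' AND t.time >= :cutoff THEN 1 END) AS pass_lastweek,
           COUNT(CASE WHEN t.testresult = 'FAIL' AND t.time >= :cutoff THEN 1 END) AS fail_lastweek
    FROM products p
    LEFT JOIN tests t ON t.pname = p.pname  -- keeps products that have no tests
    GROUP BY p.id, p.pname
    ORDER BY p.id
""", {"cutoff": CUTOFF}).fetchall()

for r in rows:
    print(r)
# ('prd1', 5, 1, 3, 1)
# ('prd2', 0, 0, 0, 0)
# ('prd3', 0, 0, 0, 0)
# ('prd4', 0, 0, 0, 0)
```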
Generally, you need to LEFT JOIN the first table with the second one before you group. The join will give you a row for each product (even if there are no test results to join it to; INNER JOIN would exclude products with no associated tests) + an additional row for each test result (beyond the first). Then you can group them.
SELECT products.*, tests.* FROM products
LEFT JOIN tests ON products.pname = tests.pname
GROUP BY products.id
Also, I would strongly recommend using a product_id column in the tests table, rather than using pname (if a products.pname changes, your whole DB breaks unless you also update the pname field in kind for every test result). The general query would then look like this:
SELECT products.*, tests.* FROM products
LEFT JOIN tests ON products.id = tests.product_id
GROUP BY products.id
I used 2 queries: the first does the conditional count and the second changes all NULL values into 0:
select pname,
case when total_pass is null then 0 else total_pass end as total_pass,
case when total_fail is null then 0 else total_fail end as total_fail,
case when pass_lastweek is null then 0 else pass_lastweek end as pass_lastweek,
case when fail_lastweek is null then 0 else fail_lastweek end as fail_lastweek from (
select products.pname,
count(case when testresult = 'PASS' then 1 end) as total_pass,
count(case when testresult = 'FAIL' then 1 end) as total_fail,
count(case when testresult = 'PASS' and time >= current_date - interval 7 day then 1 end) as pass_lastweek,
count(case when testresult = 'FAIL' and time >= current_date - interval 7 day then 1 end) as fail_lastweek
from products
left join tests on tests.pname = products.pname
group by 1 ) t1

How to write this query MYSQL

I have this database:
| id | name | email | control_number | created |
|:--:|-------|-----------------|----------------|------------|
| 1 | john | john#gmail.com | 1 | 14/09/2016 |
| 2 | carl | carl#gmail.com | 1 | 13/08/2016 |
| 3 | frank | frank#gmail.com | 2 | 12/08/2016 |
I want to get the COUNT for the last 12 months by control_number.
Basically, it is a COUNT where control_number = 1, but per month.
So if the query is run today (it is September), it should go from September 2016 back to October 2015 and display the count of records for each month.
Result should be:
09/2016 = 50
08/2016 = 35
07/2016 = 20
06/2016 = 50
05/2016 = 21
04/2016 = 33
03/2016 = 60
02/2016 = 36
01/2016 = 11
12/2015 = 0
11/2015 = 0
10/2015 = 0
Hmmm. Getting the 0 values can be tricky. Assuming that you have some data each month (even if not for "1"), then you can do:
select extract(year_month from created) as yyyymm,
sum(control_number = 1)
from t
where created >= date_sub(curdate(), interval 12 month)
group by extract(year_month from created)
order by yyyymm;
If you don't have at least one record for each month, then you'll need a left join and a table with one row per month.
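A minimal sketch of that left-join approach follows, using SQLite via Python's sqlite3 as a stand-in for MySQL. The months table, the last_12_months helper, and the fixed end month (September 2016) are all hypothetical names and values introduced for the illustration.

```python
import sqlite3

def last_12_months(year, month):
    """Return 'YYYY-MM' strings for the 12 months ending at (year, month), newest first."""
    out = []
    for _ in range(12):
        out.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return out

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, control_number INTEGER, created TEXT);
    INSERT INTO t (control_number, created) VALUES
        (1, '2016-09-14'), (1, '2016-08-13'), (2, '2016-08-12');
    CREATE TABLE months (ym TEXT PRIMARY KEY);  -- hypothetical calendar table
""")
conn.executemany("INSERT INTO months VALUES (?)",
                 [(m,) for m in last_12_months(2016, 9)])

rows = conn.execute("""
    SELECT m.ym, COUNT(t.id) AS cnt
    FROM months m
    LEFT JOIN t ON strftime('%Y-%m', t.created) = m.ym
               AND t.control_number = 1  -- filter in ON, not WHERE, to keep empty months
    GROUP BY m.ym
    ORDER BY m.ym DESC
""").fetchall()

for r in rows:
    print(r)  # starts with ('2016-09', 1), ('2016-08', 1), then zero-count months
```

Putting the control_number condition in the ON clause rather than in WHERE is what preserves the months with no matching rows; COUNT(t.id) then yields 0 for them.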
Try this:
select CONCAT(SUBSTRING(ym, 5, 2), '/', SUBSTRING(ym, 1, 4)) Month, Count from (
select EXTRACT(YEAR_MONTH FROM created) ym, count(*) Count
from mytable
where EXTRACT(YEAR_MONTH FROM created) > EXTRACT(YEAR_MONTH FROM SUBDATE(NOW(), INTERVAL 1 YEAR))
group by 1
order by 1 desc) x
Try:
select concat(month(created),'/',year(created)) as period, count(*) as cnt
from mytable
where control_number=1 and TIMESTAMPDIFF(year, created, now())=0
group by (month(created));

SELECT total votes for new & returning users in mysql

I have 2 tables: votes and users. I would like to show the count (not sum) of votes by newly registered users per date, next to the total votes of that day.
The time frame for the query can be everything; I can add a from-to range later.
So if I have 10 votes on date X, I would like to know how many were generated by new users and how many by returning users, as well as how many new and returning users voted (* also including new users who didn't vote).
votes:
id | vote | user_id | created_at
1 | 30 | 28 | 2012-06-10
1 | 12 | 15 | 2012-06-10
1 | 30 | 28 | 2012-06-10
...
users:
user_id | created_at
28 | 2012-06-01
29 | 2012-06-03
30 | 2012-06-10
...
And the result I'm looking to get is this:
Date | total votes | votes for new | votes for returning | total new users | total returning users
thanks!
----- current code:
SELECT
DATE(created_at) AS Date,
SUM(CASE WHEN `Type` = 'Votes' THEN 1 ELSE 0 END) AS 'Total Votes',
SUM(CASE WHEN `Type` = 'Users' THEN 1 ELSE 0 END) AS 'Total Users'
FROM
(
SELECT created_at, 'Votes' `Type` FROM votes
UNION ALL
SELECT created_at, 'Users' FROM users
) t
GROUP by DATE(created_at)
ORDER by DATE(created_at) DESC
I would consider breaking this up into several queries. If I understood you correctly, this should be what you are looking for.
All votes per date:
SELECT created_at AS date, COUNT(*) AS totalvotes FROM votes GROUP BY created_at;
All votes by new users per date:
SELECT v.created_at AS date, COUNT(*) AS newuservotes FROM votes v INNER JOIN users u WHERE v.user_id = u.user_id AND v.created_at = u.created_at GROUP BY v.created_at;
All votes by old users per date:
The difference between the two numbers above.
If you want the entire thing in one query:
SELECT totvotes.date AS date, totvotes.totalvotes AS totalvotes, IFNULL(newvotes.newuservotes, 0) AS newuservotes, totvotes.totalvotes-IFNULL(newvotes.newuservotes, 0) AS olduservotes
FROM
(
SELECT created_at AS date, COUNT(*) AS totalvotes FROM votes GROUP BY created_at
) totvotes
LEFT JOIN
(
SELECT v.created_at AS date, COUNT(*) AS newuservotes FROM votes v INNER JOIN users u WHERE v.user_id = u.user_id AND v.created_at = u.created_at GROUP BY v.created_at
) newvotes ON totvotes.date = newvotes.date
Tested with
mysql> select * from users;
+---------+------------+
| user_id | created_at |
+---------+------------+
| 1 | 2012-06-01 |
| 2 | 2012-06-02 |
| 3 | 2012-06-03 |
+---------+------------+
3 rows in set (0.00 sec)
mysql> select * from votes;
+----+------+---------+------------+
| id | vote | user_id | created_at |
+----+------+---------+------------+
| 1 | 10 | 1 | 2012-06-01 |
| 2 | 20 | 1 | 2012-06-10 |
| 3 | 30 | 2 | 2012-06-02 |
| 4 | 40 | 2 | 2012-06-10 |
+----+------+---------+------------+
4 rows in set (0.00 sec)
Result:
+------------+------------+--------------+--------------+
| date | totalvotes | newuservotes | olduservotes |
+------------+------------+--------------+--------------+
| 2012-06-01 | 1 | 1 | 0 |
| 2012-06-02 | 1 | 1 | 0 |
| 2012-06-10 | 2 | 0 | 2 |
+------------+------------+--------------+--------------+
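For comparison, here is a sketch of a single-pass variant that uses conditional aggregation instead of two derived tables, run on the same test data with SQLite via Python's sqlite3 as a stand-in for MySQL. It assumes every vote has a matching row in users, since the inner join would otherwise drop that vote from the totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, created_at TEXT);
    CREATE TABLE votes (id INTEGER PRIMARY KEY, vote INTEGER, user_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES (1, '2012-06-01'), (2, '2012-06-02'), (3, '2012-06-03');
    INSERT INTO votes VALUES
        (1, 10, 1, '2012-06-01'), (2, 20, 1, '2012-06-10'),
        (3, 30, 2, '2012-06-02'), (4, 40, 2, '2012-06-10');
""")

rows = conn.execute("""
    SELECT v.created_at AS date,
           COUNT(*) AS totalvotes,
           SUM(v.created_at =  u.created_at) AS newuservotes,  -- voted on signup day
           SUM(v.created_at <> u.created_at) AS olduservotes   -- voted on a later day
    FROM votes v
    JOIN users u ON u.user_id = v.user_id
    GROUP BY v.created_at
    ORDER BY v.created_at
""").fetchall()

for r in rows:
    print(r)
# ('2012-06-01', 1, 1, 0)
# ('2012-06-02', 1, 1, 0)
# ('2012-06-10', 2, 0, 2)
```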
Something similar could be used for "never voted" users and so forth.
I wasn't sure how well versed you are in SQL, so if there are any questions, please comment and I will update the answer.