How to include dates with zero messages into the resultset anyway? - mysql

I have the following table with messages:
+---------+---------+------------+----------+
| msg_id | user_id | m_date | m_time |
+---------+---------+------------+----------+
| 1 | 1 | 2011-01-22 | 06:23:11 |
| 2 | 1 | 2011-01-23 | 16:17:03 |
| 3 | 1 | 2011-01-23 | 17:05:45 |
| 4 | 2 | 2011-01-22 | 23:58:13 |
| 5 | 2 | 2011-01-23 | 23:59:32 |
| 6 | 2 | 2011-01-24 | 21:02:41 |
| 7 | 3 | 2011-01-22 | 13:45:00 |
| 8 | 3 | 2011-01-23 | 13:22:34 |
| 9 | 3 | 2011-01-23 | 18:22:34 |
| 10 | 3 | 2011-01-24 | 02:22:22 |
| 11 | 3 | 2011-01-24 | 13:12:00 |
+---------+---------+------------+----------+
What I want is for each day, to see how many messages each user has sent BEFORE and AFTER 16:00:
SELECT
user_id,
m_date,
SUM(m_time <= '16:00') AS before16,
SUM(m_time > '16:00') AS after16
FROM messages
GROUP BY user_id, m_date
ORDER BY user_id, m_date ASC
This produces:
user_id m_date before16 after16
-------------------------------------
1 2011-01-22 1 0
1 2011-01-23 0 2
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
Because user 1 has written no messages on 2011-01-24, this date is not in the resultset. However, this is undesirable. I have a second table in my database, called "date_range":
+---------+------------+
| date_id | d_date |
+---------+------------+
| 1 | 2011-01-21 |
| 1 | 2011-01-22 |
| 1 | 2011-01-23 |
| 1 | 2011-01-24 |
+---------+------------+
I want to check the "messages" against this table. For each user, all these dates have to be in the resultset. As you can see, none of the users have written messages on 2011-01-21, and as said, user 1 has no messages on 2011-01-24. The desired output of the query would be:
user_id d_date before16 after16
-------------------------------------
1 2011-01-21 0 0
1 2011-01-22 1 0
1 2011-01-23 0 2
1 2011-01-24 0 0
2 2011-01-21 0 0
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-21 0 0
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
How can I link the two tables so that the query result also holds rows with zero values for before16 and after16?
Edit: yes, I have a "users" table:
+---------+------------+
| user_id | user_date |
+---------+------------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+---------+------------+

Test bed:
create table messages (msg_id integer, user_id integer, _date date, _time time);
create table date_range (date_id integer, _date date);
insert into messages values
(1,1,'2011-01-22','06:23:11'),
(2,1,'2011-01-23','16:17:03'),
(3,1,'2011-01-23','17:05:05');
insert into date_range values
(1, '2011-01-21'),
(1, '2011-01-22'),
(1, '2011-01-23'),
(1, '2011-01-24');
Query:
SELECT p._date, p.user_id,
coalesce(m.before16, 0) b16, coalesce(m.after16, 0) a16
FROM
(SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr) p
LEFT JOIN
(SELECT user_id, _date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
ORDER BY user_id, _date ASC) m
ON p.user_id = m.user_id AND p._date = m._date;
EDIT:
Your initial query is left as is; I hope it doesn't require any explanation.
SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr returns a cartesian product (CROSS JOIN) of the two tables, which gives every date in the range for each user in question. As I'm interested in each pair only once, I use the DISTINCT clause. Try this query with and without it.
Then I use a LEFT JOIN on the two sub-selects.
This join means: first, an INNER join is performed, i.e. all rows with matching fields in the ON condition are returned. Then, for each row in the left-side relation of the join that has no match on the right side, a row is returned with NULLs in the right-side columns (thus the name LEFT JOIN: the left relation is always there, and the right one may be filled with NULLs). This join does what you expect: it returns user_id + date combinations even if there were no messages on the given date for a given user. Note that I put the user_id + date sub-select first (on the left) and the messages query second (on the right).
coalesce() is used to replace NULL with zero.
I hope this clarifies how this query works.
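To see the NULL step concretely, here is a minimal sketch that runs the same join on the test bed above, once without and once with coalesce(). It assumes Python's built-in sqlite3 as a stand-in for MySQL; because SQLite compares the time values as plain strings, the literal is written as '16:00:00' instead of '16:00'.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, _date TEXT, _time TEXT);
CREATE TABLE date_range (date_id INTEGER, _date TEXT);
INSERT INTO messages VALUES
  (1, 1, '2011-01-22', '06:23:11'),
  (2, 1, '2011-01-23', '16:17:03'),
  (3, 1, '2011-01-23', '17:05:05');
INSERT INTO date_range VALUES
  (1, '2011-01-21'), (1, '2011-01-22'), (1, '2011-01-23'), (1, '2011-01-24');
""")

# p = every user/date pair, m = per-user, per-day counts; LEFT JOIN keeps all of p.
JOINED = """
SELECT p.user_id, p._date, {before}, {after}
FROM (SELECT DISTINCT msg.user_id, dr._date
      FROM messages msg CROSS JOIN date_range dr) p
LEFT JOIN (
    SELECT user_id, _date,
           SUM(_time <= '16:00:00') AS before16,
           SUM(_time >  '16:00:00') AS after16
    FROM messages
    GROUP BY user_id, _date
) m ON p.user_id = m.user_id AND p._date = m._date
ORDER BY p.user_id, p._date
"""

# Without coalesce(): dates with no messages come back as NULL (None in Python).
raw_rows = conn.execute(
    JOINED.format(before="m.before16", after="m.after16")).fetchall()

# With coalesce(): the same rows, with NULL replaced by zero.
filled_rows = conn.execute(
    JOINED.format(before="COALESCE(m.before16, 0)",
                  after="COALESCE(m.after16, 0)")).fetchall()

for row in filled_rows:
    print(row)
```

The first and last dates show up in raw_rows with None in both count columns, and in filled_rows with 0.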

Give this a shot:
select u.user_id, u._date,
coalesce(sum(m._time <= '16:00'), 0) as before16,
coalesce(sum(m._time > '16:00'), 0) as after16
from (
select m.user_id, d._date
from messages m
cross join date_range d
group by m.user_id, d._date
) u
left join messages m on u.user_id=m.user_id
and u._date=m._date
group by u.user_id, u._date
The inner query is just building a set of all possible/desired user-date pairs. It would be more efficient to use a users table, but you didn't mention that you had one, so I won't assume. Otherwise, you just need the left join so that the non-joined records are not removed. The coalesce() calls turn the NULL sums of user-date pairs without messages into zeros.
EDIT
--More detailed explanation: taking the query apart.
Start with the innermost query; the goal is to get a list of all desired dates for every user. Since there's a table of users and a table of dates it can look like this:
select distinct u.user_id, d.d_date
from users u
cross join date_range d
The key here is the cross join, taking every row in the users table and associating it with every row in the date_range table. The distinct keyword is really just a shorthand for a group by on all columns, and is here just in case there's duplicated data.
Note that there are several other methods of getting this same result set (like in my original query), but this is probably the simplest from both a logical and computational standpoint.
Really, the only other steps are to add the left join (associating all of the rows we got above to all available data, and not removing anything that doesn't have any data) and the group by and select components which are basically the same as you had before. So, putting everything together it looks like this:
select t.user_id, t.d_date,
coalesce(sum(m.m_time <= '16:00'), 0) as before16,
coalesce(sum(m.m_time > '16:00'), 0) as after16
from (
select distinct u.user_id, d.d_date
from users u
cross join date_range d
) t
left join messages m on t.user_id = m.user_id
and t.d_date = m.m_date
group by t.user_id, t.d_date
Based on some other comments/questions, note the explicit use of prefixes for all uses of all tables and sub-queries (which is pretty straightforward since we're not using any table more than once anymore): u for the users table, d for the date_range table, t for the sub-query containing the dates to use for each user, and m for the messages table. This is probably where my first explanation fell a little short, since I used the messages table twice, both times with the same prefix. It works there because of the context of both uses (one was in a sub-query), but it probably isn't the best practice.
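As a check against the desired output in the question, here is a small sketch of the users-and-dates approach run on the question's full sample data. It assumes Python's built-in sqlite3 in place of MySQL, writes the cut-off as '16:00:00' because SQLite compares the times as strings, and wraps the sums in COALESCE so that user-date pairs without messages yield 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE messages (msg_id INTEGER, user_id INTEGER, m_date TEXT, m_time TEXT);
CREATE TABLE date_range (date_id INTEGER, d_date TEXT);
CREATE TABLE users (user_id INTEGER, user_date TEXT);
INSERT INTO messages VALUES
  (1, 1, '2011-01-22', '06:23:11'), (2, 1, '2011-01-23', '16:17:03'),
  (3, 1, '2011-01-23', '17:05:45'), (4, 2, '2011-01-22', '23:58:13'),
  (5, 2, '2011-01-23', '23:59:32'), (6, 2, '2011-01-24', '21:02:41'),
  (7, 3, '2011-01-22', '13:45:00'), (8, 3, '2011-01-23', '13:22:34'),
  (9, 3, '2011-01-23', '18:22:34'), (10, 3, '2011-01-24', '02:22:22'),
  (11, 3, '2011-01-24', '13:12:00');
INSERT INTO date_range VALUES
  (1, '2011-01-21'), (1, '2011-01-22'), (1, '2011-01-23'), (1, '2011-01-24');
INSERT INTO users VALUES (1, 'foo'), (2, 'bar'), (3, 'foobar');
""")

# t = every user paired with every date; the LEFT JOIN attaches messages where
# they exist and leaves NULLs elsewhere, which COALESCE turns into zeros.
report = conn.execute("""
SELECT t.user_id, t.d_date,
       COALESCE(SUM(m.m_time <= '16:00:00'), 0) AS before16,
       COALESCE(SUM(m.m_time >  '16:00:00'), 0) AS after16
FROM (
    SELECT DISTINCT u.user_id, d.d_date
    FROM users u
    CROSS JOIN date_range d
) t
LEFT JOIN messages m ON t.user_id = m.user_id
                    AND t.d_date = m.m_date
GROUP BY t.user_id, t.d_date
ORDER BY t.user_id, t.d_date
""").fetchall()

for row in report:
    print(row)
```

This prints twelve rows, four dates for each of the three users, matching the desired output listed in the question.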

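It is not neat, but if you have a users table, then maybe something like this. The zero rows have to be merged with the real counts in an outer GROUP BY; a plain UNION would return both a zero row and a counted row for dates that do have messages:
SELECT user_id, _date, SUM(before16) AS before16, SUM(after16) AS after16
FROM (
SELECT
user_id,
_date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
UNION ALL
-- one zero-filled row for every user/date pair
SELECT
user_id,
_date,
0 AS before16,
0 AS after16
FROM
users,
date_range
) t
GROUP BY user_id, _date
ORDER BY user_id, _date ASC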

chezy525's solution works great. I ported it to PostgreSQL and removed/renamed some aliases:
select users_and_dates.user_id, users_and_dates._date,
SUM(case when _time <= '16:00' then 1 else 0 end) as before16,
SUM(case when _time > '16:00' then 1 else 0 end) as after16
from (
select messages.user_id, date_range._date
from messages
cross join date_range
group by messages.user_id, date_range._date
) users_and_dates
left join messages on users_and_dates.user_id=messages.user_id
and users_and_dates._date=messages._date
group by users_and_dates.user_id, users_and_dates._date;
I ran it on my machine and it worked perfectly.

Related

Dividing new created columns

orders_table:
orders_id_column | user_id_column | final_status_column
----------------------------------------------------
1 | 4455 | DeliveredStatus
2 | 4455 | DeliveredStatus
3 | 4455 | CanceledStatus
4 | 8888 | CanceledStatus
I want to calculate the total number of orders and the number of canceled orders by user_id, and then the quotient of the two, to arrive at something like this:
user_id | total_orders | canceled_orders | cocient
---------------------------------------------------
4455 | 3 | 1 | 0.33
8888 | 1 | 1 | 1.00
I managed to create the first two columns, but not the last one:
SELECT
COUNT(order_id) AS total_orders,
SUM(if(orders.final_status = 'DeliveredStatus', 1, 0)) AS canceled_orders
FROM users
GROUP BY user_id;
You can use an easy approach :
SELECT
user_id,
COUNT(order_id) AS total_orders,
SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END) AS canceled_orders,
SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END) / COUNT(order_id) AS cocient
FROM users
GROUP BY user_id;
Demo: https://www.db-fiddle.com/f/7yUJcuMJPncBBnrExKbzYz/136
You could just use a sub-query.
Then you can refer to the newly created columns, as the outer query exists in a different scope (one where the new columns now exist).
(Thus avoids repeating any logic, and maintaining DRY code.)
SELECT
user_id,
total_orders,
canceled_orders,
canceled_orders / total_orders AS cocient
FROM
(
SELECT
user_id,
COUNT(order_id) AS total_orders,
SUM(if(orders.final_status = 'CanceledStatus', 1, 0)) AS canceled_orders
FROM
orders
GROUP BY
user_id
)
AS per_user
Note, selecting from the users table appears to be a typo in your example. It would appear that you should select from the orders table...
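For what it's worth, here is a small runnable sketch of the sub-query approach on the sample rows. It assumes Python's built-in sqlite3 instead of MySQL and a table named orders with columns order_id, user_id and final_status. One difference to be aware of: SQLite divides integers as integers, so the sketch multiplies by 1.0, which MySQL's / operator does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE orders (order_id INTEGER, user_id INTEGER, final_status TEXT);
INSERT INTO orders VALUES
  (1, 4455, 'DeliveredStatus'),
  (2, 4455, 'DeliveredStatus'),
  (3, 4455, 'CanceledStatus'),
  (4, 8888, 'CanceledStatus');
""")

# The inner query creates total_orders and canceled_orders once;
# the outer query can then refer to them by name to build the quotient.
per_user = conn.execute("""
SELECT user_id, total_orders, canceled_orders,
       ROUND(1.0 * canceled_orders / total_orders, 2) AS cocient
FROM (
    SELECT user_id,
           COUNT(order_id) AS total_orders,
           SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END)
               AS canceled_orders
    FROM orders
    GROUP BY user_id
) AS per_user
ORDER BY user_id
""").fetchall()

for row in per_user:
    print(row)
```

User 4455 comes out with 3 orders, 1 canceled and a quotient of 0.33; user 8888 with 1, 1 and 1.0.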

Select all rows with multiple occurrences - on same day

I have a single MySQL table with the name 'checkins' and 4 columns.
id | userIDFK | checkin_datetime | shopId
------------------------------------------------
1 | 1 | 2018-01-18 09:44:00 | 3
2 | 2 | 2018-01-18 10:32:00 | 3
3 | 3 | 2018-01-18 11:19:00 | 3
4 | 1 | 2018-01-18 17:57:00 | 3
5 | 1 | 2018-01-18 16:31:00 | 1
6 | 1 | 2018-01-19 08:31:00 | 3
Basically I want to find rows where users have checked in more than once (>=2) on the same day at the same shop. So for instance, if a user checks in as in the rows with ids 1 and 4 (same user, same day, same shop), the query should return a hit with the entire rows (id, userIDFK, checkin_datetime, shopId). Hope this makes sense.
I already tried using
SELECT id, userIDFK, checkin_datetime, shopId
FROM (
SELECT * FROM 'checkins' WHERE COUNT(userIDFK)>=2 AND COUNT(shopId)>=2
)
I have no clue how to do the same-day part, and I know this query is way off, but this is the best I could do.
You can try grouping by userIDFK, the check-in date, and shopId:
SELECT userIDFK, DATE(checkin_datetime) AS checkin_date, shopId, COUNT(shopId)
FROM checkins
GROUP BY userIDFK, DATE(checkin_datetime), shopId
HAVING COUNT(shopId)>1
EDIT
You can include a subquery to get all lines:
select b.id,b.userIDFK, b.checkin_datetime, b.shopId
from checkins b
where (SELECT COUNT(SHOPiD)
FROM checkins a
where a.userIDFK=b.userIDFK and date(a.checkin_datetime)=date(b.checkin_datetime) and a.shopId=b.shopId
GROUP BY userIDFK, DATE(checkin_datetime), shopId)>1
GROUP BY can be used to get the multiple occurrences.
SELECT id, userIDFK, checkin_datetime, shopId
FROM checkins
GROUP BY userIDFK, DATE(checkin_datetime), shopId
HAVING count(id) > 1;
Hope it helps!
EDIT:
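Using an inner join you can achieve it. DISTINCT keeps a row from being returned more than once when a user has three or more check-ins on the same day at the same shop. Here is the query:
SELECT DISTINCT c1.* FROM checkins c1 INNER JOIN checkins c2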
ON c1.userIDFK = c2.userIDFK
AND date(c1.checkin_datetime) = date(c2.checkin_datetime)
AND c1.shopId = c2.shopId
AND c1.id != c2.id
Cheers!!
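A variation that combines the two ideas above, sketched on the sample rows: find the (user, day, shop) groups with at least two check-ins first, then join back to fetch the entire rows. This assumes Python's built-in sqlite3 in place of MySQL; SQLite's date() plays the role of MySQL's DATE() here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE checkins (id INTEGER, userIDFK INTEGER,
                       checkin_datetime TEXT, shopId INTEGER);
INSERT INTO checkins VALUES
  (1, 1, '2018-01-18 09:44:00', 3),
  (2, 2, '2018-01-18 10:32:00', 3),
  (3, 3, '2018-01-18 11:19:00', 3),
  (4, 1, '2018-01-18 17:57:00', 3),
  (5, 1, '2018-01-18 16:31:00', 1),
  (6, 1, '2018-01-19 08:31:00', 3);
""")

# g = groups with two or more check-ins; joining back returns each full row once.
repeat_checkins = conn.execute("""
SELECT c.id, c.userIDFK, c.checkin_datetime, c.shopId
FROM checkins c
JOIN (
    SELECT userIDFK, date(checkin_datetime) AS checkin_date, shopId
    FROM checkins
    GROUP BY userIDFK, date(checkin_datetime), shopId
    HAVING COUNT(*) >= 2
) g ON g.userIDFK = c.userIDFK
   AND g.checkin_date = date(c.checkin_datetime)
   AND g.shopId = c.shopId
ORDER BY c.id
""").fetchall()

for row in repeat_checkins:
    print(row)
```

On the sample data this returns the rows with ids 1 and 4. Because each row matches exactly one group, no DISTINCT is needed.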

select two tables mysql without join

There are two tables, recharge and purchase.
select * from recharge;
+-----+------+--------+---------------------+
| idx | user | amount | created |
+-----+------+--------+---------------------+
| 1 | 3 | 10 | 2016-01-09 20:16:18 |
| 2 | 3 | 5 | 2016-01-09 20:16:45 |
+-----+------+--------+---------------------+
select * from purchase;
+-----+------+----------+---------------------+
| idx | user | resource | created |
+-----+------+----------+---------------------+
| 1 | 3 | 2 | 2016-01-09 20:55:30 |
| 2 | 3 | 1 | 2016-01-09 20:55:30 |
+-----+------+----------+---------------------+
I want to figure out balance of users which is SUM(amount) - COUNT(purchase.idx). (in this case, 13)
So I had tried
SELECT (SUM(`amount`)-COUNT(purchase.idx)) AS balance
FROM `recharge`, `purchase`
WHERE purchase.user = 3 AND recharge.user = 3
but it returned an error.
If you want an accurate count, then aggregate before doing arithmetic. For your particular case:
select ((select sum(r.amount) from recharge r where r.user = 3) -
(select count(*) from purchase p where p.user = 3)
) as balance
To do this for multiple users, move the subqueries to the from clause or use union all and aggregation. The second is safer if a user might only be in one table:
select user, coalesce(sum(suma), 0) - coalesce(sum(countp), 0)
from ((select user, sum(amount) as suma, null as countp
from recharge
group by user
) union all
(select user, null, count(*)
from purchase
group by user
)
) rp
group by user
It is also possible to use a union, like this:
SELECT SUM(`amount`-aidx) AS balance
FROM(
SELECT SUM(`amount`) as amount, 0 as aidx
from `recharge` where recharge.user = 3
union
select 0 as amount, COUNT(purchase.idx) as aidx
from `purchase`
WHERE purchase.user = 3 )a
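To illustrate why the aggregation has to happen per table before the subtraction, here is a short sketch on the sample rows, assuming Python's built-in sqlite3 in place of MySQL. The comma join pairs every recharge row with every purchase row, so aggregating after the join counts each amount and each purchase twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE recharge (idx INTEGER, user INTEGER, amount INTEGER, created TEXT);
CREATE TABLE purchase (idx INTEGER, user INTEGER, resource INTEGER, created TEXT);
INSERT INTO recharge VALUES
  (1, 3, 10, '2016-01-09 20:16:18'),
  (2, 3, 5,  '2016-01-09 20:16:45');
INSERT INTO purchase VALUES
  (1, 3, 2, '2016-01-09 20:55:30'),
  (2, 3, 1, '2016-01-09 20:55:30');
""")

# Aggregating after a comma (cross) join: 2 x 2 = 4 joined rows get summed.
naive_balance = conn.execute("""
SELECT SUM(amount) - COUNT(purchase.idx)
FROM recharge, purchase
WHERE purchase.user = 3 AND recharge.user = 3
""").fetchone()[0]

# Aggregating each table on its own, then combining per user with UNION ALL.
balances = conn.execute("""
SELECT user, COALESCE(SUM(suma), 0) - COALESCE(SUM(countp), 0) AS balance
FROM (
    SELECT user, SUM(amount) AS suma, NULL AS countp
    FROM recharge GROUP BY user
    UNION ALL
    SELECT user, NULL, COUNT(*)
    FROM purchase GROUP BY user
) rp
GROUP BY user
""").fetchall()

print(naive_balance)  # 26: (10 + 10 + 5 + 5) - 4
print(balances)       # [(3, 13)]: 15 - 2
```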

Get total count of records with a mysql join and 2 tables

I have 2 tables that I am trying to join but I am not sure how to make it the most time efficient.
Tasks Table:
nid | created_by | claimed_by | urgent
1 | 11 | 22 | 1
2 | 22 | 33 | 1
3 | 33 | 11 | 1
1 | 11 | 43 | 0
1 | 11 | 44 | 1
Employee Table:
userid | name
11 | EmployeeA
22 | EmployeeB
33 | EmployeeC
Result I am trying to get:
userid | created_count | claimed_count | urgent_count
11 | 3 | 1 | 3
22 | 1 | 1 | 2
33 | 1 | 1 | 2
created_count column will show total # of tasks created by that user.
claimed_count column will show total # of tasks claimed by that user.
urgent_count column will show total # of urgent tasks (created or claimed) by that user.
Thanks in advance!
I would start by breaking this up into pieces and then putting them back together. You can get the created_count and claimed_count using simple aggregation like this:
SELECT created_by, COUNT(*) AS created_count
FROM task
GROUP BY created_by;
SELECT claimed_by, COUNT(*) AS claimed_count
FROM task
GROUP BY claimed_by;
To get the urgent count for each employee, I would join the two tables on the condition that the employee is either the created_by or claimed_by column, and group by employee. Instead of counting, however, I would use SUM(). I am doing this because it appears each row will be either 0 or 1, so SUM() will effectively count all non-zero rows:
SELECT e.userid, SUM(t.urgent)
FROM employee e
JOIN task t ON e.userid IN (t.created_by, t.claimed_by)
GROUP BY e.userid;
Now that you have all the bits of data you need, you can use an outer join to join all of those subqueries to the employees table to get their counts. You can use the COALESCE() function to replace any null counts with 0:
SELECT e.userid, COALESCE(u.urgent_count, 0) AS urgent_count, COALESCE(crt.created_count, 0) AS created_count, COALESCE(clm.claimed_count, 0) AS claimed_count
FROM employee e
LEFT JOIN(
SELECT e.userid, SUM(t.urgent) AS urgent_count
FROM employee e
JOIN task t ON e.userid IN (t.created_by, t.claimed_by)
GROUP BY e.userid) u ON u.userid = e.userid
LEFT JOIN(
SELECT claimed_by, COUNT(*) AS claimed_count
FROM task
GROUP BY claimed_by) clm ON clm.claimed_by = e.userid
LEFT JOIN(
SELECT created_by, COUNT(*) AS created_count
FROM task
GROUP BY created_by) crt ON crt.created_by = e.userid;
Here is an SQL Fiddle example.
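If you'd like to avoid three sub-queries, a possible alternative (not the query above, just a sketch of a different technique) is conditional aggregation over a single join: join each employee to every task they created or claimed, then sum the comparisons. This assumes Python's built-in sqlite3 in place of MySQL and the table names employee and task used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE task (nid INTEGER, created_by INTEGER, claimed_by INTEGER, urgent INTEGER);
CREATE TABLE employee (userid INTEGER, name TEXT);
INSERT INTO task VALUES
  (1, 11, 22, 1),
  (2, 22, 33, 1),
  (3, 33, 11, 1),
  (1, 11, 43, 0),
  (1, 11, 44, 1);
INSERT INTO employee VALUES
  (11, 'EmployeeA'), (22, 'EmployeeB'), (33, 'EmployeeC');
""")

# Each comparison evaluates to 1 or 0, so SUM() counts the matching tasks.
# LEFT JOIN plus COALESCE keeps employees without any task, with zero counts.
counts = conn.execute("""
SELECT e.userid,
       COALESCE(SUM(t.created_by = e.userid), 0) AS created_count,
       COALESCE(SUM(t.claimed_by = e.userid), 0) AS claimed_count,
       COALESCE(SUM(t.urgent), 0)                AS urgent_count
FROM employee e
LEFT JOIN task t ON e.userid IN (t.created_by, t.claimed_by)
GROUP BY e.userid
ORDER BY e.userid
""").fetchall()

for row in counts:
    print(row)
```

On the sample data this gives (11, 3, 1, 3), (22, 1, 1, 2) and (33, 1, 1, 2), the result asked for in the question. As with the urgent sub-query above, a task that the same user both created and claimed is counted once in urgent_count.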

Mysql: event table :chronological consecutive join

I have the following table:
user_id | Membership_type | start_Date
1 | 1 | 1
1 | 1 | 2
1 | 2 | 3
1 | 3 | 4
with several users, and I need to find out for each user when the membership type changes and what the change is, in the following format (start_Date is a datetime; I wrote it as an int here for ease of understanding)
user_id |Membership_change| change_Date
1 | 1 to 2 | 3
1 | 2 to 3 | 4
I have tried
select m1.user_id, concat(m1.Membership_type, ' to ',m2.Membership_type), m2.start_date
from table_membership m1
join table_membership m2
on m1.user_id=m2.user_id and m1.start_date<m2.start_date and m1.membership_type<>m2.membership_type
but this will return
user_id |Membership_change| change_Date
1 | 1 to 2 | 3
1 | 1 to 2 | 3
1 | 1 to 3 | 4
1 | 2 to 3 | 4
The duplicate 1 to 2 is not a problem to remove through a grouping, but I cannot seem to be able to think of a way to avoid having the 1 to 3 result. I basically just need to join chronologically from one membership to the next
Any ideas would be appreciated!
Edit: Had an idea to add the column m1.start_date and group by account_id and m1.start_date, so I would only get the first row where each entry is joined. Also a pre-sort by date before the joins, to make sure they are all in order. Will test.
You are missing a GROUP BY:
select
m1.user_id,
concat(m1.Membership_type, ' to ',m2.Membership_type) as Membership_change,
m2.start_date as change_Date
from table_membership m1
join table_membership m2
on m1.user_id = m2.user_id
and m1.start_date < m2.start_date
and m1.membership_type <> m2.membership_type
GROUP BY m1.user_id, Membership_change, change_Date
Had an idea to add the column m1.start_date and group by account_id and m1.start_date, so I would only get the first row where each entry is joined. Also a pre-sort by date before the joins, to make sure they are all in order.
select m.user_id, m.membership_change, m.change_date from
(
select
m1.user_id,
concat(m1.Membership_type, ' to ',m2.Membership_type) as membership_change,
m2.start_date as change_date,
m1.start_date
from (select * from table_membership order by start_date asc)m1
join (select * from table_membership order by start_date asc)m2
on m1.user_id = m2.user_id
and m1.start_date < m2.start_date
and m1.membership_type <> m2.membership_type
GROUP BY m1.user_id, m1.start_Date
)m group by 1,2,3
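Another way to join strictly "from one membership to the next", sketched here as an alternative that does not depend on how GROUP BY picks rows: pair each row only with its immediate predecessor, found with a correlated MAX(start_date) sub-query, and keep the pairs where the type differs. This assumes Python's built-in sqlite3 in place of MySQL; the || operator stands in for MySQL's concat().

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE table_membership (user_id INTEGER, membership_type INTEGER,
                               start_date INTEGER);
INSERT INTO table_membership VALUES
  (1, 1, 1),
  (1, 1, 2),
  (1, 2, 3),
  (1, 3, 4);
""")

# m2 is the current row, m1 the row of the same user with the latest earlier
# start_date. Only consecutive pairs are compared, so "1 to 3" cannot appear.
changes = conn.execute("""
SELECT m2.user_id,
       m1.membership_type || ' to ' || m2.membership_type AS membership_change,
       m2.start_date AS change_date
FROM table_membership m2
JOIN table_membership m1
  ON m1.user_id = m2.user_id
 AND m1.start_date = (SELECT MAX(p.start_date)
                      FROM table_membership p
                      WHERE p.user_id = m2.user_id
                        AND p.start_date < m2.start_date)
WHERE m1.membership_type <> m2.membership_type
ORDER BY m2.user_id, m2.start_date
""").fetchall()

for row in changes:
    print(row)
```

On the sample rows this yields (1, '1 to 2', 3) and (1, '2 to 3', 4). The sketch assumes start_date is unique per user; with ties, a row could match more than one predecessor.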