Dividing newly created columns - mysql

orders_table:
orders_id_column | user_id_column | final_status_column
----------------------------------------------------
1 | 4455 | DeliveredStatus
2 | 4455 | DeliveredStatus
3 | 4455 | CanceledStatus
4 | 8888 | CanceledStatus
I want to calculate the total number of orders and the number of canceled orders per user_id, and then the quotient of the two, to arrive at something like this:
user_id | total_orders | canceled_orders | cocient
---------------------------------------------------
4455 | 3 | 1 | 0.33
8888 | 1 | 1 | 1.00
I managed to create the first two columns, but not the last one:
SELECT
COUNT(order_id) AS total_orders,
SUM(if(orders.final_status = 'DeliveredStatus', 1, 0)) AS canceled_orders
FROM users
GROUP BY user_id;

You can use a simple approach:
SELECT
  user_id,
  COUNT(order_id) AS total_orders,
  SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END) AS canceled_orders,
  SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END) / COUNT(order_id) AS cocient
FROM orders -- the question's query says "users", which looks like a typo
GROUP BY user_id;
Demo: https://www.db-fiddle.com/f/7yUJcuMJPncBBnrExKbzYz/136

You could just use a sub-query.
Then you can refer to the newly created columns, because the outer query runs in a different scope (one where the new columns already exist).
(This avoids repeating any logic and keeps the code DRY.)
SELECT
  user_id,
  total_orders,
  canceled_orders,
  canceled_orders / total_orders AS cocient
FROM
(
  SELECT
    user_id,
    COUNT(order_id) AS total_orders,
    SUM(IF(final_status = 'CanceledStatus', 1, 0)) AS canceled_orders
  FROM
    orders
  GROUP BY
    user_id
)
AS per_user
Note: selecting from the users table appears to be a typo in your example, so the query above selects from the orders table instead. Likewise, canceled_orders has to count 'CanceledStatus' rows, not 'DeliveredStatus'.
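As a quick check of the sub-query idea, here is a minimal sketch that runs it against the sample rows using Python's built-in sqlite3 module as a stand-in for MySQL. The table name `orders` and the column names `order_id`, `user_id`, `final_status` are assumptions taken from the queries in this thread. SQLite divides integers as integers, so the sketch multiplies by 1.0 first; MySQL's `/` operator does not need that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, user_id INTEGER, final_status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 4455, "DeliveredStatus"),
        (2, 4455, "DeliveredStatus"),
        (3, 4455, "CanceledStatus"),
        (4, 8888, "CanceledStatus"),
    ],
)

# The inner query builds the two counts; the outer query can then
# refer to them by name to compute the quotient.
query = """
SELECT user_id, total_orders, canceled_orders,
       ROUND(1.0 * canceled_orders / total_orders, 2) AS cocient
FROM (
    SELECT user_id,
           COUNT(order_id) AS total_orders,
           SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END)
               AS canceled_orders
    FROM orders
    GROUP BY user_id
) AS per_user
ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The counts come out as 3 and 1 for user 4455 and 1 and 1 for user 8888, matching the table in the question.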

Related

Calculate unique items seen by users via sql

I need help to resolve the next case.
The data that users want to see is accessed through pagination requests, and these requests are later stored in the database in the following form:
+----+---------+-------+--------+
| id | user id | first | amount |
+----+---------+-------+--------+
| 1 | 1 | 0 | 5 |
| 2 | 1 | 10 | 10 |
| 3 | 1 | 10 | 5 |
| 4 | 1 | 15 | 10 |
| 5 | 2 | 0 | 10 |
| 6 | 2 | 0 | 5 |
| 7 | 2 | 10 | 5 |
+----+---------+-------+--------+
The table is ordered by user id asc, first asc, amount desc.
The task is to write an SQL statement that calculates the total number of unique items each user has seen.
For the first user the total must be 20: the request with id=1 returned the first 5 items, and the request with id=2 returned another 10 items. The request with id=3 returns data already 'seen' by the request with id=2. The request with id=4 overlaps with id=2, but still returns 5 'unseen' items.
For the second user the total must be 15.
The SQL statement should produce the following output:
+---------+-------+
| user id | total |
+---------+-------+
| 1 | 20 |
+---------+-------+
| 2 | 15 |
+---------+-------+
I am using MySQL 5.7, so window functions are not available to me. I have been stuck on this task for a day already and still cannot get the desired output. If it is not possible with this setup, I will end up calculating the results in the application code. I would appreciate any suggestions or help with resolving this task, thank you!
This is a type of gaps and islands problem. In this case, use a cumulative max to determine if one request intersects with a previous request. If not, that is the beginning of an "island" of adjacent requests. A cumulative sum of the beginnings assigns an "island", then an aggregation counts each island.
So, the islands look like this:
select userid, min(first), max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp;
You then want this summed by userid, so that is one more level of aggregation:
with islands as (
select userid, min(first) as first, max(first + amount) as last
from (select t.*,
sum(case when prev_last >= first then 0 else 1 end) over
(partition by userid order by first) as grp
from (select t.*,
max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
from t
) t
) t
group by userid, grp
)
select userid, sum(last - first) as total
from islands
group by userid;
Here is a db<>fiddle.
This logic is similar to Gordon's, but runs on older releases of MySQL, too.
select userid
-- overall length minus gaps
,max(maxlast)-min(minfirst) + sum(gaplen) as total
from
(
select userid
,prevlast
,min(first) as minfirst -- first of group
,max(last) as maxlast -- last of group
-- if there was a gap, calculate length of gap
,min(case when prevlast < first then prevlast - first else 0 end) as gaplen
from
(
select t.*
,first + amount as last -- last value in range
,( -- maximum end of all previous rows
select max(first + amount)
from t as t2
where t2.userid = t.userid
and t2.first < t.first
) as prevlast
from t
) as dt
group by userid, prevlast
) as dt
group by userid
order by userid
See fiddle
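To see the window-free version at work on the sample data, here is a small sketch using Python's built-in sqlite3 module in place of MySQL. The table name `requests` and the column names `first_item` and `last_item` are assumptions chosen for this illustration (renamed from `first` and `last` so they cannot clash with SQL keywords); the logic is otherwise the correlated-subquery query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (id INTEGER, userid INTEGER, "
    "first_item INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?, ?)",
    [
        (1, 1, 0, 5), (2, 1, 10, 10), (3, 1, 10, 5), (4, 1, 15, 10),
        (5, 2, 0, 10), (6, 2, 0, 5), (7, 2, 10, 5),
    ],
)

query = """
SELECT userid,
       -- overall length plus the (negative) gap lengths
       MAX(maxlast) - MIN(minfirst) + SUM(gaplen) AS total
FROM (
    SELECT userid, prevlast,
           MIN(first_item) AS minfirst,
           MAX(last_item)  AS maxlast,
           -- a gap exists when every earlier range ends before this one starts
           MIN(CASE WHEN prevlast < first_item
                    THEN prevlast - first_item ELSE 0 END) AS gaplen
    FROM (
        SELECT r.userid, r.first_item,
               r.first_item + r.amount AS last_item,
               (SELECT MAX(r2.first_item + r2.amount)
                FROM requests AS r2
                WHERE r2.userid = r.userid
                  AND r2.first_item < r.first_item) AS prevlast
        FROM requests AS r
    ) AS dt
    GROUP BY userid, prevlast
) AS grouped
GROUP BY userid
ORDER BY userid
"""
totals = conn.execute(query).fetchall()
print(totals)
```

For user 1 the only gap is between item 5 and item 10, so the result is 25 - 0 - 5 = 20; user 2 has no gap and gets 15.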

MySql Sum different types of expenses from 'expense' table based on value in 'expense type' group by employee

A more generic title for this post would be
MySql Sum different columns in same table based on value of another row, group by yet another row
I have a table of employee expenses:
id | employee_id | expense_cat_id | expense_amount |
1 | 11 | 1 | 100 |
2 | 11 | 1 | 200 |
3 | 12 | 1 | 120 |
4 | 12 | 1 | 140 |
5 | 11 | 2 | 5 |
6 | 12 | 2 | 8 |
and I want to produce a report like this:
Employee Id | Expense Cat 1 Total Amount | Expense Cat 2 Total Amount
11 | 300 | 5
12 | 260 | 8
So initially I thought I could use 2 table aliases for the same table like this:
SELECT
employee_id,
sum(expense_cat_1.expense_amount) as expense_1_total,
sum(expense_cat_2.expense_amount) as expense_2_total
FROM
expenses as expense_cat_1 where expense_cat_1.expense_cat_id=1 ,
expenses as expense_cat_2 where expense_cat_2.expense_cat_id=2
group by employee_id
but this was not correct SQL syntax, which makes sense to me.
So I thought I could do two joins between the employees table and the expenses table:
SELECT
employees.id as employee_id,
sum(expenses_cat_1.expense_amount) as expense_1_total,
sum(expenses_cat_2.expense_amount) as expense_2_total
FROM employees
join expenses as expenses_cat_1 on employees.id = expenses_cat_1.employee_id and expenses_cat_1.expense_cat_id=1
join expenses as expenses_cat_2 on employees.id = expenses_cat_2.employee_id and expenses_cat_2.expense_cat_id=2
group by employees.id
Which comes close, but is wrong:
employee_id | expense_1_total | expense_2_total
11 | 300 | 10
12 | 260 | 16
as the expense 2 total is doubled! I think this is because the join produces one row for each of the two category 1 expenses, each paired with the same category 2 expense, and then sums them.
I also tried a sub-query approach:
SELECT (SELECT sum(expense_amount)
FROM expenses
WHERE expense_cat_id = 1) AS sum1 ,
(SELECT sum(expense_amount)
FROM expenses
WHERE expense_cat_id = 2) AS sum2,
employee_id
FROM expenses group by employee_id
but this has the same problem as the join approach - totals for cat 2 are doubled.
How do I make the second join include the expense_2_total only once?
I have a personal dislike of sql case statements as they seem more of a procedural language construct (and sql is declarative), but am happy to consider their use in this case - but I put the challenge out there for sql experts to solve this elegantly.
You are looking for conditional aggregation:
SELECT employee_id,
sum(case when expense_cat_id = 1 then expense_amount else 0 end) as expense_1_total,
sum(case when expense_cat_id = 2 then expense_amount else 0 end) as expense_2_total
FROM expenses e
GROUP BY employee_id;
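The sketch below runs the conditional aggregation on the question's sample rows and, for comparison, counts the rows produced by a double self-join. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the variable names `conditional` and `fan_out` are only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE expenses (id INTEGER, employee_id INTEGER, "
    "expense_cat_id INTEGER, expense_amount INTEGER)"
)
conn.executemany(
    "INSERT INTO expenses VALUES (?, ?, ?, ?)",
    [
        (1, 11, 1, 100), (2, 11, 1, 200), (3, 12, 1, 120),
        (4, 12, 1, 140), (5, 11, 2, 5), (6, 12, 2, 8),
    ],
)

# Conditional aggregation: each row is read once and contributes
# to exactly one of the two sums.
conditional = conn.execute("""
    SELECT employee_id,
           SUM(CASE WHEN expense_cat_id = 1 THEN expense_amount ELSE 0 END),
           SUM(CASE WHEN expense_cat_id = 2 THEN expense_amount ELSE 0 END)
    FROM expenses
    GROUP BY employee_id
    ORDER BY employee_id
""").fetchall()

# Double self-join: every category-1 row is paired with every
# category-2 row of the same employee, so amounts get repeated.
fan_out = conn.execute("""
    SELECT c1.employee_id, COUNT(*),
           SUM(c1.expense_amount), SUM(c2.expense_amount)
    FROM expenses AS c1
    JOIN expenses AS c2
      ON c2.employee_id = c1.employee_id AND c2.expense_cat_id = 2
    WHERE c1.expense_cat_id = 1
    GROUP BY c1.employee_id
    ORDER BY c1.employee_id
""").fetchall()

print(conditional)
print(fan_out)
```

Each employee has two category 1 rows and one category 2 row, so the join yields two rows per employee and the single category 2 amount is summed twice, which is the doubling described in the question.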

select two tables mysql without join

There are two tables, recharge and purchase.
select * from recharge;
+-----+------+--------+---------------------+
| idx | user | amount | created |
+-----+------+--------+---------------------+
| 1 | 3 | 10 | 2016-01-09 20:16:18 |
| 2 | 3 | 5 | 2016-01-09 20:16:45 |
+-----+------+--------+---------------------+
select * from purchase;
+-----+------+----------+---------------------+
| idx | user | resource | created |
+-----+------+----------+---------------------+
| 1 | 3 | 2 | 2016-01-09 20:55:30 |
| 2 | 3 | 1 | 2016-01-09 20:55:30 |
+-----+------+----------+---------------------+
I want to figure out the balance of a user, which is SUM(amount) - COUNT(purchase.idx). (In this case, 15 - 2 = 13.)
So I had tried
SELECT (SUM(`amount`)-COUNT(purchase.idx)) AS balance
FROM `recharge`, `purchase`
WHERE purchase.user = 3 AND recharge.user = 3
but it returned an error.
If you want an accurate count, then aggregate before doing arithmetic. For your particular case:
select ((select sum(r.amount) from recharge r where r.user = 3) -
        (select count(*) from purchase p where p.user = 3)
       ) as balance
To do this for multiple users, move the subqueries to the from clause or use union all and aggregation. The second is safer if a user might only be in one table:
select user, coalesce(sum(suma), 0) - coalesce(sum(countp), 0)
from ((select user, sum(amount) as suma, null as countp
from recharge
group by user
) union all
(select user, null, count(*)
from purchase
group by user
)
) rp
group by user
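A minimal sketch of the union all variant, run through Python's built-in sqlite3 module rather than MySQL. The column is named `user_id` here instead of `user`, and users 4 and 5 are hypothetical extra rows added to show what happens when a user appears in only one of the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recharge (idx INTEGER, user_id INTEGER, amount INTEGER)")
conn.execute("CREATE TABLE purchase (idx INTEGER, user_id INTEGER, resource INTEGER)")
# User 3 is the question's data; users 4 and 5 are made up for illustration.
conn.executemany(
    "INSERT INTO recharge VALUES (?, ?, ?)",
    [(1, 3, 10), (2, 3, 5), (3, 4, 7)],
)
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(1, 3, 2), (2, 3, 1), (3, 5, 1)],
)

# Each table is aggregated on its own first, then the per-user
# results are stacked and combined, so no rows get multiplied.
balances = conn.execute("""
    SELECT user_id,
           COALESCE(SUM(suma), 0) - COALESCE(SUM(countp), 0) AS balance
    FROM (
        SELECT user_id, SUM(amount) AS suma, NULL AS countp
        FROM recharge
        GROUP BY user_id
        UNION ALL
        SELECT user_id, NULL, COUNT(*)
        FROM purchase
        GROUP BY user_id
    ) AS rp
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(balances)
```

User 3 ends up with 15 - 2 = 13, the recharge-only user keeps the full amount, and the purchase-only user gets a negative balance instead of disappearing from the result.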
It is also possible to use a union like this (UNION ALL, so that no rows are dropped as duplicates):
SELECT SUM(`amount` - aidx) AS balance
FROM (
  SELECT SUM(`amount`) AS amount, 0 AS aidx
  FROM `recharge` WHERE recharge.user = 3
  UNION ALL
  SELECT 0 AS amount, COUNT(purchase.idx) AS aidx
  FROM `purchase`
  WHERE purchase.user = 3
) a

How to query number of changes in a column in MySQL

I have a table that stores items with two properties. So the table has three columns:
item_id | property_1 | property_2 | insert_time
1 | 10 | 100 | 2012-08-24 00:00:01
1 | 11 | 100 | 2012-08-24 00:00:02
1 | 11 | 101 | 2012-08-24 00:00:03
2 | 20 | 200 | 2012-08-24 00:00:04
2 | 20 | 201 | 2012-08-24 00:00:05
2 | 20 | 200 | 2012-08-24 00:00:06
That is, each time either property of any item changes, a new row is inserted. There is also a column storing the insertion time. Now I want to get the number of changes in property_2. For the table above, I should get
item_id | changes_in_property_2
1 | 2
2 | 3
How can I get this?
This will tell you how many distinct values were entered. If it was changed back to a previous value, it will not be counted as a new change, though. Without a chronology to your data, hard to do much more.
select item_id, count(distinct property_2)
from Table1
group by item_id
Here is the closest that I could get to your desired result. I should note however, that you are asking for the number of changes to property_2 based on item_id. If you are analyzing strictly those two columns, then there is only 1 change for item_id 1 and 2 changes for item_id 2. You would need to expand your result to aggregate by property_1. Hopefully, this fiddle will show you why.
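SELECT a.item_id,
  SUM(
    CASE
      WHEN a.property_2 <>
        -- compare with the next row in time; without ORDER BY, LIMIT 1 could pick any later row
        (SELECT property_2 FROM tbl b
         WHERE b.item_id = a.item_id AND b.insert_time > a.insert_time
         ORDER BY b.insert_time LIMIT 1) THEN 1
      ELSE 0
    END) AS changes_in_property_2
FROM tbl a
GROUP BY a.item_id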
My take:
SELECT
  i.item_id,
  SUM(CASE WHEN i.property_1 != p.property_1 THEN 1 ELSE 0 END) + 1
    AS changes_1,
  SUM(CASE WHEN i.property_2 != p.property_2 THEN 1 ELSE 0 END) + 1
    AS changes_2
FROM items i
LEFT JOIN items p
  ON p.item_id = i.item_id
  AND p.insert_time =
    (SELECT MAX(q.insert_time) FROM items q
     WHERE q.insert_time < i.insert_time AND i.item_id = q.item_id)
GROUP BY i.item_id;
Each item has one row with no predecessor (its first row). For that row p is NULL, so the comparison contributes 0. It still counts as a change, though, which is why the sums are incremented by 1.
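The predecessor self-join can be checked against the question's rows with a short sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name `items` follows the query above; timestamps are stored as text here, which compares correctly because they share one fixed format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_id INTEGER, property_1 INTEGER, "
    "property_2 INTEGER, insert_time TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2012-08-24 00:00:01"),
        (1, 11, 100, "2012-08-24 00:00:02"),
        (1, 11, 101, "2012-08-24 00:00:03"),
        (2, 20, 200, "2012-08-24 00:00:04"),
        (2, 20, 201, "2012-08-24 00:00:05"),
        (2, 20, 200, "2012-08-24 00:00:06"),
    ],
)

# Each row i is joined to its immediate predecessor p (same item,
# latest earlier insert_time). The first row of an item has no
# predecessor, so the + 1 accounts for it.
changes = conn.execute("""
    SELECT i.item_id,
           SUM(CASE WHEN i.property_1 <> p.property_1 THEN 1 ELSE 0 END) + 1,
           SUM(CASE WHEN i.property_2 <> p.property_2 THEN 1 ELSE 0 END) + 1
    FROM items AS i
    LEFT JOIN items AS p
           ON p.item_id = i.item_id
          AND p.insert_time = (SELECT MAX(q.insert_time)
                               FROM items AS q
                               WHERE q.item_id = i.item_id
                                 AND q.insert_time < i.insert_time)
    GROUP BY i.item_id
    ORDER BY i.item_id
""").fetchall()
print(changes)
```

The last column gives 2 for item 1 and 3 for item 2, the values requested in the question, including the change back from 201 to 200 that COUNT(DISTINCT ...) would miss.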
I would do it this way, with user-defined variables to keep track of the previous row's value.
SELECT item_id, MAX(c) AS changes_in_property_2
FROM (
  SELECT IF(@i = item_id, IF(@p = property_2, @c, @c:=@c+1), @c:=1) AS c,
    (@i:=item_id) AS item_id,
    (@p:=property_2)
  FROM `no_one_names_their_table_in_sql_questions` AS t,
    (SELECT @i:=0, @p:=0) AS _init
  ORDER BY insert_time
) AS sub
GROUP BY item_id;

How to include dates with zero messages into the resultset anyway?

I have the following table with messages:
+---------+---------+------------+----------+
| msg_id | user_id | m_date | m_time |
+---------+---------+------------+----------+
| 1 | 1 | 2011-01-22 | 06:23:11 |
| 2 | 1 | 2011-01-23 | 16:17:03 |
| 3 | 1 | 2011-01-23 | 17:05:45 |
| 4 | 2 | 2011-01-22 | 23:58:13 |
| 5 | 2 | 2011-01-23 | 23:59:32 |
| 6 | 2 | 2011-01-24 | 21:02:41 |
| 7 | 3 | 2011-01-22 | 13:45:00 |
| 8 | 3 | 2011-01-23 | 13:22:34 |
| 9 | 3 | 2011-01-23 | 18:22:34 |
| 10 | 3 | 2011-01-24 | 02:22:22 |
| 11 | 3 | 2011-01-24 | 13:12:00 |
+---------+---------+------------+----------+
What I want is for each day, to see how many messages each user has sent BEFORE and AFTER 16:00:
SELECT
user_id,
m_date,
SUM(m_time <= '16:00') AS before16,
SUM(m_time > '16:00') AS after16
FROM messages
GROUP BY user_id, m_date
ORDER BY user_id, m_date ASC
This produces:
user_id m_date before16 after16
-------------------------------------
1 2011-01-22 1 0
1 2011-01-23 0 2
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
Because user 1 has written no messages on 2011-01-24, this date is not in the resultset. However, this is undesirable. I have a second table in my database, called "date_range":
+---------+------------+
| date_id | d_date |
+---------+------------+
| 1 | 2011-01-21 |
| 1 | 2011-01-22 |
| 1 | 2011-01-23 |
| 1 | 2011-01-24 |
+---------+------------+
I want to check the "messages" against this table. For each user, all these dates have to be in the resultset. As you can see, none of the users have written messages on 2011-01-21, and as said, user 1 has no messages on 2011-01-24. The desired output of the query would be:
user_id d_date before16 after16
-------------------------------------
1 2011-01-21 0 0
1 2011-01-22 1 0
1 2011-01-23 0 2
1 2011-01-24 0 0
2 2011-01-21 0 0
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-21 0 0
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
How can I link the two tables so that the query result also holds rows with zero values for before16 and after16?
Edit: yes, I have a "users" table:
+---------+------------+
| user_id | user_date |
+---------+------------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+---------+------------+
Test bed:
create table messages (msg_id integer, user_id integer, _date date, _time time);
create table date_range (date_id integer, _date date);
insert into messages values
(1,1,'2011-01-22','06:23:11'),
(2,1,'2011-01-23','16:17:03'),
(3,1,'2011-01-23','17:05:05');
insert into date_range values
(1, '2011-01-21'),
(1, '2011-01-22'),
(1, '2011-01-23'),
(1, '2011-01-24');
Query:
SELECT p._date, p.user_id,
coalesce(m.before16, 0) b16, coalesce(m.after16, 0) a16
FROM
(SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr) p
LEFT JOIN
(SELECT user_id, _date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
ORDER BY user_id, _date ASC) m
ON p.user_id = m.user_id AND p._date = m._date;
EDIT:
Your initial query is left as is; I hope it doesn't require any explanation.
SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr returns the cartesian product (CROSS JOIN) of the two tables, which gives every date in the range for each user in question. As I'm interested in each pair only once, I use the DISTINCT clause. Try this query with and without it.
Then I use a LEFT JOIN on the two sub-selects.
This join means: first, an INNER join is performed, i.e. all rows with matching fields in the ON condition are returned. Then, each row in the left-side relation of the join that has no match on the right side is returned with NULLs in the right-side columns (thus the name LEFT JOIN: the left relation is always there, and the right one may be NULL). This join does what you expect: it returns user_id + date combinations even if there were no messages on the given date for a given user. Note that I put the user_id + date sub-select first (on the left) and the messages query second (on the right).
coalesce() is used to replace NULL with zero.
I hope this clarifies how this query works.
Give this a shot:
select u.user_id, u._date,
  -- sum() over unmatched (all-NULL) rows is NULL, so coalesce it to 0
  coalesce(sum(m._time <= '16:00'), 0) as before16,
  coalesce(sum(m._time > '16:00'), 0) as after16
from (
  select m.user_id, d._date
  from messages m
  cross join date_range d
  group by m.user_id, d._date
) u
left join messages m on u.user_id=m.user_id
  and u._date=m._date
group by u.user_id, u._date
The inner query is just building a set of all possible/desired user-date pairs. It would be more efficient to use a users table, but you didn't mention that you had one, so I won't assume. Otherwise, you just need the left join so that the non-joined records are not removed.
EDIT
--More detailed explanation: taking the query apart.
Start with the innermost query; the goal is to get a list of all desired dates for every user. Since there's a table of users and a table of dates it can look like this:
select distinct u.user_id, d.d_date
from users u
cross join date_range d
The key here is the cross join, taking every row in the users table and associating it with every row in the date_range table. The distinct keyword is really just a shorthand for a group by on all columns, and is here just in case there's duplicated data.
Note that there are several other methods of getting this same result set (like in my original query), but this is probably the simplest from both a logical and computational standpoint.
Really, the only other steps are to add the left join (associating all of the rows we got above to all available data, and not removing anything that doesn't have any data) and the group by and select components which are basically the same as you had before. So, putting everything together it looks like this:
select t.user_id, t.d_date,
  -- sum() over unmatched (all-NULL) rows is NULL, so coalesce it to 0
  coalesce(sum(m.m_time <= '16:00'), 0) as before16,
  coalesce(sum(m.m_time > '16:00'), 0) as after16
from (
  select distinct u.user_id, d.d_date
  from users u
  cross join date_range d
) t
left join messages m on t.user_id = m.user_id
  and t.d_date = m.m_date
group by t.user_id, t.d_date
Based on some other comments/questions, note the explicit use of prefixes for all uses of all tables and sub-queries (which is pretty straight forward since we're not using any table more than once anymore): u for the users table, d for the date_range table, t for the sub-query containing the dates to use for each user, and m for the message table. This is probably where my first explanation fell a little short, since I used the message table twice, both times with the same prefix. It works there because of the context of both uses (one was in a sub-query), but it probably isn't the best practice.
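To make the cross join plus left join concrete, here is a sketch that loads the question's tables into an in-memory SQLite database through Python's built-in sqlite3 module (standing in for MySQL) and runs the same shape of query. The `users` table is reduced to its `user_id` column and `date_range` to its `d_date` column for brevity, and dates and times are stored as text; those simplifications are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER);
    CREATE TABLE date_range (d_date TEXT);
    CREATE TABLE messages (msg_id INTEGER, user_id INTEGER,
                           m_date TEXT, m_time TEXT);
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO date_range VALUES
        ('2011-01-21'), ('2011-01-22'), ('2011-01-23'), ('2011-01-24');
    INSERT INTO messages VALUES
        (1, 1, '2011-01-22', '06:23:11'), (2, 1, '2011-01-23', '16:17:03'),
        (3, 1, '2011-01-23', '17:05:45'), (4, 2, '2011-01-22', '23:58:13'),
        (5, 2, '2011-01-23', '23:59:32'), (6, 2, '2011-01-24', '21:02:41'),
        (7, 3, '2011-01-22', '13:45:00'), (8, 3, '2011-01-23', '13:22:34'),
        (9, 3, '2011-01-23', '18:22:34'), (10, 3, '2011-01-24', '02:22:22'),
        (11, 3, '2011-01-24', '13:12:00');
""")

# The cross join yields every user/date pair; the left join keeps the
# pairs without messages, and COALESCE turns their NULL sums into 0.
rows = conn.execute("""
    SELECT t.user_id, t.d_date,
           COALESCE(SUM(m.m_time <= '16:00'), 0) AS before16,
           COALESCE(SUM(m.m_time >  '16:00'), 0) AS after16
    FROM (SELECT u.user_id, d.d_date
          FROM users AS u CROSS JOIN date_range AS d) AS t
    LEFT JOIN messages AS m
           ON m.user_id = t.user_id AND m.m_date = t.d_date
    GROUP BY t.user_id, t.d_date
    ORDER BY t.user_id, t.d_date
""").fetchall()
for row in rows:
    print(row)
```

The result has 3 users x 4 dates = 12 rows, including the all-zero rows for 2011-01-21 and for user 1 on 2011-01-24.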
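It is not neat, but if you have a users table, then maybe something like this. The real counts are combined with a zero row for every user/date pair, and the outer query sums them so that each pair appears only once:
SELECT
  user_id,
  _date,
  SUM(before16) AS before16,
  SUM(after16) AS after16
FROM (
  SELECT
    user_id,
    _date,
    SUM(_time <= '16:00') AS before16,
    SUM(_time > '16:00') AS after16
  FROM messages
  GROUP BY user_id, _date
  UNION ALL
  SELECT
    users.user_id,
    date_range._date,
    0 AS before16,
    0 AS after16
  FROM
    users,
    date_range
) AS combined
GROUP BY user_id, _date
ORDER BY user_id, _date ASC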
chezy525's solution works great. I ported it to PostgreSQL and removed/renamed some aliases:
select users_and_dates.user_id, users_and_dates._date,
SUM(case when _time <= '16:00' then 1 else 0 end) as before16,
SUM(case when _time > '16:00' then 1 else 0 end) as after16
from (
select messages.user_id, date_range._date
from messages
cross join date_range
group by messages.user_id, date_range._date
) users_and_dates
left join messages on users_and_dates.user_id=messages.user_id
and users_and_dates._date=messages._date
group by users_and_dates.user_id, users_and_dates._date;
I ran it on my machine and it worked perfectly.