How to query number of changes in a column in MySQL - mysql

I have a table that stores items with two properties. So the table has four columns:
item_id | property_1 | property_2 | insert_time
1 | 10 | 100 | 2012-08-24 00:00:01
1 | 11 | 100 | 2012-08-24 00:00:02
1 | 11 | 101 | 2012-08-24 00:00:03
2 | 20 | 200 | 2012-08-24 00:00:04
2 | 20 | 201 | 2012-08-24 00:00:05
2 | 20 | 200 | 2012-08-24 00:00:06
That is, each time either property of any item changes, a new row is inserted. There is also a column storing the insertion time. Now I want to get the number of changes in property_2. For the table above, I should get
item_id | changes_in_property_2
1 | 2
2 | 3
How can I get this?

This will tell you how many distinct values were entered. If a property was changed back to a previous value, though, that will not be counted as a new change. Without a chronology to your data, it is hard to do much more.
select item_id, count(distinct property_2)
from Table1
group by item_id

Here is the closest that I could get to your desired result. I should note however, that you are asking for the number of changes to property_2 based on item_id. If you are analyzing strictly those two columns, then there is only 1 change for item_id 1 and 2 changes for item_id 2. You would need to expand your result to aggregate by property_1. Hopefully, this fiddle will show you why.
SELECT a.item_id,
SUM(
CASE
WHEN a.property_2 <>
(SELECT property_2 FROM tbl b
WHERE b.item_id = a.item_id AND b.insert_time > a.insert_time
ORDER BY b.insert_time LIMIT 1) THEN 1
ELSE 0
END) AS changes_in_property_2
FROM tbl a
GROUP BY a.item_id

My take :
SELECT
i.item_id,
SUM(CASE WHEN i.property_1 != p.property_1 THEN 1 ELSE 0 END) + 1
AS changes_1,
SUM(CASE WHEN i.property_2 != p.property_2 THEN 1 ELSE 0 END) + 1
AS changes_2
FROM items i
LEFT JOIN items p
ON p.item_id = i.item_id AND p.insert_time =
(SELECT MAX(q.insert_time) FROM items q
WHERE q.insert_time < i.insert_time AND i.item_id = q.item_id)
GROUP BY i.item_id;
For each item there is one row in i that has no predecessor (its first row), so it is never counted by the CASE expressions. It counts as a change though, which is why the sums are incremented by 1.

I would do it this way, with user-defined variables to keep track of the previous row's value.
SELECT item_id, MAX(c) AS changes_in_property_2
FROM (
SELECT IF(@i = item_id, IF(@p = property_2, @c, @c:=@c+1), @c:=1) AS c,
(@i:=item_id) AS item_id,
(@p:=property_2)
FROM `no_one_names_their_table_in_sql_questions` AS t,
(SELECT @i:=0, @p:=0) AS _init
ORDER BY item_id, insert_time
) AS sub
GROUP BY item_id;
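For what it's worth, on MySQL 8.0+ the same "compare each row with the previous one" idea can be written with the LAG() window function instead of user variables. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it can be run anywhere, and the table name items is an assumption (the question does not name the table). The first row of each item is counted as a change, to match the expected output in the question.

```python
import sqlite3

# SQLite stands in for MySQL here; LAG() needs MySQL 8.0+ or SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (item_id INT, property_1 INT, property_2 INT, insert_time TEXT)"
)
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?, ?)",
    [
        (1, 10, 100, "2012-08-24 00:00:01"),
        (1, 11, 100, "2012-08-24 00:00:02"),
        (1, 11, 101, "2012-08-24 00:00:03"),
        (2, 20, 200, "2012-08-24 00:00:04"),
        (2, 20, 201, "2012-08-24 00:00:05"),
        (2, 20, 200, "2012-08-24 00:00:06"),
    ],
)

QUERY = """
SELECT item_id,
       -- a row counts as a change if it is the item's first row (prev IS NULL)
       -- or if property_2 differs from the previous row of the same item
       SUM(CASE WHEN prev IS NULL OR prev <> property_2 THEN 1 ELSE 0 END)
           AS changes_in_property_2
FROM (
    SELECT item_id, property_2,
           LAG(property_2) OVER (PARTITION BY item_id ORDER BY insert_time) AS prev
    FROM items
) AS with_prev
GROUP BY item_id
ORDER BY item_id
"""

changes = conn.execute(QUERY).fetchall()
print(changes)
```

One caveat of this sketch: if property_2 itself can be NULL, the prev IS NULL test can no longer tell "first row" apart from "previous value was NULL".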

Related

Dividing new created columns

orders_table:
orders_id_column | user_id_column | final_status_column
----------------------------------------------------
1 | 4455 | DeliveredStatus
2 | 4455 | DeliveredStatus
3 | 4455 | CanceledStatus
4 | 8888 | CanceledStatus
I want to calculate the total number of orders and the number of canceled orders per user_id, and then the quotient of the two, to arrive at something like this:
user_id | total_orders | canceled_orders | cocient
---------------------------------------------------
4455 | 3 | 1 | 0.33
8888 | 1 | 1 | 1.00
I managed to create the first two columns, but not the last one:
SELECT
COUNT(order_id) AS total_orders,
SUM(if(orders.final_status = 'DeliveredStatus', 1, 0)) AS canceled_orders
FROM users
GROUP BY user_id;
You can use an easy approach :
SELECT
user_id,
COUNT(order_id) AS total_orders,
SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END ) AS
canceled_orders,
SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END ) /COUNT(order_id)
as cocient
FROM users
GROUP BY user_id;
Demo: https://www.db-fiddle.com/f/7yUJcuMJPncBBnrExKbzYz/136
You could just use a sub-query.
Then you can refer to the newly created columns, as the outer query exists in a different scope (one where the new columns now exist).
(This avoids repeating any logic and keeps the code DRY.)
SELECT
user_id,
total_orders,
canceled_orders,
canceled_orders / total_orders AS cocient
FROM
(
SELECT
user_id,
COUNT(order_id) AS total_orders,
SUM(IF(final_status = 'CanceledStatus', 1, 0)) AS canceled_orders
FROM
orders
GROUP BY
user_id
)
AS per_user
Note, selecting from the users table appears to be a typo in your example. It would appear that you should select from the orders table...
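A small runnable sketch of the sub-query approach, using the sample rows from the question. It runs against SQLite through Python's sqlite3 module rather than MySQL, so two things differ and are assumptions of the sketch: the table is called orders with columns order_id, user_id and final_status, and the division is written as 1.0 * a / b because SQLite does integer division on integers (MySQL's / already returns a decimal).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INT, user_id INT, final_status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 4455, "DeliveredStatus"),
        (2, 4455, "DeliveredStatus"),
        (3, 4455, "CanceledStatus"),
        (4, 8888, "CanceledStatus"),
    ],
)

QUERY = """
SELECT user_id, total_orders, canceled_orders,
       -- 1.0 * forces floating-point division in SQLite
       ROUND(1.0 * canceled_orders / total_orders, 2) AS cocient
FROM (
    SELECT user_id,
           COUNT(order_id) AS total_orders,
           SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END)
               AS canceled_orders
    FROM orders
    GROUP BY user_id
) AS per_user
ORDER BY user_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```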

Select promoted items grouped by another attribute

From table like below:
id | node_id | promoted | group_type | created_at |status
------------------------------------------------------------------
8 | 4321 | 1 | 3 | 2018-01-08 13:29:55| 1
4 | 4321 | 0 | 3 | 2018-01-06 11:22:53| 1
3 | 4321 | 0 | 1 | 2018-01-05 23:19:02| 1
2 | 4321 | 1 | 1 | 2018-01-05 21:20:15| 1
1 | 4321 | 1 | 3 | 2018-01-05 11:09:51| 1
I have to get one id and group_type value for each group_type.
If there is a promoted item in the group, the query should return its id and group_type.
If there is more than one promoted item in the group, the most recent promoted record should be returned.
If there is no promoted item in the group, the query should return the most recent record.
Using query below I managed to get almost what I need
SELECT a.id, a.group_type, a.promoted, a.created_at
FROM (
SELECT group_type, MAX(promoted) AS max_promoted
FROM nodes
WHERE node_id=4321 AND status=1
GROUP BY group_type
) AS g
INNER JOIN nodes AS a
ON a.group_type = g.group_type AND a.promoted = g.max_promoted
WHERE node_id= 4321 AND status=1 ORDER BY created_at
Unfortunately when there is more than one promoted item in the group I get both.
Any idea how to get only one promoted item per group?
EDIT:
If there is more than one group, query should return multiple rows but one per every group.
You can limit the result of the query by adding LIMIT 0,1 at the end of the query.
As you have ordered your result, it will work.
For more information about LIMIT see : https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html
Edited: You should order the items in descending order to get the latest one on top, and limit the result to as many rows as required (1, 2 and so on). The union also lets you fall back to the latest record when there is no promoted one. The final limit returns only the single required row. Here's your query:
(SELECT a.id, a.group_type, a.promoted, a.created_at
FROM (
SELECT group_type, MAX(promoted) AS max_promoted
FROM nodes
WHERE node_id=4321 and status=1
GROUP BY group_type
) AS g
INNER JOIN nodes AS a
ON a.group_type = g.group_type AND a.promoted = g.max_promoted
WHERE node_id= 4321 and status=1 ORDER BY created_at desc
limit 1)
union
(select a.id, a.group_type, a.promoted, a.created_at from nodes a order by created_at desc limit 1)
limit 1
Hope it helps!
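Since the edit to the question asks for one row per group rather than one row overall, here is a hedged alternative sketch that ranks the rows inside each group_type with ROW_NUMBER() (MySQL 8.0+). It is run through Python's sqlite3 module as a stand-in for MySQL. The rows with id 9 and 10 (group_type 5) are hypothetical and were added only to exercise the "no promoted item" case; everything else comes from the question.

```python
import sqlite3

# SQLite stands in for MySQL; ROW_NUMBER() needs MySQL 8.0+ or SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INT, node_id INT, promoted INT, "
    "group_type INT, created_at TEXT, status INT)"
)
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?, ?, ?, ?, ?)",
    [
        (8, 4321, 1, 3, "2018-01-08 13:29:55", 1),
        (4, 4321, 0, 3, "2018-01-06 11:22:53", 1),
        (3, 4321, 0, 1, "2018-01-05 23:19:02", 1),
        (2, 4321, 1, 1, "2018-01-05 21:20:15", 1),
        (1, 4321, 1, 3, "2018-01-05 11:09:51", 1),
        # hypothetical rows: a group without any promoted item
        (9, 4321, 0, 5, "2018-01-09 08:00:00", 1),
        (10, 4321, 0, 5, "2018-01-10 08:00:00", 1),
    ],
)

QUERY = """
SELECT id, group_type
FROM (
    SELECT id, group_type,
           -- promoted rows sort first, then the most recent row wins
           ROW_NUMBER() OVER (
               PARTITION BY group_type
               ORDER BY promoted DESC, created_at DESC
           ) AS rn
    FROM nodes
    WHERE node_id = 4321 AND status = 1
) AS ranked
WHERE rn = 1
ORDER BY group_type
"""

picked = conn.execute(QUERY).fetchall()
print(picked)
```

Sorting by promoted DESC first and created_at DESC second covers all three rules from the question in a single ordering, so no UNION or LIMIT is needed.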

MySQL SUM previous row by date column using Union

I am hoping I am just stumped because it's the end of the work day on a Monday, and someone here can give me a hand.
Basically I have 2 tables that have invoice information and a table that has payment information. Using the following I get the first part of my display.
SELECT d.id, i.id as invid, i.company_id, d.total, created, adjustment FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
WHERE company_id = '69350'
UNION
SELECT id, 0, comp_id, amount_paid, uploaded_date, 'paid' FROM tbl_finance_invoice_paid_items
WHERE comp_id = '69350'
ORDER BY created
What I want to do is:
Create a new column called "Balance" that adds total to the previous total by the created column regardless of how the rest of the table is sorted.
To give a quick example, my current output is something like:
id | invid | company_id | total | created | adjustment
12 | 16 | 1 | 40 | 01/01/16| 0
100| 0 | 1 | 10 | 01/05/16| 0
50 | 20 | 1 | 50 | 05/01/16| 0
What my goal is would be:
id | invid | company_id | total | created | adjustment | balance |Notes
12 | 16 | 1 | 40 | 01/01/16| 0 | 40 | 0 + 40
100| 0 | 1 | 10 | 01/05/16| 1 | 50 | 40 + 10
50 | 20 | 1 | 50 | 05/01/16| 0 | 100 | 50 + 50
And regardless of sorting by id, invid, total, created, etc, the balance would always be tied to the created date.
So if I added a "Where adjustment = '1'" to my sql, I would get:
100| 0 | 1 | 10 | 01/05/16| 1 | 50 | 40 + 10
Since the OP confirmed my understanding in comments, I'm basing my answer on the following assumption:
The running total would be tied to the order of created_date. The
running total would only be affected by company id as a filtering
criterion, all other filters should be disregarded for that
calculation.
Since the running total may have a different ORDER BY and different filtering criteria than the rest of the query, the running total calculation has to be placed in a subquery.
The other assumption I have to make is that there cannot be more than one invoice with the same created date for a single customer id, since the original query in the OP does not have any group by or summing either.
I prefer to use the approach suggested by @OMG Ponies in this post on SO, where he initializes the MySQL variable holding the running total in a subquery, so there is no need to initialize the variable in a separate SET statement.
SELECT d.id, i.id as invid, i.company_id, rt.total, rt.cumulative_sum, rt.created, adjustment
FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
LEFT JOIN
(SELECT d.total, created, @running_total := @running_total + d.total AS cumulative_sum
FROM tbl_finance_invoices as i
LEFT JOIN tbl_finance_invoice_details as d ON d.invoice_id = i.id
JOIN (SELECT @running_total := 0) r -- no join condition, so this produces a Cartesian join
WHERE company_id = '69350'
ORDER BY created) rt
ON i.created=rt.created -- this is also an assumption, I do not know which original table holds the created field
WHERE company_id = '69350' and adjustment=1
ORDER BY d.id
If you need to take the amounts from the tbl_finance_invoice_paid_items into account as well, then you need to add that to the subquery.
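On MySQL 8.0+ the running total could also be computed with SUM() OVER instead of a user variable. The sketch below illustrates the key point of this answer (compute the balance in a subquery first, apply the other filters outside). It uses Python's sqlite3 module as a stand-in for MySQL, and a single hypothetical table called ledger that stands for the combined invoice and payment rows from the question. The dates are the sample dates rewritten in ISO format, assuming they were MM/DD/YY.

```python
import sqlite3

# SQLite stands in for MySQL; window functions need MySQL 8.0+ or SQLite 3.25+.
conn = sqlite3.connect(":memory:")
# "ledger" is a hypothetical table standing for the UNION result in the question
conn.execute(
    "CREATE TABLE ledger (id INT, invid INT, company_id INT, "
    "total INT, created TEXT, adjustment INT)"
)
conn.executemany(
    "INSERT INTO ledger VALUES (?, ?, ?, ?, ?, ?)",
    [
        (12, 16, 1, 40, "2016-01-01", 0),
        (100, 0, 1, 10, "2016-01-05", 1),
        (50, 20, 1, 50, "2016-05-01", 0),
    ],
)

# The balance is computed over all rows of the company, ordered by created.
RUNNING = """
SELECT id, total, adjustment, created,
       SUM(total) OVER (ORDER BY created, id) AS balance
FROM ledger
WHERE company_id = 1
"""

# Other filters go in the outer query, so they do not change the balance.
all_rows = conn.execute(
    f"SELECT id, total, balance FROM ({RUNNING}) AS rt ORDER BY created"
).fetchall()
adjusted = conn.execute(
    f"SELECT id, total, balance FROM ({RUNNING}) AS rt WHERE adjustment = 1"
).fetchall()

print(all_rows)
print(adjusted)
```

Adding id as a tie-breaker in the window ORDER BY is a choice of this sketch, so that two rows with the same created value still get distinct balances.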

How to include dates with zero messages into the resultset anyway?

I have the following table with messages:
+---------+---------+------------+----------+
| msg_id | user_id | m_date | m_time |
+---------+---------+------------+----------+
| 1 | 1 | 2011-01-22 | 06:23:11 |
| 2 | 1 | 2011-01-23 | 16:17:03 |
| 3 | 1 | 2011-01-23 | 17:05:45 |
| 4 | 2 | 2011-01-22 | 23:58:13 |
| 5 | 2 | 2011-01-23 | 23:59:32 |
| 6 | 2 | 2011-01-24 | 21:02:41 |
| 7 | 3 | 2011-01-22 | 13:45:00 |
| 8 | 3 | 2011-01-23 | 13:22:34 |
| 9 | 3 | 2011-01-23 | 18:22:34 |
| 10 | 3 | 2011-01-24 | 02:22:22 |
| 11 | 3 | 2011-01-24 | 13:12:00 |
+---------+---------+------------+----------+
What I want is for each day, to see how many messages each user has sent BEFORE and AFTER 16:00:
SELECT
user_id,
m_date,
SUM(m_time <= '16:00') AS before16,
SUM(m_time > '16:00') AS after16
FROM messages
GROUP BY user_id, m_date
ORDER BY user_id, m_date ASC
This produces:
user_id m_date before16 after16
-------------------------------------
1 2011-01-22 1 0
1 2011-01-23 0 2
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
Because user 1 has written no messages on 2011-01-24, this date is not in the resultset. However, this is undesirable. I have a second table in my database, called "date_range":
+---------+------------+
| date_id | d_date |
+---------+------------+
| 1 | 2011-01-21 |
| 1 | 2011-01-22 |
| 1 | 2011-01-23 |
| 1 | 2011-01-24 |
+---------+------------+
I want to check the "messages" against this table. For each user, all these dates have to be in the resultset. As you can see, none of the users have written messages on 2011-01-21, and as said, user 1 has no messages on 2011-01-24. The desired output of the query would be:
user_id d_date before16 after16
-------------------------------------
1 2011-01-21 0 0
1 2011-01-22 1 0
1 2011-01-23 0 2
1 2011-01-24 0 0
2 2011-01-21 0 0
2 2011-01-22 0 1
2 2011-01-23 0 1
2 2011-01-24 0 1
3 2011-01-21 0 0
3 2011-01-22 1 0
3 2011-01-23 1 1
3 2011-01-24 2 0
How can I link the two tables so that the query result also holds rows with zero values for before16 and after16?
Edit: yes, I have a "users" table:
+---------+------------+
| user_id | user_date |
+---------+------------+
| 1 | foo |
| 2 | bar |
| 3 | foobar |
+---------+------------+
Test bed:
create table messages (msg_id integer, user_id integer, _date date, _time time);
create table date_range (date_id integer, _date date);
insert into messages values
(1,1,'2011-01-22','06:23:11'),
(2,1,'2011-01-23','16:17:03'),
(3,1,'2011-01-23','17:05:05');
insert into date_range values
(1, '2011-01-21'),
(1, '2011-01-22'),
(1, '2011-01-23'),
(1, '2011-01-24');
Query:
SELECT p._date, p.user_id,
coalesce(m.before16, 0) b16, coalesce(m.after16, 0) a16
FROM
(SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr) p
LEFT JOIN
(SELECT user_id, _date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
ORDER BY user_id, _date ASC) m
ON p.user_id = m.user_id AND p._date = m._date;
EDIT:
Your initial query is left as is; I hope it doesn't require any explanation;
SELECT DISTINCT user_id, dr._date FROM messages m, date_range dr returns a Cartesian product (CROSS JOIN) of the two tables, which gives every date in the range for each user. As I'm interested in each pair only once, I use the DISTINCT clause. Try this query with and without it;
Then I use LEFT JOIN on two sub-selects.
This join means: first, an INNER join is performed, i.e. all rows with matching fields in the ON condition are returned. Then, each row of the left-side relation that has no match on the right side is returned as well, with NULLs in the right-side columns (hence the name LEFT JOIN: the left relation is always there, while the right one may be NULL). This join does what you expect: it returns user_id + date combinations even if there were no messages on the given date for a given user. Note that I put the user_id + date sub-select first (on the left) and the messages query second (on the right);
coalesce() is used to replace NULL with zero.
I hope this clarifies how this query works.
Give this a shot:
select u.user_id, u._date,
coalesce(sum(m._time <= '16:00'), 0) as before16,
coalesce(sum(m._time > '16:00'), 0) as after16
from (
select m.user_id, d._date
from messages m
cross join date_range d
group by m.user_id, d._date
) u
left join messages m on u.user_id=m.user_id
and u._date=m._date
group by u.user_id, u._date
The inner query is just building a set of all possible/desired user-date pairs. It would be more efficient to use a users table, but you didn't mention that you had one, so I won't assume. Otherwise, you just need the left join so that the non-joined records are not removed.
EDIT
--More detailed explanation: taking the query apart.
Start with the innermost query; the goal is to get a list of all desired dates for every user. Since there's a table of users and a table of dates it can look like this:
select distinct u.user_id, d.d_date
from users u
cross join date_range d
The key here is the cross join, taking every row in the users table and associating it with every row in the date_range table. The distinct keyword is really just a shorthand for a group by on all columns, and is here just in case there's duplicated data.
Note that there are several other methods of getting this same result set (like in my original query), but this is probably the simplest from both a logical and computational standpoint.
Really, the only other steps are to add the left join (associating all of the rows we got above to all available data, and not removing anything that doesn't have any data) and the group by and select components which are basically the same as you had before. So, putting everything together it looks like this:
select t.user_id, t.d_date,
coalesce(sum(m.m_time <= '16:00'), 0) as before16,
coalesce(sum(m.m_time > '16:00'), 0) as after16
from (
select distinct u.user_id, d.d_date
from users u
cross join date_range d
) t
left join messages m on t.user_id = m.user_id
and t.d_date = m.m_date
group by t.user_id, t.d_date
Based on some other comments/questions, note the explicit use of prefixes for all uses of all tables and sub-queries (which is pretty straight forward since we're not using any table more than once anymore): u for the users table, d for the date_range table, t for the sub-query containing the dates to use for each user, and m for the message table. This is probably where my first explanation fell a little short, since I used the message table twice, both times with the same prefix. It works there because of the context of both uses (one was in a sub-query), but it probably isn't the best practice.
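To check the cross join plus left join approach against the desired output in the question, here is a runnable sketch with the question's data. It uses Python's sqlite3 module as a stand-in for MySQL (the query text itself should be valid in both). COALESCE is there because SUM over a group with no matching messages yields NULL rather than 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INT, user_date TEXT);
CREATE TABLE date_range (date_id INT, d_date TEXT);
CREATE TABLE messages (msg_id INT, user_id INT, m_date TEXT, m_time TEXT);
INSERT INTO users VALUES (1, 'foo'), (2, 'bar'), (3, 'foobar');
INSERT INTO date_range VALUES
  (1, '2011-01-21'), (1, '2011-01-22'), (1, '2011-01-23'), (1, '2011-01-24');
INSERT INTO messages VALUES
  (1, 1, '2011-01-22', '06:23:11'), (2, 1, '2011-01-23', '16:17:03'),
  (3, 1, '2011-01-23', '17:05:45'), (4, 2, '2011-01-22', '23:58:13'),
  (5, 2, '2011-01-23', '23:59:32'), (6, 2, '2011-01-24', '21:02:41'),
  (7, 3, '2011-01-22', '13:45:00'), (8, 3, '2011-01-23', '13:22:34'),
  (9, 3, '2011-01-23', '18:22:34'), (10, 3, '2011-01-24', '02:22:22'),
  (11, 3, '2011-01-24', '13:12:00');
""")

QUERY = """
SELECT t.user_id, t.d_date,
       -- SUM is NULL when no message matched, so fall back to 0
       COALESCE(SUM(m.m_time <= '16:00'), 0) AS before16,
       COALESCE(SUM(m.m_time > '16:00'), 0) AS after16
FROM (
    -- every user paired with every date
    SELECT DISTINCT u.user_id, d.d_date
    FROM users u
    CROSS JOIN date_range d
) t
LEFT JOIN messages m
       ON t.user_id = m.user_id AND t.d_date = m.m_date
GROUP BY t.user_id, t.d_date
ORDER BY t.user_id, t.d_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```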
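It is not neat, but if you have a users table, then maybe something like this:
SELECT
user_id,
_date,
SUM(before16) AS before16,
SUM(after16) AS after16
FROM (
-- real counts per user and date
SELECT user_id, _date,
SUM(_time <= '16:00') AS before16,
SUM(_time > '16:00') AS after16
FROM messages
GROUP BY user_id, _date
UNION ALL
-- one zero row for every user/date pair
SELECT user_id, _date, 0, 0
FROM users, date_range
) AS combined
GROUP BY user_id, _date
ORDER BY user_id, _date ASC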
chezy525's solution works great, I ported it to postgresql and removed/renamed some aliases:
select users_and_dates.user_id, users_and_dates._date,
SUM(case when _time <= '16:00' then 1 else 0 end) as before16,
SUM(case when _time > '16:00' then 1 else 0 end) as after16
from (
select messages.user_id, date_range._date
from messages
cross join date_range
group by messages.user_id, date_range._date
) users_and_dates
left join messages on users_and_dates.user_id=messages.user_id
and users_and_dates._date=messages._date
group by users_and_dates.user_id, users_and_dates._date;
I ran it on my machine and it worked perfectly.

MySQL: How to return only one row based on criteria within a resultset

I have the following table:
id | group | value
1 | 1 | 10
2 | 1 | 20
3 | 1 | 30
4 | 0 | 20
5 | 0 | 20
6 | 0 | 10
I want to return the highest value where the group is 1 (=30) and all of the values where the group is 0, into one resultset.
I have to do this in one statement, and I guess I should use an IF statement within a SELECT statement, but I can't work out how. Can anyone help to point me in the right direction?
(select max(value) from the_table where `group` = 1)
union all
(select value from the_table where `group` = 0)
If (group +value) is unique, you can also do it without union (as proposed by Ray Toal)
SELECT a.value
FROM table1 a
WHERE a.`group`=0 or (a.`group`=1 AND a.value =
(SELECT MAX(value) FROM table1 b WHERE b.`group`=1))
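One detail worth checking with the sample data is the difference between UNION and UNION ALL, because plain UNION removes duplicate rows and group 0 contains the value 20 twice. The sketch below runs through Python's sqlite3 module as a stand-in for MySQL. The table name the_table is taken from the first query above, and the parentheses around each SELECT are dropped only because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `group` is a reserved word, so it is quoted with backticks as in MySQL
conn.execute("CREATE TABLE the_table (id INT, `group` INT, value INT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 20), (3, 1, 30), (4, 0, 20), (5, 0, 20), (6, 0, 10)],
)

TEMPLATE = """
SELECT MAX(value) FROM the_table WHERE `group` = 1
{op}
SELECT value FROM the_table WHERE `group` = 0
"""

# UNION ALL keeps every row of group 0; UNION collapses equal values.
# The results are sorted in Python because a compound SELECT has no guaranteed order.
with_all = sorted(r[0] for r in conn.execute(TEMPLATE.format(op="UNION ALL")))
with_distinct = sorted(r[0] for r in conn.execute(TEMPLATE.format(op="UNION")))

print(with_all)
print(with_distinct)
```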