I have the following sample data:
order_id receipt_id receipt_amount total_already_invoiced_amount
14 36 30 150
14 37 30 150
15 42 30 30
16 45 30 60
16 46 40 60
17 50 40 60
17 51 40 60
17 52 40 60
The column receipt_amount is the amount of an order received for that specific line.
The column total_already_invoiced_amount is the total amount invoiced for an order.
I want to transform this table into a new one which retains only the lines where there is a received amount which is remaining after deducting the total invoiced amount (first in first out).
For example, if I have 3 receipt lines, each of 40, and my total invoiced is 60, then I can figure out that the first receipt line is fully invoiced, the second receipt line has 20 remaining to be invoiced and the third one has not been invoiced at all. I cannot aggregate, I must keep the receipt_id as an index as these can have different dates and I need to be able to distinguish according to that.
The result of such query would be the following:
order_id receipt_id received_not_invoiced_amount
16 46 10
17 51 20
17 52 40
I understand I can select group by order_id to get the aggregated receipt_amount, but it will also aggregate the total_already_invoiced_amount, which is not what I want. I am trying the following but that will not perform the FIFO calculation....
SELECT order_id,
receipt_id,
(total_already_invoiced_amount -
(SELECT receipt_amount FROM X GROUP BY order_id)
) total_already_invoiced_amount
FROM X
WHERE (total_already_invoiced_amount -
(SELECT receipt_amount FROM X GROUP BY order_id)) < 0
I'm a bit lost as to where to start to make this work.
In the absence of window functions (not available in MySQL 5.7), one approach is to do a self-join and compute the sum of all receipts for the order up to and including the receipt row of the first table. We can then use conditional statements to determine the differences accordingly:
Query #1 View on DB Fiddle
SELECT t1.order_id,
t1.receipt_id,
CASE
WHEN Coalesce(Sum(t2.receipt_amount), 0) <=
t1.total_already_invoiced_amount
THEN 0
ELSE Least(Coalesce(Sum(t2.receipt_amount), 0) -
t1.total_already_invoiced_amount,
t1.receipt_amount)
end AS received_not_invoiced_amount
FROM X t1
LEFT JOIN X t2
ON t2.order_id = t1.order_id
AND t2.receipt_id <= t1.receipt_id
GROUP BY t1.order_id,
t1.receipt_id,
t1.receipt_amount,
t1.total_already_invoiced_amount
HAVING received_not_invoiced_amount > 0;
| order_id | receipt_id | received_not_invoiced_amount |
| -------- | ---------- | ---------------------------- |
| 16 | 46 | 10 |
| 17 | 51 | 20 |
| 17 | 52 | 40 |
For good performance, you can define the following composite index: (order_id, receipt_id).
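To sanity-check the FIFO logic outside the database, here is a minimal Python sketch of the same running-sum rule used in Query #1. The function name and the tuple layout are assumptions made for illustration; the rows are the sample data from the question.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question:
# (order_id, receipt_id, receipt_amount, total_already_invoiced_amount)
ROWS = [
    (14, 36, 30, 150), (14, 37, 30, 150),
    (15, 42, 30, 30),
    (16, 45, 30, 60), (16, 46, 40, 60),
    (17, 50, 40, 60), (17, 51, 40, 60), (17, 52, 40, 60),
]


def received_not_invoiced(rows):
    """Return (order_id, receipt_id, remaining) for receipts not fully invoiced."""
    result = []
    # FIFO: within each order, the lowest receipt_id is consumed first.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for order_id, group in groupby(ordered, key=itemgetter(0)):
        running = 0  # cumulative receipt_amount within this order
        for _, receipt_id, amount, invoiced in group:
            running += amount
            # Same rule as the CASE ... LEAST(...) expression in Query #1.
            remaining = 0 if running <= invoiced else min(running - invoiced, amount)
            if remaining > 0:
                result.append((order_id, receipt_id, remaining))
    return result


print(received_not_invoiced(ROWS))
# [(16, 46, 10), (17, 51, 20), (17, 52, 40)]
```

The sketch makes a single pass per order, which is essentially what the user-variable and window-function variants do; the self-join instead recomputes the running sum for every receipt.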
Another approach is to use user-defined variables. This works like a loop: we calculate a rolling (cumulative) sum per order_id as we move down the receipts, and based on that sum we determine whether part of a receipt is still not invoiced. For a more detailed explanation of this technique, you may check this answer: https://stackoverflow.com/a/53465139
Query #2 View on DB Fiddle
SELECT order_id,
receipt_id,
received_not_invoiced_amount
FROM (SELECT @s := IF(@o = order_id, @s + receipt_amount, receipt_amount) AS
cum_receipt_amount,
IF(@s <= total_already_invoiced_amount, 0,
Least(@s - total_already_invoiced_amount, receipt_amount)) AS
received_not_invoiced_amount,
@o := order_id AS order_id,
receipt_id
FROM (SELECT *
FROM X
ORDER BY order_id,
receipt_id) t1
CROSS JOIN (SELECT @o := 0,
@s := 0) vars) t2
WHERE received_not_invoiced_amount > 0;
| order_id | receipt_id | received_not_invoiced_amount |
| -------- | ---------- | ---------------------------- |
| 16 | 46 | 10 |
| 17 | 51 | 20 |
| 17 | 52 | 40 |
For good performance, you can define the same composite index: (order_id, receipt_id).
You may want to benchmark both approaches to see which performs better on your data.
You want a cumulative sum:
select order_id, receipt_id,
least(running_ra - total_already_invoiced_amount, receipt_amount) as received_not_invoiced_amount
from (select x.*,
sum(receipt_amount) over (partition by order_id order by receipt_id) as running_ra
from x
) x
where running_ra > total_already_invoiced_amount
I have a MySQL database that keeps track of all the users' daily total scores (and some other similar score/count type metrics like "badgesEarned"; I am only including 2 of the 5 fields I need to track here). It only has data for the days on which a user was active (earning score points or badges), so the db won't have data for every date.
Here's a toy example:
Example Database Table: "User"
Now my goal is to get the last 7 days' change in score for each user (I also need to do the last 30 days and 365 days, but let's stick to just 7 for this example). Since the db table stores a snapshot of total scores for all active days for each user, I wrote a SQL query that finds the two appropriate rows/snapshots and gets the difference in score/badges between them. These 2 rows would be the current date row (or if that doesn't exist, the row just prior to it) vs the (current_date - 7)th row (or if that doesn't exist, the row just prior to it).
To make matters worse, I also have to keep track of the "ranks" of each player via the dense_rank() SQL method and add that in as a column in the final result table.
There are 2 ways so far that I can achieve this using 2 different SQL queries.
My main question is - is one of these "better" in terms of performance/good practice/efficiency than the other? Or are they both horrendous and I have completely gone down the wrong route to begin with and totally missed a more efficient approach? I am not great with SQL stuff, so apologies in advance if the question and code examples are horrifying:
First Approach:
Use multiple nested subqueries only (no join).
SELECT *, dense_rank() OVER (ORDER BY t3.score DESC) AS ranking
FROM
(
SELECT t1.userId,
(SELECT t2.score
FROM User t2
WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1)
-
(SELECT t2.score
FROM User t2
WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1) as score,
(SELECT t2.badgesEarned
FROM User t2
WHERE t2.date <= CURDATE() AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1)
-
(SELECT t2.badgesEarned
FROM User t2
WHERE t2.date <= DATE_ADD(CURDATE(), INTERVAL - 7 DAY) AND t2.userId=t1.userId
ORDER BY t2.date DESC LIMIT 1) as badgesEarned
FROM User t1
GROUP BY t1.userId) t3
Second Approach:
Get 2 separate tables for each date point, then do Inner Join to subtract relevant columns.
SELECT *, dense_rank() OVER (ORDER BY T0.score_delta DESC) AS ranking
FROM
(SELECT T1.userId,
(T1.score - T2.score) AS score_delta,
(T1.badgesEarned - T2.badgesEarned) AS badgesEarned_delta
FROM
(select *
from (
select *, row_number() over (partition by userId order by date desc) as ranking
from User
where date<=date_add(CURDATE(),interval -7 day)
) t
where t.ranking = 1) as T2
INNER JOIN
(select *
from (
select *, row_number() over (partition by userId order by date desc) as ranking
from User
where date<=CURDATE()
) t
where t.ranking = 1) as T1
on T1.userId= T2.userId ) T0
Side-question: One of my colleagues was suggesting that I handle the column subtractions in the code itself - like, I would call the database twice, get the two tables (one for CURDATE() and another for CURDATE-7), and then loop through all the User objects and subtract the relevant fields to construct my final result list. I'm not sure if that would be the better approach, so should I be doing that instead of handling it all through the SQL way?
Here's the SQLfiddle of the db if you want to play around with dummy data: http://sqlfiddle.com/#!9/86c58f0/1
Also, the above two code segments run just fine on my MySQL 8.0 workbench with no errors.
I'm not quite getting your expected results. But could you not just work with window functions, in conjunction with the RANGE clause?
I'm just creating the central backbone table, and it will then be up to you to subtract whatever you need to subtract from each other, and finally to dense_rank() what you need to dense_rank(). Basically, I think you need to put a final select, containing DENSE_RANK() , to select from my with_a_week_before in-line table.
WITH
-- your input
usr(userid,dt,score,badgesearned) AS (
SELECT 1234,DATE '2020-08-06', 100, 10
UNION ALL SELECT 1234,DATE '2020-08-07', 120, 12
UNION ALL SELECT 1234,DATE '2020-08-08', 130, 13
UNION ALL SELECT 1234,DATE '2020-08-12', 140, 14
UNION ALL SELECT 1234,DATE '2020-08-14', 150, 15
UNION ALL SELECT 100,DATE '2020-08-05', 100, 10
UNION ALL SELECT 100,DATE '2020-08-10', 100, 10
UNION ALL SELECT 100,DATE '2020-08-14', 200, 10
UNION ALL SELECT 1,DATE '2020-08-05', 140, 14
UNION ALL SELECT 1,DATE '2020-08-08', 145, 14
UNION ALL SELECT 1,DATE '2020-08-12', 150, 15
)
,
with_a_week_before AS (
SELECT
*
, FIRST_VALUE(score) OVER(
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS score_a_week
, FIRST_VALUE(badgesearned) OVER(
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS badgesearned_a_week
, FIRST_VALUE(dt) OVER( -- check the date of the previous row
PARTITION BY userid ORDER BY dt
RANGE BETWEEN INTERVAL '7 DAYS' PRECEDING AND CURRENT ROW
) AS dt_a_week
FROM usr
)
SELECT * FROM with_a_week_before ORDER BY userid
-- out userid | dt | score | badgesearned | score_a_week | badgesearned_a_week | dt_a_week
-- out --------+------------+-------+--------------+--------------+---------------------+------------
-- out 1 | 2020-08-05 | 140 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-08 | 145 | 14 | 140 | 14 | 2020-08-05
-- out 1 | 2020-08-12 | 150 | 15 | 140 | 14 | 2020-08-05
-- out 100 | 2020-08-05 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-10 | 100 | 10 | 100 | 10 | 2020-08-05
-- out 100 | 2020-08-14 | 200 | 10 | 100 | 10 | 2020-08-10
-- out 1234 | 2020-08-06 | 100 | 10 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-07 | 120 | 12 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-08 | 130 | 13 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-12 | 140 | 14 | 100 | 10 | 2020-08-06
-- out 1234 | 2020-08-14 | 150 | 15 | 120 | 12 | 2020-08-07
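For comparison, the rule stated in the question (take the latest snapshot on or before today, the latest snapshot on or before today minus 7 days, subtract, then dense-rank the deltas) can be sketched in plain Python. This is only an illustration: the function names are made up, only the score column is included, and "today" is passed in as a parameter where the SQL would use CURDATE(). The data is copied from the usr CTE above.

```python
from bisect import bisect_right
from datetime import date, timedelta

# userid -> list of (snapshot date, total score), sorted by date.
SNAPSHOTS = {
    1234: [(date(2020, 8, 6), 100), (date(2020, 8, 7), 120), (date(2020, 8, 8), 130),
           (date(2020, 8, 12), 140), (date(2020, 8, 14), 150)],
    100: [(date(2020, 8, 5), 100), (date(2020, 8, 10), 100), (date(2020, 8, 14), 200)],
    1: [(date(2020, 8, 5), 140), (date(2020, 8, 8), 145), (date(2020, 8, 12), 150)],
}


def score_as_of(history, day):
    """Score of the latest snapshot dated on or before `day`, or None if none exists."""
    dates = [d for d, _ in history]
    i = bisect_right(dates, day)
    return history[i - 1][1] if i else None


def seven_day_deltas(snapshots, today):
    """Map userid -> score change over the last 7 days."""
    deltas = {}
    for user, history in snapshots.items():
        now = score_as_of(history, today)
        before = score_as_of(history, today - timedelta(days=7))
        # Users lacking either snapshot are dropped, like an INNER JOIN would.
        if now is not None and before is not None:
            deltas[user] = now - before
    return deltas


def dense_rank(deltas):
    """Rank users by delta, descending, without gaps after ties."""
    distinct = sorted(set(deltas.values()), reverse=True)
    return {user: distinct.index(delta) + 1 for user, delta in deltas.items()}


deltas = seven_day_deltas(SNAPSHOTS, date(2020, 8, 14))
print(deltas)              # {1234: 30, 100: 100, 1: 10}
print(dense_rank(deltas))  # {1234: 2, 100: 1, 1: 3}
```

This is roughly what the colleague's "do the subtraction in application code" suggestion would look like; whether it beats doing everything in SQL depends on how many rows have to be transferred.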
I have a table in a MySQL database with this data:
id date number qty
114 07-10-2018 200 5
120 01-12-2018 300 10
123 03-02-2019 700 12
1126 07-03-2019 1000 15
I want to calculate the difference between two consecutive rows, and I need the output format to be like:
id date number diff qty avg
114 07-10-2018 200 0 5 0
120 01-12-2018 300 100 10 10
123 03-02-2019 700 400 12 33.33
1126 07-03-2019 1000 300 15 20
Does anyone know how to do this in a MySQL query? I want the first value of the diff and avg columns to be 0 and the rest to be the difference.
For MySQL 8, use the LAG window function.
SELECT
test.id,
test.date,
test.number,
test.qty,
IFNULL(test.number - LAG(test.number) OVER w, 0) AS diff,
ROUND(IFNULL(test.number - LAG(test.number) OVER w, 0)/ test.qty, 2) AS 'Avg'
FROM purchases test
WINDOW w AS (ORDER BY test.`date` ASC);
For MySQL 5.7 or earlier versions
We can use MySQL user variables to do this job. Assume your table name is test.
SELECT
test.id,
test.date,
test.number,
test.qty,
@diff:= IF(@prev_number = 0, 0, test.number - @prev_number) AS diff,
ROUND(@diff / qty, 2) 'avg',
@prev_number:= test.number as dummy
FROM
test,
(SELECT @prev_number:= 0 AS num) AS b
ORDER BY test.`date` ASC;
-------------------------------------------------------------------------------
Output:
| id | date | number| qty | diff | avg | dummy |
-----------------------------------------------------------------
| 114 | 2018-10-07 | 200 | 5 | 0 | 0.00 | 200 |
| 120 | 2018-12-01 | 300 | 10 | 100 | 10.00 | 300 |
| 123 | 2019-02-03 | 700 | 12 | 400 | 33.33 | 700 |
| 1126 | 2019-03-07 | 1000 | 15 | 300 | 20.00 | 1000 |
Explanation:
(SELECT @prev_number:= 0 AS num) AS b
We initialize the variable @prev_number to zero in the FROM clause and join it with each row of the test table.
@diff:= IF(@prev_number = 0, 0, test.number - @prev_number) AS diff
First we compute the difference and store it in another variable, @diff, so that it can be reused for the average calculation. The condition makes the diff for the first row zero.
@prev_number:= test.number as dummy
We set this variable to the current number so that it can be used by the next row.
Note: We have to use this variable first, in both the difference and the average, and only then set it to the new value, so that the next row can access the value from the previous row.
You can skip or modify the ORDER BY clause as per your requirements.
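The "read the previous value first, then overwrite it" ordering described above is easier to see in a plain loop. The following Python sketch is an illustration only: the function name is made up, and it uses None as the "no previous row" marker where the SQL uses 0. The rows are the sample data from the question, with dates written in ISO form so they sort chronologically.

```python
# (id, date, number, qty)
ROWS = [
    (114, "2018-10-07", 200, 5),
    (120, "2018-12-01", 300, 10),
    (123, "2019-02-03", 700, 12),
    (1126, "2019-03-07", 1000, 15),
]


def with_diff_and_avg(rows):
    """Return (id, date, number, diff, qty, avg) per row, ordered by date."""
    out = []
    prev_number = None  # plays the role of @prev_number
    for row_id, day, number, qty in sorted(rows, key=lambda r: r[1]):
        # Use the previous value first ...
        diff = 0 if prev_number is None else number - prev_number
        out.append((row_id, day, number, diff, qty, round(diff / qty, 2)))
        # ... and only then overwrite it for the next row.
        prev_number = number
    return out


for row in with_diff_and_avg(ROWS):
    print(row)
```

Swapping the two commented steps would make every diff zero, which is the same failure you get in SQL if the assignment to the variable is evaluated before the expressions that read it.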
There could be better ways to do this, but try this:
SELECT A.id,
A.date,
A.number,
A.qty,
A.diff,
B.avg
FROM
(SELECT *, abs(LAG(number, 1, number) OVER (ORDER BY id) - number) AS 'diff'
FROM table) AS A
JOIN
(SELECT *, abs(LAG(number, 1, number) OVER (ORDER BY id) - number)/qty AS 'avg' FROM table) AS B
ON A.id = B.id;
I have the following SQL query
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
AND (`id` =
(
SELECT `id`
FROM `sensor_data` AS `sd2`
WHERE sd1.mid = sd2.mid
AND sd1.sid = sd2.sid
ORDER BY `value` DESC, `id` DESC
LIMIT 1)
)
Background:
I've checked the validity of the query by changing LIMIT 1 to LIMIT 0, and the query works without any problem. However with LIMIT 1 the query doesn't complete, it just states loading until I shutdown and restart.
Breaking the Query down:
I have broken down the query with the date boundary as follows:
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
This takes about 0.24 seconds to return the query with 8200 rows each having 5 columns.
Question:
I suspect the second half of my query is not correct or not well optimized.
The tables are as follows:
Current Table:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 52 | 10 | 2 | 39 | 2015-05-13 11:56:25 |
| 53 | 10 | 2 | 40 | 2015-05-13 11:56:42 |
| 54 | 10 | 2 | 40 | 2015-05-13 11:56:45 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 57 | 11 | 2 | 18 | 2015-05-13 11:58:41 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 59 | 11 | 3 | 58 | 2015-05-13 11:59:01 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Q: How would I get the MAX(v) for each sid for each mid?
NB#1: In the example above ROW 53, 54, 55 have all the same value (40), but I would like to retrieve the row with the most recent timestamp, which is ROW 55.
Expected Output:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Structure of the table:
NB#2:
Since this table has over 110 million entries, it is critical to have date boundaries, which limit the query to ~8000 entries over a 24-hour period.
The query can be written as follows:
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
FROM yourtable t1
INNER JOIN (
SELECT mid, sid, MAX(v) as v
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid
) t2
ON t1.mid = t2.mid
AND t1.sid = t2.sid
AND t1.v = t2.v
INNER JOIN (
SELECT mid, sid, v, MAX(ts) as ts
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid, v
) t3
ON t1.mid = t3.mid
AND t1.sid = t3.sid
AND t1.v = t3.v
AND t1.ts = t3.ts;
Edit and Explanation:
The first sub-query (first INNER JOIN) fetches MAX(v) per (mid, sid) combination. The second sub-query is to identify MAX(ts) for every (mid, sid, v). At this point, the two queries do not influence each others' results. It is also important to note that ts date range selection is done in the two sub-queries independently such that the final query has fewer rows to examine and no additional WHERE filters to apply.
Effectively, this translates into getting MAX(v) per (mid, sid) combination initially (first sub-query); and if there is more than one record with the same value MAX(v) for a given (mid, sid) combo, then the excess records get eliminated by the selection of MAX(ts) for every (mid, sid, v) combination obtained by the second sub-query. We then simply associate the output of the two queries by the two INNER JOIN conditions to get to the id of the desired records.
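As a cross-check of the selection rule (highest v per (mid, sid), ties broken by the latest timestamp), here is a small Python sketch run against the sample rows from the question. The function name is an assumption for illustration, and timestamps are compared as strings, which works only because they are all in the same "YYYY-MM-DD HH:MM:SS" format.

```python
# (id, mid, sid, v, timestamp)
ROWS = [
    (51, 10, 1, 40, "2015-05-13 11:56:01"),
    (52, 10, 2, 39, "2015-05-13 11:56:25"),
    (53, 10, 2, 40, "2015-05-13 11:56:42"),
    (54, 10, 2, 40, "2015-05-13 11:56:45"),
    (55, 10, 2, 40, "2015-05-13 11:57:01"),
    (56, 11, 1, 50, "2015-05-13 11:57:52"),
    (57, 11, 2, 18, "2015-05-13 11:58:41"),
    (58, 11, 2, 19, "2015-05-13 11:58:59"),
    (59, 11, 3, 58, "2015-05-13 11:59:01"),
    (60, 11, 3, 65, "2015-05-13 11:59:29"),
]


def max_v_latest_ts(rows, start, end):
    """Per (mid, sid): the row with the highest v, ties broken by latest timestamp."""
    best = {}
    for row in rows:
        _id, mid, sid, v, ts = row
        if not (start <= ts <= end):  # BETWEEN is inclusive on both ends
            continue
        key = (mid, sid)
        # Comparing (v, ts) tuples applies both criteria in one step.
        if key not in best or (v, ts) > (best[key][3], best[key][4]):
            best[key] = row
    return sorted(best.values())  # sorted by id for readability


for row in max_v_latest_ts(ROWS, "2015-05-13 00:00:00", "2015-05-14 00:00:00"):
    print(row)
```

The ids returned are 51, 55, 56, 58 and 60, matching the expected output in the question.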
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.mid)
union
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.sid);
IN ( SELECT ... ) does not optimize well. It is even worse because of being correlated.
What you are looking for is a groupwise-max.
Please provide SHOW CREATE TABLE; we need to know at least what the PRIMARY KEY is.
Suggested code
You will need:
With the WHERE: INDEX(timestamp, mid, sid, v, id)
Without the WHERE: INDEX(mid, sid, v, timestamp, id)
Code:
SELECT id, mid, sid, v, timestamp
FROM ( SELECT @prev_mid := 99999, -- some value not in table
@prev_sid := 99999,
@n := 0 ) AS init
JOIN (
SELECT @n := if(mid != @prev_mid OR
sid != @prev_sid,
1, @n + 1) AS n,
@prev_mid := mid,
@prev_sid := sid,
id, mid, sid, v, timestamp
FROM sensor_data
WHERE timestamp >= '2017-05-13'
AND timestamp < '2017-05-13' + INTERVAL 1 DAY
ORDER BY mid DESC, sid DESC, v DESC, timestamp DESC
) AS x
WHERE n = 1
ORDER BY mid, sid; -- optional
Notes:
The index is 'composite' and 'covering'.
This should make one pass over the index, thereby providing 'good' performance.
The final ORDER BY is optional; the results may be in reverse order.
All the DESC in the inner ORDER BY must be in place to work correctly (unless you are using MySQL 8.0).
Note how the WHERE avoids including both midnights? And avoids manually computing leap-days, year-ends, etc?
With the WHERE (and associated INDEX), there will be filtering, but a 'sort' will be needed.
Without the WHERE (and the other INDEX), sort will not be needed.
You can test the performance of any competing formulations via this trick, even if you do not have enough rows (yet) to get reliable timings:
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
This can also be used to compare different versions of MySQL and MariaDB -- I have seen 3 significantly different performance characteristics in a related groupwise-max test.
I have 2 tables, SVISE and OVERW
Inside OVERW I have some scores with person ids and the date of that score.
E.g
p_id degrees mo_date
5 10.2 2013-10-09
5 9.85 2013-03-10
8 14.75 2013-04-25
8 11.00 2013-02-22
5 5.45 2013-08-11
5 6.2 2013-06-10
SVISE.ofh field must be updated with the sum of the last three records
(for a specific person, ordered by date descending), so for person with id 5, the sum would result from the rows
5 10.2 2013-10-09
5 5.45 2013-08-11
5 6.2 2013-06-10
sum=21.85.
Desired final result on SVISE, based on the values above:
HID OFH START
5 21.85 October, 16 2013 ##(10.2 + 5.45 + 6.2)
5 21.5 September, 07 2013 ##(5.45 + 6.2 + 9.85)
5 0 March, 05 2013 ##(no rows)
8 25.75 October, 14 2013 ##(14.75 + 11)
3 0 October, 14 2013 ##(no rows)
5 0 March, 05 2012 ##(no rows)
OFH was 0 initially.
I can get the total sum for a specific person, but I can't use limit to get the last 3 rows. It gets ignored.
This is the query I use to retrieve the sum of all degrees per person for a given date:
UPDATE SVISE SV
SET
SV.ofh=(SELECT sum(degrees) FROM OVERW WHERE p_id =SV.hid
AND date(mo_date)<date(SV.start)
AND year(mo_date)=year(SV.start))
I cannot just use limit with sum:
UPDATE SVISE SV
SET
SV.ofh=(SELECT sum(degrees) FROM OVERW WHERE p_id =SV.hid
AND date(mo_date)<date(SV.start)
AND year(mo_date)=year(SV.start)
ORDER BY mo_date DESC
LIMIT 3)
This does not work.
I have tried with multi-table updates and with nested queries to achieve this.
Every scenario has known limitations that block me from accomplishing the desired result.
Nested queries can't see the parent table: Unknown column 'SV.hid' in 'where clause'
Multi-table updates can't be used with LIMIT: Incorrect usage of UPDATE and LIMIT
Any solution will do. There is no need to do it in a single query. If anyone wants to try even with an intermediate table.
An SQL fiddle is also available.
Thanks in advance for your help.
--Update--
Here is the solution from Akash: http://sqlfiddle.com/#!2/4cf1a/1
This should work,
UPDATED to have a join on svice
UPDATE
svice SV
JOIN (
SELECT
hid,
start,
sum(degrees) as degrees
FROM
(
SELECT
*,
IF(@prev_row <> unix_timestamp(start)+P_ID, @row_number:=0,NULL),
@prev_row:=unix_timestamp(start)+P_ID,
@row_number:=@row_number+1 as row_number
FROM
(
SELECT
mo_date,
p_id,
hid,
start,
degrees
FROM
OVERW
JOIN svice sv ON ( p_id = hid
AND date(mo_date)<date(SV.start)
AND year(mo_date)=year(SV.start) )
ORDER BY
hid,
start,
mo_date desc
) sub_query1
JOIN ( select @row_number:=0, @prev_row:=0 ) sub_query2
) sub_query
where
row_number <= 3
GROUP BY
hid,
start
) sub_query ON ( sub_query.hid = sv.hid AND sub_query.start = sv.start )
SET
SV.ofh = sub_query.degrees
Note: Check this with your updated data, the test data provided could not yield the results you expected due to the date conditions
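To check the intended numbers independently of the UPDATE, the rule from the question (for each SVISE row, sum the three most recent OVERW rows of that person dated before start and in the same year) can be sketched in Python. The function name and argument layout are assumptions for illustration; the rows are the sample data from the question.

```python
from datetime import date

# (p_id, degrees, mo_date)
OVERW = [
    (5, 10.2, date(2013, 10, 9)),
    (5, 9.85, date(2013, 3, 10)),
    (8, 14.75, date(2013, 4, 25)),
    (8, 11.00, date(2013, 2, 22)),
    (5, 5.45, date(2013, 8, 11)),
    (5, 6.2, date(2013, 6, 10)),
]


def ofh(hid, start, overw=OVERW, limit=3):
    """Sum of the `limit` most recent degrees for `hid` before `start`, same year."""
    eligible = sorted(
        (mo_date, degrees)
        for p_id, degrees, mo_date in overw
        if p_id == hid and mo_date < start and mo_date.year == start.year
    )
    # The list is sorted oldest first, so the last `limit` entries are the most recent.
    return round(sum(degrees for _, degrees in eligible[-limit:]), 2)


print(ofh(5, date(2013, 10, 16)))  # 21.85
print(ofh(5, date(2013, 9, 7)))    # 21.5
print(ofh(8, date(2013, 10, 14)))  # 25.75
print(ofh(5, date(2013, 3, 5)))    # 0
```

These match the desired OFH values listed in the question, so they can serve as a reference when verifying whichever UPDATE statement you end up using.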
Try
UPDATE svice SV
JOIN (SELECT SUM(degrees)sumdeg,p_id FROM(SELECT DISTINCT degrees,p_id FROM OVERW,svice WHERE OVERW.p_id IN (SELECT svice.hid FROM svice)
AND date(mo_date)<date(svice.start)
AND year(mo_date)=year(svice.start)ORDER BY mo_date DESC )deg group by p_id)bbc
ON bbc.p_id=SV.hid
SET
SV.ofh=bbc.sumdeg where p_id =SV.hid
http://sqlfiddle.com/#!2/95b42/42
Getting closer, now it "only" needs a limit in GROUP BY.
Two assumptions:
You can figure out how to turn this into an update, and
A PK exists on (id,mo_date)
Then you can do this -
SELECT p_id
, SUM(degrees) ttl
FROM
( SELECT x.*
FROM overw x
JOIN overw y
ON y.p_id = x.p_id
AND y.mo_date >= x.mo_date
GROUP
BY p_id
, mo_date HAVING COUNT(*) <= 3
) a
GROUP
BY p_id;
Maybe I'm slow, but let's ignore svice for now.
Can you show the correct result and the working for each row below...
+------+---------+------------+--------+
| p_id | degrees | mo_date | result |
+------+---------+------------+--------+
| 5 | 6.20 | 2013-06-10 | ? |
| 5 | 5.45 | 2013-08-11 | ? |
| 5 | 10.20 | 2013-10-09 | 21.85 | <- = 10.2+5.45+6.2 = 21.85
| 8 | 14.75 | 2013-04-25 | ? |
| 5 | 9.85 | 2013-03-10 | ? |
| 8 | 11.00 | 2013-02-22 | ? |
+------+---------+------------+--------+
I am struggling to get a result from MySQL in the following way. I have 10 records in a MySQL table with date and unit fields. I need to get the units used on each date.
The table structure is as follows; each record holds the running total (today's units added to the previous total):
Date Units
---------- ---------
10/10/2012 101
11/10/2012 111
12/10/2012 121
13/10/2012 140
14/10/2012 150
15/10/2012 155
16/10/2012 170
17/10/2012 180
18/10/2012 185
19/10/2012 200
Desired output will be :
Date Units
---------- ---------
10/10/2012 101
11/10/2012 10
12/10/2012 10
13/10/2012 19
14/10/2012 10
15/10/2012 5
16/10/2012 15
17/10/2012 10
18/10/2012 5
19/10/2012 15
Any help will be appreciated. Thanks
There's a couple of ways to get the resultset. If you can live with an extra column in the resultset, and the order of the columns, then something like this is a workable approach.
using user variables
SELECT d.Date
, IF(@prev_units IS NULL
,@diff := 0
,@diff := d.units - @prev_units
) AS `Units_used`
, @prev_units := d.units AS `Units`
FROM ( SELECT @prev_units := NULL ) i
JOIN (
SELECT t.Date, t.Units
FROM mytable t
ORDER BY t.Date, t.Units
) d
This returns the specified resultset, but it includes the Units column as well. It's possible to have that column filtered out, but it's more expensive, because of the way MySQL processes an inline view (MySQL calls it a "derived table")
To remove that extra column, you can wrap that in another query...
SELECT f.Date
, f.Units_used
FROM (
query from above goes here
) f
ORDER BY f.Date
but again, removing that column comes with the extra cost of materializing that result set a second time.
using a semi-join
If you are guaranteed to have a single row for each Date value, stored either as a DATE or as a DATETIME with the time component set to a constant such as midnight, and there are no gaps in the Date values, then here is another query that will return the specified result set:
SELECT t.Date
, t.Units - s.Units AS Units_Used
FROM mytable t
LEFT
JOIN mytable s
ON s.Date = t.Date + INTERVAL -1 DAY
ORDER BY t.Date
If there's a missing Date value (a gap) such that there is no matching previous row, then Units_used will have a NULL value.
using a correlated subquery
If you don't have a guarantee of no "missing dates", but you have a guarantee that there is no more than one row for a particular Date, then another approach (usually more expensive in terms of performance) is to use a correlated subquery:
SELECT t.Date
, ( t.Units - (SELECT s.Units
FROM mytable s
WHERE s.Date < t.Date
ORDER BY s.Date DESC
LIMIT 1)
) AS Units_used
FROM mytable t
ORDER BY t.Date, t.Units
spencer7593's solution will be faster, but you can also do something like this...
SELECT * FROM rolling;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 111 |
| 3 | 121 |
| 4 | 140 |
| 5 | 150 |
| 6 | 155 |
| 7 | 170 |
| 8 | 180 |
| 9 | 185 |
| 10 | 200 |
+----+-------+
SELECT a.id,COALESCE(a.units - b.units,a.units) units
FROM
( SELECT x.*
, COUNT(*) rank
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) a
LEFT
JOIN
( SELECT x.*
, COUNT(*) rank
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) b
ON b.rank= a.rank -1;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 10 |
| 3 | 10 |
| 4 | 19 |
| 5 | 10 |
| 6 | 5 |
| 7 | 15 |
| 8 | 10 |
| 9 | 5 |
| 10 | 15 |
+----+-------+
This should give the desired result. I don't know what your table is called, so I named it "tbltest".
Naming a column date is generally a bad idea, as the word also refers to other things (functions, data types, ...), so I renamed it "fdate". Using uppercase characters in field names or table names is also a bad idea, as it makes your statements less database independent (some databases are case sensitive and some are not).
SELECT
A.fdate,
A.units - coalesce(B.units, 0) AS units
FROM
tbltest A left join tbltest B ON A.fdate = B.fdate + INTERVAL 1 DAY