I am struggling to get a result from MySQL in the following way. I have 10 records in a MySQL table with date and unit fields. I need to get the units used on each date.
The table structure is as follows; each record holds that day's units added to the previous running total:
Date Units
---------- ---------
10/10/2012 101
11/10/2012 111
12/10/2012 121
13/10/2012 140
14/10/2012 150
15/10/2012 155
16/10/2012 170
17/10/2012 180
18/10/2012 185
19/10/2012 200
The desired output is:
Date Units
---------- ---------
10/10/2012 101
11/10/2012 10
12/10/2012 10
13/10/2012 19
14/10/2012 10
15/10/2012 5
16/10/2012 15
17/10/2012 10
18/10/2012 5
19/10/2012 15
Any help will be appreciated. Thanks
There are a couple of ways to get the result set. If you can live with an extra column in the result set, and with the order of the columns, then something like this is a workable approach.
using user variables
SELECT d.Date
, IF(@prev_units IS NULL
,@diff := 0
,@diff := d.units - @prev_units
) AS `Units_used`
, @prev_units := d.units AS `Units`
FROM ( SELECT @prev_units := NULL ) i
JOIN (
SELECT t.Date, t.Units
FROM mytable t
ORDER BY t.Date, t.Units
) d
This returns the specified result set (except that the first row shows 0 rather than 101, because there is no previous row to subtract), but it includes the Units column as well. It's possible to have that column filtered out, but it's more expensive, because of the way MySQL processes an inline view (MySQL calls it a "derived table").
To remove that extra column, you can wrap that in another query...
SELECT f.Date
, f.Units_used
FROM (
query from above goes here
) f
ORDER BY f.Date
but again, removing that column comes with the extra cost of materializing that result set a second time.
using a self-join
If you are guaranteed to have exactly one row for each Date value, with no gaps between consecutive dates, and Date is defined as a DATE (or as a DATETIME with the time component set to a constant, such as midnight), then here is another query that will return the specified result set:
SELECT t.Date
, t.Units - s.Units AS Units_Used
FROM mytable t
LEFT
JOIN mytable s
ON s.Date = t.Date + INTERVAL -1 DAY
ORDER BY t.Date
If there's a missing Date value (a gap) such that there is no matching previous row, then Units_used will have a NULL value.
using a correlated subquery
If you don't have a guarantee of no "missing dates", but you have a guarantee that there is no more than one row for a particular Date, then another approach (usually more expensive in terms of performance) is to use a correlated subquery:
SELECT t.Date
, ( t.Units - (SELECT s.Units
FROM mytable s
WHERE s.Date < t.Date
ORDER BY s.Date DESC
LIMIT 1)
) AS Units_used
FROM mytable t
ORDER BY t.Date, t.Units
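As a quick sanity check of the correlated-subquery approach, here is a minimal sketch that runs it against the question's sample data. It assumes Python's standard sqlite3 module with an in-memory SQLite database standing in for MySQL, and dates stored as ISO strings; the COALESCE around the subquery is an addition so that the first row shows 101 rather than NULL.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; dates are ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (Date TEXT PRIMARY KEY, Units INTEGER)")
readings = [101, 111, 121, 140, 150, 155, 170, 180, 185, 200]
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(f"2012-10-{10 + i:02d}", u) for i, u in enumerate(readings)],
)

# For each row, subtract the Units of the closest earlier Date.
# COALESCE treats the missing predecessor of the first row as 0.
query = """
SELECT t.Date,
       t.Units - COALESCE((SELECT s.Units
                             FROM mytable s
                            WHERE s.Date < t.Date
                            ORDER BY s.Date DESC
                            LIMIT 1), 0) AS Units_used
  FROM mytable t
 ORDER BY t.Date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The Units_used column comes out as 101, 10, 10, 19, 10, 5, 15, 10, 5, 15, which matches the desired output in the question.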
spencer7593's solution will be faster, but you can also do something like this...
SELECT * FROM rolling;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 111 |
| 3 | 121 |
| 4 | 140 |
| 5 | 150 |
| 6 | 155 |
| 7 | 170 |
| 8 | 180 |
| 9 | 185 |
| 10 | 200 |
+----+-------+
SELECT a.id,COALESCE(a.units - b.units,a.units) units
FROM
( SELECT x.*
, COUNT(*) rank
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) a
LEFT
JOIN
( SELECT x.*
, COUNT(*) rank
FROM rolling x
JOIN rolling y
ON y.id <= x.id
GROUP
BY x.id
) b
ON b.rank= a.rank -1;
+----+-------+
| id | units |
+----+-------+
| 1 | 101 |
| 2 | 10 |
| 3 | 10 |
| 4 | 19 |
| 5 | 10 |
| 6 | 5 |
| 7 | 15 |
| 8 | 10 |
| 9 | 5 |
| 10 | 15 |
+----+-------+
This should give the desired result. I don't know what your table is called, so I named it "tbltest".
Naming a column Date is generally a bad idea, as the word also refers to other things (functions, data types, ...), so I renamed it "fdate". Using uppercase characters in field names or table names is also a bad idea, as it makes your statements less portable between databases (some databases are case sensitive and some are not).
SELECT
A.fdate,
A.units - coalesce(B.units, 0) AS units
FROM
tbltest A left join tbltest B ON A.fdate = B.fdate + INTERVAL 1 DAY
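None of the answers above use it, but on servers with window functions (MySQL 8.0 or later, as far as I know) LAG() is a different technique that gives the same result and does not depend on the dates being consecutive. A rough sketch, assuming the "tbltest"/"fdate" names from this answer, SQLite via Python's sqlite3 as a stand-in for MySQL, and a made-up data set with a three-day gap:

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+ for window functions) stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbltest (fdate TEXT PRIMARY KEY, units INTEGER)")
conn.executemany(
    "INSERT INTO tbltest VALUES (?, ?)",
    [
        ("2012-10-10", 101),
        ("2012-10-11", 111),
        ("2012-10-12", 121),
        ("2012-10-15", 155),  # hypothetical gap: 13th and 14th are missing
    ],
)

# LAG(units, 1, 0) reads units from the previous row in fdate order,
# falling back to 0 when there is no previous row.
query = """
SELECT fdate,
       units - LAG(units, 1, 0) OVER (ORDER BY fdate) AS units_used
  FROM tbltest
 ORDER BY fdate
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Here the row for 2012-10-15 is compared with 2012-10-12, so the gap yields 34 instead of falling back to the full 155 as the "+ INTERVAL 1 DAY" join would.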
Related
I have a table with a column A that is INT(11) (it's a timestamp, but for now I just use small numbers)
id | A | diff |
---+----+------+
1 | 12 | |
2 | 7 | |
3 | 23 | |
4 | 9 | |
5 | 2 | |
6 | 30 | |
I would like to update diff with the difference between A and its nearest smaller neighbour. So if A=12, its nearest smaller neighbour is A=7; if A=30, it is A=23. I should end up with a table like this (sorted on A):
id | A | diff |
---+----+------+
5 | 2 | - |
2 | 7 | 5 | (7-2)
4 | 9 | 2 | (9-7)
1 | 12 | 3 | (12-9)
3 | 23 | 11 | (23-12)
6 | 30 | 7 | (30-23)
I can calculate the difference at the moment of insertion, as I know A then (here: A=15):
INSERT INTO `table` (`A`,`diff`)
(SELECT 15 , 15-`A` FROM `table` WHERE `A` < 15 ORDER BY `A` DESC LIMIT 1)
This results in a new record:
id | A | diff |
---+----+------+
7 | 15 | 3 | (3 being the difference between A=12 and A=15)
(NOTE: This fails miserably when A=1, being the new smallest value and having no smaller neighbour, so no value of diff)
But now the value of diff in record 3 is wrong, because it is still based on the difference 23 - 12, whereas it should now be 23 - 15.
So I just want to insert the A value and then run an update on the table, refreshing diff where necessary. But that's where my knowledge of MySQL ends...
I crafted this query, but it fails with `You can't specify table 't1' for update in FROM clause`:
UPDATE `table` AS t1
SET
t1.`diff` = t1.`A` - (SELECT `A` FROM `table`
WHERE `A` < t1.`A`
ORDER BY `A` DESC LIMIT 1
)
Here's a query:
SELECT x.*
, x.a-MAX(y.a) diff
FROM my_table x
LEFT
JOIN my_table y
ON y.a < x.a
GROUP
BY x.id
ORDER
BY a;
I'm not sure why you would want to store derived data, but you can I guess...
UPDATE my_table m
JOIN
( SELECT x.*
, x.a-MAX(y.a) q
FROM my_table x
JOIN my_table y
ON y.a < x.a
GROUP
BY x.id
) n
ON n.id = m.id
SET m.diff = q;
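To illustrate what the SELECT above computes once the question's new value A=15 has been inserted, here is a small sketch. It assumes SQLite through Python's sqlite3 module as a stand-in for MySQL; the join logic is the same LEFT JOIN on smaller values with MAX() picking the nearest one.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, a INTEGER, diff INTEGER)")
conn.executemany(
    "INSERT INTO my_table (id, a) VALUES (?, ?)",
    [(1, 12), (2, 7), (3, 23), (4, 9), (5, 2), (6, 30), (7, 15)],  # id 7 is the new row
)

# Pair every row with all rows holding a smaller value; MAX(y.a) is then
# the nearest smaller neighbour (NULL for the smallest value).
query = """
SELECT x.id, x.a, x.a - MAX(y.a) AS diff
  FROM my_table x
  LEFT JOIN my_table y ON y.a < x.a
 GROUP BY x.id, x.a
 ORDER BY x.a
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The row with A=23 now gets diff 8 (23 - 15) rather than the stale 11, and the smallest value A=2 gets NULL.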
You may try this after inserting the new value:
UPDATE x
SET
x.diff = iq2.new_diff
FROM
#t x
INNER JOIN
(SELECT id,A,diff , new_diff
FROM
(select id,A,15 as new_number,
CASE WHEN (A-15) < 0 THEN NULL ELSE (A-15) END as new_diff,diff
from #t
) iq
WHERE
iq.new_diff <= iq.diff
AND iq.new_diff <> 0
)iq2
on x.A = iq2.A
The inner query compares the previous difference with the current one and then updates the relevant rows.
Trying to select last row each day.
This is my (simplified, more records in actual table) table:
+-----+-----------------------+------+
| id | datetime | temp |
+-----+-----------------------+------+
| 9 | 2017-06-05 23:55:00 | 9.5 |
| 8 | 2017-06-05 23:50:00 | 9.6 |
| 7 | 2017-06-05 23:45:00 | 9.3 |
| 6 | 2017-06-04 23:55:00 | 9.4 |
| 5 | 2017-06-04 23:50:00 | 9.2 |
| 4 | 2017-06-05 23:45:00 | 9.1 |
| 3 | 2017-06-03 23:55:00 | 9.8 |
| 2 | 2017-06-03 23:50:00 | 9.7 |
| 1 | 2017-06-03 23:45:00 | 9.6 |
+-----+-----------------------+------+
I want to select row with id = 9, id = 6 and id = 3.
I have tried this query:
SELECT MAX(datetime) Stamp
, temp
FROM weatherdata
GROUP
BY YEAR(DateTime)
, MONTH(DateTime)
, DAY(DateTime)
order
by datetime desc
limit 10;
But datetime and temp do not match.
Kind Regards
Here's one way: a subquery gets the MAX datetime per day, and the outer query uses it to fetch the other fields:
SELECT *
FROM test
WHERE `datetime` IN (
SELECT MAX(`datetime`)
FROM test
GROUP BY DATE(`datetime`)
);
If your rows are always inserted and never updated, and if id is an autoincrementing primary key, then
SELECT w.*
FROM weatherdata w
JOIN ( SELECT MAX(id) id
FROM weatherdata
GROUP BY DATE(datetime)
) last ON w.id = last.id
will get you what you want. Why? The inner query returns the largest (meaning most recent) id value for each date in weatherdata. This can be very fast indeed, especially if you put an index on the datetime column.
But it's possible the conditions for this to work don't hold. If your datetime column sometimes gets updated to change the date, it's possible that larger id values don't always imply larger datetime values.
In that case you need something like this.
SELECT w.*
FROM weatherdata w
JOIN ( SELECT MAX(datetime) datetime
FROM weatherdata
GROUP BY DATE(datetime)
) last ON w.datetime = last.datetime
Your query doesn't work because it misuses the nasty nonstandard MySQL extension to GROUP BY. Read this: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
Properly, it should use the ANY_VALUE() function to highlight the unpredictability of the results. It should read:
SELECT MAX(datetime) Stamp, ANY_VALUE(temp) temp
which means you aren't guaranteed the right row's temp value. Rather, it can return the temp value from any row in each day's grouping.
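For what it's worth, here is a minimal check of the MAX(datetime) join against the rows from the question. It is only a sketch: it assumes SQLite through Python's sqlite3 module instead of MySQL, and the derived-table alias `latest` and column alias `max_dt` are names picked for this example.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; DATE() behaves alike on ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weatherdata (id INTEGER PRIMARY KEY, datetime TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO weatherdata VALUES (?, ?, ?)",
    [
        (9, "2017-06-05 23:55:00", 9.5),
        (8, "2017-06-05 23:50:00", 9.6),
        (7, "2017-06-05 23:45:00", 9.3),
        (6, "2017-06-04 23:55:00", 9.4),
        (5, "2017-06-04 23:50:00", 9.2),
        (4, "2017-06-05 23:45:00", 9.1),
        (3, "2017-06-03 23:55:00", 9.8),
        (2, "2017-06-03 23:50:00", 9.7),
        (1, "2017-06-03 23:45:00", 9.6),
    ],
)

# The derived table finds the latest datetime of each day; joining back
# on that value fetches temp from the very same row.
query = """
SELECT w.id, w.datetime, w.temp
  FROM weatherdata w
  JOIN (SELECT MAX(datetime) AS max_dt
          FROM weatherdata
         GROUP BY DATE(datetime)) latest
    ON w.datetime = latest.max_dt
 ORDER BY w.id DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This returns the rows with id 9, 6 and 3, each with the temp value that belongs to that timestamp.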
I have the following SQL query
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
AND (`id` =
(
SELECT `id`
FROM `sensor_data` AS `sd2`
WHERE sd1.mid = sd2.mid
AND sd1.sid = sd2.sid
ORDER BY `value` DESC, `id` DESC
LIMIT 1)
)
Background:
I've checked the validity of the query by changing LIMIT 1 to LIMIT 0, and the query works without any problem. However, with LIMIT 1 the query doesn't complete; it just shows "loading" until I shut down and restart.
Breaking the Query down:
I have broken down the query with the date boundary as follows:
SELECT *
FROM `sensor_data` AS `sd1`
WHERE (sd1.timestamp BETWEEN '2017-05-13 00:00:00'
AND '2017-05-14 00:00:00')
This takes about 0.24 seconds to return the query with 8200 rows each having 5 columns.
Question:
I suspect the second half of my query is not correct or not well optimized.
The tables are as follows:
Current Table:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 52 | 10 | 2 | 39 | 2015-05-13 11:56:25 |
| 53 | 10 | 2 | 40 | 2015-05-13 11:56:42 |
| 54 | 10 | 2 | 40 | 2015-05-13 11:56:45 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 57 | 11 | 2 | 18 | 2015-05-13 11:58:41 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 59 | 11 | 3 | 58 | 2015-05-13 11:59:01 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Q: How would I get the MAX(v) for each sid for each mid?
NB#1: In the example above, rows 53, 54 and 55 all have the same value (40), but I would like to retrieve the row with the most recent timestamp, which is row 55.
Expected Output:
+------+-------+-------+-----+-----------------------+
| id | mid | sid | v | timestamp |
+------+-------+-------+-----+-----------------------+
| 51 | 10 | 1 | 40 | 2015-05-13 11:56:01 |
| 55 | 10 | 2 | 40 | 2015-05-13 11:57:01 |
| 56 | 11 | 1 | 50 | 2015-05-13 11:57:52 |
| 58 | 11 | 2 | 19 | 2015-05-13 11:58:59 |
| 60 | 11 | 3 | 65 | 2015-05-13 11:59:29 |
+------+-------+-------+-----+-----------------------+
Structure of the table:
NB#2:
Since this table has over 110 million entries, it is critical to have date boundaries, which limit the query to ~8000 entries over a 24 hour period.
The query can be written as follows:
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
FROM yourtable t1
INNER JOIN (
SELECT mid, sid, MAX(v) as v
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid
) t2
ON t1.mid = t2.mid
AND t1.sid = t2.sid
AND t1.v = t2.v
INNER JOIN (
SELECT mid, sid, v, MAX(ts) as ts
FROM yourtable
WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
GROUP BY mid, sid, v
) t3
ON t1.mid = t3.mid
AND t1.sid = t3.sid
AND t1.v = t3.v
AND t1.ts = t3.ts;
Edit and Explanation:
The first sub-query (first INNER JOIN) fetches MAX(v) per (mid, sid) combination. The second sub-query identifies MAX(ts) for every (mid, sid, v). At this point, the two queries do not influence each other's results. It is also important to note that the ts date range selection is done in the two sub-queries independently, so that the final query has fewer rows to examine and no additional WHERE filters to apply.
Effectively, this translates into getting MAX(v) per (mid, sid) combination initially (first sub-query); and if there is more than one record with the same value MAX(v) for a given (mid, sid) combo, then the excess records get eliminated by the selection of MAX(ts) for every (mid, sid, v) combination obtained by the second sub-query. We then simply associate the output of the two queries by the two INNER JOIN conditions to get to the id of the desired records.
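To see the two-step elimination on the question's sample rows, here is a small check of the query above. It is a sketch under assumptions: SQLite through Python's sqlite3 module stands in for MySQL, and the table is called `yourtable` with a `ts` column as in this answer.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; timestamps are ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, mid INTEGER, sid INTEGER, v INTEGER, ts TEXT)"
)
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?, ?, ?, ?)",
    [
        (51, 10, 1, 40, "2015-05-13 11:56:01"),
        (52, 10, 2, 39, "2015-05-13 11:56:25"),
        (53, 10, 2, 40, "2015-05-13 11:56:42"),
        (54, 10, 2, 40, "2015-05-13 11:56:45"),
        (55, 10, 2, 40, "2015-05-13 11:57:01"),
        (56, 11, 1, 50, "2015-05-13 11:57:52"),
        (57, 11, 2, 18, "2015-05-13 11:58:41"),
        (58, 11, 2, 19, "2015-05-13 11:58:59"),
        (59, 11, 3, 58, "2015-05-13 11:59:01"),
        (60, 11, 3, 65, "2015-05-13 11:59:29"),
    ],
)

# t2 keeps the highest v per (mid, sid); t3 keeps the latest ts per
# (mid, sid, v), which breaks ties between rows sharing that highest v.
query = """
SELECT t1.id, t1.mid, t1.sid, t1.v, t1.ts
  FROM yourtable t1
  JOIN (SELECT mid, sid, MAX(v) AS v
          FROM yourtable
         WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
         GROUP BY mid, sid) t2
    ON t1.mid = t2.mid AND t1.sid = t2.sid AND t1.v = t2.v
  JOIN (SELECT mid, sid, v, MAX(ts) AS ts
          FROM yourtable
         WHERE ts BETWEEN '2015-05-13 00:00:00' AND '2015-05-14 00:00:00'
         GROUP BY mid, sid, v) t3
    ON t1.mid = t3.mid AND t1.sid = t3.sid AND t1.v = t3.v AND t1.ts = t3.ts
 ORDER BY t1.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Rows 53 and 54 survive the first join (they share the maximum v of 40) and are dropped by the second, leaving ids 51, 55, 56, 58 and 60 as in the expected output.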
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.mid)
union
select * from sensor_data s1 where s1.v in (select max(v) from sensor_data s2 group by s2.sid);
IN ( SELECT ... ) does not optimize well. It is even worse because of being correlated.
What you are looking for is a groupwise-max.
Please provide SHOW CREATE TABLE; we need to know at least what the PRIMARY KEY is.
Suggested code
You will need:
With the WHERE: INDEX(timestamp, mid, sid, v, id)
Without the WHERE: INDEX(mid, sid, v, timestamp, id)
Code:
SELECT id, mid, sid, v, timestamp
FROM ( SELECT @prev_mid := 99999, -- some value not in table
@prev_sid := 99999,
@n := 0 ) AS init
JOIN (
SELECT @n := if(mid != @prev_mid OR
sid != @prev_sid,
1, @n + 1) AS n,
@prev_mid := mid,
@prev_sid := sid,
id, mid, sid, v, timestamp
FROM sensor_data
WHERE timestamp >= '2017-05-13'
AND timestamp < '2017-05-13' + INTERVAL 1 DAY
ORDER BY mid DESC, sid DESC, v DESC, timestamp DESC
) AS x
WHERE n = 1
ORDER BY mid, sid; -- optional
Notes:
The index is 'composite' and 'covering'.
This should make one pass over the index, thereby providing 'good' performance.
The final ORDER BY is optional; the results may be in reverse order.
All the DESC in the inner ORDER BY must be in place to work correctly (unless you are using MySQL 8.0).
Note how the WHERE avoids including both midnights? And avoids manually computing leap-days, year-ends, etc?
With the WHERE (and associated INDEX), there will be filtering, but a 'sort' will still be needed.
Without the WHERE (and the other INDEX), sort will not be needed.
You can test the performance of any competing formulations via this trick, even if you do not have enough rows (yet) to get reliable timings:
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
This can also be used to compare different versions of MySQL and MariaDB -- I have seen 3 significantly different performance characteristics in a related groupwise-max test.
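Since MySQL 8.0 came up in the notes: on versions with window functions, ROW_NUMBER() can replace the user-variable counter. This is a different technique from the code above, shown only as a rough sketch. It assumes SQLite through Python's sqlite3 module as a stand-in for MySQL 8.0, the 2015 sample rows from the question, and a derived-table alias `ranked` chosen for this example.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+ for window functions) stands in for MySQL 8.0.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensor_data (id INTEGER PRIMARY KEY, mid INTEGER, sid INTEGER,"
    " v INTEGER, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO sensor_data VALUES (?, ?, ?, ?, ?)",
    [
        (51, 10, 1, 40, "2015-05-13 11:56:01"),
        (52, 10, 2, 39, "2015-05-13 11:56:25"),
        (53, 10, 2, 40, "2015-05-13 11:56:42"),
        (54, 10, 2, 40, "2015-05-13 11:56:45"),
        (55, 10, 2, 40, "2015-05-13 11:57:01"),
        (56, 11, 1, 50, "2015-05-13 11:57:52"),
        (57, 11, 2, 18, "2015-05-13 11:58:41"),
        (58, 11, 2, 19, "2015-05-13 11:58:59"),
        (59, 11, 3, 58, "2015-05-13 11:59:01"),
        (60, 11, 3, 65, "2015-05-13 11:59:29"),
    ],
)

# Number the rows of each (mid, sid) group, highest v first and latest
# timestamp first among equal v; row number 1 is the groupwise max.
query = """
SELECT id, mid, sid, v, timestamp
  FROM (SELECT s.*,
               ROW_NUMBER() OVER (PARTITION BY mid, sid
                                  ORDER BY v DESC, timestamp DESC) AS n
          FROM sensor_data s
         WHERE timestamp >= '2015-05-13'
           AND timestamp <  '2015-05-14') ranked
 WHERE n = 1
 ORDER BY mid, sid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The half-open date range mirrors the WHERE in the code above, so neither midnight is counted twice. Whether this beats the user-variable version on 110 million rows is something the Handler% counters would have to show.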
Update #1: query gives me syntax error on Left Join line (running the query within the left join independently works perfectly though)
SELECT b1.company_id, (sum(b1.credit)-sum(b1.debit)) as 'Balance'
FROM MyTable b1
JOIN CustomerInfoTable c on c.id = b1.company_id
#Filter for Clients of particular brand, package and active status
where c.brand_id = 2 and c.status = 2 and c.package_id = 3
LEFT JOIN
(
SELECT b2.company_id, sum(b2.debit) as 'Current_Usage'
FROM MyTable b2
WHERE year(b2.timestamp) = '2012' and month(b2.timestamp) = '06'
GROUP BY b2.company_id
)
b3 on b3.company_id = b1.company_id
group by b1.company_id;
Original Post:
I keep track of debits and credits in the same table. The table has the following schema:
| company_id | timestamp | credit | debit |
| 10 | MAY-25 | 100 | 000 |
| 11 | MAY-25 | 000 | 054 |
| 10 | MAY-28 | 000 | 040 |
| 12 | JUN-01 | 100 | 000 |
| 10 | JUN-25 | 150 | 000 |
| 10 | JUN-25 | 000 | 025 |
As my result, I want to see:
| Grouped by: company_id | Balance* | Current_Usage (in June) |
| 10 | 185 | 25 |
| 12 | 100 | 0 |
| 11 | -54 | 0 |
Balance: Calculated by (sum(credit) - sum(debits))* - timestamp does not matter
Current_Usage: Calculated by sum(debits) - but only for debits in JUN.
The problem: If I filter by JUN timestamp right away, it does not calculate the balance of all time but only the balance of any transactions in June.
How can I calculate the current usage by month but the balance on all transactions in the table? I have everything working, except that my code filters only the JUN results into both calculations:
SELECT b.company_id, ((sum(b.credit)-sum(b.debit))/1024/1024/1024/1024) as 'BW_remaining', sum(b.debit/1024/1024/1024/1024/28*30) as 'Usage_per_month'
FROM mytable b
#How to filter this only for the current_usage calculation?
WHERE month(a.timestamp) = 'JUN' and a.credit = 0
#Group by company in order to sum all entries for balance
group by b.company_id
order by b.balance desc;
What you will need here is a join with a subquery that filters by month.
SELECT T1.company_id,
((sum(T1.credit)-sum(T1.debit))/1024/1024/1024/1024) as 'BW_remaining',
MAX(T3.DEBIT_PER_MONTH)
FROM MYTABLE T1
LEFT JOIN
(
SELECT T2.company_id, SUM(T2.debit) AS DEBIT_PER_MONTH
FROM MYTABLE T2
WHERE month(T2.timestamp) = 6
GROUP BY T2.company_id
)
T3 ON T1.company_id = T3.company_id
GROUP BY T1.company_id
I haven't tested the query. The point I am trying to make is how you can join a subquery to your existing query to get the usage per month.
Alright, thanks to @Kshitij I got it working. In case somebody else runs into the same issue, this is how I solved it:
SELECT b1.company_id, (sum(b1.credit)-sum(b1.debit)) as 'Balance',
(
SELECT sum(b2.debit)
FROM MYTABLE b2
WHERE b2.company_id = b1.company_id and year(b2.timestamp) = '2012' and month(b2.timestamp) = '06'
GROUP BY b2.company_id
) AS 'Usage_June'
FROM MYTABLE b1
#Group by company in order to add sum of all zones the company is using
group by b1.company_id
order by Usage_June desc;
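Another way to keep the all-time balance while restricting only the usage figure is conditional aggregation, where the month test sits inside the SUM instead of in the WHERE clause. This is not what the answers above do, just a sketch of the idea. It assumes SQLite through Python's sqlite3 module as a stand-in for MySQL, a column named `ts` holding ISO dates, and that the sample rows fall in 2012.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample dates placed in 2012.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (company_id INTEGER, ts TEXT, credit INTEGER, debit INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (10, "2012-05-25", 100, 0),
        (11, "2012-05-25", 0, 54),
        (10, "2012-05-28", 0, 40),
        (12, "2012-06-01", 100, 0),
        (10, "2012-06-25", 150, 0),
        (10, "2012-06-25", 0, 25),
    ],
)

# The balance sums every row; the CASE expression lets only June debits
# contribute to usage_june, so no WHERE filter is needed.
query = """
SELECT company_id,
       SUM(credit) - SUM(debit) AS balance,
       SUM(CASE WHEN ts >= '2012-06-01' AND ts < '2012-07-01'
                THEN debit ELSE 0 END) AS usage_june
  FROM mytable
 GROUP BY company_id
 ORDER BY balance DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This reads the table once and gives 185 / 25 for company 10, 100 / 0 for company 12 and -54 / 0 for company 11, the same figures as the result table in the original post.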
I need some help with a MySQL query I'm working on. I have data as follows.
Table 1
id date1 text number
---|------------|--------|-------
1 | 2012-12-12 | hi | 399
2 | 2011-11-11 | so | 399
5 | 2010-10-10 | what | 555
3 | 2009-09-09 | bye | 300
4 | 2008-08-08 | you | 300
Table 2
id number date2 ref
---|--------|------------|----
1 | 399 | 2012-06-06 | 40
2 | 399 | 2011-06-06 | 50
5 | 555 | 2011-03-03 | 60
For each row in Table 1, I want to get zero or one ref values from Table 2. There should be a row in the result for each row in Table 1. The number column isn't unique to either table, so the join must be made using the date1 & date2 columns, where date2 is the highest value for the number without exceeding date1 for that number.
The desired result from the above example would be like so.
date1 text number ref
------------|--------|--------|-----
2012-12-12 | hi | 399 | 40
2011-11-11 | so | 399 | 50
2010-10-10 | what | 555 | null
2009-09-09 | bye | 300 | null
2008-08-08 | you | 300 | null
In the result's first row, ref 40 was chosen because the table2 record with ref=40 has the highest date2 that is less than date1.
In the result's second row, ref 50 was chosen for the same reason: the table2 record with ref=50 has the highest date2 that is less than date1.
The rest of the results have null refs because either date1 is earlier than every date2 for that number, or the number doesn't exist in table2.
I've got to a certain point but I'm stuck. The query I have so far is like this.
SELECT date1, text, number, ref
FROM table1
LEFT JOIN (
SELECT *
FROM (
SELECT *
FROM table2
WHERE date2 <= '2012-12-12'
ORDER BY date2 DESC
) tmp
GROUP BY number
) tmp ON table1.number = tmp.number;
The problem is that the hard coded date won't do, it should be based on date1, but I can't use date1 because it's in the outer query. Is there a way I can make this work?
I tried a similar example with different tables just now and was able to get what you wanted. Below is a similar query modified to fit your needs. You might want to change < to <= if that is what you are looking for.
SELECT a.date1, a.text, b.ref
FROM table1 a LEFT JOIN table2 b ON
( a.number = b.number
AND a.date1 > b.date2
AND b.date2 = ( SELECT MAX(x.date2)
FROM table2 x
WHERE x.number = b.number
AND x.date2 < a.date1)
)
Untested:
SELECT t1.date1,
t1.text,
t1.number,
(SELECT a.ref
FROM TABLE_2 a
JOIN (SELECT t.number,
MAX(t.date2) AS max_date
FROM TABLE_2 t
WHERE t.number = t1.number
AND t.date2 <= t1.date1
GROUP BY t.number) b ON b.number = a.number
AND b.max_date = a.date2)
FROM TABLE_1 t1
The issue is the use of t1 in the derived table of the subselect...
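One way around that restriction is to keep the correlated reference directly in a scalar subquery (ORDER BY ... LIMIT 1) rather than inside a nested derived table. Here is a minimal sketch on the question's data, assuming SQLite through Python's sqlite3 module as a stand-in for MySQL and the lowercase table names table1 and table2 from the question.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; dates are ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, date1 TEXT, text TEXT, number INTEGER)"
)
conn.execute(
    "CREATE TABLE table2 (id INTEGER PRIMARY KEY, number INTEGER, date2 TEXT, ref INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (1, "2012-12-12", "hi", 399),
        (2, "2011-11-11", "so", 399),
        (5, "2010-10-10", "what", 555),
        (3, "2009-09-09", "bye", 300),
        (4, "2008-08-08", "you", 300),
    ],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [(1, 399, "2012-06-06", 40), (2, 399, "2011-06-06", 50), (5, 555, "2011-03-03", 60)],
)

# For each table1 row, pick the ref of the latest table2 row for the same
# number whose date2 does not exceed date1; NULL when none qualifies.
query = """
SELECT a.date1, a.text, a.number,
       (SELECT b.ref
          FROM table2 b
         WHERE b.number = a.number
           AND b.date2 <= a.date1
         ORDER BY b.date2 DESC
         LIMIT 1) AS ref
  FROM table1 a
 ORDER BY a.date1 DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Every row of table1 appears exactly once, with ref 40 and 50 for the two 399 rows and NULL for the rest, as in the desired result.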