Loading multiple time series simultaneously using SQL - mysql

Suppose I have this exact dataset:
date        widget ID  widget price  widget expiry date
2020-01-01  A          1             2020-03-01
2020-01-01  B          2             2020-04-01
2020-01-01  C          3             2020-05-01
2020-01-01  D          4             2020-06-01
2020-01-02  A          1.1           2020-03-01
2020-01-02  B          2.05          2020-04-01
2020-01-02  C          3.7           2020-05-01
2020-01-02  D          3.8           2020-06-01
2020-01-03  A          1.15          2020-03-01
2020-01-03  B          2.09          2020-04-01
2020-01-03  C          3.54          2020-05-01
2020-01-03  D          4.2           2020-06-01
2020-01-04  A          1.19          2020-03-01
2020-01-04  B          2.14          2020-04-01
2020-01-04  C          3.73          2020-05-01
2020-01-04  D          4.30          2020-06-01
Say I wanted to simultaneously retrieve the full time series of the two following widgets using a single SQL query:
the widget which on date 2020-01-01 had price as close as possible to 1 and expiry date as close as possible to 2020-03-10.
the widget which on date 2020-01-03 had price as close as possible to 3.5 and expiry date as close as possible to 2020-05-15.
In other words, this exact table:
date        widget ID  widget price  widget expiry date
2020-01-01  A          1             2020-03-01
2020-01-01  C          3             2020-05-01
2020-01-02  A          1.1           2020-03-01
2020-01-02  C          3.7           2020-05-01
2020-01-03  A          1.15          2020-03-01
2020-01-03  C          3.54          2020-05-01
2020-01-04  A          1.19          2020-03-01
2020-01-04  C          3.73          2020-05-01
How would you recommend going about it?
Generalising this example, suppose you had a list of tuples like below, where price_i is a target price and expiry_date_i is a target expiry date.
(date_1, price_1, expiry_date_1), (date_2, price_2, expiry_date_2),
(date_3, price_3, expiry_date_3),...
How would you load all of the corresponding widgets' time series in one go?
For the time being I am retrieving these widgets' IDs separately using a SQL query like this one (in this example date='2020-01-01', price=1, expiry date='2020-03-10'). Then collecting all of these retrieved IDs I load the full widget time series.
WITH sample AS
(SELECT *, ABS(DATEDIFF(day, widget_expiry_date, '2020-03-10')) AS date_diff, ABS(widget_price - 1) AS price_diff
FROM data WHERE date = '2020-01-01')
SELECT TOP 1 widget_ID FROM sample
ORDER BY date_diff ASC, price_diff ASC
As you can imagine this is extremely inefficient. I wonder if there is a smarter way about it?
Thank you for your time and apologies in advance for the noobish question.

Retrieving all the series in a single query
with params (date_, price_, expiry_date_) AS (
    select date '2020-01-01', 1, date '2020-03-10' union all
    select date '2020-01-03', 3.5, date '2020-05-15'
)
select data.*
from params p
join data on data.widgetID = (
    SELECT widgetID
    FROM data d
    WHERE d.date = p.date_
    ORDER BY ABS(DATEDIFF(d.widget_expiry_date, p.expiry_date_)) ASC, ABS(d.widget_price - p.price_) ASC
    LIMIT 1);

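You can also use window functions. Note that this ranks the widgets separately on every date, so it returns the closest widget per date rather than the full series of the widget chosen on the target date; the two coincide for this sample data: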
SELECT indate , widgetID , price , expirydate FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY indate ORDER BY ABS(price - 1), ABS(DATEDIFF(expirydate, '2020-03-10')) ) rn1
, ROW_NUMBER() OVER (PARTITION BY indate ORDER BY ABS(price - 3.5), ABS(DATEDIFF(expirydate, '2020-05-15')) ) rn2
FROM widgets
) t
WHERE rn1 =1 OR rn2 = 1
ORDER BY indate , widgetID

Related

How to execute a query on an unrelated table for every row in SQL

Let's say I have a table like this that tracks the balance of an asset I have in an account:
Delta  NetBalance  Timestamp
2      2           2020-01-01 00:00:00.000
4      6           2020-01-02 00:00:00.000
-1     5           2020-01-03 00:00:00.000
Let's say I have another unrelated table that keeps track of pricing for my asset:
Price  Timestamp
1.00   2020-01-01 00:00:00.000
1.02   2020-01-01 23:59:00.000
2.01   2020-01-02 10:00:00.000
2.02   2020-01-02 18:00:00.000
3.01   2020-01-03 12:00:00.000
3.02   2020-01-03 13:59:00.000
I'm looking for a query that will yield a result set with the columns from the first table, plus the closest price (from the exact moment, or the past) from the second table and its associated timestamp, so, something like this:
Delta  NetBalance  Timestamp                MostRecentPrice  MostRecentPriceTimestamp
2      2           2020-01-01 00:00:00.000  1.00             2020-01-01 00:00:00.000
4      6           2020-01-02 00:00:00.000  1.02             2020-01-01 23:59:00.000
-1     5           2020-01-03 00:00:00.000  2.02             2020-01-02 18:00:00.000
Working with MySQL here. Would prefer to avoid things like cross joins because the tables themselves are pretty huge, but open to suggestions.
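You can use the LAG window function to get the previous Timestamp from account, then join with the unrelated table on the interval between the previous and the current Timestamp.
Then use the ROW_NUMBER window function to keep only the most recent price row for each account row.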
SELECT *
FROM (
    SELECT *,
           -- rank the candidate prices of each account row, newest first
           ROW_NUMBER() OVER (PARTITION BY Timestamp ORDER BY MostRecentPriceTimestamp DESC) rn
    FROM (
        SELECT a.Delta,
               a.NetBalance,
               a.Timestamp,
               u.Timestamp MostRecentPriceTimestamp,
               u.Price MostRecentPrice
        FROM (
            -- previous account timestamp (the row's own timestamp for the first row)
            SELECT *, LAG(Timestamp, 1, Timestamp) OVER (ORDER BY Timestamp) prev_Timestamp
            FROM account a
        ) a
        INNER JOIN unrelated u
            ON u.Timestamp BETWEEN a.prev_Timestamp AND a.Timestamp
    ) t1
) t2
WHERE rn = 1
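As a point of comparison, the "latest price at or before each balance timestamp" lookup can also be written with a correlated subquery instead of LAG. This is a different technique from the answer above, shown as a minimal sketch on the question's sample data. It runs in SQLite from Python (a stand-in for MySQL, chosen so the example is self-contained), stores timestamps as ISO-formatted text so that string comparison matches time order, and reuses the table names `account` and `unrelated` from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (Delta INTEGER, NetBalance INTEGER, Timestamp TEXT);
CREATE TABLE unrelated (Price REAL, Timestamp TEXT);
INSERT INTO account VALUES
    (2, 2, '2020-01-01 00:00:00.000'),
    (4, 6, '2020-01-02 00:00:00.000'),
    (-1, 5, '2020-01-03 00:00:00.000');
INSERT INTO unrelated VALUES
    (1.00, '2020-01-01 00:00:00.000'),
    (1.02, '2020-01-01 23:59:00.000'),
    (2.01, '2020-01-02 10:00:00.000'),
    (2.02, '2020-01-02 18:00:00.000'),
    (3.01, '2020-01-03 12:00:00.000'),
    (3.02, '2020-01-03 13:59:00.000');
""")

query = """
SELECT a.Delta, a.NetBalance, a.Timestamp,
       u.Price AS MostRecentPrice,
       u.Timestamp AS MostRecentPriceTimestamp
FROM account a
LEFT JOIN unrelated u
    -- latest price timestamp that is not after the balance timestamp
    ON u.Timestamp = (SELECT MAX(u2.Timestamp)
                      FROM unrelated u2
                      WHERE u2.Timestamp <= a.Timestamp)
ORDER BY a.Timestamp
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The LEFT JOIN keeps balance rows that have no earlier price (their price columns come back as NULL). On large tables an index on the price timestamp would likely matter for this form, though that is untested here.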

Want to generate a row_number variable in MySQL 5.6 - but not using @variables

I have a table (all code is on fiddle).
id nom bal bal_dt val_dt
1 Bill 75.00 2019-11-01 2020-03-31
1 Bill 100.00 2020-04-01 2020-07-31
1 Bill 500.00 2020-08-01 2021-11-11 -- record goes over New Year 2021
2 Ben 5.00 2019-11-01 2020-03-31
2 Ben 10.00 2020-04-01 2020-07-31
2 Ben 100.00 2020-08-01 2021-11-11 -- record goes over New Year 2021
6 rows
The primary key is (id, bal_dt) - only one deposit/day.
I want to get the last record before the New Year 2021 (or <= 2021-01-01 00:00:00).
I tried the code from here, as follows.
select a2.id, a2.nom, a2.val_dt,
(select count(*) from account a1 where a1.id < a2.id) AS rn
from account a2
where a2.val_dt <= '2021-01-01 00:00:00'
order by val_dt desc;
But the result is not good.
id nom val_dt rn
1 Bill 2020-07-31 0
2 Ben 2020-07-31 3
1 Bill 2020-03-31 0
2 Ben 2020-03-31 3
I want something like
id nom rn val_dt bal
1 Bill 1 2020-08-01 500.00
2 Ben 1 2020-08-01 100.00
so that I pick one record each for Bill and Ben. Any help, please?
Note: I don't want to use @variables, and please do not assume only 3 records, only 2 accounts, identical dates, or only the last date!
You can use NOT EXISTS and a correlated subquery that checks for the absence of a younger timestamp within the desired period.
SELECT a1.id,
a1.nom,
a1.val_dt
FROM account a1
WHERE a1.val_dt < '2021-01-01 00:00:00'
AND NOT EXISTS (SELECT *
FROM account a2
WHERE a2.val_dt < '2021-01-01 00:00:00'
AND a2.val_dt > a1.val_dt
AND a2.id = a1.id);
Note that 2021-01-01 00:00:00 already is in 2021, so the operator needs to actually be < not <=.
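The query above filters on val_dt, while the dates in the asker's desired output (2020-08-01) are bal_dt values in the sample rows. Below is a minimal sketch of the same NOT EXISTS pattern applied to bal_dt instead, assuming that is the intended column. It runs against SQLite from Python as a stand-in for MySQL 5.6; the query itself uses no window functions or user variables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    id INTEGER, nom TEXT, bal REAL, bal_dt TEXT, val_dt TEXT,
    PRIMARY KEY (id, bal_dt)
);
INSERT INTO account VALUES
    (1, 'Bill',  75.00, '2019-11-01', '2020-03-31'),
    (1, 'Bill', 100.00, '2020-04-01', '2020-07-31'),
    (1, 'Bill', 500.00, '2020-08-01', '2021-11-11'),
    (2, 'Ben',    5.00, '2019-11-01', '2020-03-31'),
    (2, 'Ben',   10.00, '2020-04-01', '2020-07-31'),
    (2, 'Ben',  100.00, '2020-08-01', '2021-11-11');
""")

query = """
SELECT a1.id, a1.nom, a1.bal_dt, a1.bal
FROM account a1
WHERE a1.bal_dt < '2021-01-01'
  -- keep a row only if the same id has no later row before the cutoff
  AND NOT EXISTS (SELECT 1
                  FROM account a2
                  WHERE a2.id = a1.id
                    AND a2.bal_dt < '2021-01-01'
                    AND a2.bal_dt > a1.bal_dt)
ORDER BY a1.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because (id, bal_dt) is the primary key, at most one row per id survives the NOT EXISTS check.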
Solved it (see fiddle)!
select
tab.id, tab.md, a2.bal
from account a2
join
(
select
a1.id, max(a1.bal_dt) AS md
from account a1
where a1.bal_dt <= '2021-01-01 00:00:00'
group by a1.id
) as tab
on a2.id = tab.id and a2.bal_dt = tab.md;
The result:
id md bal
1 2020-08-01 500.00
2 2020-08-01 100.00

Counting club memberships in SQL

EDIT: I have added the primary key, following the comment by @Strawberry
The aim is to return the number of current members, and also the number of past memberships, on any particular date/time.
For example, suppose we have
msid id start cancelled
1 1 2020-01-01 09:00:00 null
2 2 2020-01-01 09:00:00 2020-12-31 09:00:00
3 2 2021-01-01 09:00:00 null
4 3 2020-01-01 09:00:00 2020-06-30 09:00:00
5 3 2020-02-01 09:00:00 2020-06-30 09:00:00
6 3 2020-07-01 09:00:00 null
and we want to calculate the number of members at various times, which should return as follows
Datetime Current Past <Notes - not to be returned by the query>
2020-01-01 12:00:00 3 0 -- all 3 IDs have joined earlier on this date
2020-02-01 12:00:00 3 0 -- new membership for existing member (ID 3) is not counted
2020-06-30 12:00:00 2 1 -- ID 3 has cancelled earlier on this day
2020-07-01 12:00:00 3 0 -- ID 3 has re-joined earlier on this day
2020-12-31 12:00:00 2 1 -- ID 2 has cancelled earlier on this day
2021-01-01 12:00:00 3 0 -- ID 2 has re-joined earlier on this day
An ID may either be current or past, but never both. That is, if a past member re-joins, as in the case of ID 2 and 3 above, they become current members, and are no longer past members.
Also, a member may have multiple current memberships, but they can only be counted as a current member once, as in the case of ID 3 above.
How can this be achieved in MySQL ?
Here is a db<>fiddle with the above data
Test this:
WITH
cte1 AS ( SELECT start `timestamp` FROM dt
UNION
SELECT cancelled FROM dt WHERE cancelled IS NOT NULL ),
cte2 AS ( SELECT DISTINCT id
FROM dt )
SELECT cte1.`timestamp`, COUNT(DISTINCT dt.id) current, SUM(dt.id IS NULL) past
FROM cte1
CROSS JOIN cte2
LEFT JOIN dt ON cte1.`timestamp` >= dt.start
AND (cte1.`timestamp` < dt.cancelled OR dt.cancelled IS NULL)
AND cte2.id = dt.id
GROUP BY cte1.`timestamp`
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=942e4c97951ed0929e178134ef67ce69
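The counting rules from the question (an ID is current if any of its memberships is active at the given time, and past if it joined earlier but has none active) can also be checked at arbitrary probe times. The sketch below is a variation on the query above rather than a copy of it: it runs in SQLite from Python as a stand-in for MySQL, and the `probes` CTE, holding the six datetimes from the question's expected table, is a name introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dt (msid INTEGER PRIMARY KEY, id INTEGER, start TEXT, cancelled TEXT);
INSERT INTO dt VALUES
    (1, 1, '2020-01-01 09:00:00', NULL),
    (2, 2, '2020-01-01 09:00:00', '2020-12-31 09:00:00'),
    (3, 2, '2021-01-01 09:00:00', NULL),
    (4, 3, '2020-01-01 09:00:00', '2020-06-30 09:00:00'),
    (5, 3, '2020-02-01 09:00:00', '2020-06-30 09:00:00'),
    (6, 3, '2020-07-01 09:00:00', NULL);
""")

query = """
WITH probes (ts) AS (
    VALUES ('2020-01-01 12:00:00'), ('2020-02-01 12:00:00'),
           ('2020-06-30 12:00:00'), ('2020-07-01 12:00:00'),
           ('2020-12-31 12:00:00'), ('2021-01-01 12:00:00')
),
status AS (
    -- one row per (probe time, member who has joined by then);
    -- is_current is 1 if any of that member's memberships is still active
    SELECT p.ts, m.id,
           MAX(m.cancelled IS NULL OR m.cancelled > p.ts) AS is_current
    FROM probes p
    JOIN dt m ON m.start <= p.ts
    GROUP BY p.ts, m.id
)
SELECT ts, SUM(is_current) AS n_current, SUM(1 - is_current) AS n_past
FROM status
GROUP BY ts
ORDER BY ts
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Collapsing to one row per member before counting is what stops ID 3's overlapping memberships from being counted twice, and what makes "current" and "past" mutually exclusive.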

MySQL SUM two columns then another column with the net amount

Hi, I've tried all the solutions on Stack Overflow to no avail.
I have 2 tables, each with an ID primary key, a date, and an amount. The same date can appear multiple times in a table. The debits table stores its amounts as negative numbers.
table "credits"
id | date | amount
1 2020-01-01 10.00
2 2020-01-02 20.00
3 2020-01-03 30.00
4 2020-01-01 10.00
5 2020-01-02 10.00
6 2020-01-03 10.00
table "debits"
id | date | amount
55 2020-01-01 -5.00
56 2020-01-02 -5.00
57 2020-01-03 -5.00
58 2020-01-01 -5.00
59 2020-01-02 -5.00
60 2020-01-03 -5.00
I want to return a result like the one below, grouped by date, with 4 fields: date, amount credits (for the day), amount debits (for the day), and the amount total (for the day)
date | amount_credits | amount_debits | amount_total
2020-01-01 20 10 10
2020-01-02 30 10 20
2020-01-03 40 10 30
I would do this using union all and aggregation:
select date, sum(credit) as credits, abs(sum(debit)) as debits,
       sum(credit) + sum(debit) as net
from ((select c.date, c.amount as credit, 0 as debit
       from credits c
      ) union all
      (select d.date, 0, d.amount
       from debits d
      )
     ) cd
group by date;
I note that the sign of the debits amount changes, from the source table to the result set, which is why the outer query uses abs().
In particular, using union all and group by ensures that all dates in the original data are in the result set -- even if the date is in only one of the tables.
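To make the point about dates that appear in only one table concrete, here is a minimal sketch that compares the union all form with an inner join of per-day totals. It runs in SQLite from Python as a stand-in for MySQL, and it adds one hypothetical row (a credit on 2020-01-04 with no matching debit) that is not part of the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE credits (id INTEGER PRIMARY KEY, date TEXT, amount REAL);
CREATE TABLE debits  (id INTEGER PRIMARY KEY, date TEXT, amount REAL);
INSERT INTO credits VALUES
    (1, '2020-01-01', 10.00), (2, '2020-01-02', 20.00), (3, '2020-01-03', 30.00),
    (4, '2020-01-01', 10.00), (5, '2020-01-02', 10.00), (6, '2020-01-03', 10.00),
    (7, '2020-01-04', 7.00);  -- hypothetical: a credit on a day with no debits
INSERT INTO debits VALUES
    (55, '2020-01-01', -5.00), (56, '2020-01-02', -5.00), (57, '2020-01-03', -5.00),
    (58, '2020-01-01', -5.00), (59, '2020-01-02', -5.00), (60, '2020-01-03', -5.00);
""")

union_query = """
SELECT date, SUM(credit) AS credits, ABS(SUM(debit)) AS debits,
       SUM(credit) + SUM(debit) AS net
FROM (SELECT date, amount AS credit, 0 AS debit FROM credits
      UNION ALL
      SELECT date, 0, amount FROM debits) cd
GROUP BY date
ORDER BY date
"""

join_query = """
SELECT c.date, c.amount_credits, d.amount_debits,
       c.amount_credits - d.amount_debits AS amount_total
FROM (SELECT date, SUM(amount) AS amount_credits FROM credits GROUP BY date) c
JOIN (SELECT date, -1 * SUM(amount) AS amount_debits FROM debits GROUP BY date) d
    ON c.date = d.date
ORDER BY c.date
"""

union_rows = conn.execute(union_query).fetchall()
join_rows = conn.execute(join_query).fetchall()
print(union_rows)  # includes 2020-01-04
print(join_rows)   # the inner join drops 2020-01-04
```

Both forms agree on the three days present in both tables; only the union all form reports the credit-only day.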
I'd group both tables on the date and then join the two:
SELECT c.date, amount_credits, amount_debits, amount_credits - amount_debits AS amount_total
FROM (SELECT date, SUM(amount) AS amount_credits
FROM credits
GROUP BY date) c
JOIN (SELECT date, -1 * SUM(amount) AS amount_debits
FROM debits
GROUP BY date) d ON c.date = d.date

MYSQL return zero in date not present, and COUNT how many rows are present with a specific date

I have a table named calendar with a single column (mydate DATE).
MYDATE
2020-01-01
2020-01-02
2020-01-03
...
I have also a table named delivery with three columns (id PK, giorno DATE, totale FLOAT)
ID GIORNO TOTALE
1 2020-01-01 0.10
2 2020-01-01 5
3 2020-01-02 12
4 2020-01-12 5
5 2020-02-02 13.50
This is what I'm trying to obtain:
Day Numbers of orders
2020-01-01 2
2020-01-02 1
2020-01-03 0
2020-01-04 0
2020-01-05 0
2020-01-06 0
2020-01-07 0
2020-01-08 0
2020-01-09 0
2020-01-10 0
2020-01-11 0
2020-01-12 1
...
I was trying this query:
SELECT c.mydate, IFNULL(d.totale, 0) value
FROM delivery d
RIGHT JOIN calendar c
ON ( c.mydate = d.giorno )
GROUP BY c.mydate
ORDER BY c.mydate
Consider:
select c.mydate, count(d.id) number_of_orders
from calendar c
left join delivery d on d.giorno = c.mydate
group by c.mydate
This works by left-joining the calendar table with the delivery table, then aggregating by date, and finally counting the number of matching rows in the delivery table.
This is quite close to your original query (although this uses left join instead of right join), however this uses an aggregate function to count the orders.
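A minimal sketch of the query above on the question's sample data, run in SQLite from Python as a stand-in for MySQL. The calendar is filled with only the first twelve days of January 2020, which is an assumption made to keep the output short.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calendar (mydate TEXT);
CREATE TABLE delivery (id INTEGER PRIMARY KEY, giorno TEXT, totale REAL);
INSERT INTO delivery VALUES
    (1, '2020-01-01', 0.10), (2, '2020-01-01', 5), (3, '2020-01-02', 12),
    (4, '2020-01-12', 5), (5, '2020-02-02', 13.50);
""")

# Assumed calendar range: 2020-01-01 through 2020-01-12.
days = [(date(2020, 1, 1) + timedelta(days=n)).isoformat() for n in range(12)]
conn.executemany("INSERT INTO calendar VALUES (?)", [(d,) for d in days])

query = """
SELECT c.mydate, COUNT(d.id) AS number_of_orders
FROM calendar c
LEFT JOIN delivery d ON d.giorno = c.mydate
GROUP BY c.mydate
ORDER BY c.mydate
"""
counts = dict(conn.execute(query).fetchall())
for day, n in counts.items():
    print(day, n)
```

COUNT(d.id) ignores the NULLs produced by the left join, which is why days without deliveries come back as 0 rather than 1.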