Complex queries to find sum of rows depending on date - mysql

CREATE TABLE master_tb
(
date DATE,
item VARCHAR(10),
`change` INT, -- CHANGE is a reserved word in MySQL, so it is quoted
current INT
);
INSERT INTO master_tb (date, item, `change`, current)
VALUES
("2021-01-01", "ABC", 11, 11),
("2021-01-01", "KLM", 4, 4),
("2021-01-02", "KLM", -3, 1),
("2021-01-03", "KLM", -1, 0),
("2021-02-01", "KLM", 6, 6),
("2021-02-02", "XYZ", 5, 5),
("2021-02-08", "KLM", -3, 3),
("2021-02-09", "XYZ", -1, 4),
("2021-03-02", "XYZ", 2, 6),
("2021-03-08", "XYZ", -1, 5),
("2021-03-08", "KLM", -3, 0);
I have the above table for an inventory log. I want to get 2 things:
The current value as of a given @date. If the given date is 2021-03-09, which does not exist in the list, the query should still return the most recent current value for each of the items ABC, XYZ, and KLM. The result would look something like this:
+------+---------+
| item | current |
+======+=========+
| ABC | 11 |
+------+---------+
| XYZ | 5 |
+------+---------+
| KLM | 0 |
+------+---------+
Similarly, I want to get the current values broken down by timeframe, given a "now" date @date, which can be any date. So if @date = 2021-04-01, I am looking for something like this:
+------+-------+-----------+------------+------------+----------+
| item | total | 0-30 days | 31-60 days | 61-90 days | 90+ days |
+======+=======+===========+============+============+==========+
| XYZ | 5 | 1 | 4 | 0 | 0 |
+------+-------+-----------+------------+------------+----------+
| ABC | 11 | 0 | 0 | 11 | 0 |
+------+-------+-----------+------------+------------+----------+
| KLM | 0 | 0 | 0 | 0 | 0 |
+------+-------+-----------+------------+------------+----------+
One thing to note is that "older" items are deducted first when there is a deduction. So if 5x item XYZ was added 50 days ago, 2x was added 20 days ago, and 3x was removed 10 days ago, the table would show total = 4, 0-30 days = 2, and 31-60 days = 2, because the older items were deducted even though the deduction occurred recently.
My first guess was to utilize partitions but I am not sure if that's possible, knowing the values outside of a partition is affected.
EDIT:
After a friend pointed out this article I have found a way to answer the first part of the question:
SELECT
item,
MAX(curr) AS current -- needed to group item columns
FROM (
SELECT
*,
LAST_VALUE(current) OVER -- partitioning was needed to
( -- find the most recent 'current'
PARTITION BY item -- value added on that 'item'
ORDER BY date
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) curr
FROM master_tb
WHERE date <= @date
) a
GROUP BY item
;

To answer the first part of your question, consider the following...
CREATE TABLE master_tb
( date DATE
, item VARCHAR(10)
, change_x INT
, current INT
, PRIMARY KEY(date,item)
);
INSERT INTO master_tb (date, item, change_x, current)
VALUES
("2021-01-01", "ABC", 11, 11),
("2021-01-01", "KLM", 4, 4),
("2021-01-02", "KLM", -3, 1),
("2021-01-03", "KLM", -1, 0),
("2021-02-01", "KLM", 6, 6),
("2021-02-02", "XYZ", 5, 5),
("2021-02-08", "KLM", -3, 3),
("2021-02-09", "XYZ", -1, 4),
("2021-03-02", "XYZ", 2, 6),
("2021-03-08", "XYZ", -1, 5),
("2021-03-08", "KLM", -3, 0);
SELECT item
, MAX(date) date
FROM master_tb
WHERE date <= '2021-03-09'
GROUP
BY item;
+------+------------+
| item | date |
+------+------------+
| ABC | 2021-01-01 |
| KLM | 2021-03-08 |
| XYZ | 2021-03-08 |
+------+------------+
SELECT a.*
FROM master_tb a
JOIN
( SELECT item
, MAX(date) date
FROM master_tb
WHERE date <= '2021-03-09'
GROUP
BY item
) b
ON b.item = a.item
AND b.date = a.date;
+------------+------+----------+---------+
| date | item | change_x | current |
+------------+------+----------+---------+
| 2021-01-01 | ABC | 11 | 11 |
| 2021-03-08 | KLM | -3 | 0 |
| 2021-03-08 | XYZ | -1 | 5 |
+------------+------+----------+---------+
We could also solve this using common table expressions/windowing functions, and that might be useful when we come to look at the next part of the problem - although I have a suspicion that much of that should be resolved in application code.
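As a rough illustration of resolving the second part in application code, here is a minimal Python sketch. It assumes strict first-in, first-out deduction (the rule described in the question), that the rows have already been fetched as (date, item, change) tuples, and that an age of exactly 90 days falls in the "61-90 days" bucket. The function name fifo_age_buckets and the row layout are hypothetical. Note that under strict FIFO the sample data gives XYZ as 2 in "0-30 days" and 3 in "31-60 days", which differs from the illustrative table in the question (1 and 4), so the intended deduction rule is worth confirming.

```python
from collections import deque
from datetime import date

# Sample rows from the question as (date, item, change) tuples.
ROWS = [
    (date(2021, 1, 1), "ABC", 11),
    (date(2021, 1, 1), "KLM", 4),
    (date(2021, 1, 2), "KLM", -3),
    (date(2021, 1, 3), "KLM", -1),
    (date(2021, 2, 1), "KLM", 6),
    (date(2021, 2, 2), "XYZ", 5),
    (date(2021, 2, 8), "KLM", -3),
    (date(2021, 2, 9), "XYZ", -1),
    (date(2021, 3, 2), "XYZ", 2),
    (date(2021, 3, 8), "XYZ", -1),
    (date(2021, 3, 8), "KLM", -3),
]


def fifo_age_buckets(rows, as_of):
    """Return {item: [total, 0-30, 31-60, 61-90, 90+]} as of the given date.

    Assumption: deductions always consume the oldest remaining stock first.
    """
    layers = {}  # item -> deque of [date_added, quantity_left]
    for day, item, change in sorted(rows):
        if day > as_of:
            continue  # ignore log entries after the "now" date
        queue = layers.setdefault(item, deque())
        if change > 0:
            queue.append([day, change])
            continue
        remaining = -change
        while remaining and queue:
            take = min(remaining, queue[0][1])
            queue[0][1] -= take
            remaining -= take
            if queue[0][1] == 0:
                queue.popleft()  # oldest layer fully consumed

    result = {}
    for item, queue in layers.items():
        buckets = [0, 0, 0, 0]
        for day, quantity in queue:
            age = (as_of - day).days
            index = 0 if age <= 30 else 1 if age <= 60 else 2 if age <= 90 else 3
            buckets[index] += quantity
        result[item] = [sum(buckets)] + buckets
    return result


aging = fifo_age_buckets(ROWS, date(2021, 4, 1))
for item in sorted(aging):
    print(item, aging[item])
```

The worked example from the question (5 added 50 days ago, 2 added 20 days ago, 3 removed 10 days ago) yields total = 4, 0-30 days = 2, 31-60 days = 2 with this function.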

Related

using "OR" and "HAVING SUM" Mysql 5.7

This is my fiddle: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=71b1fd5d8e222ab1c51ace8d1af4c94f
CREATE TABLE order_match(ID int(10) NOT NULL PRIMARY KEY AUTO_INCREMENT,
quantity decimal(10,2), createdAt date NOT NULL, order_status_id int(10) NOT NULL,
createdby int(11), code_order varchar(20) NOT NULL);
insert into order_match values
(1, 0.2, '2020-02-02', 6, 01, 0001),
(2, 1, '2020-02-03', 7, 02, 0002),
(3, 1.3, '2020-02-04', 7, 03, 0003),
(4, 1.4, '2020-02-08', 5, 08, 0004),
(5, 1.2, '2020-02-05', 8, 04, 0005),
(6, 1.4, '2020-03-01', 8, 05, 0006),
(7, 0.23, '2020-01-01', 8, 03, 0007),
(8, 2.3, '2020-02-07', 8, 04, 0009);
And this is my table:
select order_status_id, createdby, createdAt from order_match;
+-----------------+-----------+------------+
| order_status_id | createdby | createdAt |
+-----------------+-----------+------------+
| 6 | 1 | 2020-02-02 |
| 7 | 2 | 2020-02-03 |
| 7 | 3 | 2020-02-04 |
| 5 | 8 | 2020-02-08 |
| 8 | 4 | 2020-02-05 |
| 8 | 5 | 2020-03-01 |
| 8 | 3 | 2020-01-01 |
+-----------------+-----------+------------+
order_status_id is the status of the transaction: "7" means the transaction was not approved, and any other value means it was approved. createdby is the id of the user who made the transaction, and createdAt is the date the transaction happened.
I want to find the repeat users who made a transaction between '2020-02-01' and '2020-02-28'. A repeat user is a user who made an approved transaction before '2020-02-28' and made at least one more approved transaction within the date range ('2020-02-01' to '2020-02-28').
Based on that explanation, I used this query:
SELECT s1.createdby
FROM order_match s1
WHERE s1.order_status_Id in (4, 5, 6, 8)
GROUP BY s1.createdby
HAVING SUM(s1.createdAt BETWEEN '2020-02-01' AND '2020-02-28')
AND SUM(s1.createdAt <= '2020-02-28') > 1
OR exists (select 1 from order_match s1 where
s1.createdAt < '2020-02-01'
and s1.order_status_id in (4, 5, 6, 8));
That query returned this result:
+-----------+
| createdby |
+-----------+
| 1 |
| 3 |
| 4 |
| 5 |
| 8 |
+-----------+
The expected result, based on the data and the explanation, is this:
+-----------+
| createdby |
+-----------+
| null |
+-----------+
because no users fit the "repeat users" condition. Where did I go wrong?
Looks like
SELECT createdby
FROM order_match
-- select rows in the specified date range
WHERE createdAt BETWEEN '2020-02-01' AND '2020-02-28'
GROUP BY createdby
-- check that the user has more than one transaction whose status is not non-approved
HAVING SUM(order_status_id != 7) > 1 -- or SUM(order_status_id in (4, 5, 6, 8)) > 1
That's why I used OR EXISTS: to check the users before '2020-02-01'.
Sorry, I misunderstood the task.
SELECT createdby
FROM order_match
GROUP BY createdby
-- check that the user has more than one transaction whose status is not non-approved
HAVING SUM(order_status_id != 7) > 1
-- and that at least one of them is in the specified date range
AND SUM(order_status_id != 7 AND createdAt BETWEEN '2020-02-01' AND '2020-02-28')
Where did I go wrong?
In WHERE ... IN: this condition is TRUE for each createdby who has at least one approved transaction, because that transaction checks itself in this condition.
Additionally, s1.createdAt BETWEEN '2020-02-01' AND '2020-02-28' overlaps s1.createdAt <= '2020-02-28', so the 2nd condition is redundant (if the 1st is true then the 2nd is true too).
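To sanity-check the corrected HAVING query against the posted data, here is a small sketch that runs it in SQLite through Python's sqlite3 module. SQLite is only a stand-in for MySQL 5.7 here (an assumption; both treat a comparison as 0 or 1 inside SUM), the table definition is simplified, and the helper name repeat_users is hypothetical. One thing it brings out: the INSERT in the question contains an eighth row (user 4 on 2020-02-07) that is missing from the seven-row table shown afterwards, and with that row user 4 has two approved transactions in February.

```python
import sqlite3

# The eight rows from the question's INSERT statement.
ROWS = [
    (1, 0.2, "2020-02-02", 6, 1, "0001"),
    (2, 1, "2020-02-03", 7, 2, "0002"),
    (3, 1.3, "2020-02-04", 7, 3, "0003"),
    (4, 1.4, "2020-02-08", 5, 8, "0004"),
    (5, 1.2, "2020-02-05", 8, 4, "0005"),
    (6, 1.4, "2020-03-01", 8, 5, "0006"),
    (7, 0.23, "2020-01-01", 8, 3, "0007"),
    (8, 2.3, "2020-02-07", 8, 4, "0009"),
]

REPEAT_USERS_SQL = """
SELECT createdby
FROM order_match
GROUP BY createdby
-- more than one approved transaction overall ...
HAVING SUM(order_status_id != 7) > 1
-- ... and at least one approved transaction inside the date range
   AND SUM((order_status_id != 7)
           AND (createdAt BETWEEN '2020-02-01' AND '2020-02-28')) > 0
ORDER BY createdby
"""


def repeat_users(rows):
    """Load the rows into an in-memory SQLite table and run the query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE order_match ("
        "ID INTEGER PRIMARY KEY, quantity REAL, createdAt TEXT NOT NULL, "
        "order_status_id INTEGER NOT NULL, createdby INTEGER, "
        "code_order TEXT NOT NULL)"
    )
    con.executemany("INSERT INTO order_match VALUES (?, ?, ?, ?, ?, ?)", rows)
    users = [row[0] for row in con.execute(REPEAT_USERS_SQL)]
    con.close()
    return users


print(repeat_users(ROWS))      # all eight inserted rows
print(repeat_users(ROWS[:7]))  # only the seven rows shown in the table
```

With only the seven displayed rows the result is empty, which matches the expectation stated in the question.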

Sort records on multiple columns and conditions

I have the table below, which stores each person's rank in the events they participated in.
event_running and event_longjump are the events, and each column stores the person's rank in that event.
CREATE TABLE `ranks` (
`id` int(11) NOT NULL,
`personid` int(11) NOT NULL,
`event_running` int(11) DEFAULT NULL,
`event_longjump` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Sample data
INSERT INTO `ranks` (`id`, `personid`, `event_running`, `event_longjump`) VALUES
(1, 1, 4, 8),
(2, 2, 10, 6),
(3, 3, 5, 0),
(4, 5, 20, 1),
(5, 4, 9, 3),
(6, 6, 1, 2);
SQL Fiddle Link
I want to build a leaderboard as below
| Standing | PersonID | RunningRank | JumpingRank |
| 1 | 6 | 1 | 2 |
| 2 | 4 | 9 | 3 |
| 3 | 1 | 4 | 8 |
| 4 | 3 | 5 | 0 |
| 5 | 2 | 10 | 6 |
This has to be sorted in ascending order: irrespective of the event, the lowest rank comes first, and ranks above 20 are ignored.
Any inputs on how this can be done?
You can use something similar to the query below:
select personid as PersonID,
event_running as RunningRank,
event_longjump as JumpingRank,
(event_running + event_longjump) as Total
from ranks
order by Total asc
limit 20;
Here's your query.
set @row_number = 0;
select (@row_number:=@row_number + 1) as standing, personid, event_running, event_longjump from ranks
where event_running < 20 and event_longjump < 20
order by
case when if(event_longjump=0,event_running ,event_longjump) < event_running
then event_longjump else event_running end
see dbfiddle
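The ordering rule in the query above (take each person's best rank, skip anyone with a rank of 20 or more) can also be sketched in Python, which makes the sort key easier to see. This is only an illustration under two assumptions that go beyond the SQL: a rank of 0 in either event means "no rank recorded", and a person with no recorded rank at all sorts last. The function name leaderboard is hypothetical.

```python
# Sample rows from the question as (personid, event_running, event_longjump).
RANKS = [
    (1, 4, 8),
    (2, 10, 6),
    (3, 5, 0),
    (5, 20, 1),
    (4, 9, 3),
    (6, 1, 2),
]


def leaderboard(rows, cutoff=20):
    """Return (standing, personid, running, longjump) ordered by best rank."""

    def best_rank(row):
        _personid, running, longjump = row
        # Assumption: 0 means the person has no rank in that event.
        recorded = [rank for rank in (running, longjump) if rank != 0]
        return min(recorded, default=cutoff)

    # Same filter as the WHERE clause: both ranks must be below the cutoff.
    eligible = [row for row in rows if row[1] < cutoff and row[2] < cutoff]
    ordered = sorted(eligible, key=best_rank)
    return [(standing, *row) for standing, row in enumerate(ordered, start=1)]


for entry in leaderboard(RANKS):
    print(entry)
```

On the sample data this reproduces the requested leaderboard order (persons 6, 4, 1, 3, 2), with person 5 left out because of the running rank of 20.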
Your sorting criteria are a bit vague. I am assuming that you want to sort on the basis of the cumulative rank across all events, and then on the jumping rank.
Also, please explain the position of person id 3 in your question.
You can do,
select personid as PersonID,
event_running as RunningRank,
event_longjump as JumpingRank,
(event_longjump + event_running) as cumulativeRank
from ranks
ORDER BY cumulativeRank, JumpingRank asc
limit 20;
This will get you all the positions barring person id 3.

Not displaying average values for users with more than one instance

Basically, I am fairly new to SQL and I am playing around with it. It is fun and interesting and basically I am trying to select the userId and average gametime for each user who has more than one game session. This is my table below:
id | userId |gameTime |
________________________
1 | 1 | 10 |
2 | 2 | 10 |
3 | 3 | 15 |
4 | 1 | 10 |
5 | 2 | 25 |
_________________________
CREATE TABLE game_session(
id INTEGER PRIMARY KEY,
userId INTEGER NOT NULL,
gameTime DECIMAL NOT NULL);
INSERT INTO game_session VALUES
(1, 1, 10),
(2, 2, 10),
(3, 3, 15),
(4, 1, 10),
(5, 2, 25);
So I just want to display every userID that appears more than once, together with its average game time. I don't want to display the count column in the result.
I wrote my script, but I can't get it to work without selecting COUNT(userID), and it shows userIDs that have only one instance, not just those with more than one!
This is my script:
SELECT userID, AVG(gameTime)
FROM game_session
WHERE ((SELECT COUNT(userID) FROM game_session) > 1)
GROUP BY userID
Where am I going wrong?
I believe you're looking for a having clause.
Try this out:
select userID, avg(gameTime)
from game_session
group by userID
having count(userID) > 1;
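The reason the WHERE version returns every user is that its subquery is not correlated with the outer row: it counts all five rows in the table, so the condition is true for every row. HAVING, by contrast, filters after grouping. A small sketch using Python's sqlite3 module shows the difference side by side; SQLite is assumed here only as a convenient stand-in for the asker's database.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE game_session ("
    "id INTEGER PRIMARY KEY, userId INTEGER NOT NULL, gameTime DECIMAL NOT NULL)"
)
con.executemany(
    "INSERT INTO game_session VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 2, 10), (3, 3, 15), (4, 1, 10), (5, 2, 25)],
)

# Original attempt: the subquery counts the whole table (5 rows), so the
# WHERE condition is true for every row and no user is filtered out.
where_version = con.execute(
    "SELECT userId, AVG(gameTime) FROM game_session "
    "WHERE (SELECT COUNT(userId) FROM game_session) > 1 "
    "GROUP BY userId ORDER BY userId"
).fetchall()

# HAVING is evaluated per group, after aggregation.
having_version = con.execute(
    "SELECT userId, AVG(gameTime) FROM game_session "
    "GROUP BY userId HAVING COUNT(userId) > 1 ORDER BY userId"
).fetchall()

print(where_version)   # [(1, 10.0), (2, 17.5), (3, 15.0)]
print(having_version)  # [(1, 10.0), (2, 17.5)]
con.close()
```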

How to make Mysql variables work in a query

I've been struggling for a while now with attempting to generate code for automatic aggregations in my MySQL/MariaDB database. The method that I'm currently trying uses variables. I will admit in advance I'm not a database expert by any means. I'm totally self-taught, and have been struggling to find adequate resources for this particular problem. I've included simplified examples below. Oh, and I'm using MariaDB 10.1.
This code should work in mysql 5.6 as well as mariadb 10.0+, I have tested it on 10.1 and it works.
Here is my Table: and SQL FIDDLE <- doesn't work for some reason. Probably the dynamic columns. I'll leave it in case someone knows why.
CREATE TABLE data_points
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
device_id INTEGER,
dtime DATETIME,
sf INTEGER(11), -- sample frequency or interval
agg INTEGER(11), -- aggregation type, actually a fk
data_point BLOB,
PRIMARY KEY (id),
UNIQUE (device_id, dtime, sf, agg)
);
Lets insert some data:
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES
(1, '2015-01-02 12:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 13:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 14:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 15:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 16:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45));
So up to this point everything works just fine. What I'm trying to do is perform aggregations over different time periods; my lowest grain period is 60 seconds. Here is where I have issues. It's probably something obvious.
SELECT
@dp_dtime := MAX(dtime),
@dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)),
@dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)),
@dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4))
FROM data_points
WHERE
device_id = 1 AND
dtime BETWEEN '2015/01/02 12:00:00' AND '2015/01/17 23:05:00' AND
sf = 60 AND
agg = 1;
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
This ends up creating another row with NULL everywhere a variable was in the statement.
select @dp_dtime, @dp_aa, @dp_ab, @pd_ac;
-- This results in NULL, NULL, NULL, NULL
At this point I'm pretty sure I'm doing something wrong with the variables.
It's Late, 14 hour day. Am I even close? Is there a better/easier way?
Any help would be greatly appreciated.
EDIT:
In my real use case the number of columns depends on the type of device we're doing an aggregation for. Columns are named Excel-style, with 'aa' through 'zz' possible, although the max I've seen is about 150 columns wide. This may sound like a bad design, but the performance is surprising: I can't tell the difference between these dynamic columns and actual columns (at least as long as you don't need to index on them).
Try the following queries.
SQL:
CREATE TABLE data_points
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
device_id INTEGER,
dtime DATETIME,
sf INTEGER(11), -- sample frequency or interval
agg INTEGER(11), -- aggregation type, actually a fk
data_point BLOB,
UNIQUE (device_id, dtime, sf, agg)
);
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES
(1, '2015-01-02 12:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 13:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 14:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 15:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 16:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45));
select * from data_points;
SELECT
@dp_dtime := MAX(dtime) as dp_dtime,
@dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)) as dp_aa,
@dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)) as dp_ab,
@dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4)) as dp_ac
FROM data_points
WHERE
device_id = 1 AND
dtime BETWEEN '2015/01/02 12:00:00' AND '2015/1/17 23:05:00' AND
sf = 1 AND
agg = 1;
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
select * from data_points;
Output:
mysql> select * from data_points;
+----+-----------+---------------------+------+------+----------------------------+
| id | device_id | dtime | sf | agg | data_point |
+----+-----------+---------------------+------+------+----------------------------+
| 1 | 1 | 2015-01-02 12:00:00 | 1 | 1 | aaabacDZ |
| 2 | 1 | 2015-01-02 13:00:00 | 1 | 1 | aaabacDZ |
| 3 | 1 | 2015-01-02 14:00:00 | 1 | 1 | aaabacDZ |
| 4 | 1 | 2015-01-02 15:00:00 | 1 | 1 | aaabacDZ |
| 5 | 1 | 2015-01-02 16:00:00 | 1 | 1 | aaabacDZ |
+----+-----------+---------------------+------+------+----------------------------+
5 rows in set (0.00 sec)
mysql> SELECT
-> @dp_dtime := MAX(dtime) as dp_dtime,
-> @dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)) as dp_aa,
-> @dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)) as dp_ab,
-> @dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4)) as dp_ac
-> FROM data_points
-> WHERE
-> device_id = 1 AND
-> dtime BETWEEN '2015/01/02 12:00:00' AND '2015/1/17 23:05:00' AND
-> sf = 1 AND
-> agg = 1;
+---------------------+---------+---------+---------+
| dp_dtime | dp_aa | dp_ab | dp_ac |
+---------------------+---------+---------+---------+
| 2015-01-02 16:00:00 | 12.0000 | 34.0000 | 45.0000 |
+---------------------+---------+---------+---------+
1 row in set (0.00 sec)
mysql> INSERT INTO data_points
-> (device_id, dtime, sf, agg, data_point)
-> VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
Query OK, 1 row affected (0.00 sec)
mysql> select * from data_points;
+----+-----------+---------------------+------+------+-------------------------------------------------+
| id | device_id | dtime | sf | agg | data_point |
+----+-----------+---------------------+------+------+-------------------------------------------------+
| 1 | 1 | 2015-01-02 12:00:00 | 1 | 1 | aaabacDZ |
| 2 | 1 | 2015-01-02 13:00:00 | 1 | 1 | aaabacDZ |
| 3 | 1 | 2015-01-02 14:00:00 | 1 | 1 | aaabacDZ |
| 4 | 1 | 2015-01-02 15:00:00 | 1 | 1 | aaabacDZ |
| 5 | 1 | 2015-01-02 16:00:00 | 1 | 1 | aaabacDZ |
| 6 | 8 | 2015-01-02 16:00:00 | 300 | 2 | ▒ aaabac (# A# ▒F# |
+----+-----------+---------------------+------+------+-------------------------------------------------+
6 rows in set (0.00 sec)
Possibly a simple typo: I see @_dtime.
In the UNIQUE index, put dtime last; it will make the queries faster. Mini index lesson: All = columns should come first in a composite index, in any order (cardinality makes virtually no difference). Then you can put one 'range' (dtime). Any columns after a range are not used for filtering. See my cookbook.
Get rid of id and promote the UNIQUE index to PRIMARY KEY; it will make the queries still faster. Mini index lesson: Secondary keys (such as your UNIQUE) require bouncing between the key and the data. The PRIMARY KEY is clustered with the data (in InnoDB), thereby avoiding the bouncing. Instead, a 'range scan' over the PK is a range over the table.
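One more possible explanation for the NULLs, judging only from the statements shown in the question: the inserted rows have sf = 1 and agg = 60, while the SELECT filters on sf = 60 AND agg = 1, so no row matches. An aggregate such as MAX or MIN over zero rows returns NULL, and those NULLs are what end up in the variables. The working example above uses sf = 1 and agg = 1 in both the INSERT and the filter. The sketch below illustrates the effect with Python's sqlite3 module; SQLite has no dynamic columns, so a plain aa column is used as a hypothetical stand-in for COLUMN_GET(data_point, 'aa' AS DOUBLE).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE data_points ("
    "device_id INTEGER, dtime TEXT, sf INTEGER, agg INTEGER, aa REAL)"
)
# Same shape as the question's INSERT: sf = 1 and agg = 60 on every row.
con.executemany(
    "INSERT INTO data_points VALUES (1, ?, 1, 60, 12)",
    [("2015-01-02 %02d:00:00" % hour,) for hour in range(12, 17)],
)

# Filter as written in the question: sf and agg are swapped relative to the
# data, so no row matches and both aggregates come back as NULL (None).
no_match = con.execute(
    "SELECT MAX(dtime), MIN(aa) FROM data_points "
    "WHERE device_id = 1 AND sf = 60 AND agg = 1"
).fetchone()

# Filter that agrees with the inserted values.
match = con.execute(
    "SELECT MAX(dtime), MIN(aa) FROM data_points "
    "WHERE device_id = 1 AND sf = 1 AND agg = 60"
).fetchone()

print(no_match)  # (None, None)
print(match)     # ('2015-01-02 16:00:00', 12.0)
con.close()
```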

How do I store a value from the last row in a variable using MySQL?

I am trying to calculate the date difference between each record in a dataset for each account.
Here is the data that I have
id aid value
1 1 2015-01-01
2 1 2015-01-07
4 1 2015-01-08
6 1 2015-04-10
3 2 2015-02-01
5 2 2015-02-05
I would first need to combine the data where I can use TIMESTAMPDIFF to calculate the difference in Days (i.e. TIMESTAMPDIFF(DAY, previousValue, currentValue)).
How can I combine the rows in the dataset above to look like this
aid currentValue previousValue
1 2015-01-07 2015-01-01
1 2015-01-08 2015-01-07
1 2015-04-10 2015-01-08
2 2015-02-05 2015-02-01
from there I can easily calculate the difference in days between current and previous value.
Note, that I have a large data set and I can't use subqueries in my select this is why I need to know how to do it using variables.
How can I convert my initial dataset to the second dataset, where I have currentValue and previousValue for each account?
Here is SQL to generate tables with the data above
CREATE TEMPORARY TABLE lst
(
id int,
account_id int,
value date
);
INSERT INTO lst VALUES
(1, 1, '2015-01-01')
, (2, 1, '2015-01-07')
, (3, 2, '2015-02-01')
, (4, 1, '2015-01-08')
, (5, 2, '2015-02-05')
, (6, 1, '2015-04-10');
CREATE TEMPORARY TABLE lst1 AS
SELECT * FROM lst ORDER BY account_id, value ASC;
UPDATED
This is what I get after attempting Giorgos Betsos' answer below
'1', '2015-01-01', '2015-01-07'
'1', '2015-01-07', '2015-01-08'
'1', '2015-02-05', '2015-04-10'
'2', '2015-01-08', '2015-02-01'
'2', '2015-02-01', '2015-02-05'
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE test
(
id int,
account_id int,
value date
);
INSERT INTO test VALUES
(1, 1, '2015-01-01')
, (2, 1, '2015-01-07')
, (3, 2, '2015-02-01')
, (4, 1, '2015-01-08')
, (5, 2, '2015-02-05')
, (6, 1, '2015-04-10');
Query 1:
SELECT
IF(@accId = account_id, @prevDate, '-') as "Previous Date",
(@prevDate := value) as "Date",
(@accId :=account_id) as account_id
FROM
test, (SELECT @accId := 0) a, (SELECT @prevDate := '-') b
ORDER BY account_id ASC, value ASC
Results:
| Previous Date | Date | account_id |
|---------------|------------|------------|
| - | 2015-01-01 | 1 |
| 2015-01-01 | 2015-01-07 | 1 |
| 2015-01-07 | 2015-01-08 | 1 |
| 2015-01-08 | 2015-04-10 | 1 |
| - | 2015-02-01 | 2 |
| 2015-02-01 | 2015-02-05 | 2 |
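The same "carry the previous row's value forward" idea can be sketched outside the database, which may help when checking the query's output. The Python below walks the rows ordered by account and date, remembers the previous date within each account, and emits (account_id, currentValue, previousValue) pairs in the shape the question asks for. The function name with_previous and the tuple layout are assumptions made for illustration.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question as (id, account_id, value).
ROWS = [
    (1, 1, date(2015, 1, 1)),
    (2, 1, date(2015, 1, 7)),
    (3, 2, date(2015, 2, 1)),
    (4, 1, date(2015, 1, 8)),
    (5, 2, date(2015, 2, 5)),
    (6, 1, date(2015, 4, 10)),
]


def with_previous(rows):
    """Return (account_id, current_value, previous_value) for each row that
    has an earlier row in the same account."""
    ordered = sorted(rows, key=lambda row: (row[1], row[2]))
    pairs = []
    for account_id, group in groupby(ordered, key=lambda row: row[1]):
        previous = None  # reset at the start of every account
        for _row_id, _account, value in group:
            if previous is not None:
                pairs.append((account_id, value, previous))
            previous = value
    return pairs


pairs = with_previous(ROWS)
for account_id, current, previous in pairs:
    # Equivalent of TIMESTAMPDIFF(DAY, previousValue, currentValue).
    print(account_id, current, previous, (current - previous).days)
```

Unlike the SQL result above, the first row of each account is dropped here rather than shown with '-', to match the four-row target table in the question.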