using "OR" and "HAVING SUM" Mysql 5.7


This is my fiddle: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=71b1fd5d8e222ab1c51ace8d1af4c94f
CREATE TABLE order_match(ID int(10) NOT NULL PRIMARY KEY AUTO_INCREMENT,
quantity decimal(10,2), createdAt date NOT NULL, order_status_id int(10) NOT NULL,
createdby int(11), code_order varchar(20) NOT NULL);
insert into order_match values
(1, 0.2, '2020-02-02', 6, 01, 0001),
(2, 1, '2020-02-03', 7, 02, 0002),
(3, 1.3, '2020-02-04', 7, 03, 0003),
(4, 1.4, '2020-02-08', 5, 08, 0004),
(5, 1.2, '2020-02-05', 8, 04, 0005),
(6, 1.4, '2020-03-01', 8, 05, 0006),
(7, 0.23, '2020-01-01', 8, 03, 0007),
(8, 2.3, '2020-02-07', 8, 04, 0009);
and then this is my table
select order_status_id, createdby, createdAt from order_match;
+-----------------+-----------+------------+
| order_status_id | createdby | createdAt |
+-----------------+-----------+------------+
| 6 | 1 | 2020-02-02 |
| 7 | 2 | 2020-02-03 |
| 7 | 3 | 2020-02-04 |
| 5 | 8 | 2020-02-08 |
| 8 | 4 | 2020-02-05 |
| 8 | 5 | 2020-03-01 |
| 8 | 3 | 2020-01-01 |
+-----------------+-----------+------------+
order_status_id is the status of the transaction: "7" means the transaction was not approved, and any other value means it was approved. createdby is the id of the user who made the transaction, and createdAt is the date the transaction happened.
I want to find the repeat users who made transactions between '2020-02-01' and '2020-02-28'. Repeat users are users who made an approved transaction before '2020-02-28' and made at least one more approved transaction within the date range ('2020-02-01' to '2020-02-28').
Based on that explanation, I used this query:
SELECT s1.createdby
FROM order_match s1
WHERE s1.order_status_Id in (4, 5, 6, 8)
GROUP BY s1.createdby
HAVING SUM(s1.createdAt BETWEEN '2020-02-01' AND '2020-02-28')
AND SUM(s1.createdAt <= '2020-02-28') > 1
OR exists (select 1 from order_match s1 where
s1.createdAt < '2020-02-01'
and s1.order_status_id in (4, 5, 6, 8));
From that query, the result was this:
+-----------+
| createdby |
+-----------+
| 1 |
| 3 |
| 4 |
| 5 |
| 8 |
+-----------+
and the expected result, based on the data and the explanation, was this:
+-----------+
| createdby |
+-----------+
| null |
+-----------+
because no user fits the "repeat users" condition. Where did I go wrong?

Looks like
SELECT createdby
FROM order_match
-- select rows in specified data range
WHERE createdAt BETWEEN '2020-02-01' AND '2020-02-28'
GROUP BY createdby
-- check that user has more than one transaction which' status is not non-approved
HAVING SUM(order_status_id != 7) > 1 -- or SUM(order_status_id in (4, 5, 6, 8)) > 1
That's why I used OR EXISTS, to check the users before '2020-02-01'.
Sorry, I misunderstood the task.
SELECT createdby
FROM order_match
GROUP BY createdby
-- check that user has more than one transaction which' status is not non-approved
HAVING SUM(order_status_id != 7) > 1
-- and at least one of them is in specified data range
AND SUM(order_status_id != 7 AND createdAt BETWEEN '2020-02-01' AND '2020-02-28')
"where my wrong at?"
In WHERE ... IN: this condition gives TRUE for each createdby who has at least one approved transaction, because that transaction checks itself in this condition.
Additionally, s1.createdAt BETWEEN '2020-02-01' AND '2020-02-28' overlaps s1.createdAt <= '2020-02-28', so the 2nd condition is redundant (if the 1st is true then the 2nd is true too).
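Another way to see why every approved user comes back is to write the question's HAVING clause with the parentheses MySQL applies implicitly. This is only a sketch against the order_match table above; the alias `prior` is introduced here for illustration and is not in the original query. AND binds tighter than OR, and because the subquery re-declares the alias s1 it never refers to the user being grouped:

```sql
-- The question's HAVING, as MySQL parses it: (A AND B) OR EXISTS (...)
SELECT s1.createdby
FROM order_match s1
WHERE s1.order_status_id IN (4, 5, 6, 8)
GROUP BY s1.createdby
HAVING (
         SUM(s1.createdAt BETWEEN '2020-02-01' AND '2020-02-28')
         AND SUM(s1.createdAt <= '2020-02-28') > 1
       )
    -- Uncorrelated subquery: it is true for EVERY group as soon as any
    -- approved row dated before 2020-02-01 exists (here: ID 7, createdby 3).
    OR EXISTS (SELECT 1
               FROM order_match prior
               WHERE prior.createdAt < '2020-02-01'
                 AND prior.order_status_id IN (4, 5, 6, 8));
```

Since the EXISTS branch is true regardless of the group, the whole HAVING is true for each createdby that survives the WHERE filter, which matches the five rows the question reports.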

Related

Complex queries to find sum of rows depending on date

CREATE TABLE master_tb
(
date DATE,
item VARCHAR(10),
`change` INT, -- CHANGE is a reserved word in MySQL, so it must be quoted
current INT
);
INSERT INTO master_tb (date, item, `change`, current)
VALUES
("2021-01-01", "ABC", 11, 11),
("2021-01-01", "KLM", 4, 4),
("2021-01-02", "KLM", -3, 1),
("2021-01-03", "KLM", -1, 0),
("2021-02-01", "KLM", 6, 6),
("2021-02-02", "XYZ", 5, 5),
("2021-02-08", "KLM", -3, 3),
("2021-02-09", "XYZ", -1, 4),
("2021-03-02", "XYZ", 2, 6),
("2021-03-08", "XYZ", -1, 5),
("2021-03-08", "KLM", -3, 0);
I have the above table for an inventory log. I want to get 2 things:
The current value given a @date. So if my given date is 2021-03-09, even though that date doesn't exist in the list, it should give me the most recent row for each of the ABC, XYZ, and KLM items and its current value. So the result would look something like this:
+------+---------+
| item | current |
+======+=========+
| ABC | 11 |
+------+---------+
| XYZ | 5 |
+------+---------+
| KLM | 0 |
+------+---------+
Similarly, I want to get the current values but broken down by age bucket, given a "now" date @date, which can be any date. So if @date = 2021-4-1, I am looking for something like this:
+------+-------+-----------+------------+------------+----------+
| item | total | 0-30 days | 31-60 days | 61-90 days | 90+ days |
+======+=======+===========+============+============+==========+
| XYZ | 5 | 1 | 4 | 0 | 0 |
+------+-------+-----------+------------+------------+----------+
| ABC | 11 | 0 | 0 | 11 | 0 |
+------+-------+-----------+------------+------------+----------+
| KLM | 0 | 0 | 0 | 0 | 0 |
+------+-------+-----------+------------+------------+----------+
One thing to note is that "older" items are deducted first whenever there is a deduction. So if 5x item XYZ was added 50 days ago, 2x was added 20 days ago, and 3x was removed 10 days ago, the table would show total = 4, 0-30 days = 2, and 31-60 days = 2, because the older items were deducted even though the deduction occurred recently.
My first guess was to use partitions, but I am not sure that's possible, given that values outside a partition are affected.
EDIT:
After a friend pointed out this article I have found a way to answer the first part of the question:
SELECT
item,
MAX(curr) AS current -- needed to group item columns
FROM (
SELECT
*,
LAST_VALUE(current) OVER -- partitioning was needed to
( -- find the most recent 'current'
PARTITION BY item -- value added on that 'item'
ORDER BY date
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) curr
FROM master_tb
WHERE date <= @date
) a
GROUP BY item
;

To answer the first part of your question, consider the following...
CREATE TABLE master_tb
( date DATE
, item VARCHAR(10)
, change_x INT
, current INT
, PRIMARY KEY(date,item)
);
INSERT INTO master_tb (date, item, change_x, current)
VALUES
("2021-01-01", "ABC", 11, 11),
("2021-01-01", "KLM", 4, 4),
("2021-01-02", "KLM", -3, 1),
("2021-01-03", "KLM", -1, 0),
("2021-02-01", "KLM", 6, 6),
("2021-02-02", "XYZ", 5, 5),
("2021-02-08", "KLM", -3, 3),
("2021-02-09", "XYZ", -1, 4),
("2021-03-02", "XYZ", 2, 6),
("2021-03-08", "XYZ", -1, 5),
("2021-03-08", "KLM", -3, 0);
SELECT item
, MAX(date) date
FROM master_tb
WHERE date <= '2021-03-09'
GROUP
BY item;
+------+------------+
| item | date |
+------+------------+
| ABC | 2021-01-01 |
| KLM | 2021-03-08 |
| XYZ | 2021-03-08 |
+------+------------+
SELECT a.*
FROM master_tb a
JOIN
( SELECT item
, MAX(date) date
FROM master_tb
WHERE date <= '2021-03-09'
GROUP
BY item
) b
ON b.item = a.item
AND b.date = a.date;
+------------+------+----------+---------+
| date | item | change_x | current |
+------------+------+----------+---------+
| 2021-01-01 | ABC | 11 | 11 |
| 2021-03-08 | KLM | -3 | 0 |
| 2021-03-08 | XYZ | -1 | 5 |
+------------+------+----------+---------+
We could also solve this using common table expressions/windowing functions, and that might be useful when we come to look at the next part of the problem - although I have a suspicion that much of that should be resolved in application code.
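As a rough illustration of that windowing alternative, here is a minimal sketch. It assumes MySQL 8.0 or later (CTEs and window functions are not available in 5.7) and the master_tb definition from this answer; the names `ranked` and `rn` are made up for the example:

```sql
-- Assumes MySQL 8.0+; uses the master_tb table created above.
WITH ranked AS (
  SELECT item,
         current,
         -- Number each item's rows from newest to oldest.
         ROW_NUMBER() OVER (PARTITION BY item ORDER BY date DESC) AS rn
  FROM master_tb
  WHERE date <= '2021-03-09'
)
SELECT item, current
FROM ranked
WHERE rn = 1;  -- keep only the most recent row per item
```

With the sample data this should pick the same three rows as the join above (ABC 11, KLM 0, XYZ 5), without joining the table to itself.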

find out time difference for every user in condition mysql 5.7

This is my fiddle: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=7c549a3de0c8002ec43381462ba6a801
Let's assume I have data like this:
CREATE TABLE test (
ID INT,
user_id INT,
createdAt DATE,
status_id INT
);
INSERT INTO test VALUES
(1, 12, '2020-01-01', 4),
(2, 12, '2020-01-03', 7),
(3, 12, '2020-01-06', 7),
(4, 13, '2020-01-02', 5),
(5, 13, '2020-01-03', 6),
(6, 14, '2020-03-03', 8),
(7, 13, '2020-03-04', 4),
(8, 15, '2020-04-04', 7),
(9, 14, '2020-03-02', 6),
(10, 14, '2020-03-10', 5),
(11, 13, '2020-04-10', 8);
select * from test
order by createdAt;
and this is the table after running the SELECT *:
+----+---------+------------+-----------+
| ID | user_id | createdAt | status_id |
+----+---------+------------+-----------+
| 1 | 12 | 2020-01-01 | 4 |
| 4 | 13 | 2020-01-02 | 5 |
| 2 | 12 | 2020-01-03 | 7 |
| 5 | 13 | 2020-01-03 | 6 |
| 3 | 12 | 2020-01-06 | 7 |
| 9 | 14 | 2020-03-02 | 6 |
| 6 | 14 | 2020-03-03 | 8 |
| 7 | 13 | 2020-03-04 | 4 |
| 10 | 14 | 2020-03-10 | 5 |
| 8 | 15 | 2020-04-04 | 7 |
| 11 | 13 | 2020-04-10 | 8 |
+----+---------+------------+-----------+
ID is the id of the transaction, user_id is the id of the user who made the transaction, createdAt is the date the transaction happened, and status_id is the status of the transaction (if status_id is 7, the transaction was denied, i.e. not approved).
In this case, I want to find the time difference between approved transactions for every repeat user in the time range '2020-02-01' to '2020-04-01'. Repeat users are users who made a transaction before the end of the time range and made at least one more transaction within the time range. Here, that means users who made an approved transaction before '2020-04-01' and at least one more approved transaction between '2020-02-01' and '2020-04-01'.
From that explanation, I used this query:
SELECT SUM(transactions) AS transactions,
MIN(`MIN`) AS `MIN`,
MAX(`MAX`) AS `MAX`,
SUM(total) / SUM(transactions) AS `AVG`
FROM (
SELECT user_id,
COUNT(*) AS transactions,
MIN(diff) AS `MIN`,
MAX(diff) AS `MAX`,
SUM(diff) AS total
FROM (
SELECT user_id, DATEDIFF((SELECT MIN(t2.createdAt)
FROM test t2
WHERE t2.user_id = t1.user_id
AND t1.createdAt < t2.createdAt
AND t2.status_id in (4, 5, 6, 8)
), t1.createdAt) AS diff
FROM test t1
WHERE status_id in (4, 5, 6, 8)
HAVING SUM(status_id != 7 and createdAt < '2020-04-01') > 1
AND SUM(status_id != 7 AND createdAt BETWEEN '2020-02-01'
AND '2020-04-01')
) DiffTable
WHERE diff IS NOT NULL
GROUP BY user_id
) totals
and it fails with this error:
In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'db_314931870.t1.user_id'; this is incompatible with sql_mode=only_full_group_by
Expected results:
+-----+-----+---------+
| MIN | MAX | AVG |
+-----+-----+---------+
| 1 | 61 | 21,6667 |
+-----+-----+---------+
Explanation: the minimum is a 1-day difference, which happens for user_id 14, who made an approved transaction on '2020-03-02' and another on '2020-03-03'. The maximum is a 61-day difference, which happens for user_id 13, who made an approved transaction on '2020-01-03' and another on '2020-03-04'. The average is the sum of all time differences in the time range divided by the number of transactions in the time range.

SELECT MIN(DATEDIFF(t2.createdAt, t1.createdAt)) min_diff,
MAX(DATEDIFF(t2.createdAt, t1.createdAt)) max_diff,
AVG(DATEDIFF(t2.createdAt, t1.createdAt)) avg_diff
FROM test t1
JOIN test t2 ON t1.user_id = t2.user_id
AND t1.createdAt < t2.createdAt
AND 7 NOT IN (t1.status_id, t2.status_id)
JOIN (SELECT t3.user_id
FROM test t3
WHERE t3.status_id != 7
GROUP BY t3.user_id
HAVING SUM(t3.createdAt < '2020-04-01')
AND SUM(t3.createdAt BETWEEN '2020-02-01' AND '2020-04-01')) t4 ON t1.user_id = t4.user_id
WHERE NOT EXISTS (SELECT NULL
FROM test t5
WHERE t1.user_id = t5.user_id
AND t5.status_id != 7
AND t1.createdAt < t5.createdAt
AND t5.createdAt < t2.createdAt)
fiddle with short explanations.

How to make Mysql variables work in a query

I've been struggling for a while now with attempting to generate code for automatic aggregations in my MySQL/MariaDB database. The method that I'm currently trying uses variables. I will admit in advance that I'm not a database expert by any means. I'm totally self-taught, and have been struggling to find adequate resources for this particular problem. I've included simplified examples below. Oh, and I'm using MariaDB 10.1.
This code should work in MySQL 5.6 as well as MariaDB 10.0+. I have tested it on 10.1 and it works.
Here is my table, and an SQL Fiddle (which doesn't work for some reason, probably because of the dynamic columns; I'll leave it in case someone knows why).
CREATE TABLE data_points
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
device_id INTEGER,
dtime DATETIME,
sf INTEGER(11), -- sample frequency or interval
agg INTEGER(11), -- aggregation type, actually a fk
data_point BLOB,
PRIMARY KEY (id),
UNIQUE (device_id, dtime, sf, agg)
);
Lets insert some data:
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES
(1, '2015-01-02 12:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 13:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 14:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 15:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 16:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45));
So up to this point everything works just fine. What I'm trying to do is perform aggregations over different time periods; my lowest-grain period is 60 seconds. Here is where I have issues. It's probably something obvious.
SELECT
@dp_dtime := MAX(dtime),
@dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)),
@dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)),
@dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4))
FROM data_points
WHERE
device_id = 1 AND
dtime BETWEEN '2015/01/02 12:00:00' AND '2015/01/17 23:05:00' AND
sf = 60 AND
agg = 1;
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
This ends up creating another row with NULL everywhere a variable was used in the statement.
select @dp_dtime, @dp_aa, @dp_ab, @pd_ac;
-- This results in NULL, NULL, NULL, NULL
At this point I'm pretty sure I'm doing something wrong with the variables.
It's late, after a 14-hour day. Am I even close? Is there a better/easier way?
Any help would be greatly appreciated.
EDIT:
In my real use case the number of columns depends on the type of device we're doing an aggregation for. Columns are named Excel-style, 'aa' through 'zz' possible, although the widest I've seen is about 150 columns. This may sound like a bad design, but the performance is surprising: I can't tell the difference between these dynamic columns and actual columns (at least as long as you don't need to index on them).

Try the following queries.
SQL:
CREATE TABLE data_points
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
device_id INTEGER,
dtime DATETIME,
sf INTEGER(11), -- sample frequency or interval
agg INTEGER(11), -- aggregation type, actually a fk
data_point BLOB,
UNIQUE (device_id, dtime, sf, agg)
);
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES
(1, '2015-01-02 12:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 13:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 14:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 15:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 16:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45));
select * from data_points;
SELECT
@dp_dtime := MAX(dtime) as dp_dtime,
@dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)) as dp_aa,
@dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)) as dp_ab,
@dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4)) as dp_ac
FROM data_points
WHERE
device_id = 1 AND
dtime BETWEEN '2015/01/02 12:00:00' AND '2015/1/17 23:05:00' AND
sf = 1 AND
agg = 1;
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
select * from data_points;
Output:
mysql> select * from data_points;
+----+-----------+---------------------+------+------+----------------------------+
| id | device_id | dtime | sf | agg | data_point |
+----+-----------+---------------------+------+------+----------------------------+
| 1 | 1 | 2015-01-02 12:00:00 | 1 | 1 | aaabacDZ |
| 2 | 1 | 2015-01-02 13:00:00 | 1 | 1 | aaabacDZ |
| 3 | 1 | 2015-01-02 14:00:00 | 1 | 1 | aaabacDZ |
| 4 | 1 | 2015-01-02 15:00:00 | 1 | 1 | aaabacDZ |
| 5 | 1 | 2015-01-02 16:00:00 | 1 | 1 | aaabacDZ |
+----+-----------+---------------------+------+------+----------------------------+
5 rows in set (0.00 sec)
mysql> SELECT
-> @dp_dtime := MAX(dtime) as dp_dtime,
-> @dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)) as dp_aa,
-> @dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)) as dp_ab,
-> @dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4)) as dp_ac
-> FROM data_points
-> WHERE
-> device_id = 1 AND
-> dtime BETWEEN '2015/01/02 12:00:00' AND '2015/1/17 23:05:00' AND
-> sf = 1 AND
-> agg = 1;
+---------------------+---------+---------+---------+
| dp_dtime | dp_aa | dp_ab | dp_ac |
+---------------------+---------+---------+---------+
| 2015-01-02 16:00:00 | 12.0000 | 34.0000 | 45.0000 |
+---------------------+---------+---------+---------+
1 row in set (0.00 sec)
mysql> INSERT INTO data_points
-> (device_id, dtime, sf, agg, data_point)
-> VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
Query OK, 1 row affected (0.00 sec)
mysql> select * from data_points;
+----+-----------+---------------------+------+------+-------------------------------------------------+
| id | device_id | dtime | sf | agg | data_point |
+----+-----------+---------------------+------+------+-------------------------------------------------+
| 1 | 1 | 2015-01-02 12:00:00 | 1 | 1 | aaabacDZ |
| 2 | 1 | 2015-01-02 13:00:00 | 1 | 1 | aaabacDZ |
| 3 | 1 | 2015-01-02 14:00:00 | 1 | 1 | aaabacDZ |
| 4 | 1 | 2015-01-02 15:00:00 | 1 | 1 | aaabacDZ |
| 5 | 1 | 2015-01-02 16:00:00 | 1 | 1 | aaabacDZ |
| 6 | 8 | 2015-01-02 16:00:00 | 300 | 2 | ▒ aaabac (# A# ▒F# |
+----+-----------+---------------------+------+------+-------------------------------------------------+
6 rows in set (0.00 sec)
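Comparing the two versions: in the question the rows were stored with sf = 1 and agg = 60 while the SELECT filtered on sf = 60 AND agg = 1, so no row matched and every aggregate came back NULL. As for the "better/easier way" the question asks about, one option is to skip the user variables and let a single INSERT ... SELECT compute and store the aggregate. The following is a minimal sketch, assuming the sample rows from this answer (sf = 1, agg = 1); the HAVING COUNT(*) > 0 guard is an addition made here so that nothing is inserted when the filter matches no rows:

```sql
-- Aggregate and store in one statement, without user variables.
INSERT INTO data_points (device_id, dtime, sf, agg, data_point)
SELECT 8,
       MAX(dtime),
       300,
       2,
       COLUMN_CREATE(
         'aa', MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)),
         'ab', MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)),
         'ac', MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4))
       )
FROM data_points
WHERE device_id = 1
  AND dtime BETWEEN '2015-01-02 12:00:00' AND '2015-01-17 23:05:00'
  AND sf = 1
  AND agg = 1
-- Without this guard an empty match would still yield one all-NULL row.
HAVING COUNT(*) > 0;
```

With a variable number of dynamic columns per device type, the column list would still have to be generated by the application, just as it would for the variable-based version.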

Possibly a simple typo: I see @_dtime.
In the UNIQUE index, put dtime last; it will make the queries faster. Mini index lesson: All = columns should come first in a composite index, in any order (cardinality makes virtually no difference). Then you can put one 'range' (dtime). Any columns after a range are not used for filtering. See my cookbook.
Get rid of id and promote the UNIQUE index to PRIMARY KEY; it will make the queries still faster. Mini index lesson: Secondary keys (such as your UNIQUE) require bouncing between the key and the data. The PRIMARY KEY is clustered with the data (in InnoDB), thereby avoiding the bouncing. Instead, a 'range scan' over the PK is a range over the table.
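The two suggestions above could be combined roughly as follows. This is a sketch rather than a tested migration: it assumes InnoDB and that none of the four key columns ever needs to be NULL, which is required for a PRIMARY KEY.

```sql
CREATE TABLE data_points
(
  device_id  INTEGER  NOT NULL,
  sf         INTEGER  NOT NULL,  -- sample frequency or interval
  agg        INTEGER  NOT NULL,  -- aggregation type, actually a fk
  dtime      DATETIME NOT NULL,
  data_point BLOB,
  -- '=' columns first, the single range column (dtime) last
  PRIMARY KEY (device_id, sf, agg, dtime)
) ENGINE=InnoDB;
```

A query that fixes device_id, sf and agg and then uses BETWEEN on dtime can be answered by one range scan over this clustered key.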

How do I store a value from the last row in a variable using MySQL?

I am trying to calculate the date difference between each record in a dataset for each account.
Here is the data that I have
id aid value
1 1 2015-01-01
2 1 2015-01-07
4 1 2015-01-08
6 1 2015-04-10
3 2 2015-02-01
5 2 2015-02-05
I would first need to combine the data so that I can use TIMESTAMPDIFF to calculate the difference in days (i.e. TIMESTAMPDIFF(DAY, previousValue, currentValue)).
How can I combine the rows in the dataset above to look like this?
aid currentValue previousValue
1 2015-01-07 2015-01-01
1 2015-01-08 2015-01-07
1 2015-04-10 2015-01-08
2 2015-02-05 2015-02-01
From there I can easily calculate the difference in days between the current and previous value.
Note that I have a large dataset and I can't use subqueries in my SELECT, which is why I need to know how to do it using variables.
How can I convert my initial dataset to the second dataset, where I have currentValue and previousValue for each account?
Here is the SQL to generate tables with the data above:
CREATE TEMPORARY TABLE lst
(
id int,
account_id int,
value date
);
INSERT INTO lst VALUES
(1, 1, '2015-01-01')
, (2, 1, '2015-01-07')
, (3, 2, '2015-02-01')
, (4, 1, '2015-01-08')
, (5, 2, '2015-02-05')
, (6, 1, '2015-04-10');
CREATE TEMPORARY TABLE lst1 AS
SELECT * FROM lst ORDER BY account_id, value ASC;
UPDATED
This is what I get after attempting Giorgos Betsos' answer below
'1', '2015-01-01', '2015-01-07'
'1', '2015-01-07', '2015-01-08'
'1', '2015-02-05', '2015-04-10'
'2', '2015-01-08', '2015-02-01'
'2', '2015-02-01', '2015-02-05'

SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE test
(
id int,
account_id int,
value date
);
INSERT INTO test VALUES
(1, 1, '2015-01-01')
, (2, 1, '2015-01-07')
, (3, 2, '2015-02-01')
, (4, 1, '2015-01-08')
, (5, 2, '2015-02-05')
, (6, 1, '2015-04-10');
Query 1:
SELECT
IF(@accId = account_id, @prevDate, '-') as "Previous Date",
(@prevDate := value) as "Date",
(@accId :=account_id) as account_id
FROM
test, (SELECT @accId := 0) a, (SELECT @prevDate := '-') b
ORDER BY account_id ASC, value ASC
Results:
| Previous Date | Date | account_id |
|---------------|------------|------------|
| - | 2015-01-01 | 1 |
| 2015-01-01 | 2015-01-07 | 1 |
| 2015-01-07 | 2015-01-08 | 1 |
| 2015-01-08 | 2015-04-10 | 1 |
| - | 2015-02-01 | 2 |
| 2015-02-01 | 2015-02-05 | 2 |
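For comparison, on a server with window functions the same pairing can be written without user variables. This is a sketch only, assuming MySQL 8.0 or later (LAG() does not exist in the 5.6 used above) and the test table from this answer; the names `paired` and `days_between` are made up for the example:

```sql
-- Assumes MySQL 8.0+; pairs each row with the previous date of the same account.
SELECT account_id,
       currentValue,
       previousValue,
       TIMESTAMPDIFF(DAY, previousValue, currentValue) AS days_between
FROM (
  SELECT account_id,
         value AS currentValue,
         LAG(value) OVER (PARTITION BY account_id ORDER BY value) AS previousValue
  FROM test
) paired
WHERE previousValue IS NOT NULL   -- drop the first row of each account
ORDER BY account_id, currentValue;
```

This returns only the four current/previous pairs listed in the question, because the first row of each account has no previous value and is filtered out.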

Getting ROW_NUMBER in MySQL, restarting at 1 for sub-groups?

I have a database table containing responses to questions from users. Each question has a request and a response timestamp. The questions are asked in random order. Some users abandoned before completing all the questions and thus don't have subsequent response records.
I didn't capture the order in which the questions were asked for each user, but the sequence could be derived from the request timestamps (SELECT * FROM responses ORDER BY id_user, request_ts;).
I'm using MySQL, so I don't have ROW_NUMBER() as an available function. How would I go about getting the equivalent output, and have the counting restart on each id_user?
That is, for user_id=1, I want responses with values 1,2,..n ordered by request_ts, and then user_id=2 would have their responses with values 1,2,..n; and so on.
Ultimately, I want to get a set of data of aggregated average duration for each nth question (i.e. average duration for first question asked, ditto second question asked, etc).
+-----+-----+-------+
| Seq | Num | Avg_D |
+-----+-----+-------+
| 1 | 20 | 00:36 |
| 2 | 20 | 00:31 |
| 3 | 19 | 00:31 |
| 4 | 20 | 00:25 |
| 5 | 18 | 00:24 |
| 6 | 20 | 00:24 |
| 7 | 20 | 00:23 |
| 8 | 20 | 00:25 |
+-----+-----+-------+
This can then be used to show participant drop-off, survey fatigue, etc.

I created a dummy table with sample data.
CREATE TABLE `test9b` ( `id` int(32) NOT NULL AUTO_INCREMENT, `user_id` int(32) NOT NULL, `num` int(32) NOT NULL, `avg` int(32) NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;
INSERT INTO `test9b` (`id`, `user_id`, `num`, `avg`) VALUES
(1, 1, 21, 36),
(2, 1, 23, 32),
(3, 1, 20, 35),
(4, 2, 22, 32),
(5, 2, 25, 37),
(6, 2, 10, 39),
(7, 2, 20, 33),
(8, 3, 30, 36),
(9, 3, 40, 36),
(10, 4, 50, 36);
Query :
SELECT a.user_id, a.num, count(*) as row_number FROM test9b a
JOIN test9b b ON a.user_id = b.user_id AND a.num >= b.num
GROUP BY a.user_id, a.num
OUTPUT :
user_id num row_number
1 20 1
1 21 2
1 23 3
2 10 1
2 20 2
2 22 3
2 25 4
3 30 1
3 40 2
4 50 1
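Applied to the question's own table, the same self-join idea might look like the sketch below. The table name responses and the columns id_user and request_ts come from the question; response_ts is a hypothetical name for the response timestamp column, and the sketch assumes request_ts is unique per user so that each row gets a distinct sequence number:

```sql
SELECT seq,
       COUNT(*) AS num,
       SEC_TO_TIME(ROUND(AVG(duration_s))) AS avg_d
FROM (
  -- Number each user's responses by counting the rows asked at or before it.
  SELECT a.id_user,
         a.request_ts,
         TIMESTAMPDIFF(SECOND, a.request_ts, a.response_ts) AS duration_s,
         COUNT(*) AS seq
  FROM responses a
  JOIN responses b
    ON b.id_user = a.id_user
   AND b.request_ts <= a.request_ts
  GROUP BY a.id_user, a.request_ts, a.response_ts
) numbered
GROUP BY seq
ORDER BY seq;
```

The inner query restarts the count at 1 for every id_user, and the outer query then averages the duration for each nth question across users. Because the self-join compares every pair of rows within a user, the cost grows quadratically with the number of responses per user.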
