How to make MySQL variables work in a query

I've been struggling for a while to generate code for automatic aggregations in my MySQL/MariaDB database. The method I'm currently trying uses variables. I'll admit up front that I'm not a database expert by any means; I'm entirely self-taught and have struggled to find adequate resources for this particular problem. I've included simplified examples below. I'm using MariaDB 10.1.
This code should work in mysql 5.6 as well as mariadb 10.0+, I have tested it on 10.1 and it works.
Here is my table (and an SQL Fiddle, which doesn't work for some reason, probably because of the dynamic columns; I'll leave it in case someone knows why):
CREATE TABLE data_points
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
device_id INTEGER,
dtime DATETIME,
sf INTEGER(11), -- sample frequency or interval
agg INTEGER(11), -- aggregation type, actually a fk
data_point BLOB,
UNIQUE (device_id, dtime, sf, agg)
);
Let's insert some data:
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES
(1, '2015-01-02 12:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 13:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 14:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 15:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 16:00:00', 1, 60, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45));
Up to this point everything works fine. What I'm trying to do is perform aggregations over different time periods; my finest-grained period is 60 seconds. Here is where I have issues. It's probably something obvious.
SELECT
@dp_dtime := MAX(dtime),
@dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)),
@dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)),
@dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4))
FROM data_points
WHERE
device_id = 1 AND
dtime BETWEEN '2015/01/02 12:00:00' AND '2015/01/17 23:05:00' AND
sf = 60 AND
agg = 1;
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
This ends up creating another row with NULL everywhere a variable was in the statement.
select @dp_dtime, @dp_aa, @dp_ab, @pd_ac;
-- This results in NULL, NULL, NULL, NULL
At this point I'm pretty sure I'm doing something wrong with the variables.
It's late, after a 14-hour day. Am I even close? Is there a better or easier way?
Any help would be greatly appreciated.
EDIT:
In my real use case the number of columns depends on the type of device we're doing an aggregation for. Columns are named Excel-style, with 'aa' through 'zz' possible, although the most I've seen is about 150 columns wide. This may sound like a bad design, but the performance is surprising: I can't tell the difference between these dynamic columns and actual columns (at least as long as you don't need to index on them).

Try the following queries.
SQL:
CREATE TABLE data_points
(
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
device_id INTEGER,
dtime DATETIME,
sf INTEGER(11), -- sample frequency or interval
agg INTEGER(11), -- aggregation type, actually a fk
data_point BLOB,
UNIQUE (device_id, dtime, sf, agg)
);
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES
(1, '2015-01-02 12:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 13:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 14:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 15:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45)),
(1, '2015-01-02 16:00:00', 1, 1, COLUMN_CREATE('aa', 12, 'ab', 34, 'ac', 45));
select * from data_points;
SELECT
@dp_dtime := MAX(dtime) as dp_dtime,
@dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)) as dp_aa,
@dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)) as dp_ab,
@dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4)) as dp_ac
FROM data_points
WHERE
device_id = 1 AND
dtime BETWEEN '2015/01/02 12:00:00' AND '2015/1/17 23:05:00' AND
sf = 1 AND
agg = 1;
INSERT INTO data_points
(device_id, dtime, sf, agg, data_point)
VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
select * from data_points;
Output:
mysql> select * from data_points;
+----+-----------+---------------------+------+------+----------------------------+
| id | device_id | dtime | sf | agg | data_point |
+----+-----------+---------------------+------+------+----------------------------+
| 1 | 1 | 2015-01-02 12:00:00 | 1 | 1 | aaabacDZ |
| 2 | 1 | 2015-01-02 13:00:00 | 1 | 1 | aaabacDZ |
| 3 | 1 | 2015-01-02 14:00:00 | 1 | 1 | aaabacDZ |
| 4 | 1 | 2015-01-02 15:00:00 | 1 | 1 | aaabacDZ |
| 5 | 1 | 2015-01-02 16:00:00 | 1 | 1 | aaabacDZ |
+----+-----------+---------------------+------+------+----------------------------+
5 rows in set (0.00 sec)
mysql> SELECT
-> @dp_dtime := MAX(dtime) as dp_dtime,
-> @dp_aa := MIN(ROUND(COLUMN_GET(data_point, 'aa' AS DOUBLE), 4)) as dp_aa,
-> @dp_ab := MIN(ROUND(COLUMN_GET(data_point, 'ab' AS DOUBLE), 4)) as dp_ab,
-> @dp_ac := MIN(ROUND(COLUMN_GET(data_point, 'ac' AS DOUBLE), 4)) as dp_ac
-> FROM data_points
-> WHERE
-> device_id = 1 AND
-> dtime BETWEEN '2015/01/02 12:00:00' AND '2015/1/17 23:05:00' AND
-> sf = 1 AND
-> agg = 1;
+---------------------+---------+---------+---------+
| dp_dtime | dp_aa | dp_ab | dp_ac |
+---------------------+---------+---------+---------+
| 2015-01-02 16:00:00 | 12.0000 | 34.0000 | 45.0000 |
+---------------------+---------+---------+---------+
1 row in set (0.00 sec)
mysql> INSERT INTO data_points
-> (device_id, dtime, sf, agg, data_point)
-> VALUES (8, @dp_dtime, 300, 2, COLUMN_CREATE('aa', @dp_aa, 'ab', @dp_ab, 'ac', @dp_ac));
Query OK, 1 row affected (0.00 sec)
mysql> select * from data_points;
+----+-----------+---------------------+------+------+-------------------------------------------------+
| id | device_id | dtime | sf | agg | data_point |
+----+-----------+---------------------+------+------+-------------------------------------------------+
| 1 | 1 | 2015-01-02 12:00:00 | 1 | 1 | aaabacDZ |
| 2 | 1 | 2015-01-02 13:00:00 | 1 | 1 | aaabacDZ |
| 3 | 1 | 2015-01-02 14:00:00 | 1 | 1 | aaabacDZ |
| 4 | 1 | 2015-01-02 15:00:00 | 1 | 1 | aaabacDZ |
| 5 | 1 | 2015-01-02 16:00:00 | 1 | 1 | aaabacDZ |
| 6 | 8 | 2015-01-02 16:00:00 | 300 | 2 | ▒ aaabac (# A# ▒F# |
+----+-----------+---------------------+------+------+-------------------------------------------------+
6 rows in set (0.00 sec)

Possibly a simple typo: I see #_dtime.
In the UNIQUE index, put dtime last; it will make the queries faster. Mini index lesson: all = columns should come first in a composite index, in any order (cardinality makes virtually no difference). Then you can put one 'range' column (dtime). Any columns after a range are not used for filtering. See my cookbook.
Get rid of id and promote the UNIQUE index to PRIMARY KEY; it will make the queries faster still. Mini index lesson: secondary keys (such as your UNIQUE) require bouncing between the key and the data. The PRIMARY KEY is clustered with the data (in InnoDB), thereby avoiding the bouncing. Instead, a 'range scan' over the PK is a range scan over the table.
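The column-order rule above can be observed in a query planner. The following is only a sketch in Python using SQLite (not MySQL or MariaDB, whose optimizers differ), with a cut-down data_points table and a hypothetical index name idx_dp; it prints which index columns the planner uses for the question's WHERE clause under each column order.

```python
import sqlite3

# The WHERE clause from the question: three equality tests and one range.
QUERY = """
EXPLAIN QUERY PLAN
SELECT data_point
FROM data_points
WHERE device_id = 1 AND sf = 1 AND agg = 1
  AND dtime BETWEEN '2015-01-02 12:00:00' AND '2015-01-17 23:05:00'
"""


def plan_detail(index_columns):
    """Return SQLite's plan text when the unique index uses index_columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE data_points ("
        "device_id INTEGER, dtime TEXT, sf INTEGER, agg INTEGER, data_point BLOB)"
    )
    # idx_dp is a hypothetical name chosen for this sketch.
    conn.execute(f"CREATE UNIQUE INDEX idx_dp ON data_points ({index_columns})")
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    # The last field of each EXPLAIN QUERY PLAN row is the readable detail.
    return " ".join(row[-1] for row in rows)


range_second = plan_detail("device_id, dtime, sf, agg")  # original order
range_last = plan_detail("device_id, sf, agg, dtime")    # suggested order
print(range_second)
print(range_last)
```

With the range column last, the plan should list all four columns as index constraints; with dtime second, the columns after it should drop out of the index search. On MySQL or MariaDB, EXPLAIN on the real table is the way to check the same thing.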

Related

MySQL query to show data separated by comma with group by different columns

I have a MySQL table where I have stored all of users searches. So the table looks something like this
CREATE TABLE `users_search_activity` (
`ID` bigint(20) UNSIGNED NOT NULL,
`user_id` int(11) NOT NULL,
`country_id` int(11) NOT NULL,
`search_keywords` text COLLATE utf8mb4_unicode_ci NOT NULL,
`date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
--
-- Dumping data for table `users_search_activity`
--
INSERT INTO `users_search_activity` (`ID`, `user_id`, `country_id`, `search_keywords`, `date`) VALUES
(1, 132, 2, 'xavie', '2021-07-13 08:20:37'),
(2, 132, 6, 'xavier', '2021-07-13 08:21:38'),
(3, 132, 5, 'xavier ins', '2021-07-13 08:21:39'),
(4, 132, 4, 'xavier ins', '2021-07-13 08:21:39'),
(5, 131, 9, 'xavier ins', '2021-07-13 08:22:12'),
(6, 132, 7, 'xavier ins', '2021-07-13 08:22:25'),
(7, 132, 8, 'xavier ins', '2021-07-13 09:24:43'),
(8, 132, 6, 'xavier ins', '2021-07-13 09:24:45'),
(9, 132, 4, 'xavier insa', '2021-07-13 09:24:47'),
(10, 131, 5, 'ins', '2021-07-13 09:24:54'),
(11, 132, 3, 'ins', '2021-07-13 09:24:54'),
(12, 132, 2, 'ins', '2021-07-13 09:24:58'),
(13, 132, 9, 'ins', '2021-07-13 09:24:59'),
(14, 132, 0, 'ins', '2021-07-13 09:25:00'),
(15, 132, 0, 'ins', '2021-07-13 09:25:02'),
(16, 132, 0, 'inst', '2021-07-13 09:58:20'),
(17, 132, 0, 'inst', '2021-07-04 09:58:25'),
(18, 132, 0, 'inst', '2021-07-07 09:58:25'),
(19, 132, 0, 'inst', '2021-07-11 09:58:26'),
(20, 1, 12, 'University Business Academy in Novi Sad', '2021-07-14 10:16:33');
--
-- Indexes for dumped tables
--
--
-- Indexes for table `users_search_activity`
--
ALTER TABLE `users_search_activity`
ADD PRIMARY KEY (`ID`);
--
-- AUTO_INCREMENT for dumped tables
--
--
-- AUTO_INCREMENT for table `users_search_activity`
--
ALTER TABLE `users_search_activity`
MODIFY `ID` bigint(20) UNSIGNED NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=20;
COMMIT;
Now I want to write a query that returns the data grouped by country_id and date.
For that I wrote this query:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY country_id, DATE(date) ORDER BY ID) rn,
COUNT(*) OVER (PARTITION BY country_id, DATE(date)) cnt
FROM users_search_activity
)
SELECT ID, cnt AS count, search_keywords, user_id, country_id, DATE(date) AS date
FROM cte
WHERE rn = 1;
It works fine except for search_keywords: only a single search_keywords value is shown. I want to show all of them, separated by commas, for the given date and country_id.
Can someone tell me how to do that? Any help or suggestions would be greatly appreciated.
Thanks,
The output should be something like this:
count user_id country_id date search_keywords
1 132 4 2021-07-13 xavier ins, xavier insa
You can use group_concat:
WITH cte AS (
SELECT *,DATE(date),ROW_NUMBER() OVER (PARTITION BY country_id, DATE(date) ORDER BY ID) rn,
COUNT(*) OVER (PARTITION BY country_id, DATE(date)) cnt
FROM users_search_activity
)
,tab2 as (
select t1.country_id,
date(date) dat,
group_concat(t1.search_keywords)
from cte t1
group by t1.country_id,
date(date)
)
SELECT *
FROM cte t1,
tab2 t2
WHERE t1.rn = 1
and t1.country_id = t2.country_id
and DATE(t1.date) = t2.dat
;
Update (2021/8/3):
MySQL 5.7 also has group_concat; you just need to emulate row_number.
select t1.country_id,
date(date) dat,
group_concat(t1.search_keywords),
max(case when t1.row_number = 1 then t1.user_id else null end) user_id
from (
select t1.*,
@rn := case when @temp1 is null then 1
when @temp1 = t1.country_id and @temp2 = DATE(date) then 0
else 1
end row_number,
@temp1:= t1.country_id,
@temp2:= DATE(date)
from users_search_activity t1,(select @rn:=0, @temp1:='', @temp2:='') t2
order by country_id, DATE(date), t1.id
) t1
group by t1.country_id,
date(date)
;
First of all, thanks for providing such detailed information in text format. Your question deserves an upvote for that.
You don't need a ranking function like row_number() or a CTE. A simple GROUP BY with group_concat() is enough.
Query:
SELECT COUNT(*) OVER (PARTITION BY country_id, DATE(date)) AS count,
user_id, country_id, DATE(date) as date, group_concat(search_keywords) FROM users_search_activity
group by user_id,country_id,DATE(date)
Output:
|count | user_id | country_id | date | group_concat(search_keywords)
|----: | ------: | ---------: | :--------- | :--------------------------------------
| 1 | 132 | 0 | 2021-07-04 | inst
| 1 | 132 | 0 | 2021-07-07 | inst
| 1 | 132 | 0 | 2021-07-11 | inst
| 1 | 132 | 0 | 2021-07-13 | inst,ins,ins
| 1 | 132 | 2 | 2021-07-13 | xavie,ins
| 1 | 132 | 3 | 2021-07-13 | ins
| 1 | 132 | 4 | 2021-07-13 | xavier insa,xavier ins
| 2 | 131 | 5 | 2021-07-13 | ins
| 2 | 132 | 5 | 2021-07-13 | xavier ins
| 1 | 132 | 6 | 2021-07-13 | xavier ins,xavier
| 1 | 132 | 7 | 2021-07-13 | xavier ins
| 1 | 132 | 8 | 2021-07-13 | xavier ins
| 2 | 131 | 9 | 2021-07-13 | xavier ins
| 2 | 132 | 9 | 2021-07-13 | ins
| 1 | 1 | 12 | 2021-07-14 | University Business Academy in Novi Sad
db<>fiddle here
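To make the grouping concrete, here is a small sketch that runs a plain GROUP BY with a concatenating aggregate from Python. It uses SQLite rather than MySQL (an assumption made only so the example is self-contained), a subset of the question's rows, and groups by country_id and day only, as in the desired output. In MySQL the separator would be written GROUP_CONCAT(search_keywords SEPARATOR ', ').

```python
import sqlite3

# (ID, user_id, country_id, search_keywords, date): a subset of the question's rows.
ROWS = [
    (1, 132, 2, "xavie", "2021-07-13 08:20:37"),
    (4, 132, 4, "xavier ins", "2021-07-13 08:21:39"),
    (9, 132, 4, "xavier insa", "2021-07-13 09:24:47"),
    (12, 132, 2, "ins", "2021-07-13 09:24:58"),
    (20, 1, 12, "University Business Academy in Novi Sad", "2021-07-14 10:16:33"),
]


def keywords_by_country_and_day(rows):
    """Return (country_id, day, count, comma-separated keywords) per group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users_search_activity ("
        "ID INTEGER PRIMARY KEY, user_id INTEGER, country_id INTEGER, "
        "search_keywords TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT INTO users_search_activity VALUES (?, ?, ?, ?, ?)", rows
    )
    result = conn.execute(
        """
        SELECT country_id,
               DATE(date) AS day,
               COUNT(*) AS cnt,
               GROUP_CONCAT(search_keywords, ', ') AS keywords
        FROM users_search_activity
        GROUP BY country_id, DATE(date)
        ORDER BY country_id, day
        """
    ).fetchall()
    conn.close()
    return result


for row in keywords_by_country_and_day(ROWS):
    print(row)
```

The order of keywords inside each concatenated string is not guaranteed here; MySQL's GROUP_CONCAT accepts an ORDER BY clause inside the call if a fixed order matters.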

Complex queries to find sum of rows depending on date

CREATE TABLE master_tb
(
date DATE,
item VARCHAR(10),
`change` INT, -- CHANGE is a reserved word in MySQL, so it is quoted
current INT
);
INSERT INTO master_tb (date, item, `change`, current)
VALUES
("2021-01-01", "ABC", 11, 11),
("2021-01-01", "KLM", 4, 4),
("2021-01-02", "KLM", -3, 1),
("2021-01-03", "KLM", -1, 0),
("2021-02-01", "KLM", 6, 6),
("2021-02-02", "XYZ", 5, 5),
("2021-02-08", "KLM", -3, 3),
("2021-02-09", "XYZ", -1, 4),
("2021-03-02", "XYZ", 2, 6),
("2021-03-08", "XYZ", -1, 5),
("2021-03-08", "KLM", -3, 0);
I have the above table for an inventory log. I want to get 2 things:
The current value as of a given @date. If the given date is 2021-03-09, even though that date doesn't exist in the list, it should give me the most recent current value for each of the items ABC, XYZ, and KLM. The result would look something like this:
+------+---------+
| item | current |
+======+=========+
| ABC | 11 |
+------+---------+
| XYZ | 5 |
+------+---------+
| KLM | 0 |
+------+---------+
Similarly, I want the current values broken down by timeframe, given a "now" date @date that can be any date. If @date = 2021-04-01, I am looking for something like this:
+------+-------+-----------+------------+------------+----------+
| item | total | 0-30 days | 31-60 days | 61-90 days | 90+ days |
+======+=======+===========+============+============+==========+
| XYZ | 5 | 1 | 4 | 0 | 0 |
+------+-------+-----------+------------+------------+----------+
| ABC | 11 | 0 | 0 | 11 | 0 |
+------+-------+-----------+------------+------------+----------+
| KLM | 0 | 0 | 0 | 0 | 0 |
+------+-------+-----------+------------+------------+----------+
One thing to note is that "older" items are deducted first when there is a deduction. So if 5x of item XYZ was added 50 days ago, 2x was added 20 days ago, and it was reduced by 3x 10 days ago, the table would show total = 4, 0-30 days = 2, and 31-60 days = 2, because the older items were deducted even though the deduction occurred recently.
My first guess was to utilize partitions but I am not sure if that's possible, knowing the values outside of a partition is affected.
EDIT:
After a friend pointed out this article I have found a way to answer the first part of the question:
SELECT
item,
MAX(curr) AS current -- needed to group item columns
FROM (
SELECT
*,
LAST_VALUE(current) OVER -- partitioning was needed to
( -- find the most recent 'current'
PARTITION BY item -- value added on that 'item'
ORDER BY date
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) curr
FROM tbl
WHERE date <= @date
) a
GROUP BY item
;
To answer the first part of your question, consider the following...
CREATE TABLE master_tb
( date DATE
, item VARCHAR(10)
, change_x INT
, current INT
, PRIMARY KEY(date,item)
);
INSERT INTO master_tb (date, item, change_x, current)
VALUES
("2021-01-01", "ABC", 11, 11),
("2021-01-01", "KLM", 4, 4),
("2021-01-02", "KLM", -3, 1),
("2021-01-03", "KLM", -1, 0),
("2021-02-01", "KLM", 6, 6),
("2021-02-02", "XYZ", 5, 5),
("2021-02-08", "KLM", -3, 3),
("2021-02-09", "XYZ", -1, 4),
("2021-03-02", "XYZ", 2, 6),
("2021-03-08", "XYZ", -1, 5),
("2021-03-08", "KLM", -3, 0);
SELECT item
, MAX(date) date
FROM master_tb
WHERE date <= '2021-03-09'
GROUP
BY item;
+------+------------+
| item | date |
+------+------------+
| ABC | 2021-01-01 |
| KLM | 2021-03-08 |
| XYZ | 2021-03-08 |
+------+------------+
SELECT a.*
FROM master_tb a
JOIN
( SELECT item
, MAX(date) date
FROM master_tb
WHERE date <= '2021-03-09'
GROUP
BY item
) b
ON b.item = a.item
AND b.date = a.date;
+------------+------+----------+---------+
| date | item | change_x | current |
+------------+------+----------+---------+
| 2021-01-01 | ABC | 11 | 11 |
| 2021-03-08 | KLM | -3 | 0 |
| 2021-03-08 | XYZ | -1 | 5 |
+------------+------+----------+---------+
We could also solve this using common table expressions/windowing functions, and that might be useful when we come to look at the next part of the problem - although I have a suspicion that much of that should be resolved in application code.
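For the second part, here is a minimal sketch of what the application-side logic might look like, assuming the oldest-first deduction rule described in the question. The function and bucket names (age_stock, bucket_for, BUCKETS) are made up for this illustration, and the input is assumed to be the (date, change) rows of a single item.

```python
from collections import deque
from datetime import date, timedelta

BUCKETS = ("0-30 days", "31-60 days", "61-90 days", "90+ days")


def bucket_for(age_days):
    """Map an age in days to one of the report's bucket labels."""
    if age_days <= 30:
        return BUCKETS[0]
    if age_days <= 60:
        return BUCKETS[1]
    if age_days <= 90:
        return BUCKETS[2]
    return BUCKETS[3]


def age_stock(changes, as_of):
    """Age the stock of one item; changes is a list of (date, quantity_change)."""
    lots = deque()  # each lot is [received_date, remaining_quantity], oldest first
    for when, qty in sorted(changes, key=lambda change: change[0]):
        if when > as_of:
            break
        if qty > 0:
            lots.append([when, qty])
            continue
        to_remove = -qty
        # Deduct from the oldest lots first, as the question describes.
        while to_remove > 0 and lots:
            take = min(to_remove, lots[0][1])
            lots[0][1] -= take
            to_remove -= take
            if lots[0][1] == 0:
                lots.popleft()
    result = {name: 0 for name in BUCKETS}
    for when, qty in lots:
        result[bucket_for((as_of - when).days)] += qty
    result["total"] = sum(qty for _, qty in lots)
    return result


today = date(2021, 4, 1)
# The worked example from the question: +5 fifty days ago, +2 twenty days ago,
# -3 ten days ago.
example = age_stock(
    [
        (today - timedelta(days=50), 5),
        (today - timedelta(days=20), 2),
        (today - timedelta(days=10), -3),
    ],
    today,
)
print(example)
```

For the worked example this gives total = 4 with 2 in each of the first two buckets, matching the question's description. Applied to the XYZ rows of the sample data, strict oldest-first deduction gives 2 and 3 rather than the 1 and 4 shown in the question's table, so the intended rule may differ slightly from this sketch.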

using "OR" and "HAVING SUM" Mysql 5.7

This is my fiddle: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=71b1fd5d8e222ab1c51ace8d1af4c94f
CREATE TABLE order_match(ID int(10) NOT NULL PRIMARY KEY AUTO_INCREMENT,
quantity decimal(10,2), createdAt date NOT NULL, order_status_id int(10) NOT NULL,
createdby int(11), code_order varchar(20) NOT NULL);
insert into order_match values
(1, 0.2, '2020-02-02', 6, 01, 0001),
(2, 1, '2020-02-03', 7, 02, 0002),
(3, 1.3, '2020-02-04', 7, 03, 0003),
(4, 1.4, '2020-02-08', 5, 08, 0004),
(5, 1.2, '2020-02-05', 8, 04, 0005),
(6, 1.4, '2020-03-01', 8, 05, 0006),
(7, 0.23, '2020-01-01', 8, 03, 0007),
(8, 2.3, '2020-02-07', 8, 04, 0009);
and this is my table:
select order_status_id, createdby, createdAt from order_match;
+-----------------+-----------+------------+
| order_status_id | createdby | createdAt |
+-----------------+-----------+------------+
| 6 | 1 | 2020-02-02 |
| 7 | 2 | 2020-02-03 |
| 7 | 3 | 2020-02-04 |
| 5 | 8 | 2020-02-08 |
| 8 | 4 | 2020-02-05 |
| 8 | 5 | 2020-03-01 |
| 8 | 3 | 2020-01-01 |
+-----------------+-----------+------------+
order_status_id is the status of the transaction: 7 means the transaction was not approved, and any other value means it was approved. createdby is the id of the user who made the transaction, and createdAt is the date the transaction happened.
I want to find the repeat users who made transactions between '2020-02-01' and '2020-02-28'. Repeat users are users who made an approved transaction before '2020-02-28' and made at least one more approved transaction within the date range ('2020-02-01' to '2020-02-28').
Based on that explanation, I used this query:
SELECT s1.createdby
FROM order_match s1
WHERE s1.order_status_Id in (4, 5, 6, 8)
GROUP BY s1.createdby
HAVING SUM(s1.createdAt BETWEEN '2020-02-01' AND '2020-02-28')
AND SUM(s1.createdAt <= '2020-02-28') > 1
OR exists (select 1 from order_match s1 where
s1.createdAt < '2020-02-01'
and s1.order_status_id in (4, 5, 6, 8));
From that query, the result was:
+-----------+
| createdby |
+-----------+
| 1 |
| 3 |
| 4 |
| 5 |
| 8 |
+-----------+
and the expected result, based on the data and the explanation, was:
+-----------+
| createdby |
+-----------+
| null |
+-----------+
because no users fit the "repeat users" condition. Where did I go wrong?
Looks like
SELECT createdby
FROM order_match
-- select rows in specified data range
WHERE createdAt BETWEEN '2020-02-01' AND '2020-02-28'
GROUP BY createdby
-- check that the user has more than one approved transaction (status other than 7)
HAVING SUM(order_status_id != 7) > 1 -- or SUM(order_status_id in (4, 5, 6, 8)) > 1
That's why I used OR EXISTS: to check for users before '2020-02-01'.
Sorry, I misunderstood the task.
SELECT createdby
FROM order_match
GROUP BY createdby
-- check that the user has more than one approved transaction (status other than 7)
HAVING SUM(order_status_id != 7) > 1
-- and that at least one of them is in the specified date range
AND SUM(order_status_id != 7 AND createdAt BETWEEN '2020-02-01' AND '2020-02-28')
Where did I go wrong?
In WHERE IN: this condition is TRUE for every createdby who has at least one approved transaction, because that transaction satisfies the condition by itself.
Additionally, s1.createdAt BETWEEN '2020-02-01' AND '2020-02-28' overlaps s1.createdAt <= '2020-02-28', so the second condition is redundant (if the first is true, the second is true too).
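The corrected HAVING logic can be checked quickly from Python. This is only a sketch using SQLite in place of MySQL 5.7 (both treat a comparison as 0 or 1 inside SUM), with the table cut down to the three columns the query needs. Note that the INSERT in the question has eight rows while the displayed table shows seven; the eighth row gives user 4 a second approved transaction in February.

```python
import sqlite3

# (createdAt, order_status_id, createdby) taken from the question's INSERT.
ROWS = [
    ("2020-02-02", 6, 1),
    ("2020-02-03", 7, 2),
    ("2020-02-04", 7, 3),
    ("2020-02-08", 5, 8),
    ("2020-02-05", 8, 4),
    ("2020-03-01", 8, 5),
    ("2020-01-01", 8, 3),
    ("2020-02-07", 8, 4),
]


def repeat_users(rows):
    """Users with more than one approved transaction, at least one in range."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE order_match ("
        "createdAt TEXT, order_status_id INTEGER, createdby INTEGER)"
    )
    conn.executemany("INSERT INTO order_match VALUES (?, ?, ?)", rows)
    found = conn.execute(
        """
        SELECT createdby
        FROM order_match
        GROUP BY createdby
        HAVING SUM(order_status_id != 7) > 1
           AND SUM(order_status_id != 7
                   AND createdAt BETWEEN '2020-02-01' AND '2020-02-28') > 0
        ORDER BY createdby
        """
    ).fetchall()
    conn.close()
    return [createdby for (createdby,) in found]


print(repeat_users(ROWS))      # all eight inserted rows
print(repeat_users(ROWS[:7]))  # only the seven rows shown in the question's table
```

With all eight rows, user 4 comes back as a repeat user; with only the seven displayed rows, the result is empty, which matches the expectation stated in the question.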

How do I store a value from the last row in a variable using MySQL?

I am trying to calculate the date difference between each record in a dataset for each account.
Here is the data that I have
id aid value
1 1 2015-01-01
2 1 2015-01-07
4 1 2015-01-08
6 1 2015-04-10
3 2 2015-02-01
5 2 2015-02-05
I first need to combine the data so that I can use TIMESTAMPDIFF to calculate the difference in days (i.e. TIMESTAMPDIFF(DAY, previousValue, currentValue)).
How can I combine the rows in the dataset above to look like this?
aid currentValue previousValue
1 2015-01-07 2015-01-01
1 2015-01-08 2015-01-07
1 2015-04-10 2015-01-08
2 2015-02-05 2015-02-01
From there I can easily calculate the difference in days between the current and previous value.
Note that I have a large dataset and I can't use subqueries in my SELECT, which is why I need to know how to do it using variables.
How can I convert my initial dataset to the second dataset, where I have currentValue and previousValue for each account?
Here is the SQL to generate tables with the data above:
CREATE TEMPORARY TABLE lst
(
id int,
account_id int,
value date
);
INSERT INTO lst VALUES
(1, 1, '2015-01-01')
, (2, 1, '2015-01-07')
, (3, 2, '2015-02-01')
, (4, 1, '2015-01-08')
, (5, 2, '2015-02-05')
, (6, 1, '2015-04-10');
CREATE TEMPORARY TABLE lst1 AS
SELECT * FROM lst ORDER BY account_id, value ASC;
UPDATED
This is what I get after attempting Giorgos Betsos' answer below
'1', '2015-01-01', '2015-01-07'
'1', '2015-01-07', '2015-01-08'
'1', '2015-02-05', '2015-04-10'
'2', '2015-01-08', '2015-02-01'
'2', '2015-02-01', '2015-02-05'
SQL Fiddle
MySQL 5.6 Schema Setup:
CREATE TABLE test
(
id int,
account_id int,
value date
);
INSERT INTO test VALUES
(1, 1, '2015-01-01')
, (2, 1, '2015-01-07')
, (3, 2, '2015-02-01')
, (4, 1, '2015-01-08')
, (5, 2, '2015-02-05')
, (6, 1, '2015-04-10');
Query 1:
SELECT
IF(@accId = account_id, @prevDate, '-') as "Previous Date",
(@prevDate := value) as "Date",
(@accId :=account_id) as account_id
FROM
test, (SELECT @accId := 0) a, (SELECT @prevDate := '-') b
ORDER BY account_id ASC, value ASC
Results:
| Previous Date | Date | account_id |
|---------------|------------|------------|
| - | 2015-01-01 | 1 |
| 2015-01-01 | 2015-01-07 | 1 |
| 2015-01-07 | 2015-01-08 | 1 |
| 2015-01-08 | 2015-04-10 | 1 |
| - | 2015-02-01 | 2 |
| 2015-02-01 | 2015-02-05 | 2 |
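Where window functions are available (MySQL 8.0+, MariaDB 10.2+), the same pairing can be written with LAG instead of user variables. This is a different technique from the answer above, shown here as a sketch run through Python's sqlite3 module; it assumes the bundled SQLite is 3.25 or newer, and the function name previous_values is made up for the example.

```python
import sqlite3

# (id, account_id, value) as in the schema setup above.
ROWS = [
    (1, 1, "2015-01-01"),
    (2, 1, "2015-01-07"),
    (3, 2, "2015-02-01"),
    (4, 1, "2015-01-08"),
    (5, 2, "2015-02-05"),
    (6, 1, "2015-04-10"),
]


def previous_values(rows):
    """Return (account_id, currentValue, previousValue) for each row that has a predecessor."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER, account_id INTEGER, value TEXT)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    pairs = conn.execute(
        """
        SELECT account_id, value, prev_value
        FROM (
            SELECT account_id,
                   value,
                   LAG(value) OVER (PARTITION BY account_id ORDER BY value) AS prev_value
            FROM test
        )
        WHERE prev_value IS NOT NULL  -- drop the first row of each account
        ORDER BY account_id, value
        """
    ).fetchall()
    conn.close()
    return pairs


for pair in previous_values(ROWS):
    print(pair)
```

The rows returned correspond to the aid / currentValue / previousValue table the question asks for; in MySQL the day difference could then be taken with TIMESTAMPDIFF(DAY, prev_value, value).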

Getting ROW_NUMBER in MySQL, restarting at 1 for sub-groups?

I have a database table containing responses to questions from users. Each question has a request and a response timestamp. The questions are asked in random order. Some users abandoned before completing all the questions and thus don't have subsequent response records.
I didn't capture the order in which the questions were asked for each user, but the sequence could be derived from the request timestamps (SELECT * FROM responses ORDER BY id_user, request_ts;).
I'm using MySQL, so I don't have ROW_NUMBER() as an available function. How would I go about getting the equivalent output, and have the counting restart on each id_user?
That is, for user_id=1, I want responses with values 1,2,..n ordered by request_ts, and then user_id=2 would have their responses with values 1,2,..n; and so on.
Ultimately, I want to get a set of data of aggregated average duration for each nth question (i.e. average duration for first question asked, ditto second question asked, etc).
+-----+-----+-------+
| Seq | Num | Avg_D |
+-----+-----+-------+
| 1 | 20 | 00:36 |
| 2 | 20 | 00:31 |
| 3 | 19 | 00:31 |
| 4 | 20 | 00:25 |
| 5 | 18 | 00:24 |
| 6 | 20 | 00:24 |
| 7 | 20 | 00:23 |
| 8 | 20 | 00:25 |
+-----+-----+-------+
This can then be used to show participant drop-off, survey fatigue, etc.
I created a dummy with sample data.
CREATE TABLE `test9b` (
`id` int(32) NOT NULL AUTO_INCREMENT,
`user_id` int(32) NOT NULL,
`num` int(32) NOT NULL,
`avg` int(32) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;
INSERT INTO `test9b` (`id`, `user_id`, `num`, `avg`) VALUES
(1, 1, 21, 36),
(2, 1, 23, 32),
(3, 1, 20, 35),
(4, 2, 22, 32),
(5, 2, 25, 37),
(6, 2, 10, 39),
(7, 2, 20, 33),
(8, 3, 30, 36),
(9, 3, 40, 36),
(10, 4, 50, 36);
Query :
SELECT a.user_id, a.num, count(*) as row_number FROM test9b a
JOIN test9b b ON a.user_id = b.user_id AND a.num >= b.num
GROUP BY a.user_id, a.num
OUTPUT :
user_id num row_number
1 20 1
1 21 2
1 23 3
2 10 1
2 20 2
2 22 3
2 25 4
3 30 1
3 40 2
4 50 1
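To get from these per-user sequence numbers to the per-position summary the question asks for, the self-join can be wrapped in an outer GROUP BY. The sketch below runs that through Python's sqlite3 module (SQLite standing in for MySQL), using the dummy table above; ordering by num stands in for ordering by request_ts, and the names seq, responses, and avg_d are chosen only for this example.

```python
import sqlite3

# (id, user_id, num, avg) from the dummy test9b table above.
ROWS = [
    (1, 1, 21, 36), (2, 1, 23, 32), (3, 1, 20, 35),
    (4, 2, 22, 32), (5, 2, 25, 37), (6, 2, 10, 39), (7, 2, 20, 33),
    (8, 3, 30, 36), (9, 3, 40, 36),
    (10, 4, 50, 36),
]


def summary_by_sequence(rows):
    """Return (seq, number_of_responses, average of avg) per position in the sequence."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE test9b (id INTEGER PRIMARY KEY, user_id INTEGER, '
        'num INTEGER, "avg" INTEGER)'
    )
    conn.executemany("INSERT INTO test9b VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT seq, COUNT(*) AS responses, AVG(avg_val) AS avg_d
        FROM (
            -- per-user row number: how many rows of the same user have num <= this one
            SELECT a.user_id, a.num, MIN(a."avg") AS avg_val, COUNT(*) AS seq
            FROM test9b a
            JOIN test9b b ON a.user_id = b.user_id AND a.num >= b.num
            GROUP BY a.user_id, a.num
        ) numbered
        GROUP BY seq
        ORDER BY seq
        """
    ).fetchall()
    conn.close()
    return result


for row in summary_by_sequence(ROWS):
    print(row)
```

The self-join compares every row with every other row of the same user, so it may be slow on large tables; it also assumes the ordering column has no ties within a user.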