MySQL select all rows from last N groups - mysql

I have a dataset like this, where there can be multiple transactions per trade
| tx_id | trade_id |
--------------------
| 100 | 11 |
| 99 | 11 |
| 98 | 11 |
| 97 | 10 |
| 96 | 10 |
| 95 | 9 |
| 94 | 9 |
| 93 | 8 |
...
I want to select all of the transactions from the last N trades. For instance if I wanted to select all rows from the last 2 trades, I would get:
| tx_id | trade_id |
--------------------
| 100 | 11 |
| 99 | 11 |
| 98 | 11 |
| 97 | 10 |
| 96 | 10 |
I cannot guarantee that the trade_id will always have an interval of 1.
How can I accomplish this in mysql?

This will also work with MySQL 5.
By changing the LIMIT, you can choose how many trades you want to receive.
CREATE TABLE tab1 (
`tx_id` INTEGER,
`trade_id` INTEGER
);
INSERT INTO tab1
(`tx_id`, `trade_id`)
VALUES
('100', '11'),
('99', '11'),
('98', '11'),
('97', '10'),
('96', '10'),
('95', '9'),
('94', '9'),
('93', '8');
SELECT t1.* FROM tab1 t1 JOIN (SELECT DISTINCT `trade_id` FROM tab1 ORDER BY `trade_id` DESC LIMIT 2) t2
ON t1.`trade_id` = t2.`trade_id`
tx_id | trade_id
----: | -------:
100 | 11
99 | 11
98 | 11
97 | 10
96 | 10
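The same join can be tried without a MySQL server. The sketch below is only a stand-in under stated assumptions: it uses Python's built-in sqlite3 module instead of MySQL, gives the trade_id values deliberate gaps (20, 17, 11, 3), and wraps the query in a hypothetical helper called last_n_trades. It is meant to show that the subquery picks the N highest distinct trade_id values regardless of the interval between them.

```python
import sqlite3

def last_n_trades(conn, n):
    """Return all (tx_id, trade_id) rows belonging to the n highest trade_ids."""
    query = """
        SELECT t1.tx_id, t1.trade_id
        FROM tab1 t1
        JOIN (SELECT DISTINCT trade_id
              FROM tab1
              ORDER BY trade_id DESC
              LIMIT ?) t2
          ON t1.trade_id = t2.trade_id
        ORDER BY t1.tx_id DESC
    """
    return conn.execute(query, (n,)).fetchall()

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab1 (tx_id INTEGER, trade_id INTEGER)")
conn.executemany(
    "INSERT INTO tab1 (tx_id, trade_id) VALUES (?, ?)",
    [(100, 20), (99, 20), (98, 17), (97, 17), (96, 11), (95, 11), (94, 3)],
)

rows = last_n_trades(conn, 2)
print(rows)  # [(100, 20), (99, 20), (98, 17), (97, 17)]
```

Passing the limit as a bound parameter keeps N out of the SQL text, so the same statement can serve any group count.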

You use DENSE_RANK on trade_id descending, then filter on your required X for "last X":
CREATE TABLE t (tx_id int, trade_id int);
INSERT INTO t (tx_id, trade_id) VALUES
(100,11),
(99,11),
(98,11),
(97,10),
(96,10),
(95,9),
(94,9),
(93,8);
SET @ngroups=2;
WITH dat
AS
(
SELECT tx_id, trade_id, DENSE_RANK() OVER (ORDER BY trade_id DESC) AS trade_id_rank
FROM t
)
SELECT tx_id, trade_id
FROM dat
WHERE trade_id_rank <= @ngroups;

If we assume the "last trades" are the ones with the highest trade_id numbers, then you can use DENSE_RANK().
For example:
select *
from (
select *,
dense_rank() over(order by trade_id desc) as dr
from t
) x
where dr <= 2
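To see why DENSE_RANK() copes with gaps in trade_id, it can help to spell out what it computes. The following is a plain-Python sketch of the idea, not MySQL code; the helper name dense_rank_desc and the sample rows are made up for illustration.

```python
def dense_rank_desc(values):
    """Map each distinct value to its dense rank, highest value = rank 1."""
    distinct = sorted(set(values), reverse=True)
    return {value: rank for rank, value in enumerate(distinct, start=1)}

# (tx_id, trade_id) rows; trade_id deliberately has gaps.
rows = [(100, 11), (99, 11), (98, 8), (97, 8), (96, 5), (95, 2)]

ranks = dense_rank_desc(trade_id for _, trade_id in rows)
n_groups = 2

# Keep every row whose trade_id is among the n_groups highest distinct values.
selected = [row for row in rows if ranks[row[1]] <= n_groups]
print(selected)  # [(100, 11), (99, 11), (98, 8), (97, 8)]
```

Because the rank is assigned per distinct value, trade 8 gets rank 2 even though trade_id values 9 and 10 never occur.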

This can be done with a CTE (MySQL 8.0+):
WITH trades AS (
SELECT trade_id AS tid
FROM myTable
GROUP BY trade_id
ORDER BY trade_id DESC
LIMIT 2
)
SELECT * FROM
trades
JOIN myTable ON trade_id = tid
ORDER BY tx_id;

Related

Set limit for IN condition element that evaluate true

table: t
+--------------+-----------+-----------+
| Id | price | Date |
+--------------+-----------+-----------+
| 1 | 30 | 2021-05-09|
| 1 | 24 | 2021-04-26|
| 1 | 33 | 2021-04-13|
| 2 | 36 | 2021-04-18|
| 3 | 15 | 2021-04-04|
| 3 | 33 | 2021-05-06|
| 4 | 46 | 2021-02-16|
+--------------+-----------+-----------+
I want to select the rows where Id is 1, 2 or 4 and get at most 2 rows for each Id, ordered by Date descending.
+--------------+-----------+-----------+
| Id | price | Date |
+--------------+-----------+-----------+
| 1 | 30 | 2021-05-09|
| 1 | 24 | 2021-04-26|
| 2 | 36 | 2021-04-18|
| 4 | 46 | 2021-02-16|
+--------------+-----------+-----------+
Something like:
Select * from t where Id IN ('1','2','4') limit 2 order by Date desc;
But this limits the overall number of rows fetched, not the number of rows per Id.
Use row_number():
select id, price, date
from (select t.*,
row_number() over (partition by id order by date desc) as seqnum
from t
where id in (1, 2, 4)
) t
where seqnum <= 2;
Probably the most efficient method is a correlated subquery:
select t.*
from t
where t.id in (1, 2, 4) and
t.date >= coalesce( (select t2.date
from t t2
where t2.id = t.id
order by t2.date desc
limit 1,1
), t.date
);
For performance, you want an index on (id, date). Also, this can return more than two rows for an id if multiple rows for that id share the same date.
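The tie caveat is easy to reproduce. The sketch below is illustrative only: it runs the correlated-subquery form against SQLite through Python's sqlite3 module (an assumption, since the answer targets MySQL), spells `limit 1,1` as `LIMIT 1 OFFSET 1`, and uses invented data in which id 1 has two rows sharing its second-latest date.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, price INTEGER, date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, 30, "2021-05-09"),
        (1, 24, "2021-04-26"),
        (1, 25, "2021-04-26"),  # tie on the second-latest date for id 1
        (1, 33, "2021-04-13"),
        (2, 36, "2021-04-18"),
        (3, 15, "2021-04-04"),
    ],
)
conn.execute("CREATE INDEX idx_t_id_date ON t (id, date)")

# Keep rows dated on or after the second-latest date of their id;
# ids with a single row fall back to their own date via COALESCE.
query = """
    SELECT t.id, t.price, t.date
    FROM t
    WHERE t.id IN (1, 2, 4)
      AND t.date >= COALESCE((SELECT t2.date
                              FROM t t2
                              WHERE t2.id = t.id
                              ORDER BY t2.date DESC
                              LIMIT 1 OFFSET 1), t.date)
    ORDER BY t.id, t.date DESC, t.price
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data, id 1 comes back with three rows rather than two, because both 2021-04-26 rows satisfy the date comparison. If exactly two rows per id are required, the row_number() form is the safer choice.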

how to group two specific items together

I have a database with questions with columns Question, Answer, Type.
Currently, this is the sql statement I am running:
SELECT Question, Answer, Type FROM goodquestions ORDER BY RAND() LIMIT 0,20
As you can see, I select random rows from the table, and I would like to keep it that way. However, when Type is 12 I would like to fetch the table row just before that entry and print the two together.
Like this
RANDOM
RANDOM
RANDOM
Question before 12 type
12 type question
RANDOM
RANDOM
RANDOM
It can also be like this:
Question before 12 type
12 type question
RANDOM
RANDOM
RANDOM
RANDOM
RANDOM
RANDOM
I just need them to be together and I am unable to do this right now.
I think I see what you want.
Please try this query; I changed limit 0,20 to limit 0,10:
select @next_line_id:=0, @next_line_type:=0;
select g.*, tt.is_property
from
(select * from
(select *,
rand() as rand_val,
case
when @next_line_type= 12 or Type = 12 then 1
else 0
end is_property,
@next_line_id as next_line_id,
@next_line_id:=id as current_id,
@next_line_type:=Type as current_type
from goodquestions order by id desc
) t
where t.Type <> 12
order by rand_val limit 0,10) tt
join goodquestions g on g.id = tt.id or (g.id = tt.next_line_id and tt.is_property = 1 and tt.Type <> 12)
group by g.id, g.Question, g.Answer, g.Type, tt.is_property
order by is_property desc, id
limit 0, 10;
The following statements create the test table:
create table goodquestions (
id int unsigned auto_increment primary key,
Question varchar(255) not null,
Answer varchar(255) not null,
Type int unsigned,
index idx_type (Type)
) engine=innodb DEFAULT CHARSET=latin1;
insert into goodquestions (Question, Answer, Type)
values ('q1', 'a1', 1),
('q2', 'a2', 2),
('q3', 'a3', 3),
('q4', 'a4', 4),
('q5', 'a5', 5),
('q6', 'a6', 6),
('q7', 'a7', 7),
('q8', 'a8', 8),
('q9', 'a9', 9),
('q10', 'a10', 10),
('q11', 'a11', 11),
('q12', 'a12', 12),
('q13', 'a13', 13),
('q14', 'a14', 14),
('q15', 'a15', 15),
('q16', 'a16', 16),
('q17', 'a17', 17),
('q18', 'a18', 18);
Please note that ordering by rand() may perform badly on a large table. If there are performance issues, I could provide another solution with better performance.
The following query returns a result list that contains exactly one record of type 12:
select @total_type_12:=(select count(*) from goodquestions where Type=12);
select @random_type_12:=(floor(rand()*@total_type_12) + 1) * 2;
select @next_line_id:=0, @next_line_type:=0, @is_property:=0;
select g.*, tt.is_property
from
(select * from
(select *,
case
when (@next_line_type= 12 or Type = 12) and @random_type_12 > 0 and @random_type_12 <= 2 then @is_property:=1
else @is_property:=0
end is_property,
rand() as rand_val,
@random_type_12 as cur_random_type_counter,
case
when (@next_line_type= 12 or Type = 12) and @random_type_12 > 0 then @random_type_12:=@random_type_12-1
else @random_type_12
end as next_rand_type_counter,
@next_line_id as next_line_id,
@next_line_id:=id as current_id,
@next_line_type:=Type as current_type
from goodquestions order by id desc
) t
where t.Type <> 12
order by is_property desc, rand_val limit 0,10) tt
join goodquestions g on g.id = tt.id or (g.id = tt.next_line_id and tt.is_property = 1 and tt.Type <> 12)
group by g.id, g.Question, g.Answer, g.Type, tt.is_property
order by is_property desc, id
limit 0, 10;
The test data set is as follows:
mysql> select * from goodquestions;
+----+----------+--------+------+
| id | Question | Answer | Type |
+----+----------+--------+------+
| 1 | q1 | a1 | 1 |
| 2 | q2 | a2 | 2 |
| 3 | q3 | a3 | 3 |
| 4 | q4 | a4 | 4 |
| 5 | q5 | a5 | 5 |
| 6 | q6 | a6 | 6 |
| 7 | q7 | a7 | 7 |
| 8 | q8 | a8 | 8 |
| 9 | q9 | a9 | 9 |
| 10 | q10 | a10 | 10 |
| 11 | q11 | a11 | 11 |
| 12 | q12 | a12 | 12 |
| 13 | q13 | a13 | 13 |
| 14 | q14 | a14 | 14 |
| 15 | q15 | a15 | 15 |
| 16 | q16 | a16 | 16 |
| 17 | q17 | a17 | 17 |
| 18 | q18 | a18 | 18 |
| 19 | q21 | a21 | 12 |
| 20 | q22 | a22 | 22 |
| 21 | q23 | a23 | 12 |
+----+----------+--------+------+
21 rows in set (0.34 sec)
For MySQL 8+ it can be something similar to
WITH
-- SELECT 20 random rows
cte AS ( SELECT Question, Answer, Type
FROM goodquestions
ORDER BY RAND() LIMIT 0,20 )
( SELECT Question, Answer, Type
FROM cte )
-- add pre-row if Type=12 row is selected and pre-row is not selected
UNION DISTINCT
( SELECT Question, Answer, Type
FROM goodquestions
WHERE Type = 'pre-type for type 12'
AND EXISTS ( SELECT NULL
FROM cte
WHERE Type = 12 ) )
-- sort placing pre-row and type=12 row at the top
ORDER BY Type = 'pre-type for type 12' DESC,
Type = 12 DESC,
RAND()
-- remove excess row if Type=12 row was selected in CTE
-- and pre-row was not selected in CTE but added in UNION
LIMIT 0, 20
The query assumes that goodquestions.Type is unique.

Ranking for unique users

If I have three columns:
id, username, time
My data is:
+-------+------------------+-------------+
| id | username | time |
+-------+------------------+-------------+
| 1 | A | 1 min |
| 2 | A | 2 min |
| 3 | B | 3 min |
| 4 | B | 4 min |
+-------+------------------+-------------+
This query is working to get the ranking:
SELECT time,
FIND_IN_SET(MIN(time), (SELECT GROUP_CONCAT(time ORDER BY time ASC)
FROM table t1)) AS rank
FROM table t2
WHERE t2.username = 'B';
There is only one problem: it returns rank 3 for user B instead of 2.
So I tried to use GROUP BY t2.username and also DISTINCT t2.username, but neither worked.
How can I get the rank of user B? It should be 2 (not 3) because we have only 2 users.
E.g.:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL
,username CHAR(1) NOT NULL
,time TIME NOT NULL
);
INSERT INTO my_table VALUES
(1,'A','00:01:00'),
(2,'A','00:02:00'),
(3,'B','00:03:00'),
(4,'B','00:04:00');
SELECT * FROM my_table;
+----+----------+----------+
| id | username | time |
+----+----------+----------+
| 1 | A | 00:01:00 |
| 2 | A | 00:02:00 |
| 3 | B | 00:03:00 |
| 4 | B | 00:04:00 |
+----+----------+----------+
SELECT *
FROM
( SELECT username
, time
, @i:=@i+1 rank
FROM
( SELECT username
, MIN(time) time
FROM my_table
GROUP
BY username
) x
, (SELECT @i:=0) vars
ORDER
BY time
) n
WHERE username = 'B';
+----------+----------+------+
| username | time | rank |
+----------+----------+------+
| B | 00:03:00 | 2 |
+----------+----------+------+
I think this would work too, but it's slightly hacky, so I'm not sure...
SELECT x.*
, FIND_IN_SET(time,(SELECT GROUP_CONCAT(DISTINCT time ORDER BY time) FROM (SELECT MIN(time) time FROM my_table GROUP BY username) j )) rank
FROM my_table x HAVING rank <> 0 AND username = 'B';
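The logic behind both queries is the same two steps: reduce each user to their best (minimum) time first, then rank those per-user values. A plain-Python sketch of that idea follows; the helper name user_rank is hypothetical, times are given in seconds for simplicity, and ties between users are not handled specially.

```python
def user_rank(rows, username):
    """Rank a user by best time among distinct users (1 = fastest)."""
    best = {}
    for _id, user, seconds in rows:
        # Step 1: keep only the minimum time per user.
        best[user] = min(seconds, best.get(user, seconds))
    # Step 2: order users by that minimum time and look up the position.
    ordered = sorted(best, key=best.get)
    return ordered.index(username) + 1

# (id, username, time in seconds), mirroring the sample table.
rows = [(1, "A", 60), (2, "A", 120), (3, "B", 180), (4, "B", 240)]
print(user_rank(rows, "B"))  # prints 2
```

Ranking the raw rows instead of the per-user minimum is what produced rank 3 in the original query: B's best time is third among all four rows, but second among the two users.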

To find the last value in the dataset of 15 minutes interval

ID Timestamp Value
1 11:59:54 10
1 12:04:00 20
1 12:12:00 31
1 12:16:00 10
1 12:48:00 05
I want the result set as
ID Timestamp Value
1 11:59:54 10
1 12:00:00 10
1 12:04:00 20
1 12:12:00 31
1 12:15:00 31
1 12:16:00 10
1 12:30:00 10
1 12:45:00 10
1 12:48:00 05
More coffee will probably lead to a simpler solution, but consider the following...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,timestamp TIME
,value INT NOT NULL
);
INSERT INTO my_table VALUES
(1 ,'11:59:54',10),
(2 ,'12:04:00',20),
(3 ,'12:12:00',31),
(4 ,'12:16:00',10),
(5 ,'12:48:00',05);
... in addition, I have a table of integers, that looks like this:
SELECT * FROM ints;
+---+
| i |
+---+
| 0 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
+---+
So...
SELECT a.timestamp
, b.value
FROM
( SELECT x.*
, MIN(y.timestamp) min_timestamp
FROM
( SELECT timestamp
FROM my_table
UNION
SELECT SEC_TO_TIME((i2.i*10+i1.i)*900)
FROM ints i1
, ints i2
WHERE SEC_TO_TIME((i2.i*10+i1.i)*900)
BETWEEN (SELECT MIN(timestamp) FROM my_table)
AND (SELECT MAX(timestamp) FROM my_table)
ORDER
BY timestamp
) x
LEFT
JOIN my_table y
ON y.timestamp >= x.timestamp
GROUP
BY x.timestamp
) a
JOIN my_table b
ON b.timestamp = min_timestamp;
+-----------+-------+
| timestamp | value |
+-----------+-------+
| 11:59:54 | 10 |
| 12:00:00 | 20 |
| 12:04:00 | 20 |
| 12:12:00 | 31 |
| 12:15:00 | 10 |
| 12:16:00 | 10 |
| 12:30:00 | 5 |
| 12:45:00 | 5 |
| 12:48:00 | 5 |
+-----------+-------+
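Note that the result above pairs each timestamp with the next reading at or after it (12:00:00 shows 20), whereas the question asks for the last reading at or before it (12:00:00 should show 10). As a language-neutral illustration of the latter, here is a minimal Python sketch that merges the readings with a 15-minute grid and carries the most recent value forward. The function name fill_quarter_hours and the seconds-based representation are assumptions made for this example.

```python
def to_seconds(hhmmss):
    """Convert 'HH:MM:SS' to seconds since midnight."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s

def to_hhmmss(seconds):
    """Convert seconds since midnight back to 'HH:MM:SS'."""
    return f"{seconds // 3600:02d}:{seconds % 3600 // 60:02d}:{seconds % 60:02d}"

def fill_quarter_hours(readings, step=900):
    """Add a row at every `step`-second boundary between the first and last
    reading, carrying the most recent reading's value forward."""
    known = {to_seconds(ts): value for ts, value in readings}
    first, last = min(known), max(known)
    # First grid point at or after the first reading (ceiling division).
    start = -(-first // step) * step
    grid = range(start, last + 1, step)
    result, current = [], None
    for t in sorted(set(known) | set(grid)):
        if t in known:
            current = known[t]
        result.append((to_hhmmss(t), current))
    return result

readings = [("11:59:54", 10), ("12:04:00", 20), ("12:12:00", 31),
            ("12:16:00", 10), ("12:48:00", 5)]
filled = fill_quarter_hours(readings)
for row in filled:
    print(row)
```

In SQL terms this corresponds to joining each timestamp to the latest reading at or before it, rather than the earliest reading at or after it.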
The idea is as follows. Use SERIES_GENERATE() to generate the missing timestamps at 15-minute intervals and union them with the existing data in your table T. Now you would want to use LAST_VALUE with IGNORE NULLS. IGNORE NULLS is not implemented in HANA, so you need a workaround. I use COUNT() as a window function to count the non-null values. I do the same on the original data and then join both on the count. This way the last non-null value is repeated.
select X.ID, X.TIME, Y.VALUE from (
select ID, TIME, value,
count(VALUE) over (order by TIME rows between unbounded preceding and current row) as CNT
from (
--add the missing 15 minute interval timestamps
select 1 as ID, GENERATED_PERIOD_START as TIME, NULL as VALUE
from SERIES_GENERATE_TIME('INTERVAL 15 MINUTE', '12:00:00', '13:00:00')
union all
select ID, TIME, VALUE from T
)
) as X join (
select ID, TIME, value,
count(value) over (order by TIME rows between unbounded preceding and current row) as CNT
from T
) as Y on X.CNT = Y.CNT

mySQL Ranking (and draws)

Next weekend we're having a competition with 3 qualification rounds, a semifinal and a final. Only the best 15 participants can compete in the semifinal, and only the best 6 compete in the final.
In the qualifications you get a score from 0 to 100 for each qualification round.
I'm looking for a way to select the contestants for the semifinal. This should be based on (rank of qualification1) * (rank of qualification2) * (rank of qualification3)
So I need something like:
select id, name, ((.... as RANK_OF_SCORE_1) * (.. as RANK_OF_SCORE_2) * (... as RANK_OF_SCORE_3)) as qualification_score from participants order by qualification_score desc limit 15
but of course this is not valid MySQL.
Besides this problem, if two contestants have the same score, they should both be included in the semifinal even if this exceeds the maximum of 15.
For the final, we would like to select the best 6 of the semifinal scores. If 2 scores are the same, we would like to select based on the qualifications.
Option 1: use Postgres, which supports window functions (namely RANK() and DENSE_RANK())
SELECT user_id, score, rank() over (order by score desc) from scores;
Time : 0.0014 s
Option 2: use a self-join. The rank of a user with score X is 1 + the count(*) of users with a score greater than X. This is likely to be pretty slow.
CREATE TABLE scores( user_id INT PRIMARY KEY, score INT, KEY(score) );
INSERT INTO scores SELECT id, rand()*100 FROM serie LIMIT 1000;
SELECT a.user_id, a.score, 1+count(b.user_id) AS rank
FROM scores a
LEFT JOIN scores b ON (b.score>a.score)
GROUP BY user_id ORDER BY rank;
+---------+-------+------+
| user_id | score | rank |
+---------+-------+------+
| 381 | 100 | 1 |
| 777 | 100 | 1 |
| 586 | 100 | 1 |
| 907 | 100 | 1 |
| 790 | 100 | 1 |
| 253 | 99 | 6 |
| 393 | 99 | 6 |
| 429 | 99 | 6 |
| 376 | 99 | 6 |
| 857 | 99 | 6 |
| 293 | 99 | 6 |
| 156 | 99 | 6 |
| 167 | 98 | 13 |
| 594 | 98 | 13 |
| 690 | 98 | 13 |
| 510 | 98 | 13 |
| 436 | 98 | 13 |
| 671 | 98 | 13 |
time 0.7s
Option 3: number the rows in descending score order with a user variable, then take the lowest row number per score
SET @rownum = 0;
SELECT a.user_id, a.score, b.r FROM
scores a
JOIN (
SELECT score, min(r) AS r FROM (
SELECT user_id, score, @rownum:=@rownum+1 AS r
FROM scores ORDER BY score DESC
) foo GROUP BY score
) b USING (score)
ORDER BY r;
time : 0.0014 s
EDIT
SET @rownum1 = 0;
SET @rownum2 = 0;
SET @rownum3 = 0;
SELECT s.*, s1.r, s2.r, s3.r FROM
scores s
JOIN
(
SELECT score_1, min(r) AS r FROM (
SELECT score_1, @rownum1:=@rownum1+1 AS r
FROM scores ORDER BY score_1 DESC
) foo GROUP BY score_1
) s1 USING (score_1) JOIN (
SELECT score_2, min(r) AS r FROM (
SELECT score_2, @rownum2:=@rownum2+1 AS r
FROM scores ORDER BY score_2 DESC
) foo GROUP BY score_2
) s2 USING (score_2) JOIN (
SELECT score_3, min(r) AS r FROM (
SELECT score_3, @rownum3:=@rownum3+1 AS r
FROM scores ORDER BY score_3 DESC
) foo GROUP BY score_3
) s3 USING (score_3)
ORDER BY s1.r * s2.r * s3.r;