Different SQL writing methods cause different time costs - MySQL

I'm trying to select every n-th row from a MySQL table. I read this answer.
There is a table sys_request_log:
CREATE TABLE `sys_request_log`
(
`id` bigint(20) NOT NULL,
`user_id` bigint(20) DEFAULT NULL,
`ip` varchar(50) DEFAULT NULL,
`data` mediumtext,
`create_time` datetime DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
KEY `user_id` (`user_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
It contains 11837 rows.
I want to select every 5th row from the table. First I tried:
SELECT
*
FROM
(SELECT @ROW := @ROW + 1 AS rownum, log.* FROM ( SELECT @ROW := 0 ) r, sys_request_log log ) ranked
WHERE
rownum % 5 = 1
The result is:
rownum id user_id create_time
-------------------------------------------------------------------
1 1271446699071639552 1 2020-06-12 22:18:10
6 1271446948980854784 1 2020-06-12 22:19:10
11 1271447016878247936 1269884071484461056 2020-06-12 22:19:26
It takes 1.001 s.
The result contains an extra column, rownum, that I don't need, so I modified the SQL like this:
SELECT
log.*
FROM
(SELECT @ROW := @ROW + 1 AS rownum FROM (SELECT @ROW := 0) t) r,
sys_request_log log
WHERE
rownum % 5 = 1
Now the result is clean (no rownum), but it takes 2.516 s!
Why?
MySQL version: 5.7.26-log

In the first case, the row numbers are assigned while rows are read from sys_request_log, so each row gets its own number. In the second case, the derived table r is evaluated on its own and contains exactly one row (rownum = 1). The comma join (a CROSS JOIN) then pairs that single row with every row of the table, so every row carries rownum = 1, the condition rownum % 5 = 1 is true for all of them, and the query returns the whole table instead of every 5th row.
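A rough way to see the difference is to simulate both queries in plain Python. This is only a model of the evaluation order described above, not of how MySQL executes anything internally, and the list `table` is a hypothetical stand-in for the ids in sys_request_log.

```python
def numbered_while_reading(rows):
    """First query: the counter advances once per table row."""
    counter = 0
    result = []
    for row in rows:
        counter += 1              # @ROW := @ROW + 1, once per table row
        if counter % 5 == 1:      # WHERE rownum % 5 = 1
            result.append(row)
    return result


def cross_join_with_one_row(rows):
    """Second query: r is built first and holds a single row."""
    row_var = 0                   # SELECT @ROW := 0
    row_var += 1                  # @ROW := @ROW + 1, evaluated once for the one row of t
    r = [row_var]                 # derived table r: one row, rownum = 1
    # The comma join pairs that single rownum with every table row.
    joined = [(rownum, row) for rownum in r for row in rows]
    return [row for rownum, row in joined if rownum % 5 == 1]


table = list(range(1, 21))        # hypothetical: 20 ids
print(len(numbered_while_reading(table)))   # 4
print(len(cross_join_with_one_row(table)))  # 20
```

With 20 rows the first approach keeps 4 rows and the second keeps all 20, which is also why the second query moves far more data and takes longer.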

If I understand correctly, you can do what you want by moving the variable assignment to the where clause:
select srl.*
from sys_request_log srl cross join
(select @rn := 0) params
where (@rn := (@rn + 1)) % 5 = 1;
Note this happens to work in this case, because the query needs to do a full table scan and run the WHERE clause on each row. It might not work if the query has a JOIN, GROUP BY or ORDER BY.
Assigning user variables inside expressions like this is deprecated as of MySQL 8.0. You should upgrade and learn about window functions.
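On MySQL 8 the same "every 5th row" filter can be written with ROW_NUMBER(). The sketch below runs the equivalent query on SQLite (3.25 or later, which also supports window functions) through Python's standard sqlite3 module, against a hypothetical 12-row table that keeps only the id and user_id columns; it has not been run against MySQL itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sys_request_log (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany(
    "INSERT INTO sys_request_log (id, user_id) VALUES (?, ?)",
    [(i, i % 3) for i in range(1, 13)],  # made-up sample rows
)

# Number the rows in id order, then keep rows 1, 6, 11, ...
# Only the original columns are projected, so no rownum column leaks out.
query = """
    SELECT id, user_id
    FROM (
        SELECT log.*, ROW_NUMBER() OVER (ORDER BY id) AS rownum
        FROM sys_request_log AS log
    ) AS ranked
    WHERE rownum % 5 = 1
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1), (6, 0), (11, 2)]
```

Unlike the variable-based versions, the numbering here is defined by the OVER (ORDER BY id) clause rather than by the order in which MySQL happens to read rows.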

Your second query does not return the same result as the first one: it returns all rows, so it takes much more time than the first one.
To remove rownum from the first query, just list the columns you want in the outer SELECT clause.
Try this:
SELECT
ranked.id, ranked.user_id ,ranked.create_time
FROM
( SELECT @ROW := @ROW + 1 AS rownum, log.* FROM ( SELECT @ROW := 0 ) r, sys_request_log log ) ranked
WHERE
rownum % 5 = 1

Related

Auto incremental temporary column in conditional select statement

I am using the following select query to group and calculate a specific column:
SELECT `COUNTRY`, sum(`POINT`) as total_mark
FROM results
GROUP BY `COUNTRY`
ORDER by total_mark DESC
I would like to add a temporary column that numbers all rows incrementally. I used:
(@cnt := @cnt + 1) AS rowNumber
The new column gets created but returns a NULL value for all rows. Can anyone help me do the grouping and calculation first, and then number the rows it returns? I tried this:
SELECT `COUNTRY`, sum(`POINT`) as total_mark, (@cnt := @cnt + 1) AS rowNumber
FROM results
GROUP BY `COUNTRY`
ORDER by total_mark DESC
I still only get NULL values.
You have to initialize @cnt to 0.
You should also do the ordering in a subquery. Otherwise, the row numbers may be assigned before ordering.
SELECT country, total_mark, (@cnt := @cnt + 1) AS rowNumber
FROM (
SELECT `COUNTRY`, sum(`POINT`) as total_mark
FROM results
GROUP BY `COUNTRY`
ORDER by total_mark DESC
) AS x
CROSS JOIN (SELECT @cnt := 0) AS vars
Note that if you're using MySQL 8.x you can use the ROW_NUMBER() window function instead of a session variable.
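As a sketch of the ROW_NUMBER() alternative, here is the same "group first, then number" structure run on SQLite through Python's sqlite3 module (assuming SQLite 3.25+ for window functions). The rows in results are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (COUNTRY TEXT, POINT INTEGER)")
conn.executemany(
    "INSERT INTO results (COUNTRY, POINT) VALUES (?, ?)",
    [("FR", 10), ("FR", 5), ("DE", 20), ("IT", 3), ("IT", 4)],
)

# Group and sum in a derived table first, then number the grouped rows
# by total_mark; the numbering no longer depends on evaluation order.
query = """
    SELECT COUNTRY, total_mark,
           ROW_NUMBER() OVER (ORDER BY total_mark DESC) AS rowNumber
    FROM (
        SELECT COUNTRY, SUM(POINT) AS total_mark
        FROM results
        GROUP BY COUNTRY
    ) AS x
    ORDER BY total_mark DESC
"""
ranked = conn.execute(query).fetchall()
print(ranked)  # [('DE', 20, 1), ('FR', 15, 2), ('IT', 7, 3)]
```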

Efficient SQL query to find gap in consecutive numeric data (MySQL)

I have a table with column "time" (INT unsigned), every row represents one second and I need to find gaps in time (missing seconds).
I have tried with this query (to find the first time before a gap):
SELECT t1.time
FROM `table` AS t1
LEFT JOIN `table` AS t2 ON t2.time=(t1.time+1)
WHERE t2.time IS NULL
ORDER BY TIME ASC
LIMIT 1
It works, but it's too slow for big tables (nearly 100M rows).
Is there a faster solution?
SHOW CREATE:
CREATE TABLE `candles` (
`time` int(10) unsigned NOT NULL,
`open` float unsigned NOT NULL,
`high` float unsigned NOT NULL,
`low` float unsigned NOT NULL,
`close` float unsigned NOT NULL,
`vb` int(10) unsigned NOT NULL,
`vs` int(10) unsigned NOT NULL,
`trades` int(10) unsigned NOT NULL,
PRIMARY KEY (`time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
If the database version is 8.0, a recursive common table expression can be used, for example:
WITH RECURSIVE cte AS
(
SELECT 1 AS n
UNION ALL
SELECT n + 1 AS value
FROM cte
WHERE cte.n < (SELECT MAX(time) FROM tab )
)
SELECT n AS gaps
FROM cte
LEFT JOIN tab
ON n=time
WHERE cte.n > (SELECT MIN(time) FROM tab )
AND time IS NULL
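Here is a small, hypothetical run of the recursive-CTE idea on SQLite via Python's sqlite3 module. One deliberate change from the query above: the sequence starts at MIN(time) rather than 1, so the CTE only generates the range that actually matters (the bounds are computed once in a helper CTE). Also note that MySQL 8 aborts a recursive CTE after cte_max_recursion_depth iterations (1000 by default), so on a table spanning millions of seconds that setting would have to be raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (time INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tab (time) VALUES (?)",
                 [(t,) for t in (100, 101, 102, 105, 106, 109)])  # made-up data

# Generate every second between MIN(time) and MAX(time), then keep the
# generated values that have no matching row in tab.
query = """
    WITH RECURSIVE
    bounds(lo, hi) AS (SELECT MIN(time), MAX(time) FROM tab),
    cte(n) AS (
        SELECT lo FROM bounds
        UNION ALL
        SELECT n + 1 FROM cte, bounds WHERE n < hi
    )
    SELECT n AS gap
    FROM cte
    LEFT JOIN tab ON tab.time = cte.n
    WHERE tab.time IS NULL
    ORDER BY n
"""
gaps = [row[0] for row in conn.execute(query)]
print(gaps)  # [103, 104, 107, 108]
```

This lists every missing second, whereas the original LEFT JOIN query returns the last second before each gap; generating the full range is also why this approach gets expensive when the time span is large.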
In MySQL 5.7, this is a use case where user variables might be helpful:
select max(time)
from (
select t.time, @rn := @rn + 1 as rn
from (select time from mytable order by time) t
cross join (select @rn := 0) r
) t
group by time - rn
This addresses the question as a gaps-and-islands problem. The idea is to identify groups of records where time increments without gaps (the islands). For this, we assign an incrementing id to each row, ordered by time; whenever the difference between time and that incrementing id changes, you know there is a gap.
With MySQL 8, you can use LEAD():
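The time - rn trick can be sketched in plain Python (an illustration of the grouping idea, not of the MySQL query plan). Rows are numbered in time order, every value in a consecutive run shares the same time - rn, and the last value of each group corresponds to the max(time) that the query above returns. The sample times are made up.

```python
from itertools import groupby


def islands(times):
    """Return (first, last) for each run of consecutive values."""
    ordered = sorted(times)
    # Pair each time with its row number 1, 2, 3, ... in time order.
    numbered = [(t, rn) for rn, t in enumerate(ordered, start=1)]
    runs = []
    # Within a run without gaps, time - rn is constant, so it is the group key.
    for _, group in groupby(numbered, key=lambda pair: pair[0] - pair[1]):
        members = [t for t, _ in group]
        runs.append((members[0], members[-1]))
    return runs


times = [100, 101, 102, 105, 106, 109]
print(islands(times))                          # [(100, 102), (105, 106), (109, 109)]
print([last for _, last in islands(times)])    # [102, 106, 109]
```

Note that, like the SQL version, the last island's end (109 here) is reported too, even though no gap follows it.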
select time from (
select time, lead(time, 1) over (order by time) next_time
from `table`
) t
where time+1 != next_time
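A minimal check of the LEAD() version, run on SQLite through Python's sqlite3 module (SQLite 3.25+ assumed). The candles table reuses the name from the question but keeps only the time column, with made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE candles (time INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO candles (time) VALUES (?)",
                 [(t,) for t in (100, 101, 102, 105, 106, 109)])

# Compare each time with the next one; for the last row next_time is NULL,
# the comparison is NULL, and that row is filtered out.
query = """
    SELECT time
    FROM (
        SELECT time, LEAD(time, 1) OVER (ORDER BY time) AS next_time
        FROM candles
    ) AS t
    WHERE time + 1 != next_time
    ORDER BY time
"""
before_gap = [row[0] for row in conn.execute(query)]
print(before_gap)  # [102, 106]
```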
In earlier versions, I might do something like:
select prev_time as time from (
select @prev_time+0 as prev_time,if(@prev_time:=time,time,time) as time
from (select @prev_time:=null) initvars
cross join (select time from `table` order by time) t
) t
where time != prev_time+1
Neither will include the greatest time, whereas your original query would have.
I think the group by required to treat it as a strict gaps and islands problem would be too expensive with that many records.

What are the subquery equivalents of SQL aggregate functions MAX/MIN/AVG/COUNT

Can someone show me how to represent the following SQL statements without the use of aggregate functions?
SELECT COUNT(column) FROM table;
SELECT AVG(column) FROM table;
SELECT MAX(column) FROM table;
SELECT MIN(column) FROM table;
MIN() and MAX() can be done with simple subqueries:
select (select column from table order by column is not null desc, column asc limit 1) as "MIN",
(select column from table order by column is not null desc, column desc limit 1) as "MAX"
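To check the ORDER BY ... LIMIT 1 trick, including how `column is not null desc` keeps NULLs from being picked, here is a sketch on SQLite via Python's sqlite3 module. Because column and table are reserved words, the hypothetical names the_table and the_column are used instead, with made-up values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (the_column INTEGER)")
conn.executemany("INSERT INTO the_table (the_column) VALUES (?)",
                 [(7,), (None,), (3,), (12,), (None,)])

# "the_column IS NOT NULL DESC" sorts non-NULL values first,
# so LIMIT 1 never picks a NULL while a real value exists.
query = """
    SELECT
      (SELECT the_column FROM the_table
       ORDER BY the_column IS NOT NULL DESC, the_column ASC LIMIT 1) AS "MIN",
      (SELECT the_column FROM the_table
       ORDER BY the_column IS NOT NULL DESC, the_column DESC LIMIT 1) AS "MAX"
"""
emulated = conn.execute(query).fetchone()
builtin = conn.execute("SELECT MIN(the_column), MAX(the_column) FROM the_table").fetchone()
print(emulated, builtin)  # (3, 12) (3, 12)
```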
COUNT() and AVG() require the use of variables, if you don't allow any aggregations:
select rn as "COUNT", sumcol / rna as "AVG"
from (select t.*
from (select t.*,
(@rn := @rn + 1) as rn,
(@rna := @rna + if(column is not null, 1, 0)) as rna,
(@sum := @sum + coalesce(column, 0)) as sumcol
from table t cross join
(select @rn := 0, @rna := 0, @sum := 0) const
order by column
) t
order by rn desc
limit 1
) t
This latter formulation only works in MySQL.
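The variable-based query is essentially a single pass with three running accumulators. A plain Python fold makes the roles of rn, rna and sumcol explicit; None stands in for NULL, and this models the arithmetic only, not MySQL's evaluation order.

```python
def emulate_count_avg(values):
    """Walk the rows once, like the @rn / @rna / @sum variables.

    rn counts every row, rna counts only non-NULL values, and sumcol adds
    only non-NULL values, so the average is sumcol / rna.
    """
    rn = 0
    rna = 0
    sumcol = 0
    for value in values:                  # None plays the role of SQL NULL
        rn += 1
        rna += 1 if value is not None else 0
        sumcol += value if value is not None else 0
    avg = sumcol / rna if rna > 0 else None   # no non-NULL values -> NULL
    return rn, avg


print(emulate_count_avg([4, None, 8, 6]))  # (4, 6.0)
print(emulate_count_avg([]))               # (0, None)
```

Note that rn behaves like COUNT(*) (it includes the NULL row), while the average correctly divides by rna.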
EDIT:
The empty table is a challenge. Let's do this with a left outer join:
select cast(coalesce(rn, 0) as signed) as "COUNT",
(case when rna > 0 then sumcol / rna end) as "AVG"
from (select 1 as n
) n left outer join
(select t.*
from (select t.*,
(@rn := @rn + 1) as rn,
(@rna := @rna + if(column is not null, 1, 0)) as rna,
(@sum := @sum + coalesce(column, 0)) as sumcol
from table t cross join
(select @rn := 0, @rna := 0, @sum := 0) const
order by column
) t
order by rn desc
limit 1
) t
on n.n = 1;
Notes. This will return 0 for the count if the table is empty. That is correct. If the table is empty, it will return NULL for the average, and that is also correct.
If the table is not empty, but the values are all NULL, then it will also return NULL. The types for the count are always integers, so that should be ok. The type of the average is more problematic, but the variables will return some sort of generic numeric type, which seems compatible in spirit.
min/max can be replaced with something like this:
select t1.pk_column,
t1.some_column
from the_table t1
where t1.some_column < ALL (select t2.some_column
from the_table t2
where t1.pk_column <> t2.pk_column);
For getting the max you need to replace < with >. pk_column is the primary key column of the table and is needed to avoid comparing each row to itself (it doesn't have to be a PK; it only needs to be unique).
I don't think there is an alternative for count() or avg() (at least I can't think of one)
I used the_table and some_column because column and table are reserved words
SET @t1=0, @t2=0, @t3=0, @t4=0;
COUNT:
Select @t1:=@t1+1 as CNT from table
order by @t1:=@t1+1 DESC
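SQLite does not support ALL subqueries, so this sketch (Python's sqlite3 module, made-up data) swaps in the NOT EXISTS form of the same idea: a row is the minimum if no other row, identified by a different pk_column, has a value less than or equal to it. As with < ALL, ties produce no row; the two forms can differ when some_column contains NULLs, which is why the sample column is declared NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (pk_column INTEGER PRIMARY KEY, "
             "some_column INTEGER NOT NULL)")
conn.executemany("INSERT INTO the_table (pk_column, some_column) VALUES (?, ?)",
                 [(1, 40), (2, 15), (3, 27)])


def extreme(op):
    """op is '<=' for the minimum and '>=' for the maximum."""
    # Comparing t1.pk_column with t2.pk_column stops a row being compared with itself.
    return conn.execute(f"""
        SELECT t1.pk_column, t1.some_column
        FROM the_table t1
        WHERE NOT EXISTS (
            SELECT 1 FROM the_table t2
            WHERE t2.pk_column <> t1.pk_column
              AND t2.some_column {op} t1.some_column
        )
    """).fetchall()


minimum = extreme("<=")
maximum = extreme(">=")
print(minimum, maximum)  # [(2, 15)] [(1, 40)]
```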
LIMIT 1
Similar methods could be put together for Avg and max/min using limits...
Still thinking about Min/Max...
Not to supersede the excellent answer from Gordon Linoff, but there's a little more work involved to accurately emulate the AVG(), COUNT(), and SUM() functions. (The answer for the MIN and MAX functions in Gordon's answer are spot on.)
There's a corner case when the table is empty. In order to emulate the SQL aggregate functions, we need our query to return a single row. But at the same time, we need a test of whether or not the table contains at least one row.
Here's a query that is a more precise emulation:
-- create an empty table
CREATE TABLE `foo` (col INT);
-- TRUNCATE TABLE `foo`;
SELECT IF(s.ne IS NULL,0,s.rn) AS `COUNT(*)`
, IF(s.cc>0,s.tc,NULL) AS `SUM(col)`
, IF(s.cc>0,s.tc/s.cc,NULL) AS `AVG(col)`
FROM ( SELECT v.rn
, v.cc
, v.tc
, e.ne
FROM ( SELECT @rn := @rn + 1 AS rn
, @cc := @cc + (t.col IS NOT NULL) AS cc
, @tc := @tc + IFNULL(t.col,0) AS tc
FROM (SELECT @rn := 0, @cc := 0, @tc := 0) c
LEFT
JOIN `foo` t
ON 1=1
) v
LEFT
JOIN (SELECT 1 AS ne FROM `foo` z LIMIT 1) e
ON 1=1
ORDER BY v.rn DESC
LIMIT 1
) s
NOTES:
The purpose of the inline view aliased as e is to give us a way to determine whether or not the table contains any rows. If the table contains at least one row, we'll get a value of 1 returned as column ne (not empty). If the table is empty, that query won't return a row, and e.ne will be NULL, which is something we can test in the outer query.
In order to return a row, so we can return a value, like a 0 for a COUNT, we need to ensure that we return at least one row from the inline view v. Since we are guaranteed exactly one row from the inline view aliased as c (which initializes our user-defined variables), we'll use that as the "driving" table for a LEFT [OUTER] JOIN operation.
But if the table is empty, our row counter (@rn) coming out of v is going to have a value of 1. We'll deal with that: we have e.ne, which we can check to know whether the count should really be returned as 0.
In order to calculate the average, we can't divide by the row counter; we have to divide by the number of rows where col was not null. We use the @cc user-defined variable to keep track of the count of those rows.
Similarly, for the SUM (and the average) we need to accumulate only the non-NULL values. (If we were to add a NULL, it would turn the whole total to NULL, basically wiping out our accumulation.) So we do a conditional test (IFNULL) to check whether t.col IS NULL, to avoid accidentally wiping out the accumulation. Our accumulator is going to be 0 if there aren't any non-NULL rows, but that's not a problem, because we check @cc to see whether any rows were included. We need to check it anyway, to avoid a "divide by zero" issue.
To test, run against the empty table foo. It will return a count of 0, and NULL for SUM and AVG, equivalent to the result we get from:
SELECT COUNT(*), SUM(col), AVG(col) FROM foo;
We can also test the query against a table containing only NULL values for col:
INSERT INTO `foo` (col) VALUES (NULL);
As well as some non-NULL values:
INSERT INTO `foo` (col) VALUES (2),(3),(5),(7),(11),(13),(17),(19);
And compare the results of the two queries.
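For the comparison step, these are the reference values the emulation should reproduce for the three test states described above, computed with the built-in aggregates on SQLite via Python's sqlite3 module. MySQL returns AVG of an integer column as a DECIMAL rather than a float, so the display may differ slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col INT)")
reference = "SELECT COUNT(*), SUM(col), AVG(col) FROM foo"

# 1) Empty table: COUNT is 0, SUM and AVG are NULL.
empty = conn.execute(reference).fetchone()

# 2) Only a NULL value: one row counted, SUM and AVG still NULL.
conn.execute("INSERT INTO foo (col) VALUES (NULL)")
only_null = conn.execute(reference).fetchone()

# 3) Add the non-NULL values from the example above.
conn.executemany("INSERT INTO foo (col) VALUES (?)",
                 [(v,) for v in (2, 3, 5, 7, 11, 13, 17, 19)])
with_values = conn.execute(reference).fetchone()

print(empty)        # (0, None, None)
print(only_null)    # (1, None, None)
print(with_values)  # (9, 77, 9.625)
```

In the third state AVG divides 77 by the 8 non-NULL values, not by the 9 rows, which is exactly what the @cc counter is for.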
This is essentially the same as the answer from Gordon Linoff, with just a little more precision to work around the corner cases of NULL values and the empty table.

Order by multiple conditions

I'm very new to this, and this became ungoogleable (is that a word?)
the rank is by time but..
time done with ( A=0 ) AND ( B=0 ) beat everyone
time done with ( A=0 ) AND ( B=1 ) beat everyone with ( A=1 )
time done with ( A=1 ) AND ( B=0 ) beat everyone with ( A=1 + B=1 )
rank example (track=desert)
pos--car------time---A----B
1.---yellow----90----No---No
2.---red-------95----No---No
3.---grey-----78-----No---Yes
4.---orange--253---No---Yes
5.---black----86----Yes---No
6.---white----149---Yes---No
7.---pink-----59----Yes---Yes
8.---blue-----61----Yes---Yes
To make it even worse, the table accepts multiple records for the same car.
Here are the entries:
create table `rank`
(
`id` int not null auto_increment,
`track` varchar(25) not null,
`car` varchar(32) not null,
`time` int not null,
`a` boolean not null,
`b` boolean not null,
primary key (`id`)
);
insert into rank (track,car,time,a,b) values
('desert','red','95','0','0'),
('desert','yellow','89','0','1'),
('desert','yellow','108','0','0'),
('desert','red','57','1','1'),
('desert','orange','120','1','0'),
('desert','grey','85','0','1'),
('desert','grey','64','1','0'),
('desert','yellow','90','0','0'),
('desert','white','92','1','1'),
('desert','orange','253','0','1'),
('desert','black','86','1','0'),
('desert','yellow','94','0','1'),
('desert','white','149','1','0'),
('desert','pink','59','1','1'),
('desert','grey','78','0','1'),
('desert','blue','61','1','1'),
('desert','pink','73','1','1');
please, help? :p
ps: sorry about the example table
To prioritize a, then b, then time, use order by a, b, time.
You can use a not exists subquery to select only the best row per car.
Finally, you can add a Pos column using MySQL's variables, like @rn := @rn + 1.
Example query:
select @rn := @rn + 1 as pos
, r.*
from rank r
join (select @rn := 0) init
where not exists
(
select *
from rank r2
where r.car = r2.car
and (
r2.a < r.a
or (r2.a = r.a and r2.b < r.b)
or (r2.a = r.a and r2.b = r.b and r2.time < r.time)
)
)
order by
a
, b
, time
See it working at SQL Fiddle.
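On a database with window functions, the best-row-per-car step can also be done with ROW_NUMBER() partitioned by car instead of the NOT EXISTS subquery. This is a swapped-in technique, sketched on SQLite via Python's sqlite3 module with the entries from the question (booleans stored as 0/1); it orders by a, b, time and should reproduce the ranking the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "rank" (id INTEGER PRIMARY KEY, track TEXT, car TEXT, '
             'time INTEGER, a INTEGER, b INTEGER)')
# (car, time, a, b) for track 'desert', copied from the question
entries = [
    ("red", 95, 0, 0), ("yellow", 89, 0, 1), ("yellow", 108, 0, 0), ("red", 57, 1, 1),
    ("orange", 120, 1, 0), ("grey", 85, 0, 1), ("grey", 64, 1, 0), ("yellow", 90, 0, 0),
    ("white", 92, 1, 1), ("orange", 253, 0, 1), ("black", 86, 1, 0), ("yellow", 94, 0, 1),
    ("white", 149, 1, 0), ("pink", 59, 1, 1), ("grey", 78, 0, 1), ("blue", 61, 1, 1),
    ("pink", 73, 1, 1),
]
conn.executemany('INSERT INTO "rank" (track, car, time, a, b) VALUES (?, ?, ?, ?, ?)',
                 [("desert",) + e for e in entries])

query = """
    WITH best AS (
        -- car_rn = 1 marks each car's best run: lowest a, then lowest b, then fastest time
        SELECT car, time, a, b,
               ROW_NUMBER() OVER (PARTITION BY car ORDER BY a, b, time) AS car_rn
        FROM "rank"
        WHERE track = 'desert'
    )
    SELECT ROW_NUMBER() OVER (ORDER BY a, b, time) AS pos, car, time, a, b
    FROM best
    WHERE car_rn = 1
    ORDER BY pos
"""
ranking = [(pos, car) for pos, car, _, _, _ in conn.execute(query)]
print(ranking)
# [(1, 'yellow'), (2, 'red'), (3, 'grey'), (4, 'orange'),
#  (5, 'black'), (6, 'white'), (7, 'pink'), (8, 'blue')]
```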

getting the ranking of the rows in mysql ORDER BY statements

suppose I have
SELECT * FROM t ORDER BY j
is there a way to specify the query to also return an autoincremented column that goes along with the results and gives the rank of that row in terms of the ordering?
also this column should also work when using ranged LIMITs, eg
SELECT * FROM t ORDER BY j LIMIT 10,20
should have the autoincremented column return 11,12,13,14 etc....
Oracle, MSSQL, etc. support ranking functions that do exactly what you want; unfortunately, MySQL (before 8.0, which added ROW_NUMBER()) has some catching up to do in this regard.
The closest I've ever been able to get to approximating ROW_NUMBER() OVER() in MySQL is like this:
SELECT t.*,
@rank := @rank + 1 AS `rank`
FROM t, (SELECT @rank := 0) r
ORDER BY j
I don't know how that would rank using ranged LIMIT unless you used that in a subquery perhaps (although performance may suffer with large datasets)
SELECT t2.*
FROM (
SELECT t.*,
@rank := @rank + 1 AS `rank`
FROM t, (SELECT @rank := 0) r
ORDER BY j
) t2
ORDER BY t2.`rank`
LIMIT 10,20
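With window functions (MySQL 8.0+, or SQLite 3.25+ as used here through Python's sqlite3 module), the rank can be computed in a subquery and the ranged LIMIT applied outside it. The table t and its 40 rows are hypothetical; j is a permutation of 1..40, so the expected ranks are easy to check. The column is called rnk because RANK is a reserved word in MySQL 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, j INTEGER)")
# 7 * i mod 41 visits every value 1..40 exactly once, in scrambled id order.
conn.executemany("INSERT INTO t (id, j) VALUES (?, ?)",
                 [(i, (i * 7) % 41) for i in range(1, 41)])

# Rank over the whole table first, then page through the ranked rows.
query = """
    SELECT id, j, rnk
    FROM (
        SELECT id, j, ROW_NUMBER() OVER (ORDER BY j) AS rnk
        FROM t
    ) AS ranked
    ORDER BY rnk
    LIMIT 20 OFFSET 10
"""
page = conn.execute(query).fetchall()
print([row[2] for row in page][:4])  # [11, 12, 13, 14]
```

LIMIT 20 OFFSET 10 is the same as MySQL's LIMIT 10,20, so the returned ranks run from 11 to 30, as the question wants.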
The other option would be to create a temporary table,
CREATE TEMPORARY TABLE myRank
(
`rank` INT(11) NOT NULL AUTO_INCREMENT,
`id` INT(11) NOT NULL,
PRIMARY KEY (`rank`),
KEY (`id`)
);
INSERT INTO myRank (id)
SELECT T.id
FROM T
ORDER BY j;
SELECT T.*, R.`rank`
FROM T
INNER JOIN myRank R
ON T.id = R.id
ORDER BY R.`rank`
LIMIT 10,20;
Of course, the temporary table would need to be persisted between calls.
I wish there was a better way, but without ROW_NUMBER() you must resort to some hackery to get the behavior you want.