Mysql find id gaps in temporary table - mysql

I need to find id gaps in a integer field of a temporary table.
Due to a temporary table limitation I cannot open the table twice and join it with itself.
This is the example:
CREATE TEMPORARY TABLE tmp_table (
id int(8) unsigned,
PRIMARY KEY (id)
);
INSERT INTO tmp_table VALUES (1),(2),(4),(7),(10),(11),(13);
and here I want to find this resultset:
3, 5, [6], 8, [9], 12
[#]=optional

A solution that works, but might be a better one.
This generates a range of numbers (so you will need to modify it for larger ranges - this copes up to 1000), and cross joins that range against the temp table. It finds the max and min ids from the temp table that are greater than or equal to or less or equal to (respectively) than generated value. Then in the HAVING clause check that the greater / less vales are not equal and also not null
SELECT 1 + units.i + tens.i * 10 + hundreds.i * 100 AS poss_num,
MAX(IF(tmp_table.id <= 1 + units.i + tens.i * 10 + hundreds.i * 100, tmp_table.id, NULL)) AS max_id,
MIN(IF(tmp_table.id >= 1 + units.i + tens.i * 10 + hundreds.i * 100, tmp_table.id, NULL)) AS min_id
FROM
(
SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) units
CROSS JOIN
(
SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) tens
CROSS JOIN
(
SELECT 0 AS i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
) hundreds
CROSS JOIN tmp_table
GROUP BY poss_num
HAVING min_id != max_id
AND min_id IS NOT NULL
AND max_id IS NOT NULL

Related

Insert unique random number with recursive MySQL

I want to populate a table with unique random numbers without using a procedure.
I've tried using this reply to do it but not success.
What I'm trying to do is something like this but validating that numbers are not repeated:
INSERT into table (row1,row2)
WITH RECURSIVE
cte AS ( SELECT 0 num, LPAD(FLOOR(RAND() * 99999999),8,0) random_num
UNION ALL
SELECT num+1, LPAD(FLOOR(RAND() * 99999999),8,0) random_num
FROM cte WHERE num < 100000-1)
SELECT random_num, null
FROM cte;
In the example above, I am able to generate random values and insert them but without validating that the numbers are not repeated.
I have tried to do this:
INSERT into table (row1,row2)
WITH RECURSIVE
cte AS ( SELECT 0 num, LPAD(FLOOR(RAND() * 99999999),8,0) random_num
UNION ALL
SELECT num+1, LPAD(FLOOR(RAND() * 99999999),8,0) random_num
FROM cte WHERE num < 100000-1 AND random_num NOT IN (SELECT random_num FROM cte WHERE random_num IS NOT NULL))
SELECT random_num, null
FROM cte;
but the condition AND random_num NOT IN (SELECT random_num FROM cte WHERE random_num IS NOT NULL) in the where case, causes an SQL Error [4008] [HY000]: Restrictions imposed on recursive definitions are violated for table 'cte'
Any suggestions of how to do it? thank you!.
This could be an option. Generate all possible values, sort randomly and take desired number of entries.
CREATE TABLE random_data (
row1 INT PRIMARY KEY AUTO_INCREMENT,
row2 VARCHAR(10) NOT NULL,
UNIQUE KEY _Idx1 ( row2 )
);
INSERT INTO random_data (row2)
SELECT LPAD(num, 8, 0)
FROM (
SELECT h * 10000000 + g * 1000000 + f * 100000 + e * 10000 + d * 1000 + c * 100 + b * 10 + a AS num
FROM (SELECT 0 a UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) ta
JOIN (SELECT 0 b UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) tb
JOIN (SELECT 0 c UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) tc
JOIN (SELECT 0 d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) td
JOIN (SELECT 0 e UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) te
JOIN (SELECT 0 f UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) tf
JOIN (SELECT 0 g UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) tg
JOIN (SELECT 0 h UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) th
) n
ORDER BY RAND()
LIMIT 100000;
If you have a table - any table - with e.g. 100 rows then you can generate million distinct random numbers between 0 and 99999999 as follows:
select distinct floor(rand() * 100000000)
from t as t0, t as t1, t as t2
limit 1000000
Note that because of distinct you will need to generate a bigger number of rows so that you get desired number of rows after distinct.

How to fill missing values in aggregate-by-time function

I have function (from this question) which groups values by every 5 minutes and calculate min/avg/max:
SELECT (FLOOR(clock / 300) * 300) as period_start,
MIN(value), AVG(value), MAX(value)
FROM data
WHERE clock BETWEEN 1200000000 AND 1200001200
GROUP BY FLOOR(clock / 300);
However, due to missing values, some five-minute periods are skipped, making the timeline inconsistent. How to make it so that in the absence of data for a certain period, the value of max / avg / min becomes 0, instead of being skipped?
For example:
If I have timestamp - value
1200000001 - 100
1200000002 - 300
1200000301 - 100
1200000601 - 300
I want to get this: (select min/avg/max, time between 1200000000 and 1200001200)
1200000000 - 100/200/300
1200000300 - 100/100/100
1200000600 - 300/300/300
1200000900 - 0/0/0
Instead of this: (time between 1200000000 and 1200001200)
1200000000 - 100/200/300
1200000300 - 100/100/100
1200000600 - 300/300/300
1200000900 - THIS LINE WILL NOT BE, I will only get 3 lines above. No data between 1200000900 and 1200001200 for calculation.
My Answer:
Generate first table with required time range, and then left join this generated table on query with common group by operator. Such like this:
select * from
(select UNIX_TIMESTAMP(gen_date) as unix_date from
(select adddate('1970-01-01',t4*10000 + t3*1000 + t2*100 + t1*10 + t0) gen_date from
(select 0 t0 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 t1 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 t2 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 t3 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3,
(select 0 t4 union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t4) v
where gen_date between '2017-01-01' and '2017-12-31') date_range_table
left join (
SELECT (FLOOR(clock / 300) * 300) as period_start,
MIN(value), AVG(value), MAX(value)
FROM table
WHERE clock BETWEEN 1483218000 AND 1514667600
GROUP BY FLOOR(clock / 300)) data_table
on date_range_table.unix_date = data_table.period_start;
Use recursive CTE (available in MariaDB starting from 10.2.2) and generate base calendar table:
WITH RECURSIVE
cte AS ( SELECT #timestart timestart, #timestart + 300 timeend
UNION ALL
SELECT timestart + 300, timeend + 300 FROM cte WHERE timeend < #timeend)
SELECT cte.timestart,
COALESCE(MIN(value), 0) min_value,
COALESCE(AVG(value), 0) avg_value,
COALESCE(MAX(value), 0) max_value
FROM cte
LEFT JOIN example ON example.clock >= cte.timestart
AND example.clock < cte.timeend
GROUP BY cte.timestart;
https://dbfiddle.uk/?rdbms=mariadb_10.3&fiddle=f5c41b7596d56f1d7babe075f19302ec
I am not very sure but here's a link which can solve your problem
https://www.sqlservercurry.com/2009/06/find-missing-identity-numbers-in-sql.html
You can try this one;
with seq as (
select
(step-1)* 300 + (select (FLOOR(min(clock) / 300) * 300) from data) as step
from
(select row_number() over() as step from data) tmp
where
tmp.step-1 < (select(max(clock)-min(clock))/ 300 from data))
SELECT seq.step as period_start, MIN(value), AVG(value), MAX(value)
FROM seq left join data on (seq.step=(FLOOR(clock / 300) * 300))
WHERE clock BETWEEN 1622667600 AND 1625259600
GROUP BY period_start
Alternative answer is generate first table with required time range, and then left join this generated table on query with common group by operator.

How can I update a column with a value from a list of values for a set of records

I do an update in a table row by row:
UPDATE table
SET col = $value
WHERE id = $id
Now if I update e.g. 10000 records each record gets the $value but it does not really matter which $id gets which $value. The only requirement I have is that all the records I am updating end up with a $value.
So how could I convert this update to something like
UPDATE table
SET col ?????? what here from a $value_list???
WHERE id IN ($id_list)
I.e. pass the list ids and somehow the values and that range of ids get a value
Let's assume you've got two comma separated lists of your ids and your values with the same count of items. Then you could do your update with statements like those:
-- the list of the ids
SET #ids = '2,4,5,6';
-- the list of the values
SET #vals = '17, 73,55, 12';
UPDATE yourtable
INNER JOIN (
SELECT
SUBSTRING_INDEX(SUBSTRING_INDEX(t.ids, ',', n.n), ',', -1) id,
SUBSTRING_INDEX(SUBSTRING_INDEX(t.vals, ',', n.n), ',', -1) val
FROM (SELECT #ids as ids, #vals as vals) t
CROSS JOIN (
-- build for up to 1000 separated values
SELECT
1 + a.N + b.N * 10 + c.N * 100 AS n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
ORDER BY n
) n
WHERE n <= (1 + LENGTH(t.ids) - LENGTH(REPLACE(t.ids, ',', '')))
) t1
ON
yourtable.id = t1.id
SET
yourtable.val = t1.val;
Explanation
The inner series of UNIONs builds a table with the numbers from 1 to 1000. You should be able to expand this mechanism to your needs:
-- build for up to 1000 separated values
SELECT
a.N + b.N * 10 + c.N * 100 + 1 AS n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) c
ORDER BY n
We use this numbers to get the items out of our lists with the nested SUBSTRING_INDEX call
SUBSTRING_INDEX(SUBSTRING_INDEX(t.ids, ',', n.n), ',', -1) id,
SUBSTRING_INDEX(SUBSTRING_INDEX(t.vals, ',', n.n), ',', -1) val
The WHERE clause get the number of items in (ok only one of the two) lists:
WHERE n <= (1 + LENGTH(t.ids) - LENGTH(REPLACE(t.ids, ',', '')))
Because we've got one occurence of the separator less, we add 1 to the difference in length of the list with the separator and the length of the list without all occurrences of the separator.
Then we do the UPDATE with a JOIN operation on the id values in the outer UPDATE statement.
See it working in this fiddle.
Believe me: This is much faster than agonizing row-by-row update.

Rewrite to not include subquery in FROM?

A great fellow helped me with developing the following statement. However, in mySQL - I cannot save a view with a subquery in the FROM clause. Any suggestions o nhow to rewrite this so that it can be saved into a mySQL server?
SELECT t.idPatternMetadata, SUBSTRING_INDEX(SUBSTRING_INDEX(t.sKeywords, ',', n.n), ',', -1) color , count(*) as counts
FROM tblPatternMetadata t CROSS JOIN
(
SELECT a.N + b.N * 10 + 1 n
FROM
(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) a
,(SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) b
ORDER BY n
) n
WHERE n.n <= 1 + (LENGTH(t.sKeywords) - LENGTH(REPLACE(t.sKeywords, ',', '')))
group by color
THANKS in advance!
One option is to create a table that contains the 100 integer values, and reference that table in the query.
CREATE TABLE n (n INT UNSIGNED PRIMARY KEY);
INSERT INTO n (n)
SELECT a.n + b.n * 10 + 1 n
FROM ( SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) a
CROSS
JOIN ( SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6
UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) b
ORDER BY 1;
Then rewrite your query to reference the table in place of the inline view:
SELECT t.idPatternMetadata, SUBSTRING_INDEX(SUBSTRING_INDEX(t.sKeywords, ',', n.n), ',', -1) AS color
, count(*) AS counts
FROM tblPatternMetadata t
JOIN n
ON n.n <= 1 + (LENGTH(t.sKeywords) - LENGTH(REPLACE(t.sKeywords, ',', '')))
GROUP BY color

MySQL infinite loop in a subquery?

The following query probably results an infinite loop:
SELECT
*,
(SELECT
t2.`value`
FROM
`table` t2
WHERE
t2.`variable` = 'xxx'
AND t2.`read` = (SELECT
MAX(t1.`read`)
FROM
`table` t1
WHERE
t1.`variable` = 'xxx'
AND UNIX_TIMESTAMP(t1.`read`) < (1401801648 - n.integers)
)
)
FROM
(SELECT
#N:=#N + 1 AS integers
FROM
mysql.help_relation, (SELECT #N:=0) dum
LIMIT 48) n
I need a result with 48 rows for 48 different time ranges (In this example 1401801648 minus {1..48}). Each row should contain a value depending on the current time range. The query on the bottom is for these 48 ranges.
The query in the middle is needed to find the date for the newest entry which is older than the calculated timestamp (1401801648 - n.integers). The upper query tells me the value of the row with the date from the query in the middle.
When the "n.integers" is replaced by a number everything works fine.
Without the subquery (t2) the query is not in a loop(?):
SELECT
*,
(SELECT
MAX(t1.`read`)
FROM
`table` t1
WHERE
t1.`variable` = 'xxx'
AND UNIX_TIMESTAMP(t1.`read`) < (1401801648 - n.integers)
)
FROM
(SELECT
#N:=#N + 1 AS integers
FROM
mysql.help_relation, (SELECT #N:=0) dum
LIMIT 48) AS n
An alternative method avoiding using variables:-
SELECT sub1.a_cnt, t2.value
FROM table t2
INNER JOIN
(
SELECT sub1.a_timestamp, sub1.a_cnt, t1.variable, MAX(t1.read) AS max_timestamp
FROM
(
SELECT (1401801648 - units.i + 10 * tens.i) AS a_cnt,(1401801648 - units.i + 10 * tens.i) AS a_timestamp
FROM
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) units,
(SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) tens
WHERE units.i + 10 * tens.i BETWEEN 1 AND 48
) sub1
INNER JOIN table t1
AND UNIX_TIMESTAMP(t1.read) < sub1.a_timestamp
WHERE t1.variable = 'xxx'
GROUP BY sub1.a_timestamp
) sub2
ON t2.read = sub2.max_timestamp
AND t2.variable = sub2.variable
This uses a load of unioned queries getting constants to generate the numbers 0 to 9, cross joins that against another copy of itself and does a minor calulation to get all the numbers from 0 to 99, with a WHERE clause to narrow it down to the range 1 to 48, and uses this to calculate the timestamps required.
This is then joined against your table to get the max read date for each timestamp / generated number.
The results of this are then joined back against your table to get the other details from that row (in this case your value column).
Not tested it but hopefully it gives you an idea.