How can we find gaps in sequential numbering in MySQL? - mysql

We have a database with a table whose values were imported from another system. There is an auto-increment column, and there aren’t any duplicate values, but there are missing values. For example, running this query:
select count(id) from arrc_vouchers where id between 1 and 100
should return 100, but it returns 87 instead. Is there a query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there aren’t any records with ids 71-82. I want to return 71, 72, 73, etc.
Is this possible?

A better answer
JustPlainMJS provided a much better answer in terms of performance.
The (not as fast as possible) answer
Here's a version that works on a table of any size (not just on 100 rows):
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
gap_starts_at - first id in current gap
gap_ends_at - last id in current gap

This just worked for me to find the gaps in a table with more than 80k rows:
SELECT
CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
SELECT
@rownum:=@rownum+1 AS expected,
IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
FROM
(SELECT @rownum:=0) AS a
JOIN YourTable
ORDER BY YourCol
) AS z
WHERE z.got!=0;
Result:
+------------------+
| missing |
+------------------+
| 1 thru 99 |
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)
Note that the order of columns expected and got is critical.
If you know that YourCol doesn't start at 1 and that doesn't matter, you can replace
(SELECT @rownum:=0) AS a
with
(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a
New result:
+------------------+
| missing |
+------------------+
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)
If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in Bash.
SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(seq ',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM ( SELECT @rownum:=@rownum+1 AS expected, IF(@rownum=height, 0, @rownum:=height) AS got FROM (SELECT @rownum:=0) AS a JOIN block ORDER BY height ) AS z WHERE z.got!=0;
This produces an output like so
$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)
You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID
for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
echo $ID
# Fill the gaps
done
It's the same thing as above, only both readable and executable. By changing the CONCAT expression above, the syntax can be generated for other programming languages, or maybe even SQL.
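For instance, here is a hedged sketch (my own variation, not part of the original answer, reusing the same @rownum technique and the YourTable/YourCol placeholders) that emits a SQL WHERE clause covering the gaps instead of a Bash expression:
-- Sketch: build an OR'ed list of BETWEEN predicates, one per gap
SELECT GROUP_CONCAT(
         IF(z.got-1 > z.expected,
            CONCAT('(YourCol BETWEEN ', z.expected, ' AND ', z.got-1, ')'),
            CONCAT('(YourCol = ', z.expected, ')'))
         SEPARATOR ' OR ') AS missing_where
FROM ( SELECT @rownum:=@rownum+1 AS expected,
              IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
       FROM (SELECT @rownum:=0) AS a
       JOIN YourTable
       ORDER BY YourCol ) AS z
WHERE z.got != 0;
The result can be pasted straight into another statement's WHERE clause; watch group_concat_max_len if the gap list is long.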

A quick-and-dirty query that should do the trick:
SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM
(
SELECT a1.id AS a , MIN(a2.id) AS b
FROM arrc_vouchers AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab
WHERE
b > a + 1
This will give you a table showing each id that has missing ids above it, the next id that does exist, and how many ids are missing in between. E.g.,
id next_id missing_inbetween
1 4 2
68 70 1
75 87 11

If you are using a MariaDB database, you have a faster (800%) option using the sequence storage engine:
SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);
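If NOT IN proves slow on a large table, a hedged variant of the same idea (still assuming MariaDB's sequence engine; the seq_1_to_1000000 bound is arbitrary and just needs to cover your ids) is an anti-join restricted to the table's actual id range:
-- Sketch: anti-join against a sequence table, limited to the table's real id range
SELECT s.seq AS missing_id
FROM seq_1_to_1000000 s
LEFT JOIN arrc_vouchers v ON v.id = s.seq
WHERE s.seq BETWEEN (SELECT MIN(id) FROM arrc_vouchers)
               AND (SELECT MAX(id) FROM arrc_vouchers)
  AND v.id IS NULL;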

If the sequence has gaps of at most one between two numbers (like 1,3,5,6), then you can use this query:
select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
table_name - source1
column_name - id

An alternative solution that requires a query + some code doing some processing would be:
select l.id lValue, c.id cValue, r.id rValue
from
arrc_vouchers l
right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
left join arrc_vouchers r on r.id=c.id+1
where 1=1
and c.id > 0
and (l.id is null or r.id is null)
order by c.id asc;
Note that the query does not contain any subselects, which we know MySQL's planner does not handle well.
That will return one entry per central value (cValue) that does not have an immediately smaller value (lValue) or an immediately greater value (rValue), i.e.:
lValue |cValue|rValue
-------+------+-------
{null} | 2 | 3
8 | 9 | {null}
{null} | 22 | 23
23 | 24 | {null}
{null} | 29 | {null}
{null} | 33 | {null}
Without going into further details (we'll see them in the next paragraphs), this output means that:
No values between 0 and 2
No values between 9 and 22
No values between 24 and 29
No values between 29 and 33
No values between 33 and MAX VALUE
So the basic idea is to do RIGHT and LEFT joins with the same table, checking whether each value has adjacent values (i.e., if the central value is '3', we check for 3-1=2 on the left and 3+1=4 on the right); when a row has a NULL value on the RIGHT or LEFT, we know there is no adjacent value.
The complete raw output of my table is:
select * from arrc_vouchers order by id asc;
0
2
3
4
5
6
7
8
9
22
23
24
29
33
Some notes:
The SQL IF statement in the join condition is needed if you define the 'id' field as UNSIGNED, since that will not allow you to decrement it below zero. This is not strictly necessary if you keep the c.id > 0 filter mentioned in the next note, but I'm including it as documentation.
I'm filtering out the zero central value because we are not interested in any previous value, and we can derive the following value from the next row.

I tried it in a different manner, and the best performance that I found was this simple query:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
... one left join to check whether the next id exists; only if the next id is not found does the subquery look up the next existing id to find the end of the gap. I did it this way because a query using equals (=) performs better than one using the greater-than (>) operator.
Using SQL Fiddle it does not show much of a performance difference compared to the other queries, but against a real database the query above runs about 3 times faster than the others.
The schema:
CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;
Below are all the queries that I wrote to compare the performance:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
select *, (gapEnd-gapIni) qt
from (
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
from arrc_vouchers a
order by id
) a where gapEnd <> gapIni
;
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
#,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
from arrc_vouchers a
order by id
;
You can see and test my query using this SQL Fiddle:
http://sqlfiddle.com/#!9/6bdca7/1

It is probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e., next available number), and this seems to work fine.
SELECT MIN(l.number_sequence + 1) as nextavabile
from patients as l
LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence
WHERE r.number_sequence is NULL
Several other scenarios and solutions are discussed there, from 2005!
How to Find Missing Values in a Sequence With SQL

Create a temporary table with 100 rows and a single column containing the values 1-100.
Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.
This should work:
select tempid from temptable
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id
where arrc_vouchers.id is null
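The temporary table itself isn't shown in that answer; a minimal sketch for building it (the temptable/tempid names are just the placeholders used above) could be:
-- Sketch: build a 100-row helper table from two digit lists (0-9 x 0-9)
CREATE TEMPORARY TABLE temptable (tempid INT PRIMARY KEY);
INSERT INTO temptable (tempid)
SELECT t.n * 10 + u.n + 1
FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) t
CROSS JOIN
     (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) u;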

Although these all seem to work, the result set takes a very long time to return when there are 50,000 records.
I used this instead, and it finds the gap or the next available id (last used + 1) with a much faster return from the query.
SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;

Based on the answer given by matt, this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use @var to represent tables and/or columns in a stored procedure.
create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);
set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);
set @strsql=concat("select
( t1.",@col," + 1 ) as starts_at,
( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
from ",@tbl," t1
where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
having ends_at is not null");
prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end

A simple, yet effective, solution to find the missing auto-increment values:
SELECT `id`+1
FROM `table_name`
WHERE `id`+1 NOT IN (SELECT id FROM table_name)
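Note that this form also reports MAX(id) + 1, which is not a real gap. A hedged refinement (same idea, just filtering out the value past the current maximum, as some of the other answers here do):
SELECT `id` + 1 AS gap_start
FROM `table_name`
WHERE `id` + 1 NOT IN (SELECT id FROM table_name)
  AND `id` < (SELECT MAX(id) FROM table_name);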

Another simple answer that identifies the gaps. We run a query selecting just the odd numbers and right join it to a query with all the even numbers. As long as you're not missing id 1, this should give you a comprehensive list of where the gaps start.
You'll still have to take a look at that place in the database to figure out how many numbers long the gap is. I found this way easier than the solution proposed and much easier to customize to unique situations.
SELECT *
FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 > 0) AS A
RIGHT JOIN (SELECT * FROM MyTABLE WHERE MYFIELD % 2 = 0) AS B
ON A.MYFIELD=(B.MYFIELD+1)
WHERE A.MYFIELD IS NULL;

This works for me:
SELECT distinct(l.membership_no + 1) as nextavabile
from Tablename as l
LEFT OUTER JOIN Tablename as r on l.membership_no + 1 = r.membership_no
WHERE r.membership_no is NULL and l.membership_no is not null order by nextavabile asc;

Starting from the comment posted by user933161,
select l.id + 1 as start from sequence as l inner join sequence as r on l.id + 1 = r.id where r.id is null;
is better in that it will not produce a false positive for the end of the list of records. (I'm not sure why so many are using left outer joins.)
Also,
insert into sequence (id) values (#);
where # is the start value for a gap will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.)
You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.
Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.
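If you do want to automate that loop on the database side, here is a hedged sketch of a stored procedure (my own addition, not from the original answer; it assumes the `sequence` table and integer `id` column used above, with no other NOT NULL columns). It reuses the NOT EXISTS form of the gap query from earlier answers and keeps inserting the first gap's start value until no gap remains:
DELIMITER //
CREATE PROCEDURE fill_gaps()
BEGIN
  DECLARE next_gap INT;
  fill_loop: LOOP
    -- first id whose successor is missing, excluding the table maximum
    SELECT MIN(l.id) + 1 INTO next_gap
    FROM sequence AS l
    WHERE NOT EXISTS (SELECT 1 FROM sequence AS r WHERE r.id = l.id + 1)
      AND l.id < (SELECT MAX(id) FROM sequence);
    IF next_gap IS NULL THEN
      LEAVE fill_loop;   -- no gaps left
    END IF;
    INSERT INTO sequence (id) VALUES (next_gap);
  END LOOP;
END //
DELIMITER ;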

CREATE TABLE arrc_vouchers (id int primary key);
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16);
WITH RECURSIVE odd_num_cte (id) AS
(
SELECT (select min(id) from arrc_vouchers)
union all
SELECT id+1 from odd_num_cte where id <(SELECT max(id) from arrc_vouchers)
)
SELECT cte.id
from arrc_vouchers ar right outer join odd_num_cte cte on ar.id=cte.id
where ar.id is null;

mysql constraints for auto-incrementing range key with existing values [duplicate]

I'd like to find the first "gap" in a counter column in an SQL table. For example, if there are values 1,2,4 and 5 I'd like to find out 3.
I can of course get the values in order and go through it manually, but I'd like to know if there would be a way to do it in SQL.
In addition, it should be quite standard SQL, working with different DBMSes.
In MySQL and PostgreSQL:
SELECT id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
LIMIT 1
In SQL Server:
SELECT TOP 1
id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
In Oracle:
SELECT *
FROM (
SELECT id + 1 AS gap
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
)
WHERE rownum = 1
ANSI (works everywhere, least efficient):
SELECT MIN(id) + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
Systems supporting sliding window functions:
SELECT -- TOP 1
-- Uncomment above for SQL Server 2012+
previd
FROM (
SELECT id,
LAG(id) OVER (ORDER BY id) previd
FROM mytable
) q
WHERE previd <> id - 1
ORDER BY
id
-- LIMIT 1
-- Uncomment above for PostgreSQL
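MySQL 8.0 and later also support LAG(), so (assuming that version) the same windowed query can be used there with the LIMIT 1 form:
-- Assumes MySQL 8.0+ (window functions)
SELECT previd
FROM (
  SELECT id,
         LAG(id) OVER (ORDER BY id) AS previd
  FROM mytable
) q
WHERE previd <> id - 1
ORDER BY id
LIMIT 1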
Your answers all work fine if you have a first value of id = 1; otherwise that gap will not be detected. For instance, if your table's id values are 3,4,5, your queries will return 6.
I did something like this
SELECT MIN(ID+1) FROM (
SELECT 0 AS ID UNION ALL
SELECT ID FROM TableX) AS T1
WHERE
ID+1 NOT IN (SELECT ID FROM TableX)
There isn't really an extremely standard SQL way to do this, but with some form of limiting clause you can do
SELECT `table`.`num` + 1
FROM `table`
LEFT JOIN `table` AS `alt`
ON `alt`.`num` = `table`.`num` + 1
WHERE `alt`.`num` IS NULL
LIMIT 1
(MySQL, PostgreSQL)
or
SELECT TOP 1 `num` + 1
FROM `table`
LEFT JOIN `table` AS `alt`
ON `alt`.`num` = `table`.`num` + 1
WHERE `alt`.`num` IS NULL
(SQL Server)
or
SELECT `num` + 1
FROM `table`
LEFT JOIN `table` AS `alt`
ON `alt`.`num` = `table`.`num` + 1
WHERE `alt`.`num` IS NULL
AND ROWNUM = 1
(Oracle)
The first thing that came into my head. Not sure if it's a good idea to go this way at all, but should work. Suppose the table is t and the column is c:
SELECT
t1.c + 1 AS gap
FROM t as t1
LEFT OUTER JOIN t as t2 ON (t1.c + 1 = t2.c)
WHERE t2.c IS NULL
ORDER BY gap ASC
LIMIT 1
Edit: This one may be a tick faster (and shorter!):
SELECT
min(t1.c) + 1 AS gap
FROM t as t1
LEFT OUTER JOIN t as t2 ON (t1.c + 1 = t2.c)
WHERE t2.c IS NULL
This works in SQL Server - can't test it in other systems but it seems standard...
SELECT MIN(t1.ID)+1 FROM mytable t1 WHERE NOT EXISTS (SELECT ID FROM mytable WHERE ID = (t1.ID + 1))
You could also add a starting point to the where clause...
SELECT MIN(t1.ID)+1 FROM mytable t1 WHERE NOT EXISTS (SELECT ID FROM mytable WHERE ID = (t1.ID + 1)) AND ID > 2000
So if you had 2000, 2001, 2002, and 2005 where 2003 and 2004 didn't exist, it would return 2003.
The following solution:
provides test data;
has an inner query that produces the other gaps as well; and
works in SQL Server 2012.
It numbers the ordered rows sequentially in the "with" clause and then reuses the result twice with an inner join on the row number, offset by 1, so as to compare each row with the one after it, looking for IDs with a gap greater than 1. More than was asked for, but more widely applicable.
create table #ID ( id integer );
insert into #ID values (1),(2), (4),(5),(6),(7),(8), (12),(13),(14),(15);
with Source as (
select
row_number()over ( order by A.id ) as seq
,A.id as id
from #ID as A WITH(NOLOCK)
)
Select top 1 gap_start from (
Select
(J.id+1) as gap_start
,(K.id-1) as gap_end
from Source as J
inner join Source as K
on (J.seq+1) = K.seq
where (J.id - (K.id-1)) <> 0
) as G
The inner query produces:
gap_start gap_end
3 3
9 11
The outer query produces:
gap_start
3
Inner join to a view or sequence that has all possible values.
No table? Make a table. I always keep a dummy table around just for this.
create table artificial_range(
id int not null primary key auto_increment,
name varchar( 20 ) null ) ;
-- or whatever your database requires for an auto increment column
insert into artificial_range( name ) values ( null )
-- create one row.
insert into artificial_range( name ) select name from artificial_range;
-- you now have two rows
insert into artificial_range( name ) select name from artificial_range;
-- you now have four rows
insert into artificial_range( name ) select name from artificial_range;
-- you now have eight rows
--etc.
insert into artificial_range( name ) select name from artificial_range;
-- you now have 1024 rows, with ids 1-1024
Then,
select a.id from artificial_range a
where not exists ( select * from your_table b
where b.counter = a.id) ;
This one accounts for everything mentioned so far. It includes 0 as a starting point, which it will default to if no values exist. I also added the appropriate locations for the other parts of a multi-column key. This has only been tested on SQL Server.
select
MIN(ID)
from (
select
0 ID
union all
select
[YourIdColumn]+1
from
[YourTable]
where
--Filter the rest of your key--
) foo
left join
[YourTable]
on [YourIdColumn]=ID
and --Filter the rest of your key--
where
[YourIdColumn] is null
For PostgreSQL
An example that makes use of a recursive query.
This might be useful if you want to find a gap in a specific range
(it will work even if the table is empty, whereas the other examples will not).
WITH
RECURSIVE a(id) AS (VALUES (1) UNION ALL SELECT id + 1 FROM a WHERE id < 100), -- range 1..100
b AS (SELECT id FROM my_table) -- your table ID list
SELECT a.id -- find numbers from the range that do not exist in main table
FROM a
LEFT JOIN b ON b.id = a.id
WHERE b.id IS NULL
-- LIMIT 1 -- uncomment if only the first value is needed
My guess:
SELECT MIN(p1.field) + 1 as gap
FROM table1 AS p1
INNER JOIN table1 as p3 ON (p1.field = p3.field + 2)
LEFT OUTER JOIN table1 AS p2 ON (p1.field = p2.field + 1)
WHERE p2.field is null;
I wrote up a quick way of doing it. Not sure this is the most efficient, but gets the job done. Note that it does not tell you the gap, but tells you the id before and after the gap (keep in mind the gap could be multiple values, so for example 1,2,4,7,11 etc)
I'm using sqlite as an example
If this is your table structure
create table sequential(id int not null, name varchar(10) null);
and these are your rows
id|name
1|one
2|two
4|four
5|five
9|nine
The query is
select a.* from sequential a left join sequential b on a.id = b.id + 1 where b.id is null and a.id <> (select min(id) from sequential)
union
select a.* from sequential a left join sequential b on a.id = b.id - 1 where b.id is null and a.id <> (select max(id) from sequential);
https://gist.github.com/wkimeria/7787ffe84d1c54216f1b320996b17b7e
Here is an alternative that shows the ranges of all possible gap values in a portable and more compact way:
Assume your table schema looks like this:
> SELECT id FROM your_table;
+-----+
| id |
+-----+
| 90 |
| 103 |
| 104 |
| 118 |
| 119 |
| 120 |
| 121 |
| 161 |
| 162 |
| 163 |
| 185 |
+-----+
To fetch the ranges of all possible gap values, you can use the following query:
The subquery lists pairs of ids in which the lowerbound column is smaller than the upperbound column, then uses GROUP BY and MIN(m2.id) to reduce the number of useless records.
The outer query further removes the records where lowerbound is exactly upperbound - 1.
My query doesn't (explicitly) output the 2 records (YOUR_MIN_ID_VALUE, 89) and (186, YOUR_MAX_ID_VALUE) at both ends; implicitly, that means any number in those two ranges hasn't been used in your_table so far.
> SELECT m3.lowerbound + 1, m3.upperbound - 1 FROM
(
SELECT m1.id as lowerbound, MIN(m2.id) as upperbound FROM
your_table m1 INNER JOIN your_table
AS m2 ON m1.id < m2.id GROUP BY m1.id
)
m3 WHERE m3.lowerbound < m3.upperbound - 1;
+-------------------+-------------------+
| m3.lowerbound + 1 | m3.upperbound - 1 |
+-------------------+-------------------+
| 91 | 102 |
| 105 | 117 |
| 122 | 160 |
| 164 | 184 |
+-------------------+-------------------+
select min([ColumnName]) from [TableName]
where [ColumnName]-1 not in (select [ColumnName] from [TableName])
and [ColumnName] <> (select min([ColumnName]) from [TableName])
Here is a standard SQL solution that runs on all database servers with no change:
select min(counter + 1) FIRST_GAP
from my_table a
where not exists (select 'x' from my_table b where b.counter = a.counter + 1)
and a.counter <> (select max(c.counter) from my_table c);
See it in action for:
PL/SQL via Oracle's livesql,
MySQL via sqlfiddle,
PostgreSQL via sqlfiddle,
MS SQL via sqlfiddle.
It works for empty tables or with negative values as well. Just tested in SQL Server 2012:
select min(n) from (
select case when lead(i,1,0) over(order by i)>i+1 then i+1 else null end n from MyTable) w
If you use Firebird 3, this is the most elegant and simple option:
select RowID
from (
select `ID_Column`, Row_Number() over(order by `ID_Column`) as RowID
from `Your_Table`
order by `ID_Column`)
where `ID_Column` <> RowID
rows 1
-- PUT THE TABLE NAME AND COLUMN NAME BELOW
-- IN MY EXAMPLE, THE TABLE NAME IS = SHOW_GAPS AND COLUMN NAME IS = ID
-- PUT THESE TWO VALUES AND EXECUTE THE QUERY
DECLARE @TABLE_NAME VARCHAR(100) = 'SHOW_GAPS'
DECLARE @COLUMN_NAME VARCHAR(100) = 'ID'
DECLARE @SQL VARCHAR(MAX)
SET @SQL =
'SELECT TOP 1
'+@COLUMN_NAME+' + 1
FROM '+@TABLE_NAME+' mo
WHERE NOT EXISTS
(
SELECT NULL
FROM '+@TABLE_NAME+' mi
WHERE mi.'+@COLUMN_NAME+' = mo.'+@COLUMN_NAME+' + 1
)
ORDER BY
'+@COLUMN_NAME
-- SELECT @SQL
DECLARE @MISSING_ID TABLE (ID INT)
INSERT INTO @MISSING_ID
EXEC (@SQL)
--select * from @MISSING_ID
declare @var_for_cursor int
DECLARE @LOW INT
DECLARE @HIGH INT
DECLARE @FINAL_RANGE TABLE (LOWER_MISSING_RANGE INT, HIGHER_MISSING_RANGE INT)
DECLARE IdentityGapCursor CURSOR FOR
select * from @MISSING_ID
ORDER BY 1;
open IdentityGapCursor
fetch next from IdentityGapCursor
into @var_for_cursor
WHILE @@FETCH_STATUS = 0
BEGIN
SET @SQL = '
DECLARE @LOW INT
SELECT @LOW = MAX('+@COLUMN_NAME+') + 1 FROM '+@TABLE_NAME
+' WHERE '+@COLUMN_NAME+' < ' + cast( @var_for_cursor as VARCHAR(MAX))
SET @SQL = @sql + '
DECLARE @HIGH INT
SELECT @HIGH = MIN('+@COLUMN_NAME+') - 1 FROM '+@TABLE_NAME
+' WHERE '+@COLUMN_NAME+' > ' + cast( @var_for_cursor as VARCHAR(MAX))
SET @SQL = @sql + 'SELECT @LOW,@HIGH'
INSERT INTO @FINAL_RANGE
EXEC( @SQL)
fetch next from IdentityGapCursor
into @var_for_cursor
END
CLOSE IdentityGapCursor;
DEALLOCATE IdentityGapCursor;
SELECT ROW_NUMBER() OVER(ORDER BY LOWER_MISSING_RANGE) AS 'Gap Number',* FROM @FINAL_RANGE
I found most of these approaches run very, very slowly in MySQL. Here is my solution for MySQL < 8.0. It was tested on 1M records with a gap near the end and takes about 1 second to finish. I'm not sure whether it fits other SQL flavours.
SELECT cardNumber - 1
FROM
(SELECT @row_number := 0) as t,
(
SELECT (@row_number:=@row_number+1), cardNumber, cardNumber-@row_number AS diff
FROM cards
ORDER BY cardNumber
) as x
WHERE diff >= 1
LIMIT 0,1
I assume that sequence starts from `1`.
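For MySQL 8.0 and later, a hedged equivalent of the same idea without user variables (my own variation, using ROW_NUMBER()):
-- Assumes MySQL 8.0+; flags the first cardNumber that drifts from its row number
SELECT cardNumber - 1
FROM (
  SELECT cardNumber,
         cardNumber - ROW_NUMBER() OVER (ORDER BY cardNumber) AS diff
  FROM cards
) AS x
WHERE diff >= 1
ORDER BY cardNumber
LIMIT 1;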
If your counter starts from 1 and you want to generate the first number of the sequence (1) when the table is empty, here is the corrected piece of code from the first answer, valid for Oracle:
SELECT
NVL(MIN(id + 1),1) AS gap
FROM
mytable mo
WHERE 1=1
AND NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
AND EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = 1
)
DECLARE @Table AS TABLE(
[Value] int
)
INSERT INTO @Table ([Value])
VALUES
(1),(2),(4),(5),(6),(10),(20),(21),(22),(50),(51),(52),(53),(54),(55)
--Gaps
--Start End Size
--3 3 1
--7 9 3
--11 19 9
--23 49 27
SELECT [startTable].[Value]+1 [Start]
,[EndTable].[Value]-1 [End]
,([EndTable].[Value]-1) - ([startTable].[Value]) Size
FROM
(
SELECT [Value]
,ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [Value]) Record
FROM @Table
)AS startTable
JOIN
(
SELECT [Value]
,ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [Value]) Record
FROM @Table
)AS EndTable
ON [EndTable].Record = [startTable].Record+1
WHERE [startTable].[Value]+1 <>[EndTable].[Value]
If the numbers in the column are positive integers (starting from 1), then here is how to solve it easily (assuming ID is your column name):
SELECT TEMP.ID
FROM (SELECT ROW_NUMBER() OVER () AS ID FROM `TABLE-NAME`) AS TEMP
WHERE TEMP.ID NOT IN (SELECT ID FROM `TABLE-NAME`)
ORDER BY 1 ASC LIMIT 1
SELECT ID+1 FROM table WHERE ID+1 NOT IN (SELECT ID FROM table) ORDER BY 1;

BigQuery and NOT IN

Sorry for my bad English. Hopefully you understand what I want.
I want something like a pivot table (hopefully that's the right word).
For example, I have a table with two columns: userid and domain.
UserID Domain
1 | A
1 | B
1 | C
2 | A
2 | B
3 | A
2 | C
What I want is a table like the following, which extracts the differences row-wise:
A B C
A 0 1 1
B 0 0 0
C 0 0 0
How to read the output?
For example, the first row (0,1,1):
Imagine all users who visited domain A (in our case user 1, user 2 and user 3). All of domain A's visitors were on domain A (I guess that's clear). Did they also visit domain B? No, one user (in our case user 3) was not on domain B, so we have a 1. Now we check whether all domain A visitors were on domain C, and here we also have one user who was not on domain C. Users 1 and 2 were on domain C, but user 3 was only on domain A, so we have to write a 1 again.
Second row - check which users were on domain B.
Users 1 and 2 were on domain B. Were they also on domain A? Yes, both, so we have to write down a 0. Were users 1 and 2 on domain B? Yes, so 0. And on domain C? Yes, both, so we have to write a zero again.
Third row - check domain C.
On domain C we have visitors 1 and 2. Both also visited domain A, so we have a zero. Both visited domain B? Yes, also zero, and the last entry is clear since they came from domain C.
To keep a long story short: I want to extract all exclusive visitors of each domain compared to the other domains.
I have been struggling for 2 days with left joins, case when, and so on... Nothing works out.
Is there anybody out there with suggestions? It would be really helpful. And yes, I have more than 3 domains. I have around 200!
A very, very big query :), but it's working:
DROP PROCEDURE IF EXISTS dowhile;
CREATE PROCEDURE dowhile()
BEGIN
SELECT @domain_arr := CONCAT(GROUP_CONCAT(domain SEPARATOR ','),',') AS domain_arr FROM ( SELECT t1.domain FROM user_domain t1 WHERE 1 GROUP BY t1.domain ) AS tt;
DROP table IF EXISTS temp_table;
create temporary table temp_table (
domain VARCHAR(100) not NULL
);
SET @domain_arr_table= @domain_arr;
WHILE LOCATE(',', @domain_arr_table) > 0 DO
SET @domain = SUBSTRING(@domain_arr_table,1,LOCATE(',',@domain_arr_table) - 1);
SET @domain_arr_table= SUBSTRING(@domain_arr_table, LOCATE(',',@domain_arr_table) + 1);
SET @s= CONCAT('ALTER TABLE temp_table ADD COLUMN ',@domain,' TINYINT DEFAULT 0');
PREPARE stmt3 FROM @s;
EXECUTE stmt3;
END WHILE;
WHILE LOCATE(',', @domain_arr) > 0 DO
SET @domain = SUBSTRING(@domain_arr,1,LOCATE(',',@domain_arr) - 1);
SET @domain_arr= SUBSTRING(@domain_arr, LOCATE(',',@domain_arr) + 1);
SELECT @user_count := COUNT(*) FROM user_domain WHERE domain=@domain;
INSERT INTO temp_table (domain) VALUES (@domain);
SELECT @domains_should_be_1 := CONCAT(GROUP_CONCAT(domain SEPARATOR ','),',') FROM (SELECT domain FROM user_domain WHERE user_id IN (SELECT user_id FROM user_domain WHERE domain=@domain) GROUP BY domain HAVING COUNT(*) < @user_count) AS tt2;
WHILE LOCATE(',', @domains_should_be_1) > 0 DO
SET @domain_sb_1 = SUBSTRING(@domains_should_be_1,1,LOCATE(',',@domains_should_be_1) - 1);
SET @domains_should_be_1= SUBSTRING(@domains_should_be_1, LOCATE(',',@domains_should_be_1) + 1);
SET @s= CONCAT("UPDATE temp_table SET ",@domain_sb_1,"='1' WHERE domain='",@domain,"'");
SELECT @s;
PREPARE stmt3 FROM @s;
EXECUTE stmt3;
END WHILE;
END WHILE;
END;
call dowhile();
SELECT * FROM temp_table;
There are really two questions here
I want to extract all exclusive visitors of each domain compared to the other domains...
I want something like a Pivot table
Let me answer your questions one by one
So,
How to extract all exclusive visitors of each domain compared to the other domains
Below is for BigQuery Standard SQL and produces a flattened version of your matrix:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 1 userid, 'A' domain UNION ALL
SELECT 1, 'B' UNION ALL
SELECT 1, 'C' UNION ALL
SELECT 2, 'A' UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 3, 'A' UNION ALL
SELECT 2, 'C'
), temp AS (
SELECT DISTINCT userid, domain
FROM `project.dataset.your_table`
)
SELECT
a.domain domain_a,
b.domain domain_b,
COUNT(DISTINCT a.userid) - COUNTIF(a.userid = b.userid) count_of_not_in
FROM temp a
CROSS JOIN temp b
GROUP BY a.domain, b.domain
-- HAVING count_of_not_in > 0
This will result in:
Row domain_a domain_b count_of_not_in
1 A A 0
2 A B 1
3 A C 1
4 B A 0
5 B B 0
6 B C 0
7 C A 0
8 C B 0
9 C C 0
I think in real life you will not have many zeroes in this data, so if you want to compress that flattened version, just uncomment the line with HAVING ..., and you will get the "compact" version:
Row domain_a domain_b count_of_not_in
1 A B 1
2 A C 1
For the sake of exercise and fun, check out another approach below that produces exactly the same result, but in a totally different way:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 1 userid, 'A' domain UNION ALL
SELECT 1, 'B' UNION ALL
SELECT 1, 'C' UNION ALL
SELECT 2, 'A' UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 3, 'A' UNION ALL
SELECT 2, 'C'
), domains AS (
SELECT domain, ARRAY_AGG(DISTINCT userid) users
FROM `project.dataset.your_table`
GROUP BY domain
)
SELECT
a.domain domain_a, b.domain domain_b,
ARRAY_LENGTH(a.users) -
(SELECT COUNT(1)
FROM UNNEST(a.users) user_a
JOIN UNNEST(b.users) user_b
ON user_a = user_b
) count_of_not_in
FROM domains a
CROSS JOIN domains b
-- ORDER BY a.domain, b.domain
Now,
How do you pivot the above result to produce the actual matrix?
Ideally, pivoting should be done outside of BigQuery, in whatever visualization tool you are usually using. But if for whatever reason you want it done within BigQuery, it is doable, and there is an enormous number of questions here on SO related to this. One of the most recent that I have posted an answer for is https://stackoverflow.com/a/50300387/5221944.
It shows how to generate a pivot query that achieves the desired matrix.
It is relatively easy and can be done either manually as a two-step process (step 1: generate the pivot query; step 2: run the generated query) or implemented using any client of your choice.
You cannot (easily) express this as a matrix. But you can express this as a table with three columns: domain1, domain2, count.
with t as ( -- may not be necessary if the rows are already unique
select distinct userid, domain
from tab
)
select t1.domain as domain1, t2.domain as domain2, count(*)
from t t1 join
t t2
on t1.userid = t2.userid
group by t1.domain, t2.domain;
You cannot easily pivot the results in BigQuery into columns, unless you explicitly know the domains that you care about. You can aggregate them into columns, if you like.
For a given set of domains as columns, you can use conditional aggregation:
with t as ( -- may not be necessary if the rows are already unique
select distinct userid, domain
from tab
)
select t1.domain as domain1,
sum(case when t2.domain = 'amazon.com' then 1 else 0 end) as amazon,
sum(case when t2.domain = 'ebay.com' then 1 else 0 end) as ebay,
sum(case when t2.domain = 'yahoo.com' then 1 else 0 end) as yahoo
from t t1 join
t t2
on t1.userid = t2.userid
group by t1.domain;

MySQL: How to find sequence of values in column

I have a long list of rows with random values:
| id | value |
|----|-------|
| 1 | abcd |
| 2 | qwer |
| 3 | jklm |
| 4 | yxcv |
| 5 | tzui |
Then I have an array of a few values:
array('qwer', 'jklm');
And I need to know whether this sequence of values from the array already exists in the table in the given order. In this case, the sequence of values exists.
I tried concatenating all the values from the table and the array and matching the two strings, which works great with a few rows, but there are actually hundreds of thousands of rows in the table. I believe there should be a better solution.
If your list is short, you could just do a self-join and spell out the conditions for each joined table reference:
select t1.id from MyTable as t1 join MyTable as t2
where t1.value='qwer' and t2.value='jklm' and t1.id=t2.id-1;
This returns an empty set if there's no such sequence. And of course it assumes that the id numbers are consecutive (they are in your example, but in general that's a risky assumption).
This doesn't work well if your list gets really long. There's a hard limit of 63 table references MySQL supports in a single query.
Here's another solution, which works for any size list, but only if your id values are known to be consecutive:
select t1.id from MyTable as t1 join MyTable as t2
on t2.id between t1.id and t1.id+1
where t1.value = 'qwer' and t2.value in ('qwer','jklm')
group by t1.id
having group_concat(t2.value order by t2.id) = 'qwer,jklm';
The t1 row is the beginning of the potential matching sequence of rows, so it must match the first value in your list.
Then join to the t2 rows, which are the complete set of potentially matching rows.
The set of t2 rows is also limited to no more than N rows, based on the size of the list of N values you're searching for. But SQL has no way of forming a group based on the number of rows; we can only limit based on some value in the row. That's why this only works if your id values can be assumed to be consecutive.
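For example, with a hypothetical three-value list ('qwer', 'jklm', 'yxcv'), the same pattern just widens the id window to N - 1 = 2:
select t1.id from MyTable as t1 join MyTable as t2
  on t2.id between t1.id and t1.id + 2
where t1.value = 'qwer' and t2.value in ('qwer','jklm','yxcv')
group by t1.id
having group_concat(t2.value order by t2.id) = 'qwer,jklm,yxcv';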
This way you can do it for the whole set:
select value1, value2
from
(
select *
from (
SELECT [IMEPAC] value1 , ROW_NUMBER() over(order by [MATBR]) rn1
FROM [PACM]
) a1 join
(
SELECT [IMEPAC] value2 , ROW_NUMBER() over(order by [MATBR]) rn2
FROM [PACM]
) a2 on a1.rn1 = a2.rn2 + 1
) a
group by value1, value2
having count(*) > 1
It is written for MS SQL but you can easily rewrite it to fit mysql too.
I ran this against a table with > 400,000 rows on IMEPAC, which is not part of any index, and it ran (first time, only once) in 6 seconds.
Here is the MySQL version:
select value1, value2, count(*) count
from
(
select *
from (
SELECT @row_number1:= @row_number1 + 1 AS rn1, content as value1
FROM docs,(SELECT @row_number1:=0) AS t
order by id
) a1 join
(
SELECT @row_number2:= @row_number2 + 1 AS rn2, content value2
FROM docs,(SELECT @row_number2:=0) AS t
order by id
) a2 on a1.rn1 = a2.rn2 + 1
) a
group by value1, value2
having count(*) > 1;
SQL Fiddle here

DELETE a record in relational position in MySQL?

I am trying to clean up records stored in a MySQL table. If a row contains %X%, I need to delete that row and the row immediately below it, regardless of content. E.g. (sorry if the table is insulting anyone's intelligence):
| 1 | leave alone
| 2 | Contains %X% - Delete
| 3 | This row should also be deleted
| 4 | leave alone
| 5 | Contains %X% - Delete
| 6 | This row should also be deleted
| 7 | leave alone
Is there a way to do this using only a couple of queries? Or am I going to have to execute a SELECT query first (using the %X% search parameter) and then loop through those results and execute a DELETE...WHERE for each index returned + 1?
This should work, although it's a bit clunky (you might want to check the LIKE argument, as it uses pattern matching; see comments):
DELETE FROM db.table
WHERE idcol IN
( SELECT idcol FROM db.table WHERE col LIKE '%X%')
OR idcol IN
( SELECT idcol+1 FROM db.table WHERE col LIKE '%X%')
Let's assume the table is named test and contains two columns named id and data.
We start with a SELECT that gives us the id of every row whose preceding row (the row with the highest id among all ids lower than the current row's id) contains %X%:
SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
That gives us the ids 3 and 6. Their preceding rows 2 and 5 contain %X%, so that's good.
Now let's get the ids of the rows that contain %X% and combine them with the previous ones, via UNION:
(SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
)
UNION
(
SELECT id FROM test WHERE data LIKE '%X%'
)
That gives us 3, 6, 2, 5 - nice!
Now, we can't delete from a table and select from the same table in MySQL - so let's use a temporary table, store the ids that are to be deleted in there, and then read from that temporary table to delete from our original table:
CREATE TEMPORARY TABLE deleteids (id INT);
INSERT INTO deleteids
(SELECT t1.id FROM test t1
JOIN test t2 ON
( t2.id, true )
=
( SELECT t3.id, t3.data LIKE '%X%' FROM test t3
WHERE t3.id < t1.id ORDER BY id DESC LIMIT 1 )
)
UNION
(
SELECT id FROM test WHERE data LIKE '%X%'
);
DELETE FROM test WHERE id in (SELECT * FROM deleteids);
... and we are left with the ids 1, 4 and 7 in our test table!
(And since the previous rows are selected using <, ORDER BY and LIMIT, this also works if the ids are not continuous.)
You can do it all in a single DELETE statement:
Assuming the "row immediately after" is based on the order of your INT-based ID column, you can use MySQL variables to assign row numbers which accounts for gaps in your IDs:
DELETE a FROM tbl a
JOIN (
SELECT a.id, b.id AS nextid
FROM (
SELECT a.id, a.text, @rn:=@rn+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT @rn:=1) rn_init
ORDER BY a.id
) a
LEFT JOIN (
SELECT a.id, @rn2:=@rn2+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT @rn2:=0) rn_init
ORDER BY a.id
) b ON a.rownum = b.rownum
WHERE a.text LIKE '%X%'
) b ON a.id IN (b.id, b.nextid)
SQL Fiddle Demo (added additional data for example)
What this does is first take your data and rank it based on your ID column, then do an offset LEFT JOIN on an almost identical result set, except that its rank column is behind by 1. This gets the rows and their immediate "next" rows side by side so that we can pull both of their ids at the same time in the parent DELETE statement:
SELECT a.id, a.text, b.id AS nextid, b.text AS nexttext
FROM (
SELECT a.id, a.text, @rn:=@rn+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT @rn:=1) rn_init
ORDER BY a.id
) a
LEFT JOIN (
SELECT a.id, a.text, @rn2:=@rn2+1 AS rownum
FROM tbl a
CROSS JOIN (SELECT @rn2:=0) rn_init
ORDER BY a.id
) b ON a.rownum = b.rownum
WHERE a.text LIKE '%X%'
Yields:
ID | TEXT | NEXTID | NEXTTEXT
2 | Contains %X% - Delete | 3 | This row should also be deleted
5 | Contains %X% - Delete | 6 | This row should also be deleted
257 | Contains %X% - Delete | 3434 | This row should also be deleted
4000 | Contains %X% - Delete | 4005 | Contains %X% - Delete
4005 | Contains %X% - Delete | 6000 | Contains %X% - Delete
6000 | Contains %X% - Delete | 6534 | This row should also be deleted
We then JOIN-DELETE that entire statement on the condition that it deletes rows whose IDs are either the "subselected" ID or NEXTID.
There is no reasonable way of doing this in a single query. (It may be possible, but the query you end up having to use will be unreasonably complex, and will almost certainly not be portable to other SQL engines.)
Use the SELECT-then-DELETE approach you described in your question.
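A hedged sketch of that two-step approach (the tbl/id/text names follow the other answers here and may differ in your schema):
-- Step 1: find the ids of the rows containing %X%
SELECT id FROM tbl WHERE text LIKE '%X%';
-- Step 2: delete each returned id together with the row right after it,
-- e.g. if step 1 returned 2 and 5 for the example data
DELETE FROM tbl WHERE id IN (2, 3, 5, 6);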

Adding one extra row to the result of MySQL select query

I have a MySQL table like this
id Name count
1 ABC 1
2 CDF 3
3 FGH 4
Using a simple select query I get the values as
1 ABC 1
2 CDF 3
3 FGH 4
How can I get a result like this?
1 ABC 1
2 CDF 3
3 FGH 4
4 NULL 0
Notice the last row. When the records are finished, an extra row in the format
last_id+1, NULL, 0 should be added, as you can see above, even though I have no such row in my original table. There may be N rows; the number is not fixed at 3 or 4.
The answer is very simple
select (select max(id) from mytable)+1 as id, NULL as Name, 0 as count union all select id,Name,count from mytable;
This looks a little messy but it should work.
SELECT a.id, b.name, coalesce(b.`count`, 0) as `count`
FROM
(
SELECT 1 as ID
UNION
SELECT 2 as ID
UNION
SELECT 3 as ID
UNION
SELECT 4 as ID
) a LEFT JOIN table1 b
ON a.id = b.id
WHERE a.ID IN (1,2,3,4)
UPDATE 1
You could simply generate a table that has one column, preferably named ID, with maybe 10,000 records or more. Then you could simply join it with the table that has the original records. For example, assuming that you have a table named DummyRecord with one column and 10,000 rows in it:
SELECT a.id, b.name, coalesce(b.`count`, 0) as `count`
FROM DummyRecord a LEFT JOIN table1 b
ON a.id = b.id
WHERE a.ID >= 1 AND
a.ID <= 4
That's it. Or if you want to cover ids from 10 to 100, then you could use this condition:
...
WHERE a.ID >= 10 AND
a.ID <= 100
To clarify, this is how one can append an extra row to the result set:
select * from table union select 123 as id,'abc' as name
results
id | name
------------
*** | ***
*** | ***
123 | abc
Simply use mysql ROLLUP.
SELECT * FROM your_table
GROUP BY Name WITH ROLLUP;
select
x.id,
t.name,
ifnull(t.count, 0) as count
from
(SELECT 1 AS id
-- Part of the query below, you will need to generate dynamically,
-- just as you would otherwise need to generate 'in (1,2,3,4)'
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) x
LEFT JOIN YourTable t
ON t.id = x.id
If the id does not exist in the table you're selecting from, you'll need to LEFT JOIN against a list of every id you want returned - this way, it will return the null values for ones that don't exist and the true values for those that do.
I would suggest creating a numbers table, which is a single-column table filled with numbers:
CREATE TABLE `numbers` (
id int(11) unsigned NOT NULL
);
And then insert a large amount of numbers, starting at 1 and going up to what you think the highest id you'll ever see, plus a thousand or so. Maybe go from 1 to 1,000,000 to be on the safe side. Regardless, you just need to make sure it's high enough to cover any possible id you'll run into.
After that, your query can look like:
SELECT n.id, a.*
FROM
`numbers` n
LEFT JOIN table t
ON t.id = n.id
WHERE n.id IN (1,2,3,4);
This solution will allow for a dynamically growing list of ids without the need for a sub-query with a list of unions; though, the other solutions provided will equally work for a small known list too (and could also be dynamically generated).