mysql constraints for auto-incremenenting range key with existing values [duplicate] - mysql

I'd like to find the first "gap" in a counter column in an SQL table. For example, if there are values 1,2,4 and 5 I'd like to find out 3.
I can of course get the values in order and go through it manually, but I'd like to know if there would be a way to do it in SQL.
In addition, it should be quite standard SQL, working with different DBMSes.

In MySQL and PostgreSQL:
SELECT id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
LIMIT 1
In SQL Server:
SELECT TOP 1
id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
In Oracle:
SELECT *
FROM (
SELECT id + 1 AS gap
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
)
WHERE rownum = 1
ANSI (works everywhere, least efficient):
SELECT MIN(id) + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
Systems supporting sliding window functions:
SELECT -- TOP 1
-- Uncomment above for SQL Server 2012+
previd
FROM (
SELECT id,
LAG(id) OVER (ORDER BY id) previd
FROM mytable
) q
WHERE previd <> id - 1
ORDER BY
id
-- LIMIT 1
-- Uncomment above for PostgreSQL

Your answers all work fine if you have a first value id = 1, otherwise this gap will not be detected. For instance if your table id values are 3,4,5, your queries will return 6.
I did something like this
SELECT MIN(ID+1) FROM (
SELECT 0 AS ID UNION ALL
SELECT
MIN(ID + 1)
FROM
TableX) AS T1
WHERE
ID+1 NOT IN (SELECT ID FROM TableX)

There isn't really an extremely standard SQL way to do this, but with some form of limiting clause you can do
SELECT `table`.`num` + 1
FROM `table`
LEFT JOIN `table` AS `alt`
ON `alt`.`num` = `table`.`num` + 1
WHERE `alt`.`num` IS NULL
LIMIT 1
(MySQL, PostgreSQL)
or
SELECT TOP 1 `num` + 1
FROM `table`
LEFT JOIN `table` AS `alt`
ON `alt`.`num` = `table`.`num` + 1
WHERE `alt`.`num` IS NULL
(SQL Server)
or
SELECT `num` + 1
FROM `table`
LEFT JOIN `table` AS `alt`
ON `alt`.`num` = `table`.`num` + 1
WHERE `alt`.`num` IS NULL
AND ROWNUM = 1
(Oracle)

The first thing that came into my head. Not sure if it's a good idea to go this way at all, but should work. Suppose the table is t and the column is c:
SELECT
t1.c + 1 AS gap
FROM t as t1
LEFT OUTER JOIN t as t2 ON (t1.c + 1 = t2.c)
WHERE t2.c IS NULL
ORDER BY gap ASC
LIMIT 1
Edit: This one may be a tick faster (and shorter!):
SELECT
min(t1.c) + 1 AS gap
FROM t as t1
LEFT OUTER JOIN t as t2 ON (t1.c + 1 = t2.c)
WHERE t2.c IS NULL

This works in SQL Server - can't test it in other systems but it seems standard...
SELECT MIN(t1.ID)+1 FROM mytable t1 WHERE NOT EXISTS (SELECT ID FROM mytable WHERE ID = (t1.ID + 1))
You could also add a starting point to the where clause...
SELECT MIN(t1.ID)+1 FROM mytable t1 WHERE NOT EXISTS (SELECT ID FROM mytable WHERE ID = (t1.ID + 1)) AND ID > 2000
So if you had 2000, 2001, 2002, and 2005 where 2003 and 2004 didn't exist, it would return 2003.

The following solution:
provides test data;
an inner query that produces other gaps; and
it works in SQL Server 2012.
Numbers the ordered rows sequentially in the "with" clause and then reuses the result twice with an inner join on the row number, but offset by 1 so as to compare the row before with the row after, looking for IDs with a gap greater than 1. More than asked for but more widely applicable.
create table #ID ( id integer );
insert into #ID values (1),(2), (4),(5),(6),(7),(8), (12),(13),(14),(15);
with Source as (
select
row_number()over ( order by A.id ) as seq
,A.id as id
from #ID as A WITH(NOLOCK)
)
Select top 1 gap_start from (
Select
(J.id+1) as gap_start
,(K.id-1) as gap_end
from Source as J
inner join Source as K
on (J.seq+1) = K.seq
where (J.id - (K.id-1)) <> 0
) as G
The inner query produces:
gap_start gap_end
3 3
9 11
The outer query produces:
gap_start
3

Inner join to a view or sequence that has a all possible values.
No table? Make a table. I always keep a dummy table around just for this.
create table artificial_range(
id int not null primary key auto_increment,
name varchar( 20 ) null ) ;
-- or whatever your database requires for an auto increment column
insert into artificial_range( name ) values ( null )
-- create one row.
insert into artificial_range( name ) select name from artificial_range;
-- you now have two rows
insert into artificial_range( name ) select name from artificial_range;
-- you now have four rows
insert into artificial_range( name ) select name from artificial_range;
-- you now have eight rows
--etc.
insert into artificial_range( name ) select name from artificial_range;
-- you now have 1024 rows, with ids 1-1024
Then,
select a.id from artificial_range a
where not exists ( select * from your_table b
where b.counter = a.id) ;

This one accounts for everything mentioned so far. It includes 0 as a starting point, which it will default to if no values exist as well. I also added the appropriate locations for the other parts of a multi-value key. This has only been tested on SQL Server.
select
MIN(ID)
from (
select
0 ID
union all
select
[YourIdColumn]+1
from
[YourTable]
where
--Filter the rest of your key--
) foo
left join
[YourTable]
on [YourIdColumn]=ID
and --Filter the rest of your key--
where
[YourIdColumn] is null

For PostgreSQL
An example that makes use of recursive query.
This might be useful if you want to find a gap in a specific range
(it will work even if the table is empty, whereas the other examples will not)
WITH
RECURSIVE a(id) AS (VALUES (1) UNION ALL SELECT id + 1 FROM a WHERE id < 100), -- range 1..100
b AS (SELECT id FROM my_table) -- your table ID list
SELECT a.id -- find numbers from the range that do not exist in main table
FROM a
LEFT JOIN b ON b.id = a.id
WHERE b.id IS NULL
-- LIMIT 1 -- uncomment if only the first value is needed

My guess:
SELECT MIN(p1.field) + 1 as gap
FROM table1 AS p1
INNER JOIN table1 as p3 ON (p1.field = p3.field + 2)
LEFT OUTER JOIN table1 AS p2 ON (p1.field = p2.field + 1)
WHERE p2.field is null;

I wrote up a quick way of doing it. Not sure this is the most efficient, but gets the job done. Note that it does not tell you the gap, but tells you the id before and after the gap (keep in mind the gap could be multiple values, so for example 1,2,4,7,11 etc)
I'm using sqlite as an example
If this is your table structure
create table sequential(id int not null, name varchar(10) null);
and these are your rows
id|name
1|one
2|two
4|four
5|five
9|nine
The query is
select a.* from sequential a left join sequential b on a.id = b.id + 1 where b.id is null and a.id <> (select min(id) from sequential)
union
select a.* from sequential a left join sequential b on a.id = b.id - 1 where b.id is null and a.id <> (select max(id) from sequential);
https://gist.github.com/wkimeria/7787ffe84d1c54216f1b320996b17b7e

Here is an alternative to show the range of all possible gap values in portable and more compact way :
Assume your table schema looks like this :
> SELECT id FROM your_table;
+-----+
| id |
+-----+
| 90 |
| 103 |
| 104 |
| 118 |
| 119 |
| 120 |
| 121 |
| 161 |
| 162 |
| 163 |
| 185 |
+-----+
To fetch the ranges of all possible gap values, you have the following query :
The subquery lists pairs of ids, each of which has the lowerbound column being smaller than upperbound column, then use GROUP BY and MIN(m2.id) to reduce number of useless records.
The outer query further removes the records where lowerbound is exactly upperbound - 1
My query doesn't (explicitly) output the 2 records (YOUR_MIN_ID_VALUE, 89) and (186, YOUR_MAX_ID_VALUE) at both ends, that implicitly means any number in both of the ranges hasn't been used in your_table so far.
> SELECT m3.lowerbound + 1, m3.upperbound - 1 FROM
(
SELECT m1.id as lowerbound, MIN(m2.id) as upperbound FROM
your_table m1 INNER JOIN your_table
AS m2 ON m1.id < m2.id GROUP BY m1.id
)
m3 WHERE m3.lowerbound < m3.upperbound - 1;
+-------------------+-------------------+
| m3.lowerbound + 1 | m3.upperbound - 1 |
+-------------------+-------------------+
| 91 | 102 |
| 105 | 117 |
| 122 | 160 |
| 164 | 184 |
+-------------------+-------------------+

select min([ColumnName]) from [TableName]
where [ColumnName]-1 not in (select [ColumnName] from [TableName])
and [ColumnName] <> (select min([ColumnName]) from [TableName])

Here is standard a SQL solution that runs on all database servers with no change:
select min(counter + 1) FIRST_GAP
from my_table a
where not exists (select 'x' from my_table b where b.counter = a.counter + 1)
and a.counter <> (select max(c.counter) from my_table c);
See in action for;
PL/SQL via Oracle's livesql,
MySQL via sqlfiddle,
PostgreSQL via sqlfiddle
MS Sql via sqlfiddle

It works for empty tables or with negatives values as well. Just tested in SQL Server 2012
select min(n) from (
select case when lead(i,1,0) over(order by i)>i+1 then i+1 else null end n from MyTable) w

If You use Firebird 3 this is most elegant and simple:
select RowID
from (
select `ID_Column`, Row_Number() over(order by `ID_Column`) as RowID
from `Your_Table`
order by `ID_Column`)
where `ID_Column` <> RowID
rows 1

-- PUT THE TABLE NAME AND COLUMN NAME BELOW
-- IN MY EXAMPLE, THE TABLE NAME IS = SHOW_GAPS AND COLUMN NAME IS = ID
-- PUT THESE TWO VALUES AND EXECUTE THE QUERY
DECLARE #TABLE_NAME VARCHAR(100) = 'SHOW_GAPS'
DECLARE #COLUMN_NAME VARCHAR(100) = 'ID'
DECLARE #SQL VARCHAR(MAX)
SET #SQL =
'SELECT TOP 1
'+#COLUMN_NAME+' + 1
FROM '+#TABLE_NAME+' mo
WHERE NOT EXISTS
(
SELECT NULL
FROM '+#TABLE_NAME+' mi
WHERE mi.'+#COLUMN_NAME+' = mo.'+#COLUMN_NAME+' + 1
)
ORDER BY
'+#COLUMN_NAME
-- SELECT #SQL
DECLARE #MISSING_ID TABLE (ID INT)
INSERT INTO #MISSING_ID
EXEC (#SQL)
--select * from #MISSING_ID
declare #var_for_cursor int
DECLARE #LOW INT
DECLARE #HIGH INT
DECLARE #FINAL_RANGE TABLE (LOWER_MISSING_RANGE INT, HIGHER_MISSING_RANGE INT)
DECLARE IdentityGapCursor CURSOR FOR
select * from #MISSING_ID
ORDER BY 1;
open IdentityGapCursor
fetch next from IdentityGapCursor
into #var_for_cursor
WHILE ##FETCH_STATUS = 0
BEGIN
SET #SQL = '
DECLARE #LOW INT
SELECT #LOW = MAX('+#COLUMN_NAME+') + 1 FROM '+#TABLE_NAME
+' WHERE '+#COLUMN_NAME+' < ' + cast( #var_for_cursor as VARCHAR(MAX))
SET #SQL = #sql + '
DECLARE #HIGH INT
SELECT #HIGH = MIN('+#COLUMN_NAME+') - 1 FROM '+#TABLE_NAME
+' WHERE '+#COLUMN_NAME+' > ' + cast( #var_for_cursor as VARCHAR(MAX))
SET #SQL = #sql + 'SELECT #LOW,#HIGH'
INSERT INTO #FINAL_RANGE
EXEC( #SQL)
fetch next from IdentityGapCursor
into #var_for_cursor
END
CLOSE IdentityGapCursor;
DEALLOCATE IdentityGapCursor;
SELECT ROW_NUMBER() OVER(ORDER BY LOWER_MISSING_RANGE) AS 'Gap Number',* FROM #FINAL_RANGE

Found most of approaches run very, very slow in mysql. Here is my solution for mysql < 8.0. Tested on 1M records with a gap near the end ~ 1sec to finish. Not sure if it fits other SQL flavours.
SELECT cardNumber - 1
FROM
(SELECT #row_number := 0) as t,
(
SELECT (#row_number:=#row_number+1), cardNumber, cardNumber-#row_number AS diff
FROM cards
ORDER BY cardNumber
) as x
WHERE diff >= 1
LIMIT 0,1
I assume that sequence starts from `1`.

If your counter is starting from 1 and you want to generate first number of sequence (1) when empty, here is the corrected piece of code from first answer valid for Oracle:
SELECT
NVL(MIN(id + 1),1) AS gap
FROM
mytable mo
WHERE 1=1
AND NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
AND EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = 1
)

DECLARE #Table AS TABLE(
[Value] int
)
INSERT INTO #Table ([Value])
VALUES
(1),(2),(4),(5),(6),(10),(20),(21),(22),(50),(51),(52),(53),(54),(55)
--Gaps
--Start End Size
--3 3 1
--7 9 3
--11 19 9
--23 49 27
SELECT [startTable].[Value]+1 [Start]
,[EndTable].[Value]-1 [End]
,([EndTable].[Value]-1) - ([startTable].[Value]) Size
FROM
(
SELECT [Value]
,ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [Value]) Record
FROM #Table
)AS startTable
JOIN
(
SELECT [Value]
,ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [Value]) Record
FROM #Table
)AS EndTable
ON [EndTable].Record = [startTable].Record+1
WHERE [startTable].[Value]+1 <>[EndTable].[Value]

If the numbers in the column are positive integers (starting from 1) then here is how to solve it easily. (assuming ID is your column name)
SELECT TEMP.ID
FROM (SELECT ROW_NUMBER() OVER () AS NUM FROM 'TABLE-NAME') AS TEMP
WHERE ID NOT IN (SELECT ID FROM 'TABLE-NAME')
ORDER BY 1 ASC LIMIT 1

SELECT ID+1 FROM table WHERE ID+1 NOT IN (SELECT ID FROM table) ORDER BY 1;

Related

SQL - Split a column into two columns in Mysql

I have this table. Considering the id starts from 0.
Table 1
ID Letter
1 A
2 B
3 C
4 D
6 E
I need following output
Col1 Col2
NULL A
B C
D NULL
E NULL
I tried using union with id, id - 1 and id + 1, but I couldn't figure out how to get letter based on ids, also tried even odd logic but nothing worked.
Any help is appreciated.
Thank you
You didn't post the database engine, so I'll assume PostgreSQL where the modulus operand is %.
The query should be:
select o.letter, e.letter
from (
select id, letter, id as base from my_table where id % 2 = 0
) o full outer join (
select id, letter, (id - 1) as base from my_table where id % 2 <> 0
) e on e.base = o.base
order by coalesce(o.base, e.base)
Please take the following option with a grain of salt since I don't have a way of testing it in MySQL 5.6. In the absence of a full outer join, you can perform two outer joins, and then you can union them, as in:
select * from (
select o.base, o.letter, e.letter
from (
select id, letter, id as base from my_table where id % 2 = 0
) o left join (
select id, letter, (id - 1) as base from my_table where id % 2 <> 0
) e on e.base = o.base
union
select e.base, o.letter, e.letter
from (
select id, letter, id as base from my_table where id % 2 = 0
) o right join (
select id, letter, (id - 1) as base from my_table where id % 2 <> 0
) e on e.base = o.base
) x
order by base
Just use conditional aggregation:
select max(case when id % 2 = 0 then letter end) as col1,
max(case when id % 2 = 1 then letter end) as col2
from t
group by floor(id / 2);
If you prefer, you can use mod() instead of %. MySQL supports both.

Select all rows that have column value larger than some value

I have a SQL table thus:
username | rank
a | 0
b | 2
c | 5
d | 4
e | 5
f | 7
g | 1
h | 12
I want to use a single select statement that returns all rows that have rank greater than the value of user e's rank.
Is this possible with a single statement?
Yes it is possible.
SELECT * FROM MyTable
WHERE rank > (SELECT Rank FROM MyTable WHERE username = 'e')
or you can also use self-join for the same
SELECT t1.* FROM MyTable t1
JOIN MyTable t2
ON t1.Rank > t2.Rank
AND t2.username = 'e';
See this SQLFiddle
You can use subquery
SELECT * FROM `table` WHERE `rank` > (
SELECT `rank` FROM `table` WHERE `username` ='b' LIMIT 1)
This is just an edit script of #UweB.. It will return the max rank even if there are multiple rows for username='e'
SELECT *
FROM tbl
WHERE
tbl.rank > (
SELECT max(rank) FROM tbl WHERE username = 'e')
SELECT *
FROM tbl
WHERE
tbl.rank > (
SELECT rank FROM tbl WHERE username = 'e'
);
Note that this will only work if the sub-select (the part in brackets) returns a single value (one row, one column so to speak) only.

Looking for missed IDs in SQL Server 2008

I have a table that contains two columns
ID | Name
----------------
1 | John
2 | Sam
3 | Peter
6 | Mike
It has missed IDs. In this case these are 4 and 5.
How do I find and insert them together with random names into this table?
Update: cursors and temp tables are not allowed. The random name should be 'Name_'+ some random number. Maybe it would be the specified value like 'Abby'. So it doesn't matter.
Using a recursive CTE you can determine the missing IDs as follows
DECLARE #Table TABLE(
ID INT,
Name VARCHAR(10)
)
INSERT INTO #Table VALUES (1, 'John'),(2, 'Sam'),(3,'Peter'),(6, 'Mike')
DECLARE #StartID INT,
#EndID INT
SELECT #StartID = MIN(ID),
#EndID = MAX(ID)
FROM #Table
;WITH IDS AS (
SELECT #StartID IDEntry
UNION ALL
SELECT IDEntry + 1
FROM IDS
WHERE IDEntry + 1 <= #EndID
)
SELECT IDS.IDEntry [ID]
FROM IDS LEFT JOIN
#Table t ON IDS.IDEntry = t.ID
WHERE t.ID IS NULL
OPTION (MAXRECURSION 0)
The option MAXRECURSION 0 will allow the code to avoid the recursion limit of SQL SERVER
From Query Hints and WITH common_table_expression (Transact-SQL)
MAXRECURSION number Specifies the maximum number of recursions
allowed for this query. number is a nonnegative integer between 0 and
32767. When 0 is specified, no limit is applied. If this option is not specified, the default limit for the server is 100.
When the specified or default number for MAXRECURSION limit is reached
during query execution, the query is ended and an error is returned.
Because of this error, all effects of the statement are rolled back.
If the statement is a SELECT statement, partial results or no results
may be returned. Any partial results returned may not include all rows
on recursion levels beyond the specified maximum recursion level.
Generating the RANDOM names will largly be affected by the requirements of such a name, and the column type of such a name. What exactly does this random name entail?
You can do this using a recursive Common Table Expression CTE. Here's an example how:
DECLARE #MaxId INT
SELECT #MaxId = MAX(ID) from MyTable
;WITH Numbers(Number) AS
(
SELECT 1
UNION ALL
SELECT Number + 1 FROM Numbers WHERE Number < #MaxId
)
SELECT n.Number, 'Random Name'
FROM Numbers n
LEFT OUTER JOIN MyTable t ON n.Number=t.ID
WHERE t.ID IS NULL
Here are a couple of articles about CTEs that will be helpful to Using Common Table Expressions and Recursive Queries Using Common Table Expressions
Start by selecting the highest number in the table (select top 1 id desc), or select max(id), then run a while loop to iterate from 1...max.
See this article about looping.
For each iteration, see if the row exists, and if not, insert into table, with that ID.
I think recursive CTE is a better solution, because it's going to be faster, but here is what worked for me:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TestTable]') AND type in (N'U'))
DROP TABLE [dbo].[TestTable]
GO
CREATE TABLE [dbo].[TestTable](
[Id] [int] NOT NULL,
[Name] [varchar](50) NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
))
GO
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (1, 'John')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (2, 'Sam')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (3, 'Peter')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (6, 'Mike')
GO
declare #mod int
select #mod = MAX(number)+1 from master..spt_values where [type] = 'P'
INSERT INTO [dbo].[TestTable]
SELECT y.Id,'Name_' + cast(newid() as varchar(45)) Name from
(
SELECT TOP (select MAX(Id) from [dbo].[TestTable]) x.Id from
(
SELECT
t1.number*#mod + t2.number Id
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
WHERE t1.[type] = 'P' and t2.[type] = 'P'
) x
WHERE x.Id > 0
ORDER BY x.Id
) y
LEFT JOIN [dbo].[TestTable] on [TestTable].Id = y.Id
where [TestTable].Id IS NULL
GO
select * from [dbo].[TestTable]
order by Id
GO
http://www.sqlfiddle.com/#!3/46c7b/18
It's actually very simple :
Create a table called #All_numbers which should contain all the natural number in the range that you are looking for.
#list is a table containing your data
select a.num as missing_number ,
'Random_Name' + convert(varchar, a.num)
from #All_numbers a left outer join #list l on a.num = l.Id
where l.id is null

SQL Group By And Display With Different Values

I want to create group query where Table values are like this below:
EMP_ID ProjectID
815 1
985 1
815 3
985 4
815 4
And i want output like this
EMP_ID ProjectID1 ProjectID2 ProjectID3
815 1 3 4
985 1 4 0
can anyone know how can i achieve this thing in SQL query.
Thank in advance.
The short way:
Using http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat
SELECT
tbl.emp_id,
GROUP_CONCAT( DISTINCT project_id ) project_id_list
FROM tbl
GROUP BY tbl.emp_id
In this case, you have to split/process the concatenated project_id_list string (or NULL) in your application
The long way:
We will use a little trick:
http://dev.mysql.com/doc/refman/5.1/en/example-auto-increment.html
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column
in a multiple-column index. In this case, the generated value for the
AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1
WHERE prefix=given-prefix. This is useful when you want to put data
into ordered groups.
CREATE TEMPORARY TABLE temp (
emp_id INT NOT NULL,
-- project_num will count from 1 to N PER emp_id!
project_num INT NOT NULL AUTO_INCREMENT,
project_id INT NOT NULL,
PRIMARY KEY ( emp_id, project_num )
) ENGINE=MyISAM; -- works only with myisam!
Generate the per-group auto increments:
INSERT INTO temp ( emp_id, project_id )
SELECT emp_id, project_id FROM tbl
Calculate how many project_id columns are needed:
$MAX_PROJECTS_PER_EMP =
SELECT MAX( max_projects_per_emp ) FROM
( SELECT COUNT(*) AS max_projects_per_emp project_id FROM tbl GROUP BY emp_id )
Programmatically create the select expression:
SELECT
temp.emp_id,
t1.project_id AS project_id_1,
t2.project_id AS project_id_2,
t98.project_id AS project_id_98,
t99.project_id AS project_id_99,
FROM temp
LEFT JOIN temp AS t1 ON temp.emp_id = t1.id AND t1.project_num = 1
LEFT JOIN temp AS t2 ON temp.emp_id = t2.id AND t1.project_num = 2
// create $MAX_PROJECTS_PER_EMP lines of LEFT JOINs
LEFT JOIN temp AS t98 ON temp.emp_id = t98.id AND t98.project_num = 98
LEFT JOIN temp AS t99 ON temp.emp_id = t99.id AND t99.project_num = 99

How can we find gaps in sequential numbering in MySQL?

We have a database with a table whose values were imported from another system. There is an auto-increment column, and there aren’t any duplicate values, but there are missing values. For example, running this query:
select count(id) from arrc_vouchers where id between 1 and 100
should return 100, but it returns 87 instead. Is there a query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there aren’t any records with id's of 71-82. I want to return 71, 72, 73, etc.
Is this possible?
A better answer
JustPlainMJS provided a much better answer in terms of performance.
The (not as fast as possible) answer
Here's a version that works on a table of any size (not just on 100 rows):
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
gap_starts_at - first id in current gap
gap_ends_at - last id in current gap
This just worked for me to find the gaps in a table with more than 80k rows:
SELECT
CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
SELECT
#rownum:=#rownum+1 AS expected,
IF(#rownum=YourCol, 0, #rownum:=YourCol) AS got
FROM
(SELECT #rownum:=0) AS a
JOIN YourTable
ORDER BY YourCol
) AS z
WHERE z.got!=0;
Result:
+------------------+
| missing |
+------------------+
| 1 thru 99 |
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)
Note that the order of columns expected and got is critical.
If you know that YourCol doesn't start at 1 and that doesn't matter, you can replace
(SELECT #rownum:=0) AS a
with
(SELECT #rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a
New result:
+------------------+
| missing |
+------------------+
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)
If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in Bash.
SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM ( SELECT #rownum:=#rownum+1 AS expected, IF(#rownum=height, 0, #rownum:=height) AS got FROM (SELECT #rownum:=0) AS a JOIN block ORDER BY height ) AS z WHERE z.got!=0;
This produces an output like so
$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)
You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID
for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
echo $ID
# Fill the gaps
done
It's the same thing as above, only that it's both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.
A quick-and-dirty query that should do the trick:
SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM
(
SELECT a1.id AS a , MIN(a2.id) AS b
FROM arrc_vouchers AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab
WHERE
b > a + 1
This will give you a table showing the id that has ids missing above it, and next_id that exists, and how many are missing between... E.g.,
id next_id missing_inbetween
1 4 2
68 70 1
75 87 11
If you are using a MariaDB database, you have a faster (800%) option using the sequence storage engine:
SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);
If there is a sequence having gap of maximum one between two numbers (like
1,3,5,6) then the query that can be used is:
select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
table_name - source1
column_name - id
An alternative solution that requires a query + some code doing some processing would be:
select l.id lValue, c.id cValue, r.id rValue
from
arrc_vouchers l
right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
left join arrc_vouchers r on r.id=c.id+1
where 1=1
and c.id > 0
and (l.id is null or r.id is null)
order by c.id asc;
Note that the query does not contain any subselect that we know it's not handled performantly by MySQL's planner.
That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), i.e.:
lValue |cValue|rValue
-------+------+-------
{null} | 2 | 3
8 | 9 | {null}
{null} | 22 | 23
23 | 24 | {null}
{null} | 29 | {null}
{null} | 33 | {null}
Without going into further details (we'll see them in next paragraphs) this output means that:
No values between 0 and 2
No values between 9 and 22
No values between 24 and 29
No values between 29 and 33
No values between 33 and MAX VALUE
So the basic idea is to do a RIGHT and LEFT joins with the same table seeing if we have adjacents values per value (i.e., if central value is '3' then we check for 3-1=2 at left and 3+1 at right), and when a ROW has a NULL value at RIGHT or LEFT then we know there is no adjacent value.
The complete raw output of my table is:
select * from arrc_vouchers order by id asc;
0
2
3
4
5
6
7
8
9
22
23
24
29
33
Some notes:
The SQL IF statement in the join condition is needed if you define the 'id' field as UNSIGNED, therefore it will not allow you to decrease it under zero. This is not strictly necessary if you keep the c.value > 0 as it's stated in the next note, but I'm including it just as doc.
I'm filtering the zero central value as we are not interested in any previous value and we can derive the post value from the next row.
I tried it in a different manner, and the best performance that I found was this simple query:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
... one left join to check if the next id exists, only if next if is not found, then the subquery finds the next id that exists to find the end of gap. I did it because the query with equal (=) is better performance than the greater than (>) operator.
Using the sqlfiddle it does not show so a different performance compared to the other queries, but in a real database this query above results in 3 times faster than the others.
The schema:
CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;
Follow below all the queries that I made to compare the performance:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
select *, (gapEnd-gapIni) qt
from (
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
from arrc_vouchers a
order by id
) a where gapEnd <> gapIni
;
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
#,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
from arrc_vouchers a
order by id
;
You can see and test my query using this SQL Fiddle:
http://sqlfiddle.com/#!9/6bdca7/1
It is probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e., next available number), and this seems to work fine.
SELECT MIN(l.number_sequence + 1) as nextavabile
from patients as l
LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence
WHERE r.number_sequence is NULL
Several other scenarios and solutions discussed there, from 2005!
How to Find Missing Values in a Sequence With SQL
Create a temporary table with 100 rows and a single column containing the values 1-100.
Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.
This should work:
select tempid from temptable
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id
where arrc_vouchers.id is null
Although these all seem to work, the result set returns in a very lengthy time when there are 50,000 records.
I used this, and it find the gap or the next available (last used + 1) with a much faster return from the query.
SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;
Based on the answer given by matt, this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use #var to represent tables &/or columns in a stored procedure.
create definer=`root`#`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);
set #tbl=cast(param_tbl as char character set utf8);
set #col=cast(param_col as char character set utf8);
set #strsql=concat("select
( t1.",#col," + 1 ) as starts_at,
( select min(t3.",#col,") -1 from ",#tbl," t3 where t3.",#col," > t1.",#col," ) as ends_at
from ",#tbl," t1
where not exists ( select t2.",#col," from ",#tbl," t2 where t2.",#col," = t1.",#col," + 1 )
having ends_at is not null");
prepare stmt from #strsql;
execute stmt;
deallocate prepare stmt;
end
A simple, yet effective, solution to find the missing auto-increment values:
SELECT `id`+1
FROM `table_name`
WHERE `id`+1 NOT IN (SELECT id FROM table_name)
Another simple answer that identifies the gaps. We do a query selecting just the odd numbers and we right join it to a query with all the even numbers. As long as you're not missing id 1; this should give you a comprehensive list of where the gaps start.
You'll still have to take a look at that place in the database to figure out how many numbers the gap is. I found this way easier than the solution proposed and much easier to customize to unique situations.
SELECT *
FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 > 0) AS A
RIGHT JOIN FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 = 0) AS B
ON A.MYFIELD=(B.MYFIELD+1)
WHERE a.id IS NULL;
This works for me:
SELECT distinct(l.membership_no + 1) as nextavabile
from Tablename as l
LEFT OUTER JOIN Tablename as r on l.membership_no + 1 = r.membership_no
WHERE r.membership_no is NULL and l.membership_no is not null order by nextavabile asc;
Starting from the comment posted by user933161,
select l.id + 1 as start from sequence as l inner join sequence as r on l.id + 1 = r.id where r.id is null;
is better in that it will not produce a false positive for the end of the list of records. (I'm not sure why so many are using left outer joins.)
Also,
insert into sequence (id) values (#);
where # is the start value for a gap will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.)
You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.
Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.
CREATE TABLE arrc_vouchers (id int primary key);
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16);
WITH RECURSIVE odd_num_cte (id) AS
(
SELECT (select min(id) from arrc_vouchers)
union all
SELECT id+1 from odd_num_cte where id <(SELECT max(id) from arrc_vouchers)
)
SELECT cte.id
from arrc_vouchers ar right outer join odd_num_cte cte on ar.id=cte.id
where ar.id is null;