Delete duplicates from db - mysql

I have a table like the following:
id | a_id | b_id | success
--------------------------
1  | 34   | 43   | 1
2  | 34   | 84   | 1
3  | 34   | 43   | 0
4  | 65   | 43   | 1
5  | 65   | 84   | 1
6  | 93   | 23   | 0
7  | 93   | 23   | 0
I want to delete duplicates that share the same a_id and b_id, but keep one record of each. If possible, the kept record should be one with success=1. So in the example table, the third record and either the sixth or the seventh should be deleted. How can I do this?
I'm using MySQL 5.1

The task is simple:
Find the minimum number of records that should not be deleted.
Delete the other records.
The Oracle way,
delete from sample_table where id not in(
select id from
(
Select id, success,row_number()
over (partition by a_id,b_id order by success desc) rown
from sample_table
)
where rown = 1)
The solution in MySQL:
This query gives you the ids that should not be deleted:
Select id from (SELECT * FROM report ORDER BY success desc) t
group by t.a_id, t.b_id
Output:
ID
1
2
4
5
6
You can delete the other rows.
delete from report where id not in (the above query)
The consolidated DML:
delete from report
where id not in (Select id
from (SELECT * FROM report
ORDER BY success desc) t
group by t.a_id, t.b_id)
Now doing a Select on report:
ID A_ID B_ID SUCCESS
1 34 43 1
2 34 84 1
4 65 43 1
5 65 84 1
6 93 23 0
You can check the documentation of how the group by clause works when no aggregation function is provided:
When using this feature, all rows in each group should have the same
values for the columns that are omitted from the GROUP BY part. The
server is free to return any value from the group, so the results are
indeterminate unless all values are the same.
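So performing an ORDER BY success DESC before the GROUP BY lets us pick up the first duplicate row with success = 1 in practice. Note, however, that this relies on exactly the behavior the documentation above describes as indeterminate, so verify the result on your server version before deleting.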
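Since the server is free to return any row from each group, a variant that does not depend on row order may be safer. Below is a minimal sketch, using Python's built-in sqlite3 as a stand-in for MySQL. The table and data come from the question; the EXISTS formulation is a substitute technique, not the query from this answer. MySQL itself rejects a subquery on the table being deleted from, so there the same condition would have to be written as a multi-table DELETE join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report (id INTEGER PRIMARY KEY, a_id INT, b_id INT, success INT)"
)
conn.executemany(
    "INSERT INTO report VALUES (?, ?, ?, ?)",
    [(1, 34, 43, 1), (2, 34, 84, 1), (3, 34, 43, 0), (4, 65, 43, 1),
     (5, 65, 84, 1), (6, 93, 23, 0), (7, 93, 23, 0)],
)

# Delete a row when a "better" row exists for the same (a_id, b_id):
# one with higher success, or with equal success and a lower id.
conn.execute("""
    DELETE FROM report
    WHERE EXISTS (
        SELECT 1 FROM report AS keep
        WHERE keep.a_id = report.a_id
          AND keep.b_id = report.b_id
          AND (keep.success > report.success
               OR (keep.success = report.success AND keep.id < report.id))
    )
""")

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM report ORDER BY id")]
print(remaining_ids)
```

The tie-break on id makes the choice of survivor explicit, so the same rows are kept on every run.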

How about this:
CREATE TABLE new_table
AS (SELECT * FROM old_table WHERE success = 1 GROUP BY a_id,b_id);
DROP TABLE old_table;
RENAME TABLE new_table TO old_table;
This method creates a new table under a temporary name and copies the deduplicated rows that have success = 1 from the old table. The old table is then dropped and the new table is renamed to the old name. Note that a group with no success = 1 row (such as a_id=93, b_id=23 in the example) is dropped entirely rather than reduced to one record.
If I understand your question correctly, this is probably the simplest solution (though I don't know how efficient it is).

This should work:
If procedural programming is available to you, e.g. PL/SQL, it is fairly simple. If, on the other hand, you are looking for a pure SQL solution, it might be possible but not very "nice". Below is an example in PL/SQL:
begin
  for x in ( select a_id, b_id
             from your_table
             group by a_id, b_id
             having count(*) > 1 )
  loop
    for y in ( select *
               from your_table
               where a_id = x.a_id
               and b_id = x.b_id
               order by success desc )
    loop
      -- y is the row to keep; delete every other row of this combination
      delete from your_table
      where a_id = y.a_id
      and b_id = y.b_id
      and id != y.id;
      exit; -- Only do the first row
    end loop;
  end loop;
end;
This is the idea: For each duplicated combination of a_id and b_id select all the instances ordered so that any with success=1 is up first. Delete all of that combination except the first - being the successful one if any.
or perhaps:
declare
  l_a_id integer := -1;
  l_b_id integer := -1;
begin
  for x in ( select *
             from your_table
             order by a_id, b_id, success desc )
  loop
    -- a row that repeats the previous (a_id, b_id) is a duplicate
    if x.a_id = l_a_id and x.b_id = l_b_id
    then
      delete from your_table where id = x.id;
    end if;
    l_a_id := x.a_id;
    l_b_id := x.b_id;
  end loop;
end;

In MySQL, if you don't care which record is kept, a single ALTER TABLE will work:
ALTER IGNORE TABLE tbl_name
ADD UNIQUE INDEX(a_id, b_id)
It drops the duplicate records and keeps only the unique ones. Note that the IGNORE clause was removed in MySQL 5.7.4, so this only applies to older versions such as the 5.1 used in the question.
A useful link:
MySQL: ALTER IGNORE TABLE ADD UNIQUE, what will be truncated?

Related

How to keep from inserting duplicated rows into a table?

I have a table MY_TABLE that contains records as follows:
NAME    | ID1 | ID2 | FLAG
--------------------------
JESSICA | 12  | 34  | TRUE
NULL    | 12  | 34  | TRUE
I want to insert into another table TEST the values from the last 3 columns from MY_TABLE but I don't want to duplicate the rows
I am trying to do:
INSERT ALL
WHEN ID1 IS NOT NULL AND FLAG THEN
INTO TEST VALUES (
ID1,
ID2,
FLAG
)
SELECT *
FROM MY_TABLE
LEFT JOIN TEMP ON ID1;
This is resulting in my table looking like:
ID1 | ID2 | FLAG
----------------
12  | 34  | TRUE
12  | 34  | TRUE
instead of:
ID1 | ID2 | FLAG
----------------
12  | 34  | TRUE
The issue I am running into is that these duplicated values for these last 3 columns are resulting from my join and I can't select only the last 3 columns in my query because I need the first column for another table I am inserting into as well (not shown and also is the reason why I need to use INSERT ALL here). Is there a way to solve this duplicate rows issue within the INSERT itself?
You can project a column with a row number for each group of rows and use the row number to decide which row to use in your multi-table insert.
create or replace temp table T1 as
select
COLUMN1::string as "NAME",
COLUMN2::string as "ID1",
COLUMN3::string as "ID2",
COLUMN4::string as "FLAG"
from (values
('JESSICA','12','34','TRUE'),
('NULL','12','34','TRUE')
);
select NAME
,ID1
,ID2
,FLAG
,row_number() over (partition by ID1, ID2, FLAG order by NAME nulls last) ROWNUMBER
from t1
;
That select will produce a result set like this:
NAME    | ID1 | ID2 | FLAG | ROWNUMBER
--------------------------------------
JESSICA | 12  | 34  | TRUE | 1
NULL    | 12  | 34  | TRUE | 2
In your multi-table insert, you can then key off the ROWNUMBER column:
INSERT ALL
WHEN ID1 IS NOT NULL AND FLAG AND ROWNUMBER = 1 THEN
[etc., etc.]

How to assign the previous value in a column for each record

I have a table in which the data looks like this:
Request Id | Field Id | Current Key
-----------------------------------
1213       | 11       | 1001
1213       | 12       | 1002
1213       | 12       | 103
1214       | 13       | 799
1214       | 13       | 899
1214       | 13       | 7
When the loop starts for the first Request Id, it should check all the Field Ids for that particular Request Id. The data should then look like this:
Request Id | Field Id | Previous Key | Current Key
--------------------------------------------------
1213       | 11       | null         | 1001
1213       | 12       | null         | 1002
1213       | 12       | 1002         | 103
1214       | 13       | null         | 799
1214       | 13       | 799          | 899
1214       | 13       | 899          | 7
When the very first record for a Field Id within a particular Request Id arrives, it should take null in the Previous Key column, and the Current Key stays the same.
When the second record arrives for the same Field Id, it should take the Current Key of the first record as its Previous Key; the third record should take the Current Key of the second record, and so on.
When a new Field Id appears, the same process should start again.
Please let me know if you need any more info. Your help is much needed.
You can check this.
Declare @t table (Request_Id int, Field_Id int, Current_Key int)
insert into @t values (1213, 11, 1001),(1213, 12, 1002), (1213, 12, 103) , (1214, 13, 799), (1214, 13, 899), (1214, 13, 7)
;with cte
as (
select 0 rowno,0 Request_Id, 0 Field_Id, 0 Current_Key
union
select ROW_NUMBER() over(order by request_id) rowno, * from @t
)
select
t1.Request_Id , t1.Field_Id ,
case when t1.Request_Id = t2.Request_Id and t1.Field_Id = t2.Field_Id
then t2.Current_Key
else null
end previous_key
, t1.Current_Key
from cte t1, cte t2
where t1.rowno = t2.rowno + 1
Refer link when you want to compare row value
When the second record will come for same field ID...
Tables don't work this way: there is no way to tell that 1213,12,1002 is the "previous" record of 1213,12,103 as you assume in your example.
Do you have any data you can use to sort your records properly? Request id isn't enough because, even if you guarantee that it increments monotonically for each operation, each operation can include multiple values for the same item id which need to be sorted relative to each other.
In SQL Server 2008
You do not have the benefit of the LEAD and LAG functions. Instead, you must use a subquery for the new column. Number the rows with ROW_NUMBER() over a column that defines their order, then, for each row, select the greatest row_num that is smaller than the current row_num and has the same request_id and field_id.
-- sort_column is a placeholder for whatever column defines the row order
with numbered as (
select t.*, row_number() over (order by sort_column) as row_num
from your_table t
)
select a.request_id,
a.field_id,
(select top 1 x.current_key
from numbered x
where x.request_id = a.request_id
and x.field_id = a.field_id
and x.row_num < a.row_num
order by x.row_num desc
) as previous_key,
a.current_key
from numbered a
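The "greatest earlier row in the same group" idea can be tried out without LAG. Here is a minimal sketch using Python's built-in sqlite3 rather than SQL Server. The seq column is an assumption: it is a hypothetical auto-assigned key that records insertion order, standing in for whatever ordering column your real table has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical ordering column (assigned 1, 2, 3, ... on insert).
conn.execute(
    "CREATE TABLE your_table ("
    "seq INTEGER PRIMARY KEY, request_id INT, field_id INT, current_key INT)"
)
conn.executemany(
    "INSERT INTO your_table (request_id, field_id, current_key) VALUES (?, ?, ?)",
    [(1213, 11, 1001), (1213, 12, 1002), (1213, 12, 103),
     (1214, 13, 799), (1214, 13, 899), (1214, 13, 7)],
)

# For each row, look up the closest earlier row with the same
# request_id and field_id; NULL (None) when there is none.
rows = conn.execute("""
    SELECT a.request_id, a.field_id,
           (SELECT x.current_key
            FROM your_table x
            WHERE x.request_id = a.request_id
              AND x.field_id = a.field_id
              AND x.seq < a.seq
            ORDER BY x.seq DESC
            LIMIT 1) AS previous_key,
           a.current_key
    FROM your_table a
    ORDER BY a.seq
""").fetchall()

for row in rows:
    print(row)
```

LIMIT 1 plays the role that TOP 1 would play in T-SQL.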
In SQL Server 2012+
You can use the LAG or LEAD functions with the OVER clause to get the previous or next nth row value. LAG requires an ORDER BY inside OVER, so you need a column that defines the row order, such as a datetime (timestampColumn below is a placeholder):
select
Request_Id,
Field_Id,
lag(Current_Key,1) over (partition by Request_ID, Field_ID order by timestampColumn) as Previous_Key
,Current_Key
from your_table
Without a reliable ordering column, "previous" is not well defined, so choose the ORDER BY carefully.
Try this:
declare @tb table (RequestId int,FieldId int, CurrentKey int)
insert into @tb (RequestId,FieldId,CurrentKey) values
(1213,11,1001),
(1213,12,1002),
(1213,12,103),
(1214,13,799),
(1214,13,899),
(1214,13, 7)
select RequestId,t.FieldId,
case when t.FieldId=t1.FieldId then t1.CurrentKey end as PreviousKey,t.CurrentKey from
(select *, ROW_NUMBER() over (order by RequestId,FieldId) as rno
from @tb) t left join
(select FieldId,CurrentKey,
ROW_NUMBER() over (order by RequestId,FieldId) as rno from @tb) t1 on t.rno=t1.rno+1

Delete duplicates where field1 and field2 are the identical

I have a table like
productId retailerId
1 2
1 2
1 4
1 6
1 8
1 8
2 3
2 6
2 6
Now, I need to remove the duplicates. I've figured out how to remove duplicates when one field is the same. But I need to remove the duplicates such as 1 2, 1 8 and 2 6, where both fields are identical.
Any help would be very gratefully received.
Use MySQL's multiple-table DELETE syntax as follows:
delete mytable
from mytable
join mytable t
on t.productId = mytable.productId
and t.retailerId = mytable.retailerId
and t.id < mytable.id
See this running on SQLFiddle.
Note that I have assumed that you have an id column as well.
Edit:
Since there is no id column, the simplest approach is to copy the desired data to a temporary table, delete all data, then copy it back, as follows:
CREATE TEMPORARY TABLE temptable
SELECT DISTINCT productId, retailerId
FROM mytable;
DELETE FROM mytable;
INSERT INTO mytable
SELECT *
FROM temptable;
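The copy-out, delete, copy-back sequence can be checked end to end on the question's data. This is a minimal sketch with Python's built-in sqlite3 standing in for MySQL. Wrapping the steps in one transaction is my own addition, so that a failure midway would not leave the table empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (productId INT, retailerId INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, 2), (1, 2), (1, 4), (1, 6), (1, 8), (1, 8), (2, 3), (2, 6), (2, 6)],
)

# "with conn" commits on success and rolls back if any step raises.
with conn:
    conn.execute(
        "CREATE TEMPORARY TABLE temptable AS "
        "SELECT DISTINCT productId, retailerId FROM mytable"
    )
    conn.execute("DELETE FROM mytable")
    conn.execute("INSERT INTO mytable SELECT * FROM temptable")
    conn.execute("DROP TABLE temptable")

deduped = conn.execute(
    "SELECT productId, retailerId FROM mytable ORDER BY productId, retailerId"
).fetchall()
print(deduped)
```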

MySQL delete duplicate records but keep latest

I have unique id and email fields. Emails get duplicated. I only want to keep one Email address of all the duplicates but with the latest id (the last inserted record).
How can I achieve this?
Imagine your table test contains the following data:
select id, email
from test;
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee
So, we need to find all repeated emails and delete all of them, but the latest id.
In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.
To accomplish this, first we need to find all the repeated emails:
select email
from test
group by email
having count(*) > 1;
EMAIL
--------------------
aaa
bbb
eee
Then, from this dataset, we need to find the latest id for each one of these repeated emails:
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email;
LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee
Finally we can now delete all of these emails with an Id smaller than LASTID. So the solution is:
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
I don't have MySQL installed on this machine right now, but this should work.
Update
The above delete works, but I found a more optimized version:
delete test
from test
inner join (
select max(id) as lastId, email
from test
group by email
having count(*) > 1) duplic on duplic.email = test.email
where test.id < duplic.lastId;
You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:
select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+
Another version is the delete provided by Rene Limon:
delete from test
where id not in (
select max(id)
from test
group by email)
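To see which rows survive the "keep the highest id per email" rule, here is a minimal sketch on the sample data, using Python's built-in sqlite3 as a stand-in. SQLite accepts the subquery on the target table directly; MySQL generally needs it wrapped in a derived table with an alias, as other answers in this thread do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
     (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")],
)

# Keep only the row with the highest id for each email.
conn.execute(
    "DELETE FROM test WHERE id NOT IN (SELECT MAX(id) FROM test GROUP BY email)"
)

kept = conn.execute("SELECT id, email FROM test ORDER BY id").fetchall()
print(kept)
```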
Try this method
DELETE t1 FROM test t1, test t2
WHERE t1.id > t2.id AND t1.email = t2.email
The correct way is:
DELETE FROM `tablename`
WHERE `id` NOT IN (
SELECT * FROM (
SELECT MAX(`id`) FROM `tablename`
GROUP BY `name`
) AS tmp
)
DELETE
FROM
`tbl_job_title`
WHERE id NOT IN
(SELECT
*
FROM
(SELECT
MAX(id)
FROM
`tbl_job_title`
GROUP BY NAME) tbl)
Revised and working version. Thank you, @Gaurav.
If you want to keep the row with the lowest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.email = n2.email
If you want to keep the row with the highest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.email = n2.email
Or this query might also help (MySQL requires an alias on the derived table):
DELETE FROM `yourTableName`
WHERE id NOT IN (
SELECT * FROM (
SELECT MAX(id) FROM yourTableName
GROUP BY name
) AS tmp
)
I must say that the optimized version is one sweet, elegant piece of code, and it works like a charm even when the comparison is performed on a DATETIME column. This is what I used in my script, where I was searching for the latest contract end date for each EmployeeID:
DELETE CurrentContractData
FROM CurrentContractData
INNER JOIN (
SELECT
EmployeeID,
PeriodofPerformanceStartDate,
max(PeriodofPerformanceEndDate) as lastDate,
ContractID
FROM CurrentContractData
GROUP BY EmployeeID
HAVING COUNT(*) > 1) Duplicate on Duplicate.EmployeeID = CurrentContractData.EmployeeID
WHERE CurrentContractData.PeriodofPerformanceEndDate < Duplicate.lastDate;
Many thanks!
I personally had trouble with the top two voted answers. It's not the cleanest solution, but you can use a temporary table to avoid the issues MySQL has with deleting via a join on the same table.
-- deleteRows holds the ids to keep: the lowest id for each email
CREATE TEMPORARY TABLE deleteRows
SELECT MIN(id) as id FROM myTable GROUP BY myTable.email;
DELETE FROM myTable
WHERE id NOT IN (SELECT id FROM deleteRows);
DELIMITER //
CREATE FUNCTION findColumnNames(tableName VARCHAR(255))
RETURNS TEXT
BEGIN
SET @colNames = "";
SELECT GROUP_CONCAT(COLUMN_NAME) FROM INFORMATION_SCHEMA.columns
WHERE TABLE_NAME = tableName
GROUP BY TABLE_NAME INTO @colNames;
RETURN @colNames;
END //
DELIMITER ;
DELIMITER //
CREATE PROCEDURE deleteDuplicateRecords (IN tableName VARCHAR(255))
BEGIN
SET @colNames = findColumnNames(tableName);
SET @addIDStmt = CONCAT("ALTER TABLE ",tableName," ADD COLUMN id INT AUTO_INCREMENT KEY;");
SET @deleteDupsStmt = CONCAT("DELETE FROM ",tableName," WHERE id NOT IN
( SELECT * FROM ",
" (SELECT min(id) FROM ",tableName," group by ",findColumnNames(tableName),") AS tmpTable);");
set @dropIDStmt = CONCAT("ALTER TABLE ",tableName," DROP COLUMN id");
PREPARE addIDStmt FROM @addIDStmt;
EXECUTE addIDStmt;
PREPARE deleteDupsStmt FROM @deleteDupsStmt;
EXECUTE deleteDupsStmt;
PREPARE dropIDStmt FROM @dropIDStmt;
EXECUTE dropIDStmt;
END //
DELIMITER ;
Nice stored procedure I created for deleting all duplicate records of a table without needing an existing unique id on that table.
CALL deleteDuplicateRecords("yourTableName");
I wanted to remove duplicate records based on multiple columns in a table, and this approach worked for me.
Step 1 - Get the max id from each group of duplicate records
select * FROM ( SELECT MAX(id) FROM table_name
group by travel_intimation_id,approved_by,approval_type,approval_status having
count(*) > 1) a
Step 2 - Get the ids of the records that have no duplicates
select * FROM ( SELECT id FROM table_name
group by travel_intimation_id,approved_by,approval_type,approval_status having
count(*) = 1) b
Step 3 - Exclude the results of both queries from the delete
DELETE FROM `table_name`
WHERE
id NOT IN (paste step 1 query) -- keep the newest row of each duplicate group
and
id NOT IN (paste step 2 query) -- keep the rows that have no duplicates
Final query:
DELETE FROM `table_name`
WHERE id NOT IN (
select * FROM ( SELECT MAX(id) FROM table_name
group by travel_intimation_id,approved_by,approval_type,approval_status having
count(*) > 1) a
)
and id not in (
select * FROM ( SELECT id FROM table_name
group by travel_intimation_id,approved_by,approval_type,approval_status having
count(*) = 1) b
);
With this query, only the duplicate records are deleted.
Please try the following solution (based on the comments on @Jose Rui Santos' answer):
-- Set safe mode to false since;
-- You are using safe update mode and tried to update a table without a WHERE that uses a KEY column
SET SQL_SAFE_UPDATES = 0;
-- Delete the duplicate rows based on the field_with_duplicate_values
-- Keep the unique rows with the highest id
DELETE FROM table_to_deduplicate
WHERE id NOT IN (
SELECT * FROM (
-- Select the highest id grouped by the field_with_duplicate_values
SELECT MAX(id)
FROM table_to_deduplicate
GROUP BY field_with_duplicate_values
)
-- Subquery and alias needed since;
-- You can't specify target table 'table_to_deduplicate' for update in FROM clause
AS table_sub
);
-- Set safe mode to true
SET SQL_SAFE_UPDATES = 1;

How can we find gaps in sequential numbering in MySQL?

We have a database with a table whose values were imported from another system. There is an auto-increment column, and there aren’t any duplicate values, but there are missing values. For example, running this query:
select count(id) from arrc_vouchers where id between 1 and 100
should return 100, but it returns 87 instead. Is there a query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there aren’t any records with id's of 71-82. I want to return 71, 72, 73, etc.
Is this possible?
A better answer
JustPlainMJS provided a much better answer in terms of performance.
The (not as fast as possible) answer
Here's a version that works on a table of any size (not just on 100 rows):
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
gap_starts_at - first id in current gap
gap_ends_at - last id in current gap
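To make the two output columns concrete, here is a minimal sketch that runs the same idea on a small id set with Python's built-in sqlite3 as a stand-in for MySQL. The sample ids are illustrative. One deliberate change: the HAVING filter on the alias is replaced by an explicit "below the maximum id" condition, since HAVING without GROUP BY is not portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO arrc_vouchers VALUES (?)",
    [(i,) for i in (1, 4, 5, 7, 8, 9, 10, 11, 15, 16)],
)

# A gap starts right after any id whose successor is missing,
# and ends just before the next id that does exist.
gaps = conn.execute("""
    SELECT t1.id + 1 AS gap_starts_at,
           (SELECT MIN(t3.id) - 1
            FROM arrc_vouchers t3
            WHERE t3.id > t1.id) AS gap_ends_at
    FROM arrc_vouchers t1
    WHERE NOT EXISTS (SELECT 1 FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
      AND t1.id < (SELECT MAX(id) FROM arrc_vouchers)
    ORDER BY gap_starts_at
""").fetchall()
print(gaps)
```

Each tuple is an inclusive range, so (6, 6) means only id 6 is missing.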
This just worked for me to find the gaps in a table with more than 80k rows:
SELECT
CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
SELECT
@rownum:=@rownum+1 AS expected,
IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
FROM
(SELECT @rownum:=0) AS a
JOIN YourTable
ORDER BY YourCol
) AS z
WHERE z.got!=0;
Result:
+------------------+
| missing |
+------------------+
| 1 thru 99 |
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)
Note that the order of columns expected and got is critical.
If you know that YourCol doesn't start at 1 and that doesn't matter, you can replace
(SELECT @rownum:=0) AS a
with
(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a
New result:
+------------------+
| missing |
+------------------+
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)
If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in Bash.
SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(seq ',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM ( SELECT @rownum:=@rownum+1 AS expected, IF(@rownum=height, 0, @rownum:=height) AS got FROM (SELECT @rownum:=0) AS a JOIN block ORDER BY height ) AS z WHERE z.got!=0;
This produces an output like so
$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)
You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID
for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
echo $ID
# Fill the gaps
done
It's the same thing as above, only that it's both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.
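If the ids are already available in application code, the same "expected versus got" walk can be done outside SQL. Below is a minimal Python sketch; missing_ranges is a hypothetical helper name, and it assumes the input list is sorted ascending with no duplicates.

```python
def missing_ranges(sorted_ids, start=1):
    """Return inclusive (first, last) tuples for each run of ids
    that is absent from sorted_ids, counting from start."""
    ranges = []
    expected = start
    for current in sorted_ids:
        if current > expected:
            # Everything from expected up to current - 1 is missing.
            ranges.append((expected, current - 1))
        expected = current + 1
    return ranges


example = missing_ranges([100, 101, 665, 668])
print(example)
```

Passing start as the lowest existing id mirrors the variant of the query that ignores the leading gap.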
A quick-and-dirty query that should do the trick:
SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM
(
SELECT a1.id AS a , MIN(a2.id) AS b
FROM arrc_vouchers AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab
WHERE
b > a + 1
This will give you a table showing the id that has ids missing above it, and next_id that exists, and how many are missing between... E.g.,
id next_id missing_inbetween
1 4 2
68 70 1
75 87 11
If you are using a MariaDB database, you have a faster (800%) option using the sequence storage engine:
SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);
If there is a sequence having gap of maximum one between two numbers (like
1,3,5,6) then the query that can be used is:
select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
table_name - source1
column_name - id
An alternative solution that requires a query + some code doing some processing would be:
select l.id lValue, c.id cValue, r.id rValue
from
arrc_vouchers l
right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
left join arrc_vouchers r on r.id=c.id+1
where 1=1
and c.id > 0
and (l.id is null or r.id is null)
order by c.id asc;
Note that the query does not contain any subselect, which MySQL's planner is known to handle poorly.
That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), i.e.:
lValue |cValue|rValue
-------+------+-------
{null} | 2 | 3
8 | 9 | {null}
{null} | 22 | 23
23 | 24 | {null}
{null} | 29 | {null}
{null} | 33 | {null}
Without going into further details (we'll see them in next paragraphs) this output means that:
No values between 0 and 2
No values between 9 and 22
No values between 24 and 29
No values between 29 and 33
No values between 33 and MAX VALUE
So the basic idea is to do a RIGHT and LEFT joins with the same table seeing if we have adjacents values per value (i.e., if central value is '3' then we check for 3-1=2 at left and 3+1 at right), and when a ROW has a NULL value at RIGHT or LEFT then we know there is no adjacent value.
The complete raw output of my table is:
select * from arrc_vouchers order by id asc;
0
2
3
4
5
6
7
8
9
22
23
24
29
33
Some notes:
The SQL IF statement in the join condition is needed if you define the 'id' field as UNSIGNED, since an unsigned value cannot be decremented below zero. This is not strictly necessary if you keep the c.id > 0 condition described in the next note, but I'm including it for documentation.
I'm filtering the zero central value as we are not interested in any previous value and we can derive the post value from the next row.
I tried it in a different manner, and the best performance that I found was this simple query:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
... one left join checks whether the next id exists; only if the next id is not found does the subquery look up the next existing id to find the end of the gap. I did it this way because a join on equality (=) performs better than one using the greater-than (>) operator.
On SQL Fiddle it does not show much of a performance difference compared to the other queries, but on a real database the query above ran 3 times faster than the others.
The schema:
CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;
Below are all the queries that I wrote to compare performance:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
select *, (gapEnd-gapIni) qt
from (
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
from arrc_vouchers a
order by id
) a where gapEnd <> gapIni
;
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
#,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
from arrc_vouchers a
order by id
;
You can see and test my query using this SQL Fiddle:
http://sqlfiddle.com/#!9/6bdca7/1
It is probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e., next available number), and this seems to work fine.
SELECT MIN(l.number_sequence + 1) as nextavabile
from patients as l
LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence
WHERE r.number_sequence is NULL
Several other scenarios and solutions discussed there, from 2005!
How to Find Missing Values in a Sequence With SQL
Create a temporary table with 100 rows and a single column containing the values 1-100.
Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.
This should work:
select tempid from temptable
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id
where arrc_vouchers.id is null
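The numbers-table approach is easy to verify on a small range. Here is a minimal sketch with Python's built-in sqlite3 as a stand-in for MySQL; the sample ids and the 1 to 16 range are illustrative rather than the 1 to 100 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO arrc_vouchers VALUES (?)",
    [(i,) for i in (1, 4, 5, 7, 8, 9, 10, 11, 15, 16)],
)

# Helper table holding every id that is expected to exist.
conn.execute("CREATE TEMPORARY TABLE temptable (tempid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO temptable VALUES (?)", [(i,) for i in range(1, 17)])

# Expected ids with no matching voucher row are the missing ones.
missing = [row[0] for row in conn.execute("""
    SELECT tempid
    FROM temptable
    LEFT JOIN arrc_vouchers ON temptable.tempid = arrc_vouchers.id
    WHERE arrc_vouchers.id IS NULL
    ORDER BY tempid
""")]
print(missing)
```

Unlike the range-based queries, this returns every missing id individually, which is what the question asked for.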
Although these all seem to work, the result set takes a very long time to return when there are 50,000 records.
I used this instead; it finds the first gap, or the next available id (last used + 1), and returns much faster.
SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;
Based on the answer given by matt, this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use @var to represent tables &/or columns in a stored procedure.
create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);
set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);
set @strsql=concat("select
( t1.",@col," + 1 ) as starts_at,
( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
from ",@tbl," t1
where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
having ends_at is not null");
prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end
A simple, yet effective, solution to find the missing auto-increment values:
SELECT `id`+1
FROM `table_name`
WHERE `id`+1 NOT IN (SELECT id FROM table_name)
Another simple answer that identifies the gaps. We do a query selecting just the odd numbers and we right join it to a query with all the even numbers. As long as you're not missing id 1; this should give you a comprehensive list of where the gaps start.
You'll still have to take a look at that place in the database to figure out how many numbers the gap is. I found this way easier than the solution proposed and much easier to customize to unique situations.
SELECT *
FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 > 0) AS A
RIGHT JOIN (SELECT * FROM MyTABLE WHERE MYFIELD % 2 = 0) AS B
ON A.MYFIELD=(B.MYFIELD+1)
WHERE A.MYFIELD IS NULL;
This works for me:
SELECT distinct(l.membership_no + 1) as nextavabile
from Tablename as l
LEFT OUTER JOIN Tablename as r on l.membership_no + 1 = r.membership_no
WHERE r.membership_no is NULL and l.membership_no is not null order by nextavabile asc;
Starting from the comment posted by user933161,
select l.id + 1 as start from sequence as l left outer join sequence as r on l.id + 1 = r.id where r.id is null and l.id < (select max(id) from sequence);
is better in that it will not produce a false positive for the end of the list of records. (The outer join is required: with an inner join, r.id is never null, so the query would return no rows.)
Also,
insert into sequence (id) values (#);
where # is the start value for a gap will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.)
You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.
Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.
CREATE TABLE arrc_vouchers (id int primary key);
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16);
WITH RECURSIVE odd_num_cte (id) AS
(
SELECT (select min(id) from arrc_vouchers)
union all
SELECT id+1 from odd_num_cte where id <(SELECT max(id) from arrc_vouchers)
)
SELECT cte.id
from arrc_vouchers ar right outer join odd_num_cte cte on ar.id=cte.id
where ar.id is null;