BigQuery and NOT IN - mysql

Sorry for my bad English; hopefully you understand what I want.
I want something like a pivot table (hopefully that's the right word).
For example I have a table with two columns: userid and domain
UserID Domain
1 | A
1 | B
1 | C
2 | A
2 | B
3 | A
2 | C
What I want is a table like the following, which shows the differences row-wise:
  | A | B | C
A | 0 | 1 | 1
B | 0 | 0 | 0
C | 0 | 0 | 0
How to read the output?
For example the first row (0,1,1)
Imagine all users who visited domain A (in our case user 1, user 2 and user 3). All domain A visitors were on domain A (I guess that's clear). Did they also all visit domain B? No, one user (in our case user 3) was not on domain B, so we have a 1. Now we check whether all domain A visitors were on domain C. Here we also have one user who was not on domain C: users 1 and 2 were on domain C, but user 3 was only on domain A. So we write a 1 again.
Second row: check which users were on domain B.
Users 1 and 2 were on domain B. Were they also on domain A? Yes, both, so we write a 0. Were users 1 and 2 on domain B? Yes, so 0. And on domain C? Yes, both, so we write a zero again.
Third row: check domain C.
On domain C we have visitors 1 and 2. Both also visited domain A, so we have a zero. Did both visit domain B? Yes, so also zero, and the last entry is clear since they came from domain C.
To keep a long story short: I want to count the exclusive visitors of each domain compared to the other domains.
I have been struggling for 2 days with left joins, CASE WHEN and so on. Nothing works.
Is there anybody out there with suggestions? It would be really helpful. And yes, I have more than 3 domains. I have around 200!

very very big query :) , but it's working
DROP PROCEDURE IF EXISTS dowhile;
CREATE PROCEDURE dowhile()
BEGIN
SELECT @domain_arr := CONCAT(GROUP_CONCAT(domain SEPARATOR ','),',') AS domain_arr FROM ( SELECT t1.domain FROM user_domain t1 WHERE 1 GROUP BY t1.domain ) AS tt;
DROP table IF EXISTS temp_table;
create temporary table temp_table (
domain VARCHAR(100) not NULL
);
SET @domain_arr_table= @domain_arr;
WHILE LOCATE(',', @domain_arr_table) > 0 DO
SET @domain = SUBSTRING(@domain_arr_table,1,LOCATE(',',@domain_arr_table) - 1);
SET @domain_arr_table= SUBSTRING(@domain_arr_table, LOCATE(',',@domain_arr_table) + 1);
SET @s= CONCAT('ALTER TABLE temp_table ADD COLUMN ',@domain,' TINYINT DEFAULT 0');
PREPARE stmt3 FROM @s;
EXECUTE stmt3;
END WHILE;
WHILE LOCATE(',', @domain_arr) > 0 DO
SET @domain = SUBSTRING(@domain_arr,1,LOCATE(',',@domain_arr) - 1);
SET @domain_arr= SUBSTRING(@domain_arr, LOCATE(',',@domain_arr) + 1);
SELECT @user_count := COUNT(*) FROM user_domain WHERE domain=@domain;
INSERT INTO temp_table (domain) VALUES (@domain);
SELECT @domains_should_be_1 := CONCAT(GROUP_CONCAT(domain SEPARATOR ','),',') FROM (SELECT domain FROM user_domain WHERE user_id IN (SELECT user_id FROM user_domain WHERE domain=@domain) GROUP BY domain HAVING COUNT(*) < @user_count) AS tt2;
WHILE LOCATE(',', @domains_should_be_1) > 0 DO
SET @domain_sb_1 = SUBSTRING(@domains_should_be_1,1,LOCATE(',',@domains_should_be_1) - 1);
SET @domains_should_be_1= SUBSTRING(@domains_should_be_1, LOCATE(',',@domains_should_be_1) + 1);
SET @s= CONCAT("UPDATE temp_table SET ",@domain_sb_1,"='1' WHERE domain='",@domain,"'");
SELECT @s;
PREPARE stmt3 FROM @s;
EXECUTE stmt3;
END WHILE;
END WHILE;
END;
call dowhile();
SELECT * FROM temp_table;

There are really two questions here
I want to extract all exclusive visitors of each domain compared to the other domains...
I want something like a Pivot table
Let me answer your questions one by one
So,
How extract all exclusive visitors of each domain compared to the other domains...
Below is for BigQuery Standard SQL and produces flattened version of your matrix
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 1 userid, 'A' domain UNION ALL
SELECT 1, 'B' UNION ALL
SELECT 1, 'C' UNION ALL
SELECT 2, 'A' UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 3, 'A' UNION ALL
SELECT 2, 'C'
), temp AS (
SELECT DISTINCT userid, domain
FROM `project.dataset.your_table`
)
SELECT
a.domain domain_a,
b.domain domain_b,
COUNT(DISTINCT a.userid) - COUNTIF(a.userid = b.userid) count_of_not_in
FROM temp a
CROSS JOIN temp b
GROUP BY a.domain, b.domain
-- HAVING count_of_not_in > 0
This results in:
Row domain_a domain_b count_of_not_in
1 A A 0
2 A B 1
3 A C 1
4 B A 0
5 B B 0
6 B C 0
7 C A 0
8 C B 0
9 C C 0
I think in real life you will not have many zeroes in this data, so if you want to compress the flattened version, just uncomment the line with HAVING ... to get the "compact" version:
Row domain_a domain_b count_of_not_in
1 A B 1
2 A C 1
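If you want to check the counting logic locally before running it on BigQuery, here is a minimal sketch that assumes SQLite (via Python's sqlite3) as a stand-in. The table name `visits` is hypothetical, and because SQLite has no COUNTIF, the sketch uses SUM over the boolean comparison instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; same sample rows as in the question.
conn.execute("CREATE TABLE visits (userid INTEGER, domain TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, "A"), (1, "B"), (1, "C"), (2, "A"), (2, "B"), (3, "A"), (2, "C")],
)

query = """
WITH temp AS (SELECT DISTINCT userid, domain FROM visits)
SELECT a.domain AS domain_a,
       b.domain AS domain_b,
       -- SUM(a.userid = b.userid) plays the role of COUNTIF in SQLite
       COUNT(DISTINCT a.userid) - SUM(a.userid = b.userid) AS count_of_not_in
FROM temp a
CROSS JOIN temp b
GROUP BY a.domain, b.domain
ORDER BY a.domain, b.domain
"""

# Map (domain_a, domain_b) -> number of domain_a visitors not seen on domain_b
not_in = {(da, db): n for da, db, n in conn.execute(query)}

for (da, db), n in not_in.items():
    print(da, db, n)
```

Only the pairs (A, B) and (A, C) come out non-zero for the sample data, which matches the compact result above.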
For the sake of exercise and fun, here is another approach that produces exactly the same result in a totally different way:
#standardSQL
WITH `project.dataset.your_table` AS (
SELECT 1 userid, 'A' domain UNION ALL
SELECT 1, 'B' UNION ALL
SELECT 1, 'C' UNION ALL
SELECT 2, 'A' UNION ALL
SELECT 2, 'B' UNION ALL
SELECT 3, 'A' UNION ALL
SELECT 2, 'C'
), domains AS (
SELECT domain, ARRAY_AGG(DISTINCT userid) users
FROM `project.dataset.your_table`
GROUP BY domain
)
SELECT
a.domain domain_a, b.domain domain_b,
ARRAY_LENGTH(a.users) -
(SELECT COUNT(1)
FROM UNNEST(a.users) user_a
JOIN UNNEST(b.users) user_b
ON user_a = user_b
) count_of_not_in
FROM domains a
CROSS JOIN domains b
-- ORDER BY a.domain, b.domain
Now,
How to pivot above result, to produce actual matrix?
Ideally, pivoting should be done outside of BigQuery, in whatever visualization tool you usually use. But if for whatever reason you want it done within BigQuery, it is doable, and there is an enormous number of questions here on SO related to this. One of the most recent that I have answered is https://stackoverflow.com/a/50300387/5221944 .
It shows how to generate a pivot query that produces the desired matrix.
It is relatively easy and can be done either manually as a two-step process (step 1: generate the pivot query; step 2: run the generated query) or implemented using any client of your choice.
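To make the two-step idea concrete, here is a rough client-side sketch. It is not the query from the linked answer: it assumes SQLite via Python's sqlite3, a hypothetical table `flat` holding the flattened result, and hypothetical helper names. Step 1 builds one conditional-aggregation column per domain, step 2 runs the generated text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the flattened (domain_a, domain_b, count) result.
conn.execute(
    "CREATE TABLE flat (domain_a TEXT, domain_b TEXT, count_of_not_in INTEGER)"
)
conn.executemany("INSERT INTO flat VALUES (?, ?, ?)", [
    ("A", "A", 0), ("A", "B", 1), ("A", "C", 1),
    ("B", "A", 0), ("B", "B", 0), ("B", "C", 0),
    ("C", "A", 0), ("C", "B", 0), ("C", "C", 0),
])


def quote_literal(value):
    """Quote a string literal for the generated SQL."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name):
    """Quote a column alias (SQLite style; BigQuery would use backticks)."""
    return '"' + name.replace('"', '""') + '"'


def build_pivot_sql(domains):
    """Step 1: generate one conditional-aggregation column per domain."""
    columns = ",\n  ".join(
        f"MAX(CASE WHEN domain_b = {quote_literal(d)} "
        f"THEN count_of_not_in END) AS {quote_identifier(d)}"
        for d in domains
    )
    return (
        f"SELECT domain_a,\n  {columns}\n"
        "FROM flat\nGROUP BY domain_a\nORDER BY domain_a"
    )


domains = [d for (d,) in conn.execute(
    "SELECT DISTINCT domain_b FROM flat ORDER BY domain_b")]
pivot_sql = build_pivot_sql(domains)

# Step 2: run the generated query to get the matrix, one row per domain_a.
matrix = conn.execute(pivot_sql).fetchall()
print(pivot_sql)
print(matrix)
```

With around 200 domains the generated statement simply gets 200 columns, which is why generating it beats typing it by hand.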

You cannot (easily) express this as a matrix. But you can express this as a table with three columns: domain1, domain2, count.
with t as ( -- may not be necessary if the rows are already unique
select distinct userid, domain
from tab
)
select t1.domain as domain1, t2.domain as domain2, count(*)
from t t1 join
t t2
on t1.userid = t2.userid
group by t1.domain, t2.domain;
You cannot easily pivot the results in BigQuery into columns, unless you explicitly know the domains that you care about. You can aggregate them into columns, if you like.
For a given set of domains as columns, you can use conditional aggregation:
with t as ( -- may not be necessary if the rows are already unique
select distinct userid, domain
from tab
)
select t1.domain as domain1,
sum(case when t2.domain = 'amazon.com' then 1 else 0 end) as amazon,
sum(case when t2.domain = 'ebay.com' then 1 else 0 end) as ebay,
sum(case when t2.domain = 'yahoo.com' then 1 else 0 end) as yahoo
from t t1 join
t t2
on t1.userid = t2.userid
group by t1.domain;

Related

Referencing the outer query from the case of a subquery in SQL server 2014

I'm trying to build a stored procedure that transforms a lot of data from a very condensed table, with a key to find any given piece of information, into a wide table with all of the information in columns per group. This is to preprocess the information and reduce the work at the point of use. I've structured my query like below.
Select distinct a.group_
, (select value from mytable b where a.group_ = b.group_) as Information1
from mytable a
However, when I want to use a case expression, it breaks the reference to the outer query:
Select distinct a.group_
, (case
when (select value from mytable b where a.group_ = b.group_) is not null -- this breaks
then (select value from mytable b where a.group_ = b.group_)
else (select anothervalue from mytable b where a.group_ = b.group_)
end ) as Information1
from mytable a
I thought about a workaround with a simple case to find whether the value is null and fall through to an else branch, but I found that my 'is not null' didn't work in the when clause, and I needed to reference the outer query anyway for the else. So ultimately, I need some method to conditionally select values for one column and have it tied to the group that I'm trying to transform. Any help would be appreciated, thanks.
Edit:
Below is an example. To clarify, I need to be able to conditionally select information from potentially multiple sources together into the same column for any given group. It's also going to be working on a large amount of data so I need it to be as minimally computationally intensive as possible. From this table, I will combine all of these groups in different combinations to have one line per collection of groups, with each group at least one column in the final table. It's a little more complicated than that, but that's the general idea.
if OBJECT_ID(N'tempdb..#tt1') is not null
drop table #tt1
declare @counter INT
set @counter = 1
create table #tt1 (group_ int, id1 int, id2 int, info varchar(10))
while (@counter <= 5)
begin
insert into #tt1 (group_, id1, id2, info)
select @counter, @counter, @counter, CONCAT('info ', cast(@counter as varchar(10)))
set @counter = @counter +1
end
set @counter = 1
while (@counter <= 5)
begin
insert into #tt1 (group_, id1, id2, info)
select @counter, @counter, @counter+1, CONCAT('info ', cast(@counter+5 as varchar(10)))
set @counter = @counter +1
end
select distinct a.group_,
(select info from #tt1 b where a.group_ = b.group_ and id1 = 1 and id2=2) as Group1Info6
from #tt1 a
--The above works fine
select distinct a.group_,
(case
when (select info from #tt1 b where A.group_ = b.group_ and id1=1 and id2 =2) is < 6
then (select info from #tt1 b where A.group_ = b.group_ and id1=1 and id2 =2)
else (select info from #tt1 b where A.group_ = b.group_ and id1=1 and id2=1)
end) as Group1info
from #tt1 a
--The above does not.
Edit2:
My desired results would look something like this. In my actual data there are many group 1's, with many group 1 info columns.
group_ | Group1Info | Group2Info | Group3Info | Group4Info | Group5Info
1 | info | | | |
2 | | info | | |
3 | | | info | |
4 | | | | info |
5 | | | | | info
Then I'll present the information more cleanly to the end user like this.
group_ | Group1Info | Group2Info | Group3Info | Group4Info | Group5Info
1-5 | info | info | info | info | info

MySQL - Finding how much duplicates are inside the same table given

Consider the following two sets of rows (same type) in a WHERE clause:
A | B
1 | 1
2 | 2
3 | 4
I need to find how many of the values in A are also in B.
For example, for the table above it would be 66%, since 2 of the 3 numbers in A are in B.
Another example:
A | B
1 | 1
2 | 2
3 | 4
  | 5
  | 3
This would give 100%, since all of the numbers in A are in B.
Here is what I tried myself: (Doesn't work on all test cases..)
DROP PROCEDURE IF EXISTS getProductsByDate;
DELIMITER //
CREATE PROCEDURE getProductsByDate (IN d_given date)
BEGIN
SELECT
Product,
COUNT(*) AS 'total Number',
(SELECT
(SELECT COUNT(DISTINCT Part) FROM products WHERE Product=B.Product) - COUNT(*)
FROM
products AS b2
WHERE
b2.SOP < B.SOP AND b2.Part != B.Part) AS 'New Parts',
CONCAT(round((SELECT
(SELECT COUNT(DISTINCT Part) FROM products WHERE Product=B.Product) - COUNT(*)
FROM
products AS b2
WHERE
b2.SOP < B.SOP AND b2.Part != B.Part)/count(DISTINCT part)*100, 0), '%') as 'Share New'
FROM
products AS B
WHERE
b.SOP < d_given
GROUP BY Product;
END//
DELIMITER ;
CALL getProductsByDate (date("2018-01-01"));
Thanks.
If you name your tables TA and TB respectively, you could try something like this (tested on MSSQL and MySQL so far):
SELECT ROUND(SUM(PERC) ,4)AS PERC_TOT
FROM (
SELECT DISTINCT TA.ID , 1.00/ (SELECT COUNT(DISTINCT ID) FROM TA) AS PERC
FROM TA
WHERE EXISTS ( SELECT DISTINCT ID FROM TB WHERE TB.ID=TA.ID)
) C;
Output with your first sample data set:
PERC_TOT
0,6667
Output with your second sample data set:
PERC_TOT
1,0000
Update: I wrote the original for two tables while working out the solution. The version below is for a single table and is almost the same as the former query (I used ID1 for column A and ID2 for column B):
SELECT ROUND(SUM(PERC) ,4)AS PERC_TOT
FROM (
SELECT DISTINCT TA.ID1 , 1.00/ (SELECT COUNT(DISTINCT ID1) FROM TA) AS PERC
FROM TA
WHERE EXISTS ( SELECT DISTINCT ID2 FROM TA AS TB WHERE TB.ID2=TA.ID1)
) C;
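As a quick sanity check of the single-table idea, here is a small sketch that assumes SQLite via Python's sqlite3 and a hypothetical helper `share_of_a_in_b`. It counts the distinct ID1 values that appear somewhere in ID2 and divides by the number of distinct ID1 values, which gives the same share as summing 1/N per matching row.

```python
import sqlite3


def share_of_a_in_b(rows):
    """Return the percentage of distinct id1 values that also occur in id2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ta (id1 INTEGER, id2 INTEGER)")
    conn.executemany("INSERT INTO ta VALUES (?, ?)", rows)
    (share,) = conn.execute("""
        SELECT 100.0 * COUNT(DISTINCT a.id1)
               / (SELECT COUNT(DISTINCT id1) FROM ta)
        FROM ta a
        WHERE EXISTS (SELECT 1 FROM ta b WHERE b.id2 = a.id1)
    """).fetchone()
    conn.close()
    return share


# First sample: 1 and 2 are in B, 3 is not.
first = share_of_a_in_b([(1, 1), (2, 2), (3, 4)])
# Second sample: B has two extra rows (A is NULL there), so 3 is now found.
second = share_of_a_in_b([(1, 1), (2, 2), (3, 4), (None, 5), (None, 3)])
print(round(first, 2), round(second, 2))
```

The NULLs in the second sample are an assumption about how the shorter column A is stored; COUNT(DISTINCT ...) ignores them.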

SQL group by where values are same within group

So, I have a table in which individuals (person_id) have multiple rows (up to 4), and the value of a column (value_column) can be either 0 or 1.
I'd like to write a query that returns a row for each person_id whose value_column is only 0 or only 1 across all of their rows (even though they may have 3 or 4 rows each).
It's probably an easy line of code, but for someone with less SQL experience, it seems nearly impossible!
EDIT: here is a quick sample of lines:
person_id value_column
A 0
A 1
A 0
B 0
B 0
B 0
B 0
C 1
C 1
C 1
And I would expect the query to return the following:
person_id value_column
B 0
C 1
You can try something like this probably
select distinct * from table1
where person_id in
( select person_id
from table1
group by person_id
having count(distinct value_column) <= 1
)
The inner query returns only those person_id values that have a single distinct value_column (that is what count(distinct value_column) <= 1 checks), and the outer query then selects the distinct rows for those person_id values.
select * from myTable where person_id not in
(select a.person_id from myTable a, myTable b
where a.person_id = b.person_id
and a.value_column <> b.value_column)
Get the persons with differing values, then select those who are not in that first result.
Or, quicker and nicer:
select person_id, min(value_column)
from myTable
group by person_id
having min(value_column)=max(value_column)
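The min/max version is easy to try out. Here is a small sketch against the sample rows from the question, assuming SQLite via Python's sqlite3 (the table name `myTable` is taken from the answer above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (person_id TEXT, value_column INTEGER)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [
    ("A", 0), ("A", 1), ("A", 0),
    ("B", 0), ("B", 0), ("B", 0), ("B", 0),
    ("C", 1), ("C", 1), ("C", 1),
])

# A group is uniform exactly when its smallest and largest values coincide.
uniform = conn.execute("""
    SELECT person_id, MIN(value_column) AS value_column
    FROM myTable
    GROUP BY person_id
    HAVING MIN(value_column) = MAX(value_column)
    ORDER BY person_id
""").fetchall()
print(uniform)
```

Person A is dropped because their rows contain both 0 and 1.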

What's an easy way to perform this complicated SELECT query?

Given these entries in a table table:
user entry
A 1
A 2
A 5
A 6
B 1
B 2
B 3
B 4
B 5
B 6
C 1
C 4
D 1
D 2
D 5
D 6
D 7
D 9
And we have a subset entries_A to work with, which is the array [1,2,5,6].
Problems:
Find all users that have the same entries [1,2,5,6] and more, e.g. [1,2,5,6,7] or [1,2,3,5,6].
Find all users that have a lot of the same entries (and more), e.g. [1,2,5,9] or [2,5,6,3].
The best solution to the first problem I could come up with, is the following select query:
SELECT DISTINCT user AS u FROM table WHERE EXISTS (SELECT * FROM table WHERE entry=1 AND user=u)
AND EXISTS(SELECT * FROM table WHERE entry=2 AND user=u)
AND EXISTS(SELECT * FROM table WHERE entry=5 AND user=u)
AND EXISTS(SELECT * FROM table WHERE entry=6 AND user=u)
On the other hand, I get a feeling there's some algebraic vector-problem lurking below the surface (especially for problem two) but I can't seem to wrap my head around it.
All ideas welcome!
I think the easiest way to perform this type of query is using aggregation and having. Here is an example.
To get users that have exactly those four elements:
select user
from table
group by user
having count(distinct case when entry in (1,2,5,6) then entry end) = 4 and
count(distinct entry) = 4;
To get users that have those four elements and perhaps others:
select user
from table
group by user
having count(distinct case when entry in (1,2,5,6) then entry end) = 4;
To order users by the number of matches they have and the number of other matches:
select count(distinct case when entry in (1, 2, 5, 6) then entry end) as Matches,
count(distinct case when entry not in (1, 2, 5, 6) then entry end) as Others,
user
from table
group by user
order by Matches desc, Others;
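Here is a sketch of the aggregation approach run against the sample data from the question, assuming SQLite via Python's sqlite3 and a hypothetical table name `entries` (since `table` is a reserved word). It uses a conditional COUNT(DISTINCT ...) to test that all four wanted entries are present, then ranks users by matches and other entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; rows are the sample data from the question.
conn.execute("CREATE TABLE entries (user TEXT, entry INTEGER)")
sample = {
    "A": [1, 2, 5, 6],
    "B": [1, 2, 3, 4, 5, 6],
    "C": [1, 4],
    "D": [1, 2, 5, 6, 7, 9],
}
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [(user, entry) for user, values in sample.items() for entry in values],
)

wanted = (1, 2, 5, 6)
in_list = ", ".join(str(int(v)) for v in wanted)  # integers only, safe to inline

# Problem 1: users whose entries include every wanted value (and maybe more).
superset_users = [u for (u,) in conn.execute(f"""
    SELECT user
    FROM entries
    GROUP BY user
    HAVING COUNT(DISTINCT CASE WHEN entry IN ({in_list}) THEN entry END) = {len(wanted)}
    ORDER BY user
""")]

# Problem 2: rank users by how many wanted entries they share.
ranking = conn.execute(f"""
    SELECT user,
           COUNT(DISTINCT CASE WHEN entry IN ({in_list}) THEN entry END) AS matches,
           COUNT(DISTINCT CASE WHEN entry NOT IN ({in_list}) THEN entry END) AS others
    FROM entries
    GROUP BY user
    ORDER BY matches DESC, others, user
""").fetchall()

print(superset_users)
print(ranking)
```

For "a lot of the same entries", one option is to keep only the rows of `ranking` whose `matches` is at or above whatever threshold you decide on.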
For the first problem:
SELECT a.user FROM (
SELECT
DISTINCT user
FROM
table
WHERE entry IN (1,2,5,6)
) a JOIN table b ON a.user = b.user AND b.entry IN (1,2,5,6)
GROUP BY a.user
HAVING COUNT(DISTINCT b.entry) >= 4
For the second problem just decrease the count in the having clause.
This is how I would do your first query (though I think Gordon Linoff's answer is more efficient):
select distinct user from so s1
where not exists (
select * from so s2 where s2.entry in (1,2,5,6)
and not exists (
select * from so s3 where s2.entry = s3.entry and s1.user = s3.user
)
);
For the second problem, you would need to specify what a lot should mean... three, four, ...

How can we find gaps in sequential numbering in MySQL?

We have a database with a table whose values were imported from another system. There is an auto-increment column, and there aren’t any duplicate values, but there are missing values. For example, running this query:
select count(id) from arrc_vouchers where id between 1 and 100
should return 100, but it returns 87 instead. Is there a query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there aren’t any records with id's of 71-82. I want to return 71, 72, 73, etc.
Is this possible?
A better answer
JustPlainMJS provided a much better answer in terms of performance.
The (not as fast as possible) answer
Here's a version that works on a table of any size (not just on 100 rows):
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
gap_starts_at - first id in current gap
gap_ends_at - last id in current gap
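Here is a small sketch of the same NOT EXISTS idea that you can run locally, assuming SQLite via Python's sqlite3. One deliberate difference: instead of the MySQL-style HAVING on the alias, it excludes the last id with a comparison against MAX(id), which has the same effect of dropping the open-ended "gap" after the final row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO arrc_vouchers (id) VALUES (?)",
    [(i,) for i in (1, 4, 5, 7, 8, 9, 10, 11, 15, 16)],
)

gaps = conn.execute("""
    SELECT t1.id + 1 AS gap_starts_at,
           (SELECT MIN(t3.id) - 1
            FROM arrc_vouchers t3
            WHERE t3.id > t1.id) AS gap_ends_at
    FROM arrc_vouchers t1
    -- keep only ids whose immediate successor is missing
    WHERE NOT EXISTS (SELECT 1 FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
      -- the largest id has no gap after it
      AND t1.id < (SELECT MAX(id) FROM arrc_vouchers)
    ORDER BY gap_starts_at
""").fetchall()
print(gaps)
```

Each row is one gap as an inclusive (start, end) pair, so a single missing id shows up with start equal to end.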
This just worked for me to find the gaps in a table with more than 80k rows:
SELECT
CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
SELECT
@rownum:=@rownum+1 AS expected,
IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
FROM
(SELECT @rownum:=0) AS a
JOIN YourTable
ORDER BY YourCol
) AS z
WHERE z.got!=0;
Result:
+------------------+
| missing |
+------------------+
| 1 thru 99 |
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)
Note that the order of columns expected and got is critical.
If you know that YourCol doesn't start at 1 and that doesn't matter, you can replace
(SELECT @rownum:=0) AS a
with
(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a
New result:
+------------------+
| missing |
+------------------+
| 666 thru 667 |
| 50000 |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)
If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in Bash.
SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(seq ',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM ( SELECT @rownum:=@rownum+1 AS expected, IF(@rownum=height, 0, @rownum:=height) AS got FROM (SELECT @rownum:=0) AS a JOIN block ORDER BY height ) AS z WHERE z.got!=0;
This produces an output like so
$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)
You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID
for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
echo $ID
# Fill the gaps
done
It's the same thing as above, only that it's both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.
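If you would rather build that expression on the client side, here is a rough Python sketch of the same idea. The function names are hypothetical and it assumes the ids have already been fetched into a list; it is not a translation of the user-variable query, just the same expected-versus-got walk.

```python
def missing_ranges(ids, start=1):
    """Return inclusive (low, high) ranges of values missing from ids."""
    ranges = []
    expected = start
    for value in sorted(set(ids)):
        if value > expected:
            # everything from expected up to value - 1 is absent
            ranges.append((expected, value - 1))
        expected = max(expected, value + 1)
    return ranges


def to_bash_words(ranges):
    """Render ranges as words usable in a bash for loop."""
    return " ".join(
        str(low) if low == high else f"$(seq {low} {high})"
        for low, high in ranges
    )


ids = [3, 4, 7, 9, 10]
ranges = missing_ranges(ids)
expression = to_bash_words(ranges)
print(expression)
```

Changing `to_bash_words` is the only step needed to emit the ranges in another syntax.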
A quick-and-dirty query that should do the trick:
SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM
(
SELECT a1.id AS a , MIN(a2.id) AS b
FROM arrc_vouchers AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab
WHERE
b > a + 1
This will give you a table showing the id that has ids missing above it, and next_id that exists, and how many are missing between... E.g.,
id next_id missing_inbetween
1 4 2
68 70 1
75 87 11
If you are using a MariaDB database, you have a faster (800%) option using the sequence storage engine:
SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);
If there is a sequence having gap of maximum one between two numbers (like
1,3,5,6) then the query that can be used is:
select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
table_name - source1
column_name - id
An alternative solution that requires a query + some code doing some processing would be:
select l.id lValue, c.id cValue, r.id rValue
from
arrc_vouchers l
right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
left join arrc_vouchers r on r.id=c.id+1
where 1=1
and c.id > 0
and (l.id is null or r.id is null)
order by c.id asc;
Note that the query does not contain any subselect, which MySQL's planner is known to handle poorly.
That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), i.e.:
lValue |cValue|rValue
-------+------+-------
{null} | 2 | 3
8 | 9 | {null}
{null} | 22 | 23
23 | 24 | {null}
{null} | 29 | {null}
{null} | 33 | {null}
Without going into further details (we'll see them in next paragraphs) this output means that:
No values between 0 and 2
No values between 9 and 22
No values between 24 and 29
No values between 29 and 33
No values between 33 and MAX VALUE
So the basic idea is to do RIGHT and LEFT joins against the same table to see whether each value has adjacent values (i.e., if the central value is 3, we check for 3-1=2 on the left and 3+1=4 on the right); when a row has a NULL on the right or left, we know there is no adjacent value on that side.
The complete raw output of my table is:
select * from arrc_vouchers order by id asc;
0
2
3
4
5
6
7
8
9
22
23
24
29
33
Some notes:
The SQL IF function in the join condition is needed if you define the 'id' field as UNSIGNED, since it will then not allow you to decrease it below zero. This is not strictly necessary if you keep the c.id > 0 condition, as stated in the next note, but I'm including it for documentation.
I'm filtering out the zero central value, as we are not interested in any previous value and we can derive the following value from the next row.
I tried it in a different manner, and the best performance that I found was this simple query:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
... one left join checks whether the next id exists; only if the next id is not found does the subquery look up the next existing id to find the end of the gap. I did it this way because a query with an equality (=) comparison performs better than one with the greater-than (>) operator.
In SQL Fiddle it does not show much of a performance difference compared to the other queries, but on a real database the query above ran 3 times faster than the others.
The schema:
CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;
Below are all the queries that I wrote to compare the performance:
select a.id+1 gapIni
,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
from arrc_vouchers a
left join arrc_vouchers b on b.id=a.id+1
where b.id is null
order by 1
;
select *, (gapEnd-gapIni) qt
from (
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
from arrc_vouchers a
order by id
) a where gapEnd <> gapIni
;
select id+1 gapIni
,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
#,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
from arrc_vouchers a
order by id
;
select id+1 gapIni
,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
from arrc_vouchers a
order by id
;
You can see and test my query using this SQL Fiddle:
http://sqlfiddle.com/#!9/6bdca7/1
It is probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e., next available number), and this seems to work fine.
SELECT MIN(l.number_sequence + 1) as nextavabile
from patients as l
LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence
WHERE r.number_sequence is NULL
Several other scenarios and solutions discussed there, from 2005!
How to Find Missing Values in a Sequence With SQL
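A small sketch of that "first available number" query, assuming SQLite via Python's sqlite3 and a made-up set of values for the `patients` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (number_sequence INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO patients (number_sequence) VALUES (?)",
    [(n,) for n in (1, 2, 3, 5, 6, 9)],  # made-up sample values
)

# Self-join each value to its successor; a NULL on the right means
# value + 1 is free, and MIN picks the first such free number.
(next_available,) = conn.execute("""
    SELECT MIN(l.number_sequence + 1) AS nextavabile
    FROM patients AS l
    LEFT OUTER JOIN patients AS r
      ON l.number_sequence + 1 = r.number_sequence
    WHERE r.number_sequence IS NULL
""").fetchone()
print(next_available)
```

Note that this only looks after existing values, so if the sequence starts at 5, the numbers 1 to 4 are never reported as available.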
Create a temporary table with 100 rows and a single column containing the values 1-100.
Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.
This should work:
select tempid from temptable
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id
where arrc_vouchers.id is null
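The two steps can be sketched like this, assuming SQLite via Python's sqlite3 and the example from the question (ids 1-70 and 83-100 present). The temporary table is filled from Python here; in MySQL you would populate it however is convenient.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrc_vouchers (id INTEGER PRIMARY KEY)")
present = list(range(1, 71)) + list(range(83, 101))
conn.executemany("INSERT INTO arrc_vouchers (id) VALUES (?)",
                 [(i,) for i in present])

# Step 1: temporary table with a single column holding the values 1-100.
conn.execute("CREATE TEMP TABLE temptable (tempid INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO temptable (tempid) VALUES (?)",
                 [(i,) for i in range(1, 101)])

# Step 2: outer join and keep the numbers that have no matching voucher.
missing = [tempid for (tempid,) in conn.execute("""
    SELECT tempid
    FROM temptable
    LEFT JOIN arrc_vouchers ON temptable.tempid = arrc_vouchers.id
    WHERE arrc_vouchers.id IS NULL
    ORDER BY tempid
""")]
print(missing)
```

This returns every missing id individually (71, 72, 73, ...), which is what the question asked for, rather than gap boundaries.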
Although these all seem to work, the result set takes a very long time to return when there are 50,000 records.
I used this, and it finds the gap or the next available id (last used + 1) with a much faster return from the query.
SELECT a.id as beforegap, a.id+1 as avail
FROM table_name a
where (select b.id from table_name b where b.id=a.id+1) is null
limit 1;
Based on the answer given by matt, this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use @var to represent tables &/or columns in a stored procedure.
create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);
set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);
set @strsql=concat("select
( t1.",@col," + 1 ) as starts_at,
( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
from ",@tbl," t1
where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
having ends_at is not null");
prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end
A simple, yet effective, solution to find the missing auto-increment values:
SELECT `id`+1
FROM `table_name`
WHERE `id`+1 NOT IN (SELECT id FROM table_name)
Another simple answer that identifies the gaps. We do a query selecting just the odd numbers and we right join it to a query with all the even numbers. As long as you're not missing id 1, this should give you a comprehensive list of where the gaps start.
You'll still have to take a look at that place in the database to figure out how many numbers the gap is. I found this way easier than the solution proposed and much easier to customize to unique situations.
SELECT *
FROM (SELECT * FROM MyTABLE WHERE MYFIELD % 2 > 0) AS A
RIGHT JOIN (SELECT * FROM MyTABLE WHERE MYFIELD % 2 = 0) AS B
ON A.MYFIELD=(B.MYFIELD+1)
WHERE A.MYFIELD IS NULL;
This works for me:
SELECT distinct(l.membership_no + 1) as nextavabile
from Tablename as l
LEFT OUTER JOIN Tablename as r on l.membership_no + 1 = r.membership_no
WHERE r.membership_no is NULL and l.membership_no is not null order by nextavabile asc;
Starting from the comment posted by user933161,
select l.id + 1 as start from sequence as l inner join sequence as r on l.id + 1 = r.id where r.id is null;
is better in that it will not produce a false positive for the end of the list of records. (I'm not sure why so many are using left outer joins.)
Also,
insert into sequence (id) values (#);
where # is the start value for a gap will fill that start value. (If there are fields that cannot be null, you will have to add those with dummy values.)
You could alternate between querying for start values and filling in each start value until the query for start values returns an empty set.
Of course, this approach would only be helpful if you're working with a small enough data set that manually iterating like that is reasonable. I don't know enough about things like phpMyAdmin to come up with ways to automate it for larger sets with more and larger gaps.
CREATE TABLE arrc_vouchers (id int primary key);
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16);
WITH RECURSIVE odd_num_cte (id) AS
(
SELECT (select min(id) from arrc_vouchers)
union all
SELECT id+1 from odd_num_cte where id <(SELECT max(id) from arrc_vouchers)
)
SELECT cte.id
from arrc_vouchers ar right outer join odd_num_cte cte on ar.id=cte.id
where ar.id is null;