I have this table in MySQL, for example:
ID | Name
1 | Bob
4 | Adam
6 | Someguy
Notice that IDs 2, 3 and 5 are missing.
How can I write a query so that MySQL returns only the missing IDs, in this case "2,3,5"?
SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM testtable AS a, testtable AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)
Hope this link also helps
http://www.codediesel.com/mysql/sequence-gaps-in-mysql/
A more efficient query:
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM my_table t3 WHERE t3.id > t1.id) as gap_ends_at
FROM my_table t1
WHERE NOT EXISTS (SELECT t2.id FROM my_table t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
Rather than returning multiple ranges of IDs, if you instead want to retrieve every single missing ID itself, each one on its own row, you could do the following:
SELECT id+1 FROM `table` WHERE id NOT IN (SELECT id-1 FROM `table`) ORDER BY 1
The query is very efficient. However, it also includes one extra row on the end, which is equal to the highest ID number, plus 1. This last row can be ignored in your server script, by checking for the number of rows returned (mysqli_num_rows), and then using a for loop if the number of rows is greater than 1 (the query will always return at least one row).
Edit:
I recently discovered that my original solution did not return all ID numbers that are missing, in cases where missing numbers are contiguous (i.e. right next to each other). However, the query is still useful in working out whether or not there are numbers missing at all, very quickly, and would be a time saver when used in conjunction with hagensoft's query (top answer). In other words, this query could be run first to test for missing IDs. If anything is found, then hagensoft's query could be run immediately afterwards to help identify the exact IDs that are missing (no time saved, but not much slower at all). If nothing is found, then a considerable amount of time is potentially saved, as hagensoft's query would not need to be run.
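If every missing ID is wanted on its own row, including contiguous ones, one option on MySQL 8 or later is a recursive CTE that generates 1..MAX(id) and keeps the numbers not present in the table. The sketch below is only an illustration: it runs the SQL through Python's built-in sqlite3 module as a stand-in for MySQL, the helper name missing_ids is hypothetical, and the table name testtable is borrowed from the answers above. On MySQL the recursion depth is capped by cte_max_recursion_depth (1000 by default), so large ID ranges would need that setting raised.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL 8+; the SQL itself uses only
# WITH RECURSIVE, which both engines support.
MISSING_IDS_SQL = """
WITH RECURSIVE seq(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < (SELECT MAX(id) FROM testtable)
)
SELECT n FROM seq
WHERE n NOT IN (SELECT id FROM testtable)
ORDER BY n
"""

def missing_ids(conn):
    """Return every ID between 1 and MAX(id) that has no row (hypothetical helper)."""
    return [row[0] for row in conn.execute(MISSING_IDS_SQL)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO testtable VALUES (?, ?)",
    [(1, "Bob"), (4, "Adam"), (6, "Someguy")],
)
print(missing_ids(conn))  # [2, 3, 5]
```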
To add a little to Ivan's answer, this version shows numbers missing at the beginning if 1 doesn't exist:
(SELECT 1 as gap_starts_at,
(SELECT MIN(t4.id) -1 FROM testtable t4 WHERE t4.id > 1) as gap_ends_at
FROM testtable t5
WHERE NOT EXISTS (SELECT t6.id FROM testtable t6 WHERE t6.id = 1)
HAVING gap_ends_at IS NOT NULL limit 1)
UNION
(SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM testtable t3 WHERE t3.id > t1.id) as gap_ends_at
FROM testtable t1
WHERE NOT EXISTS (SELECT t2.id FROM testtable t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL);
It would be far more efficient to get the start of each gap in one query and the end of each gap in another.
I had 18M records and it took me less than a second each to get the two results. When I tried getting them together my query timed out after an hour.
Get the start of gap:
SELECT (t1.id + 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id + 1);
Get the end of gap:
SELECT (t1.id - 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id - 1);
The queries above return the start and end of each gap as two columns; to get the missing numbers in a single column you can try this:
select start from
(SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) b
UNION
select c.end from (SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) c order by start;
Using window functions (available in MySQL 8),
finding the gaps in the id column can be expressed as:
WITH gaps AS
(
SELECT
LAG(id, 1, 0) OVER(ORDER BY id) AS gap_begin,
id AS gap_end,
id - LAG(id, 1, 0) OVER(ORDER BY id) AS gap
FROM test
)
SELECT
gap_begin,
gap_end
FROM gaps
WHERE gap > 1
;
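Note that gap_begin and gap_end in the query above are the existing IDs on either side of each gap, not the missing IDs themselves. A small variation, sketched here under the assumption that SQLite (via Python's sqlite3) behaves like MySQL 8 for LAG, reports the first and last missing ID of each gap instead. The helper name gap_ranges and the column names first_missing and last_missing are made up for this illustration.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for MySQL 8; LAG(expr, offset, default)
# has the same meaning in both.
GAPS_SQL = """
WITH gaps AS (
    SELECT LAG(id, 1, 0) OVER (ORDER BY id) AS prev_id,
           id AS next_id
    FROM test
)
SELECT prev_id + 1 AS first_missing,
       next_id - 1 AS last_missing
FROM gaps
WHERE next_id - prev_id > 1
ORDER BY first_missing
"""

def gap_ranges(conn):
    """Return (first_missing, last_missing) for every gap (hypothetical helper)."""
    return conn.execute(GAPS_SQL).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO test VALUES (?)", [(1,), (4,), (6,)])
print(gap_ranges(conn))  # [(2, 3), (5, 5)]
```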
If you are on an older version of MySQL, you have to rely on user variables (the so-called poor man's window function idiom):
SELECT
gap_begin,
gap_end
FROM (
SELECT
@id_previous AS gap_begin,
id AS gap_end,
id - @id_previous AS gap,
@id_previous := id
FROM (
SELECT
t.id
FROM test t
ORDER BY t.id
) AS sorted
JOIN (
SELECT
@id_previous := 0
) AS init_vars
) AS gaps
WHERE gap > 1
;
If you want a lighter way to search millions of rows of data:
SET @st=0,@diffSt=0,@diffEnd=0;
SELECT res.startID, res.endID, res.diff
, CONCAT(
"SELECT * FROM lost_consumer WHERE ID BETWEEN "
,res.startID+1, " AND ", res.endID-1) as `query`
FROM (
SELECT
@diffSt:=(@st) `startID`
, @diffEnd:=(a.ID) `endID`
, @st:=a.ID `end`
, @diffEnd-@diffSt-1 `diff`
FROM consumer a
ORDER BY a.ID
) res
WHERE res.diff>0;
check out this http://sqlfiddle.com/#!9/3ea00c/9
I would like to get the difference between two consecutive rows in MySQL. I have been trying to work it out, but with no luck. The data is shown in the image.
I need the difference between consecutive rows of the "Data2" column, with the result in a "Diff" column.
Thanks for your kind attention and much appreciated for your help.
-Ram
If the table has an auto-increment column 'id', we can order by id, find the next row's value and subtract it:
SELECT t1.*, t1.Data2-(SELECT t2.Data2 FROM `table_name` t2 WHERE t2.id > t1.id ORDER BY t2.id ASC LIMIT 1 ) AS difference
FROM `table_name` t1
ORDER BY t1.id
To subtract the previous row's value instead:
SELECT t1.*, t1.Data2-(SELECT t2.Data2 FROM `table_name` t2 WHERE t2.id < t1.id ORDER BY t2.id DESC LIMIT 1 ) AS difference
FROM `table_name` t1
ORDER BY t1.id
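On MySQL 8 or later, the same previous-row difference can likely be written more directly with LAG. The following is a minimal sketch, not the answerer's method: it uses Python's sqlite3 as a stand-in for MySQL, and the table name readings, its sample values and the helper name row_diffs are all invented for the example.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for MySQL 8; `readings` is a made-up table
# with an auto-increment style id and a numeric data2 column.
DIFF_SQL = """
SELECT id,
       data2,
       data2 - LAG(data2) OVER (ORDER BY id) AS diff
FROM readings
ORDER BY id
"""

def row_diffs(conn):
    """Return (id, data2, diff) where diff is data2 minus the previous row's data2."""
    return conn.execute(DIFF_SQL).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, data2 INTEGER)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", [(1, 10), (2, 15), (3, 12)])
print(row_diffs(conn))  # [(1, 10, None), (2, 15, 5), (3, 12, -3)]
```

The first row has no predecessor, so its diff is NULL (None in Python); wrapping the expression in COALESCE would be one way to substitute a default.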
Since you don't have a unique column in your table, you can achieve this by introducing user variables (@rn and @rn1) that assign a sequential number to every row in the table.
Try this:
SELECT tab1.application_id, tab1.fiscal_year, tab1.data1, coalesce(cast(tab1.data2 as signed) -
cast(tab2.data2 as signed), tab1.data2) as diff
FROM
(SELECT b.application_id, @rn1:=@rn1+1 AS `rank`, b.fiscal_year, b.data1, b.data2
FROM your_table b, (SELECT @rn1:=0) t1) as tab1,
(SELECT a.application_id, @rn:=@rn+1 AS `rank`, a.fiscal_year, a.data1, a.data2
FROM your_table a, (SELECT @rn:=0) t2) as tab2
WHERE tab1.`rank` = tab2.`rank` + 1;
I have a table where I have a default value for the timestamp, e.g. 2013-06-15 12:00:00. There are at least 150 records with that value. Now I want to increment each of these timestamps by 1 second, taking into account that after 59 seconds the next value rolls over into the next minute. Is this possible? Can you help? Thanks!
Here is another simple approach: (@kordirko, thank you for your sqlfiddle)
SET @serial:=1;
UPDATE table1 SET t = t + INTERVAL (@serial:=@serial+1) SECOND;
You can test here. http://www.sqlfiddle.com/#!2/f5cbe/1
Assuming that the table has a unique id column, this query can do the task:
UPDATE table1 t1, (
SELECT t1.id, count(*) cnt
FROM table1 t1 JOIN table1 t2 ON t1.id >= t2.id
GROUP BY t1.id
) t2
SET t1.t = t1.t + interval t2.cnt second
WHERE t1.id = t2.id;
demo --> http://www.sqlfiddle.com/#!2/e6ef9b/1
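The minute rollover the question worries about is handled by interval arithmetic itself: adding k seconds to a timestamp carries into minutes automatically. As a small illustration outside the database (plain Python, with a hypothetical helper name spread_by_seconds), shifting the k-th of 150 identical timestamps by k seconds gives:

```python
from datetime import datetime, timedelta

def spread_by_seconds(timestamps):
    """Return copies where the k-th timestamp (1-based) is shifted by k seconds."""
    return [t + timedelta(seconds=k) for k, t in enumerate(timestamps, start=1)]

base = datetime(2013, 6, 15, 12, 0, 0)
shifted = spread_by_seconds([base] * 150)
print(shifted[0])    # 2013-06-15 12:00:01
print(shifted[59])   # 2013-06-15 12:01:00
print(shifted[-1])   # 2013-06-15 12:02:30
```

This mirrors the per-row count used as the interval in the UPDATE above: the 60th row lands exactly on the next minute rather than on an invalid "60 seconds" value.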
My table has an NAME and DISTANCE column. I'd like to figure out a way to list all the names that are within N units or less from the same name. i.e. Given:
NAME DISTANCE
a 2
a 4
a 3
a 7
a 1
b 3
b 1
b 2
b 5
(let's say N = 2)
I would like
a 2
a 4
a 3
a 1
...
...
Instead of
a 2
a 2 (because it double counts)
I'm trying to apply this method in order to solve for a customerID with claim dates (stored as number) that appear in clusters around each other. I'd like to be able to label the customerID and the claim date that is within say 10 days of another claim by that same customer. i.e., |a.claimdate - b.claimdate| <= 10. When I use this method
WHERE a.CUSTID = b.CUSTID
AND a.CLDATE BETWEEN b.CLDATE - 10 AND b.CLDATE + 10
AND a.CLAIMID <> b.CLAIMID
I double count. CLAIMID is unique.
Since you don't need the text, and just want the values, you can accomplish that using DISTINCT:
select distinct t.name, t.distance
from yourtable t
join yourtable t2 on t.name = t2.name
and (t.distance = t2.distance+1 or t.distance = t2.distance-1)
order by t.name
Given your edits, if you're looking for results between a certain distance, you can use >= and <= (or BETWEEN):
select distinct t.name, t.distance
from yourtable t
join yourtable t2 on t.name = t2.name
and t.distance >= t2.distance-1
and t.distance <= t2.distance+1
and t.distance <> t2.distance
order by t.name
You need to add the final criterion of t.distance <> t2.distance so you don't return the entire dataset -- technically every distance is between itself. This would be better if you had a primary key to add to the join, but if you don't, you could utilize ROW_NUMBER() as well to achieve the same results.
with cte as (
select name, distance, row_number() over (partition by name order by (select null)) rn
from yourtable
)
select distinct t.name, t.distance
from cte t
join cte t2 on t.name = t2.name
and t.distance >= t2.distance-1
and t.distance <= t2.distance+1
and t.rn <> t2.rn
order by t.name
I like @sgeddes' solution, but you can also get rid of the DISTINCT and the OR in the join condition like this:
select * from yourtable a
where exists (
select 1 from yourtable b
where b.name = a.name
and b.distance between a.distance - 1 and a.distance + 1
)
This also ensures that rows with equal distance get included, and it considers a whole range rather than only the rows whose distance differs by exactly n, as suggested by @HABO. Note that, as written, every row also matches itself, so a unique key (such as CLAIMID in the question) is needed in the subquery to exclude the self-match.
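For the claims case in the question, where CLAIMID is unique, an EXISTS query can exclude the row itself and so return each qualifying claim exactly once. This is a minimal sketch under assumptions: Python's sqlite3 stands in for the real database, the table name claims and the sample rows are invented, and the column names follow the question (CUSTID, CLAIMID, CLDATE stored as a number).

```python
import sqlite3

# Assumption: `claims` and its sample data are made up; cldate is a day number.
NEARBY_SQL = """
SELECT a.custid, a.claimid, a.cldate
FROM claims AS a
WHERE EXISTS (
    SELECT 1
    FROM claims AS b
    WHERE b.custid = a.custid
      AND b.claimid <> a.claimid                        -- skip the row itself
      AND b.cldate BETWEEN a.cldate - 10 AND a.cldate + 10
)
ORDER BY a.custid, a.cldate, a.claimid
"""

def clustered_claims(conn):
    """Return claims that have another claim by the same customer within 10 days."""
    return conn.execute(NEARBY_SQL).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (claimid INTEGER PRIMARY KEY, custid TEXT, cldate INTEGER)"
)
conn.executemany(
    "INSERT INTO claims VALUES (?, ?, ?)",
    [(1, "a", 100), (2, "a", 105), (3, "a", 130), (4, "a", 105),
     (5, "b", 100), (6, "b", 150)],
)
print(clustered_claims(conn))  # [('a', 1, 100), ('a', 2, 105), ('a', 4, 105)]
```

Because EXISTS only tests for the presence of a neighbour, a claim with several neighbours is still listed once, and two claims on the same date (claims 2 and 4 here) both appear.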
I need to select how many days since there is a break in my data. It's easier to show:
Table format:
id (autoincrement), user_id (int), start (datetime), end (datetime)
Example data (times left out as only need days):
1, 5, 2011-12-18, 2011-12-18
2, 5, 2011-12-17, 2011-12-17
3, 5, 2011-12-16, 2011-12-16
4, 5, 2011-12-13, 2011-12-13
As you can see there would be a break between 2011-12-13 and 2011-12-16. Now, I need to be able say:
Using the date 2011-12-18, how many days are there until a break:
2011-12-18: Lowest sequential date = 2011-12-16: Total consecutive days: 3
Probably: DATE_DIFF(2011-12-18, 2011-12-16)
So my problem is, how can I select that 2011-12-16 is the lowest sequential date? Remembering that data applies for particular user_id's.
It's kinda like the example here: http://www.artfulsoftware.com/infotree/queries.php#72 but in the reverse.
I'd like this done in SQL only, no php code
Thanks
SELECT qmin.start, qmax.end, DATEDIFF( qmax.end, qmin.start ) FROM `table` AS qmin
LEFT JOIN (
SELECT t1.end FROM `table` AS t1
LEFT JOIN `table` AS t2 ON
t2.start > t1.end AND
t2.start < DATE_ADD( t1.end, INTERVAL 1 DAY )
WHERE t1.end >= '2011-12-18' AND t2.start IS NULL
ORDER BY t1.end ASC LIMIT 1
) AS qmax ON TRUE
LEFT JOIN `table` AS t2 ON
t2.end < qmin.start AND
t2.end > DATE_SUB( qmin.start, INTERVAL 1 DAY )
WHERE qmin.start <= '2011-12-18' AND t2.start IS NULL
ORDER BY qmin.end DESC LIMIT 1
This should work: each left join looks for an adjacent record in the sequence, so the maximum can be found by taking the nearest record that has no following record (t2.anyfield is null), and the minimum date is found the same way.
If you can calculate the number of days between the two dates in a script, do that and use a UNION instead (e.g. row 1 is the minimum, row 2 the maximum).
Check this,
SELECT DATEDIFF((SELECT MAX(`start`) FROM testtbl WHERE `user_id`=1),
(select a.`start` from testtbl as a
left outer join testtbl as b on a.user_id = b.user_id
AND a.`start` = b.`start` + INTERVAL 1 DAY
where a.user_id=1 AND b.`start` is null
ORDER BY a.`start` desc LIMIT 1))
DATEDIFF() returns the difference between the two dates in days; if you want the number of consecutive days, add one to that result.
If it's not a beauty contest then you may try something like:
select t.start, t2.start, datediff(t2.start, t.start) + 1 as consecutive_days
from tab t
join tab t2 on t2.start = (select min(start) from (
select c1.*, case when c2.id is null then 1 else 0 end as gap
from tab c1
left join tab c2 on c1.start = adddate(c2.start, -1)
) t4 where t4.start <= t.start and t4.start >= (select max(start) from (
select c1.*, case when c2.id is null then 1 else 0 end as gap
from tab c1
left join tab c2 on c1.start = adddate(c2.start, -1)
) t3 where t3.start <= t.start and t3.gap = 1))
where t.start = '2011-12-18'
Result should be:
start start consecutive_days
2011-12-18 2011-12-16 3
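Where window functions are available, the streak can also be found with the "gaps and islands" idea: within one user, day number minus row number is constant across a run of consecutive days, so the run containing the target date can be picked out by that value. The sketch below is an assumption-laden illustration rather than a drop-in MySQL query: it runs on Python's sqlite3, uses SQLite's date() and julianday() (on MySQL 8, DATE() and TO_DAYS() would play the same roles), and the table name bookings and helper name streak are invented.

```python
import sqlite3

# Assumption: SQLite 3.25+; `bookings` mirrors the question's columns
# (id, user_id, start, end) with made-up times of day.
STREAK_SQL = """
WITH days AS (
    SELECT DISTINCT user_id, date(start) AS day
    FROM bookings
),
islands AS (
    SELECT user_id,
           day,
           CAST(julianday(day) AS INTEGER)
             - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day) AS grp
    FROM days
)
SELECT MIN(day) AS streak_start, COUNT(*) AS consecutive_days
FROM islands
WHERE user_id = :user_id
  AND day <= :as_of
  AND grp = (SELECT grp FROM islands
             WHERE user_id = :user_id AND day = :as_of)
"""

def streak(conn, user_id, as_of):
    """Return (first day of the run ending at as_of, run length in days)."""
    return conn.execute(STREAK_SQL, {"user_id": user_id, "as_of": as_of}).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE bookings (id INTEGER PRIMARY KEY, user_id INTEGER, start TEXT, "end" TEXT)'
)
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?, ?)",
    [(1, 5, "2011-12-18 09:00:00", "2011-12-18 10:00:00"),
     (2, 5, "2011-12-17 09:00:00", "2011-12-17 10:00:00"),
     (3, 5, "2011-12-16 09:00:00", "2011-12-16 10:00:00"),
     (4, 5, "2011-12-13 09:00:00", "2011-12-13 10:00:00")],
)
print(streak(conn, 5, "2011-12-18"))  # ('2011-12-16', 3)
```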