My table has an NAME and DISTANCE column. I'd like to figure out a way to list all the names that are within N units or less from the same name. i.e. Given:
NAME DISTANCE
a 2
a 4
a 3
a 7
a 1
b 3
b 1
b 2
b 5
(let's say N = 2)
I would like
a 2
a 4
a 3
a 1
...
...
Instead of
a 2
a 2 (because it double counts)
I'm trying to apply this method in order to solve for a customerID with claim dates (stored as number) that appear in clusters around each other. I'd like to be able to label the customerID and the claim date that is within say 10 days of another claim by that same customer. i.e., |a.claimdate - b.claimdate| <= 10. When I use this method
WHERE a.CUSTID = b.CUSTID
AND a.CLDATE BETWEEN (b.CLDATE - 10 AND b.CLDATE + 10)
AND a.CLAIMID <> b.CLAIMID
I double count. CLAIMID is unique.
Since you don't need the text, and just want the values, you can accomplish that using DISTINCT:
select distinct t.name, t.distance
from yourtable t
join yourtable t2 on t.name = t2.name
and (t.distance = t2.distance+1 or t.distance = t2.distance-1)
order by t.name
SQL Fiddle Demo
Given your edits, if you're looking for results between a certain distance, you can use >= and <= (or BETWEEN):
select distinct t.name, t.distance
from yourtable t
join yourtable t2 on t.name = t2.name
and t.distance >= t2.distance-1
and t.distance <= t2.distance+1
and t.distance <> t2.distance
order by t.name
You need to add the final criteria of t.distance <> t2.distance so you don't return the entire dataset -- technically every distance is between itself. This would be better if you had a primary key to add to the join, but if you don't, you could utilize ROW_NUMBER() as well to achieve the same results.
with cte as (
select name, distance, row_number() over (partition by name order by (select null)) rn
from yourtable
)
select distinct t.name, t.distance
from cte t
join cte t2 on t.name = t2.name
and t.distance >= t2.distance-1
and t.distance <= t2.distance+1
and t.rn <> t2.rn
order by t.name
Updated SQL Fiddle
I like #sgeddes' solution, but you can also get rid of the distinct and or in the join condition like this:
select * from table a
where exists (
select 1 from table b
where b.name = a.name
and b.distance between a.distance - 1 and a.distance + 1
)
This also ensures that rows with equal distance get included and considers a whole range, not just the rows that have a distance difference of exactly n, as suggested by #HABO.
Related
I have this table in MySQL, for example:
ID | Name
1 | Bob
4 | Adam
6 | Someguy
If you notice, there is no ID number (2, 3 and 5).
How can I write a query so that MySQL would answer the missing IDs only, in this case: "2,3,5" ?
SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM testtable AS a, testtable AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)
Hope this link also helps
http://www.codediesel.com/mysql/sequence-gaps-in-mysql/
A more efficent query:
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM my_table t3 WHERE t3.id > t1.id) as gap_ends_at
FROM my_table t1
WHERE NOT EXISTS (SELECT t2.id FROM my_table t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
Rather than returning multiple ranges of IDs, if you instead want to retrieve every single missing ID itself, each one on its own row, you could do the following:
SELECT id+1 FROM table WHERE id NOT IN (SELECT id-1 FROM table) ORDER BY 1
The query is very efficient. However, it also includes one extra row on the end, which is equal to the highest ID number, plus 1. This last row can be ignored in your server script, by checking for the number of rows returned (mysqli_num_rows), and then using a for loop if the number of rows is greater than 1 (the query will always return at least one row).
Edit:
I recently discovered that my original solution did not return all ID numbers that are missing, in cases where missing numbers are contiguous (i.e. right next to each other). However, the query is still useful in working out whether or not there are numbers missing at all, very quickly, and would be a time saver when used in conjunction with hagensoft's query (top answer). In other words, this query could be run first to test for missing IDs. If anything is found, then hagensoft's query could be run immediately afterwards to help identify the exact IDs that are missing (no time saved, but not much slower at all). If nothing is found, then a considerable amount of time is potentially saved, as hagensoft's query would not need to be run.
To add a little to Ivan's answer, this version shows numbers missing at the beginning if 1 doesn't exist:
SELECT 1 as gap_starts_at,
(SELECT MIN(t4.id) -1 FROM testtable t4 WHERE t4.id > 1) as gap_ends_at
FROM testtable t5
WHERE NOT EXISTS (SELECT t6.id FROM testtable t6 WHERE t6.id = 1)
HAVING gap_ends_at IS NOT NULL limit 1
UNION
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM testtable t3 WHERE t3.id > t1.id) as gap_ends_at
FROM testtable t1
WHERE NOT EXISTS (SELECT t2.id FROM testtable t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL;
It would be far more efficient to get the start of the gap in one query and the end of the gap in one query.
I had 18M records and it took me less than a second each to get the two results. When I tried getting them together my query timed out after an hour.
Get the start of gap:
SELECT (t1.id + 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id + 1);
Get the end of gap:
SELECT (t1.id - 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id - 1);
Above queries will give two columns so you can try this to get the missing numbers in a single column
select start from
(SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) b
UNION
select c.end from (SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) c order by start;
By using window functions (available in mysql 8)
finding the gaps in the id column can be expressed as:
WITH gaps AS
(
SELECT
LAG(id, 1, 0) OVER(ORDER BY id) AS gap_begin,
id AS gap_end,
id - LAG(id, 1, 0) OVER(ORDER BY id) AS gap
FROM test
)
SELECT
gap_begin,
gap_end
FROM gaps
WHERE gap > 1
;
if you are on the older version of the mysql you would have to rely on the variables (so called poor-man's window function idiom)
SELECT
gap_begin,
gap_end
FROM (
SELECT
#id_previous AS gap_begin,
id AS gap_end,
id - #id_previous AS gap,
#id_previous := id
FROM (
SELECT
t.id
FROM test t
ORDER BY t.id
) AS sorted
JOIN (
SELECT
#id_previous := 0
) AS init_vars
) AS gaps
WHERE gap > 1
;
if you want a lighter way to search millions of rows of data,
SET #st=0,#diffSt=0,#diffEnd=0;
SELECT res.startID, res.endID, res.diff
, CONCAT(
"SELECT * FROM lost_consumer WHERE ID BETWEEN "
,res.startID+1, " AND ", res.endID-1) as `query`
FROM (
SELECT
#diffSt:=(#st) `startID`
, #diffEnd:=(a.ID) `endID`
, #st:=a.ID `end`
, #diffEnd-#diffSt-1 `diff`
FROM consumer a
ORDER BY a.ID
) res
WHERE res.diff>0;
check out this http://sqlfiddle.com/#!9/3ea00c/9
I have a table with 3 columns id, type, value like in image below.
What I'm trying to do is to make a query to get the data in this format:
type previous current
month-1 666 999
month-2 200 15
month-3 0 12
I made this query but it gets just the last value
select *
from statistics
where id in (select max(id) from statistics group by type)
order
by type
EDIT: Live example http://sqlfiddle.com/#!9/af81da/1
Thanks!
I would write this as:
select s.*,
(select s2.value
from statistics s2
where s2.type = s.type
order by id desc
limit 1, 1
) value_prev
from statistics s
where id in (select max(id) from statistics s group by type) order by type;
This should be relatively efficient with an index on statistics(type, id).
select
type,
ifnull(max(case when seq = 2 then value end),0 ) previous,
max( case when seq = 1 then value end ) current
from
(
select *, (select count(*)
from statistics s
where s.type = statistics.type
and s.id >= statistics.id) seq
from statistics ) t
where seq <= 2
group by type
I trying to get the last 6 months of the min and max of prices in my table and display them as a group by months. My query is not returning the corresponding rows values, such as the date time for when the max price was or min..
I want to select the min & max prices and the date time they both occurred and the rest of the data for that row...
(the reason why i have concat for report_term, as i need to print this with the dataset when displaying results. e.g. February 2018 -> ...., January 2018 -> ...)
SELECT metal_price_id, CONCAT(MONTHNAME(metal_price_datetime), ' ', YEAR(metal_price_datetime)) AS report_term, max(metal_price) as highest_gold_price, metal_price_datetime FROM metal_prices_v2
WHERE metal_id = 1
AND DATEDIFF(NOW(), metal_price_datetime) BETWEEN 0 AND 180
GROUP BY report_term
ORDER BY metal_price_datetime DESC
I have made an example, extract from my DB:
http://sqlfiddle.com/#!9/617bcb2/4/0
My desired result would be to see the min and max prices grouped by month, date of min, date of max.. and all in the last 6 months.
thanks
UPDATE.
The below code works, but it returns back rows from beyond the 180 days specified. I have just checked, and it is because it joining by the price which may be duplicated a number of times during the years.... see: http://sqlfiddle.com/#!9/5f501b/1
You could use twice inner join on the subselect for min and max
select a.metal_price_datetime
, t1.highest_gold_price
, t1.report_term
, t2.lowest_gold_price
,t2.metal_price_datetime
from metal_prices_v2 a
inner join (
SELECT CONCAT(MONTHNAME(metal_price_datetime), ' ', YEAR(metal_price_datetime)) AS report_term
, max(metal_price) as highest_gold_price
from metal_prices_v2
WHERE metal_id = 1
AND DATEDIFF(NOW(), metal_price_datetime) BETWEEN 0 AND 180
GROUP BY report_term
) t1 on t1.highest_gold_price = a.metal_price
inner join (
select a.metal_price_datetime
, t.lowest_gold_price
, t.report_term
from metal_prices_v2 a
inner join (
SELECT CONCAT(MONTHNAME(metal_price_datetime), ' ', YEAR(metal_price_datetime)) AS report_term
, min(metal_price) as lowest_gold_price
from metal_prices_v2
WHERE metal_id = 1
AND DATEDIFF(NOW(), metal_price_datetime) BETWEEN 0 AND 180
GROUP BY report_term
) t on t.lowest_gold_price = a.metal_price
) t2 on t2.report_term = t1.report_term
simplified version of what you should do so you can learn the working process.
You need calculate the min() max() of the periods you need. That is your first brick on this building.
you have tableA, you calculate min() lets call it R1
SELECT group_field, min() as min_value
FROM TableA
GROUP BY group_field
same for max() call it R2
SELECT group_field, max() as max_value
FROM TableA
GROUP BY group_field
Now you need to bring all the data from original fields so you join each result with your original table
We call those T1 and T2:
SELECT tableA.group_field, tableA.value, tableA.date
FROM tableA
JOIN ( ... .. ) as R1
ON tableA.group_field = R1.group_field
AND tableA.value = R1.min_value
SELECT tableA.group_field, tableA.value, tableA.date
FROM tableA
JOIN ( ... .. ) as R2
ON tableA.group_field = R2.group_field
AND tableA.value = R2.max_value
Now we join T1 and T2.
SELECT *
FROM ( .... ) as T1
JOIN ( .... ) as T2
ON t1.group_field = t2.group_field
So the idea is if you can do a brick, you do the next one. Then you also can add filters like last 6 months or something else you need.
In this case the group_field is the CONCAT() value
I have this table in MySQL, for example:
ID | Name
1 | Bob
4 | Adam
6 | Someguy
If you notice, there is no ID number (2, 3 and 5).
How can I write a query so that MySQL would answer the missing IDs only, in this case: "2,3,5" ?
SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM testtable AS a, testtable AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)
Hope this link also helps
http://www.codediesel.com/mysql/sequence-gaps-in-mysql/
A more efficent query:
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM my_table t3 WHERE t3.id > t1.id) as gap_ends_at
FROM my_table t1
WHERE NOT EXISTS (SELECT t2.id FROM my_table t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
Rather than returning multiple ranges of IDs, if you instead want to retrieve every single missing ID itself, each one on its own row, you could do the following:
SELECT id+1 FROM table WHERE id NOT IN (SELECT id-1 FROM table) ORDER BY 1
The query is very efficient. However, it also includes one extra row on the end, which is equal to the highest ID number, plus 1. This last row can be ignored in your server script, by checking for the number of rows returned (mysqli_num_rows), and then using a for loop if the number of rows is greater than 1 (the query will always return at least one row).
Edit:
I recently discovered that my original solution did not return all ID numbers that are missing, in cases where missing numbers are contiguous (i.e. right next to each other). However, the query is still useful in working out whether or not there are numbers missing at all, very quickly, and would be a time saver when used in conjunction with hagensoft's query (top answer). In other words, this query could be run first to test for missing IDs. If anything is found, then hagensoft's query could be run immediately afterwards to help identify the exact IDs that are missing (no time saved, but not much slower at all). If nothing is found, then a considerable amount of time is potentially saved, as hagensoft's query would not need to be run.
To add a little to Ivan's answer, this version shows numbers missing at the beginning if 1 doesn't exist:
SELECT 1 as gap_starts_at,
(SELECT MIN(t4.id) -1 FROM testtable t4 WHERE t4.id > 1) as gap_ends_at
FROM testtable t5
WHERE NOT EXISTS (SELECT t6.id FROM testtable t6 WHERE t6.id = 1)
HAVING gap_ends_at IS NOT NULL limit 1
UNION
SELECT (t1.id + 1) as gap_starts_at,
(SELECT MIN(t3.id) -1 FROM testtable t3 WHERE t3.id > t1.id) as gap_ends_at
FROM testtable t1
WHERE NOT EXISTS (SELECT t2.id FROM testtable t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL;
It would be far more efficient to get the start of the gap in one query and the end of the gap in one query.
I had 18M records and it took me less than a second each to get the two results. When I tried getting them together my query timed out after an hour.
Get the start of gap:
SELECT (t1.id + 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id + 1);
Get the end of gap:
SELECT (t1.id - 1) as MissingID
FROM sequence t1
WHERE NOT EXISTS
(SELECT t2.id
FROM sequence t2
WHERE t2.id = t1.id - 1);
Above queries will give two columns so you can try this to get the missing numbers in a single column
select start from
(SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) b
UNION
select c.end from (SELECT a.id+1 AS start, MIN(b.id) - 1 AS end
FROM sequence AS a, sequence AS b
WHERE a.id < b.id
GROUP BY a.id
HAVING start < MIN(b.id)) c order by start;
By using window functions (available in mysql 8)
finding the gaps in the id column can be expressed as:
WITH gaps AS
(
SELECT
LAG(id, 1, 0) OVER(ORDER BY id) AS gap_begin,
id AS gap_end,
id - LAG(id, 1, 0) OVER(ORDER BY id) AS gap
FROM test
)
SELECT
gap_begin,
gap_end
FROM gaps
WHERE gap > 1
;
if you are on the older version of the mysql you would have to rely on the variables (so called poor-man's window function idiom)
SELECT
gap_begin,
gap_end
FROM (
SELECT
#id_previous AS gap_begin,
id AS gap_end,
id - #id_previous AS gap,
#id_previous := id
FROM (
SELECT
t.id
FROM test t
ORDER BY t.id
) AS sorted
JOIN (
SELECT
#id_previous := 0
) AS init_vars
) AS gaps
WHERE gap > 1
;
if you want a lighter way to search millions of rows of data,
SET #st=0,#diffSt=0,#diffEnd=0;
SELECT res.startID, res.endID, res.diff
, CONCAT(
"SELECT * FROM lost_consumer WHERE ID BETWEEN "
,res.startID+1, " AND ", res.endID-1) as `query`
FROM (
SELECT
#diffSt:=(#st) `startID`
, #diffEnd:=(a.ID) `endID`
, #st:=a.ID `end`
, #diffEnd-#diffSt-1 `diff`
FROM consumer a
ORDER BY a.ID
) res
WHERE res.diff>0;
check out this http://sqlfiddle.com/#!9/3ea00c/9
I need to select how many days since there is a break in my data. It's easier to show:
Table format:
id (autoincrement), user_id (int), start (datetime), end (datetime)
Example data (times left out as only need days):
1, 5, 2011-12-18, 2011-12-18
2, 5, 2011-12-17, 2011-12-17
3, 5, 2011-12-16, 2011-12-16
4, 5, 2011-12-13, 2011-12-13
As you can see there would be a break between 2011-12-13 and 2011-12-16. Now, I need to be able say:
Using the date 2011-12-18, how many days are there until a break:
2011-12-18: Lowest sequential date = 2011-12-16: Total consecutive days: 3
Probably: DATE_DIFF(2011-12-18, 2011-12-16)
So my problem is, how can I select that 2011-12-16 is the lowest sequential date? Remembering that data applies for particular user_id's.
It's kinda like the example here: http://www.artfulsoftware.com/infotree/queries.php#72 but in the reverse.
I'd like this done in SQL only, no php code
Thanks
SELECT qmin.start, qmax.end, DATE_DIFF( qmax.end, qmin.start ) FROM table AS qmin
LEFT JOIN (
SELECT end FROM table AS t1
LEFT JOIN table AS t2 ON
t2.start > t1.end AND
t2.start < DATE_ADD( t1.end, 1 DAY )
WHERE t1.end >= '2011-12-18' AND t2.start IS NULL
ORDER BY end ASC LIMIT 1
) AS qmax
LEFT JOIN table AS t2 ON
t2.end < qmin.start AND
t2.end > DATE_DIFF( qmin.start, 1 DAY )
WHERE qmin.start <= '2011-12-18' AND t2.start IS NULL
ORDER BY end DESC LIMIT 1
This should work - left joins selects one date which can be in sequence, so max can be fineded out if you take the nearest record without sequential record ( t2.anyfield is null ) , same thing we do with minimal date.
If you can calculate days between in script - do it using unions ( eg 1. row - minimal, 2. row maximal )
Check this,
SELECT DATEDIFF((SELECT MAX(`start`) FROM testtbl WHERE `user_id`=1),
(select a.`start` from testtbl as a
left outer join testtbl as b on a.user_id = b.user_id
AND a.`start` = b.`start` + INTERVAL 1 DAY
where a.user_id=1 AND b.`start` is null
ORDER BY a.`start` desc LIMIT 1))
DATEDIFF() show difference of the Two days, if you want to number of consecutive days add one for that result.
If it's not a beauty contents then you may try something like:
select t.start, t2.start, datediff(t2.start, t.start) + 1 as consecutive_days
from tab t
join tab t2 on t2.start = (select min(start) from (
select c1.*, case when c2.id is null then 1 else 0 end as gap
from tab c1
left join tab c2 on c1.start = adddate(c2.start, -1)
) t4 where t4.start <= t.start and t4.start >= (select max(start) from (
select c1.*, case when c2.id is null then 1 else 0 end as gap
from tab c1
left join tab c2 on c1.start = adddate(c2.start, -1)
) t3 where t3.start <= t.start and t3.gap = 1))
where t.start = '2011-12-18'
Result should be:
start start consecutive_days
2011-12-18 2011-12-16 3