I have a list of values in a column and want to display them as ranges.
Eg. If values are 1,2,3,4,5,9,11,12,13,14,17,18,19
I want to display
1-5,9,11-14,17-19
Assuming that each value is stored on a separate row, you can use a gaps-and-islands technique here:
select case when min(val) <> max(val)
then concat(min(val), '-', max(val))
else min(val)
end val_range
from (select val, row_number() over(order by val) rn from mytable) t
group by val - rn
order by min(val)
The idea is to build groups of consecutive values by taking the difference between the value and an incrementing rank, which is computed using row_number() (available in MySQL 8.0). Consecutive values share the same difference, so they land in the same group.
Demo on DB Fiddle:
| val_range |
| :-------- |
| 1-5 |
| 9 |
| 11-14 |
| 17-19 |
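To see why grouping on val - rn isolates each run, here is a minimal Python sketch of the same idea outside the database. The function name collapse_ranges is made up for illustration, and the sketch assumes the values are distinct integers (it deduplicates them first, which the SQL above does not do).

```python
from itertools import groupby


def collapse_ranges(values):
    """Collapse integers into 'lo-hi' strings by grouping on value minus row number."""
    # Assumption: values are meant to be distinct; duplicates are dropped here.
    ordered = sorted(set(values))
    ranges = []
    # enumerate(..., start=1) plays the role of row_number() over (order by val).
    # Within a run of consecutive values, val - rn stays constant.
    for _, group in groupby(enumerate(ordered, start=1), key=lambda p: p[1] - p[0]):
        island = [val for _, val in group]
        lo, hi = island[0], island[-1]
        ranges.append(str(lo) if lo == hi else f"{lo}-{hi}")
    return ranges


print(",".join(collapse_ranges([1, 2, 3, 4, 5, 9, 11, 12, 13, 14, 17, 18, 19])))
```

For the sample values the keys are 0 for 1 to 5, 3 for 9, 4 for 11 to 14 and 6 for 17 to 19, which gives one group per island.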
In earlier versions, you can emulate row_number() with a correlated subquery, or a user variable. The second option goes like:
select case when min(val) <> max(val)
then concat(min(val), '-', max(val))
else min(val)
end val_range
from (select @rn := 0) x
cross join (
select val, @rn := @rn + 1 rn
from (select val from mytable order by val) t
) t
group by val - rn
order by min(val)
As a complement to other answers:
select dn.val as dnval, min(up.val) as upval
from mytable up
join mytable dn
on dn.val <= up.val
where not exists (select 1 from mytable a where a.val = up.val + 1)
and not exists (select 1 from mytable b where b.val = dn.val - 1)
group by dn.val
order by dn.val;
1 5
9 9
11 14
17 19
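The query reads as: a value with no predecessor (val - 1 missing) starts an island, a value with no successor (val + 1 missing) ends one, and each start is paired with the smallest end that is not below it. As a rough check, the sketch below runs the same statement through Python's sqlite3 module instead of MySQL; the in-memory table setup is an assumption made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (val integer primary key)")
conn.executemany(
    "insert into mytable (val) values (?)",
    [(v,) for v in (1, 2, 3, 4, 5, 9, 11, 12, 13, 14, 17, 18, 19)],
)

query = """
select dn.val as dnval, min(up.val) as upval
from mytable up
join mytable dn
  on dn.val <= up.val
-- up is an island end: no row holds up.val + 1
where not exists (select 1 from mytable a where a.val = up.val + 1)
-- dn is an island start: no row holds dn.val - 1
  and not exists (select 1 from mytable b where b.val = dn.val - 1)
group by dn.val
order by dn.val
"""
islands = conn.execute(query).fetchall()
print(islands)
```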
Needless to say, using an OLAP function as @GNB does is orders of magnitude more efficient.
A short article on how to mimic OLAP functions in MySQL < 8 can be found at:
mysql-row_number
Fiddle
EDIT:
If another dimension is introduced (in this case p), something like:
select dn.p, dn.val as dnval, min(up.val) as upval
from mytable up
join mytable dn
on dn.val <= up.val
and dn.p = up.p
where not exists (select 1 from mytable a where a.val = up.val + 1 and a.p = up.p)
and not exists (select 1 from mytable b where b.val = dn.val - 1 and b.p = dn.p)
group by dn.p, dn.val
order by dn.p, dn.val;
can be used, see Fiddle2
The goal
I am trying to write a query to find duplicate rows. A row is duplicate when either Column A or Column B is the same.
Writing it so that both need to be the same is easy; just a simple GROUP BY A, B.
However, filtering by just one of the two is proving to be a bit more difficult. How would one go about doing this?
I've tried the following:
select distinct a as col_a,
b as col_b,
(
select count(*)
from table_name
where a = col_a
or b = col_b
) as duplicate_count
from table_name
having duplicate_count > 1;
but it does not feel like the right way to go about this, and with 84,000 rows it is also very slow.
Example
With the following table:
+----+------------------------+---+---------+
| id | name | a | b |
+----+------------------------+---+---------+
| 1 | Lorem ipsum | 1 | Donec |
+----+------------------------+---+---------+
| 2 | dolor sit | 2 | rhoncus |
+----+------------------------+---+---------+
| 3 | amet | 3 | rhoncus |
+----+------------------------+---+---------+
| 4 | consectetur adipiscing | 1 | primis |
+----+------------------------+---+---------+
| 5 | vulputate cursus | 4 | Aliquam |
+----+------------------------+---+---------+
Either result 1 or 4 (same A) and either result 2 or 3 (same B) should be returned, both with a duplicate_count of 2.
Which one of the two "duplicates" is returned does not matter.
Versions
On my local machine I use MySQL 5.7.24.
I just checked the live server, it uses 10.1.43-MariaDB.
You already know that this query:
select a, b
from tablename
group by a, b
having count(*) > 1
returns duplicates with both a and b equal.
You can get the rest of the duplicates for your requirement with EXISTS:
select t.a, t.b
from tablename t
where exists (
select 1 from tablename
where (a = t.a and b <> t.b) or (a <> t.a and b = t.b)
)
Or if you want them all use UNION ALL:
select a, b
from tablename
group by a, b
having count(*) > 1
union all
select t.a, t.b
from tablename t
where exists (
select 1 from tablename
where (a = t.a and b <> t.b) or (a <> t.a and b = t.b)
)
Update:
If you have an ID column then use EXISTS like this:
select t.*
from tablename t
where exists (
select 1 from tablename
where id <> t.id and (a = t.a or b = t.b)
)
Or if you want just 1 of the duplicates use id > t.id instead of id <> t.id.
See the demo.
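As a quick way to try the two EXISTS variants against the sample rows from the question, here is a sketch using Python's sqlite3 module in place of MySQL. The helper name duplicate_ids and the in-memory setup are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename (id integer primary key, name text, a integer, b text)"
)
conn.executemany(
    "insert into tablename (id, name, a, b) values (?, ?, ?, ?)",
    [
        (1, "Lorem ipsum", 1, "Donec"),
        (2, "dolor sit", 2, "rhoncus"),
        (3, "amet", 3, "rhoncus"),
        (4, "consectetur adipiscing", 1, "primis"),
        (5, "vulputate cursus", 4, "Aliquam"),
    ],
)


def duplicate_ids(comparison):
    """Return ids matched by the EXISTS query; comparison is '<>' or '>'."""
    query = f"""
        select t.id
        from tablename t
        where exists (
            select 1 from tablename
            where id {comparison} t.id and (a = t.a or b = t.b)
        )
        order by t.id
    """
    return [row[0] for row in conn.execute(query)]


all_duplicates = duplicate_ids("<>")  # every row that shares a or b with another row
one_per_pair = duplicate_ids(">")     # only rows that have a later matching row
print(all_duplicates, one_per_pair)
```

With id <> t.id both members of each pair come back; with id > t.id only the earlier member of each pair does, which matches the "either 1 or 4, either 2 or 3" requirement.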
Or with a self join:
select t.*
from tablename t inner join tablename tt
on (tt.a = t.a or tt.b = t.b) and tt.id <> t.id
Following solution works :
Another demo with a line that has duplication in a and b
CREATE TEMPORARY TABLE ab_duplicates (
a INTEGER
) AS
SELECT a, count(*) as cnt
FROM tablename
group by a, b
Having cnt > 1;
ALTER TABLE ab_duplicates ADD INDEX (a);
-- Select duplicates for a, but not for a and b
SELECT id, name, a, b
FROM (SELECT x.*, t.id, t.name, t.a, t.b,
@rn := IF(t.a = @a, @rn + 1, 1) rn,
@a := t.a,
ab.a as ab_exists
FROM (select @a := null, @rn := 0) x,
tablename t
LEFT JOIN ab_duplicates ab on ab.a = t.a
ORDER BY a
) a_duplicates
where rn = 2 and ab_exists is null
UNION
-- union duplicates for b, including duplicates for a and b
SELECT id, name, a, b
FROM (SELECT x.*, t.id, t.name, t.a, t.b,
@rn := IF(t.b = @b, @rn + 1, 1) rn,
@b := t.b
FROM (select @b := null, @rn := 0) x,
tablename t
ORDER BY b
) b_and_ab_duplicates
where rn = 2;
Previous solutions that only worked in some edge cases
Using group by and count() :
First finding ids with duplicates for a :
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- this will work better if you have an index starting with a
Same with b :
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
-- this will work better if you have an index starting with b
First solution :
Union gives you ids where there are duplicates for a or b (requires 2 indices):
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
UNION
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
Use the ids to filter the table, if you need more data from the table :
SELECT tablename.*
FROM (
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
UNION
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) as ids
JOIN tablename on tablename.id = ids.id
Now this might not use an index, but you can use a temporary table to have one :
First solution, using a temporary table (might be faster) :
-- using a temporary table to set an index
CREATE TEMPORARY TABLE ids (
-- adds an index on id, for the JOIN in the result query
`id` INTEGER PRIMARY KEY
) as
SELECT id
FROM (
-- duplicates on a, requires an index (a) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- removes duplicates between both part of the UNION : this might be slow
-- if there cannot be duplicates on a and b at the same time, consider using UNION ALL
UNION
-- duplicates on b, requires an index (b) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) tempids;
SELECT tablename.*
FROM ids -- using the temporary table, MUST be in the same database connection, will filter duplicates
JOIN tablename on tablename.id = ids.id;
I do not know if setting the index on the temporary table up front is better than adding one after populating the data:
-- you might want to postpone the index after the ids are set
-- using a temporary table to set an index
CREATE TEMPORARY TABLE ids2 (
`id` INTEGER
) as
SELECT id
FROM (
-- duplicates on a, requires an index (a) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- removes duplicates between both part of the UNION : this might be slow
-- if there cannot be duplicates on a and b at the same time, consider using UNION ALL
UNION
-- duplicates on b, requires an index (b) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) tempids;
ALTER TABLE ids2 ADD INDEX (id);
SELECT tablename.*
FROM ids2 -- using the temporary table, MUST be in the same database connection, will filter duplicates
JOIN tablename on tablename.id = ids2.id;
With MariaDB 10.2 or MySQL 8, you could use window functions (I guess).
Another solution : using vars :
SELECT id, name, a, b, rn
FROM (SELECT *,
@rn := IF(a = @a, @rn + 1, 1) rn,
@a := a
FROM (select @a := null, @rn := 0) x,
tablename
ORDER BY a
) a_duplicates
where rn = 2
UNION
SELECT id, name, a, b, rn
FROM (SELECT *,
@rn := IF(b = @b, @rn + 1, 1) rn,
@b := b
FROM (select @b := null, @rn := 0) x,
tablename
ORDER BY b
) b_duplicates
where rn = 2
Demo : with some extra steps to understand
Edit: this only works if you don't have lines where both a and b are duplicated, which is the case in the example.
I have a headache trying to get this simple request working in SQL on a very large database. Maybe some of you could help?
ID|R1 |R2
1 | a | b
1 | c | d
2 | a | b
2 | c | d
I would like to make an sql select query to get instead :
ID|R1 |R2 |R3 |R4
1 | a | b | c | d
2 | a | b | c | d
Thank you for any help !
I'm going to offer a query which will have almost the same behavior as what you want, plus it will be fairly simple:
SELECT
ID, GROUP_CONCAT(val ORDER BY val) val
FROM
(
SELECT ID, R1 AS val FROM yourTable
UNION ALL
SELECT ID, R2 FROM yourTable
) t
GROUP BY ID;
Demo
This approach is desirable for several reasons. First, it is robust with regard to any arbitrary number of "columns" which a given ID might have. Second, it gives us the option to order each row of values any way we want. Finally, it will be much easier to maintain than an exact answer using session variables to simulate things like row number.
This is tricky, because you do not have enough ids in your table. One method is to use variables, add a sequence number, and aggregate:
select id,
max(case when rn = 1 then r1 end) as r1,
max(case when rn = 1 then r2 end) as r2,
max(case when rn = 2 then r1 end) as r3,
max(case when rn = 2 then r2 end) as r4
from (select t.*,
(@rn := if(@i = id, @rn + 1,
if(@i := id, 1, 1)
)
) as rn
from (select t.*
from t
order by t.id
) t cross join
(select @rn := 0, @i := -1) params
) t
group by id;
This answer is a small upgrade to @TimBiegeleisen's answer.
If the table is big, you also need to use
SET SESSION group_concat_max_len = @@max_allowed_packet;
This query will convert the comma-separated values from the GROUP_CONCAT function into columns by using nested SUBSTRING_INDEX functions.
Query
SELECT
ID
, SUBSTRING_INDEX(SUBSTRING_INDEX(val, ',', 1), ',', -1) AS r1
, SUBSTRING_INDEX(SUBSTRING_INDEX(val, ',', 2), ',', -1) AS r2
, SUBSTRING_INDEX(SUBSTRING_INDEX(val, ',', 3), ',', -1) AS r3
, SUBSTRING_INDEX(SUBSTRING_INDEX(val, ',', 4), ',', -1) AS r4
FROM (
SELECT
ID, GROUP_CONCAT(val ORDER BY val) val
FROM
(
SELECT ID, R1 AS val FROM yourTable
UNION ALL
SELECT ID, R2 FROM yourTable
) t
GROUP BY ID
) x
see demo http://rextester.com/SDF72100
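To make the nested SUBSTRING_INDEX trick easier to follow, here is a small Python sketch that mimics it. The helpers substring_index and nth_item are hypothetical stand-ins written for this illustration, not MySQL functions, and the emulation only aims to cover a non-empty delimiter.

```python
def substring_index(text, delim, count):
    """Rough analogue of MySQL SUBSTRING_INDEX(text, delim, count)."""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    if count < 0:
        # Everything to the right of the count-th delimiter from the end.
        return delim.join(parts[count:])
    return ""


def nth_item(csv, n):
    """Inner call keeps the first n items, outer call keeps the last of those."""
    return substring_index(substring_index(csv, ",", n), ",", -1)


row = "a,b,c,d"
columns = [nth_item(row, n) for n in range(1, 5)]
print(columns)

# Caveat: with fewer than n items the inner call returns the whole string,
# so the last available item is repeated instead of an empty value.
print(nth_item("a,b", 4))
```

The caveat at the end is worth keeping in mind if some IDs have fewer values than the number of output columns.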
I want to select data from table
suppose we have a table
table Temp
sequence_number | breakdown_number | physical_account | logical_account | debit_amount | credit_amount
----------------+------------------+------------------+-----------------+--------------+---------------
1               | 1                | 10001            |                 | 10           | 0
2               | 1                |                  | 0011            | 10           | 0
Now I have to select physical_account from 1st row and logical account from second row and insert it into another table in single row based on the breakdown number.
How can I do this ?
I am going to assume that sequence_number actually provides the ordering of the rows and you want to do this for each breakdown_number. The most accurate method is probably to use variables:
INSERT INTO second_table(physical_account, logical_account)
SELECT MAX(CASE WHEN seqnum = 1 THEN t.physical_account END),
MAX(CASE WHEN seqnum = 2 THEN t.logical_account END)
FROM (SELECT t.*,
(@rn := if(@b = t.breakdown_number, @rn + 1,
if(@b := t.breakdown_number, 1, 1)
)
) as seqnum
FROM Temp t CROSS JOIN
(SELECT @rn := 0, @b := -1) params
ORDER BY t.breakdown_number, t.sequence_number
) t
WHERE seqnum IN (1, 2)
GROUP BY t.breakdown_number;
If the sequence_number restarts at 1 for each breakdown_number, then the subquery and variables are not needed:
INSERT INTO second_table(physical_account, logical_account)
SELECT MAX(CASE WHEN t.sequence_number = 1 THEN t.physical_account END),
MAX(CASE WHEN t.sequence_number = 2 THEN t.logical_account END)
FROM Temp t
WHERE t.sequence_number IN (1, 2)
GROUP BY t.breakdown_number;
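Here is a sketch of that conditional aggregation run through Python's sqlite3 module rather than MySQL. The table is called temp_rows here (an assumed name, chosen to stay clear of SQLite's temp keyword), and the two rows are a guess at the question's data: the physical account filled in on the first row and the logical account on the second.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table temp_rows (
        sequence_number  integer,
        breakdown_number integer,
        physical_account text,
        logical_account  text,
        debit_amount     integer,
        credit_amount    integer
    );
    create table second_table (physical_account text, logical_account text);

    insert into temp_rows values (1, 1, '10001', null, 10, 0);
    insert into temp_rows values (2, 1, null, '0011', 10, 0);

    -- MAX ignores NULLs, so each CASE keeps the single value from its own row.
    insert into second_table (physical_account, logical_account)
    select max(case when t.sequence_number = 1 then t.physical_account end),
           max(case when t.sequence_number = 2 then t.logical_account end)
    from temp_rows t
    where t.sequence_number in (1, 2)
    group by t.breakdown_number;
""")

merged = conn.execute(
    "select physical_account, logical_account from second_table"
).fetchall()
print(merged)
```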
Finally, in some cases, you can just use a hack:
INSERT INTO second_table(physical_account, logical_account)
SELECT SUBSTRING_INDEX(GROUP_CONCAT(t.physical_account), ',', 1),
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(t.logical_account), ',', 2), ',', -1)
FROM Temp t
WHERE t.sequence_number IN (1, 2)
GROUP BY t.breakdown_number;
Notes about this approach:
It converts the accounts to strings, if they are of some other type.
group_concat() has a (configurable) maximum length, so if there are many records for a given breakdown_number, then you can get a run-time error.
You can use a sub query in the select with LIMIT OFFSET:
INSERT INTO second_table (physical_account, logical_account)
SELECT t.physical_account,
(SELECT s.logical_account FROM temp s
ORDER BY s.breakdown_number
LIMIT 1,1)
FROM Temp t
ORDER BY t.breakdown_number
LIMIT 1
This will select the first and second values, ordered by breakdown_number ascending.
DECLARE @physical_account varchar(30); /* data type as required */
DECLARE @logical_account varchar(30);
SELECT @physical_account = physical_account FROM Temp WHERE logical_account IS NULL AND physical_account = '10001';
SELECT @logical_account = logical_account FROM Temp WHERE logical_account = '0011' AND physical_account IS NULL;
INSERT INTO Table_New(physical_account, logical_account) VALUES(@physical_account, @logical_account);
So I was taking a test recently with some higher level SQL problems. I only have what I would consider "intermediate" experience in SQL and I've been working on this for a day or so now. I just can't figure it out.
Here's the problem:
You have a table with 4 columns as such:
EmployeeID int unique
EmployeeType int
EmployeeSalary int
Created date
Goal: I need to retrieve the difference between the latest two EmployeeSalary for any EmployeeType with more than 1 entry. It has to be done in one statement (nested queries are fine).
Example Data Set: http://sqlfiddle.com/#!9/0dfc7
EmployeeID | EmployeeType | EmployeeSalary | Created
-----------|--------------|----------------|--------------------
1 | 53 | 50 | 2015-11-15 00:00:00
2 | 66 | 20 | 2014-11-11 04:20:23
3 | 66 | 30 | 2015-11-03 08:26:21
4 | 66 | 10 | 2013-11-02 11:32:47
5 | 78 | 70 | 2009-11-08 04:47:47
6 | 78 | 45 | 2006-11-01 04:42:55
So for this data set, the proper return would be:
EmployeeType | EmployeeSalary
-------------|---------------
66 | 10
78 | 25
The 10 comes from subtracting the latest two EmployeeSalary values (30 - 20) for the EmployeeType of 66. The 25 comes from subtracting the latest two EmployeeSalary values (70-45) for EmployeeType of 78. We skip EmployeeID 53 completely because it only has one value.
This one has been destroying my brain. Any clues?
Thanks!
How to make really simple query complex?
One funny way (not the best performance) to do it is:
SELECT final.EmployeeType, SUM(salary) AS difference
FROM (
SELECT b.EmployeeType, b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 1
UNION ALL
SELECT b.EmployeeType, -b.EmployeeSalary AS salary
FROM tab b
JOIN (SELECT EmployeeType, GROUP_CONCAT(EmployeeSalary ORDER BY Created DESC) AS c
FROM tab
GROUP BY EmployeeType
HAVING COUNT(*) > 1) AS sub
ON b.EmployeeType = sub.EmployeeType
AND FIND_IN_SET(b.EmployeeSalary, sub.c) = 2
) AS final
GROUP BY final.EmployeeType;
SqlFiddleDemo
EDIT:
The key point is that MySQL (before 8.0) doesn't support window functions, so you need to use equivalent code:
For example solution in SQL Server:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (SELECT *,
ROW_NUMBER() OVER(PARTITION BY EmployeeType ORDER BY Created DESC) AS rn
FROM #tab
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1
LiveDemo
And MySQL equivalent:
SELECT EmployeeType, SUM(CASE rn WHEN 1 THEN EmployeeSalary
ELSE -EmployeeSalary END) AS difference
FROM (
SELECT t1.EmployeeType, t1.EmployeeSalary,
count(t2.Created) + 1 as rn
FROM tab t1
LEFT JOIN tab t2
ON t1.EmployeeType = t2.EmployeeType
AND t1.Created < t2.Created
GROUP BY t1.EmployeeType, t1.EmployeeSalary
) AS sub
WHERE rn IN (1,2)
GROUP BY EmployeeType
HAVING COUNT(EmployeeType) > 1;
LiveDemo2
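For a quick check of the self-join ranking against the data set from the question, the sketch below runs the same statement through Python's sqlite3 module instead of MySQL. The table name tab follows the first query above; the in-memory setup and the added ORDER BY are assumptions for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tab (EmployeeID integer primary key, EmployeeType integer, "
    "EmployeeSalary integer, Created text)"
)
conn.executemany(
    "insert into tab values (?, ?, ?, ?)",
    [
        (1, 53, 50, "2015-11-15 00:00:00"),
        (2, 66, 20, "2014-11-11 04:20:23"),
        (3, 66, 30, "2015-11-03 08:26:21"),
        (4, 66, 10, "2013-11-02 11:32:47"),
        (5, 78, 70, "2009-11-08 04:47:47"),
        (6, 78, 45, "2006-11-01 04:42:55"),
    ],
)

query = """
select EmployeeType,
       sum(case rn when 1 then EmployeeSalary else -EmployeeSalary end) as difference
from (
    -- rn = 1 + number of newer rows of the same type, i.e. rank by Created desc
    select t1.EmployeeType, t1.EmployeeSalary, count(t2.Created) + 1 as rn
    from tab t1
    left join tab t2
      on t1.EmployeeType = t2.EmployeeType
     and t1.Created < t2.Created
    group by t1.EmployeeType, t1.EmployeeSalary
) as sub
where rn in (1, 2)
group by EmployeeType
having count(EmployeeType) > 1
order by EmployeeType
"""
differences = conn.execute(query).fetchall()
print(differences)
```

One thing to watch: the inner query groups by EmployeeType and EmployeeSalary, so two rows of the same type with an identical salary would be merged into one group and the ranking would be off for that type.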
The dataset of the fiddle is different from the example above, which is confusing (not to mention a little perverse). Anyway, there's lots of ways to skin this particular cat. Here's one (not the fastest, however):
SELECT a.employeetype, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) a
JOIN
( SELECT x.*
, COUNT(*) rank
FROM employees x
JOIN employees y
ON y.employeetype = x.employeetype
AND y.created >= x.created
GROUP
BY x.employeetype
, x.created
) b
ON b.employeetype = a.employeetype
AND b.rank = a.rank+1
WHERE a.rank = 1;
A very similar but faster solution looks like this (although you sometimes need to assign different variables between tables a and b, for reasons I still don't fully understand)...
SELECT a.employeetype
, ABS(a.employeesalary-b.employeesalary) diff
FROM
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) a
JOIN
( SELECT x.*
, CASE WHEN @prev = x.employeetype THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := x.employeetype prev
FROM employees x
, (SELECT @prev := 0, @i:=1) vars
ORDER
BY x.employeetype
, x.created DESC
) b
ON b.employeetype = a.employeetype
AND b.i = a.i + 1
WHERE a.i = 1;
id value
---------
1 a
2 b
3 c
4 a
5 t
6 y
7 a
I want to select all rows where the value is 'a' and the row before it
id value
---------
1 a
3 c
4 a
6 y
7 a
I looked into
but I want to get all such rows in one query.
Please help me start
Thank you
I think the easiest way might be to use variables:
select t.*
from (select t.*,
(@rn := if(value = 'a', 1, @rn + 1)) as rn
from table t cross join
(select @rn := 0) params
order by id desc
) t
where rn in (1, 2)
order by id;
An alternative method uses a correlated subquery to get the next row's value and then uses this in the where clause, so that a row is kept when it is 'a' itself or when the row right after it is 'a':
select t.*
from (select t.*,
(select t2.value
from table t2
where t2.id > t.id
order by t2.id
limit 1
) as next_value
from table t
) t
where value = 'a' or next_value = 'a';
With an index on id, this might even be faster than the method using variables.
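As a rough check of the correlated-subquery idea on the question's rows, the sketch below looks up the value of the next row by id, so that the row just before each 'a' is kept along with the 'a' rows. It runs through Python's sqlite3 module rather than MySQL, and the table name items is an assumption, since the question does not name its table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, value text)")
conn.executemany(
    "insert into items (id, value) values (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "a"), (5, "t"), (6, "y"), (7, "a")],
)

query = """
select id, value
from (
    select t.id, t.value,
           -- value of the closest row with a larger id, NULL for the last row
           (select t2.value
            from items t2
            where t2.id > t.id
            order by t2.id
            limit 1) as next_value
    from items t
) as with_next
where value = 'a' or next_value = 'a'
order by id
"""
selected = conn.execute(query).fetchall()
print(selected)
```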