Let's say I have the following very simple schema:
Create Table MyTable (
PrimaryKey int,
Column1 datetime,
Column2 int
)
I need a query that orders the data based on Column1, and finds the first 10 consecutive rows where the value of Column2 in the current row is greater than the value of column2 in the prior row.
Q assigns a ranking value rn ordered by Column1, with PrimaryKey added as a tie-breaker in case there are ties in Column1. C is a recursive CTE that walks the rows in rn order, incrementing cc each time Column2 increases and resetting it to 1 otherwise. The recursion stops when cc reaches 10. The final select then takes the last 10 rows from C. The where clause handles the case where there is no run of 10 consecutive increasing values; in that case no rows are returned.
with Q as
(
select PrimaryKey,
Column1,
Column2,
row_number() over(order by Column1, PrimaryKey) as rn
from MyTable
),
C as
(
select PrimaryKey,
Column1,
Column2,
rn,
1 as cc
from Q
where rn = 1
union all
select Q.PrimaryKey,
Q.Column1,
Q.Column2,
Q.rn,
case
when Q.Column2 > C.Column2 then C.cc + 1
else 1
end
from Q
inner join C
on Q.rn - 1 = C.rn
where C.cc < 10
)
select top 10 *
from C
where 10 in (select cc from C)
order by rn desc
option (maxrecursion 0)
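For readers who want to check the logic outside the database, here is a minimal Python sketch of the same counter idea. The function name first_increasing_run and the tuple layout (PrimaryKey, Column1, Column2) are my own assumptions for illustration, not part of the query above.

```python
def first_increasing_run(rows, run_length=10):
    """Return the first `run_length` consecutive rows with rising Column2.

    rows: iterable of (primary_key, column1, column2) tuples (assumed layout).
    """
    # Same ordering as row_number() over(order by Column1, PrimaryKey)
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    run = []
    for row in ordered:
        if run and row[2] > run[-1][2]:
            run.append(row)   # corresponds to cc + 1
        else:
            run = [row]       # corresponds to resetting cc to 1
        if len(run) == run_length:
            return run        # corresponds to stopping at cc = 10
    return []                 # no run of the requested length


if __name__ == "__main__":
    sample = [(i, i, v) for i, v in enumerate([5, 1, 2, 3, 2, 4, 6, 8], start=1)]
    print(first_increasing_run(sample, run_length=3))
```

Like the SQL, it returns nothing when no run of the requested length exists.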
Version 2
As Martin Smith pointed out in a comment, the above query has really bad performance. The culprit is the first CTE. The version below uses a table variable to hold the ranked rows. The primary key constraint on rn creates an index that is used by the join in the recursive part of the query. Apart from the table variable, this does the same as above.
declare @T table
(
PrimaryKey int,
Column1 datetime,
Column2 int,
rn int primary key
);
insert into @T
select PrimaryKey,
Column1,
Column2,
row_number() over(order by Column1, PrimaryKey) as rn
from MyTable;
with C as
(
select PrimaryKey,
Column1,
Column2,
rn,
1 as cc
from @T
where rn = 1
union all
select T.PrimaryKey,
T.Column1,
T.Column2,
T.rn,
case
when T.Column2 > C.Column2 then C.cc + 1
else 1
end
from @T as T
inner join C
on T.rn = C.rn + 1
where C.cc < 10
)
select top 10 *
from C
where 10 in (select cc from C)
order by rn desc
option (maxrecursion 0)
I have a list of values in a column and want to collapse them into ranges.
E.g., if the values are 1,2,3,4,5,9,11,12,13,14,17,18,19
I want to display
1-5,9,11-14,17-19
Assuming that each value is stored on a separate row, you can use some gaps-and-island technique here:
select case when min(val) <> max(val)
then concat(min(val), '-', max(val))
else min(val)
end val_range
from (select val, row_number() over(order by val) rn from mytable) t
group by val - rn
order by min(val)
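The idea is to build groups of consecutive values by taking the difference between each value and an incrementing rank, which is computed using row_number() (available in MySQL 8.0). Within a run of consecutive values that difference stays constant, so it can serve as the group key.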
Demo on DB Fiddle:
| val_range |
| :-------- |
| 1-5 |
| 9 |
| 11-14 |
| 17-19 |
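To see why the value-minus-rank trick works, here is a small Python sketch of the same grouping. The function name collapse_ranges is made up for illustration, and like the query it assumes the values are distinct.

```python
from itertools import groupby


def collapse_ranges(values):
    """Collapse distinct integers into 'lo-hi' strings via value minus rank."""
    ordered = sorted(values)
    # enumerate(..., start=1) plays the role of row_number() over(order by val)
    ranked = [(val, rn) for rn, val in enumerate(ordered, start=1)]
    ranges = []
    # val - rn is constant inside each run of consecutive values
    for _, group in groupby(ranked, key=lambda pair: pair[0] - pair[1]):
        vals = [val for val, _ in group]
        lo, hi = vals[0], vals[-1]
        ranges.append(str(lo) if lo == hi else f"{lo}-{hi}")
    return ranges


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 9, 11, 12, 13, 14, 17, 18, 19]
    print(",".join(collapse_ranges(data)))  # prints 1-5,9,11-14,17-19
```

For the sample data the differences are 0 for 1..5, 3 for 9, 4 for 11..14 and 6 for 17..19, which gives the four groups.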
In earlier versions, you can emulate row_number() with a correlated subquery, or a user variable. The second option goes like:
select case when min(val) <> max(val)
then concat(min(val), '-', max(val))
else min(val)
end val_range
from (select @rn := 0) x
cross join (
select val, @rn := @rn + 1 rn
from (select val from mytable order by val) t
) t
group by val - rn
order by min(val)
As a complement to other answers:
select dn.val as dnval, min(up.val) as upval
from mytable up
join mytable dn
on dn.val <= up.val
where not exists (select 1 from mytable a where a.val = up.val + 1)
and not exists (select 1 from mytable b where b.val = dn.val - 1)
group by dn.val
order by dn.val;
1 5
9 9
11 14
17 19
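The same start/end pairing can be sketched in Python to make the two NOT EXISTS conditions easier to follow. The function name islands is hypothetical, and this is only a rough model of the self join, not a replacement for it.

```python
def islands(values):
    """Pair each island start with the nearest island end at or above it."""
    present = set(values)
    # dn: values with no predecessor (not exists val - 1)
    starts = sorted(v for v in present if v - 1 not in present)
    # up: values with no successor (not exists val + 1)
    ends = sorted(v for v in present if v + 1 not in present)
    # join on dn.val <= up.val, then min(up.val) per dn.val
    return [(s, min(e for e in ends if e >= s)) for s in starts]


if __name__ == "__main__":
    print(islands([1, 2, 3, 4, 5, 9, 11, 12, 13, 14, 17, 18, 19]))
```

Every start has at least one end at or above it, because the largest value of its own run is always an end.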
Needless to say, using an OLAP function as @GNB does is orders of magnitude more efficient.
A short article on how to mimic OLAP functions in MySQL < 8 can be found at:
mysql-row_number
Fiddle
EDIT:
If another dimension is introduced (in this case p), something like:
select dn.p, dn.val as dnval, min(up.val) as upval
from mytable up
join mytable dn
on dn.val <= up.val
and dn.p = up.p
where not exists (select 1 from mytable a where a.val = up.val + 1 and a.p = up.p)
and not exists (select 1 from mytable b where b.val = dn.val - 1 and b.p = dn.p)
group by dn.p, dn.val
order by dn.p, dn.val;
can be used, see Fiddle2
The goal
I am trying to write a query to find duplicate rows. A row is duplicate when either Column A or Column B is the same.
Writing it so that both need to be the same is easy; just a simple GROUP BY A, B.
However, filtering by just one of the two is proving to be a bit more difficult. How would one go about doing this?
I've tried the following:
select distinct a as col_a,
b as col_b,
(
select count(*)
from table_name
where a = col_a
or b = col_b
) as duplicate_count
from table_name
having duplicate_count > 1;
but it does not feel like the right way to go about this, and with 84,000 rows it is also very slow.
Example
With the following table:
+----+------------------------+---+---------+
| id | name | a | b |
+----+------------------------+---+---------+
| 1 | Lorem ipsum | 1 | Donec |
+----+------------------------+---+---------+
| 2 | dolor sit | 2 | rhoncus |
+----+------------------------+---+---------+
| 3 | amet | 3 | rhoncus |
+----+------------------------+---+---------+
| 4 | consectetur adipiscing | 1 | primis |
+----+------------------------+---+---------+
| 5 | vulputate cursus | 4 | Aliquam |
+----+------------------------+---+---------+
Either result 1 or 4 (same A) and either result 2 or 3 (same B) should be returned, both with a duplicate_count of 2.
Which one of the two "duplicates" is returned does not matter.
Versions
On my local machine I use MySQL 5.7.24.
I just checked the live server, it uses 10.1.43-MariaDB.
You already know that this query:
select a, b
from tablename
group by a, b
having count(*) > 1
returns duplicates with both a and b equal.
You can get the rest of the duplicates for your requirement with EXISTS:
select t.a, t.b
from tablename t
where exists (
select 1 from tablename
where (a = t.a and b <> t.b) or (a <> t.a and b = t.b)
)
Or if you want them all use UNION ALL:
select a, b
from tablename
group by a, b
having count(*) > 1
union all
select t.a, t.b
from tablename t
where exists (
select 1 from tablename
where (a = t.a and b <> t.b) or (a <> t.a and b = t.b)
)
Update:
If you have an ID column then use EXISTS like this:
select t.*
from tablename t
where exists (
select 1 from tablename
where id <> t.id and (a = t.a or b = t.b)
)
Or if you want just 1 of the duplicates use id > t.id instead of id <> t.id.
See the demo.
Or with a self join:
select t.*
from tablename t inner join tablename tt
on (tt.a = t.a or tt.b = t.b) and tt.id <> t.id
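As a quick sanity check of the "same a or same b as some other row" condition, here is a Python sketch run against the example table from the question. The function name and the dict layout are assumptions for illustration; it also assumes unique ids and no NULLs in a or b.

```python
from collections import Counter


def rows_sharing_a_or_b(rows):
    """Keep rows whose a or b value also appears on at least one other row."""
    a_counts = Counter(r["a"] for r in rows)
    b_counts = Counter(r["b"] for r in rows)
    # A count above 1 means another row (a different id) has the same value
    return [r for r in rows if a_counts[r["a"]] > 1 or b_counts[r["b"]] > 1]


if __name__ == "__main__":
    table = [
        {"id": 1, "a": 1, "b": "Donec"},
        {"id": 2, "a": 2, "b": "rhoncus"},
        {"id": 3, "a": 3, "b": "rhoncus"},
        {"id": 4, "a": 1, "b": "primis"},
        {"id": 5, "a": 4, "b": "Aliquam"},
    ]
    print([r["id"] for r in rows_sharing_a_or_b(table)])  # prints [1, 2, 3, 4]
```

This mirrors the id <> t.id variant, so it returns every member of each duplicate pair rather than one row per pair.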
The following solution works:
Another demo with a line that has duplication in a and b
CREATE TEMPORARY TABLE ab_duplicates (
a INTEGER
) AS
SELECT a, count(*) as cnt
FROM tablename
group by a, b
Having cnt > 1;
ALTER TABLE ab_duplicates ADD INDEX (a);
-- Select duplicates for a, but not for a and b
SELECT id, name, a, b
FROM (SELECT x.*, t.id, t.name, t.a, t.b,
@rn := IF(t.a = @a, @rn + 1, 1) rn,
@a := t.a,
ab.a as ab_exists
FROM (select @a := null, @rn := 0) x,
tablename t
LEFT JOIN ab_duplicates ab on ab.a = t.a
ORDER BY a
) a_duplicates
where rn = 2 and ab_exists is null
UNION
-- union duplicates for b, including duplicates for a and b
SELECT id, name, a, b
FROM (SELECT x.*, t.id, t.name, t.a, t.b,
@rn := IF(t.b = @b, @rn + 1, 1) rn,
@b := t.b
FROM (select @b := null, @rn := 0) x,
tablename t
ORDER BY b
) b_and_ab_duplicates
where rn = 2;
Previous solutions that only worked in some edge cases
Using group by and count() :
First finding ids with duplicates for a :
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- this will work better if you have an index starting with a
Same with b :
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
-- this will work better if you have an index starting with b
First solution :
Union gives you ids where there are duplicates for a or b (requires 2 indices)
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
UNION
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
Use the ids to filter the table, if you need more data from the table :
SELECT tablename.*
FROM (
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
UNION
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) as ids
JOIN tablename on tablename.id = ids.id
Now this might not use an index, but you can use a temporary table to have one :
First solution, using a temporary table (might be faster) :
-- using a temporary table to set an index
CREATE TEMPORARY TABLE ids (
-- adds an index on id, for the JOIN in the result query
`id` INTEGER PRIMARY KEY
) as
SELECT id
FROM (
-- duplicates on a, requires an index (a) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- removes duplicates between both part of the UNION : this might be slow
-- if there cannot be duplicates on a and b at the same time, consider using UNION ALL
UNION
-- duplicates on b, requires an index (b) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) tempids;
SELECT tablename.*
FROM ids -- using the temporary table, MUST be in the same database connection, will filter duplicates
JOIN tablename on tablename.id = ids.id;
I do not know if setting the index on the temporary table up front is better than setting one after populating the data :
-- you might want to postpone the index after the ids are set
-- using a temporary table to set an index
CREATE TEMPORARY TABLE ids2 (
`id` INTEGER
) as
SELECT id
FROM (
-- duplicates on a, requires an index (a) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- removes duplicates between both part of the UNION : this might be slow
-- if there cannot be duplicates on a and b at the same time, consider using UNION ALL
UNION
-- duplicates on b, requires an index (b) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) tempids;
ALTER TABLE ids2 ADD INDEX (id);
SELECT tablename.*
FROM ids2 -- using the temporary table, MUST be in the same database connection, will filter duplicates
JOIN tablename on tablename.id = ids2.id;
With MariaDB 10.2 or MySQL 8 you could use window functions (I guess).
Another solution : using vars :
SELECT id, name, a, b, rn
FROM (SELECT *,
@rn := IF(a = @a, @rn + 1, 1) rn,
@a := a
FROM (select @a := null, @rn := 0) x,
tablename
ORDER BY a
) a_duplicates
where rn = 2
UNION
SELECT id, name, a, b, rn
FROM (SELECT *,
@rn := IF(b = @b, @rn + 1, 1) rn,
@b := b
FROM (select @b := null, @rn := 0) x,
tablename
ORDER BY b
) b_duplicates
where rn = 2
Demo : with some extra steps to understand
Edit : this only works if you don't have lines where both a and b are duplicated. That holds for the example.
id value
---------
1 a
2 b
3 c
4 a
5 t
6 y
7 a
I want to select all rows where the value is 'a', plus the row before each of them:
id value
---------
1 a
3 c
4 a
6 y
7 a
I looked into
but I want to get all such rows in one query.
Please help me start
Thank you
I think the easiest way might be to use variables:
select t.*
from (select t.*,
(@rn := if(value = 'a', 1, @rn + 1)) as rn
from table t cross join
(select @rn := 0) params
order by id desc
) t
where rn in (1, 2)
order by id;
An alternative method uses a correlated subquery to get the value of the next row and then uses this in the where clause, so that the row just before each 'a' is kept:
select t.*
from (select t.*,
(select t2.value
from table t2
where t2.id > t.id
order by t2.id asc
limit 1
) as next_value
from table t
) t
where value = 'a' or next_value = 'a';
With an index on id, this might even be faster than the method using variables.
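As a cross-check against the expected output in the question, here is a short Python sketch of the rule "keep a row if it is 'a' or if the row right after it is 'a'". The function name is hypothetical, and rows are assumed to be (id, value) tuples with unique ids.

```python
def rows_with_target_and_previous(rows, target="a"):
    """Keep rows equal to `target` plus the row immediately before each one."""
    ordered = sorted(rows)  # order by id
    keep = []
    for i, (row_id, value) in enumerate(ordered):
        # Value of the following row, or None for the last row
        next_value = ordered[i + 1][1] if i + 1 < len(ordered) else None
        if value == target or next_value == target:
            keep.append((row_id, value))
    return keep


if __name__ == "__main__":
    data = [(1, "a"), (2, "b"), (3, "c"), (4, "a"), (5, "t"), (6, "y"), (7, "a")]
    print(rows_with_target_and_previous(data))
```

For the sample table this keeps ids 1, 3, 4, 6 and 7, which matches the result the question asks for.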
I need to fill the Code field of table BabyCode, which has 400 rows of data, with the values A01, B03, Z11 and X21, repeating in that order.
Above is the current table, without values in the 'Code' column
Above is the table to be updated, with the repeating values added to the 'Code' column
You can do this:
INSERT INTO BabyCode
SELECT Codes.Code
FROM
(
SELECT id
FROM
(
SELECT t3.digit * 100 + t2.digit * 10 + t1.digit + 1 AS id
FROM TEMP AS t1
CROSS JOIN TEMP AS t2
CROSS JOIN TEMP AS t3
) t
WHERE id <= 400
) t,
(
SELECT 1 AS ID, 'A01' AS Code
UNION ALL
SELECT 2, 'B03'
UNION ALL
SELECT 3, 'Z11'
UNION ALL
SELECT 4, 'X21'
) codes;
But you will need to define a temp table, to use as an anchor table:
CREATE TABLE TEMP (Digit int);
INSERT INTO Temp VALUES(0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
SQL Fiddle Demo
This inserts the values A01, B03, Z11, and X21 into the Code column of the table BabyCode. Note that the comma join pairs each of the 400 generated ids with all four codes.
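To make the digit cross join less mysterious, here is a small Python sketch of how three copies of the digits 0..9 generate the ids 1..1000, and how many rows the join with the four codes yields. The variable names are made up for illustration.

```python
from itertools import product

DIGITS = range(10)                      # contents of the TEMP anchor table
CODES = ["A01", "B03", "Z11", "X21"]    # the four-row derived table

# t3.digit * 100 + t2.digit * 10 + t1.digit + 1 over the triple cross join
all_ids = sorted(t3 * 100 + t2 * 10 + t1 + 1
                 for t1, t2, t3 in product(DIGITS, repeat=3))

# WHERE id <= 400
ids_up_to_400 = [i for i in all_ids if i <= 400]

# The comma join between the ids and the codes is a cross join
joined = list(product(ids_up_to_400, CODES))

if __name__ == "__main__":
    print(all_ids[0], all_ids[-1], len(ids_up_to_400), len(joined))
```

Because the last join has no condition, each id is combined with every code, so the row count is the product of the two sides.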
You could put the four values into a virtual table identical to that used in @Mahmoud Gamal's answer, and, if the ID values in your table start at 1 and are sequential (have neither gaps nor duplicates), you could use the following method to join to the virtual table and update the target's Code column:
UPDATE YourTable t
INNER JOIN (
SELECT 1 AS ID, 'A01' AS Code
UNION ALL SELECT 2, 'B03'
UNION ALL SELECT 3, 'Z11'
UNION ALL SELECT 4, 'X21'
) x
ON (t.ID - 1) MOD 4 + 1 = x.ID
SET t.Code = x.Code
;
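The join condition (t.ID - 1) MOD 4 + 1 = x.ID simply cycles through 1, 2, 3, 4 as ID grows. A minimal Python sketch of that mapping, with a hypothetical helper name and assuming IDs start at 1 without gaps:

```python
CODES = {1: "A01", 2: "B03", 3: "Z11", 4: "X21"}  # the virtual table x


def code_for_id(row_id):
    """Map a sequential ID (starting at 1) to its repeating code."""
    slot = (row_id - 1) % 4 + 1   # cycles 1, 2, 3, 4, 1, 2, ...
    return CODES[slot]


if __name__ == "__main__":
    print([code_for_id(i) for i in range(1, 7)])
    # prints ['A01', 'B03', 'Z11', 'X21', 'A01', 'B03']
```

Subtracting 1 before taking the remainder and adding 1 afterwards keeps the slot in the range 1..4 instead of 0..3.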
Otherwise you could use variables to assign 1, 2, 3, 4 sequentially to every row of your table; you would then be able to join to the virtual table using those values:
UPDATE YourTable t
INNER JOIN (
SELECT ID, @rnk := CASE WHEN @rnk = 4 THEN 0 ELSE @rnk END + 1 AS rnk
FROM YourTable
CROSS JOIN (SELECT @rnk := 0) x
ORDER BY ID
) r ON t.ID = r.ID
INNER JOIN (
SELECT 1 AS ID, 'A01' AS Code
UNION ALL SELECT 2, 'B03'
UNION ALL SELECT 3, 'Z11'
UNION ALL SELECT 4, 'X21'
) x
ON r.rnk = x.ID
SET t.Code = x.Code
;
Both queries can be played with at SQL Fiddle:
Method 1
Method 2
I want to find the longest sequence of consecutive letters in a string,
e.g. for the word Honorificabcdwert the output should be abcd.
How can I do it?
My idea is to get the ASCII value of each character and then count the sequence until it breaks at some point. But I was only able to get as far as
DECLARE @t TABLE(ID INT IDENTITY,String VARCHAR(100))
INSERT INTO @t SELECT 'Honorificabcdwert'
;with Get_Individual_Chars_Cte AS
(
SELECT
ID
,Row_ID =ROW_NUMBER() Over(PARTITION by ID Order by ID)
,SUBSTRING(String,Number,1) AS [Char]
,ASCII(SUBSTRING(String,Number,1)) AS [Ascii Value]
FROM @t
INNER JOIN master.dbo.spt_values ON
Number BETWEEN 1 AND LEN(String)
AND type='P'
)
Select * from Get_Individual_Chars_Cte
After this I don't know what to do. Help needed for this or any other way of doing so.
Will this help?
DECLARE @t TABLE(ID INT IDENTITY,String VARCHAR(100))
INSERT INTO @t
SELECT 'Honorificabcdwert' UNION ALL
SELECT 'AbCdEfxy' UNION ALL
SELECT 'abc1234defg' UNION ALL
SELECT 'XYZABCPPCKLMIDBABC' UNION ALL
SELECT 'MNOP$%^&~()MNOPQRS;:'
SELECT ID, OriginalString,Sequence
FROM (SELECT ID, REPLACE(string,'%','') AS Sequence,OriginalString,
ROW_NUMBER() OVER(PARTITION BY ID ORDER BY LEN(string) DESC, string) AS rn
FROM (SELECT OriginalString = b.String, CASE WHEN b.String LIKE a.strings THEN a.strings ELSE NULL END AS string,
b.ID, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY LEN(strings) DESC, strings) AS rn
FROM (SELECT COALESCE('%' + b.strings+a.strings + '%','%' + a.strings + '%') AS strings
FROM (SELECT SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ',t1.N,t2.N-t1.N+1) AS strings, t1.N
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),
(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),
(23),(24),(25),(26)) t1(N)
CROSS JOIN (VALUES(1),(2),(3),(4),(5),(6),(7),(8),
(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),
(23),(24),(25),(26)) t2(N)
WHERE t1.N <= t2.N) a
LEFT OUTER JOIN (SELECT REVERSE(SUBSTRING('ZYXWVUTSRQPONMLKJIHGFEDCBA',1,N)) AS strings, 1 AS ID
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),
(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),
(23),(24),(25),(26)) t1(N)
UNION ALL SELECT '', 1) b ON a.N = b.ID) a
CROSS JOIN @t b) a ) a
WHERE a.rn = 1
ORDER BY a.ID
Result
ID OriginalString Sequence
1 Honorificabcdwert ABCD
2 AbCdEfxy ABCDEF
3 abc1234defg DEFG
4 XYZABCPPCKLMIDBABC XYZABC
5 MNOP$%^&~()MNOPQRS;: MNOPQRS
Based on the input you provided in the course of the discussion with @Martin Smith, the program has been developed. Please test it and let me know if it satisfies your requirement.
For consecutive rows with characters rising in alphabetical order (equating alphabetical order with ASCII order here), ROW_NUMBER() OVER (ORDER BY Row_ID) - [Ascii Value] will be the same.
This is not sufficient on its own, however: for the string ABCZE it would put E in the same group as ABC, so you need a second operation to find gaps in that grouping sequence.
Something like the following should do it.
DECLARE @t TABLE(ID INT IDENTITY,String VARCHAR(100))
INSERT INTO @t SELECT 'Honorificabcdwfrt'
;with Get_Individual_Chars_Cte AS
(
SELECT
ID
-- order by number so that Row_ID follows the character position deterministically
,Row_ID =ROW_NUMBER() Over(PARTITION by ID Order by number)
,SUBSTRING(String,number,1) AS [Char]
,ASCII(SUBSTRING(String,number,1)) AS [Ascii Value]
FROM @t
INNER JOIN master.dbo.spt_values ON
number BETWEEN 1 AND LEN(String)
AND type='P'
)
)
, T1 AS
(
Select *,
ROW_NUMBER() OVER (ORDER BY Row_ID) - [Ascii Value] AS RN
from Get_Individual_Chars_Cte
), T2 AS
(
SELECT *,
ROW_NUMBER() OVER (ORDER BY Row_ID) -
ROW_NUMBER() OVER (PARTITION BY RN ORDER BY Row_ID) AS Grp
FROM T1
)
SELECT TOP 1 WITH TIES *
FROM T2
ORDER BY COUNT(*) OVER (PARTITION BY RN, Grp) DESC
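The position-minus-character-code idea can be tried out quickly in Python. This is only a sketch with a made-up function name; like the query, it compares raw character codes, so it is case sensitive and would also treat digits such as 1234 as a rising run.

```python
from itertools import groupby


def longest_alphabetical_run(text):
    """Return the longest run of characters with consecutive code points."""
    best = ""
    # index - ord(char) stays constant while each character is exactly
    # one code point above the previous one
    for _, group in groupby(enumerate(text), key=lambda p: p[0] - ord(p[1])):
        run = "".join(ch for _, ch in group)
        if len(run) > len(best):
            best = run   # a strict comparison keeps the first run on ties
    return best


if __name__ == "__main__":
    print(longest_alphabetical_run("Honorificabcdwert"))  # prints abcd
    print(longest_alphabetical_run("ABCZE"))              # prints ABC
```

Because groupby only merges adjacent items with the same key, the E in ABCZE does not get merged into ABC, so the second grouping step that the SQL needs is not required here.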