I am currently using the MERGE code below to migrate data from source to target. I have a new requirement to extend it so that a record is deleted from the source once an update/insert is performed on the target. Is this possible using MERGE? (All the examples I see on the net perform the delete/insert/update on the target, not on the source.)
MERGE Target1 AS T
USING Source1 AS S
ON (T.EmployeeID = S.EmployeeID)
WHEN NOT MATCHED BY TARGET AND S.EmployeeName LIKE 'S%'
THEN INSERT(EmployeeID, EmployeeName) VALUES(S.EmployeeID, S.EmployeeName)
WHEN MATCHED
THEN UPDATE SET T.EmployeeName = S.EmployeeName
WHEN NOT MATCHED BY SOURCE AND T.EmployeeName LIKE 'S%'
THEN DELETE ;
You can use the OUTPUT clause to capture the modified/inserted rows into a table variable and use that with a DELETE statement after the MERGE.
DECLARE @T TABLE(EmployeeID INT);
MERGE Target1 AS T
USING Source1 AS S
ON (T.EmployeeID = S.EmployeeID)
WHEN NOT MATCHED BY TARGET AND S.EmployeeName LIKE 'S%'
THEN INSERT(EmployeeID, EmployeeName) VALUES(S.EmployeeID, S.EmployeeName)
WHEN MATCHED
THEN UPDATE SET T.EmployeeName = S.EmployeeName
WHEN NOT MATCHED BY SOURCE AND T.EmployeeName LIKE 'S%'
THEN DELETE
OUTPUT S.EmployeeID INTO @T;
DELETE Source1
WHERE EmployeeID IN (SELECT EmployeeID
FROM @T);
Nice response, but your code will delete rows from your destination table. Here's an example in which you can delete the rows from your source table without affecting your target table:
if OBJECT_ID('audit.tmp1') IS NOT NULL
DROP TABLE audit.tmp1
select *
into audit.tmp1
from
(
select 1 id, 'aa' nom, convert(date,'2014-01-01') as dd UNION ALL
select 2 id, 'bb' nom, convert(date,'2013-07-12') as dd UNION ALL
select 3 id, 'cc' nom, convert(date,'2012-08-21') as dd UNION ALL
select 4 id, 'dd' nom, convert(date,'2011-11-15') as dd UNION ALL
select 5 id, 'ee' nom, convert(date,'2010-05-16') as dd ) T
if OBJECT_ID('audit.tmp2') IS NOT NULL
DROP TABLE audit.tmp2
select *
into audit.tmp2
from
(
select 1 id, 'aAa' nom, convert(date,'2014-01-14') as dd UNION ALL
select 2 id, 'bbB' nom, convert(date,'2013-06-13') as dd UNION ALL
select 4 id, 'dDD' nom, convert(date,'2012-11-05') as dd UNION ALL
select 6 id, 'FFf' nom, convert(date,'2014-01-12') as dd) T
SELECT * FROM audit.tmp1 order by 1
SELECT * FROM audit.tmp2 order by 1
DECLARE @T TABLE(ID INT);
MERGE audit.tmp2 WITH (HOLDLOCK) AS T
USING (SELECT * FROM audit.tmp1 WHERE nom <> 'dd') AS S
ON (T.id = S.id)
WHEN NOT MATCHED BY TARGET
THEN INSERT(id, nom, dd) VALUES(S.id, S.nom, S.dd)
WHEN MATCHED
THEN UPDATE SET T.nom = S.nom, T.dd = S.dd
WHEN NOT MATCHED BY SOURCE
THEN UPDATE SET T.id = T.id OUTPUT S.id INTO @T;
DELETE tmp1
FROM audit.tmp1
INNER JOIN
@T AS DEL
ON DEL.id = tmp1.id
SELECT * FROM audit.tmp1 ORDER BY 1
SELECT * FROM audit.tmp2 ORDER BY 1
I hope this will help you.
In our case, we wanted to use MERGE to synchronize our internal database with an outside source of a different structure. Automated CASCADE settings were not an option because we enjoy many cyclical relationships and, really, we don't like that kind of cheap power in the hands of disgruntled staffers. We can't delete parent rows before their child rows are gone.
All of this is done with lightning-fast MERGEs that use Table-Valued Parameters (TVPs). They provide, by far, the best performance with obscenely low app memory overhead.
Combining scattered advice for the MERGE of Orders data...
CREATE PROCEDURE MyOrderMerge @SourceValues [MyOrderSqlUserType] READONLY
AS
BEGIN
DECLARE @LiveRows TABLE (MergeAction VARCHAR(20), OrderId INT);
DECLARE @DeleteCount INT;
SET @DeleteCount = 0;
MERGE INTO [Order] AS [target]
USING ( SELECT sv.OrderNumber,
               c.CustomerId,
               st.ShipTypeId,
               sv.OrderDate,
               sv.IsPriority
        FROM @SourceValues sv
        JOIN [Customer] c ON sv.[CustomerName] = c.[CustomerName]
        JOIN [ShipType] st ON ...
      ) AS [stream]
ON [stream].[OrderNumber] = [target].[SourceOrderNumber]
WHEN MATCHED THEN
UPDATE
...
WHEN NOT MATCHED BY TARGET THEN
INSERT
---
-- Keep a tally of all active source records
-- SQL Server's "INSERTED." prefix encompasses both INSERTed and UPDATEd rows <insert very bad words here>
OUTPUT $action, INSERTED.[OrderId] INTO @LiveRows
; -- MERGE has ended
-- Delete child OrderItem rows before parent Order rows
DELETE FROM [OrderItem]
FROM [OrderItem] oi
-- Delete the Order Items that no longer exist at the source
LEFT JOIN @LiveRows lr ON oi.[OrderId] = lr.[OrderId]
WHERE lr.OrderId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
-- Delete parent Order rows that no longer have child Order Item rows
DELETE FROM [Order]
FROM [Order] o
-- Delete the Orders that no longer exist at the source
LEFT JOIN @LiveRows lr ON o.[OrderId] = lr.[OrderId]
WHERE lr.OrderId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
SELECT MergeAction, COUNT(*) AS ActionCount FROM @LiveRows GROUP BY MergeAction
UNION
SELECT 'DELETE' AS MergeAction, @DeleteCount AS ActionCount
;
END
Everything is done in one sweet loop-dee-loop streamed round trip and highly optimized on key indexes. Even though internal primary key values are unknown from the source, the MERGE operation makes them available to the DELETE operations.
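For context, here is a minimal sketch of what the table type and the call site might look like. The column list is inferred from the USING query above; the ShipTypeName column, the column types, and the sample values are assumptions, since the real type definition isn't shown.
-- Hypothetical table type; columns inferred from the USING query above
CREATE TYPE [MyOrderSqlUserType] AS TABLE
(
    OrderNumber  INT          NOT NULL,
    CustomerName VARCHAR(100) NOT NULL,
    ShipTypeName VARCHAR(50)  NOT NULL,  -- assumed join key for [ShipType]
    OrderDate    DATETIME     NOT NULL,
    IsPriority   BIT          NOT NULL
);
GO
-- Calling the procedure with a TVP from T-SQL (ADO.NET can pass a DataTable the same way)
DECLARE @Orders AS [MyOrderSqlUserType];
INSERT INTO @Orders (OrderNumber, CustomerName, ShipTypeName, OrderDate, IsPriority)
VALUES (1001, 'Acme Corp', 'Ground', GETDATE(), 0);
EXEC MyOrderMerge @SourceValues = @Orders;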
The Customer MERGE uses a different @LiveRows TABLE structure, and consequently a different OUTPUT statement and different DELETE statements...
CREATE PROCEDURE MyCustomerMerge @SourceValues [MyCustomerSqlUserType] READONLY
AS
BEGIN
DECLARE @LiveRows TABLE (MergeAction VARCHAR(20), CustomerId INT);
DECLARE @DeleteCount INT;
SET @DeleteCount = 0;
MERGE INTO [Customer] AS [target]
...
OUTPUT $action, INSERTED.[CustomerId] INTO @LiveRows
; -- MERGE has ended
-- Delete child OrderItem rows before parent Order rows
DELETE FROM [OrderItem]
FROM [OrderItem] oi
JOIN [Order] o ON oi.[OrderId] = o.[OrderId]
-- Delete the Order Items that no longer exist at the source
LEFT JOIN @LiveRows lr ON o.[CustomerId] = lr.[CustomerId]
WHERE lr.CustomerId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
-- Delete child Order rows before parent Customer rows
DELETE FROM [Order]
FROM [Order] o
-- Delete the Orders that no longer exist at the source
LEFT JOIN @LiveRows lr ON o.[CustomerId] = lr.[CustomerId]
WHERE lr.CustomerId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
-- Delete parent Customer rows that no longer have child Order or grandchild Order Item rows
DELETE FROM [Customer]
FROM [Customer] c
-- Delete the Customers that no longer exist at the source
LEFT JOIN @LiveRows lr ON c.[CustomerId] = lr.[CustomerId]
WHERE lr.CustomerId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
SELECT MergeAction, COUNT(*) AS ActionCount FROM @LiveRows GROUP BY MergeAction
UNION
SELECT 'DELETE' AS MergeAction, @DeleteCount AS ActionCount
;
END
Setup and maintenance is a bit of a pain - but so worth the efficiencies reaped.
You can also use the code below:
drop table energydata
create table temp_energydata
(
webmeterID int,
DT DateTime ,
kWh varchar(10)
)
Insert into temp_energydata
select 1,getdate()-10, 120
union
select 2,getdate()-9, 140
union
select 3,getdate()-6, 37
union
select 4,getdate()-3, 40
union
select 5,getdate()-1, 240
create table energydata
(
webmeterID int,
DT DateTime ,
kWh varchar(10)
)
Insert into energydata (webmeterID,kWh)
select 1, 120
union
select 2, 140
union
select 3, 37
union
select 4, 40
select * from energydata
select * from temp_energydata
begin tran ABC
DECLARE @T TABLE(ID INT);
MERGE INTO dbo.energydata WITH (HOLDLOCK) AS target
USING dbo.temp_energydata AS source
ON target.webmeterID = source.webmeterID
AND target.kWh = source.kWh
WHEN MATCHED THEN
UPDATE SET target.DT = source.DT
WHEN NOT MATCHED BY source THEN delete
OUTPUT source.webmeterID INTO @T;
DELETE temp_energydata
WHERE webmeterID IN (SELECT webmeterID
FROM @T);
--INSERT (webmeterID, DT, kWh)
--VALUES (source.webmeterID, source.DT, source.kWh)
rollback tran ABC
commit tran ABC
Related
I'd like to find the first "gap" in a counter column in an SQL table. For example, if there are values 1,2,4 and 5 I'd like to find out 3.
I can of course get the values in order and go through it manually, but I'd like to know if there would be a way to do it in SQL.
In addition, it should be quite standard SQL, working with different DBMSes.
In MySQL and PostgreSQL:
SELECT id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
LIMIT 1
In SQL Server:
SELECT TOP 1
id + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
In Oracle:
SELECT *
FROM (
SELECT id + 1 AS gap
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
ORDER BY
id
)
WHERE rownum = 1
ANSI (works everywhere, least efficient):
SELECT MIN(id) + 1
FROM mytable mo
WHERE NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
Systems supporting sliding window functions:
SELECT -- TOP 1
-- Uncomment above for SQL Server 2012+
previd
FROM (
SELECT id,
LAG(id) OVER (ORDER BY id) previd
FROM mytable
) q
WHERE previd <> id - 1
ORDER BY
id
-- LIMIT 1
-- Uncomment above for PostgreSQL
These answers all work fine only if the id values start at 1; otherwise the gap before the minimum will not be detected. For instance, if your table's id values are 3, 4, 5, the queries will return 6.
I did something like this
SELECT MIN(ID + 1) FROM (
    SELECT 0 AS ID UNION ALL
    SELECT ID FROM TableX) AS T1
WHERE
    ID + 1 NOT IN (SELECT ID FROM TableX)
There isn't really an extremely standard SQL way to do this, but with some form of limiting clause you can do
SELECT `table`.`num` + 1
FROM `table`
LEFT JOIN `table` AS `alt`
ON `alt`.`num` = `table`.`num` + 1
WHERE `alt`.`num` IS NULL
LIMIT 1
(MySQL, PostgreSQL)
or
SELECT TOP 1 [num] + 1
FROM [table]
LEFT JOIN [table] AS [alt]
ON [alt].[num] = [table].[num] + 1
WHERE [alt].[num] IS NULL
(SQL Server)
or
SELECT mytable.num + 1
FROM mytable
LEFT JOIN mytable alt
ON alt.num = mytable.num + 1
WHERE alt.num IS NULL
AND ROWNUM = 1
(Oracle)
The first thing that came into my head. Not sure if it's a good idea to go this way at all, but should work. Suppose the table is t and the column is c:
SELECT
t1.c + 1 AS gap
FROM t as t1
LEFT OUTER JOIN t as t2 ON (t1.c + 1 = t2.c)
WHERE t2.c IS NULL
ORDER BY gap ASC
LIMIT 1
Edit: This one may be a tick faster (and shorter!):
SELECT
min(t1.c) + 1 AS gap
FROM t as t1
LEFT OUTER JOIN t as t2 ON (t1.c + 1 = t2.c)
WHERE t2.c IS NULL
This works in SQL Server - can't test it in other systems but it seems standard...
SELECT MIN(t1.ID)+1 FROM mytable t1 WHERE NOT EXISTS (SELECT ID FROM mytable WHERE ID = (t1.ID + 1))
You could also add a starting point to the where clause...
SELECT MIN(t1.ID)+1 FROM mytable t1 WHERE NOT EXISTS (SELECT ID FROM mytable WHERE ID = (t1.ID + 1)) AND ID > 2000
So if you had 2000, 2001, 2002, and 2005 where 2003 and 2004 didn't exist, it would return 2003.
The following solution provides test data, an inner query that produces all the gaps, and it works in SQL Server 2012.
It numbers the ordered rows sequentially in the WITH clause and then reuses the result twice with an inner join on the row number, offset by 1, so as to compare each row with the one after it, looking for IDs with a gap greater than 1. More than asked for, but more widely applicable.
create table #ID ( id integer );
insert into #ID values (1),(2), (4),(5),(6),(7),(8), (12),(13),(14),(15);
with Source as (
select
row_number()over ( order by A.id ) as seq
,A.id as id
from #ID as A WITH(NOLOCK)
)
Select top 1 gap_start from (
Select
(J.id+1) as gap_start
,(K.id-1) as gap_end
from Source as J
inner join Source as K
on (J.seq+1) = K.seq
where (J.id - (K.id-1)) <> 0
) as G
The inner query produces:
gap_start gap_end
3 3
9 11
The outer query produces:
gap_start
3
Inner join to a view or sequence that has all possible values.
No table? Make a table. I always keep a dummy table around just for this.
create table artificial_range(
id int not null primary key auto_increment,
name varchar( 20 ) null ) ;
-- or whatever your database requires for an auto increment column
insert into artificial_range( name ) values ( null )
-- create one row.
insert into artificial_range( name ) select name from artificial_range;
-- you now have two rows
insert into artificial_range( name ) select name from artificial_range;
-- you now have four rows
insert into artificial_range( name ) select name from artificial_range;
-- you now have eight rows
--etc.
insert into artificial_range( name ) select name from artificial_range;
-- you now have 1024 rows, with ids 1-1024
Then,
select a.id from artificial_range a
where not exists ( select * from your_table b
where b.counter = a.id) ;
This one accounts for everything mentioned so far. It includes 0 as a starting point, which it will default to if no values exist as well. I also added the appropriate locations for the other parts of a multi-value key. This has only been tested on SQL Server.
select
MIN(ID)
from (
select
0 ID
union all
select
[YourIdColumn]+1
from
[YourTable]
where
--Filter the rest of your key--
) foo
left join
[YourTable]
on [YourIdColumn]=ID
and --Filter the rest of your key--
where
[YourIdColumn] is null
For PostgreSQL
An example that makes use of recursive query.
This might be useful if you want to find a gap in a specific range
(it will work even if the table is empty, whereas the other examples will not)
WITH
RECURSIVE a(id) AS (VALUES (1) UNION ALL SELECT id + 1 FROM a WHERE id < 100), -- range 1..100
b AS (SELECT id FROM my_table) -- your table ID list
SELECT a.id -- find numbers from the range that do not exist in main table
FROM a
LEFT JOIN b ON b.id = a.id
WHERE b.id IS NULL
-- LIMIT 1 -- uncomment if only the first value is needed
My guess:
SELECT MIN(p1.field) + 1 as gap
FROM table1 AS p1
INNER JOIN table1 as p3 ON (p1.field = p3.field + 2)
LEFT OUTER JOIN table1 AS p2 ON (p1.field = p2.field + 1)
WHERE p2.field is null;
I wrote up a quick way of doing it. Not sure it's the most efficient, but it gets the job done. Note that it does not tell you the gap itself, but tells you the id before and after the gap (keep in mind the gap could span multiple values, for example 1, 2, 4, 7, 11, etc.).
I'm using sqlite as an example
If this is your table structure
create table sequential(id int not null, name varchar(10) null);
and these are your rows
id|name
1|one
2|two
4|four
5|five
9|nine
The query is
select a.*
from sequential a
left join sequential b on a.id = b.id + 1
where b.id is null
and a.id <> (select min(id) from sequential)
union
select a.*
from sequential a
left join sequential b on a.id = b.id - 1
where b.id is null
and a.id <> (select max(id) from sequential);
https://gist.github.com/wkimeria/7787ffe84d1c54216f1b320996b17b7e
Here is an alternative that shows the ranges of all possible gap values in a portable and more compact way:
Assume your table schema looks like this :
> SELECT id FROM your_table;
+-----+
| id |
+-----+
| 90 |
| 103 |
| 104 |
| 118 |
| 119 |
| 120 |
| 121 |
| 161 |
| 162 |
| 163 |
| 185 |
+-----+
To fetch the ranges of all possible gap values, you have the following query.
The subquery lists pairs of ids where the lowerbound column is smaller than the upperbound column, then uses GROUP BY and MIN(m2.id) to reduce the number of useless records.
The outer query further removes the records where lowerbound is exactly upperbound - 1.
My query doesn't (explicitly) output the 2 records (YOUR_MIN_ID_VALUE, 89) and (186, YOUR_MAX_ID_VALUE) at both ends, which implicitly means any number in either of those ranges hasn't been used in your_table so far.
> SELECT m3.lowerbound + 1, m3.upperbound - 1 FROM
(
SELECT m1.id as lowerbound, MIN(m2.id) as upperbound FROM
your_table m1 INNER JOIN your_table
AS m2 ON m1.id < m2.id GROUP BY m1.id
)
m3 WHERE m3.lowerbound < m3.upperbound - 1;
+-------------------+-------------------+
| m3.lowerbound + 1 | m3.upperbound - 1 |
+-------------------+-------------------+
| 91 | 102 |
| 105 | 117 |
| 122 | 160 |
| 164 | 184 |
+-------------------+-------------------+
select min([ColumnName]) from [TableName]
where [ColumnName]-1 not in (select [ColumnName] from [TableName])
and [ColumnName] <> (select min([ColumnName]) from [TableName])
Here is a standard SQL solution that runs on all database servers with no change:
select min(counter + 1) FIRST_GAP
from my_table a
where not exists (select 'x' from my_table b where b.counter = a.counter + 1)
and a.counter <> (select max(c.counter) from my_table c);
See it in action for:
PL/SQL via Oracle's livesql,
MySQL via sqlfiddle,
PostgreSQL via sqlfiddle,
MS SQL via sqlfiddle.
It works for empty tables or with negative values as well. Just tested in SQL Server 2012.
select min(n) from (
select case when lead(i,1,0) over(order by i)>i+1 then i+1 else null end n from MyTable) w
If you use Firebird 3, this is the most elegant and simple solution:
select RowID
from (
select ID_Column, Row_Number() over(order by ID_Column) as RowID
from Your_Table
order by ID_Column) t
where ID_Column <> RowID
rows 1
-- PUT THE TABLE NAME AND COLUMN NAME BELOW
-- IN MY EXAMPLE, THE TABLE NAME IS = SHOW_GAPS AND COLUMN NAME IS = ID
-- PUT THESE TWO VALUES AND EXECUTE THE QUERY
DECLARE @TABLE_NAME VARCHAR(100) = 'SHOW_GAPS'
DECLARE @COLUMN_NAME VARCHAR(100) = 'ID'
DECLARE @SQL VARCHAR(MAX)
SET @SQL =
'SELECT TOP 1
'+@COLUMN_NAME+' + 1
FROM '+@TABLE_NAME+' mo
WHERE NOT EXISTS
(
SELECT NULL
FROM '+@TABLE_NAME+' mi
WHERE mi.'+@COLUMN_NAME+' = mo.'+@COLUMN_NAME+' + 1
)
ORDER BY
'+@COLUMN_NAME
-- SELECT @SQL
DECLARE @MISSING_ID TABLE (ID INT)
INSERT INTO @MISSING_ID
EXEC (@SQL)
--select * from @MISSING_ID
declare @var_for_cursor int
DECLARE @LOW INT
DECLARE @HIGH INT
DECLARE @FINAL_RANGE TABLE (LOWER_MISSING_RANGE INT, HIGHER_MISSING_RANGE INT)
DECLARE IdentityGapCursor CURSOR FOR
select * from @MISSING_ID
ORDER BY 1;
open IdentityGapCursor
fetch next from IdentityGapCursor
into @var_for_cursor
WHILE @@FETCH_STATUS = 0
BEGIN
SET @SQL = '
DECLARE @LOW INT
SELECT @LOW = MAX('+@COLUMN_NAME+') + 1 FROM '+@TABLE_NAME
+' WHERE '+@COLUMN_NAME+' < ' + cast( @var_for_cursor as VARCHAR(MAX))
SET @SQL = @sql + '
DECLARE @HIGH INT
SELECT @HIGH = MIN('+@COLUMN_NAME+') - 1 FROM '+@TABLE_NAME
+' WHERE '+@COLUMN_NAME+' > ' + cast( @var_for_cursor as VARCHAR(MAX))
SET @SQL = @sql + 'SELECT @LOW,@HIGH'
INSERT INTO @FINAL_RANGE
EXEC( @SQL)
fetch next from IdentityGapCursor
into @var_for_cursor
END
CLOSE IdentityGapCursor;
DEALLOCATE IdentityGapCursor;
SELECT ROW_NUMBER() OVER(ORDER BY LOWER_MISSING_RANGE) AS 'Gap Number',* FROM @FINAL_RANGE
I found most of these approaches run very, very slowly in MySQL. Here is my solution for MySQL < 8.0 (a window-function sketch for 8.0+ follows this answer). Tested on 1M records with a gap near the end: about 1 second to finish. Not sure if it fits other SQL flavours.
SELECT cardNumber - 1
FROM
(SELECT @row_number := 0) as t,
(
SELECT (@row_number:=@row_number+1), cardNumber, cardNumber-@row_number AS diff
FROM cards
ORDER BY cardNumber
) as x
WHERE diff >= 1
LIMIT 0,1
I assume that sequence starts from `1`.
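For MySQL 8.0+ (or any engine with window functions), a hedged sketch of the same idea using LEAD, assuming the same cards table as above:
SELECT cardNumber + 1 AS first_gap
FROM (
    SELECT cardNumber,
           LEAD(cardNumber) OVER (ORDER BY cardNumber) AS next_card
    FROM cards
) AS x
WHERE next_card > cardNumber + 1
ORDER BY cardNumber
LIMIT 1;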
If your counter starts from 1 and you want to generate the first number of the sequence (1) when the table is empty, here is the corrected piece of code from the first answer, valid for Oracle:
SELECT
NVL(MIN(id + 1),1) AS gap
FROM
mytable mo
WHERE 1=1
AND NOT EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = mo.id + 1
)
AND EXISTS
(
SELECT NULL
FROM mytable mi
WHERE mi.id = 1
)
DECLARE @Table AS TABLE(
[Value] int
)
INSERT INTO @Table ([Value])
VALUES
(1),(2),(4),(5),(6),(10),(20),(21),(22),(50),(51),(52),(53),(54),(55)
--Gaps
--Start End Size
--3 3 1
--7 9 3
--11 19 9
--23 49 27
SELECT [startTable].[Value]+1 [Start]
,[EndTable].[Value]-1 [End]
,([EndTable].[Value]-1) - ([startTable].[Value]) Size
FROM
(
SELECT [Value]
,ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [Value]) Record
FROM @Table
)AS startTable
JOIN
(
SELECT [Value]
,ROW_NUMBER() OVER(PARTITION BY 1 ORDER BY [Value]) Record
FROM @Table
)AS EndTable
ON [EndTable].Record = [startTable].Record+1
WHERE [startTable].[Value]+1 <>[EndTable].[Value]
If the numbers in the column are positive integers (starting from 1), then here is how to solve it easily (assuming ID is your column name):
SELECT TEMP.ID
FROM (SELECT ROW_NUMBER() OVER () AS NUM FROM 'TABLE-NAME') AS TEMP
WHERE ID NOT IN (SELECT ID FROM 'TABLE-NAME')
ORDER BY 1 ASC LIMIT 1
SELECT ID+1 FROM table WHERE ID+1 NOT IN (SELECT ID FROM table) ORDER BY 1;
Write a SQL statement which can generate the list of customers whose minutesStreamed is consistently less than the previous minutesStreamed. That is, minutesStreamed in the nth order is less than minutesStreamed in the (n-1)th order, and the order before that is also less. Another way to say it: list the customers that watch fewer and fewer minutes each time they watch a movie.
I have come up with the following query:
select distinct c1.customer_Id
from Customer c1
join Customer c2
where c1.customer_Id = c2.customer_Id
and c1.purchaseDate > c2.purchaseDate
and c1.minutesStreamed < c2.minutesStreamed;
This query doesn't deal with the (n-1)th and (n-2)th comparison, i.e. the condition that minutesStreamed in the nth order is less than in the (n-1)th order, and the order before that is also less.
I have attached a link for sqlfiddle, where I have created the table.
Hello Continuous Learner,
the following statement works for the n-1 and n-2 relation.
select distinct c1.customer_Id
from Customer c1
join Customer c2
on c1.customer_Id = c2.customer_Id
join Customer c3
on c1.customer_Id = c3.customer_Id
where c1.purchaseDate < c2.purchaseDate
and c1.minutesStreamed > c2.minutesStreamed
and c2.purchaseDate < c3.purchaseDate
and c2.minutesStreamed > c3.minutesStreamed
I currently don't have an automatic solution for this problem, though.
Cheers
I would use a ROW_NUMBER() function partitioned by customer id,
and then do a self join on customer id and rank = rank - 1, to bring the new and old rows to the same level.
Like:
create table temp_rank_table as
(
select
customer_Id,
purchaseDate ,
minutesStreamed,
ROW_NUMBER() OVER (PARTITION BY customer_Id ORDER BY purchaseDate, minutesStreamed) as cust_row
from Customer
)
-- self join
select customer_Id
from ( select
newval.customer_Id,
sum(case when newval.minutesStreamed < oldval.minutesStreamed then 1 else 0 end) as LessThanPrevCount,
max(newval.cust_row) as totalStreamCount
from temp_rank_table newval
left join temp_rank_table oldval
on newval.customer_id = oldval.customer_id
and newval.cust_row-1 = oldval.cust_row -- cust_row 2 matches to cust_row 1
group by newval.customer_id
)A
where A.LessThanPrevCount = (A.totalStreamCount-1)
-- get customers who always stream lesser than previous
--you can use having clause instead of a subquery too
DECLARE @TBL AS TABLE ( [NO] INT, [CODE] VARCHAR(50), [AREA] VARCHAR(50) )

/* EXAMPLE 1 */
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (1,'001','A00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (2,'001','A00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (3,'001','B00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (4,'001','C00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (5,'001','C00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (6,'001','A00')
INSERT INTO @TBL([NO],[CODE],[AREA]) VALUES (7,'001','A00')

/* EXAMPLE 2 ***** USE THIS CODE TO ENTER DATA FROM A DIRECT TABLE *****
SELECT ROW_NUMBER() OVER(ORDER BY [FIELD_DATE]) AS [NO]
      ,[FIELD_CODE] AS [CODE]
      ,[FIELD_AREA] AS [AREA]
FROM TABLE_A
WHERE CAST([FIELD_DATE] AS DATE) >= CAST('20200307' AS DATE)
ORDER BY [FIELD_DATE],[FIELD_CODE]
*/

SELECT A.NO AS ANO
      ,A.CODE AS ACODE
      ,A.AREA AS AAREA
      ,B.NO AS BNO
      ,B.CODE AS BCODE
      ,B.AREA AS BAREA
      ,CASE WHEN A.AREA = B.AREA THEN 'EQUAL' ELSE 'NOT EQUAL' END AS [COMPARE AREA]
FROM @TBL A
LEFT JOIN @TBL B ON A.NO = B.NO + 1
I have a table that contains two columns
ID | Name
----------------
1 | John
2 | Sam
3 | Peter
6 | Mike
It has missing IDs; in this case these are 4 and 5.
How do I find and insert them together with random names into this table?
Update: cursors and temp tables are not allowed. The random name should be 'Name_' + some random number, or maybe a specified value like 'Abby'; it doesn't really matter.
Using a recursive CTE you can determine the missing IDs as follows
DECLARE @Table TABLE(
ID INT,
Name VARCHAR(10)
)
INSERT INTO @Table VALUES (1, 'John'),(2, 'Sam'),(3,'Peter'),(6, 'Mike')
DECLARE @StartID INT,
@EndID INT
SELECT @StartID = MIN(ID),
@EndID = MAX(ID)
FROM @Table
;WITH IDS AS (
SELECT @StartID IDEntry
UNION ALL
SELECT IDEntry + 1
FROM IDS
WHERE IDEntry + 1 <= @EndID
)
SELECT IDS.IDEntry [ID]
FROM IDS LEFT JOIN
@Table t ON IDS.IDEntry = t.ID
WHERE t.ID IS NULL
OPTION (MAXRECURSION 0)
The OPTION (MAXRECURSION 0) hint allows the code to avoid the recursion limit of SQL Server.
From Query Hints and WITH common_table_expression (Transact-SQL)
MAXRECURSION number Specifies the maximum number of recursions
allowed for this query. number is a nonnegative integer between 0 and
32767. When 0 is specified, no limit is applied. If this option is not specified, the default limit for the server is 100.
When the specified or default number for MAXRECURSION limit is reached
during query execution, the query is ended and an error is returned.
Because of this error, all effects of the statement are rolled back.
If the statement is a SELECT statement, partial results or no results
may be returned. Any partial results returned may not include all rows
on recursion levels beyond the specified maximum recursion level.
Generating the RANDOM names will largely be affected by the requirements of such a name, and the column type of such a name. What exactly does this random name entail?
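If 'Name_' plus a random number is acceptable (as the question's update suggests), a hedged sketch that turns the gap query above into an INSERT could look like the following. The name format, the CHECKSUM(NEWID()) trick, and the reuse of @Table/@StartID/@EndID from the snippet above are illustrative assumptions.
;WITH IDS AS (
    SELECT @StartID AS IDEntry
    UNION ALL
    SELECT IDEntry + 1
    FROM IDS
    WHERE IDEntry + 1 <= @EndID
)
INSERT INTO @Table (ID, Name)
SELECT IDS.IDEntry,
       -- 'Name_' plus a pseudo-random number per row
       'Name_' + CAST(ABS(CHECKSUM(NEWID())) % 100000 AS VARCHAR(10))
FROM IDS
LEFT JOIN @Table t ON IDS.IDEntry = t.ID
WHERE t.ID IS NULL
OPTION (MAXRECURSION 0);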
You can do this using a recursive Common Table Expression (CTE). Here's an example of how:
DECLARE @MaxId INT
SELECT @MaxId = MAX(ID) from MyTable
;WITH Numbers(Number) AS
(
SELECT 1
UNION ALL
SELECT Number + 1 FROM Numbers WHERE Number < @MaxId
)
)
SELECT n.Number, 'Random Name'
FROM Numbers n
LEFT OUTER JOIN MyTable t ON n.Number=t.ID
WHERE t.ID IS NULL
Here are a couple of articles about CTEs that will be helpful: Using Common Table Expressions and Recursive Queries Using Common Table Expressions.
Start by selecting the highest number in the table (select top 1 id ordered descending, or select max(id)), then run a while loop to iterate from 1 to that maximum.
See this article about looping.
For each iteration, see if the row exists, and if not, insert a row with that ID into the table. A rough sketch of the loop is below.
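A minimal sketch of that loop, assuming a table MyTable(ID, Name) and the 'Name_' convention from the question:
DECLARE @i INT = 1,
        @max INT;
SELECT @max = MAX(ID) FROM MyTable;
WHILE @i <= @max
BEGIN
    -- Insert the ID only if it is missing
    IF NOT EXISTS (SELECT 1 FROM MyTable WHERE ID = @i)
        INSERT INTO MyTable (ID, Name)
        VALUES (@i, 'Name_' + CAST(@i AS VARCHAR(10)));
    SET @i = @i + 1;
END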
I think a recursive CTE is a better solution because it's going to be faster, but here is what worked for me:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TestTable]') AND type in (N'U'))
DROP TABLE [dbo].[TestTable]
GO
CREATE TABLE [dbo].[TestTable](
[Id] [int] NOT NULL,
[Name] [varchar](50) NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
))
GO
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (1, 'John')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (2, 'Sam')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (3, 'Peter')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (6, 'Mike')
GO
declare @mod int
select @mod = MAX(number)+1 from master..spt_values where [type] = 'P'
INSERT INTO [dbo].[TestTable]
SELECT y.Id,'Name_' + cast(newid() as varchar(45)) Name from
(
SELECT TOP (select MAX(Id) from [dbo].[TestTable]) x.Id from
(
SELECT
t1.number*@mod + t2.number Id
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
WHERE t1.[type] = 'P' and t2.[type] = 'P'
) x
WHERE x.Id > 0
ORDER BY x.Id
) y
LEFT JOIN [dbo].[TestTable] on [TestTable].Id = y.Id
where [TestTable].Id IS NULL
GO
select * from [dbo].[TestTable]
order by Id
GO
http://www.sqlfiddle.com/#!3/46c7b/18
It's actually very simple:
Create a table called #All_numbers which should contain all the natural numbers in the range that you are looking for.
#list is a table containing your data. (A sketch for populating #All_numbers follows the query below.)
select a.num as missing_number ,
'Random_Name' + convert(varchar, a.num)
from #All_numbers a left outer join #list l on a.num = l.Id
where l.id is null
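For completeness, a hedged sketch of one way to fill #All_numbers (SQL Server syntax; the 1-to-1000 range is an assumption):
-- Generate numbers 1..1000 into #All_numbers using a row-number trick
SELECT TOP (1000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS num
INTO #All_numbers
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;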
I have records in my table like
memberId PersonId Year
4057 1787 2
4502 1787 3
I want a result from a query like this
memberId1 MemberId2 PersonId
4057 4502 1787
How do I write such a query?
Don't do this in a query, do it in the application layer.
Don't do this in SQL. At best you can try:
SELECT table1.memberId memberId1, table2.memberId MemberId2, PersonId
FROM table table1 JOIN table table2 USING (PersonId)
But it won't do what you expect if you have more than 2 Members for a person. (It will return every possible combination.)
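If every person has at most two members, one hedged workaround is to aggregate instead of self-joining (assuming the table is called myTable and the smaller memberId should land in memberId1):
SELECT PersonId,
       MIN(memberId) AS memberId1,
       MAX(memberId) AS memberId2
FROM myTable
GROUP BY PersonId
HAVING COUNT(*) = 2;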
Below is an example of how to do it directly in SQL. Mind that there is plenty of room for optimisation, but this version should be rather fast, especially if you have an index on PersonID and Year.
SELECT DISTINCT PersonID,
memberId1 = Convert(int, NULL),
memberId2 = Convert(int, NULL)
INTO #result
FROM myTable
WHERE Year IN (2 , 3)
CREATE UNIQUE CLUSTERED INDEX uq0_result ON #result (PersonID)
UPDATE #result
SET memberId1 = t.memberId
FROM #result upd
JOIN myTable t
ON t.PersonId = upd.PersonID
AND t.Year = 2
UPDATE #result
SET memberId2 = t.memberId
FROM #result upd
JOIN myTable t
ON t.PersonId = upd.PersonID
AND t.Year = 3
SELECT * FROM #result
If you want all member ids for each person_id, you could use the FOR XML PATH clause (great functionality)
to concatenate all memberIds into a string:
select distinct PersonId
, (select ' '+cast(t0.MemberId as varchar)
from table t0
where t0.PersonId=t1.PersonId
for xml path('')
) [Member Ids]
from table t1
resulting in:
PersonId Members Ids
1787 ' 4057 4502'
If you really need separate columns, with an unlimited number of memberIds, consider using
a PIVOT table, though that is far more complex to use; a sketch for the fixed two-member case follows.
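For the fixed two-rows-per-person case in the sample data, a hedged PIVOT sketch might look like this (it keys on the Year column, assumes the years are 2 and 3 as in the question, and assumes the table is called myTable):
SELECT PersonId,
       [2] AS memberId1,
       [3] AS memberId2
FROM (SELECT PersonId, memberId, [Year] FROM myTable) AS src
PIVOT (MAX(memberId) FOR [Year] IN ([2], [3])) AS p;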
I have unique id and email fields. Emails get duplicated. I only want to keep one Email address of all the duplicates but with the latest id (the last inserted record).
How can I achieve this?
Imagine your table test contains the following data:
select id, email
from test;
ID EMAIL
---------------------- --------------------
1 aaa
2 bbb
3 ccc
4 bbb
5 ddd
6 eee
7 aaa
8 aaa
9 eee
So, we need to find all repeated emails and delete all of them, but the latest id.
In this case, aaa, bbb and eee are repeated, so we want to delete IDs 1, 7, 2 and 6.
To accomplish this, first we need to find all the repeated emails:
select email
from test
group by email
having count(*) > 1;
EMAIL
--------------------
aaa
bbb
eee
Then, from this dataset, we need to find the latest id for each one of these repeated emails:
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email;
LASTID EMAIL
---------------------- --------------------
8 aaa
4 bbb
9 eee
Finally we can now delete all of these emails with an Id smaller than LASTID. So the solution is:
delete test
from test
inner join (
select max(id) as lastId, email
from test
where email in (
select email
from test
group by email
having count(*) > 1
)
group by email
) duplic on duplic.email = test.email
where test.id < duplic.lastId;
I don't have MySQL installed on this machine right now, but it should work.
Update
The above delete works, but I found a more optimized version:
delete test
from test
inner join (
select max(id) as lastId, email
from test
group by email
having count(*) > 1) duplic on duplic.email = test.email
where test.id < duplic.lastId;
You can see that it deletes the oldest duplicates, i.e. 1, 7, 2, 6:
select * from test;
+----+-------+
| id | email |
+----+-------+
| 3 | ccc |
| 4 | bbb |
| 5 | ddd |
| 8 | aaa |
| 9 | eee |
+----+-------+
Another version is the delete provided by Rene Limon:
delete from test
where id not in (
select max(id)
from test
group by email)
Try this method
DELETE t1 FROM test t1, test t2
WHERE t1.id > t2.id AND t1.email = t2.email
The correct way is:
DELETE FROM `tablename`
WHERE `id` NOT IN (
SELECT * FROM (
SELECT MAX(`id`) FROM `tablename`
GROUP BY `name`
)
)
DELETE
FROM
`tbl_job_title`
WHERE id NOT IN
(SELECT
*
FROM
(SELECT
MAX(id)
FROM
`tbl_job_title`
GROUP BY NAME) tbl)
Revised and working version! Thank you @Gaurav.
If you want to keep the row with the lowest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.email = n2.email
If you want to keep the row with the highest id value:
DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.email = n2.email
or this query might also help
DELETE FROM `yourTableName`
WHERE id NOT IN (
SELECT * FROM (
SELECT MAX(id) FROM yourTableName
GROUP BY name
)
)
I must say that the optimized version is one sweet, elegant piece of code, and it works like a charm even when the comparison is performed on a DATETIME column. This is what I used in my script, where I was searching for the latest contract end date for each EmployeeID:
DELETE CurrentContractData
FROM CurrentContractData
INNER JOIN (
SELECT
EmployeeID,
PeriodofPerformanceStartDate,
max(PeriodofPerformanceEndDate) as lastDate,
ContractID
FROM CurrentContractData
GROUP BY EmployeeID
HAVING COUNT(*) > 1) Duplicate on Duplicate.EmployeeID = CurrentContractData.EmployeeID
WHERE CurrentContractData.PeriodofPerformanceEndDate < Duplicate.lastDate;
Many thanks!
I personally had trouble with the top two voted answers. It's not the cleanest solution but you can utilize temporary tables to avoid all the issues MySQL has with deleting via joining on the same table.
CREATE TEMPORARY TABLE deleteRows
SELECT MIN(id) as id FROM myTable GROUP BY myTable.email;
DELETE FROM myTable
WHERE id NOT IN (SELECT id FROM deleteRows);
DELIMITER //
CREATE FUNCTION findColumnNames(tableName VARCHAR(255))
RETURNS TEXT
BEGIN
SET @colNames = "";
SELECT GROUP_CONCAT(COLUMN_NAME) FROM INFORMATION_SCHEMA.columns
WHERE TABLE_NAME = tableName
GROUP BY TABLE_NAME INTO @colNames;
RETURN @colNames;
END //
DELIMITER ;
DELIMITER //
CREATE PROCEDURE deleteDuplicateRecords (IN tableName VARCHAR(255))
BEGIN
SET @colNames = findColumnNames(tableName);
SET @addIDStmt = CONCAT("ALTER TABLE ",tableName," ADD COLUMN id INT AUTO_INCREMENT KEY;");
SET @deleteDupsStmt = CONCAT("DELETE FROM ",tableName," WHERE id NOT IN
( SELECT * FROM ",
" (SELECT min(id) FROM ",tableName," group by ",findColumnNames(tableName),") AS tmpTable);");
SET @dropIDStmt = CONCAT("ALTER TABLE ",tableName," DROP COLUMN id");
PREPARE addIDStmt FROM @addIDStmt;
EXECUTE addIDStmt;
PREPARE deleteDupsStmt FROM @deleteDupsStmt;
EXECUTE deleteDupsStmt;
PREPARE dropIDStmt FROM @dropIDStmt;
EXECUTE dropIDStmt;
END //
DELIMITER ;
Nice stored procedure I created for deleting all duplicate records of a table without needing an existing unique id on that table.
CALL deleteDuplicateRecords("yourTableName");
I want to remove duplicate records based on multiple columns in a table, so this approach worked for me.
Step 1 - Get the max id or unique id from the duplicate records:
select * FROM ( SELECT MAX(id) FROM table_name
group by travel_intimation_id,approved_by,approval_type,approval_status having
count(*) > 1) a
Step 2 - Get the ids of single (non-duplicate) records from the table:
select * FROM ( SELECT id FROM table_name
group by travel_intimation_id,approved_by,approval_type,approval_status having
count(*) = 1) b
Step 3 - Exclude the above 2 queries from the DELETE:
DELETE FROM `table_name`
WHERE
id NOT IN (paste the step 1 query) -- to exclude duplicate records
and
id NOT IN (paste the step 2 query) -- to exclude single records
Final query:
DELETE FROM `table_name`
WHERE id NOT IN (
select * FROM ( SELECT MAX(id) FROM table_name
group by travel_intimation_id,approved_by,approval_type,approval_status having
count(*) > 1) a
)
and id not in (
select * FROM ( SELECT id FROM table_name
group by travel_intimation_id,approved_by,approval_type,approval_status having
count(*) = 1) b
);
This query will delete only the duplicate records.
Please try the following solution (based on the comments of the @Jose Rui Santos answer):
-- Set safe mode to false; otherwise MySQL complains:
-- "You are using safe update mode and you tried to update a table without a WHERE that uses a KEY column"
SET SQL_SAFE_UPDATES = 0;
-- Delete the duplicate rows based on the field_with_duplicate_values
-- Keep the unique rows with the highest id
DELETE FROM table_to_deduplicate
WHERE id NOT IN (
SELECT * FROM (
-- Select the highest id grouped by the field_with_duplicate_values
SELECT MAX(id)
FROM table_to_deduplicate
GROUP BY field_with_duplicate_values
)
-- Subquery and alias needed; otherwise MySQL complains:
-- "You can't specify target table 'table_to_deduplicate' for update in FROM clause"
AS table_sub
);
-- Set safe mode to true
SET SQL_SAFE_UPDATES = 1;