My database schema looks like this:
Table t1:
id
valA
valB
Table t2:
id
valA
valB
What I want to do is this: for a given set of rows in one of these tables, find rows in both tables that have the same valA or valB (comparing valA with valA and valB with valB, not valA with valB). Then I want to look for rows with the same valA or valB as the rows in the result of the previous query, and so on.
Example data:
t1 (id, valA, valB):
1, a, B
2, b, J
3, d, E
4, d, B
5, c, G
6, h, J
t2 (id, valA, valB):
1, b, E
2, d, H
3, g, B
Example 1:
Input: Row 1 in t1
Output:
t1/4, t2/3
t1/3, t2/2
t2/1
...
Example 2:
Input: Row 6 in t1
Output:
t1/2
t2/1
I would like the result to include the level of the search at which each row was found (e.g. in Example 1: level 1 for t1/4 and t2/3, level 2 for t1/3 and t2/2, ...). A limited depth of recursion is okay. Over time, I may want to include more tables following the same schema in the query, so it would be nice if the query were easy to extend for that purpose.
But what matters most is performance. Can you tell me the fastest possible way to accomplish this?
Thanks in advance!
Try this. It's not fully tested, but it looked like it was working :P (http://pastie.org/1140339)
drop table if exists t1;
create table t1
(
id int unsigned not null auto_increment primary key,
valA char(1) not null,
valB char(1) not null
)
engine=innodb;
drop table if exists t2;
create table t2
(
id int unsigned not null auto_increment primary key,
valA char(1) not null,
valB char(1) not null
)
engine=innodb;
drop view if exists t12;
create view t12 as
select 1 as tid, id, valA, valB from t1
union
select 2 as tid, id, valA, valB from t2;
insert into t1 (valA, valB) values
('a','B'),
('b','J'),
('d','E'),
('d','B'),
('c','G'),
('h','J');
insert into t2 (valA, valB) values
('b','E'),
('d','H'),
('g','B');
drop procedure if exists find_children;
delimiter #
create procedure find_children
(
in p_tid tinyint unsigned,
in p_id int unsigned
)
proc_main:begin
declare done tinyint unsigned default 0;
declare dpth smallint unsigned default 0;
create temporary table children(
tid tinyint unsigned not null,
id int unsigned not null,
valA char(1) not null,
valB char(1) not null,
depth smallint unsigned default 0,
primary key (tid, id, valA, valB)
)engine = memory;
insert into children select p_tid, t.id, t.valA, t.valB, dpth from t12 t where t.tid = p_tid and t.id = p_id;
create temporary table tmp engine=memory select * from children;
/* http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */
while done <> 1 do
if exists(
select 1 from t12 t
inner join tmp on (tmp.valA = t.valA or tmp.valB = t.valB) and tmp.depth = dpth) then
insert ignore into children
select
t.tid, t.id, t.valA, t.valB, dpth+1
from t12 t
inner join tmp on (tmp.valA = t.valA or tmp.valB = t.valB) and tmp.depth = dpth;
set dpth = dpth + 1;
truncate table tmp;
insert into tmp select * from children where depth = dpth;
else
set done = 1;
end if;
end while;
select * from children order by depth;
drop temporary table if exists children;
drop temporary table if exists tmp;
end proc_main #
delimiter ;
call find_children(1,1);
call find_children(1,6);
You can do it with stored procedures (see listings 7 and 7a):
http://www.artfulsoftware.com/mysqlbook/sampler/mysqled1ch20.html
You just need to figure out a query for the step of the recursion - taking the already-found rows and finding some more rows.
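As a rough illustration of that recursion step, here is a minimal sketch in Python using the standard sqlite3 module rather than a MySQL stored procedure. The names make_db and find_related are made up for this example, and the view is called t12 only to match the other answers. It repeatedly takes the rows found at the previous level and looks for more rows sharing valA or valB, recording the level at which each row first appears.

```python
import sqlite3

def make_db():
    """In-memory copy of the question's sample data plus a unifying view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY, vala TEXT, valb TEXT);
        CREATE TABLE t2 (id INTEGER PRIMARY KEY, vala TEXT, valb TEXT);
        INSERT INTO t1 VALUES (1,'a','B'),(2,'b','J'),(3,'d','E'),
                              (4,'d','B'),(5,'c','G'),(6,'h','J');
        INSERT INTO t2 VALUES (1,'b','E'),(2,'d','H'),(3,'g','B');
        CREATE VIEW t12 AS
            SELECT 't1' AS tbl, id, vala, valb FROM t1
            UNION ALL
            SELECT 't2' AS tbl, id, vala, valb FROM t2;
    """)
    return conn

def find_related(conn, tbl, row_id, max_depth=10):
    """Breadth-first search over t12; returns {(tbl, id): level}."""
    seed = conn.execute(
        "SELECT vala, valb FROM t12 WHERE tbl = ? AND id = ?", (tbl, row_id)
    ).fetchone()
    if seed is None:
        return {}
    levels = {(tbl, row_id): 0}
    frontier = [seed]  # (vala, valb) pairs found at the previous level
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for vala, valb in frontier:
            # The "step" query: rows sharing valA or valB with a known row.
            step = conn.execute(
                "SELECT tbl, id, vala, valb FROM t12 WHERE vala = ? OR valb = ?",
                (vala, valb),
            ).fetchall()
            for t, i, a, b in step:
                if (t, i) not in levels:  # skip rows that were already found
                    levels[(t, i)] = depth
                    next_frontier.append((a, b))
        if not next_frontier:
            break
        frontier = next_frontier
    return levels

for key, level in sorted(find_related(make_db(), "t1", 1).items(), key=lambda kv: kv[1]):
    print(level, key)
```

Adding another table only means adding one more branch to the view. Whether this beats a stored procedure on performance is not something the sketch tries to answer.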
If you had a database which supported SQL-99 recursive common table expressions (like PostgreSQL or Firebird, hint hint), you could take the same approach as in the above link, but using a rCTE as the framework, so avoiding the need to write a stored procedure.
EDIT: I had a go at doing this with an rCTE in PostgreSQL 8.4, and although i can find the rows, i can't find a way to label them with the depth at which they were found. First, i create a view to unify the tables:
create view t12 (tbl, id, vala, valb) as (
(select 't1', id, vala, valb from t1)
union
(select 't2', id, vala, valb from t2)
)
Then do this query:
with recursive descendants (tbl, id, vala, valb) as (
(select *
from t12
where tbl = 't1' and id = 1) -- the query that identifies the seed rows, here just t1/1
union
(select c.*
from descendants p, t12 c
where (p.vala = c.vala or p.valb = c.valb)) -- the recursive term
)
select * from descendants;
You would imagine that capturing depth would be as simple as adding a depth column to the rCTE, set to zero in the seed query, then somehow incremented in the recursive step. However, i couldn't find any way to do that, given that you can't write subqueries against the rCTE in the recursive step (so nothing like select max(depth) + 1 from descendants in the column list), and you can't use an aggregate function in the column list (so no max(p.depth) + 1 in the column list coupled with a group by c.* on the select).
You would also need to add a restriction to the query to exclude already-selected rows; you don't need to do that in the basic version, because of the distincting effect of the union, but if you add a count column, then a row can be included in the results more than once with different counts, and you'll get a Cartesian explosion. But you can't easily prevent it, because you can't have subqueries against the rCTE, which means you can't say anything like and not exists (select * from descendants d where d.tbl = c.tbl and d.id = c.id)!
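One possible workaround, given that the question accepts a limited recursion depth: carry a depth column, cap it in the recursive term so the recursion terminates, and take the minimum depth per row outside the rCTE. The sketch below runs this against SQLite through Python's sqlite3 module. I have not checked it on PostgreSQL 8.4, so treat its portability as an assumption. The helper names make_db and levels_for are invented for the example.

```python
import sqlite3

def make_db():
    """In-memory copy of the question's sample data plus the unifying view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER PRIMARY KEY, vala TEXT, valb TEXT);
        CREATE TABLE t2 (id INTEGER PRIMARY KEY, vala TEXT, valb TEXT);
        INSERT INTO t1 VALUES (1,'a','B'),(2,'b','J'),(3,'d','E'),
                              (4,'d','B'),(5,'c','G'),(6,'h','J');
        INSERT INTO t2 VALUES (1,'b','E'),(2,'d','H'),(3,'g','B');
        CREATE VIEW t12 AS
            SELECT 't1' AS tbl, id, vala, valb FROM t1
            UNION ALL
            SELECT 't2' AS tbl, id, vala, valb FROM t2;
    """)
    return conn

QUERY = """
WITH RECURSIVE descendants (tbl, id, vala, valb, depth) AS (
    SELECT tbl, id, vala, valb, 0 FROM t12 WHERE tbl = ? AND id = ?
    UNION
    SELECT c.tbl, c.id, c.vala, c.valb, p.depth + 1
    FROM descendants p
    JOIN t12 c ON p.vala = c.vala OR p.valb = c.valb
    WHERE p.depth < ?            -- the cap that makes the recursion terminate
)
SELECT tbl, id, MIN(depth) AS level   -- a row may appear at several depths
FROM descendants
GROUP BY tbl, id
ORDER BY level, tbl, id
"""

def levels_for(conn, tbl, row_id, max_depth):
    """Return [(tbl, id, level), ...] for rows reachable within max_depth steps."""
    return conn.execute(QUERY, (tbl, row_id, max_depth)).fetchall()

for row in levels_for(make_db(), "t1", 1, 5):
    print(row)
```

The same row is still generated once per depth at which it can be reached, so the intermediate result grows with the depth cap. The cap keeps it finite, but it does not remove the duplication described above.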
I know all this stuff about recursive queries is of no use to you, but i find it riveting, so please do excuse me.
Related
Lets say I have a table which has a primary key of 1 to X. During the development of the table there are entries which have been deleted, say I deleted entry 5. And hence the table entry will be 1,2,3,4,6,7,8. Is there a query to find all id of the primary key which has been skipped?
Take a look at this link; it has a solution to your problem.
Quoted from the link:
select l.id + 1 as start
from sequence as l
left outer join sequence as r on l.id + 1 = r.id
where r.id is null;
Suppose you have a table called sequence with a primary key column id starting from 1, with values 1, 2, 3, 4, 6, 7, 9, ...
This sample code will select 5 and 8, the first missing value of each gap.
simsim's answer will not return all missing keys if two or more consecutive keys are missing. SQLFiddle demo: http://sqlfiddle.com/#!2/cc241/1
Instead create a numbers table and join where the key is null:
CREATE TABLE Numbers(
Num INTEGER
)
DECLARE @id INTEGER
SELECT @id = 1
WHILE @id >=1 AND @id <= 100000
BEGIN
INSERT INTO Numbers
VALUES(@id)
SELECT @id += 1
END
SELECT *
FROM Numbers N
LEFT JOIN your_table YT ON N.Num = YT.PrimaryKey
WHERE YT.PrimaryKey IS NULL
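To make the difference between the two approaches concrete, here is a small sketch in Python with the standard sqlite3 module. The table and column names follow the snippets above, while make_db, gap_starts and missing_keys are names invented for this example. Restricting the result to values below the current maximum key is my own addition, not part of the answer.

```python
import sqlite3

def make_db(ids, upto):
    """Build your_table with the given keys and a Numbers table holding 1..upto."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (PrimaryKey INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO your_table VALUES (?)", [(i,) for i in ids])
    conn.execute("CREATE TABLE Numbers (Num INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO Numbers VALUES (?)",
                     [(n,) for n in range(1, upto + 1)])
    return conn

def gap_starts(conn):
    """Self-join approach: only the first missing value after each existing key."""
    rows = conn.execute("""
        SELECT l.PrimaryKey + 1
        FROM your_table AS l
        LEFT JOIN your_table AS r ON l.PrimaryKey + 1 = r.PrimaryKey
        WHERE r.PrimaryKey IS NULL
        ORDER BY 1""")
    return [r[0] for r in rows]

def missing_keys(conn):
    """Numbers-table approach: every missing value below the current maximum."""
    rows = conn.execute("""
        SELECT N.Num
        FROM Numbers AS N
        LEFT JOIN your_table AS YT ON N.Num = YT.PrimaryKey
        WHERE YT.PrimaryKey IS NULL
          AND N.Num < (SELECT MAX(PrimaryKey) FROM your_table)
        ORDER BY N.Num""")
    return [r[0] for r in rows]

conn = make_db([1, 2, 3, 4, 7, 8], upto=100)
print(gap_starts(conn))    # first value of each gap, plus one past the maximum
print(missing_keys(conn))  # every value missing inside the range
```

With keys 1, 2, 3, 4, 7, 8 the self-join reports 5 but not 6, and it also reports 9 because nothing follows the last key.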
I have a MySQL table which stores teams.
Table structure:
CREATE TABLE teams (
id int(11) NOT NULL AUTO_INCREMENT,
name varchar(28) COLLATE utf8_unicode_ci NOT NULL,
UNIQUE KEY id (id)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
Sample data:
INSERT INTO teams VALUES
(1, 'one'),
(2, 'two'),
(3, 'three'),
(4, 'four'),
(5, 'five');
Use:
SELECT id, name, id as rowNumber FROM teams WHERE id = 4
This returns the correct rowNumber, as there really are three rows in front of it in the table. But this only works as long as I don't remove a row.
Example:
Let's say I DELETE FROM teams WHERE id = 3;
When I now use SELECT id, name, id as rowNumber FROM teams WHERE id = 4, the result is wrong, as there are now only two rows (ids 1 and 2) in front of it in the table.
How can I get the "real" row number/index ordered by id from one specific row?
You are returning ID as rowNumber, so it simply returns the ID column value. Why do you expect it to be different?
I think you may want to define a @curRow variable to get the row number, and use a subquery as below:
SELECT * from
(SELECT ID,
NAME,
@curRow := @curRow + 1 AS rowNumber
FROM Teams t
JOIN (SELECT @curRow := 0) curr
ORDER by t.ID asc) as ordered_team
WHERE ordered_team.id = 4;
It's not a great way, but it works with plain SQL:
SELECT
t.id,
t.name,
(SELECT COUNT(*)+1 FROM teams WHERE id < t.id) as rowNumber -- row_number is a reserved word in MySQL 8.0+
FROM teams t
WHERE t.id = 4
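To check that the counting subquery keeps working after a delete, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The helper names make_teams and row_number_of are invented for the example.

```python
import sqlite3

def make_teams():
    """In-memory copy of the question's teams table and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.executemany("INSERT INTO teams VALUES (?, ?)",
                     [(1, "one"), (2, "two"), (3, "three"), (4, "four"), (5, "five")])
    return conn

def row_number_of(conn, team_id):
    """Return (id, name, rowNumber), where rowNumber counts rows with a smaller id."""
    return conn.execute("""
        SELECT t.id, t.name,
               (SELECT COUNT(*) + 1 FROM teams WHERE id < t.id) AS rowNumber
        FROM teams AS t
        WHERE t.id = ?""", (team_id,)).fetchone()

conn = make_teams()
print(row_number_of(conn, 4))   # position before the delete
conn.execute("DELETE FROM teams WHERE id = 3")
print(row_number_of(conn, 4))   # position after the delete
```

The subquery is evaluated once for the single selected row here, so it stays cheap. Applied to every row of a large table it would count repeatedly, which is presumably why the answer calls it not a great way.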
Why do you bother with row indexes inside the persistence layer?
If you really need to rely on the "index" of the stored tuples, you could introduce a variable and increment it in the query or program code for each row.
EDIT:
Just found this one: With MySQL, how can I generate a column containing the record index in a table?
I have a table that contains two columns
ID | Name
----------------
1 | John
2 | Sam
3 | Peter
6 | Mike
It has missing IDs. In this case these are 4 and 5.
How do I find and insert them together with random names into this table?
Update: cursors and temp tables are not allowed. The random name should be 'Name_' + some random number, or it could be a specified value like 'Abby'; it doesn't really matter.
Using a recursive CTE you can determine the missing IDs as follows
DECLARE @Table TABLE(
ID INT,
Name VARCHAR(10)
)
INSERT INTO @Table VALUES (1, 'John'),(2, 'Sam'),(3,'Peter'),(6, 'Mike')
DECLARE @StartID INT,
@EndID INT
SELECT @StartID = MIN(ID),
@EndID = MAX(ID)
FROM @Table
;WITH IDS AS (
SELECT @StartID IDEntry
UNION ALL
SELECT IDEntry + 1
FROM IDS
WHERE IDEntry + 1 <= @EndID
)
SELECT IDS.IDEntry [ID]
FROM IDS LEFT JOIN
@Table t ON IDS.IDEntry = t.ID
WHERE t.ID IS NULL
OPTION (MAXRECURSION 0)
The option MAXRECURSION 0 removes SQL Server's recursion limit for this query, so the CTE can generate more than the default 100 levels.
From Query Hints and WITH common_table_expression (Transact-SQL)
MAXRECURSION number Specifies the maximum number of recursions
allowed for this query. number is a nonnegative integer between 0 and
32767. When 0 is specified, no limit is applied. If this option is not specified, the default limit for the server is 100.
When the specified or default number for MAXRECURSION limit is reached
during query execution, the query is ended and an error is returned.
Because of this error, all effects of the statement are rolled back.
If the statement is a SELECT statement, partial results or no results
may be returned. Any partial results returned may not include all rows
on recursion levels beyond the specified maximum recursion level.
Generating the random names will largely depend on the requirements for such a name and on the column type. What exactly does this random name entail?
You can do this using a recursive Common Table Expression (CTE). Here's an example:
DECLARE @MaxId INT
SELECT @MaxId = MAX(ID) from MyTable
;WITH Numbers(Number) AS
(
SELECT 1
UNION ALL
SELECT Number + 1 FROM Numbers WHERE Number < @MaxId
)
SELECT n.Number, 'Random Name'
FROM Numbers n
LEFT OUTER JOIN MyTable t ON n.Number=t.ID
WHERE t.ID IS NULL
OPTION (MAXRECURSION 0) -- needed once MAX(ID) exceeds the default limit of 100 recursions
Here are a couple of articles about CTEs that will be helpful: Using Common Table Expressions and Recursive Queries Using Common Table Expressions.
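For readers who want to try the find-and-insert step end to end, here is a minimal sketch in Python with the standard sqlite3 module standing in for SQL Server. The function name fill_missing_ids and the random range are assumptions made for this example. The table and column names follow the query above, and the 'Name_' prefix follows the question.

```python
import random
import sqlite3

def fill_missing_ids(conn):
    """Insert a 'Name_<random number>' row for every unused ID below MAX(ID)."""
    max_id = conn.execute("SELECT MAX(ID) FROM MyTable").fetchone()[0]
    if max_id is None:  # empty table: nothing to fill
        return []
    missing = [row[0] for row in conn.execute("""
        WITH RECURSIVE Numbers(Number) AS (
            SELECT 1
            UNION ALL
            SELECT Number + 1 FROM Numbers WHERE Number < ?
        )
        SELECT n.Number
        FROM Numbers AS n
        LEFT JOIN MyTable AS t ON n.Number = t.ID
        WHERE t.ID IS NULL
        ORDER BY n.Number""", (max_id,))]
    with conn:  # commit the inserts together, or roll back on error
        conn.executemany(
            "INSERT INTO MyTable (ID, Name) VALUES (?, ?)",
            [(n, f"Name_{random.randint(0, 99999)}") for n in missing])
    return missing

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)",
                 [(1, "John"), (2, "Sam"), (3, "Peter"), (6, "Mike")])
print(fill_missing_ids(conn))
print(conn.execute("SELECT ID, Name FROM MyTable ORDER BY ID").fetchall())
```

In T-SQL the same effect would come from putting INSERT INTO MyTable in front of the final SELECT of the CTE query. The Python loop is only there to keep the sketch runnable without a server.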
Start by selecting the highest ID in the table (select top 1 id ... order by id desc, or select max(id)), then run a while loop to iterate from 1 to max.
See this article about looping.
For each iteration, see if the row exists, and if not, insert into table, with that ID.
I think a recursive CTE is a better solution, because it's going to be faster, but here is what worked for me:
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[TestTable]') AND type in (N'U'))
DROP TABLE [dbo].[TestTable]
GO
CREATE TABLE [dbo].[TestTable](
[Id] [int] NOT NULL,
[Name] [varchar](50) NOT NULL,
CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED
(
[Id] ASC
))
GO
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (1, 'John')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (2, 'Sam')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (3, 'Peter')
INSERT INTO [dbo].[TestTable]([Id],[Name]) VALUES (6, 'Mike')
GO
declare @mod int
select @mod = MAX(number)+1 from master..spt_values where [type] = 'P'
INSERT INTO [dbo].[TestTable]
SELECT y.Id,'Name_' + cast(newid() as varchar(45)) Name from
(
SELECT TOP (select MAX(Id) from [dbo].[TestTable]) x.Id from
(
SELECT
t1.number*@mod + t2.number Id
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
WHERE t1.[type] = 'P' and t2.[type] = 'P'
) x
WHERE x.Id > 0
ORDER BY x.Id
) y
LEFT JOIN [dbo].[TestTable] on [TestTable].Id = y.Id
where [TestTable].Id IS NULL
GO
select * from [dbo].[TestTable]
order by Id
GO
http://www.sqlfiddle.com/#!3/46c7b/18
It's actually very simple:
Create a table called #All_numbers, which should contain all the natural numbers in the range that you are looking for.
#list is a table containing your data.
select a.num as missing_number ,
'Random_Name' + convert(varchar, a.num)
from #All_numbers a left outer join #list l on a.num = l.Id
where l.id is null
Suppose I have a primary key pk and a nullable column col. I want to find consecutive sequences of rows where col is NULL, ordered in descending order of run-length.
I will accept as a valid answer a query that returns the run-lengths alone, but in the future (perhaps in a separate question) I will want to know some pk that points me to either the start or end point of each run.
Example data:
pk col
-- ---
1 'a'
2 NULL
3 'b'
4 NULL
5 NULL
6 NULL
7 'c'
8 NULL
9 NULL
10 'd'
Expected query result:
runlengths
----------
3
2
1
I prefer standard SQL if possible, but this is for analyzing a production data set stored in MySQL, so whatever works best in that context.
Try this.
DECLARE @a TABLE (
pk INT IDENTITY(1,1),
col CHAR(1)
)
INSERT @a (col)
VALUES ('a'), (null), ('b'), (null), (null), (null), ('c'), (null), (null), ('d')
SELECT COUNT(*) as runlengths
FROM @a AS A
INNER JOIN (
SELECT
l.pk,
MAX(r.pk) AS prev
FROM @a AS l
INNER JOIN @a AS r
ON l.pk > r.pk
WHERE
l.col IS NOT NULL
AND r.col IS NOT NULL
GROUP BY
l.pk
) AS B
ON A.pk < B.pk AND A.pk > B.prev
GROUP BY
B.pk
ORDER BY
COUNT(*) DESC
It's T-SQL dialect, but I believe it's clear enough.
There is an issue with this query if the first or last row has a NULL value, but that is not difficult to fix. How to fix it depends on your requirements.
Give this a try:
select count(*) runlengths from (
select col, @count := @count + (col is not null) cnt
from t, (select @count := 0) init
) final
where col is null
group by cnt
order by count(*) desc
Fiddle here.
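Since the question prefers standard SQL and mentions wanting a pk for each run later, here is a variant without user variables, sketched in Python with the standard sqlite3 module. It groups each NULL row by the number of non-NULL rows that precede it, which is the same idea as the counter above. The table name t is taken from that query, and the function name null_runs is invented for this example.

```python
import sqlite3

def null_runs(conn):
    """Return [(runlength, first_pk), ...] for runs of NULL col, longest first."""
    return conn.execute("""
        SELECT COUNT(*) AS runlength, MIN(pk) AS first_pk
        FROM (
            SELECT pk,
                   -- Rows in the same run share the count of earlier non-NULL rows.
                   (SELECT COUNT(*) FROM t AS prev
                     WHERE prev.pk < t.pk AND prev.col IS NOT NULL) AS grp
            FROM t
            WHERE col IS NULL
        ) AS nulls
        GROUP BY grp
        ORDER BY runlength DESC""").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pk INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO t (pk, col) VALUES (?, ?)",
                 [(1, "a"), (2, None), (3, "b"), (4, None), (5, None),
                  (6, None), (7, "c"), (8, None), (9, None), (10, "d")])
for runlength, first_pk in null_runs(conn):
    print(runlength, first_pk)
```

The correlated subquery does not depend on the order in which rows are scanned, and runs at the very start or end of the table need no special handling. It is quadratic in the worst case, though, so on a large production table an index on pk matters.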
I am currently using the MERGE code below to migrate data from source to target. I have a new requirement to extend it so that the record is deleted from the source once an update/insert is performed on the target. Is this possible using MERGE? (All the examples I see on the net perform the delete/insert/update on the target, not on the source.)
MERGE Target1 AS T
USING Source1 AS S
ON (T.EmployeeID = S.EmployeeID)
WHEN NOT MATCHED BY TARGET AND S.EmployeeName LIKE 'S%'
THEN INSERT(EmployeeID, EmployeeName) VALUES(S.EmployeeID, S.EmployeeName)
WHEN MATCHED
THEN UPDATE SET T.EmployeeName = S.EmployeeName
WHEN NOT MATCHED BY SOURCE AND T.EmployeeName LIKE 'S%'
THEN DELETE ;
You can use the output clause to capture the modified/inserted rows to a table variable and use that with a delete statement after the merge.
DECLARE @T TABLE(EmployeeID INT);
MERGE Target1 AS T
USING Source1 AS S
ON (T.EmployeeID = S.EmployeeID)
WHEN NOT MATCHED BY TARGET AND S.EmployeeName LIKE 'S%'
THEN INSERT(EmployeeID, EmployeeName) VALUES(S.EmployeeID, S.EmployeeName)
WHEN MATCHED
THEN UPDATE SET T.EmployeeName = S.EmployeeName
WHEN NOT MATCHED BY SOURCE AND T.EmployeeName LIKE 'S%'
THEN DELETE
OUTPUT S.EmployeeID INTO @T;
DELETE Source1
WHERE EmployeeID in (SELECT EmployeeID
FROM @T);
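The shape of this pattern (apply the changes to the target, remember which source keys were handled, then delete exactly those from the source) can be tried without SQL Server. The sketch below uses Python's sqlite3 module. SQLite has no MERGE, so the three WHEN branches are emulated with separate UPDATE, INSERT and DELETE statements inside one transaction. This is a substitute technique for illustration only, and the function name merge_then_purge is invented.

```python
import sqlite3

def merge_then_purge(conn):
    """Emulate MERGE + OUTPUT + DELETE; returns the source IDs that were handled."""
    with conn:  # one transaction: commit on success, roll back on error
        # Source keys the MERGE would UPDATE (matched) or INSERT (name LIKE 'S%').
        handled = [r[0] for r in conn.execute("""
            SELECT s.EmployeeID
            FROM Source1 AS s
            LEFT JOIN Target1 AS t ON t.EmployeeID = s.EmployeeID
            WHERE t.EmployeeID IS NOT NULL OR s.EmployeeName LIKE 'S%'
            ORDER BY s.EmployeeID""")]
        conn.execute("""          -- WHEN MATCHED THEN UPDATE
            UPDATE Target1
            SET EmployeeName = (SELECT s.EmployeeName FROM Source1 AS s
                                WHERE s.EmployeeID = Target1.EmployeeID)
            WHERE EmployeeID IN (SELECT EmployeeID FROM Source1)""")
        conn.execute("""          -- WHEN NOT MATCHED BY TARGET THEN INSERT
            INSERT INTO Target1 (EmployeeID, EmployeeName)
            SELECT s.EmployeeID, s.EmployeeName FROM Source1 AS s
            WHERE s.EmployeeName LIKE 'S%'
              AND s.EmployeeID NOT IN (SELECT EmployeeID FROM Target1)""")
        conn.execute("""          -- WHEN NOT MATCHED BY SOURCE THEN DELETE
            DELETE FROM Target1
            WHERE EmployeeName LIKE 'S%'
              AND EmployeeID NOT IN (SELECT EmployeeID FROM Source1)""")
        # Finally remove the handled rows from the source.
        conn.executemany("DELETE FROM Source1 WHERE EmployeeID = ?",
                         [(i,) for i in handled])
    return handled

def make_db():
    """Made-up sample rows for the Target1 and Source1 tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Target1 (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
        CREATE TABLE Source1 (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
        INSERT INTO Target1 VALUES (1, 'Old'), (4, 'Stale'), (5, 'Keep');
        INSERT INTO Source1 VALUES (1, 'Sam'), (2, 'Bob'), (3, 'Sue');
    """)
    return conn

conn = make_db()
print(merge_then_purge(conn))
print(conn.execute("SELECT * FROM Target1 ORDER BY EmployeeID").fetchall())
print(conn.execute("SELECT * FROM Source1 ORDER BY EmployeeID").fetchall())
```

The source rows must be deleted last. If they were removed earlier, the "not matched by source" step would treat the freshly merged target rows as orphans.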
Nice response, but your code will delete rows from your destination table. Here's an example in which you can delete the rows from your source table without deleting anything from your target table:
if OBJECT_ID('audit.tmp1') IS NOT NULL
DROP TABLE audit.tmp1
select *
into audit.tmp1
from
(
select 1 id, 'aa' nom, convert(date,'2014-01-01') as dd UNION ALL
select 2 id, 'bb' nom, convert(date,'2013-07-12') as dd UNION ALL
select 3 id, 'cc' nom, convert(date,'2012-08-21') as dd UNION ALL
select 4 id, 'dd' nom, convert(date,'2011-11-15') as dd UNION ALL
select 5 id, 'ee' nom, convert(date,'2010-05-16') as dd ) T
if OBJECT_ID('audit.tmp2') IS NOT NULL
DROP TABLE audit.tmp2
select *
into audit.tmp2
from
(
select 1 id, 'aAa' nom, convert(date,'2014-01-14') as dd UNION ALL
select 2 id, 'bbB' nom, convert(date,'2013-06-13') as dd UNION ALL
select 4 id, 'dDD' nom, convert(date,'2012-11-05') as dd UNION ALL
select 6 id, 'FFf' nom, convert(date,'2014-01-12') as dd) T
SELECT * FROM audit.tmp1 order by 1
SELECT * FROM audit.tmp2 order by 1
DECLARE @T TABLE(ID INT);
MERGE audit.tmp2 WITH (HOLDLOCK) AS T
USING (SELECT * FROM audit.tmp1 WHERE nom <> 'dd') AS S
ON (T.id = S.id)
WHEN NOT MATCHED BY TARGET
THEN INSERT(id, nom, dd) VALUES(S.id, S.nom, S.dd)
WHEN MATCHED
THEN UPDATE SET T.nom = S.nom, T.dd = S.dd
WHEN NOT MATCHED BY SOURCE
THEN UPDATE SET T.id = T.id OUTPUT S.id INTO @T;
DELETE tmp1
FROM audit.tmp1
INNER JOIN
@T AS DEL
ON DEL.id = tmp1.id
SELECT * FROM audit.tmp1 ORDER BY 1
SELECT * FROM audit.tmp2 ORDER BY 1
I hope this will help you.
In our case, we wanted to use MERGE to synchronize our internal database with an outside source of a different structure. Automated CASCADE settings were not an option because we enjoy many cyclical relationships and, really, we don't like that kind of cheap power in the hands of disgruntled staffers. We can't delete parent rows before their child rows are gone.
All of this is done with lightning-fast MERGEs that use table-valued parameters. They provide, by far, the best performance with obscenely low app memory overhead.
Combining scattered advice for the MERGE of Orders data...
CREATE PROCEDURE MyOrderMerge @SourceValues [MyOrderSqlUserType] READONLY
AS
BEGIN
DECLARE @LiveRows TABLE (MergeAction VARCHAR(20), OrderId INT);
DECLARE @DeleteCount INT;
SET @DeleteCount = 0;
MERGE INTO [Order] AS [target]
USING ( SELECT sv.OrderNumber,
c.CustomerId,
st.ShipTypeId,
sv.OrderDate,
sv.IsPriority
FROM @SourceValues sv
JOIN [Customer] c ON sv.[CustomerName] = c.[CustomerName]
JOIN [ShipType] st ON ...
) AS [stream]
ON [stream].[OrderNumber] = [target].[SourceOrderNumber]
WHEN MATCHED THEN
UPDATE
...
WHEN NOT MATCHED BY TARGET THEN
INSERT
---
-- Keep a tally of all active source records
-- SQL Server's "INSERTED." prefix encompasses both INSERTed and UPDATEd rows <insert very bad words here>
OUTPUT $action, INSERTED.[OrderId] INTO @LiveRows
; -- MERGE has ended
-- Delete child OrderItem rows before parent Order rows
DELETE FROM [OrderItem]
FROM [OrderItem] oi
-- Delete the Order Items that no longer exist at the source
LEFT JOIN @LiveRows lr ON oi.[OrderId] = lr.[OrderId]
WHERE lr.OrderId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
-- Delete parent Order rows that no longer have child Order Item rows
DELETE FROM [Order]
FROM [Order] o
-- Delete the Orders that no longer exist at the source
LEFT JOIN @LiveRows lr ON o.[OrderId] = lr.[OrderId]
WHERE lr.OrderId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
SELECT MergeAction, COUNT(*) AS ActionCount FROM @LiveRows GROUP BY MergeAction
UNION
SELECT 'DELETE' AS MergeAction, @DeleteCount AS ActionCount
;
END
Everything is done in one sweet loop-dee-loop streamed round trip and highly optimized on key indexes. Even though internal primary key values are unknown from the source, the MERGE operation makes them available to the DELETE operations.
The Customer MERGE uses a different @LiveRows TABLE structure, and consequently a different OUTPUT statement and different DELETE statements...
CREATE PROCEDURE MyCustomerMerge @SourceValues [MyCustomerSqlUserType] READONLY
AS
BEGIN
DECLARE @LiveRows TABLE (MergeAction VARCHAR(20), CustomerId INT);
DECLARE @DeleteCount INT;
SET @DeleteCount = 0;
MERGE INTO [Customer] AS [target]
...
OUTPUT $action, INSERTED.[CustomerId] INTO @LiveRows
; -- MERGE has ended
-- Delete child OrderItem rows before parent Order rows
DELETE FROM [OrderItem]
FROM [OrderItem] oi
JOIN [Order] o ON oi.[OrderId] = o.[OrderId]
-- Delete the Order Items that no longer exist at the source
LEFT JOIN @LiveRows lr ON o.[CustomerId] = lr.[CustomerId]
WHERE lr.CustomerId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
-- Delete child Order rows before parent Customer rows
DELETE FROM [Order]
FROM [Order] o
-- Delete the Orders that no longer exist at the source
LEFT JOIN @LiveRows lr ON o.[CustomerId] = lr.[CustomerId]
WHERE lr.CustomerId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
-- Delete parent Customer rows that no longer have child Order or grandchild Order Item rows
DELETE FROM [Customer]
FROM [Customer] c
-- Delete the Customers that no longer exist at the source
LEFT JOIN @LiveRows lr ON c.[CustomerId] = lr.[CustomerId]
WHERE lr.CustomerId IS NULL
;
SET @DeleteCount = @DeleteCount + @@ROWCOUNT;
SELECT MergeAction, COUNT(*) AS ActionCount FROM @LiveRows GROUP BY MergeAction
UNION
SELECT 'DELETE' AS MergeAction, @DeleteCount AS ActionCount
;
END
Setup and maintenance is a bit of a pain - but so worth the efficiencies reaped.
You can also use the code below:
drop table energydata
create table temp_energydata
(
webmeterID int,
DT DateTime ,
kWh varchar(10)
)
Insert into temp_energydata
select 1,getdate()-10, 120
union
select 2,getdate()-9, 140
union
select 3,getdate()-6, 37
union
select 4,getdate()-3, 40
union
select 5,getdate()-1, 240
create table energydata
(
webmeterID int,
DT DateTime ,
kWh varchar(10)
)
Insert into energydata (webmeterID,kWh)
select 1, 120
union
select 2, 140
union
select 3, 37
union
select 4, 40
select * from energydata
select * from temp_energydata
begin tran ABC
DECLARE @T TABLE(ID INT);
MERGE INTO dbo.energydata WITH (HOLDLOCK) AS target
USING dbo.temp_energydata AS source
ON target.webmeterID = source.webmeterID
AND target.kWh = source.kWh
WHEN MATCHED THEN
UPDATE SET target.DT = source.DT
WHEN NOT MATCHED BY source THEN delete
OUTPUT source.webmeterID INTO @T;
-- @T's column is named ID; selecting webmeterID here would bind to the outer table and match every row
DELETE temp_energydata
WHERE webmeterID in (SELECT ID
FROM @T);
--INSERT (webmeterID, DT, kWh)
--VALUES (source.webmeterID, source.DT, source.kWh)
-- Run only one of the following: rollback to undo the test, or commit to keep the changes
rollback tran ABC
-- commit tran ABC