I have a table t with columns id (primary key), a, b, c, and d. Assume that the columns id, a, b, and c are already populated. I want to set column d = md5(concat(b,c)). The issue is that this table contains millions of records, while there are only a few thousand unique combinations of b and c. I want to avoid recomputing the md5 of the same values. Is there a way to update multiple rows of this table with the same value without computing the md5 again, something like this:
update t set d=md5(concat(b,c)) group by b,c;
This fails because GROUP BY does not work with an UPDATE statement.
One method is a join:
update t join
(select b, c, md5(concat(b, c)) as val
from t
group by b, c
) tt
on t.b = tt.b and t.c = tt.c
set t.d = tt.val;
However, it is quite possible that the grouping and joining would take longer than the md5() calls they save, so doing the update directly could be feasible.
EDIT:
Actually, updating the entire table is likely to take time, just for the updates and logging. I would suggest that you create another table entirely for the b/c/d values and join in the values when you need them.
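A minimal sketch of that separate-table idea, under stated assumptions: the table name bc_hash and the VARCHAR(100) column types are hypothetical, and b and c are assumed to be non-NULL strings (CONCAT() returns NULL if either argument is NULL). The small t defined at the top is stand-in data only, and it leaves out column d because the hash lives in the lookup table.

```sql
-- Stand-in for the table from the question (column types are assumptions).
CREATE TABLE t (
  id INT PRIMARY KEY,
  a  INT,
  b  VARCHAR(100) NOT NULL,
  c  VARCHAR(100) NOT NULL
);
INSERT INTO t (id, a, b, c) VALUES
  (1, 10, 'x', 'y'),
  (2, 20, 'x', 'y'),
  (3, 30, 'p', 'q');

-- Hypothetical lookup table: one row per distinct (b, c) pair.
CREATE TABLE bc_hash (
  b VARCHAR(100) NOT NULL,
  c VARCHAR(100) NOT NULL,
  d CHAR(32)     NOT NULL,
  PRIMARY KEY (b, c)
);

-- The result has one row, and so one hash, per distinct (b, c) pair.
INSERT INTO bc_hash (b, c, d)
SELECT b, c, MD5(CONCAT(b, c))
FROM t
GROUP BY b, c;

-- Join the hash in at read time instead of storing it on every row of t.
SELECT t.id, t.a, t.b, t.c, h.d
FROM t
JOIN bc_hash h ON h.b = t.b AND h.c = t.c;
```

One caveat: with a case-insensitive collation, GROUP BY treats values that differ only in case as one group, so the stored hash can differ from what md5(concat(b,c)) would give for an individual row.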
Create a temp table:
CREATE TEMPORARY TABLE IF NOT EXISTS tmpTable
AS (SELECT b, c, md5(concat(b, c)) as d FROM t group by b, c)
Update initial table:
UPDATE t orig
JOIN tmpTable tmp ON orig.b = tmp.b AND orig.c = tmp.c
SET orig.d = tmp.d
Drop the temp table:
DROP TABLE tmpTable
Given:
Table A: guid, missing
Table B: guid
Table A ~400k rows
Table B ~150k rows
Both tables have the same guids, but I need to mark in A all the guids that are missing from B. Both guid columns are indexed.
Query:
update table_a
left join table_b b on table_a.guid = b.guid
set missing = true
where b.guid is null;
This query works but took 4.5 hours on my machine. Is there any way I can make this query run faster?
UPD:
All three answers below gave me some tips to think on.
The following query ran for 8 seconds.
update table_a a
set missing = true
where a.guid not in (
select a.guid
from table_b b,
table_a a
where b.guid = a.guid
);
NOT EXISTS is much faster. It only tests whether a matching row exists.
UPDATE table_a ta
SET missing = true
WHERE NOT EXISTS ( SELECT 1 from table_b tb WHERE ta.guid = tb.guid );
Does table_b have an index starting with guid? (A PRIMARY KEY is an INDEX.)
Do you need to run this query frequently? Let's get rid of it after this initial update. In the future, whenever you modify table_b, reach over and update table_a. TRIGGERs might be a good way to do that. A DELETE TRIGGER could set missing=1; an INSERT TRIGGER could do the reverse (etc.).
If, instead, you choose to run the UPDATE repeatedly, see this for how to chunk the action, etc.: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks
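If the UPDATE does have to be rerun, one simple way to chunk it is to repeat a LIMITed UPDATE until no rows change. This is only a sketch and not necessarily the method from the linked page: the procedure name mark_missing_in_chunks and the batch size of 1000 are hypothetical, and table_a and table_b are assumed to exist as in the question.

```sql
DELIMITER //

-- Hypothetical helper: flags rows in batches so each statement stays small.
CREATE PROCEDURE mark_missing_in_chunks()
BEGIN
    DECLARE affected INT DEFAULT 1;

    WHILE affected > 0 DO
        -- Only touch rows not flagged yet, at most 1000 per pass.
        UPDATE table_a ta
           SET ta.missing = true
         WHERE ta.missing IS NOT TRUE
           AND NOT EXISTS (SELECT 1 FROM table_b tb WHERE tb.guid = ta.guid)
         LIMIT 1000;

        -- ROW_COUNT() reports how many rows the last UPDATE changed.
        SET affected = ROW_COUNT();
    END WHILE;
END//

DELIMITER ;

CALL mark_missing_in_chunks();
```

With autocommit enabled, each batch commits on its own, which keeps individual transactions short. Every pass still has to search for unflagged rows, so walking the table by key range may scale better on very large tables.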
Another approach is to check for whether the row is "missing" by a LEFT JOIN in the SELECT, not by having the column.
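A small sketch of that alternative, where "missing" is derived at query time and never stored. The table names follow the question; the column types and sample guids are assumptions for illustration.

```sql
-- Minimal stand-in tables (types and values are made up).
CREATE TABLE table_a (guid CHAR(36) PRIMARY KEY);
CREATE TABLE table_b (guid CHAR(36) PRIMARY KEY);
INSERT INTO table_a VALUES ('g1'), ('g2'), ('g3');
INSERT INTO table_b VALUES ('g1'), ('g3');

-- missing is 1 when no matching row exists in table_b, else 0.
SELECT a.guid,
       (b.guid IS NULL) AS missing
FROM table_a a
LEFT JOIN table_b b ON b.guid = a.guid;
```

With this sample data only 'g2' comes back with missing = 1. Nothing needs to be kept in sync when table_b changes.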
Using TRIGGERs would be something like this:
DELIMITER //
CREATE TRIGGER del BEFORE DELETE ON table_b
FOR EACH ROW
BEGIN
UPDATE table_a
SET missing = true
WHERE guid = OLD.guid; -- or maybe test id??
END;
//
DELIMITER ;
Add one for INSERT, and one for UPDATE if you might change guid. The UPDATE trigger would need two UPDATE statements against table_a: one for the old guid (a la DELETE), one for the new (a la INSERT).
These would add a small burden when table_b is modified, but probably not enough to worry about.
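A sketch of the two remaining triggers described above, under assumptions: the trigger names are hypothetical, guid is assumed to be unique in table_b (so a changed guid really has left the table), and table_a.missing is a boolean-like column as in the question.

```sql
DELIMITER //

-- INSERT: the guid now exists in table_b, so clear the flag.
CREATE TRIGGER table_b_ins AFTER INSERT ON table_b
FOR EACH ROW
BEGIN
    UPDATE table_a
       SET missing = false
     WHERE guid = NEW.guid;
END//

-- UPDATE: only does work when guid itself changed.
CREATE TRIGGER table_b_upd AFTER UPDATE ON table_b
FOR EACH ROW
BEGIN
    IF NOT (NEW.guid <=> OLD.guid) THEN
        -- The old guid has left table_b (a la DELETE).
        UPDATE table_a SET missing = true  WHERE guid = OLD.guid;
        -- The new guid has arrived in table_b (a la INSERT).
        UPDATE table_a SET missing = false WHERE guid = NEW.guid;
    END IF;
END//

DELIMITER ;
```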
Try:
update table_a
set missing = true
where guid not in (select guid from table_b)
There is a lot of information that I could find on SQL Merge, but I can't seem to get this working for me. Here's what's happening.
Each day I'll be getting an Excel file uploaded to a web server with a few thousand records, each record containing 180 columns. These records contain both new information which would have to use INSERT, and updated information which will have to use UPDATE. To get the information to the database, I'm using C# to do a Bulk Copy to a temp SQL 2008 table. My plan was to then perform a Merge to get the information into the live table. The temp table doesn't have a Primary Key set, but the live table does. In the end, this is how my Merge statement would look:
MERGE Table1 WITH (HOLDLOCK) AS t1
USING (SELECT * FROM Table2) AS t2
ON t1.id = t2.id
WHEN MATCHED THEN
UPDATE SET t1.col1=t2.col1,t1.col2=t2.col2,...t1.colx=t2.colx
WHEN NOT MATCHED BY TARGET THEN
INSERT (col1,col2,...colx)
VALUES(t2.col1,t2.col2,...t2.colx);
Even when including the HOLDLOCK, I still get the error Cannot insert duplicate key in object. From what I've read online, HOLDLOCK should allow SQL to read primary keys, but not perform any insert or update until after the task has been executed. I'm basically learning how to use MERGE on the fly, but is there something I have to enable for SQL 2008 to pick up on MERGE Locks?
I found a way around the problem and wanted to post the answer here, in case it helps anyone else. It looks like MERGE wouldn't work for what I needed, since the temporary table contained duplicate values in the column that is the Primary Key of the live table. The solution I came up with was to create the stored procedure below.
-- Start with insert
INSERT INTO LiveTable(A, B, C, D, id)
(
-- Filter rows to get unique id
SELECT A, B, C, D, id FROM(
SELECT A, B, C, D, id,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS row_number
FROM TempTable
WHERE NOT EXISTS(
SELECT id FROM LiveTable WHERE LiveTable.id = TempTable.id)
) AS ROWS
WHERE row_number = 1
)
-- Continue with Update
-- Covers skipped id's during insert
UPDATE L
SET
L.A = T.A,
L.B = T.B,
L.C = T.C,
L.D = T.D
FROM LiveTable L
INNER JOIN
TempTable T
ON
L.id= T.id
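Since the root cause was duplicate ids in the staging table, another option is to deduplicate inside the MERGE source so each id reaches MERGE at most once. This is an untested T-SQL sketch reusing the LiveTable and TempTable names and the A to D columns from the procedure above; the aliases and the rn column are made up, and because the window is ordered by id, which duplicate survives is arbitrary.

```sql
MERGE LiveTable WITH (HOLDLOCK) AS L
USING (
    -- Keep one row per id from the staging table.
    SELECT A, B, C, D, id
    FROM (
        SELECT A, B, C, D, id,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
        FROM TempTable
    ) AS numbered
    WHERE rn = 1
) AS T
ON L.id = T.id
WHEN MATCHED THEN
    UPDATE SET L.A = T.A, L.B = T.B, L.C = T.C, L.D = T.D
WHEN NOT MATCHED BY TARGET THEN
    INSERT (A, B, C, D, id)
    VALUES (T.A, T.B, T.C, T.D, T.id);
```

If one particular duplicate should win, order the window by a column that expresses that preference.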
The accepted answer to "sql swap primary key values" fails with the error "Can't reopen table: 't'". Presumably this has something to do with opening the same table for writing twice, causing a lock.
Is there any shortcut, or do I have to get both, set one of them to NULL, set the second one to the first one, then set the first one to the previously fetched value of the second?
Don't use temporary tables for this.
From the manual:
You cannot refer to a TEMPORARY table more than once in the same query.
For example, the following does not work:
mysql> SELECT * FROM temp_table, temp_table AS t2;
ERROR 1137: Can't reopen table: 'temp_table'
This error also occurs if you refer to a temporary table multiple
times in a stored function under different aliases, even if the
references occur in different statements within the function.
UPDATE:
Sorry if I don't get it right, but why does a simple three way exchange not work?
Like this:
create table yourTable(id int auto_increment, b int, primary key(id));
insert into yourTable(b) values(1), (2);
select * from yourTable;
DELIMITER $$
create procedure pkswap(IN a int, IN b int)
BEGIN
select @max_id:=max(id) + 1 from yourTable;
update yourTable set id=@max_id where id = a;
update yourTable set id=a where id = b;
update yourTable set id=b where id = @max_id;
END $$
DELIMITER ;
call pkswap(1, 2);
select * from yourTable;
To swap id values of 1 and 2, I would use a SQL statement like this:
EDIT : this does NOT work on an InnoDB table, only works on a MyISAM table, per my testing.
UPDATE mytable a
JOIN mytable b ON a.id = 1 AND b.id = 2
JOIN mytable c ON c.id = a.id
SET a.id = 0
, b.id = 1
, c.id = 2
For this statement to work, the id value of 0 must not exist in the table, any unused value would be suitable... but to get this to work in a single SQL statement, you need to (temporarily) use a third id value.
This solution works for regular MyISAM tables, not temporary tables. I missed that this was being performed on a temporary table, I was confused by the error message you reported Can't reopen table:.
To swap id values 1 and 2 in a temporary table, I'd run three separate statements, again, using a temporary placeholder value of 0:
UPDATE mytable a SET a.id = 0 WHERE a.id = 1;
UPDATE mytable b SET b.id = 1 WHERE b.id = 2;
UPDATE mytable c SET c.id = 2 WHERE c.id = 0;
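If the table uses a transactional engine such as InnoDB, the three statements can be wrapped in a transaction so that the swap either completes fully or not at all. A minimal sketch, assuming a stand-in table mytable with a made-up val column and that id 0 is unused:

```sql
-- Stand-in table; the columns are assumptions for illustration.
CREATE TABLE mytable (
  id  INT PRIMARY KEY,
  val VARCHAR(10)
) ENGINE = InnoDB;
INSERT INTO mytable (id, val) VALUES (1, 'first'), (2, 'second');

START TRANSACTION;
UPDATE mytable SET id = 0 WHERE id = 1;  -- park row 1 on the placeholder
UPDATE mytable SET id = 1 WHERE id = 2;  -- row 2 takes id 1
UPDATE mytable SET id = 2 WHERE id = 0;  -- parked row takes id 2
COMMIT;

-- id 1 now holds 'second' and id 2 holds 'first'.
SELECT * FROM mytable ORDER BY id;
```

On MyISAM the START TRANSACTION and COMMIT have no effect on these updates, since that engine is not transactional.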
Edit: Fixed errors
I have 2 tables with the same structure.
current_db = 521,892 rows and primary key is email
new_db = 575,992 rows and primary key is email
In new_db I want to delete the email addresses that already exist in current_db.
Question:
How do I compare the tables and delete the existing emails from new_db, so that new_db ends up with only 54,100 rows?
Check the rows you want to delete -
SELECT n.* FROM new_db n
JOIN current_db c
ON n.email = c.email;
Delete them -
DELETE n
FROM new_db n
JOIN current_db c
ON n.email = c.email;
If the keys match, you can do it in one shot with something like:
delete from new_table where id in (select id from old_table)
Otherwise you have to modify the query according to your matching fields:
delete from new_table where id in (select id from old_table where old_table.field = new_table.field)
Of course you have to pay attention to what you actually delete. My suggestion is to delete in two passes (first create a temporary table and change the delete into an insert into temp_table), then check that everything is correct before running the real delete.
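The two-pass idea could look like the following in MySQL, using the new_db and current_db tables from the question. The temporary table name to_delete is hypothetical; treat this as a sketch, not a tested script.

```sql
-- Pass 1: collect the rows that would be deleted.
CREATE TEMPORARY TABLE to_delete AS
SELECT n.email
FROM new_db n
JOIN current_db c ON c.email = n.email;

-- Inspect the candidates before deleting anything.
SELECT COUNT(*) AS rows_to_delete FROM to_delete;

-- Pass 2: delete only what was collected.
DELETE n
FROM new_db n
JOIN to_delete d ON d.email = n.email;

DROP TEMPORARY TABLE to_delete;
```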
(Under Oracle there's the minus operator, which shows the difference between two record sets; hopefully there's something similar in your database environment.)
The minus operator works like this:
select * from table_a
minus
select * from table_b
Starting with 2 identical tables, it returns only the rows whose field values differ.
It is then straightforward to cross-check with count(*) that everything is fine with the data.
I don't know whether something similar exists for your RDBMS, but a quick search on Google may help.
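If the database is MySQL (an assumption; the question does not say), there is no MINUS keyword, but an anti-join gives the same kind of difference, and releases from 8.0.31 onward also accept EXCEPT. A sketch using the question's table names:

```sql
-- Emails present in new_db but absent from current_db.
SELECT n.email
FROM new_db n
LEFT JOIN current_db c ON c.email = n.email
WHERE c.email IS NULL;

-- Cross-check: this count should equal the rows left in new_db after the delete.
SELECT COUNT(*) AS only_in_new
FROM new_db n
LEFT JOIN current_db c ON c.email = n.email
WHERE c.email IS NULL;
```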
DELETE n
FROM new_db n
WHERE EXISTS
( SELECT *
FROM current_db c
WHERE c.email = n.email
)
I need to query a delete statement for the same table based on column conditions from the same table for a correlated subquery.
I can't directly run a delete statement and check a condition for the same table in mysql for a correlated subquery.
I want to know whether using temp table will affect mysql's memory/performance?
Any help will be highly appreciated.
Thanks.
You can make MySQL build the temp table for you by wrapping your "where" query in an inline derived table.
This original query will give you the dreaded "You can't specify target table for update in FROM clause":
DELETE FROM sametable
WHERE id IN (
SELECT id FROM sametable WHERE stuff=true
)
Rewritten to use an inline temp table, it becomes:
DELETE FROM sametable
WHERE id IN (
SELECT implicitTemp.id from (SELECT id FROM sametable WHERE stuff=true) implicitTemp
)
Your question is really not clear, but I would guess you have a correlated subquery and you're having trouble doing a SELECT from the same table that is locked by the DELETE. For instance to delete all but the most recent revision of a document:
DELETE FROM document_revisions d1 WHERE edit_date <
(SELECT MAX(edit_date) FROM document_revisions d2
WHERE d2.document_id = d1.document_id);
This is a problem for MySQL.
Many examples of these types of problems can be solved using MySQL multi-table delete syntax:
DELETE d1 FROM document_revisions d1 JOIN document_revisions d2
ON d1.document_id = d2.document_id AND d1.edit_date < d2.edit_date;
But these solutions are best designed on a case-by-case basis, so if you edit your question and be more specific about the problem you're trying to solve, perhaps we can help you.
In other cases you may be right, using a temp table is the simplest solution.
"can't directly run a delete statement and check a condition for the same table"
Sure you can. If you want to delete from table1 while checking the condition that col1 = 'somevalue', you could do this:
DELETE
FROM table1
WHERE col1 = 'somevalue'
EDIT
To delete using a correlated subquery, please see the following example:
create table project (id int);
create table emp_project (id int, project_id int);
insert into project values (1);
insert into project values (2);
insert into emp_project values (100, 1);
insert into emp_project values (200, 1);
/* Delete any project record that doesn't have associated emp_project records */
DELETE
FROM project
WHERE NOT EXISTS
(SELECT *
FROM emp_project e
WHERE e.project_id = project.id);
/* project 2 doesn't have any emp_project records, so it was deleted, now
we have 1 project record remaining */
SELECT * FROM project;
Result:
id
1
Create a temp table with the values you want to delete, then join it to the table while deleting. In this example I have a table "Games" with an ID column. I will delete ids greater than 3. I will gather the targets in a temp table first so I can report on them later.
DECLARE @DeletedRows TABLE (ID int)
insert
@DeletedRows
(ID)
select
ID
from
Games
where
ID > 3
DELETE
g
from
Games g
join
@DeletedRows x
on x.ID = g.ID
I have used GROUP BY with a HAVING clause against the same table, wrapped in a derived table. The query looked like this:
DELETE
FROM TableName
WHERE id in
(select implicitTable.id
FROM (
SELECT id
FROM `TableName`
GROUP by id
HAVING count(id)>1
) as implicitTable
)
You mean something like:
DELETE FROM table WHERE someColumn = "someValue";
?
This is definitely possible, read about the DELETE syntax in the reference manual.
You can delete from the same table. The DELETE statement is as follows:
DELETE FROM table_name
WHERE some_column=some_value