I have 2 InnoDB tables. In TableA I have a column (guidNew) whose values I want to assign to a column in TableB (owner), depending on the relation between the TableA column (guid) and the TableB column (owner).
Basically TableB (owner) has multiple entries that correspond to one TableA (guid); it is a many-to-one relation. I want to change the TableB (owner) values to the new TableA (guidNew) values.
This is an example of the query:
UPDATE `TableB`, `TableA`
SET
`TableB`.`owner` = `TableA`.`guidNew`
WHERE `TableB`.`guid` != 0
AND `TableB`.`owner` = `TableA`.`guid`;
Now I do not know whether this is working, because there are more than 2 million entries. Is there a way to know its progress and, more importantly, a way to do it faster?
Make sure that you have indexed the guid and owner columns.
Try using the EXPLAIN command to see how the query will be executed:
EXPLAIN SELECT TableB.owner, TableA.guidNew
FROM TableB, TableA
WHERE TableB.guid != 0
AND TableB.owner = TableA.guid;
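For the indexing suggestion, a minimal sketch (assuming the table and column names from the question, and that no such indexes exist yet):
CREATE INDEX ix_tablea_guid ON TableA (guid);   -- lookup side of the join
CREATE INDEX ix_tableb_owner ON TableB (owner); -- driving side of the join
With both sides of the join condition indexed, MySQL can look up the matching TableA row for each TableB row instead of scanning.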
I have a MySQL database with just 1 table:
Fields are: blocknr (not unique), btcaddress (not unique), txid (not unique), vin, vinvoutnr, netvalue.
Indexes exist on both btcaddress and txid.
I need to delete all "deletable" record pairs (the original post showed sample rows with one such pair highlighted in red).
Conditions are:
txid must be the same (there can be more than 2 records with same txid)
vinvoutnr must be the same
vin must be different (vin can only take the values 0 and 1, so one record must have 0 and the other 1)
In a table of 36M records, about 33M records will be deleted.
I've used this:
delete t1
from registration t1
inner join registration t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
It works but takes 5 hours.
Maybe this would work too (not tested yet):
delete t1
from registration as t1, registration as t2
where t1.txid=t2.txid and t1.vinvoutnr=t2.vinvoutnr and t1.vin<>t2.vin;
Or do I forget about a delete query, build a new table containing all the non-deletable records, and then drop the original?
Database can be offline for this delete query.
Based on your question, you are deleting most of the rows in the table. That is just really expensive. A better approach is to empty the table and re-populate it:
create table temp_registration as
<query for the rows to keep here>;
truncate table registration;
insert into registration
select *
from temp_registration;
Your logic is a bit hard to follow, but I think the query for the rows to keep is:
select r.*
from registration r
where not exists (select 1
from registration r2
where r2.txid = r.txid and
r2.vinvoutnr = r.vinvoutnr and
r2.vin <> r.vin
);
For best performance, you want an index on registration(txid, vinvoutnr, vin).
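A minimal sketch of that index (assuming the table name from the question):
CREATE INDEX ix_registration_txid_vinvoutnr_vin ON registration (txid, vinvoutnr, vin);
This lets the NOT EXISTS probe be answered entirely from the index instead of touching table rows.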
Given that you expect to remove the majority of your data it does sound like the simplest approach would be to create a new table with the correct data and then drop the original table as you suggest. Otherwise ADyson's corrections to the JOIN query might help to alleviate the performance issue.
I have two tables, tableA and tableB. tableA has 2 million records and tableB has over 10 million records. tableA has more than thirty columns whereas tableB has only two columns. I need to update a column in tableA from tableB by joining both tables.
UPDATE tableA a
INNER JOIN tableB b ON a.colA=b.colA
SET a.colB= b.colB
colA in both tables has been indexed.
Now when I execute the query it takes hours. Honestly, I have never seen it complete; the longest I have waited is 5 hours. Is there any way to complete this query within 20-30 minutes? What approach should I take?
EXPLAIN output for the query:
id | select_type | table | type | possible_keys | key       | key_len | ref        | rows    | Extra
1  | SIMPLE      | a     | ALL  | INDX_DESC     | NULL      | NULL    | NULL       | 2392270 | Using where
1  | SIMPLE      | b     | ref  | indx_desc     | indx_desc | 133     | cis.a.desc | 1       | Using where
Your UPDATE operation is performing a single transaction on ten million rows of a large table. (The DBMS holds enough data to roll back the entire UPDATE query if it does not complete for any reason.) A transaction of that size is slow for your server to handle.
When you process entire tables, the operation can't use indexes as well as it can when it has highly selective WHERE clauses.
A few things to try:
1) Don't update rows unless they need it. Skip the rows that already have the correct value. If most rows already have the correct value this will make your update much faster.
UPDATE tableA a
INNER JOIN tableB b ON a.colA=b.colA
SET a.colB = b.colB
WHERE a.colB <> b.colB
2) Do the update in chunks of a few thousand rows, and repeat the update operation until the whole table is updated. I guess tableA contains an id column. You can use it to organize the chunks of rows to update.
UPDATE tableA a
INNER JOIN tableB b ON a.colA = b.colA
SET a.colB = b.colB
WHERE a.id IN (
    SELECT id FROM (
        SELECT a2.id
        FROM tableA a2
        INNER JOIN tableB b2 ON a2.colA = b2.colA
        WHERE a2.colB <> b2.colB
        LIMIT 5000
    ) AS chunk
);
The subquery finds the id values of 5000 rows that haven't yet been updated, and the UPDATE query updates them. (The extra derived table is needed because MySQL allows neither selecting from the table being updated nor a LIMIT directly inside an IN subquery.) Repeat this query until it changes no rows, and you're done. This makes things faster because the server only has to handle smaller transactions.
3) Don't do the update at all. Instead, whenever you need to retrieve your colB value, simply join to tableB in your select query.
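A sketch of that read-time join (column names as above; LEFT JOIN so tableA rows without a tableB match still come back, with colB as NULL):
SELECT a.*, b.colB
FROM tableA a
LEFT JOIN tableB b ON a.colA = b.colA;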
Chunking is the right way to go. However, chunk on the PRIMARY KEY of tableA.
I suggest only 1000 rows at a time.
Follow the tips given here
Did you say that the PK of tableA is a varchar? No problem. See the second flavor of code in that link; it uses ORDER BY id LIMIT 1000,1 to find the end of the next chunk, regardless of the datatype of id (the PK).
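A rough sketch of that chunk walk with user variables (assuming tableA's PK is id; when fewer than 1001 rows remain, the SELECT ... INTO finds no row and leaves @next unchanged, so the final pass needs an open-ended range):
SET @start = 0;  -- begin before the smallest id
-- find the id 1000 rows past the current position
SELECT id INTO @next
FROM tableA
WHERE id >= @start
ORDER BY id
LIMIT 1000, 1;
-- update just that slice, then set @start = @next and repeat
UPDATE tableA a
INNER JOIN tableB b ON a.colA = b.colA
SET a.colB = b.colB
WHERE a.id >= @start
  AND a.id < @next;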
Hi, I am not sure, but you could do this with a cron job.
Process: add one more field to tableA, for example is_update, with a default value of 0, and set the cron job to run every minute. The first run picks 10000 records having is_update = 0, updates them, and sets is_update to 1; the second run picks the next 10000 with is_update = 0, and so on, as sketched below.
Hope this helps.
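A minimal sketch of that flag-based batching (is_update and the batch size come from the answer above; tableA's PK id is an assumption, and the derived-table wrapper is required because MySQL will not let a subquery read the table being updated directly):
-- one-time: add the tracking flag
ALTER TABLE tableA ADD COLUMN is_update TINYINT NOT NULL DEFAULT 0;
-- run from the cron job; each run processes one batch of 10000 rows
UPDATE tableA a
INNER JOIN tableB b ON a.colA = b.colA
SET a.colB = b.colB,
    a.is_update = 1
WHERE a.id IN (
    SELECT id FROM (
        SELECT a2.id
        FROM tableA a2
        INNER JOIN tableB b2 ON a2.colA = b2.colA
        WHERE a2.is_update = 0
        LIMIT 10000
    ) AS batch
);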
For updating around 70 million records of a single MySQL table, I wrote a stored procedure to update the table in chunks of 5000. It took approximately 3 hours to complete.
DELIMITER $$
DROP PROCEDURE IF EXISTS update_multiple_example_proc$$
CREATE PROCEDURE update_multiple_example_proc()
BEGIN
  DECLARE x BIGINT;
  SET x = 1;
  -- walk the primary key range in chunks of 5000 rows
  WHILE x <= <MAX_PRIMARY_KEY_TO_REACH> DO
    UPDATE tableA A
    JOIN tableB B ON A.col1 = B.col1
    SET A.col2_to_be_updated = B.col2_to_be_updated
    WHERE A.id BETWEEN x AND x + 4999; -- upper bound x+4999 so consecutive chunks do not overlap
    SET x = x + 5000;
  END WHILE;
END$$
DELIMITER ;
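Once created, the procedure is invoked like this (it runs until the placeholder maximum id is reached):
CALL update_multiple_example_proc();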
Look at the oak-chunk-update tool. It is one of the best tools if you need to update billions of rows, too ;)
I have a LEFT JOIN query that shows all the fields from a primary table (tblMarkers) and the values from a second table (tblLocations) where there is a matching record.
tblLocations does not have a record for every id in tblMarkers
$query ="SELECT `tblMarkers`.*,`tblLocation`.*,`tblLocation`.`ID` AS `markerID`
FROM
`tblMarkers`
LEFT JOIN `tblLocation` ON `tblMarkers`.`ID` = `tblLocation`.`ID`
WHERE
`tblMarkers`.`ID` = $id";
I am comfortable with using UPDATE to update the tblMarkers fields, but how do I UPDATE or INSERT a record into tblLocations if the record does not exist there yet?
Also, how do I lock the record I am working on to prevent someone else from doing an update at the same time?
Can I also use UPDATE tblMarkers * or do I have to list every field in the UPDATE statement?
Unfortunately you might have to implement some validation in your outside script. There is an IF statement in SQL, but I'm not sure whether you can trigger different commands based on its outcome.
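That said, MySQL's INSERT ... ON DUPLICATE KEY UPDATE handles the insert-or-update case in one statement, assuming tblLocation.ID is a PRIMARY or UNIQUE key (lat and lng are hypothetical column names for illustration):
-- inserts the row, or updates it if ID 42 already exists
INSERT INTO tblLocation (ID, lat, lng)
VALUES (42, 51.5, -0.1)
ON DUPLICATE KEY UPDATE lat = VALUES(lat), lng = VALUES(lng);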
Locking
In terms of locking, you have 2 options. For MyISAM tables, you can only lock the entire table, using http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html
LOCK TABLES users WRITE;
For InnoDB tables, there is no explicit 'lock' statement for a single row; however, you can use transactions to get exclusive rights during the operation: http://dev.mysql.com/doc/refman/5.0/en/innodb-locks-set.html
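A sketch of that transactional approach (SELECT ... FOR UPDATE takes an exclusive row lock in InnoDB; the id value 42 and the lat column are placeholders):
START TRANSACTION;
-- the row is now locked; concurrent writers block until COMMIT
SELECT * FROM tblLocation WHERE ID = 42 FOR UPDATE;
UPDATE tblLocation SET lat = 51.5 WHERE ID = 42;
COMMIT;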
Update
There might be some shorthand notation, but I think you have to list every field in your query. Alternatively, you can always read the entire row, delete it, and insert it again using the shorthand INSERT syntax. It all depends on how many fields you've got.
I'm normalizing an existing database. I currently have two columns in Table1, domain and container, with limited distinct combinations (currently ~30 combinations of these two from ~1000 records). I've built a new Table2 that holds all combinations, with a primary key (container_id) auto-generated when a new record is installed. I've added a container_id column to Table1, and want to fill in the values based on the Table1.container column.
At this point, all of the container names in table 2 are distinct, but that could change in the future, hence the need for a unique number as the PK.
i.e.
UPDATE Table1
SET container_id = (SELECT Table2.container_id
FROM Table2
WHERE Table2.container = Table1.container)
WHERE EXISTS
( SELECT Table2.container_id
FROM Table2
WHERE Table2.container = Table1.container)
This query returns error 1242: subquery returns more than one row.
Am I barking up the totally wrong tree? Table2 should have zero duplicate values.
I should have used a join to update Table1:
UPDATE Table1
LEFT JOIN Table2 USING (container)
SET Table1.container_id = Table2.container_id
WHERE Table1.container_id IS NULL AND Table2.container_id IS NOT NULL;
Table2.container is not unique, hence values can repeat. Because of this, both subqueries can return more than one row.
My data scheme is really simple; let's say it's about farms:
tableA is the main one, with an important field "is_active" indicating whether the farm is trusted (kind of)
tableB is a data store of serialized arrays of farm statistics
I want to retrieve all data about active farms, so I just do something like this:
SELECT * FROM tableA LEFT JOIN tableB ON id_tableA=id_tableB WHERE is_active=1 ORDER BY id_tableA DESC;
Right now the query takes 15 sec to execute straight from a SQL shell. By contrast, if I want to retrieve all data from tableB, like:
SELECT * FROM tableB ORDER BY id_tableB DESC;
it takes less than 1 sec (approx 1200 rows)...
Any ideas on how to improve the original query?
thx
Create indexes on the keys joining the two tables.
Check this link on how to create indexes in MySQL:
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
You'll have to create an index.
You could create the following indexes:
mysql> create index ix_a_active_id on tableA (is_active, id_tableA);
mysql> create index ix_b_id on tableB (id_tableB);
The first creates a composite index on is_active and id_tableA; with the filtered column first, the WHERE is_active=1 clause can seek on it, and with id_tableA second, the ORDER BY can read rows in index order.
The second creates an index on the join key of tableB.
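To verify the indexes are actually used, re-run the query under EXPLAIN (the same diagnostic suggested earlier in this thread):
EXPLAIN SELECT * FROM tableA
LEFT JOIN tableB ON id_tableA = id_tableB
WHERE is_active = 1
ORDER BY id_tableA DESC;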