SQL UPDATE query taking too long - mysql

So I am very new to MySQL, and I am trying to run a query to update a column if a cell value is present in both tables, and the query is taking forever to run (It's been running for 10 minutes now and no result yet). One of my tables is about 250,000 rows, and the other is about 80,000, so I'm not sure why it is taking so long. The query I am using is:
USE the_db;
UPDATE table1
JOIN table2
ON table2.a = table1.b
SET table1.c = "Y";
I've changed the names of the tables and columns, but the query is exactly the same. I've looked at other answers on here and all of them take a very long time as well. Any help would be appreciated, thanks.

For this query:
UPDATE table1 JOIN
table2
ON table2.a = table1.b
SET table1.c = 'Y';
You want an index on table2(a):
create index idx_table2_a on table2(a);
Also, if there are multiple values of a that match each b, then you could also be generating a lot of intermediate rows, and that would have a big impact on performance.
If that is the case, then phrase the query as:
UPDATE table1
SET table1.c = 'Y'
WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.a = table1.b);
And you need the same index.
The difference between the queries is that this one stops at the first matching row in table2.
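To make the difference concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows are invented, so only the row counts carry over, not the timings. The join produces one intermediate row per matching pair, while the EXISTS form touches each table1 row at most once.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (b INTEGER, c TEXT)")
cur.execute("CREATE TABLE table2 (a INTEGER)")
cur.executemany("INSERT INTO table1 (b) VALUES (?)", [(1,), (2,), (3,)])
# The value 1 appears three times in table2, so the join multiplies rows.
cur.executemany("INSERT INTO table2 (a) VALUES (?)", [(1,), (1,), (1,), (3,)])
cur.execute("CREATE INDEX idx_table2_a ON table2(a)")

# Number of intermediate rows the join form would have to produce.
join_rows = cur.execute(
    "SELECT COUNT(*) FROM table1 JOIN table2 ON table2.a = table1.b"
).fetchone()[0]

# EXISTS form: each table1 row is flagged once, however many matches exist.
cur.execute(
    "UPDATE table1 SET c = 'Y' "
    "WHERE EXISTS (SELECT 1 FROM table2 WHERE table2.a = table1.b)"
)
updated_rows = cur.rowcount
flags = cur.execute("SELECT b, c FROM table1 ORDER BY b").fetchall()
```

Here the join yields more intermediate rows than there are rows to update, which is the overhead the EXISTS form avoids.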

Related

MySQL Query with inner join very slow

I have two tables containing 6M rows each. I'm trying to join the two using an inner join but the query ran for 2 days without finishing. The join is (note I've used count(*) just to enable me to run an explain, I'm actually using the join in a CTAS):
SELECT count(*)
FROM table1 t1,
table2 t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
After a bit of investigation I've found the below query runs fine:
SELECT count(*)
FROM
(SELECT *
FROM table1) t1,
(SELECT *
FROM table2) t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
The only difference is that, instead of referencing the table directly, I use the sub-query SELECT * FROM table.
Running the explain plans shows that the latter query is building up an index when it selects table2. Whereas the first query is using a join buffer (Block Nested Loop).
Surely MySQL is clever enough to work out that the two queries are practically identical and do the same with both queries? I don't see why an index should be needed because a full scan is required for both tables anyway. These are temporary/transitory tables so if I did put an index on, it would literally be just to perform this join.
Is there a way to fix this via MySQL configuration?
You NEED an index on at least ONE of the tables, for example:
create index Temp1 on Table2 ( colA, colB )
Your query joins table 1 to table 2, so even if all of table 1 is read with a table scan, the engine needs a quick way to find the matching record(s) in table 2. If NEITHER table has an index, think of it this way: for every record in Table1, scan through ALL records in Table2 and grab those that match on ColA, ColB. Then go back to Table1 for the SECOND record and scan through ALL records in Table2 again, and so on.
Being that you have 6M records, you could practically choke a cow (so-to-speak) on performance. By having an index, even on the SECOND table, when the query is on the first record, it can immediately jump to the rows that match ColA, ColB and as soon as those A/B records are done, it goes back to the first table.
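As a rough illustration of that difference, the sketch below counts the work done by each strategy on a toy data set. It is plain Python, not MySQL: the row values are invented, and a dict plays the role of the index (a hash lookup rather than MySQL's B-tree, which is an assumption made for brevity).

```python
# Toy model of the join: count how many row comparisons each strategy needs.
table1 = [(i % 50, i % 7) for i in range(200)]  # rows of (col1, col2)
table2 = [(i % 50, i % 7) for i in range(300)]  # rows of (colA, colB)

# No index: every row of table1 is compared with every row of table2.
nested_comparisons = 0
nested_matches = 0
for row1 in table1:
    for row2 in table2:
        nested_comparisons += 1
        if row1 == row2:
            nested_matches += 1

# "Index" on table2(colA, colB): maps each key to the number of rows holding it.
index = {}
for row2 in table2:
    index[row2] = index.get(row2, 0) + 1

# With the index, each table1 row needs a single lookup.
indexed_lookups = 0
indexed_matches = 0
for row1 in table1:
    indexed_lookups += 1
    indexed_matches += index.get(row1, 0)
```

Both strategies find the same matches, but the unindexed version does len(table1) * len(table2) comparisons, which is what grows out of hand at 6M rows per table.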
Now, for other overhead efficiencies. If you have BOTH tables indexed on respective Col1, Col2 and ColA, ColB, then the engine will have in its memory / cache a whole block of records for each common area and doesn't have to keep going back to the raw data pages for other elements repeatedly.
So, even though you think it might not be practical, it is still good to handle large table queries. Also, if you have multiple records in the first table with the same values for Col1, Col2, but have different other values for other columns in the table, and similarly in the second table for multiple ColA, ColB, you would get a Cartesian result. Consider the following scenario
Table1
Col1  Col2  OtherColumn
X     Y     blah1
X     Y     blah2
X     Y     blah3

Table2
ColA  ColB  OtherColumn
X     Y     second blah1
X     Y     second blah2
X     Y     second blah3
A simple query like you have
SELECT count(*)
FROM table1 t1,
table2 t2
WHERE t1.col1 = t2.colA
AND t1.col2 = t2.colB;
would result in a count of 9. You have 6M records and a possible Cartesian result? Hopefully this clarifies some problems you may be encountering.
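The count of 9 can be checked with a small sketch. It uses Python's built-in sqlite3 module in place of MySQL (an assumption made only so the example is self-contained) and loads the six sample rows from the scenario above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (col1 TEXT, col2 TEXT, OtherColumn TEXT)")
cur.execute("CREATE TABLE table2 (colA TEXT, colB TEXT, OtherColumn TEXT)")
cur.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("X", "Y", "blah1"), ("X", "Y", "blah2"), ("X", "Y", "blah3")],
)
cur.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?)",
    [("X", "Y", "second blah1"), ("X", "Y", "second blah2"), ("X", "Y", "second blah3")],
)

# Every table1 row pairs with every table2 row that shares the same key.
pair_count = cur.execute(
    "SELECT count(*) FROM table1 t1, table2 t2 "
    "WHERE t1.col1 = t2.colA AND t1.col2 = t2.colB"
).fetchone()[0]

# Rows per key in table1; a count above 1 signals a potential row explosion.
key_counts = cur.execute(
    "SELECT col1, col2, count(*) FROM table1 GROUP BY col1, col2"
).fetchall()
```

Running a GROUP BY like the second query on each real table is a cheap way to see whether the join keys repeat before launching the full join.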

Is this query well written? I am fairly new at this and am wondering if there is a better way to write it

UPDATE table1
INNER JOIN table2
ON table1.var1=table2.var1
SET table1.var2=table2.var2
My table has about 975,000 rows in it and I know this will take a while no matter what. Is there any better way to write this?
Thanks!
If the standard case is that table1.Var2 already is equal to table2.var2, you may end up with an inflated write count as the database may still update all those rows with no functional change in value.
You may get better performance by updating only those rows which have a different value than the one you desire.
UPDATE table1
INNER JOIN table2
ON table1.var1=table2.var1
SET table1.var2=table2.var2
WHERE (table1.var2 is null and table2.var2 is not null OR
table1.var2 is not null and table2.var2 is null OR
table1.var2 <> table2.var2)
Edit: Nevermind... MySQL only updates on actual changes, unlike some other RDBMS's (MS SQL, for example.)
Your query:
UPDATE table1 INNER JOIN
table2
ON table1.var1 = table2.var1
SET table1.var2 = table2.var2;
A priori, this looks fine. The major issue that I can see would be a 1-many relationship from table1 to table2. In that case, multiple rows from table2 might match a given row from table1. MySQL assigns an arbitrary value in such a case.
You could fix this by choosing one value, such as the min():
UPDATE table1 INNER JOIN
(select var1, min(var2) as var2
from table2
group by var1
) t2
ON table1.var1 = t2.var1
SET table1.var2 = t2.var2;
For performance reasons, you should have an index on table2(var1, var2). By including both columns in the index, the query will be able to use the index only and not have to fetch rows directly from the table.
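The effect of picking min() can be sketched as follows. Python's built-in sqlite3 module stands in for MySQL here, and because SQLite's join-update syntax differs, the sketch uses a correlated subquery instead of the derived-table join above; the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (var1 INTEGER, var2 INTEGER)")
cur.execute("CREATE TABLE table2 (var1 INTEGER, var2 INTEGER)")
cur.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, None), (2, None), (3, None)])
# var1 = 1 has three candidate values, so the choice must be made explicit.
cur.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 30), (1, 10), (1, 20), (2, 5)])
cur.execute("CREATE INDEX idx_table2_var1_var2 ON table2(var1, var2)")

# Deterministic choice: take the smallest var2 for each var1.
# Rows without any match in table2 are left untouched.
cur.execute(
    """
    UPDATE table1
    SET var2 = (SELECT MIN(t2.var2) FROM table2 t2 WHERE t2.var1 = table1.var1)
    WHERE EXISTS (SELECT 1 FROM table2 t2 WHERE t2.var1 = table1.var1)
    """
)
result = cur.execute("SELECT var1, var2 FROM table1 ORDER BY var1").fetchall()
```

The row with var1 = 1 always ends up with 10, regardless of the order in which the matching rows are read.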

mySQL: How to identify duplicates based on four fields

I have read a few posts on SO on how to delete duplicates, by comparing a table with another instance of itself, however I don't want to delete the duplicates I want to compare them.
eg. I have the fields "id", "sold_price", "bruksareal", "kommunenr", "Gårdsnr" ,"Bruksnr", "Festenr", "Seksjonsnr". All fields are int.
I want to identify the rows that are duplicates/identical (the same bruksareal, kommunenr, gårdsnr, bruksnr,festenr and seksjonsnr). If identical then I want to give these rows a unique reference number.
I believe this will make it easier to identify the rows that I later want to compare on other fields (e.g. "sold_price", "sold_date", etc.).
I'm open to suggestions if you believe my approach is wrong...
Join the table to itself on all of the fields, then use an EXISTS query such as:
Update Table1
Set reference = UUID()
Where exists (
Select tb1.id
from Table1 tb1 inner join Table1 tb2 on
tb1.Field1 = tb2.Field1 AND
tb1.Field2 = tb2.Field2 AND
etc
Where tb1.Id = Table1.Id
And tb1.Id != tb2.Id
)
Actually, you can simplify this with just a join (written here with MySQL's multi-table UPDATE syntax):
UPDATE Table1
INNER JOIN Table1 tb2 ON
Table1.Field1 = tb2.Field1 AND
Table1.Field2 = tb2.Field2 AND
etc
SET Table1.reference = UUID()
WHERE Table1.Id != tb2.Id
Depending on where you want to do that, I would go for a hash implementation. For every insert, calculate the hash of the needed columns at insert time (with a trigger, maybe). After that you should be able to find out very easily which rows are duplicated. If you index that column, the queries should be pretty fast, but remember that it is still not an int column, so it will get a little slower over time.
After this you can do whatever you please with the duplicated records, without very expensive queries on the database.
Later edit: Make sure that you convert the null values into some defined value, since some of the mysql functions like MD5 will just return null if the operand is null. The same goes for concat - if one operand is null, it will return null (the same is not valid for concat_ws though).
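A minimal sketch of the hashing idea, written in plain Python rather than as a MySQL trigger. The helper duplicate_key, the NULL_MARKER placeholder, the ASCII spelling gardsnr, and the sample rows are all hypothetical; the point is only that NULLs are mapped to a defined value before hashing, and that rows sharing a hash form one duplicate group.

```python
import hashlib
from collections import defaultdict

NULL_MARKER = "<NULL>"  # hypothetical stand-in so missing values still hash
FIELDS = ["bruksareal", "kommunenr", "gardsnr", "bruksnr", "festenr", "seksjonsnr"]


def duplicate_key(row, fields):
    """Return an MD5 hex digest over the chosen fields, with None made explicit."""
    parts = [NULL_MARKER if row[f] is None else str(row[f]) for f in fields]
    # A separator keeps (1, 23) and (12, 3) from producing the same input string.
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


rows = [
    {"id": 1, "bruksareal": 80, "kommunenr": 301, "gardsnr": 12,
     "bruksnr": 4, "festenr": None, "seksjonsnr": 2},
    {"id": 2, "bruksareal": 80, "kommunenr": 301, "gardsnr": 12,
     "bruksnr": 4, "festenr": None, "seksjonsnr": 2},
    {"id": 3, "bruksareal": 95, "kommunenr": 301, "gardsnr": 12,
     "bruksnr": 4, "festenr": None, "seksjonsnr": 2},
]

# Group row ids by hash; any group with more than one id is a set of duplicates.
groups = defaultdict(list)
for row in rows:
    groups[duplicate_key(row, FIELDS)].append(row["id"])

duplicates = {key: ids for key, ids in groups.items() if len(ids) > 1}
```

The hash itself can serve as the shared reference number for each group, since identical rows always produce the same digest.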

Reducing MySQL Query Time (Currently running for 24 hours and still going)

I have a database with three tables and I need to cross reference the first table against the other two to create a fourth table of consolidated information. All the tables have one field which is common, this is the MSISDN (mobile / cell telephone number) and is at least 10 digits long.
Table 1 - 819,248 rows
Table 2 - 75,308,813 rows
Table 3 - 17,701,196 rows
I want to return all the rows from Table 1 and append some of the fields from Tables 2 and Table 3 when there's a matching MSISDN.
My query has been running now for over 24 hours and I have no way of knowing how long something like this should take.
This type of query may be a regular project - is there a way to significantly reduce the query time?
I have indexed tables 2 and 3 with MSISDN and the fields I need to return.
My query is like this:
create TABLE FinishedData
select
Table1.ADDRESS, table1.POSTAL, table1.MOBILE,
table1.FIRST, table1.LAST, table1.MID, table1.CARRIER,
table1.TOWN, table1.ID, table2.status as 'status1',
table2.CurrentNetworkName as 'currentnetwork1',
table2.DateChecked as 'datechecked1', table3.Status as 'status2',
table3.CurrentNetworkName 'currentnetwork2',
table3.DateChecked as 'datechecked2'
from
table1 left join (table2, table3)
on (right(table1.MOBILE, 10) = right(table2.MSISDN, 10)
AND right(table1.MOBILE,10) = right(table3.MSISDN,10))
MySQL is running on a 64-bit Windows machine with 12GB memory and 8 logical cores @ 3GHz. mysqld is only using 10% CPU and 600MB of memory when running the query.
Any help is appreciated.
The performance killer is the right function: when you use it, MySQL can't use indexes.
My suggestion is:
Create new fields in table2 and table3 holding the reversed content of MSISDN.
Rewrite the join, replacing the right function with the left function.
With this small change MySQL will be able to use indexes for your joins.
Explained steps:
1) Create new columns:
Alter table table2 add column r_MSISDN varchar(200);
update table2 set r_MSISDN = reverse( MSISDN );
Alter table table3 add column r_MSISDN varchar(200);
update table3 set r_MSISDN = reverse( MSISDN );
2) New join:
...
from
table1 left join (table2, table3)
on (reverse(right(table1.MOBILE, 10)) = left(table2.r_MSISDN, 10)
AND reverse(right(table1.MOBILE,10)) = left(table3.r_MSISDN,10))
RIGHT is a function. Applying a function to a column in a WHERE or ON clause means MySQL (and most other databases) cannot use an ordinary index on that column, because it has to compute the function's value for each row before comparing.
If you want to make this query any faster, consider storing the MSISDN in a normalized form and comparing using = operator.
Now I am not sure what MSISDN number looks like. If it is a fixed width number then your job is easy. If it contains separators (spaces/hyphens) and the separators are only there for readability you should remove them before storing in database. If the first 10 characters are important and remaining are optional, you might consider storing the first 10 and remaining characters in separate columns.
As others have already mentioned, the problem is with the right function which does not allow using any indexes.
In simple words, your current query for each row in table1 makes a full scan of table2 and for each match makes a full scan of table3. Considering how many rows you have in table2 and table3, you have a good chance to see the world before the query is finished.
Another problem is that the query runs as one huge transaction, which MySQL must be able to roll back, so you may also want to reconsider the isolation level.
I would not change the current tables though. I would create subcopies of table2 and table3 with the required columns, and add the right(table2.MSISDN, 10) as a separate indexed column in table2 copy (right(table3.MSISDN,10) in table3 copy).
Then, you can do the LEFT JOIN with the copies, or even reduce the copies to the rows which do match anything in table1 and do the LEFT JOIN then.
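The sub-copy approach might look like the sketch below. It uses Python's built-in sqlite3 module instead of MySQL, so substr(x, -10) plays the role of right(x, 10); the table2_copy and last10 names and the phone numbers are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table1 (ID INTEGER, MOBILE TEXT)")
cur.execute("CREATE TABLE table2 (MSISDN TEXT, Status TEXT)")
cur.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [(1, "447700900123"), (2, "07700900456")],
)
cur.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("+447700900123", "active"), ("447700900999", "inactive")],
)

# Sub-copy of table2: the last 10 characters are computed once and stored,
# so the join can compare a plain indexed column with '='.
cur.execute(
    "CREATE TABLE table2_copy AS "
    "SELECT substr(MSISDN, -10) AS last10, Status FROM table2"
)
cur.execute("CREATE INDEX idx_table2_copy_last10 ON table2_copy(last10)")

# The function is now applied only to the driving table's column.
rows = cur.execute(
    """
    SELECT t1.ID, t2.Status
    FROM table1 t1
    LEFT JOIN table2_copy t2 ON t2.last10 = substr(t1.MOBILE, -10)
    ORDER BY t1.ID
    """
).fetchall()
```

The same pattern would be repeated for table3, and table1 still returns every row because of the LEFT JOIN.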

Fast Cross Table Update with MySQL

Simple question :D. I know how to do it, but I have to do it fast.
What’s the most time efficient method?
Scenario: two tables, tableA and tableB, update tableA.columnA from tableB.columnB, based on tableA.primarykey = tableB.primarykey.
Problem: tableA and tableB are over 10.000.000 records each.
update TableA as a
join TableB as b on
a.PrimaryKey = b.PrimaryKey
set a.ColumnA = b.ColumnB
Updating 10 million rows cannot be fast. Well... at least in comparison to the update of one row.
The best you can do:
- Indexes on the joining fields, which you already have because these fields are primary keys.
- Limit the rows with a WHERE condition if applicable. An index covering that WHERE condition is needed to speed it up.
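One possible limiting condition, sketched below, is to touch only rows whose value would actually change. This is an assumption about what "if applicable" might mean in a given case, not part of the original question. The sketch uses Python's built-in sqlite3 module in place of MySQL, with a correlated subquery instead of the join-update form, and SQLite's IS NOT as the NULL-safe comparison (in MySQL the equivalent would be NOT (a <=> b)).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE TableA (PrimaryKey INTEGER PRIMARY KEY, ColumnA TEXT)")
cur.execute("CREATE TABLE TableB (PrimaryKey INTEGER PRIMARY KEY, ColumnB TEXT)")
cur.executemany(
    "INSERT INTO TableA VALUES (?, ?)",
    [(1, "same"), (2, "old"), (3, None), (4, "orphan")],
)
cur.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [(1, "same"), (2, "new"), (3, "filled")],
)

# Update only rows that have a partner in TableB AND a different value there.
cur.execute(
    """
    UPDATE TableA
    SET ColumnA = (SELECT b.ColumnB FROM TableB b
                   WHERE b.PrimaryKey = TableA.PrimaryKey)
    WHERE EXISTS (
        SELECT 1 FROM TableB b
        WHERE b.PrimaryKey = TableA.PrimaryKey
          AND b.ColumnB IS NOT TableA.ColumnA
    )
    """
)
changed = cur.rowcount
final = cur.execute("SELECT PrimaryKey, ColumnA FROM TableA ORDER BY PrimaryKey").fetchall()
```

Row 1 already holds the right value and row 4 has no partner, so neither is written.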