Solving latency in MySQL update queries

I have written the following query:
UPDATE
table1 latest, table2 previous
SET latest.col1 = previous.col1
WHERE latest.col2 = previous.col2 and previous.col1 is not null;
This copies the value of col1 from table2 to table1 wherever the values of col2 match. Because of the context, no primary/foreign key constraints can be involved. col2 doesn't contain NULLs, but col1 does (in both tables).
However, this query takes several minutes to execute! Is there a way to speed it up?

Fixed by adding indexes to both tables. I created indexes on the common column on which the tables were being joined. Wherever lookups are performed, for example through joins, the columns being joined or looked up should have indexes.
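A rough sketch of that fix, assuming the join column is col2 as in the question. The index names idx_table1_col2 and idx_table2_col2 are made up for illustration, and the UPDATE is rewritten with an explicit JOIN, which should be equivalent to the comma form in the question:

```sql
-- Hypothetical index names; col2 is the column both tables are joined on.
CREATE INDEX idx_table1_col2 ON table1 (col2);
CREATE INDEX idx_table2_col2 ON table2 (col2);

-- Same update as in the question, written with an explicit JOIN.
UPDATE table1 AS latest
JOIN table2 AS previous
  ON latest.col2 = previous.col2
SET latest.col1 = previous.col1
WHERE previous.col1 IS NOT NULL;
```

Running EXPLAIN on the UPDATE before and after adding the indexes should show whether they are actually being picked up.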

Related

MySQL queries - defining compound and single indexes across multiple queries. How to prevent the indexes from conflicting and creating slow queries?

I am struggling to work out which columns are best to put my indexes on, when it seems adding additional indexes can have a detrimental effect on the query performance.
For example, I have the following query on a table with around 5m rows;
SELECT col1, col2 FROM table WHERE col1 = 'a' AND col2 = 'b' AND col3 = 'c';
Running this with no indexes takes 12 seconds!
I add a compound index on all 3 columns: table_col1_col2_col3_index.
My query now drops down to 2 seconds - great!
I now have another query on the same table (with no indexes on any column):
SELECT col1, col2 FROM table WHERE col1 = 'a';
Running this on its own and the query takes 4 seconds - still pretty slow!
So now I add a single-column index on col1: table_col1_index.
My query drops to 0.2 seconds. This is great; however, when I run the original query again, I notice that it is using this index as opposed to the one I created earlier. The original query is now back up to 6 seconds.
I am unsure how to go about ensuring that both queries can be optimised at the same time.
You can create indexes, taking care to put the most frequently used or most selective column leftmost, and arrange them so that the same index can serve several queries.
Furthermore, you can always point the optimizer at the index you think is best suited using FORCE INDEX (or IGNORE INDEX): https://dev.mysql.com/doc/refman/8.0/en/index-hints.html
SELECT * FROM table1 FORCE INDEX (col3_index)
WHERE col1=1 AND col2=2 AND col3=3;
Turn off the query cache, or use SELECT SQL_NO_CACHE ... when doing timing.
Run each timing test twice. The first may spend extra time fetching data and/or index blocks from disk. The second timing is better for comparisons. (And is closer to the way it would be in a "production" server.)
How many rows are being returned? That could have an impact. The 2nd query may be returning many times as many rows.
Please provide SHOW CREATE TABLE -- there could be subtle issues. (datatypes, column sizes, collations, who-knows-what)
Please provide EXPLAIN SELECT ... -- As written, each of your examples should say "Using index", meaning that the index "covers" the query, which means that all the columns in the SELECT exist in the INDEX being used.
Do not use "index hints" -- while it may help a query today, it may hurt it more tomorrow.
All of your examples would ('should') benefit from INDEX(col1, col2, col3), in that order; I would not add any others.
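As a sketch of that last suggestion, the single compound index and the two EXPLAIN checks might look like this. The index name is the one from the question; the backticks are needed only because the placeholder table name `table` is a reserved word in MySQL:

```sql
-- `table` is the placeholder name from the question; it is a reserved word
-- in MySQL, so it has to be quoted with backticks.
CREATE INDEX table_col1_col2_col3_index ON `table` (col1, col2, col3);

-- Both queries can use the index above: the first filters on all three
-- columns, the second only on the leftmost prefix (col1).
EXPLAIN SELECT col1, col2 FROM `table`
WHERE col1 = 'a' AND col2 = 'b' AND col3 = 'c';

EXPLAIN SELECT col1, col2 FROM `table`
WHERE col1 = 'a';
```

If the index covers the query, the Extra column of each EXPLAIN row should say "Using index".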

Use IN for multiple columns primary key

I have a two-column primary key (id, id2) in MySQL.
The ids have a direct correspondence (for id=1, id2=11; for id=2, id2=22; etc.).
I was wondering whether the following query:
select * from my_table where id IN (1,2,3..) AND id2 IN (11,22,33..)
actually hurts performance, even though it uses the primary key.
Would running single selects in a loop:
select * from my_table where id = 1 AND id2 = 11
select * from my_table where id = 2 AND id2 = 22
...
run faster?
I believe the answer is yes, because for each id the query compares id2 against a list of integers.
Is that correct?
Also, does IN make a difference for a single-column primary key?
If you are checking 'most' of the values, then a single query doing a table scan is probably fastest.
If the IN clauses are rather short, the Optimizer may hopscotch through the table very efficiently.
If the table is huge and the values are scattered and disk hits are needed, it may be slow regardless.
Roughly speaking, 10 1-row SELECTs inherently take as long as a single SELECT fetching 100 rows. (This assumes no I/O and good indexes.) So, you need to be desperate to do single selects.
In other words, you should test it with your data and your IN lists. We cannot give you a simple answer. But beware: as the table grows and/or your IN lists change, performance could change.
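One alternative worth including in such a test, which none of the answers above mention, is a row-constructor IN. This is only a sketch: whether the optimizer can use the primary key efficiently for it depends on the MySQL version, so compare the EXPLAIN output on your own data. The literal values are the ones from the question:

```sql
-- Matches only the listed (id, id2) pairs, not every combination
-- of the two lists.
SELECT *
FROM my_table
WHERE (id, id2) IN ((1, 11), (2, 22), (3, 33));

-- The form from the question, for comparing plans.
EXPLAIN SELECT *
FROM my_table
WHERE id IN (1, 2, 3) AND id2 IN (11, 22, 33);
```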

Indexing for a Greater/Less than query

I am trying to do an update between two very large tables (millions of records):
Update TableA as T1
Inner Join TableB as T2
On T1.Field1=T2.Field1
And T1.Value >= T2.MinValue
And T1.Value <= T2.MaxValue
Set T1.Field2=T2.Field2
I have indexes on all the fields individually in both tables, Value fields included. They are all Normal/BTree (the default).
Is there some better way to index the value fields to get better performance? The range conditions add a huge overhead to what is otherwise (using just =) a pretty fast update. My first test took 3 minutes to update 25 records, and I have 5 million to update.
Generally, only one index per table can be used for a particular WHERE clause. So if it uses the index on Field1, it won't be able to use the index on Value, and vice versa. The solution is to use a composite index:
ALTER TABLE TableA ADD INDEX (Field1, Value);
Since an index on multiple columns is effectively an index on any prefix set of the columns, you can remove the individual index on Field1 when you add this index. But you should still keep the individual index on Value if that's needed for other queries.
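A sketch of how this could be checked. The index names are hypothetical, and the composite index on TableB is an assumption that goes beyond the answer above; it may or may not help, depending on which table the optimizer reads first. EXPLAIN can be applied to an UPDATE in MySQL 5.6 and later:

```sql
-- Named version of the index from the answer (run this or the unnamed
-- ALTER TABLE above, not both).
ALTER TABLE TableA ADD INDEX idx_field1_value (Field1, Value);

-- Assumption, not part of the answer: a matching composite index on TableB.
ALTER TABLE TableB ADD INDEX idx_field1_min_max (Field1, MinValue, MaxValue);

-- Inspect the plan without running the update.
EXPLAIN
UPDATE TableA AS T1
INNER JOIN TableB AS T2
   ON T1.Field1 = T2.Field1
  AND T1.Value >= T2.MinValue
  AND T1.Value <= T2.MaxValue
SET T1.Field2 = T2.Field2;
```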

Would WHERE col1 and ORDER BY col2 use a composite key on (col1,col2)?

I have a database table (potentially huge, with hundreds of millions of records in the future) on which I would execute the following query very often:
select *
from table1
where col1 = [some number]
order by col2
Obviously having an index on "col1" would make it run fast. col1 is not unique, so many rows (2000+ I expect) would be returned.
Does it make sense to create an index on (col1, col2)? Would MySQL use it for this query?
Also, if I just query without "order by" part, would this index be used as well for the "where" part?
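Yes, it will help. MySQL will use the composite index, with the first part serving the WHERE and the second part serving the ORDER BY. Because col1 is the leftmost column, the same index is also usable for the WHERE alone, without the ORDER BY. You can read about ORDER BY optimization here: http://dev.mysql.com/doc/refman/5.5/en/order-by-optimization.html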
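A minimal way to verify this, assuming a hypothetical index name idx_col1_col2 and using 42 as a stand-in for [some number]:

```sql
-- Hypothetical index name.
CREATE INDEX idx_col1_col2 ON table1 (col1, col2);

EXPLAIN
SELECT *
FROM table1
WHERE col1 = 42   -- 42 stands in for [some number]
ORDER BY col2;
```

If the index is used for both filtering and ordering, the key column of the EXPLAIN output should name the index and the Extra column should not contain "Using filesort".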

MySQL multicolumn index

Should I include col3 & col4 in my index on MyTable if this is the only query I intend to run on my database?
Select MyTable.col3, MyTable.col4
From MyTable
Inner Join MyOtherTable
On MyTable.col1 = MyOtherTable.col1
And MyTable.col2 = MyOtherTable.col2;
The tables I'm using have about half a million rows in them. For the purposes of my question, col1 & col2 are a unique set found in both tables.
Here's the example table definition if you really need to know:
CREATE TABLE MyTable
(col1 varchar(10), col2 varchar(10), col3 varchar(10), col4 varchar(10));
CREATE TABLE MyOtherTable
(col1 varchar(10), col2 varchar(10));
So, should it be this?
CREATE INDEX MyIdx ON MyTable (col1,col2);
Or this?
CREATE INDEX MyIdx ON MyTable (col1,col2,col3,col4);
Adding columns col3 and col4 will not help, because you're just pulling those values after finding the rows using columns col1 and col2. The speed would normally come from making sure columns col1 and col2 are indexed.
You should actually split those indexes since you're not using them together:
CREATE INDEX MyIdx1 ON MyTable (col1);
CREATE INDEX MyIdx2 ON MyTable (col2);
I don't think a combined index will help you in this case.
CORRECTION: I think I've misspoken, since you intend to use only that query on the two tables and never have the individual columns joined in isolation. In your case it appears you could get some speed up by putting them together. It would be interesting to benchmark this to see just how much of a speedup you'd see on 1/2 million rows using a combined index versus individual ones. (You should still not use columns col3 and col4 in the index, since you're not joining anything by them.)
A query returning half a million rows joined from two tables is never going to be very fast - because it's returning half a million rows.
An index on col1,col2 seems sufficient (as a secondary index), but depending on what other columns you have, adding (col3,col4) might make it a covering index.
In InnoDB it might be better to make (col1,col2) the primary key; the table will then be clustered on it, which is something of a win.
But once again, if your query joins 500,000 rows with no other WHERE clause, and returns 500,000 rows, it's not going to be fast, because it needs to fetch all of the rows to return them.
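To illustrate the covering-index idea, here is a sketch with a hypothetical index name (MyCoveringIdx). Because all four columns the query touches in MyTable are in the index, MySQL may be able to answer that side of the join from the index alone:

```sql
-- Hypothetical index name; col1,col2 come first because they are the
-- join columns, col3,col4 are appended only to cover the SELECT list.
CREATE INDEX MyCoveringIdx ON MyTable (col1, col2, col3, col4);

EXPLAIN
SELECT MyTable.col3, MyTable.col4
FROM MyTable
INNER JOIN MyOtherTable
   ON MyTable.col1 = MyOtherTable.col1
  AND MyTable.col2 = MyOtherTable.col2;
```

If the index covers the query, the Extra column for the MyTable row should say "Using index".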
I don't think anyone else mentioned it, so I'm adding that you should have a compound (col1,col2) index on both tables:
CREATE INDEX MyIdx ON MyTable (col1,col2);
CREATE INDEX MyOtherIdx ON MyOtherTable (col1,col2);
And another point. An index on (col1,col2,col3,col4) will be helpful if you ever need to use a DISTINCT variation of your query:
Select DISTINCT
MyTable.col3, MyTable.col4
From MyTable
Inner Join MyOtherTable
On MyTable.col1 = MyOtherTable.col1
And MyTable.col2 = MyOtherTable.col2;