I am trying to do an update between two very large tables (millions of records):
Update TableA as T1
Inner Join TableB as T2
On T1.Field1=T2.Field1
And T1.Value >= T2.MinValue
And T1.Value <= T2.MaxValue
Set T1.Field2=T2.Field2
I have indexes on all the fields individually in both tables, Value fields included. They are all Normal/BTree (default).
Is there some better way to index the value fields to get better performance? The range comparison adds a huge overhead to what is otherwise (using just =) a pretty fast update. My first test took 3 minutes to update 25 records, and I have 5 million to update.
Generally, only one index per table can be used for a particular WHERE clause. So if it uses the index on Field1, it won't be able to use the index on Value, and vice versa. The solution is to use a composite index:
ALTER TABLE TableA ADD INDEX (Field1, Value);
Since an index on multiple columns is effectively an index on any prefix set of the columns, you can remove the individual index on Field1 when you add this index. But you should still keep the individual index on Value if that's needed for other queries.
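As a rough sketch of the suggestion above, the statements below add the composite index and then check the plan. The index names (idx_a_field1_value, idx_b_field1_range) and the extra index on TableB are assumptions for illustration and do not come from the original post. Which table the optimizer probes depends on your data, so treat the TableB index as something to try rather than a guaranteed win.

```sql
-- Composite index from the answer; the index name is a hypothetical choice.
ALTER TABLE TableA ADD INDEX idx_a_field1_value (Field1, Value);

-- Assumption: if the optimizer drives the join from TableA and probes TableB,
-- an index leading with Field1 followed by the range bounds may help instead.
ALTER TABLE TableB ADD INDEX idx_b_field1_range (Field1, MinValue, MaxValue);

-- Check which indexes the join would use before running the real update.
-- This SELECT uses the same join conditions as the UPDATE in the question.
EXPLAIN
SELECT T1.Field1, T2.Field2
FROM TableA AS T1
INNER JOIN TableB AS T2
    ON  T1.Field1 = T2.Field1
    AND T1.Value BETWEEN T2.MinValue AND T2.MaxValue;
```

In the EXPLAIN output, the key column shows which index was chosen for each table, so you can confirm whether either composite index is actually picked up.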
I have a table table1 and the columns are like this:
id - int
field1 - varchar(10)
field2 - varchar(10)
totalMarks - int
Also, I have created an index using the column field1.
create index myindex1 on table1 (field1);
I need to update the table entries, either using field1 or field2.
UPDATE table1 SET totalMarks = 1000 WHERE field1='somevalue';
or
UPDATE table1 SET totalMarks = 1000 WHERE field2='somevalue';
Which update query will perform better? Since we have created an index on field1, will the update perform well if we use field1 in the WHERE clause?
A simple update statement with a where clause with an equality comparison on an indexed column should use the index for the where clause.
This will not always improve performance, but it usually will on larger tables. On very small tables where the data all fits on a single data page, the engine needs to load both the index and the data page into memory, which can actually be a wee bit slower than just looking for the row on a given page. This is an edge case.
I would recommend using the version with the indexed column.
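One way to check this on your own data is to compare the plans for the two lookups. This is only a sketch: it uses SELECT statements with the same WHERE clauses as the updates, and myindex2 is a hypothetical index name that the question did not create.

```sql
-- Lookup on the indexed column; the plan would normally list myindex1 as the key.
EXPLAIN SELECT id FROM table1 WHERE field1 = 'somevalue';

-- Lookup on the unindexed column; without an index this usually scans the table.
EXPLAIN SELECT id FROM table1 WHERE field2 = 'somevalue';

-- Assumption: if updates by field2 are also frequent, a second index is an option.
-- myindex2 is a made-up name for illustration.
CREATE INDEX myindex2 ON table1 (field2);
```

Keep in mind that every extra index adds some cost to writes, so the second index is only worth it if lookups by field2 are common.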
so I have written a query as follows:
UPDATE
table1 latest, table2 previous
SET latest.col1 = previous.col1
WHERE latest.col2 = previous.col2 and previous.col1 is not null;
which copies the value of col1 from table2 to table1 wherever the value of col2 matches. However, due to the context, there can be no primary/foreign key constraints involved, and col2 doesn't contain nulls but col1 does (in both tables).
However, this query takes several minutes to execute! Is there a way to speed it up?
Fixed by adding indexes to both tables. I created indexes on the common column on which the tables were being joined. Wherever lookups are performed, whether through joins or otherwise, the columns being joined or looked up should have indexes.
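A minimal sketch of that fix, assuming col2 is the join column as in the query above. The index names are hypothetical, and the UPDATE is the same statement rewritten with an explicit join.

```sql
-- Index the join column on both sides; index names are made up for illustration.
CREATE INDEX idx_table1_col2 ON table1 (col2);
CREATE INDEX idx_table2_col2 ON table2 (col2);

-- Same update as in the question, written with an explicit INNER JOIN.
UPDATE table1 AS latest
INNER JOIN table2 AS previous
    ON latest.col2 = previous.col2
SET latest.col1 = previous.col1
WHERE previous.col1 IS NOT NULL;
```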
I have the following tables (example)
t1 (20.000 rows, 60 columns, primary key t1_id)
t2 (40.000 rows, 8 columns, primary key t2_id)
t3 (50.000 rows, 3 columns, primary key t3_id)
t4 (30.000 rows, 4 columns, primary key t4_id)
sql query:
SELECT COUNT(*) AS count FROM (t1)
JOIN t2 ON t1.t2_id = t2.t2_id
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id
I have created indexes on columns that affect the join (e.g on t1.t2_id) and foreign keys where necessary. The query is slow (600 ms) and if I put where clauses (e.g. WHERE t1.column10 = 1, where column10 doesn't have index), the query becomes much slower. The queries I do with select (*) and LIMIT are fast, and I can't understand count behaviour. Any solution?
EDIT: EXPLAIN SQL ADDED
id  select_type  table  type   possible_keys  key      key_len  ref       rows  Extra
1   SIMPLE       t4     index  PRIMARY        user_id  4        NULL      5259  Using index
1   SIMPLE       t2     ref    PRIMARY,t4_id  t4_id    4        t4.t4_id  1     Using index
1   SIMPLE       t1     ref    t2_id          t2_id    4        t2.t2_id  1     Using index
1   SIMPLE       t3     ref    PRIMARY        PRIMARY  4        t2.t2_id  1     Using index
where user_id is a column of t4 table
EDIT: I changed from InnoDB to MyISAM and had a speed increase, especially if I put WHERE clauses. But I still have times of 100-150 ms. The reason I want the count in my application is to show the user who is filling in a search form, via AJAX, the number of results to expect. Maybe there is a better solution for this, for example creating a temporary table that is updated every hour?
The count query is faster simply because of the index-only scan, as stated in the query plan ("Using index"). The query you mention involves only indexed columns, so during execution there is no need to touch the physical row data; the whole query is answered from the indexes. When you add a clause on columns that are not indexed, or are indexed in a way that prevents index usage, the engine has to fetch the rows themselves from the table, which is much slower.
EDIT:
Another important thing is that those are PKs, so they are UNIQUE. The optimizer chooses to perform an INDEX RANGE SCAN on the first index and only checks whether the keys exist in the subsequent indexes (that's why the plan states that only one row will be returned).
EDIT2:
Thanks to J. Bruni: in fact that is a clustered index, so the above isn't the "whole truth". There is probably a full scan on the first table, and three subsequent INDEX ACCESSes to confirm the FK existence.
COUNT iterates over the whole result set, so indexes alone do not make it instant. Use EXPLAIN (or EXPLAIN ANALYZE on MySQL 8.0.18 and later) on your query to check how it is executed.
SELECT with LIMIT does not iterate over the whole result set, hence it's faster.
Regarding the COUNT(*) slow performance: are you using InnoDB engine? See:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/
"SELECT COUNT(*)" is slow, even with where clause
The main information seems to be: "InnoDB uses clustered primary keys, so the primary key is stored along with the row in the data pages, not in separate index pages."
So, one possible solution is to create a separate index and force its usage through the USE INDEX hint in the SQL query. Look at this comment for a sample usage report:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/comment-page-1/#comment-529049
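As a sketch of what such a hint looks like on a single table: t2_id here is the key name reported for t1 in the EXPLAIN output from the question, so it is assumed to be an existing secondary index. Substitute the name of your own index.

```sql
-- Ask the optimizer to count through a secondary index instead of the
-- clustered primary key. t2_id is the key name taken from the EXPLAIN output.
SELECT COUNT(*) AS count
FROM t1 USE INDEX (t2_id);

-- Confirm that the hint was honoured: the key column should show t2_id.
EXPLAIN
SELECT COUNT(*) AS count
FROM t1 USE INDEX (t2_id);
```

USE INDEX is only a hint; the optimizer may still choose a table scan if it estimates that to be cheaper.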
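Regarding the WHERE issue, you can try putting the condition in the JOIN clause, like this (for inner joins the optimizer normally treats an ON condition and a WHERE condition the same way, so compare both versions with EXPLAIN):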
SELECT COUNT(t1.t1_id) AS count FROM (t1)
JOIN t2 ON (t1.column10 = 1) AND (t1.t2_id = t2.t2_id)
JOIN t3 ON t2.t3_id = t3.t3_id
JOIN t4 ON t3.t4_id = t4.t4_id
Had a quick question about joins in mysql and indexing. If I had 2 tables:
table1
id
name
table2
table1id
table2title
And I join table2 and table1 using id and table1id, would I add an index to id on table1 and table1id in table2? Or would I just add an index to one of the tables? I'm using MySQL 5.x with MyISAM.
If table1.id is a primary key, then you don't need to index it. (Primary keys are automatically indexed)
If not, then you'll need to index table1.id and table2.table1id
Use "EXPLAIN" on your selects to see which indexes you are hitting.
Yes, add an index on both id columns (as another poster said, primary columns are indexed). Indices allow MySQL to quickly locate the row in a data file, instead of reading sequentially.
If you need rows from both tables given an id, index both tables for optimal performance. Else, the initial clause (SELECT...WHERE) will run quickly, and the JOIN will be slow (or vice versa), resulting in a slow query.
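A short sketch of the above, assuming table1.id is already the primary key (and therefore indexed). The index name idx_table2_table1id is a hypothetical choice.

```sql
-- table1.id is assumed to be the PRIMARY KEY, so only table2 needs a new index.
ALTER TABLE table2 ADD INDEX idx_table2_table1id (table1id);

-- Verify that both sides of the join are resolved through an index.
EXPLAIN
SELECT t1.name, t2.table2title
FROM table1 AS t1
JOIN table2 AS t2 ON t2.table1id = t1.id;
```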
Let's say I have a simple many-to-many table between tables "table1" and "table2" that consists of two int fields: "table1-id" and "table2-id". How should I index this linking table?
I used to just make a composite primary index (table1-id,table2-id), but I read that this index might not work if you change order of the fields in the query. So what's the optimal solution then - make independent indexes for each field without a primary index?
Thanks.
It depends on how you search.
If you search like this:
/* Given a value from table1, find all related values from table2 */
SELECT *
FROM table1 t1
JOIN table_table tt ON (tt.table_1 = t1.id)
JOIN table2 t2 ON (t2.id = tt.table_2)
WHERE t1.id = @id
then you need:
ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 PRIMARY KEY (table_1, table_2)
In this case, table1 will be leading in NESTED LOOPS and your index will be usable only when table_1 is the first column in the index.
If you search like this:
/* Given a value from table2, find all related values from table1 */
SELECT *
FROM table2 t2
JOIN table_table tt ON (tt.table_2 = t2.id)
JOIN table1 t1 ON (t1.id = tt.table_1)
WHERE t2.id = @id
then you need:
ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 PRIMARY KEY (table_2, table_1)
for the reasons above.
You don't need independent indices here. A composite index can be used everywhere where a plain index on the first column can be used. If you use independent indices, you won't be able to search efficiently for both values:
/* Check if relationship exists between two given values */
SELECT 1
FROM table_table
WHERE table_1 = @id1
AND table_2 = @id2
For a query like this, you'll need at least one index that covers both columns.
It's never bad to have an additional index for the second field:
ALTER TABLE table_table ADD CONSTRAINT pk_table1_table2 PRIMARY KEY (table_1, table_2)
CREATE INDEX ix_table2 ON table_table (table_2)
Primary key will be used for searches on both values and for searches based on value of table_1, additional index will be used for searches based on value of table_2.
As long as you are specifying both keys in the query, it doesn't matter what order they have in the query, nor does it matter what order you specify them in the index.
However, it's not unlikely that you will sometimes have only one or the other of the keys. If you sometimes have id_1 only, then that should be the first (but you still only need one index).
If you sometimes have one, sometimes the other, sometimes both, you'll need one index with both keys, and a second (non-unique) index with one field - the more selective of the two keys - and the primary composite index should start with the other key.
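Putting the pieces above together, a linking table that is searched by either key or by both might be declared as follows. This is a sketch using the table and index names from the earlier answer (table_table, ix_table2); the literal ids in the queries are placeholders, and the column order assumes table_2 is the key you want the extra single-column index on.

```sql
-- Fresh definition of the linking table with both indexes declared up front.
CREATE TABLE table_table (
    table_1 INT NOT NULL,
    table_2 INT NOT NULL,
    PRIMARY KEY (table_1, table_2),  -- serves lookups by table_1 and by both keys
    KEY ix_table2 (table_2)          -- serves lookups by table_2 alone
);

-- Lookup by the leading column of the primary key.
EXPLAIN SELECT table_2 FROM table_table WHERE table_1 = 1;

-- Lookup by the second column, which needs the extra index.
EXPLAIN SELECT table_1 FROM table_table WHERE table_2 = 1;

-- Existence check on both keys, which the composite primary key can answer.
EXPLAIN SELECT 1 FROM table_table WHERE table_1 = 1 AND table_2 = 2;
```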
@Quassnoi, in your first query you're actually using only the tt.table_1 key, as we can see from the WHERE clause: WHERE t1.id = @id. And in the second query, only tt.table_2.
So the multi-column index could be useful only in the third query, because of WHERE table_1 = @id1 AND table_2 = @id2. If queries of this kind are not going to be used, do you think it's worth using two separate one-column indices instead?