I have a table table1 and the columns are like this:
id - int
field1 - varchar(10)
field2 - varchar(10)
totalMarks - int
Also, I have created an index using the column field1.
create index myindex1 on table1 (field1);
I need to update the table entries, either using field1 or field2.
UPDATE table1 SET totalMarks = 1000 WHERE field1='somevalue';
or
UPDATE table1 SET totalMarks = 1000 WHERE field2='somevalue';
Which update query will perform better? Since we have created an index on field1, will the query be faster if we use field1 in the WHERE clause?
A simple update statement with a where clause with an equality comparison on an indexed column should use the index for the where clause.
This will not always improve performance, but it usually will on larger tables. On very small tables where the data all fits on a single data page, the engine needs to load both the index and the data page into memory, which can actually be a wee bit slower than just looking for the row on a given page. This is an edge case.
I would recommend using the version with the indexed column.
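You can confirm this yourself with EXPLAIN. A quick sketch using the table and index names from the question (the output columns mentioned in the comments will vary by MySQL version):

```sql
-- Indexed column: EXPLAIN should show myindex1 in the "key" column,
-- meaning MySQL seeks directly to the matching rows.
EXPLAIN UPDATE table1 SET totalMarks = 1000 WHERE field1 = 'somevalue';

-- Unindexed column: "key" will be NULL and "type" will be ALL,
-- meaning a full table scan.
EXPLAIN UPDATE table1 SET totalMarks = 1000 WHERE field2 = 'somevalue';
```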
Related
We have a table having around 10 million records and we are trying to update some columns using the id(primary key) in the where clause.
UPDATE table_name SET column1=1, column2=0,column3='2022-10-30' WHERE id IN(1,2,3,4,5,6,7,......etc);
Scenario 1: when there are 3000 or fewer ids in the IN clause and if I try for EXPLAIN, then the 'possible_keys' and 'key' show the PRIMARY, and the query gets executed very fast.
Scenario 2: when there are more than 3000 ids (up to 30K) in the IN clause and I try EXPLAIN, 'possible_keys' shows NULL and 'key' shows PRIMARY, and the query runs forever. If I use FORCE INDEX(PRIMARY), then 'possible_keys' and 'key' both show PRIMARY and the query executes very fast.
Scenario 3: when there are more than 30K ids in the IN clause, then even if I use FORCE INDEX(PRIMARY), 'possible_keys' shows NULL and 'key' shows PRIMARY, and the query runs forever.
I believe the optimizer is going for a full table scan instead of an index scan. Can we make any change such that the optimizer goes for an index scan instead of a table scan? Please suggest if there are any parameter changes required to overcome this issue.
The MySQL version is 5.7
As far as I know, you just need to provide an ad-hoc table with all the ids and join table_name to it:
update (select 1 id union select 2 union select 3) ids
join table_name using (id) set column1=1, column2=0, column3='2022-10-30';
In MySQL 8 you can use a VALUES table constructor, which is a little more terse (omit ROW for MariaDB, e.g. values (1),(2),(3)):
update (select null id where 0 union all values row(1),row(2),row(3)) ids
join table_name using (id) set column1=1, column2=0, column3='2022-10-30';
When UPDATEing a significant chunk of a table with all the same update values, I see a red flag.
Do you always update the same set of rows? Could that info be in a smaller separate table that you JOIN to?
Or maybe some other structural schema change that focuses on helping the Updates be faster?
If you must have a long IN list, I suggest doing 100 at a time. And don't try to COMMIT all 3000+ in the same transaction. (Committing in chunks may violate some business logic, so you may not want to do that.)
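A sketch of the chunking idea, assuming the ids arrive as a client-side list (the batch size of 100 and the id values are illustrative):

```sql
-- Instead of one statement with 30K ids, issue a series of small ones,
-- each committed separately, e.g. 100 ids per batch:
START TRANSACTION;
UPDATE table_name SET column1=1, column2=0, column3='2022-10-30'
 WHERE id IN (1, 2, 3 /* ... up to 100 ids ... */);
COMMIT;

START TRANSACTION;
UPDATE table_name SET column1=1, column2=0, column3='2022-10-30'
 WHERE id IN (101, 102, 103 /* ... next 100 ids ... */);
COMMIT;
```

Each batch stays well inside the range the optimizer handles cleanly (per Scenario 1 above) and keeps transactions short.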
I am trying to do an update between two very large tables (millions of records)
Update TableA as T1
Inner Join TableB as T2
On T1.Field1=T2.Field1
And T1.Value >= T2.MinValue
And T1.Value <= T2.MaxValue
Set T1.Field2=T2.Field2
I have indexes on all the fields individually in both tables, Value fields included. They are all Normal/BTree (default).
Is there some better way to index the value fields to get better performance? The range conditions are adding a huge overhead to what is otherwise (using just =) a pretty fast update. My first test took 3 minutes to update 25 records, and I have 5 million to update.
Generally, only one index per table can be used for a particular WHERE clause. So if it uses the index on Field1, it won't be able to use the index on Value, and vice versa. The solution is to use a composite index:
ALTER TABLE TableA ADD INDEX (Field1, Value);
Since an index on multiple columns is effectively an index on any prefix set of the columns, you can remove the individual index on Field1 when you add this index. But you should still keep the individual index on Value if that's needed for other queries.
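To illustrate the prefix rule, here is a sketch of which queries the composite index can and cannot serve (column names taken from the question):

```sql
-- Uses (Field1, Value): equality on the first column, range on the second.
SELECT * FROM TableA WHERE Field1 = 'abc' AND Value >= 10 AND Value <= 20;

-- Also uses (Field1, Value), via the (Field1) prefix alone.
SELECT * FROM TableA WHERE Field1 = 'abc';

-- Cannot use (Field1, Value), because Field1 is absent; keep the
-- single-column index on Value for queries like this one.
SELECT * FROM TableA WHERE Value >= 10;
```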
I have a MySQL table of the form
CREATE TABLE `myTable` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`fieldA` int(11) NOT NULL,
`fieldB` int(11) NOT NULL,
....
)
The table will have around 500,000,000 rows, with the remaining fields being floats.
The queries I will be using will be of the form:
SELECT * FROM myTable
WHERE fieldA= AND fieldB= AND timestamp>'' and timestamp<=''
ORDER BY timestamp;
At the moment I have two indices: a primary key on id, and a unique key on timestamp,fieldA,fieldB (hashed). At the moment, a select query like the above takes around 6 minutes on a reasonably powerful desktop PC.
What would the optimal index to apply? Does the ordering of the 3 fields in the key matter, and should I be using a binary tree instead of hashed? Is there a conflict between my primary key and the second index? Or do I have the best performance I can expect for such a large db without more serious hardware?
Thanks!
For that particular query, adding a composite index on fieldA and fieldB would probably be optimal. The order of the columns in the index does matter.
Index Order
For MySQL to even consider using a particular index for a query, the first column of the index must appear in the query. For example:
alter table mytable add index a_b_index(a, b);
select * from mytable where a = 1 and b = 2;
The above query should use the index a_b_index. Now take this next example:
alter table mytable add index a_b_index(a, b);
select * from mytable where b = 2;
This query will not use the index: the index starts with a, but a never appears in the WHERE clause, so MySQL cannot use it.
Comparison
MySQL can use a B-tree index for range comparisons (<, >, BETWEEN) as well as for equality. However, within a composite index, once a column is compared with a range, the columns after it in the index can no longer be used for filtering.
LIKE
MySQL does use indexes with LIKE, but only when the wildcard % is at the end of the pattern, like this:
select * from mytable where cola like 'hello%';
Whereas these will not use an index:
select * from mytable where cola like '%hello';
select * from mytable where cola like '%hello%';
Hashed indexes are not used for ranges. They are used for equality comparisons only. Therefore, a hashed index cannot be used for the range portion of your query.
Since you have a range in your query, you should use a standard b-tree index. Ensure that fielda and fieldb are the first columns in the index, then timestamp. MySQL cannot utilize the index for searches beyond the first range.
Consider a multi-column index on (fielda, fieldb, timestamp).
The index should also be able to satisfy the ORDER BY.
To improve the query further, select only those three columns or consider a larger "covering" index.
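A sketch of the suggestion (the index name is made up; backticks because timestamp is a keyword):

```sql
-- B-tree index: the equality columns first, the range column last.
-- If the SELECT list is limited to fieldA, fieldB and `timestamp`
-- (plus id, which InnoDB secondary indexes carry implicitly), this
-- index is also "covering": the query never touches the table rows.
ALTER TABLE myTable ADD INDEX fa_fb_ts (fieldA, fieldB, `timestamp`);
```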
Assume I have this table:
create table table_a (
id int,
name varchar(25),
address varchar(25),
primary key (id)
) engine = innodb;
When I run this query:
select * from table_a where id >= 'x' and name = 'test';
How will MySQL process it? Will it pull all the id's first (assume 1000 rows) then apply the where clause name = 'test'?
Or while it looks for the ids, it is already applying the where clause at the same time?
Since id is the PK (and there is no index on name), MySQL will load all rows that satisfy the id criterion and then filter that result set by the name criterion. Adding a composite index containing both fields would mean it reads only the records that satisfy both criteria. Adding a separate single-column index on name may not result in an index-merge operation, in which case that index would have no effect.
Do you have indexes on either column? That may affect the execution plan. The other thing is that you might cast 'x' to an integer to ensure a numeric comparison instead of a string comparison.
For the best result, you should have a single index which includes both of the columns id and name.
In your case, I can't say what effect the primary index has on that query; that depends on the DBMS and version. If you really don't want to add more indexes (because more indexes mean slower writes and updates), just populate your table with around 10,000,000 random rows, try it, and see the effect.
You can compare the execution times by executing the query first with id first in the WHERE clause, and then interchanging so name comes first. To see an example of MySQL performance with indexes, check this out: http://www.mysqlperformanceblog.com/2006/06/02/indexes-in-mysql/
You can get information on how the query is processed by running EXPLAIN on the query.
If the idea is to optimize that query then you might want to add an index like:
alter table table_a add unique index name_id_idx (name, id);
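A sketch of checking the plan before and after adding that index (placeholder values; EXPLAIN output varies by MySQL version):

```sql
EXPLAIN SELECT * FROM table_a WHERE id >= 1 AND name = 'test';
-- Without an index on name: "key" is PRIMARY, and MySQL range-scans
-- the PK while filtering on name row by row.

ALTER TABLE table_a ADD UNIQUE INDEX name_id_idx (name, id);

EXPLAIN SELECT * FROM table_a WHERE id >= 1 AND name = 'test';
-- With the composite index: "key" should now be name_id_idx, seeking
-- on name = 'test' and range-scanning id within it.
```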
Should I include col3 & col4 in my index on MyTable if this is the only query I intend to run on my database?
Select MyTable.col3, MyTable.col4
From MyTable
Inner Join MyOtherTable
On MyTable.col1 = MyOtherTable.col1
And MyTable.col2 = MyOtherTable.col2;
The tables I'm using have about half a million rows in them. For the purposes of my question, col1 & col2 are a unique set found in both tables.
Here's the example table definition if you really need to know:
CREATE TABLE MyTable
(col1 varchar(10), col2 varchar(10), col3 varchar(10), col4 varchar(10));
CREATE TABLE MyOtherTable
(col1 varchar(10), col2 varchar(10));
So, should it be this?
CREATE INDEX MyIdx ON MyTable (col1,col2);
Or this?
CREATE INDEX MyIdx ON MyTable (col1,col2,col3,col4);
Adding columns col3 and col4 will not help, because you're just pulling those values after finding the rows via columns col1 and col2. The speed would normally come from making sure columns col1 and col2 are indexed.
You should actually split those indexes since you're not using them together:
CREATE INDEX MyIdx1 ON MyTable (col1);
CREATE INDEX MyIdx2 ON MyTable (col2);
I don't think a combined index will help you in this case.
CORRECTION: I think I've misspoken, since you intend to use only that query on the two tables and never have the individual columns joined in isolation. In your case it appears you could get some speed up by putting them together. It would be interesting to benchmark this to see just how much of a speedup you'd see on 1/2 million rows using a combined index versus individual ones. (You should still not use columns col3 and col4 in the index, since you're not joining anything by them.)
A query returning half a million rows joined from two tables is never going to be very fast - because it's returning half a million rows.
An index on col1,col2 seems sufficient (as a secondary index), but depending on what other columns you have, adding (col3,col4) might make it a covering index.
In InnoDB it might be worth making the primary key (col1,col2); the table will then be clustered on it, which is something of a win.
But once again, if your query joins 500,000 rows with no other WHERE clause and returns 500,000 rows, it's not going to be fast, because it needs to fetch all of those rows to return them.
I don't think anyone else mentioned it, so I'm adding that you should have a compound (col1,col2) index on both tables:
CREATE INDEX MyIdx ON MyTable (col1,col2);
CREATE INDEX MyOtherIdx ON MyOtherTable (col1,col2);
And another point. An index on (col1,col2,col3,col4) will be helpful if you ever need to use a DISTINCT variation of your query:
Select DISTINCT
MyTable.col3, MyTable.col4
From MyTable
Inner Join MyOtherTable
On MyTable.col1 = MyOtherTable.col1
And MyTable.col2 = MyOtherTable.col2;