Why does an UPDATE statement always do an index range scan? - mysql

Consider this UPDATE statement:
UPDATE `messages` force index (primary)
SET `isDeleted`=1
WHERE `messages`.`id` = '069737b6-726d-4f5b-a5b9-0510acdd7a92';
Here's the EXPLAIN graph for it:
Why does this simple query use an index range scan instead of a single-row fetch, or at least a unique key lookup? Notice that I use FORCE INDEX, and exactly the same query written as a SELECT statement results in a "Single Row (constant)" scan.
The same happens if I add LIMIT 1.
I'm using MySQL 5.6.46.
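For reference, a minimal sketch of how the two plans can be compared side by side (same table and id as above; EXPLAIN on UPDATE statements is available in MySQL 5.6.3 and later):
EXPLAIN SELECT * FROM `messages`
WHERE `messages`.`id` = '069737b6-726d-4f5b-a5b9-0510acdd7a92';

EXPLAIN UPDATE `messages`
SET `isDeleted` = 1
WHERE `messages`.`id` = '069737b6-726d-4f5b-a5b9-0510acdd7a92';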

MySQL ignores index hints in UPDATE statements.
Therefore there's no way to deterministically set the scan method for an UPDATE query.
I guess I have to rely on MySQL's heuristics to decide which scan method is faster based on table size, etc. Not ideal, because I no longer know what the performance profile of that query will be, but I hope it will at least be an "Index Range Scan" and nothing worse...
Reference: How to force mysql UPDATE query to use index? How to enable mysql engine to automatically use the index instead of forcing it?
https://dba.stackexchange.com/a/153323/146991

The index hint is a red herring. I think it is because of internal differences between SELECT and UPDATE, especially when it comes to planning the query.
Suggest you file a bug.
I think it is not really doing a "range". You can get some confidence in this by doing:
FLUSH STATUS;
UPDATE ... ;
SHOW SESSION STATUS LIKE 'Handler%';
(I have checked a variety of versions; nothing hints that more than 1 row is being hit other than the dubious "range".)


MySQL indexing has no speed effect through PHP but does on PhpMyAdmin

I am trying to speed up a simple SELECT query on a table that has around 2 million entries, in a MariaDB MySQL database. It took over 1.5s until I created an index for the columns that I need, and running it through PhpMyAdmin showed a significant boost in speed (now takes around 0.09s).
The problem is, when I run it through my PHP server (mysqli), the execution time does not change at all. I'm logging my execution time by running microtime() before and after the query, and it takes ~1.5s to run it, regardless of having the index or not (tried removing/readding it to see the difference).
Query example:
SELECT `pair`, `price`, `time` FROM `live_prices`
FORCE INDEX (pairPriceTime)
WHERE `time` = '2022-08-07 03:01:59';
Index created:
ALTER TABLE `live_prices` ADD INDEX pairPriceTime (pair, price, time);
Any thoughts on this? Does PHP PDO ignore indexes? Do I need to restart the server in order for it to "acknowledge" that there is a new index? (Which is a problem since I'm using a shared hosting service...)
If that is really the query, then it needs an INDEX starting with the column tested in the WHERE:
INDEX(time)
Or, to make a "covering index":
INDEX(time, pair, price)
However, I suspect that most of your accesses involve pair? If so, then other queries may need
INDEX(pair, time)
especially if you ask for a range of times.
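A minimal sketch of those suggested index definitions, assuming the live_prices table from the question (the index names are arbitrary):
-- Covering index for the WHERE `time` = ... query:
ALTER TABLE `live_prices` ADD INDEX timePairPrice (time, pair, price);

-- If most queries filter on `pair`, possibly with a time range:
ALTER TABLE `live_prices` ADD INDEX pairTime (pair, time);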
To discuss various options further, please provide EXPLAIN SELECT ...
PDO, mysqli, phpmyadmin -- These all work the same way. (A possible exception deals with an implicit LIMIT on phpmyadmin.)
Try hard to avoid the use of FORCE INDEX -- what helps on today's query and dataset may hurt on tomorrow's.
When you see puzzling anomalies in timings, run the query twice. Caching may be the explanation.
The MySQL documentation says:
The FORCE INDEX hint acts like USE INDEX (index_list), with the addition that a table scan is assumed to be very expensive. In other words, a table scan is used only if there is no way to use one of the named indexes to find rows in the table.
The MariaDB documentation on FORCE INDEX says this:
FORCE INDEX works by only considering the given indexes (like with USE_INDEX) but in addition, it tells the optimizer to regard a table scan as something very expensive. However, if none of the 'forced' indexes can be used, then a table scan will be used anyway.
Use of the index is not mandatory. Since you have only specified one condition (the time), it can choose to use some other index for the fetch. I would suggest that you add another condition to the WHERE clause of the select, or add an ORDER BY:
ORDER BY pair, price, time
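For illustration, a sketch of the query from the question with that ORDER BY appended:
SELECT `pair`, `price`, `time`
FROM `live_prices`
WHERE `time` = '2022-08-07 03:01:59'
ORDER BY pair, price, time;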
I ended up creating another index (just for the time column) and it did the trick, running at ~0.002s now. Setting the LIMIT clause had no effect since I was always getting 423 rows (for 423 coin pairs).
Bottom line, I probably needed a more specific index, although the weird part is that the first index worked great on PMA but not through PHP, but the second one now applies to both approaches.
Thank you all for the kind replies :)
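For completeness, a sketch of what the single-column index mentioned above presumably looked like (the index name is a guess):
ALTER TABLE `live_prices` ADD INDEX timeOnly (time);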

What would happen to CRUD when you index every column of a table in a SQL database?

If I have a table in a SQL database and put an index on every column of it, what would happen to CRUD?
I think the CREATE statement will definitely be slower and READ will be faster. But I don't know about UPDATE and DELETE.
On one hand, since there are WHERE clauses in UPDATE and DELETE statements, I guess that part will be faster. But since these two operations also modify other columns, I guess that part will be slower. Which effect counts more, and what's the final impact on UPDATE and DELETE?
DELETE will definitely be slower, because every deleted row requires removing the corresponding entry from each index. Of course, that is offset by any speed-up the WHERE clause gets from an index.
UPDATE might be slower or faster. Filtering might be faster, depending on the WHERE clause. On the other hand, every column being modified would need index updates.
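As a toy sketch of the scenario (hypothetical table and index names): every secondary index below has to be maintained on each INSERT, each DELETE, and each UPDATE that touches the corresponding column.
CREATE TABLE t (
  id INT PRIMARY KEY,
  a  INT,
  b  VARCHAR(50),
  c  DATETIME,
  INDEX idx_a (a),   -- each index can speed up some WHERE clauses,
  INDEX idx_b (b),   -- but must be updated on every write that
  INDEX idx_c (c)    -- modifies the corresponding column
);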

In MySQL, why do unused indexes affect the query plan?

I've seen this several times but I could be misinterpreting the EXPLAIN query plan.
Suppose I have a table(col1, col2).
I want to join it with another table on both col1 and col2.
So I create an index(col1, col2).
Sometimes, the EXPLAIN shows that the index is not being used. Perhaps some other inefficient index is used or none at all.
But if I create another index(col1), then the first index(col1, col2) is used.
Has anyone ever had this happen to them before? Do you have any idea why this might happen?
My theory is that the unused index provides some more accurate statistics about the table that hints to the query plan to use the first index. But I'm not familiar enough with the inner workings of mysql to know if this is true or how to prove it.
The MySQL documentation for ALTER TABLE states that it may be necessary to run ANALYZE TABLE afterwards to refresh the index cardinality, which I believe is a factor in the behaviour you're seeing. Also, the query optimiser usually handles empty (or nearly empty) tables quite differently from populated tables, and it will often do a full table scan instead of using an index when there are only a few rows. For my own development at $work I can't rely on the EXPLAIN output of my dev database because of that.
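A sketch of how to refresh and inspect the statistics and then re-check the plan (t1 and t2 are hypothetical names standing in for the two tables in the question):
ANALYZE TABLE t1;     -- refresh index cardinality estimates
SHOW INDEX FROM t1;   -- check the Cardinality column for the (col1, col2) index
EXPLAIN
SELECT * FROM t1 JOIN t2 ON t2.col1 = t1.col1 AND t2.col2 = t1.col2;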

Improve this update query

So I've been struggling to run this query. It takes a really long time.
It's MySQL InnoDB. The fields I am using are indexed. It's on a pretty beefy server with around 10 GB allocated to the InnoDB buffer pool.
UPDATE TEMP_account_product p
JOIN products_temp c ON (c.`some_id` = p.`old_someid`)
SET p.`product` = c.id
WHERE p.product IS NULL;
The thing to note here is that both tables contain around 900,000 rows, and the WHERE p.product IS NULL condition matches around 800,000 of them.
I have a feeling I'm kinda screwed here but thought Id try anyway.
I think the possible reasons for slow execution of this type of request are:
Most probable: you have an index (or indexes) on the updated field and the request is updating a lot of rows, so MySQL has to do a lot of work maintaining those indexes during the UPDATE. In that case just DROP the index(es) before the request and recreate them later (if needed); see the sketch after this list.
The JOIN is slow (you can check this with a SELECT using the same JOIN), i.e. the join is done without indexes. Add indexes in that case.
Slow filtering in the WHERE clause (i.e. MySQL does a full scan to filter); you can check how fast it is with a SELECT using the same filter.
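A sketch of the first point, assuming a hypothetical index named idx_product on the updated column:
ALTER TABLE TEMP_account_product DROP INDEX idx_product;  -- hypothetical index name

UPDATE TEMP_account_product p
JOIN products_temp c ON (c.`some_id` = p.`old_someid`)
SET p.`product` = c.id
WHERE p.product IS NULL;

ALTER TABLE TEMP_account_product ADD INDEX idx_product (product);  -- recreate if still needed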
I suggest running it in batches, so that you don't need to rely on the query plan to decide not to bring the entire result set into memory before it starts doing the updates. Add something like LIMIT 1000 to the query, and then run it until the number of affected rows is zero (the technique depends on your environment, but I think it could be done in SQL).
Update: this is not a valid option (as-is).
Sure enough, I overlooked this in the UPDATE docs:
For the multiple-table syntax ... In this case, ORDER BY and LIMIT cannot be used.
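One possible workaround, as a sketch (it assumes old_someid identifies the rows well enough to batch on): put the LIMIT inside a derived table, which is typically allowed in a multi-table UPDATE, and repeat until no rows are affected.
UPDATE TEMP_account_product p
JOIN (
    SELECT p2.`old_someid`
    FROM TEMP_account_product p2
    WHERE p2.product IS NULL
    LIMIT 1000
) batch ON batch.`old_someid` = p.`old_someid`
JOIN products_temp c ON (c.`some_id` = p.`old_someid`)
SET p.`product` = c.id
WHERE p.product IS NULL;
-- Re-run until the affected-row count drops to zero.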

Update versus Select

I am using MySQL as the database. I need to update some data. However, the data may not have changed, in which case I don't need to update the row.
I wanted to know which one will be better (performance-wise):
a) Search the table to determine whether the data has changed. For example, I can search by the primary key and then see whether the values of the remaining fields have changed. If yes, continue with the UPDATE statement; if not, skip it.
b) Use the UPDATE query directly. If there is no change in the data, MySQL will notice this and not actually perform the update.
So which one will perform better in such a case?
From the MySQL manual:
If you set a column to the value it currently has, MySQL notices this
and does not update it.
So save yourself the latency and leave that task to MySQL. It will even tell you how many rows were actually affected.
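A sketch of that behaviour with a hypothetical table: the affected-row count is 0 when the stored value already matches.
UPDATE accounts SET status = 'active' WHERE id = 42;
SELECT ROW_COUNT();  -- 1 if the value changed, 0 if it was already 'active'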
The first option seems better to me, but only in a specific scenario.
Select all or some rows from the table and get them into a result set.
Traverse the result set (in-memory traversal is fast enough), pick up the primary keys of the records you want to update, and then execute the update queries.
It seems like a comparatively efficient solution to me, compared to executing an update query on each row regardless of whether it is needed.
If you use the select-then-update approach, you need to lock the row (e.g. SELECT ... FOR UPDATE), otherwise you are in a race condition - the row can be changed after you selected it and checked that it hasn't been changed.
As @AndreKR pointed out, MySQL won't perform any write operation if the values are the same, so using UPDATE directly is faster than using 2 queries.
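If the select-then-update approach is used anyway, a sketch of the locking pattern (hypothetical table and columns):
START TRANSACTION;
SELECT status FROM accounts WHERE id = 42 FOR UPDATE;  -- locks the row until COMMIT
-- compare in the application; issue the UPDATE only if the value differs
UPDATE accounts SET status = 'active' WHERE id = 42;
COMMIT;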