Here is a sentence:
MySQL can apply partition pruning to SELECT, DELETE, and UPDATE statements. INSERT statements currently cannot be pruned.
So when a new row is inserted, MySQL cannot determine which partition it belongs to? That sounds very strange. Is it a mistake, or what do they mean by this phrase?
As I read it, the partition-pruning optimization currently relies on parsing of the WHERE clause to determine which partitions to access. The INSERT statement has no WHERE clause, and the optimizer currently has no other mechanism by which to prune.
It looks as though partition pruning is, at present, something of a work-in-progress.
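To make the distinction concrete, here is a minimal sketch (the table, columns, and partition names are hypothetical) showing a statement the optimizer can prune versus one it cannot:

CREATE TABLE orders (
    id INT NOT NULL,
    created_year INT NOT NULL
)
PARTITION BY RANGE (created_year) (
    PARTITION p2021 VALUES LESS THAN (2022),
    PARTITION p2022 VALUES LESS THAN (2023),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- The WHERE clause lets the optimizer read only partition p2022
-- (EXPLAIN PARTITIONS is the 5.x syntax; later versions show the
-- partitions column by default):
EXPLAIN PARTITIONS
SELECT * FROM orders WHERE created_year = 2022;

-- An INSERT has no WHERE clause, so there is nothing for the pruner
-- to work with; the row is still routed to the correct partition on write:
INSERT INTO orders (id, created_year) VALUES (1, 2022);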
Related
Since partitioning also splits the table into sub-tables, I wanted to know whether there is any way to index a partitioned table one partition at a time, based on the partition name or id. I am asking this because my table can have 1 billion+ rows and an add-index query takes long hours or days, so I wanted to check whether I can start adding the index on the partition that I think is most important first, or vice versa.
No, MySQL has no syntax to support creating indexes on a partitioned table one partition at a time. The index will be added to all partitions in one ALTER TABLE or CREATE INDEX statement.
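For illustration, either form below (hypothetical table and column names) builds the index across every partition in a single operation:

ALTER TABLE big_partitioned_table ADD INDEX idx_created_at (created_at);
-- or equivalently:
CREATE INDEX idx_created_at ON big_partitioned_table (created_at);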
At my company, we execute schema changes using pt-online-schema-change, a script that allows clients to continue reading and writing the table while the alter is running. It might even take longer to run the schema change, but since it doesn't block clients, this doesn't cause a problem.
The script is part of the Percona Toolkit, which is a free, open-source collection of tools written in Perl and Bash.
Consider this UPDATE statement:
UPDATE `messages` force index (primary)
SET `isDeleted`=1
WHERE `messages`.`id` = '069737b6-726d-4f5b-a5b9-0510acdd7a92';
Here's the explain graph for it:
Why does this simple query use an index range scan instead of a single-row fetch, or at least a unique key lookup? Notice that I use FORCE INDEX, and exactly the same query written as a SELECT statement results in a "Single Row (constant)" scan.
The same also happens if I add LIMIT 1.
I'm using MySQL 5.6.46.
MySQL ignores index hints in UPDATE statements.
Therefore there's no way to deterministically set the scan method for an UPDATE query.
I guess I have to rely on MySQL's heuristics to decide which scan method is faster based on table size, etc. Not ideal, because I no longer know what the performance profile of that query will be, but I hope it will at least be an "Index Range Scan" and nothing worse...
Reference: How to force mysql UPDATE query to use index? How to enable mysql engine to automatically use the index instead of forcing it?
https://dba.stackexchange.com/a/153323/146991
The index hint is a red herring. I think it is because of internal differences between SELECT and UPDATE, especially when it comes to planning the query.
Suggest you file a bug.
I think it is not really doing a "range". You can get some confidence in this by doing:
FLUSH STATUS;
UPDATE ... ;
SHOW SESSION STATUS LIKE 'Handler%';
(I have checked a variety of versions; nothing hints that more than 1 row is being hit other than the dubious "range".)
I have to issue about 1M SQL queries of the following form:
update table1 ta join table2 tr on ta.tr_id=tr.id
set start_date=null, end_date=null
where title_id='X' and territory_id='AG' and code='FREE';
The SQL statements are in a text document -- I can only copy/paste them in as-is.
What would be the fastest way to do this? Are there some checks that I can disable so it only applies them at the end? For example something like:
start transaction;
copy/paste all sql statements here;
commit;
I tried the above approach but saw zero speed improvement on the inserts. Are there any other things I can try?
The performance cost is partly attributed to running 1M separate SQL statements, but it's also attributed to the cost of rewriting rows and the corresponding indexes.
What I mean is, there are several steps to executing an SQL statement, and each of them takes a non-zero amount of time:
Start a transaction.
Parse the SQL, validate the syntax, check your privileges to make sure you have permission to update those tables, etc.
Change the values you updated in the row.
Change the values you updated in each index on that table that contain the columns you changed.
Commit the transaction.
In autocommit mode, the start and commit of a transaction happen implicitly for every SQL statement, so that causes the maximum overhead. Using an explicit START TRANSACTION and COMMIT as you showed reduces that overhead by doing each only once.
Caveat: I don't usually run 1M updates in a single transaction. That causes other types of overhead, because MySQL needs to keep the original rows in case you ROLLBACK. As a compromise, I would execute maybe 1000 updates, then commit and start a new transaction. That at least reduces the START/COMMIT overhead by 99.9%.
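A minimal sketch of that compromise, reusing the statement from your question (the 1,000-statement grouping is just the example size mentioned above):

START TRANSACTION;
-- ... roughly 1,000 of the UPDATE statements, e.g.:
UPDATE table1 ta JOIN table2 tr ON ta.tr_id = tr.id
SET start_date = NULL, end_date = NULL
WHERE title_id = 'X' AND territory_id = 'AG' AND code = 'FREE';
COMMIT;

START TRANSACTION;
-- ... the next 1,000 statements ...
COMMIT;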
In any case, the overhead of transactions isn't great. It might be unnoticeable compared to the cost of updating indexes.
MyISAM tables have an option to DISABLE KEYS, which means it doesn't have to update non-unique indexes during the transaction. But this might not be a good optimization for you, because (a) you might need indexes to be active, to help performance of lookups in your WHERE clause and the joins; and (b) it doesn't work in InnoDB, which is the default storage engine, and it's a better idea to use InnoDB.
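For completeness, the MyISAM-only form looks like this (hypothetical table name):

ALTER TABLE my_myisam_table DISABLE KEYS;
-- ... run the bulk of the updates ...
ALTER TABLE my_myisam_table ENABLE KEYS;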
You could also review if you have too many indexes or redundant indexes on your table. There's no sense having extra indexes you don't need, which only add cost to your updates.
There's also a possibility that you don't have enough indexes, and your UPDATE is slow because it's doing a table-scan for every statement. The table-scans might be so expensive that you'd be better off creating the needed indexes to optimize the lookups. You should use EXPLAIN to see if your UPDATE statement is well-optimized.
If you want me to review that, please run SHOW CREATE TABLE <tablename> for each of your tables in your update, and run EXPLAIN UPDATE ... for your example SQL statement. Add the output to your question above (please don't paste in a comment).
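Concretely, using the table names from your example statement:

SHOW CREATE TABLE table1;
SHOW CREATE TABLE table2;

EXPLAIN UPDATE table1 ta JOIN table2 tr ON ta.tr_id = tr.id
SET start_date = NULL, end_date = NULL
WHERE title_id = 'X' AND territory_id = 'AG' AND code = 'FREE';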
The question is about legacy SQL code for a MySQL database.
It is known that when doing an INSERT ... ON DUPLICATE KEY UPDATE statement, the VALUES(col_name) function can be used to refer to column values from the INSERT portion instead of passing the exact values there again:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE b=VALUES(b), c=VALUES(c)
My legacy code contains a lot of huge inserts in parameterized style (they are used in batch inserts):
INSERT INTO table (a,b,c, <...dozens of params...>) VALUES (?,?,?,<...dozens of values...>)
ON DUPLICATE KEY UPDATE b=?, c=?, <...dozens of params...>
The question is: would it increase the performance of batch inserts if I change all these queries to use the VALUES(col_name) function (in the UPDATE portion)?
My queries are executed from Java code using the JDBC driver. So, what I guess is that for long text values it should significantly reduce the size of the queries. What about MySQL itself? Would it really, in general, give me a speed increase?
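In other words, the rewritten form I have in mind would look roughly like this (column lists abbreviated exactly as above), so that the UPDATE portion no longer needs its own set of bound parameters:

INSERT INTO table (a, b, c, <...dozens of params...>) VALUES (?, ?, ?, <...dozens of values...>)
ON DUPLICATE KEY UPDATE b = VALUES(b), c = VALUES(c), <...dozens of columns...>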
Batched inserts may run 10 times as fast as inserting one row at a time. The reason for this is all the network and other per-statement overhead.
Another technique is to change from a single batched IODKU into two statements -- one to insert the new rows, one to do the updates. (I don't know if that will run any faster.) Here is a discussion of the two steps, in the context of "normalization".
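A rough sketch of that two-statement approach, assuming the incoming rows are first staged in a hypothetical table named batch and that column a is the unique key:

-- Step 1: insert only the rows whose key does not exist yet.
INSERT IGNORE INTO target (a, b, c)
SELECT a, b, c FROM batch;

-- Step 2: copy the non-key columns for every staged row
-- (rows just inserted simply get rewritten with the same values).
UPDATE target t
JOIN batch s ON s.a = t.a
SET t.b = s.b, t.c = s.c;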
Another thing to note: if there is an AUTO_INCREMENT involved (not as one of the columns mentioned), then IODKU may "burn" ids for the cases where it does an 'update'. That is, IODKU (and INSERT IGNORE and a few others) grabs all the auto_inc values it might need, then uses the ones it does need and wastes the others.
You get into "diminishing returns" if you try to insert more than a few hundred rows in a batch. And you stress the rollback log.
I'm just composing a complex update query which looks more or less like this:
update `table` join
(select y, min(x) as MinX
from `table`
group by y) as t1
using (y)
set x = x - MinX
This means that the column x is updated based on the subquery, which also reads column x - but couldn't this x already have been modified by the running UPDATE command? Isn't this a problem? I mean, in normal programming you normally have to handle this explicitly, i.e. store the new value somewhere separate from the old value and, after the job is done, replace the old value with the new one... but how will an SQL database do this?
I'm not interested in a single observation or experiment. I would like to have a snippet from the docs or the SQL standard that says what the defined behaviour is in this case. I'm using MySQL, but answers valid also for PostgreSQL, Oracle, etc., and especially for the SQL standard in general, are appreciated. Thanks!
** Edited **
Selecting from the target table
From 13.2.9.8. Subqueries in the FROM Clause:
Subqueries in the FROM clause can return a scalar, column, row, or table. Subqueries in the FROM clause cannot be correlated subqueries, unless used within the ON clause of a JOIN operation.
So, yes, you can perform the above query.
The problem
There are really two problems here. There's concurrency, or ensuring that no one else changes the data out from under our feet. This is handled with locking. Dealing with the actual modification of new versus old values is handled with derived tables.
Locking
In the case of your query above, with InnoDB, MySQL performs the SELECT first, and acquires a read (shared) lock on each row in the table individually. If you had a WHERE clause in the SELECT statement, then only the records you select would be locked, where ranges would cause any gaps to be locked as well.
A read lock prevents any other query from acquiring write locks, so records can't be updated from elsewhere while they're read locked.
Then, MySQL acquires a write (exclusive) lock on each of the records in the table individually. If you had a WHERE clause in your UPDATE statement, then only the specific records would be write locked, and again, if the WHERE clause selected a range, then you would have a range locked.
Any record that had a read lock from the previous SELECT would automatically be escalated to a write lock.
A write lock prevents other queries from obtaining either a read or write lock.
You can see this with Innotop by running it in Lock mode: start a transaction, execute the query (but don't commit it), and you will see the locks in Innotop. You can also view the details without Innotop with SHOW ENGINE INNODB STATUS.
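For example, something along these lines (leaving the first session uncommitted so the locks stay visible):

-- Session 1: acquire the locks but do not commit yet.
START TRANSACTION;
UPDATE `table`
JOIN (SELECT y, MIN(x) AS MinX FROM `table` GROUP BY y) AS t1 USING (y)
SET x = x - MinX;

-- Session 2: inspect the lock information for the open transaction.
SHOW ENGINE INNODB STATUS;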
Deadlocks
Your query is vulnerable to a deadlock if two instances were run at the same time. If query A got read locks, then query B got read locks, query A would have to wait for query B's read locks to release before it could acquire the write locks. However, query B isn't going to release the read locks until after it finishes, and it won't finish unless it can acquire write locks. Query A and query B are in a stalemate, and hence, a deadlock.
Therefore, you may wish to perform an explicit table lock, both to avoid the massive amount of record locks (which uses memory and affects performance), and to avoid a deadlock.
An alternative approach is to use SELECT ... FOR UPDATE on your inner SELECT. This starts out with write locks on all of the rows instead of starting with read and escalating them.
Derived tables
For the inner SELECT, MySQL creates a derived temporary table. A derived table is an actual non-indexed copy of the data that lives in the temporary table that is automatically created by MySQL (as opposed to a temporary table that you explicitly create and can add indexes to).
Since MySQL uses a derived table, that's the temporary old value that you refer to in your question. In other words, there's no magic here. MySQL does it just like you'd do it anywhere else, with a temporary value.
You can see the derived table by doing an EXPLAIN against your UPDATE statement (supported in MySQL 5.6+).
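For example (look for a select_type of DERIVED in the output, which marks the materialized subquery):

EXPLAIN UPDATE `table`
JOIN (SELECT y, MIN(x) AS MinX FROM `table` GROUP BY y) AS t1 USING (y)
SET x = x - MinX;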
A proper RDBMS uses statement-level read consistency, which ensures the statement sees (selects) the data as it was at the time the statement began. So the scenario you are afraid of won't occur.
Regards,
Rob.
Oracle has this in the 11.2 Documentation
A consistent result set is provided for every query, guaranteeing data consistency, with no action by the user. An implicit query, such as a query implied by a WHERE clause in an UPDATE statement, is guaranteed a consistent set of results. However, each statement in an implicit query does not see the changes made by the DML statement itself, but sees the data as it existed before changes were made.
Although it's been noted that you SHOULDN'T be able to do an update to a table based on its own data, you should be able to adjust the MySQL syntax to allow for it via
update Table1,
(select T2.y, MIN( T2.x ) as MinX from Table1 T2 group by T2.y ) PreQuery
set Table1.x = Table1.x - PreQuery.MinX
where Table1.y = PreQuery.y
I don't know if the syntax goes a different route using JOIN vs the comma-list version, but the complete prequery would have to be applied first, its result computed ONCE, and then joined (via the WHERE) to actually perform the update.