I have two tables related by a key. I want to find the best way to delete a row from tbl_one together with the rows in tbl_two that reference it. I used DELETE ... JOIN and it works correctly, but I found a much simpler way that uses two separate DELETE statements. Could you tell me which is better?
First method:
DELETE tbl_one, tbl_two
FROM tbl_one
JOIN tbl_two ON tbl_one.id = tbl_two.tbl_one_id
WHERE tbl_one.id = 1;
Second method:
DELETE FROM tbl_one WHERE id = 1;
DELETE FROM tbl_two WHERE tbl_one_id = 1;
The main concern is that the operation should be atomic (either both deletes happen or neither does), so you should put the statements inside a transaction block.
In my view, the first query works better simply because the server reaches the end state with a single statement rather than parsing and executing two.
If foreign key constraints get in the way, turn off the foreign_key_checks variable for your session (SET foreign_key_checks = 0), run the query, and turn it back on afterwards.
NB: You can also make use of the cascading foreign key behavior (ON DELETE CASCADE) that MySQL provides.
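A minimal sketch of the cascading approach, using Python's built-in sqlite3 as a stand-in for MySQL (on InnoDB the equivalent is a FOREIGN KEY (tbl_one_id) REFERENCES tbl_one (id) ON DELETE CASCADE clause). The helper names make_cascade_db and delete_parent are hypothetical; the table names come from the question. Once the constraint exists, a single DELETE on the parent is enough:

```python
import sqlite3


def make_cascade_db():
    """Hypothetical tbl_one / tbl_two schema with ON DELETE CASCADE."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is enabled on the connection
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE tbl_one (id INTEGER PRIMARY KEY);
        CREATE TABLE tbl_two (
            id INTEGER PRIMARY KEY,
            tbl_one_id INTEGER NOT NULL
                REFERENCES tbl_one (id) ON DELETE CASCADE
        );
        INSERT INTO tbl_one (id) VALUES (1), (2);
        INSERT INTO tbl_two (tbl_one_id) VALUES (1), (1), (2);
        """
    )
    return conn


def delete_parent(conn, parent_id):
    """One DELETE on the parent; the database removes the dependent child rows."""
    with conn:  # commit on success, roll back on error
        conn.execute("DELETE FROM tbl_one WHERE id = ?", (parent_id,))
```

After delete_parent(conn, 1), only the tbl_two row that belongs to parent 2 remains, and no second DELETE statement is needed.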
It does not matter whether you use a single statement or multiple statements to alter database content, as long as you are using transactions. Without transactions, two issues might arise:
another process accessing the data in between your statements may query an "unclean" state of the database, because only some of the statements have been processed. This can always happen in a system where more than one client can use the database at the same time, for example web applications and the like.
a subsequent query might fail, for whatever reason. In that case only some of your statements have been processed and the others have not. That leaves your database in an "undefined" state again, and this time the situation is persistent. You would have to guard against this manually with error detection, but even then it might simply not be possible to fix the issue.
Relational database management systems offer transactions for exactly this. Transactions let you "bundle" several statements into a single unit from a logical point of view. You start a transaction, run your statements, then commit the transaction. If something unexpected occurs, you can always "roll back" the transaction, which returns the database to the stable, clean state it was in before the transaction started.
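To make the all-or-nothing behavior concrete, here is a small sketch using Python's sqlite3 module in place of MySQL (the principle is the same for InnoDB tables). The helpers make_demo_db, delete_with_children and row_counts are hypothetical, and the fail_between flag simulates the "subsequent query might fail" case from the answer:

```python
import sqlite3


def make_demo_db():
    """Hypothetical schema mirroring tbl_one / tbl_two from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tbl_one (id INTEGER PRIMARY KEY);
        CREATE TABLE tbl_two (id INTEGER PRIMARY KEY, tbl_one_id INTEGER NOT NULL);
        INSERT INTO tbl_one (id) VALUES (1);
        INSERT INTO tbl_two (tbl_one_id) VALUES (1), (1);
        """
    )
    return conn


def delete_with_children(conn, parent_id, fail_between=False):
    """Run both DELETEs as one transaction; return True if it committed."""
    try:
        with conn:  # COMMIT on normal exit, ROLLBACK if an exception escapes
            conn.execute("DELETE FROM tbl_two WHERE tbl_one_id = ?", (parent_id,))
            if fail_between:
                # Simulate a failure after the first statement has already run
                raise RuntimeError("simulated failure after the first DELETE")
            conn.execute("DELETE FROM tbl_one WHERE id = ?", (parent_id,))
    except RuntimeError:
        return False
    return True


def row_counts(conn):
    """Return (rows in tbl_one, rows in tbl_two)."""
    one = conn.execute("SELECT COUNT(*) FROM tbl_one").fetchone()[0]
    two = conn.execute("SELECT COUNT(*) FROM tbl_two").fetchone()[0]
    return one, two
```

When the simulated failure fires, the rollback undoes the first DELETE as well, so both tables keep all their rows; on the normal path both deletes become visible together.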
Related
I'm using DBAL in my project because, for an already-written project that I'm converting to Symfony v2.8 and MySQL, it is easier to convert the existing database statements with it than to go with full-on Doctrine. Now, however, I need to implement "read-only row locks" to prevent other users from changing data while a pair of tightly coupled but separate SELECT statements are executed consecutively, and I'm thinking that I should use transactions and SELECT FOR UPDATE statements. However, I don't see any support for SELECT FOR UPDATE statements in DBAL's documentation. I do see that transactions are supported, but as I understand it, they won't prevent other users from UPDATE-ing or DELETE-ing the data in the same row that the SELECT statements are using.
Specifically, the first SELECT retrieves one row, and the second SELECT uses data from that row to retrieve multiple rows from the same tables. Both SELECTs are somewhat complex; I don't know whether I could combine them into one super-sized SELECT, and I don't really want to, as that would make it harder to maintain in the future.
The problem is that other users could be updating the values retrieved by the first SELECT, and if this happens between the two SELECTs, it would break the second SELECT: it would either return no data or at least return the wrong data.
I believe that I need to use a SELECT FOR UPDATE to lock the row it retrieves, temporarily preventing other users from updating or deleting the single row retrieved by the first SELECT of the pair. But since I'm not actually performing an update, just two SELECTs, how do I release the lock on that one row without performing a "fake" update, say by UPDATE-ing a column with the value it already has?
Thanks
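For the transaction in which you want repeatable results:
START TRANSACTION READ ONLY;
SELECT ...;
{some processing}
SELECT {that covers the same rows}; -- will return the same result
COMMIT;
Note: READ ONLY is optional. This relies on InnoDB's default isolation level, REPEATABLE READ: plain (non-locking) SELECTs inside one transaction all read from the snapshot established by the first read, so no row locks are taken and there is nothing to release other than ending the transaction with COMMIT.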
Experiment by running two mysql client connections and observe the results. The other connection can modify or insert rows matching the first SELECT's criteria, and the first transaction won't see those changes.
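The same two-connection experiment can be sketched in Python with SQLite, assuming SQLite's WAL mode as a stand-in for InnoDB: in WAL mode a read transaction keeps seeing the snapshot taken at its first read, which is comparable to (not identical with) InnoDB's consistent reads under REPEATABLE READ. The table t and the function snapshot_demo are hypothetical:

```python
import os
import sqlite3
import tempfile


def snapshot_demo():
    """Return what a reader sees before, during and after a concurrent committed write."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: no implicit BEGIN, we issue BEGIN/COMMIT ourselves
        writer = sqlite3.connect(path, isolation_level=None)
        # WAL mode lets one connection commit while another keeps reading an older snapshot
        writer.execute("PRAGMA journal_mode=WAL")
        writer.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
        writer.execute("INSERT INTO t (id, val) VALUES (1, 10)")

        reader = sqlite3.connect(path, isolation_level=None)
        reader.execute("BEGIN")
        # The snapshot is established by the first read inside the transaction
        first = reader.execute("SELECT val FROM t WHERE id = 1").fetchone()[0]

        # Another "user" changes the row and commits between the two SELECTs
        writer.execute("UPDATE t SET val = 20 WHERE id = 1")

        # Same transaction, same snapshot: the committed change is not visible yet
        second = reader.execute("SELECT val FROM t WHERE id = 1").fetchone()[0]
        reader.execute("COMMIT")

        # A fresh read after COMMIT sees the latest committed data
        after = reader.execute("SELECT val FROM t WHERE id = 1").fetchone()[0]

        reader.close()
        writer.close()
    return first, second, after
```

The second SELECT inside the open transaction still returns the old value even though the other connection has already committed, and no lock had to be released to get there.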
I'm using MySQL 5.6. Let's say we have the following two tables:
Every DataSet has a huge number of child DataEntry records: 10,000, 100,000 or more. DataSet.md5sum and DataSet.version are updated in the same transaction whenever its child DataEntry records are inserted or deleted. DataSet.md5sum is calculated over the content values of all of its child DataEntry records.
Under these conditions, what's the most efficient way to fetch consistent data from those two tables?
If I issue the following two distinct SELECTs, I think I might get inconsistent data due to concurrent INSERT / UPDATEs:
SELECT md5sum, version FROM DataSet WHERE dataset_id = 1000;
SELECT dataentry_id, content FROM DataEntry WHERE dataset_id = 1000; -- the result of this query may be inconsistent with the md5sum fetched by the previous query
I think I can get consistent data with one query as follows:
SELECT e.dataentry_id, e.content, s.md5sum, s.version
FROM DataSet s
INNER JOIN DataEntry e ON (s.dataset_id = e.dataset_id)
WHERE s.dataset_id = 1000
But it produces a redundant result set in which the same md5sum is duplicated 10,000 or 100,000 times, so I guess it's not efficient (EDIT: my concerns are high network bandwidth and memory consumption).
I think using a pessimistic read/write lock (SELECT ... LOCK IN SHARE MODE / FOR UPDATE) would be another option, but it seems like overkill. Are there any better approaches?
The join ensures that the data returned is not affected by any updates that could occur between two separate SELECTs, since everything is read by a single query.
When you say that md5sum and version are updated, do you mean the child table has a trigger on it for inserts and updates?
When you join the tables, you will get a "duplicate md5sum and version" because you are pulling the matching record for each item in the DataEntry table. It is perfectly fine and isn't going to be an efficiency issue. The alternative would be to use the two individual selects, but depending upon the frequency of inserts/updates, without a transaction, you run the very slight risk of getting data that may be slightly off.
I would just go with the join. You can run EXPLAIN on your query from within MySQL to see how it is executed, and compare the two approaches based on your data, your indexes, etc.
Perhaps it would be more beneficial to copy these groups of records into a staging table of sorts. Before processing, you could call a pre-processor function that takes a "snapshot" of the data about to be processed by putting a copy into a staging table. Then you could select the version and md5sum alone, and then all of the records, as two separate SELECTs. Since these are copied into a separate staging table, you won't have to worry about concurrent updates corrupting your processing session. You could set up timed jobs to do this, or make it an on-demand call. Again, though, you would need to research the best approach given your hardware/network setup and any job scheduling software you have available.
Use this pattern:
START TRANSACTION;
SELECT ... FOR UPDATE; -- this locks the row
...
UPDATE ...
COMMIT;
(and check for errors after every statement, including COMMIT.)
"100000" is not "huge", but "BIGINT" is. I recommend INT UNSIGNED instead.
For an MD5, make sure you are not using utf8: CHAR(32) CHARACTER SET ascii. This goes for any other hex strings.
Or, use BINARY(16) for half the space. Then use UNHEX(md5...) when inserting, and HEX(...) when fetching.
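To see where the "half the space" figure comes from, here is a small Python sketch using hashlib (an illustration, not MySQL itself): an MD5 digest is 16 raw bytes, and its hex form is 32 ASCII characters. The helper names are hypothetical; bytes.fromhex and bytes.hex play the roles of UNHEX() and HEX(), and MySQL's HEX() returns uppercase hex digits:

```python
import hashlib


def md5_storage_forms(content: bytes):
    """Return the MD5 both as 16 raw bytes and as 32 hex characters."""
    digest = hashlib.md5(content)
    raw = digest.digest()        # 16 bytes: fits BINARY(16)
    hexed = digest.hexdigest()   # 32 ASCII hex chars: fits CHAR(32) CHARACTER SET ascii
    return raw, hexed


def unhex(hexed: str) -> bytes:
    """Python analogue of UNHEX(): hex string -> raw bytes."""
    return bytes.fromhex(hexed)


def to_hex(raw: bytes) -> str:
    """Python analogue of HEX(): raw bytes -> uppercase hex string."""
    return raw.hex().upper()
```

Round-tripping through unhex and to_hex loses nothing, so storing the 16-byte form and converting at the edges is safe.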
You are concerned about bandwidth, etc. Please describe your client (PHP? Java? ...). Please explain how much (100K rows?) needs to be fetched to re-do the MD5.
Note that MySQL has an MD5() function. If each of your items had an MD5, you could take the MD5 of the concatenation of those, and do it entirely in the server, with no bandwidth needed. (Be sure to increase group_concat_max_len.)
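A sketch of the "MD5 of the concatenated per-item MD5s" idea, written in Python so it can be checked outside the server. On the MySQL side, a query along the lines of SELECT MD5(GROUP_CONCAT(MD5(content) ORDER BY dataentry_id SEPARATOR '')) FROM DataEntry WHERE dataset_id = 1000 would compute the same kind of value, assuming the content is hashed as UTF-8 bytes. The (dataentry_id, content) input shape and the name dataset_md5 are assumptions. Each per-item hash is 32 characters, so 100,000 rows need group_concat_max_len of at least 3,200,000:

```python
import hashlib


def dataset_md5(entries):
    """MD5 over the concatenated per-entry MD5s, ordered by dataentry_id.

    entries: iterable of (dataentry_id, content) pairs (hypothetical shape).
    """
    per_entry = [
        hashlib.md5(content.encode("utf-8")).hexdigest()
        for _, content in sorted(entries, key=lambda e: e[0])
    ]
    # Each hex digest is 32 ASCII characters; hash their concatenation
    return hashlib.md5("".join(per_entry).encode("ascii")).hexdigest()
```

Ordering by dataentry_id matters: without a fixed order, the same set of rows could produce different checksums.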
So, we are trying to run a report that goes to screen and does not change any stored data.
However, it is complex, so needs to go through a couple of (TEMPORARY*) tables.
It pulls data from live tables, which are replicated.
The nasty bit comes when we take the "eligible" records from temp_PreCalc and populate them from the live data to create the next (TEMPORARY*) table output, resulting in effectively:
INSERT INTO temp_PostCalc (...)
SELECT ...
FROM temp_PreCalc
JOIN live_Tab1 ON ...
JOIN live_Tab2 ON ...
JOIN live_Tab3 ON ...
The report is not a "definitive" answer; the expectation is that it is merely a "snapshot" report and will be out of date as soon as it appears on screen.
There is no order or reproducibility issue.
So ideally, I would turn my TRANSACTION ISOLATION LEVEL down to READ COMMITTED...
However, I can't, because live_Tab1, 2 and 3 are replicated with binlog_format = STATEMENT...
The statement itself is lovely and quick: it takes hardly any time to run, so the resource load is now lower than with the old approach (which did separate SELECTs and INSERTs). But, as I understand it, it waits because the SELECT has to wait for locks on the live_Tab tables that keep the result repeatable, so that it can be replicated safely.
In fact it now takes more time because of that wait.
I'd like to SEE that performance benefit in response time!
Except the data is written to (TEMPORARY*) tables and then thrown away.
There are no live_ table destinations - only sources...
*These tables are actually not TEMPORARY tables but dynamically created and discarded InnoDB tables, as the report calculation requires a self-join and deletes... but they are temporary.
I now seem to be going around in circles finding an answer.
I don't have SUPER privilege and don't want it...
So I can't SET sql_log_bin = 0 for this session (why is this a requirement?)
So...
If I have a scratch database or table wildcard that excludes all my temp_ "temporary" tables from replication...
(I am waiting for this change to go through at my hosting centre.)
Will MySQL allow me to
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
INSERT INTO temp_PostCalc (...)
SELECT ...
FROM temp_PreCalc
JOIN live_Tab1 ON ...
JOIN live_Tab2 ON ...
JOIN live_Tab3 ON ...
;
Or will I still get my
"Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging..."
even though it's not technically true?
I expect it to, as I presume that replication will kick in simply because it sees the INSERT statement and will do a simple check on whether any of the tables involved is replication-eligible, even though none of the destinations actually are...
Or will it pleasantly surprise me?
I really can't face using an unpleasant solution like
SELECT ... INTO OUTFILE
LOAD DATA INFILE
In fact, I don't think I could even use that: how would I get unique filenames? How would I clean them up?
The reports are run on demand directly by end users, and I only have MySQL interface access to the server.
Or streaming the data through the PHP client, just to separate the INSERT from the SELECT so that MySQL doesn't get upset about which tables are replication-eligible...
So the only way appears to be:
We create a second schema, "ScratchTemp"...
Set the dreaded replication option --replicate-ignore-db=ScratchTemp
My "local" query code opens a new MySQL connection and performs USE ScratchTemp;
Because the default database is the ignored one, none of my queries will be replicated.
So I need to take great care not to perform ANY real queries here.
Reference both my scratch tables and the actual data tables by their schema-qualified names in all my queries...
e.g.
INSERT INTO LiveSchema.temp_PostCalc (...) SELECT ... FROM LiveSchema.temp_PreCalc JOIN LiveSchema.live_Tab1 etc etc as above.
And then close this connection just as soon as I can, as it is frankly dangerous to have a non-replicated connection open....
Sigh...?
If two independent scripts call a database with update requests to the same field, but with different values, would they execute at the same time and one overwrite the other?
As an example, imagine both of these statements being requested to run at the same time, each by a different script, where the Status = 2 update is issued microseconds after the Status = 1 update by coincidence.
UPDATE My_Table SET Status = 1 WHERE Status = 0;
UPDATE My_Table SET Status = 2 WHERE Status = 0;
What would my results be, and why? If other factors play a role, expand on them as much as you like; this is meant to be a general question.
Side Note:
Because I know people will still ask: my situation is MySQL with Google App Engine, but I don't want to limit this question to my case in case it is useful to others. I am using Status as an identifier for which script is working on the row; if Status is not 0, no other script is allowed to touch it.
This is what locking is for. All major SQL implementations lock DML statements by default so that one query won't overwrite another before the first is complete.
There are different levels of locking. If you've got row locking then your second update will run in parallel with the first, so at some point you'll have 1s and 2s in your table.
Table locking would force the second query to wait for the first query to completely finish and release its table lock.
You can usually turn off locking right in your SQL, but it's only ever done if you need a performance boost and you know you won't encounter race conditions like in your example.
Edits based on the new MySQL tag
If you're updating a table that uses the InnoDB engine, then you're working with row locking, and your query could yield a table with both 1s and 2s.
If you're working with a table that uses the MyISAM engine, then you're working with table locking, and your update statements would end up with a table that would either have all 1s or all 2s.
from https://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html (MySQL)
Normally, you do not need to lock tables, because all single UPDATE statements are atomic; no other session can interfere with any other currently executing SQL statement. However, there are a few cases when locking tables may provide an advantage:
from https://msdn.microsoft.com/en-us/library/ms177523.aspx (sql server)
An UPDATE statement always acquires an exclusive (X) lock on the table it modifies, and holds that lock until the transaction completes. With an exclusive lock, no other transactions can modify data.
If two separate connections were executing the two posted UPDATE statements, whichever statement started first would be the one that completed. The other statement would not update the data, as there would no longer be any records with a status of 0.
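A sketch of how each script can find out whether it actually won, assuming Python's sqlite3 as a stand-in: check the number of rows the UPDATE affected (in MySQL, ROW_COUNT() or the client library's affected-rows value plays this role). SQLite serializes writers with a database-wide lock, so this only reproduces the serialized outcome described above, not InnoDB's row-lock interleavings. The helpers make_table and claim_unclaimed are hypothetical:

```python
import sqlite3


def make_table(n_rows=3):
    """Hypothetical My_Table with n_rows unclaimed (Status = 0) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE My_Table (id INTEGER PRIMARY KEY, Status INTEGER NOT NULL)")
    with conn:
        for _ in range(n_rows):
            conn.execute("INSERT INTO My_Table (Status) VALUES (0)")
    return conn


def claim_unclaimed(conn, script_status):
    """Mark every Status = 0 row with this script's status; return how many it got."""
    with conn:
        cur = conn.execute(
            "UPDATE My_Table SET Status = ? WHERE Status = 0", (script_status,)
        )
    # rowcount tells the script whether it actually claimed anything
    return cur.rowcount
```

If the Status = 1 script commits first, it reports 3 claimed rows and the Status = 2 script reports 0, because its WHERE Status = 0 no longer matches anything.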
The short answer is: it depends on which statement commits first. Just because one process started an update statement before another doesn't mean that it will complete before another. It might not get scheduled first, it might be blocked by another process, etc.
Ultimately, it's a race condition: the operation that completes (and commits) last, wins.
Since you have TWO scripts doing the same thing with different values in the UPDATE, they will NOT run at the same time: one of the scripts will run before the other, even if you think you are calling them at the same time. You need to specify WHEN each script should run; otherwise the program will not know what should be 1 and what should be 2.
I am experiencing what appears to be the effects of a race condition in an application I am involved with. The situation is as follows: generally, a page responsible for some heavy application logic follows this format:
Select from test and determine if there are rows already matching a clause.
If a matching row already exists, we terminate here, otherwise we proceed with the application logic
Insert into the test table with values that will match our initial select.
Normally, this works fine and limits the action to a single execution. However, under high load and user-abuse where many requests are intentionally sent simultaneously, MySQL allows many instances of the application logic to run, bypassing the restriction from the select clause.
It seems to actually run something like:
select from test
select from test
select from test
(all of which pass the check)
insert into test
insert into test
insert into test
I believe this is done for efficiency reasons, but it has serious ramifications in the context of my application. I have tried GET_LOCK() and RELEASE_LOCK(), but this does not seem to suffice under high load, as the race condition still appears to be present. Transactions are also not an option, as the application logic is very heavy and the tables involved are not transaction-capable.
To anyone familiar with this behavior, is it possible to turn this type of handling off so that MySQL always processes queries in the order in which they are received? Is there another way to make such queries atomic? Any help with this matter would be appreciated, I can't find much documented about this behavior.
The problem here is that you have, as you surmised, a race condition.
The SELECT and the INSERT need to be one atomic unit.
The way you do this is via transactions. You cannot safely make the SELECT, return to PHP, and assume the SELECT's results will reflect the database state when you make the INSERT.
If well-designed transactions (the correct solution) are, as you say, not possible (and I still strongly recommend them), you're going to have to make the final INSERT atomically check whether its assumptions are still true, for example via a conditional INSERT that only inserts when no matching row exists, a stored procedure, or catching the INSERT's error in the application. If they aren't, it will abort back to your PHP code, which must start the logic over.
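A minimal sketch of a conditional INSERT that does its own existence check in a single statement, using Python's sqlite3 as a stand-in for MySQL; the columns user_id and action and the helper names are hypothetical. The exact SQL differs in MySQL (older versions, for instance, need FROM DUAL for a table-less SELECT with a WHERE clause), so treat the statement as illustrative:

```python
import sqlite3


def make_test_table():
    """Hypothetical 'test' table; no UNIQUE index is needed for this variant."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (user_id INTEGER, action TEXT)")
    return conn


def insert_if_absent(conn, user_id, action):
    """Insert the row only if no matching row exists; return True if it was inserted."""
    with conn:
        cur = conn.execute(
            """
            INSERT INTO test (user_id, action)
            SELECT ?, ?
            WHERE NOT EXISTS (
                SELECT 1 FROM test WHERE user_id = ? AND action = ?
            )
            """,
            (user_id, action, user_id, action),
        )
    # 1 row affected: we won; 0 rows: someone else already inserted it
    return cur.rowcount == 1
```

Because the check and the insert are one statement, there is no window between the SELECT and the INSERT for another request to slip into; a return value of False tells the PHP side to start its logic over.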
By the way, MySQL most likely is executing requests in the order they were received. With multiple simultaneous connections it is possible to receive SELECT A, SELECT B, INSERT A, INSERT B. Thus, the only "solution" would be to allow only one connection at a time, and that will kill your scalability dead.
Personally, I would go about the check another way.
Put a UNIQUE index on the columns that define a duplicate, then attempt to insert the row. If the insert fails with a duplicate-key error, there was already a row there.
In this manner, you check for a duplicate and insert the new row in a single query, eliminating the possibility of races.
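A sketch of the "just try the insert" approach, assuming Python's sqlite3 as a stand-in: the UNIQUE constraint makes the database reject the second insert, which surfaces as sqlite3.IntegrityError (MySQL reports a duplicate-key error, code 1062). The table layout and helper names are hypothetical:

```python
import sqlite3


def make_unique_db():
    """Hypothetical 'test' table where (user_id, action) may appear only once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE test (user_id INTEGER, action TEXT, UNIQUE (user_id, action))"
    )
    return conn


def insert_once(conn, user_id, action):
    """Return True if this call inserted the row, False if it already existed."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO test (user_id, action) VALUES (?, ?)", (user_id, action)
            )
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row: another request got there first
        return False
    return True
```

However many requests arrive at once, only one insert per (user_id, action) can succeed, so only that request should go on to run the heavy application logic.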