I have a cronjob that loops through and updates a MySQL table row by row. After the table is 'completed', I would like to execute the cronjob exactly 1 more time, to perform various cleanup activities.
In "Execute a cronjob exactly once", thaJeztah states:
It's best to set that value in the MySQL database, e.g. needs_cleanup = 1. That way you can always find those records at a later time. Keeping it in the database allows you to recover, for example, if a cron job wasn't executed or failed halfway through the loop. – thaJeztah
I think this would be a good solution if it's possible, as in my case I only need to set the flag once a day. If it is possible, could someone point me to the SQL commands necessary to set a simple binary flag, with values 0 and 1, in a MySQL table?
UPDATE mytable SET needs_cleanup = 1
does it for all records of mytable. If you need it for a single record, add a WHERE condition, e.g.
UPDATE mytable SET needs_cleanup = 1
WHERE id = 1
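If the needs_cleanup column doesn't exist yet, you first have to add it to the table. A minimal sketch, assuming your table is named mytable and has an integer id column (adjust the names and the default to your schema):

ALTER TABLE mytable
  ADD COLUMN needs_cleanup TINYINT(1) NOT NULL DEFAULT 0;

-- flag a record for cleanup
UPDATE mytable SET needs_cleanup = 1 WHERE id = 1;

-- the cleanup cron job can later find those records ...
SELECT id FROM mytable WHERE needs_cleanup = 1;

-- ... and clear the flag once the cleanup is done
UPDATE mytable SET needs_cleanup = 0 WHERE needs_cleanup = 1;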
Related
What I'm trying to accomplish seems simple:
Db type: MyISAM
Table Structure: card_id, status
Query: select an unused card_id from a table, and set the row as "used".
Is there a race condition where, if two queries run at the same time, the same card_id is fetched twice before the status is updated?
I did some searching already. It seems LOCK TABLES is a solution, but it's overkill to me and requires the LOCK TABLES privilege.
Any ideas?
Thanks!
It really depends on what statements you are running.
For plain old UPDATE statements against a MyISAM table, MySQL will obtain a lock on the entire table, so there is no "race" condition between two sessions there. One session will wait until the lock is released, and then proceed with its own update (or will wait for a specified period, and abort with a "timeout".)
BUT, if what you are asking about is two sessions both running a SELECT against a table, to retrieve an identifier for a row to be updated, and both sessions retrieving the same row identifier, and then both sessions attempting to update the same row, then yes, that's a definite possibility, and one which really does have to be considered.
If that condition is not addressed, then it's basically going to be a matter of "last update wins", the second session will (potentially) overwrite the changes made by a previous update.
If that's an untenable situation for your application, then that does need to be addressed, either with a different design, or with some mechanism that prevents the second update from overwriting the update applied by the first update.
One approach, as you mentioned, is to avoid this situation by first obtaining an exclusive lock on the table (using a LOCK TABLES statement), then running a SELECT to obtain an identifier, and then running an UPDATE to update the identified row, and then finally, releasing the lock (using an UNLOCK TABLES statement.)
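A minimal sketch of that sequence, assuming the table from the question is named cards (the table name is an assumption; the column names come from the question):

LOCK TABLES cards WRITE;

SELECT card_id FROM cards WHERE status = 'unused' LIMIT 1;

UPDATE cards SET status = 'used' WHERE card_id = :previously_fetched_card_id;

UNLOCK TABLES;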
That's a workable approach for some low volume, low concurrency applications. But it does have some significant drawbacks. Of primary concern is reduced concurrency, due to the exclusive locks obtained on a single resource, which has the potential to cause a performance bottleneck.
Another alternative is a strategy called "optimistic locking". (As opposed to the previously described approach, which could be described as "pessimistic locking".)
For an "optimistic locking" strategy, an additional "counter" column is added to the table. Whenever an update is applied to a row in the table, the counter for that row is incremented by one.
To make use of this "counter" column, when a query retrieves a row that will (or might) be updated later, that query also retrieves the value of the counter column.
When an UPDATE is attempted, the statement also compares the current value of the "counter" column in the row with the previously retrieved value of the counter column. We just include a predicate for that in the WHERE clause of the UPDATE statement. For example,
UPDATE mytable
SET counter = counter + 1
, col = :some_new_value
WHERE id = :previously_fetched_row_identifier
AND counter = :previously_fetched_row_counter
If some other session has applied an update to the row we are attempting to update (sometime between the time our session retrieved the row and before our session is attempting to do the update), then the value of the "counter" column on that row will have been changed.
The predicate on our UPDATE statement checks for that, and if the "counter" has been changed, that will cause our update to NOT be applied. We can then detect this condition (i.e. the affected rows count will be a 0 rather than a 1) and our session can take some appropriate action. ("Hey! Some other session updated a row we were intending to update!")
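Applied to the card_id/status table from the question, the whole flow might look like this sketch (the cards table name and the counter column are assumptions, and the affected-row check happens on the client side):

ALTER TABLE cards ADD COLUMN counter INT NOT NULL DEFAULT 0;

-- step 1: fetch a candidate row along with its current counter value
SELECT card_id, counter FROM cards WHERE status = 'unused' LIMIT 1;

-- step 2: attempt the update, guarded by the counter value we read
UPDATE cards
   SET status = 'used'
     , counter = counter + 1
 WHERE card_id = :previously_fetched_card_id
   AND counter = :previously_fetched_counter;

-- step 3: if ROW_COUNT() is 0, another session got there first;
-- fetch another candidate row and try again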
There are some good write-ups on how to implement an "optimistic locking" strategy.
Some ORM frameworks (e.g. Hibernate, JPA) provide support for this type of locking strategy.
Unfortunately, MySQL does NOT provide support for a RETURNING clause in an UPDATE statement, such as:
UPDATE ...
SET status = 'used'
WHERE status = 'unused'
AND ROWNUM = 1
RETURNING card_id INTO ...
Other RDBMS (e.g. Oracle) do provide that kind of functionality. With that feature of the UPDATE statement available, we can simply run the UPDATE statement to 1) locate a row with status = 'unused', 2) change the value of status to 'used', and 3) return the card_id (or whatever columns we want) of the row we just updated.
That gets around the problem of having to run a SELECT and then running a separate UPDATE, with the potential of some other session updating the row between our SELECT and our UPDATE.
But the RETURNING clause is not supported in MySQL. And I've not found any reliable way of emulating this type of functionality from within MySQL.
This may work for you
I'm not entirely sure why I previously abandoned this approach using user variables (I mentioned above that I had played around with this). I think maybe I needed something more general, which would update more than one row and return a set of id values. Or maybe there was something that wasn't guaranteed about the behavior of user variables. (Then again, I only reference user variables in carefully constructed SELECT statements; I don't use user variables in DML; it may be because I don't have a guarantee of their behavior there.)
Since you are interested in exactly ONE row, this sequence of three statements may work for you:
SELECT @id := NULL ;
UPDATE mytable
SET card_id = (@id := card_id)
, status = 'used'
WHERE status = 'unused'
LIMIT 1 ;
SELECT ROW_COUNT(), @id AS updated_card_id ;
It's IMPORTANT that these three statements run in the SAME database session (i.e. keep a hold of the database session; don't let go of it and get a new one.)
First, we initialize a user variable (@id) to a value which we won't confuse with a real card_id value from the table. (A SET @id := NULL statement would work as well, without returning a result, like the SELECT statement does.)
Next, we run the UPDATE statement to 1) find one row where status = 'unused', 2) change the value of the status column to 'used', and 3) set the value of the @id user variable to the card_id value of the row we changed. (We'd want that card_id column to be integer type, not character, to avoid any possible character set translation issues.)
Next, we run a query to get the number of rows changed by the previous UPDATE statement, using the ROW_COUNT() function (we are going to need to verify that this is 1 on the client side), and retrieve the value of the @id user variable, which will be the card_id value from the row that was changed.
After I posted this question, I thought of a solution which is exactly the same as the one you mentioned at the end. I used an update statement, "update TABLE set status = 'used' where status = 'unused' limit 1", which returns the primary ID of the TABLE, and then I can use this primary ID to get the card_id. Say two update statements occur at the same time; as you said, "MySQL will obtain a lock on the entire table, so there is no 'race' condition between two sessions there", so this should solve my issue. But I am not sure why you said that MySQL does NOT provide support for a RETURNING-style statement.
Using a MySQL DB, I am having trouble with a stored procedure and event timer that I created.
I made an empty table that gets populated with data from another via SELECT INTO.
Prior to populating, I TRUNCATE the current data. It's used to track only log entries that occur within 2 months from the current date.
This turns a 350k+ log table into about 750 rows, which really speeds up reporting queries.
The problem is that if a client sends a query precisely between the TRUNCATE statement and the SELECT INTO statement (which has a high probability considering the EVENT is set to run every 1 minute), the query returns no rows...
I have looked into locking reads on the table while this PROCEDURE runs, but locks are not allowed in STORED PROCEDURES.
Can anyone come up with a workaround that (preferably) doesn't require a remodel?
I really need to be pointed in the right direction here.
Thanks,
Max
I'd suggest an alternate approach instead of truncating the table, and then selecting into it...
You can instead select your new data set into a new table. Next, using a single RENAME command, rename the new table to the existing table and the existing table to some backup name.
RENAME TABLE existing_table TO backup_table, new_table TO existing_table;
This is a single, atomic operation... so it wouldn't be possible for the client to read from the data after it is emptied but before it is re-populated.
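A sketch of the full swap, assuming your reporting table is called existing_table and the 350k+ source is called source_log_table with a log_date column (those names and the date filter are placeholders for your actual schema):

CREATE TABLE new_table LIKE existing_table;

INSERT INTO new_table
SELECT * FROM source_log_table
WHERE log_date >= CURRENT_DATE - INTERVAL 2 MONTH;

RENAME TABLE existing_table TO backup_table, new_table TO existing_table;

DROP TABLE backup_table;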
Alternately, you could change your TRUNCATE to a DELETE FROM, and then wrap it in a transaction along with the statement that repopulates the table:
START TRANSACTION;
DELETE FROM YourTable;
INSERT INTO YourTable SELECT ... ;
COMMIT;
In Kettle, I use the following logic in a transformation, given some Strings X and Y as input:
[User Defined Java Expression] Generate ID
[Insert / Update] Update/Insert table set id = generatedId, name=X, company=Y where name = X; don't update the ID column
[Database Value Lookup]select id from table where name = X
The idea is to update existing entries in the table or create new ones, and then get the ID of the interesting row in the next step (which may be an existing one or the newly generated one).
This works fine when executed on MySQL + MyISAM but fails on MySQL + InnoDB, with all other parameters being identical. The last step fails when the row is just being inserted in the second step but works for rows already existing in the database. It seems as if the connection tries to execute the SELECT of the last step before the actual insert happened.
All parameters are set to default in the MySQL settings (MySQL 5.1 and 5.5 show the same behavior).
So my questions are: What are the relevant parameters in Kettle and/or MySQL? How can I guarantee that this works as expected? I cannot switch back to MyISAM.
Just use the blocking step between the insert step and the next step. Then the step before the block will complete before the next step starts.
Well, after having evaluated the different possibilities, three seem workable:
Write my own step which performs the select/insert in a transaction
Serialize the whole transformation in its properties (makes everything REALLY slow)
Use Codek's idea and use the blocking step
I went with the third option for now as everything else is not possible for the moment.
Make sure the transaction generated by Update/Insert is committed and the locks are released before the SELECT operation takes place. It looks like there are lock problems.
I am trying to run an INSERT statement on table X each time I SELECT any record from table Y. Is there any way that I can accomplish that using MySQL only?
Something like triggers?
Short answer is no. Triggers are fired by INSERT, UPDATE or DELETE, not by SELECT.
A possible solution for this rather rare scenario:
First, write some stored procedures that do the SELECTs you want on table Y.
Then, restrict all users to use only these stored procedures and do not allow them to SELECT directly from table Y.
Then alter the stored procedures to also call a stored procedure that performs the action you want (the INSERT into table X, or whatever); see the sketch below.
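A minimal sketch of that wrapper, with made-up names (the y table, the x log table and its columns are assumptions for illustration):

DELIMITER //

CREATE PROCEDURE select_from_y(IN p_task VARCHAR(50))
BEGIN
    -- the side effect: record that a SELECT took place
    INSERT INTO x (note, selected_at)
    VALUES (CONCAT('selected task ', p_task), NOW());

    -- the SELECT the callers are actually interested in
    SELECT * FROM y WHERE task = p_task;
END //

DELIMITER ;

Callers would then run CALL select_from_y('new') instead of selecting from y directly.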
Nope, you can't trigger on SELECT. You'll have to create a stored procedure (or any other type of logging facility, like a log file or whatever) that you implicitly call on any query statement. It's easier if you create a wrapper that calls your query, calls the logging, and returns the query results.
If you're trying to use table X to log the order of SELECT queries on table Y (a fairly common query-logging setup), you can simply reverse the order of operations and run the INSERT query first, then run your SELECT query.
That way, you don't need to worry about linking the two statements with a TRIGGER: if your server crashes between the two statements then you already logged what you care about with your first statement, and whether the SELECT query runs or fails has no impact on the underlying database.
If you're not logging queries, perhaps you're trying to use table Y as a task queue -- the situation I was struggling with that led me to this thread -- and you want whichever session queries Y first to lock all other sessions out of the rows returned, so you can perform some operations on the results and insert the output into table X. In that case, simply add some logging capabilities to table Y.
For example, you could add an "owner" column to Y, then tack the WHERE part of your SELECT query onto an UPDATE statement, run it, and then modify your SELECT query to only show the results that were claimed by your UPDATE:
UPDATE Y SET owner = 'me' WHERE task = 'new' AND owner IS NULL;
SELECT foo FROM Y WHERE task = 'new' AND owner = 'me';
...do some work on foo, then...
INSERT INTO X (output) VALUES ('awesomeness');
Again, the key is to log first, then query.
I have a table with over 3,000,000 entries, and I need to delete 500,000 of them with given IDs.
My idea is to create a query like:
DELETE FROM TableName WHERE ID IN (id1, id2, ...........)
which I generate with a simple C# code.
The question is:
Is there a limit to how many values I can put in the list of IDs?
And if someone has a better way to achieve this delete more efficiently, I'm open to ideas.
If your IDs can't be determined with some comparison (as in WHERE ID < 1000000), you could
INSERT them into a temp table with multiple inserts and then
JOIN this temp table to yours (see the sketch below).
But the inserts may become problematic. You should check that. How could you speed this thing up?
Make the deletes in several bulks.
Insert the IDs into the temp table in bulks.
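A sketch of that temp-table approach, assuming the target table is TableName with an integer primary key ID (the temp table name is made up):

CREATE TEMPORARY TABLE ids_to_delete (id INT NOT NULL PRIMARY KEY);

-- insert the 500,000 IDs in bulks, a few thousand values per statement
INSERT INTO ids_to_delete (id) VALUES (1), (2), (3) /* , ... */ ;

-- delete by joining against the temp table
DELETE TableName
FROM TableName
JOIN ids_to_delete ON ids_to_delete.id = TableName.ID;

DROP TEMPORARY TABLE ids_to_delete;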
In the end, my solution, which works reasonably well:
1. Sorted the IDs (to save server paging).
2. Created queries with 500 IDs each, using C# code.
3. Sent the queries one by one.
I assume that when I worked with queries having 1000+ IDs, the time the SQL server took to process each query was slowing me down (after all, any query you run on the SQL server gets processed and optimized).
I hope this helps someone.