I have a database table that lists all orders. Each weekend a cron job runs and generates invoices for each customer. The code loops through each customer, gets their recent orders, creates a PDF, and then updates the orders table to record the invoice ID against each of their orders.
The final update query is:
update bookings set invoiced='12345' where username='test-username' and invoiced='';
So, set invoiced to 12345 for all orders for test-username that haven't been previously invoiced.
I have come across a problem where orders are being added to the PDF but not updated to reflect the fact that they have been invoiced.
I have started running the update query manually and come across a strange scenario.
A customer may have 60 orders.
If I run the query once, 1 order is updated. I run it again and 1 order is updated. I repeat the process, and each time only a small number of orders are updated - between 1 and 3. It doesn't update all 60 in one query as I would expect. I have to run the query repeatedly until it finally comes back with "0 rows affected", and only then can I be sure that all rows have been updated.
I am not including a LIMIT XX in my query, so I see no reason why it can't update all orders at once. The query I run repeatedly is identical each time.
Does anybody have any wise suggestions?!
I'm guessing you're using InnoDB. You haven't disclosed the type of code you're running.
But I bet you're seeing an issue that relates to transactions. When a program works differently from an interactive session, it's often a transaction issue.
See here: http://dev.mysql.com/doc/refman/5.5/en/commit.html
Do things work better if you issue a COMMIT; command right after your UPDATE statement?
Note that your language binding may have its own preferred way of issuing the COMMIT; command.
Another way to handle this problem is to issue the SQL command
SET autocommit = 1
right after you establish your connection. This will make every SQL command that changes data do its COMMIT operation automatically.
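For example, a minimal sketch of both options, reusing the UPDATE from the question:

-- Option 1: commit explicitly after the change
UPDATE bookings SET invoiced='12345' WHERE username='test-username' AND invoiced='';
COMMIT;

-- Option 2: issue this once, right after connecting, so every
-- data-changing statement commits automatically
SET autocommit = 1;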
I want to create a model with ID equal to the current greatest ID for that model plus one (like auto-increment). I'm considering doing this with select_for_update to ensure there is no race condition for the current greatest ID, like this:
with transaction.atomic():
    greatest_id = MyModel.objects.select_for_update().order_by('id').last().id
    MyModel.objects.create(id=greatest_id + 1)
But I'm wondering, if two processes try to run this simultaneously, once the second one unblocks, will it see the new greatest ID inserted by the first process, or will it still see the old greatest ID?
For example, say the current greatest ID is 10. Two processes go to create a new model. The first one locks ID 10. Then the second one blocks because 10 is locked. The first one inserts 11 and unlocks 10. Then, the second one unblocks, and now will it see the 11 inserted by the first as the greatest, or will it still see 10 because that's the row it blocked on?
In the select_for_update docs, it says:
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released.
So for my example, I'm thinking this means that the second process will rerun the query for the greatest ID once it unblocks and get 11. But I'm not certain I'm interpreting that right.
Note: I'm using MySQL for the db.
No, I don't think this will work.
First, let me note that you should absolutely check the documentation for the database you're using, as there are many subtle differences between the databases that are not captured in the Django documentation.
Using the PostgreSQL documentation as a guide, the problem is that, at the default READ COMMITTED isolation level, the blocked query will not be rerun. When the first transaction commits, the blocked transaction will be able to see changes to that row, but it will not be able to see that new rows have been added.
It is possible for an updating command to see an inconsistent snapshot: it can see the effects of concurrent updating commands on the same rows it is trying to update, but it does not see effects of those commands on other rows in the database.
So 10 is what will be returned.
Edit: My understanding in this answer is wrong; I'm just leaving it for documentation's sake in case I ever want to come back to it.
After some investigation, I believe this will work as intended.
The reason is that for this call:
MyModel.objects.select_for_update().order_by('id').last().id
The SQL Django generates and runs against the db is actually:
SELECT ... FROM MyModel ORDER BY id ASC FOR UPDATE;
(the call to last() only happens after the queryset has already been evaluated.)
Meaning the query scans over all rows both times it runs, so the second time it runs it will pick up the new row and return it accordingly.
I learned that this phenomenon is called a "phantom read", and is possible because the isolation level of my db is REPEATABLE-READ.
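If you want to verify the isolation level your own connections use, you can ask the server directly (the variable is named @@tx_isolation in older MySQL versions and @@transaction_isolation from 5.7.20 onward):

SELECT @@transaction_isolation;  -- MySQL 5.7.20+ / 8.0
SELECT @@tx_isolation;           -- older versions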
@KevinChristopherHenry "The issue is that the query is not rerun after the lock is released; the rows have already been selected" Are you sure that's how it works? Why does READ COMMITTED imply the select doesn't run after the lock is released? I thought the isolation level defines which snapshot of data a query sees when it runs, not ~when~ the query is run. It seems to me that whether the select happens before or after the lock is released is orthogonal to the isolation level. And by definition, doesn't a blocked query not select the rows until after it is unblocked?
For what it's worth, I tried to test this by opening two separate connections to my db in a shell and issuing some queries. In the first, I began a transaction and acquired a lock with 'select * from MyModel order by id for update'. Then, in the second, I did the same, causing the select to block. Back in the first, I inserted a new row and committed the transaction. Then in the second, the query unblocked and returned the new row. This makes me think my hypothesis is correct.
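For reference, a sketch of that two-session test (the table and query are from the discussion above; the session labels are mine):

-- Session 1
BEGIN;
SELECT * FROM MyModel ORDER BY id FOR UPDATE;  -- locks the selected rows

-- Session 2: the same statement now blocks, waiting on session 1's locks
BEGIN;
SELECT * FROM MyModel ORDER BY id FOR UPDATE;

-- Session 1
INSERT INTO MyModel (id) VALUES (11);
COMMIT;

-- Session 2 unblocks at this point, and in my test its result set
-- included the newly inserted row (id = 11)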
P.S. I finally actually read the "undesirable results" documentation that you read and I see your point - in that example, it looks like it ignores rows that weren't preselected, so that would point to the conclusion that my second query wouldn't pick up the new row. But I tested in a shell and it did. Now I'm not sure what to make of this.
EDIT: OK, this was all my error. I made a mistake resetting the least_price values, so the database still had the old, wrong values. Whenever I clicked on a product in my application to get its ID and look it up in the database, an after-save hook would trigger a recalculation of the least_price, changing it to the new correct value. And I wrongly assumed that looking up a product in the DB would change the cached value for least_price. I would delete this question, as it is very unlikely to help somebody else, but people have already answered. Thank you, and sorry if I have wasted anybody's time.
I recently set all values of one field (least_price) of my products table to new (higher) values with a PHP script. Now I run this query:
SELECT Products.*
FROM products Products
WHERE
(
Products.least_price > 240
AND Products.least_price < 500
) ;
and the result set contains some products whose new least_price value is above 500. The result shows wrong (I assume the old) values for the least_price field. If I query a particular product with select product where id = 123 that happens to have a new least_price higher than 500, it shows the (newer/higher) least_price correctly. The next time I run the first above-mentioned query, the result set is smaller by one product, and the missing product is the one I queried individually.
Can this behaviour be explained by a query cache? I tried to run RESET QUERY CACHE, but unfortunately I don't have the privileges to do that with this hosting provider. Is there anything else I can do to alert MySQL that the least_price attribute has changed?
I am using MySQL 5.6.38-nmm1-log on an x86_64 debian-linux-gnu machine with InnoDB 5.6.38.
No, this can't be due to the query cache.
The query cache doesn't cache row references, it caches the actual results that were returned. So it can't contain results that don't match the criteria in the query.
The cached result for a query is flushed if any of the tables it uses are modified. So it will never return stale data.
For full details of how the MySQL query cache works, see The MySQL Query Cache.
If the least_price column is indexed, your incorrect results could be due to a corrupted index; try repairing the table.
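A minimal sketch of that check-and-repair step (for InnoDB, OPTIMIZE TABLE is mapped to a full table rebuild, which rebuilds the indexes as well):

CHECK TABLE products;
OPTIMIZE TABLE products;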
I have an invoices table (InnoDB) in which I need to manually set the progressive number for the next invoice. My code now is
SELECT MAX(invoice_n) FROM invoices WHERE invoice_y = 2013
and then regularly save the record, setting the new invoice_n = max + 1. I have a UNIQUE index on invoice_n-invoice_y, and I'm logging DB errors, so I can see that I sometimes get duplicate key entry errors because I have hundreds of different users connected. I have put the code in a loop that retries until the invoice is generated, but I think there must be a more elegant solution, especially using transactions. I have read a bit, but I can't understand how I can achieve my result with transactions.
Any help?
You could use "AUTO_INCREMENT" in your column definition. You will see some gaps between numbers if the insertion fails.
Another alternative is to create a table holding the last index per year (see comments) and follow the steps below (a sketch follows the list):
Begin a transaction
Select (FOR UPDATE) the last index for the year and increment it by one
Insert the new invoice
Commit your transaction
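A minimal sketch of those steps, assuming a hypothetical counter table named invoice_counters with one row per year (the table and column names are mine, not from the question):

START TRANSACTION;
-- Lock the counter row so concurrent users queue up here
SELECT last_n + 1 INTO @n FROM invoice_counters WHERE invoice_y = 2013 FOR UPDATE;
UPDATE invoice_counters SET last_n = @n WHERE invoice_y = 2013;
INSERT INTO invoices (invoice_y, invoice_n) VALUES (2013, @n);
COMMIT;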
Some links:
See MySQL (InnoDB) transaction model
Locking example
Alternatively, you could use an "optimistic approach": repeat the SELECT and the INSERT if the insert fails because of a duplicate key.
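A sketch of that optimistic variant, using the invoice_n and invoice_y columns from the question (the retry loop itself would live in your application code):

-- Retry this statement if it fails with a duplicate key error
INSERT INTO invoices (invoice_y, invoice_n)
SELECT 2013, COALESCE(MAX(invoice_n), 0) + 1
FROM invoices
WHERE invoice_y = 2013;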
I hope this helps you, any comment is welcome!
My application accesses a local DB where it inserts records into a table (+- 30-40 million a day). I have processes that run and process data and do these inserts. Part of the process involves selecting an id from an IDs table which is unique and this is done using a simple
Begin Transaction
Select top 1 @id = siteid from siteids WITH (UPDLOCK, HOLDLOCK)
delete siteids where siteid = @id
Commit Transaction
I then immediately delete that ID from that very table with a separate statement so that no other process grabs it. This is causing tremendous timeout issues, which surprises me with only 4 processes accessing it. I also get timeout issues when checking my main post table to see whether a record was inserted using the above ID. It runs fast, but with all the deadlocks and timeouts I think this indicates poor design and is a recipe for disaster.
Any advice?
EDIT
This is the actual statement that someone else here helped with. I then removed the delete and included it in my code as a separately executed statement. Will the ORDER BY clause really help here?
I'm currently building a system that does running computations, and every 5 seconds inserts or updates information based on those computations to a few rows in MySQL. I'm working on running this system on a few different servers at once right now with a few agents that are each doing similar processing and then writing on the same set of rows. I already randomize the order in which each agent writes its set of rows, but there's still a lot of deadlock happening. What's the best/fastest way to get through those deadlocks? Should I just rerun the query each time one happens, or do row locks, or something else entirely?
I suggest you try something that won't require more than one client to update your 'few rows.'
For example, you could have each agent that produces results do an INSERT into a staging table that uses the MEMORY storage engine.
Then, every five seconds you can run a MySQL event (a stored procedure within the server) that loops through all the rows in that table, posting their results to your 'few rows' and then deleting them. If it's important for the rows in your staging table to be processed in order, then you can use an AUTO_INCREMENT id field. But it might not be important for them to be in order.
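A minimal sketch of that idea, with hypothetical names (results_staging, target_rows, agent_id, value) since the question doesn't give a schema; the DELIMITER lines are only needed in the mysql command-line client:

SET GLOBAL event_scheduler = ON;  -- the scheduler must be running

-- Staging table each agent inserts into; no shared rows, so no deadlocks
CREATE TABLE results_staging (
    id       INT AUTO_INCREMENT PRIMARY KEY,  -- preserves arrival order if that matters
    agent_id INT NOT NULL,
    value    INT NOT NULL
) ENGINE=MEMORY;

DELIMITER //
CREATE EVENT drain_staging
ON SCHEDULE EVERY 5 SECOND
DO
BEGIN
    -- Post the staged results to the shared rows, then clear the stage
    UPDATE target_rows t
    JOIN results_staging s ON s.agent_id = t.agent_id
    SET t.value = s.value;
    DELETE FROM results_staging;
END//
DELIMITER ;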
If you want to get fancier and more scalable than that, you'll need a queue management system like Apache ActiveMQ.