IntegrityError with Django m2m relations - mysql

I have a relatively simple Django app with quite heavy usage, which is responsible for a fair amount of concurrency in the db operations.
I have a model Post with an m2m relation to a Tag model.
A single line in my code, p.add(t), is repeatedly causing MySQL exceptions (where p is a Post instance and t is a Tag instance).
IntegrityError: (1062, "Duplicate entry '329051-1827414' for key 'post_id'")
When this is raised I can manually run the same p.add(t) successfully, so it must have to do with some peculiar state the db/app are in at the time of normal execution. It happens about once every 1,000 tag-adding attempts, with no pattern I can detect (i.e. both numbers in the "329051-1827414" pair of the example change).
A CHECK TABLE in MySQL on the relevant tables shows that they are all seemingly OK.
Any ideas?

Usually you see errors like that when an insert into an intermediate table duplicates the unique-together constraint on the FKs. I'm guessing that in the example you provided, "329051" is a Post id and "1827414" is a Tag id.
Normally in Django you can call the add() method repeatedly with the same instance and Django takes care of everything for you. I'm assuming the model manager maintains some state to help it determine whether each add() represents a new or an existing row; if the row appears to be new, it attempts an INSERT.
That in itself doesn't explain why you're getting the error. You mention the app "is responsible for a fair amount of concurrency in the db operations". Without knowing exactly what that means, I'm guessing you could be hitting a race condition where multiple threads/processes attempt to add the same new tag at around the same time and both attempt the INSERT.
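If that's what's happening, one way to cope (a sketch, not from the original thread; the reverse accessor name tags is an assumption, since the question only shows p.add(t)) is to attempt the add and treat a duplicate-key error as success:

from django.db import IntegrityError, transaction

def add_tag(p, t):
    try:
        # The atomic block creates a savepoint, so a failed INSERT
        # doesn't poison any enclosing transaction.
        with transaction.atomic():
            p.tags.add(t)
    except IntegrityError:
        # Another process inserted the same (post_id, tag_id) row first;
        # the relation exists now, which is all we wanted.
        pass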

I think I'm seeing a similar problem in my app - if I send two identical requests to add an m2m relation (e.g. a tag, in my case as well), I get that error because the m2m table has a unique constraint on (user, tag). I'm guessing the server is processing the two .add() calls at the same time:
if not already in database:
    # Both invocations reach here, because the next line takes some time to process.
    create m2m row
I don't know how that can be remedied.
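One common remedy (a sketch in Django terms, not from the thread) is to drop the up-front check entirely: attempt the insert, let the unique constraint arbitrate, and fall back to a fetch when you lose the race:

from django.db import IntegrityError, transaction

def ensure_exists(model, **keys):
    # EAFP instead of look-before-you-leap: the unique constraint,
    # not application code, decides who creates the row.
    try:
        with transaction.atomic():
            return model.objects.create(**keys)
    except IntegrityError:
        return model.objects.get(**keys)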

Create or update record for ArchivedUser

I am trying to make a backup table of users, called archived users. It creates the ArchivedUser by taking a hash of the current user's attributes (self) and merging in self.id as the user_id.
When a user is reinstated, their ArchivedUser record remains in the table. If the user gets deleted a second time, the archive should update any attributes that have changed.
Currently, it throws a validation error:
Validation failed: User has already been taken, as the self.id already exists in the ArchivedUser table.
What is a better way to handle this - updating the existing record if possible, or creating a new one if it doesn't exist? I am using Rails 4 and have tried find_or_create_by, but it throws an error:
Mysql2::Error: Unknown column 'device_details.device_app_version'
which is odd, as that column exists in both tables and doesn't get modified.
User Delete Method
# creates ArchivedUser with the exact attributes of the User
# object and merges self.id to fill user_id on ArchivedUser
if ArchivedUser.create!(
  self.attributes.merge(user_id: self.id)
)
Thanks for taking a peek!
If your archived_users table is truly acting as a backup for users and not adding any additional functionality, I would ditch the ArchivedUser model and simply add an archived boolean on the User model to tell whether or not the user is archived.
That way you don't have to deal with moving an object to another table and hooking into a delete callback.
However, if your ArchivedUser model does offer some functionality different from User, another option would be to use single table inheritance (STI) to differentiate the type of user. In this case, you could have User govern all users, and then distinguish between a user being, for example, an ActiveUser or an ArchivedUser.
This takes more setup and can be a bit confusing if you haven't worked with STI, but it can be useful when two similar models need to differ only slightly.
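A minimal sketch of that STI layout (illustrative only; it assumes a string "type" column on users, which Rails uses to store the subclass name):

class User < ActiveRecord::Base
end

class ActiveUser < User
end

class ArchivedUser < User
end

# Archiving becomes a type change instead of a row copy:
user.becomes!(ArchivedUser).save!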
That being said, if you want to keep your current setup, there are a few issues I see with your code:
If you are going to create an object from an existing object, it's good practice to duplicate the object (dup). That way the id won't be copied over and can be auto-incremented.
If you truly have deleted the User record from the database, there's no reason to store a reference to its id because it's gone. But if you aren't actually deleting the record, you should definitely just use a boolean attribute to determine whether or not the user is active or archived.
I don't have enough context here as to why find_or_create_by isn't working, but if you do use it, keep the lookup as simple as possible: don't match on all the attributes, just the consistent ones (like id) that you know will return the proper result.
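If you do keep the separate table, the usual update-or-create pattern looks something like this (a sketch; which columns to exclude depends on your schema):

# Inside a User instance method: reuse the existing archive row if present.
archived = ArchivedUser.find_or_initialize_by(user_id: id)
archived.assign_attributes(attributes.except('id', 'created_at', 'updated_at'))
archived.save!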
if ArchivedUser.create! # ... is problematic. The bang after create (i.e. create!) makes it raise an error if the record could not be created, which renders the if pointless. So either use create (no bang) with the if, if you don't want errors raised and want to handle the failure case yourself, or use create! without the if, if you do want an error raised.
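In other words, sketching both variants (attrs stands for the merged attributes hash, and handle_failure is a placeholder):

# Variant 1: create returns the record either way; check it yourself.
archived = ArchivedUser.create(attrs)
handle_failure unless archived.persisted?

# Variant 2: create! raises ActiveRecord::RecordInvalid on failure; no if needed.
ArchivedUser.create!(attrs)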

Sailsjs error handling with database / waterline errors

I'm building an API that receives webhook data (orders from Shopify) and saves it to a relational MySQL database. Each order contains products that belong to collections (categories). I relate each product to its collection, or create a new collection if it doesn't already exist. The problem I run into is that since I can't use findOrCreate(), I do a find() first and then, if nothing is found, a create(). However, since Node.js is async, if many orders come in at once, sometimes the same collection isn't found for two different products at the same time; one request then creates it right before the other one tries to, so when the other one tries to create it, Waterline throws an error:
col_id
• A record with that col_id already exists (405145674).
What is the best way to catch this error so I can just do another find(), so that the function returns the proper collection instead of undefined? I could just do this:
if (err){ findmethodagain(); }
But then it would retry on any error, not just a "record with that id already exists" error.
If you're not seeing this a lot, it wouldn't hurt to just do a second attempt after a delay. You can use Promise.delay from, for example, Bluebird to run a second function after some number of milliseconds on any error. If in most situations one of them creates the collection, then you'll succeed with just one retry.
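A sketch of that retry (the Collection model and col_id attribute are taken from the question; the 100 ms delay is arbitrary):

var Promise = require('bluebird');

function findOrCreateCollection(values) {
  return Collection.findOne({ col_id: values.col_id })
    .then(function (found) {
      return found || Collection.create(values);
    })
    .catch(function () {
      // Assume another request created it in the meantime: wait, then re-find.
      return Promise.delay(100).then(function () {
        return Collection.findOne({ col_id: values.col_id });
      });
    });
}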
If you are seeing lots of other errors, well, fix them.
However, race conditions usually indicate that you need to rework the architecture. I'd suggest that instead of relying on error attributes or trying to hack around them, you redesign this, at least for the situation where you have to create a new collection. One popular option is to set up a worker queue. In your case, you would still use find in the first pass to see if a collection exists, and if not, kick the work off to the queue for a single worker to pick up sequentially. In the worker job, do your find, then create. You can use Redis and Bull to set this up quickly and easily.
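A rough outline with Bull (the queue name, Redis URL, and theMissingId are placeholders):

var Queue = require('bull');
var collectionQueue = new Queue('collection-creation', 'redis://127.0.0.1:6379');

// A single worker processes jobs one at a time, so the find-then-create
// inside it can no longer race against another copy of itself.
collectionQueue.process(function (job) {
  return Collection.findOne({ col_id: job.data.col_id })
    .then(function (found) {
      return found || Collection.create(job.data);
    });
});

// In the webhook handler, when the first find() misses:
collectionQueue.add({ col_id: theMissingId /* plus the other attributes */ });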

Database Corruption: Missing/Duplicate Records in MS Access Database

The Problem
I have a VB6 application which uses Microsoft Access as the back end. The app is used in a multi-user environment. Recently, with no changes made to the application, we're seeing that in one of the tables of the database some records aren't being saved, while other records are saved twice or sometimes even 3 times.
The Details
It's a VB6 application with Access 2002 as the back end. The app is installed on a computer running Windows 2008 Server. Multiple users on the network have a shortcut to the application on their computers, and they all run it at the same time, accessing the same database but different records.
The application uses the following logic to save a record to the database:
On Error GoTo Err_Handler
1
If objectID > 0 Then
    ' existing record
    sql = "UPDATE myTable SET a=..., b=..., etc. WHERE Id = " & objectID
    cn.Execute sql
Else
    ' new object; create new record
    nextID = cn.Execute("SELECT Max(id) + 1 FROM myTable")(0)
    sql = "INSERT INTO myTable (id, a, b, c) VALUES (" & nextID & ", ...)"
    cn.Execute sql
    objectID = nextID
End If
Exit Function

Err_Handler:
' handle the case where two people get the same ID
If timeNotExpired Then
    ' try saving again from label 1
    Resume 1
Else
    ' could not save; display error
End If
Thus when saving a record, if it exists it's UPDATED, otherwise it's INSERTED. The primary key field is obtained by calling Max(ID) + 1. With this setup, it's possible that Max(ID) + 1 may return the same ID for two users that are saving to the same table at the very same time. If this happens, the application goes back to where that label 1 is, and Max(ID) + 1 is called again until there is no conflict or until the save operation times out. Simple.
Last week, out of the blue, with no changes made to the application, it just started happening that (1) records in one table would randomly not save, or (2) a given record in that same table would show up in the database twice or even 3 times. In other words, a record in that table would appear in the database more than once.
It doesn't happen all the time, but it happens a good 5-10 times a day. Please note that there are at least 5 people using the application throughout the day, mostly for data entry purposes. If a given record isn't saved properly, the data gets out of sync and the application displays a message. At that point, if I check the database, I'll see that a record is either missing or duplicated. And usually, when it happens to one person, it will happen to other users who are also entering data, at the same time.
Edit
Let me add a bit more context... I have two tables (among others) that represent a parent/child relationship as in a customer/order scenario. A parent is required to have at least one child and the application has checks in place to ensure that a parent is not saved to the database unless the user has added at least one child for it. A user may not proceed to do anything with the application if he adds a parent without any children. The database code that saves parents (and children) has an if statement that reads something along the lines of "If parentHasNoChildren Exit Function". There's absolutely no way, absolutely no way, abso...lutely no... way... for the application to run code which would result in a parent that is saved to the database with no children.
But alas, starting last week, with absolutely no modifications to the application, we're seeing parents with no children left and right in the database. The problem occurs about 10 times per day, per user.
I have since modified the application so that it alerts the user when it finds a parent that has no children. If so, the program instructs them to delete the record and add it again, after which everything is fine.
Now, the fact that parents are reaching the database without children can only mean that (1) the application attempted to save the child, (2) Access returned no errors and behaved like everything was ok and (3) the application "thought everything was peachy" when in fact the child was not saved at all. And I know Access returned no error because the application logs every error that occurs during save operations. I checked the logs and there are no errors about children not being saved.
Edit 2: (I believe I found the problem!)
Upon inspection of the database, I just discovered that the primary key in the child table is gone. That is, the field that is supposed to be set as the primary key is there, but it isn't set as the primary key. I have no idea how this happened. The database design hasn't been touched so I'm assuming MS Access woke up one day and said "hmm, I wonder what would happen if I deleted the primary key from this table..."
In any event, I believe this is definitely the cause of my problem. The primary key was set up to prevent duplicate entries. With the key gone, it's possible to save two child records with the same ID. Since my code uses Max(ID) + 1 to generate the ID for new child records, it's possible that Max(ID) + 1 would return the same ID for multiple users attempting to save a child record at the very same time. This would not have been an issue in the past, because Access would produce an error about the duplicate IDs and the application would detect the error and simply execute Max(ID) + 1 again. But with no primary key, two child records would be saved with the same ID. And later, if any of the users made a change to one of those records, both records would be updated, and all fields on both (including parentID, the foreign key to the parent) would be set to identical values. This would then result in one parent having no children and another parent having duplicate children. My goodness, what a mess!!
I just tried adding the primary key to the table and I can't because there are duplicate records which I must find and delete. I'll post the final result as an answer after I'm able to add the primary key back. Thanks for all your help.
Now one last note: the table in question is the largest in the database, containing well over 3.5 million records. The table has 22 fields: 20 are long integers, one is a text field with a field size of 100, and the other is a Boolean field.
What I've Done
Since the application hasn't changed, I immediately assumed (and continue to assume) that the problem is corruption in the MS Access Database. I have done the following:
Compacted the database every day
Created a fresh database and imported the tables from the old database into the new one
Created a new database, imported only the table definitions one at a time, then used Append Queries to get the data over to the new database
Made sure I have the latest service packs for Office
Made sure connection objects and recordsets are properly closed/disposed of
Contemplated suicide
Read and implemented the suggestions in a Microsoft article about how to keep a Jet database in top working order
I'll also go over the application with a fine-tooth comb to see if I find anything, though everything points to Access being the culprit.
Any Ideas?
Has anyone been in a situation like this before? I myself had a similar issue with the same database about 10 years ago. Back then, I was getting the "Unrecognizable file format" error, which was a clear case of database corruption. I fixed it by creating a new database and importing the tables, but that hasn't helped this time. Any ideas?
I would check the data type of the ID column and make sure it is large enough. Consider changing it to a Counter (AutoNumber) data type instead of running a domain function (Max(ID)), or to a Replication ID if possible. That's where your issue may be happening.
I've set the ID to Long before and maintained my own counter in another table - say, a table that holds a NextID column, and a VBA function that reads it and updates it by +1 for the next caller. Within a transaction this can be more reliable than Max(), which has to contend with locks.
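A rough ADO sketch of that counter-table idea (the Counters table and its columns are hypothetical, and error handling is omitted):

Dim rs As ADODB.Recordset
cn.BeginTrans
Set rs = cn.Execute("SELECT NextID FROM Counters WHERE TableName = 'myTable'")
nextID = rs!NextID
cn.Execute "UPDATE Counters SET NextID = NextID + 1 WHERE TableName = 'myTable'"
cn.CommitTrans
' nextID is now reserved for this client; use it in the INSERT.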
Good luck!
It turns out the issue stemmed from the fact that Microsoft Access dropped the primary keys of two tables in the database. Because I use Max(ID) + 1 to obtain the ID for new records, and there are multiple users creating new records at once, the same ID was sometimes used for more than one record. This caused the issues mentioned above.
To solve the problem, I simply added the keys back after deleting any duplicate entries I found.
I'll also try to steer clear of Max(ID) + 1 for new records, as suggested by @DanielG.
If anyone else is unlucky enough to be working with MS Access in a multi-user environment, I suggest following Microsoft's suggestions outlined in the article How to keep a Jet 4.0 database in top working condition.
Thank you all for your help!

Inserting records with Nested ClientDataSet with autoincrementing link

I'm teaching myself Delphi database programming using a MySQL database. I'm trying to add a record from a nested ClientDataSet, where the link between the master and detail tables is an autoincrement field in the master table. I found a question/answer pair that appears to answer my question at: Inserting records with autoincrementing primary keys
The thing I don't understand is setting the required flag in the Query. I can't figure out how to do that as I'm too inexperienced, nor do I understand why it is necessary.
Similar to the question linked above, I have a component chain like this (using dbExpress):

SQLConnection -> TSQLDataSet  -> DataSetProvider -> ClientDataSet
             |        |-> LinkDataSource
             |-> TSQLDataSet2 -> DataSource = LinkDataSource
I load data into my nested ClientDataSet fine, so the component links to create the nested structure work. After loading the master/detail tables into the nested dataset, the following code gives an error:
MasterCDS1.Append;
MasterCDS1.FieldByName('TLNo').Required := False;
MasterSDS.FieldByName('TLNo').Required := False; { Error: Field 'TLNo' not found }
MasterCDS1.FieldByName('TLNo').ProviderFlags := [pfInWhere, pfInKey];
{ ... Populate Master table Fields}
MasterCDS1.Post;
MasterCDS1.ApplyUpdates(0);
TLNo is the field linking the tables; it is part of the primary key of both the master table and the detail table. The third line, where I try to set the flag on the TSQLDataSet, generates the error shown in the comment. MasterSDS is where I put my 'Select * from master' query. MasterCDS learns the schema from this query, including that the field TLNo is a required field in both the master and detail MySQL tables. That third line of code is my "interpretation" of what Mr Uwe Raabe said to do. Clearly I did this wrong. Can someone provide a code example so that this Delphi noob won't misinterpret the instructions? Thanks in advance.
The only reason I can imagine for the error you describe is that MasterSDS is not open when you execute that third line. "Field not found" is raised either when the field does not exist in the table or when the dataset (i.e. the query, in this case) is not open and has no static fields defined.
This leads to another point I want to mention: place the Required and the ProviderFlags settings in the AfterOpen event of the corresponding dataset. There is no need to repeat these settings whenever you append a record. If you work with static fields you can even do these settings in the Object Inspector.
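For example, a sketch of that AfterOpen handler (TForm1 is a placeholder name; the event can be wired up in the Object Inspector):

procedure TForm1.MasterSDSAfterOpen(DataSet: TDataSet);
begin
  // Set once per open instead of before every Append.
  DataSet.FieldByName('TLNo').Required := False;
  DataSet.FieldByName('TLNo').ProviderFlags := [pfInWhere, pfInKey];
end;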
Since you're a starter, I suggest you always use static fields, which can be adjusted inside the IDE. This will simplify things significantly.

LINQ to SQL - retrieve object, modify, SubmitChanges() creates new objects

I've been battling this for a while. I'm trying to implement a many to one association. I have a bunch of rows in a table, called readings. These accumulate over time, and every now and then I want to export them. When I export them I want to create a new object called ExportEvent, to track which rows were exported, so they can be re-exported if need be. Therefore Reading has a nullable foreign key relationship with ExportEvent, as I create the readings before I export them.
What I'm finding is that when I then do the export, whether I first create the ExportEvent (evt) and add the readings using
evt.Readings.AddRange(readings)
or set the association from the other side with
foreach (var reading in readings)
    reading.ExportEvent = evt;
when I call SubmitChanges() I always get a new bunch of readings created with the association to evt, and the original records aren't updated.
I pared this back to its simplest, though, just to see if I could create the two objects with no association, and I even found that when I just retrieved all the readings and updated an int value on them, SubmitChanges still inserted a bunch of new records. What's going on?
Hmmm. Interesting - I just clicked this link in my bookmarks and found that the question has been resurrected, so I will provide the (embarrassing) solution. All of my entities have audit data properties on them - CreatedDate and UpdatedDate. Therefore I've implemented the partial methods for the insert and update of each entity in the DataContext. I had copied and pasted (how often is this the cause of some downfall?) some of these insert and update methods for the newly created entities. As a result I'd also copied an error, where the Update[blah] methods were calling ExecuteDynamicInsert instead of ExecuteDynamicUpdate.
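The shape of the mistake, roughly (MyDataContext and the Reading entity are reconstructed placeholders, not the actual code):

public partial class MyDataContext
{
    partial void UpdateReading(Reading instance)
    {
        instance.UpdatedDate = DateTime.UtcNow;
        // The copy/paste bug: this line called ExecuteDynamicInsert,
        // so every "update" inserted a brand-new row instead.
        ExecuteDynamicUpdate(instance);
    }
}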
Suffice to say I was very frustrated when for 3 hours I'd been trying frantically to solve this problem, only to find it was due to a (silly) copy/paste error - and only to find the error about 3 mins after I'd posted this question!
Hope this helps someone.
I suspect it is because you are calling AddRange(). This will add the new objects to the data context. Instead, you should try just re-attaching the existing objects by calling Attach() on your data context.
(Or if you never detached them and still have your original data context, you don't need to do anything, just make the changes to the objects and call SubmitChanges())
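A sketch of the attach route (MyDataContext and the Readings table property are assumed names):

using (var db = new MyDataContext(connectionString))
{
    // Re-attach an entity that was loaded by a different context;
    // subsequent changes are tracked and produce an UPDATE, not an INSERT.
    db.Readings.Attach(reading);
    reading.ExportEvent = evt;
    db.SubmitChanges();
}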