SQLAlchemy One To Many Updates Failing - sqlalchemy

Thank you in advance for your feedback on this. I've checked quite a few sources, including Stack Overflow, and still haven't found a resolution that works for this use case.
I'm developing a CRUD API using SQLAlchemy and the goal is to pass in a Pydantic object and update the database to reflect whatever is in the Pydantic object (including lists of attributes). This seems to only be an issue when the "many" table can't have a composite primary key because one of the fields is nullable.
Below is a very simple example where we have organizations and the organizations can have partner-preferred alternative names or generic alternative names. Essentially, you would be able to say that the name of the organization is "Microsoft" but the New York Stock Exchange refers to it as "MSFT". Additionally, in the past, we've seen the incorrect version, "Micro-Soft".
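For reference, the incoming payload would come from Pydantic models roughly like the ones below (an illustrative sketch only; the failing code further down does not depend on them):
from typing import List, Optional
from uuid import UUID
from pydantic import BaseModel

class AltOrgNameIn(BaseModel):
    alt_org_name: str
    partner: Optional[str] = None

class OrganizationIn(BaseModel):
    org_id: Optional[UUID] = None
    org_name: str
    alt_names: List[AltOrgNameIn] = []
The SQLAlchemy models themselves: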
# Create the classes
class Organization(Base):
    __tablename__ = 'organization'
    org_id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    org_name = Column(String(255))
    alt_names = relationship("AltOrgName", lazy=False)
    __table_args__ = {'schema': 'dbo'}

class AltOrgName(Base):
    __tablename__ = 'alt_org_name'
    alt_org_name_id = Column(UUID(as_uuid=True), primary_key=True, default=uuid.uuid4)
    org_id = Column(UUID(as_uuid=True), ForeignKey(Organization.org_id))
    alt_org_name = Column(String(255))
    partner = Column(String(10))  # <----- Nullable, so it can't be part of the primary key
    __table_args__ = (UniqueConstraint(org_id, alt_org_name, partner), {'schema': 'dbo'})
From there you can create an organization with an alternative name:
# Create the organization
o1 = Organization(
    org_name="Microsoft",
    alt_names=[AltOrgName(alt_org_name="MSFT", partner="New York Stock Exchange")],
)
db.add(o1)
db.commit()
Now I would like to fully overwrite the former version of the alternative names. In practice, this would be constructed from a Pydantic model, but this is functionally similar enough:
for an in o1.alt_names:
    db.delete(an)
o1.alt_names = []
o1.alt_names.append(AltOrgName(alt_org_name="MSFT", partner="New York Stock Exchange"))
o1.alt_names.append(AltOrgName(alt_org_name="Micro-Soft", partner=None))
db.commit()
This leads to an integrity error:
IntegrityError: (pyodbc.IntegrityError) ('23000', "[23000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Violation of UNIQUE KEY constraint 'UQ__alt_org___89413C3E3CD4661D'. Cannot insert duplicate key in object 'dbo.alt_org_name'. The duplicate key value is (c8e907b3-fc71-40de-8cf1-0806e2ebbce6, MSFT, New York Stock Exchange). (2627) (SQLExecDirectW); [23000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The statement has been terminated. (3621)")
[SQL: INSERT INTO dbo.alt_org_name (alt_org_name_id, org_id, alt_org_name, partner) VALUES (?, ?, ?, ?)]
[parameters: ('d828928d-ba4a-49e9-989e-44fda8027de5', 'c8e907b3-fc71-40de-8cf1-0806e2ebbce6', 'MSFT', 'New York Stock Exchange')]
However, committing after the delete works.
for an in o1.alt_names:
    db.delete(an)
db.commit()  # <---------- New line of code
o1.alt_names = []
o1.alt_names.append(AltOrgName(alt_org_name="MSFT", partner="New York Stock Exchange"))
o1.alt_names.append(AltOrgName(alt_org_name="Micro-Soft", partner=None))
db.commit()
Having two commits is an extremely undesirable code pattern since it creates the risk that the delete commit will work but the revision commit won't--which will ultimately wipe out all alternative names for the company. For instance, the code below would result in an integrity error and wipe out all alternative names for the organization without correctly replacing them:
for an in o1.alt_names:
    db.delete(an)
db.commit()
o1.alt_names = []
o1.alt_names.append(AltOrgName(alt_org_name="MSFT", partner="New York Stock Exchange"))
o1.alt_names.append(AltOrgName(alt_org_name="Micro-Soft", partner=None))
o1.alt_names.append(AltOrgName(alt_org_name="Micro-Soft", partner=None))  # <------------- Duplicate
db.commit()
The issue definitely seems to be SQLAlchemy's order of operations: within a single flush it appears to emit the new INSERTs before the DELETEs of the old rows, so the old alternative names are still in the table at the moment the new ones are inserted. If the deletes were performed first, there would be no issue, so one thought is to somehow tell SQLAlchemy to emit the DELETEs before attempting the INSERTs.
The other resolution would be to make (org_id, alt_org_name, partner) the composite primary key of the alt_org_name table and store an empty string instead of NULL for partner. However, I have a very similar situation where one of the nullable fields is a number, so I would prefer a more robust solution if one exists.
Grateful for any ideas or suggestions.
EDIT
I got a response back from the SQLAlchemy team, and adding db.flush() after the deletes seems to fix it:
for an in o1.alt_names:
    db.delete(an)
db.flush()  # <---- New line of code
o1.alt_names = []
o1.alt_names.append(AltOrgName(alt_org_name="MSFT", partner="New York Stock Exchange"))
o1.alt_names.append(AltOrgName(alt_org_name="Micro-Soft", partner=None))
db.commit()
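Since flush() only sends the pending SQL without ending the transaction, the deletes and the inserts still succeed or fail together. A rough sketch of how the pattern might be wrapped (the helper is mine; the names otherwise follow the code above):
from sqlalchemy.exc import IntegrityError

def replace_alt_names(db, org, new_names):
    try:
        for an in org.alt_names:
            db.delete(an)
        db.flush()  # emit the DELETEs now, still inside the open transaction
        org.alt_names = list(new_names)
        db.commit()
    except IntegrityError:
        db.rollback()  # the flushed deletes are rolled back too, so nothing is lost
        raise
If the insert fails, the rollback also undoes the deletes, which removes the wipe-out risk described earlier.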

Related

Django admin - model visible to superuser, not staff user

I am aware of syncdb and makemigrations, but we are restricted from running those in the production environment.
We recently had a couple of tables created on production. As expected, the tables were not visible in the admin for any user.
After that, the two queries below were executed manually against the production SQL (I ran the migration on my local machine and did a show create table query to fetch the raw SQL).
django_content_type
INSERT INTO django_content_type(name, app_label, model)
values ('linked_urls',"urls", 'linked_urls');
auth_permission
INSERT INTO auth_permission (name, content_type_id, codename)
values
('Can add linked_urls Table', (SELECT id FROM django_content_type where model='linked_urls' limit 1) ,'add_linked_urls'),
('Can change linked_urls Table', (SELECT id FROM django_content_type where model='linked_urls' limit 1) ,'change_linked_urls'),
('Can delete linked_urls Table', (SELECT id FROM django_content_type where model='linked_urls' limit 1) ,'delete_linked_urls');
Now this model is visible to the superuser, who is able to grant access to staff users as well, but staff users can't see it.
Is there another table entry that needs to be added?
Or is there any other way to solve this problem without syncdb or migrations?
We recently had a couple of tables created on production.
I can read what you wrote there in two ways.
First way: you created tables with SQL statements, for which there are no corresponding models in Django. If this is the case, no amount of fiddling with content types and permissions will make Django suddenly use the tables. You need to create models for the tables. Maybe they'll be unmanaged, but they need to exist.
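In that first case, a minimal unmanaged model might look roughly like this (the field is hypothetical; only the table name comes from the question):
from django.db import models

class LinkedUrls(models.Model):
    url = models.CharField(max_length=255)  # hypothetical column

    class Meta:
        managed = False          # Django will not create or migrate this table
        db_table = 'linked_urls'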
Second way: the corresponding models in Django do exist; you just manually created the tables for them, so that's not a problem. What I'd do in this case is run the following code (explanation follows after the code):
from django.contrib.contenttypes.management import update_contenttypes
from django.apps import apps as configured_apps
from django.contrib.auth.management import create_permissions

for app in configured_apps.get_app_configs():
    update_contenttypes(app, interactive=True, verbosity=0)

for app in configured_apps.get_app_configs():
    create_permissions(app, verbosity=0)
What the code above does is essentially perform the work that Django does after it runs migrations. When a migration runs, Django creates tables as needed; when it is done, it calls update_contenttypes, which scans the models defined in the project and adds whatever is missing to the django_content_type table. It then calls create_permissions to update auth_permission with the add/change/delete permissions that need adding. I've used the code above to force permissions to be created early during a migration. It is useful if I have a data migration, for instance, that creates groups that need to refer to the new permissions.
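To illustrate that last point, such a data migration might look roughly like this (a sketch; the app label, dependency and group name are hypothetical, and the codename is borrowed from the question's INSERT):
from django.apps import apps as global_apps
from django.contrib.auth.management import create_permissions
from django.db import migrations

def add_editor_group(apps, schema_editor):
    # Make sure permissions for every installed app exist before looking them up.
    for app_config in global_apps.get_app_configs():
        create_permissions(app_config, verbosity=0)

    Group = apps.get_model('auth', 'Group')
    Permission = apps.get_model('auth', 'Permission')
    group, _ = Group.objects.get_or_create(name='url_editors')  # hypothetical group
    group.permissions.add(Permission.objects.get(codename='change_linked_urls'))

class Migration(migrations.Migration):
    dependencies = [('urls', '0001_initial')]  # hypothetical dependency
    operations = [migrations.RunPython(add_editor_group)]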
So, finally I had a solution. I did a lot of debugging on Django, and apparently the function below (in django.contrib.auth.backends) does the job of providing permissions.
def _get_permissions(self, user_obj, obj, from_name):
    """
    Returns the permissions of `user_obj` from `from_name`. `from_name` can
    be either "group" or "user" to return permissions from
    `_get_group_permissions` or `_get_user_permissions` respectively.
    """
    if not user_obj.is_active or user_obj.is_anonymous() or obj is not None:
        return set()

    perm_cache_name = '_%s_perm_cache' % from_name
    if not hasattr(user_obj, perm_cache_name):
        if user_obj.is_superuser:
            perms = Permission.objects.all()
        else:
            perms = getattr(self, '_get_%s_permissions' % from_name)(user_obj)
        perms = perms.values_list('content_type__app_label', 'codename').order_by()
        setattr(user_obj, perm_cache_name, set("%s.%s" % (ct, name) for ct, name in perms))
    return getattr(user_obj, perm_cache_name)
So what was the issue?
The issue lay in this query:
INSERT INTO django_content_type(name, app_label, model)
values ('linked_urls',"urls", 'linked_urls');
It looks fine initially, but the actual query executed was:
-- notice the capitalisation here - it looked so trivial, I didn't even bother to look into it until I realised what was happening internally
INSERT INTO django_content_type(name, app_label, model)
values ('Linked_Urls',"urls", 'Linked_Urls');
So Django, internally, when running migrate, ensures everything is stored in lower case - and this was the problem!
I had a separate query executed to lower-case all the previous inserts, and voilà!
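For reference, the equivalent clean-up through the ORM might look roughly like this (a sketch; it assumes the rows were inserted for the urls app as in the queries above):
from django.contrib.contenttypes.models import ContentType

# Normalize the manually inserted rows to the lower-case form migrate would have written.
for ct in ContentType.objects.filter(app_label='urls'):
    ct.model = ct.model.lower()
    ct.save()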

update SQL table from foreign data source without first deleting all entries (but do delete entries no longer present)

I have a bunch of MySQL tables I work with where the ultimate data source is a very slow SQL server administered by someone else. My predecessors' solution to dealing with this is to do queries more-or-less like:
results = python_wrapper('SELECT primary_key, col2, col3 FROM foreign_table;')
other_python_wrapper('DELETE FROM local_table;')
other_python_wrapper('INSERT INTO local_table VALUES() %s;' % results)
The problem is that this means you can never use values in local_table as foreign key constraints for other tables, because they are constantly being deleted and added back whenever you update from the foreign source. However, if a record really does disappear from the results of the query on the foreign server, then that usually means you would want to trigger a cascade and drop records in other local tables that you've linked, via a foreign key constraint, to data propagated from the foreign table.
The only semi-reasonable solution I've come up with is to do something like:
results = python_wrapper('SELECT primary_key, col2, col3 FROM foreign_table;')
other_python_wrapper('DELETE FROM local_table_temp;')
other_python_wrapper('INSERT INTO local_table_temp VALUES() %s;' % results)
other_python_wrapper('DELETE FROM local_table WHERE primary_key NOT IN local_table_temp;')
other_python_wrapper('INSERT INTO local_table SELECT * FROM local_table_temp ON DUPLICATE KEY UPDATE local_table.col2 = local_table_temp.col2, local_table.col3 = local_table_temp.col3;')
The problem is that there are a fair number of these tables, and many of them have a large number of columns that need to be updated, so it's tedious to write the same boilerplate over and over. And if a table schema changes, there's more than one place where you need to update the listing of columns.
Is there any more concise way to do this with the SQL code?
Thanks!
I have a somewhat unsatisfactory answer to my own question. Since I'm using Python to query the foreign Oracle database and load the results into MySQL, and I trust the table and column names to be pretty well behaved, I can just wrap the whole procedure in Python code and have Python generate the SQL update queries by inspecting the tables.
For a number of reasons, I'd still like to see a better way to do this, but it works for me because:
I'm using an external scripting language that can inspect the database schema anyway.
I trust the database, column, and table names I'm working with to be well-behaved because these are all things I have direct control over.
My solution depends on the local SQL table structure; specifically which keys are primary keys. The code won't work without properly structured tables. But that's OK, because I can restructure the MySQL tables to make my python code work.
While I do hope someone else can think up a more-elegant and/or general-purpose solution, I will offer up my own python code to anyone who is working on a similar problem who can safely make the same assumptions I did above.
Below is a python wrapper I use to do simple SQL queries in python:
import config, MySQLdb

class SimpleSQLConn(SimpleConn):  # SimpleConn: base class from the author's codebase (not shown here)
    '''simplified wrapper around a MySQLdb.connection'''

    def __init__(self, **kwargs):
        self._connection = MySQLdb.connect(host=config.mysql_host,
                                           user=config.mysql_user,
                                           passwd=config.mysql_pass,
                                           **kwargs)
        self._cursor = self._connection.cursor()

    def query(self, query_str):
        self._cursor.execute(query_str)
        self._connection.commit()
        return self._cursor.fetchall()

    def columns(self, database, table):
        return [x[0] for x in self.query('DESCRIBE `%s`.`%s`' % (database, table))]

    def primary_keys(self, database, table):
        return [x[0] for x in self.query('DESCRIBE `%s`.`%s`' % (database, table)) if 'PRI' in x]
And here is the actual update function, using the SQL wrapper class above:
def update_table(database,
                 table,
                 mysql_insert_with_dbtable_placeholder):
    '''update a mysql table without first deleting all the old records

    mysql_insert_with_dbtable_placeholder should be set to a string with
    placeholders for database and table, something like:
        mysql_insert_with_dbtable_placeholder = "
            INSERT INTO `%(database)s`.`%(table)s` VALUES (a, b, c);

    note: code as is will update all the non-primary keys, structure
    your tables accordingly
    '''
    sql = SimpleSQLConn()

    query = 'DROP TABLE IF EXISTS `%(database)s`.`%(table)s_temp_for_update`' % \
            {'database': database, 'table': table}
    sql.query(query)

    query = 'CREATE TABLE `%(database)s`.`%(table)s_temp_for_update` LIKE `%(database)s`.`%(table)s`' % \
            {'database': database, 'table': table}
    sql.query(query)

    query = mysql_insert_with_dbtable_placeholder % \
            {'database': database, 'table': '%s_temp_for_update' % table}
    sql.query(query)

    query = '''DELETE FROM `%(database)s`.`%(table)s` WHERE
               (%(primary_keys)s) NOT IN
               (SELECT %(primary_keys)s FROM `%(database)s`.`%(table)s_temp_for_update`);
            ''' % {'database': database,
                   'table': table,
                   'primary_keys': ', '.join(['`%s`' % key for key in sql.primary_keys(database, table)])}
    sql.query(query)

    update_columns = [col for col in sql.columns(database, table)
                      if col not in sql.primary_keys(database, table)]
    query = '''INSERT into `%(database)s`.`%(table)s`
               SELECT * FROM `%(database)s`.`%(table)s_temp_for_update`
               ON DUPLICATE KEY UPDATE
               %(update_cols)s
            ''' % {'database': database,
                   'table': table,
                   'update_cols': ',\n'.join(['`%(table)s`.`%(col)s` = `%(table)s_temp_for_update`.`%(col)s`'
                                              % {'table': table, 'col': col} for col in update_columns])}
    sql.query(query)
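A hypothetical call might look like this (the database, table and column names are placeholders; in practice the VALUES would be generated from the foreign-server query results):
insert_template = '''
    INSERT INTO `%(database)s`.`%(table)s` (primary_key, col2, col3)
    VALUES (1, 'a', 'b'), (2, 'c', 'd');
'''
update_table('local_db', 'local_table', insert_template)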

Updating object fields from separate processes? (kind of upsert)

I have Task objects with several attributes. These tasks are bounced between several processes (using Celery) and I'd like to update the task status in a database.
Every update should update only non-NULL attributes of the object. So far I have something like:
def del_empty_attrs(task):
    for name in (key for key, val in vars(task).iteritems() if val is None):
        delattr(task, name)

def update_task(session, id, **kw):
    task = session.query(Task).get(id)
    if task is None:
        task = Task(id=id)
    for key, value in kw.iteritems():
        if not hasattr(task, key):
            raise AttributeError('Task does not have {} attribute'.format(key))
        setattr(task, key, value)
    del_empty_attrs(task)  # Don't update empty fields
    session.merge(task)
However, I get either an IntegrityError or a StaleDataError. What's the right way to do this?
I think the problem is that every process has its own session, but I'm not sure.
a lot more detail would be needed to say for sure, but there is a race condition in this code:
def update_task(session, id, **kw):
    # 1.
    task = session.query(Task).get(id)
    if task is None:
        # 2.
        task = Task(id=id)
    for key, value in kw.iteritems():
        if not hasattr(task, key):
            raise AttributeError('Task does not have {} attribute'.format(key))
        setattr(task, key, value)
    del_empty_attrs(task)  # Don't update empty fields
    # 3.
    session.merge(task)
If two processes both encounter #1, and find the object for the given id to be None, they both proceed to create a new Task() object with the given primary key (assuming id here is the primary key attribute). Both processes then race down to the Session.merge() which will attempt to emit an INSERT for the row. One process gets the INSERT, the other one gets an IntegrityError as it did not INSERT the row before the other one did.
There's no simple answer for how to "fix" this, it depends on what you're trying to do. One approach might be to ensure that no two processes work on the same pool of primary key identifiers. Another would be to ensure that all INSERTs of non-existent rows are handled by a single process.
Edit: another option is an "optimistic" approach, where a SAVEPOINT (e.g. Session.begin_nested()) is used to intercept an IntegrityError on an INSERT, then continue on after it occurs.
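A rough sketch of that optimistic approach, reusing the question's Task model and session (the helper name is mine):
from sqlalchemy.exc import IntegrityError

def get_or_create_task(session, id):
    task = session.query(Task).get(id)
    if task is not None:
        return task
    try:
        with session.begin_nested():  # SAVEPOINT around the INSERT
            task = Task(id=id)
            session.add(task)
        return task
    except IntegrityError:
        # another process inserted the row first; load that one instead
        return session.query(Task).get(id)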

How does SqlAlchemy handle unique constraint in table definition

I have a table with the following declarative definition:
class Type(Base):
    __tablename__ = 'Type'
    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True)

    def __init__(self, name):
        self.name = name
The column "name" has a unique constraint, but I'm able to do
type1 = Type('name1')
session.add(type1)
type2 = Type(type1.name)
session.add(type2)
So, as can be seen, the unique constraint is not checked at all, since I have added to the session 2 objects with the same name.
When I do session.commit(), I get a mysql error since the constraint is also in the mysql table.
Is it possible for SQLAlchemy to tell me in advance that I cannot do this, or to detect it and not insert two entries with the same "name" column?
If not, should I keep all existing names in memory, so I can check whether a name already exists before creating the object?
SQLAlchemy doesn't handle uniqueness, because there's no good way to do it. Even if you keep track of created objects and/or check whether an object with that name exists, there is a race condition: anybody in another process can insert a new object with the name you just checked. The only solution is to lock the whole table before the check and release the lock after the insertion (some databases support such locking).
AFAIK, SQLAlchemy does not enforce uniqueness constraints in Python. Those unique=True declarations are only used to impose database-level table constraints, and only then if you create the table using a SQLAlchemy command, i.e.
Type.__table__.create(engine)
or some such. If you create an SA model against an existing table that does not actually have this constraint present, it will be as if it does not exist.
Depending on your specific use case, you'll probably have to use a pattern like
try:
    existing = session.query(Type).filter_by(name='name1').one()
    # do something with existing
except:
    newobj = Type('name1')
    session.add(newobj)
or a variant, or you'll just have to catch the mysql exception and recover from there.
From the docs
class MyClass(Base):
    __tablename__ = 'sometable'
    __table_args__ = (
        ForeignKeyConstraint(['id'], ['remote_table.id']),
        UniqueConstraint('foo'),
        {'autoload': True},
    )
.one() throws two kinds of exceptions:
sqlalchemy.orm.exc.NoResultFound and sqlalchemy.orm.exc.MultipleResultsFound
You should create the object when the first exception occurs; if the second occurs, you're screwed anyway and shouldn't make it worse.
from sqlalchemy.orm.exc import NoResultFound

try:
    existing = session.query(Type).filter_by(name='name1').one()
    # do something with existing
except NoResultFound:
    newobj = Type('name1')
    session.add(newobj)

Odd IntegrityError on MySQL: #1452

This is sort of an odd one, but I'll try to explain as best I can. I have two models: one representing an email message (Message), the other a sales lead (AffiliateLead). When a form is submitted through the site, the system generates a lead and then an email. The Message model has an optional FK back to the lead. From the Message model's file:
lead = models.ForeignKey('tracking.AffiliateLead', blank=True, null=True)
Now, this basic shell works:
from tracking.models import Affiliate, AffiliateLead
from messages.models import Message
from django.contrib.auth.models import User
u = User.objects.get(username='testguy')
a = Affiliate.objects.get(affiliate_id = 'ACD023')
l = AffiliateLead(affiliate = a)
l.save()
m = Message(recipient=u, sender=u, subject='s', body='a', lead=l)
m.save()
However, the form view itself does not. It throws an IntegrityError when I try to save a Message that points to an AffiliateLead:
(1452, 'Cannot add or update a child row: a foreign key constraint fails (`app`.`messages_message`, CONSTRAINT `lead_id_refs_id_6bc546751c1f96` FOREIGN KEY (`lead_id`) REFERENCES `tracking_affiliatelead` (`id`))')
This is despite the fact that the view simply takes the form, creates and saves the AffiliateLead, then creates and (tries to) save the Message. In fact, when this error is thrown, I can go into MySQL and see the newly-created lead. It even throws this error in the view when I re-retrieve the lead from the DB immediately before saving:
af_lead = AffiliateLead.objects.get(id = af_lead.id)
msg.lead = af_lead
msg.save()
Finally, if I immediately refresh (re-submitting the form), it works. No IntegrityError. If I have Django print out the SQL it's doing, I can indeed see that it is INSERTing the AffiliateLead before it tries to INSERT the Message, and the Message INSERT is using the correct AffiliateLead ID. I'm really stumped at this point. I've even tried manual transaction handling to no avail.
I'm not exactly sure why it happened, but I did seem to find a solution. I'm using South to manage the DB; it created Messages as InnoDB and AffiliateLead as MyISAM. Changing the AffiliateLead table to InnoDB ended the IntegrityErrors. Hope this helps someone else.