How to differentiate causes of SQLAlchemy's IntegrityError? - sqlalchemy

SQLAlchemy appears to throw just a general IntegrityError when there is a data integrity problem with a transaction. Of course, the exact query and error message are contained in the exception, which is sufficient for a human debugging the program. However, when writing error-handling code for the exception, there doesn't seem to be a good way, as far as I can tell, to check which constraint on which table was responsible for the error. Also, the exception is raised by the session.commit() line rather than the line actually responsible for producing the error, so I can't differentiate using multiple try/except blocks either.
Is there a way, short of programmatically parsing the error message and/or query, to distinguish, for example, a duplicate primary key error from a foreign key error or a failed CHECK constraint? Or even just a way to tell which column of which table is violating data integrity? Or a way to raise the exception immediately on the line that caused the error rather than waiting for the transaction to be committed?
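For context on the last point: at the raw DB-API level the error does surface on the offending statement itself, not at commit time; the ORM only appears to delay it because it batches pending changes until flush/commit. A minimal sketch using the stdlib sqlite3 module (table and column names are made up for illustration) also shows why differentiation is hard: the exception carries only a message string.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT UNIQUE)')
conn.execute("INSERT INTO t (v) VALUES ('a')")

msg = None
try:
    # The driver raises on execute() itself; the ORM only appears to delay
    # the error because it batches pending changes until flush()/commit().
    conn.execute("INSERT INTO t (v) VALUES ('a')")
except sqlite3.IntegrityError as ex:
    # All we get is a message string: no structured constraint,
    # table, or column attributes on the exception object.
    msg = str(ex)

print(msg)  # UNIQUE constraint failed: t.v
```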

The IntegrityError instance has orig and statement attributes which can be inspected to obtain the error message and the failing SQL statement, respectively.
Given this model:
class Foo(Base):
    __tablename__ = 'foo20201209'
    id = sa.Column(sa.Integer, primary_key=True)
    bar = sa.Column(sa.String(2), unique=True)
    baz = sa.Column(sa.Integer, sa.CheckConstraint('baz >= 0'), default=0)
this code:
conn_strings = ['postgresql+psycopg2:///test',
                'mysql+mysqlconnector:///test',
                'sqlite://']

for cs in conn_strings:
    engine = sa.create_engine(cs)
    Base.metadata.drop_all(bind=engine)
    Base.metadata.create_all(bind=engine)
    session = orm.Session(bind=engine)
    for kwds in [{'bar': 'a'}, {'bar': 'a'}, {'bar': 'b', 'baz': -11}]:
        session.add(Foo(**kwds))
        try:
            session.commit()
        except sa.exc.IntegrityError as ex:
            print(ex.orig)
            print(ex.statement)
            print()
            session.rollback()
    session.close()
    engine.dispose()
will produce this output:
duplicate key value violates unique constraint "foo20201209_bar_key"
DETAIL: Key (bar)=(a) already exists.
INSERT INTO foo20201209 (bar, baz) VALUES (%(bar)s, %(baz)s) RETURNING foo20201209.id
new row for relation "foo20201209" violates check constraint "foo20201209_baz_check"
DETAIL: Failing row contains (3, b, -11).
INSERT INTO foo20201209 (bar, baz) VALUES (%(bar)s, %(baz)s) RETURNING foo20201209.id
1062 (23000): Duplicate entry 'a' for key 'bar'
INSERT INTO foo20201209 (bar, baz) VALUES (%(bar)s, %(baz)s)
4025 (23000): CONSTRAINT `foo20201209.baz` failed for `test`.`foo20201209`
INSERT INTO foo20201209 (bar, baz) VALUES (%(bar)s, %(baz)s)
UNIQUE constraint failed: foo20201209.bar
INSERT INTO foo20201209 (bar, baz) VALUES (?, ?)
CHECK constraint failed: foo20201209
INSERT INTO foo20201209 (bar, baz) VALUES (?, ?)

As DMJ commented below my original answer, different database engines emit different error messages when an integrity error occurs, and SQLAlchemy does not attempt to present these messages in a consistent way, so the short answer to the question is: you can't get useful information from the exception without parsing the error messages.
That said, parsing the error messages may not be so difficult if the constraints are named in a consistent fashion and those names appear in the error messages. Here SQLAlchemy provides some assistance: we can create a MetaData object with a naming convention for constraints defined (docs). With this convention in place, any constraints we create will follow it; we can then match constraint names in error messages and use the matches to look up the table and constraint objects in the metadata.
Here's an example of how you might parse error messages using the conventions. I haven't covered all possible constraint types, nor handled Sqlite's unique key violation message, which omits the constraint name. These are left as exercises for the reader ;-)
import re
import sqlalchemy as sa

convention = {
    "ix": 'ix_%(column_0_label)s',
    "uq": "uq_%(table_name)s_%(column_0_name)s",
    "ck": "ck_%(table_name)s_%(constraint_name)s",
    "fk": "fk_%(table_name)s_%(column_0_name)s_%(referred_table_name)s",
    "pk": "pk_%(table_name)s"
}

metadata = sa.MetaData(naming_convention=convention)

tbl = sa.Table(
    't65189213',
    metadata,
    sa.Column('id', sa.Integer, primary_key=True),
    sa.Column('foo', sa.String(2), unique=True),
    sa.Column('bar', sa.Integer, default=0),
    sa.CheckConstraint('bar >= 0', name='positive_bar'),
)

# Basic pattern to match constraint names (not exhaustive).
pattern = r'(?P<type_key>uq|ck)_(?P<table>[a-z][a-z0-9]+)_[a-z_]+'
regex = re.compile(pattern)

def parse_exception(ex):
    # Returns an informative message, or the original error message.
    # We could return the table, constraint object etc. instead.
    types = {'ck': 'Check constraint', 'uq': 'Unique key'}
    m = regex.search(str(ex.orig))
    if m:
        type_ = types[m.groupdict()['type_key']]
        table_name = m.groupdict()['table']
        constraint_name = m.group(0)
        table = metadata.tables[table_name]
        constraint = next(c for c in table.constraints if c.name == constraint_name)
        columns = ','.join(constraint.columns.keys())
        return f'{type_} {constraint_name} has been violated on table {table_name!r} columns: {columns}'
    return f'{ex.orig}'

conn_strings = ['postgresql+psycopg2:///test',
                'mysql+mysqlconnector://root:root@localhost/test',
                'sqlite://']

for cs in conn_strings:
    engine = sa.create_engine(cs, future=True)
    tbl.drop(engine, checkfirst=True)
    tbl.create(engine)
    with engine.connect() as conn:
        print(engine.dialect.name)
        for kwds in [{'foo': 'a'}, {'foo': 'a'}, {'foo': 'b', 'bar': -11}]:
            try:
                conn.execute(tbl.insert(), kwds)
                conn.commit()
            except sa.exc.IntegrityError as ex:
                print(parse_exception(ex))
                conn.rollback()
    print('-' * 10)
    engine.dispose()
Output:
postgresql
Unique key uq_t65189213_foo has been violated on table 't65189213' columns: foo
Check constraint ck_t65189213_positive_bar has been violated on table 't65189213' columns:
----------
mysql
Unique key uq_t65189213_foo has been violated on table 't65189213' columns: foo
Check constraint ck_t65189213_positive_bar has been violated on table 't65189213' columns:
----------
sqlite
UNIQUE constraint failed: t65189213.foo
Check constraint ck_t65189213_positive_bar has been violated on table 't65189213' columns:
----------

I ended up using session.flush() to trigger the exceptions earlier. I call it once before the line(s) in question (so I know with 100% certainty that the exception wasn't triggered by previous lines) and again inside a try/except block to see if the line(s) in question caused an error.
I admit I'm not completely happy with this solution, but I haven't been able to find anything else. I'd still love to hear if there is a better solution, ideally one that tells me exactly which constraint of which table caused the error. But this is a workaround that might help someone.
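A minimal sketch of that workaround (the Widget model and an in-memory SQLite database are hypothetical, chosen just for illustration; assumes SQLAlchemy 1.4+):

```python
import sqlalchemy as sa
from sqlalchemy import orm

Base = orm.declarative_base()

class Widget(Base):
    __tablename__ = 'widget'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String, unique=True)

engine = sa.create_engine('sqlite://')
Base.metadata.create_all(engine)
session = orm.Session(engine)

session.add(Widget(name='a'))
session.flush()  # flush now: any error raised here belongs to earlier lines

session.add(Widget(name='a'))  # the suspect line (duplicate name)
caught = None
try:
    session.flush()  # IntegrityError is raised here, not at commit()
except sa.exc.IntegrityError as ex:
    caught = ex
    session.rollback()

print('suspect line failed:', caught is not None)
```

The second flush pins the error to the statements added since the first one, which is as close to "raise on the offending line" as the ORM's unit-of-work model allows.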

I have faced the same trouble, and for constraints it seems the better solution is to give them names and then parse the exception messages. However, as of SQLAlchemy==1.4.28, a UNIQUE constraint's name still does not appear anywhere in the exception thrown.
Consider the next example:
class M(Model):
    a = Column(String)
    i = Column(Integer)

UniqueConstraint(M.i, name="my unique")
CheckConstraint(0 <= M.i, name="my check")

def test_test():
    db = create_test_database(Model)
    try:
        with db.session() as s:
            # Here we break our UNIQUE constraint
            try:
                with s.begin_nested():
                    s.add_all(
                        [
                            M(a="Alice", i=1),
                            M(a="Bob", i=1),
                        ]
                    )
            except IntegrityError as err:
                rich.inspect(err)
            # Here we break our CHECK constraint
            s.add(M(a="Alice", i=-1))
    except IntegrityError as err:
        rich.inspect(err)
And here are the results:
┌────────────────────── <class 'sqlalchemy.exc.IntegrityError'> ───────────────────────┐
│ Wraps a DB-API IntegrityError. │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────────────┐ │
│ │ IntegrityError('(sqlite3.IntegrityError) UNIQUE constraint failed: M.i') │ │
│ └──────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ args = ('(sqlite3.IntegrityError) UNIQUE constraint failed: M.i',) │
│ code = 'gkpj' │
│ connection_invalidated = False │
│ detail = [] │
│ hide_parameters = False │
│ ismulti = False │
│ orig = IntegrityError('UNIQUE constraint failed: M.i') │
│ params = ('Bob', 1) │
│ statement = 'INSERT INTO "M" (a, i) VALUES (?, ?)' │
└──────────────────────────────────────────────────────────────────────────────────────┘
┌──────────────────────── <class 'sqlalchemy.exc.IntegrityError'> ─────────────────────────┐
│ Wraps a DB-API IntegrityError. │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────────────────┐ │
│ │ IntegrityError('(sqlite3.IntegrityError) CHECK constraint failed: my check') │ │
│ └──────────────────────────────────────────────────────────────────────────────────────┘ │
│ │
│ args = ('(sqlite3.IntegrityError) CHECK constraint failed: my check',) │
│ code = 'gkpj' │
│ connection_invalidated = False │
│ detail = [] │
│ hide_parameters = False │
│ ismulti = False │
│ orig = IntegrityError('CHECK constraint failed: my check') │
│ params = ('Alice', -1) │
│ statement = 'INSERT INTO "M" (a, i) VALUES (?, ?)' │
└──────────────────────────────────────────────────────────────────────────────────────────┘
So it looks like the CHECK constraint name will always appear somewhere in the exception's string, and you can write code around this. It's also clear why the name would be redundant information for a UNIQUE constraint: the message already contains the word UNIQUE and the field's name (M.i in my example). I don't think these string formats will ever change, but it would be interesting to ask the maintainers.
WARNING:
The problem is that my code targets SQLite, while yours may target another database, and those messages will differ because they come from the underlying DB engine, not from SQLAlchemy itself. So you have to take care to abstract your code over those strings.
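Because each backend words these messages differently, any such parser ends up dialect-specific. A small sketch, reusing the sample messages shown in the earlier answer's output (the patterns are illustrative, not exhaustive):

```python
import re

# Sample driver messages, taken from the output shown earlier; each
# backend words a unique-constraint violation differently.
MESSAGES = {
    'postgresql': 'duplicate key value violates unique constraint "foo20201209_bar_key"',
    'mysql': "1062 (23000): Duplicate entry 'a' for key 'bar'",
    'sqlite': 'UNIQUE constraint failed: foo20201209.bar',
}

# One pattern per dialect; illustrative only, not exhaustive.
PATTERNS = {
    'postgresql': re.compile(r'violates unique constraint "(?P<name>[^"]+)"'),
    'mysql': re.compile(r"Duplicate entry .* for key '(?P<name>[^']+)'"),
    'sqlite': re.compile(r'UNIQUE constraint failed: (?P<name>[\w.]+)'),
}

def unique_violation_target(dialect, message):
    """Return the constraint/key name cited in a unique-violation message,
    or None if the message doesn't match the dialect's pattern."""
    m = PATTERNS[dialect].search(message)
    return m.group('name') if m else None

for dialect, msg in MESSAGES.items():
    print(dialect, '->', unique_violation_target(dialect, msg))
```

Note that the three dialects don't even agree on what the "name" is: PostgreSQL reports the constraint name, MySQL the key name, and SQLite the table.column pair, so the abstraction has to normalize that too.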

Related

Force check constraints to be evaluated before computed columns

I want to have a JSON column in a table. I want to have (persisted) computed column that extracts useful information from the JSON data.
I want to have a "strict" JSON path but I also want to check that the path exists in the JSON so that the error message is specific to the table and isn't just about the illegal JSON path.
CREATE TABLE DataWithJSON (
    DataID BIGINT,
    DataJSON NVARCHAR(MAX) CONSTRAINT CK_DataWithJSON_DataJSON CHECK (
        ISJSON(DataJSON) = 1
        AND JSON_VALUE(DataJSON, 'lax $.Data.ID') IS NOT NULL
    ),
    DataJSONID AS JSON_VALUE(DataJSON, 'strict $.Data.ID') PERSISTED
);

INSERT INTO DataWithJSON (DataID, DataJSON)
VALUES (666, N'{"Data":{"Name":"Tydýt"}}');
INSERT INTO DataWithJSON (DataID, DataJSON)
VALUES (666, N'{"Data":{"Name":"Tydýt"}}');
This code returns (on my machine) a somewhat mysterious error message:
Msg 13608, Level 16, State 2, Line xx Property cannot be found on the specified JSON path.
I would like to see a more specific message:
Msg 547, Level 16, State 0, Line yy The INSERT statement conflicted with the CHECK constraint "CK_DataWithJSON_DataJSON". The conflict occurred in database "DB", table "schema.DataWithJSON", column 'DataJSON'.
Is it possible to achieve this with table constraints alone, or am I out of luck and have to check the JSON in a stored procedure/application before inserting into the table?
One solution would be to use a "lax" path in the computed column as well; hopefully that is not the only option. I will fall back to it if no other solution can be found.
You can't control the order in which check constraints and computed columns are evaluated, but you can use a CASE expression in the computed-column definition so that the JSON_VALUE(... 'strict' ...) part is only evaluated when the check constraint would pass.
CREATE TABLE DataWithJSON (
    DataID BIGINT,
    DataJSON NVARCHAR(MAX) CONSTRAINT CK_DataWithJSON_DataJSON CHECK (
        ISJSON(DataJSON) = 1 AND JSON_VALUE(DataJSON, 'lax $.Data.ID') IS NOT NULL
    ),
    DataJSONID AS CASE WHEN ISJSON(DataJSON) = 1 AND JSON_VALUE(DataJSON, 'lax $.Data.ID') IS NOT NULL
                       THEN JSON_VALUE(DataJSON, 'strict $.Data.ID') END PERSISTED
);
Msg 547, Level 16, State 0, Line 9 The INSERT statement conflicted with the CHECK constraint "CK_DataWithJSON_DataJSON". The conflict occurred in database "Foo", table "dbo.DataWithJSON", column 'DataJSON'. The statement has been terminated.

How to form the Merge query when there are multiple key columns?

On a Snowflake database, I am trying to run a merge on a table, PK_TABLE_TEST. The table's DDL is below:
CREATE OR REPLACE TABLE "LOAD".pk_table_test (
    RESORT STRING NOT NULL,
    STAYDATE DATE NOT NULL,
    RATE_CODE STRING NOT NULL,
    RNS NUMBER(38, 0),
    GST NUMBER(38, 0),
    REVENUE FLOAT,
    REPORT_DATE DATE NOT NULL,
    SYS_INS_DATE TIMESTAMP,
    PRIMARY KEY (RESORT),
    UNIQUE (RESORT, STAYDATE, RATE_CODE, REPORT_DATE)
);
I have the same table on my staging database with the name: pk_table_test_stg.
In my stored procedure, I formed a merge query by getting all the keys from INFORMATION_SCHEMA. Below is the merge query:
MERGE INTO LOAD.PK_TABLE_TEST target
USING LOAD.PK_TABLE_TEST_STG stg
    ON target.RESORT = stg.RESORT
    AND target.STAYDATE = stg.STAYDATE
    AND target.RATE_CODE = stg.RATE_CODE
    AND target.REPORT_DATE = stg.REPORT_DATE
WHEN MATCHED THEN UPDATE SET
    target.RESORT = stg.RESORT,
    target.STAYDATE = stg.STAYDATE,
    target.RATE_CODE = stg.RATE_CODE,
    target.RNS = stg.RNS,
    target.GST = stg.GST,
    target.REVENUE = stg.REVENUE,
    target.REPORT_DATE = stg.REPORT_DATE,
    target.SYS_INS_DATE = '2020-10-10 4:35:24';
WHEN NOT MATCHED THEN INSERT
    (RESORT, STAYDATE, RATE_CODE, RNS, GST, REVENUE, REPORT_DATE, SYS_INS_DATE)
VALUES
    (stg.RESORT, stg.STAYDATE, stg.RATE_CODE, stg.RNS, stg.GST, stg.REVENUE, stg.REPORT_DATE, stg.SYS_INS_DATE);
But when I run the query, it says unexpected WHEN.
SQL Error [1003] [42000]: SQL compilation error:
syntax error line 2 at position 272 unexpected 'WHEN'.
Is there a syntax error in the query I formed? Is this the right syntax when there are multiple columns in the ON condition?
Could anyone let me know how I can fix the issue? Any help is appreciated.
You have a stray semicolon in the middle of your query:
target.SYS_INS_DATE = '2020-10-10 4:35:24'; WHEN NOT MATCHED THEN INSERT
Try deleting it :-)
More info about the syntax (which looks correct) can be found here: https://docs.snowflake.com/en/sql-reference/sql/merge.html

postgres force json datatype

When working with the JSON datatype, is there a way to ensure the input JSON has certain elements? I don't mean a primary key; I want any JSON that gets inserted to have at least the id and name elements. It can have more, but at the minimum id and name must be there.
Thanks.
This function checks what you want:
create or replace function json_has_id_and_name(val json)
returns boolean language sql as $$
    select coalesce(
        (
            select array['id', 'name'] <@ array_agg(key)
            from json_object_keys(val) key
        ),
        false)
$$;
select json_has_id_and_name('{"id":1, "name":"abc"}'), json_has_id_and_name('{"id":1}');
 json_has_id_and_name | json_has_id_and_name
----------------------+----------------------
 t                    | f
(1 row)
You can use it in a check constraint, e.g.:
create table my_table (
    id int primary key,
    jdata json check (json_has_id_and_name(jdata))
);
insert into my_table values (1, '{"id":1}');
ERROR: new row for relation "my_table" violates check constraint "my_table_jdata_check"
DETAIL: Failing row contains (1, {"id":1}).
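If you also want to fail fast before ever hitting the database, the same guard can be applied client-side. A rough Python equivalent of the SQL function above (purely illustrative; the required key set is the same 'id'/'name' pair from the question):

```python
import json

REQUIRED_KEYS = {'id', 'name'}

def json_has_id_and_name(text):
    """Mirror of the SQL check: True only for a JSON object that
    contains at least the 'id' and 'name' keys."""
    try:
        obj = json.loads(text)
    except ValueError:
        # Not valid JSON at all.
        return False
    # Must be an object (not an array/scalar) and hold both keys.
    return isinstance(obj, dict) and REQUIRED_KEYS <= obj.keys()

print(json_has_id_and_name('{"id": 1, "name": "abc"}'))  # True
print(json_has_id_and_name('{"id": 1}'))                 # False
```

The database-side check constraint should stay in place regardless; the client-side test only gives nicer errors earlier.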

slick: MySQL: auto-increment on secondary column in multiple-column index

I have found an answer here on how to specify an AUTO_INCREMENT on a secondary column in a multiple-column index in MySQL, e.g.
CREATE TABLE foo (
    id INTEGER NOT NULL AUTO_INCREMENT,
    grp INTEGER NOT NULL,
    name VARCHAR(64),
    PRIMARY KEY (id, grp)
) ENGINE = MyISAM;
How would I extend a slick.driver.MySQL.api.Table so that such a table is generated, if it is possible at all? The difficulties I currently have: (1) I don't know how to create a composite primary key in Slick within the main create statement, and (2) I don't know how to specify the MyISAM engine.
Update: Following @ulas' advice, I used slick.codegen to generate the Slick data model from the (already created) SQL table. However, the data model cannot then be used to recreate the table: it generates two statements instead of one, and neither references MyISAM. I have filed an issue about this in the Slick GitHub repo.
For now this leaves me with @RickJames' advice, which I would rather follow anyway since it doesn't rely on MyISAM, a non-default engine in current versions of MySQL.
So my question can now be collapsed to, how would I execute the following using slick?
BEGIN;
SELECT @id := IFNULL(MAX(id), -1) + 1 FROM foo WHERE grp = 1 FOR UPDATE;
INSERT INTO foo VALUES (@id, 1, 'bar');
COMMIT;
I have no idea how to do it using the higher-level abstraction, so I tried following the Plain SQL Queries section of the Slick manual. My attempt looked something like:
val statements = DBIO.seq(
  sqlu"SELECT @id := IFNULL(MAX(id), -1) + 1 FROM foo WHERE grp = 1 FOR UPDATE",
  sqlu"INSERT INTO foo VALUES (@id, 1, 'bar')"
)
db.run(statements.transactionally)
But I got the error:
Exception in thread "main" slick.SlickException: Update statements should not return a ResultSet
Help appreciated.

Rails ActiveRecord Mysql2::Error: Unknown column 'objectname.'

I have an update to an existing MySQL table that is failing under Rails. Here's the relevant controller code:
on = ObjectName.find_by_object_id(params[:id])
if (on) # edit existing
  if on.update_attributes(params[:param_type] => params[:value])
    respond_to do |format|
    ...
    end
The ObjectName model class has three attributes (object_id, other_id, and prop1). When the update occurs, the generated SQL comes out as
UPDATE `objectname` SET `other_id` = 245 WHERE `objectname`.`` IS NULL
The SET portion of the generated SQL is correct. Why is the WHERE clause being set to .`` IS NULL ?
I ran into the same error when working with a table that had no primary key defined. There was a unique key set up on the field, but no PK. Setting the PK in the model fixed it for me:
self.primary_key = :object_id