Pyramid + SQLAlchemy + Zope App returns wrong results with raw SQL - mysql

I have a Pyramid 2.X + SQLAlchemy + Zope App created using the official CookieCutter.
There is a table called "schema_b.table_a" that initially has 0 records; an external REST API inserts rows into it.
In the view below, count(*) should be more than 0 after the API call, but it returns 0:
@view_config(route_name='home', renderer='myproject:templates/home.jinja2')
def my_view(request):
    # Call external REST API. This uses HTTP requests. The API inserts into schema_b.table_a.
    call_thirdparty_api()
    mark_changed(request.dbsession)
    sql = "SELECT count(*) FROM schema_b.table_a"
    total = request.dbsession.execute(sql).fetchone()
    print(total)  # Total is 0
    return {}
On the other hand, the following code returns the correct count(*):
@view_config(route_name='home', renderer='myproject:templates/home.jinja2')
def my_view(request):
    engine = create_engine(request.registry.settings.get("sqlalchemy.url"), poolclass=NullPool)
    connection = engine.connect()
    # Call external REST API. This uses HTTP requests. The API inserts into schema_b.table_a.
    call_thirdparty_api()
    sql = "SELECT count(*) FROM schema_b.table_a"
    total = connection.execute(sql).fetchone()
    print(total)  # Total is not 0
    connection.invalidate()
    engine.dispose()
    return {}
It seems that request.dbsession cannot see the data inserted by the external REST API, but it is not clear to me why, or how to correct it.

Pyramid and Zope provide transaction managers that extend transactions far beyond databases. In your example, I think a transaction was started in MySQL when the request was received on the server by the pyramid_tm package; its documentation states:
"At the beginning of a request a new transaction is started using the request.tm.begin() function."
https://docs.pylonsproject.org/projects/pyramid_tm/en/latest/index.html
Because MySQL supports consistent nonblocking reads, the transaction you join when calling request.dbsession.execute queries a snapshot of the database taken at the start of the transaction. When you create a separate engine and connection to execute the query, as in your second example, a new transaction is started and the expected result is returned.
https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html
This is very confusing in this situation. But I must admit it's impressive how well it seems to work.
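If the view needs to see rows that the external API commits mid-request, one workaround (not from the original answer; a minimal sketch assuming you can adjust the cookiecutter's engine setup) is to run the engine at READ COMMITTED, so InnoDB takes a fresh snapshot for each statement rather than reusing the one from the start of the transaction:
import sqlalchemy as sa

# Sketch: the URL is a placeholder; isolation_level is a documented
# create_engine() parameter for the MySQL dialects.
engine = sa.create_engine(
    "mysql+pymysql://user:pass@host/schema_b",  # hypothetical URL
    isolation_level="READ COMMITTED",
)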

Related

Stream a NOT NULL selection of a table?

I'm trying to select the primary key for all rows in a table based on if another column is NULL.
The following code does not do what I want (see the edit below), but this is what it would look like as a pure select(); the table is so large that it nearly fills up memory before returning any results:
s = tweets.select().where(tweets.c.coordinates != None)
result = engine.execute(s)
for row in result:
    print(row)
Because the table is so large, I found a streaming solution that works for the session.query() object:
def page_query(q):
    r = True
    offset = 0
    while r:
        r = False
        for elem in q.limit(1000).offset(offset):
            r = True
            yield elem
        offset += 1000
So I'm trying to structure the above select() as a query(), but when I do, it returns every row in the table, including ones with coordinates = 'null':
q = session.query(Tweet).filter(Tweet.coordinates.is_not(None))
for i in page_query(q):
    print(f' {i}')
If I instead do
q = session.query(Tweet).filter(Tweet.coordinates.is_not('null'))
for i in page_query(q):
    print(f' {i}')
I get an error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.SyntaxError) syntax error at or near "'null'"
LINE 3: WHERE milan_tweets.coordinates IS NOT 'null'
^
(Using != appears to give the same results as the built-in .is_not().)
So how can I make this selection?
EDIT: The code block at the top does NOT do what I expected originally, my mistake.
Rows are added to the database as Python Nones, and looking in DBeaver shows the values as "null".
You have correctly diagnosed the problem.
The query returns e.g. a million rows, and the psycopg2 driver drags all of those result rows over the network, buffering them locally, before returning even a single row up to your app. Why? Because the public API includes a detail where your app could ask "how many rows were in that result?", and the driver must retrieve all rows in order to learn that bit of trivia. If you promise not to ask the "how many?" question, you can stream results with this:
import sqlalchemy as sa
engine = sa.create_engine(uri).execution_options(stream_results=True)
Then rows will be delivered up to your app nearly as soon as they become available, rather than being buffered for a long time. This yields a significantly smaller memory footprint for your Python process, as the DB driver layer does not need to malloc() storage sufficient to store all million result rows.
https://docs.sqlalchemy.org/en/14/core/connections.html#streaming-with-a-fixed-buffer-via-yield-per
cf test_core_fetchmany_w_streaming
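Putting it together for this question, a minimal sketch (assuming SQLAlchemy 1.4+ and the tweets table object from the post; partitions() is the documented way to consume a streamed result in fixed-size batches):
import sqlalchemy as sa

# stream_results makes the driver use a server-side cursor, so rows
# arrive in batches instead of being buffered client-side all at once.
engine = sa.create_engine(uri).execution_options(stream_results=True)

stmt = sa.select(tweets).where(tweets.c.coordinates.is_not(None))
with engine.connect() as conn:
    result = conn.execute(stmt)
    # Each partition is a list of up to 1000 rows per network fetch.
    for partition in result.partitions(1000):
        for row in partition:
            print(row)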

What does 'synchronize_session=False' do exactly in update functions for SQLAlchemy? And what is the best value for it?

We have the CRUD functions in our API which is using FastAPI and SQLAlchemy.
For update functions we have the below code:
def update_user(
    user_id: uuid.UUID,
    db: Session,
    update_model: UserUpdateModel,
) -> bool:
    query = (
        db.query(User)
        .filter(
            User.user_id == user_id,
        )
        .update(update_model, synchronize_session=False)
    )
    try:
        db.commit()
    except IntegrityError as e:
        if isinstance(e.orig, PG2UniqueViolation):
            raise UniqueViolation from e
    return bool(query)
What exactly does the 'synchronize_session=False' do here?
What is the best value for it? False or Fetch...?
Is it critical if we don't use it?
By looking at the SQLAlchemy docs you can find what synchronize_session does and how to use it properly.
From the official doc:
With both the 1.x and 2.0 form of ORM-enabled updates and deletes, the following values for synchronize_session are supported:
False - don’t synchronize the session. This option is the most efficient and is reliable once the session is expired, which typically occurs after a commit(), or explicitly using expire_all(). Before the expiration, objects that were updated or deleted in the database may still remain in the session with stale values, which can lead to confusing results.
'fetch' - Retrieves the primary key identity of affected rows by either performing a SELECT before the UPDATE or DELETE, or by using RETURNING if the database supports it, so that in-memory objects which are affected by the operation can be refreshed with new values (updates) or expunged from the Session (deletes). Note that this synchronization strategy is not available if the given update() or delete() construct specifies columns for UpdateBase.returning() explicitly.
'evaluate' - Evaluate the WHERE criteria given in the UPDATE or DELETE statement in Python, to locate matching objects within the Session. This approach does not add any round trips and in the absence of RETURNING support is more efficient. For UPDATE or DELETE statements with complex criteria, the 'evaluate' strategy may not be able to evaluate the expression in Python and will raise an error. If this occurs, use the 'fetch' strategy for the operation instead.
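For example (a sketch reusing the question's names; the updated values dict is hypothetical), 'fetch' keeps objects already loaded in the session consistent with the bulk UPDATE, at the cost of an extra SELECT or RETURNING:
# synchronize_session="fetch" refreshes/expunges any User instances
# already present in the session to match the UPDATE's effect.
rows_updated = (
    db.query(User)
    .filter(User.user_id == user_id)
    .update({"name": "new name"}, synchronize_session="fetch")
)
db.commit()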

Does Statement.RETURN_GENERATED_KEYS generate any extra round trip to fetch the newly created identifier?

JDBC allows us to fetch the value of a primary key that is automatically generated by the database (e.g. IDENTITY, AUTO_INCREMENT) using the following syntax:
PreparedStatement ps = connection.prepareStatement(
    "INSERT INTO post (title) VALUES (?)",
    Statement.RETURN_GENERATED_KEYS
);
ps.setString(1, title);
ps.executeUpdate();

// Read the generated key(s) back from the statement.
ResultSet resultSet = ps.getGeneratedKeys();
while (resultSet.next()) {
    LOGGER.info("Generated identifier: {}", resultSet.getLong(1));
}
I'm interested in whether the Oracle, SQL Server, PostgreSQL, or MySQL driver uses a separate round trip to fetch the identifier, or whether there is a single round trip which executes the insert and fetches the ResultSet automatically.
It depends on the database and driver.
Although you didn't ask for it, I will answer for Firebird ;). In Firebird/Jaybird the retrieval itself doesn't require extra round trips, but using Statement.RETURN_GENERATED_KEYS or the integer array version will require three extra round trips (prepare, execute, fetch) to determine the columns to request (I still need to build a form of caching for it). Using the version with a String array will not require extra round trips (I would love to have RETURNING * like in PostgreSQL...).
In PostgreSQL with PgJDBC there is no extra round-trip to fetch generated keys.
It sends a Parse/Describe/Bind/Execute message series followed by a Sync, then reads the results including the returned result-set. There's only one client/server round-trip required because the protocol pipelines requests.
However, batches that could otherwise be streamed to the server may sometimes be broken up into smaller chunks or run one by one if generated keys are requested. To avoid this, use the String[] form, where you name the columns you want returned, and name only columns of fixed-width data types like integer. This only matters for batches, and it's due to a design problem in PgJDBC.
(I posted a patch to add batch pipelining support in libpq that doesn't have that limitation, it'll do one client/server round trip for arbitrary sized batches with arbitrary-sized results, including returning keys.)
MySQL receives the generated key(s) automatically in the OK packet of the protocol in response to executing a statement. There is no communication overhead when requesting generated keys.
In my opinion, even for such a trivial thing, a single approach working in all database systems will fail. The only pragmatic solution is (in analogy to Hibernate) to find the best working solution for each target RDBMS (and call it a dialect of your one-for-all solution :).
Here is the information for Oracle. I'm using a sequence to generate the key; the same behavior is observed for an IDENTITY column.
create table auto_pk (
    id  number,
    pad varchar2(100)
);
This works and uses only one round trip:
def stmt = con.prepareStatement("insert into auto_pk values(auto_pk_seq.nextval, 'XXX')",
        Statement.RETURN_GENERATED_KEYS)
def rowCount = stmt.executeUpdate()
def generatedKeys = stmt.getGeneratedKeys()
if (null != generatedKeys && generatedKeys.next()) {
    def id = generatedKeys.getString(1)
}
But unfortunately you get the ROWID as a result, not the generated key.
How is it implemented internally? You can see it if you activate a 10046 trace (BTW this is also the best way to see how many round trips were performed):
PARSING IN CURSOR
insert into auto_pk values(auto_pk_seq.nextval, 'XXX')
RETURNING ROWID INTO :1
END OF STMT
So you see the JDBC 3.0 standard is implemented, but you don't get the requested result. Under the cover, the RETURNING clause is used.
The right approach to get the generated key in Oracle is therefore:
def stmt = con.prepareStatement("insert into auto_pk values(auto_pk_seq.nextval, 'XXX') returning id into ?")
stmt.registerReturnParameter(1, Types.INTEGER);
def rowCount = stmt.executeUpdate()
def generatedKeys = stmt.getReturnResultSet()
if (null != generatedKeys && generatedKeys.next()) {
    def id = generatedKeys.getLong(1);
}
Note:
Oracle Release 12.1.0.2.0
To activate the 10046 trace use
con.createStatement().execute "alter session set events '10046 trace name context forever, level 12'"
con.createStatement().execute "ALTER SESSION SET tracefile_identifier = my_identifier"
Depending on frameworks or libraries to do things that are perfectly possible in plain SQL is bad design IMHO, especially when working against a defined DBMS. (Statement.RETURN_GENERATED_KEYS is relatively innocuous, although it apparently does raise a question for you; but where frameworks are built on separate entities and do all sorts of joins and filters in code, or have custom-built transaction isolation logic, things get inefficient and messy very quickly.)
Why not simply:
PreparedStatement ps = connection.prepareStatement(
    "INSERT INTO post (title) VALUES (?) RETURNING id");
Single trip, defined result.
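For comparison, the same single-round-trip approach from Python with psycopg2 (a sketch; the DSN is a placeholder and the post table mirrors the example above):
import psycopg2

conn = psycopg2.connect("dbname=test")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # INSERT and key retrieval happen in one statement via RETURNING.
    cur.execute("INSERT INTO post (title) VALUES (%s) RETURNING id", ("hello",))
    new_id = cur.fetchone()[0]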

Adding output converter to pyodbc connection in SQLAlchemy

using:
Python 2.7.3
SQLAlchemy 0.7.8
PyODBC 3.0.3
I have implemented my own Dialect for the EXASolution DB using PyODBC as the underlying db driver. I need to make use of PyODBC's output_converter function to translate DECIMAL(x, 0) columns to integers/longs.
The following code snippet does the trick:
pyodbc = self.dbapi
dbapi_con = connection.connection
dbapi_version = dbapi_con.getinfo(pyodbc.SQL_DRIVER_VER)
# Split the version string (e.g. "03.00.0003") before converting to ints.
(major, minor, patch) = [int(x) for x in dbapi_version.split('.')]
if major >= 3:
    dbapi_con.add_output_converter(pyodbc.SQL_DECIMAL, self.decimal2int)
I have placed this code snippet in the initialize(self, connection) method of
class EXADialect_pyodbc(PyODBCConnector, EXADialect):
Code gets called, and no exception is thrown, but this is a one time initialization. Later on, other connections are created. These connections are not passed through my initialization code.
Does anyone have a hint on how connection initialization works with SQLAlchemy, and where to place my code so that it gets called for every new connection created?
This is an old question, but something I hit recently, so an updated answer may help someone else along the way. In my case, I was trying to automatically downcase mssql UNIQUEIDENTIFIER columns (guids).
You can grab the raw connection (pyodbc) through the session or engine to do this:
engine = create_engine(connection_string)
make_session = sessionmaker(engine)
...
session = make_session()
session.connection().connection.add_output_converter(SQL_DECIMAL, decimal2int)
# or
connection = engine.connect().connection
connection.add_output_converter(SQL_DECIMAL, decimal2int)
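To address the original question (having the converter applied to every connection the pool creates, not just one), a sketch using SQLAlchemy's engine "connect" event; SQL_DECIMAL and decimal2int are the names from the snippets above:
import sqlalchemy as sa
from sqlalchemy import event

engine = sa.create_engine(connection_string)

@event.listens_for(engine, "connect")
def set_output_converters(dbapi_con, connection_record):
    # Runs once for every new raw pyodbc connection the pool opens.
    dbapi_con.add_output_converter(SQL_DECIMAL, decimal2int)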

MySQL / SQLite3

I stumbled upon the following:
def save_formset(self, request, form, formset, change):
    instances = formset.save(commit=False)
    bargain_id = 0
    total_price = Decimal(0)
    for instance in instances:
        if isinstance(instance, BargainProduct):
            total_price += instance.quantity * instance.product.price
            bargain_id = instance.id
        instance.save()
    updateTotal = Bargain.objects.get(id=bargain_id)
    updateTotal.total_price = total_price - updateTotal.discount_price
    updateTotal.save()
This code is working for me on my local MySQL setup; however, on my live test environment running on SQLite3* I get the "Bargain matching query does not exist." error.
I figure this is due to a different order of saving the instances on SQLite, however it seems they do (and should) act the same?
*I cannot recompile MySQL with Python support on my live server at the moment, so that's a no-go.
Looking at the code, if you have no instances coming out of the formset.save(), bargain_id will be 0 when it gets down to the Bargain.objects.get(id=bargain_id) line, since it will skip over the for loop. If it is 0, I'm guessing it will fail with the error you are seeing.
You might want to check that the values are getting stored correctly in the database during your formset.save() and that it is returning something back to instances.
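A sketch of that guard, using the question's names (BargainProduct, Bargain); skipping the lookup when no instance matched avoids querying id=0:
from decimal import Decimal

def save_formset(self, request, form, formset, change):
    instances = formset.save(commit=False)
    total_price = Decimal(0)
    bargain_id = None
    for instance in instances:
        if isinstance(instance, BargainProduct):
            total_price += instance.quantity * instance.product.price
            bargain_id = instance.id
        instance.save()
    if bargain_id is None:
        return  # nothing matched; Bargain.objects.get(id=0) would raise DoesNotExist
    update_total = Bargain.objects.get(id=bargain_id)
    update_total.total_price = total_price - update_total.discount_price
    update_total.save()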
This line is giving the error:
updateTotal = Bargain.objects.get(id=bargain_id)
which most probably is because of this line:
instances = formset.save(commit=False)
Did you define a save() method for the formset? Because it doesn't seem to have one built in. You save it by accessing what formset.cleaned_data returns, as the Django docs say.
Edit: I correct myself, it actually has a save() method based on this page.
I've been looking at this same issue. It is saving the data to the database, and the formset is filled. The problem is that the save on instances = formset.save(commit=False) doesn't return a value. When I look at the built-in save method, it should give back the saved data.
Another weird thing about this is that it seems to work on my friend's MySQL backend, but not on his SQLite3 backend. Besides that, it doesn't work on my MySQL backend.
The local loop returns these printouts (on MySQL); on SQLite3 it fails with a "does not exist" error on the query:
('Formset: ', <django.forms.formsets.BargainProductFormFormSet object at 0x101fe3790>)
('Instances: ', [<BargainProduct: BargainProduct object>])
[18/Apr/2011 14:46:20] "POST /admin/shop/deal/add/ HTTP/1.1" 302 0