Django connection pooling and time fields - mysql

Has anyone got connection pooling working with Django, SQLAlchemy, and MySQL?
I followed this tutorial (http://node.to/wordpress/2008/09/30/another-database-connection-pool-solution-for-django-mysql/), which worked great, but now whenever I bring back a time field it comes through as a timedelta, because the Django-specific conversions are no longer being applied.
Conversion code from django/db/backends/mysql/base.py:
django_conversions = conversions.copy()
django_conversions.update({
    FIELD_TYPE.TIME: util.typecast_time,
    FIELD_TYPE.DECIMAL: util.typecast_decimal,
    FIELD_TYPE.NEWDECIMAL: util.typecast_decimal,
})
Connection code from article:
if settings.DATABASE_HOST.startswith('/'):
    self.connection = Database.connect(port=kwargs['port'],
                                       unix_socket=kwargs['unix_socket'],
                                       user=kwargs['user'],
                                       db=kwargs['db'],
                                       passwd=kwargs['passwd'],
                                       use_unicode=kwargs['use_unicode'],
                                       charset='utf8')
else:
    self.connection = Database.connect(host=kwargs['host'],
                                       port=kwargs['port'],
                                       user=kwargs['user'],
                                       db=kwargs['db'],
                                       passwd=kwargs['passwd'],
                                       use_unicode=kwargs['use_unicode'],
                                       charset='utf8')
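One way to get the Django-specific typecasts back is to hand the backend's converter map to the pooled connection. The snippet below is only a sketch of that idea, not code from the article: it assumes django_conversions can be imported from django.db.backends.mysql.base (it is a module-level name there) and that MySQLdb is the driver, whose connect() accepts a conv argument.
# Sketch only -- adapt to the article's DatabaseWrapper. The key change is
# passing conv=django_conversions so TIME and DECIMAL columns are typecast
# the same way the stock Django MySQL backend does.
from django.db.backends.mysql.base import django_conversions

connect_kwargs = dict(user=kwargs['user'],
                      db=kwargs['db'],
                      passwd=kwargs['passwd'],
                      use_unicode=kwargs['use_unicode'],
                      charset='utf8',
                      conv=django_conversions)
if settings.DATABASE_HOST.startswith('/'):
    connect_kwargs.update(port=kwargs['port'], unix_socket=kwargs['unix_socket'])
else:
    connect_kwargs.update(host=kwargs['host'], port=kwargs['port'])
self.connection = Database.connect(**connect_kwargs)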

In Django trunk, edit django/db/__init__.py and comment out the line:
signals.request_finished.connect(close_connection)
This signal handler causes it to disconnect from the database after every request. I don't know what all of the side-effects of doing this will be, but it doesn't make any sense to start a new connection after every request; it destroys performance, as you've noticed.
Another necessary change is in django/middleware/transaction.py; remove the two transaction.is_dirty() tests and always call commit() or rollback(). Otherwise, it won't commit a transaction if it only read from the database, which will leave locks open that should be closed.
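For reference, here is a rough sketch of what that middleware change might look like. This is an assumption about the trunk code of the time rather than an actual diff; the real TransactionMiddleware may differ slightly, but the point is that the is_dirty() guards are removed so a COMMIT or ROLLBACK is always issued.
# django/middleware/transaction.py -- sketch only; the real trunk source may differ.
from django.db import transaction

class TransactionMiddleware(object):
    def process_request(self, request):
        transaction.enter_transaction_management()
        transaction.managed(True)

    def process_exception(self, request, exception):
        if transaction.is_managed():
            transaction.rollback()  # always roll back, even if nothing looks dirty
            transaction.leave_transaction_management()

    def process_response(self, request, response):
        if transaction.is_managed():
            transaction.commit()  # always commit, even for read-only requests
            transaction.leave_transaction_management()
        return response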

Related

How do I get Rails 4.x streaming to work with MySQL when testing?

I created a new Rails 4.2.1 test project to try out the new streaming feature (the 'Live' one which I read about here). The project is set up to use MySQL for the database (I also tried SQLite but couldn't reproduce the issue with it). The project is simple, consisting only of: 1) a model Test with 2 attributes (both strings), 2) a simple route resources :tests, and 3) a simple controller tests_controller with one action index. The model and controller were generated by the standard Rails generators, and only the controller was modified, as follows:
class TestsController < ApplicationController
  include ActionController::Live

  def index
    response.headers['Content-Type'] = 'application/json'
    response.stream.write('{"count": 5, "tests": [')
    Test.find_each do |test|
      response.stream.write(test.to_json)
      response.stream.write(',')
    end
    response.stream.write(']}')
    response.stream.close
  end
end
When I run rails s and test by hand everything seems fine. But when I added a test (shown below) I get a strange error:
1) Error:
TestsControllerTest#test_index:
ActiveRecord::StatementInvalid: Mysql2::Error: This connection is in use by: #<Thread:0x007f862a4a7e48@/Users/xxx/.rvm/gems/ruby-2.2.2/gems/actionpack-4.2.1/lib/action_controller/metal/live.rb:269 sleep>: ROLLBACK
The test is:
require 'test_helper'

class TestsControllerTest < ActionController::TestCase
  test "index" do
    @request.headers['Accept'] = 'application/json'
    get :index
    assert_response :success
  end
end
Note that the error is intermittent, coming up only about half the time. Also, even though testing by hand doesn't cause any errors, I'm worried that errors will occur when multiple clients hit the API at the same time. Any suggestions as to what's going on here would be much appreciated.
Pretty old, but you need to actually check out a new database connection, since ActionController::Live executes the action in a new thread:
The final caveat is that your actions are executed in a separate thread than the main thread. Make sure your actions are thread safe, and this shouldn't be a problem (don't share state across threads, etc).
https://github.com/rails/rails/blob/861b70e92f4a1fc0e465ffcf2ee62680519c8f6f/actionpack/lib/action_controller/metal/live.rb
You can even use an around_filter/around_action for this.

pymysql lost connection after a long period

I'm using pymysql to connect to MySQL. If I leave the program running for a long period with no activity on the application (for example, leaving the office at night and coming back the next morning), the next database commit gives this error:
File "/usr/local/lib/python3.3/site-packages/pymysql/cursors.py", line 117, in execute
self.errorhandler(self, exc, value)
File "/usr/local/lib/python3.3/site-packages/pymysql/connections.py", line 189, in defaulterrorhandler
raise errorclass(errorvalue)
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
Restarting the web server (Tornado) fixes it. Why does leaving the application idle for a long time cause this error?
The wait_timeout exists for a reason: long-time idle connections are wasteful of limited server resources. You are correct: increasing it is not the right approach.
Fortunately, this is Python, which has a robust exception mechanism. I'm not familiar with pymysql, but presumably you've got an "open_connection" method somewhere, which allows something like:
try:
    cursor.do_something()
except pymysql.err.OperationalError as e:
    if e.args[0] == 2013:  # Lost connection to server
        pass               # redo open_connection and do_something here
    else:
        raise
Since you didn't post any calling code, I can't structure this example to match your application. There are a couple of things worth noting about the except clause: first, it should always be as narrow as possible; in this case there are (presumably) many OperationalErrors and you only know how to deal with 'Lost connection'.
Second, if it isn't a lost-connection exception, you should re-raise it so it doesn't get swallowed. Unless you know how to handle the other OperationalErrors, it will get passed up the stack and result in an informative error message, which is a reasonable thing to do since your cursor is probably useless by then anyway.
Restarting the web server only fixes the lost connection as an accidental side effect of reinitializing everything; handling the exception within the code is a much gentler way of accomplishing your goal.
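If the connection object is long-lived (as it typically is in a Tornado process), another option is to ping it before use and let pymysql reconnect transparently. The snippet below is only a sketch: the connection parameters and the do_work callable are hypothetical placeholders, but Connection.ping(reconnect=True) is a real pymysql method.
import pymysql

# Hypothetical long-lived connection, opened once at application startup.
conn = pymysql.connect(host='localhost', user='app', password='secret', db='appdb')

def commit_work(do_work):
    # Revive the connection if MySQL closed it after wait_timeout.
    conn.ping(reconnect=True)
    with conn.cursor() as cursor:
        do_work(cursor)
    conn.commit()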

python: sqlalchemy - how do I ensure connection not stale using new event system

I am using the sqlalchemy package in python. I have an operation that takes some time to execute after I perform an autoload on an existing table. This causes the following error when I attempt to use the connection:
sqlalchemy.exc.OperationalError: (OperationalError) (2006, 'MySQL server has gone away')
I have a simple utility function that performs an insert many:
def insert_data(data_2_insert, table_name):
    engine = create_engine('mysql://blah:blah123@localhost/dbname')
    # MetaData is a Table catalog.
    metadata = MetaData()
    mytable = Table(table_name, metadata, autoload=True, autoload_with=engine)
    for c in mytable.c:
        print c
    column_names = tuple(c.name for c in mytable.c)
    final_data = [dict(zip(column_names, x)) for x in data_2_insert]
    ins = mytable.insert()
    conn = engine.connect()
    conn.execute(ins, final_data)
    conn.close()
It is the following line that takes a long time to execute, since data_2_insert has 677,161 rows:
final_data = [dict(zip(column_names, x)) for x in data_2_insert]
I came across this question, which refers to a similar problem. However, I am not sure how to implement the connection management suggested by the accepted answer, because robots.jpg pointed this out in a comment:
Note for SQLAlchemy 0.7 - PoolListener is deprecated, but the same solution can be implemented using the new event system.
If someone can please show me a couple of pointers on how I could go about integrating the suggestions into the way I use sqlalchemy I would be very appreciative. Thank you.
I think you are looking for something like this:
from sqlalchemy import exc, event
from sqlalchemy.pool import Pool

@event.listens_for(Pool, "checkout")
def check_connection(dbapi_con, con_record, con_proxy):
    '''Listener for Pool checkout events that pings every connection before using.
    Implements pessimistic disconnect handling strategy. See also:
    http://docs.sqlalchemy.org/en/rel_0_8/core/pooling.html#disconnect-handling-pessimistic'''
    cursor = dbapi_con.cursor()
    try:
        cursor.execute("SELECT 1")  # could also be dbapi_con.ping(),
                                    # not sure what is better
    except exc.OperationalError, ex:
        if ex.args[0] in (2006,   # MySQL server has gone away
                          2013,   # Lost connection to MySQL server during query
                          2055):  # Lost connection to MySQL server at '%s', system error: %d
            # caught by pool, which will retry with a new connection
            raise exc.DisconnectionError()
        else:
            raise
If you wish to trigger this strategy conditionally, avoid the decorator here and instead register the listener using the listen() function:
# somewhere during app initialization
if config.check_connection_on_checkout:
    event.listen(Pool, "checkout", check_connection)
More info:
Connection Pool Events
Events API
There is a better way to handle it right now - pool_recycle
engine = create_engine('mysql://...', pool_recycle=3600)
MySQL has a default wait_timeout of 8 hours.
This leads to the connection being closed by MySQL without the engine above it (such as SQLAlchemy) knowing about it.
There are two ways to solve it:
Optimistic - using pool_recycle
Pessimistic - using pool_pre_ping=True
I prefer pool_recycle, as it doesn't emit a SELECT 1 before each connection checkout, putting less stress on the DB.
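A minimal sketch of both options, assuming a placeholder MySQL URL; pool_recycle and pool_pre_ping are both real create_engine arguments (pool_pre_ping requires SQLAlchemy 1.2 or newer):
from sqlalchemy import create_engine

# Optimistic: recycle connections before MySQL's wait_timeout (8 hours by default) closes them.
engine = create_engine('mysql://user:pass@localhost/dbname', pool_recycle=3600)

# Pessimistic: ping each connection at checkout and transparently replace it if it has gone away.
engine = create_engine('mysql://user:pass@localhost/dbname', pool_pre_ping=True)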

DataContext connection closed or transaction completed unexpectedly while submitting changes within a TransactionScope transaction?

Code
double timeout_in_hours = 6.0;
MyDataContext db = new MyDataContext();
using (TransactionScope tran = new TransactionScope(
    TransactionScopeOption.Required,
    new TransactionOptions() {
        IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted,
        Timeout = TimeSpan.FromHours(timeout_in_hours)
    },
    EnterpriseServicesInteropOption.Automatic))
{
    int total_records_processed = 0;
    foreach (DataRow datarow in data.Rows)
    {
        // Code runs some commands on the DataContext (db),
        // possibly reading/writing records and calling db.SubmitChanges
        total_records_processed++;
        try
        {
            db.SubmitChanges();
        }
        catch (Exception err)
        {
            MessageBox.Show(err.Message);
        }
    }
    tran.Complete();
    return total_records_processed;
}
While the above code is running, it successfully completes 6 or 7 hundred loop iterations. However, after 10 to 20 minutes, the catch block above catches the following error:
{"The transaction associated with the current connection has completed but has not been disposed. The transaction must be disposed before the connection can be used to execute SQL statements."}
The tran.Complete call is never made, so why does it say the transaction associated with the connection has completed?
Why, after successfully submitting hundreds of changes, does the connection associated with the DataContext suddenly enter a closed state? (That's the other error I sometimes get here.)
When profiling SQL Server, there are just a lot of consecutive selects and inserts with really nothing else while it's running. The very last thing the profiler catches is a sudden "Audit Logout", and I'm not sure whether that's the cause of the problem or a side effect of it.
Wow, the max timeout is limited by machine.config: http://forums.asp.net/t/1587009.aspx/1
"OK, we resolved this issue. apparently the .net 4.0 framework doesn't
allow you to set your transactionscope timeouts in the code as we have
done in the past. we had to make the machine.config changes by adding
< system.transactions> < machineSettings maxTimeout="02:00:00"/>
< defaultSettings timeout="02:00:00"/> < /system.transactions>
to the machine.config file. using the 2.0 framework we did not have
to make these entries as our code was overriding teh default value to
begin with."
It seems that the timeout you set in TransactionScope's constructor is ignored or defeated by a maximum timeout setting in the machine.config file. There is no mention of this in the documentation for the TransactionScope's constructor that accepts a time out parameter: http://msdn.microsoft.com/en-us/library/9wykw3s2.aspx
This makes me wonder, what if this was a shared hosting environment I was dealing with, where I could not access the machine.config file? There's really no way to break up the transaction, since it involves creating data in multiple tables with relationships and identity columns whose values are auto-incremented. What a poor design decision. If this was meant to protect servers with shared hosting, it's pointless, because such a long-running transaction would be isolated to my own database only. Also, if a program specifies a longer timeout, then it obviously expects a transaction to take a longer amount of time, so it should be allowed. This limitation is just a pointless handicap IMO that's going to cause problems. See also: TransactionScope maximumTimeout

SQL Alchemy + Testing webserver with InnoDB fails

I am currently trying to move my DB tables over from MyISAM to InnoDB. I am having timing issues with requests and cron jobs running on the server, which is leading to some errors. I am quite sure that transaction support will help me with the problem, so I am transitioning to InnoDB.
I have a suite of tests which make calls to our web service's REST API and receive XML responses. The test suite is fairly thorough; it's written in Python and uses SQLAlchemy to query information from the database. When I change the tables in the system from MyISAM to InnoDB, however, the tests start failing. The tests aren't failing because the system isn't working; they are failing because the ORM is not correctly querying the rows from the database I am testing on. When I step through the code I see the correct results in the database, but the ORM is not returning them.
Basic flow is:
class UnitTest(unittest.TestCase):
    def setUp(self):
        # Create a test object in DB that gets affected by the web server
        testObject = Obj(foo='one')
        self.testId = testObject.id
        session.add(testObject)
        session.commit()

    def tearDown(self):
        # Clean up after the test
        testObject = session.query(Obj).get(self.testId)
        session.delete(testObject)
        session.commit()

    def test_web_server(self):
        # Ensure the initial state of the object.
        objects = session.query(Obj).get(self.testId)
        assert objects.foo == 'one'
        # This will make a simple HTTP get call on an url that will modify the DB
        response = server.request.increment_foo(self.testId)
        # This one fails, the object still has a foo of 'one'.
        # When I stop here in a debugger though, and look at the database,
        # the row in question actually has the correct value in the database.
        # ????
        objects = session.query(Obj).get(self.testId)
        assert objects.foo == 'two'
Using MyISAM tables to store the object, this test passes. However, when I change to InnoDB tables, it does not. What is more interesting is that when I step through the code in the debugger, I can see that the database has what I expect, so it's not a problem in the web server code. I have tried nearly every combination of expire_all, autoflush, autocommit, etc., and still can't get this test to pass.
I can provide more info if necessary.
Thanks,
Conrad
The problem is that you put the line self.testId = testObject.id before the new object is added to the session, flushed, and assigned an ID by SQLAlchemy. Thus self.testId is always None. Move this line below session.commit().
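A sketch of the corrected setUp, assuming the same Obj model and session object as in the question:
def setUp(self):
    # Create a test object in DB that gets affected by the web server
    testObject = Obj(foo='one')
    session.add(testObject)
    session.commit()
    # Only now has the INSERT been flushed and the primary key populated.
    self.testId = testObject.id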