I am currently trying to move my DB tables over to InnoDB from MyISAM. I am having timing issues with requests and cron jobs that are running on the server that is leading to some errors. I am quite sure that transaction support will help me with the problem. I am therefore transitioning to InnoDB.
I have a suite of tests which make calls to our webservices REST API and receive XML responses. The test suite is fairly thorough, and it's written in Python and uses SQLAlchemy to query information from the database. When I change the tables in the system from MyISAM to InnoDB however, the tests start failing. However, the tests aren't failing because the system isn't working, they are failing because the ORM is not correctly querying the rows from the database I am testing on. when I step through the code I see the correct results, but the ORM is not returning the correct results at all.
Basic flow is:
class UnitTest(unittest.TestCase):
def setUp(self):
# Create a test object in DB that gets affected by the web server
testObject = Obj(foo='one')
self.testId = testObject.id
session.add(testObject)
session.commit()
def tearDown(self):
# Clean up after the test
testObject = session.query(Obj).get(self.testId)
session.delete(testObject)
session.commit()
def test_web_server(self):
# Ensure the initial state of the object.
objects = session.query(Obj).get(self.testId)
assert objects.foo == 'one'
# This will make a simple HTTP get call on an url that will modify the DB
response = server.request.increment_foo(self.testId)
# This one fails, the object still has a foo of 'one'
# When I stop here in a debugger though, and look at the database,
# The row in question actually has the correct value in the database.
# ????
objects = session.query(Obj).get(self.testId)
assert objects.foo == 'two'
Using MyISAM tables to store the object and this test will pass. However, when I change to InnoDB tables, this test will not pass. What is more interesting is that when I step through the code in the debugger, I can see that the datbase has what I expect, so it's not a problem in the web server code. I have tried nearly every combination of expire_all, autoflush, autocommit, etc. etc, and still can't get this test to pass.
I can provide more info if necessary.
Thanks,
Conrad
The problem is that you put the line self.testId = testObject.id before new object is added to session, flushed, and SQLAlchemy assigned ID to it. Thus self.testId is always None. Move this line below session.commit().
Related
does anyone know anything about debugging with knexjs and mysql? I'm trying to do a lot of things and test out stuff and I keep polluting my test database with random data. ideally, I'd just like to do things and see what the output query would be instead of running it against the actual database to see if it actually worked.
I can't find anything too helpful in their docs. they mention passing {debug: true} as one of the options in your initialize settings but it doesn't really explain what it does.
I am a junior developer, so maybe some of this is not meant to be understood by juniors but at the end of the day Is just not clear at all what steps I should take to be able to just see what queries would have been ran instead of running the real queries and polluting my db.
const result = await db().transaction(trx =>
trx.insert(mapToSnakeCase(address), 'id').into('addresses')
.then(addressId =>
trx.insert({ addresses_id: addressId, display_name: displayName }, 'id')
.into('chains')).toString();
You can build a knex query, but until you attach a .then() or awiat() (or run . asCallback((error,cb)=>{})), the query is just an object.
So you could do
let localVar = 8
let query = knex('table-a').select().where('id', localVar)
console.log(query.toString())
// outputs a string 'select * from table-a where id = 8'
This does not hit the database, and is synchronous. Make as many of these as you want!
As soon as you do await query or query.then(rows => {}) or query.asCallback( (err,rows)=>{} ) you are awaiting the db results, starting the promise chain or defining the callback. That is when the database is hit.
Turning on debug: true when initializing just writes the results of query.toSQL() to the console as they run against the actual DB. Sometimes an app might make a lot of queries and if one goes wrong this is a way to see why a DB call failed (but is VERY verbose so typically is not on all the time).
In our app's tests, we do actually test against the database because unit testing this type of stuff is a mess. We use knex's migrations on a test database that is brought down and up every time the tests run. So it always starts clean (or with known seed data), and if a test fails the DB is in the same state to be manually inspected. While we create a lot of test data in a testing run, it's cleaned up before the next test.
I created a new Rails 4.2.1 test project to try out the new streaming feature (the 'Live' one which I read about here). This project is set up to use MySQL for the database (I also tried Sqlite but couldn't repro the issue with it). The project is simple, consisting only of: 1) a model Test with 2 attributes (both strings). 2) a simple route resources :tests and 3) a simple controller tests_controller with one action index. The model and controller were generated by the standard rails generators, and only the controller was modified, as follows:
class TestsController < ApplicationController
include ActionController::Live
def index
response.headers['Content-Type'] = 'application/json'
response.stream.write('{"count": 5, "tests": [')
Test.find_each do |test|
response.stream.write(test.to_json)
response.stream.write(',')
end
response.stream.write(']}')
response.stream.close
end
end
When I run rails s and test by hand everything seems fine. But when I added a test (shown below) I get a strange error:
1) Error:
TestsControllerTest#test_index:
ActiveRecord::StatementInvalid: Mysql2::Error: This connection is in use by: #<Thread:0x007f862a4a7e48#/Users/xxx/.rvm/gems/ruby-2.2.2/gems/actionpack-4.2.1/lib/action_controller/metal/live.rb:269 sleep>: ROLLBACK
The test is:
require 'test_helper'
class TestsControllerTest < ActionController::TestCase
test "index" do
#request.headers['Accept'] = 'application/json'
get :index
assert_response :success
end
end
Note that the error is intermittent, coming up only about half the time. Also, even though testing by hand doesn't cause any errors I'm worried that when multiple clients hit the API at the same time that errors will occur. Any suggestions as to what's going on here would be much appreciated.
Pretty old, but you need to actually checkout a new database connection since ActionController::Live executes the action in a new thread:
The final caveat is that your actions are executed in a separate thread than the main thread. Make sure your actions are thread safe, and this shouldn't be a problem (don't share state across threads, etc).
https://github.com/rails/rails/blob/861b70e92f4a1fc0e465ffcf2ee62680519c8f6f/actionpack/lib/action_controller/metal/live.rb
You can even use an around_filter/around_action for this.
I am new to Grails and learning Grails currently.
I configured Mysql as my database. and when I run-app I can see table create in my database.
I tried to do a save() in test case both Unit (extend Specification) and Integration test (extend IntegrationSpec), Test method is shown as follow, which could be passed successfully.
void "test first save"() {
when: "when have user id is 'joe', and password is 'secret'"
def userId = "joe"
def password = "secret"
then: "create a user use ${userId} and ${password}"
User user = new User(userId: userId, password: password, homepage: 'http://www.grailsinaction.com')
expect: "user can be saved successfully"
assert user.save(flush:true, failOnError:true)
assert user.id
def foundUser = User.get(user.id)
assert foundUser?.userId == 'joe'
}
but I found there are no data inserted into database in both unit and integration test.
I understood that Unit test will only mock the persistence, but integration test should use real database for the testing purpose.
So my question is whether integration should commit data into database? If so, anything could be wrong to make committing not occurred?
The integration test runner is configured to start a new transaction for every test, and explicitly roll it back at the end of the test. This is convenient because you don't have to do any cleanup work between tests - everything is reset automatically for you. Note that any work done before the tests start (e.g. in BootStrap) will remain for each test since it is committed already, and the rollback resets back to the state at the beginning of each test.
You can disable this for an individual test class by adding
static transactional = false
but I would avoid this except in rare cases where you are testing transaction commits and rollbacks and need full control at that level.
Also note that the Hibernate dbCreate setting will affect things. If you configure data to remain after the tests run but use create-drop, the tables will be dropped at startup and at shutdown, so using create (which only drops at startup) or explicit migrations would be needed to view anything.
FYI - in your test, the line
def foundUser = User.get(user.id)
will not hit the database - Hibernate will simply give you back the instance you just saved. You can see this by turning on SQL logging. If you want to really re-load the object, you need to clear the Hibernate session to force it to issue a query. One easy way to do this is with the withSession method on domain classes (it's independent of the class it's called on, so use any), e.g.
User.withSession { session ->
session.flush()
session.clear()
}
I added a flush call to ensure that everything it pushed before clearing.
I have a pyramid view that is used for loading data from a large file into a database. For each line in the file it does a little processing then creates some model instances and adds them to the session. This works fine except when the files are big. For large files the view slowly eats up all my ram until everything effectively grinds to a halt.
So my idea is to process each line individually with a function that creates a session, creates the necessary model instances and adds them to the current session, then commits.
def commit_line(lTitles,lLine,oStartDate,oEndDate,iDS,dSettings):
from sqlalchemy.orm import (
scoped_session,
sessionmaker,
)
from sqlalchemy import engine_from_config
from pyramidapp.models import Base, DataEntry
from zope.sqlalchemy import ZopeTransactionExtension
import transaction
oCurrentDBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
engine = engine_from_config(dSettings, 'sqlalchemy.')
oCurrentDBSession.configure(bind=engine)
Base.metadata.bind = engine
oEntry = DataEntry()
oCurrentDBSession.add(oEntry)
...
transaction.commit()
My requirements for this function are as follows:
create a session (check)
make a bunch of model instances (check)
add those instances to the session (check)
commit those models to the database
get rid of the session (so that it and the objects created in 2 are garbage collected)
I've made sure that the newly created session is passed as an argument whenever necessary in order to stop errors to do with multiple sessions blah blah. But alas! I can't get database connections to go away and stuff isn't being committed.
I tried separating the function out into a celery task so the view executes to completion and does what it needs to but I'm getting an error in celery about having too many mysql connections no matter what I try in terms of committing and closing and disposing and I'm not sure why. And yes, I restart the celery server when I make changes.
Surely there is a simple way to do this? All I want to do is make a session commit then go away and leave me alone.
Creating a new session for each line of your large file is going to be quite slow I would imagine.
What I would try is to commit the session and expunge all objects from it every 1000 rows or so:
counter = 0
for line in mymegafile:
entry = process_line(line)
session.add(entry)
if counter > 1000:
counter = 0
transaction.commit() # if you insist on using ZopeTransactionExtension, otherwise session.commit()
session.expunge_all() # this may not be required actually, see https://groups.google.com/forum/#!topic/sqlalchemy/We4XGX2CYX8
else:
counter += 1
If there are no references to DataEntry instances from anywhere they should be garbage collected by Python interpreter at some point.
However, if all you're doing in that view is inserting new records to the database, it may be much more efficient to use SQLAlchemy Core constructs or literal SQL to bulk-insert data. This would also get rid of the problem with your ORM instances eating up your RAM. See I’m inserting 400,000 rows with the ORM and it’s really slow! for details.
So I tried a bunch of things and, although using SQLAlchemy's built in functionality to solve this was probably possible I could not find any way of pulling that off.
So here's an outline of what I did:
seperate the lines to be processed into batches
for each batch of lines queue up a celery task to deal with those lines
in the celery task a seperate process is launched that does the necessary stuff with the lines.
Reasoning:
The batch stuff is obvious
Celery was used because it took a heck of a long time to process an entire file so queuing just made sense
the task launched a separate process because if it didn't then I had the same problem that I had with the pyramid application
Some code:
Celery task:
def commit_lines(lLineData,dSettings,cwd):
"""
writes the line data to a file then calls a process that reads the file and creates
the necessary data entries. Then deletes the file
"""
import lockfile
sFileName = "/home/sheena/tmp/cid_line_buffer"
lock = lockfile.FileLock("{0}_lock".format(sFileName))
with lock:
f = open(sFileName,'a') #in case the process was at any point interrupted...
for d in lLineData:
f.write('{0}\n'.format(d))
f.close()
#now call the external process
import subprocess
import os
sConnectionString = dSettings.get('sqlalchemy.url')
lArgs = [
'python',os.path.join(cwd,'commit_line_file.py'),
'-c',sConnectionString,
'-f',sFileName
]
#open the subprocess. wait for it to complete before continuing with stuff. if errors: raise
subprocess.check_call(lArgs,shell=False)
#and clear the file
lock = lockfile.FileLock("{0}_lock".format(sFileName))
with lock:
f = open(sFileName,'w')
f.close()
External process:
"""
this script goes through all lines in a file and creates data entries from the lines
"""
def main():
from optparse import OptionParser
from sqlalchemy import create_engine
from pyramidapp.models import Base,DBSession
import ast
import transaction
#get options
oParser = OptionParser()
oParser.add_option('-c','--connection_string',dest='connection_string')
oParser.add_option('-f','--input_file',dest='input_file')
(oOptions, lArgs) = oParser.parse_args()
#set up connection
#engine = engine_from_config(dSettings, 'sqlalchemy.')
engine = create_engine(
oOptions.connection_string,
echo=False)
DBSession.configure(bind=engine)
Base.metadata.bind = engine
#commit stuffs
import lockfile
lock = lockfile.FileLock("{0}_lock".format(oOptions.input_file))
with lock:
for sLine in open(oOptions.input_file,'r'):
dLine = ast.literal_eval(sLine)
create_entry(**dLine)
transaction.commit()
def create_entry(iDS,oStartDate,oEndDate,lTitles,lValues):
#import stuff
oEntry = DataEntry()
#do some other stuff, make more model instances...
DBSession.add(oEntry)
if __name__ == "__main__":
main()
in the view:
for line in big_giant_csv_file_handler:
lLineData.append({'stuff':'lots'})
if lLineData:
lLineSets = [lLineData[i:i+iBatchSize] for i in range(0,len(lLineData),iBatchSize)]
for l in lLineSets:
commit_lines.delay(l,dSettings,sCWD) #queue it for celery
You are just doing it wrong. Period.
Quoted from SQLAlchemy docs
The advanced developer will try to keep the details of session,
transaction and exception management as far as possible from the
details of the program doing its work.
Quoted from Pyramid docs
We made the decision to use SQLAlchemy to talk to our database. We also, though, installed pyramid_tm and zope.sqlalchemy.
Why?
Pyramid has a strong orientation towards support for transactions.
Specifically, you can install a transaction manager into your app
application, either as middleware or a Pyramid "tween". Then, just
before you return the response, all transaction-aware parts of your
application are executed. This means Pyramid view code usually doesn't
manage transactions.
My answer today is not code, but a recommendation to follow best practices recommended by the authors of the packages/frameworks you are working with.
References
Big picture - Using Thread-Local Scope with Web Applications
Typical error message when doing it wrong
Databases using SQLAlchemy
How to use scoped_session
Encapsulate CSV reading and creating SQLAlchemy model instances into something that supports the iterator protocol. I called it BatchingModelReader. It returns a collection of DataEntry instances, collection size depends on batch size. If the model changes overtime, you do not need to change the celery task. The task only puts a batch of models into a session and commits the transaction. By controlling the batch size you control memory consumption. Neither BatchingModelReader nor the celery task save huge amounts of intermediate data. This example shows as well that using celery is only an option. I added links to code samples of an pyramid application I am actually refactoring in a Github fork.
BatchingModelReader - encapsulates csv.reader and uses existing models from your pyramid application
get inspired by source code of csv.DictReader
could be run as a celery task - use appropriate task decorator
from .models import DBSession
import transaction
def import_from_csv(path_to_csv, batchsize)
"""given a CSV file and batchsize iterate over batches of model instances and import them to database"""
for batch in BatchingModelReader(path_to_csv, batchsize):
with transaction.manager:
DBSession.add_all(batch)
pyramid view - just save big giant CSV file, start task, return response immediately
#view_config(...):
def view(request):
"""gets file from request, save it to filesystem and start celery task"""
with open(path_to_csv, 'w') as f:
f.write(big_giant_csv_file)
#start task with parameters
import_from_csv.delay(path_to_csv, 1000)
Code samples
ToDoPyramid - commit transaction from commandline
ToDoPyramid - commit transaction from request
Pyramid using SQLAlchemy
Databases using SQLAlchemy
SQLAlchemy internals
Big picture - Using Thread-Local Scope with Web Applications
How to use scoped_session
We have a hybrid web application integrating a MySql db with Plone (last upgrade was to Plone 4.0), using collective.tin, collective.lead and SqlAlchemy.
Ok, I know that collective.tin never was released and collective.lead has been superseded; however all things work (almost) perfectly since a few years.
Recently we experienced a very strange behaviour and are looking for help in order to understand it.
Among others, we have 2 Plone content types, say A and B, defined by subclassing collective.tin, and the corresponding innodb MySql tables; rows of B have a foreign key towards A.
In the time span of 15-20 minutes, 2 different users created 3 A objects and some 10-20 B objects that weren't committed to MySql but were indexed by Plone; queries I executed with a MySql client from the linux shell weren't able to find those A rows (didn't look for B rows); however, queries executed through the web application (the aforementioned components stack) by those 2 users, and also by other users, occasionally were still finding and correctly visualizing some of those 3 A objects.
Only after I restarted the Zope instance, it was possible to resume normal activity from the Plone web interface; 3 A rows and many B rows were still missing from the MySql db, but the autoincrement counter showed the expected increment; I had to remove 3 invalid brains for A objects from the Plone index (didn't worry for B objects).
Any suggestion on possible causes and on how to investigate the problem?
We had the exact same problem with sqlalchemy 0.4; the session would get out of sync with the actual database contents. The problem was somewhat masked in our case because users were sent to specific backends in the cluster through session affinity. If the affinity was lost suddenly messages had disappeared. The exact details are a little hazy, because I cannot locate the correct (ancient) revision history of the fix I put in place.
From what I can glean from context is that the session identity map prevents the session from requiring the database for objects it retrieved before. It thus won't see changes made to these objects in different sessions.
The fix is to call .expire_all() on the session after each and every commit or rollback; SQLAlchemy 0.5 and up does this automatically (autoexpire=True on the session, now called expire_on_commit I believe), but for 0.4 you'll need to register a SessionExtension to do this for you.
Lucky for you, we also use collective.lead for this project, so my fix is your fix:
# The identity map should be flushed on commit.
# SQLAlchemy 0.5 does this properly, but in 0.4 we need to do this via
# a SesssionExtension.
from sqlalchemy import __version__
if __version__[:3] == '0.4':
from sqlalchemy.orm.session import SessionExtension
class ExpireAllSessionExtension(SessionExtension):
def after_commit(self, session):
"""Expire the identity-map on commit"""
session.expire_all()
def after_rollback(self, session):
"""Expire the identity-map on rollback"""
session.expire_all()
def installExtension():
# Patch collective.lead.database to let us install the extension
# on the session created there.
from collective.lead.database import Database
old_session = Database.session.fget
def session(self):
session = old_session(self)
if session.extension is None:
session.extension = ExpireAllSessionExtension()
return session
Database.session = property(session)
else:
def installExtension():
pass
When defining the mapper, you install this extension with:
from .sessionexpiration import installExtension
# Ensure that sessions get properly expired on commit and rollback.
installExtension()