I have implemented a solution similar to this to prune my database.
# model.rb
after_create do
  self.class.prune(ENV['VARIABLE_NAME'])
end

def self.prune(max)
  order('created_at DESC').last.destroy! until count <= max
end
This works well in manual testing.
In RSpec, the test looks like this:
# spec/models/model_spec.rb
  before(:each) do
    @model = Model.new
  end

  describe "prune" do
    it "should prune the database when it becomes larger than the allowed size" do
      25.times { create(:model) }
      first_model = Model.first
      expect{create(:model)}.to change{Model.count}.by(0)
      expect{Model.find(first_model.id)}.to raise_error(ActiveRecord::RecordNotFound)
    end
  end
end
The result is
1) Model prune should prune the database when it becomes larger than the allowed size
Failure/Error: expect{Model.find(first_model.id)}.to raise_error(ActiveRecord::RecordNotFound)
expected ActiveRecord::RecordNotFound but nothing was raised
Inspecting the database during the test execution reveals that the call to order('created_at DESC').last is yielding the first instance of the model created in the 25.times block (Model#2) and not the model created in the before(:each) block (Model#1).
If I change the line
25.times { create(:model) }
to
25.times { sleep(1); create(:model) }
the test passes. If I instead sleep(0.1), the test still fails.
Does this mean that if my app creates two or more Model instances within 1 second of each other that it will choose the newest among them when choosing which to destroy (as opposed to the oldest, which is the intended behavior)? Could this be an ActiveRecord or MySQL bug?
Or if not, is there something about the way FactoryGirl or RSpec create records that isn't representative of production? How can I be sure my test represents realistic scenarios?
If the precision of your time column is only one second then you can't distinguish between items created in the same second (when sorting by date only).
If this is a concern in production then you could sort on created_at and id to enforce a deterministic order. From MySQL 5.6 onwards you can also create datetime columns that store fractional seconds. This doesn't eliminate the problem, but it would happen less often.
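The tie-breaking idea does not depend on Rails, so here is a minimal, language-neutral sketch in Python. The rows, the `oldest` helper and the `prune` helper are hypothetical stand-ins for the table and the model method. It assumes ids increase in insertion order, so sorting on the pair (created_at, id) gives one well-defined "oldest" row even when several rows share the same second.

```python
from datetime import datetime

# Hypothetical rows: (id, created_at), with created_at stored at whole-second precision.
rows = [
    (1, datetime(2015, 1, 1, 12, 0, 0)),
    (2, datetime(2015, 1, 1, 12, 0, 0)),
    (3, datetime(2015, 1, 1, 12, 0, 0)),
    (4, datetime(2015, 1, 1, 12, 0, 1)),
]


def oldest(rows):
    """Return the oldest row, breaking created_at ties by id."""
    return min(rows, key=lambda r: (r[1], r[0]))


def prune(rows, max_rows):
    """Keep only the newest max_rows rows, ordered by (created_at, id)."""
    kept = sorted(rows, key=lambda r: (r[1], r[0]))
    return kept[max(0, len(kept) - max_rows):]
```

In Active Record the same ordering can be expressed by passing both columns to order, for example order(created_at: :desc, id: :desc).last.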
If it's only a problem in tests, you can also fake time. As of Rails 4.1 (I think), Active Support has the travel test helpers, and there is also the timecop gem.
I have a Phonograph object with billions of rows, and we are querying it through the Object Set Service.
For example, I want to get all DriverLicences from a certain city.
@Function()
public getDriverLicences(city: string): ObjectSet<DriverLicences> {
    let drivers = Objects.search().DriverLicences().filter(row => row.city.exactMatch(city));
    return drivers;
}
I am facing this error when I try to query it from Slate:
ERROR 400: {"errorCode":"INVALID_ARGUMENT","errorName":"ObjectSet:PagingAboveConfiguredLimitNotAllowed","errorInstanceId":"0000-000","parameters":{}}
I understand that I am probably retrieving more than 100,000 results, but I need all of them: the front end is a complex Slate dashboard built by another team, and we cannot refactor its logic.
The issue here is that, specifically in the Slate <> Function connector, there is a "translation layer" that serializes the contents of the object set and provides a response data structure that materializes the property:value pairs for each object in the set.
This clearly doesn't work for large object sets where throwing so much data into the browser is likely to overwhelm the resources allocated to the tab.
From context it seems like you might be migrating an existing Slate app over to Functions; in the current version, how is the query limiting the number of results returned? Surely it isn't returning several hundred thousand results for further processing on the front end? (If it is, that might be an anti-pattern worth addressing.)
As for options that you could currently explore, you can sort your object set and then specify a smaller limit to return:
Objects.search().DriverLicences().filter(row => row.city.exactMatch(city)).orderBy(date_of_issue).take(100)
You'll find a few more details in the Functions documentation Reference entry on Ontology API: Object Sets in the section on Ordering and limiting.
You can even work around the (current) lack of paging when returning an ObjectSet to Slate: use the last value of the property you ordered on (i.e. date_of_issue) as a filter in the subsequent request, and return the next N objects.
This can work if you need a Slate table or HTML widget that renders one set of results and then, on a user action, gets the next page.
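The paging workaround can be sketched independently of the Functions API. The following is a minimal Python illustration over an in-memory list; `next_page`, the sample `licences` data and the field names are hypothetical, and a real implementation would apply the same filter, ordering and limit to the object set instead.

```python
def next_page(objects, order_key, page_size, after=None):
    """Return the next page_size objects whose order_key value is greater than `after`.

    `after` is the last ordering value seen on the previous page
    (None for the first page).
    """
    ordered = sorted(objects, key=lambda o: o[order_key])
    if after is not None:
        ordered = [o for o in ordered if o[order_key] > after]
    return ordered[:page_size]


# Hypothetical sample data; ISO date strings sort chronologically.
licences = [
    {"id": "A", "date_of_issue": "2020-01-05"},
    {"id": "B", "date_of_issue": "2019-03-01"},
    {"id": "C", "date_of_issue": "2021-07-20"},
    {"id": "D", "date_of_issue": "2018-11-11"},
    {"id": "E", "date_of_issue": "2022-02-02"},
]

page1 = next_page(licences, "date_of_issue", 2)
page2 = next_page(licences, "date_of_issue", 2, after=page1[-1]["date_of_issue"])
```

One caveat with this approach: because the filter is a strict "greater than", objects that share the boundary value with the last item of a page would be skipped, so the ordering property should be unique or combined with a tie-breaker.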
I created a new Rails 4.2.1 test project to try out the new streaming feature (the 'Live' one which I read about here). This project is set up to use MySQL for the database (I also tried Sqlite but couldn't repro the issue with it). The project is simple, consisting only of: 1) a model Test with 2 attributes (both strings). 2) a simple route resources :tests and 3) a simple controller tests_controller with one action index. The model and controller were generated by the standard rails generators, and only the controller was modified, as follows:
class TestsController < ApplicationController
  include ActionController::Live

  def index
    response.headers['Content-Type'] = 'application/json'
    response.stream.write('{"count": 5, "tests": [')
    Test.find_each do |test|
      response.stream.write(test.to_json)
      response.stream.write(',')
    end
    response.stream.write(']}')
    response.stream.close
  end
end
When I run rails s and test by hand everything seems fine. But when I added a test (shown below) I get a strange error:
1) Error:
TestsControllerTest#test_index:
ActiveRecord::StatementInvalid: Mysql2::Error: This connection is in use by: #<Thread:0x007f862a4a7e48#/Users/xxx/.rvm/gems/ruby-2.2.2/gems/actionpack-4.2.1/lib/action_controller/metal/live.rb:269 sleep>: ROLLBACK
The test is:
require 'test_helper'

class TestsControllerTest < ActionController::TestCase
  test "index" do
    #request.headers['Accept'] = 'application/json'
    get :index
    assert_response :success
  end
end
Note that the error is intermittent, coming up only about half the time. Also, even though testing by hand doesn't cause any errors I'm worried that when multiple clients hit the API at the same time that errors will occur. Any suggestions as to what's going on here would be much appreciated.
Pretty old, but you need to actually check out a new database connection, since ActionController::Live executes the action in a new thread:
The final caveat is that your actions are executed in a separate thread than the main thread. Make sure your actions are thread safe, and this shouldn't be a problem (don't share state across threads, etc).
https://github.com/rails/rails/blob/861b70e92f4a1fc0e465ffcf2ee62680519c8f6f/actionpack/lib/action_controller/metal/live.rb
You can even use an around_filter/around_action for this.
I describe the outcome of a strategy by numerous rows. Each row contains a symbol (describing an asset), a timestamp (think of a backtest) and a price + weight.
Before a strategy runs I delete all previous results from this particular strategy (I have many strategies). I then loop over all symbols and all times.
# delete all previous data written by this strategy
StrategyRow.objects.filter(strategy=strategy).delete()

for symbol in symbols.keys():
    s = symbols[symbol]
    for t in portfolio.prices.index:
        p = prices[symbol][t]
        w = weights[symbol][t]
        row = StrategyRow.objects.create(strategy=strategy, symbol=s, time=t)
        if not math.isnan(p):
            row.price = p
        if not math.isnan(w):
            row.weight = w
        row.save()
This works but is very, very slow. Is there a way to achieve the same with write_frame from pandas? Or maybe using faster raw SQL?
I don't think the raw SQL route is the first thing you should try (more on that in a bit).
I think the slowness comes from calling row.save() on many objects: each call issues its own query, on top of the INSERT that StrategyRow.objects.create() has already run for that row.
I'd look into StrategyRow.objects.bulk_create() first, https://docs.djangoproject.com/en/1.7/ref/models/querysets/#django.db.models.query.QuerySet.bulk_create
The difference is that you pass it a list of your StrategyRow instances instead of calling .save() on each one. It's pretty straightforward: bundle up a few rows, then create them in batches. Try 10, 20, 100, etc. at a time; your database configuration can also help you find the optimum batch size (e.g. http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_max_allowed_packet).
Back to your idea of raw SQL: that would make a difference if, e.g., the Python code that creates the StrategyRow instances is slow (e.g. StrategyRow.objects.create()), but I still believe the key is to batch the inserts instead of running N queries.
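As a rough sketch of the batching approach, the loop from the question can be split into "build unsaved rows" and "insert them in bulk". The `build_rows` helper and its `make_row` parameter are hypothetical names introduced here so the row-building part can run without Django; the commented lines at the end show how it might plug into bulk_create, assuming the StrategyRow model from the question.

```python
import math


def build_rows(symbols, times, prices, weights, make_row):
    """Build one unsaved row per (symbol, time), leaving out NaN prices and weights.

    make_row is any callable taking keyword arguments, e.g. a model
    constructor; nothing is written to the database here.
    """
    rows = []
    for name, s in symbols.items():
        for t in times:
            fields = {"symbol": s, "time": t}
            p = prices[name][t]
            w = weights[name][t]
            if not math.isnan(p):
                fields["price"] = p
            if not math.isnan(w):
                fields["weight"] = w
            rows.append(make_row(**fields))
    return rows


# With Django (assumed usage, not run here):
# rows = build_rows(symbols, portfolio.prices.index, prices, weights,
#                   lambda **f: StrategyRow(strategy=strategy, **f))
# StrategyRow.objects.bulk_create(rows, batch_size=100)
```

Note that bulk_create does not call save() on the instances and does not send the pre_save and post_save signals, so this only fits if the model does not rely on them.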
We have a hybrid web application integrating a MySql db with Plone (last upgrade was to Plone 4.0), using collective.tin, collective.lead and SqlAlchemy.
OK, I know that collective.tin was never released and collective.lead has been superseded; however, everything has worked (almost) perfectly for a few years.
Recently we experienced a very strange behaviour and are looking for help in order to understand it.
Among others, we have 2 Plone content types, say A and B, defined by subclassing collective.tin, and the corresponding innodb MySql tables; rows of B have a foreign key towards A.
In the time span of 15-20 minutes, 2 different users created 3 A objects and some 10-20 B objects that weren't committed to MySql but were indexed by Plone; queries I executed with a MySql client from the linux shell weren't able to find those A rows (didn't look for B rows); however, queries executed through the web application (the aforementioned components stack) by those 2 users, and also by other users, occasionally were still finding and correctly visualizing some of those 3 A objects.
Only after I restarted the Zope instance was it possible to resume normal activity from the Plone web interface; 3 A rows and many B rows were still missing from the MySql db, but the autoincrement counter showed the expected increment; I had to remove 3 invalid brains for A objects from the Plone index (I didn't worry about the B objects).
Any suggestion on possible causes and on how to investigate the problem?
We had the exact same problem with sqlalchemy 0.4; the session would get out of sync with the actual database contents. The problem was somewhat masked in our case because users were sent to specific backends in the cluster through session affinity. If the affinity was lost, messages would suddenly appear to have disappeared. The exact details are a little hazy, because I cannot locate the correct (ancient) revision history of the fix I put in place.
From what I can glean from context, the session identity map prevents the session from re-querying the database for objects it retrieved before. It thus won't see changes made to these objects in different sessions.
The fix is to call .expire_all() on the session after each and every commit or rollback; SQLAlchemy 0.5 and up does this automatically (autoexpire=True on the session, now called expire_on_commit I believe), but for 0.4 you'll need to register a SessionExtension to do this for you.
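To make the mechanism concrete, here is a toy model of an identity map in plain Python. It is not SQLAlchemy code: `ToySession` is a hypothetical stand-in, the "database" is a shared dict, and the real expire_all() marks loaded attributes as expired rather than clearing a dict. It only illustrates why a session keeps serving stale state until it is expired.

```python
class ToySession:
    """A toy stand-in for an ORM session with an identity map (not SQLAlchemy)."""

    def __init__(self, database):
        self.database = database   # shared dict: primary key -> value
        self.identity_map = {}     # per-session cache of already loaded objects

    def get(self, key):
        # Serve from the identity map if loaded before; otherwise read the "database".
        if key not in self.identity_map:
            self.identity_map[key] = self.database[key]
        return self.identity_map[key]

    def expire_all(self):
        # Forget cached state so the next get() reloads from the "database".
        self.identity_map.clear()


database = {1: "old"}
session = ToySession(database)
first_read = session.get(1)       # loads and caches the row
database[1] = "new"               # another session commits a change
stale_read = session.get(1)       # still served from the identity map
session.expire_all()
fresh_read = session.get(1)       # reloaded after expiry
```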
Lucky for you, we also use collective.lead for this project, so my fix is your fix:
# The identity map should be flushed on commit.
# SQLAlchemy 0.5 does this properly, but in 0.4 we need to do this via
# a SessionExtension.
from sqlalchemy import __version__

if __version__[:3] == '0.4':
    from sqlalchemy.orm.session import SessionExtension

    class ExpireAllSessionExtension(SessionExtension):
        def after_commit(self, session):
            """Expire the identity-map on commit"""
            session.expire_all()

        def after_rollback(self, session):
            """Expire the identity-map on rollback"""
            session.expire_all()

    def installExtension():
        # Patch collective.lead.database to let us install the extension
        # on the session created there.
        from collective.lead.database import Database
        old_session = Database.session.fget

        def session(self):
            session = old_session(self)
            if session.extension is None:
                session.extension = ExpireAllSessionExtension()
            return session

        Database.session = property(session)
else:
    def installExtension():
        pass
When defining the mapper, you install this extension with:
from .sessionexpiration import installExtension
# Ensure that sessions get properly expired on commit and rollback.
installExtension()
I am currently trying to move my DB tables over to InnoDB from MyISAM. I am having timing issues with requests and cron jobs running on the server, which are leading to some errors. I am quite sure that transaction support will help me with the problem, so I am transitioning to InnoDB.
I have a suite of tests which make calls to our web service's REST API and receive XML responses. The test suite is fairly thorough; it's written in Python and uses SQLAlchemy to query information from the database. When I change the tables in the system from MyISAM to InnoDB, however, the tests start failing. They aren't failing because the system isn't working; they are failing because the ORM is not correctly querying the rows from the database I am testing on. When I step through the code I see the correct results, but the ORM is not returning the correct results at all.
Basic flow is:
class UnitTest(unittest.TestCase):
    def setUp(self):
        # Create a test object in DB that gets affected by the web server
        testObject = Obj(foo='one')
        self.testId = testObject.id
        session.add(testObject)
        session.commit()

    def tearDown(self):
        # Clean up after the test
        testObject = session.query(Obj).get(self.testId)
        session.delete(testObject)
        session.commit()

    def test_web_server(self):
        # Ensure the initial state of the object.
        objects = session.query(Obj).get(self.testId)
        assert objects.foo == 'one'
        # This will make a simple HTTP get call on an url that will modify the DB
        response = server.request.increment_foo(self.testId)
        # This one fails, the object still has a foo of 'one'
        # When I stop here in a debugger though, and look at the database,
        # The row in question actually has the correct value in the database.
        # ????
        objects = session.query(Obj).get(self.testId)
        assert objects.foo == 'two'
With MyISAM tables storing the object, this test passes. However, when I change to InnoDB tables, it does not. What is more interesting is that when I step through the code in the debugger, I can see that the database has what I expect, so it's not a problem in the web server code. I have tried nearly every combination of expire_all, autoflush, autocommit, etc., and still can't get this test to pass.
I can provide more info if necessary.
Thanks,
Conrad
The problem is that the line self.testId = testObject.id runs before the new object is added to the session and flushed, i.e. before SQLAlchemy has assigned an ID to it. Thus self.testId is always None. Move this line below session.commit().
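The ordering issue can be reproduced without SQLAlchemy. The sketch below uses the standard-library sqlite3 module and a hypothetical `Obj` dataclass and `save` helper as stand-ins for the mapped class and the session; it only shows that a database-generated key does not exist on the object until the row has been inserted.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    foo: str
    id: Optional[int] = None  # unknown until the row is inserted


def save(conn, obj):
    """Insert obj and copy the database-generated key back onto it."""
    cur = conn.execute("INSERT INTO obj (foo) VALUES (?)", (obj.foo,))
    conn.commit()
    obj.id = cur.lastrowid
    return obj


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obj (id INTEGER PRIMARY KEY AUTOINCREMENT, foo TEXT)")

test_object = Obj(foo="one")
id_before = test_object.id   # read too early: still None
save(conn, test_object)
id_after = test_object.id    # read after the insert: the real key
```

Reading the id only after the insert mirrors moving self.testId = testObject.id below session.commit() in setUp.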