ActiveRecord, Intentionally truncate string to db column width - mysql

In Rails 4, ActiveRecord and its MySQL adapter are set up so that if you try to save an attribute in an AR model to a MySQL db, and the string is too long for the MySQL column limit, an exception is raised.
Great! This is a much better default than Rails 3, where the string was silently truncated.
However, occasionally I have an attribute that I explicitly want to be simply truncated to the maximum size allowed by the db, with no exception. I'm having trouble figuring out the best/supported way to do this with AR.
It should ideally happen as soon as the attribute is set, but I'd take it happening on save. (This isn't exactly a 'validation', as I never want to raise, just truncate, but maybe the validation system is the best supported way to do this?)
Ideally, it would automatically figure out the db column width through AR's db introspection, so if the db column width changed (in a later migration), the truncation limit would change accordingly. But if that's not possible, I'll take a hard-coded truncation limit.
Ideally it would be generic AR code that would work with any db, but if there's no good way to do that I'd take code that only worked for MySQL

You could truncate your data before inserting it into the db with a before_save or a before_validation callback.
See Active Record Callbacks — Ruby on Rails Guides and ActiveRecord::Callbacks
You can retrieve information about your table with MODEL.columns and MODEL.columns_hash.
See ActiveRecord::ModelSchema::ClassMethods
For example (not tested):
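class User < ActiveRecord::Base
  before_save :truncate_col
  ......
  def truncate_col
    col_size = User.columns_hash['your_column'].limit
    # limit can be nil for column types without a declared length (e.g. text)
    return if col_size.nil? || your_column.nil?
    # omission: '' stops ActiveSupport's String#truncate from appending "..."
    self.your_column = your_column.truncate(col_size, omission: '')
  end
end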

I'm pretty sure you can accomplish this with a combination of ActiveRecord callbacks and ConnectionAdapters. ActiveRecord provides several callbacks you can hook into to run specific logic at different points during the save flow. Since the exception is raised at save, I would recommend adding your logic to a before_save callback. Using the ConnectionAdapters column object you should be able to determine the limit of the column you wish to insert into, though the logic will most likely be different for strings vs ints, etc. Off the top of my head you'll probably want to implement something like:
class User < ActiveRecord::Base
  before_save :truncate_attribute

  # 'attribute' stands in for the name of your own column
  def truncate_attribute
    limit = User.columns_hash['attribute'].limit
    self.attribute = self.attribute[0..limit-1] if self.attribute.length > limit
  end
end
The above example is for a string, but this solution should work for all connection adapters assuming they support the limit attribute. Hopefully that helps.

I'd like to address a few points:
If the data type of your_column is text, in Rails 4 User.columns_hash['your_column'].limit will return nil. It returns a number for int or varchar columns.
The text data type in MySQL has a storage limit of 64 KB (65,535 bytes). This means truncating by character length is not enough if the content has non-ASCII characters such as ç, which need more than 1 byte of storage.
I've bumped into this problem very recently, here is a hotfix for it:
before_save :truncate_your_column_to_fit_into_max_storage_size
def truncate_your_column_to_fit_into_max_storage_size
  return if your_column.blank?
  max_field_size_in_bytes = 65_535
  self.your_column = your_column[0, max_field_size_in_bytes]
  while your_column.bytesize > max_field_size_in_bytes
    self.your_column = your_column[0..-2]
  end
end
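As a possible alternative to the character-by-character loop above, here is a minimal sketch that cuts at the byte limit directly. It assumes the string is valid UTF-8, and truncate_to_bytes is a hypothetical helper name, not something ActiveRecord provides.

```ruby
# Hypothetical helper: shorten str so that it occupies at most max_bytes bytes.
def truncate_to_bytes(str, max_bytes)
  return str if str.bytesize <= max_bytes
  # byteslice may cut through a multi-byte character; scrub('') then drops
  # the dangling partial bytes (and any other invalid bytes in the result).
  str.byteslice(0, max_bytes).scrub('')
end

sample = 'çç'                       # 2 characters, 4 bytes in UTF-8
puts sample.length                  # 2
puts sample.bytesize                # 4
puts truncate_to_bytes(sample, 3)   # ç
```

Inside the callback this could be used as self.your_column = truncate_to_bytes(your_column, 65_535).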

Here's my own self-answer, which truncates on attribute set (way before save). Curious if anyone has any feedback. It seems to work!
# An ActiveRecord extension that will let you automatically truncate
# certain attributes to the maximum length allowed by the DB.
#
# require 'truncate_to_db_limit'
# class Something < ActiveRecord::Base
# extend TruncateToDbLimit
# truncate_to_db_limit :short_attr, :short_attr2
# #...
#
# Truncation is done whenever the attribute is set, NOT waiting
# until db save.
#
# For a varchar(4), if you do:
# model.short_attr = "123456789"
# model.short_attr # => '1234'
#
#
# We define an override to the `attribute_name=` method, which ActiveRecord, I think,
# promises to call just about all the time when setting the attribute. We call super
# after truncating the value.
module TruncateToDbLimit
  def truncate_to_db_limit(*attribute_names)
    attribute_names.each do |attribute_name|
      ar_attr = columns_hash[attribute_name.to_s]
      unless ar_attr
        raise ArgumentError.new("truncate_to_db_limit #{attribute_name}: No such attribute")
      end

      limit = ar_attr.limit
      unless limit && limit.to_i != 0
        raise ArgumentError.new("truncate_to_db_limit #{attribute_name}: Limit not known")
      end

      define_method "#{attribute_name}=" do |val|
        # Pass nil and other non-string values through untouched
        normalized = val.is_a?(String) ? val.slice(0, limit) : val
        super(normalized)
      end
    end
  end
end
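To see why the define_method plus super combination can work, here is a plain-Ruby sketch with no ActiveRecord involved. As far as I know, ActiveRecord keeps its generated accessors in a module included into the model class, which is what lets a method defined on the class itself reach them through super. GeneratedAccessors, TruncateToLimit and the hand-passed limit are stand-ins invented for this illustration.

```ruby
# Stand-in for the module where the generated accessors live.
module GeneratedAccessors
  attr_accessor :short_attr
end

# Same shape as TruncateToDbLimit, but with the limit passed in by hand
# instead of being read from columns_hash.
module TruncateToLimit
  def truncate_to_limit(attribute_name, limit)
    define_method("#{attribute_name}=") do |val|
      # super needs explicit arguments inside define_method
      super(val.is_a?(String) ? val.slice(0, limit) : val)
    end
  end
end

class Something
  include GeneratedAccessors
  extend TruncateToLimit
  truncate_to_limit :short_attr, 4
end

model = Something.new
model.short_attr = '123456789'
puts model.short_attr   # 1234
model.short_attr = nil
p model.short_attr      # nil
```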


Best way for getting the count of a query set after updating elements

I am looking for the best way to get the size of a query set in rails. However, the elements are updated in a loop and I need the count of the elements before the update. Here is some example (BUGGY !) code.
p = participations.where(invited_at: nil).limit(50)
p.each do |participation|
  # Invite may raise an exception, but also contains operations that
  # cannot be undone
  participation.invite()
  participation.invited_at = Time.zone.now
  participation.save
end
DoStuff() if p.count > 0
This code does not work, because the call to p.count issues a new database query that does not include the records that were updated in the loop. Therefore, if there are fewer than 50 records, they are all updated and DoStuff() is not called.
What would be the most idiomatic way in Rails to handle this:
1. Move the if p.count part out of the loop and only enter the loop if there are any records?
2. Replace p.count by p.size (if I understand size correctly, this should not cause any additional query)
3. Count the number of iterations in the loop and then use that number
I have a feeling that 1 is most idiomatic in Ruby, but I don't have much experience in this language.
EDIT: Improved example to somewhat closer to the original code.
EDIT:
The problem is not about the update queries performed on the participations in the loop. These queries should be individual queries to keep track of which participations have already been handled, even if an error is raised. Rather, the problem is that DoStuff() should be called whenever any records have been processed in the loop. However, because count performs a new query AFTER the records have been handled, if there are fewer than 50 elements to be handled, all will be updated and DoStuff() will not be called.
That's the difference between count, which always executes a query, and size, which returns the number of loaded objects if they are loaded and falls back to count otherwise. So the easiest fix is to replace count with size.
But each returns the collection over which it iterates, so you can also do DoStuff() if p.each(&block).any? (which doesn't look pretty if you have a multiline block).
A cleaner way, which doesn't require the reviewer to know the difference between size and count and doesn't check whether each returned a collection with at least one element, is to encapsulate your code in a method and add a guard clause.
def process_invitations
  p = participations.where(invited_at: nil).limit(50)
  return if p.none?
  p.each do |participation|
    # Invite may raise an exception, but also contains operations that
    # cannot be undone
    participation.invite()
    participation.invited_at = Time.zone.now
    participation.save
  end
  DoStuff()
end
You can even remove the limit and use p.first(50).each do
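The count versus size difference described above can be imitated without a database. The sketch below uses a toy class, ToyRelation, invented purely for illustration. It is not ActiveRecord and only mimics when a "query" is run, under the assumption that size reuses already loaded records while count always asks the table again.

```ruby
# Toy stand-in for a relation such as participations.where(invited_at: nil).
class ToyRelation
  attr_reader :queries

  def initialize(rows)
    @rows = rows      # the "table"
    @loaded = nil     # records fetched by the first iteration
    @queries = 0
  end

  # Always runs a fresh "query" against the current table contents.
  def count
    @queries += 1
    @rows.count { |row| row[:invited_at].nil? }
  end

  # Loads the matching rows once, then iterates over the loaded copy.
  def each(&block)
    unless @loaded
      @queries += 1
      @loaded = @rows.select { |row| row[:invited_at].nil? }
    end
    @loaded.each(&block)
    self
  end

  # Uses the loaded rows when present, otherwise falls back to count.
  def size
    @loaded ? @loaded.size : count
  end
end

table = [{ invited_at: nil }, { invited_at: nil }]
pending = ToyRelation.new(table)
pending.each { |row| row[:invited_at] = Time.now }

puts pending.size    # 2
puts pending.count   # 0
```

After the loop, size still reports the two records that were processed, while count re-queries and finds none left, which is the behaviour that kept DoStuff() from running.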
"Move the if p.count part out of the loop and only enter the loop if there are any records?"
.each does not enter the loop if there are no records. If you want proof, try:
MyModel.none.each { puts "Hello world" }
If you want a more idiomatic way then don't use #each if you care about the results. #each should only be used when you only care about the side effects of the iteration.
Instead use #map or one of the many other iteration methods.
def process_invitations
  p = participations.where(invited_at: nil).limit(50)
  p.map do |participation|
    # Invite may raise an exception, but also contains operations that
    # cannot be undone
    participation.invite()
    participation.invited_at = Time.zone.now
    participation.save
  end.yield_self do |updates|
    # run do_stuff if any of the records were updated
    do_stuff() if updates.any?
  end
end
Or if you, for example, only wanted to do_stuff to the records that were updated:
def process_invitations
  p = participations.where(invited_at: nil).limit(50)
  p.map do |participation|
    # Invite may raise an exception, but also contains operations that
    # cannot be undone
    participation.invite()
    participation.invited_at = Time.zone.now
    participation if participation.save
  end.compact.yield_self do |updated|
    do_stuff(updated) if updated.any?
  end
end
Or do_stuff to each of the records that were updated:
def process_invitations
  p = participations.where(invited_at: nil).limit(50)
  p.map do |participation|
    # Invite may raise an exception, but also contains operations that
    # cannot be undone
    participation.invite()
    participation.invited_at = Time.zone.now
    participation if participation.save
  end.compact.map do |record|
    do_stuff(record)
  end
end

Performance difference in Concern method ran with Hook vs on Model?

Basically I notice a big performance difference in dynamically overriding a getter for ActiveRecord::Base models within an after_initialize hook and simply within the model itself.
Say I have the following Concern:
module Greeter
  extend ActiveSupport::Concern
  included do
    after_initialize { override_get_greeting }
  end
  def override_get_greeting
    self.class::COLS.each do |attr|
      self.class.class_eval do
        define_method attr do
          "Hello #{self[attr]}!"
        end
      end
    end
  end
end
I then have the following model, consisting of a table with names.
CREATE TABLE `names` ( `name` varchar(10) );
INSERT INTO names (name) VALUES ("John");
class Name < ActiveRecord::Base
  COLS = %w(name)
  include Greeter
end
john = Name.where(name: 'John').first
john.name # Hello John!
This works fine. However, if I try to do this in a more Rails-like way, it is significantly slower.
Essentially, I want to simply pass a parameter containing COLS into a Greeter method that then overrides the getters. It'll look something like:
# Greeter
module Greeter
  extend ActiveSupport::Concern
  def override_get_greeting(cols)
    cols.each do |attr|
      self.class.class_eval do
        define_method attr do
          "Hello #{self[attr]}!"
        end
      end
    end
  end
end
# Name
class Name < ActiveRecord::Base
  include Greeter
  override_get_greeting [:name]
end
Now Name.where(name: 'John').first.name # Hello John! is about 2 seconds slower on the first call.
I can't put my finger on it. My assumption is that the application is just slower to start with the first example, but I'm not really sure.
I prefer the second example but the performance difference is a big no-no.
Has anyone come across something like this?
Unless the real application code is radically different to what you've shown above, there's no way this should be causing a 2 second performance hit!
However, it's still a needlessly verbose and inefficient way to write the code: you're redefining the methods on the class every time you initialize an instance.
Instead of using after_initialize, you can just define the methods once. For example, you could put this in the Greeter module:
included do |klass|
  klass::COLS.each do |attr|
    define_method attr do
      "Hello #{self[attr]}!"
    end
  end
end
Also worth noting is that instead of self[attr], you may instead wish to use super(). The behaviour will be the same (assuming no other overrides are present), except that an error will be raised if the column does not exist.
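The "define the methods once" idea can be tried outside Rails as well. The following is a minimal sketch that swaps ActiveSupport::Concern's included block for Ruby's plain self.included hook, and uses a toy class (ToyName, invented here) whose [] reads from a hash rather than a database row.

```ruby
module Greeter
  # Plain-Ruby hook, run once when the module is included in a class.
  def self.included(base)
    base::COLS.each do |attr|
      base.send(:define_method, attr) do
        "Hello #{self[attr]}!"
      end
    end
  end
end

# Toy stand-in for the model: [] reads from a hash instead of a database row.
class ToyName
  COLS = %w(name).freeze

  def initialize(attributes)
    @attributes = attributes
  end

  def [](key)
    @attributes[key.to_s]
  end

  include Greeter   # must come after COLS is defined
end

john = ToyName.new('name' => 'John')
puts john.name   # Hello John!
```

The getters are created a single time at include, so creating further instances does no extra metaprogramming work.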

Database sorting incorrectly in test environment?

I have implemented a solution similar to this to prune my database.
# model.rb
after_create do
  self.class.prune(ENV['VARIABLE_NAME'])
end
def self.prune(max)
  order('created_at DESC').last.destroy! until count <= max
end
This works well in manual testing.
In RSpec, the test looks like this:
# spec/models/model_spec.rb
  before(:each) do
    @model = Model.new
  end
  describe "prune" do
    it "should prune the database when it becomes larger than the allowed size" do
      25.times { create(:model) }
      first_model = Model.first
      expect{create(:model)}.to change{Model.count}.by(0)
      expect{Model.find(first_model.id)}.to raise_error(ActiveRecord::RecordNotFound)
    end
  end
end
The result is
1) Model prune should prune the database when it becomes larger than the allowed size
Failure/Error: expect{Model.find(first_model.id)}.to raise_error(ActiveRecord::RecordNotFound)
expected ActiveRecord::RecordNotFound but nothing was raised
Inspecting the database during the test execution reveals that the call to order('created_at DESC').last is yielding the first instance of the model created in the 25.times block (Model#2) and not the model created in the before(:each) block (Model#1).
If I change the line
25.times { create(:model) }
to
25.times { sleep(1); create(:model) }
the test passes. If I instead sleep(0.1), the test still fails.
Does this mean that if my app creates two or more Model instances within 1 second of each other that it will choose the newest among them when choosing which to destroy (as opposed to the oldest, which is the intended behavior)? Could this be an ActiveRecord or MySQL bug?
Or if not, is there something about the way FactoryGirl or RSpec create records that isn't representative of production? How can I be sure my test represents realistic scenarios?
If the precision of your time column is only one second then you can't distinguish between items created in the same second (when sorting by date only).
If this is a concern in production then you could sort on created_at and id to enforce a deterministic order. From MySQL 5.6 onwards you can also create datetime columns that store fractional seconds. This doesn't eliminate the problem, but it would happen less often.
If it's just in tests then you can also fake time. As of Rails 4.1 (I think) Active Support has the travel test helpers, and there is also the timecop gem.
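The suggestion to sort on created_at and id can be illustrated in plain Ruby. This is only a sketch with made-up rows (the Row struct and the timestamps are assumptions for the example). In the model itself the equivalent would presumably be an ordering such as order('created_at DESC, id DESC').

```ruby
Row = Struct.new(:id, :created_at)

same_second = Time.at(1_400_000_000)
rows = [
  Row.new(2, same_second),
  Row.new(1, same_second),      # oldest row, same timestamp as id 2
  Row.new(3, same_second + 1)
]

# Comparing on [created_at, id] gives a total order even when timestamps tie.
oldest = rows.min_by { |row| [row.created_at, row.id] }
puts oldest.id   # 1
```

With created_at alone, rows 1 and 2 compare as equal and which one comes back depends on the storage order. The id acts as a tie-breaker.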

ActiveRecord undefined method `has_key?' for nil:NilClass error

I have a fairly simple forms-over-data Rails app that calls a remote MySQL 5.5 db. Using Rails 3.2.21, Ruby 1.9.3.
One of the pages in the app is throwing the following error:
NoMethodError in GvpController#input
undefined method `has_key?' for nil:NilClass
app/controllers/gvp_controller.rb:9:in `input'
Here is the offending code from the controller:
class GvpController < ApplicationController
  def input
    # irrelevant stuff
    @list = Vendor.gvp_vendor_names.map { |x| x.vendor_name }
    # more irrelevant stuff
  end
  # other irrelevant methods
end
I'm assuming the call to gvp_vendor_names is returning nil.
Here is the Vendor model class:
class Vendor < ActiveRecord::Base
  establish_connection :vendor_sql
  self.table_name = 'reporting_dw.vp_vendor_mapping'
  scope :gvp_vendor_names, -> {
    select('reporting_dw.vp_vendor_mapping.vendor_name')}
end
I have searched other posts with this error message and so far haven't found one that seems relevant. I am not overriding the initialize method (one possible cause) and I think the syntax is correct (another).
As an additional wrinkle, I am using vagrant for development, so I thought perhaps I'm not successfully communicating with the database from the vagrant box, maybe because of an ssh or permissions issue. To test it, I opened an ssh session on the vagrant box, successfully connected to the db via the command line, ran a select statement and, lo and behold, got the full list of results I was expecting. I also tried it with MySQL Workbench via ssh and had no problems. So it seems I can communicate remotely with the db, execute queries against it, have the proper permissions, etc.
Does anyone have any suggestions as to what the problem might be?
I assume that there are no values in your DB table. That's why the error arises in the controller action when you call gvp_vendor_names and map each record to its vendor_name.
You should handle this case by checking that the object is present before accessing its attributes:
class GvpController < ApplicationController
  def input
    # irrelevant stuff
    @list = Vendor.gvp_vendor_names.map { |x| x.vendor_name if x.present?}
    # more irrelevant stuff
  end
  # other irrelevant methods
end
Done this way, you also need to compact out the nil values. So finally, use this if you want to handle the scenario in the controller:
class GvpController < ApplicationController
  def input
    # irrelevant stuff
    @list = Vendor.gvp_vendor_names.map { |x| x.vendor_name if x.present?}.compact
    # more irrelevant stuff
  end
  # other irrelevant methods
end
The real problem may just be that I'm a Rails/ActiveRecord n00b. After a little more experimentation, I found the following changes corrected the error.
In the model I added attr_accessible and then used engineersmnky's suggestion of using a method rather than scope, as follows:
class Vendor < ActiveRecord::Base
  establish_connection :vendor_sql
  attr_accessible :vendor_name
  self.table_name = 'reporting_dw.vp_vendor_mapping'
  def self.gvp_vendor_names
    pluck(:vendor_name).sort
  end
end
Then in the controller:
class GvpController < ApplicationController
  def input
    #irrelevant stuff
    @list = Vendor.gvp_vendor_names
    #irrelevant stuff
  end
end
That fixed it. Thank you everyone for the suggestions!

Overwriting all default accessors in a rails model

class Song < ActiveRecord::Base
  # Uses an integer of seconds to hold the length of the song
  def length=(minutes)
    write_attribute(:length, minutes.to_i * 60)
  end
  def length
    read_attribute(:length) / 60
  end
end
This is a simple example from the Rails API docs.
Is it possible to overwrite the accessors for all attributes of a model without overwriting each one individually?
Are you looking for something like this? I don't know why you would want to do it, but here you go :)
class Song < ActiveRecord::Base
  self.columns_hash.keys.each do |name|
    define_method :"#{name}=" do |value|
      # set
    end
    define_method :"#{name}" do
      # get
    end
    # OR
    class_eval(<<-METHOD, __FILE__, __LINE__ + 1)
      def #{name}=(value)
        # set
      end
      def #{name}
        # get
      end
    METHOD
  end
end
I'm not sure of a use case where this would be a good idea. However, all rails models dynamically have their properties assigned to them (assuming it isn't already in the class). The answer is partially in your question.
You can override the read_attribute() and write_attribute() methods. That would apply your transformations to every attribute whether they were written to by the accessor or populated in bulk in the controller. Just be careful to not mutate important attributes like the 'id' attribute.
Ruby has a shortcut that is used in Rails code a fair bit that can help you: the %w literal. %w creates an array of strings from the whitespace-separated words inside the parentheses (no commas or quotes needed). Because it's an array you can do useful things like this:
# A constant, so that it is visible from the instance methods below
EXCLUDES = %w(id name)
def read_attribute name
  value = super
  if(not EXCLUDES.member? name.to_s)
    value = value.to_i * 60
  end
  value
end
def write_attribute name, value
  if(not EXCLUDES.member? name.to_s)
    value = value.to_i / 60
  end
  super
end
That should get you started. There are more advanced constructs like using lambdas, etc. Keep in mind you should write some thorough unit tests to make sure you don't have any unintended consequences. You may have to include more attribute names in the list of excludes.
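For a quick feel of how this behaves, here is a minimal plain-Ruby sketch. ToyRecord and ToySong are invented stand-ins (a hash-backed read_attribute instead of ActiveRecord's), and the conversion follows the seconds-to-minutes direction of the Song example in the question. It also shows that %w splits on whitespace only, so a comma ends up inside the word.

```ruby
p %w(id name)    # ["id", "name"]
p %w(id, name)   # ["id,", "name"]

# Stand-in base class: read_attribute just reads from a hash.
class ToyRecord
  def initialize(attributes)
    @attributes = attributes
  end

  def read_attribute(name)
    @attributes[name.to_s]
  end
end

class ToySong < ToyRecord
  EXCLUDES = %w(id name).freeze

  def read_attribute(name)
    value = super
    return value if EXCLUDES.include?(name.to_s)
    value.to_i / 60   # stored seconds, shown as whole minutes
  end
end

song = ToySong.new('id' => 7, 'name' => 'Intro', 'length' => 180)
p song.read_attribute(:length)   # 3
p song.read_attribute(:id)       # 7
```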
edit: (read|write)_attributes -> (read|write)_attribute