Recalculate Counter Cache of 120k Records [Rails / ActiveRecord] - mysql

The situation is as follows:
I have a Poi model that has many pictures (1:n). I want to recalculate the counter_cache column because its values are inconsistent.
I've tried iterating over each record in Ruby, but this takes far too long and sometimes aborts with a "segmentation fault".
So I wonder whether it's possible to do this with a raw SQL query?

If, for example, you have Post and Picture models, and Post has_many :pictures, you can do it in a single query with update_all:
Post.update_all("pictures_count=(Select count(*) from pictures where pictures.post_id=posts.id)")

I found a nice solution on krautcomputing.
It uses reflections to find all counter caches of a project, SQL queries to find only the objects that are inconsistent, and Rails' reset_counters to clean things up.
Unfortunately it only works with "conventional" counter caches (no class name, no custom counter cache names), so I refined it:
Rails.application.eager_load!
ActiveRecord::Base.descendants.each do |many_class|
  many_class.reflections.each do |name, reflection|
    if reflection.options[:counter_cache]
      one_class = reflection.class_name.constantize
      one_table, many_table = [one_class, many_class].map(&:table_name)
      # more reflections, use :inverse_of, :counter_cache etc.
      inverse_of = reflection.options[:inverse_of]
      counter_cache = reflection.options[:counter_cache]
      if counter_cache === true
        counter_cache = "#{many_table}_count"
        inverse_of ||= many_table.to_sym
      else
        inverse_of ||= counter_cache.to_s.sub(/_count$/,'').to_sym
      end
      ids = one_class
        .joins(inverse_of)
        .group("#{one_table}.id")
        .having("MAX(#{one_table}.#{counter_cache}) != COUNT(#{many_table}.id)")
        .pluck("#{one_table}.id")
      ids.each do |id|
        puts "reset #{id} on #{many_table}"
        one_class.reset_counters id, inverse_of
      end
    end
  end
end
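The "find only the inconsistent rows" idea can be sketched in plain Ruby, without a database. This is only an illustration under stated assumptions: the Post and Picture structs, the sample data, and the stale_counter_ids helper are hypothetical stand-ins, not part of Rails.

```ruby
# Hypothetical in-memory stand-ins for rows of the two tables.
Post = Struct.new(:id, :pictures_count)
Picture = Struct.new(:id, :post_id)

# Returns the ids of posts whose cached count differs from the real count.
def stale_counter_ids(posts, pictures)
  actual = pictures.map(&:post_id).tally # post_id => real number of pictures
  posts.reject { |post| post.pictures_count == actual.fetch(post.id, 0) }.map(&:id)
end

POSTS = [Post.new(1, 2), Post.new(2, 5), Post.new(3, 1), Post.new(4, 0)]
PICTURES = [Picture.new(10, 1), Picture.new(11, 1), Picture.new(12, 2)]

# Post 2 caches 5 but has 1 picture; post 3 caches 1 but has none.
STALE_IDS = stale_counter_ids(POSTS, PICTURES)
```

Note that this sketch also reports parents that have no children at all (post 3 here). A query built with an inner join, such as joins, would not return such rows, so they may need separate handling.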

Related

Should I make calculations using Ruby, or should I make them using MySQL?

I'm building a reporting system for a chat made in Ruby on Rails but received some comments telling me that my approach is inefficient.
Here's a little sample of how my reports work:
I have a handler that is called each month, which calls a Report mailer like this:
ReportMailer.monthly_report(user).deliver_later
This is how the mailer looks:
class ReportMailer < ApplicationMailer
  default from: ENV["DEFAULT_MAILER_FROM"],
          template_path: 'mailers/report_mailer'
  def monthly_report(agent)
    @agent = agent
    @organization = agent.organization
    @report = Report.new @organization
    mail(to: agent.email, subject: @report.email_subject)
  end
end
I'm trying to calculate the data using a "plain old" Ruby class:
module Reports
  class Component < Report
    def initialize(subject)
      @component = subject
      @cache = {}
    end
    attr_reader :component
    # DELEGATIONS
    # -----------------------
    delegate :chat_messages, to: :component
    def response_count
      count = 0
      explore_msgs { |msg, next_msg| count += 1 if response? msg, next_msg }
      return count
    end
    def response_time
      time = 0
      explore_msgs { |msg, next_msg| time += time_difference msg, next_msg if response? msg, next_msg }
      return time.to_i.seconds
    end
    def avg_response_time
      @cache[__method__] ||= (response_time / response_count if response_count > 0)
    end
    private
    def response?(msg, next_msg)
      next_msg&.user_type == 'Agent' && msg.user_type == 'User' && msg.conversation_id == next_msg.conversation_id && time_difference(msg, next_msg).seconds < 8.hours
    end
    def time_difference(msg, next_msg)
      (next_msg.created_at - msg.created_at).abs
    end
    def explore_msgs
      chat_messages.each_with_index do |msg, i|
        next_msg = chat_messages[i+1]
        yield msg, next_msg
      end
    end
  end
end
I'm concerned with improving performance. I implemented a simple caching system into the class in charge of making the calculations which made huge improvements in the system efficiency, however, I'm concerned that making these calculations in Ruby might create bottlenecks or that it might not be a scalable solution.
It could be faster. The problem I see is that you are looking at one record and the record that follows it. So how would you get the database to compare the two records?
In straight SQL, I would join the table to itself, group by the first instance of the table and do a min(created_at) on the second instance of the table.
Using our companies table, the SQL looks like this:
select rc1.id, rc1.created_at, min(rc2.created_at)
from companies rc1 inner join companies rc2 on rc1.created_at < rc2.created_at
group by rc1.id
You can add the difference to the select.
This will certainly be slow if the created_at field is not indexed and the number of records in the table is large.
You can add the test for Agent and User to the having clause.
The query is tricky and the database might not be able to do this fast. It will also be tricky if you try to get ActiveRecord to build the query for you.
However, I think everything you are trying to do in your code can be done this way by the database.
Your query might look like this:
select chat_messages.*,
  min(next_msg.created_at) as next_created_at,
  min(next_msg.created_at) - chat_messages.created_at as created_at_diff
from chat_messages inner join chat_messages next_msg
  on chat_messages.created_at < next_msg.created_at
  and chat_messages.user_type = 'User'
group by chat_messages.id
having next_msg.user_type = 'Agent'
  and TIMESTAMPDIFF(HOUR, chat_messages.created_at, min(next_msg.created_at)) < 8
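For comparison, the pairwise "message and the message after it" check from the question can be sketched in plain Ruby with each_cons, which walks adjacent pairs in a single pass and collects the count and the total time together. This is a minimal sketch under assumptions: the Msg struct, the sample data, and the response_stats helper are hypothetical, and times are plain seconds rather than ActiveSupport durations.

```ruby
# Hypothetical stand-in for a chat_messages row; only the fields the check needs.
Msg = Struct.new(:user_type, :conversation_id, :created_at)

MAX_GAP = 8 * 3600 # eight hours, in seconds

# True when next_msg is an agent reply to a user message in the same conversation.
def response?(msg, next_msg)
  msg.user_type == 'User' &&
    next_msg.user_type == 'Agent' &&
    msg.conversation_id == next_msg.conversation_id &&
    (next_msg.created_at - msg.created_at).abs < MAX_GAP
end

# Returns [response_count, total_response_seconds] in one pass over adjacent pairs.
def response_stats(messages)
  messages.each_cons(2).reduce([0, 0.0]) do |(count, total), (msg, next_msg)|
    if response?(msg, next_msg)
      [count + 1, total + (next_msg.created_at - msg.created_at).abs]
    else
      [count, total]
    end
  end
end

T0 = Time.utc(2020, 1, 1, 12, 0, 0)
MESSAGES = [
  Msg.new('User',  1, T0),
  Msg.new('Agent', 1, T0 + 60),              # reply after 60 seconds
  Msg.new('User',  1, T0 + 120),
  Msg.new('User',  2, T0 + 130),             # next message is not from an agent
  Msg.new('Agent', 2, T0 + 130 + 9 * 3600)   # reply arrives after 9 hours, too late
]
STATS = response_stats(MESSAGES)
```

Only the first pair qualifies here, so the average is the total divided by a count of one. With real records, loading the messages into an array once before iterating avoids re-querying inside the loop.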

Rails update multiple record with hash

I need to update my data iteratively.
But the way I currently do it is far too time-consuming.
Can I update multiple records at once with an id-value hash?
SUBST = ''.freeze
re = /<p>|<\/p>/m
(1..1000).each do |id|
  choice = QuestionChoice.find id
  choice.selections.gsub!(re, SUBST)
  choice.save! if choice.changed?
end
Update:
I found out my code could be improved by using where, like the following:
QuestionChoice.where(id: (1..1000)).each do |choice|
  choice.selections.gsub!(re, SUBST)
  choice.save! if choice.changed?
end
But I still need to call save! for every record, which costs a lot of time.
You are hitting the db 1000 times sequentially to fetch each record separately. Try using a single query to get all the records you need to update:
SUBST = ''.freeze
re = /<p>|<\/p>/m
QuestionChoice.where('id <= 1000').map do |q|
  q.selections.gsub!(re, SUBST)
  q.save! if q.changed?
end
I faced this problem before and solved it. Try the following:
MySQL 8.0+:
QuestionChoice.where(id: 1..1000).update_all("selections = REGEXP_REPLACE(selections, '<p>|<\/p>', '')")
Others (REPLACE matches literal strings, not patterns, so nest one call per tag):
QuestionChoice.where(id: 1..1000).update_all("selections = REPLACE(REPLACE(selections, '<p>', ''), '</p>', '')")
or, to touch only the rows that actually contain a tag:
QuestionChoice.where(id: 1..1000).where("selections RLIKE '<p>|</p>'").update_all(%{
  selections = REPLACE(REPLACE(selections, '<p>', ''), '</p>', '')
})
IMPORTANT: You may need to add extra backslashes (\) to the regex pattern inside the SQL string, depending on how the string is quoted.
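If the cleanup has to happen in Ruby rather than in SQL, the id-value hash the question asks about can be built first, so that only rows that actually change are written. A minimal sketch, assuming the rows are already loaded as a plain id => text hash; ROWS and changed_selections are hypothetical names used only for illustration.

```ruby
TAG_RE = /<p>|<\/p>/

# rows: hash of id => selections string (stand-in for already loaded records).
# Returns id => { selections: cleaned } for the rows whose text actually changes.
def changed_selections(rows)
  rows.each_with_object({}) do |(id, text), changes|
    cleaned = text.gsub(TAG_RE, '')
    changes[id] = { selections: cleaned } unless cleaned == text
  end
end

ROWS = { 1 => '<p>yes</p>', 2 => 'no', 3 => '<p>maybe' }
CHANGES = changed_selections(ROWS)
```

The result has the shape that ActiveRecord's class-level update accepts, for example QuestionChoice.update(CHANGES.keys, CHANGES.values), although that call still issues one UPDATE per record, so the pure SQL variants remain the faster option.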

I need some help implementing eager loading

I've recently been tipped off to eager loading and its necessity in improving performance. I've managed to cut a few queries from loading this page, but I suspect that I can trim them down significantly more if I can eager-load the needed records correctly.
This controller needs to load all of the following to fill the view:
A Student
The seminar (class) page that the student is viewing
All of the objectives included in that seminar
The objective_seminars, the join table between objectives and seminars. This includes the column "priority" which is set by the teacher and used in ordering the objectives.
The objective_students, another join table. Includes a column "points" for the student's score on that objective.
The seminar_students, one last join table. Includes some settings that the student can adjust.
Controller:
def student_view
  @student = Student.includes(:objective_students).find(params[:student])
  @seminar = Seminar.includes(:objective_seminars).find(params[:id])
  @oss = @seminar.objective_seminars.includes(:objective).order(:priority)
  @objectives = @seminar.objectives.order(:name)
  objective_ids = @objectives.map(&:id)
  @student_scores = @student.objective_students.where(:objective_id => objective_ids)
  @ss = @student.seminar_students.find_by(:seminar => @seminar)
  @teacher = @seminar.user
  @teach_options = teach_options(@student, @seminar, 5)
  @learn_options = learn_options(@student, @seminar, 5)
end
The method below is where a lot of duplicate queries are occurring that I thought were supposed to be eliminated by eager loading. This method gives the student six options so she can choose one objective to teach her classmates. The method looks first at objectives where the student has scored between 75% and 99%. Within that bracket, they are also sorted by "priority" (from the objective_seminars join table. This value is set by the teacher.) If there is room for more, then the method looks at objectives where the student has scored 100%, sorted by priority. (The learn_options method is practically the same as this method, but with different bracket numbers.)
teach_options method:
def teach_options(student, seminar, list_limit)
  teach_opt_array = []
  [[70,99],[100,100]].each do |n|
    @oss.each do |os|
      obj = os.objective
      this_score = @student_scores.find_by(:objective => obj)
      if this_score
        this_points = this_score.points
        teach_opt_array.push(obj) if (this_points >= n[0] && this_points <= n[1])
      end
    end
    break if teach_opt_array.length > list_limit
  end
  return teach_opt_array
end
Thank you in advance for any insight!
@jeff - Regarding your question, I don't see where a lot of queries would be happening other than @student_scores.find_by(:objective => obj), which runs once per objective on every pass of the loop.
Your @student_scores object is already an ActiveRecord relation, correct? Note that .where() on it builds and runs a new query, just like find_by. .select{} with a block loads the relation once and then filters in memory without hitting the db again. Select will leave you with an array though, rather than an AR Relation, so be careful there.
this_score = @student_scores.where(objective: obj)
this_score = @student_scores.select{|score| score.objective_id == obj.id}
Both return a collection rather than a single record, so take .first before reading points.
Just some other suggestions on your top controller method - I don't see any guards or defensive coding, so if any of those objects are nil, your .order(:blah) is probably going to error out. Additionally, if they return nil, your subsequent queries which rely on their data could error out. I'd opt for some try()s or rescues.
Last, just being nitpicky, but those first two lines are a little hard to read, in that you could mistakenly interpret the params as being applied to the includes as well as the main object:
@student = Student.includes(:objective_students).find(params[:student])
@seminar = Seminar.includes(:objective_seminars).find(params[:id])
Note that find has to stay last in the chain: it returns a record rather than a relation, and a record has no includes method. Splitting the chain across lines makes the order easier to read:
@student = Student
  .includes(:objective_students)
  .find(params[:student])
@seminar = Seminar
  .includes(:objective_seminars)
  .find(params[:id])
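One way to remove the per-objective lookup entirely is to build a hash of points keyed by objective id once, then read from it inside the loop. The sketch below uses plain structs as hypothetical stand-ins for the ActiveRecord rows, and passes the collections in as arguments instead of using instance variables; the score brackets are taken from the code in the question.

```ruby
# Hypothetical stand-ins for the ActiveRecord rows in the question.
Objective = Struct.new(:id, :name)
ObjectiveSeminar = Struct.new(:objective, :priority)
ObjectiveStudent = Struct.new(:objective_id, :points)

# oss is assumed to be ordered by priority already.
def teach_options(oss, student_scores, list_limit)
  # Built once: objective_id => points, so the loop does hash lookups only.
  points_by_objective = student_scores.to_h { |s| [s.objective_id, s.points] }
  options = []
  [[70, 99], [100, 100]].each do |low, high|
    oss.each do |os|
      points = points_by_objective[os.objective.id]
      options << os.objective if points && points.between?(low, high)
    end
    break if options.length > list_limit
  end
  options
end

OBJECTIVES = [Objective.new(1, 'A'), Objective.new(2, 'B'),
              Objective.new(3, 'C'), Objective.new(4, 'D')]
OSS = OBJECTIVES.each_with_index.map { |o, i| ObjectiveSeminar.new(o, i) }
SCORES = [ObjectiveStudent.new(1, 100), ObjectiveStudent.new(2, 80),
          ObjectiveStudent.new(3, 50)]

PICKED = teach_options(OSS, SCORES, 5).map(&:name)
```

In a Rails app the same hash could be built from the loaded relation, for example with index_by(&:objective_id), so the scores are fetched with a single query.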

Can Kaminari be used twice in the same controller def?

I am new and I am trying to figure out if Kaminari can be used twice in the same controller def as in my example below. I ultimately want to be able to paginate and display two sets of search results on the same page....
Ex:
def whatever
  @page = params[:page] ||= 1
  @per = params[:per] ||= 32
  @code = query (1st query)
  @code = query.uniq.to_a
  @code = Kaminari.paginate_array(@code).page(@page).per(@per)
  @code2 = query (2nd query)
  @code2 = query.uniq.to_a
  @code2 = Kaminari.paginate_array(@code2).page(@page).per(@per)
end
Kaminari is not limited to one call per action. It adds pagination methods not only to arrays but also integrates with ActiveRecord, so you can call it directly on any query. It then translates the pagination into a MySQL LIMIT + OFFSET, which is better than paginating an array.
@data = Query.page(params[:page]).per(32)
If page is nil, page 1 is used.
It also has many other features, such as global per-app settings and per-model settings, which I don't think would be a good idea to list here. You can refer to the gem README for more details.
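When two lists are paginated on the same page, each list usually needs its own page parameter, otherwise both move together. The plain-Ruby sketch below only illustrates that idea: page_slice and the code_page / code2_page parameter names are hypothetical, and it is not Kaminari's implementation.

```ruby
# Returns the items for a 1-based page; nil or invalid pages fall back to page 1.
def page_slice(items, page, per)
  page = [page.to_i, 1].max
  items.slice((page - 1) * per, per) || []
end

PARAMS = { code_page: '2' } # hypothetical request params; code2_page is absent
CODE  = (1..10).to_a
CODE2 = ('a'..'j').to_a

CODE_PAGE  = page_slice(CODE,  PARAMS[:code_page],  4) # second page of the first list
CODE2_PAGE = page_slice(CODE2, PARAMS[:code2_page], 4) # first page of the second list
```

With Kaminari itself, the controller would read the two parameters separately, and as far as I know the paginate view helper takes a param_name option so that each set of links writes to its own parameter.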

Rails select random record

I don't know if I'm just looking in the wrong places here or what, but does active record have a method for retrieving a random object?
Something like?
@user = User.random
Or... well, since that method doesn't exist, is there some amazing "Rails Way" of doing this? I always seem to be too verbose. I'm using MySQL as well.
Most of the examples I've seen that do this end up counting the rows in the table, then generating a random number to choose one. This is because alternatives such as RAND() are inefficient in that they actually get every row and assign them a random number, or so I've read (and are database specific I think).
You can add a method like the one I found here.
module ActiveRecord
  class Base
    def self.random
      if (c = count) != 0
        find(:first, :offset =>rand(c))
      end
    end
  end
end
This will make it so any Model you use has a method called random which works in the way I described above: generates a random number within the count of the rows in the table, then fetches the row associated with that random number. So basically, you're only doing one fetch which is what you probably prefer :)
You can also take a look at this rails plugin.
We found that offsets ran very slowly on MySql for a large table. Instead of using offset like:
model.find(:first, :offset =>rand(c))
...we found the following technique ran more than 10x faster (fixed off by 1):
max_id = Model.maximum("id")
min_id = Model.minimum("id")
id_range = max_id - min_id + 1
random_id = min_id + rand(id_range).to_i
Model.find(:first, :conditions => "id >= #{random_id}", :limit => 1, :order => "id")
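The id-range technique can be illustrated without a database: pick a random number between the smallest and largest id, then take the first existing id at or above it. This is a sketch under assumptions; random_existing_id and the sample ids are hypothetical, and the sorted array stands in for the indexed id column.

```ruby
# ids: sorted array of existing primary keys (gaps allowed).
# rng: injectable random source, so the behaviour can be tested.
def random_existing_id(ids, rng: Random.new)
  return nil if ids.empty?
  target = rng.rand(ids.first..ids.last)
  # First id >= target, mirroring "id >= random_id ORDER BY id LIMIT 1".
  ids.bsearch { |id| id >= target }
end

IDS = [1, 2, 3, 10] # ids 4 to 9 were deleted
PICK = random_existing_id(IDS)
```

One trade-off worth knowing: ids that follow a gap are chosen more often. In this sample, any target from 4 to 10 resolves to id 10, so the distribution is only uniform when the ids are dense.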
Try using Array's sample method:
@user = User.all.sample(1)
In Rails 4 I would extend ActiveRecord::Relation:
class ActiveRecord::Relation
  def random
    offset(rand(count))
  end
end
This way you can use scopes:
SomeModel.all.random.first # Return one random record
SomeModel.some_scope.another_scope.random.first
I'd use a named scope. Just throw this into your User model.
named_scope :random, :order=>'RAND()', :limit=>1
The random function isn't the same in each database though. SQLite and others use RANDOM() but you'll need to use RAND() for MySQL.
If you'd like to be able to grab more than one random row you can try this.
named_scope :random, lambda { |*args| { :order=>'RAND()', :limit=>args[0] || 1 } }
If you call User.random it will default to 1 but you can also call User.random(3) if you want more than one.
If you would need a random record but only within certain criteria you could use "random_where" from this code:
module ActiveRecord
  class Base
    def self.random
      if (c = count) != 0
        find(:first, :offset =>rand(c))
      end
    end
    def self.random_where(*params)
      if (c = where(*params).count) != 0
        where(*params).find(:first, :offset =>rand(c))
      end
    end
  end
end
For example:
@user = User.random_where("active = 1")
This function is very useful for displaying random products based on some additional criteria.
I strongly recommend this gem for random records; it is specially designed for tables with lots of rows:
https://github.com/haopingfan/quick_random_records
Simple usage:
@user = User.random_records(1).take
All other answers perform badly on a large database, except this gem:
quick_random_records only cost 4.6ms in total.
the accepted answer User.order('RAND()').limit(10) cost 733.0ms.
the offset approach cost 245.4ms in total.
the User.all.sample(10) approach cost 573.4ms.
Note: My table only has 120,000 users. The more records you have, the larger the performance difference will be.
UPDATE:
Performed on a table with 550,000 rows:
Model.where(id: Model.pluck(:id).sample(10)) cost 1384.0ms
gem: quick_random_records only cost 6.4ms in total
Here is a simple solution for getting random records from the database.
Rails makes this easy to use.
To get random records from the DB, use sample. Below is its description, with examples.
Backport of Array#sample based on Marc-Andre Lafortune’s github.com/marcandre/backports/ Returns a random element or n random elements from the array. If the array is empty and n is nil, returns nil. If n is passed and its value is less than 0, it raises an ArgumentError exception. If the array is empty and n is 0 or greater, it returns [].
[1,2,3,4,5,6].sample # => 4
[1,2,3,4,5,6].sample(3) # => [2, 4, 5]
[1,2,3,4,5,6].sample(-3) # => ArgumentError: negative array size
[].sample # => nil
[].sample(3) # => []
You can combine it with a condition as required, as in the example below.
User.where(active: true).sample(5)
This returns 5 random active users from the User table. Note that sample runs in Ruby, so all matching rows are loaded first.
For more help please visit: http://apidock.com/rails/Array/sample