Best practice: Which information should I store in my database? - mysql

Currently I am developing a small book rating app, where users can rate and comment on books.
Of course I have a book model:
class Book < ActiveRecord::Base
  has_many :ratings
end
and a rating model:
class Rating < ActiveRecord::Base
  belongs_to :book
end
The "overall rating value" of a rating object is calculated by different rating categories (e.g. readability, ... ). Furthermore the overall rating of one book should be calculated by all given ratings.
Now the question I am asking myself: Should I calculate/query the overall rating for every book EVERYTIME someone visits my page or should I add a field to my book model where the overall rating is (periodically) calculated and saved?
EDIT: The "calculation" I would use in this case is a simple average determination.
Example: A book has about 200 ratings. Every rating is a composition of 10 category ratings. So I want to determine the average of one rating and, in the end, of all 200 ratings.

If the averaging of those ratings is not computationally expensive (i.e. doesn't take a long time), then just calculate it on the fly. This is in keeping with the idea of not prematurely optimising (see http://c2.com/cgi/wiki?PrematureOptimization).
However, if you do want to optimise this calculation, then storing it on the book model and updating it whenever a rating is written is the way to go. This is known as "caching" the result. Here is some code that will cache the average rating in the database. (There are other ways of caching.)
class Book < ActiveRecord::Base
  has_many :ratings, after_add: :update_average_rating

  # after_add passes in the record that was just added
  def update_average_rating(_rating)
    update_attribute(:average_rating, calculate_average_rating)
  end

  # Named so it does not shadow the average_rating column reader
  def calculate_average_rating
    return 0.0 if ratings.empty?
    rating_sum.to_f / ratings.count
  end

  def rating_sum
    ratings.reduce(0) do |sum, rating|
      sum + rating.value # assuming the rating model has a value attribute
    end
  end
end
class Rating < ActiveRecord::Base
  belongs_to :book
end
Note: the above code assumes the presence of an average_rating column on your book table in your database. Remember to add this column with a migration.
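A minimal migration sketch for that column (the name and type here are assumptions chosen to match the code above):
class AddAverageRatingToBooks < ActiveRecord::Migration
  def change
    add_column :books, :average_rating, :float, default: 0.0
  end
end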

DB
The most efficient (although not conventional) way is to run a db-level aggregate query, allowing you to calculate the AVG or SUM of the ratings with each book call:
#app/models/book.rb
class Book < ActiveRecord::Base
  def reviews_avg(category = nil)
    # Quote user input instead of interpolating it, to avoid SQL injection
    cat = category ? "AND `category` = #{self.class.connection.quote(category)}" : ""
    sql = "SELECT AVG(`rating`) FROM `reviews` WHERE `book_id` = #{id} #{cat}"
    ActiveRecord::Base.connection.select_value(sql).to_f
  end
end
This would allow:
@book = Book.find x
@book.reviews_avg # -> 3.5
@book.reviews_avg "readability" # -> 5
This is the most efficient approach because the aggregation is handled entirely by the DB.
Rails
You should use the average functionality of Rails:
#app/models/book.rb
class Book < ActiveRecord::Base
  has_many :reviews do
    def average(category = nil)
      scope = category ? where(category: category) : self
      # calculate(:average, ...) avoids recursing into this extension method
      scope.calculate(:average, :rating)
    end
  end
end
The above will give you the ability to take an instance of @book and evaluate the average of its ratings:
@book = Book.find x
@book.reviews.average #-> 3.5
@book.reviews.average "readability" #-> 5
--
You could also use a class method / scope on Review:
#app/models/review.rb
class Review < ActiveRecord::Base
  scope :avg, ->(category = nil) {
    category ? where(category: category).average(:rating) : average(:rating)
  }
end
This would allow you to call:
@book = Book.find x
@book.reviews.avg #-> 3.5
@book.reviews.avg "readability" #-> 5
Association Extensions
A different way (not tested) would be to use the proxy_association.target object in an ActiveRecord Association Extension.
Whilst not as efficient as a DB-level query, it gives you the ability to perform the calculation in memory:
#app/models/book.rb
class Book < ActiveRecord::Base
  has_many :reviews do
    def avg(category = nil)
      reviews = proxy_association.target # the Review records already loaded in memory
      reviews = reviews.select { |review| review.category == category } if category
      ratings = reviews.map(&:rating)
      ratings.inject(:+).to_f / reviews.size
    end
  end
end
This would allow you to call:
@book = Book.find x
@book.reviews.avg # -> 3.5
@book.reviews.avg "readability" # -> 5

There is no need at all to recalculate the overall average rating on every page visit, since it only changes when somebody actually rates the book. So just use a field like AVG_RATING and update its value whenever a new rating is given.
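A minimal sketch of that update-on-write approach (the avg_rating column and the value attribute are assumptions):
class Rating < ActiveRecord::Base
  belongs_to :book
  after_save :refresh_book_average

  private

  def refresh_book_average
    # update_column writes without validations/callbacks; cheap and sufficient here
    book.update_column(:avg_rating, book.ratings.average(:value))
  end
end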

Have you considered using a cached version of the rating?
rating = Rails.cache.fetch("book_#{id}_rating", expires_in: 5.minutes) do
  # do the actual rating calculation here
end
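To avoid serving a stale value for up to five minutes, you could also expire the entry whenever a rating is written; a sketch using the same (assumed) key format:
class Rating < ActiveRecord::Base
  belongs_to :book
  after_save { Rails.cache.delete("book_#{book_id}_rating") }
end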

In most cases you can get averages simply by querying the database:
average = book.reviews.average(:rating)
And in most cases it's not going to be expensive enough that querying per request is a real problem; premature optimization might be a waste of time and resources, as Neil Atkinson points out.
However when the cost of calculation becomes an issue there are several approaches to consider which depend on the nature of the calculated data.
If the calculated data merits being a resource of its own, you would save it in the database; for example, reports that are produced on a regular basis (daily, monthly, annual) and need to be query-able.
Otherwise, if the calculated data has a high "churn rate" (many reviews are created daily), you would use caching to avoid the expensive query where possible; stuffing the data into your database may lead to an excessive number of slow UPDATE queries and tie up your web or worker processes.
There are many caching approaches that complement each other:
etags to leverage client-side caching - don't re-render if the response has not changed anyway (see the sketch after this list).
fragment caching avoids db queries and re-rendering view chunks for data that has not changed.
model caching in Memcached or Redis can be used to avoid slow queries.
low-level caching can be used to store things like averages.
See Caching with Rails: An overview for more details.
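As a sketch of the etag point above (controller and action names are assumptions), fresh_when lets Rails answer with 304 Not Modified when nothing has changed:
class BooksController < ApplicationController
  def show
    @book = Book.find(params[:id])
    # Sets ETag/Last-Modified headers and skips rendering if the client copy is current
    fresh_when etag: @book, last_modified: @book.updated_at
  end
end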

Related

Best performance wise query to get parent of queried nested resource

I am implementing an availability model nested within a listing. It's for a rental app.
class Listing
has_many :availabilities, dependent: :destroy
end
class Availability
belongs_to :listing
end
The availabilities table has start and end date columns.
I am writing a query, through a search form, to find listings that have availabilities where the date given in the form lies between the start and end dates of those availabilities.
My query in a class method looks like:
def self.search(params)
  date = params[:date]
  listing_ids = Availability.where('startdate <= ?', date).where('enddate >= ?', date).pluck('listing_id')
  products = Listing.where(id: listing_ids)
end
However, I feel this is not efficient. I wish I could write Listing.joins(:availability) and then use it, but Rails won't allow it. I can only join the other way, which gives me a relation of availability objects, and I want listings, i.e. the parent resource.
How can I make it more efficient and reduce number of queries I am doing?
Will appreciate your help :)
You should be able to use joins on Listing to get your availability relations. joins works using the association name, not the model name, so instead of joins(:availability) you should use joins(:availabilities). Something like this should work, in a single query:
Listing.joins(:availabilities).where('availabilities.startdate <= ?', date).where('availabilities.enddate >= ?', date)
Notice that joins uses the association name, joins(:availabilities), while the string in the where clause uses the table name, as in where('availabilities.startdate <= ?', date).
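Folding that into the class method from the question keeps it to one query (names taken from the question):
def self.search(params)
  date = params[:date]
  joins(:availabilities)
    .where('availabilities.startdate <= ? AND availabilities.enddate >= ?', date, date)
end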

Rails 4 join group by query for sum amount as per currency and count

Hello, I have these queries:
refund2 = Spree::Order.joins(:refunds).group('currency').sum(:total)
=> {"USD"=>#<BigDecimal:7f896ea15ed8,'0.17641E4',18(18)>, "SGD"=>#<BigDecimal:7f896ea15d98,'0.11184E3',18(18)>, "EUR"=>#<BigDecimal:7f896ea15ca8,'0.1876E3',18(18)>}
refund1 = Spree::Order.joins(:refunds).group('currency').count
=> {"USD"=>2, "SGD"=>1, "EUR"=>2}
refund1.each { |k, v| refund1[k] = [v, refund2[k]] }
=> {"USD"=>[2, #<BigDecimal:7f896f1d83a0,'0.17641E4',18(18)>], "SGD"=>[1, #<BigDecimal:7f896f1d3fa8,'0.11184E3',18(18)>], "EUR"=>[2, #<BigDecimal:7f896f1d3aa8,'0.1876E3',18(18)>]}
refund1 = Spree::Order.joins(:refunds).group('currency').sum(refund.amount)
This is not working; I need to sum the refund amount, not the order total.
I also need to fetch the date, e.g. on 02-08-2017 two orders were refunded for 100 USD.
Please guide me on how to fetch that.
Rails/ActiveRecord is good for relatively easy groupings, and you can group on multiple attributes instead of just the currency, but applying a function to one of the grouped values and returning multiple aggregations (sum and count) requires some effort.
It will also not be very performant unless you either start specifying SQL fragments in your select clause, e.g. select("date_trunc(...), currency, sum(...), count(...)"), or start using Arel (which to me always looks more complex than SQL, with very few redeeming benefits).
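For illustration, a select-fragment version of this case might look like the sketch below (the spree_refunds/spree_orders names follow Spree's table conventions but are assumptions here; DATE() is the MySQL spelling of the date truncation). Note that .sum('spree_refunds.amount') with the table-qualified column is also the direct fix for summing the refund amount rather than the order total.
per_day = Spree::Order.joins(:refunds)
                      .group("DATE(spree_refunds.created_at)", "spree_orders.currency")
                      .select("DATE(spree_refunds.created_at) AS refund_date,
                               spree_orders.currency AS currency,
                               SUM(spree_refunds.amount) AS refunded_total,
                               COUNT(spree_refunds.id) AS refund_count")
per_day.each do |row|
  # The SELECT aliases become readable attributes on each returned object
  puts [row.refund_date, row.currency, row.refunded_total, row.refund_count].inspect
end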
I (because I am quite a SQL-ey person) would be tempted here to place a database view in the system that defines the aggregations that you want at the grouping level you want, and reference that in Rails through a model.
Create View spree_refund_day_currency_agg as select ....;
... and ...
class Spree::RefundDayCurrencyAgg < ActiveRecord::Base
  self.table_name = 'spree_refund_day_currency_agg'

  def readonly?
    true
  end

  belongs_to ....
end
You can then access your aggregated data in the Rails environment as if it were a magically maintained set of data (similar to a materialised view, without the materialisation) in a totally flexible manner (as intended with an RDBMS) using logic defined in Rails.
For example, with scopes defined in the model:
def self.bad_day_in_canada
  where(currency: CANADA_CURR)
end
Not to everyone's taste though, I'm sure.

Performance improvement for Rails associated model aggregation

I have a model called Person, and a Person has many posts. When I want to query the post count for each person, it takes a long time to process, since it needs to iterate over each person and query their posts to get the aggregation.
class Person < ActiveRecord::Base
  has_many :posts
end
Output (JSON):
Person1
  PostsType1Count: 15
  PostsType2Count: 45
Person2
  PostsType3Count: 33
...
I want to calculate all the post counts for each Person in an optimal way. What would be the best solution?
Here's one way to do this, if you have a small, pre-defined set of types:
class Person < ActiveRecord::Base
  has_many :type_1_posts, :class_name => 'Post', :conditions => 'post_type = 1'
  has_many :type_2_posts, :class_name => 'Post', :conditions => 'post_type = 2'
  has_many :type_3_posts, :class_name => 'Post', :conditions => 'post_type = 3'
end
Then you can write code that looks like this to get all the data:
@all_people = Person.includes(:type_1_posts, :type_2_posts, :type_3_posts).all
The eager loading of the posts allows the count of each type of post to be available, as well as all the posts of each type.
If you need extra performance for this code, because you perform this query a lot, then you can look into using the Rails counter cache mechanism to keep track of the counts of each type on the Person object.
The beauty of Rails here is that your main display code doesn't need to change during this process of making the code faster for reading (adding a counter cache makes adding/deleting posts slower, so you may not want it in all cases).
Write initial code
Use eager loading to make it faster
Use counter cache to make it even faster (see the sketch below)
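A sketch of the counter cache step (the posts_count column is an assumption you would add with a migration; counter_cache only maintains one total per association, so per-type counters would need custom callbacks):
class Post < ActiveRecord::Base
  belongs_to :person, counter_cache: true # keeps people.posts_count up to date
end

person.posts_count # reads the cached total; no COUNT(*) against posts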
Try this; it may work for you.
# In the controller
@persons = Person.all
# In the view
@persons.each do |person|
  person.posts.count # gives the total post count
  person.posts.to_a.count { |x| x.type == 1 } # post count for a given type, computed in memory
end
If you want to see which methods are available, check person.methods.sort in the Rails console; it will list them all.
Also try person.posts.methods in the console to see what the association offers. I don't know which fields are in your posts model, so check them and count based on the type accordingly.
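If the goal is all counts for all people at once, a grouped count gets everything in a single query without loading any posts into memory (the post_type column name is taken from the first answer and is an assumption):
counts = Post.group(:person_id, :post_type).count
# => { [1, 1] => 15, [1, 2] => 45, [2, 3] => 33, ... }
counts[[person.id, 1]] # type-1 post count for one person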

How to leverage geocoder distance function within SQL query through an association

I'm having some trouble figuring out how to implement the SQL query where I can show the closest results (by calculated distance) first and paginate through them.
class Location
  belongs_to :student
  geocoded_by :address
end

class Student
  has_one :location
  has_one :school
end

class School
  belongs_to :student
end
Now, within SQL, I want a query that can go through the association (Student.joins(:location)) and find me the closest students from the perspective of the student who is searching, after specifying a particular school (so a subset of the overall Students).
For example, Joe goes to LSU and wants to be shown a list of the closest students that also go to LSU. The distance is based on Joe's location, so the result will be different if Bob runs the same query.
So I know geocoder provides something like Location.nearbys(10), but what I'd really like to write is:
joe = Student.find("Joe")
closest_lsu_students_to_joe = Student.where("school = LSU").order(distance_to(joe)).paginate(:page => params[:page])
So I don't want to limit the search to a specified radius the way nearbys does. I just want the database to calculate the distance between Joe and all the other students at LSU and return the closest ones first, and then Joe can go through all the results via the pagination.
Any help would be much appreciated. I'm using MySQL but open to other solutions.
Thanks a lot!!!
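A hedged sketch of one approach: geocoder's near scope sorts by distance by default, so passing a radius large enough to cover everyone effectively removes the cutoff. The names below (to_coordinates from geocoder, the schools table, will_paginate's paginate) are assumptions based on the question:
joe = Student.find_by(name: "Joe")
closest = Location.near(joe.location.to_coordinates, 100_000) # huge radius = no practical cutoff
                  .joins(student: :school)
                  .where(schools: { name: "LSU" })
                  .where("locations.student_id != ?", joe.id)
@students = closest.paginate(:page => params[:page]).map(&:student)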

solving for a race condition in rails ( limited number of X in Y )

Edit: Using MySQL...
Say you have an app that adds students to a class, and that class has limited space... so you do something like this:
def add
  if some_classroom.size < MAX_SIZE
    add_student_to_class
  end
end
That's a race condition in a multi-threaded environment. Lame.
Assuming we
don't want this, and
don't want to lock our classroom table or record (which causes our app to suck elsewhere)
What do we do?
I propose this:
class Classroom < ActiveRecord::Base
  has_one :classroom_lock
  after_create :create_lock_record

  def create_lock_record
    c = ClassroomLock.new
    c.classroom = self
    c.save!
  end
end

class ClassroomLock < ActiveRecord::Base
  belongs_to :classroom
end
def add
  c = Classroom.first
  ActiveRecord::Base.transaction do
    c.classroom_lock.lock!
    c = Classroom.first # load this again (it might have changed)
    if c.size < MAX_SIZE
      c.add_new_student(some_student)
    else
      do_stuff_about_not_enough_room
    end
  end
end
This seems like it should work awesomely. My (fictitious) Classroom#show method doesn't block because the classroom record isn't actually locked, and the add method is effectively single-threaded, since any additional process is forced to wait at the lock! line until the lock is released.
Does this work? Maybe? I think so? I don't know...
I've done a fair bit of hammering this with multiple processes at once, but it's hard to know for sure (it is a race condition after all).
Can anyone provide some additional insight?
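For reference, Rails ships with_lock, which wraps exactly this transaction-plus-lock! pattern (it re-reads the receiver with FOR UPDATE inside a transaction); a sketch of the same add method using it:
def add
  c = Classroom.first
  c.classroom_lock.with_lock do
    c.reload # pick up any changes made while we waited for the lock
    if c.size < MAX_SIZE
      c.add_new_student(some_student)
    else
      do_stuff_about_not_enough_room
    end
  end
end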
I imagine a pretty easy SQL approach with just one caveat: add an additional column called something like list_position, and put a unique index on the classroom-student relation table covering classroom_id and list_position (student_id must stay out of the unique index, or two students could end up with the same position).
Inserting then happens by setting list_position to the number of students in the class plus one. Ruby code would probably be somewhere along the lines of:
classroom_student = ClassroomStudent.new
classroom_student.classroom = classroom
classroom_student.student = student
classroom_student.list_position = classroom.size + 1
begin
  classroom_student.save! # a position clash violates the unique index and raises
rescue ActiveRecord::RecordNotUnique
  classroom_student.list_position += 1
  retry if classroom_student.list_position <= MAX_SIZE
  do_stuff_about_not_enough_room
end
Result: if two inserts try to add a student to the same class at the same time, the unique index will keep one of them out (its INSERT fails). This means you may have to retry until list_position = MAX_SIZE, but you are guaranteed never to have too many students in the class.
You could also use this approach to build up a waiting queue.
If you already have data in your relation table, you will have to backfill list_position first. I would guess something like
UPDATE ClassroomStudents AS c1 SET c1.list_position = COALESCE((SELECT MAX(c2.list_position) FROM ClassroomStudents AS c2 WHERE c2.classroom_id = c1.classroom_id), 0) + 1;
would do the job here, although it might be a bit on the slow side.
I know the solution still has some rough edges, but maybe it helps.