Rails select random record - mysql

I don't know if I'm just looking in the wrong places here or what, but does ActiveRecord have a method for retrieving a random object?
Something like:
@user = User.random
Or, since that method doesn't exist, is there some amazing "Rails Way" of doing this? I always seem to be too verbose. I'm using MySQL as well.

Most of the examples I've seen that do this end up counting the rows in the table, then generating a random number to choose one. This is because alternatives such as ORDER BY RAND() are inefficient: the database actually reads every row, assigns each one a random number and sorts on it, or so I've read (and the function name is database specific, I think).
You can add a method like the one I found here.
module ActiveRecord
  class Base
    # Rails 2/3 finder syntax; on newer Rails the equivalent is offset(rand(c)).first
    def self.random
      if (c = count) != 0
        find(:first, :offset => rand(c))
      end
    end
  end
end
This gives every model a class method called random that works as described above: it counts the rows in the table, generates a random offset within that count, then fetches the row at that offset. That is two queries (a COUNT and a single-row fetch), but only one record is loaded, which is probably what you want :)
You can also take a look at this Rails plugin.
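To make the count-then-offset approach concrete, here is a minimal plain-Ruby sketch with no database involved: an Array stands in for the table, and random_by_offset plus its injectable rng parameter are illustrative assumptions, not ActiveRecord API. Like the drop call here, the database still has to step past offset rows before it returns one, which is part of why large offsets can get slow.

```ruby
# Plain-Ruby model of count + random offset: an Array stands in for the table.
# random_by_offset is a hypothetical name used only for illustration.
def random_by_offset(rows, rng = Random.new)
  count = rows.size              # SELECT COUNT(*) FROM table
  return nil if count.zero?      # same guard as `if (c = count) != 0`
  offset = rng.rand(count)       # 0 <= offset < count
  rows.drop(offset).first        # SELECT * FROM table LIMIT 1 OFFSET offset
end

users = %w[alice bob carol]
pick = random_by_offset(users, Random.new(1))
```

Every row has the same chance of being picked, because the offset is uniform over 0...count.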

We found that offsets ran very slowly on MySQL for a large table. Instead of using an offset like:
Model.find(:first, :offset => rand(c))
...we found the following technique ran more than 10x faster (with an off-by-one error fixed):
max_id = Model.maximum("id")
min_id = Model.minimum("id")
id_range = max_id - min_id + 1
random_id = min_id + rand(id_range).to_i
Model.find(:first, :conditions => "id >= #{random_id}", :limit => 1, :order => "id")
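One caveat worth checking before adopting this: if the id column has gaps (deleted rows, for example), the row just after a gap absorbs every random_id that falls inside the gap, so rows are not picked with equal probability. The plain-Ruby sketch below uses hypothetical helper names and an array of ids in place of the table; selection_counts enumerates every possible random_id, so its counts are exact rather than sampled.

```ruby
# Plain-Ruby model of the min/max id technique; an Array of ids stands in for the table.
# pick_by_id_range and selection_counts are hypothetical names for illustration.
def pick_by_id_range(ids, rng = Random.new)
  return nil if ids.empty?
  sorted = ids.sort
  min_id, max_id = sorted.first, sorted.last
  random_id = min_id + rng.rand(max_id - min_id + 1)  # same as min_id + rand(id_range)
  sorted.find { |id| id >= random_id }                 # WHERE id >= random_id ORDER BY id LIMIT 1
end

# For every possible random_id, record which row it selects; the counts show
# each row's relative chance of being picked.
def selection_counts(ids)
  sorted = ids.sort
  (sorted.first..sorted.last).each_with_object(Hash.new(0)) do |random_id, counts|
    counts[sorted.find { |id| id >= random_id }] += 1
  end
end

selection_counts([1, 2, 3])   # => { 1 => 1, 2 => 1, 3 => 1 }   (no gaps: uniform)
selection_counts([1, 2, 10])  # => { 1 => 1, 2 => 1, 10 => 8 }  (id 10 wins 8 of 10 draws)
```

With dense ids the technique is uniform; the more the ids are fragmented, the more the rows after large gaps are favoured.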

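Try using Array's sample method:
@user = User.all.sample
Note that this loads every user into memory before picking one, and that sample(1) would return a one-element array rather than a record.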

In Rails 4 I would extend ActiveRecord::Relation:
class ActiveRecord::Relation
  def random
    offset(rand(count))
  end
end
This way you can use scopes:
SomeModel.all.random.first # Return one random record
SomeModel.some_scope.another_scope.random.first

I'd use a named scope (named_scope is Rails 2 syntax; Rails 3 and later call it scope). Just throw this into your User model.
named_scope :random, :order=>'RAND()', :limit=>1
The random function isn't the same in every database, though. SQLite and others (PostgreSQL, for example) use RANDOM(), but you'll need to use RAND() for MySQL.
If you'd like to be able to grab more than one random row, you can try this:
named_scope :random, lambda { |*args| { :order=>'RAND()', :limit=>args[0] || 1 } }
If you call User.random it defaults to 1, but you can also call User.random(3) if you want more than one.
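Because the function name differs between databases, one option is to choose it at runtime from the adapter name. A minimal sketch: random_sql_function is a hypothetical helper, and matching on "mysql" in the adapter name is an assumption that may need extending for other MySQL-compatible adapters. In a model it might be used as order(random_sql_function(connection.adapter_name)).

```ruby
# Hypothetical helper: pick the SQL random function from an adapter name.
# Assumption: any adapter whose name contains "mysql" (e.g. "Mysql2") uses
# RAND(); everything else (e.g. "SQLite", "PostgreSQL") uses RANDOM().
def random_sql_function(adapter_name)
  adapter_name.to_s.downcase.include?("mysql") ? "RAND()" : "RANDOM()"
end

random_sql_function("Mysql2")      # => "RAND()"
random_sql_function("SQLite")      # => "RANDOM()"
random_sql_function("PostgreSQL")  # => "RANDOM()"
```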

If you need a random record, but only within certain criteria, you could use random_where from this code:
module ActiveRecord
  class Base
    def self.random
      if (c = count) != 0
        find(:first, :offset => rand(c))
      end
    end
    def self.random_where(*params)
      if (c = where(*params).count) != 0
        where(*params).find(:first, :offset => rand(c))
      end
    end
  end
end
For example:
@user = User.random_where("active = 1")
This function is very useful for displaying random products based on some additional criteria.

I strongly recommend this gem for random records; it is specially designed for tables with lots of data rows:
https://github.com/haopingfan/quick_random_records
Simple usage:
@user = User.random_records(1).take
All the other answers perform badly on a large database, except this gem:
quick_random_records only cost 4.6ms in total.
The accepted answer, User.order('RAND()').limit(10), cost 733.0ms.
The offset approach cost 245.4ms in total.
The User.all.sample(10) approach cost 573.4ms.
Note: my table only has 120,000 users. The more records you have, the bigger the performance difference will be.
UPDATE:
Performed on a table with 550,000 rows:
Model.where(id: Model.pluck(:id).sample(10)) cost 1384.0ms
gem: quick_random_records only cost 6.4ms in total

Here is the best solution for getting random records from the database.
RoR provides everything for ease of use.
To get random records from the DB, use sample; below is its description, with examples. (On Ruby 1.9 and later, Array#sample is part of core Ruby, so the backport only matters on older Rubies.)
Backport of Array#sample based on Marc-Andre Lafortune’s github.com/marcandre/backports/. Returns a random element or n random elements from the array. If the array is empty and n is nil, returns nil. If n is passed and its value is less than 0, it raises an ArgumentError exception. If the array is empty and n is 0 or greater, it returns [].
[1,2,3,4,5,6].sample # => 4 (random)
[1,2,3,4,5,6].sample(3) # => [2, 4, 5] (random)
[1,2,3,4,5,6].sample(-3) # => ArgumentError: negative array size
[].sample # => nil
[].sample(3) # => []
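The behaviour listed above can be checked directly against core Ruby. A small sketch; the random: keyword only makes runs repeatable, and the comments describe properties rather than specific random values.

```ruby
numbers = [1, 2, 3, 4, 5, 6]
rng = Random.new(42)  # seeded so a run can be repeated

one   = numbers.sample(random: rng)     # a single random element
three = numbers.sample(3, random: rng)  # 3 elements taken from distinct positions
[].sample                               # => nil
[].sample(3)                            # => []
numbers.sample(10).size                 # => 6 (never more than the array holds)

begin
  numbers.sample(-3)
rescue ArgumentError => e
  puts "ArgumentError: #{e.message}"
end
```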
You can add conditions to suit your requirements, as in the example below.
User.where(active: true).sample(5)
This returns 5 random active users from the users table. Note that sample runs in Ruby, so every matching row is loaded into memory first.
For more help, please visit: http://apidock.com/rails/Array/sample

Related

Stored procedure using dynamic SQL statement stored in database column

I have a table called Coupon.
This table has a column called query which holds a string.
The query string has some logical conditions in it formatted for a where statement. For example:
coupon1.query
=> " '/hats' = :url "
coupon2.query
=> " '/pants' = :url OR '/shoes' = :url "
I want to write a stored procedure that takes 2 input parameters: a list of Coupon ids and a variable (in this example, the current URL).
I want the procedure to look up the value of the query column for each Coupon. Then it should run that string in a where statement, plugging in my other parameter (the current URL), and return any Coupon ids that match.
Here's how I would expect the procedure to behave given the two coupons above.
Example 1:
* Call procedure with ids for coupon1 and coupon2, with @url = '/hats'
* Expect coupon1 to be returned.
Example 2:
* Call procedure with ids for coupon1 and coupon2, with @url = '/pants'
* Expect coupon2 to be returned.
Example 3:
* Call procedure with ids for coupon1 and coupon2, with @url = '/shirts'
* Expect no ids returned. The URL does not match '/hats' for coupon1, and doesn't match '/pants' or '/shoes' for coupon2.
It's easy to test these out in ActiveRecord. Here is just example 1.
@url = '/hats'
@query = coupon1.query
# "'/hats' = :url"
Coupon.where(@query, url: @url).count
=> 2
# count is a non-zero number because the query matches the url parameter.
# Coupon1 passes; its id would be returned from the stored procedure.
'/hats' == '/hats'
@query = coupon2.query
# " '/pants' = :url OR '/shoes' = :url "
Coupon.where(@query, url: @url).count
=> 0
# count is 0 because the query does not match the url parameter.
# Coupon2 does not pass; its id would not be returned from the stored procedure.
'/pants' != '/hats', '/shoes' != '/hats'
You could write this as a loop (I'm in Ruby on Rails with ActiveRecord), but I need something that performs better: I could potentially have lots of coupons, so I can't just check each one directly with a loop. The queries contain complex AND/OR logic, so I can't just compare against a list of URLs either. But here's the code for a loop that is essentially what I'm trying to translate into a stored procedure.
# assume coupon1 has id 1, coupon2 has id 2
@coupons = [coupon1, coupon2]
@url = '/hats'
@coupons.map do |coupon|
  if Coupon.where(coupon.query, url: @url).count > 0
    coupon.id
  else
    nil
  end
end
=> [1, nil]
Ok, I've been pondering this one.
Big picture:
A. You have a @url you want to search for to find a match among many potential Coupons
B. A coupon has a URL that might match @url
If that's the true extent of the problem, I think you've really overcomplicated things.
coupon1.query
=> ["/hats"]
coupon2.query
=> ["/pants", "/shoes"]
@url = '/hats'
Coupon.where('FIND_IN_SET(:url, query) <> 0', url: @url)
Or something similar; I'm not a MySQL user myself. Note that FIND_IN_SET searches a comma-separated string, so query would need to be stored as e.g. "/pants,/shoes" rather than as a serialized array.
However, this is very possible to achieve, and there may even be a much better ActiveRecord way to do the query.
UPDATE
OK, I'm missing something. I can't actually reproduce this in the console.
@url = '/hats'
@query = coupon1.query
# "'/hats' = :url"
Coupon.where(@query, url: @url).count
> SELECT COUNT(*) FROM `coupons` WHERE ('/hats' = '/hats')
As you can see from the SQL, the condition is always true, so this will always match every record. It's the same as writing SELECT COUNT(*) FROM `coupons` WHERE (true)
How are you actually performing a valid query?
Sorry to post this in my answer; I wanted good formatting.
If I've got something wrong here, maybe we need to move this to a chat room.
I think you have just enough reputation for me to invite you to a room.
UPDATE2
Since you have to compare @query to each record individually, I think you'll have to loop.
But I don't think you need to use Coupon.where to accomplish this, since you are only comparing one record at a time.
@coupons.map do |coupon|
  # `next` inside map still yields nil, so compact drops those entries
  next unless coupon.query == @url
  coupon.id
end.compact
However, your original question was about performance at scale, and you know you aren't going to solve that with a loop.
Maybe use JSONB instead of a string, so that you could actually do some SQL.
But even with JSONB, this is still complicated by wanting your conditions to be evaluated properly.
{
  "url": {
    "AND": ["/hats", "/shoes"],
    "OR": ["/pants"]
  },
  "logged_in": true,
  "is_gold_member": false
}
{
  "logged_in": false,
  "url": "/hats"
}
{
  "url": {
    "OR": ["/pants", "/shoes"]
  }
}
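To illustrate what evaluating such conditions might involve, here is a minimal Ruby sketch. The semantics are assumptions the answer leaves open: a plain value must equal the request value, "OR" matches any listed value, "AND" requires the request value to equal every listed value, and all top-level keys must match. It still checks coupons one at a time in Ruby, so it speaks to evaluating the AND/OR logic correctly rather than to the scaling concern.

```ruby
require "json"

# Hypothetical evaluator for the JSON-style coupon conditions above.
# Assumed semantics: a plain value must equal the request value;
# {"OR" => [...]} matches if the request value equals any listed value;
# {"AND" => [...]} matches only if it equals every listed value;
# several operators in one hash must all hold.
def condition_matches?(expected, actual)
  return expected == actual unless expected.is_a?(Hash)

  expected.all? do |operator, values|
    case operator
    when "OR"  then values.include?(actual)
    when "AND" then values.all? { |value| value == actual }
    else raise ArgumentError, "unknown operator: #{operator}"
    end
  end
end

# Every top-level key in the coupon's conditions must match the request.
def coupon_matches?(conditions, request)
  conditions.all? { |key, expected| condition_matches?(expected, request[key]) }
end

hats_coupon  = JSON.parse('{"logged_in": false, "url": "/hats"}')
pants_coupon = JSON.parse('{"url": {"OR": ["/pants", "/shoes"]}}')

coupon_matches?(pants_coupon, { "url" => "/shoes" })                     # => true
coupon_matches?(hats_coupon, { "url" => "/hats", "logged_in" => true })  # => false
```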
Ultimately, I think what you're doing with query attributes is going to continue to be your stumbling block. It's very clever, but it's not simple.
If it were my app, I think I would go back to considering my use case and try to find a different strategy to map specific coupons to specific parameters in a more on-the-rails way.

I need some help implementing eager loading

I've recently been tipped off to eager loading and its necessity in improving performance. I've managed to cut a few queries from loading this page, but I suspect that I can trim them down significantly more if I can eager-load the needed records correctly.
This controller needs to load all of the following to fill the view:
A Student
The seminar (class) page that the student is viewing
All of the objectives included in that seminar
The objective_seminars, the join table between objectives and seminars. This includes the column "priority" which is set by the teacher and used in ordering the objectives.
The objective_students, another join table. Includes a column "points" for the student's score on that objective.
The seminar_students, one last join table. Includes some settings that the student can adjust.
Controller:
def student_view
  @student = Student.includes(:objective_students).find(params[:student])
  @seminar = Seminar.includes(:objective_seminars).find(params[:id])
  @oss = @seminar.objective_seminars.includes(:objective).order(:priority)
  @objectives = @seminar.objectives.order(:name)
  objective_ids = @objectives.map(&:id)
  @student_scores = @student.objective_students.where(:objective_id => objective_ids)
  @ss = @student.seminar_students.find_by(:seminar => @seminar)
  @teacher = @seminar.user
  @teach_options = teach_options(@student, @seminar, 5)
  @learn_options = learn_options(@student, @seminar, 5)
end
The method below is where a lot of duplicate queries occur, which I thought eager loading was supposed to eliminate. This method gives the student six options so she can choose one objective to teach her classmates. The method first looks at objectives where the student has scored between 75% and 99%. Within that bracket, they are also sorted by "priority" (from the objective_seminars join table; this value is set by the teacher). If there is room for more, the method then looks at objectives where the student has scored 100%, sorted by priority. (The learn_options method is practically the same as this method, but with different bracket numbers.)
teach_options method:
def teach_options(student, seminar, list_limit)
  teach_opt_array = []
  [[70,99],[100,100]].each do |n|
    @oss.each do |os|
      obj = os.objective
      this_score = @student_scores.find_by(:objective => obj)
      if this_score
        this_points = this_score.points
        teach_opt_array.push(obj) if (this_points >= n[0] && this_points <= n[1])
      end
    end
    break if teach_opt_array.length > list_limit
  end
  return teach_opt_array
end
Thank you in advance for any insight!
@jeff - Regarding your question, I don't see where a lot of queries would be happening outside of @student_scores.find_by(:objective => obj), which runs once per objective in each bracket.
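Your @student_scores object is already an ActiveRecord relation, correct? Calling .where() on it builds a new query, so it would still hit the database on every call. A block method on the loaded records, such as .to_a.select { } or .to_a.find { }, loads the relation once (the records are then cached on the relation) and filters in memory without hitting the db again. select leaves you with an array rather than an AR Relation, so be careful there; since you only need one score per objective, find, which returns a single record or nil, fits your loop better.
this_score = @student_scores.where(objective: obj).first
this_score = @student_scores.to_a.find { |score| score.objective_id == obj.id }
Both should work, but only the second avoids one query per objective. Comparing objective_id rather than score.objective also avoids loading each score's objective.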
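The in-memory idea can be pushed a step further: build a lookup of points by objective once, then reuse it for every bracket. Below is a plain-Ruby sketch with Struct stand-ins instead of the real models; teach_options_in_memory, Objective and Score here are illustrative assumptions. With ActiveSupport loaded, @student_scores.index_by(&:objective_id) would build an equivalent hash (of score records rather than points).

```ruby
# Struct stand-ins for the ActiveRecord models (assumption, for illustration only).
Objective = Struct.new(:id, :name)
Score = Struct.new(:objective_id, :points)

# ordered_objectives: objectives already sorted by priority (e.g. @oss.map(&:objective)).
# scores: the student's scores, loaded once.
def teach_options_in_memory(ordered_objectives, scores, list_limit, brackets = [[70, 99], [100, 100]])
  # One pass over the scores: objective_id => points, so each lookup below is a Hash access.
  points_by_objective = scores.each_with_object({}) { |s, h| h[s.objective_id] = s.points }
  options = []
  brackets.each do |low, high|
    ordered_objectives.each do |obj|
      points = points_by_objective[obj.id]
      options << obj if points && points.between?(low, high)
    end
    break if options.length > list_limit  # same stopping rule as the original method
  end
  options
end
```

For example, with objectives 1 to 4 in priority order and scores of 100, 80 and 50 for objectives 1, 2 and 3, the result is objective 2 (from the 70..99 bracket) followed by objective 1 (from the 100 bracket), computed without any per-objective query.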
Just some other suggestions on your top controller method: I don't see any guards or defensive coding. find raises ActiveRecord::RecordNotFound when no record matches, so @student and @seminar won't silently be nil, but find_by (used for @ss) does return nil, so anything that relies on @ss could error out. I'd opt for some try()s or rescues.
Last, just being nitpicky, but those first two lines are a little hard to read, in that you could mistakenly interpret the params as being applied to the includes as well as the main object:
@student = Student.includes(:objective_students).find(params[:student])
@seminar = Seminar.includes(:objective_seminars).find(params[:id])
You can't simply move find to the front, though: find returns a model instance rather than a relation, so calling includes on its result raises NoMethodError. If readability is the concern, split the chain across lines instead:
@student = Student
  .includes(:objective_students)
  .find(params[:student])
@seminar = Seminar
  .includes(:objective_seminars)
  .find(params[:id])

Django bulk update setting each to different values? [duplicate]

I'd like to update a table with Django, something like this in raw SQL:
update tbl_name set name = 'foo' where name = 'bar'
My first attempt is something like this, but that's nasty, isn't it?
objs = ModelClass.objects.filter(name='bar')
for obj in objs:
    obj.name = 'foo'
    obj.save()
Is there a more elegant way?
Update:
Django 2.2 now has a bulk_update() method.
Old answer:
Refer to the following Django documentation section:
Updating multiple objects at once
In short you should be able to use:
ModelClass.objects.filter(name='bar').update(name="foo")
You can also use F objects to do things like incrementing rows:
from django.db.models import F
Entry.objects.all().update(n_pingbacks=F('n_pingbacks') + 1)
See the documentation.
However, note that:
This won't use ModelClass.save method (so if you have some logic inside it won't be triggered).
No Django signals (such as pre_save and post_save) will be emitted.
You can't perform an .update() on a sliced QuerySet, it must be on an original QuerySet so you'll need to lean on the .filter() and .exclude() methods.
Consider using django-bulk-update, found here on GitHub.
Install: pip install django-bulk-update
Implement (code taken directly from the project's README file):
import random
from bulk_update.helper import bulk_update
random_names = ['Walter', 'The Dude', 'Donny', 'Jesus']
people = Person.objects.all()
for person in people:
    r = random.randrange(4)
    person.name = random_names[r]
bulk_update(people)  # updates all columns using the default db
Update: As Marc points out in the comments, this is not suitable for updating thousands of rows at once, though it is suitable for smaller batches of tens to hundreds. The right batch size for you depends on your CPU and query complexity. This tool is more like a wheelbarrow than a dump truck.
Django 2.2 added a bulk_update() method (release notes).
https://docs.djangoproject.com/en/stable/ref/models/querysets/#bulk-update
Example:
from django.db import transaction
# get a pk: record dictionary of existing records
updates = YourModel.objects.filter(...).in_bulk()
....
# do something with the updates dict
....
if hasattr(YourModel.objects, 'bulk_update') and updates:
    # Use the new method
    YourModel.objects.bulk_update(updates.values(), [list the fields to update], batch_size=100)
else:
    # The old & slow way
    with transaction.atomic():
        for obj in updates.values():
            obj.save(update_fields=[list the fields to update])
If you want to set the same value on a collection of rows, you can use the update() method combined with any query term to update all rows in one query:
some_list = ModelClass.objects.filter(some condition).values('id')
ModelClass.objects.filter(pk__in=some_list).update(foo=bar)
If you want to update a collection of rows with different values depending on some condition, the best you can do is batch the updates by value. Say you have 1000 rows where you want to set a column to one of X values: you could prepare the batches beforehand and then run only X update queries (each essentially of the form of the first example above) plus the initial SELECT query.
If every row requires a unique value there is no way to avoid one query per update. Perhaps look into other architectures like CQRS/Event sourcing if you need performance in this latter case.
Here is some useful content I found online about this question:
https://www.sankalpjonna.com/learn-django/running-a-bulk-update-with-django
The inefficient way
model_qs = ModelClass.objects.filter(name='bar')
for obj in model_qs:
    obj.name = 'foo'
    obj.save()
The efficient way
ModelClass.objects.filter(name='bar').update(name="foo")  # one value for all rows; run one update per value otherwise
Using bulk_update
update_list = []
model_qs = ModelClass.objects.filter(name='bar')
for model_obj in model_qs:
    model_obj.name = "foo"  # or whatever the value is; 'foo' is used here for simplicity
    update_list.append(model_obj)
ModelClass.objects.bulk_update(update_list, ['name'])
Using an atomic transaction
from django.db import transaction
with transaction.atomic():
    # individual saves (so save() and signals still run), committed together
    for obj in ModelClass.objects.filter(name='bar'):
        obj.name = 'foo'
        obj.save(update_fields=['name'])
To update with the same value, we can simply use this:
ModelClass.objects.filter(name='bar').update(name='foo')
To update with different values:
obj_list = ModelClass.objects.filter(name='bar')
objs_to_update = []
for obj in obj_list:
    obj.name = "Dear " + obj.name
    objs_to_update.append(obj)
ModelClass.objects.bulk_update(objs_to_update, ['name'], batch_size=1000)
bulk_update() doesn't call save() or send pre_save/post_save signals for each object; instead, all the objects to be updated are kept in the list and written with batched UPDATE queries (up to 1000 objects per query here).
update() returns the number of rows updated:
update_counts = ModelClass.objects.filter(name='bar').update(name="foo")
See this link for more information on bulk update and create:
Bulk update and Create

find row in ruby array

I have a MySQL query that returns this type of data:
{"id"=>1, "serviceCode"=>"1D00", "price"=>9.19}
{"id"=>2, "serviceCode"=>"1D01", "price"=>9.65}
I need to return the id field based on a match of the serviceCode.
i.e. I need a method like this:
def findID(serviceCode)
  # find the row that has the service code and return the ID
end
I was thinking of having a serviceCodes.each do |row| loop and essentially going:
if row['serviceCode'] == serviceCode
  return row['id']
end
Is there a faster / easier way?
You can use the method Enumerable#find:
service_codes = [
{"id"=>1, "serviceCode"=>"1D00", "price"=>9.19},
{"id"=>2, "serviceCode"=>"1D01", "price"=>9.65}
]
service_codes.find { |row| row['serviceCode'] == '1D00' }
# => {"id"=>1, "serviceCode"=>"1D00", "price"=>9.19}
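find returns the whole row, so one more step is needed to get the id. A short plain-Ruby sketch of that, plus a Hash index for the case where many codes are looked up against the same result set; find_id and id_by_code are illustrative names, not part of any library.

```ruby
service_codes = [
  { "id" => 1, "serviceCode" => "1D00", "price" => 9.19 },
  { "id" => 2, "serviceCode" => "1D01", "price" => 9.65 }
]

# One-off lookup: Enumerable#find stops at the first matching row, or returns nil.
def find_id(rows, service_code)
  row = rows.find { |r| r["serviceCode"] == service_code }
  row && row["id"]
end

# Many lookups: build a serviceCode => id Hash once; each lookup is then a Hash access.
# If a code appears twice, the later row's id wins.
id_by_code = service_codes.map { |r| [r["serviceCode"], r["id"]] }.to_h

find_id(service_codes, "1D01")  # => 2
id_by_code["1D00"]              # => 1
id_by_code["9Z99"]              # => nil
```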
If you use Rails ActiveRecord as your ORM and your model is named Product (only for example), you can use something like this:
def findID(serviceCode)
  # pluck returns plain ids instead of Product objects; limit(1) fetches just one
  Product.where(serviceCode: serviceCode).limit(1).pluck(:id).first
end
If you have a plain SQL query in a plain Ruby class (not recommended), you should change the query to fetch only the id, as Luiggi mentioned. But be aware of SQL injection if your serviceCode comes from external requests.

Recalculate Counter Cache of 120k Records [Rails / ActiveRecord]

The following situation:
I have a Poi model, which has many pictures (1:n). I want to recalculate the counter_cache column, because the values are inconsistent.
I've tried iterating over each record in Ruby, but this takes much too long and sometimes quits with "segmentation fault" errors.
So I wonder: is it possible to do this with a raw SQL query?
If, for example, you have Post and Picture models, and Post has_many :pictures, you can do it with update_all:
Post.update_all("pictures_count = (SELECT COUNT(*) FROM pictures WHERE pictures.post_id = posts.id)")
I found a nice solution on krautcomputing.
It uses reflections to find all the counter caches in a project, SQL queries to find only the objects that are inconsistent, and Rails' reset_counters to clean things up.
Unfortunately, it only works with "conventional" counter caches (no class name, no custom counter cache names), so I refined it:
Rails.application.eager_load!
ActiveRecord::Base.descendants.each do |many_class|
  many_class.reflections.each do |name, reflection|
    if reflection.options[:counter_cache]
      one_class = reflection.class_name.constantize
      one_table, many_table = [one_class, many_class].map(&:table_name)
      # more reflections, use :inverse_of, :counter_cache etc.
      inverse_of = reflection.options[:inverse_of]
      counter_cache = reflection.options[:counter_cache]
      if counter_cache === true
        counter_cache = "#{many_table}_count"
        inverse_of ||= many_table.to_sym
      else
        inverse_of ||= counter_cache.to_s.sub(/_count$/, '').to_sym
      end
      # joins is an INNER JOIN, so parents with no children are skipped even if their
      # counter is non-zero; on Rails 5+ left_joins(inverse_of) catches those too
      ids = one_class
        .joins(inverse_of)
        .group("#{one_table}.id")
        .having("MAX(#{one_table}.#{counter_cache}) != COUNT(#{many_table}.id)")
        .pluck("#{one_table}.id")
      ids.each do |id|
        puts "reset #{id} on #{many_table}"
        one_class.reset_counters id, inverse_of
      end
    end
  end
end
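The HAVING clause above boils down to "the cached counter differs from the real number of children". A plain-Ruby sketch of that comparison, using a hypothetical helper and hashes instead of tables, makes the edge case visible: a parent with no children has to be compared against 0, and an INNER JOIN never returns such a parent at all.

```ruby
# cached_counts: parent id => value stored in the counter cache column.
# child_parent_ids: the foreign key of every child row (e.g. a hypothetical pictures.poi_id).
# Returns the parent ids whose cached counter is wrong.
def stale_counter_ids(cached_counts, child_parent_ids)
  actual = Hash.new(0)  # parents without children count as 0
  child_parent_ids.each { |parent_id| actual[parent_id] += 1 }
  cached_counts.select { |id, cached| cached != actual[id] }.keys
end

cached   = { 1 => 2, 2 => 0, 3 => 5, 4 => 3 }
children = [1, 1, 3]
stale_counter_ids(cached, children)  # => [3, 4]
# Parent 4 has no children, so an INNER JOIN would drop it and miss its stale count.
```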