Rails query chokes on includes - mysql

This query executes just fine:
p = PlayersToTeam.select("id").joins(:player).limit(10).order("players.FirstName")
This query causes my whole system to come to a screeching halt:
p = PlayersToTeam.select("id").includes(:player).limit(10).order("players.FirstName")
Here are the models:
class PlayersToTeam < ActiveRecord::Base
belongs_to :player
belongs_to :team
accepts_nested_attributes_for :player
end
class Player < ActiveRecord::Base
has_many :players_to_teams
has_many :teams, through: :players_to_teams
end
As far as I can tell, the includes does a LEFT JOIN and joins does an INNER JOIN. The query spit out (for joins) from Rails is:
SELECT players_to_teams.id FROM `players_to_teams` INNER JOIN `players` ON `players`.`id` = `players_to_teams`.`player_id` ORDER BY players.FirstName LIMIT 10
Which executes just fine on the command line.
SELECT players_to_teams.id FROM `players_to_teams` LEFT JOIN `players` ON `players`.`id` = `players_to_teams`.`player_id` ORDER BY players.FirstName LIMIT 10
also executes just fine, it just takes twice as long.
Is there an efficient way I can sort the players_to_teams records via players? I have an index on FirstName for players.
EDIT
It turns out the query required heavy optimization to run even half decently. Splitting the query was the best solution short of restructuring the data or hand-writing the SQL.

You might also consider splitting it into two steps (three queries in total). First, get the ids in sorted order using joins:
players_to_teams = PlayersToTeam.select("id").joins(:player).limit(10).order("players.FirstName")
Second (this step runs two queries internally), load the PlayersToTeam records with their players preloaded:
players_to_teams = PlayersToTeam.includes(:player).where(:id => players_to_teams.map(&:id))
After that you will have fully initialized players_to_teams records with their players loaded.
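One caveat with the second step: a WHERE id IN (...) query does not promise to return rows in the order of the id list, so the sort from the first step can be lost. Below is a minimal plain-Ruby sketch of restoring that order. The reorder_by_ids helper and the Row struct are hypothetical names used only for illustration; Row stands in for any loaded record that responds to id.

```ruby
# Hypothetical helper: put loaded records back into the order given by `ids`.
# Works on any objects that respond to `id`; ids with no matching record are skipped.
def reorder_by_ids(records, ids)
  by_id = records.each_with_object({}) { |record, index| index[record.id] = record }
  ids.map { |id| by_id[id] }.compact
end

# Stand-in for records returned by the second query, in arbitrary order.
Row = Struct.new(:id, :name)
loaded = [Row.new(3, "c"), Row.new(1, "a"), Row.new(2, "b")]

# Ids as returned by the first (sorted) query.
sorted = reorder_by_ids(loaded, [2, 3, 1])
puts sorted.map(&:id).inspect
```

With real records, the first argument would be the result of the second query and the second argument the id list from the first query.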

One thing to note is that includes will add a second database query to do the preloading. You should check what that one looks like (it should contain a big IN clause on the player_ids from players_to_teams).
As for how to avoid using includes: if you just need the name from players, you can do it like this:
PlayersToTeam.select("players_to_teams.id, players.FirstName AS player_name").joins(:player).limit(10).order("players.FirstName")

Related

Rails 3: What is the best way to update a column in a very large table

I want to update all of a column in a table with over 2.2 million rows where the attribute is set to null. There is a Users table and a Posts table. Even though there is a column for num_posts in User, only about 70,000 users have that number populated; otherwise I have to query the db like so:
@num_posts = @user.posts.count
I want to use a migration to update the attributes and I'm not sure whether or not it's the best way to do it. Here is my migration file:
class UpdateNilPostCountInUsers < ActiveRecord::Migration
def up
nil_count = User.select(:id).where("num_posts IS NULL")
nil_count.each do |user|
user.update_attribute :num_posts, user.posts.count
end
end
def down
end
end
In my console, I ran a query on the first 10 rows where num_posts was null, and then used puts for each user.posts.count. The total time was 85.3ms for 10 rows, an average of 8.53ms per row. 8.53ms * 2.2 million rows is about 5.2 hours, and that's without updating any attributes. How do I know if my migration is running as expected? Is there a way to log % complete to the console? I really don't want to wait 5+ hours to find out it didn't do anything. Much appreciated.
EDIT:
Per Max's comment below, I abandoned the migration route and used find_each to solve the problem in batches. I solved the problem by writing the following code in the User model, which I successfully ran from the Rails console:
def self.update_post_count
nil_count = User.select(:id).where("num_posts IS NULL")
nil_count.find_each { |user|
user.update_column(:num_posts, user.posts.count) if user.posts
}
end
Thanks again for the help everyone!
desc 'Update User post cache counter'
task :update_cache_counter => :environment do
users = User.joins('LEFT OUTER JOIN posts ON posts.user_id = users.id')
.select('users.id, COUNT(posts.id) AS p_count')
.where('users.num_posts IS NULL')
.group('users.id') # one row per user, so COUNT is per user
puts "Updating user post counts:"
users.find_each do |user|
print '.'
user.update_attribute(:num_posts, user.p_count)
end
end
First off, don't use a migration for what is essentially a maintenance task. Migrations should mainly alter the schema of your database. This matters especially for a long-running task like this one, which may fail midway and leave you with a botched migration and an inconsistent database state.
Then you need to address the fact that calling user.posts for every user causes an N+1 query. Instead, you should join the posts table and select a count.
And without batches you are likely to exhaust the server's memory quickly.
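Batching also gives a natural place to report "% complete". The sketch below is plain Ruby over an array of ids, not ActiveRecord: process_in_batches is a hypothetical helper, and the block is where a per-batch update would go. With ActiveRecord, find_in_batches would play the role that each_slice plays here.

```ruby
require "stringio"

# Hypothetical helper: yield ids in fixed-size batches and report progress
# after each batch. Returns the number of ids processed.
def process_in_batches(ids, batch_size: 1000, out: $stdout)
  total = ids.size
  done = 0
  ids.each_slice(batch_size) do |batch|
    yield batch
    done += batch.size
    out.puts format("%d/%d (%.1f%%)", done, total, 100.0 * done / total)
  end
  done
end

# Usage: 25 fake ids in batches of 10; the block would do the real work.
process_in_batches((1..25).to_a, batch_size: 10) { |batch| batch.size }
```

Printing once per batch rather than once per row keeps the log short even for millions of rows.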
You can use update_all with a subquery to do this:
sub_query = 'SELECT count(*) FROM `posts` WHERE `posts`.`user_id` = `users`.`id`'
User.where('num_posts IS NULL').update_all("num_posts = (#{sub_query})")
This runs as a single UPDATE statement, so it should take seconds instead of hours.
If so, you may not need to log progress at all.
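A detail worth checking when building the SET clause: Ruby only interpolates #{...} inside double-quoted strings. In a single-quoted string the placeholder is sent to the database literally. The constant names below are illustrative only.

```ruby
SUB_QUERY = 'SELECT count(*) FROM `posts` WHERE `posts`.`user_id` = `users`.`id`'

# Single quotes: the text "#{SUB_QUERY}" is kept as-is, producing invalid SQL.
LITERAL_CLAUSE = 'num_posts = (#{SUB_QUERY})'

# Double quotes: the subquery text is substituted into the clause.
INTERPOLATED_CLAUSE = "num_posts = (#{SUB_QUERY})"

puts LITERAL_CLAUSE
puts INTERPOLATED_CLAUSE
```

Interpolating here is safe only because the subquery is a fixed string; user-supplied values should go through bind parameters instead.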

Rails 4 ActiveRecord - how to see how is interpreted a database query?

I have these models:
teacher
class Teacher < ActiveRecord::Base
has_many :days
end
day
class Day < ActiveRecord::Base
belongs_to :teacher
end
And I am running this query:
active_teachers = Teacher.joins(:days).where("teacher.id" => found_teachers.pluck(:teacher_id).uniq, "days.day_name" => selected_day)
What the query should do: found_teachers is a list of all teachers, with duplicates. The query should remove the duplicates and choose only those teachers that have classes on the given day (selected_day contains a string, Monday for example).
Because active_teachers holds so much data that I can't go through it record by record (and I am not sure that I built this query properly and that it does exactly what I need), I am trying to find out how ActiveRecord translates this query to SQL.
Usually I see everything in the terminal where the Rails server is running, but right now I don't see this query there.
So the question is: how can I see how the ActiveRecord query is translated to SQL?
Thank you in advance.
To get details from a query you're typing, you can do:
query.to_sql
query.explain
You can use
ActiveRecord::Base.logger = Logger.new STDOUT
and run your query in the Rails console. The SQL queries will then be printed in the console.
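If no SQL shows up, the logger's level is worth checking: ActiveRecord writes its SQL statements at the debug level, so a logger set to a higher level drops them. The sketch below uses only Ruby's standard Logger, with a StringIO standing in for STDOUT. The log_output_at helper and the SQL text are made up for illustration.

```ruby
require "logger"
require "stringio"

# Hypothetical helper: log one debug-level message at the given logger level
# and return whatever was actually written.
def log_output_at(level)
  io = StringIO.new
  logger = Logger.new(io)
  logger.level = level
  logger.debug("SELECT `teachers`.* FROM `teachers`")
  io.string
end

puts log_output_at(Logger::DEBUG) # the message is written
puts log_output_at(Logger::INFO)  # nothing is written at INFO level
```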

Forming of SQL query in Rails Application

I am new to Ruby on Rails. I am currently working on performance issues in a Rails application, using New Relic RPM to find the bottlenecks in the code. While doing this I found something that I cannot figure out. In my Rails application I have three models A, B and C, where model B has two properties, the primary key of A and the primary key of C, as follows:
class B
include DataMapper::Resource
belongs_to :A, :key=>true
belongs_to :C, :key=>true
end
Model of A is as follows:
class A
include DataMapper::Resource
property :prop1
...
has n, :bs
has n, :cs, :through => :bs
end
When I issue the statement a.find(:c.id=>10), it internally executes the following SQL query:
select a.prop1, a.prop2,... from a INNER JOIN b on a.id = b.a_id INNER JOIN c on b.c_id = c.id where (c.id=10) GROUP BY a.prop1, a.prop2,....[here in group by all the properties that has been mentioned in select appears, I don't know why]
This statement takes too much time during a web transaction. Interestingly, when I execute the same auto-generated query at the mysql prompt in my terminal, it takes very little time. I think the slowness is caused by the many fields in the GROUP BY clause. I cannot understand how the query is being formed. If anyone could help me figure this out and optimize it, I would be really grateful. Thank you.
I assume you have your model associations properly configured, something like this:
class A < ActiveRecord::Base
has_many :bs
has_many :cs, through: :bs
end
class B < ActiveRecord::Base
belongs_to :a
belongs_to :c
end
class C < ActiveRecord::Base
has_many :bs
has_many :as, through: :bs
end
Then you could simply call:
a.cs.find(10) # note the plural association name
This should give you better performance.

Many to Many NOT in

I'm trying to do a multi-table join that has a NOT IN component. Tables are
Post -> Term Relationship -> Term
Post
has_many :term_relationships
has_many :terms, :through => :term_relationships
TermRelationship
belongs_to :post
belongs_to :term
Term
has_many :term_relationships
has_many :posts, :through => :term_relationships
The goal is to get all posts except for those in "featured", let's say. My current query looks like:
WpPost.includes(:terms).where("terms.term NOT IN (?)", ["featured"])
This works great if the only term that it has attached is "featured". If the post belongs to "featured" and "awesome" it will still show because of "awesome".
Is there any way to exclude a row entirely? Will it require a subquery? And if it does, how would I go about doing that in Rails?
Thanks all!
Justin
You are misusing includes. It is meant for eager loading, not for joining!
You're right about the approach, though, and a subquery can be used in your case. However, Rails won't issue a nested query for NOT IN (?) even where that would be logical. You'll get two queries instead (NOT IN (id1, id2, ...) rather than NOT IN (SELECT ...)).
So I would recommend using the squeel gem.
Regular AR code (which can also be prettified with squeel):
featured_posts = WpPost.joins(:terms).where(terms:{term: ['featured']}).uniq
Then use squeel's power:
WpPost.where{id.not_in featured_posts}
(in and not_in are also aliased as >> and <<, but I didn't want to scare anybody.)
Note the use of blocks and the absence of symbols.
Some measurements based on Chinook Database under SQLite:
> Track.all
Track Load (35.0ms) SELECT "Track".* FROM "Track"
Relation with joins and like:
oldie = Track.joins{playlists}.where{playlists.name.like_any %w[%classic% %90%]}
Here's NOT IN:
> Track.where{trackId.not_in oldie}.all
Track Load (37.5ms) SELECT "Track".* FROM "Track" WHERE "Track"."trackId"
NOT IN (SELECT "Track"."TrackId" FROM "Track" INNER JOIN "PlaylistTrack" ON
"PlaylistTrack"."TrackId" = "Track"."TrackId" INNER JOIN "Playlist" ON
"Playlist"."PlaylistId" = "PlaylistTrack"."PlaylistId"
WHERE (("Playlist"."name" LIKE '%classic%' OR "Playlist"."name" LIKE '%90%')))
FYI:
Track.where{trackId.not_in oldie}.count # => 1971
Track.count # => 3503
# join table:
PlaylistTrack.count # => 8715
Conclusion: I don't see any overhead caused by NOT IN. 35.0ms vs 37.5ms isn't a noticeable difference. A few times the 35.0 became 37.5 and vice versa.
One option is to do an OUTER JOIN and put the featured condition in the join itself. Then you just select all posts for which no featured term was joined. I don't know of a plain "Rails way" to do it, but with some extra SQL you could do it like this (the inner join is parenthesized so that a post's other terms don't produce rows that slip through the IS NULL check):
Post.joins("LEFT OUTER JOIN (term_relationships
INNER JOIN terms ON term_relationships.term_id = terms.id AND terms.term = 'featured')
ON posts.id = term_relationships.post_id").
where("terms.id IS NULL")
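The difference between filtering joined rows and excluding whole posts can be seen on in-memory data. This is a plain-Ruby sketch, not ActiveRecord: POST_TERMS, row_level_filter and exclude_posts are hypothetical names, and the hash stands in for the posts/terms join.

```ruby
# Each post mapped to the terms attached to it.
POST_TERMS = {
  "post-1" => ["featured"],
  "post-2" => ["featured", "awesome"],
  "post-3" => ["awesome"],
  "post-4" => []
}.freeze

# Row-level filter (what "terms.term NOT IN (...)" does on a join):
# a post survives if ANY of its joined term rows passes the condition.
def row_level_filter(post_terms, excluded)
  post_terms.select { |_post, terms| terms.any? { |t| !excluded.include?(t) } }.keys
end

# Exclusion (what the subquery / outer-join approaches aim for):
# a post survives only if NONE of its terms is excluded.
def exclude_posts(post_terms, excluded)
  post_terms.reject { |_post, terms| terms.any? { |t| excluded.include?(t) } }.keys
end

puts row_level_filter(POST_TERMS, ["featured"]).inspect # => ["post-2", "post-3"]
puts exclude_posts(POST_TERMS, ["featured"]).inspect    # => ["post-3", "post-4"]
```

"post-2" is the case from the question: its "awesome" row passes the row-level filter even though the post is also featured.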

MySQL datamapper adapter is chaining queries instead of joining

I'm using DataMapper (the ruby gem) as an ORM to a mysql database. (dm-core 1.1.0, do-mysql-adapter 1.1.0, do_mysql 0.10.6)
I'm writing an application that has two tables: a log of disk usage over time, and a "current usage" table containing foreign keys with the "latest" disk usage for easy reference. The DataMapper classes are Quota and LatestQuota, with a simple schema:
class Quota
include DataMapper::Resource
property :unique_id, Serial, :key => true
property :percentage, Integer
... (more properties)
end
class LatestQuota
include DataMapper::Resource
belongs_to :quota, :key => true
end
In my code I want to find all the entries in the LatestQuota table that correspond with a quota with a percentage higher than 95. I'm using the following datamapper query:
quotas = LatestQuota.all(:quota => {:percentage.gte => threshold})
...later...
quotas.select{|q| some_boolean_function?(q)}
Here some_boolean_function? filters the results in a way that DataMapper can't know about, which is why I need to call Ruby's select().
But it ends up calling the following SQL queries (reported from DM's debug output:)
SELECT `unique_id` FROM `quota` WHERE `percentage` >= 95
then later:
SELECT `quota_unique_id` FROM `latest_quota`
WHERE `quota_unique_id` IN (52, 78, 82, 232, 313, 320…. all the unique id's from the above query...)
This is a ridiculously suboptimal query, so I think I'm doing something wrong. The quota table has millions of records in it (historical data) versus the 15k or so records in latest_quota, and selecting all quota records first and then selecting latest_quota records out of the results is exactly the wrong way to do it.
What I would like it to do is something to the effect of:
SELECT q.* from quota q
INNER JOIN latest_quota lq
ON lq.quota_unique_id=q.unique_id
WHERE q.percentage >= 95;
Which takes .01 seconds with my current data, instead of the 5 minutes or so it takes DataMapper to do its query. Any way to coerce it to do what I want? Do I have my relations wrong? Am I querying it wrong?
For some reason nested-Hash-style queries will always perform sub-selects. To force INNER JOINs, use String query-paths: LatestQuota.all('quota.percentage.gte' => threshold)