getting the last 15 dated items -- any way to optimize? - mysql

I have the following rails associations and named scopes:
class Person < ActiveRecord::Base
  has_many :schedulings, :include => :event
end

class Scheduling < ActiveRecord::Base
  belongs_to :person
  belongs_to :event
  named_scope :recent, :order => "date DESC", :limit => 15
end

class Event < ActiveRecord::Base
  has_many :schedulings
end
I need to get the 15 most recent schedulings for a person. Here is my rails controller code, and the SQL query it generates:
Person.find(1).schedulings.recent
----
SELECT * FROM `schedulings` WHERE (`schedulings`.person_id = 1) ORDER BY date DESC LIMIT 15
Unfortunately this query is getting slower as my table sizes grow (currently at about 10K schedulings and 3K people).
Can anyone give me advice on either A) ways to optimize the existing query, or B) a more direct way to query that information, perhaps with direct SQL instead of Rails Active Record?
My schedulings table has a foreign key index on person_id, and an index on date. I do not have a dual index on person_id and date. Typically a person will not have more than one scheduling per date.
I am comfortable with Rails but a relative novice at SQL.

Creating a composite index on (person_id, date) should speed your query up nicely.
Also, SELECT * is usually a bad idea speed-wise: only select the fields you are actually going to use. Especially if you have memo or blob fields in your table, a SELECT * will be slow as hell.
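As a sketch, the composite index and a narrower select would look like the following. Only person_id and date come from the question; the other column names (id, event_id) are assumptions about a typical Rails schema:

```sql
-- Composite index matching the WHERE (person_id) and the ORDER BY (date):
CREATE INDEX index_schedulings_on_person_id_and_date
    ON schedulings (person_id, date);

-- Select only the columns you actually use, instead of *:
SELECT id, event_id, date
FROM schedulings
WHERE person_id = 1
ORDER BY date DESC
LIMIT 15;
```

With this index, MySQL can seek directly to person 1's rows and read them in date order, so the LIMIT 15 stops after 15 index entries instead of sorting the whole result.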

While you should have an index on that table's key (IMO), more importantly for this case you should have an index on the "date" field, since that's what you're ordering your results by.
I always like to use rails_footnotes in development to see how my queries perform (and how many I'm issuing) for a given view.


Rails: Performance issue with joining of records

I have the following setup with ActiveRecord and MySQL:
User has many groups through memberships
Group has many users through memberships
There is also an index on group_id and user_id, described in schema.rb:
add_index "memberships", ["group_id", "user_id"], name: "uugj_index", using: :btree
3 different queries:
User.where(id: Membership.uniq.pluck(:user_id))
(3.8ms) SELECT DISTINCT memberships.user_id FROM memberships
User Load (11.0ms) SELECT users.* FROM users WHERE users.id IN (1, 2...)
User.where(id: Membership.uniq.select(:user_id))
User Load (15.2ms) SELECT users.* FROM users WHERE users.id IN (SELECT DISTINCT memberships.user_id FROM memberships)
User.uniq.joins(:memberships)
User Load (135.1ms) SELECT DISTINCT users.* FROM users INNER JOIN memberships ON memberships.user_id = users.id
What is the best approach for doing this? And why is the query with the join so much slower?
The first query is bad because it sucks all of the user ids into a Ruby array and then sends them back to the database. If you have a lot of users, that's a huge array and a huge amount of bandwidth, plus 2 roundtrips to the database instead of one. Furthermore, the database has no way to efficiently handle that huge array.
The second and third approaches are both efficient database-driven solutions (one is a subquery, and one is a join), but you need to have the proper index. You need an index on the memberships table on user_id.
add_index :memberships, :user_id
The index that you already have would only be helpful if you wanted to find all of the users that belong to a particular group.
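If you prefer raw SQL, the equivalent DDL would be something like the following (the index names are illustrative):

```sql
-- Plain index serving the subquery/join on user_id:
ALTER TABLE memberships ADD INDEX index_memberships_on_user_id (user_id);

-- Or a composite index that also covers user-first lookups of groups:
ALTER TABLE memberships
    ADD INDEX index_memberships_on_user_id_and_group_id (user_id, group_id);
```

The composite variant makes the index "covering" for queries that only touch user_id and group_id, so MySQL can answer them from the index alone.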
Update:
If you have a lot of columns and data in your users table, the DISTINCT users.* in the 3rd query is going to be fairly slow because MySQL has to compare a lot of data in order to ensure uniqueness.
To be clear: this is not intrinsic slowness with JOIN, it's slowness with DISTINCT. For example: Here is a way to avoid the DISTINCT and still use a JOIN:
SELECT users.* FROM users
INNER JOIN (SELECT DISTINCT memberships.user_id FROM memberships) AS user_ids
ON user_ids.user_id = users.id;
Given all of that, in this case I believe the 2nd query is going to be the best approach for you, and it should be even faster than your reported results once you add the above index. Please retry the second approach if you haven't done so since adding the index.
Although the 1st query has some slowness issues of its own, your comment makes clear that it is still faster than the 3rd query (at least for your particular dataset). The trade-offs between these approaches will depend on how many users and how many memberships you have. Generally speaking, I believe the 1st approach is still the worst even if it ends up being faster here.
Also, please note that the index I'm recommending is designed specifically for the three queries you listed in your question. If you have other kinds of queries against these tables, you may be better served by additional indexes, or possibly multi-column indexes, as @tata mentioned in his/her answer.
The query with the join is slow because it loads all columns from the database, even though Rails doesn't preload them this way. If you need preloading, you should use includes (or similar) instead, though includes will be even slower because it constructs objects for all the associations. Also, be aware that
User.where.not(id: Membership.uniq.select(:user_id)) will return an empty set when there is at least one membership whose user_id is nil, while the query with pluck will return the correct relation.
Below is a more efficient solution:
User.exists?(id: Membership.uniq.pluck(:user_id))
The join fetches all the columns from the memberships table, so it takes more time than the other queries, which only fetch the user_id from memberships. Calling distinct on users also slows the query down.
I think that you have a problem with the declaration of your indexes.
You declared an index as:
add_index "memberships", ["group_id", "user_id"], name: "uugj_index", using: :btree
If your primary key were ["user_id", "group_id"] you would be good to go, but making that the primary key in Rails is not so trivial.
Therefore, in order to query the data with a JOIN against the users table, you need a second index:
add_index "memberships", ["user_id", "group_id"]
This is because of the way MySQL handles multi-column indexes (they are treated like concatenated strings, so only a leftmost prefix of the index can be used).
You can read more about it here: Multiple-Column Indexes
There are also other techniques to make it faster depending on your exact cases, but the suggested one is the simplest with ActiveRecord.
Furthermore, I don't think you need the .uniq here, as the result should be unique anyway because of the constraints on the table.
Adding .uniq can make MySQL perform an unnecessary sort with filesort, and usually it will also put a temporary table on disk.
You can run the command Rails generates directly against MySQL and check it with EXPLAIN:
EXPLAIN <your command goes here>
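For example, with the queries above (the comments describe standard columns of MySQL's EXPLAIN output):

```sql
-- Check which index (if any) serves the DISTINCT subquery:
EXPLAIN SELECT DISTINCT memberships.user_id FROM memberships;
-- In the output, `key` shows the index MySQL chose, and
-- `Extra: Using index` means the rows were read from the index alone,
-- without touching the table. `Using filesort` or `Using temporary`
-- are the warning signs mentioned above.
```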
@bublik42 and @user3409950: if I had to choose a query for the production environment, I would go for the first one:
User.where(id: Membership.uniq.pluck(:user_id))
Reason: it uses the SQL DISTINCT keyword to filter the result in the database, then SELECTs only the 'user_id' column and returns those values in array form ([1, 2, 3...]).
Database-level filtering of results is always faster than filtering through ActiveRecord query objects.
For your second query:
User.where(id: Membership.uniq.select(:user_id))
It is the same query as the 'pluck' version, but with 'select' it builds an ActiveRecord relation with the single field 'user_id'. This query has the overhead of building the ActiveRecord objects ([#<Membership user_id: 1>, #<Membership user_id: 2>, ...]), which is not the case for the first query. I haven't done any real benchmarking of the two, but the results follow from the steps each query performs.
The third case is expensive because with the join MySQL fetches all the columns from the memberships table, and it takes more time to filter the result compared to the other queries.
Thank you
SELECT DISTINCT users.*
FROM users
INNER JOIN memberships
ON memberships.user_id = users.id
is slower because it is performed something like this:
Go through all of one table, collecting stuff as it goes.
for each entry from step 1, reach into the other table.
put that stuff into a tmp table
dedup (DISTINCT) that table to deliver the results
If there are 1000 users and each has 100 memberships, then the table in step 3 will have 100000 rows, even though the answer will have only 1000 rows.
This is a "semi-join": it only checks that the user has at least one membership, and it is much more efficient:
SELECT users.*
FROM users            -- no DISTINCT needed
WHERE EXISTS
    ( SELECT *
      FROM memberships
      WHERE memberships.user_id = users.id
    )
If you don't really need that check, then this would be still faster:
SELECT users.*
FROM users
If Rails can't generate these queries, then grumble at it.
Here is a great example demonstrating include vs. joins:
http://railscasts.com/episodes/181-include-vs-joins
Please try with includes; I'm damn sure it will take comparatively less time.
User.uniq.includes(:memberships)

Get Relation of Followed or Public User

I am working on a social app. I have users that can have private accounts. Users can also follow each other. What is the fastest way using ActiveRecord or pure SQL to fetch all the records of a has_many on a User that either belong to someone I am following or belong to a public user. In pseudo code:
User.get_all_posts_for_users_being_followed_by(me) + User.get_all_posts_for_public_users
I have this:
SELECT `posts`.*
FROM `posts`
WHERE user_id IN (
    SELECT id
    FROM users
    WHERE visibility = 'all'
    UNION
    SELECT followable_id
    FROM follows
    WHERE followable_type = 'User'
      AND follower_type = 'User'
      AND follower_id = 4
      AND follows.status = 1
)
But I was hoping there might be a faster way to handle that, or a way to do it with Rails query methods.
You can express this query with ActiveRecord, but I recommend keeping the raw SQL version for now, because it's very easy to modify. A few things to pay attention to:
The query might be faster if you add indexes:
add_index :users, :visibility, :name => 'visibility_ix'
Selecting all columns with the * wildcard will cause the query's meaning and behavior to change if the table's schema changes, and might cause the query to retrieve too much data.
IN() and NOT IN() subqueries are poorly optimized. MySQL executes the subquery as a dependent subquery for each row in the outer query. This is a frequent cause of serious performance problems in MySQL 5.5 and older versions. The query probably should be rewritten as a JOIN or a LEFT OUTER JOIN, respectively.
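For example, the IN() with a UNION subquery above could be rewritten with joins along these lines. This is a sketch using the same tables and columns as the question, untested against the actual schema:

```sql
SELECT DISTINCT posts.*
FROM posts
LEFT JOIN users u
    ON u.id = posts.user_id
   AND u.visibility = 'all'
LEFT JOIN follows f
    ON f.followable_id = posts.user_id
   AND f.followable_type = 'User'
   AND f.follower_type = 'User'
   AND f.follower_id = 4
   AND f.status = 1
WHERE u.id IS NOT NULL
   OR f.followable_id IS NOT NULL;
```

Each LEFT JOIN only matches when its condition holds, so the WHERE clause keeps posts whose author is public or followed; DISTINCT guards against duplicates if both joins match.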

Instructing MySQL to apply WHERE clause to rows returned by previous WHERE clause

I have the following query:
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
AND dt_stamp >= :dt_stamp
AND DATE( dt_stamp ) = :date
AND user_id = :user_id
AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
The claim_notes table has about half a million rows, so this query runs very slowly since it has to search against the unindexed note column (which I can't do anything about). I know that when the type_id, dt_stamp, and user_id conditions are applied, I'll be searching against about 60 rows instead of half a million. But MySQL doesn't seem to apply these in order. What I'd like to do is to see if there's a way to tell MySQL to only apply the note LIKE :click_to_call condition to the rows that meet the former conditions so that it's not searching all rows with this condition.
What I've come up with is this:
SELECT dt_stamp
FROM (
    SELECT *
    FROM claim_notes
    WHERE type_id = 0
      AND dt_stamp >= :dt_stamp
      AND DATE( dt_stamp ) = :date
      AND user_id = :user_id
) AS filtered
WHERE note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1
This works and is extremely fast. I'm just wondering if this is the right way to do this, or if there is a more official way to handle it.
It shouldn't be necessary to do this. The MySQL optimizer can handle multiple terms in your WHERE clause separated by AND. Basically, it knows how to apply all the conditions it can using indexes, then apply the unindexed expressions only to the remaining rows.
But choosing the right index is important. A multi-column index is better for a series of AND terms than individual indexes. MySQL can apply index intersection, but that's much less effective than finding the same rows with a single index.
A few logical rules apply to creating multi-column indexes:
Conditions on unique columns are preferred over conditions on non-unique columns.
Equality conditions (=) are preferred over ranges (>=, IN, BETWEEN, !=, etc.).
After the first column in the index used for a range condition, subsequent columns won't use an index.
Most of the time, searching the result of a function on a column (e.g. DATE(dt_stamp)) won't use an index. It'd be better in that case to store a DATE data type and use = instead of >=.
If the condition matches > 20% of the table, MySQL probably will decide to skip the index and do a table-scan anyway.
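Applying those rules to the query in the question gives something like the following sketch. It assumes dt_stamp is a DATETIME and that :date is a plain date value; the index name is illustrative:

```sql
-- Equality columns first, then the range column:
ALTER TABLE claim_notes
    ADD INDEX idx_type_user_dtstamp (type_id, user_id, dt_stamp);

-- Rewrite DATE(dt_stamp) = :date as a range so the index stays usable:
SELECT dt_stamp
FROM claim_notes
WHERE type_id = 0
  AND user_id = :user_id
  AND dt_stamp >= :dt_stamp
  AND dt_stamp >= :date
  AND dt_stamp <  :date + INTERVAL 1 DAY
  AND note LIKE :click_to_call
ORDER BY dt_stamp
LIMIT 1;
```

The index narrows the scan to the ~60 matching rows, and the LIKE is then evaluated against only those rows, which is exactly the behavior the question was after.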
Here are some webinars by myself and my colleagues at Percona to help explain index design:
Tools and Techniques for Index Design
MySQL Indexing: Best Practices
Advanced MySQL Query Tuning
Really Large Queries: Advanced Optimization Techniques
You can get the slides for these webinars for free, and view the recording for free, but the recording requires registration.
Don't go for the derived table solution, as it is not performant. I'm surprised that MySQL goes for the LIKE first when = and >= operators are available.
Anyway, I'd say you could try adding some indexes on those fields and see what happens:
ALTER TABLE claim_notes ADD INDEX(type_id, user_id);
ALTER TABLE claim_notes ADD INDEX(dt_stamp);
The latter index won't actually improve the search itself, but rather the sorting of the results.
Of course, having an EXPLAIN of the query would help.

Rails - index for a query over three table columns

I've got a query that I use often:
Site.where("mobile_visible = true AND (created_at > :date OR updated_at > :date)", :date => "12-04-30")
It produces this sql
SELECT `sites`.* FROM `sites` WHERE (mobile_visible = true AND (created_at > '12-04-30' OR updated_at > '12-04-30'))
I want to add an index or indexes to make this query more efficient. Should I add 3 separate indexes, one per column, or 1 composite index covering all three?
The best approach is to construct an index that covers all elements of your WHERE clause, not just one.
Databases generally can't use more than one index at a time for a particular portion of a query. If you add three indexes, the database will try to determine which one gives the greatest benefit and pick that one. It may or may not choose the best one, depending on how the query execution plan is determined.
For this situation, I'd recommend adding the index:
add_index :sites, [:mobile_visible, :created_at, :updated_at]
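One caveat worth knowing: because the OR spans two different columns, a single composite index often can't satisfy both range conditions at once. A common alternative (a sketch; index names are illustrative) is a UNION of two index-friendly queries:

```sql
-- Two composite indexes, one per range column:
ALTER TABLE sites ADD INDEX idx_mobile_created (mobile_visible, created_at);
ALTER TABLE sites ADD INDEX idx_mobile_updated (mobile_visible, updated_at);

-- UNION removes duplicate rows where both conditions match:
SELECT * FROM sites
WHERE mobile_visible = true AND created_at > '12-04-30'
UNION
SELECT * FROM sites
WHERE mobile_visible = true AND updated_at > '12-04-30';
```

Each branch of the UNION can use its own index fully, which is something the single OR query cannot do. Check both plans with EXPLAIN to see which wins on your data.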

ActiveRecord/MySQL query to return grouped set of objects

I have a model called a Statement that belongs to a Member. Given an array of members, I want to create a query that will return the most recent statement for each of those members (preferably in one nice clean query).
I thought I might be able to achieve this using group and order - something like:
# @members is already in an array
@statements = Statement.all(
  :conditions => ["member_id IN (?)", @members.collect { |m| m.id }],
  :group => :member_id,
  :order => "created_at DESC"
)
But unfortunately the above always returns the oldest statement for each member. I've tried swapping the order option round, but alas it always returns the oldest statement of the group rather than the most recent.
I'm guessing group_by isn't the way to achieve this - so how do I achieve it?
PS - any non Ruby/Rails people reading this, if you know how to achieve this in raw MySQL, then fire away.
In MySQL directly, you need a sub-query that returns the maximum created_at value for each member, which can then be joined back to the Statement table to retrieve the rest of the row.
SELECT *
FROM Statement s
JOIN (SELECT
member_id, MAX(created_at) max_created_at
FROM Statement
GROUP BY member_id
) latest
ON s.member_id = latest.member_id
AND s.created_at = latest.max_created_at
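On MySQL 8.0 or later (not available when this was asked), a window function is another way to get the latest row per member; a sketch:

```sql
SELECT *
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY member_id
                              ORDER BY created_at DESC) AS rn
    FROM Statement s
) ranked
WHERE rn = 1;
```

Unlike the MAX() join, this also behaves predictably when two statements for the same member share the same created_at value, since ROW_NUMBER() picks exactly one row per partition.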
If you are using Rails 3 I would recommend taking a look at the new ActiveRecord query syntax. There is an overview at http://guides.rubyonrails.org/active_record_querying.html
I am pretty certain you could do what you are trying to do here without writing any SQL. There is an example in the "Grouping" section on that page which looks similar to what you are trying to do.