How do you use a conditional OR in Thinking Sphinx?
The situation is:
I have a Message model with a sender_id and recipient_id attribute. I would like to compose this query:
Message.where("sender_id = ? OR recipient_id = ?", business_id, business_id)
Right now, I'm searching twice: once for all the messages that have recipient_id = business_id, and again for all the messages that have sender_id = business_id. Then I just merge the results.
I feel that there's a more efficient way to do this.
EDIT - Adding index file
ThinkingSphinx::Index.define :message, with: :active_record, delta: ThinkingSphinx::Deltas::DelayedDelta do
  # fields
  indexes body

  # attributes
  has job_id
  has sender_id
  has recipient_id
end
Sphinx doesn't allow for OR logic between attributes, only fields. However, a workaround would be to combine the two columns into a third attribute:
has [sender_id, recipient_id], :as => :business_ids, :multi => true
And then you can search on the combined values like so:
Message.search :with => {:business_ids => business_id}
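Putting it together, the index from the question would then look something like this (a sketch; only the two separate id attributes are replaced by the combined one):
ThinkingSphinx::Index.define :message, with: :active_record, delta: ThinkingSphinx::Deltas::DelayedDelta do
  # fields
  indexes body

  # attributes
  has job_id
  has [sender_id, recipient_id], :as => :business_ids, :multi => true
end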
Related
Let's say I have a model Account with columns :email, :name, etc. Emails are not unique. What I want is to add a chainable scope that selects rows with distinct emails but keeps all the other fields. By 'chainable' I mean that I could do this: Account.uniq_by_email.where(bla bla).
What I've tried:
With select:
def self.uniq_by_email
  select('distinct email')
end
Doesn't work for me, as it selects only the email field.
With group:
def self.uniq_by_email
  group(:email)
end
This is almost what I want: I can chain it, and it selects all fields. But there is a strange thing about the count method: as you already guessed, it returns a hash of email counts. I want it to return a "simple" ActiveRecord::Relation where count returns just a number, not a hash. Is that possible to achieve?
My basic idea is to select only the first entry in every group of email.
To make it easy, I create a scope like this instead of using a class method:
scope :uniq_by_email, -> {
  joins("JOIN (
    SELECT MIN(id) AS min_id
    FROM accounts
    GROUP BY email
  ) AS temp ON temp.min_id = accounts.id")
}
With this you can chain as you described:
Account.uniq_by_email.where(bla bla)
Well, you can use group by, but then you can count like this:
Account.uniq_by_email.where(bla bla).flatten.count
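For comparison, a quick sketch of what count returns under each approach (the name filter is just a hypothetical placeholder):
# group-based scope: count returns a hash keyed by email
Account.group(:email).count
# => { "a@example.com" => 2, "b@example.com" => 1 }

# join-based scope: the relation stays flat, so count is a plain integer
Account.uniq_by_email.where(name: "Alice").count
# => 1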
I have 3 models as follows:
(I'm also describing the database structure so that anyone not familiar with Ruby on Rails can help me)
Thread.rb
class Thread
  has_many :thread_profils
  has_many :profils, :through => :thread_profils
end
Table threads
integer: id (PK)
ThreadProfil.rb
class ThreadProfil
  belongs_to :thread
  belongs_to :profil
end
Table thread_profils
integer: id (PK)
integer: thread_id (FK)
integer: profil_id (FK)
Profil.rb
class Profil
end
Table profils
integer: id (PK)
In one of my controllers I am looking for the most optimized way to find the Thread IDs that include exactly two profils (the current one and some other one).
I have my current_profil.id and another profil.id, and I can't figure out a simple way to get that collection/list/array of Thread ids while issuing as few SQL queries as possible.
For now the only solution I have found is the following, which I don't consider "optimized" at all.
thread_profils = ThreadProfil.where(:profil_id => current_profil.id)
thread_ids = thread_profils.map do |association|
  profils = Thread.find(association.thread_id).profils.map do |profil|
    profil.id if profil.id != current_profil.id
  end.compact
  if (profils - [id]).empty?
    association.thread_id
  end
end.compact
That processes the following SQL queries:
SELECT `thread_profils`.* FROM `thread_profils` WHERE `thread_profils`.`profil_id` = [current_profil.id]
And for each result :
SELECT `threads`.* FROM `threads` WHERE `threads`.`id` = [thread_id] LIMIT 1
SELECT `profils`.* FROM `profils` INNER JOIN `thread_profils` ON `profils`.`id` = `thread_profils`.`profil_id` WHERE `thread_profils`.`thread_id` = [thread_id]
Is there any lightweight way to do that, either with Rails or directly in SQL?
Thanks
I found the following SQL query:
SELECT array_agg(thread_id) FROM "thread_profils" WHERE "thread_profils"."profil_id" = 1 GROUP BY profil_id HAVING count(thread_id) =2
Note: array_agg is a Postgres aggregate function. MySQL has group_concat, which would give you a comma-delimited string of IDs instead of an array.
This SQL was generated by the following Rails code:
ThreadProfil.select('array_agg(thread_id)').where(profil_id: 1).group(:profil_id).having("count(thread_id) = 2").take
This generates the right query, but the result is not meaningful as a ThreadProfil - still, you might be able to work further with this to get what you want.
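If what you actually need is just the list of thread IDs shared by exactly the two profils, a single grouped query can do it. A sketch, assuming Rails 3.2+ for pluck, an other_profil variable for the second participant, and no duplicate (thread_id, profil_id) rows:
# Threads with exactly two participant rows, both of them in our pair:
thread_ids = ThreadProfil.
  group(:thread_id).
  having("COUNT(*) = 2 AND SUM(CASE WHEN profil_id IN (?, ?) THEN 1 ELSE 0 END) = 2",
         current_profil.id, other_profil.id).
  pluck(:thread_id)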
With Rails 3, I am using the following kind of code to query a MySQL database:
MyData.joins('JOIN (SELECT id, name FROM sellers) AS Q ON seller_id = Q.id').
  select('*').
  joins('JOIN (SELECT id, name FROM users) AS T ON user_id = T.id').
  select('*').each do |record|
    # ..........
Then, a bit further down, I try to access a "name" with this code (note that both sellers and users have a name column):
str = record.name
This line gives me a "user name" instead of a "seller name", but shouldn't it give nothing? Since I joined multiple tables that each have a name column, shouldn't I get an error like "column 'name' is ambiguous"? Why isn't this happening?
And by the way, the code behaves the same way whether or not I include that first select('*') line.
Thank you.
Firstly, there's no reason to call select twice - only the last call will actually be used. Secondly, you should not be using select("*"), because the SQL database (and Rails) will not rename the ambiguous columns for you. Instead, use explicit naming for the extra columns that you need:
MyData.joins('JOIN (SELECT..) AS Q ON ...', 'JOIN (SELECT...) AS T ON ...').
  select('my_datas.*, T.name AS t_name, Q.name AS q_name').
  each do |record|
    # do something
  end
Because of this, there's no reason to make a subquery in your JOIN statements:
MyData.joins('JOIN sellers AS Q ON ...', 'JOIN users AS T ON ...').
And finally, you should already have belongs_to associations set up for seller and user. That would mean that you can just do this:
MyData.joins(:seller, :user).
  select("my_datas.*, sellers.name AS seller_name, users.name AS user_name").
  each do |record|
    # do something
  end
Now you can call record.seller_name and record.user_name without any ambiguity.
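That last version assumes MyData declares the matching belongs_to associations; a minimal sketch (class names inferred from the question's seller_id and user_id columns):
class MyData < ActiveRecord::Base
  belongs_to :seller   # joins via the seller_id foreign key
  belongs_to :user     # joins via the user_id foreign key
end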
Is there a way in Active Record to construct a single query that will do a conditional join for multiple primary keys?
Say I have the following models:
class Athlete < ActiveRecord::Base
  has_many :workouts
end
class Workout < ActiveRecord::Base
  belongs_to :athlete
  named_scope :run, :conditions => {:type => "run"}
  named_scope :best, :order => "time", :limit => 1
end
With that, I could generate a query to get the best run time for an athlete:
Athlete.find(1).workouts.run.best
How can I get the best run time for each athlete in a group, using a single query?
The following does not work, because it applies the named scopes just once to the whole array, returning the single best time for all athletes:
Athlete.find([1,2,3]).workouts.run.best
The following works. However, it is not scalable for larger numbers of Athletes, since it generates a separate query for each Athlete:
[1,2,3].collect {|id| Athlete.find(id).workouts.run.best}
Is there a way to generate a single query using the Active Record query interface and associations?
If not, can anyone suggest a SQL query pattern that I can use with find_by_sql? I must confess I am not very strong at SQL, but if someone will point me in the right direction I can probably figure it out.
To get the Workout objects with the best time:
athlete_ids = [1,2,3]

# Sanitize the SQL, as we need to substitute in the bind variable.
# Note: this query will return duplicate workouts when best times tie.
join_sql = Workout.send(:sanitize_sql, [
  "JOIN (
    SELECT a.athlete_id, MAX(a.time) AS time
    FROM workouts a
    WHERE a.athlete_id IN (?)
    GROUP BY a.athlete_id
  ) b ON b.athlete_id = workouts.athlete_id AND b.time = workouts.time",
  athlete_ids])

Workout.all(:joins => join_sql, :conditions => {:athlete_id => athlete_ids})
If you require just the best workout time per athlete, then:
Athlete.max("workouts.time", :include => :workouts, :group => "athletes.id",
:conditions => {:athlete_id => [1,2,3]}))
This will return an OrderedHash:
{1 => 300, 2 => 60, 3 => 120}
Edit 1
The solution below avoids returning multiple workouts with the same best time. It is very efficient if the athlete_id and time columns are indexed.
Workout.all(:joins => "LEFT OUTER JOIN workouts a
ON workouts.athlete_id = a.athlete_id AND
(workouts.time < b.time OR workouts.id < b.id)",
:conditions => ["workouts.athlete_id = ? AND b.id IS NULL", athlete_ids]
)
Read this article to understand how this query works. The last check (workouts.id < b.id) in the JOIN ensures only one row is returned when there is more than one match for the best time: when several workouts tie for an athlete's best time, the one with the highest id (i.e. the last workout) is returned.
Certainly the following will not work:
Athlete.find([1,2,3]).workouts.run.best
because Athlete.find([1,2,3]) returns an Array, and you can't call workouts on an Array.
You can try something like this:
Workout.find(:first, :joins => [:athlete], :conditions => "athletes.id IN (1,2,3)", :order => 'workouts.time DESC')
You can edit the conditions according to your need.
I have a model called HeroStatus with the following attributes:
id
user_id
recordable_type
hero_type (can be NULL!)
recordable_id
created_at
There are over 100 hero_statuses, and a user can have many hero_statuses, but can't have the same hero_status more than once.
A user's hero_status is uniquely identified by the combination of recordable_type + hero_type + recordable_id. What I'm trying to say essentially is that there can't be a duplicate hero_status for a specific user.
Unfortunately, I didn't have a validation in place to assure this, so I got some duplicate hero_statuses for users after I made some code changes. For example:
user_id = 18
recordable_type = 'Evil'
hero_type = 'Halitosis'
recordable_id = 1
created_at = '2010-05-03 18:30:30'

user_id = 18
recordable_type = 'Evil'
hero_type = 'Halitosis'
recordable_id = 1
created_at = '2009-03-03 15:30:00'

user_id = 18
recordable_type = 'Good'
hero_type = 'Hugs'
recordable_id = 1
created_at = '2009-02-03 12:30:00'

user_id = 18
recordable_type = 'Good'
hero_type = NULL
recordable_id = 2
created_at = '2009-12-03 08:30:00'
(The last two are obviously not duplicates; the first two are.) So what I want to do is get rid of the duplicate hero_status. Which one? The one with the most recent date.
I have three questions:
How do I remove the duplicates using a SQL-only approach?
How do I remove the duplicates using a pure Ruby solution? Something similar to this: Removing "duplicate objects".
How do I put a validation in place to prevent duplicate entries in the future?
For an SQL-only approach, I would use this query (I'm assuming the ids are unique):
DELETE FROM HeroStatus WHERE id IN
  (SELECT hs.id FROM
    (SELECT user_id, recordable_type, hero_type, recordable_id,
            MAX(created_at) AS created_at
     FROM HeroStatus
     GROUP BY user_id, recordable_type, hero_type, recordable_id
     HAVING COUNT(id) > 1) AS del
   INNER JOIN HeroStatus AS hs
     ON hs.user_id = del.user_id AND hs.recordable_type = del.recordable_type
    AND hs.hero_type = del.hero_type AND hs.recordable_id = del.recordable_id
    AND hs.created_at = del.created_at)
A bit of a monster! The query finds all duplicates using the natural key (user_id, recordable_type, hero_type, recordable_id) and, within each group, picks the largest created_at value (the most recently created row). It then finds the IDs of those rows by joining back to the main table and deletes them.
(Please try this on a copy of the table first and verify you get the results you want! :-)
To prevent this happening in future, add a unique index or constraint over the columns user_id, recordable_type, hero_type, recordable_id. E.g.
ALTER TABLE HeroStatus
ADD UNIQUE (user_id, recordable_type, hero_type, recordable_id)
EDIT:
You add (and remove) this index within a migration like this:
add_index(:HeroStatus, [:user_id, :recordable_type, :hero_type, :recordable_id], :unique => true)
remove_index(:HeroStatus, :column => [:user_id, :recordable_type, :hero_type, :recordable_id])
Or, if you want to explicitly name it:
add_index(:HeroStatus, [:user_id, :recordable_type, :hero_type, :recordable_id], :unique => true, :name => :my_unique_index)
remove_index(:HeroStatus, :name => :my_unique_index)
Sometimes you need to just roll up your sleeves and do some serious SQL to kill off all the ones you don't want. This is easy if it's a one shot thing, and not too hard to roll into a Rake task you can fire on demand.
For instance, to select all the distinct status records, it is reasonable to use something like the following:
SELECT id FROM hero_statuses GROUP BY user_id, recordable_type, hero_type, recordable_id
Given that these are the sufficiently unique records in your set, you can go about removing all the ones you don't want:
DELETE FROM hero_statuses WHERE id NOT IN (SELECT id FROM hero_statuses GROUP BY user_id, recordable_type, hero_type, recordable_id)
As with any operation that involves DELETE FROM, I hope you don't just fire this off on your production data without the usual precautions of backing things up.
As to how to prevent this in the future, if these are unique constraints, create a unique index on them:
add_index :hero_statuses, [:user_id, :recordable_type, :hero_type, :recordable_id], :unique => true
This will generate ActiveRecord exceptions when you attempt to introduce a duplicate record. One benefit of a unique index is that you can make use of the "INSERT IGNORE INTO..." or "INSERT ... ON DUPLICATE KEY ..." features to recover from potential duplications.
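To also catch duplicates at the model layer (the third question), a uniqueness validation scoped to the other key columns is a reasonable complement; a sketch (the unique index stays as the real guarantee, since validates_uniqueness_of is subject to race conditions):
class HeroStatus < ActiveRecord::Base
  # one hero_status per (user, recordable_type, hero_type, recordable_id)
  validates_uniqueness_of :recordable_id,
    :scope => [:user_id, :recordable_type, :hero_type]
end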