In my web app I need to perform three types of search on the items table, with the following conditions:
items.is_public = 1 (use the title field for indexing) - a lot of results can be retrieved (cardinality is much higher than in the other cases)
items.category_id = {X} (use the title + private_notes fields for indexing) - usually fewer than 100 results
items.user_id = {X} (use the title + private_notes fields for indexing) - usually fewer than 100 results
I can't find a way to make Sphinx work in all of these cases, though it works well in the first.
Should I use Sphinx only for the first case and fall back to plain old "slow" FULLTEXT searching in MySQL for cases 2 and 3 (where the cardinality is much lower)?
Or is it just me, and Sphinx can do pretty much everything?
Without full knowledge of your models I might be missing something, but how's this:
class Item < ActiveRecord::Base
  define_index do
    indexes :title
    indexes :private_notes
    has :is_public, :type => :boolean
    has :category_id
    has :user_id
  end
end
1)
Item.search(:conditions => {:title => "blah"}, :with => {:is_public => true})
2)
Item.search("blah", :with => {:category_id => 1})
3)
Item.search("blah", :with => {:user_id => 196})
I'm hitting an OutOfBoundsError as a consequence of misunderstanding the proper configuration syntax (which may also be a by-product of legacy syntax).
The manual suggests a Class search taking on WillPaginate styled parameters. Having many fields to draw from, the model is defined as
class AziendaSearch < BaseSearch
  set_per_page 10000
  accept :terms
end
set_per_page was set that high because if I set it to the target of 100, the will_paginate links do not show up.
The controller may be excessively convoluted in order to include the ordering parameter, and thus results in a two-step process:
@azienda_search = AziendaSearch.new params
@results = @azienda_search.search
@aziendas = Azienda.order('province_id ASC').where('id IN (?)', @results).paginate :page => params[:page], :per_page => 100
The view paginates on the basis of @aziendas:
<%= will_paginate @aziendas, :previous_label => "precedente ", :next_label => " successiva" %>
My suspicion is that the search model is not properly set up, but the syntax is not obvious to me given the manual's indications. page params[:page] certainly does not work...
Update
BaseSearch is a Sphinx method and was in fact inherited from an older version of this application (Rails 2.x...). So this may be hanging around creating all sorts of syntactic confusion.
In fact, following the manual, I am now fully uncertain as to how best to make these statements. Should a separate class be defined for AziendaSearch? If not, where should the Azienda.search block be invoked... in the controller, like this?
@azienda_search = Azienda.search(
  :max_matches => 100_000,
  :page => params[:page],
  :per_page => 100,
  :order => "province_id ASC"
)
@results = @azienda_search.search
I'm not sure what BaseSearch is doing with set_per_page (that's certainly not a Thinking Sphinx method), but it's worth noting that Sphinx defaults to a maximum of 1000 records. It is possible to configure Sphinx to return more, though - you need to set max_matches in your config/thinking_sphinx.yml to your preferred limit (per environment):
production:
  max_matches: 100000
And also set the limit on the relevant search requests:
Azienda.search(
  :max_matches => 100_000,
  :page => params[:page],
  :per_page => 100
)
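To see why the cap matters for pagination, here is a rough sketch in plain Ruby. The helper reachable_pages is hypothetical (it is not part of Thinking Sphinx or will_paginate); it only illustrates that the number of pages you can link to is bounded by max_matches, not by the number of matching records.

```ruby
# Hypothetical helper for illustration; not a Thinking Sphinx API.
# Sphinx will not return results beyond max_matches, so the number of
# pages that can actually be reached is capped accordingly.
def reachable_pages(total_found, per_page, max_matches = 1000)
  reachable = [total_found, max_matches].min
  (reachable.to_f / per_page).ceil
end

# 4,500 matching records at 100 per page:
puts reachable_pages(4_500, 100)           # default cap of 1000 matches
puts reachable_pages(4_500, 100, 100_000)  # cap raised to 100,000
```

With the default cap only the first 10 pages are reachable, which is one plausible reason pagination behaves oddly when the limits in the config and in the search call disagree.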
As for the doubled queries… if you add province_id as an attribute in your index definition, you'll be able to order search queries by it.
# in your Azienda index definition:
has province_id
# And then when searching:
Azienda.search(
  params[:azienda_search][:terms],
  :max_matches => 100_000,
  :page => params[:page],
  :per_page => 100,
  :order => "province_id ASC"
)
I'm wanting a sphinx_scope that will only search for records that are current. Each database record has a field, status, whose value is either CURRENT or ARCHIVED.
I have achieved this, but I have to use an odd construct to get there; there is probably a much better way to do it.
Here's what I have:
indices/letter_index.rb
ThinkingSphinx::Index.define :letter, :with => :real_time do
  # fields
  indexes title, :sortable => true
  indexes content
  # attributes
  has status, :type => :string
  has issue_date, :type => :timestamp
  has created_at, :type => :timestamp
  has updated_at, :type => :timestamp
end
models/letter.rb
class Letter < ActiveRecord::Base
  include ThinkingSphinx::Scopes
  after_save ThinkingSphinx::RealTime.callback_for(:letter)
  .. snip ..
  sphinx_scope(:archived) {
    {:with => {:status => "'ARCHIVED'"}}
  }
The problem that I ran into was that if I used :with => {:status => 'ARCHIVED'}, my query came out as
SELECT * FROM `letter_core` WHERE MATCH('search term') AND `status` = ARCHIVED AND `sphinx_deleted` = 0 LIMIT 0, 20
ThinkingSphinx::SyntaxError: sphinxql: syntax error, unexpected IDENT, expecting CONST_INT (or 4 other tokens) near 'ARCHIVED AND `sphinx_deleted` = 0 LIMIT 0, 20; SHOW META'
but, if I construct it as :with => {:status => "'ARCHIVED'"}, it then adds the single quotes and the query succeeds. :)
Is this the proper way to write the scope, or is there a better way?
Bonus question: where do I find the docs for what is allowed in the scopes, such as :order, :with, :conditions, etc.
Firstly: the need for quotes was a bug - Sphinx has only relatively recently allowed for filtering on string attributes, hence why this wasn't something in place a good while ago. I've patched Riddle (the gem that is a pure Ruby wrapper around Sphinx functionality and used by Thinking Sphinx), you can give the latest a spin with this in your Gemfile:
gem 'riddle', '~> 2.0',
  :git => 'git://github.com/pat/riddle.git',
  :branch => 'develop',
  :ref => '8aec79fdf4'
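To illustrate the nature of the bug: a string filter value has to be rendered as a quoted literal in the generated SphinxQL, while integers go in bare. The sketch below is plain Ruby with hypothetical helper names (sphinxql_literal, sphinxql_filter); it is not Riddle's actual implementation, and the backslash escaping is an assumption about how the literal should be quoted.

```ruby
# Hypothetical helpers for illustration; Riddle's real code may differ.
def sphinxql_literal(value)
  case value
  when Integer
    value.to_s
  when String
    # Escape backslashes first, then single quotes, and wrap in quotes.
    escaped = value.gsub("\\") { "\\\\" }.gsub("'") { "\\'" }
    "'#{escaped}'"
  else
    raise ArgumentError, "unsupported filter value: #{value.inspect}"
  end
end

def sphinxql_filter(attribute, value)
  "`#{attribute}` = #{sphinxql_literal(value)}"
end

puts sphinxql_filter(:status, "ARCHIVED")
puts sphinxql_filter(:sphinx_deleted, 0)
```

Without the quoting branch, ARCHIVED would be emitted as a bare identifier, which is what produced the "unexpected IDENT" syntax error.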
As for what can go in a Sphinx scope: anything that can go in a normal search call.
define_index do
  indexes :first_name, :prefixes => true
  indexes :last_name, :prefixes => true
  indexes :email, :prefixes => true
  set_property :enable_star => 1
  set_property :min_prefix_len => 1
end
In this case, if I want to search only by email, the search still runs against all of the indexes that are specified.
E.g.:
email = "*me*"
Contact.search email
This returns matches from first_name, last_name and email.
But it should return matches from email only.
What would be the solution for searching only one of the specified indexes?
Just a quick correction - you want to search on a specific field, not a specific index.
And Thinking Sphinx can do this by using the :conditions option - so give the following a try:
Contact.search :conditions => {:email => '*me*'}
Thinking Sphinx can also automatically add wildcards to both ends of each word you give it as well:
Contact.search :conditions => {:email => 'me'}, :star => true
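As a rough idea of what automatic wildcards amount to, here is a plain-Ruby approximation. The add_wildcards helper is hypothetical, and the real Thinking Sphinx behaviour is more careful (it has to leave query operators and field markers alone, for example), so treat this only as a sketch of the basic transformation.

```ruby
# Hypothetical approximation of wrapping each word in wildcards.
# The real implementation handles operators and special characters.
def add_wildcards(query)
  query.gsub(/\w+/) { |word| "*#{word}*" }
end

puts add_wildcards("me")
puts add_wildcards("john smith")
```

Note that wildcard matching only works if the index was built with prefix or infix support enabled.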
When querying the database with:
@robots = Robot.all(:conditions => {:a => 'b'}, :limit => 50, :offset => 0)
What is the best way to get the total number of rows without the :limit?
In raw MySQL you could do something like this:
SELECT SQL_CALC_FOUND_ROWS * FROM robots WHERE a = 'b' LIMIT 0, 50
SELECT FOUND_ROWS();
Is there an active record way of doing this?
This works for me:
ps = Post.all(:limit => 10, :select => "SQL_CALC_FOUND_ROWS *")
Post.connection.execute("select found_rows()").fetch_hash
=> {"found_rows()"=>"2447"}
This will probably not work for joins or anything complex, but it works for the simple case.
Robot.count actually is the solution you want.
Reading one of the comments above, it looks like you may have a misunderstanding of how .count works. It returns a count of all the rows in the table only if there are no parameters.
But if you pass in the same conditions that you pass to all/find, e.g.:
Robot.count(:conditions => {:a => 'b'})
.count() will return the number of rows that match the given conditions.
To be explicit: you can even save the conditions hash in a variable and pass it to both calls to reduce duplication, like so:
conds = {:a => 'b'}
@robots = Robot.all(:conditions => conds, :limit => 50)
@num_robots = Robot.count(:conditions => conds)
That being said - you can't do an after-the-fact count on the result-set (like in your example). ie you can't just run your query then ask it how many rows would have been found. You do actually have to call .count on purpose.
search = Robot.all(:conditions => ["a = ?", "b"])
@robots = search[0..49]
@count = search.count
That should get you what you want: it fetches all the matching Robots for counting and then sets @robots to the first 50. It might be a bit expensive on the resource front if the Robots table is huge.
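The cost of this approach is easier to see with a stand-in for the table. The sketch below uses a plain array of hashes instead of ActiveRecord (the ROBOTS constant and its contents are made up for illustration): every matching row is materialised before the first page is sliced off.

```ruby
# Stand-in for the robots table: 120 plain hashes, 80 of which have a == "b".
ROBOTS = (1..120).map { |id| { :id => id, :a => (id % 3).zero? ? "c" : "b" } }

# Equivalent of loading every matching record into memory...
matching = ROBOTS.select { |robot| robot[:a] == "b" }

# ...and only then taking the first page and the total.
first_page = matching[0..49]
total      = matching.size

puts "showing #{first_page.size} of #{total}"
```

With a few hundred rows this is harmless; with millions, all of them would be instantiated just to display 50.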
You can of course do:
@count = Robot.all(:conditions => ["a = ?", "b"]).count
@robots = Robot.all(:conditions => ["a = ?", "b"], :limit => 50, :offset => 0)
but that will hit the database twice on each request (although Rails does have query caching).
Both solutions only use Active Record, so they are database independent.
What do you need the total returned by the query for? If it's for pagination, look into will_paginate (there is a Railscast on it), which can be extended with AJAX etc.
Try find_by_sql; that may help.
Is @robots.size what you're looking for? Or Robot.count?
Otherwise, please clarify.
I think hakunin is right.
You can get the number of rows returned by the query just by checking the size of the resulting array.
@robots = Robot.find_by_sql("Your sql")
or
@robots = Robot.find(:all, :conditions => ["your conditions"])
@robots.size or @robots.count
I have a table with MANY rows, I need just the IDs of certain rows.
The slow way is to call
SomeARClass.find(:all, :conditions => {:foo => true}, :select => :id)
This returns AR Objects...
Is there a way to call a select on a class and have it return a plain old ruby data structure. Something like this:
SomeARClass.select(:id, :conditions => {:foo => true})
-> [1,2,3]
ActiveRecord::Base.connection.select_all(sql)
or
SomeARClass.connection.select_all(sql)
This is what you want to use. It returns an array of hashes. It should be used sparingly, though: hand-coded SQL is what ActiveRecord was built to replace. I only use it in really performance-critical areas where constructing and returning AR objects is too slow.
The pluck method is exactly what you want, and it avoids the bad practice of hand-written SQL:
SomeARClass.where(:foo => true).pluck(:id)
I believe this should be the selected answer!
I don't think there is anything like SomeARClass.select(:id, :conditions => {:foo => true})
but you have two options
SomeARClass.find(:all, :conditions => {:foo => true}, :select => :id).map(&:id)
#=> [1,2,3,4]
id_hash = ActiveRecord::Base.connection.select_all('select id from tablename')
#=> [{"id"=>"1"}, {"id"=>"2"}, {"id"=>"3"}, {"id"=>"4"}]
id_hash.map(&:values).flatten
#=> ["1", "2", "3", "4"]
The second option returns only an array of hashes rather than ActiveRecord objects, but it does look a bit hackish.
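Since the raw rows come back as strings, one more step is needed to get the plain integer array the question asked for. A minimal sketch, using a hard-coded rows array as a stand-in for the select_all result shown above:

```ruby
# Stand-in for the array of hashes returned by select_all.
rows = [{ "id" => "1" }, { "id" => "2" }, { "id" => "3" }, { "id" => "4" }]

# Integer() raises on malformed input instead of silently returning 0,
# which String#to_i would do.
ids = rows.map { |row| Integer(row["id"]) }

p ids
```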
Short answer:
Use .ids to fetch only ids
Use pluck to fetch ONLY some columns, but values will be PARSED by Active Record
Use ActiveRecord::Base.connection.select_all to fetch UNPARSED values.
Notes:
There is a small difference between pluck and select_all:
If you pluck data, ActiveRecord maps it to Ruby objects such as Dates, Enums, Models, etc., and the result may differ from what you expect.
You can notice that the result below is a string:
2.5.1 :001 > ActiveRecord::Base.connection.select_all("select created_at from some_ar_classes where id = 1 limit 1").rows.first
(0.6ms) select created_at from some_ar_classes where id = 1 limit 1
=> ["2018-10-01 01:12:31.758161"]
While pluck returns time objects:
2.5.1 :002 > SomeARClass.where(id: 1).pluck(:created_at).first.class
(0.4ms) SELECT "some_ar_classes"."created_at" FROM "some_ar_classes" WHERE "some_ar_classes"."id" = $1 [["id", 1]]
=> ActiveSupport::TimeWithZone
The same happens with Enums (integers in the database, but :symbols in Ruby).
All of this parsing/mapping also takes time. It's not a lot, but still. So if you are asking for the fastest way to fetch data, it is definitely a raw SQL query with connection.select_all, but there are very few situations where that gives a significant performance increase.
So my recommendation is to use pluck.
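If you do go the raw route, the casting that pluck performs becomes your job. A small sketch of converting the unparsed timestamp string from the example above, using only the Ruby standard library and assuming the column is stored in UTC (that is an assumption, not something the raw value tells you):

```ruby
require "time"

# The unparsed value as returned by connection.select_all.
raw = "2018-10-01 01:12:31.758161"

# Assumption: the database stores timestamps in UTC, so say so explicitly.
parsed = Time.parse("#{raw} UTC")

puts parsed.class
puts parsed.year
puts parsed.usec
```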