I want a sphinx_scope that only searches for records that are current. Each database record has a status field whose value is either CURRENT or ARCHIVED.
I have achieved this, but only by using an odd construct; there is probably a much better way to do it.
Here's what I have:
indices/letter_index.rb
ThinkingSphinx::Index.define :letter, :with => :real_time do
  # fields
  indexes title, :sortable => true
  indexes content

  # attributes
  has status, :type => :string
  has issue_date, :type => :timestamp
  has created_at, :type => :timestamp
  has updated_at, :type => :timestamp
end
models/letter.rb
class Letter < ActiveRecord::Base
  include ThinkingSphinx::Scopes

  after_save ThinkingSphinx::RealTime.callback_for(:letter)

  # .. snip ..

  sphinx_scope(:archived) {
    {:with => {:status => "'ARCHIVED'"}}
  }
end
The problem that I ran into was that if I used :with => {:status => 'ARCHIVED'}, my query came out as
SELECT * FROM `letter_core` WHERE MATCH('search term') AND `status` = ARCHIVED AND `sphinx_deleted` = 0 LIMIT 0, 20
ThinkingSphinx::SyntaxError: sphinxql: syntax error, unexpected IDENT, expecting CONST_INT (or 4 other tokens) near 'ARCHIVED AND `sphinx_deleted` = 0 LIMIT 0, 20; SHOW META'
but, if I construct it as :with => {:status => "'ARCHIVED'"}, it then adds the single quotes and the query succeeds. :)
Is this the proper way to write the scope, or is there a better way?
Bonus question: where do I find the docs for what is allowed in the scopes, such as :order, :with, :conditions, etc.
Firstly, the need for quotes was a bug. Sphinx has only relatively recently allowed filtering on string attributes, which is why this wasn't handled a good while ago. I've patched Riddle (the pure-Ruby wrapper around Sphinx functionality that Thinking Sphinx uses); you can give the latest a spin with this in your Gemfile:
gem 'riddle', '~> 2.0',
  :git    => 'git://github.com/pat/riddle.git',
  :branch => 'develop',
  :ref    => '8aec79fdf4'
As for what can go in a Sphinx scope: anything that can go in a normal search call.
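To illustrate the idea behind the quoting issue (a minimal sketch only, not Riddle's actual code): string filter values need to be rendered as quoted literals in SphinxQL, while integers stay bare. The helper names sphinxql_value and sphinxql_filter are hypothetical and exist only for this example.

```ruby
# Hypothetical helper (not part of Riddle or Thinking Sphinx):
# render a Ruby filter value as a SphinxQL literal.
def sphinxql_value(value)
  case value
  when String
    # Escape backslashes and single quotes, then wrap in single quotes.
    escaped = value.gsub(/[\\']/) { |char| "\\#{char}" }
    "'#{escaped}'"
  when true  then '1'
  when false then '0'
  else value.to_s
  end
end

# Hypothetical helper: build one equality filter for a WHERE clause.
def sphinxql_filter(attribute, value)
  "`#{attribute}` = #{sphinxql_value(value)}"
end

puts sphinxql_filter(:status, 'ARCHIVED')  # string value gets quoted
puts sphinxql_filter(:sphinx_deleted, 0)   # integer value stays bare
```

With quoting handled at this level, the scope itself can pass the plain string 'ARCHIVED' rather than embedding the quotes by hand.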
I am hitting an OutOfBoundsError, apparently because I have misunderstood the proper configuration syntax (which may also be a by-product of legacy syntax).
The manual suggests a class-level search that takes WillPaginate-style parameters. Because there are many fields to draw from, the search model is defined as
class AziendaSearch < BaseSearch
  set_per_page 10000
  accept :terms
end
The set_per_page value was set that high because if I set it to the target of 100, the will_paginate links do not show up.
The controller may be excessively convoluted in order to include the ordering parameter, which results in a two-step process:
@azienda_search = AziendaSearch.new params
@results = @azienda_search.search
@aziendas = Azienda.order('province_id ASC').where('id IN (?)', @results).paginate :page => params[:page], :per_page => 100
The view paginates on the basis of @aziendas:
<%= will_paginate @aziendas, :previous_label => "precedente ", :next_label => " successiva" %>
My suspicion is that the search model is not properly set, but the syntax is not obvious to me given the manual's indications. page params[:page] certainly does not work...
Update
BaseSearch is a Sphinx method and was in fact inherited from an older version of this application (Rails 2.x...). So this may be hanging around, creating all sorts of syntactic confusion.
In fact, following the manual, I am now quite uncertain how best to make these statements. Should a separate class be defined for AziendaSearch? If not, where should the Azienda.search block be invoked... in the controller, like this?
@azienda_search = Azienda.search(
  :max_matches => 100_000,
  :page        => params[:page],
  :per_page    => 100,
  :order       => "province_id ASC"
)
@results = @azienda_search.search
I'm not sure what BaseSearch is doing with set_per_page (that's certainly not a Thinking Sphinx method), but it's worth noting that Sphinx defaults to a maximum of 1000 records. It is possible to configure Sphinx to return more, though - you need to set max_matches in your config/thinking_sphinx.yml to your preferred limit (per environment):
production:
  max_matches: 100000
And also set the limit on the relevant search requests:
Azienda.search(
  :max_matches => 100_000,
  :page        => params[:page],
  :per_page    => 100
)
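As a rough illustration of why the limit matters for pagination: results past max_matches are not available, so the number of pages that can be requested is capped accordingly, and asking for a later page is presumably what ends up out of bounds. The helper reachable_pages and the figure of 25,000 matching records are assumptions made up for this sketch.

```ruby
# Hypothetical helper: how many pages can be requested before the
# offset runs past either the result count or max_matches.
def reachable_pages(total_found, per_page, max_matches)
  available = [total_found, max_matches].min
  (available + per_page - 1) / per_page # integer ceiling division
end

# Assuming 25,000 matching records and 100 per page:
puts reachable_pages(25_000, 100, 1_000)    # default limit of 1000
puts reachable_pages(25_000, 100, 100_000)  # raised limit
```

With the default limit only the first 10 pages exist; with the raised limit all 250 pages do.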
As for the doubled queries… if you add province_id as an attribute in your index definition, you'll be able to order search queries by it.
# in your Azienda index definition:
has province_id

# And then when searching:
Azienda.search(
  params[:azienda_search][:terms],
  :max_matches => 100_000,
  :page        => params[:page],
  :per_page    => 100,
  :order       => "province_id ASC"
)
I have a client who has a database of images/media that uses a file naming convention that contains a page number for each image in the filename itself.
The images are scans of books and page 1 is often simply the cover image and the actual “page 1” of the book is scanned on something like scan number 3. With that in mind the filenames would look like this in the database field filename:
great_book_001.jpg
great_book_002.jpg
great_book_003_0001.jpg
great_book_004_0002.jpg
great_book_005_0003.jpg
With that in mind, I would like to extract that page number from the filename using MySQL’s SUBSTRING_INDEX. And using pure MySQL it took me about 5 minutes to come up with this raw query which works great:
SELECT `id`, `filename`, SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1) as `page`
FROM `media_files`
WHERE CHAR_LENGTH(SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1)) = 4
ORDER BY `page` ASC
;
The issue is I am trying to understand if it’s possible to implement column aliasing using SUBSTRING_INDEX while using the Sequel Gem for Ruby?
So far I don’t seem to be able to do this with the initial creation of a dataset like this:
# Fetch a dataset of media files.
one_to_many :media_files, :class => MediaFiles,
  :key => :id, :order => :rank
Since the returned dataset is an array, what I am doing is using the Ruby map method to roll through the fetched dataset and do some string processing, before inserting a page value into each record using the Ruby merge method:
# Roll through the dataset & set a page value for files that match the page pattern.
def media_files_final
  media_files.map { |m|
    split_value = m[:filename].split(/_/, -1).last.split(/ *\. */, 2).first
    if split_value != nil && split_value.length == 4
      m.values.merge({ :page => split_value })
    else
      m.values.merge({ :page => nil })
    end
  }
end
That works fine. But this seems clumsy to me when compared to a simple MySQL query which can do it all in one fell swoop. So the question is, is there any way I can achieve the same results using the Sequel Gem for Ruby?
I gather that perhaps SUBSTRING_INDEX is not easily supported within the Sequel framework. But if not, is there any chance I can insert raw MySQL instead of using Sequel methods to achieve this goal?
If you want your association to use that additional selected column and that filter, just use the :select and :conditions options:
substring_index = Sequel.expr{SUBSTRING_INDEX(SUBSTRING_INDEX(:filename, '.', 1), '_', -1)}
one_to_many :media_files, :class => MediaFiles,
  :key => :id, :order => :page,
  :select => [:id, :filename, substring_index.as(:page)],
  :conditions => {Sequel.function(:CHAR_LENGTH, substring_index) => 4}
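As a sanity check on what the nested SUBSTRING_INDEX expression and the CHAR_LENGTH filter select, here is a plain-Ruby mirror of the same logic applied to the sample filenames from the question. The method name page_for is hypothetical; this is only a sketch for checking expectations, not something Sequel provides.

```ruby
# Hypothetical helper mirroring the SQL expression in plain Ruby.
def page_for(filename)
  stem   = filename[/\A[^.]*/]          # SUBSTRING_INDEX(filename, '.', 1)
  suffix = stem.rpartition('_').last    # SUBSTRING_INDEX(stem, '_', -1)
  suffix.length == 4 ? suffix : nil     # CHAR_LENGTH(...) = 4
end

%w[
  great_book_001.jpg
  great_book_002.jpg
  great_book_003_0001.jpg
  great_book_004_0002.jpg
  great_book_005_0003.jpg
].each do |name|
  puts "#{name} => #{page_for(name).inspect}"
end
```

Only the filenames ending in a four-character segment yield a page; the first two scans come back as nil, which corresponds to the rows the :conditions option filters out.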
define_index do
  indexes :first_name, :prefixes => true
  indexes :last_name, :prefixes => true
  indexes :email, :prefixes => true
  set_property :enable_star => 1
  set_property :min_prefix_len => 1
end
In this case, if I want to search only on email, the search still runs against all of the fields that are specified.
E.g.:
email = "*me*"
Contact.search email
This returns matches from first_name, last_name and email,
but it should return matches from email only.
What would be the solution for searching only one of the specified indexes?
Just a quick correction - you want to search on a specific field, not a specific index.
And Thinking Sphinx can do this by using the :conditions option - so give the following a try:
Contact.search :conditions => {:email => '*me*'}
Thinking Sphinx can also automatically add wildcards to both ends of each word you give it as well:
Contact.search :conditions => {:email => 'me'}, :star => true
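Roughly speaking, a field condition ends up as a field-scoped term in Sphinx's extended query syntax (@field term), and starring wraps each word in asterisks. The sketch below only illustrates that idea; field_query is a hypothetical helper, and the word-matching pattern is a simplification rather than Thinking Sphinx's actual implementation.

```ruby
# Hypothetical helper: turn field conditions into an extended-syntax
# query string, optionally wrapping every word in wildcards.
def field_query(conditions, star = false)
  conditions.map { |field, value|
    value = value.gsub(/\w+/) { |word| "*#{word}*" } if star
    "@#{field} #{value}"
  }.join(' ')
end

puts field_query({:email => '*me*'})      # wildcards written by hand
puts field_query({:email => 'me'}, true)  # wildcards added automatically
```

Both calls produce the same field-scoped query, which is why the two searches above behave alike.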
I have a typical forum style app. There is a Topics model which has_many Posts.
What I want to do using Rails 2.3.x is query the topics table and sort by the most recent post in that topic.
@topics = Topic.paginate :page => params[:page], :per_page => 25,
  :include => :posts, :order => 'HELP'
I'm sure this is a simple one but no joy with Google. Thanks.
Sorting on a joined column is probably a bad idea and will take an enormous amount of time to run in many situations. What would be better is to twiddle a special date field on the Topic model when a new Post is created:
class Post < ActiveRecord::Base
  after_create :update_topic_activity_at

  protected

  def update_topic_activity_at
    Topic.update_all({ :activity_at => Time.now }, { :id => self.topic_id })
  end
end
Then you can easily sort on the activity_at column as required.
When adding this column you can always populate the initial activity_at with the highest posting time if you have existing data to migrate.
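The backfill idea can be sketched in memory as follows. Topic and Post here are plain Structs standing in for the real models, and backfill_activity_at is a hypothetical helper; an actual migration would do the equivalent with SQL or ActiveRecord against the database.

```ruby
# Plain-Ruby stand-ins for the models (assumptions for this sketch only).
Post  = Struct.new(:topic_id, :created_at)
Topic = Struct.new(:id, :activity_at)

# Set each topic's activity_at to the time of its most recent post.
def backfill_activity_at(topics, posts)
  latest = posts.group_by(&:topic_id)
                .transform_values { |group| group.map(&:created_at).max }
  topics.each { |topic| topic.activity_at = latest[topic.id] }
end

topics = [Topic.new(1), Topic.new(2)]
posts  = [
  Post.new(1, Time.utc(2010, 1, 1)),
  Post.new(2, Time.utc(2010, 3, 1)),
  Post.new(1, Time.utc(2010, 2, 1))
]

backfill_activity_at(topics, posts)

# Most recently active topics first; topics without posts sort last.
sorted = topics.sort_by { |t| t.activity_at || Time.at(0) }.reverse
puts sorted.map(&:id).inspect
```

Once the column is populated, the listing query only needs to order by activity_at and never has to touch the posts table.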
In my web app I need to perform 3 types of searching on items table with the following conditions:
items.is_public = 1 (use title field for indexing) - a lot of results can be retrieved(cardinality is much higher than in other cases)
items.category_id = {X} (use title + private_notes fields for indexing) - usually less than 100 results
items.user_id = {X} (use title + private_notes fields for indexing) - usually less than 100 results
I can't find a way to make Sphinx work in all these cases, but it works well in the 1st case.
Should I use Sphinx just for the 1st case and use plain old "slow" FULLTEXT searching in MySQL for the others (at least because of the lower cardinality in cases 2 and 3)?
Or is it just me and Sphinx can do pretty much everything?
Without full knowledge of your models I might be missing something, but how's this:
class Item < ActiveRecord::Base
  define_index do
    indexes :title
    indexes :private_notes
    has :is_public, :type => :boolean
    has :category_id
    has :user_id
  end
end
1)
Item.search(:conditions => {:title => "blah"}, :with => {:is_public => true})
2)
Item.search("blah", :with => {:category_id => 1})
3)
Item.search("blah", :with => {:user_id => 196})