Is it possible to implement SUBSTRING_INDEX logic using Ruby Sequel to create a column alias? - mysql

I have a client who has a database of images/media that uses a file naming convention that contains a page number for each image in the filename itself.
The images are scans of books, and page 1 is often simply the cover image; the actual "page 1" of the book is scanned as something like scan number 3. With that in mind, the filenames would look like this in the database field filename:
great_book_001.jpg
great_book_002.jpg
great_book_003_0001.jpg
great_book_004_0002.jpg
great_book_005_0003.jpg
I would like to extract that page number from the filename using MySQL's SUBSTRING_INDEX. Using pure MySQL, it took me about 5 minutes to come up with this raw query, which works great:
SELECT `id`, `filename`, SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1) as `page`
FROM `media_files`
WHERE CHAR_LENGTH(SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1)) = 4
ORDER BY `page` ASC
;
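Against the sample filenames above, only the rows whose extracted suffix is four characters long survive the WHERE clause, so the result would look something like this (the id values are illustrative):
id | filename                | page
 3 | great_book_003_0001.jpg | 0001
 4 | great_book_004_0002.jpg | 0002
 5 | great_book_005_0003.jpg | 0003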
The issue is that I'm trying to work out whether it's possible to implement that column aliasing with SUBSTRING_INDEX while using the Sequel gem for Ruby.
So far I don’t seem to be able to do this with the initial creation of a dataset like this:
# Fetch a dataset of media files.
one_to_many :media_files, :class => MediaFiles,
  :key => :id, :order => :rank
Since the returned dataset is an array, what I'm doing is using Ruby's map method to roll through the fetched dataset, doing some string processing, and then inserting a page value into each row using Ruby's merge:
# Roll through the dataset & set a page value for files that match the page pattern.
def media_files_final
  media_files.map { |m|
    split_value = m[:filename].split(/_/, -1).last.split(/ *\. */, 2).first
    if split_value != nil && split_value.length == 4
      m.values.merge({ :page => split_value })
    else
      m.values.merge({ :page => nil })
    end
  }
end
That works fine. But this seems clumsy to me when compared to a simple MySQL query which can do it all in one fell swoop. So the question is, is there any way I can achieve the same results using the Sequel Gem for Ruby?
I gather that perhaps SUBSTRING_INDEX is not easily supported within the Sequel framework. But if not, is there any chance I can insert raw MySQL instead of using Sequel methods to achieve this goal?

If you want your association to use that additional selected column and that filter, just use the :select and :conditions options:
substring_index = Sequel.expr{SUBSTRING_INDEX(SUBSTRING_INDEX(:filename, '.', 1), '_', -1)}
one_to_many :media_files, :class => MediaFiles,
  :key => :id, :order => :page,
  :select => [:id, :filename, substring_index.as(:page)],
  :conditions => {Sequel.function(:CHAR_LENGTH, substring_index) => 4}
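If you would rather embed the raw MySQL fragment itself, as the question's fallback suggests, Sequel also accepts literal SQL strings via Sequel.lit. A minimal sketch against a plain dataset (DB and the table name are assumptions):
page_expr = Sequel.lit("SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1)")

DB[:media_files].
  select(:id, :filename, page_expr.as(:page)).
  where(Sequel.function(:CHAR_LENGTH, page_expr) => 4).
  order(:page)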

Related

missing quotes in thinking sphinx scoped search

I want a sphinx_scope that will only search for records that are current. Each database record has a field, status, whose value is either CURRENT or ARCHIVED.
I have achieved this, but I have to use an odd construct to get there; there is probably a much better way to do it.
Here's what I have:
indices/letter_index.rb
ThinkingSphinx::Index.define :letter, :with => :real_time do
  # fields
  indexes title, :sortable => true
  indexes content

  # attributes
  has status, :type => :string
  has issue_date, :type => :timestamp
  has created_at, :type => :timestamp
  has updated_at, :type => :timestamp
end
models/letter.rb
class Letter < ActiveRecord::Base
  include ThinkingSphinx::Scopes
  after_save ThinkingSphinx::RealTime.callback_for(:letter)

  # .. snip ..

  sphinx_scope(:archived) {
    {:with => {:status => "'ARCHIVED'"}}
  }
end
The problem that I ran into was that if I used :with => {:status => 'ARCHIVED'}, my query came out as
SELECT * FROM `letter_core` WHERE MATCH('search term') AND `status` = ARCHIVED AND `sphinx_deleted` = 0 LIMIT 0, 20
ThinkingSphinx::SyntaxError: sphinxql: syntax error, unexpected IDENT, expecting CONST_INT (or 4 other tokens) near 'ARCHIVED AND `sphinx_deleted` = 0 LIMIT 0, 20; SHOW META'
but, if I construct it as :with => {:status => "'ARCHIVED'"}, it then adds the single quotes and the query succeeds. :)
Is this the proper way to write the scope, or is there a better way?
Bonus question: where do I find the docs for what is allowed in the scopes, such as :order, :with, :conditions, etc.
Firstly: the need for quotes was a bug - Sphinx has only relatively recently allowed filtering on string attributes, which is why this wasn't in place a good while ago. I've patched Riddle (the gem that is a pure Ruby wrapper around Sphinx functionality, used by Thinking Sphinx); you can give the latest a spin with this in your Gemfile:
gem 'riddle', '~> 2.0',
  :git    => 'git://github.com/pat/riddle.git',
  :branch => 'develop',
  :ref    => '8aec79fdf4'
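With that patch in place, the scope should no longer need the manually embedded quotes; an untested sketch:
sphinx_scope(:archived) {
  {:with => {:status => 'ARCHIVED'}}
}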
As for what can go in a Sphinx scope: anything that can go in a normal search call.

How to obtain result as array of Hashes in Ruby (mysql2 gem)

I'm using Ruby's mysql2 gem found here:
https://github.com/brianmario/mysql2
I have the following code:
client = Mysql2::Client.new(
  :host     => dbhost,
  :port     => dbport,
  :database => dbname,
  :username => dbuser,
  :password => dbpass
)
sql = "SELECT column1, column2, column3 FROM table WHERE id=#{id}"
res = client.query(sql, :as => :array)
p res # prints #<Mysql2::Result:0x007fa8e514b7d0>
Is it possible for the above .query call to return an array of hashes, with each hash in the res array in the format column => value? I can do this manually, but from the docs I was left with the impression that I can get the results directly loaded into memory in the mentioned format. I need this because I then have to encode the result in JSON anyway, so there is no advantage for me in fetching the rows one by one. Also, the amount of data is always very small.
Change
res = client.query(sql, :as => :array)
to:
res = client.query(sql, :as => :hash)
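Each row then comes back as a hash keyed by column name, so iterating over the result looks like this (column names taken from the query above):
res.each do |row|
  # row is e.g. {"column1" => ..., "column2" => ..., "column3" => ...}
  puts row["column1"]
end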
As @Tadman says, :as => :hash is the default, so actually you don't have to specify anything.
You can always fetch the results as JSON directly:
res = client.query(sql, :as => :json)
The default format, as far as I know, is an array of hashes. If you want symbol keys you need to ask for those. A lot of this is documented in the gem itself.
You should also be extremely cautious about inserting things into your query with string substitution. Whenever possible, use placeholders. These aren't supported by the mysql2 driver directly, so you should use an adapter layer like ActiveRecord or Sequel.
The source code for mysql2 implements Mysql2::Result to simply include Enumerable, so the obvious way to access the data is to use any method implemented in Enumerable (see the Enumerable docs).
For example, #each, #each_with_index, #collect and #to_a are all useful ways to access the Result's elements.
puts res.collect{ |row| "Then the next result was #{row}" }.join("\t")
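Since the question's end goal is JSON, one hedged way to finish the job with just the standard library (relying only on Enumerable#to_a and the json gem that ships with Ruby):
require 'json'

rows = res.to_a     # array of row hashes, courtesy of Enumerable
json = rows.to_json # encode the whole result set in one call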

FasterCSV tutorial to import data to database?

Is anyone aware of any tutorials that demonstrate how to import data in a Ruby app with FasterCSV and saving it to a SQLite or MySQL database?
Here are the specific steps involved:
Reading a file line by line (the .foreach method does this according to documentation)
Mapping header names in file to database column names
Creating entries in database for CSV data (seems doable with .new and .save within a .foreach block)
This is a basic usage scenario but I haven't been able to find any tutorials for it, so any resources would be helpful.
Thanks!
It looks like FasterCSV is part of the Ruby core as of Ruby 1.9, so this is what I ended up doing to achieve the goals in my question above:
@importedfile = Import.find(params[:id])
filename = @importedfile.csv.path

CSV.foreach(filename, {:headers => true}) do |row|
  @post = Post.find_or_create_by_email(
    :content  => row[0],
    :name     => row[1],
    :blog_url => row[2],
    :email    => row[3]
  )
end
flash[:notice] = "New posts were successfully processed."
redirect_to posts_path
Inside the find_or_create_by_email function is the mapping from the database columns to the columns of the CSV file: row[0], row[1], row[2], row[3].
Since it is a find_or_create function, I don't need to explicitly call @post.save to save the entry to the database.
If there's a better way please update or add your own answer.
First, start with other Stack Overflow answers: Best way to read CSV in Ruby. FasterCSV?
Before jumping into writing the code, I check whether there is an existing tool to do the import. You might want to look at mysqlimport.
This is a simple example showing how to map the CSV headers to a database's columns:
require "csv"
data = <<EOT
header1, header2, header 3
1, 2, 3
2, 2, 3
3, 2, 3
EOT
header_to_table_columns = {
'header1' => 'col1',
'header2' => 'col2',
'header 3' => 'col3'
}
arr_of_arrs = CSV.parse(data)
headers = arr_of_arrs.shift.map{ |i| i.strip }
db_cols = header_to_table_columns.values_at(*headers)
arr_of_arrs.each do |ary|
# insert into the database using an ORM or by creating insert statements
end
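To make that loop body concrete, here is one hedged way to finish it with ActiveRecord (SomeModel is a hypothetical model whose columns are col1, col2 and col3):
arr_of_arrs.each do |ary|
  # build e.g. {'col1' => '1', 'col2' => '2', 'col3' => '3'} and insert it
  attrs = Hash[db_cols.zip(ary.map(&:strip))]
  SomeModel.create(attrs) # SomeModel is a placeholder
end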
Ruby is great for rolling your own import routines.
Reading a file (handy block structure to ensure that the file handle is closed properly):
File.open(filepath) do |f|
  f.each_line do |line|
    # do something with the line...
  end
end
Mapping header names to columns (you might want to check for matching array lengths):
Hash[header_array.zip(line_array)]
Creating entries in the database using ActiveRecord:
SomeModel.create(Hash[header_array.zip(line_array)])
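Putting those pieces together into a single pass over a file (a sketch only; filepath and SomeModel are placeholders, and CSV.parse would be safer than split for quoted fields):
header_array = nil
File.open(filepath) do |f|
  f.each_line do |line|
    line_array = line.strip.split(',').map(&:strip)
    if header_array.nil?
      header_array = line_array # the first line holds the headers
    else
      SomeModel.create(Hash[header_array.zip(line_array)])
    end
  end
end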
It sounds like you are planning to let users upload CSV files and import them into the database. This is asking for trouble unless they are savvy about data. You might want to look into a NoSQL solution to simplify things on the import front.
This seems to be the shortest way, if you can use the ID to identify the records and if no mapping of column names is necessary:
CSV.foreach(filename, {:headers => true}) do |row|
  post = Post.find_or_create_by_id row["id"]
  post.update_attributes row.to_hash
end

ActiveRecord select attributes without AR objects

I have a table with MANY rows; I need just the IDs of certain rows.
The slow way is to call
SomeARClass.find(:all, :conditions => {:foo => true}, :select => :id)
This returns AR Objects...
Is there a way to call a select on a class and have it return a plain old ruby data structure. Something like this:
SomeARClass.select(:id, :conditions => {:foo => true})
-> [1,2,3]
ActiveRecord::Base.connection.select_all(sql)
or
SomeARClass.connection.select_all(sql)
This is what you want to use. It returns an array of hashes. It should be used sparingly though; hand-coded SQL is what ActiveRecord was built to replace. I only use it in really performance-critical areas where constructing and returning AR objects is too slow.
The pluck method is exactly what you want, without resorting to hand-written SQL:
SomeARClass.where(:foo => true).pluck(:id)
I believe this should be the selected answer!
I don't think there is anything like SomeARClass.select(:id, :conditions => {:foo => true})
but you have two options
SomeARClass.find(:all, :conditions => {:foo => true}, :select => :id).map(&:id)
#=> [1,2,3,4]
id_hash = ActiveRecord::Base.connection.select_all('select id from tablename')
#=> [{"id"=>"1"}, {"id"=>"2"}, {"id"=>"3"}, {"id"=>"4"}]
id_hash.map(&:values).flatten
#=> ["1", "2", "3", "4"]
The second option returns only hashes and not ActiveRecord objects, but it does look a bit hackish.
Short answer:
Use .ids to fetch only ids
Use pluck to fetch ONLY some columns, but values will be PARSED by Active Record
Use ActiveRecord::Base.connection.select_all to fetch UNPARSED values.
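The .ids shortcut from the first point, for example, reads like this (illustrative):
SomeARClass.where(:foo => true).ids
#=> [1, 2, 3]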
Notes:
There is a small difference between pluck and select_all:
If you pluck data, ActiveRecord maps it to Ruby objects, like Dates, Enums,
Models, etc., and the result could be different from what you expect.
You can notice that the result below is a string:
2.5.1 :001 > ActiveRecord::Base.connection.select_all("select created_at from some_ar_classes where id = 1 limit 1").rows.first
(0.6ms) select created_at from some_ar_classes where id = 1 limit 1
=> ["2018-10-01 01:12:31.758161"]
While pluck returns Dates:
2.5.1 :002 > SomeARClass.where(id: 1).pluck(:created_at).first.class
(0.4ms) SELECT "some_ar_classes"."created_at" FROM "some_ar_classes" WHERE "some_ar_classes"."id" = $1 [["id", 1]]
=> ActiveSupport::TimeWithZone
The same happens with Enums (int in the database, but :symbols in Ruby).
All these parsing/mapping operations also take time. It's not a lot, but still. So if you're asking for the fastest way to fetch data, it's definitely a raw SQL query with connection.select_all, but there are very few situations where you'll get a significant performance increase from it.
So my recommendation is to use pluck.

Is it possible to use Sphinx search with dynamic conditions?

In my web app I need to perform 3 types of searching on items table with the following conditions:
items.is_public = 1 (use title field for indexing) - a lot of results can be retrieved (cardinality is much higher than in the other cases)
items.category_id = {X} (use title + private_notes fields for indexing) - usually less than 100 results
items.user_id = {X} (use title + private_notes fields for indexing) - usually less than 100 results
I can't find a way to make Sphinx work in all these cases, but it works well in 1st case.
Should I use Sphinx just for the 1st case and use plain old "slow" FULLTEXT searching in MySQL (at least because of the lower cardinality in cases 2 and 3)?
Or is it just me and Sphinx can do pretty much everything?
Without full knowledge of your models I might be missing something, but how's this:
class Item < ActiveRecord::Base
  define_index do
    indexes :title
    indexes :private_notes

    has :is_public, :type => :boolean
    has :category_id
    has :user_id
  end
end
1)
Item.search(:conditions => {:title => "blah"}, :with => {:is_public => true})
2)
Item.search("blah", :with => {:category_id => 1})
3)
Item.search("blah", :with => {:user_id => 196})