ActiveRecord select attributes without AR objects - mysql

I have a table with MANY rows, and I need just the IDs of certain rows.
The slow way is to call
SomeARClass.find(:all, :conditions => {:foo => true}, :select => :id)
This returns AR Objects...
Is there a way to call a select on a class and have it return a plain old Ruby data structure? Something like this:
SomeARClass.select(:id, :conditions => {:foo => true})
-> [1,2,3]

ActiveRecord::Base.connection.select_all(sql)
or
SomeARClass.connection.select_all(sql)
This is what you want to use. It returns an array of hashes. It should be used sparingly, though: hand-coded SQL is what ActiveRecord was built to replace. I only use it in really performance-critical areas where constructing and returning AR objects is too slow.

The pluck method is exactly what you want, without resorting to raw SQL:
SomeARClass.where(:foo => true).pluck(:id)
I believe this should be the selected answer!

I don't think there is anything like SomeARClass.select(:id, :conditions => {:foo => true})
but you have two options
SomeARClass.find(:all, :conditions => {:foo => true}, :select => :id).map(&:id)
#=> [1,2,3,4]
id_hash = ActiveRecord::Base.connection.select_all('select id from tablename')
#=> [{"id"=>"1"}, {"id"=>"2"}, {"id"=>"3"}, {"id"=>"4"}]
id_hash.map(&:values).flatten
#=> ["1", "2", "3", "4"]
The second option returns only plain hashes rather than ActiveRecord objects, but it does look a bit hackish.
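Note that with the MySQL adapter the values in those hashes come back as strings, so you typically need to cast them yourself. A minimal sketch, using literal hashes as a stand-in for a real select_all result set:

```ruby
# Simulated select_all result: an array of hashes with string values,
# as the MySQL adapter returns them.
rows = [{"id" => "1"}, {"id" => "2"}, {"id" => "3"}, {"id" => "4"}]

# Extract the values and cast them to integers.
ids = rows.map { |row| row["id"].to_i }
# => [1, 2, 3, 4]
```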

Short answer:
Use .ids to fetch only ids
Use pluck to fetch ONLY some columns, but values will be PARSED by Active Record
Use ActiveRecord::Base.connection.select_all to fetch UNPARSED values.
Notes:
There is a small difference between pluck and select_all:
If you pluck data, ActiveRecord maps it to Ruby objects, like Dates, Enums,
Models, etc., and the result could be different than you expect.
You can notice that the result below is a string:
2.5.1 :001 > ActiveRecord::Base.connection.select_all("select created_at from some_ar_classes where id = 1 limit 1").rows.first
(0.6ms) select created_at from some_ar_classes where id = 1 limit 1
=> ["2018-10-01 01:12:31.758161"]
While pluck returns dates:
2.5.1 :002 > SomeARClass.where(id: 1).pluck(:created_at).first.class
(0.4ms) SELECT "some_ar_classes"."created_at" FROM "some_ar_classes" WHERE "some_ar_classes"."id" = $1 [["id", 1]]
=> ActiveSupport::TimeWithZone
The same happens with enums (an int in the database, but a :symbol in Ruby).
And all these parsing/mapping operations also take time. It's not a lot, but still. So if you're asking for the fastest way to fetch data, it's definitely a raw SQL query with connection.select_all, but there are very few situations where you'll get a significant performance increase from it.
So my recommendation is using pluck.
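For a feel of what that casting does, here is a rough plain-Ruby analogy (no ActiveRecord involved): select_all hands you the raw string, while pluck's type casting gives you a real time object, much as Time.parse would. (pluck actually returns ActiveSupport::TimeWithZone, as shown above; Time here is only an illustration.)

```ruby
require 'time'

# The raw value, as select_all would return it (a string):
raw = "2018-10-01 01:12:31.758161"

# Roughly what type casting buys you: a real time object
# you can do arithmetic and comparisons on.
parsed = Time.parse(raw)

parsed.class  # => Time
parsed.year   # => 2018
```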


Multiple Fields with a GroupBy Statement in Laravel

Already received a great answer at this post
Laravel Query using GroupBy with distinct traits
But how can I modify it to include more than just one field? The example uses pluck, which can only grab one field.
I have tried to do something like this to add multiple fields to the view as such...
$hats = $hatData->groupBy('style')
    ->map(function ($item) {
        return ['colors' => $item->color, 'price' => $item->price, 'itemNumber' => $item->itemNumber];
    });
In my initial query for "hatData" I can see the fields are all there, yet I get an error saying that 'colors' (etc.) is not available on this collection instance. I can see the collection looks different than what is obtained from pluck, so it seems that when I need more fields and can't use pluck, I have to structure the map differently, but I can't see how. Can anyone explain how I can request multiple fields and output them on the view, rather than just one field as in the original question? Thanks!
When you use groupBy() of Laravel Illuminate\Support\Collection it gives you a deeper nested arrays/objects, so that you need to do more than one map on the result in order to unveil the real models (or arrays).
I will demo this with an example of a nested collection:
$collect = collect([
    collect([
        'name' => 'abc',
        'age' => 1
    ]),
    collect([
        'name' => 'cde',
        'age' => 5
    ]),
    collect([
        'name' => 'abcde',
        'age' => 2
    ]),
    collect([
        'name' => 'cde',
        'age' => 7
    ]),
]);
$group = $collect->groupBy('name')->values();
$result = $group->map(function ($items, $key) {
    // Here we have uncovered the first level of the group;
    // $key is the group name, which is the key to each group.
    return $items->map(function ($item) {
        // This second level opens EACH group (or array), in my case:
        return $item['age'];
    });
});
The summary is that you need another map() (or each()) loop over the main grouped collection to reach the underlying models (or arrays).
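For comparison, Ruby's Enumerable#group_by produces the same kind of one-level-deeper nesting, which is why a second map is needed. A small illustration using the example data from above (plain Ruby hashes in place of Laravel collections):

```ruby
people = [
  { 'name' => 'abc',   'age' => 1 },
  { 'name' => 'cde',   'age' => 5 },
  { 'name' => 'abcde', 'age' => 2 },
  { 'name' => 'cde',   'age' => 7 }
]

# group_by returns name => array-of-hashes, one level deeper than the
# original list, so each group needs its own inner map to reach the values.
ages_by_name = people.group_by { |p| p['name'] }
                     .map { |name, items| [name, items.map { |i| i['age'] }] }
                     .to_h
# => {"abc"=>[1], "cde"=>[5, 7], "abcde"=>[2]}
```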

Is it possible to implement SUBSTRING_INDEX logic using Ruby Sequel to create a column alias?

I have a client who has a database of images/media that uses a file naming convention that contains a page number for each image in the filename itself.
The images are scans of books and page 1 is often simply the cover image and the actual “page 1” of the book is scanned on something like scan number 3. With that in mind the filenames would look like this in the database field filename:
great_book_001.jpg
great_book_002.jpg
great_book_003_0001.jpg
great_book_004_0002.jpg
great_book_005_0003.jpg
With that in mind, I would like to extract that page number from the filename using MySQL’s SUBSTRING_INDEX. And using pure MySQL it took me about 5 minutes to come up with this raw query which works great:
SELECT `id`, `filename`, SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1) as `page`
FROM `media_files`
WHERE CHAR_LENGTH(SUBSTRING_INDEX(SUBSTRING_INDEX(`filename`, '.', 1), '_', -1)) = 4
ORDER BY `page` ASC
;
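For reference, the same SUBSTRING_INDEX extraction can be mimicked in plain Ruby — this is only an illustration of what the SQL does, run against the sample filenames above:

```ruby
filenames = [
  'great_book_001.jpg',
  'great_book_002.jpg',
  'great_book_003_0001.jpg',
  'great_book_004_0002.jpg',
  'great_book_005_0003.jpg'
]

# SUBSTRING_INDEX(SUBSTRING_INDEX(filename, '.', 1), '_', -1):
# strip the extension, then take the last underscore-separated segment,
# keeping only the four-character page numbers.
pages = filenames.map { |f| f.split('.').first.split('_').last }
                 .select { |page| page.length == 4 }
# => ["0001", "0002", "0003"]
```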
The issue is that I am trying to understand whether it's possible to implement column aliasing using SUBSTRING_INDEX while using the Sequel gem for Ruby.
So far I don’t seem to be able to do this with the initial creation of a dataset like this:
# Fetch a dataset of media files.
one_to_many :media_files, :class => MediaFiles,
:key => :id, :order => :rank
Since the returned dataset is an array, what I am doing is using the Ruby map method to roll through the fetched dataset, do some string processing, and then merge a page value into each row using Ruby's merge:
# Roll through the dataset & set a page value for files that match the page pattern.
def media_files_final
  media_files.map { |m|
    split_value = m[:filename].split(/_/, -1).last.split(/ *\. */, 2).first
    if split_value != nil && split_value.length == 4
      m.values.merge({ :page => split_value })
    else
      m.values.merge({ :page => nil })
    end
  }
end
That works fine. But this seems clumsy to me when compared to a simple MySQL query which can do it all in one fell swoop. So the question is, is there any way I can achieve the same results using the Sequel Gem for Ruby?
I gather that perhaps SUBSTRING_INDEX is not easily supported within the Sequel framework. But if not, is there any chance I can insert raw MySQL instead of using Sequel methods to achieve this goal?
If you want your association to use that additional selected column and that filter, just use the :select and :conditions options:
substring_index = Sequel.expr{SUBSTRING_INDEX(SUBSTRING_INDEX(:filename, '.', 1), '_', -1)}

one_to_many :media_files, :class => MediaFiles,
  :key => :id, :order => :page,
  :select => [:id, :filename, substring_index.as(:page)],
  :conditions => {Sequel.function(:CHAR_LENGTH, substring_index) => 4}

Filtering an array based on database records

I have an Array if a user's Facebook friend list as follows:
[
  { 'name' => 'John Mallock', 'id' => '123123' },
  { 'name' => 'Susan Freely', 'id' => '123123123' },
  ...
]
I'd like to filter this list for entries that exist in the users table in my Rails app. I'm currently doing this as follows:
graph = Koala::Facebook::API.new 'access_token'
friends = graph.get_connections 'me', 'friends' # Returns the structure above
friends.select! { |f| User.exists? :facebook_id => f['id'] }
This results in a SELECT query for every friend in the list, which is noticeably inefficient.
Is there a more effective means of filtering this list based on database records?
Probably the simplest way to do this is to pass an array into the where method. If you pass in an array, it will be converted into an IN query on the database:
users = User.where(:facebook_id => friends.map{|f| f['id']})
# generates: SELECT * FROM users where users.facebook_id IN (f1, f2, etc..)
If you need to know which entries in friends correspond to users, you could then call a select on friends:
existing_facebook_ids = users.map(&:facebook_id)
friends.select! {|f| existing_facebook_ids.include?(f['id'])}
Note that the above is pretty inefficient if you have a decent number of records in either array. You'd probably want to optimize it somewhat, or better yet, skip the friends array and just iterate over the User records if they contain the same data.
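One way to take the sting out of that select is to build a Set of the existing ids first, since Set#include? is a constant-time lookup while Array#include? scans the whole array. A small sketch with literal sample data standing in for the Facebook response and the User records (the 'No Account' entry is made up for illustration):

```ruby
require 'set'

# Stand-in for the friends structure returned by the Graph API.
friends = [
  { 'name' => 'John Mallock', 'id' => '123123' },
  { 'name' => 'Susan Freely', 'id' => '123123123' },
  { 'name' => 'No Account',   'id' => '999' }
]

# Stand-in for the facebook_ids fetched from the users table.
existing_facebook_ids = Set.new(['123123', '123123123'])

# O(1) membership checks, so this stays fast even for large friend lists.
matched = friends.select { |f| existing_facebook_ids.include?(f['id']) }
matched.map { |f| f['name'] }
# => ["John Mallock", "Susan Freely"]
```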
You can use a SQL IN query to select all the IDs at once.
friend_ids = friends.map{|f| f['id']}
User.scoped(:conditions => ['facebook_id IN (?)', friend_ids])

How can I get FOUND_ROW()s from an active record object in rails?

When querying the database with:
#robots = Robot.all(:conditions => {:a => 'b'}, :limit => 50, :offset => 0)
What is the best way to get the total number of rows without the :limit?
In raw MySQL you could do something like this:
SELECT SQL_CALC_FOUND_ROWS * FROM robots WHERE a=b LIMIT 0, 50
SELECT FOUND_ROWS();
Is there an active record way of doing this?
This works for me:
ps = Post.all(:limit => 10, :select => "SQL_CALC_FOUND_ROWS *")
Post.connection.execute("select found_rows()").fetch_hash
=> {"found_rows()"=>"2447"}
This will probably not work for joins or anything complex, but it works for the simple case.
Robot.count actually is the solution you want.
Reading one of the comments above, it looks like you may have a misunderstanding of how .count works. It returns a count of all the rows in the table only if there are no parameters,
but if you pass in the same conditions that you pass to all/find, e.g.:
Robot.count(:conditions => {:a => 'b'})
.count() will return the number of rows that match the given conditions.
Just to be obvious - you can even save the condition-hash as a variable to pass into both - to reduce duplication, so:
conds = {:a => 'b'}
#robots = Robot.all(:conditions => conds, :limit => 50)
#num_robots = Robot.count(:conditions => conds)
That being said, you can't do an after-the-fact count on the result set (like in your example), i.e. you can't just run your query and then ask it how many rows would have been found. You do actually have to call .count on purpose.
search = Robot.all(:conditions => ["a=b"], :offset => 0)
#robots = search[0..49]
#count = search.count
That should get what you want, gets all the Robots for counting and then sets #robots to the first 50. Might be a bit expensive on the resource front if the Robots table is huge.
You can of course do:
#count = Robot.all(:conditions => ["a=b"], :offset => 0).count
#robots = Robot.all(:conditions => ["a=b"], :limit => 50, :offset => 0)
but that will hit the database twice on each request (although rails does have query caching).
Both solutions only use active record so are database independent.
What do you need the total returned by the query for? If it's pagination, look into will_paginate (Railscast), which can be extended with AJAX etc.
Try find_by_sql; maybe that will help.
Is #robots.size what you're looking for? Or Robot.count?
Otherwise, please clarify.
I think hakunin is right.
You can get the number of rows returned by a query by just checking the size of the resulting array:
#robots = Robot.find_by_sql("Your sql")
or
#robots = Robot.find(:all, :conditions => ["your conditions"])
#robots.size or #robots.count

Is it possible to use Sphinx search with dynamic conditions?

In my web app I need to perform 3 types of searching on items table with the following conditions:
items.is_public = 1 (use title field for indexing) - a lot of results can be retrieved(cardinality is much higher than in other cases)
items.category_id = {X} (use title + private_notes fields for indexing) - usually less than 100 results
items.user_id = {X} (use title + private_notes fields for indexing) - usually less than 100 results
I can't find a way to make Sphinx work in all these cases, but it works well in 1st case.
Should I use Sphinx just for the 1st case and use plain old "slow" FULLTEXT searching in MySQL for the others (at least because of the lower cardinality in cases 2 and 3)?
Or is it just me and Sphinx can do pretty much everything?
Without full knowledge of your models I might be missing something, but how's this:
class Item < ActiveRecord::Base
  define_index do
    indexes :title
    indexes :private_notes

    has :is_public, :type => :boolean
    has :category_id
    has :user_id
  end
end
1)
Item.search(:conditions => {:title => "blah"}, :with => {:is_public => true})
2)
Item.search("blah", :with => {:category_id => 1})
3)
Item.search("blah", :with => {:user_id => 196})