Rails ordering of collections by proc - mysql

I want to sort a collection using a custom proc. I know Rails has the order method, but I don't believe this works with procs, so I'm just using sort_by instead. Can someone go into detail about the speed I'm sacrificing, or suggest alternatives? My understanding is that the exact implementation of order will depend on the adapter (which, in my case, is mysql), but I'm wondering if there are ways to take advantage of this to speed the sort up.
As an example, I want to do this:
Model.order { |m| m.get_priority }
but am forced to do this
Model.all.sort_by { |m| m.get_priority }

sort_by is implemented at the Ruby level; it's part of Ruby, not ActiveRecord. The sorting will therefore be executed by the Ruby interpreter, not by the database.
This is not an optimal solution, as a DBMS is generally more efficient at sorting data because it can use existing indexes.
If get_priority performs some sort of computation outside the database, then you don't have many alternatives to the code you posted here, unless you cache the result of get_priority as a column in the Model table and sort against it using ActiveRecord's order, which results in an ORDER BY SQL statement.
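If you do go the caching route, a minimal sketch of that approach might look like this (the priority column and the before_save callback are assumptions for illustration, not something from the original post):
class AddPriorityToModels < ActiveRecord::Migration
  def change
    add_column :models, :priority, :integer
    add_index  :models, :priority  # lets MySQL sort using the index
  end
end

class Model < ActiveRecord::Base
  # Keep the cached column in sync whenever the record is saved.
  before_save { self.priority = get_priority }
end

# The sort now happens in MySQL via ORDER BY instead of in Ruby:
Model.order(:priority)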

Related

Complex filtering in rails app. Not sure complex sql is the answer?

I have an application that allows users to filter applicants based on a very large set of criteria. The criteria are each represented by boolean columns spanning multiple tables in the database. Instead of using ActiveRecord models, I thought it was best to use pure SQL and put the bulk of the work in the database. In order to do this, I have to construct a rather complex SQL query based on the criteria the users selected and then run it through AR on the db. Is there a better way to do this? I want to maximize performance while keeping the code maintainable and non-brittle. Any help would be greatly appreciated.
As #hazzit said, it is difficult to answer without much details, but here's my two cents on this. Raw SQL is usually needed to perform complex operations like aggregates, calculations, etc. However, when it comes to search / filtering features, I often find using raw SQL overkill and not quite maintainable.
The key question here is: can you break down your problem into multiple independent filters?
If the answer is yes, then you should leverage the power of ActiveRecord and Arel. I often find myself implementing something like this in my model:
scope :a_scope, -> { where(something: true) }
scope :another_scope, ->(option) { where(an_option: option) }
scope :using_arel, -> { joins(:assoc).where(Assoc.arel_table[:some_field].not_eq("foo")) }
# cue a bunch of scopes

def self.search(options = {})
  relation = all
  relation = relation.a_scope if options[:an_option]
  relation = relation.another_scope(options[:another_option]) unless options[:flag]
  # add logic as you need it
  relation
end
The beauty of this solution is that you declare a clean interface into which you can directly pour all the params from your checkboxes and fields, and that returns a relation. Breaking the query into multiple, reusable scopes helps keep things readable and maintainable; using a search class method ties it all together and allows thorough documentation... And all in all, using Arel helps secure the app against injections.
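For example, a controller action might pour the params straight in (the option keys here are just the ones handled in the hypothetical search method above):
def index
  @results = Model.search(
    an_option:      params[:an_option].present?,
    another_option: params[:another_option],
    flag:           params[:flag].present?
  )
end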
As a side note, this does not prevent you from using raw SQL, as long as the query can be isolated inside a scope.
If this method is not suitable to your needs, there's another option: use a full-fledged search / filtering solution like Sunspot. This uses a separate store, apart from your db, that indexes selected parts of your data for easy and performant search.
It is hard to answer this question fully without knowing more details, but I'll try anyway.
While databases are bad at quite a few things, they are very good at filtering data, especially at high volumes.
If you do the filtering in Ruby on Rails (or just about any other programming language), the system has to retrieve all of the unfiltered data from the database, which causes tons of disk I/O and network (or interprocess) traffic. It then has to go through all those unfiltered results in memory, which may be quite a burden on RAM and CPU.
If you do the filtering in the database, there is a pretty good chance that most of the records will never be retrieved from disk, handed over to RoR, or filtered at all. The main reason indexes exist is to avoid expensive operations and speed things up. (Yes, they also help maintain data integrity.)
To make this work, however, you may need to help the database do its job efficiently. You will have to create indexes matching your filtering criteria, and you may have to look into performance issues with certain types of queries (how to avoid temporary tables and such). However, it is definitely worth it.
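As a rough sketch, a migration adding indexes for a few of the boolean criteria might look like this (the applicants table and column names are placeholders, not from the question):
class AddFilterIndexes < ActiveRecord::Migration
  def change
    # Single-column indexes for criteria filtered on their own.
    add_index :applicants, :has_degree
    # A composite index can serve several criteria that are usually combined.
    add_index :applicants, [:has_degree, :is_certified, :is_remote]
  end
end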
That said, there actually are a few types of queries that a given database is not good at. They are few and far between, but they do exist. In those cases, an implementation in RoR might be the better way to go. Even without knowing more about your scenario, I'd say it's a pretty safe bet that your queries are not among them.

Activerecord attribute string reversal and then order

Folks,
I have a situation where I have a string attribute (name) in my model which has data in this format:
dan01
joel02
ken01
raol01
What I want to do is query and order by name, but I want the ORDER BY to use the string in reverse order; that is, dan01 should be treated as 10nad. I'm wondering what the best way to do this is in Rails.
This is dependent on the database you're using, as you'll want to do this at the database layer for performance. Actually, I'd argue you probably want to rethink what you're actually trying to accomplish here, as I doubt this is really the answer you're looking for. -Jedi Mind Meld-
Assuming you're using MySQL: If your model is Person, your query would be Person.order('REVERSE(name)')
You could do it in Ruby, à la Person.all.sort_by { |p| p.name.reverse }, but I really wouldn't recommend that.
Without knowing the full context of what you're doing, I'll hazard a guess that you want to be storing a second integer column that's your sort order or something along those lines.
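One sketch of that idea, using a string sort key kept in sync with the reversed name (the sort_key column is hypothetical):
class AddSortKeyToPeople < ActiveRecord::Migration
  def change
    add_column :people, :sort_key, :string
    add_index  :people, :sort_key  # ORDER BY REVERSE(name) can't use an index; this can
  end
end

class Person < ActiveRecord::Base
  # Keep the reversed name cached so the database can order by it cheaply.
  before_save { self.sort_key = name.reverse }
end

Person.order(:sort_key)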
I don't have a MySQL DB in front of me, but you should be able to do something like:
Foo.order('REVERSE(name)')
The above works in Postgres and should also work in MySQL according to http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_reverse.

Performance of where statement in SQL as opposed to filter on dataset

Please flog me if I haven't searched thoroughly enough...
I am wondering what would be better for performance:
Collect, aggregate, and sort my data using SQL (WHERE, GROUP BY, ORDER BY statements in the dataset)
or
just collect the 'naked' data and group, sort, and filter in the report (filters on the dataset, parameters, and aggregation in the report).
Would using stored procedures be beneficial to performance?
Greetings,
Henro
Well, SSRS is a tool for displaying results, and it is optimized for that. Though it can perform aggregations, filters, and a lot more, that is not its primary goal, so it is not optimized for it. When you perform the aggregations, filtering, and data manipulation in the dataset query, you are using the database engine, which is optimized for exactly that, so you will most likely get better performance this way. As for stored procedures versus plain SQL, there is no inherent performance benefit to either (I prefer plain SQL only because it gives me more flexibility).
In terms of performance, SQL Server is optimized for that sort of thing.
Under certain circumstances, a stored procedure can significantly increase performance, as it precompiles the query plan... in this case, unless the report is being called quite often, I don't know if you'll notice the difference. I do prefer to keep my SQL out of the report, though.

Rails 3 query on condition of an association's count

In Rails 3 with MySQL, suppose I have two models, Customer and Purchase; obviously Purchase belongs_to Customer. I want to find all the customers with 2 orders or more. I can simply say:
Customer.includes(:purchases).all.select { |c| c.purchases.size >= 2 }
Effectively, though, the line above runs queries on the order of Customer.all and Purchase.all, then does the "select" processing in Ruby. In a large database, I would much prefer to avoid doing all this "select" work in Ruby, and have MySQL do the processing and give me only the list of qualified customers. That is both much faster (since MySQL is better tuned for this) and significantly reduces traffic from the database.
Unfortunately I am unable to conjure up the code with the building blocks in Rails (where, having, group, etc.) to make this happen, something along the lines of (pseudo-code):
Customer.joins(:purchases).where("count(purchases) >= 2").all
I will settle for a straight MySQL solution, though I much prefer to figure this out in the elegant framework of Rails.
No need to install a gem to get this to work (though MetaWhere is cool):
Customer.joins(:purchases).group("customers.id").having("count(purchases.id) >= ?", 2)
The documentation on this stuff is fairly sparse at this point. I'd look into using MetaWhere if you'll be doing any more queries similar to this. With MetaWhere, you can do this (or something similar; not sure if the syntax is exactly correct):
Customer.includes(:purchases).where(:purchases => {:count.gte => 2})
The beauty of this is that MetaWhere still uses ActiveRecord and Arel to perform the query, so it works with the 'new' Rails 3 way of doing queries.
Additionally, you probably don't want to call .all at the end, as this will cause the query to hit the database. Instead, you want to use lazy loading and not hit the db until you actually need the data (in the view, or some other method that processes the actual data).
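A small sketch of that lazy pattern (the variable name is illustrative):
# No query runs here; this is still just a relation.
frequent_buyers = Customer.joins(:purchases)
                          .group("customers.id")
                          .having("count(purchases.id) >= ?", 2)

# The SQL executes only when the data is actually needed:
frequent_buyers.each { |customer| puts customer.id }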
This is a bit more verbose, but if you want customers where the count = 0, or need more flexible SQL, you would use a LEFT JOIN:
Customer.joins('LEFT JOIN purchases on purchases.customer_id = customers.id').group('customers.id').having('count(purchases.id) = 0').length

Is there an efficient string matching algorithm in MySQL?

Is there an implementation of a fast string matching algorithm for searching keywords in MySQL? For example Aho-Corasick or any other fast string matching algorithm.
Typically Aho-Corasick is implemented in Java or another compiled language, but it should be possible to write it as a stored procedure in MySQL.
Thanks!
As stored procedures are Turing-complete, and you can use a cursor to loop through the records in a table (possibly with some existing WHERE clause), you can do it in a stored procedure.
A stored function would also be possible.
However, the MySQL stored-routine language is so terrible both in terms of programmer-usability and performance, that the result is unlikely to be easy or fast.
So you might be better off writing a MySQL UDF (which you can write in any language, provided you can make it look like a C library) and having that do it instead.
Consider your specific requirements. I am assuming that a query with lots of "OR col LIKE ..." conditions strung together is too inefficient for you, as you wish to match thousands of patterns at once, right?