Building queries dynamically in rails - mysql

I'm trying to replicate Crunchbase's search list style using Ruby on Rails.
I have an array of filters that looks something like this:
[
{
"id":"0",
"className":"Company",
"field":"name",
"operator":"starts with",
"val":"a"
},
{
"id":"1",
"className":"Company",
"field":"hq_city",
"operator":"equals",
"val":"Karachi"
},
{
"id":"2",
"className":"Category",
"field":"name",
"operator":"does not include",
"val":"ECommerce"
}
]
I send this JSON string to my Rails controller, where I have implemented this logic:
filters = params[:q]
table_names = {}
filters.each do |filter|
filter = filters[filter]
className = filter["className"]
fieldName = filter["field"]
operator = filter["operator"]
val = filter["val"]
if table_names[className].blank?
table_names[className] = []
end
table_names[className].push({
fieldName: fieldName,
operator: operator,
val: val
})
end
table_names.each do |k, v|
i = 0
where_string = ''
val_hash = {}
v.each do |field|
if i > 0
where_string += ' AND '
end
where_string += "#{field[:fieldName]} = :#{field[:fieldName]}"
val_hash[field[:fieldName].to_sym] = field[:val]
i += 1
end
className = k.constantize
puts className.where(where_string, val_hash)
end
What I do is, I loop over the json array and create a hash with keys as table names and values are the array with the name of the column, the operator and the value to apply that operator on. So I would have something like this after the table_names hash is created:
{
'Company':[
{
fieldName:'name',
operator:'starts with',
val:'a'
},
{
fieldName:'hq_city',
operator:'equals',
val:'karachi'
}
],
'Category':[
{
fieldName:'name',
operator:'does not include',
val:'ECommerce'
}
]
}
Now I loop over the table_names hash and create a where query using the Model.where("column_name = :column_name", {column_name: 'abcd'}) syntax.
So I would be generating two queries:
SELECT "companies".* FROM "companies" WHERE (name = 'a' AND hq_city = 'b')
SELECT "categories".* FROM "categories" WHERE (name = 'c')
I have two problems now:
1. Operators:
I have many operators that can be applied to a column, such as 'starts with', 'ends with', 'equals', 'does not equal', 'includes', 'does not include', 'greater than' and 'less than'. I am guessing the best way would be to do a switch case on the operator and use the appropriate symbol while building the where string. So for example, if the operator is 'starts with', I'd do something like where_string += "#{field[:fieldName]} like %:#{field[:fieldName]}" and likewise for the others.
So is this approach correct, and is this type of wildcard syntax allowed in this kind of .where?
2. More than 1 table
As you saw, my approach builds one query per table, so 2 queries for 2 tables. I do not need 2 queries; I need the category name condition to be in the same query, where the category belongs to the company.
Now what I want to do is I need to create a query like this:
Company.joins(:categories).where("name = :name and hq_city = :hq_city and categories.name = :categories[name]", {name: 'a', hq_city: 'Karachi', categories: {name: 'ECommerce'}})
But this is not it. The search can become very very complex. For example:
A Company has many FundingRound. A FundingRound can have many Investment, and an Investment can have many IndividualInvestor. So I can create a filter like:
{
"id":"0",
"className":"IndividualInvestor",
"field":"first_name",
"operator":"starts with",
"val":"za"
}
My approach would create a query like this:
SELECT "individual_investors".* FROM "individual_investors" WHERE (first_name like %za%)
This query is wrong. I want to query the individual investors of the investments of the funding round of the company. Which is a lot of joining tables.
The approach that I have used is applicable to a single model and cannot solve the problem that I stated above.
How would I solve this problem?

You can create a SQL query based on your hash. The most generic approach is raw SQL, which can be executed by ActiveRecord.
Here is some concept code that should give you the right idea:
query_select = "select * from "
query_where = ""
tables = [] # for selecting from all tables
conn = ActiveRecord::Base.connection
hash.each do |table, values|
  table_name = table.constantize.table_name
  tables << table_name
  values.each do |q|
    query_where += " AND " unless query_where.empty?
    # quote_table_name / quote_column_name escape identifiers, quote escapes values
    query_where += "#{conn.quote_table_name(table_name)}."
    query_where += conn.quote_column_name(q[:fieldName])
    if q[:operator] == "starts with" # this should be done with an appropriate method
      query_where += " LIKE #{conn.quote("#{q[:val]}%")}"
    end
  end
end
query_tables = tables.join(", ")
raw_query = query_select + query_tables + " where " + query_where
result = conn.execute(raw_query)
result.to_h # not required, but raw results are probably easier to handle as a hash
What this does:
query_select specifies what information you want in the result
query_where builds all the search conditions and escapes input to prevent SQL injections
query_tables is a list of all the tables you need to search
table_name = table.constantize.table_name will give you the SQL table_name as used by the model
raw_query is the actual combined sql query from the parts above
ActiveRecord::Base.connection.execute(raw_query) executes the sql on the database
Make sure to put any user submitted input in quotes and escape it properly to prevent SQL injections.
For your example the created query will look like this:
select * from companies, categories where `companies`.`name` LIKE 'a%' AND `companies`.`hq_city` = 'karachi' AND `categories`.`name` NOT LIKE '%ECommerce%'
This approach might need additional logic for joining tables that are related.
In your case, if company and category have an association, you have to add something like this to query_where:
"AND `companies`.`category_id` = `categories`.`id`"
Easy approach: You can create a Hash for all pairs of models/tables that can be queried and store the appropriate join condition there. This Hash shouldn't be too complex even for a medium-sized project.
Hard approach: This can be done automatically if you have has_many, has_one and belongs_to properly defined in your models. You can get the associations of a model using reflect_on_all_associations. Implement a breadth-first search (BFS) or depth-first search (DFS): start with any model and search for matching associations to the other models from your JSON input. Start new BFS/DFS runs until there are no unvisited models from the JSON input left. From the information found, you can derive all join conditions and then add them as expressions in the where clause of the raw SQL approach explained above. Even more complex, but also doable, would be reading the database schema and using a similar approach based on foreign keys.
Using associations: If all of them are associated with has_many / has_one, you can handle the joins with ActiveRecord by using the joins method with inject on the "most significant" model like this:
base_model = "Company".constantize
associations = [:categories] # and so on
result = associations.inject(base_model) { |model, assoc| model.joins(assoc) }.where(query_where)
What this does:
it passes the base_model as the starting input to Enumerable#inject, which will repeatedly call input.send(:joins, assoc) (for my example this would do Company.send(:joins, :categories), which is equivalent to Company.joins(:categories))
on the combined join, it executes the where conditions (constructed as described above)
Disclaimer: The exact syntax you need might vary based on the SQL implementation you use.
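As a rough illustration of the "hard approach", here is a minimal breadth-first search sketch. It assumes the association graph is available as a plain Hash; ASSOCIATIONS, association_path and nested_joins are hypothetical names made up for this example, and in a real Rails app the graph could be built from reflect_on_all_associations instead of being written by hand.

```ruby
# Hypothetical association graph: model name => { association name => target model }.
# Assumption: in a Rails app this would be derived from reflect_on_all_associations.
ASSOCIATIONS = {
  'Company'            => { categories: 'Category', funding_rounds: 'FundingRound' },
  'FundingRound'       => { investments: 'Investment' },
  'Investment'         => { individual_investors: 'IndividualInvestor' },
  'Category'           => {},
  'IndividualInvestor' => {}
}.freeze

# Breadth-first search for the shortest chain of associations from one model to another.
# Returns an array of association names, [] if from == to, or nil if there is no path.
def association_path(from, to, graph = ASSOCIATIONS)
  queue = [[from, []]]
  visited = { from => true }
  until queue.empty?
    model, path = queue.shift
    return path if model == to
    graph.fetch(model, {}).each do |assoc, target|
      next if visited[target]
      visited[target] = true
      queue << [target, path + [assoc]]
    end
  end
  nil
end

# Turn [:a, :b, :c] into { a: { b: :c } }, the nested form that joins accepts.
def nested_joins(path)
  path.reverse.reduce { |inner, assoc| { assoc => inner } }
end

path = association_path('Company', 'IndividualInvestor')
p path               # [:funding_rounds, :investments, :individual_investors]
p nested_joins(path) # {:funding_rounds=>{:investments=>:individual_investors}} (formatting varies by Ruby version)
```

The nested result could then be handed to something like Company.joins(nested_joins(path)), leaving the join conditions to ActiveRecord rather than writing them into the where clause by hand.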

Building a full SQL string by concatenation is a security issue, because it exposes your application to SQL injection attacks. If you can work around this, it is completely OK to build queries by concatenation, as long as you make them compatible with your DB (yes, this solution is DB-specific).
Other than that, you can add a field that marks a filter as joined. As I mentioned in the comment, you would have a variable that marks the table that should be the output of the query, something like:
[
{
"id":"1",
"className":"Category",
"field":"name",
"operator":"does not include",
"val":"ECommerce",
"queryModel":"Company"
}
]
When processing the query, you would return the results as the queryModel instead of the className; in those cases the className would be used only to join the table for its conditions.

I would suggest altering your JSON data. Right now you only send the name of the model, without any context; it would be easier if each filter carried the context of how its model relates to the base model.
In your example data would have to look like
data = [
{
id: '0',
className: 'Company',
relation: 'Company',
field: 'name',
operator: 'starts with',
val: 'a'
},
{
id: '1',
className: 'Category',
relation: 'Company.categories',
field: 'name',
operator: 'equals',
val: '12'
},
{
id: '3',
className: 'IndividualInvestor',
relation: 'Company.funding_rounds.investments.individual_investors',
field: 'name',
operator: 'equals',
val: '12'
}
]
And you send this data to QueryBuilder
query = QueryBuilder.new(data)
results = query.find_records
Note: find_records returns an array of hashes, one per model on which a query is executed.
For example, it would return [{Company: [....]}]
class QueryBuilder
def initialize(data)
@data = prepare_data(data)
end
def find_records
queries = @data.group_by {|e| e[:model]}
queries.map do |k, v|
q = v.map do |f|
{
field: "#{f[:table_name]}.#{f[:field]} #{read_operator(f[:operator])} ?",
value: value_based_on_operator(f[:val], f[:operator])
}
end
db_query = q.map {|e| e[:field]}.join(" AND ")
values = q.map {|e| e[:value]}
{"#{k}": k.constantize.joins(join_hash(v)).where(db_query, *values)}
end
end
private
def join_hash(array_of_relations)
hash = {}
array_of_relations.each do |f|
hash.merge!(array_to_hash(f[:joins]))
end
hash.map do |k, v|
if v.nil?
k
else
{"#{k}": v}
end
end
end
def read_operator(operator)
case operator
when 'equals'
'='
when 'starts with'
'LIKE'
end
end
def value_based_on_operator(value, operator)
case operator
when 'equals'
value
when 'starts with'
"#{value}%"
end
end
def prepare_data(data)
data.each do |record|
record.tap do |f|
f[:model] = f[:relation].split('.')[0]
f[:joins] = f[:relation].split('.').drop(1)
f[:table_name] = f[:className].constantize.table_name
end
end
end
def array_to_hash(array)
if array.length < 1
{}
elsif array.length == 1
{"#{array[0]}": nil}
elsif array.length == 2
{"#{array[0]}": array[1]}
else
{"#{array[0]}": array_to_hash(array.drop(1))}
end
end
end
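The read_operator and value_based_on_operator methods above cover only two operators. As a sketch of how the remaining operators from the question might be handled, the lookup table below maps each operator name to a SQL fragment and a bound value. FilterSql, OPERATORS, escape_like and condition are hypothetical names for this example; it assumes the column name has already been checked against a whitelist and that the database uses backslash as the LIKE escape character (the MySQL default).

```ruby
# Hypothetical helper: maps operator names from the filter JSON to SQL fragments.
module FilterSql
  # Escape LIKE wildcards so user input is matched literally.
  # Assumption: backslash is the LIKE escape character, as in MySQL by default.
  def self.escape_like(value)
    value.to_s.gsub(/[\\%_]/) { |m| "\\#{m}" }
  end

  # operator name => [SQL fragment with a placeholder, lambda building the bound value]
  OPERATORS = {
    'equals'           => ['= ?',        ->(v) { v }],
    'does not equal'   => ['<> ?',       ->(v) { v }],
    'greater than'     => ['> ?',        ->(v) { v }],
    'less than'        => ['< ?',        ->(v) { v }],
    'starts with'      => ['LIKE ?',     ->(v) { "#{escape_like(v)}%" }],
    'ends with'        => ['LIKE ?',     ->(v) { "%#{escape_like(v)}" }],
    'includes'         => ['LIKE ?',     ->(v) { "%#{escape_like(v)}%" }],
    'does not include' => ['NOT LIKE ?', ->(v) { "%#{escape_like(v)}%" }]
  }.freeze

  # Returns [sql_string, bound_value]. The column is assumed to be validated already.
  def self.condition(column, operator, value)
    sql, to_bound = OPERATORS.fetch(operator) do
      raise ArgumentError, "unknown operator: #{operator}"
    end
    ["#{column} #{sql}", to_bound.call(value)]
  end
end

p FilterSql.condition('companies.name', 'starts with', 'a')
# ["companies.name LIKE ?", "a%"]
```

Because the value is always passed as a bound parameter, the pair can be splatted into where, for example Company.where(*FilterSql.condition('companies.name', 'starts with', 'a')).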

I feel you are overcomplicating things by having one single controller for everything. I would create a controller for every model or entity that you want to show and then implement the filters as you described.
Implementing a dynamic where and order by is not very hard, but if, as you said, you also need the logic to implement joins, you are not only overcomplicating the solution (because you will have to keep this controller updated every time you add a new model or entity, or change the basic logic) but also enabling people to start playing with your data.
I am not very familiar with Rails, so sadly I cannot give you any specific code other than saying that your approach seems OK to me. I would split it into multiple controllers.

Related

Stored procedure using dynamic SQL statement stored in database column

I have a table called Coupon.
This table has a column called query which holds a string.
The query string has some logical conditions in it formatted for a where statement. For example:
coupon1.query
=> " '/hats' = :url "
coupon2.query
=> " '/pants' = :url OR '/shoes' = :url "
I want to write a stored procedure that takes as input 2 parameters: a list of Coupon ids and a variable (in this example, the current URL).
I want the procedure to look up the value of the query column from each Coupon. Then it should run that string in a where statement, plugging in my other parameter (current url), then return any Coupon ids that matches.
Here's how I would expect the procedure to behave given the two coupons above.
Example 1:
* Call procedure with ids for coupon1 and coupon2, with @url = '/hats'
* Expect coupon1 to be returned.
Example 2:
* Call procedure with ids for coupon1 and coupon2, with @url = '/pants'
* Expect coupon2 to be returned.
Example 3:
* Call procedure with ids for coupon1 and coupon2, with @url = '/shirts'
* Expect no ids returned. URL does not match '/hats' for coupon1, and doesn't match '/pants or /shoes' for coupon2.
It's easy to test these out in ActiveRecord. Here is just example 1.
@url = '/hats'
@query = coupon1.query
# "'/hats' = :url"
Coupon.where(@query, url: @url).count
=> 2
# count is non-zero number because the query matches the url parameter.
# Coupon1 passes, its id would be returned from the stored procedure.
'/hats' == '/hats'
@query = coupon2.query
# " '/pants' = :url OR '/shoes' = :url "
Coupon.where(@query, url: @url).count
=> 0
# count is 0 because the query does not match the url parameter.
# Coupon2 does not pass, its id would not be returned from the stored procedure.
'/pants' != '/hats', '/shoes' != '/hats'
You could write this as a loop (I'm in ruby on rails with activerecord) but I need something that performs better - I could potentially have lots of coupons so I can't just check each one directly with a loop. The queries contain complex AND/OR logic so I can't just compare against a list of urls either. But here's some code of a loop that is essentially what I'm trying to translate into a stored procedure.
# assume coupon1 has id 1, coupon2 has id 2
@coupons = [coupon1, coupon2]
@url = '/hats'
@coupons.map do |coupon|
if Coupon.where(coupon.query, url: @url).count > 0
coupon.id
else
nil
end
end
=> [1, nil]
Ok, I've been pondering this one.
Big picture:
A. You have a @url you want to search for to find a match among many potential Coupons
B. A coupon has a URL that might match @url
If that's the true extent of the problem, I think you've really over-complicated things.
coupon1.query
=> ["/hats"]
coupon2.query
=> ["/pants", "/shoes"]
@url = '/hats'
Coupon.where('FIND_IN_SET(:url, query) <> 0', url: @url)
Or something similar, I'm not a MySQL user myself.
However, this is certainly achievable, and there may even be a much better ActiveRecord way to write the query.
UPDATE
Ok, I'm missing something. I can't actually reproduce this in console.
@url = '/hats'
@query = coupon1.query
# "'/hats' = :url"
Coupon.where(@query, url: @url).count
> SELECT * FROM 'coupons' WHERE ( '/hats' = '/hats' )
As you can see from the select statement, this will always return all records. It's the same as writing SELECT * FROM 'coupons' WHERE ( true )
How are you actually performing a valid query?
Sorry to post this in my answer, I wanted good formatting.
If I've got something wrong here, maybe we need to move this to a chat room.
I think you have just enough reputation for me to invite you to a room.
UPDATE2
Since you have to compare @query to each record individually, I think you'll have to loop.
But, I don't think you need to use Coupon.where to accomplish this since you are only comparing one record at a time.
@coupons.filter_map do |coupon|
  # filter_map (Ruby 2.7+) drops nil results, so nil never ends up in the array
  next unless coupon.query == @url
  coupon.id
end
However, your original question was about performance when scaled, and you know you aren't going to solve that with a loop.
Maybe store the conditions as JSONB instead of a String so that you could actually do some SQL on them.
But, even with JSONB, this is still complicated by wanting your conditions to be evaluated properly.
{
"url": {
"AND": ["/hats", "/shoes"],
"OR": ["/pants"]
},
"logged_in": true,
"is_gold_member": false
}
{
"logged_in": false,
"url": "/hats"
}
{
"url": {
"OR": ["/pants", "/shoes"]
}
}
Ultimately, I think what you're doing with query attributes is going to continue to be your stumbling block. It's very clever, but it's not simple.
If it were my app, I think I would go back to considering my use case and try to find a different strategy to map specific coupons to specific parameters in a more on-the-rails way.

find row in ruby array

I have a mysql query that returns this type of data:
{"id"=>1, "serviceCode"=>"1D00", "price"=>9.19}
{"id"=>2, "serviceCode"=>"1D01", "price"=>9.65}
I need to return the id field based on a match of the serviceCode.
i.e. I need a method like this
def findID(serviceCode)
find the row that has the service code and return the ID
end
I was thinking of having a serviceCodes.each do |row| method and loop through and essentially go
if row['serviceCode'] == serviceCode
return row['id']
end
Is there a faster / easier way?
You can use the method Enumerable#find:
service_codes = [
{"id"=>1, "serviceCode"=>"1D00", "price"=>9.19},
{"id"=>2, "serviceCode"=>"1D01", "price"=>9.65}
]
service_codes.find { |row| row['serviceCode'] == '1D00' }
# => {"id"=>1, "serviceCode"=>"1D00", "price"=>9.19}
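To return only the id, as the question asks, the find call can be wrapped in a small method. This is a sketch; find_id and index_by_service_code are hypothetical names, and the second helper only makes sense if the same rows are searched many times.

```ruby
# Return the id of the first row whose serviceCode matches, or nil if none does.
def find_id(rows, service_code)
  row = rows.find { |r| r['serviceCode'] == service_code }
  row && row['id']
end

# For repeated lookups, build a serviceCode => id index once and reuse it.
def index_by_service_code(rows)
  rows.each_with_object({}) { |r, index| index[r['serviceCode']] = r['id'] }
end

service_codes = [
  { 'id' => 1, 'serviceCode' => '1D00', 'price' => 9.19 },
  { 'id' => 2, 'serviceCode' => '1D01', 'price' => 9.65 }
]

p find_id(service_codes, '1D01')               # 2
p index_by_service_code(service_codes)['1D00'] # 1
```

find scans the array on every call, while the index pays for one pass up front and then answers each lookup with a hash access.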
If you use Rails Active Record as your ORM and your model is named Product (only as an example),
you can use something like this:
def findID(serviceCode)
Product.select(:id).where(serviceCode: serviceCode).first
end
If you have a plain SQL query in a plain Ruby class (not recommended), you should change the query to select only the id, as Luiggi mentioned. But beware of SQL injection if your serviceCode comes from external requests.

fetch mysql prepared statement as array of hashes

I'm struggling with Ruby's mysql gem and prepared statements.
I want to end up with the same thing I would get from each_hash over the result, but it's not supported for prepared statements.
So I came up with this horrible mess.
stmt = @db.prepare("SELECT mat_id, name, qty FROM materials WHERE mat_id = ? ")
#those 3 lines hurt my eyes
res = stmt.execute(params[:id])
mat_id, name, qty = res.bind_result(Integer, String, Integer).fetch
@material = [mat_id: mat_id, name: name, qty: qty]
There has to be a better way to fetch the results and get an array of hashes.
A better mysql gem could be a valid answer. An ORM is NOT.
Seeing the comments, I'll still post the Sequel link as an answer:
http://sequel.rubyforge.org/
You don't need to use the model part of Sequel at all. In fact, the docs have an entire section dedicated to SQL junkies:
http://sequel.rubyforge.org/rdoc/files/doc/sql_rdoc.html
example query:
DB.fetch("SELECT * FROM albums WHERE name LIKE ?", 'A%') do |row|
puts row[:name]
end
Oneliner!
Hash[stmt.result_metadata.fetch_fields.map(&:name).zip( stmt.fetch )]
Or more robust
row = stmt.fetch
Hash[stmt.result_metadata.fetch_fields.map(&:name).zip( row )] if row
According to http://tmtm.org/en/mysql/ruby/, results have an "each_hash" method, but statements don't. What a pain in the ass...
#A proxy for the statement class
class Stmt
  def each_hash
    fields = @target.result_metadata.fetch_fields.map do |f| f.name.to_sym end
    @target.execute.each do |x|
      hash = {}
      fields.zip(x).each do |pair|
        hash[pair[0]] = pair[1]
      end
      yield hash
    end
  end
  def initialize(target)
    @target = target
  end
  def method_missing(name, *args, &block)
    @target.send(name, *args, &block)
  end
end
Now, you can do this:
Stmt.new(@db.prepare(...).execute(...)).each_hash do |x|
puts x
end
and you can loop through each row as a hash.
I still haven't tested this for multiple executions.

How to get Ruby MySQL returning PHP like DB SELECT result

So I use the PDO for a DB connection like this:
$this->dsn[$key] = array('mysql:host=' . $creds['SRVR'] . ';dbname=' . $db, $creds['USER'], $creds['PWD']);
$this->db[$key] = new PDO($this->dsn[$key]);
Using PDO I can then execute a MySQL SELECT using something like this:
$sql = "SELECT * FROM table WHERE id = ?";
$st = $db->prepare($sql);
$st->execute(array($id));
$result = $st->fetchAll();
The $result variable will then hold an array of arrays where each row is given an incremental key, the first row having the array key 0. Each row is itself an array of the DB data, like this:
$result (array(2)
[0]=>[0=>1, "id"=>1, 1=>"stuff", "field1"=>"stuff", 2=>"more stuff", "field2"=>"more stuff" ...],
[1]=>[0=>2, "id"=>2, 1=>"yet more stuff", "field1"=>"yet more stuff", 2=>"even more stuff", "field2"=>"even more stuff"]);
In this example the DB table's field names would be id, field1 and field2. And the result allows you to spin through the array of data rows and then access the data using either a index (0, 1, 2) or the field name ("id", "field1", "field2"). Most of the time I prefer to access the data via the field names but access via both means is useful.
So I'm learning the ruby-mysql gem right now and I can retrieve the data from the DB. However, I cannot get the field names. I could probably extract it from the SQL statement given but that requires a fair bit of coding for error trapping and only works so long as I'm not using SELECT * FROM ... as my SELECT statement.
So I'm using a table full of State names and their abbreviations for my testing. When I use "SELECT State, Abbr FROM states" with the following code
st = @db.prepare(sql)
if empty(where)
st.execute()
else
st.execute(where)
end
rows = []
while row = st.fetch do
rows << row
end
st.close
return rows
I get a result like this:
[["Alabama", "AL"], ["Alaska", "AK"], ...]
And I'm wanting a result like this:
[[0=>"Alabama", "State"=>"Alabama", 1=>"AL", "Abbr"=>"AL"], ...]
I'm guessing I don't have the way inspect would display it quite right but I'm hoping you get the idea by now.
Any way to do this? I've seen some references to doing this type of thing, but it appears to require the DBI module. I guess that isn't the end of the world, but is that the only way? Or can I do it with ruby-mysql alone?
I've been digging into all the methods I can find without success. Hopefully you guys can help.
Thanks
Gabe
You can do this yourself without too much effort:
expanded_rows = rows.map do |r|
{ 0 => r[0], 'State' => r[0], 1 => r[1], 'Abbr' => r[1] }
end
Or a more general approach that you could wrap up in a method:
names = ['State', 'Abbr']
expanded_rows = rows.map do |r|
  0.upto(names.length - 1).each_with_object({}) do |i, h|
    h[names[i]] = h[i] = r[i]
  end
end
So you could collect up the rows as you are now and then pump that array of arrays through something like what's above and you should get the sort of data structure you're looking for out the other side.
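Wrapped up as a method, the general approach might look like the sketch below. expand_rows is a hypothetical name, and the column names are passed in by hand here because how to read them from the statement is left open above.

```ruby
# Expand each row (an array of values) into a hash that can be read
# both by column index and by column name, similar to PDO's fetchAll.
def expand_rows(rows, names)
  rows.map do |row|
    names.each_with_index.each_with_object({}) do |(name, i), h|
      h[i] = h[name] = row[i]
    end
  end
end

rows = [['Alabama', 'AL'], ['Alaska', 'AK']]
expanded = expand_rows(rows, ['State', 'Abbr'])

puts expanded[0]['State'] # Alabama
puts expanded[1][1]       # AK
```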
There are other methods on the row you get from st.fetch as well:
http://rubydoc.info/gems/mysql/2.8.1/Mysql/Result
But you'll have to experiment a little to see what exactly they return as the documentation is, um, a little thin.
You should be able to get the column names out of row or st:
http://rubydoc.info/gems/mysql/2.8.1/Mysql/Stmt
but again, you'll have to experiment to figure out the API. Sorry, I don't have anything set up to play around with the MySQL API that you're using so I can't be more specific.
I realize that php programmers are all cowboys who think using a db layer is cheating, but you should really consider activerecord.

Rails select random record

I don't know if I'm just looking in the wrong places here or what, but does active record have a method for retrieving a random object?
Something like?
@user = User.random
Or... well, since that method doesn't exist, is there some amazing "Rails Way" of doing this? I always seem to be too verbose. I'm using MySQL as well.
Most of the examples I've seen that do this end up counting the rows in the table, then generating a random number to choose one. This is because alternatives such as RAND() are inefficient in that they actually get every row and assign them a random number, or so I've read (and are database specific I think).
You can add a method like the one I found here.
module ActiveRecord
class Base
def self.random
if (c = count) != 0
find(:first, :offset =>rand(c))
end
end
end
end
This will make it so any Model you use has a method called random which works in the way I described above: generates a random number within the count of the rows in the table, then fetches the row associated with that random number. So basically, you're only doing one fetch which is what you probably prefer :)
You can also take a look at this rails plugin.
We found that offsets ran very slowly on MySQL for a large table. Instead of using offset like:
model.find(:first, :offset =>rand(c))
...we found the following technique ran more than 10x faster (fixed off by 1):
max_id = Model.maximum("id")
min_id = Model.minimum("id")
id_range = max_id - min_id + 1
random_id = min_id + rand(id_range).to_i
Model.find(:first, :conditions => "id >= #{random_id}", :limit => 1, :order => "id")
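To see how this technique copes with gaps in the id sequence, here is a small in-memory sketch. A sorted array of ids stands in for the table's primary keys; random_existing_id is a hypothetical name, and bsearch plays the role of the "id >= random_id ORDER BY id LIMIT 1" query.

```ruby
# ids must be sorted ascending; it stands in for the primary keys of the table.
def random_existing_id(ids, rng = Random.new)
  return nil if ids.empty?
  min_id = ids.first
  max_id = ids.last
  # Pick a random number in min_id..max_id, whether or not a row has that id.
  random_id = min_id + rng.rand(max_id - min_id + 1)
  # Equivalent of: WHERE id >= random_id ORDER BY id LIMIT 1
  ids.bsearch { |id| id >= random_id }
end

ids = [1, 2, 5, 9, 10] # ids 3, 4, 6, 7 and 8 have been deleted
puts random_existing_id(ids)
```

One consequence worth knowing: a row that follows a gap is picked more often than its neighbours. With the ids above, 5 is returned whenever the random number is 3, 4 or 5, so the distribution is only uniform when the ids are contiguous.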
Try using Array's sample method:
@user = User.all.sample(1)
In Rails 4 I would extend ActiveRecord::Relation:
class ActiveRecord::Relation
def random
offset(rand(count))
end
end
This way you can use scopes:
SomeModel.all.random.first # Return one random record
SomeModel.some_scope.another_scope.random.first
I'd use a named scope. Just throw this into your User model.
named_scope :random, :order=>'RAND()', :limit=>1
The random function isn't the same in each database though. SQLite and others use RANDOM() but you'll need to use RAND() for MySQL.
If you'd like to be able to grab more than one random row you can try this.
named_scope :random, lambda { |*args| { :order=>'RAND()', :limit=>args[0] || 1 } }
If you call User.random it will default to 1 but you can also call User.random(3) if you want more than one.
If you would need a random record but only within certain criteria you could use "random_where" from this code:
module ActiveRecord
class Base
def self.random
if (c = count) != 0
find(:first, :offset =>rand(c))
end
end
def self.random_where(*params)
if (c = where(*params).count) != 0
where(*params).find(:first, :offset =>rand(c))
end
end
end
end
For example:
@user = User.random_where("active = 1")
This function is very useful for displaying random products based on some additional criteria.
I strongly recommend this gem for random records; it is specially designed for tables with lots of rows:
https://github.com/haopingfan/quick_random_records
Simple usage:
@user = User.random_records(1).take
All other answers perform badly with a large database, except this gem:
quick_random_records costs only 4.6ms in total.
the accepted answer User.order('RAND()').limit(10) costs 733.0ms.
the offset approach costs 245.4ms in total.
the User.all.sample(10) approach costs 573.4ms.
Note: My table only has 120,000 users. The more records you have, the bigger the performance difference will be.
UPDATE:
Performance on a table with 550,000 rows:
Model.where(id: Model.pluck(:id).sample(10)) costs 1384.0ms
gem: quick_random_records costs only 6.4ms in total
Here is the best solution for getting random records from the database.
RoR makes this easy.
To get random records from the DB, use sample; below is its description with examples.
Backport of Array#sample based on Marc-Andre Lafortune's github.com/marcandre/backports/ Returns a random element or n random elements from the array. If the array is empty and n is nil, returns nil. If n is passed and its value is less than 0, it raises an ArgumentError exception. If the array is empty and n is equal to or greater than 0, it returns [].
[1,2,3,4,5,6].sample # => 4
[1,2,3,4,5,6].sample(3) # => [2, 4, 5]
[1,2,3,4,5,6].sample(-3) # => ArgumentError: negative array size
[].sample # => nil
[].sample(3) # => []
You can combine it with a condition as required, as in the example below.
User.where(active: true).sample(5)
It will return 5 random active users from the User table.
For more help please visit : http://apidock.com/rails/Array/sample