Ecto query and custom MySQL function with variable arity - mysql

I want to perform a query like the following one:
SELECT id, name
FROM mytable
ORDER BY FIELD(name, 'B', 'A', 'D', 'E', 'C')
FIELD is a MySQL specific function, and 'B', 'A', 'D', 'E', 'C' are values coming from a List.
I tried using fragment, but it doesn't seem to allow dynamic arity known only at runtime.
Short of going full-raw using Ecto.Adapters.SQL.query, is there a way to handle this using Ecto's query DSL?
Edit: Here's the first, naive approach, which of course does not work:
ids = [2, 1, 3] # this list is of course created dynamically and does not always have three items
query = MyModel
|> where([a], a.id in ^ids)
|> order_by(fragment("FIELD(id, ?)", ^ids))

ORMs are wonderful, until they leak. All do, eventually. Ecto is young (for example, it only gained the ability to OR where clauses together 30 days ago), so it's simply not mature enough to have developed an API that considers advanced SQL gyrations.
Surveying possible options, you're not alone in the request. The inability to comprehend lists in fragments (whether as part of order_by or where or anywhere else) has been mentioned in Ecto issue #1485, on StackOverflow, on the Elixir Forum and in this blog post. The latter is particularly instructive. More on that in a bit. First, let's try some experiments.
Experiment #1: One might first try using Kernel.apply/3 to pass the list to fragment, but that won't work:
|> order_by(Kernel.apply(Ecto.Query.Builder, :fragment, ^ids))
Experiment #2: Then perhaps we can build it with string manipulation. How about giving fragment a string built-at-runtime with enough placeholders for it to pull from the list:
|> order_by(fragment(Enum.join(["FIELD(id,", Enum.join(Enum.map(ids, fn _ -> "?" end), ","), ")"], ""), ^ids))
Which would produce FIELD(id,?,?,?) given ids = [1, 2, 3]. Nope, this doesn't work either.
Experiment #3: Creating the entire, final SQL built from the ids, placing the raw ID values directly in the composed string. Besides being horrible, it doesn't work, either:
|> order_by(fragment(Enum.join(["FIELD(id,", Enum.join(^ids, ","), ")"], "")))
Experiment #4: This brings me around to that blog post I mentioned. In it, the author hacks around the lack of or_where using a set of pre-defined macros based on the number of conditions to pull together:
defp orderby_fragment(query, [v1]) do
from u in query, order_by: fragment("FIELD(id,?)", ^v1)
end
defp orderby_fragment(query, [v1,v2]) do
from u in query, order_by: fragment("FIELD(id,?,?)", ^v1, ^v2)
end
defp orderby_fragment(query, [v1,v2,v3]) do
from u in query, order_by: fragment("FIELD(id,?,?,?)", ^v1, ^v2, ^v3)
end
defp orderby_fragment(query, [v1,v2,v3,v4]) do
from u in query, order_by: fragment("FIELD(id,?,?,?,?)", ^v1, ^v2, ^v3, ^v4)
end
While this works and uses the ORM "with the grain" so to speak, it requires that you have a finite, manageable number of available fields. This may or may not be a deal breaker.
My recommendation: don't try to juggle around an ORM's leaks. You know the best query. If the ORM won't accept it, write it directly with raw SQL, and document why the ORM does not work. Shield it behind a function or module so you can reserve the future right to change its implementation. One day, when the ORM catches up, you can then just rewrite it nicely with no effects on the rest of the system.
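To make the "shield it behind a function" advice concrete, here is a minimal, language-neutral sketch in Python (not Ecto). It builds the variable-arity FIELD clause with one placeholder per value and returns the values separately as bound parameters. The helper name field_order_clause and the %s placeholder style (the one used by common Python MySQL drivers) are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple


def field_order_clause(column: str, values: Sequence[object]) -> Tuple[str, List[object]]:
    """Build an ORDER BY FIELD(...) clause with one placeholder per value.

    `column` must be a trusted identifier because it is interpolated as-is;
    the values are never interpolated, only returned as bound parameters.
    """
    if not values:
        raise ValueError("FIELD() needs at least one value to order by")
    placeholders = ", ".join(["%s"] * len(values))
    return f"ORDER BY FIELD({column}, {placeholders})", list(values)


ids = [2, 1, 3]  # built dynamically; any length >= 1
clause, params = field_order_clause("id", ids)
print(clause)
print(params)
```

The rest of the system only sees the helper, so its body can be swapped for a DSL-based version later without touching the callers.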

Create a table with 2 columns:
B 1
A 2
D 3
E 4
C 5
Then JOIN LEFT(name, 1) to it and get the ordinal. Then sort by that.
(Sorry, I can't help with Elixir/Ecto/Arity.)
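A small sketch of the lookup-table idea, run against an in-memory SQLite database from Python rather than MySQL. The table name name_order and the sample rows are made up for illustration, and the join is on the full name instead of LEFT(name, 1) because the sample names are single letters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO mytable (id, name)
        VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
    CREATE TABLE name_order (name TEXT PRIMARY KEY, ordinal INTEGER);
""")

# The desired order arrives as a list; each entry gets its position as ordinal.
wanted = ["B", "A", "D", "E", "C"]
conn.executemany(
    "INSERT INTO name_order (name, ordinal) VALUES (?, ?)",
    [(name, position) for position, name in enumerate(wanted, start=1)],
)

rows = conn.execute("""
    SELECT m.id, m.name
    FROM mytable AS m
    JOIN name_order AS o ON o.name = m.name
    ORDER BY o.ordinal
""").fetchall()
print(rows)
```

Because the ordering lives in a table, the query text never changes with the number of values, which sidesteps the arity problem entirely.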

I would try to resolve this using the following SQL SELECT statement:
[Note: Don't have access right now to a system to check the correctness of the syntax, but I think it is OK]
SELECT A.MyID , A.MyName
FROM (
SELECT id AS MyID ,
name AS MyName ,
FIELD(name, 'B', 'A', 'D', 'E', 'C') AS Order_By_Field
FROM mytable
) A
ORDER BY A.Order_By_Field
;
Please note that the list 'B','A',... can be passed as either an array or any other method and replace what is written in the above code sample.

This was actually driving me crazy until I found that (at least in MySQL), there is a FIND_IN_SET function. The syntax is a bit weird, but it doesn't take variable arguments, so you should be able to do this:
ids = [2, 1, 3] # this list is of course created dynamically and does not always have three items
ids_string = Enum.join(ids, ",")
query = MyModel
|> where([a], a.id in ^ids)
|> order_by(fragment("FIND_IN_SET(id, ?)", ^ids_string))
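For anyone unsure what ordering this gives, here is a rough Python emulation of FIND_IN_SET (a sketch of its documented behaviour, not MySQL itself): it returns the 1-based position of the value in the comma-separated string, or 0 when the value is absent. The function name find_in_set and the sample table_ids are assumptions for illustration.

```python
def find_in_set(needle: str, csv: str) -> int:
    """Emulate MySQL FIND_IN_SET: 1-based position in csv, or 0 if absent."""
    items = csv.split(",") if csv else []
    try:
        return items.index(needle) + 1
    except ValueError:
        return 0


ids = [2, 1, 3]
ids_string = ",".join(str(i) for i in ids)

# Pretend the table also holds id 4, which is not in the list.
table_ids = [1, 2, 3, 4]
ordered = sorted(table_ids, key=lambda i: find_in_set(str(i), ids_string))
print(ordered)
```

Rows that are not in the list get position 0 and sort first, which is why the where clause restricting to ids matters. Values that themselves contain commas cannot be used with this approach.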

Related

Is there any benefit in using parameters for query parameters that don't change?

I am writing methods in PHP classes that use QueryBuilder for the DBAL (Doctrine) ORM.
One of the things I like about this is that we can easily use parameters in the SQL that is generated. However, as I was working on a few queries I noticed that there are many instances where the query has parameters that don't change (like "WHERE is_active = 1"), where the "1" has been parameterized.
Is there any benefit to parameterizing these values that are written in these functions that aren't ever truly in line or exposed?
example:
$this->db->select('u.LastName, u.FirstName, s.*')
->from('scores', 's' )
->join( 's', 'user', 'u', 's.user_id = u.id')
->where( "s.active = :is_active")
->andWhere("s.id = :user_id" )
->setParameters(['is_active' => 1, 'user_id' => $user_id]);
versus something like this:
$this->db->select('u.LastName, u.FirstName, s.*')
->from('scores', 's' )
->join( 's', 'user', 'u', 's.user_id = u.id')
->where( "s.active = 1")
->andWhere("s.id = :user_id" )
->setParameter('user_id', $user_id);
I am not usually a belt-and-suspenders guy (meaning someone who does it just to be safe); if there is no benefit, then I probably wouldn't do it.
I have looked at the DBAL docs and MySQL docs and I just don't see anything that tells me one way or the other. Maybe someone else has better Google-Fu or personal experience.
Thanks in advance,
GaryC.
There is no technical benefit. If the value is always a constant 1, you might as well make it a literal in the query, as you show in your second example.
There may be a benefit that is a developer culture issue, not a technical benefit. If some members of your developer team are novice and have a hard time understanding when to use a parameter and when it's safe not to use a parameter, then giving them a guideline to always use a parameter for any value in any SQL expression at least means they will use parameters when they really need to do so.

MySQL SUBSTR LOCATE multi-search-strings

Tricky one, and my brain is mush after staring at my screen for about an hour.
I'm trying to query my database to return the first part of a string (domain name eg. http://www.example.com) in the column image_link.
I have managed this for all rows where the image_link contains .com as part of the string... but I need the code to be more versatile, so it searches for the likes of .net and .co.uk too.
Had thought some sort of nested REPLACE might work, but it doesn't make sense when I try to apply it - and I'm stuck.
Query Builder code:
$builder->select("SUBSTRING(image_link, 1, LOCATE('.com', image_link) + 3) AS domain");
Example strings, with desired results:
http://www.example.com/brands/567.jpg // http://www.example.com
https://www.example.org/photo.png // https://www.example.org
http://example.net/789 // http://example.net
Any help/advice warmly welcomed!
SELECT ... ,
SUBSTRING_INDEX(image_link, '/', 3) domain
FROM test;
Or, if protocol may be absent, then
SELECT ... ,
SUBSTRING_INDEX(image_link, '/', CASE WHEN LOCATE('//', image_link) THEN 3 ELSE 1 END) domain
FROM test;
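To see why a count of 3 (or 1 without a protocol) yields the domain, here is a rough Python emulation of SUBSTRING_INDEX for positive counts. It is a sketch of the function's behaviour, not MySQL itself, and the helper names substring_index and domain are made up for illustration.

```python
def substring_index(text: str, delim: str, count: int) -> str:
    """Emulate MySQL SUBSTRING_INDEX for a positive count:
    return everything before the count-th occurrence of delim,
    or the whole string if delim occurs fewer than count times."""
    if count <= 0:
        raise ValueError("this sketch only handles positive counts")
    return delim.join(text.split(delim)[:count])


def domain(image_link: str) -> str:
    # With a protocol, the third '/' ends the host part; without one, the first does.
    count = 3 if "//" in image_link else 1
    return substring_index(image_link, "/", count)


for link in [
    "http://www.example.com/brands/567.jpg",
    "https://www.example.org/photo.png",
    "http://example.net/789",
    "example.net/789",
]:
    print(link, "->", domain(link))
```

The first two slashes belong to the protocol separator, so cutting before the third slash keeps the scheme and host regardless of the top-level domain.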

Why is Rails adding `OR 1=0` to queries using the where clause hash syntax with a range?

The project that I'm working on is using MySQL on RDS (mysql2 gem specifically).
When I use a hash of conditions including a range in a where statement I'm getting a bit of an odd addition to my query.
User.where(id: [1..5])
and
User.where(id: [1...5])
Result in the following queries respectively:
SELECT `users`.* FROM `users` WHERE ((`users`.`id` BETWEEN 1 AND 5 OR 1=0))
SELECT `users`.* FROM `users` WHERE ((`users`.`id` >= 1 AND `users`.`id` < 5 OR 1=0))
The queries work perfectly fine since OR FALSE is effectively a no-op. I'm just wondering why Rails or ARel is adding this snippet into the query.
EDIT
It looks like the line that could explain this is line 26 in ActiveRecord::PredicateBuilder. Still no idea how the hash could be empty? at that point but maybe someone else does.
EDIT 2
This is interesting. I was looking into Filip's comment to see why he made it, since it seemed like just a clarification, but he is correct that 1..5 != [1..5]. The former is an inclusive range from 1 to 5, whereas the latter is an array whose first element is the former. I tried putting these into an ARel where call to see the SQL produced, and the OR 1=0 is not there!
User.where(id: 1..5) #=> SELECT "users".* FROM "users" WHERE ("users"."id" BETWEEN 1 AND 5)
User.where(id: 1...5) #=> SELECT "users".* FROM "users" WHERE ("users"."id" >= 1 AND "users"."id" < 5)
I still do not know why ARel is adding the OR 1=0, which will always be false and is seemingly unnecessary. It may be due to how Arrays and Ranges are handled differently.
Building on the fact, which you've discovered, that [1..5] is not the correct way to specify the range... I have discovered why [1..5] behaves as it does. To get there, I first found that an empty array in a hash condition produces the 1=0 SQL condition:
User.where(id: []).to_sql
# => "SELECT \"users\".* FROM \"users\" WHERE 1=0"
And, if you check the ActiveRecord::PredicateBuilder::ArrayHandler code, you'll see that array values are always partitioned into ranges and other values.
ranges, values = values.partition { |v| v.is_a?(Range) }
This explains why you don't see the 1=0 when using non-range values. That is, the only way to get 1=0 from an array without including a range is to supply an empty array, which yields the 1=0 condition, as shown above. And when all the array has in it is a range you're going to get the range conditions (ranges) and, separately, an empty array condition (values) executed. My guess is that there isn't a good reason for this... it just simply is easier to let this be than to avoid it (since the result set is equivalent either way). If the partition code was a bit smarter then it wouldn't have to tack on the additional, empty values array and could skip the 1=0 condition.
As for where the 1=0 comes from in the first place... I think that comes from the database adapter, but I couldn't find exactly where. However, I would call it an attempt to fail to find a record. In other words, WHERE 1=0 isn't ever going to return any users, which makes sense over alternative SQL like WHERE id=null which will find any users whose id is null (realizing that this isn't really correct SQL syntax). And this is what I'd expect when attempting to find all Users whose id is in the empty set (i.e. we're not asking for nil ids or null ids or whatever). So, in my mind, leaving the bit about exactly where 1=0 comes from as a black box is OK. At least we now can reason about why the range inside of the array is causing it to show up!
UPDATE
I've also found that, even when using ARel directly, you can still get 1=0:
User.arel_table[:id].in([]).to_sql
# => "1=0"
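The partition behaviour described above can be modelled in a few lines. This is a toy Python model of the reasoning, not the Rails source: the function name array_condition is invented, and Python's range (exclusive end, step 1 assumed) stands in for a Ruby Range.

```python
from typing import List, Union


def array_condition(column: str, values: List[Union[int, range]]) -> str:
    """Toy model: split an array into ranges and plain values, then OR
    the range conditions with the (possibly empty) plain-value condition."""
    ranges = [v for v in values if isinstance(v, range)]
    plain = [v for v in values if not isinstance(v, range)]

    # An empty list of plain values becomes the always-false 1=0.
    if plain:
        values_sql = f"{column} IN ({', '.join(str(v) for v in plain)})"
    else:
        values_sql = "1=0"

    # range(1, 6) models Ruby's inclusive 1..5, hence stop - 1.
    range_sqls = [f"{column} BETWEEN {r.start} AND {r.stop - 1}" for r in ranges]
    return "(" + " OR ".join(range_sqls + [values_sql]) + ")"


print(array_condition("id", [range(1, 6)]))
print(array_condition("id", [1, 2, 3]))
print(array_condition("id", []))
```

In this model the 1=0 appears exactly when the plain-values half of the partition is empty, which matches both the [1..5] case and the empty-array case.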
This is strictly speaking a guess, since I did something similar in a project of my own (although I used AND 1).
For whatever reason, when generating a query, it is easier to always have a WHERE clause containing a no-op than it is to conditionally generate the WHERE clause at all. That is, if you don't include any where sections it will end up generating something still valid.
On the other hand, I'm not sure why it's taking this form: when I did it, I used 1 [<AND (generated code)>...], which allowed arbitrary chaining, but I don't see how what you're seeing would allow it. Nonetheless, I still think it is likely a result of an algorithmic code generation scheme.
Check to see if you are using active_record-acts_as. That was the problem for me.
Add the line below to your Gemfile:
gem 'active_record-acts_as', :git => 'https://github.com/hzamani/active_record-acts_as.git'
This will just pull the latest version of the Gem that will hopefully be fixed. Worked for me.
Personally, I think you're seeing side effects of Ruby.
I think the better way to do what you're doing would be with
2.0.0-p481#meri :008 > [*1..5]
=> [1, 2, 3, 4, 5]
User.where(id: [*1..5]).to_sql
"SELECT `users`.* FROM `users` WHERE `users`.`id` IN (1, 2, 3, 4, 5)"
As this creates an Array vs an Array with element 1 of class Range.
OR
use an explicit Range to trigger the BETWEEN in AREL.
# with end element, i.e. exclude_end=false
2.0.0-p481#meri :013 > User.where(id: Range.new(1,5)).to_sql
=> "SELECT `users`.* FROM `users` WHERE (`users`.`id` BETWEEN 1 AND 5)"
# without end element, i.e. exclude_end=true
2.0.0-p481#meri :022 > User.where(id: Range.new(1, 5, true)).to_sql
=> "SELECT `users`.* FROM `users` WHERE (`users`.`id` >= 1 AND `users`.`id` < 5)"
If you care about having control of the queries you generate and the full power of the SQL language and database features then I would suggest moving from ActiveRecord/Arel to Sequel.
I can honestly say there are a lot more quirks and infuriating times ahead for you with ActiveRecord, especially when you move beyond simple CRUD-like queries: when you start trying to query your data in anger, perhaps needing to join a few join tables here and there, and realize you really do need join conditions or UNION ALL type queries.
Sequel is also significantly faster and more reliable in its query generation and result handling, and it is much easier to compose the queries you want. It also has real documentation you can actually read, unlike Arel.
I just wish I had discovered it much earlier rather than persisting with the rails default data access layer.

SQL select everything with arbitrary IN clause

This will sound silly, but trust me it is for a good (i.e. over-engineered) cause.
Is it possible to write a SQL query using an IN clause which selects everything in that table without knowing anything about the table? Keep in mind this would mean you can't use a subquery that references the table.
In other words I would like to find a statement to replace "SOMETHING" in the following query:
SELECT * FROM table_a WHERE table_a.id IN (SOMETHING)
so that the results are identical to:
SELECT * FROM table_a
by doing nothing beyond changing the value of "SOMETHING"
To satisfy the curious I'll share the reason for the question.
1) I have a FactoryObject abstract class which grants all models that extend it some glorious factory method magic using two template methods: getData() and load()
2) Models must implement the template methods. getData is a static method that accepts ID constraints, pulls rows from the database, and returns a set of associative arrays. load is not static, accepts an associative array, and populates the object based on that array.
3) The non-abstract part of FactoryObject implements a getObject() and a getObjects() method. These call getData, create objects, and loads() the array responses from getData to create and return populated objects.
getObjects() requires ID constraints as an input, either in the form of a list or in the form of a subquery, which are then passed to getData(). I wanted to make it possible to pass in no ID constraints to get all objects.
The problem is that only the models know about their tables. getObjects() is implemented at a higher level and so it doesn't know what to pass getData(), unless there was a universal "return everything" clause for IN.
There are other solutions. I can modify the API to require getData to accept a special parameter and return everything, or I can implement a static getAll[ModelName]s() method at the model level which calls:
static function getAllModelObjects() {
return getObjects("select [model].id from [model]");
}
This is reasonable and may fit the architecture anyway, but I was curious so I thought I would ask!
Works on SQL Server:
SELECT * FROM table_a WHERE table_a.id IN (table_a.id)
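One caveat worth checking before relying on this: a row whose id is NULL does not satisfy id IN (id), because NULL compared with NULL is not true. The sketch below shows this on an in-memory SQLite database from Python (not SQL Server), with made-up sample rows. For a primary key column that cannot be NULL, the two queries return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER, label TEXT);
    INSERT INTO table_a (id, label)
        VALUES (1, 'one'), (2, 'two'), (NULL, 'no id');
""")

all_rows = conn.execute("SELECT * FROM table_a").fetchall()
in_rows = conn.execute(
    "SELECT * FROM table_a WHERE table_a.id IN (table_a.id)"
).fetchall()

print(len(all_rows), "rows without the IN clause")
print(len(in_rows), "rows with IN (table_a.id)")
```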
Okay, I hate saying no so I had to come up with another solution for you.
Since mysql is opensource you can get the source and incorporate a new feature that understands the infinity symbol. Then you just need to get the mysql community to buy into the usefulness of this feature (steer the conversation away from security as much as possible in your attempts to do so), and then get your company to upgrade their dbms to the new version once this feature has been implemented.
Problem solved.
The answer is simple. The workaround is to add some criteria like these:
# to query on a number column
AND (-1 in (-1) OR sample_table.sample_column in (-1))
# or to query on a string column
AND ('%' in ('%') OR sample_table.sample_column in ('%'))
Therefore, in your example, two following queries should return the same result as soon as you pass -1 as the parameter value.
SELECT * FROM table_a;
SELECT * FROM table_a WHERE (-1 in (-1) OR table_a.id in (-1));
And whenever you want to filter something out, you can pass it as a parameter. For example, in the following query, the records with id of 1, 2 and 6 are filtered.
SELECT * FROM table_a WHERE (-1 in (1, 2, 6) OR table_a.id in (1, 2, 6));
In this case, we have a default value like -1 or % and we have a parameter that can be anything. If the parameter is the default value, nothing is filtered.
I suggest % character as the default value if you are querying over a text column or -1 if you are querying over the PK of the table. But it totally depends to you to substitute % or -1 with any reserved character or number that you decide on.
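A runnable sketch of the sentinel idea, using an in-memory SQLite database from Python. The helper name select_with_optional_ids and the sample ids 1 to 6 are assumptions for illustration; the same list is bound twice, once for the sentinel test and once for the real filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO table_a (id) VALUES (?)", [(i,) for i in range(1, 7)])


def select_with_optional_ids(connection, ids):
    """Return matching ids; passing [-1] (the sentinel) disables the filter."""
    if not ids:
        raise ValueError("pass [-1] to select everything")
    placeholders = ", ".join("?" * len(ids))
    sql = (
        "SELECT id FROM table_a "
        f"WHERE (-1 IN ({placeholders}) OR id IN ({placeholders})) "
        "ORDER BY id"
    )
    # The list is bound twice because the placeholder group appears twice.
    return [row[0] for row in connection.execute(sql, list(ids) * 2)]


print(select_with_optional_ids(conn, [-1]))
print(select_with_optional_ids(conn, [1, 2, 6]))
```

This only works if the sentinel can never be a real id, so -1 is a reasonable choice for an auto-incrementing key but not for arbitrary integer data.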
Similar to @brandonmoore:
select * from table_a where table_a.id not in ('0')
How about:
select * from table_a where table_a.id not in ('somevaluethatwouldneverpossiblyexistintable_a.id')
EDIT:
As much as I would like to continue thinking of a way to solve your problem, I know there isn't a way to so I figure I'll go ahead and be the first person to tell you so I can at least get credit for the answer. It's truly a bittersweet victory though :/
If you provide more info though maybe I or someone else can help you think of another workaround.

Union (or Concat, etc..) with Constant values and projection

I've discovered a very nasty gotcha with Linq-to-sql, and I'm not sure what the best solution is.
If you take a simple L2S Union statement, and include L2S code in one side, and constants in the other, then the constants do not get included in the SQL Union and are only projected into the output after the SQL, resulting in SQL errors about the number of columns not matching for the union.
As an example:
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
This will generate an error: "All queries combined using a UNION, INTERSECT or EXCEPT operator must have an equal number of expressions in their target lists."
This is particularly insidious (and maddening) because there are obviously the same number of expressions in the list, but when you look at the SQL, you will notice that it does not generate a column for "First" in the second half of the Union. This is because "First" is inserted into the projection AFTER the query.
Ok, the easy solution is to just convert each part into Enumerables or Lists or something and then do the union in memory rather than SQL, and that's fine if you're dealing with a small amount of data. However, if you're working with a large set of data, which you then plan to further filter (in sql) before returning it this is not ideal.
I guess what I'm looking for is a way to force L2S to include the column in the SQL. Is that possible?
UPDATE:
While not an exact duplicate, this error is similar to this question and has similar solutions. So I'm closing, but not deleting, this question because it may help someone else come to possible solutions from a different direction.
Unfortunately, L2S is too smart for its own good sometimes.
I've decided that the only real solution is to use a stored proc. Hope this helps.
This is a bug in the Linq2SQL provider.
In LinqPad you can clearly see the bug.
(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
.Union(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
will produce something like this on the server side:
SELECT [t2].[Foo], [t2].[Roo]
FROM (
SELECT [t0].[Foo], @p0 AS [value]
FROM [dc].[Mytable] AS [t0]
UNION ALL
SELECT [t1].[Foo], [t1].[Roo]
FROM [dc].[Mytable] AS [t1]
) AS [t2]
This will be a problem because the union will name the second column "value" instead of "Roo", which will cause the outer query to fail.
If you, however, switch the order of the two tables
(from e in dc.mytable where foo == "roo" select new {First= "", Second = e.Roo})
.Union(from d in dc.mytable where foo == "bar" select new {First = d.Foo, Second = d.Roo})
So that the constant assignment within the generated T-SQL comes in the non-first table, then things may work because T-SQL ignores the column names of subsequent tables.
Note: The first table in a union decides both column name and type, so it would be smart to get LinqPad anyway.
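The point that the first query of a union decides the column names can be checked outside of Linq-to-SQL. The sketch below uses an in-memory SQLite database from Python, not SQL Server, with made-up rows and deliberately different aliases in the second SELECT; it only illustrates the naming rule, not the L2S bug itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (foo TEXT, roo TEXT);
    INSERT INTO mytable (foo, roo) VALUES ('bar', 'r1'), ('roo', 'r2');
""")

cur = conn.execute("""
    SELECT '' AS first_col, roo AS second_col FROM mytable WHERE foo = 'roo'
    UNION ALL
    SELECT foo AS other_name, roo AS ignored_name FROM mytable WHERE foo = 'bar'
""")

# The result set takes its column names from the first SELECT.
names = [column[0] for column in cur.description]
rows = cur.fetchall()
print(names)
print(sorted(rows))
```

Here the constant sits in the first SELECT with an explicit alias, so the outer column names are the intended ones and the aliases in the second SELECT have no effect on the result.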