MySQL 5.7 RAND() and IF() without LIMIT leads to unexpected results

I have the following query:
SELECT t.res, IF(t.res=0, "zero", "more than zero")
FROM (
SELECT table.*, IF (RAND()<=0.2,1, IF (RAND()<=0.4,2, IF (RAND()<=0.6,3,0))) AS res
FROM table LIMIT 20) t
which returns something like this:
That's exactly what you would expect. However, as soon as I remove the LIMIT 20, I get highly unexpected results (more than 20 rows are returned; I cut the output off to make it easier to read):
SELECT t.res, IF(t.res=0, "zero", "more than zero")
FROM (
SELECT table.*, IF (RAND()<=0.2,1, IF (RAND()<=0.4,2, IF (RAND()<=0.6,3,0))) AS res
FROM table) t
Side notes:
I'm using MySQL 5.7.18-15-log, and this is a highly abstracted example (the real query is much more complex).
I'm trying to understand what is happening. I don't need answers that offer workarounds without any explanation of why the original version doesn't work. Thank you.
Update:
Instead of using LIMIT, GROUP BY id also works in the first case.
Update 2:
As requested by zerkms, I added t.res = 0 and t.res + 1 to the second example.

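The problem is caused by a change introduced in MySQL 5.7 in how derived tables (subqueries in the FROM clause) are treated.
Basically, to optimize performance, the optimizer can merge a derived table into the outer query instead of computing it once. When that happens, each reference to t.res in the outer query is replaced by the expression that defines it, so RAND() is evaluated again for every reference. This leads to unexpected results whenever the subquery returns non-deterministic values (like RAND() in my case): the value displayed as t.res and the value tested by IF(t.res=0, ...) can come from different RAND() calls.
There are two easy (and equally ugly) workarounds to get MySQL to "materialize" these subqueries, i.e. compute the derived table once into a temporary result so that each row's value is fixed: add LIMIT <high number> or GROUP BY id. Both prevent the derived table from being merged, which forces MySQL to materialize it and return the expected results.
The last option is to turn off derived_merge in the optimizer_switch variable: derived_merge=off (make sure to leave all the other flags as they are; flags not mentioned in the assignment keep their current values).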
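To make the merge explanation concrete, here is a toy model in Python. It is not MySQL's optimizer, and all function names are hypothetical; it only reproduces the effect described above. When the derived table is "materialized", the res expression from the question is evaluated once per row; when it is "merged", it is evaluated once per reference in the outer query. The model then counts rows where the displayed res and its "zero"/"more than zero" label disagree.

```python
import random


def res_expr(rng):
    # The derived column from the question:
    # IF(RAND()<=0.2, 1, IF(RAND()<=0.4, 2, IF(RAND()<=0.6, 3, 0)))
    # Each nested IF draws a fresh random number, like each RAND() call.
    if rng.random() <= 0.2:
        return 1
    if rng.random() <= 0.4:
        return 2
    if rng.random() <= 0.6:
        return 3
    return 0


def label(res):
    # Outer query: IF(t.res=0, "zero", "more than zero")
    return "zero" if res == 0 else "more than zero"


def materialized(n_rows, rng):
    # Derived table computed once; both outer references read the stored value.
    rows = []
    for _ in range(n_rows):
        res = res_expr(rng)
        rows.append((res, label(res)))
    return rows


def merged(n_rows, rng):
    # Derived table merged into the outer query: each reference to t.res is
    # replaced by the expression itself, so the RAND() calls run again.
    return [(res_expr(rng), label(res_expr(rng))) for _ in range(n_rows)]


def inconsistent(rows):
    # Rows where the displayed res and its label disagree.
    return [(res, lab) for res, lab in rows if (res == 0) != (lab == "zero")]


if __name__ == "__main__":
    print(len(inconsistent(materialized(1000, random.Random(1)))))  # 0
    print(len(inconsistent(merged(1000, random.Random(1)))))        # nonzero
```

In the materialized model the label always matches the value, which corresponds to what LIMIT or GROUP BY id gives you; in the merged model roughly a third of the rows disagree, because the value and the label come from independent draws.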
Further readings:
https://mysqlserverteam.com/derived-tables-in-mysql-5-7/
Subquery's rand() column re-evaluated for every repeated selection in MySQL 5.7/8.0 vs MySQL 5.6

Related

Error: MySQL client ran out of memory

Can anyone please advise me on this error...
The database has 40,000 news stories, but only the 'story' field is large;
'old' is a numeric value (0 or 1), and
'title' and 'shortstory' are very short or NULL.
Any advice is appreciated. The error occurs when running a search query on the database:
Error: MySQL client ran out of memory
Statement: SELECT news30_access.usehtml, old, title, story, shortstory, news30_access.name AS accessname, news30_users.user AS authorname, timestamp, news30_story.id AS newsid FROM news30_story LEFT JOIN news30_users ON news30_story.author = news30_users.uid LEFT JOIN news30_access ON news30_users.uid = news30_access.uid WHERE title LIKE ? OR story LIKE ? OR shortstory LIKE ? OR news30_users.user LIKE ? ORDER BY timestamp DESC
The simple answer is: don't use story in the SELECT clause.
If you want the story, then limit the number of results being returned. Start with, say, 100 results by adding:
limit 100
to the end of the query. This will get the 100 most recent stories.
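A minimal sketch of the same idea, using Python's built-in sqlite3 module as a stand-in for MySQL. The one-table news_story schema, the made-up rows, and the latest_stories helper are all hypothetical. The point is only that ORDER BY timestamp DESC combined with a bound LIMIT caps how many rows, and therefore how many large story values, are sent back to the client.

```python
import sqlite3


def latest_stories(conn, pattern, limit=100):
    # Hypothetical one-table version of the search query. ORDER BY timestamp DESC
    # plus LIMIT means at most `limit` rows (and `limit` large story values)
    # reach the client, and they are the most recent matches.
    sql = (
        "SELECT id, title, story, timestamp FROM news_story "
        "WHERE title LIKE ? OR story LIKE ? "
        "ORDER BY timestamp DESC LIMIT ?"
    )
    return conn.execute(sql, (pattern, pattern, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news_story (id INTEGER PRIMARY KEY, title TEXT, story TEXT, timestamp INTEGER)"
)
# 500 made-up stories, each with a large body that matches the search.
conn.executemany(
    "INSERT INTO news_story (title, story, timestamp) VALUES (?, ?, ?)",
    [(f"title {i}", "x" * 10_000 + " election", i) for i in range(500)],
)

rows = latest_stories(conn, "%election%")
print(len(rows), rows[0][3])  # 100 499
```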
I also notice that you are using LIKE on story as well as on other string columns. You probably want to use MATCH() with a full-text index instead. This doesn't solve your immediate problem (which is returning too much data to the client), but it should make your queries run faster.
To learn about full text search, start with the documentation.

mySQL: Can one rely on the implicit ORDER BY done by mySQL when using an IN-Statement?

I just noticed that when I execute the following query:
SELECT * FROM tbl WHERE some_key = 1 AND some_foreign_key IN (2,5,23,8,9);
the results come back in the same order as the values in the IN list,
e.g. the row with some_foreign_key = 2 is returned first,
the one with some_foreign_key = 9 is returned last, and so on.
This is exactly the opposite behaviour of what this guy describes:
MySQL WHERE IN - Ordering
Can one rely on this behaviour, or control it via some MySQL server setting?
I know the common wisdom is "no ORDER BY clause" == "the RDBMS can sort however it pleases",
but for my current task (a really large import) this behaviour is quite helpful,
and it would be great if I could rely on it.
EDIT: I already know about the ORDER BY FIELD trick; I just wanted to know whether I can safely avoid the ORDER BY clause by setting some config option somewhere.
ORDER BY FIELD(some_foreign_key, 2, 5, 23, 8, 9)
isn't really that tough to implement, unless you've heavily simplified this example. And as you already know, an explicit ORDER BY is the only way to be 100% sure of the output ordering.
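If the list is built at runtime, the FIELD() ordering can be generated together with the IN list. SQLite, used here so the sketch runs with only the Python standard library, has no FIELD() function, so the hypothetical field_order helper builds the equivalent CASE expression; on MySQL you would simply write FIELD(...). The table and column names come from the question; the data is made up.

```python
import sqlite3


def field_order(column, values):
    # Hypothetical helper: a CASE expression equivalent to MySQL's
    # FIELD(column, v1, v2, ...), i.e. the 1-based position of the column's
    # value in the list, or 0 if it is not in the list. `column` is pasted into
    # the SQL, so it must be a trusted identifier; the values are bound.
    whens = " ".join(f"WHEN ? THEN {i}" for i in range(1, len(values) + 1))
    return f"CASE {column} {whens} ELSE 0 END", list(values)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, some_key INTEGER, some_foreign_key INTEGER)"
)
# Insert in an order different from the IN list on purpose.
conn.executemany(
    "INSERT INTO tbl (some_key, some_foreign_key) VALUES (1, ?)",
    [(v,) for v in (9, 8, 23, 5, 2, 7)],
)

wanted = [2, 5, 23, 8, 9]
order_expr, order_params = field_order("some_foreign_key", wanted)
placeholders = ", ".join(["?"] * len(wanted))
sql = (
    "SELECT some_foreign_key FROM tbl "
    f"WHERE some_key = 1 AND some_foreign_key IN ({placeholders}) "
    f"ORDER BY {order_expr}"
)
# Parameters are bound in textual order: the IN list first, then the CASE values.
result = [row[0] for row in conn.execute(sql, wanted + order_params)]
print(result)  # [2, 5, 23, 8, 9]
```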

Why is Rails adding `OR 1=0` to queries using the where clause hash syntax with a range?

The project that I'm working on is using MySQL on RDS (mysql2 gem specifically).
When I use a hash of conditions including a range in a where statement I'm getting a bit of an odd addition to my query.
User.where(id: [1..5])
and
User.where(id: [1...5])
Result in the following queries respectively:
SELECT `users`.* FROM `users` WHERE ((`users`.`id` BETWEEN 1 AND 5 OR 1=0))
SELECT `users`.* FROM `users` WHERE ((`users`.`id` >= 1 AND `users`.`id` < 5 OR 1=0))
The queries work perfectly fine since OR FALSE is effectively a no-op. I'm just wondering why Rails or ARel is adding this snippet into the query.
EDIT
It looks like the line that could explain this is line 26 in ActiveRecord::PredicateBuilder. I still have no idea how the hash could be empty at that point, but maybe someone else does.
EDIT 2
This is interesting. I was looking into Filip's comment to see why he made it, since it seemed to be just a clarification, but he is correct that 1..5 != [1..5]. The former is an inclusive range from 1 to 5, whereas the latter is an array whose first element is the former. I tried putting these into an ARel where call to see the SQL produced, and the OR 1=0 is not there!
User.where(id: 1..5) #=> SELECT "users".* FROM "users" WHERE ("users"."id" BETWEEN 1 AND 5)
User.where(id: 1...5) #=> SELECT "users".* FROM "users" WHERE ("users"."id" >= 1 AND "users"."id" < 5)
I still do not know why ARel is adding the OR 1=0, which is always false and seemingly unnecessary. It may be due to how Arrays and Ranges are handled differently.
Building on the fact, which you've discovered, that [1..5] is not the correct way to specify the range... I have discovered why [1..5] behaves as it does. To get there, I first found that an empty array in a hash condition produces the 1=0 SQL condition:
User.where(id: []).to_sql
# => "SELECT \"users\".* FROM \"users\" WHERE 1=0"
And, if you check the ActiveRecord::PredicateBuilder::ArrayHandler code, you'll see that array values are always partitioned into ranges and other values.
ranges, values = values.partition { |v| v.is_a?(Range) }
This explains why you don't see the 1=0 when using non-range values. That is, the only way to get 1=0 from an array that contains no range is to supply an empty array, which yields the 1=0 condition, as shown above. And when the only thing in the array is a range, you get the range conditions (ranges) and, separately, the condition for an empty array (values), ORed together. My guess is that there isn't a good reason for this; it is simply easier to let it be than to avoid it (since the result set is equivalent either way). If the partition code were a bit smarter, it wouldn't have to tack on the additional, empty values array and could skip the 1=0 condition.
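The partitioning described above can be modeled in a few lines of Python. This is a simplified toy model written to match the SQL observed in the question, not ActiveRecord's actual code: RubyRange stands in for Ruby's Range, and range_sql, values_sql and array_predicate are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubyRange:
    # Stand-in for Ruby's Range: 1..5 is RubyRange(1, 5), 1...5 is RubyRange(1, 5, True).
    first: int
    last: int
    exclude_end: bool = False


def range_sql(column, r):
    if r.exclude_end:
        return f"{column} >= {r.first} AND {column} < {r.last}"
    return f"{column} BETWEEN {r.first} AND {r.last}"


def values_sql(column, values):
    # An empty value list can never match, so it becomes the always-false 1=0.
    if not values:
        return "1=0"
    return f"{column} IN ({', '.join(str(v) for v in values)})"


def array_predicate(column, items):
    # Toy model of the partition step: ranges and plain values are split,
    # each range gets its own predicate, the plain values always get one too
    # (even when there are none), and everything is ORed together.
    ranges = [v for v in items if isinstance(v, RubyRange)]
    values = [v for v in items if not isinstance(v, RubyRange)]
    parts = [range_sql(column, r) for r in ranges] + [values_sql(column, values)]
    return " OR ".join(parts)


print(array_predicate("id", [RubyRange(1, 5)]))        # id BETWEEN 1 AND 5 OR 1=0
print(array_predicate("id", [RubyRange(1, 5, True)]))  # id >= 1 AND id < 5 OR 1=0
print(array_predicate("id", [1, 2, 3]))                # id IN (1, 2, 3)
print(array_predicate("id", []))                       # 1=0
```

The "smarter" partition the answer mentions would only append values_sql(...) when there are plain values or when there are no ranges at all.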
As for where the 1=0 comes from in the first place, I think it comes from the database adapter, but I couldn't find exactly where. However, I would call it a deliberate way to find no records. In other words, WHERE 1=0 is never going to return any users, which makes more sense than an alternative like WHERE id IS NULL, which would find any users whose id is null. And returning nothing is what I'd expect when asking for all Users whose id is in the empty set (i.e. we're not asking for nil ids or null ids or whatever). So, in my mind, treating the exact origin of 1=0 as a black box is OK. At least we can now reason about why the range inside the array causes it to show up!
UPDATE
I've also found that, even when using ARel directly, you can still get 1=0:
User.arel_table[:id].in([]).to_sql
# => "1=0"
This is strictly a guess, based on having done something similar in a project of my own (although I used AND 1).
For whatever reason, when generating a query, it is easier to always have a WHERE clause containing a no-op than to decide conditionally whether to generate the WHERE clause at all. That way, even if you don't add any conditions, the generated query is still valid.
On the other hand, I'm not sure why it's taking this form: when I did it, I used 1 [AND (generated code)]..., which allowed arbitrary chaining, but I don't see how what you're seeing would allow that. Nonetheless, I still think it is likely the result of an algorithmic code generation scheme.
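The no-op seeding trick described in this answer can be sketched as follows. build_where is a hypothetical helper, and it assumes the conditions are already-safe SQL fragments. An AND chain starts with 1 (true) and an OR chain with 1=0 (false); because those are the neutral elements of AND and OR, the clause stays valid with zero conditions and the seed never changes which rows match. SQLite is used only so the check runs with the standard library.

```python
import sqlite3


def build_where(conditions, joiner="AND"):
    # Hypothetical helper: seed the chain with the neutral element of the
    # operator (1 is true for AND, 1=0 is false for OR), so the clause is
    # valid even with zero conditions and the seed never changes the result.
    seed = "1" if joiner == "AND" else "1=0"
    return f" {joiner} ".join([seed] + list(conditions))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("a",), ("b",), ("c",)])


def count(where):
    return conn.execute(f"SELECT COUNT(*) FROM users WHERE {where}").fetchone()[0]


print(build_where([]))                                # 1
print(build_where(["id > 1", "name <> 'c'"]))         # 1 AND id > 1 AND name <> 'c'
print(count(build_where([])))                         # 3
print(count(build_where(["id > 1", "name <> 'c'"])))  # 1
print(count(build_where([], "OR")))                   # 0
```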
Check to see if you are using active_record-acts_as. That was the problem for me.
Add the line below to your Gemfile:
gem 'active_record-acts_as', :git => 'https://github.com/hzamani/active_record-acts_as.git'
This pulls the latest version of the gem from its GitHub repository, which will hopefully include the fix. It worked for me.
Personally, I think you're seeing a side effect of Ruby itself.
I think the better way to do what you're doing would be:
[*1..5]
# => [1, 2, 3, 4, 5]
User.where(id: [*1..5]).to_sql
# => "SELECT `users`.* FROM `users` WHERE `users`.`id` IN (1, 2, 3, 4, 5)"
This works because the splat creates an Array of integers, rather than an Array whose single element is a Range.
Alternatively, use an explicit Range to trigger the BETWEEN in Arel:
# with the end element, i.e. exclude_end=false
User.where(id: Range.new(1, 5)).to_sql
# => "SELECT `users`.* FROM `users` WHERE (`users`.`id` BETWEEN 1 AND 5)"
# without the end element, i.e. exclude_end=true
User.where(id: Range.new(1, 5, true)).to_sql
# => "SELECT `users`.* FROM `users` WHERE (`users`.`id` >= 1 AND `users`.`id` < 5)"
If you care about having control of the queries you generate and the full power of the SQL language and database features then I would suggest moving from ActiveRecord/Arel to Sequel.
I can honestly say there are a lot more quirks and infuriating times ahead for you with ActiveRecord, especially once you move beyond simple CRUD-like queries: when you start querying your data in anger, perhaps joining a few join tables here and there, and realize you really do need join conditions or UNION ALL-type queries.
Sequel is also significantly faster and more reliable in its query generation and result handling, and it makes it much easier to compose the queries you want. It also has real documentation you can actually read, unlike Arel.
I just wish I had discovered it much earlier rather than persisting with the rails default data access layer.

SELECT n-th row WHERE field = x value

format(sql, sizeof(sql), "SELECT * FROM `datab` WHERE License = %s", searchPlate);
Querying with this format gives me all the rows with this value, but what I'm trying to do is get, for example, the third, fifth, or even tenth row with this value, not all of the rows. How can I do this?
Something like this should work in MySQL:
format(sql, sizeof(sql), "SELECT * FROM `datab` WHERE License = %s ORDER BY IDColumnNameGoesHere LIMIT %d, 1", searchPlate, MyDesiredRowInteger);
Some points:
%d might not be right; use whatever format specifier your format function uses for integers. Note that the first argument of LIMIT is a zero-based offset, so pass 2 to get the third matching row.
You are certainly using a specific RDBMS, so check its documentation. SQL is a standard; MySQL, SQL Server, etc. are implementations of that standard, and you must find out which implementation you are using.
Sticking variables into strings like you have done leaves you very vulnerable to SQL injection. You should always parameterize your queries.
LIMIT is not supported by every RDBMS (MySQL has it, SQL Server does not); if you are using a different RDBMS, you may have to go another route. For example, in SQL Server you can use TOP, but as it only takes one parameter (the number of rows), you will also need MIN or MAX to get only the one record you want.
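A minimal sketch of the parameterized version, using Python's sqlite3 module rather than the asker's environment. The datab table and License column come from the question; the id column, the nth_match helper and the sample plates are assumptions. LIMIT 1 OFFSET ? is equivalent to MySQL's LIMIT offset, 1.

```python
import sqlite3


def nth_match(conn, plate, n):
    # Hypothetical helper. n is 1-based (n=3 means the third matching row);
    # OFFSET is 0-based, hence n - 1. ORDER BY a unique column so that
    # "third" refers to the same row every time. Both values are bound as
    # parameters instead of being formatted into the SQL string.
    sql = "SELECT id, License FROM datab WHERE License = ? ORDER BY id LIMIT 1 OFFSET ?"
    return conn.execute(sql, (plate, n - 1)).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datab (id INTEGER PRIMARY KEY, License TEXT)")
conn.executemany(
    "INSERT INTO datab (License) VALUES (?)",
    [("ABC123",), ("XYZ789",)] * 5,  # ids 1..10, ABC123 on the odd ids
)

print(nth_match(conn, "ABC123", 3))   # (5, 'ABC123')
print(nth_match(conn, "ABC123", 10))  # None (there are only 5 matches)
```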

Is it possible to do this with a single query: get a fixed number of items of different types, ordered

Let's say I have a table with a "type" and a "date" column, and I want to fetch the latest 3 items of each type, ordered by date. (I can't trust the natural table order or the insertion order.)
The query doesn't need to calculate all the different values of the "type" column; those can be specified in the query.
I'm trying with variables, like this:
set @c=0;
set @d=0;
select *, @c:=IF(type = 1, @c+1,@c), @d:=IF(type = 2, @d+1,@d) from testtable HAVING((type=1 AND @c < 3) OR (type=2 AND @d<3)) order by testdate;
This is "almost" working (it returns one more entry for each type, which is fine), and I guess it's related to the way MySQL resolves the HAVING clause (in fact, in some scenarios I find I need to use WHERE instead of HAVING). Can someone shed some light on this? Am I safe using it like this?
OK, it looks like using HAVING is fine. I was having problems because in my real code I need to use several ORDER BY columns, the last of which is RAND() (to break ties), and that RAND() ordering seems to mess with the MySQL variable assignment.
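For comparison, here is a sketch of a different technique that avoids user variables entirely: keep a row only if fewer than 3 rows of the same type are newer. It runs on SQLite through Python's sqlite3 module; the SQL uses only a correlated subquery, so it should also be accepted by MySQL 5.x, which lacks window functions. The testtable, type and testdate names come from the question; the data is made up. Ties on testdate can let more than 3 rows per type through.

```python
import sqlite3

# Keep a row only if fewer than 3 rows of the same type have a later testdate.
TOP3_SQL = """
SELECT t.type, t.testdate
FROM testtable AS t
WHERE t.type IN (1, 2)
  AND (SELECT COUNT(*)
       FROM testtable AS newer
       WHERE newer.type = t.type
         AND newer.testdate > t.testdate) < 3
ORDER BY t.type, t.testdate DESC
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (id INTEGER PRIMARY KEY, type INTEGER, testdate TEXT)")
# Made-up data: five dated rows per type, inserted oldest first.
conn.executemany(
    "INSERT INTO testtable (type, testdate) VALUES (?, ?)",
    [(t, f"2024-0{t}-0{d}") for t in (1, 2) for d in range(1, 6)],
)

rows = conn.execute(TOP3_SQL).fetchall()
print(rows)
# [(1, '2024-01-05'), (1, '2024-01-04'), (1, '2024-01-03'),
#  (2, '2024-02-05'), (2, '2024-02-04'), (2, '2024-02-03')]
```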