Okay, here is my query:
SELECT NAME,
DATE_FORMAT(DATE_WRITTEN, "%c/%e/%y") AS written_date,
DATE_FORMAT(RETURN_DATE, "%c/%e/%y") AS return_date
FROM `pfp`.`returns` AS `Re`
LEFT JOIN `pfp`.`insurance` AS `Insurance`
ON ( `insurance`.`id` = `Re`.`INSURANCE_ID` )
LEFT JOIN `pfp`.`remain` AS `Remain`
ON ( `remain`.`id` = `Re`.`REMAIN_ID` )
LEFT JOIN `pfp`.`formula` AS `Formula`
ON ( `formula`.`id` = `remain`.`FORMULA_ID` )
WHERE `NOT_RETURNED` = 'F'
AND `RETURN_DATE` BETWEEN '2014-01-01' AND '2014-08-22'
ORDER BY `RETURN_DATE` DESC
LIMIT 100
The problem is that it sorts by the date 14-8-9 down to 14-8-7 then jumps back up to 14-8-22 and downward from there... why??
When you sort by return_date, you are sorting by the formatted alias, which is a string, so '8/9/14' sorts ahead of '8/22/14' in descending order. Instead, use the table alias to identify that you really want the column:
WHERE `NOT_RETURNED` = 'F'
AND `RETURN_DATE` BETWEEN '2014-01-01' AND '2014-08-22'
ORDER BY `Re`.`RETURN_DATE` DESC
LIMIT 100
I am guessing that the column is in the `Re` table. Use whichever alias is appropriate.
EDIT:
The fact that the column aliases are searched first is documented:
MySQL resolves unqualified column or alias references in ORDER BY
clauses by searching in the select_expr values, then in the columns of
the tables in the FROM clause. For GROUP BY or HAVING clauses, it
searches the FROM clause before searching in the select_expr values.
(For GROUP BY and HAVING, this differs from the pre-MySQL 5.0 behavior
that used the same rules as for ORDER BY.)
I can speculate on the reasons for this (which I think is consistent with the ANSI standard). SQL queries are logically processed in a particular order, something like from, then where, then select, then order by (leaving out other clauses). This logical processing determines how the query is compiled and what identifiers mean. The logical processing explains why column aliases are not allowed in the where clause -- from the perspective of the compiler, they are not yet identified.
When it comes to the order by, the identifier is determined from the inside out. The first definition is the version in the select, so it chooses that before going to the from.
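To see the effect in isolation, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than MySQL (an assumption made only so the example runs without a server), a hypothetical one-column returns table, and a hand-built expression standing in for DATE_FORMAT(..., "%c/%e/%y"). SQLite also matches a bare ORDER BY name against the select aliases first, so the same symptom appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE returns (return_date TEXT)")  # hypothetical, simplified table
conn.executemany(
    "INSERT INTO returns VALUES (?)",
    [("2014-08-07",), ("2014-08-09",), ("2014-08-22",)],
)

# Stand-in for MySQL's DATE_FORMAT(col, '%c/%e/%y'): month/day without padding, 2-digit year.
FMT = (
    "CAST(strftime('%m', re.return_date) AS INTEGER) || '/' || "
    "CAST(strftime('%d', re.return_date) AS INTEGER) || '/' || "
    "substr(re.return_date, 3, 2)"
)

# Unqualified name: resolved to the select alias, so rows are sorted as text.
by_alias = [row[0] for row in conn.execute(
    f"SELECT {FMT} AS return_date FROM returns AS re ORDER BY return_date DESC"
)]

# Qualified name: resolved to the table column, so rows are sorted as dates.
by_column = [row[0] for row in conn.execute(
    f"SELECT {FMT} AS return_date FROM returns AS re ORDER BY re.return_date DESC"
)]

print(by_alias)   # ['8/9/14', '8/7/14', '8/22/14']
print(by_column)  # ['8/22/14', '8/9/14', '8/7/14']
```

The first list reproduces the jump described in the question; qualifying the column with the table alias restores chronological order.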
Notes about the database
It was generated using Prisma so unfortunately the column names in the many-to-many tables are named "A" and "B". "A" refers to the table which comes first in the alphabet and "B" the second. For example, in _ReadingToWord, "A" refers to Reading.id and "B" refers to Word.id because "r" comes before "w" in the alphabet.
The problem
I have the below query that uses a limit statement to implement paging.
The problem I am having is that the result order is non-deterministic: if I execute the query a bunch of times, the order is sometimes different.
I am ordering by id, which is a primary key, so I thought that should ensure a consistent order.
Can anyone explain why the ordering is non-deterministic and how to fix it?
select * from (
SELECT w.id,
hiragana,
group_concat( distinct(concat(coalesce(r.downStep, -1) + 1 , "," ,r.katakana)) order by r.downStep SEPARATOR ' ')
from Hiragana a join _HiraganaToWord b on a.id = b.A join
Word w on w.id = b.B join _ReadingToWord rtw on w.id = rtw.B join
Reading r on r.id = rtw.A
WHERE hiragana like "あ%"
group by w.id
)
as groupQuery
order by length(hiragana), hiragana, id asc limit 600,5;
Sample runs
You are experiencing one of the subtle side-effects of disabling only_full_group_by:
If ONLY_FULL_GROUP_BY is disabled, a MySQL extension to the standard SQL use of GROUP BY permits the select list, HAVING condition, or ORDER BY list to refer to nonaggregated columns even if the columns are not functionally dependent on GROUP BY columns. This causes MySQL to accept the preceding query. In this case, the server is free to choose any value from each group, so unless they are the same, the values chosen are nondeterministic, which is probably not what you want.
If you enabled that mode, you would get an error like
Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'a.hiragana' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
and searching Stack Overflow for that error message will give you lots and lots of examples of this problem.
So in your query
SELECT w.id, a.hiragana,
...
group by w.id
...
order by hiragana
the values for hiragana are not necessarily deterministic. If, for the same w.id, there are several values for a.hiragana, MySQL can pick any of those. And if you order by that non-deterministically chosen value, you can get different orders. MySQL doesn't actually pick a random row, it just doesn't care which one it is, so oftentimes you get the same one (which can make this harder to spot), but not always.
It doesn't have to be the entry with id 31752 for which MySQL has picked a different value for hiragana (it can be any of the previous 600 rows), but I would check that value first - if it has a 2nd value that also starts with "あ" but would be ordered after the value for 47348 (or is longer), it might immediately make things clearer.
You can technically fix this by picking a deterministic value there, e.g. the min or max value:
select * from (
SELECT w.id,
min(hiragana) as hiragana,
...
group by w.id
) as groupQuery
order by length(hiragana), hiragana, id asc limit 600,5;
You have to check whether that is what you are actually trying to do (e.g., if there are several choices for hiragana, you don't care which one is chosen, as long as it is a deterministic one) and whether this fits your required result. Other choices might be group by w.id, a.hiragana or group by w.id, a.id, or maybe you need to completely rewrite your query (as it may not cover this case).
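As a rough illustration of the min() variant, here is a sketch against a made-up, flattened table (word_hiragana and its rows are assumptions, not the Prisma schema from the question). It runs on Python's built-in sqlite3 module instead of MySQL, purely so it can be executed anywhere; the point is only that an aggregated hiragana gives the outer ORDER BY a stable value to sort on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the Hiragana/Word join: several readings per word.
    CREATE TABLE word_hiragana (word_id INTEGER, hiragana TEXT);
    INSERT INTO word_hiragana VALUES
        (1, 'あお'), (1, 'あい'),
        (2, 'あ'),
        (3, 'あき'), (3, 'あか');
""")

QUERY = """
    SELECT * FROM (
        SELECT word_id AS id,
               MIN(hiragana) AS hiragana   -- one well-defined value per group
        FROM word_hiragana
        WHERE hiragana LIKE 'あ%'
        GROUP BY word_id
    ) AS groupQuery
    ORDER BY length(hiragana), hiragana, id ASC
"""

# Run the query several times; with an aggregated sort key every run is identical.
runs = [conn.execute(QUERY).fetchall() for _ in range(5)]
print(runs[0])  # [(2, 'あ'), (1, 'あい'), (3, 'あか')]
```

Because every column in the select list is either grouped or aggregated, the same query would also be accepted with only_full_group_by enabled.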
I have this select in my MySQL DB:
select r.ID, r.ReservationDate, SUM(p.Amount) AS Amount
from Reservations r
join Payments p
on r.ID = p.ReservationID
where r.ConfirmationNumber = '123456'
and p.CCLast4 = '3506'
and r.ID = 54321
It gives me exactly 1 record -- the correct record -- as expected. But if I change the CCLast4 (3506) to any other number or string, I still get the record back, but Amount is null. I would expect no record at all because the where clause no longer matches. If I change the ConfirmationNumber or the ID, I get back no results, as expected. But CCLast4 is being completely ignored.
If I remove the aggregate (SUM(p.Amount) AS Amount), all is good, and the CCLast4 has to match before the record is returned.
I don't understand why the aggregate causes the where clause related to the Payments table (CCLast4 column) to be ignored.
How can I change the query so that I can use the aggregate in the select AND all the where clauses are honored?
This is actually the expected behaviour. From the manual:
Without GROUP BY, there is a single group and it is nondeterministic which [non-aggregated column] value to choose for the group.
Although not very clearly stated, this means that you always get one group (so one row), even for an empty table.
It is also worth emphasizing that the values that MySQL chooses for the non-aggregated columns in your select, r.ID and r.ReservationDate, are in fact nondeterministic and will specifically vary across MySQL versions (e.g. they will usually be null for MySQL 8.0 while they will usually contain existing values for earlier versions).
The solution is similarly subtle - add a group by (so the quoted sentence does not apply anymore):
...
where r.ConfirmationNumber = '123456'
and p.CCLast4 = 'xxx'
and r.ID = 54321
group by r.ID, r.ReservationDate
should give you 0 rows.
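Here is a small sketch of that difference. It assumes a guessed, minimal version of the two tables and uses Python's built-in sqlite3 module instead of MySQL so that it runs without a server; SQLite follows the same rule that an aggregate query without GROUP BY always produces exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; the real tables surely have more columns.
    CREATE TABLE Reservations (ID INTEGER PRIMARY KEY, ReservationDate TEXT,
                               ConfirmationNumber TEXT);
    CREATE TABLE Payments (ReservationID INTEGER, CCLast4 TEXT, Amount INTEGER);
    INSERT INTO Reservations VALUES (54321, '2020-05-01', '123456');
    INSERT INTO Payments VALUES (54321, '3506', 100);
""")

BASE = """
    SELECT r.ID, r.ReservationDate, SUM(p.Amount) AS Amount
    FROM Reservations r
    JOIN Payments p ON r.ID = p.ReservationID
    WHERE r.ConfirmationNumber = '123456'
      AND p.CCLast4 = ?
      AND r.ID = 54321
"""
GROUPED = BASE + " GROUP BY r.ID, r.ReservationDate"

# Wrong card number, no GROUP BY: still one row, with a NULL sum.
no_group_by = conn.execute(BASE, ("9999",)).fetchall()
# Wrong card number, with GROUP BY: no groups, so no rows.
with_group_by = conn.execute(GROUPED, ("9999",)).fetchall()
# Correct card number, with GROUP BY: the expected single record.
matching = conn.execute(GROUPED, ("3506",)).fetchall()

print(len(no_group_by), with_group_by, matching)
```

The where clause is honoured in all three cases; what changes is whether an empty input still counts as one group.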
I have what should be a simple query for any database and which always runs in MySQL but not in SQL Server
select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
group by assetid
order by ts desc
The error is:
column tagalerts.id is invalid in the select list because it is not contained in either an aggregate function or the group by clause.
It is not a simple case of adding tagalerts.id to the group by clause because the error repeats for ts and for assetid etc, implying that all the selects need to be in a group or in aggregate functions... either of which will result in a meaningless and inaccurate result.
Splitting the select into a subquery to sort and group correctly (which again works fine with MySQL, as you would expect) makes matters worse
SELECT * from
(select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
order by ts desc
)T1
group by assetid
This time the error is: the ORDER BY clause is invalid in views, inline functions, derived tables and expressions unless TOP etc. is used.
The 'correct output' should be:
id    ts                assetid  zonename   battlevel
1234  a datetime        1569     Reception  0
3182  another datetime  1572     Reception  0
Either I am reading SQL Server's rules entirely wrong or this is a major flaw with that database.
How can I write this to work on both systems?
In most databases you can't just include columns that aren't in the GROUP BY without using an aggregate function.
MySQL is an exception to that, but MS SQL Server isn't.
So you could keep that GROUP BY with only the "assetid".
But then use the appropriate aggregate functions for all the other columns.
Also, use the JOIN syntax for heaven's pudding sake.
A query like select * from table1, table2 where table1.id2 = table2.id uses a syntax from the previous century.
SELECT
MAX(ta.id) AS id,
MAX(ta.ts) AS ts,
ta.assetid,
MAX(node.zonename) AS zonename,
MAX(ta.battlevel) AS battlevel
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
GROUP BY ta.assetid
ORDER BY MAX(ta.ts) DESC;
Another trick to use in MS SQL Server is the window function ROW_NUMBER.
But this is probably not what you need.
Example:
SELECT id, ts, assetid, zonename, battlevel
FROM
(
SELECT
ta.id,
ta.ts,
ta.assetid,
node.zonename,
ta.battlevel,
ROW_NUMBER() OVER (PARTITION BY ta.assetid ORDER BY ta.ts DESC) AS rn
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
) q
WHERE rn = 1
ORDER BY ts DESC;
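A quick way to try the ROW_NUMBER approach without a server is the sketch below. The schema and rows are invented for illustration (the question never shows them), it selects the alert's own id, and it runs on Python's built-in sqlite3 module, which needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data, guessed from the column names in the question.
    CREATE TABLE node (id INTEGER PRIMARY KEY, zonename TEXT);
    CREATE TABLE tagalerts (id INTEGER PRIMARY KEY, ts TEXT, assetid INTEGER,
                            nodeid INTEGER, battlevel INTEGER, ack INTEGER);
    INSERT INTO node VALUES (1, 'Reception'), (2, 'Lobby');
    INSERT INTO tagalerts VALUES
        (1000, '2020-01-01 08:00', 1569, 2, 5, 0),
        (1234, '2020-01-02 08:00', 1569, 1, 0, 0),
        (3000, '2020-01-01 09:00', 1572, 2, 7, 0),
        (3182, '2020-01-03 08:00', 1572, 1, 0, 0);
""")

# Number the alerts of each asset from newest to oldest, then keep only the newest.
latest = conn.execute("""
    SELECT id, ts, assetid, zonename, battlevel
    FROM (
        SELECT ta.id, ta.ts, ta.assetid, node.zonename, ta.battlevel,
               ROW_NUMBER() OVER (PARTITION BY ta.assetid ORDER BY ta.ts DESC) AS rn
        FROM tagalerts AS ta
        JOIN node ON node.id = ta.nodeid
        WHERE ta.ack = 0
    ) q
    WHERE rn = 1
    ORDER BY ts DESC
""").fetchall()

for row in latest:
    print(row)
```

Each output row comes from a single alert record, so the zonename and battlevel always belong together.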
I strongly suspect this query is WRONG even in MySQL.
We're missing a lot of details (sample data, and we don't know which table all of the columns belong to), but what I do know is you're grouping by assetid, where it looks like one assetid value could have more than one ts (timestamp) value in the group. It also looks like you're counting on the order by ts desc to ensure both that you see recent timestamps in the results first and that each assetid group uses the most recent possible ts timestamp for that group.
MySQL only guarantees the former, not the latter. Nothing in this query guarantees that each assetid is using the most recent timestamp available. You could be seeing the wrong timestamps, and then also using those wrong timestamps for the order by. This is the problem the SQL Server rule is there to stop. MySQL violates the SQL standard to allow you to write that wrong query.
Instead, you need to look at each column and either add it to the group by (best when all of the values are known to be the same, anyway) or wrap it in an aggregate function like MAX(), MIN(), AVG(), etc., so there is a deterministic result for which value from the group is used.
If all of the values for a column in a group are the same, then there's no problem adding it to the group by. If the values are different, you want to be precise about which one is chosen for the result set.
While I'm here, the tagalerts, node join syntax has been obsolete for more than 20 years now. It's also good practice to use an alias with every table and prefix every column with the alias. I mention these to explain why I changed it for my code sample below, though I only prefix columns where I am confident in which table the column belongs to.
This query should run on both databases:
SELECT ta.assetid, MAX(ta.id) "id", MAX(ta.ts) "ts",
MAX(n.zonename) "zonename", MAX(battlevel) "battlevel"
FROM tagalerts ta
INNER JOIN node n ON ta.nodeid = n.id
WHERE ack = 0
GROUP BY ta.assetid
ORDER BY ts DESC
There is also a concern here that the results may be choosing values from different records in the joined node table. So if battlevel is part of the node table, you might see a result that matches a zonename with a battlevel that never occurs in any record in the data. In SQL Server, this is easily fixed by using APPLY to match only one node record to each tagalert. Older MySQL versions don't support this (APPLY or an equivalent has been in every other major database since at least 2012, and MySQL only gained LATERAL derived tables in 8.0.14), but you can simulate it in this case with two JOINs, where the first join is a subquery that uses GROUP BY to determine the values that will uniquely identify the needed node record, and the second join is to the node table to actually produce that record. Unfortunately, we need to know more about the tables in question to actually write this code for you.
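For what it's worth, the mixing problem and the two-join idea can be sketched against a made-up schema. Everything about the tables here is an assumption (in particular that battlevel lives in tagalerts and that ts is unique per asset), and the sketch runs on Python's built-in sqlite3 module rather than MySQL or SQL Server so that it can be executed as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema and data; the real tables are not shown in the question.
    CREATE TABLE node (id INTEGER PRIMARY KEY, zonename TEXT);
    CREATE TABLE tagalerts (id INTEGER PRIMARY KEY, ts TEXT, assetid INTEGER,
                            nodeid INTEGER, battlevel INTEGER, ack INTEGER);
    INSERT INTO node VALUES (1, 'Reception'), (2, 'Lobby');
    INSERT INTO tagalerts VALUES
        (1000, '2020-01-01 08:00', 1569, 2, 5, 0),
        (1234, '2020-01-02 08:00', 1569, 1, 0, 0),
        (3000, '2020-01-01 09:00', 1572, 2, 7, 0),
        (3182, '2020-01-03 08:00', 1572, 1, 0, 0);
""")

# MAX() on every column: valid everywhere, but each value may come from a different row.
mixed = conn.execute("""
    SELECT ta.assetid, MAX(ta.id), MAX(ta.ts), MAX(n.zonename), MAX(ta.battlevel)
    FROM tagalerts ta
    JOIN node n ON ta.nodeid = n.id
    WHERE ta.ack = 0
    GROUP BY ta.assetid
    ORDER BY MAX(ta.ts) DESC
""").fetchall()

# First join: a grouped subquery finds the newest ts per asset.
# Second join: node supplies the zone of exactly that alert.
exact = conn.execute("""
    SELECT ta.id, ta.ts, ta.assetid, n.zonename, ta.battlevel
    FROM tagalerts ta
    JOIN (SELECT assetid, MAX(ts) AS max_ts
          FROM tagalerts
          WHERE ack = 0
          GROUP BY assetid) latest
      ON latest.assetid = ta.assetid AND latest.max_ts = ta.ts
    JOIN node n ON n.id = ta.nodeid
    WHERE ta.ack = 0
    ORDER BY ta.ts DESC
""").fetchall()

print(mixed)  # battlevel 7 and 5 are paired with 'Reception', which never happens in the data
print(exact)  # every row is a real alert joined to its real node
```

If two alerts for the same asset could share a timestamp, the grouped subquery would need an additional tie-breaker (for example the highest id); that is left out here.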
Why is a column alias being replaced by the original column name when I create a view from a script? The script works, the view fails.
The script selects records using an outer query / inner query a.k.a. query / subquery. The subquery is used in the SELECT clause. Each subquery is itself a SELECT clause which becomes a column in the result set. See http://www.techonthenet.com/mysql/subqueries.php.
The alias used inside the subquery's SELECT clauses is replaced with its original column name. The alias used to give the subquery a short name is not replaced.
Here is a meta version so you can see the structure.
select `t1`.`Date` as `When`,
( select avg(t1.column)
from t1
where `t1`.`Date` = `When`
) as `Short column name`
from t1
group by `Date`
order by `Date`
In the View version, with aliases replaced, the subquery becomes:
(
select avg(t1.column)
from t1
where `t1`.`Date` = `t1`.`Date`
) as `Short column name`,
The effect of this is that the average is calculated across all dates rather than just for the date specified as When in the outer query.
Another script built the same way translates into a view without a problem. The alias is kept.
There is a difference between the clauses used in the bad and good views but it is not obvious to me that it should cause the problem.
The bad view ends with:
group by `Date`
order by `Date`
while the good one ends only with a group by clause.
Another difference is that the column being aliased in the bad view is probably of field type DATETIME, while the one in the good view is probably one of the INT types (it's actually week(t1.Date)).
Using:
MySQL 5.5
MySQL Workbench 6.0.8
Ubuntu 14.04
The aliases in the SELECT refer to the output of the query block, not to the processing of the query block.
The correct way to do what you want is to use a correlated subquery with table aliases:
select touter.`Date` as `When`,
(select avg(tinner.column)
from t1 tinner
where tinner.`Date` = touter.`Date`
) as `Short column name`
from t1 as touter
group by `Date`
order by `Date`;
I have no idea why the average would be calculated for all the dates. I would expect it to return an error, or perhaps a NULL value. Perhaps your real where clause is t1.Date = Date and you expect MySQL to magically know what the second Date refers to. Don't depend on magic. Use table aliases and be explicit.
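To make the difference concrete, here is a small sketch with invented column names (d and val stand in for Date and the averaged column) run through Python's built-in sqlite3 module instead of MySQL. It compares a subquery that is correlated through table aliases with one whose condition compares the inner column to itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: two readings on the first day, one on the second.
    CREATE TABLE t1 (d TEXT, val INTEGER);
    INSERT INTO t1 VALUES ('2020-01-01', 10), ('2020-01-01', 20), ('2020-01-02', 30);
""")

# Correlated: the inner row is compared with the outer row's date.
correlated = conn.execute("""
    SELECT touter.d AS when_date,
           (SELECT AVG(tinner.val)
            FROM t1 AS tinner
            WHERE tinner.d = touter.d) AS avg_val
    FROM t1 AS touter
    GROUP BY touter.d
    ORDER BY touter.d
""").fetchall()

# Not correlated: the condition is always true, so every date gets the overall average.
uncorrelated = conn.execute("""
    SELECT touter.d AS when_date,
           (SELECT AVG(tinner.val)
            FROM t1 AS tinner
            WHERE tinner.d = tinner.d) AS avg_val
    FROM t1 AS touter
    GROUP BY touter.d
    ORDER BY touter.d
""").fetchall()

print(correlated)    # [('2020-01-01', 15.0), ('2020-01-02', 30.0)]
print(uncorrelated)  # [('2020-01-01', 20.0), ('2020-01-02', 20.0)]
```

The second result matches the symptom described in the question, where the average is taken across all dates.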
I'm trying to add features to a preexisting application and I came across a MySQL view something like this:
SELECT
AVG(table_name.col1),
AVG(table_name.col2),
AVG(table_name.col3),
table_name.personID,
table_name.col4
FROM table_name
GROUP BY table_name.personID;
OK so there's a few aggregate functions. You can select personID because you're grouping by it. But it also is selecting a column that is not in an aggregate function and is not a part of the GROUP BY clause. How is this possible??? Does it just pick a random value because the values definitely aren't unique per group?
Where I come from (MSSQL Server), that's an error. Can someone explain this behavior to me and why it's allowed in MySQL?
It's true that this feature permits some ambiguous queries, and silently returns a result set with an arbitrary value picked from that column. In practice, it tends to be the value from the row within the group that is physically stored first.
These queries aren't ambiguous if you only choose columns that are functionally dependent on the column(s) in the GROUP BY criteria. In other words, if there can be only one distinct value of the "ambiguous" column per value that defines the group, there's no problem. This query would be illegal in Microsoft SQL Server (and ANSI SQL), even though it cannot logically result in ambiguity:
SELECT AVG(table1.col1), table1.personID, persons.col4
FROM table1 JOIN persons ON (table1.personID = persons.id)
GROUP BY table1.personID;
Also, MySQL has an SQL mode to make it behave per the standard: ONLY_FULL_GROUP_BY
FWIW, SQLite also permits these ambiguous GROUP BY clauses, but it chooses the value from the last row in the group.†
† At least in the version I tested. What it means to be arbitrary is that either MySQL or SQLite could change their implementation in the future, and have some different behavior. You should therefore not rely on the behavior staying the way it is currently in ambiguous cases like this. It's better to rewrite your queries to be deterministic and not ambiguous. That's why MySQL 5.7 now enables ONLY_FULL_GROUP_BY by default.
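If you want the functionally dependent query above to be accepted by stricter engines as well, two common rewrites are to add the extra column to the GROUP BY or to wrap it in an aggregate. The sketch below uses made-up rows and Python's built-in sqlite3 module (not MySQL) to show that both rewrites return the same result when col4 really is determined by the person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: col4 depends only on the person, so it is unique per group.
    CREATE TABLE persons (id INTEGER PRIMARY KEY, col4 TEXT);
    CREATE TABLE table1 (personID INTEGER, col1 INTEGER);
    INSERT INTO persons VALUES (1, 'x'), (2, 'y');
    INSERT INTO table1 VALUES (1, 10), (1, 20), (2, 30);
""")

# Rewrite 1: list the dependent column in the GROUP BY.
by_group = conn.execute("""
    SELECT AVG(table1.col1), table1.personID, persons.col4
    FROM table1 JOIN persons ON (table1.personID = persons.id)
    GROUP BY table1.personID, persons.col4
    ORDER BY table1.personID
""").fetchall()

# Rewrite 2: wrap the dependent column in an aggregate.
by_aggregate = conn.execute("""
    SELECT AVG(table1.col1), table1.personID, MIN(persons.col4)
    FROM table1 JOIN persons ON (table1.personID = persons.id)
    GROUP BY table1.personID
    ORDER BY table1.personID
""").fetchall()

print(by_group)      # [(15.0, 1, 'x'), (30.0, 2, 'y')]
print(by_aggregate)  # same rows
```

When the column is not functionally dependent on the grouping key, the two rewrites stop being equivalent, which is exactly the case where you have to decide which value you want.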
I should have Googled for just a bit longer... It seems I found my answer.
MySQL extends the use of GROUP BY so
that you can use nonaggregated columns
or calculations in the SELECT list
that do not appear in the GROUP BY
clause. You can use this feature to
get better performance by avoiding
unnecessary column sorting and
grouping. For example, you do not need
to group on customer.name in the
following query
In standard SQL, you would have to add
customer.name to the GROUP BY clause.
In MySQL, the name is redundant.
Still, that just seems... wrong.
Let's say you have a query like this:
SELECT g, v
FROM t
GROUP BY g;
In this case, for each possible value for g, MySQL picks one of the corresponding values of v.
However, which one is chosen, depends on some circumstances.
I read somewhere that for each group of g, the first value of v is kept, in the order in which the records were inserted into the table t.
This is quite ugly, because the records in a table should be treated as a set where the order of the elements should not matter. This is so "mysql-ish"...
If you want to determine which value for v to keep, you need to apply a subselect for t like this:
SELECT g, v
FROM (
SELECT *
FROM t
ORDER BY g, v DESC
) q
GROUP BY g;
This way you define the order in which the outer query processes the records of the subquery, so you can predict which value of v it will pick for each value of g.
However, if you need some WHERE conditions then be very careful. If you add the WHERE condition to the subquery then it will keep the behaviour, it will always return the value you expect:
SELECT g, v
FROM (
SELECT *
FROM t
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
GROUP BY g;
This is what you expect, the subselect filters and orders the table. It keeps the records where g has the given value and the external query returns that g and the first value for v.
However, if you add the same WHERE condition to the outer query then you get a non-deterministic result:
SELECT g, v
FROM (
SELECT *
FROM t
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g;
Surprisingly, you may get different values for v when executing the same query again and again which is... strange. The expected behaviour is to get all the records in the appropriate order from the subquery, filtering them in the outer query and then picking the same as it picked in the previous example. But it does not.
It picks a value for v seemingly randomly. The same query returned different values for v if I executed more (~20) times, but the distribution was not uniform.
If instead of adding an outer WHERE, you specify a HAVING condition like this:
SELECT g, v
FROM (
SELECT *
FROM t
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
ORDER BY g, v DESC
) q
-- WHERE g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9'
GROUP BY g
HAVING g = '737a8783-110c-447e-b4c2-1cbb7c6b72c9';
Then you get a consistent behaviour again.
CONCLUSION
I would suggest not relying on this technique at all. If you really want or need to, avoid WHERE conditions in the outer query: put the condition in the inner query if you can, or in a HAVING clause in the outer query.
I tested it with this data:
CREATE TABLE t1 (
v INT,
g VARCHAR(36)
);
INSERT INTO t1 VALUES (1, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
INSERT INTO t1 VALUES (2, '737a8783-110c-447e-b4c2-1cbb7c6b72c9');
in MySQL 5.6.41.
Maybe it is just a bug that gets/got fixed in newer versions, please give feedback if you have experience with newer versions.
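If the goal of the ORDER BY g, v DESC trick is simply "the highest v per g", an aggregate states that directly and does not depend on any processing order. Here is a minimal sketch using the same two test rows, run through Python's built-in sqlite3 module instead of MySQL 5.6 (an assumption made only so it can be executed without a server).

```python
import sqlite3

G = "737a8783-110c-447e-b4c2-1cbb7c6b72c9"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (v INT, g VARCHAR(36))")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, G), (2, G)])

# MAX(v) names the wanted value explicitly, so the outer WHERE is harmless here.
QUERY = "SELECT g, MAX(v) AS v FROM t1 WHERE g = ? GROUP BY g"

# Repeat the query to mirror the ~20 runs described above; every run agrees.
runs = [conn.execute(QUERY, (G,)).fetchall() for _ in range(20)]
print(runs[0])  # one row: the group key and its highest v, which is 2
```

This only covers the case where a single column is wanted; picking a whole row per group needs a join back to the table or a window function.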
select *
from personel
where p_id in (
    select min(p_id)
    from personel
    group by p_adi
);