MySQL alias replaced by column name when creating view involving subquery - mysql

Why is a column alias being replaced by the original column name when I create a view from a script? The script works, the view fails.
The script selects records using an outer query and an inner query (a.k.a. query and subquery). The subquery is used in the SELECT clause. Each subquery is itself a SELECT statement that becomes a column in the result set. See http://www.techonthenet.com/mysql/subqueries.php.
The alias used inside the subquery's SELECT clauses is replaced with its original column name. The alias used to give the subquery a short name is not replaced.
Here is a meta version so you can see the structure.
select `t1`.`Date` as `When`,
( select avg(t1.column)
from t1
where `t1`.`Date` = `When`
) as `Short column name`
from t1
group by `Date`
order by `Date`
In the View version, with aliases replaced, the subquery becomes:
(
select avg(t1.column)
from t1
where `t1`.`Date` = `t1`.`Date`
) as `Short column name`,
The effect of this is that the average is calculated across all dates rather than just for the date specified as When in the outer query.
Another script built the same way translates into a view without a problem. The alias is kept.
There is a difference between the clauses used in the bad and good views but it is not obvious to me that it should cause the problem.
The bad view ends with:
group by `Date`
order by `Date`
while the good one ends only with a group by clause.
Another difference is that the column being aliased in the bad view is probably of field type DATETIME, while the one in the good view is probably one of the INT types (it's actually week(t1.Date)).
Using:
MySQL 5.5
MySQL Workbench 6.0.8
Ubuntu 14.04

The aliases in the SELECT refer to the output of the query block, not to the processing of the query block.
The correct way to do what you want is to use a correlated subquery with table aliases:
select touter.`Date` as `When`,
(select avg(tinner.column)
from t1 as tinner
where tinner.`Date` = touter.`Date`
) as `Short column name`
from t1 as touter
group by touter.`Date`
order by touter.`Date`;
I have no idea why the average would be calculated for all the dates. I would expect it to return an error, or perhaps a NULL value. Perhaps your real where clause is t1.Date = Date and you expect MySQL to magically know what the second Date refers to. Don't depend on magic. Use table aliases and be explicit.
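The difference between the correlated subquery and the self-comparing predicate from the view can be checked on a toy table. This is only a sketch: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table layout, column names (d, val) and data are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (d TEXT, val REAL)")
conn.executemany(
    "INSERT INTO t1 (d, val) VALUES (?, ?)",
    [("2014-08-01", 10.0), ("2014-08-01", 20.0), ("2014-08-02", 40.0)],
)

# Correlated subquery: the inner query reaches the outer row through the
# table alias touter, not through a column alias from the SELECT list.
correlated = conn.execute(
    """
    SELECT touter.d AS when_date,
           (SELECT AVG(tinner.val)
              FROM t1 AS tinner
             WHERE tinner.d = touter.d) AS avg_for_date
      FROM t1 AS touter
     GROUP BY touter.d
     ORDER BY touter.d
    """
).fetchall()

# What the broken view effectively computes: the predicate compares the
# column with itself, so every row matches and the average spans all dates.
uncorrelated = conn.execute(
    """
    SELECT touter.d AS when_date,
           (SELECT AVG(t1.val) FROM t1 WHERE t1.d = t1.d) AS avg_all
      FROM t1 AS touter
     GROUP BY touter.d
     ORDER BY touter.d
    """
).fetchall()

print(correlated)
print(uncorrelated)
```

In the first result each date gets its own average; in the second, both rows carry the same overall average, which matches the symptom described in the question.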

Related

Issue converting Access SQL Inner Join Query to mySQL Query

I am having trouble finding the syntax issue that is causing the following query to give me no results:
SELECT Table1.Country, Table_Data2.Part, Table_Data2.Description, Sum(Table_Data2.Quantity) AS Quantity, Table1.ship_time
FROM Table1 INNER JOIN Table_Data2 ON Table1.CodeValue = Table_Data2.CodeValue
GROUP BY Table1.Country, Table_Data2.Part, Table_Data2.Description, Table1.ship_time
HAVING (((Table_Data2.Part)="BB1234" Or (Table_Data2.Part)="BB-3454") AND ((Table1.ship_time)=Date()));
Which should successfully result in a table that looks like this:
Example of what result should look like
Instead, no syntax errors are reported, but no records are returned either.
It seems there's a syntax issue in the code above, as it does not work in MySQL the way it does in MS Access.
A few corrections are possible:
To get the current date in MySQL, use Current_Date(). The Date() function in MySQL behaves differently: it extracts the date part from a date(time) expression.
Parentheses around bare field names are unnecessary. Use table aliases in multi-table queries for clarity and readability.
Moreover, the conditions in your Having clause are better suited to the Where clause, because they are not aggregated values and you are grouping on these same fields anyway. Your query will perform better if you move them to the Where clause, since MySQL will then aggregate over filtered (reduced) data and use less temp table space.
Also, you can rewrite multiple OR conditions on the same field as IN(...).
You can rewrite as:
SELECT
t1.Country,
t2.Part,
t2.Description,
Sum(t2.Quantity) AS Quantity,
t1.ship_time
FROM Table1 AS t1
INNER JOIN Table_Data2 AS t2
ON t1.CodeValue = t2.CodeValue
WHERE
t2.Part IN ('BB1234', 'BB-3454')
AND t1.ship_time = Current_Date()
GROUP BY
t1.Country,
t2.Part,
t2.Description,
t1.ship_time
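That moving the non-aggregated conditions from HAVING to WHERE does not change the result can be checked on a small data set. The sketch below is an assumption-heavy illustration: it runs on Python's built-in sqlite3 rather than MySQL, the rows are made up, and a fixed date parameter (today) stands in for Current_Date().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (CodeValue INTEGER, Country TEXT, ship_time TEXT);
    CREATE TABLE Table_Data2 (CodeValue INTEGER, Part TEXT,
                              Description TEXT, Quantity INTEGER);
    INSERT INTO Table1 VALUES
        (1, 'US', '2014-08-22'),
        (2, 'DE', '2014-08-21'),
        (3, 'FR', '2014-08-22');
    INSERT INTO Table_Data2 VALUES
        (1, 'BB1234',  'bolt',   5),
        (1, 'BB1234',  'bolt',   7),
        (1, 'ZZ0000',  'nut',    3),
        (2, 'BB-3454', 'washer', 9),
        (3, 'BB-3454', 'washer', 4);
    """
)

today = "2014-08-22"  # hypothetical stand-in for Current_Date()

# Filter first, then aggregate.
where_rows = conn.execute(
    """
    SELECT t1.Country, t2.Part, t2.Description,
           SUM(t2.Quantity) AS Quantity, t1.ship_time
      FROM Table1 AS t1
      INNER JOIN Table_Data2 AS t2 ON t1.CodeValue = t2.CodeValue
     WHERE t2.Part IN ('BB1234', 'BB-3454')
       AND t1.ship_time = ?
     GROUP BY t1.Country, t2.Part, t2.Description, t1.ship_time
     ORDER BY t1.Country
    """,
    (today,),
).fetchall()

# Aggregate everything, then discard groups.
having_rows = conn.execute(
    """
    SELECT t1.Country, t2.Part, t2.Description,
           SUM(t2.Quantity) AS Quantity, t1.ship_time
      FROM Table1 AS t1
      INNER JOIN Table_Data2 AS t2 ON t1.CodeValue = t2.CodeValue
     GROUP BY t1.Country, t2.Part, t2.Description, t1.ship_time
    HAVING (t2.Part = 'BB1234' OR t2.Part = 'BB-3454')
       AND t1.ship_time = ?
     ORDER BY t1.Country
    """,
    (today,),
).fetchall()

print(where_rows)
print(having_rows)
```

Both queries return the same rows here; the WHERE form simply aggregates fewer rows to get there.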

Correct format for Select in SQL Server

I have what should be a simple query for any database and which always runs in MySQL but not in SQL Server
select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
group by assetid
order by ts desc
The error is:
column tagalerts.id is invalid in the select list because it is not contained in either an aggregate function or the group by clause.
It is not a simple case of adding tagalerts.id to the group by clause because the error repeats for ts and for assetid etc, implying that all the selects need to be in a group or in aggregate functions... either of which will result in a meaningless and inaccurate result.
Splitting the select into a subquery to sort and group correctly (which again works fine with MySQL, as you would expect) makes matters worse
SELECT * from
(select
tagalerts.id,
ts,
assetid,
node.zonename,
battlevel
from tagalerts, node
where
ack=0 and
tagalerts.nodeid=node.id
order by ts desc
)T1
group by assetid
This fails with the error: the ORDER BY clause is invalid in views, inline functions, derived tables and expressions unless TOP etc is used
the 'correct output' should be
id ts assetid zonename battlevel
1234 a datetime 1569 Reception 0
3182 another datetime 1572 Reception 0
Either I am reading SQL Server's rules entirely wrong or this is a major flaw with that database.
How can I write this to work on both systems?
In most databases you can't just include columns that aren't in the GROUP BY without using an aggregate function.
MySql is an exception to that. But MS SQL Server isn't.
So you could keep that GROUP BY with only the "assetid".
But then use the appropriate aggregate functions for all the other columns.
Also, use the JOIN syntax for heaven's pudding sake.
A SQL like select * from table1, table2 where table1.id2 = table2.id is using a syntax from the previous century.
SELECT
MAX(ta.id) AS id,
MAX(ta.ts) AS ts,
ta.assetid,
MAX(node.zonename) AS zonename,
MAX(ta.battlevel) AS battlevel
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
GROUP BY ta.assetid
ORDER BY MAX(ta.ts) DESC;
Another trick to use in MS SQL Server is the window function ROW_NUMBER.
But this is probably not what you need.
Example:
SELECT id, ts, assetid, zonename, battlevel
FROM
(
SELECT
ta.id,
ta.ts,
ta.assetid,
node.zonename,
ta.battlevel,
ROW_NUMBER() OVER (PARTITION BY ta.assetid ORDER BY ta.ts DESC) AS rn
FROM tagalerts AS ta
JOIN node ON node.id = ta.nodeid
WHERE ta.ack = 0
) q
WHERE rn = 1
ORDER BY ts DESC;
I strongly suspect this query is WRONG even in MySql.
We're missing a lot of details (sample data, and we don't know which table all of the columns belong to), but what I do know is you're grouping by assetid, where it looks like one assetid value could have more than one ts (timestamp) value in the group. It also looks like you're counting on the order by ts desc to ensure both that you see recent timestamps in the results first and that each assetid group uses the most recent possible ts timestamp for that group.
MySql only guarantees the former, not the latter. Nothing in this query guarantees that each assetid is using the most recent timestamp available. You could be seeing the wrong timestamps, and then also using those wrong timestamps for the order by. This is the problem the Sql Server rule is there to stop. MySql violates the SQL standard to allow you to write that wrong query.
Instead, you need to look at each column and either add it to the group by (best when all of the values are known to be the same anyway) or wrap it in an aggregate function like MAX(), MIN(), AVG(), etc., so there is a deterministic result for which value from the group is used.
If all of the values for a column in a group are the same, then there's no problem adding it to the group by. If the values are different, you want to be precise about which one is chosen for the result set.
While I'm here, the tagalerts, node join syntax has been obsolete for more than 20 years now. It's also good practice to use an alias with every table and prefix every column with the alias. I mention these to explain why I changed it for my code sample below, though I only prefix columns where I am confident in which table the column belongs to.
This query should run on both databases:
SELECT ta.assetid, MAX(ta.id) "id", MAX(ta.ts) "ts",
MAX(n.zonename) "zonename", MAX(battlevel) "battlevel"
FROM tagalerts ta
INNER JOIN node n ON ta.nodeid = n.id
WHERE ack = 0
GROUP BY ta.assetid
ORDER BY ts DESC
There is also a concern here that the results may be choosing values from different records in the joined node table. So if battlevel is part of the node table, you might see a result that matches a zonename with a battlevel that never occurs in any record in the data. In Sql Server, this is easily fixed by using APPLY to match only one node record to each tagalert. MySql doesn't support this (APPLY or an equivalent has been in every other major database since at least 2012), but you can simulate it in this case with two JOINs, where the first join is a subquery that uses GROUP BY to determine values that will uniquely identify the needed node record, and the second join is to the node table to actually produce that record. Unfortunately, we need to know more about the tables in question to actually write this code for you.
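With an assumed schema, the two-join idea can still be sketched. The example below is hypothetical throughout: it guesses that id, ts, assetid, nodeid, battlevel and ack live in tagalerts, that zonename lives in node, and that (assetid, ts) identifies a single alert. It runs on Python's built-in sqlite3 purely so it can be executed; the SQL itself uses only plain joins and GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schema; the real column-to-table mapping is not known.
    CREATE TABLE node (id INTEGER PRIMARY KEY, zonename TEXT);
    CREATE TABLE tagalerts (id INTEGER PRIMARY KEY, ts TEXT, assetid INTEGER,
                            nodeid INTEGER, battlevel INTEGER, ack INTEGER);
    INSERT INTO node VALUES (1, 'Reception'), (2, 'Lobby');
    INSERT INTO tagalerts VALUES
        (1234, '2014-08-01 10:00:00', 1569, 1, 0, 0),
        (1000, '2014-07-01 09:00:00', 1569, 2, 1, 0),
        (3182, '2014-08-02 11:00:00', 1572, 1, 0, 0),
        (3000, '2014-08-03 12:00:00', 1572, 2, 1, 1);
    """
)

latest_alerts = conn.execute(
    """
    SELECT ta.id, ta.ts, ta.assetid, n.zonename, ta.battlevel
      FROM (SELECT assetid, MAX(ts) AS max_ts   -- first join: pick the key
              FROM tagalerts
             WHERE ack = 0
             GROUP BY assetid) AS latest
      JOIN tagalerts AS ta                      -- second join: fetch the row
        ON ta.assetid = latest.assetid AND ta.ts = latest.max_ts
      JOIN node AS n ON n.id = ta.nodeid
     WHERE ta.ack = 0
     ORDER BY ta.ts DESC
    """
).fetchall()

for row in latest_alerts:
    print(row)
```

Every column in each output row comes from one real alert and its node, so no zonename is paired with a battlevel from a different record. If two unacknowledged alerts for an asset could share the same ts, a tie-breaker would be needed.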

Remove Duplicate record from Mysql Table using Group By

I have a table structure and data below.
I need to remove duplicate records from the table. My confusion is that when I run the query
SELECT * FROM `table` GROUP BY CONCAT(`name`,department)
it gives me the correct list (12 records).
When I use the same query as a subquery:
SELECT *
FROM `table` WHERE id IN (SELECT id FROM `table` GROUP BY CONCAT(`name`,department))
it returns all records, which is wrong.
So my question is: why is the group by in the subquery not working?
As Tim mentioned in his answer, getting the first record of each group through the GROUP BY clause is not a standard SQL feature. MySQL allowed it up to version 5.6.16, but the behavior changed as of 5.6.21.
Just change mysql version in your sql fiddle and check that you will get what you want.
In the query
SELECT * FROM `table` GROUP BY CONCAT(`name`,department)
You are selecting the id column, which is a non-aggregate column. Many RDBMS would give you an error, but MySQL allows this for performance reasons. This means MySQL has to choose which record to retain in the result set. Based on the result set in your original problem, it appears that MySQL is retaining the id of the first duplicate record, in cases where a group has more than one member.
In the query
SELECT *
FROM `table`
WHERE id IN
(
SELECT id FROM `table` GROUP BY CONCAT(`name`,department)
)
you are also selecting a non-aggregate column in the subquery. It appears that MySQL actually decides which id value to be retained in the subquery based on the id value in the outer query. That is, for each id value in table, MySQL performs the subquery and then selectively chooses to retain a record in the group if two id values match.
You should avoid using a non-aggregate column in a query with GROUP BY, because it is a violation of the ANSI standard, and as you have seen here it can result in unexpected results. If you give us more information about what result set you want, we can give you a correct query which will avoid this problem.
I welcome anyone who has documentation to support these observations to either edit my answer or post a new one.
You can JOIN the grouped ids against the table's ids to get the desired results.
Example:
SELECT t.* FROM so_q32175332 t
JOIN ( SELECT id FROM so_q32175332
GROUP BY CONCAT( name, department ) ) f
ON t.id = f.id
ORDER BY CONCAT( name, department );
The ORDER BY was added here only to make the result directly comparable with the GROUP BY query.
Demo on SQL Fiddle: http://sqlfiddle.com/#!9/d715a/1
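A variant of the same join that does not rely on MySQL choosing an id for each group is to aggregate the id explicitly. The sketch below is an illustration under assumptions: the table name staff and its rows are hypothetical, it keeps the lowest id per (name, department), and it runs on Python's built-in sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical table and rows for illustration.
    CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, department TEXT);
    INSERT INTO staff VALUES
        (1, 'Ann', 'HR'),
        (2, 'Ann', 'HR'),
        (3, 'Bob', 'IT'),
        (4, 'Ann', 'IT'),
        (5, 'Bob', 'IT');
    """
)

# MIN(id) is an aggregate, so the id chosen for each group is well defined.
# Grouping by the two columns instead of CONCAT(name, department) also avoids
# treating ('ab', 'c') and ('a', 'bc') as the same group.
unique_rows = conn.execute(
    """
    SELECT t.*
      FROM staff AS t
      JOIN (SELECT MIN(id) AS id
              FROM staff
             GROUP BY name, department) AS f
        ON t.id = f.id
     ORDER BY t.id
    """
).fetchall()

print(unique_rows)
```

The same subquery can drive a delete of everything that is not in the kept set, if the goal is to remove the duplicates rather than only hide them.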

Mysql sort - SORTING SOME BUT NOT OTHERS?

Okay, here is my query:
SELECT NAME,
DATE_FORMAT(DATE_WRITTEN, "%c/%e/%y") AS written_date,
DATE_FORMAT(RETURN_DATE, "%c/%e/%y") AS return_date
FROM `pfp`.`returns` AS `Re`
LEFT JOIN `pfp`.`insurance` AS `Insurance`
ON ( `insurance`.`id` = `Re`.`INSURANCE_ID` )
LEFT JOIN `pfp`.`remain` AS `Remain`
ON ( `remain`.`id` = `Re`.`REMAIN_ID` )
LEFT JOIN `pfp`.`formula` AS `Formula`
ON ( `formula`.`id` = `remain`.`FORMULA_ID` )
WHERE `NOT_RETURNED` = 'F'
AND `RETURN_DATE` BETWEEN '2014-01-01' AND '2014-08-22'
ORDER BY `RETURN_DATE` DESC
LIMIT 100
The problem is that it sorts by the date 14-8-9 down to 14-8-7 then jumps back up to 14-8-22 and downward from there... why??
ORDER BY `RETURN_DATE` matches the alias return_date (column names and aliases are not case sensitive), so you are sorting by the formatted string, not by the date. Instead, use the table alias to identify that you really want the column:
WHERE `NOT_RETURNED` = 'F'
AND `RETURN_DATE` BETWEEN '2014-01-01' AND '2014-08-22'
ORDER BY re.RETURN_DATE DESC
LIMIT 100
I am guessing that it is in the re table. Use the appropriate alias.
EDIT:
The fact that the column aliases are searched first is documented:
MySQL resolves unqualified column or alias references in ORDER BY
clauses by searching in the select_expr values, then in the columns of
the tables in the FROM clause. For GROUP BY or HAVING clauses, it
searches the FROM clause before searching in the select_expr values.
(For GROUP BY and HAVING, this differs from the pre-MySQL 5.0 behavior
that used the same rules as for ORDER BY.)
I can speculate on the reasons for this (which I think is consistent with the ANSI standard). SQL queries are logically processed in a particular order, something like from, then where, then select, then order by (leaving out other clauses). This logical processing determines how the query is compiled and what identifiers mean. The logical processing explains why column aliases are not allowed in the where clause -- from the perspective of the compiler, they are not yet identified.
When it comes to the order by, the identifier is determined from the inside out. The first definition is the version in the select, so it chooses that before going to the from.
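The odd ordering in the question is what a descending sort of the formatted strings looks like. The short Python sketch below only mimics the effect outside the database: the dates are made up, and format_like_query is a hypothetical helper that imitates DATE_FORMAT with "%c/%e/%y".

```python
from datetime import date

# Hypothetical return dates inside the queried range.
return_dates = [
    date(2014, 8, 7),
    date(2014, 8, 8),
    date(2014, 8, 9),
    date(2014, 8, 21),
    date(2014, 8, 22),
]


def format_like_query(d):
    # Imitates DATE_FORMAT(d, "%c/%e/%y"): unpadded month and day, 2-digit year.
    return f"{d.month}/{d.day}/{d:%y}"


# Sorting by the alias: the formatted text is compared character by character.
by_alias = sorted((format_like_query(d) for d in return_dates), reverse=True)

# Sorting by the underlying column: real dates are compared, then formatted.
by_column = [format_like_query(d) for d in sorted(return_dates, reverse=True)]

print(by_alias)
print(by_column)
```

As text, "8/9/14" sorts above "8/22/14" because "9" is greater than "2", which reproduces the jump from the 7th back up to the 22nd.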

Why does MySQL allow you to group by columns that are not selected

I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and it's quite good despite its title. However, the chapter on group by confuses me.
"Grouping data is a simple process. The selected columns (the column list following
the SELECT keyword in a query) are the columns that can be referenced in the GROUP
BY clause. If a column is not found in the SELECT statement, it cannot be used in the
GROUP BY clause. This is logical if you think about it—how can you group data on a
report if the data is not displayed? "
How come when I ran this statement in MySQL it works?
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.
You can make MySQL enforce GROUP BY in a more standard way:
mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;
mysql> select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY
Because the book is wrong.
The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.
This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:
select t.*
from table t
where t.id in (select max(id)
from table t
group by thegroup
);
(Note: There are other ways to write such a query, this is just an example.)
EDIT:
The query that you are suggesting:
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.
What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:
select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;
Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.