Group by statement column name resolution - mysql

I have a query like this (more columns in reality)
select
custom_amend_column_function(colname) as colname
from source_table
group by colname
I've used this construct a lot.It appears to have always used the output from the function rather than the input to group data by.
Now for the first time today I have run this statement directly into a new table like this:
create table new_table as
select
custom_amend_column_function(colname) as colname
from source_table
group by colname
Curiously by adding the first line, I am getting an error message that my query is using an ambiguous column name in the group statement.
As group by can use new column names (ie. the below works) I always expected that group by will first use the columns defined in the select statement and then look at the underlying table(s) if it cannot find the names.
select
custom_amend_column_function(colname) as new_colname
from source_table
group by new_colname
Am I right? Is the error message right? I cannot find any place this is documented either for the SQL standard or for MySQL.
I know I could avoid this by just creating a new column name, but I want to figure this out as I may need to review existing queries if it is indeed ambiguous.

To trace the problem do an
EXPLAIN EXTENDED SELECT ...
On your query. Beside the normal output this will give you a warning containing the query as it is seen by the optimizer. All implicit renaming is done here, so you can see why the column name is ambiguous.
I think you can solve this by saying
... GROUP BY 1
instead of providing the column name in the group by.

Related

Sql syntax: select without from clause as subquery in select (subselect)

While editing some queries to add alternatives for columns without values, I accidentally wrote something like this (here is the simplyfied version):
SELECT id, (SELECT name) FROM t
To my surprise, MySQL didn't throw any error, but completed the query giving my expected results (the name column values).
I tried to find any documentation about it, but with no success.
Is this SQL standard or a MySQL specialty?
Can I be sure that the result of this syntax is really the column value from the same (outer) table? The extended version would be like this:
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
but the EXPLAIN reports No tables used in the Extra column for the former version, which I think is very nice.
Here's a simple fiddle on SqlFiddle (it keeps timing out for me, I hope you have better luck).
Clarification: I know about subqueries, but I always wrote subqueries (correlated or not) that implied a table to select from, hence causing an additional step in the execution plan; my question is about this syntax and the result it gives, that in MySQL seems to return the expected value without any.
What you within your first query is a correlated subquery which simply returns the name column from the table t. no actual subquery needs to run here (which is what your EXPLAIN is telling you).
In a SQL database query, a correlated subquery (also known as a
synchronized subquery) is a subquery (a query nested inside another
query) that uses values from the outer query.
https://en.wikipedia.org/wiki/Correlated_subquery
SELECT id, (SELECT name) FROM t
is the same as
SELECT id, (SELECT t.name) FROM t
Your 2nd query
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
Also contains correlated subquery but this one is actually running a query on table t to find records where t1.id = t2.id.
This is the default behavior for the SQL language and it is defined on the SQL ANSI 2011 over ISO/IEC 9075-1:2011(en) documentation. Unfortunately it is not open. This behavior is described on the section 4.11 SQL-Statements.
This behavior happens because the databases process the select comand without the from clause, therefore if it encounters:
select id, (select name) from some
It will try to find that name field as a column of the outer queries to process.
Fortunately I remember that some while ago I've answered someone here and find a valid available link to an SQL ANSI document that is online in FULL but it is for the SQL ANSI 99 and the section may not be the same one as the new document. I think, did not check, that it is around the section 4.30. Take a look. And I really recommend the reading (I did that back in the day).
Database Language SQL - ISO/IEC 9075-2:1999 (E)
It's not standard. In oracle,
select 1, (select 2)
from dual
Throws error, ORA-00923: FROM keyword not found where expected
How can you be sure of your results? Get a better understanding of what the query is supposed to acheive before you write it. Even the exetended version in the question does not make any sense.

Union query to combine results of 3 tables

I am relatively new to coding so please have patience.
I am trying to combine data from 3 tables. I have managed to get some data back but it isn't what i need. Please see my example below.
select oid, rrnhs, idnam, idfnam, dte1, ta
as 'access type' from person
left join
(select fk_oid, min(dte), dte1, ta
from
((Select fk_oid,min(accessdate) as dte, accessdate1 as dte1, accesstype as ta
from vascularpdaccess
where isnull(accesstype)=false group by fk_oid)
union
(Select fk_oid, min(hpdate) as dte, hpdate as dte1, HPACCE as ta
from hdtreatment
where isnull(hptype)=false group by fk_oid)) as bla
group by fk_oid) as access
on person.oid=access.fk_oid
where person.rrnhs in (1000010000, 2000020000, 3000030000)
My understanding with a union is that the columns have to be of the same data type but i have two problems. The first is that accesstype and hpacce combine in to a the same column as expected, but i dont want to actually see the hpacce data (dont know if this is even possible).
Secondly, the idea of the query is to pull back a patients 'accesstype' date at the first date of hpdate.
I dont know if this even makes sens to you guys but hoping someone can help..y'all are usually pretty nifty!
Thanks in advance!
Mikey
All queries need to have the same number of columns in the SELECT statement. It looks like you first query has the max number of columns, so you will need to "pad" the other to have the same number of columns. You can use NULL as col to create the column with all null values.
To answer the question (I think) you were asking... for a UNION or UNION ALL set operation, you are correct: the number of columns and the datatypes of the columns returned must match.
But it is possible to return a literal as an expression in the SELECT list. For example, if you don't want to return the value of HPACCE column, you can replace that with a literal or a NULL. (If that column is character datatype (we can't tell from the information provided in the question), you could use (for example) a literal empty string '' AS ta in place of HPACCE AS ta.
SELECT fk_oid
, MIN(HPDATE) AS dte
, hpdate AS dte1
, NULL AS ta
-- -------------------- ^^^^
FROM hdtreatment
Some other notes:
The predicate ISNULL(foo)=FALSE can be more simply expressed as foo IS NOT NULL.
The UNION set operator will remove duplicate rows. If that's not necessary, you could use a UNION ALL set operator.
The subsequent GROUP BY fk_oid operation on the inline view bla is going to collapse rows; but it's indeterminate which row the values from dte1 and ta will be from. (i.e. there is no guarantee those values will be from the row that had the "minimum" value of dte.) Other databases will throw an exception/error with this statement, along the lines of "non-aggregate in SELECT list not in GROUP BY". But this is allowed (without error or warning) by a MySQL specific extension to GROUP BY behavior. (We can get MySQL to behave like other databases and throw an error of we specify a value for sql_mode that includes ONLY_FULL_GROUP_BY (?).)
The predicate on the outer query doesn't get pushed down into the inline view bla. The view bla is going to materialized for every fk_oid, and that could be a performance issue on large sets.
Also, qualifying all column references would make the statement easier to read. And, that will also insulate the statement from throwing an "ambiguous column" error in the future, when a column named (e.g.) ta or dte1 is added to the person table.

Why does MySQL allow you to group by columns that are not selected

I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and its quite good despite its title. However the chapter on group by confuses me
"Grouping data is a simple process. The selected columns (the column list following
the SELECT keyword in a query) are the columns that can be referenced in the GROUP
BY clause. If a column is not found in the SELECT statement, it cannot be used in the
GROUP BY clause. This is logical if you think about it—how can you group data on a
report if the data is not displayed? "
How come when I ran this statement in MySQL it works?
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.
You can make MySQL enforce GROUP BY in a more standard way:
mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;
mysql> select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY
Because the book is wrong.
The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.
This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:
select t.*
from table t
where t.id in (select max(id)
from table t
group by thegroup
);
(Note: There are other ways to write such a query, this is just an example.)
EDIT:
The query that you are suggesting:
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.
What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:
select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;
Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.

Select Left(columnA,2) or right(columnA,1)

Is there a way in MySql to use an either or in a select column. For instance
select left(columnA,2) or right(columnA,1) as columnAlias, sum(columnB)
from table
where ((left(columnA,2) in ('aa','bb','cc')) or (right(columnA,1) in ('a,','b','c')))
group by columnAlias
what I have is a table where either the first 2 characters of the column or the last character of the column indicates the facility. I need to sum the values by facility. A union gets me part way there then I could loop through the resulting dataset and sum things up in the code (or do a stored proc to return the sums), but I am wondering if there is a way to just get it from the query.
I've tried using the union query as an on the fly temp table and doing the select and group on that but if there are no records returned from either of the select statments then it throws a "column columnA cannot be null error.
Also tried with the syntax above, but not getting the results I am expecting. Any other ways to do this through the query?
using a CASE would prob be your best bet here.
http://dev.mysql.com/doc/refman/5.0/en/case-statement.html

What does SQL clause "GROUP BY 1" mean?

Someone sent me a SQL query where the GROUP BY clause consisted of the statement: GROUP BY 1.
This must be a typo right? No column is given the alias 1. What could this mean? Am I right to assume that this must be a typo?
It means to group by the first column of your result set regardless of what it's called. You can do the same with ORDER BY.
SELECT account_id, open_emp_id
^^^^ ^^^^
1 2
FROM account
GROUP BY 1;
In above query GROUP BY 1 refers to the first column in select statement which is
account_id.
You also can specify in ORDER BY.
Note : The number in ORDER BY and GROUP BY always start with 1 not with 0.
In addition to grouping by the field name, you may also group by ordinal, or position of the field within the table. 1 corresponds to the first field (regardless of name), 2 is the second, and so on.
This is generally ill-advised if you're grouping on something specific, since the table/view structure may change. Additionally, it may be difficult to quickly comprehend what your SQL query is doing if you haven’t memorized the table fields.
If you are returning a unique set, or quickly performing a temporary lookup, this is nice shorthand syntax to reduce typing. If you plan to run the query again at some point, I’d recommend replacing those to avoid future confusion and unexpected complications (due to scheme changes).
It will group by first field in the select clause
That means *"group by the 1st column in your select clause". Always use GROUP BY 1 together with ORDER BY 1.
You can also use GROUP BY 1,2,3... It is convenient, but you need to pay attention to that condition; the result may not be what you want if someone has modified your select columns and it's not visualized.
It will group by the column position you put after the group by clause.
for example if you run 'SELECT SALESMAN_NAME, SUM(SALES) FROM SALES GROUP BY 1'
it will group by SALESMAN_NAME.
One risk on doing that is if you run 'Select *' and for some reason you recreate the table with columns on a different order, it will give you a different result than you would expect.