Sub-Query and having group by clause in sql - mysql

Can we write this query without using sub-query?
select * from test where EmpTran in
(select max(EmpTran) from test);
I tried this code but it returns empty set.
I read that 'in absence of group by, entire data is taken as a single group'. If that's the case, the query below should return the same result as the query above.
select EmpTran,EmpName from test
having EmpTran = max(EmpTran);
Sample data:
create table test(EmpName varchar(10),EmpTran int);
insert into test values('Ans',100);
insert into test values('Sam',50);
insert into test values('Kar',150);
insert into test values('Sar',200);
insert into test values('Raj',200);

The second query doesn't work because as soon as you use an aggregation function anywhere in the query, it causes the rows to be aggregated. Since you don't have GROUP BY, everything is aggregated into a single row in the result set (just as you quoted: in absence of group by, entire data is taken as a single group). In this result set, EmpTran and EmpName are taken from arbitrary rows in the table (they might not even be from the same row).
HAVING then filters this result set. If the selected value of EmpTran doesn't match MAX(EmpTran), the row is removed from the result set and you get an empty result.
The order of processing is:
Use WHERE to select the rows to put in the result set.
Aggregate the result set if necessary.
Use HAVING to filter the aggregated result set.
I don't think there's a way to do this without a subquery in MySQL 5.x. In MySQL 8.x you can do it with a window function (I'm not familiar with these, so I'm not going to show it in my answer).
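The processing order described above can be sketched with a query that uses all three steps. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen so the snippet runs anywhere), and the threshold of 100 is made up for the example.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows mirror the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (EmpName TEXT, EmpTran INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("Ans", 100), ("Sam", 50), ("Kar", 150), ("Sar", 200), ("Raj", 200)],
)

# 1. WHERE drops Sam (50) before any grouping happens.
# 2. GROUP BY collapses the remaining rows into one row per EmpTran value.
# 3. HAVING keeps only the groups that contain more than one row.
rows = conn.execute(
    """
    SELECT EmpTran, COUNT(*) AS n
    FROM test
    WHERE EmpTran >= 100
    GROUP BY EmpTran
    HAVING COUNT(*) > 1
    """
).fetchall()
print(rows)
```

Only the group for 200 survives, because HAVING sees the grouped rows rather than the original ones.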

As Barmar has already explained, your second query won't work: in MySQL versions before 8.0, finding the rows that match the maximum of a column requires a separate subquery. Starting with MySQL 8.0, which introduced window functions, we could try something like this:
SELECT *
FROM
(
SELECT *, MAX(EmpTran) OVER () max_val
FROM test
) t
WHERE EmpTran = max_val;
Demo
The demo is in SQL Server, because Rextester does not yet support MySQL 8+. But, it should run on any database which implements the ANSI standard for window functions.
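For anyone who wants to try this locally, here is a rough sketch that runs both the original subquery and the window-function version against the question's sample data. It assumes Python's sqlite3 module with SQLite 3.25 or newer (the first release with window functions) instead of MySQL 8, and it lists the columns explicitly and adds ORDER BY only to make the comparison stable.

```python
import sqlite3

# SQLite (3.25+) stands in for MySQL 8 here; the rows mirror the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (EmpName TEXT, EmpTran INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("Ans", 100), ("Sam", 50), ("Kar", 150), ("Sar", 200), ("Raj", 200)],
)

# The original approach: compare each row against the maximum from a subquery.
subquery_rows = conn.execute(
    """
    SELECT EmpName, EmpTran
    FROM test
    WHERE EmpTran IN (SELECT MAX(EmpTran) FROM test)
    ORDER BY EmpName
    """
).fetchall()

# The window-function approach: attach the overall maximum to every row,
# then filter on it in the outer query.
window_rows = conn.execute(
    """
    SELECT EmpName, EmpTran
    FROM (SELECT *, MAX(EmpTran) OVER () AS max_val FROM test) AS t
    WHERE EmpTran = max_val
    ORDER BY EmpName
    """
).fetchall()

print(subquery_rows)
print(window_rows)
```

Both queries return the two tied rows (Raj and Sar). Note that the window version still needs a derived table, because a window function cannot be used directly in WHERE.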

Related

Can i use MAX() without GROUPBY()? [duplicate]

In MySQL, I observed that a statement which uses an AGGREGATE FUNCTION in SELECT list gets executed though there is no GROUP BY clause. Other RDBMS products like SQL Server throw an error if we do so.
For example, SELECT col1,col2,sum(col3) FROM tbl1; gets executed without any error and returns the first row values of col1,col2 and sum of all values of col3. The result of the above query is a single row.
Can anyone please tell me why this happens with MySQL?
Thanks in advance!!
It's by design - it's one of many extensions to the standard that MySQL permits.
For a query like SELECT name, MAX(age) FROM t; the reference docs say that:
Without GROUP BY, there is a single group and it is indeterminate
which name value to choose for the group
See the documentation on group by handling for more information.
The ONLY_FULL_GROUP_BY setting controls this behavior; see 5.1.7 Server SQL Modes. Enabling it makes MySQL reject a query that selects nonaggregated columns alongside an aggregate function when those columns are neither named in GROUP BY nor functionally dependent on the GROUP BY columns. It is enabled by default from MySQL version 5.7.5.
You have two points in your question:
A SELECT that mixes aggregated and non-aggregated columns (where the latter do not appear in GROUP BY).
A SELECT with aggregated columns and no GROUP BY.
The first one is described well in @jpw's answer.
The second one is allowed by the SQL standard, and the result of such a query consists of one row.
a) If T is not a grouped table, then
Case:
i) If the <select list> contains a <set function specification> that contains a reference to a column of T or directly contains a <set function specification> that does not contain an outer reference, then T is the argument or argument source of each such <set function specification> and the result of the <query specification> is a table consisting of 1 row. The i-th value of the row is the value specified by the i-th <value expression>.
Here, set function means aggregate function.
P.S. The result of such a query over an empty table consists of one row with NULLs (this is the difference between a query with GROUP BY NULL and a query without GROUP BY at all).
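The empty-table difference mentioned in the P.S. is easy to check. The sketch below uses Python's sqlite3 module rather than MySQL (an assumption made for portability) and reuses the tbl1 and col3 names from the question; the column types are made up.

```python
import sqlite3

# SQLite stands in for MySQL here; tbl1 is created empty on purpose.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (col1 TEXT, col2 TEXT, col3 INTEGER)")

# No GROUP BY: the whole (empty) table is one group, so one row comes back.
no_group_by = conn.execute("SELECT SUM(col3) FROM tbl1").fetchall()

# GROUP BY NULL: groups are formed from the rows, and there are none.
group_by_null = conn.execute(
    "SELECT SUM(col3) FROM tbl1 GROUP BY NULL"
).fetchall()

print(no_group_by)    # one row holding a NULL sum
print(group_by_null)  # no rows at all
```

So the two forms only agree when the table has at least one row.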
A quote from the MySQL documentation, the page about the aggregate functions:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
If you want a GROUP BY clause on your query then append GROUP BY NULL to it. I cannot tell about other RDBMS-es but on MySQL this is valid syntax. On a non-empty table it returns the same result as the query without it.
Remarks about your query
A quote from your question:
"For example, SELECT col1,col2,sum(col3) FROM tbl1; gets executed without any error and returns the first row values of col1,col2 and sum of all values of col3."
The part with "the first row" is not something to rely on. It just happens that most of the time you get the first row.
Your query selects the columns col1 and col2 that are neither aggregate values nor functionally dependent on the columns in the GROUP BY clause. The query is not valid according to the SQL standard. MySQL allows it but its execution is undefined behaviour and the documentation about the handling of GROUP BY clearly states that:
... the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate...

Can somebody elaborate on the `SELECT LAST_INSERT_ID()` statement?

I have always had the understanding that you use SELECT to select columns from a table. However, I was thrown off when I saw SELECT LAST_INSERT_ID(). I understand what it does... but I don't understand how we can simply just ask for the last inserted id like that. Isn't it true that the SELECT keyword expects to see column names immediately afterwards... so how does that function call satisfy that requirement?
The SELECT statement normally works with a FROM clause to select columns -- and expressions on columns and constants -- from rows in a table.
Without the FROM clause, a SELECT simply evaluates the expressions and returns one row. The function LAST_INSERT_ID() is simply an expression that returns a value, so:
SELECT LAST_INSERT_ID()
returns a result set with a single row and a single (unnamed) column.
Some databases do not like the idea of a SELECT without a FROM. Oracle is one of them. It requires a FROM clause and provides a table called dual, with one column and one row, for this purpose. MySQL also supports dual, so you could write:
SELECT LAST_INSERT_ID()
FROM dual;
This is handy, if you want to include a WHERE clause with the SELECT (the WHERE requires a FROM in MySQL).
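To see that a SELECT without FROM just evaluates its expressions and returns one row, here is a small sketch. It assumes Python's sqlite3 module in place of MySQL, so it calls last_insert_rowid(), which is SQLite's rough counterpart of LAST_INSERT_ID(); the items table is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; "items" is a made-up table for the demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
conn.execute("INSERT INTO items (name) VALUES ('first')")
conn.execute("INSERT INTO items (name) VALUES ('second')")

# No FROM clause: the expressions are evaluated once and returned as one row.
expr_rows = conn.execute("SELECT 1 + 1, UPPER('abc')").fetchall()

# A function call is just another expression, so it works the same way.
# last_insert_rowid() is SQLite's analogue of MySQL's LAST_INSERT_ID().
last_id_rows = conn.execute("SELECT last_insert_rowid()").fetchall()

print(expr_rows)
print(last_id_rows)
```

In both cases the result set has exactly one row, with one column per expression in the select list.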

Sql syntax: select without from clause as subquery in select (subselect)

While editing some queries to add alternatives for columns without values, I accidentally wrote something like this (here is the simplified version):
SELECT id, (SELECT name) FROM t
To my surprise, MySQL didn't throw any error, but completed the query giving my expected results (the name column values).
I tried to find any documentation about it, but with no success.
Is this SQL standard or a MySQL specialty?
Can I be sure that the result of this syntax is really the column value from the same (outer) table? The extended version would be like this:
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
but the EXPLAIN reports No tables used in the Extra column for the former version, which I think is very nice.
Here's a simple fiddle on SqlFiddle (it keeps timing out for me, I hope you have better luck).
Clarification: I know about subqueries, but I always wrote subqueries (correlated or not) that implied a table to select from, hence causing an additional step in the execution plan; my question is about this syntax and the result it gives, which in MySQL seems to return the expected value without any such step.
What you have in your first query is a correlated subquery which simply returns the name column from the table t. No actual subquery needs to run here (which is what your EXPLAIN is telling you).
In a SQL database query, a correlated subquery (also known as a
synchronized subquery) is a subquery (a query nested inside another
query) that uses values from the outer query.
https://en.wikipedia.org/wiki/Correlated_subquery
SELECT id, (SELECT name) FROM t
is the same as
SELECT id, (SELECT t.name) FROM t
Your 2nd query
SELECT id, (SELECT name FROM t AS t1 where t1.id=t2.id) FROM t AS t2
also contains a correlated subquery, but this one actually runs a query on table t to find records where t1.id = t2.id.
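A quick way to convince yourself that the three forms return the same values is to run them side by side. The sketch below uses Python's sqlite3 module instead of MySQL (an assumption; SQLite happens to accept the same syntax), with two made-up rows and an ORDER BY added to keep the output stable.

```python
import sqlite3

# SQLite stands in for MySQL here; the two rows are invented for the demo.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "alice"), (2, "bob")])

# Subquery without FROM: "name" can only resolve to the outer table's column.
short_form = conn.execute(
    "SELECT id, (SELECT name) FROM t ORDER BY id"
).fetchall()

# The same thing with the outer reference spelled out.
qualified = conn.execute(
    "SELECT id, (SELECT t.name) FROM t ORDER BY id"
).fetchall()

# The extended version: a correlated subquery that really reads table t again.
extended = conn.execute(
    """
    SELECT id, (SELECT name FROM t AS t1 WHERE t1.id = t2.id)
    FROM t AS t2
    ORDER BY id
    """
).fetchall()

print(short_form)
print(qualified)
print(extended)
```

All three return each id with its own name; only the extended version has to look the row up a second time.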
This is the default behavior of the SQL language, and it is defined in the ANSI SQL 2011 standard, ISO/IEC 9075-1:2011(en). Unfortunately, that document is not freely available. The behavior is described in section 4.11, SQL-Statements.
This happens because the database processes the inner select command without a from clause; therefore, if it encounters:
select id, (select name) from some
it will try to resolve the name field as a column of the outer query.
Fortunately, I remember answering someone here a while ago and finding a link to an ANSI SQL document that is available online in full. It is for SQL:1999, though, so the section may not be the same as in the newer document. I think (I did not check) that it is around section 4.30. Take a look; I really recommend reading it (I did, back in the day).
Database Language SQL - ISO/IEC 9075-2:1999 (E)
It's not standard. In oracle,
select 1, (select 2)
from dual
Throws error, ORA-00923: FROM keyword not found where expected
How can you be sure of your results? Get a better understanding of what the query is supposed to achieve before you write it. Even the extended version in the question does not make any sense.

Check if MySQL Table is empty: COUNT(*) is zero vs. LIMIT(0,1) has a result?

This is a simple question about efficiency specifically related to the MySQL implementation. I want to just check if a table is empty (and if it is empty, populate it with the default data). Would it be best to use a statement like SELECT COUNT(*) FROM `table` and then compare to 0, or would it be better to do a statement like SELECT `id` FROM `table` LIMIT 0,1 then check if any results were returned (the result set has next)?
Although I need this for a project I am working on, I am also interested in how MySQL works with those two statements and whether the reason people seem to suggest using COUNT(*) is because the result is cached or whether it actually goes through every row and adds to a count as it would intuitively seem to me.
You should definitely go with the second query rather than the first.
When using COUNT(*), MySQL is scanning at least an index and counting the records. Even if you would wrap the call in a LEAST() (SELECT LEAST(COUNT(*), 1) FROM table;) or an IF(), MySQL will fully evaluate COUNT() before evaluating further. I don't believe MySQL caches the COUNT(*) result when InnoDB is being used.
Your second query results in only one row being read, furthermore an index is used (assuming id is part of one). Look at the documentation of your driver to find out how to check whether any rows have been returned.
By the way, the id field may be omitted from the query (MySQL will use an arbitrary index):
SELECT 1 FROM table LIMIT 1;
However, I think the simplest and most performant solution is the following (as indicated in Gordon's answer):
SELECT EXISTS (SELECT 1 FROM table);
EXISTS returns 1 if the subquery returns any rows, otherwise 0. Because of this semantic MySQL can optimize the execution properly.
Any fields listed in the subquery are ignored, thus 1 or * is commonly written.
See the MySQL Manual for more info on the EXISTS keyword and its use.
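Putting the EXISTS check together with the "populate if empty" step from the question could look roughly like this. It is a sketch using Python's sqlite3 module rather than a MySQL driver (an assumption), and the settings table, the is_empty helper and the default row are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; "settings" is a made-up table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE settings (id INTEGER PRIMARY KEY, name TEXT)")


def is_empty(connection):
    """Return True when the settings table has no rows."""
    # EXISTS yields 1 as soon as the subquery produces a row, otherwise 0.
    (has_rows,) = connection.execute(
        "SELECT EXISTS (SELECT 1 FROM settings)"
    ).fetchone()
    return has_rows == 0


was_empty = is_empty(conn)
if was_empty:
    # Populate the default data only when nothing is there yet.
    conn.execute("INSERT INTO settings (name) VALUES ('default')")
now_empty = is_empty(conn)

print(was_empty, now_empty)
```

The query always returns exactly one row, so there is no need to check whether the result set has a next row.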
It is better to use the second method, or just EXISTS. Specifically, something like:
if exists (select id from table)
should be the fastest way to do what you want. You don't need the limit; the SQL engine takes care of that for you.
By the way, never put identifiers (table and column names) in single quotes.

Two similar MySQL queries give different results

I have a database that holds readings for devices. I am trying to write a query that can select the latest reading from a device. I have two queries that are seemingly the same and that I'd expect to give the same results; however they do not. The queries are as follows:
First query:
select max(datetime), reading
from READINGS
where device_id = '1234567890'
Second query:
select datetime, reading
from READINGS
where device_id = '1234567890' and datetime = (select max(datetime)
from READINGS
where device_id = '1234567890')
They both give different results for the reading attribute. The second one is the one that gives the right result, but why does the first give something different?
This is MySQL behaviour at work. When you use grouping, the columns you select must either appear in the group by or be an aggregate function, e.g. min(), max(). Mixing aggregates and normal columns is not allowed in most other database flavours.
The first query will just return the first reading in each group (first in the sense of where it appears on the file system), which is most likely wrong.
The second query correlates the reading with the maximum timestamp, leading to the correct result.
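Here is a small sketch of the second query on made-up data, so the "latest reading" can be checked by eye. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption), and the sample rows are invented. The first query from the question is deliberately not run here, because SQLite resolves the bare reading column differently from MySQL and would not reproduce the problem. The ORDER BY ... LIMIT 1 variant at the end is an alternative not mentioned in the thread.

```python
import sqlite3

# SQLite stands in for MySQL here; the readings below are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE READINGS (device_id TEXT, datetime TEXT, reading REAL)"
)
conn.executemany(
    "INSERT INTO READINGS VALUES (?, ?, ?)",
    [
        ("1234567890", "2020-01-01 10:00:00", 1.5),
        ("1234567890", "2020-01-02 10:00:00", 2.5),
        ("1234567890", "2020-01-01 12:00:00", 0.5),
        ("other", "2020-01-03 10:00:00", 9.5),
    ],
)
device = "1234567890"

# The question's second query: tie the reading to the row with the max datetime.
latest = conn.execute(
    """
    SELECT datetime, reading
    FROM READINGS
    WHERE device_id = ?
      AND datetime = (SELECT MAX(datetime) FROM READINGS WHERE device_id = ?)
    """,
    (device, device),
).fetchall()

# Alternative (not from the thread): sort newest first and keep one row.
latest_by_sort = conn.execute(
    """
    SELECT datetime, reading
    FROM READINGS
    WHERE device_id = ?
    ORDER BY datetime DESC
    LIMIT 1
    """,
    (device,),
).fetchall()

print(latest)
print(latest_by_sort)
```

Both forms return the reading that belongs to the newest row. If two rows share the newest datetime, the subquery form returns both while the LIMIT 1 form returns only one of them.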
It is because you are not using a GROUP BY reading clause, which you should be using in both queries.
This is normal on MySQL. See the documentation on this:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
Also, read http://dev.mysql.com/doc/refman/5.0/en/group-by-hidden-columns.html
You can use the EXPLAIN and EXPLAIN EXTENDED commands to learn more about your queries.