I have always had the understanding that you use SELECT to select columns from a table. However, I was thrown off when I saw SELECT LAST_INSERT_ID(). I understand what it does... but I don't understand how we can simply just ask for the last inserted id like that. Isn't it true that the SELECT keyword expects to see column names immediately afterwards... so how does that function call satisfy that requirement?
The SELECT statement normally works with a FROM clause to select columns -- and expressions on columns and constants -- from rows in a table.
Without the FROM clause, a SELECT simply evaluates the expressions and returns one row. The function LAST_INSERT_ID() is simply an expression that returns a value, so:
SELECT LAST_INSERT_ID()
returns a result set with a single row and a single column (the column has no name of its own; MySQL labels it with the expression text unless you add an alias).
Some databases do not like the idea of a SELECT without a FROM. Oracle is one of them. It requires a FROM clause and provides a table called dual, with one column and one row, for this purpose. MySQL also supports dual, so you could write:
SELECT LAST_INSERT_ID()
FROM dual;
This is handy if you want to include a WHERE clause with the SELECT (the WHERE requires a FROM in MySQL).
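To see the "expressions only, one row back" behaviour in action, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table t is hypothetical, and SQLite's last_insert_rowid() plays the role of MySQL's LAST_INSERT_ID(); the two functions are not identical in every detail.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.execute("INSERT INTO t (name) VALUES ('a')")
conn.execute("INSERT INTO t (name) VALUES ('b')")

# No FROM clause: the SELECT just evaluates its expressions and returns one row.
rows = conn.execute("SELECT last_insert_rowid(), 1 + 1").fetchall()
print(rows)  # [(2, 2)]
```

The function call is treated like any other expression in the select list, exactly as 1 + 1 is.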
In MySQL, I observed that a statement that uses an aggregate function in the SELECT list gets executed even though there is no GROUP BY clause. Other RDBMS products, like SQL Server, throw an error if we do so.
For example, SELECT col1,col2,sum(col3) FROM tbl1; gets executed without any error and returns the first row's values of col1 and col2 and the sum of all values of col3. The result of the above query is a single row.
Can anyone please tell me why this happens with MySQL?
Thanks in advance!!
It's by design - it's one of many extensions to the standard that MySQL permits.
For a query like SELECT name, MAX(age) FROM t; the reference documentation says:
Without GROUP BY, there is a single group and it is indeterminate which name value to choose for the group
See the documentation on group by handling for more information.
The ONLY_FULL_GROUP_BY setting controls this behavior; see 5.1.7 Server SQL Modes. Enabling it disallows queries whose select list mixes aggregate functions with nonaggregated columns that are neither named in GROUP BY nor functionally dependent on the GROUP BY columns. It is enabled by default from MySQL version 5.7.5.
You have two points in your question:
A SELECT that mixes aggregated and non-aggregated columns (where the latter are not listed in GROUP BY).
A SELECT with aggregated columns and no GROUP BY.
The first one is described well in @jpw's answer.
The second one is allowed by the SQL standard, and the result of such a query consists of one row:
a) If T is not a grouped table, then
Case:
i) If the <select list> contains a <set function specification> that contains a reference to a column of T or directly contains a <set function specification> that does not contain an outer reference, then T is the argument or argument source of each such <set function specification> and the result of the <query specification> is a table consisting of 1 row. The i-th value of the row is the value specified by the i-th <value expression>.
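Here, "set function" means aggregate function.
P.S. The result of such a query over an empty table still consists of one row, with NULLs for aggregates such as SUM or MAX (COUNT returns 0). A GROUP BY NULL query over an empty table returns no rows at all; that is the difference between a GROUP BY NULL query and a query without any GROUP BY.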
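The empty-table difference is easy to check. Here is a minimal sketch using Python's built-in sqlite3 module as a stand-in; it assumes MySQL behaves the same way here, which matches the standard text quoted above. The table tbl1 is an empty, hypothetical table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl1 (col3 INTEGER)")  # deliberately left empty

# No GROUP BY: the whole (empty) table is one group, so exactly one row comes back.
no_group = conn.execute("SELECT SUM(col3), COUNT(*) FROM tbl1").fetchall()

# GROUP BY NULL: groups are formed from the rows, and there are no rows, so no groups.
group_null = conn.execute(
    "SELECT SUM(col3), COUNT(*) FROM tbl1 GROUP BY NULL"
).fetchall()

print(no_group)    # [(None, 0)]
print(group_null)  # []
```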
A quote from the MySQL documentation, the page about the aggregate functions:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
If you want a GROUP BY clause on your query then append GROUP BY NULL to it. I cannot speak for other RDBMSs, but in MySQL this is valid syntax. On a non-empty table it works the same as the query without it.
Remarks about your query
A quote from your question:
"For example, SELECT col1,col2,sum(col3) FROM tbl1; gets executed without any error and returns the first row values of col1,col2 and sum of all values of col3."
The part with "the first row" is not something to rely on. It just happens that most of the time you get the first row.
Your query selects the columns col1 and col2, which are neither aggregate values nor functionally dependent on columns in a GROUP BY clause (there is none). The query is not valid according to the SQL standard. MySQL allows it, but the values it returns for those columns are indeterminate, and the documentation about the handling of GROUP BY clearly states that:
... the server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate...
Can we write this query without using sub-query?
select * from test where EmpTran in
(select max(EmpTran) from test);
I tried the code below, but it returns an empty set.
I read that 'in absence of group by, entire data is taken as a single group'. If that's the case, the query should return the same result as the query above.
select EmpTran,EmpName from test
having EmpTran = max(EmpTran);
Sample data:
create table test(EmpName varchar(10),EmpTran int);
insert into test values('Ans',100);
insert into test values('Sam',50);
insert into test values('Kar',150);
insert into test values('Sar',200);
insert into test values('Raj',200);
The second query doesn't work because as soon as you use an aggregation function anywhere in the query, it causes the rows to be aggregated. Since you don't have GROUP BY, everything is aggregated into a single row in the result set (just as you quoted: in absence of group by, entire data is taken as a single group). In this result set, EmpTran and EmpName are taken from arbitrary rows in the table (they might not even be from the same row).
HAVING then filters this result set. If the selected value of EmpTran doesn't match MAX(EmpTran), the row is removed from the result set and you get an empty result.
The order of processing is:
Use WHERE to select the rows to put in the result set.
Aggregate the result set if necessary.
Use HAVING to filter the aggregated result set.
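The processing order above can be observed with a query that does have well-defined results. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL; it reuses the sample data from the question (with the column named EmpTran, as in the queries) and adds a GROUP BY so that the output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (EmpName TEXT, EmpTran INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("Ans", 100), ("Sam", 50), ("Kar", 150), ("Sar", 200), ("Raj", 200)],
)

# HAVING sees the aggregated groups: only the 200 group has more than one row.
having_only = conn.execute(
    "SELECT EmpTran, COUNT(*) FROM test GROUP BY EmpTran HAVING COUNT(*) > 1"
).fetchall()

# WHERE runs before aggregation: with Raj removed first, no group has two rows left.
where_then_having = conn.execute(
    "SELECT EmpTran, COUNT(*) FROM test WHERE EmpName <> 'Raj' "
    "GROUP BY EmpTran HAVING COUNT(*) > 1"
).fetchall()

print(having_only)        # [(200, 2)]
print(where_then_having)  # []
```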
I don't think there's a way to do this without a subquery in MySQL 5.x. In MySQL 8.x you can do it with a window function (I'm not familiar with these, so I'm not going to show it in my answer).
As Barmar has already explained, your second query won't work because finding the max of a column requires a formal separate subquery. This was the case for MySQL versions earlier than 8.0. Starting with MySQL 8.0, which introduced window functions, we could try something like this:
SELECT *
FROM
(
SELECT *, MAX(EmpTran) OVER () max_val
FROM test
) t
WHERE EmpTran = max_val;
Demo
The demo is in SQL Server, because Rextester does not yet support MySQL 8+. But, it should run on any database which implements the ANSI standard for window functions.
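As a quick local check, the sketch below runs both the original subquery form and the window-function form against the sample data and compares them. It uses Python's built-in sqlite3 module rather than MySQL, and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (EmpName TEXT, EmpTran INTEGER)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?)",
    [("Ans", 100), ("Sam", 50), ("Kar", 150), ("Sar", 200), ("Raj", 200)],
)

# Original form: scalar subquery computing the maximum.
with_subquery = conn.execute(
    "SELECT EmpName, EmpTran FROM test "
    "WHERE EmpTran IN (SELECT MAX(EmpTran) FROM test) ORDER BY EmpName"
).fetchall()

# Window form: MAX(...) OVER () attaches the overall maximum to every row.
with_window = conn.execute(
    "SELECT EmpName, EmpTran FROM ("
    "  SELECT *, MAX(EmpTran) OVER () AS max_val FROM test"
    ") t WHERE EmpTran = max_val ORDER BY EmpName"
).fetchall()

print(with_subquery)  # [('Raj', 200), ('Sar', 200)]
print(with_window)    # [('Raj', 200), ('Sar', 200)]
```

Note that the window version still needs a derived table, because the window result cannot be referenced in the WHERE clause of the same query level.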
Is it possible to write a SELECT statement, which returns dataset with zero rows and zero columns?
A dataset will always have at least 1 column, even if it contains no data.
SELECT NULL;
EDIT:
As pointed out by @eggyal, the above syntax will return one row containing NULL.
His query, select null from dual where false;, won't return a row.
Not possible in my opinion. You will get at least one column, but no rows.
Select null from yourTable where 1 = 2;
This works for postgresql:
create table test22 ();
select * from test22;
A condition that is always false, such as WHERE 1=2, is normally used for creating an empty table from an existing table:
CREATE TABLE NEW_TABLE_NAME AS
SELECT *
FROM EXISTING_TABLE_NAME
where 1=2
No, but it is possible to return a query with no rows. In order to do this without referencing any tables, you can use a subquery:
SELECT NULL FROM (SELECT NULL) AS temp WHERE false;
This query will have one (empty) column, but no rows.
I've used the above construct when there is a query that is different in different cases, followed by a code loop that iterates through the results, and under some conditions you want to make it skip the loop. Replacing the query with the one above is a way of returning empty results and thus skipping the loop without an if block. Because the query contains no table names, that aspect of the code never needs to be changed, and for this reason I prefer it to adding a condition like WHERE false in an existing query.
I prefer this solution to the more concise one referencing dual because that construct is not supported in PostgreSQL; this solution works with any backend that supports subqueries.
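Here is a minimal sketch of the loop-skipping use described above, using Python's built-in sqlite3 module as the backend. The condition is written as 1 = 0 instead of false, purely as an assumption to keep it working on older SQLite builds that lack the false keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table-free query that yields one column and zero rows.
EMPTY_QUERY = "SELECT NULL FROM (SELECT NULL) AS temp WHERE 1 = 0"

cursor = conn.execute(EMPTY_QUERY)
column_count = len(cursor.description)  # the result set still has a shape
rows = cursor.fetchall()

iterations = 0
for _row in rows:  # the loop body never runs, so no separate if block is needed
    iterations += 1

print(column_count, rows, iterations)  # 1 [] 0
```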
This is a simple question about efficiency specifically related to the MySQL implementation. I want to just check if a table is empty (and if it is empty, populate it with the default data). Would it be best to use a statement like SELECT COUNT(*) FROM `table` and then compare to 0, or would it be better to do a statement like SELECT `id` FROM `table` LIMIT 0,1 then check if any results were returned (the result set has next)?
Although I need this for a project I am working on, I am also interested in how MySQL works with those two statements and whether the reason people seem to suggest using COUNT(*) is because the result is cached or whether it actually goes through every row and adds to a count as it would intuitively seem to me.
You should definitely go with the second query rather than the first.
When using COUNT(*), MySQL is scanning at least an index and counting the records. Even if you would wrap the call in a LEAST() (SELECT LEAST(COUNT(*), 1) FROM table;) or an IF(), MySQL will fully evaluate COUNT() before evaluating further. I don't believe MySQL caches the COUNT(*) result when InnoDB is being used.
Your second query results in only one row being read, furthermore an index is used (assuming id is part of one). Look at the documentation of your driver to find out how to check whether any rows have been returned.
By the way, the id field may be omitted from the query (MySQL will use an arbitrary index):
SELECT 1 FROM table LIMIT 1;
However, I think the simplest and most performant solution is the following (as indicated in Gordon's answer):
SELECT EXISTS (SELECT 1 FROM table);
EXISTS returns 1 if the subquery returns any rows, otherwise 0. Because of this semantic MySQL can optimize the execution properly.
Any fields listed in the subquery are ignored, thus 1 or * is commonly written.
See the MySQL Manual for more info on the EXISTS keyword and its use.
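For the original use case (populate the table only if it is empty), the EXISTS check might be wrapped like this. It is a sketch only: it uses Python's built-in sqlite3 module instead of a MySQL driver, and the items table and the table_is_empty helper are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")


def table_is_empty(connection):
    """Return True when the hypothetical items table has no rows."""
    # EXISTS yields 1 as soon as the subquery produces any row, otherwise 0.
    (has_rows,) = connection.execute(
        "SELECT EXISTS (SELECT 1 FROM items)"
    ).fetchone()
    return has_rows == 0


empty_before = table_is_empty(conn)
if empty_before:
    # Populate with default data only when nothing is there yet.
    conn.execute("INSERT INTO items (label) VALUES ('default')")
empty_after = table_is_empty(conn)

print(empty_before, empty_after)  # True False
```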
It is better to do the second method or just exists. Specifically, something like:
if exists (select id from table)
should be the fastest way to do what you want. You don't need the limit; the SQL engine takes care of that for you.
By the way, never put identifiers (table and column names) in single quotes.
I'm trying to do a rather complicated SELECT computation that I will generalize:
Main query is a wildcard select for a table
One subquery does a COUNT() of all items based on a condition (this works fine)
Another subquery does a SUM() of numbers in a column based on another condition. This also works correctly, except when no records meet the conditions, it returns NULL.
I initially wanted to add up the two subqueries, something like (subquery1)+(subquery2) AS total which works fine unless subquery2 is null, in which case total becomes null, regardless of what the result of subquery1 is. My second thought was to try to create a third column that was to be a calculation of the two subqueries (ie, (subquery1) AS count1, (subquery2) AS count2, count1+count2 AS total) but I don't think it's possible to calculate two calculated columns, and even if it were, I feel like the same problem applies.
Does anyone have an elegant solution to this problem outside of just getting the two subquery values and totalling them in my program?
Thanks!
Two issues going on here:
You can't use one column alias in another expression in the same SELECT list.
However, you can establish aliases in a derived table subquery and use them in an outer query.
You can't do arithmetic with NULL, because NULL is not zero.
However, you can "default" NULL to a non-NULL value using the COALESCE() function. This function returns its first non-NULL argument.
Here's an example:
SELECT *, count1+count2 AS total
FROM (SELECT *, COALESCE((subquery1), 0) AS count1,
COALESCE((subquery2), 0) AS count2
FROM ... ) t;
(remember that a derived table must be given a table alias, "t" in this example)
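To make both points concrete, here is a small runnable sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The items table and the two conditions are hypothetical; the second subquery matches no rows, so its SUM() is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (kind TEXT, amount INTEGER)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [("a", 5), ("a", 7)])

COUNT_SQL = "(SELECT COUNT(*) FROM items WHERE kind = 'a')"   # returns 2
SUM_SQL = "(SELECT SUM(amount) FROM items WHERE kind = 'b')"  # no rows, so NULL

# Naive addition: anything plus NULL is NULL.
naive_total = conn.execute(f"SELECT {COUNT_SQL} + {SUM_SQL}").fetchone()[0]

# COALESCE defaults each part to 0 inside a derived table; the outer query
# can then refer to the aliases count1 and count2.
fixed_total = conn.execute(
    "SELECT count1 + count2 AS total FROM ("
    f"  SELECT COALESCE({COUNT_SQL}, 0) AS count1,"
    f"         COALESCE({SUM_SQL}, 0) AS count2"
    ") t"
).fetchone()[0]

print(naive_total, fixed_total)  # None 2
```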
First off, the COALESCE function should help you take care of any null problems.
Could you use a union to merge those two queries into a single result set, then treat it as a subquery for further analysis?
Or maybe I did not completely understand your question?
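I would try (for the second query) something like: SELECT IFNULL(SUM(myColumn), 0) //Please verify syntax on that before you use it, though. In MySQL the two-argument function is IFNULL; ISNULL(expr, 0) is the SQL Server spelling.
This should return 0 instead of null when no rows match. Note that wrapping only the column, as in SUM(IFNULL(myColumn, 0)), handles null values inside the column but still returns null when there are no rows to sum.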
It might be unnecessary to say, but since you're using it inside a program, you might prefer to use program logic to sum the two results (NULL and a number), for portability reasons.
Who knows whether the COALESCE function will be deprecated, or whether another DBMS supports it?