MySQL query / clause execution order

What is the predefined order in which the clauses are executed in MySQL? Is some of it decided at run time, and is this order correct?
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause

The actual execution of MySQL statements is a bit tricky. However, the standard does specify the order of interpretation of elements in the query. This is basically in the order that you specify, although I think HAVING and GROUP BY could come after SELECT:
FROM clause
WHERE clause
SELECT clause
GROUP BY clause
HAVING clause
ORDER BY clause
This is important for understanding how queries are parsed. You cannot use a column alias defined in a SELECT in the WHERE clause, for instance, because the WHERE is parsed before the SELECT. On the other hand, such an alias can be in the ORDER BY clause.
As for actual execution, that is really left up to the optimizer. For instance:
. . .
GROUP BY a, b, c
ORDER BY NULL
and
. . .
GROUP BY a, b, c
ORDER BY a, b, c
both have the effect of the ORDER BY not being executed at all -- and so not executed after the GROUP BY (in the first case, the effect is to remove sorting from the GROUP BY and in the second the effect is to do nothing more than the GROUP BY already does).

This is how you can get a rough idea of how MySQL executes a SELECT query:
DROP TABLE if exists new_table;
CREATE TABLE `new_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`testdecimal` decimal(6,2) DEFAULT NULL,
PRIMARY KEY (`id`));
INSERT INTO `new_table` (`testdecimal`) VALUES ('1234.45');
INSERT INTO `new_table` (`testdecimal`) VALUES ('1234.45');
set @mysqlorder := '';
select @mysqlorder := CONCAT(@mysqlorder," SELECT ") from new_table,(select @mysqlorder := CONCAT(@mysqlorder," FROM ")) tt
JOIN (select @mysqlorder := CONCAT(@mysqlorder," JOIN1 ")) t on ((select @mysqlorder := CONCAT(@mysqlorder," ON1 ")) or rand() < 1)
JOIN (select @mysqlorder := CONCAT(@mysqlorder," JOIN2 ")) t2 on ((select @mysqlorder := CONCAT(@mysqlorder," ON2 ")) or rand() < 1)
where ((select @mysqlorder := CONCAT(@mysqlorder," WHERE ")) or IF(new_table.testdecimal = 1234.45,true,false))
group by (select @mysqlorder := CONCAT(@mysqlorder," GROUPBY ")),id
having (select @mysqlorder := CONCAT(@mysqlorder," HAVING "))
order by (select @mysqlorder := CONCAT(@mysqlorder," ORDERBY "));
select @mysqlorder;
And here is the output of the query above; I hope it helps you figure out how MySQL executes a SELECT query:
FROM JOIN1 JOIN2 WHERE ON2 ON1 ORDERBY GROUPBY SELECT WHERE
ON2 ON1 ORDERBY GROUPBY SELECT HAVING HAVING

It appears that the generalized pattern in Standard SQL for Logical Query Processing Phase is (at least from SQL-92 - starting on p.177) :
from clause
joined table
where clause
group by clause
having clause
query specification (ie. SELECT)
You can find and download newer Standard SQL Standardization documents from here:
https://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F
For MSSQL (since it tends to stay fairly close to standard in my experience) the Logical Query Processing Phase is generally:
FROM <left_table>
ON <join_condition>
<join_type> JOIN <right_table>
WHERE <where_condition>
GROUP BY <group_by_list>
WITH {CUBE | ROLLUP}
HAVING <having_condition>
SELECT
DISTINCT
ORDER BY <order_by_list>
<TOP_specification> <select_list>
From: Chapter 1 of Ben-Gan, Itzik, et al., Inside Microsoft SQL Server 2005: T-SQL Querying, (Microsoft Press)
Also: Ben-Gan, Itzik, et al., Training Kit (Exam 70-461) Querying Microsoft SQL Server 2012 (MCSA), (Microsoft Press)
It should be noted that MySQL can be configured to operate closer to standard as well if desired by setting the SQL Mode (although probably only recommended for fringe cases):
https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html
For MySQL, I searched both MySQL and MariaDB documentation and could find nothing other than the few statements that Gordon Linoff mentioned in passing that were in the MySQL documentation for SELECT. They are:
If ORDER BY occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future version of MySQL.
If LIMIT occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future version of MySQL.
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
From MySQL JOIN Documentation: Natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard
A quick skim of SQL-92, from "from clause" to "query specification", showed that the logic can be conditional depending on how the query is written. I could not find anything in the MySQL or MariaDB documentation (not saying it is not there, I just could not find it), and other articles on MySQL's Logical Query Processing Phase conflicted in their order. So the best way MySQL offers to determine some sort of Logical Query Processing Phase for a specific query (or at least the steps used for join optimization in the query plan) seems to be to trace the query execution as follows (from MySQL documentation "Tracing The Optimizer/Typical Usage"):
# Turn tracing on (it's off by default):
SET optimizer_trace="enabled=on";
SELECT ...; # your query here
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
# possibly more queries...
# When done with tracing, disable it:
SET optimizer_trace="enabled=off";
You can interpret the results by looking at MySQL documentation's Tracing Example.
Basically, it appears that you want to look for "join_optimization" (which says that the from/join statements are being evaluated for the specific query, the specific query being the one stated as "Select#"), then look for "condition_processing: condition", and then "clause_processing: clause". As it says in the MySQL documentation for General Trace Structure:
A trace follows closely the actual execution path: there is a join-preparation object, a join-optimization object, a join-execution object, for each JOIN... It is far from showing everything happening in the optimizer, but we plan to show more information in the future.
Interestingly enough, running a query like the following in MySQL showed that its apparent processing order for query optimization was FROM, WHERE, HAVING, ORDER BY, and then GROUP BY:
SELECT
a.id
, max(a.timestamp)
FROM database.table AS a
LEFT JOIN database.table2 AS b on a.id = b.id
WHERE a.id > 1
GROUP BY a.id HAVING max(a.timestamp) > 0
ORDER BY a.id
I am assuming that since "condition_processing" and "clause_processing" are within the "Select#" group that these are processed before SELECT - which lines up with SQL-99, but it is an assumption.
In terms of operators and variables, Zawodny, Jeremy D., et al., High Performance MySQL, 2nd Edition, (O'Reilly) states that:
The := assignment operator has lower precedence than any other operator, so you have to be careful to parenthesize explicitly.
I mention this only because, when a query that uses one or more variables does not execute as expected, the cause may be the precedence of the assignment operator rather than the order of the Logical Query Processing Phase.

I think the execution order is like this:
(7) SELECT
(8) DISTINCT <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) HAVING <having_condition>
(9) ORDER BY <order_by_condition>
(10) LIMIT <limit_number>[, <offset_number>]
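The numbered order above can be read as a pipeline in which each step only sees what earlier steps produced. The following is a toy model in plain Python, not a description of how MySQL actually executes anything: the rows, column names, and thresholds are made up for illustration, and the JOIN and DISTINCT steps are left out. It shows why a SELECT alias is unavailable to WHERE but available to ORDER BY.

```python
# Toy model of the logical order FROM, WHERE, GROUP BY, HAVING, SELECT,
# ORDER BY, LIMIT. Illustration only: all names and data are hypothetical.
from collections import defaultdict

# (1) FROM orders
rows = [
    {"customer": "a", "amount": 10},
    {"customer": "a", "amount": 30},
    {"customer": "b", "amount": 5},
    {"customer": "c", "amount": 50},
]

# (4) WHERE amount > 5: works on raw rows, the alias "total" does not exist yet
filtered = [r for r in rows if r["amount"] > 5]

# (5) GROUP BY customer
groups = defaultdict(list)
for r in filtered:
    groups[r["customer"]].append(r)

# (6) HAVING SUM(amount) >= 40: works on whole groups
kept = {k: g for k, g in groups.items() if sum(r["amount"] for r in g) >= 40}

# (7) SELECT customer, SUM(amount) AS total: the alias is created here
selected = [
    {"customer": k, "total": sum(r["amount"] for r in g)} for k, g in kept.items()
]

# (9) ORDER BY total DESC: runs after SELECT, so it can use the alias
ordered = sorted(selected, key=lambda r: r["total"], reverse=True)

# (10) LIMIT 1
limited = ordered[:1]

print(ordered)
print(limited)
```

The optimizer is free to reorder the physical work as long as the result matches this logical sequence.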

Related

Can't Set User-defined Variable From MySQL to Excel With ODBC

I have a query that has a user-defined variable set on top of the main query. It's something like this.
set @from_date = '2019-10-01', @end_date = '2019-12-31';
SELECT * FROM myTable
WHERE create_time between @from_date AND @end_date;
It works just fine when I execute it in MySQL Workbench, but when I run it from Excel via MySQL ODBC it always shows an error.
I need that user-defined variable to work in Excel. What am I supposed to change in my query?
The ODBC connector is most likely communicating with MySQL via statements or prepared statements, one statement at a time, and user variables are not supported. A prepared statement would be one way you could bind your date literals. Another option here, given your example, would be to just inline the date literals:
SELECT *
FROM myTable
WHERE create_time >= '2019-10-01' AND create_time < '2020-01-01';
Side note: I expressed the check on create_time, which seems to want to include the final quarter of 2019, using an inequality. The reason is that if create_time is a timestamp/datetime, then using BETWEEN with 31-December on the RHS would only include that day at midnight, and no time after it.
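The boundary effect described in the side note can be reproduced outside the database. This is a small sketch in plain Python with made-up create_time values; it assumes, as the side note does, that a bare date compares as midnight of that day.

```python
from datetime import datetime

# Hypothetical create_time values around the end of the quarter.
create_times = [
    datetime(2019, 12, 30, 9, 0, 0),
    datetime(2019, 12, 31, 0, 0, 0),    # exactly midnight on the last day
    datetime(2019, 12, 31, 15, 30, 0),  # afternoon of the last day
    datetime(2020, 1, 1, 0, 0, 0),      # first instant of the next quarter
]

start = datetime(2019, 10, 1)
last_day = datetime(2019, 12, 31)       # a bare date, treated as midnight
next_quarter = datetime(2020, 1, 1)

# Equivalent of: create_time BETWEEN start AND last_day (inclusive both ends)
between = [t for t in create_times if start <= t <= last_day]

# Equivalent of: create_time >= start AND create_time < next_quarter
half_open = [t for t in create_times if start <= t < next_quarter]

print(len(between), len(half_open))
```

The afternoon row on 31 December is dropped by the BETWEEN form and kept by the half-open range.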
Use a subquery to initialize the variables:
SELECT *
FROM myTable,
( SELECT @from_date := '2019-10-01', @end_date := '2019-12-31' ) init_vars
WHERE create_time between @from_date AND @end_date;
Pay attention:
SELECT is used, not SET;
The assignment operator := is used; in this context = would be treated as a comparison operator and give the wrong result;
The alias (init_vars) can be any name, but it is compulsory.
Each variable is initialized once but may be used many times.
If your query is complex (nested), the variables must be initialized in the innermost subquery, or in the first CTE if the DBMS version supports CTEs. If the query is extremely complex and you cannot determine which subquery is the innermost, check the execution plan to see which table is scanned first, and use its subquery for the variable initialization (but remember that the execution plan may change at any moment).

Initialise user defined variable in mysql with in a subquery

I was going through this answer How do you select every n-th row from mysql. In that I am not able to understand the initialisation in the following subquery.
SELECT
@row := @row +1 AS rownum, [column name]
FROM (
SELECT @row :=0) r, [table name]
How exactly the initialisation of
SELECT @row :=0
is working?
Is some kind of join happening between table ‘r’ and ‘table name’?
If I change above query as below, would there be any difference in the performance?
SET @row = 0;
SELECT
@row := @row +1 AS rownum, [column name] FROM [table name]
Please share your thoughts.
Initializing the user-defined variable in a separate statement, so that two statements are used, will give equivalent performance.
Instead of the SET statement, we could do
SELECT @row := 0
which would achieve the same result, assigning a value to the user-defined variable @row. The difference would be that MySQL needs to prepare a resultset to be returned to the client. We avoid that with the SET statement, which doesn't return a resultset.
With two separate statement executions, there's the overhead of sending an extra statement: parsing tokens, syntax check, semantic check, ... and returning the status to the client. It's a small amount of overhead. We aren't going to notice it onesie-twosie.
So performance will be equivalent.
I strongly recommend ditching the oldschool comma syntax for join operation, and using the JOIN keyword instead.
Consider the query:
SELECT t.foo
FROM r
CROSS
JOIN t
ORDER BY t.foo
What happens when the table r is guaranteed to contain exactly one row?
The query is equivalent to:
SELECT t.foo
FROM t
ORDER BY t.foo
We can use a SELECT query in place of a table or view. Consider for example:
SELECT v.foo
FROM ( SELECT t.foo
FROM t
) v
Also consider what happens with this query:
SELECT @foo := 0
There is no FROM clause (or Oracle-style FROM dual), so the query will return a single row. The expression in the SELECT list is evaluated... the constant value 0 is assigned to the user-defined variable @foo.
Consider this query:
SELECT 'bar'
FROM ( SELECT @foo := 0 ) r
Before the outer query runs, the SELECT inside the parens is executed. (MySQL calls it a "derived table" but more generically it's an inline view definition.) The net effect is that the constant 0 is assigned to the user-defined variable, and a single row is returned. So the outer query returns a single row.
If we understand that, we have what we need to understand what is happening here:
SELECT t.mycol
FROM ( SELECT @row := 0 ) r
CROSS
JOIN mytable t
ORDER
BY t.mycol
Inline view r is evaluated, the SELECT returns a single row, and the value 0 is assigned to the user-defined variable @row. Since r is guaranteed to return a single row, we know that the Cartesian product (cross join) with mytable will result in one row for each row in mytable, effectively yielding just a copy of mytable.
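The claim that a cross join with a one-row inline view leaves the row set unchanged is easy to check. This is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no user-defined variables, so the inline view here just selects a constant (the column name init is made up), and the sample rows are hypothetical. Only the join behavior is being demonstrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycol TEXT)")
conn.executemany(
    "INSERT INTO mytable (mycol) VALUES (?)", [("c",), ("a",), ("b",)]
)

# Plain query against the table.
plain = conn.execute(
    "SELECT t.mycol FROM mytable t ORDER BY t.mycol"
).fetchall()

# Same query with a single-row inline view cross-joined in.
crossed = conn.execute(
    "SELECT t.mycol FROM (SELECT 0 AS init) r CROSS JOIN mytable t "
    "ORDER BY t.mycol"
).fetchall()

print(plain == crossed)
```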
To answer the question that wasn't asked:
The benefit of doing the initialization within the statement rather than a separate statement is that we now have a single statement that stands alone. It knocks out a dependency i.e. doesn't require a separate execution of a SET statement to assign the user defined variable. Which also cuts out a roundtrip to the database to prepare and execute a separate statement.

How to find rows in SQL that end with the same string?

I have a question similar to the one found here: How to find rows in SQL that start with the same string (similar rows)?, and this solution works in MySQL 5.6 but not 5.7.
I have a table (t) with multiple columns, the important ones being id and filepath, and what I am trying to accomplish is retrieving all the file paths which have the same last 5 characters. The following works in MySQL 5.6, and the second SELECT works fine in 5.7:
SELECT id, filepath FROM t
WHERE SUBSTRING(filepath, -5) IN
(
SELECT SUBSTRING(filepath, -5)
FROM t
GROUP BY SUBSTRING(filepath, -5)
HAVING COUNT(*) > 1
)
But when I try to run it on 5.7 I get the error
Expression #1 of HAVING clause is not in GROUP BY clause and contains
nonaggregated column 't.filepath' which is not functionally dependent on
columns in GROUP BY clause; this is incompatible with
sql_mode=only_full_group_by
Sample data:
id filepath
1 /Desktop/file1.txt
2 /Desktop/file2.txt
3 /Desktop/file1.txt
and I would want to return the rows with id 1 and 3. How can I fix this for MySQL5.7?
EDIT: Also can anybody point me in the right direction for the SQL to remove the duplicates? So I would want to remove the entry for id 3 but keep the entry for id 1 and 2.
Please read the mysql documentation on the subject GROUP BY and sql_mode only_full_group_by (like your error message says):
https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
I think changing the inner query to this might fix the problem:
SELECT SUBSTRING(filepath, -5) AS fpath
FROM t
GROUP BY fpath
HAVING COUNT(fpath) > 1
Edit:
As to your question of why adding the "AS fpath" works:
Adding the alias "fpath" is just a clean way to do this. The point of ONLY_FULL_GROUP_BY is that each column you use in the SELECT, HAVING, or ORDER BY must also be in the GROUP BY, be wrapped in an aggregate function, or be functionally dependent on the GROUP BY columns.
So I added the fpath-alias for multiple reasons:
For performance: The query you wrote had SUBSTRING(filepath, -5) twice, which is bad for performance. MySQL has to execute that SUBSTRING call twice, while in my case it has to do it only once (per row).
To fix the group-by issue: You had COUNT(*) in the HAVING, but "*" was not in your GROUP BY statement (I'm not even sure whether that would be possible). You had to count "something", so since "fpath" was in your SELECT and in your GROUP BY, using that as your COUNT() would fix the problem.
I prefer not to put subqueries in an IN() predicate because MySQL tends to run the subquery many times.
You can write the query differently to put the subquery in the FROM clause as a derived table. That will make MySQL run the subquery just once.
SELECT id, filepath
FROM (
SELECT SUBSTRING(filepath, -5) AS suffix, COUNT(*) AS count
FROM t
GROUP BY suffix
HAVING count > 1
) AS t1
JOIN t AS t2 ON SUBSTRING(t2.filepath, -5) = t1.suffix
This is bound to do a table-scan though, so it's going to be a costly query. It can't use an index when doing a substring comparison like that.
To optimize this, you might create a virtual column with an index.
ALTER TABLE t
ADD COLUMN filepath_last VARCHAR(10) AS (SUBSTRING_INDEX(filepath, '/', -1)),
ADD KEY (filepath_last);
Then you can query it like this, and at least the subquery uses an index:
SELECT id, filepath
FROM (
SELECT filepath_last, COUNT(*) AS count
FROM t
GROUP BY filepath_last
HAVING count > 1
) AS t1
STRAIGHT_JOIN t AS t2 ON t2.filepath_last = t1.filepath_last
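The first derived-table query in this answer (the one joining on the last five characters) can be tried against the sample data from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, so it uses SQLite's substr() where MySQL would use SUBSTRING(), and it says nothing about ONLY_FULL_GROUP_BY, which is a MySQL setting. It only checks that the query shape returns ids 1 and 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, filepath TEXT)")
conn.executemany(
    "INSERT INTO t (id, filepath) VALUES (?, ?)",
    [
        (1, "/Desktop/file1.txt"),
        (2, "/Desktop/file2.txt"),
        (3, "/Desktop/file1.txt"),
    ],
)

# Derived table t1 finds suffixes that occur more than once;
# the join then pulls back every row carrying one of those suffixes.
query = """
SELECT t2.id, t2.filepath
FROM (
    SELECT substr(filepath, -5) AS suffix, COUNT(*) AS cnt
    FROM t
    GROUP BY substr(filepath, -5)
    HAVING COUNT(*) > 1
) AS t1
JOIN t AS t2 ON substr(t2.filepath, -5) = t1.suffix
ORDER BY t2.id
"""
duplicates = conn.execute(query).fetchall()
print(duplicates)
```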
The solution that ended up working for me was found here: Disable ONLY_FULL_GROUP_BY
I ran SELECT @@sql_mode then SET @@sql_mode = followed by a string containing all the values returned by the first query except for only_full_group_by, but I'm still interested in how this is to be accomplished without changing the SQL settings.

Table statistics (aka row count) over time

I'm preparing a presentation about one of our apps and was asking myself the following question: "based on the data stored in our database, how much growth has happened over the last couple of years?"
So I'd like to basically show in one output/graph how much data we've been storing since the beginning of the project.
my current query looks like this:
SELECT DATE_FORMAT(created,'%y-%m') AS label, COUNT(id) FROM table GROUP BY label ORDER BY label;
the example output would be:
11-03: 5
11-04: 200
11-05: 300
unfortunately, this query is missing the accumulation. i would like to receive the following result:
11-03: 5
11-04: 205 (200 + 5)
11-05: 505 (200 + 5 + 300)
is there any way to solve this problem in mysql without the need of having to call the query in a php-loop?
Yes, there's a way to do that. One approach uses MySQL user-defined variables (and behavior that is not guaranteed):
SELECT s.label
, s.cnt
, @tot := @tot + s.cnt AS running_subtotal
FROM ( SELECT DATE_FORMAT(t.created,'%y-%m') AS `label`
, COUNT(t.id) AS cnt
FROM articles t
GROUP BY `label`
ORDER BY `label`
) s
CROSS
JOIN ( SELECT @tot := 0 ) i
Let's unpack that a bit.
The inline view aliased as s returns the same resultset as your original query.
The inline view aliased as i returns a single row. We don't really care what it returns (except that we need it to return exactly one row because of the JOIN operation); what we care about is the side effect, a value of zero gets assigned to the @tot user variable.
Since MySQL materializes the inline view as a derived table, before the outer query runs, that variable gets initialized before the outer query runs.
For each row processed by the outer query, the value of cnt is added to #tot.
The return of s.cnt in the SELECT list is entirely optional, it's just there as a demonstration.
N.B. The MySQL reference manual specifically states that this behavior of user-defined variables is not guaranteed.
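Since the user-variable behavior is not guaranteed, a window function is an alternative that avoids it where one is available: MySQL supports SUM() OVER from version 8.0. The sketch below is not the approach from this answer. It runs the window query through Python's built-in sqlite3 module as a stand-in (this assumes the bundled SQLite is 3.25 or newer), using a hypothetical pre-aggregated table named monthly that holds the example counts from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the per-month counts from the question.
conn.execute("CREATE TABLE monthly (label TEXT, cnt INTEGER)")
conn.executemany(
    "INSERT INTO monthly (label, cnt) VALUES (?, ?)",
    [("11-03", 5), ("11-04", 200), ("11-05", 300)],
)

# SUM() OVER (ORDER BY ...) accumulates cnt from the first row up to
# the current row, which gives the running total per month.
rows = conn.execute(
    "SELECT label, cnt, SUM(cnt) OVER (ORDER BY label) AS running_total "
    "FROM monthly ORDER BY label"
).fetchall()

for label, cnt, running_total in rows:
    print(label, cnt, running_total)
```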

Convert query from MySQL to SQL Server

I am trying to convert below MySQL query to SQL Server.
SELECT
@a := @a + 1 serial_number,
a.id,
a.file_assign_count
FROM usermaster a,
workgroup_master b,
(
SELECT @a := 0
) AS c
WHERE a.wgroup = b.id
AND file_assign_count > 0
I understand that := operator in MySQL assigns value to a variable & returns the value immediately. But how can I simulate this behavior in SQL Server?
SQL Server 2005 and later support the standard ROW_NUMBER() function, so you can do it this way:
SELECT ROW_NUMBER() OVER (ORDER BY xxxxx) AS serial_number,
a.id,
a.file_assign_count
FROM usermaster a
JOIN workgroup_master b ON a.wgroup = b.id
WHERE file_assign_count > 0
Re your comments: I edited the above to show the OVER clause. The row-numbering only has any meaning if you define the sort order. Your original query didn't do this, which means it's up to the RDBMS what order the rows are returned in.
But when using ROW_NUMBER() you must be specific. Where I put xxxxx, you would put a column or expression to define the sort order. See explanation and examples in the documentation.
The subquery setting @a:=0 is only for initializing that variable, and it doesn't need to be joined into the query anyway. It's just a style that some developers use. This is not needed in SQL Server, because you don't need the user variable at all when you can use ROW_NUMBER() instead.
If the SQL Server database is returning two rows where your MySQL database returned one row, the data must be different, because neither ROW_NUMBER() nor the MySQL user variables would limit the number of rows returned.
P.S.: please use JOIN syntax. It has been standard since 1992.