I was going through this answer to "How do you select every n-th row from MySQL". In it, I am not able to understand the initialisation in the following subquery:
SELECT
@row := @row + 1 AS rownum, [column name]
FROM (
SELECT @row := 0) r, [table name]
How exactly does the initialisation
SELECT @row := 0
work?
Is some kind of join happening between r and [table name]?
If I change the above query as below, would there be any difference in performance?
SET @row = 0;
SELECT
@row := @row + 1 AS rownum, [column name] FROM [table name]
Please share your thoughts.
Using two statements, initializing the user-defined variable in a separate statement, will give equivalent performance.
Instead of the SET statement, we could do
SELECT @row := 0
which would achieve the same result, assigning a value to the user-defined variable @row. The difference would be that MySQL needs to prepare a resultset to be returned to the client. We avoid that with the SET statement, which doesn't return a resultset.
With two separate statement executions, there's the overhead of sending an extra statement: parsing tokens, syntax check, semantic check, ... and returning the status to the client. It's a small amount of overhead, and we aren't going to notice it for one or two executions.
So performance will be equivalent.
I strongly recommend ditching the old-school comma syntax for the join operation and using the JOIN keyword instead.
Consider the query:
SELECT t.foo
FROM r
CROSS
JOIN t
ORDER BY t.foo
What happens when the table r is guaranteed to contain exactly one row?
The query is equivalent to:
SELECT t.foo
FROM t
ORDER BY t.foo
We can use a SELECT query in place of a table or view. Consider for example:
SELECT v.foo
FROM ( SELECT t.foo
FROM t
) v
Also consider what happens with this query:
SELECT @foo := 0
There is no FROM clause (or Oracle-style FROM dual), so the query will return a single row. The expression in the SELECT list is evaluated... the constant value 0 is assigned to the user-defined variable @foo.
Consider this query:
SELECT 'bar'
FROM ( SELECT @foo := 0 ) r
Before the outer query runs, the SELECT inside the parens is executed. (MySQL calls it a "derived table", but more generically it's an inline view definition.) The net effect is that the constant 0 is assigned to the user-defined variable, and a single row is returned. So the outer query returns a single row.
If we understand that, we have what we need to understand what is happening here:
SELECT t.mycol
FROM ( SELECT @row := 0 ) r
CROSS
JOIN mytable t
ORDER
BY t.mycol
Inline view r is evaluated, the SELECT returns a single row, and the value 0 is assigned to the user-defined variable @row. Since r is guaranteed to return a single row, we know that the Cartesian product (cross join) with mytable will result in one row for each row in mytable, effectively yielding just a copy of mytable.
To answer the question that wasn't asked:
The benefit of doing the initialization within the statement rather than in a separate statement is that we now have a single statement that stands alone. It knocks out a dependency, i.e. it doesn't require a separate execution of a SET statement to assign the user-defined variable, which also cuts out a round trip to the database to prepare and execute a separate statement.
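Putting that together, here is a minimal sketch of the every-n-th-row pattern the original question came from, picking every 5th row. The table name mytable, the column mycol, and the step of 5 are placeholders, and it relies on the same not-guaranteed user-variable behavior discussed above:
SELECT numbered.rownum
     , numbered.mycol
  FROM ( SELECT @row := @row + 1 AS rownum
              , t.mycol
           FROM ( SELECT @row := 0 ) r
          CROSS
           JOIN mytable t
          ORDER BY t.mycol
       ) numbered
 WHERE MOD(numbered.rownum, 5) = 0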
Related
So I found this code snippet here on SO. It essentially fakes a "row_number()" function for MySQL. It executes quite fast, which I like and need, but I am unable to tack on a where clause at the end.
select
@i := @i + 1 as iterator, t.*
from
big_table as t, (select @i := 0) as foo
Adding in where iterator = 875 yields an error.
The snippet above executes in about .0004 seconds. I know I can wrap it within another query as a subquery, but then it becomes painfully slow.
select * from (
select
@i := @i + 1 as iterator, t.*
from
big_table as t, (select @i := 0) as foo) t
where iterator = 875
The snippet above takes over 10 seconds to execute.
Any way to speed this up?
In this case you could use LIMIT in place of the WHERE:
select
@i := @i + 1 as iterator, t.*
from
big_table as t, (select @i := 874) as foo
LIMIT 875,1
Since you only want record 875, this would be fast.
Could you please try this?
Increasing the value of the variable in the where clause and checking it against 875 would do the trick.
SELECT
t.*
FROM
big_table AS t,
(SELECT @i := 0) AS foo
WHERE
(@i := @i + 1) = 875
LIMIT 1
Caution:
Unless you specify an ORDER BY clause, it's not guaranteed that you will get the same row every time for the desired row number. MySQL doesn't ensure this, since the data in a table is an unordered set.
So, if you specify an order on some field, then you don't need a user-defined variable to get that particular record:
SELECT
big_table.*
FROM big_table
ORDER BY big_table.<some_field>
LIMIT 875,1;
You can significantly improve performance if some_field is indexed.
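For example (the index name here is made up for illustration):
CREATE INDEX idx_big_table_some_field ON big_table (some_field);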
There are several good posts on how to number rows within groups with MySQL, but how does the code actually work? I'm unclear on what MySQL evaluates first in the code below.
For instance, placing @yearqt := yearqt as bloc before the IF() call produces different results, and I'm unclear on the role of the s1 subquery in initializing the @ variables: when are they updated as MySQL runs through the data rows? Is the order by statement run before the select?
The code below selects three random records per yearqt group. There may be other ways to do this, but the question pertains to how the code works, not how I could do this differently or whether I can do this more efficiently. Thank you.
select * from (
select customer_id , yearqt , a ,
IF(@yearqt = yearqt , @rownum := @rownum + 1 , @rownum := 1) as rownum ,
@yearqt := yearqt as bloc
from
( select customer_id , yearqt , rand(123) as a from tbl
order by rand(123)
) a join ( select @rownum := 0 , @yearqt := '' ) s1
order by yearqt
) s2
where rownum <= 3
order by bloc
This question is related to how the engine retrieves SQL SELECT query results. The order is roughly the following:
Calculate explain plan
Calculate sets and join them using the plan's directives (FROM / JOIN phase)
Apply WHERE clause
Apply GROUP BY/HAVING clause
Apply ORDER BY clause
Projection phase: every row returned is ordered and can now be 'displayed'.
So, with respect to the variables, you now understand why there's a subquery to initialize them. This subquery is evaluated only once, at the beginning of the process.
After that, the projection phase seems to treat each selected attribute in the order you decided, which is why putting @yearqt := yearqt as bloc one attribute earlier changes the outcome of the next/previous IF statement. Since each row is projected once, any work you're doing on the variables will be done as many times as there are rows in the final resultset.
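To make that concrete, here is a simplified sketch of just the part that changes (same table and variables as the question; the random ordering from the original query is left out for brevity). With the assignment moved ahead of the IF(), the comparison @yearqt = yearqt is always true, so @rownum never resets at a group boundary:
SELECT customer_id
     , @yearqt := yearqt as bloc
     , IF(@yearqt = yearqt , @rownum := @rownum + 1 , @rownum := 1) as rownum
  FROM tbl
  JOIN ( select @rownum := 0 , @yearqt := '' ) s1
 ORDER BY yearqt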
The purpose of this
join ( select @rownum := 0 , @yearqt := '' ) s1
is to initialize the user-defined variables at the beginning of statement execution. Because this is a rowsource for the outer query (MySQL calls it a derived table) this will be executed BEFORE the outer query runs. We aren't really interested in what this query returns, except that it returns a single row, because of the JOIN operation.
So this inline view s1 could be omitted from the query and be replaced by a couple of SET statements that are executed immediately before the query:
SET @rownum := 0;
SET @yearqt := '';
But then we'd have three separate statements to run, and we'd get different output from the query if those statements weren't run, or if those variables were set to some other value. By including this in the query itself, it's a single statement, and we remove the dependency on separate SET statements.
This is the query that's really doing the work, whittled down to just the two expressions that matter in this case
SELECT IF(@yearqt = t.yearqt , @rownum := @rownum + 1 , @rownum := 1) as rownum
, @yearqt := t.yearqt as bloc
FROM ( ... ) t
ORDER BY t.yearqt
Some key points that make this "work":
MySQL processes the expressions in the SELECT list in the order that they appear in the SELECT list.
MySQL processes the rows in the order specified in the ORDER BY.
The references to user-defined variables are evaluated for each row, not once at the beginning of the statement.
Note that the MySQL Reference Manual points out that this behavior is not guaranteed. (So, it may change in a future release.)
So, the processing of that can be described as:
for the first expression:
compare the value of the yearqt column from the current row with the current value of the @yearqt user-defined variable
set the value of the @rownum user-defined variable
return the result of the IF() expression in the resultset
for the second expression:
set the value of the @yearqt user-defined variable to the value of the yearqt column from the current row
return the value of the yearqt column in the resultset
The net effect is that for each row processed, we're comparing the value in the yearqt column to the value from the "previously" processed row, and we're saving the current value to compare to the next row.
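As a small worked illustration with hypothetical yearqt values, processed in ORDER BY order:
yearqt   @yearqt before row   rownum returned   bloc returned (@yearqt after)
2020Q1   ''                   1                 2020Q1
2020Q1   2020Q1               2                 2020Q1
2020Q2   2020Q1               1 (reset)         2020Q2
2020Q2   2020Q2               2                 2020Q2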
I'm preparing a presentation about one of our apps and was asking myself the following question: "Based on the data stored in our database, how much growth has happened over the last couple of years?"
So I'd like to basically show in one output/graph how much data we've been storing since the beginning of the project.
My current query looks like this:
SELECT DATE_FORMAT(created,'%y-%m') AS label, COUNT(id) FROM table GROUP BY label ORDER BY label;
The example output would be:
11-03: 5
11-04: 200
11-05: 300
Unfortunately, this query is missing the accumulation. I would like to receive the following result:
11-03: 5
11-04: 205 (200 + 5)
11-05: 505 (200 + 5 + 300)
Is there any way to solve this problem in MySQL without having to call the query in a PHP loop?
Yes, there's a way to do that. One approach uses MySQL user-defined variables (and behavior that is not guaranteed):
SELECT s.label
, s.cnt
, @tot := @tot + s.cnt AS running_subtotal
FROM ( SELECT DATE_FORMAT(t.created,'%y-%m') AS `label`
, COUNT(t.id) AS cnt
FROM articles t
GROUP BY `label`
ORDER BY `label`
) s
CROSS
JOIN ( SELECT @tot := 0 ) i
Let's unpack that a bit.
The inline view aliased as s returns the same resultset as your original query.
The inline view aliased as i returns a single row. We don't really care what it returns (except that we need it to return exactly one row because of the JOIN operation); what we care about is the side effect: a value of zero gets assigned to the @tot user variable.
Since MySQL materializes the inline view as a derived table, that variable gets initialized before the outer query runs.
For each row processed by the outer query, the value of cnt is added to @tot.
The return of s.cnt in the SELECT list is entirely optional; it's just there as a demonstration.
N.B. The MySQL reference manual specifically states that this behavior of user-defined variables is not guaranteed.
What is the predefined order in which the clauses are executed in MySQL? Is some of it decided at run time, and is this order correct?
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
The actual execution of MySQL statements is a bit tricky. However, the standard does specify the order of interpretation of elements in the query. This is basically in the order that you specify, although I think HAVING and GROUP BY could come after SELECT:
FROM clause
WHERE clause
SELECT clause
GROUP BY clause
HAVING clause
ORDER BY clause
This is important for understanding how queries are parsed. You cannot use a column alias defined in a SELECT in the WHERE clause, for instance, because the WHERE is parsed before the SELECT. On the other hand, such an alias can be in the ORDER BY clause.
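For example, with a hypothetical table t that has a column price:
-- Fails: the alias is not visible in the WHERE clause
SELECT price * 1.2 AS price_with_tax
  FROM t
 WHERE price_with_tax > 100;

-- Works: the alias is visible in the ORDER BY clause
SELECT price * 1.2 AS price_with_tax
  FROM t
 ORDER BY price_with_tax;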
As for actual execution, that is really left up to the optimizer. For instance:
. . .
GROUP BY a, b, c
ORDER BY NULL
and
. . .
GROUP BY a, b, c
ORDER BY a, b, c
both have the effect of the ORDER BY not being executed at all -- and so not executed after the GROUP BY (in the first case, the effect is to remove sorting from the GROUP BY and in the second the effect is to do nothing more than the GROUP BY already does).
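As a concrete sketch (hypothetical table t with columns a, b, c, x), in MySQL versions where GROUP BY implies a sort, the first form would be written in full as:
SELECT a, b, c, SUM(x)
  FROM t
 GROUP BY a, b, c
 ORDER BY NULL;   -- tells MySQL not to sort the grouped result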
This is how you can get a rough idea of how MySQL executes a SELECT query:
DROP TABLE if exists new_table;
CREATE TABLE `new_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`testdecimal` decimal(6,2) DEFAULT NULL,
PRIMARY KEY (`id`));
INSERT INTO `new_table` (`testdecimal`) VALUES ('1234.45');
INSERT INTO `new_table` (`testdecimal`) VALUES ('1234.45');
set @mysqlorder := '';
select @mysqlorder := CONCAT(@mysqlorder," SELECT ") from new_table,(select @mysqlorder := CONCAT(@mysqlorder," FROM ")) tt
JOIN (select @mysqlorder := CONCAT(@mysqlorder," JOIN1 ")) t on ((select @mysqlorder := CONCAT(@mysqlorder," ON1 ")) or rand() < 1)
JOIN (select @mysqlorder := CONCAT(@mysqlorder," JOIN2 ")) t2 on ((select @mysqlorder := CONCAT(@mysqlorder," ON2 ")) or rand() < 1)
where ((select @mysqlorder := CONCAT(@mysqlorder," WHERE ")) or IF(new_table.testdecimal = 1234.45,true,false))
group by (select @mysqlorder := CONCAT(@mysqlorder," GROUPBY ")),id
having (select @mysqlorder := CONCAT(@mysqlorder," HAVING "))
order by (select @mysqlorder := CONCAT(@mysqlorder," ORDERBY "));
select @mysqlorder;
And here is the output from the above MySQL query; hopefully you can figure out the MySQL execution of a SELECT query from it:
FROM JOIN1 JOIN2 WHERE ON2 ON1 ORDERBY GROUPBY SELECT WHERE ON2 ON1 ORDERBY GROUPBY SELECT HAVING HAVING
It appears that the generalized pattern in Standard SQL for the Logical Query Processing Phase is (at least from SQL-92, starting on p. 177):
from clause
joined table
where clause
group by clause
having clause
query specification (i.e. SELECT)
You can find and download newer Standard SQL Standardization documents from here:
https://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F
For MSSQL (since it tends to stay fairly close to the standard in my experience) the Logical Query Processing Phase is generally:
FROM <left_table>
ON <join_condition>
<join_type> JOIN <right_table>
WHERE <where_condition>
GROUP BY <group_by_list>
WITH {CUBE | ROLLUP}
HAVING <having_condition>
SELECT
DISTINCT
ORDER BY <order_by_list>
<TOP_specification> <select_list>
From: Chapter 1 of Ben-Gan, Itzik, et al., Inside Microsoft SQL Server 2005: T-SQL Querying (Microsoft Press)
Also: Ben-Gan, Itzik, et al., Training Kit (Exam 70-461): Querying Microsoft SQL Server 2012 (MCSA) (Microsoft Press)
It should be noted that MySQL can be configured to operate closer to standard as well if desired by setting the SQL Mode (although probably only recommended for fringe cases):
https://dev.mysql.com/doc/refman/8.0/en/sql-mode.html
For MySQL, I searched both MySQL and MariaDB documentation and could find nothing other than the few statements that Gordon Linoff mentioned in passing that were in the MySQL documentation for SELECT. They are:
If ORDER BY occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future version of MySQL.
If LIMIT occurs within a parenthesized query expression and also is applied in the outer query, the results are undefined and may change in a future version of MySQL.
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
From MySQL JOIN Documentation: Natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard
A quick skim through SQL-92 from "from clause" to "query specification" shows that the logic can be conditional at times, depending on how the query is written. Since I could not find anything in the MySQL or MariaDB documentation (not saying it is not there, I just could not find it), and other articles on MySQL's Logical Query Processing Phase conflict in their ordering, it seems that the best way MySQL gives you to determine some sort of Logical Query Processing Phase (or at least the steps used for join optimization in the query plan) for a specific query is to trace the query execution by doing the following (from the MySQL documentation, "Tracing the Optimizer / Typical Usage"):
# Turn tracing on (it's off by default):
SET optimizer_trace="enabled=on";
SELECT ...; # your query here
SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
# possibly more queries...
# When done with tracing, disable it:
SET optimizer_trace="enabled=off";
You can interpret the results by looking at MySQL documentation's Tracing Example.
Basically, it appears that you want to look for "join_optimization" (which says that the from/join statements are being evaluated for the specific query, the specific query being the one stated as "Select#"), then look for "condition_processing: condition", and then "clause_processing: clause". As it says in the MySQL documentation for General Trace Structure:
A trace follows closely the actual execution path: there is a join-preparation object, a join-optimization object, a join-execution object, for each JOIN... It is far from showing everything happening in the optimizer, but we plan to show more information in the future.
Interestingly enough, I found that running a query like the following in MySQL showed that its apparent processing order for query optimization was FROM, WHERE, HAVING, ORDER BY, and then GROUP BY:
SELECT
a.id
, max(a.timestamp)
FROM database.table AS a
LEFT JOIN database.table2 AS b on a.id = b.id
WHERE a.id > 1
GROUP BY a.id HAVING max(a.timestamp) > 0
ORDER BY a.id
I am assuming that since "condition_processing" and "clause_processing" are within the "Select#" group that these are processed before SELECT - which lines up with SQL-99, but it is an assumption.
In terms of operators and variables, Zawodny, Jeremy D., et al., High Performance MySQL, 2nd Edition (O'Reilly) states that:
The := assignment operator has lower precedence than any other operator, so you have to be careful to parenthesize explicitly.
I only mention this because sometimes, when a query is not executing as expected, the issue to troubleshoot is not the order of the Logical Query Processing Phase so much as the precedence of the assignment operator when working with a variable or variables.
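A quick illustration of that precedence pitfall, reusing the @i variable from the earlier answer:
SET @i := 0;
SELECT @i := @i + 1 = 875;    -- parsed as @i := ((@i + 1) = 875); @i becomes 0 or 1
SET @i := 0;
SELECT (@i := @i + 1) = 875;  -- parenthesized: @i is incremented, then compared to 875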
I think the execution order is like this:
(7) SELECT
(8) DISTINCT <select_list>
(1) FROM <left_table>
(3) <join_type> JOIN <right_table>
(2) ON <join_condition>
(4) WHERE <where_condition>
(5) GROUP BY <group_by_list>
(6) HAVING <having_condition>
(9) ORDER BY <order_by_condition>
(10) LIMIT <limit_number>[, <offset_number>]
Basically I store data in MySQL 5.5, and I use Qt to connect to MySQL. I want to compare two columns: while col1 is greater than col2, the count continues, but when col1 is less than col2, counting finishes and exits. So this is to count how many rows at the beginning of the table meet some condition. Is it possible in MySQL?
An example:
Col1 Col2
2 1
2 3
2 1
The count I need should return 1, because the first row meets the condition of Col1 > Col2, but the second row doesn't. Whenever the condition is not met, counting exits, no matter whether the following rows meet the condition or not.
SELECT COUNT(*)
FROM table
WHERE col1 > col2
It's a little difficult to understand what you're after, but COUNT(*) will return the number of rows matched by your condition, if that's your desire. If it's not, can you maybe be more specific or show example(s) of what you're going for? I will do my best to correct my answer depending on additional details.
You should not be using SQL for this; any answer you get will be chock full of compromise, and if (for example) the result set from your initial query comes back in a different order (due to an index being created or changed), then it will fail.
SQL is designed for "set-based" logic, and you really are after procedural logic. If you have to do this, then:
1) Use a cursor (a rough sketch follows below)
2) Use an ORDER BY statement
3) Cross fingers
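A minimal sketch of that cursor approach, assuming a table someTable(id, col1, col2) where id supplies the row order (all names here are placeholders):
DELIMITER //
CREATE PROCEDURE count_leading_rows(OUT p_count INT)
BEGIN
  DECLARE v_col1 INT;
  DECLARE v_col2 INT;
  DECLARE done INT DEFAULT 0;
  -- The ORDER BY is essential: without it the scan order is undefined.
  DECLARE cur CURSOR FOR SELECT col1, col2 FROM someTable ORDER BY id;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

  SET p_count = 0;
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO v_col1, v_col2;
    IF done = 1 OR v_col1 < v_col2 THEN
      LEAVE read_loop;   -- stop at end of data or at the first row where col1 < col2
    END IF;
    SET p_count = p_count + 1;
  END LOOP;
  CLOSE cur;
END //
DELIMITER ;

-- usage:
-- CALL count_leading_rows(@n);
-- SELECT @n;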
This is a bit ugly, but will do the job. It'll need adjusting depending on any ORDER BY etc. you would like to apply to someTable, but the principle is sound.
SELECT COUNT(*)
FROM (
SELECT
@multiplier := @multiplier * IF(t.`col1` < t.`col2`, 0, 1) AS counter
FROM `someTable` t, (SELECT @multiplier := 1) v
HAVING counter = 1
) scanQuery
The @multiplier variable will keep multiplying itself by 1. When it encounters a row where col1 < col2, it multiplies by 0, and it will then continue multiplying 0 x 1 for every row after that. The outer query then just counts the rows where counter is still 1.
It's not ideal, but would suffice. This could be expanded to allow you to get the rows before the break by doing the following:
SELECT
`someTable`.*
FROM `someTable`
INNER JOIN (
SELECT
t.`PrimaryKeyField`,
@multiplier := @multiplier * IF(t.`col1` < t.`col2`, 0, 1) AS counter
FROM `someTable` t, (SELECT @multiplier := 1) v
HAVING counter = 1
) scanQuery
ON scanQuery.`PrimaryKeyField` = `someTable`.`PrimaryKeyField`
Or possibly simply
SELECT
t.*,
@multiplier := @multiplier * IF(t.`col1` < t.`col2`, 0, 1) AS counter
FROM `someTable` t, (SELECT @multiplier := 1) v
HAVING counter = 1