I am aware that MySQL's execution order is not fixed, but I have heard it usually goes like this:
FROM, including JOINs
WHERE
GROUP BY
HAVING
SELECT
DISTINCT
ORDER BY
LIMIT and OFFSET
However, if I use a function such as COUNT() (as in the code below), when is it executed? And how does MySQL decide what the function operates on (e.g. what COUNT() counts)? I am confused about the execution order and the inputs of functions like AVG(), SUM(), MAX(), etc. in MySQL.
SELECT productvendor, count(*)
FROM products
GROUP BY productvendor
HAVING count(*) >= 9;
Your sequence is not correct:
SELECT comes before GROUP BY.
FROM, including JOINs
WHERE
SELECT: the rows obtained by FROM and WHERE are placed in a temporary area for the
later operations (and the column aliases are built)
DISTINCT
GROUP BY
HAVING
ORDER BY
LIMIT and OFFSET
return the final result
COUNT and the other aggregate functions are computed over this temporary result, using the selected columns. That operation produces the result that HAVING then filters.
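To make the role of COUNT(*) concrete, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the table contents and the price column are made up for illustration. It only shows which rows each group's count is computed from: the rows that survive WHERE, counted once per group, with HAVING applied to the finished counts.

```python
import sqlite3

# SQLite stands in for MySQL; the rows and the price column are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (productname TEXT, productvendor TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [
        ("a1", "Acme", 5.0), ("a2", "Acme", 20.0), ("a3", "Acme", 30.0),
        ("b1", "Bolt", 15.0), ("b2", "Bolt", 25.0),
        ("c1", "Coil", 8.0),
    ],
)

# WHERE removes rows before grouping, COUNT(*) counts the rows left in each
# group, and HAVING filters whole groups using those counts.
rows = conn.execute(
    """
    SELECT productvendor, COUNT(*) AS n
    FROM products
    WHERE price >= 10
    GROUP BY productvendor
    HAVING COUNT(*) >= 2
    ORDER BY productvendor
    """
).fetchall()
print(rows)
```

Acme has three products, but only two pass the WHERE filter, so its count is 2. Coil has no rows left after WHERE and never forms a group.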
The challenge is to select daily maximum temperature from a table along with date and time information for each.
SELECT datestamp, max(temp) hitemp from Weather w group by `year`, `month`, `day`;
This causes
Expression #1 of SELECT list is not in GROUP BY clause and contains
nonaggregated column 'Weather.datestamp' which is not
functionally dependent on columns in GROUP BY clause; this is
incompatible with sql_mode=only_full_group_by
Other similar questions propose using JOIN, but I can't see how I can use JOIN syntax because the high temperature values are not unique.
RDBMS work best on set based logic. So think of the data in terms of two sets:
w2: a set of data containing the max temp for a given day
w: The universe of data containing all the measurements for a period of time
By joining these two sets we can obtain just the data from w that have the max temperature for a given day.
By using an inline view {w2} against the entire universe set {w} we can generate the max temp for each day then join back to the base set {w} to get the time information for each day's max temp.
This assumes that:
If a max temp is on multiple records for the same date you want them all as you've not indicated how to handle ties.
datestamp has a time component; and it is the date/time you want to see for max temp on a day.
date(datestamp) simply returns the date component of a date/time.
max() returns the max temp for the group denoted (in this case, the date of datestamp).
This is most likely what others meant by a join:
SELECT w.datestamp, w.temp
FROM weather w
INNER JOIN (SELECT date(datestamp) AS mdate, max(temp) AS mtemp
            FROM weather
            GROUP BY date(datestamp)) w2
   ON w.temp = w2.mtemp
  AND date(w.datestamp) = w2.mdate
ADDITIONAL INFO:
MySQL doesn't support CROSS APPLY, and before version 8.0 it had no analytic functions such as row_number() over (partition by date(datestamp) order by temp desc), which could also be used to solve this issue, likely with better performance. SQL Server, Oracle, DB2, and PostgreSQL all have different ways of solving this; however, the above example should work on every RDBMS engine I can think of, though it will not be the most efficient in all cases.
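As a rough check of the join-back approach, the sketch below runs it next to a window-function variant on a few made-up rows. Assumptions: Python's sqlite3 stands in for MySQL, SQLite 3.25 or later is needed for the window function, and the sample readings are invented. RANK() is used rather than ROW_NUMBER() so that ties are kept, matching the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "temp" is quoted because TEMP is a keyword in SQLite.
conn.execute('CREATE TABLE weather (datestamp TEXT, "temp" REAL)')
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)",
    [
        ("2020-01-01 06:00:00", 10.0),
        ("2020-01-01 14:00:00", 18.0),
        ("2020-01-01 15:00:00", 18.0),  # tie for that day's maximum
        ("2020-01-02 07:00:00", 9.0),
        ("2020-01-02 13:00:00", 12.0),
    ],
)

# Join the per-day maximum (w2) back to the full set (w).
join_rows = conn.execute(
    """
    SELECT w.datestamp, w."temp"
    FROM weather w
    INNER JOIN (SELECT date(datestamp) AS mdate, MAX("temp") AS mtemp
                FROM weather
                GROUP BY date(datestamp)) w2
       ON w."temp" = w2.mtemp
      AND date(w.datestamp) = w2.mdate
    ORDER BY w.datestamp
    """
).fetchall()

# Window-function variant: rank readings within each day, keep rank 1.
rank_rows = conn.execute(
    """
    SELECT datestamp, "temp"
    FROM (SELECT datestamp, "temp",
                 RANK() OVER (PARTITION BY date(datestamp)
                              ORDER BY "temp" DESC) AS rnk
          FROM weather) ranked
    WHERE rnk = 1
    ORDER BY datestamp
    """
).fetchall()
print(join_rows)
```

Both queries return the two tied readings for the first day and the single maximum for the second day.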
Your sql mode is ONLY_FULL_GROUP_BY. That means every column in the SELECT list must appear in the GROUP BY clause, be functionally dependent on the GROUP BY columns, or be used inside an aggregate function.
datestamp is in the SELECT list but not in GROUP BY.
But for temp, since you are using the aggregate function MAX, it need not be in GROUP BY.
Use datestamp in GROUP BY or change your sql mode.
The exact reason is the ONLY_FULL_GROUP_BY mode combined with the logical query execution order of statements in mysql
Logical Order
FROM
WHERE
GROUP BY
AGGREGATIONS
HAVING
SELECT
So grouping is done before SELECT. With ONLY_FULL_GROUP_BY enabled, SELECT can therefore access only grouped and aggregated columns.
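A minimal sketch of a query that respects that rule: every selected column is either a grouping column or an aggregate. Assumptions: Python's sqlite3 stands in for MySQL and the sample rows are made up. SQLite does not enforce ONLY_FULL_GROUP_BY, so this shows only the accepted form and does not reproduce the error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "temp" is quoted because TEMP is a keyword in SQLite.
conn.execute(
    'CREATE TABLE weather (datestamp TEXT, "year" INT, "month" INT, '
    '"day" INT, "temp" REAL)'
)
conn.executemany(
    "INSERT INTO weather VALUES (?, ?, ?, ?, ?)",
    [
        ("2020-01-01 06:00:00", 2020, 1, 1, 10.0),
        ("2020-01-01 14:00:00", 2020, 1, 1, 18.0),
        ("2020-01-02 07:00:00", 2020, 1, 2, 9.0),
        ("2020-01-02 13:00:00", 2020, 1, 2, 12.0),
    ],
)

# Only grouped columns and an aggregate appear in the SELECT list, so the
# result has exactly one well-defined row per group.
rows = conn.execute(
    """
    SELECT "year", "month", "day", MAX("temp") AS hitemp
    FROM weather
    GROUP BY "year", "month", "day"
    ORDER BY "year", "month", "day"
    """
).fetchall()
print(rows)
```

The datestamp column is left out on purpose: each group contains several datestamp values, so there is no single value to return for it.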
Could someone explain why the following query throws an error, if I am trying to get the names of all customers along with the total number of customers?
SELECT name, COUNT(*)
FROM CUSTOMER
I know that selecting columns along with an aggregate function requires a GROUP BY statement containing all the column names, but I don't understand the logical principle behind this.
edit:
http://sqlfiddle.com/#!2/90233/595
I guess 'error' isn't quite right, but notice how the current query returns Allison 9 as the only result.
I don't understand why it doesn't return:
Alison 9
Alison 9
Alison 9
Alison 9
Jason 9
...
(This is a new answer based on the comment and looking at the fiddle.)
The issue here is how mysql handles aggregate functions -- which is non-standard and different than everyone else.
With ONLY_FULL_GROUP_BY disabled (the default before 5.7.5), mysql lets you select a plain column next to an aggregate function (count() is an example of an aggregate function) without a group by. Most other sql implementations reject such a query. When you have a group by you have to say the range in the group by (for example group by name), and every selected column has to be in the range or be the result of an aggregate function.
Since you don't have a range, mysql assumes the whole table, and since you have a column that is neither in the range nor the result of an aggregate function (in this case name), mysql does something to make that column behave like the result of an aggregate function. I wasn't sure at first whether mysql specifies what it does. In effect, the sql that is getting executed is
SELECT ANY_VALUE(name), COUNT(*)
FROM CUSTOMER
Thus you only see one name.
mysql documentation - http://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
After reading the above I see that mysql behaves as if ANY_VALUE() were applied to columns which are not in the range: it is free to return any value from the group.
If you just want the total number of customers on each row you could do this
SELECT DISTINCT NAME, COUNT(NAME) OVER () AS CustomerCount
FROM CUSTOMER
In this way you don't need the GROUP BY syntax. Under the covers it is probably doing the same thing as @GordonLinoff's answer.
I added this because maybe it makes it clearer how group by works.
Select name, Count(*) as 'CountCustomers'
FROM CUSTOMER
Group by name
Order by name
Think of it as giving an instruction of which field to aggregate by. For example, if you had a field with the State of the Customer, you could group by State which would give a count of customers by state.
Also, note you can have multiple aggregate functions in the same select using the "over (partition by" construct.
If you want the names along with the total number of customers, then use a window function:
select name, count(*) as NumCustomersWithName,
sum(count(*)) over () as NumCustomers
from customer
group by name;
Edit:
You actually seem to want:
select name, count(*) over () as NumCustomers
from customer;
In MySQL before 8.0, which has no window functions, you would do this with a subquery:
select name, cnt
from customer cross join
(select count(*) as cnt from customer) x;
The reason your query doesn't work is because it is an aggregation query that returns exactly one row. When you use aggregation functions without a GROUP BY, then the query always returns exactly one row.
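The one-row behaviour, and the subquery workaround, can be seen in this small sketch. Assumptions: Python's sqlite3 stands in for MySQL and the customer names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [("Alison",), ("Alison",), ("Jason",), ("Maria",)],
)

# An aggregate without GROUP BY collapses the whole table into one row.
one_row = conn.execute("SELECT COUNT(*) FROM customer").fetchall()

# Computing the total in a derived table and cross joining it back keeps
# every customer row and repeats the total on each of them.
per_row = conn.execute(
    """
    SELECT c.name, x.cnt
    FROM customer c
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM customer) x
    ORDER BY c.name
    """
).fetchall()
print(one_row, per_row)
```

The derived table x has exactly one row, so the cross join does not multiply the customer rows.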
The statement below does not work, but I can't seem to figure out why
select AVG(delay_in_seconds) from A_TABLE ORDER by created_at DESC GROUP BY row_type limit 1000;
I want to get the avg's of the most recent 1000 rows for each row_type. created_at is of type DATETIME and row_type is of type VARCHAR
If you only want the 1000 most recent rows, regardless of row_type, and then get the average of delay_in_seconds for each row_type, that's a fairly straightforward query. For example:
SELECT t.row_type
, AVG(t.delay_in_seconds)
FROM (
SELECT r.row_type
, r.delay_in_seconds
FROM A_table r
ORDER BY r.created_at DESC
LIMIT 1000
) t
GROUP BY t.row_type
I suspect, however, that this query does not satisfy the requirements that were specified. (I know it doesn't satisfy what I understood as the specification.)
If what we want is the average of the most recent 1000 rows for each row_type, that would also be fairly straightforward... if we were using a database that supported analytic functions.
Unfortunately, MySQL before version 8.0 doesn't provide support for analytic functions. It is possible to emulate one in MySQL, but the syntax is a bit involved, and it depends on behavior that is not guaranteed.
As an example:
SELECT s.row_type
, AVG(s.delay_in_seconds)
FROM (
SELECT @row_ := IF(@prev_row_type = t.row_type, @row_ + 1, 1) AS row_
, @prev_row_type := t.row_type AS row_type
, t.delay_in_seconds
FROM A_table t
CROSS
JOIN (SELECT @prev_row_type := NULL, @row_ := NULL) i
ORDER BY t.row_type DESC, t.created_at DESC
) s
WHERE s.row_ <= 1000
GROUP
BY s.row_type
NOTES:
The inline view query is going to be expensive for large sets. What that's effectively doing is assigning a row number to each row. The "order by" is sorting the rows in descending sequence by created_at, what we want is for the most recent row to be assigned a value of 1, the next most recent 2, etc. This numbering of rows will be repeated for each distinct value of row_type.
For performance, we'd want a suitable index with leading columns (row_type,created_at,delay_in_seconds) to avoid an expensive "Using filesort" operation. We need at least those first two columns for that; including delay_in_seconds makes it a covering index (the query can be satisfied entirely from the index).
The outer query then runs against the resultset returned from the view query (a "derived table"). The predicate in the WHERE clause filters out all rows that were assigned a row number greater than 1000; the rest is a straightforward GROUP BY and an AVG aggregate.
A LIMIT clause is entirely unnecessary. It may be possible to incorporate some additional predicates for some additional performance enhancement... like, what if we specified the most recent 1000 rows, but only those with a created_at within the past 30 or 90 days?
(I'm not entirely sure this answers the question that OP was asking. What this answers is: Is there a query that can return the specified resultset, making use of AVG aggregate and GROUP BY, ORDER BY and LIMIT clauses.)
N.B. This query is dependent on a behavior of MySQL user-defined variables which is not guaranteed.
The query above shows one approach, but there is also another approach. It's possible to use a "join" operation (of A_table with A_table) to get a row number assigned, by getting a COUNT of the number of rows that are "more recent" than each row. With large sets, however, that can produce a humongous intermediate result if we aren't careful to limit it.
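The counting idea can be sketched with a correlated subquery instead of user-defined variables. Assumptions: Python's sqlite3 stands in for MySQL, the rows are made up, N is 2 instead of 1000 to keep the data small, and created_at is unique within each row_type (with ties, more than N rows per type can slip through).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE a_table (id INTEGER PRIMARY KEY, row_type TEXT, "
    "created_at TEXT, delay_in_seconds REAL)"
)
conn.executemany(
    "INSERT INTO a_table (row_type, created_at, delay_in_seconds) "
    "VALUES (?, ?, ?)",
    [
        ("a", "2021-01-01 00:00:00", 10.0),
        ("a", "2021-01-02 00:00:00", 20.0),
        ("a", "2021-01-03 00:00:00", 30.0),
        ("b", "2021-01-01 00:00:00", 5.0),
        ("b", "2021-01-02 00:00:00", 7.0),
    ],
)

n = 2  # stands in for the 1000 in the question

# A row is among the N most recent of its row_type when fewer than N rows
# of the same type are newer than it.
rows = conn.execute(
    """
    SELECT t.row_type, AVG(t.delay_in_seconds) AS avg_delay
    FROM a_table t
    WHERE (SELECT COUNT(*)
           FROM a_table newer
           WHERE newer.row_type = t.row_type
             AND newer.created_at > t.created_at) < ?
    GROUP BY t.row_type
    ORDER BY t.row_type
    """,
    (n,),
).fetchall()
print(rows)
```

For type a the oldest row is excluded, so the average covers 20 and 30 only. The subquery runs once per row, which is why an index on (row_type, created_at) matters on large tables.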
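Put the ORDER BY clause after GROUP BY, near the end of the statement.
SELECT AVG(delay_in_seconds) from A_TABLE GROUP BY row_type ORDER by created_at DESC limit 1000;
Note that LIMIT here caps the number of groups returned, not the number of rows that are averaged. See the MySQL documentation for details.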
For example:
SELECT MODE(field) FROM table
In other words, what user-defined function can I use to get the most common value of a column?
I know I can do something like:
SELECT field, COUNT(*) as total FROM table GROUP BY field ORDER BY total DESC LIMIT 1
But I have to query other data in the same MySQL statement too, so I have to use a user-defined function.
Thank you.
Here's a link to MySQL's documentation on aggregate functions. It looks like they don't have anything for "mode", so I would say that your second query is probably your best shot.
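MySQL doesn't support user-defined aggregate functions written in SQL (PostgreSQL does, for what it's worth). The only aggregate UDFs MySQL accepts are loadable functions compiled from C or C++, so you can't simply write a stored function to do what you want in MySQL.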
You can do it for example by putting the mode-computation in a derived table subquery:
SELECT t.*
FROM (SELECT field AS mode, COUNT(*) as total FROM table
GROUP BY field ORDER BY total DESC LIMIT 1) AS m
JOIN table t ON m.mode = t.field;
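A runnable sketch of the derived-table approach follows. Assumptions: Python's sqlite3 stands in for MySQL, the table is named items because table is a reserved word, the values are made up, and a second ORDER BY key is added so that ties for the most common value are broken deterministically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, field TEXT)")
conn.executemany(
    "INSERT INTO items (field) VALUES (?)",
    [("red",), ("blue",), ("red",), ("green",), ("red",), ("blue",)],
)

# The derived table m holds the single most common value and its count.
mode_sql = """
    SELECT field AS mode_value, COUNT(*) AS total
    FROM items
    GROUP BY field
    ORDER BY total DESC, field
    LIMIT 1
"""
mode_row = conn.execute(mode_sql).fetchone()

# Joining m back to the base table lets the rest of the statement use it.
joined = conn.execute(
    f"""
    SELECT t.id, t.field, m.total
    FROM ({mode_sql}) AS m
    JOIN items t ON m.mode_value = t.field
    ORDER BY t.id
    """
).fetchall()
print(mode_row, joined)
```

Because m has one row, the join acts as a filter that keeps only the rows carrying the most common value.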
I have a select query like this
select count(distinct id)*100/totalcount as freq, count(distinct id) from
<few joins, conditions, group by here> .....
Will this result in 2 calculations of count under MySQL 5.0? I can calculate the frequency in my application as well if this is a problem. I am aware of the solutions presented in Adding percentages to multiple counts in one SQL SELECT Query but I just want to avoid nested queries
Yes, it will result in several evaluations.
The set of distinct id values will be built separately for each COUNT(DISTINCT id) call.
Note that without DISTINCT, MySQL would read each record only once (though it would pass it to multiple function calls).
Since COUNT itself is very cheap, the function calls add almost nothing to overall query time.
You can benefit from rewriting your query like this:
SELECT COUNT(id) * 100 / totalcount AS freq,
COUNT(id)
FROM (
SELECT DISTINCT id
FROM original_query
) q
BTW, why do you need both GROUP BY and DISTINCT in one query? Could you please post your original query as it is?
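The rewrite can be checked against the original form on a tiny data set. Assumptions: Python's sqlite3 stands in for MySQL, the visits table and its ids are made up, and totalcount is passed in as a bound parameter because the original query does not show where it comes from. 100.0 is used because SQLite divides integers as integers, whereas MySQL's / operator does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INT)")
conn.executemany(
    "INSERT INTO visits VALUES (?)", [(1,), (1,), (2,), (3,), (3,), (3,)]
)

totalcount = 6  # hypothetical stand-in for the totalcount in the question

# Original shape: COUNT(DISTINCT id) appears twice in the SELECT list.
direct = conn.execute(
    """
    SELECT COUNT(DISTINCT id) * 100.0 / ? AS freq, COUNT(DISTINCT id)
    FROM visits
    """,
    (totalcount,),
).fetchone()

# Rewritten shape: deduplicate once in a derived table, then use plain
# COUNT(id) on the already distinct rows.
rewritten = conn.execute(
    """
    SELECT COUNT(id) * 100.0 / ? AS freq, COUNT(id)
    FROM (SELECT DISTINCT id FROM visits) q
    """,
    (totalcount,),
).fetchone()
print(direct, rewritten)
```

Both forms return the same pair of values; the second deduplicates only once.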