I have a question about the order of operations in an SQL statement. For example:
SELECT t1.c2, COUNT(DISTINCT t2.c1) as count_c1 FROM t1 JOIN t2
WHERE t2.c2 > 1 GROUP BY t1.c2 HAVING count_c1 > 1;
Which filter in this query is applied first, and which last? As I understand it, the condition in HAVING is applied last: the server generates the full record set, then removes all rows with count_c1 <= 1 and returns the result to the client.
The condition in WHERE is applied first, which means the server doesn't even fetch rows with t2.c2 <= 1. But what about DISTINCT and GROUP BY? The result would be different if the server applied DISTINCT before GROUP BY rather than the other way around (GROUP BY first, DISTINCT second). I can't find anything about this in the documentation; maybe you can help me.
First, as Thorsten Kettner said, your SQL syntax is actually invalid. Second, the logical order of operations I was able to find (source) is as follows:
FROM clause (this includes JOINs)
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
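To make the logical order concrete, here is a toy Python model that walks hypothetical, already-joined rows through WHERE, GROUP BY, HAVING, SELECT and ORDER BY in that sequence. It is only a sketch of the logical semantics on made-up data; a real DBMS may execute the query differently as long as the result is the same.

```python
from collections import defaultdict

# Step 1 (FROM/JOIN): hypothetical rows, as if t1 JOIN t2 had already been produced
joined = [
    {"t1_c2": "a", "t2_c1": 10, "t2_c2": 5},
    {"t1_c2": "a", "t2_c1": 10, "t2_c2": 5},
    {"t1_c2": "a", "t2_c1": 20, "t2_c2": 5},
    {"t1_c2": "a", "t2_c1": 30, "t2_c2": 0},
    {"t1_c2": "b", "t2_c1": 40, "t2_c2": 5},
]

# Step 2 (WHERE): filter individual rows
filtered = [r for r in joined if r["t2_c2"] > 1]

# Step 3 (GROUP BY): collect the surviving rows per t1.c2 value
groups = defaultdict(list)
for r in filtered:
    groups[r["t1_c2"]].append(r)

def count_distinct_c1(rows):
    """COUNT(DISTINCT t2.c1), evaluated within one group."""
    return len({r["t2_c1"] for r in rows})

# Step 4 (HAVING): filter whole groups; the aggregate is computed here
# because, logically, the SELECT alias count_c1 does not exist yet
kept = {k: rows for k, rows in groups.items() if count_distinct_c1(rows) > 1}

# Step 5 (SELECT): build output rows only for the groups that survived
result = [(k, count_distinct_c1(rows)) for k, rows in kept.items()]

# Step 6 (ORDER BY): sort the final rows
result.sort()
print(result)  # [('a', 2)]
```

Note that the row with t2_c2 = 0 never reaches the grouping step, and group 'b' is dropped only after its aggregate has been computed.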
The logical order of operations in SQL is:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You can learn more about it in MVA here.
The DBMS is free either to join first and then apply the WHERE clause, or to use the WHERE clause up front to restrict the rows to join.
Aggregation: the rows are grouped by t1.c2, and the distinct t2.c1 values are counted within each group. In other words, the DISTINCT is applied per group as part of the aggregation, not as a separate step before or after GROUP BY.
The HAVING clause, as it deals with the aggregated rows. MySQL allows an alias name here, which doesn't comply with standard SQL (logically, the SELECT clause has not been executed yet).
SELECT clause: Show the results.
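One way to check that the DISTINCT inside COUNT(DISTINCT ...) works per group is to compare COUNT and COUNT(DISTINCT) side by side. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; the join column k (added so the JOIN has an ON condition) and all the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; k is a made-up join column
conn.executescript("""
CREATE TABLE t1 (k INTEGER, c2 TEXT);
CREATE TABLE t2 (k INTEGER, c1 INTEGER, c2 INTEGER);
INSERT INTO t1 VALUES (1, 'a'), (2, 'b');
INSERT INTO t2 VALUES (1, 10, 5), (1, 10, 5), (1, 20, 5), (1, 30, 0),
                      (2, 40, 5), (2, 40, 5);
""")

# COUNT(t2.c1) counts every row in a group; COUNT(DISTINCT t2.c1)
# removes duplicates inside each group, after WHERE and GROUP BY
rows = conn.execute("""
    SELECT t1.c2, COUNT(t2.c1), COUNT(DISTINCT t2.c1)
    FROM t1 JOIN t2 ON t2.k = t1.k
    WHERE t2.c2 > 1
    GROUP BY t1.c2
    ORDER BY t1.c2
""").fetchall()
print(rows)  # [('a', 3, 2), ('b', 2, 1)]
```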
I consider myself a pretty well-versed developer, but this one has me stumped.
The actual use case is somewhat more complicated than this (I have built a data-view framework that allows you to filter and search data), but at its simplest...
Why can't I do something like this?:
SELECT
fundraisers.id,
(
SELECT
count(*)
FROM
transactions
WHERE
transactions.fundraiser_id = fundraisers.id) AS total
FROM
fundraisers
WHERE
total > 331
ORDER BY
total DESC
I've also tried doing it as a subquery JOIN instead, but it never seems to return the right count of transactions for the row.
I'm aware I can successfully use HAVING to do this, but I need it to be part of the WHERE clause so that I can use it in conjunction with other filters using the right AND/OR conditions.
Any help is appreciated! Thanks folks.
You can use a derived table, in other words a subquery in the FROM clause instead of the select-list.
SELECT t.fundraiser_id, t.total
FROM
fundraisers AS f
JOIN (
SELECT fundraiser_id, COUNT(*) AS total
FROM transactions
GROUP BY fundraiser_id
) AS t ON t.fundraiser_id = f.id
WHERE
t.total > 331
ORDER BY
t.total DESC;
The reason you can't refer to an alias in the WHERE clause is that the conditions of the WHERE clause are evaluated before the expressions in the select-list. This is a good thing, because you wouldn't want the select-list to be evaluated for potentially millions of rows that the conditions would filter out anyway. Evaluating the select-list only for rows that are included in the result helps performance.
But it means the result of those expressions in the select-list, and their alias, isn't available to be referenced by conditions in the WHERE clause.
The workaround I show in my example above produces the results in a subquery, which happens before the WHERE clause gets to filter the results.
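Here is a runnable sketch of the derived-table approach, using Python's sqlite3 module as a stand-in for MySQL. The transaction counts are made up, and the extra condition f.id <> 1 is a hypothetical stand-in for "other filters"; it shows that once the count comes from the derived table, it can sit in WHERE next to ordinary conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fundraisers (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, fundraiser_id INTEGER)")
conn.executemany("INSERT INTO fundraisers (id) VALUES (?)", [(1,), (2,), (3,)])
# Made-up volumes: fundraiser 1 has 400 transactions, 2 has 335, 3 has 10
for fid, n in [(1, 400), (2, 335), (3, 10)]:
    conn.executemany("INSERT INTO transactions (fundraiser_id) VALUES (?)", [(fid,)] * n)

# The derived table t is built first, so t.total is an ordinary column
# by the time WHERE runs and can be combined with other conditions
rows = conn.execute("""
    SELECT t.fundraiser_id, t.total
    FROM fundraisers AS f
    JOIN (
        SELECT fundraiser_id, COUNT(*) AS total
        FROM transactions
        GROUP BY fundraiser_id
    ) AS t ON t.fundraiser_id = f.id
    WHERE t.total > 331 AND f.id <> 1
    ORDER BY t.total DESC
""").fetchall()
print(rows)  # [(2, 335)]
```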
I don't know exactly what you want to select, but try this:
select f.id, count(*) as total FROM
fundraisers f join transactions t on t.fundraiser_id=f.id
GROUP BY f.id
HAVING count(*) > 331
ORDER BY
total DESC
Why am I only getting one result from the query below? The suggested "answer" has the first name "Susan" instead of what I got in my results.
SELECT EmpFirstName, EmpLastName, p.ProductName as ProductName,
YEAR(c.OrderDate) AS Year,
SUM(o.QuotedPrice + o.QuantityOrdered) AS TotalValue
FROM Employees
NATURAL JOIN Products p
NATURAL JOIN Order_Details o
NATURAL JOIN Orders c
ORDER BY Year, TotalValue DESC
Because there is a SUM in your query.
The result returned by the query does not match your expectations because the query is invalid. And your expectations are incorrect.
The presence of an aggregate (GROUP BY) function in an expression in the SELECT clause turns the query into a GROUP BY query. When there is no GROUP BY clause, the SQL standard treats the query as if it were grouped by a constant: all the selected rows form a single group, so only one row is produced.
Each expression that appears in the SELECT clause of a GROUP BY query must follow one of these rules, in order to have a valid SQL query:
it also appears in the GROUP BY clause;
it's a call to an aggregate (GROUP BY) function;
it is functionally dependent on the columns that appear in the GROUP BY clause.
Because your query does not have a GROUP BY clause, the expressions EmpFirstName, EmpLastName, p.ProductName and YEAR(c.OrderDate) are not valid in the SELECT clause.
Before version 5.7.5, MySQL allowed such invalid SQL queries by default, but reserved the right to return indeterminate values for the invalid expressions.
Since version 5.7.5, the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so MySQL rejects such queries. Other RDBMSes have handled them correctly for many years.
The explanation for the indeterminate values is simple: the JOIN and WHERE clauses extract some rows from the table(s). The (missing) GROUP BY clause produces only one record from all these rows. A GROUP BY query never returns rows from the table; it generates the values it puts in the result set. Since there are multiple different values for EmpFirstName in the group, the SQL standard says the query is invalid. MySQL used to ignore the standard, but it had no rule about which value to pick for the EmpFirstName expression in the SELECT clause. Any value from the rows in the group is equally valid, and that's what it returns: an arbitrary value from the group.
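The effect is easy to reproduce. The sketch below uses Python's sqlite3 module, which (like MySQL before 5.7.5) accepts a bare column in an aggregate query without GROUP BY; the sales table and its rows are hypothetical. Without GROUP BY you get exactly one row and should not rely on the name in it; with GROUP BY you get one well-defined row per employee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (emp TEXT, amount INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("Susan", 10), ("Susan", 5), ("Bob", 7)])

# No GROUP BY: all rows form a single group, so exactly one row comes back.
# The bare column emp is accepted by SQLite, but its value is not well defined.
no_group = conn.execute("SELECT emp, SUM(amount) FROM sales").fetchall()

# With GROUP BY: one row per employee, and emp is well defined
grouped = conn.execute(
    "SELECT emp, SUM(amount) FROM sales GROUP BY emp ORDER BY emp").fetchall()

print(no_group)  # one row; which name appears should not be relied on
print(grouped)   # [('Bob', 7), ('Susan', 15)]
```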
In order to get the results you expect you have to group the rows by OrderNumber and ProductNumber (and EmployeeID to get a valid SQL query):
I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes), and it's quite good despite its title. However, the chapter on GROUP BY confuses me:
"Grouping data is a simple process. The selected columns (the column list following the SELECT keyword in a query) are the columns that can be referenced in the GROUP BY clause. If a column is not found in the SELECT statement, it cannot be used in the GROUP BY clause. This is logical if you think about it—how can you group data on a report if the data is not displayed? "
How come this statement works when I run it in MySQL?
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.
You can make MySQL enforce GROUP BY in a more standard way:
mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;
mysql> select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY
Because the book is wrong.
The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.
This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:
select t.*
from mytable t
where t.id in (select max(id)
from mytable t2
group by thegroup
);
(Note: There are other ways to write such a query, this is just an example.)
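The same query can be tried end to end with Python's sqlite3 module; the table name items and its rows are hypothetical. It also illustrates the point being made against the book: the subquery groups by thegroup even though thegroup is not in its select list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for the placeholder table name
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, thegroup TEXT, label TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
    (1, "x", "first x"), (2, "y", "first y"),
    (3, "x", "second x"), (4, "y", "second y"), (5, "z", "only z"),
])

# The subquery groups by thegroup without selecting it, which is valid SQL
rows = conn.execute("""
    SELECT t.*
    FROM items t
    WHERE t.id IN (SELECT MAX(id) FROM items t2 GROUP BY thegroup)
    ORDER BY t.id
""").fetchall()
print(rows)  # [(3, 'x', 'second x'), (4, 'y', 'second y'), (5, 'z', 'only z')]
```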
EDIT:
The query that you are suggesting:
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.
What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:
select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;
Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.
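A small sketch of that caveat, using Python's sqlite3 module with hypothetical rows in which employee E1 has two different bonuses. Grouping by a column that is not selected (BONUS) is accepted, and E1 comes back twice with nothing in the output to tell the two rows apart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE_PAY_TBL (EMP_ID TEXT, SALARY INTEGER, BONUS INTEGER)")
# Hypothetical data: E1 appears with two different BONUS values
conn.executemany("INSERT INTO EMPLOYEE_PAY_TBL VALUES (?, ?, ?)",
                 [("E1", 100, 10), ("E1", 100, 20), ("E2", 200, 10)])

# BONUS is grouped on but not selected: one output row per (EMP_ID, BONUS) pair
rows = conn.execute("""
    SELECT EMP_ID
    FROM EMPLOYEE_PAY_TBL
    GROUP BY EMP_ID, BONUS
    ORDER BY EMP_ID
""").fetchall()
print(rows)  # [('E1',), ('E1',), ('E2',)]
```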
How does MySQL work when using more than one column in GROUP BY, like this:
select
a.nome,
b.tb2_id,
count(c.tb2_id) as saida
from tb1 a
left join tb2 b on a.tb1_id = b.tb1_id
left join tb3 c on b.tb2_id = c.tb2_id
group by a.tb1_id, b.tb2_id
order by a.tb1_id desc
How does MySQL know which column it will use to group the result set?
I thought it would group in column order, but when I changed the GROUP BY to 'b.tb2_id,a.tb1_id' nothing changed; I got the same result.
group by a.tb1_id, b.tb2_id means grouping by the pair of a.tb1_id and b.tb2_id: rows are treated as one group only when both a.tb1_id and b.tb2_id are the same. The order in which the columns are listed therefore doesn't change the groups.
Only the order by clause affects the order of rows.
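A quick check of this with Python's sqlite3 module and a hypothetical pairs table: swapping the column order in GROUP BY produces the same groups, and an explicit ORDER BY fixes the row order for both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (a INTEGER, b TEXT)")  # hypothetical table
conn.executemany("INSERT INTO pairs VALUES (?, ?)",
                 [(1, "x"), (1, "x"), (1, "y"), (2, "x")])

# A group is one distinct (a, b) combination, whichever column is listed first
by_ab = conn.execute(
    "SELECT a, b, COUNT(*) FROM pairs GROUP BY a, b ORDER BY a, b").fetchall()
by_ba = conn.execute(
    "SELECT a, b, COUNT(*) FROM pairs GROUP BY b, a ORDER BY a, b").fetchall()
print(by_ab)           # [(1, 'x', 2), (1, 'y', 1), (2, 'x', 1)]
print(by_ab == by_ba)  # True
```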
The GROUP BY clause affects data aggregation. MySQL is special in that, unlike most other databases, it allows columns in the select list that are neither grouped nor aggregated. When this option is exercised (as in your query: a.nome is not being grouped by), MySQL returns a value from an arbitrary row in each group; the documentation calls such values indeterminate. All other databases I know would throw an SQL syntax exception if you tried to execute this query.
Possible Duplicate:
Is there any difference between Group By and Distinct
What's the difference between GROUP BY and DISTINCT in a MySQL query?
It has already been discussed here.
If you still want an answer here:
Well, GROUP BY and DISTINCT each have their own use.
DISTINCT is used to filter unique records out of the records that satisfy the query criteria.
The GROUP BY clause is used to group the data on which the aggregate functions operate, and the output is returned based on the columns in the GROUP BY clause. It has its own limitations, such as that all the columns in the select list apart from the aggregate functions have to be part of the GROUP BY clause.
So even though DISTINCT and GROUP BY can return the same data, it's better to use DISTINCT in that case. See the example below:
select col1,col2,col3,col4,col5,col6,col7,col8,col9 from mytable group by col1,col2,col3,col4,col5,col6,col7,col8,col9
can be written as
select distinct col1,col2,col3,col4,col5,col6,col7,col8,col9 from mytable
It makes your life easier when you have more columns in the select list. But if you also need to display sum(col10) along with the above columns, then you will have to use GROUP BY. In that case DISTINCT will not work.
For example:
select col1,col2,col3,col4,col5,col6,col7,col8,col9,sum(col10) from mytable group by col1,col2,col3,col4,col5,col6,col7,col8,col9
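A reduced version of the same comparison, runnable with Python's sqlite3 module; the sales table and its three columns are hypothetical stand-ins for col1 to col10. Without aggregates the two forms return the same rows; adding SUM(col10) needs GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (col1 TEXT, col2 TEXT, col10 INTEGER)")  # hypothetical
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("a", "x", 1), ("a", "x", 2), ("b", "y", 5)])

# Without aggregates, DISTINCT and GROUP BY over the same columns return the same rows
distinct_rows = conn.execute(
    "SELECT DISTINCT col1, col2 FROM sales ORDER BY col1, col2").fetchall()
grouped_rows = conn.execute(
    "SELECT col1, col2 FROM sales GROUP BY col1, col2 ORDER BY col1, col2").fetchall()

# A per-group SUM(col10) requires GROUP BY
sums = conn.execute(
    "SELECT col1, col2, SUM(col10) FROM sales GROUP BY col1, col2 ORDER BY col1, col2"
).fetchall()
print(distinct_rows == grouped_rows)  # True
print(sums)  # [('a', 'x', 3), ('b', 'y', 5)]
```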
Hope this helps.
DISTINCT works only on the entire row. Don't be misled into thinking SELECT DISTINCT(A), B does something different. This is equivalent to SELECT DISTINCT A, B.
On the other hand GROUP BY creates a group containing all the rows that share each distinct value in a single column (or in a number of columns, or arbitrary expressions). Using GROUP BY you can use aggregate functions such as COUNT and MAX. This is not possible with DISTINCT.
If you want to ensure that all rows in your result set are unique and you do not need to aggregate then use DISTINCT.
For anything more advanced you should use GROUP BY.
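The DISTINCT(A), B point can be checked with Python's sqlite3 module and a hypothetical two-column table: the parentheses only wrap the expression a, and the value 1 for a still appears more than once because DISTINCT applies to the whole (a, b) row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 1), (1, 2), (1, 2), (2, 1)])

# Parentheses around a do not make DISTINCT apply to a alone
with_parens = conn.execute(
    "SELECT DISTINCT(a), b FROM t ORDER BY a, b").fetchall()
without_parens = conn.execute(
    "SELECT DISTINCT a, b FROM t ORDER BY a, b").fetchall()
print(with_parens)                    # [(1, 1), (1, 2), (2, 1)]
print(with_parens == without_parens)  # True
```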
Another difference that applies only to MySQL (before version 8.0) is that GROUP BY also implies an ORDER BY unless you specify otherwise. Here's what can happen if you use DISTINCT:
SELECT DISTINCT a FROM table1
Results:
2
1
But using GROUP BY the results will come in sorted order:
SELECT a FROM table1 GROUP BY a
Results:
1
2
Because DISTINCT doesn't sort, it is faster in cases where you can use either. Note: if you don't need the sorting with GROUP BY, you can add ORDER BY NULL to improve performance (MySQL 8.0 no longer sorts implicitly, so this is unnecessary there).
Care to look at the docs:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
and
http://dev.mysql.com/doc/refman/5.0/en/select.html