How does MySQL work when using multiple columns in GROUP BY?

How does MySQL work when GROUP BY has more than one column, as in:
select
a.nome,
b.tb2_id,
count(c.tb2_id) as saida
from tb1 a
left join tb2 b on a.tb1_id = b.tb1_id
left join tb3 c on b.tb2_id = c.tb2_id
group by a.tb1_id, b.tb2_id
order by a.tb1_id desc
How does MySQL know which column to use to group the result set?
I thought it would apply the columns in order, but when I changed the GROUP BY to 'b.tb2_id, a.tb1_id' nothing changed: same result.

group by a.tb1_id, b.tb2_id means grouping by the pair (a.tb1_id, b.tb2_id): two rows fall into the same group only when both a.tb1_id and b.tb2_id are equal. That is why swapping the two columns does not change the groups.

Only the order by clause affects the order of rows.
The group by clause affects data aggregation. MySQL is unusual in that, unlike most other databases, it allows the data to be grouped by columns that are not selected, and (when the ONLY_FULL_GROUP_BY SQL mode is disabled) also allows selected columns that are neither grouped nor aggregated. When this last option is exercised (as in your query, where a.nome is not in the GROUP BY), MySQL returns the value from an arbitrary row of each group. It is often the first row encountered, but that is not guaranteed. Most other databases I know would reject this query with an error.
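The point about grouping by a pair can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only to keep the example self-contained), and the table t and its rows are hypothetical. For this particular behavior the two engines should agree: the listed order of the GROUP BY columns does not change which groups are formed.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (tb1_id INTEGER, tb2_id INTEGER);
    INSERT INTO t VALUES (1, 10), (1, 10), (1, 20), (2, 10);
""")

def groups(group_by):
    # Return the groups sorted in Python, so only group membership is compared.
    sql = f"SELECT tb1_id, tb2_id, COUNT(*) FROM t GROUP BY {group_by}"
    return sorted(conn.execute(sql).fetchall())

pair_ab = groups("tb1_id, tb2_id")
pair_ba = groups("tb2_id, tb1_id")
print(pair_ab)
print(pair_ab == pair_ba)
```

Each distinct (tb1_id, tb2_id) pair gets its own row, whichever column is listed first.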

Related

Why is only one result showing from my query?

Why am I only getting one result from the query below? The suggested "answer" has the first name "Susan" instead of what I got in my results.
SELECT EmpFirstName, EmpLastName, p.ProductName as ProductName,
YEAR(c.OrderDate) AS Year,
SUM(o.QuotedPrice + o.QuantityOrdered) AS TotalValue
FROM Employees
NATURAL JOIN Products p
NATURAL JOIN Order_Details o
NATURAL JOIN Orders c
ORDER BY Year, TotalValue DESC
Image of results
Image of Table Structure
Because there is a SUM in your query.
The result returned by the query does not match your expectations because the query is invalid. And your expectations are incorrect.
The presence of an aggregate (GROUP BY) function in an expression in the SELECT clause implies grouping. When there is no GROUP BY clause, all the selected rows are treated as one single group (as if grouping by a constant), so the query produces only one row.
Each expression that appears in the SELECT clause of a GROUP BY query must follow one of these rules, in order to have a valid SQL query:
it also appears in the GROUP BY clause;
it's a call to an aggregate (GROUP BY) function;
it is functionally dependent on the columns that appear in the GROUP BY clause.
Because your query does not have a GROUP BY clause, the expressions EmpFirstName, EmpLastName, p.ProductName and YEAR(c.OrderDate) are not valid in the SELECT clause.
Before version 5.7.5, MySQL allowed such invalid SQL queries by default, but it reserved the right to return indeterminate values for the invalid expressions.
Since version 5.7.5, MySQL rejects such queries by default (the ONLY_FULL_GROUP_BY SQL mode is enabled out of the box). Other RDBMSs have handled them correctly for many years.
The explanation for the indeterminate values is simple: the JOIN and WHERE clauses extract some rows from the table(s). The (missing) GROUP BY clause produces only one record from all these rows. A GROUP BY query never returns rows from the table; it generates the values it puts in the result set. Since there are multiple different values for EmpFirstName in the group, the SQL standard says the query is invalid. MySQL used to ignore the standard, but it had no valid rule about which value to pick for the EmpFirstName expression in the SELECT clause. Any value from the rows in the group is equally valid, and that's what it returns: one arbitrary value from the group.
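A minimal sketch of the "single group" effect described above, again using Python's sqlite3 as a stand-in for MySQL (an assumption). The orders table, its columns and its rows are hypothetical and are not the schema from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (emp TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('Susan', 10), ('Ann', 20), ('Susan', 5);
""")

# Aggregate without GROUP BY: all rows form one group, so one row comes back.
one_group = conn.execute("SELECT SUM(amount) FROM orders").fetchall()

# With GROUP BY emp there is one row per employee.
per_emp = conn.execute(
    "SELECT emp, SUM(amount) FROM orders GROUP BY emp ORDER BY emp"
).fetchall()

print(one_group)
print(per_emp)
```

The first query collapses three rows into one total, which is why the original query shows a single result line.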
In order to get the results you expect you have to group the rows by OrderNumber and ProductNumber (and EmployeeID to get a valid SQL query):

SQL statement operation priority

I have a question about the order of operations in an SQL statement. For example:
SELECT t1.c2, COUNT(DISTINCT t2.c1) as count_c1 FROM t1 JOIN t2
WHERE t2.c2 > 1 GROUP BY t1.c2 HAVING count_c1 > 1;
Which filter in this query will be applied first and which last? As I understand it, the condition in HAVING is applied last: the server generates the full record set, then removes all rows with count_c1 <= 1 and returns the result to the client.
The condition in WHERE is applied first, meaning the server does not even fetch rows with t2.c2 <= 1. But what about DISTINCT and GROUP BY? The result would differ depending on whether the server applies DISTINCT before GROUP BY or the other way around (GROUP BY first, DISTINCT second). I can't find anything in the documentation about this; maybe you can help me.
First, as Thorsten Kettner said, your SQL syntax is actually invalid. Second, the order of operations I was able to find (source) is as follows:
FROM clause (this includes JOIN's)
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
The order of operations in SQL is as follows:
FROM
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
You can learn about it in MVA here
FROM and WHERE clauses: the DBMS is free to either join first and then apply the WHERE clause, or use the WHERE clause first to restrict the rows to join.
Aggregation: grouping by c2, counting distinct c1 in the process. That is, you count distinct c1 per c2.
HAVING clause (as it deals with the aggregated rows): MySQL allows an alias name here, which doesn't comply with standard SQL (as the SELECT clause is not yet executed).
SELECT clause: show the results.
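The logical order above can be observed with a small sketch. It uses Python's sqlite3 as a stand-in for MySQL (an assumption), a single hypothetical table t instead of the join from the question, and repeats the aggregate in HAVING rather than relying on the alias. The WHERE filter removes rows before counting, COUNT(DISTINCT ...) is evaluated per group, and HAVING then filters whole groups.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (grp TEXT, c1 INTEGER, c2 INTEGER);
    INSERT INTO t VALUES
        ('a', 1, 5), ('a', 1, 6), ('a', 2, 7),
        ('b', 3, 5), ('b', 4, 0),
        ('c', 9, 9);
""")

rows = conn.execute("""
    SELECT grp, COUNT(DISTINCT c1) AS count_c1
    FROM t
    WHERE c2 > 1                      -- drops ('b', 4, 0) before grouping
    GROUP BY grp                      -- distinct c1 is counted per group
    HAVING COUNT(DISTINCT c1) > 1     -- filters the aggregated groups
    ORDER BY grp
""").fetchall()

print(rows)
```

Group 'b' has two distinct c1 values in the raw table, but only one survives the WHERE clause, so HAVING discards it. That would not happen if HAVING ran before WHERE.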

Order by Date not working as expected in MySql

I have a mysql query
select count(*) as TotalCount,
pd.Product_Modified_Date,
psc.Product_Subcategory_Name,
pd.Product_Image_URL
from product_subcategory psc
inner join product_details pd on psc.Product_Subcategory_ID = pd.Product_Subcategory_Reference_ID
where pd.Product_Status = 0 and
psc.Product_Subcategory_Status = 0
group by psc.Product_Subcategory_Name
order by pd.Product_Modified_Date desc
My product_details table has new image URLs, but I cannot get them with the above query.
How can i do it?
You are grouping by one column, Product_Subcategory_Name, but you have other columns Product_Image_URL and Product_Modified_Date in your select-list.
If you have cases where the group has multiple rows (which you do, since the count is 14 or more in each group), MySQL can only present one value for the Product_Image_URL. So it picks some row in the group, and uses the value in that row. The URL value for all other rows in the group is ignored.
To fix this, you must group by all columns in your select-list that are not part of an aggregate function. Any column you don't want to use to form a new group must go into an aggregate function.
Roland Bouman wrote an excellent blog detailing how to use GROUP BY properly: http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html
Combining GROUP BY and ORDER BY is problematic, and your problem is most likely covered in another question on Stack Exchange: MySQL wrong results with GROUP BY and ORDER BY

Database specific selection of data

I have a database and one of tables has the following structure:
recordId, vehicleId, dateOfTireChange, expectedKmBeforeNextChange, tireType
I want to select from the table only those rows that contain the most recent date for each vehicleId.
I tried this approach
SELECT vehicleid,
Max(dateoftirechange) AS lastChange,
expectedkmbeforenextchange,
tiretype
FROM vehicle_tires
GROUP BY vehicleid
but it doesn't select the kilometers associated with the most recent date so it does not work.
Any idea how to make this selection?
There are several ways to get the desired result.
Correlated scalar subquery...
SELECT vt1.*
FROM vehicle_tires vt1
WHERE vt1.recordId = (SELECT vt2.recordId
                      FROM vehicle_tires vt2
                      WHERE vt2.vehicleId = vt1.vehicleId
                      ORDER BY vt2.dateOfTireChange DESC LIMIT 1);
...or derived table...
SELECT vt2.*
FROM vehicle_tires vt2
JOIN (SELECT vt1.vehicleId AS vehicleId,
             MAX(vt1.dateOfTireChange) AS maxDateOfTireChange
      FROM vehicle_tires vt1
      GROUP BY vt1.vehicleId) dt ON vt2.vehicleId = dt.vehicleId
     AND vt2.dateOfTireChange = dt.maxDateOfTireChange;
...are two that come to mind.
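As a rough check of the derived-table approach, here is a sketch run against Python's sqlite3 as a stand-in for MySQL (an assumption). The column names follow the question; the sample rows and the maxDate alias are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle_tires (
        recordId INTEGER PRIMARY KEY,
        vehicleId INTEGER,
        dateOfTireChange TEXT,               -- ISO dates compare correctly as text
        expectedKmBeforeNextChange INTEGER,
        tireType TEXT
    );
    INSERT INTO vehicle_tires VALUES
        (1, 100, '2020-01-01', 30000, 'summer'),
        (2, 100, '2021-06-01', 40000, 'winter'),
        (3, 200, '2019-03-15', 25000, 'summer');
""")

# Find the latest date per vehicle, then join back to fetch the full row.
latest = conn.execute("""
    SELECT vt.vehicleId, vt.dateOfTireChange, vt.expectedKmBeforeNextChange
    FROM vehicle_tires vt
    JOIN (SELECT vehicleId, MAX(dateOfTireChange) AS maxDate
          FROM vehicle_tires
          GROUP BY vehicleId) dt
      ON vt.vehicleId = dt.vehicleId AND vt.dateOfTireChange = dt.maxDate
    ORDER BY vt.vehicleId
""").fetchall()

print(latest)
```

The kilometers returned for vehicle 100 come from the 2021 row, that is, from the same row as the most recent date.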
The reason GROUP BY is not correct when applied to the whole table is that any columns you do not GROUP BY and that are also not the subject of aggregate functions MIN() MAX() AVG() COUNT(), etc., are assumed by the server to be columns that you know to be identical in every row of the groups established by the GROUP BY clause.
If, for example, I'm doing a query like this...
SELECT p.id,
p.full_name,
p.date_of_birth,
COUNT(c.id) AS number_of_children
FROM parent p LEFT JOIN child c ON c.parent_id = p.id
GROUP BY p.id;
The correct way to write this query would be GROUP BY p.id, p.full_name, p.date_of_birth, because none of those columns are part of the aggregate function COUNT().
The MySQL optimization allows you to exclude those columns that you know have to, by definition, be the same on each group from the GROUP BY, and the server will fill those columns with data from any row in the group. Which row is not defined. As you can see, in the example, the parent's full_name would be the same in all rows within a group-by parent.id, and that is a case when this optimization is legitimate. The justification is that it allows the server to have to handle smaller values (fewer bytes) when executing the grouping... but in a query like yours where the ungrouped columns have different values within each group, you get an invalid result, by design.
The SQL_MODE ONLY_FULL_GROUP_BY disables this optimization.

How do records get ordered in a mysql group by?

so suppose I do
SELECT * FROM table t GROUP BY t.id
Suppose there are multiple rows in the table with the same id; only one row for that id will ultimately come out. I suppose MySQL orders the rows that have the same id and returns the first one, or something like that. My question is: how does MySQL perform this ordering, and is there a way to control it so that, for instance, it uses a certain field?
You can GROUP BY several different columns to arrange the order for which the result is grouped.
SELECT * FROM table t GROUP BY t.id, t.foo, t.bar
In strictly correct SQL, when you use GROUP BY all the values being selected must be either columns named in the GROUP BY clause or aggregate functions. MySQL allows you to violate this rule, unless you set the only_full_group_by mode. But although it allows you to perform such queries, it doesn't specify which row will be selected for each grouped column.
If you want to select a row that corresponds to the max or min of some other column, you can do something like this:
SELECT a.*
FROM table a
JOIN (SELECT id, max(somecol) maxcol
FROM table
GROUP BY id) b
ON a.id = b.id
AND a.somecol = b.maxcol
Note that this can still return multiple rows per ID if they both have the max value. You can add a final GROUP BY clause and it will select one of them arbitrarily.
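The tie case mentioned in the note can be reproduced with a short sketch, using Python's sqlite3 as a stand-in for MySQL (an assumption). The items table, the label column and the rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER, somecol INTEGER, label TEXT);
    INSERT INTO items VALUES (1, 5, 'x'), (1, 9, 'y'), (1, 9, 'z'), (2, 3, 'w');
""")

# Join each row to the maximum somecol of its id; tied rows all match.
rows = conn.execute("""
    SELECT a.id, a.somecol, a.label
    FROM items a
    JOIN (SELECT id, MAX(somecol) AS maxcol FROM items GROUP BY id) b
      ON a.id = b.id AND a.somecol = b.maxcol
    ORDER BY a.id, a.label
""").fetchall()

print(rows)
```

Both rows with somecol = 9 come back for id 1, so a further tie-breaking step is needed if exactly one row per id is required.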
When using this MySQL extension to the standard SQL GROUP BY functionality you cannot control which values for the non-aggregated, non-GROUP BY columns are selected by the server. The MySQL documentation discusses this specific case and states clearly that
The server is free to choose any value from each group, so unless they
are the same, the values chosen are indeterminate. Furthermore, the
selection of values from each group cannot be influenced by adding an
ORDER BY clause.