Database specific selection of data - mysql

I have a database, and one of its tables has the following structure:
recordId, vehicleId, dateOfTireChange, expectedKmBeforeNextChange, tireType
I want to select from the table only those rows that contain the most recent date for each vehicleId.
I tried this approach:
SELECT vehicleid,
Max(dateoftirechange) AS lastChange,
expectedkmbeforenextchange,
tiretype
FROM vehicle_tires
GROUP BY vehicleid
but it doesn't return the kilometers associated with the most recent date, so it does not work.
Any idea how to make this selection?

There are several ways to get the desired result.
Correlated scalar subquery...
SELECT vt1.*
FROM vehicle_tires vt1
WHERE vt1.recordId = (SELECT vt2.recordId
FROM vehicle_tires vt2
WHERE vt2.vehicleId = vt1.vehicleId
ORDER BY vt2.dateOfTireChange DESC LIMIT 1);
...or derived table...
SELECT vt2.*
FROM vehicle_tires vt2
JOIN (SELECT vt1.vehicleId AS vehicleId,
MAX(vt1.dateOfTireChange) AS maxDateOfTireChange
FROM vehicle_tires vt1
GROUP BY vt1.vehicleId) dt ON vt2.vehicleId = dt.vehicleId
AND vt2.dateOfTireChange = dt.maxDateOfTireChange;
...are two that come to mind.
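A quick way to check that both queries return the latest row per vehicle is to run them against a small in-memory database. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with made-up sample rows (the table name vehicle_tires follows the question); both queries are plain SQL and should behave the same way in MySQL. One difference worth knowing: if a vehicle has two changes on the same latest date, the derived-table version returns both rows, while the correlated subquery returns only one of them.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL; sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle_tires (
    recordId INTEGER PRIMARY KEY,
    vehicleId INTEGER,
    dateOfTireChange TEXT,          -- ISO dates sort correctly as text
    expectedKmBeforeNextChange INTEGER,
    tireType TEXT
);
INSERT INTO vehicle_tires VALUES
    (1, 10, '2020-01-15', 40000, 'summer'),
    (2, 10, '2021-03-02', 35000, 'winter'),
    (3, 20, '2019-06-30', 50000, 'all-season'),
    (4, 20, '2020-11-11', 45000, 'winter');
""")

# Correlated scalar subquery: keep the row whose recordId is the newest for its vehicle.
correlated = """
SELECT vt1.*
FROM vehicle_tires vt1
WHERE vt1.recordId = (SELECT vt2.recordId
                      FROM vehicle_tires vt2
                      WHERE vt2.vehicleId = vt1.vehicleId
                      ORDER BY vt2.dateOfTireChange DESC
                      LIMIT 1)
ORDER BY vt1.vehicleId
"""

# Derived table: compute the max date per vehicle, then join back to get the full row.
derived = """
SELECT vt2.*
FROM vehicle_tires vt2
JOIN (SELECT vehicleId, MAX(dateOfTireChange) AS maxDateOfTireChange
      FROM vehicle_tires
      GROUP BY vehicleId) dt
  ON vt2.vehicleId = dt.vehicleId
 AND vt2.dateOfTireChange = dt.maxDateOfTireChange
ORDER BY vt2.vehicleId
"""

rows_a = conn.execute(correlated).fetchall()
rows_b = conn.execute(derived).fetchall()
print(rows_a)
print(rows_b)
```

With this data both queries return the 2021-03-02 row for vehicle 10 and the 2020-11-11 row for vehicle 20, including the matching expectedKmBeforeNextChange values.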
The reason GROUP BY is not correct when applied to the whole table is that any columns you do not GROUP BY, and that are also not arguments of aggregate functions such as MIN(), MAX(), AVG(), COUNT(), etc., are assumed by the server to be columns that you know to be identical in every row of each group established by the GROUP BY clause.
If, for example, I'm doing a query like this...
SELECT p.id,
p.full_name,
p.date_of_birth,
COUNT(c.id) AS number_of_children
FROM parent p LEFT JOIN child c ON c.parent_id = p.id
GROUP BY p.id;
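The strictly correct way to write this query would be GROUP BY p.id, p.full_name, p.date_of_birth, because none of those columns are part of the aggregate function COUNT(). (Since MySQL 5.7.5 the server also accepts the short form when p.id is the primary key, because it detects that the other parent columns are functionally dependent on it.)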
The MySQL optimization allows you to omit from the GROUP BY those columns that you know must, by definition, be the same in every row of a group, and the server will fill those columns with values from any row in the group. Which row is not defined. As you can see in the example, the parent's full_name is the same in all rows within a group for a given parent.id, so that is a case where this optimization is legitimate. The justification is that it lets the server handle smaller values (fewer bytes) when executing the grouping... but in a query like yours, where the ungrouped columns have different values within each group, you get an indeterminate result, by design.
The SQL mode ONLY_FULL_GROUP_BY disables this optimization; it is enabled by default as of MySQL 5.7.5.
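The fully grouped form of the parent/child query can be tried out as below. This is a sketch using Python's sqlite3 as a stand-in for MySQL, with hypothetical rows; because every non-aggregated column is listed in the GROUP BY, the query is also valid under ONLY_FULL_GROUP_BY. Note that COUNT(c.id), not COUNT(*), is what gives 0 for a parent without children: the LEFT JOIN produces one row with c.id = NULL, and COUNT(column) skips NULLs.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; names and dates are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY, full_name TEXT, date_of_birth TEXT);
CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES parent(id));
INSERT INTO parent VALUES (1, 'Ann Lee', '1970-04-01'), (2, 'Bo Chen', '1982-09-12');
INSERT INTO child VALUES (1, 1), (2, 1), (3, 1);  -- Bo Chen has no children
""")

query = """
SELECT p.id, p.full_name, p.date_of_birth, COUNT(c.id) AS number_of_children
FROM parent p
LEFT JOIN child c ON c.parent_id = p.id
GROUP BY p.id, p.full_name, p.date_of_birth   -- every non-aggregated column listed
ORDER BY p.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```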

Related

Mysql - Sum by other column value

Here's the problem. I have a long but not very complex query:
SUM(x.value)
FROM valuetable AS x
LEFT JOIN jointable_1 AS y
LEFT JOIN jointable_2 AS z
etc
...
GROUP BY y.id, z.id
There are n LEFT JOINs, and I need to keep it this way, because it must be possible to add a new LEFT JOIN at any time. I obviously get n duplicate values in the SUM, since the joined tables can have multiple matching rows, and I cannot move any of them into a subquery because of the flexible WHERE conditions. I need exactly one x.value per x.id in the SUM; that is also obvious.
-I cannot add x.id to the GROUP BY, since I need one row with the sum per y.id.
-I cannot use the calculation:
SUM(x.value)*COUNT(DISTINCT x.id)/COUNT(*)
since the joins duplicate each x.id a different number of times, so the SUM can contain any number of copies of each x.value.
-I cannot use DISTINCT x.value, since different x.id rows can have the same x.value.
-I don't know how to write a subquery for the sum, since I cannot use an aggregated value (for example GROUP_CONCAT(DISTINCT x.id)) in a subquery, or can I?
Anyway, that's it. I know I can restructure the query (subqueries instead of joins, a different FROM), but I want to leave that as a last resort. Is there a way to achieve what I want?
Sorry to say, there's no general way to do what you want without subqueries (or maybe views).
A bit of jargon: "Cardinality". For our purpose it's the number of rows in a table or a result set. (For our purpose a result set is a kind of virtual table.)
For aggregate functions like SUM(col) and COUNT(*) to give good results, we must pay attention to the cardinality of the table being summarized. A query like this
SELECT DATE(sale_time) sale_date,
store_id,
SUM(sale_amount) total_sales
FROM sale
GROUP BY DATE(sale_time), store_id
summarizes a result set with the same cardinality as the underlying table, so it generates useful results.
But, if we do this
SELECT DATE(sale.sale_time) sale_date,
sale.store_id,
SUM(sale.sale_amount) total_sales,
COUNT(promo.promo_id) promos
FROM sale
LEFT JOIN promo ON sale.store_id = promo.store_id
AND DATE(sale.sale_time) = promo.promo_date
GROUP BY DATE(sale.sale_time), sale.store_id
we wreck the cardinality of the summarized result set. This will never work unless we know for sure that each store had either zero or one promo records for each given day. Why not? The LEFT JOIN operation affects the cardinality of the virtual table being summarized. That means some sale_amount values may show up in the SUM more than once, and therefore the SUM won't be correct or trustworthy.
How can you prevent LEFT JOIN operations from messing up your cardinality? Make sure your LEFT JOIN's ON clause matches each row on the left to exactly zero rows, or exactly one row, on the right. That is, make sure your (virtual) tables on either side of the JOIN have appropriate cardinality.
(In entity-relationship jargon, your SUM fails because you join two entities with a one-to-many relationship before you do the sum.)
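To see the cardinality problem concretely, the sketch below runs the joined sale/promo query on a tiny hypothetical dataset, using Python's sqlite3 as a stand-in for MySQL. One store has two sales on a day that also has two promo rows, so the LEFT JOIN turns 2 sale rows into 4 and both aggregates come out wrong.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; all rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (sale_time TEXT, store_id INTEGER, sale_amount INTEGER);
CREATE TABLE promo (promo_id INTEGER, store_id INTEGER, promo_date TEXT);
INSERT INTO sale VALUES
    ('2024-05-01 09:00', 1, 100),
    ('2024-05-01 14:30', 1, 50);
-- Two promos for the same store and day: the join doubles every sale row.
INSERT INTO promo VALUES (1, 1, '2024-05-01'), (2, 1, '2024-05-01');
""")

# SUM, JOIN and GROUP BY at the same level of the query.
joined = conn.execute("""
SELECT DATE(sale.sale_time) AS sale_date, sale.store_id,
       SUM(sale.sale_amount) AS total_sales, COUNT(promo.promo_id) AS promos
FROM sale
LEFT JOIN promo ON sale.store_id = promo.store_id
               AND DATE(sale.sale_time) = promo.promo_date
GROUP BY DATE(sale.sale_time), sale.store_id
""").fetchall()

# The real total, summed without the join.
true_total = conn.execute("SELECT SUM(sale_amount) FROM sale").fetchone()[0]
print(joined, true_total)
```

The joined query reports total_sales = 300 and promos = 4, while the real figures are 150 in sales and 2 promos.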
The theoretically cleanest way to do it is to perform both aggregate operations before the join. This joins two virtual tables in such a way that the LEFT JOIN is either one-to-none or one-to-one:
SELECT sales.sale_date,
sales.store_id,
sales.total_sales,
promos.promo_count
FROM (
SELECT DATE(sale_time) sale_date,
store_id,
SUM(sale_amount) total_sales
FROM sale
GROUP BY DATE(sale_time), store_id
) sales
LEFT JOIN (
SELECT store_id,
promo_date,
COUNT(*) promo_count
FROM promo
GROUP BY store_id, promo_date
) promos ON sales.store_id = promos.store_id
AND sales.sale_date = promos.promo_date
Although this SQL is complex, most servers handle this kind of pattern efficiently.
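Here is a sketch of the aggregate-before-join query running on hypothetical data, again with Python's sqlite3 standing in for MySQL. The two days of sales and the two promos are made up; because each derived table has at most one row per (store, day), the LEFT JOIN can no longer multiply the sales rows, and a day without promos simply gets NULL (None in Python) for promo_count.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (sale_time TEXT, store_id INTEGER, sale_amount INTEGER);
CREATE TABLE promo (promo_id INTEGER, store_id INTEGER, promo_date TEXT);
INSERT INTO sale VALUES
    ('2024-05-01 09:00', 1, 100),
    ('2024-05-01 14:30', 1, 50),
    ('2024-05-02 10:00', 1, 70);
INSERT INTO promo VALUES (1, 1, '2024-05-01'), (2, 1, '2024-05-01');
""")

# Aggregate each table separately, then join the two one-row-per-(store, day) results.
rows = conn.execute("""
SELECT sales.sale_date, sales.store_id, sales.total_sales, promos.promo_count
FROM (SELECT DATE(sale_time) AS sale_date, store_id, SUM(sale_amount) AS total_sales
      FROM sale
      GROUP BY DATE(sale_time), store_id) sales
LEFT JOIN (SELECT store_id, promo_date, COUNT(*) AS promo_count
           FROM promo
           GROUP BY store_id, promo_date) promos
  ON sales.store_id = promos.store_id
 AND sales.sale_date = promos.promo_date
ORDER BY sales.sale_date
""").fetchall()
print(rows)
```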
Troubleshooting tip: If you see SUM() ... FROM ... JOIN ... GROUP BY all at the same level of a query, you may have cardinality problems.

Order by Date not working as expected in MySql

I have a MySQL query:
select count(*) as TotalCount,
pd.Product_Modified_Date,
psc.Product_Subcategory_Name,
pd.Product_Image_URL
from product_subcategory psc
inner join product_details pd on psc.Product_Subcategory_ID = pd.Product_Subcategory_Reference_ID
where pd.Product_Status = 0 and
psc.Product_Subcategory_Status = 0
group by psc.Product_Subcategory_Name
order by pd.Product_Modified_Date desc
My product_details table has new image URLs, but I cannot get them with the query above.
How can I do it?
You are grouping by one column, Product_Subcategory_Name, but you have other columns Product_Image_URL and Product_Modified_Date in your select-list.
If you have cases where the group has multiple rows (which you do, since the count is 14 or more in each group), MySQL can only present one value for the Product_Image_URL. So it picks some row in the group, and uses the value in that row. The URL value for all other rows in the group is ignored.
To fix this, you must group by all columns in your select-list that are not part of an aggregate function. Any column you don't want to use to form a new group must go into an aggregate function.
Roland Bouman wrote an excellent blog post detailing how to use GROUP BY properly: http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html
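Combining GROUP BY and ORDER BY is problematic: ORDER BY is applied after the rows have been grouped, so it only sorts the groups and does not control which row supplies the non-aggregated values for each group. Your problem is most likely covered in another question on Stack Exchange: MySQL wrong results with GROUP BY and ORDER BY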
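One way to get the newest image per subcategory without relying on which row MySQL happens to pick is to compute the count and the latest Product_Modified_Date per subcategory in a derived table, then join back to product_details on that date. The sketch below assumes a reduced, hypothetical version of the two tables (only the columns used in the question, plus a made-up Product_ID key) and uses Python's sqlite3 as a stand-in for MySQL. If two products in a subcategory share the same latest date, both rows come back.

```python
import sqlite3

# In-memory SQLite stand-in for MySQL; schema is trimmed and the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_subcategory (
    Product_Subcategory_ID INTEGER PRIMARY KEY,
    Product_Subcategory_Name TEXT,
    Product_Subcategory_Status INTEGER
);
CREATE TABLE product_details (
    Product_ID INTEGER PRIMARY KEY,
    Product_Subcategory_Reference_ID INTEGER,
    Product_Modified_Date TEXT,
    Product_Image_URL TEXT,
    Product_Status INTEGER
);
INSERT INTO product_subcategory VALUES (1, 'Shoes', 0), (2, 'Hats', 0);
INSERT INTO product_details VALUES
    (1, 1, '2024-01-01', 'old.jpg', 0),
    (2, 1, '2024-03-01', 'new.jpg', 0),
    (3, 2, '2024-02-01', 'hat.jpg', 0),
    (4, 2, '2024-04-01', 'draft.jpg', 1);  -- inactive, filtered out by Product_Status
""")

rows = conn.execute("""
SELECT agg.TotalCount, pd.Product_Modified_Date,
       psc.Product_Subcategory_Name, pd.Product_Image_URL
FROM product_subcategory psc
JOIN (SELECT Product_Subcategory_Reference_ID AS sub_id,
             COUNT(*) AS TotalCount,
             MAX(Product_Modified_Date) AS last_modified
      FROM product_details
      WHERE Product_Status = 0
      GROUP BY Product_Subcategory_Reference_ID) agg
  ON agg.sub_id = psc.Product_Subcategory_ID
JOIN product_details pd
  ON pd.Product_Subcategory_Reference_ID = agg.sub_id
 AND pd.Product_Modified_Date = agg.last_modified
 AND pd.Product_Status = 0
WHERE psc.Product_Subcategory_Status = 0
ORDER BY pd.Product_Modified_Date DESC
""").fetchall()
print(rows)
```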

wrapping inside aggregate function in SQL query

I have two tables, Orders and Salesperson.
I want to retrieve the names of all salespeople who have more than 1 order.
Running the following query gives an error:
SELECT Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
The error is:
Column 'Name' is invalid in the select list because it is
not contained in either an aggregate function or
the GROUP BY clause.
From the error message and a Google search, I understand that the Name column must be either part of the GROUP BY clause or inside an aggregate function.
I also tried to understand why a selected column has to be in the GROUP BY clause or part of an aggregate function, but I don't understand it clearly.
So, how do I fix this error?
SELECT max(Name) as Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
GROUP BY salesperson_id
HAVING COUNT( salesperson_id ) >1
The basic idea is that columns that are not in the GROUP BY clause need to be inside an aggregate function. Here, since the name is presumably the same for every salesperson_id, MIN or MAX makes no real difference (the result is the same).
Example:
Looking at your data, you have 3 orders for Dan (ID 7). When the join is performed, the row with Dan's name is repeated 3 times (once per order), and the server does not know which "Dan" to pick: to the server these are 3 separate rows, even though they are semantically the same.
Also try this query so that you see what I am talking about:
SELECT Orders.Number, Salesperson.Name
FROM Orders, Salesperson
WHERE Orders.salesperson_id = Salesperson.ID
As for the query itself, an explicit INNER JOIN is the better choice, since it is the standard syntax. For a simple query like this it should not matter; in some cases an INNER JOIN could perform better, but as far as I know that is mostly a legacy concern, since nowadays the server should produce the same execution plan for both forms.
For code clarity I would stick with INNER JOIN.
Assuming each salesperson.ID has exactly one name, simply add the name to your GROUP BY clause:
GROUP BY salesperson_id, salesperson.Name
Otherwise, use any aggregate function:
Select Min(Name)
The reason for this is that SQL doesn't know whether there are multiple names per salesperson.ID.
For readability and correctness, I usually split aggregate queries into two parts:
1. The aggregate query
2. Any additional queries to support fields not contained in aggregate functions
So:
1. Aggregate query: salespeople with more than 1 order
SELECT salesperson_id
FROM ORDERS
GROUP BY salesperson_id
HAVING COUNT(Number) > 1
2. Use the aggregate as a subquery (basically a select joining onto another select) to join on any additional fields:
SELECT *
FROM Salesperson SP
INNER JOIN
(
SELECT salesperson_id
FROM ORDERS
GROUP BY salesperson_id
HAVING COUNT(Number) > 1
) AGG_QUERY
ON AGG_QUERY.salesperson_id = SP.ID
There are other approaches, such as selecting the additional fields via aggregate functions (as shown by the other answers). These get the code written quickly, so if you are writing the query under time pressure you may prefer that approach. If the query needs to be maintained (and hence readable), I would favour subqueries.
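The two approaches can be compared side by side. The original table contents are not available, so the sketch below assumes hypothetical rows (Dan with ID 7 and three orders, as in the answer above, plus two made-up salespeople) and uses Python's sqlite3 as a stand-in for the original database. Both the MAX(Name) form and the aggregate-subquery form return the same names.

```python
import sqlite3

# In-memory SQLite stand-in; table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Salesperson (ID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Orders (Number INTEGER PRIMARY KEY, salesperson_id INTEGER);
INSERT INTO Salesperson VALUES (1, 'Abe'), (2, 'Bob'), (7, 'Dan');
INSERT INTO Orders VALUES (10, 7), (11, 7), (12, 7), (13, 1), (14, 1), (15, 2);
""")

# Approach 1: wrap the non-grouped column in an aggregate function.
agg_fn = conn.execute("""
SELECT MAX(Salesperson.Name) AS Name
FROM Orders
INNER JOIN Salesperson ON Orders.salesperson_id = Salesperson.ID
GROUP BY Orders.salesperson_id
HAVING COUNT(Orders.salesperson_id) > 1
ORDER BY 1
""").fetchall()

# Approach 2: aggregate first, then join the result to fetch the extra fields.
subquery = conn.execute("""
SELECT SP.Name
FROM Salesperson SP
INNER JOIN (SELECT salesperson_id
            FROM Orders
            GROUP BY salesperson_id
            HAVING COUNT(Number) > 1) AGG_QUERY
  ON AGG_QUERY.salesperson_id = SP.ID
ORDER BY SP.Name
""").fetchall()
print(agg_fn, subquery)
```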

Conditional Distinct in MySQL with respect to another column

I have the following query:
SELECT * FROM content_type_product cp
JOIN content_field_product_manufacturer cf ON cf.nid = cp.nid group by cp.nid
ORDER
BY field(cf.field_product_manufacturer_value,'12') DESC,
cp.field_product_price_value DESC
This works perfectly except for one small flaw: there are two records with the same id (one with cf.field_product_manufacturer_value = '12' and the other with cf.field_product_manufacturer_value = '57'), which I eliminated using the GROUP BY clause. The problem is that I want the row with the greater field_product_price_value, but somehow I get the one with the lesser value. If I query for '57', I get the row with the greater field_product_price_value, but when I query for '12', I get the row with the lesser field_product_price_value. Is there any way to specify that the row with the greater field_product_price_value should be picked?
You should use max(field_product_price_value) combined with appropriate GROUP BY-clause.
In general, you should use GROUP BY-clause only when you select both normal columns and aggregate functions (MIN, MAX, COUNT, AVG) in the query.
Your query is using a (mis)feature of MySQL called Hidden Columns. This is only advisable when all the unaggregated columns that appear in the SELECT but not in the GROUP BY have the same value within each group. That is not the case here, so you need to select the correct records yourself:
SELECT cp.*, cf.*
FROM content_type_product cp JOIN
content_field_product_manufacturer cf
ON cf.nid = cp.nid join
(select nid, max(field_product_price_value) as maxprice
from content_field_product_manufacturer
group by nid
) cfmax
on cf.nid = cfmax.nid and cf.field_product_price_value = cfmax.maxprice
ORDER BY field(cf.field_product_manufacturer_value,'12') DESC,
cp.field_product_price_value DESC
Unless you really know what you are doing, when you use a GROUP BY, be sure all unaggregated columns in the SELECT are in the GROUP BY.
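The "join back to the per-group maximum" pattern can be checked on a reduced, hypothetical schema. In the sketch below the price column is assumed to live in content_field_product_manufacturer (as in the answer's subquery), content_type_product is trimmed to a made-up title column, and Python's sqlite3 stands in for MySQL. SQLite has no FIELD() function, so the manufacturer ordering is rewritten as an equivalent CASE expression: FIELD(x, '12') returns 1 when x = '12' and 0 otherwise.

```python
import sqlite3

# In-memory SQLite stand-in; reduced schema and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_type_product (nid INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE content_field_product_manufacturer (
    nid INTEGER,
    field_product_manufacturer_value TEXT,
    field_product_price_value INTEGER
);
INSERT INTO content_type_product VALUES (1, 'Widget A'), (2, 'Widget B');
INSERT INTO content_field_product_manufacturer VALUES
    (1, '12', 100), (1, '57', 150),   -- nid 1 has two manufacturer rows
    (2, '12', 80);
""")

rows = conn.execute("""
SELECT cp.nid, cp.title, cf.field_product_manufacturer_value, cf.field_product_price_value
FROM content_type_product cp
JOIN content_field_product_manufacturer cf ON cf.nid = cp.nid
JOIN (SELECT nid, MAX(field_product_price_value) AS maxprice
      FROM content_field_product_manufacturer
      GROUP BY nid) cfmax
  ON cf.nid = cfmax.nid AND cf.field_product_price_value = cfmax.maxprice
ORDER BY CASE WHEN cf.field_product_manufacturer_value = '12' THEN 1 ELSE 0 END DESC,
         cf.field_product_price_value DESC
""").fetchall()
print(rows)
```

For nid 1 only the higher-priced '57' row survives, and the row for manufacturer '12' is sorted first.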
'2' > '12'
is true if we are talking about VARCHARs, because strings are compared character by character. I believe you should convert your field to a numeric type, and then your sort will work correctly. Read this article for more information.
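The string-versus-number difference is easy to reproduce. This sketch uses Python's sqlite3 as a stand-in for MySQL with a hypothetical one-column table; CAST(v AS INTEGER) is SQLite syntax, and in MySQL the equivalent cast would be CAST(v AS SIGNED) or CAST(v AS UNSIGNED). Changing the column type itself, as suggested above, removes the need for the cast.

```python
import sqlite3

# In-memory SQLite stand-in; the table and values are hypothetical.
conn = sqlite3.connect(":memory:")

# Text comparison looks at the first character: '2' > '1', so '2' > '12' is true.
cmp = conn.execute("SELECT '2' > '12', 2 > 12").fetchone()

conn.executescript("""
CREATE TABLE t (v TEXT);
INSERT INTO t VALUES ('12'), ('2'), ('57');
""")

as_text = [r[0] for r in conn.execute("SELECT v FROM t ORDER BY v DESC")]
as_number = [r[0] for r in conn.execute("SELECT v FROM t ORDER BY CAST(v AS INTEGER) DESC")]
print(cmp, as_text, as_number)
```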

How mysql works when using multiple columns on group by?

How does MySQL work when more than one column is used in GROUP BY, like this:
select
a.nome,
b.tb2_id,
count(c.tb2_id) as saida
from tb1 a
left join tb2 b on a.tb1_id = b.tb1_id
left join tb3 c on b.tb2_id = c.tb2_id
group by a.tb1_id, b.tb2_id
order by a.tb1_id desc
How does MySQL know which column to use to group the result set?
I thought it would group in column order, but when I changed the GROUP BY to 'b.tb2_id, a.tb1_id' nothing changed; I got the same result.
group by a.tb1_id, b.tb2_id means grouping by the pair (a.tb1_id, b.tb2_id): rows are treated as one group only when both a.tb1_id and b.tb2_id are the same. The order of the columns in the GROUP BY therefore does not change which groups are formed.
Only the order by clause affects the order of rows.
The GROUP BY clause affects data aggregation. MySQL is special in that, unlike most other databases, it allows columns in the select list that are not grouped by to also be non-aggregated. When this option is exercised (as in your query, where a.nome is not grouped by), MySQL returns a value from some row of each group; which row is not guaranteed, although in practice it is often the first row encountered. All other databases I know would reject this query with an error.
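The claim that GROUP BY column order does not change the groups can be checked directly. The sketch below builds hypothetical tb1/tb2/tb3 tables in Python's sqlite3 (a stand-in for MySQL) and runs the question's query with the GROUP BY columns in two different orders. To keep the query standard, a.nome is added to the GROUP BY; an explicit ORDER BY makes the row order deterministic, since only ORDER BY controls it.

```python
import sqlite3

# In-memory SQLite stand-in; tables and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb1 (tb1_id INTEGER PRIMARY KEY, nome TEXT);
CREATE TABLE tb2 (tb2_id INTEGER PRIMARY KEY, tb1_id INTEGER);
CREATE TABLE tb3 (tb3_id INTEGER PRIMARY KEY, tb2_id INTEGER);
INSERT INTO tb1 VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO tb2 VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO tb3 VALUES (100, 10), (101, 10), (102, 20);
""")

template = """
SELECT a.nome, b.tb2_id, COUNT(c.tb2_id) AS saida
FROM tb1 a
LEFT JOIN tb2 b ON a.tb1_id = b.tb1_id
LEFT JOIN tb3 c ON b.tb2_id = c.tb2_id
GROUP BY {group_by}
ORDER BY a.tb1_id DESC, b.tb2_id
"""

# Same grouping columns, listed in opposite orders.
forward = conn.execute(template.format(group_by="a.tb1_id, a.nome, b.tb2_id")).fetchall()
backward = conn.execute(template.format(group_by="b.tb2_id, a.nome, a.tb1_id")).fetchall()
print(forward)
print(backward)
```

Both orders produce one row per (tb1_id, tb2_id) pair, with tb2_id 11 getting a count of 0 because it has no matching tb3 rows.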