Conditional Distinct in MySQL with respect to another column - mysql

I have a query as follows:
SELECT * FROM content_type_product cp
JOIN content_field_product_manufacturer cf ON cf.nid = cp.nid group by cp.nid
ORDER
BY field(cf.field_product_manufacturer_value,'12') DESC,
cp.field_product_price_value DESC
This works almost perfectly, with one small flaw: two records have the same id (one with cf.field_product_manufacturer_value = '12' and the other with cf.field_product_manufacturer_value = '57'), and I eliminate the duplicate with the GROUP BY clause. The problem is that I want the row with the greater field_product_price_value, but I get the one with the lesser value. If I query for '57' I get the row with the greater field_product_price_value, but when I query for '12' I get the row with the lesser one. Is there any way to specify that the row with the greater field_product_price_value should be picked?

You should use MAX(field_product_price_value) combined with an appropriate GROUP BY clause.
In general, you should use GROUP BY-clause only when you select both normal columns and aggregate functions (MIN, MAX, COUNT, AVG) in the query.

Your query is using a (mis)feature of MySQL called Hidden Columns. This is only advised when all the unaggregated columns that are in the SELECT but not in the GROUP BY have the same value within each group. That is not the case here, so you need to select the correct records yourself:
SELECT cp.*, cf.*
FROM content_type_product cp JOIN
content_field_product_manufacturer cf
ON cf.nid = cp.nid JOIN
-- the price column belongs to content_type_product (cp), as in the question's ORDER BY
(SELECT nid, MAX(field_product_price_value) AS maxprice
FROM content_type_product
GROUP BY nid
) cpmax
ON cp.nid = cpmax.nid AND cp.field_product_price_value = cpmax.maxprice
ORDER BY FIELD(cf.field_product_manufacturer_value,'12') DESC,
cp.field_product_price_value DESC
Unless you really know what you are doing, when you use a GROUP BY, be sure all unaggregated columns in the SELECT are in the GROUP BY.

'2' > '12'
when the values are compared as varchars. I believe you should convert the field to a numeric type, and then your sort will work fine.
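The string-versus-number comparison above can be checked with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the product table and its values are made up for illustration. SQLite casts with CAST(... AS INTEGER); in MySQL the equivalent would be CAST(... AS SIGNED) or CAST(... AS UNSIGNED).

```python
import sqlite3

# Hypothetical table: prices stored as text, which is what the answer suspects.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (price TEXT)")
conn.executemany("INSERT INTO product VALUES (?)", [("2",), ("12",), ("100",)])


def prices(order_expr):
    """Return all prices sorted descending by the given SQL expression."""
    sql = f"SELECT price FROM product ORDER BY {order_expr} DESC"
    return [row[0] for row in conn.execute(sql)]


as_text = prices("price")                     # compared character by character
as_number = prices("CAST(price AS INTEGER)")  # compared as numbers

print(as_text)    # '2' sorts first because '2' > '1' as a character
print(as_number)  # 100 sorts first once the values are numeric
```

If changing the column type is not possible, casting inside the ORDER BY is a workaround, at the cost of not being able to use an index on that column for the sort.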

Related

MySQL GROUP BY order when no ORDER BY

In MySQL, what order is the resultset if GROUP BY is used but ORDER BY is not specified?
I have inherited code with queries like:
SELECT col1, COUNT(col1)
FROM table
GROUP BY col1
(Actually the SELECT statement can be much more complex than that, involving JOINs etc., but let's start with the base principle.) Note that there is no ORDER BY.
In, say, SQL Server BOL, I am told:
The GROUP BY clause does not order the result set. Use the ORDER BY
clause to order the result set.
I have been unable to find a statement as to whether MySQL promises a particular ordering from GROUP BY alone. If a MySQL reference could be provided to back up any answer, that would be most welcome.
From the manual:
If you use GROUP BY, output rows are sorted according to the GROUP BY
columns as if you had an ORDER BY for the same columns. To avoid the
overhead of sorting that GROUP BY produces, add ORDER BY NULL:
SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY NULL;
Relying on implicit GROUP BY sorting (that is, sorting in the absence
of ASC or DESC designators) is deprecated. To produce a given sort
order, use explicit ASC or DESC designators for GROUP BY columns or
provide an ORDER BY clause.
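As a minimal sketch of the manual's advice, the query below states its order explicitly instead of relying on whatever GROUP BY happens to do. It runs on Python's built-in sqlite3 module as a stand-in for MySQL, and the rows in test_table are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (a TEXT, b INTEGER)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [("pear", 1), ("apple", 2), ("pear", 3), ("fig", 4), ("apple", 5)],
)

# Explicit ORDER BY: the result order no longer depends on how the
# engine chooses to build the groups.
rows = conn.execute(
    "SELECT a, COUNT(b) FROM test_table GROUP BY a ORDER BY a ASC"
).fetchall()

print(rows)
```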

SQL: Group By is mismatching records

I'm trying to get the highest version within a group. My query:
SELECT
rubric_id,
max(version) as version,
group_id
FROM
rubrics
WHERE
client_id = 1
GROUP BY
group_id
The Data:
The Results:
The rubric of ID 2 does not have a version of 2, why is this being mismatched? What do I need to do to correct this?
Edit, not a duplicate:
This is not a duplicate of SQL Select only rows with Max Value on a Column , which is a post I have read and referenced before writing this. My question is not how to find the max, my question is why is the version not matched to the correct ID
MySQL is confusing you by letting you get away with having a column in your select that isn't in your group by. To resolve the issue, make sure you don't select any field that isn't in the group by.
Instead of trying to get everything in one statement, you will need to use a subquery to find the max version for each group and then join to it.
SELECT T.*
FROM rubrics T
JOIN
(
SELECT
group_id,
max(version) as max_version
FROM
rubrics
GROUP BY
group_id
) dedupe
on T.group_id = dedupe.group_id
and T.version = dedupe.max_version
WHERE
T.client_id = 1
Edit: So MySQL allows it, but I don't think it's a good practise to use it.
You are trying to query non-aggregated data from an aggregated query. You should not do that.
A GROUP BY takes the column(s) it should group rows by (in your case, your GROUP BY says: give me one result row per distinct group_id) and returns the aggregated data for each group.
Here, you try to access non aggregated data (rubric_id in your case). For some reason, the query does not crash and picks a "random" id in your aggregated data.
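Here is a runnable sketch of the join-to-max approach. The question's data was not preserved, so the rows below are hypothetical, chosen so that rubric 2 and rubric 3 share a group and only rubric 3 has version 2. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and it applies the client_id filter inside the subquery as well, which is an assumption about what the asker wants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rubrics (rubric_id INTEGER, version INTEGER,"
    " group_id INTEGER, client_id INTEGER)"
)
# Hypothetical rows: (rubric_id, version, group_id, client_id)
conn.executemany(
    "INSERT INTO rubrics VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (2, 1, 2, 1), (3, 2, 2, 1), (4, 1, 3, 2)],
)

sql = """
SELECT r.rubric_id, r.version, r.group_id
FROM rubrics r
JOIN (
    SELECT group_id, MAX(version) AS max_version
    FROM rubrics
    WHERE client_id = 1
    GROUP BY group_id
) latest
  ON r.group_id = latest.group_id
 AND r.version = latest.max_version
WHERE r.client_id = 1
ORDER BY r.group_id
"""
latest_rubrics = conn.execute(sql).fetchall()

print(latest_rubrics)  # each row's rubric_id really belongs to its max version
```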

Having both GROUP BY and ORDER BY in one query

I've been told that I can't have GROUP BY and ORDER BY in one MySQL Query. Here is an abbreviated version of the query -
SELECT n.colorName, n.colorComp, n.colorID, SUM(n.gallons) AS TotalGallons
FROM netTran n, Store m, Product p
WHERE ((n.store = m.store) and m.state = "FL")
AND ((n.salesNbr = p.salesNbr) AND (p.intExt = "EXTERIOR" OR p.intExt = "INT/EXT"))
AND ((n.clrnt1 = "L1") AND (n.clrnt1 = "R3"))
GROUP BY n.colorComp, n.colorID
ORDER BY TotalGallons DESC;
I've been told that having the ORDER BY with the GROUP BY will give me different results and that the only way the ORDER BY would work is if the main query were nested in
SELECT * FROM
(query)
ORDER BY TotalGallons DESC;
Is that correct?
Use the query as
SELECT n.colorName, n.colorComp, n.colorID, SUM(n.gallons) AS TotalGallons
FROM netTran n, Store m, Product p
WHERE ((n.store = m.store) and m.state = "FL")
AND ((n.salesNbr = p.salesNbr) AND (p.intExt = "EXTERIOR" OR p.intExt = "INT/EXT"))
AND ((n.clrnt1 = "L1") AND (n.clrnt1 = "R3"))
GROUP BY n.colorName, n.colorComp,n.colorID
ORDER BY TotalGallons DESC;
You can have GROUP BY and ORDER BY in a single query, but when you aggregate a column you need to list all the other selected columns in the GROUP BY.
GROUP BY changes the results; ORDER BY just presents the data in order.
Having the ORDER BY with the GROUP BY won't give you different results
Yes, that's true. In the MySQL reference manual you can read this:
If you use GROUP BY, output rows are sorted according to the GROUP BY columns as if you had an ORDER BY for the same columns. To avoid the overhead of sorting that GROUP BY produces, add ORDER BY NULL:
I suppose that this means that ORDER BY has no effect at all.
Curious... I always thought that order by worked...
GROUP BY and ORDER BY are two different things. It is plain wrong that you cannot use them together.
GROUP BY is used to tell the DBMS per which group to aggregate the data. In your example you sum gallons per colorComp and colorID.
ORDER BY is used to tell the DBMS in which order you want the data shown. In your query by the sum of gallons descending.
In standard SQL you don't usually use GROUP BY without ORDER BY, because in spite of the grouping, the data may be shown unordered. MySQL however decided to guarantee that GROUP BY performs an ORDER BY. So in MySQL it was not necessary to use ORDER BY after GROUP BY, as long as you didn't want another order as in your example. This non-standard behavior is now deprecated. See here:
https://dev.mysql.com/doc/refman/5.6/en/group-by-optimization.html
However, relying on implicit GROUP BY sorting is deprecated.
So you should have an ORDER BY clause now whenever you want data sorted. With no exception.
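To check the claim that the outer SELECT wrapper is unnecessary, here is a small sketch comparing the direct query with the nested form. It uses Python's built-in sqlite3 module as a stand-in for MySQL, with a cut-down netTran table (no joins or filters) and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE netTran (colorName TEXT, colorComp TEXT,"
    " colorID INTEGER, gallons REAL)"
)
# Hypothetical sales rows
conn.executemany(
    "INSERT INTO netTran VALUES (?, ?, ?, ?)",
    [("Red", "A", 1, 5.0), ("Red", "A", 1, 7.0),
     ("Blue", "B", 2, 20.0), ("Green", "C", 3, 1.5)],
)

grouped = """
SELECT colorName, colorComp, colorID, SUM(gallons) AS TotalGallons
FROM netTran
GROUP BY colorName, colorComp, colorID
"""

# ORDER BY written directly after GROUP BY
direct = conn.execute(grouped + " ORDER BY TotalGallons DESC").fetchall()

# The same grouped query wrapped in an outer SELECT, as the asker was told to do
nested = conn.execute(
    f"SELECT * FROM ({grouped}) AS q ORDER BY TotalGallons DESC"
).fetchall()

print(direct)
print(direct == nested)
```

In this sketch both forms return the same rows in the same order, so the wrapper adds nothing.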

Get values from first sorted member of grouped(?) sql query

I feel like this is obvious but I'm struggling. Must be because it's a Monday.
I have a licenses table in MySQL which has fields id (int), start_date (date), licensable_id (int), licensable_type (string) and fixed_end_point (boolean).
I want to get all licenses where the start date is equal to or less than today, group them by licensable_id and licensable_type, and then get the most recently starting one so I can get the fixed_end_point field out of it, along with licensable_id and licensable_type.
This is what I'm trying:
SELECT licensable_id, licensable_type, fixed_end_point
FROM licenses
WHERE start_date <= "2016-08-01"
GROUP BY licensable_id, licensable_type
ORDER BY start_date desc;
At the moment, the ORDER BY field seems to be being ignored, and it's just returning the values from the first license for each group, rather than the most recent. Can anyone see what I'm doing wrong? Do I need to make a nested query?
You shouldn't be thinking about this as a group by. You want to select the most recent start_date for each license, given the constraints in the question. One method uses a correlated subquery:
select l.*
from licenses l
where l.start_date = (select max(l2.start_date)
from licenses l2
where l2.licensable_id = l.licensable_id and
l2.licensable_type = l.licensable_type and
l2.start_date <= '2016-08-01'
);
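A runnable sketch of the correlated subquery above, selecting just the three columns the question asks for. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the license rows are hypothetical, and dates are stored as ISO strings so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE licenses (id INTEGER, start_date TEXT, licensable_id INTEGER,"
    " licensable_type TEXT, fixed_end_point INTEGER)"
)
# Hypothetical rows; license 3 starts after the cutoff date
conn.executemany(
    "INSERT INTO licenses VALUES (?, ?, ?, ?, ?)",
    [(1, "2016-01-01", 1, "School", 0),
     (2, "2016-06-01", 1, "School", 1),
     (3, "2016-09-01", 1, "School", 0),
     (4, "2016-03-01", 2, "User", 0)],
)

sql = """
SELECT l.licensable_id, l.licensable_type, l.fixed_end_point
FROM licenses l
WHERE l.start_date = (SELECT MAX(l2.start_date)
                      FROM licenses l2
                      WHERE l2.licensable_id = l.licensable_id
                        AND l2.licensable_type = l.licensable_type
                        AND l2.start_date <= ?)
ORDER BY l.licensable_id
"""
current = conn.execute(sql, ("2016-08-01",)).fetchall()

print(current)  # one row per licensable, taken from its most recent started license
```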
You don't use aggregation function so you should use distinct
SELECT DISTINCT licensable_id, licensable_type, fixed_end_point
FROM licenses
WHERE date(start_date) <= date(now())
ORDER BY start_date desc
limit 1;
The reason this doesn't give you the results you want is how GROUP BY works.
With standard SQL any field in the SELECT must either be also mentioned in the GROUP BY clause or must be an aggregate field (there is an exception for fields 100% related to a field that is returned, but many flavours of SQL do not support this).
MySQL does allow a field to be in the SELECT clause which is not an aggregate value and is not mentioned in the GROUP BY clause, and allowing this was the default until recently. However for these fields there could be multiple values for the GROUP BY fields, and in this case which one is chosen is not defined. As this is worked out prior to the ORDER BY statement being processed, the ORDER BY clause has no effect on which one is chosen.
There are a few normal ways to do this. You can use a correlated subquery as Gordon has suggested, or similarly (and possibly more efficiently, depending on records and indexes) you can use a subquery to get the latest start date for each licensable_id and licensable_type, and then join that back to your main table:
SELECT l.licensable_id,
l.licensable_type,
l.fixed_end_point
FROM licenses l
INNER JOIN
(
SELECT licensable_id,
licensable_type,
MAX(start_date) AS max_start_date
FROM licenses
WHERE start_date <= "2016-08-01"
GROUP BY licensable_id,
licensable_type
) sub0
ON l.licensable_id = sub0.licensable_id
AND l.licensable_type = sub0.licensable_type
AND l.start_date = sub0.max_start_date
In some situations another option is to (ab)use the GROUP_CONCAT and SUBSTRING_INDEX functions. This way you can GROUP BY the fields you want to, but do a GROUP_CONCAT of the other fields in descending order of the date. Then use SUBSTRING_INDEX to get everything up to the first comma (the default delimiter for GROUP_CONCAT):
SELECT licensable_id,
licensable_type,
SUBSTRING_INDEX(GROUP_CONCAT(COALESCE(fixed_end_point, '') ORDER BY start_date DESC), ',', 1)
FROM licenses
WHERE start_date <= "2016-08-01"
GROUP BY licensable_id, licensable_type
Obviously this has issues if the latest row has a null value, hence I have used COALESCE to fudge in non null values. Also if the field contains commas you will need to use an alternative delimiter. And if the field is large then you might have issues with the max field length for GROUP_CONCAT (default is 1024 I think).

Database specific selection of data

I have a database and one of tables has the following structure:
recordId, vehicleId, dateOfTireChange, expectedKmBeforeNextChange, tireType
I want to select from the table so that I only get those rows that contain the most recent date for each vehicleId.
I tried this approach
SELECT vehicleid,
Max(dateoftirechange) AS lastChange,
expectedkmbeforenextchange,
tiretype
FROM vehicle_tires
GROUP BY vehicleid
but it doesn't select the kilometers associated with the most recent date so it does not work.
Any idea how to make this selection?
There are several ways to get the desired result.
Correlated scalar subquery...
SELECT vt1.*
FROM vehicle_tires vt1
WHERE vt1.recordId = (SELECT vt2.recordId
FROM vehicle_tires vt2
WHERE vt2.vehicleId = vt1.vehicleId
ORDER BY vt2.dateOfTireChange DESC LIMIT 1);
...or derived table...
SELECT vt2.*
FROM vehicle_tires vt2
JOIN (SELECT vt1.vehicleId AS vehicleId,
MAX(vt1.dateOfTireChange) AS maxDateOfTireChange
FROM vehicle_tires vt1
GROUP BY vt1.vehicleId) dt ON vt2.vehicleId = dt.vehicleId
AND vt2.dateOfTireChange = dt.maxDateOfTireChange;
...are two that come to mind.
The reason GROUP BY is not correct when applied to the whole table is that any columns you do not GROUP BY and that are also not the subject of aggregate functions MIN() MAX() AVG() COUNT(), etc., are assumed by the server to be columns that you know to be identical in every row of the groups established by the GROUP BY clause.
If, for example, I'm doing a query like this...
SELECT p.id,
p.full_name,
p.date_of_birth,
COUNT(c.id) AS number_of_children
FROM parent p LEFT JOIN child c ON c.parent_id = p.id
GROUP BY p.id;
The correct way to write this query would be GROUP BY p.id, p.full_name, p.date_of_birth, because none of those columns are part of the aggregate function COUNT().
The MySQL optimization lets you leave out of the GROUP BY those columns that you know must, by definition, be the same within each group, and the server will fill those columns with data from any row in the group. Which row is not defined. As you can see in the example, the parent's full_name is the same in all rows within a group of p.id, and that is a case where this optimization is legitimate. The justification is that it lets the server handle smaller values (fewer bytes) when executing the grouping. But in a query like yours, where the ungrouped columns have different values within each group, you get an invalid result, by design.
The SQL_MODE ONLY_FULL_GROUP_BY disables this optimization.
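For comparison, here is a sketch of the fully grouped form of the parent/child query, where every unaggregated column in the SELECT also appears in the GROUP BY. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the parent and child rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parent (id INTEGER PRIMARY KEY, full_name TEXT, date_of_birth TEXT)"
)
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY, parent_id INTEGER)")
# Hypothetical data: Ann has two children, Bob has none
conn.executemany(
    "INSERT INTO parent VALUES (?, ?, ?)",
    [(1, "Ann", "1980-01-01"), (2, "Bob", "1975-05-05")],
)
conn.executemany("INSERT INTO child VALUES (?, ?)", [(1, 1), (2, 1)])

sql = """
SELECT p.id,
       p.full_name,
       p.date_of_birth,
       COUNT(c.id) AS number_of_children
FROM parent p LEFT JOIN child c ON c.parent_id = p.id
GROUP BY p.id, p.full_name, p.date_of_birth
ORDER BY p.id
"""
counts = conn.execute(sql).fetchall()

print(counts)  # COUNT(c.id) skips the NULL from the LEFT JOIN, so Bob gets 0
```

Written this way, the query does not depend on the server picking a row for the ungrouped columns.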