SQL: Group By is mismatching records - mysql

I'm trying to get the highest version within a group. My query:
SELECT
rubric_id,
max(version) as version,
group_id
FROM
rubrics
WHERE
client_id = 1
GROUP BY
group_id
The Data:
The Results:
The rubric of ID 2 does not have a version of 2, why is this being mismatched? What do I need to do to correct this?
Edit, not a duplicate:
This is not a duplicate of "SQL Select only rows with Max Value on a Column", which is a post I read and referenced before writing this. My question is not how to find the max; my question is why the version is not matched to the correct ID.

MySQL is confusing you by letting you get away with having a column in your select that isn't in your group by. To resolve the issue, make sure you don't select any field that isn't either in the group by or wrapped in an aggregate function.
Instead of trying to get everything in one statement, you will need to use a subquery to find the max version for each group and then join to it.
SELECT T.*
FROM rubrics T
JOIN
(
-- highest version per group, restricted to the same client as the outer query
SELECT
group_id,
max(version) as max_version
FROM
rubrics
WHERE
client_id = 1
GROUP BY
group_id
) dedupe
on T.group_id = dedupe.group_id
and T.version = dedupe.max_version
WHERE
T.client_id = 1
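To see the join-to-subquery pattern return matching IDs, here is a minimal sketch that runs it through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, and the sample rows and the latest_rubrics helper are hypothetical, made up for illustration; the join compares the version column against the max_version alias.

```python
import sqlite3

def latest_rubrics(rows, client_id):
    """Return (rubric_id, version, group_id) for the highest version in each group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rubrics (rubric_id INTEGER, version INTEGER, "
        "group_id INTEGER, client_id INTEGER)"
    )
    conn.executemany("INSERT INTO rubrics VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT T.rubric_id, T.version, T.group_id
        FROM rubrics T
        JOIN (
            -- one row per group: the highest version for this client
            SELECT group_id, MAX(version) AS max_version
            FROM rubrics
            WHERE client_id = ?
            GROUP BY group_id
        ) dedupe
          ON T.group_id = dedupe.group_id
         AND T.version = dedupe.max_version
        WHERE T.client_id = ?
        ORDER BY T.group_id
    """
    result = conn.execute(query, (client_id, client_id)).fetchall()
    conn.close()
    return result

# Hypothetical rows: (rubric_id, version, group_id, client_id)
sample = [(1, 1, 10, 1), (2, 1, 11, 1), (3, 2, 10, 1), (4, 3, 10, 2)]
latest = latest_rubrics(sample, 1)
print(latest)  # [(3, 2, 10), (2, 1, 11)]
```

Every returned rubric_id comes from the same row as its version, because the outer query reads whole rows instead of mixing an aggregate with an ungrouped column.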

Edit: MySQL allows it, but I don't think it's good practice to rely on it.
You are trying to select non-aggregated data from an aggregated query. You should not do that.
A GROUP BY takes the column(s) used to form groups of rows (in your case, the GROUP BY says: give me one result row per distinct group_id) and returns aggregated data for each group.
Here, you try to access non-aggregated data (rubric_id in your case). When the ONLY_FULL_GROUP_BY SQL mode is disabled, MySQL does not reject the query; it takes rubric_id from an arbitrary row in each group, which need not be the row that holds max(version).

Related

Mysql DISTINCT with more than one column (remove duplicates)

My table is called training_session.
I am trying to print out some information from my data, but I do not want any duplicates. I can't get it to work; can someone tell me what I am doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another, it does not work.
I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
Both collapse repeated values, but they serve different purposes.
You use DISTINCT if you want to show a set of columns without repeating any combination of values. That means you don't care about calculations or aggregate functions. DISTINCT will return different results as you add more columns to your SELECT, because it compares the whole row.
You use GROUP BY if you want one row per distinct value of certain selected columns and you use group (aggregate) functions to calculate data for each group. In short, use GROUP BY when you want to use group functions.
The available group functions are listed at this link:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" duration for each athlete. I'll assume the table has no ID column to identify the latest row.
Here is my alternative solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This is sometimes called an in-select subquery (a correlated scalar subquery in the SELECT list). With the help of LIMIT 1, it returns only the topmost record. A subquery used this way must return at most one row; if it returns more than one, MySQL raises an error.
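As a rough illustration of the subquery-in-SELECT approach, the sketch below fills the [latest column to sort] placeholder with a hypothetical session_date column and runs the query in SQLite through Python's sqlite3 module. The column name and the sample rows are assumptions for the example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# session_date is a hypothetical column that identifies the latest session
conn.execute(
    "CREATE TABLE training_session "
    "(athlete_id INTEGER, duration INTEGER, session_date TEXT)"
)
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?, ?)",
    [(1, 30, "2020-01-01"), (1, 45, "2020-01-05"), (2, 60, "2020-01-03")],
)

last_durations = conn.execute(
    """
    SELECT a.athlete_id,
           (SELECT b.duration
            FROM training_session AS b
            WHERE b.athlete_id = a.athlete_id   -- correlate with the outer row
            ORDER BY b.session_date DESC
            LIMIT 1) AS last_duration
    FROM training_session AS a
    GROUP BY a.athlete_id
    ORDER BY a.athlete_id
    """
).fetchall()
conn.close()

print(last_durations)  # [(1, 45), (2, 60)]
```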
MySQL's DISTINCT clause is used to filter out duplicate rows from the result set.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), DISTINCT compares the combination of both columns. Each athlete_id/duration pair in your result is already unique, hence the results you're getting. In other words, the query is working correctly.
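A small sketch of how DISTINCT treats one column, two columns, and the AND form from the question. It uses SQLite via Python's sqlite3 module as a stand-in for MySQL, with made-up sample rows; note that athlete_id AND duration is a single logical expression, not a column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER)")
# Hypothetical sample rows, including one exact duplicate (1, 30)
conn.executemany(
    "INSERT INTO training_session VALUES (?, ?)",
    [(1, 30), (1, 30), (1, 45), (2, 30)],
)

# One column: each athlete appears once
one_column = conn.execute(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id"
).fetchall()

# Two columns: each distinct (athlete_id, duration) pair appears once
two_columns = conn.execute(
    "SELECT DISTINCT athlete_id, duration FROM training_session "
    "ORDER BY athlete_id, duration"
).fetchall()

# AND builds one boolean expression, so only its truth value is returned
and_expression = conn.execute(
    "SELECT DISTINCT athlete_id AND duration FROM training_session"
).fetchall()
conn.close()

print(one_column)      # [(1,), (2,)]
print(two_columns)     # [(1, 30), (1, 45), (2, 30)]
print(and_expression)  # [(1,)]
```

If one row per athlete is wanted together with a duration, DISTINCT is the wrong tool; an aggregate such as MAX(duration) with GROUP BY athlete_id states which duration to keep.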

Order by Date not working as expected in MySql

I have a mysql query
select count(*) as TotalCount,
pd.Product_Modified_Date,
psc.Product_Subcategory_Name,
pd.Product_Image_URL
from product_subcategory psc
inner join product_details pd on psc.Product_Subcategory_ID = pd.Product_Subcategory_Reference_ID
where pd.Product_Status = 0 and
psc.Product_Subcategory_Status = 0
group by psc.Product_Subcategory_Name
order by pd.Product_Modified_Date desc
My product_details table has new image URLs, but I could not get them with the above query.
How can I do it?
You are grouping by one column, Product_Subcategory_Name, but you have other columns Product_Image_URL and Product_Modified_Date in your select-list.
If you have cases where the group has multiple rows (which you do, since the count is 14 or more in each group), MySQL can only present one value for the Product_Image_URL. So it picks some row in the group, and uses the value in that row. The URL value for all other rows in the group is ignored.
To fix this, you must group by all columns in your select-list that are not part of an aggregate function. Any column you don't want to use to form a new group must go into an aggregate function.
Roland Bouman wrote an excellent blog detailing how to use GROUP BY properly: http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html
Combining GROUP BY and ORDER BY is problematic, and your problem is most likely covered in another question on Stack Exchange: MySQL wrong results with GROUP BY and ORDER BY
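One possible way to get the newest image URL per subcategory, sketched under assumptions: aggregate first (COUNT and MAX of the modified date), then join back to product_details to read the URL from the row that carries that date. The example runs in SQLite via Python's sqlite3 module; the sample rows are invented, and it assumes modified dates are unique within a subcategory (ties would return more than one row per group).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product_subcategory (
        Product_Subcategory_ID INTEGER,
        Product_Subcategory_Name TEXT,
        Product_Subcategory_Status INTEGER
    );
    CREATE TABLE product_details (
        Product_Subcategory_Reference_ID INTEGER,
        Product_Modified_Date TEXT,
        Product_Image_URL TEXT,
        Product_Status INTEGER
    );
    -- hypothetical sample data
    INSERT INTO product_subcategory VALUES (1, 'Shoes', 0), (2, 'Hats', 0);
    INSERT INTO product_details VALUES
        (1, '2020-01-01', 'old_shoe.png', 0),
        (1, '2020-03-01', 'new_shoe.png', 0),
        (2, '2020-02-01', 'hat.png', 0);
    """
)

latest_images = conn.execute(
    """
    SELECT g.TotalCount, g.LatestDate, g.Product_Subcategory_Name, pd.Product_Image_URL
    FROM (
        -- every selected column is either grouped or aggregated
        SELECT psc.Product_Subcategory_ID,
               psc.Product_Subcategory_Name,
               COUNT(*) AS TotalCount,
               MAX(pd.Product_Modified_Date) AS LatestDate
        FROM product_subcategory psc
        JOIN product_details pd
          ON psc.Product_Subcategory_ID = pd.Product_Subcategory_Reference_ID
        WHERE pd.Product_Status = 0
          AND psc.Product_Subcategory_Status = 0
        GROUP BY psc.Product_Subcategory_ID, psc.Product_Subcategory_Name
    ) g
    JOIN product_details pd
      ON pd.Product_Subcategory_Reference_ID = g.Product_Subcategory_ID
     AND pd.Product_Modified_Date = g.LatestDate
     AND pd.Product_Status = 0
    ORDER BY g.LatestDate DESC
    """
).fetchall()
conn.close()

print(latest_images)
# [(2, '2020-03-01', 'Shoes', 'new_shoe.png'), (1, '2020-02-01', 'Hats', 'hat.png')]
```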

Remove Duplicate record from Mysql Table using Group By

I have a table structure and data below.
I need to remove duplicate records from the table. My confusion is that when I run the query
SELECT * FROM `table` GROUP BY CONCAT(`name`,department)
it gives me the correct list (12 records).
But when I use the same query as a subquery:
SELECT *
FROM `table` WHERE id IN (SELECT id FROM `table` GROUP BY CONCAT(`name`,department))
it returns all records, which is wrong.
So my question is: why is GROUP BY in the subquery not working?
As Tim mentioned in his answer, getting the first record of each group from a GROUP BY clause is not a standard SQL feature. MySQL allowed it up to version 5.6.16, but the behavior changed as of 5.6.21.
Just change the MySQL version in your SQL Fiddle and you will get what you want.
In the query
SELECT * FROM `table` GROUP BY CONCAT(`name`,department)
You are selecting the id column, which is a non-aggregate column. Many RDBMS would give you an error, but MySQL allows this for performance reasons. This means MySQL has to choose which record to retain in the result set. Based on the result set in your original problem, it appears that MySQL is retaining the id of the first duplicate record, in cases where a group has more than one member.
In the query
SELECT *
FROM `table`
WHERE id IN
(
SELECT id FROM `table` GROUP BY CONCAT(`name`,department)
)
you are also selecting a non-aggregate column in the subquery. It appears that MySQL actually decides which id value to be retained in the subquery based on the id value in the outer query. That is, for each id value in table, MySQL performs the subquery and then selectively chooses to retain a record in the group if two id values match.
You should avoid using a non-aggregate column in a query with GROUP BY, because it is a violation of the ANSI standard, and as you have seen here it can produce unexpected results. If you give us more information about what result set you want, we can give you a correct query which will avoid this problem.
I welcome anyone who has documentation to support these observations to either edit my answer or post a new one.
You can JOIN the grouped ids with that of table ids, so that you can get desired results.
Example:
SELECT t.* FROM so_q32175332 t
JOIN ( SELECT id FROM so_q32175332
GROUP BY CONCAT( name, department ) ) f
ON t.id = f.id
ORDER BY CONCAT( name, department );
Here ORDER BY was added only so that the results can be compared directly with those of the SELECT * ... GROUP BY query.
Demo on SQL Fiddle: http://sqlfiddle.com/#!9/d715a/1
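A deterministic variant of the same idea, as a minimal sketch: instead of selecting a bare id from the grouped subquery, pick MIN(id) explicitly so the database has no choice to make. The example runs in SQLite through Python's sqlite3 module; the staff table name and its rows are hypothetical, and it groups by the two columns directly instead of by their concatenation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "staff" is a hypothetical table name used only for this sketch
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [
        (1, "Ann", "HR"),
        (2, "Ann", "HR"),   # duplicate of id 1
        (3, "Bob", "IT"),
        (4, "Ann", "IT"),
        (5, "Bob", "IT"),   # duplicate of id 3
    ],
)

unique_rows = conn.execute(
    """
    SELECT t.*
    FROM staff t
    JOIN (
        -- MIN(id) names exactly which row of each group survives
        SELECT MIN(id) AS id
        FROM staff
        GROUP BY name, department
    ) f ON t.id = f.id
    ORDER BY t.id
    """
).fetchall()
conn.close()

print(unique_rows)  # [(1, 'Ann', 'HR'), (3, 'Bob', 'IT'), (4, 'Ann', 'IT')]
```

Grouping by name, department also avoids a pitfall of CONCAT: two different pairs can concatenate to the same string.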

mysql ORDER BY MIN() not matching up with id

I have a database that has the following columns:
-------------------
id|domain|hit_count
-------------------
And I would like to perform this query on it:
SELECT id,MIN(hit_count)
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY MIN(hit_count)
I would like this query to give me the id of the row that had the smallest hit_count for $domain. The only problem is that if I have two rows that have the same domain, say www.bestbuy.com, the query will just group by whichever one came first, and then although I will get the correct lowest hit_count, the id may or may not be the id of the row that has the lowest hit_count.
Does anyone know of a way for me to perform this query and to get the id that matches up with MIN(hit_count)? Thanks!
Try this:
SELECT id,MIN(hit_count),domain FROM table GROUP BY domain HAVING domain='$domain'
See, when you're using aggregates, either via aggregate functions (and min() is such a function) or via GROUP BY or HAVING clauses, your data is being grouped. In your case it is grouped by domain. You have 2 fields in your select list, id and min(hit_count).
Now, for each group the database knows which hit_count to pick, as you've specified this explicitly via the aggregate function. But what about id: which one should be included?
MySQL (with ONLY_FULL_GROUP_BY disabled) takes the value from an arbitrary row of the group, which I find an error-prone approach. Most other RDBMSes raise an error for such a query.
The rule is: if you use aggregates, then every column should be either an argument of an aggregate function or listed in the GROUP BY clause.
To achieve the desired result, you need a subquery:
SELECT id, domain, hit_count
FROM `table`
WHERE domain = '$domain'
AND hit_count = (SELECT min(hit_count) FROM `table` WHERE domain = '$domain');
I've used backticks, as table is a reserved word in SQL.
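The subquery approach can be tried out with a short sketch like the one below, run in SQLite via Python's sqlite3 module. The hits table name and the sample rows are hypothetical, and the domain is passed as a bound parameter instead of being interpolated into the SQL string. Note that a tie on the minimum returns every tied row.

```python
import sqlite3

def ids_with_min_hits(conn, domain):
    """Return (id, hit_count) rows that hold the lowest hit_count for a domain."""
    return conn.execute(
        """
        SELECT id, hit_count
        FROM hits
        WHERE domain = ?
          AND hit_count = (SELECT MIN(hit_count) FROM hits WHERE domain = ?)
        ORDER BY id
        """,
        (domain, domain),
    ).fetchall()

conn = sqlite3.connect(":memory:")
# "hits" is a hypothetical table name, chosen to avoid the reserved word "table"
conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, domain TEXT, hit_count INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        (1, "www.bestbuy.com", 7),
        (2, "www.bestbuy.com", 3),
        (3, "www.example.com", 1),
        (4, "www.bestbuy.com", 3),
    ],
)

bestbuy = ids_with_min_hits(conn, "www.bestbuy.com")
example = ids_with_min_hits(conn, "www.example.com")
conn.close()

print(bestbuy)  # [(2, 3), (4, 3)]
print(example)  # [(3, 1)]
```

If exactly one row is needed, adding LIMIT 1 after the ORDER BY keeps the lowest id among the ties.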
SELECT
id,
hit_count
FROM
`table`
WHERE
domain='$domain'
AND hit_count = (SELECT MIN(hit_count) FROM `table` WHERE domain='$domain')
Try this:
SELECT id,hit_count
FROM table WHERE domain='$domain'
GROUP BY domain ORDER BY hit_count ASC;
This should also work:
select id, MIN(hit_count) from table where domain="$domain";
I had the same question. Please see that question below.
min(column) is not returning me correct data of other columns
You are using a GROUP BY, which means each row in the result represents a group of values.
One of those values is the group name (the value of the field you grouped by). The rest are arbitrary values from within that group.
For example the following table:
F1 | F2
1 aa
1 bb
1 cc
2 gg
2 hh
If you group by F1: SELECT F1,F2 from T GROUP BY F1
You will get two rows:
1 and one value from (aa,bb,cc)
2 and one value from (gg,hh)
If you want a deterministic result set, you need to tell the database which aggregate function to apply to the group. Several examples:
MIN
MAX
COUNT
SUM
etc etc
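For the F1/F2 table above, a small sketch of what applying explicit aggregates looks like, run in SQLite via Python's sqlite3 module. Every selected column is either the grouping column or an aggregate, so the result no longer depends on which row the engine happens to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (F1 INTEGER, F2 TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?)",
    [(1, "aa"), (1, "bb"), (1, "cc"), (2, "gg"), (2, "hh")],
)

# One row per F1 value; F2 is only ever read through an aggregate
summary = conn.execute(
    "SELECT F1, MIN(F2), MAX(F2), COUNT(*) FROM T GROUP BY F1 ORDER BY F1"
).fetchall()
conn.close()

print(summary)  # [(1, 'aa', 'cc', 3), (2, 'gg', 'hh', 2)]
```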
The simplest way: your query is OK; just add the DESC keyword after GROUP BY domain.
SELECT
id,
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
Explanation:
When you use GROUP BY with an aggregate function, MySQL appears to pick the first record of each group; with the DESC keyword it appears to pick the last record of the group instead. Neither behavior is guaranteed.
For testing purposes, use this query, which only adds group_concat.
SELECT
group_concat(id),
MIN(hit_count)
FROM table
WHERE domain = '$domain'
GROUP BY domain DESC
ORDER BY MIN(hit_count)
If you can have duplicated domains, group by id:
SELECT id,MIN(hit_count)
FROM `table` WHERE domain='$domain'
GROUP BY id ORDER BY MIN(hit_count)

Will grouping an ordered table always return the first row? MYSQL

I'm writing a query where I group a selection of rows to find the MIN value for one of the columns.
I'd also like to return the other column values associated with the MIN row returned.
e.g
ID QTY PRODUCT TYPE
--------------------
1 2 Orange Fruit
2 4 Banana Fruit
3 3 Apple Fruit
If I GROUP this table by the column 'TYPE' and select the MIN qty, it won't return the corresponding product for the MIN row which in the case above is 'Apple'.
Adding an ORDER BY clause before grouping seems to solve the problem. However, before I go ahead and include this query in my application, I'd just like to know whether this method will always return the correct value. Is this the correct approach? I've seen some examples where subqueries are used; however, I have also read that this is inefficient.
Thanks in advance.
No, this is not the correct approach.
I believe you are talking about a query like this:
SELECT product.*, MIN(qty)
FROM product
GROUP BY
type
ORDER BY
qty
What you are doing here is using MySQL's extension that allows you to select unaggregated/ungrouped columns in a GROUP BY query.
This is mostly used in the queries containing both a JOIN and a GROUP BY on a PRIMARY KEY, like this:
SELECT `order`.id, `order`.customer, SUM(price)
FROM `order`
JOIN orderline
ON orderline.order_id = `order`.id
GROUP BY
`order`.id
Here, order.customer is neither grouped nor aggregated, but since you are grouping on order.id, it is guaranteed to have the same value within each group.
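To illustrate why this case is safe, the sketch below compares the query that leaves customer ungrouped with the portable spelling that groups by both columns. It runs in SQLite via Python's sqlite3 module; the table is named orders here (a hypothetical rename to sidestep the reserved word), and the rows and integer prices are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- "orders" is a hypothetical name used to avoid the reserved word ORDER
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE orderline (order_id INTEGER, price INTEGER);
    INSERT INTO orders VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orderline VALUES (1, 500), (1, 750), (2, 300);
    """
)

# customer is not grouped, but it is fixed by the primary key orders.id
grouped_by_key = conn.execute(
    """
    SELECT orders.id, orders.customer, SUM(orderline.price)
    FROM orders
    JOIN orderline ON orderline.order_id = orders.id
    GROUP BY orders.id
    ORDER BY orders.id
    """
).fetchall()

# Portable spelling: list every non-aggregated column in GROUP BY
grouped_by_all = conn.execute(
    """
    SELECT orders.id, orders.customer, SUM(orderline.price)
    FROM orders
    JOIN orderline ON orderline.order_id = orders.id
    GROUP BY orders.id, orders.customer
    ORDER BY orders.id
    """
).fetchall()
conn.close()

print(grouped_by_key)  # [(1, 'alice', 1250), (2, 'bob', 300)]
```

Both queries return the same rows because every row in a group shares one customer value; that guarantee is exactly what is missing when the ungrouped column varies within a group.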
In your case, the rows within a group have different values in the ungrouped columns.
It is not guaranteed which record within the group the engine will take those values from.
You should do this:
SELECT p.*
FROM (
SELECT DISTINCT type
FROM product p
) pd
JOIN product p
ON p.id =
(
SELECT pi.id
FROM product pi
WHERE pi.type = pd.type
ORDER BY
type, qty, id
LIMIT 1
)
If you create an index on product (type, qty, id), this query will work fast.
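A runnable sketch of this pattern, using SQLite through Python's sqlite3 module as a stand-in for MySQL. The Fruit rows come from the question; the Vegetable rows and the index name are hypothetical additions so that more than one group appears. The join target is written as product p.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, qty INTEGER, product TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?, ?)",
    [
        (1, 2, "Orange", "Fruit"),
        (2, 4, "Banana", "Fruit"),
        (3, 3, "Apple", "Fruit"),
        (4, 1, "Carrot", "Vegetable"),  # hypothetical extra rows
        (5, 5, "Leek", "Vegetable"),
    ],
)
# Index suggested in the answer; the name is arbitrary
conn.execute("CREATE INDEX ix_product_type_qty_id ON product (type, qty, id)")

min_rows = conn.execute(
    """
    SELECT p.*
    FROM (SELECT DISTINCT type FROM product) pd
    JOIN product p
      ON p.id = (
          -- id of the row with the smallest qty for this type
          SELECT pi.id
          FROM product pi
          WHERE pi.type = pd.type
          ORDER BY pi.type, pi.qty, pi.id
          LIMIT 1
      )
    ORDER BY p.type
    """
).fetchall()
conn.close()

print(min_rows)  # [(1, 2, 'Orange', 'Fruit'), (4, 1, 'Carrot', 'Vegetable')]
```

Including id as the last ORDER BY key breaks ties on qty, so the same row is returned on every run.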
It's difficult to follow you properly without an example of the query you tried.
From your comments, I guess your query is something like:
SELECT ID, COUNT(*) AS QTY, PRODUCT_TYPE
FROM PRODUCTS
GROUP BY PRODUCT_TYPE
ORDER BY COUNT(*) DESC;
My advice: group by the concept (in this case PRODUCT_TYPE) and order by the number of times it appears, COUNT(*). The query above would do what you want.
Subqueries are mostly for sorting or discarding rows you are not interested in.
The MIN you are looking for is not exactly a MIN; it is a number of occurrences, and you want to see first the one with the fewest occurrences (meaning it appears the fewest times, I guess).