`Count Distinct` and `Group By` produce weird results


Please consider the following query:
SELECT artist.id, COUNT(DISTINCT artist$styles.v_id)
FROM artist
LEFT JOIN artist$styles ON artist$styles.p_id = artist.id
This is the result I get:
id count
1 4
The questions are:
1. How come it's only selecting one row from the artist table, when there are 4 rows in it and there are no WHERE, HAVING, LIMIT or GROUP BY clauses applied to the query?
2. There are only three records in artist$styles having p_id of value 1, why is it counting 4?
3. Why if I add a GROUP BY clause to it I get the correct results?
SELECT artist.id, COUNT(DISTINCT artist$styles.v_id)
FROM artist
LEFT JOIN artist$styles ON artist$styles.p_id = artist.id
GROUP BY artist.id
----
id count
1 3
2 1
3 3
4 1
This all just doesn't make sense to me. Could this be a bug of MySQL? I'm running Community 5.5.25a

As stated in the manual page on aggregate functions (of which COUNT() is one):
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
As stated in the manual page on GROUP BY with hidden columns:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
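In other words, the whole result set is treated as a single group, so only one row is returned. For that row the server has chosen one (indeterminate) value for the column artist.id, which happens in this case to be the value 1, whilst it has properly aggregated the COUNT() function over all rows: 4 is the number of distinct v_id values in the entire join, not the number belonging to artist 1.
With the GROUP BY clause you get the correct results because you are then grouping on the correct column, rather than on all rows.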
It's not a bug; this behaviour is documented and by design.
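The difference between the two queries can be reproduced with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption: SQLite also tolerates a bare column next to an aggregate, but it is not the asker's server. The table name artist_styles (in place of artist$styles) and the sample rows are made up so that the per-artist counts match the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY);
    -- artist_styles is a stand-in name for the question's artist$styles table
    CREATE TABLE artist_styles (p_id INTEGER, v_id INTEGER);
    INSERT INTO artist (id) VALUES (1), (2), (3), (4);
    INSERT INTO artist_styles (p_id, v_id) VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1),
        (3, 2), (3, 3), (3, 4),
        (4, 4);
""")

join = """
    FROM artist
    LEFT JOIN artist_styles ON artist_styles.p_id = artist.id
"""

# No GROUP BY: the whole join is one group, so exactly one row comes back.
# The id shown in that row is whichever value the engine happens to pick.
ungrouped = conn.execute(
    "SELECT artist.id, COUNT(DISTINCT artist_styles.v_id)" + join
).fetchall()

# GROUP BY artist.id: one group, and therefore one count, per artist.
grouped = conn.execute(
    "SELECT artist.id, COUNT(DISTINCT artist_styles.v_id)" + join
    + " GROUP BY artist.id ORDER BY artist.id"
).fetchall()

print(len(ungrouped), ungrouped[0][1])  # one row, count over all rows
print(grouped)                          # one row per artist
```

The single ungrouped row carries the count of distinct v_id values across every artist, which is why it does not match the three styles of artist 1.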

It is arguably a bug in MySQL. All non-aggregate columns should be included in the GROUP BY clause. MySQL does not enforce this (unless the ONLY_FULL_GROUP_BY SQL mode is enabled), and the result is unpredictable and hard to debug. As a rule, always include all non-aggregate columns in the GROUP BY clause. This is how most other RDBMSs work.

1. The COUNT() function returns a single-row result if you are not using a GROUP BY clause, and that's why the query returns one row.
2. In your output
id count
1 4
4 is the number of distinct v_id values across all joined rows, not the result for id 1. It is displayed next to 1 only because a single row is produced and some id has to be shown in it.
3. When you use GROUP BY, a group is created for each value of that column, and that's why you get that output.
And finally, it's not a bug. MySQL documents this behaviour; you can read about it on the MySQL site.

Related

Mysql DISTINCT with more than one column (remove duplicates)

My table is called training_session.
I am trying to print out some information from my data, but I do not want to have any duplicates. I can't get it to work; can someone tell me what I am doing wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if I use only one column, but when I add another it does not work.

I think you misunderstood the use of DISTINCT.
There is a big difference between using DISTINCT and GROUP BY.
Both remove repetition, but they serve different purposes.
You use DISTINCT if you want to show a set of columns without ever repeating the same combination of values. That means you don't care about calculations or aggregates. DISTINCT will return different results as you keep adding more columns to your SELECT, because it compares entire rows.
You use GROUP BY if you want one row per distinct value of certain selected columns and you use group functions to calculate data related to each of them. In other words, use GROUP BY when you want to use group functions.
The available group functions are listed at this link:
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" of a certain athlete, I'll assume the current scenario if there is no ID.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This construct is a correlated scalar subquery in the select list. With the help of LIMIT 1, it returns only the topmost record. A scalar subquery must return at most one row, otherwise MySQL raises an error; if it returns no row, the value is NULL.
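A runnable sketch of the same idea, using Python's sqlite3 module rather than MySQL (an assumption). The answer leaves the sort column as a placeholder, so the column session_date and the sample rows below are hypothetical and exist only to make the example executable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- session_date is a hypothetical column standing in for the
    -- unspecified "latest column to sort" in the answer above
    CREATE TABLE training_session (
        athlete_id INTEGER, duration INTEGER, session_date TEXT);
    INSERT INTO training_session VALUES
        (1, 30, '2020-01-01'),
        (1, 45, '2020-01-05'),
        (2, 60, '2020-01-03'),
        (2, 20, '2020-01-02');
""")

# For each athlete, the subquery picks the duration of the newest session.
last_durations = conn.execute("""
    SELECT a.athlete_id,
           (SELECT b.duration
              FROM training_session AS b
             WHERE b.athlete_id = a.athlete_id
             ORDER BY b.session_date DESC
             LIMIT 1) AS last_duration
      FROM training_session AS a
     GROUP BY a.athlete_id
     ORDER BY a.athlete_id
""").fetchall()

print(last_durations)
```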

MySQL's DISTINCT clause is used to filter out duplicate rows from the result set.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration), DISTINCT compares the combination of both columns. Each athlete_id/duration pair in your result is unique, hence the results you're getting. In other words, the query is working correctly.
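To make the row-level behaviour of DISTINCT concrete, here is a minimal sketch with invented sample data, run through Python's sqlite3 module as a stand-in for MySQL (an assumption). It also contrasts DISTINCT with a GROUP BY that collapses each athlete to one row; MAX(duration) is just one possible choice of aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE training_session (athlete_id INTEGER, duration INTEGER);
    -- invented sample rows
    INSERT INTO training_session VALUES
        (1, 30), (1, 30), (1, 45), (2, 60), (2, 60);
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# DISTINCT on one column: one row per athlete.
one_column = rows(
    "SELECT DISTINCT athlete_id FROM training_session ORDER BY athlete_id")

# DISTINCT on two columns: one row per distinct (athlete_id, duration) pair.
two_columns = rows(
    "SELECT DISTINCT athlete_id, duration FROM training_session "
    "ORDER BY athlete_id, duration")

# GROUP BY plus an aggregate: one row per athlete again.
per_athlete = rows(
    "SELECT athlete_id, MAX(duration) FROM training_session "
    "GROUP BY athlete_id ORDER BY athlete_id")

print(one_column)
print(two_columns)
print(per_athlete)
```

Athlete 1 appears twice in the two-column result because (1, 30) and (1, 45) are different rows; only the exact duplicate (1, 30) is removed.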

Order by Date not working as expected in MySql

I have a mysql query
select count(*) as TotalCount,
pd.Product_Modified_Date,
psc.Product_Subcategory_Name,
pd.Product_Image_URL
from product_subcategory psc
inner join product_details pd on psc.Product_Subcategory_ID = pd.Product_Subcategory_Reference_ID
where pd.Product_Status = 0 and
psc.Product_Subcategory_Status = 0
group by psc.Product_Subcategory_Name
order by pd.Product_Modified_Date desc
My product_details table has new image URLs, but I cannot get them with the above query.
How can I do it?

You are grouping by one column, Product_Subcategory_Name, but you have other columns Product_Image_URL and Product_Modified_Date in your select-list.
If you have cases where the group has multiple rows (which you do, since the count is 14 or more in each group), MySQL can only present one value for the Product_Image_URL. So it picks some row in the group, and uses the value in that row. The URL value for all other rows in the group is ignored.
To fix this, you must group by all columns in your select-list that are not part of an aggregate function. Any column you don't want to use to form a new group must go into an aggregate function.
Roland Bouman wrote an excellent blog detailing how to use GROUP BY properly: http://rpbouman.blogspot.com/2007/05/debunking-group-by-myths.html

Combining GROUP BY and ORDER BY is problematic and your problem is most likely covered in another question on Stack Exchange : MySQL wrong results with GROUP BY and ORDER BY

MySQL GROUP BY returns only first row

I have a table named forms with the following structure-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
SomeGroup | SomeForm2 | SomePath2
------------------------------------
I use the following query-
SELECT * FROM forms GROUP BY 'GROUP'
It returns only the first row-
GROUP | FORM | FILEPATH
====================================
SomeGroup | SomeForm1 | SomePath1
------------------------------------
Shouldn't it return both (or all of it)? Or am I (possibly) wrong?

As the manual states:
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
In your case, MySQL is correctly performing the grouping operation, but (since you select all columns including those by which you are not grouping the query) gives you an indeterminate one record from each group.

It only returns one row, because the values of your GROUP column are the same ... that's basically how GROUP BY works.
Btw, when using GROUP BY it's good form to use aggregate functions for the other columns, such as COUNT(), MIN(), MAX(). In MySQL it usually returns the first row of each group if you just specify the column names; other databases will not like that though.

Your code:
SELECT * FROM forms GROUP BY 'GROUP'
isn't very "good" SQL. MySQL lets you get away with it and returns only one value for each column not mentioned in the GROUP BY clause. Almost any other database would refuse to run this query. As a rule, any column that is not part of the grouping condition must be used with an aggregate function.

As far as MySQL is concerned, I just solved my problem by trial and error.
I had the same problem 10 minutes ago. I was using a statement like this:
SELECT * FROM forms GROUP BY 'ID'; -- returns only one row
Using a statement like the following yields the same result:
SELECT ID FROM forms GROUP BY 'ID'; -- returns only one row
The following was my solution:
SELECT ID FROM forms GROUP BY ID; -- returns more than one row (with one column of field "ID") grouped by ID
or
SELECT * FROM forms GROUP BY ID; -- returns more than one row (with columns of all fields) grouped by ID
or
SELECT * FROM forms GROUP BY `ID`; -- returns more than one row (with columns of all fields) grouped by ID
Lesson: do not put single quotes around a column name. MySQL treats 'ID' as a string literal, so every row gets the same grouping value and ends up in a single group. Remove the quotes from the column name and the query groups by the column's values. You can still escape the name with backticks, e.g. `ID`.
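The effect of quoting the column name can be checked with a short sketch. It runs on Python's sqlite3 module instead of MySQL (an assumption; both treat a single-quoted token in GROUP BY as a string constant), and the forms rows are invented. COUNT(*) is selected so the queries stay valid without relying on indeterminate column values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE forms (ID INTEGER, FORM TEXT);
    -- invented sample rows: two forms with ID 1, one with ID 2
    INSERT INTO forms VALUES (1, 'A'), (1, 'B'), (2, 'C');
""")

# 'ID' is a string constant: every row has the same grouping value,
# so all three rows fall into a single group.
by_literal = conn.execute(
    "SELECT COUNT(*) FROM forms GROUP BY 'ID'").fetchall()

# ID without quotes is the column: one group per distinct ID value.
by_column = conn.execute(
    "SELECT ID, COUNT(*) FROM forms GROUP BY ID ORDER BY ID").fetchall()

print(by_literal)
print(by_column)
```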

Thank you everyone for pointing out the obvious mistake I was too blind to see. I finally replaced GROUP BY with ORDER BY and included a WHERE clause to get my desired result. That is what I was intending to use all along. Silly me.
My final query becomes this (with the reserved word GROUP escaped in backticks rather than single quotes):
SELECT * FROM forms WHERE `GROUP`='SomeGroup' ORDER BY `GROUP`

SELECT * FROM forms GROUP BY `GROUP`
it's strange that your query does work

The above result is kind of correct, but not quite.
All columns you select that are not part of the GROUP BY clause have to be aggregated by some function (see the list of aggregate functions in the MySQL documentation). Most often these functions are used with numeric columns.
Besides this, your query will return one output row for every (combination of) attributes in the columns referenced in the GROUP BY statement. In your case there is just one distinct value in the GROUP column, namely "SomeGroup", so the output will only contain one row for this value.

A GROUP BY clause should only be required if you have group functions, say MAX, MIN, AVG, SUM, etc., applied in the query expressions. Your query does not show any such functions, meaning you do not actually need a GROUP BY clause. If you still use one, you will receive only one record from each group of results.
Hence the output of your query is as expected.

Query result is perfect; it will return only one row.

How do I use distinct for a column along with a where clause in sql server 2008?

I want to get the distinct values of a particular column; however, duplicates are not removed properly once a third column is selected.
The query is:
SELECT DISTINCT
ShoppingSessionId, userid
FROM
dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId, userid
HAVING
userid = 7
This query produces correct result, but if we add another column then result is wrong.
Please help me: I want ShoppingSessionId to stay distinct even when I select all the columns from the table, and I also need the where clause.
How can I do that?

The DISTINCT keyword applies to the entire row, never to a column.
Presently DISTINCT is not needed at all, because your script already makes sure that ShoppingSessionId is distinct: by specifying the column in GROUP BY and filtering on the other grouping column (userid).
When you add a third column to GROUP BY and it results in duplicated ShoppingSessionId values, it means that some ShoppingSessionId values are associated with several different values of the added column.
If you want ShoppingSessionId to remain distinct after including that third column, you should decide which values of the added column should be left in the output and which should be discarded. This is called aggregating. You could apply the MAX() function to that column, or MIN() or any other suitable aggregate function. Note that the column should not be included in GROUP BY in this case.
Here's an illustration of what I'm talking about:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
GROUP BY
ShoppingSessionId,
userid
HAVING userid = 7
There's one more note on your query. The HAVING clause is typically used for filtering on aggregated columns. If your filter does not involve aggregated columns, you'll be better off using the WHERE clause instead:
SELECT
ShoppingSessionId,
userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId,
userid
Although both queries would produce identical results, their efficiency would be different, because the first query would have to pull all rows, group/aggregate them, then discard all rows except userid = 7, but the second one would discard rows first and only then group/aggregate the remaining, which is much more efficient.
You could go even further and exclude the userid column from GROUP BY and pull its value with an aggregate function:
SELECT
ShoppingSessionId,
MAX(userid) AS userid,
MAX(YourThirdColumn) AS YourThirdColumn
FROM dbo.tbl_ShoppingCart
WHERE userid = 7
GROUP BY
ShoppingSessionId
Since all userid values in your output are supposed to contain 7 (because that's in your filter), you can just pick a maximum value per every ShoppingSession, knowing that it'll always be 7.
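As a quick check that the HAVING and WHERE variants return the same rows, here is a sketch using Python's sqlite3 module in place of SQL Server (an assumption). The column ProductId plays the role of YourThirdColumn and, like the sample rows, is hypothetical. The sketch only compares results; it says nothing about the efficiency difference described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- ProductId is a hypothetical stand-in for YourThirdColumn
    CREATE TABLE tbl_ShoppingCart (
        ShoppingSessionId INTEGER, userid INTEGER, ProductId INTEGER);
    INSERT INTO tbl_ShoppingCart VALUES
        (100, 7, 1), (100, 7, 5), (101, 7, 2), (200, 8, 9);
""")

# Filter after grouping: every session is grouped, then non-matching
# groups are discarded.
with_having = conn.execute("""
    SELECT ShoppingSessionId, userid, MAX(ProductId) AS ProductId
      FROM tbl_ShoppingCart
     GROUP BY ShoppingSessionId, userid
    HAVING userid = 7
     ORDER BY ShoppingSessionId
""").fetchall()

# Filter before grouping: only rows of user 7 are grouped at all.
with_where = conn.execute("""
    SELECT ShoppingSessionId, userid, MAX(ProductId) AS ProductId
      FROM tbl_ShoppingCart
     WHERE userid = 7
     GROUP BY ShoppingSessionId, userid
     ORDER BY ShoppingSessionId
""").fetchall()

print(with_having)
print(with_where)
```

Each ShoppingSessionId appears once in both results, with the third column reduced to a single value by MAX().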

Why ORDER BY + MAX() return the maximum value when I grouping?

I have this part of a query :
GROUP BY trackid
ORDER BY MAX(history.date) DESC
And I see that, when I group, it returns the row with the maximum date for each group. Why this behaviour? ORDER BY should work on whole rows, not on the grouping?!
EDIT (Example)
This is my whole query :
SELECT tracklist.value, history.date
FROM history JOIN tracklist ON history.trackid=tracklist.trackid
ORDER BY history.date DESC
The result is :
tracklist3 2011-04-27 15:40:36
tracklist2 2011-04-27 13:15:43
tracklist2 2011-04-02 00:30:02
tracklist2 2011-04-01 14:20:12
tracklist1 2011-03-02 14:13:58
tracklist1 2011-03-01 12:11:50
As you can see, all lines are correctly ordered by history.date.
Now I'd like to group them, keeping for each group the row with the MAX history.date.
So the output should be :
tracklist3 2011-04-27 15:40:36
tracklist2 2011-04-27 13:15:43
tracklist1 2011-03-02 14:13:58
I see that :
GROUP BY trackid
ORDER BY MAX(history.date) DESC
works correctly, but I really don't understand why it works :) ORDER BY is for whole rows, not for the grouping...

When you say SELECT trackid, MAX(history.date) ... GROUP BY trackid ORDER BY MAX(history.date) DESC you are really saying: "Show me for each tracklist the most recent history entry and please show me the tracklist first whose most recent history entry is (overall) most recent."
The ORDER BY is applied after the grouping, (that's why it comes after the GROUP BY in the SQL).
Note that in your example, you have SELECT tracklist.value, history.date instead of SELECT tracklist.value, MAX(history.date). That is just wrong, and unfortunately MySQL does not give a warning; it chooses an arbitrary history.date from each group at its discretion.

ORDER BY MAX(history.date) DESC is somewhat redundant if all you want to do is order the result set. Ordering applies to the result set.
Consider the results if you remove the ORDER BY clause. Without that, your query would only be grouping on the trackid column, so it wouldn't be valid--you would need to add an aggregate function to the date column or add the date column to the GROUP BY clause. By adding the aggregate function to the ORDER BY clause, you are essentially telling the SQL engine that for each group of trackid, get the maximum date. This seems to be what you want.

It seems you do not fully understand the GROUP BY statement. I would recommend looking up a tutorial on it.
But essentially, the GROUP BY statement combines a number of rows into one. The column names you GROUP BY determine how unique the new combined rows will be. SQL doesn't know how to handle all of the non-grouped columns, since each new combined row pulls data from a number of rows that contain different values in these columns. That's why you use aggregate functions on the non-grouped columns in the SELECT statement. The aggregate function MAX() looks at all of the values in the history.date column for the rows that are being combined and returns only one of them. Additionally, ORDER BY is applied after the grouping, so it can only refer to grouping columns or aggregate expressions; that's why the non-grouped column in ORDER BY must also be wrapped in an aggregate function.
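The question's data can be used to see grouping and ordering work together. The sketch below runs on Python's sqlite3 module rather than MySQL (an assumption), and the trackid values 1 to 3 are invented, since the question only shows tracklist.value and history.date. Dates are stored as ISO-formatted text, so MAX() and ORDER BY compare them correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tracklist (trackid INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE history (trackid INTEGER, date TEXT);
    -- trackid numbers are invented; values and dates come from the question
    INSERT INTO tracklist VALUES
        (1, 'tracklist1'), (2, 'tracklist2'), (3, 'tracklist3');
    INSERT INTO history VALUES
        (3, '2011-04-27 15:40:36'),
        (2, '2011-04-27 13:15:43'),
        (2, '2011-04-02 00:30:02'),
        (2, '2011-04-01 14:20:12'),
        (1, '2011-03-02 14:13:58'),
        (1, '2011-03-01 12:11:50');
""")

# Grouping happens first: one row per track, carrying its newest date.
# ORDER BY then sorts those grouped rows by that newest date.
latest_per_track = conn.execute("""
    SELECT tracklist.value, MAX(history.date) AS last_date
      FROM history
      JOIN tracklist ON history.trackid = tracklist.trackid
     GROUP BY tracklist.trackid, tracklist.value
     ORDER BY MAX(history.date) DESC
""").fetchall()

for value, last_date in latest_per_track:
    print(value, last_date)
```

Selecting MAX(history.date) explicitly, instead of the bare history.date, is what guarantees that the date shown is the newest one in each group.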
