Scope of COUNT(DISTINCT ..) when used with GROUP BY - mysql

I'm doing something like the following (for example, getting distinct people named "Mark" by State):
Select count(distinct FirstName) FROM table
GROUP BY State
I think the group by query organization is done first, such that the distinct is only relative to each "group by"? Basically, can "Mark" show up as a "distinct" count in each group? This would "scope" my distinct expression to the group by rows only, I believe...

This may actually depend on where DISTINCT is used. For example, SELECT DISTINCT COUNT( would be different than SELECT COUNT(DISTINCT.
In this case, it will work as you want and get a count of distinct names in each group (even if the names are not distinct across groups).

Your understanding is correct. Group by says, essentially, to take a group of rows and aggregate them into one row (based on the criteria). All aggregation functions -- including count(distinct) -- summarize values in this group.
As a note, you are using the word "scope". Just so you know, this has a particular meaning in SQL: it refers to the portions of the query where a column or table alias is understood by the compiler.
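To see the per-group behaviour concretely, here is a small sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name people and its rows are made up for illustration; COUNT(DISTINCT ...) with GROUP BY should behave the same way in MySQL.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (FirstName TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Mark", "NY"), ("Mark", "NY"), ("Anna", "NY"),
        ("Mark", "CA"), ("Mark", "CA"),
    ],
)

# DISTINCT is evaluated inside each State group.
per_state = conn.execute(
    "SELECT State, COUNT(DISTINCT FirstName) FROM people "
    "GROUP BY State ORDER BY State"
).fetchall()

# Without GROUP BY the whole table is a single group.
overall = conn.execute(
    "SELECT COUNT(DISTINCT FirstName) FROM people"
).fetchone()[0]

print(per_state)  # [('CA', 1), ('NY', 2)]
print(overall)    # 2
```

"Mark" is counted once in CA and once in NY, so the per-state counts add up to 3 even though the table holds only 2 distinct first names.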

Related

How to find items with multiple different names

I'm having trouble with a very simple sql query. I want to identify all items that have more than one name. Here's what I'm currently doing:
select group_concat(distinct name) names
from table
group by master_id
having names like '%,%'
Unfortunately, a lot of names have a , in it, so the above doesn't work well. What would be the correct way to do this query?
Here is a correct version of your query:
SELECT
master_id,
GROUP_CONCAT(DISTINCT name) names
FROM yourTable
GROUP BY master_id
HAVING COUNT(DISTINCT name) > 1;
The reason we need to count distinct in the HAVING clause is that a logical item in the aggregated string is a distinct name.
The correct solution would be:
… HAVING COUNT(name) > 1
In a query using GROUP BY, aggregate functions such as COUNT(), MIN(), MAX(), and GROUP_CONCAT() operate on all values of a column within each group of rows. Note that COUNT(name) counts every non-NULL name, duplicates included, so use COUNT(DISTINCT name) if the same name can appear more than once for a master_id.
You could also include COUNT(name) in the select list to return the number of names for each master_id.
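A quick way to compare the three HAVING conditions discussed above is to run them on a tiny data set. This is only a sketch: it uses Python's sqlite3 module in place of MySQL, and the table name items, the helper ids_having, and the rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (master_id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        (1, "Smith, John"),             # one name that contains a comma
        (2, "Widget"), (2, "Widget"),   # the same name stored twice
        (3, "Bolt"), (3, "Screw"),      # two different names
    ],
)

def ids_having(condition):
    """Return the master_ids whose group satisfies the HAVING condition."""
    sql = (
        "SELECT master_id FROM items GROUP BY master_id "
        f"HAVING {condition} ORDER BY master_id"
    )
    return [row[0] for row in conn.execute(sql)]

distinct_ids = ids_having("COUNT(DISTINCT name) > 1")
plain_ids = ids_having("COUNT(name) > 1")
like_ids = ids_having("GROUP_CONCAT(DISTINCT name) LIKE '%,%'")

print(distinct_ids)  # [3]
print(plain_ids)     # [2, 3]
print(like_ids)      # [1, 3]
```

Only COUNT(DISTINCT name) picks out exactly the item with two different names. The LIKE test is fooled by the comma inside a name, and COUNT(name) is fooled by the repeated name.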

Reason for error using select statement containing aggregate functions?

Could someone explain why the following query throws an error, if I am trying to get the names of all customers along with the total number of customers?
SELECT name, COUNT(*)
FROM CUSTOMER
I know that selecting columns along with an aggregate function requires a GROUP BY statement containing all the column names, but I don't understand the logical principle behind this.
edit:
http://sqlfiddle.com/#!2/90233/595
I guess 'error' isn't quite right, but notice how the current query returns Alison 9 as the only result.
I don't understand why it doesn't return:
Alison 9
Alison 9
Alison 9
Alison 9
Jason 9
...
(This is a new answer based on the comment and looking at the fiddle.)
The issue here is how MySQL handles aggregate functions, which is non-standard and different from most other databases.
Any SQL implementation lets you use an aggregate function (count() is an example) without a GROUP BY; the whole table is then treated as a single group and one row comes back. What standard SQL and most other implementations do not allow is selecting a plain column next to the aggregate: every selected column has to be either listed in the GROUP BY or be the result of an aggregate function. MySQL accepts such a query when the ONLY_FULL_GROUP_BY SQL mode is disabled.
Since you don't have a GROUP BY, MySQL treats the whole table as one group, and since name is neither grouped nor aggregated, MySQL picks a value for it from some row in the group. Which row it picks is not specified, so don't rely on it being the first, the max(), or anything else. In effect the SQL that gets executed is
SELECT ANY_VALUE(name), COUNT(*)
FROM CUSTOMER
Thus you only see one name.
MySQL documentation: http://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
After reading the above I see that MySQL behaves as if ANY_VALUE() were applied to columns that are not in the GROUP BY, and that with ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5) it rejects the query instead.
If you just want the total number of customers on each row you could do this
SELECT DISTINCT NAME, COUNT(NAME) OVER () AS CustomerCount
FROM CUSTOMER
In this way you don't need the GROUP BY syntax. Under the covers it is probably doing the same thing as @GordonLinoff's answer.
I added this because maybe it makes it clearer how group by works.
Select name, Count(*) as 'CountCustomers'
FROM CUSTOMER
Group by name
Order by name
Think of it as giving an instruction of which field to aggregate by. For example, if you had a field with the State of the Customer, you could group by State which would give a count of customers by state.
Also, note you can have multiple aggregate functions in the same select using the "over (partition by" construct.
If you want the names along with the total number of customers, then use a window function:
select name, count(*) as NumCustomersWithName,
sum(count(*)) over () as NumCustomers
from customer
group by name;
Edit:
You actually seem to want:
select name, count(*) over () as NumCustomers
from customer;
In MySQL versions before 8.0, which do not support window functions, you would do this with a subquery:
select name, cnt
from customer cross join
(select count(*) as cnt from customer) x;
The reason your query doesn't work is because it is an aggregation query that returns exactly one row. When you use aggregation functions without a GROUP BY, then the query always returns exactly one row.
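The difference between the one-row aggregate and the "total on every row" result can be checked with a short sketch. It runs the queries through Python's sqlite3 module rather than MySQL, and the three sample rows are invented for the example.

```python
import sqlite3

# Hypothetical rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?)",
    [("Alison",), ("Alison",), ("Jason",)],
)

# An aggregate without GROUP BY collapses the whole table into one row.
one_row = conn.execute("SELECT COUNT(*) FROM customer").fetchall()

# Cross joining a one-row subquery attaches the total to every customer row.
every_row = conn.execute(
    "SELECT name, cnt FROM customer "
    "CROSS JOIN (SELECT COUNT(*) AS cnt FROM customer) x "
    "ORDER BY name"
).fetchall()

# GROUP BY name gives one row per name with that name's own count.
per_name = conn.execute(
    "SELECT name, COUNT(*) FROM customer GROUP BY name ORDER BY name"
).fetchall()

print(one_row)    # [(3,)]
print(every_row)  # [('Alison', 3), ('Alison', 3), ('Jason', 3)]
print(per_name)   # [('Alison', 2), ('Jason', 1)]
```

The cross join works because the subquery always yields exactly one row, so it multiplies the customer rows by one and simply adds the cnt column.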

How to GROUP BY on a GROUP_CONCAT(DISTINCT xxx) as yyyy

I am trying to GROUP BY a MySQL query on a GROUP_CONCAT. The trio of values that is generated by this GROUP_CONCAT is the only unique identifier that I have to describe the group I want to apply the GROUP BY to.
When I do the following :
SELECT [...] GROUP_CONCAT(DISTINCT xxx) as supsku
[...]
GROUP BY supsku
it says :
Can't group on 'supsku'
Thanks a lot
One way to go is to try a subselect:
SELECT t.* FROM (
SELECT [...] GROUP_CONCAT(DISTINCT xxx) as supsku
[...]
) t
GROUP BY supsku
You can't group by a column whose contents don't exist until after the groups are formed. That's a chicken-and-egg problem.
By analogy, suppose I ask you to scratch off some lottery tickets, but scratch them only if the total value of the winning tickets is more than $100? Obviously, you can't know what the winning values are before you scratch the lottery tickets, so you can't know if you should scratch them or not.
The answer from @MKhalidJunaid shows part of the solution: using a subquery to produce a partial result with the strings formed into groups, then embedding that as a derived table to be further processed by an outer query with a GROUP BY.
But the problem with that solution is that we don't know how to group the strings in the inner subquery. Without a valid GROUP BY in the subquery, the default is to treat the whole table as one group, and therefore GROUP_CONCAT will return one row with one string.
So you need to think about defining your problem better. There must be some other grouping criterion you have in mind.
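Here is a sketch of what the two-level query could look like once an inner grouping criterion exists. Everything in it is an assumption for illustration: the table order_lines, the inner grouping column order_id, and the rows are invented, and it runs on SQLite through Python's sqlite3 module instead of MySQL. The sample groups each hold a single distinct value so the concatenated string does not depend on row order; for multi-value groups in MySQL you would probably want GROUP_CONCAT(DISTINCT xxx ORDER BY xxx) so equal sets produce equal strings.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (order_id INTEGER, sku TEXT)")
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?)",
    [(1, "A"), (1, "A"), (2, "A"), (3, "B"), (3, "B")],
)

# Inner query: build one concatenated string per order_id.
# Outer query: group those strings and count how many orders share each one.
rows = conn.execute(
    "SELECT t.supsku, COUNT(*) AS orders "
    "FROM ("
    "  SELECT order_id, GROUP_CONCAT(DISTINCT sku) AS supsku "
    "  FROM order_lines GROUP BY order_id"
    ") t "
    "GROUP BY t.supsku ORDER BY t.supsku"
).fetchall()

print(rows)  # [('A', 2), ('B', 1)]
```

The inner GROUP BY order_id is what makes the derived table hold more than one row; without it the subquery would return a single string for the whole table.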

How does adding GROUP BY or DISTINCT give the same result set?

SELECT unit.id,
unit.unit_name,
unit.description,
unit.category_id,
city.name,
mealbase.name AS mealbase_name,
unit.province_id,
unit.rooms,
unit.max_people,
unit.thumblocation,
prices.normal_price,
prices.holiday_price
FROM jos_units AS unit,
jos_prices AS prices,
jos_cities AS city,
jos_meal_basis AS mealbase
WHERE prices.unit_id = unit.id
AND city.id = unit.city_id
AND unit.published = 1
AND unit.mealbasis_id = mealbase.id
When I run this query it gives me a redundant result set.
But if I use
SELECT DISTINCT unit.id instead of SELECT unit.id at the beginning, or add
GROUP BY unit.id at the end, it gives me the correct result set.
My question is: what's wrong with my query (why does the join above give redundant results even though I have joined the tables correctly)? And why do SELECT DISTINCT unit.id and GROUP BY unit.id have the same effect here, when DISTINCT and GROUP BY are different features?
I understand that SELECT DISTINCT unit.id removes the redundant rows, but how does GROUP BY do the same?
Basically you are grouping the results without using an aggregate function (a COUNT or a MAX, for example), so you get one row per group, the same rows you would obtain by selecting DISTINCT. If you don't need to aggregate, DISTINCT is the right thing to do.
join above gives redundant result even I have corrected joined them
why is it?
That's because of how your tables
jos_units,
jos_prices,
jos_cities,
jos_meal_basis
are related to each other.
It seems you have one-to-many or many-to-many relations between those tables. For instance, if a unit has several matching rows in jos_prices, the join returns that unit once per price row, so the unit's columns are repeated. The same can happen with the other tables.
Your combination in the first query, ie
(unit.id,
unit.unit_name,
unit.description,
unit.category_id,
city.name,
mealbase.name AS mealbase_name,
unit.province_id,
unit.rooms,
unit.max_people,
unit.thumblocation,
prices.normal_price,
prices.holiday_price) has duplicates, so you are getting more than one row for the same combination.
When you use the DISTINCT clause or GROUP BY, it removes the duplicates from the above combination. Hope this helps you.
GROUP BY is primarily used if you want to use aggregate or group functions. For example if you wanted to find the number of rows that match you could do
SELECT
id
, COUNT(id) num_rows
FROM
...
GROUP BY id
Because COUNT is an aggregate function, you need to group by the other columns. If you aren't using any aggregate functions, GROUP BY just collapses the rows of each group into one row, which gives the same result as DISTINCT on the grouped columns.
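The join fan-out and the two ways of collapsing it can be reproduced with a small sketch. The tables units and prices below are simplified, made-up versions of the ones in the question, and the queries run on SQLite through Python's sqlite3 module rather than MySQL.

```python
import sqlite3

# Hypothetical, simplified tables and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE units (id INTEGER, unit_name TEXT);
CREATE TABLE prices (unit_id INTEGER, normal_price INTEGER);
INSERT INTO units VALUES (1, 'Villa'), (2, 'Cabin');
INSERT INTO prices VALUES (1, 100), (1, 120), (2, 80);
""")

base = "FROM units u JOIN prices p ON p.unit_id = u.id"

# Unit 1 has two price rows, so the join returns it twice.
joined = conn.execute(f"SELECT u.id {base} ORDER BY u.id").fetchall()

# Both DISTINCT and GROUP BY collapse the repeated ids to one row each.
with_distinct = conn.execute(
    f"SELECT DISTINCT u.id {base} ORDER BY u.id"
).fetchall()
with_group_by = conn.execute(
    f"SELECT u.id {base} GROUP BY u.id ORDER BY u.id"
).fetchall()

print(joined)         # [(1,), (1,), (2,)]
print(with_distinct)  # [(1,), (2,)]
print(with_group_by)  # [(1,), (2,)]
```

The two results match here only because the select list contains nothing but the grouped column. As soon as an aggregate such as COUNT(*) is added, only the GROUP BY form applies.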

MySQL - What is the difference between GROUP BY and DISTINCT? [duplicate]

What's the difference between GROUP BY and DISTINCT in a MySQL query?
Duplicate of
Is there any difference between GROUP BY and DISTINCT
It is already discussed there.
If you still want an explanation here:
GROUP BY and DISTINCT each have their own use.
DISTINCT is used to filter unique records out of the records that satisfy the query criteria.
The GROUP BY clause is used to group the data on which the aggregate functions operate, and the output is returned per combination of the columns in the GROUP BY clause. It has its own limitation: all the columns in the select list, apart from the aggregate functions, have to be part of the GROUP BY clause.
So even though you can get the same data back from DISTINCT and from GROUP BY, it's better to use DISTINCT when you are not aggregating. See the example below:
select col1,col2,col3,col4,col5,col6,col7,col8,col9 from table group by col1,col2,col3,col4,col5,col6,col7,col8,col9
can be written as
select distinct col1,col2,col3,col4,col5,col6,col7,col8,col9 from table
It makes your life easier when you have many columns in the select list. But if you need to display sum(col10) along with the above columns, then you will have to use GROUP BY. In that case DISTINCT will not work.
eg
select col1,col2,col3,col4,col5,col6,col7,col8,col9,sum(col10) from table group by col1,col2,col3,col4,col5,col6,col7,col8,col9
Hope this helps.
DISTINCT works only on the entire row. Don't be misled into thinking SELECT DISTINCT(A), B does something different. This is equivalent to SELECT DISTINCT A, B.
On the other hand GROUP BY creates a group containing all the rows that share each distinct value in a single column (or in a number of columns, or arbitrary expressions). Using GROUP BY you can use aggregate functions such as COUNT and MAX. This is not possible with DISTINCT.
If you want to ensure that all rows in your result set are unique and you do not need to aggregate then use DISTINCT.
For anything more advanced you should use GROUP BY.
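The point about DISTINCT(A), B is easy to verify. The following sketch uses Python's sqlite3 module in place of MySQL, with an invented table t; the parentheses are parsed as an ordinary parenthesised expression, not as a function call.

```python
import sqlite3

# Hypothetical table and rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (1, 20), (1, 20), (2, 10)],
)

# The parentheses around a do not limit DISTINCT to that column.
with_parens = conn.execute(
    "SELECT DISTINCT(a), b FROM t ORDER BY a, b"
).fetchall()
without_parens = conn.execute(
    "SELECT DISTINCT a, b FROM t ORDER BY a, b"
).fetchall()

print(with_parens)     # [(1, 10), (1, 20), (2, 10)]
print(without_parens)  # [(1, 10), (1, 20), (2, 10)]
```

The value a = 1 still appears twice in both results, because uniqueness is judged on the whole (a, b) row.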
Another difference that applies only to MySQL, and only to versions before 8.0, is that GROUP BY also implies an ORDER BY unless you specify otherwise. Here's what can happen if you use DISTINCT:
SELECT DISTINCT a FROM table1
Results:
2
1
But using GROUP BY the results will come in sorted order:
SELECT a FROM table1 GROUP BY a
Results:
1
2
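Because it skips the sort, DISTINCT is faster in the cases where you can use either. Note: if you don't need the sorting with GROUP BY, you can add ORDER BY NULL to improve performance. MySQL 8.0 and later no longer sort GROUP BY results implicitly, so there you need an explicit ORDER BY if you want sorted output.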
Care to look at the docs:
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
and
http://dev.mysql.com/doc/refman/5.0/en/select.html