How to GROUP BY on a GROUP_CONCAT(DISTINCT xxx) as yyyy

How to GROUP BY on a GROUP_CONCAT(DISTINCT xxx) as yyyy - mysql

I am trying to GROUP BY a MYSQL request on a GROUP_CONCAT. The trio of values that is generated by this GROUP_CONCAT is the only unique identifier that I have to describe the group I want to apply the GROUP BY.
When I do the following :
SELECT [...] GROUP_CONCAT(DISTINCT xxx) as supsku
[...]
GROUP BY supsku
it says :
Can't group on 'supsku'
Thanks a lot

One way to go try with a subselect
SELECT t.* FROM (
SELECT [...] GROUP_CONCAT(DISTINCT xxx) as supsku
[...]
) t
GROUP BY supsku

You can't group by a column whose contents don't exist until after the groups are formed. That's a chicken-and-egg problem.
By analogy, suppose I ask you to scratch off some lottery tickets, but scratch them only if the total value of the winning tickets is more than $100? Obviously, you can't know what the winning values are before you scratch the lottery tickets, so you can't know if you should scratch them or not.
The answer from #MKhalidJunaid shows part of the solution -- using a subquery to produce a partial result with the strings formed into groups. Then embed that as a derived table subquery to be further processed by an outer query with a GROUP BY.
But the problem with that solution is that we don't know how to group the strings in the inner subquery. Without a valid GROUP BY in the subquery, the default is to treat the whole table as one group, and therefore GROUP_CONCAT will return one row with one string.
So you need to think about defining your problem better. There must be some other grouping criterion you have in mind.

Related

Simplify select with count

I need to get some values from a database and count all rows.
I wrote this code:
SELECT author, alias, (select COUNT(*) from registry WHERE status='OK' AND type='H1') AS count
FROM registry
WHERE status='OK' AND type='H1'
It works, but, how can I simplify this code? Both WERE condition thereof.

If the query is returning the resultset you need, with the "total" count of rows (independent of author and alias), with the same exact value for "count" repeated on each row, we could rewrite the query like this:
SELECT t.author
, t.alias
, s.count
FROM registry t
CROSS
JOIN ( SELECT COUNT(*) AS `count`
FROM registry c
WHERE c.status='OK'
AND c.type='H1'
) s
WHERE t.status='OK'
AND t.type='H1'
I don't know if that's any simpler, but to someone reading the statement, I think it makes it more clear what resultset is being returned.
(I also tend to favor avoiding any subquery in the SELECT list, unless there is a specific reason to add one.)
The resultset from this query is a bit odd. But absent any example data, expected output or any specification other than the original query, we're just guessing. The query in my answer replicates the results from the original query, in a way that's more clear.

try this:
SELECT author, alias, count(1) AS caunt
FROM registry
WHERE status='OK' AND type='H1'
group by author, alias

The status and type are the criteria to extract the fields from the table. You cannot go more simpler than this.

Try using a group by,
SELECT author, alias, COUNT(*) AS COUNT
FROM registry
WHERE STATUS='OK' AND TYPE='H1'
GROUP BY author, alias
Depending on how your table is set up, you may actually need to do this to get the data you are looking for.
Some good reading for this: https://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html
However, if you are wanting the count to be simply ALL the records with that criteria, what you have may be simplest already.

Why does MySQL allow you to group by columns that are not selected

I'm reading a book on SQL (Sams Teach Yourself SQL in 10 Minutes) and its quite good despite its title. However the chapter on group by confuses me
"Grouping data is a simple process. The selected columns (the column list following
the SELECT keyword in a query) are the columns that can be referenced in the GROUP
BY clause. If a column is not found in the SELECT statement, it cannot be used in the
GROUP BY clause. This is logical if you think about it—how can you group data on a
report if the data is not displayed? "
How come when I ran this statement in MySQL it works?
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;

You're right, MySQL does allow you to create queries that are ambiguous and have arbitrary results. MySQL trusts you to know what you're doing, so it's your responsibility to avoid queries like that.
You can make MySQL enforce GROUP BY in a more standard way:
mysql> SET SQL_MODE=ONLY_FULL_GROUP_BY;
mysql> select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
ERROR 1055 (42000): 'test.EMPLOYEE_PAY_TBL.EMP_ID' isn't in GROUP BY

Because the book is wrong.
The columns in the group by have only one relationship to the columns in the select according to the ANSI standard. If a column is in the select, with no aggregation function, then it (or the expression it is in) needs to be in the group by statement. MySQL actually relaxes this condition.
This is even useful. For instance, if you want to select rows with the highest id for each group from a table, one way to write the query is:
select t.*
from table t
where t.id in (select max(id)
from table t
group by thegroup
);
(Note: There are other ways to write such a query, this is just an example.)
EDIT:
The query that you are suggesting:
select EMP_ID, SALARY
from EMPLOYEE_PAY_TBL
group by BONUS;
would work in MySQL but probably not in any other database (unless BONUS happens to be a poorly named primary key on the table, but that is another matter). It will produce one row for each value of BONUS. For each row, it will get an arbitrary EMP_ID and SALARY from rows in that group. The documentation actually says "indeterminate", but I think arbitrary is easier to understand.
What you should really know about this type of query is simply not to use it. All the "bare" columns in the SELECT (that is, with no aggregation functions) should be in the GROUP BY. This is required in most databases. Note that this is the inverse of what the book says. There is no problem doing:
select EMP_ID
from EMPLOYEE_PAY_TBL
group by EMP_ID, BONUS;
Except that you might get multiple rows back for the same EMP_ID with no way to distinguish among them.

Scope of COUNT(DISTINCT ..) when used with GROUP BY

I'm doing something like follows (Example, getting distinct people named "Mark" by State):
Select count(distinct FirstName) FROM table
GROUP BY State
I think the group by query organization is done first, such that the distinct is only relative to each "group by"? Basically, can "Mark" show up as a "distinct" count in each group? This would "scope" my distinct expression to the group by rows only, I believe...

This may actually depend on where DISTINCT is used. For example, SELECT DISTINCT COUNT( would be different than SELECT COUNT(DISTINCT.
In this case, it will work as you want and get a count of distinct names in each group (even if the names are not distinct across groups).

Your understanding is correct. Group by says, essentially, to take a group of rows and aggregate them into one row (based on the criteria). All aggregation functions -- including count(distinct) -- summarize values in this group.
As a note, you are using the word "scope". Just so you know, this has a particular meaning in SQL. The meaning refers to the portions of the query where a column or table alias are understood by the compiler.

using count and suppress/ignore group by

Is it possible to have count in the select clause with a group by which is suppressed in the count? I need the count to ignore the group by clause
I got this query which is counting the total entries. The query is generic generated and therefore I can't make any comprehensive changes like subqueries etc.
In some specific cases a group by is needed to retrieve the correct rows and because of this the group by can't be removed
SELECT count(dv.id) num
FROM `data_voucher` dv
LEFT JOIN `data_voucher_enclosure` de ON de.data_voucher_id=dv.id
WHERE IF(de.id IS NULL,0,1)=0
GROUP BY dv.id

Is it possible to have count in the select clause with a group by which is suppressed in the count? I need the count to ignore the group by clause
well, the answer to your question is simply you can't have an aggregate that works on all the results, while having a group by statement. That's the whole purpose of the group by to create groups that change the behaviour of aggregates:
The GROUP BY clause causes aggregations to occur in groups (naturally) for the columns you name.
cf this blog post which is only the first result I found on google on this topic.
You'd need to redesign your query, the easiest way being to create a subquery, or a hell of a jointure. But without the schema and a little context on what you want this query to do, I can't give you an alternative that works.
I just can tell you that you're trying to use a hammer to tighten a screw...

Have found an alternative where COUNT DISTINCT is used
SELECT count(distinct dv.id) num
FROM `data_voucher` dv
LEFT JOIN `data_voucher_enclosure` de ON de.data_voucher_id=dv.id
WHERE IF(de.id IS NULL,0,1)=0

MySQL Joins, Group By, and Ordering the Group By Choice

Is it possible to order the GROUP BY chosen results of a MySQL query w/out using a subquery? I'm finding that, with my large dataset, the subquery adds a significant amount of load time to my query.
Here is a similar situation: how to sort order of LEFT JOIN in SQL query?
This is my code that works, but it takes way too long to load:
SELECT tags.contact_id, n.last
FROM tags
LEFT JOIN ( SELECT * FROM names ORDER BY timestamp DESC ) n
ON (n.contact_id=tags.contact_id)
WHERE tags.tag='$tag'
GROUP BY tags.contact_id
ORDER BY n.last ASC;
I can get a fast result doing a simple join w/ a table name, but the "group by" command gives me the first row of the joined table, not the last row.

I'm not really sure what you're trying to do. Here are some of the problems with your query:
selecting n.last, although it is neither in the group by clause, nor an aggregate value. Although MySQL allows this, it's really not a good idea to take advantage of.
needlessly sorting a table before joining, instead of just joining
the subquery isn't really doing anything
I would suggest carefully writing down the desired query results, i.e. "I want the contact id and latest date for each tag" or something similar. It's possible that will lead to a natural, easy-to-write and semantically correct query that is also more efficient than what you showed in the OP.
To answer the question "is it possible to order a GROUP BY query": yes, it's quite easy, and here's an example:
select a, b, sum(c) as `c sum`
from <table_name>
group by a,b
order by `c sum`

You are doing a LEFT JOIN on contact ID which implies you want all tag contacts REGARDLESS of finding a match in the names table. Is that really the case, or will the tags table ALWAYS have a "Names" contact ID record. Additionally, your column "n.Last". Is this the person's last name, or last time something done (which I would assume is actually the timestamp)...
So, that being said, I would just do a simple direct join
SELECT DISTINCT
t.contact_id,
n.last
FROM
tags t
JOIN names n
ON t.contact_id = n.contact_id
WHERE
t.tag = '$tag'
ORDER BY
n.last ASC

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008