Why does SQL query with GROUP BY produce more rows?

I have the following table:
+------+-------+--------------------------------------+
| id | rev | content |
+------+-------+--------------------------------------+
| 1 | 1 | ... |
| 2 | 1 | ... |
| 1 | 2 | ... |
| 1 | 3 | ... |
+------+-------+--------------------------------------+
When I run the following query:
SELECT id, MAX(rev) maxrev, content
FROM YourTable
GROUP BY id;
I get:
+------+----------+--------------------------------------+
| id | maxrev | content |
+------+----------+--------------------------------------+
| 1 | 3 | ... |
| 2 | 1 | ... |
+------+----------+--------------------------------------+
But if I remove the GROUP BY clause as follows:
SELECT id, MAX(rev) maxrev, content
FROM YourTable;
I get:
+------+----------+--------------------------------------+
| id | maxrev | content |
+------+----------+--------------------------------------+
| 1 | 3 | ... |
+------+----------+--------------------------------------+
This is counter-intuitive to me because of the expectation that a GROUP BY would reduce the number of results by eliminating duplicate values. However, in the above case, introduction of the GROUP BY does the opposite. Is this because of the MAX() function, and if so, how?
PS: The table is based on the SO question here: SQL select only rows with max value on a column. I was trying to understand the answer to that question, and in the process, came across the above situation.
EDIT:
I got the above results on sqlfiddle.com using its MySQL 5.6 engine, with no customization/configuration.

MAX() is applied according to your GROUP BY clause. So your first query says: give me the maximum rev for each id, whereas the second just says: give me the maximum rev overall.
Thanks to xQbert:
This does NOT mean that you are getting the row with the max rev in the latter case. It will take values from anywhere in the selection to use for your id and content fields.
You can read more about how SQL handles the GROUP BY statement here: Documentation
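A minimal sketch of that difference, using Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the question used MySQL 5.6). The content values 'a' to 'd' are hypothetical placeholders for the elided column. Only the aggregate is selected here, so the result does not depend on how an engine treats non-aggregated columns:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],  # placeholder content
)

# No GROUP BY: the whole table is one implicit group, so exactly one row comes back.
overall = conn.execute("SELECT MAX(rev) FROM YourTable").fetchall()

# GROUP BY id: one group per distinct id, so one row per id.
per_id = conn.execute(
    "SELECT id, MAX(rev) FROM YourTable GROUP BY id ORDER BY id"
).fetchall()

print(overall)
print(per_id)
```

With two distinct ids in the table, the grouped query returns two rows and the ungrouped one returns a single row.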

This is because you are using a MySQL version earlier than 5.7. Those versions allow a query to combine aggregate functions with selected columns that are not in the GROUP BY clause, which produces unpredictable results for the non-aggregated columns. In MySQL 5.7 this behavior is not allowed by default: you get an error if the SELECT list contains a non-aggregated column that is not mentioned in GROUP BY.
The correct syntax is the first one:
SELECT id, MAX(rev) maxrev, content
FROM YourTable
GROUP BY id;
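If the goal is the content that actually belongs to the max rev of each id, one commonly used approach is to join the table back to the grouped result instead of selecting a non-aggregated column. This is a sketch under assumptions: it runs on Python's built-in sqlite3 rather than MySQL, and the content values are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],  # placeholder content
)

# The subquery finds the max rev per id; the join then picks the full row
# that carries that rev, so content is never chosen arbitrarily.
query = """
SELECT t.id, t.rev, t.content
FROM YourTable t
JOIN (SELECT id, MAX(rev) AS maxrev FROM YourTable GROUP BY id) m
  ON m.id = t.id AND m.maxrev = t.rev
ORDER BY t.id
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Every selected column here comes from a single, well-defined row, so the query should also be acceptable under ONLY_FULL_GROUP_BY.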

SELECT id, MAX(rev) maxrev, content FROM YourTable
GROUP BY id;
When you run this, as there are 2 distinct ids in the table you get two rows in the result, one per id with the max value. The grouping happens on the id column.
SELECT id, MAX(rev) maxrev, content
FROM YourTable;
If you remove the group by clause, you only get one row in the result corresponding to the max value in the entire table. There is no grouping by id.

Related

Bypassing the only_full_group_by restriction [duplicate]

This question already has answers here:
Retrieving the last record in each group - MySQL
(33 answers)
Closed 2 years ago.
I have this simple table:
mysql> select deviceId, eventName, loggedAt from foo;
+----------+------------+---------------------+
| deviceId | eventName | loggedAt |
+----------+------------+---------------------+
| 1 | foo | 2020-09-18 21:27:21 |
| 1 | bar | 2020-09-18 21:27:26 |
| 1 | last event | 2020-09-18 21:27:43 | <--
| 2 | xyz | 2020-09-18 21:27:37 |
| 2 | last event | 2020-09-18 21:27:55 | <--
| 3 | last one | 2020-09-18 21:28:04 | <--
+----------+------------+---------------------+
and I want to select one row per deviceId with the most recent loggedAt. I've marked those rows with an arrow in the table above for clarity.
If I append group by id in the above query, I get the notorious:
Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'foo.eventName' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
and I don't want to change the sql_mode.
I've come pretty close to what I want using:
select deviceId, any_value(eventName), max(loggedAt) from foo group by deviceId;
but obviously the any_value returns a random result.
How can I solve this?

ONLY_FULL_GROUP_BY is a good thing: it enforces fundamental SQL standard rules, about which MySQL has been lax for a long time. Even if you were disabling it, you would get the same result as what you are getting with any_value().
You have a top-1-per-group problem, where you want the entire row that has the most recent date for each device. Aggregation is not the right tool for that; what you need is to filter the dataset.
One option uses a correlated subquery:
select f.*
from foo f
where f.loggedat = (
select max(f1.loggedat) from foo f1 where f1.deviceid = f.deviceid
)
In MySQL 8.0, you can also use row_number():
select *
from (
select f.*, row_number() over(partition by deviceid order by loggedat desc) rn
from foo f
) f
where rn = 1
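The correlated-subquery approach can be checked against the sample data from the question. The sketch below assumes Python's built-in sqlite3 as a stand-in for MySQL and stores loggedAt as ISO-formatted text, which sorts the same way as the timestamps it represents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (deviceId INTEGER, eventName TEXT, loggedAt TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?, ?)",
    [
        (1, "foo", "2020-09-18 21:27:21"),
        (1, "bar", "2020-09-18 21:27:26"),
        (1, "last event", "2020-09-18 21:27:43"),
        (2, "xyz", "2020-09-18 21:27:37"),
        (2, "last event", "2020-09-18 21:27:55"),
        (3, "last one", "2020-09-18 21:28:04"),
    ],
)

# Keep a row only if its loggedAt equals the latest loggedAt of its own device.
query = """
SELECT f.deviceId, f.eventName, f.loggedAt
FROM foo f
WHERE f.loggedAt = (
    SELECT MAX(f1.loggedAt) FROM foo f1 WHERE f1.deviceId = f.deviceId
)
ORDER BY f.deviceId
"""
latest_events = conn.execute(query).fetchall()
for row in latest_events:
    print(row)
```

Note that if two rows of the same device share the latest timestamp, this filter returns both of them, whereas row_number() would keep only one.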

SELECT Max index for multiple column values in the same table?

I have the following database structure:
post_id | language | index
Data looks like:
1 | en | 1
1 | en | 2
1 | en | 3
1 | fr | 1
1 | fr | 2
language is a shortcode like en, fr, etc. index is an incrementing integer. There is at least one language code in the database.
I need to get the max value of index for each language.
Currently I get all languages for a post
SELECT DISTINCT(language) FROM my_table WHERE post_id = ?
and iterate then over the language array to get the Max value for each language
SELECT MAX(index) FROM my_table WHERE post_id = ? AND language = 'language_code'
Is there a way to execute just one query to achieve this result?
The result should look like:
1 | en | 3
1 | fr | 2

You can use GROUP BY to achieve this.
You can check this link for more infos about the usage of GROUP BY with MAX: SQL MAX() with group by
SELECT post_id, language, MAX(`index`) -- index is a reserved word in MySQL, so it needs backticks
FROM my_table
GROUP BY post_id, language;
The result of this query will be all the languages associated with the max index.
post_id | language | MAX(index)
1 | en | 3
1 | fr | 2
Hope this helps!
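A small runnable check of the grouped query, assuming Python's built-in sqlite3 in place of MySQL. Because index is a keyword, the column is quoted here with double quotes (SQLite's style) where MySQL would use backticks:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "index" is a keyword, so the column name is quoted.
conn.execute('CREATE TABLE my_table (post_id INTEGER, language TEXT, "index" INTEGER)')
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?)",
    [(1, "en", 1), (1, "en", 2), (1, "en", 3), (1, "fr", 1), (1, "fr", 2)],
)

# One group per (post_id, language) pair, with the highest index of each group.
query = """
SELECT post_id, language, MAX("index")
FROM my_table
GROUP BY post_id, language
ORDER BY post_id, language
"""
max_per_language = conn.execute(query).fetchall()
print(max_per_language)
```

This replaces the DISTINCT query plus one MAX query per language with a single round trip; adding WHERE post_id = ? restricts it to one post.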

The SQL GROUP BY clause groups rows according to the distinct values of the columns specified in it.
Based on that, try executing the following SELECT query (post_id is added to GROUP BY so that every selected column is either grouped or aggregated):
SELECT post_id, language, MAX(`index`)
FROM my_table GROUP BY post_id, language

SQL query to find duplicate rows and return both IDs

I have a table of customers:
id | name | email
--------------------------
1 | Rob | spam@email.com
2 | Jim | spam@email.com
3 | Dave | ham@email.com
4 | Fred | eggs@email.com
5 | Ben | ham@email.com
6 | Tom | ham@email.com
I'm trying to write an SQL query that returns all the rows with duplicate email addresses but... I'd like the query result to return the original ID and the duplicate ID. (The original ID is the first occurrence of the duplicate email.)
The desired result:
original_id | duplicate_id | email
-------------------------------------------
1 | 2 | spam@email.com
3 | 5 | ham@email.com
3 | 6 | ham@email.com
My research so far has indicated it might involve some kind of self join, but I'm stuck on the actual implementation. Can anyone help?

We could handle this using a join, but I might actually go for an option which generates a CSV list of id corresponding to duplicates:
SELECT
email,
GROUP_CONCAT(id ORDER BY id) AS duplicate_ids
FROM yourTable
GROUP BY email
HAVING COUNT(*) > 1
Functionally speaking, this gives you the same information you wanted in your question, but in what is a much simplified form in my opinion. Because we order the id values when concatenating, the original id will always appear first, on the left side of the CSV list. Also, if you have many duplicates your requested output could become verbose and harder to read.

select
orig.original_id,
t.id as duplicate_id,
orig.email
from t
inner join (select min(id) as original_id, email
from t
group by email
having count(*)>1) orig on orig.email = t.email
where t.id <> orig.original_id
The subquery finds, for each email that has duplicates, the smallest id and treats it as the original.
We then join the subquery back to the table by email, so every other row with that email is reported as a duplicate of the original.
UPDATE: http://rextester.com/BLIHK20984 cloned @Tim Biegeleisen's answer
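The join approach can be tried on the question's sample data. This is a sketch assuming Python's built-in sqlite3 as a stand-in for MySQL; the table name t follows the answer, and the outer filter is written as a WHERE clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "Rob", "spam@email.com"),
        (2, "Jim", "spam@email.com"),
        (3, "Dave", "ham@email.com"),
        (4, "Fred", "eggs@email.com"),
        (5, "Ben", "ham@email.com"),
        (6, "Tom", "ham@email.com"),
    ],
)

# orig holds the smallest id per duplicated email; every other row with the
# same email is reported as a duplicate of that original.
query = """
SELECT orig.original_id, t.id AS duplicate_id, orig.email
FROM t
JOIN (
    SELECT MIN(id) AS original_id, email
    FROM t
    GROUP BY email
    HAVING COUNT(*) > 1
) orig ON orig.email = t.email
WHERE t.id <> orig.original_id
ORDER BY orig.original_id, t.id
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

The rows match the desired result in the question: one line per duplicate, each paired with the first occurrence of its email.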

Counting votes in a MySQL table only once or twice

I've got the following table:
+-----------------+
| id| user | vote |
+-----------------+
| 1 | 1 | text |
| 2 | 1 | text2|
| 3 | 2 | text |
| 4 | 3 | text3|
| 5 | 2 | text |
+-----------------+
What I want to do is to count the "votes"
SELECT COUNT(vote), vote FROM table GROUP BY vote
That works fine. Output:
+-------------------+
| count(vote)| vote |
+-------------------+
| 3 | text |
| 1 | text2|
| 1 | text3|
+-------------------+
But now I only want to count the first or the first and the second vote from a user.
So result what I want is (if I count only the first vote):
+-------------------+
| count(vote)| vote |
+-------------------+
| 2 | text |
| 1 | text3|
+-------------------+
I tried to work with count(distinct...) but can't get it to work.
Any hint in the right direction?

You can do this in a single SQL statement with something like this:
SELECT vote, COUNT(vote)
FROM
(
SELECT t.user, t.vote
FROM table1 t
INNER JOIN (SELECT user, MIN(id) AS first_id FROM table1 GROUP BY user) f
ON f.first_id = t.id
) d
GROUP BY vote
Note that this only gives you 1 vote not 1 or 2.
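A sketch of counting only each user's first vote, assuming that the smallest id per user marks the first vote and using Python's built-in sqlite3 in place of MySQL. The table name votes is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "votes" is a hypothetical table name for the table shown in the question.
conn.execute("CREATE TABLE votes (id INTEGER, user INTEGER, vote TEXT)")
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [(1, 1, "text"), (2, 1, "text2"), (3, 2, "text"), (4, 3, "text3"), (5, 2, "text")],
)

# The subquery keeps the smallest id per user (assumed to be the first vote);
# the outer query counts only those rows per vote value.
query = """
SELECT v.vote, COUNT(*)
FROM votes v
JOIN (SELECT MIN(id) AS id FROM votes GROUP BY user) f ON f.id = v.id
GROUP BY v.vote
ORDER BY v.vote
"""
first_vote_counts = conn.execute(query).fetchall()
print(first_vote_counts)
```

On the sample data this reproduces the counts the question asks for when only the first vote is considered.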

The easiest way would be to use one of the "row numbering" solutions listed in this SO question. Then your original query's almost there:
SELECT
COUNT(vote),
vote
FROM tableWithRowNumberAdded
WHERE MadeUpRowNumber IN (1,2)
GROUP BY vote
My alternative is much longer winded and calls for working tables. These can be "real" tables in your schema, or whatever flavour of intermediate resultsets you are comfortable with.
Start by getting the first vote for each user:
SELECT user, min(id) AS id FROM table GROUP BY user
Put this in a working table; let's call it FirstVote. Next we can get each user's second vote, if any:
SELECT user, min(id) AS id FROM table WHERE id not in (select id from FirstVote) GROUP BY user
Let's call the result of this SecondVote. UNION FirstVote to SecondVote, join this to the original table and group by vote. There's your answer!
SELECT
vote,
COUNT(*)
FROM table
INNER JOIN
(
SELECT id FROM FirstVote
UNION ALL
SELECT id FROM SecondVote
) as BothVotes
ON BothVotes.id = table.id
GROUP BY vote
Of course it could be structured as a single statement with multiple sub-queries but that would be horrendous to maintain, or read in this forum.
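For illustration, the working tables can be written as common table expressions so the whole thing runs as one statement. This is a sketch, not the only way to structure it: it assumes Python's built-in sqlite3 as a stand-in (CTEs are also available in MySQL 8.0), a hypothetical table name votes, and that a smaller id means an earlier vote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "votes" is a hypothetical table name for the table shown in the question.
conn.execute("CREATE TABLE votes (id INTEGER, user INTEGER, vote TEXT)")
conn.executemany(
    "INSERT INTO votes VALUES (?, ?, ?)",
    [(1, 1, "text"), (2, 1, "text2"), (3, 2, "text"), (4, 3, "text3"), (5, 2, "text")],
)

# FirstVote: smallest id per user. SecondVote: smallest id per user among the
# rows that are not already a first vote. Their union is joined back and counted.
query = """
WITH FirstVote AS (
    SELECT user, MIN(id) AS id FROM votes GROUP BY user
),
SecondVote AS (
    SELECT user, MIN(id) AS id
    FROM votes
    WHERE id NOT IN (SELECT id FROM FirstVote)
    GROUP BY user
)
SELECT v.vote, COUNT(*)
FROM votes v
JOIN (
    SELECT id FROM FirstVote
    UNION ALL
    SELECT id FROM SecondVote
) AS BothVotes ON BothVotes.id = v.id
GROUP BY v.vote
ORDER BY v.vote
"""
first_two_counts = conn.execute(query).fetchall()
print(first_two_counts)
```

In the sample data no user has more than two votes, so the counts equal the plain per-vote totals; a third vote by any user would be excluded.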

This is a very tricky question for MySQL. Other systems have window functions, which perform a calculation across a set of table rows that are somehow related to the current row.
MySQL versions before 8.0 lack this functionality, so one has to look for a workaround. Here is the problem description and a couple of suggested solutions: MySQL and window functions.
I also assume that the first 2 votes by a user can be determined by Id: an earlier vote has a smaller Id.
Based on this I would suggest this solution to your problem:
SELECT
Vote,
COUNT(*)
FROM
Table,
(
SELECT
user_id, SUBSTRING_INDEX(GROUP_CONCAT(Id ORDER BY Id ASC), ',', 2) AS top_IDs_per_user
FROM
Table
GROUP BY
user_id
) s_top_IDs_per_User
WHERE
Table.user_id = s_top_IDs_per_User.User_id and
FIND_IN_SET(Id, s_top_IDs_per_User.top_IDs_per_user)
GROUP BY Vote
;

How can I make two condition in having clause

I have a table similar to:
domain | file | Number
------------------------------------
aaa.com | aaa.com_1 | 111
bbb.com | bbb.com_1 | 222
ccc.com | ccc.com_2 | 111
ddd.com | ddd.com_1 | 222
eee.com | eee.com_1 | 333
I need to query the number of Domains that share the same Number and their File name ends with _1. I tried the following:
select count(domain) as 'sum domains', file
from table
group by Number
having
count(Number) >1 and File like '%\_1';
It gives me:
sum domains | file
------------------------------
2 | aaa.com
2 | bbb.com
I expected to see the following:
sum domains | file
------------------------------
1 | aaa.com
2 | bbb.com
Because the Number 111 appears once with File ends with _1 and _2, so it should count 1 only. How can I apply the 2 conditions that I stated earlier correctly ?

As documented under SELECT Syntax:
The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization.
In other words, it is applied after the grouping operation has been performed (in contrast with WHERE, which is performed before any grouping operation). See WHERE vs HAVING.
Therefore, your current query first forms the resultset from the following:
SELECT COUNT(domain) AS `sum domains`, file
FROM `table`
GROUP BY Number
See it on sqlfiddle:
| SUM DOMAINS | FILE |
---------------------------
| 2 | aaa.com_1 |
| 2 | bbb.com_1 |
| 1 | eee.com_1 |
As you can see, the values selected for the file column are merely one of the values from each group—as documented under MySQL Extensions to GROUP BY:
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Your current query then proceeds to filter these results according to your HAVING clause:
HAVING COUNT(Number) > 1 AND file LIKE '%\_1'
With the values of file selected above, every single group matches on the second criterion; and the first two groups match on the first criterion. Therefore the results of the complete query are:
| SUM DOMAINS | FILE |
---------------------------
| 2 | aaa.com_1 |
| 2 | bbb.com_1 |
Following your comments above, you want to filter the records on file before grouping and then filter the resulting groups for those containing more than one match. Therefore use WHERE and HAVING respectively (and select Number instead of file to identify each group):
SELECT Number, COUNT(*) AS `sum domains`
FROM `table`
WHERE file LIKE '%\_1'
GROUP BY Number
HAVING `sum domains` > 1
See it on sqlfiddle:
| NUMBER | SUM DOMAINS |
------------------------
| 222 | 2 |
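The difference between filtering before and after grouping can be reproduced with a short script. It is a sketch that assumes Python's built-in sqlite3 as a stand-in for MySQL and a hypothetical table name domains. SQLite has no default LIKE escape character, so the backslash is declared with ESCAPE, and HAVING repeats COUNT(*) rather than using the alias:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "domains" is a hypothetical name for the table shown in the question.
conn.execute("CREATE TABLE domains (domain TEXT, file TEXT, Number INTEGER)")
conn.executemany(
    "INSERT INTO domains VALUES (?, ?, ?)",
    [
        ("aaa.com", "aaa.com_1", 111),
        ("bbb.com", "bbb.com_1", 222),
        ("ccc.com", "ccc.com_2", 111),
        ("ddd.com", "ddd.com_1", 222),
        ("eee.com", "eee.com_1", 333),
    ],
)

# HAVING only: groups are formed from all rows, then filtered by size.
having_only = conn.execute(
    "SELECT Number, COUNT(*) FROM domains "
    "GROUP BY Number HAVING COUNT(*) > 1 ORDER BY Number"
).fetchall()

# WHERE then HAVING: rows are filtered on file first, then grouped,
# then groups with more than one remaining row are kept.
query = r"""
SELECT Number, COUNT(*) AS sum_domains
FROM domains
WHERE file LIKE '%\_1' ESCAPE '\'
GROUP BY Number
HAVING COUNT(*) > 1
"""
where_then_having = conn.execute(query).fetchall()

print(having_only)
print(where_then_having)
```

Number 111 survives the HAVING-only query because both of its rows are counted, but drops out once the _2 file is removed by WHERE before grouping.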

I am using the following:
having ( SUM(qty) > 4 AND SUM(qty) < 15 )

You cannot have the file name in the SELECT statement if it is not also in the GROUP BY. You have to get your GROUP BY result, then JOIN back to the original table and add the filter logic like so:
SELECT *
FROM
(
select count(domain) as 'sum_domains', Number
from table
group by Number
having
count(Number) >1
) result
join table t on result.Number = t.Number
WHERE file like '%\_1'

Try the nested query below:
select count(domain) as 'sum domains', domain as fileName
from
(select domain, file from tableName
group by Number
having count(Number) >1) as temp
WHERE file like '%\_1';
