Should I be concerned about duplicate entries in a GROUP BY clause? - mysql

I've come across some PHP code that generates a query like the one below (I've omitted the WHERE clause for clarity):
SELECT user.id FROM users
JOIN table2 ON table2.user_id = users.id
GROUP BY users.id, users.id
Now, it appears to work, but I'm wary of the duplicates in GROUP BY. I'm aware that GROUP BY users.id, users.id doesn't make sense and that I can use SELECT distinct(user.id) without grouping.
I'm wondering whether it's going to cause problems or whether it will always just execute as:
SELECT user.id FROM users
JOIN table2 ON table2.user_id = users.id
GROUP BY users.id

This is your query (with the initial "S" added):
SELECT user.id
FROM users JOIN
table2
ON table2.user_id = user.id
GROUP BY user.id, user.id;
This query will not run as written. It selects from a table named users, but every column is qualified with user, an alias that is never defined. If this code is being generated, then there is a bigger error than the duplicates in the GROUP BY clause.
If the tables are set up correctly, then table2.user_id should always be a valid id in the users table, so the join to users adds nothing. In that case, the following is a much simpler version of the query:
SELECT distinct t.user_id as id
FROM table2 t;
The difference between this and:
SELECT t.user_id as id
FROM table2 t
GROUP BY t.user_id;
is very minor (they should produce the same execution plan). However, because MySQL supports "hidden columns" (columns in the SELECT clause that are neither aggregated nor listed in the GROUP BY clause), there are some situations where DISTINCT and GROUP BY are not identical. Most other databases reject such queries. I would consider these situations to be poorly formed SQL, but they do exist. Here is an example:
SELECT distinct t.user_id as id, t.value
FROM table2 t;
Is not necessarily the same as:
SELECT t.user_id as id, t.value
FROM table2 t
GROUP BY t.user_id;
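Both are syntactically correct MySQL statements, although the GROUP BY version is rejected when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5).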
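To check the original question empirically, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL, with made-up rows. It compares DISTINCT, GROUP BY users.id and GROUP BY users.id, users.id, then shows the hidden-column case. SQLite, like MySQL without ONLY_FULL_GROUP_BY, accepts the bare t.value column, but which value it returns for each group is not something to rely on in either database.

```python
import sqlite3

# Toy stand-ins for the question's users/table2 tables (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (user_id INTEGER, value TEXT);
    INSERT INTO users  VALUES (1), (2);
    INSERT INTO table2 VALUES (1, 'a'), (1, 'b'), (2, 'c');
""")

def rows(sql):
    """Run a query and return its rows sorted, so results compare easily."""
    return sorted(conn.execute(sql).fetchall())

distinct_ids = rows("SELECT DISTINCT t.user_id AS id FROM table2 t")
grouped_once = rows("""
    SELECT users.id FROM users
    JOIN table2 ON table2.user_id = users.id
    GROUP BY users.id""")
grouped_twice = rows("""
    SELECT users.id FROM users
    JOIN table2 ON table2.user_id = users.id
    GROUP BY users.id, users.id""")

# Repeating a GROUP BY column does not change the grouping.
print(distinct_ids, grouped_once, grouped_twice)  # [(1,), (2,)] three times

# With a non-grouped column, DISTINCT and GROUP BY diverge.
distinct_pairs = rows("SELECT DISTINCT t.user_id AS id, t.value FROM table2 t")
grouped_pairs = rows(
    "SELECT t.user_id AS id, t.value FROM table2 t GROUP BY t.user_id")
print(len(distinct_pairs), len(grouped_pairs))  # 3 2
```

In the hidden-column query only the row count is meaningful; the value reported for user 1 may be either 'a' or 'b'.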

Related

Proper way to use MySQL GROUP BY for returning one result from a referenced table

I often have a situation with two tables in MySQL where I need one record for each foreign key. For example:
table post {id, ...}
table comment {id, post_id, ...}
SELECT * FROM comment GROUP BY post_id ORDER BY id ASC
-- Oldest comment for each post
or
table client {id, ...}
table payment {id, client_id, ...}
SELECT * FROM payment GROUP BY client_id ORDER BY id DESC
-- Most recent payment from each client
These queries often fail with the error "SELECT list is not in GROUP BY clause and contains nonaggregated column".
Failed Solutions
I can usually work around this with MIN()/MAX(), but that creates a very slow query with mismatched results (the row with min(id) isn't the same row as the one with min(textfield)):
SELECT min(id), min(textfield), ... FROM table GROUP BY fk_id
Adding all the columns to GROUP BY returns one row per distinct combination of values, so I get multiple rows for the same fk_id, which defeats the purpose of GROUP BY:
SELECT id, textfield, ... FROM table GROUP BY fk_id, id, textfield
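A small sketch of why both workarounds misbehave, using Python's sqlite3 module as a stand-in for MySQL. The comments table and its contents are hypothetical, with textfield playing the role of the question's text column.

```python
import sqlite3

# Hypothetical comments table; textfield stands in for the question's text column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, textfield TEXT);
    INSERT INTO comments VALUES
        (1, 10, 'zebra'),
        (2, 10, 'apple'),
        (3, 20, 'only one');
""")

# MIN(id) and MIN(textfield) are computed independently for the group,
# so the pair can combine values taken from different rows.
mixed = conn.execute("""
    SELECT MIN(id), MIN(textfield) FROM comments
    WHERE post_id = 10
    GROUP BY post_id""").fetchone()
actual = conn.execute(
    "SELECT textfield FROM comments WHERE id = ?", (mixed[0],)).fetchone()[0]
print(mixed, actual)  # (1, 'apple') zebra

# Grouping by every selected column makes each comment its own group.
per_row = conn.execute("""
    SELECT post_id, id, textfield FROM comments
    GROUP BY post_id, id, textfield""").fetchall()
print(len(per_row))  # 3
```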
Same idea as GurV's answer, but using a join instead of a correlated subquery. The basic idea is that the subquery finds, for each post that has comments, the id of its oldest comment (the minimum id). We then join back to comments to keep only those rows.
SELECT t1.*
FROM comments t1
INNER JOIN
(
SELECT post_id, MIN(id) AS min_id
FROM comments
GROUP BY post_id
) t2
ON t1.post_id = t2.post_id AND
t1.id = t2.min_id
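Here is the join-to-derived-table approach run against a toy comments table in SQLite through Python's sqlite3 module (a stand-in for MySQL; the data is hypothetical). For the question's "most recent payment per client" variant, the same shape with MAX(id) and client_id should work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, textfield TEXT);
    INSERT INTO comments VALUES
        (1, 10, 'first on 10'), (2, 10, 'second on 10'),
        (3, 20, 'first on 20'), (4, 20, 'second on 20'),
        (5, 30, 'only on 30');
""")

# The derived table t2 holds one (post_id, min_id) pair per post; joining back
# on both columns keeps exactly the oldest comment row of each post.
oldest = conn.execute("""
    SELECT t1.*
    FROM comments t1
    INNER JOIN (
        SELECT post_id, MIN(id) AS min_id
        FROM comments
        GROUP BY post_id
    ) t2
      ON t1.post_id = t2.post_id
     AND t1.id = t2.min_id
    ORDER BY t1.post_id""").fetchall()
print(oldest)
```

Because every column comes from one real row, textfield always belongs to the same comment as id.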
You can use a correlated subquery with aggregation to find the earliest comment for each post:
select *
from comments c1
where id = (
select min(id)
from comments c2
where c1.post_id = c2.post_id
)
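A composite index on comments(post_id, id) should help: post_id comes first because the subquery filters on it, and id second so that MIN(id) can be read straight from the index.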
The index matters most if you are querying the whole table and it has many rows. This query is more useful and performant when you are querying a small subset of posts; if you are querying the whole table, then I think Tim's answer is better suited.
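A sketch of the correlated version in SQLite through Python's sqlite3 (the data and the index name idx_comments_post_id_id are hypothetical). The composite index lists post_id first, because the subquery filters on it, and id second, because the subquery takes MIN(id). The second query shows the "small subset of posts" case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, textfield TEXT);
    -- Leading column: the correlated equality; second column: the MIN() target.
    CREATE INDEX idx_comments_post_id_id ON comments (post_id, id);
    INSERT INTO comments VALUES
        (1, 10, 'a'), (2, 10, 'b'), (3, 20, 'c'), (4, 20, 'd'), (5, 30, 'e');
""")

# Whole table: the subquery is evaluated for each outer row c1.
correlated = conn.execute("""
    SELECT *
    FROM comments c1
    WHERE id = (
        SELECT MIN(id)
        FROM comments c2
        WHERE c1.post_id = c2.post_id
    )
    ORDER BY post_id""").fetchall()

# Small subset of posts: far fewer outer rows, so far fewer evaluations.
subset = conn.execute("""
    SELECT *
    FROM comments c1
    WHERE c1.post_id IN (10, 30)
      AND id = (SELECT MIN(id) FROM comments c2 WHERE c2.post_id = c1.post_id)
    ORDER BY post_id""").fetchall()
print(correlated, subset)
```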

MySQL Join on LIKE statement

I need to count how many users are in each group in a database. Unfortunately, the database design is not great: the users' uids are stored against each group in the group table, in a LONGTEXT column named owncloudusers.
Example of owncloudusers data:
{i:0;s:36:"25C967BD-AF78-4671-88DC-FAD935FF1B26";i:1;s:36:"40D6866B-EA06-4F39-B509-8CE551CC1924";i:2;s:36:"7724C600-DE23-45C8-8BFD-326B0138E029";i:3;s:36:"D6FF37EC-11F4-471F-94C9-F3A28416CF1F";i:4;s:36:"F70C6D03-B7BA-44E4-B703-9AF3EED9BC03";}
I thought I could use a LIKE in the join condition to check whether each user's uid appears inside owncloudusers.
The closest I have got is:
SELECT T1.owncloudname, count(T2.owncloud_name) AS Users
FROM oc_ldap_group_members T1
LEFT JOIN oc_ldap_user_mapping T2 ON T1.owncloudusers LIKE('%:"'||T2.owncloud_name||'";%')
GROUP BY owncloudname;
T1 table holds the groupings and who is tagged to that group
T2 table holds the users' data; column owncloud_name is the user's uid.
I have tried a few approaches I found on Stack Overflow: CONCAT in the LIKE join, and LIKE('%:"'+T2.owncloud_name+'";%').
But no joy. The current statement returns 0 users for every group, and I know that is not right.
I know it must be an issue with the join, but I'm not sure where to go with it next.
Any assistance would be much appreciated.
I think you need a simple
SELECT T1.owncloudname, count(*) AS Users
FROM oc_ldap_group_members T1
LEFT JOIN oc_ldap_user_mapping T2 ON T1.owncloudusers LIKE '%T2.owncloud_name%'
GROUP BY owncloudname;
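Note that '%T2.owncloud_name%' is a plain string literal, so it matches the text T2.owncloud_name rather than the column's value. To use the column's value, build the pattern with CONCAT: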
SELECT T1.owncloudname, count(T2.owncloud_name) AS Users
FROM oc_ldap_group_members T1
LEFT JOIN oc_ldap_user_mapping T2 ON T1.owncloudusers
LIKE concat( '%',T2.owncloud_name,'%' )
GROUP BY owncloudname;
You were close, but by default MySQL treats || as a logical OR rather than as string concatenation (unless the PIPES_AS_CONCAT SQL mode is enabled). Use CONCAT() with the text parts passed as a list of values to build the LIKE operand:
SELECT T1.owncloudname, count(T2.owncloud_name) AS Users
FROM oc_ldap_group_members T1
LEFT JOIN oc_ldap_user_mapping T2
ON T1.owncloudusers LIKE CONCAT('%:"', T2.owncloud_name, '";%')
GROUP BY owncloudname;
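A minimal sketch of the LIKE join, using SQLite through Python's sqlite3 module with short made-up uids instead of real UUIDs. One caveat: SQLite follows standard SQL and does use || for concatenation, so the sketch uses || where MySQL needs CONCAT(); the comment shows the MySQL form. COUNT(T2.owncloud_name) rather than COUNT(*) is what lets a group with no matching users report 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oc_ldap_group_members (owncloudname TEXT, owncloudusers TEXT);
    CREATE TABLE oc_ldap_user_mapping  (owncloud_name TEXT);
""")
# Hypothetical serialized member lists in the same shape as the question's data.
conn.executemany("INSERT INTO oc_ldap_group_members VALUES (?, ?)", [
    ("admins", '{i:0;s:3:"AAA";i:1;s:3:"BBB";}'),
    ("staff",  '{i:0;s:3:"BBB";}'),
    ("empty",  '{}'),
])
conn.executemany("INSERT INTO oc_ldap_user_mapping VALUES (?)",
                 [("AAA",), ("BBB",), ("CCC",)])

# SQLite concatenates with ||; in MySQL write the same condition as
#   LIKE CONCAT('%:"', T2.owncloud_name, '";%')
counts = conn.execute("""
    SELECT T1.owncloudname, COUNT(T2.owncloud_name) AS Users
    FROM oc_ldap_group_members T1
    LEFT JOIN oc_ldap_user_mapping T2
      ON T1.owncloudusers LIKE '%:"' || T2.owncloud_name || '";%'
    GROUP BY T1.owncloudname
    ORDER BY T1.owncloudname""").fetchall()
print(counts)  # [('admins', 2), ('empty', 0), ('staff', 1)]
```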
If performance isn't a concern, you could try it with a subquery:
SELECT
T1.owncloudname,
(SELECT COUNT(*)
FROM oc_ldap_user_mapping AS T2
WHERE LOCATE(T2.owncloud_name, T1.owncloudusers) > 0) AS Users
FROM
oc_ldap_group_members T1
GROUP BY
owncloudname;
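The position check matters here: MySQL's LOCATE(substr, str) returns the 1-based position of substr, or 0 if it is absent, so = 1 only matches a uid at the very start of the string. Below is a sketch of the subquery version in SQLite, whose instr(str, substr) behaves the same way with the arguments swapped. The Python sqlite3 harness, the helper count_members and the data are assumptions for illustration, and the outer GROUP BY is dropped because each group has a single row here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oc_ldap_group_members (owncloudname TEXT, owncloudusers TEXT);
    CREATE TABLE oc_ldap_user_mapping  (owncloud_name TEXT);
    INSERT INTO oc_ldap_group_members VALUES
        ('admins', '{i:0;s:3:"AAA";i:1;s:3:"BBB";}'),
        ('staff',  '{i:0;s:3:"BBB";}');
    INSERT INTO oc_ldap_user_mapping VALUES ('AAA'), ('BBB'), ('CCC');
""")

def count_members(position_test):
    """Hypothetical helper: count users per group with a given position test.

    instr(haystack, needle) is SQLite's counterpart of MySQL's
    LOCATE(needle, haystack): both return a 1-based position, or 0 if absent.
    """
    sql = f"""
        SELECT T1.owncloudname,
               (SELECT COUNT(*)
                FROM oc_ldap_user_mapping AS T2
                WHERE instr(T1.owncloudusers, T2.owncloud_name) {position_test}) AS Users
        FROM oc_ldap_group_members T1
        ORDER BY T1.owncloudname"""
    return conn.execute(sql).fetchall()

print(count_members("= 1"))  # only a uid at the very start would match
print(count_members("> 0"))  # a uid anywhere in the string matches
```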

Can't understand. Is this a subquery?

I have something in a query that I have to edit, that I don't understand.
There are 4 tables that are joined: tickets, tasks, tickets_users, users. The whole query is not important, but you have an example at the end of the post. What bugs me is this kind of code used many times in relation to other tables:
(SELECT name
FROM users
WHERE users.id=tickets_users.users_id
) AS RequesterName,
Is this a subquery with the tables users and tickets_users joined? What is this?
WHERE users.id=tickets_users.users_id
If this was a join I would have expected to see:
ON users.id = tickets_users.users_id
And how is this different from a typical join? Why not just select users.name and join with the users table?
Can anyone enlighten me on the advanced SQL querying prowess of the original author?
The query looks like this:
SELECT
description,
(SELECT name
FROM users
WHERE users.id = tickets_users.users_id) AS RequesterName,
(SELECT description
FROM tickets
WHERE tickets.id = ticket_tasks.tickets_id) AS TicketDescription,
ticket_tasks.content AS TaskDescription
FROM
ticket_tasks
RIGHT JOIN
tickets ON ticket_tasks.tickets_id = tickets.id
INNER JOIN
tickets_users ON tickets_users.tickets_id = ticket_tasks.tickets_id
Thanks,
This is what is called a correlated subquery. In simple terms, it's a SELECT inside a SELECT that refers to a column of the outer query (here tickets_users.users_id).
However, using several of these in one query is generally not recommended: each one may be evaluated once for every row of the outer query, so the performance cost can be large.
A correlated subquery is evaluated row by row, once for each row of the outer SELECT. If that doesn't make sense, think of it this way:
SELECT
id,
(SELECT MIN(ta.id) FROM tableA AS ta WHERE ta.id > t.id) AS next_a_id
FROM
tableB AS t;
For each row in tableB, the subquery runs again, and every row in tableA may be read and compared with that row's id.
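To see the per-row evaluation concretely, here is the tableA/tableB example in SQLite through Python's sqlite3, with hypothetical ids. The subquery uses MIN() so that it returns a single value; in MySQL a scalar subquery that returns more than one row raises the error "Subquery returns more than 1 row", while SQLite silently uses the first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (id INTEGER PRIMARY KEY);
    CREATE TABLE tableB (id INTEGER PRIMARY KEY);
    INSERT INTO tableA VALUES (1), (3), (5);
    INSERT INTO tableB VALUES (2), (4), (6);
""")

# For each outer row t, the inner SELECT runs with t.id bound to that row's
# value; MIN() reduces the matching tableA ids to a single scalar.
rows = conn.execute("""
    SELECT t.id,
           (SELECT MIN(ta.id) FROM tableA AS ta WHERE ta.id > t.id) AS next_a_id
    FROM tableB AS t
    ORDER BY t.id""").fetchall()
print(rows)  # [(2, 3), (4, 5), (6, None)]
```

The last row shows that when no inner row matches, the scalar subquery yields NULL (None in Python).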
NOTE:
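Without a usable index, each correlated subquery can scan its whole table once per outer row. If the outer query returns 100 rows and a subquery reads a 100-row table, that is 100*100 = 10,000 row comparisons for that one subquery, and every additional correlated subquery adds as much again. With an index on the correlated column (for example users.id as a primary key), each evaluation becomes a cheap index lookup instead.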
A correlated subquery is NOT a join; it is a subquery. For comparison, here is a plain (non-correlated) subquery:
SELECT *
FROM
(SELECT id FROM t -- this is a subquery
) AS temp
JOINs are different. Generally, you can write an inner join in one of these two ways.
Explicit JOIN ... ON syntax:
SELECT *
FROM t
JOIN t1 ON t1.id = t.id
Implicit (comma) join, with the condition in WHERE:
SELECT *
FROM t, t1
WHERE t1.id = t.id
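Logically, the second form builds the Cartesian product of the two tables and then filters it with the WHERE clause, while the first filters as it joins. In practice, though, MySQL's optimizer treats an inner-join condition in WHERE the same as one in ON, so both forms normally get the same execution plan. The explicit JOIN ... ON form is still preferred: it keeps each join condition next to the table it belongs to, and outer joins can only be written that way.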
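Whatever their relative speed, the two forms return the same rows for an inner join. A quick check using SQLite through Python's sqlite3 as a stand-in, with tables t and t1 filled with made-up rows; printing EXPLAIN QUERY PLAN for each lets you compare plans, and in MySQL running EXPLAIN on both statements does the same job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t1 (id INTEGER PRIMARY KEY, extra TEXT);
    INSERT INTO t  VALUES (1, 'one'), (2, 'two'), (3, 'three');
    INSERT INTO t1 VALUES (2, 'x'), (3, 'y'), (4, 'z');
""")

explicit = conn.execute(
    "SELECT * FROM t JOIN t1 ON t1.id = t.id ORDER BY t.id").fetchall()
implicit = conn.execute(
    "SELECT * FROM t, t1 WHERE t1.id = t.id ORDER BY t.id").fetchall()
print(explicit == implicit, explicit)

for sql in ("SELECT * FROM t JOIN t1 ON t1.id = t.id",
            "SELECT * FROM t, t1 WHERE t1.id = t.id"):
    # The last column of each EXPLAIN QUERY PLAN row describes one plan step.
    print([row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)])
```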
There are a few types of joins, each useful for its respective purpose:
INNER JOIN (same as JOIN)
LEFT JOIN (same as LEFT OUTER JOIN)
RIGHT JOIN (same as RIGHT OUTER JOIN)
MySQL does not support FULL JOIN or FULL OUTER JOIN, so to get a full join you need to combine a LEFT JOIN and a RIGHT JOIN with UNION. See this link for a better understanding of what joins do, illustrated with Venn diagrams: LINK
Remember that the link covers SQL in general, so it includes FULL joins as well; those don't work in MySQL.
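A sketch of the usual full-join emulation, run in SQLite through Python's sqlite3 with hypothetical tables a and b. Older SQLite versions lack RIGHT JOIN, so the sketch writes the RIGHT JOIN half as b LEFT JOIN a, which is equivalent; in MySQL you can write FROM a RIGHT JOIN b directly. UNION also removes rows that are genuine duplicates; if that matters, a common alternative is UNION ALL with WHERE a.id IS NULL on the second half.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, a_val TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, b_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# First half keeps every row of a; second half keeps every row of b.
# UNION merges them and drops the matched rows that appear in both halves.
full = conn.execute("""
    SELECT * FROM (
        SELECT a.id AS a_id, a.a_val, b.id AS b_id, b.b_val
        FROM a LEFT JOIN b ON b.id = a.id
        UNION
        SELECT a.id, a.a_val, b.id, b.b_val
        FROM b LEFT JOIN a ON a.id = b.id
    ) AS full_rows
    ORDER BY COALESCE(a_id, b_id)""").fetchall()
print(full)  # [(1, 'a1', None, None), (2, 'a2', 2, 'b2'), (None, None, 3, 'b3')]
```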

Inner Join SQL Syntax

I've never done an inner join SQL statement before, so I don't even know if this is the right thing to use, but here's my situation.
Table 1 Columns: id, course_id, unit, lesson
Table 2 Columns: id, course_id
Ultimately, I want to count, for each unit in Table 1, the number of ids that are also in Table 2.
So, even though it doesn't work, maybe something like this:
$sql = "SELECT table1.unit, COUNT( id ) as count, table2.id, FROM table1, table2, WHERE course_id=$im_course_id GROUP BY unit";
I'm sure the syntax of what I'm wanting to do is a complete fail. Any ideas on fixing it?
SELECT unit, COUNT( t1.id ) as count
FROM table1 as t1 inner JOIN table2 as t2
ON t1.id = t2.id
GROUP BY unit
hope this helps.
If I understand what you want (maybe you could post an example input and output?):
SELECT unit, COUNT( t1.id ) as count
FROM table1 as t1 JOIN table2 as t2
ON t1.id = t2.id
GROUP BY unit
Okay, so there are a few things going on here. First off, comma-separated joins are an older style that is generally discouraged, so you should probably switch to writing INNER JOIN explicitly.
Now, whenever you write an explicit join, you normally also need ON: you need to tell SQL how to match up the rows of the two tables. The ON clause comes right after the join, like this:
Select *
From table1 inner join table2
on table1.id = table2.id
and table1.name = table2.name
You can join on as many things as you need by using and. This means that if the primary key of one table is several columns, you can easily create a one-to-one match between tables.
Lastly, you may be having issues because of other general syntax errors in your query. Commas separate the items of a list, such as the columns in the SELECT list. So in your query,
SELECT table1.unit, COUNT( id ) as count, table2.id, FROM ...
The comma at the end of the SELECT list shouldn't be there (nor should the one after table2, just before WHERE). Instead, this should read
SELECT table1.unit, COUNT( id ) as count, table2.id FROM ...
This is subtle, but the SQL query cannot run with the extra comma.
Another issue is the COUNT( id ) that you have. SQL doesn't know which id to count, since table1 and table2 both have an id column. So you should use either count(table1.id) or count(table2.id).
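Putting these fixes together, here is a sketch against made-up table1/table2 rows, run in SQLite through Python's sqlite3 module. It joins on id as the answers above do. The ? placeholder stands in for $im_course_id and is bound as a parameter instead of being interpolated into the SQL string; in PHP the equivalent would be a prepared statement (PDO or mysqli).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, course_id INTEGER,
                         unit INTEGER, lesson INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, course_id INTEGER);
    INSERT INTO table1 VALUES (1, 7, 1, 1), (2, 7, 1, 2), (3, 7, 2, 1), (4, 8, 2, 2);
    INSERT INTO table2 VALUES (1, 7), (3, 7), (4, 8);
""")

course_id = 7  # hypothetical value standing in for $im_course_id

# Every column is qualified, so neither id nor course_id is ambiguous.
counts = conn.execute("""
    SELECT t1.unit, COUNT(t1.id) AS count
    FROM table1 AS t1
    INNER JOIN table2 AS t2
            ON t1.id = t2.id
    WHERE t1.course_id = ?
    GROUP BY t1.unit
    ORDER BY t1.unit""", (course_id,)).fetchall()
print(counts)  # [(1, 1), (2, 1)]
```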

SQL - Duplicates in join

I have a query:
SELECT DISTINCT *
FROM table1 AS s
LEFT OUTER JOIN table2 AS t
ON s.s_id = t.t_id
WHERE (
s.body LIKE '%string%'
OR t.name LIKE '%string%'
)
ORDER BY s.time DESC
but I am still getting duplicate tuples. Why is this?
GROUP BY s.s_id
was the solution.
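The result doesn't contain completely identical rows, so technically they aren't duplicates: DISTINCT only removes rows that are equal in every selected column, and with SELECT * the columns coming from table2 differ between the rows.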
To get rid of the duplicates, SELECT DISTINCT (or GROUP BY) only the fields that must not be duplicated, and outer join the rest of the data through a subquery on the corresponding key values that takes only one row per key (the first, the last, or whichever you need).
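A sketch reproducing the effect in SQLite through Python's sqlite3, with hypothetical rows: one table1 row matches two table2 rows, so DISTINCT * keeps both joined rows, while DISTINCT over table1's columns alone collapses them into one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (s_id INTEGER PRIMARY KEY, body TEXT, time INTEGER);
    CREATE TABLE table2 (t_id INTEGER, name TEXT);
    INSERT INTO table1 VALUES (1, 'has string', 100), (2, 'nothing', 200);
    INSERT INTO table2 VALUES (1, 'first'), (1, 'second string');
""")

where = "WHERE s.body LIKE '%string%' OR t.name LIKE '%string%'"

# DISTINCT * compares every column, including table2's, so s_id 1 appears
# once per matching table2 row.
all_cols = conn.execute(f"""
    SELECT DISTINCT * FROM table1 AS s
    LEFT OUTER JOIN table2 AS t ON s.s_id = t.t_id
    {where} ORDER BY s.time DESC""").fetchall()

# Selecting only table1's columns lets DISTINCT collapse them to one row.
s_only = conn.execute(f"""
    SELECT DISTINCT s.* FROM table1 AS s
    LEFT OUTER JOIN table2 AS t ON s.s_id = t.t_id
    {where} ORDER BY s.time DESC""").fetchall()
print(len(all_cols), len(s_only))  # 2 1
```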