MySql Group Concat - Don't want rows excluded from Where clause

Okay, so I have a case where I want to do a Group Concat, but also filter on the same column that's in the Group Concat in a Where clause further in the statement, without the Where excluding results from the Group Concat. Here's what I mean:
SELECT posts.id, posts.post_date,
GROUP_CONCAT(DISTINCT terms.name ORDER BY terms.name DESC SEPARATOR ';')
FROM posts
INNER JOIN terms on terms.post_id = posts.id
WHERE terms.name = "kittens";
So I want to get the posts where the post's terms.name = "kittens", however the post can have multiple terms.name, such as "kittens", "cats", "cuteness", etc etc. So in the Group Concat I want the result to be "kittens;cats;cuteness" (each of these being a separate entry related by posts.id). However with "kittens" in the Where clause, all the Group Concat returns is "kittens".
If I said "WHERE terms.name = 'kittens' OR terms.name = 'cats'", the Group Concat returns "kittens;cats"... but I only want to be searching by one terms.name and get the rest of the terms in the Group Concat.
How can I get around this?

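Use Having instead of Where, so the filter runs after the groups are built and no terms are dropped from the Group Concat. The query also needs a GROUP BY, and the Having condition has to be an aggregate over the group's terms, like this:
SELECT posts.id, posts.post_date,
GROUP_CONCAT(DISTINCT terms.name ORDER BY terms.name DESC SEPARATOR ';')
FROM posts
INNER JOIN terms on terms.post_id = posts.id
GROUP BY posts.id
HAVING SUM(terms.name = 'kittens') > 0;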

The query you need is:
SELECT p.id, p.post_date,
GROUP_CONCAT(DISTINCT t.name ORDER BY t.name DESC SEPARATOR ';')
FROM posts p
INNER JOIN terms s ON s.post_id = p.id # "s" from "search"
INNER JOIN terms t ON t.post_id = p.id
WHERE s.name = "kittens"
GROUP BY p.id
How it works
If I understand the question correctly, you need to select all the terms of the posts that have the term "kittens". It requires joining the table terms twice: once to get all the posts that have the term "kittens" and once again to get the terms of the posts selected this way.
This is what the query does: it joins the table posts aliased as "p" ("p" from "post") with table terms aliased as "s" ("s" from "search") and selects only the pairs (post, term) where "term" is "kittens". Then it joins the table terms again as "t" ("t" from "term") and matches the posts having the term "kittens" with all their terms.
The GROUP BY clause creates one group from all the rows having the same posts.id. Since id is the PK of table posts there is no need to add posts.post_date to the GROUP BY clause, p.post_date is functionally dependent on p.id.
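The double join described above can be tried on a tiny in-memory database. This is only a sketch: it uses Python's sqlite3 module as a stand-in for MySQL, the sample rows are made up, and DISTINCT / ORDER BY / SEPARATOR inside GROUP_CONCAT are left out because SQLite's support for them differs from MySQL's, so the terms are sorted in Python instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, post_date TEXT);
    CREATE TABLE terms (post_id INTEGER, name TEXT);
    -- made-up sample data: post 1 has the term 'kittens', post 2 does not
    INSERT INTO posts VALUES (1, '2015-01-01'), (2, '2015-01-02');
    INSERT INTO terms VALUES (1, 'kittens'), (1, 'cats'), (1, 'cuteness'),
                             (2, 'dogs'), (2, 'cuteness');
""")

rows = conn.execute("""
    SELECT p.id, p.post_date, GROUP_CONCAT(t.name, ';') AS all_terms
    FROM posts p
    INNER JOIN terms s ON s.post_id = p.id   -- "s": finds the posts tagged 'kittens'
    INNER JOIN terms t ON t.post_id = p.id   -- "t": collects every term of those posts
    WHERE s.name = 'kittens'
    GROUP BY p.id
""").fetchall()

# SQLite does not guarantee the concatenation order, so sort before using it.
result = {post_id: sorted(terms.split(';')) for post_id, _date, terms in rows}
print(result)
```

Only post 1 is returned, but with all three of its terms, because the filter is applied to the "s" copy of the table and the concatenation reads from the "t" copy.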

Related

Nested query performance

I have two queries below. The first one has a nested select. The second one makes use of a group by clause.
select
posts.*,
(select count(*) from comments where comments.post_id = posts.id and comments.is_approved = 1) as comments_count
from
posts
select
posts.*,
count(comments.id) comments_count
from
posts
left join comments on
comments.post_id = posts.id
group by
posts.*
From my understanding the first query is worse because it has to do a select for each record in posts, whereas the second query does not.
Is this true or false?

As with all performance questions, you should test the performance on your system with your data.
However, I would expect the first to perform better, with the right indexes. The right index for:
select p.*,
(select count(*)
from comments c
where c.post_id = p.id and c.is_approved = 1
) as comments_count
from posts p
is comments(post_id, is_approved).
MySQL typically implements a group by by doing a file sort. The subquery version saves a file sort on all the data. My guess is that it will be faster than the second method.
As a note: group by posts.* is not valid syntax. I assume this was intended for illustration purposes only.
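To check that the two forms are interchangeable before timing them, a small sketch like the following can help. It assumes Python's sqlite3 module as a stand-in for MySQL and made-up rows; the index name is hypothetical. Note that the join version needs the is_approved condition in the ON clause to count the same rows as the subquery version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, is_approved INTEGER);
    -- the index suggested in the answer (the name is made up)
    CREATE INDEX idx_comments_post_approved ON comments (post_id, is_approved);
    INSERT INTO posts VALUES (1, 'first'), (2, 'second'), (3, 'no comments');
    INSERT INTO comments (post_id, is_approved) VALUES (1, 1), (1, 1), (1, 0), (2, 1);
""")

# Correlated subquery: one count per row of posts.
nested = conn.execute("""
    SELECT p.id,
           (SELECT COUNT(*) FROM comments c
             WHERE c.post_id = p.id AND c.is_approved = 1) AS comments_count
    FROM posts p
    ORDER BY p.id
""").fetchall()

# Left join + group by: the filter sits in ON so posts without comments survive.
joined = conn.execute("""
    SELECT p.id, COUNT(c.id) AS comments_count
    FROM posts p
    LEFT JOIN comments c ON c.post_id = p.id AND c.is_approved = 1
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

print(nested)
print(joined)
```

Both queries return the same counts here, including 0 for the post without comments; which one is faster still has to be measured on the real data.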

This is the standard way I would do it (the use of LEFT JOIN, and SUM lets you also know which posts have no comments.)
SELECT posts.*
, SUM(IF(comments.id IS NULL, 0, 1)) AS comments_count
FROM posts
LEFT JOIN comments ON comments.post_id = posts.id
GROUP BY posts.id
;
But if I were trying for faster, this might be better.
SELECT posts.*, IFNULL(subQ.comments_count, 0) AS comments_count
FROM posts
LEFT JOIN (
SELECT post_id, COUNT(1) AS comments_count
FROM comments
GROUP BY post_id
) AS subQ
ON subQ.post_id = posts.id
;
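The pre-aggregated variant can be sketched the same way. This assumes Python's sqlite3 module in place of MySQL, the question's column names (posts.id, comments.post_id) and made-up rows; it is an illustration of the shape of the query, not a benchmark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER);
    INSERT INTO posts VALUES (1, 'first'), (2, 'no comments');
    INSERT INTO comments (post_id) VALUES (1), (1), (1);
""")

# Count once per post_id in a derived table, then attach the counts to posts.
pre_aggregated = conn.execute("""
    SELECT p.id, IFNULL(sub.comments_count, 0) AS comments_count
    FROM posts p
    LEFT JOIN (
        SELECT post_id, COUNT(*) AS comments_count
        FROM comments
        GROUP BY post_id
    ) AS sub ON sub.post_id = p.id
    ORDER BY p.id
""").fetchall()

print(pre_aggregated)
```

IFNULL turns the NULL produced by the left join into 0 for posts that have no row in the derived table.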

After a bit more research I found no time difference between the two queries:
require 'benchmark'
Benchmark.bm do |b|
  # correlated subquery in the select list
  b.report('nested') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          (select count(c.id) from comments c where c.post_id = p.id) comment_count
        from
          posts p;')
    end
  end
  # left join with group by
  b.report('joined') do
    1000.times do
      ActiveRecord::Base.connection.execute('
        select
          p.id,
          count(c.id) comment_count
        from
          posts p
        left join comments c on
          c.post_id = p.id
        group by
          p.id;')
    end
  end
end
            user     system      total        real
nested  2.120000   0.900000   3.020000 (  3.349015)
joined  2.110000   0.990000   3.100000 (  3.402986)
However, when running an explain for both queries, I noticed that more indexes are listed as possible for the first query, which makes me think it is the better option if the attributes needed in the select change.

SELECT columns different from GROUP BY columns

Having this database schema (just for illustration purpose)
[articles (id_article, title)]
[articles_tags (id_tag, id_article)]
[tags (id_tag, name)]
using MySQL it's possible to do:
SELECT a.title, COUNT(at.id_tag) tag_count FROM articles a
JOIN articles_tags at ON a.id_article = at.id_article
JOIN tags t ON t.id_tag = at.id_tag
GROUP BY a.id_article
ORDER BY tag_count DESC
resulting in a result where you have on each row article's title and article's tag count, e.g.
mysql for beginner | 8
ajax for dummies | 4
Since Oracle doesn't allow non-aggregated columns in the SELECT list unless they also appear in the GROUP BY clause, is it possible to do this in one query? When you satisfy Oracle by either wrapping the column in an aggregate function or adding it to the GROUP BY clause, you can get different results.
Thanks in advance

Yes, it's possible. Return id_article in the SELECT list instead of title, wrap that whole query in parens to make it an inline view, and then select from that view, joining it to the articles table to get the associated title.
For example:
SELECT b.title
, c.tag_count
FROM ( SELECT a.id_article
, COUNT(at.id_tag) tag_count
FROM articles a
JOIN articles_tags at ON a.id_article = at.id_article
JOIN tags t ON t.id_tag = at.id_tag
GROUP BY a.id_article
) c
JOIN articles b
ON b.id_article = c.id_article
ORDER BY c.tag_count DESC
You can also evaluate whether you really need the articles table included in the inline view. We could do a GROUP BY at.id_article instead.
I think this returns an equivalent result:
SELECT b.title
, c.tag_count
FROM ( SELECT at.id_article
, COUNT(at.id_tag) tag_count
FROM articles_tags at
JOIN tags t ON t.id_tag = at.id_tag
GROUP BY at.id_article
) c
JOIN articles b
ON b.id_article = c.id_article
ORDER BY c.tag_count DESC
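The inline-view approach can be tried out with a small sketch. It assumes Python's sqlite3 module rather than Oracle, made-up rows, and the alias atg instead of at (a hypothetical rename, only to stay clear of keyword clashes). The query itself groups by the same column it selects, so it does not rely on MySQL's relaxed GROUP BY handling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (id_article INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (id_tag INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articles_tags (id_tag INTEGER, id_article INTEGER);
    INSERT INTO articles VALUES (1, 'mysql for beginner'), (2, 'ajax for dummies');
    INSERT INTO tags VALUES (1, 'sql'), (2, 'web'), (3, 'howto');
    INSERT INTO articles_tags VALUES (1, 1), (2, 1), (3, 1), (2, 2);
""")

rows = conn.execute("""
    SELECT b.title, c.tag_count
    FROM (SELECT atg.id_article, COUNT(atg.id_tag) AS tag_count
          FROM articles_tags atg
          JOIN tags t ON t.id_tag = atg.id_tag
          GROUP BY atg.id_article   -- grouped column = selected column
         ) c
    JOIN articles b ON b.id_article = c.id_article   -- title is looked up afterwards
    ORDER BY c.tag_count DESC
""").fetchall()

print(rows)
```

The aggregation happens on id_article alone inside the view, and the title is attached only after the counts exist, so two articles with the same title would still be counted separately.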

How to write mysql subselect properly with conditions and limiting

I have three Tables:
Posts:
id, title, authorId, text
authors:
id, name, country
Comments:
id, authorId, text, postId
I want to run a mysql command which selects the first 5 posts which were written by authors, whose country is 'Ireland'. In the same call, I want to retrieve all the comments for those five posts, and also the author info.
I've tried the following:
SELECT posts.id as 'posts.id', posts.title as 'posts.title' (etc. etc. list all fields in three table)
FROM
(SELECT * FROM posts, authors WHERE authors.country = 'ireland' AND authors.id = posts.authorId LIMIT 0, 5 ) as posts
LEFT JOIN
comments ON comments.postId = posts.id,
authors
WHERE
authors.id = posts.authorId
I had to include every field with an alias ^ because there was a duplicate for id, and more fields in future may become duplicates as I'm looking for a generic solution.
My two questions are:
1) I am getting a duplicate field entry from within my subselect for id, so do I have to list out all my fields as aliases again within the subselect or is there only one field I need for a subselect
2) Is there a way to auto-alias my call? At the moment I've just aliased every field in the main select but can it do this for me so there are no duplicates?
Sorry if this isn't very clear it's a bit of a messy problem! Thanks.

You are doing an unnecessary join back to the author table in your query. You get all the fields you want in the posts subquery. I would rename this to something other than an existing table, perhaps pa to indicate posts and authors.
You say you want the first 5 posts, but have no order clause. A better form of the query is:
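SELECT pa.id as 'posts.id', pa.title as 'posts.title' (etc. etc. list all fields in three table)
FROM (SELECT posts.*, authors.name, authors.country -- authors.id is left out so the subquery has only one column named id
FROM posts join
authors
on authors.id = posts.authorId
WHERE authors.country = 'ireland'
order by posts.date -- assumes posts has a date column; none is listed in the question
LIMIT 0, 5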
) pa LEFT JOIN
comments c
ON c.postId = pa.id
Note that this returns the first five posts and their authors (as specified in the question). But one author may be responsible for all five posts.
In MySQL, you can use * and it will get rid of duplicate aliases in the from clause. I think this is dangerous. It is better to list all the columns you want.

To answer your questions:
1) You can select as many (or as few) columns as you need from a sub-query.
2) You do not need to join the authors table again, since the sub-query already returns its fields (and so you get rid of the duplicate column names).
A few additional remarks...
... about the JOIN syntax
Prefer the form
FROM t1 JOIN t2 ON (t1.fk = t2.pk)
to the obsolete, obscure
FROM t1, t2 WHERE t1.fk = t2.pk
... about the use of a LIMIT clause without an ORDER BY clause
The order in which rows are returned by a SELECT statement without an ORDER BY clause is undefined. Therefore, a LIMIT n clause without an ORDER BY clause could return any n rows in theory.
Your final query should look like this:
SELECT *
FROM (
SELECT posts.*, authors.name, authors.country -- authors.id is left out: a second column named id is not allowed in the sub-query
FROM posts
JOIN authors ON (authors.id = posts.authorId)
WHERE authors.country = 'ireland'
ORDER BY posts.id DESC -- assuming this column is monotonically increasing
LIMIT 5
) AS last_posts
LEFT JOIN comments ON (comments.postId = last_posts.id)
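As a rough check of the "limit first, then join the comments" shape, here is a sketch using Python's sqlite3 module instead of MySQL. The rows are made up, LIMIT 2 stands in for LIMIT 5 to keep the data small, and the alias author_name is introduced only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, authorId INTEGER, text TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, authorId INTEGER, text TEXT, postId INTEGER);
    INSERT INTO authors VALUES (1, 'Aoife', 'Ireland'), (2, 'Bob', 'France');
    INSERT INTO posts VALUES (1, 'one', 1, ''), (2, 'two', 2, ''),
                             (3, 'three', 1, ''), (4, 'four', 1, '');
    INSERT INTO comments VALUES (1, 2, 'nice', 4), (2, 1, 'thanks', 4), (3, 2, 'hi', 1);
""")

rows = conn.execute("""
    SELECT last_posts.id, last_posts.title, last_posts.author_name, comments.text
    FROM (
        SELECT posts.id, posts.title, authors.name AS author_name
        FROM posts
        JOIN authors ON authors.id = posts.authorId
        WHERE authors.country = 'Ireland'
        ORDER BY posts.id DESC      -- explicit order, so LIMIT is deterministic
        LIMIT 2
    ) AS last_posts
    LEFT JOIN comments ON comments.postId = last_posts.id
    ORDER BY last_posts.id DESC, comments.id
""").fetchall()

print(rows)
```

The limit applies to posts only: post 4 appears twice (once per comment), post 3 appears once with a NULL comment, and post 1 is cut off by the limit even though it has a comment.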

Do I need this field in GROUP BY to not show duplicate rows

I am creating the query below, which retrieves data depending on the term(s) the user entered in a search bar. What I am trying to do is not display duplicate data: if there are 2 rows where all the fields are exactly the same, then it is a duplicate row, so it should be shown only once. It seems to do this already, as I see no duplicate rows, but all I did was a GROUP BY with all the SELECT fields except for the Answer field, as it doesn't let me have a group concat in the GROUP BY clause.
My question is: do I need that field in the GROUP BY clause to not show duplicate rows, or is it not really needed?
SELECT
q.QuestionContent,
o.OptionType,
q.NoofAnswers,
GROUP_CONCAT(DISTINCT Answer ORDER BY Answer SEPARATOR ',') AS Answer,
r.ReplyType,
q.QuestionMarks
FROM Question q
LEFT JOIN Answer an
ON q.QuestionId = an.QuestionId
LEFT JOIN Reply r
ON q.ReplyId = r.ReplyId
LEFT JOIN Option_Table o
ON q.OptionId = o.OptionId
WHERE ".implode(" AND ", array_fill(0, $numTerms, "q.QuestionContent LIKE ?"))."
GROUP BY q.QuestionContent,
o.OptionType,
q.NoofAnswers,
r.ReplyType,
q.QuestionMarks
ORDER BY ".implode(", ", array_fill(0, $numTerms, "IF(q.QuestionContent LIKE ?, 1, 0) DESC"))."

No, you do not need the group_concat() in the group by. The group by will ensure that any particular combination of values will appear once for the columns in the group by. Because these are guaranteed to be distinct, you don't have to worry about any other columns.
The group_concat() is a calculated column, based on the summaries. You are not permitted to have such columns in a group by statement. If you want to aggregate on them again, then you need to use a subquery.
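A reduced version of the query shows the effect. This sketch assumes Python's sqlite3 module in place of MySQL, only two of the grouped columns, and made-up rows in which two questions are identical in every grouped column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Question (QuestionId INTEGER PRIMARY KEY,
                           QuestionContent TEXT, QuestionMarks INTEGER);
    CREATE TABLE Answer (QuestionId INTEGER, Answer TEXT);
    -- questions 1 and 2 are identical in every grouped column
    INSERT INTO Question VALUES (1, 'What is 2+2?', 5), (2, 'What is 2+2?', 5),
                                (3, 'What is 3+3?', 5);
    INSERT INTO Answer VALUES (1, 'A'), (1, 'C'), (2, 'A'), (3, 'B');
""")

rows = conn.execute("""
    SELECT q.QuestionContent, q.QuestionMarks,
           GROUP_CONCAT(DISTINCT an.Answer) AS Answer
    FROM Question q
    LEFT JOIN Answer an ON q.QuestionId = an.QuestionId
    GROUP BY q.QuestionContent, q.QuestionMarks
    ORDER BY q.QuestionContent
""").fetchall()

# The concatenation order is not guaranteed here, so sort the pieces.
answers = {content: sorted(ans.split(',')) for content, _marks, ans in rows}
print(len(rows), answers)
```

Each combination of grouped values comes back exactly once, and the concatenated Answer column is computed per group without being listed in the GROUP BY.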

I would think DISTINCT might work better/be simpler:
SELECT
DISTINCT q.QuestionContent,

sql with GROUP_CONCAT

I'm running this SQL query
$sql = "select images.image, images.comment as feedDescription,
customers.fullName, CONCAT('[', GROUP_CONCAT(DISTINCT likes.uid),']') as likes,
CONCAT('[', GROUP_CONCAT(DISTINCT CONCAT('{\"userid\":\"', comments.fid, '\", \"comment\":\"', comments.comment, '\"}') separator ','),']') as comments
FROM images
LEFT JOIN customers on images.client_id = customers.client_id
LEFT JOIN likes on images.image = likes.image
LEFT JOIN comments on images.image = comments.image
WHERE images.fid=:userID
ORDER BY images.image LIMIT $offset,$limit";
The only problem is that I am getting only the first row ...
I have an images table, a customers table (to get the name of the customer from the id I got in the images table), a likes table (people who "liked" the image) and a comments table (people who wrote comments on the image).

You are using an aggregation function on a query, so MySQL is automatically returning only one row -- the aggregation of all the data.
In other databases, this would produce an error, because you have a mixture of aggregated and non-aggregated columns. This is a (mis)feature of MySQL called "hidden columns".
Add a group by to your query to fix the problem:
group by images.image, images.comment, customers.fullName
Be sure to add this after the WHERE clause and before the ORDER BY.
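The one-row behaviour is easy to reproduce. This sketch assumes Python's sqlite3 module, which happens to accept the same mix of aggregated and non-aggregated columns that MySQL accepts when ONLY_FULL_GROUP_BY is off; the tables are cut down to two columns each and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE images (image TEXT, fid INTEGER);
    CREATE TABLE likes (image TEXT, uid INTEGER);
    INSERT INTO images VALUES ('a.jpg', 7), ('b.jpg', 7);
    INSERT INTO likes VALUES ('a.jpg', 1), ('a.jpg', 2), ('b.jpg', 3);
""")

base = """
    SELECT images.image, GROUP_CONCAT(DISTINCT likes.uid) AS likes
    FROM images
    LEFT JOIN likes ON images.image = likes.image
    WHERE images.fid = :userID
"""

# No GROUP BY: the aggregate collapses the whole result into a single row.
without_group_by = conn.execute(base, {"userID": 7}).fetchall()

# GROUP BY goes after WHERE and before ORDER BY: one row per image.
with_group_by = conn.execute(
    base + " GROUP BY images.image ORDER BY images.image", {"userID": 7}
).fetchall()

print(len(without_group_by), len(with_group_by))
```

Without the GROUP BY there is a single row for both images; with it, each image gets its own row and its own list of likes.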
