Using GROUP_CONCAT on subquery in MySQL

I have a MySQL query in which I want to include a list of IDs from another table. On the website, people can add certain items, and other people can then add those items to their favourites. I basically want to get the list of IDs of people who have favourited that item (this is a bit simplified, but this is what it boils down to).
Basically, I do something like this:
SELECT *,
GROUP_CONCAT((SELECT userid FROM favourites WHERE itemid = items.id) SEPARATOR ',') AS idlist
FROM items
WHERE id = $someid
This way, I would be able to show who favourited an item by splitting idlist into an array in PHP later in my code. However, I am getting the following MySQL error:
1242 - Subquery returns more than 1 row
I thought that was kind of the point of using GROUP_CONCAT instead of, for example, CONCAT? Am I going about this the wrong way?
Ok, thanks for the answers so far, that seems to work. However, there is a catch. An item is also considered a favourite of the user who added it. So I would need an additional check for creator = userid. Can someone help me come up with a smart (and hopefully efficient) way to do this?
Thank you!
Edit: I just tried to do this:
SELECT [...] LEFT JOIN favourites ON (userid = itemid OR creator = userid)
And idlist is empty. Note that if I use INNER JOIN instead of LEFT JOIN I get an empty result. Even though I am sure there are rows that meet the ON requirement.

OP almost got it right. GROUP_CONCAT should be wrapping the columns in the subquery and not the complete subquery (I'm dismissing the separator because comma is the default):
SELECT i.*,
(SELECT GROUP_CONCAT(userid) FROM favourites f WHERE f.itemid = i.id) AS idlist
FROM items i
WHERE i.id = $someid
This will yield the desired result and also means that the accepted answer is partially wrong, because you can access outer scope variables in a subquery.
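To see the correlated subquery in action without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite's group_concat stands in for MySQL's GROUP_CONCAT (both default to a comma separator); the table contents and the favourite_ids helper are made up for illustration, and the item id is passed as a bound parameter instead of interpolating $someid.

```python
import sqlite3

# Hypothetical miniature of the items/favourites tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE favourites (userid INTEGER, itemid INTEGER);
    INSERT INTO items VALUES (1, 'lamp'), (2, 'chair');
    INSERT INTO favourites VALUES (10, 1), (11, 1), (12, 2);
""")


def favourite_ids(item_id):
    """Return the set of user ids (as strings) that favourited the item."""
    row = conn.execute(
        """
        SELECT i.id,
               (SELECT GROUP_CONCAT(f.userid)
                  FROM favourites f
                 WHERE f.itemid = i.id) AS idlist
          FROM items i
         WHERE i.id = ?
        """,
        (item_id,),
    ).fetchone()
    # idlist is NULL (None) when nobody favourited the item.
    return set(row[1].split(",")) if row and row[1] else set()


print(favourite_ids(1))
```

The subquery references i.id from the outer query, which is exactly the outer-scope access discussed above.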

You can't access variables in the outer scope in such queries (can't use items.id there). You should rather try something like
SELECT
items.name,
items.color,
GROUP_CONCAT(favourites.userid) as idlist
FROM
items
INNER JOIN favourites ON items.id = favourites.itemid
WHERE
items.id = $someid
GROUP BY
items.name,
items.color;
Expand the list of fields as needed (name, color...).

I think you may have the "userid = itemid" wrong, shouldn't it be like this:
SELECT ITEMS.id,GROUP_CONCAT(FAVOURITES.UserId) AS IdList
FROM FAVOURITES
INNER JOIN ITEMS ON (ITEMS.Id = FAVOURITES.ItemId OR FAVOURITES.UserId = ITEMS.Creator)
WHERE ITEMS.Id = $someid
GROUP BY ITEMS.ID
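For the "creator also counts as a favourite" requirement, another option is to UNION the creator into the set of favouriting users before concatenating. This is only a sketch under assumptions: it runs on SQLite via Python's sqlite3 module as a stand-in for MySQL, the sample rows are invented, and liker_ids is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, creator INTEGER);
    CREATE TABLE favourites (userid INTEGER, itemid INTEGER);
    INSERT INTO items VALUES (1, 7), (2, 8);
    INSERT INTO favourites VALUES (10, 1), (11, 1), (7, 2);
""")

# UNION (not UNION ALL) removes the duplicate when a creator has also
# favourited their own item.
QUERY = """
    SELECT i.id,
           (SELECT GROUP_CONCAT(u.userid)
              FROM (SELECT userid, itemid FROM favourites
                    UNION
                    SELECT creator, id FROM items) AS u
             WHERE u.itemid = i.id) AS idlist
      FROM items i
     WHERE i.id = ?
"""


def liker_ids(item_id):
    """Return user ids that favourited the item, plus its creator."""
    row = conn.execute(QUERY, (item_id,)).fetchone()
    return {int(x) for x in row[1].split(",")} if row and row[1] else set()


print(liker_ids(1))
```

Unlike an OR in the join condition, this only ever pairs a creator with their own item.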

The purpose of GROUP_CONCAT is correct but the subquery is unnecessary and causing the problem. Try this instead:
SELECT ITEMS.id,GROUP_CONCAT(FAVOURITES.UserId)
FROM FAVOURITES INNER JOIN ITEMS ON ITEMS.Id = FAVOURITES.ItemId
WHERE ITEMS.Id = $someid
GROUP BY ITEMS.ID

Yes, soulmerge's solution is ok. But I needed a query where I had to collect data from more child tables, for example:
main table: sessions (presentation sessions) (uid, name, ..)
1st child table: events with key session_id (uid, session_uid, date, time_start, time_end)
2nd child table: accessories_needed (laptop, projector, microphones, etc.) with key session_id (uid, session_uid, accessory_name)
3rd child table: session_presenters (presenter persons) with key session_id (uid, session_uid, presenter_name, address...)
Every session has several rows in the child tables (several time schedules, several accessories).
And I needed to collect them into one result per session, to display in one row (some of the columns):
session_id | session_name | date | time_start | time_end | accessories | presenters
My solution (after many hours of experiments):
-- ORDER BY goes inside GROUP_CONCAT so it controls the order of the concatenated values
SELECT sessions.uid, sessions.name
,(SELECT GROUP_CONCAT( `events`.date ORDER BY `events`.date SEPARATOR '</li><li>')
FROM `events`
WHERE `events`.session_id = sessions.uid) AS date
,(SELECT GROUP_CONCAT( `events`.time_start ORDER BY `events`.date SEPARATOR '</li><li>')
FROM `events`
WHERE `events`.session_id = sessions.uid) AS time_start
,(SELECT GROUP_CONCAT( `events`.time_end ORDER BY `events`.date SEPARATOR '</li><li>')
FROM `events`
WHERE `events`.session_id = sessions.uid) AS time_end
,(SELECT GROUP_CONCAT( accessories.name ORDER BY accessories.name SEPARATOR '</li><li>')
FROM accessories
WHERE accessories.session_id = sessions.uid) AS accessories
,(SELECT GROUP_CONCAT( presenters.name ORDER BY presenters.name SEPARATOR '</li><li>')
FROM presenters
WHERE presenters.session_id = sessions.uid) AS presenters
FROM sessions
So no JOIN or GROUP BY needed.
Another useful trick for displaying the data (when echoing it):
you can wrap the date, time_start, time_end, etc. values in "<UL><LI> ... </LI></UL>", so that the "</li><li>" used as the separator in the query splits the results into list items.
I hope this helps someone. Cheers!
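A small sketch of why one subquery per child table helps here: joining several child tables at once multiplies the rows, so each value gets concatenated repeatedly. It uses Python's sqlite3 module as a stand-in for MySQL, with invented sample rows and only two of the child tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sessions (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE events (session_id INTEGER, date TEXT);
    CREATE TABLE accessories (session_id INTEGER, name TEXT);
    INSERT INTO sessions VALUES (1, 'Keynote');
    INSERT INTO events VALUES (1, '2024-05-01'), (1, '2024-05-02');
    INSERT INTO accessories VALUES (1, 'laptop'), (1, 'projector');
""")

# Joining both child tables yields 2 events x 2 accessories = 4 rows,
# so every value is concatenated more than once.
joined = conn.execute("""
    SELECT GROUP_CONCAT(e.date), GROUP_CONCAT(a.name)
      FROM sessions s
      JOIN events e ON e.session_id = s.uid
      JOIN accessories a ON a.session_id = s.uid
     GROUP BY s.uid
""").fetchone()

# One correlated subquery per child table aggregates each table on its own.
separate = conn.execute("""
    SELECT (SELECT GROUP_CONCAT(e.date)
              FROM events e WHERE e.session_id = s.uid),
           (SELECT GROUP_CONCAT(a.name)
              FROM accessories a WHERE a.session_id = s.uid)
      FROM sessions s
""").fetchone()

joined_dates = joined[0].split(",")
separate_dates = separate[0].split(",")
print(len(joined_dates), len(separate_dates))
```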

Related

Mysql: join across five tables

I have a mysql database with this setup (omitting fields not relevant to this question)
users
id #primary key
user_group_teachers
id #primary key
teacher_id #foreign key to users.id
user_group_id #foreign key to user_groups.id
user_groups
id #primary key
user_group_members
id #primary key
pupil_id #foreign key to pupils.id
user_group_id #foreign key to user_groups.id
pupils
id #primary key
I have a collection of user ids in an array, called "user_ids".
For each of those user ids, i want to collect the pupil ids associated with that user via the
user -> user_group_teachers -> user_groups -> user_group_members -> pupils
association. Ie, some kind of join across the tables.
So, i'd like to get some kind of result where the rows look like
[1, [6,7,8,9]]
where 1 is the teacher id, and [6,7,8,9] are the ids of pupils. I'd only like each pupil id to appear once in the second list.
Can anyone tell me how to do this in as small a number of queries as possible (or, more broadly, as efficiently as possible). I will probably usually have between 1000 and 10,000 ids in user_ids.
I'm doing this in a ruby script, so can store the results as variables (arrays or hashes) in between queries, if that makes things simpler.
Thanks! max
EDIT for Lyhan
Lyhan - thanks but your solution doesn't seem to work. For example in the first row of the results, using your method, i have
| user_id | group_concat(pupils.id separator ",")
| 1 | 2292
But, if i get the associated pupil ids in a slower, step by step way, then i get different results:
select group_concat(user_group_teachers.user_group_id separator ",")
from user_group_teachers
where user_group_teachers.teacher_id = 1
group by user_group_teachers.teacher_id;
I get
| group_concat(user_group_teachers.user_group_id separator ",")
| 12,1033,2117,2280,2281
Plugging these values (user_group ids) into another query:
select group_concat(user_group_members.pupil_id separator ",")
from user_group_members
where user_group_members.user_group_id in (12,1033,2117,2280,2281)
group by user_group_members.user_group_id;
I get
| group_concat(user_group_members.pupil_id separator ",")
| 47106,47107
Thanks for the group_concat method btw, that's handy :)
I made a couple comments above that are important to the solution for this, but I think you could start with these two queries to see if it gets you far enough along to get what you need.
To get ordered lists for a teacher for pupils across all groups, you could do this:
select distinct t.teacher_id, m.pupil_id
from user_groups g
inner join user_group_teachers t
on t.user_group_id = g.id
inner join user_group_members m
on m.user_group_id = g.id
order by t.teacher_id, m.pupil_id
To get ordered lists for a teacher for pupils with the relationship to group intact, you could do this:
select g.id, t.teacher_id, m.pupil_id
from user_groups g
inner join user_group_teachers t
on t.user_group_id = g.id
inner join user_group_members m
on m.user_group_id = g.id
order by g.id, t.teacher_id, m.pupil_id
You would have to walk these result sets and transform them into the nested arrays, but it is the data you wanted.
Update: If the data set is too large or you do not want to walk a single result set, then you could do this to emulate the results of the first query above and build your sub-arrays based on query result sets:
/* Use this query to drive the batch */
select distinct t.teacher_id
from user_group_teachers t
order by t.teacher_id
/* Inside a loop based on first query result, pull out the array of pupils for a teacher */
select distinct m.pupil_id
from user_group_members m
inner join user_groups g
on g.id = m.user_group_id
inner join user_group_teachers t
on t.user_group_id = g.id
where t.teacher_id = /* parameter */
order by m.pupil_id
This is what i came up with:
select pupil_group_teachers.teacher_id, group_concat(pupil_group_members.pupil_id separator ',')
from pupil_group_teachers join pupil_groups on pupil_group_teachers.pupil_group_id = pupil_groups.id
join pupil_group_members on pupil_group_members.pupil_group_id = pupil_groups.id
group by pupil_group_teachers.teacher_id;
it seems to work, and is really fast. Lyhan (who has since deleted his answer) and David Fleeman both helped me figure it out. Cheers guys.
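One thing to watch with the query above: a pupil who is in two of a teacher's groups shows up twice in the list. Adding DISTINCT inside the aggregate addresses that. Below is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL, with invented rows. In MySQL itself, also note that GROUP_CONCAT output is truncated at group_concat_max_len, which defaults to 1024, so long id lists may need that setting raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pupil_group_teachers (teacher_id INTEGER, pupil_group_id INTEGER);
    CREATE TABLE pupil_group_members (pupil_id INTEGER, pupil_group_id INTEGER);
    -- Teacher 1 teaches groups 12 and 13; pupil 6 belongs to both groups.
    INSERT INTO pupil_group_teachers VALUES (1, 12), (1, 13), (2, 13);
    INSERT INTO pupil_group_members VALUES (6, 12), (7, 12), (6, 13), (8, 13);
""")

# DISTINCT keeps each pupil id once per teacher.
rows = conn.execute("""
    SELECT t.teacher_id, GROUP_CONCAT(DISTINCT m.pupil_id)
      FROM pupil_group_teachers t
      JOIN pupil_group_members m ON m.pupil_group_id = t.pupil_group_id
     GROUP BY t.teacher_id
""").fetchall()

# Turn "6,7,8" strings into sorted integer lists keyed by teacher id.
pupils_by_teacher = {
    teacher_id: sorted(int(p) for p in id_list.split(","))
    for teacher_id, id_list in rows
}
print(pupils_by_teacher)
```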

Joining Results From Another Table

I'm dealing with a large query that maps data from one table into a CSV file, so it essentially looks like a basic select query--
SELECT * FROM item_table
--except that * is actually a hundred lines of CASE, IF, IFNULL, and other logic.
I've been told to add a "similar items" line to the select statement, which should be a string of comma-separated item numbers. The similar items are found in a category_table, which can join to item_table on two data points, column_a and column_b, with category_table.category_id having the data that identifies the similar items.
Additionally, I've been told NOT to use a subquery.
So I need to join category_table and group_concat item numbers from that table having the same category_id value (but not having the item number of whatever the current record would be).
If I can only do it with a subquery regardless of the instructions, I will accept that, but I want to do it with a join and group_concat as instructed if possible--I just can't figure it out. How can I do this?
You can make use of a MySQL "feature" called hidden columns.
I am going to assume you have an item id in the item table that uniquely identifies each row. And, if I have your logic correct, the following query does what you want:
select i.*, group_concat(c.category_id)
from item_table i left outer join
category_table c
on i.column_a = c.column_a and
i.column_b = c.column_b and
i.item_id <> c.category_id
group by i.item_id
I think this is what you're looking for, although I wasn't sure what uniquely identified your item_table so I used column_a and column_b (those may be incorrect):
SELECT
...,
GROUP_CONCAT(ct.category_id separator ',') CategoryIDs
FROM item_table i
JOIN category_table ct ON i.column_a = ct.column_a AND
i.column_b = ct.column_b
GROUP BY i.column_a, i.column_b
I've used a regular INNER JOIN, but if the category_table might not have any related records, you may need to use a LEFT JOIN instead to get your desired results.
Maybe something like this?
SELECT i.*, GROUP_CONCAT(c.category_id) AS similar_items
FROM item_table i
INNER JOIN category_table c ON (i.column_a = c.column_a AND
i.column_b = c.column_b)
GROUP BY i.column_a, i.column_b
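If the goal is a list of the other item numbers sharing a category (rather than a list of category ids), one join-only shape is to go item -> category -> category rows with the same category_id -> other items. This is a sketch under heavy assumptions: the item_number column, the sample rows and the reading of the schema are all guesses, and it runs on SQLite via Python's sqlite3 module rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_table (item_number INTEGER PRIMARY KEY,
                             column_a TEXT, column_b TEXT);
    CREATE TABLE category_table (column_a TEXT, column_b TEXT,
                                 category_id INTEGER);
    INSERT INTO item_table VALUES
        (1, 'a', 'x'), (2, 'a', 'y'), (3, 'b', 'x'), (4, 'c', 'z');
    INSERT INTO category_table VALUES
        ('a', 'x', 100), ('a', 'y', 100), ('b', 'x', 100), ('c', 'z', 200);
""")

# No subquery: LEFT JOINs only, so items without similar items are kept.
rows = conn.execute("""
    SELECT i.item_number,
           GROUP_CONCAT(DISTINCT o.item_number) AS similar_items
      FROM item_table i
      LEFT JOIN category_table c
             ON c.column_a = i.column_a AND c.column_b = i.column_b
      LEFT JOIN category_table c2
             ON c2.category_id = c.category_id
      LEFT JOIN item_table o
             ON o.column_a = c2.column_a AND o.column_b = c2.column_b
            AND o.item_number <> i.item_number
     GROUP BY i.item_number
""").fetchall()

# similar_items is NULL (None) when no other item shares the category.
similar = {
    number: {int(x) for x in ids.split(",")} if ids else set()
    for number, ids in rows
}
print(similar)
```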

Attempting to Join 3 tables in MySQL

I have three tables that are joined. I almost have the solution but there seems to be one small problem going on here. Here is statement:
SELECT items.item,
COUNT(ratings.item_id) AS total,
COUNT(comments.item_id) AS comments,
AVG(ratings.rating) AS rate
FROM `items`
LEFT JOIN ratings ON (ratings.item_id = items.items_id)
LEFT JOIN comments ON (comments.item_id = items.items_id)
WHERE items.cat_id = '{$cat_id}' AND items.spam < 5
GROUP BY items_id ORDER BY TRIM(LEADING 'The ' FROM items.item) ASC;
I have a table called items, each item has an id called items_id (notice it's plural). I have a table of individual user comments for each item, and one for ratings for each item. (The last two have a corresponding column called 'item_id').
I simply want to count comments and ratings total (per item) separately. With the way my SQL statement is above, they are a total.
note, total is the total of ratings. It's a bad naming scheme I need to fix!
UPDATE: 'total' seems to count ok, but when I add a comment to 'comments' table, the COUNT function affects both 'comments' and 'total' and seems to equal the combined output.
The problem is that you're counting rows from all three tables joined together. Try:
SELECT i.item,
r.ratetotal AS total,
c.commtotal AS comments,
r.rateav AS rate
FROM items AS i
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS ratetotal,
AVG(rating) AS rateav
FROM ratings GROUP BY item_id) AS r
ON r.item_id = i.items_id
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS commtotal
FROM comments GROUP BY item_id) AS c
ON c.item_id = i.items_id
WHERE i.cat_id = '{$cat_id}' AND i.spam < 5
ORDER BY TRIM(LEADING 'The ' FROM i.item) ASC;
In this query, we make the subqueries do the counting properly, then send that value to the main query and filter the results.
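To make the difference concrete, here is a small sketch comparing the two shapes on one item with 2 ratings and 3 comments. It uses Python's sqlite3 module as a stand-in for MySQL; the rows are invented, and the COALESCE calls (so items without ratings or comments show 0) are an addition not in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (items_id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE ratings (item_id INTEGER, rating INTEGER);
    CREATE TABLE comments (item_id INTEGER, body TEXT);
    INSERT INTO items VALUES (1, 'The Lamp');
    INSERT INTO ratings VALUES (1, 4), (1, 2);
    INSERT INTO comments VALUES (1, 'a'), (1, 'b'), (1, 'c');
""")

# Joining both tables directly gives 2 x 3 = 6 rows, so both counts are 6.
naive = conn.execute("""
    SELECT COUNT(r.item_id), COUNT(c.item_id)
      FROM items i
      LEFT JOIN ratings r ON r.item_id = i.items_id
      LEFT JOIN comments c ON c.item_id = i.items_id
     GROUP BY i.items_id
""").fetchone()

# Aggregating each table first leaves at most one row per item to join.
fixed = conn.execute("""
    SELECT COALESCE(r.ratetotal, 0), COALESCE(c.commtotal, 0), r.rateav
      FROM items i
      LEFT JOIN (SELECT item_id, COUNT(*) AS ratetotal, AVG(rating) AS rateav
                   FROM ratings GROUP BY item_id) r
             ON r.item_id = i.items_id
      LEFT JOIN (SELECT item_id, COUNT(*) AS commtotal
                   FROM comments GROUP BY item_id) c
             ON c.item_id = i.items_id
""").fetchone()

print(naive, fixed)
```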
I'm guessing this is a cardinality issue. Try COUNT(distinct comments.item_id)

SQL query finding best categories match

I have categories and multiple categorization for my Items. How to find, for specific Item, other Items that have same categories, ordered by most categories matching (aka best match)?
My table structure is roughly:
Item Table
ID
Name
...
Category Table
ID
Name
...
Categorization Table
ID
Item_ID
Category_ID
...
To find all Items having similar categories, for example, I use
SELECT `items`.*
FROM `items`
INNER JOIN `categorizations` c1
ON c1.`item_id` = `items`.`id`
INNER JOIN `categorizations` c2
ON c2.`item_id` = <Item_ID>
WHERE c1.`category_id` = c2.`category_id`
This should produce a table of counts of category matches between each pair of items that share at least one category.
select i1.item_id,i2.item_id,count(1)
from items i1
join categorizations c1 on c1.item_id=i1.item_id
join categorizations c2 on c2.category_id=c1.category_id
join items i2 on c2.item_id=i2.item_id
where i1.item_id <> i2.item_id
group by i1.item_id,i2.item_id
order by count(1)
I suspect that it may be a bit slow, though. I don't have an instance of MySQL at the moment to try it out.
Something like:
select item_id, count(id)
from item_category ic
where exists(
select category_id
from item_category ic2
where ic2.item_id = #item_id
and ic2.category_id = ic.category_id )
and item_id <> #item_id
group by item_id
order by count(item_id) desc
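For a single reference item, a self-join on the categorization table is enough to rank the other items by number of shared categories. A minimal sketch, run on SQLite through Python's sqlite3 module instead of MySQL; the sample rows and the best_matches helper are hypothetical, and it assumes each (item_id, category_id) pair appears only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categorizations (item_id INTEGER, category_id INTEGER);
    INSERT INTO categorizations VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2),
        (3, 3),
        (4, 9);
""")


def best_matches(item_id):
    """Return (other_item_id, shared_category_count), best match first."""
    return conn.execute(
        """
        SELECT c1.item_id, COUNT(*) AS shared
          FROM categorizations c1
          JOIN categorizations c2 ON c2.category_id = c1.category_id
         WHERE c2.item_id = ?
           AND c1.item_id <> ?
         GROUP BY c1.item_id
         ORDER BY shared DESC, c1.item_id
        """,
        (item_id, item_id),
    ).fetchall()


print(best_matches(1))
```

Items that share no category with the reference item (item 4 here) simply do not appear.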
An alternative method which I have just implemented to solve this problem is using bitwise operators to speed things up. In MySQL this method only works if you have 64 or fewer categories, as the bit functions operate on 64-bit integers.
1) Assign each category a unique integer value which is a power of 2.
2) For each item sum the category values that the item is in to create a 64 bit int representing all of the categories that the item is in.
3) To compare an item to another do something like:
SELECT id, BIT_COUNT(item1categories & item2categories) AS numMatchedCats FROM tablename HAVING numMatchedCats > 0 ORDER BY numMatchedCats DESC
The BIT_COUNT() function might be MySQL specific so an alternative may well be required for any other DB.
MySQL bit functions used are explained here:
http://dev.mysql.com/doc/refman/5.0/en/bit-functions.html
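The three steps can be tried out in plain Python before touching the database. This is only an illustration of the idea: category_mask and matched_categories are hypothetical helper names, and categories are identified by an index from 0 to 63 so that each one maps to a distinct power of 2.

```python
def category_mask(category_indexes):
    """Combine category indexes (0..63) into one 64-bit bitmask."""
    mask = 0
    for index in category_indexes:
        if not 0 <= index < 64:
            raise ValueError("only 64 categories fit into a 64-bit mask")
        mask |= 1 << index  # each category is a distinct power of 2
    return mask


def matched_categories(mask_a, mask_b):
    """Count shared categories, like BIT_COUNT(a & b) in MySQL."""
    return bin(mask_a & mask_b).count("1")


item1 = category_mask([0, 2, 5])
item2 = category_mask([2, 5, 7])
print(matched_categories(item1, item2))
```

Using OR rather than a plain sum to build the mask means a category listed twice for an item does not corrupt the value.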

How can I make these two queries into one?

I have two tables, one for downloads and one for uploads. They are almost identical, apart from a few columns that differ. I want to generate a list of stats for each date for each item in the table.
I use these two queries but have to merge the data in PHP after running them. I would like to instead run them in a single query, where it would return the columns from both queries in each row grouped by the date. Sometimes there isn't any download data, only upload data, and in all my previous tries it skipped the row if it couldn't find log data in both tables.
How do I merge these two queries into one, where it would display data even if it's just available in one of the tables?
SELECT DATE(upload_date_added) as upload_date, SUM(upload_size) as upload_traffic, SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
ORDER BY upload_date DESC
SELECT DATE(download_date_added) as download_date, SUM(download_size) as download_traffic, SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
ORDER BY download_date DESC
I want to get result rows like this:
date, upload_traffic, upload_files, download_traffic, download_files
All help appreciated!
Your two queries can be executed and then combined with the UNION clause along with an extra field to identify Uploads and Downloads on separate lines:
SELECT
'Uploads' TransmissionType,
DATE(upload_date_added) as TransmissionDate,
SUM(upload_size) as TransmissionTraffic,
SUM(upload_files) as TransmittedFileCount
FROM
packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY TransmissionDate
UNION
SELECT
'Downloads',
DATE(download_date_added),
SUM(download_size),
SUM(download_files)
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY DATE(download_date_added)
-- a single ORDER BY at the end sorts the combined result
ORDER BY TransmissionDate DESC;
Give it a Try !!!
What you're asking can only work for rows that have the same add date for upload and download. In this case I think this SQL should work:
SELECT
DATE(u.upload_date_added) as date,
SUM(u.upload_size) as upload_traffic,
SUM(u.upload_files) as upload_files,
SUM(d.download_size) as download_traffic,
SUM(d.download_files) as download_files
FROM
packages_uploads u, packages_downloads d
WHERE u.upload_date_added = d.download_date_added
AND u.upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY date
ORDER BY date DESC
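To get one row per date even when only one of the tables has data for that date, one possible approach is to UNION ALL both tables into a common shape (zero-filling the columns that do not apply) and aggregate the result by date. A minimal sketch, run on SQLite via Python's sqlite3 module as a stand-in for MySQL; the sample rows are invented and the column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE packages_uploads (upload_date_added TEXT,
                                   upload_size INTEGER, upload_files INTEGER);
    CREATE TABLE packages_downloads (download_date_added TEXT,
                                     download_size INTEGER, download_files INTEGER);
    INSERT INTO packages_uploads VALUES
        ('2011-11-01 10:00:00', 100, 1),
        ('2011-11-01 12:00:00', 50, 2),
        ('2011-11-02 09:00:00', 10, 1);
    INSERT INTO packages_downloads VALUES
        ('2011-11-01 11:00:00', 500, 5),
        ('2011-11-03 08:00:00', 70, 7);
""")

# Each branch fills the other side's columns with 0, so the outer SUMs
# work whether a date has uploads, downloads, or both.
stats = conn.execute("""
    SELECT d AS date,
           SUM(upload_traffic), SUM(upload_files),
           SUM(download_traffic), SUM(download_files)
      FROM (SELECT DATE(upload_date_added) AS d,
                   upload_size AS upload_traffic,
                   upload_files AS upload_files,
                   0 AS download_traffic,
                   0 AS download_files
              FROM packages_uploads
            UNION ALL
            SELECT DATE(download_date_added), 0, 0,
                   download_size, download_files
              FROM packages_downloads) AS t
     WHERE d BETWEEN '2011-10-26' AND '2011-11-16'
     GROUP BY d
     ORDER BY d DESC
""").fetchall()

for row in stats:
    print(row)
```

UNION ALL is used so that identical rows are not silently merged before summing.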
Without knowing the schema it is hard to give an exact answer, so please treat the following as a concept, not a direct answer.
You could try a left join. I'm not sure whether the table package exists, but the following may be food for thought:
SELECT
p.id,
up.date as upload_date,
dwn.date as download_date
FROM
package p
LEFT JOIN package_uploads up ON
( up.package_id = p.id AND up.upload_date = 'etc' )
LEFT JOIN package_downloads dwn ON
( dwn.package_id = p.id AND dwn.download_date = 'etc' )
The above will select all the packages and attempt to join and where the value does not join it will return null.
There are a number of ways you can do this. You can join using a primary key and a foreign key. If you do not have a relationship between the tables,
you can use:
LEFT JOIN / LEFT OUTER JOIN
Returns all records from the left table and the matched
records from the right table. The result is NULL from the
right side when there is no match.
RIGHT JOIN / RIGHT OUTER JOIN
Returns all records from the right table and the matched
records from the left table. The result is NULL from the left
side when there is no match.
FULL OUTER JOIN
Returns all records from both tables, matched where possible. MySQL does not support it directly.
UNION
Is used to combine the result-set of two or more SELECT statements.
Each SELECT statement within UNION must have the same number of
columns. The columns must also have similar data types. The columns in
each SELECT statement must also be in the same order.
INNER JOIN
Select records that have matching values in both tables. -this is good for your situation.
INTERSECT
Not supported by MySQL before version 8.0.31.
NATURAL JOIN
Joins on all columns that have the same name in both tables.
Since you don't need to update these rows, you can create a view over the combined tables and then use a shorter query in your PHP. Note that such a view cannot be updated. You did not mention a relationship between the tables, so I have to go with UNION.
Like this,
CREATE VIEW checkStatus
AS
SELECT
DATE(upload_date_added) as upload_date,
SUM(upload_size) as upload_traffic,
SUM(upload_files) as upload_files
FROM packages_uploads
WHERE upload_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY upload_date
UNION
SELECT
DATE(download_date_added) as download_date,
SUM(download_size) as download_traffic,
SUM(download_files) as download_files
FROM packages_downloads
WHERE download_date_added BETWEEN '2011-10-26' AND '2011-11-16'
GROUP BY download_date
-- one ORDER BY at the end, using the column name from the first SELECT
ORDER BY upload_date DESC
Then anywhere you want to select you just need one line:
SELECT * FROM checkStatus