Query: count multiple aggregates per item - mysql

Often you need to show a list of database items along with some aggregate numbers about each item. For instance, when you type the title text on Stack Overflow, the Related Questions list appears. The list shows the titles of related entries and a single aggregate for each title: the number of responses.
I have a similar problem, but I need multiple aggregates. I'd like to display a list of items in any of 3 formats depending on user options:
My item's name (15 total, 13 owned by me)
My item's name (15 total)
My item's name (13 owned by me)
My database is:
items: itemId, itemName, ownerId
categories: catId, catName
map: mapId, itemId, catId
The query below gets: category name, count of item ids per category
SELECT
categories.catName,
COUNT(map.itemId) AS item_count
FROM categories
LEFT JOIN map
ON categories.catId = map.catId
GROUP BY categories.catName
This one gets: category name, count of item ids per category for this owner_id only
SELECT categories.catName,
COUNT(map.itemId) AS owner_item_count
FROM categories
LEFT JOIN map
ON categories.catId = map.catId
LEFT JOIN items
ON items.itemId = map.itemId
WHERE items.ownerId = #ownerId
GROUP BY categories.catId
But how do I get them at the same time in a single query? I.e.: category name, count of item ids per category, count of item ids per category for this owner_id only
Bonus. How can I optionally retrieve only the rows where the catId count != 0 for any of these? In trying "WHERE item_count <> 0" I get:
MySQL said:
#1054 - Unknown column 'rid_count' in 'where clause'

Here's a trick: calculating a SUM() of values that are known to be either 1 or 0 is equivalent to a COUNT() of the rows where the value is 1. And you know that a boolean comparison returns 1 or 0 (or NULL).
SELECT c.catname, COUNT(m.catid) AS item_count,
SUM(i.ownerid = #ownerid) AS owner_item_count
FROM categories c
LEFT JOIN map m USING (catid)
LEFT JOIN items i USING (itemid)
GROUP BY c.catid;
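To see the SUM-of-a-comparison trick on actual rows, here is a minimal sketch that runs the same query shape from Python against an in-memory SQLite database. SQLite only stands in for MySQL here; the sample rows and the owner id 7 are invented, and the #ownerid placeholder is replaced with a bound parameter.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (catId INTEGER PRIMARY KEY, catName TEXT);
    CREATE TABLE items (itemId INTEGER PRIMARY KEY, itemName TEXT, ownerId INTEGER);
    CREATE TABLE map (mapId INTEGER PRIMARY KEY, itemId INTEGER, catId INTEGER);
    INSERT INTO categories VALUES (1, 'books'), (2, 'tools'), (3, 'empty');
    INSERT INTO items VALUES (1, 'novel', 7), (2, 'atlas', 8), (3, 'hammer', 7);
    INSERT INTO map VALUES (1, 1, 1), (2, 2, 1), (3, 3, 2);
""")

owner_id = 7  # hypothetical owner, bound as a parameter instead of #ownerid

# COUNT counts every mapped item; SUM adds 1 for each row owned by owner_id.
rows = conn.execute("""
    SELECT c.catName,
           COUNT(m.catId) AS item_count,
           SUM(i.ownerId = ?) AS owner_item_count
    FROM categories c
    LEFT JOIN map m USING (catId)
    LEFT JOIN items i USING (itemId)
    GROUP BY c.catId
    ORDER BY c.catId
""", (owner_id,)).fetchall()

for row in rows:
    print(row)
```

With this data the category that has no items gets an item_count of 0 but a NULL owner_item_count, because SUM() over only NULL values is NULL. Wrapping the SUM() in COALESCE(..., 0) is one way to show 0 instead.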
As for the bonus question, you could simply do an inner join instead of an outer join, which would mean only categories with at least one row in map would be returned.
SELECT c.catname, COUNT(m.catid) AS item_count,
SUM(i.ownerid = #ownerid) AS owner_item_count
FROM categories c
INNER JOIN map m USING (catid)
INNER JOIN items i USING (itemid)
GROUP BY c.catid;
Here's another solution, which is not as efficient but I'll show it to explain why you got the error:
SELECT c.catname, COUNT(m.catid) AS item_count,
SUM(i.ownerid = #ownerid) AS owner_item_count
FROM categories c
LEFT JOIN map m USING (catid)
LEFT JOIN items i USING (itemid)
GROUP BY c.catid
HAVING item_count > 0;
You can't use column aliases in the WHERE clause, because expressions in the WHERE clause are evaluated before the expressions in the select-list. In other words, the values associated with select-list expressions aren't available yet.
MySQL does let you use column aliases in the GROUP BY, HAVING, and ORDER BY clauses. HAVING and ORDER BY are applied after grouping, so an aggregate such as item_count is available to them.
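The difference between filtering an aggregate alias in WHERE and in HAVING can be checked with a small sketch. It uses Python with an in-memory SQLite database as a stand-in for MySQL, so the error text differs from MySQL's #1054, but the WHERE version is rejected in both. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (catId INTEGER PRIMARY KEY, catName TEXT);
    CREATE TABLE map (mapId INTEGER PRIMARY KEY, itemId INTEGER, catId INTEGER);
    INSERT INTO categories VALUES (1, 'books'), (2, 'empty');
    INSERT INTO map VALUES (1, 1, 1), (2, 2, 1);
""")

base = """
    SELECT c.catName, COUNT(m.catId) AS item_count
    FROM categories c
    LEFT JOIN map m ON m.catId = c.catId
"""

# Filtering on the aggregate alias in WHERE is rejected by the database.
try:
    conn.execute(base + " WHERE item_count > 0 GROUP BY c.catId").fetchall()
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# The same condition in HAVING is applied to the grouped rows.
having_rows = conn.execute(
    base + " GROUP BY c.catId HAVING item_count > 0"
).fetchall()

print("WHERE failed:", where_error is not None)
print("HAVING rows:", having_rows)
```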

You can sneak a CASE expression inside your SUM():
SELECT categories.catName,
COUNT(map.itemId) AS item_count,
SUM(CASE WHEN items.ownerId = #ownerId THEN 1 ELSE 0 END) AS owner_item_count
FROM categories
LEFT JOIN map ON categories.catId = map.catId
LEFT JOIN items ON items.itemId = map.itemId
GROUP BY categories.catId
HAVING COUNT(map.itemId) > 0

SELECT categories.catName,
COUNT(map.itemId) AS item_count,
COUNT(items.itemId) AS owner_item_count
FROM categories
INNER JOIN map
ON categories.catId = map.catId
LEFT JOIN items
ON items.itemId = map.itemId
AND items.ownerId = #ownerId
GROUP BY categories.catId
Note that you could use a HAVING clause on owner_item_count, but the inner join takes care of item_count for you.

Related

Return unrated rows from a MySQL database

I have a system where people rate items, and I have two tables. I want to show each user only the items they have not rated.
item (simplified for this example)
----
item_id, name
1, widget1
2, widget2
I have a rating table, which stores three columns
rating
------
item_id
user_id
rating
So I want to return only the items that the user has not yet rated. I tried this:
pseudo-query
SELECT * FROM item i LEFT JOIN rating r ON r.item_id = i.item_id WHERE r.user_id != USER_ID_OF_THE_USER;
However that still returned items that they had rated, as other people had rated the item...
So if I have 100 items in the database, user a has rated 30 and user b has rated 70... then user a should get the 70 items they have yet to rate, and user b should get the 30 items they haven't rated.
My rating table has a compound unique key, so if they rate item_id = 1 once, and rate it again, it just updates the rating value, it doesn't make a new row. One row is inserted for every item that is rated by a user.
This feels like it should be easy and it probably is, but I am stuck.
I'm assuming every user has to rate every item. If so, then you can do this with not exists:
select *
from item i
where not exists (
select 1
from rating r
where i.item_id = r.item_id and r.user_id = ?)
Given that you're using MySQL, a left join / null check would probably be faster:
select i.*
from item i
left join rating r on i.item_id = r.item_id and r.user_id = ?
where r.user_id is null
http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
Just use an anti-join pattern, like this:
SELECT i.*
FROM item i
LEFT
JOIN rating r
ON r.item_id = i.item_id
AND r.user_id = USER_ID_OF_THE_USER
WHERE r.user_id IS NULL
The outer join returns all rows from item, along with rating by the user. If there is no related row from the rating table, then the values of the columns from rating will be NULL. So all we need is to add a WHERE clause to filter out the rows that had a match.
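A small sketch makes the difference from the question's attempt concrete. It uses Python with an in-memory SQLite database in place of MySQL; the three items, the two users (10 and 20), and their ratings are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (item_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rating (item_id INTEGER, user_id INTEGER, rating INTEGER,
                         UNIQUE (item_id, user_id));
    INSERT INTO item VALUES (1, 'widget1'), (2, 'widget2'), (3, 'widget3');
    -- user 10 rated items 1 and 2; user 20 rated item 1 only
    INSERT INTO rating VALUES (1, 10, 5), (2, 10, 3), (1, 20, 4);
""")

user_id = 20

# The question's attempt: the filter runs on the joined rows, so an item
# rated by anyone else still comes back, and an unrated item (NULL) is lost.
naive = conn.execute("""
    SELECT i.item_id FROM item i
    LEFT JOIN rating r ON r.item_id = i.item_id
    WHERE r.user_id != ?
    ORDER BY i.item_id
""", (user_id,)).fetchall()

# Anti-join: join only this user's ratings, then keep rows with no match.
anti_join = conn.execute("""
    SELECT i.item_id FROM item i
    LEFT JOIN rating r ON r.item_id = i.item_id AND r.user_id = ?
    WHERE r.user_id IS NULL
    ORDER BY i.item_id
""", (user_id,)).fetchall()

print("naive:", naive)
print("anti-join:", anti_join)
```

For user 20 the naive query returns item 1 (already rated by that user) and drops item 3 (rated by nobody), while the anti-join returns items 2 and 3.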
You can use this query:
SELECT * FROM item i
WHERE NOT EXISTS (
SELECT * FROM rating r
WHERE r.item_id = i.item_id
AND r.user_id = USER_ID_OF_THE_USER);
This will return you those items for which there is no rating of user USER_ID_OF_THE_USER.

MySQL left join multiple column pairs

Given the following table (products_filter):
How can I do a SELECT ... FROM products LEFT JOIN products_filter ... in such a way that it only returns products which have ALL the specified (filter_id,filter_value) pairs?
Example: for (filter_id, filter_value) = (1,1),(3,0) it should only return the product with id 90001, because it matches both values.
If the set of filter pairs is restricted to a definite number, then the following query should work (it assumes a product never has the same pair twice).
SELECT a.product_id
FROM products a
INNER JOIN
(SELECT product_id, COUNT(*)
FROM products_filter
WHERE (filter_id = 1 AND filter_value = 1)
OR (filter_id = 3 AND filter_value = 0)
GROUP BY product_id
HAVING COUNT(*) = 2) b
ON (a.product_id = b.product_id)
As you only said you wanted the PRODUCTS values having the desired filter attributes... I've limited results to just product.*
The below query uses an inline view with the count of distinct filters by product ID. The outer where clause then uses the distinct count (in case duplicate filters could exist for a product) of the filter_IDs.
The number in the outer WHERE clause (2 here) should always match the number of (filter_id, filter_value) pairs listed in the inline view's WHERE clause.
Your sample data indicated that the paired sets could be a subset of all filters, so this ensures each filter pair (or more) exists for the desired product.
SELECT p.*
FROM products p
LEFT JOIN (SELECT product_ID, count(Distinct filter_ID) cnt
FROM products_Filter
WHERE (Filter_ID = 1 and filter_value = 1)
or (Filter_ID = 3 and filter_value = 0)
GROUP BY Product_ID) pf
on P.Product_ID = PF.Product_ID
WHERE pf.cnt = 2
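As a rough sketch of how this could be driven from application code, the example below builds the pair conditions from a list and uses the list's length as the required count, so the two cannot drift apart. It runs on Python with an in-memory SQLite database rather than MySQL, and apart from product 90001 matching (1,1) and (3,0), the rows are invented since the original table is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY);
    CREATE TABLE products_filter (product_id INTEGER, filter_id INTEGER,
                                  filter_value INTEGER);
    INSERT INTO products VALUES (90001), (90002), (90003);
    INSERT INTO products_filter VALUES
        (90001, 1, 1), (90001, 3, 0),
        (90002, 1, 1), (90002, 3, 1),
        (90003, 3, 0);
""")

# Required (filter_id, filter_value) pairs; a product must match all of them.
pairs = [(1, 1), (3, 0)]
pair_clause = " OR ".join("(filter_id = ? AND filter_value = ?)" for _ in pairs)
# Flattened pair values first, then the required count for the outer WHERE.
params = [value for pair in pairs for value in pair] + [len(pairs)]

matching = conn.execute(f"""
    SELECT p.product_id
    FROM products p
    JOIN (SELECT product_id, COUNT(DISTINCT filter_id) AS cnt
          FROM products_filter
          WHERE {pair_clause}
          GROUP BY product_id) pf ON pf.product_id = p.product_id
    WHERE pf.cnt = ?
    ORDER BY p.product_id
""", params).fetchall()

print(matching)
```

Only the placeholder text is interpolated into the SQL string; the filter values themselves are passed as bound parameters.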

Attempting to Join 3 tables in MySQL

I have three tables that are joined. I almost have the solution but there seems to be one small problem going on here. Here is the statement:
SELECT items.item,
COUNT(ratings.item_id) AS total,
COUNT(comments.item_id) AS comments,
AVG(ratings.rating) AS rate
FROM `items`
LEFT JOIN ratings ON (ratings.item_id = items.items_id)
LEFT JOIN comments ON (comments.item_id = items.items_id)
WHERE items.cat_id = '{$cat_id}' AND items.spam < 5
GROUP BY items_id ORDER BY TRIM(LEADING 'The ' FROM items.item) ASC;
I have a table called items, each item has an id called items_id (notice it's plural). I have a table of individual user comments for each item, and one for ratings for each item. (The last two have a corresponding column called 'item_id').
I simply want to count comments and ratings total (per item) separately. With the way my SQL statement is above, they are a total.
note, total is the total of ratings. It's a bad naming scheme I need to fix!
UPDATE: 'total' seems to count ok, but when I add a comment to 'comments' table, the COUNT function affects both 'comments' and 'total' and seems to equal the combined output.
Problem is you're counting results of all 3 tables joined. Try:
SELECT i.item,
r.ratetotal AS total,
c.commtotal AS comments,
r.rateav AS rate
FROM items AS i
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS ratetotal,
AVG(rating) AS rateav
FROM ratings GROUP BY item_id) AS r
ON r.item_id = i.items_id
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS commtotal
FROM comments GROUP BY item_id) AS c
ON c.item_id = i.items_id
WHERE i.cat_id = '{$cat_id}' AND i.spam < 5
ORDER BY TRIM(LEADING 'The ' FROM i.item) ASC;
In this query, we make the subqueries do the counting properly, then send that value to the main query and filter the results.
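The inflated counts are easy to reproduce. In the sketch below (Python with an in-memory SQLite database standing in for MySQL, invented rows), one item has 3 ratings and 2 comments, so joining both tables directly yields 3 x 2 = 6 rows for it. The COALESCE calls are an addition of mine so that items without comments show 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (items_id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE ratings (item_id INTEGER, rating INTEGER);
    CREATE TABLE comments (item_id INTEGER, body TEXT);
    INSERT INTO items VALUES (1, 'The Widget'), (2, 'Gadget');
    INSERT INTO ratings VALUES (1, 4), (1, 5), (1, 3), (2, 2);
    INSERT INTO comments VALUES (1, 'nice'), (1, 'meh');
""")

# Joining both child tables directly multiplies their rows per item.
naive = conn.execute("""
    SELECT i.item, COUNT(r.item_id), COUNT(c.item_id)
    FROM items i
    LEFT JOIN ratings r ON r.item_id = i.items_id
    LEFT JOIN comments c ON c.item_id = i.items_id
    GROUP BY i.items_id
    ORDER BY i.items_id
""").fetchall()

# Aggregating each child table first gives one row per item to join against.
fixed = conn.execute("""
    SELECT i.item, COALESCE(r.ratetotal, 0), COALESCE(c.commtotal, 0)
    FROM items i
    LEFT JOIN (SELECT item_id, COUNT(*) AS ratetotal
               FROM ratings GROUP BY item_id) r ON r.item_id = i.items_id
    LEFT JOIN (SELECT item_id, COUNT(*) AS commtotal
               FROM comments GROUP BY item_id) c ON c.item_id = i.items_id
    ORDER BY i.items_id
""").fetchall()

print("naive:", naive)
print("fixed:", fixed)
```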
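I'm guessing this is a cardinality issue: each rating row is joined to every comment row for the same item. Try COUNT(DISTINCT ...) on a column that is unique per row in each table (for example the primary key of comments, whatever it is called in your schema). Note that COUNT(DISTINCT comments.item_id) would not help, since item_id has the same value for every row in a group.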

SQL query finding best categories match

I have categories and multiple categorizations for my Items. For a specific Item, how do I find other Items that have the same categories, ordered by the number of matching categories (aka best match)?
My table structure is roughly:
Item Table
ID
Name
...
Category Table
ID
Name
...
Categorization Table
ID
Item_ID
Category_ID
...
To find all Items having similar categories, for example, I use
SELECT `items`.*
FROM `items`
INNER JOIN `categorizations` c1
ON c1.`item_id` = `items`.`id`
INNER JOIN `categorizations` c2
ON c2.`item_id` = <Item_ID>
WHERE c1.`category_id` = c2.`category_id`
This should produce a table of counts of category matches between each pair of items that share at least one category.
select i1.id as item_id, i2.id as other_item_id, count(1) as matches
from items i1
join categorizations c1 on c1.item_id=i1.id
join categorizations c2 on c2.category_id=c1.category_id
join items i2 on c2.item_id=i2.id
where i1.id <> i2.id
group by i1.id,i2.id
order by count(1) desc
I suspect that it may be a bit slow, though. I don't have an instance of MySQL at the moment to try it out.
Something like:
select item_id, count(id)
from item_category ic
where exists(
select category_id
from item_category ic2
where ic2.item_id = #item_id
and ic2.category_id = ic.category_id )
and item_id <> #item_id
group by item_id
order by count(item_id) desc
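Here is a runnable sketch of the same idea for a single target item, using a self-join on the categorizations table from the question instead of EXISTS. It runs on Python with an in-memory SQLite database rather than MySQL, the rows are invented, and it assumes each (item_id, category_id) pair appears at most once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categorizations (id INTEGER PRIMARY KEY,
                                  item_id INTEGER, category_id INTEGER);
    -- item 1 is the target: categories 1, 2, 3
    INSERT INTO categorizations (item_id, category_id) VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2),
        (3, 3),
        (4, 9);
""")

target_item = 1  # hypothetical item to find matches for

# c2 holds the target's categories; c1 holds other items sharing one of them.
best_matches = conn.execute("""
    SELECT c1.item_id, COUNT(*) AS matches
    FROM categorizations c1
    JOIN categorizations c2 ON c2.category_id = c1.category_id
    WHERE c2.item_id = ? AND c1.item_id <> ?
    GROUP BY c1.item_id
    ORDER BY matches DESC, c1.item_id
""", (target_item, target_item)).fetchall()

print(best_matches)
```

Item 4 shares no category with the target, so it does not appear in the result at all.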
An alternative method which I have just implemented to solve this problem is using bitwise operators to speed things up. In MySQL this method only works if you have 64 or fewer categories, as the bit functions are 64 bit.
1) Assign each category a unique integer value which is a power of 2.
2) For each item sum the category values that the item is in to create a 64 bit int representing all of the categories that the item is in.
3) To compare an item to another do something like:
SELECT id, BIT_COUNT(item1categories & item2categories) AS numMatchedCats FROM tablename HAVING numMatchedCats > 0 ORDER BY numMatchedCats DESC
The BIT_COUNT() function might be MySQL specific so an alternative may well be required for any other DB.
MySQL bit functions used are explained here:
http://dev.mysql.com/doc/refman/5.0/en/bit-functions.html
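The three steps can be sketched outside the database as well. The Python below is only an illustration of the bitmask idea with invented category and item names; counting the 1 bits of `a & b` plays the role of BIT_COUNT(item1categories & item2categories).

```python
# Step 1: hypothetical categories, each assigned a distinct power of two.
CATEGORY_BITS = {
    "fiction": 1 << 0,
    "history": 1 << 1,
    "travel": 1 << 2,
    "cooking": 1 << 3,
}


def category_mask(categories):
    """Step 2: combine an item's categories into one integer bitmask."""
    mask = 0
    for name in categories:
        mask |= CATEGORY_BITS[name]
    return mask


def matched_categories(mask_a, mask_b):
    """Step 3: count shared categories, like BIT_COUNT(mask_a & mask_b)."""
    return bin(mask_a & mask_b).count("1")


# Invented items with their category masks.
item_masks = {
    "atlas": category_mask(["travel"]),
    "memoir": category_mask(["fiction", "history", "travel"]),
    "cookbook": category_mask(["cooking"]),
}
target_mask = category_mask(["history", "travel"])

# Keep items with at least one shared category, best match first.
ranked = sorted(
    ((name, matched_categories(mask, target_mask))
     for name, mask in item_masks.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
ranked = [pair for pair in ranked if pair[1] > 0]

print(ranked)
```

Because each category has its own bit, OR-ing the values gives the same mask as summing them, as long as no category is listed twice for an item.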

MySQL Grouping SubQuery Optimization

I have a table of Categories, and a table of Products. Products have a category_id, and also a maker_id.
I am attempting to return a table of Category Names, along with a boolean flag indicating whether or not the category contains any products that belong to a given $maker_id (as defined in PHP code).
My current method counts how many matching products are in each category, but I'm assuming there is a faster way since I only need a Yes/No. Current code:
SELECT
c.name,
SUM(CASE WHEN p.maker_id = '{$maker_id}' THEN 1 ELSE 0 END) AS already_used
FROM categories c
LEFT JOIN products p ON p.category_id = c.id
GROUP BY c.id
I'm reading up on using EXISTS, but all the examples I've found are using it in the WHERE clause. Thanks!
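You can try this, with the maker filter in the ON clause so that categories without matching products are still listed:
SELECT c.name, COUNT(p.category_id) AS already_used, SUM(IF(p.status = 'ready', 1, 0)) AS ready_count
FROM categories c
LEFT JOIN products p
ON (p.category_id = c.id AND p.maker_id = '{$maker_id}')
GROUP BY c.id;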
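Since the question asks about EXISTS outside the WHERE clause: EXISTS can also appear in the select list, where it yields 1 or 0 per category without counting anything. The sketch below shows that form from Python against an in-memory SQLite database (standing in for MySQL); the rows and maker id 5 are invented, and whether it is actually faster on a real table is something to measure rather than assume.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY,
                           category_id INTEGER, maker_id INTEGER);
    INSERT INTO categories VALUES (1, 'chairs'), (2, 'tables'), (3, 'lamps');
    INSERT INTO products VALUES (1, 1, 5), (2, 1, 5), (3, 2, 6);
""")

maker_id = 5  # hypothetical maker, bound as a parameter instead of {$maker_id}

# EXISTS in the select list: 1 if the category has a product by this maker.
flags = conn.execute("""
    SELECT c.name,
           EXISTS (SELECT 1
                   FROM products p
                   WHERE p.category_id = c.id
                     AND p.maker_id = ?) AS already_used
    FROM categories c
    ORDER BY c.id
""", (maker_id,)).fetchall()

print(flags)
```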