Getting rid of duplicate results in MySQL query when using UNION - mysql

I have a MySQL query to get items that have had recent activity. Basically users can post a review of an item or add it to their wishlist, and I want to get all items that have either had a new review in the last x days or been placed on someone's wishlist.
The query goes a bit like this (slightly simplified):
SELECT items.*, reactions.timestamp AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
WHERE reactions.timestamp > 1251806994
GROUP BY items.id
UNION
SELECT items.*, wishlists.timestamp AS date FROM items
LEFT JOIN wishlists ON wishlists.item_id = items.id
WHERE wishlists.timestamp > 1251806994
GROUP BY items.id
ORDER BY date DESC LIMIT 5
This works, but when an item has been placed both on someone's wishlist and a review was posted, the item is returned twice. UNION removes duplicates normally, but because the date differs between the two rows, both rows are returned. Can I somehow tell MySQL to ignore the date when removing duplicate rows?
I also tried doing something like this:
SELECT items.*, IF(wishlists.id IS NOT NULL, wishlists.timestamp, reactions.timestamp) AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
LEFT JOIN wishlists ON wishlists.item_id = items.id
WHERE (wishlists.id IS NOT NULL AND wishlists.timestamp > 1251806994) OR
(reactions.id IS NOT NULL AND reactions.timestamp > 1251806994)
GROUP BY items.id
ORDER BY date DESC LIMIT 5
But that turned out to be insanely slow for some reason (took about half a minute).

I solved it myself, based on larryb82's idea. I basically did the following:
SELECT * FROM (
SELECT items.*, reactions.timestamp AS date FROM items
LEFT JOIN reactions ON reactions.item_id = items.id
WHERE reactions.timestamp > 1251806994
GROUP BY items.id
UNION
SELECT items.*, wishlists.timestamp AS date FROM items
LEFT JOIN wishlists ON wishlists.item_id = items.id
WHERE wishlists.timestamp > 1251806994
GROUP BY items.id
ORDER BY date DESC LIMIT 5
) AS items
GROUP BY items.id
ORDER BY date DESC LIMIT 5
Though I realize this probably doesn't take into account which date is the highest for each item... Not sure yet if that matters and if so, what to do about it.

Not sure if this would be a huge performance hit but you could try
SELECT item_field_1, item_field_2, ..., max(date) as date
FROM
(the query you posted)
GROUP BY item_field_1, item_field_2, ...
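As a rough sketch of that idea, here is the UNION wrapped in an outer GROUP BY that keeps the highest date per item. It runs against SQLite through Python's sqlite3 module as a stand-in for MySQL; the `name` column, the sample rows and the `recent_items` helper are made up for illustration, and inner joins replace the LEFT JOINs because the WHERE clause discards unmatched rows anyway.

```python
import sqlite3

# Hypothetical minimal schema and sample rows; only id, item_id and
# timestamp come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reactions (item_id INTEGER, timestamp INTEGER);
    CREATE TABLE wishlists (item_id INTEGER, timestamp INTEGER);
    INSERT INTO items VALUES (1, 'book'), (2, 'lamp'), (3, 'desk');
    INSERT INTO reactions VALUES (1, 200), (1, 300), (2, 50);
    INSERT INTO wishlists VALUES (1, 250), (3, 150);
""")

def recent_items(conn, since, limit=5):
    """One row per item with activity after `since`, newest activity first."""
    sql = """
        SELECT id, name, MAX(activity_date) AS latest
        FROM (
            SELECT items.id, items.name, reactions.timestamp AS activity_date
            FROM items
            JOIN reactions ON reactions.item_id = items.id
            WHERE reactions.timestamp > ?
            UNION ALL
            SELECT items.id, items.name, wishlists.timestamp AS activity_date
            FROM items
            JOIN wishlists ON wishlists.item_id = items.id
            WHERE wishlists.timestamp > ?
        ) AS activity
        GROUP BY id, name          -- collapses the duplicates
        ORDER BY latest DESC
        LIMIT ?
    """
    return conn.execute(sql, (since, since, limit)).fetchall()

recent = recent_items(conn, since=100)
print(recent)
```

Item 1 has both reactions and a wishlist entry after the cutoff but comes back once, carrying its newest timestamp. UNION ALL is enough here because the outer GROUP BY does the de-duplication.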

I don't think you need a UNION here at all.
SELECT items.*, GREATEST(COALESCE(wishlists.timestamp, 0), COALESCE(reactions.timestamp, 0)) AS date
FROM items
LEFT JOIN reactions ON reactions.item_id = items.id AND reactions.timestamp > 1251806994
LEFT JOIN wishlists ON wishlists.item_id = items.id AND wishlists.timestamp > 1251806994
ORDER BY date DESC limit 5
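Your use of LEFT JOIN above was probably very slow because of the predicate with the OR in it. You asked the database to join the three tables together and only then examined that result for timestamp information. With the timestamp conditions moved into the ON clauses, my statement should form a smaller intermediate table. Items that don't have either a recent reaction or a recent wishlist entry will get a date of 0, so they sort last and only show up if fewer than five items have recent activity. Note that an item with several recent reactions or wishlist entries still produces several rows, so you may want a GROUP BY items.id with MAX() around the date as well.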
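A minimal sketch of the join above, again on SQLite via Python's sqlite3 as a stand-in for MySQL. SQLite has no GREATEST(), so its two-argument scalar MAX() is used instead; the sample rows are invented and deliberately hold at most one reaction and one wishlist entry per item. The second query adds a WHERE clause (my assumption, not part of the answer) to drop items that matched neither join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reactions (item_id INTEGER, timestamp INTEGER);
    CREATE TABLE wishlists (item_id INTEGER, timestamp INTEGER);
    INSERT INTO items VALUES (1, 'book'), (2, 'lamp'), (3, 'desk');
    INSERT INTO reactions VALUES (1, 300), (2, 50);
    INSERT INTO wishlists VALUES (1, 250), (3, 150);
""")

SINCE = 100

# Timestamp filters live in the ON clauses, so every item survives the
# LEFT JOINs; items without recent activity end up with a date of 0.
BASE = """
    SELECT items.id,
           MAX(COALESCE(wishlists.timestamp, 0),
               COALESCE(reactions.timestamp, 0)) AS latest  -- GREATEST() in MySQL
    FROM items
    LEFT JOIN reactions ON reactions.item_id = items.id
                       AND reactions.timestamp > :since
    LEFT JOIN wishlists ON wishlists.item_id = items.id
                       AND wishlists.timestamp > :since
"""
ORDER = " ORDER BY latest DESC, items.id LIMIT 5"

all_rows = conn.execute(BASE + ORDER, {"since": SINCE}).fetchall()

# Hypothetical extra filter: keep only items that matched at least one join.
ONLY_ACTIVE = " WHERE reactions.item_id IS NOT NULL OR wishlists.item_id IS NOT NULL"
active_rows = conn.execute(BASE + ONLY_ACTIVE + ORDER, {"since": SINCE}).fetchall()

print(all_rows)
print(active_rows)
```

With only three items in the table, the unfiltered query still returns the inactive item with a date of 0, which is why the extra WHERE can matter on small data sets.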

Related

Filter inner query by the results of the outer query

Let me start in plain English first
Query: Get top 100 paying users and their current active item (just one item)
Here is a drafted query
SELECT `user_id`, SUM(p.`amount`) as `total`
FROM `users_purcahse` AS p
LEFT JOIN (SELECT `ui`.`item_id` as `item_id`, `ui`.`user_id` as `user_id`
FROM `user_items` AS `ui`
LEFT OUTER JOIN `items` AS `i` ON `ui`.`item_id` = `i`.`id`
LEFT OUTER JOIN `categories` AS `cat` ON `i`.`category_id` = `cat`.`id`
WHERE `ui`.isActive = 1
) AS `ui` ON p.`user_id` = `ui`.`user_id`
GROUP BY `user_id`, `ui`.`item_id`
ORDER BY `total` DESC
LIMIT 0, 100;
The problem with this is that the inner query reads the entire user_items table before it is joined with the top 100 paying users.
user_items is a very large table, so the query takes too long.
I simply want to attach the current active item for each user after doing the calculations.
Note: a user can have many items but only 1 active item.
Note 2: it's not enforced at the DB level that user_items has only one row with isActive = 1 per user.
This is a job for some well-chosen subqueries.
First, let's find the user_id values of your top-paying users.
SELECT user_id, SUM(amount) total
FROM users_purcahse
GROUP BY user_id
ORDER BY SUM(amount) DESC
LIMIT 100
Next, let's find the item_id values for your users. If more than one item is active, we'll take the one with the smallest item_id value to get just one.
SELECT user_id, MIN(item_id) item_id
FROM user_items
WHERE isActive = 1
GROUP BY user_id
Then, in an outer query we can fetch the details of your items.
SELECT top_users.user_id, top_users.total,
active_items.item_id,
items.*, categories.*
FROM (
SELECT user_id, SUM(amount) total
FROM users_purcahse
GROUP BY user_id
ORDER BY SUM(amount) DESC
LIMIT 100
) top_users
LEFT JOIN (
SELECT user_id, MIN(item_id) item_id
FROM user_items
WHERE isActive = 1
GROUP BY user_id
) active_items ON top_users.user_id = active_items.user_id
LEFT JOIN items ON active_items.item_id = items.id
LEFT JOIN categories ON items.category_id = categories.id
ORDER BY top_users.total DESC, top_users.user_id
The trick here is to use GROUP BY subqueries to get the data items where you need just one value per user_id.
Once you have the resultset you need, you can use EXPLAIN to help you sort out any performance problems.
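Here is a small sketch of the two GROUP BY subqueries joined together, run on SQLite through Python's sqlite3 as a stand-in for MySQL. The table and column names follow the question; the sample rows and the `top_users_with_item` helper are hypothetical, and the joins to items and categories are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_purcahse (user_id INTEGER, amount INTEGER);
    CREATE TABLE user_items (user_id INTEGER, item_id INTEGER, isActive INTEGER);
    INSERT INTO users_purcahse VALUES (1, 10), (1, 30), (2, 25), (3, 5);
    -- user 1 has two active rows (not prevented by the schema), user 2 none
    INSERT INTO user_items VALUES (1, 7, 1), (1, 4, 1), (1, 9, 0), (2, 8, 0), (3, 2, 1);
""")

def top_users_with_item(conn, limit):
    """Top `limit` users by total spend, each with at most one active item."""
    sql = """
        SELECT top_users.user_id, top_users.total, active_items.item_id
        FROM (
            SELECT user_id, SUM(amount) AS total
            FROM users_purcahse
            GROUP BY user_id
            ORDER BY total DESC
            LIMIT ?
        ) AS top_users
        LEFT JOIN (
            SELECT user_id, MIN(item_id) AS item_id   -- pick one if several are active
            FROM user_items
            WHERE isActive = 1
            GROUP BY user_id
        ) AS active_items ON active_items.user_id = top_users.user_id
        ORDER BY top_users.total DESC, top_users.user_id
    """
    return conn.execute(sql, (limit,)).fetchall()

top_two = top_users_with_item(conn, 2)
print(top_two)
```

User 2 has no active item, so the LEFT JOIN keeps the row and fills item_id with NULL (None in Python).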

mysql left join not returning empty row

I'm trying to get the list of categories along with the number of child records in each. If a category doesn't have any records it should return NULL or 0, but my query only returns categories that have child records; it looks like it's skipping the ones without. I would really appreciate the help.
Here's my code:
SELECT
t_gal.f_sub_category_id,
t_sub_cat.f_sub_cat_name,
t_gal.f_image_thumb, (
SELECT COUNT(*)
FROM t_gallery
WHERE f_sub_category_id = t_gal.f_sub_category_id)
AS f_image_total
FROM t_gallery t_gal
LEFT JOIN t_sub_category t_sub_cat ON t_sub_cat.r_id = t_gal.f_sub_category_id
GROUP BY t_sub_cat.r_id
ORDER BY t_gal.f_added_on DESC, t_gal.r_id DESC
Here's the two tables:
The problem appears to be your GROUP BY clause.
You are grouping by a field that is on the LEFT JOINed table, so when the GROUP BY is applied, all the rows that do not have a matching row in that table are aggregated into a single row.
I think what you are trying to get is a list of gallery items, along with the category they are in (if found) and the count of other galleries in the same category. If so try the following (if not let me know!)
SELECT t_gal.f_sub_category_id, t_sub_cat.f_sub_cat_name, t_gal.f_image_thumb, Sub1.GalleryCount
FROM t_gallery t_gal
LEFT JOIN t_sub_category t_sub_cat
ON t_sub_cat.r_id = t_gal.f_sub_category_id
LEFT OUTER JOIN (SELECT f_sub_category_id, COUNT(*) AS GalleryCount FROM t_gallery GROUP BY f_sub_category_id) Sub1
ON Sub1.f_sub_category_id = t_gal.f_sub_category_id
ORDER BY t_gal.f_added_on DESC, t_gal.r_id DESC
It LOOKS like, for every sub-category (of a previously selected category), you want to include ALL of them, each with a count of how many images it has, whether or not there even IS an image in the gallery table.
What you may have been running into is the subquery in the SELECT list used to count images, which could become a performance killer. Instead, you could just do a left join directly to the gallery table and COUNT the distinct r_id values from the gallery FOR the corresponding sub-category:
SELECT
t_sub_cat.r_id,
t_sub_cat.f_sub_cat_name,
MIN( COALESCE( t_gal.f_image_thumb, '' )) as JustOneThumbImg,
COUNT( DISTINCT( t_gal.r_id )) SubCategoryImageCount
FROM
t_sub_category t_sub_cat
LEFT JOIN t_gallery t_gal
ON t_sub_cat.r_id = t_gal.f_sub_category_id
GROUP BY
t_sub_cat.r_id
ORDER BY
t_sub_cat.f_added_on DESC
Since you are not grabbing all gallery images (since some may not exist FOR a given sub-category), ordering by the t_gal.r_id doesn't make sense
Also, the reason I'm not pre-grabbing aggregates in a sub-query to join against... I don't want to get everything from every category / sub-category without knowing which sub-categories are associated with the category you actually want.
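To illustrate the point about which table drives the LEFT JOIN, here is a minimal sketch on SQLite through Python's sqlite3 (a stand-in for MySQL). The column names come from the question; the sample rows are invented, and the thumbnail and ordering columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t_sub_category (r_id INTEGER PRIMARY KEY, f_sub_cat_name TEXT);
    CREATE TABLE t_gallery (r_id INTEGER PRIMARY KEY, f_sub_category_id INTEGER);
    INSERT INTO t_sub_category VALUES (1, 'Cats'), (2, 'Dogs'), (3, 'Birds');
    INSERT INTO t_gallery VALUES (10, 1), (11, 1), (12, 2);
""")

# Start from the sub-category table so that every sub-category is kept,
# then count only the gallery rows that actually matched.
counts = conn.execute("""
    SELECT s.r_id, s.f_sub_cat_name, COUNT(g.r_id) AS image_total
    FROM t_sub_category AS s
    LEFT JOIN t_gallery AS g ON g.f_sub_category_id = s.r_id
    GROUP BY s.r_id, s.f_sub_cat_name
    ORDER BY s.r_id
""").fetchall()

print(counts)
```

COUNT(g.r_id) ignores the NULL produced for a sub-category without images, so 'Birds' comes back with 0 instead of disappearing.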
The problem with your query is that you are using t_gallery as your main table, and not t_sub_category, with the LEFT JOIN.
You could try this (sqlfiddle):
select
t_gal.f_sub_category_id,
t_sub_cat.f_sub_cat_name,
(
SELECT COUNT(*)
FROM t_gallery
WHERE f_sub_category_id = t_gal.f_sub_category_id)
AS f_image_total
from t_sub_category as t_sub_cat
left join t_gallery t_gal on t_gal.f_sub_category_id = t_sub_cat.r_id
GROUP BY t_sub_cat.r_id
ORDER BY t_gal.r_id DESC;

Find out AVG column in SQL

I have this php/sql Query:
$result = mysql_query("
SELECT r.item_id, AVG(rating) AS avgrating, count(rating) AS count, i.item, c.category
FROM ratings AS r
LEFT JOIN items AS i
ON r.item_id = i.items_id
INNER JOIN master_cat c
ON c.cat_id = i.cat_id
GROUP BY item_id
ORDER BY avgrating DESC
LIMIT 25;");
When I output this, count is correct; it shows how many votes certain items have received.
I simply want to add a WHERE count >= 10 clause but everything breaks. Obviously, when there are thousands of items, some will get one vote and have 100%. But that is not a good indicator. I want to print out items that have at least 10 votes (or count >= 10)
You should use HAVING instead of WHERE:
SELECT
r.item_id, AVG(rating) AS avgrating,
count(rating) AS count, i.item, c.category
FROM
ratings AS r
LEFT JOIN items AS i
ON r.item_id = i.items_id
INNER JOIN master_cat c
ON c.cat_id = i.cat_id
GROUP BY
item_id
HAVING
count >= 10
ORDER BY
avgrating DESC
LIMIT 25;
You can't use a where filter on the results of an aggregate function (count()). where is applied at the row-level, as the DB is deciding whether to include the row or not in the result set - at this point the results of the count aren't available yet.
What you want is a having clause, which is applied as one of the last steps before results are sent to the client, after all the aggregate results have been calculated.
...
GROUP BY item_id
HAVING count >= 10
ORDER BY ...
You need to tell it what you want to count:
HAVING COUNT(*) >= 10
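A small runnable sketch of the HAVING filter, using SQLite through Python's sqlite3 as a stand-in for MySQL. Only the ratings table is modelled, the sample rows are invented, and `top_rated` with its `min_votes` parameter is a hypothetical helper; the threshold is lowered to 2 so the tiny data set shows the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (item_id INTEGER, rating INTEGER);
    INSERT INTO ratings VALUES (1, 5), (1, 4), (1, 3),   -- 3 votes, average 4
                               (2, 5),                   -- 1 vote,  average 5
                               (3, 2), (3, 4);           -- 2 votes, average 3
""")

def top_rated(conn, min_votes, limit=25):
    """Items ordered by average rating, keeping only those with enough votes."""
    sql = """
        SELECT item_id, AVG(rating) AS avgrating, COUNT(rating) AS votes
        FROM ratings
        GROUP BY item_id
        HAVING COUNT(rating) >= ?     -- evaluated after the groups are built
        ORDER BY avgrating DESC
        LIMIT ?
    """
    return conn.execute(sql, (min_votes, limit)).fetchall()

popular = top_rated(conn, min_votes=2)
print(popular)
```

Item 2 has the best average but only a single vote, so the HAVING clause removes it before the ordering and the LIMIT are applied.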

Attempting to Join 3 tables in MySQL

I have three tables that are joined. I almost have the solution but there seems to be one small problem going on here. Here is the statement:
SELECT items.item,
COUNT(ratings.item_id) AS total,
COUNT(comments.item_id) AS comments,
AVG(ratings.rating) AS rate
FROM `items`
LEFT JOIN ratings ON (ratings.item_id = items.items_id)
LEFT JOIN comments ON (comments.item_id = items.items_id)
WHERE items.cat_id = '{$cat_id}' AND items.spam < 5
GROUP BY items_id ORDER BY TRIM(LEADING 'The ' FROM items.item) ASC;
I have a table called items, each item has an id called items_id (notice it's plural). I have a table of individual user comments for each item, and one for ratings for each item. (The last two have a corresponding column called 'item_id').
I simply want to count comments and ratings total (per item) separately. With the way my SQL statement is above, they are a total.
note, total is the total of ratings. It's a bad naming scheme I need to fix!
UPDATE: 'total' seems to count ok, but when I add a comment to 'comments' table, the COUNT function affects both 'comments' and 'total' and seems to equal the combined output.
The problem is that you're counting the rows of all 3 tables joined together. Try:
SELECT i.item,
r.ratetotal AS total,
c.commtotal AS comments,
r.rateav AS rate
FROM items AS i
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS ratetotal,
AVG(rating) AS rateav
FROM ratings GROUP BY item_id) AS r
ON r.item_id = i.items_id
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS commtotal
FROM comments GROUP BY item_id) AS c
ON c.item_id = i.items_id
WHERE i.cat_id = '{$cat_id}' AND i.spam < 5
ORDER BY TRIM(LEADING 'The ' FROM i.item) ASC;
In this query, we make the subqueries do the counting properly, then send that value to the main query and filter the results.
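To see why the direct three-table join inflates the counts, here is a minimal comparison on SQLite through Python's sqlite3 (a stand-in for MySQL). Table and column names follow the question; the sample rows are invented, the cat_id and spam filters are left out, and the COALESCE around the totals is my own addition so that items without rows show 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (items_id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE ratings (item_id INTEGER, rating INTEGER);
    CREATE TABLE comments (item_id INTEGER, body TEXT);
    INSERT INTO items VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO ratings VALUES (1, 4), (1, 2);                 -- 2 ratings
    INSERT INTO comments VALUES (1, 'a'), (1, 'b'), (1, 'c'),  -- 3 comments
                                (2, 'd');
""")

# Joining both child tables at once pairs every rating with every comment,
# so Alpha yields 2 x 3 = 6 rows and both counts report 6.
naive = conn.execute("""
    SELECT items.item, COUNT(ratings.item_id), COUNT(comments.item_id)
    FROM items
    LEFT JOIN ratings ON ratings.item_id = items.items_id
    LEFT JOIN comments ON comments.item_id = items.items_id
    GROUP BY items.items_id, items.item
    ORDER BY items.item
""").fetchall()

# Aggregating each child table first gives one row per item to join against.
fixed = conn.execute("""
    SELECT i.item,
           COALESCE(r.ratetotal, 0) AS total,
           COALESCE(c.commtotal, 0) AS comments,
           r.rateav AS rate
    FROM items AS i
    LEFT JOIN (SELECT item_id, COUNT(*) AS ratetotal, AVG(rating) AS rateav
               FROM ratings GROUP BY item_id) AS r ON r.item_id = i.items_id
    LEFT JOIN (SELECT item_id, COUNT(*) AS commtotal
               FROM comments GROUP BY item_id) AS c ON c.item_id = i.items_id
    ORDER BY i.item
""").fetchall()

print(naive)
print(fixed)
```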
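I'm guessing this is a cardinality issue. Try COUNT(DISTINCT ...) on each table's own row identifier (for example its primary key) rather than on item_id, which has the same value for every row in the group.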

How do I retrieve a set number of records in date order using joins

I'm having a bit of trouble getting the right results from a query.
At the moment I have two tables, main_cats and products.
The result I am after is 6 records, in date order, with only one unique main_cat_id.
The basic table structures are
Main_cats: main_cat_id, main_cat_title
Products: product_id, main_cat_id, product_name, date_added.
I am hitting problems when I join the main_cat table to the products table. It seems to totally ignore the ORDER BY clause.
SELECT date_added, product_name,main_cat_title FROM ic_products p
JOIN ic_main_cats icm on icm.main_cat_id=p.main_cat_id
WHERE p.main_cat_id IN (1,2,12,22,6,8)
GROUP BY p.main_cat_id
ORDER BY date_added ASC
LIMIT 6
If I leave the join out the query works but shows more than one main_cat_id and I cannot display the main_cat_title as needed.
Your question is (at heart) a "select min/max date per group and associated fields" question.
SELECT p.date_added, p.product_name, icm.main_cat_title
FROM ic_products p
LEFT JOIN ic_products p2
ON p.main_cat_id=p2.main_cat_id
AND p.date_added > p2.date_added
LEFT JOIN ic_main_cats icm ON icm.main_cat_id=p.main_cat_id
WHERE p2.date_added IS NULL
AND p.main_cat_id IN (1,2,12,22,6,8)
Let me explain: look at this table, being the first LEFT JOIN of the query above:
SELECT p.date_added, p.product_name
FROM ic_products p
LEFT JOIN ic_products p2
ON p.main_cat_id=p2.main_cat_id
AND p.date_added > p2.date_added
WHERE p2.date_added IS NULL
This joins products to itself: it produces a table with every combination of date_added pairs within each category, where the date in the first column is always greater than the date in the second.
Since this is a left join, when the date in the first column is the smallest for that category, the date in the second will be NULL.
So this basically selects the row with the minimum date for each category (I assume you want the minimum date, i.e. the earliest occurrence, based on the ORDER BY date_added ASC in your question -- if you wanted the newest date_added you'd change the > to a < in the above join).
The second LEFT JOIN to icm is just the one in your original question, so that we can retrieve main_cat_title.
There is no need to LIMIT 6 here because firstly, only one row is retrieved per main_cat_id thanks to the first LEFT JOIN, and secondly, your AND p.main_cat_id IN (1,2,12,22,6,8) only selects 6 categories. So 6 categories at one row per category retrieves 6 rows. (Or at most 6; if you have no items in a particular category of course no rows will be retrieved).
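The self-join can be tried out with a small sketch on SQLite through Python's sqlite3 (a stand-in for MySQL). The table and column names follow the question; the sample rows are invented, dates are stored as ISO strings so they compare correctly as text, and the final ORDER BY is my addition to get the rows in date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ic_main_cats (main_cat_id INTEGER PRIMARY KEY, main_cat_title TEXT);
    CREATE TABLE ic_products (product_id INTEGER PRIMARY KEY, main_cat_id INTEGER,
                              product_name TEXT, date_added TEXT);
    INSERT INTO ic_main_cats VALUES (1, 'Tools'), (2, 'Toys');
    INSERT INTO ic_products VALUES
        (1, 1, 'hammer', '2020-03-01'),
        (2, 1, 'saw',    '2020-01-15'),
        (3, 2, 'ball',   '2020-02-10'),
        (4, 2, 'kite',   '2020-05-05');
""")

# p2 matches any product in the same category with an earlier date.
# A row of p with no such match (p2 columns all NULL) is the earliest one.
earliest = conn.execute("""
    SELECT p.date_added, p.product_name, icm.main_cat_title
    FROM ic_products AS p
    LEFT JOIN ic_products AS p2
           ON p.main_cat_id = p2.main_cat_id
          AND p.date_added > p2.date_added
    LEFT JOIN ic_main_cats AS icm ON icm.main_cat_id = p.main_cat_id
    WHERE p2.date_added IS NULL
      AND p.main_cat_id IN (1, 2)
    ORDER BY p.date_added ASC
""").fetchall()

print(earliest)
```

One caveat with this pattern: if two products in a category share the same earliest date_added, both rows are returned, so a tie-breaker (for instance on product_id) would be needed to guarantee exactly one row per category.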
This should work ...
SELECT p.main_cat_id as cat_id, product_name,main_cat_title, MAX(date_added) FROM ic_products p
JOIN ic_main_cats icm on icm.main_cat_id=p.main_cat_id
WHERE p.main_cat_id IN (1,2,12,22,6,8)
GROUP BY p.main_cat_id
ORDER BY date_added ASC
LIMIT 6