Best way to write this query? Several JOINS - mysql

I have this query (below). While it does work, I am wondering if it is the best approach, as it will be run against thousands of records. I will try to explain as best I can.
SELECT i.*,
p.file AS item_pic,
i_f.id AS favorite_id,
COALESCE(f.favorite_count, 0) AS favorite_count,
COALESCE(b.num_buys, 0) AS num_buys,
COALESCE(c.comment_count, 0) AS comment_count
FROM items i
INNER JOIN (SELECT file,
item_id
FROM item_pics
ORDER BY item_pics.id ASC) AS p
ON p.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS favorite_count,
item_id
FROM item_favorites
GROUP BY item_id) AS f
ON f.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS num_buys,
item_id
FROM purchases
GROUP BY item_id) AS b
ON b.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS comment_count,
item_id
FROM comments
GROUP BY item_id) AS c
ON c.item_id = i.id
LEFT JOIN item_favorites AS i_f
ON i.id = i_f.item_id
AND i_f.userid = '14'
GROUP BY i.id
LIMIT 0, 20
So we are selecting the items in the database. The first join is for a picture (Items have multiple pictures but I only want one).
The next join is for the favorite count. Each time a user favorites something, a row with some info is added to the item_favorites table, so I am just trying to get the total number of favorites for that item.
Next up is the number of purchases for this item. Pretty much the same as favorites.
After that it is for comments. Again this is just like the purchases and favorites count.
The last join is to see if the logged-in user (id 14) has favorited this item; if not, I use COALESCE to return 0.
Like I said, this all works correctly, but it takes a few seconds to load only 20 rows at a time (I do a scrolling/load similar to Facebook/Twitter) on a table of about 6700 items, with about 180K rows in the purchases table. Indexes have been properly set up on all tables. Once this is complete/correct, I would like to know how to limit the results to purchases in the last seven days and order by number of purchases (num_buys).
EDIT: Results from EXPLAIN

I suppose you want the first picture (lowest id), and that pictures are required, whereas everything else is optional.
I guess you're doing subqueries because you think joining on uncorrelated subqueries (hitting the joined tables just once) will be faster than correlated subqueries or a plain JOIN. However, you end up having to look up the records twice, and the second lookup (for the actual join) doesn't get to use an index because derived (temporary) tables don't have indexes.
Try normal JOINs:
SELECT i.*,
p.file AS item_pic,
COALESCE(i_f.id, 0) AS favorite_id,
COUNT(f.item_id) AS favorite_count,
COUNT(b.item_id) AS num_buys,
COUNT(c.item_id) AS comment_count
FROM items i
STRAIGHT_JOIN item_pics p
ON p.item_id = i.id
LEFT JOIN item_pics p2
ON p2.item_id = i.id
AND p2.id < p.id
LEFT JOIN item_favorites f
ON f.item_id = i.id
LEFT JOIN purchases b
ON b.item_id = i.id
LEFT JOIN comments c
ON c.item_id = i.id
LEFT JOIN item_favorites AS i_f
ON i_f.item_id = i.id
AND i_f.userid = '14'
WHERE p2.id IS NULL
GROUP BY i.id
LIMIT 20
The double join on pictures is an anti-join: combined with WHERE p2.id IS NULL, it keeps only the picture for which no picture with a lower id exists, i.e. the picture with the lowest id.
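One caveat with plain JOINs on several one-to-many tables, shown as a minimal sketch: the joined rows multiply before GROUP BY, so COUNT(f.item_id) ends up counting favorites times purchases. The example uses Python's sqlite3 module only so that it is runnable (the question is about MySQL), the rows are made up, and it assumes that item_favorites and purchases each have an id primary key. COUNT(DISTINCT ...) on those keys is one way to undo the multiplication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_pics (id INTEGER PRIMARY KEY, item_id INTEGER, file TEXT);
CREATE TABLE item_favorites (id INTEGER PRIMARY KEY, item_id INTEGER, userid INTEGER);
CREATE TABLE purchases (id INTEGER PRIMARY KEY, item_id INTEGER);

-- Hypothetical sample data: item 1 has 2 pictures, 2 favorites, 3 purchases.
INSERT INTO items VALUES (1, 'lamp'), (2, 'chair');
INSERT INTO item_pics VALUES (10, 1, 'lamp_a.jpg'), (11, 1, 'lamp_b.jpg'), (12, 2, 'chair.jpg');
INSERT INTO item_favorites VALUES (1, 1, 14), (2, 1, 15);
INSERT INTO purchases VALUES (1, 1), (2, 1), (3, 1);
""")

QUERY = """
SELECT i.id,
       p.file               AS item_pic,
       COUNT(f.item_id)     AS naive_favorites,  -- inflated by the join fan-out
       COUNT(DISTINCT f.id) AS favorite_count,   -- assumes item_favorites.id is unique
       COUNT(DISTINCT b.id) AS num_buys          -- assumes purchases.id is unique
FROM items i
JOIN item_pics p ON p.item_id = i.id
-- Anti-join: p2 finds pictures of the same item with a lower id than p.
LEFT JOIN item_pics p2 ON p2.item_id = i.id AND p2.id < p.id
LEFT JOIN item_favorites f ON f.item_id = i.id
LEFT JOIN purchases b ON b.item_id = i.id
WHERE p2.id IS NULL          -- keep p only if no lower-id picture exists
GROUP BY i.id, p.file
ORDER BY i.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

For item 1 the naive count comes out as 6 (2 favorites x 3 purchases), while the DISTINCT counts give 2 and 3. Whether this is faster than the derived-table version on real data would have to be checked with EXPLAIN.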

Related

Update query optimization with large data

I have products and categories tables, and a pivot table named product_catalog. I need to update the product_catalog table so that I can remove the categories which have fewer than five products. The products in these redundant categories should move to their parent categories. I have written a query for this, but the problem is that the product_catalog table has 55213277 records in it and the query takes a long time to run.
Basically it is a nested query, and we have to run it repeatedly until no category with fewer than five products is left.
Here is the SQL query I tested.
Can you propose an optimized solution?
UPDATE product_catalogT AS C
INNER JOIN
(SELECT
COUNT(*) AS tp, catalog_id cid, g.parent_id pid
FROM
product_catalog AS p
LEFT JOIN catalog AS g ON p.catalog_id = g.id
Where g.parent_id <> 0
GROUP BY catalog_id
HAVING tp < 5)
AS A ON C.catalog_id = A.cid
SET
C.catalog_id = A.pid
Here's a little less writing, but for performance we'd need to see your tables, indexes, and the EXPLAIN, as mentioned.
UPDATE product_catalogT C
JOIN
( SELECT p.catalog_id AS cid, g.parent_id AS pid
FROM product_catalog p
JOIN catalog g
ON p.catalog_id = g.id
WHERE g.parent_id <> 0
GROUP BY p.catalog_id, g.parent_id
HAVING COUNT(*) < 5
) A
ON C.catalog_id = A.cid
SET C.catalog_id = A.pid
Also, I might mention that this seems like a rather strange request.

Duplicated rows

SQL Query:
SELECT
T.*,
U.nick AS author_nick,
P.id AS post_id,
P.name AS post_name,
P.author AS post_author_id,
P.date AS post_date,
U2.nick AS post_author
FROM
zero_topics T
LEFT JOIN
zero_posts P
ON
T.id = P.topic_id
LEFT JOIN
zero_players U
ON
T.author = U.uuid
LEFT JOIN
zero_players U2
ON
P.author = U2.uuid
ORDER BY
CASE
WHEN P.date is null THEN T.date
ELSE P.date
END DESC
Output:
Topics:
Posts:
Question: Why do I have a duplicated topic id 22? I have two topics in MySQL (id 22 and 23) and two posts (id 24 and 25). I want to see each topic with only its last post.
If a join produces multiple results and you want at most one result, you have to rewrite the join and/or filtering criteria to provide that result. If you want only the latest of all the results, that is doable, and the pattern becomes reasonably easy once you have used it a few times.
select a.Data, b.Data
from Table1 a
left join Table2 b
on b.JoinValue = a.JoinValue
and b.DateField =(
select Max( DateField )
from Table2
where JoinValue = b.JoinValue );
The correlated subquery pulls out the one date that is the highest (most recent) value of all the joinable candidates. That then becomes the row that takes part in the join -- or, of course, nothing if there are no candidates at all. This is a pattern I use quite a lot.
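Applied to the tables from the question, the pattern might look like the sketch below. It runs against SQLite through Python's sqlite3 module only so that it is self-contained; the dates and names are made up, and the assumption that both posts belong to topic 22 is inferred from the duplicated row described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE zero_topics (id INTEGER PRIMARY KEY, name TEXT, date TEXT);
CREATE TABLE zero_posts (id INTEGER PRIMARY KEY, topic_id INTEGER, name TEXT, date TEXT);

-- Hypothetical data: topic 22 has two posts, topic 23 has none.
INSERT INTO zero_topics VALUES (22, 'first topic', '2015-01-01'),
                               (23, 'second topic', '2015-01-02');
INSERT INTO zero_posts VALUES (24, 22, 'older post', '2015-01-03'),
                              (25, 22, 'newer post', '2015-01-04');
""")

QUERY = """
SELECT T.id, P.id AS post_id, P.name AS post_name
FROM zero_topics T
LEFT JOIN zero_posts P
  ON P.topic_id = T.id
 -- Only the post carrying the latest date of its topic may join.
 AND P.date = (SELECT MAX(P2.date)
               FROM zero_posts P2
               WHERE P2.topic_id = P.topic_id)
ORDER BY COALESCE(P.date, T.date) DESC
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Topic 22 now appears once, with post 25, and topic 23 appears with NULLs. If two posts of a topic share exactly the same latest date, both would still join, so a tie-breaker (for example the highest post id) may be needed.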

Stop duplication of data in left join

I have a query that selects data from several tables using LEFT JOINS. The problem is data is being duplicated.
Here's the query
SELECT
A.ID,
T.T_ID,
T.name,
T.pic,
T.timestamp AS T_ts,
(SELECT COUNT(*) FROM track_plays WHERE T_ID = T.T_ID) AS plays,
(SELECT COUNT(*) FROM track_downloads WHERE T_ID = T.T_ID) AS downloads,
S.S_ID,
S.status,
S.timestamp AS S_ts,
G.G_ID,
G.gig_name,
G.date_time,
G.lineup,
G.price,
G.currency,
G.pic AS G_pic,
G.ticket,
G.venue,
G.timestamp AS G_ts
FROM artists A
LEFT JOIN TRACKS T
ON T.ID = A.ID
LEFT JOIN STATUS S
ON S.ID = A.ID
LEFT JOIN GIGS G
ON G.ID = A.ID
WHERE A.ID = '$ID'
ORDER BY S_ts, G_ts, T_ts DESC LIMIT 20
The problem is data is duplicated if one of the tables in the join has more data than another. So if tracks has 1 row, status has 2 and gigs has no rows you would get the data from tracks doubled.
I have tried using GROUP BY A.ID, but that eliminates data. So in the example given before, there would only be one row of status shown.
I've also tried GROUP_CONCAT, but I am unsure about that function, so I can't tell you much.
Using SELECT DISTINCT has the same effect as just the GROUP BY A.ID.
Assuming that artists -> gigs and artists -> tracks are 1-N mappings, you have two choices (both of which were covered in the comments on your OP).
1) Specify which of the N rows you want to get back to achieve a 1-1 map:
FROM artists A
LEFT JOIN TRACKS T ON T.ID = A.ID AND T.<SOMETHING> = SOMETHING
LEFT JOIN STATUS S ON S.ID = A.ID
LEFT JOIN GIGS G ON G.ID = A.ID AND G.<SOMETHING> = SOMETHING
2) Do the joins as you wrote and get multiple entries for tracks and gigs and then pivot them in your calling application. Generally you'd put an ORDER BY clause in the query and check for the same artist key and pivot the list.
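Option 2 could be sketched in the calling application roughly as follows. The row layout (artist_id, track_id, status_id, gig_id) and the pivot function are hypothetical and chosen only for illustration; the sample rows mirror the case from the question, where tracks has one row, status has two and gigs has none.

```python
from collections import defaultdict

# Hypothetical joined rows: (artist_id, track_id, status_id, gig_id).
# The single track is repeated once per status row; None means "no match".
rows = [
    (1, 100, 200, None),
    (1, 100, 201, None),
]


def pivot(rows):
    """Collapse the repeated join rows into one entry per artist."""
    artists = defaultdict(lambda: {"tracks": [], "statuses": [], "gigs": []})
    for artist_id, track_id, status_id, gig_id in rows:
        entry = artists[artist_id]
        for key, value in (("tracks", track_id),
                           ("statuses", status_id),
                           ("gigs", gig_id)):
            # Skip NULLs from the LEFT JOIN and values already collected.
            if value is not None and value not in entry[key]:
                entry[key].append(value)
    return dict(artists)


result = pivot(rows)
print(result)
```

The duplicated track collapses to a single entry while both status rows survive. With three independent 1-N tables, running one query per table is another common way to avoid the multiplication altogether.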

Adding a Subquery to a Query in SQL

I have a query that works very well. Let me start with it:
Edit: The SQL has been updated. I get 0 in every row.
SELECT i.item, i.user_id, u.username,
(COALESCE(r.ratetotal, 0)) AS total,
(COALESCE(c.commtotal, 0)) AS comments,
(COALESCE(r.rateav, '50%')) AS rate,
(COALESCE(x.wasRated, '0')) AS wasRated
FROM items AS i
LEFT JOIN master_cat AS c
ON (c.cat_id = i.cat_id)
LEFT JOIN users AS u
ON u.user_id = i.user_id
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS ratetotal,
AVG(rating) AS rateav
FROM ratings GROUP BY item_id) AS r
ON r.item_id = i.item_id
LEFT JOIN
(SELECT item_id,
COUNT(item_id) AS commtotal
FROM reviews GROUP BY item_id) AS c
ON c.item_id = i.item_id
LEFT JOIN
(SELECT xu.user_id, ra.item_id, '1' AS wasRated
FROM users AS xu
LEFT JOIN ratings AS ra
ON ra.user_id = xu.user_id
WHERE xu.user_id = '1') AS x
ON x.user_id = u.user_id
AND x.item_id = r.item_id
WHERE c.category = 'Movies'
ORDER by i.item ASC;
I need to add one more function to it, where you see AS x
Basically, there are three tables here that are important. items, reviews and ratings. In the top portion you see there are subqueries that are taking statistics such as averages and totals for each item.
I need a final query that is tied to user_id, item_id and rate_id (in ratings). In the end result, where it list each item and the stats with it, I want one more column, a simple true or false if logged in user has rated it. So I need something like this:
SELECT ???
FROM ratings AS r
WHERE r.user_id = '{$user_id}'
(The user_id of the logged-in user is passed in from PHP.)
How can I make a subquery that gives me that last bit of info, but puts it in each row of items in the parent query?
Add this to the parent query.
, coalesce(x.WasRated, 'false') as WasRated
Your x subquery is:
(select users.user_id
, ratings.item_id
, 'true' WasRated
from users join ratings on users.user_id = ratings.user_id
where users.user_id = the one for the logged in user
) x on x.user_id = users.user_id
and x.item_id = ratings.item_id
or something like it.
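A runnable sketch of that idea, under some assumptions: it uses SQLite through Python's sqlite3 module rather than MySQL, the sample rows are invented, and the x subquery is joined on item_id alone because it is already restricted to the logged-in user. Joining x additionally on the item owner's user_id would only match items that the logged-in user also owns, which may be why the query in the question shows 0 in every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, item TEXT, user_id INTEGER);
CREATE TABLE ratings (rate_id INTEGER PRIMARY KEY, item_id INTEGER,
                      user_id INTEGER, rating INTEGER);

-- Hypothetical data: user 1 has rated 'Alien' but not 'Brazil'.
INSERT INTO items VALUES (1, 'Alien', 2), (2, 'Brazil', 2);
INSERT INTO ratings VALUES (1, 1, 1, 80), (2, 1, 2, 60), (3, 2, 2, 40);
""")

QUERY = """
SELECT i.item,
       COALESCE(x.WasRated, 'false') AS WasRated
FROM items i
LEFT JOIN (SELECT DISTINCT item_id, 'true' AS WasRated
           FROM ratings
           WHERE user_id = ?) x      -- bound to the logged-in user
  ON x.item_id = i.item_id
ORDER BY i.item
"""

logged_in_user = 1  # stands in for the value passed in from PHP
rows = conn.execute(QUERY, (logged_in_user,)).fetchall()
for row in rows:
    print(row)
```

Binding the user id as a parameter, rather than interpolating it into the SQL string, also avoids injection problems.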

MySQL query, dealing with active and inactive products

Facing a problem and not getting the hint for a few hours. Maybe anyone can help me out.
Have the following query which shows the Topsellers. So the status of the product (active or not) is saved in b.Article_Status (0=inactive, 1=active).
How do I get the products of the result list which have no active product in the product family at the moment? But the product shall still be shown if an old one that was ordered (and so is in table order_items) is now inactive and the active one has not been ordered yet.
The current query looks as follows. I already found a solution which works when the currently active product has been ordered once, but the problem remains in the case mentioned above.
SELECT count( a.order_itemid ) AS numOrders, c.Product_ID, c.Product_Name, d.producer_name
FROM order_items a
LEFT OUTER JOIN product_article b ON b.Article_ID = a.order_itemid
LEFT OUTER JOIN product c ON b.Article_Productid = c.Product_ID
LEFT OUTER JOIN producer d ON c.Product_Producer = d.producer_id
GROUP BY c.Product_ID
ORDER BY `numOrders` DESC
The solution was a WHERE EXISTS subquery:
SELECT count( a.order_itemid ) AS numOrders, c.Product_ID, c.Product_Name, d.producer_name
FROM order_items a
LEFT OUTER JOIN product_article b ON b.Article_ID = a.order_itemid
LEFT OUTER JOIN product c ON b.Article_Productid = c.Product_ID
LEFT OUTER JOIN producer d ON c.Product_Producer = d.producer_id
WHERE EXISTS (SELECT * FROM product_article x WHERE c.Product_ID = x.Article_Productid AND x.Article_Status = 1)
GROUP BY c.Product_ID
ORDER BY `numOrders` DESC
LIMIT 5
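To illustrate what the EXISTS filter does, here is a small sketch with invented rows, run against SQLite through Python's sqlite3 module (the original is MySQL, and the producer join is left out for brevity). Product 1 has an inactive article that was ordered twice plus an active article that was never ordered; product 2 has only an inactive article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (Product_ID INTEGER PRIMARY KEY, Product_Name TEXT);
CREATE TABLE product_article (Article_ID INTEGER PRIMARY KEY,
                              Article_Productid INTEGER,
                              Article_Status INTEGER);
CREATE TABLE order_items (id INTEGER PRIMARY KEY, order_itemid INTEGER);

-- Hypothetical data.
INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO product_article VALUES (11, 1, 0),  -- old Widget article, inactive
                                   (12, 1, 1),  -- new Widget article, active
                                   (21, 2, 0);  -- only Gadget article, inactive
INSERT INTO order_items (order_itemid) VALUES (11), (11), (21), (21), (21);
""")

QUERY = """
SELECT COUNT(a.order_itemid) AS numOrders, c.Product_ID, c.Product_Name
FROM order_items a
LEFT OUTER JOIN product_article b ON b.Article_ID = a.order_itemid
LEFT OUTER JOIN product c ON b.Article_Productid = c.Product_ID
-- Keep a product only if at least one of its articles is currently active.
WHERE EXISTS (SELECT 1
              FROM product_article x
              WHERE x.Article_Productid = c.Product_ID
                AND x.Article_Status = 1)
GROUP BY c.Product_ID, c.Product_Name
ORDER BY numOrders DESC
LIMIT 5
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Widget is listed with its two orders of the old article even though the active article has never been ordered, while Gadget drops out despite having more orders, because none of its articles is active.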