Stop duplication of data in left join - mysql

I have a query that selects data from several tables using LEFT JOINS. The problem is data is being duplicated.
Here's the query
SELECT
A.ID,
T.T_ID,
T.name,
T.pic,
T.timestamp AS T_ts,
(SELECT COUNT(*) FROM track_plays WHERE T_ID = T.T_ID) AS plays,
(SELECT COUNT(*) FROM track_downloads WHERE T.T_ID) AS downloads,
S.S_ID,
S.status,
S.timestamp AS S_ts,
G.G_ID,
G.gig_name,
G.date_time,
G.lineup,
G.price,
G.currency,
G.pic AS G_pic,
G.ticket,
G.venue,
G.timestamp AS G_ts
FROM artists A
LEFT JOIN TRACKS T
ON T.ID = A.ID
LEFT JOIN STATUS S
ON S.ID = A.ID
LEFT JOIN GIGS G
ON G.ID = A.ID
WHERE A.ID = '$ID'
ORDER BY S_ts, G_ts AND T_ts DESC LIMIT 20
The problem is data is duplicated if one of the tables in the join has more data than another. So if tracks has 1 row, status has 2 and gigs has no rows you would get the data from tracks doubled.
I have tried using GROUP BY A.ID but that eliminates data. So in the example given before there would nly be one row of status show.
I've also tried GROUP_CONCAT but am unsure on that function so can't tell you much.
USING SELECT DISTINCT has the same effect as just the GROUP BY A.ID.

Assuming that artists -> gigs and artists -> tracks are 1-N mappings then you have two choices. (both of which were covered in the comments on your OP
1) Specify which of the N rows you want to get back to achieve a 1-1 map:
FROM artists A
LEFT JOIN TRACKS T ON T.ID = A.ID AND T.<SOMETHING> = SOMETHING
LEFT JOIN STATUS S ON S.ID = A.ID
LEFT JOIN GIGS G ON G.ID = A.ID AND G.<SOMETHING> = SOMETHNING
2) Do the joins as you wrote and get multiple entries for tracks and gigs and then pivot them in your calling application. Generally you'd put an ORDER BY clause in the query and check for the same artist key and pivot the list.

Related

Update query optimization with large data

I have products , and categories table, and a pivot table named product_catalog, I need to update the product_catalog table so that I can remove the categories which have less than five products. Those products which are in these redundant categories should move to their parent categories. I have written a query for this but problem is that this product_catalog table has 55213277 records in it and it takes lot of time to run .
Basically it is a nested query and we have to run this query for as many times unless there is no category left having less than five products.
Here is my sql query I tested.
Can you propose me an optimized solution.
UPDATE product_catalogT AS C
INNER JOIN
(SELECT
COUNT(*) AS tp, catalog_id cid, g.parent_id pid
FROM
product_catalog AS p
LEFT JOIN catalog AS g ON p.catalog_id = g.id
Where g.parent_id <> 0
GROUP BY catalog_id
HAVING tp < 5)
AS A ON C.catalog_id = A.cid
SET
C.catalog_id = A.pid
Here's a little less writing, but for performance we'd need to see your tables, indexes, and the EXPLAIN, as mentioned.
UPDATE product_catalogT C
JOIN
( SELECT p.catalog_id
FROM product_catalog p
JOIN catalog g
ON p.catalog_id = g.id
Where g.parent_id <> 0
GROUP
BY catalog_id
HAVING COUNT(*) < 5
) A
ON C.catalog_id = A.cid
SET C.catalog_id = A.pid
Also, I might mention that this seems like a rather strange request

Improve MySql query left outer joins with subquery

We are maintaining a history of Content. We want to get the updated entry of each content, with create Time and update Time should be of the first entry of the Content. The query contains multiple selects and where clauses with so many left joins. The dataset is very huge, thereby query is taking more than 60 seconds to execute. Kindly help in improving the same. Query:
select * from (select * from (
SELECT c.*, initCMS.initcreatetime, initCMS.initupdatetime, user.name as partnerName, r.name as rightsName, r1.name as copyRightsName, a.name as agelimitName, ct.type as contenttypename, cat.name as categoryname, lang.name as languagename FROM ContentCMS c
left join ContentCategoryType ct on ct.id = c.contentType
left join User user on c.contentPartnerId = user.id
left join Category cat on cat.id = c.categoryId
left join Language lang on lang.id = c.languageCode
left join CopyRights r on c.rights = r.id
left join CopyRights r1 on c.copyrights = r1.id
left join Age a on c.ageLimit = a.id
left outer join (
SELECT contentId, createTime as initcreatetime, updateTime as initupdatetime from ContentCMS cms where cms.deleted='0'
) as initCMS on initCMS.contentId = c.contentId WHERE c.deleted='0' order by c.id DESC
) as temp group by contentId) as c where c.editedBy='0'
Any help would be highly appreciated. Thank you.
Just a partial eval and suggestion because your query seems non properly formed
This left join seems unuseful
FROM ContentCMS c
......
left join (
SELECT contentId
, createTime as initcreatetime
, updateTime as initupdatetime
from ContentCMS cms
where cms.deleted='0'
) as initCMS on initCMS.contentId = c.contentId
same table
the order by (without limit) in a subquery in join is unuseful because join ordered values or unordered value produce the same result
the group by contentId is strange beacuse there aren't aggregation function and the sue of group by without aggregation function is deprecated is sql
and in the most recente version for mysql is not allowed (by deafult) if you need distinct value or just a rows for each contentId you should use distinct or retrive the value in a not casual manner (the use of group by without aggregation function retrive casual value for not aggregated column .
for a partial eval your query should be refactored as
SELECT c.*
, c.initcreatetime
, c.initupdatetime
, user.name as partnerName
, r.name as rightsName
, r1.name as copyRightsName
, a.name as agelimitName
, ct.type as contenttypename
, cat.name as categoryname
, lang.name as languagename
FROM ContentCMS c
left join ContentCategoryType ct on ct.id = c.contentType
left join User user on c.contentPartnerId = user.id
left join Category cat on cat.id = c.categoryId
left join Language lang on lang.id = c.languageCode
left join CopyRights r on c.rights = r.id
left join CopyRights r1 on c.copyrights = r1.id
WHERE c.deleted='0'
) as temp
for the rest you should expiclitally select the column you effectively need add proper aggregation function for the others
Also the nested subquery just for improperly reduce the rows don't help performance ... you should also re-eval you data modelling and design.

mySQL Sub Select needed

I have three tables, libraryitems, copies and loans.
A libraryitem hasMany copies, and a copy hasMany loans.
I'm trying to get the latest loan entry for a copy only; The query below returns all loans for a given copy.
SELECT
libraryitems.title,
copies.id,
copies.qruuid,
loans.id AS loanid,
loans.status,
loans.byname,
loans.byemail,
loans.createdAt
FROM copies
INNER JOIN libraryitems ON copies.libraryitemid = libraryitems.id AND libraryitems.deletedAt IS NULL
LEFT OUTER JOIN loans ON copies.id = loans.copyid
WHERE copies.libraryitemid = 1
ORDER BY copies.id ASC, loans.createdAt DESC
I know there needs to be a sub select of some description in here, but struggling to get the correct syntax. How do I only return the latest, i.e MAX(loans.createdAt) row for each distinct copy? Just using group by copies.id returns the earliest, rather than latest entry.
Image example below:
in the subquery , getting maximum created time for a loan i.e. latest entry and joining back with loans to get other details.
SELECT
T.title,
T.id,
T.qruuid,
loans.id AS loanid,
loans.status,
loans.byname,
loans.byemail,
loans.createdAt
FROM
(
SELECT C.id, C.qruuid, L.title, MAX(LN.createdAt) as maxCreatedTime
FROM Copies C
INNER JOIN libraryitems L ON C.libraryitemid = L.id
AND L.deletedAt IS NULL
LEFT OUTER JOIN loans LN ON C.id = LN.copyid
GROUP BY C.id, C.qruuid, L.title) T
JOIN loans ON T.id = loans.copyid
AND T.maxCreatedTime = loans.createdAt
A self left join on loans table will give you latest loan of a copy, you may join the query to the other tables to fetch the desired output.
select * from loans A
left outer join loans B
on A.copyid = B.copyid and A.createdAt < B.createdAt
where B.createdAt is null;
This is your query with one simple modification -- table aliases to make it clearer.
SELECT li.title, c.id, c.qruuid,
l.id AS loanid, l.status, l.byname, l.byemail, l.createdAt
FROM copies c INNER JOIN
libraryitems li
ON c.libraryitemid = li.id AND
li.deletedAt IS NULL LEFT JOIN
loans l
ON c.id = l.copyid
WHERE c.libraryitemid = 1
ORDER BY c.id ASC, l.createdAt DESC ;
With this as a beginning let's think about what you need. You want the load with the latest createdAt date for each c.id. You can get this information with a subquery:
select l.copyid, max(createdAt)
from loans
group by l.copyId
Now, you just need to join this information back in:
SELECT li.title, c.id, c.qruuid,
l.id AS loanid, l.status, l.byname, l.byemail, l.createdAt
FROM copies c INNER JOIN
libraryitems li
ON c.libraryitemid = li.id AND
li.deletedAt IS NULL LEFT JOIN
loans l
ON c.id = l.copyid LEFT JOIN
(SELECT l.copyid, max(l.createdAt) as maxca
FROM loans
GROUP BY l.copyid
) lmax
ON l.copyId = lmax.copyId and l.createdAt = lmax.maxca
WHERE c.libraryitemid = 1
ORDER BY c.id ASC, l.createdAt DESC ;
This should give you the most recent record. And, the use of left join should keep all copies, even those that have never been leant.

Best way to write this query? Several JOINS

I have this query (below) while it does work I am wondering if it is the best as it will be going against thousands of records. I will try to explain the best I can.
SELECT items.*,
p.file AS item_pic,
i_f.id AS favorite_id,
COALESCE(f.favorite_count, 0) AS favorite_count,
COALESCE(b.num_buys, 0) AS num_buys,
COALESCE(c.comment_count, 0) AS comment_count
FROM items i
INNER JOIN (SELECT file,
item_id
FROM item_pics
ORDER BY item_pics.id ASC) AS p
ON p.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS favorite_count,
item_id
FROM item_favorites
GROUP BY item_id) AS f
ON f.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS num_buys,
item_id
FROM purchases
GROUP BY item_id) AS b
ON b.item_id = i.id
LEFT JOIN (SELECT COUNT(*) AS comment_count,
item_id
FROM comments
GROUP BY item_id) AS c
ON c.item_id = i.id
LEFT JOIN item_favorites AS i_f
ON i.id = i_f.item_id
AND i_f.userid = '14'
GROUP BY i.id
LIMIT 0, 20
So we are selecting the items in the database. The first join is for a picture (Items have multiple pictures but I only want one).
The next join is for favorite count. Each time a user favorites something it adds it to the table favorites with some info, so I am just trying to get the total number of favorites for that item.
Next up is the number of purchases for this item. Pretty much the same as favorites.
After that it is for comments. Again this is just like the purchases and favorites count.
The last join is to see if the logged in user (id 14) has favorited this item if not I use COALESCE to return 0.
Like I said this all works correctly but it does take a few seconds to load on a table of about 6700 items and about 180K rows in the purchases table for only loading 20 at a time (I do a scrolling/load similar to Facebook/Twitter). Indexes have been properly setup on all tables. Once this is complete/correct I would like to know how to limit results for purchases in the last seven days and order by number of purchases (num_buys).
EDIT: Results from EXPLAIN
I suppose you want the first picture (lowest id), and pictures are required, where as everything else is optional.
I guess you're doing subqueries because you think joining on uncorrelated subqueries (hitting the joined tables just once) will be faster than correlated subqueries or a plain JOIN. However, you end up having to lookup the records twice, and the second lookup (for the actual join) doesn't get to use an index because derived (temporary tables) don't have indexes.
Try normal JOINs:
SELECT items.*,
p.file AS item_pic,
COALESCE(i_f.id, 0) AS favorite_id,
COUNT(f.item_id) AS favorite_count,
COUNT(b.item_id) AS num_buys,
COUNT(c.item_id) AS comment_count
FROM items i
STRAIGHT_JOIN item_pics p
ON p.item_id = i.id
LEFT JOIN item_pics p2
ON p2.item_id = i.id
AND p2.id < p1.id
LEFT JOIN item_favorites f
ON f.item_id = i.id
LEFT JOIN purchases b
ON b.item_id = i.id
LEFT JOIN comments c
ON c.item_id = i.id
LEFT JOIN item_favorites AS i_f
ON i_f.item_id = i.id
AND i_f.userid = '14'
WHERE p2.id IS NULL
GROUP BY i.id
LIMIT 20
The double join on pictures is an anti-join WHERE p2.id IS NULL, to retrieve the picture with the lowest id.

MySQL query, dealing with active and inactive products

Facing a problem and not getting the hint for a few hours. Maybe onyone can help me out.
Have the following query which shows the Topsellers. So the status of the product (active or not) is saved in b.Article_Status (0=inactive, 1=active).
How do I get the products of the result list which have no active product in the productfamily at the moment. But the product shall still be shown if an old one was ordered (and so is in table order_items) is now inactive and the active one was not ordered yet.
Actual query looks as follow. Already fund a solution which works when the actual active product has been ordered once, but still the problem with the mentioned case.
SELECT count( a.order_itemid ) AS numOrders, c.Product_ID, c.Product_Name, d.producer_name
FROM order_items a
LEFT OUTER JOIN product_article b ON b.Article_ID = a.order_itemid
LEFT OUTER JOIN product c ON b.Article_Productid = c.Product_ID
LEFT OUTER JOIN producer d ON c.Product_Producer = d.producer_id
GROUP BY c.Product_ID
ORDER BY `numOrders` DESC
Solution was a WHERE EXISTS subquery
SELECT count( a.order_itemid ) AS numOrders, c.Product_ID, c.Product_Name, d.producer_name
FROM order_items a
LEFT OUTER JOIN product_article b ON b.Article_ID = a.order_itemid
LEFT OUTER JOIN product c ON b.Article_Productid = c.Product_ID
LEFT OUTER JOIN producer d ON c.Product_Producer = d.producer_id
WHERE EXISTS (SELECT * FROM product_article x WHERE c.Product_ID = x.Article_Productid AND x.Article_Status = 1)
GROUP BY c.Product_ID
ORDER BY `numOrders` DESC
LIMIT 5