Wrong MySQL query result using JOIN tables - mysql

The following MySQL query is supposed to rank the posts according to their views + rating + submit date in an ascending order:
select
cat ,
p.id ,
title ,
p.date ,
shares ,
source ,
cat ,
count(v.post_id) views ,
sum(r.ilike) rating,
r.module ,
r.module_id ,
@Rank := @Rank + 1 AS Rank
from
posts p
JOIN
rates r
on
r.module_id = p.id
AND r.module = 'posts'
JOIN
posts_views v
on
v.post_id = p.id
WHERE
p.date <= UNIX_TIMESTAMP(NOW())
AND p.state = '3'
AND
(
p.cat NOT REGEXP '[[:<:]]15[[:>:]]'
)
GROUP BY
r.module_id
ORDER BY
rating DESC ,
views DESC ,
p.date ASC LIMIT 0, 10
Gives the following result:
We have 3 problems in the result:
the views column values are doubled
the rating column values are copying the views' value
the Rank column is NULL

The query is generating a semi-Cartesian product. With multiple matching rows from r and multiple matching rows from v, those rows are getting matched together, inflating the results for rating and views. If we remove the GROUP BY and the aggregate functions, and get detail rows back, we can observe the "duplicate" rows that are causing the views count to be doubled, tripled...
One fix for this is to avoid the Cartesian product by pre-aggregating at least one of the child tables in an inline view. Then we join the derived table to the posts table to bring the aggregate into the outer query.
We probably want to consider using an outer join to handle the condition when there are no matching rows in views or rates, so we can return zero count for posts that don't have any views.
Initialize user defined variables, either as a separate statement, or within an inline view.
Also, we want to qualify all column references, both as an aid to the future reader (not force them to look at table definitions to figure out which table a column like cat or title or source is coming from), and to avoid the query breaking with an "ambiguous column" error, in the future when a column of the same name is added to one of the tables referenced in the query.
I suggest something like this:
SELECT p.cat
, p.id
, p.title
, p.date
, p.shares
, p.source
, p.cat
, IFNULL(v.cnt_views,0) AS views
, r.tot_rating AS rating
, r.module
, r.module_id
, @Rank := @Rank + 1 AS Rank
FROM ( SELECT @Rank := 0 ) i
CROSS
JOIN posts p
LEFT
JOIN ( SELECT ra.module_id
, MAX(ra.module) AS module
, SUM(ra.ilike) AS tot_rating
FROM rates ra
WHERE ra.module = 'posts'
GROUP
BY ra.module_id
) r
ON r.module_id = p.id
LEFT
JOIN ( SELECT pv.post_id
, SUM(1) AS cnt_views
FROM posts_views pv
GROUP
BY pv.post_id
) v
ON v.post_id = p.id
WHERE p.date <= UNIX_TIMESTAMP(NOW())
AND p.state = '3'
AND p.cat NOT REGEXP '[[:<:]]15[[:>:]]'
ORDER
BY r.tot_rating DESC
, v.cnt_views DESC
, p.date ASC
LIMIT 0, 10
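The row inflation described above can be reproduced on a toy data set. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the rows are made up, and the tables are cut down to the columns that matter. Applying IFNULL to the rating as well is a choice made here, not part of the suggested query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE rates (module_id INTEGER, module TEXT, ilike INTEGER);
    CREATE TABLE posts_views (post_id INTEGER);
    INSERT INTO posts VALUES (1, 'first'), (2, 'second');
    -- post 1 has two ratings (sum 2) and three views; post 2 has neither
    INSERT INTO rates VALUES (1, 'posts', 1), (1, 'posts', 1);
    INSERT INTO posts_views VALUES (1), (1), (1);
""")

# Joining both child tables directly pairs every rating row with every
# view row, so both aggregates are computed over 2 x 3 = 6 rows.
naive = conn.execute("""
    SELECT p.id, COUNT(v.post_id) AS views, SUM(r.ilike) AS rating
      FROM posts p
      JOIN rates r ON r.module_id = p.id AND r.module = 'posts'
      JOIN posts_views v ON v.post_id = p.id
     GROUP BY p.id
""").fetchall()

# Pre-aggregating each child table leaves at most one row per post on
# each side of the join, so nothing is multiplied.
fixed = conn.execute("""
    SELECT p.id,
           IFNULL(v.cnt_views, 0)  AS views,
           IFNULL(r.tot_rating, 0) AS rating
      FROM posts p
      LEFT JOIN (SELECT module_id, SUM(ilike) AS tot_rating
                   FROM rates
                  WHERE module = 'posts'
                  GROUP BY module_id) r
        ON r.module_id = p.id
      LEFT JOIN (SELECT post_id, COUNT(*) AS cnt_views
                   FROM posts_views
                  GROUP BY post_id) v
        ON v.post_id = p.id
     ORDER BY p.id
""").fetchall()

print(naive)  # inflated counts, and post 2 is missing
print(fixed)  # true counts, and post 2 is kept with zeros
```

With this data the naive query reports 6 views and a rating of 6 for post 1, which matches the symptom of the rating "copying" the views value, while the pre-aggregated version reports 3 views and a rating of 2.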

Related

How to limit record before group by for pagination?

I have this query that will LEFT JOIN and GROUP BY to get SUM of column.
SELECT
c.id,
SUM(
r.score
) AS score_sum,
SUM(
CASE WHEN r.is_active = '0' THEN r.negative ELSE 0 END
) AS negative_sum
FROM comments AS c
LEFT JOIN rates AS r ON (r.comment_id = c.id)
WHERE r.comment_id = c.id
GROUP BY c.id
DB Fiddle link:
https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=fadba795d8426f91471fa4db83845b6f
The query works, but if the comments table is large (10K rows, for example), I need to implement pagination. How do I modify this query to limit the comments records first, before the GROUP BY?
In short:
Get the first 5 comments by limit to 5
Left join the table rates
Get the SUM by group by
Example, show the first 4 comments SUM
Thanks
You can use subquery to "select c.id from comments limit N" in the FROM clause.
select c.id,
sum(r.score) as score_sum,
SUM(
CASE WHEN r.is_active = '0' THEN r.negative ELSE 0 END
) AS negative_sum
from ( select c.id from comments c limit 2) c
LEFT JOIN rates AS r ON (r.comment_id = c.id)
GROUP BY c.id;
You may apply order by in the subquery to determine order in which you want to select the comments (Top N).
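As a rough illustration of limiting before grouping, the sketch below pages through comments with a derived table and only then joins the rates. It runs on Python's sqlite3 module rather than MySQL, the data is invented, and page_size and offset are hypothetical parameter names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments (id INTEGER PRIMARY KEY);
    CREATE TABLE rates (comment_id INTEGER, score INTEGER);
    INSERT INTO comments VALUES (1), (2), (3), (4);
    INSERT INTO rates VALUES (1, 5), (1, 7), (3, 2), (4, 9);
""")

page_size, offset = 2, 0  # hypothetical pagination parameters

# The derived table picks the page of comments first; the join and the
# aggregation then only touch the rates of those comments.
rows = conn.execute("""
    SELECT c.id, SUM(r.score) AS score_sum
      FROM (SELECT id FROM comments ORDER BY id LIMIT ? OFFSET ?) c
      LEFT JOIN rates r ON r.comment_id = c.id
     GROUP BY c.id
     ORDER BY c.id
""", (page_size, offset)).fetchall()

print(rows)
```

Comment 2 has no rates and still appears, with a NULL sum, because there is no WHERE condition on r that would turn the LEFT JOIN back into an inner join.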
Try the following:
SELECT
c.id,
SUM(
r.score
) AS score_sum,
SUM(
CASE WHEN r.is_active = '0' THEN r.negative ELSE 0 END
) AS negative_sum
FROM comments AS c
LEFT JOIN rates AS r ON (r.comment_id = c.id)
WHERE r.comment_id = c.id
GROUP BY c.id
ORDER BY c.id ASC
LIMIT 5
The rationale behind the above query is that id is the Primary key (hence indexed) in your comments table. Also, your GROUP BY and ORDER BY is on the same column, that is, id; so MySQL will first utilize the index on id and get first 5 rows (due to LIMIT), and then proceed forward to JOIN with other tables and do aggregation etc.
Give it a Try!! More details here: https://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
We can confirm the same using EXPLAIN .. on this query.

Mysql count result in "where" clause

I'm facing a little problem with mysql where clause.
This is the query:
SELECT u.id user
, p.id product_purchased
, p.name product_name
, pl.store_id store
, COUNT(*) occurrences
, total_spent
, total_product_purchased
, pl.registration
FROM purchases_log pl
JOIN user u
ON pl.user_id = u.id
JOIN product p
ON pl.product_id = p.id
JOIN
( SELECT user_id
, SUM(price) total_spent
, COUNT(product_id) total_product_purchased
FROM purchases_log pl
GROUP
BY user_id
) t1
ON u.id = t1.user_id
WHERE pl.store_id IN (1,2,3)
AND occurrences > 1
GROUP
BY user
, product_name
ORDER
BY u.id ASC
, pl.registration ASC;
This is the output error:
Error Code: 1054. Unknown column 'occurrences' in 'where clause' 0.067 sec
I have already tried adding AS to occurrences and prefixing it with pl.
So, can someone explain to me how to correctly use the result of a COUNT function in a WHERE clause?
You need to use HAVING instead of WHERE for this condition. GROUP BY is applied after the WHERE clause, so WHERE does not know about any group/aggregate columns, e.g.:
SELECT u.id user,p.id product_purchased, p.name product_name, pl.store_id store, COUNT(*) AS occurrences, total_spent, total_product_purchased, pl.registration
FROM purchases_log pl
JOIN user u ON pl.user_id=u.id
JOIN product p ON pl.product_id=p.id
JOIN (SELECT user_id, SUM(price) AS total_spent,COUNT(product_id) AS total_product_purchased FROM purchases_log pl GROUP BY user_id) t1 ON u.id=t1.user_id
WHERE pl.store_id IN (1,2,3)
GROUP BY user, product_name
HAVING COUNT(*) > 1
ORDER BY u.id ASC, pl.registration ASC;
Update
If a user has more than one product associated then it's good to add all the non aggregate columns in GROUP BY to get all the combinations of user and product. The current query will not return all the combinations.
For further optimization, as @strawberry has suggested, you can run EXPLAIN and see which indexes are used and whether there is any need to create a new index.
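The difference between the two filters can be seen on a small example. This is a minimal sketch on Python's sqlite3 module with made-up rows and a reduced purchases_log table; the point is only where each condition is evaluated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases_log (user_id INTEGER, product_id INTEGER, store_id INTEGER);
    INSERT INTO purchases_log VALUES
        (1, 10, 1), (1, 10, 2), (1, 11, 1), (2, 10, 3), (2, 10, 9);
""")

rows = conn.execute("""
    SELECT user_id, product_id, COUNT(*) AS occurrences
      FROM purchases_log
     WHERE store_id IN (1, 2, 3)      -- row filter, evaluated before grouping
     GROUP BY user_id, product_id
    HAVING COUNT(*) > 1               -- group filter, evaluated after grouping
     ORDER BY user_id, product_id
""").fetchall()

print(rows)
```

User 2 bought product 10 twice, but one of those purchases was in store 9, so WHERE removes it before the count is taken and the group no longer passes the HAVING condition.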

Count the number of times an index exists using a query containing multiple EXIST() statements

My query gets the results of these products based on if they exist in a separate table index. I am trying to get a count of all the instances where they exist so I can ORDER the results by relevance. Everything I try seems to return the variable @priority as 0. Any ideas?
Maybe it is better to use join statements?
Thank you for your help. Here is my MySQL query:
SELECT `products` . * , @priority
FROM `products`
LEFT JOIN productstypes_index ON productstypes_index.product_id = products.id
WHERE (
EXISTS (
SELECT *
FROM `productstypes_index`
WHERE `productstypes_index`.`product_id` = `products`.`id`
AND `productstypes_index`.`_type_id` = '1'
)
AND (
(
(
EXISTS (
SELECT @priority := COUNT( * )
FROM `producthashtags_index`
WHERE `producthashtags_index`.`product_id` = `products`.`id`
AND `producthashtags_index`.`producthashtag_id` = '43'
)
)
AND (
EXISTS (
SELECT @priority := COUNT( * )
FROM `producthashtags_index`
WHERE `producthashtags_index`.`product_id` = `products`.`id`
AND `producthashtags_index`.`producthashtag_id` = '11'
)
)
)
)
)
ORDER BY `updated_at` DESC;
You could do without those EXISTS conditions, and without variables. Also, a left join makes no sense if you have an EXISTS condition on the joined table. You might as well use the more efficient inner join and put the extra type condition in the join condition.
The priority can be calculated by a count over the hash tags, but only those with id in ('43', '11').
SELECT products.*,
count(distinct producthashtags_index.producthashtag_id) priority
FROM products
INNER JOIN productstypes_index
ON productstypes_index.product_id = products.id
AND productstypes_index._type_id = '1'
INNER JOIN producthashtags_index
ON producthashtags_index.product_id = products.id
AND producthashtags_index.producthashtag_id in ('43', '11')
GROUP BY products.id
ORDER BY updated_at DESC;
MySQL ignores the SELECT list in an EXISTS subquery, so it makes no difference what you type in there. This is documented in the MySQL reference manual, in the section on subqueries with EXISTS or NOT EXISTS.
An approach using joins would look like below:
SELECT p.id,
COUNT(case when phi.product_id is not null then 1 end) AS instances
FROM products p
INNER JOIN productstypes_index pti ON pti.product_id = p.id AND pti.`_type_id` = 1
LEFT JOIN producthashtags_index phi ON phi.product_id = p.id AND phi.producthashtag_id IN (11,43)
GROUP BY p.id
ORDER BY instances DESC;
I have removed additional backticks where I believe they are not necessary. Also, if the id columns in your tables are integers, you do not need quotation marks.
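To see how the join-based count behaves, here is a small sketch on Python's sqlite3 module with invented rows. COUNT(phi.product_id) is used as a shorter equivalent of the CASE expression, since COUNT skips NULLs; the secondary sort on p.id is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    CREATE TABLE productstypes_index (product_id INTEGER, _type_id INTEGER);
    CREATE TABLE producthashtags_index (product_id INTEGER, producthashtag_id INTEGER);
    INSERT INTO products VALUES (1), (2), (3), (4);
    -- products 1, 2 and 4 are of type 1; product 3 is of type 2
    INSERT INTO productstypes_index VALUES (1, 1), (2, 1), (3, 2), (4, 1);
    INSERT INTO producthashtags_index VALUES
        (1, 11), (1, 43), (2, 43), (2, 99), (3, 11);
""")

rows = conn.execute("""
    SELECT p.id, COUNT(phi.product_id) AS instances
      FROM products p
      JOIN productstypes_index pti
        ON pti.product_id = p.id AND pti._type_id = 1
      LEFT JOIN producthashtags_index phi
        ON phi.product_id = p.id AND phi.producthashtag_id IN (11, 43)
     GROUP BY p.id
     ORDER BY instances DESC, p.id
""").fetchall()

print(rows)
```

Product 4 has the right type but no matching hashtags; the LEFT JOIN keeps it with a count of 0, whereas an inner join on the hashtag table would drop it.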

SQL query to check if value doesn't exist in another table

I have a SQL query which does most of what I need it to do but I'm running into a problem.
There are 3 tables in total. entries, entry_meta and votes.
I need to get an entire row from entries when competition_id = 420 in the entry_meta table and the ID either doesn't exist in votes or it does exist but the user_id column value isn't 1.
Here's the query I'm using:
SELECT entries.* FROM entries
INNER JOIN entry_meta ON (entries.ID = entry_meta.entry_id)
WHERE 1=1
AND ( ( entry_meta.meta_key = 'competition_id' AND CAST(entry_meta.meta_value AS CHAR) = '420') )
GROUP BY entries.ID
ORDER BY entries.submission_date DESC
LIMIT 0, 25;
The votes table has 4 columns. vote_id, entry_id, user_id, value.
One option I was thinking of was to SELECT entry_id FROM votes WHERE user_id = 1 and include it in an AND clause in my query. Is this acceptable/efficient?
E.g.
AND entries.ID NOT IN (SELECT entry_id FROM votes WHERE user_id = 1)
A left join with an appropriate where clause might be useful:
SELECT
entries.*
FROM
entries
INNER JOIN entry_meta ON (entries.ID = entry_meta.entry_id)
LEFT JOIN votes ON entries.ID = votes.entry_id
WHERE 1=1
AND (
entry_meta.meta_key = 'competition_id'
AND CAST(entry_meta.meta_value AS CHAR) = '420'
AND votes.entry_id IS NULL -- This will remove any entry with votes
)
GROUP BY entries.ID
ORDER BY entries.submission_date DESC
Here's an implementation of Andrew's suggestion to use exists / not exists.
select
e.*
from
entries e
join entry_meta em on e.ID = em.entry_id
where
em.meta_key = 'competition_id'
and cast(em.meta_value as char) = '420'
and (
not exists (
select 1
from votes v
where
v.entry_id = e.ID
)
or exists (
select 1
from votes v
where
v.entry_id = e.ID
and v.user_id != 1
)
)
group by e.ID
order by e.submission_date desc
limit 0, 25;
Note: it's generally not a good idea to put a function inside a where clause (due to performance reasons), but since you're also joining on IDs you should be OK.
Also, the left join suggestion by Barranka may cause the query to return more rows than you are expecting (assuming that there is a 1:many relationship between entries and votes).
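For comparison, the sketch below assumes the goal is "entries that user 1 has not voted on", which is the reading implied by the NOT IN condition in the question. It runs on Python's sqlite3 module with invented rows, leaves out the entry_meta filter, and shows the same anti-join written two ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entries (ID INTEGER PRIMARY KEY);
    CREATE TABLE votes (vote_id INTEGER PRIMARY KEY, entry_id INTEGER, user_id INTEGER);
    INSERT INTO entries VALUES (1), (2), (3);
    -- entry 1: votes from users 1 and 2; entry 2: only user 2; entry 3: no votes
    INSERT INTO votes (entry_id, user_id) VALUES (1, 1), (1, 2), (2, 2);
""")

# NOT EXISTS: keep an entry when no vote by user 1 exists for it.
not_exists = conn.execute("""
    SELECT e.ID
      FROM entries e
     WHERE NOT EXISTS (SELECT 1
                         FROM votes v
                        WHERE v.entry_id = e.ID
                          AND v.user_id = 1)
     ORDER BY e.ID
""").fetchall()

# LEFT JOIN anti-join: the user filter sits in the ON clause, so entries
# without a vote from user 1 come back with NULLs and pass the IS NULL test.
anti_join = conn.execute("""
    SELECT e.ID
      FROM entries e
      LEFT JOIN votes v ON v.entry_id = e.ID AND v.user_id = 1
     WHERE v.entry_id IS NULL
     ORDER BY e.ID
""").fetchall()

print(not_exists, anti_join)
```

Both forms return entries 2 and 3 here, and neither can multiply rows, because an entry is only kept when the user has no matching vote at all.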

Limit in subquery

When I use the following query without LIMIT nested in a subquery
SELECT `c`.*,
GROUP_CONCAT(g.photo SEPARATOR "|") AS `photos_list`
FROM `contests` AS `c`
LEFT JOIN
(
SELECT `gallery`.`contest_id`,
`gallery`.`photo`
FROM `gallery`
) AS `g` ON c.id = g.contest_id
GROUP BY `c`.`id`
all works fine
id title photos_list
1 title1 50026c35632eb.jpg
2 title2 50026ac53567f.jpg|50026ac5ec82e.jpg|500e71557270f....
But when I add LIMIT, I get "photos_list" in only one row. The following query
SELECT `c`.*,
GROUP_CONCAT(g.photo SEPARATOR "|") AS `photos_list`
FROM `contests` AS `c`
LEFT JOIN
(
SELECT `gallery`.`contest_id`,
`gallery`.`photo`
FROM `gallery`
LIMIT 0, 2
) AS `g` ON c.id = g.contest_id
GROUP BY `c`.`id`
will return
id title photos_list
1 title1 NULL
2 title2 50026ac46ea05.jpg|50026ac53567f.jpg
Item with an id = 1 has to contain photos_list, but it doesn't. Noteworthy that LIMIT does work for item with an id = 2.
What should I do to get a correct result?
SELECT `c`.*,
GROUP_CONCAT(g.photo SEPARATOR "|") AS `photos_list`
FROM `contests` AS `c`
LEFT JOIN
(
SELECT `gallery`.`contest_id`,
`gallery`.`photo`
FROM `gallery`
) AS `g` ON c.id = g.contest_id
GROUP BY `c`.`id`
Change GROUP_CONCAT to this:
SUBSTRING_INDEX(GROUP_CONCAT(g.photo SEPARATOR "|"),'|',2) AS `photos_list`
This works as well. However, without wrapping another SELECT clause around it, if there are no photos for a contest, the contest will not show up.
SELECT c.*, GROUP_CONCAT(g.photo SEPARATOR "|") AS photo_list
FROM
contests c
LEFT JOIN
(SELECT *, @num:= if(@contest = contest_id, @num + 1,1) as row_num,
@contest := contest_id as c_id
FROM gallery
ORDER BY contest_id) AS g
ON c.id = g.contest_id
WHERE g.row_num <= 2
GROUP BY c.id, c.title
SELECT c.*, ((
SELECT GROUP_CONCAT(temp.photo SEPARATOR "|")
FROM (SELECT photo FROM gallery g WHERE c.id = g.contest_id LIMIT 2) temp
)) AS photo_list
FROM contests c
Sorry for the incorrect answer. I'm not saying that the following solution is the optimum one, but at least it works. BTW, in this new solution I've assumed that your gallery table has a primary key named id.
SELECT c.*, GROUP_CONCAT(g.photo SEPARATOR "|") AS photos_list
FROM contests AS c
LEFT JOIN (
SELECT
g_0.*
FROM (
SELECT
g_1.*
, ((SELECT COUNT(*) FROM gallery g_2 WHERE g_2.contest_id = g_1.contest_id AND g_2.id <= g_1.id)) AS i
FROM gallery g_1
) g_0
WHERE
g_0.i <= 2
) g ON (c.id = g.contest_id)
GROUP BY c.id
How do you decide which 2 of the possible set of photos for a particular contest should be returned? Is it meant to be a random thing? Or is it the 2 most recent photos, or the 2 highest rated photos, or some other criterion? Once you can set a condition for choosing the photos, the rest is straightforward. This query would get you the 2 photos with the highest photo_ids for each contest_id:
SELECT contest_id, photo, photo_id
FROM gallery gsub
WHERE (
SELECT COUNT(*) FROM gallery
WHERE contest_id=gsub.contest_id -- for each contest
AND photo_id > gsub.photo_id
) < 2 -- keep this photo if fewer than 2 photos have a higher photo_id
ORDER BY contest_id
You can do similar things with timestamps (e.g. AND photo_date > gsub.photo_date) or more complex criteria. The only caveat is that if several rows tie on the criterion (e.g. several photos have identical timestamps), all of them will be included. That's why I chose photo_id, which is presumably unique.
Insert it into your original query like so:
SELECT c.id, c.title,
GROUP_CONCAT(g.photo SEPARATOR "|") AS photos_list
FROM contests AS c
LEFT JOIN (
-- put query from above here
) AS g
ON c.id = g.contest_id GROUP BY c.id
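Put together, the two pieces might look like the sketch below. It runs on Python's sqlite3 module instead of MySQL (so GROUP_CONCAT takes the separator as a second argument rather than via SEPARATOR), the rows are invented, and the photo_id column is the assumption already made above. Because the order of concatenated values is not guaranteed, the result is split and sorted before it is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contests (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE gallery (photo_id INTEGER PRIMARY KEY, contest_id INTEGER, photo TEXT);
    INSERT INTO contests VALUES (1, 'title1'), (2, 'title2'), (3, 'title3');
    -- contest 1 has one photo, contest 2 has three, contest 3 has none
    INSERT INTO gallery VALUES
        (1, 1, 'a.jpg'),
        (2, 2, 'b.jpg'), (3, 2, 'c.jpg'), (4, 2, 'd.jpg');
""")

rows = conn.execute("""
    SELECT c.id, c.title, GROUP_CONCAT(g.photo, '|') AS photos_list
      FROM contests c
      LEFT JOIN (
            SELECT contest_id, photo, photo_id
              FROM gallery gsub
             WHERE (SELECT COUNT(*)
                      FROM gallery
                     WHERE contest_id = gsub.contest_id
                       AND photo_id > gsub.photo_id) < 2
      ) g ON g.contest_id = c.id
     GROUP BY c.id, c.title
     ORDER BY c.id
""").fetchall()

# Normalise each list so the check does not depend on concatenation order.
photos = {cid: sorted(plist.split("|")) if plist else [] for cid, _, plist in rows}
print(photos)
```

Contest 2 keeps only its two photos with the highest photo_id, contest 1 keeps its single photo, and contest 3 still appears with an empty list because of the LEFT JOIN.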