Left joins --> aggregate function problem - mysql

I have four different tables in my database:
thread:
thread_id
thread_content
timestamp
thread_rating:
thread_rating_id
thread_id
liked
disliked
thread_report:
thread_report_id
thread_id
thread_impression:
thread_impression_id
thread_id
I join these tables with this SQL query:
SELECT t.thread_id,
t.thread_content,
SUM(tra.liked) AS liked,
SUM(tra.disliked) AS disliked,
t.timestamp,
((100*(tra.liked + SUM(tra.liked))) / (tra.liked + SUM(tra.liked) + (tra.disliked + SUM(tra.disliked)))) AS liked_percent,
((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN thread_rating AS tra ON t.thread_id = tra.thread_id
LEFT JOIN thread_report AS tre ON tra.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON tre.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent
The query should return every thread_id with its summed likes and dislikes, the likes as a percentage, the timestamp at which the thread was inserted into the database, and the reports as a percentage of the impressions (the number of times the thread was shown to a user).
Nearly all results are right; only the likes and dislikes are wrong.
If I add a COUNT(*) to the query, I can see that the correct rows have a count of 1, while the wrong ones sometimes have a count of up to 60.
It looks like the joins are producing a cross product.
I think this is an issue with the grouping, or perhaps I should put the joins in parentheses.
I've seen solutions with subselects, but I don't think that is a great solution for this issue.
What am I doing wrong here?

The tra table has multiple records per thread_id, and each of them is repeated for every matching report and impression row. This causes double counts in the SUM function.
Do the summations in a subselect, grouped by the join field.
That way tra2 has only one row per thread_id to join with, and duplicate rows are avoided.
SELECT t.thread_id,
t.thread_content,
tra2.liked,
tra2.disliked,
t.timestamp,
tra2.liked_percent,
((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN (
SELECT
tra.thread_id
, SUM(tra.liked) AS liked
, SUM(tra.disliked) AS disliked
, (100 * SUM(tra.liked) / (SUM(tra.liked) + SUM(tra.disliked))) AS liked_percent
FROM thread_rating AS tra
GROUP BY tra.thread_id
) as tra2 ON t.thread_id = tra2.thread_id
LEFT JOIN thread_report AS tre ON t.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON t.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent DESC
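To see the row multiplication in isolation, here is a minimal sketch that runs the same kind of joins against an in-memory SQLite database from Python. SQLite stands in for MySQL here, and the sample rows (one thread with 2 ratings, 3 reports and 5 impressions) are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thread (thread_id INTEGER PRIMARY KEY, thread_content TEXT);
CREATE TABLE thread_rating (thread_rating_id INTEGER PRIMARY KEY,
                            thread_id INT, liked INT, disliked INT);
CREATE TABLE thread_report (thread_report_id INTEGER PRIMARY KEY, thread_id INT);
CREATE TABLE thread_impression (thread_impression_id INTEGER PRIMARY KEY, thread_id INT);
INSERT INTO thread VALUES (1, 'hello');
INSERT INTO thread_rating (thread_id, liked, disliked) VALUES (1, 1, 0), (1, 0, 1);
INSERT INTO thread_report (thread_id) VALUES (1), (1), (1);
INSERT INTO thread_impression (thread_id) VALUES (1), (1), (1), (1), (1);
""")

# Joining three one-to-many tables yields 2 * 3 * 5 = 30 rows for the thread,
# so every rating row is summed 15 times.
naive = conn.execute("""
SELECT COUNT(*), SUM(tra.liked), SUM(tra.disliked)
FROM thread AS t
LEFT JOIN thread_rating AS tra ON t.thread_id = tra.thread_id
LEFT JOIN thread_report AS tre ON t.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON t.thread_id = ti.thread_id
GROUP BY t.thread_id
""").fetchone()

# Pre-aggregating the ratings leaves one tra2 row per thread, so the sums
# are no longer affected by the report and impression joins.
fixed = conn.execute("""
SELECT tra2.liked, tra2.disliked,
       COUNT(DISTINCT tre.thread_report_id),
       COUNT(DISTINCT ti.thread_impression_id)
FROM thread AS t
LEFT JOIN (
    SELECT thread_id, SUM(liked) AS liked, SUM(disliked) AS disliked
    FROM thread_rating
    GROUP BY thread_id
) AS tra2 ON t.thread_id = tra2.thread_id
LEFT JOIN thread_report AS tre ON t.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON t.thread_id = ti.thread_id
GROUP BY t.thread_id
""").fetchone()

print(naive)  # (30, 15, 15)
print(fixed)  # (1, 1, 3, 5)
```

The inflated COUNT(*) of 30 corresponds to the counts of up to 60 reported in the question: it is the product of the row counts of the joined tables.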

Related

Mysql getting only 1 result, rather than multiple

Short setup
Consider the following:
SELECT forum_category.groupid,
forum_category.categoryid,
forum_category.categoryname,
forum_category.categorydescription,
forum_category.category_url,
forum_category.accesslevel ,
COUNT(DISTINCT forum_topic.topicid) AS topics ,
COUNT(DISTINCT forum_post.postid) AS posts
FROM forum_category
INNER JOIN forum_topic ON forum_topic.categoryid=forum_category.categoryid
INNER JOIN forum_post ON forum_post.topicid=forum_topic.topicid
WHERE groupid = 1
This actually gives me one row, while I expect multiple rows (in this case 2) to come back. What am I missing here?
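One likely cause, assuming the query is exactly as posted, is the missing GROUP BY: aggregate functions without a GROUP BY collapse the whole result into a single row (and MySQL with ONLY_FULL_GROUP_BY enabled rejects such a statement outright). The sketch below reproduces this with Python's sqlite3 module; SQLite stands in for MySQL, and the table contents and reduced column list are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_category (categoryid INTEGER PRIMARY KEY, groupid INT, categoryname TEXT);
CREATE TABLE forum_topic (topicid INTEGER PRIMARY KEY, categoryid INT);
CREATE TABLE forum_post (postid INTEGER PRIMARY KEY, topicid INT);
INSERT INTO forum_category VALUES (1, 1, 'General'), (2, 1, 'Help');
INSERT INTO forum_topic VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO forum_post VALUES (100, 10), (101, 10), (102, 11), (200, 20);
""")

base = """
SELECT forum_category.categoryid,
       COUNT(DISTINCT forum_topic.topicid) AS topics,
       COUNT(DISTINCT forum_post.postid) AS posts
FROM forum_category
INNER JOIN forum_topic ON forum_topic.categoryid = forum_category.categoryid
INNER JOIN forum_post ON forum_post.topicid = forum_topic.topicid
WHERE forum_category.groupid = 1
"""

# Without GROUP BY the aggregates run over all matching rows: one row total.
ungrouped = conn.execute(base).fetchall()

# With GROUP BY there is one row per category.
grouped = conn.execute(
    base + " GROUP BY forum_category.categoryid ORDER BY forum_category.categoryid"
).fetchall()

print(len(ungrouped))  # 1
print(grouped)         # [(1, 2, 3), (2, 1, 1)]
```

Note that with INNER JOINs a category without topics or posts would still be missing from the grouped result; LEFT JOINs would keep it with counts of 0.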

Generating user post count and thread count from different tables

So I don't know much about MySQL, but I heard about views and I'm trying to wrap my head around them.
Basically what I want to do is
check the table forum_posts, count the number of posts made by each user
query forum_users for each user to get every column and add to the view
query forum_threads to get the number of threads made by that user.
I don't know if that order is correct performance-wise, but the final view should in theory look like one of these:
1.
UId (user id from forum_users)
UName (user name from forum_users)
UThreads (user thread count)
UPosts (user post count)
UFakePosts (named UPosts in forum_users, I'll rename that later to UFakePosts)
ULastPost (this one is not that important but I'm just throwing it here in case anyone knows how to do it, I imagine it would be possible by selecting the post with the biggest PDate column)
2.
All of forum_users but renaming the forum_users.UPosts and forum_users.UThreads to UFakePosts and UFakeThreads
ULastPost
UThreads (user thread count)
UPosts (user post count)
I managed to get the post count working by using the following code
SELECT
IFNULL(a.UId,-1) AS UId,
IFNULL(a.UName,'Unknown') AS UName,
postsquery.Posts AS UPosts,
IFNULL(a.UPosts,-1) AS UFakePosts
FROM
(
SELECT p.PId, p.PAuthorId, COUNT(p1.PAuthorId) as Posts
FROM forum_posts AS p
LEFT JOIN forum_posts AS p1 ON p1.PId = p.PId
GROUP BY p.PAuthorId
)
AS postsquery
LEFT JOIN forum_users AS a ON postsquery.PAuthorId = a.UId
ORDER BY postsquery.Posts DESC
This works, but I've had no success getting the threads as well: I can get one or the other, but not both at the same time.
I've also tried this
SELECT IFNULL(a.UId,-1) AS UId,
IFNULL(a.UName,'Unknown') AS UName,
postsquery.Posts AS UPosts,
threadsquery.Threads AS UThreads,
IFNULL(a.UPosts,-1) AS UFakePosts
FROM
(
SELECT p.PId, p.PAuthorId, COUNT(p.PAuthorId) as Posts
FROM forum_posts AS p
)
AS postsquery
LEFT JOIN forum_users AS a1 ON postsquery.PAuthorId = a1.UId,
(
SELECT t.TId, t.TAuthorId, COUNT(t.TAuthorId) as Threads
FROM forum_threads AS t
GROUP BY t.TAuthorId
)
AS threadsquery
LEFT JOIN forum_users AS a ON threadsquery.TAuthorId = a.UId
ORDER BY
postsquery.Posts DESC
...but the results are wrong.
What's supposed to happen:
Unknown (user that I haven't scraped yet): 1 post / 0 threads
User1: 2 posts / 0 threads
User2: 1 post / 2 threads
User3: 0 posts / 0 threads
If I could make another view for threads, with the number of unique posters and the number of posts, that would be cool as well, but one thing at a time.
Fiddle with database structure
http://sqlfiddle.com/#!9/c93d9/1
Structure should be pretty easy to understand, U stands for user, T for thread, P for post, D for date and so on.
Create a subselect to get the thread count:
LEFT JOIN
(
SELECT
TAuthorId,
COUNT(1) as thread_count
FROM
forum_threads
GROUP BY
TAuthorId
) threads ON
threads.TAuthorId = postsquery.PAuthorId
And then select that column:
IFNULL(threads.thread_count, 0) as thread_count
Putting it all together:
SELECT IFNULL(a.UId,-1) AS UId,
IFNULL(a.UName,'Unknown') AS UName,
IFNULL(postsquery.Posts, 0) AS UPosts,
IFNULL(threads.thread_count, 0) AS thread_count,
IFNULL(a.UPosts,-1) AS UFakePosts
FROM
(
SELECT p.PId, p.PAuthorId, COUNT(p1.PAuthorId) as Posts
FROM forum_posts AS p
LEFT JOIN forum_posts AS p1 ON p1.PId = p.PId
GROUP BY p.PAuthorId
)
AS postsquery
LEFT JOIN forum_users AS a ON postsquery.PAuthorId = a.UId
LEFT JOIN
(
SELECT
TAuthorId,
COUNT(1) as thread_count
FROM
forum_threads
GROUP BY
TAuthorId
) threads ON
threads.TAuthorId = postsquery.PAuthorId
ORDER BY
postsquery.Posts DESC
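As a rough check of this approach, the sketch below runs a simplified version of the combined query with Python's sqlite3 module. SQLite stands in for MySQL, the tables carry only the columns named in the question, and the rows are made up to match the expected output listed there (author 99 is a hypothetical id with no row in forum_users).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema is reduced and data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE forum_users (UId INTEGER PRIMARY KEY, UName TEXT);
CREATE TABLE forum_posts (PId INTEGER PRIMARY KEY, PAuthorId INT);
CREATE TABLE forum_threads (TId INTEGER PRIMARY KEY, TAuthorId INT);
INSERT INTO forum_users VALUES (1, 'User1'), (2, 'User2'), (3, 'User3');
-- author 99 has no row in forum_users (the "Unknown" case)
INSERT INTO forum_posts (PAuthorId) VALUES (1), (1), (2), (99);
INSERT INTO forum_threads (TAuthorId) VALUES (2), (2);
""")

# Each count is computed in its own grouped subselect, so joining the two
# results cannot multiply rows.
rows = conn.execute("""
SELECT IFNULL(a.UName, 'Unknown') AS UName,
       postsquery.Posts AS UPosts,
       IFNULL(threads.thread_count, 0) AS UThreads
FROM (
    SELECT PAuthorId, COUNT(*) AS Posts
    FROM forum_posts
    GROUP BY PAuthorId
) AS postsquery
LEFT JOIN forum_users AS a ON postsquery.PAuthorId = a.UId
LEFT JOIN (
    SELECT TAuthorId, COUNT(*) AS thread_count
    FROM forum_threads
    GROUP BY TAuthorId
) AS threads ON threads.TAuthorId = postsquery.PAuthorId
ORDER BY postsquery.Posts DESC, UName
""").fetchall()

print(rows)  # [('User1', 2, 0), ('Unknown', 1, 0), ('User2', 1, 2)]
```

Because the query is driven by postsquery, a user without any posts (User3 in this sample) does not appear in the result at all; listing such users with 0 posts would need forum_users as the driving table instead.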

sql counts wrong number of likes

I have written an SQL statement that, besides all the other columns, should return the number of comments and the number of likes of a certain post. It works perfectly as long as I don't also try to get the number of times the post has been shared. When I add the share count, the number of likes comes back wrong; it looks like some combination of the shares and the likes. Here is the code:
SELECT
[...],
count(CS.commentId) as shares,
count(CL.commentId) as numberOfLikes
FROM
(SELECT *
FROM accountSpecifics
WHERE institutionId= '{$keyword['id']}') `AS`
INNER JOIN
account A ON A.id = `AS`.accountId
INNER JOIN
comment C ON C.accountId = A.id
LEFT JOIN
commentLikes CL ON C.commentId = CL.commentId
LEFT JOIN
commentShares CS ON C.commentId = CS.commentId
GROUP BY
C.time
ORDER BY
year, month, hour, month
Could you also tell me whether you think this is an efficient SQL statement, or whether you would do it differently? Thank you!
Do this instead:
SELECT
[...],
(select count(*) from commentShares CS where C.commentId = CS.commentId) as shares,
(select count(*) from commentLikes CL where C.commentId = CL.commentId) as numberOfLikes
FROM
(SELECT *
FROM accountSpecifics
WHERE institutionId= '{$keyword['id']}') `AS`
INNER JOIN account A ON A.id = `AS`.accountId
INNER JOIN comment C ON C.accountId = A.id
GROUP BY C.time
ORDER BY year, month, hour, month
If you use JOINs, you get back a single result set in which every like row is repeated for every share row of the same comment. COUNT(any field) simply counts the rows of that set where the field is not NULL, so for a comment that has both likes and shares, both counts come out as likes times shares, which is the wrong thing. Subqueries are what you need here. Good luck!
EDIT: as posted below, COUNT(DISTINCT something) can also work, but it makes the database do more work than necessary for the answer you want to end up with.
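Quick fix (the counted column must be unique per like or share row, such as each table's primary key; the join key commentId as written here yields at most 1 per comment):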
SELECT
[...],
count(DISTINCT CS.commentId) as shares,
count(DISTINCT CL.commentId) as numberOfLikes
Better approach:
SELECT [...]
, Coalesce(shares.numberOfShares, 0) As numberOfShares
, Coalesce(likes.numberOfLikes , 0) As numberOfLikes
FROM [...]
LEFT
JOIN (
SELECT commentId
, Count(*) As numberOfShares
FROM commentShares
GROUP
BY commentId
) As shares
ON shares.commentId = c.commentId
LEFT
JOIN (
SELECT commentId
, Count(*) As numberOfLikes
FROM commentLikes
GROUP
BY commentId
) As likes
ON likes.commentId = c.commentId
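The difference between these variants can be checked with a small sketch using Python's sqlite3 module. SQLite stands in for MySQL; the columns likeId and shareId are hypothetical surrogate keys that do not appear in the original post, and the data (one comment with 3 likes and 2 shares) is made up.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; likeId/shareId are hypothetical keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comment (commentId INTEGER PRIMARY KEY);
CREATE TABLE commentLikes (likeId INTEGER PRIMARY KEY, commentId INT);
CREATE TABLE commentShares (shareId INTEGER PRIMARY KEY, commentId INT);
INSERT INTO comment VALUES (1);
INSERT INTO commentLikes (commentId) VALUES (1), (1), (1);
INSERT INTO commentShares (commentId) VALUES (1), (1);
""")

joins = """
FROM comment C
LEFT JOIN commentLikes CL ON C.commentId = CL.commentId
LEFT JOIN commentShares CS ON C.commentId = CS.commentId
GROUP BY C.commentId
"""

# Plain COUNT sees 3 * 2 = 6 joined rows for both columns.
naive = conn.execute(
    "SELECT COUNT(CS.commentId), COUNT(CL.commentId) " + joins).fetchone()
# DISTINCT on the join key collapses to one value per comment.
distinct_key = conn.execute(
    "SELECT COUNT(DISTINCT CS.commentId), COUNT(DISTINCT CL.commentId) " + joins).fetchone()
# DISTINCT on a per-row key gives the real counts.
distinct_row = conn.execute(
    "SELECT COUNT(DISTINCT CS.shareId), COUNT(DISTINCT CL.likeId) " + joins).fetchone()

# Pre-aggregated derived tables: one row per comment on each side of the join.
preagg = conn.execute("""
SELECT COALESCE(shares.n, 0), COALESCE(likes.n, 0)
FROM comment C
LEFT JOIN (SELECT commentId, COUNT(*) AS n FROM commentShares GROUP BY commentId) AS shares
       ON shares.commentId = C.commentId
LEFT JOIN (SELECT commentId, COUNT(*) AS n FROM commentLikes GROUP BY commentId) AS likes
       ON likes.commentId = C.commentId
""").fetchone()

print(naive, distinct_key, distinct_row, preagg)
# (6, 6) (1, 1) (2, 3) (2, 3)
```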

MySql, Select, left join, group by, limit.. and where i've to place ORDER?

I have a very confusing query string that currently looks like this:
SELECT SQL_CALC_FOUND_ROWS m.*,
(
SELECT count(1) FROM votazioni_messaggi v
WHERE v.idMeggaggio=m.ID
) AS Votato,
AVG(voto) AS votazioneMedia,
n.nomeUtente,
FROM messages m
LEFT JOIN votazioni_messaggi v ON v.idMessaggio = m.ID
LEFT JOIN utenti n ON a.idUtente=n.ID
WHERE n.idUser='$idUser' AND m.Genre IN('animal', 'love')
GROUP BY m.ID,
LIMIT $partenza, 20
In other words, I have to select 20 messages at a time whose genre is 'animal' or 'love', and for each of them retrieve info about the user and calculate the average vote. By counting and grouping I can tell whether a user has already voted on the message.
Now I have to add one last clause to ORDER the messages BY insertion time (this value is stored in the messages.InsertionTime column).
The clause should be ORDER BY messages.InsertionTime, but how do I know exactly where to place it?
Try this:
SELECT SQL_CALC_FOUND_ROWS m.*,
(SELECT count(1) FROM votazioni_messaggi v
WHERE v.idMessaggio=m.ID
) AS Votato,
AVG(voto) AS votazioneMedia,
n.nomeUtente
FROM messages m
LEFT JOIN votazioni_messaggi v ON v.idMessaggio = m.ID
LEFT JOIN utenti n ON m.idUtente=n.ID -- the original used an undefined alias a; m is assumed here
WHERE n.idUser='$idUser' AND m.Genre IN ('animal', 'love')
GROUP BY m.ID
ORDER BY m.InsertionTime
LIMIT $partenza, 20;
You had at least two problems. One was a stray comma at the end of the GROUP BY clause. The other was using the table name instead of the table alias in ORDER BY. As for placement: ORDER BY goes after GROUP BY and before LIMIT.
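The clause order (WHERE, then GROUP BY, then ORDER BY, then LIMIT) can be tried out on a cut-down version of the query. This is only a sketch: it uses Python's sqlite3 module with SQLite standing in for MySQL, leaves out the user join, and the rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema is reduced and data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (ID INTEGER PRIMARY KEY, Genre TEXT, InsertionTime INT);
CREATE TABLE votazioni_messaggi (idMessaggio INT, voto INT);
INSERT INTO messages VALUES (1, 'animal', 30), (2, 'love', 10),
                            (3, 'sport', 20), (4, 'love', 20);
INSERT INTO votazioni_messaggi VALUES (1, 4), (1, 2), (2, 5);
""")

# Clause order: WHERE -> GROUP BY -> ORDER BY -> LIMIT offset, count
rows = conn.execute("""
SELECT m.ID, COUNT(v.voto) AS Votato, AVG(v.voto) AS votazioneMedia
FROM messages m
LEFT JOIN votazioni_messaggi v ON v.idMessaggio = m.ID
WHERE m.Genre IN ('animal', 'love')
GROUP BY m.ID
ORDER BY m.InsertionTime
LIMIT 0, 2
""").fetchall()

print(rows)  # [(2, 1, 5.0), (4, 0, None)]
```

The two oldest matching messages come back first; message 4 has no votes, so its count is 0 and its average is NULL.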

weird problem with select count from multiple tables (with joins)

I'm having an odd problem with the following query. It works correctly as it is:
the COUNT part gets me the number of comments on a given 'hintout'.
I'm trying to add a similar count that gets the number of 'votes' for each hintout. This is the query:
SELECT h.*
, h.permalink AS hintout_permalink
, hi.hinter_name
, hi.permalink
, hf.user_id AS followed_hid
, ci.city_id, ci.city_name, co.country_id, co.country_name, ht.thank_id
, COUNT(hc.comment_id) AS commentsCount
FROM hintouts AS h
INNER JOIN hinter_follows AS hf ON h.hinter_id = hf.hinter_id
INNER JOIN hinters AS hi ON h.hinter_id = hi.hinter_id
LEFT JOIN cities AS ci ON h.city_id = ci.city_id
LEFT JOIN countries as co ON h.country_id = co.country_id
LEFT JOIN hintout_thanks AS ht ON (h.hintout_id = ht.hintout_id
AND ht.thanker_user_id = 1)
LEFT JOIN hintout_comments AS hc ON hc.hintout_id = h.hintout_id
WHERE hf.user_id = 1
GROUP BY h.hintout_id
I tried to add the following to the select part:
COUNT(ht2.thanks_id) AS thanksCount
and the following on the join:
LEFT JOIN hintout_thanks AS ht2 ON h.hintout_id = ht2.hintout_id
but something weird happens, for which I could not find any answers or solutions:
the moment I add this additional part, the comment count is ruined (I get wrong and weird numbers), and I get the same number for the thanks.
I can't understand why or how to fix it, and I'd like to avoid nested queries,
so any help or pointers would be greatly appreciated!
PS: this might have been posted twice, but I can't find the previous post.
When you add
LEFT JOIN hintout_thanks AS ht2 ON h.hintout_id = ht2.hintout_id
the number of rows increases: every row from hc is repeated once per matching ht2 row, and those duplicates are all counted in COUNT(hc.comment_id).
You can replace
COUNT(hc.comment_id) -- counts duplicates
/* with */
COUNT(DISTINCT hc.comment_id) -- only counts unique ids
so that each id is counted only once.
This does not work on values that are not unique per row, like co.country_name: COUNT(DISTINCT ...) would only count the distinct countries (if all your results are in the USA, the count will be 1).
Quassnoi has solved the whole count problem by putting the counts in sub-selects, so that the extra rows caused by all those joins do not influence those counts.
SELECT h.*, h.permalink AS hintout_permalink, hi.hinter_name,
hi.permalink, hf.user_id AS followed_hid,
ci.city_id, ci.city_name, co.country_id, co.country_name,
ht.thank_id,
COALESCE(
(
SELECT COUNT(*)
FROM hintout_comments hci
WHERE hci.hintout_id = h.hintout_id
), 0) AS commentsCount,
COALESCE(
(
SELECT COUNT(*)
FROM hintout_thanks hti
WHERE hti.hintout_id = h.hintout_id
), 0) AS thanksCount
FROM hintouts AS h
JOIN hinter_follows AS hf
ON hf.hinter_id = h.hinter_id
JOIN hinters AS hi
ON hi.hinter_id = h.hinter_id
LEFT JOIN
cities AS ci
ON ci.city_id = h.city_id
LEFT JOIN
countries as co
ON co.country_id = h.country_id
LEFT JOIN
hintout_thanks AS ht
ON ht.hintout_id = h.hintout_id
AND ht.thanker_user_id=1
WHERE hf.user_id = 1
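To compare the joined counts with the sub-select counts side by side, here is a minimal sketch in Python's sqlite3 module. SQLite stands in for MySQL, only the three tables relevant to the counts are modelled, and the rows (hintout 1 with 3 comments and 2 thanks, hintout 2 with none) are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; schema is reduced and data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hintouts (hintout_id INTEGER PRIMARY KEY);
CREATE TABLE hintout_comments (comment_id INTEGER PRIMARY KEY, hintout_id INT);
CREATE TABLE hintout_thanks (thank_id INTEGER PRIMARY KEY, hintout_id INT);
INSERT INTO hintouts VALUES (1), (2);
INSERT INTO hintout_comments (hintout_id) VALUES (1), (1), (1);
INSERT INTO hintout_thanks (hintout_id) VALUES (1), (1);
""")

# Two one-to-many joins: hintout 1 expands to 3 * 2 = 6 rows, and both
# counts report 6, the "same number" seen in the question.
joined = conn.execute("""
SELECT h.hintout_id, COUNT(hc.comment_id), COUNT(ht2.thank_id)
FROM hintouts AS h
LEFT JOIN hintout_comments AS hc ON hc.hintout_id = h.hintout_id
LEFT JOIN hintout_thanks AS ht2 ON ht2.hintout_id = h.hintout_id
GROUP BY h.hintout_id
ORDER BY h.hintout_id
""").fetchall()

# Correlated sub-selects count each table independently of the outer joins.
subselects = conn.execute("""
SELECT h.hintout_id,
       (SELECT COUNT(*) FROM hintout_comments hci
         WHERE hci.hintout_id = h.hintout_id) AS commentsCount,
       (SELECT COUNT(*) FROM hintout_thanks hti
         WHERE hti.hintout_id = h.hintout_id) AS thanksCount
FROM hintouts AS h
ORDER BY h.hintout_id
""").fetchall()

print(joined)      # [(1, 6, 6), (2, 0, 0)]
print(subselects)  # [(1, 3, 2), (2, 0, 0)]
```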