MySQL: Reduce table retrieval time with subqueries and joins to other tables? - mysql

Consider the following example: my database has two tables, one that stores user posts and their content, and another that stores the likes other users give those posts. When a user likes a post, a new row is inserted into the likes table.
When I SELECT from the posts table, a subquery also returns the number of likes for each post (SQL Fiddle):
SELECT allPosts.*,
(SELECT COUNT(*) FROM likes WHERE likes.postID = allPosts.id) AS likeCount
FROM posts allPosts
ORDER BY likeCount DESC
LIMIT 3;
| id | content | likeCount |
|----|---------|-----------|
| 2 | Post 2 | 2 |
| 3 | Post 3 | 1 |
| 1 | Post 1 | 0 |
Problem: to sort by likeCount with ORDER BY, likeCount has to be computed for every single post before any rows can be returned. This becomes a problem for large tables with thousands of posts. In my case, a query took up to 10 seconds on a table of about 2000 posts, which is far too slow.
How can I solve this? How can I sort the posts by their number of likes without running a separate likeCount query for each individual post?
I am grateful for any help!

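The correlated subquery can be rewritten as a JOIN. The like counts then come from a single join and aggregation instead of a separate subquery per post: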
SELECT allPosts.id, allPosts.content,
COUNT(likes.postID) AS likeCount
FROM posts allPosts
LEFT JOIN likes
ON likes.postID = allPosts.id
GROUP BY allPosts.id, allPosts.content
ORDER BY likeCount DESC
LIMIT 3;
SQLFiddle demo
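A quick way to confirm that the JOIN version returns the same rows is to run both queries side by side. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. This is an assumption: both queries use only standard SQL, but SQLite's planner is not MySQL's, so the sketch checks results, not speed. The sample rows are made up to reproduce the output above. The userID column and the index name idx_likes_post are hypothetical. What matters is having an index on likes(postID), which is usually what keeps either form from scanning the whole likes table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE likes (userID INTEGER, postID INTEGER);  -- userID is hypothetical
    -- Hypothetical index name; the point is indexing likes.postID.
    CREATE INDEX idx_likes_post ON likes (postID);
    INSERT INTO posts VALUES (1, 'Post 1'), (2, 'Post 2'), (3, 'Post 3');
    INSERT INTO likes VALUES (10, 2), (11, 2), (10, 3);
""")

# Original form: one correlated subquery per post row.
correlated = conn.execute("""
    SELECT allPosts.id, allPosts.content,
           (SELECT COUNT(*) FROM likes WHERE likes.postID = allPosts.id) AS likeCount
    FROM posts allPosts
    ORDER BY likeCount DESC
    LIMIT 3
""").fetchall()

# Rewritten form: a single LEFT JOIN + GROUP BY.
# COUNT(likes.postID) skips the NULLs produced for posts without likes, so they get 0.
joined = conn.execute("""
    SELECT allPosts.id, allPosts.content, COUNT(likes.postID) AS likeCount
    FROM posts allPosts
    LEFT JOIN likes ON likes.postID = allPosts.id
    GROUP BY allPosts.id, allPosts.content
    ORDER BY likeCount DESC
    LIMIT 3
""").fetchall()

print(joined)  # [(2, 'Post 2', 2), (3, 'Post 3', 1), (1, 'Post 1', 0)]
```

On the real MySQL data, comparing the EXPLAIN output of both queries is the reliable way to see which one the optimizer handles better.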

Related

Can this query, which groups users by amount of comments posted, be simplified?

Two tables are used in this query, and all that matters in the result is the number of users who have or haven't posted any comments so far. The user table has an id column, which is referenced as a foreign key by the user_id column of the comment table.
The first super-simple query groups users by whether or not they have any comments so far. It outputs two rows (a row with the user count who have comments, and a row with the user count who have no comments), with two columns (number of users, and whether or not they have posted any comments).
SELECT
COUNT(id) AS user_count,
IF( id IN ( SELECT user_id FROM `comment` ), 1, 0) AS has_comment
FROM `user`
GROUP BY has_comment
An example of what the output looks like:
+------------+-------------+
| user_count | has_comment |
+------------+-------------+
| 150 | 0 |
| 140 | 1 |
+------------+-------------+
Now for my question: I want slightly more information, so I want to split the users into 3 groups instead:
Users that have posted no comments
Users that have posted fewer than 10 comments
Users that have posted 10 or more comments
The best query I know how to write for this works, but unfortunately it runs 4 subqueries and uses 2 derived tables:
SELECT
COUNT(id) AS user_count,
CASE
WHEN id IN ( SELECT user_id FROM ( SELECT COUNT(user_id) AS comment_count, user_id FROM `comment` GROUP BY user_id HAVING comment_count >= 10 ) AS a) THEN '10 or more'
WHEN id IN ( SELECT user_id FROM ( SELECT COUNT(user_id) AS comment_count, user_id FROM `comment` GROUP BY user_id HAVING comment_count < 10 ) AS b) THEN 'less than 10'
ELSE 'none'
END AS has_comment
FROM `user`
GROUP BY has_comment
An example of the output here would be something like:
+------------+-------------+
| user_count | has_comment |
+------------+-------------+
| 150 | none |
| 130 | less than 10|
| 100 | 10 or more |
+------------+-------------+
Can this second query be written more simply and efficiently while still producing the same kind of result? (Ideally, it could even be extended to more of these kinds of "groups".)
You can use two levels of aggregation:
select
count(*) no_users,
case
when no_comments = 0 then 'none'
when no_comments < 10 then 'less than 10'
else '10 or more'
end has_comment
from (
select
u.id,
(select count(*) from `comment` c where c.user_id = u.id) no_comments
from `user` u
) t
group by has_comment
order by min(no_comments)
The subquery counts how many comments each user has (you could also express this with a left join and aggregation). The outer query then classifies the users by their number of comments and counts the users in each group.
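Here is a sketch of the two-level aggregation using the LEFT JOIN variant mentioned above. It again uses Python's sqlite3 as a stand-in for MySQL, with made-up comment counts. The outer ORDER BY uses MIN(no_comments) because no_comments itself is not part of the GROUP BY. Adding another bucket only requires another WHEN branch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY)')
conn.execute("CREATE TABLE comment (id INTEGER PRIMARY KEY, user_id INTEGER)")

# Made-up data: user id -> number of comments posted.
comment_counts = {1: 0, 2: 3, 3: 12, 4: 10, 5: 9, 6: 1}
for user_id, n in comment_counts.items():
    conn.execute('INSERT INTO "user" (id) VALUES (?)', (user_id,))
    conn.executemany("INSERT INTO comment (user_id) VALUES (?)", [(user_id,)] * n)

rows = conn.execute("""
    SELECT COUNT(*) AS no_users,
           CASE
               WHEN no_comments = 0 THEN 'none'
               WHEN no_comments < 10 THEN 'less than 10'
               ELSE '10 or more'
           END AS has_comment
    FROM (
        -- Inner level: one row per user with its comment count (0 if none).
        SELECT u.id, COUNT(c.user_id) AS no_comments
        FROM "user" u
        LEFT JOIN comment c ON c.user_id = u.id
        GROUP BY u.id
    ) t
    -- Outer level: count users per bucket.
    GROUP BY has_comment
    ORDER BY MIN(no_comments)
""").fetchall()

print(rows)  # [(1, 'none'), (3, 'less than 10'), (2, '10 or more')]
```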

MySQL - Searching From 3 tables with one query

I have 3 tables:
Table news:
id_post | news | id_user
3 | IT news | 1
4 | game news | 2
Table user:
id_user | username
1 | bocah
2 | gundul
And Table vote
id_vote | id_post | id_user | LIKE
10 | 3 | 2 | 1
And this is my sql query:
SELECT post.*, username, like, SUM(vote.like) AS like FROM post
INNER JOIN user ON post.id_user=user.id_user
INNER JOIN vote ON post.id_post=vote.id_post
WHERE
(`title` LIKE '%$word%' OR `username` LIKE '%$word%') AND post.id_user=user.id
LIMIT 15
I just want to create a search form that finds posts or users by keyword. It should then display each post, the username of its author, and the total likes for that news item.
The problem is that when the keyword does not match any post or user, I expect an empty result. Instead, it returns 1 row with NULL values.
How can I solve this?
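Apart from getting one row when there is no match, you probably also get only one row when there are multiple matches. I think this is because you use an aggregate function (SUM) without a GROUP BY. Without GROUP BY, an aggregate collapses all matching rows into a single row. Over zero matching rows it still returns one row, with SUM giving NULL.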
If I'm correct, adding a group by clause should solve the problem:
SELECT post.*, username, SUM(vote.`like`) AS `like` FROM post
INNER JOIN user ON post.id_user=user.id_user
INNER JOIN vote ON post.id_post=vote.id_post
WHERE
(`title` LIKE '%$word%' OR `username` LIKE '%$word%')
GROUP BY post.id_post
LIMIT 15
It will then, however, return one row per post. So if you search for a user, you will get all of their posts (up to 15), but my guess is that that's exactly what you want.
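The following sketch reproduces the behaviour with Python's sqlite3 module standing in for MySQL. Without GROUP BY, the aggregate returns a single all-NULL row for a keyword that matches nothing. With GROUP BY, it returns no rows. SQLite accepts the non-aggregated columns here much as MySQL does when ONLY_FULL_GROUP_BY is off. The title column is an assumption taken from the query, and `like` is quoted because it is a reserved word. The keyword is passed as a bound parameter rather than interpolated like $word, which also avoids SQL injection. search is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id_post INTEGER PRIMARY KEY, title TEXT, id_user INTEGER);
    CREATE TABLE "user" (id_user INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE vote (id_vote INTEGER PRIMARY KEY, id_post INTEGER,
                       id_user INTEGER, "like" INTEGER);
    INSERT INTO post VALUES (3, 'IT news', 1), (4, 'game news', 2);
    INSERT INTO "user" VALUES (1, 'bocah'), (2, 'gundul');
    INSERT INTO vote VALUES (10, 3, 2, 1);
""")

BASE = """
    SELECT post.id_post, post.title, "user".username, SUM(vote."like") AS total_likes
    FROM post
    INNER JOIN "user" ON post.id_user = "user".id_user
    INNER JOIN vote ON post.id_post = vote.id_post
    WHERE (post.title LIKE ? OR "user".username LIKE ?)
"""

def search(word, grouped):
    # Hypothetical helper: same query with or without GROUP BY.
    sql = BASE + (" GROUP BY post.id_post" if grouped else "") + " LIMIT 15"
    pattern = f"%{word}%"
    return conn.execute(sql, (pattern, pattern)).fetchall()

print(search("xyz", grouped=False))  # [(None, None, None, None)]
print(search("xyz", grouped=True))   # []
print(search("IT", grouped=True))    # [(3, 'IT news', 'bocah', 1)]
```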

Query for Highest Rated Video (Likes / Dislikes) SLOW when using ORDER BY

(Using MySQL) I have a table of videos (simplified):
+---------+-------------+-------------+
| videoID | videoName | videoAuthor |
+---------+-------------+-------------+
| 1 | cool_video1 | rocky |
| 2 | mixingTest2 | sensable |
+---------+-------------+-------------+
and a table of video ratings: every time a user likes or dislikes a video, a row is added or updated in the videoRatings table.
For example, the rows below mean that the video with videoID 1 has two likes and one dislike. In the videoRatings table, a videoRatingTypeID of 1 is a like and 2 is a dislike (simplified):
+---------------+---------+-------------------+
| videoRatingID | videoID | videoRatingTypeID |
+---------------+---------+-------------------+
| 121 | 1 | 1 |
| 234 | 1 | 1 |
| 290 | 1 | 2 |
+---------------+---------+-------------------+
Now, simple enough, all I want to do is fetch the highest-scoring videos out of about 100,000.
Naturally, I do something like this:
SELECT Videos.videoID,
COUNT(CASE WHEN videoRatingTypeID =1 THEN 1 ELSE NULL END) AS likes,
COUNT(CASE WHEN videoRatingTypeID =2 THEN 1 ELSE NULL END) AS dislikes
FROM Videos
LEFT JOIN VideoRatings ON VideoRatings.videoID = Videos.videoID
GROUP BY Videos.videoID
ORDER BY likes DESC
But this query takes about half a second, which worries me: once the videos table grows past 1 million rows, it will take even longer. The videoRatings table is very small (~40 rows), and the videos table has ~100,000 rows.
I obviously have an index on videoID in the Videos table. In the videoRatings table I have indexes on videoID and videoRatingID, plus a composite index on videoID+videoRatingID.
I don't see a better way to do this. I've read several posts about moving the ORDER BY outside, but when I do that:
SELECT * FROM (
SELECT Videos.videoID,
COUNT(CASE WHEN videoRatingTypeID =1 THEN 1 ELSE NULL END) AS likes,
COUNT(CASE WHEN videoRatingTypeID =2 THEN 1 ELSE NULL END) AS dislikes
FROM Videos
LEFT JOIN VideoRatings ON VideoRatings.videoID = Videos.videoID
GROUP BY Videos.videoID
) tmp
ORDER BY tmp.likes DESC
there is zero improvement.
What is a better way to approach this layout, or this query? Thank you!
For real scalability, I think you will need a solution that maintains a summary table. In the meantime, this might be faster:
select v.videoID,
(select count(*)
from VideoRatings vr
where vr.videoID = v.videoID and
videoRatingTypeId = 1
) as likes,
(select count(*)
from VideoRatings vr
where vr.videoID = v.videoID and
videoRatingTypeId = 2
) as dislikes
from Videos v
order by likes desc;
Be sure that you have an index on VideoRatings(videoId, videoRatingTypeId) (actually, the type id is not so important in the index, but it can help).
This replaces the GROUP BY over the whole set of videos and ratings with index scans and small aggregations. It will scale as long as the index on videoRatings fits in memory.
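This sketch checks that the per-type correlated subqueries give the same counts as the GROUP BY over a LEFT JOIN. It uses Python's sqlite3 as a stand-in for MySQL. The ratings are made-up sample data. The composite index follows the suggestion above, but the name idx_vr_video_type is invented. The sketch only checks that the results match; SQLite timings say little about MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Videos (videoID INTEGER PRIMARY KEY, videoName TEXT);
    CREATE TABLE VideoRatings (videoRatingID INTEGER PRIMARY KEY,
                               videoID INTEGER, videoRatingTypeID INTEGER);
    -- Composite index on (videoID, videoRatingTypeID); the name is made up.
    CREATE INDEX idx_vr_video_type ON VideoRatings (videoID, videoRatingTypeID);
""")
conn.executemany("INSERT INTO Videos (videoID) VALUES (?)", [(i,) for i in range(1, 6)])
# (videoID, videoRatingTypeID): 1 = like, 2 = dislike. Made-up sample data.
ratings = [(1, 1), (1, 1), (1, 2), (2, 1), (3, 1), (3, 1), (3, 1), (4, 2), (4, 2)]
conn.executemany(
    "INSERT INTO VideoRatings (videoID, videoRatingTypeID) VALUES (?, ?)", ratings)

# Aggregate the whole join once.
grouped = conn.execute("""
    SELECT Videos.videoID,
           COUNT(CASE WHEN videoRatingTypeID = 1 THEN 1 END) AS likes,
           COUNT(CASE WHEN videoRatingTypeID = 2 THEN 1 END) AS dislikes
    FROM Videos
    LEFT JOIN VideoRatings ON VideoRatings.videoID = Videos.videoID
    GROUP BY Videos.videoID
    ORDER BY likes DESC, Videos.videoID
""").fetchall()

# One small indexed count per video and rating type.
correlated = conn.execute("""
    SELECT v.videoID,
           (SELECT COUNT(*) FROM VideoRatings vr
            WHERE vr.videoID = v.videoID AND vr.videoRatingTypeID = 1) AS likes,
           (SELECT COUNT(*) FROM VideoRatings vr
            WHERE vr.videoID = v.videoID AND vr.videoRatingTypeID = 2) AS dislikes
    FROM Videos v
    ORDER BY likes DESC, v.videoID
""").fetchall()

print(correlated)  # [(3, 3, 0), (1, 2, 1), (2, 1, 0), (4, 0, 2), (5, 0, 0)]
```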
EDIT:
Your video ratings table is very spartan, containing little more information than the summary number of likes and dislikes. Such a table might have the date/time of the rating and the person who did the rating, for instance.
But you are already adding new ratings by inserting rows into this table. Updating another table (which could be videos) with the same information is almost the same operation. You can then treat your current table as a historical log.
The advantage of using updates is that you can truncate the log when it gets big. Right now, you have to keep every rating since the beginning of time for every video.
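The summary-table idea can be sketched as follows. The sketch assumes that likes and dislikes counter columns are added to Videos; the edit above only says the other table "could be videos". Each new rating is appended to the VideoRatings log, and the matching counter is incremented in the same transaction. The top videos can then be read through an index on the counter instead of aggregating the log. Python's sqlite3 stands in for MySQL, and add_rating and idx_videos_likes are hypothetical names. The sketch ignores users changing an existing vote, which would also need a decrement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Videos (videoID INTEGER PRIMARY KEY, videoName TEXT,
                         likes INTEGER NOT NULL DEFAULT 0,
                         dislikes INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE VideoRatings (videoRatingID INTEGER PRIMARY KEY,
                               videoID INTEGER, videoRatingTypeID INTEGER);
    -- Lets "top N by likes" use an index instead of aggregating ratings.
    CREATE INDEX idx_videos_likes ON Videos (likes);
    INSERT INTO Videos (videoID, videoName) VALUES (1, 'cool_video1'), (2, 'mixingTest2');
""")

LIKE, DISLIKE = 1, 2  # videoRatingTypeID values from the question

def add_rating(video_id, rating_type):
    # Column name comes from a fixed whitelist, so the f-string below is safe.
    column = "likes" if rating_type == LIKE else "dislikes"
    with conn:  # one transaction: the log row and the counter update commit together
        conn.execute(
            "INSERT INTO VideoRatings (videoID, videoRatingTypeID) VALUES (?, ?)",
            (video_id, rating_type))
        conn.execute(
            f"UPDATE Videos SET {column} = {column} + 1 WHERE videoID = ?",
            (video_id,))

for video_id, rating_type in [(1, LIKE), (1, LIKE), (1, DISLIKE), (2, LIKE)]:
    add_rating(video_id, rating_type)

top = conn.execute(
    "SELECT videoID, likes, dislikes FROM Videos ORDER BY likes DESC LIMIT 10"
).fetchall()
print(top)  # [(1, 2, 1), (2, 1, 0)]
```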

Joining votes between tables and getting a percentage

I have been working on this one for an hour. I have tried several different joins and subqueries with no luck. Here is the situation.
There are two tables: one with a main index, and one listing votes from users. I want to determine how many votes a particular user is leaving for another user (easy)... THEN figure out what percentage of the total votes that is and sort by it (hard).
Table 1 has columns: post_id, poster_id
Table 2 has columns: post_id, voter_id, vote
post_id links the two tables. A simple query like this returns the total number of votes each user has left for another user. It is sorted so that whoever has left the most votes for someone comes first:
SELECT poster_id, voter_id, count(*) AS votes
FROM table_1, table_2
WHERE table_1.post_id = table_2.post_id
GROUP BY poster_id, voter_id
ORDER BY votes DESC
That works great... but I want to see who is leaving the most votes as a percentage of the poster's total votes. So I ALSO need the total votes each poster_id has received. I then need to express the current count as a percentage of that total and sort by that percentage. Output should be something like:
poster_id | voter_id | votes | vote_total | percent
----------------------------------------------------
1 | 3 | 10 | 10 | 100%
3 | 1 | 15 | 25 | 60%
2 | 1 | 3 | 6 | 50%
2 | 3 | 2 | 6 | 33%
3 | 2 | 5 | 25 | 20%
2 | 4 | 1 | 6 | 17%
etc.
Basically voter #3 is responsible for 100% of poster #1's votes. Poster #3 got 60% of its votes from voter #1... etc.
We're trying to find out if there is a particular user giving someone more votes (as a percentage) than other users to try and find potential abuses.
I thought a RIGHT JOIN would work, but it is not working out.
SELECT t1.poster_id, t1.voter_id, count(*) AS votes, count(t3.*) AS votes_total, votes / votes_total AS percentage
FROM (table_1 t1, table_2 t2)
RIGHT JOIN (table_1 t3, table_2 t4)
ON (t3.post_id = t4.post_id AND t3.poster_id = t1.poster_id)
WHERE t1.post_id = t2.post_id
GROUP BY t1.poster_id, t2.voter_id
ORDER BY percentage DESC
It basically runs forever and doesn't return anything. I typed that query from memory, so it doesn't use the real table names. Any pointers in the right direction would help. An inner join, perhaps?
Try this:
SELECT
table_1.poster_id,
voter_id,
count(*) AS votes,
total.total_votes AS vote_total,
count(*) * 100 / total.total_votes as percentage
FROM table_1
join (
select poster_id, count(*) AS total_votes
FROM table_1
join table_2 on table_1.post_id = table_2.post_id
GROUP BY poster_id
) total on total.poster_id = table_1.poster_id
join table_2 on table_1.post_id = table_2.post_id
GROUP BY table_1.poster_id, voter_id, total.total_votes
ORDER BY percentage DESC
This uses a subquery to return the total votes for each poster_id, then joins to it as if it were a table.
Note also the switch to explicit JOIN syntax instead of joining through the WHERE clause.
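Here is the derived-table approach run end to end on made-up votes chosen to resemble the example output. It uses Python's sqlite3 as a stand-in for MySQL. The query multiplies by 100.0 so the percentage is not truncated by SQLite's integer division; MySQL's / already returns a decimal. It also groups by the joined total and sorts by the percentage with a tie-breaker. The table layout follows the question. The one-vote-per-post sample data is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (post_id INTEGER PRIMARY KEY, poster_id INTEGER)")
conn.execute("CREATE TABLE table_2 (post_id INTEGER, voter_id INTEGER, vote INTEGER)")

# Made-up data: (poster_id, voter_id, number of votes), one vote per post.
vote_counts = [(1, 3, 10), (3, 1, 15), (3, 2, 5), (3, 5, 5), (2, 1, 3), (2, 3, 2), (2, 4, 1)]
for poster_id, voter_id, n in vote_counts:
    for _ in range(n):
        cur = conn.execute("INSERT INTO table_1 (poster_id) VALUES (?)", (poster_id,))
        conn.execute("INSERT INTO table_2 (post_id, voter_id, vote) VALUES (?, ?, 1)",
                     (cur.lastrowid, voter_id))

rows = conn.execute("""
    SELECT t1.poster_id, t2.voter_id,
           COUNT(*) AS votes,
           total.total_votes AS vote_total,
           100.0 * COUNT(*) / total.total_votes AS percentage
    FROM table_1 t1
    JOIN table_2 t2 ON t1.post_id = t2.post_id
    -- Derived table: total votes received by each poster.
    JOIN (
        SELECT a.poster_id, COUNT(*) AS total_votes
        FROM table_1 a
        JOIN table_2 b ON a.post_id = b.post_id
        GROUP BY a.poster_id
    ) total ON total.poster_id = t1.poster_id
    GROUP BY t1.poster_id, t2.voter_id, total.total_votes
    ORDER BY percentage DESC, t1.poster_id, t2.voter_id
""").fetchall()

for poster, voter, votes, vote_total, pct in rows:
    print(poster, voter, votes, vote_total, f"{pct:.0f}%")
```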

Select row based on a rank and previously selected row

I am trying to build a forum ranking system that fetches one record at a time, based on the previous record.
The example table is:
| POST_ID | POST_REPLIES |
--------------------------
| 1 | 5 |
| 2 | 2 |
| 3 | 8 |
| 4 | 8 |
| 5 | 12 |
--------------------------
If I do a simple query ORDER BY POST_REPLIES DESC, I get POST_ID 5, 4, 3, 1, 2.
But what I want is to get only the next row (a single row at a time), based on the post I am currently viewing.
For example, if I am currently viewing post #3, there would be a button labeled 'next post with most replies' which would point to post #4.
I am having trouble dealing with duplicates, because I run into a loop between 3 and 4 (3 points to 4, and 4 points back to 3 rather than to 5).
I tried joining the table onto itself and comparing rows to see which one was greater or smaller. But since I am using LIMIT 1, the row number is always 1 and thus useless. The basic query I had was:
SELECT * FROM posts
WHERE post_id != '$currentPost'
ORDER BY POST_REPLIES DESC, POST_ID DESC LIMIT 1
How can I do this?
The first step you would need would be to "rank" the results. The best way I have found to do this in MySQL is with a variable like so:
SELECT posts.post_id, posts.post_replies, @rank := @rank + 1 AS post_rank
FROM posts, (SELECT @rank := 0) r
ORDER BY posts.post_replies DESC, posts.post_id DESC
Then you would probably have to nest that query in another one to accomplish what you need. Let me know if that points you in the right direction.
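In MySQL 8.0 and later, the same ranking can be written with the ROW_NUMBER() window function instead of a user variable; assigning variables inside expressions is deprecated there. The sketch below uses Python's sqlite3 as a stand-in (SQLite 3.25+ supports window functions). It ranks the example table with post_id DESC as a tie-breaker, so the order matches 5, 4, 3, 1, 2, and then nests the ranking to fetch the neighbouring post. post_at_offset is a hypothetical helper: step -1 moves toward more replies, +1 toward fewer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_replies INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, 5), (2, 2), (3, 8), (4, 8), (5, 12)])

# Ranking query; post_id DESC breaks the tie between posts 3 and 4.
RANKED = """
    SELECT post_id, post_replies,
           ROW_NUMBER() OVER (ORDER BY post_replies DESC, post_id DESC) AS post_rank
    FROM posts
"""

ranked = conn.execute(RANKED + " ORDER BY post_rank").fetchall()
print([r[0] for r in ranked])  # [5, 4, 3, 1, 2]

def post_at_offset(current_id, step):
    # Nest the ranking and return the post whose rank differs by `step`.
    row = conn.execute(f"""
        WITH ranked AS ({RANKED})
        SELECT nxt.post_id
        FROM ranked cur
        JOIN ranked nxt ON nxt.post_rank = cur.post_rank + ?
        WHERE cur.post_id = ?
    """, (step, current_id)).fetchone()
    return row[0] if row else None

print(post_at_offset(3, -1), post_at_offset(4, -1), post_at_offset(5, -1))  # 4 5 None
```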
SELECT p.*
FROM (
SELECT POST_ID, POST_REPLIES
FROM posts
WHERE POST_ID = @currentId) as cur
JOIN posts p
ON p.POST_REPLIES > cur.POST_REPLIES
OR (p.POST_REPLIES = cur.POST_REPLIES AND p.POST_ID > cur.POST_ID)
ORDER BY p.POST_REPLIES, p.POST_ID
LIMIT 1 ;
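Here is a sketch of walking the example table one post at a time, using Python's sqlite3 as a stand-in for MySQL. It orders posts by (replies, post_id). It treats as "next" any post with more replies, or with the same number of replies and a higher id. That way, tied posts like 3 and 4 are each visited once instead of looping. next_post is a hypothetical helper. The direction (toward more replies) is assumed from the question's example, where 3 should lead to 4 and 4 to 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_replies INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(1, 5), (2, 2), (3, 8), (4, 8), (5, 12)])

def next_post(current_id):
    # Next post in (post_replies, post_id) order; None when there is none.
    row = conn.execute("""
        SELECT p.post_id
        FROM (SELECT post_id, post_replies FROM posts WHERE post_id = ?) AS cur
        JOIN posts p
          ON p.post_replies > cur.post_replies
          OR (p.post_replies = cur.post_replies AND p.post_id > cur.post_id)
        ORDER BY p.post_replies, p.post_id
        LIMIT 1
    """, (current_id,)).fetchone()
    return row[0] if row else None

# Follow the "next" links from the post with the fewest replies.
chain = [2]
while True:
    nxt = next_post(chain[-1])
    if nxt is None:
        break
    chain.append(nxt)

print(chain)  # [2, 1, 3, 4, 5]
```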
Limit using a range, for example
limit 0,1
where zero is the starting record and one is the number of records to fetch.
limit 5,1
would get you the sixth record and so on. You can track the page number via post, get, or session and use it to manipulate the query in this way.
It is also common to fetch and store all the records, then present them per page, however this can be problematic if you expect to generate a large number of records.
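Here is a tiny sketch of the LIMIT offset, count form, with Python's sqlite3 standing in for MySQL; SQLite accepts the same comma syntax with the offset first. The ORDER BY matters: without it, the database is free to return rows in any order, so a given offset would not reliably mean the same record. page is a hypothetical helper, and the ten rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id INTEGER PRIMARY KEY, post_replies INTEGER)")
conn.executemany("INSERT INTO posts (post_id) VALUES (?)", [(i,) for i in range(1, 11)])

def page(offset, count=1):
    # LIMIT offset, count: skip `offset` rows, then return up to `count` rows.
    rows = conn.execute(
        "SELECT post_id FROM posts ORDER BY post_id LIMIT ?, ?", (offset, count)
    ).fetchall()
    return [r[0] for r in rows]

print(page(0))     # [1]  (first record)
print(page(5))     # [6]  (sixth record)
print(page(0, 3))  # [1, 2, 3]
```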