(Using MySQL) I have a table of videos (simplified):
+---------+-------------+-------------+
| videoID | videoName | videoAuthor |
+---------+-------------+-------------+
| 1 | cool_video1 | rocky |
| 2 | mixingTest2 | sensable |
+---------+-------------+-------------+
and a table of video ratings: every time a user likes or dislikes a video, a row is added to or updated in the videoRatings table.
For example, the rows below mean the video with videoID 1 has two likes and one dislike. In the videoRatings table, a "1" is a like and a "2" is a dislike (simplified).
+---------------+---------+-------------------+
| videoRatingID | videoID | videoRatingTypeID |
+---------------+---------+-------------------+
| 121 | 1 | 1 |
| 234 | 1 | 1 |
| 290 | 1 | 2 |
+---------------+---------+-------------------+
Now, simple enough: all I want to do is grab the highest-scoring videos out of about 100,000 videos.
Naturally, I do something like this:
SELECT Videos.videoID,
COUNT(CASE WHEN videoRatingTypeID =1 THEN 1 ELSE NULL END) AS likes,
COUNT(CASE WHEN videoRatingTypeID =2 THEN 1 ELSE NULL END) AS dislikes
FROM Videos
LEFT JOIN VideoRatings ON VideoRatings.videoID = Videos.videoID
GROUP BY Videos.videoID
ORDER BY likes DESC
But this query takes about half a second. That worries me: when the videos table grows past 1 million rows, it will take even longer. The videoRatings table is very small (~40 rows), and the videos table is ~100,000 rows.
I have an index on videoID in the Videos table, obviously, and indexes in my videoRatings table on videoID and videoRatingID, plus a composite index on videoID+videoRatingID.
I don't see a better way to do this. I've read several posts about moving the order by outside. But when I do that:
SELECT * FROM (
SELECT Videos.videoID,
COUNT(CASE WHEN videoRatingTypeID =1 THEN 1 ELSE NULL END) AS likes,
COUNT(CASE WHEN videoRatingTypeID =2 THEN 1 ELSE NULL END) AS dislikes
FROM Videos
LEFT JOIN VideoRatings ON VideoRatings.videoID = Videos.videoID
GROUP BY Videos.videoID
) tmp
ORDER BY tmp.likes DESC
there is zero improvement.
What is a better way to approach this layout, or this query? Thank you!
For real scalability, I think you will need a solution that maintains a summary table. In the meantime, this might be faster:
select v.videoID,
(select count(*)
from VideoRatings vr
where vr.videoID = v.videoID and
videoRatingTypeId = 1
) as likes,
(select count(*)
from VideoRatings vr
where vr.videoID = v.videoID and
videoRatingTypeId = 2
) as dislikes
from Videos v;
Be sure that you have an index on VideoRatings(videoId, videoRatingTypeId). The type id is not so important in the index, but it can help: with both columns in it, each count can be answered from the index alone.
This replaces the GROUP BY over the whole set of videos and ratings with index scans and small aggregations. It will scale, as long as the index on videoRatings fits in memory.
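A quick way to sanity-check the correlated-subquery version is to run it against the sample rows from the question. The sketch below assumes Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL, so it says nothing about MySQL's timings; it only shows that the query returns the expected like/dislike counts, including zeros for an unrated video.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption); the schema
# and rows are the simplified ones from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Videos (videoID INTEGER PRIMARY KEY, videoName TEXT, videoAuthor TEXT);
CREATE TABLE VideoRatings (videoRatingID INTEGER PRIMARY KEY,
                           videoID INTEGER, videoRatingTypeID INTEGER);
-- Composite index recommended in the answer.
CREATE INDEX idx_vr_video_type ON VideoRatings (videoID, videoRatingTypeID);
INSERT INTO Videos VALUES (1, 'cool_video1', 'rocky'), (2, 'mixingTest2', 'sensable');
INSERT INTO VideoRatings VALUES (121, 1, 1), (234, 1, 1), (290, 1, 2);
""")

# One small indexed count per video and rating type (1 = like, 2 = dislike),
# instead of one GROUP BY over the whole joined set.
rows = conn.execute("""
SELECT v.videoID,
       (SELECT COUNT(*) FROM VideoRatings vr
         WHERE vr.videoID = v.videoID AND vr.videoRatingTypeID = 1) AS likes,
       (SELECT COUNT(*) FROM VideoRatings vr
         WHERE vr.videoID = v.videoID AND vr.videoRatingTypeID = 2) AS dislikes
FROM Videos v
ORDER BY likes DESC, v.videoID
""").fetchall()
print(rows)  # [(1, 2, 1), (2, 0, 0)]
```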
EDIT:
Your video ratings table is very spartan, containing little more information than the summary number of likes and dislikes. Such a table might have the date/time of the rating and the person who did the rating, for instance.
But you are adding new ratings by inserting rows into this table. It is almost the same operation to update another table (which could be Videos) with the counts. Then think of your current table as the historical log.
The advantage of using updates is that you can truncate the log when it gets big. Right now, you have to keep every rating since the beginning of time for every video.
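A minimal sketch of the summary-table idea, assuming hypothetical likes and dislikes columns added to Videos and a hypothetical add_rating helper: each new rating is logged and the matching counter is incremented in the same transaction, so ranking becomes a plain ORDER BY on Videos. SQLite via Python's sqlite3 stands in for MySQL here, and the sketch only handles new ratings; changing an existing like to a dislike would also need a decrement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Videos (videoID INTEGER PRIMARY KEY, videoName TEXT,
                     likes INTEGER NOT NULL DEFAULT 0,     -- hypothetical summary column
                     dislikes INTEGER NOT NULL DEFAULT 0); -- hypothetical summary column
CREATE TABLE VideoRatings (videoRatingID INTEGER PRIMARY KEY,
                           videoID INTEGER, videoRatingTypeID INTEGER);
INSERT INTO Videos (videoID, videoName) VALUES (1, 'cool_video1'), (2, 'mixingTest2');
""")


def add_rating(conn, video_id, rating_type):
    """Hypothetical helper: log the rating and bump the summary counter together."""
    column = {1: "likes", 2: "dislikes"}[rating_type]  # 1 = like, 2 = dislike
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute(
            "INSERT INTO VideoRatings (videoID, videoRatingTypeID) VALUES (?, ?)",
            (video_id, rating_type),
        )
        # column comes from the fixed dict above, never from user input
        conn.execute(
            f"UPDATE Videos SET {column} = {column} + 1 WHERE videoID = ?",
            (video_id,),
        )


for rating in (1, 1, 2):
    add_rating(conn, 1, rating)

# Ranking now reads the pre-computed counters; no join or aggregation needed.
top = conn.execute(
    "SELECT videoID, likes, dislikes FROM Videos ORDER BY likes DESC, videoID"
).fetchall()
print(top)  # [(1, 2, 1), (2, 0, 0)]
```

Keeping the insert and the update in one transaction means the counters cannot drift from the log if either statement fails. On MySQL you would likely also want an index on Videos(likes) so the ORDER BY does not have to sort the whole table.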
I have a table that stores the types of discounts a user can have.
Some users get the standard discount, but some get a bigger and better discount. For users who have the biggest and best discount, there are two records in the database: one for the default discount and one for the biggest and best discount. The biggest and best discount should be preferred in the search.
I would like a SELECT that returns the record with the highest discount and, if there isn't one, the record with the standard discount, so I can avoid making two database queries or filtering in the source code.
Ex:
| id | user_id | country | discount | cashback | free_trial |
|-----------------------------------------------------------------------|
| 1 | 1 | EUA | DEFAULT | 10 | false |
| 2 | 1 | EUA | CHRISTMAS | 20 | true |
| 3 | 3 | EUA | DEFAULT | 10 | false |
SELECT *
FROM users
WHERE country = 'EUA'
AND (discount = 'CHRISTMAS' OR discount = 'DEFAULT');
In the example above, for user 1 it should return the record with the discount "CHRISTMAS", and for user 3 it should return "DEFAULT" because that is the only one it has. Can you help me, please?
You can use the row_number() window function to do this. This function includes a PARTITION BY that lets you start the numbering over with each user, as well as its own ORDER BY that lets you determine which rows sort first within each user/partition.
Then you nest this inside another SELECT to limit to rows where the row_number() result is 1 (the discount that sorted best):
SELECT *
FROM (
SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY cashback desc) rn
FROM users
WHERE country = 'EUA'
) u
WHERE rn = 1
You could also use a LATERAL JOIN, which is usually better than a correlated subquery, but not as good as the window function.
You can use GROUP BY to do it:
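Here is the window-function approach run against the example rows, partitioning by user_id so each user is numbered separately. It assumes "best" means highest cashback, as in the ORDER BY of the answer's query, and uses SQLite (3.25 or later, through Python's sqlite3) as a stand-in for MySQL 8.0, the first MySQL version with ROW_NUMBER(). free_trial is stored as 0/1 because SQLite has no boolean type.

```python
import sqlite3

# SQLite (3.25+ for window functions) stands in for MySQL 8.0 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_id INTEGER, country TEXT,
                    discount TEXT, cashback INTEGER, free_trial INTEGER);
INSERT INTO users VALUES
  (1, 1, 'EUA', 'DEFAULT',   10, 0),
  (2, 1, 'EUA', 'CHRISTMAS', 20, 1),
  (3, 3, 'EUA', 'DEFAULT',   10, 0);
""")

# Number each user's rows from highest cashback down, then keep row 1.
rows = conn.execute("""
SELECT id, user_id, discount, cashback
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY cashback DESC) AS rn
  FROM users
  WHERE country = 'EUA'
) u
WHERE rn = 1
ORDER BY user_id
""").fetchall()
print(rows)  # [(2, 1, 'CHRISTMAS', 20), (3, 3, 'DEFAULT', 10)]
```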
SELECT u1.*
FROM users u1
JOIN
(
SELECT COUNT(id) AS cnt,user_id
FROM users WHERE country = 'EUA'
GROUP BY user_id
) u2 ON u1.user_id=u2.user_id
WHERE IF(u2.cnt=1,u1.discount='DEFAULT',u1.discount='CHRISTMAS')
I have a table with 1v1 matches like this:
match_number|winner_id|loser_id
------------+---------+--------
1 | 1 | 2
2 | 2 | 3
3 | 1 | 2
4 | 1 | 4
5 | 4 | 1
and I would like to get something like this:
player|matches_won|matches_lost
------+-----------+------------
1 | 3 | 1
2 | 1 | 2
3 | 0 | 1
4 | 1 | 1
My MySQL query looks like this:
SELECT win_matches."winner_id" player, COUNT(win_matches."winner_id") matches_won, COUNT(lost_matches."loser_id") matches_lost FROM `matches` win_matches
JOIN `matches` lost_matches ON win_matches."winner_id" = lost_matches."winner_id"
I don't know what I did wrong, but the query just loads forever and doesn't return anything
You want to unpivot and then aggregate:
select player_id, sum(is_win) as matches_won, sum(is_loss) as matches_lost
from ((select winner_id as player_id, 1 as is_win, 0 as is_loss
from t
) union all
(select loser_id, 0, 1
from t
)
) wl
group by player_id;
Your query is simply not correct. The two counts will produce the same value: COUNT(<expression>) returns the number of rows where the expression is not NULL, and neither column is NULL in any joined row.
The reason it is taking forever is the Cartesian product problem. The self-join pairs every match a player won with every match that same player won (joining wins to losses instead would pair every win with every loss), so a player with 10 wins produces 10 × 10 = 100 rows, and this gets worse for players who have played more often. Processing all those additional rows takes time.
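To see the row multiplication concretely, the sketch below counts the rows produced by the question's self-join on winner_id, using the five sample matches in an in-memory SQLite database (via Python's sqlite3, standing in for MySQL). Each player contributes (number of wins) squared rows, so player 1 alone, with 3 wins, yields 9.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (match_number INTEGER PRIMARY KEY, winner_id INTEGER, loser_id INTEGER);
INSERT INTO matches VALUES (1, 1, 2), (2, 2, 3), (3, 1, 2), (4, 1, 4), (5, 4, 1);
""")

# Rows produced by the question's self-join on winner_id (no GROUP BY).
joined = conn.execute("""
SELECT COUNT(*) FROM matches w
JOIN matches l ON w.winner_id = l.winner_id
""").fetchone()[0]

# Each player contributes (number of wins) squared rows.
wins = conn.execute("SELECT COUNT(*) FROM matches GROUP BY winner_id").fetchall()
expected = sum(n * n for (n,) in wins)
print(joined, expected)  # 11 11
```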
If you have a separate players table, then correlated subqueries may be the fastest method:
select p.*,
(select count(*) from t where t.winner_id = p.player_id) as num_wins,
(select count(*) from t where t.loser_id = p.player_id) as num_losses
from players p;
For performance, this requires two indexes, one on (winner_id) and one on (loser_id). Note that these are two separate indexes, not a single compound index.
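Both approaches can be checked against the expected output in the question. This sketch uses SQLite through Python's sqlite3 as a stand-in for MySQL, names the table matches, and assumes a hypothetical players table holding the four player ids. The parentheses around each UNION ALL branch are dropped because SQLite does not accept them; otherwise the SQL follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (match_number INTEGER PRIMARY KEY, winner_id INTEGER, loser_id INTEGER);
CREATE TABLE players (player_id INTEGER PRIMARY KEY);  -- hypothetical players table
CREATE INDEX idx_winner ON matches (winner_id);         -- two separate indexes,
CREATE INDEX idx_loser  ON matches (loser_id);          -- as the answer suggests
INSERT INTO matches VALUES (1, 1, 2), (2, 2, 3), (3, 1, 2), (4, 1, 4), (5, 4, 1);
INSERT INTO players VALUES (1), (2), (3), (4);
""")

# Unpivot: one row per (player, win-or-loss), then aggregate.
unpivoted = conn.execute("""
SELECT player_id, SUM(is_win) AS matches_won, SUM(is_loss) AS matches_lost
FROM (SELECT winner_id AS player_id, 1 AS is_win, 0 AS is_loss FROM matches
      UNION ALL
      SELECT loser_id, 0, 1 FROM matches) wl
GROUP BY player_id
ORDER BY player_id
""").fetchall()

# Correlated subqueries against the players table.
correlated = conn.execute("""
SELECT p.player_id,
       (SELECT COUNT(*) FROM matches m WHERE m.winner_id = p.player_id) AS num_wins,
       (SELECT COUNT(*) FROM matches m WHERE m.loser_id = p.player_id) AS num_losses
FROM players p
ORDER BY p.player_id
""").fetchall()

print(unpivoted)  # [(1, 3, 1), (2, 1, 2), (3, 0, 1), (4, 1, 1)]
```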
You are joining the same table twice.
Both aliases, win_matches and lost_matches, point to the table matches, and joining it to itself on winner_id multiplies the rows, which is why the query seems to run forever.
You probably don't need separate tables for wins and losses; you could record both in the same table by writing a one or a zero in a column for each.
I don't want to change your model too much and make it difficult to understand, so here is a slight modification and what it could look like:
SELECT m.player_id AS player,
SUM(m.win) AS matches_won,
SUM(m.loss) AS matches_lost
FROM `matches` m
GROUP BY m.player_id
This uses no join: everything is in the same table, with win and loss columns. It looked to me like you wanted the number of wins and losses per player, which you can get with a GROUP BY on the player and a SUM/COUNT.
I'm trying to make a simple video rating system for a website using two tables, videos and ratings. Basically, there's a video record in the "videos" table, and then 1 or n ratings for that video id in the "ratings" table. I can select a single video record and add a column with the average of all the ratings for that video by doing:
SELECT v.*, avg(r.rate) FROM videos v
LEFT JOIN ratings r
ON v.id = r.video_id
WHERE v.id = 100;
This returns:
id | name | file | rate
------------------------------------------
100 | Testvideo | test.avi | 4.4286
There are 7 rows for that id in "ratings", with integers ranging from 3 to 5, so the average works as expected; if the video has no ratings, the rate column is NULL but everything else still shows up. So I have the single-ID case covered, but I can't figure out the query to return the whole video list in the same format:
id | name | file | rate
------------------------------------------
1 | test1 | test1.avi | 5
2 | super2 | conan.avi | null
3 | mega3 | wedding.avi | 2.1149
I tried a lot of things, but they all returned either a single row (which I assume is because of the avg function) or duplicated rows.
Thanks.
Use a correlated subquery:
SELECT v.*,
(SELECT avg(r.rate)
FROM ratings r
WHERE v.id = r.video_id
) as avgrating
FROM videos v;
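A small check of the correlated subquery, assuming SQLite via Python's sqlite3 in place of MySQL and made-up ratings (seven ratings for video 100 that average 4.4286, one rating for video 1, none for video 2). For comparison, it also runs the question's LEFT JOIN with GROUP BY v.id added, which is another common way to get one row per video; both return NULL for the unrated video.

```python
import sqlite3

# SQLite stands in for MySQL (assumption); the ratings are made up so that
# video 100 averages 31 / 7 ≈ 4.4286 and video 2 has no ratings at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (id INTEGER PRIMARY KEY, name TEXT, file TEXT);
CREATE TABLE ratings (id INTEGER PRIMARY KEY, video_id INTEGER, rate INTEGER);
INSERT INTO videos VALUES (1, 'test1', 'test1.avi'), (2, 'super2', 'conan.avi'),
                          (100, 'Testvideo', 'test.avi');
INSERT INTO ratings (video_id, rate) VALUES
  (1, 5),
  (100, 5), (100, 5), (100, 5), (100, 4), (100, 4), (100, 5), (100, 3);
""")

# Correlated subquery: one average per video row, NULL when there are no ratings.
rows = conn.execute("""
SELECT v.id, v.name,
       (SELECT AVG(r.rate) FROM ratings r WHERE r.video_id = v.id) AS avgrating
FROM videos v
ORDER BY v.id
""").fetchall()

# Alternative: the question's LEFT JOIN plus GROUP BY v.id, one row per video.
grouped = conn.execute("""
SELECT v.id, v.name, AVG(r.rate) AS avgrating
FROM videos v
LEFT JOIN ratings r ON v.id = r.video_id
GROUP BY v.id
ORDER BY v.id
""").fetchall()

for video_id, name, avg in rows:
    print(video_id, name, None if avg is None else round(avg, 4))
# 1 test1 5.0
# 2 super2 None
# 100 Testvideo 4.4286
```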
I have been working on this one for an hour. I have tried several different joins and subqueries with no luck. Here is the situation.
Two tables: one with a main index, and one a listing of votes from users. I want to determine how many votes a particular user is leaving for another user (easy), THEN figure out the percentage of the total votes and sort by that (hard).
Table 1 has columns: post_id, poster_id
Table 2 has columns: post_id, voter_id, vote
The post_id links the two tables. A simple query like this gives the total votes one user has left another user, sorted by who has left the most votes for another:
SELECT poster_id, voter_id, count(*) AS votes
FROM table_1, table_2
WHERE table_1.post_id = table_2.post_id
GROUP BY poster_id, voter_id
ORDER BY votes DESC
That works great, but I want to see who is leaving the most votes as a percentage of the user's total votes. So I ALSO need the total votes each poster_id has, then divide the current count by it to get a percentage, and then sort by that percentage. The output should be something like:
poster_id | voter_id | votes | vote_total | percent
----------------------------------------------------
1 | 3 | 10 | 10 | 100%
3 | 1 | 15 | 25 | 60%
2 | 1 | 3 | 6 | 50%
2 | 3 | 2 | 6 | 33%
3 | 2 | 5 | 25 | 20%
2 | 4 | 1 | 6 | 17%
etc.
Basically voter #3 is responsible for 100% of poster #1's votes. Poster #3 got 60% of its votes from voter #1... etc.
We're trying to find out if there is a particular user giving someone more votes (as a percentage) than other users to try and find potential abuses.
I thought a RIGHT JOIN would work, but it is not working out.
SELECT t1.poster_id, t1.voter_id, count(*) AS votes, count(t3.*) AS votes_total, votes / votes_total AS percentage
FROM (table_1 t1, table_2 t2)
RIGHT JOIN (table_1 t3, table_2 t4)
ON (t3.post_id = t4.post_id AND t3.poster_id = t1.poster_id)
WHERE t1.post_id = t2.post_id
GROUP BY t1.poster_id, t2.voter_id
ORDER BY percentage DESC
It basically runs forever and doesn't return anything. I typed that query from memory, and it doesn't exactly reflect the real table names. Any pointers in the right direction would help. An inner join, perhaps?
Try this:
SELECT
table_1.poster_id,
table_2.voter_id,
count(*) AS votes,
total.total_votes,
count(*) * 100 / total.total_votes as percentage
FROM table_1
join (
select table_1.poster_id, count(*) AS total_votes
FROM table_1
join table_2 on table_1.post_id = table_2.post_id
GROUP BY table_1.poster_id
) total on total.poster_id = table_1.poster_id
join table_2 on table_1.post_id = table_2.post_id
GROUP BY table_1.poster_id, table_2.voter_id, total.total_votes
ORDER BY percentage DESC
This uses a subquery to return the total votes for each poster_id then joins to it as if it were a table.
Note also the change to using proper joins instead of joining through the where clause.
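A runnable check of the totals-subquery approach on a handful of made-up posts and votes, using SQLite through Python's sqlite3 as a stand-in for MySQL. Every column is qualified with its table, because poster_id exists in both table_1 and the derived table. It multiplies by 100.0 rather than 100 because SQLite's / does integer division on integers (MySQL's / already returns a decimal).

```python
import sqlite3

# SQLite stands in for MySQL (assumption); posts and votes are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (post_id INTEGER PRIMARY KEY, poster_id INTEGER);
CREATE TABLE table_2 (post_id INTEGER, voter_id INTEGER, vote INTEGER);
INSERT INTO table_1 VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 2), (6, 2);
-- Poster 1: both votes from voter 3. Poster 2: three from voter 1, one from voter 3.
INSERT INTO table_2 VALUES (1, 3, 1), (2, 3, 1),
                           (3, 1, 1), (4, 1, 1), (5, 1, 1), (6, 3, 1);
""")

rows = conn.execute("""
SELECT table_1.poster_id,
       table_2.voter_id,
       COUNT(*) AS votes,
       total.total_votes,
       COUNT(*) * 100.0 / total.total_votes AS percentage
FROM table_1
JOIN table_2 ON table_1.post_id = table_2.post_id
JOIN (SELECT table_1.poster_id, COUNT(*) AS total_votes  -- votes per poster
      FROM table_1
      JOIN table_2 ON table_1.post_id = table_2.post_id
      GROUP BY table_1.poster_id) total
  ON total.poster_id = table_1.poster_id
GROUP BY table_1.poster_id, table_2.voter_id, total.total_votes
ORDER BY percentage DESC
""").fetchall()
for row in rows:
    print(row)
# (1, 3, 2, 2, 100.0)
# (2, 1, 3, 4, 75.0)
# (2, 3, 1, 4, 25.0)
```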
I've read many threads about this problem, but I can't find the solution.
I have a table (called users) with the users of my website. They have points. For example:
+-----------+------------+
| User_id | Points |
+-----------+------------+
| 1 | 12258 |
| 2 | 112 |
| 3 | 9678 |
| 4 | 689206 |
| 5 | 1868 |
+-----------+------------+
At the top of the page, the variable $user_id is set; for example, user_id is 4. Now I would like to get the user's rank by points (the output should be 1 if user_id is 4).
Thank you very much!
SELECT
COUNT(*) AS user_rank
FROM users
WHERE Points>=(SELECT Points FROM users WHERE User_id=4)
Updated with some more useful stuff:
SELECT
user_id,
points,
(SELECT COUNT(*)+1 FROM users WHERE Points>x.points) AS rank_upper,
(SELECT COUNT(*) FROM users WHERE Points>=x.points) AS rank_lower
FROM
`users` x
WHERE x.user_id = 4
which includes the range of ranks the user occupies. For example, if the scores for the first five places are 5 4 3 3 3, the result would be:
id points rank_upper rank_lower
id 5 1 1
id 4 2 2
id 3 3 5
id 3 3 5
id 3 3 5
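The tie example can be reproduced directly. This sketch drops the user_id = 4 filter to show every row, uses the 5 4 3 3 3 scores, and runs on SQLite via Python's sqlite3 as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, points INTEGER);
INSERT INTO users VALUES (1, 5), (2, 4), (3, 3), (4, 3), (5, 3);
""")

# rank_upper: 1 + number of users strictly ahead; rank_lower: number of users
# at or ahead. Tied users share the same range.
rows = conn.execute("""
SELECT user_id, points,
       (SELECT COUNT(*) + 1 FROM users WHERE points > x.points) AS rank_upper,
       (SELECT COUNT(*) FROM users WHERE points >= x.points) AS rank_lower
FROM users x
ORDER BY points DESC, user_id
""").fetchall()
for row in rows:
    print(row)
# (1, 5, 1, 1)
# (2, 4, 2, 2)
# (3, 3, 3, 5)
# (4, 3, 3, 5)
# (5, 3, 3, 5)
```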
This query should do what you want:
SELECT user_rank FROM (
SELECT User_id, Points,
FIND_IN_SET(
Points,
(SELECT GROUP_CONCAT(
DISTINCT Points
ORDER BY Points DESC
)
FROM users)
) AS user_rank
FROM users
) ranked
WHERE User_id = 4;
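FIND_IN_SET against the list of distinct point values sorted in descending order gives a dense rank: tied users share a rank and the next value gets the next integer. SQLite has no FIND_IN_SET, so this sketch swaps in a different technique that computes the same dense rank, a correlated COUNT(DISTINCT ...) subquery, run through Python's sqlite3 (standing in for MySQL) on the points from the question.

```python
import sqlite3

# Points from the question; SQLite stands in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (User_id INTEGER PRIMARY KEY, Points INTEGER);
INSERT INTO users VALUES (1, 12258), (2, 112), (3, 9678), (4, 689206), (5, 1868);
""")

# Dense rank = number of distinct point values greater than or equal to the
# user's own points, which is the position FIND_IN_SET reports in the
# DESC-ordered distinct list.
rows = conn.execute("""
SELECT u.User_id, u.Points,
       (SELECT COUNT(DISTINCT u2.Points) FROM users u2
         WHERE u2.Points >= u.Points) AS user_rank
FROM users u
ORDER BY user_rank
""").fetchall()

# The same computation for a single user, as in the question.
rank_of_user_4 = conn.execute("""
SELECT COUNT(DISTINCT u2.Points) FROM users u2
WHERE u2.Points >= (SELECT Points FROM users WHERE User_id = ?)
""", (4,)).fetchone()[0]
print(rank_of_user_4)  # 1
```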
If you don't want to do it outside MySQL, you'll need to use variables to compute the rank.
Here's a solution that describes exactly what you want:
http://www.fromdual.ch/ranking-mysql-results
If you want the rank available directly in each record, you still need to store it in the record and update it yourself. No reasonable query will give you the rank directly, without storing it in a real table, once the table has more than a few hundred records.
There's already a simple solution suited to your purpose.
This may help:
SELECT @rank:=@rank+1 AS user_rank, `User_id`, `Points` FROM `users` u JOIN (SELECT @rank:=0) r ORDER BY u.Points DESC