Ranking entries and group by - mysql

I have a website where people are saving highscores in games. I've been trying to assign a rank to a player's highscore by finding the number of other highscores that are higher. So if the query finds 5 people with higher highscores, the player's highscore rank will be 6.
The thing is that more than one score (for the same game) can be recorded in the database for every user. This is why I'm using GROUP BY user when I want to display only a player's best score in a game (not all of his scores in that game).
Here's the code I am using now. It doesn't work: the query always seems to return 2, as if there were only ever one score higher than the player's highscore. If I remove temp GROUP BY user, it returns a half-correct value, because it counts every score from every player in the game, including multiple scores by the same player.
$count3 = mysql_result(mysql_query("SELECT COUNT(*) + 1 as Num FROM (SELECT * FROM ava_highscores WHERE game = $id AND leaderboard = $get_leaderboard[leaderboard_id] AND score > '$highscore2[score]') temp GROUP BY user"), 0);

When you use GROUP BY then COUNT returns a count of rows per group rather than a single count of all rows in the result set. Use COUNT(DISTINCT ...) instead. Also you don't actually need the inner select. You can write it all as a single query:
SELECT COUNT(DISTINCT `user`) + 1 AS Num
FROM ava_highscores
WHERE game = '3'
AND leaderboard = '5'
AND score > '1000'
Notes
Make sure that your score column is a numeric type (not a varchar type) so that the comparison works correctly.
Adding an index on (game, leaderboard, score) will make the query more efficient. The order of the columns in the index is also important.
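As a quick check of the COUNT(DISTINCT ...) approach, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the sample rows, the index name idx_game_board_score and the helper rank_for are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ava_highscores ("
    "user TEXT, game INTEGER, leaderboard INTEGER, score INTEGER)"
)
# Composite index as suggested in the notes: equality columns first, range column last.
conn.execute(
    "CREATE INDEX idx_game_board_score "
    "ON ava_highscores (game, leaderboard, score)"
)
conn.executemany(
    "INSERT INTO ava_highscores VALUES (?, ?, ?, ?)",
    [
        ("alice", 3, 5, 1500),
        ("alice", 3, 5, 1200),  # second score above 1000 by the same user
        ("bob", 3, 5, 1100),
        ("carol", 3, 5, 900),   # below the player's score
        ("dave", 4, 5, 5000),   # different game, must be ignored
    ],
)


def rank_for(game, leaderboard, score):
    """Rank = number of distinct users with a higher score, plus one."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT user) + 1 FROM ava_highscores "
        "WHERE game = ? AND leaderboard = ? AND score > ?",
        (game, leaderboard, score),
    ).fetchone()
    return row[0]


player_rank = rank_for(3, 5, 1000)
print(player_rank)
```

With this sample data alice is counted once even though she has two scores above 1000, which is the behaviour the GROUP BY version failed to give.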

Related

Filtering out early entries meeting a condition on a per user basis

I am new to SQL and want to filter a database in a way that doesn't quite map to any of the examples I've read. I'm using MySQL with MariaDB.
The table holds game result reports and has the following structure:
id (unique integer id for each report, primary key)
user_id (integer id allocated to each player)
day (integer, the identity of the daily puzzle being reported)
result (integer score)
submitted (timestamp of the report submission)
The day puzzle ticks over at midnight local time so two reports can be submitted at the same time but be from different days.
I want to be able to find the average score reported for each day BUT I want to exclude any user’s score if they’ve never got a particular score prior to that (to eliminate complete newbies and people who are unusually bad at the game). However I can’t work out how to layer in this exclusion.
Where I’ve got to is that I can get the earliest successful game for each user with this query:
SELECT id, user_id, MIN(submitted), result FROM `results_daily` WHERE result > 5 GROUP BY user_id
(Let’s call that output1)
However I can’t see how to apply this to a set of day results as a filter (so only include a result in my average calculation where a user features in output1 AND their daily report has been submitted on or after the date for that user in output1).
It feels like it might be some kind of JOIN operation but I can’t wrap my head around it. Can anyone help?
** EDITED TO ADD:
OK, I think I've got it, although my solution uses a function and I'm not sure whether that's the most efficient or the most SQL-y way to do this. Instinctively it feels like this should be possible without a function, but I'm definitely not practiced enough to work it out if it is! O. Jones's answer set me off down the right path; I just needed to refine the excluded set with a function. So now my query looks like this:
SELECT day,
AVG(result) average_score,
COUNT(*) number_of_plays,
COUNT(DISTINCT user_id) number_of_non_n00b_players
FROM results_daily
WHERE user_id NOT IN (
SELECT user_id
FROM results_daily
WHERE submitted < GetEureka(user_id)
GROUP BY user_id )
GROUP BY day;
and my function GetEureka() looks like this:
DECLARE eureka TIMESTAMP DEFAULT CURRENT_TIMESTAMP;
SELECT
MIN(submitted) INTO eureka
FROM
results_daily
WHERE
user_id = user
AND
result >= 5
GROUP BY
user_id;
RETURN eureka;
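For comparison, here is one possible function-free sketch of the filter described in the question: keep a report only if the user has some result above 5 and the report was submitted on or after that user's first such result. It is run against in-memory SQLite from Python as a stand-in for MySQL; the sample rows and the alias first_good are assumptions for illustration, and the threshold result > 5 is taken from the output1 query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results_daily ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, day INTEGER, "
    "result INTEGER, submitted TEXT)"
)
conn.executemany(
    "INSERT INTO results_daily (user_id, day, result, submitted) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 1, 3, "2024-01-01 10:00:00"),   # before user 1's first result > 5: skipped
        (1, 2, 6, "2024-01-02 10:00:00"),   # user 1's first qualifying result
        (1, 3, 4, "2024-01-03 10:00:00"),   # later report: counted even though it is low
        (2, 2, 8, "2024-01-02 11:00:00"),   # user 2 qualifies on the first play
        (2, 3, 10, "2024-01-03 11:00:00"),
        (3, 3, 2, "2024-01-03 12:00:00"),   # user 3 never scores above 5: skipped
    ],
)

# Derived table e plays the role of output1: one row per user with the
# timestamp of the first result above 5. The join keeps only reports
# submitted at or after that moment.
query = """
SELECT r.day, AVG(r.result) AS average_score, COUNT(*) AS number_of_plays
FROM results_daily AS r
JOIN (
    SELECT user_id, MIN(submitted) AS first_good
    FROM results_daily
    WHERE result > 5
    GROUP BY user_id
) AS e ON e.user_id = r.user_id AND r.submitted >= e.first_good
GROUP BY r.day
ORDER BY r.day
"""
daily_stats = conn.execute(query).fetchall()
print(daily_stats)
```

Note that this filters individual reports rather than whole users, so it is not a drop-in equivalent of the NOT IN query above.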
In SQL think about sets. You need the set of user_id values of all n00bz meeting either of these two criteria:
No score greater than 5.
Only one play of the game.
Then, when you compute your averages you exclude rows with those user ids.
So let's get you the n00bz, with the magic of the GROUP BY and HAVING clauses.
SELECT user_id
FROM results_daily
GROUP BY user_id
HAVING COUNT(*) = 1 OR MAX(result) <= 5
Now we can run your stats.
SELECT day,
AVG(result) average_score,
COUNT(*) number_of_plays,
COUNT(DISTINCT user_id) number_of_non_n00b_players
FROM results_daily
WHERE user_id NOT IN (
SELECT user_id
FROM results_daily
GROUP BY user_id
HAVING COUNT(*) = 1 OR MAX(result) <= 5 )
GROUP BY day;
The "structured" in Structured Query Language comes from this use of nested subqueries to define sets of rows and work with them.
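The two-step idea above (first the set of users to exclude, then the per-day statistics) can be tried out with a small sketch. It uses in-memory SQLite from Python as a stand-in for MySQL, a trimmed-down results_daily table, and invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the columns the queries need; the real table has more.
conn.execute(
    "CREATE TABLE results_daily (user_id INTEGER, day INTEGER, result INTEGER)"
)
conn.executemany(
    "INSERT INTO results_daily VALUES (?, ?, ?)",
    [
        (1, 1, 7), (1, 2, 9),   # regular player
        (2, 1, 3), (2, 2, 4),   # never scores above 5: excluded
        (3, 2, 10),             # only one play: excluded
        (4, 1, 6), (4, 2, 5),   # regular player (best result is 6)
    ],
)

# Step 1: the set of user ids to exclude.
excluded = [
    row[0]
    for row in conn.execute(
        "SELECT user_id FROM results_daily "
        "GROUP BY user_id "
        "HAVING COUNT(*) = 1 OR MAX(result) <= 5 "
        "ORDER BY user_id"
    )
]

# Step 2: per-day statistics over everyone else.
daily = conn.execute(
    """
    SELECT day,
           AVG(result) AS average_score,
           COUNT(*) AS number_of_plays,
           COUNT(DISTINCT user_id) AS number_of_players
    FROM results_daily
    WHERE user_id NOT IN (
        SELECT user_id
        FROM results_daily
        GROUP BY user_id
        HAVING COUNT(*) = 1 OR MAX(result) <= 5
    )
    GROUP BY day
    ORDER BY day
    """
).fetchall()
print(excluded, daily)
```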

MySQL Fetch highest value by key and then add rank to it

I'm trying to create a MySQL query that looks through my scoreboard for a given player ID, finds that player's highest score, and then adds a rank to that score.
I've come quite close to what I'm trying to achieve with this code:
SELECT PlayerUUID, `iGamescore` as score, FIND_IN_SET( iGamescore, (
SELECT GROUP_CONCAT( iGamescore
ORDER BY iGamescore ASC )
FROM crystm_gameScoreboard )
) AS rank
FROM crystm_gameScoreboard
WHERE PlayerUUID = '4c8984f3-651a-48bc-ad1a-879624380fab'
LIMIT 1
But I do know that this player has played multiple times and therefore appears multiple times in the scoreboard.
So the issue here is that yes, it does find the player and gives the rank correctly. However, since the player exists multiple times, it picks up the very first score instead of the highest score. How would I fix my query to correct for this? Or, instead of creating a new row every time they set a new highscore, would you just update their previous record in the scoreboard?
Thanks in advance
To get the highest score you need a GROUP BY:
SELECT
PlayerUUID,
MAX(`iGamescore`) as score,
RANK() OVER (ORDER BY MAX(`iGamescore`) DESC) as Rang
FROM crystm_gameScoreboard
GROUP BY PlayerUUID
ORDER BY 3 ASC
ORDER BY 3 ASC sorts the list by the third column, which is the rank. Note that RANK() is a window function, so this needs MySQL 8.0 or later.
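A minimal sketch of the GROUP BY plus RANK() idea, run against in-memory SQLite from Python as a stand-in for MySQL (window functions need SQLite 3.25 or later). The sample rows and short player ids are invented, and the per-player maximum is computed in a derived table first, which is a slight restructuring of the query above rather than a copy of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crystm_gameScoreboard (PlayerUUID TEXT, iGamescore INTEGER)"
)
conn.executemany(
    "INSERT INTO crystm_gameScoreboard VALUES (?, ?)",
    [
        ("p1", 10), ("p1", 50),   # p1 played twice, best score 50
        ("p2", 40),
        ("p3", 50), ("p3", 5),    # p3 ties with p1 on best score
        ("p4", 20),
    ],
)

# Inner query: one row per player with the best score.
# Outer query: rank those best scores, highest first.
ranked = conn.execute(
    """
    SELECT PlayerUUID, score,
           RANK() OVER (ORDER BY score DESC) AS Rang
    FROM (
        SELECT PlayerUUID, MAX(iGamescore) AS score
        FROM crystm_gameScoreboard
        GROUP BY PlayerUUID
    ) AS best
    ORDER BY Rang ASC, PlayerUUID ASC
    """
).fetchall()

# Look up a single player's rank from the ranked list.
rank_by_player = {player: rang for player, _score, rang in ranked}
print(ranked)
```

RANK() leaves a gap after ties, so the two players on 50 share rank 1 and the next player gets rank 3.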
This post solved it:
MySQL - Rank user amongst list of top high-scores
It had everything I was looking for. I don't know why I couldn't find this post earlier; when searching, keywords matter, haha.
Thanks for the input, though.

Get row rank without fetching every row

I'm using this query to fetch a highscore of a certain user.
SELECT score FROM highscores WHERE user_id=?
I would like to know what position this particular highscore holds compared to the other scores. So basically its row number after ordering the highscores table by score, descending.
I could of course fetch all rows and look at the array key but I was wondering if there's a more performance friendly method directly via the MySQL query. The highscore table contains around 500k rows, and it seems a bit overkill (and it will take some time) to have to fetch them all first just to get the rank number of the user's score.
Some time ago I helped a friend with a similar task, and wrote an article,
How to get a single player's rank based on the score.
So the idea is to run a count(*) query with a WHERE clause to include all players having a higher score:
SELECT count(*)+1 FROM players WHERE score > ?
or, based on the user id,
SELECT count(*)+1 FROM players WHERE score > (SELECT score FROM players WHERE id = ?)
This solution is much simpler than anything I was able to find on Stack Overflow.
It can also be refined to take other fields into account when two users have the same score, such as the time the score was last updated:
SELECT count(*)+1 FROM players WHERE score > ? OR (score = ? AND score_updated < ?)
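Here is a small sketch of both variants, the plain count and the tie-breaking one, using in-memory SQLite from Python as a stand-in for MySQL. The sample rows and the helper names simple_rank and tiebreak_rank are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE players (id INTEGER PRIMARY KEY, score INTEGER, score_updated TEXT)"
)
conn.executemany(
    "INSERT INTO players VALUES (?, ?, ?)",
    [
        (1, 100, "2024-01-01"),
        (2, 90, "2024-01-02"),   # same score as player 3, reached later
        (3, 90, "2024-01-01"),
        (4, 80, "2024-01-03"),
    ],
)


def simple_rank(player_id):
    """Count players with a strictly higher score; ties share a rank."""
    return conn.execute(
        "SELECT COUNT(*) + 1 FROM players "
        "WHERE score > (SELECT score FROM players WHERE id = ?)",
        (player_id,),
    ).fetchone()[0]


def tiebreak_rank(player_id):
    """As above, but an equal score reached earlier also counts as ahead."""
    score, updated = conn.execute(
        "SELECT score, score_updated FROM players WHERE id = ?", (player_id,)
    ).fetchone()
    return conn.execute(
        "SELECT COUNT(*) + 1 FROM players "
        "WHERE score > ? OR (score = ? AND score_updated < ?)",
        (score, score, updated),
    ).fetchone()[0]


print(simple_rank(2), tiebreak_rank(2))
```

Only one COUNT(*) row comes back per lookup, so nothing close to the full table is transferred to the application.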

If statement in SQL

I have a game, and a MySQL database table where I store the results of users. Every result is stored, but on highscore page I show only the best result from a certain user, like this:
SELECT user_id,MAX(score) as score
FROM table
GROUP BY user_id
ORDER BY score DESC LIMIT $startLimit,$numPerPage
But now I also want to take into account the time it took the player to reach a certain score, if the scores are level.
For example, if the player has two identical scores, I want to grab the one that took him less time (of course there is a column "time" in this table).
Try this:
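SELECT t.user_id, t.score, MIN(t.time) AS MinTime
FROM table t
JOIN (SELECT user_id, MAX(score) AS score
FROM table
GROUP BY user_id) m
ON m.user_id = t.user_id AND m.score = t.score
GROUP BY t.user_id, t.score
ORDER BY t.score DESC, MinTime ASC
LIMIT $startLimit,$numPerPage
The inner query finds each user's best score; joining back to the table keeps only the rows with that score, and MIN(time) picks the fastest of them. Users who are level on score are then ordered by the shorter time.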
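To check the intended behaviour (one row per user, showing the best score and the shortest time among the rows with that score), here is a minimal sketch using in-memory SQLite from Python as a stand-in for MySQL. The table name results and the sample rows are assumptions; the question only calls the table "table".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "results" is a hypothetical name standing in for the question's table.
conn.execute("CREATE TABLE results (user_id INTEGER, score INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        (1, 100, 30), (1, 100, 25),  # user 1 reached 100 twice; 25 is faster
        (1, 80, 10),                 # lower score, must not appear
        (2, 100, 20), (2, 90, 5),    # user 2 reached 100 faster than user 1
        (3, 70, 40),
    ],
)

# Derived table m: best score per user. Joining back keeps only the rows
# that hold that best score, and MIN(time) picks the fastest of them.
highscores = conn.execute(
    """
    SELECT t.user_id, t.score, MIN(t.time) AS MinTime
    FROM results AS t
    JOIN (
        SELECT user_id, MAX(score) AS score
        FROM results
        GROUP BY user_id
    ) AS m ON m.user_id = t.user_id AND m.score = t.score
    GROUP BY t.user_id, t.score
    ORDER BY t.score DESC, MinTime ASC
    """
).fetchall()
print(highscores)
```

In this sample the two users on 100 points are ordered by time, so user 2 (20) is listed ahead of user 1 (25).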

MySQL index being ignored for subquery range

Alright, here's a simple enough question about indices and subqueries. I'm using MariaDB 5.5.36 with MyISAM; my table structure for leaderboard is below. It contains about 50 million rows across ~2000 levels.
int userid,
int levelid,
int score,
index (userid, levelid),
index (levelid, score)
This query, meant to return the rank of each score in a level for a given user, runs very slowly...
SELECT levelid, (
SELECT COUNT(*) + 1
FROM leaderboard
WHERE score > l.score AND levelid = l.levelid
) AS rank
FROM leaderboard AS l
WHERE userid = 12345;
I've tried a self-join with GROUP BY as well, which runs in about half the time of the query above but is still unacceptably slow:
SELECT x.levelid, COUNT(y.score) AS rank
FROM leaderboard AS x
LEFT JOIN leaderboard AS y ON x.levelid = y.levelid AND y.score > x.score
WHERE x.userid = {0}
GROUP BY x.levelid;
... while this alternative runs about 100x faster (pseudocode: loop over the results in an application outside the DB, or in a stored procedure or something, and run the subquery separately 2000 times with constants):
results = execute("SELECT levelid, score
FROM leaderboard
WHERE userid = 12345");
for each row in results:
execute("SELECT COUNT(*) + 1
FROM leaderboard
WHERE score > %d AND levelid = %d
".printf(row.score, row.levelid));
EXPLAIN tells me that the subquery in the slow example has a key_len of 4 bytes (just levelid) while the fast version uses 8 (levelid, score). Interesting side note, if "score > l.score" is replaced with "score = l.score" it switches to using all 8, but obviously that doesn't give me the answer I'm looking for.
Is there something I'm not understanding about how the index fundamentally works? Is there a better way to write this ranking query? Would it be more efficient to add a rank column to my table and update it every time a highscore is achieved (that could mean updating up to 400k rows for one single score achievement)?
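The two-step approach from the pseudocode can be sketched as runnable Python, here against in-memory SQLite rather than MariaDB and with invented sample rows. The helper names ranks_two_step and ranks_correlated and the index names are hypothetical. SQLite's query planner differs from MariaDB's, so this only shows that the two formulations return the same ranks and says nothing about their relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE leaderboard (userid INTEGER, levelid INTEGER, score INTEGER)"
)
conn.execute("CREATE INDEX idx_user_level ON leaderboard (userid, levelid)")
conn.execute("CREATE INDEX idx_level_score ON leaderboard (levelid, score)")
conn.executemany(
    "INSERT INTO leaderboard VALUES (?, ?, ?)",
    [
        (1, 1, 50), (2, 1, 70), (3, 1, 60),
        (1, 2, 90), (2, 2, 80), (3, 2, 95),
        (2, 3, 10),
    ],
)


def ranks_two_step(userid):
    """Fetch the user's scores first, then run one constant-valued count per level."""
    ranks = {}
    rows = conn.execute(
        "SELECT levelid, score FROM leaderboard WHERE userid = ?", (userid,)
    ).fetchall()
    for levelid, score in rows:
        (rank,) = conn.execute(
            "SELECT COUNT(*) + 1 FROM leaderboard "
            "WHERE score > ? AND levelid = ?",
            (score, levelid),
        ).fetchone()
        ranks[levelid] = rank
    return ranks


def ranks_correlated(userid):
    """Single statement with a correlated subquery, as in the slow query."""
    rows = conn.execute(
        "SELECT levelid, (SELECT COUNT(*) + 1 FROM leaderboard "
        "WHERE score > l.score AND levelid = l.levelid) AS level_rank "
        "FROM leaderboard AS l WHERE userid = ?",
        (userid,),
    ).fetchall()
    return dict(rows)


print(ranks_two_step(1), ranks_correlated(1))
```

Bound parameters take the place of the printf-style formatting in the pseudocode.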