I have a users table and a scores table:
-------- users
Name      | Type
----------|-------------
uuid      | VARCHAR(36)
name      | VARCHAR(255)
...       | ...
-------- scores
Name      | Type
----------|-------------
uuid      | VARCHAR(36)
user_uuid | VARCHAR(36)
score     | INT(11)
I can fetch a user, including their total score, using a subquery like this:
SELECT users.uuid, users.name,
  (SELECT SUM(score) FROM scores WHERE user_uuid = users.uuid) AS score
FROM users WHERE users.uuid = [USER_UUID];
But now, how can I fetch the user's rank? That is, rank being determined by their score vs the scores of every other user.
Is it really necessary to loop through every single user, calculate all of their scores, and then order their scores to determine the rank of one single user? Performing this query on the fly seems taxing, especially if I have a large number of users. Should I instead build a separate rankings table, and re-sort the table after each INSERT into the scores table? That doesn't seem ideal either.
I have another application which will require on-the-fly ranking as well, but the calculations are much heavier: a user's score is determined by a complex algorithm spanning at least 5 different tables. So I really need some insight on database ranking in general, although the above problem is an easy way to represent it. What are some good practices to consider?
I think keeping the rank of each user in a separate table (maintained by a procedure that runs when data is inserted into the scores table) would be better, so that you can get the rank straight away when you need it.
See the accepted answer to the linked question "rank function in sql". It might help you.
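A minimal sketch of that idea, using Python's built-in sqlite3 module as a stand-in for MySQL. The user_totals table, the trigger name, and the rank_of helper are hypothetical names introduced only for this illustration; a trigger plays the role of the procedure mentioned above. Instead of storing the rank itself (which would need re-sorting on every insert), it stores each user's running total, so a single user's rank becomes one indexed count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (uuid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE scores (uuid TEXT PRIMARY KEY, user_uuid TEXT, score INTEGER);

-- Hypothetical summary table: one running total per user.
CREATE TABLE user_totals (user_uuid TEXT PRIMARY KEY, total INTEGER NOT NULL);
CREATE INDEX idx_user_totals_total ON user_totals (total);

-- Keep the running total current whenever a score row is inserted.
CREATE TRIGGER scores_after_insert AFTER INSERT ON scores
BEGIN
    INSERT OR IGNORE INTO user_totals (user_uuid, total)
        VALUES (NEW.user_uuid, 0);
    UPDATE user_totals SET total = total + NEW.score
        WHERE user_uuid = NEW.user_uuid;
END;
""")

conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("u1", "alice"), ("u2", "bob"), ("u3", "carol")])
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)",
                 [("s1", "u1", 10), ("s2", "u2", 30), ("s3", "u1", 15),
                  ("s4", "u3", 25), ("s5", "u2", 5)])


def rank_of(user_uuid):
    """Rank = 1 + number of users with a strictly higher total (ties share a rank)."""
    row = conn.execute("""
        SELECT 1 + COUNT(*)
        FROM user_totals
        WHERE total > (SELECT total FROM user_totals WHERE user_uuid = ?)
    """, (user_uuid,)).fetchone()
    return row[0]


for uuid in ("u1", "u2", "u3"):
    print(uuid, rank_of(uuid))
```

Note that in this sketch a user with no score rows has no total to compare against, so the count is zero and rank_of reports 1; a real application would want to handle that case explicitly.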
Suppose I have a table storing say student information
id | first_name | last_name | dob | score
where score is some non-unique numeric assessment of their performance. (the schema isn't really relevant, I'm trying to go as generic as possible)
I'd like to know, for any given student, what their score-based overall ranking is. ROW_NUMBER() or any equivalent counter method doesn't really work since they're only accounting for returned entries, so if you're only interested in one particular student, it's always going to be 1.
Counting the number of students with scores greater than the current one won't work either, since you can have multiple students with the same score. Would sorting additionally by a secondary field, such as dob, work, or would it be too slow?
You should be able to JOIN into a subquery which will provide the ranks of each student across the entire population:
SELECT student.*, ranking.rank
FROM student
JOIN (
  SELECT id, RANK() OVER (ORDER BY score DESC) AS rank
  FROM student
) ranking ON student.id = ranking.id
I suppose the scale of your data will be a key determinant of whether or not this is a realistic solution for your use case.
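To see how the join behaves when you only want one student, here is a small sketch run against an in-memory SQLite database (assuming SQLite 3.25 or later, which is when window functions were added). The sample rows, the score_rank alias, and the rank_of helper are made up for illustration. Because the ranking is computed over the whole table inside the subquery, filtering in the outer query does not reset it to 1, and students with equal scores share a rank:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, first_name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ann", 90), (2, "Ben", 75), (3, "Cy", 90), (4, "Di", 60)],
)


def rank_of(student_id):
    """Return the overall score rank of one student, or None if the id is unknown."""
    row = conn.execute("""
        SELECT ranking.score_rank
        FROM student
        JOIN (
            -- Ranks are assigned across the entire population here,
            -- before the outer WHERE narrows the result to one row.
            SELECT id, RANK() OVER (ORDER BY score DESC) AS score_rank
            FROM student
        ) AS ranking ON student.id = ranking.id
        WHERE student.id = ?
    """, (student_id,)).fetchone()
    return row[0] if row else None


for sid in (1, 2, 3, 4):
    print(sid, rank_of(sid))
```

Ann and Cy tie for rank 1, so Ben is ranked 3 rather than 2; DENSE_RANK() would be the variant to use if gaps after ties are not wanted.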
Let's say I have a users table:
| id | username | email | address |
And posts table:
| id | post | user_id | date |
When I want to show posts, each time I need to go to the users table to retrieve the username from user_id. I want to avoid using a JOIN for this simple data retrieval, so what I do is add another column to the posts table:
| id | post | user_id | username | date |
This way I will not have to use a JOIN to retrieve the username when showing posts.
Do you think that this is better?
No. Your alternative structure is vulnerable to inconsistencies (e.g. if a user changes their name; read about third normal form here: http://en.wikipedia.org/wiki/Third_normal_form#.22Nothing_but_the_key.22)
Why don't you want to use JOINs? Have you set up appropriate indexes?
I think it depends on the design and your future plans, but I would suggest you not do that.
From today's perspective, avoiding the JOIN may look like it gives better performance, but if your application grows, this denormalized table structure will work against you.
For instance, if one of the posters changes their username, how would you handle that? By updating the whole posts table? If your data could exceed 10 million rows, that will be tough, because the update can lock the table while it runs.
Thus I would not recommend this.
The cost of the JOIN is minor by comparison if your application needs that kind of update frequently.
If the [id] column of the [users] table is the primary key, I think a JOIN is good enough.
Alternatively, if you select a limited number of posts, such as 10, you can also try this SQL:
select id, post, user_id,
(select username from users where id = user_id) as username, date
from posts
limit 0, 10
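The consistency argument made in the answers above can be checked with a small sketch, again using in-memory SQLite as a stand-in for MySQL; the sample rows, the index name, and the latest_posts helper are hypothetical. With the username stored only in users, a rename is a single-row update and every post picks it up through the JOIN on the primary key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT, address TEXT);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    post TEXT,
    user_id INTEGER REFERENCES users(id),
    date TEXT
);
-- Index on the foreign key, useful when listing posts by user.
CREATE INDEX idx_posts_user_id ON posts (user_id);

INSERT INTO users VALUES (1, 'alice', 'a@example.com', ''),
                         (2, 'bob',   'b@example.com', '');
INSERT INTO posts VALUES (1, 'first',  1, '2024-01-01'),
                         (2, 'second', 2, '2024-01-02'),
                         (3, 'third',  1, '2024-01-03');
""")


def latest_posts(limit=10):
    """Fetch posts together with the author's current username."""
    return conn.execute("""
        SELECT posts.id, posts.post, users.username
        FROM posts
        JOIN users ON users.id = posts.user_id
        ORDER BY posts.id
        LIMIT ?
    """, (limit,)).fetchall()


before = latest_posts()
# One row changes in users; no rows in posts need to be touched.
conn.execute("UPDATE users SET username = 'alice2' WHERE id = 1")
after = latest_posts()

print(before)
print(after)
```

With a copied username column in posts, the same rename would instead require updating every post the user ever wrote.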
I'm making a simple score-keeping system, and each user can have stats for many different games.
But I don't know whether each user should have their own score table, or whether there should be one big table containing all the scores with a user_id column.
Multiple Tables:
table name: [user id]_score
game_id | high_score | last_score | average_score
Single Table:
user_id | game_id | high_score | last_score | average_score
Definitely have a single table with the userID as one of the fields. It would be very difficult (and needlessly so) to deal with multiple tables like that.
You will likely want at least one index to include the userId field, so that the records for each userId can be quickly found by queries.
Go with a single table and a composite primary key around user_id and game_id. You may also want to create a supplementary index around game_id so that lookups of high scores by game are fast. (The composite key will not suffice if user_id is not part of the criteria as well.)
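A sketch of that layout, using in-memory SQLite in place of MySQL; the sample data, the index name, and the two helper functions are made up for illustration. The composite primary key serves per-user lookups and guarantees one row per user and game, while the extra index serves per-game lookups that do not filter on user_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (
    user_id       INTEGER NOT NULL,
    game_id       INTEGER NOT NULL,
    high_score    INTEGER NOT NULL,
    last_score    INTEGER NOT NULL,
    average_score REAL    NOT NULL,
    PRIMARY KEY (user_id, game_id)
);
-- Supplementary index for queries that filter by game only.
CREATE INDEX idx_scores_game ON scores (game_id);
""")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?, ?)",
    [(1, 10, 500, 450, 400.0), (1, 20, 80, 80, 75.5),
     (2, 10, 700, 300, 510.0), (3, 10, 650, 650, 600.0)],
)


def scores_for_user(user_id):
    """All games for one user; served by the composite primary key."""
    return conn.execute(
        "SELECT game_id, high_score FROM scores WHERE user_id = ? ORDER BY game_id",
        (user_id,),
    ).fetchall()


def top_scores(game_id, limit=3):
    """Best high scores for one game; served by the game_id index."""
    return conn.execute(
        "SELECT user_id, high_score FROM scores WHERE game_id = ? "
        "ORDER BY high_score DESC, user_id LIMIT ?",
        (game_id, limit),
    ).fetchall()


print(scores_for_user(1))
print(top_scores(10))
```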
Using a one-table-per-user approach would be cause for some LARTing.
Keep one table with all of the user_ids in it, definitely. Generally, you want to stay away from any solution that involves having generated data (userIDs) in table names (there are exceptions, but they're rare).
You'll probably want:
a user table: user_info[], user_id
a game table: game_info[], game_id
a user_game_join table that holds user_id, game_id, score, insertdate
That way you have all the data to get the high score by game, by user, by game and user, averages, or whatever else.
Always build extensibly. You never know when you will need to refine your code, and redoing a database is a real PITA if you don't lay it out right to start with.
Hello all and thanks in advance
I have the tables accounts, votes and contests
A vote consists of an author ID, a winner ID, and a contest ID, so as to stop people voting twice.
I'd like to show, for any given account, how many times they've won a contest, how many times they've come second, and how many times they've come third.
What's the fastest (in execution time) way to do this? (I'm using MySQL.)
After using MySQL for a long time I'm coming to the conclusion that virtually any use of GROUP BY is really bad for performance, so here's a solution with a couple of temporary tables.
CREATE TEMPORARY TABLE VoteCounts (
accountid INT,
contestid INT,
votecount INT DEFAULT 0
);
INSERT INTO VoteCounts (accountid, contestid)
SELECT DISTINCT v2.accountid, v2.contestid
FROM votes v1 JOIN votes v2 USING (contestid)
WHERE v1.accountid = ?; -- the given account
Make sure you have an index on votes(accountid, contestid).
Now you have a table of every contest that your given user was in, with all the other accounts who were in the same contests.
UPDATE votes AS v JOIN VoteCounts AS vc USING (accountid, contestid)
SET vc.votecount = vc.votecount+1;
Now you have the count of votes for each account in each contest.
CREATE TEMPORARY TABLE Placings (
accountid INT,
contestid INT,
placing INT
);
SET @prevcontest := 0;
SET @placing := 0;
INSERT INTO Placings (accountid, placing, contestid)
SELECT accountid,
  IF(contestid=@prevcontest, @placing:=@placing+1, @placing:=1) AS placing,
  @prevcontest:=contestid AS contestid
FROM VoteCounts
ORDER BY contestid, votecount DESC;
Now you have a table with each account paired with their respective placing in each contest. It's easy to get the count for a given placing:
SELECT accountid, COUNT(*) AS count_first_place
FROM Placings
WHERE accountid = ? AND placing = 1;
And you can use a MySQL trick to do all three in one query. A boolean expression always returns an integer value 0 or 1 in MySQL, so you can use SUM() to count up the 1's.
SELECT accountid,
SUM(placing=1) AS count_first_place,
SUM(placing=2) AS count_second_place,
SUM(placing=3) AS count_third_place
FROM Placings
WHERE accountid = ?; -- the given account
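The SUM-of-booleans trick can be tried out in isolation. The sketch below uses in-memory SQLite rather than MySQL (an assumption made for convenience; SQLite also evaluates a comparison to 0 or 1), with a hand-filled Placings table standing in for the one built above. The sample rows and the placing_counts helper are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Placings (accountid INT, contestid INT, placing INT)")
conn.executemany(
    "INSERT INTO Placings VALUES (?, ?, ?)",
    [(1, 101, 1), (1, 102, 2), (1, 103, 1), (1, 104, 4),
     (2, 101, 2), (2, 102, 1)],
)


def placing_counts(accountid):
    """Return (firsts, seconds, thirds) for one account in a single pass."""
    return conn.execute("""
        SELECT SUM(placing = 1) AS count_first_place,
               SUM(placing = 2) AS count_second_place,
               SUM(placing = 3) AS count_third_place
        FROM Placings
        WHERE accountid = ?
    """, (accountid,)).fetchone()


print(placing_counts(1))
print(placing_counts(2))
```

One caveat of SUM(): for an account with no rows at all it yields NULL rather than 0, so wrap each sum in COALESCE(..., 0) if a zero is needed.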
Re your comment:
Yes, it's a complex task no matter what to go from the normalized data you have to the results you want. You want it aggregated (summed), ranked, and aggregated (counted) again. That's a heap of work! :-)
Also, a single query is not always the fastest way to do a given task. It's a common misconception among programmers that shorter code is implicitly faster code.
Note I have not tested this so your mileage may vary.
Re your question about the UPDATE:
It's a tricky way of getting the COUNT() of votes per account without using GROUP BY. I've added table aliases v and vc so it may be more clear now. In the votes table, there are N rows for a given account/contest. In the VoteCounts table, there's one row per account/contest. When I join, the UPDATE is evaluated against the N rows, so if I add 1 for each of those N rows, I get the count of N stored in VoteCounts in the row corresponding to each respective account/contest.
If I'm interpreting things correctly, to stop people voting twice I think you only need a unique index on the votes table by author (account?) ID and contest ID. It won't prevent people from having multiple accounts and voting twice, but it will prevent anyone from casting a vote in a contest twice from the same account. To prevent fraud (sock puppet accounts) you'd need to examine voting patterns and detect when an account votes for another account more often than is statistically likely. Unless you have a lot of contests, that might actually be hard.
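A minimal sketch of the unique-index idea, with in-memory SQLite standing in for MySQL. The column names author_id, winner_id and contest_id are guesses based on the question's description, and cast_vote is a hypothetical helper. The database itself rejects the second vote, so the application only has to catch the error:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE votes (
    author_id  INTEGER NOT NULL,
    winner_id  INTEGER NOT NULL,
    contest_id INTEGER NOT NULL
);
-- At most one vote per author per contest.
CREATE UNIQUE INDEX uq_votes_author_contest ON votes (author_id, contest_id);
""")


def cast_vote(author_id, winner_id, contest_id):
    """Insert a vote; return False if this author already voted in this contest."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO votes (author_id, winner_id, contest_id) VALUES (?, ?, ?)",
                (author_id, winner_id, contest_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    cast_vote(1, 7, 100),  # first vote by author 1 in contest 100
    cast_vote(1, 8, 100),  # same author, same contest: rejected
    cast_vote(1, 7, 101),  # same author, different contest
    cast_vote(2, 7, 100),  # different author, same contest
]
print(results)
```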
I'm planning to build a database project.
One of the tables has a lot of attributes.
My question is: which is better, to divide the class into 2 separate tables or to put all of the attributes into one table? Below is an example:
create table User ( id, name, surname,... show_name, show_photos, ...)
or
create table User ( id, name, surname,... )
create table UserPrivacy (usr_id, show_name, show_photos, ...)
I suppose the performance is similar, since I can use an index.
It's best to put all the attributes in the same table.
If you start storing attribute names in a table, you're storing meta data in your database, which breaks first normal form.
Besides, keeping them all in the same table simplifies your queries.
Would you rather have:
SELECT show_photos FROM User WHERE user_id = 1
Or
SELECT up.show_photos FROM User u
LEFT JOIN UserPrivacy up USING(user_id)
WHERE u.user_id = 1
Joins are okay, but keep them for associating separate entities and 1->N relationships.
There is a limit to the number of columns, and only if you think you might hit that limit would you do anything else.
There are legitimate reasons for storing name-value pairs in a separate table, but fear of adding columns isn't one of them. For example, creating a name-value table might, in some circumstances, make it easier for you to query a list of attributes. However, most database engines, and data-access layers such as PDO in PHP, include reflection methods whereby you can easily get a list of columns for a table (attributes for an entity).
Also, please note that your id field on User should be user_id, not just id, unless you're using Ruby on Rails, whose conventions call for just id. 'user_id' is preferred because with just id, your joins look like this:
ON u.id = up.user_id
Which seems odd, and the preferred way is this:
ON u.user_id = up.user_id
or more simply:
USING(user_id)
Don't be afraid to 'add yet another attribute'. It's normal, and it's okay.
I'd say the 2 separate tables, especially if you are using an ORM. In most cases it's best to have each table correspond to a particular object and have its fields or "attributes" be the things that are required to describe that object.
You don't need 'show_photos' to describe a User but you do need it to describe UserPrivacy.
You should consider splitting the table if all of the privacy attributes are nullable and will most probably have values of NULL.
This will help you to keep the main table smaller.
If the privacy attributes will mostly be filled, there is no point in splitting the table, as it will require extra JOINs to fetch the data.
Since this appears to be a one to one relationship, I would normally keep it all in one table unless:
You would be near the limit of the number of bytes that can be stored in a row - then you should split it out.
Or if you will normally be querying the main table separately and won't need those fields much of the time.
If some columns of the table either do not depend on the primary key, or hold values that are drawn repeatedly from a fixed set, make a different table for those columns and maintain a one-to-one relationship.
Why not have a User table and Features table, e.g.:
create table User ( id int primary key, name varchar(255) ... )
create table Features (
user_id int,
feature varchar(50),
enabled bit,
primary key (user_id, feature)
)
Then the data in your Features table would look like:
| user_id | feature     | enabled |
|---------|-------------|---------|
| 291     | show_photos | 1       |
| 291     | show_name   | 1       |
| 292     | show_photos | 0       |
| 293     | show_name   | 0       |
I would suggest something different. It seems likely that in the future you will be asked for 'yet another attribute' to manage. Rather than add a column, you could just add a row to an attributes table:
TABLE Attribute
(
ID
Name
)
TABLE User
(
ID
...
)
TABLE UserAttributes
(
UserID FK User.ID
Attribute FK Attribute.ID
Value...
)
Good comments from everyone. I should have been clearer in my response.
We do this quite a bit to handle special cases where customers ask us to tailor our site for them in some way. We never 'pivot' the NVPs into columns in a query; we're always asking "should I do this here?" by looking for a specific attribute listed for a customer. If it is there, that's a 'true'. So rather than having a ton of boolean columns, most of which would be false or NULL for most customers, and given the tendency for these features to grow in number, this works well for us.
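That "should I do this here?" lookup might look roughly like the sketch below, written against in-memory SQLite. The snake_case table and column names, the sample rows, and the has_attribute helper are assumptions for illustration and not the schema actually used by the poster. The presence of a row is the 'true'; no pivoting into columns is involved:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE user      (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_attributes (
    user_id      INTEGER NOT NULL REFERENCES user(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id),
    value        TEXT,
    PRIMARY KEY (user_id, attribute_id)
);

INSERT INTO attribute VALUES (1, 'show_photos'), (2, 'show_name');
INSERT INTO user VALUES (291, 'first customer'), (292, 'second customer');
INSERT INTO user_attributes VALUES (291, 1, NULL), (291, 2, NULL), (292, 2, NULL);
""")


def has_attribute(user_id, attribute_name):
    """True if a row for this attribute exists for the user, False otherwise."""
    row = conn.execute("""
        SELECT EXISTS (
            SELECT 1
            FROM user_attributes AS ua
            JOIN attribute AS a ON a.id = ua.attribute_id
            WHERE ua.user_id = ? AND a.name = ?
        )
    """, (user_id, attribute_name)).fetchone()
    return bool(row[0])


print(has_attribute(291, "show_photos"))
print(has_attribute(292, "show_photos"))
```

Adding a new feature flag is then an INSERT into attribute rather than an ALTER TABLE, which is the trade-off this answer is arguing for.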