Proper way to create ranking table? - mysql

I am creating tennis site which of course needs to display results. Currently to get ranking I need to sum scores for each player and then order them, even if I only need single player results. I think that it will be really slow when player count in system will rise. That's why I need some way to cache results and update them only when something changed(or add some sort of timeout when results has to updated).
Also other requirement would be being able to calculate total scores i.e. I will have several competitions and I will need to show scores for all competitions, and for each competition separately.
What I currently thought of would be single table to store everything. It's schema would be:
ranking_tbl shema
rank_type(could be competition, team, player or something else)
rank_owner(who owns those ranks, can be team player ranks - owner would be team)
rank_item(who is ranked, in team example would be player )
score(actual score to rank by)
rank(precached rank, updated only when new scores added)
Ranking will be important part of my system and used heavily so I need it to be as efficient as possible.
Question: Is there better way to achieve ranking than using my table shema?

I think your schema will work. I can see 3 possible solutions that you could use to get your desired functionality.
Cron - Using a Cron Job (Scheduled Task) to update the rankings on a nightly basis would mean that you can do the bulk processing at an off peak time (2am for example). You would schedule a script that re-orders the players by score, assigns them a rank and saves this to the database.
Single Save Recalculation - If you are inserting scores one player at a time you could possibly look at recalculating the ranks after you save any score. This would provide excellent up to date ranks, but may have some trade off in performance when adding a lot of scores.
Multi Save Recalculation - Compile your scores into a CSV file which contains the player id, and score. You can then write a script to parse your CSV, update the scores of all players. Once the scores are saved you can recalculate the ranks for all players.
I would personally prefer the 3rd option but it may have a little more overhead in time consuption to initially set up.
Hope this helps.

Related

Database + Class design guidance - Is this more complicated than it needs to be?

Novice relational database design question here. This is a "I feel like I'm doing wrong. What is it?" question. What it boils down to is, how can I avoid unnecessary complexity when designing a relational database. I also realise that this is as much a question about class structure design, seeing as I'm using an ORM and I'm really thinking of these tables as objects.
Background
Lets say I want to record the results of a number of competitive games between an unspecified number of "players". These players all belong to a "leaderboard", so, the leaderboard has multiple players and records multiple results. The "score" for each of the players is recorded at the end of each game and belongs to a single "result" instance (see image). A score is also parented by the player to which it belongs.
edit: An example
Each row in the leaderboard table represents a collection of players who together form a league. For example, all of
the players who belong to a tennis league will have the same
leaderboard_id in the player table.
A row in the results table
represents a match that has taken place between players that belong to
a particular league. So the leaderboard_id associated with our
players is recorded in each result in this league. The results table
doesn't hold the score of each player, rather, I've attempted to
normalise (appologies for potentially inappropriate use of that term)
these into a score table.
Bringing this all together. We have a
league in the Leaderboard table, in which a game has taken places
between two players. These players belong to the league in
question. The two players have just played a match and their scores
are recorded as rows in the score table. These rows are collected
together under a single results_id, refereing to a row in the
results table.
Question
Q1. Does this make sense? Is there anything glaringly obvious that is wrong with this design?
As is, I can easily query the scores a particular player has accumulated over time, look up the players that played in a particular result, etc. However, there are some actions that really feel like they should be simple, but, to me, feel overly complicated.
For instance, if I want to return the most recent results for a particular player (ie not the player's scores, rather the results that contain a score that belongs to our player).
So, hand-wavey Q2. Maybe this is just lack of experience with SQL, but, I shouldn't have to do a JOIN to look up this, should I? But then, what's the alternative? Should I create a one-to-many composition between player and results, so that I can simply look up a player results?
With the current design to find the most recent result for a player I would need to do something like this (python sqlalchemy)
Score.query.join(Result, Player)\
.filter(Player.id ==player_id)\
.order_by(Result.timestamp.desc()).first()
Is this bad?

Working out users points - update vs select

I have users who earn points by taking parts in various activities on the website and then the user can spend these points on whatever they like, the way I have it set up the at the minute is I have a table -
tbl_users_achievements and tbl_users_purchased_items
I have these two tables to track what the users have done and what they have bought (Obviously!)
But instead of having a column in my user tables called 'user_points', I have decided to display their points by doing a SELECT on all achievements and getting a sum of the points they have earnt, I am then doing another select on how many points they have spent.
I thought it might of been better to have a column to store their points and when they buy something and win stuff I do an UPDATE on the column for that user, but that seemed like multiple areas I have to manage, I have to insert a new row for the transaction and then update their column where if I use a query to work out their total won - spent I only have to insert the row and do no update. But the problem is then comes to performance of running and doing a calculation with the query.
So which solution would you go with and why?
Have a column to store their points and do an update
Use a query to work out the users points they can spend and have no column
Your current model is logically the right one - a key aspect for RDBMS normalization is not to repeat any information, and keeping an explicit "this customer has x points" column repeats data.
The benefits of this are obvious - you have less data manipulation code to write, and don't have to worry about what happens when you insert the transaction but can't update the users table.
The downsides are that you're running additional queries every time you show the customer profile; this can create a performance problem. The traditional response to that performance problem is to de-normalize, for instance by keeping a calculated total against the user table.
Only do that if that's absolutely, provably necessary.
myself, I would put the user points into a separate table PK'd by user ID or whatever and store them there and do updates to increment or decrement as achievements are attained or points spent.

MYSQL Relational Schema Planning

I'm trying to figure out the best way to manage this data storage problem....
I have a table of players, teams, and competitions.
A team may be involved in let's say 3 competitions.
A player belongs to a team, but may only be eligible to play in 2 of the 3 competitions that his or her team plays in. Likewise another player of the same team may be eligible for all 3.
I don't want to add a column to the player table for each competition as I'm then moving away from the relational model. Do I need another table 'competition_eligiblity' - this seems like a lot of work though!
Any suggestions?
Thanks,
Alan.
Yes, you do need a table for competition eligibility.
It really is no more work to put it there. Actually, it will be less work:
Adding a new competition in the future will be a real pain if it involves adding a new column to a table.
If the competition eligibility is stored in columns, performing a query to get information on eligibility becomes a nightmare.
Suppose you wanted to list all the competitions players are eligible for. Here would be your query:
select player, "competition1" from players where competition1_eligible = 1
union all
select player, "competition2" from players where competition2_eligible = 1
union all
select player, "competition3" from players where competition3_eligible = 1
union all
select player, "competition4" from players where competition4_eligible = 1
Sounds like fun, eh? Whereas, if you have an eligibility table, this information will be very simple to get.
Update: storing all the eligibility info in a single value would be even more of a nightmare, because imagine trying to extract that information back out of the string. That is beyond the limits of a sane SQL query.
Creating a new table is really a trivial piece of work, and you only have to do that once. Everything after that will be much easier if you do it that way.

Retention Tracking

Let’s say I have an Angry Birds game.
I want to know how many players are buying the ‘mighty eagle’ weapon each month out of the players which bought the mighty eagle weapon in the previous months in their LTV in the system
I have the dates of all items bought per each client.
What I practically would like to have is a two dimensional
matrix that will tell me what the percentage of the players which moved from
LTV_month_X to LTV_month_Y for each combination of X<Y for a specific current
month?
An example:
example_png
(it didn't let me to put the pic inline so please press the link to see the pic)
Now, I have found a way to get the number of players moved
actually from from LTV_month_X to LTV_month_Y that LTV_month_Y is their current
month of activity within the system using SQL query and Excel Pivot table.
What I try find out is mainly is how to get the base number of those who potentially could do that transition.
A few definitions:
LTV_month_X = DATEDIFF(MONTH, first_eagle_month, specific_eagle_month)+1
Preferably I would like to have the solutions in ANSI-SQL, if not then MySQL or
MSSQL but no Oracle functions should be used at all.
Since I’m looking for the percentage of the transition two-steps plans could also work, first find the potential ones and the find the actual ones who moved to measure the retention from  LTV_month_X to LTV_month_Y.
One last issue: I need for it to be possible to drill down and find the actual IDs of the clients who moved from any stage X to any other stage Y (>X).
The use of the term LTV here is not clear. I assume you mean the lifetime of the user.
If I understand the question, you are asking, based on a list of entities each with one or more events, how do I group (e.g. count) the entities by the month of the last event and the month of the one before last event.
in mysql, you can use a variable to do that. I'm not going to expalin the whole concept, but basically, when within a SELECT statement you write #var:=column, then that variable is assigned the value of that column, and you can use that to compare values between consectuive columns e.g.
LEAST(IF(#var=column,#same:=#same+1,#same:=0),#var:=column)
the use of LEAST is a trick to ensure execution order.
The two dimension you are looking for are
Actual purchase month
Relative purchase month
SELECT
player_id,
TRUNCATE(first_purchase,'MM') AS first_month ,
TRUNCATE(current_purchase_date ,'MM') AS purchase_month,
months_between(current_purchase _date, first_purchase_date)+1 AS relative_month,
SUM(purchase_amount) AS total_purchase,
COUNT(DISTINCT player_id) AS player_count
FROM ...
Now you can pivot purchase month to relative month and aggregate

Design for 'Total' field in a database

I am trying to find an optimal solution for my Database (MySQL), but I'm stuck over the decision whether or not to store a Total column.
This is the simplified version of my database :
I have a Team table, a Game table and a 'Score' table. Game will have {teamId, scoreId,...} while Score table will have {scoreId, Score,...} (Here ... indicates other columns in the tables).
On the home page I need to show the list of Teams with their scores. Over time the number of Teams will grow to 100s while the list of Score(s) will grow to 100000s. Which is the preferred way:
Should I sum up the scores and show along with teams every time the page is requested. (I don't want to cache because the scores will keep changing) OR
Should I have a total_score field in the Team table where I update the total_score of a team every time a new score is added to the Scores table for that group?
Which of the two is a better option or is there any other better way?
I use two guidelines when deciding to store a calculated value. In the best of all worlds, both of these statements will be true.
1) The value must be computationally expensive.
2) The value must have a low probability of changing.
If the cost of calculating the value is very high, but it changes daily, I might consider making a nightly job that updates the value.
Start without the total column and only add it if you start having performance issues.
Calculating sum at request time is better for accuracy but worse for efficiency.
Caching total in a field (dramatically) improves performance of certain queries, but increases code complexity or may show stale data (if you update cached value not at the same time, but via cron job).
It's up to you! :)
I agree that computed values should not be used except for special situations such as month end snapshots of databases.
I would simply create a view with one column in the view equal to your computed total column. Then you can query the view instead of the base tables.
Depending on how often your scores gets updated and what exactly the "score" means
Case1: Score is a LIVE score
If the "score" is the live score like "runs scored in cricket or baseball" or "score of vollyball match or tabletennis" then I really dont understand the need of showing the "sum" of the "running" scores. However, this may be a requirements also in some cases like showing the total runs scored by a team till now + the runs scored so far in the on going (live) game.
In this case I suggest you another option which is combination of your 1st and 2nd option
Total_score in the team table would be good with slight change in your data model. which is
Add a new column in the scores table called LIVE which will be 0 for a finished match 1 for a live match (and optionally -1 indicating match is about to start but the scores wont get update)
Now union two tables something like
select team_id,sum(total_sore) from (
select team_id,total_score from team
union
select team_id,sum(score) total_score from scores where live = 1 group by team_id)subquery
group by team_id
Case2: Score is just a RESULT
Well just query the db directly (your 1st option) as because the result will be updated only after the game ends and the update infact it will be a new entry in the score table.
If my assumption is correct, the scores get updated only after the game is finished. Moreover the update can be even less often when considered the games played by a team.