I'm trying to figure out the best way to manage this data storage problem....
I have a table of players, teams, and competitions.
A team may be involved in let's say 3 competitions.
A player belongs to a team, but may only be eligible to play in 2 of the 3 competitions that his or her team plays in. Likewise another player of the same team may be eligible for all 3.
I don't want to add a column to the player table for each competition, as I'd then be moving away from the relational model. Do I need another table, 'competition_eligibility'? This seems like a lot of work though!
Any suggestions?
Thanks,
Alan.
Yes, you do need a table for competition eligibility.
It really is no more work to put it there. Actually, it will be less work:
Adding a new competition in the future will be a real pain if it involves adding a new column to a table.
If the competition eligibility is stored in columns, performing a query to get information on eligibility becomes a nightmare.
Suppose you wanted to list all the competitions players are eligible for. Here would be your query:
select player, 'competition1' from players where competition1_eligible = 1
union all
select player, 'competition2' from players where competition2_eligible = 1
union all
select player, 'competition3' from players where competition3_eligible = 1
union all
select player, 'competition4' from players where competition4_eligible = 1
Sounds like fun, eh? Whereas, if you have an eligibility table, this information will be very simple to get.
Update: storing all the eligibility info in a single value would be even more of a nightmare. Imagine trying to extract that information back out of the string; that is beyond the limits of a sane SQL query.
Creating a new table is really a trivial piece of work, and you only have to do that once. Everything after that will be much easier if you do it that way.
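To make the comparison concrete, here is a minimal sketch of what the eligibility table could look like. It uses Python's built-in sqlite3 module only so that it runs anywhere; the table names, column names, and sample rows are all assumptions for illustration, not the asker's actual schema.

```python
import sqlite3

# Hypothetical schema: every table/column name and sample row is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE competition (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per (player, competition) pair the player is eligible for.
    CREATE TABLE competition_eligibility (
        player_id INTEGER NOT NULL REFERENCES player(id),
        competition_id INTEGER NOT NULL REFERENCES competition(id),
        PRIMARY KEY (player_id, competition_id)
    );
    INSERT INTO player VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO competition VALUES (1, 'League'), (2, 'Cup'), (3, 'Shield');
    INSERT INTO competition_eligibility VALUES (1, 1), (1, 2), (2, 1), (2, 2), (2, 3);
""")

# A single join lists every competition each player is eligible for,
# no matter how many competitions exist.
eligibility = conn.execute("""
    SELECT p.name, c.name
    FROM competition_eligibility AS e
    JOIN player AS p ON p.id = e.player_id
    JOIN competition AS c ON c.id = e.competition_id
    ORDER BY p.id, c.id
""").fetchall()
```

Adding a fifth competition is then one INSERT into competition plus eligibility rows, with no schema change and no change to the query.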
Related
Novice relational database design question here. This is a "I feel like I'm doing wrong. What is it?" question. What it boils down to is, how can I avoid unnecessary complexity when designing a relational database. I also realise that this is as much a question about class structure design, seeing as I'm using an ORM and I'm really thinking of these tables as objects.
Background
Let's say I want to record the results of a number of competitive games between an unspecified number of "players". These players all belong to a "leaderboard", so the leaderboard has multiple players and records multiple results. The "score" for each of the players is recorded at the end of each game and belongs to a single "result" instance (see image). A score is also parented by the player to which it belongs.
edit: An example
Each row in the leaderboard table represents a collection of players who together form a league. For example, all of the players who belong to a tennis league will have the same leaderboard_id in the player table.
A row in the results table represents a match that has taken place between players that belong to a particular league. So the leaderboard_id associated with our players is recorded in each result in this league. The results table doesn't hold the score of each player; rather, I've attempted to normalise (apologies for potentially inappropriate use of that term) these into a score table.
Bringing this all together: we have a league in the Leaderboard table, in which a game has taken place between two players. These players belong to the league in question. The two players have just played a match and their scores are recorded as rows in the score table. These rows are collected together under a single results_id, referring to a row in the results table.
Question
Q1. Does this make sense? Is there anything glaringly obvious that is wrong with this design?
As is, I can easily query the scores a particular player has accumulated over time, look up the players that played in a particular result, etc. However, there are some actions that really feel like they should be simple, but, to me, feel overly complicated.
For instance, if I want to return the most recent results for a particular player (ie not the player's scores, rather the results that contain a score that belongs to our player).
So, hand-wavey Q2. Maybe this is just lack of experience with SQL, but I shouldn't have to do a JOIN to look this up, should I? But then, what's the alternative? Should I create a one-to-many composition between player and results, so that I can simply look up a player's results?
With the current design to find the most recent result for a player I would need to do something like this (python sqlalchemy)
Score.query.join(Result, Player)\
    .filter(Player.id == player_id)\
    .order_by(Result.timestamp.desc()).first()
Is this bad?
I'm new to (My)SQL and it's been difficult to find good info on best practices of table design.
I want to save sequences of moves on a chess board, say I have an array $a = ['e4 e5', 'Nf3 Nc6', ...]
Being new, my first idea is a dumb little table with 2 columns, one for the Game ID and one for the moves. The moves (array) would be serialized and stored in a string. I guess this would technically work, but reading and writing potentially huge serialized arrays from a DB - perhaps on every page load - seems suboptimal to me.
Caching the array on the user side might not be possible for various reasons and is not something I'm curious about.
I'm interested in learning how to best store data that can't be entirely predicted in its format (e.g. the number of moves can vary from 1 to 1000).
Rather than storing the entire game history in a single row, why not store the game ID, the move, and the sequence number of that move?
That way you would retrieve the entire history of a given game by doing something like
SELECT *
FROM MovesTable
WHERE gameID = id
ORDER BY sequence
In general, I would use a table in one of the following forms:
GameId, MoveNumber, Move
GameId, MoveNumber, FromSquare, ToSquare
Which one you use will depend on what you will need to query against and how the data will be presented, but I would lean toward the latter suggestion.
You can then combine this with a parent table that contains the GameId and some data about the game itself, such as dates or players.
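As a rough sketch of the first form (GameId, MoveNumber, Move) combined with a parent games table, the following uses Python's built-in sqlite3 module. The table names, column names, and the helper functions save_game and load_game are hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: names are assumptions, not a prescribed layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (game_id INTEGER PRIMARY KEY, played_on TEXT);
    CREATE TABLE moves (
        game_id INTEGER NOT NULL REFERENCES games(game_id),
        move_number INTEGER NOT NULL,
        move TEXT NOT NULL,
        PRIMARY KEY (game_id, move_number)
    );
""")

def save_game(conn, game_id, moves):
    """Store one row per move, numbered from 1 in playing order."""
    conn.execute("INSERT INTO games (game_id) VALUES (?)", (game_id,))
    conn.executemany(
        "INSERT INTO moves (game_id, move_number, move) VALUES (?, ?, ?)",
        [(game_id, n, m) for n, m in enumerate(moves, start=1)],
    )

def load_game(conn, game_id):
    """Return the moves of one game as a list, in playing order."""
    rows = conn.execute(
        "SELECT move FROM moves WHERE game_id = ? ORDER BY move_number",
        (game_id,),
    )
    return [move for (move,) in rows]

save_game(conn, 1, ["e4 e5", "Nf3 Nc6"])
```

Here load_game(conn, 1) gives back the original list, and a game of 1 move or 1000 moves needs no schema change. The composite primary key also keeps move numbers unique within a game.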
If you're only going to consume the moves as an entire block - that is you always want the entire move chain and never will query into individual moves - you could store the string as you suggest. This has the added benefit that there is only one row of data to return, which will be very fast. The downside of course is that you will have to deserialize/parse the data once you receive it.
I'd do something like
Game, MoveNumber, Move
----------------------
Game1, 1, e4 e5
Game1, 2, Nf3 Nc6
...
Game2, 1, ....
You will need a number of tables, one for the players, one for games, one for the pieces and one for moves.
A game would have players and colours and things like date and time.
A piece would tell you how it moves, whether it is a knight or a queen, etc.
A player would have the player's name etc.
A move would have the piece, game, player, start position and end position.
You would link these tables together by relationships based on id fields in each of the tables.
Try two tables.
First table contains information pertaining to each game. Who are the players, where and when did they play, who won, and whatever other information is pertinent to each game and makes each game unique.
Games
Game_id PK
Black_name
White_name
Game_Location
Game_Date
Game_Time
Winner
The second table contains all the moves for each game. This contains all the pertinent information for each move: was a piece taken, was the other player put in check, was this a regular move or did they castle, etc.
Moves
Move_id PK
Game_id FK
Move_number
Who_moved
Piece_moved
Square_from
Square_to
Piece_taken
Move_type
Check_YorN
Now each row in the Games table (game) is joined to many rows (moves) in the Moves table.
So I have this application that I'm drawing up and I start to think about my users. Well, my initial thought was to create a table for each group type. I've been thinking this over though and I'm not sure that this is the best way.
Example:
// Users
Users [id, name, email, age, etc]
// User Groups
Player [id, years playing, etc]
Ref [id, certified, etc]
Manufacturer Rep [id, years employed, etc]
So everyone would be making an account, but each user would have a different group. They can also be in multiple different groups. Each group has its own list of different columns. So what is the best way to do this? Let's say I have 5 groups. Do I need 8 tables + a relational table connecting each one to the user table?
I just want to be sure that this is the best way to organize it before I build it.
Edit:
A player would have columns regarding the gear that they use to play, the teams they've played with, events they've gone to.
A ref would have info regarding the certifications they have and the events they've reffed.
Manufacturer reps would have info regarding their position within the company they rep.
A parent would have information regarding how long they've been involved with the sport, perhaps relations with the users they are parent of.
Just as an example.
Edit 2:
Player Table
id
user id
started date
stopped date
rank
Ref Table
id
user id
started date
stopped date
is certified
certified by
verified
Photographer / Videographer / News Reporter Table
id
user id
started date
stopped date
worked under name
website / channel link
about
verified
Tournament / Big Game Rep Table
id
user id
started date
stopped date
position
tourney id
verified
Store / Field / Manufacturer Rep Table
id
user id
started date
stopped date
position
store / field / man. id
verified
This is what I planned out so far. I'm still new to this so I could be doing it completely wrong. And it's only five groups. It was more until I condensed it some.
I find it odd to have so many entities that differ from each other, but I will set that aside and get to the question.
It depends on the group criteria you need. In the case you described, where each group has its own columns and information, I think your design is a good one, especially if you need the information in a readable form in the database. If you need all groups in a single table, you will have to save the group-specific information in some kind of object, such as a blob or an XML string, but then you will lose the ability to filter on these criteria using the database.
In a relational Database I would do it using the design you described.
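A minimal sketch of that layout, assuming hypothetical table and column names loosely based on the question: one users table plus one table per group, each row pointing back to its user. It uses Python's built-in sqlite3 module for illustration only.

```python
import sqlite3

# Hypothetical schema: names, columns, and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT);
    -- One table per group; a user may have a row in several of them.
    CREATE TABLE players (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        player_rank TEXT
    );
    CREATE TABLE refs (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        is_certified INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO users VALUES (1, 'Sam', 'sam@example.com'), (2, 'Kim', 'kim@example.com');
    INSERT INTO players (user_id, player_rank) VALUES (1, 'pro');
    INSERT INTO refs (user_id, is_certified) VALUES (1, 1), (2, 0);
""")

# Because group data lives in real columns, the database can filter on it:
# here, users who are both players and certified refs.
certified_refs_who_play = conn.execute("""
    SELECT u.name
    FROM users AS u
    JOIN refs AS r ON r.user_id = u.id
    JOIN players AS p ON p.user_id = u.id
    WHERE r.is_certified = 1
""").fetchall()
```

With this shape no extra relational table is needed per group, since the user_id column in each group table already links it to users.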
The design of your tables greatly depends on the requirements of your software.
E.g. your description of users led me in the wrong direction: I was at first thinking about a "normal" user of a piece of software, basically name, login information and things like that. That I would never split over different tables, as it makes tasks like login and session handling really complicated.
Another point which surprised me, was that you want to store the equipment in columns of those user's tables. Usually the relationship between a person and his equipment is not 1 to 1 and in most cases the amount of different equipment varies. Thus you usually have a relationship between users and their equipment (1:n). Thus you would design an equipment table and there refer to the owner's user id.
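The 1:n relationship described here could be sketched as follows, again with Python's built-in sqlite3 module. The equipment table, its columns, and the helper equipment_for are hypothetical names used only for illustration.

```python
import sqlite3

# Hypothetical schema: names and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Each equipment row refers to its owner, so a user can own any number of items.
    CREATE TABLE equipment (
        id INTEGER PRIMARY KEY,
        owner_user_id INTEGER NOT NULL REFERENCES users(id),
        description TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'Sam'), (2, 'Kim');
    INSERT INTO equipment (owner_user_id, description)
    VALUES (1, 'mask'), (1, 'marker'), (2, 'mask');
""")

def equipment_for(conn, user_id):
    """Return the descriptions of all equipment owned by one user."""
    rows = conn.execute(
        "SELECT description FROM equipment WHERE owner_user_id = ? ORDER BY id",
        (user_id,),
    )
    return [description for (description,) in rows]
```

A user with no equipment simply has no rows, instead of a set of empty columns.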
But once you have an idea of which data you have in your application and which relationships exist between your data, the design of the tables is rather straightforward.
The good news is, that your data model and database design will develop over time. Try to start with a basic model, covering the majority of your use cases. Then slowly add more use cases / aspects.
As long as you are in the planning and early implementation phase, it is rather easy to change your database design.
I'm developing a music streaming site where I have two major tables: 'activity' and 'music'. Activity saves, among other things, every song reproduction into a new record.
Every time I select from music I need to fetch the number of reproductions of every song. So, what would be the better practice?
SELECT music.song, music.artist, COUNT(activity.id) AS reproductions
FROM music LEFT JOIN activity USING (song_id) WHERE music.song_id = XX
GROUP BY music.song_id
Or would it be better to save the number of reproductions into a new field in the music table and query this:
SELECT song, artist, reproductions FROM music WHERE music.song_id = XX
This last query is, of course, much easier. But to use it, every time I play a soundfile I would have to make two queries: one INSERT into the activity table, and one UPDATE on the reproductions field of the music table.
What would be the better practice in this scenario?
Well, this depends on the response times these two queries will have over time.
Once the tables become huge (hypothetically), query number 2 will be better.
Keep in mind that over time even the insert might become costly; you might consider some data warehousing if you will have millions of rows in the DB.
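If you go with the stored counter, the INSERT and the UPDATE should succeed or fail together so the counter cannot drift from the activity log. Below is a minimal sketch of that idea using Python's built-in sqlite3 module. The reduced schema and the helper record_play are assumptions for illustration, and a MySQL setup would need its own transaction handling.

```python
import sqlite3

# Hypothetical, reduced schema: only the columns needed for the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE music (
        song_id INTEGER PRIMARY KEY,
        song TEXT,
        artist TEXT,
        reproductions INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY,
        song_id INTEGER NOT NULL REFERENCES music(song_id)
    );
    INSERT INTO music (song_id, song, artist) VALUES (1, 'Song A', 'Artist A');
""")

def record_play(conn, song_id):
    """Log one play and bump the cached counter in a single transaction."""
    # "with conn" commits both statements together, or rolls both back on error.
    with conn:
        conn.execute("INSERT INTO activity (song_id) VALUES (?)", (song_id,))
        conn.execute(
            "UPDATE music SET reproductions = reproductions + 1 WHERE song_id = ?",
            (song_id,),
        )

for _ in range(3):
    record_play(conn, 1)

# The cached counter and the counted activity rows should always agree.
cached = conn.execute(
    "SELECT reproductions FROM music WHERE song_id = 1"
).fetchone()[0]
counted = conn.execute(
    "SELECT COUNT(*) FROM activity WHERE song_id = 1"
).fetchone()[0]
```

The COUNT query remains available as a way to verify or rebuild the counter if the two ever disagree.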
I have 30 tables, each representing a different neighborhood.
Each table holds real estate listings with "Price", "Number Of Rooms", "Square Feet" etc. columns.
The end user would be able to choose as many neighborhoods as he likes with the
option to screen out results such as "At least 5 Rooms", "Below 250k" etc and
sort the results by "Lowest Price", "Time Submitted", you get the point.
Now I'm a programmer, not a DBMS guy. I've searched the web but feel that trying to build the query one step at a time would be the wrong approach without some guidance on what to avoid.
I would love to hear and learn from the StackOverflow community on best approaches with this one. Please help me sort this out.
EDIT: I'm currently using MyISAM
You should not have 30 tables. Normalize your schema:
NEIGHBORHOOD
ID, Name
PROPERTY
ID, NeighborhoodID, Name, Price,
Rooms, SquareFeet
Then you can join these together:
SELECT n.Name AS Neighborhood, p.Name AS Property, Price, Rooms, SquareFeet
FROM Property AS p
INNER JOIN Neighborhood AS n ON n.ID = p.NeighborhoodID
WHERE p.NeighborhoodID = X
Then you may need indexes on the tables as the data grows.
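With the normalized schema, the search described in the question (several neighborhoods, filters, a sort order) becomes a single query. Here is a rough sketch using Python's built-in sqlite3 module. The lower-case table and column names, the sample rows, and the search helper are assumptions for illustration.

```python
import sqlite3

# Hypothetical data set following the NEIGHBORHOOD / PROPERTY layout above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE neighborhood (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE property (
        id INTEGER PRIMARY KEY,
        neighborhood_id INTEGER NOT NULL REFERENCES neighborhood(id),
        name TEXT, price INTEGER, rooms INTEGER, square_feet INTEGER
    );
    INSERT INTO neighborhood VALUES (1, 'North'), (2, 'South'), (3, 'East');
    INSERT INTO property VALUES
        (1, 1, 'A', 240000, 5, 1800),
        (2, 2, 'B', 200000, 6, 2000),
        (3, 2, 'C', 300000, 7, 2500),
        (4, 3, 'D', 150000, 5, 1500);
""")

def search(conn, neighborhood_ids, min_rooms, max_price):
    """Listings in the chosen neighborhoods, filtered and sorted by lowest price."""
    # One "?" placeholder per selected neighborhood; all values stay parameterized.
    placeholders = ", ".join("?" for _ in neighborhood_ids)
    sql = f"""
        SELECT n.name, p.name, p.price
        FROM property AS p
        JOIN neighborhood AS n ON n.id = p.neighborhood_id
        WHERE p.neighborhood_id IN ({placeholders})
          AND p.rooms >= ? AND p.price < ?
        ORDER BY p.price
    """
    return conn.execute(sql, [*neighborhood_ids, min_rooms, max_price]).fetchall()

results = search(conn, [1, 2], min_rooms=5, max_price=250000)
```

Only the list of neighborhood ids changes with the user's selection; the query text never needs one branch per neighborhood.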
You should start by modifying your database model. Creating 30 tables for storing the same kind of data (real estate information) is not adequate. Try to put all the data in a single table, adding a column that indicates the neighborhood. This column could point to another table with the name, description, ... of the neighborhood. Then you can query a single table to search across all neighborhoods and optionally filter by the neighborhoods the user wants to search.
The best way is to change your db model, get rid of the 30 tables, and put everything in one table. With your current model, I don't see any other way but to create a huge union (you can put it into a view, and query this view).