I'm developing a music streaming site with two major tables: 'activity' and 'music'. Activity saves, among other things, every song reproduction (play) as a new record.
Every time I select from music I need to fetch the number of reproductions of each song. So which would be the better practice? This:
SELECT music.song, music.artist, COUNT(activity.id) AS reproductions
FROM music LEFT JOIN activity USING (song_id) WHERE music.song_id = XX
GROUP BY music.song_id
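A minimal sketch of the first approach, using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names come from the question; the sample rows and the index name idx_activity_song are invented for illustration. It shows why the LEFT JOIN matters: a song that has never been played still comes back, with a count of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE music (song_id INTEGER PRIMARY KEY, song TEXT, artist TEXT);
    CREATE TABLE activity (id INTEGER PRIMARY KEY,
                           song_id INTEGER REFERENCES music(song_id));
    -- Index the join column so counting one song's plays need not scan all of activity
    CREATE INDEX idx_activity_song ON activity(song_id);
""")
conn.executemany("INSERT INTO music VALUES (?, ?, ?)",
                 [(1, "Song A", "Artist A"), (2, "Song B", "Artist B")])
conn.executemany("INSERT INTO activity (song_id) VALUES (?)", [(1,), (1,), (1,)])

def reproductions(song_id):
    # LEFT JOIN keeps songs with no plays; COUNT(activity.id) skips the NULL row
    return conn.execute("""
        SELECT music.song, music.artist, COUNT(activity.id) AS reproductions
        FROM music LEFT JOIN activity USING (song_id)
        WHERE music.song_id = ?
        GROUP BY music.song_id
    """, (song_id,)).fetchone()

print(reproductions(1))  # ('Song A', 'Artist A', 3)
print(reproductions(2))  # ('Song B', 'Artist B', 0)
```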
Or would it be better to save the number of reproductions into a new field in the music table and query this:
SELECT song, artist, reproductions FROM music WHERE music.song_id = XX
This last query is, of course, much simpler. But to use it, every time I play a sound file I would have to run two queries: an INSERT into the activity table, and an UPDATE of the reproductions field in the music table.
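If you go with the stored counter, the INSERT and the UPDATE should run in one transaction so the counter cannot drift away from the activity rows (in MySQL with InnoDB that means wrapping them in START TRANSACTION ... COMMIT). A sketch under that assumption, again with sqlite3 standing in for MySQL; record_play is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE music (song_id INTEGER PRIMARY KEY, song TEXT, artist TEXT,
                        reproductions INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE activity (id INTEGER PRIMARY KEY,
                           song_id INTEGER REFERENCES music(song_id));
""")
conn.execute("INSERT INTO music (song_id, song, artist) VALUES (1, 'Song A', 'Artist A')")
conn.commit()

def record_play(song_id):
    # "with conn" commits both statements together, or rolls both back on an error
    with conn:
        conn.execute("INSERT INTO activity (song_id) VALUES (?)", (song_id,))
        # Increment inside SQL instead of read-modify-write in application code
        conn.execute(
            "UPDATE music SET reproductions = reproductions + 1 WHERE song_id = ?",
            (song_id,))

for _ in range(3):
    record_play(1)

print(conn.execute(
    "SELECT song, artist, reproductions FROM music WHERE song_id = 1").fetchone())
# ('Song A', 'Artist A', 3)
```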
What would be the better practice in this scenario?
Well, this depends on how the response times of these two queries evolve over time.
Once the tables become huge (hypothetically), query no. 2 will be better.
Bear in mind that over time even the INSERT might become costly; if you end up with millions of rows in the DB, you might consider some form of data warehousing.
At the moment I'm working on a complex database. I've got one table where I'd like to insert data every day from dozens of users.
Example:
There are 200 recipes for menus (each column a recipe) and 200+ users. Every user uses a recipe between 1 and 3 times a day. In addition to that, I want to track the ingredients and their amounts, plus more data such as an evaluation of how difficult cooking was, how nice it tasted and so on.
First idea was to make one entry per usage:
[user id, timestamp, recipe#1, recipe#2, ... , recipe#200] // daily up to 3 entries per user
The details of each recipe would be in an array. I was wondering if I could make that easier. I want to synchronise the user's app and the database once per week. So would it be easier to make one entry per week and differentiate the recipe usages with a timestamp in that array?
Second idea:
[user id, recipe#1,...,recipe#200]
=>'1','"details","timestamp"','"details","timestamp"','"details","timestamp"'
// weekly one entry per user
If I want to show charts with stats about the recipes, idea 1 would be easy, but depending on the number of users and their entries, my database would grow very fast. Would it be better to go with idea 2, reducing it to one entry per user per week and differentiating with timestamps inside those arrays?
I also don't like the idea of maintaining a structure like this. Adding more recipes wouldn't be very dynamic. Basically, users are growing, recipes are growing, details are dynamic, and timestamps keep getting inserted without end...
At the end of the day I want to display stats, behaviour depending on user, on time, be able to be sorted by every category possible - which gives me a headache :D
Always go with multiple entries if you need to run queries on it for comparing and calculating.
Could you please share your whole database structure? I would go with more related tables. Preferably don't put arrays into the database; think of the database itself as the array, which lets you collect the data in a better way.
I would go with something like cooking_id, user_id, recipe_id, difficulty, taste, timestamp, and create a new row each time a recipe is cooked. Then in the recipe table you have something like: recipe_id, name, details (maybe ingredients). It depends on how you want to measure things.
If you need to measure ingredients, you could make ingredients a separate table and create a related table linking it to recipes. For ingredients: ingredients_id, name. For the related table: ingredients_id, recipe_id, grams (here you can make ingredients_id and recipe_id together the primary key).
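The layout described above can be sketched roughly as follows, with sqlite3 standing in for the real database. The column names follow the answer; the name recipe_ingredients for the related table and all sample data are hypothetical. The point it illustrates: adding recipe #201 is a new row, not a new column, and per-recipe stats are one GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipe (recipe_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ingredients (ingredients_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Related table: which ingredients a recipe uses, and how many grams
    CREATE TABLE recipe_ingredients (
        ingredients_id INTEGER REFERENCES ingredients(ingredients_id),
        recipe_id INTEGER REFERENCES recipe(recipe_id),
        grams REAL NOT NULL,
        PRIMARY KEY (ingredients_id, recipe_id)
    );
    -- One row per time a user cooks a recipe
    CREATE TABLE cooking (
        cooking_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        recipe_id INTEGER NOT NULL REFERENCES recipe(recipe_id),
        difficulty INTEGER,
        taste INTEGER,
        timestamp TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO recipe VALUES (?, ?)", [(1, "Pancakes"), (2, "Omelette")])
conn.executemany(
    "INSERT INTO cooking (user_id, recipe_id, difficulty, taste, timestamp)"
    " VALUES (?, ?, ?, ?, ?)",
    [(10, 1, 2, 5, "2024-01-01 08:00"),
     (10, 2, 1, 4, "2024-01-01 19:00"),
     (11, 1, 4, 3, "2024-01-02 08:30")])

# Stats per recipe: times cooked and average taste rating
stats = conn.execute("""
    SELECT r.name, COUNT(*) AS times_cooked, AVG(c.taste) AS avg_taste
    FROM cooking c JOIN recipe r USING (recipe_id)
    GROUP BY r.recipe_id
    ORDER BY times_cooked DESC, r.name
""").fetchall()
print(stats)  # [('Pancakes', 2, 4.0), ('Omelette', 1, 4.0)]
```

Filtering the same query by user_id or by a timestamp range gives the per-user and per-period views the question asks about.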
It's been some time since I last studied database structure in depth; I hope I could at least give you some advice :)
I have users who earn points by taking part in various activities on the website, and then they can spend these points on whatever they like. The way I have it set up at the moment is two tables:
tbl_users_achievements and tbl_users_purchased_items
I have these two tables to track what the users have done and what they have bought (Obviously!)
But instead of having a column in my user table called 'user_points', I have decided to display their points by doing a SELECT on all achievements and summing the points they have earned, and then another SELECT on how many points they have spent.
I thought it might have been better to have a column to store their points and do an UPDATE on it for that user whenever they buy or win something, but that means more places to manage: I have to insert a new row for the transaction and then update their column. If I use a query to work out their total won minus spent, I only have to insert the row and do no update. The problem then is the performance cost of running that calculation in a query.
So which solution would you go with and why?
Have a column to store their points and do an update
Use a query to work out the users points they can spend and have no column
Your current model is logically the right one - a key aspect for RDBMS normalization is not to repeat any information, and keeping an explicit "this customer has x points" column repeats data.
The benefits of this are obvious - you have less data manipulation code to write, and don't have to worry about what happens when you insert the transaction but can't update the users table.
The downsides are that you're running additional queries every time you show the customer profile; this can create a performance problem. The traditional response to that performance problem is to de-normalize, for instance by keeping a calculated total against the user table.
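A sketch of the normalized version the answer recommends, computing the balance on demand (sqlite3 as a stand-in for MySQL). The table names are from the question; the points and user_id columns, the index names, and the data are assumptions. The indexes on user_id are what keep each SUM limited to one user's rows as the tables grow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users_achievements (user_id INTEGER, points INTEGER NOT NULL);
    CREATE TABLE tbl_users_purchased_items (user_id INTEGER, points INTEGER NOT NULL);
    -- Indexes so each SUM only touches the rows of one user
    CREATE INDEX idx_ach_user ON tbl_users_achievements(user_id);
    CREATE INDEX idx_pur_user ON tbl_users_purchased_items(user_id);
""")
conn.executemany("INSERT INTO tbl_users_achievements VALUES (?, ?)",
                 [(1, 50), (1, 30), (2, 10)])
conn.executemany("INSERT INTO tbl_users_purchased_items VALUES (?, ?)", [(1, 20)])

def points_balance(user_id):
    # Derived value: earned minus spent; COALESCE turns "no rows" (NULL) into 0
    return conn.execute("""
        SELECT COALESCE((SELECT SUM(points) FROM tbl_users_achievements
                         WHERE user_id = ?), 0)
             - COALESCE((SELECT SUM(points) FROM tbl_users_purchased_items
                         WHERE user_id = ?), 0)
    """, (user_id, user_id)).fetchone()[0]

print(points_balance(1))  # 60
print(points_balance(2))  # 10
```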
Only do that if that's absolutely, provably necessary.
Myself, I would put the user points into a separate table PK'd by user ID (or whatever) and store them there, doing updates to increment or decrement as achievements are attained or points are spent.
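A sketch of that denormalized alternative, assuming a hypothetical user_points table created with a row per user at signup, and the same assumed points columns as before. Each ledger INSERT and balance UPDATE share one transaction, which addresses the "insert succeeded but update failed" concern raised earlier. The conditional UPDATE in spend also refuses to take the balance below zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_users_achievements (user_id INTEGER, points INTEGER NOT NULL);
    CREATE TABLE tbl_users_purchased_items (user_id INTEGER, points INTEGER NOT NULL);
    -- Running balance, one row per user (assumed to be created at signup)
    CREATE TABLE user_points (user_id INTEGER PRIMARY KEY,
                              points INTEGER NOT NULL DEFAULT 0);
    INSERT INTO user_points (user_id) VALUES (1);
""")

def earn(user_id, pts):
    with conn:  # both writes commit together or roll back together
        conn.execute("INSERT INTO tbl_users_achievements VALUES (?, ?)", (user_id, pts))
        conn.execute("UPDATE user_points SET points = points + ? WHERE user_id = ?",
                     (pts, user_id))

def spend(user_id, pts):
    with conn:
        # Deduct only if the balance covers it; rowcount 0 means not enough points
        cur = conn.execute(
            "UPDATE user_points SET points = points - ? WHERE user_id = ? AND points >= ?",
            (pts, user_id, pts))
        if cur.rowcount == 0:
            raise ValueError("not enough points")  # triggers rollback, then propagates
        conn.execute("INSERT INTO tbl_users_purchased_items VALUES (?, ?)", (user_id, pts))

earn(1, 50)
spend(1, 20)
try:
    spend(1, 100)
except ValueError:
    pass
print(conn.execute("SELECT points FROM user_points WHERE user_id = 1").fetchone()[0])  # 30
```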
I'm trying to figure out the best way to manage this data storage problem.
I have tables of players, teams, and competitions.
A team may be involved in let's say 3 competitions.
A player belongs to a team, but may only be eligible to play in 2 of the 3 competitions that his or her team plays in. Likewise another player of the same team may be eligible for all 3.
I don't want to add a column to the player table for each competition, as I'd then be moving away from the relational model. Do I need another table, 'competition_eligibility'? This seems like a lot of work though!
Any suggestions?
Thanks,
Alan.
Yes, you do need a table for competition eligibility.
It really is no more work to put it there. Actually, it will be less work:
Adding a new competition in the future will be a real pain if it involves adding a new column to a table.
If the competition eligibility is stored in columns, performing a query to get information on eligibility becomes a nightmare.
Suppose you wanted to list all the competitions players are eligible for. Here would be your query:
select player, "competition1" from players where competition1_eligible = 1
union all
select player, "competition2" from players where competition2_eligible = 1
union all
select player, "competition3" from players where competition3_eligible = 1
union all
select player, "competition4" from players where competition4_eligible = 1
Sounds like fun, eh? Whereas, if you have an eligibility table, this information will be very simple to get.
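For comparison, a sketch of the eligibility table and the single query that replaces the UNION ALL chain, run with sqlite3 as a stand-in for MySQL. All table and column names and the sample data are hypothetical. The sketch does not enforce that a player is only eligible for competitions their team actually plays in; that would need an extra team/competition table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teams (team_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT,
                          team_id INTEGER REFERENCES teams(team_id));
    CREATE TABLE competitions (competition_id INTEGER PRIMARY KEY, name TEXT);
    -- One row per (player, competition) pair the player may play in
    CREATE TABLE competition_eligibility (
        player_id INTEGER REFERENCES players(player_id),
        competition_id INTEGER REFERENCES competitions(competition_id),
        PRIMARY KEY (player_id, competition_id)
    );
""")
conn.execute("INSERT INTO teams VALUES (1, 'Rovers')")
conn.executemany("INSERT INTO players VALUES (?, ?, 1)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO competitions VALUES (?, ?)",
                 [(1, "League"), (2, "Cup"), (3, "Shield")])
conn.executemany("INSERT INTO competition_eligibility VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3)])

# One query, unchanged no matter how many competitions are added later
rows = conn.execute("""
    SELECT p.name, c.name
    FROM competition_eligibility e
    JOIN players p USING (player_id)
    JOIN competitions c USING (competition_id)
    ORDER BY p.name, c.competition_id
""").fetchall()
print(rows)
# [('Ann', 'League'), ('Ann', 'Cup'), ('Bob', 'League'), ('Bob', 'Cup'), ('Bob', 'Shield')]
```

Adding a fourth competition is one INSERT into competitions plus eligibility rows; no schema change and no new UNION branch.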
Update: storing all the eligibility info in a single value would be even more of a nightmare, because imagine trying to extract that information back out of the string. That is beyond the limits of a sane SQL query.
Creating a new table is really a trivial piece of work, and you only have to do that once. Everything after that will be much easier if you do it that way.
Using MySQL, I have a table of users, a table of matches (updated with the actual result) and a table called users_picks (at first it's always going to be 10 football matches per gameweek per league because there's only one league as of now, but more leagues will come along eventually, and some of them only have 8 matches per gameweek).
In the users_picks table, should I store each 'pick' (by pick I mean both 'hometeam score' and 'awayteam score') in a different row, or have all 10 picks in one single row? Both with an FK for user and gameweek. All picks in one row would mean columns with appended numbers like this:
Option 1: [pick_id, user_id, league_id, gameweek_id, match1_hometeam_score, match1_awayteam_score, match2_hometeam_score, match2_awayteam_score ... etc]
and that option doesn't quite fill me with joy, and looks a bit stupid, especially since there are going to be lots of potential NULLs in the db. The second option would eventually mean millions of rows, but would look like this:
Option 2: [pick_id, user_id, league_id, gameweek_id, match_id, hometeam_score, awayteam_score]
What's the best practice? And would it be a PITA to do all sorts of statistics using the second option? E.g. calculating how many matches a user has hit correctly in a specific round, how many all-time correct hits, etc.
If I'm not making much sense, I'll try to elaborate on anything. I just want my table design to be good from the start, so I won't have a huge headache in a couple of months.
Thanks in advance.
The second choice is much better than the first. This is called database normalisation and makes querying easier, not harder. I would suggest reading the linked article, and the related descriptions of the various "normal forms", and aiming for a 3rd Normal Form data structure as a minimum.
To see the flaw in your first option, imagine if there were to be included later a new league with 11 matches. Or 400.
You should read up about database normalization.
When you have a 1:n relation, like in your case one team having many matches, you would create two tables. One table "teams" and a second table "matches" where each row includes the ID of the team which played the match.
In the same manner you should also have separate tables for users, picks and leagues.
Option two is better, provided you INDEX your table properly, since (as you indicate) it will grow quite large. The pick_id is the primary key, but also create an INDEX on the user_id field, as likely the most common query will be
SELECT * FROM `users_picks` WHERE `user_id`=?;
to get all the picks for a given user.
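To address the statistics worry directly, here is a sketch of Option 2 with the per-round and all-time hit counts, using sqlite3 as a stand-in for MySQL. Two assumptions to flag: a "correct hit" is taken to mean an exact score match, and the composite index on (user_id, gameweek_id) is one possible choice that also covers the plain user_id lookup above. Comparisons evaluate to 1 or 0 in both SQLite and MySQL, so SUM over them counts the hits, and unplayed matches (NULL scores) are ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matches (match_id INTEGER PRIMARY KEY, gameweek_id INTEGER,
                          hometeam_score INTEGER, awayteam_score INTEGER);  -- NULL until played
    CREATE TABLE users_picks (
        pick_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        league_id INTEGER NOT NULL,
        gameweek_id INTEGER NOT NULL,
        match_id INTEGER NOT NULL REFERENCES matches(match_id),
        hometeam_score INTEGER NOT NULL,
        awayteam_score INTEGER NOT NULL
    );
    CREATE INDEX idx_picks_user ON users_picks(user_id, gameweek_id);
""")
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 1), (2, 1, 0, 0), (3, 2, 1, 3)])
conn.executemany(
    "INSERT INTO users_picks (user_id, league_id, gameweek_id, match_id,"
    " hometeam_score, awayteam_score) VALUES (?, 1, ?, ?, ?, ?)",
    [(7, 1, 1, 2, 1),   # gameweek 1, match 1: exact hit
     (7, 1, 2, 1, 1),   # gameweek 1, match 2: miss
     (7, 2, 3, 1, 3)])  # gameweek 2, match 3: exact hit

HIT = "(p.hometeam_score = m.hometeam_score AND p.awayteam_score = m.awayteam_score)"

# Correct hits per round for one user
per_round = conn.execute(f"""
    SELECT p.gameweek_id, SUM({HIT}) AS hits
    FROM users_picks p JOIN matches m USING (match_id)
    WHERE p.user_id = ?
    GROUP BY p.gameweek_id
    ORDER BY p.gameweek_id
""", (7,)).fetchall()

# All-time correct hits for the same user
all_time = conn.execute(f"""
    SELECT SUM({HIT}) FROM users_picks p JOIN matches m USING (match_id)
    WHERE p.user_id = ?
""", (7,)).fetchone()[0]

print(per_round)  # [(1, 1), (2, 1)]
print(all_time)   # 2
```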
I need opinions on the best way to go about creating a table or collection of tables to handle this unique problem. Basically, I'm designing this site with business profiles. The profile table contains all your usual things such as name, uniqueID, address, etc. Now, the whole idea of the site is that it's going to be collecting a small string of informative text. I want to allow the clients to be able to store one per date, with as many as 30 days in advance. The program is only going to show the information from the current date on forward, with expired dates not being shown.
The only way I can really see this being done is a table consisting of the uniqueID, date, and the informative block of text, but this creates pretty extensive queries. Eventually this table is going to be at least 20 times larger than the table of businesses in the first place as these businesses are going to be able to post up to 30 items in this table using their uniqueID.
Now, imagine the search page brings up a list of businesses in the area; it then has to query the new table for all of those ids to get the block of information I want to show based on the date. I'm pretty sure it would be a rather intensive couple of queries just to show a rather simple block of text, but I imagine this is how status updates work for social networking sites in general? Does Facebook store updates in a table of updates tied to a user's ID number, or have they come up with a better way?
I'm just trying to gain a little more insight into DB design, so throw out any ideas you might have.
The only way I can really see this being done is a table consisting of the uniqueID, date, and the informative block of text...
Assuming you mean the profile uniqueID, and not a unique ID for the text table, you're correct.
As Pascal said in his comment, you'd need a primary key on uniqueID and date, so a person could only enter one row of text for a given date.
If you want to retrieve the next text row for a person, your SQL query would have the following clauses (the ORDER BY matters: without it, LIMIT can return any matching row rather than the earliest one):
WHERE UNIQUE_ID = PROFILE.UNIQUE_ID
AND DATE >= CURRENT_DATE
ORDER BY DATE
LIMIT 1
Since you have an index on uniqueID and date, this should be a fast query.
If you want to retrieve the next 5 texts for a particular person, you'd just have to make one change:
WHERE UNIQUE_ID = PROFILE.UNIQUE_ID
AND DATE >= CURRENT_DATE
ORDER BY DATE
LIMIT 5
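For the search-page concern in the question, the per-business lookup does not have to be one query per business. A sketch, using sqlite3 as a stand-in for MySQL, that fetches each listed business's next non-expired text in one statement via a correlated subquery. The names profile_text, body and city, and the data, are hypothetical. The current date is passed as a parameter instead of CURRENT_DATE only so the example is repeatable. Dates are stored as ISO 'YYYY-MM-DD' strings so that text order matches date order.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE profile (unique_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    -- Composite primary key: one text per business per date, and an index
    -- that serves "earliest row from today onward" lookups
    CREATE TABLE profile_text (
        unique_id INTEGER REFERENCES profile(unique_id),
        date TEXT NOT NULL,
        body TEXT NOT NULL,
        PRIMARY KEY (unique_id, date)
    );
""")
today = date(2024, 5, 10)  # fixed "current date" for a repeatable example
conn.executemany("INSERT INTO profile VALUES (?, ?, ?)",
                 [(1, "Cafe", "Springfield"), (2, "Bakery", "Springfield")])
conn.executemany("INSERT INTO profile_text VALUES (?, ?, ?)", [
    (1, (today - timedelta(days=1)).isoformat(), "expired"),
    (1, today.isoformat(), "Two-for-one coffee"),
    (1, (today + timedelta(days=3)).isoformat(), "Live music"),
    (2, (today + timedelta(days=1)).isoformat(), "Fresh rye"),
])

# One statement for the whole result list: the subquery picks each
# business's earliest text dated today or later
rows = conn.execute("""
    SELECT p.name,
           (SELECT t.body FROM profile_text t
            WHERE t.unique_id = p.unique_id AND t.date >= ?
            ORDER BY t.date LIMIT 1) AS next_text
    FROM profile p
    WHERE p.city = ?
    ORDER BY p.name
""", (today.isoformat(), "Springfield")).fetchall()
print(rows)  # [('Bakery', 'Fresh rye'), ('Cafe', 'Two-for-one coffee')]
```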