MySQL Database Schema Question - mysql

I need opinions on the best way to create a table or collection of tables to handle this problem. I'm designing a site with business profiles. The profile table contains the usual fields such as name, uniqueID, address, etc. The whole idea of the site is that it collects a small string of informative text. I want to allow clients to store one per date, up to 30 days in advance. The program will only show the information from the current date forward; expired dates are not shown.
The only way I can really see this being done is a table consisting of the uniqueID, date, and the informative block of text, but this creates pretty extensive queries. Eventually this table is going to be at least 20 times larger than the table of businesses in the first place as these businesses are going to be able to post up to 30 items in this table using their uniqueID.
Now, imagine the search page brings up a list of businesses in the area. It then has to query the new table for all of those IDs to get the block of information I want to show based on the date. I'm pretty sure it would be a rather intensive couple of queries just to show a simple block of text, but I imagine this is how status updates work for social networking sites in general? Does Facebook store updates in a table of updates tied to a user's ID number, or have they come up with a better way?
I'm just trying to gain a little more insight into DB design, so throw out any ideas you might have.

The only way I can really see this being done is a table consisting of the uniqueID, date, and the informative block of text...
Assuming you mean the profile uniqueID, and not a unique ID for the text table, you're correct.
As pascal said in his comment, you'd need a primary key on (uniqueID, date), so that a profile can have only one row of text for a given date.
If you want to retrieve the next text row for a person, your SQL query would have the following clauses:
WHERE UNIQUE_ID = PROFILE.UNIQUE_ID
AND DATE >= CURRENT_DATE
ORDER BY DATE
LIMIT 1
Since you have an index on uniqueID and date, this should be a fast query.
If you want to retrieve the next 5 texts for a particular person, you'd just have to make one change:
WHERE UNIQUE_ID = PROFILE.UNIQUE_ID
AND DATE >= CURRENT_DATE
ORDER BY DATE
LIMIT 5
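The table and query described above can be sketched end to end. This is a minimal illustration, not the poster's actual schema: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the table name profile_text, its column names, and the sample rows are all assumptions. The composite primary key is what enforces one text per profile per date and what makes the lookup cheap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("""
    CREATE TABLE profile_text (       -- hypothetical table name
        unique_id INTEGER NOT NULL,   -- the profile's uniqueID
        date      TEXT    NOT NULL,   -- ISO date, YYYY-MM-DD
        info      TEXT    NOT NULL,   -- the informative block of text
        PRIMARY KEY (unique_id, date) -- one text per profile per date
    )
""")
conn.executemany(
    "INSERT INTO profile_text VALUES (?, ?, ?)",
    [
        (1, "2030-01-01", "New Year special"),
        (1, "2030-01-02", "Half-price lunch"),
        (1, "2030-01-03", "Live music"),
        (2, "2030-01-02", "Closed for inventory"),
    ],
)

def upcoming_texts(conn, unique_id, today, limit=1):
    """Return up to `limit` texts for one profile, from `today` onward."""
    return conn.execute(
        """SELECT date, info FROM profile_text
           WHERE unique_id = ? AND date >= ?
           ORDER BY date
           LIMIT ?""",
        (unique_id, today, limit),
    ).fetchall()

next_one = upcoming_texts(conn, 1, "2030-01-02")
next_two = upcoming_texts(conn, 1, "2030-01-02", limit=2)
```

Passing the date in as a parameter (rather than calling CURRENT_DATE inside the query) is only a convenience for the sketch, so the result does not depend on the day it is run.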

Related

MySQL: one entry multiple strings vs multiple entries

At the moment I'm working on a complex database. I have one table into which I'd like to insert data every day from dozens of users.
Example:
There are 200 recipes for menus (each column a recipe) and 200+ users. Every user uses a recipe between 1 and 3 times a day. In addition, I want to track the ingredients and their amounts, plus further data such as an evaluation of how difficult the cooking was, how nice it tasted, and so on.
First idea was to make one entry per usage:
[user id, timestamp, recipe#1, recipe#2, ... , recipe#200] // daily up to 3 entries per user
Details of the recipe would be in an array. I was wondering if I could make that easier. I want to synchronise the user's app and the database once per week. So would it be easier to make one entry for the week and differentiate the recipe usage with a timestamp in that array?
Second idea:
[user id, recipe#1,...,recipe#200]
=>'1','"details","timestamp"','"details","timestamp"','"details","timestamp"'
// weekly one entry per user
If I want to show charts with stats about the recipes, idea 1 would be easy, but depending on the users and their entries, my database would grow almost exponentially. Would it be better to go with idea 2, reducing it to one entry per week and differentiating with timestamps inside those arrays?
I also don't like the idea of maintaining a structure like this. Adding more recipes wouldn't be very dynamic. Basically, users are growing, recipes are growing, details are dynamic, and timestamps keep getting inserted without end.
At the end of the day I want to display stats and behaviour by user and by time, sortable by every possible category, which gives me a headache :D
Always go with multiple entries if you need to run queries on it for comparing and calculating.
Could you please share your whole database structure? I would go with more related tables. Preferably, don't put arrays into the database; think of the database itself as the array, and let it collect the data in a better way.
I would go with something like cooking_id, user_id, recipe_id, difficulty, taste, timestamp, and create a new row each time a recipe is used. Then in the recipe table you have something like: recipe_id, name, details (maybe ingredients). It depends on how you want to measure things.
If you need to measure ingredients, you could make ingredients a separate table and create a related table for the recipe. For ingredients: ingredients_id, name. And for the related table: ingredients_id, recipe_id, grams (here you can make ingredients_id and recipe_id together the primary key).
It has been some time since I last worked on or studied database structure in depth, but I hope this gives you some useful advice :)
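As a rough sketch of the layout suggested in this answer, the following uses Python's sqlite3 in place of MySQL. The table names (recipe, ingredient, recipe_ingredient, cooking), the cooked_at column, and all sample data are assumptions made for illustration. It shows why one row per use makes statistics simple: counts, averages, and ingredient totals become plain joins and aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE recipe (recipe_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ingredient (ingredient_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Link table: how many grams of each ingredient a recipe needs.
    CREATE TABLE recipe_ingredient (
        ingredient_id INTEGER NOT NULL REFERENCES ingredient(ingredient_id),
        recipe_id     INTEGER NOT NULL REFERENCES recipe(recipe_id),
        grams         REAL    NOT NULL,
        PRIMARY KEY (ingredient_id, recipe_id)
    );
    -- One row per use of a recipe by a user.
    CREATE TABLE cooking (
        cooking_id INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        recipe_id  INTEGER NOT NULL REFERENCES recipe(recipe_id),
        difficulty INTEGER,
        taste      INTEGER,
        cooked_at  TEXT NOT NULL
    );
    INSERT INTO recipe VALUES (1, 'Pancakes'), (2, 'Omelette');
    INSERT INTO ingredient VALUES (1, 'Flour'), (2, 'Egg');
    INSERT INTO recipe_ingredient VALUES (1, 1, 200), (2, 1, 100), (2, 2, 150);
    INSERT INTO cooking (user_id, recipe_id, difficulty, taste, cooked_at) VALUES
        (10, 1, 2, 4, '2030-01-01 08:00:00'),
        (10, 2, 1, 5, '2030-01-01 12:00:00'),
        (11, 1, 3, 2, '2030-01-02 09:00:00');
""")

# Usage count and average taste per recipe.
recipe_stats = conn.execute("""
    SELECT r.name, COUNT(*), AVG(c.taste)
    FROM cooking c JOIN recipe r ON r.recipe_id = c.recipe_id
    GROUP BY r.recipe_id
    ORDER BY r.recipe_id
""").fetchall()

# Total grams of egg used across every cooking event.
egg_grams = conn.execute("""
    SELECT SUM(ri.grams)
    FROM cooking c
    JOIN recipe_ingredient ri ON ri.recipe_id = c.recipe_id
    JOIN ingredient i ON i.ingredient_id = ri.ingredient_id
    WHERE i.name = 'Egg'
""").fetchone()[0]
```

Adding a 201st recipe is then an INSERT into recipe rather than a new column, which addresses the maintenance concern raised in the question.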

Working out users points - update vs select

I have users who earn points by taking part in various activities on the website, and they can then spend these points on whatever they like. The way I have it set up at the moment is with two tables:
tbl_users_achievements and tbl_users_purchased_items
I have these two tables to track what the users have done and what they have bought (Obviously!)
But instead of having a column in my users table called 'user_points', I have decided to display their points by doing a SELECT on all achievements and summing the points they have earnt. I then do another SELECT for how many points they have spent.
I thought it might have been better to have a column to store their points and, when they buy or win something, do an UPDATE on that column for the user. But that seemed like multiple places to manage: I would have to insert a new row for the transaction and then update the column, whereas if I use a query to work out total won minus total spent, I only have to insert the row and do no update. The problem then comes down to the performance of running that calculation in a query.
So which solution would you go with and why?
Have a column to store their points and do an update
Use a query to work out the users points they can spend and have no column
Your current model is logically the right one - a key aspect for RDBMS normalization is not to repeat any information, and keeping an explicit "this customer has x points" column repeats data.
The benefits of this are obvious - you have less data manipulation code to write, and don't have to worry about what happens when you insert the transaction but can't update the users table.
The downsides are that you're running additional queries every time you show the customer profile; this can create a performance problem. The traditional response to that performance problem is to de-normalize, for instance by keeping a calculated total against the user table.
Only do that if that's absolutely, provably necessary.
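A minimal sketch of the normalized approach, computing the balance on demand, might look like this. Python's sqlite3 stands in for MySQL; the column names (user_id, points, cost), the index names, and the sample data are assumptions, since the question only names the two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE tbl_users_achievements (user_id INTEGER NOT NULL, points INTEGER NOT NULL);
    CREATE TABLE tbl_users_purchased_items (user_id INTEGER NOT NULL, cost INTEGER NOT NULL);
    -- Indexes on user_id keep the per-user sums cheap as the tables grow.
    CREATE INDEX idx_achievements_user ON tbl_users_achievements (user_id);
    CREATE INDEX idx_purchases_user ON tbl_users_purchased_items (user_id);
    INSERT INTO tbl_users_achievements VALUES (1, 50), (1, 30), (2, 10);
    INSERT INTO tbl_users_purchased_items VALUES (1, 20);
""")

def spendable_points(conn, user_id):
    """Points earned minus points spent, computed from the two log tables."""
    row = conn.execute(
        """SELECT
               (SELECT COALESCE(SUM(points), 0)
                  FROM tbl_users_achievements WHERE user_id = ?)
             - (SELECT COALESCE(SUM(cost), 0)
                  FROM tbl_users_purchased_items WHERE user_id = ?)""",
        (user_id, user_id),
    ).fetchone()
    return row[0]
```

COALESCE turns the NULL that SUM returns for a user with no rows into 0, so a brand-new user gets a balance of 0 instead of NULL.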
Myself, I would put the user points into a separate table keyed by user ID (or whatever identifies the user), and do updates to increment or decrement them as achievements are attained or points spent.
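If a stored balance is used, the risk mentioned earlier (the transaction row is inserted but the balance update fails) can be reduced by doing both writes in one database transaction. The sketch below is one possible way to do that, using Python's sqlite3 in place of MySQL; the user_points table, the cost column, the CHECK constraint, and the spend helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE tbl_users_purchased_items (user_id INTEGER NOT NULL, cost INTEGER NOT NULL);
    CREATE TABLE user_points (               -- hypothetical balance table
        user_id INTEGER PRIMARY KEY,
        points  INTEGER NOT NULL CHECK (points >= 0)
    );
    INSERT INTO user_points VALUES (1, 60);
""")

def spend(conn, user_id, cost):
    """Log a purchase and decrement the balance atomically.

    Returns False (and leaves both tables untouched) if the balance
    would go negative.
    """
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "INSERT INTO tbl_users_purchased_items VALUES (?, ?)",
                (user_id, cost),
            )
            conn.execute(
                "UPDATE user_points SET points = points - ? WHERE user_id = ?",
                (cost, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = spend(conn, 1, 20)    # affordable
second = spend(conn, 1, 100)  # would overdraw, so it is rolled back
balance = conn.execute("SELECT points FROM user_points WHERE user_id = 1").fetchone()[0]
purchases = conn.execute("SELECT COUNT(*) FROM tbl_users_purchased_items").fetchone()[0]
```

In MySQL the same idea needs a transactional storage engine such as InnoDB.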

Recommended Database Structure for User's Information

I am trying to figure out how I should go about building my database structure (tables) for user information on my website. The kinds of information that I will be storing (at this time anyways) are:
About Me
Birthday (January, 1, 1970)
Sex (Male/Female)
Interested In: (Male, Female, Both)
Relationship Status: (Single, In a Relationship, Engaged, Married)
Website: (mywebsite.com)
From: (Cupertino, California)
So this is the type of information I will be storing for now. My question basically is, should I have this be one table only? Or would it be better to split the information up depending on what it was (my users have a unique ID which would go along with each table of information, obviously). So I'm not sure if I should have a table exclusively for Birthdays with the columns: userID, Month, Day, Year; or what.
If a user only needs to store one piece of information for an attribute, then you don't need a separate table for it. For example, a user only has one birthday. The only reason you would need a separate Birthdays table would be if you wanted to store multiple birthdays for the same userid. Each of the attributes you've listed looks like it would be fine in one Users table.
As for splitting up Birthdays into the columns: userID, Month, Day, Year, it all depends on how you're going to use that information. Will you ever need to know just the Month, Day, or Year that a user's birthday falls on? If that's a common need, you might want to store them separately. It's usually not, so you probably just want to store it as a single Date value.
Note: You can take a look at the schema used by Stack Overflow by checking out the Data Explorer. They keep a similar collection of data in one Users table.
In the vast majority of cases, I've seen what you're asking being stored in one table - usually user or users.
Perhaps including a number of other elements too:
user id (unique)
registration date
status (live/expired/banned)
user hash
plus a variety of others...
Honestly - It's dependent on what you're building and how it's built, but my advice would be to start simple.
On your point about birthdays, just store the date in MySQL date format:
YYYY-MM-DD
That way, you can manipulate it in a variety of ways using MySQL functions.
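To illustrate why a single date value is enough, here is a small sketch that pulls the month and year back out at query time. It runs on Python's sqlite3, so it uses SQLite's strftime; in MySQL the corresponding calls would be MONTH(birthday) and YEAR(birthday). The users table and its sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute(
    "CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT NOT NULL, birthday TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "alice", "1970-01-01"),
        (2, "bob", "1985-07-14"),
        (3, "carol", "1992-01-23"),
    ],
)

# Everyone with a birthday in January, derived from the single date column.
january = conn.execute(
    "SELECT name FROM users WHERE strftime('%m', birthday) = '01' ORDER BY name"
).fetchall()

# Birth year as an integer, again without a separate Year column.
birth_years = conn.execute(
    "SELECT name, CAST(strftime('%Y', birthday) AS INTEGER) FROM users ORDER BY user_id"
).fetchall()
```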
Hope this helps.

Need help with a database design for Top 10

I am trying to come up with a database design to hold the "Top 10" results for some calculations that are being done. When all is said and done, there will be 3 "Top 10" categories, which I am fine with being separate tables. However, I need to be able to go back later and pull historical data about what was in the Top 10 at certain times, hence the need for a database. A flat file would work, but this has the potential to hold years' worth of data.
Now, it's been awhile since I have done anything serious with a database, other than something that had a couple of simple tables, so I am having some issues thinking through this design. If someone could help me with the design of it, I know enough MySQL to get the rest done.
So, in essence, I need to store: A group of 10 names, a % of the total points each name had, the rank they held in the Top 10 and a time associated with that Top 10 (So I can later query for that time)
I would think I need a table for the Top 10 with 11 columns: one for the ID and 10 for foreign keys into the 'Names' table, which holds every name ever used with a PK, Name, %, and Rank. This seems clunky to me. Does anyone else have a suggestion?
Edit: The 'Top 10' is associated with a specific set of data for 5-minute intervals, and each interval is completely independent of the previous or future intervals.
I don't recommend your solution, because then if you want to ask the database "How often has Joe been in the top 10," you have to write 10 queries of the form
SELECT Date FROM Top10 WHERE FirstPlace = 'joe'
SELECT Date FROM Top10 WHERE SecondPlace = 'joe'
...
Instead, how about a Rankings table, with fields:
id
Date
Person
Rank
Then if you want the Top 10 list for a certain date, the query is
SELECT * FROM Rankings WHERE Date = ...
and if you want to know someone's historical ranking, the query is
SELECT * FROM Rankings WHERE Person = ...
and if you want to know all the historical leaders, the query is
SELECT * FROM Rankings WHERE Rank = 1
The downside to this is that you might accidentally make two different people 8th place, and your database would allow the anomaly. But I have good news for you -- people might actually tie for 8th place, so you might actually want that to be possible!
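The Rankings table and its three queries can be tried out with a short sketch. It uses Python's sqlite3 as a stand-in for MySQL, with made-up sample data. The column names snapshot_time and place are assumptions replacing Date and Rank, partly because RANK became a reserved word in MySQL 8.0 and would need backticks there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("""
    CREATE TABLE rankings (
        id            INTEGER PRIMARY KEY,
        snapshot_time TEXT    NOT NULL,  -- "Date" in the answer
        person        TEXT    NOT NULL,
        place         INTEGER NOT NULL   -- "Rank" in the answer
    )
""")
conn.executemany(
    "INSERT INTO rankings (snapshot_time, person, place) VALUES (?, ?, ?)",
    [
        ("2030-01-01 12:00", "joe", 1),
        ("2030-01-01 12:00", "ann", 2),
        ("2030-01-01 12:05", "ann", 1),
        ("2030-01-01 12:05", "joe", 2),
    ],
)

# The list for one point in time.
top_at_noon = conn.execute(
    "SELECT person, place FROM rankings WHERE snapshot_time = ? ORDER BY place",
    ("2030-01-01 12:00",),
).fetchall()

# "How often has Joe been in the top 10?" is a single query, not ten.
joe_appearances = conn.execute(
    "SELECT COUNT(*) FROM rankings WHERE person = ?", ("joe",)
).fetchone()[0]

# All historical leaders.
leaders = conn.execute(
    "SELECT snapshot_time, person FROM rankings WHERE place = 1 ORDER BY snapshot_time"
).fetchall()
```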
I assume that your "Top 10" is a snapshot of the data at a certain time, and that your business logic takes one "every 5 minutes", so time is the parent entity in the table design:
top_10_history
th_id - the primary key
th_time - the time point when taking the snapshot data of "Top 10"
top_10_detail
td_th_id - the FK to top_10_history
td_name_id - the FK to name
td_percentage - the "%"
td_rank - the rank
If the order within the "Top 10" can be calculated from other columns in "top_10_detail", you don't need a column to store it. Otherwise, you need a column to persist it.
If you need more complicated queries such as "the Top 10 at 12:00 AM in the last 30 days", using individual columns for "day", "hour", and "minute" would be better for performance (with suitable indexes).
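The parent/child layout above could be exercised as follows. This is only a sketch on Python's sqlite3 rather than MySQL; the name table's columns, the uniqueness constraints, and the sample snapshots are assumptions beyond the column list given in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE name (name_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE top_10_history (
        th_id   INTEGER PRIMARY KEY,
        th_time TEXT NOT NULL UNIQUE      -- one snapshot per point in time
    );
    CREATE TABLE top_10_detail (
        td_th_id      INTEGER NOT NULL REFERENCES top_10_history(th_id),
        td_name_id    INTEGER NOT NULL REFERENCES name(name_id),
        td_percentage REAL    NOT NULL,
        td_rank       INTEGER NOT NULL,
        PRIMARY KEY (td_th_id, td_name_id) -- a name appears once per snapshot
    );
    INSERT INTO name VALUES (1, 'joe'), (2, 'ann');
    INSERT INTO top_10_history VALUES (1, '2030-01-01 12:00'), (2, '2030-01-01 12:05');
    INSERT INTO top_10_detail VALUES
        (1, 1, 60.0, 1), (1, 2, 40.0, 2),
        (2, 2, 55.0, 1), (2, 1, 45.0, 2);
""")

def top_10_at(conn, when):
    """Return (rank, name, percentage) rows for the snapshot taken at `when`."""
    return conn.execute(
        """SELECT d.td_rank, n.name, d.td_percentage
           FROM top_10_history h
           JOIN top_10_detail d ON d.td_th_id = h.th_id
           JOIN name n ON n.name_id = d.td_name_id
           WHERE h.th_time = ?
           ORDER BY d.td_rank""",
        (when,),
    ).fetchall()

snapshot = top_10_at(conn, "2030-01-01 12:05")
```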

Where to store users visited pages?

I have a project with posts, for example.
The task is this: I must show each user the posts he last visited.
My solution: every time a user visits a topic that is new to him, I create a new record in the visits table.
The visits table has the following structure: id, user_id, post_id, last_visit.
The visits table now has ~14,000,000 records and is still growing every day.
Maybe my solution isn't optimal and there is another way to store user visits?
It's important to save every visit as a standalone record, because I also have a feature that selects and uses users' visits. And I can't purge this table, because the data could be needed a month or a year later. How could I optimize this situation?
Nope, you don't really have much choice other than to store your visit data in a table with columns for (at a bare minimum) user id, post id, and timestamp if you need to track the last time that each user visited each post.
I question whether you need an id field in that table, rather than using a composite key on (user_id, post_id), but I'd expect that to have a minor effect, provided that you already have a unique index on (user_id, post_id). (If you don't have an index on that pair of fields, adding one should improve query performance considerably and making it a unique index or composite key will protect against accidentally inserting duplicate records.)
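A sketch of the composite-key variant, with no surrogate id column, is shown below on Python's sqlite3. The record_visit helper and sample data are assumptions. SQLite's INSERT OR REPLACE is used to overwrite the existing row for a (user_id, post_id) pair; in MySQL the closest equivalent would be INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("""
    CREATE TABLE visits (
        user_id    INTEGER NOT NULL,
        post_id    INTEGER NOT NULL,
        last_visit TEXT    NOT NULL,
        PRIMARY KEY (user_id, post_id)  -- composite key, doubles as the index
    )
""")

def record_visit(conn, user_id, post_id, when):
    """Insert the visit, or overwrite last_visit if the pair already exists."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO visits (user_id, post_id, last_visit) VALUES (?, ?, ?)",
            (user_id, post_id, when),
        )

record_visit(conn, 1, 100, "2030-01-01 10:00:00")
record_visit(conn, 1, 101, "2030-01-01 11:00:00")
record_visit(conn, 1, 100, "2030-01-02 09:00:00")  # revisit: updates, no duplicate

recent = conn.execute(
    "SELECT post_id, last_visit FROM visits WHERE user_id = ? ORDER BY last_visit DESC",
    (1,),
).fetchall()
```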
If performance is still an issue despite proper indexing, you should be able to improve it a bit by segmenting the table into a collection of smaller tables, but segment it by user_id or post_id (rather than by date as previous answers have suggested). If you break it up by user or post id, then you will still be able to determine whether a given user has previously viewed a given post and, if so, on what date with only a single query. If you segment it by date, then that information will be spread across all tables and, in the worst-case scenario of a user who has never previously viewed a post (which I expect to be fairly common), you'll need to separately query each and every table before having a definitive answer.
As for whether to segment it by user id or by post id, that depends on whether you will more often be looking for all posts viewed by a user (segment by user_id to get them all in one query) or all users who have viewed a post (segment by post_id).
If it doesn't need to be long lasting, you could store it in session instead. If it does, you could either break the records apart by table, like say 1 per month, or you could only store the last 5-10 pages visited, and delete old ones as new ones come in. You could also change it to pages visited today, this week, etc.
If you do need all 14 million records, I would create another historical table to archive the visits that are not the most relevant for the day-to-day site operation.
At the end of the month (or week, or quarter, etc.), have some scheduled logic archive records beyond a certain cutoff point to the historical table, reducing the number of records in the "live" table. This should help increase query speed on the "live" table, since it would have fewer records in it.
If you do need to query all of the data, you can use both tables and have all of the data available to you.
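The archiving step could look roughly like the following, using Python's sqlite3 in place of MySQL. The visits_archive table name, the archive_before helper, the composite keys, and the sample rows are assumptions. Copy and delete run in one transaction so a failure part-way through cannot lose rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
    CREATE TABLE visits (
        user_id INTEGER NOT NULL, post_id INTEGER NOT NULL, last_visit TEXT NOT NULL,
        PRIMARY KEY (user_id, post_id)
    );
    CREATE TABLE visits_archive (           -- hypothetical historical table
        user_id INTEGER NOT NULL, post_id INTEGER NOT NULL, last_visit TEXT NOT NULL,
        PRIMARY KEY (user_id, post_id)
    );
    INSERT INTO visits VALUES
        (1, 100, '2029-11-15 10:00:00'),
        (1, 101, '2030-01-05 11:00:00'),
        (2, 100, '2029-12-20 09:00:00');
""")

def archive_before(conn, cutoff):
    """Move visits older than `cutoff` to the archive; return how many moved."""
    with conn:  # copy and delete succeed or fail together
        conn.execute(
            """INSERT OR REPLACE INTO visits_archive
               SELECT user_id, post_id, last_visit FROM visits WHERE last_visit < ?""",
            (cutoff,),
        )
        moved = conn.execute(
            "DELETE FROM visits WHERE last_visit < ?", (cutoff,)
        ).rowcount
    return moved

moved = archive_before(conn, "2030-01-01 00:00:00")
live_rows = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]

# Querying across both tables when the full history is needed.
full_history = conn.execute(
    """SELECT post_id, last_visit FROM visits WHERE user_id = ?
       UNION ALL
       SELECT post_id, last_visit FROM visits_archive WHERE user_id = ?
       ORDER BY last_visit DESC""",
    (1, 1),
).fetchall()
```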
You could delete the ones you don't need. If you only want to show the last 10 visited posts, then run
DELETE FROM visits WHERE user_id = ? AND id NOT IN (SELECT id FROM (SELECT id FROM visits WHERE user_id = ? ORDER BY last_visit DESC LIMIT 10) AS recent);
(I think that's the best way to do that query; can any MySQL guru tell me otherwise? The inner SELECT is wrapped in a derived table because MySQL rejects LIMIT directly inside an IN subquery and does not let a DELETE select from its own target table in a plain subquery. You can ORDER BY in DELETE, but its LIMIT only takes one parameter, so you can't do LIMIT 10, 100 there.)
after inserting/updating each new row, or every few days if you like.
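The pruning idea can be checked with a small sketch. It runs on Python's sqlite3 rather than MySQL and keeps the latest 3 rows instead of 10 to keep the sample data short; the keep_latest helper and the data are assumptions. The inner SELECT is wrapped in a derived table because MySQL has historically rejected LIMIT directly inside an IN subquery, as well as a subquery that reads the table being deleted from; SQLite accepts either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.execute("""
    CREATE TABLE visits (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        post_id    INTEGER NOT NULL,
        last_visit TEXT    NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO visits (user_id, post_id, last_visit) VALUES (?, ?, ?)",
    # User 1 visits posts 100..104 on consecutive days; user 2 has one visit.
    [(1, 100 + i, f"2030-01-{i + 1:02d} 10:00:00") for i in range(5)]
    + [(2, 100, "2030-01-01 09:00:00")],
)

def keep_latest(conn, user_id, keep):
    """Delete all of one user's visits except the `keep` most recent."""
    with conn:
        conn.execute(
            """DELETE FROM visits
               WHERE user_id = ?
                 AND id NOT IN (
                     SELECT id FROM (
                         SELECT id FROM visits
                         WHERE user_id = ?
                         ORDER BY last_visit DESC
                         LIMIT ?
                     ) AS recent
                 )""",
            (user_id, user_id, keep),
        )

keep_latest(conn, 1, 3)
remaining = [
    row[0]
    for row in conn.execute(
        "SELECT post_id FROM visits WHERE user_id = 1 ORDER BY last_visit DESC"
    )
]
other_user_rows = conn.execute(
    "SELECT COUNT(*) FROM visits WHERE user_id = 2"
).fetchone()[0]
```

Note that this only suits the case where old visits really are disposable, which the question says is not true for its data.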
A structure like (id, user_id, post_id, last_visit) for your visits table makes it appear as though you are saving all posts, not just the last post per topic. Don't you need a topic ID in there somewhere, so that you can determine what their last post per topic was and know which row to replace when they post in the same topic more than once?
Store the post_ids in $_SESSION; then, using MySQL's IN with a single SELECT query, you will be able to show his visited posts. All those IDs will be destroyed when the member closes his browser, but this is much faster than using the database.
Edit: sorry, I didn't notice that you must store those records in the database and use them months later. In that case I have no idea how to optimize it, but with 14 million records you should definitely use indexes.