Database design for time dependent fields - mysql

I am making a MySQL database and am fairly confident I know how to normalize it. However, there is an issue I am not sure how to deal with.
Say I have a table
users
----------
user_id primary key
some_field
some_field2
start_date
user_level
Now, user_level gives the user's level, which can be 1, 2, 3, 4 or 5, say. But as time passes the user may change levels. Obviously, if they change levels I can simply do an UPDATE on the users table. But I want to keep a historical record of the users' past levels.
For this reason, I am considering a new table called user_level_history
user_level_history
--------------
id autoincrement primary key
user_id
level_start_date
and then modify the users table:
users
----------
user_id primary key
some_field
some_field2
start_date
user_level_history_id
Then, to get the user's current level, I join on
users.user_level_history_id = user_level_history.id
And to get the user's history I can SELECT from user_level_history all rows with the user_id and order chronologically.
Is this the standard way to do this? I can't imagine I'm the first person to come across this problem.
One more point: I am imagining less than 5000 users. Would having many, many more users require a different solution?
Thanks in advance.

I think it could be designed like this:
Have a level table holding the level information: value (1, 2, 3, 4, 5), description, ...
Have an association table user_level_history containing user_id, level_id, level_start_date, ...
Have a foreign key from the user table to the level table, in the role of the user's active level.
You need a mechanism that inserts a row into the history table whenever a user's level changes.

No, you aren't the first. Querying temporal data is a common requirement, especially in data warehouse/data mining.
The relational data model doesn't have any native, built in support for storing or querying "temporal data".
A lot of work has been done; I have a book by C.J.Date et al. that covers the topic decently: "Temporal Data and the Relational Model". I've also come across several white papers.
One typical, reasonably simple approach to storing a "history" is to have a "current" table (like the one you already have) and then add a "history" table. Whenever a row is changed (inserted, updated, deleted) in the "current" table, you add a row to the "history" table, along with the date the row was changed. (You can store a copy of the pre-change row, or a copy of the post-change row, or both.)
With this approach, there's no need to add any columns to the "current" table.
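A minimal sketch of the "current" plus "history" idea, in Python with the standard-library sqlite3 module standing in for MySQL (trigger syntax differs slightly between the two). The history column names user_level and changed_at and the trigger names are assumptions made for illustration; this version stores the post-change row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    user_level INTEGER NOT NULL
);
CREATE TABLE user_level_history (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    user_level INTEGER NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Record the post-change level when a user is created...
CREATE TRIGGER users_level_insert AFTER INSERT ON users
BEGIN
    INSERT INTO user_level_history (user_id, user_level)
    VALUES (NEW.user_id, NEW.user_level);
END;
-- ...and whenever the level actually changes.
CREATE TRIGGER users_level_update AFTER UPDATE OF user_level ON users
WHEN OLD.user_level <> NEW.user_level
BEGIN
    INSERT INTO user_level_history (user_id, user_level)
    VALUES (NEW.user_id, NEW.user_level);
END;
""")

conn.execute("INSERT INTO users (user_id, user_level) VALUES (1, 1)")
conn.execute("UPDATE users SET user_level = 3 WHERE user_id = 1")

# The current level comes straight from users; no extra column is needed there.
current_level = conn.execute(
    "SELECT user_level FROM users WHERE user_id = 1").fetchone()[0]

# The history is read chronologically from the history table.
history = [row[0] for row in conn.execute(
    "SELECT user_level FROM user_level_history WHERE user_id = 1 ORDER BY id")]
```

The same inserts could be done from application code instead of triggers; the triggers just make it harder to forget.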

Related

Nightmare on deciding database schema

I am having the greatest nightmare deciding on a database schema! I recently signed my first freelance project:
It has user registration, and there are pretty substantial requirements for the user table, as follows:
name
password
email
phone
is_active
email_verified
phone_verified
is_admin
is_worker
is_verified
has_payment
last_login
created_at
Now I am very confused about whether to put everything in a single table or split things up, as I still need to add a few more fields like
token
otp ( may be in future )
otp_limit ( may be in future ) // rate limiting
And maybe something more in the future when there is an update. I am afraid that if a future update needs a new field, I won't know how to add it if everything is in a single table.
And if I split things up, will that cause performance issues? Most of the fields are moderately used in the web app.
Please help me decide; this is my first freelancing experience (and it's pretty tough and rough) :(
If two tables have the same PRIMARY KEY, they should (with few exceptions) be combined in the same table. So, one table.
As for adding columns for future expansion, don't. Do ALTER TABLE .. ADD COLUMN .. when new columns are needed.
Once you have more than a million rows, adding a column becomes invasive, so try to get most new columns added before then.
You mentioned payment. If there is only one payment, simply have a column(s) with the amount and/or date. Make them NULLable to indicate that it has not been paid yet. If there will be multiple payments, then have another table dedicated to "payments", with zero or more rows for the payments.
That NULL technique won't work for a "verified" flag; it does need a separate column.
is_worker, is_admin -- Consider a single column that is an ENUM or SET to provide boolean "attributes" for the user. Use SET if, for example, a user can be both a worker and an admin.
Each "entity" (users, payments, etc.) should be a database table. Relations between tables are 1:1 (which I argued against, above), 1:many (e.g., user_id in the Payments table), or many:many (with an extra table with 2 ids).
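To make the advice above concrete, here is a rough sketch using Python's sqlite3 module as a stand-in for MySQL. It keeps one users table, puts multiple payments in their own 1:many table, and adds a column later with ALTER TABLE rather than reserving it up front. The trimmed-down column list, the payments columns and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL UNIQUE
);
-- 1:many: zero or more payment rows per user.
CREATE TABLE payments (
    payment_id INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    amount     NUMERIC NOT NULL,
    paid_at    TEXT NOT NULL
);
""")
conn.execute("INSERT INTO users (user_id, name, email) VALUES (1, 'Ann', 'ann@example.com')")
conn.execute("INSERT INTO users (user_id, name, email) VALUES (2, 'Bob', 'bob@example.com')")
conn.execute("INSERT INTO payments (user_id, amount, paid_at) VALUES (1, 20, '2024-01-05')")
conn.execute("INSERT INTO payments (user_id, amount, paid_at) VALUES (1, 35, '2024-02-05')")

# Later, when the OTP feature actually arrives, add the column then.
conn.execute("ALTER TABLE users ADD COLUMN otp TEXT")

# Users with no payments still show up, with a count of 0.
payment_counts = dict(conn.execute("""
    SELECT u.name, COUNT(p.payment_id)
    FROM users u LEFT JOIN payments p ON p.user_id = u.user_id
    GROUP BY u.user_id
    ORDER BY u.user_id
"""))
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```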

Is it good to have a table with more rows or more tables with less rows in a database?

I am building a database for my application using MySQL. It contains 2 tables: one will hold user details and the other will hold all users' activities (say posts, comments, ...). I have 2 approaches for this problem:
Group all users' activities under one table (say useractivities).
Maintain a separate activities table for each user (say user1activity, user2activity, ...).
If we go with approach 1, query time grows as the number of users grows.
Approach 2 eats up the database. Which design will have lower time and space complexity?
For easier database maintenance, go with the first approach, because it lets you normalize the data easily. To manage the structure well, take care of the points below:
Put a proper index on the user_id field so that join queries return results quickly.
If one table accumulates a large number of records, create another table such as user_activities_archive to store old activities, and periodically move old records from user_activities to user_activities_archive.
You can create separate tables such as user_posts and user_comments instead of a single user_activities table, to split the data further and give each table its own structure. For example, the comment table can hold a replyto_id, and the user_post table might have a title field.
The second approach, creating tables for each user, has many limitations:
Joining with other tables becomes very hard.
Fetching all users' activity records in one query is not possible.
The number of tables grows with the user base of your application.
There are limits on the number of tables in a database.
Editing, updating or deleting user records becomes more complex.
If a user is not active (just registered), their separate table is useless.
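As a rough illustration of the first approach, the sketch below (Python's sqlite3 standing in for MySQL) uses one shared activities table with an index on user_id plus a periodic move of old rows into an archive table. The column names, the index name, the sample rows and the archive_older_than helper are all assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_activities (
    activity_id   INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    activity_type TEXT NOT NULL,   -- e.g. 'post' or 'comment'
    created_at    TEXT NOT NULL    -- ISO date, so string order matches date order
);
-- One index serves every user's lookups; no per-user tables are needed.
CREATE INDEX idx_user_activities_user ON user_activities (user_id, created_at);
CREATE TABLE user_activities_archive (
    activity_id   INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    activity_type TEXT NOT NULL,
    created_at    TEXT NOT NULL
);
""")
conn.executemany(
    "INSERT INTO user_activities (user_id, activity_type, created_at) VALUES (?, ?, ?)",
    [(1, "post", "2019-03-01"), (1, "comment", "2024-05-10"), (2, "post", "2024-06-01")],
)

def archive_older_than(cutoff):
    """Move rows older than cutoff into the archive table in one transaction."""
    with conn:
        conn.execute(
            "INSERT INTO user_activities_archive "
            "SELECT * FROM user_activities WHERE created_at < ?", (cutoff,))
        conn.execute("DELETE FROM user_activities WHERE created_at < ?", (cutoff,))

archive_older_than("2020-01-01")

live = conn.execute(
    "SELECT activity_type FROM user_activities WHERE user_id = ? ORDER BY created_at",
    (1,)).fetchall()
archived = conn.execute("SELECT COUNT(*) FROM user_activities_archive").fetchone()[0]
```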
As juergen d mentioned in the comment, approach 2 should not be used.
However, I would consider splitting useractivities into different tables if the possible user activities differ from each other, to avoid unnecessary columns.
Example: a comment table with information about who made the comment (foreign key to the user table) and the comment itself, plus a foreign key to the user activity on which the comment was made.
The comment column in the above table does not make sense for, say, just a like of a post, so I would create a different table for likes.

DB design for one-to-one single column table

I'm unsure the best route to take for this example:
A table that holds information for a job; salary, dates of employment etc. The field I am wondering how best to store is 'job_title'.
Job title is going to be used as part of an auto-complete field so I'll be using a query to fetch results.
The same job title will be used by multiple jobs in the DB.
Job title is going to be a large part of many queries in the application.
A single job only ever has one title.
1. Should I have 2 tables, job and job_title, with the job table referencing the job_title table for its name?
2. Should I have 2 tables, job and job_title, but store the title as a direct value in job, with job_title just storing a list of all preexisting values (somewhat redundant)?
3 . Or should I not use a reference table at all / other suggestion.
What is your choice of design in this situation, and how would it change in a one to many design?
This is an example, the actual design is much larger however I think this well conveys the issue.
Update, To clarify:
A User (outside scope of question) has many Jobs; a job (start/end date, {job title}) has a title; a title has a name (e.g. 'Web Developer').
Your option 1 is the best design choice. Create the two tables along these lines:
jobs (job_id PK, title_id FK not null, start_date, end_date, ...)
job_titles (title_id PK, title)
The PKs should have clustered indexes; jobs.title_id and job_titles.title should have nonclustered or secondary indexes; job_titles.title should have a unique constraint.
This relationship can be modeled as 1-to-1 or 1-to-many (one title, many jobs). To enforce 1-to-1 modeling, apply a unique constraint to jobs.title_id. However, you should not model this as a 1-to-1 relationship, because it's not. You even say so yourself: "The same job title will be used by multiple jobs in the DB" and "A single job only ever has one title." An entry in the jobs table represents a certain position held by a certain user during a certain period of time. Because this is a 1-to-many relationship, a separate table is the correct way to model the data.
Here's a simple example of why this is so. Your company only has one CEO, but what happens if the current one steps down and the board appoints a new one? You'll have two entries in jobs which both reference the same title, even though there's only one CEO "position" and the two users' job date ranges don't overlap. If you enforce a 1-to-1 relationship, modeling this data is impossible.
Why these particular indexes and constraints?
The ID columns are PKs and clustered indexes for hopefully obvious reasons; you use these for joins
jobs.title_id is an FK for hopefully obvious data integrity reasons
jobs.title_id is not null because every job should have a title
jobs.title_id needs an index in order to speed up joins
job_titles.title has an index because you've indicated you'll be querying based on this column (though I wouldn't query in such a fashion, especially since you've said there will be many titles; see below)
job_titles.title has a unique constraint because there's no reason to have duplicates of the same title. You can (and will) have multiple jobs with the same title, but you don't need two entries for "CEO" in job_titles. Enforcing this uniqueness will preserve data integrity useful for reporting purposes (e.g. plot the productivity of IT's web division based on how many "web developer" jobs are filled)
Remarks:
Job title is going to be used as part of an auto-complete field so I'll be using a query to fetch results.
As I mentioned before, use key-value pairs here. Fetch a list of them into memory in your app, and query that list for your autocomplete values. Then send the ID off to the DB for your actual SQL query. The queries will perform better that way; even with indexes, searching integers is generally quicker than searching strings.
You've said that titles will be user created. Put some input sanitation and validation process in place, because you don't want redundant entries like "WEB DEVELOPER", "web developer", "web developer", etc. Validation should occur at both the application and DB levels; the unique constraint is part (but not all) of this. Prodigitalson's remark about separate machine and display columns is related to this issue.
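Here is a hedged sketch of how the pieces above might fit together, using Python's sqlite3 in place of MySQL: the two tables, the unique constraint on the title, application-level normalization before insert, and an in-memory list for autocomplete. The helpers normalize_title, get_or_create_title and autocomplete are made up for this example, and INSERT OR IGNORE is SQLite's spelling (MySQL uses INSERT IGNORE).

```python
import sqlite3

def normalize_title(raw):
    """Collapse whitespace and lower-case so near-duplicates compare equal."""
    return " ".join(raw.split()).lower()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_titles (
    title_id INTEGER PRIMARY KEY,
    title    TEXT NOT NULL UNIQUE      -- DB-level guard against duplicates
);
CREATE TABLE jobs (
    job_id     INTEGER PRIMARY KEY,
    title_id   INTEGER NOT NULL REFERENCES job_titles(title_id),
    start_date TEXT,
    end_date   TEXT
);
CREATE INDEX idx_jobs_title ON jobs (title_id);   -- speeds up joins on title
""")

def get_or_create_title(raw):
    """Return the title_id for a user-entered title, inserting it if new."""
    title = normalize_title(raw)
    conn.execute("INSERT OR IGNORE INTO job_titles (title) VALUES (?)", (title,))
    return conn.execute(
        "SELECT title_id FROM job_titles WHERE title = ?", (title,)).fetchone()[0]

ids = [get_or_create_title(t)
       for t in ("WEB DEVELOPER", "web developer", "web  developer", "CEO")]

# Load the key-value pairs once; autocomplete then runs in application memory
# and only the integer id is sent back to the database.
titles = dict(conn.execute("SELECT title, title_id FROM job_titles"))

def autocomplete(prefix):
    p = normalize_title(prefix)
    return sorted(t for t in titles if t.startswith(p))
```

For example, autocomplete("web") returns the single normalized entry, and its id from titles is what a later jobs query would filter on.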
Edited after getting the clarification:
A table like this is enough - just add the job_title_id column as foreign key in the main member table
---- "job_title" table ---- (store the job_title)
1. pk - job_title_id
2. unique - job_title_name <- index this
__ original answer __
You need to clarify what the job_title is going to represent:
a person that holds this position?
the division/department that has this position?
a certain set of attributes? e.g. Sales always has a commission
or just a string of what it was called?
From what I read so far, you just need the "job_title" as some sort of dimension - make the id for it, make the string searchable - and that's it
example
---- "employee" table ---- (store employee info)
1. pk - employee_id
2. fk - job_title_id
3. other attribute (contract_start_date, salary, sex, ... so on ...)
---- "job_title" table ---- (store the job_title)
1. pk - job_title_id
2. unique - job_title_name <- index this
---- "employee_job_title_history" table ---- (We can check the employee job history here)
1. pk - employee_id
2. pk - job_title_id
3. pk - is_effective
4. effective_date [edited: this needs to be PK too - thanks to KM.]
I still think you need to provide us with a use case; I believe that would greatly improve everyone's understanding.
If there are only a few fixed job titles, you might want to use an enum in your database.
See http://dev.mysql.com/doc/refman/5.0/en/enum.html
If that's not supported by your version of mysql simply encode it with a numerical index and resolve it to a human readable form in your queries.

Users table - one table or two?

I want to store user details in the database, with columns like first name, last name, username, password, email, cellphone number, activation codes, gender, birthday, occupation, and a few more. Is it good to store all of these in the same table, or should I split them between two tables, users and profile?
If those are attributes of a User (and they are 1-1) then they belong in the user table.
You would only normally split if there were many columns; then you might create another table in a 1-1 mapping.
Another table is obviously required if there are many profile rows per user.
One table should be good enough.
Two tables, or more generally vertical partitioning, come in when you want to scale out. You split your table into multiple tables, where the partitioning criterion is usually usage: the attributes most commonly used together are housed in one table and the others in another table.
One table should be okay. I'd be storing a hash in the password column.
I suggest you read the Wikipedia article about database normalization.
It describes the different possibilities and the pros and cons of each. It really depends on what else you want to store and the relationship between the user and its properties.
Ideally one table should be used. If the number of columns becomes harder to manage only then you should move them to another table. In that case, ideally, the two tables should have a one-one relationship which you can easily establish by setting the foreign key in the related table as the primary key:
User
-------------------------------
UserID INT NOT NULL PRIMARY KEY
UserProfile
-------------------------------------------------------
UserID INT NOT NULL PRIMARY KEY REFERENCES User(UserID)
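A small runnable sketch of that one-to-one setup, with Python's sqlite3 standing in for MySQL. The extra columns UserName and Occupation and the try_insert helper are assumptions for illustration. Because UserProfile.UserID is both the primary key and a foreign key, a user can have at most one profile row and a profile cannot exist without its user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE User (
    UserID   INTEGER NOT NULL PRIMARY KEY,
    UserName TEXT NOT NULL
);
CREATE TABLE UserProfile (
    UserID     INTEGER NOT NULL PRIMARY KEY REFERENCES User(UserID),
    Occupation TEXT
);
""")
conn.execute("INSERT INTO User VALUES (1, 'ann')")
conn.execute("INSERT INTO UserProfile VALUES (1, 'engineer')")

def try_insert(sql):
    """Return True if the insert succeeds, False on a constraint violation."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Rejected by the primary key: user 1 already has a profile row.
second_profile_ok = try_insert("INSERT INTO UserProfile VALUES (1, 'chef')")
# Rejected by the foreign key: there is no user 99.
orphan_profile_ok = try_insert("INSERT INTO UserProfile VALUES (99, 'pilot')")
```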
It depends on what kind of application it is.
For an enterprise application where the users are also the employees, I would suggest two tables:
tbl_UserPersonallInformation (contains the personal information like name, address, email, ...)
tbl_UserSystemInformation (contains other information like Title, JoinedTheCompanyOn, LeftTheCompanyOn)
In systems such as document management or project information management, this might be necessary.
For example, in a company, employees might leave and rejoin after a few years, possibly with a different job title. The employee has some activities and records under the old title and will have more under the new one. So the system should record under which title (authority) each action was done.

Different database tables joining on single table

So imagine you have multiple tables in your database, each with its own structure and each with a PRIMARY KEY of its own.
Now you want to have a Favorites table so that users can add items as favorites. Since there are multiple tables the first thing that comes in mind is to create one Favorites table per table:
Say you have a table called Posts with PRIMARY KEY (post_id) and you create a Post_Favorites with PRIMARY KEY (user_id, post_id)
This would probably be the simplest solution, but could it be possible to have one Favorites table joining across multiple tables?
I've thought of the following as a possible solution:
Create a new table called Master with primary key (master_id). Add triggers on all tables in your database on insert, to generate a new master_id and write it alongside the row in your table. Let's also say that the Master table records where each master_id has been used (in which table).
Now you can have one Favorites table with PRIMARY KEY (user_id, master_id)
You can select from the Favorites table, join with each individual table on the master_id and get the favorites per table. But would it be possible to get all the favorites with one query (maybe not a query, but a stored procedure)?
Do you think that this is a stupid approach? Since you will perform one query per table what are you gaining by having a single table?
What are your thoughts on the matter?
One way would be to sub-type all possible tables to a generic super-type (Entity) and then link user preferences to that super-type.
I think you're on the right track, but a table-based inheritance approach would be great here:
Create a table master_ids, with just one column: an int-identity primary key field called master_id.
On your other tables, (users as an example), change the user_id column from being an int-identity primary key to being just an int primary key. Next, make user_id a foreign key to master_ids.master_id.
This largely preserves data integrity. The only place you can trip up is if you have a master_id = 1, and with a user_id = 1 and a post_id = 1. For a given master_id, you should have only one entry across all tables. In this scenario you have no way of knowing whether master_id 1 refers to the user or to the post. A way to make sure this doesn't happen is to add a second column to the master_ids table, a type_id column. Type_id 1 can refer to users, type_id 2 can refer to posts, etc.. Then you are pretty much good.
Code "gymnastics" may be a bit necessary for inserts. If you're using a good ORM, it shouldn't be a problem. If not, stored procs for inserts are the way to go. But you're having your cake and eating it too.
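A minimal sketch of the master_ids idea with the extra type_id column, using Python's sqlite3 in place of MySQL. The name and title columns, the type codes and the helpers new_master_id and type_of are assumptions for the example. Every child row takes its key from master_ids, so a given id can belong to only one kind of thing, and the insert "gymnastics" reduce to allocating the master row first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE master_ids (
    master_id INTEGER PRIMARY KEY AUTOINCREMENT,
    type_id   INTEGER NOT NULL            -- 1 = user, 2 = post
);
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY REFERENCES master_ids(master_id),
    name    TEXT NOT NULL
);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY REFERENCES master_ids(master_id),
    title   TEXT NOT NULL
);
""")
TYPE_USER, TYPE_POST = 1, 2

def new_master_id(type_id):
    """Allocate a shared id and remember which kind of row it belongs to."""
    cur = conn.execute("INSERT INTO master_ids (type_id) VALUES (?)", (type_id,))
    return cur.lastrowid

# Child rows reuse the id handed out by master_ids instead of generating their own.
user_id = new_master_id(TYPE_USER)
conn.execute("INSERT INTO users (user_id, name) VALUES (?, ?)", (user_id, "ann"))
post_id = new_master_id(TYPE_POST)
conn.execute("INSERT INTO posts (post_id, title) VALUES (?, ?)", (post_id, "Hello"))

def type_of(master_id):
    """Look up which table a master_id refers to."""
    return conn.execute(
        "SELECT type_id FROM master_ids WHERE master_id = ?", (master_id,)).fetchone()[0]
```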
I'm not sure I really understand the alternative you propose.
But in general, when given the choice of 1) "more tables" or 2) "a mega-table supported by a bunch of fancy code work" ..your interests are best served by more tables without the code gymnastics.
A red flag was "Add triggers on all tables in your database": each trigger firing is a performance hit of its own.
The database designers have built in all kinds of technology to optimize tables/indexes, much of it behind the scenes without you knowing it. Just sit back and enjoy the ride.
Try these for inspiration Database Answers ..no affiliation to me.
An alternative to your approach might be to have the favorites table as user_id, object_id, object_type. When inserting into the favorites table, just insert the type of the favorite. However, I don't see a simple query being able to work with your approach or mine. One way to go about it might be to use UNION to get one combined result set and then identify what type of record each row is based on the type. Another thing you can do is turn the UNION query into a MySQL VIEW and simply query that VIEW.
The benefit of using a single table for favorites is simplicity, though some might consider it to be against the database normalization rules. On the upside, you don't have to create so many favorites tables, and you can add anything to favorites easily by just coming up with a new object_type identifier.
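The user_id, object_id, object_type layout with a UNION view could look roughly like this, sketched with Python's sqlite3 rather than MySQL. The photos table, the favorite_items view name, the label column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts  (post_id  INTEGER PRIMARY KEY, title   TEXT NOT NULL);
CREATE TABLE photos (photo_id INTEGER PRIMARY KEY, caption TEXT NOT NULL);
CREATE TABLE favorites (
    user_id     INTEGER NOT NULL,
    object_id   INTEGER NOT NULL,
    object_type TEXT NOT NULL,           -- 'post' or 'photo'
    PRIMARY KEY (user_id, object_type, object_id)
);
-- One branch per favoritable table; object_type picks the branch.
CREATE VIEW favorite_items AS
    SELECT f.user_id, f.object_type, f.object_id, p.title AS label
    FROM favorites f
    JOIN posts p ON f.object_type = 'post' AND p.post_id = f.object_id
    UNION ALL
    SELECT f.user_id, f.object_type, f.object_id, ph.caption AS label
    FROM favorites f
    JOIN photos ph ON f.object_type = 'photo' AND ph.photo_id = f.object_id;

INSERT INTO posts  VALUES (1, 'Hello world');
INSERT INTO photos VALUES (1, 'Sunset');
INSERT INTO favorites VALUES (7, 1, 'post'), (7, 1, 'photo');
""")

# All of user 7's favorites, across both tables, in one query.
rows = conn.execute(
    "SELECT object_type, label FROM favorite_items "
    "WHERE user_id = 7 ORDER BY object_type").fetchall()
```

Note that nothing here can be declared as a foreign key from favorites to posts or photos, which is the normalization trade-off mentioned above.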
It sounds like you have an is-a type relationship that needs to be modeled. All of the items that can be favourited are a type of "item". It sounds like you are on the right track, but I wouldn't use triggers. If I have understood correctly, the right answer could be to pull all the common fields into a single table called items (master is a poor name; master of what?). This should include all the common data needed when you fetch a user's favourite items: I'd expect fields like item_id (primary key), item_type and human_readable_name, and maybe some metadata about when the item was created, modified, etc. Each of your specific item types would have its own table containing data specific to that item type, with an item_id field that has a foreign key relationship to the items table. Then you'd wrap each item type in its own insertion, update and selection SPs (e.g. InsertItemCheese, UpdateItemMonkey, SelectItemCarKeys). The favourites table would then work as you describe, but you only need to select from the items table. If your app needs the specific data for each item type, it would have to be queried for each item (caching is your friend here).
If MySQL supports SPs with multiple result sets you could write one that outputs all the items as a result set, then a result set for each item type if you need all the specific item data in one go. For most cases I would not expect you to need all the data all the time.
Keep in mind that not EVERY use of a PK column needs a constraint. For example a logging table. Even though a logging table has a copy of the PK column from the table being logged, you can't build a constraint.
What would be the worst possible case? You insert a record for Oprah's TV show into the favorites table and then next year you delete the Oprah Show from the list of TV shows but don't delete that ID from the Favorites table. Will that break anything? Probably not. When you join favorites to TV shows, that record will fall out of the result set.
There are a couple of ways to share values for PK's. Oracle has the advantage of sequences. If you don't have those you can add a "Step" to your Autonumber fields. There's always a risk though.
Say you think you'll never have more than 10 tables of "things which could be favored". Then start your PKs at 0 for the first table and increment by 10, at 1 for the second table and increment by 10, at 2 for the third, and so on. That will guarantee that all the values will be unique across those 10 tables. The risk is that a future requirement will add table 11. You can always 'pad' your guesstimate
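The arithmetic behind that scheme can be checked with a few lines of plain Python. The interleaved_ids function is a made-up helper that lists the first few keys a table would receive, not a database feature.

```python
def interleaved_ids(table_index, step=10, count=5):
    """First `count` keys for a table whose keys start at table_index and grow by step."""
    return [table_index + step * n for n in range(count)]

# Ten tables (indexes 0..9) with step 10 never share a key.
ids_by_table = {k: interleaved_ids(k) for k in range(10)}
all_ids = [i for ids in ids_by_table.values() for i in ids]
no_collisions = len(all_ids) == len(set(all_ids))

# An 11th table (index 10) with the same step overlaps with table 0.
eleventh = interleaved_ids(10)
collides = bool(set(eleventh) & set(ids_by_table[0]))
```

Table k only ever produces values congruent to k modulo the step, which is why the keys stay disjoint until the number of tables exceeds the step.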