How to keep data history in mysql? - mysql

I have seen many threads about keeping a history of records in MySQL, but I'm not sure they fit my case.
I'm developing an application form with a lot of user information. So far, I have had to normalize it into 12 tables. Most of them have 1:1 relations, but I thought keeping them separate would be useful in the future:
User (Id, Fullname, Username, Email, CreatedDate, UpdatedDate, ...)
Family (Id, UserId, Name, Relation, Job, ...)
Address (Id, UserId, Road, District, Province, ...)
...
Once clients have filled in all the fields in the form, they have two options: save it as a draft, or confirm and send the application, after which they can't change it anymore.
I have done some research, and there are many ways to do it. For example, I could duplicate all the tables and add an extra field, VersionId. However, given the large number of tables, I don't think that's a good idea.
So my idea is to add VersionId to each existing table. When the user saves the form as a draft, I would store the information without touching the VersionId field, and I would increase VersionId by 1 whenever the user confirms submitting the application.
Any suggestions would be welcome.

Add a status column to each table. It can have the values draft, current, and history.
When the user makes a change to their data, create a new row with status = draft. As they edit, you modify that row. When they confirm the changes, you set the old current row's status to history, and set the draft row's status to current.
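The draft/current/history flow described above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, a cut-down address table, and hypothetical helper names (save_draft, confirm) that are not from the question.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; only one column of the
# real Address table is modelled here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE address (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        road    TEXT,
        status  TEXT NOT NULL CHECK (status IN ('draft', 'current', 'history'))
    )
""")


def save_draft(conn, user_id, road):
    """Overwrite the user's draft row, creating it if none exists yet."""
    with conn:
        cur = conn.execute(
            "UPDATE address SET road = ? WHERE user_id = ? AND status = 'draft'",
            (road, user_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO address (user_id, road, status) VALUES (?, ?, 'draft')",
                (user_id, road),
            )


def confirm(conn, user_id):
    """Retire the current row to history and promote the draft to current."""
    with conn:  # one transaction: both updates are committed together
        has_draft = conn.execute(
            "SELECT 1 FROM address WHERE user_id = ? AND status = 'draft'",
            (user_id,),
        ).fetchone()
        if has_draft is None:
            return False  # nothing to confirm, leave the current row alone
        conn.execute(
            "UPDATE address SET status = 'history' "
            "WHERE user_id = ? AND status = 'current'",
            (user_id,),
        )
        conn.execute(
            "UPDATE address SET status = 'current' "
            "WHERE user_id = ? AND status = 'draft'",
            (user_id,),
        )
        return True


save_draft(conn, 1, "Old Road")
confirm(conn, 1)
save_draft(conn, 1, "New Raod")
save_draft(conn, 1, "New Road")  # further edits reuse the same draft row
confirm(conn, 1)

rows = conn.execute(
    "SELECT road, status FROM address WHERE user_id = 1 ORDER BY id"
).fetchall()
```

Reading the live data is then a matter of filtering on status = 'current', and the history rows stay available for auditing.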

I think it mainly depends on the business requirement. It doesn't sound like there would be many updates for the same user, so I am assuming performance is not a concern, but you may want to think about it anyway, since an insert in this case is cheaper than an update.
So the question to ask yourself is: does the business require you to keep the history or not? For example, if user A fills out the form and later updates some of the information (say, he has a new address), do you want to be able to know what his previous addresses were? If you do, then you have to store them (I would add the date of the change in addition to the version id).

Related

Partitioning or not a table on mysql

I know that there is not a "good" answer to my question, and it is opinion based. But since I am now learning those things on my own, I need advice.
I have a "Customer" table in MySQL. It has columns for customer information such as name, surname, date of birth, address, and so on.
Each customer has his own credentials (username, password).
Now my question is: is it better to keep the credentials in the "customer" table, or does it make sense to create a separate table, in order to protect these credentials and also keep track of how they change over time, without wasting space repeating all the other customer info?
You need to answer some questions about your data:
Do the columns change? People change names, addresses, and so on.
The credentials will change, at least the password.
What sort of history do you need?
My recommendation would be different sets of tables for different purposes:
One table that defines the customer id and whatever other immutable information there is (perhaps the date of becoming a customer and related information).
One or more tables with PII (personally identifiable information). You want to keep PII separate for regulatory and privacy reasons.
Tables for history. How you do this depends on your data model and what you need. A simple method is a single archive table per table in your data model. However, I might recommend type-2 tables (i.e. those having version effective and end dates).
Separate tables for credentials. These are even more sensitive than PII and you will want to control access.
Remember never to store clear-text passwords. Often you also want to keep a history of passwords to prevent users from reusing a previous one.
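As a rough sketch of the type-2 idea mentioned above (rows carrying an effective date and an end date), here is one way it might look. The table name customer_address, the column names valid_from and valid_to, and the helper functions are assumptions made for illustration, and sqlite3 is used in place of MySQL so the example runs on its own.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer_address (
        customer_id INTEGER NOT NULL,
        address     TEXT    NOT NULL,
        valid_from  TEXT    NOT NULL,  -- ISO date this version took effect
        valid_to    TEXT               -- NULL marks the version still in force
    )
""")


def change_address(conn, customer_id, address, as_of):
    """Close the open version (if any) and insert the new one."""
    with conn:  # both statements commit together
        conn.execute(
            "UPDATE customer_address SET valid_to = ? "
            "WHERE customer_id = ? AND valid_to IS NULL",
            (as_of, customer_id),
        )
        conn.execute(
            "INSERT INTO customer_address "
            "(customer_id, address, valid_from, valid_to) VALUES (?, ?, ?, NULL)",
            (customer_id, address, as_of),
        )


def address_on(conn, customer_id, day):
    """Return the address that was in force on the given ISO date, or None."""
    row = conn.execute(
        "SELECT address FROM customer_address "
        "WHERE customer_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (customer_id, day, day),
    ).fetchone()
    return row[0] if row else None


change_address(conn, 7, "1 First St", "2020-01-01")
change_address(conn, 7, "2 Second Ave", "2022-06-15")
```

With this layout nothing is ever overwritten: the current state is the set of rows where valid_to IS NULL, and any past state can be reconstructed with a date range condition.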
It is better to keep personal information in a person table and additional customer information in another table related to it. If you have any other information about the person, put it in a further table and link that to the person table.

DB design - store selection in database

I'm working on a web application where I need to do some research before I implement the database. I hope you can help me make some good decisions before I start to code.
Today I have a database that, among other things, contains about two million contacts in a table:
Contact:
cid, name, phone, address, etc...
Users of the application can search the contact table based on different criteria, and get a list of contacts.
Users are stored in a separate database table:
User: uid, name, email, etc...
Now I want to make the users able to store a search result as a selection. The selection has to be a list of cid's representing every contact in the search result the user got. When the selection is stored, a user can open the selection and add notes, statuses etc to the different contacts in the selection.
My first thought is to make a selection table and a selection-contact mapping table like this:
Selection: sid, name, description, uid, etc
SelectionContactMap: sid, cid, status, note, etc...
With an average selection size between 1 000 and 100 000 contacts, and several thousand users storing many selections, I see that the SelectionContactMap table is going to grow very big very fast.
The database is MySQL and the application is written in PHP. I'm on a limited budget, so I cannot throw unlimited hardware at the task.
Am I on the wrong track here?
Do you have any suggestions for solving this in the best possible way?
Other database?
MySQL-specific suggestions, table type, etc.?
Other database design?
Any comments and suggestions are appreciated.
Thanks in advance :)
-- Tor Inge
Question: what happens if the results of the query change, e.g. a selected contact no longer has the chosen attribute, or a new contact is added?
If the answer is "The result set should be updated" - then you want to store the criteria in the database, not the results themselves.
If you need to cache the results for a period of time, this may be better handled by the application, not the database.
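If the answer is to store the criteria, a minimal sketch might look like the following. It assumes the criteria are simple equality filters kept as JSON in a criteria column on the Selection table; that column, the city field on Contact, the SEARCHABLE whitelist, and the helper names are all invented for the example, and sqlite3 stands in for MySQL.

```python
import json
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; the schema is cut down.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (cid INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE selection (
        sid      INTEGER PRIMARY KEY,
        uid      INTEGER NOT NULL,
        name     TEXT    NOT NULL,
        criteria TEXT    NOT NULL  -- JSON object: column name -> required value
    );
""")

# Column names cannot be bound as parameters, so only whitelisted ones are used.
SEARCHABLE = {"name", "city"}


def save_selection(conn, uid, name, criteria):
    """Store the search criteria themselves, not the matching cids."""
    with conn:
        cur = conn.execute(
            "INSERT INTO selection (uid, name, criteria) VALUES (?, ?, ?)",
            (uid, name, json.dumps(criteria)),
        )
    return cur.lastrowid


def run_selection(conn, sid):
    """Re-run a stored selection against the contact table as it is now."""
    row = conn.execute(
        "SELECT criteria FROM selection WHERE sid = ?", (sid,)
    ).fetchone()
    if row is None:
        raise KeyError(sid)
    criteria = json.loads(row[0])
    unknown = set(criteria) - SEARCHABLE
    if unknown:
        raise ValueError(f"unsupported criteria: {sorted(unknown)}")
    where = " AND ".join(f"{col} = ?" for col in criteria) or "1 = 1"
    rows = conn.execute(
        f"SELECT cid FROM contact WHERE {where} ORDER BY cid",
        tuple(criteria.values()),
    )
    return [cid for (cid,) in rows]


conn.executemany(
    "INSERT INTO contact (cid, name, city) VALUES (?, ?, ?)",
    [(1, "Ann", "Oslo"), (2, "Bob", "Oslo"), (3, "Cat", "Bergen")],
)
sid = save_selection(conn, uid=10, name="Oslo contacts", criteria={"city": "Oslo"})
before = run_selection(conn, sid)

conn.execute("INSERT INTO contact (cid, name, city) VALUES (4, 'Dan', 'Oslo')")
after = run_selection(conn, sid)  # the new contact shows up automatically
```

Per-contact notes and statuses would still need rows of their own, but possibly only for the contacts a user actually annotates rather than for every cid in the result.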

Recommend to track all logins, update login table, or both?

Currently I am having a hard time deciding/weighing the pros/cons of tracking login information for a member website.
Currently
I have two tables, login_i and login_d.
login_i contains the member's id, password, last login datetime, and total count of logins. (member id is primary key and obviously unique so one row per member)
login_d contains the full login history, with one row for each and every login. It contains the member's id, the datetime of the login, and the ip_address of the login. This table's primary key is simply an auto-incremented INT field. It serves no real purpose, but the table needs a primary key and this is the only single field that is unique (an index, on the other hand, is a different matter, but that is not a concern here).
In many ways I see these tables as being very similar but the benefit of having the latter is to view exactly when a member logged in, how many times, and which IP it came from. All of the information in login_i (last login and count) truthfully exists in login_d but in a more concise form without ever needing to calculate a COUNT(*) on the latter table.
Does anybody have advice on which method is preferred? Two tables will exist regardless but should I keep record of last_login and count in login_i at all if login_d exists?
added thought/question
good comment made below - what about also tracking login attempts based on a username/email/ip? Should this ALSO be stored in a table (a 3rd table I assume).
This is called denormalization.
Ideally you would never denormalize.
It is sometimes done anyway to avoid recomputing computationally expensive results, possibly like your total login count value.
The downside is that you may at some point end up in a situation where the value in one table does not match the values in the other table(s). Of course you will try your best to keep them properly up to date, but sometimes things happen. In that case, you may get bugs in application logic that receives an incorrect value from one of the sources.
In this specific case, a count of logins is probably not that critical to the successful running of the app, so it is not a big risk, although you will still have the overhead of maintaining the value.
Do you often need the last login and the count? If yes, then you should store them in login_i as well. If they are rarely used, you can afford to run the query against the large table of all logins instead of storing duplicated data.
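For the "rarely used" case, the last login and the count can be derived from login_d on demand. The sketch below assumes column names member_id, login_at and ip_address (the question does not give exact names), adds a composite index as one plausible way to keep the lookup cheap, and uses sqlite3 in place of MySQL.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; column names are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE login_d (
        id         INTEGER PRIMARY KEY,
        member_id  INTEGER NOT NULL,
        login_at   TEXT    NOT NULL,
        ip_address TEXT    NOT NULL
    );
    -- Lets the per-member aggregate be answered from the index.
    CREATE INDEX idx_login_d_member ON login_d (member_id, login_at);
""")


def login_summary(conn, member_id):
    """Return (total_logins, last_login) computed from the history table."""
    return conn.execute(
        "SELECT COUNT(*), MAX(login_at) FROM login_d WHERE member_id = ?",
        (member_id,),
    ).fetchone()


conn.executemany(
    "INSERT INTO login_d (member_id, login_at, ip_address) VALUES (?, ?, ?)",
    [
        (1, "2024-01-05 09:00:00", "203.0.113.4"),
        (2, "2024-01-06 12:00:00", "203.0.113.9"),
        (1, "2024-02-01 08:30:00", "203.0.113.4"),
    ],
)
summary = login_summary(conn, 1)
```

Because the numbers come straight from the history rows, they cannot drift out of sync the way a separately maintained counter can.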

database modelling -mysql

I am designing a database that will eventually have thousands of users. Each user has their own profile and specific data associated with it.
In your opinion, is it best practice to have one table for id, username, activationLink and hash and another for address, age, photo and job, or is it better to have a single table for everything?
Thanks for your time.
If:
All (or almost all) users have all data filled
Most of the time you query for all fields
then keep them in a single table; otherwise, split them.
In your model, activationLink seems to be queried for only once per activation, so I'd move it into a separate table (which would allow deleting it after the account had been activated).
Address, age, photo and job are usually shown along with the username, so it would be better to merge them into a single table.
Don't allow your initial design to limit the ability (or just make it difficult) to expand your requirements in the future.
At the moment, a user may have one address so you might put it in the users table - what if you want them to be able to store "work" and "home" addresses in future, or a history of past addresses?
A user may only be allowed to have a single photo, but if you put it (or a URL for it) in users.photo, then you'd have to change your data structure later to allow a user to have a history of profile photos.
As Quassnoi mentions, there are performance implications for each of these decisions - more tables means more complexity, and more potential for slow queries. Don't create new tables for the sake of it, but consider your data model carefully as it quickly becomes very hard to change it.
Any values that are in a strict 1-to-1 relationship with a user entity, and are unlikely ever to change or to need a history (date of birth is a good example), should go in the table with the core definition. Any potential 1-to-many relationships (even if they aren't 1-to-many right now) are good candidates for their own tables.
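To illustrate the split, here is a small sketch in which date of birth stays on the core users table while addresses get a table of their own, so "home" and "work" addresses can be added later without a schema change. The user_address table, its kind column, and the helper function are hypothetical, and sqlite3 stands in for MySQL.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Strict 1-to-1 values that need no history live with the core definition.
    CREATE TABLE users (
        id            INTEGER PRIMARY KEY,
        username      TEXT NOT NULL UNIQUE,
        date_of_birth TEXT
    );
    -- Potentially 1-to-many values get their own table from the start.
    CREATE TABLE user_address (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        kind    TEXT    NOT NULL,  -- e.g. 'home' or 'work'
        line    TEXT    NOT NULL
    );
""")


def addresses_for(conn, user_id):
    """Return the user's addresses as a {kind: line} mapping."""
    rows = conn.execute(
        "SELECT kind, line FROM user_address WHERE user_id = ? ORDER BY id",
        (user_id,),
    )
    return dict(rows.fetchall())


conn.execute(
    "INSERT INTO users (id, username, date_of_birth) VALUES (1, 'alice', '1990-04-02')"
)
conn.executemany(
    "INSERT INTO user_address (user_id, kind, line) VALUES (?, ?, ?)",
    [(1, "home", "12 Elm Street"), (1, "work", "400 Market Square")],
)
alice_addresses = addresses_for(conn, 1)
```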

DB for Commenting System

I want to create a two-level status message system. What is the best way to create the tables?
Scope:
User sets a Status Message
Users Reply to the status message
this is a picture showing it
Tables I have created:
users (id, name .... )
status_messages (id, message, time, user_id)
status_message_replies (id, message, time, status_message_id, user_id)
Someone suggested this could be done in a single-table format:
status_messages (id, pid, message, time, user_id)
where pid = selfId or ParentId of the status.
I want to know which is the best method to create the system.
As long as the original messages and the responses have the same structure (set of attributes, or columns) then you can use the single table approach. It has the advantage that you can search over original messages and responses with a single query.
The set of original messages can be found where pid = selfid and the responses where pid <> selfid. If it's important to be able to see the original and response messages separately (without knowledge of the storage mechanism) you can encapsulate the above conditions in two VIEWs: OriginalMessages and Responses.
If the originals and responses have different attributes (for instance, if you want the original to allow links to URLs, photos, etc) you might consider using two separate tables. But even there, I'd probably argue for the one table structure with a separate, extender table for the additional attributes. That means you don't have to store often-empty columns for those original messages that don't use the extended attributes, and you can later easily add the extended attributes to the response messages as well (if desired).
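The single-table layout with the two views could be sketched like this. The view names OriginalMessages and Responses come from the answer above; the helper functions and the temporary pid value of 0 used while inserting an original message are assumptions of this example, and sqlite3 is used instead of MySQL so that it runs standalone.

```python
import sqlite3

# Assumption: SQLite in memory stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status_messages (
        id      INTEGER PRIMARY KEY,
        pid     INTEGER NOT NULL,  -- own id for originals, parent id for replies
        message TEXT    NOT NULL,
        time    TEXT    NOT NULL,
        user_id INTEGER NOT NULL
    );
    CREATE VIEW OriginalMessages AS
        SELECT * FROM status_messages WHERE pid = id;
    CREATE VIEW Responses AS
        SELECT * FROM status_messages WHERE pid <> id;
""")


def post_status(conn, user_id, message, time):
    """Insert an original message and point its pid at itself."""
    with conn:  # insert and self-reference update commit together
        cur = conn.execute(
            # 0 is a placeholder: generated ids start at 1, so it never matches.
            "INSERT INTO status_messages (pid, message, time, user_id) "
            "VALUES (0, ?, ?, ?)",
            (message, time, user_id),
        )
        conn.execute(
            "UPDATE status_messages SET pid = id WHERE id = ?", (cur.lastrowid,)
        )
    return cur.lastrowid


def post_reply(conn, parent_id, user_id, message, time):
    """Insert a reply whose pid refers to the original message."""
    with conn:
        cur = conn.execute(
            "INSERT INTO status_messages (pid, message, time, user_id) "
            "VALUES (?, ?, ?, ?)",
            (parent_id, message, time, user_id),
        )
    return cur.lastrowid


status_id = post_status(conn, 1, "Out for lunch", "2024-01-01 12:00")
reply_id = post_reply(conn, status_id, 2, "Bring me a coffee", "2024-01-01 12:05")

originals = conn.execute("SELECT id, message FROM OriginalMessages").fetchall()
responses = conn.execute("SELECT id, pid, message FROM Responses").fetchall()
```

Callers that only care about one kind of message can query the matching view, while a search across both kinds still hits a single table.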
A classical IS-A relationship: every reply is a message with an extra attribute (the message it is a reply to).
This is probably not the best way to model it. You'll be running the risk of having to write a lot of UNION queries over those two tables.
Alternatives:
just one table: status_messages (id, message, time, status_message_id, user_id), and allowing status_message_id to be NULL
use a HAS-A: one table status_messages (id, message, time, user_id) and one table replies (reply_id, replies_to_id)
The former has the disadvantage that working with NULL is tricky in SQL.
The latter will necessitate joins when you want to query replies specifically.
BTW it's much clearer (IMO) to name columns after the relationship they stand for, not the table they refer to.