Efficient way to keep clone of table row - mysql

I have a PHP application that uses a MySQL database. It has a table called profile which stores users' details. Now I need to keep a snapshot of a user's profile whenever he performs a task, which means the whole table row for that user must be cloned.
I found two ways of doing that.
1) Add another column to the table to mark whether a row is a clone, so the original profile can be told apart from its clones (original/cloned). Profile data will be maintained in one table.
2) Add another table similar to profile (with the same fields) and store the cloned profiles in it. Profile data will be maintained in two tables.
Which is more efficient in terms of performance and usability?

If you keep everything in one table with just a column to distinguish clones from originals, you will always have double the number of records to handle. With a separate table, you only have one table to worry about at a time unless you need both at once. A separate table also gives you a virtual backup of your main table: if one is in trouble you have something to fall back on, and vice versa. Additionally, you don't risk mixing up the records if they live in separate tables. In other words, I would prefer your second approach over the first.
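As a rough illustration of the second approach, the sketch below copies a profile row into a separate snapshot table with a single INSERT ... SELECT. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs on its own. The table name profile_snapshot, the columns name and email, and the taken_at timestamp are assumptions made for the example, not part of the original schema.

```python
import sqlite3


def make_profile_db():
    """Create an in-memory database with a live table and a snapshot table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE profile (
            user_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            email   TEXT NOT NULL
        );
        -- Hypothetical snapshot table: same fields plus its own key and a timestamp.
        CREATE TABLE profile_snapshot (
            snapshot_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id     INTEGER NOT NULL,
            name        TEXT NOT NULL,
            email       TEXT NOT NULL,
            taken_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
    """)
    return conn


def snapshot_profile(conn, user_id):
    """Copy the user's current profile row into profile_snapshot.

    Returns the number of rows copied (0 if the user does not exist).
    """
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "INSERT INTO profile_snapshot (user_id, name, email) "
            "SELECT user_id, name, email FROM profile WHERE user_id = ?",
            (user_id,),
        )
    return cur.rowcount
```

Calling snapshot_profile(conn, user_id) each time the user performs the task leaves the profile table with one row per user, while profile_snapshot accumulates the history. MySQL also supports INSERT ... SELECT, so the statement itself should carry over with little change.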

Related

How to scale mysql table to keep historic data without increasing table size

I have a questionnaire in my app. Each submission creates data recording which user submitted it and at what time (I have to apply further processing to the last questionnaire per user). This data is saved in my server's MySQL DB. Since the questionnaire is open to all my users and will be submitted multiple times, I do not want a new entry created every time for the same user, because this will increase the size of the table (the user count could be around 10M). But I also want to keep the old data as history for later processing.
Now I have this option in mind:
Create two tables. One main table to keep new objects and one history table to keep history objects. Whenever a questionnaire is submitted it will create a new entry in the history table, but update the existing entry in the main table.
So, is there any better approach to this and how do other companies tackle such situations?
I think you should read up on SCD (Slowly Changing Dimension) concepts and decide which type is the best approach for you.
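The two-table option described in the question could look roughly like the sketch below: every submission appends a row to a history table and overwrites the user's single row in the main table, both inside one transaction. It runs on Python's built-in sqlite3 module as a stand-in for MySQL. The table names response_latest and response_history and the answers column are hypothetical.

```python
import sqlite3


def make_questionnaire_db():
    """Create an in-memory database with a main table and a history table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Main table: at most one row per user (user_id is the primary key).
        CREATE TABLE response_latest (
            user_id      INTEGER PRIMARY KEY,
            answers      TEXT NOT NULL,
            submitted_at TEXT NOT NULL
        );
        -- History table: one row per submission.
        CREATE TABLE response_history (
            id           INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id      INTEGER NOT NULL,
            answers      TEXT NOT NULL,
            submitted_at TEXT NOT NULL
        );
    """)
    return conn


def submit(conn, user_id, answers, submitted_at):
    """Record a submission in the history and make it the user's latest row."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO response_history (user_id, answers, submitted_at) "
            "VALUES (?, ?, ?)",
            (user_id, answers, submitted_at),
        )
        # SQLite syntax: replaces the existing row with the same primary key.
        conn.execute(
            "INSERT OR REPLACE INTO response_latest (user_id, answers, submitted_at) "
            "VALUES (?, ?, ?)",
            (user_id, answers, submitted_at),
        )
```

The main table stays at one row per user however often the questionnaire is submitted. INSERT OR REPLACE is specific to SQLite; in MySQL the analogous statement would be INSERT ... ON DUPLICATE KEY UPDATE.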

Database Design: Difference between using boolean fields and duplicate tables

I have to design a database schema for an application I'm building. I will be using MySQL. In this application, users enter data, which is saved in the database. However, this data is not accessible to the public until the user publishes it. Currently, I have one column for storing all the data. I was wondering if a boolean field in this table that indicates whether the data has been published is a good idea. Or is it a much better design to create one table for saved data and one table for published data, and move the saved data to the published data table when the user presses Publish?
What are the advantages and disadvantages of using each one and is one of them considered better design than the other?
Case: Binary
They are about equal. Use this as a learning exercise -- Implement it one way; watch it for a while, then switch to the other way.
(same) Space: Since a row exists exactly once, neither option is 'better'.
(favor 1 table) When "publishing" it takes a transaction to atomically delete from one table and insert into the other.
(favor 2 tables) Certain SELECTs will spend time filtering out records with the other value for published. (This applies to deleted, embargoed, approved, and a host of other possible boolean flags.)
Case: Revision history
If there are many revisions of a record, then two tables, Current data and History, are better. That is because the 'important' queries involve fetching only the Current data.
(PARTITIONs are unlikely to help in either case.)
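To make the "Case: Binary" trade-off concrete, here is a small sketch of what "publishing" involves under each design. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The table names post, draft_post and published_post are hypothetical.

```python
import sqlite3


def make_publish_db():
    """Create both layouts side by side in one in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One-table design: a boolean flag marks published rows.
        CREATE TABLE post (
            id        INTEGER PRIMARY KEY,
            body      TEXT NOT NULL,
            published INTEGER NOT NULL DEFAULT 0
        );
        -- Two-table design: a row lives in exactly one of these tables.
        CREATE TABLE draft_post     (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE published_post (id INTEGER PRIMARY KEY, body TEXT NOT NULL);
    """)
    return conn


def publish_with_flag(conn, post_id):
    """One-table design: publishing is a single UPDATE."""
    with conn:
        conn.execute("UPDATE post SET published = 1 WHERE id = ?", (post_id,))


def publish_by_moving(conn, post_id):
    """Two-table design: insert and delete inside one transaction."""
    with conn:  # if either statement fails, neither change is kept
        conn.execute(
            "INSERT INTO published_post (id, body) "
            "SELECT id, body FROM draft_post WHERE id = ?",
            (post_id,),
        )
        conn.execute("DELETE FROM draft_post WHERE id = ?", (post_id,))
```

With the flag, public queries must filter on published = 1. With two tables, public queries read published_post directly, at the cost of the slightly heavier publish step.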

having trouble normalizing this database

Currently, I have 48 fields.
I'm completely new to Access. This is how I decided to connect everything together.
It doesn't seem to be very effective. Could somebody help me understand how to normalize this database?
Should I try to put employee information in one table, job information in another table and then have an equipment lookup table?
The current job, last job, and previous job can all be the SAME table. If you sort this table by descending job start date, then you have current, last and previous. You thus neither need nor want a separate table for each of these, since each one really amounts to the concept of a "job". If sorting by date is not enough, then you could add a column called Job Type (current, previous, etc.). Again, we are still only using the one table.
The same goes for Equipment. You really don't care if the limit is the last 3 or the last 300. By building a normalized table, ONE form can edit all types, and you save MASSIVE amounts of work coding and building tables, user interface software, and queries to retrieve and show the last 3 jobs in a form.
The fact that this design, at FAR LESS development cost, allows the last 3 or the last 300 jobs is really moot. More important, if some manager comes along and now wants you to save the last 4 jobs, you don't face a massive re-design. And you can add new job types on the fly. So in place of current and, say, previous, you can also have uncompleted or failed jobs. Adding new business rules therefore does not mean adding a new type of job table, but only a new "type" in the one column you are already using to define the job as current or previous.
Identify like objects and make one table to store all of them. In your design you have three tables for equipment but each item of equipment has the same fields; they should be one table. Similarly for jobs, each job is pretty much the same; they should be one table. The same for departments.
Figure out one or more columns in each table that can uniquely identify the row in the table (that is, if you know the values for those columns it is impossible for there ever to be two rows with those values). These are your primary keys for your tables.
Identify cases in which an item in one table needs to "point to" (refer to) an item in another table. In this case, make sure that the referring table has a set of columns that match the primary key of the referred-to table (a foreign key).
When you've done that, you'll have the beginnings of a correctly factored relational database design.
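The single-table design for jobs described in these answers can be sketched as follows. The example runs on Python's built-in sqlite3 module rather than Access so that it is self-contained. The table and column names (employee, job, title, start_date) are assumptions, since the original 48 fields are not shown.

```python
import sqlite3


def make_jobs_db():
    """One employee table and one job table that holds every job, old or new."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee (
            employee_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE job (
            job_id      INTEGER PRIMARY KEY,
            -- Foreign key: points at the primary key of employee.
            employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
            title       TEXT NOT NULL,
            start_date  TEXT NOT NULL  -- ISO dates sort correctly as text
        );
    """)
    return conn


def recent_jobs(conn, employee_id, limit=3):
    """Return the titles of the employee's most recent jobs, newest first."""
    rows = conn.execute(
        "SELECT title FROM job WHERE employee_id = ? "
        "ORDER BY start_date DESC LIMIT ?",
        (employee_id, limit),
    ).fetchall()
    return [title for (title,) in rows]
```

Here "current", "last" and "previous" are simply the first three rows of the sorted result, and asking for the last 4 jobs instead of 3 only changes the limit argument, not the schema.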

Indefinite number of tables vs indefinite number of row with multiple columns

Which would be better (in terms of performance and maintenance): a database that creates tables dynamically, or one that just adds rows dynamically?
Suppose I am building a project in which I let users register. Say I have a table which stores only basic personal info, like name, DOB, date of joining, address, phone, etc. Say 10 columns.
Now is the tricky part.
Scene 1: Creating multiple tables
When a user completes registration, a message table is created, so one table is created for each user. The number of rows in each message table varies from user to user.
In the same way, there is a cart table for each user, like the message table.
So in scene 1, 2 tables are created with every registration.
Scene 2: Adding Rows
The scenario is the same here, but in this case I have just 2 tables, one for messages and one for carts. Rows are added only when there is activity.
Note:
You must assume that the number of users is more than 2000 and expect 50+ users to be active at any time. This means the message and cart tables are always busy in both cases, with update, delete, insert, and select queries running simultaneously.
Also, which scene will consume more disk space?
While writing this, it made me wonder what technique Facebook and others use. Do they use the scene 2 style, where all (billions of) users share the same big, long message table? Just wondering.
Databases have some basic design rules called "Database Normalization". These basic rules allow us to eliminate redundant data.
1st Normal Form
Store one piece of information in only one column; a column should store only one piece of information.
2nd Normal Form
Informally, a table should have only the columns that are related to each other, and all the related columns should be in one table.
Now if you look at your proposed design, a separate table for each USER will split the SAME information/columns about all the users across thousands of tables, which goes against the 2nd Normal Form as described above.
You need to create one table and put all the related columns in that one table for all the users. You can then use normal T-SQL to query your data. If you have a table for each user, my guess is that every query you execute from your application will have to be built dynamically, so you will be using dynamic SQL for every query. Dynamic SQL is one of the SQL devils, and you want to avoid it whenever possible.
My suggestion would be to read more about database design. Once you have a basic understanding of it, draw your design on a piece of paper and see if it provides everything your business requires and expects from this application. Spend some time on it now; it will save you a lot of pain later.
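A minimal sketch of the "one table for all users" layout recommended above, again using Python's built-in sqlite3 module so that it runs standalone. The table names app_user and message and their columns are hypothetical. The point is that the SQL text stays fixed and only the bound user_id changes, so no table name ever has to be spliced into a query.

```python
import sqlite3


def make_shared_db():
    """One users table and one shared message table for every user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE app_user (
            user_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );
        CREATE TABLE message (
            message_id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id    INTEGER NOT NULL REFERENCES app_user(user_id),
            body       TEXT NOT NULL
        );
    """)
    return conn


def add_message(conn, user_id, body):
    """Adding a message is a row insert; no new table is ever created."""
    with conn:
        conn.execute(
            "INSERT INTO message (user_id, body) VALUES (?, ?)",
            (user_id, body),
        )


def messages_for(conn, user_id):
    """Static, parameterized query: the same SQL text serves every user."""
    rows = conn.execute(
        "SELECT body FROM message WHERE user_id = ? ORDER BY message_id",
        (user_id,),
    ).fetchall()
    return [body for (body,) in rows]
```

A cart table would follow the same pattern, with a user_id column instead of one cart table per registration.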

Is it possible to have "folders" in a database?

I am going to have a database with several (fewer than 10) "main" tables. In addition, I want to have hundreds or thousands of tables of the same type (let's say "user_1", "user_2", "user_3" and so on). Is it possible to put all these tables in a directory/folder? Or is the database itself already considered a "folder" for tables?
ADDED
Since I got a lot of questions about why I want to do that, I want to elaborate. I want to have many tables to optimize queries to the database. If I put everything in one table, the table is going to be huge. Then, if I want to extract information about a particular user, I first need to find the rows in the table which have the given user in a given column, and that can be time-consuming. So I decided to create a table for every user. Then, if I need to know something about a user, I just read the required information from a "small" table.
To be more specific, I can have 10 000 users, and the information about a given user can contain 10 000 lines. I do not want to have one table with 100 000 000 lines.
The answer is—you shouldn't be doing this in the first place.
Don't have separate tables for each user—instead, use one table for all your user data, and add a column (e.g. userId) to store information on who it's about.
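A minimal sketch of that single-table layout, run on Python's built-in sqlite3 module as a stand-in for the asker's database. The table name user_data, the payload column and the index are assumptions of this example. In particular, the answer itself does not mention an index; it is added here because an index on the user column is the usual way to avoid scanning the whole table when fetching one user's rows.

```python
import sqlite3


def make_user_data_db(users=100, rows_per_user=50):
    """Build one shared table with an index on the user column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_data ("
        " id INTEGER PRIMARY KEY,"
        " user_id INTEGER NOT NULL,"
        " payload TEXT NOT NULL)"
    )
    # Assumed index: lets the engine jump straight to one user's rows.
    conn.execute("CREATE INDEX idx_user_data_user_id ON user_data (user_id)")
    with conn:
        conn.executemany(
            "INSERT INTO user_data (user_id, payload) VALUES (?, ?)",
            ((u, f"row {n}") for u in range(users) for n in range(rows_per_user)),
        )
    return conn


def rows_for_user(conn, user_id):
    """Fetch all payloads belonging to one user."""
    rows = conn.execute(
        "SELECT payload FROM user_data WHERE user_id = ?", (user_id,)
    ).fetchall()
    return [payload for (payload,) in rows]


def plan_for_user(conn, user_id):
    """Return SQLite's textual query plan for the per-user lookup."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT payload FROM user_data WHERE user_id = ?",
        (user_id,),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in plan)
```

Printing plan_for_user(conn, 7) should mention idx_user_data_user_id, which indicates that the lookup goes through the index rather than reading every row. MySQL offers a comparable EXPLAIN statement for checking the same thing there.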
If you want separate tables based on the user, this tends to be done using an owner or schema concept. In other words, you use:
create table pax.table1 ...
and pax is then the owner of that table. Each user can then have their own data.
If you don't mind everyone seeing the data in each other's "folders", you can opt for a single table with a column specifying the particular user, but you tend to lose user-based protection in that case.
Having each user's data in their own schema (or owner) means that you can restrict access based on user name. Keep in mind that these are then separate tables so it becomes harder to consolidate data from them should you wish to do so.
It's pretty unusual to have hundreds of thousands of tables, even in the biggest database setups. You might want to consider the possibility that you're doing something unwise. Posting the "why" of this question instead of the "how" will help us in assisting you further.