about database design - mysql

I need some advice about my database design. I have about 5 fields for basic user information, such as name, email, gender, etc.
Then I want to have about 5 fields for optional information such as messenger IDs.
And 1 optional text field for info about the user.
Should I create only one table with all the fields together, or should I create a separate table for the 5 optional fields in order to avoid redundancy etc.?
Thanks.

I'd stick with only one table.
Adding another table would only make things more complicated, and you would gain very little disk space.
And I really don't see how this can be redundant in any way ;)

I think that you should definitely stick with one table. Since all the information belongs to a user and does not reflect any other logical model (like an article, blog post or such), you can safely keep everything in one place, even if some fields are optional.

I would create only one table for the additional fields. Not with 5 columns, though, but with a foreign key to the base table and key/value pairs. Something like:
create table users (
user_id integer primary key,
name varchar(200)
-- the rest of the fields
);
create table users_additional_info (
user_id integer not null,
ai_type varchar(10) not null, -- type of additional info: messenger, extra email
ai_value varchar(200) not null,
-- written as a table constraint because MySQL ignores inline REFERENCES
foreign key (user_id) references users(user_id)
);
Eventually you might want an additional_info table to hold possible valid values for extra info: messenger, extra email, whatever. But that is up to you. I wouldn't bother.

It depends on how many people will be having all of that optional information and whether you plan on adding more fields. If you think you're going to add more fields in the future, it might be useful to move that information to a meta table using the EAV pattern : http://en.wikipedia.org/wiki/Entity-attribute-value_model
So, if you're unsure, your table would be like
User : id, name, email, gender, field1, field2
User_Meta : id, user_id, attribute, value
Using the user_id field in your meta table, you can link it to your user table and add as many sparsely used optional fields as you want.
Note: This pays off ONLY if you have many sparsely populated optional fields. Otherwise keep everything in one table.
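To make the User / User_Meta outline above concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 with an in-memory database standing in for MySQL; the sample attribute names ("skype", "icq"), the sample users and the helper `load_meta` are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE user (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        email  TEXT NOT NULL,
        gender TEXT
    );
    -- One row per (user, optional attribute); absent attributes take no space.
    CREATE TABLE user_meta (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL REFERENCES user (id),
        attribute TEXT NOT NULL,
        value     TEXT NOT NULL,
        UNIQUE (user_id, attribute)
    );
""")
conn.execute(
    "INSERT INTO user (id, name, email, gender) VALUES (1, 'Ann', 'ann@example.com', 'f')"
)
conn.execute(
    "INSERT INTO user (id, name, email, gender) VALUES (2, 'Bob', 'bob@example.com', 'm')"
)
conn.executemany(
    "INSERT INTO user_meta (user_id, attribute, value) VALUES (?, ?, ?)",
    [(1, "skype", "ann.s"), (1, "icq", "12345")],
)


def load_meta(user_id):
    """Return the optional attributes of one user as a dict (hypothetical helper)."""
    rows = conn.execute(
        "SELECT attribute, value FROM user_meta WHERE user_id = ?", (user_id,)
    )
    return dict(rows.fetchall())
```

A user with no optional data simply has no rows in user_meta, which is where the pattern saves space. The price is that every read of the optional fields needs an extra query or join.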

I would suggest using a single table for this. Databases are very good at optimizing away space for empty columns.
Splitting this table into two or more tables is an example of vertical partitioning, and in this case it is likely to be premature optimization. However, the technique can be useful when you have columns that you only need to query some of the time, e.g. large binary blobs.

Related

Nightmare on deciding database schema

I am in the greatest nightmare deciding on a database schema! I recently signed my first freelance project.
It has user registration, and there are fairly substantial requirements for the user table, as follows:
name
password
email
phone
is_active
email_verified
phone_verified
is_admin
is_worker
is_verified
has_payment
last_login
created_at
Now I am hugely confused about whether to put everything in a single table or split things up, as I still need to add a few more fields like
token
otp ( may be in future )
otp_limit ( may be in future ) // rate limiting
And maybe something more in the future when there is an update. I am afraid that if a future update brings a new field, I won't know how to add it if everything is in a single table.
And if I split things, will that cause a performance issue? Most of the fields are moderately used in the web app.
Please help me decide; this is my first freelancing experience (and it's pretty tough and rough) :(
If two tables have the same PRIMARY KEY, they should (with few exceptions) be combined in the same table. So, one table.
As for adding columns for future expansion, don't. Do ALTER TABLE .. ADD COLUMN .. when new columns are needed.
Once you have more than a million rows, adding a column becomes invasive, so try to get most new columns added before then.
You mentioned payment. If there is only one payment, simply have a column(s) with the amount and/or date. Make them NULLable to indicate that it has not been paid yet. If there will be multiple payments, then have another table dedicated to "payments", with zero or more rows for the payments.
That NULL technique won't work for a "verified" flag; it does need a separate column.
is_worker, is_admin -- Consider a single column that is an ENUM or SET to provide boolean "attributes" for the user. Use SET if, for example, a user can be both a worker and an admin.
Each "entity" (users, payments, etc) should be a database table. Relations between tables are 1:1 (which I argued against, above), 1:many (eg, user_id in the Payments table), or many:many (with an extra table with 2 ids).
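As a rough illustration of two points from this answer (add columns only when they are needed, and keep multiple payments in their own 1:many table), here is a small sketch. It assumes Python's sqlite3 with an in-memory database in place of MySQL; the column names amount and paid_at and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        email TEXT NOT NULL
    );
    -- 1:many: zero or more payment rows per user.
    CREATE TABLE payments (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users (id),
        amount  NUMERIC NOT NULL,
        paid_at TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO users (id, name, email) VALUES (1, 'Ann', 'ann@example.com')")

# A later requirement arrives: add the column only now, not "just in case".
conn.execute("ALTER TABLE users ADD COLUMN otp TEXT")

conn.execute("INSERT INTO payments (user_id, amount, paid_at) VALUES (1, 25, '2024-01-05')")
conn.execute("INSERT INTO payments (user_id, amount, paid_at) VALUES (1, 40, '2024-02-05')")

# Column names after the ALTER, and the total paid by user 1.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
total_paid = conn.execute(
    "SELECT SUM(amount) FROM payments WHERE user_id = 1"
).fetchone()[0]
existing_otp = conn.execute("SELECT otp FROM users WHERE id = 1").fetchone()[0]
```

Existing rows get NULL in the new column, so nothing has to be backfilled at the time of the ALTER.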

A more efficient way to store data in MySQL using more than one table

I had one single table that had lots of problems. I was saving data separated by commas in some fields, and afterwards I wasn't able to search them. Then, after searching the web and finding a lot of solutions, I decided to split it into several tables.
That one table became 5 tables.
The first table is called agendamentos_diarios; this is where I'm going to store the schedules.
The second table is called tecnicos, and it stores the technicians' names. Two fields: id (primary key) and name (varchar).
The third table is called agendamento_tecnico. This is the link table where I'm going to store the ids from the first and second tables. That's because some schedules are going to be attended by more than one technician.
The fourth table is called veiculos (vehicles). It holds the id and the name of the vehicle (two fields).
The fifth table is the link between the first table and the vehicles table. Same idea: I'm going to store the schedule id and the vehicle id.
I have an image that explains it better than I can.
Am I doing it correctly? Is there a better way of storing data to MySQL?
I agree with @Strawberry about the ids, but normally it is the Hibernate mapping that does this. If you are not using Hibernate to design your tables, you should take the ID out of agendamento_tecnico and agendamento_veiculos. That way you guarantee uniqueness. If you don't want to do that, create a unique key on the FK fields of those tables.
I notice that you separate the vehicles table from your technicians. In your model the same vehicle can be in two different schedules at the same time (which doesn't make sense). It would be better if the vehicle were linked in the agendamento_tecnico table, which would then become agendamento_tecnico_veiculo.
Looking at your table I note (I'm Brazilian) that you have a column called "servico", which means service. Your schedule table is designed for only one service. What if the same schedule has more than one service? To solve this you can create a services table and an m-n relationship with schedule. It will be easier to create reports and keep the services well separated in your database.
There is also a nome_cliente field, which holds the client for that schedule. It would be better to have a cliente (client) table and link the schedule to it with an FK.
As said before, there is no right answer. You have to think about your problem and how it may grow. Modeling a database properly will save you a lot of headaches later.
Better is subjective; there's no right answer.
My natural instinct would be to break that schedule table up even more.
It looks like data about the technician and the client is duplicated.
Then again, you might have made a decision to denormalise for perfectly valid reasons.
I doubt you'll find anyone on here who disagrees with you dropping comma-separated fields, though.
Where you call a halt to the changes depends on your circumstances now. Comma-separated fields caused you an issue, and you got rid of them. So which part of your current design is causing you an issue now?
Looks OK, especially for a first try.
One comment: I would give PKs/FKs (ids) the same name in all tables and not use 'id' as the name. (Additionally, we use '#' or '_' as the last character of primary / foreign keys, for example technicos.technico_, and agendamento_tecnico has the fields agend_tech_ and technico_. But this is not a common convention.) It makes queries a bit more complex (because you must fully qualify the fields), but it makes the database schema more readable (you can see at a glance which PK belongs to which FK).
Another comment: the two associative (I never wrote that word before!) tables, joining technos and agendamento_tecnico, have their own ID field, but they do not need it. The two (primary/unique) keys of the tables they join are unique themselves, so you can use them together as the PK of these tables, like:
CREATE TABLE agendamento_tecnico (
technico_ int not null,
agend_tech_ int not null,
primary key(technico_,agend_tech_)
)
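To see what the composite primary key buys you, here is a minimal sketch using the table definition above. It assumes Python's sqlite3 with an in-memory database in place of MySQL; the helper `assign` and the sample ids are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agendamento_tecnico (
        technico_   INTEGER NOT NULL,
        agend_tech_ INTEGER NOT NULL,
        PRIMARY KEY (technico_, agend_tech_)
    )
""")


def assign(technician_id, schedule_id):
    """Link a technician to a schedule; return False if the pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO agendamento_tecnico (technico_, agend_tech_) VALUES (?, ?)",
                (technician_id, schedule_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected a duplicate (technician, schedule) pair.
        return False
```

With a separate surrogate id and no unique key, the same technician could be attached to the same schedule twice without the database complaining.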

Is a semicolon delimiter a good way to store a large number of ID #s in a mysql field?

I have a database which will store millions of post IDs. With each post ID I need to associate a number of user IDs (on the order of 20-50 per post ID). I was thinking of constructing a semicolon-delimited list in PHP and just inserting that into a DB field on the post ID row.
Is this a relatively efficient and good way to go about doing this?
Thanks!
The long answer to this is you need to create a one-to-many association table. Proper database normalization principles dictate this.
The problem with your approach, serializing the list into the database as a semicolon-concatenated list, is the data itself is virtually useless unless you can deserialize it.
Fields of this sort:
Cannot be indexed effectively.
Can grow to exceed the storage capacity of the column.
Require context to properly utilize.
Cannot work with foreign key integrity checking.
Cannot be easily amended.
Removing entries requires re-writing the entire field.
Cannot be queried directly.
Cannot be used in JOIN operations.
You're talking about creating a simple association table:
CREATE TABLE user_posts (
id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
post_id INT NOT NULL,
UNIQUE KEY (user_id, post_id)
);
You'd have a UNIQUE index on user_id,post_id to ensure that you don't have duplicates. The inclusion of an id column is mostly so you can remove particular rows without having to specify user+post pairs.
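Here is a small sketch of how such an association table is queried and amended, which is exactly what a delimited string makes hard. It assumes Python's sqlite3 with an in-memory database in place of MySQL; the extra index on post_id, the helper functions and the sample ids are assumptions for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_posts (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        UNIQUE (user_id, post_id)
    );
    -- Hypothetical extra index so lookups by post do not scan the table.
    CREATE INDEX idx_user_posts_post ON user_posts (post_id);
""")
conn.executemany(
    "INSERT INTO user_posts (user_id, post_id) VALUES (?, ?)",
    [(7, 100), (8, 100), (9, 100), (7, 101)],
)


def users_for_post(post_id):
    """All user ids attached to one post."""
    rows = conn.execute(
        "SELECT user_id FROM user_posts WHERE post_id = ? ORDER BY user_id", (post_id,)
    )
    return [r[0] for r in rows]


def posts_for_user(user_id):
    """The reverse lookup, which a delimited list cannot answer without scanning."""
    rows = conn.execute(
        "SELECT post_id FROM user_posts WHERE user_id = ? ORDER BY post_id", (user_id,)
    )
    return [r[0] for r in rows]


def remove_user_from_post(user_id, post_id):
    """Delete one pair without rewriting anything else."""
    conn.execute(
        "DELETE FROM user_posts WHERE user_id = ? AND post_id = ?", (user_id, post_id)
    )
```

Both directions of the lookup are plain indexed queries, and removing one user touches a single row.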
No, this is a very bad idea.
A foreign key is what you want here. Basically, for every post_ID you also store the USER ID as a foreign key.
So, if you have a POSTS table, you add a column User_ID (or Poster_ID) and reference the USER ID in the USER table.
I think you should review some of the basics - please see links:
http://www.functionx.com/sql/Lesson11.htm
http://creately.com/blog/diagrams/er-diagrams-tutorial/
https://cs.uwaterloo.ca/~gweddell/cs348/errelational-handout.pdf

Database design - which would be better?

I have multiple tables.
They all have the following fields in them:
item_title | item_description | item_thumbnail | item_keywords
Would I be better off having a single items_table with an extra item_type field and then joining with the respective table, or just keep them all in separate tables?
Depends on the context. If your items have very little differentiation and you’re certain you’re not going to have a scenario in 6 months, 12 months, 2 years where you need items separated, then go the route of one generic “items” table. If a particular item type does have specific requirements, then you can create a separate table that contains this data and create a LEFT JOIN when querying to include the extra data.
I’d also suggest looking at other database types. Judging from your scenario (lots of item types with little variance in the data stored) I think you may benefit from a document-based database engine like MongoDB rather than a relational data-based database engine like MySQL.
OK, so the tables share fields. Do they also share constraints [1]?
If yes, then go ahead and merge them together.
If not, you may keep them separate, or you may merge them together, depending on what kind of tradeoff you are willing to make.
For example, if tables have separate foreign keys, you may keep them separate, or you may merge them into a single table, but keep FKs separate:
item_title
item_description
item_thumbnail
item_keywords
table1_id REFERENCES table1 (table1_id)
table2_id REFERENCES table2 (table2_id)
...
CHECK (
(table1_id IS NOT NULL AND table2_id IS NULL ...)
OR (table1_id IS NULL AND table2_id IS NOT NULL ...)
...
)
(NOTE: MySQL versions before 8.0.16 parse but do not enforce CHECK, so on those versions you'll need to do the equivalent enforcement from a trigger or client code, or use a different DBMS if you can.)
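For a runnable version of the merged table with separate FKs, here is a minimal sketch with two parent tables. It assumes Python's sqlite3 with an in-memory database (SQLite enforces CHECK, so it can stand in for a DBMS that does); the table name items, the helper `try_insert` and the sample values are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for a DBMS that enforces CHECK and FKs.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE table1 (table1_id INTEGER PRIMARY KEY);
    CREATE TABLE table2 (table2_id INTEGER PRIMARY KEY);
    CREATE TABLE items (
        item_id    INTEGER PRIMARY KEY,
        item_title TEXT NOT NULL,
        table1_id  INTEGER REFERENCES table1 (table1_id),
        table2_id  INTEGER REFERENCES table2 (table2_id),
        -- Exactly one of the two foreign keys must be set.
        CHECK (
            (table1_id IS NOT NULL AND table2_id IS NULL)
            OR (table1_id IS NULL AND table2_id IS NOT NULL)
        )
    );
    INSERT INTO table1 (table1_id) VALUES (1);
    INSERT INTO table2 (table2_id) VALUES (1);
""")


def try_insert(title, table1_id, table2_id):
    """Return True if the row satisfies the CHECK and FK constraints."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO items (item_title, table1_id, table2_id) VALUES (?, ?, ?)",
                (title, table1_id, table2_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Rows pointing at both parents, at neither, or at a parent that does not exist are all rejected by the database itself, with no enforcement code in the client.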
I'd need to know more about your database to figure out which is better.
"with an extra item_type field and then joining with the respective table"
Never enforce FKs in code, if you can help it. Even if you merge the tables together, don't merge FKs, instead do something like the above. Enforcing FKs in code in the context of the concurrent environment (where multiple clients can try to modify the same data at the same time) is difficult to do correctly and with good performance - it's much better to let the DBMS do it for you.
BTW, what is item_keywords? If it's a comma-separated list of keywords (or similar), you'll need to normalize further and extract the keywords into their own separate table.
[1] Domain (data type and CHECK), key (PRIMARY KEY and UNIQUE) and referential (FOREIGN KEY) constraints.
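The keyword normalization mentioned above could look roughly like this. The sketch assumes Python's sqlite3 with an in-memory database in place of MySQL; the tables keywords and item_keywords, the helpers and the sample items are hypothetical, and `INSERT OR IGNORE` is SQLite syntax (MySQL spells it `INSERT IGNORE`).

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        item_id    INTEGER PRIMARY KEY,
        item_title TEXT NOT NULL
    );
    CREATE TABLE keywords (
        keyword_id INTEGER PRIMARY KEY,
        keyword    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE item_keywords (
        item_id    INTEGER NOT NULL REFERENCES items (item_id),
        keyword_id INTEGER NOT NULL REFERENCES keywords (keyword_id),
        PRIMARY KEY (item_id, keyword_id)
    );
""")


def add_item(item_id, title, keyword_csv):
    """Store an item and split its comma-separated keywords into rows."""
    conn.execute(
        "INSERT INTO items (item_id, item_title) VALUES (?, ?)", (item_id, title)
    )
    unique_keywords = {k.strip().lower() for k in keyword_csv.split(",") if k.strip()}
    for kw in unique_keywords:
        # SQLite syntax; skips the insert when the keyword already exists.
        conn.execute("INSERT OR IGNORE INTO keywords (keyword) VALUES (?)", (kw,))
        keyword_id = conn.execute(
            "SELECT keyword_id FROM keywords WHERE keyword = ?", (kw,)
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO item_keywords (item_id, keyword_id) VALUES (?, ?)",
            (item_id, keyword_id),
        )


def items_with_keyword(keyword):
    """Titles of all items tagged with the given keyword."""
    rows = conn.execute(
        """
        SELECT i.item_title
        FROM items i
        JOIN item_keywords ik ON ik.item_id = i.item_id
        JOIN keywords k ON k.keyword_id = ik.keyword_id
        WHERE k.keyword = ?
        ORDER BY i.item_title
        """,
        (keyword,),
    )
    return [r[0] for r in rows]


add_item(1, "Red bike", "bike, red, outdoor")
add_item(2, "Blue bike", "bike,blue")
```

Each keyword is stored once, and finding all items for a keyword becomes an ordinary join instead of a string search.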
I believe that it is good to have as few tables as possible, because that is easier to maintain. Imagine having 3,000 different values of item_type: you would then need 3,000 different tables. So a single table seems like a good idea to me in your case. If you later run into a situation where you need to separate the table, you can easily do so.
So the short answer: YES.
If I understand correctly, you only need to normalize your schema:
items: item_id, item_name, item_description
items_types: item_id, type_id
types: type_id, item_file_name
This way you can have any number of items with any number of types.
Is this what you want to do?
I would suggest using one table for items and one table for types, for the following reasons (assume there are 10 types).
I am not sure which programming language you are using. As a Java developer, I would have to create a separate entity class for each type if I had multiple tables. So I would rather have only one class with the type as an attribute.
When you have to display all of the types on the same page, you will have to run a select query against all 10 tables for the 10 types.
When you introduce a new type, you have to write the code for the CRUD and business-specific operations. The developer will keep adding code for every new type.
Basically, if you have one table for items and one table for types, you won't have to change the database schema and code for each new type you introduce. But if you are sure that the number of types is small and won't change, you can consider using multiple tables.
Create two separate tables and join them as required for your output, i.e.:
1. First table (master table, item_type):
item_type(item_type_id,item_type_name,status)
2. Second table (child table, item_details):
item_details(item_id,item_type_id,item_title,item_description,item_thumbnail,item_keywords)
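A minimal sketch of this master/child layout, assuming Python's sqlite3 with an in-memory database in place of MySQL. The column types, the sample types and items, and the helper `titles_of_type` are assumptions for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; sample data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE item_type (
        item_type_id   INTEGER PRIMARY KEY,
        item_type_name TEXT NOT NULL UNIQUE,
        status         INTEGER NOT NULL DEFAULT 1
    );
    CREATE TABLE item_details (
        item_id          INTEGER PRIMARY KEY,
        item_type_id     INTEGER NOT NULL REFERENCES item_type (item_type_id),
        item_title       TEXT NOT NULL,
        item_description TEXT,
        item_thumbnail   TEXT,
        item_keywords    TEXT
    );
    INSERT INTO item_type (item_type_id, item_type_name) VALUES (1, 'page'), (2, 'video');
    INSERT INTO item_details (item_id, item_type_id, item_title) VALUES
        (1, 1, 'About us'), (2, 2, 'Intro clip'), (3, 1, 'Contact');
""")


def titles_of_type(type_name):
    """Titles of all items of one type, found by joining child to master."""
    rows = conn.execute(
        """
        SELECT d.item_title
        FROM item_details d
        JOIN item_type t ON t.item_type_id = d.item_type_id
        WHERE t.item_type_name = ?
        ORDER BY d.item_title
        """,
        (type_name,),
    )
    return [r[0] for r in rows]


# Introducing a new type is a row in item_type, not a new table.
conn.execute("INSERT INTO item_type (item_type_id, item_type_name) VALUES (3, 'podcast')")
conn.execute(
    "INSERT INTO item_details (item_id, item_type_id, item_title) VALUES (4, 3, 'Episode 1')"
)
```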
I feel a single table would be more suitable. It avoids extra joins, complications in the program code, and errors compared with multiple tables. It is also better from a management point of view, e.g. for DB clustering.
If you have many tables that need the same repeated columns, then yes, it is a good idea to create a separate table for the common fields. This is more efficient if the repeated columns are not fixed and can change, for example by adding one more column to the list of common default columns.
So how could you do that?
The idea is to create a separate table and put the common default columns there.
This table is like a dummy table, i.e. columns can be added or deleted as needed.
For example-
Table - DefaultFields
Columns - item_title | item_description | item_thumbnail | item_keywords
You can then insert the values into the DefaultFields table dynamically in a loop, like:
"INSERT INTO DefaultFields (item_table, item_title, item_description, item_thumbnail, item_keywords) VALUES ('" + field.item_table + "','" + field.item_title + "','" + field.item_description + "','" + field.item_thumbnail + "','" + field.item_keywords + "')"
NOTE: field is the object that holds the values in a table wise loop.
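The concatenated INSERT above breaks as soon as a value contains a quote, and it is open to SQL injection. Here is a hedged sketch of the same loop with bound parameters, assuming Python's sqlite3 with an in-memory database in place of the original client code; the `Field` tuple, the helper `insert_fields` and the sample values are hypothetical stand-ins for the `field` object.

```python
import sqlite3
from collections import namedtuple

# Hypothetical stand-in for the "field" object used in the loop above.
Field = namedtuple(
    "Field", "item_table item_title item_description item_thumbnail item_keywords"
)

# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DefaultFields (
        item_table       TEXT NOT NULL,
        item_title       TEXT,
        item_description TEXT,
        item_thumbnail   TEXT,
        item_keywords    TEXT
    )
""")


def insert_fields(fields):
    """Insert every field row using placeholders instead of string concatenation."""
    conn.executemany(
        "INSERT INTO DefaultFields "
        "(item_table, item_title, item_description, item_thumbnail, item_keywords) "
        "VALUES (?, ?, ?, ?, ?)",
        fields,
    )


insert_fields([
    Field("books", "title", "description", "thumbnail", "keywords"),
    # Values with quotes or SQL fragments are stored verbatim, not executed.
    Field("films", "O'Brien's cut", "has a quote", "thumbnail",
          "x'); DROP TABLE DefaultFields; --"),
])
```

Note that placeholders only cover values. Table and column names, as in the ALTER TABLE statement, cannot be bound this way and would need to be checked against a fixed list of allowed names.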
Then further you can alter your tables to create these default fields from DefaultFields table like:
"ALTER TABLE " + item_table+ " ADD COLUMN [" + field.item_title + "] Text"
This can be repeated for each table to alter it as needed.
In this design pattern, even if you want to:
1) add one more column or
2) delete pre existing column or
3) change pre existing column name
Then you can do so in the dummy table and the rest is updated by the ALTER table command in corresponding tables.
In my opinion... I would say no, never.
There are two reasons for that:
You really want to preserve logical meaning in your database. Right now it's pretty obvious to you how it's organised. But in two months (or a year), will it still be so evident? If somebody joins the project, isn't it easier for them to understand it if the different logical blocks of your app are kept separate? I mean... it's true that a human and a cat are both animals. Is it still logical to store both of them in the same box?
Performance. The shorter the table, the faster your queries will be. The data will still take as much space on your disk. And I'm not even counting the comparison needed to find out which type of item you are looking for. I mean, if you want to select all the pages of your application, just compare the two queries:
Multiple tables:
Select * from pages_tbl;
Single table:
Select * from item_tbl where type = 'page';
What will you gain from this design? No performance, no disk space, no readability. I really don't see a good reason for it.

Users table - one table or two?

I want to store user details in the database, with columns like first name, last name, username, password, email, cellphone number, activation codes, gender, birthday, occupation, and a few more. Is it good to store all of these in the same table, or should I split them between two tables, users and profile?
If those are attributes of a User (and they are 1-1) then they belong in the user table.
You would only normally split if there were many columns; then you might create another table in a 1-1 mapping.
Another table is obviously required if there are many profile rows per user.
One table should be good enough.
Two tables, or more generally vertical partitioning, come in when you want to scale out. You split your table into multiple tables, where the partitioning criterion is usually usage, i.e. the attributes that are most commonly used together are housed in one table and the others in another table.
One table should be okay. I'd be storing a hash in the password column.
I suggest you read the Wikipedia article about database normalization.
It describes the different possibilities and the pros and cons of each. It really depends on what else you want to store and the relationship between the user and its properties.
Ideally one table should be used. Only if the number of columns becomes hard to manage should you move some of them to another table. In that case, the two tables should ideally have a one-to-one relationship, which you can easily establish by making the foreign key in the related table its primary key as well:
User
-------------------------------
UserID INT NOT NULL PRIMARY KEY
UserProfile
-------------------------------------------------------
UserID INT NOT NULL PRIMARY KEY REFERENCES User(UserID)
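To show that the shared primary key really enforces the one-to-one relationship, here is a minimal sketch. It assumes Python's sqlite3 with an in-memory database in place of MySQL; the columns UserName and Occupation, the helper `add_profile` and the sample rows are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE User (
        UserID   INTEGER NOT NULL PRIMARY KEY,
        UserName TEXT NOT NULL
    );
    -- The primary key is also the foreign key, so at most one profile per user.
    CREATE TABLE UserProfile (
        UserID     INTEGER NOT NULL PRIMARY KEY REFERENCES User (UserID),
        Occupation TEXT
    );
    INSERT INTO User (UserID, UserName) VALUES (1, 'ann');
""")


def add_profile(user_id, occupation):
    """Return True if the profile row was accepted by the constraints."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO UserProfile (UserID, Occupation) VALUES (?, ?)",
                (user_id, occupation),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A second profile for the same user violates the primary key, and a profile for a missing user violates the foreign key.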
It depends on what kind of application it is.
For an enterprise application whose users are also the employees, I would suggest two tables:
tbl_UserPersonallInformation (contains the personal information like name, address, email, ...)
tbl_UserSystemInformation (contains other information like Title, JoinedTheCompanyOn, LeftTheCompanyOn)
In systems such as "Document Management", "Project Information Management", ... this might be necessary.
For example, employees might leave a company and rejoin after a few years, possibly with a different job title. The employee has some activities and records under the old title and will have more under the new one. So the system should record under which title (authority) each action was performed.
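One way to record which title applied at a given time is to keep one row per employment period. The sketch below assumes Python's sqlite3 with an in-memory database in place of the real DBMS; the table and column names, the helper `title_on` and the sample dates are all made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real DBMS; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- One row per stint at the company; a NULL end date means "still employed".
    CREATE TABLE employment_period (
        employee_id           INTEGER NOT NULL REFERENCES employee (employee_id),
        title                 TEXT NOT NULL,
        joined_the_company_on TEXT NOT NULL,
        left_the_company_on   TEXT,
        PRIMARY KEY (employee_id, joined_the_company_on)
    );
    INSERT INTO employee (employee_id, name) VALUES (1, 'Ann');
    INSERT INTO employment_period VALUES (1, 'Engineer', '2010-01-01', '2015-12-31');
    INSERT INTO employment_period VALUES (1, 'Manager', '2019-05-01', NULL);
""")


def title_on(employee_id, day):
    """Title held on an ISO date (YYYY-MM-DD), or None if not employed that day."""
    row = conn.execute(
        """
        SELECT title
        FROM employment_period
        WHERE employee_id = ?
          AND joined_the_company_on <= ?
          AND (left_the_company_on IS NULL OR left_the_company_on >= ?)
        """,
        (employee_id, day, day),
    ).fetchone()
    return row[0] if row else None
```

Dates are stored as ISO strings here so that plain string comparison orders them correctly; a real schema would more likely use a DATE column.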