Data normalization - mysql


I searched a bit, and my case looks similar to "multiple tables vs comma separated values", but it differs in one respect. I have used both approaches before, but for a new requirement I am a bit confused, because around 3-4 columns contain comma-separated values. What would you suggest?
Example (Existing Scenario):
Student Table
---------------------
id | name | subjectsEnrolled | gamesEnrolled | BooksIssued | absentOn
1 | Naya | yoga, maths | cricket, football | aka#date, baka#date, caka#date | 20121010,20130103
2 | sait | science, botany | football,chess,abc | prea#date, sunbaka#date | 20121010,20130103
Possible fixes I have thought of:
Student Table (as student master data)
id | name | more details of the student, such as address
1 | Naya
2 | sait
then having another table, absentTable:
StudentId | absentOn
1 | 20121010
1 | 20121011
2 | 20121010
2 | 20121011
1 | 20121012
then having a master table "booksTable" with the unique ids of the books, plus a booksIssueTable:
booksIssueTable
bookid | issuedby | dated
4 | 1 | a date
2 | 1 | a date
and almost the same for games (2 tables) and subjects (2 tables), stored the same way as I do above for books:
gameid | enrolledby
4 | 1
2 | 12
4 | 10
3 | 12
and so on...
Am I on the right track? That is all I want to know from the database and coding experts here.
Note: I am not building a school management system; this is just an example, as I couldn't post real data. In the real scenario the main student table is big, around 10-20 lakh rows (1-2 million), so absentTable, booksIssueTable, gamesEnrollTable and subjectEnrollTable could each easily hold 10 lakh records * 1-10, roughly 30-90 lakh rows, if I go with this solution. Please suggest.
Update
So far I have learnt that I am going in the right direction. What would you suggest for the dates in absentTable? Should I create a master table for the dates too, and then use its dateid to link to absentTable?

Yes, normalizing is better than storing comma-separated values. I gave a popular answer on the pitfalls of comma-separated values here: "Is storing a delimited list in a database column really that bad?"
As for your other question, no I wouldn't make a lookup table for simple date values. The scenario where that might be useful would be when you have many rows in AbsentTable that have the same date, and you would want to change the value for all these rows by updating a single row somewhere else. But I expect your AbsentTable would never need to do this.
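To make the point about dates concrete, here is a minimal sketch of an absence table that stores the date directly, with no date lookup table. It uses Python's sqlite3 module as a stand-in for MySQL so it can be run as-is; the table and column names (student, absent, absent_on) and the index name are assumptions for illustration, and in MySQL absent_on would be a DATE column.

```python
import sqlite3

# SQLite stands in for MySQL; all names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE absent (
    student_id INTEGER NOT NULL REFERENCES student(id),
    absent_on  TEXT    NOT NULL,          -- ISO date text; a DATE column in MySQL
    PRIMARY KEY (student_id, absent_on)   -- one row per student per day
);
CREATE INDEX absent_by_date ON absent (absent_on);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Naya"), (2, "sait")])
conn.executemany(
    "INSERT INTO absent VALUES (?, ?)",
    [(1, "2012-10-10"), (1, "2012-10-11"), (2, "2012-10-10"),
     (2, "2012-10-11"), (1, "2012-10-12")],
)

# Who was absent on a given day? The date is compared directly.
absent_on_10th = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM absent a JOIN student s ON s.id = a.student_id "
        "WHERE a.absent_on = ? ORDER BY s.id",
        ("2012-10-10",),
    )
]

# How many days was each student absent?
days_absent = dict(
    conn.execute("SELECT student_id, COUNT(*) FROM absent GROUP BY student_id")
)
```

The composite primary key also prevents recording the same absence twice, which a comma-separated absentOn column cannot guarantee.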
By the way, using an id to refer to an entry in another table has nothing to do with normalization. You can use natural values both as the primary key of the referenced table, and the foreign key in the referencing table, and still both tables comply with normal form.
You may also be interested in what I have written about comma-separated values and database normalization in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.

You're doing exactly what's expected to solve your problem. Even with large amounts of data, there comes a point where proper m-n tables perform much better than comma-separated values stored in a single field.
So good job finding that solution yourself. I can't suggest anything better.
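As a rough illustration of the m-n tables mentioned above, the sketch below models game enrollment with a junction table. It runs on Python's sqlite3 module as a stand-in for MySQL; the names game, game_enrollment and the index name are hypothetical, chosen to mirror the question's example.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE game    (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE game_enrollment (
    game_id    INTEGER NOT NULL REFERENCES game(id),
    student_id INTEGER NOT NULL REFERENCES student(id),
    PRIMARY KEY (game_id, student_id)      -- a student enrolls in a game once
);
CREATE INDEX enrollment_by_student ON game_enrollment (student_id);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Naya"), (2, "sait")])
conn.executemany("INSERT INTO game VALUES (?, ?)",
                 [(1, "cricket"), (2, "football"), (3, "chess")])
conn.executemany("INSERT INTO game_enrollment VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (3, 2)])

# Students enrolled in football: a join on indexed keys, no string parsing.
football_players = [r[0] for r in conn.execute("""
    SELECT s.name
    FROM game_enrollment ge
    JOIN game g    ON g.id = ge.game_id
    JOIN student s ON s.id = ge.student_id
    WHERE g.name = 'football'
    ORDER BY s.id
""")]

# Enrollment count per game.
players_per_game = dict(conn.execute("""
    SELECT g.name, COUNT(*)
    FROM game_enrollment ge
    JOIN game g ON g.id = ge.game_id
    GROUP BY g.name
"""))
```

With a comma-separated gamesEnrolled column, both queries would need string matching on every row; here they are ordinary joins and aggregates.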

Related

MySQL Unique Combination Across Many Columns

I want to have a unique pairing of columns in MySQL however it can be in any order...
So if I have three columns
Person1 Person2 Person3
and they all are FOREIGN keys to a person and I want these only to be allowed once in the table...
The row:
1 2 3
is the same as
3 2 1
is the same as
2 3 1
and so on. So I don't think just doing UNIQUE(Person1, Person2, Person3) accomplishes this.

My solution (given this schema) would be to make the order predictable. The easiest way is to store the values in ascending order, so that the first column always contains the smallest value, the second column the next smallest, and so on. This makes the table easier to query, and a plain UNIQUE(Person1, Person2, Person3) constraint will then stop you from adding two rows with the same members.
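A minimal sketch of that idea, using Python's sqlite3 module in place of MySQL so it runs as-is. The table name person_group and the helper add_group are made up for illustration. The CHECK constraint is an extra safeguard that is not part of the original suggestion; MySQL enforces CHECK constraints only from version 8.0.16 on, so on older versions the application-side sorting has to carry the rule.

```python
import sqlite3

# SQLite stands in for MySQL; person_group and add_group are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_group (
    person1 INTEGER NOT NULL,
    person2 INTEGER NOT NULL,
    person3 INTEGER NOT NULL,
    CHECK (person1 < person2 AND person2 < person3),  -- canonical ascending order
    UNIQUE (person1, person2, person3)
);
""")

def add_group(a, b, c):
    """Insert the three ids in ascending order.

    Returns False when the same set of people is already stored (or when
    the ids are not three distinct values, which the CHECK rejects).
    """
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO person_group VALUES (?, ?, ?)",
                tuple(sorted((a, b, c))),
            )
        return True
    except sqlite3.IntegrityError:
        return False

first = add_group(1, 2, 3)    # new set, stored as (1, 2, 3)
second = add_group(3, 2, 1)   # same people, different order
third = add_group(2, 3, 1)    # same people again
row_count = conn.execute("SELECT COUNT(*) FROM person_group").fetchone()[0]
```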
Since your question is pretty generic, my answer is vague. If you provide an example of what you are trying to accomplish, there might be better ways. In my experience, when you find yourself having columns representing the same type of relation it's often better to normalize. Going from friend1, friend2, ... friendN to a table of friends per user, for example.

Design database for 1 to 1 relationship when some columns are applicable only in certain cases

I have a users table which stores the details of two types of users namely students and teachers. There are 10 fields like username, password etc common to both students and teachers. There are no 1 to n relations in case of any data here.
In case of students, I have to store twenty different 1 to 1 data like weight, DOB, Admission No., Parent,Phone number etc.
In case of teachers, I have to store a separate set of twenty 1 to 1 data like email id, affiliation number etc which is not related to students in any way.
What is the best database structure I can use in this scenario from below? If there are better options please provide that too.
1) One table with 50 columns where 20 columns will have NULL in case of students and 20 columns will have NULL in case of teachers
2) One table with 30 columns where first 10 columns stores common data and next 20 columns store students details in case of student and teacher's data in case of teacher.
3) Two tables one with 10 column to store user details. And another table with 20 columns to store students details in case of student and teacher's data in case of teacher.
4) Three tables one with 10 column to store user details. Another table with 20 columns to store students details and yet another table with 20 columns to store teacher's data

Single Table Inheritance and Class Table Inheritance are both fine. In fact Fowler has recommended STI for agile. And if you use a good ORM like Hibernate, the difference is trivial. If you use PostgreSQL your nulls won't take up any extra space either.
That being said, you should normalize your tables further (parents' phone numbers should be in a different table, for example). See https://dba.stackexchange.com/questions/12991/ready-to-use-database-models-example/23831#23831 for some help

You have to remember the principles of relational design: every column should depend on the key fields and only on the key fields.
It's better to go with choice 4, three tables:
1) A base person table for the details that teachers and students both have.
2) A teacher table for details that pertain only to teachers. It relates to the base person table through a foreign key (as does table 3).
3) A student table for details that pertain only to students.
This leaves no empty columns and is very flexible in the kinds of queries it supports, including ones you are not anticipating yet.
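The three-table layout could be sketched roughly as follows. This runs on Python's sqlite3 module as a stand-in for MySQL; the table names (users, students, teachers), the handful of columns, and the sample rows are assumptions chosen for illustration, since the question does not list its actual columns.

```python
import sqlite3

# SQLite stands in for MySQL; all names and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (                      -- columns common to both kinds of user
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE,
    password TEXT NOT NULL
);
CREATE TABLE students (                   -- student-only columns, 1-to-1 with users
    user_id      INTEGER PRIMARY KEY REFERENCES users(id),
    admission_no TEXT NOT NULL,
    dob          TEXT
);
CREATE TABLE teachers (                   -- teacher-only columns, 1-to-1 with users
    user_id        INTEGER PRIMARY KEY REFERENCES users(id),
    email          TEXT NOT NULL,
    affiliation_no TEXT
);
""")
conn.execute("INSERT INTO users VALUES (1, 'asha', 'hash1'), (2, 'ravi', 'hash2')")
conn.execute("INSERT INTO students VALUES (1, 'A-100', '2005-04-01')")
conn.execute("INSERT INTO teachers VALUES (2, 'ravi@example.com', 'T-77')")

# Student details: join the base table to the student table.
student_rows = conn.execute("""
    SELECT u.username, s.admission_no
    FROM users u
    JOIN students s ON s.user_id = u.id
""").fetchall()

# The role of each user follows from which detail table has a row for them.
roles = conn.execute("""
    SELECT u.username,
           CASE WHEN s.user_id IS NOT NULL THEN 'student'
                WHEN t.user_id IS NOT NULL THEN 'teacher' END
    FROM users u
    LEFT JOIN students s ON s.user_id = u.id
    LEFT JOIN teachers t ON t.user_id = u.id
    ORDER BY u.id
""").fetchall()
```

Using user_id as both primary key and foreign key in the detail tables is what keeps the relationship 1-to-1.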

The first thing I thought of was a pig's ear relationship, a link entity so that you could have ID, teacherID, studentID to show which teachers teach which students, but then I realised this isn't what you asked for, so...
Why not just have a single boolean, true if teacher, false if not?

Look up these two tags: single-table-inheritance class-table-inheritance
These correspond to well known techniques that are like option 1 and option 4. There are situations where one or the other of these is best. The tag wikis (info) and the questions grouped under the tags will give you some additional help.

The optimal way to store multiple-selection survey answers in a database

I'm currently working on a survey creation/administration web application with PHP/MySQL. I have gone through several revisions of the database tables, and I once again find that I may need to rethink the storage of a certain type of answer.
Right now, I have a table that looks like this:
survey_answers
id PK
eid
sesid
intvalue Nullable
charvalue Nullable
id = unique value assigned to each row
eid = Survey question that this answer is in reply to
sesid = The survey 'session' (information about the time and date of a survey take) id
intvalue = The value of the answer if it is a numerical value
charvalue = the value of the answer if it is a textual representation
This allowed me to continue using MySQL's mathematical functions to speed up processing.
I have however found a new challenge: storing questions that have multiple responses.
An example would be:
Which of the following do you enjoy eating? (choose all that apply)
Girl Scout Cookies
Bacon
Corn
Whale Fat
Now, when I want to store the result, I'm not sure of the best way to handle it.
Currently, I have a table just for multiple choice options that looks like this:
survey_element_options
id PK
eid
value
id = unique value associated with each row
eid = question/element that this option is associated with
value = textual value of that option
With this setup, I then store my returned multiple selection answers in 'survey_answers' as strings of comma separated id's of the element_options rows that were selected in the survey. (ie something like "4,6,7,9") I'm wondering if that is indeed the best solution, or if it would be more practical to create a new table that would hold each answer chosen, and then reference back to a given answer row which in turn references back to the element and ultimately the survey.
EDIT
for anyone interested, here is the approach I ended up taking (In PhpMyAdmin Relations View):
And a rudimentary query to gather the counts for a multiple select question would look like this:
SELECT e.question AS question, eo.value AS value, COUNT(eo.value) AS count
FROM survey_elements e
JOIN survey_element_options eo ON eo.eid = e.id
JOIN survey_answer_options ao ON ao.oid = eo.id
WHERE e.id = 19
GROUP BY eo.value

This really depends on a lot of things.
Generally, storing lists of comma-separated values in a database is bad, especially if you plan to do anything remotely intelligent with that data, such as advanced reporting on the answers.
The best relational way to store this is to define the answers in a second table and then link them to the user's response to a question in a third table (with multiple entries per user-question, or possibly per user-survey-question if the user could take multiple surveys with the same question on them).
This can get slightly complex. A possible scenario, as a simple example:
Example tables:
Users (Username, UserID)
Questions (qID, QuestionsText)
Answers (AnswerText [in this case example could be reusable, but this does cause an extra layer of complexity as well], aID)
Question_Answers ([Available answers for this question, multiple entries per question] qaID, qID, aID),
UserQuestionAnswers (qaID, uID)
Note: Meant as an example, not a recommendation

Convert the primary key to a non-unique index and add all answers to the same question under the same id.
For example:
id | eid | sesid | intval | charval
3 | 45 | 30 | 2 |
3 | 45 | 30 | 4 |
You can still add another column for a regular unique PK if needed.
Keep things simple. No need for a relation here.

It's a horses for courses thing really.
You can store as a comma-separated string (but then what happens when you have a literal comma in one of your answers?).
You can store as a one-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
intvalue Nullable
charvalue Nullable
And then loop over that table. If you picked one answer, it would create one row in this table. If you pick two answers, it will create two rows in this table, etc. Then you would remove the intvalue and charvalue from the survey_answers table.
Another choice, since you're already storing the element options in their own table, is to create a many-to-many table, such as:
survey_element_answers
id PK
survey_answers_id FK
survey_element_options_id FK
Again, one row per option selected.
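A rough sketch of the many-to-many variant, with one row per selected option, run through Python's sqlite3 module as a stand-in for MySQL. Table and column names follow the ones used above; the UNIQUE constraint, the element id 19 (borrowed from the question's query) and the sample sessions are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; sample data and the UNIQUE constraint are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey_element_options (
    id INTEGER PRIMARY KEY, eid INTEGER NOT NULL, value TEXT NOT NULL);
CREATE TABLE survey_answers (
    id INTEGER PRIMARY KEY, eid INTEGER NOT NULL, sesid INTEGER NOT NULL);
CREATE TABLE survey_element_answers (
    id INTEGER PRIMARY KEY,
    survey_answers_id         INTEGER NOT NULL REFERENCES survey_answers(id),
    survey_element_options_id INTEGER NOT NULL REFERENCES survey_element_options(id),
    UNIQUE (survey_answers_id, survey_element_options_id)  -- an option is picked once
);
""")
conn.executemany(
    "INSERT INTO survey_element_options VALUES (?, 19, ?)",
    [(1, "Girl Scout Cookies"), (2, "Bacon"), (3, "Corn"), (4, "Whale Fat")],
)
conn.executemany("INSERT INTO survey_answers VALUES (?, 19, ?)", [(1, 100), (2, 101)])
# Session 100 picked options 1 and 2; session 101 picked options 2 and 3.
conn.executemany(
    "INSERT INTO survey_element_answers "
    "(survey_answers_id, survey_element_options_id) VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 2), (2, 3)],
)

# Count selections per option; LEFT JOIN keeps options nobody picked.
counts = conn.execute("""
    SELECT o.value, COUNT(a.id)
    FROM survey_element_options o
    LEFT JOIN survey_element_answers a ON a.survey_element_options_id = o.id
    WHERE o.eid = 19
    GROUP BY o.id, o.value
    ORDER BY o.id
""").fetchall()
```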
Another option yet again is to store a bitmask value. This will remove the need for a many-to-many table.
survey_element_options
id PK
eid FK
value Text
optionnumber unique for each eid
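optionbitmask 2 to the power of optionnumber, i.e. 1 << optionnumber (in MySQL, ^ is bitwise XOR, not exponentiation)
optionnumber should be unique for each eid, and increment starting with zero, so that the first option gets bitmask 1. This imposes a limit of 63 options if you are using a signed bigint, or 31 options if you are using a signed int.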
And then in your survey_answers
id PK
eid
sesid
answerbitmask bigint
answerbitmask is calculated by adding together (or OR-ing) the optionbitmask values of every option the user selected. For example, if 7 (1 + 2 + 4) were stored in answerbitmask, the user selected the first three options.
Joins can be done by:
WHERE (survey_answers.answerbitmask & survey_element_options.optionbitmask) > 0
So yeah, there's a few options to consider.
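For the bitmask variant, a small runnable sketch, again using Python's sqlite3 module in place of MySQL. It assumes options are numbered from zero within a question, so that a stored value of 7 means the first three options, as in the example above. The helper mask_for, the element id 19 and the session id 100 are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; zero-based optionnumber is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE survey_element_options (
    id            INTEGER PRIMARY KEY,
    eid           INTEGER NOT NULL,
    value         TEXT    NOT NULL,
    optionnumber  INTEGER NOT NULL,      -- 0, 1, 2, ... within each eid
    optionbitmask INTEGER NOT NULL,      -- 1 << optionnumber
    UNIQUE (eid, optionnumber)
);
CREATE TABLE survey_answers (
    id            INTEGER PRIMARY KEY,
    eid           INTEGER NOT NULL,
    sesid         INTEGER NOT NULL,
    answerbitmask INTEGER NOT NULL
);
""")

options = ["Girl Scout Cookies", "Bacon", "Corn", "Whale Fat"]
conn.executemany(
    "INSERT INTO survey_element_options (eid, value, optionnumber, optionbitmask) "
    "VALUES (19, ?, ?, ?)",
    [(text, n, 1 << n) for n, text in enumerate(options)],
)

def mask_for(selected):
    """OR together the bitmasks of the selected option texts."""
    mask = 0
    for text in selected:
        mask |= 1 << options.index(text)
    return mask

first_three = mask_for(["Girl Scout Cookies", "Bacon", "Corn"])
conn.execute(
    "INSERT INTO survey_answers (eid, sesid, answerbitmask) VALUES (19, 100, ?)",
    (first_three,),
)

# Decode the stored mask back into option texts with a bitwise AND in the join.
chosen = [r[0] for r in conn.execute("""
    SELECT o.value
    FROM survey_answers a
    JOIN survey_element_options o
      ON o.eid = a.eid
     AND (a.answerbitmask & o.optionbitmask) > 0
    WHERE a.sesid = 100
    ORDER BY o.optionnumber
""")]
```

One trade-off to keep in mind: the bitwise join condition cannot use an ordinary index on answerbitmask, so counting selections per option scans every answer row for the question.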

If you don't use the id as a foreign key in another query, or if you can query results using the sesid, try a many to one relationship.
Otherwise I'd store multiple choice answers as a serialized array, such as JSON or through php's serialize() function.

MySQL - How can i query a multi-value field to match a primary key of another table?

I have 2 tables. One table contains all of the states in the USA. The other table is just a list of stuff in those states.
My table structure looks something like this:
tbl_states - stateID (PK), stateName
tbl_stuff - stuffID, stuffName, relState
The values look like this
1 | Alabama
2 | Georgia
3 | Maryland
The relState column relates to the tbl_states.stateID column and i have it in this format. I plan to have a webform to select multiple states and assign the stuff in the states to the state.
1 | This is some stuff | 1,2 [ and this stuff is only AL, GA. ]
So I'm trying to figure out the best way to write the select statement for this. Is there some way to do it strictly with mysql?

Multi-valued fields in a database are a bad idea. Instead, resolve the many-to-many relationship between states and stuff like this:
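The illustration that followed is not available here, so below is one possible sketch of such a resolution. It uses Python's sqlite3 module as a stand-in for MySQL; the junction table name tbl_stuff_states is hypothetical, while tbl_states, tbl_stuff and their columns come from the question (with relState replaced by the junction table).

```python
import sqlite3

# SQLite stands in for MySQL; tbl_stuff_states is a hypothetical junction table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_states (stateID INTEGER PRIMARY KEY, stateName TEXT NOT NULL);
CREATE TABLE tbl_stuff  (stuffID INTEGER PRIMARY KEY, stuffName TEXT NOT NULL);
CREATE TABLE tbl_stuff_states (
    stuffID INTEGER NOT NULL REFERENCES tbl_stuff(stuffID),
    stateID INTEGER NOT NULL REFERENCES tbl_states(stateID),
    PRIMARY KEY (stuffID, stateID)        -- one row per (stuff, state) pair
);
""")
conn.executemany("INSERT INTO tbl_states VALUES (?, ?)",
                 [(1, "Alabama"), (2, "Georgia"), (3, "Maryland")])
conn.execute("INSERT INTO tbl_stuff VALUES (1, 'This is some stuff')")
# The old relState value '1,2' becomes two rows.
conn.executemany("INSERT INTO tbl_stuff_states VALUES (?, ?)", [(1, 1), (1, 2)])

# States that a given piece of stuff belongs to.
states_for_stuff = [r[0] for r in conn.execute("""
    SELECT st.stateName
    FROM tbl_stuff_states ss
    JOIN tbl_states st ON st.stateID = ss.stateID
    WHERE ss.stuffID = 1
    ORDER BY st.stateID
""")]

# Stuff available in a given state.
stuff_in_maryland = [r[0] for r in conn.execute("""
    SELECT s.stuffName
    FROM tbl_stuff_states ss
    JOIN tbl_stuff s   ON s.stuffID = ss.stuffID
    JOIN tbl_states st ON st.stateID = ss.stateID
    WHERE st.stateName = 'Maryland'
""")]
```

A multi-select web form then maps to one insert into the junction table per state that was ticked.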

I came across this post while searching for a way to do this myself and figured a second answer would be helpful. While it is true that multi-valued fields decrease search efficiency, hurt scalability and invite data integrity problems, they can be necessary for simplicity, reporting, and integrating with other systems, as was the case for me.
Assuming the Tables:
Table: States
Id Name
235325235 'Alabama'
457457432 'Georgia'
334634636 'Maryland'
Table: Stuff
Id Text StateIds
1 'Some stuff' '235325235'
2 'Some Stuff for two states' '235325235,457457432'
The following query would return all stuff for Alabama:
SELECT * FROM Stuff WHERE FIND_IN_SET('235325235', Stuff.StateIds);
Please note that I made your IDs longer to lower the probability of accidental matches, and I would recommend using a GUID/UUID since you are using a string-searching function.
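For readers unfamiliar with FIND_IN_SET, the sketch below mimics its matching rule in plain Python: it returns the 1-based position of a value among the comma-separated items, or 0 when the value is not one of them. The function find_in_set here is a rough stand-in written for illustration, not MySQL's implementation, and it ignores NULL handling. The comparison with a substring test shows where short ids would cause false matches if a LIKE-style search were used instead.

```python
def find_in_set(needle, csv_list):
    """Rough stand-in for MySQL's FIND_IN_SET.

    Returns the 1-based position of needle among the comma-separated
    items of csv_list, or 0 when needle is not one of the items.
    """
    items = csv_list.split(",") if csv_list else []
    return items.index(needle) + 1 if needle in items else 0

state_ids = "11,21"

# Whole-item comparison: '1' is not an item of '11,21'.
whole_item_match = find_in_set("1", state_ids)

# Substring comparison, roughly what LIKE '%1%' would do: a false match.
substring_match = "1" in state_ids

# Position of Georgia's id in the second row of the Stuff table above.
position = find_in_set("457457432", "235325235,457457432")
```

Because the function has to be evaluated against the StateIds value of every row, a query built on it cannot use an ordinary index on that column.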

Storing lots of user information into one MySQL table?

I'm working on a application where each user has to fill out an extensive profile for themselves.
The first part of the user profile consists of about 25 or so fields of general information
The next section of the user profile is a section where they evaluate themselves on a set list of criteria, i.e. "Rate how good you are at cooking", and then they tick a radio box from one to five. There is also a check box that they can check if they are 'extra interested' in the activity/subject they rated themselves on.
There are about 40 of these that they rate themselves on.
So my question is, how should I store this information, should there be columns in my users table for every field and item? This would be nearly 70 fields
or should I setup a table for user_profile, and user_self_evaluation, and have the columns for each in there and have a one-one relationship with the users?

Use separate tables. That way, when you update only the self evaluation, you do not need to touch the user_profile table. The idea is to keep the frequently updated fields in one table and the rarely updated ones in another. If the table becomes large and the username/password is in a separate table, lookups by userid / username won't be affected by the large number of update queries, nor will you bring the whole site down if you alter the self_evaluation table.
But if you are planning to add new evaluations, I'd suggest a different design:
user_profile table with the 25 profile fields
self_evaluations table, with id and name, and any meta information about the question; with 1 record per evaluation
user_profile_evaluation table with userid, evaluationid, score, extra; with one record for each evaluation of the user.
This way your schema will be much more flexible and you won't need to alter the table in order to add another evaluation.
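A minimal sketch of that three-table design, run through Python's sqlite3 module as a stand-in for MySQL. The table names come from the list above; the column types, the composite primary key, the CHECK on score and the sample rows are assumptions.

```python
import sqlite3

# SQLite stands in for MySQL; types, constraints and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_profile (userid INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE self_evaluations (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE user_profile_evaluation (
    userid       INTEGER NOT NULL REFERENCES user_profile(userid),
    evaluationid INTEGER NOT NULL REFERENCES self_evaluations(id),
    score        INTEGER NOT NULL CHECK (score BETWEEN 1 AND 5),
    extra        INTEGER NOT NULL DEFAULT 0,   -- 1 if 'extra interested'
    PRIMARY KEY (userid, evaluationid)         -- one score per user per evaluation
);
""")
conn.executemany("INSERT INTO user_profile VALUES (?, ?)", [(1, "ann"), (2, "bob")])
conn.executemany("INSERT INTO self_evaluations VALUES (?, ?)",
                 [(1, "cooking"), (2, "singing")])
conn.executemany("INSERT INTO user_profile_evaluation VALUES (?, ?, ?, ?)",
                 [(1, 1, 5, 1), (1, 2, 3, 0), (2, 1, 2, 0), (2, 2, 5, 1)])

# Adding a new evaluation is an INSERT, not an ALTER TABLE.
conn.execute("INSERT INTO self_evaluations VALUES (3, 'dancing')")

# Average score per evaluation that has at least one rating.
avg_scores = dict(conn.execute("""
    SELECT e.name, AVG(pe.score)
    FROM user_profile_evaluation pe
    JOIN self_evaluations e ON e.id = pe.evaluationid
    GROUP BY e.name
"""))

# Who ticked 'extra interested', and for what?
extra_interested = conn.execute("""
    SELECT u.username, e.name
    FROM user_profile_evaluation pe
    JOIN user_profile u     ON u.userid = pe.userid
    JOIN self_evaluations e ON e.id = pe.evaluationid
    WHERE pe.extra = 1
    ORDER BY u.userid
""").fetchall()
```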

"or should I setup a table for user_profile, and user_self_evaluation, and have the columns for each in there and have a one-one relationship with the users?"
Yes, this is how you should do it, if you know you won't expand the table in the future. The other option is a poor one.
If you think you will expand the evaluations in the future, then you can do it like this:
user_self_evaluation table
user_id | evaluation_type | evaluation_value
1 | cooking | 5
1 | singing | 3
2 | cooking | 2
2 | singing | 5
Make the combined columns (user_id, evaluation_type) unique or the primary key, so that each user has at most one value per evaluation type.

I think the latter is better. A table with 70 columns looks bad and only gets worse once you have to maintain it.
When every question is multiple choice you can also add numbers in one field for each answer.
Let's say you've got four questions with 4 choices:
You could save them as 1433 in one column called answers (the answer to the first question is 1, to the second 4, to the third 3, and to the fourth 3).
Just giving you some choices here.
But if I had to choose between a one-to-one relationship and a single table, I would choose the one-to-one relationship because it's easier to manage later on.
