Storing lots of user information into one MySQL table? - mysql

I'm working on an application where each user has to fill out an extensive profile for themselves.
The first part of the user profile consists of about 25 fields of general information.
The next section of the user profile is where they evaluate themselves on a set list of criteria, e.g. "Rate how good you are at cooking", and then they tick a radio button from one to five. There is also a check box that they can check if they are 'extra interested' in the activity/subject they rated themselves on.
There are about 40 of these that they rate themselves on.
So my question is, how should I store this information? Should there be columns in my users table for every field and item? This would be nearly 70 fields.
Or should I set up a table for user_profile and one for user_self_evaluation, with the columns for each in there, and have a one-to-one relationship with the users?

Use separate tables. That way, when you update only the self-evaluation, you do not need to update the user_profile table. The idea is to keep the frequently updated fields in one table and the rarely updated fields in another. If the table becomes large and the username/password is in a separate table, the performance of lookups by userid / username won't be affected by the large number of update queries, nor will you bring the whole site down if you alter the self_evaluation table.
But if you are planning to add new evaluations, I'd suggest a different design:
user_profile table with the 25 profile fields
self_evaluations table, with id and name, and any meta information about the question; with 1 record per evaluation
user_profile_evaluation with userid, evaluationid, score, extra - with one record for each evaluation of the user.
This way your schema will be much more flexible and you won't need to alter the table in order to add another evaluation.
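A minimal sketch of that three-table layout, using Python's built-in sqlite3 so it runs without a server (the MySQL DDL would differ slightly). The single name column standing in for the 25 profile fields, the CHECK on score, and the sample rows are my assumptions, not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE user_profile (
    userid INTEGER PRIMARY KEY,
    name   TEXT NOT NULL              -- stands in for the ~25 profile fields
);
CREATE TABLE self_evaluations (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE         -- one record per evaluation question
);
CREATE TABLE user_profile_evaluation (
    userid       INTEGER NOT NULL REFERENCES user_profile(userid),
    evaluationid INTEGER NOT NULL REFERENCES self_evaluations(id),
    score        INTEGER NOT NULL CHECK (score BETWEEN 1 AND 5),
    extra        INTEGER NOT NULL DEFAULT 0,   -- 1 if 'extra interested'
    PRIMARY KEY (userid, evaluationid)         -- one answer per user per question
);
""")

conn.execute("INSERT INTO user_profile (userid, name) VALUES (1, 'alice')")
conn.executemany("INSERT INTO self_evaluations (id, name) VALUES (?, ?)",
                 [(1, "cooking"), (2, "singing")])
conn.executemany("INSERT INTO user_profile_evaluation VALUES (?, ?, ?, ?)",
                 [(1, 1, 5, 1), (1, 2, 3, 0)])

# Adding a new evaluation is an INSERT, not an ALTER TABLE.
conn.execute("INSERT INTO self_evaluations (id, name) VALUES (3, 'dancing')")

rows = conn.execute("""
    SELECT e.name, upe.score, upe.extra
    FROM user_profile_evaluation AS upe
    JOIN self_evaluations AS e ON e.id = upe.evaluationid
    WHERE upe.userid = ?
    ORDER BY e.name
""", (1,)).fetchall()
print(rows)
```

The composite primary key on (userid, evaluationid) is what keeps each user to one score per question.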

or should I setup a table for user_profile, and user_self_evaluation,
and have the columns for each in there and have a one-one relationship
with the users?
Yes, this is how you should do it, if you know you won't expand the table in the future. The other option, one table with around 70 columns, is much worse.
If you think you will expand the evaluations in the future, then you can do it like this:
user_self_evaluation table
user_id | evaluation_type | evaluation_value
1 | cooking | 5
1 | singing | 3
2 | cooking | 2
2 | singing | 5
Make the combined columns (user_id, evaluation_type) unique or the primary key, so that each user has at most one value per evaluation type.
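To illustrate the key, here is a small sketch in Python with sqlite3 (standing in for MySQL). It assumes the key covers (user_id, evaluation_type) only: including evaluation_value in the key would let the same user store several different scores for one type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_self_evaluation (
    user_id          INTEGER NOT NULL,
    evaluation_type  TEXT    NOT NULL,
    evaluation_value INTEGER NOT NULL,
    PRIMARY KEY (user_id, evaluation_type)
)""")
conn.executemany(
    "INSERT INTO user_self_evaluation VALUES (?, ?, ?)",
    [(1, "cooking", 5), (1, "singing", 3), (2, "cooking", 2), (2, "singing", 5)],
)

# A second 'cooking' score for user 1 violates the key and is rejected.
try:
    conn.execute("INSERT INTO user_self_evaluation VALUES (1, 'cooking', 4)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```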

I think the latter is better; a table with 70 columns looks bad and gets worse when you try to manage it.
When every question is multiple choice, you can also pack the answers as digits into one field.
Let's say you've got four questions with 4 choices each:
You could save them as 1433 in one column called answers (the first question is answer 1, the second answer 4, the third answer 3, and the fourth answer 3).
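A rough sketch of that packing idea in Python. The function names are made up for illustration, and it assumes every answer fits in a single digit (1 to 9), which holds for the 4-choice example.

```python
def pack_answers(answers):
    """Pack single-digit answers into one string, one character per question."""
    if any(not 1 <= a <= 9 for a in answers):
        raise ValueError("each answer must be a single digit from 1 to 9")
    return "".join(str(a) for a in answers)


def unpack_answers(packed):
    """Reverse of pack_answers: position i holds the answer to question i + 1."""
    return [int(ch) for ch in packed]


packed = pack_answers([1, 4, 3, 3])
print(packed, unpack_answers(packed))
```

The trade-off is that the database can no longer filter or aggregate on an individual answer without string manipulation, so this only suits data that is always read and written as a whole.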
Just giving you some choices here.
But if I had to choose between a one-to-one relationship and one table, I would choose the one-to-one relationship because it's easier to manage later on.

Related

MySQL Unique Combination Across Many Columns

I want to have a unique pairing of columns in MySQL however it can be in any order...
So if I have three columns
Person1 Person2 Person3
and they all are FOREIGN keys to a person and I want these only to be allowed once in the table...
The row:
1 2 3
is the same as
3 2 1
is the same as
2 3 1
etc etc...so just doing UNIQUE(Person1, Person2, Person3) does not accomplish this task I do not think.
My solution (given this schema) would be to make the order predictable. The easiest way is to store them in ascending order, so that the first column always contains the smallest value and so on. This makes it easier to query, and you will not be able to add two rows with the same members.
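A sketch of the predictable-order idea, in Python with sqlite3 standing in for MySQL. The table name trio and the helper add_trio are hypothetical. The CHECK constraint is an extra I added so that unsorted rows cannot slip in from other code paths; it also forbids the same person appearing twice in a row, which may or may not be what you want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE trio (
    person1 INTEGER NOT NULL,
    person2 INTEGER NOT NULL,
    person3 INTEGER NOT NULL,
    UNIQUE (person1, person2, person3),
    CHECK (person1 < person2 AND person2 < person3)  -- smallest id first
)""")


def add_trio(conn, a, b, c):
    """Insert the ids in ascending order; return False if the database rejects
    the row (the trio already exists, or a person is repeated)."""
    try:
        conn.execute("INSERT INTO trio VALUES (?, ?, ?)", tuple(sorted((a, b, c))))
        return True
    except sqlite3.IntegrityError:
        return False


first = add_trio(conn, 1, 2, 3)   # stored as (1, 2, 3)
second = add_trio(conn, 3, 2, 1)  # sorts to (1, 2, 3) again, so it is rejected
third = add_trio(conn, 2, 3, 1)   # same trio, rejected
print(first, second, third)
```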
Since your question is pretty generic, my answer is vague. If you provide an example of what you are trying to accomplish, there might be better ways. In my experience, when you find yourself having columns representing the same type of relation it's often better to normalize. Going from friend1, friend2, ... friendN to a table of friends per user, for example.

Design database for 1 to 1 relationship when some columns are applicable only in certain cases

I have a users table which stores the details of two types of users namely students and teachers. There are 10 fields like username, password etc common to both students and teachers. There are no 1 to n relations in case of any data here.
In case of students, I have to store twenty different 1 to 1 data like weight, DOB, Admission No., Parent,Phone number etc.
In case of teachers, I have to store a separate set of twenty 1 to 1 data like email id, affiliation number etc which is not related to students in any way.
What is the best database structure I can use in this scenario from below? If there are better options please provide that too.
1) One table with 50 columns where 20 columns will have NULL in case of students and 20 columns will have NULL in case of teachers
2) One table with 30 columns where first 10 columns stores common data and next 20 columns store students details in case of student and teacher's data in case of teacher.
3) Two tables one with 10 column to store user details. And another table with 20 columns to store students details in case of student and teacher's data in case of teacher.
4) Three tables one with 10 column to store user details. Another table with 20 columns to store students details and yet another table with 20 columns to store teacher's data
Single Table Inheritance and Class Table Inheritance are both fine. In fact Fowler has recommended STI for agile. And if you use a good ORM like Hibernate, the difference is trivial. If you use PostgreSQL your nulls won't take up any extra space either.
That being said, you should further normalize your tables (parents phone #s should be in a diff table for example). See https://dba.stackexchange.com/questions/12991/ready-to-use-database-models-example/23831#23831 for some help
You have to remember the principles of relational design. All the columns should be dependent on the key fields and only on the key fields.
It's better to go with choice 4, which gives three tables:
1) A base person table for the details (columns) that teachers and students both have.
2) A teacher table for details that pertain only to teachers. This relates to the base person table with a foreign key (just like table 3).
3) A student table for details that pertain only to students.
No extra empty columns and very flexible in the kind of queries (some of which that you are not anticipating) you will be able to do.
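The three tables could look roughly like this; a sketch in Python with sqlite3 rather than real MySQL DDL. The column names and sample values are placeholders for the 10 common and 20 + 20 specific fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE person (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL UNIQUE      -- plus the other common columns
);
CREATE TABLE teacher (
    person_id          INTEGER PRIMARY KEY REFERENCES person(id),
    affiliation_number TEXT NOT NULL   -- teacher-only columns
);
CREATE TABLE student (
    person_id    INTEGER PRIMARY KEY REFERENCES person(id),
    admission_no TEXT NOT NULL         -- student-only columns
);
""")

conn.execute("INSERT INTO person VALUES (1, 'ms_rao')")
conn.execute("INSERT INTO teacher VALUES (1, 'AFF-17')")
conn.execute("INSERT INTO person VALUES (2, 'kiran')")
conn.execute("INSERT INTO student VALUES (2, 'ADM-204')")

# Student details come back with one join; no NULL-padded teacher columns.
students = conn.execute("""
    SELECT p.username, s.admission_no
    FROM person AS p
    JOIN student AS s ON s.person_id = p.id
""").fetchall()

# A teacher row without a matching person row is rejected by the foreign key.
try:
    conn.execute("INSERT INTO teacher VALUES (99, 'AFF-00')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(students, orphan_rejected)
```

Using person_id as both primary key and foreign key in the detail tables is what enforces the one-to-one relationship.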
The first thing I thought of was a pig's ear relationship, a link entity so that you could have ID, teacherID, studentID to show which teachers teach which students, but then I realised this isn't what you asked for, so...
Why not just have a single boolean, true if teacher, false if not?
Look up these two tags: single-table-inheritance class-table-inheritance
These correspond to well known techniques that are like option 1 and option 4. There are situations where one or the other of these is best. The tag wikis (info) and the questions grouped under the tags will give you some additional help.

Data normalization - mysql

I searched a bit, as my case is similar to "multiple tables vs comma separated values", but it differs slightly. I have used both practices, but for a new requirement I am a bit confused, as there are around 3-4 columns that contain comma-separated values. What would you suggest?
Example (Existing Scenario):
Student Table
---------------------
id | name | subjectsEnrolled | gamesEnrolled | BooksIssued | absentOn
1 | Naya | yoga, maths | cricket, football | aka#date, baka#date,caka#date | 20121010,20130103
2 | sait | science, botany | football,chess,abc | prea#date, sunbaka#date | 20121010,20130103
Possible Fixes I have thought
Student Table (as student master data)
id name moredetailsofstudent like address & blah blah
1 Naya
2 sait
then having another table absentTable
StudentId absentOn
1 20121010
1 20121011
2 20121010
2 20121011
1 20121012
then having another tables master "booksTable" with unique ids of books & booksIssueTable
booksIssueTable
bookid issuedby dated
4 1 a date
2 1 a date
and almost same for games (2 tables) & subjects (2 table) and storing same way i am doing above for books.
gameid enrolledby
4 1
2 12
4 10
3 12
and so on...
Am I on the right track? That is what I want to know from the database and coding experts here.
Note: I am not creating a school management system; it's just an example, as I couldn't post real data. In the real scenario I have a lot of data in the main student table, around 10-20 lakh rows, so absentTable, booksIssueTable, gamesEnrollTable and subjectEnrollTable can each easily have 10 lakh records * 1-10 = around 30-90 lakh rows if I go with this solution. Please suggest.
Update
As of now, I have learnt that I am going in the right direction. What would anyone suggest for the dates in absentTable? Should I create a master table for the dates too, and then use its dateid to link to absentTable?
Yes, normalizing is better than storing comma-separated values. I gave a popular answer for the pitfalls of comma-separated values here: Is storing a delimited list in a database column really that bad?
As for your other question, no I wouldn't make a lookup table for simple date values. The scenario where that might be useful would be when you have many rows in AbsentTable that have the same date, and you would want to change the value for all these rows by updating a single row somewhere else. But I expect your AbsentTable would never need to do this.
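As a sketch of what storing the date directly looks like (Python with sqlite3; in MySQL the column would be a DATE rather than ISO text), here is an absent table without any date lookup table. The snake_case names are my own, and the rows mirror the example data in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE absent (
    student_id INTEGER NOT NULL REFERENCES student(id),
    absent_on  TEXT    NOT NULL,        -- ISO date stored directly, no date table
    PRIMARY KEY (student_id, absent_on)
);
""")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Naya"), (2, "sait")])
conn.executemany("INSERT INTO absent VALUES (?, ?)", [
    (1, "2012-10-10"), (1, "2012-10-11"),
    (2, "2012-10-10"), (2, "2012-10-11"),
    (1, "2012-10-12"),
])

# Grouping on the date works without any join.
per_day = conn.execute("""
    SELECT absent_on, COUNT(*) FROM absent
    GROUP BY absent_on ORDER BY absent_on
""").fetchall()
print(per_day)
```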
By the way, using an id to refer to an entry in another table has nothing to do with normalization. You can use natural values both as the primary key of the referenced table, and the foreign key in the referencing table, and still both tables comply with normal form.
You may also be interested in what I have written about comma-separated values and database normalization in my book, SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
You're doing exactly what's expected to solve your problem. Even with large amounts of data, proper m-n tables will serve you much better than values separated by commas in the same field.
So good job in finding that solution for yourself. Can't suggest any better.

Is it a bad idea to reference a row from a many-to-many table in another many-to-many table

If I have a schema similar to this:
TABLE 1
id
column
other_column
etc
TABLE 2
id
table1_id
some_other_table_id
Is it a good idea to add a third table like this:
TABLE 3
id
table2_id
row_from_another_table_id
EDIT:
To make things clearer, consider a schema like this:
EVENTS
id
name
other_stuff
RANGES
id
time_from
time_to
max_people
etc
EVENTS_PLACES
id
event_id
place_id
What I want to do is define a time range for an event. But a specific event in a specific place (EVENTS_PLACES) can override these ranges. Also, an event can have multiple ranges.
I hope this makes the question a little bit more clear now.
It's always been my impression that a many-to-many relationship is a violation of Boyce-Codd Normal Form and therefore a violation of a good relational database schema.
Therefore, relating data to a link table is, in fact, necessary to achieve BCNF, and therefore good if avoiding data update anomalies is good.
On to the specific schema example you presented. I think you want these logical tables (or entities),
-----------------------
EventClass
-----------------------
Id
Name
... Other attributes common to every instance
-
-----------------------
TimeSlot
-----------------------
Id
Start
End
-
-----------------------
Place
-----------------------
Id
Name
Address
MaxAttendance
... etc
-
----------------------
EventInstance
-----------------------
Id
EventClassId
TimeSlotId
PlaceId
PresenterName
...Other attributes specific to the instance
EventInstance is a relationship between EventClass, TimeSlot and Place. Any attributes specific to the EventInstance should be stored on that entity. Any attributes common to a related group of events should be stored on the EventClass entity.
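The entities above might translate into tables like these; a sketch in Python with sqlite3, not a definitive design. I renamed Start/End to StartsAt/EndsAt because END is an SQL keyword, and the UNIQUE constraint and sample rows are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE EventClass (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE TimeSlot (
    Id INTEGER PRIMARY KEY,
    StartsAt TEXT NOT NULL,
    EndsAt   TEXT NOT NULL
);
CREATE TABLE Place (
    Id INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    MaxAttendance INTEGER NOT NULL
);
CREATE TABLE EventInstance (
    Id            INTEGER PRIMARY KEY,
    EventClassId  INTEGER NOT NULL REFERENCES EventClass(Id),
    TimeSlotId    INTEGER NOT NULL REFERENCES TimeSlot(Id),
    PlaceId       INTEGER NOT NULL REFERENCES Place(Id),
    PresenterName TEXT,                         -- specific to this instance
    UNIQUE (EventClassId, TimeSlotId, PlaceId)  -- assumed: no duplicate instances
);
""")
conn.execute("INSERT INTO EventClass VALUES (1, 'SQL Workshop')")
conn.executemany("INSERT INTO TimeSlot VALUES (?, ?, ?)", [
    (1, "2024-05-01 09:00", "2024-05-01 11:00"),
    (2, "2024-05-01 14:00", "2024-05-01 16:00"),
])
conn.executemany("INSERT INTO Place VALUES (?, ?, ?)",
                 [(1, "Room A", 30), (2, "Room B", 12)])
conn.executemany("INSERT INTO EventInstance VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, "Dana"), (2, 1, 2, 2, "Lee")])

# One event class, held in two places at two different times.
schedule = conn.execute("""
    SELECT ec.Name, p.Name, ts.StartsAt, ei.PresenterName
    FROM EventInstance AS ei
    JOIN EventClass AS ec ON ec.Id = ei.EventClassId
    JOIN TimeSlot   AS ts ON ts.Id = ei.TimeSlotId
    JOIN Place      AS p  ON p.Id  = ei.PlaceId
    ORDER BY ts.StartsAt
""").fetchall()
print(schedule)
```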
It's all a question of database normalization. Generally speaking, the more normalized the data, the better. However, there is a case for compromise when performance is a concern: if the desired data is stored in the output format, it makes a select query simpler and faster, although updates might be hell.
I would counter the case for compromise by suggesting that, with the right indexes, materialized views, and indexes on materialized views, you can get the best of both worlds: the maintainability of fully normalized data with the speed of denormalized data. It does, however, require some skill and consideration to get the schema right.
So you have a relation between two tables with properties, and you have a subclass of that relation with some more properties. This is rare but possible.
Suppose in your polygamous hetero dating site one or more Woman entities has a relation with one or more Man. These two tables may be coupled with a junction table, Relationship. Now some of them are married, which you consider a special type of relationship. So Marriage is a subclass of Relationship, and the Marriage table has a reference to the id in the Relationship table.
Of course, it may be simpler to solve such situations in another way, for example to simply have two junction tables between Man and Woman. But there are certainly situations in which you would want to extend on the relationship in the junction table.
Another option would be to add a column to your TABLE2 that describes the nature of the connection between "things". For example, with a PERSON table and a RELATIONSHIPS table, you model your "objects" in the first table, then the "links" in the second, e.g.
+---------+---------+-------+--------+
| link_id | from_id | to_id | type |
+---------+---------+-------+--------+
| 1 | 2 | 3 | Mother |
| 2 | 8 | 3 | Sister |
+---------+---------+-------+--------+
With appropriate indices, this means you can do things like find all relationships for a given person, or find everyone who has a sister, etc. This is a simple example, but it starts to get interesting when the from_id and to_id can be different types of object i.e. not just people.
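Here is roughly how those two queries could look, sketched in Python with sqlite3. I am assuming that a row reads "from_id is the type of to_id" (so person 2 is the Mother of person 3); the person names and index names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE relationships (
    link_id INTEGER PRIMARY KEY,
    from_id INTEGER NOT NULL REFERENCES person(id),
    to_id   INTEGER NOT NULL REFERENCES person(id),
    type    TEXT    NOT NULL
);
-- Indexes to support lookups from either end of a link.
CREATE INDEX idx_rel_from    ON relationships (from_id);
CREATE INDEX idx_rel_to_type ON relationships (to_id, type);
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(2, "Ann"), (3, "Bea"), (8, "Cat")])
conn.executemany("INSERT INTO relationships VALUES (?, ?, ?, ?)",
                 [(1, 2, 3, "Mother"), (2, 8, 3, "Sister")])

# All relationships that involve a given person, on either side of the link.
links_for_bea = conn.execute("""
    SELECT link_id FROM relationships
    WHERE from_id = ? OR to_id = ?
    ORDER BY link_id
""", (3, 3)).fetchall()

# Everyone who has a sister: the to_id side of a 'Sister' link.
has_sister = conn.execute("""
    SELECT DISTINCT p.name
    FROM relationships AS r
    JOIN person AS p ON p.id = r.to_id
    WHERE r.type = 'Sister'
""").fetchall()
print(links_for_bea, has_sister)
```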
I've used this approach in the past when working with a very generic schema that was aggregating data from a variety of other sources and had to be flexible. Clearly there's a trade-off between flexibility and, e.g., speed and query complexity. You have to decide whether it's useful in your case.

Database design to allow for time

So, I want a table of data I have to be represented in time.
That is, I want to collect the data over time and have it record when that is.
Currently, the table has an id, a column with a category name, and a column for a value.
There are about ten categories - each of them has a value associated with the number of occurrences right now.
What's the most efficient way to denote time in this circumstance? Assign a timestamp column to each row? And then when I want to show this information, merely collect via timestamp? I'm really flying blind here.
Would this require having multiple, repeated values of the category column?
This is all in MySQL.
Let me clarify:
Three columns:
ID | Category | Value
I want to record the values over time - I am going to be making a parse once per hour.
The categories will be like:
Happy
Sad
Angry
etc
How do I record time data with this? I want to keep the values per hour.
I was thinking about just repeating all of the categories with timestamp data. Would that be the best idea?
So you'd have
Category | Value | Time
Happy | 0 | 072012
Happy | 2 | 072112
Happy | 1 | 072212
etc - this is what I'm thinking should be done to store constantly changing data.
Is the timestamp data related to the category, or to a child of the category? If it is related to the category, then adding a timestamp field to the category is probably best. If it is a concept that is now (or in the future) related to transactions in the child (so the child would already have this info), I'd probably use a query/view (or a materialized view if there were performance issues). It might also depend on how often you see that timestamp changing and on the likely data access requirements for subordinate tables.
MeasuredAt is the timestamp
You may use FeelingName instead of int ID too, it will just use more space.
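The diagram this answer refers to is not shown here, so the following is only a guess at its shape, sketched in Python with sqlite3: a Feeling lookup table and a Measurement table with one row per feeling per parse. Apart from MeasuredAt and FeelingName, the table and column names and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Feeling (
    FeelingId   INTEGER PRIMARY KEY,
    FeelingName TEXT NOT NULL UNIQUE
);
CREATE TABLE Measurement (
    FeelingId  INTEGER NOT NULL REFERENCES Feeling(FeelingId),
    MeasuredAt TEXT    NOT NULL,        -- when the hourly parse ran
    Value      INTEGER NOT NULL,
    PRIMARY KEY (FeelingId, MeasuredAt) -- one value per feeling per parse
);
""")
conn.executemany("INSERT INTO Feeling VALUES (?, ?)",
                 [(1, "Happy"), (2, "Sad"), (3, "Angry")])
conn.executemany("INSERT INTO Measurement VALUES (?, ?, ?)", [
    (1, "2012-07-20 10:00", 0),
    (1, "2012-07-20 11:00", 2),
    (1, "2012-07-20 12:00", 1),
    (2, "2012-07-20 10:00", 4),
])

# History of one category over time.
happy_history = conn.execute("""
    SELECT m.MeasuredAt, m.Value
    FROM Measurement AS m
    JOIN Feeling AS f ON f.FeelingId = m.FeelingId
    WHERE f.FeelingName = ?
    ORDER BY m.MeasuredAt
""", ("Happy",)).fetchall()
print(happy_history)
```

Each category name is stored once in Feeling, so the hourly rows repeat only the small integer id rather than the name.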
EDIT