I saw a lot of questions here but no one fits with my problem. I'm trying to create an ER model scalable, and if I want to add more data don't break almost anything, so what I've trying to create is :
There are 2 types of users, let's say Admin and Worker, and they have different roles.
Admin can do a CRUD of questions, and also can create a room where the users can join to play together (this is just a name, something like Kahoot! does) but maybe is a good idea to create more attributes inside of it like, WHO is playing in this room, POINTS for everyone but let's talk it afterwards when I show you the design.
Ok the thing is, on my design I have :
Table User which contains :
_id
username
password
date_creation
This is a default one, but then I'm wondering how do I define the Role if it's an Admin or a Worker, something like isAdmin:true and then I check this Bool? Or I can create another table that is Role and connect it to User table?
But maybe I have to create a table for both, I mean there's an Admin which has a password, and some functionalities and then ther'es the user Worker which has another password and another functionalities.
Then I'd like to have the Question table where contains :
_id
question_name
answers[1,2,3,4]
correctAnswer or answers because it can be multi option chooser
topic
isExamQuestion
dificulty
Then the Room table should contains :
_id
name
capacity
type (game can be as a group or solo) that's why this attribute
exam (This should be a flag to know if this question is for an exam or not (It can be for EXAM or PRACTISE)
ranking (This is the top X from 1 to X)
don't think if I have to add the winner here because if I get the position 0 from ranking I get the winner...
There's a table named Topic as well, if my question have a topic then I can select the question by Topic. An example of Topic should be Math so user can do only exams or do tests with math questions.
_id
Name
Questions[...]
Then I have to store like a historic about what are the questions worker has answered correct and what did not, to make some statistics, but I need to store some historicals for Admin to see in this topic the average that Workers have failed more is : Question23 (for instance) something like that.
What I'm missing, could you try to help me to figure it out how to make this design better?
NOTE : I'm using Spring for server side, Angular for Frontend stuff, and Android for App, I can change anything to work faster/better with this database though.
EDIT
There's the flow of the game if you need more details and if I'm explainted wrong .
Admin flow
Create questions (with different kinds of answers like True/false, with a checkbos (single and multianswer), text, etc...)
Create a "game" where workers can join (This is mostly programming stuff) but it should be a room with attributes there, like id of the room, maxNumber, type (exam), and store historicals, theres also a type of game (for instance, images, videos, etc..)
See statistics about Workers it means see how many answers they answered correct, failed, see per topic (This is like joins and stuff, but the design has to be good done)
See historic of the exams that he did before with all of the information (participant, score, time, stuff)
And the Worker flow is
He can practise means that he's answering questions randomly or by topic (every single answer should be saved for statistics and to avoid repeat the ones he respons correct), also he can do exams (not multiplayer) just an option that Admin can check if the question is part of an exam or not.
And then the room stuff, he can join with the Id.
If you need further clarification let me know and I'll reply you as soon as possible.
In fact, your system has three logical parts (modules):
users module that contains user data and implements authentication and the authorization of user actions
questionnaires module that includes management of questions and answer
questionnaires history module that contains history by each user
Database design of those modules can look as follows
USER MODULE:
role - contains user roles in the system
id - unique identifier of the role
name - the role name, for example, admin, worker, etc.
user - contains users and information about roles were assigned to them
id - unique identifier of the user
username
password
role_id - identifier the role was assigned to the user
QUESTIONNAIRES MODULE:
topic - contains question themes
id - unique identifier of the theme
name - a name of the theme
question - contains questions
id - unique identifier of the question
topic_id - topic identifier of the question
text - content of the question
is_exam_question - exam question or not
type - type of answers (boolean, checkboxes or etc.)
difficulty
answer - contains all answers of questions
id - unique identifier of the answer
question_id - identifier of the question that contains the answer
text - content of the question
is_correct - flag that means the answer is true or false
room - contains information about rooms
id - unique identifier of the rum
name - name of the rum
capacity - the maximum number of workers which can join to the room
type - room type: group, solo or etc.
learing_type - room type: exam, practice or etc.
user_in_room - contains information about users which were joined to the room
user_id - identifier of the user that was joined to the room
room_id - identifier of the room
score - current score of the user in the room
HISTORY MODULE:
user_question_history - contains information about questions which were answered by the user
user_id - identifier of the user
room_id - identifier of the room where the user answered questions
question_id - identifier of the question that was answered by the user
score - user score by the question
user_answer_history - contains information about answers which were chosen by the user
user_id - identifier of the user
room_id - identifier of the room where the user answered questions
question_id - identifier of the question that was answered by the user
answer_id - identifier of the answer that was chosen the user
Usage of this schema gives the ability to build different reports. For example, you can display the result of all users by room
SELECT r.id,
r.name,
u.username,
ur.score
FROM room as r
LEFT JOIN user_in_room as ur ON ur.room_id = r.id
LEFT JOIN user as u ON u.id = ur.user_id
WHERE r.id = <id>
Or you can see detail information about answers of the user
SELECT
q.text,
a.text
FROM user_in_room as ur ON ur.room_id = r.id
LEFT JOIN user_question_history as uqh ON ugh.user_id = ur.user_id AND ugh.root_id = ur.room_id
LEFT JOIN question as q ON q.id = ugh.question_id
LEFT JOIN user_answer_history as uah ON uah.user_id = ugh.user_id AND uah.room_id = ugh.room_id AND uah.question_id = ugh.question_id
LEFT JOIN answer as a ON a.id = uah.answer_id
WHERE ur.room_id = <id> AND ur.user_id = <id>
Honestly, if you're certain you'll only have to possible types--both now and in the future--a simple bool isAdmin in your user table will work nicely. Everything else the admin does can be handled from the programming side. No need to clog up your db.
That said, if there's even a small chance you will have other types of users in the future, Maxim's answer is the way to go. That way, adding another role such as "Editor," for instance, is a simple as adding a record to the "Role" table. In fact, adding 1000 new types users is as simple as adding records to your "Role" table. Because you already have a table that you look to for the role, your code doesn't have to worry about all the possible types of users (which is a pain if you have a lot of them), only the ones it needs. The db is handling the rest.
One drawback to Maxim's answer is that it takes more work to implement in the db, and more work to view/update the role of the user.
To implement in the db, you need to create a whole extra table and make sure it's linked properly. This isn't hard, but, especially if you're new to dbs, is extra work that might not be worth it to you. This also means, from a maintenance side, that you have another table to keep tabs on. Not a huge deal, but, again, something to think about.
From the code side, this creates extra chores as well. For one, the user type is no longer directly in your users table--an ID is. This means when you want to know the name of the user type, you will have to either a) query the db for that user type based on the ID you have, or b) join the 2 tables, which can be tricky at times. Updating the role also has similar chores.
Again, these aren't huge problems, just things to think about. It might not be worth the extra work if you will only have two possible options.
So, in short, a boolean will work and is simple to implement, but is not scalable. Maxim's answer is quite scalable and makes future expansion easier, but somewhat harder to implement and maintain. You'll have to make the decision what you prefer.
You need a Profile table with one-to-many with User table, this way if another privelege you want to apply, just add new profile entry:
User table:
Id
Username
Fullname
Profile_id
...
Profile table:
Id
Name
Description
Exam and Question tables are related with many-to-many, to break it you will have a third table Question_Exam:
Question_Exam:
id_exam
Id_Question
Answer(provided)
Id_user(taking the exam)
IsCorrect(boolean, to avoid farther queries )
date
Topic and Question are one-to-many
Question table:
Id
Name
Answer(The correct one)
Id_topic
The other structure is fine.
Related
I'm about to develop a database used for storing questionnaire with a indefinite number of questions. This database will communicate with a server, used for registering the answers and retrieving them when needed. For each user, I need to store the answer to each question, which is server-side mandatory.
The structure should be like that:
QUESTIONNAIRE:
Questionnaire n.1 (Question1,Question2,...,QuestionN)
Questionnaire n.2 (Question1,...,QuestionM)
...
USER:
User1
User2
...
ANSWERS
User1 - Questionnaire n.1 (AnswToQuestion1, AnswToQuestion2...AnswToQuestionN)
User2 - Questionnaire n.1 (AnswToQuestion1, AnswToQuestion2...AnswToQuestionN)
User2 - Questionnaire n.2 (AnswToQuestion1, AnswToQuestion2...AnswToQuestionM)
Given the specifications above, I face 2 major problems:
Indefinite number of questions means that a table must store each individual questions for each questionnaire
There should be a corrispondence between question and answer
I thought about 2 different solution for facing such problems, but I can really decide between the 2.
The first one would consist in creating 5 differents table, as follows:
The idea is basically to store users in USER table, and for each user store an ANSWER which refers to a specific questionnaire. This gives an information like 'user x has replied to questionnaire'. Then, the unique id id:ANSWER would be used to mark a given answer in MARKETINGANSWER. The QUESTIONNAIRE table would save each instance of a questionnaire, and provide an unique id to link each different question in the MARKETINGQUESTION table. At the end, the unique id id:MARKETINGQUESTION would be used to map a question to a given answer.
What does not convince me: this has too many layers of indirection. I do not like the forced connection of MARKETINGANSWER and MARKETINGQUESTION, yet I need a way to map them. Moreover, I need the general table for ANSWER, since it provides some general info about the questionnaire (the hidden columns).
Then I thougth about a second approach: creating an unique ANSWER and QUESTIONNAIRE table which would envelop in a TEXT attribute the series of answers/questions separated by a special escape character (such as #&, for instance). This would ruin the mapping, but it can be implemented in the web application - I was thinking about a control on the number of question/answer, the ordering would be ensured on questionnaire registration and commit on the database. Moreover, the schema would be cleaner. Yet, I would lose a consistent representation in the database schema.
I would like, if possible, some suggestion about the proposed solutions. I am pretty sure there is something simple I cannot quite grasp.
Thank you in advance.
So I have this application that I'm drawing up and I start to think about my users. Well, My initial thought was to create a table for each group type. I've been thinking this over though and I'm not sure that this is the best way.
Example:
// Users
Users [id, name, email, age, etc]
// User Groups
Player [id, years playing, etc]
Ref [id, certified, etc]
Manufacturer Rep [id, years employed, etc]
So everyone would be making an account, but each user would have a different group. They can also be in multiple different groups. Each group has it's own list of different columns. So what is the best way to do this? Lets say I have 5 groups. Do I need 8 tables + a relational table connecting each one to the user table?
I just want to be sure that this is the best way to organize it before I build it.
Edit:
A player would have columns regarding the gear that they use to play, the teams they've played with, events they've gone to.
A ref would have info regarding the certifications they have and the events they've reffed.
Manufacturer reps would have info regarding their position within the company they rep.
A parent would have information regarding how long they've been involved with the sport, perhaps relations with the users they are parent of.
Just as an example.
Edit 2:
**Player Table
id
user id
started date
stopped date
rank
**Ref Table
id
user id
started date
stopped date
is certified
certified by
verified
**Photographer / Videographer / News Reporter Table
id
user id
started date
stopped date
worked under name
website / channel link
about
verified
**Tournament / Big Game Rep Table
id
user id
started date
stopped date
position
tourney id
verified
**Store / Field / Manufacturer Rep Table
id
user id
started date
stopped date
position
store / field / man. id
verified
This is what I planned out so far. I'm still new to this so I could be doing it completely wrong. And it's only five groups. It was more until I condensed it some.
Although I find it weird having so many entities which are different from each other, but I will ignore this and get to the question.
It depends on the group criteria you need, in the case you described where each group has its own columns and information I guess your design is a good one, especially if you need the information in a readable form in the database. If you need all groups in a single table you will have to save the group relevant information in a kind of object, either a blob, XML string or any other form, but then you will lose the ability to filter on these criteria using the database.
In a relational Database I would do it using the design you described.
The design of your tables greatly depends on the requirements of your software.
E.g. your description of users led me in a wrong direction, I was at first thinking about a "normal" user of a software. Basically name, login-information and stuff like that. This I would never split over different tables as it really makes tasks like login, session handling, ... really complicated.
Another point which surprised me, was that you want to store the equipment in columns of those user's tables. Usually the relationship between a person and his equipment is not 1 to 1 and in most cases the amount of different equipment varies. Thus you usually have a relationship between users and their equipment (1:n). Thus you would design an equipment table and there refer to the owner's user id.
But after you have an idea of which data you have in your application and which relationships exist between your data, the design of the tables and so on is rather straitforward.
The good news is, that your data model and database design will develop over time. Try to start with a basic model, covering the majority of your use cases. Then slowly add more use cases / aspects.
As long as you are in the stage of planning and early implementation phasis, it is rather easy to change your database design.
Assuming I want to have a web application that requires storing user information, images, etc as well as storing status updates or posts/comments would I want to separate tables?
For example if I have a "users" table that contains users information like passwords, emails, and typical social networking info like age, location etc. Would it be a good idea do create a second table("posts") that handles user content such as comments and/or post?
Table one: "users"
UserID
Username
Age
etc.
Table Two: "posts"
PostID
PostContent
PostAuthor
PostDate
etc
Is this a valid organization? Furthermore if I wanted to keep track of media should I do this in ANOTHER table?
Table Three: "media"
ID
Type
Uploader
etc.
Any help is much appreciated. I'm curious to see if I'm on the right track or just completely lost. I am mostly wondering if I should have many tables or if I should have larger less segregated tables.
Also of note thus far I planned on keeping information such as followers(or friends) in the 'users' table but I'm not sure that's a good idea in retrospect.
thanks in advance,
Generally speaking to design a database you create a table for each object you will be dealing with. In you example you have Users, Posts, Comments and Media. From that you can flesh out what it is you want to store for each object. Each item you want to store is a field in the table:
[Users]
ID
Username
PasswordHash
Age
Birthdate
Email
JoinDate
LastLogin
[Posts]
ID
UserID
Title
Content
CreateDate
PostedDate
[Comments]
ID
PostID
UserID
Content
[Media]
ID
Title
Description
FileURI
Taking a look above you can see a basic structure for holding the information for each object. By the field names you can even tell the relationships between the objects. That is a post has a UserID so the post was created by that user. the comments have a PostID and a UserID so you can see that a comment was written by a person for a specific post.
Once you have the general fields identified you can look at some other aspects of the design. For example right now the Email field under the Users table means that a user can have one (1) email address, no more. You can solve this one of two ways... add more email fields (EmailA, EmailB, EmailC) this generally works if you know there are specific types of emails you are dealing with, for example EmailWork or EmailHome. This doesn't work if you do not know how many emails in total there will be. To solve this you can pull the emails out into its own table:
[Users]
ID
Username
PasswordHash
Age
Birthdate
JoinDate
LastLogin
[Emails]
ID
UserID
Email
Now you can have any number of emails for a single user. You can do this for just about any database you are trying to design. Take it in small steps and break your bigger objects into smaller ones as needed.
Update
To deal with friends you should think about the relationship you are dealing with. There is one (1) person with many friends. In relation to the tables above its one User to many Users. This can be done with a special table that hold no information other than the relationship you are looking for.
[Friends]
[UserA]
[UserB]
So if the current user's ID is in A his friend's ID is in B and visa-verse. This sets up the friendship so that if you are my friend, then I am your friend. There is no way for me to be your friend without you being mine. If you want to setup the ability for one way friendships you can setup the table like this:
[Friends]
[UserID]
[FriendID]
So If we are both friends with each other there would have to be 2 records, one for my friendship to you and one for your freindship to me.
You need to use multiple tables.
The amount of tables depends on how complex you want your interactive site to be. Based on what you have posted you would need a table that would store information about the users, a table for comments, and more such as a table to store status types.
For example tbl_Users should store:
1. UserID
2. First Name
3. Last name
4. Email
5. Password (encrypted)
6. Address
7. City
8. State
9. Country
10. Date of Birth
11. UserStatus
12. Etc
This project sounds like it should be using a relational DB that will pull up records, such as comments, by relative userIDs.
This means that you will need a table that stores the following:
1. CommentID (primary key, int, auto-increment)
2. Comment (text)
3. UserID (foreign key, int)
The comment is attached to a user through a foreign key, which is essentially the userId from the tbl_Users table. You would need to combine these tables in an SQL statement with your script to query the information as a single piece of information. See example code
$sql_userWall = "SELECT tbl_Users.*, tbl_Comments.*, tbl_userStatus FROM tbl_Users
INNER JOIN tbl_Comments ON tbl_Users.userID = tbl_Comments.userID
INNER JOIN tbl_UserStatus ON tbl_Users.userID = tbl.UserStatus
WHERE tbl_Users.userID = $userID";
This statement essentially says get the information of the provided user from the users table and also get all the comments with that has the same userID attached to it, and get the userStatus from the table of user status'.
Therefore you would need a table called tbl_userStatus that held unique statusIDs (primary key, int, auto-incrementing) along with a text (varchar) of a determined length that may say for example "online" or "offline". When you started the write the info out from e record using php, asp or a similar language the table will automatically retrieve the information from tbl_userStatus for you just by using a simple line like
<?php echo $_REQUEST['userStatus']; ?>
No extra work necessary. Most of your project time will be spent developing the DB structure and writing SQL statements that correctly retrieve the info you want for each page.
There are many great YouTube video series that describe relational DBS and drawing entity relational diagrams. This is what you should look into for learning more on creating the tye of project you were describing.
One last note, if you wanted comments to be visible for all members of a group this would describe what is known as a many-to-many relationship which would require additional tables to allow for multiple users to 'own' a relationship to a single table. You could store a single groupID that referred to a table of groups.
tbl_groups
1. GroupID
2. GroupName
3. More group info, etc
And a table of users registered for the group
Tbl_groupMembers
1. membershipCountID (primary key, int, auto-increment)
2. GroupID (foriegn key, int)
3. UserID (foriegn key, int)
This allows users to registrar for a group and inner join them to group based comments. These relationships take a little more time to understand, the videos will help greatly.
I hope this helps, I'll come back and post some YouTube links later that I found helpful learning this stuff.
Let us suppose I have a site with a certain number of users with the following three distinguishing characteristics:
1) The user is part of a network. (The site contains multiple networks.)
2) The user is a 'contact' of a certain number of other site members.
3) Individual documents uploaded by a user may be shared with certain contacts (excluding other contacts).
In this way, a user's document search is unique for each user based upon his or her network, contacts, and additional documents that have been shared with that user. What would be possible ways to address this -- would I need to append a long unique SQL query for each user for each of his or her searches? I am currently using MySQL as a database -- would using this be sufficient, or would I need to move towards a NoSQL option here to maintain the performance of a similar non-filtered search?
A few questions come to mind to help answer this question:
How many documents do you think the average user will have access to? Will many documents in the network be shared for all to see?
How will users be able to find documents and what do the documents look like? Will they only be able to search by the contact that shared it? By a simple title match? Will they be able to run a full text search against the document's contents?
Depending on the answer to those two questions, a relational system could work just fine, which I'm guessing is preferable since you are already using MySql. I think you could locate the documents for an individual user in a relational system with a few very reasonable queries.
Here is a potential bare bones schema
User
--all users in the system
UserId int
NetworkId int (Not sure if this is a 1 to many relationship)
Document
--all documents in the system
DocumentId int
UserId int -- the author
Name varchar
StatusId -- perhaps a flag to indicate whether it is public or not, e.g. shared with everyone in the same network or shared with all contacts
UserDocumentLink
--Linking between a document and the contacts a user has shared the document with
DocumentId
ContactId
UserContact
--A link between a user and all of their contacts
ContactId -- PK identity to represent a link between two users
UserId -- User who owns the contact
ContactUserId --The contact user
Here is a potential "search" query:
--documents owned by me
SELECT DocumentId
from Document where UserId = #userId
UNION
--documents shared with me explicitly
SELECT DocumentId
From UserContact uc
InnerJoin UserDocumentLink ucl on uc.ContactId = ucl.ContactId
Where
uc.ContactUserId = #userId
UNION
--documents shared with me via some public status, using a keyword filter
Select DocumentId
From Document d
inner join User u on d.UserId = u.UserId
where
u.NetworkId = #userNetworkId
and d.status in ()
and d.Name like '%' + #keyword + '%'
I think what might be a more influential requirement for schema design is one that is not mentioned in your question - how will users be able to search through documents? And what kind of documents are we talking about here? MySql is not a good option for full text search.
It rather depends on what you mean by a "certain number" of users. If you mean a few tens of thousands, then almost any solution can be made to perform adequately. If you mean many millions, then a NoSQL solution may scale up more cheaply and easily.
I suspect that a more general SQL query can be used, rather than a unique one for each user, e.g. selecting documents that belong to users that know the current user, that are marked as being shared with the current user, and match the search string.
Denormalisation can probably be used (as is common in NoSQL approaches) to improve performance.
However, a graph database (as Peter Neubauer suggests) possibly in combination with a document store (CouchDB, MongoDB or Cassandra) would work very well for this type of problem and would scale well.
I would take a look at some of the NOSQL solutions, for this interconnected dataset possibly Neo4j, a Graph Database. It's even pretty straightforward to query it through Cypher so that you get tabular results back.
As others have pointed out the number of users and the frequency of requests (traffic volume) must be looked at. Also, how important is redundancy? How likely are people to work on same documents simultaneously? Are most documents created once and distributed for "readonly" purposes?
NoSQL can help you scale and get redundancy in a much easier way compared to rdbms for this particular scenario. I am assuming that at some point you will want tagging etc. to be enabled on the documents.
Now, I am wondering if there is any particular reason why you are not looking at off the shelf document management and CMS system for this? I am sure there is a good reason, but it might be worth looking at all the those options too.
I hope this helps. Good luck!
Denormalization will give you better read-search performance in this
case.
Don't normalize users, keep frequently joined entities like owner and
text, in one table
e.g. keep names of the owners as FK on text table, to keep their
names on the text table and decrease number of joins, then you can
use sql freely.
I've managed this using long unique queries in MySQL as you suggest for a small-scale social networking project. Nowadays I would suggest using solr and keeping permission information as a denormalized array of interchangeable keywords on each document. Say each network has a unique recognizable code (ie 100N-20000N), similar for users and special permission grants. You can store an array of permission keys, like "5515N 43243N 2342N 603U 203PG 44321PG" and treat those as keywords when searching.
I would address it with a simple business process solution, which will lead to a simple data schema, a simple query and so performances and scalabilty:
Each User has a list of documents... Period.
This list is in fact a list of references to documents in a document table (with owner/security informations...)
When sharing a document to another user this document reference is added to the user's document list (Tagged as a shared one if you want), user is added to the document security list (with permission level for example).
sql query to get documents is a simple: select documentid from userdocument where userid=#userid
With a join on document table, proper indexes and sql tuning it will run with all needed informations and it will run fast.
I hope i understood well what you try to do.
-< = one to many
>-< = many to many (will require link table)
Network -< user -< documents >-< contact(user)
v
|
^
contacts(user,user)
This is relational, I don't see a good reason to go NoSQL unless you have a billion users
Network (unless you can belong to more than one) is an attribute of user
contacts will be maitained in the link table user_contact(user,user)
tables
documents(doc_id,user_id)
user(user_id)
contacts(user_id,c_user_id) with foreign keys on users
document_contact(doc_id,c_user_id) where a trigger constrains the c_user_id
then you get a view for all docs owners and subscribers (contacts)
CREATE OR REPLACE VIEW user_docs AS
SELECT d.user_id, d.doc_id, 'owner' AS role
FROM documents d
JOIN users u ON d.user_id = u.user_id
UNION
SELECT c.user_id, d.doc_id, 'subscriber' AS role
FROM documents d
JOIN contacts c ON d.user_id = c.c_user_id;
you can then filter the view against the document contacts,
select * from user_docs ud
where
(ud.role = 'originator'
or
ud.doc_id in (select doc_id from document_contact dc where ud.doc_id = dc.doc_id)
) and ud.user_id = 'me'
I would trade off immediateness with performance when it comes to full text searching.
I would create a hash table of the user combinations with the documents on a separate thread usually triggered by an asynchronous call when user associations change.
I then query the hash value + other search criteria. This will eliminate the need for the long SQL that appears at the end which may cause a lock.
In Meetup.com, when you join a meetup group, you are usually required to complete a profile for that particular group. For example, if you join a movie meetup group, you may need to list the genres of movies you enjoy, etc.
I'm building a similar application, wherein users can join various groups and complete different profile details for each group. Assume the 2 possibilities:
Users can create their own groups and define what details to ask users that join that group (so, something a bit dynamic -- perhaps suggesting that at least an EAV design is required)
The developer decides now which groups to create and specify what details to ask users who join that group (meaning that the profile details will be predefined and "hard coded" into the system)
What's the best way to model such data?
More elaborate example:
The "Movie Goers" group request their members to specify the following:
Name
Birthdate (to be used to compute member's age)
Gender (must select from "male" or "female")
Favorite Genres (must select 1 or more from a list of specified genres)
The "Extreme Sports" group request their member to specify the following:
Name
Description of Activities Enjoyed (narrative form)
Postal Code
The bottom line is that each group may require different details from members joining their group. Ideally, I would like anyone to create a group (ala MeetUp.com). However, I also need the ability to query for members fairly well (e.g. find all women movie goers between the ages of 25 and 30).
For something like this....you'd want maximum normalization, so you wouldn't have duplicate data anywhere. Because your user-defined tables could possibly contain the same type of record, I think that you might have to go above 3NF for this.
My suggestion would be this - explode your tables so that you have something close to 6NF with EAV, so that each question that users must answer will have its own table. Then, your user-created tables will all reference one of your question tables. This avoids the duplication of data issue. (For instance, you don't want an entry in the "MovieGoers" group with the name "John Brown" and one in the "Extreme Sports" group with the name "Johnny B." for the same user; you also don't want his "what is your favorite color" answer to be "Blue" in one group and "Red" in another. Any data that can span across groups, like common questions, would be normalized in this form.)
The main drawback to this is that you'd end up with a lot of tables, and you'd probably want to create views for your statistical queries. However, in terms of pure data integrity, this would work well.
Note that you could probably get away with only factoring out the common fields, if you really wanted to. Examples of common fields would include Name, Location, Gender, and others; you could also do the same for common questions, like "what is your favorite color" or "do you have pets" or something to that extent. Group-specific questions that don't span across groups could be stored in a separate table for that group, un-exploded. I wouldn't advise this because it wouldn't be as flexible as the pure 6NF option and you run the risk of duplication (how do you predetermine which questions won't be common questions?) but if you really wanted to, you could do this.
There's a good question about 6NF here: Would like to Understand 6NF with an Example
I hope that made some sense and I hope it helps. If you have any questions, leave a comment.
Really, this is exactly a problem for which SQL is not a right solution. Forget normalization. This is exactly the job for NoSQL document stores. Every user as a document, having some essential fields like id, name, pwd etc. And every group adds possibility to add some fields. Unique fields can have names group-id-prefixed, shared fields (that grasp some more general concept) can have that field name free.
Except users (and groups) then you will have field descriptions with name, type, possible values, ... which is also very good for a document store.
If you use key-value document store from the beginning, you gain this freeform possibility of structuring your data plus querying them (though not by SQL, but by the means this or that NoSQL database provides).
First i'd like to note that the following structure is just a basis to your DB and you will need to expand/reduce it.
There are the following entities in DB:
user (just user)
group (any group)
template (list of requirement united into template to simplify assignment)
requirement (single requirement. For example: date of birth, gender, favorite sport)
"Modeling":
**User**
user_id
user_name
**Group**
name
group_id
user_group
user_id (FK)
group_id (FK)
**requirement**:
requirement_id
requirement_name
requirement_type (FK) (means the type: combo, free string, date) - should refers to dictionary)
**template**
template_id
template_name
**template_requirement**
r_id (FK)
t_id (FK)
The next step is to model appropriate schema for storing restrictions, i.e. validating rule for any requirement in any template. We have to separate it because for different groups the same restrictions can be different (for example: "age"). You can use the following table:
**restrictions**
group_id
template_id
requirement_id (should be here as template_id because the same requirement can exists in different templates and any group can consists of many templates)
restriction_type (FK) (points to another dict: value, length, regexp, at_least_one_value_choosed and so on)
So, as i said it is the basis. You can feel free to simplify this schema (wipe out tables, multiple templates for group). Or you can make it more general adding opportunity to create and publish temaplate, requirements and so on.
Hope you find this idea useful
You could save such data as JSON or XML (Structure, Data)
User Table
Userid
Username
Password
Groups -> JSON Array of all Groups
GroupStructure Table
Groupid
Groupname
Groupstructure -> JSON Structure (with specified Fields)
GroupData Table
Userid
Groupid
Groupdata -> JSON Data
I think this covers most of your constraints:
users
user_id, user_name, password, birth_date, gender
1, Robert Jones, *****, 2011-11-11, M
group
group_id, group_name
1, Movie Goers
2, Extreme Sports
group_membership
user_id, group_id
1, 1
1, 2
group_data
group_data_id, group_id, group_data_name
1, 1, Favorite Genres
2, 2, Favorite Activities
group_data_value
id, group_data_id, group_data_value
1,1,Comedy
2,1,Sci-Fi
3,1,Documentaries
4,2,Extreme Cage Fighting
5,2,Naked Extreme Bike Riding
user_group_data
user_id, group_id, group_data_id, group_data_value_id
1,1,1,1
1,1,1,2
1,2,2,4
1,2,2,5
I've had similar issues to this. I'm not sure if this would be the best recommendation for your specific situation but consider this.
Provide a means of storing data as XML, or JSON, or some other format that delimits the data, but basically stores it in field that has no specific format.
Provide a way to store the definition of that data
Provide a lookup/index table for the data.
This is a combination of techniques indicated already.
Essentially, you would create some interface to your clients to create a "form" for what they want saved. This form would indicated what pieces of information they want from the user. It would also indicate what pieces of information you want to search on.
Save this information to the definition table.
The definition table is then used to describe the user interface for entering data.
Once user data is entered, save the data (as xml or whatever) to one table with a unique id. At the same time, another table will be populated as an index with
id where the xml data was saved
name of field data is stored in
value of field data stored.
id of data definition.
now when a search commences, there should be no issue in searching for the information in the index table by name, value and definition id and getting back the id of the xml/json (or whatever) data you stored in the table that the data form was stored.
That data should be transformable once it is retrieved.
I was seriously sketchy on the details here, I hope this is enough of an answer to get you started. If you would like any explanation or additional details, let me know and I'll be happy to help.
if you're not stuck to mysql, i suggest you to use postgresql which provides build-in array datatypes.
you can define a define an array of varchar field to store group specific fields, in your groups table. to store values you can do the same in the membership table.
comparing to string parsing based xml types, this array approach will be really fast.
if you dont like array approach you can check out xml datatypes and an optional hstore datatype which is a key-value store.