MySQL: Best DB model for a "User referral" system - mysql

I'm modeling a DB for an application where one of the functions is to get a user from the DB and display in a diagram the selected user and all the referrals of that selected user, and the referrals (if any) for the selected user's referrals, going that way up to 3 referral levels.
I have two theories on how to model a scheme to accomplish this, but I don't know which one is the "best" (in terms of optimization, normalization, etc).
We have one scheme where the referrals are stored in a different table, with only a BOOLEAN to show if the user is, in fact, referred from another user.
On the other hand, I can substitute the BOOLEAN with a nullable INT (if referred, just store an INT, if null meaning is not referred by anyone).
If there is a better way to accomplish this, suggestions are also welcomed. Thank you.

I'd suggest a third model. You're talking a model where you can optionally have a referral, so perhaps a table that joins twice to the users table, the first column the refered persons ID, and in the second column, the referring users ID.
Then you know if there is a referral by joining to this table in your queries

I would use the second design and perhaps add a closure table for easily finding the referrals across multiple levels (see: http://wiki.pentaho.com/display/EAI/Closure+Generator)

Related

ERD: how to store data for three different payment types

Let’s say a user has 3 different ways for payment. Cash, IBAN, and through providing certain_document. Each payment type requires its own different details.
How can I store this in my database?
Let’s say the user has chosen to pay using his IBAN, assuming this picture is the current database, do I fill the fields associated with the IBAN option and set the others to Null? Or is there a more professional way to store the data without having these Null values?
UPDATE
I found a solution to this problem in the answer to this question, however, the answer is still not sufficient. If anybody has a link to a more detailed documents please let me know.
As #philipxy noted, you're asking about representing inheritance in a RDBMS. There are a few different ways to do this:
Have all of your attributes in one table (which, based on your screenshot, is what you have now). With this approach, it would be best to store NULLs in non-applicable columns---if nothing else, the default settings for InnoDB tables uses the compact row format, which means NULL columns don't take up extra storage. Of course, your queries can get complex, and maintaining these tables can become cumbersome.
Have child tables to store your details:
Payments (PaymentID, PaymentDate, etc.)
CashPaymentDetails(PaymentID, Cash_Detail_1, Cash_Detail_2, etc.)
IBANPaymentDetails(PaymentID, IBAN_Detail_1, etc.)
You can get the information for each payment by joining the base payment table with one of the "subsidiary" tables:
SELECT *
FROM Payments P INNER JOIN CashPaymentDetails C ON
C.PaymentID = P.PaymentID
Your third option is to use the entity-attribute-value (EAV) model. Like with Option 2, you have a base Payment table. However, instead of having one table for each payment method, you have one subsidiary table that contains the payment details. For more information, here's the Wiki page, and here's a blog with some additional information.

Access query is duplicating unique records / Linked table issues

I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities which the organisation wants to track, hence all registered clients go into one list and individual tables spin of that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by
'Name')
Social Work Register (list of social work clients with full details
of each case)
Social Work Follow-up Table (used when staff call social work clients
to see how their issue is progressing; the register has too many
columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of
which workshops they were attended and why they were absent if they
missed a session)
Individual workshop tables x14 (each workshop includes a test and
these tables records the clients answers and their score for each
individual test; there will be more than 20 of these when the
database is finished; all joined to the 'Participants Register' by
'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only
overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client
Register' to the 'AllList' for data entry so that when a new client
is registered it creates a new record in both tables, with the
records matched together)
Participant Query (not yet attempted; as above, intended to link the
'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
people change name, for instance in many cultures, women change their family name after they get married.
There could also have been a typo when entering the name and now it can be hard to correct it if that data is used as a Foreign Key all in different tables.
as your database grows, you are likely to end up with some people having the same name, creating conflicts, or forcing the user to make changes to that name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not mean to be edited, changed or even displayed to the user. It's sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that will hold that ID. That field is called a Foreign Key).
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList means that its purpose isn't that well-thought of; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID in the ClientCase table.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClienCase.ClientID
GROUP BY Client.Name
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.

Database Design: User Profiles like in Meetup.com

In Meetup.com, when you join a meetup group, you are usually required to complete a profile for that particular group. For example, if you join a movie meetup group, you may need to list the genres of movies you enjoy, etc.
I'm building a similar application, wherein users can join various groups and complete different profile details for each group. Assume the 2 possibilities:
Users can create their own groups and define what details to ask users that join that group (so, something a bit dynamic -- perhaps suggesting that at least an EAV design is required)
The developer decides now which groups to create and specify what details to ask users who join that group (meaning that the profile details will be predefined and "hard coded" into the system)
What's the best way to model such data?
More elaborate example:
The "Movie Goers" group request their members to specify the following:
Name
Birthdate (to be used to compute member's age)
Gender (must select from "male" or "female")
Favorite Genres (must select 1 or more from a list of specified genres)
The "Extreme Sports" group request their member to specify the following:
Name
Description of Activities Enjoyed (narrative form)
Postal Code
The bottom line is that each group may require different details from members joining their group. Ideally, I would like anyone to create a group (ala MeetUp.com). However, I also need the ability to query for members fairly well (e.g. find all women movie goers between the ages of 25 and 30).
For something like this....you'd want maximum normalization, so you wouldn't have duplicate data anywhere. Because your user-defined tables could possibly contain the same type of record, I think that you might have to go above 3NF for this.
My suggestion would be this - explode your tables so that you have something close to 6NF with EAV, so that each question that users must answer will have its own table. Then, your user-created tables will all reference one of your question tables. This avoids the duplication of data issue. (For instance, you don't want an entry in the "MovieGoers" group with the name "John Brown" and one in the "Extreme Sports" group with the name "Johnny B." for the same user; you also don't want his "what is your favorite color" answer to be "Blue" in one group and "Red" in another. Any data that can span across groups, like common questions, would be normalized in this form.)
The main drawback to this is that you'd end up with a lot of tables, and you'd probably want to create views for your statistical queries. However, in terms of pure data integrity, this would work well.
Note that you could probably get away with only factoring out the common fields, if you really wanted to. Examples of common fields would include Name, Location, Gender, and others; you could also do the same for common questions, like "what is your favorite color" or "do you have pets" or something to that extent. Group-specific questions that don't span across groups could be stored in a separate table for that group, un-exploded. I wouldn't advise this because it wouldn't be as flexible as the pure 6NF option and you run the risk of duplication (how do you predetermine which questions won't be common questions?) but if you really wanted to, you could do this.
There's a good question about 6NF here: Would like to Understand 6NF with an Example
I hope that made some sense and I hope it helps. If you have any questions, leave a comment.
Really, this is exactly a problem for which SQL is not a right solution. Forget normalization. This is exactly the job for NoSQL document stores. Every user as a document, having some essential fields like id, name, pwd etc. And every group adds possibility to add some fields. Unique fields can have names group-id-prefixed, shared fields (that grasp some more general concept) can have that field name free.
Except users (and groups) then you will have field descriptions with name, type, possible values, ... which is also very good for a document store.
If you use key-value document store from the beginning, you gain this freeform possibility of structuring your data plus querying them (though not by SQL, but by the means this or that NoSQL database provides).
First i'd like to note that the following structure is just a basis to your DB and you will need to expand/reduce it.
There are the following entities in DB:
user (just user)
group (any group)
template (list of requirement united into template to simplify assignment)
requirement (single requirement. For example: date of birth, gender, favorite sport)
"Modeling":
**User**
user_id
user_name
**Group**
name
group_id
user_group
user_id (FK)
group_id (FK)
**requirement**:
requirement_id
requirement_name
requirement_type (FK) (means the type: combo, free string, date) - should refers to dictionary)
**template**
template_id
template_name
**template_requirement**
r_id (FK)
t_id (FK)
The next step is to model appropriate schema for storing restrictions, i.e. validating rule for any requirement in any template. We have to separate it because for different groups the same restrictions can be different (for example: "age"). You can use the following table:
**restrictions**
group_id
template_id
requirement_id (should be here as template_id because the same requirement can exists in different templates and any group can consists of many templates)
restriction_type (FK) (points to another dict: value, length, regexp, at_least_one_value_choosed and so on)
So, as i said it is the basis. You can feel free to simplify this schema (wipe out tables, multiple templates for group). Or you can make it more general adding opportunity to create and publish temaplate, requirements and so on.
Hope you find this idea useful
You could save such data as JSON or XML (Structure, Data)
User Table
Userid
Username
Password
Groups -> JSON Array of all Groups
GroupStructure Table
Groupid
Groupname
Groupstructure -> JSON Structure (with specified Fields)
GroupData Table
Userid
Groupid
Groupdata -> JSON Data
I think this covers most of your constraints:
users
user_id, user_name, password, birth_date, gender
1, Robert Jones, *****, 2011-11-11, M
group
group_id, group_name
1, Movie Goers
2, Extreme Sports
group_membership
user_id, group_id
1, 1
1, 2
group_data
group_data_id, group_id, group_data_name
1, 1, Favorite Genres
2, 2, Favorite Activities
group_data_value
id, group_data_id, group_data_value
1,1,Comedy
2,1,Sci-Fi
3,1,Documentaries
4,2,Extreme Cage Fighting
5,2,Naked Extreme Bike Riding
user_group_data
user_id, group_id, group_data_id, group_data_value_id
1,1,1,1
1,1,1,2
1,2,2,4
1,2,2,5
I've had similar issues to this. I'm not sure if this would be the best recommendation for your specific situation but consider this.
Provide a means of storing data as XML, or JSON, or some other format that delimits the data, but basically stores it in field that has no specific format.
Provide a way to store the definition of that data
Provide a lookup/index table for the data.
This is a combination of techniques indicated already.
Essentially, you would create some interface to your clients to create a "form" for what they want saved. This form would indicated what pieces of information they want from the user. It would also indicate what pieces of information you want to search on.
Save this information to the definition table.
The definition table is then used to describe the user interface for entering data.
Once user data is entered, save the data (as xml or whatever) to one table with a unique id. At the same time, another table will be populated as an index with
id where the xml data was saved
name of field data is stored in
value of field data stored.
id of data definition.
now when a search commences, there should be no issue in searching for the information in the index table by name, value and definition id and getting back the id of the xml/json (or whatever) data you stored in the table that the data form was stored.
That data should be transformable once it is retrieved.
I was seriously sketchy on the details here, I hope this is enough of an answer to get you started. If you would like any explanation or additional details, let me know and I'll be happy to help.
if you're not stuck to mysql, i suggest you to use postgresql which provides build-in array datatypes.
you can define a define an array of varchar field to store group specific fields, in your groups table. to store values you can do the same in the membership table.
comparing to string parsing based xml types, this array approach will be really fast.
if you dont like array approach you can check out xml datatypes and an optional hstore datatype which is a key-value store.

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same counts for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design for the last part, it smells badly for me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Photo'
INSERT (user id, typeId, photoId)
and for places:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Place'
INSERT (user id, typeId, placeId)
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
please don´t pay much attention to the field names styling, but more to the relation and structure
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
as far as i understand. several tables are required. There is a many to many relation between them.
Table which stores the user data such as name, surname, birth date with a identity field.
Table which stores data types. these types may be photos, shares, links. each type must has a unique table. therefore, there is a relation between their individual tables and this table.
each different data type has its table. for example, status updates, photos, links.
the last table is for many to many relation storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to made particularly difficult or inefficient my one design choice or the other?
If not favour the one that requires the fewer tables
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using table per entity for comments and etc. More tables - better sharding and scaling. It's not a problem to control many similar tables for all frameworks I know.
One day you'll need to optimize reads from such structure. You can easily create agragating tables over base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach where you have one table and store the element type for each row, it will give you a lot more flexibility. Basically when something can logically be done with fewer tables it is almost always better to go with fewer tables. One advantage that comes to my mind right now about your particular case, consider you want to delete all liked elements of a certain user, with your first approach you need to issue one query for each element type but with the second approach it can be done with only one query or consider when you want to add a new element type, with the first approach it involves creating a new table for each new type but with the second approach you shouldn't do anything...

MySQL database model for signups with and without addresses

I've been thinking about this all evening (GMT) but I can't seem to figure out a good solution for this one. Here's the case...
I have to create a signup system which distinguishes 4 kinds of "users":
Individual sign ups (require address info)
Group sign ups (don't require address info)
Group contact (require address info)
Application users (don't require address info)
I really cannot come up with a decent way of modeling this into something that makes sense. I'd greatly appreciate it if you could share your ideas.
Thanks in advance!
Sounds like good case for single table inheritance
Requiring certain data is more a function of your application logic than your database. You can definitely define database columns that don't allow NULL values, but they can be set to "" (empty string) without any errors.
As far as how to structure your database, have two separate tables:
User
UserAddress
When you have a new signup that requires contact info, your application will create records in both tables. When a new signup doesn't require address info, your application will only create a record in the User table.
There are a couple considerations here: first, I like to look at User/Group as a case of a Composite pattern. It clearly meets the requirement: you often have to treat the aggregate and individual versions of the entity interchangeably (as you note). Implementing a composite in a database is not that hard. If you are using an ORM, it is pretty simple (inheritance).
On the other part of the question, you always have the ability to create data structures that are mostly empty. Generally, that's a bad idea. So you can say 'well, in the beginning, we don't have any information about the User so we will just leave all the other fields blank.' A better approach is to try and model the phases as if they were part of an FSM. One of the clearest ways to do this in this particular case is to distinguish between Users, Accounts and some other more domain-specific entity, e.g. Subscriber or Customer. Then, I can come and browse using User, sign up and make an Account, then later when you want address and other personal information, become a subscriber. This would also imply inheritance, and you have the added benefit of being able to have a true representation of the population at any time that doesn't require stupid shenanigans like 'SELECT COUNT(*) WHERE _ not null,' etc.
Here's a suggestion from my end after weighing pro's and con's on this model. As I think the ideal setup is to have all users be a user entity that belong to a group without differentiating groups from individuals (except of course flag a group contact person and creating a link with a groups table) we came up with the alternative to copy the group contact user details to the group members when they group is created.
This way all entities that actually are a person will get their own table.
Could this be a good idea? Awaiting your comments :)
I've decided to go with a construction where group members are separated from the user pool anyway. The group members eventually have no relation with a user since they don't require access to mutating their personal data, that's what a group contact person is for. Eventually I could add a possibility for groups to have multiple contact persons, even distinguishing persons that are or are not allowed to edit any member data.
That's my answer on this one.