Database design: Custom data layout and rights management - mysql

I'm workling on a project for my University and im a little curious about my current database design:
First of all, what is the common naming practice for a database table: singular or plural? I once read something about i but i cannot remember it?! (e.g. table: user or users)
The second question is a little mor specific to the project:
The users can login into the website an have to choose 10 elements out of a list and attach each of the elements a priority from 1 to 4. My first try was to save the choice of the user in a single row as a CSV (e.g. 1,2,3;4,5,6;7,8;9,10 which represents the choice of element 1,2,3 with priority 1 etc.). The second attempt was to save each choice as a single entry like: [user_id]|[choosen_element]|[choosen_priority]. What do you think is the better variant or is there a even better one that i havent thought of?
The third question is more about the login and rights management:
The elements that the users can choose from are in groups. Each element can be in multiple groups. There are moderators who have the same groups that the elements have or a subset of it, and they can edit all elements of the group they are in posession of. Besides the groups there are also the rights for the users e.g. user, moderator, admin etc.
In my last design i defined the rights of the users as part of the groups table so that every user that is in the moderators group can edit items of the groups that he is also in.
In my first attempt i had the groups and the rights in a seperate table with a seperate logic in my application!
Is it better to seperate the aplication rights from the groups?
Here is a plot of my current layout if i missed something, or if somebody just likes to look at pictures ;)
http://screens.rofln.de/2012-06-19-4f3o3A.png
Thanks!
Btw.: Im working with PHP and a MySql if someone whants to know!

This is subjective, but if you go by conventions supplied by popular ORMs, it seems pluralized is pretty common. I don't think which you chose matters, only that you are consistent once you have chosen.
A record representing each choice makes most sense. This allows for ordering as well as queries to find highly rated elements, etc. Finally, reading the data in your application, and varying how it is displayed, will be easier since you'll be working with a list of items rather than a packed value.
This is hard to answer, since I'm not familiar with your problem domain. I'd recommend developing use cases and then applying them to your proposed model to see where the cracks are.

It does not matter whether you use singular or plural, what matters is that you are consistent in your use of the standard.
Comma separated values in MySQL are bad, mostly because it is not a congruent way of using a relational database. A standard database relationship, or a many-to-many table is a better idea.
When you make your rights management system more flexible, it becomes more complex. A good heuristic in this case is to build the simplest system that satisfies your requirements, but no simpler.
Speaking of simplicity, why do you have a separate table for userdata? Do you expect some user to have two sets of names and details?

Related

Building Organizations with Subgroups

I'm struggling with a database design issue, and it's kind of a long winded one:
My website will have an unlimited number of organizations users they can join, subgroups under those organizations, and finally specific profiles for those subgroups. Subgroups within the same organization will be able to borrow and make changes to profiles from each other. Users will generate the organizations, the subgroups, and profiles.
I can draw it out, make the flow sensible on paper. When it comes to actually putting it to either SQL I'm lost. The majority of the help guides out there assumes static groups so a simple primary and foreign key set up can refer back to the right information. Mine has too much dynamic information for most of these to outright work as I understand it.
Most writers say stay away from dynamically generated tables, but that's where my instinct takes me. Another idea I had was 3 massive tables one for all Organizations, Groups, and Profiles.
So is there a better way to go about this? Or are there any good documents I should read up on to help me translate from drawing to actual code?
I have some experience with both SQL and MongoDB if that helps explain things.
I don't know about MongoDB(NoSQL), but from the SQL standpoint, here is my opinion.
As far as your schema goes, Most of the time when your "instinct" indicates that :- Only a "Dynamic Tables" solution is your best bet, for some problem that you are working on.
Remember there is a high chance that, that very problem can be solved by multiple static tables with different relationships. (By Static I mean the ones which you have created yourself as a developer.)
Also I'd like to mention that, I too myself in my initial days always thought of problem solving the similar way, but then I started understanding the principles and how exactly the databases work.
Back To Your Problem:-
If your organisation hirerchy consists of three major types of objects/levels, viz. Organizations, Groups, and Profiles then I'd suggest that you go with the 3 tables with correct relationships, which any SQL engine is quiet efficient at handling, in comparison to creating tables at runtime.
Now if the hierarchy is dynamic like say, An organisation can contain many groups which in turn shall contain profiles which again shall/can contain other organisations and so on.... Then you may want to look at Recursive structure with SQL(Recursion). (Just do a google search there are a lot of articles about that.)

MySQL Relational Database with Large Data Sets Unique to Each User

I am working on a project which involves building a social network-style application allowing users to share inventory/product information within their network (for sourcing).
I am a decent programmer, but I am admittedly not an expert with databases; even more so when it comes to database design. Currently, user/company information is stored via a relational database scheme in MySQL which is working perfectly.
My problem is that while my relational scheme works brilliantly for user/company information, it is confusing me on how to implement inventory information. The issue is that each "inventory list" will definitely contain differing attributes specific to the product type, but identical to the attributes of each other product in the list. My first thought was to create a table for each "inventory list". However, I feel like this would be very messy and would complicate future attempts at KDD. I also (briefly) considered using a 'master inventory' and storing the information (e.g. the variable categories and data as a JSON string. But I figured JSON strings MySQL would just become a larger pain in the ass.
My question is essentially how would someone else solve this problem? Or, more generally, sticking with principles of relational database management, what is the "correct" way to associate unique, large data sets of similar type with a parent user? The thing is, I know I could easily jerry-build something that would work, but I am genuinely interested in what the consensus is on how to solve this problem.
Thanks!
I would check out this post: Entity Attribute Value Database vs. strict Relational Model Ecommerce
The way I've always seen this done is to make a base table for inventory that stores universally common fields. A product id, a product name, etc.
Then you have another table that has dynamic attributes. A very popular example of this is Wordpress. If you look at their data model, they use this idea heavily.
One of the good things about this approach is that it's flexible. One of the major negatives is that it's slow and can produce complex code.
I'll throw out an alternative of using a document database. In that case, each document can have a different schema/structure and you can still run queries against them.

(Somewhat) complicated database structure vs. simple — with null fields

I'm currently choosing between two different database designs. One complicated which separates data better then the more simple one. The more complicated design will require more complex queries, while the simpler one will have a couple of null fields.
Consider the examples below:
Complicated:
Simpler:
The above examples are for separating regular users and Facebook users (they will access the same data, eventually, but login differently). On the first example, the data is clearly separated. The second example is way simplier, but will have at least one null field per row. facebookUserId will be null if it's a normal user, while username and password will be null if it's a Facebook-user.
My question is: what's prefered? Pros/cons? Which one is easiest to maintain over time?
First, what Kirk said. It's a good summary of the likely consequences of each alternative design. Second, it's worth knowing what others have done with the same problem.
The case you outline is known in ER modeling circles as "ER specialization". ER specialization is just different wording for the concept of subclasses. The diagrams you present are two different ways of implementing subclasses in SQL tables. The first goes under the name "Class Table Inheritance". The second goes under the name "Single Table Inheritance".
If you do go with Class table inheritance, you will want to apply yet another technique, that goes under the name "shared primary key". In this technique, the id fields of facebookusers and normalusers will be copies of the id field from users. This has several advantages. It enforces the one-to-one nature of the relationship. It saves an extra foreign key in the subclass tables. It automatically provides the index needed to make the joins run faster. And it allows a simple easy join to put specialized data and generalized data together.
You can look up "ER specialization", "single-table-inheritance", "class-table-inheritance", and "shared-primary-key" as tags here in SO. Or you can search for the same topics out on the web. The first thing you will learn is what Kirk has summarized so well. Beyond that, you'll learn how to use each of the techniques.
Great question.
This applies to any abstraction you might choose to implement, whether in code or database. Would you write a separate class for the Facebook user and the 'normal' user, or would you handle the two cases in a single class?
The first option is the more complicated. Why is it complicated? Because it's more extensible. You could easily include additional authentication methods (a table for Twitter IDs, for example), or extend the Facebook table to include... some other facebook specific information. You have extracted the information specific to each authentication method into its own table, allowing each to stand alone. This is great!
The trade off is that it will take more effort to query, it will take more effort to select and insert, and it's likely to be messier. You don't want a dozen tables for a dozen different authentication methods. And you don't really want two tables for two authentication methods unless you're getting some benefit from it. Are you going to need this flexibility? Authentication methods are all similar - they'll have a username and password. This abstraction lets you store more method-specific information, but does that information exist?
Second option is just the reverse the first. Easier, but how will you handle future authentication methods and what if you need to add some authentication method specific information?
Personally I'd try to evaluate how important this authentication component is to the system. Remember YAGNI - you aren't gonna need it - and don't overdesign. Unless you need that extensibility that the first option provides, go with the second. You can always extract it at a later date if necessary.
This depends on the database you are using. For example Postgres has table inheritance that would be great for your example, have a look here:
http://www.postgresql.org/docs/9.1/static/tutorial-inheritance.html
Now if you do not have table inheritance you could still create views to simplify your queries, so the "complicated" example is a viable choice here.
Now if you have infinite time than I would go for the first one (for this one simple example and prefered with table inheritance).
However, this is making things more complicated and so will cost you more time to implement and maintain. If you have many table hierarchies like this it can also have a performance impact (as you have to join many tables). I once developed a database schema that made excessive use of such hierarchies (conceptually). We finally decided to keep the hierarchies conceptually but flatten the hierarchies in the implementation as it had gotten so complex that is was not maintainable anymore.
When you flatten the hierarchy you might consider not using null values, as this can also prove to make things a lot harder (alternatively you can use a -1 or something).
Hope these thoughts help you!
Warning bells are ringing loudly with the presence of two the very similar tables facebookusers and normalusers. What if you get a 3rd type? Or a 10th? This is insane,
There should be one user table with an attribute column to show the type of user. A user is a user.
Keep the data model as simple as you possibly can. Don't build it too much kung fu via data structure. Leave that for the application, which is far easier to alter than altering a database!
Let me dare suggest a third. You could introduce 1 (or 2) tables that will cater for extensibility. I personally try to avoid designs that will introduce (read: pollute) an entity model with non-uniformly applicable columns. Have the third table (after the fashion of the EAV model) contain a many-to-one relationship with your users table to cater for multiple/variable user related field.
I'm not sure what your current/short term needs are, but re-engineering your app to cater for maybe, twitter or linkedIn users might be painful. If you can abstract the content of the facebookUserId column into an attribute table like so
user_attr{
id PK
user_id FK
login_id
}
Now, the above definition is ambiguous enough to handle your current needs. If done right, the EAV should look more like this :
user_attr{
id PK
user_id FK
login_id
login_id_type FK
login_id_status //simple boolean flag to set the validity of a given login
}
Where login_id_type will be a foreign key to an attribute table listing the various login types you currently support. This gives you and your users flexibility in that your users can have multiple logins using different external services without you having to change much of your existing system

Multilingual database design pattern

I'm currently working on Blog-Software which should offer support for content in multiple languages.
I'm thinking of a way to design my database (MySQL). My first thought was the following:
Every entry is stored in a table (lets call it entries). This table
holds information which doesn't change (like the unique ID, if it's
published or not and the post-type).
Another table (let's call it content) contains the strings
(like the content, the headline, the date, and author of the specific
language).
They are then joined by the unique entry-id.
The idea of this is that one article can be translated into multiple other languages, but it doesn't need to be. If there is no translation in the native language of the user (determined by his IP or something), he sees the standard language (which would be English).
For me this sounds like a simple multilingual database and I'm sure there is a design pattern for this. Sadly, I didn't find any.
If there is no pattern, how would you go about realizing this? Any input is greatly appreciated.
Your approach is what I've seen in most applications with this kind of capability. The only changing piece is that some places will put the "default" values into the base table (Entry) while others will treat it as just another Content row.
That design will also give you the ability to search (or restrict search) in all languages easily. From a db design perspective, its imho the best design you can use.
With small amounts of text and a simple application this would work. In the large, you might be bitten by the extra joins needed, especially when your database is larger than ram. Presenting things in the right order (sorting) also might need solving

What is the right way to do flexible columns in database?

Im storing columns in database with users able to add and remove columns, with fake columns. How do I implement this efficiently?
The best way would be to implement the data structure vertically, instead of normal horizontal.
This can be done using something like
TableAttribute
AttributeID
AttributeType
AttributeValue
This application of vertical is mostly used in applications where users can create their own custom forms, and field (if i recall corretly the devexpress form layout allows you to create custom layouts). Mostly used in CRM applications, this is easily modified inproduction, and easily maintainable, but can greatly decrease SQL performance once the data set becomes very large.
EDIT:
This will depend on how far you wish to take it. You can set it up that it will be per form/table, add attributes that describe the actual control (lookup, combo, datetime, etc...) position of the controls, allowed values (min/max/allow null).
This can become a very cumbersome task, but will greatly depend on your actual needs.
I'd think you could allow that at the user-permission level (grant the ALTER privilege on the appropriate tables) and then restrict what types of data can be added/deleted using your presentation layer.
But why add columns? Why not a linked table?
Allowing users to define columns is generally a poor choice as they don't know what they are doing or how to relate it properly to the other data. Sometimes people use the EAV approach to this and let them add as many columns as they want, but this quickly gets out of control and causes performance issues and difficulty in querying the data.
Others take the approach of having a table with user defined columns and give them a set number of columns they can define. This works better performance wise but is more limiting interms of how many new columns they can define.
In any event you should severely restrict who can define the additional columns only to system admins (who can be at the client level). It is a better idea to actually talk to users in the design phase and see what they need. You will find that you can properly design a system that has 90+% of waht the customer needs if you actually talk to them (and not just to managers either, to users at all levels of the organization).
I know it is common in today's world to slough off our responsibility to design by saying we are making things flexible, but I've had to use and provide dba support for many of these systems and the more flexible they try to make the design, the harder it is for the users to use and the more the users hate the system.