Multilingual database design pattern - mysql

I'm currently working on blog software that should support content in multiple languages.
I'm thinking about how to design my database (MySQL). My first thought was the following:
Every entry is stored in a table (let's call it entries). This table holds information that doesn't change (like the unique ID, whether or not it's published, and the post type).
Another table (let's call it content) contains the strings (like the content, the headline, the date, and the author of the specific language).
They are then joined by the unique entry ID.
The idea is that one article can be translated into multiple other languages, but it doesn't need to be. If there is no translation in the user's native language (determined by their IP address or something similar), they see the standard language (which would be English).
To me this sounds like a simple multilingual database, and I'm sure there is a design pattern for this. Sadly, I didn't find one.
If there is no pattern, how would you go about realizing this? Any input is greatly appreciated.

Your approach is what I've seen in most applications with this kind of capability. The only changing piece is that some places will put the "default" values into the base table (Entry) while others will treat it as just another Content row.
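A minimal sketch of that two-table layout, treating the default (English) text as just another content row. The table and column names (`entries`, `content`, `lang`, `headline`) are assumptions, SQLite via Python's `sqlite3` stands in for MySQL, and the query assumes every entry has a row in the default language.

```python
import sqlite3

# SQLite (in memory) stands in for MySQL; the SQL itself is plain and portable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entries (              -- language-independent data
    id        INTEGER PRIMARY KEY,
    published INTEGER NOT NULL DEFAULT 0,
    post_type TEXT    NOT NULL
);
CREATE TABLE content (              -- one row per entry per language
    entry_id INTEGER NOT NULL REFERENCES entries(id),
    lang     TEXT    NOT NULL,      -- e.g. 'en', 'de'
    headline TEXT    NOT NULL,
    body     TEXT    NOT NULL,
    PRIMARY KEY (entry_id, lang)
);
""")
conn.executemany("INSERT INTO entries VALUES (?, ?, ?)",
                 [(1, 1, "article"), (2, 1, "article")])
conn.executemany("INSERT INTO content VALUES (?, ?, ?, ?)", [
    (1, "en", "Hello", "English text"),
    (1, "de", "Hallo", "Deutscher Text"),
    (2, "en", "Only English", "Not translated yet"),
])

def headlines_for(lang, default="en"):
    """(id, headline, lang) for each published entry, preferring `lang`
    and falling back to the default-language row when no translation exists."""
    return conn.execute("""
        SELECT e.id,
               COALESCE(t.headline, d.headline),
               COALESCE(t.lang, d.lang)
        FROM entries e
        JOIN content d      ON d.entry_id = e.id AND d.lang = ?   -- default row
        LEFT JOIN content t ON t.entry_id = e.id AND t.lang = ?   -- translation, if any
        WHERE e.published = 1
        ORDER BY e.id
    """, (default, lang)).fetchall()

print(headlines_for("de"))  # [(1, 'Hallo', 'de'), (2, 'Only English', 'en')]
```

The fallback costs a second join on `content`, which is one of the extra joins to keep an eye on as the data grows.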

That design will also give you the ability to search (or restrict search) in all languages easily. From a DB design perspective, it's IMHO the best design you can use.

With small amounts of text and a simple application this would work. At larger scale, you might be bitten by the extra joins needed, especially once your database is larger than RAM. Presenting things in the right order (sorting) might also need solving.

Related

MySQL Relational Database with Large Data Sets Unique to Each User

I am working on a project which involves building a social network-style application allowing users to share inventory/product information within their network (for sourcing).
I am a decent programmer, but I am admittedly not an expert with databases; even more so when it comes to database design. Currently, user/company information is stored via a relational database scheme in MySQL which is working perfectly.
My problem is that while my relational scheme works brilliantly for user/company information, it leaves me confused about how to implement inventory information. The issue is that each "inventory list" will definitely contain differing attributes specific to the product type, but identical to the attributes of each other product in the list. My first thought was to create a table for each "inventory list". However, I feel like this would be very messy and would complicate future attempts at KDD. I also (briefly) considered using a 'master inventory' and storing the information (e.g. the variable categories and data) as a JSON string. But I figured JSON strings in MySQL would just become a larger pain in the ass.
My question is essentially how would someone else solve this problem? Or, more generally, sticking with principles of relational database management, what is the "correct" way to associate unique, large data sets of similar type with a parent user? The thing is, I know I could easily jerry-build something that would work, but I am genuinely interested in what the consensus is on how to solve this problem.
Thanks!
I would check out this post: Entity Attribute Value Database vs. strict Relational Model Ecommerce
The way I've always seen this done is to make a base table for inventory that stores universally common fields. A product id, a product name, etc.
Then you have another table that holds dynamic attributes. A very popular example of this is WordPress. If you look at its data model, it uses this idea heavily.
One of the good things about this approach is that it's flexible. One of the major negatives is that it's slow and can produce complex code.
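A minimal sketch of the base table plus dynamic-attribute table described above. The names (`products`, `product_attributes`, `attr_name`, `attr_value`) are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Universally common fields live in the base table.
CREATE TABLE products (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,      -- owner of the inventory list
    name    TEXT    NOT NULL
);
-- One row per (product, attribute): the dynamic attributes.
CREATE TABLE product_attributes (
    product_id INTEGER NOT NULL REFERENCES products(id),
    attr_name  TEXT    NOT NULL,
    attr_value TEXT,
    PRIMARY KEY (product_id, attr_name)
);
""")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)",
                 [(1, 10, "Steel bolt"), (2, 10, "Copper wire")])
conn.executemany("INSERT INTO product_attributes VALUES (?, ?, ?)", [
    (1, "diameter_mm", "8"), (1, "length_mm", "40"),
    (2, "gauge", "12"),
])

def product_with_attributes(product_id):
    """Fold a product's attribute rows back into one dict (None if not found)."""
    rows = conn.execute("""
        SELECT p.name, a.attr_name, a.attr_value
        FROM products p
        LEFT JOIN product_attributes a ON a.product_id = p.id
        WHERE p.id = ?
        ORDER BY a.attr_name
    """, (product_id,)).fetchall()
    if not rows:
        return None
    result = {"name": rows[0][0]}
    for _, key, value in rows:
        if key is not None:        # LEFT JOIN yields NULLs for products without attributes
            result[key] = value
    return result

print(product_with_attributes(1))  # {'name': 'Steel bolt', 'diameter_mm': '8', 'length_mm': '40'}
```

Note that every value comes back as text and each product needs several rows; that is the flexibility-versus-speed trade-off mentioned above.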
I'll throw out an alternative of using a document database. In that case, each document can have a different schema/structure and you can still run queries against them.

Database design: Custom data layout and rights management

I'm working on a project for my university and I'm a little curious about my current database design:
First of all, what is the common naming practice for a database table: singular or plural? I once read something about it but I can't remember it. (e.g. table: user or users)
The second question is a little more specific to the project:
The users can log in to the website and have to choose 10 elements out of a list and assign each of the elements a priority from 1 to 4. My first try was to save the user's choice in a single row as a CSV (e.g. 1,2,3;4,5,6;7,8;9,10, which represents the choice of elements 1,2,3 with priority 1, etc.). The second attempt was to save each choice as a single entry like: [user_id]|[choosen_element]|[choosen_priority]. Which do you think is the better variant, or is there an even better one that I haven't thought of?
The third question is more about the login and rights management:
The elements that the users can choose from are in groups. Each element can be in multiple groups. There are moderators who have the same groups that the elements have, or a subset of them, and they can edit all elements of the groups they possess. Besides the groups there are also the rights for the users, e.g. user, moderator, admin, etc.
In my last design I defined the rights of the users as part of the groups table, so that every user who is in the moderators group can edit items of the groups that he is also in.
In my first attempt I had the groups and the rights in separate tables with separate logic in my application!
Is it better to separate the application rights from the groups?
Here is a diagram of my current layout, in case I missed something or somebody just likes to look at pictures ;)
http://screens.rofln.de/2012-06-19-4f3o3A.png
Thanks!
BTW: I'm working with PHP and MySQL, if anyone wants to know!
This is subjective, but if you go by conventions supplied by popular ORMs, it seems pluralized is pretty common. I don't think which you chose matters, only that you are consistent once you have chosen.
A record representing each choice makes most sense. This allows for ordering as well as queries to find highly rated elements, etc. Finally, reading the data in your application, and varying how it is displayed, will be easier since you'll be working with a list of items rather than a packed value.
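A minimal sketch of the one-record-per-choice layout, including converting the packed CSV value from the question into rows. The table name `user_choices` is an assumption, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_choices (
    user_id    INTEGER NOT NULL,
    element_id INTEGER NOT NULL,
    priority   INTEGER NOT NULL CHECK (priority BETWEEN 1 AND 4),
    PRIMARY KEY (user_id, element_id)   -- each element chosen at most once per user
)
""")

# User 1's packed value from the question: groups separated by ';', priority = group position.
packed = "1,2,3;4,5,6;7,8;9,10"
rows = [
    (1, int(element), priority)
    for priority, group in enumerate(packed.split(";"), start=1)
    for element in group.split(",")
]
conn.executemany("INSERT INTO user_choices VALUES (?, ?, ?)", rows)
conn.executemany("INSERT INTO user_choices VALUES (?, ?, ?)",
                 [(2, 1, 1), (2, 5, 1), (2, 9, 2)])

# "Highly rated elements": how often each element was picked at priority 1.
top = conn.execute("""
    SELECT element_id, COUNT(*) AS votes
    FROM user_choices
    WHERE priority = 1
    GROUP BY element_id
    ORDER BY votes DESC, element_id
""").fetchall()
print(top)  # [(1, 2), (2, 1), (3, 1), (5, 1)]
```

With the packed CSV, the same question would require fetching and parsing every user's string in application code.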
This is hard to answer, since I'm not familiar with your problem domain. I'd recommend developing use cases and then applying them to your proposed model to see where the cracks are.
It does not matter whether you use singular or plural, what matters is that you are consistent in your use of the standard.
Comma-separated values in a MySQL column are bad, mostly because they are not a congruent way of using a relational database: the database cannot index, join on, or constrain the individual values. A standard database relationship, or a many-to-many table, is a better idea.
When you make your rights management system more flexible, it becomes more complex. A good heuristic in this case is to build the simplest system that satisfies your requirements, but no simpler.
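One simple way to keep application rights separate from group membership, sketched under assumptions: a single `role` column holds the right, and two many-to-many tables map elements and users to groups. All table names and the sample data are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    role TEXT NOT NULL CHECK (role IN ('user', 'moderator', 'admin'))  -- application right
);
CREATE TABLE access_groups (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE elements      (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- many-to-many: an element can belong to several groups
CREATE TABLE element_groups (
    element_id INTEGER NOT NULL REFERENCES elements(id),
    group_id   INTEGER NOT NULL REFERENCES access_groups(id),
    PRIMARY KEY (element_id, group_id)
);
-- many-to-many: the groups a user holds
CREATE TABLE user_groups (
    user_id  INTEGER NOT NULL REFERENCES users(id),
    group_id INTEGER NOT NULL REFERENCES access_groups(id),
    PRIMARY KEY (user_id, group_id)
);
INSERT INTO users VALUES (1, 'alice', 'user'), (2, 'bob', 'moderator'), (3, 'carol', 'admin');
INSERT INTO access_groups VALUES (1, 'sports'), (2, 'music');
INSERT INTO elements VALUES (10, 'football'), (11, 'guitar'), (12, 'dance');
INSERT INTO element_groups VALUES (10, 1), (11, 2), (12, 1), (12, 2);
INSERT INTO user_groups VALUES (1, 2), (2, 1);
""")

def can_edit(user_id, element_id):
    """Admins may edit anything; moderators only elements sharing a group with them."""
    row = conn.execute("SELECT role FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is None or row[0] == "user":
        return False
    if row[0] == "admin":
        return True
    shared = conn.execute("""
        SELECT 1
        FROM element_groups eg
        JOIN user_groups ug ON ug.group_id = eg.group_id
        WHERE eg.element_id = ? AND ug.user_id = ?
        LIMIT 1
    """, (element_id, user_id)).fetchone()
    return shared is not None
```

Here bob (moderator of "sports") can edit football and dance but not guitar; alice belongs to "music" but, with only the `user` role, can edit nothing.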
Speaking of simplicity, why do you have a separate table for userdata? Do you expect some user to have two sets of names and details?

What is the right way to do flexible columns in database?

I'm storing columns in a database, with users able to add and remove columns (fake columns). How do I implement this efficiently?
The best way would be to implement the data structure vertically, instead of the normal horizontal layout.
This can be done using something like:
TableAttribute (AttributeID, AttributeType, AttributeValue)
This vertical approach is mostly used in applications where users can create their own custom forms and fields (if I recall correctly, the DevExpress form layout allows you to create custom layouts). It is mostly used in CRM applications; it is easily modified in production and easily maintainable, but can greatly decrease SQL performance once the data set becomes very large.
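A minimal sketch of the vertical layout, with one assumption beyond the three columns named above: a `record_id` column, which is needed to tell which logical row each attribute value belongs to. Names are snake_cased and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE table_attribute (
    record_id       INTEGER NOT NULL,   -- assumption: groups the rows of one logical record
    attribute_id    INTEGER NOT NULL,   -- which user-defined "column"
    attribute_type  TEXT    NOT NULL,   -- e.g. 'text', 'int', 'date'
    attribute_value TEXT,               -- everything stored as text
    PRIMARY KEY (record_id, attribute_id)
)
""")
# Adding a "column" is just inserting rows with a new attribute_id; no ALTER TABLE.
conn.executemany("INSERT INTO table_attribute VALUES (?, ?, ?, ?)", [
    (1, 1, "text", "Alice"),
    (1, 2, "int", "34"),
    (2, 1, "text", "Bob"),      # record 2 has no value for attribute 2
])

# Reading it back horizontally needs one CASE expression per attribute,
# which is where the cost shows up as the data set grows.
wide = conn.execute("""
    SELECT record_id,
           MAX(CASE WHEN attribute_id = 1 THEN attribute_value END) AS name,
           MAX(CASE WHEN attribute_id = 2 THEN attribute_value END) AS age
    FROM table_attribute
    GROUP BY record_id
    ORDER BY record_id
""").fetchall()
print(wide)  # [(1, 'Alice', '34'), (2, 'Bob', None)]

# Removing a "column" is a DELETE rather than a schema change.
conn.execute("DELETE FROM table_attribute WHERE attribute_id = 2")
```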
EDIT:
This will depend on how far you wish to take it. You can set it up per form/table, and add attributes that describe the actual control (lookup, combo, datetime, etc.), the position of the controls, and the allowed values (min/max/allow null).
This can become a very cumbersome task, but will greatly depend on your actual needs.
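A minimal sketch of such per-attribute metadata and the validation it drives. The `AttributeDef` structure and its fields are hypothetical; in a real system these definitions would likely live in their own table next to the values.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class AttributeDef:
    """Metadata for one user-defined field (hypothetical structure)."""
    name: str
    control: str                      # e.g. 'lookup', 'combo', 'datetime', 'number'
    position: int                     # display order on the form
    allow_null: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None

def validate(defn: AttributeDef, value) -> List[str]:
    """Return a list of error messages; an empty list means the value is allowed."""
    errors: List[str] = []
    if value is None:
        if not defn.allow_null:
            errors.append(f"{defn.name} is required")
        return errors
    if defn.min_value is not None and value < defn.min_value:
        errors.append(f"{defn.name} must be >= {defn.min_value}")
    if defn.max_value is not None and value > defn.max_value:
        errors.append(f"{defn.name} must be <= {defn.max_value}")
    return errors

form = [
    AttributeDef("discount", "number", position=2, min_value=0, max_value=50),
    AttributeDef("region", "combo", position=1, allow_null=False),
]
for d in sorted(form, key=lambda d: d.position):   # render controls in form order
    print(d.position, d.name, d.control)
print(validate(form[0], 60))  # ['discount must be <= 50']
```

Every extra piece of metadata (control type, position, limits) is another thing the application has to interpret, which is where the effort grows.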
I'd think you could allow that at the user-permission level (grant the ALTER privilege on the appropriate tables) and then restrict what types of data can be added/deleted using your presentation layer.
But why add columns? Why not a linked table?
Allowing users to define columns is generally a poor choice as they don't know what they are doing or how to relate it properly to the other data. Sometimes people use the EAV approach to this and let them add as many columns as they want, but this quickly gets out of control and causes performance issues and difficulty in querying the data.
Others take the approach of having a table with user-defined columns and giving users a set number of columns they can define. This works better performance-wise but is more limiting in terms of how many new columns they can define.
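A minimal sketch of the fixed-number-of-columns approach: spare `custom_N` columns on the table, plus a small table that maps each slot to the label an admin gave it. The table names, the three-slot limit and the sample data are assumptions; SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    custom_1 TEXT,          -- spare slots; real columns, so they can be indexed
    custom_2 TEXT,
    custom_3 TEXT
);
CREATE TABLE custom_field_labels (
    slot  TEXT PRIMARY KEY CHECK (slot IN ('custom_1', 'custom_2', 'custom_3')),
    label TEXT NOT NULL
);
INSERT INTO custom_field_labels VALUES ('custom_1', 'Account manager');
INSERT INTO customers (id, name, custom_1) VALUES (1, 'Acme', 'Dana');
""")

def labelled_row(customer_id):
    """Return a customer row keyed by admin-defined labels; unused slots are hidden."""
    labels = dict(conn.execute("SELECT slot, label FROM custom_field_labels"))
    cur = conn.execute("SELECT * FROM customers WHERE id = ?", (customer_id,))
    columns = [c[0] for c in cur.description]
    row = cur.fetchone()
    result = {}
    for col, value in zip(columns, row):
        if col in labels:
            result[labels[col]] = value
        elif not col.startswith("custom_"):
            result[col] = value
    return result

print(labelled_row(1))  # {'id': 1, 'name': 'Acme', 'Account manager': 'Dana'}
```

The `CHECK` on `slot` is what enforces the limit: a fourth user-defined field would need a schema change.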
In any event you should restrict the ability to define additional columns to system admins only (who can be at the client level). It is a better idea to actually talk to users in the design phase and see what they need. You will find that you can properly design a system that has 90+% of what the customer needs if you actually talk to them (and not just to managers either, but to users at all levels of the organization).
I know it is common in today's world to slough off our responsibility to design by saying we are making things flexible, but I've had to use and provide dba support for many of these systems and the more flexible they try to make the design, the harder it is for the users to use and the more the users hate the system.

How do you know when you need separate tables?

How do you know when to create a new table for very similar object types?
Example:
To learn mysql I'm building a model solar system. For the purposes of my project, planets have many similar attributes to dwarf planets, centaurs, and comets. Dwarf planets are almost completely identical to planets. Centaurs and comets are only different from planets because their orbital path has more variation. Should I have a separate table for each type of object, or should they share tables?
The example is probably too simple, but I'm also interested in best practices. For example, should I use separate tables just in case I want to make planets and dwarf planets different in the future, or are there any efficiency reasons for keeping them in the same table?
Normal forms are what you should be interested in. They are pretty much the convention for building tables.
Any design that doesn't break the first, second or third normal form is fine by me. That's a pretty long list of requirements though, so I suggest you go read up on them via the Wikipedia links above.
It depends on what type of information you want to store about the objects. If the information for all of them is the same, say orbit radius, mass and name, then you can use the same table. However, if there are different properties for each (say atmosphere composition for planets, etc.) then you can either use separate tables for each (not very normalized) or have one table for basic properties like orbit, mass and name and a second table for just the properties that are unique to planets (and a similar table for comets, etc. if needed). All objects would be in the first table but only planets would be in the second table and linked through a foreign key to the first table.
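A minimal sketch of the "one table for basic properties, a second table only for planets" layout. The names (`bodies`, `planet_details`, `kind`, `atmosphere_composition`) are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
conn.executescript("""
-- Every object goes here: the properties they all share.
CREATE TABLE bodies (
    id              INTEGER PRIMARY KEY,
    name            TEXT NOT NULL,
    kind            TEXT NOT NULL,      -- 'planet', 'dwarf planet', 'comet', ...
    mass_kg         REAL,
    orbit_radius_au REAL
);
-- Only planets get a row here; body_id is both primary key and foreign key,
-- so each body has at most one details row.
CREATE TABLE planet_details (
    body_id                INTEGER PRIMARY KEY REFERENCES bodies(id),
    atmosphere_composition TEXT
);
INSERT INTO bodies VALUES (1, 'Earth', 'planet', 5.97e24, 1.0),
                          (2, 'Halley', 'comet', 2.2e14, 17.8);
INSERT INTO planet_details VALUES (1, 'N2, O2');
""")

all_names = [name for (name,) in conn.execute("SELECT name FROM bodies ORDER BY id")]
planets = conn.execute("""
    SELECT b.name, p.atmosphere_composition
    FROM bodies b
    JOIN planet_details p ON p.body_id = b.id
""").fetchall()
print(all_names, planets)  # ['Earth', 'Halley'] [('Earth', 'N2, O2')]
```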
It's called Database Normalization
There are many normal forms. By applying normalization you will go through the metadata (tables) and study the relationships between the data more clearly. Normalization techniques optimize the tables to prevent redundancy. This process will help you understand which entities to create based on the relationships between the different fields.
You should most likely split the data about a planet, etc., so that the shared (common) information is in another table.
E.g.
Common (table): Diameter, Mass (columns)
Planet (table): Population (column)
Comet (table): Speed (column)
Poor column choices, I know. Have the Planet and Comet tables link to the Common data with a key.
This is definitely a subjective question. It sounds like you are already on the right lines of thinking. I would ask:
Do these objects share many attributes? If so, it's probably worth considering at the very least a base table to list them all in.
Does one object "extend" another - it has all the attributes of the other, plus some extras? If so, it might be worth adding another table with the extra attributes and a one-to-one mapping back to the base object.
Do both objects have many shared attributes and unshared attributes? If this is the case, maybe you need a single table plus a "data extension" system where each object can have a type or category that specifies any amount of extra attributes that may be associated with it.
Do the objects only share one or two attributes? In this case, they are probably dissimilar enough to separate into multiple tables.
You may also ask yourself how you are going to query the data. Will you ever want to get them all in the same list? It's always a good idea to combine data into tables with other data they will commonly be queried with. For example, an "attachments" table where the file can be an image or a video, instead of images and video tables, if you commonly want to query for all attachments. Don't split into multiple tables unless there is a really good reason.
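A minimal sketch of the single "attachments" table from that example, with a `kind` column instead of separate images and video tables. Column names and sample data are hypothetical; SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE attachments (
    id         INTEGER PRIMARY KEY,
    post_id    INTEGER NOT NULL,
    kind       TEXT    NOT NULL CHECK (kind IN ('image', 'video')),
    path       TEXT    NOT NULL,
    duration_s INTEGER            -- only meaningful for videos; NULL for images
)
""")
conn.executemany(
    "INSERT INTO attachments (post_id, kind, path, duration_s) VALUES (?, ?, ?, ?)",
    [(7, "image", "a.png", None), (7, "video", "b.mp4", 42), (8, "image", "c.jpg", None)],
)

# The common query ("all attachments of a post") needs no UNION across tables.
post_7 = conn.execute(
    "SELECT kind, path FROM attachments WHERE post_id = ? ORDER BY id", (7,)
).fetchall()
# Type-specific queries are still just a WHERE clause away.
videos = conn.execute("SELECT path FROM attachments WHERE kind = 'video'").fetchall()
print(post_7)  # [('image', 'a.png'), ('video', 'b.mp4')]
```

The cost is a few columns that are NULL for some kinds, which is usually acceptable when the kinds share most of their fields.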
If you will ever want to get planets and comets in one single query, they will pretty much have to be in the same table if you want the database to work efficiently. Inheritance should be handled inside your app itself :)
Here's my answer to a similar question, which I think applies here as well:
How do you store business activities in a SQL database?
There are many different ways to express inheritance in your relational model. For example, you can squish everything into one table and have a field that distinguishes between the different types, or have one table for the shared attributes with relationships to child tables holding the specific attributes, etc. With either choice you're still storing the same information. The friction of going from a domain model to a relational model is what is called an impedance mismatch. Both choices have different trade-offs; for example, one table will be easier to query, but multiple tables will have higher data density.
In my experience it's best not to try to answer these questions from a database perspective, but let your domain model, and sometimes your application framework of choice, drive the table structure. Of course this isn't always a viable choice, especially when performance is concerned.
I recommend you start by drawing on paper the relationships you want to express and then go from there. Does the table structure you've chosen represent the domain accurately? Is it possible to query to extract the information you want to report on? Are the queries you've written complicated or slow? Answering these questions and others like them will hopefully guide you towards creating a good relational model.
I'd also suggest reading up on database normalization if you're serious about learning good relational modeling principles.
I'd probably have a table called [HeavenlyBodies] or some such thing. Then have a lookup table with the type of body, i.e. planet, comet, asteroid, star, etc. All will share similar things such as name, size, weight. Most of the answers I've read so far have good advice. Normalization is good, but I feel you can take it too far sometimes. Third normal form is a good goal.
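A minimal sketch of that single table plus type lookup table, written as `heavenly_bodies` and `body_types` (hypothetical snake_case names); SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE body_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- 'planet', 'comet', 'asteroid', 'star', ...
);
CREATE TABLE heavenly_bodies (
    id      INTEGER PRIMARY KEY,
    name    TEXT    NOT NULL,
    type_id INTEGER NOT NULL REFERENCES body_types(id),
    size_km REAL,
    mass_kg REAL
);
""")
# A new kind of body is a new row in body_types, not a new table.
conn.executemany("INSERT INTO body_types (id, name) VALUES (?, ?)",
                 [(1, "planet"), (2, "dwarf planet"), (3, "comet")])
conn.executemany("INSERT INTO heavenly_bodies (name, type_id) VALUES (?, ?)",
                 [("Mars", 1), ("Pluto", 2), ("Halley", 3), ("Venus", 1)])

# Planets and dwarf planets in one query, no UNION needed.
rows = conn.execute("""
    SELECT h.name, t.name
    FROM heavenly_bodies h
    JOIN body_types t ON t.id = h.type_id
    WHERE t.name IN ('planet', 'dwarf planet')
    ORDER BY h.name
""").fetchall()
print(rows)  # [('Mars', 'planet'), ('Pluto', 'dwarf planet'), ('Venus', 'planet')]
```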

MySQL Column Unification, any performance improvements?

I'm designing a MySQL table for an authentication system for a high-traffic personal website. Every time a user's comment, article, etc. is displayed, the following fields will be needed:
login
User Display
User Bio ( A little signature )
Website Account
YouTube Account
Twitter Account
Facebook Account
Lastfm Account
So everything is in one table to prevent the need to call sub-tables. So my question is:
Would there be any improvement if I combined the Website, YouTube, Twitter, Facebook and Lastfm columns into one?
For example:
[website::something.com][youtube::youtube.com/something]
No, combining these columns would not result in any improvement. Indeed, it seems you would extend the overall length (by adding prefixes and separators), hence potentially worsening performance.
A few other tricks however, may help:
reduce the size of the values stored in the "xxxAccount" columns by removing altogether, or replacing with shorthand codes, the most common parts of these values (the examples shown indicate some kind of URL whose beginning will likely be repeated).
depending on the average length of the bio, and typical text found therein, it may also be useful to find ways of shrinking its [storage] size, with simple replacement of common words, or possibly with actual compression (ZIP and such), although doing so may result in having to store the column in a BLOB column which may then become separated from the table, depending on the server implementation/configuration.
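A minimal sketch of both tricks: storing only the varying tail of an account URL, and compressing the bio. The prefixes are hypothetical examples, and zlib (DEFLATE) is swapped in for "ZIP and such"; MySQL also offers `COMPRESS()`/`UNCOMPRESS()` if you prefer to do it server-side.

```python
import zlib

# Hypothetical per-service prefixes; only the varying tail is stored in the column.
PREFIXES = {
    "youtube": "https://www.youtube.com/",
    "twitter": "https://twitter.com/",
}

def shorten(service, url):
    """Strip the known prefix; keep the full URL if it doesn't match."""
    prefix = PREFIXES[service]
    return url[len(prefix):] if url.startswith(prefix) else url

def expand(service, stored):
    """Inverse of shorten(): full URLs (containing '://') pass through untouched."""
    return stored if "://" in stored else PREFIXES[service] + stored

def pack_bio(text):
    """DEFLATE-compress the bio; the result is bytes, i.e. a BLOB column."""
    return zlib.compress(text.encode("utf-8"))

def unpack_bio(blob):
    return zlib.decompress(blob).decode("utf-8")

print(shorten("twitter", "https://twitter.com/someone"))  # someone
```

zlib adds a small fixed header and checksum, so very short bios can come out larger than the original; measuring on real data before switching is worthwhile.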
And, of course, independently from any improvements at the database level, the usage pattern described seems to call for caching this kind of data aggressively, to avoid the trip to SQL altogether.
Well, I don't think so. Think of it this way: you will need some way to split them, and that would require additional processing. And by that logic, why not just have one field in the whole table and put everything in that? :) Don't worry about the performance; it will be better with separate columns.