When should one use one to one relationships? When should you add new fields and when should you separate them into a new table?
It seems to me that you'd use it whenever you're grouping fields and/or that group tends to be optional. Yes?
I'm trying to create the tables for an object, but grouping/separating everything would require about 20 joins, some of them 4 levels deep.
Am I doing something wrong? How can I improve?
First, I highly recommend reading about normal forms.
A normalized relational database is extremely useful, and doing this properly is the reason tools such as Hibernate exist: to help manage the difference between objects represented as relational mappings and objects as programmatic entities.
Anything that has a one-to-one mapping should probably be in the same table. A Person has only one first name, one last name. Those should logically be in the same table. Having a reference to a table of names isn't necessary - in particular because little additional data can be stored about a name. Obviously, this isn't always true (an etymology database might want to do exactly that), but for most uses, you don't care about where a name comes from - indeed all you want is the name.
Therefore, think of the objects being represented. A person has some singular data points, and some one-to-many relationships (addresses they have lived at, for instance). One-to-many and many-to-many relationships will almost always require a separate table (or two, for many-to-many). Following those two guidelines, you can get a normalized database pretty fast.
Note that optional fields should be avoided if at all possible. Usually this means having a separate table that holds the field, with a reference back to the original table. Try to keep your tables lean. If a field isn't likely to have a value, it should probably be a row in its own table. Many such properties suggest a 'Property' table that can hold arbitrary optional properties of a particular type (i.e., those that apply to a 'Person').
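As a rough illustration of those two guidelines, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for whatever database you actually use. The table and column names (person, address, street, city) and the sample rows are made up for the example. Single-valued facts stay in the person table, while the one-to-many addresses get their own table with a reference back.

```python
import sqlite3

# In-memory SQLite database, used here only so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One-to-one facts about a person live in the person table itself.
    CREATE TABLE person (
        person_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    -- One-to-many: a person can have several addresses.
    CREATE TABLE address (
        address_id INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        street     TEXT NOT NULL,
        city       TEXT NOT NULL
    );
""")

# Hypothetical sample data.
conn.execute("INSERT INTO person VALUES (1, 'Ann', 'Lee')")
conn.executemany(
    "INSERT INTO address (person_id, street, city) VALUES (?, ?, ?)",
    [(1, "1 Main St", "Springfield"), (1, "2 Oak Ave", "Shelbyville")],
)

# A single join returns one row per address for the person.
rows = conn.execute("""
    SELECT p.first_name, p.last_name, a.city
    FROM person AS p
    JOIN address AS a ON a.person_id = p.person_id
    ORDER BY a.address_id
""").fetchall()
```

Only one join is needed here because the names were not split off into their own table.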
Related
I will start my question with the abstract case and then I will also give a concrete example in case it helps.
Assuming I have a tableX with columns A,B,C,D,E,F.
A,B,F are required.
Now we can have a record with C,D populated (so E is null) or a record with E populated (so C,D are null).
Is this table normalized or properly designed? I am not sure whether the relations/expectations among these columns, as I described them, should be "captured" differently.
Example:
A table to be used by a message processor where either the actual message to get/process is stored in column E, OR the URL and the protocol used to fetch the message are stored in columns C and D.
Normally, tables that store a class hierarchy (superclass and subclasses together) require a separate discriminator column. In your case each of the three columns - C, D or E - can be used as such, so no additional column is required.
Such data organization offers best performance for simple queries.
If you split it into 3 separate tables (super class and its two sub-classes) you will get a normalized model. I believe in your case it does not make sense, as long as you have just these three nullable columns.
If your example is a simplified presentation of your real data model and your sub-classes differ substantially, then normalization will be more economical in storage space and offer faster execution for queries that rely solely on super class data.
The table is probably not properly normalized. It sounds like there are two types of entities being stored in the table -- the A,B,C,D,F entity and the A,B,E,F entity.
Does this make the schema bad? Probably not. Relational databases use primary keys to connect one table to another. If other tables can connect to either type of entity, then it makes sense to store them in a single table. This allows one single key to connect them. You could, of course, introduce a three table schema (one for each subentity and one for the parent entity). This could be overkill when the entities are really quite similar.
Your example is a fine one. This sounds like a control table for a process that can do one of two things. It makes sense that different columns are used for each type of processing.
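If you keep everything in one table, the "either C and D, or E" rule can still be enforced by the database. Below is a minimal sketch using Python's sqlite3 module and a CHECK constraint. The table name message_job and the column names url, protocol and body are assumptions standing in for columns C, D and E, and the required columns A, B and F are left out for brevity. MySQL only enforces CHECK constraints from version 8.0.16 onward, so on older versions you would need a trigger or application-side validation instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE message_job (
        job_id   INTEGER PRIMARY KEY,
        url      TEXT,   -- stands in for column C
        protocol TEXT,   -- stands in for column D
        body     TEXT,   -- stands in for column E
        -- Exactly one of the two shapes is allowed:
        -- (url, protocol) filled and body empty, or the reverse.
        CHECK (
            (url IS NOT NULL AND protocol IS NOT NULL AND body IS NULL)
            OR
            (url IS NULL AND protocol IS NULL AND body IS NOT NULL)
        )
    )
""")

def try_insert(url, protocol, body):
    """Return True if the row satisfies the CHECK constraint, else False."""
    try:
        conn.execute(
            "INSERT INTO message_job (url, protocol, body) VALUES (?, ?, ?)",
            (url, protocol, body),
        )
        return True
    except sqlite3.IntegrityError:
        # SQLite reports a failed CHECK constraint as an IntegrityError.
        return False
```

With this in place, a row that mixes both shapes (or fills neither) is rejected at insert time, and whether body is NULL can serve as the discriminator.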
Conventions for normalized databases hold that the best practice for dealing with multivalued dependencies is spinning them off into their own table with two columns. One column is the primary key of the original table (for example, customer name, of which there is one), while the other is the attribute that can have multiple values (for example, email or phone: the customer could have several of these). Together these two columns constitute the primary key of the spun-off table.
However, when building normalized databases, I often find naming these spun-off tables troublesome. It's hard to come up with meaningful names for them. Is there a standard way of identifying these tables as multivalued dependency tables that are meaningless without the presence of the other table? Some examples I can think of (referencing the example above) are 'customer_phones' or 'customer_has_phones'. I don't think just 'phones' would be good, because that doesn't identify this table as related to and heavily dependent on the customers table.
In real life you end up running into a lot of combinations that vary a lot from each other.
Try to be as clear as possible in case someone else ends up inheriting your design. I personally like to keep short names in the parent tables so they don't end up being super long whenever the relationship grows or spins off new children.
For instance, if I have "Customer", "Subscriptions", "Product" tables I would end up naming their links like "Customer_Subscriptions" or "Subscriptions_Products" and such.
Most of the time it just comes down to what works better for you in terms of maintainability.
The convention we use is the name of the entity table, followed by the name of the attribute.
In your example, if the entity table is customer, the name of the table for the repeating (multi-valued) attribute would be customer_phone or customer_phone_number. (We almost always name tables in the singular, based on the idea that we are naming what ONE tuple (row) represents, e.g. a row in that table represents one occurrence of a phone number for a customer.)
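For what it's worth, the spun-off table under that naming convention could look like the sketch below. It uses Python's sqlite3 module so it can be run as-is. The surrogate customer_id key and the helper names add_phone and phones_for are assumptions for the example. The two columns together form the primary key, so the same number cannot be stored twice for one customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- entity name + attribute name, singular
    CREATE TABLE customer_phone (
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        phone_number TEXT NOT NULL,
        PRIMARY KEY (customer_id, phone_number)
    );
""")
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Example Customer')")

def add_phone(customer_id, phone_number):
    """Insert a phone number; return False on a duplicate or unknown customer."""
    try:
        conn.execute(
            "INSERT INTO customer_phone (customer_id, phone_number) VALUES (?, ?)",
            (customer_id, phone_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False

def phones_for(customer_id):
    """Return all phone numbers of one customer, sorted."""
    rows = conn.execute(
        "SELECT phone_number FROM customer_phone WHERE customer_id = ? ORDER BY phone_number",
        (customer_id,),
    ).fetchall()
    return [r[0] for r in rows]
```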
I need to implement custom fields in a booking software. I need to extend some tables containing, for example, the user groups with dynamic attributes.
But also, a product table where each product can have custom fields (and ideally these fields could be nested).
I already made some searches about EAV but I read many negative comments, so I'm wondering which design to use for this kind of things.
I understand using EAV causes many joins to sort a page of products, but I don't feel like I want to alter the groups/products tables, each time an attribute is created.
Note: I use InnoDB.
The only good solution is pretty much what you don't want to do: alter the groups/products tables each time an attribute is created. It's a pain, yes, but it will guarantee data integrity and better performance.
If you don't want to do that, you can create a table with TableName, FieldName, ID and Value columns, and hold, let's say:
TableName='Customer', FieldName='Address', ID=1 (the customer's ID), Value='customers address'
But as you said, it will need loads of joins. I don't think it is a good solution, I've seen it but wouldn't really recommend it. Just showing because well, it is one possible solution.
Another solution would be to add several pre-defined columns to your tables, like column1, column2, column3 and so on, and use them as necessary. It's a solution as bad as the previous one, but I've seen major ERPs that use it.
Mate, based on experience, anything you will find in this area is a huge workaround and won't be worth implementing. The headache of maintaining it will be bigger than that of adding your fields to your table. Keep it simple and correct.
I am working on a project entirely based on EAV. I agree that EAV makes things complex and slow, but it has its own advantages: we don't need to change the database structure or code to add new attributes, and we can have hierarchies among the data in the database tables.
The system can get extremely slow if we use EAV everywhere.
But EAV is very helpful if used wisely. I would never design my entire DB around EAV. I would separate out the common and useful attributes and put them in flat tables, while for the additional attributes (which might need to change depending on clients or various requirements), I would use EAV.
This way we get the advantage of EAV, namely the flexibility you want, without much of the trouble.
This is just my suggestion, there might be a better solution.
You can do this by adding at least 2 more tables.
One table will contain attribute unique key (attr_id) and attribute values, like attribute name and something else that is needed by your business logic.
Second table will serve as join between your say products table and attributes table and should have the following fields:
(id, product_id, attr_id)
This way, you can add as many dynamic attributes as you like, and your database schema will be future proof.
The only downside is that queries will now have to join 2 more tables.
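Here is a rough sketch of that two-table layout, written with Python's sqlite3 module so it can be run directly. The table names product, attribute and product_attribute are assumptions, and so is the value column on the join table: the field list above does not mention one, but the attribute's value for a given product has to be stored somewhere. INSERT OR IGNORE and INSERT OR REPLACE are SQLite syntax; MySQL has its own equivalents, such as INSERT IGNORE and REPLACE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- Attribute definitions: one row per custom field.
    CREATE TABLE attribute (
        attr_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    -- Join table between products and attributes.
    CREATE TABLE product_attribute (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(product_id),
        attr_id    INTEGER NOT NULL REFERENCES attribute(attr_id),
        value      TEXT NOT NULL,  -- assumed column, not in the field list above
        UNIQUE (product_id, attr_id)
    );
""")

def set_attribute(product_id, attr_name, value):
    """Create the attribute definition if needed, then store its value."""
    conn.execute("INSERT OR IGNORE INTO attribute (name) VALUES (?)", (attr_name,))
    attr_id = conn.execute(
        "SELECT attr_id FROM attribute WHERE name = ?", (attr_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO product_attribute (product_id, attr_id, value) "
        "VALUES (?, ?, ?)",
        (product_id, attr_id, value),
    )

def attributes_of(product_id):
    """Return {attribute name: value} for one product (the two extra joins)."""
    rows = conn.execute("""
        SELECT a.name, pa.value
        FROM product AS p
        JOIN product_attribute AS pa ON pa.product_id = p.product_id
        JOIN attribute AS a ON a.attr_id = pa.attr_id
        WHERE p.product_id = ?
    """, (product_id,)).fetchall()
    return dict(rows)

# Hypothetical sample data.
conn.execute("INSERT INTO product (product_id, name) VALUES (1, 'Meeting room')")
set_attribute(1, "capacity", "12")
set_attribute(1, "projector", "yes")
```

Adding a new custom field is then just a new row in attribute, with no ALTER TABLE. The price is that every value is stored as text, so the database cannot type-check it.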
In my database I have different entities like todos, events, discussions, etc. Each of them can have tags, comments, files, and other related items.
Now I have to design the relationships between these tables, and I think I have to choose from the following two possible solutions:
1. Separated relationship tables
So I will create todos_tags, events_tags, discussions_tags, todos_comments, events_comments, discussions_comments, etc. tables.
2. Common relationship tables
I will create only these tables: related_tags, related_comments, related_files, etc. having a structure like this:
related_tags
entity (event|discussion|todo|etc. - as enum or tinyint (1|2|3|etc.))
entity_id
tag_id
Which design should I use?
Probably you will say: it depends on the situation, and I think this is correct.
In my case, most of the time (maybe 70%+) I will have to query only one of the entities (events, discussions or todos), but in some cases I need them all in the same query (events, discussions and todos having a specified tag, for example). In that case I'll have to do a union on 3+ tables (in my case it can be 5+ tables) if I go with separated relationship tables.
I won't have more than 1000-2000 rows in each table (events, discussions, todos).
What is the correct way to go? What are some personal experiences about this?
The second schema is more extensible. It lets you extend your application with queries involving more than one entity type. In addition, you can easily add new types in the future, even dynamically. Furthermore, it allows greater aggregation freedom, for example letting you count how many rows of each type exist, or how many were created during a particular timeframe.
On the other hand, the first design does not really have many advantages other than speed, and MySQL is already good at handling these types of queries fast enough for you. You can create an index on the entity column to make it work smoothly. If in the future you need to partition your tables to increase speed, you can do so at a later stage.
It is a far simpler design to have a single, common relationship table such as related_tags where you specify the entity type in a column rather than having multiple tables. Just be sure you properly index the entity and tag_id fields together to have optimum performance.
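To make the common-table option a bit more concrete, here is a minimal sketch using Python's sqlite3 module. The related_tags columns follow the structure from the question. The tag table, the index name, the tagged helper and the sample rows are assumptions for the example. One trade-off to be aware of: entity_id cannot be declared as a real foreign key, because it points at a different table depending on entity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (
        tag_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL UNIQUE
    );
    CREATE TABLE related_tags (
        entity    TEXT NOT NULL CHECK (entity IN ('event', 'discussion', 'todo')),
        entity_id INTEGER NOT NULL,
        tag_id    INTEGER NOT NULL REFERENCES tag(tag_id),
        PRIMARY KEY (entity, entity_id, tag_id)
    );
    -- Composite index for "everything with this tag", optionally per entity type.
    CREATE INDEX idx_related_tags_tag ON related_tags (tag_id, entity);
""")

# Hypothetical sample data: one tag attached to three kinds of entities.
conn.execute("INSERT INTO tag (tag_id, name) VALUES (1, 'urgent')")
conn.executemany(
    "INSERT INTO related_tags (entity, entity_id, tag_id) VALUES (?, ?, ?)",
    [("todo", 10, 1), ("event", 20, 1), ("discussion", 30, 1)],
)

def tagged(tag_name, entity=None):
    """Return (entity, entity_id) pairs carrying the tag, optionally for one type."""
    sql = """
        SELECT rt.entity, rt.entity_id
        FROM related_tags AS rt
        JOIN tag AS t ON t.tag_id = rt.tag_id
        WHERE t.name = ?
    """
    params = [tag_name]
    if entity is not None:
        sql += " AND rt.entity = ?"
        params.append(entity)
    sql += " ORDER BY rt.entity, rt.entity_id"
    return conn.execute(sql, params).fetchall()
```

The cross-entity case is a single query with no UNION, and the single-entity case only adds one more condition to the WHERE clause.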
I am designing a database which holds a lot of information for a user. Currently I need to store 20 different values, but over time I could be adding more and more.
I have looked around StackOverflow for similar questions, but it usually turns out that the asker just did not design their table correctly.
So based on what I have seen around StackOverflow, should I:
1. Create a table with many null columns and use them when needed (this seems terrible to me)
2. Create a users table and an information table where information is a key-value pair: [user_id, key, value]
3. Anything else you can suggest?
Keep in mind this is for a MySQL database, so I understand the disliking for a Key-Value table on a relational database.
Thanks.
Hmm, I am a bit confused by the question, but it sounds like you want to have lots of attributes for one user, right? And you want to add more in the future?
Well, isn't that just a matter of having a customer_attribute_ref reference table of some sort? You can then easily add more attributes by inserting into the ref table, and in the customer table you have at least three columns: 1. customer ID, 2. customer attribute ID, 3. customer attribute value.
Maybe I missed your question. Can you clarify?
I'd suggest 3. A hybrid of 1 and 2. That is, put your core fields, which are already known, and you know you'll be querying frequently, into the main table. Then add the key-value table for more obscure or expanded properties. I think this approach balances competing objectives of keeping your table width relatively narrow, and minimizing the number of joins needed for basic queries.
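A minimal sketch of that hybrid, again using Python's sqlite3 module so it can be run directly. The core columns (email, display_name), the table name user_info and the helper functions are assumptions for the example. INSERT OR REPLACE is SQLite syntax; in MySQL you would likely use INSERT ... ON DUPLICATE KEY UPDATE instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Core fields that are known up front and queried often.
    CREATE TABLE users (
        user_id      INTEGER PRIMARY KEY,
        email        TEXT NOT NULL UNIQUE,
        display_name TEXT NOT NULL
    );
    -- Key-value table for the more obscure or later-added properties.
    CREATE TABLE user_info (
        user_id    INTEGER NOT NULL REFERENCES users(user_id),
        info_key   TEXT NOT NULL,
        info_value TEXT NOT NULL,
        PRIMARY KEY (user_id, info_key)
    );
""")

def set_info(user_id, key, value):
    """Add or overwrite one optional property of a user."""
    conn.execute(
        "INSERT OR REPLACE INTO user_info (user_id, info_key, info_value) "
        "VALUES (?, ?, ?)",
        (user_id, key, value),
    )

def load_user(user_id):
    """Return core fields plus a dict of extra properties, or None if unknown."""
    core = conn.execute(
        "SELECT email, display_name FROM users WHERE user_id = ?", (user_id,)
    ).fetchone()
    if core is None:
        return None
    extras = dict(conn.execute(
        "SELECT info_key, info_value FROM user_info WHERE user_id = ?", (user_id,)
    ).fetchall())
    return {"email": core[0], "display_name": core[1], "extras": extras}

# Hypothetical sample data.
conn.execute(
    "INSERT INTO users (user_id, email, display_name) VALUES (1, 'ann@example.com', 'Ann')"
)
set_info(1, "favorite_color", "green")
```

Basic queries on users need no joins at all, and the key-value table is only touched when the extra properties are actually wanted.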
Another approach you could consider instead of or in combination with the above is an ETL process of some kind. Maybe you define a key-value table as a convenient way for your applications to add data; then set up replication, triggers, and/or a nightly/hourly stored procedure to transform the data into a form more suitable for querying and reporting purposes.
The exact best approach should be determined by careful planning and consideration of the entire architecture of your application.