Custom fields / attributes model - mysql

I need to implement custom fields in a booking software. I need to extend some tables containing, for example, the user groups with dynamic attributes.
But also, a product table where each product can have custom fields (and ideally these fields could be nested).
I already made some searches about EAV but I read many negative comments, so I'm wondering which design to use for this kind of things.
I understand using EAV causes many joins to sort a page of products, but I don't feel like I want to alter the groups/products tables, each time an attribute is created.
Note : I use Innodb

The only good solution is pretty much what you don't want to do, alter the groups/products tables, each time an attribute is created. It's a pain, yes, but it will guarantee data integrity and better performance.
If you don't want to do that, you can create a table with TableName, FieldName, ID and value, and hold lets say:
TableName='Customer', FieldName='Address', ID =1 (customers ID), Value
='customers address'
But as you said, it will need loads of joins. I don't think it is a good solution, I've seen it but wouldn't really recommend it. Just showing because well, it is one possible solution.
Another solution would be to add several pre-defined columns on your tables like column1, column2, column3 and so on and use them as necessary. It's a solution as worst as the previous one but I've seen major ERPs that use it.
Mate, based on experience, anything you will find on this area would be a huge work around and won't be worth implementing, the headache you will have to maintain it will be bigger than adding your fields to your table. Keep it simple and correct.

I am working on a project entirely based on EAV. I agree that EAV make things complex and slow, but it has its own advantages like we don't need to change the database structure or code for adding new attributes and we can have hierarchies among the data in the database tables.
The system can get extremely slow if we are using EAV at all the places.
But, Eav is very helpful, if used wisely. I will never design my entire DB based on EAV. I will divide the common and useful attributes and put them in flat tables while for the additional attributes (which might need to be changed depending on clients or various requirements), I will use EAV.
This way we can have the advantages of EAV which includes flexibility what you want without getting much trouble.
This is just my suggestion, there might be a better solution.

You can do this by adding at least 2 more tables.
One table will contain attribute unique key (attr_id) and attribute values, like attribute name and something else that is needed by your business logic.
Second table will serve as join between your say products table and attributes table and should have the following fields:
(id, product_id, attr_id)
This way, you can add as many dynamic attributes as you like, and your database schema will be future proof.
The only downside that queries now will have to add 2 more tables to be joined.

Related

Is it possible to add extra columns to fields in mysql?

We are looking to extend MySQL by adding extra columns to each "column". Right now you have the following.
Field, Type, Null, Key, Default, Extra
We want to be able to add to the "column" definition an extra column like, Attributes. Our system has certain design specifications that we need to describe more data per "column". How can we accomplish this in MySQL?
The query to return back all of the columns is as follows.
SHOW COLUMNS FROM MyDB.MyTable;
EDIT 1
I should have added this to begin with, and I apologize for not doing so. We are currently describing attributes in the Comments section for each column type, and we understand that this is a very dirty solution, but it was the only one we could think of at the time. We have built a code generator that revolves around the DB structure and is what really stems from this initiative. We want to describe code attributes for a column so the code generator can pick up the changes and refresh the code base on each change or run.
First, terminology: "field" and "column" are basically synonyms in this context. There is no distinction between fields and columns. Some MySQL commands even allow you to use these two words interchangeably (e.g. SHOW FIELDS FROM MyDB.MyTable).
We want to assign attributes to each column in a table. Adding "field_foo" for "field" would repeat the same data over and over again for each row.
Simple answer:
If you want more attributes that pertain to a given column foo, then you should create another table, where foo is its primary key, so each distinct value gets exactly one row. This is part of the process of database normalization. This allows you to have attributes to describe a given value of foo without repeating data, even when you use that value many times in your original table.
It sounds like you might also need to allow for extensibility and you want to allow new columns at some future time, but you don't know which columns or how many right now. This is a pretty common project requirement.
You might be interested in my presentation Extensible Data Modeling, in which I give an overview of different solutions in SQL for this type of problem.
Extra Columns
Entity-Attribute-Value
Class Table Inheritance
Serialized LOB
Inverted Indexes
Online Schema Changes
Non-Relational Databases
None of these solutions are foolproof, each has their strengths and weaknesses. So it is worth learning about all of them, and then decide which ones have strengths that matter to your specific project, while their weaknesses are something that doesn't inconvenience you too much (that's the decision process for many software design choices).
We are currently describing attributes in the Comments section for each column type
So you're using something like the "Serialized LOB" solution.

Which of these 2 MySQL DB Schema approaches would be most efficient for retrieval and sorting?

I'm confused as to which of the two db schema approaches I should adopt for the following situation.
I need to store multiple attributes for a website, e.g. page size, word count, category, etc. and where the number of attributes may increase in the future. The purpose is to display this table to the user and he should be able to quickly filter/sort amongst the data (so the table strucuture should support fast querying & sorting). I also want to keep a log of previous data to maintain a timeline of changes. So the two table structure options I've thought of are:
Option A
website_attributes
id, website_id, page_size, word_count, category_id, title_id, ...... (going up to 18 columns and have to keep in mind that there might be a few null values and may also need to add more columns in the future)
website_attributes_change_log
same table strucuture as above with an added column for "change_update_time"
I feel the advantage of this schema is the queries will be easy to write even when some attributes are linked to other tables and also sorting will be simple. The disadvantage I guess will be adding columns later can be problematic with ALTER TABLE taking very long to run on large data tables + there could be many rows with many null columns.
Option B
website_attribute_fields
attribute_id, attribute_name (e.g. page_size), attribute_value_type (e.g. int)
website_attributes
id, website_id, attribute_id, attribute_value, last_update_time
The advantage out here seems to be the flexibility of this approach, in that I can add columns whenever and also I save on storage space. However, as much as I'd like to adopt this approach, I feel that writing queries will be especially complex when needing to display the tables [since I will need to display records for multiple sites at a time and there will also be cross referencing of values with other tables for certain attributes] + sorting the data might be difficult [given that this is not a column based approach].
A sample output of what I'd be looking at would be:
Site-A.com, 232032 bytes, 232 words, PR 4, Real Estate [linked to category table], ..
Site-B.com, ..., ..., ... ,...
And the user needs to be able to sort by all the number based columns, in which case approach B might be difficult.
So I want to know if I'd be doing the right thing by going with Option A or whether there are other better options that I might have not even considered in the first place.
I would recommend using Option A.
You can mitigate the pain of long-running ALTER TABLE by using pt-online-schema-change.
The upcoming MySQL 5.6 supports non-blocking ALTER TABLE operations.
Option B is called Entity-Attribute-Value, or EAV. This breaks rules of relational database design, so it's bound to be awkward to write SQL queries against data in this format. You'll probably regret using it.
I have posted several times on Stack Overflow describing pitfalls of EAV.
Also in my blog: EAV FAIL.
Option A is a better way ,though the time may be large when alert table for adding a extra column, querying and sorting options are quicker. I have used the design like Option A before, and it won't take too long when alert table while millions records in the table.
you should go with option 2 because it is more flexible and uses less ram. When you are using option1 then you have to fetch a lot of content into the ram, so will increases the chances of page fault. If you want to increase the querying time of the database then you should defiantly index your database to get fast result
I think Option A is not a good design. When you design a good data model you should not change the tables in a future. If you domain SQL language, using queries in option B will not be difficult. Also it is the solution of your real problem: "you need to store some attributes (open number, not final attributes) of some webpages, therefore, exist an entity for representation of those attributes"
Use Option A as the attributes are fixed. It will be difficult to query and process data from second model as there will be query based on multiple attributes.

Mysql: separate or common relationship tables for different entities

In my database I have different entities like todos, events, discussions, etc. Each of them can have tags, comments, files, and other related items.
Now I have to design the relationships between these tables, and I think I have to choose from the following two possible solutions:
1. Separated relationship tables
So I will create todos_tags, events_tags, discussions_tags, todos_comments, events_comments, discussions_comments, etc. tables.
2. Common relationship tables
I will create only these tables: related_tags, related_comments, related_files, etc. having a structure like this:
related_tags
entity (event|discussion|todo|etc. - as enum or tinyint (1|2|3|etc.))
entity_id
tag_id
Which design should I use?
Probably you will say: it depends on the situation, and I think this is correct.
I my case most of the time (maybe 70%+) I will have to query only one of the entities (events, discussion or todos), but in some cases I need them all in the same query (both events, discussion, todos having a specified tag for example). In this case I'll have to do on union on 3+ tables (in my case it can be 5+ tables) if I go with separated relationship tables.
I'll not have more than 1000-2000 rows in each table(events, discussions, todos);
What is the correct way to go? What are some personal experiences about this?
The second schema is more extensible. This way you will be able to extend your application to construct queries involving more than one type. In addition, it's possible to easily add new types to the future even dynamically. Furthermore, it allows greater aggregation freedom, for example allowing you to count how many rows in each type exist, or how many were created during a particular timeframe.
On the other hand, the first design does not really have many advantages other than speed: But MySQL is already good at handling these types of queries fast enough for you. You can create an index "entity" to make it work smoothly. If in the future you need to partition your tables to increase speed, you can do so at a later stage.
It is a far simpler design to have a single, common relationship table such as related_tags where you specify the entity type in a column rather than having multiple tables. Just be sure you properly index the entity and tag_id fields together to have optimum performance.

MySQL: Key-Value vs. Null Columns for creating tables with unknown number of columns

I am designing a database which holds a lot of information for a user. Currently I need to store 20 different values, but over time I could be be adding more and more.
I have looked around StackOverflow for simular questions, but it usually ends up with the asker just not designing his table correctly.
So based of what I have seen around StackOverflow, should I:
Create a table with many null columns and use them when needed (this seems terrible to me)
Create a users table and a information table where information is a key-value pair: [user_id, key, value]
Anything else you can suggest?
Keep in mind this is for a MySQL database, so I understand the disliking for a Key-Value table on a relational database.
Thanks.
hmm, i am a bit confused by the question, but it sounds like you want to have lots of attributes for one user right? And in the future you want to add more??
Well, isn't that just have a customer_attribute_ref ref table of some sort, then you can easily add more by then inserting to the ref table, then in the customer table you have at least three columns : 1. customer ID 2. customer attribute ID 3. customer attribute value...
may be i missed your question. Can you clarify
I'd suggest 3. A hybrid of 1 and 2. That is, put your core fields, which are already known, and you know you'll be querying frequently, into the main table. Then add the key-value table for more obscure or expanded properties. I think this approach balances competing objectives of keeping your table width relatively narrow, and minimizing the number of joins needed for basic queries.
Another approach you could consider instead of or in combination with the above is an ETL process of some kind. Maybe you define a key-value table as a convenient way for your applications to add data; then set up replication, triggers, and/or a nightly/hourly stored procedure to transform the data into a form more suitable for querying and reporting purposes.
The exact best approach should be determined by careful planning and consideration of the entire architecture of your application.

When is it a good idea to move columns off a main table into an auxiliary table?

Say I have a table like this:
create table users (
user_id int not null auto_increment,
username varchar,
joined_at datetime,
bio text,
favorite_color varchar,
favorite_band varchar
....
);
Say that over time, more and more columns -- like favorite_animal, favorite_city, etc. -- get added to this table.
Eventually, there are like 20 or more columns.
At this point, I'm feeling like I want to move columns to a separate
user_profiles table is so I can do select * from users without
returning a large number of usually irrelevant columns (like
favorite_color). And when I do need to query by favorite_color, I can just do
something like this:
select * from users inner join user_profiles using user_id where
user_profiles.favorite_color = 'red';
Is moving columns off the main table into an "auxiliary" table a good
idea?
Or is it better to keep all the columns in the users table, and always
be explicit about the columns I want to return? E.g.
select user_id, username, last_logged_in_at, etc. etc. from users;
What performance considerations are involved here?
Don't use an auxiliary table if it's going to contain a collection of miscellaneous fields with no conceptual cohesion.
Do use a separate table if you can come up with a good conceptual grouping of a number of fields e.g. an Address table.
Of course, your application has its own performance and normalisation needs, and you should only apply this advice with proper respect to your own situation.
I would say that the best option is to have properly normalized tables, and also to only ask for the columns you need.
A user profile table might not be a bad idea, if it is structured well to provide data integrity and simple enhancement/modification later. Only you can truly know your requirements.
One thing that no one else has mentioned is that it is often a good idea to have an auxiliary table if the row size of the main table would get too large. Read about the row size limits of your specific databases in the documentation. There are often performance benefits to having tables that are less wide and moving the fields you don't use as often off to a separate table. If you choose to create an auxiliarary table with a one-to-one relationship make sure to set up the PK/FK relationship to maintain data integrity and set a unique index or constraint on the FK field to mainatin the one-to-one relationship.
And to go along with everyone else, I cannot stress too strongly how bad it is to ever use select * in production queries. You save a few seconds of development time and create a performance problem as well as make the application less maintainable (yes less - as you should not willy nilly return things you may not want to show on the application but you need in the database. You will break insert statements that use selects and show users things you don't want them to see when you use select *.).
Try not to get in the habit of using SELECT * FROM ... If your application becomes large, and you query the users table for different things in different parts of your application, then when you do add favorite_animal you are more likely to break some spot that uses SELECT *. Or at the least, that place is now getting unused fields that slows it down.
Select the data you need specifically. It self-documents to the next person exactly what you're trying to do with that code.
Don't de-normalize unless you have good reason to.
Adding a favorite column ever other day every time a user has a new favorite is a maintenance headache at best. I would highly consider creating a table to hold a favorites value in your case. I'm pretty sure I wouldn't just keep adding a new column all the time.
The general guideline that applies to this (called normalization) is that tables are grouped by distinct entities/objects/concepts and that each column(field) in that table should describe some aspect of that entity
In your example, it seems that favorite_color describes (or belongs to) the user. Some times it is a good idea to moved data to a second table: when it becomes clear that that data actually describes a second entity. For example: You start your database collecting user_id, name, email, and zip_code. Then at some point in time, the ceo decides he would also like to collect the street_address. At this point a new entity has been formed, and you could conceptually view your data as two tables:
user: userid, name, email
address: steetaddress, city, state, zip, userid(as a foreign key)
So, to sum it up: the real challenge is to decide what data describes the main entity of the table, and what, if any, other entity exists.
Here is a great example of normalization that helped me understand it better
When there is no other reason (e.g. there are normal forms for databases) you should not do it. You dont save any space, as the data must still stored, instead you waste more as you need another index to access them.
It is always better (though may require more maintenance if schemas change) to fetch only the columns you need.
This will result in lower memory usage by both MySQL and your client application, and reduced query times as the amount of data transferred is reduced. You'll see a benefit whether this is over a network or not.
Here's a rule of thumb: if adding a column to an existing table would require making it nullable (after data has been migrated etc) then instead create a new table with all NOT NULL columns (with a foreign key reference to the original table, of course).
You should not rely on using SELECT * for a variety of reasons (google it).