Which way is better to implement custom fields in a web application (MySQL)?

I have a self-made web application in PHP and MySQL. The many different clients using my system would like to augment entities with custom fields. Each client wants to store their own additional data, so this should be done in a dynamic way.
For example, client1 would like to add a "color" property to their products, while client2 wants a field called "safety_level" for their products.
I want a methodology that can be applied not only to products but to users and any other entities as well.
Here are the two options I found optimal, but I can't decide which one is more effective:
OPTION 1:
For every entity I create an [entityname]_custom_fields table that stores the additional field values in a 1:1 relationship.
e.g.:
+---------------------------------------------+
|products_custom_fields                       |
+---------------------------------------------+
|product_id (PK and FK related to products.id)|
|safety_level                                 |
|some_other_fields                            |
+---------------------------------------------+
pro: this table has no more records than the entity table (in this case products), which means fewer records, and it is quite easy to keep an overview of.
con: adding new fields or deleting old ones requires DDL queries. I don't want to entrust DDL to users... not even to operators with admin permissions.
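For illustration, Option 1 might look like this in MySQL (a minimal sketch, assuming products.id is an unsigned INT):

```sql
-- Option 1: one side table per entity, one real column per custom field
CREATE TABLE products_custom_fields (
    product_id   INT UNSIGNED NOT NULL,
    safety_level VARCHAR(255) NULL,
    color        VARCHAR(255) NULL,
    PRIMARY KEY (product_id),
    FOREIGN KEY (product_id) REFERENCES products (id)
) ENGINE=InnoDB;

-- The con in practice: every new client-defined field is a DDL statement
ALTER TABLE products_custom_fields ADD COLUMN some_new_field VARCHAR(255) NULL;
```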
OPTION 2:
An [entity]_custom_field_values table has an N:1 relation to the [entity] table. Each row contains the type of the custom field and the value itself. In this case we need another table which contains the custom field types, e.g.:
custom field values:
+----------------------------------------------------------------------+
|products_custom_field_values                                          |
+----------------------------------------------------------------------+
|custom_field_id                                                       |
|custom_field_type (FK product_custom_field_types.custom_field_type_id)|
|value                                                                 |
+----------------------------------------------------------------------+
custom field types:
+----------------------------------------+
|products_custom_field_types             |
+----------------------------------------+
|custom_field_type_id (PK AUTO_INCREMENT)|
|product_id (FK related to products.id)  |
+----------------------------------------+
pro: managing fields is easy and does not require altering table structures.
con: more records, and all kinds of custom field values end up in one big mess... which is not necessarily wrong, because that's the point of MySQL: to extract useful data from a big mess. The question is, what about efficiency and performance?
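And Option 2 might be wired up like this. A sketch only: I've given the type a name column and put the product reference on the values table, which the outline above leaves implicit, so treat this as an interpretation rather than a literal transcription:

```sql
-- Option 2: field definitions in one table, values in another (EAV-style)
CREATE TABLE products_custom_field_types (
    custom_field_type_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name                 VARCHAR(64)  NOT NULL  -- assumed column, e.g. 'color', 'safety_level'
) ENGINE=InnoDB;

CREATE TABLE products_custom_field_values (
    custom_field_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    product_id        INT UNSIGNED NOT NULL,
    custom_field_type INT UNSIGNED NOT NULL,
    value             VARCHAR(255) NULL,
    FOREIGN KEY (product_id) REFERENCES products (id),
    FOREIGN KEY (custom_field_type)
        REFERENCES products_custom_field_types (custom_field_type_id)
) ENGINE=InnoDB;
```

Adding or removing a field for a client is then just a row in products_custom_field_types; no DDL is involved.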

Note: this topic is actually covered in "SQL Antipatterns", which I strongly recommend you read.
I am a lazy person, which means that I tend to apply YAGNI to my code. So this is my approach.
So. Let's assume that there are two groups of products:
ProductFoo          ProductBar
- productID         - productID
- name              - name
- price             - price
- color             - supply
- weight            - manufacturerID
- safety
In this case there are three common fields that go in the main Products table, and the custom parameters would be stored using table inheritance (it's a thing, google it). So basically you would end up with three tables: Products, ProductsFoo and ProductsBar, where the Products table has a "type" field and both "child tables" have a productID foreign key pointing to the parent table.
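A minimal sketch of that layout with the example fields from above (the column types are guesses):

```sql
-- Common fields live in the parent table; "type" says which child table to join
CREATE TABLE Products (
    productID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name      VARCHAR(255)   NOT NULL,
    price     DECIMAL(10, 2) NOT NULL,
    type      ENUM('foo', 'bar') NOT NULL
) ENGINE=InnoDB;

CREATE TABLE ProductsFoo (
    productID INT UNSIGNED NOT NULL PRIMARY KEY,
    color     VARCHAR(32),
    weight    DECIMAL(8, 3),
    safety    TINYINT UNSIGNED,
    FOREIGN KEY (productID) REFERENCES Products (productID)
) ENGINE=InnoDB;

CREATE TABLE ProductsBar (
    productID      INT UNSIGNED NOT NULL PRIMARY KEY,
    supply         INT UNSIGNED,
    manufacturerID INT UNSIGNED,
    FOREIGN KEY (productID) REFERENCES Products (productID)
) ENGINE=InnoDB;
```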
That's if you know at development time what "custom fields" each client will want.
Now, let's assume clients are being difficult and want to make up custom fields whenever they feel like it.
In this case I would simply create a Products.data field, which contains a JSON document with all the custom attributes for each product, and only "extract" specific attributes into an inheriting table when a client wants to search by that custom attribute (there is no sane way to index JSON if clients want to search by their new "wallywanker" attribute).
You end up with the same basic structure, but the "sub-tables" only contain the attributes that are expected to be searchable.
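A sketch of that hybrid, assuming MySQL 5.7+ for the native JSON type (a TEXT column holding JSON works just as well for storage-only attributes); the ProductsSearchable table and its column are hypothetical examples:

```sql
-- Storage-only custom attributes go into one JSON document per product
ALTER TABLE Products ADD COLUMN data JSON NULL;
-- e.g. UPDATE Products SET data = '{"wallywanker": "yes", "ribbon": "red"}' WHERE productID = 1;

-- Only when a client needs to SEARCH by an attribute is it promoted to a child table
CREATE TABLE ProductsSearchable (
    productID   INT UNSIGNED NOT NULL PRIMARY KEY,
    wallywanker VARCHAR(64),
    KEY idx_wallywanker (wallywanker),
    FOREIGN KEY (productID) REFERENCES Products (productID)
) ENGINE=InnoDB;
```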
I hope this made sense.

If it is a company project, follow the standards used on previous projects.
Have a look at conventions such as Hungarian notation; that would make more sense than repeating a prefix. Also, it is more likely that your model name matches your table name.
If you are planning to use an ORM, it may have its own best practices as well.
https://en.wikipedia.org/wiki/Hungarian_notation

Related

MySQL Schema Advice: Unpredictable Field Additions

A little overview of the problem.
Let's say I have a table named TableA with fixed properties: PropertyA, PropertyB, PropertyC. This has been enough for your own website's needs, but then you suddenly have clients that want custom fields on your site.
ClientA wants to add PropertyD and PropertyE.
ClientB wants to add PropertyF and PropertyG.
The catch is that these clients don't want each other's fields. Now imagine you get more clients: the solution of just adding nullable fields to TableA becomes cumbersome and you end up with a mess of a table. Or at least I assume that's the case; feel free to correct me. Is it better if I just do that?
Now I have thought of two solutions. I'm asking if there's a better way to do it, since I'm not that confident about the trade-offs and their future performance.
Proposed Solution #1
data_id is not exactly a foreign key, but it stores whichever corresponding client property is attached to a TableA row, with client_id being the only foreign key present on both the property table and TableA.
It feels like an anti-pattern of some sort, but I imagine queries would be easy this way, although it requires that the developer knows which property table to pick from. I'm not sure whether having many tables is a bad thing.
Proposed Solution #2
I believe it's a bit more elegant and can easily accommodate more fields as necessary. Not to mention these are the only tables I would need for everything else. Just to visualize, I will add the requested properties to the Properties table like so:
Properties
---------------
id | name
---------------
1  | PropertyD
2  | PropertyE
3  | PropertyF
4  | PropertyG
And whenever I save any data, I would tag all properties whenever they are available, like so. For this example I want to save data for ClientA, which is stored in the Clients table with id 1.
Property_Mapping
--------------------------------------------------------
property_id | table_a_id | property_value   | client_id
--------------------------------------------------------
1           | 1          | PROPERTY_D_VALUE | 1
2           | 1          | PROPERTY_E_VALUE | 1
There is some obvious potential for query complexity with this one, I imagine, but it's more of a trade-off. I placed client_id on Property_Mapping just in case clients ever want the same fields. Any advice?
You've discovered the Entity-Attribute-Value antipattern. It's a terrible idea for a relational database. It makes your queries far more complex, and it takes 4-10x the storage space.
I covered some pros and cons of several alternatives in an old answer on Stack Overflow:
How to design a product table for many kinds of product where each product has many parameters
And in a presentation:
Extensible Data Modeling with MySQL
As an example of the trouble EAV causes, consider how you would respond if one of your clients says that PropertyD must be mandatory (i.e. the equivalent of NOT NULL) and PropertyE must be UNIQUE. Meanwhile, the other client says that PropertyG should be restricted to a finite set of values, so you should either use an ENUM data type, or use a foreign key to a table of allowed values.
But you can't implement any of these constraints using your Properties table, because all the values of all the properties are stored in the same column.
You lose features of relational databases when you use this antipattern, such as data types and constraints.
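To make that concrete: with ordinary columns each of those requirements is a single line of DDL, but on the generic property_value column of Property_Mapping there is nothing to attach them to. The column definitions below are hypothetical, purely to show the contrast:

```sql
-- Trivial when PropertyD, PropertyE and PropertyG are real columns of TableA
ALTER TABLE TableA
    MODIFY PropertyD VARCHAR(100) NOT NULL,          -- ClientA: mandatory
    ADD UNIQUE KEY uq_property_e (PropertyE),        -- ClientA: unique
    MODIFY PropertyG ENUM('low', 'medium', 'high');  -- ClientB: restricted value set

-- On Property_Mapping(property_id, table_a_id, property_value, client_id) this is
-- impossible: NOT NULL, UNIQUE or ENUM on property_value would apply to every
-- property's values at once, not just the one property a client asked about.
```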

MySQL Database Design with tags across multiple tables

I am working on some web apps which should all use the same user table. The different applications all need different table designs, so I created one table for each app, with the UserID being a foreign key referring to the user table.
Now I want to add tags to all apps. The tags should be in one table for every app, in order to easily query all tags from one user (for searching purposes: the search should be able to find everything tagged with that tag, no matter which app). Personally, I don't think splitting them up into multiple tables would be a good idea, but I am not that into database design, so I might be wrong. My current attempt looks something like this:
[tags]
EntryID | UserID | Tag
The thing is that the EntryIDs of course would have to be unique across all app tables with this solution. For the notes app I need something like this:
[notes]
EntryID | UserID | title | content | etc.
For my calendar I have the following table:
[calendar]
EntryID | UserID | name | start | end | etc.
Now I don't know how to manage those EntryIDs. Should I create another table like this
[entries]
EntryID | UserID | type
with type being something like "note" or "calendar", and EntryID being the primary key? Should the type be something like an integer or a string, or is there a way to refer to another table in the type column? And should I then make the EntryIDs in the app tables foreign keys referring to the entries table?
I put the userID in every table because I think this is going to speed up querying, for example when I need every tag one user has set across all apps. I know normalization usually prohibits this, but I again think that it would very much increase query speed and reduce load for both the MySQL server and my back-end.
I would appreciate every tip for structuring this, and thanks in advance!
You can use inheritance, similar to this:
I'm not sure what the role of the user is supposed to be here, exactly. In the model above, a user "owns" an entry and (presumably) tags it. If you want multiple users to be able to tag the same entry, USER would need to be connected to the junction table TAG_ENTITY.
For more on how to physically implement inheritance, see here.
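For example, one physical implementation in MySQL is shared primary keys, roughly like this (the users table reference and the column types are assumptions):

```sql
CREATE TABLE entries (
    EntryID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    UserID  INT UNSIGNED NOT NULL,
    type    ENUM('note', 'calendar') NOT NULL,
    FOREIGN KEY (UserID) REFERENCES users (UserID)
) ENGINE=InnoDB;

-- Each app table reuses the entries primary key instead of generating its own
CREATE TABLE notes (
    EntryID INT UNSIGNED NOT NULL PRIMARY KEY,
    title   VARCHAR(255),
    content TEXT,
    FOREIGN KEY (EntryID) REFERENCES entries (EntryID)
) ENGINE=InnoDB;

CREATE TABLE calendar (
    EntryID INT UNSIGNED NOT NULL PRIMARY KEY,
    name    VARCHAR(255),
    `start` DATETIME,
    `end`   DATETIME,
    FOREIGN KEY (EntryID) REFERENCES entries (EntryID)
) ENGINE=InnoDB;

-- Tags reference the shared EntryID, so one query covers every app
CREATE TABLE tags (
    EntryID INT UNSIGNED NOT NULL,
    UserID  INT UNSIGNED NOT NULL,
    Tag     VARCHAR(64)  NOT NULL,
    PRIMARY KEY (EntryID, Tag),
    FOREIGN KEY (EntryID) REFERENCES entries (EntryID)
) ENGINE=InnoDB;
```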
You may also be interested in this and this.

How to structure table Activities in a database?

I have a site written in CakePHP with a MySQL database.
On my site I want to track the activities of every user; for example (like this site), if a user inserts a product I want to record this activity in my database.
I see two ways:
1) One table called Activities with:
- id
- user_id
- title
- text
- type (the type of activity: comment, post edit)
2) Multiple tables, differentiated by activity:
- table activities_comment
- table activities_post
- table activities_badges
The problem is that when I go to a user's activities page I can have different types of activities, and I don't know which of these solutions is better, because a comment has a title and a body, a post has only text, a badge has an external id pointing to its own table (for example), etc.
Help me please
I'm not familiar with CakePHP, but from a purely database perspective your data model should probably look similar to this:
The symbol in the diagram denotes a category (a.k.a. inheritance, subclass, subtype, generalization hierarchy, etc.). Take a look at "Subtype Relationships" in the ERwin Methods Guide for more info.
There are generally 3 strategies for implementing the category:
1. All types in a single table. This requires a lot of NULLs and CHECK constraints to make sure separate subtypes are not inappropriately "intermingled".
2. All concrete types in separate tables (excluding the base, which is ACTIVITY in your case), which means common fields and relationships must be repeated in all child tables.
3. All types in separate tables (including the base). This implementation requires a little more JOINing, but is flexible and clean. It should be your default unless there are strong reasons against it (see the sketch below).
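A rough MySQL sketch of option 3 for the activity case (the column list is illustrative):

```sql
-- Base table: everything every activity has in common
CREATE TABLE activities (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id INT UNSIGNED NOT NULL,
    type    ENUM('comment', 'post_edit', 'badge') NOT NULL,
    created DATETIME NOT NULL
) ENGINE=InnoDB;

-- One subtype table per activity kind, sharing the base primary key
CREATE TABLE activities_comment (
    id     INT UNSIGNED NOT NULL PRIMARY KEY,
    title  VARCHAR(255),
    `text` TEXT,
    FOREIGN KEY (id) REFERENCES activities (id)
) ENGINE=InnoDB;

CREATE TABLE activities_badge (
    id       INT UNSIGNED NOT NULL PRIMARY KEY,
    badge_id INT UNSIGNED NOT NULL,   -- the "external id" to the badges table
    FOREIGN KEY (id) REFERENCES activities (id)
) ENGINE=InnoDB;
```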

Many highly similar objects in the same database table

Hello, stackoverflow community!
I am working on a rather large database-driven web application. The underlying database is growing in complexity as more components are being added, but so far I've had absolutely no trouble normalizing the data quite nicely.
However, this final component implies a table that can hold products.
Each product has a category, and depending on the category, has different fields.
Making a table for each product category doesn't seem right, as there are currently five types and they still have quite a lot of fields in common (but in weird ways: a few general fields such as description and price are common to all 5 categories, while some attributes are shared only between categories 1 and 2, others between 3, 4 and 5, and so on).
I'm trying to steer away from the EAV model for obvious performance reasons.
The thing is that, depending on which product type the user wants to enter into the database, there is a somewhat (but not completely) different field structure: all of them have a name and a general description, but other attributes such as "area covered" apply only to certain categories such as seeds and pesticides, not to fuel, which would instead have a diesel/gasoline boolean and a bunch of other fuel-related attributes.
Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.
My current idea would be to have the product table contain all the fields from all the possible categories, and then just have another table to describe which category from the product table has which fields.
product: id | type | name | description | price | composition | area covered | etc.
fields: id | name (contains a list of the fields in the above table)
product-fields: id | product_type | field_id (links a bunch of fields to the product table based on the product type)
I reckon this wouldn't be too slow, would be easy to search (no need to actually join the other tables, just perform the search on the main product table based on some inputs), and it would facilitate things like form generation and data validation with just one lightweight additional query/join: fetch a product from the db, join in a concatenated list of the fields actually used, then split that and display the proper form fields based on what it contains, i.e. the fields actually associated with that product.
Thanks for your trouble!
Andrei Bârsan
EAV can actually be quite good at storing data and fetching that data back again when you know the key. It also excels in its ability to add fields without changing the schema. But where it's quite poor is when you need the equivalent of WHERE field1 = x AND field2 = y.
So while I agree the data behaviour is important (how many products share the same fields, etc.), the use of that data is also important:
which fields need searching, which fields are always just data storage, and so on.
In most cases I'd suggest keeping all fields that need searching, in combination with each other, in the same table.
In practice this often leads to a single table solution.
- New fields require schema changes, new indexes, etc.
- Potential for sparsely populated data, using more space than is 'required'
- Allows simple queries, simple indexing and often the fastest queries
- Often, though not always, the space overhead is marginal
Where the sparse-data overheads reach a critical point, I would then head towards additional tables grouped by which fields they contain. More specifically, I would not create tables per product. This is on the dual assumption that most/all fields will be shared across at least some products, and that those fields will need searching.
This gives a schema more like...
Main_table ( PK, Product_Type, Field1, Field2, Field3 )
Geo_table ( PK, county, longitude, latitude )
Value ( PK, cost, sale_price, tax )
etc
You may also have a meta-data table describing which product types have which fields, etc.
What this schema allows is a more densely populated set of tables, which can be easily indexed and so quickly searched, while minimising table clutter and joins by grouping related fields.
In the end, there isn't a true answer, it's all a balancing act. My general rule of thumb is to stay with a single table until I actually have a real and pressing reason not to, not just a theoretical one.
In my experience, unless you are writing a complete framework that can render fully described fields (we are talking about a lot of metadata describing each field), it is not worth separating field definitions from the main object. Modern frameworks (like Grails) make adding a new column to a domain/Model class and table virtually painless.
If your common field overlap is about 80% between all object types, I would put them all in one table and use the Table per Hierarchy inheritance model, where a discriminator field helps you tell your object types apart. On the other hand, if you have only a 20% overlap of common fields, go with the Table per Class inheritance model, with a base class and table containing the common fields and the other joined tables hanging off the base.
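Roughly, the two shapes look like this (field names are placeholders taken from the question):

```sql
-- Table per Hierarchy: one table, a discriminator column, NULLs for unused fields
CREATE TABLE products (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    product_type VARCHAR(32)  NOT NULL,     -- the discriminator
    name         VARCHAR(255) NOT NULL,
    description  TEXT,
    price        DECIMAL(10, 2),
    area_covered DECIMAL(10, 2) NULL,       -- seeds / pesticides only
    is_diesel    TINYINT(1)     NULL        -- fuel only
) ENGINE=InnoDB;

-- Table per Class: common fields in a base table, category tables hang off it
CREATE TABLE products_base (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(255) NOT NULL,
    description TEXT,
    price       DECIMAL(10, 2)
) ENGINE=InnoDB;

CREATE TABLE products_fuel (
    id        INT UNSIGNED NOT NULL PRIMARY KEY,
    is_diesel TINYINT(1),
    FOREIGN KEY (id) REFERENCES products_base (id)
) ENGINE=InnoDB;
```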
Should I just extract the core features in a table, and make another five for each category type? That would be a bit hard to expand in the future.
This is called a SuperType - SubType relationship. It works very well if most of your queries fall into one of two patterns (see the query sketch after this list):
1. You will be querying mostly the SuperType table and only drilling down into the SubType table infrequently.
2. You will be querying the database after it has been filtered down to a specific SubType.
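For example, borrowing the products_base / products_fuel tables sketched a little earlier, the two access patterns would look roughly like this:

```sql
-- 1) Mostly the SuperType, drilling into a SubType only when needed
SELECT b.id, b.name, b.price, f.is_diesel
FROM products_base AS b
LEFT JOIN products_fuel AS f ON f.id = b.id
WHERE b.price < 100;

-- 2) Already filtered down to one SubType
SELECT b.id, b.name, f.is_diesel
FROM products_fuel AS f
JOIN products_base AS b ON b.id = f.id
WHERE f.is_diesel = 1;
```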

Preferable database design for job posts

I have two types of job posts -- company job posts and project job posts. The following would be the input fields (on the front-end) for each:
*company*
- company name
- company location
- company description
*project*
- project name
- project location
- project description
- project type
What would be the best database design for this? One table, keeping all fields separate:
`job_post`
- company_name
- company_location
- company_description
- project_name
- project_description
- project_type
- project_location
One table combining the fields -
`job_post`
- name
- location
- description
- project_type
- is_company (i.e., whether it is a company (True) or a project (False))
Or two tables, or something else? Why would that way be preferable over the other ways?
Depending on a lot of factors, including the maximum size of this job, I would normalize the data even further than 2 separate tables, perhaps having a company name table, etc., as joining multiple tables results in much faster queries than one long, never-ending table full of all of your information. What if you want to add more fields to projects but not to companies?
In short, I would definitely use multiple tables.
You have identified 3 major objects in your OP: the job, the project, and the company. Each of these objects has its own attributes, none of which are associated with the other objects, so I would recommend something akin to the following (demonstrative only):
job
    id
    name
company
    id
    name
project
    id
    name
link_job_company_project
    job_id
    company_id
    project_id
This type of schema will allow you to add object-specific attributes without affecting the other objects, yet combine them all together via the linking table into a 'job post'.
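In MySQL that could look roughly like this (only the name columns from the outline are shown, and making project_id nullable for company-only posts is my assumption):

```sql
CREATE TABLE job     (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) NOT NULL);
CREATE TABLE company (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) NOT NULL);
CREATE TABLE project (id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255) NOT NULL);

-- The linking table assembles a 'job post' out of the three objects
CREATE TABLE link_job_company_project (
    job_id     INT UNSIGNED NOT NULL,
    company_id INT UNSIGNED NOT NULL,
    project_id INT UNSIGNED NULL,      -- NULL when the post is a plain company post
    PRIMARY KEY (job_id),
    FOREIGN KEY (job_id)     REFERENCES job (id),
    FOREIGN KEY (company_id) REFERENCES company (id),
    FOREIGN KEY (project_id) REFERENCES project (id)
) ENGINE=InnoDB;
```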
This surely has to do with the volume of data stored in the table. In an abstract view one table looks pretty simple, but as Ryan says, what if requirements change tomorrow and more columns need to be added for one type, either company or project? If you predict that a large amount of data will be stored in the table in the future, then I too would prefer 2 tables, which avoids unnecessary filtering of large amounts of data, which is a performance overhead.