MySQL Schema Advice: Unpredictable Field Additions


A little overview of the problem.
Let's say I have a table named TableA with fixed properties PropertyA, PropertyB and PropertyC. This has been enough for my own website's needs, but then I suddenly have clients that want custom fields on the site.
ClientA wants to add PropertyD and PropertyE.
ClientB wants to add PropertyF and PropertyG.
The catch is that these clients don't want each other's fields. Now imagine I get more clients: just adding nullable fields to TableA will become cumbersome and I will end up with a mess of a table. Or at least I assume that's the case; feel free to correct me. Is it better if I just do that?
I thought of two solutions. I'm asking whether there's a better way to do it, since I'm not confident about the trade-offs and their future performance.
Proposed Solution #1
data_id is not exactly a foreign key, but it stores the id of whatever client property row is attached to a TableA row. client_id is the only foreign key present on both the property table and TableA.
It feels like an anti-pattern of some sort. I imagine queries will be easy this way, but it requires that the developer knows which property table to pick from. I'm not sure whether having many tables is a bad thing.
Proposed Solution #2
I believe it's a bit more elegant, and I can easily add more fields as necessary. Not to mention these are the only tables I would need for everything else. Just to visualize, I will add the requested properties to the Properties table like so:
Properties
-------------
1 | PropertyD
2 | PropertyE
3 | PropertyF
4 | PropertyG
Whenever I save data, I would store every available property value like so. For this example I am saving a row for ClientA, which is stored in the Clients table with id 1.
Property_Mapping
--------------------------------------------------------
property_id | table_a_id | property_value | client_id
--------------------------------------------------------
1 | 1 | PROPERTY_D_VALUE | 1
2 | 1 | PROPERTY_E_VALUE | 1
The queries will obviously be more complex with this one, I'd imagine, but that's the trade-off. I put client_id on Property_Mapping just in case clients want the same fields. Any advice?

You've discovered the Entity-Attribute-Value antipattern. It's a terrible idea for a relational database. It makes your queries far more complex, and it takes 4-10x the storage space.
I covered some pros and cons of several alternatives in an old answer on Stack Overflow:
How to design a product table for many kinds of product where each product has many parameters
And in a presentation:
Extensible Data Modeling with MySQL
As an example of the trouble EAV causes, consider how you would respond if one of your clients says that PropertyD must be mandatory (i.e. the equivalent of NOT NULL) and PropertyE must be UNIQUE. Meanwhile, the other client says that PropertyG should be restricted to a finite set of values, so you should either use an ENUM data type, or use a foreign key to a table of allowed values.
But you can't implement any of these constraints using your Property_Mapping table, because all the values of all the properties are stored in the same column.
You lose features of relational databases when you use this antipattern, such as data types and constraints.
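To make the constraint problem concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the table table_a_client_a is a hypothetical per-client extension table that is not part of the question. It shows the shared value column accepting a duplicate PropertyE, while ordinary columns reject both a duplicate and a missing mandatory value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- EAV layout from the question: every value shares one column.
    CREATE TABLE property_mapping (
        property_id    INTEGER NOT NULL,
        table_a_id     INTEGER NOT NULL,
        property_value TEXT,
        client_id      INTEGER NOT NULL
    );
    -- Hypothetical alternative (an assumption): one extension table per client,
    -- so each custom field is a real column with its own constraints.
    CREATE TABLE table_a_client_a (
        table_a_id INTEGER PRIMARY KEY,
        property_d TEXT NOT NULL,   -- PropertyD is mandatory
        property_e TEXT UNIQUE      -- PropertyE must be unique
    );
""")

# EAV: two rows carry the same PropertyE value (property_id 2) and nothing objects.
conn.execute("INSERT INTO property_mapping VALUES (2, 1, 'same', 1)")
conn.execute("INSERT INTO property_mapping VALUES (2, 2, 'same', 1)")
eav_duplicates = conn.execute(
    "SELECT COUNT(*) FROM property_mapping "
    "WHERE property_id = 2 AND property_value = 'same'"
).fetchone()[0]

# Dedicated columns: the database itself rejects the bad rows.
conn.execute("INSERT INTO table_a_client_a VALUES (1, 'd1', 'same')")
rejected = []
for row in [(2, "d2", "same"),     # duplicate PropertyE
            (3, None, "other")]:   # missing PropertyD
    try:
        conn.execute("INSERT INTO table_a_client_a VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

print(eav_duplicates, rejected)
```

The exact error messages differ between SQLite and MySQL, but the behaviour of NOT NULL and UNIQUE shown here is the same idea in both.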

Related

Efficient MySQL structure for linking features to accommodation listing

I'm building an accommodation rental site for a specific town.
It will include houses, resorts, hotels, etc.
I'm looking for advice on how best to link Property Features (Air-Con, Swimming Pool etc.) to individual properties.
I have a table of around 50 Property Features set up as feature_id, feature_category, feature_name.
What would be the best way to store which features relate to which property?
Would a column in the property table (prop_features) containing an array of feature_id be the best way?
The only example I've managed to find and be able to dissect the DB showed the features added as feature_1, feature_2 etc. which seemed really inefficient as some properties may only have feature_1 and feature_49 for example.
Each one was added as a column to the property_table.
I'm new to creating databases from scratch, so I'd be very grateful for any advice on how best to start with this section of my project.
(It's also why I'm not having much luck Googling it, as I'm not sure how to put it in more general terms that might yield me a solution).

One solution would be to have an intermediate table that joins properties to features like so:
CREATE TABLE propertyfeatures (property_id INT, feature_id INT);
If we have a property called Acme Hotel (property id 1) that has air conditioning (feature id 2) and swimming pool (feature id 4), the data would look something like:
property_id | feature_id
1 | 2
1 | 4
To retrieve features per property (excluding properties without features) a simple query would be:
-- Returns one row per (property, feature) pair
SELECT
    p.property_name,
    f.feature_name,
    f.feature_category
FROM property AS p
INNER JOIN propertyfeatures AS pf
    ON p.property_id = pf.property_id
INNER JOIN features AS f
    ON pf.feature_id = f.feature_id
ORDER BY p.property_id, f.feature_id;
Note: I have made assumptions about table and column names in your existing database. You'd have to adjust the above accordingly.
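If you want to try the junction table end to end, here is a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names follow the assumptions already made above, and the feature categories ('Comfort', 'Leisure') are made up for illustration.

```python
import sqlite3

def features_by_property(conn):
    """Return {property_name: [feature_name, ...]} via the junction table."""
    rows = conn.execute("""
        SELECT p.property_name, f.feature_name
        FROM property AS p
        INNER JOIN propertyfeatures AS pf ON p.property_id = pf.property_id
        INNER JOIN features AS f ON pf.feature_id = f.feature_id
        ORDER BY p.property_id, f.feature_id
    """).fetchall()
    result = {}
    for property_name, feature_name in rows:
        result.setdefault(property_name, []).append(feature_name)
    return result

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (
        property_id   INTEGER PRIMARY KEY,
        property_name TEXT NOT NULL
    );
    CREATE TABLE features (
        feature_id       INTEGER PRIMARY KEY,
        feature_category TEXT,
        feature_name     TEXT NOT NULL
    );
    -- The composite primary key stops the same feature being linked twice.
    CREATE TABLE propertyfeatures (
        property_id INTEGER NOT NULL REFERENCES property(property_id),
        feature_id  INTEGER NOT NULL REFERENCES features(feature_id),
        PRIMARY KEY (property_id, feature_id)
    );
    INSERT INTO property VALUES (1, 'Acme Hotel');
    INSERT INTO features VALUES (2, 'Comfort', 'Air-Con'),
                                (4, 'Leisure', 'Swimming Pool');
    INSERT INTO propertyfeatures VALUES (1, 2), (1, 4);
""")

print(features_by_property(conn))
```

Linking another feature to a property is then just one more row in propertyfeatures, with no change to the table structure.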
You wrote: "The only example I've managed to find and be able to dissect the DB showed the features added as feature_1, feature_2 etc. which seemed really inefficient as some properties may only have feature_1 and feature_49 for example. Each one was added as a column to the property_table."
Although this can be done, you're correct that it's inefficient, or rather, awkward to maintain. It's referred to as pivoting because you're turning unique row values into multiple columns. For example, what if a new feature (e.g. Free Wifi) was added? You couldn't simply insert a new row of data as you would with the intermediate table; you'd have to create a new column to support it.
Not only that, but you would still have to define the feature columns manually or dynamically. For reference, take a look at MySQL Pivot Table which demonstrates both manual and dynamic methods.

The keyword for this approach is "junction table", which is a basic pattern in database design. One simple way would be to add another table to your database with these columns:
property_identifier | feature_identifier (feature_id in your case)
This table records the connections between properties and specific features.
So you could say property with property_id 1 has a pool (feature_id: 2) and a nice kitchen (feature_id: 23)
So the table would look like this:
property_id | feature_id
1 | 2
1 | 23

MySQL Database Design with tags across multiple tables

I am working on some web apps which should all use the same user table. The different applications all need different table designs, so I created one table for each app, with the UserID being a foreign key referring to the user table.
Now I want to add tags to all apps. The tags for every app should live in one table, so that I can easily query all tags from one user (for searching purposes, the search should be able to find everything tagged with that tag, no matter the app). Personally, I don't think splitting them up into multiple tables would be a good idea, but I am not that into database design so I might be wrong. My current attempt looks something like this:
[tags]
EntryID | UserID | Tag
The thing is that the EntryIDs of course would have to be unique across all app tables with this solution. For the notes app I need something like this:
[notes]
EntryID | UserID | title | content | etc.
For my calendar I have the following table:
[calendar]
EntryID | UserID | name | start | end | etc.
Now I don't know how to manage those EntryIDs. Should I create another table like this
[entries]
EntryID | UserID | type
with type being something like "note" or "calendar", and EntryID being the primary key? And should the type be something like an integer, or a string, or is there a possibility to kind of refer to another table in the type column? And should I then make the EntryIDs in the app tables into foreign keys referring to the entries table?
I put the userID in every table because I think this is going to speed up querying, for example when I need every tag one user has set across all apps. I know normalization usually prohibits this, but I again think that it would very much increase query speed and reduce load for both the MySQL server and my back-end.
I would appreciate every tip for structuring this, and thanks in advance!

You can use inheritance, similar to this:
I'm not sure what the role of the user is supposed to be here, exactly. In the model above, user "owns" an entry and (presumably) tags it. If you want multiple users to (be able to) tag the same entry, USER would need to be connected to the junctions table TAG_ENTITY.
For more on how to physically implement inheritance, see here.
You may also be interested in this and this.
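As a rough illustration of the inheritance idea, here is one possible physical layout, sketched with Python's built-in sqlite3 module as a stand-in for MySQL. Everything here is an assumption based on the question's tables rather than a reproduction of the model above: a parent entries table hands out the ids, notes and calendar share that id as their primary key, and the tags table is simplified to (entry_id, tag). The columns starts_at and ends_at are renamed from start and end to avoid the reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Parent ("supertype") table: one row per entry of any kind.
    CREATE TABLE entries (
        entry_id INTEGER PRIMARY KEY,
        user_id  INTEGER NOT NULL,
        type     TEXT NOT NULL CHECK (type IN ('note', 'calendar'))
    );
    -- Child ("subtype") tables reuse the parent's id as their own key.
    CREATE TABLE notes (
        entry_id INTEGER PRIMARY KEY REFERENCES entries(entry_id),
        title    TEXT,
        content  TEXT
    );
    CREATE TABLE calendar (
        entry_id  INTEGER PRIMARY KEY REFERENCES entries(entry_id),
        name      TEXT,
        starts_at TEXT,
        ends_at   TEXT
    );
    -- Tags point at the parent, so they work for every app.
    CREATE TABLE tags (
        entry_id INTEGER NOT NULL REFERENCES entries(entry_id),
        tag      TEXT NOT NULL,
        PRIMARY KEY (entry_id, tag)
    );
""")

def add_entry(conn, user_id, entry_type):
    """Create the parent row and return the id the child row must reuse."""
    cur = conn.execute(
        "INSERT INTO entries (user_id, type) VALUES (?, ?)", (user_id, entry_type)
    )
    return cur.lastrowid

def entries_tagged(conn, user_id, tag):
    """Find every entry of one user with a given tag, regardless of app."""
    return conn.execute("""
        SELECT e.entry_id, e.type
        FROM entries AS e
        JOIN tags AS t ON t.entry_id = e.entry_id
        WHERE e.user_id = ? AND t.tag = ?
        ORDER BY e.entry_id
    """, (user_id, tag)).fetchall()

# Illustrative data only.
note_id = add_entry(conn, 1, "note")
conn.execute("INSERT INTO notes VALUES (?, ?, ?)", (note_id, "Groceries", "milk"))
event_id = add_entry(conn, 1, "calendar")
conn.execute("INSERT INTO calendar VALUES (?, ?, ?, ?)",
             (event_id, "Dentist", "2024-01-05 09:00", "2024-01-05 10:00"))
conn.executemany("INSERT INTO tags VALUES (?, ?)",
                 [(note_id, "todo"), (event_id, "todo")])

print(entries_tagged(conn, 1, "todo"))
```

In this sketch user_id lives only on entries, which matches a user "owning" an entry. Letting several users tag the same entry would need a user_id on the tags table as well.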

If a database table is expected to grow more columns later, then is it good design to keep those columns as rows of another table?

I have a database for a device and the columns are like this:
DeviceID | DeviceParameter1 | DeviceParameter2
At this stage I need only these parameters, but maybe a few months down the line, I may need a few more devices which have more parameters, so I'll have to add DeviceParameter3 etc as columns.
A friend suggested that I keep the parameters as rows in another table (ParamCol) like this:
Column | ColumnNumber
---------------------------------
DeviceParameter1 | 1
DeviceParameter2 | 2
DeviceParameter3 | 3
and then refer to the columns like this:
DeviceID | ColumnNumber <- this is from the ParamCol table
---------------------------------------------------
switchA | 1
switchA | 2
routerB | 1
routerB | 2
routerC | 3
He says that for 3NF, when we expect a table whose columns may increase dynamically, it's better to keep the columns as rows. I don't believe him.
In your opinion, is this really the best way to handle a situation where the columns may increase or is there a better way to design a database for such a situation?

This is a "generic data model" question - if you google the term you'll find quite a bit of material on the net.
Here is my view: if and only if the parameters are NOT qualitatively different from the application perspective, then go with the dynamic row solution (i.e. a generic data model). What does qualitatively mean - it means that within your application you don't treat Parameter3 any different to Parameter17.
You should never ever generate new columns on-the-fly, that's a very bad idea. If the columns are qualitatively different and you want to be able to cater for new ones, then you could have a different Device Parameter table for each different category of parameters. The idea is to avoid dynamic SQL as much as possible as it brings a set of its own problems.
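For the case where the parameters really are interchangeable, the row-based layout might look something like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for whatever database you run. The value column and the sample values are assumptions added for illustration, since the tables in the question only link devices to parameter numbers, and INSERT OR IGNORE / INSERT OR REPLACE are SQLite-specific spellings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE param_col (
        column_number INTEGER PRIMARY KEY,
        column_name   TEXT NOT NULL UNIQUE
    );
    -- One row per (device, parameter); 'value' is an assumed column.
    CREATE TABLE device_param (
        device_id     TEXT NOT NULL,
        column_number INTEGER NOT NULL REFERENCES param_col(column_number),
        value         TEXT,
        PRIMARY KEY (device_id, column_number)
    );
    INSERT INTO param_col VALUES (1, 'DeviceParameter1'), (2, 'DeviceParameter2');
""")

def set_param(conn, device_id, name, value):
    """Store one parameter value, registering the parameter name if it is new."""
    conn.execute("INSERT OR IGNORE INTO param_col (column_name) VALUES (?)", (name,))
    (number,) = conn.execute(
        "SELECT column_number FROM param_col WHERE column_name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR REPLACE INTO device_param VALUES (?, ?, ?)",
        (device_id, number, value),
    )

def get_params(conn, device_id):
    """Return {parameter_name: value} for one device."""
    rows = conn.execute("""
        SELECT p.column_name, d.value
        FROM device_param AS d
        JOIN param_col AS p ON p.column_number = d.column_number
        WHERE d.device_id = ?
        ORDER BY p.column_number
    """, (device_id,)).fetchall()
    return dict(rows)

# A new parameter arrives later: no ALTER TABLE, just rows.
set_param(conn, "switchA", "DeviceParameter1", "on")
set_param(conn, "routerC", "DeviceParameter3", "24")

print(get_params(conn, "routerC"))
```

Note that every value ends up as text in one column, so this only fits when the application does not need per-parameter types or constraints.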

Adding columns dynamically is a bad idea; actually, it's bad design. I would agree with your second option: adding rows is OK.
If you grow the columns dynamically, you have to provide a default value for each new column, you will not be able to declare them UNIQUE, and you will find it really hard to update the tables. So it's better to stick with the plan of adding rows.

Access Custom Primary Keys

I'd like to create custom primary keys in my Access database.
The database is going to be multi-user, so I need a method that ensures each key is unique even when multiple users are trying to add new records to the same tables.
The reason I need to create custom primary keys is because my database starts off an audit trail that goes in to another, external system that I have no control over.
This other system does however allow the use of a single 12-character length user-defined field for us to pass data of our choice through.
I'd like to use that user-defined field to record a 12-character code that has various abbreviations I can extrapolate later (e.g. first 2 characters relate to a department in our organisation, next 3 characters relate to a product and so on...)
From the reading I've done so far, custom keys in Access seems to be something of a minefield.
For my purposes though, I can kind of see at least a compromise in combining Access' autonumber field to essentially help build the primary key I want.
Here's what I was thinking:
The parts of the code that I would want to extrapolate later can be built by our users, so for example, if the Department was Human Resources, the first 2 characters could always be "HR".
Then let's say I let the AutoNumber in Access run for a field in the same table in which my "HR" entry was populated... could I get a third field to automatically concatenate the two in the same table (not a query)? i.e. like this:
| Department | AutoNumber | CustomPrimaryKey |
| HR | 1 | HR1 |
If that's something that can be done on some event in VBA, then that would be great (show me the code! :))
The second part would be whether I can get the autonumber to concatenate with leading zeros ensuring the "unique number" part of the custom primary key was between 99999 and 00001, i.e. occupying the same 5 character space like this:
| Department | AutoNumber | CustomPrimaryKey |
| HR | 1 | HR00001 |
| HR | 2 | HR00002 |
It is highly unlikely that I would need more than 100,000 entries.
I hope this is possible and safe!

I'd rather leave this as a comment than an answer as I don't think you're totally clear on what you need, but I'll try to answer as best as possible. Also, I'm not going to "Show you the code!" as you suggest as it teaches nothing.
In the first question of automatically concatenating the third field, it's really a question of how the fields are being populated.
If it's through form input, then you can concatenate all of the component fields into the key field during the update events of the controls that populate those component fields. In VBA you can easily reference members of the record by accessing the form's recordset.
If you're populating the field through a file import where you already have import specs, then you would perform the import excluding your key field, then open the recordset of the table where you imported and iterate through the recordset. You can learn about ADO recordsets here. Again, I'm not just going to write the code because I don't really know what you need this for.
If you're populating the field through your own parser, then I probably don't have to explain how to do this.
To your second question, you can easily zero-pad a number in a string using the format() function. For example, format(2,"00000") would yield "00002" and format(210,"0000") would yield "0210". You can also make the number of zeros variable using the string() function. For example, format(2054,string(12-len("HR"),"0")) would give you "0000002054".
One additional note that I would leave you with is that it's never a good idea to say something like "It is highly unlikely that I would..." and not prepare for it. Murphy's Law is a pain in the B. You should consider handling conditions where you exceed the limit that your key can handle.
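Purely to illustrate the padding and the overflow check, here is the same logic as a small Python sketch. It is not VBA and not Access code; build_key is a hypothetical helper name, and the width of 5 comes from the HR00001 example in the question.

```python
def build_key(department: str, number: int, width: int = 5) -> str:
    """Concatenate a department code with a zero-padded sequence number.

    Raises ValueError when the number no longer fits in `width` digits,
    instead of silently producing a longer (or truncated) key.
    """
    if not 1 <= number < 10 ** width:
        raise ValueError(f"{number} does not fit in {width} digits")
    return f"{department}{number:0{width}d}"

print(build_key("HR", 1))
print(build_key("HR", 2054, width=12 - len("HR")))
```

With a width of 5 the check fires at 100000, which is the situation the note above warns about, so the caller has to decide up front what happens at that point.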

Which way is better to implement custom fields in a web application

I have a self-made web application in PHP and MySQL. The many different clients using my system would like to augment entities with custom fields. Each client would like to store their own additional data, so this should be done in a dynamic way.
For example client1 would like to add the "color" property to their products, client2 want a field called "safety_level" for their products.
I want a methodology that can be applied not only for products but for users and for any other entities as well.
Here are the 2 options I found most promising, but I can't decide which one is more effective:
OPTION 1:
For every entity I make a [entityname]_customfields table in which I store the additional field values in 1:1.
e.g.:
+---------------------------------------------+
|products_custom_fields |
+---------------------------------------------+
|product_id (PK and FK related to products.id)|
|safety_level |
|some_other_fields |
+---------------------------------------------+
pro: this table can have no more records than the entity table (in this case, products), which means fewer records, and it is quite easy to get an overview of.
con: adding new fields or deleting old ones requires DDL queries. I don't want to entrust DDL to users... not even operators with admin permissions.
OPTION 2:
[entity]_custom_field_values will have an N:1 relation to the [entity] table. Each row contains the type of the custom field and the value itself. In this case we need another table which contains the custom field types. e.g.:
custom field values:
+----------------------------------------------------------------------+
|products_custom_field_values |
+----------------------------------------------------------------------+
|custom_field_id |
|custom_field_type (FK product_custom_field_types.custom_field_type_id)|
|value |
+----------------------------------------------------------------------+
custom field types:
+---------------------------------------------------------+
|products_custom_field_types |
+---------------------------------------------------------+
|custom_field_type_id (PK AUTO_INCREMENT) |
|product_id (FK related to products.id) |
+---------------------------------------------------------+
pro: managing fields is easy, does not require to alter table structures
con: more records, with all kinds of custom field values in one big mess... which is not necessarily wrong, because that's the point of MySQL: to extract useful data from a big mess. The question is, what about efficiency and performance?

Note: this topic is actually covered in the book "SQL Antipatterns", which I strongly recommend you read.
I am a lazy person, which means that I tend to apply YAGNI to my code. So this is my approach.
So. Let's assume that there are two groups of products:
ProductFoo ProductBar
- productID - productID
- name - name
- price - price
- color - supply
- weight - manufacturerID
- safety
In this case there are three common elements that go in the main Products table. The custom parameters would be stored using table inheritance (it's a thing, google it). So, basically, you would end up with three tables: Products, ProductsFoo and ProductsBar, where the Products table has a "type" field and both of the "child tables" have a productID foreign key that points to the parent table.
That's if you know at the development time, what "custom fields" each client will want.
Now, let's assume clients are being difficult and want to make up custom fields whenever they feel like it.
In this case I would simply create a Products.data field, which contains JSON with all the custom attributes for each product, and only "extract" special attributes to an inheriting table when a client wants to search by that custom attribute (there is no sane way to index JSON if clients want to search by their new "wallywanker" attribute).
You end up with the same basic structure, but the "sub-tables" only contain the attributes that are expected to be searchable.
I hope this made sense.
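Here is a rough sketch of that last arrangement, using Python's built-in sqlite3 and json modules as a stand-in for PHP and MySQL. The names products_searchable, add_product and find_by_color are hypothetical, and "color" is simply assumed to be the one attribute a client has asked to search by. Everything else stays in the JSON blob.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        price      REAL NOT NULL,
        data       TEXT NOT NULL DEFAULT '{}'   -- JSON with unsearched custom fields
    );
    -- "Sub-table" holding only the attributes that must be searchable.
    CREATE TABLE products_searchable (
        product_id INTEGER PRIMARY KEY REFERENCES products(product_id),
        color      TEXT
    );
    CREATE INDEX idx_products_searchable_color ON products_searchable (color);
""")

def add_product(conn, name, price, custom):
    """Insert a product, moving the searchable attribute out of the JSON."""
    custom = dict(custom)                # do not mutate the caller's dict
    color = custom.pop("color", None)    # promoted to a real, indexed column
    cur = conn.execute(
        "INSERT INTO products (name, price, data) VALUES (?, ?, ?)",
        (name, price, json.dumps(custom)),
    )
    if color is not None:
        conn.execute(
            "INSERT INTO products_searchable (product_id, color) VALUES (?, ?)",
            (cur.lastrowid, color),
        )
    return cur.lastrowid

def find_by_color(conn, color):
    """Search on the indexed column, then decode the remaining custom fields."""
    rows = conn.execute("""
        SELECT p.name, p.data
        FROM products AS p
        JOIN products_searchable AS s ON s.product_id = p.product_id
        WHERE s.color = ?
        ORDER BY p.product_id
    """, (color,)).fetchall()
    return [(name, json.loads(data)) for name, data in rows]

# Illustrative data only.
add_product(conn, "Helmet", 30.0, {"color": "red", "safety_level": 3})
add_product(conn, "Rope", 12.5, {"safety_level": 1})

print(find_by_color(conn, "red"))
```

Adding a new unsearched field needs no schema change at all, and promoting a field to the sub-table is a one-off migration done by a developer rather than by a client.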

If it is a company project, follow the standards used on previous projects.
Have a look at conventions such as Hungarian notation; that would make more sense than repeating a prefix. Also, it is more likely that your model name is your table name.
Also if you are planning to use ORM they might have some best practices as well.
https://en.wikipedia.org/wiki/Hungarian_notation
