Many-to-many relationships with a large number of different tables - MySQL

I am having trouble developing a piece of my database schema. Currently, my app has a table of users and another table of events. I can easily set up a many-to-many relationship (using a third table) to hold information about which users are attending which events.
My problem is that events is just one feature of my app. The goal is to have a large number of different programs a user can take part in, and each will need its own table. Yet I still need to be able to call up a list of everything the user is signed up for.
Right now, I am thinking about just making one-way relationships from each event table back to the user. I would then need to create a custom function (in my website's ORM) that queries each table independently and assembles a full list. I feel like this would be slow, so I am also entertaining the idea of creating a separate table that just lists all the programs users sign up for, storing in it the info my app needs to function. This would repeat info in my database and in general doesn't sound as "clean", but it would probably be faster.
Any suggestions as to the best way to handle relationships like this?
P.S. If it matters, I'm using Doctrine2 & Symfony2 to power my site.

In one of my web applications, I have used this kind of construct for storing comments for any table that has an integer primary key:
CREATE TABLE Comments (
    TableName VARCHAR(24) NOT NULL,
    RowID BIGINT NOT NULL,
    Comments VARCHAR(2000) NOT NULL,
    PRIMARY KEY (TableName, RowID, Comments)
);
In my case (DB2, less than 10 million rows in Comments table) it performs well.
So, applying it to your case:
CREATE TABLE Registration (
    TableName VARCHAR(24) NOT NULL,
    RowID BIGINT NOT NULL,
    User <datatype> NOT NULL,
    Signup TIMESTAMP NOT NULL,
    PRIMARY KEY (TableName, RowID, User)
);
So, the 'TableName' column identifies the table containing the program (say, the 'Events' table), and 'RowID' is the primary key value in that table (e.g. the PK of an entry in the 'Events' table). To perform well, this requires the primary key to be of the same datatype in all target tables.
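For example, here is a minimal sketch of listing everything a given user has signed up for; the 'Events' table with a BIGINT primary key 'id', the 'title' column, and the integer user id 42 are all assumptions for illustration. You would add a similar LEFT JOIN (or a separate query) per program table:

SELECT r.TableName, r.RowID, r.Signup, e.title
FROM Registration r
LEFT JOIN Events e
    ON r.TableName = 'Events'
   AND r.RowID = e.id
WHERE r.User = 42;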
NoSQL solutions are cool, but the pattern above works in a plain old relational database.

What is unique about these event types that requires them to have their own table?
If the objects are so inherently different, make the object as simple as possible, with only those things common to all Events:
public class Event
{
public Guid Id;
public string Title;
public DateTime Date;
public string Type;
public string TypeSpecificData; // serialized JSON/XML
}
// Not derived from Event, but built from it.
public class SpecialEventType
{
public Guid Id;
// ... and the other common props from Event
// some kind of special prop parsed from the Event's serialized data
public string SpecialField;
}
The "type specific data" could then be used to store details about events that are not in common (that would normally require columns or new tables)... do it something like serialized XML or JSON
Map the Event table many-to-many (MTM) to your Users table, and query by the basic event properties and type.
Your code is then responsible for parsing the data using its Type property and some predefined XML schema you associate with it.
Very simple: it keeps your database nice, clean, and fast, and minimizes round trips. The tradeoff is that you can't query the DB for the specifics of a certain Event type... but for large, scaling applications with mature ORM layers, the performance tradeoff alone is worth it.
For example, now you query your data once for Events of a particular Type, build your pseudo-derived types from it, and then "query" them using LINQ.
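For illustration, here is a rough MySQL sketch of the single-table design described above; the table and column names (and the user id 123) are made up, not from the question:

CREATE TABLE events (
    id CHAR(36) PRIMARY KEY,           -- GUID
    title VARCHAR(255) NOT NULL,
    event_date DATETIME NOT NULL,
    type VARCHAR(50) NOT NULL,
    type_specific_data TEXT            -- serialized JSON/XML, parsed by the application
);

CREATE TABLE user_events (             -- the many-to-many mapping to Users
    user_id INT NOT NULL,
    event_id CHAR(36) NOT NULL,
    PRIMARY KEY (user_id, event_id)
);

-- Fetch all events of one type for a user; the application then builds the
-- pseudo-derived objects from type_specific_data.
SELECT e.*
FROM events e
JOIN user_events ue ON ue.event_id = e.id
WHERE ue.user_id = 123
  AND e.type = 'SpecialEventType';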

Unless you have a ridiculous number of event types, querying the events a user is signed up for from a few tables should not be much slower than querying the same thing from one long table of all the events.
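As a rough sketch (every table and column name here is invented), querying a handful of per-program tables directly is just a UNION:

SELECT 'event' AS program, e.id, e.title
FROM events e
JOIN event_signups es ON es.event_id = e.id
WHERE es.user_id = 42
UNION ALL
SELECT 'course', c.id, c.title
FROM courses c
JOIN course_signups cs ON cs.course_id = c.id
WHERE cs.user_id = 42
UNION ALL
SELECT 'workshop', w.id, w.title
FROM workshops w
JOIN workshop_signups ws ON ws.workshop_id = w.id
WHERE ws.user_id = 42;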
I would take this approach: each table or collection has a user_id field which maps back to the Users table. You don't really need to create a separate function in the ORM; if each of the event types inherits from an Event class, you can just find all events by user_id.

Related

Database Structure for Inconsistent Data

I am creating a database for my company that will store many different types of information. The categories are Brightness, Contrast, Chromaticity, etc. Each category has a number of data points which my company would like to start storing.
Normally, I would create a table for each category to store the corresponding data (this is how I learned to do it). However, sometimes these categories have "sub-data", which changes the number of fields required in each table.
My question is then how do people handle the inconsistency of data when structuring their databases? Do they just keep adding more tables for extra data or is it something else altogether?
There are a few (and thank goodness only a few) unbendable rules about relational database models. One of them is that if you don't know what to store, you'll have a hard time storing it. Chances are, you'll have an even harder time retrieving it.
That said, the reality of business rules is often less clear cut than the ivory tower of database design. Most importantly, you might want or even need a way to introduce a new property without changing the schema.
Here are two feasible ways to go about this:
Use a datastore that specializes in loose or nonexistent schemas (NoSQL and friends). Explaining this in detail is the subject of a CS thesis, not a Stack Overflow answer.
My recommendation: use a separate properties table. Here is how this goes:
Assuming, for the sake of argument, that your products always have a (unique string) name, an (integer) id, brightness, contrast and chromaticity, plus sometimes an (integer) foo and a (string) bar, consider these tables:
CREATE TABLE products (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    brightness INT,
    contrast INT,
    chromaticity INT,
    UNIQUE INDEX(name)
);

CREATE TABLE properties (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(50) NOT NULL,
    proptype ENUM('null','int','string') NOT NULL DEFAULT 'null',
    UNIQUE INDEX(name)
);

INSERT INTO properties (name, proptype) VALUES
    ('foo', 'int'),
    ('bar', 'string');

CREATE TABLE product_properties (
    id INT PRIMARY KEY AUTO_INCREMENT,
    product_id INT NOT NULL,
    property_id INT NOT NULL,
    intvalue INT,                     -- set when the property is an int
    stringvalue VARCHAR(250),         -- set when the property is a string
    UNIQUE INDEX(product_id, property_id)
);
now your "standard" properties would be in the products table as usual, while the "optional" properties would be stored in a row of product_properties, that references the product id and property id, with the value being in intvalue or stringvalue.
Selecting products including their foo if any would look like
SELECT
    products.*,
    product_properties.intvalue AS foo
FROM products
LEFT JOIN product_properties
    ON products.id = product_properties.product_id
    AND product_properties.property_id = 1;
or, looking foo up by name through the properties table:
SELECT
    products.*,
    product_properties.intvalue AS foo
FROM products
-- the LEFT JOINs keep products that have no 'foo' row (foo comes back NULL)
LEFT JOIN properties
    ON properties.name = 'foo'
LEFT JOIN product_properties
    ON product_properties.product_id = products.id
    AND product_properties.property_id = properties.id;
Please understand that this incurs a performance penalty; in fact you trade performance for flexibility: adding another property is nothing more than INSERTing a row into properties, and the schema stays the same.
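For example, against the tables above (the 'weight' property and product id 7 are made up), adding a brand-new optional property later needs no ALTER TABLE, only data:

INSERT INTO properties (name, proptype) VALUES ('weight', 'int');

INSERT INTO product_properties (product_id, property_id, intvalue)
SELECT 7, id, 1250
FROM properties
WHERE name = 'weight';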
If you're not MySQL-bound, other databases have table inheritance or arrays to solve some of these niche cases. PostgreSQL is a very nice database that you can use as easily and freely as MySQL.
With MySQL you could:
Change your tables: add the extra columns and allow NULL in the subcategory data you don't need. This way integrity can still be checked, since you can still put constraints on the columns. Unless you really have a lot of subcategory columns I'd recommend this; otherwise, option 3.
Store subcategory data dynamically in a separate table that has a category_id, a category_row_id, a subcategory identifier (the type of subcategory) and a value column. That way you can retrieve your data by linking via the category_id (which determines the table) and the category_row_id (which links to the PK of the original category table row). The bad part: you can't use foreign keys or constraints properly to enforce integrity, so you'd need to write hairy insert/update triggers to keep some control, which pushes the burden of integrity and referential checking onto the client (in which case you'd probably be better off going the NoSQL route). In short, I wouldn't recommend this.
You can make a separate subcategory table per category table. Columns can be fixed, or variable via value column(s) plus an optional subcategory identifier; foreign keys can still be used, so this is the easiest way to maintain integrity, since you'll have the full range of constraints at your disposal. If you have a lot of subcategory columns that would otherwise clutter your regular category table, I'd recommend this with fixed columns (see the sketch below). Like the previous option, I'd never recommend going dynamic for anything but throwaway data.
Alternatively, if your subcategory data is very variable and volatile: use NoSQL with a document database such as MongoDB. Mind you, you can keep all your regular data in a proper RDBMS and just store the side-data in the document database, though that's probably not recommended.
If your subcategory data is in a known, fixed state and not prone to change, I'd just add the extra columns to the specific category table. Keep in mind that the major feature of a proper DBMS is safeguarding the integrity of your data via checks and constraints; doing away with that is never really a good idea.
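Here is a minimal sketch of option 3, assuming a hypothetical Brightness category table with a fixed-column sub-data table; all names are invented for illustration:

CREATE TABLE brightness (
    id INT PRIMARY KEY AUTO_INCREMENT,
    measured_at DATETIME NOT NULL,
    value INT NOT NULL
);

CREATE TABLE brightness_sub (
    id INT PRIMARY KEY AUTO_INCREMENT,
    brightness_id INT NOT NULL,
    peak INT,                          -- fixed subcategory columns
    floor INT,
    FOREIGN KEY (brightness_id) REFERENCES brightness (id)
);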
If you are not limited to MySQL, you can consider Microsoft SQL Server and its sparse columns. They allow you to expand your schema to include however many columns you want without incurring the storage penalty for columns that are not pertinent to a given row.
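A hedged SQL Server sketch (the table and column names are invented): sparse columns take no storage when NULL, so rarely-used sub-data columns stay cheap:

CREATE TABLE Measurements (
    Id INT IDENTITY PRIMARY KEY,
    Brightness INT NULL,
    Contrast INT NULL,
    SubBrightnessPeak INT SPARSE NULL,   -- only some rows carry these
    SubBrightnessFloor INT SPARSE NULL
);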

Normalizing MySQL data

I'm new to MySQL, and just learned about the importance of data normalization. My database has a simple structure:
I have 1 table called users with fields:
userName (string)
userEmail (string)
password (string)
requests (an array of dictionaries in JSON string format)
data (another array of dictionaries in JSON string format)
deviceID (string)
Right now, this is my structure. Being very new to MySQL, I'm really not seeing why my above structure is a bad idea. Why would I need to normalize this and make separate tables? That's the first question: why? (Some have also said not to put JSON in my table. Why or why not?)
The second question is how? With the above structure, how many tables should I have, and what would be in each table?
Edit:
So maybe normalization is not absolutely necessary here, but maybe there's a better way to implement my data field? The data field is an array of dictionaries: each dictionary is just a note item with a few keys (title, author, date, body). What I do now, which I think might be inefficient, is this: every time a user composes a new note, I send that note from my app to PHP to handle. I get the JSON array of dictionaries already stored for that user, convert it to a PHP array, append the new note, convert the whole thing back to JSON, and put it back in the table. This process is repeated every time a new note is composed. Is there a better way to do this? Maybe a user's data should be a table, with each row being a note, but I'm not really sure how that would work.
The answer to all your questions really depends on what the JSON data is for, and whether you'll ever need to use some property of that data to determine which rows are returned.
If your data truly has no schema, and you're really just using it to store data that will be used by an application that knows how to retrieve the correct row by some other criteria (such as one of the other fields) every time, there's no reason to store it as anything other than exactly as that application expects it (in this case, JSON).
If the JSON data DOES contain some structure that is the same for all entries, and if it's useful to query this data directly from the database, you would want to create one or more tables (or maybe just some more fields) to hold this data.
As a practical example of this, if the data field contains JSON enumerating services for that user in an array, and each service has a unique id, type, and price, you might want a separate table with the following fields (using your own naming conventions):
serviceId (integer)
userName (string)
serviceType (string)
servicePrice (float)
Each service for that user would get its own entry. You could then query for users that have a particular service, which, depending on your needs, could be very useful. In addition to easy querying, indexing certain fields of the separate tables can also make for very QUICK queries.
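A small sketch of that services table and the kind of query it enables (names follow the example above; the 'premium_support' value is made up):

CREATE TABLE services (
    serviceId INT AUTO_INCREMENT PRIMARY KEY,
    userName VARCHAR(50) NOT NULL,
    serviceType VARCHAR(50) NOT NULL,
    servicePrice FLOAT NOT NULL,
    INDEX (serviceType)
);

-- Users that have a particular service
SELECT DISTINCT userName
FROM services
WHERE serviceType = 'premium_support';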
Update: Based on your explanation of the data stored, and the way you use it, you probably do want it normalized. Something like the following:
# user table
userId (integer, auto-incrementing)
userName (string)
userEmail (string)
password (string)
deviceID (string)
# note table
noteId (integer, auto-incrementing)
userId (integer, matches user.userId)
noteTime (datetime)
noteData (string, possibly split into separate fields depending on content, such as subject, etc.)
# request table
requestId (integer, auto-incrementing)
userId (integer, matches user.userId)
requestTime (datetime)
requestData (string, again split as needed)
You could then query like so:
# Get a user
SELECT * FROM user WHERE userId = '123';
SELECT * FROM user WHERE userName = 'foo';
# Get all requests for a user
SELECT * FROM request WHERE userId = '123';
# Get a single request
SELECT * FROM request WHERE requestId = '325325';
# Get all notes for a user
SELECT * FROM note WHERE userId = '123';
# Get all notes from last week
SELECT * FROM note WHERE userId = '123' AND noteTime > CURDATE() - INTERVAL 1 WEEK;
# Add a note to user 123
INSERT INTO note (noteId, userId, noteData) VALUES (null, 123, 'This is a note');
Notice how much more you can do with normalized data, and how easy it is? It's trivial to locate, update, append, or delete any specific component.
Normalization is a philosophy. Some people think it fits their database approach, some don't. Many modern database solutions even focus on denormalization to improve speeds.
Normalization often doesn't improve speed. However, it greatly improves the simplicity of accessing and writing data. For example, if you wanted to add a request, you would have to rewrite the entire JSON field. If the data were normalized, you could simply add a row to a table.
In normalization, "array of dictionaries in JSON string format" is always bad. Array of dictionaries can be translated as list of rows, which is a table.
If you're new to databases: NORMALIZE. Denormalization is something for professionals.
A main benefit of normalization is to eliminate redundant data, but since each user's data is unique to that user, there is no benefit to splitting this table and normalizing. Furthermore, since the front-end will employ the dictionaries as JSON objects anyway, undue complication and a decrease in performance would result from trying to decompose this data.
Okay, here is a normalized MySQL data model. Note: you could separate authors and titles into two further tables to reduce data redundancy even more, and you can probably use similar techniques for the requests dictionaries:
CREATE TABLE USERS (
    UID int NOT NULL AUTO_INCREMENT PRIMARY KEY,
    userName varchar(255) UNIQUE,
    password varchar(30),
    userEmail varchar(255) UNIQUE,
    deviceID varchar(255)
) ENGINE=InnoDB;

CREATE TABLE BOOKS (
    BKID int NOT NULL AUTO_INCREMENT PRIMARY KEY,
    FKUSERS int,
    Title varchar(255),
    Author varchar(50)
) ENGINE=InnoDB;

ALTER TABLE BOOKS
    ADD FOREIGN KEY (FKUSERS)
    REFERENCES USERS(UID);

CREATE TABLE NOTES (
    ID int NOT NULL AUTO_INCREMENT PRIMARY KEY,
    FKUSERS int,
    FKBOOKS int,
    Date date,
    Notes text
) ENGINE=InnoDB;

ALTER TABLE NOTES
    ADD FOREIGN KEY BKNO (FKUSERS)
    REFERENCES USERS(UID);

ALTER TABLE NOTES
    ADD FOREIGN KEY (FKBOOKS)
    REFERENCES BOOKS(BKID);
In your case, I would abstract out the class that handles this table and keep the data un-normalized for now. If, in the future, the data access patterns change and I need to normalize the data, I can do so with less impact on the program: I just change the class that handles this set of data to query the normalized tables, but return the data as if the database structure never changed.

Table design and class hierarchies

Hopefully someone can shed some light on this issue through either an example, or perhaps some suggested reading. I'm wondering what is the best design approach for modeling tables after their class hierarchy equivalencies. This can best be described through an example:
abstract class Card{
private $_name = '';
private $_text = '';
}
class MtgCard extends Card{
private $_manaCost = '';
private $_power = 0;
private $_toughness = 0;
private $_loyalty = 0;
}
class PokemonCard extends Card{
private $_energyType = '';
private $_hp = 0;
private $_retreatCost = 0;
}
Now, when modeling tables to synchronize with this class hierarchy, I've gone with something very similar:
TABLE Card
id INT, AUTO_INCREMENT, PK
name VARCHAR(255)
text TEXT
TABLE MtgCard
id INT, AUTO_INCREMENT, PK
card_id INT, FK(card.id)
manacost VARCHAR(32)
power INT
toughness INT
loyalty INT
TABLE PokemonCard
id INT, AUTO_INCREMENT, PK
card_id INT, FK(card.id)
hp INT
energytype ENUM(...)
retreatcost INT
The problem I'm having is trying to figure out how to associate each Card record with the record containing its details from the corresponding table. Specifically, how do I determine which table I should be looking in?
Should I add a VARCHAR column to Card to hold the name of the associated table? That's the only resolution that my peers and I have come to, but it seems too "dirty". Keeping the design extensible is the key here, allowing for the easy addition of new subclasses.
If someone could provide an example or resources showing a clean way of mirroring class/table hierarchies, it would be most appreciated.
Google "generalization specialization relational modeling". You'll find several excellent articles on the subject of how to model the gen-spec pattern using relational tables. This same question has been asked many times in SO, with slightly different details.
The best of these articles will confirm your decision to have one table for generalized data and separate tables for specialized data. The biggest difference will be the way they recommend using primary and foreign keys. Basically, they recommend that specialized tables have a single column that does double duty. It serves as the primary key to the specialized table, but it's also a foreign key that duplicates the PK of the generalized table.
This is a little complicated to maintain, but it's very sweet at join time.
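For instance, with that shared-key approach the join is a one-liner; this sketch uses the question's tables, assuming MtgCard.card_id doubles as its primary key:

SELECT c.name, c.text, m.manacost, m.power, m.toughness, m.loyalty
FROM Card c
JOIN MtgCard m ON m.card_id = c.id;   -- the subtype's PK is also the FK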
Also keep in mind that DDL is required when a new class is added to the hierarchy.
Basically don't.
Forget about class hierarchies, storage models, and anything that is specific to your app and your particular app language. Unless you want to use the RDb as a mere storage location for your files, a dependent slave.
If you want the power and flexibility (specifically extensibility) of the relational Database, then you need to model it independent of any app, and using RDb principles, not app language requirements. Leave your app context behind for a while and design the database as a database. Learn about them. Normalise (eliminate all duplication). Learn about the structures and rules, and implement them. When you do that, your queries and your "mapping", will be effortless. There will be no "impedance". Use the correct datatypes and there will be no mismatch.
The structure you require is an ordinary subtype-supertype. Those are Relational Database terms that have been in existence for over 30 years in the RM, and over 23 years in Relational Database products. No need to call them funny new names. Wikipedia is not an academic reference.
Given your tables, which are quite correct as a starting point (you've Normalised automatically), you need:
Rename Card.Id as Card.CardId
Remove the ids for the subtypes, they are 100% redundant; the CardId is both the PK and the FK.
Add a discriminator Card.CardType CHAR(1) or TINYINT. This will identify which subtype to join with, when the CardType is not known.
It appears you do not fully understand the concept of Foreign Keys, so that would be good to gear up on first. It is implemented here in its simple, ordinary form:
ALTER TABLE MtgCard
ADD CONSTRAINT Card_MtgCard_fk
FOREIGN KEY (CardId)
REFERENCES Card(CardId)
The relation between Card and MtgCard or PokemonCard is always 1::1. The supertype is complete only when there is a Card plus { MtgCard | PokemonCard } with the same CardId. In your case there can be only one subtype, easy to enforce with a simple CHECK constraint.
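One common way to enforce that exclusivity, sketched here under the assumption of MySQL 8.0.16+ (earlier versions parse but ignore CHECK constraints), is to carry the discriminator into the subtype and make it part of a composite foreign key, which would replace the simpler FK shown above:

ALTER TABLE Card
    ADD CONSTRAINT Card_CardType_uq UNIQUE (CardId, CardType);

ALTER TABLE MtgCard
    ADD COLUMN CardType CHAR(1) NOT NULL DEFAULT 'M',
    ADD CONSTRAINT MtgCard_CardType_ck CHECK (CardType = 'M'),
    ADD CONSTRAINT Card_MtgCard_CardType_fk
        FOREIGN KEY (CardId, CardType) REFERENCES Card (CardId, CardType);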
In other cases, more than one subtype is quite legal.
In that example, the subtypes are Person Is a Teacher or Person Is a Student.
In Relational Databases there is no concept of joining "from" or "to" (or up/down or left/right), those notions are only there to assist us humans; you can start with any table/key you have, and go to any table you need. The tables in-between are demanded only in the absence of Relational Identifiers (ie. where additional Surrogates, ID columns, are used as PKs instead of meaningful natural keys).
In the example, using your terms, you can go straight from Enrollment to Person (eg, to grab the LastName) or to Course (to grab the Name) without having to visit the intermediate tables; the relation lines are solid.
Now, class hierarchies ("Is" or "Is a") and anything else, are simple and effortless.
Quick Reference to Standard Relational Database Diagrams.

How to store data with dynamic number of attributes in a database

I have a number of different objects with a varying number of attributes. Until now I have saved the data in XML files which easily allow for an ever changing number of attributes. But I am trying to move it to a database.
What would be your preferred way to store this data?
A few strategies I have identified so far:
Having one single field named "attributes" in the object's table and store the data serialized or json'ed in there.
Storing the data in two tables (objects, attributes) and using a third to save the relations, making it a true n:m relation. Very clean solution, but possibly very expensive to fetch an entire object and all its attributes
Identifying attributes all objects have in common and creating fields for these to the object's table. Store the remaining attributes as serialized data in another field. This has an advantage over the first strategy, making searches easier.
Any ideas?
If you ever plan on searching for specific attributes, it's a bad idea to serialize them into a single column, since you'll have to use per-row functions to get the information out - this rarely scales well.
I would opt for your second choice: have a list of attributes in an attributes table, the objects in their own table, and a many-to-many relationship table called object_attributes.
For example:
CREATE TABLE objects (
    object_id   INT NOT NULL,
    object_name VARCHAR(20),
    PRIMARY KEY (object_id)
);

CREATE TABLE attributes (
    attr_id   INT NOT NULL,
    attr_name VARCHAR(20),
    PRIMARY KEY (attr_id)
);

CREATE TABLE object_attributes (
    object_id INT NOT NULL,
    attr_id   INT NOT NULL,
    oa_value  VARCHAR(20),
    PRIMARY KEY (object_id, attr_id),
    FOREIGN KEY (object_id) REFERENCES objects (object_id),
    FOREIGN KEY (attr_id)   REFERENCES attributes (attr_id)
);
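A search by a specific attribute against this sketch then looks like the following (the 'color'/'blue' values are made up):

SELECT o.object_id, o.object_name
FROM objects o
JOIN object_attributes oa ON oa.object_id = o.object_id
JOIN attributes a ON a.attr_id = oa.attr_id
WHERE a.attr_name = 'color'
  AND oa.oa_value = 'blue';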
Your concern about performance is noted but, in my experience, it's always more costly to split a column than to combine multiple columns. If it turns out that there are performance problems, it's perfectly acceptable to break 3NF for performance reasons.
In that case I would store it the same way but also have a column with the raw serialized data. Provided you use insert/update triggers to keep the columnar and combined data in sync, you won't have any problems. But you shouldn't worry about that until an actual problem surfaces.
By using those triggers, you minimize the work required to only when the data changes. By trying to extract sub-column information, you do unnecessary work on every select.
A variation on your 2nd solution is just two tables (assuming all attributes are of a single type):
T1: |Object data columns|Object_id|
T2: |Object id|attribute_name|attribute value| (unique index on first 2 columns)
This is even more efficient when combined with your 3rd solution, e.g. all of the common fields go into T1.
Stuffing more than one attribute into the same blob is not recommended: you cannot filter by attributes, and you cannot update them efficiently.
Let me give some concreteness to what DVK was saying.
Assuming the values are of the same type, the table would look like this (good luck, I feel you're going to need it):
dynamic_attribute_table
------------------------
id NUMBER
key VARCHAR
value SOMETYPE?
example (cars):
|id| key | value |
---------------------------
| 1|'Make' |'Ford' |
| 1|'Model' |'Edge' |
| 1|'Color' |'Blue' |
| 2|'Make' |'Chevrolet'|
| 2|'Model' |'Malibu' |
| 2|'MaxSpeed'|'110mph' |
Thus,
entity 1 = { ('Make', 'Ford'), ('Model', 'Edge'), ('Color', 'Blue') }
and,
entity 2 = { ('Make', 'Chevrolet'), ('Model', 'Malibu'), ('MaxSpeed', '110mph') }.
If you are using a relational db, then I think you did a good job listing the options. They each have their pros and cons. YOU are in the best position to decide what works best for your circumstances.
The serialized approach is probably the fastest (depending on your code for de-serializing), but it means that you won't be able to query the data with SQL. If you say that you don't need to query the data with SQL, then I agree with #longneck, maybe you should use a key/value style db instead of a relational db.
EDIT: reading more of your comments, WHY are you switching to a db if speed is your main concern? What's BAD about your current XML implementation?
I used to implement this scheme:
t_class (id RAW(16), parent RAW(16)) -- holds the class hierarchy.
t_property (class RAW(16), property VARCHAR) -- holds class members.
t_declaration (id RAW(16), class RAW(16)) -- hold GUIDs and types of all class instances
t_instance (id RAW(16), class RAW(16), property VARCHAR2(100), textvalue VARCHAR2(200), intvalue INT, doublevalue DOUBLE, datevalue DATE) -- holds 'common' properties
t_class1 (id RAW(16), amount DOUBLE, source RAW(16), destination RAW(16)) -- holds 'fast' properties for class1.
t_class2 (id RAW(16), comment VARCHAR2(200)) -- holds 'fast' properties for class2
--- etc.
RAW(16) is where Oracle holds GUIDs
If you want to select all properties for an object, you issue:
SELECT i.*
FROM (
SELECT id
FROM t_class
START WITH
id = (SELECT class FROM t_declaration WHERE id = :object_id)
CONNECT BY
parent = PRIOR id
) c
JOIN t_property p
ON p.class = c.id
LEFT JOIN
t_instance i
ON i.id = :object_id
AND i.class = p.class
AND i.property = p.property
t_property holds stuff you normally don't search on (like text descriptions, etc.).
Fast properties are in fact normal tables you have in the database, to make the queries efficient. They hold values only for the instances of a certain class or its descendants. This is to avoid extra joins.
You don't have to use fast tables and limit all your data to these four tables.
Sounds like you need something like CouchDB, not an RDBMS.
If you are going to edit/manipulate/delete the attributes at a later point, a true n:m relation (your second option) is the one I would go for. (Or make it two tables where the same attribute repeats, but the data size will be high.)
If you are not dealing with the attributes (just capturing and showing the data), then you can go ahead and store them in one field with some separator (make sure the separator won't occur in the attribute values).
I am assuming you do not have a digital attribute soup, but that there is some order to your data.
Otherwise, an RDBMS might not be the best fit. Something along the lines of NoSQL might work better.
If your objects are of different types, you should generally have one table per type.
Especially if you want to connect them using primary keys. It also helps to bring order and sanity if you have Products, Orders, Customers, etc tables, instead of just an Object and Attribute table.
Then look at your attributes. Anything that exists for more than, say, 50% of the objects in that type category, make it a column in the object's table and use NULL when it's not being used.
Anything that is mandatory, should, of course, be defined as a NOT NULL column.
The rest, you can either have one or several "extra attributes" tables for.
You could put the attribute names into the table with the values, or normalize them out in a separate table and only use the primary key in the value table.
You may also find that you have combinations of data. For instance, a variant of an object type always has a certain set of attributes while another variant of the same object type has another set of attributes.
In that case, you might want to do something like:
MainObjectTable:
mainObjectId: PRIMARY KEY
columns...
MainObjectVariant1Table:
mainObjectId: FOREIGN KEY TO MainObjectTable
variant1Columns...
MainObjectVariant2Table:
mainObjectId: FOREIGN KEY TO MainObjectTable
variant2Columns...
I think the hard work, that will pay off, in the long run, is to analyze the data, find the objects and the commonly used attributes and make it into a good "object/ERD/DB" model.

Implementing custom fields with ALTER TABLE

We are currently thinking about different ways to implement custom fields for our web application. Users should be able to define custom fields for certain entities and fill in/view this data (and possibly query the data later on).
I understand that there are different ways to implement custom fields (e.g. using a name/value table or using alter table etc.) and we are currently favoring using ALTER TABLE to dynamically add new user fields to the database.
After browsing through other related SO topics, I couldn't find any big drawbacks of this solution. In contrast, having the option to query the data in fast way (e.g. by directly using SQL's where statement) is a big advantage for us.
Are there any drawbacks you could think of by implementing custom fields this way? We are talking about a web application that is used by up to 100 users at the same time (not concurrent requests..) and can use both MySQL and MS SQL Server databases.
Just as an update: we decided to add new columns via ALTER TABLE to the existing database tables to implement custom fields. After some research and tests, this looks like the best solution for most database engines. A separate table with meta information about the custom fields provides what is needed to manage, query, and work with them.
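As a rough sketch of that setup (MySQL syntax; the table, column and field names here are invented, not our actual schema), the application issues the ALTER TABLE and records the new column in a metadata table:

ALTER TABLE customers ADD COLUMN cf_loyalty_level INT NULL;

CREATE TABLE custom_fields (
    id INT AUTO_INCREMENT PRIMARY KEY,
    entity VARCHAR(64) NOT NULL,          -- e.g. 'customers'
    column_name VARCHAR(64) NOT NULL,     -- e.g. 'cf_loyalty_level'
    label VARCHAR(100) NOT NULL,
    data_type VARCHAR(20) NOT NULL,
    UNIQUE (entity, column_name)
);

INSERT INTO custom_fields (entity, column_name, label, data_type)
VALUES ('customers', 'cf_loyalty_level', 'Loyalty level', 'INT');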
The first drawback I see is that you need to grant your application service with ALTER rights.
This implies that your security model needs careful attention as the application will be able to not only add fields but to drop and rename them as well and create some tables (at least for MySQL).
Secondly, how would you distinguish fields that are required per user? Or can the fields created by user A be accessed by user B?
Note that the number of columns may also grow significantly. If every user adds 2 fields, we are already talking about 200 fields.
Personally, I would use one of the two approaches or a mix of them:
Using a serialized field
I would add one text field to the table in which I would store a serialized dictionary or dictionaries:
{
    user_1: {key1: val1, key2: val2, ...},
    user_2: {key1: val1, key2: val2, ...},
    ...
}
The drawback is that the values are not easily searchable.
Using a multi-type name/value table
fields table:
user_id: int
field_name: varchar(100)
type: enum('INT', 'REAL', 'STRING')
values table:
field_id: int
row_id: int # the main table row id
int_value: int
float_value: float
text_value: text
Of course, it requires a join and is a bit more complicated to implement, but it is far more generic and, if indexed properly, quite efficient.
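Roughly, a lookup against that layout might look like this; the table names custom_fields/custom_values and the 'budget' field are placeholders, not part of the scheme above:

SELECT v.int_value, v.float_value, v.text_value
FROM custom_values v
JOIN custom_fields f ON f.id = v.field_id
WHERE f.field_name = 'budget'
  AND v.row_id = 55;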
I see nothing wrong with adding new custom fields to the database table.
With this approach, the specific/most appropriate type can be used, i.e. need an int field? Define it as int. Whereas with a name/value type table you'd be storing multiple data types as one type (nvarchar, probably), unless you complete that name/value table with multiple columns of different types and populate the appropriate one, which is a bit horrible.
Also, adding new columns makes querying easier; there's no need to involve a join to a new name/value table.
It may not feel as generic, but I feel that's better than having a "one-size fits all" name/value table.
From an SQL Server point of view (2005 onwards)....
An alternative would be to create one "custom data" field of type XML. This would be truly generic and would require neither field creation nor a separate name/value table. It also has the benefit that not all records have to have the same custom data (i.e. the one field is common, but what it contains doesn't have to be). I'm not 100% sure of the performance impact, but XML data can be indexed.
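A hedged SQL Server sketch of that idea (the table, the XML shape and the 'priority' field are all hypothetical):

CREATE TABLE Orders (
    OrderId INT IDENTITY PRIMARY KEY,
    CustomData XML NULL
);

CREATE PRIMARY XML INDEX IX_Orders_CustomData ON Orders (CustomData);

-- Query into the custom data
SELECT OrderId
FROM Orders
WHERE CustomData.value('(/fields/priority)[1]', 'int') = 1;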