Handling custom user fields that can grow and shrink - MySQL

This question is mostly about design: how to approach the problem, ideas, etc.
I have a situation where a user can create as many custom fields as they like, of type Number, Text, or Date, and use them to build a form. I have to design a table model that can store these values so that queries can be run on them once saved.
Previously I hard-coded the format for 25 user-defined fields (UDFs). I made a table with 25 columns (10 Number, 10 Text, and 5 Date) and stored the label in it whenever a user made use of a field, then mapped it to another table with the same format that stores the values. The mapping tells me which field has which label, but I suspect this is not an efficient approach.
Any suggestion would be appreciated.
Users have permission to create any number of UDFs of the above types, which can then be used to build any number of forms, and the data entered must be saved for each form.
E.g. say a user created 10 Number, 10 Date, and 10 Text fields, used the first 5 of each to make form1 and all 10 of each to make form2, and then saved data for both.
My thoughts on it:
Make table1 with [id, name (as UDF_xxx where xxx is the data type), UserLabel]
Make table2 to map forms to table1: [id (FK to table1.id), F_id (form id)]
Make one table per data type as [id (FK to table1.id), F_id (form number), R_id (row id for the data, the same for all data types), value] (a rough sketch follows)
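A rough MySQL sketch of that layout (all names, types, and sizes here are illustrative, not final):

create table udf (
  id int auto_increment primary key,
  name varchar(32) not null, -- e.g. UDF_NUMBER, UDF_TEXT, UDF_DATE
  user_label varchar(64) not null
);

create table form_udf (
  udf_id int not null, -- FK to udf.id
  f_id int not null,   -- form id
  foreign key (udf_id) references udf(id)
);

create table udf_value_number ( -- one such table per data type
  udf_id int not null, -- FK to udf.id
  f_id int not null,   -- form id
  r_id int not null,   -- row id for the data, shared across the data-type tables
  value decimal(18,4),
  foreign key (udf_id) references udf(id)
);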
Thanks to all. I'm going to implement it; both the DATASET_ENTRY and the JSON approaches look good, as they give wider extensibility. I still have to figure out which will best fit the existing format.

There are two approaches I have used.
XML: To create dynamic user attributes, you may use XML. This XML will be stored in a CLOB/TEXT column - say, user_attributes. Store the entire user data as XML key-value pairs, with the type as an attribute or another field. This will give you maximum freedom. You can use XOM or any other XML object model API to display or operate on the data. A typical node will look like:
<userdata>
  ...
  ...
  <datanode>
    <key type="Date">User Birth</key>
    <value>1994-02-25</value>
  </datanode>
  ...
</userdata>
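If the XML lives in a MySQL TEXT column, small lookups can be done in SQL with ExtractValue, which supports a subset of XPath 1.0 (the table name here and the exact predicate support are assumptions):

-- hypothetical table holding one XML blob per user
create table user_attributes_xml (
  user_id int primary key,
  user_attributes text
);

-- pull the value of the "User Birth" datanode for one user
select ExtractValue(user_attributes, '//datanode[key="User Birth"]/value')
from user_attributes_xml
where user_id = 42;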
Attribute-AttributeValue: This is the same idea as above, but using tables. You create a table attributes with an FK to user_id, and another table attribute_values with an FK to attribute_id. attributes contains the field names and types for each user, and attribute_values contains the values of those attributes. So basically:
users
  user_id
attributes
  attr_id
  user_id (FK)
  attr_type
  attr_name
attribute_values
  attr_val_id
  attr_id (FK)
  attr_val
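A minimal MySQL sketch of those tables (column types and sizes are assumptions):

create table users (
  user_id int auto_increment primary key
  -- plus the fixed, must-have columns
);

create table attributes (
  attr_id int auto_increment primary key,
  user_id int not null,
  attr_type varchar(16) not null, -- e.g. 'Number', 'Text', 'Date'
  attr_name varchar(64) not null,
  foreign key (user_id) references users(user_id)
);

create table attribute_values (
  attr_val_id int auto_increment primary key,
  attr_id int not null,
  attr_val varchar(255), -- stored as text; cast on read according to attr_type
  foreign key (attr_id) references attributes(attr_id)
);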
In both approaches you are not limited by how many fields there are or what types of data you have. But there is a downside: parsing. In either case, you will have to do a small amount of processing to display or analyze the data.
The best-of-both-worlds approach (rigid column structure vs. completely dynamic data) is to have a users table with the must-have columns (like user_name, age, sex, address, etc.) and keep the user-created data (like favorite pet, house, etc.) in either XML or attribute-attribute_value form.

What do you want to achieve?
A table per form permutation, or might each dataset consist of a different set of fields?
Two possibilities pop into my mind:
Create a table that describes one field of a dataset, i.e. the key might be dataset id + field id, and additional columns could contain the value stored as a string plus the type of that value (i.e. number, string, boolean, etc.).
That way each dataset might be different, but upon reading a dataset and storing it into an object, you could create the appropriate value types (Integer, Double, String, Boolean, etc.).
Create a table per form, using some naming convention. When the form layout is changed, execute ALTER TABLE statements to add, remove, or rename columns, or change their types.
When the user changes the type of a column or deletes it, you might need to either deny that if the values are not null, or at least ask the user whether she's willing to drop the values that don't match the new requirements.
Edit: Example for approach 1
Table UDF -- describes the available fields
id (PK)
user_id (FK)
type
name

Table FORM -- describes a form's general attributes
id (PK)
user_id (FK)
name
description

Table FORM_LAYOUT -- describes a form's field layout
form_id (FK)
udf_id (FK)
mapping -- mapping info like column index, form field name etc.

Table DATASET_ENTRY -- describes one entry of a dataset, i.e. the value of one UDF in one row of a form
id (PK)
row_id
form_id (FK)
udf_id (FK)
value
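A minimal MySQL rendering of these four tables (the column sizes and the composite keys are assumptions; the names follow the tables above):

create table UDF (
  id int auto_increment primary key,
  user_id int not null,
  type varchar(16) not null, -- e.g. 'NUMBER', 'TEXT', 'DATE'
  name varchar(64) not null
);

create table FORM (
  id int auto_increment primary key,
  user_id int not null,
  name varchar(64) not null,
  description varchar(255)
);

create table FORM_LAYOUT (
  form_id int not null,
  udf_id int not null,
  mapping varchar(255), -- column index, form field name, etc.
  primary key (form_id, udf_id),
  foreign key (form_id) references FORM(id),
  foreign key (udf_id) references UDF(id)
);

create table DATASET_ENTRY (
  id int auto_increment primary key,
  row_id int not null,
  form_id int not null,
  udf_id int not null,
  value varchar(255),
  foreign key (form_id, udf_id) references FORM_LAYOUT(form_id, udf_id)
);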
Selecting the content for a specific form might then be done like this:
SELECT e.value, f.type, l.mapping FROM DATASET_ENTRY e
JOIN UDF f ON e.udf_id = f.id
JOIN FORM_LAYOUT l ON e.form_id = l.form_id AND e.udf_id = l.udf_id
WHERE e.row_id = ? AND e.form_id = ?

Create a table which tracks which fields exist. Then create a table for each data type you want to support, into which the users will write their values.
create table Fields(
fieldid int not null,
fieldname text not null,
fieldtype int not null
);
create table FieldDate
(
ValueId int not null,
fieldid int not null,
value date
);
create table FieldNumber
(
ValueId int not null,
fieldid int not null,
value decimal(18,4) -- MySQL has no NUMBER type; use DECIMAL (or DOUBLE)
);
..
Another possibility would be to use ALTER TABLE to create custom fields. If your application has the rights to perform this command and the custom fields change very rarely, this is the option I would choose.
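For example, adding and later removing a user-defined date field might look like this (the table and column names are hypothetical):

-- user adds a custom "birthday" field
alter table form_data add column udf_birthday date null;

-- user deletes the field again; its values are dropped with the column
alter table form_data drop column udf_birthday;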

Related

How to store data whose type can be numeric, date or string in MySQL

We're developing a monitoring system. In our system, values are reported by agents running on different servers. The observations reported can be values like:
A numeric value, e.g. "CPU USAGE" = 55 (meaning 55% of the CPU is in use).
A certain event was fired, e.g. "Backup completed".
A status, e.g. SQL Server is offline.
We want to store these observations (which are not known in advance and will be added dynamically to the system without recompiling).
We are considering adding different columns to the observations table like this:
IntMeasure -> INTEGER
FloatMeasure -> FLOAT
Status -> varchar(255)
So if the value we wish to store is a number, we can use IntMeasure or FloatMeasure according to the type. If the value is a status, we can store the status literal string (or a status id if we decide to add a Statuses(id, name) table).
We suppose a more correct design is possible, but wouldn't it become slow and opaque due to the joins and the dynamic table names depending on the types? How would a join work if we can't specify the tables in advance in the query?
I haven't done a formal study, but from my own experience I would guess that more than 80% of database design flaws are generated from designing with performance as the most important (if not only) consideration.
If a good design calls for multiple tables, create multiple tables. Don't automatically assume that joins are something to be avoided. They are rarely the true cause of performance problems.
The primary consideration, first and foremost in all stages of database design, is data integrity. "The answer may not always be correct, but we can get it to you very quickly" is not a goal any shop should be working toward. Once data integrity has been locked down, if performance ever becomes an issue, it can be addressed. Don't sacrifice data integrity, especially to solve problems that may not exist.
With that in mind, look at what you need. You have observations you need to store. These observations can vary in the number and types of attributes, and can be things like the value of a measurement, the notification of an event, or the change of a status, among others, with the possibility of future kinds of observations being added.
This would appear to fit into a standard "type/subtype" pattern, with the "Observation" entry being the type and each type or kind of observation being the subtype, and suggests some form of type indicator field such as:
create table Observations(
...,
ObservationKind char( 1 ) check( ObservationKind in( 'M', 'E', 'S' )),
...
);
But hardcoding a list like this in a check constraint has a very low maintainability level. It becomes part of the schema and can be altered only with DDL statements. Not something your DBA is going to look forward to.
So have the kinds of observations in their own lookup table:
ID Name Meaning
== =========== =======
M Measurement The value of some system metric (CPU_Usage).
E Event An event has been detected.
S Status A change in a status has been detected.
(The char field could just as well be int or smallint. I use char here for illustration.)
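A sketch of that lookup table and its rows (the column sizes are assumptions):

create table ObservationKinds(
ID char( 1 ) primary key,
Name varchar( 32 ) not null,
Meaning varchar( 128 )
);

insert into ObservationKinds( ID, Name, Meaning ) values
( 'M', 'Measurement', 'The value of some system metric (CPU_Usage).' ),
( 'E', 'Event', 'An event has been detected.' ),
( 'S', 'Status', 'A change in a status has been detected.' );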
Then fill out the Observations table with a PK and the attributes that would be common to all observations.
create table Observations(
ID int auto_increment primary key,
ObservationKind char( 1 ) not null,
DateEntered date not null,
...,
constraint FK_ObservationKind foreign key( ObservationKind )
references ObservationKinds( ID ),
constraint UQ_ObservationIDKind unique( ID, ObservationKind )
);
It may seem strange to create a unique index on the combination of Kind field and the PK, which is unique all by itself, but bear with me a moment.
Now each kind or subtype gets its own table. Note that each kind of observation gets a table, not the data type.
create table Measurements(
ID int not null,
ObservationKind char( 1 ) check( ObservationKind = 'M' ),
Name varchar( 32 ) not null, -- Such as "CPU Usage"
Value double not null, -- such as 55.00
..., -- other attributes of Measurement observations
constraint PK_Measurements primary key( ID, ObservationKind ),
constraint FK_Measurements_Observations foreign key( ID, ObservationKind )
references Observations( ID, ObservationKind )
);
The first two fields will be the same for the other kinds of observations except the check constraint will force the value to the appropriate kind. The other fields may differ in number, name and data type.
Let's examine an example tuple that may exist in the Measurements table:
ID ObservationKind Name Value ...
==== =============== ========= =====
1001 M CPU Usage 55.0 ...
In order for this tuple to exist in this table, a matching entry must first exist in the Observations table with an ID value of 1001 and an observation kind of 'M'. No other entry with an ID value of 1001 can exist in either the Observations table or the Measurements table, and none can exist at all in any of the other "kind" tables (Events, Status). This works the same way for all the kind tables.
I would further recommend creating a view for each kind of observation which will provide a join of each kind with the main observation table:
create view MeasurementObservations as
select ...
from Observations o
join Measurements m
on m.ID = o.ID;
Any code that works solely with measurements would need to only hit this view instead of the underlying tables. Using views to create a wall of abstraction between the application code and the raw data greatly enhances the maintainability of the database.
Now the creation of another kind of observation, such as "Error", involves a simple Insert statement to the ObservationKinds table:
F Fault A fault or error has been detected.
Of course, you need to create a new table and view for these error observations, but doing so will have no impact on existing tables, views or application code (except, of course, to write the new code to work with the new observations).
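To make that concrete, registering the new kind and creating its table might look like this (the attributes beyond ID and ObservationKind are assumptions):

insert into ObservationKinds( ID, Name, Meaning )
values( 'F', 'Fault', 'A fault or error has been detected.' );

create table Faults(
ID int not null,
ObservationKind char( 1 ) check( ObservationKind = 'F' ),
Message varchar( 255 ) not null, -- assumed attribute of fault observations
constraint PK_Faults primary key( ID, ObservationKind ),
constraint FK_Faults_Observations foreign key( ID, ObservationKind )
references Observations( ID, ObservationKind )
);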
Just create it as a VARCHAR
This will allow you to store whatever data you require in it. However, it makes it much more difficult to run queries based on the number in the field, such as:
Select * from table where MyVARCHARField > 50 -- get CPU > 50
However, if you think you will want to do this, then you either need a field per item or a generalised table such as:
create table Measures -- the table name is illustrative
(
Description varchar(255),
ValueType varchar(10), -- can be 'String', 'Float', or 'Int'
ValueString varchar(255),
ValueFloat float,
ValueInt int
);
Then when you are filling in the data, you put the value into the matching field and select like this:
Select Description, ValueInt from Measures where Description like '%cpu%' and ValueInt > 50
I used two columns for a similar problem. The first column held the data type and the second held the data as a VARCHAR.
The first column held type codes (e.g. 1 = integer, 2 = string, 3 = date, and so on), which could be combined with casts to compare values (e.g. find the max integer where type = 1).
I did not have joins, but I think you can use this approach. It will also help if more data types are introduced tomorrow.
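A sketch of such a query against this two-column layout (the table and column names are hypothetical):

-- find the max integer value, where type code 1 = integer
select max( cast( value as signed ) )
from typed_values
where type = 1;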

MySQL - convert names and values into columns

I have the following 2 tables:
Parameters table: ID, EntityType, ParamName, ParamType
Entity table: ID, Type, Name, ParamID, StringValue, NumberValue, DateValue
Entity.ParamID is linked to Parameters.ID
Entity.Type is linked to Parameters.EntityType
StringValue, NumberValue, DateValue contain data based on Parameters.ParamType (1, 2, 3)
the query result should contain:
Entity.ID, Entity.Name, Parameters.ParamName1, Parameters.ParamName2... Parameters.ParamNameX
The content of ParamNameX follows the correlation above. How is it possible to turn the parameter names into columns and their values into the data of those columns? I don't even know where to begin.
Explanation for the above: for example, entity X can be of entitytype 1 and entitytype 2. The parameters table contains paramnames for both type 1 and type 2, but I need to get (for example) only entity type 1's paramnames.
What you are trying to achieve is an EAV (Entity Attribute Value) model.
But the way you set up your tables is just wrong.
You should have a table per type.
So entity_string, entity_number, entity_date, and a main table entity which holds the id and some general fields like create_time, update_time, and so on.
Look at Magento and how it sets up its tables. That way it is much easier to query your data and keep it organized.
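As for the pivot itself, a common MySQL approach (whatever the table layout) is conditional aggregation. A sketch against the question's original tables, where 'color' and 'weight' stand in for real parameter names:

select e.ID, e.Name,
  max( case when p.ParamName = 'color' then e.StringValue end ) as color,
  max( case when p.ParamName = 'weight' then e.NumberValue end ) as weight
from Entity e
join Parameters p on p.ID = e.ParamID and p.EntityType = e.Type
group by e.ID, e.Name;

Each parameter to be shown needs its own max(case ...) column, so the column list is fixed at query-writing time (or generated dynamically via a prepared statement).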

How to store a list of each user's items in MySQL?

If each user on a site can enter a comma-separated list of items that they type in (not from a pre-determined list!), how should we store that list for each user in MySQL so that items can be matched across users with the same items?
I know we shouldn't store the comma-separated string they've entered as a VARCHAR in the DB, so how should it be stored? Should a new table ItemsList be created where each row is a UserID -> ItemName mapping (e.g. if user ID 101 enters "Matches, Gun, Alcohol", we would add 3 rows to ItemsList: 101 -> 'Matches', 101 -> 'Gun', 101 -> 'Alcohol')?
If so, what PRIMARY KEY should be used for that table?
What indexes should be set to make both the retrieval and matching of items as fast as possible?
Lastly, what query should be used to find all the users that have at least one Item in common with another user?
Could you store the input as a JSON string in a TEXT column in the table? That way you can retrieve and display/update the JSON string whenever a user needs to add new items. In doing this you aren't creating a new row for every single input. The Primary Key in this situation would be the user id, and as PKs are already indexes, you wouldn't need to create additional indexes.
Since the input will be comma-separated, you would do a split on commas into a list and then serialize the list as a JSON object (then store in DB).
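A sketch of this approach using MySQL 5.7+'s native JSON type (the table and column names are assumptions; the comma-splitting happens in application code):

create table user_items (
  user_id int primary key,
  items json not null
);

-- "Matches, Gun, Alcohol" split in the application, stored as a JSON array
insert into user_items values (101, '["Matches", "Gun", "Alcohol"]');

-- find the users whose list contains a given item
select user_id from user_items
where json_contains(items, '"Gun"');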
Yes.
Assuming a table ItemsList with two columns UserID and ItemName, the primary key should be (UserID, ItemName) (both columns)
Besides the Primary Key, add an index on ItemName.
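A sketch of that table, plus the last question (users sharing at least one item); the names follow the question:

create table ItemsList (
  UserID int not null,
  ItemName varchar(64) not null,
  primary key (UserID, ItemName),
  key idx_itemname (ItemName)
);

-- all pairs of distinct users having at least one item in common
select distinct a.UserID, b.UserID
from ItemsList a
join ItemsList b on b.ItemName = a.ItemName and b.UserID <> a.UserID;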
You are asking a bit too much. Please establish your final table structure, try something and come back with some code you have tried.

Best Practice: find row for unique id from multiple tables

Our database contains 5+ tables:
user
----------
user_id (PK) int NOT NULL
name varchar(50) NOT NULL
photo
--------
photo_id (PK) int NOT NULL
user_id (FK) int NOT NULL
title varchar(50) NOT NULL
comment
-------
comment_id (PK) int NOT NULL
photo_id int NOT NULL
user_id int NOT NULL
message varchar(50) NOT NULL
All primary key ids are unique ids.
All data is linked to http://domain.com/{primary_key_id}.
A user visits the link with an id, which is unique across all tables.
How should I implement finding which table this id belongs to?
solution 1
select user_id from user where user_id = {primary_key_id}
-- if not found, then move to the next table
select photo_id from photo where photo_id = {primary_key_id}
... continue on, until we find which table this primary key belongs to.
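These per-table probes can also be collapsed into a single round trip (a sketch; extend the UNION for the remaining tables):

select 'user' as source from user where user_id = {primary_key_id}
union all
select 'photo' from photo where photo_id = {primary_key_id}
union all
select 'comment' from comment where comment_id = {primary_key_id};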
solution 2
create an object table to hold all the unique ids and their data type
create AFTER INSERT triggers on all the tables, to create a row in the object table with the data type of whatever was inserted into the selected table
when required, do a select statement to find the table name the id belongs to.
The second solution requires a double insert: one insert of the row into the actual table with the complete data, and a second insert of the unique id and table name into the object table created in step 1.
select type from object_table where id = {primary_key_id}
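A sketch of that object table and one of the triggers (one trigger per table; the names are illustrative):

create table object_table (
  id int primary key,
  type varchar(16) not null -- 'user', 'photo', 'comment', ...
);

create trigger user_after_insert after insert on user
for each row
  insert into object_table (id, type) values (new.user_id, 'user');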
solution 3
prepend the table name to the id and encode it into a new unique integer - using PHP
decode the id to get back the original id and the table name (even if it's just a number type)
I don't know how to implement this in PHP, but this solution sounds better!? What are your suggestions?
I don't know what you mean by the Facebook reference in the comments, but I'll explain my comment a little further.
You don't need unique ids across five DB tables, just one per table. You have a couple of options for how to create your links (you can create the links yourself, can't you?):
using GET variables: http://domain.com/page.html?pk={id}&table={table}
using plain URL: http://domain.com/{id}{table}
Depending on the syntax of the link you choose, pick the function to parse it. You can for example use one or both of the following:
http://php.net/manual/en/function.explode.php
http://www.php.net/manual/en/function.parse-url.php
When you get the simple model working you may add encoding/decoding/hashing functions. But do you really need them? And in what level? (I have no experience in that area so I'll shut up now.)
Is it actually important to maintain uniqueness across tables?
If no, just implement solution 3 if you can (e.g. using URL encoding).
If yes, you'll need the "parent" table in any case, so the DBMS can enforce the uniqueness.
You can still try to implement solution 3 on top of that,
or add a type discriminator[1] there and you'll be able to (quickly) know which table is referenced for any given ID.
[1] Take a look at the lower part of this answer. This is in fact a form of inheritance.

How to structure my Users Database?

I have a website that allows users to be of different types. Each of these types can do specific things. Should I set up one table for ALL my users and store the type in an enum, or should I make a different table for each type? Now, if the only thing that differed was the type, it would be easy for me to choose a single table. However, here's a scenario.
The 4 users are A, B, C, D.
User A has data for:
name
email
User B has data for:
name
email
phone
User C has data for:
name
email
phone
about
User D has data for:
name
email
phone
about
address
If I were to create a single table, should I just leave different fields null for the different users? Or should I create a whole separate table for each user?
Much better if you create a single table for all of them, even though some fields will be nullable, and add an extra column (enum) for the type of user. If you keep your current design, you will have to use joins and unions to fetch the records (which adds extra overhead on the server).
CREATE TABLE users
(
ID INT,
name VARCHAR(50),
email VARCHAR(50),
phone VARCHAR(50),
about VARCHAR(50),
address VARCHAR(50),
userType ENUM('A', 'B', 'C', 'D') -- put the types of user here
)
Another suggested design is to create two tables: one for the users and one for the types. The main advantage here is that whenever you add another type of user, you don't have to alter the table; you only add an extra record to the user type table, which is then referenced by the users table.
CREATE TABLE UserType
(
ID INT PRIMARY KEY,
name VARCHAR(50)
)
CREATE TABLE users
(
ID INT,
name VARCHAR(50),
email VARCHAR(50),
phone VARCHAR(50),
about VARCHAR(50),
address VARCHAR(50),
TypeID INT,
CONSTRAINT rf_fk FOREIGN KEY (TypeID) REFERENCES UserType(ID)
)
Basic database design principles suggest one table for the common elements and additional tables, JOINed back to the base table, for the attributes that are unique to each type of user.
Your example suggests one and only one additional field per user type in a straightforward inheritance hierarchy. Is that really what the data looks like, or did you simplify it for the example? If it's a true representation of your requirements, I might be tempted (for expedience) to use a single table. But if the real requirements are more complex, I'd bite the bullet and do it "correctly".
Try creating four tables:
Table 1: Name, email
Table 2: Name, phone
Table 3: Name, about
Table 4: Name, address
Name is your primary key on all four tables. There are no nulls in the database. You're not storing an enumerated type; the type is derived from table joins (a sketch follows the list):
To find all User A select all records in table 1 not in table 2
To find all User B select all records in table 2 not in table 3
To find all User C select all records in table 3 not in table 4
To find all User D select all records in table 4
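A sketch of the first of those anti-joins in MySQL (the table names follow the list above):

-- all User A: records in Table 1 with no match in Table 2
select t1.Name
from Table1 t1
left join Table2 t2 on t2.Name = t1.Name
where t2.Name is null;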
You should not create a table per kind of user, because this will lead to a bloated database. It's best to create a single table with all the fields you need and store NULL in the fields that aren't used.
I would suggest using a single table with nullable fields, plus a table of something like roles.