Storing structured user data in members table column/s - mysql

I wanted to ask for some advice on structuring the SQL database I am creating.
UPDATE: The question "Storing Data in MySQL as JSON" seems to indicate clearly that storing JSON in MySQL is not a smart choice.
My initial idea was to create a table for each user, named '{user_id}', with the following columns:
Datetime
Entry (one-digit int)
Comment (reasonably short string)
Activity (one word)
However, I have read that creating a table for each user is not advisable because it's unmanageable in the long run.
Specifically, I wanted to know how I could put all the information that I would have put in the '{user_id}' table in the user's row of my 'members' table.
I had a few ideas, but don't know how good they are:
Storing the user data as a JSON object (converted to a string) in an additional column 'data' of the 'members' table. Would that become unmanageable in the long run too (due to the JSON string becoming too long)?
Storing the user data in various additional columns of the 'members' table, maybe one for each of the parameters listed above (each of them being an array)
Storing the user data in various additional columns of the 'members' table, maybe one for each of the parameters listed above (each of them being a dictionary or some other data structure)
Are there any other better ways, or better data storage types than JSON objects?
What would be a good way of storing this information? Isn't handling the arrays/dictionaries going to become unmanageable over time when they become very big?
(one thing to keep in mind is that the 'data' entries would have to be modified daily and be easily accessible)

I think you may simply want a single additional table, perhaps called "activities", with a foreign key "user" referencing the "members" table.
Then, for each row you would have put in one of the per-user tables you were originally thinking of, you add a row to the activities table whose "user" value is the user in question. Since each row is small and bounded in size, the database should handle this well, and efficiency issues can be addressed with indexing. Basically, I am agreeing with @MikeNakis.
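A minimal sketch of that single "activities" table follows. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The column names (user_id, entry_time, entry, comment, activity) are hypothetical renderings of the fields in the question, and the composite index is one reasonable choice for "all entries of one user, newest first", not the only one.

```python
import sqlite3

# SQLite stands in for MySQL here; in MySQL you would use INT AUTO_INCREMENT,
# DATETIME, TINYINT, VARCHAR etc. for the same schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE members (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

-- One row per entry for every user, instead of one table per user.
CREATE TABLE activities (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES members(id),
    entry_time TEXT    NOT NULL,                          -- the "Datetime" field
    entry      INTEGER NOT NULL CHECK (entry BETWEEN 0 AND 9),
    comment    TEXT,
    activity   TEXT    NOT NULL
);

-- Lets "entries for user X, ordered by time" be answered from an index.
CREATE INDEX idx_activities_user_time ON activities (user_id, entry_time);
""")

conn.executemany("INSERT INTO members (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO activities (user_id, entry_time, entry, comment, activity) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, "2024-01-01 08:00", 3, "felt fine", "running"),
     (1, "2024-01-02 08:00", 5, "better", "cycling"),
     (2, "2024-01-01 09:00", 2, None, "swimming")])

def entries_for(user_id):
    """Return one user's entries, newest first."""
    return conn.execute(
        "SELECT entry_time, entry, comment, activity FROM activities "
        "WHERE user_id = ? ORDER BY entry_time DESC", (user_id,)).fetchall()

print(entries_for(1))
```

Daily modifications then become ordinary INSERT/UPDATE statements on small rows, rather than rewriting a growing blob or array.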

Related

Should a table that stores data about a single item grow horizontally or vertically?

Suppose I have a user table that stores the data of a single user.
Initially we know nothing about the user, so there is nothing in the table (maybe only a single column like id, which is of no use in this case).
We do not know what details we are going to have about the user, or in which order we will get them. Details about the user will be obtained gradually, in any order.
My question is: if, for example, I get the user's name, how should I enter it in the table?
I have two options:
1) Alter the table structure, add a column called username, and store the data there. This process is repeated for every new detail, so all data will be in one row.
2) Alter the table structure and add two columns, key and value. Store "name" as a key and the user's name as its value. Thus, for each detail about the user, a new row is inserted as a key/value pair.
The first method makes the table grow horizontally.
The second makes it grow vertically.
Which one is better in terms of good design and ease of querying?
If you expect the metadata associated with a user could become arbitrarily large, then adding columns probably isn't the best approach. So this would leave your suggestion to simply add key/value pairs for each new feature associated with a user. There is a third option, which I don't like for so many reasons, which would be to store JSON containing key/value pairs in a single column of the user table. We currently use this approach sporadically, but we handle the JSON manipulation in our Java app layer, which is relatively painless. From a pure database point of view, this isn't so desirable.
So I would vote for your second option of using key/value pairs, because it would scale well. Note that this does not imply that your user table would only have a single column. You might know that a certain number of user attributes will always be there, e.g. username, hashed password, etc., and these columns could be added at the beginning.
Building on what others have already said, you could use a hybrid approach as well. If there are any predefined columns (username, firstname, lastname, password, etc.), you could put those in a table with defined fields, and then link a second table with key/value pairs for additional data.
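A minimal sketch of the hybrid approach, again with Python's sqlite3 standing in for MySQL. The table and column names (users, user_attributes, attr, value) are hypothetical. Known attributes get typed columns, and anything discovered later is a key/value row, so a new detail is an INSERT rather than an ALTER TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Attributes every user is known to have get real, typed columns.
CREATE TABLE users (
    id            INTEGER PRIMARY KEY,
    username      TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);

-- Anything learned later is stored as a key/value row ("grows vertically").
CREATE TABLE user_attributes (
    user_id INTEGER NOT NULL REFERENCES users(id),
    attr    TEXT    NOT NULL,
    value   TEXT,
    PRIMARY KEY (user_id, attr)   -- at most one value per attribute per user
);
""")

conn.execute("INSERT INTO users VALUES (1, 'alice', 'x')")  # 'x' is a placeholder hash
# Details arrive in any order; each one is a new row, not a schema change.
conn.executemany("INSERT INTO user_attributes VALUES (?, ?, ?)",
                 [(1, "city", "Oslo"), (1, "first_name", "Alice")])

def attributes(user_id):
    """Collect a user's key/value rows into a dict."""
    rows = conn.execute(
        "SELECT attr, value FROM user_attributes WHERE user_id = ?", (user_id,))
    return dict(rows.fetchall())

print(attributes(1))
```

The trade-off is that every value in user_attributes shares one type (TEXT here), which is the price of not altering the schema for each new attribute.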

Integer values for status fields

Often I find myself creating 'status' fields for database tables. I set these up as TINYINT(1), as more often than not I only need a handful of status values. I cross-reference these values with array lookups in my code, for example:
0 - Pending
1 - Active
2 - Denied
3 - On Hold
This all works very well, except I'm now trying to create better database structures and realise that from a database point of view, these integer values don't actually mean anything.
Now a solution to this may be to create separate tables for statuses - but there could be several status columns across the database and to have separate tables for each status column seems a bit of overkill? (I'd like each status to start from zero - so having one status table for all statuses wouldn't be ideal for me).
Another option is to use the ENUM data type, but opinions on this are mixed; many people recommend against using ENUM fields.
So what would be the way to go? Do I absolutely need to put this data into its own table?
I think the best approach is to have a single status table for each kind of status. For example, order_status ("placed", "paid", "processing", "completed") is qualitatively different from contact_status ("received", "replied", "resolved"), but the latter might work just as well for customer contacts as for supplier contacts.
This is probably already what you're doing — it's just that your "tables" are in-memory arrays rather than database tables.
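A minimal sketch of moving such an in-memory array into a per-type lookup table, using Python's sqlite3 as a stand-in for MySQL. The names account_status and accounts are hypothetical; the status values are the ones from the question. Note that the ids can still start at 0, which was the asker's concern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- One small lookup table per kind of status; ids may start at 0.
CREATE TABLE account_status (
    id   INTEGER PRIMARY KEY,   -- a TINYINT would do in MySQL
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE accounts (
    id        INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES account_status(id)
);
""")
# The same mapping that previously lived in an array in application code.
conn.executemany("INSERT INTO account_status (id, name) VALUES (?, ?)",
                 [(0, "Pending"), (1, "Active"), (2, "Denied"), (3, "On Hold")])
conn.executemany("INSERT INTO accounts (id, status_id) VALUES (?, ?)",
                 [(10, 1), (11, 3)])

# The database itself can now say what status 3 means.
rows = conn.execute("""
    SELECT a.id, s.name
    FROM accounts AS a
    JOIN account_status AS s ON s.id = a.status_id
    ORDER BY a.id
""").fetchall()
print(rows)
```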
I agree with "ruakh" on creating another table structured as (id, statusName), which is a great approach. I would add that for such a table you can still use TINYINT for the id column: a signed TINYINT holds values from -128 to 127 (0 to 255 if UNSIGNED), which covers any number of statuses you are likely to need. Note that the (1) in TINYINT(1) is only a display width and does not limit the range.
Can you add (or remove) a status value without changing code?
If yes, then consider a separate lookup table for each status "type". You are already treating this data in a generic way in your code, so you should have a generic data structure for it.
If no, then keep the ENUM (or a well-documented integer). You are treating each value in a special way, so there isn't much point in trying to generalize the data model.
(I'd like each status to start from zero - so having one status table for all statuses wouldn't be ideal for me)
You should never mix several distinct sets of values within the same lookup table (regardless of your "zero issue"). Reasons:
A simple FOREIGN KEY alone won't be able to prevent referencing a value from the wrong set.
All values are forced into the same type, which may not always be desirable.
That's such a common anti-pattern that it even has a name: "one true lookup table".
Instead, keep each lookup "type" within a separate table. That way, FKs work predictably and you can tweak datatypes as necessary.
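A minimal sketch of why separate lookup tables let a plain foreign key do its job, with Python's sqlite3 standing in for MySQL. The status sets are the order_status and contact_status examples from the earlier answer; the orders and contacts tables are hypothetical. Value 3 exists only in order_status, so the FK on contacts rejects it. With a single shared lookup table keyed only by id, 3 would exist and the FK would accept it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE order_status   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE contact_status (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE orders (
    id        INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES order_status(id)
);
CREATE TABLE contacts (
    id        INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES contact_status(id)
);
""")
conn.executemany("INSERT INTO order_status VALUES (?, ?)",
                 [(0, "placed"), (1, "paid"), (2, "processing"), (3, "completed")])
conn.executemany("INSERT INTO contact_status VALUES (?, ?)",
                 [(0, "received"), (1, "replied"), (2, "resolved")])

conn.execute("INSERT INTO orders VALUES (1, 3)")  # 'completed' is a valid order status

# 3 is not a contact status, so the foreign key refuses the row.
try:
    conn.execute("INSERT INTO contacts VALUES (1, 3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("contact with status 3 rejected:", rejected)
```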

Database design for user entries (using mysql)

The main piece of data I'm having my users enter is an object called an "activity", which consists of a few text fields, a few strings, etc. One of the text fields, called "Description", could be quite long (such as a long blog post). For each user I would like to store all of their activity objects in a MySQL database.
Here are some solutions I've thought of:
Have a separate mysql table for each user's activities, i.e. activities_userX, X ranging over
Use json to encode these objects into strings and store them as a column in the main table
Have one separate table for all these activities objects, and just index them; then for each user in the main table have a list of indices corresponding to which activities are theirs.
What are the pros/cons of these methods? More importantly, what else could I be doing?
Thanks.
Have a separate mysql table for each user's activities, i.e. activities_userX, X ranging over
A table for every user? That just means an insane number of tables.
Use json to encode these objects into strings and store them as a column in the main table
JSON is a good transport format. You have a database for storing your data; use its features.
Have one separate table for all these activities objects, and just index them; then for each user in the main table have a list of indices corresponding to which activities are theirs.
Getting closer.
This sort of relationship is usually known as 'has many'. In this case "A user has many activities".
You should have a table of users and a table of activities.
One of the columns of the activities table should be a foreign key that points to the primary key of the user table.
Then you will be able to do:
SELECT fields, i, want from activities WHERE userid=?
Or
SELECT users.foo, users.bar, activities.description FROM users JOIN activities
ON users.userid = activities.userid
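A runnable sketch of this "has many" relationship, with Python's sqlite3 standing in for MySQL. The users/activities tables and the userid column follow the answer; name, title and the sample rows are hypothetical. The LEFT JOIN query is an extra illustration: it keeps users who have no activities yet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    userid INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE activities (
    id          INTEGER PRIMARY KEY,
    userid      INTEGER NOT NULL REFERENCES users(userid),  -- a user has many activities
    title       TEXT NOT NULL,
    description TEXT   -- may hold a long, blog-post-sized body
);
CREATE INDEX idx_activities_userid ON activities (userid);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "ann"), (2, "ben")])
conn.executemany(
    "INSERT INTO activities (userid, title, description) VALUES (?, ?, ?)",
    [(1, "Hike", "A long write-up..."), (1, "Swim", "Short note")])

# First query from the answer: one user's activities.
ann = conn.execute(
    "SELECT title FROM activities WHERE userid = ? ORDER BY id", (1,)).fetchall()

# Second query, written with an explicit JOIN.
joined = conn.execute("""
    SELECT users.name, activities.title
    FROM users
    JOIN activities ON users.userid = activities.userid
    ORDER BY activities.id
""").fetchall()

# LEFT JOIN keeps users who have no activities yet (their count is 0).
counts = conn.execute("""
    SELECT users.name, COUNT(activities.id)
    FROM users
    LEFT JOIN activities ON users.userid = activities.userid
    GROUP BY users.userid
    ORDER BY users.userid
""").fetchall()
print(ann, joined, counts)
```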

Alternative to using same foreign key in almost every table

I am working with a database where "almost" every table in the database has the same field and same value. For example, almost all tables have a field called GroupId and there is only one group id in the database now.
Benefits
All data is related to that field and can be identified by said field
When a new group is created data will be properly identified for the group
Disadvantages
All tables have this field
All stored procedures need to have this field as a parameter
All queries have to be filtered by this field
Is this a big deal? Is there an alternative to this approach?
Thanks
If you need to be able to identify data by more than one group in the future, having foreign keys is good practice. However, that doesn't mean all tables need this field, only the ones directly related to the group. For instance, a lookup table with state values may not need it, but the customers table might. Adding it to every table indiscriminately can cause problems: when you try to delete a record, you may have to check 579 tables (only 25 of which are pertinent). All this depends greatly on what the groups mean. Most of our tables have a relationship to the client table, because they contain data related to specific clients and because we don't want clients to be able to see other clients' data. Tables that do not contain that kind of data do not have that relationship.
Yes, most queries may need the field, and many stored procedures will need it as an input parameter, but if you truly need to filter on this information, that is as it should be.
If however there is only one group and will never be more than one group, it is a waste of time, effort and space.
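A minimal sketch of putting the group column only where it belongs, with Python's sqlite3 standing in for the actual database. All names (client_groups, group_id, states, customers, orders) are hypothetical. Customers carry group_id directly, orders reach their group through the customer, and the shared states lookup has no group column at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE client_groups (group_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Shared lookup data: not group-specific, so no group_id column.
CREATE TABLE states (code TEXT PRIMARY KEY, name TEXT NOT NULL);

-- Directly owned by a group: carries group_id.
CREATE TABLE customers (
    id       INTEGER PRIMARY KEY,
    group_id INTEGER NOT NULL REFERENCES client_groups(group_id),
    state    TEXT REFERENCES states(code)
);

-- Owned through its customer: the group is reachable via a join.
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
""")
conn.executemany("INSERT INTO client_groups VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.execute("INSERT INTO states VALUES ('WA', 'Washington')")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [(10, 1, "WA"), (20, 2, "WA")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(100, 10, 5.0), (200, 20, 7.5)])

def orders_for_group(group_id):
    """Filter orders by group without storing group_id on orders."""
    return conn.execute("""
        SELECT o.id, o.total
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE c.group_id = ?
        ORDER BY o.id
    """, (group_id,)).fetchall()

print(orders_for_group(1))
```

The cost of this layout is one extra join on group-filtered queries; copying group_id onto orders would avoid the join at the price of keeping the two copies consistent.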

Implementing custom fields with ALTER TABLE

We are currently thinking about different ways to implement custom fields for our web application. Users should be able to define custom fields for certain entities and fill in/view this data (and possibly query the data later on).
I understand that there are different ways to implement custom fields (e.g. using a name/value table or using alter table etc.) and we are currently favoring using ALTER TABLE to dynamically add new user fields to the database.
After browsing through other related SO topics, I couldn't find any big drawbacks to this solution. On the other hand, being able to query the data quickly (e.g. directly with a SQL WHERE clause) is a big advantage for us.
Are there any drawbacks you can think of to implementing custom fields this way? We are talking about a web application that is used by up to 100 users at the same time (not concurrent requests) and can use both MySQL and MS SQL Server databases.
Just as an update, we decided to add new columns via ALTER TABLE to the existing database table to implement custom fields. After some research and tests, this looks like the best solution for most database engines. A separate table with meta information about the custom fields provides the needed information to manage, query and work with the custom fields.
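A minimal sketch of the ALTER TABLE approach with a metadata table, using Python's sqlite3 as a stand-in for MySQL or SQL Server. The names contacts, custom_fields, the cf_ prefix and the type whitelist are all hypothetical. Because column names and types cannot be passed as bound "?" parameters, the sketch validates them against a whitelist before building the DDL string.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Meta information about each custom column (owning table, type, label).
CREATE TABLE custom_fields (
    table_name  TEXT NOT NULL,
    column_name TEXT NOT NULL,
    data_type   TEXT NOT NULL,
    label       TEXT NOT NULL,
    PRIMARY KEY (table_name, column_name)
);
INSERT INTO contacts VALUES (1, 'Ann'), (2, 'Bo');
""")

# Identifiers cannot be bound as "?" parameters, so they are whitelisted.
ALLOWED_TYPES = {"INTEGER", "REAL", "TEXT"}
NAME_RE = re.compile(r"[a-z][a-z0-9_]{0,29}")

def add_custom_field(label, name, data_type):
    """Add a custom column to contacts and record it in custom_fields."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"invalid field name: {name!r}")
    if data_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {data_type!r}")
    column = "cf_" + name  # prefix keeps custom columns apart from built-in ones
    conn.execute(f"ALTER TABLE contacts ADD COLUMN {column} {data_type}")
    conn.execute("INSERT INTO custom_fields VALUES ('contacts', ?, ?, ?)",
                 (column, data_type, label))
    conn.commit()
    return column

col = add_custom_field("Shoe size", "shoe_size", "INTEGER")
conn.execute(f"UPDATE contacts SET {col} = ? WHERE id = ?", (42, 1))

# The advantage the question mentions: a plain WHERE clause on a typed column.
big_feet = conn.execute(f"SELECT name FROM contacts WHERE {col} > 40").fetchall()

try:
    add_custom_field("Bad", "x; DROP TABLE contacts", "TEXT")
    injection_blocked = False
except ValueError:
    injection_blocked = True
print(big_feet, injection_blocked)
```

In MySQL, ALTER TABLE causes an implicit commit, so the schema change and the metadata insert are not atomic there. The application's database account also needs the ALTER privilege, which is the concern raised in the next answer.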
The first drawback I see is that you need to grant your application service with ALTER rights.
This implies that your security model needs careful attention as the application will be able to not only add fields but to drop and rename them as well and create some tables (at least for MySQL).
Secondly, how would you distinguish fields that are required per user? Can the fields created by user A be accessed by user B?
Note that the number of columns may also grow significantly. If each of the 100 users adds 2 fields, we are already talking about 200 fields.
Personally, I would use one of the two approaches or a mix of them:
Using a serialized field
I would add one text field to the table in which I would store a serialized dictionary or dictionaries:
{
  user_1: {key1: val1, key2: val2, ...},
  user_2: {key1: val1, key2: val2, ...},
  ...
}
The drawback is that the values are not easily searchable.
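A minimal sketch of why a serialized field is hard to search, with Python's sqlite3 standing in for the database. JSON is assumed as the serialization format (the answer does not specify one), and the items table and custom column are hypothetical. To filter on a value inside the blob, every row has to be fetched and decoded in the application.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, custom TEXT)")

# One TEXT column holding a serialized dict of per-user dicts.
rows = [
    (1, {"user_1": {"color": "red"}, "user_2": {"size": 3}}),
    (2, {"user_1": {"color": "blue"}}),
]
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(item_id, json.dumps(data)) for item_id, data in rows])

def items_where(user, key, value):
    """Return ids of items whose custom data has data[user][key] == value.

    The database only sees an opaque string, so no WHERE clause or index
    can help: every row is loaded and decoded here before filtering."""
    matches = []
    for item_id, blob in conn.execute("SELECT id, custom FROM items ORDER BY id"):
        data = json.loads(blob)
        if data.get(user, {}).get(key) == value:
            matches.append(item_id)
    return matches

print(items_where("user_1", "color", "blue"))
```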
Using a multi-type name/value table
fields table:
  id: int
  user_id: int
  field_name: varchar(100)
  type: enum('INT', 'REAL', 'STRING')
values table:
  field_id: int     # references fields.id
  row_id: int       # the main table row id
  int_value: int
  float_value: float
  text_value: text
Of course, it requires a join and is a bit more complicated to implement but far more generic and, if indexed properly, quite efficient.
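A minimal sketch of the multi-type name/value layout and the join it needs, with Python's sqlite3 standing in for MySQL. The main table (products) and the sample fields are hypothetical; the values table is named field_values here because VALUES is an SQL keyword, and a CHECK constraint stands in for MySQL's ENUM. The composite index is one guess at "indexed properly" for integer comparisons.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE fields (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL,
    field_name TEXT    NOT NULL,
    type       TEXT    NOT NULL CHECK (type IN ('INT', 'REAL', 'STRING'))  -- ENUM in MySQL
);

CREATE TABLE field_values (
    field_id    INTEGER NOT NULL REFERENCES fields(id),
    row_id      INTEGER NOT NULL REFERENCES products(id),  -- the main table row id
    int_value   INTEGER,
    float_value REAL,
    text_value  TEXT,
    PRIMARY KEY (field_id, row_id)
);
-- Supports "rows where field F has an integer value above V".
CREATE INDEX idx_values_int ON field_values (field_id, int_value);
""")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Lamp"), (2, "Desk")])
conn.executemany("INSERT INTO fields VALUES (?, ?, ?, ?)",
                 [(1, 7, "weight_kg", "REAL"), (2, 7, "stock", "INT")])
conn.executemany("INSERT INTO field_values VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, None, 1.5, None), (1, 2, None, 20.0, None),
                  (2, 1, 5, None, None), (2, 2, 0, None, None)])

def rows_where_int_above(user_id, field_name, threshold):
    """Names of main-table rows whose INT custom field exceeds threshold."""
    return conn.execute("""
        SELECT p.name
        FROM products AS p
        JOIN field_values AS v ON v.row_id = p.id
        JOIN fields AS f ON f.id = v.field_id
        WHERE f.user_id = ? AND f.field_name = ? AND f.type = 'INT'
          AND v.int_value > ?
        ORDER BY p.id
    """, (user_id, field_name, threshold)).fetchall()

print(rows_where_int_above(7, "stock", 0))
```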
I see nothing wrong with adding new custom fields to the database table.
With this approach, the most appropriate specific type can be used: need an int field? Define it as int. With a name/value table, on the other hand, you'd be storing multiple data types as one type (probably nvarchar), unless you extend that name/value table with multiple columns of different types and populate the appropriate one, which is a bit horrible.
Also, adding new columns makes querying easier: there is no need to join to a separate name/value table.
It may not feel as generic, but I feel that's better than having a one-size-fits-all name/value table.
From a SQL Server point of view (2005 onwards):
An alternative would be to create one "custom data" column of type XML. This would be truly generic and require neither new columns nor a separate name/value table. It also has the benefit that not all records need the same custom data (i.e. the column is common, but its contents don't have to be). I'm not 100% sure of the performance impact, but XML data can be indexed.