MySQL column organization - mysql

My web application allows a user to define from 1 up to 30 emails (could be anything else).
Which of these options is best?
1) ...store the data inside only one column using a separator, like this:
[COLUMN emails] peter#example.com,mary#example.com,john#example.com
Structure:
emails VARCHAR(1829)
2) ...or save the data using distinct columns, like this:
[COLUMN email1] peter#example.com
[COLUMN email2] mary#example.com
[COLUMN email3] john#example.com
[...]
Structure:
email1 VARCHAR(60)
email2 VARCHAR(60)
email3 VARCHAR(60)
[...]
email30 VARCHAR(60)
Thank you in advance.

Depends on how you are going to use the data and how fixed the amount of 30 is. If it is an advantage to quickly query for the 3rd address or filter using WHERE clauses and such: use distinct fields; otherwise it might not be worth the effort of creating the columns.
Having the data in a database still has the advantage of concurrent access by several users.

Number two is the better option, without question. If you do the first one (comma separated), then it negates the advantages of using a RDBMS (you can't run an efficient query on your emails in that case, so it may as well be a flat file).

Number 2 is better than number 1.
However, you should consider another option of getting a normalized structure where you have a separate emails table with a foreign key to your user record. This would allow you to define an index if you wanted to search by email to find a user and place a constraint ensuring no duplicate emails are registered - if you wanted to do that.

Neither one is a very good option.
Option 1 is a poor idea because it makes looking a user up by email a complex, inefficient task. You are effectively required to perform a full text search on the email field in the user record to find one email.
Option 2 is really a WORSE idea, IMO, because it makes any surrounding code a huge pain to write. Suppose, again, that you need to look up all users who have a value X. You now need to enumerate 30 columns and check each one to see if that value exists. Painful!
Storing data in this manner -- 1-or-more of some element of data -- is very common in database design, and as Adam has previously mentioned, is best solved in MOST cases by using a normalized data structure.
A correct table structure, written in MySQL since this was tagged as such, might look like:
Users table:
CREATE TABLE user (
user_id int auto_increment,
...
PRIMARY KEY (user_id)
);
Emails table:
CREATE TABLE user_email (
user_id int,
email char(60) not null default '',
FOREIGN KEY (user_id) REFERENCES user (user_id) ON DELETE CASCADE
);
The FOREIGN KEY clause is optional: the design will work without it, but that line causes the database to enforce the relationship. For example, if you attempt to insert a record into user_email with a user_id of 10, there MUST be a corresponding user record with a user_id of 10, or the query will fail. The ON DELETE CASCADE tells the database that if you delete a record from the user table, all user_email records associated with it will also be deleted (you may or may not want this behavior).
This design of course also means that you need to perform a join when you retrieve a user record. A query like this:
SELECT user.user_id, user_email.email FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id WHERE <your where clause>;
Will return one row for EACH user_email address stored in the system. If you have 5 users and each user has 5 email addresses, the above query will return 25 rows.
Depending on your application, you may want to get one row per user but still have access to all the emails. In that case you might try an aggregate function like GROUP_CONCAT which will return a single row per user, with a comma-delimited list of emails belonging to that user:
SELECT user.user_id, GROUP_CONCAT(user_email.email) AS user_emails FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id WHERE <your where clause> GROUP BY user.user_id;
Again, depending on your application, you may want to add an index to the email column.
Finally, there ARE some situations where you do not want a normalized database design, and a single-column design with delimited text might be more appropriate, although those situations are few and far between. For most normal applications, this type of normalized design is the way to go and will help it perform and scale better.
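As a rough, runnable sketch of the normalized layout described above, here is the same two-table design exercised from Python. It uses the standard-library sqlite3 module as a stand-in for MySQL, so the DDL differs slightly from the MySQL version; the name column, the index name, the helper functions and the sample addresses are all made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE user (
    user_id INTEGER PRIMARY KEY AUTOINCREMENT,
    name    TEXT NOT NULL              -- hypothetical column, for illustration only
);
CREATE TABLE user_email (
    user_id INTEGER NOT NULL,
    email   TEXT NOT NULL,
    FOREIGN KEY (user_id) REFERENCES user (user_id) ON DELETE CASCADE
);
-- Unique index: lookups by email can use it, and duplicate addresses are rejected.
CREATE UNIQUE INDEX idx_user_email_email ON user_email (email);
""")

def add_user(name, emails):
    """Insert a user plus any number of email rows; return the new user_id."""
    user_id = conn.execute("INSERT INTO user (name) VALUES (?)", (name,)).lastrowid
    conn.executemany(
        "INSERT INTO user_email (user_id, email) VALUES (?, ?)",
        [(user_id, email) for email in emails],
    )
    return user_id

def find_user_by_email(email):
    """Return the user_id that owns this address, or None if nobody does."""
    row = conn.execute(
        "SELECT user_id FROM user_email WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None

peter = add_user("peter", ["peter@example.com", "p.smith@example.com"])
mary = add_user("mary", ["mary@example.com"])
```

With this layout a user can have 1, 30 or 300 addresses without any schema change, and finding a user by email is a plain equality match on an indexed column.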

Related

correct structure (Database) for a mobile application

We are making a mobile application with some friends, but we are having problems regarding the structure of the database due to Unknown. I think it is a good question that can help many people, and it would be nice if people with knowledge could explain it well. The app consists of providing various services (more can be added in the future) to customers. They are logged in and have access to our services. At first we thought of a table that contains columns with all the customer data plus the services. Then we saw that it was more effective to make a separate table called "services" that identifies the user by an id. The problem now comes with this table. We do not know whether to make a single column with all services (such as an array) or to make one column per service. I took a photo so that what I am proposing can be observed more easily.
The question is which of these options (obviously there may be a third that we do not contemplate) is the best, in terms of performance.
I see several defects in the second option, but I'm not sure. In terms of latency and speed, the cost of traversing an array (even more so if services are added, or if they are out of order because the user first hired service2 and then service1) is much higher than in option 1. In addition, when a user drops a service, that implies going through the entire array, looking for the service and removing it. I don't know; you are the experts, so what do you recommend? All of this will be uploaded to the cloud (Azure), so all requests will go to the cloud.
Option 2 is better than option 1. But, with respect, it's still not good.
Never never store comma-separated lists of things in columns of data. If you do you'll be sorry. (They're very costly to search.)
You want something like this. Three tables, one for users, another for services, and a so-called JOIN table to establish a many-to-many relationship between the two.
+-----------+    +-------------+     +-----------+
|user       |    |user_service |     |service    |
+-----------+    +-------------+     +-----------+
|user_id    +--->|user_id      |<----+service_id |
|givenname  |    |service_id   |     |name       |
|surname    |    +-------------+     +-----------+
|is_active  |
+-----------+
Each row in user_service means a user is authorized to use that service. To authorize a user, INSERT a row. To revoke authorization, DELETE the row.
To find out whether a user can use a service, use this query.
SELECT user.user_id
FROM user
JOIN user_service USING (user_id)
JOIN service USING (service_id)
WHERE user.givenname = 'Bill' AND user.surname='Gates'
AND service.name = 'CharityNavigator'
AND user.is_active > 0;
If your query returns the user_id then the chosen user may use the chosen service.
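The INSERT / DELETE / SELECT cycle described above can be sketched end to end as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 replaces MySQL, the helper names grant, revoke and can_use are invented here, and the second service name is made up.

```python
import sqlite3

# In-memory SQLite as a stand-in for MySQL; schema mirrors the three-table diagram.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id   INTEGER PRIMARY KEY,
    givenname TEXT,
    surname   TEXT,
    is_active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE service (
    service_id INTEGER PRIMARY KEY,
    name       TEXT
);
CREATE TABLE user_service (
    user_id    INTEGER NOT NULL REFERENCES user (user_id),
    service_id INTEGER NOT NULL REFERENCES service (service_id),
    PRIMARY KEY (user_id, service_id)
);
INSERT INTO user (user_id, givenname, surname) VALUES (1, 'Bill', 'Gates');
INSERT INTO service (service_id, name) VALUES (1, 'CharityNavigator'), (2, 'Mail');
""")

def grant(user_id, service_id):
    """Authorize a user for a service: one new row in the join table."""
    conn.execute(
        "INSERT INTO user_service (user_id, service_id) VALUES (?, ?)",
        (user_id, service_id),
    )

def revoke(user_id, service_id):
    """Revoke an authorization: delete that row again."""
    conn.execute(
        "DELETE FROM user_service WHERE user_id = ? AND service_id = ?",
        (user_id, service_id),
    )

def can_use(user_id, service_name):
    """True if the active user has a row linking them to the named service."""
    row = conn.execute(
        """
        SELECT user.user_id
        FROM user
        JOIN user_service USING (user_id)
        JOIN service USING (service_id)
        WHERE user.user_id = ? AND service.name = ? AND user.is_active > 0
        """,
        (user_id, service_name),
    ).fetchone()
    return row is not None

grant(1, 1)  # Bill may now use CharityNavigator
```

Adding a brand-new service later is just another row in the service table; neither the schema nor these helpers need to change.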
To get a list of the services for each user, use this query.
SELECT user.user_id, user.givenname, user.surname,
GROUP_CONCAT(service.name) service_names
FROM user
JOIN user_service USING (user_id)
JOIN service USING (service_id)
WHERE user.is_active > 0
GROUP BY user.user_id
Some explanation:
It's almost always best to build tables with rows for things like your services in them, rather than columns or comma-separated lists in columns. Why?
You can add new services -- as many as you want -- years from now without reworking your database code.
DBMSs, including MySQL, work well with JOIN operations.
Doing WHERE commalist_column SOMEHOW_CONTAINS (some_id) is disgustingly inefficient in most relational database management systems. Doing WHERE column = some_id is far more efficient because it can use an index.
Rows with fewer columns, in general, work better than rows with more columns.
It's far cheaper in production to add rows to databases than it is to add columns. Adding columns means altering table definitions. That operation can require downtime.
When you use columns for things like your services, you're creating a closed system. When you use rows, your system is open-ended.
May I suggest you read about database normalization? Don't be intimidated by all the CS jargon. Just look at some examples of how to normalize various databases.
And maybe read about entity-relationship database modeling?
Edit On the advice of a commenter, I suggest you make the primary key of your user_service table to contain both columns (user_id, service_id). I also suggest you make a reverse index with both columns (service_id, user_id) so your queries can look things up quickly starting with service as well as user. Your table definitions might look something like this:
CREATE TABLE user (
user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
givenname VARCHAR(50) NULL DEFAULT NULL,
surname VARCHAR(50) NULL DEFAULT NULL,
is_active TINYINT NOT NULL DEFAULT '1',
PRIMARY KEY (user_id)
)
COLLATE='utf8mb4_general_ci';
CREATE TABLE service (
service_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (service_id)
)
COLLATE='utf8mb4_general_ci';
CREATE TABLE user_service (
user_id INT UNSIGNED NOT NULL,
service_id INT UNSIGNED NOT NULL,
PRIMARY KEY (user_id, service_id),
INDEX reverse_index (service_id, user_id),
CONSTRAINT FK_service
FOREIGN KEY (service_id)
REFERENCES service (service_id)
ON UPDATE RESTRICT ON DELETE RESTRICT,
CONSTRAINT FK_user
FOREIGN KEY (user_id)
REFERENCES user (user_id)
ON UPDATE RESTRICT ON DELETE RESTRICT
);
With this primary key if you attempt to INSERT a duplicate authorization for a user for a service, the dbms rejects it.
Be sure to use the same INT UNSIGNED NOT NULL data type for user_id and service_id in those tables.
This is a very common database design pattern: it's the canonical way of creating a many-to-many relationship between rows of two different tables.
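To see the two points from the edit above in action (the composite primary key rejecting duplicates, and the reverse index serving lookups that start from the service), here is a small sketch. It runs on Python's sqlite3 rather than MySQL, and the function names authorize and users_of are hypothetical.

```python
import sqlite3

# SQLite stand-in for the user_service table from the answer (foreign keys left out for brevity).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_service (
    user_id    INTEGER NOT NULL,
    service_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, service_id)
);
CREATE INDEX reverse_index ON user_service (service_id, user_id);
""")

def authorize(user_id, service_id):
    """Store an authorization; return False if the same pair already exists."""
    try:
        conn.execute(
            "INSERT INTO user_service (user_id, service_id) VALUES (?, ?)",
            (user_id, service_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key refused the duplicate row.
        return False

def users_of(service_id):
    """List the users of one service; this lookup can be served by reverse_index."""
    rows = conn.execute(
        "SELECT user_id FROM user_service WHERE service_id = ? ORDER BY user_id",
        (service_id,),
    )
    return [row[0] for row in rows]
```

Because the database itself refuses the duplicate, the application does not need a separate "does this row already exist?" query before inserting.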
A 3rd way (most frugal on space)
See the SET datatype. It allows for saying which combination of those 6 servs apply.
INT UNSIGNED (of a suitable size) is another way to have a "set".
SET or TINYINT takes only 1 byte to represent up to 8 items.
Your 6 column choice takes 6 bytes.
The "{serv1,... }" might be a VARCHAR, averaging 10-20 bytes.
So, my suggestions are clearly aimed at saving space. But maybe that is not important? Do you have millions of rows? Do you have more than 64 "servs"? (There is a limit of 64 on SET and BIGINT UNSIGNED.)
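The integer-as-set idea can be illustrated on the application side with plain bit operations. This is a sketch only: the SERV1..SERV6 constants and the helper names are invented for the example, and the resulting integer is what would be stored in a TINYINT UNSIGNED (or wider) column.

```python
# Hypothetical flags: each service gets one bit, so six services fit in a single byte.
SERV1, SERV2, SERV3, SERV4, SERV5, SERV6 = (1 << i for i in range(6))

def grant(mask, service):
    """Return the mask with the service's bit switched on."""
    return mask | service

def revoke(mask, service):
    """Return the mask with the service's bit switched off."""
    return mask & ~service

def has_service(mask, service):
    """True if the service's bit is set in the mask."""
    return (mask & service) != 0

def service_numbers(mask):
    """Decode the mask back into a sorted list of service numbers (1..6)."""
    return [i + 1 for i in range(6) if mask & (1 << i)]

# The user hired service2 first and service1 later; order does not matter for bits.
mask = grant(grant(0, SERV2), SERV1)
```

The trade-off is the one discussed in the other answer: the column is tiny, but a query such as "all users with service 3" has to test a bit in every row instead of using an index.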
But Which?
Is the question about coding? Well, any method is going to take some effort to split the bits/columns/string apart to build the buttons on the screen. Probably a similar amount of effort and probably less than the effort to build the screen. Ditto for performance.
I highly recommend you pick two solutions and implement both. You will discover
How similar they are in performance, amount of code, etc.
How insignificant the question is.
How much extra stuff you have learned about databases.
How easy it is to "try" and "throw away" another way to do something.
How the latency, performance, etc, differences are insignificant. (This is what we are really answering for you.)
The bigger picture
You have pointed out one use for this data structure. I worry that there are, or will be, other uses for this data structure. And that something else is the real determinant of which approach is best. (At that point, you can happily resurrect the thrown away version!)
A 4th way
JSON. But it would be more verbose (take more space) than your VARCHAR way. It may or may not be easier to work with -- this depends on the rest of the requirements.

How to store data like as facebook's "likes"

I have an object stored in the database; it is some text with properties.
That text has a rating. I need to store this rating and prevent any one user from raising it more than once. If I store "text id" and "user id" in another table and count all the records that have the needed "text id", I will have too many records in the table.
There are two ways:
You can use a many-to-many relationship, i.e. a separate table with a name like 'user_likes'. It will have user_id and like_id columns, which together form the primary key (this ensures a user can like the like_object only once).
Another way, which high-traffic websites use: every user record in the user table has a likes column, which is just a serialized array or JSON, whatever. Before updating this column, your application retrieves the data and looks for the particular like_object_id; if it doesn't exist, you update your database. Please note that in this case all care about data consistency (for instance, a like_object_id that exists in some user record but doesn't exist in the like_object table) must be implemented in your application code, not in the database.
P.S. Sorry for my english, but I tried to explain as best as I could.
If I store "text id" and "user id" in another table and count all the records that have the needed "text id", I will have too many records in the table.
How do you know what is too many records?
Some of the MySQL tables I support have billions of rows. If they need more than that, they split the data to multiple MySQL servers. 1 million rows is not a problem for a MySQL database.
If you want to limit the data so each user can "like" a given text only once, you must store the data separately for each user. This is also true if a user can "unlike" a text they had previously liked.
CREATE TABLE likes (
user_id BIGINT UNSIGNED NOT NULL,
post_id BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (user_id, post_id),
KEY (post_id, user_id)
);
This example table uses its primary key constraint to ensure each user can like a given post only once. By adding a second index, this helps to optimize queries for likes on a specific post.
This is only 16 bytes per row, plus the size of the index. I filled an InnoDB table with over 1 million rows, and it uses about 60MB.
mysql> show table status\G
Name: likes
Engine: InnoDB
Rows: 1046760
Data_length: 39419904
Index_length: 23658496
It's common to store databases on terabyte-sized storage these days, so a 60MB table doesn't seem too large.
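A minimal sketch of how an application might use such a likes table, assuming Python's sqlite3 in place of MySQL; the index name likes_by_post and the functions like, unlike and like_count are hypothetical names chosen for this example.

```python
import sqlite3

# SQLite stand-in for the likes table above: one row per (user, post) pair.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE likes (
    user_id INTEGER NOT NULL,
    post_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, post_id)
);
CREATE INDEX likes_by_post ON likes (post_id, user_id);
""")

def like(user_id, post_id):
    """Record a like; return False if this user had already liked the post."""
    try:
        conn.execute(
            "INSERT INTO likes (user_id, post_id) VALUES (?, ?)", (user_id, post_id)
        )
        return True
    except sqlite3.IntegrityError:
        return False  # primary key prevented a second like

def unlike(user_id, post_id):
    """Remove a like if it exists."""
    conn.execute(
        "DELETE FROM likes WHERE user_id = ? AND post_id = ?", (user_id, post_id)
    )

def like_count(post_id):
    """Count the likes on one post; the second index covers this query."""
    return conn.execute(
        "SELECT COUNT(*) FROM likes WHERE post_id = ?", (post_id,)
    ).fetchone()[0]
```

The rating is simply the row count for the post, and the "one like per user" rule is enforced by the primary key rather than by application logic.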
I store the likes with the post itself, but I am not sure about its performance, since none of my websites has reached a very heavy load.
But I do the following:
Post {
id int;
likes_count int; // likes count, to retrieve it quickly
likes string; // ids of the users who liked this post, comma separated
}
When a user likes a post (using ajax):
the UI will update directly and show that the user liked the post
ajax will send a request to the server with the post id and the user id, then the post data will be updated as follows:
post.likes_count += 1;
post.likes += userId + ',' ;
When the user reloads the page, it will check whether their id is in likes; if so, the post will appear as liked.

MySQL - Table Implementation

I had to implement the following into my database:
The activities that users engage in. Each activity can have a name with up to 80 characters, and only distinct activities should be stored. That is, if two different users like “Swimming”, then the activity “Swimming” should only be stored once as a string.
Which activities each individual user engages in. Note that a user can have more than one hobby!
So I have to implement tables for this purpose and I must also make any modifications to existing tables if and as required and implement any keys and foreign key relationships needed.
All this must be stored with minimal amount of storage, i.e., you must choose the appropriate data types from the MySQL manual. You may assume that new activities will be added frequently, that activities will almost never be removed, and that the total number of distinct activities may reach 100,000.
So I already have a 'User' table with 'user_id' as my primary key.
MY SOLUTION TO THIS:
Create a table called 'Activities' and have 'activity_id' as PK (mediumint(5) ) and 'activity' as storing hobbies (varchar(80)) then I can create another table called 'Link' and use the 'user_id' FK from user table and the 'activity_id' FK from the 'Activities' table to show user with the activities that they like to do.
Is my approach to this question right? Is there another way I can do this to make it more efficient?
How would I show if one user pursues more than one activity in the foreign key table 'Link'?
Your idea is the correct, and only(?), way. It's called a many-to-many relationship.
Just to reiterate: what you're proposing is that you'll have a user table, and this will have a userid, then an activity table with an activityid.
To form the relationship you'll have a 3rd table, which for performance's sake doesn't require a primary key; however, you should index both columns (userid and activityid).
In your logic, when someone enters an activity name, pull all records from the activity table and check whether the entered value exists; if not, add it to the table, get back the new activityid, and then add an entry to the user_activity table linking the activityid to the userid.
If it already exists just add an entry linking that activity id to the userid.
So your approach is right, the final question just indicates you should google for 'many to many' relationships for some more info if needed.
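The "look up the activity, create it if missing, then link it" flow described above might look roughly like this. Note two deliberate deviations, both assumptions of this sketch rather than part of the answer: it queries for the single name instead of pulling all records, and it puts a composite primary key on the link table so a user cannot be linked to the same activity twice. Python's sqlite3 stands in for MySQL, and the function names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activity (
    activity_id INTEGER PRIMARY KEY AUTOINCREMENT,
    activity    TEXT NOT NULL UNIQUE        -- each activity name is stored only once
);
CREATE TABLE user_activity (
    user_id     INTEGER NOT NULL,
    activity_id INTEGER NOT NULL REFERENCES activity (activity_id),
    PRIMARY KEY (user_id, activity_id)
);
""")

def get_or_create_activity(name):
    """Return the id of the named activity, inserting it first if it is new."""
    row = conn.execute(
        "SELECT activity_id FROM activity WHERE activity = ?", (name,)
    ).fetchone()
    if row:
        return row[0]
    return conn.execute(
        "INSERT INTO activity (activity) VALUES (?)", (name,)
    ).lastrowid

def add_user_activity(user_id, name):
    """Link a user to an activity by name; linking twice is a harmless no-op."""
    activity_id = get_or_create_activity(name)
    conn.execute(
        "INSERT OR IGNORE INTO user_activity (user_id, activity_id) VALUES (?, ?)",
        (user_id, activity_id),
    )
    return activity_id

def activities_of(user_id):
    """All activity names for one user, alphabetically."""
    rows = conn.execute(
        """
        SELECT a.activity
        FROM user_activity AS ua
        JOIN activity AS a ON a.activity_id = ua.activity_id
        WHERE ua.user_id = ?
        ORDER BY a.activity
        """,
        (user_id,),
    )
    return [row[0] for row in rows]
```

A user with several activities simply has several rows in user_activity, which answers the last question in the post.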

Database design - how to implement user group table?

I want to make user group system that imitates group policy in instant messengers.
Each user can create as many as groups as they want, but they cannot have groups with duplicate names, and they can put as many friends as they want into any groups.
For example, John's friend Jen can be in 'school' group of John and 'coworker' group of John at the same time. And, it is totally independent from how Jen puts John into her group.
I'm thinking two possible ways to implement this in database user_group table.
1.
user_group (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
group_name VARCHAR(30),
UNIQUE KEY (user_id, group_name)
)
In this case, all groups owned by all users will have a unique id. So, id alone can identify which user and the name of the group.
2.
user_group (
user_id INT,
group_id INT AUTO_INCREMENT,
group_name VARCHAR(30),
PRIMARY KEY (user_id, group_id),
UNIQUE KEY (user_id, group_name)
)
In this case, group_id always starts from 0 for each user, so, there could exist many groups with same group_id s. But, pk pair (user_id, group_id) is unique in the table.
which way is better implementation and why?
what are advantages and drawbacks for each case?
EDIT:
added AUTO_INCREMENT to group_id in second scenario to ensure it is auto-assigned from 0 for each user_id.
EDIT:
'better' means...
- better performance in SELECT/INSERT/UPDATE friends to the group since that will be the mostly used operations regarding the user group.
- robustness of database like which one will be more safe in terms of user size.
- popularity or general preference of either one over another.
- flexibility
- extensibility
- usability - easier to use.
Personally, I would go with the 1st approach, but it really depends on how your application is going to work. If it would ever be possible for ownership of a group to be changed, or to merge user profiles, this will be much easier to do in your 1st approach than in the 2nd. In the 2nd approach, if either of those situations ever happen, you would not only have to update your user_group table, but any dependent tables as well that have a foreign key relation to user_group. This will also be a many to many relation (there will be multiple users in a group, and a user will be a member of multiple groups), so it will require a separate joining table. In the 1st approach, this is fairly straightforward:
group_member (
group_id int,
user_id int
)
For your 2nd approach, it would require a 3rd column, which is not only more confusing, since you're now including a user id twice, but also takes about 50% more storage per row, three INT columns instead of two (this may or may not be an issue depending on how large you expect your database to be):
group_member (
owner_id int,
group_id int,
user_id int
)
Also, if you ever plan to move from MySQL to another database platform, this behavior of auto_increment may not be supported. I know in MS SQL Server, an auto_increment field (identity in MSSQL) will always be incremented, not made unique according to indexes on the table, so to get the same functionality you would have to implement it yourself.
Please define "better".
From my gut, I would pick the 2nd one.
The searchable pieces are broken down more, but that wouldn't be what I'd pick if insert/update performance is a concern.
I see no possible benefit to number 2 at all; it is more complex, more fragile (it would not work at all in SQL Server) and gains nothing. Remember, the groupId is without meaning except to identify a record uniquely; the user will likely only see the group name, not the id. So it doesn't matter if they all start from 0 or if there are gaps because a group was rolled back or deleted.
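For reference, the 1st approach combined with the two-column group_member table from the first answer can be sketched like this. It is an illustration on Python's sqlite3, not MySQL; the primary key on group_member, the helper names and the numeric ids for John and Jen are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_group (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id    INTEGER NOT NULL,             -- owner of the group
    group_name TEXT NOT NULL,
    UNIQUE (user_id, group_name)             -- no duplicate names per owner
);
CREATE TABLE group_member (
    group_id INTEGER NOT NULL REFERENCES user_group (id),
    user_id  INTEGER NOT NULL,               -- the friend placed in the group
    PRIMARY KEY (group_id, user_id)
);
""")

def create_group(owner_id, group_name):
    """Create a group for an owner and return its globally unique id."""
    return conn.execute(
        "INSERT INTO user_group (user_id, group_name) VALUES (?, ?)",
        (owner_id, group_name),
    ).lastrowid

def add_member(group_id, friend_id):
    """Put a friend into a group."""
    conn.execute(
        "INSERT INTO group_member (group_id, user_id) VALUES (?, ?)",
        (group_id, friend_id),
    )

def groups_containing(owner_id, friend_id):
    """Names of the owner's groups that contain the given friend."""
    rows = conn.execute(
        """
        SELECT g.group_name
        FROM user_group AS g
        JOIN group_member AS m ON m.group_id = g.id
        WHERE g.user_id = ? AND m.user_id = ?
        ORDER BY g.group_name
        """,
        (owner_id, friend_id),
    )
    return [row[0] for row in rows]

JOHN, JEN = 1, 2  # hypothetical user ids
school = create_group(JOHN, "school")
coworker = create_group(JOHN, "coworker")
add_member(school, JEN)
add_member(coworker, JEN)
```

Since group_member refers to a group by its single id, changing a group's owner touches only one row in user_group.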

Views performance in MySQL for denormalization

I am currently writing my very first PHP application and I would like to know how to design and implement MySQL views properly.
In my particular case, user data is spread across several tables (as a consequence of database normalization) and I was thinking of using a view to group the data into one large table:
CREATE VIEW `Users_Merged` (
name,
surname,
email,
phone,
role
) AS (
SELECT name, surname, email, phone, 'Customer'
FROM `Customer`
)
UNION (
SELECT name, surname, email, tel, 'Admin'
FROM `Administrator`
)
UNION (
SELECT name, surname, email, tel, 'Manager'
FROM `manager`
);
This way I can use the view's data from the PHP app easily, but I don't really know how much this can affect performance.
For example:
SELECT * from `Users_Merged` WHERE role = 'Admin';
Is this the right way to filter the view's data, or should I filter BEFORE creating the view itself?
(I need this to have a list of users and the functionality to filter them by role).
EDIT
Specifically, what I'm trying to obtain is denormalization of three tables into one. Is my solution correct?
See Denormalization on wikipedia
In general, the database engine will perform the optimization for you. That means that the engine is going to figure out that the users table needs to be filtered before being joined to the other tables.
So, go ahead and use your view and let the database worry about it.
If you detect poor performance later, use MySQL EXPLAIN to get MySQL to tell you what it's doing.
PS: Your data design allows for only one role per user, is that what you wanted? If so, and if the example query you gave is one you intend to run frequently, make sure to index the role column in users.
If you have <1000 users (which seems likely), it doesn't really matter how you do it. If the user list is unlikely to change for long periods of time, the best you can probably do in terms of performance is to load the user list into memory and not go to the database at all. Even if user data were to change in the meantime, you could update the in-memory structure as well as the database and, again, not have to read user information from the DB.
You would probably be much better off normalizing the Administrators, Users, Managers and what-have-you into one uniform table with a discriminator column "Role" that would save a lot of duplication, which is essentially the reason to do normalization in the first place. You can then add the role specific details to distinct tables that you use with the User table in a join.
Your query could then look as simple as:
SELECT
`Name`, `Surname`, `Email`, `Phone`, `Role`
FROM `User`
WHERE
`User`.`Role` IN('Administrator','Manager','Customer', ...)
Which is also easier for the database to process than a set of unions
If you go a step further you could add a UserRoleCoupling table (instead of the Role column in User) that holds all the roles a User has per user:
CREATE TABLE `UserRoleCoupling` (
UserID INT NOT NULL, -- assuming your User table has an ID column of INT
RoleID INT NOT NULL,
PRIMARY KEY(UserID, RoleID)
);
And put the actual role information into a separate table as well:
CREATE TABLE `Role` (
ID INT NOT NULL UNIQUE AUTO_INCREMENT,
Name VARCHAR(64) NOT NULL,
PRIMARY KEY (Name)
);
Now you can have multiple roles per User and use queries like
SELECT
`U`.`Name`
,`U`.`Surname`
,`U`.`Email`
,`U`.`Phone`
,GROUP_CONCAT(`R`.`Name`) `Roles`
FROM `User` `U`
INNER JOIN `UserRoleCoupling` `URC` ON `URC`.`UserID` = `U`.`ID`
INNER JOIN `Role` `R` ON `R`.`ID` = `URC`.`RoleID`
GROUP BY
`U`.`Name`, `U`.`Surname`, `U`.`Email`, `U`.`Phone`
Which would give you the basic User details and a comma-separated list of all assigned Role names.
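As a quick way to try the User / Role / UserRoleCoupling layout, here is a sketch on Python's sqlite3 (standing in for MySQL; SQLite also provides GROUP_CONCAT). The sample users, the roles_by_user helper and the choice to group by the user's ID are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (
    ID      INTEGER PRIMARY KEY,
    Name    TEXT, Surname TEXT, Email TEXT, Phone TEXT
);
CREATE TABLE Role (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE UserRoleCoupling (
    UserID INTEGER NOT NULL REFERENCES User (ID),
    RoleID INTEGER NOT NULL REFERENCES Role (ID),
    PRIMARY KEY (UserID, RoleID)
);
-- Made-up sample data, for illustration only.
INSERT INTO User VALUES
    (1, 'alice', 'example', 'alice@example.com', '555-0100'),
    (2, 'bob',   'example', 'bob@example.com',   '555-0101');
INSERT INTO Role VALUES (1, 'Administrator'), (2, 'Manager'), (3, 'Customer');
INSERT INTO UserRoleCoupling VALUES (1, 1), (1, 2), (2, 3);
""")

def roles_by_user():
    """Map each user's name to a sorted list of their role names."""
    rows = conn.execute(
        """
        SELECT U.Name, GROUP_CONCAT(R.Name) AS Roles
        FROM User AS U
        INNER JOIN UserRoleCoupling AS URC ON URC.UserID = U.ID
        INNER JOIN Role AS R ON R.ID = URC.RoleID
        GROUP BY U.ID
        """
    )
    # GROUP_CONCAT does not promise an order, so sort after splitting.
    return {name: sorted(roles.split(",")) for name, roles in rows}

def users_with_role(role_name):
    """Names of all users holding the given role, alphabetically."""
    rows = conn.execute(
        """
        SELECT U.Name
        FROM User AS U
        INNER JOIN UserRoleCoupling AS URC ON URC.UserID = U.ID
        INNER JOIN Role AS R ON R.ID = URC.RoleID
        WHERE R.Name = ?
        ORDER BY U.Name
        """,
        (role_name,),
    )
    return [row[0] for row in rows]
```

Filtering users by role then becomes an ordinary join with a WHERE clause, with no UNION view needed.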
In general, the best way to normalize a database structure is to make the tables as generic as possible without being redundant, so don't add administrator or customer specific details to the user table, but use a relationship between User and Administrator to find the specific administrator details. The way you're doing it now isn't really normalized.
I'll see if I can find my favorite book on database normalization and post the ISBN when I have time later.