Views performance in MySQL for denormalization - mysql

I am currently writing my very first PHP application, and I would like to know how to design and implement MySQL views properly.
In my particular case, user data is spread across several tables (as a consequence of database normalization), and I was thinking of using a view to group the data into one large table:
CREATE VIEW `Users_Merged` (
name,
surname,
email,
phone,
role
) AS (
SELECT name, surname, email, phone, 'Customer'
FROM `Customer`
)
UNION (
SELECT name, surname, email, tel, 'Admin'
FROM `Administrator`
)
UNION (
SELECT name, surname, email, tel, 'Manager'
FROM `manager`
);
This way I can use the view's data from the PHP app easily, but I don't really know how much this can affect performance.
For example:
SELECT * from `Users_Merged` WHERE role = 'Admin';
Is this the right way to filter the view's data, or should I filter BEFORE creating the view itself?
(I need this to have a list of users and the functionality to filter them by role).
EDIT
Specifically, what I'm trying to achieve is denormalization of three tables into one. Is my solution correct?
See Denormalization on Wikipedia.

In general, the database engine will perform the optimization for you. That means that the engine is going to figure out that the users table needs to be filtered before being joined to the other tables.
So, go ahead and use your view and let the database worry about it.
If you detect poor performance later, use MySQL EXPLAIN to get MySQL to tell you what it's doing.
PS: Your data design allows only one role per user; is that what you wanted? If so, and if the example query you gave is one you intend to run frequently, make sure to index the role column in users.
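Below is a minimal sketch of the question's view. It runs with Python's built-in sqlite3 module, which stands in for MySQL here (an assumption; the sample rows are made up). It swaps UNION for UNION ALL. The role literal already keeps rows from different source tables apart, so UNION's duplicate elimination only adds work. Filtering then goes through the view exactly as in the question.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL so the sketch runs anywhere.
# Table and column names follow the question; the rows are illustrative.
def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Customer      (name TEXT, surname TEXT, email TEXT, phone TEXT);
        CREATE TABLE Administrator (name TEXT, surname TEXT, email TEXT, tel TEXT);
        CREATE TABLE manager       (name TEXT, surname TEXT, email TEXT, tel TEXT);

        -- UNION ALL skips duplicate elimination; the role literal already
        -- distinguishes rows coming from different source tables.
        CREATE VIEW Users_Merged (name, surname, email, phone, role) AS
            SELECT name, surname, email, phone, 'Customer' FROM Customer
            UNION ALL
            SELECT name, surname, email, tel, 'Admin' FROM Administrator
            UNION ALL
            SELECT name, surname, email, tel, 'Manager' FROM manager;

        INSERT INTO Customer      VALUES ('Ann', 'Lee', 'ann@example.com', '555-0100');
        INSERT INTO Administrator VALUES ('Bob', 'Ray', 'bob@example.com', '555-0101');
        INSERT INTO manager       VALUES ('Cy',  'Doe', 'cy@example.com',  '555-0102');
    """)
    return conn

def users_with_role(conn: sqlite3.Connection, role: str):
    # Filter through the view, as in the question's example query.
    return conn.execute(
        "SELECT name, surname, role FROM Users_Merged WHERE role = ?", (role,)
    ).fetchall()

conn = build_db()
print(users_with_role(conn, "Admin"))  # [('Bob', 'Ray', 'Admin')]
```

In MySQL itself, a view that contains UNION cannot use the MERGE algorithm and is processed through a temporary table. If the tables grow, it is worth running EXPLAIN on the filtered SELECT, as suggested above.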

If you have <1000 users (which seems likely), it doesn't really matter how you do it. If the user list is unlikely to change for long periods of time, the best you can probably do in terms of performance is to load the user list into memory and not go to the database at all. Even if user data were to change in the meantime, you could update the in-memory structure as well as the database and, again, not have to read user information from the DB.
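The in-memory approach could look roughly like the sketch below. It assumes the rows come from one SELECT at startup as (name, surname, email, role) tuples. The class and method names are hypothetical. Writes update the cache alongside the database, so role filters never go back to the DB.

```python
from collections import defaultdict

# Hypothetical in-memory user list, grouped by role.
# Assumption: rows are (name, surname, email, role) tuples loaded once at startup.
class UserCache:
    def __init__(self, rows):
        self._by_role = defaultdict(list)
        for row in rows:
            self._by_role[row[3]].append(row)

    def by_role(self, role):
        # .get() avoids creating empty entries; return a copy so callers
        # cannot mutate the cached list.
        return list(self._by_role.get(role, []))

    def add(self, row):
        # Call this together with the database INSERT to keep both in sync.
        self._by_role[row[3]].append(row)

cache = UserCache([("Ann", "Lee", "ann@example.com", "Customer"),
                   ("Bob", "Ray", "bob@example.com", "Admin")])
cache.add(("Cy", "Doe", "cy@example.com", "Admin"))
print([r[0] for r in cache.by_role("Admin")])  # ['Bob', 'Cy']
```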

You would probably be much better off normalizing the Customers, Administrators, Managers and what-have-you into one uniform table with a discriminator column "Role". That saves a lot of duplication, which is essentially the reason to normalize in the first place. You can then put the role-specific details into separate tables that you join to the User table.
Your query could then look as simple as:
SELECT
`Name`, `Surname`, `Email`, `Phone`, `Role`
FROM `User`
WHERE
`User`.`Role` IN('Administrator','Manager','Customer', ...)
This is also easier for the database to process than a set of UNIONs.
If you go a step further, you could add a UserRoleCoupling table (instead of the Role column in User) that holds every role each user has:
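The single-table design can be sketched as follows, again with sqlite3 standing in for MySQL (an assumption; the sample rows are made up). Role is an ordinary indexed column, and the role filter becomes a plain WHERE ... IN on one table.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL. One User table replaces the
# separate Customer/Administrator/manager tables; Role is the discriminator.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (
        ID      INTEGER PRIMARY KEY,   -- SQLite assigns this automatically
        Name    TEXT NOT NULL,
        Surname TEXT NOT NULL,
        Email   TEXT NOT NULL,
        Phone   TEXT,
        Role    TEXT NOT NULL
    );
    -- Index the discriminator, since filtering by role is the common query.
    CREATE INDEX idx_user_role ON User (Role);
""")
conn.executemany(
    "INSERT INTO User (Name, Surname, Email, Phone, Role) VALUES (?, ?, ?, ?, ?)",
    [("Ann", "Lee", "ann@example.com", "555-0100", "Customer"),
     ("Bob", "Ray", "bob@example.com", "555-0101", "Administrator"),
     ("Cy",  "Doe", "cy@example.com",  "555-0102", "Manager")],
)
rows = conn.execute(
    "SELECT Name, Role FROM User WHERE Role IN ('Administrator', 'Manager') ORDER BY Name"
).fetchall()
print(rows)  # [('Bob', 'Administrator'), ('Cy', 'Manager')]
```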
CREATE TABLE `UserRoleCoupling` (
UserID INT NOT NULL, -- assuming your User table has an ID column of type INT
RoleID INT NOT NULL,
PRIMARY KEY(UserID, RoleID)
);
And put the actual role information into a separate table as well:
CREATE TABLE `Role` (
ID INT NOT NULL UNIQUE AUTO_INCREMENT,
Name VARCHAR(64) NOT NULL,
PRIMARY KEY (Name)
);
Now you can have multiple roles per User and use queries like
SELECT
`U`.`Name`
,`U`.`Surname`
,`U`.`Email`
,`U`.`Phone`
,GROUP_CONCAT(`R`.`Name`) `Roles`
FROM `User` `U`
INNER JOIN `UserRoleCoupling` `UGC` ON `UGC`.`UserID` = `U`.`ID`
INNER JOIN `Role` `R` ON `R`.`ID` = `UGC`.`RoleID`
GROUP BY
`U`.`ID`, `U`.`Name`, `U`.`Surname`, `U`.`Email`, `U`.`Phone`
This gives you the basic user details and a comma-separated list of all assigned role names, one row per user.
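Here is a runnable sketch of the User / Role / UserRoleCoupling design and its GROUP_CONCAT query. It uses sqlite3 as a stand-in for MySQL (an assumption; the data is illustrative). GROUP_CONCAT does not guarantee the order of the concatenated names, so the Python side sorts them before returning.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; names follow the answer's schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (ID INTEGER PRIMARY KEY, Name TEXT, Surname TEXT, Email TEXT, Phone TEXT);
    CREATE TABLE Role (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
    CREATE TABLE UserRoleCoupling (
        UserID INTEGER NOT NULL REFERENCES User (ID),
        RoleID INTEGER NOT NULL REFERENCES Role (ID),
        PRIMARY KEY (UserID, RoleID)   -- a user holds each role at most once
    );
    INSERT INTO User VALUES (1, 'Bob', 'Ray', 'bob@example.com', '555-0101');
    INSERT INTO User VALUES (2, 'Cy',  'Doe', 'cy@example.com',  '555-0102');
    INSERT INTO Role VALUES (1, 'Administrator'), (2, 'Manager');
    INSERT INTO UserRoleCoupling VALUES (1, 1), (1, 2), (2, 2);
""")

def users_with_roles(conn):
    # One row per user with all role names concatenated, as in the answer.
    rows = conn.execute("""
        SELECT U.ID, U.Name, GROUP_CONCAT(R.Name) AS Roles
        FROM User U
        INNER JOIN UserRoleCoupling UGC ON UGC.UserID = U.ID
        INNER JOIN Role R ON R.ID = UGC.RoleID
        GROUP BY U.ID, U.Name
        ORDER BY U.ID
    """).fetchall()
    # Sort because GROUP_CONCAT's ordering is unspecified.
    return [(name, sorted(roles.split(","))) for _id, name, roles in rows]

print(users_with_roles(conn))
# [('Bob', ['Administrator', 'Manager']), ('Cy', ['Manager'])]
```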
In general, the best way to normalize a database structure is to make the tables as generic as possible without being redundant, so don't add administrator or customer specific details to the user table, but use a relationship between User and Administrator to find the specific administrator details. The way you're doing it now isn't really normalized.
I'll see if I can find my favorite book on database normalization and post the ISBN when I have time later.

Related

Mysql Storing multi values in a column

I have tables called userAccounts, userProfiles and usersearches.
Each userAccount may have multiple profiles. Each user may have many searches.
I have the DB set up and working with this. However, each search may involve several user profiles.
I.e., each user account may have a profile for each member of their family.
They then want to search and include all or some of their family members in the search. Ideally I'd have a column in usersearches called profiles that holds a list of the profile IDs included in that search. (But as far as I know, you can't do this in SQL.)
The only way I can think of doing this is to have 10 columns called profile1, profile2 ... profile10, place each profile ID in a column, and put 0 or NULL in the unused ones. (But this is clearly messy.)
Creating columns of the form name1...nameN is a clear violation of the Zero, One or Infinity Rule of database normalization. Arbitrarily having ten of them is not the right approach; that's an assumption that will prove either wildly generous or too constraining most of the time. Since you're using a relational database, try to store your data relationally.
Consider the schema:
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
name VARCHAR(255),
UNIQUE KEY index_on_name (name)
);
CREATE TABLE profiles (
id INT PRIMARY KEY AUTO_INCREMENT NOT NULL,
user_id INT NOT NULL,
name VARCHAR(255),
email VARCHAR(255),
KEY index_on_user_id (user_id)
);
With that you can create zero or more profile records as required. You can also add or remove fields from the profile records without impacting the main user records.
If you ever want to search for all profiles associated with a user:
SELECT ... FROM profiles
LEFT JOIN users ON
users.id=profiles.user_id
WHERE users.name=?
Using a simple JOIN or subquery you can easily exercise this relationship.
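The same idea also covers the searches from the question. A junction table with one row per (search, profile) pair replaces profile1 ... profile10. The sketch below uses sqlite3 as a stand-in for MySQL. The searches and search_profiles tables, their columns, and the data are hypothetical and not part of the answer's schema.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL. "searches" and "search_profiles"
# are hypothetical tables added to the answer's users/profiles schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE profiles (id INTEGER PRIMARY KEY,
                           user_id INTEGER NOT NULL REFERENCES users (id),
                           name TEXT);
    CREATE TABLE searches (id INTEGER PRIMARY KEY,
                           user_id INTEGER NOT NULL REFERENCES users (id),
                           terms TEXT);
    -- Zero, one or many profiles per search: no fixed column count.
    CREATE TABLE search_profiles (
        search_id  INTEGER NOT NULL REFERENCES searches (id),
        profile_id INTEGER NOT NULL REFERENCES profiles (id),
        PRIMARY KEY (search_id, profile_id)
    );
    INSERT INTO users VALUES (1, 'smith');
    INSERT INTO profiles VALUES (1, 1, 'Dad'), (2, 1, 'Mum'), (3, 1, 'Kid');
    INSERT INTO searches VALUES (1, 1, 'holiday');
    INSERT INTO search_profiles VALUES (1, 1), (1, 3);
""")

def profiles_in_search(conn, search_id):
    # Join through the junction table to list the profiles in one search.
    return [name for (name,) in conn.execute("""
        SELECT p.name FROM search_profiles sp
        JOIN profiles p ON p.id = sp.profile_id
        WHERE sp.search_id = ?
        ORDER BY p.id
    """, (search_id,))]

print(profiles_in_search(conn, 1))  # ['Dad', 'Kid']
```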

MySQL column organization

My web application allows a user to define from 1 up to 30 emails (could be anything else).
Which of these options is best?
1) ...store the data inside only one column using a separator, like this:
[COLUMN emails] peter#example.com,mary#example.com,john#example.com
Structure:
emails VARCHAR(1829)
2) ...or save the data using distinct columns, like this:
[COLUMN email1] peter#example.com
[COLUMN email2] mary#example.com
[COLUMN email3] john#example.com
[...]
Structure:
email1 VARCHAR(60)
email2 VARCHAR(60)
email3 VARCHAR(60)
[...]
email30 VARCHAR(60)
Thank you in advance.
It depends on how you are going to use the data and how fixed the limit of 30 is. If it is an advantage to quickly query for the 3rd address, or to filter using WHERE clauses and such, use distinct fields; otherwise it might not be worth the effort of creating the columns.
Having the data in a database still has the advantage of concurrent access by several users.
Number two is the better option, without question. If you do the first one (comma-separated), it negates the advantages of using an RDBMS: you can't run an efficient query on your emails in that case, so it may as well be a flat file.
Number 2 is better than number 1.
However, you should consider another option of getting a normalized structure where you have a separate emails table with a foreign key to your user record. This would allow you to define an index if you wanted to search by email to find a user and place a constraint ensuring no duplicate emails are registered - if you wanted to do that.
Neither one is a very good option.
Option 1 is a poor idea because it makes looking a user up by email a complex, inefficient task. You are effectively required to perform a full text search on the email field in the user record to find one email.
Option 2 is really a WORSE idea, IMO, because it makes any surrounding code a huge pain to write. Suppose, again, that you need to look up all users who have a value X. You now need to enumerate 30 columns and check each one to see if that value exists. Painful!
Storing data in this manner -- 1-or-more of some element of data -- is very common in database design, and as Adam has previously mentioned, is best solved in MOST cases by using a normalized data structure.
A correct table structure, written in MySQL since this was tagged as such, might look like:
Users table:
CREATE TABLE user (
user_id int auto_increment,
...
PRIMARY KEY (user_id)
);
Emails table:
CREATE TABLE user_email (
user_id int,
email char(60) not null default '',
FOREIGN KEY (user_id) REFERENCES user (user_id) ON DELETE CASCADE
);
The FOREIGN KEY clause is optional. The design will work without it, but that line makes the database enforce the relationship (in MySQL this requires the InnoDB storage engine; MyISAM parses foreign key definitions but ignores them). For example, if you attempt to insert a record into user_email with a user_id of 10, there MUST be a corresponding user record with a user_id of 10, or the query will fail. The ON DELETE CASCADE tells the database that if you delete a record from the user table, all user_email records associated with it will also be deleted (you may or may not want this behavior).
This design of course also means that you need to perform a join when you retrieve a user record. A query like this:
SELECT user.user_id, user_email.email FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id WHERE <your where clause>;
It will return one row for EACH user_email address stored in the system (plus one row with a NULL email for any user who has none, because of the LEFT JOIN). If you have 5 users and each user has 5 email addresses, the above query will return 25 rows.
Depending on your application, you may want to get one row per user but still have access to all the emails. In that case you might try an aggregate function like GROUP_CONCAT which will return a single row per user, with a comma-delimited list of emails belonging to that user:
SELECT user.user_id, GROUP_CONCAT(user_email.email) AS user_emails FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id WHERE <your where clause> GROUP BY user.user_id;
Again, depending on your application, you may want to add an index to the email column.
Finally, there ARE some situations where you do not want a normalized database design, and a single-column design with delimited text might be more appropriate, although those situations are few and far between. For most normal applications, this type of normalized design is the way to go and will help it perform and scale better.
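The normalized user / user_email design can be checked end to end in a small sketch. It uses sqlite3 as a stand-in for MySQL (an assumption; the addresses are generated test data). It reproduces the 25-row join, the one-row-per-user GROUP_CONCAT, the reverse lookup by email that options 1 and 2 make awkward, and the ON DELETE CASCADE behavior.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; table names follow the answer.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY);
    CREATE TABLE user_email (
        user_id INTEGER NOT NULL,
        email   VARCHAR(60) NOT NULL,
        FOREIGN KEY (user_id) REFERENCES user (user_id) ON DELETE CASCADE
    );
    CREATE INDEX idx_user_email_email ON user_email (email);
""")
# 5 users with 5 addresses each.
for uid in range(1, 6):
    conn.execute("INSERT INTO user (user_id) VALUES (?)", (uid,))
    conn.executemany(
        "INSERT INTO user_email (user_id, email) VALUES (?, ?)",
        [(uid, f"u{uid}-{n}@example.com") for n in range(1, 6)],
    )

joined = conn.execute(
    "SELECT user.user_id, user_email.email FROM user "
    "LEFT JOIN user_email ON user.user_id = user_email.user_id"
).fetchall()
print(len(joined))  # 25: one row per stored address

grouped = conn.execute(
    "SELECT user.user_id, GROUP_CONCAT(user_email.email) FROM user "
    "LEFT JOIN user_email ON user.user_id = user_email.user_id GROUP BY user.user_id"
).fetchall()
print(len(grouped))  # 5: one row per user

# Find the owner of one address using the index on email.
owner = conn.execute(
    "SELECT user_id FROM user_email WHERE email = ?", ("u3-2@example.com",)
).fetchone()
print(owner)  # (3,)

conn.execute("DELETE FROM user WHERE user_id = 1")  # cascades to user_email
remaining = conn.execute("SELECT COUNT(*) FROM user_email").fetchone()[0]
print(remaining)  # 20
```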

How to create a "following" table?

I am designing a simple Twitter-like site (as a study project), with one difference: users can follow other users, keywords and lists. I want to know how to create a following table to store information about who follows what.
Is the approach below correct?
Following Table:
id (id of the following table)
type (type can be 1 (user), 2 (keyword) or 3 (list))
idtype (id of the row in the table for that type)
user (user's id)
However, there isn't a keyword table, so I'm not sure.
What is the best approach ?
It's incorrect because you can't create a foreign key from idtype to the parent table, since the "parent table" changes depending on type. BTW, if a user can follow multiple keywords, you won't escape having a separate table for that (unless you want to break 1NF by "packing" several values into the same field, which is a really bad idea).
There are a couple of ways to resolve this. Probably the simplest is to use a separate id field for each of the possible parent tables, and then constrain them so that exactly one of them is non-NULL.
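The "one id column per parent table" approach can be sketched with a CHECK constraint that counts the non-NULL target columns. sqlite3 stands in for MySQL here (an assumption). MySQL enforces CHECK constraints only from 8.0.16 onward. The column and function names are hypothetical. In a real schema, each followed_*_id column would also get a foreign key to its own parent table.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE following (
        id                  INTEGER PRIMARY KEY,
        user_id             INTEGER NOT NULL,   -- the follower
        followed_user_id    INTEGER,
        followed_keyword_id INTEGER,
        followed_list_id    INTEGER,
        -- each IS NOT NULL test yields 0 or 1; exactly one target allowed
        CHECK ((followed_user_id    IS NOT NULL)
             + (followed_keyword_id IS NOT NULL)
             + (followed_list_id    IS NOT NULL) = 1)
    );
""")

def follow(conn, user_id, *, user=None, keyword=None, lst=None):
    conn.execute(
        "INSERT INTO following (user_id, followed_user_id, followed_keyword_id, followed_list_id) "
        "VALUES (?, ?, ?, ?)",
        (user_id, user, keyword, lst),
    )

follow(conn, 1, keyword=7)          # accepted: exactly one target
try:
    follow(conn, 1, user=2, lst=3)  # rejected: two targets
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```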
However, since InnoDB tables are clustered and secondary indexes in clustered tables are expensive, I'd rather go with something like this (tweets table not shown):
This will enable you to very efficiently answer the query: "which users follow the given user (or keyword or list)". If you need to answer: "which users (or keywords or lists) the given user follows", reverse the order of fields in the PKs shown above. If you need both, then you'd need indexes in both directions (and pay the clustering price).
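The schema this answer refers to is not included in this copy. The sketch below illustrates the idea it describes and is not a reconstruction of that schema. It assumes one junction table per followed type, each with a composite primary key that starts with the followed entity, so "who follows X?" is a range scan on the clustered key. SQLite's WITHOUT ROWID tables are clustered on the primary key, roughly like InnoDB tables. All table and column names are hypothetical.

```python
import sqlite3

# Hypothetical per-type junction tables; WITHOUT ROWID makes SQLite store
# rows ordered by the primary key, similar to InnoDB's clustered index.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_follows_user (
        followed_user_id INTEGER NOT NULL,
        follower_id      INTEGER NOT NULL,
        PRIMARY KEY (followed_user_id, follower_id)  -- "who follows user X?"
    ) WITHOUT ROWID;
    CREATE TABLE user_follows_keyword (
        keyword_id  INTEGER NOT NULL,
        follower_id INTEGER NOT NULL,
        PRIMARY KEY (keyword_id, follower_id)
    ) WITHOUT ROWID;
    CREATE TABLE user_follows_list (
        list_id     INTEGER NOT NULL,
        follower_id INTEGER NOT NULL,
        PRIMARY KEY (list_id, follower_id)
    ) WITHOUT ROWID;
    INSERT INTO user_follows_user VALUES (10, 1), (10, 2), (11, 1);
    INSERT INTO user_follows_keyword VALUES (7, 2);
""")

def followers_of_user(conn, user_id):
    # Uses the leading primary-key column, so no secondary index is needed.
    return [f for (f,) in conn.execute(
        "SELECT follower_id FROM user_follows_user "
        "WHERE followed_user_id = ? ORDER BY follower_id",
        (user_id,),
    )]

print(followers_of_user(conn, 10))  # [1, 2]
```

Answering "what does user X follow?" efficiently would need the reverse key order (follower_id first), as the answer notes.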

Database design - how to implement user group table?

I want to build a user group system that imitates how groups work in instant messengers.
Each user can create as many groups as they want, but they cannot have two groups with the same name, and they can put as many friends as they want into any group.
For example, John's friend Jen can be in 'school' group of John and 'coworker' group of John at the same time. And, it is totally independent from how Jen puts John into her group.
I'm considering two possible ways to implement this as a user_group table in the database.
1.
user_group (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
group_name VARCHAR(30),
UNIQUE KEY (user_id, group_name)
)
In this case, every group, whoever owns it, has a unique id, so the id alone identifies both the owning user and the group name.
2.
user_group (
user_id INT,
group_id INT AUTO_INCREMENT,
group_name VARCHAR(30),
PRIMARY KEY (user_id, group_id),
UNIQUE KEY (user_id, group_name)
)
In this case, group_id is numbered separately for each user (starting from 1), so many groups can share the same group_id, but the PK pair (user_id, group_id) is unique in the table.
Which is the better implementation, and why?
What are the advantages and drawbacks of each?
EDIT:
Added AUTO_INCREMENT to group_id in the second scenario to ensure it is auto-assigned, numbered from 1 for each user_id.
EDIT:
'Better' means:
- better performance for SELECT/INSERT/UPDATE of friends in a group, since those will be the most frequently used operations on user groups.
- robustness: which one holds up better as the number of users grows.
- popularity or general preference of either one over another.
- flexibility
- extensibility
- usability - easier to use.
Personally, I would go with the 1st approach, but it really depends on how your application is going to work. If it ever becomes possible to change the ownership of a group, or to merge user profiles, this will be much easier to do with your 1st approach than with the 2nd. In the 2nd approach, if either of those situations ever happens, you would have to update not only your user_group table but also every dependent table that has a foreign key relation to user_group. This will also be a many-to-many relationship (there will be multiple users in a group, and a user will be a member of multiple groups), so it will require a separate joining table. In the 1st approach, this is fairly straightforward:
group_member (
group_id int,
user_id int
)
For your 2nd approach, it would require a 3rd column. That is not only more confusing, since you're now including a user id twice, but also needs up to 50% more storage per membership row (three INT columns instead of two, before per-row overhead). This may or may not be an issue depending on how large you expect your database to be:
group_member (
owner_id int,
group_id int,
user_id int
)
Also, this per-user AUTO_INCREMENT behavior only works for MyISAM tables in MySQL (InnoDB does not support it), and if you ever plan to move from MySQL to another database platform, it may not be supported there either. In MS SQL Server, for example, an identity column (SQL Server's counterpart to auto_increment) is always incremented across the whole table, not numbered per value of the other key columns, so to get the same functionality you would have to implement it yourself.
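The 1st approach with its group_member joining table can be sketched as follows. sqlite3 stands in for MySQL (an assumption; the users and groups mirror the question's John/Jen example). It shows the globally unique group id, the per-owner uniqueness of group names, and Jen belonging to two of John's groups at once.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; data mirrors the question's example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_group (
        id         INTEGER PRIMARY KEY,            -- globally unique group id
        user_id    INTEGER NOT NULL REFERENCES user (id),
        group_name VARCHAR(30) NOT NULL,
        UNIQUE (user_id, group_name)               -- no duplicate names per owner
    );
    CREATE TABLE group_member (
        group_id INTEGER NOT NULL REFERENCES user_group (id),
        user_id  INTEGER NOT NULL REFERENCES user (id),
        PRIMARY KEY (group_id, user_id)
    );
    INSERT INTO user VALUES (1, 'John'), (2, 'Jen');
    INSERT INTO user_group (user_id, group_name) VALUES (1, 'school'), (1, 'coworker');
    INSERT INTO group_member VALUES (1, 2), (2, 2);  -- Jen is in both of John's groups
""")

def groups_of_owner_containing(conn, owner_id, member_id):
    # Which of owner_id's groups contain member_id?
    return [n for (n,) in conn.execute("""
        SELECT g.group_name FROM user_group g
        JOIN group_member m ON m.group_id = g.id
        WHERE g.user_id = ? AND m.user_id = ?
        ORDER BY g.group_name
    """, (owner_id, member_id))]

print(groups_of_owner_containing(conn, 1, 2))  # ['coworker', 'school']
```

With this layout, changing a group's owner is a single UPDATE of user_group.user_id. The membership rows keep pointing at the same group id.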
Please define "better".
From my gut, I would pick the 2nd one.
The searchable pieces are broken down more, but that wouldn't be what I'd pick if insert/update performance is a concern.
I see no possible benefit to number 2 at all: it is more complex, more fragile (it would not work at all in SQL Server) and gains nothing. Remember that groupId has no meaning except to identify a record uniquely; the user will most likely only see the group name, not the id. So it doesn't matter whether the ids restart for each user, or whether there are gaps because a group insert was rolled back or a group was deleted.

database design: what fields are must for a user table in database?

I am trying to design a user table for MySQL.
For now, my user table looks like this:
users (
BIGINT id,
VARCHAR(?) username,
VARCHAR(?) password,
VARCHAR(254) email,
DATETIME last_login,
DATETIME data_created
)
What other fields should I also include, and why do I need them?
What fields should I exclude from the above, and why?
How many characters should I allocate for username and password, and why?
Should I use BIGINT for id?
Thank you in advance for your help.
ADDED
I am going to use the table for a social website, so 'users' means people around the world.
A few comments:
BIGINT is fine. I assume you're using it as a surrogate key. In that case, declare it as
id BIGINT AUTO_INCREMENT PRIMARY KEY,
MySQL will automatically allocate a unique integer value to your id whenever you do an insert (don't specify any value for this field). Never try to implement this behaviour by selecting the max id and adding 1 to it (I've seen this so many times): two concurrent inserts can read the same max and end up with the same id.
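The sketch below shows letting the database assign the id and then reading it back. It uses sqlite3, whose INTEGER PRIMARY KEY AUTOINCREMENT plays the role of MySQL's AUTO_INCREMENT here (an assumption for portability). The helper name is hypothetical. In MySQL the same value is available from LAST_INSERT_ID(), or in PHP from PDO::lastInsertId() or mysqli's insert_id.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; AUTOINCREMENT ~ AUTO_INCREMENT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "username TEXT NOT NULL UNIQUE)"
)

def create_user(conn, username):
    # Omit id so the database assigns it atomically; never compute MAX(id) + 1.
    cur = conn.execute("INSERT INTO user (username) VALUES (?)", (username,))
    return cur.lastrowid  # the id the database just assigned

first = create_user(conn, "alice")
second = create_user(conn, "bob")
print(first, second)  # 1 2
```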
Length of username and password: this is no big deal really, just pick a length. I tend to standardise on 255 length varchars for these things but that's not based on anything empirical.
Call your table "user", not "users". Conceptually, a table implements an entity in the same way that a class implements an entity. You will likely create a class or data structure called "user" and the table name should correspond to this.
Every table that I create has two timestamps, "created" and "last_updated". You're halfway there :)
I don't think I would store last_login in the user table, this is likely to be something that you will want to log in a separate table. It's a good idea to store all login events (login, logout, failed attempt, account lock etc.) in a logging table. This will give you much better visibility of what the system has been doing.
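A minimal sketch of such a logging table follows, with sqlite3 standing in for MySQL (an assumption). The login_event table, its event types and the sample timestamps are hypothetical. With this table, "last login" becomes a query over the log rather than a user column that is rewritten on every login.

```python
import sqlite3

# Assumption: sqlite3 stands in for MySQL; login_event and its values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT NOT NULL UNIQUE);
    CREATE TABLE login_event (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL REFERENCES user (id),
        event_type TEXT NOT NULL
                   CHECK (event_type IN ('login', 'logout', 'failed', 'locked')),
        created    TEXT NOT NULL          -- ISO-8601 text sorts chronologically
    );
    CREATE INDEX idx_login_event_user ON login_event (user_id, event_type, created);
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO login_event (user_id, event_type, created) VALUES
        (1, 'failed', '2024-01-01 09:00:00'),
        (1, 'login',  '2024-01-01 09:01:00'),
        (1, 'logout', '2024-01-01 10:00:00'),
        (1, 'login',  '2024-01-02 08:30:00');
""")

def last_login(conn, user_id):
    # MAX over no rows yields NULL, i.e. None for users who never logged in.
    (value,) = conn.execute(
        "SELECT MAX(created) FROM login_event WHERE user_id = ? AND event_type = 'login'",
        (user_id,),
    ).fetchone()
    return value

print(last_login(conn, 1))  # 2024-01-02 08:30:00
```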
1/ Username and password: decide for yourself how large you want these to be.
2/ BIGINT is fine, even though an INT probably suffices. But make it UNSIGNED, and probably AUTO_INCREMENT too.
3/ Try to keep your users table as small as possible:
users (
BIGINT id,
VARCHAR(?) username,
VARCHAR(?) password,
VARCHAR(254) email,
DATETIME data_created
)
The rest, you put in extra tables:
logins (
BIGINT loginid,
BIGINT userid,
DATETIME last_login,
VARCHAR(45) IP_ADRESS, -- 45 characters also fits textual IPv6 addresses
...
)
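This way, your users table only changes when a user is added or deleted, or when someone changes their password, which happens less often than logging in. This allows for better caching: MySQL's query cache (where available; it was removed in MySQL 8.0) invalidates every cached result for a table whenever that table is written to.
All that just depends on your own specs. For username you could take 100 if you like; for password, take the length of the output of the hashing function you want to use (32 hex characters for MD5). Note, however, that a fast hash such as MD5 is not suitable for storing passwords. PHP's password_hash() with the default bcrypt algorithm produces a 60-character string, and the PHP manual recommends a 255-character column so the stored hash can grow if the algorithm changes.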
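Sizing the password column from the hash output can be illustrated in Python. Hex digests have a fixed length, so the column width follows from the algorithm. The sketch swaps in salted PBKDF2 (hashlib.pbkdf2_hmac) instead of plain MD5. The "$"-separated storage format, the function names and the default iteration count are assumptions for illustration, not a standard.

```python
import hashlib
import hmac
import os

# Fixed-length hex digests: the column width equals the digest length.
print(len(hashlib.md5(b"secret").hexdigest()))     # 32
print(len(hashlib.sha256(b"secret").hexdigest()))  # 64

def hash_password(password: str, iterations: int = 600_000) -> str:
    # Hypothetical storage format: "<iterations>$<salt hex>$<key hex>".
    salt = os.urandom(16)                            # 32 hex characters
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"{iterations}${salt.hex()}${key.hex()}"  # key: 64 hex characters

def verify_password(password: str, stored: str) -> bool:
    iterations, salt_hex, key_hex = stored.split("$")
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison of the hex strings.
    return hmac.compare_digest(key.hex(), key_hex)

stored = hash_password("correct horse")
print(len(stored))  # 104 with the default iteration count
```

Because the length depends on the format and the iteration count, a generous VARCHAR(255), as the PHP manual suggests for password_hash(), avoids a schema change later.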
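It's more common to use INTEGER(10) with AUTO_INCREMENT on the primary key of a table. (The 10 is only a display width; it changes neither the range nor the storage size, and MySQL 8.0.17 deprecated display widths for integer types.)
You might want to ask for a name, surname, birth date, place of residence, etc. Keep in mind that every piece of data you ask the user for should be of some real use to the platform you are building.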