I want to build a user group system that imitates the group policy in instant messengers.
Each user can create as many groups as they want, but they cannot have groups with duplicate names, and they can put as many friends as they want into any group.
For example, John's friend Jen can be in John's 'school' group and John's 'coworker' group at the same time. This is totally independent of how Jen puts John into her own groups.
I'm thinking of two possible ways to implement this in the database's user_group table.
1.
user_group (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
group_name VARCHAR(30),
UNIQUE KEY (user_id, group_name)
)
In this case, every group owned by any user has a unique id, so id alone identifies both the owning user and the group name.
2.
user_group (
user_id INT,
group_id INT AUTO_INCREMENT,
group_name VARCHAR(30),
PRIMARY KEY (user_id, group_id),
UNIQUE KEY (user_id, group_name)
)
In this case, group_id always starts from 0 for each user, so many groups could share the same group_id. But the PK pair (user_id, group_id) is unique in the table.
Which way is the better implementation, and why?
What are the advantages and drawbacks of each?
EDIT:
Added AUTO_INCREMENT to group_id in the second scenario to ensure it is auto-assigned from 0 for each user_id.
EDIT:
'better' means...
- better performance when SELECTing/INSERTing/UPDATEing friends in a group, since those will be the most-used operations on user groups.
- robustness of the database, i.e. which one is safer as the number of users grows.
- popularity or general preference of either one over another.
- flexibility
- extensibility
- usability - easier to use.
Personally, I would go with the 1st approach, but it really depends on how your application is going to work. If it would ever be possible for ownership of a group to change, or for user profiles to be merged, this will be much easier to do in your 1st approach than in the 2nd. In the 2nd approach, if either of those situations ever happens, you would have to update not only your user_group table, but also any dependent tables that have a foreign key relation to user_group.
This will also be a many-to-many relation (there will be multiple users in a group, and a user will be a member of multiple groups), so it will require a separate joining table. In the 1st approach, this is fairly straightforward:
group_member (
group_id int,
user_id int
)
For your 2nd approach, it would require a 3rd column, which will not only be more confusing since you're now including user_id twice, but also require 33% additional storage space (this may or may not be an issue depending on how large you expect your database to be):
group_member (
owner_id int,
group_id int,
user_id int
)
Also, if you ever plan to move from MySQL to another database platform, this behavior of auto_increment may not be supported. I know in MS SQL Server, an auto_increment field (identity in MSSQL) will always be incremented, not made unique according to indexes on the table, so to get the same functionality you would have to implement it yourself.
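To make the 1st approach concrete, here is a minimal sketch. It assumes SQLite through Python's sqlite3 module rather than MySQL, purely so it runs standalone, and the user ids and the create_group helper are hypothetical. It shows the duplicate-name rule, Jen sitting in two of John's groups, and an ownership change that touches only user_group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE user_group (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,  -- unique across all owners
    user_id    INTEGER NOT NULL,                   -- owner of the group
    group_name VARCHAR(30) NOT NULL,
    UNIQUE (user_id, group_name)                   -- no duplicate names per owner
);
CREATE TABLE group_member (
    group_id INTEGER NOT NULL REFERENCES user_group (id),
    user_id  INTEGER NOT NULL,                     -- the friend placed in the group
    PRIMARY KEY (group_id, user_id)
);
""")

JOHN, JEN, MARY = 1, 2, 3  # hypothetical user ids


def create_group(owner_id, name):
    """Insert a group and return its surrogate id."""
    cur = conn.execute(
        "INSERT INTO user_group (user_id, group_name) VALUES (?, ?)",
        (owner_id, name),
    )
    return cur.lastrowid


school = create_group(JOHN, "school")
coworker = create_group(JOHN, "coworker")
conn.executemany(
    "INSERT INTO group_member (group_id, user_id) VALUES (?, ?)",
    [(school, JEN), (coworker, JEN)],
)

# A second 'school' group for the same owner violates the UNIQUE key.
try:
    create_group(JOHN, "school")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Handing 'school' to another owner updates one row; group_member is untouched.
conn.execute("UPDATE user_group SET user_id = ? WHERE id = ?", (MARY, school))

jen_groups = [name for (name,) in conn.execute(
    "SELECT g.group_name FROM group_member m "
    "JOIN user_group g ON g.id = m.group_id "
    "WHERE m.user_id = ? ORDER BY g.group_name",
    (JEN,),
)]
print(duplicate_rejected, jen_groups)
```

With the 2nd approach, the same ownership change would also have to rewrite owner_id in every group_member row for that group.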
Please define "better".
From my gut, I would pick the 2nd one.
The searchable pieces are broken down more, but that wouldn't be what I'd pick if insert/update performance is a concern.
I see no possible benefit to number 2 at all. It is more complex, more fragile (it would not work at all in SQL Server) and gains nothing. Remember, the groupId has no meaning except to identify a record uniquely; the user will likely only see the group name, not the id. So it doesn't matter whether the ids all start from 0 or whether there are gaps because a group was rolled back or deleted.
We are making a mobile application with some friends, but we are having problems regarding the structure of the database due to Unknown. I think it is a good question that can help many people, and it would be nice if people with knowledge could explain it well. The app provides various services (more can be added in the future) to customers. They log in and have access to our services. At first we thought of a single table containing columns with all the customer data plus the services. Then we saw that it was more effective to make a separate table called "services" that identifies the user by an id. The problem now is with this table. We do not know whether to make a single column holding all services (as an array) or to make one column per service. I took a photo so that what I am proposing can be seen more easily.
The question is which of these options (obviously there may be a third that we have not considered) is the best in terms of performance.
I see several defects in the second option, but I'm not sure. In terms of latency and speed, the cost of traversing an array (more so as services are added, or when they are out of order because the user first hired service2 and then service1) is much higher than in option 1. In addition, when a user cancels a service, that implies going through the entire array, finding the service and removing it. I don't know; you are the experts, what do you recommend? All of this will be hosted in the cloud (Azure), so all requests will go to the cloud.
Option 2 is better than option 1. But, with respect, it's still not good.
Never never store comma-separated lists of things in columns of data. If you do you'll be sorry. (They're very costly to search.)
You want something like this. Three tables, one for users, another for services, and a so-called JOIN table to establish a many-to-many relationship between the two.
+-----------+      +-------------+      +-----------+
|user       |      |user_service |      |service    |
+-----------+      +-------------+      +-----------+
|user_id    +----->|user_id      |<-----+service_id |
|givenname  |      |service_id   |      |name       |
|surname    |      +-------------+      +-----------+
|is_active  |
+-----------+
Each row in user_service means a user is authorized to use that service. To authorize a user, INSERT a row. To revoke authorization, DELETE the row.
To find out whether a user can use a service, use this query.
SELECT user.user_id
FROM user
JOIN user_service USING (user_id)
JOIN service USING (service_id)
WHERE user.givenname = 'Bill' AND user.surname='Gates'
AND service.name = 'CharityNavigator'
AND user.is_active > 0;
If your query returns the user_id then the chosen user may use the chosen service.
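As a rough illustration of the authorize / check / revoke cycle described above, here is a sketch that assumes SQLite via Python's sqlite3 (not MySQL) so it can run on its own. The may_use helper and the sample ids are hypothetical, and the joins use ON rather than USING only to keep the SQL portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (
    user_id   INTEGER PRIMARY KEY,
    givenname TEXT,
    surname   TEXT,
    is_active INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE service (service_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_service (
    user_id    INTEGER NOT NULL,
    service_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, service_id)
);
INSERT INTO user (user_id, givenname, surname) VALUES (1, 'Bill', 'Gates');
INSERT INTO service (service_id, name) VALUES (1, 'CharityNavigator');
""")


def may_use(givenname, surname, service_name):
    """Return True if an active user with that name is authorized for the service."""
    row = conn.execute(
        """
        SELECT user.user_id
        FROM user
        JOIN user_service ON user_service.user_id = user.user_id
        JOIN service ON service.service_id = user_service.service_id
        WHERE user.givenname = ? AND user.surname = ?
          AND service.name = ?
          AND user.is_active > 0
        """,
        (givenname, surname, service_name),
    ).fetchone()
    return row is not None


before = may_use("Bill", "Gates", "CharityNavigator")

# Authorize: one INSERT into the join table.
conn.execute("INSERT INTO user_service (user_id, service_id) VALUES (1, 1)")
granted = may_use("Bill", "Gates", "CharityNavigator")

# Revoke: one DELETE from the join table.
conn.execute("DELETE FROM user_service WHERE user_id = 1 AND service_id = 1")
revoked = may_use("Bill", "Gates", "CharityNavigator")

print(before, granted, revoked)
```

Neither step touches the user or service tables, which is what keeps adding services later cheap.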
To get a list of the services for each user, use this query.
SELECT user.user_id, user.givenname, user.surname,
GROUP_CONCAT(service.name) service_names
FROM user
JOIN user_service USING (user_id)
JOIN service USING (service_id)
WHERE user.is_active > 0
GROUP BY user.user_id
Some explanation:
It's almost always best to build tables with rows for things like your services in them, rather than columns or comma-separated lists in columns. Why?
You can add new services -- as many as you want -- years from now without reworking your database code.
DBMSs, including MySQL, work well with JOIN operations.
Doing WHERE commalist_column SOMEHOW_CONTAINS (some_id) is disgustingly inefficient in most relational database management systems. Doing WHERE column = some_id is far more efficient because it can use an index.
Rows with fewer columns, in general, work better than rows with more columns.
It's far cheaper in production to add rows to databases than it is to add columns. Adding columns means altering table definitions. That operation can require downtime.
When you use columns for things like your services, you're creating a closed system. When you use rows, your system is open-ended.
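The point about indexes can be checked with a query planner. The sketch below is only an assumption-laden illustration: it uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 instead of MySQL's EXPLAIN, the user_csv table is a hypothetical stand-in for the comma-separated design, and the exact plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Comma-separated design: service ids packed into one text column.
CREATE TABLE user_csv (user_id INTEGER PRIMARY KEY, services TEXT);

-- Join-table design: one row per (user, service) pair.
CREATE TABLE user_service (
    user_id    INTEGER NOT NULL,
    service_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, service_id)
);
CREATE INDEX reverse_index ON user_service (service_id, user_id);
""")


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


# "Does the list contain 3?" has to inspect the text of every row.
csv_plan = plan(
    "SELECT user_id FROM user_csv WHERE ',' || services || ',' LIKE '%,3,%'"
)

# An equality test on the leading column of reverse_index can use that index.
join_plan = plan("SELECT user_id FROM user_service WHERE service_id = 3")

print(csv_plan)
print(join_plan)
```

On the SQLite versions I am aware of, the first plan reports a full table scan and the second an index search. MySQL words its EXPLAIN output differently, but the same distinction applies.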
May I suggest you read about database normalization? Don't be intimidated by all the CS jargon. Just look at some examples of how to normalize various databases.
And maybe read about entity-relationship database modeling?
Edit: On the advice of a commenter, I suggest you make the primary key of your user_service table contain both columns (user_id, service_id). I also suggest you add a reverse index on both columns (service_id, user_id) so your queries can look things up quickly starting from service as well as from user. Your table definitions might look something like this:
CREATE TABLE user (
user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
givenname VARCHAR(50) NULL DEFAULT NULL,
surname VARCHAR(50) NULL DEFAULT NULL,
is_active TINYINT NOT NULL DEFAULT '1',
PRIMARY KEY (user_id)
)
COLLATE='utf8mb4_general_ci';
CREATE TABLE service (
service_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
name VARCHAR(50) NULL DEFAULT NULL,
PRIMARY KEY (service_id)
)
COLLATE='utf8mb4_general_ci';
CREATE TABLE user_service (
user_id INT UNSIGNED NOT NULL,
service_id INT UNSIGNED NOT NULL,
PRIMARY KEY (user_id, service_id),
INDEX reverse_index (service_id, user_id),
CONSTRAINT FK_service
FOREIGN KEY (service_id)
REFERENCES service (service_id)
ON UPDATE RESTRICT ON DELETE RESTRICT,
CONSTRAINT FK_user
FOREIGN KEY (user_id)
REFERENCES user (user_id)
ON UPDATE RESTRICT ON DELETE RESTRICT
);
With this primary key if you attempt to INSERT a duplicate authorization for a user for a service, the dbms rejects it.
Be sure to use the same INT UNSIGNED NOT NULL data type for user_id and service_id in those tables.
This is a very common database design pattern: it's the canonical way of creating a many-to-many relationship between rows of two different tables.
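Here is a small sketch of what those constraints buy you. It is written against SQLite via Python's sqlite3 (an assumption made so the example runs standalone; MySQL's error classes differ), and the is_rejected helper and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE user    (user_id    INTEGER PRIMARY KEY, givenname TEXT);
CREATE TABLE service (service_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_service (
    user_id    INTEGER NOT NULL REFERENCES user (user_id) ON DELETE RESTRICT,
    service_id INTEGER NOT NULL REFERENCES service (service_id) ON DELETE RESTRICT,
    PRIMARY KEY (user_id, service_id)
);
CREATE INDEX reverse_index ON user_service (service_id, user_id);
INSERT INTO user VALUES (1, 'Bill');
INSERT INTO service VALUES (1, 'CharityNavigator');
INSERT INTO user_service VALUES (1, 1);
""")


def is_rejected(sql):
    """Run a statement and report whether a constraint refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same (user_id, service_id) pair again: refused by the composite primary key.
duplicate_rejected = is_rejected("INSERT INTO user_service VALUES (1, 1)")

# Authorization for a service that does not exist: refused by the foreign key.
orphan_rejected = is_rejected("INSERT INTO user_service VALUES (1, 99)")

# Deleting a service that is still referenced: refused by ON DELETE RESTRICT.
delete_rejected = is_rejected("DELETE FROM service WHERE service_id = 1")

row_count = conn.execute("SELECT COUNT(*) FROM user_service").fetchone()[0]
print(duplicate_rejected, orphan_rejected, delete_rejected, row_count)
```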
A 3rd way (most frugal on space)
See the SET datatype. It allows for saying which combination of those 6 servs apply.
INT UNSIGNED (of a suitable size) is another way to have a "set".
SET or TINYINT takes only 1 byte to represent up to 8 items.
Your 6 column choice takes 6 bytes.
The "{serv1,... }" might be a VARCHAR, averaging 10-20 bytes.
So, my suggestions are clearly aimed at saving space. But maybe that is not important? Do you have millions of rows? Do you have more than 64 "servs"? (There is a limit of 64 on SET and BIGINT UNSIGNED.)
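For the "integer as a set" variant, a minimal sketch might look like the following. It assumes SQLite via Python's sqlite3 with a plain INTEGER column standing in for MySQL's SET or TINYINT UNSIGNED, and the serv1..serv6 names, the bit assignments and the helper functions are all hypothetical.

```python
import sqlite3

# Hypothetical mapping: serv1..serv6 occupy bits 0..5 of one small integer.
SERVICE_BITS = {f"serv{n}": 1 << (n - 1) for n in range(1, 7)}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_services ("
    " user_id INTEGER PRIMARY KEY,"
    " services INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO user_services (user_id) VALUES (1)")


def grant(user_id, service):
    """Turn the service's bit on."""
    conn.execute(
        "UPDATE user_services SET services = services | ? WHERE user_id = ?",
        (SERVICE_BITS[service], user_id),
    )


def revoke(user_id, service):
    """Turn the service's bit off."""
    conn.execute(
        "UPDATE user_services SET services = services & ~? WHERE user_id = ?",
        (SERVICE_BITS[service], user_id),
    )


def services_of(user_id):
    """Split the stored bits back into service names."""
    (mask,) = conn.execute(
        "SELECT services FROM user_services WHERE user_id = ?", (user_id,)
    ).fetchone()
    return [name for name, bit in SERVICE_BITS.items() if mask & bit]


grant(1, "serv1")
grant(1, "serv3")
revoke(1, "serv1")
remaining = services_of(1)

# Finding everyone who has a given service means testing the bit on each row.
users_with_serv3 = [uid for (uid,) in conn.execute(
    "SELECT user_id FROM user_services WHERE (services & ?) != 0",
    (SERVICE_BITS["serv3"],),
)]
print(remaining, users_with_serv3)
```

The space saving is real, but the bit test in the last query is not something an ordinary index on the column can answer, so the trade-off is space against searchability.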
But Which?
Is the question about coding? Well, any method is going to take some effort to split the bits/columns/string apart to build the buttons on the screen. Probably a similar amount of effort and probably less than the effort to build the screen. Ditto for performance.
I highly recommend you pick two solutions and implement both. You will discover
How similar they are in performance, amount of code, etc.
How insignificant the question is.
How much extra stuff you have learned about databases.
How easy it is to "try" and "throw away" another way to do something.
How the latency, performance, etc, differences are insignificant. (This is what we are really answering for you.)
The bigger picture
You have pointed out one use for this data structure. I worry that there are, or will be, other uses for this data structure. And that something else is the real determinant of which approach is best. (At that point, you can happily resurrect the thrown away version!)
A 4th way
JSON. But it would be more verbose (take more space) than your VARCHAR way. It may or may not be easier to work with -- this depends on the rest of the requirements.
I'm trying to model a simple poll system, I have 4 tables
Election
id, title, description
Candidate
id, electionId, name
User
id, (other user details)...
Vote
userId, candidateId
There is a 1-n relation from Election to Candidate. If someone runs in multiple elections, they are listed as multiple candidates.
I'm having trouble figuring out how to constrain each user to one vote in each election at the database level. If I create an electionId column in Vote I create inconsistent or redundant data, but I can't think of any other way to constrain the data like that otherwise.
I feel like this has to be a common problem but I don't know what to call it so my last half an hour of searching hasn't been fruitful. What's the correct approach here?
You could change Candidate's PK to be a composite of electionId, name or at least make that combination a unique constraint in Candidate.
Then you would change Vote to be userId, electionId, name where the PK is userId, electionId and there is a FK pointing to Candidate's electionId, name which is now unique.
This means that userId and electionId are unique for the vote table and there is no redundancy left.
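A sketch of that schema, assuming SQLite via Python's sqlite3 so it is runnable without a server (MySQL would need InnoDB for the foreign key). The cast_vote helper, the candidate names and the user ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Candidate (
    id         INTEGER PRIMARY KEY,
    electionId INTEGER NOT NULL,
    name       TEXT NOT NULL,
    UNIQUE (electionId, name)           -- target of the composite foreign key
);
CREATE TABLE Vote (
    userId     INTEGER NOT NULL,
    electionId INTEGER NOT NULL,
    name       TEXT NOT NULL,
    PRIMARY KEY (userId, electionId),   -- one vote per user per election
    FOREIGN KEY (electionId, name) REFERENCES Candidate (electionId, name)
);
INSERT INTO Candidate (electionId, name) VALUES (1, 'Alice'), (1, 'Bob'), (2, 'Alice');
""")


def cast_vote(user_id, election_id, name):
    """Try to record a vote; return False if a constraint refuses it."""
    try:
        conn.execute(
            "INSERT INTO Vote (userId, electionId, name) VALUES (?, ?, ?)",
            (user_id, election_id, name),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first_vote = cast_vote(7, 1, "Alice")        # accepted
second_vote = cast_vote(7, 1, "Bob")         # same user, same election: refused by the PK
other_election = cast_vote(7, 2, "Alice")    # different election: accepted
wrong_candidate = cast_vote(8, 2, "Bob")     # Bob is not running in election 2: refused by the FK
print(first_vote, second_vote, other_election, wrong_candidate)
```

Because the foreign key spans (electionId, name), the electionId stored in Vote cannot disagree with the candidate's election, which is the inconsistency the question was worried about.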
You can do this with your current schema by adding validation before the insert into Vote (in MySQL this is done with a BEFORE INSERT trigger). You'd select all votes by that particular user, joined with Candidate on candidateId, and make sure none of the electionIds match the electionId of the candidate the vote is for.
This is completely normalized but expensive. Sometimes it's worth adding redundant fields for the sake of performance. I'd add electionId to Vote in this schema so that inserts don't need such an expensive validation.
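For comparison, the trigger-based validation on the original Vote (userId, candidateId) schema could be sketched like this. It assumes SQLite trigger syntax via Python's sqlite3. A MySQL trigger would be written differently (for example using SIGNAL), so treat the trigger body, the trigger name and the cast_vote helper as illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Candidate (
    id         INTEGER PRIMARY KEY,
    electionId INTEGER NOT NULL,
    name       TEXT NOT NULL
);
CREATE TABLE Vote (
    userId      INTEGER NOT NULL,
    candidateId INTEGER NOT NULL REFERENCES Candidate (id),
    PRIMARY KEY (userId, candidateId)
);
INSERT INTO Candidate (id, electionId, name)
VALUES (1, 1, 'Alice'), (2, 1, 'Bob'), (3, 2, 'Carol');
""")

# Before each insert, look for an existing vote by the same user whose candidate
# belongs to the same election as the candidate now being voted for.
conn.execute("""
CREATE TRIGGER one_vote_per_election
BEFORE INSERT ON Vote
BEGIN
    SELECT RAISE(ABORT, 'user already voted in this election')
    WHERE EXISTS (
        SELECT 1
        FROM Vote v
        JOIN Candidate voted  ON voted.id  = v.candidateId
        JOIN Candidate chosen ON chosen.id = NEW.candidateId
        WHERE v.userId = NEW.userId
          AND voted.electionId = chosen.electionId
    );
END
""")


def cast_vote(user_id, candidate_id):
    """Try to record a vote; return False if the trigger or a constraint refuses it."""
    try:
        conn.execute(
            "INSERT INTO Vote (userId, candidateId) VALUES (?, ?)",
            (user_id, candidate_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False


results = [cast_vote(7, 1), cast_vote(7, 2), cast_vote(7, 3)]
print(results)
```

The two joins on every insert are the cost the answer refers to when it calls this approach expensive.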
I'm developing a SaaS app with multi-tenancy, and I've decided to use a single DB (MySQL InnoDB for now) for clients' data. I chose to use composite primary keys like PK(client_id, id). I have 2 options here:
1: increment "id" by myself (by trigger or from code)
or
2: Make "id" as auto-increment by mysql.
In first case i will have unique id's for each client, so each client will have id 1, 2, 3 etc..
In second case id's will grow for all clients.
What is the best practice here? My priorities are: performance, security & scaling. Thanks!
You definitely want to use autoincrementing id values as primary keys. There happen to be many reasons for this. Here are some.
Avoiding race conditions (accidental id duplication) requires great care if you generate them yourself. Spend that mental energy -- development, QA, operations -- on making your SaaS excellent instead of reinventing the flat tire on primary keys.
You can still put an index on (client_id, id) even if it isn't the PK.
Your JOIN operations will be easier to write, test, and maintain.
This query pattern is great for getting the latest row for each client from a table. It performs very well. It's harder to do this kind of thing if you generate your own pks.
SELECT t.*
FROM your_table t
JOIN (SELECT MAX(id) AS id
FROM your_table
GROUP BY client_id
) m ON t.id = m.id;
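To see the latest-row-per-client pattern return something, here is a small sketch. It assumes SQLite via Python's sqlite3 instead of MySQL, and the orders table, its columns and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,  -- grows across all clients
    client_id INTEGER NOT NULL,
    note      TEXT
);
CREATE INDEX idx_client_id ON orders (client_id, id);
INSERT INTO orders (client_id, note)
VALUES (1, 'a'), (2, 'b'), (1, 'c'), (2, 'd'), (1, 'e');
""")

# The subquery picks the highest id per client; the join fetches those full rows.
latest = conn.execute("""
    SELECT t.client_id, t.id, t.note
    FROM orders t
    JOIN (SELECT MAX(id) AS id
          FROM orders
          GROUP BY client_id
         ) m ON t.id = m.id
    ORDER BY t.client_id
""").fetchall()
print(latest)
```

This works because a single auto-incremented id orders rows by insertion across all clients, so the largest id within each client is that client's newest row.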
"PK(client_id, id)" --
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY(client_id, id),
INDEX(id)
Yes, that combination will work. And it will work efficiently. It will not assign 1,2,3 to each client, but that should not matter. Instead, consecutive ids will be scattered among the clients.
Probably all of your queries will include WHERE client_id = (constant), correct? That means that PRIMARY KEY(client_id, id) will always be used and INDEX(id) won't be used except for satisfying AUTO_INCREMENT.
Furthermore, that PK will be more efficient than having INDEX(client_id, id). (This is because InnoDB "clusters" the PK with the data.)
My web application allows a user to define from 1 up to 30 emails (could be anything else).
Which of these options is best?
1) ...store the data inside only one column using a separator, like this:
[COLUMN emails] peter#example.com,mary#example.com,john#example.com
Structure:
emails VARCHAR(1829)
2) ...or save the data using distinct columns, like this:
[COLUMN email1] peter#example.com
[COLUMN email2] mary#example.com
[COLUMN email3] john#example.com
[...]
Structure:
email1 VARCHAR(60)
email2 VARCHAR(60)
email3 VARCHAR(60)
[...]
email30 VARCHAR(60)
Thank you in advance.
Depends on how you are going to use the data and how fixed the amount of 30 is. If it is an advantage to quickly query for the 3rd address or filter using WHERE clauses and such: use distinct fields; otherwise it might not be worth the effort of creating the columns.
Having the data in a database still has the advantage of concurrent access by several users.
Number two is the better option, without question. If you do the first one (comma separated), then it negates the advantages of using a RDBMS (you can't run an efficient query on your emails in that case, so it may as well be a flat file).
Number 2 is better than number 1.
However, you should consider another option of getting a normalized structure where you have a separate emails table with a foreign key to your user record. This would allow you to define an index if you wanted to search by email to find a user and place a constraint ensuring no duplicate emails are registered - if you wanted to do that.
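A minimal sketch of that normalized layout, assuming SQLite via Python's sqlite3 so it runs standalone. The find_user_by_email helper, the names and the addresses are hypothetical. The UNIQUE constraint both blocks duplicate registrations and gives the lookup an index to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_email (
    user_id INTEGER NOT NULL REFERENCES user (user_id),
    email   VARCHAR(60) NOT NULL UNIQUE   -- indexed, and no address registered twice
);
INSERT INTO user (user_id, name) VALUES (1, 'Peter'), (2, 'Mary');
INSERT INTO user_email (user_id, email)
VALUES (1, 'peter@example.com'), (1, 'peter.work@example.com'), (2, 'mary@example.com');
""")


def find_user_by_email(email):
    """Return the owning user_id, or None if the address is unknown."""
    row = conn.execute(
        "SELECT user_id FROM user_email WHERE email = ?", (email,)
    ).fetchone()
    return row[0] if row else None


owner = find_user_by_email("peter.work@example.com")
unknown = find_user_by_email("nobody@example.com")

# Registering an address that already belongs to someone is refused.
try:
    conn.execute("INSERT INTO user_email (user_id, email) VALUES (2, 'peter@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(owner, unknown, duplicate_rejected)
```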
Neither one is a very good option.
Option 1 is a poor idea because it makes looking a user up by email a complex, inefficient task. You are effectively required to perform a full text search on the email field in the user record to find one email.
Option 2 is really a WORSE idea, IMO, because it makes any surrounding code a huge pain to write. Suppose, again, that you need to look up all users who have a value X. You now need to enumerate 30 columns and check each one to see if that value exists. Painful!
Storing data in this manner -- 1-or-more of some element of data -- is very common in database design, and as Adam has previously mentioned, is best solved in MOST cases by using a normalized data structure.
A correct table structure, written in MySQL since this was tagged as such, might look like:
Users table:
CREATE TABLE user (
user_id int auto_increment,
...
PRIMARY KEY (user_id)
);
Emails table:
CREATE TABLE user_email (
user_id int,
email char(60) not null default '',
FOREIGN KEY (user_id) REFERENCES user (user_id) ON DELETE CASCADE
);
The FOREIGN KEY statement is optional -- the design will work without it, however, that line causes the database to force the relationship. For example, if you attempt to insert a record into user_email with a user_id of 10, there MUST be a corresponding user record with a user_id of 10, or the query will fail. The ON DELETE CASCADE tells the database that if you delete a record from the user table, all user_email records associated with it will also be deleted (you may or may not want this behavior).
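The two behaviours described here can be demonstrated with a short sketch. It assumes SQLite via Python's sqlite3 (where foreign keys must be switched on per connection) rather than MySQL, and the user id 10 and the addresses are just sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE user_email (
    user_id INTEGER,
    email   CHAR(60) NOT NULL DEFAULT '',
    FOREIGN KEY (user_id) REFERENCES user (user_id) ON DELETE CASCADE
);
""")

# No user 10 exists yet, so the foreign key refuses the email row.
try:
    conn.execute("INSERT INTO user_email VALUES (10, 'first@example.com')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Once the user exists, the same kind of insert succeeds.
conn.execute("INSERT INTO user (user_id) VALUES (10)")
conn.executemany(
    "INSERT INTO user_email VALUES (10, ?)",
    [("first@example.com",), ("second@example.com",)],
)
emails_before = conn.execute("SELECT COUNT(*) FROM user_email").fetchone()[0]

# Deleting the user cascades to that user's email rows.
conn.execute("DELETE FROM user WHERE user_id = 10")
emails_after = conn.execute("SELECT COUNT(*) FROM user_email").fetchone()[0]

print(orphan_rejected, emails_before, emails_after)
```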
This design of course also means that you need to perform a join when you retrieve a user record. A query like this:
SELECT user.user_id, user_email.email
FROM user
LEFT JOIN user_email ON user.user_id = user_email.user_id
WHERE <your where clause>;
Will return one row for EACH user_email address stored in the system. If you have 5 users and each user has 5 email addresses, the above query will return 25 rows.
Depending on your application, you may want to get one row per user but still have access to all the emails. In that case you might try an aggregate function like GROUP_CONCAT which will return a single row per user, with a comma-delimited list of emails belonging to that user:
SELECT user.user_id, GROUP_CONCAT(user_email.email) AS user_emails
FROM user
LEFT JOIN user_email ON user.user_id = user_email.user_id
WHERE <your where clause>
GROUP BY user.user_id;
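To check the row counts mentioned above (5 users with 5 addresses each), here is a sketch that assumes SQLite via Python's sqlite3, which happens to provide GROUP_CONCAT as well. The generated addresses are made up, and the WHERE clause is simply left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user (user_id INTEGER PRIMARY KEY);
CREATE TABLE user_email (
    user_id INTEGER REFERENCES user (user_id),
    email   CHAR(60) NOT NULL DEFAULT ''
);
""")

# 5 users, each with 5 made-up addresses.
for user_id in range(1, 6):
    conn.execute("INSERT INTO user (user_id) VALUES (?)", (user_id,))
    for n in range(1, 6):
        conn.execute(
            "INSERT INTO user_email (user_id, email) VALUES (?, ?)",
            (user_id, f"user{user_id}.addr{n}@example.com"),
        )

# Plain join: one row per stored address.
joined = conn.execute("""
    SELECT user.user_id, user_email.email
    FROM user
    LEFT JOIN user_email ON user.user_id = user_email.user_id
""").fetchall()

# Aggregated: one row per user, addresses collapsed into a comma-delimited string.
grouped = conn.execute("""
    SELECT user.user_id, GROUP_CONCAT(user_email.email) AS user_emails
    FROM user
    LEFT JOIN user_email ON user.user_id = user_email.user_id
    GROUP BY user.user_id
""").fetchall()

print(len(joined), len(grouped))
```

Note that the comma-delimited list exists only in the query result here; the stored data stays one address per row.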
Again, depending on your application, you may want to add an index to the email column.
Finally, there ARE some situations where you do not want a normalized database design, and a single-column design with delimited text might be more appropriate, although those situations are few and far between. For most normal applications, this type of normalized design is the way to go and will help it perform and scale better.
Newish to MySQL DBs here. I have a table of USERS and a table of TEAMS. A user can be on more than one team. What's the best way to store the relationship between a user and what teams he's on?
Lets say there are hundreds of teams, each team consists of about 20 users, and on average a user could be on about 10 teams, also note that users can change teams from time to time.
I can think of possibly adding a column to my TEAMS table which holds a list of user ids, but then i'd have to add a column to my USERS table which holds a list of team ids. Although this might be a solution it seems messy for updating membership. It seems like there might be a smarter way to handle this information... Like another table perhaps? Thoughts?
Thanks!
ps, whats the best field type for storing a list, and whats the best way to delimit?
whats the best field type for storing a list, and whats the best way to delimit?
It's usually a really bad idea to try to store multiple values in a single column. It's hell to process and you'll never get proper referential integrity.
What you're really looking for is a join table. For example:
CREATE TABLE user_teams (
user_id INT NOT NULL,
team_id INT NOT NULL,
PRIMARY KEY (user_id, team_id),
FOREIGN KEY (user_id) REFERENCES users (id),
FOREIGN KEY (team_id) REFERENCES teams (id)
);
so there can be any number of team_ids for one user and any number of user_ids for one team. (But the primary key ensures there aren't duplicate mappings of the same user-and-team.)
Then to select team details for a user you could say something like:
SELECT teams.*
FROM user_teams
JOIN teams ON teams.id= user_teams.team_id
WHERE user_teams.user_id= (...some id...);
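Since users change teams from time to time, here is a rough sketch of how membership updates look with the join table. It assumes SQLite via Python's sqlite3 rather than MySQL, and the teams_of helper, the team names and the ids are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_teams (
    user_id INTEGER NOT NULL,
    team_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, team_id),
    FOREIGN KEY (user_id) REFERENCES users (id),
    FOREIGN KEY (team_id) REFERENCES teams (id)
);
INSERT INTO users (id, name) VALUES (1, 'sam');
INSERT INTO teams (id, name) VALUES (1, 'red'), (2, 'green'), (3, 'blue');
INSERT INTO user_teams (user_id, team_id) VALUES (1, 1), (1, 2);
""")


def teams_of(user_id):
    """Return the names of the teams a user currently belongs to."""
    return [name for (name,) in conn.execute(
        "SELECT teams.name FROM user_teams "
        "JOIN teams ON teams.id = user_teams.team_id "
        "WHERE user_teams.user_id = ? ORDER BY teams.name",
        (user_id,),
    )]


teams_before = teams_of(1)

# Moving from 'red' to 'blue' is one DELETE plus one INSERT on the join table;
# neither the users row nor the teams rows change.
conn.execute("DELETE FROM user_teams WHERE user_id = 1 AND team_id = 1")
conn.execute("INSERT INTO user_teams (user_id, team_id) VALUES (1, 3)")

teams_after = teams_of(1)
print(teams_before, teams_after)
```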