Composite Primary Keys and auto increment? What is a good practices? - mysql

I'm developing SaaS app with multi-tenancy, and i've decide to use single DB (MySQL Innodb for now) for client's data. I chose to use composite primary keys like PK(client_id, id). I have 2 ways here,
1: increment "id" by myself (by trigger or from code)
or
2: Make "id" as auto-increment by mysql.
In first case i will have unique id's for each client, so each client will have id 1, 2, 3 etc..
In second case id's will grow for all clients.
What is the best practice here? My priorities are: performace, security & scaling. Thanks!

You definitely want to use autoincrementing id values as primary keys. There happen to be many reasons for this. Here are some.
Avoiding race conditions (accidental id duplication) requires great care if you generate them yourself. Spend that mental energy -- development, QA, operations -- on making your SaaS excellent instead of reinventing the flat tire on primary keys.
You can still put an index on (client_id, id) even if it isn't the PK.
Your JOIN operations will be easier to write, test, and maintain.
This query pattern is great for getting the latest row for each client from a table. It performs very well. It's harder to do this kind of thing if you generate your own pks.
SELECT t.*
FROM table t
JOIN (SELECT MAX(id) id
FROM table
GROUP BY client_id
) m ON t.id = m.id

"PK(client_id, id)" --
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY(client_id, id),
INDEX(id)
Yes, that combination will work. And it will work efficiently. It will not assign 1,2,3 to each client, but that should not matter. Instead, consecutive ids will be scattered among the clients.
Probably all of your queries will include WHERE client_id = (constant), correct? That means that PRIMARY KEY(client_id, id) will always be used and INDEX(id) won't be used except for satisfying AUTO_INCREMENT.
Furthermore, that PK will be more efficient than having INDEX(client_id, id). (This is because InnoDB "clusters" the PK with the data.)

Related

Composite primary keys vs auto increment primary key in sql workbench

I need the advice of someone who has a greeter experience.
I have an associative entity in my database, like that:
Table2-> CustomerID, ServiceID, DateSub
Since the same customer (with PK, for example 1111) can require the same service (with PK, for example 3) more than once but never in the same date , the composite PK of Table 2 can't be just (CustomerID, ServiceID).
Now I have 2 options:
1- Also "DateSub" will be a primary key, so the PK of table 2 will be (CustomerID, ServiceID, DateSub)
2- Create a specific PK for the associative entity (for example, Table2ID, and so CustomerID and Service ID will be FK)
Which of the 2 approach would you follow and why? Thank you
First of all you need to decide whether is it your requirement to make combination of CustomerID, ServiceID amd DateI column as unique? If so then you should go for firt option.
Otherwise I would go for second option.
With first option if DateI is of date data type you will not be able to insert same service for a customer twice. If it's datetime then it's doable though.
If you want to use this primary key (composite primary key) in any other table as foreign key then you need to use all three columns there too.
I tend to prefer the PK be "natural". You have 3 columns that, together, can uniquely define each row. I would consider using it.
The next question is what order to put the 3 columns in. This depends on the common queries. Please provide them.
An index (including the PK) is used only leftmost first. It may be desirable to have some secondary key(s), for efficient access to other columns. Again, let's see the queries.
If you have a lot of secondary indexes, it may be better to have a surrogate, AUTO_INCREMENT "id" as the PK. Again, let's see the queries.
If you ever use a date range, then it is probably best to have DateSub last in any index. (There are rare exceptions.)
How many rows in the table?
The table is ENGINE=InnoDB, correct?
Reminder: The PRIMARY KEY is a Unique key, which is an INDEX.
DateSub is of datatype DATE, correct?

MySQL 3-way 1..n tables relation

1 database with 3 tables: user - photo - vote
- A user can have many photos.
- A photo can have many votes.
- A user can vote on many photos.
- A vote records:
. the result as an int (-1/disliked, 0/neutral, 1/liked)
. the id of the user who voted.
Here is what I have (all FKs are cascade on delete and update):
http://grab.by/iZYE (sid = surrogate id)
My question is: this doesn't seem right, and I look at this for 2 days already and can't confidently move on. How can I optimize this or am I completely wrong?
MySQL/InnoDB tables are always clustered (more on clustering here and here).
Since primary key also acts as a clustering key1, using the surrogate primary key means you are physically sorting the table in order that doesn't have a useful meaning for the client applications and cannot be utilized for querying.
Furthermore, secondary indexes in clustered tables can be "fatter" than in heap-based tables and may require double lookup.
For these reasons, you'd want to avoid surrogates and use more "natural" keys, similar to this:
({USER_ID, PICTURE_NO} in table VOTE references the same-named fields in PICTURE. The VOTE.VOTER_ID references USER.USER_ID. Use integers for *_ID and *_NO fields if you can.)
This physical model will enable extremely efficient querying for:
Pictures of the given user (a simple range scan on PICTURE primary/clustering index).
Votes on the given picture (a simple range scan on VOTE primary/clustering index). Depending on circumstances, this may actually be fast enough so you don't have to cache the sum in PICTURE.
If you need votes of the given user, change the VOTE PK to: {VOTER_ID, USER_ID, PICTURE_NO}. If you need both (votes of picture and votes of user), keep the existing PK, but create a covering index on {VOTER_ID, USER_ID, PICTURE_NO, VOTE_VALUE}.
1 In InnoDB. There are DBMSes (such as MS SQL Server) where clustering key can differ from primary.
The first thing I see is that you have duplicate unique IDs on the tables. You don't need the sid columns; just use user_id, photo_id, and photo_user_id (maybe rename this one to vote_id). Those ID columns should also be INT type, definitely not VARCHARs. You probably don't need the vote total columns on photo; you can just run a query to get the total when you need it and not worry about keeping both tables in sync.
Assuming that you will only allow one vote per user on each photo, the structure of the can be modified so the only columns are user_id, photo_id, and vote_result. You would then make the primary key a composite index on (user_id, photo_id). However, since you're using foreign keys, that makes this table a bit more complicated.

MySQL table - designing efficient table

I'm designing a db table that will save a list of user's favorited food items.
I created favorite table with the following schema
id, user_id, food_id
user_id and food_id will be foreign key linking to another table.
Im just wondering if this is efficient and scalable cause if user has multiple favorite things then it would need multiple rows of data.
i.e. user has 5 favorited food items, then it will consist of five rows to save the list for that user.
Is this efficient? and scalable? Whats the best way to optimize this schema?
thnx in advance!!!
tldr; This is called a "join table" and is the correct and scalable approach to model M-M relationships in a relational database. (Depending upon the constraints used it can also model 1-M/1-1 relationships in a "no NULL FK" schema.)
However, I contend that the id column should be omitted here so that the table is only user_id, food_id. The PK will be (user_id, food_id) in this case.
Unlike other tables, where surrogate (aka auto-increment) PKs are sometimes argued for, a surrogate PK generally only adds clutter in a join table as it has a very natural compound PK.
While the PK itself is compound in this case, each "joined" table only relates back by part of the PK. Depending upon queries performed it might also be beneficial to add covering indices on food_id or (food_id, user_id).
Eliminate Surrogate Key: Unless you have a specific reason for the surrogate key id, exclude it from the table.
Fine-tune Indexing: A this point, you just have a composite primary key that is the combination of the two foreign keys. In which order should the PK fields be?
If your application(s) predominantly execute queries such as: "for given user, give me foods", then PK should be {user_id, food_id}.
If the predominant query is "for given food, give me users", then the PK should be {food_id, user_id}.
If both query "directions" are common, add a UNIQUE INDEX that has the same fields as PK, but in opposite directions. So you'll have PK on {user_id, food_id} and index on {food_id, user_id}.
Note that InnoDB tables are clustered, which eliminates (in this case "unnecessary") table heap. Yet, the secondary index discussed above will not cause a double-lookup (since it fully covers the query), nor have a hidden overhead of PK fields (since it indexes the same fields as PK, just in opposite order).
For more on designing a junction table, take a look at this post.
To my opinion, you can optimize your table in the following ways:
As a relation table with 2 foreighkeys you don't have to use "id" field.
use "innodb" engine to your table
name your relation table "user_2_food", which will make it more clear.
try to use datatype as small as possible, i.e. "smallint" is better than "int", and don't forget "UNSIGNED" attribute.
Creating the below three Tables will result in an efficient design.
users : userId, username, userdesc
foods : foodId, foodname, fooddesc
userfoodmapping : ufid, userid, foodid, rowstate
The significance of rowstate is, if the user in future doesn't like that food, its state will become -1
You have 2 options in my opnion:
Get rid of the ID field, but in that case, make both your other keys (combined) your primary key
Keep your ID key as the primary key for your table.
In either case, I think this is a proper approach. Once you get into a problem of inefficiency, then you will look at probably how to load part of the table or any other technique. This would do for now.

To make PK on unique combination of columns or add a numeric rowID

This is more of a design problem then a programming one.
I have a table where I store details about retail products:
Name Barcode BarcodeFormat etc...
----------------------------------------
(Name, Barcode, BarcodeFormat) are three columns will uniquely identify a record in the table (Candidate Key). However, I have other tables that need a FK on this one. So I introduced an auto_increment column itemId and made that the PK.
My question is - should I have the PK as (itemId, Name, Barcode, BarcodeFormat) or would it be better to have PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat).
My primary concern is performance in terms of INSERT and SELECT operations but comments on size are also welcome.
I'm using an innodb table with mysql
Definitely: PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat).
You don't want the hassle of using a multi-part key for all your joins etc
You may one day have rows without barcode values which then won't be unique, so you don't want uniqueness hard-wired into your model (you can easily drop the unique without breaking any relationships etc)
The constraint on uniqueness is a business-level issue, not a database entity one: You'll always need a key, but you may not always need the business rule of uniqueness
Unless you have millions of products, or very high throughput requirements it won't make much difference in terms of performance.
My preference is to have a surrogate PK (i.e. the auto increment column, your second option of PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat) ) because this is easier to manage if business keys change.
You have two candidate keys. We call the three-column compound key the 'natural key' and the auto_increment column (in this case) the 'surrogate key'. Both require unique constraints ('unique' in lower case to denote logical) at the database level.
Optionally, one candidate key may be designated 'primary'. The choice of which key (if any) should get this designation is arbitrary. Beware of anyone giving you definitive advice on this matter!
If you already add an itemId then you should use that as PK and have the other three columns with a UNIQUE.
If you don't have an itemId then you could use the other columns as the PK, but it may become difficult to keep it everywhere. In this case it is not great, because the product should have an id since it is an entity, but if it where just a relationship, then it would be acceptable not to have an id column.

Database design - how to implement user group table?

I want to make user group system that imitates group policy in instant messengers.
Each user can create as many as groups as they want, but they cannot have groups with duplicate names, and they can put as many friends as they want into any groups.
For example, John's friend Jen can be in 'school' group of John and 'coworker' group of John at the same time. And, it is totally independent from how Jen puts John into her group.
I'm thinking two possible ways to implement this in database user_group table.
1.
user_group (
id INT PRIMARY KEY AUTO_INCREMENT,
user_id INT,
group_name VARCHAR(30),
UNIQUE KEY (user_id, group_name)
)
In this case, all groups owned by all users will have a unique id. So, id alone can identify which user and the name of the group.
2.
user_group (
user_id INT,
group_id INT AUTO_INCREMENT,
group_name VARCHAR(30),
PRIMARY KEY (user_id, group_id),
UNIQUE KEY (user_id, group_name)
)
In this case, group_id always starts from 0 for each user, so, there could exist many groups with same group_id s. But, pk pair (user_id, group_id) is unique in the table.
which way is better implementation and why?
what are advantages and drawbacks for each case?
EDIT:
added AUTO_INCREMENT to group_id in second scenario to insure it is auto-assigned from 0 for each user_id.
EDIT:
'better' means...
- better performance in SELECT/INSERT/UPDATE friends to the group since that will be the mostly used operations regarding the user group.
- robustness of database like which one will be more safe in terms of user size.
- popularity or general preference of either one over another.
- flexibility
- extensibility
- usability - easier to use.
Personally, I would go with the 1st approach, but it really depends on how your application is going to work. If it would ever be possible for ownership of a group to be changed, or to merge user profiles, this will be much easier to do in your 1st approach than in the 2nd. In the 2nd approach, if either of those situations ever happen, you would not only have to update your user_group table, but any dependent tables as well that have a foreign key relation to user_group. This will also be a many to many relation (there will be multiple users in a group, and a user will be a member of multiple groups), so it will require a separate joining table. In the 1st approach, this is fairly straightforward:
group_member (
group_id int,
user_id int
)
For your 2nd approach, it would require a 3rd column, which will not only be more confusing since you're now including user_id twice, but also require 33% additional storage space (this may or may not be an issue depending on how large you expect your database to be):
group_member (
owner_id int,
group_id int,
user_id int
)
Also, if you ever plan to move from MySQL to another database platform, this behavior of auto_increment may not be supported. I know in MS SQL Server, an auto_increment field (identity in MSSQL) will always be incremented, not made unique according to indexes on the table, so to get the same functionality you would have to implement it yourself.
Please define "better".
From my gut, I would pick the 2nd one.
The searchable pieces are broken down more, but that wouldn't be what I'd pick if insert/update performance is a concern.
I see no possible benefit to number 2 at all, it is more complex, more fragile (it would not work at all in SQL Server) and gains nothing. Remeber the groupId is without meaning except to identify a record uniquely, likely the user willonly see the group name not the id. So it doesn't matter if they all start from 0 or if there are gaps because a group was rolled back or deleted.