Scoped/composite surrogate keys in MySQL

Here's an excerpt of my current database (changed the table-names for an easier understanding):
Pet(ownerFK, id, name, age)
Owner(id, name)
Where id is always a surrogate key, created with auto_increment.
I want the surrogate key Pet.id to be "scoped" by Pet.ownerFK, or in other words, to have a composite key [ownerFK, id] as my minimal key. I want the table to behave like this:
INSERT Pet(1, ?, "Garfield", 8);
INSERT Pet(1, ?, "Pluto", 12);
INSERT Pet(2, ?, "Mortimer", 1);
SELECT * FROM Pet;
RESULT:
Pet(1, 1, "Garfield", 8)
Pet(1, 2, "Pluto", 12)
Pet(2, 1, "Mortimer", 1)
I am currently using this feature of MyISAM where "you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups."
However, due to various (and maybe obvious) reasons, I want to switch from MyISAM to InnoDB, as I need transactions at some places.
Is there any way how to achieve this effect with InnoDB?
I found some posts on this issue; many of them proposed write-locking the table before insertion. I am not very familiar with this, but wouldn't a table write lock be a bit of an overkill for this? I was rather thinking of write-safe transactions (which I have never used before), if these are possible, with Owner.current_pet_counter as a helper field.
So another acceptable Solution would be...
Actually I don't need the "scoped" ID to be part of the actual Key. My actual database design uses a separate "permalink" table which uses this 'feature'. I currently use it as a workaround for the missing transactions. I thought of the following alternative:
Pet(id, ownerFK, scopedId, name, age), KEY(id), UNIQUE(ownerFK, scopedId)
Owner(id, name, current_pet_counter)
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT @new := current_pet_counter FROM Owner WHERE id = :owner_id;
INSERT Pet(?, :owner_id, @new, "Pluto", 21);
UPDATE Owner SET current_pet_counter = @new + 1 WHERE id = :owner_id;
COMMIT;
I haven't worked with transactions/transactionvars in MySQL yet, so I don't know whether there would be serious issues with this one.
Note: I do not want to reuse ids that have been given to a pet once. That's why I don't use MAX(). Does this solution have any caveats?

I don't believe so. If you really had to have that schema, you could use a transaction to SELECT the MAX(id) WHERE ownerFK, then INSERT.
I'm very sceptical there's a good reason for that schema, though; the primary key is now also a fact about the key, which might make the database theorists unhappy.
Normally you'd want ‘id’ to really be a proper primary key on its own, with ownerFK used to group and, if you needed it, a separate ‘rank’ column to put pets in a particular order per owner, and a UNIQUE index over (ownerFK, rank).
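The layout described above (a plain surrogate id, a per-owner number, and a UNIQUE index over the pair) can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for InnoDB, the column name scopedId and the counter column come from the question, and the owner names and the add_pet helper are hypothetical. The counter is incremented first and read back inside the same transaction; in InnoDB the UPDATE would also take a row lock on the owner row, so concurrent inserts for the same owner would wait instead of reading the same counter value.

```python
import sqlite3

# isolation_level=None: no implicit transactions, BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE Owner (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    current_pet_counter INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE Pet (
    id INTEGER PRIMARY KEY,              -- ordinary surrogate key
    ownerFK INTEGER NOT NULL REFERENCES Owner(id),
    scopedId INTEGER NOT NULL,           -- per-owner number
    name TEXT NOT NULL,
    age INTEGER NOT NULL,
    UNIQUE (ownerFK, scopedId)
);
INSERT INTO Owner (id, name) VALUES (1, 'owner one'), (2, 'owner two');
""")


def add_pet(conn, owner_id, name, age):
    """Insert a pet and return its per-owner number (hypothetical helper)."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Increment first, then read back the value inside the same transaction.
        cur = conn.execute(
            "UPDATE Owner SET current_pet_counter = current_pet_counter + 1 "
            "WHERE id = ?",
            (owner_id,),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown owner")
        (scoped_id,) = conn.execute(
            "SELECT current_pet_counter FROM Owner WHERE id = ?", (owner_id,)
        ).fetchone()
        conn.execute(
            "INSERT INTO Pet (ownerFK, scopedId, name, age) VALUES (?, ?, ?, ?)",
            (owner_id, scoped_id, name, age),
        )
        conn.execute("COMMIT")
        return scoped_id
    except Exception:
        conn.execute("ROLLBACK")
        raise


ids = [
    add_pet(conn, 1, "Garfield", 8),
    add_pet(conn, 1, "Pluto", 12),
    add_pet(conn, 2, "Mortimer", 1),
]

# Deleting a pet does not free its number, because the counter only grows.
conn.execute("DELETE FROM Pet WHERE ownerFK = 1 AND scopedId = 2")
next_id = add_pet(conn, 1, "Pluto", 12)
```

Because the number comes from the counter rather than from MAX(), a number that was handed out once is not handed out again after a delete.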

Related

Composite Primary Keys and auto increment? What is a good practices?

I'm developing a SaaS app with multi-tenancy, and I've decided to use a single DB (MySQL InnoDB for now) for clients' data. I chose to use composite primary keys like PK(client_id, id). I have two options here:
1: increment "id" myself (by trigger or from code)
or
2: make "id" auto-increment in MySQL.
In the first case I will have unique ids per client, so each client will have ids 1, 2, 3, etc.
In the second case the ids will grow across all clients.
What is the best practice here? My priorities are: performance, security & scaling. Thanks!

You definitely want to use autoincrementing id values as primary keys. There happen to be many reasons for this. Here are some.
Avoiding race conditions (accidental id duplication) requires great care if you generate them yourself. Spend that mental energy -- development, QA, operations -- on making your SaaS excellent instead of reinventing the flat tire on primary keys.
You can still put an index on (client_id, id) even if it isn't the PK.
Your JOIN operations will be easier to write, test, and maintain.
This query pattern is great for getting the latest row for each client from a table. It performs very well. It's harder to do this kind of thing if you generate your own pks.
SELECT t.*
  FROM table t
  JOIN (SELECT MAX(id) id
          FROM table
         GROUP BY client_id
       ) m ON t.id = m.id
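As a rough, runnable illustration of the query pattern above, here is a sketch using Python's built-in sqlite3 module in place of MySQL. The table name orders, the note column and the sample rows are made up for the example; only the shape of the query is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (                       -- hypothetical table
    id INTEGER PRIMARY KEY AUTOINCREMENT,   -- one id sequence for all clients
    client_id INTEGER NOT NULL,
    note TEXT NOT NULL
);
CREATE INDEX orders_client_id ON orders (client_id, id);
INSERT INTO orders (client_id, note) VALUES
    (10, 'a'), (20, 'b'), (10, 'c'), (20, 'd'), (30, 'e');
""")

# Latest row per client: the subquery picks the highest id in each group,
# and the join fetches the full row for each of those ids.
latest = conn.execute("""
SELECT t.id, t.client_id, t.note
  FROM orders AS t
  JOIN (SELECT MAX(id) AS id
          FROM orders
         GROUP BY client_id
       ) AS m ON t.id = m.id
 ORDER BY t.client_id
""").fetchall()
```

This relies on the ids growing with insertion order across all clients, which is what a single auto-increment column provides.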

"PK(client_id, id)" --
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY(client_id, id),
INDEX(id)
Yes, that combination will work. And it will work efficiently. It will not assign 1,2,3 to each client, but that should not matter. Instead, consecutive ids will be scattered among the clients.
Probably all of your queries will include WHERE client_id = (constant), correct? That means that PRIMARY KEY(client_id, id) will always be used and INDEX(id) won't be used except for satisfying AUTO_INCREMENT.
Furthermore, that PK will be more efficient than having INDEX(client_id, id). (This is because InnoDB "clusters" the PK with the data.)

Foreign Key or Composite Key?

I just started applying everything that I read about table relationships but I'm kind of confused on how to insert data on tables with MANY-TO-MANY relationship considering there's a third table.
Right now I have these tables.
subject
name
code PK
units
description
schoolyear
schoolyearId PK
yearStart
yearEnd
schoolyearsubjects (MANY TO MANY table)
id PK
code FK
schoolyearId FK
But the problem with the above schoolyearsubjects table is that, I don't know how I can insert the schoolyearId from the GUI. On the GUI screenshot, as soon as "Save" button is clicked, a TRANSACTION containing 2 INSERT statements (to insert on subject) and (to insert on schoolyearsubjects) will execute. If I stick with the above, I'll have to insert the schoolyearId. schoolyearId definitely won't come from GUI.
I'm thinking of changing the columns of schoolyearsubjects and schoolyear to this:
schoolyear
--composite keys (yearStart, yearEnd)
yearStart (PK)
yearEnd (PK)
schoolyearsubjects(MANY TO MANY table)
id PK
code (FK)
yearStart (FK) --is this possible?
yearEnd (FK) --is this possible?
1.) Is the solution to change the columns and make a composite key so I can just insert yearStart and yearEnd values instead of schoolyearId?
2.) Is my junction / linking table schoolyearsubjects correct?
3.) What can you advise?
I'd appreciate any help.
Thanks.

For me schoolyear is a period, and as such, there is no need to use a surrogate key here. This always makes things more confusing, and it is always more difficult to develop a graphical interface for it (I'm talking about how we model periods as developers).
If you stop to think, a period is in itself something unique. Will you ever have one period equal to another? Stop and think. Even if you do, they will fall in different years or at different times. So we already have a primary key for schoolyear. Eliminate "schoolyearId PK" from schoolyear. Use a composite key here with yearStart and yearEnd. So your schoolyear entity (in future, a table) will look like:
yearStart PK
yearEnd PK
In the intermediate table, you will have 3 fields as composite primary key (also foreign key!):
yearStart PK FK (from schoolyear)
yearEnd PK FK (from schoolyear)
code PK FK (from subject)
This allows a period to have many subjects, but each subject only once per period. If, on the other hand, you want the same subject to appear more than once in a period, you would have to use a surrogate key here.
Now, to draw the graphical interface, you will only have to use a select box (combo box). In this case you will have each item as a text, something like "from year X to Y" (a period). Your users can very well understand and select it.
Note: In any case, an interface should not show the ID of a record, but rather the values that identify it. Those values may be shown, and they distinguish one record from the rest.
If, however, you do not have periods as something unique, then "yearStart" and "yearEnd" are fields in the subject entity, and there is no schoolyear entity. To be honest, the entity "schoolyear" should only exist if you want to reuse its records in relationships with records of other tables. I'm not saying this is or is not the case. Be careful as well: if you do this, you are saying that every period has only one subject (as fields). I do not know if this is exactly what you want. We must always remember the most important thing in shaping an ER diagram:
CONTEXT
Check your context. What does it ask? If you have any questions, please comment. If you can offer me some more context here, I can help you more.
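To see what a composite primary key and a composite foreign key on the junction table accept and reject, here is a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, and the subject codes and the try_link helper are hypothetical; the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE subject (
    code TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE schoolyear (
    yearStart INTEGER NOT NULL,
    yearEnd INTEGER NOT NULL,
    PRIMARY KEY (yearStart, yearEnd)
);
CREATE TABLE schoolyearsubjects (
    yearStart INTEGER NOT NULL,
    yearEnd INTEGER NOT NULL,
    code TEXT NOT NULL REFERENCES subject (code),
    PRIMARY KEY (yearStart, yearEnd, code),
    FOREIGN KEY (yearStart, yearEnd) REFERENCES schoolyear (yearStart, yearEnd)
);
INSERT INTO subject (code, name) VALUES ('MATH', 'Mathematics'), ('PHYS', 'Physics');
INSERT INTO schoolyear (yearStart, yearEnd) VALUES (2016, 2017);
""")


def try_link(year_start, year_end, code):
    """Return True if the row was inserted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO schoolyearsubjects (yearStart, yearEnd, code) "
            "VALUES (?, ?, ?)",
            (year_start, year_end, code),
        )
        return True
    except sqlite3.IntegrityError:
        return False


first = try_link(2016, 2017, "MATH")        # accepted
second = try_link(2016, 2017, "PHYS")       # accepted: another subject, same period
duplicate = try_link(2016, 2017, "MATH")    # rejected by the composite primary key
no_period = try_link(1999, 2000, "MATH")    # rejected by the composite foreign key
```

The GUI then only needs to send yearStart, yearEnd and code; no generated id has to be looked up first.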

Assuming you have parameters @code, @yearStart and @yearEnd with values from the UI:
INSERT INTO schoolyearsubjects ( code, yearStart, yearEnd )
SELECT @code, y.yearStart, y.yearEnd
FROM schoolyear y
WHERE @yearStart <= y.yearStart
AND y.yearEnd <= @yearEnd;
...but I think you have a design flaw with your schoolyearsubjects because it allows duplicates e.g. doing this:
INSERT INTO schoolyearsubjects VALUES ( 'code red', '2016', '2017' );
INSERT INTO schoolyearsubjects VALUES ( 'code red', '2016', '2017' );
INSERT INTO schoolyearsubjects VALUES ( 'code red', '2016', '2017' );
looks like it would result in three de facto duplicate rows.

With your current scheme you can insert the schoolyearId with a request as follows:
INSERT INTO schoolyearsubjects (id, code, schoolyearId)
VALUES ( ${id},
${code_from_GUI},
( SELECT schoolyearId
FROM schoolyear
WHERE yearStart=${start_from_GUI} AND yearEnd=${end_from_GUI})
);
For this to work, the unique constraint on (yearStart, yearEnd) in the schoolyear table is required.
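The lookup-by-subquery insert can be tried out with the sketch below. It is an illustration only: it uses Python's built-in sqlite3 module instead of MySQL, passes the GUI values as bound parameters instead of interpolating them into the SQL string, and the link_subject helper, the sample id 7 and the subject code are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE schoolyear (
    schoolyearId INTEGER PRIMARY KEY,
    yearStart INTEGER NOT NULL,
    yearEnd INTEGER NOT NULL,
    UNIQUE (yearStart, yearEnd)   -- guarantees the subquery finds at most one row
);
CREATE TABLE schoolyearsubjects (
    id INTEGER PRIMARY KEY,
    code TEXT NOT NULL,
    schoolyearId INTEGER NOT NULL REFERENCES schoolyear (schoolyearId)
);
INSERT INTO schoolyear (schoolyearId, yearStart, yearEnd) VALUES (7, 2016, 2017);
""")


def link_subject(conn, code, year_start, year_end):
    """Insert a link row, resolving schoolyearId from the two years."""
    conn.execute(
        "INSERT INTO schoolyearsubjects (code, schoolyearId) "
        "VALUES (?, (SELECT schoolyearId FROM schoolyear "
        "            WHERE yearStart = ? AND yearEnd = ?))",
        (code, year_start, year_end),
    )


link_subject(conn, "MATH", 2016, 2017)
row = conn.execute("SELECT code, schoolyearId FROM schoolyearsubjects").fetchone()

# If no school year matches, the subquery yields NULL and the NOT NULL
# constraint on schoolyearId rejects the insert.
try:
    link_subject(conn, "MATH", 1999, 2000)
    missing_year_rejected = False
except sqlite3.IntegrityError:
    missing_year_rejected = True
```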
As to the rest of your questions:
1) You can use a composite key in the schoolyear table; it will work either way.
2) The schoolyearsubjects table is correct, as it allows you to write join queries. If you get rid of the schoolyearId column, then you will probably not need the schoolyear table at all, as all the data you may want will be in the schoolyearsubjects table.
3) This article may help to decide what type of key to use.

How to reduce the auto increment number in SQL database?

Currently the table structure is like this:
user_preference
---------------
id
user_id
pref_id
This table stores all the user options, and id is auto-increment.
The problems are:
1) Is it necessary to keep an ID for every table? It seems to be common practice to keep a system-generated id for every table.
2) Whenever a user updates their preferences, I clear all related records for that user and insert the updated ones, so the auto-increment number will become very large over time. How can I prevent that?
Thanks for helping.

You can periodically reset the auto increment counter back to 1, to ensure that the id does not become arbitrarily large (and sparse) over the course of frequent deletion of records.
In MySQL:
ALTER TABLE table_name AUTO_INCREMENT = 1
In SQL Server:
DBCC CHECKIDENT (table_name, RESEED, 0)
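In MySQL, this resets the auto-increment counter to 1, or to one more than the largest id still present if the table is not empty. Note that SQL Server's RESEED does not check existing rows, so reseeding below an id that is still in use can lead to duplicate-key errors on later inserts.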

You do not need to have an AUTO_INCREMENT PRIMARY KEY for every table. Sometimes there is a 'natural' key that works quite well for the PK.
Do not manipulate AUTO_INCREMENT values. Do not depend on any property other than uniqueness.
Your user_preference table smells like a many-to-many mapping? If so, this is optimal:
CREATE TABLE user_preference (
    user_id ...,
    pref_id ...,
    PRIMARY KEY(user_id, pref_id),
    INDEX (pref_id, user_id)
) ENGINE=InnoDB;
For discussion of "why", see http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
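A minimal sketch of the "clear and re-insert" update from the question, applied to the mapping table without an id column. It uses Python's built-in sqlite3 module as a stand-in for InnoDB, and the replace_preferences helper is hypothetical. With no auto-increment column there is no counter to grow, however often a user's preferences are replaced.

```python
import sqlite3

# isolation_level=None: transactions are started and ended explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
CREATE TABLE user_preference (
    user_id INTEGER NOT NULL,
    pref_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, pref_id)
);
CREATE INDEX user_preference_pref ON user_preference (pref_id, user_id);
""")


def replace_preferences(conn, user_id, pref_ids):
    """Swap a user's whole preference set in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("DELETE FROM user_preference WHERE user_id = ?", (user_id,))
        conn.executemany(
            "INSERT INTO user_preference (user_id, pref_id) VALUES (?, ?)",
            [(user_id, pref_id) for pref_id in pref_ids],
        )
        conn.execute("COMMIT")
    except Exception:
        # Any failure (for example a duplicate pref_id) restores the old set.
        conn.execute("ROLLBACK")
        raise


replace_preferences(conn, 1, [3, 5])
replace_preferences(conn, 1, [5, 8])
current = conn.execute(
    "SELECT user_id, pref_id FROM user_preference ORDER BY pref_id"
).fetchall()
```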

INSERT ON DUPLICATE KEY UPDATE as substitute for UPDATE

This question is somewhat about "best practices", but also a search for potential problems. I would like to be able to run an update on multiple fields and assign different values without running multiple queries and not using a super complex query. So, what I've done is created a table with a primary key and the "name" column as a unique key.
Now, when I want to update multiple columns with different values, I can run a query like this:
INSERT INTO my_table (name, description) VALUES ('name', 'mydescription'), ('name2', 'description2') ON DUPLICATE KEY UPDATE description = VALUES(description)
Is this a bad idea? Is there a better way to do this? Are the standards police going to come arrest me?
Edit: I did just notice one potential issue with this, being a race condition. If one user removes a row while another user is editing it and they save the information, the edit will recreate the row. (Which could be used as a feature or a bug.)

Further to my comment above (linking to a question where another poster advises of the performance impact from using INSERT ... ON DUPLICATE KEY UPDATE where the records are known to exist), one could use the multiple-table UPDATE syntax with a table materialised from constants using UNION:
UPDATE my_table JOIN (
SELECT 'name' AS name, 'mydescription' AS description
UNION ALL
SELECT 'name2', 'description2'
) t USING (name) SET my_table.description = t.description
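The difference that matters for the race condition mentioned in the question is that an UPDATE never recreates a deleted row, while an insert-or-update does. The sketch below shows this with Python's built-in sqlite3 module; note the assumptions: SQLite's INSERT ... ON CONFLICT ... DO UPDATE (available in SQLite 3.24 and later) stands in for MySQL's INSERT ... ON DUPLICATE KEY UPDATE, and a plain per-row UPDATE stands in for the multiple-table UPDATE shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE,
    description TEXT NOT NULL
);
INSERT INTO my_table (name, description) VALUES ('name', 'old'), ('name2', 'old2');
""")

edits = [("name", "mydescription"), ("name2", "description2")]

# Simulate another user deleting 'name2' before the edits are saved.
conn.execute("DELETE FROM my_table WHERE name = 'name2'")

# Plain UPDATE: the edit for the deleted row matches nothing and is dropped.
conn.executemany(
    "UPDATE my_table SET description = ? WHERE name = ?",
    [(description, name) for name, description in edits],
)
after_update = conn.execute(
    "SELECT name, description FROM my_table ORDER BY name"
).fetchall()

# Upsert: the same edits insert the missing row again.
conn.executemany(
    "INSERT INTO my_table (name, description) VALUES (?, ?) "
    "ON CONFLICT (name) DO UPDATE SET description = excluded.description",
    edits,
)
after_upsert = conn.execute(
    "SELECT name, description FROM my_table ORDER BY name"
).fetchall()
```

Which behaviour is wanted depends on whether a concurrent delete should win over a pending edit.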

To make PK on unique combination of columns or add a numeric rowID

This is more of a design problem than a programming one.
I have a table where I store details about retail products:
Name Barcode BarcodeFormat etc...
----------------------------------------
(Name, Barcode, BarcodeFormat) are three columns that together uniquely identify a record in the table (candidate key). However, I have other tables that need a FK on this one. So I introduced an auto_increment column itemId and made that the PK.
My question is - should I have the PK as (itemId, Name, Barcode, BarcodeFormat) or would it be better to have PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat).
My primary concern is performance in terms of INSERT and SELECT operations but comments on size are also welcome.
I'm using an InnoDB table with MySQL.

Definitely: PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat).
You don't want the hassle of using a multi-part key for all your joins etc
You may one day have rows without barcode values which then won't be unique, so you don't want uniqueness hard-wired into your model (you can easily drop the unique without breaking any relationships etc)
The constraint on uniqueness is a business-level issue, not a database entity one: You'll always need a key, but you may not always need the business rule of uniqueness
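As a rough illustration of PK(itemId) plus UNIQUE(Name, Barcode, BarcodeFormat), here is a sketch using Python's built-in sqlite3 module in place of InnoDB. The table names item and stock, the quantity column and the sample values are hypothetical; the point is only that other tables reference the single-column key while the three-column rule is still enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE item (
    itemId INTEGER PRIMARY KEY AUTOINCREMENT,
    Name TEXT NOT NULL,
    Barcode TEXT NOT NULL,
    BarcodeFormat TEXT NOT NULL,
    UNIQUE (Name, Barcode, BarcodeFormat)   -- business rule, droppable later
);
CREATE TABLE stock (                        -- hypothetical referencing table
    stockId INTEGER PRIMARY KEY,
    itemId INTEGER NOT NULL REFERENCES item (itemId),
    quantity INTEGER NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO item (Name, Barcode, BarcodeFormat) VALUES (?, ?, ?)",
    ("Widget", "0123456789012", "EAN-13"),
)
item_id = cur.lastrowid

# Referencing tables carry one small column instead of three.
conn.execute("INSERT INTO stock (itemId, quantity) VALUES (?, ?)", (item_id, 40))

# The natural key is still protected by the UNIQUE constraint.
try:
    conn.execute(
        "INSERT INTO item (Name, Barcode, BarcodeFormat) VALUES (?, ?, ?)",
        ("Widget", "0123456789012", "EAN-13"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```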

Unless you have millions of products, or very high throughput requirements it won't make much difference in terms of performance.
My preference is to have a surrogate PK (i.e. the auto increment column, your second option of PK(itemId) and UNIQUE(Name, Barcode, BarcodeFormat) ) because this is easier to manage if business keys change.

You have two candidate keys. We call the three-column compound key the 'natural key' and the auto_increment column (in this case) the 'surrogate key'. Both require unique constraints ('unique' in lower case to denote logical) at the database level.
Optionally, one candidate key may be designated 'primary'. The choice of which key (if any) should get this designation is arbitrary. Beware of anyone giving you definitive advice on this matter!

If you already add an itemId then you should use that as PK and have the other three columns with a UNIQUE.
If you don't have an itemId then you could use the other columns as the PK, but it may become difficult to carry that key everywhere. In this case it is not a great choice, because a product is an entity and should have an id; if it were just a relationship, it would be acceptable not to have an id column.
