I would like to ask you a design question:
I am designing a table that makes me scratch my head; I'm not sure what the best approach is, and I feel like I am missing something:
There are two tables A and B and one M:N relationship table between them. The relationship table has right now these values:
A.ID, B.ID, From, To
Business requirements:
At any time, the A:B relationship can only be 1:1
A:B can repeat in time as defined by From and To datetime values, which specify an interval
example: Car/Driver.
Any car can have only 1 Driver at any time
Any Driver can drive only 1 car at any time (this is NOT Top Gear, ok? :) )
Driver can change the car after some time, and can return to the same car
Now, I am not sure:
- what PK should I go with? A,B is not enough, and adding From and To doesn't feel right; maybe an autoincrement PK?
- any way to enforce the business requirements by DB design?
- for business reasons, I would prefer it not to be in a historical table. Why? Well, let's assume the car is rented and I want to know, given a date, who had what car rented at that date. Splitting it into a historical table would require more joins :(
I feel like I am missing something, some kind of general pattern... or I don't know...
Thankful for any help, so thank you :)
I don't think you are actually missing anything. I think you've got a handle on what the problem is.
I've read a couple of articles about how to handle "temporal" data in a relational database.
Bottom line consensus is that the traditional relational model doesn't have any builtin mechanism for supporting temporal data.
There are several approaches, some better suited to particular requirements than others, but all of the approaches feel like they are "duct taped" on.
(I was going to say "bolted on", but I thought a tip of the hat to Red Green was in order: "... the handyman's secret weapon, duct tape", and "if the women don't find you handsome, they should at least find you handy.")
As far as a PRIMARY KEY or UNIQUE KEY for the table, you could use the combination of (a_id, b_id, from). That would give the row a unique identifier.
But, that doesn't do anything to prevent overlapping "time" ranges.
There is no declarative constraint for a MySQL table that prevents "overlapping" datetime ranges stored as "start","end" or "start","duration", etc. (At least, in the general case. If you had very well defined ranges, and triggers that rounded the from to an even four-hour boundary and forced the duration to exactly four hours, you could use a UNIQUE constraint. In the more general case, for any ol' values of from and to, the UNIQUE constraint does not work for us.)
A CHECK constraint is insufficient (since you would need to look at other rows), and even if it were possible, MySQL doesn't actually enforce check constraints.
The only way (I know of) to get the database to enforce such a constraint would be a TRIGGER that looks for the existence of another row for which the affected (inserted/updated) row would conflict.
You'd need both a BEFORE INSERT trigger and a BEFORE UPDATE trigger. The trigger would need to query the table to check for the existence of a row that "overlaps" the new/modified row:
SELECT 1
FROM mytable t
WHERE t.a_id = NEW.a_id
AND t.b_id = NEW.b_id
AND t.from <> OLD.from
AND < (t.from, t.to) overlaps (NEW.from,NEW.to) >
Obviously, that last line is pseudocode for the actual syntax that would be required.
The line before that would only be needed in the BEFORE UPDATE trigger, so we don't find (as a "match") the row being updated. The actual check there would really depend on the selection of the PRIMARY KEY (or UNIQUE KEY)(s).
With MySQL 5.5, we can use the SIGNAL statement to return an error, if we find the new/updated row would violate the constraint. With previous versions of MySQL, we can "throw" an error by doing something that causes an actual error to occur, such as running a query against a table name that we know does not exist.
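For example, here is a minimal sketch of what the BEFORE INSERT trigger could look like on MySQL 5.5, assuming the relationship table is called mytable with columns a_id, b_id, `from` and `to` (the names are placeholders, not the actual schema):

DELIMITER $$
CREATE TRIGGER mytable_before_insert BEFORE INSERT ON mytable
FOR EACH ROW
BEGIN
  -- reject the insert if an existing row for the same A:B pair overlaps the new range
  IF EXISTS (SELECT 1
               FROM mytable t
              WHERE t.a_id = NEW.a_id
                AND t.b_id = NEW.b_id
                AND t.`from` < NEW.`to`
                AND t.`to`   > NEW.`from`) THEN
    SIGNAL SQLSTATE '45000'
      SET MESSAGE_TEXT = 'Overlapping date range for this A:B pair';
  END IF;
END$$
DELIMITER ;

The BEFORE UPDATE version adds the extra predicate to exclude the row being updated, as noted above. To enforce the car/driver rules completely (no overlap per car and no overlap per driver), the a_id/b_id equality would be tested with OR rather than AND.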
And finally, this type of functionality doesn't necessarily have to be implemented in a database trigger; this could be handled on the client side.
How about three tables:
TCar, TDriver, TLog
TCar
pkCarID
fkDriverID
name
A unique index on fkDriverID ensures a driver is only ever in one car, turning the foreign key fkDriverID into a 1:1 relationship.
TDriver
pkDriverID
name
TLog
pkLogID (surrogate pk)
fkCarID
fkDriverID
from
to
With 2 joins you will get any information you describe. If you just need to find car data by DriverID or driver data by CarID, you can do it with one join.
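A minimal DDL sketch of those three tables (column types and sizes are assumptions):

CREATE TABLE TDriver (
  pkDriverID INT AUTO_INCREMENT PRIMARY KEY,
  name       VARCHAR(100) NOT NULL
);

CREATE TABLE TCar (
  pkCarID    INT AUTO_INCREMENT PRIMARY KEY,
  fkDriverID INT NULL,
  name       VARCHAR(100) NOT NULL,
  UNIQUE KEY uq_driver (fkDriverID),          -- a driver is only ever in one car
  FOREIGN KEY (fkDriverID) REFERENCES TDriver (pkDriverID)
);

CREATE TABLE TLog (
  pkLogID    INT AUTO_INCREMENT PRIMARY KEY,  -- surrogate pk
  fkCarID    INT NOT NULL,
  fkDriverID INT NOT NULL,
  `from`     DATETIME NOT NULL,
  `to`       DATETIME NULL,
  FOREIGN KEY (fkCarID)    REFERENCES TCar (pkCarID),
  FOREIGN KEY (fkDriverID) REFERENCES TDriver (pkDriverID)
);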
Thank you everyone for your input. So far I am thinking about this approach, and would be thankful for any criticism / pointing out of flaws:
Tables (pseudo SQL code):
Car (ID pk auto_increment, name)
Driver(ID pk auto_increment, name)
Assignment (CarID unique,DriverID unique,from Datetime), composite PK (CarID,DriverID)
AssignmentHistory (CarID unique,DriverID unique,from Datetime,to Datetime) no pk
of course, CarID is a FK to Car(ID) and DriverID is a FK to Driver(ID)
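A minimal MySQL rendering of that pseudo SQL (column types are assumptions; note that the unique keys are shown only on Assignment, since rows in AssignmentHistory repeat over time):

CREATE TABLE Car    (ID INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE Driver (ID INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100));

CREATE TABLE Assignment (
  CarID    INT NOT NULL,
  DriverID INT NOT NULL,
  `from`   DATETIME NOT NULL,
  PRIMARY KEY (CarID, DriverID),
  UNIQUE KEY uq_car    (CarID),     -- a car has at most one current driver
  UNIQUE KEY uq_driver (DriverID),  -- a driver has at most one current car
  FOREIGN KEY (CarID)    REFERENCES Car (ID),
  FOREIGN KEY (DriverID) REFERENCES Driver (ID)
);

CREATE TABLE AssignmentHistory (
  CarID    INT NOT NULL,
  DriverID INT NOT NULL,
  `from`   DATETIME NOT NULL,
  `to`     DATETIME NOT NULL,
  FOREIGN KEY (CarID)    REFERENCES Car (ID),
  FOREIGN KEY (DriverID) REFERENCES Driver (ID)
);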
The next stage is two triggers (and boy oh boy, I hope this can be done in MySQL; it works on MSSQL, but I don't have a MySQL DB handy right now to test):
!!! Warning, MSSQL for now
create trigger Assignment_Update on Assignment instead of update as
    delete Assignment
    from Assignment
    join inserted
      on (   inserted.CarID     = Assignment.CarID
          or inserted.DriverID  = Assignment.DriverID)
     and (   inserted.CarID    <> Assignment.CarID
          or inserted.DriverID <> Assignment.DriverID);

    insert into Assignment
    select * from inserted;
create trigger Assignment_Insert on Assignment after delete as
    insert into AssignmentHistory
    select CarID, DriverID, [from], GETDATE() from deleted;
I tested it a bit and it seems that for each business case it does what I need it to do.
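On the MySQL side, the archiving part can be expressed as an AFTER DELETE trigger; a rough sketch, assuming the tables above (MySQL has no INSTEAD OF triggers, so the conflict-resolving update trigger would need a different approach, e.g. BEFORE INSERT/UPDATE checks with SIGNAL as described earlier):

DELIMITER $$
CREATE TRIGGER Assignment_after_delete AFTER DELETE ON Assignment
FOR EACH ROW
BEGIN
  -- archive the closed assignment with the deletion time as its end
  INSERT INTO AssignmentHistory (CarID, DriverID, `from`, `to`)
  VALUES (OLD.CarID, OLD.DriverID, OLD.`from`, NOW());
END$$
DELIMITER ;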
Related
State description
I have two databases, DB1 and DB2, that have the same table, Author, with the fields Author.ID and Author.AuthorName.
The DB1.Author has the AUTO_INCREMENT on its Author.ID field, while the DB2.Author does not have the AUTO_INCREMENT since it relies on the correctness of DB1 data.
Both tables have the PRIMARY index on Author.ID and a UNIQUE index on Author.AuthorName.
DB2.Author has rows copied from the DB1.Author.
Both databases use MariaDB version 10.6.7.
The problem
The DB1 manager deleted some entries in the DB1.Author table, and then renumbered the IDs to leave no gaps. This means they might have had:
ID | AuthorName
---+-----------
 1 | A
 2 | B
 3 | C
Then they deleted the row where the AuthorName was 'B':
ID | AuthorName
---+-----------
 1 | A
 3 | C
And they finally updated the indexes to have no gaps (3-C changed to 2-C):
ID | AuthorName
---+-----------
 1 | A
 2 | C
Now I need to find a way to copy the updated state of the rows from DB1.Author to DB2.Author without deleting everything from the DB2.Author table, so that I don't lose data to CASCADE effects.
What is the best approach for this?
My shot
This is what I did, but it obviously cannot work, since in the case of duplicate key, it would attempt to create another duplicate key (duplicate ID 2 would try to INSERT duplicate value of 'C', since it already exists on ID 3):
INSERT INTO DB2.Author (ID, AuthorName)
SELECT DB1.Author.ID, DB1.Author.AuthorName FROM DB1.Author
ON DUPLICATE KEY UPDATE
ID = DB1.Author.ID,
AuthorName = DB1.Author.AuthorName;
Additional ways?
Other than the possible SQL query solution, are there any other ways to automatically update the table data in one database when the other database changes its data? I would need to replicate only some tables, while other, linked tables are different.
tl;dr your problem is your DB manager. The solution is to get him/her to undo the damage they caused by restoring the data to how it was before. Deleting rows is fine. Updating primary keys is never OK.
Do not create a workaround or validate the mistake by accommodating it, because doing so will make it more likely that it will happen again.
Full answer.
Your actual problem is your "DB manager", who violated a fundamental rule of databases: Never update surrogate key values!
In your case it's even more tragic, because gaps in the ID column values don't matter in any way. If gaps do matter, you're in even worse shape. Allow me to explain...
The author's name is your actual identifier. We know this because there is a unique constraint on it.
The ID column is a surrogate key, which is most conveniently implemented as an auto-incrementing integer, but surrogate keys would work just as well if they were random (unique) numbers. Gaps, and even the choice of values themselves, are irrelevant to the effectiveness of surrogate keys.
You need to treat the DB2 table as completely wrong, as the update of primary keys on the source table has completely spoilt it.
Delete everything in the DB2 table.
Insert into the DB2 table everything from the DB1 table.
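A minimal sketch of those two steps (how any rows in DB2 that reference Author, and their CASCADE rules, are handled is a separate decision):

START TRANSACTION;

DELETE FROM DB2.Author;

INSERT INTO DB2.Author (ID, AuthorName)
SELECT ID, AuthorName
FROM DB1.Author;

COMMIT;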
Going forwards, without being condescending, the users with access to DB1 need training (or perhaps you need to reconsider their access to the DB). Updating a primary key value is a wrong thing to do. A gapless sequence is a silly thing to want, especially when you have known dependencies. In fact, gapless sequences are often cited as a sign of poor database security (as they make it easy to just cycle through all the data).
You probably want to consider commercial solutions for logical data replication. If they don't support updates of primary keys, you can use that as a good enough reason not to allow them.
I would invest time in making sure there are no other logical corruptions of data like this.
I want to generate an id number for my user table.
The id number is a unique index.
Here is my trigger:
USE `schema_epolling`;
DELIMITER $$
CREATE DEFINER=`root`@`localhost` TRIGGER `tbl_user_BINS` BEFORE INSERT ON `tbl_user`
FOR EACH ROW
BEGIN
  SET NEW.id_number = CONCAT(
      DATE_FORMAT(NOW(), '%y'),
      LPAD((SELECT auto_increment
              FROM information_schema.tables
             WHERE table_schema = 'schema_epolling'
               AND table_name = 'tbl_user'), 6, 0));
END$$
It works if I insert one row at a time, or maybe 5 rows at a time.
But if I insert rows in bulk, an error occurs on the id number column.
Here's the code I use for inserting bulk rows from another schema/table:
INSERT INTO schema_epolling.tbl_user (last_name, first_name)
SELECT last_name, first_name
FROM schema_nc.tbl_person
Here's the error:
Error Code: 1062. Duplicate entry '14000004' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000011' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000018' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000025' for key 'id_number_UNIQUE'
Error Code: 1062. Duplicate entry '14000032' for key 'id_number_UNIQUE'
If I use the UUID() function it works fine, but I don't want UUID(); it's too long.
You don't want to generate id values that way.
The auto-increment value for the current INSERT is not generated yet at the time the BEFORE INSERT trigger executes.
Even if it were, the INFORMATION_SCHEMA would contain the maximum auto-increment value generated by any thread, not just the thread executing the trigger. So you would have a race condition that would easily conflict with other concurrent inserts and get the wrong value.
Also, querying INFORMATION_SCHEMA on every INSERT is likely to be a bottleneck for your performance.
In this case, to get the auto-increment value formatted with the two-digit year number prepended, you could advance the table's auto-increment value to the two-digit year times one million (14000000 for 2014), and then when you reach January 1 2015 you would ALTER TABLE to advance it again.
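A minimal sketch of that approach, using the tbl_user table from the question (the id column itself then carries the year prefix, so no trigger is needed):

ALTER TABLE tbl_user AUTO_INCREMENT = 14000000;

-- and on January 1 2015, advance it again:
ALTER TABLE tbl_user AUTO_INCREMENT = 15000000;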
Re your comments:
The answer I gave above applies to how MySQL's auto-increment works. If you don't rely on auto-increment, you can generate the values by some other means.
Incrementing another one-row table as @Vatev suggests (though this creates a relatively long-lived lock on that table, which could be a bottleneck for your inserts).
Generating values in your application, based on a central, atomic id-generator like memcached. See other ideas here: Generate unique IDs in a distributed environment
Using UUID(). Yes, sorry, it's 32 characters long. Don't truncate it or you will lose uniqueness.
But combining triggers with auto-increment in the way you show simply won't work.
I'd like to add my two cents to expound on Bill Karwin's point.
It's better that you don't generate a Unique ID by attempting to manually cobble one together.
The fact that your school produces an ID in that way does not mean that's the best way to do it (assuming that is what they are using that generated value for, which I can't know without more information).
Your database work will be simpler and less error prone if you accept that the purpose for an ID field (or key) is to guarantee uniqueness in each row of data, not as a reference point to store certain pieces of human readable data in a central spot.
This type of ID/key is known as a surrogate key.
If you'd like to read more about them here's a good article: http://en.wikipedia.org/wiki/Surrogate_key
It's common for a surrogate key to also be the primary key of a table, (and when it's used in this way it can greatly simplify creating relationships between tables).
If you would like to add a secondary column that concatenates date values and other information because that's valuable for an application you are writing, or any other purpose you see fit, then create that as a separate column in your table.
Thinking of an ID column/key in this fire-and-forget way may simplify the concept enough that you experience a number of benefits in your database creation efforts.
As an example, should you require uniqueness between unassociated databases, you will more easily be able to stomach the use of a UUID.
(Because you'll know its purpose is merely to ensure uniqueness, NOT to be useful to you in any other way.)
Additionally, as you've found, taking the responsibility on yourself, instead of relying on the database, to produce a unique value adds time consuming complexity that can otherwise be avoided.
Hope this helps.
In a program, should we use try catch to check insertion of duplicate values into tables, or should we check if the value is already present in the table and avoid insertion?
This is easy enough to enforce with a UNIQUE constraint on the database side, so that's my recommendation. I try to put as much of the data integrity into the database as possible, so that I can avoid having bad data (although sometimes that's unavoidable).
If this is how you already have it, you might as well just catch the MySQL exception for duplicate value insertion on such a table, as doing the check and then the insertion is more costly than having the database do one simple lookup (and possibly an insert).
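If you would rather keep that handling next to the data than in application code, a stored routine can catch the duplicate-key error (1062) itself. A minimal sketch, assuming a hypothetical login table with a UNIQUE loginname column:

DELIMITER $$
CREATE PROCEDURE add_login(IN p_name CHAR(20))
BEGIN
  -- 1062 is MySQL's duplicate-entry error code
  DECLARE EXIT HANDLER FOR 1062
    SELECT 'duplicate login name' AS result;

  INSERT INTO login (loginname) VALUES (p_name);
  SELECT 'inserted' AS result;
END$$
DELIMITER ;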
It depends on whether you are inserting one row or a million, as well as whether the duplicate is the primary key.
If it's the primary key, read: http://database-programmer.blogspot.com/2009/06/approaches-to-upsert.html
An UPSERT or ON DUPLICATE KEY... The idea behind an UPSERT is simple.
The client issues an INSERT command. If a row already exists with the
given primary key, then instead of throwing a key violation error, it
takes the non-key values and updates the row.
This is one of those strange (and very unusual) cases where MySQL
actually supports something you will not find in all of the other more
mature databases. So if you are using MySQL, you do not need to do
anything special to make an UPSERT. You just add the term "ON
DUPLICATE KEY UPDATE" to the INSERT statement:
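For instance, a minimal sketch (not from the article), assuming a hypothetical page_hits table whose page column is the primary key:

INSERT INTO page_hits (page, hits)
VALUES ('/home', 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;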
If it's not the primary key, and you are inserting just one row, then you can still make sure this doesn't cause a failure.
For your actual question, I don't really like the idea of using try/catch for program flow, but really, you have to evaluate readability and user experience (in this case performance), and pick what you think is the best mix of the two.
You can add a UNIQUE constraint to your table, something like:
CREATE TABLE IF NOT EXISTS login
(
loginid SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
loginname CHAR(20) NOT NULL,
UNIQUE (loginname)
);
This will ensure no two login names are the same.
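With that constraint in place, a second insert of the same name is rejected with a duplicate-entry error, along these lines:

INSERT INTO login (loginname) VALUES ('alice');
INSERT INTO login (loginname) VALUES ('alice');
-- ERROR 1062 (23000): Duplicate entry 'alice' for key 'loginname'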
You can create a unique composite key:
ALTER TABLE `TableName` ADD UNIQUE KEY (KeyOne, KeyTwo, ...);
You just need to create a unique key in your table so that it will not permit the same value to be added again.
You should try inserting the value and catch the exception. In a busy system, if you check for the existence of a value, it might get inserted between the time you check and the time you insert it.
Let the database do its job; let the database check for the duplicate entry.
A database is a computerized representation of a set of business rules, and a DBMS is used to enforce these business rules as constraints. Neither can verify that a proposition in the database is true in the real world. For example, if the model in question is the employees of an enterprise and the Employees table contains two people named 'Jimmy Barnes', neither the DBMS nor the database can know whether one is a duplicate, whether either is a real person, etc. A trusted source is required to determine existence and identity. In the above example, the enterprise's personnel department is responsible for checking public records, perusing references, ensuring the person is not already on the payroll, etc., then allocating a unique employee reference number that can be used as a key. This is why we look for industry-standard identifiers with a trusted source: ISBN for books, VIN for cars, ISO 4217 for currencies, ISO 3166 for countries, etc.
I think it is better to check if the value already exists and avoid the insertion. The check for duplicate values can be done in the procedure that saves the data (using exists if your database is an SQL database).
If a duplicate exists you avoid the insertion and can return a value to your app indicating so and then show a message accordingly.
For example, a piece of SQL code could be something like this:
select @ret_val = 0

if exists (select * from employee where last_name = @param_ln and first_name = @param_fn)
    select @ret_val = -1
else
    insert into employee (last_name, first_name)  -- your insert statement here
    values (@param_ln, @param_fn)

select @ret_val
Your condition for duplicate values will depend on what you define as a duplicate record. In your application you would use the return value to know if the data was a duplicate. Good luck!
I have an invoices table which stores a single record for each invoice, with the id column (int AUTO_INCREMENT) being the primary key, but also serving as the invoice reference number.
Now, unfortunately I've had to manually migrate some invoices generated on an old system, which have a five-digit id instead of the four-digit one which the current system uses.
However, even when I reset the AUTO_INCREMENT through phpMyAdmin (Table Operations) back to the next four-digit id, it still inserts a five-digit one, being the highest id currently in the table plus one.
From searching around, it would seem that I actually need to change the insert_id as well as the AUTO_INCREMENT ? I've tried to execute ALTER TABLE invoices SET insert_id=8125 as well as ALTER TABLE invoices insert_id=8125 but neither of these commands seem to be valid.
Can anyone explain the correct way that I can reset the AUTO_INCREMENT so that it will insert records with ids from 8125 onwards, and then when it gets to 10962 it will skip over the four records I've manually added and continue sequential ids from 10966 onwards? If it won't skip over 10962 - 10966 then this doesn't really matter, as the company doesn't generate that many invoices each year, so this will occur in a subsequent year and hence hopefully not cause a problem.
I would really appreciate any help with this sticky situation I've found myself in! Many Thanks
First thing I'll suggest is to ditch PHPMyAdmin because it's one of the worst "applications" ever made to be used to work with MySQL. Get a proper GUI. My favourite is SQLYog.
Now on to the problem. Never, ever tamper with the primary key, don't try to "reset" it as you said or to update columns that have an integer generated by the database. As for why, the topic is broad and can be discussed in another question, just never, ever touch the primary key once you've set it up.
The second thing is that someone was deleting invoice records, hence the auto-increment is now at 10k+ rather than at 8k+. That's not a bad thing, but if you need sequential values for your invoices (such that there can't be a gap between invoices 1 and 5), then use an extra field called sequence_id or invoice_ref and use triggers to calculate that number. Don't rely on the auto_increment feature to reuse numbers that have been lost through a DELETE operation.
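A rough sketch of that trigger idea, assuming a hypothetical invoice_ref column has been added to the invoices table (a MAX()+1 approach like this also needs locking, or a separate counter table, to stay gapless under concurrent inserts):

DELIMITER $$
CREATE TRIGGER invoices_before_insert BEFORE INSERT ON invoices
FOR EACH ROW
BEGIN
  -- next reference number = current highest + 1
  SET NEW.invoice_ref = (SELECT COALESCE(MAX(i.invoice_ref), 0) + 1
                           FROM invoices i);
END$$
DELIMITER ;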
Alternatively, what you can do is export the database you've been using, find the CREATE TABLE definition for the invoices table, and find the line where it says "AUTO_INCREMENT = [some number]" and delete that statement. Import into your new database and the auto_increment will continue from the latest invoice. You could do the same by using ALTER TABLE however it's safer to re-import.
I am considering designing a relational DB schema for a DB that never actually deletes anything (sets a deleted flag or something).
1) What metadata columns are typically used to accommodate such an architecture? Obviously a boolean flag for IsDeleted can be set. Or maybe just a timestamp in a Deleted column works better, or possibly both. I'm not sure which method will cause me more problems in the long run.
2) How are updates typically handled in such architectures? If you mark the old value as deleted and insert a new one, you will run into PK unique constraint issues (e.g. if you have PK column id, then the new row must have the same id as the one you just marked as invalid, or else all of your foreign keys in other tables for that id will be rendered useless).
If your goal is auditing, I'd create a shadow table for each table you have. Add some triggers that get fired on update and delete and insert a copy of the row into the shadow table.
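A minimal MySQL-flavoured sketch of that idea, assuming a hypothetical widget table with columns id and name:

CREATE TABLE widget_shadow (
  shadow_id  INT AUTO_INCREMENT PRIMARY KEY,
  widget_id  INT NOT NULL,
  name       VARCHAR(100),
  action     VARCHAR(10) NOT NULL,   -- 'UPDATE' or 'DELETE'
  changed_at DATETIME NOT NULL
);

CREATE TRIGGER widget_after_update AFTER UPDATE ON widget
FOR EACH ROW
  INSERT INTO widget_shadow (widget_id, name, action, changed_at)
  VALUES (OLD.id, OLD.name, 'UPDATE', NOW());

CREATE TRIGGER widget_after_delete AFTER DELETE ON widget
FOR EACH ROW
  INSERT INTO widget_shadow (widget_id, name, action, changed_at)
  VALUES (OLD.id, OLD.name, 'DELETE', NOW());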
Here are some additional questions that you'll also want to consider:
How often do deletes occur, and what's your performance budget like? This can affect your choices. The answer for your design will be different depending on whether a user is deleting a single row (let's say an answer on a Q&A site) vs deleting records on an hourly basis from a feed.
How are you going to expose the deleted records in your system? Is it only for administrative purposes, or can any user see deleted records? This makes a difference because you'll probably need to come up with a filtering mechanism depending on the user.
How will foreign key constraints work? Can one table reference another table where there's a deleted record?
When you add or alter existing tables, what happens to the deleted records?
Typically the systems that care a lot about audit use shadow tables as Steve Prentice mentioned. Such a table often has every field from the original table with all the constraints turned off, plus an action field to track updates vs deletes and a date/timestamp of the change along with the user.
For an example see the PostHistory Table at https://data.stackexchange.com/stackoverflow/query/new
I think what you're looking for here is typically referred to as "knowledge dating".
In this case, your primary key would be your regular key plus the knowledge start date.
Your end date might either be null for a current record or an "end of time" sentinel.
On an update, you'd typically set the end date of the current record to "now" and insert a new record that starts at the same "now" with the new values.
On a "delete", you'd just set the end date to "now".
I've done that.
2.a) A version number solves the unique constraint issue somewhat, although that's really just relaxing the uniqueness, isn't it?
2.b) You can also archive the old versions into another table.