Merging two table entries with unique columns (MySQL)

I know full well this should never happen. Ever. However, I started working at a company recently that hasn't had the greatest database design or input validation and this situation has come up.
There is a table which we'll call 'jobs'*. Jobs has a primary key, 'ID'. The job with the ID of 1 has loads of data associated with it; however, someone has stupidly duplicated that job as id 2 (this has happened around 500 times so far). All of the information for both needs to be merged as id 1 (or 2, it doesn't matter).
The referencing columns ARE linked by foreign keys with ON UPDATE CASCADE and ON DELETE RESTRICT. They are not all called jobs_id.
Is my only (seemingly sensible) option here to:
1) Change id 1 to something I can guarantee is not used (2,147,483,647)
2) Temporarily remove the Foreign Key DELETE: RESTRICT
3) Delete the entry with id 1
4) Update id 2 to 2,147,483,647 (to link it with all the other entries)
5) Change id 2,147,483,647 to id 2
6) Reinstate DELETE: RESTRICT
None of the code actually performs a delete; the restriction is there just as a fail-safe against someone editing directly in the DB. Since the update cascade is left in, data shouldn't get out of sync. This does seem messy though.
This will be wrapped in a transaction.
I could write something to iterate through each table (~180) and each column to find certain names / conditions, then update from 1 to 2, but that would need maintenance when a new table / column came along.
As this has happened a lot, and I don't see a re-write to prevent it happening any time soon, the 'solution' (sticking plaster) needs to be semi-automatic.
* Not the table's real name. His (or her) identity has been disguised so he (or she) doesn't get bullied.
Appreciate any input.

Assuming that you know how to identify the duplicated records, why not create a new table with the same structure (maybe without the FKs), then loop through the original while copying values to the new table? When you hit a duplicate, fix the value when writing to the new table. Then drop the original and rename the new table to the original name.
This will clean up the table, but if processes are still creating duplicated entries, you could add a unique key to limit the damage going forward.
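The question's "semi-automatic" idea can also be driven by the foreign key metadata itself, so that new tables and columns are picked up without maintenance. The following is a minimal sketch under stated assumptions, not a drop-in script: it uses Python's sqlite3 so it can run anywhere, the child tables job_notes and invoices and the function names are hypothetical, and SQLite's PRAGMA foreign_key_list stands in for MySQL's information_schema.KEY_COLUMN_USAGE (filtered on REFERENCED_TABLE_NAME = 'jobs').

```python
import sqlite3

def referencing_columns(conn, parent_table):
    """Return (child_table, child_column, parent_column) for each FK on parent_table."""
    refs = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            if fk[2] == parent_table:
                refs.append((table, fk[3], fk[4]))
    return refs

def merge_job(conn, keep_id, duplicate_id):
    """Repoint every child row from duplicate_id to keep_id, then drop the duplicate."""
    with conn:  # commits on success, rolls back if any statement fails
        for table, column, _ in referencing_columns(conn, "jobs"):
            conn.execute(
                f'UPDATE "{table}" SET "{column}" = ? WHERE "{column}" = ?',
                (keep_id, duplicate_id))
        # No child rows reference the duplicate any more, so RESTRICT allows this.
        conn.execute("DELETE FROM jobs WHERE ID = ?", (duplicate_id,))

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE jobs (ID INTEGER PRIMARY KEY, title TEXT);
    -- Hypothetical child tables; note the differently named FK columns.
    CREATE TABLE job_notes (
        id INTEGER PRIMARY KEY,
        jobs_id INTEGER REFERENCES jobs (ID) ON UPDATE CASCADE ON DELETE RESTRICT,
        note TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        job_ref INTEGER REFERENCES jobs (ID) ON UPDATE CASCADE ON DELETE RESTRICT);
    INSERT INTO jobs VALUES (1, 'Fit kitchen'), (2, 'Fit kitchen');
    INSERT INTO job_notes VALUES (1, 1, 'quoted'), (2, 2, 'deposit paid');
    INSERT INTO invoices VALUES (1, 2);
""")
merge_job(conn, keep_id=1, duplicate_id=2)
```

Because the child rows are repointed before the duplicate is deleted, the DELETE RESTRICT rule never has to be dropped. One caveat: if a child table has a unique constraint that includes the job column, the UPDATE can fail, and those rows would need their own merge rule.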

Related

Foreign key to dynamically changing record

Let's say I have a table with a lot of records, "employee_hour", and another with far fewer records, "turnover_export". employee_hour has a foreign key to turnover_export. However, sometimes the linked turnover_export becomes outdated and needs to be updated for a lot of employee_hour records.
How it works currently (this way has become too slow):
For each employee_hour record where the linked turnover_export is outdated, a new and identical employee_hour record is inserted with a link to the updated turnover_export.
How could I design a db/process to handle this as efficiently as possible?
Notable points:
1) Don't want to update the employee_hour records with the new turnover_export (because this means inserting new employee_hour records for each employee_hour record where the linked turnover_export is outdated, -> point 4 of this list)
2) Don't want to use triggers
3) The design should be tied as little as possible to a certain platform/db-engine
4) History should be kept, nothing gets deleted, updates mean new inserts
What I have thought of:
Link to a foreign key which is not an insert id but a UUID, then keep track with another column in turnover_export of whether it is the one that should be used. This doesn't work, because a foreign key from a single column cannot reference multiple columns.
I know I haven't thought of or tried much, but honestly I can't think of anything else.
Some context on what these tables are used for:
We want to report on the turnover generated for certain employee hours. We do this by getting an employee_hour record and then multiplying a certain column in this record by a certain column in the linked turnover_export.
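One possible direction, offered only as a sketch and not as an established answer to this question: point employee_hour at a stable key and keep the versions of turnover_export in insert-only rows, so that publishing a new version touches no employee_hour rows. The table turnover_export_key, the columns export_key, version, rate and hours, and both function names are assumptions made for illustration. Python's sqlite3 is used so the sketch runs as-is; the SQL avoids engine-specific features and uses no triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- One row per logical export; employee_hour points here and never moves.
    CREATE TABLE turnover_export_key (export_key INTEGER PRIMARY KEY);
    -- One row per version of an export; rows are inserted, never updated.
    CREATE TABLE turnover_export (
        id INTEGER PRIMARY KEY,
        export_key INTEGER NOT NULL REFERENCES turnover_export_key (export_key),
        version INTEGER NOT NULL,
        rate REAL NOT NULL,
        UNIQUE (export_key, version));
    CREATE TABLE employee_hour (
        id INTEGER PRIMARY KEY,
        export_key INTEGER NOT NULL REFERENCES turnover_export_key (export_key),
        hours REAL NOT NULL);
""")

def publish_version(conn, export_key, rate):
    """Insert a new version for export_key; existing rows are left untouched."""
    with conn:
        known = conn.execute(
            "SELECT 1 FROM turnover_export_key WHERE export_key = ?",
            (export_key,)).fetchone()
        if not known:
            conn.execute(
                "INSERT INTO turnover_export_key (export_key) VALUES (?)",
                (export_key,))
        version = conn.execute(
            "SELECT COALESCE(MAX(version), 0) + 1 FROM turnover_export "
            "WHERE export_key = ?", (export_key,)).fetchone()[0]
        conn.execute(
            "INSERT INTO turnover_export (export_key, version, rate) "
            "VALUES (?, ?, ?)", (export_key, version, rate))
    return version

def turnover_report(conn):
    """Multiply each employee_hour row by the latest version of its export."""
    return conn.execute("""
        SELECT eh.id, eh.hours * te.rate
        FROM employee_hour eh
        JOIN turnover_export te ON te.export_key = eh.export_key
        WHERE te.version = (SELECT MAX(version) FROM turnover_export
                            WHERE export_key = eh.export_key)
        ORDER BY eh.id""").fetchall()

publish_version(conn, 1, 50.0)
with conn:
    conn.executemany(
        "INSERT INTO employee_hour (export_key, hours) VALUES (?, ?)",
        [(1, 8.0), (1, 4.0)])
before = turnover_report(conn)
publish_version(conn, 1, 55.0)   # one insert, no employee_hour rows touched
after = turnover_report(conn)
```

The trade-off is that employee_hour no longer records which version a past report used; if that matters, the report itself could store the version number it read.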

MySQL: Insert a new row at a specific primary key, or alternately, bump all subsequent rows down?

I am creating two tables in a database in MySQL just so I can play around with SQL and learn more, as I am a novice. I have read several questions on Stack relating to inserting a new row, and updating an existing row. My question is a little different, hopefully it won't be considered a dupe as none of the other answers I read gave me the full explanation I need because I think it's the auto-increment part that's confusing me. I don't think I can just go in and assign a new value for the primary keys in one of the tables with auto-increment set up, can I?
I have two tables: english_words and spanish_words. Their primary keys are respectively eng_id and span_id, and are set up to auto-increment. My hope had been to practice SQL and eventually get things set up enough so that I can practice my joins later on. For now, in english_words, I entered a duplicate row by mistake, with the ID 7. I deleted that row, and of course it now goes "6...8..." ..... and when I created my spanish_words table, I forgot all about the missing row 7. I'd hoped to keep everything very simple and aligned between the two tables until I'm ready for more complex endeavors later. Is there a way I can either:
Bump row 7 (and all subsequent rows) down by one in my spanish_words (so 7 becomes 8, 8 becomes 9, etc)
OR
Pull up everything after row 6 in english_words?
OR
Is there a better solution than either of those that you could suggest?
It's possible there's not a way. Originally I'd thought of trying to UPDATE the row 7 data in english_words or maybe insert a new row, but in my research I found an answer on Stack that said you can't insert data into a specific row in the table...and then I realized that's not going to fix anything anyway.
Do those of you more experienced with SQL have any ideas? (Aside from not making such silly mistakes anyway).
Additionally, I'm open to scrapping my tables and starting again, if there's a best-practice that I'm missing. Would setting up a foreign key to correspond between the two tables be a way to fix this? I'm pretty sure you have to do that anyway to perform the joins, but I was going to cross that bridge when I get there. What is best practice amongst database admins - set up foreign keys early on, or later when you need them?
Thanks in advance for your guidance.
A better way to set this up is to create a relation table:
CREATE TABLE translation (
    eng_id int,
    span_id int,
    FOREIGN KEY (eng_id) REFERENCES english_words (eng_id),
    FOREIGN KEY (span_id) REFERENCES spanish_words (span_id)
);
This is better than using a foreign key in the original tables, because you can't easily have bidirectional foreign keys: the referenced row has to exist before the referencing row, so whichever table you insert into first can't have a foreign key pointing to the other one.
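To illustrate why the relation table removes the need for aligned IDs, here is a small sketch using Python's sqlite3 in place of MySQL. The word column and the sample rows are assumptions; the IDs are deliberately misaligned, with the gap at 7 in english_words as described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE english_words (eng_id INTEGER PRIMARY KEY, word TEXT);
    CREATE TABLE spanish_words (span_id INTEGER PRIMARY KEY, word TEXT);
    CREATE TABLE translation (
        eng_id INTEGER,
        span_id INTEGER,
        FOREIGN KEY (eng_id) REFERENCES english_words (eng_id),
        FOREIGN KEY (span_id) REFERENCES spanish_words (span_id));
    -- english_words has no row 7; spanish_words does.
    INSERT INTO english_words VALUES (6, 'dog'), (8, 'cat');
    INSERT INTO spanish_words VALUES (6, 'perro'), (7, 'gato');
    -- The pairing lives here, so the two id sequences never need to match.
    INSERT INTO translation VALUES (6, 6), (8, 7);
""")

pairs = conn.execute("""
    SELECT e.word, s.word
    FROM translation t
    JOIN english_words e ON e.eng_id = t.eng_id
    JOIN spanish_words s ON s.span_id = t.span_id
    ORDER BY e.eng_id""").fetchall()
print(pairs)
```

The join follows the rows in translation rather than comparing eng_id with span_id, so neither table has to be renumbered.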

I want to reuse the gaps of the deleted rows

I have an auto-increment primary key on one of my tables. If I have 3 rows and, for example, delete the third row, I'm left with two. However, if I insert a new row its ID is automatically 4, and the IDs are 1, 2 and 4.
How can I re-use the deleted ID and have the ID of the newly inserted record be 3 automatically?
Really, you shouldn't. Primary keys should be purely technical, meaningless values. Their values, and whether they form a gapless sequence, shouldn't matter at all.
Moreover, since it's the PK of the row, you'll have potentially dozens (or thousands) of other rows in other tables referencing this ID (foreign keys), so changing it in the table would not be enough: you would have to change it everywhere.
And there's a good chance that this ID is also referenced in other applications (for example, it could be part of a bookmarked URL in a browser), and changing its value would make all these references invalid.
You should never change a primary key. It should be immutable, forever.
EDIT: I misread the question. You actually want to reuse an old ID. This is also a bad idea. Existing references would reference something other than they initially referenced. This is what happens when you change your phone number and it's being reused by someone else, who starts receiving lots of calls from people who still think this phone number is yours. Very annoying. You want to avoid this situation.
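The behaviour described in the question, and one way to live with it, can be sketched as follows. This uses Python's sqlite3, where the AUTOINCREMENT keyword gives the same "never reuse a deleted ID" behaviour the question describes for MySQL; the items table and name column are assumptions. If a gapless number is needed for display, it can be computed when reading rather than stored in the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)", [("a",), ("b",), ("c",)])

# Delete the third row, then insert a new one: the counter keeps going.
conn.execute("DELETE FROM items WHERE id = 3")
new_id = conn.execute("INSERT INTO items (name) VALUES ('d')").lastrowid
ids = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]

# A display position without gaps, derived at read time; the key is untouched.
names = [r[0] for r in conn.execute("SELECT name FROM items ORDER BY id")]
positions = list(enumerate(names, start=1))
```

Here the stored IDs stay 1, 2 and 4, while positions numbers the rows 1 to 3 for display only.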

MySQL PhpMyAdmin: Alter AUTO_INCREMENT and/or INSERT_ID

I have an invoices table which stores a single record for each invoice, with the id column (int AUTO_INCREMENT) being the primary key, but also the invoice reference number.
Now, unfortunately I've had to manually migrate some invoices generated on an old system which have a five-digit id, instead of the four-digit one which the current system uses.
However, even when I reset the AUTO_INCREMENT through PhpMyAdmin (Table Operations) back to the next four-digit id, it still inserts a five-digit one, being the highest id currently in the table plus one.
From searching around, it would seem that I actually need to change the insert_id as well as the AUTO_INCREMENT? I've tried to execute ALTER TABLE invoices SET insert_id=8125 as well as ALTER TABLE invoices insert_id=8125, but neither of these commands seems to be valid.
Can anyone explain the correct way to reset the AUTO_INCREMENT so that it will insert records with ids from 8125 onwards, and then, when it gets to 10962, skip over the four records I've manually added and continue with sequential ids from 10966 onwards? If it won't skip over 10962 - 10965 then this doesn't really matter, as the company doesn't generate that many invoices each year, so this will occur in a subsequent year and hopefully not cause a problem.
I would really appreciate any help with this sticky situation I've found myself in! Many Thanks
First thing I'll suggest is to ditch PHPMyAdmin because it's one of the worst "applications" ever made to be used to work with MySQL. Get a proper GUI. My favourite is SQLYog.
Now on to the problem. Never, ever tamper with the primary key, don't try to "reset" it as you said or to update columns that have an integer generated by the database. As for why, the topic is broad and can be discussed in another question, just never, ever touch the primary key once you've set it up.
Second thing is that someone was deleting invoice records, hence the auto-increment is now at 10k+ rather than at 8k+. It's not a bad thing, but if you need sequential values for your invoices (so that there can't be a gap between invoices 1 and 5), then use an extra field called sequence_id or invoice_ref and use triggers to calculate that number. Don't rely on the auto_increment feature to reuse numbers that have been lost through DELETE operations.
Alternatively, what you can do is export the database you've been using, find the CREATE TABLE definition for the invoices table, and find the line where it says "AUTO_INCREMENT = [some number]" and delete that statement. Import into your new database and the auto_increment will continue from the latest invoice. You could do the same by using ALTER TABLE however it's safer to re-import.
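The separate invoice_ref column suggested above could look roughly like this. It is a sketch under assumptions: it assigns the number in application code rather than in a trigger, it uses Python's sqlite3 instead of MySQL, and the migrated flag, the customer column and both function names are hypothetical. The primary key is left alone; only invoice_ref follows the numbering the question asks for, including skipping the four migrated references.

```python
import sqlite3

def next_free_ref(conn, start):
    """Return the smallest invoice_ref >= start that is not already taken."""
    ref = start
    while conn.execute(
            "SELECT 1 FROM invoices WHERE invoice_ref = ?", (ref,)).fetchone():
        ref += 1
    return ref

def add_invoice(conn, customer):
    """Insert an invoice numbered after the last non-migrated reference."""
    with conn:  # commits on success, rolls back on error
        last = conn.execute(
            "SELECT COALESCE(MAX(invoice_ref), 0) FROM invoices "
            "WHERE migrated = 0").fetchone()[0]
        ref = next_free_ref(conn, last + 1)
        conn.execute(
            "INSERT INTO invoices (invoice_ref, customer, migrated) "
            "VALUES (?, ?, 0)", (ref, customer))
    return ref

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY AUTOINCREMENT,   -- technical key, never edited
        invoice_ref INTEGER NOT NULL UNIQUE,    -- the number shown on the invoice
        customer TEXT,
        migrated INTEGER NOT NULL DEFAULT 0);   -- 1 = imported from the old system
    INSERT INTO invoices (invoice_ref, customer, migrated) VALUES
        (8124, 'last current-system invoice', 0),
        (10962, 'old system', 1), (10963, 'old system', 1),
        (10964, 'old system', 1), (10965, 'old system', 1);
""")
first_new = add_invoice(conn, "Acme")
```

With this data the first new invoice gets reference 8125, and once the current numbering reaches 10961 the next one lands on 10966. The UNIQUE constraint is the safety net if two sessions ever compute the same number; on a busy MySQL server the read and insert would also need proper locking or a retry.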

A Never Delete Relational DB Schema Design

I am considering designing a relational DB schema for a DB that never actually deletes anything (sets a deleted flag or something).
1) What metadata columns are typically used to accommodate such an architecture? Obviously a boolean flag for IsDeleted can be set. Or maybe just a timestamp in a Deleted column works better, or possibly both. I'm not sure which method will cause me more problems in the long run.
2) How are updates typically handled in such architectures? If you mark the old value as deleted and insert a new one, you will run into PK unique constraint issues (e.g. if you have PK column id, then the new row must have the same id as the one you just marked as invalid, or else all of your foreign keys in other tables for that id will be rendered useless).
If your goal is auditing, I'd create a shadow table for each table you have. Add some triggers that get fired on update and delete and insert a copy of the row into the shadow table.
Here are some additional questions that you'll also want to consider:
How often do deletes occur? What's your performance budget like? This can affect your choices. The answer for your design will differ depending on whether a user is deleting a single row (say, an answer on a Q&A site) or records are being deleted on an hourly basis from a feed.
How are you going to expose the deleted records in your system? Is it only for administrative purposes, or can any user see deleted records? This makes a difference because you'll probably need to come up with a filtering mechanism depending on the user.
How will foreign key constraints work? Can one table reference another table where there's a deleted record?
When you add or alter existing tables, what happens to the deleted records?
Typically the systems that care a lot about audit use tables as Steve Prentice mentioned. Such a table often has every field from the original table with all the constraints turned off. It will often have an action field to track updates vs deletes, and include a date/timestamp of the change along with the user.
For an example see the PostHistory Table at https://data.stackexchange.com/stackoverflow/query/new
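The shadow-table approach described above can be sketched like this, using Python's sqlite3 so it runs without a server. The answers table, the history column names and the trigger names are assumptions. SQLite has no notion of a logged-in database user, so changed_by is left NULL here; in MySQL a trigger could fill it from CURRENT_USER().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE answers (id INTEGER PRIMARY KEY, body TEXT);

    -- Shadow table: same fields as the original, no constraints on them,
    -- plus the action, a timestamp and the user.
    CREATE TABLE answers_history (
        history_id INTEGER PRIMARY KEY,
        id INTEGER,
        body TEXT,
        action TEXT NOT NULL,
        changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        changed_by TEXT);

    -- Copy the old row into the shadow table on every update and delete.
    CREATE TRIGGER answers_after_update AFTER UPDATE ON answers
    BEGIN
        INSERT INTO answers_history (id, body, action)
        VALUES (OLD.id, OLD.body, 'UPDATE');
    END;
    CREATE TRIGGER answers_after_delete AFTER DELETE ON answers
    BEGIN
        INSERT INTO answers_history (id, body, action)
        VALUES (OLD.id, OLD.body, 'DELETE');
    END;
""")

with conn:
    conn.execute("INSERT INTO answers VALUES (1, 'first draft')")
    conn.execute("UPDATE answers SET body = 'second draft' WHERE id = 1")
    conn.execute("DELETE FROM answers WHERE id = 1")

history = conn.execute(
    "SELECT id, body, action FROM answers_history ORDER BY history_id"
).fetchall()
```

The live table stays small and keeps its normal constraints, while every overwritten or deleted version remains queryable in answers_history.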
I think what you're looking for here is typically referred to as "knowledge dating".
In this case, your primary key would be your regular key plus the knowledge start date.
Your end date might either be null for a current record or an "end of time" sentinel.
On an update, you'd typically set the end date of the current record to "now" and insert a new record that starts at the same "now" with the new values.
On a "delete", you'd just set the end date to "now".
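The update and delete rules just described can be sketched as follows. The item_version table, its column names and the three function names are assumptions; timestamps are passed in as ISO date strings (which compare correctly as text) so the example is deterministic, and a NULL end date marks the current record. Python's sqlite3 is used in place of MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item_version (
        item_id INTEGER NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to TEXT,              -- NULL means "current record"
        value TEXT,
        PRIMARY KEY (item_id, valid_from))""")

def set_value(conn, item_id, value, now):
    """Update = close the current record at `now` and insert a new one."""
    with conn:
        conn.execute(
            "UPDATE item_version SET valid_to = ? "
            "WHERE item_id = ? AND valid_to IS NULL", (now, item_id))
        conn.execute(
            "INSERT INTO item_version (item_id, valid_from, valid_to, value) "
            "VALUES (?, ?, NULL, ?)", (item_id, now, value))

def delete_item(conn, item_id, now):
    """Delete = close the current record; no row is removed."""
    with conn:
        conn.execute(
            "UPDATE item_version SET valid_to = ? "
            "WHERE item_id = ? AND valid_to IS NULL", (now, item_id))

def value_as_of(conn, item_id, when):
    """Return the value that was known at `when`, or None if there was none."""
    row = conn.execute(
        "SELECT value FROM item_version "
        "WHERE item_id = ? AND valid_from <= ? "
        "AND (valid_to IS NULL OR valid_to > ?)",
        (item_id, when, when)).fetchone()
    return row[0] if row else None

set_value(conn, 1, "v1", "2024-01-01")
set_value(conn, 1, "v2", "2024-02-01")
delete_item(conn, 1, "2024-03-01")
```

After these calls both versions are still stored, and value_as_of can answer what the record looked like on any given date, including after the "delete".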
I've done that.
2.a) A version number solves the unique constraint issue somewhat, although that's really just relaxing the uniqueness, isn't it?
2.b) you can also archive the old versions into another table.