Let's say I have a table with a lot of records, "employee_hour", and another with far fewer records: "turnover_export". employee_hour has a foreign key to turnover_export. However, sometimes the linked turnover_export becomes outdated and needs to be updated for many employee_hour records.
How it works currently (this approach has become too slow):
For each employee_hour record whose linked turnover_export is outdated, a new, identical employee_hour record is inserted with a link to the updated turnover_export.
How could I design a DB/process to handle this as efficiently as possible?
Notable points:
Don't want to update the employee_hour records with the new turnover_export, because per point 4 of this list updates mean new inserts, so this would again mean inserting a new employee_hour record for each one whose linked turnover_export is outdated
Don't want to use triggers
The design should be tied as little as possible to any particular platform or DB engine
History should be kept, nothing gets deleted, updates mean new inserts
What I have thought of:
Link via a foreign key that is not an auto-increment ID but a UUID, then track with another column in turnover_export whether it is the version that should be used. This doesn't work because a foreign key on a single column cannot reference multiple columns.
I know I haven't thought of or tried much, but honestly I can't think of anything else.
Some context on what these tables are used for:
We want to report on the turnover generated for certain employee hours. We do this by taking an employee_hour record and multiplying a certain column in it by a certain column in the linked turnover_export.
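For concreteness, the reporting query looks roughly like the sketch below; hours and rate are made-up names standing in for the "certain columns" mentioned above:

    -- Hypothetical sketch of the reporting query described above.
    SELECT eh.id,
           eh.hours * te.rate AS turnover
    FROM employee_hour   AS eh
    JOIN turnover_export AS te ON te.id = eh.turnover_export_id;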
Context
Let us say there is a table A. Table B replicates table A by running an upsert statement against a file that contains only the records of table A that were updated. It is easy to tell whether table A records were updated, since that is reflected in the column showing each record's last-modified date. However, table A gives no indication that a record was deleted (there is no flag column marking records as deleted, due to the huge number of active records the table contains). If a record is deleted, it simply no longer shows up when you try to fetch it.
Current Solution
So given the above context, after I do my upsert on table B, the only way I have found so far to mirror removed records from the original table is to download a file of the full list of primary key values currently present in table A, load it into a table C, and use that in an inner join against the primary keys on table B.
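For illustration, that mirroring step could be expressed as an anti-join delete. This is just a sketch with hypothetical names: table_c holds the downloaded list of table A's current keys, and pk is the key column:

    -- Remove rows from B whose primary key no longer exists in A.
    DELETE b
    FROM table_b AS b
    LEFT JOIN table_c AS c ON c.pk = b.pk
    WHERE c.pk IS NULL;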
Better Solution?
My question is: is there a better way to do this, one where I don't have to download a file of the full list of primary key values currently present in table A?
I have a MySQL database containing data about users of an application. This application is already in production, but improvements are added every day. The latest improvement I've made changed the way data is collected and inserted into the database.
To be clearer: my database is composed of 5 tables containing user data and 1 table that relates them all through foreign keys. These 5 foreign keys together form my unique index for this "main table".
The issue is that one of the tables containing user data changed its format, and I want to remove all the data older than the modification I made to my application (just from this table; the other ones I need to keep untouched). However, this dataset has foreign keys in the main table, and I can't just drop those rows from the main table because the other information they hold is important. I tried changing the value of the foreign key for this specific table, but then, obviously, I ran into a duplicated-index problem.
Reading on the internet, I found a solution to my problem using "Insert ... On duplicate key update ...", but I'm not inserting data, just updating it. I have an idea of how to write a PHP program to update my database, but is there an easier solution? Is it possible to avoid these problems using just MySQL syntax?
It might be worth looking at the link below:
http://www.kavoir.com/2009/05/mysql-insert-if-doesnt-exist-otherwise-update-the-existing-row.html
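The gist of that page, as a minimal sketch (table and column names made up):

    -- MySQL upsert: insert a new row, or update the existing one
    -- when the primary/unique key value already exists.
    INSERT INTO user_data (user_id, payload)
    VALUES (42, 'new value')
    ON DUPLICATE KEY UPDATE payload = VALUES(payload);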
I have an auto-increment primary key on one of my tables. If I have 3 rows and, for example, delete the third row, I'm left with two. However, if I insert a new row, its ID is automatically 4, and the IDs are 1, 2 and 4.
How can I re-use the deleted ID and have the ID of the newly inserted record to be 3 automatically?
Really, you shouldn't. Primary keys should be purely technical, meaningless values. Their value, and the monotonicity of their generation, shouldn't matter at all.
Moreover, since it's the PK of the row, you'll have potentially dozens (or thousands) of other rows in other tables referencing this ID (foreign keys), so changing it in the table would not be enough: you would have to change it everywhere.
And there's a good chance that this ID is also referenced in other applications (for example, it could be part of a bookmarked URL in a browser), and changing its value would make all these references invalid.
You should never change a primary key. It should be immutable, forever.
EDIT: I misread the question. You actually want to reuse an old ID. This is also a bad idea. Existing references would point to something other than what they initially referenced. This is what happens when you change your phone number and it's reused by someone else, who starts receiving lots of calls from people who still think the number is yours. Very annoying. You want to avoid this situation.
I know full well this should never happen. Ever. However, I started working at a company recently that hasn't had the greatest database design or input validation and this situation has come up.
There is a table which we'll call 'jobs'*. jobs has a primary key, ID. The job with the ID of 1 has loads of data associated with it; however, someone has stupidly duplicated that job as ID 2 (this has happened around ~500 times so far). All of the information for both needs to be merged under ID 1 (or 2, it doesn't matter).
The columns ARE linked by foreign keys with ON UPDATE CASCADE and ON DELETE RESTRICT. They are not all called jobs_id.
Is my only (seemingly sensible) option here to:
Change id 1 to something I can guarantee is not used (2,147,483,647)
Temporarily remove the Foreign Key DELETE: RESTRICT
Delete the entry with id 1
Update id 2 to 2,147,483,647 (to link it with all the other entries)
Change id 2,147,483,647 to id 2
Reinstate DELETE: RESTRICT
As none of the code actually performs a delete (the restriction is there just as a fail-safe against someone editing directly in the DB), and ON UPDATE CASCADE is left in, data shouldn't get out of sync. This does seem messy, though.
This will be wrapped in a transaction.
I could write something to iterate through each table (~180 of them) and each column to find certain names/conditions, then update from 1 to 2, but that would need maintenance whenever a new table or column came along.
As this has happened a lot, and I don't see a re-write to prevent it happening any time soon, the 'solution' (sticking plaster) needs to be semi-automatic.
*Not the table's real name. His (or her) identity has been disguised so he (or she) doesn't get bullied.
Appreciate any input.
Assuming that you know how to identify the duplicated records, why not create a new table with the same structure (maybe without the FKs), then loop through the original while copying values to the new table? When you hit a duplicate, fix the value as you write it to the new table. Then drop the original and rename the temp table to the original name.
This will clean up the table, but if processes are still creating duplicate entries, you could add a unique key to limit the damage going forward.
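A set-based sketch of that approach, assuming MySQL and a hypothetical job_name column as the criterion that identifies duplicates (note that CREATE TABLE ... LIKE copies the structure and indexes but not the FKs):

    -- Keep only the lowest id of each duplicate group in a clean copy.
    CREATE TABLE jobs_clean LIKE jobs;
    INSERT INTO jobs_clean
    SELECT j.*
    FROM jobs AS j
    WHERE j.id = (SELECT MIN(j2.id)
                  FROM jobs AS j2
                  WHERE j2.job_name = j.job_name);
    -- ...repoint the child tables' rows at the surviving ids, then:
    RENAME TABLE jobs TO jobs_old, jobs_clean TO jobs;
    -- Limit the damage going forward:
    ALTER TABLE jobs ADD UNIQUE KEY uq_jobs_name (job_name);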
I am considering designing a relational DB schema for a DB that never actually deletes anything (sets a deleted flag or something).
1) What metadata columns are typically used to accommodate such an architecture? Obviously a boolean IsDeleted flag can be set. Or maybe just a timestamp in a Deleted column works better, or possibly both. I'm not sure which method will cause me more problems in the long run.
2) How are updates typically handled in such architectures? If you mark the old row as deleted and insert a new one, you will run into PK unique-constraint issues (e.g. if you have a PK column id, then the new row must have the same id as the one you just marked as invalid, or else all of your foreign keys in other tables for that id will be rendered useless).
If your goal is auditing, I'd create a shadow table for each table you have. Add some triggers that get fired on update and delete and insert a copy of the row into the shadow table.
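A minimal sketch of that idea, assuming MySQL and a hypothetical employee table; the shadow table mirrors the audited columns and adds audit metadata:

    CREATE TABLE employee_shadow (
        id         INT          NOT NULL,  -- no PK/FK constraints here
        name       VARCHAR(100),
        action     VARCHAR(10)  NOT NULL,  -- 'UPDATE' or 'DELETE'
        changed_at TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    CREATE TRIGGER employee_audit_upd AFTER UPDATE ON employee
    FOR EACH ROW INSERT INTO employee_shadow (id, name, action)
                 VALUES (OLD.id, OLD.name, 'UPDATE');

    CREATE TRIGGER employee_audit_del AFTER DELETE ON employee
    FOR EACH ROW INSERT INTO employee_shadow (id, name, action)
                 VALUES (OLD.id, OLD.name, 'DELETE');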
Here are some additional questions that you'll also want to consider:
How often do deletes occur, and what's your performance budget like? This can affect your choices: the answer to your design will be different depending on whether a user is deleting a single row (say, an answer on a Q&A site) or records are being deleted from a feed on an hourly basis.
How are you going to expose the deleted records in your system? Are they visible for administrative purposes only, or can any user see deleted records? This makes a difference because you'll probably need to come up with a filtering mechanism depending on the user.
How will foreign key constraints work? Can one table reference another table's deleted record?
When you add to or alter existing tables, what happens to the deleted records?
Typically the systems that care a lot about audit use shadow tables like the ones Steve Prentice mentioned. Such a table often has every field from the original table with all the constraints turned off. It often also has an action field to track updates vs. deletes, and includes a date/timestamp of the change along with the user.
For an example, see the PostHistory table at https://data.stackexchange.com/stackoverflow/query/new
I think what you're looking for here is typically referred to as "knowledge dating".
In this case, your primary key would be your regular key plus the knowledge start date.
Your end date might either be null for a current record or an "end of time" sentinel.
On an update, you'd typically set the end date of the current record to "now" and insert a new record that starts at the same "now" with the new values.
On a "delete", you'd just set the end date to "now".
I've done that.
2.a) A version number solves the unique-constraint issue somewhat, although that's really just relaxing the uniqueness, isn't it?
2.b) You can also archive the old versions into another table.