Is there some nice automated way/program for cleaning up my database? I have a couple of tables with relations that in some cases just point at records that don't exist.
Sounds like you are missing some Foreign Key Constraints.
With a foreign key in place, the ON DELETE option decides what happens when a referenced record is deleted: the referencing records are deleted with it (CASCADE), their referencing columns are set to NULL (SET NULL), or the delete is rejected (RESTRICT or NO ACTION).
You will have to delete the existing orphaned entries manually, using a query like this, before creating your constraints:
DELETE FROM table_a
WHERE ref_b IS NOT NULL
AND NOT EXISTS ( SELECT 1 FROM table_b WHERE table_b.id = table_a.ref_b )
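A cautious sequence, sketched with the placeholder names from the query above: count the orphans first, run the DELETE, then add the constraint so new orphans cannot appear. The constraint name fk_table_a_ref_b and the choice of ON DELETE SET NULL are assumptions for illustration, and both tables are assumed to be InnoDB with matching column types.

```sql
-- Preview how many rows the DELETE above would remove.
SELECT COUNT(*) AS orphan_count
FROM table_a
WHERE ref_b IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM table_b WHERE table_b.id = table_a.ref_b);

-- Once the orphans are gone, add the constraint.
-- fk_table_a_ref_b and ON DELETE SET NULL are illustrative choices.
ALTER TABLE table_a
  ADD CONSTRAINT fk_table_a_ref_b
  FOREIGN KEY (ref_b) REFERENCES table_b (id)
  ON DELETE SET NULL;
```

If orphaned rows are still present, the ALTER TABLE should fail rather than silently accept them, which doubles as a check that the cleanup was complete.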
I'm using MySQL. Let's assume I have a table hierarchy with two columns: id, parent_id.
The parent_id refers to id of other row of the same table, so I have the foreign key there.
The hierarchy table contains some data, but they are not relevant now.
I also have a second table called new_hierarchy_entries that has the same columns, but there are no foreign key restrictions set.
new_hierarchy_entries contains:
id parent_id
2 1
1 null
Now I want to copy all the rows from new_hierarchy_entries into hierarchy. When I run naively:
INSERT INTO hierarchy SELECT * FROM new_hierarchy_entries
I get error: Cannot add or update a child row: a foreign key constraint fails (my_db.hierarchy, CONSTRAINT hierarchy_ibfk_2 FOREIGN KEY (parent_id) REFERENCES hierarchy (id))
Of course, if the rows are inserted one by one, the first row (id=2, parent=1) cannot be inserted, because there is no row with id=1 in table hierarchy.
On the other hand, if all rows were added at once, then the constraints would be satisfied. So how can I copy the rows in such a way that I'm sure that constraints are satisfied after the copying, but they may not be satisfied while copying?
Sorting rows of new_hierarchy_entries by id will not help. I cannot assume that parent_id < id in the same row.
Sorting rows of new_hierarchy_entries by the hierarchy (using tree terminology, give me roots first, then their children etc.) would help, but I'm not sure how to do that in a MySQL query.
I played with the idea of temporarily turning FOREIGN_KEY_CHECKS off. But then I could insert inconsistent data and I wouldn't find out. Turning FOREIGN_KEY_CHECKS back on doesn't make the database check the consistency of all the data. It would take too many resources anyway.
This is tricky. I don't know any way to make MySQL re-check foreign key references after enabling FOREIGN_KEY_CHECKS.
You could check yourself for orphan rows, and if there are any, roll back.
BEGIN;
SET SESSION FOREIGN_KEY_CHECKS=0;
INSERT INTO hierarchy SELECT * FROM new_hierarchy_entries;
SET SESSION FOREIGN_KEY_CHECKS=1;
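SELECT COUNT(*) FROM hierarchy AS c
LEFT OUTER JOIN hierarchy AS p ON p.id=c.parent_id
WHERE c.parent_id IS NOT NULL AND p.id IS NULL;
-- root rows (parent_id IS NULL) are not orphans, so they are excluded above
-- if count == 0 then...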
COMMIT;
-- otherwise ROLLBACK and investigate the bad data
One other possibility is to use INSERT with the IGNORE option, which will skip failed rows. Then repeat the same statement in a loop, as long as you see "rows affected" more than 0.
INSERT IGNORE INTO hierarchy SELECT * FROM new_hierarchy_entries;
INSERT IGNORE INTO hierarchy SELECT * FROM new_hierarchy_entries;
INSERT IGNORE INTO hierarchy SELECT * FROM new_hierarchy_entries;
...
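If you are on MySQL 8.0 or later, the "sort by the hierarchy" idea from the question can be sketched with a recursive CTE that computes each new row's depth, so that parents are inserted before their children. This is only a sketch under assumptions: ordered_entries is a hypothetical temporary table, new_hierarchy_entries is a regular (non-temporary) table, and the INSERT relies on rows being inserted in ORDER BY order.

```sql
-- Compute a depth for every new row that can be attached to a root
-- (parent_id IS NULL) or to a row that already exists in hierarchy.
CREATE TEMPORARY TABLE ordered_entries AS
WITH RECURSIVE ordered (id, parent_id, depth) AS (
  SELECT n.id, n.parent_id, 0
  FROM new_hierarchy_entries AS n
  WHERE n.parent_id IS NULL
     OR n.parent_id IN (SELECT h.id FROM hierarchy AS h)
  UNION ALL
  SELECT n.id, n.parent_id, o.depth + 1
  FROM new_hierarchy_entries AS n
  JOIN ordered AS o ON n.parent_id = o.id
)
SELECT id, parent_id, depth FROM ordered;

-- Parents (smaller depth) go in before their children.
INSERT INTO hierarchy (id, parent_id)
SELECT id, parent_id FROM ordered_entries ORDER BY depth;

-- Rows that never got a depth have no valid ancestor chain; inspect them.
SELECT n.*
FROM new_hierarchy_entries AS n
LEFT JOIN ordered_entries AS o ON o.id = n.id
WHERE o.id IS NULL;

DROP TEMPORARY TABLE ordered_entries;
```

Foreign key checks stay enabled the whole time, so inconsistent rows are simply left out and show up in the last query instead of ending up in hierarchy.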
I want to set up some automatic updates across several MySQL tables, but I'm not sure how to do it properly. Here is an example:
Table `articles` : ID, TEXT, DATE, DELETED
Table `comments` : ID, TEXT, DATE, DELETED
Table `users` : ID, NAME, AGE, DELETED
Table `link` : ID, ARTICLE_ID, COMMENT_ID, USER_ID, DELETED
As you can guess, the link table contains IDs from the other tables.
I designed my database this way to keep each table down to as few columns as possible.
I know there are lots of questions here about this, but I don't really know what the best solution is. So here is what I want to do:
When a comment is deleted (update comments set DELETED=1 where ID=...), I want to update the column link.DELETED (=> 1).
When a user is deleted (update users set DELETED=1 where ID=...), I want to update the columns link.DELETED (=> 1) and articles.TEXT (=> NULL).
I know I can use foreign keys between link.ARTICLE_ID and article.ID for example, and simply delete rows. But I have put limited rights on the user used by my website. I will cron a batch with a more powerful user to delete rows tagged DELETED to cleanup the database.
Is it possible to do that with FKs, or should I use triggers (I don't really know how to use them) or something else?
I have tried putting an FK between the DELETED columns, but when I update a comments row, all the link rows are updated too :/
Thanks for your help.
I've done what I wanted by combining the 2 solutions:
With a trigger: a pseudo-delete (update DELETED=1 ...) causes a pseudo-delete in my link table.
With a foreign key: a true delete (delete from ...) causes a true delete in my link table.
So, I can pseudo-delete rows from my website, with the basic user.
And I can clean my base with my batch using a user with delete rights.
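For anyone landing here, a minimal sketch of what that combination could look like, using the table and column names from the question. The trigger and constraint names are hypothetical, and DELIMITER is a mysql command-line client directive rather than SQL, so other clients may need a different way to send the multi-statement trigger.

```sql
-- Pseudo-delete: flagging a comment also flags its link rows.
CREATE TRIGGER trg_comments_soft_delete
AFTER UPDATE ON comments
FOR EACH ROW
  UPDATE link
  SET DELETED = 1
  WHERE COMMENT_ID = NEW.ID
    AND NEW.DELETED = 1;

-- Pseudo-delete of a user: blank the linked articles, then flag the links.
DELIMITER //
CREATE TRIGGER trg_users_soft_delete
AFTER UPDATE ON users
FOR EACH ROW
BEGIN
  IF NEW.DELETED = 1 THEN
    UPDATE articles AS a
    JOIN link AS l ON l.ARTICLE_ID = a.ID
    SET a.`TEXT` = NULL
    WHERE l.USER_ID = NEW.ID;

    UPDATE link SET DELETED = 1 WHERE USER_ID = NEW.ID;
  END IF;
END//
DELIMITER ;

-- True delete: removing a comment removes its link rows (used by the batch).
ALTER TABLE link
  ADD CONSTRAINT fk_link_comment
  FOREIGN KEY (COMMENT_ID) REFERENCES comments (ID)
  ON DELETE CASCADE;
```

The website user then only needs UPDATE rights for the pseudo-delete, while the batch user performs the real DELETE and lets the cascade clean up link.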
I have an existing table of contacts that has about 140k records in it. I am introducing a parent table (let's call them "parent_contacts") such that one parent_contact can have many contacts; but initially, parent_contacts will be seeded to have one record for every contact that currently exists in the database.
I thought I was being clever in trying something like the following, which I now understand is not allowed (assume all the necessary parent_contact records have been created ahead of time):
UPDATE contacts
SET contacts.parent_id =
(SELECT parent_contacts.id FROM parent_contacts
WHERE NOT EXISTS
(SELECT * FROM contacts AS c WHERE c.parent_id = parent_contacts.id) LIMIT 1)
(If not readily apparent, the idea here being to set the parent_id of each contact to the id of the first parent_contact that another contact isn't already linked to)
Since this particular approach is not possible, is there another way of doing this that doesn't involve executing 140k individual update statements?
FOLLOW-UP: I resolved this by introducing a temporary child_id on the parent table, which I then removed after the seeding was finished. But in the context of the original question, I think Tony's answer below sounds apt.
You seem to have done this backwards.
1. Add Parent_id to Contacts (no constraint yet!)
2. Update Contacts, filling Parent_id with a unique number.
3. Create ParentContacts. Don't put an Identity or a Primary Key on it yet.
4. Then backfill ParentContacts with an Insert into ParentContacts Select Parent_id, .... From Contacts
5. Add the Identity (don't forget to seed it to the next value) and Primary Key to ParentContacts
6. Add the foreign key constraint to Contacts.
Nice easy steps, and easy to check each one, instead of this whole-cloth manoeuvre you are trying now.
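In MySQL terms, those steps might look roughly like the sketch below. It assumes contacts has an INT primary key named id (reused here as the "unique number"), uses AUTO_INCREMENT where the answer says Identity, and invents the constraint name fk_contacts_parent; any further parent_contacts columns are left out.

```sql
-- Add the column, no constraint yet.
ALTER TABLE contacts ADD COLUMN parent_id INT NULL;

-- Fill it with a unique number; the contact's own id is a convenient one.
UPDATE contacts SET parent_id = id;

-- Create the parent table without a key or AUTO_INCREMENT.
CREATE TABLE parent_contacts (
  id INT NOT NULL
) ENGINE = InnoDB;

-- Backfill one parent per contact in a single statement.
INSERT INTO parent_contacts (id)
SELECT parent_id FROM contacts;

-- Now add the primary key and AUTO_INCREMENT.
ALTER TABLE parent_contacts
  MODIFY id INT NOT NULL AUTO_INCREMENT,
  ADD PRIMARY KEY (id);

-- Finally, enforce the relationship.
ALTER TABLE contacts
  ADD CONSTRAINT fk_contacts_parent
  FOREIGN KEY (parent_id) REFERENCES parent_contacts (id);
```

After the last ALTER, SHOW CREATE TABLE parent_contacts is a quick way to confirm that the next AUTO_INCREMENT value sits above the highest backfilled id.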
I've got a MySQL table that has a lot of entries. It's got a unique key defined as (state, source) so there are no duplicates for that combination of columns. However, I am now realizing that much of the state data is not entered consistently. For example, in some rows it is entered as "CA" and in others it might be spelled out as "California."
I'd like to update all the entries that say "California" to be "CA" and if it creates a conflict in the unique key, drop the row. How can I do that?
You may be better off dumping your data and using an external tool like Google Refine to clean it up. Look at using foreign keys in the future to avoid these issues.
I don't think you can do this in one SQL statement. And if you have foreign key relationships from other tables to the one you are trying to clean up, then you definitely do not want to do this in one step (even if you could).
CREATE TABLE state_mappings (
`old` VARCHAR(64) NOT NULL,
`new` VARCHAR(64) NOT NULL
);
INSERT INTO state_mappings VALUES ('California', 'CA'), ...;
INSERT IGNORE INTO MyTable (state, source)
SELECT sm.`new`, s.source FROM MyTable s JOIN state_mappings sm
ON s.state = sm.`old`;
-- Update tables with foreign keys here
DELETE FROM MyTable WHERE state IN (SELECT DISTINCT `old` FROM state_mappings);
DROP TABLE state_mappings;
I'm no SQL pro, so these statements can probably be optimized, but you get the gist.
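If no other tables reference these rows, a shorter variant may be enough: let UPDATE IGNORE rename whatever it can, then delete what is left. This is a sketch for the single 'California' to 'CA' case from the question, with MyTable standing in for the real table name.

```sql
-- Rename where no conflict arises. With IGNORE, a row that would collide with
-- an existing ('CA', source) pair is left unchanged instead of raising an error.
UPDATE IGNORE MyTable
SET state = 'CA'
WHERE state = 'California';

-- Whatever still says 'California' duplicates an existing 'CA' row, so drop it.
DELETE FROM MyTable
WHERE state = 'California';
```

Keep in mind that IGNORE downgrades other errors to warnings as well, so it is worth running SHOW WARNINGS after the UPDATE before trusting the DELETE.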
I am trying to run a query:
INSERT
INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, 'Activated'
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL
ON DUPLICATE KEY UPDATE
`ChangedOn` = VALUES(`ChangedOn`)
(I am not quite sure the query is correct, but it appears to be working), however I am running into the following issue. I am running this query before creating the entry into the 'Products' table and am getting a foreign key constraint problem due to the fact that the entry is not in the Products table yet.
My question is, is there a way to run this query, but wait until the next query (which updates the Product table) before performing the insert portion of the query above? Also to note, if the query is run after the Product entry is created it will no longer see the p.Id as being null and therefore failing so it has to be performed before the Product entry is created.
---> Edit <---
The concept I am trying to achieve is as follows:
For starters I am importing a set of data into a temp table, the Product table is a list of all products that are (or have been in the past) added through the set of data from the temp table. What I need is a separate table that provides a state change to the product as sometimes the product will become unavailable (no longer in the data set provided by the vendor).
The ProductState table is as follows:
CREATE TABLE IF NOT EXISTS `ProductState` (
`ProductId` VARCHAR(32) NOT NULL ,
`ChangedOn` DATE NOT NULL ,
`State` ENUM('Activated','Deactivated') NULL ,
PRIMARY KEY (`ProductId`, `ChangedOn`) ,
INDEX `fk_ProductState_Product` (`ProductId` ASC) ,
CONSTRAINT `fk_ProductState_Product`
FOREIGN KEY (`ProductId` )
REFERENCES `Product` (`Id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
The foreign key is an identifying relationship with the Product table (Product.Id)
Essentially what I am trying to accomplish is this:
1. Anytime a new product (or previously deactivated product) shows up in the vendor data set, the record is created in the ProductState table as 'Activated'.
2. Anytime a product (that is activated), does not show up in the vendor data set, the record is created as 'Deactivated' in the ProductState table.
The purpose of the ProductState table is to track activation and deactivation states of a product. Also, ProductState has a many-to-one relationship with the Product table, and the state of a product will only change once daily, therefore my PKEY would be ProductId and ChangedOn.
With foreign keys, you definitely need to have the data in the Product table first, before entering the state. Think about it with this logic: "How can something that doesn't exist have a state?"
So pseudocode of what you should do:
1. Read in the vendor's product list
2. Compare them to the existing list in your Product table
3. If new ones found: 3.1 Insert it to Product table, 3.2 Insert it to ProductState table
4. If missing from vendor's list: 4.1 Insert it to ProductState table
All of these should be done in 1 transaction. Note that you should NOT delete things from the Product table, unless you really want to delete all the information associated with it, i.e. also delete all the "states" that you have stored.
Rather than trying to do this all in 1 query - best bet is to create a stored procedure that does the work as step-by-step above. I think it gets overly complicated (or in this case, probably impossible) to do all in 1 query.
Edit: Something like this:
CREATE PROCEDURE `some_procedure_name` ()
BEGIN
    -- Break the tmpImport table down into 2 temporary tables: new and removed
    CREATE TEMPORARY TABLE _temp_new_products AS
        SELECT t.*
        FROM `tmpImport` t
        LEFT JOIN `Product` p
            ON t.`ProductId` = p.`Id`
        WHERE p.`Id` IS NULL;
    CREATE TEMPORARY TABLE _temp_removed_products AS
        SELECT p.`Id`
        FROM `Product` p
        LEFT JOIN `tmpImport` t
            ON t.`ProductId` = p.`Id`
        WHERE t.`ProductId` IS NULL;
    -- For each entry in _temp_new_products:
    --   1. Insert into Product table
    --   2. Insert into ProductState table 'Activated'
    -- For each entry in _temp_removed_products:
    --   1. Insert into ProductState table 'Deactivated'
    -- Drop the temporary tables
    DROP TEMPORARY TABLE _temp_new_products;
    DROP TEMPORARY TABLE _temp_removed_products;
END
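The "for each entry" steps do not need a loop. A set-based sketch of them, run as one transaction, could look like this. It makes several assumptions: the remaining Product columns have defaults, tmpImport holds each ProductId at most once, CURRENT_DATE is an acceptable deactivation date, and re-activating a previously deactivated product is left out for brevity.

```sql
START TRANSACTION;

-- Remember which imported products are new before Product changes.
CREATE TEMPORARY TABLE _temp_new_products AS
  SELECT t.ProductId, t.ProcessedOn
  FROM tmpImport AS t
  LEFT JOIN Product AS p ON t.ProductId = p.Id
  WHERE p.Id IS NULL;

-- 1. Parent rows first, so the foreign key is satisfied.
INSERT INTO Product (Id)
SELECT ProductId FROM _temp_new_products;

-- 2. Then the state rows that reference them.
INSERT INTO ProductState (ProductId, ChangedOn, State)
SELECT ProductId, ProcessedOn, 'Activated'
FROM _temp_new_products;

-- Products missing from the import whose latest state is 'Activated'.
INSERT INTO ProductState (ProductId, ChangedOn, State)
SELECT s.ProductId, CURRENT_DATE, 'Deactivated'
FROM ProductState AS s
LEFT JOIN ProductState AS newer
       ON newer.ProductId = s.ProductId
      AND newer.ChangedOn > s.ChangedOn
LEFT JOIN tmpImport AS t ON t.ProductId = s.ProductId
WHERE newer.ProductId IS NULL
  AND s.State = 'Activated'
  AND t.ProductId IS NULL;

DROP TEMPORARY TABLE _temp_new_products;
COMMIT;
```

Stashing the new ids in a temporary table is what gets around the original problem: the "not yet in Product" test is evaluated once, before Product is touched, and both later inserts reuse that result.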
I think you should:
start a transaction
do your insert into the Products table
do your insert into the ProductState table
commit the transaction
This will avoid any foreign key errors, but will also make sure your data is always accurate. You do not want to 'avoid' the foreign key constraint in any way, and InnoDB (which I'm sure you are using) never defers these constraints unless you turn them off completely.
Also, no, you cannot insert into multiple tables in one INSERT ... SELECT statement.