Optimize WHERE ... IN in a DELETE in MySQL

I have a MySQL performance problem: I have to update a large InnoDB table (approx. 1 million rows), deleting rows by the thousands. Let's say the rows are items coming from multiple sources. The table has a primary key item_id, a provider_id column identifying the provider of the item, and an external_id column (the id of the item at that provider, i.e. the way that specific provider identifies it), which has to be a varchar because each provider has its own internal way of identifying its items.
When I update, I go provider by provider and run a match/diff between a JSON file and the database to find out which items have to be added, updated or deleted. The match is on external_id. When I have to delete items, I use a query like DELETE FROM table_items WHERE provider_id=A AND external_id IN (...).
Even if I work in batches of 1k items, it's really slow.
Here's a simplified table definition:
CREATE TABLE `annonce` (
`annonce_id` INT(11) NOT NULL AUTO_INCREMENT,
`annonce_id_externe` VARCHAR(70) NOT NULL,
`provenance_id` INT(11) NOT NULL,
`categorie_id` INT(11) NOT NULL,
PRIMARY KEY (`annonce_id`),
UNIQUE INDEX `id_externe_par_provenance_et_categorie` (`annonce_id_externe`, `provenance_id`, `categorie_id`),
INDEX `provenance_id` (`provenance_id`),
INDEX `annonce_id_externe` (`annonce_id_externe`),
INDEX `categorie_id` (`categorie_id`),
CONSTRAINT `annonce_categorie_id` FOREIGN KEY (`categorie_id`) REFERENCES `categorie` (`categorie_id`) ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT `annonce_provenance_id` FOREIGN KEY (`provenance_id`) REFERENCES `provenance` (`provenance_id`) ON UPDATE CASCADE ON DELETE CASCADE
)
Any ideas on making it faster?
Thanks
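One possible direction, sketched under assumptions rather than measured: first check with EXPLAIN which index the DELETE actually uses, then try staging the ids to delete in a temporary table and deleting through a join instead of a long IN list. The temporary table name ids_a_supprimer, the provenance id 42 and the sample external ids are hypothetical. The temporary table column should use the same character set and collation as annonce.annonce_id_externe, otherwise the join may not be able to use the index.

```sql
-- Step 1: see which index MySQL picks for the current form of the query
-- (EXPLAIN works on DELETE statements in MySQL 5.6 and later).
EXPLAIN
DELETE FROM annonce
WHERE provenance_id = 42
  AND annonce_id_externe IN ('ext-001', 'ext-002', 'ext-003');

-- Step 2 (alternative form): stage the ids, then delete through a join.
CREATE TEMPORARY TABLE ids_a_supprimer (
  annonce_id_externe VARCHAR(70) NOT NULL,
  PRIMARY KEY (annonce_id_externe)
);

INSERT INTO ids_a_supprimer (annonce_id_externe)
VALUES ('ext-001'), ('ext-002'), ('ext-003');

-- Multi-table DELETE: only rows of annonce (alias a) are removed.
DELETE a
FROM annonce AS a
JOIN ids_a_supprimer AS t
  ON t.annonce_id_externe = a.annonce_id_externe
WHERE a.provenance_id = 42;

DROP TEMPORARY TABLE ids_a_supprimer;
```

Whether this is any faster depends on the data and on the tables that reference annonce through cascading foreign keys, so it is worth timing both forms on a copy of the table.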

Related

Finding and Clearing Records from an Index Table in MySQL

I created an association table and made a unique key using the pairing of two foreign keys.
I then had records created in this table upon the creation of an incident (record in another table). The incident's ID would then be paired up with each of the active items from another table. Thereby creating an association between the new incident and all active items. The unique key was to prevent the same item from being inadvertently added to the association multiple times.
This worked for a while, until I cascade-deleted some test records. Now when a new record is created, it throws errors saying that the ID pairing violates the unique key constraint.
I had assumed that deleting the records would allow them to be recreated, but apparently there is another table somewhere that has the name of the constraint as a field value and the ID pairings as data points. Since this hidden table still has the old values, I cannot proceed.
What might this table be called, and in what schema? Additionally, what options might I select on the delete action to also delete the appropriate records from the index table?
Now I'm also getting an error that the primary key already exists for the association table, but it clearly does not. Perhaps I could just drop the table and recreate it after exporting the few hundred currently correct data rows.
CREATE TABLE `device_checklist` (
`ID` int(11) NOT NULL,
`RECORD` int(11) NOT NULL,
`DEVICE` int(11) NOT NULL,
`ACTIVE_STATUS` bit(1) DEFAULT b'0',
`CHECKBOX` bit(1) DEFAULT b'0',
PRIMARY KEY (`ID`),
UNIQUE KEY `ID_UNIQUE` (`ID`),
UNIQUE KEY `UNQ_DEVICE_KEY` (`RECORD`,`DEVICE`),
KEY `REC_FK_idx` (`RECORD`),
KEY `DEVICE_FK_idx` (`DEVICE`),
CONSTRAINT `DEVICE_FK` FOREIGN KEY (`DEVICE`) REFERENCES `devices` (`ID`) ON DELETE CASCADE,
CONSTRAINT `REC_ID_FK` FOREIGN KEY (`RECORD`) REFERENCES `mysql`.`maindbtable` (`ID`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
Odd...when I did a:
SELECT * FROM device_checklist
and then ordered the results, I got 1018 as the highest ID number. But when I did a:
SELECT MAX(ID) FROM device_checklist
I got a 1063 as the highest ID, so I did a:
SELECT * FROM device_checklist ORDER BY ID DESC
and it finally showed me the records with ID higher than 1018. Almost like MySQL Workbench had cached the previous result set and was using that cache rather than finding the new set.

On delete cascade, records can be deleted on the parent table, but parent table can not be dropped

I have two tables
CREATE TABLE `category` (`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `item` (`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`categoryid` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`), KEY `fk_categoryid_item` (`categoryid`),
CONSTRAINT `fk_categoryid_item` FOREIGN KEY (`categoryid`)
REFERENCES `category` (`id`) ON DELETE CASCADE)
ENGINE=InnoDB DEFAULT CHARSET=utf8;
In the category table I have a record with id 2.
In the item table I have a record with id = 1 and categoryid = 2, where 2 is the foreign key referring to the category table. If I delete the row with id 2 from the category table, the record in the item table with categoryid 2 is also deleted. This is expected because of ON DELETE CASCADE. But if I try to drop the category table, I get the error:
Error Code: 1217. Cannot delete or update a parent row: a foreign key constraint fails
Why does this happen? Of course, after setting foreign_key_checks = 0, dropping the table becomes possible. But I would like to know why we can delete the records but cannot drop the table when the ON DELETE CASCADE option is set. Does this option apply only to deleting records, and not to dropping tables?
I checked the documentation, but I could not find any explanation for this.
Please let me know if there is something fundamental that I am missing; a pointer to the related documentation would also be helpful. I am using MySQL 5.7.
Thanks in advance.
If you drop the category table but do not remove or alter the foreign key, the key will be left pointing at nothing. Internally the database has a management system that enforces the referential constraints and prevents you from creating loose ends. See also this, this and this questions.
It also has to do with the math behind it, which is called relational algebra. I am not at that level either, but I think dropping one of the associated tables breaks the definition of an FK.
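To make the point concrete: ON DELETE CASCADE describes what happens to child rows when parent rows are deleted, whereas DROP TABLE removes the table definition that the constraint itself refers to. A minimal sketch of two ways to drop category without turning off foreign_key_checks, using the tables from the question (which of the two fits depends on whether item should survive):

```sql
-- Variant A: remove the constraint first, then the parent table.
-- item keeps its rows, but categoryid is no longer checked against anything.
ALTER TABLE item DROP FOREIGN KEY fk_categoryid_item;
DROP TABLE category;

-- Variant B (instead of A): drop the child table before the parent,
-- so no constraint is left pointing at category.
-- DROP TABLE item;
-- DROP TABLE category;
```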
In database relational modeling and implementation, a unique key is a set of zero or more attributes, the value(s) of which are guaranteed to be unique for each tuple (row) in a relation.

MySQL table with more indexes than columns

I am working with an InnoDB MySQL database from MySQL workbench and am stuck on the indexes for one table.
I have a table
╔═══════════════╗
║ poll_votes ║
╟───────────────╢
║pk poll_id fk║ //references polls.id
║pk voter_id fk║ //references users.id
║ option_id fk║ //references poll_options.id
╚═══════════════╝
Since the primary key is a composite key, MySQL automatically generates a multi-column index for poll_id and voter_id. Since each foreign key must have an associated index, MySQL further generates 3 additional indexes corresponding to the 3 columns.
Now I have 4 indexes on a 3-column table, and MySQL Workbench won't let me delete any of them, even though one of them is redundant. Furthermore, I'll never need the option_id index, so that's just wasting space.
Is having more indexes than columns going to hurt me here, or should I not worry about it? Is there a better way to design this table?
EDIT: The SQL (I edited some of the field names so there's a possibility there's a typo in here):
CREATE TABLE `poll_votes` (
`poll_id` int(11) NOT NULL,
`voter_id` int(11) NOT NULL,
`poll_option_id` int(11) NOT NULL,
PRIMARY KEY (`poll_id`,`voter_id`),
KEY `fk_poll_votes_polls1_idx` (`poll_id`),
KEY `fk_poll_votes_poll_votes1_idx` (`poll_option_id`),
KEY `fk_poll_votes_users1_idx` (`voter_id`),
CONSTRAINT `fk_poll_votes_polls1` FOREIGN KEY (`poll_id`) REFERENCES `polls` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `fk_poll_votes_poll_options1` FOREIGN KEY (`poll_option_id`) REFERENCES `poll_options` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION,
CONSTRAINT `fk_poll_votes_users1` FOREIGN KEY (`voter_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE NO ACTION
)
SET foreign_key_checks = OFF;
CREATE...
SET foreign_key_checks = ON;
(No, I don't understand why that flag is controlling the issue you encountered.)
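For the table above, the single-column index on poll_id repeats the leftmost column of the primary key (poll_id, voter_id). A hedged sketch of inspecting the indexes and dropping that one follows. It assumes MySQL accepts the primary key as the index backing fk_poll_votes_polls1, which should hold because poll_id is its first column, but it is worth trying on a copy of the table first.

```sql
-- List all indexes on the table, one row per indexed column.
SHOW INDEX FROM poll_votes;

-- Drop the index that duplicates the leftmost prefix of the primary key.
-- If MySQL reports that the index is needed in a foreign key constraint,
-- leave it in place.
ALTER TABLE poll_votes DROP INDEX fk_poll_votes_polls1_idx;
```

The indexes on voter_id and poll_option_id are not leftmost prefixes of the primary key, so the foreign keys on those columns still need them.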

MySQL Relationships & Joins

In a MySQL database where there are relationships between tables and the primary key of one table is stored as a foreign key in a second table, is there still a need to perform a join?
If there is, what is the point of declaring the relationship? I'd take a stab in the dark and say it has something to do with indexing, or that related tables can find related records much faster. I've tried Googling this, but can't seem to find much. I'm sure there is loads out there on this, but I don't know the keywords to search for.
Here is an example of table 1 and table 2:
------------------- Table 1 ----------------------
CREATE TABLE IF NOT EXISTS `db_hint`.`user` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`fb_id` INT NOT NULL,
`last_logged_in` DATETIME NULL,
`permissions` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `permissions_id_idx` (`permissions` ASC),
CONSTRAINT `permissions_id`
FOREIGN KEY (`permissions`)
REFERENCES `db_hint`.`permissions` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
----------------- Table 2 ----------------------
CREATE TABLE IF NOT EXISTS `db_hint`.`user_stat` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `user_id_idx3` (`user_id` ASC),
CONSTRAINT `user_id`
FOREIGN KEY (`user_id`)
REFERENCES `db_hint`.`user` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
When performing any kind of join, does the InnoDB engine use the relationship in any way? Thanks.
The point of declaring the foreign key is to enforce data consistency.
You will still need the JOIN in order to get desired data.
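In MySQL, a foreign key helps performance only indirectly: InnoDB requires an index on the referencing column and creates one if none exists, and the optimizer can use that index for joins. Don't expect more than what the index itself provides.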
To do a query involving two tables, you need JOIN ... ON ... to say how they are related. FOREIGN KEYs are not involved in a SELECT and have zero impact on the performance of a SELECT. You do not "have to have" FOREIGN KEYs to perform SELECTs.
A FOREIGN KEY is used during INSERTs (and other writes) to verify that a subsequent JOIN will actually find something in the other table. It is an overhead during the write -- the INSERT actively checks (via an index) that the referenced table has the indicated row.
FOREIGN KEYs may also do a cascading operation. For example, a DELETE can cause another DELETE to happen. I prefer to take such control in my application code.
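As an illustration of the answers above, here is a minimal query against the two tables from the question. The join condition is written out explicitly; the foreign key only guarantees that every user_stat.user_id has a matching user.id. The value 1 is a placeholder.

```sql
SELECT u.id,
       u.last_logged_in,
       s.id AS user_stat_id
FROM db_hint.`user` AS u
JOIN db_hint.user_stat AS s
  ON s.user_id = u.id   -- the relationship must be stated here
WHERE u.id = 1;
```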

In SQL, is it OK for two tables to refer to each other?

In this system, we store products, images of products (there can be many images for a product), and a default image for a product. The database:
CREATE TABLE `products` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`NAME` varchar(255) NOT NULL,
`DESCRIPTION` text NOT NULL,
`ENABLED` tinyint(1) NOT NULL DEFAULT '1',
`DATEADDED` datetime NOT NULL,
`DEFAULT_PICTURE_ID` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `Index_2` (`DATEADDED`),
KEY `FK_products_1` (`DEFAULT_PICTURE_ID`),
CONSTRAINT `FK_products_1` FOREIGN KEY (`DEFAULT_PICTURE_ID`) REFERENCES `products_pictures` (`ID`) ON DELETE SET NULL ON UPDATE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=30 DEFAULT CHARSET=utf8;
CREATE TABLE `products_pictures` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`IMG_PATH` varchar(255) NOT NULL,
`PRODUCT_ID` int(10) unsigned NOT NULL,
PRIMARY KEY (`ID`),
KEY `FK_products_pictures_1` (`PRODUCT_ID`),
CONSTRAINT `FK_products_pictures_1` FOREIGN KEY (`PRODUCT_ID`) REFERENCES `products` (`ID`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
As you can see, products_pictures.PRODUCT_ID -> products.ID and products.DEFAULT_PICTURE_ID -> products_pictures.ID, so there is a circular reference. Is it OK?
No, it's not OK. Circular references between tables are messy. See this (decade old) article: SQL By Design: The Circular Reference
Some DBMSs can handle these with special care, but MySQL will have issues.
Option 1
As in your design, make one of the two FKs nullable. This solves the chicken-and-egg problem (which table should I insert into first?).
There is a problem with your code, though: it allows a product to have a default picture that references another product!
To disallow such an error, your FK constraint should be:
CONSTRAINT FK_products_1
FOREIGN KEY (id, default_picture_id)
REFERENCES products_pictures (product_id, id)
ON DELETE RESTRICT -- the SET NULL options would
ON UPDATE RESTRICT -- lead to other issues
This will require a UNIQUE constraint/index in table products_pictures on (product_id, id) for the above FK to be defined and work properly.
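A sketch of how option 1 might be applied to the tables from the question. The index names and the new constraint name are made up for illustration, and the statements assume the existing rows already satisfy the new constraint (every non-NULL default picture belongs to its own product).

```sql
-- Unique index on the referenced column pair (hypothetical name).
ALTER TABLE products_pictures
  ADD UNIQUE KEY UQ_products_pictures_product (PRODUCT_ID, ID);

-- Remove the single-column foreign key from the question.
ALTER TABLE products
  DROP FOREIGN KEY FK_products_1;

-- Index on the referencing column pair (hypothetical name).
ALTER TABLE products
  ADD KEY IDX_products_id_default_picture (ID, DEFAULT_PICTURE_ID);

-- Composite foreign key: a default picture must belong to the same product.
ALTER TABLE products
  ADD CONSTRAINT FK_products_default_picture
    FOREIGN KEY (ID, DEFAULT_PICTURE_ID)
    REFERENCES products_pictures (PRODUCT_ID, ID)
    ON DELETE RESTRICT
    ON UPDATE RESTRICT;
```

Rows where DEFAULT_PICTURE_ID is NULL are not checked by the composite key, which is what keeps the initial insert of a product possible.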
Option 2
Another approach is to remove the Default_Picture_ID column from the products table and add an IsDefault BIT column to the pictures table. The problem with this solution is how to allow only one picture per product to have that bit on while all others have it off. In SQL Server (where it is called a filtered index) and, I think, in Postgres this can be done with a partial index:
CREATE UNIQUE INDEX is_DefaultPicture
ON products_pictures (Product_ID)
WHERE IsDefault = 1 ;
But MySQL has no such feature.
Option 3
This approach, which even allows both FK columns to be defined as NOT NULL, is to use deferrable constraints. This works in PostgreSQL and I think in Oracle. Check this question and the answer by @Erwin: Complex foreign key constraint in SQLAlchemy (the All key columns NOT NULL part).
Constraints in MySQL cannot be deferrable.
Option 4
The approach (which I find cleanest) is to remove the Default_Picture_ID column and add another table. No circular path in the FK constraints and all FK columns will be NOT NULL with this solution:
product_default_picture
----------------------
product_id NOT NULL
default_picture_id NOT NULL
PRIMARY KEY (product_id)
FOREIGN KEY (product_id, default_picture_id)
REFERENCES products_pictures (product_id, id)
This will also require a UNIQUE constraint/index in table products_pictures on (product_id, id), as in option 1.
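Written out as MySQL DDL, option 4 might look like the sketch below. The constraint and index names and the ON DELETE CASCADE action are assumptions made for illustration, not part of the outline above, and the UNIQUE index on products_pictures (PRODUCT_ID, ID) is assumed to exist already.

```sql
-- Prerequisite (run once if the index is not there yet; name is hypothetical):
-- ALTER TABLE products_pictures
--   ADD UNIQUE KEY UQ_products_pictures_product (PRODUCT_ID, ID);

CREATE TABLE product_default_picture (
  product_id         int(10) unsigned NOT NULL,
  default_picture_id int(10) unsigned NOT NULL,
  PRIMARY KEY (product_id),   -- at most one default picture per product
  KEY IDX_default_picture (product_id, default_picture_id),
  CONSTRAINT FK_default_picture
    FOREIGN KEY (product_id, default_picture_id)
    REFERENCES products_pictures (PRODUCT_ID, ID)
    ON DELETE CASCADE         -- assumption: the default row goes away with its picture
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
```

With this table in place, the DEFAULT_PICTURE_ID column and its foreign key on products are no longer needed, which is what removes the circular path.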
To summarize, with MySQL you have two options:
option 1 (a nullable FK column) with the correction above to enforce integrity correctly
option 4 (no nullable FK columns)
The only issue you're going to encounter is when you do inserts.
Which one do you insert first?
With this, you will have to do something like:
Insert product with null default picture
Insert picture(s) with the newly created product ID
Update the product to set the default picture to one that you just inserted.
Again, deleting will not be fun.
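The three insert steps above can be sketched in MySQL as follows, using the products and products_pictures tables from the question. The product name, description and image path are placeholder values, and wrapping the steps in a transaction is an addition made for illustration.

```sql
START TRANSACTION;

-- 1. Insert the product with no default picture yet.
INSERT INTO products (NAME, DESCRIPTION, DATEADDED, DEFAULT_PICTURE_ID)
VALUES ('Example product', 'Example description', NOW(), NULL);
SET @product_id = LAST_INSERT_ID();

-- 2. Insert a picture that points at the new product.
INSERT INTO products_pictures (IMG_PATH, PRODUCT_ID)
VALUES ('/img/example-1.jpg', @product_id);
SET @picture_id = LAST_INSERT_ID();

-- 3. Point the product at the picture that was just inserted.
UPDATE products
SET DEFAULT_PICTURE_ID = @picture_id
WHERE ID = @product_id;

COMMIT;
```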
This is just a suggestion, but if possible, creating a join table between these tables might be helpful for tracking:
product_productcat_join
------------------------
ID(PK)
ProductID(FK)- product table primary key
PictureID(FK) - category table primary key
In the other table you can just hold that field without the foreign key constraint.
This is useful in cases where you want to do the processing on the smaller table but connect the result of that processing to the bigger table.
For example, suppose you add a product_location table which holds the country, district, city, address, longitude and latitude. There might be a case where you want to show the products within a circle on the map.
John, what you're doing isn't bad, but using PK-FK actually helps normalize your data by removing redundant, repeated data, which has some fantastic advantages:
Improved data integrity owing to the elimination of duplicate storage locations for the same data
Reduced locking contention and improved multiple-user concurrency
Smaller files
That is not a cyclic reference; that is PK-FK.