MySQL Relationships & Joins - mysql

In a MySQL database where there are relationships between tables and the primary key of one table is stored as a foreign key in a second table, is there still a need to perform a join?
If there is, what is the point of declaring the relationship? I'd take a stab in the dark and say it has something to do with indexing, or that related tables can find related records much faster. I've tried Googling this, but can't seem to find much. I'm sure there is loads out there on this, but I don't know the keywords to search for.
Here is an example of table 1 and table 2:
------------------- Table 1 ----------------------
CREATE TABLE IF NOT EXISTS `db_hint`.`user` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`fb_id` INT NOT NULL,
`last_logged_in` DATETIME NULL,
`permissions` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `permissions_id_idx` (`permissions` ASC),
CONSTRAINT `permissions_id`
FOREIGN KEY (`permissions`)
REFERENCES `db_hint`.`permissions` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
----------------- Table 2 ----------------------
CREATE TABLE IF NOT EXISTS `db_hint`.`user_stat` (
`id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
`user_id` INT UNSIGNED NOT NULL,
PRIMARY KEY (`id`),
INDEX `user_id_idx3` (`user_id` ASC),
CONSTRAINT `user_id`
FOREIGN KEY (`user_id`)
REFERENCES `db_hint`.`user` (`id`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
When performing any kind of join, does the InnoDB engine use the relationship in any way? Thanks.

The point of declaring the foreign key is to enforce data consistency.
You will still need the JOIN in order to get desired data.
In MySQL, foreign keys can help performance indirectly, since InnoDB requires an index on the foreign key column, but don't expect any gain beyond what that index already provides.

To do a query involving two tables, you need JOIN ... ON ... to say how they are related. FOREIGN KEYs are not involved in a SELECT and have zero impact on the performance of a SELECT. You do not "have to have" FOREIGN KEYs to perform SELECTs.
A FOREIGN KEY is used during INSERTs (and other writes) to verify that a subsequent JOIN will actually find something in the other table. It is an overhead during the write: the INSERT actively checks (via an index) that the referenced table has the indicated row.
FOREIGN KEYs may also do a cascading operation. For example, a DELETE can cause another DELETE to happen. I prefer to take such control in my application code.
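To make the split between writes and reads concrete, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL so that it runs without a server, and it trims the question's tables down to a couple of columns. The foreign key only acts on the INSERT; the SELECT still needs its JOIN.

```python
import sqlite3

# SQLite stands in for MySQL here; it needs foreign keys switched on per connection.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Cut-down versions of the question's tables (most columns omitted).
conn.executescript("""
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    fb_id INTEGER NOT NULL
);
CREATE TABLE user_stat (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES user (id)
);
INSERT INTO user (id, fb_id) VALUES (1, 1001);
INSERT INTO user_stat (id, user_id) VALUES (10, 1);
""")

# The write is where the foreign key acts: a row pointing at a missing user is rejected.
try:
    conn.execute("INSERT INTO user_stat (id, user_id) VALUES (11, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The read still has to spell out how the two tables relate.
rows = conn.execute("""
    SELECT user.id, user.fb_id, user_stat.id
    FROM user
    JOIN user_stat ON user_stat.user_id = user.id
""").fetchall()

print(orphan_rejected, rows)
```

Dropping the REFERENCES clause would leave the SELECT unchanged; only the rejected INSERT would start to succeed.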

Related

Optimize where in delete in mysql

I have a MySQL performance problem: I have to update a large InnoDB table (approx. 1 million rows), deleting rows by the thousands. Let's say it holds items coming from multiple sources. The table has a primary key item_id, a provider_id column identifying the provider of the item, and an external_id column (the id of the item at that provider, i.e. the way that specific provider identifies it), which has to be a varchar because each provider has its own internal way of identifying its items.
When I update, I go provider by provider, and I run a match/diff between a JSON file and the database to find out which items have to be added, updated or deleted. The match is on the external_id. When I have to delete items, I use a query like DELETE FROM table_items WHERE provider_id=A AND external_id IN (...).
Even if I make batches of 1k items, it's really slow.
Here's a simplified table definition
CREATE TABLE `annonce` (
`annonce_id` INT(11) NOT NULL AUTO_INCREMENT,
`annonce_id_externe` VARCHAR(70) NOT NULL,
`provenance_id` INT(11) NOT NULL,
`categorie_id` INT(11) NOT NULL,
PRIMARY KEY (`annonce_id`),
UNIQUE INDEX `id_externe_par_provenance_et_categorie` (`annonce_id_externe`, `provenance_id`, `categorie_id`),
INDEX `provenance_id` (`provenance_id`),
INDEX `annonce_id_externe` (`annonce_id_externe`),
INDEX `categorie_id` (`categorie_id`),
CONSTRAINT `annonce_categorie_id` FOREIGN KEY (`categorie_id`) REFERENCES `categorie` (`categorie_id`) ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT `annonce_provenance_id` FOREIGN KEY (`provenance_id`) REFERENCES `provenance` (`provenance_id`) ON UPDATE CASCADE ON DELETE CASCADE
)
Any ideas on making it faster?
Thanks

On delete cascade, records can be deleted on the parent table, but parent table can not be dropped

I have two tables
CREATE TABLE `category` (`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `item` (`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`categoryid` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`id`), KEY `fk_categoryid_item` (`categoryid`),
CONSTRAINT `fk_categoryid_item` FOREIGN KEY (`categoryid`)
REFERENCES `category` (`id`) ON DELETE CASCADE)
ENGINE=InnoDB DEFAULT CHARSET=utf8;
In the table category I have a record with id 2.
In the item table I have a record with id = 1 and categoryid = 2, where 2 is the foreign key referring to the category table. If I delete the row in the category table with id 2, the record in the item table that has categoryid 2 also gets deleted. This is as expected because of ON DELETE CASCADE. But if I try to drop the table category, I get the error:
Error Code: 1217. Cannot delete or update a parent row: a foreign key constraint fails
Why does this happen? Of course, after setting foreign_key_checks = 0, dropping the table becomes possible. But I would like to know why we can delete the records but cannot drop the table with the ON DELETE CASCADE option. Does this option only apply to deleting records, and not to dropping tables?
I checked the documentation, but I could not find any explanation for this.
Please let me know if there is something fundamental that I am missing; a pointer to the related documentation would also be helpful. I am using MySQL 5.7.
Thanks in advance.
If you drop the table category but do not remove or alter the foreign key, the foreign key will be left pointing to nothing. Internally the database has a management system that enforces the referential constraints, and that prevents you from creating loose ends.
It also has something to do with the math behind it, which is called relational algebra. I am not at that level either, but I think it breaks the definition of an FK if you drop one of the associated tables.
In database relational modeling and implementation, a unique key is a set of zero or more attributes, the value(s) of which are guaranteed to be unique for each tuple (row) in a relation.

MySQL INSERT INTO tables with foreign key speed is very slow

I'm trying to insert (from Postgres via Grails) about 10 million records into a table with a primary key and 2 foreign keys. If I keep all the primary and foreign keys and the indexes automatically generated along with these keys, it takes about 7.5 hours to complete. If I drop all the keys and indexes before the inserts, it takes only 10 minutes to execute all the inserts. But when I used ALTER TABLE to add the keys back in, it took forever (more than 7 hours). Is there a way to improve the performance?
The concept table that this table links to has about 1 million records.
Here's the CREATE TABLE statement:
CREATE TABLE `concept_relationship` (
`concept_id_1` int(11) NOT NULL,
`concept_id_2` int(11) NOT NULL,
`relationship_id` int(11) NOT NULL,
`valid_start_date` date NOT NULL,
`valid_end_date` date NOT NULL DEFAULT '2099-12-31',
`invalid_reason` char(1) DEFAULT NULL,
PRIMARY KEY (`concept_id_1`,`concept_id_2`,`relationship_id`),
KEY `concept_id_1` (`concept_id_1`),
KEY `concept_id_2` (`concept_id_2`),
KEY `relationship_id` (`relationship_id`),
CONSTRAINT `FK_CONCEPT_REL_child` FOREIGN KEY (`concept_id_2`) REFERENCES `concept` (`concept_id`),
CONSTRAINT `FK_CONCEPT_REL_Parent` FOREIGN KEY (`concept_id_1`) REFERENCES `concept` (`concept_id`),
CONSTRAINT `FK_CONCEPT_REL_REL_TYPE` FOREIGN KEY (`relationship_id`) REFERENCES `relationship` (`relationship_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Thanks for your help
First, the index concept_id_1 is not needed. The primary key already starts with concept_id_1, so it covers this index entirely.
My suggestion is to create the table without the keys or foreign references, except for the primary key. When you insert into the table, be sure that the input data is sorted by the keys of the primary key. Then add back the other keys with explicit index creation:
create index concept_relationship_idx1 on concept_relationship(concept_id_1);
And so on.
If this doesn't work efficiently, then reconsider the primary key. The data is actually ordered by the primary key, which can be computationally intensive for inserts. Add an auto-incremented primary key. Insert the data. Then create a unique index for what is now the primary key, and indexes for the other keys.
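The load order described above can be sketched as follows. This is an illustration only: it assumes SQLite through Python's sqlite3 module rather than MySQL, a cut-down column list, made-up random rows in place of the real export, and hypothetical index names. Timings on InnoDB will differ, so treat it as a picture of the sequence, not a benchmark.

```python
import random
import sqlite3

conn = sqlite3.connect(":memory:")

# Only the primary key exists while the data is loaded.
conn.execute("""
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER NOT NULL,
        concept_id_2 INTEGER NOT NULL,
        relationship_id INTEGER NOT NULL,
        PRIMARY KEY (concept_id_1, concept_id_2, relationship_id)
    )
""")

# Made-up rows in arbitrary order, standing in for the exported data.
random.seed(0)
rows = list({
    (random.randrange(1000), random.randrange(1000), random.randrange(5))
    for _ in range(2000)
})

# Sort by the primary key columns so the rows arrive in key order.
rows.sort()

with conn:  # one transaction for the whole batch
    conn.executemany("INSERT INTO concept_relationship VALUES (?, ?, ?)", rows)

# Secondary indexes are built once, after the data is in place.
conn.execute("CREATE INDEX concept_relationship_idx2 ON concept_relationship (concept_id_2)")
conn.execute("CREATE INDEX concept_relationship_idx3 ON concept_relationship (relationship_id)")

loaded = conn.execute("SELECT COUNT(*) FROM concept_relationship").fetchone()[0]
index_names = sorted(r[1] for r in conn.execute("PRAGMA index_list('concept_relationship')"))
print(loaded, index_names)
```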

In SQL, is it OK for two tables to refer to each other?

In this system, we store products, images of products (there can be many images for a product), and a default image for a product. The database:
CREATE TABLE `products` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`NAME` varchar(255) NOT NULL,
`DESCRIPTION` text NOT NULL,
`ENABLED` tinyint(1) NOT NULL DEFAULT '1',
`DATEADDED` datetime NOT NULL,
`DEFAULT_PICTURE_ID` int(10) unsigned DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `Index_2` (`DATEADDED`),
KEY `FK_products_1` (`DEFAULT_PICTURE_ID`),
CONSTRAINT `FK_products_1` FOREIGN KEY (`DEFAULT_PICTURE_ID`) REFERENCES `products_pictures` (`ID`) ON DELETE SET NULL ON UPDATE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=30 DEFAULT CHARSET=utf8;
CREATE TABLE `products_pictures` (
`ID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`IMG_PATH` varchar(255) NOT NULL,
`PRODUCT_ID` int(10) unsigned NOT NULL,
PRIMARY KEY (`ID`),
KEY `FK_products_pictures_1` (`PRODUCT_ID`),
CONSTRAINT `FK_products_pictures_1` FOREIGN KEY (`PRODUCT_ID`) REFERENCES `products` (`ID`) ON DELETE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=20 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
As you can see, products_pictures.PRODUCT_ID -> products.ID and products.DEFAULT_PICTURE_ID -> products_pictures.ID, so there is a circular reference. Is it OK?
No, it's not OK. Circular references between tables are messy. See this (decade old) article: SQL By Design: The Circular Reference
Some DBMS can handle these, and with special care, but MySQL will have issues.
Option 1
As in your design, make one of the two FKs nullable. This allows you to solve the chicken-and-egg problem (which table should I insert into first?).
There is a problem though with your code. It will allow a product to have a default picture where that picture will be referencing another product!
To disallow such an error, your FK constraint should be:
CONSTRAINT FK_products_1
FOREIGN KEY (id, default_picture_id)
REFERENCES products_pictures (product_id, id)
ON DELETE RESTRICT -- the SET NULL options would
ON UPDATE RESTRICT -- lead to other issues
This will require a UNIQUE constraint/index in table products_pictures on (product_id, id) for the above FK to be defined and work properly.
Option 2
Another approach is to remove the Default_Picture_ID column from the product table and add an IsDefault BIT column in the picture table. The problem with this solution is how to allow only one picture per product to have that bit on and all others to have it off. In SQL-Server (and I think in Postgres) this can be done with a partial index:
CREATE UNIQUE INDEX is_DefaultPicture
ON products_pictures (Product_ID)
WHERE IsDefault = 1 ;
But MySQL has no such feature.
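For a feel of how the partial index behaves, here is a small sketch. It assumes SQLite (via Python's sqlite3 module), which also supports partial indexes and is used here only because it runs in-process; the lowercase column names and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products_pictures (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    is_default INTEGER NOT NULL DEFAULT 0
);
-- Uniqueness is only checked among rows where is_default = 1.
CREATE UNIQUE INDEX is_default_picture
    ON products_pictures (product_id)
    WHERE is_default = 1;
INSERT INTO products_pictures (id, product_id, is_default) VALUES (1, 7, 1);
INSERT INTO products_pictures (id, product_id, is_default) VALUES (2, 7, 0);
INSERT INTO products_pictures (id, product_id, is_default) VALUES (3, 7, 0);
""")

# Any number of non-default pictures is fine, but a second default is refused.
try:
    conn.execute(
        "INSERT INTO products_pictures (id, product_id, is_default) VALUES (4, 7, 1)"
    )
    second_default_rejected = False
except sqlite3.IntegrityError:
    second_default_rejected = True

defaults = conn.execute(
    "SELECT id FROM products_pictures WHERE product_id = 7 AND is_default = 1"
).fetchall()
print(second_default_rejected, defaults)
```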
Option 3
This approach, which even allows both FK columns to be defined as NOT NULL, is to use deferrable constraints. This works in PostgreSQL and I think in Oracle. Check this question and the answer by @Erwin: Complex foreign key constraint in SQLAlchemy (the All key columns NOT NULL Part).
Constraints in MySQL cannot be deferrable.
Option 4
The approach (which I find cleanest) is to remove the Default_Picture_ID column and add another table. No circular path in the FK constraints and all FK columns will be NOT NULL with this solution:
product_default_picture
----------------------
product_id NOT NULL
default_picture_id NOT NULL
PRIMARY KEY (product_id)
FOREIGN KEY (product_id, default_picture_id)
REFERENCES products_pictures (product_id, id)
This will also require a UNIQUE constraint/index in table products_pictures on (product_id, id) as in option 1.
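A runnable sketch of option 4, under the assumption that SQLite (via Python's sqlite3 module) is an acceptable stand-in for MySQL when it comes to composite foreign keys. The column lists are shortened and the sample products are invented. The point is that the composite FK stops a product from taking another product's picture as its default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE products_pictures (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products (id) ON DELETE CASCADE,
    img_path TEXT NOT NULL,
    UNIQUE (product_id, id)  -- target of the composite foreign key below
);
CREATE TABLE product_default_picture (
    product_id INTEGER NOT NULL PRIMARY KEY,
    default_picture_id INTEGER NOT NULL,
    FOREIGN KEY (product_id, default_picture_id)
        REFERENCES products_pictures (product_id, id)
);
INSERT INTO products VALUES (1, 'lamp'), (2, 'chair');
INSERT INTO products_pictures VALUES (10, 1, 'lamp.jpg'), (20, 2, 'chair.jpg');
INSERT INTO product_default_picture VALUES (1, 10);
""")

# Picture 10 belongs to product 1, so it cannot become the default of product 2.
try:
    conn.execute("INSERT INTO product_default_picture VALUES (2, 10)")
    cross_product_rejected = False
except sqlite3.IntegrityError:
    cross_product_rejected = True

default_rows = conn.execute("SELECT * FROM product_default_picture").fetchall()
print(cross_product_rejected, default_rows)
```

The PRIMARY KEY on product_id is what limits each product to a single default picture.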
To summarize, with MySQL you have two options:
option 1 (a nullable FK column) with the correction above to enforce integrity correctly
option 4 (no nullable FK columns)
The only issue you're going to encounter is when you do inserts.
Which one do you insert first?
With this, you will have to do something like:
Insert product with null default picture
Insert picture(s) with the newly created product ID
Update the product to set the default picture to one that you just inserted.
Again, deleting will not be fun.
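The three insert steps can be sketched like this. It is only an illustration: SQLite (via Python's sqlite3 module) stands in for MySQL, the tables are reduced to the columns that matter, and the product and file names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection
conn.executescript("""
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    default_picture_id INTEGER REFERENCES products_pictures (id) ON DELETE SET NULL
);
CREATE TABLE products_pictures (
    id INTEGER PRIMARY KEY,
    img_path TEXT NOT NULL,
    product_id INTEGER NOT NULL REFERENCES products (id) ON DELETE CASCADE
);
""")

with conn:  # all three steps in one transaction
    # 1. Insert the product with no default picture yet.
    product_id = conn.execute(
        "INSERT INTO products (name) VALUES (?)", ("lamp",)
    ).lastrowid
    # 2. Insert the picture, pointing at the newly created product.
    picture_id = conn.execute(
        "INSERT INTO products_pictures (img_path, product_id) VALUES (?, ?)",
        ("lamp.jpg", product_id),
    ).lastrowid
    # 3. Point the product at the picture that was just inserted.
    conn.execute(
        "UPDATE products SET default_picture_id = ? WHERE id = ?",
        (picture_id, product_id),
    )

row = conn.execute("SELECT id, default_picture_id FROM products").fetchone()
print(row)
```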
This is just a suggestion, but if possible, create a join table between these two tables; it might be helpful for tracking:
product_productcat_join
------------------------
ID(PK)
ProductID(FK)- product table primary key
PictureID(FK) - category table primary key
In the other table you can just hold that field without the foreign key constraint.
This is useful in some cases where you want to do the processing on the smaller table but connect to the bigger table with the result of that processing.
For example if you add a product_location table which holds the country, district, city, address and longitude and latitude information. There might be a case that you want to show the product within a circle on the map.
John, what you're doing isn't bad, but using PK-FK actually helps normalize your data by removing redundant, repeated data. That has some fantastic advantages:
Improved data integrity owing to the elimination of duplicate storage locations for the same data
Reduced locking contention and improved multiple-user concurrency
Smaller files
That is not a cyclic ref, that is PK-FK.

Understanding / mySQL aka tricking ForeignKey relationships in Django

So I've inherited some django.
The MySQL table is simple enough; parent is NOT an FK relationship, just the "Parent" id:
CREATE TABLE `Child` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`parent` int(10) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
UNIQUE KEY `id` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=24;
But then the originator did this..
class Child(models.Model):
    """Project Child information"""
    id = models.AutoField(primary_key=True)
    parent = models.ForeignKey(Parent)
    name = models.CharField(max_length=255)

    class Meta:
        managed = False
Admittedly I am NOT a SQL jockey, but I know that a "real" foreign key relationship looks similar to this (notice the CONSTRAINT):
CREATE TABLE `Child` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) NOT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `child_63f17a16` (`parent_id`),
CONSTRAINT `parent_id_refs_id_34923e1e` FOREIGN KEY (`parent_id`) REFERENCES `Parent` (`id`)
) ENGINE=InnoDB;
What I want to know is the following:
What problems could I expect to see from this "trickery"?
While this appears to work, is it recommended or advised?
Would we be advised to modify the SQL to add in the constraint?
Thanks so much!
Not having an actual constraint might lead to broken references, invalid parents and other sorts of data inconsistencies. I am not a Django expert but I would venture a guess that in most cases Django will still handle the relations fine unless you purposefully add some invalid records.
Normally, if your RDBMS supports foreign key constraints, there is absolutely no reason not to use them, and it could potentially be considered a design flaw to ignore them.
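Before adding a constraint to an existing table, it helps to know whether broken references are already there. A minimal sketch of such a check, assuming SQLite (via Python's sqlite3 module) in place of MySQL and a few invented rows; the LEFT JOIN query itself is plain SQL and should carry over to MySQL unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Parent (id INTEGER PRIMARY KEY);
CREATE TABLE Child (
    id INTEGER PRIMARY KEY,
    parent INTEGER NOT NULL,   -- plain integer, no FOREIGN KEY clause
    name TEXT NOT NULL
);
INSERT INTO Parent (id) VALUES (1), (2);
-- Nothing stops the third row from pointing at a parent that does not exist.
INSERT INTO Child (id, parent, name) VALUES
    (1, 1, 'ok'),
    (2, 2, 'also ok'),
    (3, 99, 'orphan');
""")

# Child rows whose parent id matches no row in Parent.
orphans = conn.execute("""
    SELECT Child.id, Child.parent
    FROM Child
    LEFT JOIN Parent ON Parent.id = Child.parent
    WHERE Parent.id IS NULL
""").fetchall()
print(orphans)
```

Any rows this returns would have to be fixed or removed first, since they would violate the new constraint.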
You should consider adding the key constraints. Not only do they give your DBMS a good idea of how to optimize the queries, they also ensure consistency in your data. I am pretty sure Django has a setting somewhere that will automatically generate the SQL to add the key constraints when you run manage.py syncdb
For more information about why you should prefer foreign keys, you should read the MySQL Foreign Key Documentation
Most interestingly:
InnoDB requires indexes on foreign keys and referenced keys so that foreign key checks can be fast and not require a table scan. In the referencing table, there must be an index where the foreign key columns are listed as the first columns in the same order. Such an index is created on the referencing table automatically if it does not exist. (This is in contrast to some older versions, in which indexes had to be created explicitly or the creation of foreign key constraints would fail.) index_name, if given, is used as described previously.
It's supposed to be faster without the constraint, since MySQL doesn't check the constraint before adding a row in the child table.
But with the foreign key, it would make your life easier since you can use the on update and on delete.
I'd go with the constraint.