Creating foreign key by matching strings between tables - mysql

I'm a beginner to SQL so this is probably a pretty newbie question, but I can't seem to get my head straight on it. I have a pair of tables called MATCH and SEGMENT.
MATCH.id int(11) ai pk
MATCH.name varchar(45)
etc.
SEGMENT.id int(11) ai pk
SEGMENT.name varchar(45)
etc.
Each row in MATCH can have one or more SEGMENT rows associated with it. The name in MATCH is unique on each row. Right now I do an inner join on the name fields to figure out which segments go with which match. I want to copy the tables to a new set of tables and set up a foreign key in SEGMENT that contains the unique ID from the MATCH row both to improve performance and to fix some problems where the names aren't always precisely the same (and they should be).
Is there a way to do a single INSERT or UPDATE statement that will do the name comparisons and add the foreign key to each row in the SEGMENT table - at least for the rows where the names are precisely the same? (For the ones that don't match, I may have to write a SQL function to "clean" the name by removing extra blanks and special characters before comparing)
Thanks for any help anyone can give me!

Here's one way I would consider doing it: add the FK column, add the constraint definition, then populate the column with an UPDATE statement using a correlated subquery:
ALTER TABLE `SEGMENT` ADD COLUMN match_id INT(11) COMMENT 'FK ref MATCH.id' ;
ALTER TABLE `SEGMENT` ADD CONSTRAINT fk_SEGMENT_MATCH
FOREIGN KEY (match_id) REFERENCES `MATCH`(id) ;
UPDATE `SEGMENT` s
SET s.match_id = (SELECT m.id
FROM `MATCH` m
WHERE m.name = s.name) ;
A correlated subquery (like in the example UPDATE statement above) usually isn't the most efficient approach to getting a column populated. But it seems a lot of people think it's easier to understand than the (usually) more efficient alternative, an UPDATE using a JOIN operation like this:
UPDATE `SEGMENT` s
JOIN `MATCH` m
ON m.name = s.name
SET s.match_id = m.id
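For the rows whose names are not precisely the same, a second pass on "cleaned" names can follow the exact-match pass. Below is a rough sketch of that two-pass idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it can be run anywhere. The `clean` function, its normalization rules, and the sample rows are assumptions made up for illustration; in MySQL the cleaning would be a stored function or built-in string functions instead.

```python
import re
import sqlite3

def clean(name):
    """Hypothetical normalizer: lowercase, drop special characters, collapse blanks."""
    kept = re.sub(r"[^0-9a-z ]", "", name.lower())
    return " ".join(kept.split())

def populate_match_ids(conn):
    """Pass 1 matches exact names; pass 2 matches cleaned names for rows still NULL."""
    conn.create_function("clean", 1, clean)
    # Pass 1: correlated subquery on the exact name (rows without a match get NULL).
    conn.execute("""
        UPDATE segment
        SET match_id = (SELECT m.id FROM "match" m WHERE m.name = segment.name)
    """)
    # Pass 2: only rows that are still unmatched, compared on cleaned names.
    # If two match rows clean to the same string, SQLite silently takes one of them.
    conn.execute("""
        UPDATE segment
        SET match_id = (SELECT m.id FROM "match" m
                        WHERE clean(m.name) = clean(segment.name))
        WHERE match_id IS NULL
    """)
    # Whatever is left needs manual attention.
    return conn.execute(
        "SELECT name FROM segment WHERE match_id IS NULL ORDER BY id").fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "match" (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE segment (id INTEGER PRIMARY KEY, name TEXT,
                          match_id INTEGER REFERENCES "match"(id));
    INSERT INTO "match" (id, name) VALUES (1, 'Semi Final'), (2, 'Final Round');
    INSERT INTO segment (id, name) VALUES
        (1, 'Semi Final'), (2, 'Final  Round!'), (3, 'Unknown');
""")
unmatched = populate_match_ids(conn)
match_ids = conn.execute("SELECT match_id FROM segment ORDER BY id").fetchall()
```

Running the exact pass first keeps the cheap comparison for the bulk of the rows, and the list returned at the end shows which segments still need a manual fix.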

Add an ID field to your MATCH table and populate it.
Then add a column MATCHID (which will be your foreign key) to your SEGMENT table. Note that you won't be able to declare it as a foreign key until you have mapped the records correctly.
Use the following query to update the foreign keys:
UPDATE SEGMENT A
INNER JOIN `MATCH` B
ON A.NAME = B.NAME
SET A.MATCHID = B.ID;

Related

Replicate 'ON DELETE SET NULL' behavior from combination of two non-unique columns on another table

The Situation
I have a full-stack web application with two MySQL tables: channel_strips and mic_lookup.
DESCRIBE `channel_strips`;
Field Type Null Key
preset_id varchar(127) NO PRI
mic_function varchar(225) YES
phantom_power tinyint(1) YES
...
DESCRIBE `mic_lookup`;
Field Type Null Key
Microphone varchar(40) NO PRI
mic_function varchar(225) NO
phantom_power tinyint(1) NO
...
(many other columns)
I want the channel_strips table to always hold only combinations of mic_function and phantom_power values that can be currently found in the mic_lookup table (or null values).
What is working
On the HTML end, I've limited the input to these columns in channel_strips with a <select> element that gets values from this mysqli query: SELECT DISTINCT `mic_function`, `phantom_power` FROM `mic_lookup`; This successfully restricts the user input.
The Problem
The one situation I've identified where this fails is when entries are deleted or changed in mic_lookup such that one pre-existing combination of mic_function and phantom_power is eliminated. As a result, channel_strips could still have a combination of the two columns that is actually no longer an option. In this situation, I'd like those two columns to be nullified on rows where they hold the old combination, essentially emulating an ON DELETE SET NULL statement as if it were a foreign key.
What I've tried
For a while, I had an intermediate table, mic_functions with a single column, mic_function, which served as a foreign key to both tables' mic_function columns. However, this was before I realized that phantom_power needed to be included. Furthermore, it was very confusing from a user's perspective, since intuitively you would want to set these values in the mic_lookup table.
My next idea was to create a view instead so I'd have a 'table' that automatically updates, and reference that as a foreign key - maybe something like...
CREATE VIEW mic_functions AS(
SELECT DISTINCT
`mic_function`,
`phantom_power`
FROM
`mic_lookup`
);
ALTER VIEW mic_functions ADD CONSTRAINT PK_mic_function PRIMARY KEY(`mic_function`, `phantom_power`);
Of course, this doesn't work. You can't add a primary key to a VIEW.
Finally, I suppose I could write a bunch of php to query and perform a series of checks on channel_strips every time the mic_lookup table is updated, and execute an appropriate UPDATE query if the checks are violated, but it seems to me like there ought to be a simpler way to handle this on the SQL side of things. Maybe using SQL checks or triggers or a combination of the two would work, but I have no experience with checks and triggers.
Note
phantom_power is boolean, and I'm using MySQL version 10.4.21-MariaDB
I found a solution using triggers, by joining the tables and filtering down to where the joined primary key is null:
CREATE TRIGGER `mic_lookup_update` AFTER UPDATE
ON
`mic_lookup` FOR EACH ROW
UPDATE
`channel_strips` c
LEFT JOIN `mic_lookup` m ON
c.mic_function = m.mic_function AND c.phantom_power = m.phantom_power
SET
c.mic_function = NULL,
c.phantom_power = NULL
WHERE
m.Microphone IS NULL;
CREATE TRIGGER `mic_lookup_delete` AFTER DELETE
ON
`mic_lookup` FOR EACH ROW
UPDATE
`channel_strips` c
LEFT JOIN `mic_lookup` m ON
c.mic_function = m.mic_function AND c.phantom_power = m.phantom_power
SET
c.mic_function = NULL,
c.phantom_power = NULL
WHERE
m.Microphone IS NULL;
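A narrower variant of the same idea is to touch only the rows that held the combination that just changed, using the OLD row values inside the trigger. The sketch below is not the MariaDB code above: it uses Python's built-in sqlite3 module so it can be run as-is, SQLite's trigger syntax differs slightly (BEGIN ... END is required), and the microphone names and presets are invented sample data.

```python
import sqlite3

# Shared trigger body: clear the OLD combination in channel_strips,
# but only when no mic_lookup row still offers that combination.
NULLIFY = """
    UPDATE channel_strips
    SET mic_function = NULL, phantom_power = NULL
    WHERE mic_function = OLD.mic_function
      AND phantom_power = OLD.phantom_power
      AND NOT EXISTS (SELECT 1 FROM mic_lookup m
                      WHERE m.mic_function = OLD.mic_function
                        AND m.phantom_power = OLD.phantom_power);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE mic_lookup (
        Microphone TEXT PRIMARY KEY,
        mic_function TEXT NOT NULL,
        phantom_power INTEGER NOT NULL);
    CREATE TABLE channel_strips (
        preset_id TEXT PRIMARY KEY,
        mic_function TEXT,
        phantom_power INTEGER);
    CREATE TRIGGER mic_lookup_delete AFTER DELETE ON mic_lookup
    BEGIN {NULLIFY} END;
    CREATE TRIGGER mic_lookup_update AFTER UPDATE ON mic_lookup
    BEGIN {NULLIFY} END;
    INSERT INTO mic_lookup VALUES
        ('SM57', 'instrument', 0), ('SM58', 'vocal', 0),
        ('Beta58', 'vocal', 0), ('C414', 'overhead', 1);
    INSERT INTO channel_strips VALUES
        ('p1', 'instrument', 0), ('p2', 'vocal', 0), ('p3', 'overhead', 1);
""")

# 'vocal'/0 is still offered by Beta58, so p2 must stay untouched.
conn.execute("DELETE FROM mic_lookup WHERE Microphone = 'SM58'")
# 'instrument'/0 disappears entirely, so p1 is nullified.
conn.execute("DELETE FROM mic_lookup WHERE Microphone = 'SM57'")
# Changing the only 'overhead'/1 row removes that combination, so p3 is nullified.
conn.execute("UPDATE mic_lookup SET phantom_power = 0 WHERE Microphone = 'C414'")

rows = conn.execute(
    "SELECT preset_id, mic_function, phantom_power "
    "FROM channel_strips ORDER BY preset_id").fetchall()
```

Filtering on the OLD values means each trigger firing only looks at the presets that could be affected, rather than re-checking every row of channel_strips.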

What is the point of providing a JOIN condition when there are foreign keys?

TL;DR: Why do we have to add ON table1.column = table2.column?
This question asks roughly why do we need to have foreign keys if joining works just fine without them. Here, I'd like to ask the reverse. Given the simplest possible database, like this:
CREATE TABLE class (
class_id INT PRIMARY KEY,
class_name VARCHAR(40)
);
CREATE TABLE student (
student_id INT PRIMARY KEY,
student_name VARCHAR(40),
class_id INT,
FOREIGN KEY(class_id) REFERENCES class(class_id) ON DELETE SET NULL
);
… and a simple join, like this:
SELECT student_id, student_name, class_name
FROM student
JOIN class
ON student.class_id = class.class_id;
… why can't we just omit the ON clause?
SELECT student_id, student_name, class_name
FROM student
JOIN class;
To me, the line FOREIGN KEY(class_id) REFERENCES class(class_id) … in the definition of student already includes all the necessary information for the FROM student JOIN class to have an implicit ON student.class_id = class.class_id condition; but we still have to add it. Why is that?
For this you must consider how the JOIN operation works. It doesn't check whether your two tables are related or not. So with a simple join without a condition (ON), you will get a big result containing every possible combination of rows.
The ON clause filters that down to the result you expect.
Reposting Damien_The_Unbeliever's comment as an answer
you don't have to join on foreign keys;
sometimes multiple foreign keys exist between the same pair of tables.
Also, SQL is a crusty language without many shortcuts for the most common use case.
A JOIN condition is an expression which specifies the matching criteria, and it is evaluated during the JOIN process. It can cause a failure only if a syntax error occurs.
A FOREIGN KEY is a rule for the data consistency checking subsystem, and it is checked when data changes. It will cause a failure if the data state (intermediate and/or final) does not match the rule.
In other words, there is nothing in common between them, they are completely different and unrelated things.
I feel like I have to reiterate parts of the question. Please, give it a second read - Dima Parzhitsky
Imagine that your proposal is accepted. I have tables:
CREATE TABLE users (userid INT PRIMARY KEY);
CREATE TABLE messages (sender INT REFERENCES users (userid),
receiver INT REFERENCES users (userid));
I write SELECT * FROM users JOIN messages.
What reference must be used for joining condition? And justify your assumption...
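To make the ambiguity concrete, here is a small runnable sketch using Python's built-in sqlite3 module (a stand-in; the row values are invented). SQLite, like MySQL, accepts JOIN without ON and treats it as a cross join, and the two foreign keys give two equally valid but different join results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE messages (
        sender INTEGER REFERENCES users (userid),
        receiver INTEGER REFERENCES users (userid));
    INSERT INTO users VALUES (1), (2), (3);
    INSERT INTO messages VALUES (1, 2), (2, 3);
""")

def user_ids(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# No ON clause: every user is paired with every message (3 x 2 rows).
no_on = user_ids("SELECT userid FROM users JOIN messages")

# Each foreign key describes a different, equally legitimate join.
by_sender = user_ids(
    "SELECT userid FROM users JOIN messages ON userid = sender ORDER BY userid")
by_receiver = user_ids(
    "SELECT userid FROM users JOIN messages ON userid = receiver ORDER BY userid")
```

Here `by_sender` and `by_receiver` contain different users, so an implicit rule would have to pick one of them arbitrarily.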

Add a Foreign Key to an Existing Table that Contains what would be Invalid FK Values (MYSQL)

I need to add a Foreign Key to a table that already exists and is populated with data that would contain invalid Foreign Key Values. (MYSQL)
I know there are several questions along these lines, but I can't seem to find any that answer my scenario.
Table and Data Structure
"GblTable" contains an "Org" field that needs to become a FK of the Org table. The Org table has a PK field called "number".
Currently, the GblTable contains non-existent Org numbers (ie. If the Org table has rows with PKs 1,2, and 3, GblTable might have rows with Org as 4 or 5). While this is the case, I cannot apply the constraint to reference GblTable.org to Org.number.
I believe the best approach for this particular situation will be to set the FK field in those rows to NULL before I apply the constraint. NULL is a valid GblTable.Org value for the program, so this would achieve an acceptable outcome.
What I Have so Far
I want to set all GblTable.Org values to NULL where they do not match a valid Org.Number.
In pseudocode:
set GblTable.ORG to NULL
where the GblTable.number is one of the following:
( select all GblTable.numbers where the GblTable.Org does not match an existing Org.Number )
In Sql, but I get the error "You can't specify target table 'GblTable' for update in FROM clause":
update GblTable set Org=NULL
where number IN (
select number
from GblTable
where Org NOT IN (select number from Org)
)
What's the best way to achieve my requirement?
You don't need the extra level of subquery:
update GblTable set Org=NULL
where Org NOT IN (select number from Org)
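One thing worth knowing about NOT IN: if the subquery can ever return a NULL, the NOT IN test is never true and no row gets updated. That can't happen here because Org.number is a primary key, but a NOT EXISTS form avoids the question entirely. Below is a small sketch of that variant, run against SQLite through Python's built-in sqlite3 module rather than MySQL; the sample rows are made up to mirror the 1, 2, 3 versus 4, 5 example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Org (number INTEGER PRIMARY KEY);
    CREATE TABLE GblTable (number INTEGER PRIMARY KEY, Org INTEGER);
    INSERT INTO Org VALUES (1), (2), (3);
    INSERT INTO GblTable VALUES (10, 1), (11, 4), (12, NULL), (13, 5);
""")

def null_orphans(conn):
    """Set Org to NULL where it points at a missing Org row; return rows changed."""
    cur = conn.execute("""
        UPDATE GblTable SET Org = NULL
        WHERE Org IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM Org o WHERE o.number = GblTable.Org)
    """)
    return cur.rowcount

def count_orphans(conn):
    """Number of rows that would still violate the planned foreign key."""
    return conn.execute("""
        SELECT COUNT(*) FROM GblTable g
        LEFT JOIN Org o ON o.number = g.Org
        WHERE g.Org IS NOT NULL AND o.number IS NULL
    """).fetchone()[0]

# The NULL pitfall in isolation: this comparison yields NULL, not true.
not_in_with_null = conn.execute("SELECT 4 NOT IN (1, NULL)").fetchone()[0]

before = count_orphans(conn)
cleared = null_orphans(conn)
after = count_orphans(conn)
rows = conn.execute("SELECT number, Org FROM GblTable ORDER BY number").fetchall()
```

Once the orphan count is zero, adding the foreign key constraint should no longer be rejected because of existing data.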

Delayed insert due to foreign key constraints

I am trying to run a query:
INSERT
INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, 'Activated'
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL
ON DUPLICATE KEY UPDATE
`ChangedOn` = VALUES(`ChangedOn`)
(I am not quite sure the query is correct, but it appears to be working), however I am running into the following issue. I am running this query before creating the entry into the 'Products' table and am getting a foreign key constraint problem due to the fact that the entry is not in the Products table yet.
My question is, is there a way to run this query, but wait until the next query (which updates the Product table) before performing the insert portion of the query above? Also note that if the query is run after the Product entry is created, it will no longer see p.Id as null and will therefore select nothing, so it has to be performed before the Product entry is created.
---> Edit <---
The concept I am trying to achieve is as follows:
For starters I am importing a set of data into a temp table, the Product table is a list of all products that are (or have been in the past) added through the set of data from the temp table. What I need is a separate table that provides a state change to the product as sometimes the product will become unavailable (no longer in the data set provided by the vendor).
The ProductState table is as follows:
CREATE TABLE IF NOT EXISTS `ProductState` (
`ProductId` VARCHAR(32) NOT NULL ,
`ChangedOn` DATE NOT NULL ,
`State` ENUM('Activated','Deactivated') NULL ,
PRIMARY KEY (`ProductId`, `ChangedOn`) ,
INDEX `fk_ProductState_Product` (`ProductId` ASC) ,
CONSTRAINT `fk_ProductState_Product`
FOREIGN KEY (`ProductId` )
REFERENCES `Product` (`Id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
The foreign key is an identifying relationship with the Product table (Product.Id)
Essentially what I am trying to accomplish is this:
1. Anytime a new product (or previously deactivated product) shows up in the vendor data set, the record is created in the ProductState table as 'Activated'.
2. Anytime a product (that is activated), does not show up in the vendor data set, the record is created as 'Deactivated' in the ProductState table.
The purpose of the ProductState table is to track activation and deactivation states of a product. Also, ProductState has a many-to-one relationship with the Product table, and the state of a product will only change once daily, therefore my primary key would be ProductId and ChangedOn.
With foreign keys, you definitely need to have the data in the Product table first, before entering the state. Think about it with this logic: "How can something that doesn't exist have a state?"
So pseudocode of what you should do:
1. Read in the vendor's product list
2. Compare them to the existing list in your Product table
3. If new ones found: 3.1 Insert it to Product table, 3.2 Insert it to ProductState table
4. If missing from vendor's list: 4.1 Insert it to ProductState table
All these should be done in 1 transaction. Note that you should NOT delete things from Product table, unless you really want to delete every information associated with it, ie. also delete all the "states" that you have stored.
Rather than trying to do this all in 1 query - best bet is to create a stored procedure that does the work as step-by-step above. I think it gets overly complicated (or in this case, probably impossible) to do all in 1 query.
Edit: Something like this:
DELIMITER //
CREATE PROCEDURE `some_procedure_name` ()
BEGIN
-- Break the tmpImport data down into 2 temporary tables: new and removed
CREATE TEMPORARY TABLE _temp_new_products AS
SELECT t.*
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL;
CREATE TEMPORARY TABLE _temp_removed_products AS
SELECT p.*
FROM `Product` p
LEFT JOIN `tmpImport` t
ON t.`ProductId` = p.`Id`
WHERE t.`ProductId` IS NULL;
-- For each entry in _temp_new_products:
-- 1. Insert into Product table
-- 2. Insert into ProductState table 'Activated'
-- For each entry in _temp_removed_products:
-- 1. Insert into ProductState table 'Deactivated'
-- Drop the temporary tables
DROP TEMPORARY TABLE _temp_new_products;
DROP TEMPORARY TABLE _temp_removed_products;
END //
DELIMITER ;
I think you should:
start a transaction
do your insert into the Products table
do your insert into the ProductState table
commit the transaction
This will avoid any foreign key errors, but will also make sure your data is always accurate. You do not want to 'avoid' the foreign key constraint in any way, and InnoDB (which I'm sure you are using) never defers these constraints unless you turn them off completely.
Also, no, you cannot insert into multiple tables in one INSERT ... SELECT statement.
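The ordering can be sketched roughly like this. It is only an illustration: it runs on SQLite through Python's built-in sqlite3 module instead of MySQL/InnoDB, it assumes Product has nothing but an Id column, the dates are invented, and it deliberately ignores the current state of a product (a real version would only deactivate products whose latest state is 'Activated', and would re-activate returning ones).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
    CREATE TABLE Product (Id TEXT PRIMARY KEY);
    CREATE TABLE ProductState (
        ProductId TEXT NOT NULL REFERENCES Product (Id),
        ChangedOn TEXT NOT NULL,
        State TEXT CHECK (State IN ('Activated', 'Deactivated')),
        PRIMARY KEY (ProductId, ChangedOn));
    CREATE TABLE tmpImport (ProductId TEXT, ProcessedOn TEXT);
    INSERT INTO Product VALUES ('A');
    INSERT INTO ProductState VALUES ('A', '2024-01-01', 'Activated');
    INSERT INTO tmpImport VALUES ('B', '2024-01-02');
""")

def state_before_product(conn):
    """Try to insert a state for a product that does not exist yet."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO ProductState VALUES ('Z', '2024-01-02', 'Activated')")
        return True
    except sqlite3.IntegrityError:  # foreign key violation
        return False

def sync(conn, today):
    """Parent rows first, then state rows, all inside one transaction."""
    with conn:  # commits on success, rolls back if anything raises
        new_rows = conn.execute("""
            SELECT t.ProductId, t.ProcessedOn FROM tmpImport t
            LEFT JOIN Product p ON p.Id = t.ProductId
            WHERE p.Id IS NULL""").fetchall()
        removed = conn.execute("""
            SELECT p.Id FROM Product p
            LEFT JOIN tmpImport t ON t.ProductId = p.Id
            WHERE t.ProductId IS NULL""").fetchall()
        conn.executemany("INSERT INTO Product (Id) VALUES (?)",
                         [(pid,) for pid, _ in new_rows])
        conn.executemany("INSERT INTO ProductState VALUES (?, ?, 'Activated')",
                         new_rows)
        conn.executemany("INSERT INTO ProductState VALUES (?, ?, 'Deactivated')",
                         [(pid, today) for (pid,) in removed])

rejected = not state_before_product(conn)
sync(conn, "2024-01-02")
states = conn.execute(
    "SELECT ProductId, ChangedOn, State FROM ProductState "
    "ORDER BY ProductId, ChangedOn").fetchall()
```

The list of new products is read before anything is inserted, which sidesteps the problem of p.Id no longer being null once the Product row exists.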

How to set a database integrity check on foreign keys referenced fields

I have four Database Tables like these:
Book
ID_Book |ID_Company|Description
BookExtension
ID_BookExtension | ID_Book| ID_Discount
Discount
ID_Discount | Description | ID_Company
Company
ID_Company | Description
Any BookExtension record via foreign keys points indirectly to two different ID_Company fields:
BookExtension.ID_Book references a Book record that contains a Book.ID_Company
BookExtension.ID_Discount references a Discount record that contains a Discount.ID_Company
Is it possible to enforce in Sql Server that any new record in BookExtension must have Book.ID_Company = Discount.ID_Company ?
In a nutshell, I want the following query to always return 0:
SELECT count(*) from BookExtension
INNER JOIN Book ON BookExtension.ID_Book = Book.ID_Book
INNER JOIN Discount ON BookExtension.ID_Discount = Discount.ID_Discount
WHERE Book.ID_Company <> Discount.ID_Company
or, in plain English:
I don't want a BookExtension record to reference a Book record of one Company and a Discount record of a different Company!
Unless I've misunderstood your intent, the general form of the SQL statement you'd use is
ALTER TABLE FooExtension
ADD CONSTRAINT your-constraint-name
CHECK (ID_Foo = ID_Bar);
That assumes existing data already conforms to the new constraint. If existing data doesn't conform, you can either fix the data (assuming it needs fixing), or you can limit the scope (probably) of the new constraint by also checking the value of ID_FooExtension. (Assuming you can identify "new" rows by the value of ID_FooExtension.)
Later . . .
Thanks, I did indeed misunderstand your situation.
As far as I know, you can't enforce that constraint the way you want to in SQL Server, because it doesn't allow SELECT queries within a CHECK constraint. (I might be wrong about that in SQL Server 2008.) A common workaround is to wrap a SELECT query in a function, and call the function, but that's not reliable according to what I've learned.
You can do this, though.
Create a UNIQUE constraint on Book (ID_Book, ID_Company). Part of it will look like UNIQUE (ID_Book, ID_Company).
Create a UNIQUE constraint on Discount (ID_Discount, ID_Company).
Add two columns to BookExtension: Book_ID_Company and Discount_ID_Company.
Populate those new columns.
Change the foreign key constraints in BookExtension. You want BookExtension (ID_Book, Book_ID_Company) to reference Book (ID_Book, ID_Company). Similar change for the foreign key referencing Discount.
Now you can add a check constraint to guarantee that BookExtension.Book_ID_Company is the same as BookExtension.Discount_ID_Company.
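Put together, the steps above might look something like the following. This is a sketch on SQLite via Python's built-in sqlite3 module, not SQL Server, so the DDL is written as fresh CREATE TABLE statements rather than ALTER TABLE, and the sample companies, book and discounts are invented. The composite foreign keys and the CHECK constraint are the parts that carry over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked to
conn.executescript("""
    CREATE TABLE Company (ID_Company INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE Book (
        ID_Book INTEGER PRIMARY KEY,
        ID_Company INTEGER NOT NULL REFERENCES Company (ID_Company),
        Description TEXT,
        UNIQUE (ID_Book, ID_Company));
    CREATE TABLE Discount (
        ID_Discount INTEGER PRIMARY KEY,
        Description TEXT,
        ID_Company INTEGER NOT NULL REFERENCES Company (ID_Company),
        UNIQUE (ID_Discount, ID_Company));
    CREATE TABLE BookExtension (
        ID_BookExtension INTEGER PRIMARY KEY,
        ID_Book INTEGER NOT NULL,
        ID_Discount INTEGER NOT NULL,
        Book_ID_Company INTEGER NOT NULL,
        Discount_ID_Company INTEGER NOT NULL,
        FOREIGN KEY (ID_Book, Book_ID_Company)
            REFERENCES Book (ID_Book, ID_Company),
        FOREIGN KEY (ID_Discount, Discount_ID_Company)
            REFERENCES Discount (ID_Discount, ID_Company),
        CHECK (Book_ID_Company = Discount_ID_Company));
    INSERT INTO Company VALUES (1, 'Company one'), (2, 'Company two');
    INSERT INTO Book VALUES (1, 1, 'Book of company 1');
    INSERT INTO Discount VALUES (1, 'Discount of company 1', 1),
                                (2, 'Discount of company 2', 2);
""")

def try_insert(row):
    """Insert a BookExtension row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO BookExtension VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Book 1 and discount 1 both belong to company 1: accepted.
same_company = try_insert((1, 1, 1, 1, 1))
# Book 1 (company 1) with discount 2 (company 2): rejected by the CHECK.
mixed_company = try_insert((2, 1, 2, 1, 2))
# Lying about the discount's company: rejected by the composite foreign key.
forged_company = try_insert((3, 1, 2, 1, 1))
```

The composite foreign keys make sure the copied company columns can't drift from the real ones, which is what lets a plain column-to-column CHECK do the job.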
I'm not sure how [in]efficient this would be but you could also use an indexed view to achieve this. It needs a helper table with 2 rows as CTEs and UNION are not allowed in indexed views.
CREATE TABLE dbo.TwoNums
(
Num int primary key
)
INSERT INTO TwoNums SELECT 1 UNION ALL SELECT 2
Then the view definition
CREATE VIEW dbo.ConstraintView
WITH SCHEMABINDING
AS
SELECT 1 AS Col FROM dbo.BookExtension
INNER JOIN dbo.Book ON dbo.BookExtension.ID_Book = Book.ID_Book
INNER JOIN dbo.Discount ON dbo.BookExtension.ID_Discount = Discount.ID_Discount
INNER JOIN dbo.TwoNums ON Num = Num
WHERE dbo.Book.ID_Company <> dbo.Discount.ID_Company
And a unique index on the View
CREATE UNIQUE CLUSTERED INDEX [uix] ON [dbo].[ConstraintView]([Col] ASC)