Delayed insert due to foreign key constraints - mysql

I am trying to run a query:
INSERT
INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, 'Activated'
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL
ON DUPLICATE KEY UPDATE
`ChangedOn` = VALUES(`ChangedOn`)
(I am not quite sure the query is correct, but it appears to work.) However, I am running into the following issue: I run this query before creating the entry in the Product table, so I get a foreign key constraint error because the row does not exist in the Product table yet.
My question is: is there a way to run this query but delay the insert portion until after the next query (which updates the Product table) has run? Note that if this query is run after the Product entry is created, it will no longer see p.Id as NULL and will therefore select nothing, so it has to be performed before the Product entry is created.
Edit:
The concept I am trying to achieve is as follows:
For starters, I am importing a set of data into a temp table. The Product table is a list of all products that are (or have at some point been) added through the data set from the temp table. What I need is a separate table that records state changes for each product, since a product will sometimes become unavailable (no longer in the data set provided by the vendor).
The ProductState table is as follows:
CREATE TABLE IF NOT EXISTS `ProductState` (
`ProductId` VARCHAR(32) NOT NULL ,
`ChangedOn` DATE NOT NULL ,
`State` ENUM('Activated','Deactivated') NULL ,
PRIMARY KEY (`ProductId`, `ChangedOn`) ,
INDEX `fk_ProductState_Product` (`ProductId` ASC) ,
CONSTRAINT `fk_ProductState_Product`
FOREIGN KEY (`ProductId` )
REFERENCES `Product` (`Id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
The foreign key is an identifying relationship with the Product table (Product.Id)
Essentially what I am trying to accomplish is this:
1. Anytime a new product (or previously deactivated product) shows up in the vendor data set, the record is created in the ProductState table as 'Activated'.
2. Anytime a product that is activated does not show up in the vendor data set, a record is created as 'Deactivated' in the ProductState table.
The purpose of the ProductState table is to track the activation and deactivation states of a product. ProductState has a many-to-one relationship with the Product table, and the state of a product will only change once daily, hence the primary key of ProductId and ChangedOn.

With foreign keys, you definitely need to have the data in the Product table first, before entering the state. Think about it with this logic: "How can something that doesn't exist have a state?"
So, in pseudocode, what you should do is:
1. Read in the vendor's product list.
2. Compare it to the existing list in your Product table.
3. If new products are found: 3.1 insert them into the Product table; 3.2 insert them into the ProductState table.
4. If products are missing from the vendor's list: 4.1 insert them into the ProductState table.
All of this should be done in one transaction. Note that you should NOT delete rows from the Product table unless you really want to delete every piece of information associated with a product, i.e. also delete all the "states" you have stored.
Rather than trying to do this all in one query, your best bet is to create a stored procedure that does the work step by step as above. I think it gets overly complicated (or in this case, probably impossible) to do it all in one query.
Edit: Something like this:
DELIMITER $$
CREATE PROCEDURE `some_procedure_name` ()
BEGIN
  -- Break the tmpImport table down into 2 temporary tables: new and removed
  CREATE TEMPORARY TABLE _temp_new_products AS
    SELECT t.*
    FROM `tmpImport` t
    LEFT JOIN `Product` p
      ON t.`ProductId` = p.`Id`
    WHERE p.`Id` IS NULL;

  CREATE TEMPORARY TABLE _temp_removed_products AS
    SELECT p.*
    FROM `Product` p
    LEFT JOIN `tmpImport` t
      ON t.`ProductId` = p.`Id`
    WHERE t.`ProductId` IS NULL;

  -- For each entry in _temp_new_products:
  -- 1. Insert into Product table
  -- 2. Insert into ProductState table 'Activated'

  -- For each entry in _temp_removed_products:
  -- 1. Insert into ProductState table 'Deactivated'

  -- Drop the temporary tables
  DROP TEMPORARY TABLE _temp_new_products;
  DROP TEMPORARY TABLE _temp_removed_products;
END$$
DELIMITER ;
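The commented steps can be done set-based rather than row by row. A minimal sketch of the body, assuming the Product table needs only its Id populated here, that tmpImport carries the ProcessedOn date used in the question's query, and using CURDATE() as the deactivation date:
INSERT INTO `Product` (`Id`)
SELECT `ProductId` FROM _temp_new_products;

INSERT INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT `ProductId`, `ProcessedOn`, 'Activated'
FROM _temp_new_products;

-- Deactivations use the current date, since the vendor file no longer mentions these rows
INSERT INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT `Id`, CURDATE(), 'Deactivated'
FROM _temp_removed_products;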

I think you should:
start a transaction
do your insert into the Products table
do your insert into the ProductState table
commit the transaction
This will avoid any foreign key errors, but will also make sure your data is always accurate. You do not want to 'avoid' the foreign key constraint in any way, and InnoDB (which I'm sure you are using) never defers these constraints unless you turn them off completely.
Also, no: you cannot insert into multiple tables with one INSERT ... SELECT statement.
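A minimal sketch of that sequence, assuming the Product table needs only its Id populated from the import (any other Product columns are omitted here). The new products are captured in a temporary table first, because once they exist in Product, the p.Id IS NULL filter from the question's query no longer finds them:
START TRANSACTION;

-- Remember which imported products are new before touching Product
CREATE TEMPORARY TABLE _new_products AS
  SELECT t.`ProductId`, t.`ProcessedOn`
  FROM `tmpImport` t
  LEFT JOIN `Product` p ON t.`ProductId` = p.`Id`
  WHERE p.`Id` IS NULL;

-- Parent rows first
INSERT INTO `Product` (`Id`)
SELECT `ProductId` FROM _new_products;

-- Child rows second; the foreign key is now satisfied
INSERT INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT `ProductId`, `ProcessedOn`, 'Activated'
FROM _new_products
ON DUPLICATE KEY UPDATE `ChangedOn` = VALUES(`ChangedOn`);

DROP TEMPORARY TABLE _new_products;
COMMIT;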

Related

Performance of Update query compared To Delete - Insert

I have two tables: Shop and Product
Table Shop
(id INT AUTO_INCREMENT,
shop_id INT,
PRIMARY KEY(id)
);
Table Product
(
product_id INT AUTO_INCREMENT,
p_name VARCHAR(100),
p_price INT,
shop_id INT,
PRIMARY KEY(product_id),
FOREIGN KEY(shop_id) REFERENCES Shop(id)
);
On the server I am using Node with the mysql2 package for queries.
On the client side, I'm displaying all Products that are related to a specific Shop in a table.
The user can change Products, and when he presses Save, requests are made that send the new data and store it.
The user can either change existing Products or add new ones.
But I have concerns about how this will behave with a relatively large number of products per shop. Let's say there are 1000 of them.
Newly inserted data is marked with the flag saved_in_db=false; existing data that was changed is marked with changed=true.
I considered a few approaches:
1. On the server, filter the array of records received from the client and INSERT into the DB the newly created ones that are not stored yet. But to UPDATE the existing Products, I would need to create a bunch of UPDATE Products SET p_name = val_1 WHERE id = ? queries and execute them all at once.
2. Take all Products with the specified shop_id, DELETE them, and INSERT the new bulk of data, without separating already existing records from changed ones.
In the second approach, I see two cons. First, a constant amount of data is sent from client to server, regardless of how much actually changed. Second, running out of IDs in the DB: if there are 10 shops with 1000 Products each, and users frequently update records, every save, even if only one record was added or changed, will increment the AUTO_INCREMENT id by around 1000.
Is executing a bunch of UPDATE queries one after another really the only way to update a certain set of records in the DB?
You could INSERT...ON DUPLICATE KEY UPDATE.
INSERT INTO Product (product_id, p_name)
VALUES (123, 'newname1'), (456, 'newname2'), (789, 'newname3'), ...more...
ON DUPLICATE KEY UPDATE p_name = VALUES(p_name);
This does not change the primary key values; it only updates the columns you tell it to.
You must include the product IDs in the INSERT values, because that's how MySQL detects that you're inserting a row that already exists in the table.
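If the batch mixes changed rows with brand-new ones (the saved_in_db=false records), the same statement can handle both: pass NULL as the id for new rows and AUTO_INCREMENT will assign one. A sketch with hypothetical values:
INSERT INTO Product (product_id, p_name, p_price, shop_id)
VALUES
  (123,  'renamed product',   500, 7),  -- existing row: takes the duplicate-key path
  (NULL, 'brand-new product', 900, 7)   -- new row: AUTO_INCREMENT assigns product_id
ON DUPLICATE KEY UPDATE
  p_name  = VALUES(p_name),
  p_price = VALUES(p_price);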

Creating foreign key by matching strings between tables

I'm a beginner to SQL so this is probably a pretty newbie question, but I can't seem to get my head straight on it. I have a pair of tables called MATCH and SEGMENT.
MATCH.id int(11) ai pk
MATCH.name varchar(45)
etc.
SEGMENT.id int(11) ai pk
SEGMENT.name varchar(45)
etc.
Each row in MATCH can have one or more SEGMENT rows associated with it, and the name in MATCH is unique on each row. Right now I do an inner join on the name fields to figure out which segments go with which match. I want to copy the tables to a new set of tables and set up a foreign key in SEGMENT that contains the unique ID from the MATCH row, both to improve performance and to fix some problems where the names aren't always precisely the same (and they should be).
Is there a way to do a single INSERT or UPDATE statement that will do the name comparisons and add the foreign key to each row in the SEGMENT table, at least for the rows where the names are precisely the same? (For the ones that don't match, I may have to write a SQL function to "clean" the name by removing extra blanks and special characters before comparing.)
Thanks for any help anyone can give me!
Here's one way I would consider doing it: add the FK column, add the constraint definition, then populate the column with an UPDATE statement using a correlated subquery:
ALTER TABLE `SEGMENT` ADD COLUMN match_id INT(11) COMMENT 'FK ref MATCH.id' ;
ALTER TABLE `SEGMENT` ADD CONSTRAINT fk_SEGMENT_MATCH
FOREIGN KEY (match_id) REFERENCES `MATCH`(id) ;
UPDATE `SEGMENT` s
SET s.match_id = (SELECT m.id
                  FROM `MATCH` m
                  WHERE m.name = s.name) ;
A correlated subquery (like in the example UPDATE statement above) usually isn't the most efficient approach to getting a column populated. But it seems a lot of people think it's easier to understand than the (usually) more efficient alternative, an UPDATE using a JOIN operation like this:
UPDATE `SEGMENT` s
JOIN `MATCH` m
  ON m.name = s.name
SET s.match_id = m.id ;
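After either version runs, a quick check finds the segments whose names didn't match anything, i.e. the rows that still need the name cleanup the question mentions (a sketch, assuming the columns from the question):
SELECT s.id, s.name
FROM `SEGMENT` s
WHERE s.match_id IS NULL;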
Add an ID field to your MATCH table and populate it.
Then add a column MATCHID (which will be your foreign key) to your SEGMENT table. Note you won't be able to declare this as a foreign key until you have mapped the records correctly.
Use the following query to update the foreign keys:
UPDATE `SEGMENT` A
INNER JOIN `MATCH` B
  ON A.NAME = B.NAME
SET A.MATCHID = B.ID ;

Why is this simple update running so slowly?

I'm running what I thought was a fairly straightforward update on a fairly large table, and I am trying to find out why it is running so slowly. It took about 5 hours to complete.
master table: approx 2m rows and 90 fields.
builder table: approx 1.5m rows and 15 fields.
I had initially attempted the update directly:
-- Update master table with newly calculated mcap
update master as m
inner join
(select b.date_base, b.gvkey, sum(b.sec_cap) as sum_sec_mkt
from builder as b
group by b.gvkey, b.date_base) as x
on x.gvkey = m.gvkey AND
x.date_base = m.date_base
set m.mcap = x.sum_sec_mkt;
Unfortunately this ran for a number of hours and I finally killed it after waiting 4hrs.
I then thought I'd create a temporary table and insert the results from the initial select into it.
CREATE TABLE `temp_mkt_cap` (
`date_base` date NOT NULL,
`gvkey` varchar(15) DEFAULT NULL,
`mkt_cap` double DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
-- insert market cap values in to temporary table
insert into temp_mkt_cap
select b.date_base, b.gvkey, sum(b.sec_cap) as sum_sec_mkt
from builder as b
group by b.gvkey, b.date_base;
ALTER TABLE temp_mkt_cap
add primary key (date_base, gvkey);
The insert worked fine with temp_mkt_cap having about 1.4m rows, but the final update took 5hrs to complete.
-- Update master table with newly calculated mcap
update master as m
inner join temp_mkt_cap as mc
on m.date_base = mc.date_base AND m.gvkey = mc.gvkey
set m.mcap = mc.mkt_cap;
'master' has a composite PRIMARY KEY on (gvkey_iid, date_base) and a KEY on gvkey.
I have completed more complicated inserts and updates on the table before and can't work out why this isn't working.
Any help would be greatly appreciated.
Thanks,
Update: The keys on the master table are:
ALTER TABLE master
ADD PRIMARY KEY (gvkey_iid,date_base),
ADD KEY date_offset (date_offset),
ADD KEY gvkey (gvkey),
ADD KEY iid (iid);
Update: I added a new key to the master table and the update ran in 93.6 seconds, down from 5 hours. Thanks for everyone's help.
ALTER TABLE master
ADD KEY `date-gvkey` (date_base, gvkey);
Since you are joining on m.date_base = mc.date_base AND m.gvkey = mc.gvkey, you need an index on these fields, in the same order you are joining them, on both tables.
If you are joining table1 with table2 on table1.field1 = table2.field1 AND table1.field2 = table2.field2, you need an index on (table1.field1, table1.field2) AND on (table2.field1, table2.field2).
NOT NULL fields are preferable.
Also, because you are updating from the mc.mkt_cap field, you need a single key on this field if it is not already the first field of a composite key you created earlier.
ALL other keys or indexes could possibly slow down your query.
Please inspect your database carefully...
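To confirm the join is actually using the new composite index, you can EXPLAIN the update (supported for UPDATE statements in MySQL 5.6 and later) and look for the index name in the key column of the output:
EXPLAIN
UPDATE master AS m
INNER JOIN temp_mkt_cap AS mc
  ON m.date_base = mc.date_base AND m.gvkey = mc.gvkey
SET m.mcap = mc.mkt_cap;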

Update a row in mysql and drop the row if it creates a duplicate as defined by the unique key

I've got a MySQL table that has a lot of entries. It has a unique key defined on (state, source), so there are no duplicates for that combination of columns. However, I am now realizing that much of the state data was not entered consistently. For example, in some rows it is entered as "CA" and in others it might be spelled out as "California."
I'd like to update all the entries that say "California" to be "CA" and if it creates a conflict in the unique key, drop the row. How can I do that?
You may be better off dumping your data and using an external tool like Google Refine to clean it up. Look at using foreign keys in the future to avoid these issues.
I don't think you can do this in one SQL statement. And if you have foreign key relationships from other tables to the one you are trying to clean-up then you definitely do not want to do this in one step (even if you could).
CREATE TABLE state_mappings (
`old` VARCHAR(64) NOT NULL,
`new` VARCHAR(64) NOT NULL
);
INSERT INTO state_mappings VALUES ('California', 'CA'), ...;
INSERT IGNORE INTO MyTable (state, source)
SELECT sm.`new`, s.source
FROM MyTable s
JOIN state_mappings sm ON s.state = sm.`old`;
-- Update tables with foreign keys here
DELETE FROM MyTable WHERE state IN (SELECT DISTINCT `old` FROM state_mappings);
DROP TABLE state_mappings;
I'm no SQL pro, so these statements can probably be optimized, but you get the gist.
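A different technique than the mapping table above, for what it's worth: MySQL's UPDATE IGNORE turns duplicate-key errors into warnings and simply skips the conflicting rows, which you can then delete. A sketch for a single mapping:
-- Rows whose new (state, source) would collide with an existing row are left untouched...
UPDATE IGNORE MyTable SET state = 'CA' WHERE state = 'California';

-- ...and whatever still says 'California' is, by definition, a duplicate
DELETE FROM MyTable WHERE state = 'California';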

How to set a database integrity check on foreign keys referenced fields

I have four Database Tables like these:
Book
ID_Book | ID_Company | Description
BookExtension
ID_BookExtension | ID_Book | ID_Discount
Discount
ID_Discount | Description | ID_Company
Company
ID_Company | Description
Any BookExtension record via foreign keys points indirectly to two different ID_Company fields:
BookExtension.ID_Book references a Book record that contains a Book.ID_Company
BookExtension.ID_Discount references a Discount record that contains a Discount.ID_Company
Is it possible to enforce in SQL Server that any new record in BookExtension must have Book.ID_Company = Discount.ID_Company?
In a nutshell I want that the following Query must return 0 record!
SELECT COUNT(*) FROM BookExtension
INNER JOIN Book ON BookExtension.ID_Book = Book.ID_Book
INNER JOIN Discount ON BookExtension.ID_Discount = Discount.ID_Discount
WHERE Book.ID_Company <> Discount.ID_Company
or, in plain English:
I don't want a BookExtension record to reference a Book record belonging to one Company and a Discount record belonging to a different Company!
Unless I've misunderstood your intent, the general form of the SQL statement you'd use is
ALTER TABLE FooExtension
ADD CONSTRAINT your-constraint-name
CHECK (ID_Foo = ID_Bar);
That assumes existing data already conforms to the new constraint. If existing data doesn't conform, you can either fix the data (assuming it needs fixing), or you can limit the scope (probably) of the new constraint by also checking the value of ID_FooExtension. (Assuming you can identify "new" rows by the value of ID_FooExtension.)
Later . . .
Thanks, I did indeed misunderstand your situation.
As far as I know, you can't enforce that constraint the way you want to in SQL Server, because it doesn't allow SELECT queries within a CHECK constraint. (I might be wrong about that in SQL Server 2008.) A common workaround is to wrap a SELECT query in a function, and call the function, but that's not reliable according to what I've learned.
You can do this, though, as sketched below:
1. Create a UNIQUE constraint on Book (ID_Book, ID_Company).
2. Create a UNIQUE constraint on Discount (ID_Discount, ID_Company).
3. Add two columns to BookExtension: Book_ID_Company and Discount_ID_Company.
4. Populate those new columns.
5. Change the foreign key constraints in BookExtension: you want BookExtension (ID_Book, Book_ID_Company) to reference Book (ID_Book, ID_Company), with a similar change for the foreign key referencing Discount.
Now you can add a check constraint to guarantee that BookExtension.Book_ID_Company is the same as BookExtension.Discount_ID_Company.
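A sketch of those steps as SQL Server DDL; all constraint names are hypothetical, and the population step is left as a comment since it depends on the existing data:
-- 1. and 2.: superkeys, so the composite foreign keys below have valid targets
ALTER TABLE Book     ADD CONSTRAINT UQ_Book_Company     UNIQUE (ID_Book, ID_Company);
ALTER TABLE Discount ADD CONSTRAINT UQ_Discount_Company UNIQUE (ID_Discount, ID_Company);

-- 3.: redundant company columns on BookExtension
ALTER TABLE BookExtension ADD Book_ID_Company INT, Discount_ID_Company INT;

-- 4.: populate Book_ID_Company / Discount_ID_Company from Book and Discount here

-- 5.: composite foreign keys
ALTER TABLE BookExtension ADD CONSTRAINT FK_BookExtension_Book
  FOREIGN KEY (ID_Book, Book_ID_Company) REFERENCES Book (ID_Book, ID_Company);
ALTER TABLE BookExtension ADD CONSTRAINT FK_BookExtension_Discount
  FOREIGN KEY (ID_Discount, Discount_ID_Company) REFERENCES Discount (ID_Discount, ID_Company);

-- The guarantee itself
ALTER TABLE BookExtension ADD CONSTRAINT CK_BookExtension_SameCompany
  CHECK (Book_ID_Company = Discount_ID_Company);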
I'm not sure how [in]efficient this would be, but you could also use an indexed view to achieve this. It needs a helper table with two rows, because CTEs and UNION are not allowed in indexed views; joining every violating row against both rows of the helper table means even a single violation produces two identical rows in the view, which the unique index below rejects.
CREATE TABLE dbo.TwoNums
(
Num int primary key
)
INSERT INTO TwoNums SELECT 1 UNION ALL SELECT 2
Then the view definition
CREATE VIEW dbo.ConstraintView
WITH SCHEMABINDING
AS
SELECT 1 AS Col FROM dbo.BookExtension
INNER JOIN dbo.Book ON dbo.BookExtension.ID_Book = Book.ID_Book
INNER JOIN dbo.Discount ON dbo.BookExtension.ID_Discount = Discount.ID_Discount
INNER JOIN dbo.TwoNums ON Num = Num
WHERE dbo.Book.ID_Company <> dbo.Discount.ID_Company
And a unique index on the View
CREATE UNIQUE CLUSTERED INDEX [uix] ON [dbo].[ConstraintView]([Col] ASC)