Update to table joined on composite key - mysql

I am trying to update rows in a data table that intersect rows in a smaller index table. The two tables are joined on the composite PK of the data table, and an EXPLAIN SELECT using the same criteria shows that the index is being used properly and the correct unique rows are fetched - but I'm still having issues with the update.
The update on the joined tables works fine when there's only 1 row in the temp table, but when I have more rows, I get MySQL Error 1175, and none of the WHERE conditions I specify are recognized.
I'm aware that I can just switch off safe mode with SET SQL_SAFE_UPDATES=0, but can anyone tell me what I'm not understanding here? Why is my WHERE condition not accepted, why does it even need a WHERE when I'm doing a NATURAL JOIN - and why does this work with only one row in the right-hand-side table (MyTempTable)?
The Code
Below is a vastly simplified, but structurally identical, set of CREATE TABLE and UPDATE statements representing my problem.
-- The Data Table.
Create Table MyDataTable
(
KeyPartOne int not null,
KeyPartTwo varchar(64) not null,
KeyPartThree int not null,
RelevantData varchar(200) null,
Primary key (KeyPartOne, KeyPartTwo, KeyPartThree)
) Engine=InnoDB;
-- The 'Temp' table.
Create Table MyTempTable
(
KeyPartOne int not null,
KeyPartTwo varchar(64) not null,
KeyPartThree int not null,
Primary key (KeyPartOne, KeyPartTwo, KeyPartThree)
)Engine=Memory;
-- The Update Query (works fine with only 1 row in Temp table)
update MyDataTable natural join MyTempTable
set RelevantData = 'Something Meaningful';
-- Specifying 'where' - produces the same effect as the other update query
update MyDataTable mdt join MyTempTable mtt
on mdt.KeyPartOne = mtt.KeyPartOne
and mdt.KeyPartTwo = mtt.KeyPartTwo
and mdt.KeyPartThree = mtt.KeyPartThree
set RelevantData = 'Something Meaningful'
where mdt.KeyPartOne = mtt.KeyPartOne
and mdt.KeyPartTwo = mtt.KeyPartTwo
and mdt.KeyPartThree = mtt.KeyPartThree;
P.S. Both of the above update statements work as expected when the temp table contains only one row, but give me the error when there's more than one row. I'm seriously curious about why!

In your first UPDATE query, you use NATURAL JOIN, which implicitly joins on every column the two tables have in common.
In your second UPDATE query, you use JOIN ... ON, which is an INNER JOIN with an explicit condition.
Since the two tables here share exactly the three key columns, both forms match the same rows, but the implicit column matching of a NATURAL JOIN makes the intent harder to see (and can change silently if the tables ever gain another common column name).
Not sure what you're trying to do, but if you are trying to update all rows in MyDataTable where a corresponding entry exists in MyTempTable, this query should do the trick:
UPDATE
  MyDataTable mdt
  INNER JOIN MyTempTable mtt ON
    mdt.KeyPartOne = mtt.KeyPartOne
    AND mdt.KeyPartTwo = mtt.KeyPartTwo
    AND mdt.KeyPartThree = mtt.KeyPartThree
SET
  mdt.RelevantData = 'Something Meaningful';
If that's not what you're trying to do, please clarify and I will update my answer.

Per the MySQL forums, the update queries are valid, and the fact that they don't work in Workbench with safe-update mode turned on does not indicate that there's anything wrong with the index. It's just a quirk of Workbench's "don't-shoot-yourself-in-the-foot" mode. :-)
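For completeness, a minimal sketch of running the multi-table update without leaving the check off permanently (SQL_SAFE_UPDATES is a session variable, so this only affects the current connection):
-- Relax safe-update mode for this session, run the update, then restore it
SET SQL_SAFE_UPDATES = 0;

UPDATE MyDataTable mdt
JOIN MyTempTable mtt
  ON  mdt.KeyPartOne   = mtt.KeyPartOne
  AND mdt.KeyPartTwo   = mtt.KeyPartTwo
  AND mdt.KeyPartThree = mtt.KeyPartThree
SET mdt.RelevantData = 'Something Meaningful';

SET SQL_SAFE_UPDATES = 1;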


INNER JOIN and GROUP BY to prevent duplicate results

Context:
I'm working on a simple ORM (for PHP) that automates most queries, based on a static configuration.
Thus, from the table and entity definitions, the library handles joins automatically and generates the appropriate field/table aliases. There is no problem with LEFT joins, but an INNER join may produce duplicated results in the case of a one-to-many relation.
My thought was to automatically add a GROUP BY clause (on the auto-increment key) if necessary.
The question
Is it correct to consider that I need to add a GROUP BY clause if (and only if) the join's ON and WHERE conditions don't match a unique key of the joined table?
Example
A very simple example, where I want to select all events with (at least) an associated Showing.
If there is another way to do it without an INNER JOIN, I'm interested to know how :)
CREATE TABLE `Event` (
`Id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`Name` VARCHAR(255) NOT NULL
);
INSERT INTO `Event` (`Name`) VALUES ('My cool event');
CREATE TABLE `Showing` (
`Id` INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
`EventId` INT UNSIGNED NOT NULL,
`Place` VARCHAR(50) NOT NULL,
FOREIGN KEY (`EventId`) REFERENCES `Event`(`Id`),
UNIQUE (`EventId`, `Place`)
);
INSERT INTO `Showing` (`EventId`, `Place`) VALUES (1, 'School');
INSERT INTO `Showing` (`EventId`, `Place`) VALUES (1, 'Park');
-- Correct queries
SELECT t.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId` WHERE t1.`Place` = 'School';
SELECT t.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId` AND t1.`Place` = 'School';
-- Query leading to duplicate values
SELECT t.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId`;
-- Group by query to prevent duplicate values
SELECT t.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId` GROUP BY t.`Id`;
Thanks !
(this should be a comment but it's a bit long)
No problem for LEFT joins but INNER may result in duplicated results in case of relation One-to-Many
It's clear from that sentence that at least one of us is very confused about how a relational database works, and how object-relational mapping should work.
Query leading to duplicate values
The rows produced are not duplicates - you've written the query so that it doesn't show you why they are different:
SELECT t1.`Place`, t.*
FROM `Event` t
INNER JOIN `Showing` t1
ON t.Id = t1.`EventId`;
If you're not interested in the data from Showing, then why is it in your query? If you have events without related Showing records, then you should be using EXISTS - not a join (consider the case where you have a single event but 3 million showings):
SELECT t.*
FROM `Event` t
WHERE EXISTS (SELECT 1
FROM Showing
WHERE t.Id = Showing.EventId);
If you are strictly implementing ORM, then you probably shouldn't be writing queries with joins at all - but IMHO, the scenario is better served by using factories.
The data is saying that "My Cool Event" is happening at the park, and at the school. If you inner join the tables you will get more than one result.
Do this query to see what is going on:
Select t.*, t1.* FROM `Event` t INNER JOIN `Showing` t1 ON t.Id=t1.`EventId`;
That is the same query as your duplicate query, but selecting columns from both tables.
The first line of results says the event is happening at the park. The second line says that the same event is happening at the school.
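If you only need the Event columns and want each event listed once, SELECT DISTINCT is another way to collapse the duplicates that this join produces; for this schema it is equivalent to the GROUP BY in the question:
SELECT DISTINCT t.*
FROM `Event` t
INNER JOIN `Showing` t1 ON t.Id = t1.`EventId`;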

Creating foreign key by matching strings between tables

I'm a beginner to SQL so this is probably a pretty newbie question, but I can't seem to get my head straight on it. I have a pair of tables called MATCH and SEGMENT.
MATCH.id int(11) ai pk
MATCH.name varchar(45)
etc.
SEGMENT.id int(11) ai pk
SEGMENT.name varchar(45)
etc.
Each row in MATCH can have one or more SEGMENT rows associated with it. The name in MATCH is unique on each row. Right now I do an inner join on the name fields to figure out which segments go with which match. I want to copy the tables to a new set of tables and set up a foreign key in SEGMENT that contains the unique ID from the MATCH row both to improve performance and to fix some problems where the names aren't always precisely the same (and they should be).
Is there a way to do a single INSERT or UPDATE statement that will do the name comparisons and add the foreign key to each row in the SEGMENT table - at least for the rows where the names are precisely the same? (For the ones that don't match, I may have to write a SQL function to "clean" the name by removing extra blanks and special characters before comparing)
Thanks for any help anyone can give me!
Here's one way I would consider doing it: add the FK column, add the constraint definition, then populate the column with an UPDATE statement using a correlated subquery:
ALTER TABLE `SEGMENT` ADD COLUMN match_id INT(11) COMMENT 'FK ref MATCH.id' ;
ALTER TABLE `SEGMENT` ADD CONSTRAINT fk_SEGMENT_MATCH
FOREIGN KEY (match_id) REFERENCES `MATCH`(id) ;
UPDATE `SEGMENT` s
SET s.match_id = (SELECT m.id
FROM `MATCH` m
WHERE m.name = s.name) ;
A correlated subquery (like in the example UPDATE statement above) usually isn't the most efficient approach to getting a column populated. But it seems a lot of people think it's easier to understand than the (usually) more efficient alternative, an UPDATE using a JOIN operation like this:
UPDATE `SEGMENT` s
JOIN `MATCH` m
ON m.name = s.name
SET s.match_id = m.id
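For rows whose names differ only by stray leading or trailing blanks (one of the cases mentioned in the question), a hedged variant of the same JOIN update could compare trimmed names; anything beyond that, such as special characters, would still need its own cleaning function:
-- Only touch rows that are still unmatched, comparing trimmed names
UPDATE `SEGMENT` s
JOIN `MATCH` m
  ON TRIM(m.name) = TRIM(s.name)
SET s.match_id = m.id
WHERE s.match_id IS NULL;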
Add an ID field to your MATCH table and populate it.
Then add a column MATCHID (which will be your foreign key) to your SEGMENT table. Note that you won't be able to declare it as a foreign key until you have mapped the records correctly.
Use the following query to update the foreign keys:
UPDATE SEGMENT A
INNER JOIN `MATCH` B
ON A.NAME = B.NAME
SET A.MATCHID = B.ID;
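Once every SEGMENT row has its MATCHID populated, the foreign key itself can be added; a minimal sketch, with a made-up constraint name:
ALTER TABLE SEGMENT
ADD CONSTRAINT fk_segment_match
FOREIGN KEY (MATCHID) REFERENCES `MATCH` (ID);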

Why is this simple update running so slowly?

I'm running what I thought was a fairly straightforward update on a fairly large table. I am trying to find out why this simple update is running so slowly. It took about 5 hours to complete.
master table: approx 2m rows and 90 fields.
builder table: approx 1.5m rows and 15 fields.
I had initially attempted the update directly:
-- Update master table with newly calculated mcap
update master as m
inner join
(select b.date_base, b.gvkey, sum(b.sec_cap) as sum_sec_mkt
from builder as b
group by b.gvkey, b.date_base) as x
on x.gvkey = m.gvkey AND
x.date_base = m.date_base
set m.mcap = x.sum_sec_mkt;
Unfortunately this ran for a number of hours and I finally killed it after waiting 4hrs.
I then thought I'd create a temporary table and insert the results from the initial select into it.
CREATE TABLE `temp_mkt_cap` (
`date_base` date NOT NULL,
`gvkey` varchar(15) DEFAULT NULL,
`mkt_cap` double DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
-- insert market cap values in to temporary table
insert into temp_mkt_cap
select b.date_base, b.gvkey, sum(b.sec_cap) as sum_sec_mkt
from builder as b
group by b.gvkey, b.date_base;
ALTER TABLE temp_mkt_cap
add primary key (date_base, gvkey);
The insert worked fine with temp_mkt_cap having about 1.4m rows, but the final update took 5hrs to complete.
-- Update master table with newly calculated mcap
update master as m
inner join temp_mkt_cap as mc
on m.date_base = mc.date_base AND m.gvkey = mc.gvkey
set m.mcap = mc.mkt_cap;
'master' has a composite PRIMARY KEY on (gvkey_iid, date_base) and gvkey as a KEY.
I have completed more complicated inserts and updates on the table before and can't work out why this isn't working.
Any help would be greatly appreciated.
Thanks,
Update: The keys on the master table are:
ALTER TABLE master
ADD PRIMARY KEY (gvkey_iid,date_base),
ADD KEY date_offset (date_offset),
ADD KEY gvkey (gvkey),
ADD KEY iid (iid);
Update: I added a new key to the master table and the update ran in 93.6 secs, down from 5 hours. Thanks for everyone's help.
ALTER TABLE master
ADD KEY `date-gvkey` (date_base, gvkey);
Since you are joining on m.date_base = mc.date_base AND m.gvkey = mc.gvkey, you need an index on these fields, in the same order you are joining them, on both tables.
If you are joining table1 with table2 on table1.field1 = table2.field1 AND table1.field2 = table2.field2, you need an index on (table1.field1, table1.field2) AND (table2.field1, table2.field2).
Not null fields are preferable.
Also, because you are updating from the mc.mkt_cap field, you need a SINGLE key on this field if it is NOT already the first field of a composite key you created earlier.
ALL other keys or indexes are going to possibly slow down your query.
Please inspect your database carefully...
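As a generic sketch of that indexing rule (table1/table2, the column names and the index names are placeholders, not the poster's actual schema):
-- Composite indexes covering the join columns, in the same order, on both sides
ALTER TABLE table1 ADD INDEX idx_table1_join (field1, field2);
ALTER TABLE table2 ADD INDEX idx_table2_join (field1, field2);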

Delayed insert due to foreign key constraints

I am trying to run a query:
INSERT
INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, 'Activated'
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL
ON DUPLICATE KEY UPDATE
`ChangedOn` = VALUES(`ChangedOn`)
(I am not quite sure the query is correct, but it appears to be working); however, I am running into the following issue. I am running this query before creating the entry in the Product table, and I am getting a foreign key constraint problem due to the fact that the entry is not in the Product table yet.
My question is: is there a way to run this query, but wait until the next query (which updates the Product table) before performing the insert portion of the query above? Also note that if the query is run after the Product entry is created, it will no longer see p.Id as being NULL and will therefore fail, so it has to be performed before the Product entry is created.
---> Edit <---
The concept I am trying to achieve is as follows:
For starters, I am importing a set of data into a temp table. The Product table is a list of all products that are (or have been in the past) added through the data set from the temp table. What I need is a separate table that records a state change for a product, as sometimes a product will become unavailable (no longer in the data set provided by the vendor).
The ProductState table is as follows:
CREATE TABLE IF NOT EXISTS `ProductState` (
`ProductId` VARCHAR(32) NOT NULL ,
`ChangedOn` DATE NOT NULL ,
`State` ENUM('Activated','Deactivated') NULL ,
PRIMARY KEY (`ProductId`, `ChangedOn`) ,
INDEX `fk_ProductState_Product` (`ProductId` ASC) ,
CONSTRAINT `fk_ProductState_Product`
FOREIGN KEY (`ProductId` )
REFERENCES `Product` (`Id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
The foreign key is an identifying relationship with the Product table (Product.Id)
Essentially what I am trying to accomplish is this:
1. Anytime a new product (or previously deactivated product) shows up in the vendor data set, the record is created in the ProductState table as 'Activated'.
2. Anytime a product (that is activated) does not show up in the vendor data set, the record is created as 'Deactivated' in the ProductState table.
The purpose of the ProductState table is to track activation and deactivation states of a product. Also, ProductState has a many-to-one relationship with the Product table, and the state of a product will only change once daily, therefore my PRIMARY KEY is (ProductId, ChangedOn).
With foreign keys, you definitely need to have the data in the Product table first, before entering the state. Think about it with this logic: how can something that doesn't exist have a state?
So pseudocode of what you should do:
1. Read in the vendor's product list.
2. Compare it to the existing list in your Product table.
3. If new ones are found: 3.1 insert them into the Product table; 3.2 insert them into the ProductState table.
4. If any are missing from the vendor's list: 4.1 insert them into the ProductState table.
All of these should be done in one transaction. Note that you should NOT delete things from the Product table unless you really want to delete all the information associated with them, i.e. also delete all the "states" that you have stored.
Rather than trying to do this all in one query, your best bet is to create a stored procedure that does the work step by step as above. I think it gets overly complicated (or in this case, probably impossible) to do it all in one query.
Edit: Something like this:
DELIMITER //
CREATE PROCEDURE `some_procedure_name` ()
BEGIN
-- Break the tmpImport table down into 2 temporary tables: new and removed
CREATE TEMPORARY TABLE _temp_new_products AS
SELECT t.*
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL;

CREATE TEMPORARY TABLE _temp_removed_products AS
SELECT p.*
FROM `Product` p
LEFT JOIN `tmpImport` t
ON t.`ProductId` = p.`Id`
WHERE t.`ProductId` IS NULL;

-- For each entry in _temp_new_products:
-- 1. Insert into the Product table
-- 2. Insert into the ProductState table as 'Activated'

-- For each entry in _temp_removed_products:
-- 1. Insert into the ProductState table as 'Deactivated'

-- Drop the temporary tables
DROP TEMPORARY TABLE _temp_new_products;
DROP TEMPORARY TABLE _temp_removed_products;
END //
DELIMITER ;
I think you should:
start a transaction
do your insert into the Products table
do your insert into the ProductState table
commit the transaction
This will avoid any foreign key errors, but will also make sure your data is always accurate. You do not want to 'avoid' the foreign key constraint in any way, and InnoDB (which I'm sure you are using) never defers these constraints unless you turn them off completely.
Also, no, you cannot insert into multiple tables in one INSERT ... SELECT statement.
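A minimal sketch of that ordering, assuming the new products are captured before touching Product so the LEFT JOIN test still works (the temporary table name is illustrative, and Product's other columns would need to be supplied as well):
START TRANSACTION;

-- Capture the new products first, while the LEFT JOIN test is still valid
CREATE TEMPORARY TABLE _new_products AS
SELECT t.`ProductId`, t.`ProcessedOn`
FROM `tmpImport` t
LEFT JOIN `Product` p ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL;

-- Parent rows first... (add Product's other NOT NULL columns here)
INSERT INTO `Product` (`Id`)
SELECT `ProductId` FROM _new_products;

-- ...then the child rows, so the foreign key is satisfied
INSERT INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT `ProductId`, `ProcessedOn`, 'Activated'
FROM _new_products
ON DUPLICATE KEY UPDATE `ChangedOn` = VALUES(`ChangedOn`);

DROP TEMPORARY TABLE _new_products;

COMMIT;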

mysql left join, limit and sorting

I have a question. I need to do a left join between two tables and get only the first result (I mean the first record in table A that doesn't match anything in table B).
This is an example
create table a (
id int not null auto_increment primary key,
name varchar(50),
surname varchar(50),
prov char(2)
) engine = myisam;
insert into a (name,surname,prov)
values ('aaa','aaa','ss'),('bbb','bbb','ca'),('ccc','ccc','mi'),('ddd','ddd','mi'),('eee','eee','to'),
('fff','fff','mi'),('ggg','ggg','ss'),('hhh','hhh','mi'),('jjj','jjj','ss'),('kkk','kkk','to');
create table b (
id int not null auto_increment primary key,
id_name int
) engine = myisam;
insert into b (id_name) values (3),(4),(8),(5),(10),(1);
Query A:
select a.*
from a
left join b
on a.id = b.id_name
where b.id_name is null and a.prov = 'ss'
order by a.id
limit 1
Query B:
select a.*
from a
left join b
on a.id = b.id_name
where b.id_name is null and a.prov = 'ss'
limit 1
Both queries give me the right result, that is, the record with id = 7.
I want to know if I can rely on query B even without specifying a sort on id, or if it's just chance that I get the right result.
I ask because on a large recordset (more than 10 million rows) the query without sorting gives me a record immediately, while with the sort applied it can take more than 20 seconds even though a.id is the primary key.
Thanks in advance.
You can't rely on query B. MySQL just returned whatever it found fastest to return.
Is there an index on table b on column id_name? If not, create it and tell us what you get (I mean how fast). It doesn't matter that you are looking for unmatched rows; the JOIN has to be performed before MySQL can test whether there is a match or not.
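For illustration, creating that index might look like this (the index name is just a placeholder):
ALTER TABLE b ADD INDEX idx_b_id_name (id_name);
Even with the index in place, only query A with its ORDER BY guarantees which unmatched row comes back first.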