I'm running what I thought was a fairly straight forward update on a fairly large table. I am trying to find out why this simple update is running so slowly. It took about 5 hours to complete.
master table: approx 2m row and 90 fields.
builder table: approx 1.5m rows and 15 fields
I had initially attempted the insert directly:
-- Update master table with newly calculated mcap
update master as m
inner join
(select b.date_base, b.gvkey, sum(b.sec_cap) as sum_sec_mkt
from builder as b
group by b.gvkey, b.date_base) as x
on x.gvkey = m.gvkey AND
x.date_base = m.date_base
set m.mcap = x.sum_sec_mkt;
Unfortunately this ran for a number of hours and I finally killed it after waiting 4hrs.
I then thought I'd create a temporary table and insert the results from the initial select into it.
CREATE TABLE `temp_mkt_cap` (
`date_base` date NOT NULL,
`gvkey` varchar(15) DEFAULT NULL,
`mkt_cap` double DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
-- insert market cap values in to temporary table
insert into temp_mkt_cap
select b.date_base, b.gvkey, sum(b.sec_cap) as sum_sec_mkt
from builder as b
group by b.gvkey, b.date_base;
ALTER TABLE temp_mkt_cap
add primary key (date_base, gvkey);
The insert worked fine with temp_mkt_cap having about 1.4m rows, but the final update took 5hrs to complete.
-- Update master table with newly calculated mcap
update master as m
inner join temp_mkt_cap as mc
on m.date_base = mc.date_base AND m.gvkey = mc.gvkey
set m.mcap = mc.mkt_cap;
'master' has 'date_base' and gvkey_iid as PRIMARY KEYS and gvkey as a KEY.
I have completed more complicated inserts and updates on the table before and can't work out why this isn't working.
Any help would be greatly appreciated.
Thanks,
Update: The keys on the master table are:
ALTER TABLE master
ADD PRIMARY KEY (gvkey_iid,date_base),
ADD KEY date_offset (date_offset),
ADD KEY gvkey (gvkey),
ADD KEY iid (iid);
Update I added a new key to the master table and the update ran in 93.6secs, down from 5 hours. Thanks for everyone's help.
ALTER TABLE master
ADD KEY 'date-gvkey' (date_base, gvkey);
Since you are joining on mc.date_base AND m.gvkey = mc.gvkey, you need an index on these fields in the same order you are joining them, on both tables.
If you are joining table1 with table2 on table1.field1 = table2.field1 AND table1.field2 = table2.field2, you need an index on (table1.field1, table1.field2) AND (table2.field1, table2.field2).
Not null fields are preferable.
Also, because you are updating from the mc.mkt_cap field, you need a SINGLE key on this field if it is NOT already the first field of a composite key you created earlier.
ALL other keys or indexes are going to possibly slow down your query.
Please inspect carefully your database...
Related
table 1 is called (athlete) and table2 is called (training_session.id) the primary key to table 1 is ID, and the table 2 has the primary key Athelete_id
I want to delete a person from my database by using his name, which I've called "Pet". However, he is also connected to another table which stores his training session. So (ID 1) on table 1 is connected to table 2 (athlete id1)
I struggle a lot, I try using INNER JOIN.
DELETE athlete,training_session FROM athlete
INNER JOIN
training_session ON training_session.id = athlete.name
WHERE
athlete.name = "Pet;
I have something wrong with my syntax, is it correct to use Inner Join or have I misunderstood
You should have set up foreign key constraints with Cascade deletions to simplify the logic and all you would have needed than was to delete from athlete. So I would suggest you add it.
For more info you can take a look at:
http://www.mysqltutorial.org/mysql-on-delete-cascade/
I have a table named product. In this table, I want to update some product id values, however I cannot do it directly, my 'where' condition needs to determine records based on value from another table, so I do:
update product set prod_active = 0 where prod_id in (
select prod_id from user_prod_sel where seltype = 100
)
The problem I have here is this is very slow. How can I convert this into a oin based query that can give me faster results?
Any help is greatly appreciated.
You should be able to do this:
UPDATE product a
JOIN user_prod_sel b ON a.prod_id = b.prod_id AND b.selType = 100
SET a.prod_active = 0
If you want to speed up the query you need to add indexes for product.prod_id and user_prod_sel.seltype. Table indexes exist exactly for speeding up selection of data.
ALTER TABLE `product` ADD INDEX `prod_id` (`prod_id`)
Do not add index product.prod_id if it already has index (if it's a primary key, for example).
ALTER TABLE `user_prod_sel` ADD INDEX `seltype` (`seltype`)
And then:
UPDATE `product`, `user_prod_sel`
SET `product`.`prod_active` = 0
WHERE
`product`.`prod_id` = `user_prod_sel`.`prod_id` AND
`user_prod_sel`.`seltype` = 100
Of course, creating indexes is a task which is performed once, not before every UPDATE query.
I have a address table which is referenced from 6 other tables (sometimes multiple tables). Some of those tables have around half a million records (and the address table around 750000 records). I want to have a periodical query running which deletes all records that are not referenced from any of the tables.
The following sub-queries is not a option, because the query never finishes - the scope is too big.
delete from address where address_id not in (select ...)
and not in (select ...) and not in (select ...) ...
What I was hoping was that I could use the foreign key constraint and I could simply delete all records for which the foreign key constraint does not stop me (because there is no reference to the table). I could not find a way to do this (or is there?). Anybody another good idea to tackle this problem?
You can try this ways
DELETE
address
FROM
address
LEFT JOIN other_table ON (address.id = other_table.ref_field)
LEFT JOIN other_table ON (address.id = other_table2.ref_field)
WHERE
other_table.id IS NULL AND other_table2.id IS NULL
OR
DELETE
FROM address A
WHERE NOT EXISTS (
SELECT 1
FROM other_table B
WHERE B.a_key = A.id
)
I always use this:
DELETE FROM table WHERE id NOT IN (SELECT id FROM OTHER table)
I'd do this by first creating a TEMPORARY TABLE (t) that is a UNION of the IDs in the 6 referencing tables, then run:
DELETE x FROM x LEFT JOIN t USING (ID) WHERE x.ID IS NULL;
Where x is the address table.
See 'Multiple-table syntax' here:
http://dev.mysql.com/doc/refman/5.0/en/delete.html
Obviously, your temporary table should have its PRIMARY KEY on ID. It may take some time to query and join, but I can't see a way round it. It should be optimized, unlike the multiple sub-query version.
I am trying to update rows in a data table that intersect rows in a smaller index table. The two tables are joined on the composite PK of the data table, and explain-select using the same criteria shows that the index is being used properly, and the correct unique rows are fetched - but I'm still having issues with the update.
The update on the joined tables works fine when there's only 1 row in the temp table, but when I have more rows, I get MySql Error 1175, and none of the WHERE conditions I specify are recognized.
I'm aware that I can just switch off safe mode with SET SQL_SAFE_UPDATES=0, but can anyone tell me what I'm not understanding here? Why is my WHERE condition not accepted, and why does it even need a where when I'm doing a NATURAL JOIN - and why does this work with only one row in the right-hand-side table (MyTempTable)?
The Code
Below is vastly simplified, but structurally identical create table & updates representing my problem.
-- The Data Table.
Create Table MyDataTable
(
KeyPartOne int not null,
KeyPartTwo varchar(64) not null,
KeyPartThree int not null,
RelevantData varchar(200) null,
Primary key (KeyPartOne, KeyPartTwo, KeyPartThree)
) Engine=InnoDB;
-- The 'Temp' table.
Create Table MyTempTable
(
KeyPartOne int not null,
KeyPartTwo varchar(64) not null,
KeyPartThree int not null,
Primary key (KeyPartOne, KeyPartTwo, KeyPartThree)
)Engine=Memory;
-- The Update Query (works fine with only 1 row in Temp table)
update MyDataTable natural join MyTempTable
set RelevantData = 'Something Meaningful';
-- Specifying 'where' - roduces same effect as the other update query
update MyDataTable mdt join MyTempTable mtt
on mdt.KeyPartOne = mtt.KeyPartOne
and mdt.KeyPartTwo = mtt.KeyPartTwo
and mdt.KeyPartThree = mtt.KeyPartThree
set RelevantData = 'Something Meaningful'
where mdt.KeyPartOne = mtt.KeyPartOne
and mdt.KeyPartTwo = mtt.KeyPartTwo
and mdt.KeyPartThree = mtt.KeyPartThree;
P.S. Both of the above update statements work as expected when the temp table contains only one row, but give me the error when there's more than one row. I'm seriously curious about why!
In your first UPDATE query, you use NATURAL JOIN, which is the same as NATURAL LEFT JOIN.
In your second UPDATE query, you use JOIN, which is the same as INNER JOIN.
A LEFT JOIN is not the same as an INNER JOIN, and a NATURAL JOIN is not the same as a JOIN.
Not sure what you're trying to do, but if you are trying to update all rows in MyDataTable where a corresponding entry exists in MyTempTable, this query should do the trick:
UPDATE
myDataTable mdt
INNER JOIN MyTempTable mtt ON
mdt.KeyPartOne = mtt.KeyPartOne
AND mdt.KeyPartTwo = mtt.KeyPartTwo
AND mdt.KeyPartThree = mtt.KeyPartThree
SET
mdt.RelevantData = 'Something Meaningful'
If that's not what you're trying to do, please clarify and I will update my answer.
Per the MySql forum, the update queries are valid, and the fact that they don't work in Workbench with safe-update mode turned on does not indicate that there's anything wrong with the index. It's just a quirk of Workbench's "don't-shoot-yourself-in-the-foot" mode. :-)
I am trying to run a query:
INSERT
INTO `ProductState` (`ProductId`, `ChangedOn`, `State`)
SELECT t.`ProductId`, t.`ProcessedOn`, \'Activated\'
FROM `tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL
ON DUPLICATE KEY UPDATE
`ChangedOn` = VALUES(`ChangedOn`)
(I am not quite sure the query is correct, but it appears to be working), however I am running into the following issue. I am running this query before creating the entry into the 'Products' table and am getting a foreign key constraint problem due to the fact that the entry is not in the Products table yet.
My question is, is there a way to run this query, but wait until the next query (which updates the Product table) before performing the insert portion of the query above? Also to note, if the query is run after the Product entry is created it will no longer see the p.Id as being null and therefore failing so it has to be performed before the Product entry is created.
---> Edit <---
The concept I am trying to achieve is as follows:
For starters I am importing a set of data into a temp table, the Product table is a list of all products that are (or have been in the past) added through the set of data from the temp table. What I need is a separate table that provides a state change to the product as sometimes the product will become unavailable (no longer in the data set provided by the vendor).
The ProductState table is as follows:
CREATE TABLE IF NOT EXISTS `ProductState` (
`ProductId` VARCHAR(32) NOT NULL ,
`ChangedOn` DATE NOT NULL ,
`State` ENUM('Activated','Deactivated') NULL ,
PRIMARY KEY (`ProductId`, `ChangedOn`) ,
INDEX `fk_ProductState_Product` (`ProductId` ASC) ,
CONSTRAINT `fk_ProductState_Product`
FOREIGN KEY (`ProductId` )
REFERENCES `Product` (`Id` )
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB
DEFAULT CHARACTER SET = utf8
COLLATE = utf8_general_ci;
The foreign key is an identifying relationship with the Product table (Product.Id)
Essentially what I am trying to accomplish is this:
1. Anytime a new product (or previously deactivated product) shows up in the vendor data set, the record is created in the ProductState table as 'Activated'.
2. Anytime a product (that is activated), does not show up in the vendor data set, the record is created as 'Deactivated' in the ProductState table.
The purpose of the ProductState table is to track activation and deactivation states of a product. Also the ProductState is a Multi-To-One relationship with the Product Table, and the state of the product will only change once daily, therefore my PKEY would be ProductId and ChangedDate.
With foreign keys, you definitely need to have the data on the Product table first, before entering the state, think about it with this logic: "How can something that dont exist have a state" ?
So pseudocode of what you should do:
Read in the vendor's product list
Compare them to the existing list in your Product table
If new ones found: 3.1 Insert it to Product table, 3.2 Insert it to ProductState table
If missing from vendor's list: 4.1 Insert it to ProductState table
All these should be done in 1 transaction. Note that you should NOT delete things from Product table, unless you really want to delete every information associated with it, ie. also delete all the "states" that you have stored.
Rather than trying to do this all in 1 query - best bet is to create a stored procedure that does the work as step-by-step above. I think it gets overly complicated (or in this case, probably impossible) to do all in 1 query.
Edit: Something like this:
CREATE PROCEDURE `some_procedure_name` ()
BEGIN
-- Breakdown the tmpImport table to 2 tables: new and removed
SELECT * INTO _temp_new_products
FROM`tmpImport` t
LEFT JOIN `Product` p
ON t.`ProductId` = p.`Id`
WHERE p.`Id` IS NULL
SELECT * INTO _temp_removed_products
FROM `Product` p
LEFT JOIN `tmpImport` t
ON t.`ProductId` = p.`Id`
WHERE t.`ProductId` IS NULL
-- For each entry in _temp_new_products:
-- 1. Insert into Product table
-- 2. Insert into ProductState table 'activated'
-- For each entry in _temp_removed_products:
-- 1. Insert into ProductState table 'deactivated'
-- drop the temporary tables
DROP TABLE _temp_new_products
DROP TABLE _temp_removed_products
END
I think you should:
start a transaction
do your insert into the Products table
do your insert into the ProductState table
commit the transaction
This will avoid any foreign key errors, but will also make sure your data is always accurate. You do not want to 'avoid' the foreign key constraint in any way, and InnoDB (which I'm sure you are using) never defers these constraints unless you turn them off completely.
Also no you cannot insert into multiple tables in one INSERT ... SELECT statement.