I'm trying to use SQL to delete multiple rows from multiple tables that are
joined together.
Table A is joined to Table B
Table B is joined to Table C
I want to delete all rows in table B & C that correspond to a row in Table A
CREATE TABLE `boards` (
`boardid` int(2) NOT NULL AUTO_INCREMENT,
`boardname` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`boardid`)
);
-- --------------------------------------------------------
--
-- Table structure for table `messages`
--
CREATE TABLE `messages` (
`messageid` int(6) NOT NULL AUTO_INCREMENT,
`boardid` int(2) NOT NULL DEFAULT '0',
`topicid` int(4) NOT NULL DEFAULT '0',
`message` text NOT NULL,
`author` varchar(255) NOT NULL DEFAULT '',
`date` datetime DEFAULT NULL,
PRIMARY KEY (`messageid`)
);
-- --------------------------------------------------------
--
-- Table structure for table `topics`
--
CREATE TABLE `topics` (
`topicid` int(4) NOT NULL AUTO_INCREMENT,
`boardid` int(2) NOT NULL DEFAULT '0',
`topicname` varchar(255) NOT NULL DEFAULT '',
`author` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`topicid`)
);
Well, if you had used InnoDB tables, you could set up a cascading delete with foreign keys that would do it all automatically. But if you have some reason for using MyISAM, You just use a multiple-table DELETE:
DELETE FROM boards, topics, messages
USING boards INNER JOIN topics INNER JOIN messages
WHERE boards.boardid = $boardid
AND topics.boardid = boards.boardid
AND messages.boardid = boards.boardid;
this can be done by your db-system if you are using foreign keys with "on delete cascade".
Take a look here:
http://dev.mysql.com/doc/refman/5.1/en/innodb-foreign-key-constraints.html
You could either just check for presence
delete from topics where boardid in (select boardid from boards)
delete from messages where boardid in (select boardid from boards)
but this would only make sense if this behaviour should not always apply. When the behaviour should always apply, implement foreign keys with delete on cascade
explained on a zillion sites, in your helpfiles and here
Deleting rows from multiple tables can be done in two ways :
Delete rows from one table, determining which rows to delete by referring to another
table
Delete rows from multiple tables with a single statement
Multiple-table DELETE statements can be written in two formats. The following example demonstrates one syntax, for a query that deletes rows from a table t1 where the id values match those in a table t2:
DELETE t1 FROM t1, t2 WHERE t1.id = t2.id;
The second syntax is slightly different:
DELETE FROM t1 USING t1, t2 WHERE t1.id = t2.id;
To delete the matching records from both tables, the statements are:
DELETE t1, t2 FROM t1, t2 WHERE t1.id = t2.id;
DELETE FROM t1, t2 USING t1, t2 WHERE t1.id = t2.id;
The ORDER BY and LIMIT clauses normally supported by UPDATE and DELETE aren’t allowed when these statements are used for multiple-table operations.
Related
I have a table that looks like this:
id int primary key
uniqueID string --not uniquely indexed
foreignKeyID int --foreignKey to another table
I want to find all the uniqueIds in this table that exist for foreign key 1 that do not exist for foreign key 2
I thought I could do something like this:
SELECT * FROM table t1
LEFT JOIN table t2
ON t1.uniqueID = t2.uniqueID
WHERE
t1.foreignKeyID = 1
AND t2.uniqueID IS NULL
However this is never giving me results. I can make it work with a NOT IN subquery but this is a very large table so I suspect a solution using joins will be faster.
Looking for the best way to structure this query.
Here's an sample data set and SQL Fiddle with an example of the working NOT IN query I am trying to convert to a LEFT JOIN:
CREATE TABLE `table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`uniqueID` varchar(255),
`foreignKeyID` int(5) unsigned NOT NULL DEFAULT 0,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO `table` (uniqueID, foreignKeyID) VALUES ('aaa', 1), ('bbb', 1);
http://sqlfiddle.com/#!9/48a3f3/4 and a non-working LEFT JOIN I thought would be equivalent.
Thanks!
Try this, seems to be working if understood the question properly:
SELECT *
FROM `table` t
LEFT JOIN `table` tt ON tt.uniqueID = t.uniqueID AND tt.foreignKeyID <> 1
WHERE t.foreignKeyID = 1 AND tt.id IS NULL;
I have two tables that are identical in fields. I want to query names in table2 that are not in table1. Both tables have name field as unique (primary key).
Here are the info. of my database design:
My query is:
SELECT `table2`.`name` FROM `mydatabase`.`table2`, `mydatabase`.`table1`
WHERE `table2`.`name` NOT IN (SELECT `table1`.`name` FROM `mydatabase`.`table1`)
AND table2`.`name` NOT LIKE 'xyz%';
The output of SHOW CREATE TABLE <table name>:
For table1:
table1, CREATE TABLE `table1` (
`name` varchar(500) NOT NULL,
`ip` varchar(500) DEFAULT NULL,
`type` varchar(500) DEFAULT NULL,
`grade` varchar(500) DEFAULT NULL,
`extended_ip` text,
PRIMARY KEY (`name`),
UNIQUE KEY `mydatabase_name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And table2:
tabl2, CREATE TABLE `table2` (
`name` varchar(500) NOT NULL,
`ip` varchar(500) DEFAULT NULL,
`type` varchar(500) DEFAULT NULL,
`grade` varchar(500) DEFAULT NULL,
`extended_ip` text,
PRIMARY KEY (`name`),
UNIQUE KEY `mydatabase_name_UNIQUE` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
The output of EXPLAIN <my query>:
# id, select_type, table, partitions, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, PRIMARY, table1, , index, , mydatabase_name_UNIQUE, 502, , 17584, 100.00, Using index
1, PRIMARY, table2, , index, , mydatabase_name_UNIQUE, 502, , 46264, 100.00, Using where; Using index; Using join buffer (Block Nested Loop)
2, SUBQUERY, table1 , index, PRIMARY,mydatabase_name_UNIQUE, mydatabase_name_UNIQUE, 502, , 17584, 100.00, Using index
EDIT:
And I forgot to mention what happens is that the databse just crashes with my query. i am using mysql-workbench in Ubuntu 18. When I perform this query the whole workbench closes and I have to restart opening it again.
Just do a LEFT JOIN on name, with table2 as your starting table, since you want to consider all the names from table2 which do not exist in table1. Names which don't exist in table1 will have a null value post the join. Note that this join based solution will be significantly faster than any subquery based approach.
Also, you should avoid comma (,) based implicit joins. It is old syntax, and you should use explicit JOIN based syntax. Read: Explicit vs implicit SQL joins
Also, it is a good habit to use Aliasing for better readability
Try the following:
SELECT t2.name
FROM `mydatabase`.`table2` AS t2
LEFT JOIN `mydatabase`.`table1` AS t1 ON t1.name = t2.name
WHERE t1.name IS NULL
AND t2.name NOT LIKE 'xyz%';
Try a subquery:
SELECT `table2`.`name` FROM `mydatabase`.`table2` WHERE `table2`.`name` NOT IN (SELECT `table1`.`name` FROM `mydatabase`.`table1`);
I have three tables {animal, food, animal_food}
DROP TABLE IF EXISTS `tbl_animal`;
CREATE TABLE `tbl_animal` (
id_animal INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(25) NOT NULL DEFAULT "no name",
sex CHAR(1) NOT NULL DEFAULT "M",
size VARCHAR(10) NOT NULL DEFAULT "Mini",
age VARCHAR(10) NOT NULL DEFAULT "born",
hair VARCHAR(5 ) NOT NULL DEFAULT "short",
color VARCHAR(25) NOT NULL DEFAULT "not defined",
FOREIGN KEY (sex) REFERENCES `tbl_sexes` (sex),
FOREIGN KEY (tamanio) REFERENCES `tbl_sizes` (size),
FOREIGN KEY (age) REFERENCES `tbl_ages` (age),
FOREIGN KEY (hair) REFERENCES `tbl_hair_length` (hair_length),
CONSTRAINT `uc_Info_Animal` UNIQUE (`id_animal`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `tbl_food`;
CREATE TABLE `tbl_food` (
id_food INTEGER NOT NULL PRIMARY KEY,
type_food VARCHAR(20) NOT NULL DEFAULT "Other",
label VARCHAR(50) NOT NULL,
CONSTRAINT `uc_Info_Food` UNIQUE (`id_food`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `animal_food`;
CREATE TABLE `animal_food` (
id_animal INTEGER NOT NULL,
food VARCHAR(50) NOT NULL DEFAULT "",
quantity VARCHAR(50) NOT NULL DEFAULT "",
times VARCHAR(50) NOT NULL DEFAULT "",
description VARCHAR(50) NOT NULL DEFAULT "",
date_last DATE DEFAULT '0000-00-00 00:00:00',
date_water DATE DEFAULT '0000-00-00 00:00:00',
CONSTRAINT fk_ID_Animal_Food FOREIGN KEY (id_animal) REFERENCES `tbl_animal`(id_animal)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And I have a view where I select the values columns in animal and animal_food depending on ID
CREATE VIEW `CAT_animal_food` AS
SELECT a.name, a.sex,a.size,a.age,a.hair,a.color,
a_f.*
FROM `tbl_animal` a, `animal_food` a_f
WHERE a.id_animal = a_f.id_animal;
What would be better to create a view like above or, to join these animal and animal_food tables?
SELECT ...
FROM A.table t1
JOIN B.table2 t2 ON t2.column = t1.col
What is really the diference between that kind of view and a left join for example?
The only difference between the two SELECT statements is the syntax style. Both perform INNER JOINS. In other words, this style uses what is called "implicit" syntax:
SELECT ...
FROM A.table t1, B.table2 t2
WHERE t2.column = t1.col
It is "implicit" because the join condition is implied by the WHERE clause. This version uses "explicit" syntax:
SELECT ...
FROM A.table t1
JOIN B.table2 t2 ON t2.column = t1.col
Most people prefer to see "explicit" syntax because it make your code easier to follow; the join condition is explicitly understood and any WHERE clause is obvious.
None of this is related to LEFT JOINS of course. Here is a famous link with a great visual description of join types.
There is a big difference.
The old school style using a list of tables only allows inner joins.
Further, placing non-key conditions in the ON clause of a proper join allows greater performance and capability that a where clause can not deliver. The primary reason for this is that the ON clause is evaluated as the join is being made, but the WHERE clause is evaluated after all joins have been made.
The subject is too complex to do justice here.
If you had to pick one of the two following queries, which would you choose and why:
UPDATE `table1` AS e
SET e.points = e.points+(
SELECT points FROM `table2` AS ep WHERE e.cardnbr=ep.cardnbr);
or:
UPDATE `table1` AS e
INNER JOIN
(
SELECT points, cardnbr
FROM `table2`
) AS ep ON (e.cardnbr=ep.cardnbr)
SET e.points = e.points+ep.points;
Tables' definitions:
CREATE TABLE `table1` (
`cardnbr` int(10) DEFAULT NULL,
`name` varchar(50) DEFAULT NULL,
`points` decimal(7,3) DEFAULT '0.000',
`email` varchar(50) NOT NULL DEFAULT 'user#company.com',
`id` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=25205 DEFAULT CHARSET=latin1$$
CREATE TABLE `table2` (
`cardnbr` int(10) DEFAULT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
`points` decimal(7,3) DEFAULT '0.000',
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci$$
UPDATE: BOTH are causing problems the first is causing non matched rows to update into NULL.
The second is causing them to update into the max value 999.9999 (decimal 7,3).
PS the cardnbr field is NOT a key
I prefer the second one..reason for that is
When using JOIN the databse can create an execution plan that is better for your query and save time whereas subqueries (like your first one ) will run all the queries and load all the datas which may take time.
i think subqueries is easy to read but performance wise JOIN is faster...
First, the two statements are not equivalent, as you found out yourself. The first one will update all rows of table1, putting NULL values for those rows that have no related rows in table2.
So the second query looks better because it doesn't update all rows of table1. It could be written in a more simpel way, like this though:
UPDATE table1 AS e
INNER JOIN table2 AS ep
ON e.cardnbr = ep.cardnbr
SET e.points = e.points + ep.points ;
So, the 2nd query would be the best to use, if cardnbr was the primary key of table2. Is it?
If it isn't, then which values from table2 should be used for the update of table1 (added to points)? All of them? You could use this:
UPDATE table1 AS e
INNER JOIN
( SELECT SUM(points) AS points, cardnbr
FROM table2
GROUP BY cardnbr
) AS ep ON e.cardnbr = ep.cardnbr
SET
e.points = e.points + ep.points ;
Just one of them? That would require some other derived table, depending on what you want.
I have two tables named table_1 (1GB) and reference (250Mb).
When I query a cross join on reference it takes 16hours to update table_1 .. We changed the system files EXT3 for XFS but still it's taking 16hrs.. WHAT AM I DOING WRONG??
Here is the update/cross join query :
mysql> UPDATE table_1 CROSS JOIN reference ON
-> (table_1.start >= reference.txStart AND table_1.end <= reference.txEnd)
-> SET table_1.name = reference.name;
Query OK, 17311434 rows affected (16 hours 36 min 48.62 sec)
Rows matched: 17311434 Changed: 17311434 Warnings: 0
Here is a show create table of table_1 and reference:
CREATE TABLE `table_1` (
`strand` char(1) DEFAULT NULL,
`chr` varchar(10) DEFAULT NULL,
`start` int(11) DEFAULT NULL,
`end` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`name2` varchar(255) DEFAULT NULL,
KEY `annot` (`start`,`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
CREATE TABLE `reference` (
`bin` smallint(5) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`chrom` varchar(255) NOT NULL,
`strand` char(1) NOT NULL,
`txStart` int(10) unsigned NOT NULL,
`txEnd` int(10) unsigned NOT NULL,
`cdsStart` int(10) unsigned NOT NULL,
`cdsEnd` int(10) unsigned NOT NULL,
`exonCount` int(10) unsigned NOT NULL,
`exonStarts` longblob NOT NULL,
`exonEnds` longblob NOT NULL,
`score` int(11) DEFAULT NULL,
`name2` varchar(255) NOT NULL,
`cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`exonFrames` longblob NOT NULL,
KEY `chrom` (`chrom`,`bin`),
KEY `name` (`name`),
KEY `name2` (`name2`),
KEY `annot` (`txStart`,`txEnd`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
You should index table_1.start, reference.txStart, table_1.end and reference.txEnd table fields:
ALTER TABLE `table_1` ADD INDEX ( `start` ) ;
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txStart` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Cross joins are Cartesian Products, which are probably one of the most computationally expensive things to compute (they don't scale well).
For each table T_i for i = 1 to n, the number of rows generated by crossing tables T_1 to T_n is the size of each table multiplied by the size of each other table, ie
|T_1| * |T_2| * ... * |T_n|
Assuming each table has M rows, the resulting cost of computing the cross join is then
M_1 * M_2 ... M_n = O(M^n)
which is exponential in the number of tables involved in the join.
I see 2 problems with the UPDATE statement.
There is no index for the End fields. The compound indexes (annot) you have will be used only for the start fields in this query. You should add them as suggested by Emre:
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Second, the JOIN may (and probably does) find many rows of table reference that are related to a row of table_1. So some (or all) rows of table_1 that are updated, are updated many times. Check the result of this query, to see if it is the same as your updated rows count (17311434):
SELECT COUNT(*)
FROM table_1
WHERE EXISTS
( SELECT *
FROM reference
WHERE table_1.start >= reference.txStart
AND table_1.`end` <= reference.txEnd
)
There can be other ways to write this query but the lack of a PRIMARY KEY on both tables makes it harder. If you define a primary key on table_1, try this, replacing id with the primary key.
Update: No, do not try it on a table with 34M rows. Check the execution plan and try with smaller tables first.
UPDATE table_1 AS t1
JOIN
( SELECT t2.id
, r.name
FROM table_1 AS t2
JOIN
( SELECT name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id
SET t1.name = good.name;
You can check the query plan by running EXPLAIN on the equivalent SELECT:
EXPLAIN
SELECT t1.id, t1.name, good.name
FROM table_1 AS t1
JOIN
( SELECT t2.id
, r.name
FROM table_1 AS t2
JOIN
( SELECT name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id ;
Try this:
UPDATE table_1 SET
table_1.name = (
select reference.name
from reference
where table_1.start >= reference.txStart
and table_1.end <= reference.txEnd)
Somebody already offered you to add some indexes. But I think the best performance you may get with these two indexes:
ALTER TABLE `test`.`time`
ADD INDEX `reference_start_end` (`txStart` ASC, `txEnd` ASC),
ADD INDEX `table_1_star_end` (`start` ASC, `end` ASC);
Only one of them will be used by MySQL query, but MySQL will decide which is more useful automatically.