Consider MySQL tables: table1, table2
table1:
+------+------+
| col1 | col2 |
+------+------+
| 1 | a |
| 2 | b |
| 3 | c |
+------+------+
table2:
+------+------+
| col1 | col2 |
+------+------+
| 1 | a |
| 2 | b |
+------+------+
What is the most efficient way to delete the rows in table1 based on the rows in table2 such that the desired output looks like this:
+------+------+
| col1 | col2 |
+------+------+
| 3 | c |
+------+------+
Please note that this is a minimal example of a problem I am having with two very large tables.
Here is code to create table1 and table2:
DROP TABLE IF EXISTS table1;
CREATE TABLE table1 (
col1 BIGINT,
col2 TEXT
);
INSERT INTO table1 VALUES (1, 'a');
INSERT INTO table1 VALUES (2, 'b');
INSERT INTO table1 VALUES (3, 'c');
DROP TABLE IF EXISTS table2;
CREATE TABLE table2 (
col1 BIGINT,
col2 TEXT
);
INSERT INTO table2 VALUES (1, 'a');
INSERT INTO table2 VALUES (2, 'b');
MySQL version: 5.7.12
Question:
From reading this site and others I notice that there are several ways to do this operation in MySQL. I am wondering which is the fastest way for large tables (30M+ rows)? Here are some ways I have discovered:
1. Method using multi-table DELETE with INNER JOIN:
DELETE t1
FROM table1 t1
INNER JOIN table2 t2
ON t1.col1=t2.col1;
2. Method using DELETE FROM ... USING:
DELETE FROM t1
USING table1 t1
INNER JOIN table2 t2
ON ( t1.col1 = t2.col1 );
3. Method using DELETE ... WHERE col1 IN (subquery):
DELETE FROM table1 WHERE col1 in (SELECT col1 FROM table2);
Is there a faster way to do this that I have not listed here?
I will suggest another method. It is not as practical as the methods mentioned above, but it may be much faster for larger tables.
It is mentioned in the [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/delete.html):
InnoDB Tables
If you are deleting many rows from a large table, you may exceed the lock table size for an InnoDB table. To avoid this problem, or simply to minimize the time that the table remains locked, the following strategy (which does not use DELETE at all) might be helpful:
Select the rows not to be deleted into an empty table that has the same structure as the original table:
INSERT INTO t_copy SELECT * FROM t WHERE ... ;
Use RENAME TABLE to atomically move the original table out of the way and rename the copy to the original name:
RENAME TABLE t TO t_old, t_copy TO t;
Drop the original table:
DROP TABLE t_old;
Follow the steps below.
-- Rename the table:
RENAME TABLE table1 TO table1_old;
-- Create a new table with the primary key and all necessary indexes:
CREATE TABLE table1 LIKE table1_old;
-- USE THIS FOR MyISAM TABLES:
SET UNIQUE_CHECKS=0;
LOCK TABLES table1_old WRITE, table2 WRITE;
ALTER TABLE table1 DISABLE KEYS;
INSERT INTO table1
SELECT * FROM table1_old
WHERE col1 NOT IN (SELECT col1 FROM table2);
ALTER TABLE table1 ENABLE KEYS;
SET UNIQUE_CHECKS=1;
UNLOCK TABLES;
-- USE THIS FOR InnoDB TABLES:
SET AUTOCOMMIT = 0;
SET UNIQUE_CHECKS=0;
SET FOREIGN_KEY_CHECKS=0;
LOCK TABLES table1_old WRITE, table2 WRITE;
INSERT INTO table1
SELECT * FROM table1_old
WHERE col1 NOT IN (SELECT col1 FROM table2);
SET FOREIGN_KEY_CHECKS=1;
SET UNIQUE_CHECKS=1;
COMMIT; SET AUTOCOMMIT = 1;
UNLOCK TABLES;
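One step the recipe above leaves out: once the new table1 is populated and verified, drop the renamed original, matching the documentation's final step:

-- After verifying the new table1, remove the renamed original:
DROP TABLE table1_old;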
CREATE TABLE t_new LIKE t;
INSERT INTO t_new
SELECT *
FROM t
LEFT JOIN exclude ON ...
WHERE exclude.id IS NULL;
RENAME TABLE t TO t_old,
t_new TO t;
DROP TABLE t_old;
DELETE (and UPDATE) choke on handling a huge number of rows; SELECT does not.
A possible optimization on this would be to drop all indexes except the PRIMARY KEY and re-add them after finishing.
(FOREIGN KEYs can be a big nuisance; do you have any?)
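As a concrete sketch of this template for the question's tables (assuming col1 is the join key and nothing references table1 via FOREIGN KEY):

CREATE TABLE table1_new LIKE table1;

-- Keep only the rows with no match in table2 (anti-join):
INSERT INTO table1_new
SELECT t1.*
FROM table1 t1
LEFT JOIN table2 t2 ON t1.col1 = t2.col1
WHERE t2.col1 IS NULL;

-- Atomically swap the tables, then drop the original:
RENAME TABLE table1 TO table1_old, table1_new TO table1;
DROP TABLE table1_old;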
Related
I have a sample table, table1:
+----+--------------------+------------+----------+------+
| id | transaction_number | net_amount | category | type |
+----+--------------------+------------+----------+------+
| 1  | 100000             | 2000       | A        | ZA   |
| 2  | 100001             | 4000       | A        | ZA   |
| 3  | 100002             | 6000       | B        | ZB   |
+----+--------------------+------------+----------+------+
I have another sample table, table2:
+----+--------------------+------------+----------+------+
| id | transaction_number | net_amount | category | type |
+----+--------------------+------------+----------+------+
| 1  | 100002             | 6000       | B        | ZB   |
+----+--------------------+------------+----------+------+
How do I insert unique records that are not in table2, but present in table1?
Desired result:
+----+--------------------+------------+----------+------+
| id | transaction_number | net_amount | category | type |
+----+--------------------+------------+----------+------+
| 1  | 100002             | 6000       | B        | ZB   |
| 2  | 100000             | 2000       | A        | ZA   |
| 3  | 100001             | 4000       | A        | ZA   |
+----+--------------------+------------+----------+------+
INSERT INTO table2 (transaction_number, net_amount, category, type)
/* Rows in table1 that don't exist in table2: */
SELECT table1.transaction_number, table1.net_amount, table1.category, table1.type
FROM table1
LEFT JOIN table2 ON table1.transaction_number = table2.transaction_number
WHERE table2.transaction_number IS NULL;
If you don't want to duplicate transaction numbers in table2, then create a unique index or constraint on that column (or the columns you want to be unique). Let the database handle the integrity of the data:
alter table table2 add constraint unq_table2_transaction_number
unique (transaction_number);
Then use on duplicate key update with a dummy update:
insert into table2 (transaction_number, net_amount, category, type)
select transaction_number, net_amount, category, type
from table1
on duplicate key update transaction_number = values(transaction_number);
Why do I recommend this approach? First, it is thread-safe, so it works even when multiple queries are modifying the database at the same time. Second, it puts the database in charge of data integrity, so the transactions will be unique regardless of how they are changed.
Note that recent versions of MySQL (8.0.20 and later) deprecate the VALUES() function in this syntax in favor of row and column aliases. The functionality is the same, but I don't think those versions are universally deployed yet.
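For reference, the post-8.0.19 form documented for INSERT ... SELECT wraps the query in an aliased derived table. Here is a sketch of the same dummy update in that style, following the pattern in the MySQL 8.0 manual (the column aliases tn, na, c, t are my own):

-- Requires MySQL 8.0.19+; the derived-table column aliases avoid
-- ambiguity with table2's own columns.
insert into table2 (transaction_number, net_amount, category, type)
select * from (
    select transaction_number as tn, net_amount as na, category as c, type as t
    from table1
) as dt
on duplicate key update transaction_number = tn;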
Try this:
INSERT INTO table2 (transaction_number, net_amount, category, type)
SELECT transaction_number, net_amount, category, type FROM table1
ON DUPLICATE KEY UPDATE
    net_amount = VALUES(net_amount), category = VALUES(category), type = VALUES(type);
Use NOT EXISTS as follows:
INSERT INTO table2
SELECT t1.*
FROM table1 t1
WHERE NOT EXISTS
    (SELECT 1 FROM table2 t2
     WHERE t1.transaction_number = t2.transaction_number);
I am saving tables from Spark SQL using MySQL as my storage engine. My table looks like
+-------------+----------+
| count| date|
+-------------+----------+
| 72|2017-09-08|
| 84|2017-09-08|
+-------------+----------+
I want to update the table by summing the counts per date using GROUP BY and dropping the individual rows, so my output should look like:
+-------------+----------+
| count| date|
+-------------+----------+
| 156|2017-09-08|
+-------------+----------+
Is this a reasonable expectation, and if so, how can it be achieved using Spark SQL?
Before you write the table to MySQL, apply the following logic to your Spark dataframe/dataset:
import org.apache.spark.sql.functions._
df.groupBy("date").agg(sum("count").as("count"))
Then write the transformed dataframe to MySQL.
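Equivalently, if you first register the dataframe as a temporary view (e.g. df.createOrReplaceTempView("counts_by_date"), where counts_by_date is a hypothetical view name), the same aggregation can be expressed in Spark SQL directly:

-- Sum the per-row counts for each date before persisting to MySQL
SELECT `date`, SUM(`count`) AS `count`
FROM counts_by_date
GROUP BY `date`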
Solution 1
In MySQL, you can make use of a TEMPORARY TABLE to store the results after grouping, then truncate the original table and insert the data from the temporary table back into it.
CREATE TEMPORARY TABLE temp_table AS
(SELECT SUM(`count`) AS `count`, `date` FROM table_name GROUP BY `date`);
TRUNCATE TABLE table_name;
INSERT INTO table_name (`count`, `date`)
SELECT `count`, `date` FROM temp_table;
DROP TEMPORARY TABLE temp_table;
Solution 2
Update the rows using the following query:
UPDATE table_name t
INNER JOIN
    (SELECT SUM(`count`) AS `count`, `date` FROM table_name GROUP BY `date`) t1
    ON t.`date` = t1.`date`
SET t.`count` = t1.`count`;
Then, assuming that the table has a unique column named uid, delete the now-redundant duplicates:
DELETE t1 FROM table_name t1, table_name t2
WHERE t1.uid > t2.uid AND t1.`date` = t2.`date`;
Please refer to this SO question to see more about deleting duplicate rows.
I have 2 simple tables:
table1 -> p_id | s_id
table2 -> p_id | s_id
The two tables are identical; p_id is an auto-increment value. I would like to insert a row into table2, but only if the p_id exists in table1. Is this possible? (MySQL)
INSERT INTO table1 (p_id, s_id)
SELECT * FROM (SELECT '100', '2') AS tmp
WHERE NOT EXISTS (SELECT p_id FROM table2 WHERE p_id = '100')
LIMIT 1
You can insert into a table based on any SELECT query. For example:
INSERT INTO table2 (p_id, s_id)
SELECT p_id, 2 FROM table1 WHERE p_id = 100;
If there are zero rows in table1 with the specified p_id value, this is a no-op. That is, it inserts zero rows into table2. If there is 1 row in table1 with that p_id value, it inserts into table2.
No need for LIMIT 1 because if p_id is the primary key then there is guaranteed to be only 1 or 0 rows with the given value.
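If you also need to guard against inserting a p_id that is already present in table2, an anti-join against the target table can be combined with this (a sketch reusing the same hypothetical values, p_id = 100 and s_id = 2):

INSERT INTO table2 (p_id, s_id)
SELECT t1.p_id, 2
FROM table1 t1
LEFT JOIN table2 t2 ON t2.p_id = t1.p_id  -- anti-join: skip p_ids already in table2
WHERE t1.p_id = 100
  AND t2.p_id IS NULL;

Note this is not concurrency-safe on its own; a unique constraint on table2.p_id is still the robust guard.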
Try this:
INSERT INTO table2 (p_id, s_id)
SELECT p_id, '2' AS s_id FROM table1 WHERE p_id = 100 LIMIT 1;
How do I join 3 or more tables in MySQL as follows?
there is a column for each column of each table (except ID)
ID field values all go into the same ID field in the new table
an additional column is added called table the values of which is the source Table name
an autoincremented newID field is added
only one table contributes to each row, unrelated fields have null values
total number of rows is equal to the sum records from all tables
Example with just two tables:
TableA:
ID | fieldA
-----------------
1  | valueA1
2  | valueA2
TableB:
ID | fieldB
-----------------
1  | valueB1
2  | valueB2
ResultTable:
newID | ID | table | fieldA | fieldB
---------------------------------------------
1 | 1 | TableA | valueA1 |
2 | 2 | TableA | valueA2 |
3 | 1 | TableB | | valueB1
4 | 2 | TableB | | valueB2
I know this probably sounds a bit weird! I am going to try to use this to batch-insert nodes for records from various tables into a neo4j graph database with this batch-insert script, which could be hilarious considering I hardly know what I am doing in either database ;-).
Try this one:
SELECT @rownum := @rownum + 1 AS NewID,
       a.*
FROM
(
    SELECT ID, fieldA, '' AS fieldB
    FROM tableA
    UNION ALL
    SELECT ID, '' AS fieldA, fieldB
    FROM tableB
) a, (SELECT @rownum := 0) r
Create the new table. Here's the proposed schema:
CREATE TABLE NewTable
(
    NewID INT AUTO_INCREMENT,
    ID INT NOT NULL,
    FieldA VARCHAR(30),
    FieldB VARCHAR(30),
    CONSTRAINT tb_pk PRIMARY KEY (NewID)
);
Then insert your values. Here's the query, using an INSERT INTO ... SELECT statement:
INSERT INTO NewTable (ID, fieldA, fieldB)
SELECT ID, fieldA, NULL AS fieldB
FROM tableA
UNION ALL
SELECT ID, NULL AS fieldA, fieldB
FROM tableB
Create a table with an auto-increment newID and add all the possible columns, allowing NULLs. Then INSERT INTO it the values from TableA, then TableB, with something like:
INSERT INTO `table` (ID, `table`, fieldA)
SELECT ID, 'TableA', fieldA FROM TableA;

INSERT INTO `table` (ID, `table`, fieldB)
SELECT ID, 'TableB', fieldB FROM TableB;
Use UNION to select all the rows in one result set, and INSERT INTO to insert them into the new table. You can also generate the new ID using ROW_NUMBER() in SQL Server:
SELECT ID, COL1, NULL, NULL FROM Table1
UNION
SELECT ID, NULL, COL2, NULL FROM Table2
UNION
SELECT ID, NULL, NULL, COL3 FROM Table3
Select the above result into a temp table, then use ROW_NUMBER() to populate the new ID:
SELECT ID, ... , ROW_NUMBER() OVER(ORDER BY ID) AS NewID FROM #TempTable
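If you are on MySQL 8.0+ rather than SQL Server, ROW_NUMBER() is available there as well, so no temp table is needed. A sketch over the question's two-table example:

SELECT ROW_NUMBER() OVER (ORDER BY `table`, ID) AS newID, u.*
FROM (
    SELECT ID, 'TableA' AS `table`, fieldA, NULL AS fieldB FROM TableA
    UNION ALL
    SELECT ID, 'TableB', NULL, fieldB FROM TableB
) AS u;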
I have 4 tables each with different columns but they all have one column in common. This is an integer identifier column. So I will have some integer x, and I want all the rows from all 4 tables that have this one id column equal to x.
I've tried something similar to:
SELECT table1.col1, table2.col2 FROM table1 LEFT JOIN table2 ON table1.id=x OR coastlinessports.id=x
And I get back rows that have the columns from both tables combined into the same row.
So one result block would have:
table1.col1, table2.col2
But I really want:
table1.col1
table2.col2
Is there a way I can do this without doing 4 select queries in a row?
If you want sequential rows from different tables, and for each table to return a different number of rows, then you can use UNION. However, UNION requires each SELECT to return the same number of columns, so you will need to fill in the missing columns with a value (or NULL), like this:
DROP TABLE IF EXISTS `table1`;
DROP TABLE IF EXISTS `table2`;
CREATE TABLE `table1` (
`id` int(11) NOT NULL auto_increment,
`col1` VARCHAR(255),
`col2` VARCHAR(255),
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
CREATE TABLE `table2` (
`id` int(11) NOT NULL auto_increment,
`col1` VARCHAR(255),
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO `table1` VALUES
(1, '1,1', '1,2'),
(2, '2,1', '2,2');
INSERT INTO `table2` VALUES
(1, '1,1'),
(2, '2,1');
SELECT `id`, `col1`, `col2` FROM `table1` WHERE `id` = 1
UNION
SELECT `id`, `col1`, NULL AS `col2` FROM `table2` WHERE `id` = 1;
+----+------+------+
| id | col1 | col2 |
+----+------+------+
| 1 | 1,1 | 1,2 |
| 1 | 1,1 | NULL |
+----+------+------+
If you want to further process the UNION result set, you can wrap it in another SELECT, like this:
SELECT `col1`, `col2` FROM (
SELECT `id`, `col1`, `col2` FROM `table1` WHERE `id` = 1
UNION
SELECT `id`, `col1`, NULL AS `col2` FROM `table2` WHERE `id` = 1
) AS `t1`
ORDER BY col2;
+------+------+
| col1 | col2 |
+------+------+
| 1,1 | NULL |
| 1,1 | 1,2 |
+------+------+
Is that what you are after?
This probably won't answer your question, but there's something weird about the JOIN.
Usually the "ON" condition refers to both tables being joined, similar to this:
... FROM table1 LEFT JOIN table2 ON table1.id = table2.id ...
I guess there can be cases where you wouldn't do that, but I can't think of any.
You should check out this post. It seems like what you are asking for:
http://ask.sqlteam.com/questions/870/pivoting-multiple-rows-into-one-row-with-multiple-columns
If the column is named the same in all the tables, you can use USING, and for the given value, use WHERE:
SELECT *
FROM table1
JOIN table2 USING (commonColumn)
JOIN table3 USING (commonColumn)
JOIN table4 USING (commonColumn)
WHERE commonColumn = 'desiredValue';
Update: on a second read of your question
You want this?
All rows of table1 where commonColumn="desiredValue"
Followed by
All rows of table2 where commonColumn="desiredValue"
Followed by
All rows of table3 where commonColumn="desiredValue"
Followed by
All rows of table4 where commonColumn="desiredValue"
If that's so, you need to use a UNION (and you have to make 4 selects)
If the number of columns differs, you need to fill the gaps with aliases:
SELECT col1, col2, col3, col4 from table1 where commonColumn="desiredValue"
UNION
SELECT col1, col2, 0 as col3, 0 as col4 from table2 where commonColumn="desiredValue"
...
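Spelled out for all four tables, the pattern might look like this (a sketch; each colN stands in for that table's own columns, and UNION ALL keeps every row so the total equals the sum of the four tables):

SELECT col1, 0 AS col2, 0 AS col3, 0 AS col4 FROM table1 WHERE commonColumn = 'desiredValue'
UNION ALL
SELECT 0, col2, 0, 0 FROM table2 WHERE commonColumn = 'desiredValue'
UNION ALL
SELECT 0, 0, col3, 0 FROM table3 WHERE commonColumn = 'desiredValue'
UNION ALL
SELECT 0, 0, 0, col4 FROM table4 WHERE commonColumn = 'desiredValue';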