How to copy a very large table into another table in MYSQL? - mysql

I have a large table with 110M rows. I would like to copy some of the fields into a new table and here is a rough idea of how I am trying to do:
DECLARE l_seenChangesTo DATETIME DEFAULT '1970-01-01 01:01:01';
DECLARE l_migrationStartTime DATETIME;
SELECT NOW() into l_migrationStartTime;
-- See if we've run this migration before and if so, pick up from where we left off...
IF EXISTS(SELECT seenChangesTo FROM migration_status WHERE client_user = CONCAT('this-migration-script-', user())) THEN
SELECT seenChangesTo FROM migration_status WHERE client_user = CONCAT('this-migration-script-', user()) INTO l_seenChangesTo;
SELECT NOW() as LogTime, CONCAT('Picking up from where we left off: ', l_seenChangesTo) as MigrationStatus;
END IF;
INSERT IGNORE INTO newTable
(field1, field2, lastModified)
SELECT o.column1 AS field1,
o.column2 AS field2,
o.lastModified
FROM oldTable o
WHERE
o.lastModified >= l_seenChangesTo AND
o.lastModified <= l_migrationStartTime;
INSERT INTO migration_status (client_user,seenChangesTo)
VALUES (CONCAT('this-migration-script-', user()), l_migrationStartTime)
ON DUPLICATE KEY UPDATE seenChangesTo=l_migrationStartTime;
Context:
CREATE TABLE IF NOT EXISTS `newTable` (
`field1` varchar(255) NOT NULL,
`field2` tinyint unsigned NOT NULL,
`lastModified` datetime NOT NULL,
PRIMARY KEY (`field1`, `field2`),
KEY `ix_field1` (`field1`),
KEY `ix_lastModified` (`lastModified`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `oldTable` (
`column1` varchar(255) NOT NULL,
`column2` tinyint unsigned NOT NULL,
`lastModified` datetime NOT NULL,
PRIMARY KEY (`column1`, `column2`),
KEY `ix_column1` (`column1`),
KEY `ix_lastModified` (`lastModified`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `migration_status` (
`client_user` char(64) NOT NULL,
`seenChangesTo` char(128) NOT NULL,
PRIMARY KEY (`client_user`)
);
Note: I have a few more columns in oldTable. Both oldTable and newTable are in same DB schema using mysql.
What's the general strategy when copying a very table? Should I perform this migration in an iterative manner by copy say 50,000 rows at time.

The insert speed doing a migration like this iteratively is going to be dreadfully slow. Why not SELECT oldTable INTO OUTFILE, then LOAD DATA INFILE ?

Related

Increment MySQL uniq row by only one request

I need to increment value unique row, use only one request
Not use select and after condition INSERT OR UPDATE
Table columns (date, servie) are used as unique values
Only one INSERT
For example
CREATE TABLE `log` (
`date` date NOT NULL,
`service` varchar(255) NOT NULL,
`count` int(11) NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
I found a single solution, but it not work for my
INSERT INTO log (`date`,`service`,`count`) VALUES ('2020-12-1','amazon',count+1)
ON DUPLICATE KEY UPDATE `date`='2020-12-01',`service`='amazon',`count`=`count`+1;
Are there any other solutions?
First, define id_key as a unique key in the table (or as the primary key):
create table log (
id_key int primary key
dt date not null,
service varchar(255) not null,
cnt int not null,
) engine=innodb default charset=utf8;
Then you can do:
insert into t1 (id_key, dt, service, cnt)
values (1, '2020-12-1', 'amazon', 1)
on duplicate key update
dt = values(dt),
service = values(service),
cnt = cnt + 1;
This attempts to insert a new row in the table, with a cnt of of 1. If id_key exists already, then it is updated instead: the date and service are updated, and the count is incremented.
If you want to use the date and service as unique keys, then do reflect that in the create table statement:
create table log (
id_key int primary key
dt date not null,
service varchar(255) not null,
cnt int not null,
unique (dt, service)
) engine = innodb default charset = utf8;
And then, it does not really make sense to update these columns on duplicate keys, so:
insert into t1 (id_key, dt, service, cnt)
values (1, '2020-12-1', 'amazon', 1)
on duplicate key update cnt = cnt + 1;

How to Add integer column to an String column in MySQl 5.0

I Want to add an Integer Column to a String that's because i need to generate a varchar variable with a numeric part that automatically increments. For example, P000001,P000002...
In order to do that what i am doing while creation of table i have taken an int field ID which auto_increments and i am Concatenating P with 00000 and the ID value
The Table i have created is :
CREATE TABLE tblAcceptTest(
ID int AUTO_INCREMENT NOT NULL primary key,
PatientID as CONCAT('P' , CONCAT('000000',CAST(ID as char)))
);
It Shows me the error from as keyword.
Please help
MySQL's documentation (http://dev.mysql.com/doc/refman/5.1/en/create-table.html) says, "the default value must be a constant; it cannot be a function or an expression." Why don't you just get the PatientID value afterward as part of the SELECT:
SELECT CONCAT('P', LPAD(ID, 6, 0)) AS PatientID FROM tblAcceptTest;
It looks like you want six digits after the "P", so try this for your expression:
CONCAT('P', LPAD(ID, 6, '0'))
Mysql has little support for computed columns.
Patient ID from your specification could be a char(7)
CREATE TABLE tblAcceptTest(
ID int AUTO_INCREMENT NOT NULL primary key,
PatientID char(7)
);
Then create some triggers. Note that the following insert trigger will cause issues with high concurrency servers.
DELIMITER |
CREATE TRIGGER tblAcceptTest_insert BEFORE INSERT ON tblAcceptTest
FOR EACH ROW BEGIN
DECLARE next_id INT;
SET next_id = (SELECT AUTO_INCREMENT FROM information_schema.TABLES WHERE TABLE_SCHEMA=DATABASE() AND TABLE_NAME='tblAcceptTest');
SET NEW.PatientID = CONCAT('P' , RIGHT(CONCAT('000000',next_id),6)) ;
END;
|
CREATE TRIGGER tblAcceptTest_update BEFORE UPDATE ON tblAcceptTest
FOR EACH ROW BEGIN
SET NEW.PatientID = CONCAT('P' , RIGHT(CONCAT('000000',NEW.ID),6)) ;
END;
|
DELIMITER ;
You use relationships and views to achieve the same result.
CREATE TABLE `patient` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`patient` varchar(60) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `accepted_test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`patient_id` int(11) NOT NULL,
`accepted` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `patient_id` (`patient_id`),
CONSTRAINT `accepted_test_ibfk_1` FOREIGN KEY (`patient_id`) REFERENCES `patient` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
create or replace view accepted_test_veiw as
select CONCAT('P' , RIGHT(CONCAT('000000',patient_id),6)) patient_key
, accepted
, id accepted_test_id
, patient_id
from accepted_test ;
select * from `accepted_test_veiw`

MySQL INSERT INTO ... SELECT ... GROUP BY is too slow

I have a table with about 50M rows and format:
CREATE TABLE `big_table` (
`id` BIGINT NOT NULL,
`t1` DATETIME NOT NULL,
`a` BIGINT NOT NULL,
`type` VARCHAR(10) NOT NULL,
`b` BIGINT NOT NULL,
`is_c` BOOLEAN NOT NULL,
PRIMARY KEY (`id`),
INDEX `a_b_index` (a,b)
) ENGINE=InnoDB;
I then define the table t2, with no indices:
Create table `t2` (
`id` BIGINT NOT NULL,
`a` BIGINT NOT NULL,
`b` BIGINT NOT NULL,
`t1min` DATETIME NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I then populate t2 using a query from big_table (this will add about 12M rows).
insert into opportunities
(id, a,b,t1min)
SELECT id,a,b,min(t1)
FROM big_table use index (a_b_index)
where type='SUBMIT' and is_c=1
GROUP BY a,b;
I find that it takes this query about a minute to process 5000 distinct (a,b) in big_table.
Since there are 12M distinct (a,b) in big_table then it would take about 40 hours to run
the query on all of big_table.
What is going wrong?
If I just do SELECT ... then the query does 5000 lines in about 2s. If I SELECT ... INTO OUTFILE ..., then the query still takes 60s for 5000 lines.
EXPLAIN SELECT ... gives:
id,select_type,table,type,possible_keys,key,key_len,ref,rows,Extra
1,SIMPLE,stdnt_intctn_t,index,NULL,a_b_index,16,NULL,46214255,"Using where"
I found that the problem was that the GROUP_BY resulted in too many random-access reads of big_table. The following strategy allows one sequential trip through big_table. First, we add a key to t2:
Create table `t2` (
`id` BIGINT NOT NULL,
`a` BIGINT NOT NULL,
`b` BIGINT NOT NULL,
`t1min` DATETIME NOT NULL,
PRIMARY KEY (a,b),
INDEX `id` (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Then we fill t2 using:
insert into t2
(id, a,b,t1min)
SELECT id,a,b,t1
FROM big_table
where type='SUBMIT' and is_c=1
ON DUPLICATE KEY UPDATE
t1min=if(t1<t1min,t1,t1min),
id=if(t1<t1min,big_table.id,t2.id);
The resulting speed-up is several orders of magnitude.
The group by might be part of the issue. You are using an index on (a,b), but your where is not being utilized. I would have an index on
(type, is_c, a, b )
Also, you are getting the "ID", but not specifying which... you probably want to do a MIN(ID) for a consistent result.

How to refact UPDATE via SELECT

I have this structure of my db:
CREATE TABLE IF NOT EXISTS `peoples` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
For customers.
CREATE TABLE IF NOT EXISTS `peoplesaddresses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`people_id` int(10) unsigned NOT NULL,
`phone` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`address` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
For their addresses.
CREATE TABLE IF NOT EXISTS `peoplesphones` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`people_id` int(10) unsigned NOT NULL,
`phone` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
`address` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1 ;
For their phones.
UPD4
ALTER TABLE peoplesaddresses DISABLE KEYS;
ALTER TABLE peoplesphones DISABLE KEYS;
ALTER TABLE peoplesaddresses ADD INDEX i_phone (phone);
ALTER TABLE peoplesphones ADD INDEX i_phone (phone);
ALTER TABLE peoplesaddresses ADD INDEX i_address (address);
ALTER TABLE peoplesphones ADD INDEX i_address (address);
ALTER TABLE peoplesaddresses ENABLE KEYS;
ALTER TABLE peoplesphones ENABLE KEYS;
END UPD4
CREATE TABLE IF NOT EXISTS `order` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`people_id` int(10) unsigned NOT NULL,
`name` varchar(255) CHARACTER SET utf8 NOT NULL,
`phone` varchar(255) CHARACTER SET utf8 NOT NULL,
`adress` varchar(255) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=12 ;
INSERT INTO `order` (`id`, `people_id`, `name`, `phone`, `adress`) VALUES
(1, 0, 'name1', 'phone1', 'address1'),
(2, 0, 'name1_1', 'phone1', 'address1_1'),
(3, 0, 'name1_1', 'phone1', 'address1_2'),
(4, 0, 'name2', 'phone2', 'address2'),
(5, 0, 'name2_1', 'phone2', 'address2_1'),
(6, 0, 'name3', 'phone3', 'address3'),
(7, 0, 'name4', 'phone4', 'address4'),
(8, 0, 'name1_1', 'phone5', 'address1_1'),
(9, 0, 'name1_1', 'phone5', 'address1_2'),
(11, 0, 'name1', 'phone1', 'address1'),
(10, 0, 'name1', 'phone1', 'address1');
Production base have over 9000 records. Is there way to execute this 3 update query's little more faster, than now (~50 min on dev machine).
INSERT INTO peoplesphones( phone, address )
SELECT DISTINCT `order`.phone, `order`.adress
FROM `order`
GROUP BY `order`.phone;
Fill peoplesphones table with unique phones
INSERT INTO peoplesaddresses( phone, address )
SELECT DISTINCT `order`.phone, `order`.adress
FROM `order`
GROUP BY `order`.adress;
Fill peoplesaddresses table with unique adress.
The next three querys are very slow:
UPDATE peoplesaddresses, peoplesphones SET peoplesaddresses.people_id = peoplesphones.id WHERE peoplesaddresses.phone = peoplesphones.phone;
UPDATE peoplesaddresses, peoplesphones SET peoplesphones.people_id = peoplesaddresses.people_id WHERE peoplesaddresses.address = peoplesphones.address;
UPDATE `order`, `peoplesphones` SET `order`.people_id = `peoplesphones`.people_id where `order`.phone = `peoplesphones`.phone;
Finally fill people table, and clear uneccessary fields.
INSERT INTO peoples( id, name )
SELECT DISTINCT `order`.people_id, `order`.name
FROM `order`
GROUP BY `order`.people_id;
ALTER TABLE `peoplesphones`
DROP `address`;
ALTER TABLE `peoplesaddresses`
DROP `phone`;
So, again: How can I make those UPDATE query's a little more faster? THX.
UPD: I forgott to say: I need to do it at once, just for migrate phones and adresses into other tables since one people can have more than one phone, and can order pizza not only at home.
UPD2:
UPD3:
Replace slow update querys on this (without with) get nothing.
UPDATE peoplesaddresses
LEFT JOIN
peoplesphones
ON peoplesaddresses.phone = peoplesphones.phone
SET peoplesaddresses.people_id = peoplesphones.id;
UPDATE peoplesphones
LEFT JOIN
`peoplesaddresses`
ON `peoplesaddresses`.address = `peoplesphones`.address
SET `peoplesphones`.people_id = `peoplesaddresses`.people_id;
UPDATE `order`
LEFT JOIN
`peoplesphones`
ON `order`.phone = `peoplesphones`.phone
SET `order`.people_id = `peoplesphones`.people_id;
UPD4 After adding code at the top (upd4), script takes a few seconds for execute. But on ~6.5k query it terminate with text: "The system cannot find the Drive specified".
Thanks to All. Especially to xQbert and Brent Baisley.
50 minutes for 9000 records is a bit ridiculous, event without indexes. You might as well put the 9000 records in Excel and do what you need to do. I think there is something else going on with your dev machine. Perhaps you have mysql configured to use very little memory? Maybe you can post the results of this "query":
show variables like "%size%";
Just this morning I did an insert(ignore)/select on 2 tables (one into another), both with over 400,000 records. 126,000 records were inserted into the second table, it took a total of 2 minutes 13 seconds.
I would say put indexes on any of the fields you are joining or grouping on, but this seems like a one time job. I don't think the lack of indexes is your problem.
All write operations are slow in relational databases. Especially indexes make them slow, since they have to be recalculated.
If you're using a WHERE in your statements, you should place an index on the fields referenced.
GROUP BY is always very slow, and so is DISTINCT, since they have to do a lot of checks that don't scale linearly. Always avoid them.
You may like to choose a different database engine for what you're doing. 9000 records in 50 minutes is very slow. Experiment with a few different engines, such as MyISAM and InnoDB. If you're using temporary tables a lot, MEMORY is really fast for those.
Update: Also, updating multiple tables in one statement probably shouldn't be done.

Super slow query with CROSS JOIN

I have two tables named table_1 (1GB) and reference (250Mb).
When I query a cross join on reference it takes 16hours to update table_1 .. We changed the system files EXT3 for XFS but still it's taking 16hrs.. WHAT AM I DOING WRONG??
Here is the update/cross join query :
mysql> UPDATE table_1 CROSS JOIN reference ON
-> (table_1.start >= reference.txStart AND table_1.end <= reference.txEnd)
-> SET table_1.name = reference.name;
Query OK, 17311434 rows affected (16 hours 36 min 48.62 sec)
Rows matched: 17311434 Changed: 17311434 Warnings: 0
Here is a show create table of table_1 and reference:
CREATE TABLE `table_1` (
`strand` char(1) DEFAULT NULL,
`chr` varchar(10) DEFAULT NULL,
`start` int(11) DEFAULT NULL,
`end` int(11) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`name2` varchar(255) DEFAULT NULL,
KEY `annot` (`start`,`end`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
CREATE TABLE `reference` (
`bin` smallint(5) unsigned NOT NULL,
`name` varchar(255) NOT NULL,
`chrom` varchar(255) NOT NULL,
`strand` char(1) NOT NULL,
`txStart` int(10) unsigned NOT NULL,
`txEnd` int(10) unsigned NOT NULL,
`cdsStart` int(10) unsigned NOT NULL,
`cdsEnd` int(10) unsigned NOT NULL,
`exonCount` int(10) unsigned NOT NULL,
`exonStarts` longblob NOT NULL,
`exonEnds` longblob NOT NULL,
`score` int(11) DEFAULT NULL,
`name2` varchar(255) NOT NULL,
`cdsStartStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`cdsEndStat` enum('none','unk','incmpl','cmpl') NOT NULL,
`exonFrames` longblob NOT NULL,
KEY `chrom` (`chrom`,`bin`),
KEY `name` (`name`),
KEY `name2` (`name2`),
KEY `annot` (`txStart`,`txEnd`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 ;
You should index table_1.start, reference.txStart, table_1.end and reference.txEnd table fields:
ALTER TABLE `table_1` ADD INDEX ( `start` ) ;
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txStart` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Cross joins are Cartesian Products, which are probably one of the most computationally expensive things to compute (they don't scale well).
For each table T_i for i = 1 to n, the number of rows generated by crossing tables T_1 to T_n is the size of each table multiplied by the size of each other table, ie
|T_1| * |T_2| * ... * |T_n|
Assuming each table has M rows, the resulting cost of computing the cross join is then
M_1 * M_2 ... M_n = O(M^n)
which is exponential in the number of tables involved in the join.
I see 2 problems with the UPDATE statement.
There is no index for the End fields. The compound indexes (annot) you have will be used only for the start fields in this query. You should add them as suggested by Emre:
ALTER TABLE `table_1` ADD INDEX ( `end` ) ;
ALTER TABLE `reference` ADD INDEX ( `txEnd` ) ;
Second, the JOIN may (and probably does) find many rows of table reference that are related to a row of table_1. So some (or all) rows of table_1 that are updated, are updated many times. Check the result of this query, to see if it is the same as your updated rows count (17311434):
SELECT COUNT(*)
FROM table_1
WHERE EXISTS
( SELECT *
FROM reference
WHERE table_1.start >= reference.txStart
AND table_1.`end` <= reference.txEnd
)
There can be other ways to write this query but the lack of a PRIMARY KEY on both tables makes it harder. If you define a primary key on table_1, try this, replacing id with the primary key.
Update: No, do not try it on a table with 34M rows. Check the execution plan and try with smaller tables first.
UPDATE table_1 AS t1
JOIN
( SELECT t2.id
, r.name
FROM table_1 AS t2
JOIN
( SELECT name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id
SET t1.name = good.name;
You can check the query plan by running EXPLAIN on the equivalent SELECT:
EXPLAIN
SELECT t1.id, t1.name, good.name
FROM table_1 AS t1
JOIN
( SELECT t2.id
, r.name
FROM table_1 AS t2
JOIN
( SELECT name, txStart, txEnd
FROM reference
GROUP BY txStart, txEnd
) AS r
ON t2.start >= r.txStart
AND t2.`end` <= r.txEnd
GROUP BY t2.id
) AS good
ON good.id = t1.id ;
Try this:
UPDATE table_1 SET
table_1.name = (
select reference.name
from reference
where table_1.start >= reference.txStart
and table_1.end <= reference.txEnd)
Somebody already offered you to add some indexes. But I think the best performance you may get with these two indexes:
ALTER TABLE `test`.`time`
ADD INDEX `reference_start_end` (`txStart` ASC, `txEnd` ASC),
ADD INDEX `table_1_star_end` (`start` ASC, `end` ASC);
Only one of them will be used by MySQL query, but MySQL will decide which is more useful automatically.