We're having a weird problem with MySQL (and also MariaDB): a simple database with 2 tables (InnoDB engine), both containing (among a few other columns) 3 or 4 text columns with XML data approx. 1-5 kB in size.
Each table has around 40000 rows and no indexes except those for foreign keys.
The weird part is running select statements. The XML columns are NOT used anywhere in the select statement (select, where, order, group, ...), yet they slow down execution. If those columns are null, the select statement executes in less than 2 seconds, but if they contain data, execution time jumps to around 20 seconds. Why is that?!
This is a script that generates an example behaving like described above:
CREATE TABLE tableA (
id bigint(20) NOT NULL AUTO_INCREMENT,
col1 bigint(20) NULL,
col2 bigint(20) NULL,
date1 datetime NULL,
largeString1 text NULL,
largeString2 text NULL,
largeString3 text NULL,
largeString4 text NULL,
PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
CREATE TABLE tableB (
id bigint(20) NOT NULL AUTO_INCREMENT,
col1 bigint(20) NULL,
col2 varchar(45) NULL,
largeString1 text NULL,
largeString2 datetime NULL,
largeString3 text NULL,
PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
fillTables:
DELIMITER ;;
CREATE PROCEDURE `fillTables`(
numRows INT
)
BEGIN
DECLARE i INT;
DECLARE j INT;
DECLARE largeString TEXT;
SET i = 1;
START TRANSACTION;
WHILE i < numRows DO
SET j = 1;
SET largeString = '';
WHILE j <= 100 DO
SET largeString = CONCAT(largeString, (SELECT UUID()));
SET j = j + 1;
END WHILE;
INSERT INTO tableA (id, col1, col2, date1, largeString1,
largeString2, largeString3, largeString4)
VALUES (i, FLOOR(1 + RAND() * 2), numRows - i,
date_sub(now(), INTERVAL i hour),
largeString, largeString, largeString, largeString);
INSERT INTO tableB (id, col1, col2, largeString1,
largeString2, largeString3)
VALUES (numRows - i, i, (SELECT UUID()),
largeString, largeString, largeString);
SET i = i + 1;
END WHILE;
COMMIT;
ALTER TABLE tableA ADD FOREIGN KEY (col2) REFERENCES tableB(id);
CREATE INDEX idx_FK_tableA_tableB ON tableA(col2);
ALTER TABLE tableB ADD FOREIGN KEY (col1) REFERENCES tableA(id);
CREATE INDEX idx_FK_tableB_tableA ON tableB(col1);
END ;;
test:
CREATE PROCEDURE `test`(
_param1 bigint
,_dateFrom datetime
,_dateTo datetime
)
BEGIN
SELECT
a.id
,DATE(a.date1) as date
,COALESCE(b2.col2, '') as guid
,COUNT(*) as count
FROM
tableA a
LEFT JOIN tableB b1 ON b1.col1 = a.id
LEFT JOIN tableB b2 ON b2.id = a.col2
WHERE
a.col1 = _param1
AND (_dateFrom IS NULL OR DATE(a.date1) BETWEEN DATE(_dateFrom) AND DATE(_dateTo))
GROUP BY
a.id
,DATE(a.date1)
,b2.col2
;
END;;
DELIMITER ;
To populate the tables with random data use
call fillTables(40000);
Stored procedure used for retrieving data:
call test(2, null, null);
Also, MSSQL executes the select statement in a fraction of a second without any table optimization (even without foreign keys defined).
UPDATE:
SHOW CREATE TABLE for both tables:
'CREATE TABLE `tableA` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`col1` bigint(20) DEFAULT NULL,
`col2` bigint(20) DEFAULT NULL,
`date1` datetime DEFAULT NULL,
`largeString1` text,
`largeString2` text,
`largeString3` text,
`largeString4` text,
PRIMARY KEY (`id`),
KEY `idx_FK_tableA_tableB` (`col2`),
CONSTRAINT `tableA_ibfk_1` FOREIGN KEY (`col2`) REFERENCES `tableB` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8'
'CREATE TABLE `tableB` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`col1` bigint(20) DEFAULT NULL,
`col2` varchar(45) DEFAULT NULL,
`largeString1` text,
`largeString2` datetime DEFAULT NULL,
`largeString3` text,
PRIMARY KEY (`id`),
KEY `idx_FK_tableB_tableA` (`col1`),
CONSTRAINT `tableB_ibfk_1` FOREIGN KEY (`col1`) REFERENCES `tableA` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8'
Both tables need INDEX(col1). Without it, these need table scans:
WHERE a.col1 = _param1
ON b1.col1 = a.id
For a (tableA), this would be 'covering', hence faster:
INDEX(col1, date1, id, col2)
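A sketch of adding that covering index (the index name is illustrative; per the SHOW CREATE TABLE above, tableB already has col1 indexed via idx_FK_tableB_tableA):
ALTER TABLE tableA
    ADD INDEX idx_col1_date1_id_col2 (col1, date1, id, col2);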
Don't use LEFT unless you need it.
Try not to hide columns in functions; it prevents using indexes for them:
DATE(a.date1) BETWEEN ...
This might work for that:
a.date1 >= DATE(_dateFrom)
AND a.date1 < DATE(_dateTo) + INTERVAL 1 DAY
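Applied to the WHERE clause of the test procedure, that might look like this (a sketch that keeps the original NULL handling; the OR on the parameter itself can still limit index use):
WHERE
    a.col1 = _param1
    AND (_dateFrom IS NULL
         OR (a.date1 >= DATE(_dateFrom)
             AND a.date1 < DATE(_dateTo) + INTERVAL 1 DAY))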
As for the mystery of 20s vs 2s -- Did you run each timing test twice? The first time is often bogged down with I/O; the second is memory-bound.
ROW_FORMAT
In InnoDB there are 4 ROW_FORMATs; they mostly differ in how they handle big strings (TEXT, BLOB, etc.). You mentioned that the query ran faster with NULL strings than with non-null strings. With the default ROW_FORMAT, some or all of each XML string is stored with the rest of the columns; past a certain size limit, the rest is put in overflow block(s).
If a large field is NULL, then it takes almost no space.
With ROW_FORMAT=DYNAMIC (see CREATE TABLE and ALTER TABLE), a non-null column will tend to be pushed to other blocks instead of making the main part of the record bulky.
This has the effect of allowing more rows to fit in a single block (except for the overflow). That, in turn, allows certain queries to run faster since they can get more information with fewer I/Os.
Read the documentation; I think you need these:
SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=1;
ALTER TABLE tbl ROW_FORMAT=DYNAMIC;
In reading the documentation, you will run across COMPRESSED. Although this would shrink the XML by perhaps 3:1, there are other issues. I don't know whether it would end up being better or not.
Buffer pool
innodb_buffer_pool_size should be about 70% of available RAM.
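For example, if the server has 8 GB of RAM mostly dedicated to MySQL, 70% works out to roughly 5-6 GB. On MySQL 5.7+ the buffer pool can be resized at runtime (a sketch; the value is illustrative, and on older versions you would put it in my.cnf and restart instead):
-- roughly 70% of 8 GB; adjust to your machine
SET GLOBAL innodb_buffer_pool_size = 6 * 1024 * 1024 * 1024;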
Related
I am trying to update one table based on another in the most efficient way.
Here is the table DDL of what I am trying to update
Table1
CREATE TABLE `customersPrimary` (
`id` int NOT NULL AUTO_INCREMENT,
`groupID` int NOT NULL,
`IDInGroup` int NOT NULL,
`name` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`address` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `groupID-IDInGroup` (`groupID`,`IDInGroup`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Table2
CREATE TABLE `customersSecondary` (
`groupID` int NOT NULL,
`IDInGroup` int NOT NULL,
`name` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`address` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`groupID`,`IDInGroup`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Both tables are practically identical, but the customersSecondary table is by design a staging table for the other. The big difference is the primary keys: table 1 has an auto-incrementing primary key, table 2 has a composite primary key.
In both tables the combination of groupID and IDInGroup are unique.
Here is the query I want to optimize
UPDATE customersPrimary
INNER JOIN customersSecondary ON
(customersPrimary.groupID = customersSecondary.groupID
AND customersPrimary.IDInGroup = customersSecondary.IDInGroup)
SET
customersPrimary.name = customersSecondary.name,
customersPrimary.address = customersSecondary.address
This query works but scans EVERY row in customersSecondary.
Adding
WHERE customersPrimary.groupID = (groupID)
cuts it down significantly, to the number of rows in customersSecondary with that groupID. But this is still often far larger than the number of rows actually being updated, since a group can be large. I think the WHERE needs improvement.
I can control table structure and add indexes. I will have to keep both tables.
Any suggestions would be helpful.
Your existing query requires a full table scan because you are saying "update everything on the left based on the value on the right". Presumably the optimiser is choosing customersSecondary because it has fewer rows, or at least it thinks it does.
Is the full table scan causing you problems? Locking? Too slow? How long does it take? How frequently are the tables synced? How many records are there in each table? What is the rate of change in each of the tables?
You could add separate indices on name and address but that will take a good chunk of space. The better option is going to be to add an indexed updatedAt column and use that to track which records have been changed.
ALTER TABLE `customersPrimary`
ADD COLUMN `updatedAt` DATETIME NOT NULL DEFAULT '2000-01-01 00:00:00',
ADD INDEX `idx_customer_primary_updated` (`updatedAt`);
ALTER TABLE `customersSecondary`
ADD COLUMN `updatedAt` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX `idx_customer_secondary_updated` (`updatedAt`);
And then you can add updatedAt to your join criteria and the WHERE clause -
UPDATE customersPrimary cp
INNER JOIN customersSecondary cs
ON cp.groupID = cs.groupID
AND cp.IDInGroup = cs.IDInGroup
AND cp.updatedAt < cs.updatedAt
SET
cp.name = cs.name,
cp.address = cs.address,
cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt > :last_query_run_time;
For :last_query_run_time you could use the last run time if you are storing it. Otherwise, if you know you are running the query every hour you could use NOW() - INTERVAL 65 MINUTE. Notice I have used more than one hour to make sure records aren't missed if there is a slight delay for some reason. Another option would be to use SELECT MAX(updatedAt) FROM customersPrimary -
UPDATE customersPrimary cp
INNER JOIN (SELECT MAX(updatedAt) maxUpdatedAt FROM customersPrimary) t
INNER JOIN customersSecondary cs
ON cp.groupID = cs.groupID
AND cp.IDInGroup = cs.IDInGroup
AND cp.updatedAt < cs.updatedAt
SET
cp.name = cs.name,
cp.address = cs.address,
cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt > t.maxUpdatedAt;
Plan A:
Something like this would first find the "new" rows, then add only those:
UPDATE primary
SET ...
JOIN ( SELECT ...
FROM secondary
LEFT JOIN primary
WHERE primary... IS NULL )
ON ...
Might secondary have changes? If so, a variant of that would work.
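Here is a concrete sketch of Plan A against the customersPrimary/customersSecondary tables from the question (the anti-join keeps only staging rows that do not exist in the primary table yet):
INSERT INTO customersPrimary (groupID, IDInGroup, name, address)
SELECT s.groupID, s.IDInGroup, s.name, s.address
FROM customersSecondary s
LEFT JOIN customersPrimary p
       ON p.groupID = s.groupID
      AND p.IDInGroup = s.IDInGroup
WHERE p.id IS NULL;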
Plan B:
Better yet is to TRUNCATE TABLE secondary after it is folded into primary.
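A sketch of that flow, reusing the UPDATE from the question and the INSERT sketched above (TRUNCATE is DDL and commits implicitly, so run it only after the fold has finished):
-- 1. fold changed rows in (the UPDATE from the question)
-- 2. fold new rows in (the anti-join INSERT sketched above)
-- 3. empty the staging table for the next load
TRUNCATE TABLE customersSecondary;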
I have a table with a VARCHAR(20) column that has a non-unique index. If I declare a variable for it with SET @varname = "XYZ", the index is not used.
I tried changing "" to '',
I tried changing = to := in the SET statement,
and I tried using both = and LIKE in my query.
CREATE TABLE `table_name` (
`ColumnID` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`Time` DATETIME NOT NULL,
`indexed_varchar_field` VARCHAR(20) DEFAULT NULL,
UNIQUE KEY `ColumnID` (`ColumnID`),
KEY `idx_varchar_field` (`indexed_varchar_field`)
) ENGINE=InnoDB AUTO_INCREMENT=20268516 DEFAULT CHARSET=utf8;
SET @VAR_NAME = 'XYZ';
SELECT * FROM table_name WHERE indexed_varchar_field = @VAR_NAME;
The actual table has more columns; the indexed_varchar_field is a serial number, and I want to retrieve all the datasets for one machine. Without the index it takes more than a minute, with the index less than a second.
If I try it without the variable it uses the index and is fast.
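To compare the two cases, the plans can be inspected directly (a sketch using the names above; the literal version should show ref access on idx_varchar_field, while the variable version reproduces the full scan described):
EXPLAIN SELECT * FROM table_name WHERE indexed_varchar_field = 'XYZ';
SET @VAR_NAME = 'XYZ';
EXPLAIN SELECT * FROM table_name WHERE indexed_varchar_field = @VAR_NAME;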
I have a large table named 'roomlogs' which has nearly 1 million entries.
The structure of the table:
id --> PK
roomId --> varchar FK to rooms table
userId --> varchar FK to users table
enterTime --> Date and Time
exitTime --> Date and Time
status --> bool
I already had an index on roomId, and I recently added an index on the userId column.
So, when I run a stored procedure with the following code, it takes more time than it should: around 50 seconds on average.
DELIMITER ;;
CREATE DEFINER=`root`@`%` PROCEDURE `enter_room`(IN pRoomId varchar(200), IN puserId varchar(50), IN ptime datetime, IN phidden int, pcheckid int, pexit datetime)
begin
update roomlogs set
roomlogs.exitTime = ptime,
roomlogs.`status` = 1
where
roomlogs.userId = puserId
and roomlogs.`status` = 0
and DATEDIFF(ptime,roomlogs.enterTime) = 0;
INSERT into roomlogs
( roomlogs.roomId,
roomlogs.userId,
roomlogs.enterTime,
roomlogs.exitTime,
roomlogs.hidden,
roomlogs.checkinId )
VALUES
( pRoomId,
puserId,
ptime,
pexit,
phidden,
pcheckid);
select *
from
roomlogs
where
roomlogs.id= LAST_INSERT_ID();
end ;;
DELIMITER ;
What can be the reason for it taking this much time?
I added an index recently so previous rows are not indexed.
No index type is specified for any of the indexes right now. Should I change them to B-tree?
On my website, I get 20-30 simultaneous calls to other procedures while this procedure gets 10-20 simultaneous calls. Does the update query in the procedure take a lock? In the MySQL.slow_logs table, however, lock_time shows 0 for each query.
Is there any other reason for this behaviour?
Edit: Here is the SHOW CREATE TABLE output:
CREATE TABLE `roomlogs` (
`roomId` varchar(200) CHARACTER SET latin1 DEFAULT NULL,
`userID` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
`enterTime` datetime DEFAULT NULL,
`exitTime` datetime DEFAULT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` int(11) DEFAULT '0',
`hidden` int(11) DEFAULT '0',
`checkinId` int(11) DEFAULT '-1',
PRIMARY KEY (`id`),
KEY `RoomLogIndex` (`roomId`),
KEY `RoomLogIDIndex` (`id`),
KEY `USERID` (`userID`)
) ENGINE=InnoDB AUTO_INCREMENT=1064216 DEFAULT CHARSET=utf8
I can also see that this query runs very often, about 100000 times per day (nearly continuously):
SELECT count(*) from roomlogs where roomId=proomId and status='0';
Because this query reads from the same table, does InnoDB block it or take a lock during the update query? I can see that when the above stored procedure runs more often, this query takes more time.
Here is the link for MySQL variables: https://docs.google.com/document/d/17_MVaU4yvpQfVDT83yhSjkLHsgYd-z2mg6X7GwvYZGE/edit?usp=sharing
roomlogs needs this 'composite' index:
INDEX(userId, `status`, enterTime)
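As a sketch (the index name is illustrative, and the column is userID per the SHOW CREATE TABLE above):
ALTER TABLE roomlogs
    ADD INDEX idx_user_status_enter (userID, `status`, enterTime);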
I added an index recently so previous rows are not indexed.
Not true. Adding an INDEX indexes the entire table.
The default index type is BTree; no need to explicitly specify it.
Does the update query in the procedure take a lock?
It does some form of locking. What is the value of autocommit? Do you explicitly use BEGIN and COMMIT? Is the table ENGINE=InnoDB? Please provide SHOW CREATE TABLE.
In the MySQL.slow_logs table, lock_time shows 0 for each query.
The INSERT you show seems to be inserting the same row as the UPDATE. Maybe you need INSERT ... ON DUPLICATE KEY UPDATE ...?
Don't "hide an index column in a function"; instead of DATEDIFF(roomlogs.enterTime,NOW()) = 0, do
AND enterTime >= CURDATE()
AND enterTime < CURDATE() + INTERVAL 1 DAY
This allows the index to be used more fully.
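Applied to the UPDATE in the procedure, that might look like this (a sketch; DATEDIFF(ptime, roomlogs.enterTime) = 0 means "same calendar day as ptime", so the range is built from DATE(ptime)):
UPDATE roomlogs
SET exitTime = ptime,
    `status` = 1
WHERE userID = puserId
  AND `status` = 0
  AND enterTime >= DATE(ptime)
  AND enterTime < DATE(ptime) + INTERVAL 1 DAY;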
KEY `RoomLogIndex` (`roomId`), Change to (roomId, status)
KEY `RoomLogIDIndex` (`id`), Remove, redundant with the PK
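A sketch of those two index changes in one statement (the new index name is illustrative):
ALTER TABLE roomlogs
    DROP INDEX RoomLogIndex,
    DROP INDEX RoomLogIDIndex,
    ADD INDEX idx_room_status (roomId, `status`);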
The buffer pool is only 97,517,568 bytes -- make it more like 9G.
I have a database with the following three tables:
matches table has 200,000 matches...
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
heroes table has ~100 heroes...
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
matches_heroes table has 2,000,000 relationships (10 random heroes per match)...
CREATE TABLE `matches_heroes` (
`relation_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`relation_id`),
KEY `match_id` (`match_id`),
KEY `hero_id` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`)
REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`)
REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3689891 DEFAULT CHARSET=utf8
The following query takes over 1 second, which seems pretty slow to me for something so simple:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
Removing only the WHERE clause doesn't help, but if I take out the INNER JOIN also, like so:
SELECT SQL_NO_CACHE COUNT(*) AS match_count FROM matches
...it only takes 0.05 seconds. It seems that INNER JOIN is very costly. I don't have much experience with joins. Is this normal or am I doing something wrong?
UPDATE #1: Here's the EXPLAIN result.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE matches_heroes ref match_id,hero_id,match_id_hero_id hero_id 2 const 34742
1 SIMPLE matches eq_ref PRIMARY PRIMARY 8 mydatabase.matches_heroes.match_id 1 Using index
UPDATE #2: After listening to you guys, I think it's working properly and this is simply as fast as it gets. Please let me know if you disagree. Thanks for all the help. I really appreciate it.
Use COUNT(matches.match_id) instead of COUNT(*); when using joins it's best not to use *, as it does extra computation. Using columns from the join is the best way to ensure you are not requesting any other operations. (Not a problem with MySQL's inner join, my bad.)
Also, you should verify that all keys are defragmented and that there is enough free RAM for the indexes to load into memory.
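If fragmentation or stale statistics are a suspect, the standard maintenance statements are (a sketch; on InnoDB, OPTIMIZE maps to a table rebuild plus analyze):
ANALYZE TABLE matches, matches_heroes;
OPTIMIZE TABLE matches_heroes;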
Update 1:
Try to add a composite index on (match_id, hero_id), as it should give better performance.
ALTER TABLE `matches_heroes` ADD KEY `match_id_hero_id` (`match_id`,`hero_id`)
Update 2:
I wasn't satisfied with the accepted answer (that MySQL is simply that slow for just 2 million records), so I ran benchmarks on my Ubuntu PC (i7 processor, with a standard HDD).
-- pre-requirements
CREATE TABLE seq_numbers (
number INT NOT NULL
) ENGINE = MYISAM;
DELIMITER $$
CREATE PROCEDURE InsertSeq(IN MinVal INT, IN MaxVal INT)
BEGIN
DECLARE i INT;
SET i = MinVal;
START TRANSACTION;
WHILE i <= MaxVal DO
INSERT INTO seq_numbers VALUES (i);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
DELIMITER ;
CALL InsertSeq(1,200000)
;
ALTER TABLE seq_numbers ADD PRIMARY KEY (number)
;
-- create tables
-- DROP TABLE IF EXISTS `matches`
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `matches_heroes` (
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`match_id`,`hero_id`),
KEY (match_id),
KEY (hero_id),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`) REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`) REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
-- insert DATA
-- 100
INSERT INTO heroes(name)
SELECT SUBSTR(CONCAT(char(RAND()*25+65),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97)),1,RAND()*9+4) as RandomName
FROM seq_numbers WHERE number <= 100
-- 200000
INSERT INTO matches(start_time)
SELECT rand()*1000000
FROM seq_numbers WHERE number <= 200000
-- 2000000
INSERT INTO matches_heroes(hero_id,match_id)
SELECT a.hero_id, b.match_id
FROM heroes as a
INNER JOIN matches as b ON 1=1
LIMIT 2000000
-- warm-up database, load INDEXes in ram (optional, works only for MyISAM tables)
LOAD INDEX INTO CACHE matches_heroes,matches,heroes
-- get random hero_id
SET @randHeroId=(SELECT hero_id FROM matches_heroes ORDER BY rand() LIMIT 1);
-- test 1
SELECT SQL_NO_CACHE @randHeroId,COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
WHERE b.hero_id = @randHeroId
; -- Time: 0.039s
-- test 2: adding some complexity
SET @randName = (SELECT `name` FROM heroes WHERE hero_id = @randHeroId LIMIT 1);
SELECT SQL_NO_CACHE @randName, COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
INNER JOIN heroes as c ON b.hero_id = c.hero_id
WHERE c.name = @randName
; -- Time: 0.037s
Conclusion: the test results are about 20x faster. My server load was about 80% before testing, as it's not a dedicated MySQL server and had other CPU-intensive tasks running, so if you run the whole script above and get slower results, it can be because:
you have a shared host and the load is too big. In that case there isn't much you can do: either complain to your current host, pay for a better host/VM, or try another host
your configured key_buffer_size (for MyISAM) or innodb_buffer_pool_size (for InnoDB) is too small; the optimum size here would be over 150 MB
your available RAM is not enough; you would need about 100-150 MB of RAM for the indexes to be loaded into memory. Solution: free up some RAM or buy more
Note that because the test script generates fresh data, index fragmentation is ruled out as a problem.
Hope this helps, and ask if you have issues in testing this.
Observation:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
is the equivalent to:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches_heroes
WHERE hero_id = 5
So you wouldn't require a join, if that's the count you need, but I'm guessing that was just an example.
So you are saying that reading a table of 200,000 records is faster than reading a table of 2,000,000 records, finding the desired ones, and then taking them all over to find matching records in the 200,000-record table?
And this surprises you? It's simply a lot more work for the DBMS. (It can even be, by the way, that the DBMS decides not to use the hero_id index when it considers a full table scan to be faster.)
So in my opinion there is nothing wrong with what is happening here.
I am trying to move a table which contains billions of rows to a new directory in MySQL 5.6. I am copying table1 to table2, then dropping table1 and renaming table2 to table1.
CREATE TABLE `table2` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`col1` int(11) DEFAULT NULL,
`col2` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_col1_col2` (`col1`,`col2`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 DATA DIRECTORY='/mysql_data/';
I am using the below procedure to do the copy.
DROP PROCEDURE IF EXISTS copytables;
DELIMITER ;;
CREATE PROCEDURE `copytables`()
begin
DECLARE v_id INT(11) unsigned default 0;
declare maxid int(11) unsigned default 0;
select max(id) into maxid from table1;
while v_id < maxid do
insert into table2(col1,col2)
select fbpost_id,fbuser_id from table1 where id >= v_id and id <v_id+100000 ;
set v_id=v_id+100000;
select v_id;
select max(id) into maxid from table1;
select maxid;
end while;
end ;;
DELIMITER ;
But now I am getting gaps in the id column after every batch of 100000 in table2 (after id 199999 the next id is 262141). Table1 does not contain any gaps in its id column.
Ask Google: https://www.google.com/search?q=auto_increment+mysql+gaps+innodb The first result explains this issue.
Generally, you need to be able to tell SO people what you have tried so far and why it isn't working. In this case, this is just a feature/characteristic of the InnoDB engine that lets it operate quickly at high volumes.
Auto-increment fields are not guaranteed to be dense; they are just guaranteed to give you unique values. Usually the engine does so by giving you consecutive values, but it doesn't have to. It will reserve a number of values, which can be discarded if not used. See http://dev.mysql.com/doc/refman/5.6/en/example-auto-increment.html
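If the gaps actually matter for the copy, one option (a sketch, not from the linked answer) is to carry table1's id values over explicitly instead of letting AUTO_INCREMENT assign new ones inside the loop:
insert into table2(id, col1, col2)
select id, fbpost_id, fbuser_id from table1 where id >= v_id and id < v_id+100000;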