Auto-increment holes in INSERT statement (MySQL 5.6)

I am trying to move a table which contains billions of rows to a new directory in MySQL 5.6. The plan is to copy table1 to table2, then drop table1 and rename table2 to table1.
CREATE TABLE `table2` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`col1` int(11) DEFAULT NULL,
`col2` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_col1_col2` (`col1`,`col2`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 DATA DIRECTORY='/mysql_data/';
I am using the below procedure to do the copy.
DROP PROCEDURE IF EXISTS copytables;
DELIMITER $$
CREATE PROCEDURE `copytables`()
begin
DECLARE v_id INT(11) unsigned default 0;
DECLARE maxid INT(11) unsigned default 0;
select max(id) into maxid from table1;
while v_id < maxid do
-- copy one batch of 100000 rows (table1 has the same col1/col2 schema as table2)
insert into table2 (col1, col2)
select col1, col2 from table1 where id >= v_id and id < v_id + 100000;
set v_id = v_id + 100000;
select v_id; -- progress output
select max(id) into maxid from table1; -- pick up rows added meanwhile
select maxid;
end while;
end$$
DELIMITER ;
But now I am getting gaps in the id column after every batch of 100000 in table2 (after id 199999 the next id is 262141). table1 does not contain any gaps in its id column.

Ask Google: https://www.google.com/search?q=auto_increment+mysql+gaps+innodb The first result explains this issue.
Generally, you need to be able to tell SO people what you have tried so far and why it isn't working. In this case, this is just a feature/characteristic of the InnoDB engine that lets it operate quickly at high volumes.

Auto-increment fields are not guaranteed to be dense; they are only guaranteed to give you unique values. Usually the engine does so by handing out consecutive values, but it doesn't have to: it may reserve a block of values, and any that go unused are discarded. See http://dev.mysql.com/doc/refman/5.6/en/example-auto-increment.html
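The specific gap here also fits InnoDB's allocation strategy for bulk inserts: when the number of rows an INSERT ... SELECT will produce is not known in advance, InnoDB reserves auto-increment values in progressively larger power-of-two blocks, and the unused tail of a block is discarded when the statement ends. A counter at 262141 after roughly 200000 inserted rows sits just under 2^18 = 262144, which matches that pattern. As a sketch (v_id as in the original procedure), you can check the lock mode and, if you want table2 to stay dense, simply copy the existing ids instead of generating new ones:
-- 1 = "consecutive" mode, the 5.6 default; gaps between bulk-insert
-- statements are expected and harmless in this mode.
SELECT @@innodb_autoinc_lock_mode;
-- Copying the ids preserves table1's dense sequence and avoids
-- auto-increment reservation entirely:
insert into table2 (id, col1, col2)
select id, col1, col2 from table1 where id >= v_id and id < v_id + 100000;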

Related

What is the error? I'm trying to fill a table with random values

I have two similar tables:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;
CREATE TABLE `t2` (
`id` int(11) NOT NULL AUTO_INCREMENT ,
`c1` int(11) NOT NULL DEFAULT '0',
`c2` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `idx_c1` (`c1`)
) ENGINE=InnoDB;
I want to fill both tables with random values:
drop procedure if exists random_records;
truncate table t1;
truncate table t2;
delimiter $$
create procedure random_records(n int)
begin
set @i=1;
set @m=100000;
while @i <= n do
insert into t1(c1,c2) values(rand()*@m,rand()*@m);
insert into t2(c1,c2) values(rand()*@m,rand()*@m);
set @i=@i+1;
end while;
end $$
delimiter ;
call random_records(100);
select * from t1 limit 10;
select * from t2 limit 10;
select count(*) from t1;
select count(*) from t2;
Here is what I see in table t1 (screenshot omitted):
I don't understand why there are so many '0' and '1' values.
COUNT(*) returns 210 for t1 and 208 for t2, which is one more mystery.
The most likely reason for the many zeros and ones in the c1 and c2 columns of both tables is that rand()*@m did not evaluate to what you expect when the rows were inserted. rand() returns a floating-point value between 0 and 1, so with @m = 100000 the products should be spread fairly evenly between 0 and 100000; a pile of 0s and 1s suggests @m was NULL or very small at insert time, for example because set @m=100000; was never executed in that session. Note that rand()*NULL is NULL, and in non-strict SQL mode inserting NULL into a NOT NULL column stores 0 with a warning, which would produce exactly the zeros you are seeing. Make sure the variable is set before the loop runs, or better, use a DECLAREd local variable inside the procedure.
As for the discrepancy in the number of rows, counts of 210 and 208 after call random_records(100) indicate the procedure was run more than once without truncating the tables in between, and that the two tables did not receive the same number of successful inserts. The paired insert statements are not executed atomically, so one can succeed while the other fails. To keep the tables in lockstep, wrap the loop in a transaction so each run is committed as a single unit of work.
For example, you can modify the random_records procedure as follows:
drop procedure if exists random_records;
truncate table t1;
truncate table t2;
delimiter $$
create procedure random_records(n int)
begin
set @i=1;
set @m=1000000;
start transaction;
while @i <= n do
insert into t1(c1,c2) values(rand()*@m,rand()*@m);
insert into t2(c1,c2) values(rand()*@m,rand()*@m);
set @i=@i+1;
end while;
commit;
end $$
delimiter ;
This should ensure that the insert statements are executed atomically and that the number of rows in both tables is consistent.
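To verify, truncate both tables, call the procedure once, and compare the counts; a quick check:
truncate table t1;
truncate table t2;
call random_records(100);
-- Both counts should now be exactly 100:
SELECT
(SELECT COUNT(*) FROM t1) AS t1_rows,
(SELECT COUNT(*) FROM t2) AS t2_rows;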

How to find the recently modified row in tables of a MySQL database?

I have a MySQL database (employee) consisting of 5 tables. How can I find the most recently modified row, identified by the column id, in these tables?
I tried to find the table using the query below. It works fine.
USE information_schema;

SELECT DISTINCT TABLE_NAME
FROM TABLES
WHERE UPDATE_TIME IS NOT NULL
AND UPDATE_TIME < NOW()
AND TABLE_SCHEMA = 'employee';
Please help me find the row by the column id (all tables have this column as their identifier) in that table.
I think the workaround for this problem is to create a trigger that inserts a record into a separate monitoring table:
CREATE TABLE `table_row_monitor` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`tbl` varchar(100) DEFAULT NULL,
`col` varchar(100) DEFAULT NULL,
`val` int,
`dtecreated` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TRIGGER IF EXISTS `[table_name]_log_row_insert`;
DELIMITER $$
CREATE TRIGGER `[table_name]_log_row_insert` AFTER INSERT ON `[table_name]` FOR EACH ROW
BEGIN
INSERT INTO table_row_monitor (tbl, col, val)
VALUES ('[table_name]', 'id', NEW.id); -- NEW.id is the row just inserted; no table scan needed
END$$
DELIMITER ;
Just replace [table_name] with the actual table name; you need one such trigger per table.
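The most recently modified row across all monitored tables is then just the newest entry in table_row_monitor:
SELECT tbl, col, val, dtecreated
FROM table_row_monitor
ORDER BY dtecreated DESC
LIMIT 1;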

MySQL create table if not exists and insert record only if table was created

I need to create a table and insert a first record only if the table was newly created.
I do create the table with this statement:
CREATE TABLE IF NOT EXISTS tableName (
id int(9) NOT NULL,
col1 int(9) DEFAULT NULL,
col2 int(3) unsigned zerofill DEFAULT NULL,
PRIMARY KEY(id)
) ENGINE = InnoDB DEFAULT CHARSET = latin1;
How do I insert a first record only if the table was just created?
Combine the creation and insert into a single statement:
CREATE TABLE IF NOT EXISTS tableName (
id int(9) NOT NULL,
col1 int(9) DEFAULT NULL,
col2 int(3) unsigned zerofill DEFAULT NULL,
PRIMARY KEY(id)
) ENGINE = InnoDB DEFAULT CHARSET = latin1
AS SELECT 1 AS id, 10 AS col1, 5 AS col2;
If the table already exists, the AS SELECT ... clause is ignored (as of MySQL 5.5.6), so the row is inserted only when the table is actually created.
That’s a good spot to use the INSERT IGNORE command rather than the INSERT command.
INSERT IGNORE INTO mytable (id, field1, field2) VALUES(1, 'foo', 'bar');
From the MySQL documentation:
Errors that occur while executing the INSERT statement are ignored. For example, without IGNORE, a row that duplicates an existing UNIQUE index or PRIMARY KEY value in the table causes a duplicate-key error and the statement is aborted. With IGNORE, the row is discarded and no error occurs. Ignored errors generate warnings instead.
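So, assuming id is the primary key (or otherwise unique) in mytable, re-running the statement is harmless:
INSERT IGNORE INTO mytable (id, field1, field2) VALUES(1, 'foo', 'bar');
-- Second run: the duplicate row is discarded and a warning is raised instead.
INSERT IGNORE INTO mytable (id, field1, field2) VALUES(1, 'foo', 'bar');
SHOW WARNINGS;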

MySQL slow with large text fields in table

We're having a weird problem with MySQL (and also MariaDB): a simple database with 2 tables (InnoDB engine), each containing (among a few other columns) 3 or 4 TEXT columns with XML data approximately 1-5 kB in size.
Each table has around 40000 rows and no indexes except those for foreign keys.
The weird part is running select statements. The XML columns are NOT used anywhere in the select statement (select, where, order, group, ...), yet they slow down execution. If those columns are NULL, the select executes in less than 2 seconds, but if they contain data, execution time jumps to around 20 seconds. Why is that?!
This is a script that generates an example behaving like described above:
CREATE TABLE tableA (
id bigint(20) NOT NULL AUTO_INCREMENT,
col1 bigint(20) NULL,
col2 bigint(20) NULL,
date1 datetime NULL,
largeString1 text NULL,
largeString2 text NULL,
largeString3 text NULL,
largeString4 text NULL,
PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
CREATE TABLE tableB (
id bigint(20) NOT NULL AUTO_INCREMENT,
col1 bigint(20) NULL,
col2 varchar(45) NULL,
largeString1 text NULL,
largeString2 datetime NULL,
largeString3 text NULL,
PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
fillTables:
DELIMITER ;;
CREATE PROCEDURE `fillTables`(
numRows INT
)
BEGIN
DECLARE i INT;
DECLARE j INT;
DECLARE largeString TEXT;
SET i = 1;
START TRANSACTION;
WHILE i < numRows DO
SET j = 1;
SET largeString = '';
WHILE j <= 100 DO
SET largeString = CONCAT(largeString, (SELECT UUID()));
SET j = j + 1;
END WHILE;
INSERT INTO tableA (id, col1, col2, date1, largeString1,
largeString2, largeString3, largeString4)
VALUES (i, FLOOR(1 + RAND() * 2), numRows - i,
date_sub(now(), INTERVAL i hour),
largeString, largeString, largeString, largeString);
INSERT INTO tableB (id, col1, col2, largeString1,
largeString2, largeString3)
VALUES (numRows - i, i, (SELECT UUID()),
largeString, largeString, largeString);
SET i = i + 1;
END WHILE;
COMMIT;
ALTER TABLE tableA ADD FOREIGN KEY (col2) REFERENCES tableB(id);
CREATE INDEX idx_FK_tableA_tableB ON tableA(col2);
ALTER TABLE tableB ADD FOREIGN KEY (col1) REFERENCES tableA(id);
CREATE INDEX idx_FK_tableB_tableA ON tableB(col1);
END ;;
test:
CREATE PROCEDURE `test`(
_param1 bigint
,_dateFrom datetime
,_dateTo datetime
)
BEGIN
SELECT
a.id
,DATE(a.date1) as date
,COALESCE(b2.col2, '') as guid
,COUNT(*) as count
FROM
tableA a
LEFT JOIN tableB b1 ON b1.col1 = a.id
LEFT JOIN tableB b2 ON b2.id = a.col2
WHERE
a.col1 = _param1
AND (_dateFrom IS NULL OR DATE(a.date1) BETWEEN DATE(_dateFrom) AND DATE(_dateTo))
GROUP BY
a.id
,DATE(a.date1)
,b2.col2
;
END;;
DELIMITER ;
To populate the tables with random data use
call fillTables(40000);
Stored procedure used for retrieving data:
call test(2, null, null);
Also, MSSQL executes the select statement in a fraction of a second without any table optimization (even without foreign keys defined).
UPDATE:
SHOW CREATE TABLE for both tables:
'CREATE TABLE `tableA` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`col1` bigint(20) DEFAULT NULL,
`col2` bigint(20) DEFAULT NULL,
`date1` datetime DEFAULT NULL,
`largeString1` text,
`largeString2` text,
`largeString3` text,
`largeString4` text,
PRIMARY KEY (`id`),
KEY `idx_FK_tableA_tableB` (`col2`),
CONSTRAINT `tableA_ibfk_1` FOREIGN KEY (`col2`) REFERENCES `tableB` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8'
'CREATE TABLE `tableB` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`col1` bigint(20) DEFAULT NULL,
`col2` varchar(45) DEFAULT NULL,
`largeString1` text,
`largeString2` datetime DEFAULT NULL,
`largeString3` text,
PRIMARY KEY (`id`),
KEY `idx_FK_tableB_tableA` (`col1`),
CONSTRAINT `tableB_ibfk_1` FOREIGN KEY (`col1`) REFERENCES `tableA` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8'
Both tables need INDEX(col1). Without it, these need table scans:
WHERE a.col1 = _param1
ON b1.col1 = a.id
For a, this index would be 'covering', hence faster:
INDEX(col1, date1, id, col2)
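Concretely, that is (a sketch; the index name is arbitrary):
ALTER TABLE tableA ADD INDEX idx_col1_date1_id_col2 (col1, date1, id, col2);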
Don't use LEFT unless you need it.
Try not to hide columns in functions; it prevents using indexes for them:
DATE(a.date1) BETWEEN ...
This might work for that:
a.date1 >= DATE(_dateFrom)
AND a.date1 < DATE(_dateTo) + INTERVAL 1 DAY
As for the mystery of 20s vs 2s -- Did you run each timing test twice? The first time is often bogged down with I/O; the second is memory-bound.
ROW_FORMAT
In InnoDB there are 4 ROW_FORMATs; they mostly differ in how they handle big strings (TEXT, BLOB, etc.). You mentioned that the query ran faster with NULL strings than with non-null ones. With the default ROW_FORMAT, some or all of each XML string is stored inline with the rest of the columns; beyond a certain size, the remainder is put in separate block(s).
If a large field is NULL, then it takes almost no space.
With ROW_FORMAT=DYNAMIC (see CREATE TABLE and ALTER TABLE), a non-null column will tend to be pushed to other blocks instead of making the main part of the record bulky.
This has the effect of allowing more rows to fit in a single block (except for the overflow). That, in turn, allows certain queries to run faster since they can get more information with fewer I/Os.
Read the documentation; I think you need these:
SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=1;
ALTER TABLE tbl ROW_FORMAT=DYNAMIC;
In reading the documentation, you will run across COMPRESSED. Although this would shrink the XML by perhaps 3:1, there are other issues. I don't know whether it would end up being better or not.
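If you do decide to experiment, the syntax would be something like this (KEY_BLOCK_SIZE=8 is only a starting guess):
ALTER TABLE tableA ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;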
Buffer pool
innodb_buffer_pool_size should be about 70% of available RAM.
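For example, check the current value (in bytes) with:
SELECT @@innodb_buffer_pool_size;
and then raise it in my.cnf and restart (11G here assumes a dedicated server with 16 GB of RAM; in 5.6 this variable cannot be changed at runtime):
[mysqld]
innodb_buffer_pool_size = 11G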

How to optimize this simple JOIN+ORDER BY query?

I have two mysql tables:
/* Table users */
CREATE TABLE IF NOT EXISTS `users` (
`Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`DateRegistered` datetime NOT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
/* Table statistics_user */
CREATE TABLE IF NOT EXISTS `statistics_user` (
`UserId` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Sent_Views` int(10) unsigned NOT NULL DEFAULT '0',
`Sent_Winks` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`UserId`),
CONSTRAINT `statistics_user_ibfk_1` FOREIGN KEY (`UserId`) REFERENCES `users` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Both tables are populated with 10,000 random rows for testing, using the following procedure:
DELIMITER //
CREATE DEFINER=`root`@`localhost` PROCEDURE `FillUsersStatistics`(IN `cnt` INT)
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE dt DATE;
DECLARE Winks INT DEFAULT 1;
DECLARE Views INT DEFAULT 1;
WHILE (i<=cnt) DO
SET dt = str_to_date(concat(floor(1 + rand() * (9-1)),'-',floor(1 + rand() * (28 -1)),'-','2011'),'%m-%d-%Y');
INSERT INTO users (Id, DateRegistered) VALUES(i, dt);
SET Winks = floor(1 + rand() * (30-1));
SET Views = floor(1 + rand() * (30-1));
INSERT INTO statistics_user (UserId, Sent_Winks, Sent_Views) VALUES (i, Winks, Views);
SET i=i+1;
END WHILE;
END//
DELIMITER ;
CALL `FillUsersStatistics`(10000);
The problem:
When I run the EXPLAIN for this query:
SELECT
t1.Id, (Sent_Views + Sent_Winks) / DATEDIFF(NOW(), t1.DateRegistered) as Score
FROM users t1
JOIN statistics_user t2 ON t2.UserId = t1.Id
ORDER BY Score DESC
.. I get this explain:
id  select_type  table  type    possible_keys  key      key_len  ref              rows   Extra
1   SIMPLE       t1     ALL     PRIMARY        (NULL)   (NULL)   (NULL)           10037  Using temporary; Using filesort
1   SIMPLE       t2     eq_ref  PRIMARY        PRIMARY  4        test2.t2.UserId  1
The above query gets very slow when both tables have more than 500K rows. I guess it's because of the 'Using temporary; Using filesort' in the EXPLAIN output.
How can the above query be optimized so that it runs faster?
I'm fairly sure that the ORDER BY is what's killing you, since it cannot be properly indexed. Here is a workable, if not particularly pretty, solution.
First, let's say you have a column named Score for storing a user's current score. Every time a user's Sent_Views or Sent_Winks changes, update the Score column to match. This could probably be done with a trigger (my experience with triggers is limited), or definitely in the same code that updates the Sent_Views and Sent_Winks fields. The update wouldn't need to recompute the DATEDIFF portion, because it could just divide the stored score by the old sum of Sent_Views + Sent_Winks and multiply by the new one.
Now you just need to change the Score column once per day (if you're not picky about the precise number of hours a user has been registered). This could be done with a script run by a cron job.
Then, just index the Score column and SELECT away!
Note: edited to remove incorrect first attempt.
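A minimal sketch of that idea (the column, index, and trigger names are mine; it assumes Sent_Views/Sent_Winks only change via UPDATEs on statistics_user):
ALTER TABLE statistics_user
ADD COLUMN Score DOUBLE NOT NULL DEFAULT 0,
ADD INDEX idx_score (Score);

DELIMITER //
CREATE TRIGGER statistics_user_score BEFORE UPDATE ON statistics_user
FOR EACH ROW
BEGIN
-- GREATEST(..., 1) avoids division by zero for users registered today.
SET NEW.Score = (NEW.Sent_Views + NEW.Sent_Winks) /
GREATEST(DATEDIFF(NOW(), (SELECT DateRegistered FROM users WHERE Id = NEW.UserId)), 1);
END//
DELIMITER ;
The once-a-day refresh from the cron job can then be a single UPDATE joining statistics_user to users.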
I'm offering my comment as answer:
Establish a future date, far enough out not to interfere with your application, say the year 5000. Replace the current date with this future date in your score calculation. The score computation is now for all intents and purposes absolute, and can be computed when updating winks and views (through a stored procedure or a trigger; yes, MySQL has triggers).
Add a score column to your statistics_user table to store the computed score and define an index on it.
Your SQL can be rewritten as:
SELECT
UserId, score
FROM
statistics_user
ORDER BY score DESC
If you need the real score, it is easily computed with just a constant multiplication, which can be done afterwards if it interferes with MySQL's index selection.
Shouldn't you have indexed DateRegistered in Users?
You should try an inner join rather than a Cartesian product; after that, you can look at partitioning according to DateRegistered.