Improve performance of query with large table? - mysql

I have a large table named 'roomlogs' which has nearly 1 million entries.
The structure of the table:
id --> PK
roomId --> varchar FK to rooms table
userId --> varchar FK to users table
enterTime --> Date and Time
exitTime --> Date and Time
status --> bool
I already had an index on roomId, and I recently added an index on the userId column.
When I run a stored procedure with the following code, it takes about 50 seconds on average, which is far longer than it should.
DELIMITER ;;
CREATE DEFINER=`root`@`%` PROCEDURE `enter_room`(IN pRoomId varchar(200), IN puserId varchar(50), IN ptime datetime, IN phidden int, pcheckid int, pexit datetime)
begin
update roomlogs set
roomlogs.exitTime = ptime,
roomlogs.`status` = 1
where
roomlogs.userId = puserId
and roomlogs.`status` = 0
and DATEDIFF(ptime,roomlogs.enterTime) = 0;
INSERT into roomlogs
( roomlogs.roomId,
roomlogs.userId,
roomlogs.enterTime,
roomlogs.exitTime,
roomlogs.hidden,
roomlogs.checkinId )
value
( pRoomId,
userId,
ptime,
pexit,
phidden,
pcheckid);
select *
from
roomlogs
where
roomlogs.id= LAST_INSERT_ID();
end ;;
DELIMITER ;
What could be the reason for it taking this much time?
I added an index recently so previous rows are not indexed.
No storage type is selected for any of the indexes right now. Should I change it to B-tree?
On my website, I get 20-30 simultaneous calls to other procedures while this procedure gets 10-20 simultaneous calls; does the update query in the procedure make a lock? But in the mysql.slow_log table, the lock_time for each query shows 0.
Is there any other reason for this behaviour?
Edit: Here is the SHOW CREATE TABLE output:
CREATE TABLE `roomlogs` (
`roomId` varchar(200) CHARACTER SET latin1 DEFAULT NULL,
`userID` varchar(50) CHARACTER SET latin1 DEFAULT NULL,
`enterTime` datetime DEFAULT NULL,
`exitTime` datetime DEFAULT NULL,
`id` int(11) NOT NULL AUTO_INCREMENT,
`status` int(11) DEFAULT '0',
`hidden` int(11) DEFAULT '0',
`checkinId` int(11) DEFAULT '-1',
PRIMARY KEY (`id`),
KEY `RoomLogIndex` (`roomId`),
KEY `RoomLogIDIndex` (`id`),
KEY `USERID` (`userID`)
) ENGINE=InnoDB AUTO_INCREMENT=1064216 DEFAULT CHARSET=utf8
I can also see that the following query runs very often, about 100,000 times per day (nearly continuously):
SELECT count(*) from roomlogs where roomId=proomId and status='0';
Since this query reads from the same table, does InnoDB block it or take a lock during the update query? I can see that when the stored procedure above runs more often, this query takes longer.
Here is the link for MySQL variables: https://docs.google.com/document/d/17_MVaU4yvpQfVDT83yhSjkLHsgYd-z2mg6X7GwvYZGE/edit?usp=sharing

roomlogs needs this 'composite' index:
INDEX(userId, `status`, enterTime)
"I added an index recently so previous rows are not indexed."
Not true. Adding an INDEX indexes the entire table.
The default index type is BTree; no need to explicitly specify it.
"does the update query in the procedure make a lock?"
It does some form of locking. What is the value of autocommit? Do you explicitly use BEGIN and COMMIT? Is the table ENGINE=InnoDB? Please provide SHOW CREATE TABLE.
"MySQL.slow_logs table for each query the lock_time shows 0."
The INSERT you show seems to be inserting the same row as the UPDATE. Maybe you need INSERT ... ON DUPLICATE KEY UPDATE ...?
Don't "hide an indexed column in a function"; instead of DATEDIFF(roomlogs.enterTime,NOW()) = 0, do
AND enterTime >= CURDATE()
AND enterTime < CURDATE() + INTERVAL 1 DAY
This allows the index to be used more fully. (The procedure compares against ptime rather than NOW(), so use DATE(ptime) in place of CURDATE() there.)
KEY `RoomLogIndex` (`roomId`): change to (roomId, status), which also covers the frequent COUNT(*) query.
KEY `RoomLogIDIndex` (`id`): remove; it is redundant with the PK.
The buffer pool is only 97,517,568 bytes (93 MB); make it more like 9G.
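Put together, the index changes and the rewritten UPDATE might look like the sketch below. The index name idx_user_status_enter is an arbitrary choice, and DATE(ptime) stands in for CURDATE() because the procedure compares against its ptime parameter. The UPDATE is written as it would appear inside enter_room, where ptime and puserId are the procedure's parameters.

```sql
-- Composite index for the UPDATE's WHERE clause; the index name is illustrative.
ALTER TABLE roomlogs
    ADD INDEX idx_user_status_enter (userID, `status`, enterTime);

-- Widen the roomId index and drop the index that duplicates the PK.
ALTER TABLE roomlogs
    DROP INDEX RoomLogIndex,
    DROP INDEX RoomLogIDIndex,
    ADD INDEX RoomLogIndex (roomId, `status`);

-- Inside enter_room: "same calendar day as ptime" written as a range on the
-- bare column, matching the rows that DATEDIFF(ptime, enterTime) = 0 matched.
UPDATE roomlogs
SET exitTime = ptime,
    `status` = 1
WHERE userID = puserId
  AND `status` = 0
  AND enterTime >= DATE(ptime)
  AND enterTime <  DATE(ptime) + INTERVAL 1 DAY;
```

Running EXPLAIN on the UPDATE before and after the change (supported for UPDATE on MySQL 5.6 and later) is a quick way to check whether the new index is actually chosen.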

Related

How to optimize an UPDATE and JOIN query on practically identical tables?

I am trying to update one table based on another in the most efficient way.
Here is the table DDL of what I am trying to update
Table1
CREATE TABLE `customersPrimary` (
`id` int NOT NULL AUTO_INCREMENT,
`groupID` int NOT NULL,
`IDInGroup` int NOT NULL,
`name` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`address` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `groupID-IDInGroup` (`groupID`,`IDInGroup`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Table2
CREATE TABLE `customersSecondary` (
`groupID` int NOT NULL,
`IDInGroup` int NOT NULL,
`name` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`address` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`groupID`,`IDInGroup`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Both the tables are practically identical but customersSecondary table is a staging table for the other by design. The big difference is primary keys. Table 1 has an auto incrementing primary key, table 2 has a composite primary key.
In both tables the combination of groupID and IDInGroup are unique.
Here is the query I want to optimize
UPDATE customersPrimary
INNER JOIN customersSecondary ON
(customersPrimary.groupID = customersSecondary.groupID
AND customersPrimary.IDInGroup = customersSecondary.IDInGroup)
SET
customersPrimary.name = customersSecondary.name,
customersPrimary.address = customersSecondary.address
This query works but scans EVERY row in customersSecondary.
Adding
WHERE customersPrimary.groupID = (groupID)
Cuts it down significantly to the number of rows with the GroupID in customersSecondary. But this is still often far larger than the number of rows being updated since the groupID can be large. I think the WHERE needs improvement.
I can control table structure and add indexes. I will have to keep both tables.
Any suggestions would be helpful.
Your existing query requires a full table scan because you are saying update everything on the left based on the value on the right. Presumably the optimiser is choosing customersSecondary because it has fewer rows, or at least it thinks it has.
Is the full table scan causing you problems? Locking? Too slow? How long does it take? How frequently are the tables synced? How many records are there in each table? What is the rate of change in each of the tables?
You could add separate indices on name and address but that will take a good chunk of space. The better option is going to be to add an indexed updatedAt column and use that to track which records have been changed.
ALTER TABLE `customersPrimary`
ADD COLUMN `updatedAt` DATETIME NOT NULL DEFAULT '2000-01-01 00:00:00',
ADD INDEX `idx_customer_primary_updated` (`updatedAt`);
ALTER TABLE `customersSecondary`
ADD COLUMN `updatedAt` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX `idx_customer_secondary_updated` (`updatedAt`);
And then you can add updatedAt to your join criteria and the WHERE clause -
UPDATE customersPrimary cp
INNER JOIN customersSecondary cs
ON cp.groupID = cs.groupID
AND cp.IDInGroup = cs.IDInGroup
AND cp.updatedAt < cs.updatedAt
SET
cp.name = cs.name,
cp.address = cs.address,
cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt > :last_query_run_time;
For :last_query_run_time you could use the last run time if you are storing it. Otherwise, if you know you are running the query every hour you could use NOW() - INTERVAL 65 MINUTE. Notice I have used more than one hour to make sure records aren't missed if there is a slight delay for some reason. Another option would be to use SELECT MAX(updatedAt) FROM customersPrimary -
UPDATE customersPrimary cp
INNER JOIN (SELECT MAX(updatedAt) maxUpdatedAt FROM customersPrimary) t
INNER JOIN customersSecondary cs
ON cp.groupID = cs.groupID
AND cp.IDInGroup = cs.IDInGroup
AND cp.updatedAt < cs.updatedAt
SET
cp.name = cs.name,
cp.address = cs.address,
cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt > t.maxUpdatedAt;
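If the last run time is stored, one way to keep it is a small bookkeeping table. The sketch below is only an illustration: the sync_state table, its columns, and the 'customers' key are made-up names, not part of the original schema, and it assumes the updatedAt columns from the ALTER TABLE statements above exist. It uses >= rather than > so that rows stamped in the same second as the previous run are not skipped; the cp.updatedAt < cs.updatedAt join condition already filters out rows that were synced earlier.

```sql
-- Hypothetical bookkeeping table holding the start time of the last sync.
CREATE TABLE sync_state (
    name    VARCHAR(50) NOT NULL PRIMARY KEY,
    lastRun DATETIME    NOT NULL
);
INSERT INTO sync_state (name, lastRun)
VALUES ('customers', '2000-01-01 00:00:00');

-- One sync pass: remember when it started, then sync rows changed since last time.
SET @started = NOW();
SELECT lastRun INTO @last FROM sync_state WHERE name = 'customers';

UPDATE customersPrimary cp
INNER JOIN customersSecondary cs
    ON  cp.groupID   = cs.groupID
    AND cp.IDInGroup = cs.IDInGroup
    AND cp.updatedAt < cs.updatedAt
SET cp.name      = cs.name,
    cp.address   = cs.address,
    cp.updatedAt = cs.updatedAt
WHERE cs.updatedAt >= @last;

-- Record the start time, not the end time, so changes made during the pass
-- are picked up by the next one.
UPDATE sync_state SET lastRun = @started WHERE name = 'customers';
```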
Plan A:
Something like this would first find the "new" rows, then add only those:
UPDATE primary
SET ...
JOIN ( SELECT ...
FROM secondary
LEFT JOIN primary
WHERE primary... IS NULL )
ON ...
Might secondary have changes? If so, a variant of that would work.
Plan B:
Better yet is to TRUNCATE TABLE secondary after it is folded into primary.

optimize mysql count in 31M data [duplicate]

I have a largish but narrow InnoDB table with ~9m records. Doing count(*) or count(id) on the table is extremely slow (6+ seconds):
DROP TABLE IF EXISTS `perf2`;
CREATE TABLE `perf2` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`channel_id` int(11) DEFAULT NULL,
`timestamp` bigint(20) NOT NULL,
`value` double NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `ts_uniq` (`channel_id`,`timestamp`),
KEY `IDX_CHANNEL_ID` (`channel_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
RESET QUERY CACHE;
SELECT COUNT(*) FROM perf2;
While the statement is not run too often it would be nice to optimize it. According to http://www.cloudspace.com/blog/2009/08/06/fast-mysql-innodb-count-really-fast/ this should be possible by forcing InnoDB to use an index:
SELECT COUNT(id) FROM perf2 USE INDEX (PRIMARY);
The explain plan seems fine:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE perf2 index NULL PRIMARY 4 NULL 8906459 Using index
Unfortunately the statement is as slow as before. Following the question "SELECT COUNT(*)" is slow, even with where clause, I've also tried optimizing the table, without success.
Is there a way to optimize COUNT(*) performance on InnoDB?
As of MySQL 5.1.6 you can use the Event Scheduler and insert the count to a stats table regularly.
First create a table to hold the count:
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL);
Then create an event to update the table:
CREATE EVENT update_stats
ON SCHEDULE
EVERY 5 MINUTE
DO
INSERT INTO stats (`key`, `value`)
VALUES ('data_count', (select count(id) from data))
ON DUPLICATE KEY UPDATE value=VALUES(value);
It's not perfect but it offers a self contained solution (no cronjob or queue) that can be easily tailored to run as often as the required freshness of the count.
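The event only fires if the Event Scheduler is running, which is not the case by default on older MySQL versions. A minimal way to check it, enable it, and then read the cached count (the 'data_count' key is the one used in the event above):

```sql
-- Shows ON, OFF or DISABLED.
SHOW VARIABLES LIKE 'event_scheduler';

-- Needs sufficient privileges; add event_scheduler=ON to my.cnf to survive a restart.
SET GLOBAL event_scheduler = ON;

-- Cheap lookup instead of counting the big table.
SELECT `value` FROM stats WHERE `key` = 'data_count';
```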
For the time being I've solved the problem by using this approximation:
EXPLAIN SELECT COUNT(id) FROM data USE INDEX (PRIMARY)
The approximate number of rows can be read from the rows column of the explain plan when using InnoDB, as shown above. When using MyISAM this column will remain empty because the table reference is optimized away, so if it is empty, fall back to a traditional SELECT COUNT instead.
Based on @Che's code, you can also use triggers on INSERT and on DELETE on perf2 in order to keep the value in the stats table up to date in real time.
CREATE TABLE stats (
`key` varchar(50) NOT NULL PRIMARY KEY,
`value` varchar(100) NOT NULL
);
Then:
CREATE TRIGGER `count_up` AFTER INSERT ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` + 1
WHERE `stats`.`key` = 'perf2_count';
CREATE TRIGGER `count_down` AFTER DELETE ON `perf2` FOR EACH ROW UPDATE `stats`
SET `stats`.`value` = `stats`.`value` - 1
WHERE `stats`.`key` = 'perf2_count';
So the number of rows in the perf2 table can be read using this query, in realtime:
SELECT `value` FROM `stats` WHERE `key` = 'perf2_count';
This would have the advantage of eliminating the performance issue of performing a COUNT(*) and would only be executed when data changes in perf2.
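One detail the triggers rely on: they only UPDATE an existing row, so the 'perf2_count' row has to be seeded once. A possible seeding statement, following the same pattern as the event-based answer and ideally run while nothing is writing to perf2:

```sql
-- Create the counter row (or reset it if it already exists) from a one-off full count.
INSERT INTO stats (`key`, `value`)
VALUES ('perf2_count', (SELECT COUNT(*) FROM perf2))
ON DUPLICATE KEY UPDATE `value` = VALUES(`value`);
```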

MySQL slow with large text fields in table

We're having a weird problem with MySQL (and also MariaDB). A simple database with 2 tables (InnoDB engine), both containing (among a few others) 3 or 4 text columns with XML data approx. 1-5kB in size.
Each table has around 40000 rows and no indexes except those for foreign keys.
The weird part is running select statements. The XML columns are NOT used anywhere inside the select statement (select, where, order, group, ...), yet they slow down execution. If those columns are null, the select statement executes in less than 2 seconds, but if they contain data, execution time jumps to around 20 seconds. Why is that?!
This is a script that generates an example behaving like described above:
CREATE TABLE tableA (
id bigint(20) NOT NULL AUTO_INCREMENT,
col1 bigint(20) NULL,
col2 bigint(20) NULL,
date1 datetime NULL,
largeString1 text NULL,
largeString2 text NULL,
largeString3 text NULL,
largeString4 text NULL,
PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
CREATE TABLE tableB (
id bigint(20) NOT NULL AUTO_INCREMENT,
col1 bigint(20) NULL,
col2 varchar(45) NULL,
largeString1 text NULL,
largeString2 datetime NULL,
largeString3 text NULL,
PRIMARY KEY (id)
) DEFAULT CHARSET=utf8;
fillTables:
DELIMITER ;;
CREATE PROCEDURE `fillTables`(
numRows INT
)
BEGIN
DECLARE i INT;
DECLARE j INT;
DECLARE largeString TEXT;
SET i = 1;
START TRANSACTION;
WHILE i < numRows DO
SET j = 1;
SET largeString = '';
WHILE j <= 100 DO
SET largeString = CONCAT(largeString, (SELECT UUID()));
SET j = j + 1;
END WHILE;
INSERT INTO tableA (id, col1, col2, date1, largeString1,
largeString2, largeString3, largeString4)
VALUES (i, FLOOR(1 + RAND() * 2), numRows - i,
date_sub(now(), INTERVAL i hour),
largeString, largeString, largeString, largeString);
INSERT INTO tableB (id, col1, col2, largeString1,
largeString2, largeString3)
VALUES (numRows - i, i, (SELECT UUID()),
largeString, largeString, largeString);
SET i = i + 1;
END WHILE;
COMMIT;
ALTER TABLE tableA ADD FOREIGN KEY (col2) REFERENCES tableB(id);
CREATE INDEX idx_FK_tableA_tableB ON tableA(col2);
ALTER TABLE tableB ADD FOREIGN KEY (col1) REFERENCES tableA(id);
CREATE INDEX idx_FK_tableB_tableA ON tableB(col1);
END ;;
test
CREATE PROCEDURE `test`(
_param1 bigint
,_dateFrom datetime
,_dateTo datetime
)
BEGIN
SELECT
a.id
,DATE(a.date1) as date
,COALESCE(b2.col2, '') as guid
,COUNT(*) as count
FROM
tableA a
LEFT JOIN tableB b1 ON b1.col1 = a.id
LEFT JOIN tableB b2 ON b2.id = a.col2
WHERE
a.col1 = _param1
AND (_dateFrom IS NULL OR DATE(a.date1) BETWEEN DATE(_dateFrom) AND DATE(_dateTo))
GROUP BY
a.id
,DATE(a.date1)
,b2.col2
;
END;;
DELIMITER ;
To populate the tables with random data use
call fillTables(40000);
Stored procedure used for retrieving data:
call test(2, null, null);
Also, MSSQL executes the select statement in a fraction of a second without any table optimization (even without foreign keys defined).
UPDATE:
SHOW CREATE TABLE for both tables:
'CREATE TABLE `tableA` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`col1` bigint(20) DEFAULT NULL,
`col2` bigint(20) DEFAULT NULL,
`date1` datetime DEFAULT NULL,
`largeString1` text,
`largeString2` text,
`largeString3` text,
`largeString4` text,
PRIMARY KEY (`id`),
KEY `idx_FK_tableA_tableB` (`col2`),
CONSTRAINT `tableA_ibfk_1` FOREIGN KEY (`col2`) REFERENCES `tableB` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8'
'CREATE TABLE `tableB` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`col1` bigint(20) DEFAULT NULL,
`col2` varchar(45) DEFAULT NULL,
`largeString1` text,
`largeString2` datetime DEFAULT NULL,
`largeString3` text,
PRIMARY KEY (`id`),
KEY `idx_FK_tableB_tableA` (`col1`),
CONSTRAINT `tableB_ibfk_1` FOREIGN KEY (`col1`) REFERENCES `tableA` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=40000 DEFAULT CHARSET=utf8'
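Both tables need INDEX(col1) (per the SHOW CREATE TABLE output, tableB already has one as idx_FK_tableB_tableA; tableA does not). Without it, these need table scans:
WHERE a.col1 = _param1
ON b1.col1 = a.id
For tableA (alias a) this would be 'covering', hence faster: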
INDEX(col1, date1, id, col2)
Don't use LEFT unless you need it.
Try not to hide columns in functions; it prevents using indexes for them:
DATE(a.date1) BETWEEN ...
This might work for that:
a.date1 >= DATE(_dateFrom)
AND a.date1 < DATE(_dateTo) + INTERVAL 1 DAY
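Applied to the query from the test procedure, those suggestions could look like the sketch below. The index name idx_a_col1_cover is made up, and session variables (@param1, @dateFrom, @dateTo) stand in for the procedure's parameters so the statement can be tried on its own.

```sql
-- Covering index for tableA suggested above; the name is illustrative.
ALTER TABLE tableA ADD INDEX idx_a_col1_cover (col1, date1, id, col2);

-- Stand-ins for the procedure parameters _param1, _dateFrom, _dateTo.
SET @param1 = 2, @dateFrom = NULL, @dateTo = NULL;

SELECT
    a.id,
    DATE(a.date1)           AS `date`,
    COALESCE(b2.col2, '')   AS guid,
    COUNT(*)                AS `count`
FROM tableA a
LEFT JOIN tableB b1 ON b1.col1 = a.id
LEFT JOIN tableB b2 ON b2.id = a.col2
WHERE a.col1 = @param1
  -- date1 is compared as a bare column, so an index on it stays usable
  AND (@dateFrom IS NULL
       OR (a.date1 >= DATE(@dateFrom)
           AND a.date1 < DATE(@dateTo) + INTERVAL 1 DAY))
GROUP BY a.id, DATE(a.date1), b2.col2;
```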
As for the mystery of 20s vs 2s -- Did you run each timing test twice? The first time is often bogged down with I/O; the second is memory-bound.
ROW_FORMAT
In InnoDB there are 4 ROW_FORMATs; they mostly differ in how they handle big strings (TEXT, BLOB, etc). You mentioned that the query ran faster with NULL strings than with non-null strings. With the default ROW_FORMAT, some or all of each XML string is stored with the rest of the columns. Past some limit, the remainder is put in other block(s).
If a large field is NULL, then it takes almost no space.
With ROW_FORMAT=DYNAMIC (see CREATE TABLE and ALTER TABLE), a non-null column will tend to be pushed to other blocks instead of making the main part of the record bulky.
This has the effect of allowing more rows to fit in a single block (except for the overflow). That, in turn, allows certain queries to run faster since they can get more information with fewer I/Os.
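Read the documentation; I think you need these (innodb_file_format only exists up to MySQL 5.7; it was removed in 8.0, where DYNAMIC is already the default row format):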
SET GLOBAL innodb_file_format=Barracuda;
SET GLOBAL innodb_file_per_table=1;
ALTER TABLE tbl ROW_FORMAT=DYNAMIC;
In reading the documentation, you will run across COMPRESSED. Although this would shrink the XML by perhaps 3:1, there are other issues. I don't know whether it would end up being better or not.
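To see which row format the tables currently use before changing anything, information_schema can be queried. A small sketch, assuming the tables live in the currently selected database:

```sql
-- Reports the row format per table.
SELECT TABLE_NAME, ROW_FORMAT
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME IN ('tableA', 'tableB');
```

Re-running it after the ALTER TABLE is a simple check that the change took effect.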
Buffer pool
innodb_buffer_pool_size should be about 70% of available RAM.
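A quick way to compare the current setting with that guideline is shown below. The 11G figure is only an assumption for a dedicated server with 16 GB of RAM. On MySQL 5.7.5 and later the pool can be resized at runtime, while older versions need the setting in my.cnf and a restart.

```sql
-- Current buffer pool size in MB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Assumed target: roughly 70% of a 16 GB server. Adjust to the real machine.
SET GLOBAL innodb_buffer_pool_size = 11 * 1024 * 1024 * 1024;
```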

MYSQL: Partitioning Table keeping id unique

We are using a table with a schema like the following:
CREATE TABLE `user_subscription` (
`ID` varchar(40) NOT NULL,
`COL1` varchar(40) NOT NULL,
`COL2` varchar(30) NOT NULL,
`COL3` datetime NOT NULL,
`COL4` datetime NOT NULL,
`ARCHIVE` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
)
Now we want to partition on the ARCHIVE column. ARCHIVE can only have 2 values, 0 or 1, so there would be 2 partitions.
In our case, we are using partitioning as an archival process. To partition, we need to make the ARCHIVE column part of the primary key. But the problem is that 2 rows can then have the same ID with different ARCHIVE values. That is not the main problem for us, as the 2 rows will be in different partitions. The problem is that when we update the ARCHIVE value of one of them to move it to the archive partition, the update is rejected with a "Duplicate entry" error.
Can somebody help in this regard?
Unfortunately,
"A UNIQUE INDEX (or a PRIMARY KEY) must include all columns in the table's partitioning function"
and since MySQL does not support check constraints either, the only ugly workaround I can think of is enforcing the uniqueness manually through triggers:
CREATE TABLE t (
id INT NOT NULL,
archived TINYINT(1) NOT NULL DEFAULT 0,
PRIMARY KEY (id, archived) -- required by MySQL limitation on partitioning
)
PARTITION BY LIST(archived) (
PARTITION pActive VALUES IN (0),
PARTITION pArchived VALUES IN (1)
);
CREATE TRIGGER tInsert
BEFORE INSERT ON t FOR EACH ROW
CALL checkUnique(NEW.id);
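DELIMITER //
CREATE TRIGGER tUpdate
BEFORE UPDATE ON t FOR EACH ROW
BEGIN
-- only check when the id itself changes; an unconditional check would
-- find the row being updated and reject every UPDATE, including archiving
IF NEW.id <> OLD.id THEN
CALL checkUnique(NEW.id);
END IF;
END //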
CREATE PROCEDURE checkUnique(pId INT)
BEGIN
DECLARE flag INT;
DECLARE message VARCHAR(50);
SELECT id INTO flag FROM t WHERE id = pId;
IF flag IS NOT NULL THEN
-- the below tries to mimic the error raised
-- by a regular UNIQUE constraint violation
SET message = CONCAT("Duplicate entry '", pId, "'");
SIGNAL SQLSTATE "23000" SET
MYSQL_ERRNO = 1062,
MESSAGE_TEXT = message,
COLUMN_NAME = "id";
END IF;
END //
DELIMITER ;
MySQL's limitations on partitioning being such a downer (in particular its lack of support for foreign keys), I would advise against using it altogether until the table grows so large that it becomes an actual concern.
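For illustration, this is how the trigger-based check would be expected to behave on INSERT, assuming the table t, the triggers, and checkUnique above have been created (explicit partition selection needs MySQL 5.6 or later):

```sql
INSERT INTO t (id) VALUES (1);               -- accepted, lands in pActive
INSERT INTO t (id, archived) VALUES (1, 1);  -- rejected by tInsert: Duplicate entry '1'
SELECT * FROM t PARTITION (pActive);         -- lists only the active rows
```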

How to maintain a certain number of rows using triggers in MySQL?

For example, I have a table which is used for logging. Very old data is useless and there is no reason to keep it in the table. I want to create a trigger which will delete old rows if the number of existing rows is more than 10, for example. What I already have:
CREATE TABLE log (
logId INT UNSIGNED NOT NULL AUTO_INCREMENT,
firstLogin DATETIME NOT NULL,
lastLogin DATETIME NOT NULL,
fingerprint VARCHAR(64) CHARACTER SET BINARY,
ip VARCHAR(24) NOT NULL,
accountId INT UNSIGNED NOT NULL,
FOREIGN KEY (accountId)
REFERENCES accounts (accountId)
ON UPDATE CASCADE ON DELETE CASCADE,
PRIMARY KEY (logId)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
DELIMITER |
CREATE TRIGGER logbeforeinsert BEFORE INSERT ON log
FOR EACH ROW
BEGIN
SET @rowcount = (SELECT COUNT(*) FROM log WHERE accountId = NEW.accountId);
IF @rowcount > 9 THEN
DELETE FROM log WHERE accountId = NEW.accountId LIMIT 1;
END IF;
END;
|
DELIMITER ;
But with this trigger, inserting stopped altogether once the number of rows had reached 10.
Your trigger tries to write to the same table that it is inserting into (DELETE is write access); this is not supported.
As you have a BEFORE INSERT trigger, failure of the trigger means failure of the INSERT.
You need either to trigger the delete from an operation that does not write to log, or to rethink your model.
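One way to do the delete outside a trigger is to route inserts through a stored procedure that prunes afterwards. The sketch below is an assumption about how that could look, not something from the question: the procedure name add_log_entry and its parameter names are made up, and it keeps the 10 newest rows per account by logId.

```sql
DELIMITER //
CREATE PROCEDURE add_log_entry(   -- hypothetical name
    IN pFirstLogin  DATETIME,
    IN pLastLogin   DATETIME,
    IN pFingerprint VARBINARY(64),
    IN pIp          VARCHAR(24),
    IN pAccountId   INT UNSIGNED
)
BEGIN
    DECLARE cutoff INT UNSIGNED DEFAULT NULL;

    INSERT INTO log (firstLogin, lastLogin, fingerprint, ip, accountId)
    VALUES (pFirstLogin, pLastLogin, pFingerprint, pIp, pAccountId);

    -- logId of the 10th newest row for this account;
    -- cutoff stays NULL when the account has fewer than 10 rows
    SELECT logId INTO cutoff
    FROM log
    WHERE accountId = pAccountId
    ORDER BY logId DESC
    LIMIT 9, 1;

    -- remove everything older than the 10th newest row
    IF cutoff IS NOT NULL THEN
        DELETE FROM log
        WHERE accountId = pAccountId
          AND logId < cutoff;
    END IF;
END //
DELIMITER ;
```

The application would then call add_log_entry(...) instead of inserting into log directly. Because the DELETE runs as an ordinary statement after the INSERT, it does not hit the restriction on triggers modifying their own table.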