Can someone explain why this query, with an IN clause of over 5,000 values, is so slow?
Table structure:
CREATE TABLE IF NOT EXISTS `wp_transactions_log` (
`sync_sequence` bigint(20) unsigned NOT NULL COMMENT 'the sequence number of the sync process/operation that this transaction belong to ',
`objectid` varchar(100) NOT NULL COMMENT 'the entity/record id',
`wp_id` bigint(20) unsigned NOT NULL,
`table_name` varchar(100) NOT NULL COMMENT 'the target wordpress table name this transaction occured/fail for some reason',
`logical_table_name` varchar(100) NOT NULL,
`operation` varchar(20) NOT NULL COMMENT 'inser/update/delete',
`status` varchar(20) NOT NULL COMMENT 'status of the transaction: success,fail',
`fail_count` int(10) unsigned NOT NULL COMMENT 'how many this transaction failed',
`fail_description` text NOT NULL COMMENT 'a description of the failure',
`createdon` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`sync_sequence`,`objectid`,`table_name`,`operation`,`wp_id`),
KEY `objectid` (`objectid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
This table contains 5k records.
The query:
SELECT wp_id,objectId FROM wp_transactions_log WHERE `operation` = "insert" AND `wp_id` != 0 AND `status` != "ignore" AND `table_name` ='itg_wpclass_dates' AND objectId IN (... 5k record)
Even this simpler query is just as slow:
SELECT wp_id,objectId FROM wp_transactions_log WHERE objectId IN (5k record)
Note: every value in the IN clause also exists in the table rows.
By slow, I mean it takes more than 15 seconds.
objectid is not indexed. Only the composite primary key is indexed. Add an index on objectid and then try again:
ALTER TABLE wp_transactions_log ADD INDEX (objectid);
If the table holds a lot of data, adding an index takes a metadata lock; use the INPLACE algorithm to keep lock contention to a minimum.
Also, prefix your SELECT statement with EXPLAIN and share the output. It is a good way to identify issues with your table.
The query itself is fast: it takes around 200 ms to execute. It is the time spent processing the query and retrieving the data that is long. I think there's no way to reduce this time.
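A minimal sketch of that before/after check, written in Python with the standard-library sqlite3 module standing in for MySQL (an assumption made so the example runs anywhere; SQLite's EXPLAIN QUERY PLAN wording differs from MySQL's EXPLAIN output). The table is a cut-down, hypothetical version of wp_transactions_log, and idx_objectid is an illustrative index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for wp_transactions_log (assumption: three columns, no primary key).
conn.execute(
    "CREATE TABLE wp_transactions_log ("
    " objectid TEXT NOT NULL, wp_id INTEGER NOT NULL, operation TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO wp_transactions_log VALUES (?, ?, 'insert')",
    [(f"obj-{i}", i) for i in range(1, 5001)],
)

# Build an IN list with bound parameters (100 values keeps the sketch small).
ids = [f"obj-{i}" for i in range(1, 101)]
placeholders = ",".join("?" * len(ids))
query = (
    "SELECT wp_id, objectid FROM wp_transactions_log "
    f"WHERE objectid IN ({placeholders})"
)

def plan(conn, sql, params):
    """Return the query plan as one string (the last column holds the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

plan_before = plan(conn, query, ids)   # no index on objectid yet: full scan
conn.execute("CREATE INDEX idx_objectid ON wp_transactions_log (objectid)")
plan_after = plan(conn, query, ids)    # index lookup per IN value

print("before:", plan_before)
print("after: ", plan_after)
```

On MySQL the equivalent check is to run EXPLAIN on the SELECT before and after the ALTER TABLE and compare the key and rows columns.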
Related
I'm trying to denormalize a few MySQL tables I have into a new table that I can use to speed up some complex queries with lots of business logic. The problem that I'm having is that there are 2.3 million records I need to add to the new table and to do that I need to pull data from several tables and do a few conversions too. Here's my query (with names changed)
INSERT INTO database_name.log_set_logs
(offload_date, vehicle, jurisdiction, baselog_path, path,
baselog_index_guid, new_location, log_set_name, index_guid)
(
select STR_TO_DATE(logset_logs.offload_date, '%Y.%m.%d') as offload_date,
logset_logs.vehicle, jurisdiction, baselog_path, path,
baselog_trees.baselog_index_guid, new_location, logset_logs.log_set_name,
logset_logs.index_guid
from
(
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 7), '/', -1) as offload_date,
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle,
SUBSTRING_INDEX(path, '/', 9) as baselog_path, index_guid,
path, log_set_name
FROM database_name.baselog_and_amendment_guid_to_path_mappings
) logset_logs
left join database_name.log_trees baselog_trees
ON baselog_trees.original_location = logset_logs.baselog_path
left join database_name.baselog_offload_location location
ON location.baselog_index_guid = baselog_trees.baselog_index_guid);
The query itself works: I was able to run it with a filter on log_set_name. However, that filter's condition only covers less than 1% of the total records, because one of the values for log_set_name has 2.2 million records in it, which is the majority of the records. As far as I can see, there is nothing else I can use to break this query up into smaller chunks. On the remaining 2.2 million records the query takes too long: it times out after a few hours, the transaction is rolled back, and nothing is added to the new table for those records. Only the 0.1 million records could be processed, and that was because I could add a filter that said where log_set_name != 'value with the 2.2 million records'.
Is there a way to make this query more performant? Am I trying to do too many joins at once, and should I perhaps populate each row's columns in their own individual queries? Or is there some way to page this type of query so that MySQL executes it in batches? I already dropped all the indexes on the log_set_logs table because I read that they slow down inserts. I also scaled my RDS instance up to a db.r4.4xlarge write node. I am using MySQL Workbench, so I increased all of its timeout values to their maximums, giving them all nines. All three of these steps helped and were necessary to get the 1% of the records into the new table, but they still weren't enough to get the 2.2 million records in without timing out. I'd appreciate any insights, as I'm not adept at this type of bulk insert from a select.
'CREATE TABLE `log_set_logs` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`purged` tinyint(1) NOT NULL DEFAUL,
`baselog_path` text,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`new_location` text,
`offload_date` date NOT NULL,
`jurisdiction` varchar(20) DEFAULT NULL,
`vehicle` varchar(20) DEFAULT NULL,
`index_guid` varchar(36) NOT NULL,
`path` text NOT NULL,
`log_set_name` varchar(60) NOT NULL,
`protected_by_retention_condition_1` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_2` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_3` tinyint(1) NOT NULL DEFAULT ''1'',
`protected_by_retention_condition_4` tinyint(1) NOT NULL DEFAULT ''1'',
`general_comments_about_this_log` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1736707 DEFAULT CHARSET=latin1'
'CREATE TABLE `baselog_and_amendment_guid_to_path_mappings` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`path` text NOT NULL,
`index_guid` varchar(36) NOT NULL,
`log_set_name` varchar(60) NOT NULL,
PRIMARY KEY (`id`),
KEY `log_set_name_index` (`log_set_name`),
KEY `path_index` (`path`(42))
) ENGINE=InnoDB AUTO_INCREMENT=2387821 DEFAULT CHARSET=latin1'
...
'CREATE TABLE `baselog_offload_location` (
`baselog_index_guid` varchar(36) NOT NULL,
`jurisdiction` varchar(20) NOT NULL,
KEY `baselog_index` (`baselog_index_guid`),
KEY `jurisdiction` (`jurisdiction`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1'
'CREATE TABLE `log_trees` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`baselog_index_guid` varchar(36) DEFAULT NULL,
`original_location` text NOT NULL, -- This is what I have to join everything on, and since it's text I cannot index it; the largest value is above 255 characters, so I cannot change it to a varchar and then index it either.
`new_location` text,
`distcp_returncode` int(11) DEFAULT NULL,
`distcp_job_id` text,
`distcp_stdout` text,
`distcp_stderr` text,
`validation_attempt` int(11) NOT NULL DEFAULT ''0'',
`validation_result` tinyint(1) NOT NULL DEFAULT ''0'',
`archived` tinyint(1) NOT NULL DEFAULT ''0'',
`archived_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`dir_exists` tinyint(1) NOT NULL DEFAULT ''0'',
`random_guid` tinyint(1) NOT NULL DEFAULT ''0'',
`offload_date` date NOT NULL,
`vehicle` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `baselog_index_guid` (`baselog_index_guid`)
) ENGINE=InnoDB AUTO_INCREMENT=1028617 DEFAULT CHARSET=latin1'
baselog_offload_location has no PRIMARY KEY; what's up?
GUIDs/UUIDs can be terribly inefficient. A partial solution is to convert them to BINARY(16) to shrink them. More details here: http://localhost/rjweb/mysql/doc.php/uuid ; (MySQL 8.0 has similar functions.)
It would probably be more efficient if you have a separate (optionally redundant) column for vehicle rather than needing to do
SUBSTRING_INDEX(SUBSTRING_INDEX(path, '/', 8), '/', -1) as vehicle
Why JOIN baselog_offload_location? There seem to be no references to columns in that table. If there are, be sure to qualify them so we know what is where. Preferably use short aliases.
The lack of an index on baselog_index_guid may be critical to performance.
Please provide EXPLAIN SELECT ... for the SELECT in your INSERT and for the original (slow) query.
SELECT MAX(LENGTH(original_location)) FROM .. -- to see if it really is too big to index. What version of MySQL are you using? The limit increased recently.
For the above item, we can talk about having a 'hash'.
"paging the query". I call it "chunking". See http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks . That talks about deleting, but it can be adapted to INSERT .. SELECT since you want to "chunk" the select. If you go with chunking, Javier's comment becomes moot. Your code would be chunking the selects, hence batching the inserts:
Loop:
INSERT .. SELECT .. -- of up to 1000 rows (see link)
End loop
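The loop above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard-library sqlite3 module instead of MySQL, and the tables src and dst, their columns, and the function copy_in_chunks are hypothetical stand-ins for the real mapping table and log_set_logs. The idea it shows is walking the source table's AUTO_INCREMENT-style primary key in fixed ranges and committing after each range, so no single transaction has to cover all 2.2 million rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the source mapping table and the denormalized target.
conn.execute(
    "CREATE TABLE src (id INTEGER PRIMARY KEY, path TEXT NOT NULL, index_guid TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE dst (src_id INTEGER PRIMARY KEY, path TEXT NOT NULL, index_guid TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?)",
    [(i, f"/a/b/{i}", f"guid-{i}") for i in range(1, 2501)],
)
conn.commit()

def copy_in_chunks(conn, chunk_size=1000):
    """Run INSERT .. SELECT over successive primary-key ranges, one commit per range."""
    max_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM src").fetchone()[0]
    last_id = 0
    chunks = 0
    while last_id < max_id:
        upper = last_id + chunk_size
        conn.execute(
            "INSERT INTO dst (src_id, path, index_guid) "
            "SELECT id, path, index_guid FROM src WHERE id > ? AND id <= ?",
            (last_id, upper),
        )
        conn.commit()  # each chunk is its own short transaction
        last_id = upper
        chunks += 1
    return chunks

chunks_run = copy_in_chunks(conn)
rows_copied = conn.execute("SELECT COUNT(*) FROM dst").fetchone()[0]
print(chunks_run, rows_copied)
```

Ranging over the primary key keeps each chunk's SELECT cheap even when ids have gaps; a chunk may then simply hold fewer than chunk_size rows. If the run is interrupted, it can resume from the highest src_id already present in the target.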
I'm encountering an unexpected situation with inserting/querying particular records in the following table during periods of high contention. I believe there is a race condition in the database.
CREATE TABLE `business_objects` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`obj_id` varchar(255) DEFAULT NULL,
`obj_type` varchar(255) DEFAULT NULL,
`created_at` datetime(6) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `index_business_objects_on_obj_type_and_obj_id`
(`obj_type`,`obj_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
The steps to reproduce are:
Check whether the record exists in the table using this query:
SELECT *
FROM business_objects
WHERE obj_type = 'Tip'
AND obj_id = '7616904'
If this query returns no rows, attempt to create the record:
INSERT INTO business_objects (obj_type, obj_id)
VALUES ('Tip', '7616904')
If another thread has already created a record with the same attributes, this insert will fail and raise a MySQL uniqueness error. In this scenario, I catch the error and run the same query as in step 1 to get the record.
SELECT *
FROM business_objects
WHERE obj_type = 'Tip'
AND obj_id = '7616904'
The query returns an empty result.
My expectation is that if the index uniqueness constraint is violated, then the record should already be committed to the table. What am I missing?
I have a first table containing my IPs stored as integers (500k rows), and a second one containing ranges of blacklisted IPs and the reason for blacklisting (10M rows).
Here is the table structure:
CREATE TABLE `black_lists` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`ip_start` INT(11) UNSIGNED NOT NULL,
`ip_end` INT(11) UNSIGNED NULL DEFAULT NULL,
`reason` VARCHAR(3) NOT NULL,
`excluded` TINYINT(1) NULL DEFAULT NULL,
PRIMARY KEY (`id`),
INDEX `ip_range` (`ip_end`, `ip_start`),
INDEX `ip_start` ( `ip_start`),
INDEX `ip_end` (`ip_end`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=10747741
;
CREATE TABLE `ips` (
`id` INT(11) NOT NULL AUTO_INCREMENT COMMENT 'Id ips',
`idhost` INT(11) NOT NULL COMMENT 'Id Host',
`ip` VARCHAR(45) NULL DEFAULT NULL COMMENT 'Ip',
`ipint` INT(11) UNSIGNED NULL DEFAULT NULL COMMENT 'Int ip',
`type` VARCHAR(45) NULL DEFAULT NULL COMMENT 'Type',
PRIMARY KEY (`id`),
INDEX `host` (`idhost`),
INDEX `index3` (`ip`),
INDEX `index4` (`idhost`, `ip`),
INDEX `ipsin` (`ipint`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=675651;
My problem is that when I run this query, no index is used and it takes an eternity to finish:
select i.ip,s1.reason
from ips i
left join black_lists s1 on i.ipint BETWEEN s1.ip_start and s1.ip_end;
I'm using MariaDB 10.0.16
True.
The optimizer has no knowledge that the start..end ranges are non-overlapping, nor anything else obvious about them. So, the best it can do is decide between
s1.ip_start <= i.ipint -- and use INDEX(ip_start), or
s1.ip_end >= i.ipint -- and use INDEX(ip_end)
Either of those could result in upwards of half the table being scanned.
In 2 steps you could achieve the desired goal for one ip; let's say #ip:
SELECT ip_start, reason
FROM black_lists
WHERE ip_start <= #ip
ORDER BY ip_start DESC
LIMIT 1
But after that, you need to check that the ip_end corresponding to that ip_start is >= #ip before deciding whether you have a black-listed item.
SELECT reason
FROM ( ... ) a -- fill in the above query
JOIN black_lists b USING(ip_start)
WHERE b.ip_end >= #ip
That will either return the reason or no rows.
In spite of the complexity, it will be very fast. But, you seem to have a set of IPs to check. That makes it more complex.
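A hedged sketch of that lookup applied to a set of IPs, using Python's standard-library sqlite3 module as a stand-in for MariaDB. It assumes the blacklist ranges do not overlap, and the sample rows and the helper name blacklist_reason are made up for illustration. For brevity the two steps are folded together: the query fetches the closest ip_start at or below the address, and the range check (ip_end >= ip) is done in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE black_lists ("
    " ip_start INTEGER NOT NULL, ip_end INTEGER NOT NULL, reason TEXT NOT NULL,"
    " PRIMARY KEY (ip_start, ip_end))"
)
# Made-up, non-overlapping ranges (assumption: real data has no overlaps either).
conn.executemany(
    "INSERT INTO black_lists VALUES (?, ?, ?)",
    [(100, 199, "spm"), (300, 399, "bot"), (1000, 1000, "abu")],
)

def blacklist_reason(conn, ip):
    """Return the blacklist reason for ip, or None if no range contains it."""
    row = conn.execute(
        "SELECT ip_end, reason FROM black_lists "
        "WHERE ip_start <= ? ORDER BY ip_start DESC LIMIT 1",
        (ip,),
    ).fetchone()
    # The nearest range starting at or below ip only matches if it also ends at or above ip.
    if row is not None and row[0] >= ip:
        return row[1]
    return None

# Checking a set of IPs is then one cheap index probe per address.
reasons = {ip: blacklist_reason(conn, ip) for ip in (50, 150, 250, 399, 400, 1000)}
print(reasons)
```

Each probe touches a single index entry, so 500k addresses cost 500k short lookups rather than one join that scans a large part of the 10M-row table per address.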
For black_lists, there seems to be no need for id. Suggest you replace the 4 indexes with only 2:
PRIMARY KEY(ip_start, ip_end),
INDEX(ip_end)
In ips, isn't ip unique? If so, get rid of id and change 5 indexes to 3:
PRIMARY KEY(ipint),
INDEX(idhost, ip),
INDEX(ip)
You have allowed more than enough in the VARCHAR for IPv6, but not in INT UNSIGNED.
I am in a problematic situation and have found dozens of questions on the same topic, but maybe I am not able to apply those solutions to my issue.
I have a system built in Codeigniter, and it does the following
codeigniter->start_transaction()
UPDATE T SET A = 1, MODIFIED = NOW()
WHERE PK IN
( SELECT PK FROM
(SELECT PK, LAST_INSERT_ID(PK) FROM T
where FK = 31 AND A=0 AND R=1 AND R_FK = 21
AND DEAD = 0 LIMIT 0,1) AS TBL1
) and A=0 AND R = 1 AND R_FK = 21 AND DEAD = 0
-- what this query does is , it takes a row dynamically which is not dead yet,
--and not assigned and it's linked to 21 id (R_FK) from R table,
-- when finds the row, update it to be marked as assigned (A=1).
-- PK = LAST_INSERT_ID(PK) ensures that last_insert_id is updated with this row id, so i can retrieve it from PHP
GOTO MODULE B
MODULE B {
INSERT INTO T(A,B,C,D,E,F,R,FK,R_FK,DEAD,MODIFIED) VALUES(ALL VALUES)
-- this line gives me lock wait timeout exceeded.
}
MySQL version is 5.1.63-community-log
Table T is an INNODB table and has only one normal type index on FK field, and no foreign key constraints are there. PrimaryKey (PK) field is an auto_increment field.
I get a lock wait timeout in the above case, because the first transactional UPDATE holds a lock on the table. How can I avoid locking the table with that UPDATE query while still using transactions? I cannot commit the transaction until I receive a response from MODULE B.
I don't have much detailed knowledge of databases and their structure, so please bear with me if I said something that doesn't make sense.
--UPDATE--
-- TABLE T Structure
CREATE TABLE `T` (
`PK` int(11) NOT NULL AUTO_INCREMENT,
`FK` int(11) DEFAULT NULL,
`P` varchar(1024) DEFAULT NULL,
`DEAD` tinyint(1) NOT NULL DEFAULT '0',
`A` tinyint(1) NOT NULL DEFAULT '0',
`MODIFIED` datetime DEFAULT NULL,
`R` tinyint(4) NOT NULL DEFAULT '0',
`R_FK` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`PK`),
KEY `FK_REFERENCE_54` (`FK`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
-- Indexes Information
SHOW INDEX FROM T;
1- Field FK, Cardinality 65 , NULL => Yes , Index_Type => BTRee
2- Field PK, Cardinality 11153, Index_Type => BTRee
I need to optimize the indexes on a table that stores more than 10 million rows. One particularly time-consuming query takes up to 10 seconds to load (when the WHERE clause filters out only about 2 million rows, 8 million must be grouped). I have created a few indexes (some complex, some simpler) and tried to find out how to speed this up. Perhaps I'm doing something wrong. According to EXPLAIN, MySQL is using the optimized_5 index.
Here is the table's structure and the query:
CREATE TABLE IF NOT EXISTS `geo_reverse` (
`fid` mediumint(8) unsigned NOT NULL,
`tablename` enum('table1','table2') NOT NULL default 'table1',
`geo_continent` varchar(2) NOT NULL,
`geo_country` varchar(2) NOT NULL,
`geo_region` varchar(8) NOT NULL,
`geo_city` mediumint(8) unsigned NOT NULL,
`type` varchar(30) NOT NULL,
PRIMARY KEY (`fid`,`tablename`,`geo_continent`,`geo_country`,`geo_region`,`geo_city`),
KEY `geo_city` (`geo_city`),
KEY `fid` (`fid`),
KEY `geo_region` (`geo_region`,`geo_city`),
KEY `optimized` (`tablename`,`type`,`geo_continent`,`geo_country`,`geo_region`,`geo_city`,`fid`),
KEY `optimized_2` (`fid`,`tablename`),
KEY `optimized_3` (`type`,`geo_city`),
KEY `optimized_4` (`geo_city`,`tablename`),
KEY `optimized_5` (`tablename`,`type`,`geo_city`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
An example query:
SELECT type, COUNT(*) AS objects FROM geo_reverse WHERE tablename = 'table1' AND geo_city IN (5847207,5112771,4916894,...) GROUP BY type
Do you have any idea of how to speed the computation up?
I would use the following index: (geo_city, tablename, type). geo_city is obviously more selective than tablename, so it should be on the left. After the condition is applied, the remaining rows should be sorted by type for grouping.
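A small sketch of how one might verify that suggestion, using Python's standard-library sqlite3 module as a stand-in for MySQL/MyISAM (an assumption, so planner behavior and plan wording will differ). The table is a cut-down geo_reverse with made-up rows, and idx_city_table_type is an illustrative index name. It checks both the query result and that the plan picks the new index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for geo_reverse (assumption: only the columns the query touches, plus fid).
conn.execute(
    "CREATE TABLE geo_reverse ("
    " fid INTEGER NOT NULL, tablename TEXT NOT NULL,"
    " geo_city INTEGER NOT NULL, type TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO geo_reverse VALUES (?, ?, ?, ?)",
    [
        (1, "table1", 5847207, "hotel"),
        (2, "table1", 5847207, "museum"),
        (3, "table1", 5112771, "hotel"),
        (4, "table2", 5112771, "hotel"),   # wrong tablename, must not be counted
        (5, "table1", 4916894, "park"),
        (6, "table1", 1, "hotel"),         # city outside the IN list, must not be counted
    ],
)
# The suggested column order: most selective filter first, grouping column last.
conn.execute(
    "CREATE INDEX idx_city_table_type ON geo_reverse (geo_city, tablename, type)"
)

# ORDER BY is added here only to make the output order deterministic.
query = (
    "SELECT type, COUNT(*) AS objects FROM geo_reverse "
    "WHERE tablename = 'table1' AND geo_city IN (5847207, 5112771, 4916894) "
    "GROUP BY type ORDER BY type"
)
counts = conn.execute(query).fetchall()
plan_text = " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

print(counts)
print(plan_text)
```

Because the index contains every column the query reads, the engine can answer from the index alone; on MySQL, EXPLAIN reports this as "Using index" in the Extra column.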