MySQL: SELECT on a big table takes a lot of time. Solutions?

My app gets stuck for hours on simple queries like:
SELECT COUNT(*) FROM `item`
Context:
The table is around 200 GB+ with 50M+ rows.
We have an RDS instance on AWS with 2 vCPUs and 16 GiB RAM (db.r6g.large).
This is the table structure from the SQL dump:
/*
Target Server Type : MySQL
Target Server Version : 80023
File Encoding : 65001
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
DROP TABLE IF EXISTS `item`;
CREATE TABLE `item` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint DEFAULT '1',
`source_id` int unsigned DEFAULT NULL,
`type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`url` varchar(2048) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`title` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`sku` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`price` decimal(20,4) DEFAULT NULL,
`price_bc` decimal(20,4) DEFAULT NULL,
`price_original` decimal(20,4) DEFAULT NULL,
`currency` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`description` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`image` varchar(1024) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`time_start` datetime DEFAULT NULL,
`time_end` datetime DEFAULT NULL,
`block_update` tinyint(1) DEFAULT '0',
`status_api` tinyint(1) DEFAULT '1',
`data` json DEFAULT NULL,
`created_at` int unsigned DEFAULT NULL,
`updated_at` int unsigned DEFAULT NULL,
`retailer_id` int DEFAULT NULL,
`hash` char(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`count_by_hash` int DEFAULT '1',
`item_last_update` int DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `sku_retailer_idx` (`sku`,`retailer_id`),
KEY `updated_at_idx` (`updated_at`),
KEY `time_end_idx` (`time_end`),
KEY `retailer_id_idx` (`retailer_id`),
KEY `hash_idx` (`hash`),
KEY `source_id_hash_idx` (`source_id`,`hash`) USING BTREE,
KEY `count_by_hash_idx` (`count_by_hash`) USING BTREE,
KEY `created_at_idx` (`created_at`) USING BTREE,
KEY `title_idx` (`title`),
KEY `currency_idx` (`currency`),
KEY `price_idx` (`price`),
KEY `retailer_id_title_idx` (`retailer_id`,`title`) USING BTREE,
KEY `source_id_idx` (`source_id`) USING BTREE,
KEY `source_id_count_by_hash_idx` (`source_id`,`count_by_hash`) USING BTREE,
KEY `status_idx` (`status`) USING BTREE,
CONSTRAINT `fk-source_id` FOREIGN KEY (`source_id`) REFERENCES `source` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1858202585 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
SET FOREIGN_KEY_CHECKS = 1;
Could partitioning the table help with a simple query like this?
Do I need to increase the RAM of the RDS instance? If so, what configuration would I need?
Would NoSQL be better suited to this kind of structure?
Do you have any advice/solutions/fixes so the app can run these queries (we would like to keep all the data and not erase it, if possible)?

"SELECT COUNT(*) FROM item" needs to scan an index. The smallest index is about 200MB, so that seems like it should not take "minutes".
There are probably multiple queries doing full table scans. Each such scan bumps all the cached data out of the ~11GB of cache (the buffer_pool), and with a 200GB table it does so roughly 20 times over (200GB / 11GB). That's a lot of I/O and a lot of elapsed time. Meanwhile, most other queries run slowly because their cached data keeps being evicted.
The resolution:
Locate these naughty queries. RDS probably gives you access to the "slowlog".
Grab the slowlog and run pt-query-digest or mysqldumpslow -s t to find the "worst" queries.
Then we can discuss them.
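As a quick sanity check that the slow log is even enabled and the threshold is low enough (on RDS the real settings live in the DB parameter group):
SHOW GLOBAL VARIABLES LIKE 'slow_query_log%';
SHOW GLOBAL VARIABLES LIKE 'long_query_time';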
There are some redundant indexes; removing them won't solve the problem. A rule: If you have INDEX(a), INDEX(a,b), you don't need the former.
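From the schema above, for example, this sketch drops two such prefixes (verify first that no query relies on them by name, e.g. in index hints):
ALTER TABLE `item`
DROP INDEX `source_id_idx`, -- prefix of source_id_hash_idx (source_id, hash)
DROP INDEX `retailer_id_idx`; -- prefix of retailer_id_title_idx (retailer_id, title)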
If hash is some kind of scrambled value, it is likely that a single-row lookup (or update) will require a disk hit (and bump something else out of the cache).
decimal(20,4) takes 10 bytes and allows values up to 9,999,999,999,999,999.9999; that seems excessive. (Shrinking it won't save much space; something to keep in mind for the future.)
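For instance, if prices never need more than 8 integer digits, a sketch like this stores 6 bytes per value instead of 10 (an ALTER of this kind rewrites the table, so schedule it accordingly):
ALTER TABLE `item` MODIFY `price` DECIMAL(12,4) DEFAULT NULL; -- max 99,999,999.9999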
I see that AUTO_INCREMENT has reached 1.8 billion. If there are only 50M rows, does the processing do a lot of DELETEs? Or maybe REPLACE? IODKU (INSERT ... ON DUPLICATE KEY UPDATE) is better than REPLACE.
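The difference: REPLACE is a DELETE plus an INSERT (every index is touched twice, and the row gets a new id), while IODKU updates the existing row in place. A sketch against the sku_retailer_idx unique key, with illustrative values:
INSERT INTO `item` (`sku`, `retailer_id`, `price`)
VALUES ('ABC-123', 42, 9.9900)
ON DUPLICATE KEY UPDATE `price` = VALUES(`price`);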

Thanks for all the advice here, but the problem was that we were using the MySQL JSON type for a very heavy column. Removing this column, or even changing it to VARCHAR, made the COUNT(id) around 1000x faster (adding WHERE id > 1 also helped).
Note: it was impossible to just drop the column as it was; we had to change it to VARCHAR first.
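A sketch of that two-step change (the VARCHAR length is illustrative and must be large enough for the existing JSON strings, or strict mode will reject the ALTER):
ALTER TABLE `item` MODIFY `data` VARCHAR(4096) DEFAULT NULL;
ALTER TABLE `item` DROP COLUMN `data`;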

Related

Improve count query with join and where statement

I'm running a count query which is very slow; how can I improve it?
I've got the following query, but it takes around 1.33 seconds:
select
count(*) as aggregate
from
`tickets`
inner join `orders` on `orders`.`id` = `tickets`.`order_id`
where
`orders`.`status` = 'paid' and
`tickets`.`created_at` > '2023-01-01 00:00:00'
The tickets table has around 650,000 rows and the orders table has around 320,000 rows.
This is the result of SHOW CREATE TABLE tickets:
CREATE TABLE `tickets` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`tickettype_id` int unsigned NOT NULL,
`order_id` int unsigned NOT NULL,
`variant_id` bigint unsigned DEFAULT NULL,
`seat_id` bigint unsigned DEFAULT NULL,
`barcode` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`first_name` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`last_name` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`email` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`telephone` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`birthday` date DEFAULT NULL,
`age` int unsigned DEFAULT NULL,
`gender` enum('m','f') CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`price` double(10,2) DEFAULT NULL,
`extra_info` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `tickets_barcode_unique` (`barcode`),
KEY `tickets_tickettype_id_foreign` (`tickettype_id`),
KEY `tickets_order_id_foreign` (`order_id`),
KEY `tickets_order_id_index` (`order_id`),
KEY `tickets_tickettype_id_index` (`tickettype_id`),
KEY `tickets_seat_id_foreign` (`seat_id`),
KEY `tickets_variant_id_foreign` (`variant_id`),
CONSTRAINT `tickets_ibfk_1` FOREIGN KEY (`order_id`) REFERENCES `orders` (`id`) ON DELETE CASCADE,
CONSTRAINT `tickets_seat_id_foreign` FOREIGN KEY (`seat_id`) REFERENCES `seatplan_seats` (`id`) ON DELETE SET NULL,
CONSTRAINT `tickets_tickettype_id_foreign` FOREIGN KEY (`tickettype_id`) REFERENCES `tickets_types` (`id`) ON DELETE CASCADE,
CONSTRAINT `tickets_variant_id_foreign` FOREIGN KEY (`variant_id`) REFERENCES `ticket_variants` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=2945088 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
How can I improve the speed?
The performance of your query depends on several factors, such as:
The table size
The performance of your machine
Indexing
If you don't have indexes created for status, order_id, and created_at, create them; that can significantly improve query performance.
CREATE INDEX order_id_index ON tickets(order_id);
CREATE INDEX status_index ON orders(status);
CREATE INDEX created_at_index ON tickets(created_at);
Additionally, if you are using PostgreSQL, try running VACUUM on your tables, which removes dead tuples and improves performance.
First of all
You need to add two indexes:
CREATE INDEX status_idx ON orders(status);
-- composite index, since the query uses both the join column and created_at
CREATE INDEX order_id_created_at_idx ON tickets(order_id, created_at);
The query optimizer uses composite indexes for queries that test all columns in the index, or queries that test the first column, the first two columns, and so on.
More information regarding composite indexes can be found in the MySQL documentation.
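With the indexes in place, you can verify that the optimizer actually uses them by prefixing the original query with EXPLAIN:
EXPLAIN
select
count(*) as aggregate
from
`tickets`
inner join `orders` on `orders`.`id` = `tickets`.`order_id`
where
`orders`.`status` = 'paid' and
`tickets`.`created_at` > '2023-01-01 00:00:00';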

Same query on local/live site, vastly different performance

My slow query log is showing random runs of 5-6+ seconds for a seemingly normal query. Running the same query on my local clone of the site gives a 0.05s run time. Here is the EXPLAIN of both (posted as screenshots):
I'm a bit dense when it comes to MySQL optimization, but I don't understand how this can give such different results. The tables are roughly the same; the live site is more up to date but otherwise very similar.
Edit:
Instead of comparing to my local installation, I am now comparing to my "old" live server. I just moved servers this morning.
The OLD server DDL of the table:
CREATE TABLE `wp_bp_activity` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) NOT NULL,
`component` varchar(75) COLLATE utf8mb4_unicode_ci NOT NULL,
`type` varchar(75) COLLATE utf8mb4_unicode_ci NOT NULL,
`action` text COLLATE utf8mb4_unicode_ci NOT NULL,
`content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`primary_link` text COLLATE utf8mb4_unicode_ci NOT NULL,
`item_id` bigint(20) NOT NULL,
`secondary_item_id` bigint(20) DEFAULT NULL,
`date_recorded` datetime NOT NULL,
`hide_sitewide` tinyint(1) DEFAULT 0,
`mptt_left` int(11) NOT NULL DEFAULT 0,
`mptt_right` int(11) NOT NULL DEFAULT 0,
`is_spam` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
KEY `date_recorded` (`date_recorded`),
KEY `user_id` (`user_id`),
KEY `item_id` (`item_id`),
KEY `secondary_item_id` (`secondary_item_id`),
KEY `component` (`component`),
KEY `type` (`type`),
KEY `mptt_left` (`mptt_left`),
KEY `mptt_right` (`mptt_right`),
KEY `hide_sitewide` (`hide_sitewide`),
KEY `is_spam` (`is_spam`),
FULLTEXT KEY `content` (`content`)
) ENGINE=InnoDB AUTO_INCREMENT=1060622 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
The NEW server DDL:
CREATE TABLE `wp_bp_activity` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`user_id` bigint(20) NOT NULL,
`component` varchar(75) COLLATE utf8mb4_unicode_ci NOT NULL,
`type` varchar(75) COLLATE utf8mb4_unicode_ci NOT NULL,
`action` text COLLATE utf8mb4_unicode_ci NOT NULL,
`content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
`primary_link` text COLLATE utf8mb4_unicode_ci NOT NULL,
`item_id` bigint(20) NOT NULL,
`secondary_item_id` bigint(20) DEFAULT NULL,
`date_recorded` datetime NOT NULL,
`hide_sitewide` tinyint(1) DEFAULT 0,
`mptt_left` int(11) NOT NULL DEFAULT 0,
`mptt_right` int(11) NOT NULL DEFAULT 0,
`is_spam` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
KEY `date_recorded` (`date_recorded`),
KEY `user_id` (`user_id`),
KEY `item_id` (`item_id`),
KEY `secondary_item_id` (`secondary_item_id`),
KEY `component` (`component`),
KEY `type` (`type`),
KEY `mptt_left` (`mptt_left`),
KEY `mptt_right` (`mptt_right`),
KEY `hide_sitewide` (`hide_sitewide`),
KEY `is_spam` (`is_spam`),
FULLTEXT KEY `content` (`content`)
) ENGINE=InnoDB AUTO_INCREMENT=1060840 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
They are identical except for the AUTO_INCREMENT value at the bottom of each, which shows the row counts are very close. As I said, the OLD server was active until this morning, so it has plenty of recent datetimes as well.
Running the query pictured above on both servers, however, shows the same difference in EXPLAIN output: the NEW server examines an insane number of rows and takes 10x the time.
Use a "composite" index (in this order):
INDEX(is_spam, hide_sitewide, type)
Tip: When EXPLAIN says "Intersect", you need a composite index.
Get rid of a.type NOT IN ... since that test is already handled by a.type IN ...
Get rid of DISTINCT, it may be causing an extra pass to de-dup.
Use EXPLAIN FORMAT=JSON SELECT ... to get more detailed info.
Use text, not images, in this forum.
There may be a difference in version between the two servers -- and this may explain the different optimization technique used (or mis-used).
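A sketch of the suggested composite index (the index name is illustrative); after adding it, rerun EXPLAIN FORMAT=JSON on the cleaned-up query to confirm the "Intersect" is gone:
ALTER TABLE `wp_bp_activity`
ADD INDEX `spam_sitewide_type_idx` (`is_spam`, `hide_sitewide`, `type`);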

How to overcome performance issue when converting utf8mb4 to latin1?

Out of ignorance, I altered a few tables without specifying the collation.
That caused the changed columns, which used to be latin1, to be converted to utf8mb4.
This brought a HUGE performance loss on joins. And when I say HUGE, I mean a fraction of a second turned into an hour or more!
So I made another request to convert them back to latin1.
And here comes the problem. A mere 60k-row table with ONE utf8mb4 column of 64 characters required 10 hours to complete. No, that is not a mistake: TEN hours. And my even bigger problem is that I have other tables with millions of rows, giving me an ETA of years from today!
So now I wonder what my options are, because I can't afford to have these tables read-only for longer than a day.
I know that MySQL's ALTER creates a copy of the table. That makes sense because this is a field-size change, so I doubt I can use ALGORITHM=INPLACE.
And if I cannot do INPLACE, then I cannot use the LOCK=NONE option.
Why in the world could a utf8mb4 -> latin1 conversion make such a big impact?
Note that the converted column is indexed, and this may be a reason for the impact!
ANY suggestion or link would be greatly appreciated!
Maybe the solution would be to drop the index (to avoid funky multibyte issues in the index conversion), do the fast ALTER, and then add the index back?
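Something like this, perhaps (a sketch only; I'm assuming the converted column is the indexed hash_id from the DDL below, and the charset change itself still copies the table):
ALTER TABLE jobs DROP INDEX hash_id; -- quick: InnoDB can drop an index in place
ALTER TABLE jobs MODIFY hash_id CHAR(64) CHARACTER SET latin1 NOT NULL; -- still a table copy, but one less index to maintain row by row
ALTER TABLE jobs ADD INDEX hash_id (hash_id); -- rebuilt once, by sort, at the end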
Thanks in advance for any serious suggestion; I suspect I may not find much help because of the uniqueness of the problem.
EDIT
jobs | CREATE TABLE `jobs` (
`auto_inc_key` int(11) NOT NULL AUTO_INCREMENT,
`request_entered_timestamp` datetime NOT NULL,
`hash_id` char(64) CHARACTER SET latin1 NOT NULL,
`name` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`host` char(20) CHARACTER SET latin1 NOT NULL,
`user_id` int(11) NOT NULL,
`start_date` datetime NOT NULL,
`end_date` datetime NOT NULL,
`state` char(12) CHARACTER SET latin1 NOT NULL,
`location` varchar(50) NOT NULL,
`value` int(10) NOT NULL DEFAULT '0',
`aggregation_job_id` char(64) CHARACTER SET latin1 DEFAULT NULL,
`aggregation_job_order` int(11) DEFAULT NULL,
PRIMARY KEY (`auto_inc_key`),
KEY `host` (`host`),
KEY `hash_id` (`hash_id`),
KEY `user_id` (`user_id`,`request_entered_timestamp`),
KEY `request_entered_timestamp_idx` (`request_entered_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=9068466 DEFAULT CHARSET=utf8mb4
jobs_archive | CREATE TABLE `jobs_archive` (
`auto_inc_key` int(11) NOT NULL AUTO_INCREMENT,
`request_entered_timestamp` datetime NOT NULL,
`hash_id` char(64) CHARACTER SET latin1 NOT NULL,
`name` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`host` char(20) CHARACTER SET latin1 NOT NULL,
`user_id` int(11) NOT NULL,
`start_date` datetime NOT NULL,
`end_date` datetime NOT NULL,
`state` char(12) CHARACTER SET latin1 NOT NULL,
`value` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`auto_inc_key`),
KEY `host` (`host`),
KEY `hash_id` (`hash_id`),
KEY `user_id` (`user_id`,`request_entered_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=239432 DEFAULT CHARSET=utf8mb4
(taken from PROCEDURE, but you catch the drift...)
INSERT INTO jobs_archive (SELECT * FROM jobs WHERE (TIMESTAMPDIFF(DAY, request_entered_timestamp, starttime) > days));

Comparing MySQL and SQLite create table statements

So I am trying to make an app with a "code first" approach and an active record ORM. I want this app to work with both MySQL and SQLite. Here is how I am creating a table:
(this is a mockup table I came up with solely for this question)
$this->int("id", self::INT_MEDIUM)->unSigned()->primaryKey()->autoIncrement();
$this->string("hash", 64, self::STR_FIXED)->unique();
$this->enum("status", "available","sold","pending")->defaultValue("available");
$this->int("category", self::INT_MEDIUM)->unSigned()->defaultValue(0);
$this->string("name")->unique();
$this->text("descr")->nullable();
$this->decimal("price", 10, 4)->defaultValue(1.234);
$this->uniqueKey("store_np", "name", "price");
$this->foreignKey("category", "categories", "id");
and then my code generates the CREATE TABLE statements for me; here are the results:
MySQL:
CREATE TABLE `products` (
`id` mediumint UNSIGNED PRIMARY KEY auto_increment NOT NULL,
`hash` char(64) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`status` enum('available','sold','pending') CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL default 'available',
`category` mediumint UNSIGNED NOT NULL default 0,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`descr` TEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci default NULL,
`price` decimal(10,4) NOT NULL default 1.234,
UNIQUE KEY (`hash`),
UNIQUE KEY (`name`),
UNIQUE KEY `store_np` (`name`,`price`),
FOREIGN KEY (`category`) REFERENCES `categories`(`id`)
) ENGINE=InnoDB;
SQLite:
CREATE TABLE `products` (
`id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
`hash` TEXT UNIQUE NOT NULL,
`status` TEXT CHECK(status in ('available','sold','pending') ) NOT NULL default 'available',
`category` INTEGER UNSIGNED NOT NULL default 0,
`name` TEXT UNIQUE NOT NULL,
`descr` TEXT default NULL,
`price` REAL NOT NULL default 1.234,
CONSTRAINT `store_np` UNIQUE (`name`,`price`),
CONSTRAINT `cnstrnt_category_frgn` FOREIGN KEY (`category`) REFERENCES `categories`(`id`)
);
I executed both statements, in phpMyAdmin and phpLiteAdmin respectively, and both seemed to work fine.
Here are my concerns:
For example, I didn't know I cannot use UNSIGNED with PRIMARY KEY AUTOINCREMENT in SQLite, which took me a while to figure out. Is there anything else like that I should be concerned about?
Even though both statements executed successfully, are they going to work as expected? Especially the constraints in SQLite.
Please check the "status" column in SQLite: is the CHECK constraint an appropriate alternative to MySQL's ENUM?

Slow Updates for Single Records by Primary Key

I am using MySQL 5.5.
I have an InnoDB table definition as follows:
CREATE TABLE `table1` (
`col1` int(11) NOT NULL AUTO_INCREMENT,
`col2` int(11) DEFAULT NULL,
`col3` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
`col4` int(11) DEFAULT NULL,
`col5` datetime DEFAULT NULL,
`col6` tinyint(1) NOT NULL DEFAULT '0',
`col7` datetime NOT NULL,
`col8` datetime NOT NULL,
`col9` int(11) DEFAULT NULL,
`col10` tinyint(1) NOT NULL DEFAULT '0',
`col11` tinyint(1) DEFAULT '0',
PRIMARY KEY (`col1`),
UNIQUE KEY `index_table1_on_ci_ai_tn_sti` (`col2`,`col4`,`col3`,`col9`),
KEY `index_shipments_on_applicant_id` (`col4`),
KEY `index_shipments_on_shipment_type_id` (`col9`),
KEY `index_shipments_on_created_at` (`col7`),
KEY `idx_tracking_number` (`col3`)
) ENGINE=InnoDB AUTO_INCREMENT=7634960 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The issue is UPDATEs. There are about 2M rows in this table.
A typical UPDATE query would be:
UPDATE table1 SET col6 = 1 WHERE col1 = 7634912;
We have about 5-10k QPS on this production server. These queries are often in the "Updating" state when viewed in the process list. The InnoDB lock information shows many record locks, but no gap locks, on index_table1_on_ci_ai_tn_sti. No transaction is waiting for a lock.
My feeling is that the unique index is causing the lag, but I'm not sure why. This is the only table we have that is defined this way, with such a unique index.
I don't think the UNIQUE key has any impact (in this case).
Double-check the actual queries for typos (e.g., setting the wrong column, or a value that doesn't match the column type) -- they could make a big difference.
Are you trying to do 10K UPDATEs per second?
Is innodb_buffer_pool_size bigger than the table, but no bigger than 70% of available RAM?
What is the value of innodb_flush_log_at_trx_commit? The default of 1 is secure, but slower than 2.
Can you put a bunch of updates into a single transaction? That would cut down the transaction overhead.
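A sketch of such batching (the ids are illustrative); three updates now cost one commit instead of three:
START TRANSACTION;
UPDATE table1 SET col6 = 1 WHERE col1 = 7634912;
UPDATE table1 SET col6 = 1 WHERE col1 = 7634913;
UPDATE table1 SET col6 = 1 WHERE col1 = 7634914;
COMMIT;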