How to overcome a performance issue when converting utf8mb4 to latin1? - mysql

Out of ignorance, I altered a few tables without specifying a collation.
That caused the changed columns, which used to be latin1, to be converted to utf8mb4.
This brought a HUGE performance loss when running joins. And when I say HUGE I mean a fraction of a second turned into one hour or more!
So I made another request to convert it back to latin1.
And here comes the problem. A mere 60k-row table with ONE utf8mb4 column of 64 characters required 10 hours to complete. No, that is not a mistake. TEN hours. And my even bigger problem is that I have other tables with millions of rows, giving me an ETA of years from today!
So now I wonder what my options are, because I can't afford to have these tables read-only for longer than one day.
I know that a MySQL ALTER creates a copy of the table. That makes sense because this is a field-size change, so I doubt I have the option of using ALGORITHM=INPLACE.
And if I cannot use INPLACE, then I cannot use the LOCK=NONE option either.
Why in the world could a utf8mb4 -> latin1 conversion make such a big impact?
Note that the converted column is indexed, and this may be a reason for the impact!
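For context, the usual explanation for the join slowdown is a character-set mismatch on the join columns: MySQL converts the latin1 side to utf8mb4 row by row, and the index on the latin1 column can no longer be used. A rough sketch using the jobs/jobs_archive tables shown in the EDIT below:
-- When both sides are latin1, the hash_id index can satisfy the join lookup:
EXPLAIN SELECT j.auto_inc_key
FROM jobs j
JOIN jobs_archive a ON a.hash_id = j.hash_id;
-- If jobs.hash_id were silently converted to utf8mb4, the comparison
-- effectively becomes
--   CONVERT(a.hash_id USING utf8mb4) = j.hash_id
-- and the index on jobs_archive.hash_id can no longer be used,
-- turning an indexed lookup into a scan for every row.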
ANY suggestion or a link would be greatly appreciated!
Maybe the solution would be to drop the index (to avoid funky multibyte issues in the index conversion), do a fast ALTER, and then re-add the index? Something like the sketch below.
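A minimal sketch of that idea, using a hypothetical table t with an indexed utf8mb4 column c (adjust the names and sizes to the real table), and worth timing on a copy before trusting it:
ALTER TABLE t DROP INDEX c_idx;   -- drop the index so the rebuild skips index maintenance
ALTER TABLE t MODIFY c CHAR(64) CHARACTER SET latin1 NOT NULL;   -- still a copying ALTER
ALTER TABLE t ADD INDEX c_idx (c);   -- rebuild the index against the latin1 values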
Thanks in advance for any serious suggestion; I suspect I may not find much help because of the uniqueness of the problem.
EDIT
jobs | CREATE TABLE `jobs` (
`auto_inc_key` int(11) NOT NULL AUTO_INCREMENT,
`request_entered_timestamp` datetime NOT NULL,
`hash_id` char(64) CHARACTER SET latin1 NOT NULL,
`name` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`host` char(20) CHARACTER SET latin1 NOT NULL,
`user_id` int(11) NOT NULL,
`start_date` datetime NOT NULL,
`end_date` datetime NOT NULL,
`state` char(12) CHARACTER SET latin1 NOT NULL,
`location` varchar(50) NOT NULL,
`value` int(10) NOT NULL DEFAULT '0',
`aggregation_job_id` char(64) CHARACTER SET latin1 DEFAULT NULL,
`aggregation_job_order` int(11) DEFAULT NULL,
PRIMARY KEY (`auto_inc_key`),
KEY `host` (`host`),
KEY `hash_id` (`hash_id`),
KEY `user_id` (`user_id`,`request_entered_timestamp`),
KEY `request_entered_timestamp_idx` (`request_entered_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=9068466 DEFAULT CHARSET=utf8mb4
jobs_archive | CREATE TABLE `jobs_archive` (
`auto_inc_key` int(11) NOT NULL AUTO_INCREMENT,
`request_entered_timestamp` datetime NOT NULL,
`hash_id` char(64) CHARACTER SET latin1 NOT NULL,
`name` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`host` char(20) CHARACTER SET latin1 NOT NULL,
`user_id` int(11) NOT NULL,
`start_date` datetime NOT NULL,
`end_date` datetime NOT NULL,
`state` char(12) CHARACTER SET latin1 NOT NULL,
`value` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`auto_inc_key`),
KEY `host` (`host`),
KEY `hash_id` (`hash_id`),
KEY `user_id` (`user_id`,`request_entered_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=239432 DEFAULT CHARSET=utf8mb4
(taken from a stored PROCEDURE, but you catch the drift...)
INSERT INTO jobs_archive (SELECT * FROM jobs WHERE (TIMESTAMPDIFF(DAY, request_entered_timestamp, starttime) > days));

Related

MySQL : SELECT on big table takes a lot of time. Solutions?

My app gets stuck for hours on simple queries like:
SELECT COUNT(*) FROM `item`
Context:
This table is around 200GB+ and 50M+ rows.
We have an RDS instance on AWS with 2 CPUs and 16GiB RAM (db.r6g.large).
This is the table structure SQL dump:
/*
Target Server Type : MySQL
Target Server Version : 80023
File Encoding : 65001
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
DROP TABLE IF EXISTS `item`;
CREATE TABLE `item` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`status` tinyint DEFAULT '1',
`source_id` int unsigned DEFAULT NULL,
`type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`url` varchar(2048) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`title` varchar(500) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`sku` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`price` decimal(20,4) DEFAULT NULL,
`price_bc` decimal(20,4) DEFAULT NULL,
`price_original` decimal(20,4) DEFAULT NULL,
`currency` varchar(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`description` text CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci,
`image` varchar(1024) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`time_start` datetime DEFAULT NULL,
`time_end` datetime DEFAULT NULL,
`block_update` tinyint(1) DEFAULT '0',
`status_api` tinyint(1) DEFAULT '1',
`data` json DEFAULT NULL,
`created_at` int unsigned DEFAULT NULL,
`updated_at` int unsigned DEFAULT NULL,
`retailer_id` int DEFAULT NULL,
`hash` char(32) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`count_by_hash` int DEFAULT '1',
`item_last_update` int DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `sku_retailer_idx` (`sku`,`retailer_id`),
KEY `updated_at_idx` (`updated_at`),
KEY `time_end_idx` (`time_end`),
KEY `retailer_id_idx` (`retailer_id`),
KEY `hash_idx` (`hash`),
KEY `source_id_hash_idx` (`source_id`,`hash`) USING BTREE,
KEY `count_by_hash_idx` (`count_by_hash`) USING BTREE,
KEY `created_at_idx` (`created_at`) USING BTREE,
KEY `title_idx` (`title`),
KEY `currency_idx` (`currency`),
KEY `price_idx` (`price`),
KEY `retailer_id_title_idx` (`retailer_id`,`title`) USING BTREE,
KEY `source_id_idx` (`source_id`) USING BTREE,
KEY `source_id_count_by_hash_idx` (`source_id`,`count_by_hash`) USING BTREE,
KEY `status_idx` (`status`) USING BTREE,
CONSTRAINT `fk-source_id` FOREIGN KEY (`source_id`) REFERENCES `source` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1858202585 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
SET FOREIGN_KEY_CHECKS = 1;
Could partitioning the table help with a simple query like this?
Do I need to increase the RAM of the RDS instance? If so, what configuration do I need?
Is NoSQL better suited to this kind of structure?
Do you have any advice/solutions/fixes so the app can run these queries? (We would like to keep all the data and not erase it, if possible.)
"SELECT COUNT(*) FROM item" needs to scan an index. The smallest index is about 200MB, so that seems like it should not take "minutes".
There are probably multiple queries that do full table scans. Such will bump out all the cached data from the ~11GB of cache (the buffer_pool) and do that about 20 times. That's a lot of I/O and a lot of elapsed time. Meanwhile, most other queries will run slowly because their cached data is being bumped out.
The resolution:
Locate these naughty queries. RDS probably gives you access to the "slowlog".
Grab the slowlog and run pt-query-digest or mysqldumpslow -s t to find the "worst" queries.
Then we can discuss them.
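If the slow log is not already enabled, turning it on looks like this on a self-managed server (on RDS the same variables are set through the parameter group instead; the 1-second threshold is just an example):
SET GLOBAL slow_query_log = 'ON';        -- start logging slow queries
SET GLOBAL long_query_time = 1;          -- log anything slower than 1 second
SHOW VARIABLES LIKE 'slow_query_log%';   -- confirm it is on and where the file lives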
There are some redundant indexes; removing them won't solve the problem. A rule: If you have INDEX(a), INDEX(a,b), you don't need the former.
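Applying that rule to the table above (a sketch; verify against your actual query patterns first):
ALTER TABLE item
    DROP INDEX source_id_idx,      -- its single column is the prefix of source_id_hash_idx
    DROP INDEX retailer_id_idx;    -- likewise covered by retailer_id_title_idx
-- The foreign key on source_id stays satisfied, because source_id_hash_idx
-- still starts with source_id.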
If hash is some kind of scrambled value, it is likely that a single-row lookup (or update) will require a disk hit (and bump something else out of the cache).
decimal(20,4) takes 10 bytes and allows values up to 9,999,999,999,999,999.9999; that seems excessive. (Shrinking it won't save much space; something to keep in mind for the future.)
I see that AUTO_INCREMENT has reached 1.8 billion. If there are only 50M rows, does the processing do a lot of DELETEs? Or maybe REPLACE? IODKU is better than REPLACE.
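IODKU is INSERT ... ON DUPLICATE KEY UPDATE. Unlike REPLACE, which deletes the old row and inserts a new one (churning the PK and every secondary index), it updates the existing row in place. A sketch against this table's UNIQUE(sku, retailer_id), with made-up values:
INSERT INTO item (sku, retailer_id, price, updated_at)
VALUES ('ABC-123', 42, 19.9900, UNIX_TIMESTAMP())
ON DUPLICATE KEY UPDATE
    price = VALUES(price),             -- updates the existing row matched by the unique key
    updated_at = VALUES(updated_at);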
Thanks for all the advice here, but the problem was that we were using the MySQL JSON type for a very heavy column. Removing this column, or even changing it to VARCHAR, made the COUNT(id) around 1000x faster (adding WHERE id > 1 also helped).
Note: it was impossible to just drop the column as it was; we had to change it to VARCHAR first.

Error while I'm trying to partition a table

Here is my posts table:
CREATE TABLE `posts` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
`img` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`vid` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`title` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`subtitle` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`auth` varchar(54) COLLATE utf8_croatian_ci NOT NULL,
`story` longtext COLLATE utf8_croatian_ci NOT NULL,
`tags` varchar(255) COLLATE utf8_croatian_ci NOT NULL,
`status` varchar(100) COLLATE utf8_croatian_ci NOT NULL,
`moder` varchar(50) COLLATE utf8_croatian_ci NOT NULL,
`rec` varchar(50) COLLATE utf8_croatian_ci NOT NULL,
`pos` varchar(50) COLLATE utf8_croatian_ci NOT NULL,
`inde` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=117 DEFAULT CHARSET=utf8 COLLATE=utf8_croatian_ci
I want to make two partitions in order to improve query performance.
The first partition should contain all non-archive rows.
The second partition: all archive rows.
ALTER TABLE posts
PARTITION BY LIST COLUMNS (status)
(
PARTITION P1 VALUES IN ('admin', 'moder', 'public', 'rec'),
PARTITION P2 VALUES IN ('archive')
);
phpmyadmin error:
Static analysis:
1 errors were found during analysis.
Unrecognized alter operation. (near "" at position 0)
MySQL said:
#1503 - A PRIMARY KEY must include all columns in the table's partitioning function
Any help?
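For the record, error 1503 is MySQL's rule that every unique key, the PRIMARY KEY included, must contain all columns used in the partitioning expression. If you really wanted this partitioning, the PK would first have to be widened, roughly like this, though see the answer below before going down that road:
ALTER TABLE posts
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, status);   -- id stays first, so AUTO_INCREMENT remains valid
-- Only after this would the PARTITION BY LIST COLUMNS (status) be accepted.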
What queries are you trying to speed up? With the only index you currently have, WHERE id = ... or WHERE id BETWEEN ... AND ... are the only queries that will be fast. And the partitioning you suggest will not help much for other queries.
You seem to have only dozens of rows; don't consider partitioning unless you expect to have at least a million rows.
status has only 5 values? Then make it ENUM('archive', 'admin', 'moder', 'public', 'rec') NOT NULL. That will take 1 byte instead of lots.
If you will be querying on date and/or status and/or auth, then let's talk about indexes, especially 'composite' indexes on such. And, to achieve the "archive" split you envision, put status as the first column in the index.
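Putting those last two points together, a sketch of the non-partitioning alternative (the index name and the sample query are illustrative, and it assumes every existing status value is one of those five):
ALTER TABLE posts
    MODIFY `status` ENUM('archive', 'admin', 'moder', 'public', 'rec') NOT NULL,
    ADD INDEX status_date_idx (`status`, `date`);
-- A query like this can now skip the archive rows entirely via the index:
SELECT id, title FROM posts
WHERE `status` = 'public' AND `date` >= '2023-01-01';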

XAMPP's PHPMyAdmin Ignores AUTO_INCREMENT Attribute When Exporting the Table

The problem I am going to describe affects all the tables in the DB: when I dump my DB, it ignores the AUTO_INCREMENT attribute. My actual table is the following:
CREATE TABLE IF NOT EXISTS `departments` (
`departmentid` int(11) NOT NULL AUTO_INCREMENT,
`chairid` int(11) NOT NULL,
`department_name` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`image` varchar(128) NOT NULL DEFAULT 'default_department.png',
`url` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`active` tinyint(4) NOT NULL DEFAULT '1'
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;
But if I dump the table using phpMyAdmin, it does not add the AUTO_INCREMENT that I mentioned before.
The content of the exported .sql file is here:
CREATE TABLE IF NOT EXISTS `departments` (
`departmentid` int(11) NOT NULL, -- AUTO_INCREMENT is missing
`chairid` int(11) NOT NULL,
`department_name` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`image` varchar(128) NOT NULL DEFAULT 'default_department.png',
`url` varchar(256) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`active` tinyint(4) NOT NULL DEFAULT '1'
) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8;
I specifically checked whether auto_increment is disabled in the CREATE TABLE options, but no, it is not.
This was fixed in phpMyAdmin version 4.5.0.1 (Sep 2015):
https://github.com/phpmyadmin/phpmyadmin/issues/11492
I supposed it ignored auto increment, but I explored the file and found, at the bottom of the SQL, that the column is altered to AUTO_INCREMENT somewhere else.
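To illustrate the previous comment: a phpMyAdmin export typically splits the key and AUTO_INCREMENT definitions into separate ALTER statements near the bottom of the file, something like:
ALTER TABLE `departments`
  ADD PRIMARY KEY (`departmentid`);
ALTER TABLE `departments`
  MODIFY `departmentid` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=11;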

MySQL indexing optimization

I have the following table, 'element':
CREATE TABLE `element` (
`eid` bigint(22) NOT NULL AUTO_INCREMENT,
`tag_name` varchar(45) COLLATE utf8_bin DEFAULT NULL,
`text` text COLLATE utf8_bin,
`depth` tinyint(2) DEFAULT NULL,
`classes` tinytext COLLATE utf8_bin,
`webarchiver_uniqueid` int(11) DEFAULT NULL,
`created` datetime DEFAULT NULL,
`updated` datetime DEFAULT NULL,
`rowstatus` char(1) COLLATE utf8_bin DEFAULT 'A',
PRIMARY KEY (`eid`)
) ENGINE=InnoDB AUTO_INCREMENT=12090 DEFAULT CHARSET=utf8 COLLATE=utf8_bin;
Column details and current index details are given above. Almost 90% of queries on this table are like:
select * from element
where tag_name = 'XXX'
and text = 'YYYY'
and depth = 20
and classes = 'ZZZZZ'
and rowstatus = 'A'
What would be the optimal way to create an index on this table? The table has around 60k rows.
Change classes from TINYTEXT to VARCHAR(255) (or some more reasonable size), then have
INDEX(tag_name, depth, classes)
with the columns in any order. I left out rowstatus because it smells like a column that is likely to change. (Anyway, a flag does not add much to an index.)
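Concretely, something along these lines (255 is a placeholder size; on older servers with the 767-byte index-prefix limit you may need a smaller column):
ALTER TABLE element
    MODIFY classes VARCHAR(255) COLLATE utf8_bin DEFAULT NULL,  -- was TINYTEXT
    ADD INDEX tag_depth_classes_idx (tag_name, depth, classes);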
You can't include TEXT or BLOB columns in an index except as a 'prefix' index, and a prefix index is rarely worth it.
Since a PRIMARY KEY is a UNIQUE key, DROP INDEX eid_UNIQUE.
Is there some reason for picking "binary" / "utf8_bin" for all the character fields?

Very odd characters in a mysql dump -- what to do?

I've got weird characters mangling my data migration. These weird characters are embedded as-is in the actual MySQL dump file:
北京东方å›悦大酒店<br />\n<br />\n“The impetus
I've been given mysql data dumps with those kinds of chars in them. I'm importing the data into Drupal, by first recreating the mysql tables, and then querying against them using Drupal's Migrate module.
Code looks like this:
DROP TABLE IF EXISTS `news`;
SET @saved_cs_client = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `news` (
`id` int(11) NOT NULL auto_increment,
`uid` int(11) NOT NULL,
`pid` int(11) default NULL,
`puid` int(11) default NULL,
`headline` varchar(255) NOT NULL,
`teaser` varchar(500) NOT NULL,
`status` char(1) default NULL,
`date` datetime NOT NULL,
`url` varchar(255) default NULL,
`url_title` varchar(255) default NULL,
`body` text,
`caption` varchar(255) default NULL,
`gid` int(11) default NULL,
`feature` text,
`related` varchar(255) default NULL,
`change1_time` int(11) default NULL,
`change2_time` int(11) default NULL,
`change1_user` varchar(255) default NULL,
`change2_user` varchar(255) default NULL,
`expires` datetime default NULL,
`rank` char(1) default NULL,
PRIMARY KEY (`id`),
KEY `uid` (`uid`),
KEY `status` (`status`),
KEY `expires` (`expires`),
KEY `rank` (`rank`),
KEY `puid` (`puid`),
FULLTEXT KEY `headline` (`headline`,`teaser`,`body`)
) ENGINE=MyISAM AUTO_INCREMENT=6976 DEFAULT CHARSET=utf8;
SET character_set_client = @saved_cs_client;
The fastest solution is the winner here -- I'm on a tight deadline and really suffering over here! I've tried a search-and-replace solution, but there appear to be too many different kinds of weird data. I can orchestrate a new data dump if I know what to tell them (i.e., how to do the dump).
Thanks,
John
This isn't a direct answer to your question, but I played a bit with the mojibake you quoted in your post. It looks like it was originally Chinese text in UTF-8 encoding, which was interpreted as Latin text in Windows-1252 encoding, re-encoded in UTF-8 and again interpreted as Windows-1252 (and finally once more encoded as UTF-8 when you posted it here). So it's not just mojibake, it's double mojibake.
Also, at some point, a byte was lost from the middle of the string (probably because it was one of the undefined code points in Windows-1252), mangling one of the original characters. Running the text through the encoding chain in reverse (encode as Windows-1252, decode as UTF-8, repeat), I get the output:
北京东方�悦大酒店<br />\n<br />\n“The impetus
where the replacement character � stands for the mangled character.
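If the data is still in the database (or can be re-imported byte-for-byte), one round of this can sometimes be reversed directly in MySQL, since MySQL's latin1 is close enough to Windows-1252; double mojibake needs the same transform applied twice, and the bytes that fell into Windows-1252's undefined holes are unrecoverable either way. A sketch against the news.body column:
-- Reverse one round: utf8 characters -> latin1 bytes -> reinterpret as utf8.
SELECT CONVERT(CAST(CONVERT(body USING latin1) AS BINARY) USING utf8) AS fixed_once
FROM news;
-- Double mojibake: nest the same conversion twice.
SELECT CONVERT(CAST(CONVERT(
           CONVERT(CAST(CONVERT(body USING latin1) AS BINARY) USING utf8)
       USING latin1) AS BINARY) USING utf8) AS fixed_twice
FROM news;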