How to convert a MySQL table to UTF-8 when the row size is too large?

I'm on a MySQL database attempting to convert my table from latin1 to UTF-8. The table is named journalist and has 12 VARCHAR columns, each with a maximum length of 3000. This is the SQL I'm running:
ALTER TABLE `journalist` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
The error I'm receiving:
Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
Do I have to alter the column sizes before I run this conversion query?
And if so, how else might I accomplish this encoding change?

I did what @Wrikken suggested: I dropped my table and lowered the VARCHAR columns' maximum length from 3000 to 1500. I then ran this SQL on my new, empty table
ALTER TABLE `table_name` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
From there, I repopulated it from my backup table using a script.
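The repopulation step boils down to a single INSERT ... SELECT (a sketch; the backup table name journalist_backup is an assumption):
-- Sketch, assuming the backup table is named journalist_backup (hypothetical).
-- Values longer than the new 1500-character limit are truncated or rejected,
-- depending on sql_mode.
INSERT INTO `journalist` SELECT * FROM `journalist_backup`;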
To answer the question, do one of the following (a sketch of both options is below):
Lower the VARCHAR maximum lengths
Or change the VARCHAR fields to LONGTEXT or BLOBs
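A minimal sketch of both options, assuming a hypothetical column named summary among the twelve:
-- Option 1: shrink the columns so the converted rows fit, then convert.
-- (Rows already longer than 1500 characters must be dealt with first.)
ALTER TABLE `journalist` MODIFY `summary` VARCHAR(1500);
ALTER TABLE `journalist` CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
-- Option 2: move the long columns out of the row-size calculation; TEXT values
-- are stored off the row, so each column counts only a few bytes.
ALTER TABLE `journalist` MODIFY `summary` TEXT;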

Related

Will existing indexes be affected when changing character_set and collation of MySQL db

We have a database where the default character set for tables and columns is utf8.
But with the utf8 character set, we are unable to save emojis.
To support saving emojis:
a) We had to change the character set of table and columns to utf8mb4
b) We had to change the collation of table and columns to utf8mb4_unicode_ci
c) We had to update our JDBC driver so that it supports the Unicode encoding
With the above changes, we are able to save emojis in our columns.
Question:
1) Do I need to delete the existing indexes (on VARCHAR columns) and recreate them, given that with utf8 each character used to take up to 3 bytes and with utf8mb4 it can occupy up to 4 bytes?
An index is an ordered list of pointers to the rows of the table. The ordering is based on both the CHARACTER SET and COLLATION of the column(s) of the index. If you change either, the index must be rebuilt. A "pointer" (in this context) is a copy of the PRIMARY KEY.
You should do one or the other of:
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8mb4 COLLATE ...;
which converts all the text columns in the table. Or, if you need to leave some with their current charset/collation, then change each column:
ALTER TABLE tbl MODIFY col_name ... CHARACTER SET utf8mb4 COLLATE ...;
where the first '...' is the rest of the column definition (VARCHAR, NOT NULL, whatever).
Any indexes that involve the columns being changed will be rebuilt. In particular, note that a VARCHAR PRIMARY KEY is effectively in each secondary index.
The collation utf8mb4_unicode_ci is rather old; you might prefer utf8mb4_unicode_520_ci, especially since it handles Emoji as being distinct rather than lumped together (IIRC).
The fact that utf8 is a subset of utf8mb4 is not relevant; MySQL sees it as a change and does not take any shortcuts.
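After converting, you can confirm what each column ended up with via a standard information_schema query (a sketch; your_db and tbl are placeholders):
-- Check the charset and collation each column actually has now.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'your_db' AND TABLE_NAME = 'tbl';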

MySQL: How to add a column as varchar(21844) with UTF8 charset?

If I execute this query:
CREATE TABLE `varchar_test1` (
  `id` tinyint(1) NOT NULL,
  `cloumn_1` varchar(21844) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
it succeeds.
If I then execute this:
ALTER TABLE `varchar_test1` ADD COLUMN `cloumn_2` varchar(21844) NOT NULL;
I get an error:
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
If I execute this:
CREATE TABLE `varchar_test2` (
  `id` int NOT NULL AUTO_INCREMENT,
  `cloumn_1` varchar(21844) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
I get:
ERROR 1118 (42000): Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs
Why?
Running mysql --version returns
mysql Ver 14.14 Distrib 5.7.17, for macos10.12 (x86_64) using EditLine wrapper
Your problem is that your columns store multi-byte values, and together they exceed the maximum row size.
As explained in the docs, MySQL tables have a maximum row size of 65,535 bytes, regardless of the engine you use. Because the CHARSET of your table is utf8 (currently an alias for utf8mb3), each character of a varchar takes up to 3 bytes, plus 2 bytes to store the length. A single varchar(21844) column therefore needs up to 21,844 × 3 + 2 = 65,534 bytes. That just fits next to the 1-byte tinyint id in varchar_test1 (65,535 bytes exactly), but adding a second such column, or swapping the id for a 4-byte int as in varchar_test2, pushes the row past the limit.
To fix this, use TEXT (or TINYTEXT, MEDIUMTEXT, etc.) instead of VARCHAR columns. Their values are stored separately from the row, so each such column only counts a handful of bytes toward the row size. (This is also explained in the docs.)
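For example, this variant of the failing table should be accepted (a sketch; varchar_test3 is a made-up name, and the question's column spellings are kept):
-- TEXT holds up to 65,535 bytes (about the same 21,844-character utf8 ceiling),
-- but is stored off the row, so two such columns no longer break the limit.
CREATE TABLE `varchar_test3` (
  `id` int NOT NULL AUTO_INCREMENT,
  `cloumn_1` TEXT NOT NULL,
  `cloumn_2` TEXT NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;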
Also, FYI: the spelling is column, not cloumn.

SQL converting column from Latin1 to UTF8

I am trying to convert a specific column in a table on my DB from latin1 character set with collation latin1_swedish_ci to utf8 with collation utf8_unicode_ci.
COLUMN: description, type: longtext, default not null
I tried the following commands on the column:
ALTER TABLE sample MODIFY description LONGBLOB NOT NULL ;
ALTER TABLE sample MODIFY description LONGTEXT CHARACTER SET utf8 NOT NULL COLLATE utf8_unicode_ci;
I also tried to alter the encoding WITHOUT changing to binary first, but the characters ended up being re-encoded incorrectly by the server.
And I keep getting an error about certain characters:
Error Code: 1366. Incorrect string value: '\x92t hav...' for column 'longDesc' at row 803
It seems like some of the characters in my table aren't converting correctly.
How can I fix this issue?
\x92 implies that you have latin1 in the table now. The second ALTER is claiming that the bytes are in utf8 encoding. Hence, the error message.
Case 1: You need to change the LONGTEXT to utf8 because you plan to add rows with text that cannot be encoded in latin1.
For this case, either convert every CHAR/TEXT column in the table:
ALTER TABLE sample CONVERT TO CHARACTER SET utf8;
or convert just the one column:
ALTER TABLE sample MODIFY description ... CHARACTER SET utf8;
Case 2: The rest of the system thinks in utf8 and is confused by this column.
Well, I don't think it is actually confused; conversions happen as needed.
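So for Case 1, the direct route is to skip the intermediate LONGBLOB step and let MySQL transcode the latin1 bytes (a sketch, filling in the '...' with the column definition from the question):
-- Run this directly on the original latin1 column; MySQL converts the bytes.
-- Going through LONGBLOB first relabels the bytes without converting them,
-- which is what produced the 'Incorrect string value' error.
ALTER TABLE sample MODIFY description LONGTEXT CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;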

ALTER DATABASE to change COLLATE not working

I am using Django on Bluehost. I created a form for user-generated input, but Unicode input from this form fails to be stored or displayed correctly. Searching SO and Google told me I should change the collation and character set of my database. I ran this SQL
ALTER DATABASE learncon_pywithyou CHARACTER SET utf8 COLLATE utf8_unicode_ci;
from python27 manage.py dbshell, which opened a MySQL shell. What shows on screen is
Query OK, 1 row affected (0.00 sec).
So I assumed the problem was solved, but it actually is not. This SQL has not done anything, as I later found in phpMyAdmin (provided by Bluehost): all the VARCHAR fields of all the tables still have the latin1_swedish_ci collation.
So I assumed ALTER TABLE should work instead. I ran this in MySQL:
alter table mytable character set utf8 collate utf8_unicode_ci;
Although it shows Query OK, 4 rows affected on screen, it did nothing either; the collation of the fields in mytable did not change at all.
So I finally changed the fields for mytable manually in phpMyAdmin, and this works: I can now insert Unicode into this table and it displays correctly. But I have around 20 such tables, and I don't want to change them one by one by hand.
Is there a simple and effective way to change the collation of every field so that Unicode is stored and displayed correctly?
Changing the collation at the database level sets the default for new objects; existing collations are not changed.
Similarly, at the table level, only new columns are affected by this:
alter table mytable character set utf8 collate utf8_unicode_ci;
However, to convert the collation of existing columns, you need to add CONVERT TO:
alter table mytable convert to character set utf8 collate utf8_unicode_ci;
In addition to @StuartLC's answer:
To change the charset and collation of all 20 tables, use the query below (here world is the database name):
SELECT CONCAT('ALTER TABLE ', TABLE_SCHEMA, '.', TABLE_NAME,
       ' CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci') AS AlterSQL
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'world';
The above prepares all the ALTER statements you need to run.
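Each row of the result is one statement to execute, e.g. for an assumed table mytable (the generated text has no trailing semicolon, so add one when running it):
ALTER TABLE world.mytable CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;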

UTF8_general_ci collation taking a couple of seconds to process

I have made a table with collation utf8_general_ci:
CREATE TABLE `general` (
  `one` varchar(250) NOT NULL,
  `two` varchar(250) NOT NULL,
  `time` varchar(250) NOT NULL
);
and inserted 1000 records into it with statements like:
insert into general values('ankit','hamitte','13364783'),('john','will','13364783');
Now when I select the records and sort them, it takes a couple of seconds, whereas when I use armscii8_general_ci it loads instantly. What is the main reason for this, and which collation should I use?
I'm guessing you have all three columns in a multi-column index?
If so, then with utf8 they won't fit, and MySQL will have to scan rows. A MySQL index is limited to 767 bytes in InnoDB (1000 for MyISAM), and index entries are fixed width, so utf8 indexes are three times the size of single-byte-encoded ones.
You'll only fit one column into the index with UTF8.
So, with the single-byte encoding, MySQL can utilize the index fully, whereas with the multi-byte encoding, MySQL cannot fully utilize the index.
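To make the arithmetic concrete (a sketch, assuming a hypothetical composite index over all three columns and using the limits quoted above):
-- Hypothetical index for illustration:
-- ALTER TABLE general ADD INDEX idx_all (one, two, time);
-- utf8:     250 chars * 3 bytes = 750 bytes per key part, 2250 bytes in total
-- armscii8: 250 chars * 1 byte  = 250 bytes per key part,  750 bytes in total
-- Only the single-byte version fits; under utf8, even one 750-byte key part
-- is already close to the 767-byte InnoDB ceiling.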
Bear in mind that utf8 needs extra space to hold multi-byte characters: in MySQL's utf8, a single character can take up to 3 bytes.
As a result, comparing utf8 strings can take longer than comparing single-byte strings.
This does not apply to plain latin1 symbols, which are still stored as single bytes.
Please post the full CREATE TABLE statement, as I suspect the problem is in the indexes. Just run
SHOW CREATE TABLE general;
and post the output here.
BTW: sorting is a very time-consuming operation for a DB, so you have to use indexes to make it fast.
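For instance, if the slow query sorts on a single column, an index on just that column still fits even under utf8 (a sketch; the actual ORDER BY column is an assumption):
-- Sketch, assuming the query looks like: SELECT * FROM general ORDER BY one;
ALTER TABLE general ADD INDEX idx_one (`one`);
-- utf8 key size: 250 chars * 3 bytes = 750 bytes, still under the 767-byte limit.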