Insert Japanese characters into a latin1_swedish_ci collated MySQL table column

Japanese characters are getting replaced by question marks (???). I am not allowed to change the collation of the table/column. How can I insert these values?
MariaDB [company]> show full columns from test_table_latin1;
+-------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type        | Collation         | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| id    | int(5)      | NULL              | YES  |     | NULL    |       | select,insert,update,references |         |
| data  | varchar(20) | latin1_swedish_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+-------+-------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
2 rows in set (0.00 sec)
MariaDB [company]> insert into test_table_latin1 values (4,'Was sent 検索キーワード - 自然');
Query OK, 1 row affected, 1 warning (0.00 sec)
MariaDB [company]> select * from test_table_latin1 where id=4;
+------+----------------------+
| id   | data                 |
+------+----------------------+
|    4 | Was sent ??????? - ? |
+------+----------------------+
1 row in set (0.00 sec)

Japanese data is already there
It can't be, or if it is, it is garbled beyond recognition. For one thing, the DB throws a warning if you try (INSERT INTO test_table_latin1 (data) VALUES ('キーワード');): "Incorrect string value: '\xE3\x82\xAD\xE3\x83\xBC...' for column 'data'".
Same if you force it (CONVERT('キーワード' USING latin1)): you get the question marks, as the server does the best it can with an impossible request. It tried to warn you when you were doing it accidentally, but now that you're doing it explicitly it complies and just marks the problem spots with '?'. The data is lost: the Japanese is no longer there, and there is nothing that can convert ????? back to キーワード.
The best of the horrible options is pretending all is well: INSERT INTO test_table_latin1 (data) VALUES (CONVERT('キーワード' USING binary)), which stores the raw UTF-8 bytes and displays as mojibake. Total garbage, but garbage that can be converted back to the original: SELECT CONVERT(CONVERT(data USING binary) USING utf8) FROM test_table_latin1; should give you 'キーワード'. The problem is that this only works when there is no actual Swedish in the column: either every byte above 0x7F belongs to a UTF-8 sequence (which genuine latin1 characters do not), or, once real latin1 characters are mixed in, the stored value is no longer valid UTF-8 and you cannot convert back. So it's again a very bad option.
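The round trip can be sanity-checked in plain Python, with no MySQL involved; this is only an illustration of why the binary trick is reversible:

```python
# Plain-Python illustration of why the binary trick round-trips:
# latin1 assigns a character to every byte value 0x00-0xFF, so UTF-8
# bytes survive being misread as latin1 and can be re-encoded to
# recover the original text.
original = 'キーワード'

utf8_bytes = original.encode('utf-8')    # what CONVERT(... USING binary) stores
garbage = utf8_bytes.decode('latin-1')   # how a latin1 column "reads" it

# Client-side equivalent of CONVERT(CONVERT(data USING binary) USING utf8):
recovered = garbage.encode('latin-1').decode('utf-8')
assert recovered == original
```

The same logic explains why mixed Swedish text breaks it: genuine latin1 characters produce byte sequences that are no longer valid UTF-8, and the final decode fails.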
Finally, you could make your own way of signifying "treat this part differently", like "Was sent [[Base64:UTF8:5qSc57Si44Kt44O844Ov44O844OJ]] - [[Base64:UTF8:6Ieq54S2]]" and decode it on the client.
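A sketch of that client-side scheme in Python; the [[Base64:UTF8:...]] marker format is invented for this answer, not any standard:

```python
import base64
import re

# Marker format is made up: [[Base64:UTF8:<base64 of UTF-8 bytes>]]
MARKER = re.compile(r'\[\[Base64:UTF8:([A-Za-z0-9+/=]+)\]\]')

def encode_segment(text):
    """Wrap a non-latin1 segment in a marker that is safe to store in latin1."""
    b64 = base64.b64encode(text.encode('utf-8')).decode('ascii')
    return '[[Base64:UTF8:%s]]' % b64

def decode_markers(stored):
    """Expand all markers back to the original text on the client."""
    return MARKER.sub(
        lambda m: base64.b64decode(m.group(1)).decode('utf-8'),
        stored)

stored = 'Was sent %s - %s' % (encode_segment('検索キーワード'), encode_segment('自然'))
assert decode_markers(stored) == 'Was sent 検索キーワード - 自然'
```

The stored value is pure ASCII, so the latin1 column accepts it without warnings; the cost is that every client has to know about the markers.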
All of these are bad, bad alternatives to the single correct one: make the column Unicode. I understand that you might be unable to do so (company policy, legacy, compatibility, whatever), but that doesn't change the fact that anything else is ill-suited to the multicultural world we live in.

Related

PuTTY outputs weird stuff when selecting in MySQL

I've encountered a strange problem when using PuTTY to run the following MySQL query: select * from gts_camera
The output seems extremely weird:
As you can see, PuTTY outputs loads of "PuTTYPuTTYPuTTY..."
Maybe it's because of the table's structure:
mysql> describe gts_kamera;
+---------+----------+------+-----+-------------------+----------------+
| Field   | Type     | Null | Key | Default           | Extra          |
+---------+----------+------+-----+-------------------+----------------+
| id      | int(11)  | NO   | PRI | NULL              | auto_increment |
| datum   | datetime | YES  |     | CURRENT_TIMESTAMP |                |
| picture | longblob | YES  |     | NULL              |                |
+---------+----------+------+-----+-------------------+----------------+
This table stores some big pictures and their date of creation.
(The weird ASCII characters you can see on top of the picture are its content.)
Does anybody know why PuTTY outputs such strange stuff, and how to solve/clean this?
Afterwards I can't type any other commands; I have to reopen the session.
Sincerely,
Michael.
The reason this happens is the content of the column (you have a column defined as longblob). It may contain bytes that PuTTY does not interpret as text, so the terminal breaks exactly as you describe.
There is a configuration option that may help, though.
You can also avoid selecting the blob columns:
select id, datum from gts_camera;
Or, if you still want to see the data, use the MySQL function HEX:
select id, datum, HEX(picture) as pic from gts_camera;
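The same idea works client-side; below is a rough Python equivalent of what HEX() gives you, truncated for display (the helper name is made up):

```python
def hex_preview(blob, limit=16):
    """Render binary data as uppercase hex, the way MySQL's HEX() does,
    truncated so wide blobs don't flood the terminal. Hypothetical helper."""
    h = blob.hex().upper()
    return h[:limit * 2] + ('...' if len(blob) > limit else '')

# A PNG header rendered safely - no control bytes reach the terminal:
assert hex_preview(b'\x89PNG\r\n') == '89504E470D0A'
```

Hex output is pure ASCII, so nothing in it can confuse the terminal, which is exactly why HEX() is the safe way to peek at blobs.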

SQLAlchemy/MySQL binary blob is being utf-8 encoded?

I'm using SQLAlchemy and MySQL, with a files table to store files. That table is defined as follows:
mysql> show full columns in files;
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| Field   | Type         | Collation       | Null | Key | Default | Extra | Privileges                      | Comment |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
| id      | varchar(32)  | utf8_general_ci | NO   | PRI | NULL    |       | select,insert,update,references |         |
| created | datetime     | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| updated | datetime     | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| content | mediumblob   | NULL            | YES  |     | NULL    |       | select,insert,update,references |         |
| name    | varchar(500) | utf8_general_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+---------+--------------+-----------------+------+-----+---------+-------+---------------------------------+---------+
The content column of type MEDIUMBLOB is where the files are stored. In SQLAlchemy that column is declared as:
__maxsize__ = 12582912 # 12MiB
content = Column(LargeBinary(length=__maxsize__))
I am not quite sure about the difference between SQLAlchemy's BINARY type and LargeBinary type. Or the difference between MySQL's VARBINARY type and BLOB type. And I am not quite sure if that matters here.
Question: Whenever I store an actual binary file in that table, i.e. a Python bytes or b'' object, I get the following warning:
.../python3.4/site-packages/sqlalchemy/engine/default.py:451: Warning: Invalid utf8 character string: 'BCB121'
cursor.execute(statement, parameters)
I don't want to just ignore the warning; the files seem to be intact. How do I handle this warning gracefully, and how can I fix its cause?
Side note: This question seems to be related, and it seems to be a MySQL bug that it tries to convert all incoming data to UTF-8 (this answer).
Turns out that this was a driver issue. Apparently the default MySQL driver stumbles over Python 3 and utf8 support. Installing cymysql into the virtual Python environment resolved the problem and the warnings disappeared.
The fix: Find out if MySQL connects through socket or port (see here), and then modify the connection string accordingly. In my case using a socket connection:
mysql+cymysql://user:pwd@localhost/database?unix_socket=/var/run/mysqld/mysqld.sock
Use the port argument otherwise.
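For reference, here is how such a connection string (hypothetical credentials and socket path, mirroring the line above) is passed to SQLAlchemy; treat this as a configuration sketch, not a driver recommendation:

```python
from sqlalchemy import create_engine

# Hypothetical credentials and paths - substitute your own.
# Socket connection:
engine = create_engine(
    "mysql+cymysql://user:pwd@localhost/database"
    "?unix_socket=/var/run/mysqld/mysqld.sock"
)

# TCP connection (the "port argument" case) instead:
engine_tcp = create_engine("mysql+cymysql://user:pwd@localhost:3306/database")
```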
Edit: While the above fixed the encoding issue, it gave rise to another one: blob size. Due to a bug in CyMySQL blobs larger than 8M fail to commit. Switching to PyMySQL fixed that problem, although it seems to have a similar issue with large blobs.
Not sure, but your problem might have the same roots as the one I had several years ago in Python 2.7: https://stackoverflow.com/a/9535736/68998. In short, MySQL's interface does not let you be certain whether you are working with a true binary string or with text in a binary collation (used because of the lack of a case-sensitive utf8 collation). Therefore, a MySQL binding has the following options:
return all string fields as binary strings, and leave the decoding to you
decode only the fields that do not have a binary flag (so much fun when some of the fields are unicode and others are str)
have an option to force decoding to unicode for all string fields, even true binary
My guess is that in your case, the third option is somewhere enabled in the underlying Mysql binding. And the first suspect is your connection string (connection params).
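The failure mode behind the warning can be reproduced in plain Python: the bytes quoted in the message are simply not valid UTF-8.

```python
# Truly binary data is usually not valid UTF-8, so forcing a decode
# fails outright - a binding that "helpfully" decodes every field
# hits the same wall, which is what the driver warning reflects.
blob = bytes.fromhex('BCB121')  # the byte sequence from the warning

try:
    decoded = blob.decode('utf-8')
except UnicodeDecodeError:
    decoded = None  # 0xBC is not a legal UTF-8 start byte

assert decoded is None
```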

MySQL Screwed Up Output

I'm seeing some very strange output from MySQL, and I don't know whether it's my console or my data that's causing this. Here are some screenshots:
Any ideas?
edit:
mysql> describe transformed_step_a1_sfdc_lead_history;
+-------------------+--------------+------+-----+---------+-------+
| Field             | Type         | Null | Key | Default | Extra |
+-------------------+--------------+------+-----+---------+-------+
| old_value         | varchar(255) | YES  |     | NULL    |       |
| new_value         | varchar(255) | YES  |     | NULL    |       |
+-------------------+--------------+------+-----+---------+-------+
Max
To verify whether there are any control characters, you can use the --raw option; see http://dev.mysql.com/doc/refman/5.5/en/mysql-command-options.html#option_mysql_raw
It's impossible to tell exactly what the problem is from your screenshots, but the text in your database contains control characters. The usual culprit is CR, which moves the cursor back to the beginning of the line and starts overwriting the text already there.
If you have programmatic access to your database, you can dump the values with control characters rendered as printables, so you can see what is actually in there.
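A minimal Python sketch of that dumping approach; show_controls is a hypothetical helper, not a MySQL feature:

```python
# CR (\r) returns the cursor to column 0, so later text overwrites
# what is already on the line; repr() makes the hidden byte visible.
value = 'new_value\rX'
print(value)        # the terminal renders this misleadingly
print(repr(value))  # the \r is now visible

def show_controls(s):
    """Render control characters as printable escapes (hypothetical helper)."""
    return ''.join(c if c.isprintable() else repr(c)[1:-1] for c in s)

print(show_controls('bad\rvalue'))  # bad\rvalue
```

Dumping values this way makes it obvious which rows contain the offending bytes, so you can clean them up with a targeted UPDATE.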

Concatenate a string in TEXT data type mysql

I have a table named testing which contains a column of type MEDIUMTEXT.
mysql> desc testing;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | tinyint(4) | YES  |     | NULL    |       |
| data  | mediumtext | YES  |     | NULL    |       |
+-------+------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
And the table content is like--
mysql> select * from testing;
+------+--------------------------------+
| id   | data                           |
+------+--------------------------------+
|    1 | This is the first data entered.|
+------+--------------------------------+
1 row in set (0.00 sec)
In the data column of the row with id=1, I want to concatenate another string, e.g. " Concat me ", to make it look like "This is the first data entered. Concat me"
I could do it by using
UPDATE testing SET data=CONCAT(data,' Concat me') WHERE id=1;
but I think that would read the whole field first, perform the concatenation, and finally write the new string back in place of the previous one. If the text is very long, that would take a lot of time.
For example, if there are 15 MB of text and 10 bytes to concatenate, then 15 MB would be read, 10 bytes appended, and 15 MB + 10 bytes written back.
Is there any other method by which the string to be concatenated is simply appended at the end, instead of the complete value being replaced, so that only 10 bytes are written to the database?
Maybe I am wrong and MySQL will execute the command efficiently anyway.
Your UPDATE statement is perfectly fine. If you have an index on your data column then yes, the update might be slower, in which case you could disable or drop the index before the update and add it back after the update completes.

viewing mysql blob with putty

I am saving a serialized object to a MySQL database blob.
After inserting some test objects and then trying to view the table, I am presented with lots of garbage and "PuTTYPuTTY" several times.
I believe this has something to do with character encoding and the blob containing strange bytes.
I just want to check whether this is going to cause problems with my database, or whether it is only a problem with PuTTY displaying the data.
Description of the QuizTable:
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
| Field       | Type        | Collation         | Null | Key | Default | Extra          | Privileges                      | Comment                                                                                                           |
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
| classId     | varchar(20) | latin1_swedish_ci | NO   |     | NULL    |                | select,insert,update,references | FK related to the ClassTable. This way each Class in the ClassTable is associated with its quiz in the QuizTable. |
| quizId      | int(11)     | NULL              | NO   | PRI | NULL    | auto_increment | select,insert,update,references | This is the quiz number associated with the quiz.                                                                 |
| quizObject  | blob        | NULL              | NO   |     | NULL    |                | select,insert,update,references | This is the actual quiz object.                                                                                   |
| quizEnabled | tinyint(1)  | NULL              | NO   |     | NULL    |                | select,insert,update,references |                                                                                                                   |
+-------------+-------------+-------------------+------+-----+---------+----------------+---------------------------------+-------------------------------------------------------------------------------------------------------------------+
What I see when I try to view the table contents:
select * from QuizTable;
questionTextq ~ xp sq ~ w
t q1a1t q1a2xt 1t q1sq ~ sq ~ w
t q2a1t q2a2t q2a3xt 2t q2xt test3 | 1 |
+-------------+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+
3 rows in set (0.00 sec)
I believe you can use the HEX function on blobs as well as on strings. You can run a query like this:
Select HEX(quizObject) From QuizTable Where....
PuTTY is reacting to what it thinks are terminal control sequences in your output stream. These sequences allow the remote host to change something about the local terminal without redrawing the entire screen, such as setting the title, positioning the cursor, or clearing the screen. In particular, PuTTY answers the ENQ control byte (0x05) with its answerback string, which defaults to "PuTTY"; that is where the repeated "PuTTYPuTTY" comes from.
It just so happens that binary data 'displayed' this way very often contains such bytes.
You'll get the same reaction when catting binary files.
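If you do have client-side access to the blob, a rough pre-flight check (hypothetical helper) can flag the bytes that terminals interpret as control sequences before you print anything:

```python
# Count the control bytes in a blob before deciding to print it.
# 0x05 (ENQ) triggers PuTTY's "PuTTY" answerback; 0x1B (ESC) starts
# most terminal escape sequences.
def terminal_unsafe_bytes(blob):
    """Return the set of control bytes present in the data,
    excluding tab/newline/CR, which are harmless. Hypothetical helper."""
    safe = {0x09, 0x0A, 0x0D}
    return {b for b in blob if b < 0x20 and b not in safe}

sample = b'\x05\x1b[2Jharmless text'
assert terminal_unsafe_bytes(sample) == {0x05, 0x1b}
```

If the set is non-empty, display the value through HEX() (or a client-side hex dump) instead of raw.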
A blob column completely ignores any character-encoding settings you have; it's really intended for storing binary objects like images or zip files.
If this field will only ever contain text, I'd suggest using a text column instead.