SQL fails with a non-ascii char - mysql

I've got a SQL operation with the word "şalom" which contains a non-ASCII character.
And the statement ( which I posted below ) runs totally fine:
UPDATE `myTable1` SET `description`='The topic for this learning plan starts with the \"ÅŸ\" ( sh ) letter which is a non-ASCII char. ', `topic`='ÅŸalom'' WHERE `RecID` = '1308'"
however, right after this statement, I have to run another one to put the word "şalom" in another table but that one FAILS. The reported error is follows:
INSERT INTO `myTable2` (`TOPIC_Name`, `TOPIC_AddedOn`) VALUES ('[Åÿalom]', '2018-04-12').
LAST SQL ERROR:
HY000, 1366, Incorrect string value: '\xC5\xFFalom...' for column 'Topic_Name' at row 1
We've checked the table structures and the field structures and saw that they are identical and designed to accept utf-8.
We cannot figure out why one statement passes but the other one chokes. Any ideas?

ÅŸ is Mojibake for ş. When treated as latin1, ÅŸ is hex C5FF. See Trouble with UTF-8 characters; what I see is not what I stored for discussion of "Best practice" and "Mojibake".

Related

Invalid unicode character causing MySQL string error

I need to add a record to our MySQL database (via Omeka) that includes an invalid unicode character (this one)
The error message I get via Omeka is:
Mysqli statement execute error : Incorrect string value: '\xF0\xAA\xA8\xA7\xE7\x94...' for column 'text' at row 1
The database field is longtext with collation utf8_unicode_ci. There are already a lot of records in this table and I'm not quite sure what I should change without affecting the other data already in it. Suggestions?
ALTER TABLE tbl CONVERT TO utf8mb4;
Meanwhile, the text for that row in that column is probably truncated or the whole row is missing.
As best as I can tell, F0AAA8A7 is not yet assigned, but I think it is in the area of Chinese characters, not Emoji, which also need utf8mb4. It is Unicode "codepoint" 2AA27.

Store AESCipher object in mySQL

I'm encrypting some user data (no passwords) using Crypto.Cipher AES.
The return is a AESCipher of the form:
b'o\xab\xdd\x19\xaat\xfcIAN\xd2\x00\xe9'
sometimes it produces spaces and non hex representations.
b"N%?\x91\xe8'J\xc0\x10 p"
b'QV8>K\xd8\xfa\x9a\x05%\xe8LJp\xd0gf'
When I try to insert these values into the SQL table created as:
data_encrypted VARBINARY(40)
I get the following warning:
Warning: (1300, "Invalid utf8 character string: 'ABDD19'")
Seems like it's trimming the binary array, when I Query the inserted rows in the table, the row was effectively inserted but only the first bytes of the array, from b'o\xab\xdd\x19\xaat\xfcIAN\xd2\x00\xe9' it inserts only the 'o'.
Do I have to specify something else in the format?
Thanks
Adding a _binary solved the warning
self.cur.execute("INSERT IGNORE INTO members VALUES(%s,_binary %s...
And I'm also formatting the array to be hex represented, it looks better in the MySQL table and the decryption still works ok binascii.b2a_hex

How to edit invalid UTF-8 strings in mysql database

I have some utf-8 strings in my database, they are stored as varbinary. (Generally, it's mediawiki database, but that's not important, i think). I found that some strings are not in a good shape, then i make
SELECT log_comment, CONVERT( log_comment
USING utf8 ) AS
COMMENT
FROM `logging`
WHERE log_id = %somevalue%
i have output table in phpmyadmin like this:
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| d093d09ed0a1d0a220d0a020d098d0a1d09e2fd09cd0add09a20393239342d39332e20c2abd098d0bdd184d0bed180d0bcd0b0d186d0b8d0bed0bdd0bdd0b0d18f20d182d0b5d185d0bdd0bed0bbd0bed0b3d0b8d18f2e2e2e |NULL |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
What i need is to make this string readible, or upload new string with correct data. But this is varbinary field, how can i manage data inside it?
UPD:
found that phpmyadmin automatically added 2e2e2e for three dots at the end of each line - they were too long to show. Original binary data are, if somebody interested,
d09fd0a02035302e312e3031392d3230303020d09ed181d0bdd0bed0b2d0bdd18bd0b520d0bfd0bed0bbd0bed0b6d0b5d0bdd0b8d18f20d0b5d0b4d0b8d0bdd0bed0b920d181d0b8d181d182d0b5d0bcd18b20d0bad0bbd0b0d181d181d0b8d184d0b8d0bad0b0d186d0b8d0b820d0b820d0bad0bed0b4d0b8d180d0bed0b2d0b0d0bdd0b8d18f20d182d0b5d185d0bdd0b8d0bad0be2dd18dd0bad0bed0bdd0bed0bcd0b8d187d0b5d181d0bad0bed0b920d0b820d181d0bed186d0b8d0b0d0bbd18cd0bdd0bed0b920d0b8d0bdd184d0bed180d0bcd0b0d186d0b8d0b820d0b820d183d0bdd0b8d184d0b8d186d0b8d180d0bed0b2d0b0d0bdd0bdd18bd1
anyway those strings contains non-utf symbols at the line end, as it seems from
SELECT log_comment,CAST(log_comment AS CHAR CHARACTER SET utf8) AS COMMENT
FROM `logging`
WHERE log_id = %somevalue%
because last symbol is � - for me it seems as black rhomb with white question in it, and last 20-30 characters are missing
SELECT log_comment,CAST(log_comment AS CHAR CHARACTER SET utf8) AS COMMENT
FROM `logging`
WHERE log_id = %somevalue%
As it was said in Joni's comment,
"The length of the text is exactly 255 bytes, which is the limit of a
MySQL tinytext/tinyblob field, and also often used by programmers as
the size for varchar/varbinary. It looks like your original data has
been clipped. The last D1 in your original data starts a new UTF-8
character, but the second byte is missing; that's why the last
character is broken in the converted text."
In the MediaWiki DB in the field [log_comment] of the table [logging] should be stored headers of pages that was altered. Some of them appeared to be longer than 255 symbols, so while being logged they were clipped. That confused me; I thought that there was kind of database error, so i should just alter those strings - add to them missing symbols. Now i see it is slightly possible, so i just can gather necessary information from other fields.
try this:
SELECT log_comment,
CONVERT(log_comment,VARCHAR(65535)) AS COMMENT
FROM `logging`
WHERE log_id = %somevalue%

String or Binary Data Would Be Truncated: NVARCHAR(MAX), SQL Server 2008

I've got a column called description of type NVARCHAR(MAX) - biggest you can have. I need to return this field with quotes around it, so I'm trying to
SELECT QUOTENAME(description, '"')
This doesn't work - I get a "string or binary data would be truncated error."
My googling tells me that this problem can be solved by using SET ANSI_WARNINGS OFF, but if I do that, I still get the same error anyway.
Normally I would just pull the values into a temp table and use a field that is two characters bigger than the field I'm pulling in, thus ensuring that the QUOTENAME function won't cause any problems. How do I make a column two characters bigger than MAX, though?
QUOTENAME is a function intended for working with strings containing SQL Server identifier names and thus only works for strings less than or equal to the length of sysname (128 characters).
Why doesn't SELECT '"' + description +'"' work for you?

How can I process data to avoid MySQL "incorrect string value" error?

I am trying to use a Rake task to migrate some legacy data from MS Access to MySQL. I'm working on Windows XP, using Ruby 1.8.6.
I have the encoding for Rails set as "utf8" in database.yml.
Also, the default character set for MySQL is utf8.
99% of the data is coming in fine, but every now and then I'll get a column value that gives me a error something like this:
Mysql::Error: Incorrect string value: '\x92 Comm...' for column 'name'
at row 1:
INSERT INTO `organizations` ( [...] )
VALUES('Lawyers’ Committee', [...] )
It looks as though the thing that's giving MySQL trouble is the apostrophe immediately after the "s" in the word "Lawyers".
Here's another one...
Mysql::Error: Incorrect string value: '\x99 aoc' for column 'department'
at row 1:
INSERT INTO `addresses`
[...]
'TRInfo™ aoc'
[....]
Looks like it's choking on the "TM" after "TRInfo".
Is there any Ruby or Rails method that I can run the data through to cleanse from it any characters that MySQL will choke on?
Ideally, it would be great to replace them with more palatable characters -- replace the apostrophe with a single quote and the TM symbol with the string "(TM)".
Or, if I could somehow configure MySQL to store those characters as-is without errors that would be great too.
It looks like your input data is not in utf-8.
I did a little investigating and the styled quote used in Lawyer's is encoded as \x92 in the Windows-1252 encoding, but would be nonsense for utf-8 (when I decoded it and encoded it into utf8, I got \xe2\x80\x99).
Thus you will need to convert the input strings from windows-1252 to utf-8 (or to unicode).
I had the same problem when putting contents of UTF-16 encoded files - which usually store one character per 16bit block - into mysql tables with java. The problem was that the UTF-16 encoded string contained so called surrogate pairs. It means two consecutive 16bit UTF-16 blocks encode one special character but cannot be translated into a corresponding UTF-8 encoding individually. See wikipedia for further explanation.
The solution was to simply replace these characters with spaces. This is the character range you might want to strip out of your string: U+D800–U+DFFF
In general, this happens when you insert strings to columns with incompatible encoding/collation.
I got this error when I had TRIGGERs, which inherit server's collation for some reason.
And mysql's default is (at least on Ubuntu) latin-1 with swedish collation.
Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:
/etc/mysql/my.cnf :
[mysqld]
character-set-server=utf8
default-character-set=utf8
And this must list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And some of variables listed by this should also have utf-8-* (no latin-1 or other encoding):
show variables like 'char%';
It looks like your old database is in one string format (utf8?) and your rails is expecting something else. If you input is in utf8, have you tried configuring your rails to support it?
I encountered the same problem today.
After tried many times, I found out the reason and fix it at last.
For applications that store data using the default MySQL character set and collation (latin1, latin1_swedish_ci), so you need to specify the character set and collation to utf8/utf8_general_ci when your create your database or table.
e.g.:
$sql = "CREATE TABLE " . $table_name . " (
id mediumint(9) NOT NULL AUTO_INCREMENT,
bookname varchar(128) NOT NULL,
author varchar(64) NOT NULL,
PRIMARY KEY (id),
KEY (bookname)
)CHARACTER SET utf8 COLLATE utf8_general_ci;";
Reference:
《mysql create table problem? SOLVED!!!!!!!!!!!》
http://forums.mysql.com/read.php?121,193883,193883
《10.1.5. Configuring the Character Set and Collation for Applications》
http://dev.mysql.com/doc/refman/5.0/en/charset-applications.html
Hoping this can help you.
Adding binary before the weirdcolumn solves the problem.
In my case, I have an update trigger on tableA to insert data into other table.
There are some special characters in column weirdcolumn, and the update failed with message: "ERROR 1366 (HY000): Incorrect string value: '\xE7....'"
After I dig in a lot, I found the solution by adding binary before the string column name, or using cast(weirdcolumn as binary);
Hope this can help.
I had the same issue importing data from SQL Server to MySql using Php.
My solution was utf8_encode() when inserting into MySql and use utf8_decode() when retrieving from MySql to display into the browser.
Here you have my FULL code, that works good.
//For string values
$Gro2=(is_null($row["GrpNm"]))?"NULL":"\"".mysql_escape_string(utf8_encode($row["GrpNm"]))."\"";
$sqlMy ="INSERT INTO `tbl_name` VALUES ($Gro2)";
Please note: For new projects use
mysqli_escape_string()
link