Where did I go wrong with this unicode field in MySQL?

Where did I go wrong with this unicode field in MySQL? - mysql

I have a table with a field which contains strings in my MySQL database.
The MySQL version is 5.0.51a. The default character set for the table is 'utf8'.
Many of the strings have unicode characters such as \xae and \u21222 (registered symbol and trademark symbol respectively).
For example, suppose I have a row with a field this value:
"Bing® Blang™ Blaow"
The default character set of my mysql command line client is "latin1".
If I issue a SELECT statement in the mysql client program from the command line without specifying a character set, the output of the title shows up like so:
"Bing® Blang Blaow"
The (R) symbol is correct but the (TM) symbol is missing. If I cut and paste this string from the console into TextMate, the (TM) symbol appears, but is half-way behind the g in the word "Blang".
I am assuming that the half-way-behind-the-g thing is a just a display error in TextMate (though if anyone can provide further detail that'd be great, but that's not really the important part).
The main thing I am inferring from the its-there-after-you-cut-and-paste behavior is that the data is in the database but there's something wrong with some sort of character set setting somewhere.
If I override the default encoding of the mysql client on the command line like so:
mysql --default-character-set=utf8
Then do the same select, the string comes out as:
"BingÂ® BlangÂ™ Blaow"
which is to say that both the (R) and (TM) symbols appear and are in the right place but both are preceded by the unicode character \xae which is an A with a circumflex on top.
(Incidentally this is also how the data is displayed when I pull it out using python and display it on a web page, which is what my real problem is).
Anyway, what is going on here? Everything we have done recently has used UTF8 everywhere possible, but it's possible that some of these rows were inserted prior to that change which means they would've been using the latin1 default... however neither encoding seems to produce the right result?
If the rows were inserted when the default encoding on the table was latin1 before it was switched to utf8, then the encoding was switched (via alter table..) then would the encoding have actually been updated? Should one of the encodings work now? Will unicode ever stop kicking my ass?

There are quite a number of issues here:
About the characters
You indicate that the text has characters U+AE and U+2122 (® and ™ respectively). However, the results imply that the text has U+99 as the character after "Blang": When you set MySQL to output UTF8, then you see this "Â™" -- which is the UTF8 sequence for U+99 displayed on a terminal that is interpreting this byte stream as Windows-1252.
U+99 probably isn't what you wanted: In Unicode, that is an extended control character with no graphic representation. It just so happens that in Windows-1252, that 0x99 is the encoding of the trademark symbol (U+2122).
(Please note that both MySQL and most web browsers have a common, "broken" behavior of using Windows-1252 when you choose Latin1. Sigh.)
What's probably wrong
Your terminal isn't operating in the right character set. It is clearly operating in Windows-1252.
Programs should be connecting to the database in UTF-8. You can do that in the command line, as you've found, or by executing the statement SET NAMES utf8_general_ci; in your database handle before doing anything else. Some other database APIs may have other ways of doing this, but there is no generic way for all SQL engines. SET NAMES ... is specific to MySQL, but sets all the required character set variables (there are three!) at once.
The process that is inserting data into the database is taking user input and not correctly converting it from Windows-1252 into UTF-8 before inserting. This is how you got a U+99 into your database. Since I don't know how you are getting that data, I'm not sure what to fix, but here are several possibilities:
If the data comes from a web page form, be sure the page with the form is served in UTF-8, is properly marked as such (via the MIME Type, and the <meta> tag.) Be sure also, that the <form> tag is not specifying a different character set.
When converting the data, be sure that you use iconv or similar libraries to convert from the input character set to UTF-8. Even if you think the input is Latin1, do not try to do this by hand (for example, by zero expanding every byte to 16-bits then claiming this is UTF-16 - that won't work for Windwos-1252!). Make absolutely certain that you know the character set of the source data. In particular, be sure to know if it is Latin1 or Windows-1252.
Instead of converting the user input, you could connect to the database in character set of the user input, and then just insert the raw byte data you get from the user. However, you must be sure to only do insertions this way: reading back data from the data with the user's character set in effect will lose information if other rows have data that can't be represented in that character set. It is possible to set up a MySQL connection so that you issue statements in one character set and read results back in another... But it isn't for the faint of heart, and future programmers will likely go nuts trying to understand why the code does this.
If, when you pull the data out with Python and display it in a web page, you see the string "Â™", then that is indication that your are pulling the data out of the database correctly as UTF-8, but then putting it into a web page that is not correctly identified as UTF-8. Probably it is just defaulting to Latin1, which as noted above will really be Windows-1252.
Nonetheless, even if you fix the display, note that the data base has bad data in it, since U+99 isn't really the trademark symbol in a UTF-8 column. You'll need to clean up your data, by reading all the data, and replacing any characters in the range of U+80 through U+9F with what they were likely to have been, assuming the data was really Windows-1252. If you're not certain what character set the data was in originally -- then this data is, alas, just junk.
About changing character sets of tables
Converting the character set and collation of the table after inserting data will convert the columns, but, of course, any data already inserted will have already lost whatever characters the original character set couldn't represent.
Be careful to note the difference between ALTER TABLE foo CONVERT TO CHARACTER SET ... and ALTER TABLE foo CHARACTER SET ... The later only changes the default character set for the table, and will not change any columns, even if they were set to the default at creation. (MySQL only uses the defaults at column creation time, it doesn't remember that a given column is "defaulted" not does it keep it in sync with the table's default.)

I think it has to do with the settings of the mysql connection in your Python code.
try setting conn.character_set_name or something like that, depends on the mysql connection lib you are using.
in case of MySQLdb it should be smthng like this:
def character_set_name(*args, **kwargs): return 'utf-8'
conn.character_set_name = new.instancemethod(character_set_name, conn, conn.__class__)

Could it be that some of the columns have an explicitly different character set than the table default?

something like this...?
ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci

Related

Why is 'João' coming out as 'Jo\u00e3o'?

I have this column in my database table:
`data` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci DEFAULT NULL
Names like 'João' are inserted. But they're showing up as Jo\u00e3o. E.g.:
{"4":"jo\u00e3o da silva"}
I tried changing the character set and the collation, but it didn't seem to help. What can I do in order to fix it?
My database "character set" settings:

First of all, \u00e3 is not generated by MySQL. It is, however, optionally generated by PHP's json_encode(). Be sure to use JSON_UNESCAPED_UNICODE in the second argument to that function.
Meanwhile, those codes are properly interpreted by web browsers, so you won't notice issues there. And reading and writing from and to a database won't change them. But note that any backslash needs to be escaped when INSERTing into a database table.
For use in MySQL tables, I prefer to have the connection and server settings consistently set at utf8mb4 so that Unicode stuff simply comes and goes without conversion.
I agree with "never trust your screen". About the only way to see what is actually stored in the database is to use SELECT HEX(col)... For ã:
UTF-8 (utf8mb4): Hex: C3A3
latin1: Hex: E3
But, for \u00e3, the hex would be 5C7530306533
In PHP, there is bin2hex().

Unknown characters at the end of the word

I am pulling data from database into R. I use the following commands to do this:
drv = dbDriver("MySQL")
con = dbConnect(drv,<credentials>)
dbSendQuery(con,"SET character_set_results = utf8;")
<code to pull data>
The data is stored in UTF-8 encoding in the database.I pull a dataframe with a column containing words. Once i pull the data, i convert the encoding to ASCII//TRANSLIT using iconv(x,"UTF-8","ASCII//TRANSLIT") Everything is working fine except that for few words i see an extra character appearing after i change the encoding. For example when u look in database you see abc and when u import, you get abc. But once you change the encoding to ASCII//TRANSLIT this word changes to abc?. I used https://www.branah.com/unicode-converter to check for encoding. I copied word abc in first box named unicode text and i see abcâ¬ in box named utf-8 text.What are these special characters and how to use them in regex to filter these out?

SET character_set_results = utf8 is probably not sufficient. Change to SET NAMES utf8mb4.
What do you mean by "pull the data"? Is it put into a database table? If so, please provide SHOW CREATE TABLE.
To investigate strange characters, do SELECT HEX(...) ... to see what is actually there. From that, we might be able to deduce what happened.
It looks like â¬ is part of the Mojibake for one of these.
⬀,⬁,⬂,⬃,⬄,⬅,⬆,⬇,⬈,⬉,⬊,⬋,⬌,⬍,⬎,⬏,⬐,⬑,⬒,⬓,⬔,⬕,⬖,⬗,⬘,⬙,⬚,⬛,⬜,⬝,⬞,⬟,⬠,⬡,⬢,⬣,⬤,⬥,⬦,⬧,⬨,⬩,⬪,⬫,⬬,⬭,⬮,⬯,⬰,⬱,⬲,⬳,⬴,⬵,⬶,⬷,⬸,⬹,⬺,⬻,⬼,⬽,⬾,⬿
â¬, when treated as latin1, is hex E2AC
⬀ when treated as UTF-8 (utf8mb4), is hex E2AC80
⬁ is hex E2AC81,
etc
The causes of Mojibake are discussed here.
Instead of trying to filter them out, you should fix the code to preserve them.

How to resolve an incorrect string value error (MySQL)

I have some documents, formatted in XML. I want to store their contents (raw text, formatting preserved) in cells in an SQL table, as LONGTEXT, so that I can simply grab the value of a cell and load it in a webpage later. I am doing this via MySQL Workbench.
However, when I try to apply the additions to my table, I get error 1366: Incorrect string value: \xE2\x80\xAF1, ...
I tried changing the character set to utf-8-general-ci and cp1251, but I keep getting the same errors.
Also, I searched the XML file for the string \xE2\x80\xAF1, but it's not even in the file.
Does anybody know what this string is?
The XML file is only 219KB so I think it should (very) easily fit in a LONGTEXT entry.
Does XML make use of any characters that could cause this error?
Am I missing another cause of the error?

Your code is not a literal text but points to NARROW NO-BREAK SPACE.
With UTF-8 simple characters are coded as one byte, other characters need two bytes.
And there are some characters which need three bytes. These characters tend to lead to such errors.
Find a related question here: freebcp: "Unicode data is odd byte size for column. Should be even byte size"

You need to
specify utf8 (or utf8mb4) for the connection established from your client (Workbench).
declare the column in question to be CHARACTER SET utf8 (or utf8mb4).

mysql - How to save ñ

Whenever I try to save ñ it becomes ? in the mysql database. After some few readings it is suggested that I have to change my jsp charset to UTF-8. For some reasons I have to stick to ISO-8859-1. My database table encoding is latin1. How can I fix this? Please help.

Go to your database administration with MySQL WorkBench for example, put the Engine to InnoDB and the collation to utf8-utf8_general_ci.

You state in your question that you require a ISO-8859-1 backend (latin1), and a Unicode (UTF-8) frontend. This setup is crazy, because the set on the frontend is much larger than that allowed in the database. The sanest thing would be using the same encoding through the software stack, but also using Unicode only for storage would make sense.
As you should know, a String is a human concept for a sequence of characters. In computer programs, a String is not that: it can be viewed as a sequence of characters, but it's really a pair data structure: a stream of bytes and an encoding.
Once you understand that passing a String is really passing bytes and a scheme, let's see who sends what:
Browser to HTTP server (usually same encoding as the form page, so UTF-8. The scheme is specified via Content-Type. If missing, the server will pick one based on its own strategy, for example default to ISO-8859-1 or a configuration parameter)
HTTP Server to Java program (it's Java to Java, so the encoding doesn't matter since we pass String objects)
Java client to MySQL server (the Connector/J documentation is quite convoluted - it uses the character_set_server system variable, possibly overridden by the characterEncoding connection parameter)
To understand where the problem lies, first assure that the column is really stored as latin1:
SELECT character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = :DATABASE
AND table_name = :TABLE
AND column_name = :COLUMN;
Then write the Java string you get from the request to a log file:
logger.info(request.getParameter("word"));
And finally see what actually is in the column:
SELECT HEX(:column) FROM :table
At this point you'll have enough information to understand the problem. If it's really a question mark (and not a replacement character) likely it's MySQL trying to transcode a character from a larger set (let's say Unicode) to a narrower one which doesn't contain it. The strange thing here is that ñ belongs to both ISO-8859-1 (0xF1, decimal 241) and Unicode (U+00F1), so it'd seem like there's a third charset (maybe a codepage?) involved in the round trip.
More information may help (operating system, HTTP server, MySQL version)

Change your db table content encoding to UTF-8
Here's the command for whole DB conversion
ALTER DATABASE db_name DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
And this is for single tables conversion
ALTER TABLE db_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

change your table collate to utf8_spanish_ci
where ñ is not equal to n but if you want both characters to be equal use
utf8_general_ci instead

I try several combinations, but this works for me:
VARCHAR(255) BINARY CHARACTER SET utf8 COLLATE utf8_bin
When data retrieve in dbforge express, shows like:
NIÃ‘A
but in the application shows like:
NIÑA

I had the same problem. Found out that is not an issue about encoding UTF-8 or whatever charset. I imported my data from windows ANSI and all my Ñ and ñ where put in the database perfectly as it should be. Example last names showed on database last_name = "MUÑOZ". I was able to select normally from the database with query Select * from database where last_name LIKE "%muñoz%" and phpmyadmin show me results fine. It selected all "MUÑOZ" and "MUNOZ" without a problem. So phpmyadmin does show all my Ñ and ñ without any problems.
The problem was the program itself. All my characters mention, showed as you describe with the funky "MU�OZ" question mark. I had follow all advice everywhere. Set my headers correctly and tried all my charsets available. Even used google fonts and whatsoever font available to display correctly those last names, but no success.
Then I remembered an old program that was able to do the trick back and forth transparently and peeked into the code to figure it out: The database itself, showing all my special characters was the problem. Remember, I uploaded using windows ANSI encoding. Phpmyadmin did as expected, uploaded all as instructed.
The old program fixed this problem translating the Ñ to its UNICODE HTML Entity: Ñ (see chart here https://www.compart.com/en/unicode/U+00D1 ) a process done back and forth from MySQL to the app.
So you just need to change your database strings containing the letter Ñ and ñ to their corresponding UNICODE to reflect correctly on your browser with UTF charset.
In my case, I solved my issues replacing all my Ñ and ñ for their corresponding UNICODE in all the last names in my database.
UPDATE database_name
SET
last_name = REPLACE(last_name,
'MUÑOZ',
'MUÑOZ');
Now, Im able to display, browse, even search all my correct last names and accents/tildes, proper to spanish language. I hope this helps. It was a pain to figure it out, but an old program solved the problem. Best regards and happy coding !

How to fix "Incorrect string value" errors?

After noticing an application tended to discard random emails due to incorrect string value errors, I went though and switched many text columns to use the utf8 column charset and the default column collate (utf8_general_ci) so that it would accept them. This fixed most of the errors, and made the application stop getting sql errors when it hit non-latin emails, too.
Despite this, some of the emails are still causing the program to hit incorrect string value errrors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)
The contents column is a MEDIUMTEXT datatybe which uses the utf8 column charset and the utf8_general_ci column collate. There are no flags that I can toggle in this column.
Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:
What is causing that error? (yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive)
How can I fix it?
What are the likely effects of such a fix?
One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.

UPDATE to the below answer:
The time the question was asked, "UTF8" in MySQL meant utf8mb3. In the meantime, utf8mb4 was added, but to my knowledge MySQLs "UTF8" was not switched to mean utf8mb4.
That means, you'd need to specifically put "utf8mb4", if you mean it (and you should use utf8mb4)
I'll keep this here instead of just editing the answer, to make clear there is still a difference when saying "UTF8"
Original
I would not suggest Richies answer, because you are screwing up the data inside the database. You would not fix your problem but try to "hide" it and not being able to perform essential database operations with the crapped data.
If you encounter this error either the data you are sending is not UTF-8 encoded, or your connection is not UTF-8. First, verify, that the data source (a file, ...) really is UTF-8.
Then, check your database connection, you should do this after connecting:
SET NAMES 'utf8mb4';
SET CHARACTER SET utf8mb4;
Next, verify that the tables where the data is stored have the utf8mb4 character set:
SELECT
`tables`.`TABLE_NAME`,
`collations`.`character_set_name`
FROM
`information_schema`.`TABLES` AS `tables`,
`information_schema`.`COLLATION_CHARACTER_SET_APPLICABILITY` AS `collations`
WHERE
`tables`.`table_schema` = DATABASE()
AND `collations`.`collation_name` = `tables`.`table_collation`
;
Last, check your database settings:
mysql> show variables like '%colla%';
mysql> show variables like '%charac%';
If source, transport and destination are utf8mb4, your problem is gone;)

MySQL’s utf-8 types are not actually proper utf-8 – it only uses up to three bytes per character and supports only the Basic Multilingual Plane (i.e. no Emoji, no astral plane, etc.).
If you need to store values from higher Unicode planes, you need the utf8mb4 encodings.

The table and fields have the wrong encoding; however, you can convert them to UTF-8.
ALTER TABLE logtest CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
ALTER TABLE logtest CHANGE title title VARCHAR(100) CHARACTER SET utf8 COLLATE utf8_general_ci;

"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:
>>> "\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-2: invalid data
If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European") is the most permissive encoding there is - every byte value is a valid code point.
Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?

I solved this problem today by altering the column to 'LONGBLOB' type which stores raw bytes instead of UTF-8 characters.
The only disadvantage of doing this is that you have to take care of the encoding yourself. If one client of your application uses UTF-8 encoding and another uses CP1252, you may have your emails sent with incorrect characters. To avoid this, always use the same encoding (e.g. UTF-8) across all your applications.
Refer to this page http://dev.mysql.com/doc/refman/5.0/en/blob.html for more details of the differences between TEXT/LONGTEXT and BLOB/LONGBLOB. There are also many other arguments on the web discussing these two.

First check if your default_character_set_name is utf8.
SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";
If the result is not utf8 you must convert your database. At first you must save a dump.
To change the character set encoding to UTF-8 for all of the tables in the specified database, type the following command at the command line. Replace DBNAME with the database name:
mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME
To change the character set encoding to UTF-8 for the database itself, type the following command at the mysql> prompt. Replace DBNAME with the database name:
ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;
You can now retry to to write utf8 character into your database. This solution help me when i try to upload 200000 row of csv file into my database.

Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different.
ALTER TABLE tabale_name MODIFY COLUMN column_name VARCHAR(255)
CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

In general, this happens when you insert strings to columns with incompatible encoding/collation.
I got this error when I had TRIGGERs, which inherit server's collation for some reason.
And mysql's default is (at least on Ubuntu) latin-1 with swedish collation.
Even though I had database and all tables set to UTF-8, I had yet to set my.cnf:
/etc/mysql/my.cnf :
[mysqld]
character-set-server=utf8
default-character-set=utf8
And this must list all triggers with utf8-*:
select TRIGGER_SCHEMA, TRIGGER_NAME, CHARACTER_SET_CLIENT, COLLATION_CONNECTION, DATABASE_COLLATION from information_schema.TRIGGERS
And some of variables listed by this should also have utf-8-* (no latin-1 or other encoding):
show variables like 'char%';

I got a similar error (Incorrect string value: '\xD0\xBE\xDO\xB2. ...' for 'content' at row 1). I have tried to change character set of column to utf8mb4 and after that the error has changed to 'Data too long for column 'content' at row 1'.
It turned out that mysql shows me wrong error. I turned back character set of column to utf8 and changed type of the column to MEDIUMTEXT. After that the error disappeared.
I hope it helps someone.
By the way MariaDB in same case (I have tested the same INSERT there) just cut a text without error.

That error means that either you have the string with incorrect encoding (e.g. you're trying to enter ISO-8859-1 encoded string into UTF-8 encoded column), or the column does not support the data you're trying to enter.
In practice, the latter problem is caused by MySQL UTF-8 implementation that only supports UNICODE characters that need 1-3 bytes when represented in UTF-8. See "Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC? for details. The trick is to use column type utf8mb4 instead of type utf8 which doesn't actually support all of UTF-8 despite the name. The former type is the correct type to use for all UTF-8 strings.

In my case, Incorrect string value: '\xCC\x88'..., the problem was that an o-umlaut was in its decomposed state. This question-and-answer helped me understand the difference between o¨ and ö. In PHP, the fix for me was to use PHP's Normalizer library. E.g., Normalizer::normalize('o¨', Normalizer::FORM_C).

The solution for me when running into this Incorrect string value: '\xF8' for column error using scriptcase was to be sure that my database is set up for utf8 general ci and so are my field collations. Then when I do my data import of a csv file I load the csv into UE Studio then save it formatted as utf8 and Voila! It works like a charm, 29000 records in there no errors. Previously I was trying to import an excel created csv.

I have tried all of the above solutions (which all bring valid points), but nothing was working for me.
Until I found that my MySQL table field mappings in C# was using an incorrect type: MySqlDbType.Blob . I changed it to MySqlDbType.Text and now I can write all the UTF8 symbols I want!
p.s. My MySQL table field is of the "LongText" type. However, when I autogenerated the field mappings using MyGeneration software, it automatically set the field type as MySqlDbType.Blob in C#.
Interestingly, I have been using the MySqlDbType.Blob type with UTF8 characters for many months with no trouble, until one day I tried writing a string with some specific characters in it.
Hope this helps someone who is struggling to find a reason for the error.

If you happen to process the value with some string function before saving, make sure the function can properly handle multibyte characters. String functions that cannot do that and are, say, attempting to truncate might split one of the single multibyte characters in the middle, and that can cause such string error situations.
In PHP for instance, you would need to switch from substr to mb_substr.

I added binary before the column name and solve the charset error.
insert into tableA values(binary stringcolname1);

Hi i also got this error when i use my online databases from godaddy server
i think it has the mysql version of 5.1 or more. but when i do from my localhost server (version 5.7) it was fine after that i created the table from local server and copied to the online server using mysql yog i think the problem is with character set
Screenshot Here

To fix this error I upgraded my MySQL database to utf8mb4 which supports the full Unicode character set by following this detailed tutorial. I suggest going through it carefully, because there are quite a few gotchas (e.g. the index keys can become too large due to the new encodings after which you have to modify field types).

There's good answers in here. I'm just adding mine since I ran into the same error but it turned out to be a completely different problem. (Maybe on the surface the same, but a different root cause.)
For me the error happened for the following field:
#Column(nullable = false, columnDefinition = "VARCHAR(255)")
private URI consulUri;
This ends up being stored in the database as a binary serialization of the URI class. This didn't raise any flags with unit testing (using H2) or CI/integration testing (using MariaDB4j), it blew up in our production-like setup. (Though, once the problem was understood, it was easy enough to see the wrong value in the MariaDB4j instance; it just didn't blow up the test.) The solution was to build a custom type mapper:
package redacted;
import javax.persistence.AttributeConverter;
import java.net.URI;
import java.net.URISyntaxException;
import static java.lang.String.format;
public class UriConverter implements AttributeConverter<URI, String> {
#Override
public String convertToDatabaseColumn(URI attribute) {
return attribute.toString();
}
#Override
public URI convertToEntityAttribute(String field) {
try {
return new URI(field);
}
catch (URISyntaxException e) {
throw new RuntimeException(format("could not convert database field to URI: %s", field));
}
}
}
Used as follows:
#Column(nullable = false, columnDefinition = "VARCHAR(255)")
#Convert(converter = UriConverter.class)
private URI consulUri;
As far as Hibernate is involved, it seems it has a bunch of provided type mappers, including for java.net.URL, but not for java.net.URI (which is what we needed here).

In my case that problem was solved by changing Mysql column encoding to 'binary' (data type will be changed automatically to VARBINARY). Probably I will not be able to filter or search with that column, but I'm no need for that.

In my case ,first i've meet a '???' in my website, then i check Mysql's character set which is latin now ,so i change it into utf-8,then i restart my project ,then i got the same error with you , then i found that i forget to change the database's charset and change into utf-8, boom,it worked.

I tried almost every steps mentioned here. None worked. Downloaded mariadb. It worked. I know this is not a solution yet this might help somebody to identify the problem quickly or give a temporary solution.
Server version: 10.2.10-MariaDB - MariaDB Server
Protocol version: 10
Server charset: UTF-8 Unicode (utf8)

I had a table with a varbinary column that I wanted to convert to utf8mb4 varchar. Unfortunately some of the existing data was invalid UTF-8 and the ALTER query returned Incorrect string value for various rows.
I tried every suggestion I could find regarding cast / convert / char_length = length etc. but nothing in SQL detected the erroneous values, other than the ALTER query returning bad rows one by one. I would love a pure SQL solution to remove the bad values. Sadly this solution is not pretty
I ended up select *'ing the entire table into PHP, where the erroneous rows could be detected en-masse by:
if (empty(htmlspecialchars($row['whatever'])))

The problem can also be caused by the client if the charset is not set to utf8mb4. so even if every Database, Table and Column is set to utf8mb4 you will still get an error, for instance in PyCharm.
For Python, set the charset of the connection in the MySQL Connector connect method:
mydb = mysql.connector.connect(
host="IP or Host",
user="<user>",
passwd="<password>",
database="<yourDB>",
# set charset to utf8mb4 to support emojis
charset='utf8mb4'
)

I know i`m late to the ball but someone else might come accross the problem i had with this and be happy to read my workaround.
I have come accross this problem with french characters. turns out i the text I was copying had encoding the accents on some charaatcers as 2 chars and others as single chars...
i couldn`t find how to set my table to accept the strings so i ended up changing the diacritics in my text import.
here is a list of them as double characters to search for them in your texts.
ùòìàè
áéíóú
ûôêâî
ç

1 - You have to declare in your connection the propertie of enconding UTF8. http://php.net/manual/en/mysqli.set-charset.php.
2 - If you are using mysql commando line to execute a script, you have to use the flag, like:
Cmd: C:\wamp64\bin\mysql\mysql5.7.14\bin\mysql.exe -h localhost -u root -P 3306 --default-character-set=utf8 omega_empresa_parametros_336 < C:\wamp64\www\PontoEletronico\PE10002Corporacao\BancoDeDadosModelo\omega_empresa_parametros.sql

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008