Scala JDBC MySQL JSON field encoding issue

I have a MySQL table with fields of JSON type. Both the table and the database use the utf8 charset.
The problem I'm having is that when I query data, the values of these fields come back as strings in the wrong charset, and Unicode characters get mangled: for instance, instead of é I get é.
There are no problems with the other text fields.
I tried both Quill and Slick. The connection config looks like this:
characterEncoding="UTF-8"
useUnicode=true
I also tried jdbc:mysql://${DB_HOST}/${DB_NAME}?useUnicode=yes&characterEncoding=UTF-8
The only workaround I found is to manually re-encode the strings, like this:
new String(string.getBytes("Windows-1252"), "UTF-8")
Is there a better way?
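For reference, here is that workaround as a self-contained Scala helper (a sketch only; the helper name is mine, and it assumes the driver mis-decoded UTF-8 bytes as Windows-1252):

import java.nio.charset.StandardCharsets

// Hypothetical helper: re-interprets a mojibake string whose UTF-8 bytes
// were wrongly decoded as Windows-1252 by the driver.
def fixMojibake(s: String): String =
  new String(s.getBytes("Windows-1252"), StandardCharsets.UTF_8)

// fixMojibake("é") == "é"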

Related

When I store data from a servlet to a MySQL database, characters like "<" and ">" are stored in Unicode-escaped form like \u003c rather than as the actual symbol

I'm trying to store data in my database test: "< hi >" is stored as "\u003c hi \u003e". My database uses the utf8mb4 charset, but I also experimented with utf8, utf32, etc. It didn't work out.
The problem is not with the MySQL DB. The problem is with the Gson/JSON library.
Use this code to solve it:
Gson gson = new GsonBuilder().disableHtmlEscaping().create();
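To see the difference, a quick sketch (Scala syntax; comparing Gson's default HTML escaping with escaping disabled):

import com.google.gson.GsonBuilder

val escaping = new GsonBuilder().create()
val plain = new GsonBuilder().disableHtmlEscaping().create()

println(escaping.toJson("< hi >")) // "\u003c hi \u003e"
println(plain.toJson("< hi >"))    // "< hi >"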

UTF-8 encoded MS Access table

I have an MS Access table whose text is UTF-8 encoded, and the characters appear like this:
Participació en comissió
If I UTF-8 decode this text I get the correct text:
Participació en comissió
How can I UTF-8 decode several Access table columns? I would like to end up with the same MS Access database but with the columns converted (UTF-8 decoded). I cannot figure out an easy way to do this conversion.
Thanks in advance.
--
More clarifications:
So how did you decode the text that you have in the question?
I simply put the sentence into an online UTF-8 decoder, but it crashes when there is a lot of text. FYI, the Access table comes from an MS SQL Server database with Modern_Spanish_CI_AS collation and a varchar(MAX) field. Is there maybe a way to perform the conversion while exporting the table from MS SQL Server?
While searching for a solution I found this post, which has a function to decode UTF-8 fields right on the MS SQL Server. I tested it and it works perfectly, although it is quite slow. Hope this helps someone else with the same problem.
Open a new query editor and copy and paste the function provided in this link:
Convert text value in SQL Server from UTF8 to ISO 8859-1
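For the JVM-minded, the conversion that function performs boils down to re-decoding the bytes (an illustrative Scala one-liner of my own, assuming the text is UTF-8 misread as Windows-1252/ISO-8859-1):

val garbled = "Participació en comissió"
val fixed = new String(garbled.getBytes("Windows-1252"), "UTF-8")
// fixed == "Participació en comissió"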

MySQL JSON type returns garbled text when using Spring Data JPA

I have set MySQL's character set to utf8mb4, and it works fine for the varchar type: saving and reading Chinese characters works fine.
But when it comes to the JSON type, saving works fine, while reading the JSON as a string through spring-data-jpa returns garbled text.
I have tried the settings below; they don't work.
spring.datasource.url = jdbc:mysql://localhost:3306/TAIMIROBOT?useUnicode=yes&characterEncoding=UTF-8
spring.datasource.init-sql="SET NAMES utf8mb4 COLLATE utf8mb4_bin;"
This issue has been fixed. bugs.mysql.com/bug.php?id=80631
ResultSet.getString() sometimes returned garbled data for columns of the
JSON data type. This was because JSON data was binary encoded by MySQL using
the utf8mb4 character set, but decoded by Connector/J using the ISO-8859-1
character set.
The fix has been included in Connector/J 6.0.5. The entry for the 5.1.40
changelog has been included into the 6.0.5 changelog.
If you have the same problem, just update the connector version to 6.0.5 in your Maven POM (if you are using Maven).
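For an sbt build (as in the Scala question at the top), the equivalent bump would be something like:

libraryDependencies += "mysql" % "mysql-connector-java" % "6.0.5"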

ZF2 Doctrine2 MySql charset error

I've set up a MySQL DB with utf8_unicode_ci collation, and all the tables and columns in it have the same collation.
My Doctrine config has SET NAMES utf8 as a connection option, and my HTML files use the utf8 charset.
The text saved on those tables contain accented characters (á,è,etc).
The problem is that when I save the content to the DB, it is saved with strange characters, as when saving ISO in a UTF-8 table (e.g.: NotÃcias).
The only workaround I've found is to utf8_decode before saving and utf8_encode before printing.
That means that, for some reason, something in between is mixing up UTF-8 with ISO.
What might it be?
Thanks.
EDIT:
I've set it up to encode before saving and decode before printing, and it prints correctly, but in the DB my chars change to:
XPTÓ -> XPTÃ“
This makes searching the DB for "XPTÓ" impossible...
I would print bin2hex($string); at each step of the original workflow (i.e. without encode/decode steps).
Go through each of:
the raw $_POST data
the values you get after form validation
the values that get put in your bound Entity
the values you'd get from the db if you query it directly using PDO (get this from $em->getConnection())
the values that get populated into your Entity on reload (can do this via $em->detach($entity); $entity = $em->find('Entity', $id);)
You'd be looking at the point at which the output changes, and focus your search there.
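The same byte-level check in Scala, for readers coming from the JVM question at the top (bin2hex is PHP; this sketch is my own equivalent):

// Prints the raw UTF-8 bytes of a string in hex, like PHP's bin2hex().
def hexDump(s: String): String =
  s.getBytes("UTF-8").map(b => "%02x".format(b)).mkString(" ")

// hexDump("Ó") == "c3 93"; a double-encoded value shows up as "c3 83 e2 80 9c"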
I would also double check:
On the db: SHOW CREATE TABLE `table` shows CHARSET=utf8 for the whole table (and nothing different for the individual columns)
That the tool you use to see your database values (Navicat, phpMyAdmin) has the correct encoding set.

Problem with charset

I have a MySQL database in utf-8 format, but the characters inside the database are ISO-8859-1 (ISO-8859-1 strings are stored as utf-8). I've tried recode, but it only converted e.g. ü to ü. Does anybody out there have a solution?
If you tried to store ISO-8859-1 characters in a database which is set to UTF-8, you just managed to corrupt your "special characters", as MySQL would retrieve the bytes from the database and try to assemble them as UTF-8 rather than ISO-8859-1. The only way to read the data correctly is with a script which does something like:
ResultSet rs = ...
// Fetch the raw bytes so the driver doesn't decode them as UTF-8.
byte[] b = rs.getBytes(COLUMN_NAME);
// Reassemble them using the charset the data was actually written in.
String s = new String(b, "ISO-8859-1");
This ensures you get the raw bytes (which, from what you said, came from an ISO-8859-1 string), and you can then assemble them back into an ISO-8859-1 string.
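If you want to fix the data in place rather than re-decode on every read, a one-off repair pass along the same lines might look like this (a sketch only, in Scala; the connection settings and table/column names are hypothetical, and you should back up first):

import java.sql.DriverManager

// Hypothetical connection settings and table/column names.
val conn = DriverManager.getConnection(
  "jdbc:mysql://localhost:3306/mydb?useUnicode=yes&characterEncoding=UTF-8",
  "user", "pass")
val rs = conn.createStatement().executeQuery("SELECT id, txt FROM t")
val upd = conn.prepareStatement("UPDATE t SET txt = ? WHERE id = ?")
while (rs.next()) {
  // Re-decode the raw bytes as ISO-8859-1, then write the string back
  // through a correctly configured UTF-8 connection.
  upd.setString(1, new String(rs.getBytes("txt"), "ISO-8859-1"))
  upd.setInt(2, rs.getInt("id"))
  upd.executeUpdate()
}
conn.close()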
There could be another problem as well: what do you use to "view" the strings in the database? Could it be that your console doesn't have the right charset to display those characters, rather than the characters being stored wrongly?
NOTE: Updated the above after the last comment
I just went through this. The biggest part of my solution was exporting the database to .csv and using Find / Replace on the characters in question. The character at issue may look like a space, but copy it directly from the cell as your Find parameter.
Once this is done - and missing this is what took me all morning:
Save the file as CSV (MS-DOS)
Excellent post on the issue
Source of MS-DOS idea