Store Polish characters in MySQL

I am trying to save characters like "ą", "ć", "ł" but they are saved in the database as question marks (I save them using phpMyAdmin).
The database and table's collation is utf8_bin.

Try changing the collation to:
utf8_unicode_ci
or
utf8_polish_ci
You can refer to: http://mysql.rjweb.org/doc.php/charcoll
Also you can TRY altering the specific column with:
ALTER TABLE tbl MODIFY COLUMN txt TEXT CHARACTER SET utf8
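For example, a minimal sketch of that suggestion (tbl is a placeholder name) that converts the table default and every text column in one statement, then verifies the result:
-- Convert the table's default charset/collation and re-encode all
-- CHAR/VARCHAR/TEXT columns in it.
ALTER TABLE tbl CONVERT TO CHARACTER SET utf8 COLLATE utf8_polish_ci;
-- Check what the table ended up with.
SHOW CREATE TABLE tbl;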

I searched a lot and finally found a solution with this:
ALTER TABLE tableName MODIFY COLUMN columnName VARCHAR(64) CHARACTER SET `binary`;
You can change VARCHAR(64) to match your needs. I hope this helps someone. Note that I needed to store not only Polish characters but French and Spanish ones as well, so the solutions above might work if you only need Polish characters.
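As a rough usage illustration of the same idea (tableName/columnName are placeholders), the column then stores raw bytes, so mixed Polish/French/Spanish text round-trips as long as the client always talks to MySQL in one encoding such as UTF-8:
ALTER TABLE tableName MODIFY COLUMN columnName VARCHAR(64) CHARACTER SET binary;
-- The column is now effectively VARBINARY(64): MySQL stores whatever bytes arrive.
INSERT INTO tableName (columnName) VALUES ('Łódź, crème brûlée, mañana');
SELECT columnName FROM tableName;  -- returned exactly as the client sent it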

You can also change the column from varchar to nvarchar. Then, when inserting values into the DB, remember to add the N prefix, as follows: N'ŁÓDŹ' (in persistence frameworks you should have some kind of NString representation).
From the documentation:
Nvarchar stores UNICODE data. If you have requirements to store UNICODE or multilingual data, nvarchar is the choice. Varchar stores ASCII data and should be your data type of choice for normal use. Regarding memory usage, nvarchar uses 2 bytes per character, whereas varchar uses 1.
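For illustration only, a small sketch with a hypothetical cities table: in MySQL, NVARCHAR is shorthand for VARCHAR with the national (utf8) character set and N'...' marks a national string literal, whereas the documentation quoted above reads like SQL Server's, where NVARCHAR is a distinct UTF-16 type:
CREATE TABLE cities (name NVARCHAR(100));
-- The N prefix tells the server the literal uses the national character set.
INSERT INTO cities (name) VALUES (N'ŁÓDŹ');
SELECT name FROM cities;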

Related

How to find out mysql field level charset?

I need to convert latin1 charset of a table to utf8.
Quoting from mysql docs:
The CONVERT TO operation converts column values between the original and named character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8mb4). In this case, you have to do the following for each such column:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8mb4;
This answer shows how to find out charset at DB level, table level, and column level. But I need to find out the charset of the actual stored values. How can I do that?
Since my Connector/J JDBC connection string doesn't specify any characterEncoding or connectionCollation properties, it is possible that it used utf8 by default to store the values, in which case I don't need any conversion, just a change to the table metadata.
mysql-connector-java version: 8.0.22
mysql database version: 5.6
spring boot version: 2.5.x
The character set of the string in a given column should be the same as the column definition.
There have been cases where people accidentally store the bytes of the wrong encoding in a column. For example, they store bytes of a latin1 encoding in a utf8 field. This is a terrible idea, because queries can't tell the difference. Those bytes may not be valid values of the column's defined encoding, and this results in garbage data. Cleaning up a table where some of the strings are stored in the wrong encoding is an unpleasant chore.
So I strongly urge you to store only strings encoded in a compatible way according to the column's definition, and to assume that all strings are stored this way.
To answer the title:
SHOW CREATE TABLE tablename shows the default charset for the table and any overrides for individual columns.
Don't blindly use CONVERT TO, especially the 2-step ALTER you are showing. Let's see what is in the table now (SELECT col, HEX(col) ... for something with accented text).
See "Trouble with UTF-8 characters; what I see is not what I stored" for the main 4 types of problems.
This gives several cases and how to fix them. http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
One case involves using CONVERT TO; two other cases involve using BLOB or VARBINARY.
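A minimal diagnostic sketch along those lines (mydb, tablename, col, and the id value are all placeholders):
-- What each column is declared as:
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb' AND TABLE_NAME = 'tablename';
-- What is actually stored: inspect the bytes of a row known to contain accented text.
SELECT col, HEX(col), LENGTH(col), CHAR_LENGTH(col)
FROM tablename
WHERE id = 42;
-- e.g. 'é' stored as utf8 shows hex C3A9; stored as latin1 it shows E9.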

How to create multilingual Unicode text attributes in SQL?

I have a database and the requirement is to store the data in the columns that can:
hold fixed-length Unicode characters from languages like Japanese, Chinese, French, Arabic, and so on.
The data stored in a column is Unicode or multilingual and is of variable length.
In my suggestions, the Data Types are NCHAR, NVARCHAR, CHAR and VARCHAR etc...
But please tell me what the SQL queries are to create these columns with the above-mentioned constraints.
The user requirements are to speed up the data retrieval process and, if possible, to save hard disk space.
Depending on your DBMS, you can create your database defining what the character encoding should be (normally, UTF-8 would do). Once the database is created with that encoding, you can insert text in any language. Take into account that the actual number of characters that you will be able to store within a table column will normally be less than what you defined as the string length. For instance, if you create the column as varchar(1000), you will NOT be able to store 1000 characters in all cases.
Check your specific DBMS documentation on how to configure UTF-8 encoding.
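As a MySQL-flavoured sketch of the above (database, table, and column names are made up; other DBMSs would use NCHAR/NVARCHAR instead):
CREATE DATABASE appdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
USE appdb;
CREATE TABLE translations (
  code  CHAR(8),        -- fixed-length Unicode text
  label VARCHAR(200),   -- variable-length Unicode text
  body  TEXT
) CHARACTER SET utf8mb4;
-- Works for any script, provided the connection also uses utf8mb4.
INSERT INTO translations VALUES ('JP-0001', '日本語', 'こんにちは');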

Varchar Encoding in MySQL

I am coming to MySQL from MS SQL. It was my understanding that in order to store International Characters, I need to declare a field as varchar with UTF-8 character set.
I am using Sequel Pro to develop the MySQL database. When I manually enter international characters into a field in my table, it does not understand them and turns them into question marks (?????).
Could someone please point me into the right direction?
Simple example,
ALTER TABLE t MODIFY col1 CHAR(50) CHARACTER SET utf8;
Source: http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html
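A quick way to double-check the result (t and col1 as in the example above): confirm the column now reports a utf8 collation, and also look at the connection settings, since '?' often means the client connection rather than the column is still latin1:
SHOW FULL COLUMNS FROM t;               -- Collation column should now show a utf8_* value
SHOW VARIABLES LIKE 'character_set%';   -- check character_set_client/connection/results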

Is this a safe way to convert MySQL tables from latin1 to utf-8?

I need to change all the tables in one of my databases from latin1 to utf-8 (with utf8_bin collation).
I have dumped the database, created a test database from it, and run the following without any errors or warnings for each table:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin
Is it safe for me to repeat this on the real database? The data seems fine by inspection...
There are 3 different cases to consider:
The values are indeed encoded using Latin1
This is the consistent case: declared charset and content encoding match. This was the only case I covered in my initial answer.
Use the command you suggested:
ALTER TABLE tablename CONVERT TO CHARSET utf8 COLLATE utf8_bin
Note that the CONVERT TO CHARACTER SET command only appeared in MySQL 4.1.2, so anyone using a database installed before 2005 had to use an export/import trick. This is why there are so many legacy scripts and documents on the Internet doing it the old way.
The values are already encoded using utf8
In this case, you don't want mysql to convert any data, you only need to change the column's metadata.
For this, you have to change the type to BLOB first, then to TEXT utf8 for each column, so that there are no value conversions:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8
This is the recommended way, and it is explicitly documented in the ALTER TABLE syntax documentation.
The values are in a different encoding
The default encoding was Latin1 for several years on some Linux distributions. In this case, you have to use a combination of the two techniques (see the sketch after this list):
Fix the table meta-data, using the BLOB type trick
Convert the values using CONVERT TO.
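A sketch of that combined sequence, assuming (purely as an example) a column declared latin1 whose bytes are really cp1250:
-- Step 1: relabel the column to match the bytes, without converting them.
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET cp1250;
-- Step 2: now that the declared charset matches the data, let MySQL convert the values.
ALTER TABLE t1 CONVERT TO CHARACTER SET utf8 COLLATE utf8_bin;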
A straightforward conversion will potentially break any strings with non-utf7 characters.
If you don't have any of those (i.e. all of your text is english), you'll usually be fine.
If you've any of those, however, you need to convert all char/varchar/text fields to blob in an initial run, and to convert them to utf8 in a subsequent run.
See this article for detailed procedures:
http://codex.wordpress.org/Converting_Database_Character_Sets
I've done this a few times on production databases in the past (converting from the old standard Swedish encoding to latin1), and when MySQL encounters a character that cannot be translated to the target encoding, it aborts the conversion and leaves the data in its unchanged state. Therefore, I'd deem the ALTER TABLE statement to be working.

MySQL: adding support for Asian characters to an existing database

I'm looking for a best-practices approach to adding support for Asian character sets to an existing database. We have existing tables that are in the latin1 charset:
show create table books
CREATE TABLE `books` (
`id` varchar(255) NOT NULL,
`category` varchar(255) default NULL,
`contactEmail` varchar(255) default NULL,
`description` text,
`price` varchar(255) default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Currently when we enter UTF8 chars for the description field, we get back '?' chars for Asian chars on the round-trip. Latin1 chars work just fine.
Can I simply convert this table with something like this?
ALTER TABLE books CONVERT TO CHARACTER SET utf8
I understand that this won't magically fix data already present in the table. I just want it to work properly for new data going forward.
Do I need to worry about collation? I have no idea how that would work for non-latin characters.
Would it make sense to make utf8 the default for the database? Are there any caveats to that?
Thanks
I don't have much experience with how MySQL handles character sets, but I have experience with character sets in general.
Currently when we enter UTF8 chars for the description field, we get back '?' chars for Asian chars on the round-trip. Latin1 chars work just fine.
Because your table is using latin1 for encoding, it can only store characters that are present in the latin1 character set. Latin1 is shorthand for ISO-8859-1; if you look at which characters it contains, you'll find there are no Asian characters, which is why they won't store. I'm a little surprised MySQL doesn't error on such input.
Would it make sense to make utf8 the default for the database? Are there any caveats to that?
UTF-8 would be a good choice if you need to store characters from multiple languages. UTF-8, as a Unicode encoding, will let you store any Unicode character (there are well over a hundred thousand of them), from many languages. You could store the string "Dog café θλφ 你好" using UTF-8. UTF-8 is widely used, and is able to encode just about anything, so I highly recommend it.
I would peruse the Internet to find literature on converting MySQL tables, to make sure there aren't any gotchas. If this is production data, test on an offline dataset — a development table or a QA table.
Last, you seem to indicate that there are half-stored Asian characters somehow in your DB. I'd figure out what exactly is stored: if it's the UTF-8 sequence for the Asian character, but the database thinks it's latin1 (a classic case of mojibake), some recovery may be possible. I would worry that the conversion may attempt to transform the UTF-8 code units as if they were latin1, resulting in very interesting output. Test, test, test.
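If, after that checking, conversion is what you want, a sketch of the forward-looking change could look like this (mydb is a placeholder; it will not recover the '?' already stored):
-- Make utf8 the default for new tables in this schema.
ALTER DATABASE mydb DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
-- Re-encode the existing table; latin1 data stays readable, new Asian text can be stored.
ALTER TABLE books CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
SHOW CREATE TABLE books;   -- should now report DEFAULT CHARSET=utf8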
The fact that you're getting back '?' is a good sign, as it would suggest that the characters not present in Latin-1 have been properly converted to the replacement character. Before embarking on a project to convert the data, make sure that everything in there is sane. This is especially important if you have more than one application and programming language writing to the database.
One of the simplest ways to do a rough and ready sanity check is to check the character length against the byte length.
SELECT length(foo), char_length(foo) FROM bar
The first returned value is the length of the string in bytes, the second is the length of the string in characters. If there are any multi-byte characters in there somehow, these two values will differ.
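For example, following that check, you could list the rows that already contain multi-byte sequences (note that for a column still declared latin1 the two lengths always match, so this is most telling once the column is declared utf8):
SELECT foo, HEX(foo)
FROM bar
WHERE LENGTH(foo) <> CHAR_LENGTH(foo);   -- rows whose byte length differs from character length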
There are a great many guides to converting available on the internet, and of those I've found one in particular to be incredibly useful.