How to create multilingual Unicode text attributes in SQL? - mysql

I have a database, and the requirement is to store data in columns that can:
hold fixed-length Unicode text in languages such as Japanese, Chinese, French, Arabic, and so on;
hold variable-length Unicode or multilingual data.
My candidate data types are NCHAR, NVARCHAR, CHAR, VARCHAR, etc.
Please tell me what the SQL queries are to create these columns with the above-mentioned constraints.
The user requirements are to speed up data retrieval and, where possible, to save disk space.

Depending on your DBMS, you can create your database defining what the character encoding will be (normally, UTF-8 will do). Once the database is created with that encoding, you can insert text in any language. Take into account that the actual number of characters you will be able to store in a table column will normally be less than the declared string length. For instance, if you create the column as varchar(1000), you will NOT be able to store 1000 characters in all cases.
Check your specific DBMS documentation on how to configure UTF-8 encoding.
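For example, assuming MySQL, a minimal sketch might look like this (the database, table, and column names are hypothetical; utf8mb4 is MySQL's full 4-byte UTF-8 character set):
CREATE DATABASE appdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE TABLE appdb.profiles (
    id INT PRIMARY KEY,
    fixed_label NCHAR(10),          -- fixed-length, national (Unicode) character set
    display_name NVARCHAR(100),     -- variable-length Unicode
    bio TEXT CHARACTER SET utf8mb4  -- variable-length, explicit utf8mb4
);
NCHAR/NVARCHAR keep a column Unicode regardless of the table default; choosing the smallest type that fits (e.g. NCHAR only for truly fixed-length values) also serves the retrieval-speed and disk-space goals.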

Related

How to find out mysql field level charset?

I need to convert the latin1 charset of a table to utf8.
Quoting from the MySQL docs:
The CONVERT TO operation converts column values between the original and named character sets. This is not what you want if you have a column in one character set (like latin1) but the stored values actually use some other, incompatible character set (like utf8mb4). In this case, you have to do the following for each such column:
ALTER TABLE t1 CHANGE c1 c1 BLOB;
ALTER TABLE t1 CHANGE c1 c1 TEXT CHARACTER SET utf8mb4;
This answer shows how to find out charset at DB level, table level, and column level. But I need to find out the charset of the actual stored values. How can I do that?
Since my connector/j jdbc connection string doesn't specify any characterEncoding or connectionCollation properties, it is possible that it used utf8 by default to store the values, in which case I don't need any conversion, just change the table metadata.
mysql-connector-java version: 8.0.22
mysql database version: 5.6
spring boot version: 2.5.x
The character set of the string in a given column should be the same as the column definition.
There have been cases where people accidentally store the bytes of the wrong encoding in a column. For example, they store bytes of a latin1 encoding in a utf8 field. This is a terrible idea, because queries can't tell the difference. Those bytes may not be valid values of the column's defined encoding, and this results in garbage data. Cleaning up a table where some of the strings are stored in the wrong encoding is an unpleasant chore.
So I strongly urge you to store only strings encoded in a compatible way according to the column's definition, and to assume that all strings are stored this way.
To answer the title:
SHOW CREATE TABLE tablename shows the default charset for the table and any overrides for individual columns.
Don't blindly use CONVERT TO, especially the 2-step ALTER you are showing. First, let's see what is in the table now: run SELECT col, HEX(col) ... for something with accented text.
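As a hedged illustration, reusing the t1/c1 names from the quoted docs, the check could be:
SELECT c1, HEX(c1) FROM t1 LIMIT 10;
-- For an 'é': latin1 stores hex E9, while utf8 stores C3A9.
-- If the column is declared latin1 but HEX shows C3A9, the stored
-- bytes are really utf8 and the 2-step BLOB ALTER above is the fix.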
See "Trouble with UTF-8 characters; what I see is not what I stored" for the main 4 types of problems.
This page gives several cases and how to fix them: http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
One case involves using CONVERT TO; two other cases involve using BLOB or VARBINARY.

MySQL to postgres migration issue

I want to migrate my project from MySQL to Postgres. I have one table in MySQL in which utf8mb4 is set for a particular column. What is the alternative in Postgres for setting a column's encoding?
utf8mb4 is MySQL's way to represent the full UTF-8 range, where a character may need up to four bytes. However, as the documentation clarifies:
Requires a maximum of four bytes per multibyte character.
So not all characters are actually stored in four bytes; MySQL does not use all four available bytes for every character. You should therefore be able to migrate your utf8mb4 data into a UTF-8 encoded target field (MySQL to PostgreSQL) without problems, at least in theory.
But you never know whether practice fits theory, so it is advisable to first create a backup of your MySQL database (so you will not be afraid of changing it if for some reason the initial database needs adjustments). Then export your database, modify the table's/column's definition to no longer use utf8mb4 as an encoding (either leave it unspecified, if you can rely on PostgreSQL having UTF-8 as the default encoding, or specify a UTF-8 encoding explicitly), and run the inserts. Take a few samples of data from the original database and compare them to what PostgreSQL returns for them. If it works out of the box, the theory fit the practice. If not, you will need to research the cause of the problem you experience.
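As a minimal sketch of the PostgreSQL side (database, table, and column names are hypothetical):
CREATE DATABASE appdb ENCODING 'UTF8' TEMPLATE template0;
-- In PostgreSQL the encoding is a property of the database, not of a
-- column, so a plain varchar/text column already accepts all of Unicode:
CREATE TABLE t1 (c1 varchar(255));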

store polish characters mysql

I am trying to save characters like "ą", "ć", "ł" but they are saved in the database as question marks (I save them using phpMyAdmin).
The database and table's collation is utf8_bin.
Try changing the collation to:
utf8_unicode_ci
or
utf8_polish_ci
You can refer to: http://mysql.rjweb.org/doc.php/charcoll
Also you can TRY altering the specific column with:
ALTER TABLE tbl MODIFY COLUMN txt TEXT CHARACTER SET utf8
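A hedged variant of that ALTER (table and column names are hypothetical; utf8mb4 additionally covers characters outside the Basic Multilingual Plane):
ALTER TABLE products MODIFY COLUMN description TEXT
    CHARACTER SET utf8mb4 COLLATE utf8mb4_polish_ci;
SET NAMES utf8mb4;  -- the connection character set must match, too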
I've searched a lot and finally got a solution with this:
ALTER TABLE tableName MODIFY COLUMN columnName VARCHAR(64) CHARACTER SET `binary`;
You can change VARCHAR(64) to match your needs. I hope this helps someone. Note that I needed to store not only Polish characters but French and Spanish ones as well, so the solutions above might work for just Polish characters.
You can also change the column from varchar to nvarchar. Then when inserting values into the DB, remember to add N before the literal, as follows: N'ŁÓDŹ' (in persistence frameworks you should have some kind of NString representation).
From the documentation:
Nvarchar stores UNICODE data. If you have requirements to store UNICODE or multilingual data, nvarchar is the choice. Varchar stores ASCII data and should be your data type of choice for normal use. Regarding memory usage, nvarchar uses 2 bytes per character, whereas varchar uses 1
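A minimal sketch of the NVARCHAR approach in MySQL (table and column names are hypothetical; in MySQL, NVARCHAR is shorthand for VARCHAR with the national character set, utf8mb3):
CREATE TABLE cities (name NVARCHAR(64));
INSERT INTO cities VALUES (N'ŁÓDŹ');  -- N'...' marks a national-charset literal
SELECT name FROM cities;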

Unicode Comparing in PHP/MySQL

The name Accîdent seems to be different than AccÎdent when I do a database query to update the column. Yet Accîdent and AccÎdent point to the same place...
In MySQL Accîdent = AccÃ®dent when inserted.
Also, AccÎdent = AccÃŽdent.
Do you know why this is?
By default, MySQL assumes the client uses the latin1 character set. If you're using UTF-8 in your PHP scripts, then this assumption is false. You need to specify to MySQL that you're using UTF-8 by issuing this SQL statement just after the database connection is opened:
SET NAMES utf8
Then the data inserted by subsequent SQL statements will use the correct character set. This means that you need to re-insert your data or follow the MySQL conversion procedure (see the last paragraphs).
It is recommended that your tables are configured to store data in UTF-8, too, to avoid unnecessary read/write character set conversions. That's not required, though.
More information is available in the MySQL documentation. Specifically, Connection Character Sets and Collations.
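A hedged sketch of that setup and conversion (the table name is hypothetical; the ALTER assumes the stored bytes already match the table's current declared character set):
SET NAMES utf8;  -- run once per connection, right after connecting
ALTER TABLE comments CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;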
First, you seem to be storing UTF-8 data in a table of a different encoding. MySQL will try to cope, but the side effect is as you see: data in the database looks "weird". When creating a table, you need to specify the character encoding, preferably UTF-8. For existing tables, you'll need to convert the data.
Second, tables have a "collation" besides the encoding. Encoding determines how characters map to bytes; collation determines sorting and comparison. There are language-specific collations, but utf8_general_ci should be the one you're looking for (ci stands for "case insensitive"); then your two strings would match.
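For instance, a case-insensitive collation makes the two spellings from the question compare as equal (a sketch, assuming the connection uses SET NAMES utf8):
SELECT 'Accîdent' = 'AccÎdent' COLLATE utf8_general_ci;
-- Returns 1: 'î' and 'Î' differ only in case, which _ci collations ignore.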

can Mysql or sqlite Blob data type store varchar data in it?

Can anyone provide any information on whether we can store normal text data (varchar) in a MySQL or SQLite BLOB data type?
You can store any binary data in a blob, including both ASCII and Unicode text. However, you would need to explicitly interpret it as text in your code, since an SQL interface has no idea what type of data is stored inside its binary field.
Why would you want to use this rather than a normal varchar or text field?
A BLOB is just a bunch of bytes -- no hindrance, no help, dealing with them in any way; a VARCHAR is known to be text, so such things as character encoding and collation come into play. So store bunches of bytes (e.g. images) as BLOBs and bunches of text as TEXT or VARCHAR!
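A small MySQL sketch of that difference (table and column names are hypothetical):
CREATE TABLE blob_demo (id INT PRIMARY KEY, payload BLOB);
INSERT INTO blob_demo VALUES (1, 'plain text stored as raw bytes');
-- The bytes come back unchanged; to treat them as text again, the
-- client (or an explicit cast) must supply the character set:
SELECT CONVERT(payload USING utf8mb4) AS as_text FROM blob_demo;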
In MySQL you can insert text/varchar values into a BLOB column.
A BLOB holds binary data, which means both binary and non-binary data are supported.
Note that VARCHAR columns can be indexed directly, while BLOB columns can only be indexed with a prefix length.
SQLite uses manifest typing, so you can put any type of data in any type of column, actually. Then you'll have to interpret the returned data yourself, just like everyone pointed out for MySQL.
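A quick SQLite sketch of manifest typing (table and column names are hypothetical):
CREATE TABLE demo (payload BLOB);
INSERT INTO demo VALUES ('just text');      -- text goes in despite the BLOB declaration
SELECT payload, typeof(payload) FROM demo;  -- typeof() reports 'text'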