How to insert a string of unknown charset into a MariaDB table in C - mysql

My sample code is..
insert into test set fir='aaa',sec='¿¬ºA¦ µµT °¥µ젰ވ­.alz' (in C)
If I copy and paste this statement into the MariaDB prompt, it succeeds.
Question
But it does not work from C code: the sec column ends up NULL. Why?
How can I modify the C code so that the insert succeeds? (It is okay if the string ends up broken.)
MariaDB status...
MariaDB [oops]> status
....
Server characterset: utf8
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
...
MariaDB [oops]> show full columns from test
...
| fir | varchar(255) | utf8_general_ci |
| sec | varchar(255) | utf8_general_ci |
...
Thanks.
P.S.
The C code parses the string above out of an e-mail.
Added:
MariaDB [oops]> show variables like 'char%';
+--------------------------+-------------------------------+
| Variable_name | Value |
+--------------------------+-------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /xxxx/mysql/share/charsets/ |
+--------------------------+-------------------------------+
8 rows in set (0.01 sec)
MariaDB [oops]>

The mysql C client often defaults to a latin1 client connection while the mysql command line client adopts the character set of the user's environment.
So my guess would be that if you execute "SHOW VARIABLES LIKE 'char%'" from within your C program, you'll get a row back stating that the client connection is in fact latin1.
Thus, although the charset of your original string is "unknown" to you, it probably contains some multibyte character sequences which are being wrongly encoded as a result of inserting data from a latin1 client connection into a utf8 database/column.
Generally speaking, if you want your data to be unadulterated upon insertion then your client connection character encoding should match that of the database/column.
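Since the asker says a "broken" string is acceptable, one option is to force the bytes into well-formed UTF-8 before building the statement. The following is only a sketch under assumptions: the function names are made up for illustration, every byte that is not part of a valid 1 to 3 byte UTF-8 sequence is reinterpreted as a Latin-1 character, and 4-byte sequences are treated as invalid because MariaDB's utf8 (utf8mb3) columns cannot store them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length (1..3) of a well-formed UTF-8 sequence starting at s, or 0 if the
 * bytes at s are not valid. 4-byte sequences count as invalid here because
 * a utf8 (utf8mb3) column cannot hold them. */
static size_t utf8mb3_seq_len(const unsigned char *s, size_t remaining)
{
    if (remaining == 0) return 0;
    if (s[0] < 0x80) return 1;                      /* ASCII */
    if (s[0] >= 0xC2 && s[0] <= 0xDF)               /* 2-byte lead */
        return (remaining >= 2 && (s[1] & 0xC0) == 0x80) ? 2 : 0;
    if (s[0] >= 0xE0 && s[0] <= 0xEF) {             /* 3-byte lead */
        if (remaining < 3) return 0;
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return 0;
        if (s[0] == 0xE0 && s[1] < 0xA0) return 0;  /* overlong form */
        if (s[0] == 0xED && s[1] > 0x9F) return 0;  /* UTF-16 surrogate */
        return 3;
    }
    return 0;                                       /* stray or 4-byte lead */
}

/* Hypothetical helper: copies src to dst, keeping valid UTF-8 sequences and
 * re-encoding every other byte as the Latin-1 character U+0080..U+00FF.
 * dst must have room for 2 * src_len + 1 bytes.
 * Returns the number of bytes written, not counting the trailing NUL. */
size_t sanitize_to_utf8mb3(const unsigned char *src, size_t src_len, char *dst)
{
    size_t i = 0, o = 0;
    assert(src != NULL && dst != NULL);
    while (i < src_len) {
        size_t n = utf8mb3_seq_len(src + i, src_len - i);
        if (n > 0) {
            memcpy(dst + o, src + i, n);
            o += n;
            i += n;
        } else {
            dst[o++] = (char)(0xC0 | (src[i] >> 6));
            dst[o++] = (char)(0x80 | (src[i] & 0x3F));
            i++;
        }
    }
    dst[o] = '\0';
    return o;
}
```

The result is valid UTF-8 but not necessarily the text the sender intended. Assuming the connection itself is set to utf8 (for example with mysql_set_character_set(conn, "utf8") from the C API), the sanitized buffer would still need to go through mysql_real_escape_string before being placed into the INSERT statement.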

Related

When does mysql throw an error instead of coercing text into the default column format?

I'm working with two mysql servers, trying to understand why they behave differently.
I've created identical tables on each:
| Field | Type | Collation |
+----------------+------------+-------------------+
| some_chars | char(45) | latin1_swedish_ci |
| some_text | text | latin1_swedish_ci |
and I've set identical character set variables:
| Variable_name | Value
+--------------------------+-------+
| character_set_client | utf8
| character_set_connection | utf8
| character_set_database | latin1
| character_set_filesystem | binary
| character_set_results | utf8
| character_set_server | latin1
| character_set_system | utf8
When I insert UTF-8 characters into the database on one server, I get an error:
DatabaseError: 1366 (HY000): Incorrect string value: '\xE7\xBE\x8E\xE5\x9B\xBD...'
The same insertion in the other server throws no error. The table just silently accepts the utf-8 insertion and renders a bunch of ? marks where the utf-8 characters should be.
Why is the behavior of the two servers different?
What command were you executing when you got the error?
Your data is obviously utf8 (good).
Your connection apparently is utf8 (good).
Your table/column is declared CHARACTER SET latin1? It should be utf8.
That is 美国, Chinese, correct? Those two characters take 3 bytes each in utf8, but some Chinese characters need 4-byte utf8. So you should use utf8mb4 instead of utf8 in all 3 cases listed above.
Other notes:
There is no substantive difference in this area in 5.6 versus 5.7.
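SQL_MODE may be relevant after all: with a strict mode such as STRICT_TRANS_TABLES enabled, a value that cannot be converted raises error 1366, while without it the server stores ? and only issues a warning. Compare SELECT @@sql_mode; on the two servers.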
VARCHAR is usually advisable over CHAR.
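To see which characters the error message is complaining about, and whether they actually need utf8mb4, the escaped bytes can be decoded by hand. This is a minimal sketch, not part of the original answer: utf8_decode and needs_utf8mb4 are hypothetical helper names, and the decoder is deliberately simple (it does not reject overlong forms or surrogates).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s (at most `remaining` bytes available).
 * Stores the code point in *cp and returns the sequence length 1..4,
 * or 0 if the bytes are malformed. */
size_t utf8_decode(const unsigned char *s, size_t remaining, uint32_t *cp)
{
    size_t len, i;
    assert(s != NULL && cp != NULL);
    if (remaining == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; *cp = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; *cp = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; *cp = s[0] & 0x07; }
    else return 0;                       /* stray continuation byte */
    if (remaining < len) return 0;       /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (uint32_t)(s[i] & 0x3F);
    }
    return len;
}

/* Returns 1 if any character in the buffer is a 4-byte sequence (a code
 * point above U+FFFF, which only utf8mb4 can store), 0 if none is,
 * and -1 if the buffer is not well-formed UTF-8. */
int needs_utf8mb4(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        uint32_t cp;
        size_t len = utf8_decode(s + i, n - i, &cp);
        if (len == 0) return -1;
        if (len == 4) return 1;
        i += len;
    }
    return 0;
}
```

Fed the bytes E7 BE 8E E5 9B BD from the error message, the decoder yields U+7F8E and U+56FD, both 3-byte sequences, so those particular characters fit in plain utf8. A character such as U+20000 (bytes F0 A0 80 80) is what would require utf8mb4.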

MySQL UTF8 Issue

Okay, I have tried to import "CSV" file into MySQL for the past 24 hours but have failed miserably.
I have set name, set char and there is nothing left that I have not set to UTF8 but it still is not working. Not just for the DB and Tables, but for the server as well, still no use.
I am importing directly into MySQL so it is not PHP issue. I will be grateful if anyone can highlight where am I going wrong.
mysql> SHOW CREATE DATABASE `dict_2`;
+----------+-----------------------------------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------------------------------+
| dict_2 | CREATE DATABASE `dict_2` /*!40100 DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci */ |
+----------+-----------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> show variables like "%character%"; show variables like "%collation%";
+--------------------------+--------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | utf8 |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | C:\xampp\mysql\share\charsets\ |
+--------------------------+--------------------------------+
8 rows in set (0.00 sec)
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
In its current form, this question is impossible to answer.
We're left guessing...
That you're using a MySQL LOAD DATA statement.
You've verified that the characterset encoding of the .csv file is not ucs2.
You've verified that the characterset encoding of the .csv file is utf8 (i.e. matches the character_set_database system variable), or that you've specified the appropriate characterset in the CHARACTER SET clause of the LOAD DATA statement.
Beyond that, there's a whole slew of other things that might be wrong, but we're still just guessing.
Very frequently when something in MySQL "fails miserably", there's some sort of indication, like an error message, or some other behavior that we can observe and describe.
In the question, the description of the failure mode is beyond vague; it's entirely non-existent.
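One cheap way to check the guess about the file's encoding is to look at its first few bytes for a byte-order mark. The sketch below is an illustration only: bom_guess and the enum names are hypothetical, and a missing BOM proves nothing, since plain UTF-8, latin1, and many other encodings carry no marker at all.

```c
#include <assert.h>
#include <stddef.h>

enum bom_kind {
    BOM_NONE,      /* no marker: could be UTF-8 without BOM, latin1, ... */
    BOM_UTF8,      /* EF BB BF */
    BOM_UTF16_LE,  /* FF FE (also the start of a UTF-32 LE marker) */
    BOM_UTF16_BE   /* FE FF: big-endian UTF-16 / UCS-2 */
};

/* Inspect the first n bytes of a file and report which byte-order mark,
 * if any, they start with. */
enum bom_kind bom_guess(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;
    if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}
```

A UTF-16 result would explain an import that produces garbage under CHARACTER SET utf8, since the file would need converting first or a matching CHARACTER SET clause in the LOAD DATA statement.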

How do I convert the OpenShift MySQL 5.1 cartridge to UTF-8

The default MySQL 5.1 cartridge apparently creates all its tables with the latin1 character set. I have an application (Review Board, a python/Django application) that has some issues unless the DB is running as UTF-8. How do I change that? I can't just edit my.cnf because it will be wiped at the next cartridge restart.
mysql> show variables like 'character_set%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
I cannot change this setting in my.cnf, because to the best of my knowledge, there exists no OpenShift environment variable to set the character encoding. How do I persistently change this (ideally in my OpenShift hooks so this will persist into future deployments) and update my existing tables to UTF-8?
I found a solution, though not a perfect one:
Install phpMyAdmin on OpenShift.
Find the server settings and change the relevant character set variables from latin1 to utf8.
Hope that helps.

MSSQL and MySQL as Linked Server

I have MSSQL Server 2005 with a MySQL server set up as a linked server.
I want to save particular data from MSSQL to MySQL.
And I have a huge problem with encoding.
MS SQL
select SERVERPROPERTY ('collation')
Result: Cyrillic_General_CI_AS
MySQL
mysql> SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
When I try to retrieve data from MySQL or insert data into MySQL,
I get the wrong character set in text fields,
something like "???????????????"
How can I convert text data to UTF-8 encoding before inserting it into the linked server?
Or should I change some settings?
I don't want to change the encoding of the MySQL server to CP-1251; it's not convenient for me.
What is your Collation Compatible property for linked server? This might help.
Have you tried COLLATE?

tomcat/jdbc/mysql: can insert ÿ(U+00FF) but not Ā (U+0100)

my setup:
mysql 5.1
show variables:
| character_set_client | utf8
| character_set_connection | utf8
| character_set_database | utf8
| character_set_filesystem | binary
| character_set_results | utf8
| character_set_server | utf8
| character_set_system | utf8
| character_sets_dir | D:\Programme\MySQL\MySQL Server 5.1\share\charsets\
| collation_connection | utf8_general_ci
| collation_database | utf8_unicode_ci
| collation_server | utf8_general_ci
and even
| init_connect | SET collation_connection = utf8_general_ci; SET NAMES utf8;
the table has character set utf8
tomcat 6.0
the jdbc connector uses characterEncoding="utf8" useUnicode="true"
now when I try
stmt.execute("UPDATE *table* SET *value*=\"ÿ\" WHERE ...");
it works, but for
stmt.execute("UPDATE *table* SET *value*=\"Ā\" WHERE ...");
I get a
java.sql.SQLException: Incorrect string value: '\xC4\x80' for column 'value' at row 1
Furthermore, it works for all characters up to ÿ, which can be encoded with 1 byte, but as soon as 2 bytes are needed: bang!
Why is that so? And how can I get it to work?
After I added another two tables to check whether it was a MyISAM vs. InnoDB problem, it just worked on the new tables. Why?
In the new tables each column used the default charset, while in my existing tables the charset of each column was set to latin1. This was because I had copied the db from a non-utf8 mysql instance and manually changed the table charset to utf-8. BUT while copying, HeidiSQL added a "CHARACTER SET latin1" to each column, which wasn't changed when I changed the table charset, AND it is not very easily visible in HeidiSQL that a column has an individual charset ...
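The ÿ versus Ā boundary follows directly from the column being latin1: U+00FF is the last code point ISO-8859-1 can represent. A rough sketch of that check on UTF-8 input is shown here; fits_in_latin1 is a hypothetical name, the input is assumed to be well-formed UTF-8, and MySQL's latin1 is really cp1252-based, so it also accepts a few extra characters such as € that this simple test would reject.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every character of a well-formed, NUL-terminated UTF-8
 * string is at or below U+00FF, 0 otherwise.
 * In UTF-8, U+0000..U+007F are single bytes and U+0080..U+00FF use the
 * lead bytes C2 or C3, so any byte above C3 starts a character that
 * ISO-8859-1 cannot hold. Continuation bytes (80..BF) never exceed C3. */
int fits_in_latin1(const char *utf8)
{
    const unsigned char *p = (const unsigned char *)utf8;
    assert(utf8 != NULL);
    for (; *p != '\0'; p++) {
        if (*p > 0xC3)
            return 0;
    }
    return 1;
}
```

Here ÿ is the byte pair C3 BF and passes, while Ā is C4 80, exactly the '\xC4\x80' the SQLException reports, and fails.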