This may look like a duplicate, but I've been searching for hours and none of the suggested fixes for similar problems work:
I have text in an .xls file that was converted to CSV. It contains Polish characters, and I've confirmed I saved it as UTF-8. I don't have access to phpMyAdmin on this server, so I uploaded this UTF-8 encoded CSV file to the server.
I then use a UTF-8 encoded PHP file to load it into the database:
mb_language('uni');
mb_internal_encoding('UTF-8');
setlocale(LC_ALL, "pl_PL.UTF-8");
require_once('config.php');
mysql_set_charset('utf8');
$f=fopen('questions-final2.csv','r');
$questions=array();
while (($data = fgetcsv($f, 1000, ",")) !== FALSE) {
    //$num = count($data);
    //echo "<p> $num fields in line $row: <br /></p>\n";
    print_r($data);
    $questions[]=$data;
    //mysql_query('INSERT INTO questions(question_id,text,answer_time,difficulty,mode) VALUES '.implode(',',$inserts));
    //echo $data;
}
//exit();
// import of questions
$prev_index=0;
foreach($questions as $index=>$question){
    if($index>0)
        if($question[0]==$questions[$prev_index][0])
            unset($questions[$index]);
        else
            $prev_index=$index;
}
mysql_query('SET CHARACTER SET utf8');
mysql_query('SET NAME utf8');
$res=mysql_query('SELECT * FROM questions');
$inserts=array();
foreach($questions as $question)
$inserts[]='("'.$question[5].'","'.addslashes($question[1]).'","'.$question[7].'","'.$question[0].'","'.$question[4].'")';
mysql_query('INSERT IGNORE INTO questions(question_id,text,answer_time,difficulty,mode) VALUES '.implode(',',$inserts));
var_dump(mysql_error());
fclose($f);
Now, here is what the database says:
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
I can't get that latin1 part to go away. My my.cnf looks like this:
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
collation-server = utf8_general_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
default-character-set = utf8
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
I'm using PuTTY and have confirmed it is set to UTF-8 encoding as well; this is the output:
mysql> select text from questions limit 1;
+-------------------------------------------+
| text |
+-------------------------------------------+
| ?wi?to Unii Europejskiej obchodzone jest: |
+-------------------------------------------+
1 row in set (0.00 sec)
This is the original text as it should appear:
Święto Unii Europejskiej obchodzone jest:
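A question mark in place of each Polish letter is the signature of a lossy conversion into a charset that lacks those characters; here, the latin1 database default. A minimal Python sketch reproduces the stored value exactly:

```python
# Reproduce the "?" corruption: Polish characters that have no latin1
# code point are replaced with '?' during a lossy transcode.
original = "Święto Unii Europejskiej obchodzone jest:"

corrupted = original.encode("latin-1", errors="replace").decode("latin-1")
print(corrupted)  # ?wi?to Unii Europejskiej obchodzone jest:
```

This is exactly the string the SELECT returns above, which points at the latin1 `character_set_database` (and table/column defaults) rather than the connection settings.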
Also I have tried :
alter table questions modify column text TEXT character set utf8 collate utf8_unicode_ci;
and
alter table questions convert to character set utf8 collate utf8_unicode_ci;
Both before and after importing data, to no avail. What am I missing here?
mysql_query('SET NAME utf8');
This query should trigger an error:
SQL Error (1193): Unknown system variable 'NAME'
... but you don't see it because you don't test whether mysql_query() succeeds. The correct variable is NAMES.
Related
I am experiencing an issue with improper reading of Cyrillic letters from a MySQL table.
I use the following code:
library(RMySQL)
library(keyring)
mydb = dbConnect(MySQL(), ...)
dbReadTable(mydb, 'tregions2')
The table is read but cyrillic letters are substituted with question marks:
id regionname iSOID administrativeCenter
1 1 ????????? ???? RU-ALT ???????
I started investigating the issue.
The result of the query show variables like 'character_set_%'; in MySQL Workbench for the same user logged in on the same PC returns:
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8mb4
character_set_system utf8
character_sets_dir C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\
But result of the query returned by R is different:
> dbGetQuery(mydb, "show variables like 'character_set_%'")
Variable_name Value
1 character_set_client latin1
2 character_set_connection latin1
3 character_set_database utf8
4 character_set_filesystem binary
5 character_set_results latin1
6 character_set_server utf8mb4
7 character_set_system utf8
8 character_sets_dir C:\\Program Files\\MySQL\\MySQL Server 8.0\\share\\charsets\\
The locale variables of R are the following:
> Sys.getlocale()
[1] "LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251
I tried to change the character set and collation of the table in the DB. Earlier, setting the cp1251 character set helped me write the data into the database properly, but not now. I tried utf8/koi8r/cp1251 without any effect.
An attempt to execute Sys.setlocale(,"ru_RU") aborted with an error saying it could not be executed.
I am stuck. Could anyone give me advice on what else I should do?
After several hours of investigation I finally figured out the solution. Hope it will help someone encountering the same problem:
> dbExecute(mydb, "SET NAMES cp1251")
[1] 0
> dbGetQuery(mydb, "show variables like 'character_set_%'")
Variable_name Value
1 character_set_client cp1251
2 character_set_connection cp1251
3 character_set_database utf8
4 character_set_filesystem binary
5 character_set_results cp1251
6 character_set_server utf8mb4
7 character_set_system utf8
8 character_sets_dir C:\\Program Files\\MySQL\\MySQL Server 8.0\\share\\charsets\\
>
> TrTMP <- dbReadTable(mydb, 'tregions')
> TrTMP[1,c(1,2,6,14)]
id regionname iSOID administrativeCenter
1 1 Алтайский край RU-ALT Барнаул
In RStudio: Tools -> Global Options -> Code -> Saving, and set the default text encoding to UTF-8.
rs <- dbSendQuery(con, 'set character set "utf8"')
rs <- dbSendQuery(con, 'SET NAMES utf8')
Putting options(encoding = "UTF-8") at the top of the main script from which I call my package seems to fix the issue with non-ASCII characters in my package code.
read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8")) also works (likewise with file()).
For more flexibility, you should use utf8mb4 instead of cp1251. If you have data coming into the client in cp1251, then you probably have to stick with that charset.
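A latin1 connection cannot carry Cyrillic at all, which is why the table came back as question marks, while cp1251 has a code point for every Cyrillic letter. A quick Python sketch of both conversions:

```python
text = "Алтайский край"  # the region name from the example

# cp1251 (Windows Cyrillic) round-trips the text losslessly:
cp1251_bytes = text.encode("cp1251")
assert cp1251_bytes.decode("cp1251") == text

# latin1 has no Cyrillic code points, so a lossy conversion
# produces exactly the question marks seen in the R output:
lossy = text.encode("latin-1", errors="replace").decode("latin-1")
print(lossy)  # ????????? ????
```

This is why `SET NAMES cp1251` fixed the read-back on a Russian-locale Windows client, and why utf8mb4 is the more flexible choice when the client side can handle it.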
I have no clue what's going on here. One of my DB servers is giving me that error when trying to create a function (that works on all the other servers):
My function is
delimiter $$
CREATE DEFINER=`root`#`%` FUNCTION `getreadablesize`(`Width` DECIMAL(13,4),`Height` DECIMAL(13,4),`Type` VARCHAR(64)) RETURNS varchar(64) CHARSET utf8mb4 COLLATE utf8mb4_unicode_ci
BEGIN
RETURN concat(trim(trailing'.'
from trim(trailing'0'
from`Width`)),'\"',
if(`Height`>0,concat(' × ',trim(trailing'.'
from trim(trailing'0'
from`Height`)),'\"'),''),
if(`Type`>'',concat(' ',`Type`),''));
END$$
And the exact error message is
0 row(s) affected, 2 warning(s): 1300 Invalid big5 character string: '
\xC3\x97 ' 1300 Invalid big5 character string: 'C39720'
None of my databases are in Chinese or ever use the Big5 character set.
If I copy my schema create code, I get this:
CREATE DATABASE `sterling` /*!40100 DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci */;
EDIT: It works if I change the times symbol to something else, but that still doesn't explain why it's being treated as Big5 and not utf8.
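The times symbol trips it because × is U+00D7, whose UTF-8 encoding is the two bytes C3 97 quoted in the warning, and those bytes do not form a valid big5 sequence. A quick Python check (a sketch of the byte-level behavior, not of MySQL itself):

```python
times = "×"  # U+00D7 MULTIPLICATION SIGN
utf8_bytes = times.encode("utf-8")
print(utf8_bytes)  # b'\xc3\x97' -- the exact bytes in the 1300 warning

# Interpreted as big5, 0xC3 starts a two-byte character but 0x97 is
# not a valid trailing byte, so the sequence is rejected:
try:
    utf8_bytes.decode("big5")
except UnicodeDecodeError:
    print("invalid big5 character string")
```

So the function body is fine; it is the connection charset (big5, per the variables below) that mangles it.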
EDIT: Something is apparently quite wrong. If I run the following query, this is what I get:
SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | big5                       |
| character_set_connection | big5                       |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | big5                       |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
But my my.cnf clearly has default-character-set = utf8mb4 and all the variants of it under each applicable section... I will restart my MySQL server, because something is most definitely afoot.
Welp. I restarted the server and it never came back. I tried just about every MySQL server repair step that exists and nothing worked. Eventually I got it to start in safe mode with the innodb force flag set to 6, and I was then able to use HeidiSQL to pull all the data out (mysqldump just hung and never started, but HeidiSQL implements its own export). Even then, I still had 3 of my 165 tables that I couldn't read at all.
I have a database that has support for different languages. The issue I am running into: in the source SQL data, the format is correct.
MariaDB [stmtransit]> SELECT * FROM routes WHERE route_id = 181;
+----------+-----------+------------------+------------------+------------+------------+------------------------------------------+-------------+------------------+
| route_id | agency_id | route_short_name | route_long_name | route_desc | route_type | route_url | route_color | route_text_color |
+----------+-----------+------------------+------------------+------------+------------+------------------------------------------+-------------+------------------+
| 181 | 1 | 369 | Côte-des-Neiges | NULL | 3 | http://www.stm.info/fr/infos/reseaux/bus | 009EE0 | NULL |
+----------+-----------+------------------+------------------+------------+------------+------------------------------------------+-------------+------------------+
1 row in set (0.00 sec)
When I do the query and move the result into CouchDB, it changes accents and anything other than plain characters to
Côte-des-Neiges
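That "Ã"-style garbage is classic mojibake: UTF-8 bytes re-decoded as latin1 (or cp1252) somewhere between MySQL and CouchDB. It can be reproduced, and reversed, in a short Python sketch:

```python
name = "Côte-des-Neiges"

# Encode correctly as UTF-8, then wrongly decode as latin1:
mojibake = name.encode("utf-8").decode("latin-1")
print(mojibake)  # Côte-des-Neiges

# The damage is reversible, which confirms the diagnosis:
assert mojibake.encode("latin-1").decode("utf-8") == name
```

The reversibility is the useful diagnostic: the bytes are intact, only some layer's *interpretation* of them is wrong, so the fix is a charset setting, not the data.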
My request is
function queryRouteTable(db, route_id) {
  return db.query({
    sql: "SELECT * FROM routes WHERE route_id = ?;",
    values: [route_id],
  })
  .take(1);
}
Then my upload to CouchDB uses rx and rx-couch, and no matter where I view document.route_long_name after the initial grab, it's always formatted wrong.
What am I missing? Why does it change after the initial grab?
To display the current character encoding set for a particular database, type the following command at the mysql> prompt. Replace DBNAME with the database name:
SELECT default_character_set_name FROM information_schema.SCHEMATA S WHERE schema_name = "DBNAME";
If you have your encoding set per table use the following command. Replace DBNAME with the database name, and TABLENAME with the name of the table:
SELECT CCSA.character_set_name FROM information_schema.`TABLES` T,information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA WHERE CCSA.collation_name = T.table_collation AND T.table_schema = "DBNAME" AND T.table_name = "TABLENAME";
IMPORTANT: BACKUP YOUR DATABASE
If you have a working backup of your database you can convert it from your current encoding to UTF-8 by issuing the following commands:
mysql --database=DBNAME -B -N -e "SHOW TABLES" | awk '{print "SET foreign_key_checks = 0; ALTER TABLE", $1, "CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci; SET foreign_key_checks = 1; "}' | mysql --database=DBNAME
And in prompt:
ALTER DATABASE DBNAME CHARACTER SET utf8 COLLATE utf8_general_ci;
Now you should be able to export using UTF-8 and import into CouchDB using UTF-8 encoding...
Hope that helps...
It turns out MariaDB has a bug which sets the database formatting to latin1 instead of utf8.
To correct for this, go to /etc/my.cnf,
remove all instances of
default-character-set=utf8
then find the [mysqld] section and put the following under it:
init_connect='SET collation_connection = utf8_unicode_ci'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_unicode_ci
skip-character-set-client-handshake
and save.
Then restart mariadb.
I have just updated my cnf properties to add the following:
init_connect = 'SET collation_connection = utf8_unicode_ci; SET NAMES utf8;'
character-set-client = utf8
character-set-server = utf8
collation-server = utf8_unicode_ci
skip-character-set-client-handshake
My system variables after restarting mysql:
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database | utf8_unicode_ci |
| collation_server | utf8_unicode_ci |
+----------------------+-----------------+
So then I ran the following query to find a table that I knew had been built in utf8_general_ci:
select t.table_name, c.column_name,
       round(((data_length + index_length) / 1024 / 1024), 2) 'Size in MB',
       count(c.column_name), c.character_set_name, c.collation_name
from columns c
inner join tables t
    on t.table_schema=c.table_schema and t.table_name=c.table_name
where t.table_schema='db'
  and (c.collation_name like '%general%' or c.character_set_name like '%general%')
  and (c.column_type like 'varchar%' or c.column_type like 'text')
  and t.table_collation not like '%latin%'
  and t.table_name in ('table_name')
group by t.table_name, c.column_name;
So I took a dump of the table and reimported it into my database, but it stays in utf8_general_ci!!?!?!??
Why is this? I know if I run an alter it will change it, but why didn't the dump and load resolve the problem?
Additionally, when I run an alter to convert to utf8_unicode_ci, all the columns in the table have "COLLATE utf8_unicode_ci" listed in them.
I am trying to save names from users of a service in my MySQL database. Those names can contain emojis like 🙈😂😱🍰 (just as examples).
After searching a little bit I found this stackoverflow linking to this tutorial. I followed the steps and it looks like everything is configured properly.
I have a Database (charset and collation set to utf8mb4 (_unicode_ci)), a Table called TestTable, also configured this way, as well as a "Text" column, configured this way (VARCHAR(191) utf8mb4_unicode_ci).
When I try to save emojis I get an error:
Example of error for shortcake (🍰):
Warning: #1300 Invalid utf8 character string: 'F09F8D'
Warning: #1366 Incorrect string value: '\xF0\x9F\x8D\xB0' for column 'Text' at row 1
The only Emoji that I was able to save properly was the sun ☀️
Though I didn't try all of them to be honest.
Is there something I'm missing in the configuration?
Please note: All tests of saving didn't involve a client side. I use phpmyadmin to manually change the values and save the data. So the proper configuration of the client side is something that I will take care of after the server properly saves emojis.
Another side note: currently, when saving emojis, I either get the error above or no error, and the data of Username 🍰 is stored as Username ????. Error or no error depends on how I save: when creating/saving via SQL statement it saves with question marks, when editing inline it saves with question marks, and when editing using the edit button I get the error.
thank you
EDIT 1:
Alright so I think I found out the problem, but not the solution.
It looks like the Database specific variables didn't change properly.
When I'm logged in as root on my server and read out the variables (global):
Query used: SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)
For my Database (in phpmyadmin, the same query) it looks like the following:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
How can I adjust these settings on the specific database?
Also, even though I have the first shown settings as defaults, when creating a new database I get the second set as its settings.
Edit 2:
Here is my my.cnf file:
[client]
port=3306
socket=/var/run/mysqld/mysqld.sock
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld_safe]
socket=/var/run/mysqld/mysqld.sock
[mysqld]
user=mysql
pid-file=/var/run/mysqld/mysqld.pid
socket=/var/run/mysqld/mysqld.sock
port=3306
basedir=/usr
datadir=/var/lib/mysql
tmpdir=/tmp
lc-messages-dir=/usr/share/mysql
log_error=/var/log/mysql/error.log
max_connections=200
max_user_connections=30
wait_timeout=30
interactive_timeout=50
long_query_time=5
innodb_file_per_table
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
!includedir /etc/mysql/conf.d/
character_set_client, _connection, and _results must all be utf8mb4 for that shortcake to be eatable.
Something, somewhere, is setting a subset of those individually. Rummage through my.cnf and phpmyadmin's settings -- something is not setting all three.
If SET NAMES utf8mb4 is executed, all three are set correctly.
The sun shone because it is only 3 bytes - E2 98 80; utf8 is sufficient for 3-byte utf8 encodings of Unicode characters.
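The byte counts are easy to verify with a quick Python sketch:

```python
cake = "\U0001F370"  # 🍰
sun = "\u2600"       # ☀

# 4 bytes: needs utf8mb4; these are the bytes from the #1366 warning
print(cake.encode("utf-8"))  # b'\xf0\x9f\x8d\xb0'

# 3 bytes: fits in MySQL's old 3-byte utf8
print(sun.encode("utf-8"))   # b'\xe2\x98\x80'
```

Any code point above U+FFFF (all emoji, among others) takes 4 UTF-8 bytes and therefore requires utf8mb4 end to end: column, connection, and client.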
For me, it turned out that the problem lay in the mysql client.
The mysql client was overriding my.cnf's character setting on the server, which resulted in an unintended character setting.
So what I needed to do was just add character-set-client-handshake = FALSE.
It stops the client setting from disturbing my character setting.
my.cnf would look like this:
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
...
Hope it helps.
It is likely that your service/application is connecting with "utf8" instead of "utf8mb4" for the client character set. That's up to the client application.
For a PHP application see http://php.net/manual/en/function.mysql-set-charset.php or http://php.net/manual/en/mysqli.set-charset.php
For a Python application see https://github.com/PyMySQL/PyMySQL#example or http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#mysql-unicode
Also, check that your columns really are utf8mb4. One direct way is like this:
mysql> SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_name = "user" AND column_name = "displayname";
+--------------------+
| character_set_name |
+--------------------+
| utf8mb4 |
+--------------------+
1 row in set (0.00 sec)
Symfony 5 answer
Although this is not what was asked, people can end up here after searching the web for the same problem in Symfony.
1. Configure MySQL properly
☝️ See (and upvote if helpful) top answers here.
2. Change your Doctrine configuration
/config/packages/doctrine.yaml
doctrine:
dbal:
...
charset: utf8mb4
I'm not proud of this answer, because it uses brute force to clean the input. It's brutal, but it works:
function cleanWord($string, $debug = false) {
    $new_string = "";
    for ($i = 0; $i < strlen($string); $i++) {
        $letter = substr($string, $i, 1);
        if ($debug) {
            echo "Letter: " . $letter . "<BR>";
            echo "Code: " . ord($letter) . "<BR><BR>";
        }
        $blnSkip = false;
        // Windows-1252 right single quote
        if (ord($letter) == 146) {
            $letter = "´";
            $blnSkip = true;
        }
        // Windows-1252 e-acute
        if (ord($letter) == 233) {
            $letter = "é";
            $blnSkip = true;
        }
        // Windows-1252 curly double quotes
        if (ord($letter) == 147 || ord($letter) == 148) {
            $letter = '"';
            $blnSkip = true;
        }
        // Windows-1252 dash
        if (ord($letter) == 151) {
            $letter = "–";
            $blnSkip = true;
        }
        if ($blnSkip) {
            $new_string .= $letter;
            continue; // move on to the next character
        }
        // Any other non-ASCII byte becomes a numeric HTML entity
        if (ord($letter) > 127) {
            $letter = "&#" . ord($letter) . ";";
        }
        $new_string .= $letter;
    }
    if ($new_string != "") {
        $string = $new_string;
    }
    //optional
    $string = str_replace("\r\n", "<BR>", $string);
    return $string;
}
//clean up the input
$message = cleanWord($message);
//now you can insert it as part of SQL statement
$sql = "INSERT INTO tbl_message (`message`)
VALUES ('" . addslashes($message) . "')";
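As an aside, the same cleanup can be done with far less byte-by-byte work by decoding the Windows-1252 input first and letting the codec emit numeric HTML entities for anything non-ASCII. A Python sketch (the sample bytes are made up for illustration):

```python
# Hypothetical sample: "café" plus curly quotes, as Windows-1252 bytes
raw = b"caf\xe9 \x93quoted\x94"

# Decode the single-byte charset, then re-encode to ASCII,
# turning every non-ASCII character into a numeric HTML entity:
text = raw.decode("cp1252")
cleaned = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(cleaned)  # caf&#233; &#8220;quoted&#8221;
```

Byte 233 (é) and bytes 147/148 (curly quotes) are the very cases the function above patches by hand; a charset-aware decode handles every Windows-1252 character at once.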
ALTER TABLE table_name CHANGE column_name column_name
VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL
DEFAULT NULL;
Example query:
ALTER TABLE `reactions` CHANGE `emoji` `emoji` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL;
After that, I was successfully able to store emoji in the table.
Consider adding
init_connect = 'SET NAMES utf8mb4'
to all of your DB servers' my.cnf files.
(Still, clients can, and so will, overrule it.)
I was importing data via command:
LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(col1, col2, col3, col4, col5...);
This didn't work for me:
SET NAMES utf8mb4;
I had to add the CHARACTER SET clause to make it work:
LOAD DATA LOCAL INFILE
'E:\\wamp\\tmp\\customer.csv' INTO TABLE `customer`
CHARACTER SET 'utf8mb4'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
Note: the target column must also be utf8mb4, not utf8, or the import will save question marks like "?????" (without errors, though).
For CodeIgniter users: ensure your character set and collation settings in database.php are set properly, which is what worked for me.
$db['default']['char_set'] = 'utf8mb4';
$db['default']['dbcollat'] = 'utf8mb4_unicode_ci';