I am running MySQL Ver 8.0.20.
Currently the collation and character sets are set to utf8mb4_0900_ai_ci, and I have been trying to change them to utf8 by running the commands below. But every time I close the client and log back in, the values have reverted to utf8mb4.
SET character_set_client = 'utf8';
SET character_set_connection = 'utf8';
SET character_set_database = 'latin1';
SET character_set_filesystem = 'binary';
SET character_set_results = 'utf8';
SET character_set_server = 'latin1';
SET character_set_system = 'utf8';
set collation_connection = 'utf8_general_ci';
set collation_database = 'latin1_swedish_ci';
commit;
ALTER DATABASE sanskvrcpu_db2 CHARACTER SET utf8 COLLATE utf8_general_ci;
The output of these statements is something like this:
mysql> ALTER DATABASE sanskvrcpu_db2 CHARACTER SET utf8 COLLATE utf8_general_ci;
Query OK, 1 row affected, 2 warnings (0.00 sec)
Warning (Code 3719): 'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.
Warning (Code 3778): 'utf8_general_ci' is a collation of the deprecated character set UTF8MB3. Please consider using UTF8MB4 with an appropriate collation instead.
1. SET ... applies to the current "session" (aka "connection"), so the setting is lost when you disconnect.
2. SET GLOBAL ... applies to the global variables, but not to the current session; it applies only to new connections. The values are lost when the server restarts. When doing item 3, also do item 2.
3. Changing the config file (my.cnf or whatever) takes effect after the next restart. When doing item 2, also do item 3.
MySQL 8.0 has SET PERSIST, a way of "persisting" global settings via item 2 without resorting to also doing item 3. Ref: https://dev.mysql.com/doc/refman/8.0/en/set-variable.html
(You did item 1.)
Exception: Some settings are both 'session' and 'global'; hence the above restrictions are not quite followed.
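The three scopes above can be pictured with a toy Python model. It is not MySQL code: ToyServer and its methods are hypothetical names, and the model ignores the exception just mentioned. It only tracks which copy of a setting each kind of statement changes, and what survives a reconnect or a restart.

```python
# Toy model (hypothetical, not MySQL code) of where each kind of setting lives.
class ToyServer:
    def __init__(self, config_file):
        self.config_file = dict(config_file)  # my.cnf: read at startup (item 3)
        self.persisted = {}                   # SET PERSIST: survives restarts
        self.restart()

    def restart(self):
        # Globals are rebuilt from the config file, then persisted values on top.
        self.global_vars = {**self.config_file, **self.persisted}

    def connect(self):
        # A new session starts with a copy of the current global values.
        return dict(self.global_vars)

    def set_global(self, name, value):
        # SET GLOBAL (item 2): affects later connections, not existing sessions.
        self.global_vars[name] = value

    def set_persist(self, name, value):
        # SET PERSIST: like SET GLOBAL, but also remembered across restarts.
        self.set_global(name, value)
        self.persisted[name] = value


server = ToyServer({"character_set_server": "latin1"})

session = server.connect()
session["character_set_server"] = "utf8mb4"           # item 1: SET, this session only
print(server.connect()["character_set_server"])        # latin1: lost on reconnect

server.set_global("character_set_server", "utf8mb4")  # item 2
print(server.connect()["character_set_server"])        # utf8mb4 for new connections
server.restart()
print(server.connect()["character_set_server"])        # latin1: lost on restart

server.set_persist("character_set_server", "utf8mb4")
server.restart()
print(server.connect()["character_set_server"])        # utf8mb4: survives restart
```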
Cyrillic letters are not read correctly from a MySQL table.
I use the following code:
library(RMySQL)
library(keyring)
mydb = dbConnect(MySQL(), ...)
dbReadTable(mydb, 'tregions2')
The table is read but cyrillic letters are substituted with question marks:
id regionname iSOID administrativeCenter
1 1 ????????? ???? RU-ALT ???????
I started investigating the issue.
Running the query show variables like 'character_set_%'; in MySQL Workbench, as the same user on the same PC, returns:
character_set_client utf8mb4
character_set_connection utf8mb4
character_set_database utf8
character_set_filesystem binary
character_set_results utf8mb4
character_set_server utf8mb4
character_set_system utf8
character_sets_dir C:\Program Files\MySQL\MySQL Server 8.0\share\charsets\
But the result of the same query in R is different:
> dbGetQuery(mydb, "show variables like 'character_set_%'")
Variable_name Value
1 character_set_client latin1
2 character_set_connection latin1
3 character_set_database utf8
4 character_set_filesystem binary
5 character_set_results latin1
6 character_set_server utf8mb4
7 character_set_system utf8
8 character_sets_dir C:\\Program Files\\MySQL\\MySQL Server 8.0\\share\\charsets\\
The locale variables of R are the following:
> Sys.getlocale()
[1] "LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251"
I tried changing the character set and collation of the table in the database. Earlier, setting the cp1251 character set helped me write the data into the database correctly, but not this time. I tried utf8, koi8r and cp1251 without any effect.
An attempt to run Sys.setlocale(,"ru_RU") aborted with an error saying that it could not be executed.
I am stuck. Could anyone advise me what else I should do?
After several hours of investigation I finally figured out the solution. I hope it helps someone who runs into the same problem:
> dbExecute(mydb, "SET NAMES cp1251")
[1] 0
> dbGetQuery(mydb, "show variables like 'character_set_%'")
Variable_name Value
1 character_set_client cp1251
2 character_set_connection cp1251
3 character_set_database utf8
4 character_set_filesystem binary
5 character_set_results cp1251
6 character_set_server utf8mb4
7 character_set_system utf8
8 character_sets_dir C:\\Program Files\\MySQL\\MySQL Server 8.0\\share\\charsets\\
>
> TrTMP <- dbReadTable(mydb, 'tregions')
> TrTMP[1,c(1,2,6,14)]
id regionname iSOID administrativeCenter
1 1 Алтайский край RU-ALT Барнаул
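Why the latin1 connection produced question marks: when MySQL converts text into a character set that has no code for a character, it substitutes "?". Python's errors="replace" behaves the same way, so it can serve as an analogy (not MySQL's actual code path). through_charset is a hypothetical helper; the values are taken from the output above.

```python
def through_charset(text, charset):
    """Round-trip text through a charset, replacing unsupported characters with '?'."""
    return text.encode(charset, errors="replace").decode(charset)

for value in ["Алтайский край", "Барнаул"]:
    # latin1 has no Cyrillic letters, so each one becomes "?".
    # cp1251 does contain them, so the text survives the round trip.
    print(repr(through_charset(value, "latin-1")), repr(through_charset(value, "cp1251")))
```

Each Cyrillic letter maps to exactly one "?", which is why the lengths in the broken output still match the real names.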
In RStudio, go to Tools -> Global Options -> Code -> Saving and set the default text encoding to UTF-8.
rs <- dbSendQuery(con, 'set character set "utf8"')
rs <- dbSendQuery(con, 'SET NAMES utf8')
Putting options(encoding = "UTF-8") at the top of my main script, from which I call my package, seems to fix the issue of non-ASCII characters in my package code.
read_chunk(lines = readLines("TestSpanishText.R", encoding = "UTF-8")) (also file())
For more flexibility, you should use utf8mb4 instead of cp1251. If you have data coming into the client in cp1251, then you probably have to stick with that charset.
I am trying to save user names from a service in my MySQL database. Those names can contain emojis such as 🙈😂😱🍰.
After searching a little, I found this Stack Overflow post linking to this tutorial. I followed the steps, and it looks like everything is configured properly.
I have a database (charset utf8mb4, collation utf8mb4_unicode_ci), a table called TestTable configured the same way, and a "Text" column configured the same way (VARCHAR(191), utf8mb4_unicode_ci).
When I try to save emojis I get an error:
Example of error for shortcake (🍰):
Warning: #1300 Invalid utf8 character string: 'F09F8D'
Warning: #1366 Incorrect string value: '\xF0\x9F\x8D\xB0' for column 'Text' at row 1
The only emoji I was able to save properly was the sun ☀️, though to be honest I didn't try all of them.
Is there something I'm missing in the configuration?
Please note: none of my save tests involved a client application. I use phpMyAdmin to change the values manually and save the data, so I will take care of configuring the client side once the server saves emojis properly.
Another side note: when saving emojis, I either get the error above, or I get no error and Username 🍰 is stored as Username ????. Whether I get an error depends on how I save: creating or saving via an SQL statement stores question marks, editing inline stores question marks, and editing with the edit button produces the error.
thank you
EDIT 1:
Alright, I think I found the problem, but not the solution.
It looks like the database-specific variables didn't change properly.
When I'm logged in as root on my server and read out the variables (global):
Query used: SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
10 rows in set (0.00 sec)
For my Database (in phpmyadmin, the same query) it looks like the following:
+--------------------------+--------------------+
| Variable_name | Value |
+--------------------------+--------------------+
| character_set_client | utf8 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
+--------------------------+--------------------+
How can I adjust these settings for the specific database?
Also, even though the first set of values is the default, a newly created database gets the second set.
Edit 2:
Here is my my.cnf file:
[client]
port=3306
socket=/var/run/mysqld/mysqld.sock
default-character-set = utf8mb4
[mysql]
default-character-set = utf8mb4
[mysqld_safe]
socket=/var/run/mysqld/mysqld.sock
[mysqld]
user=mysql
pid-file=/var/run/mysqld/mysqld.pid
socket=/var/run/mysqld/mysqld.sock
port=3306
basedir=/usr
datadir=/var/lib/mysql
tmpdir=/tmp
lc-messages-dir=/usr/share/mysql
log_error=/var/log/mysql/error.log
max_connections=200
max_user_connections=30
wait_timeout=30
interactive_timeout=50
long_query_time=5
innodb_file_per_table
character-set-client-handshake = FALSE
character-set-server = utf8mb4
collation-server = utf8mb4_unicode_ci
!includedir /etc/mysql/conf.d/
character_set_client, _connection, and _results must all be utf8mb4 for that shortcake to be eatable.
Something, somewhere, is setting a subset of those individually. Rummage through my.cnf and phpmyadmin's settings -- something is not setting all three.
If SET NAMES utf8mb4 is executed, all three are set correctly.
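The rule in this answer can be checked mechanically. A minimal Python sketch, assuming the SHOW VARIABLES output has already been read into a dict of name to value (charset_problems is a hypothetical name), that lists which of the three variables would still break 4-byte characters:

```python
# The three session variables that must all be utf8mb4 for 4-byte characters.
REQUIRED = ("character_set_client", "character_set_connection", "character_set_results")

def charset_problems(variables, wanted="utf8mb4"):
    """Return the required variables whose value is not `wanted`, in order."""
    return [name for name in REQUIRED if variables.get(name) != wanted]

# Values from the phpMyAdmin session shown in the question.
phpmyadmin_session = {
    "character_set_client": "utf8",
    "character_set_connection": "utf8mb4",
    "character_set_results": "utf8",
}
print(charset_problems(phpmyadmin_session))  # ['character_set_client', 'character_set_results']

# After SET NAMES utf8mb4 all three match, so nothing is reported.
print(charset_problems({name: "utf8mb4" for name in REQUIRED}))  # []
```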
The sun shone because it is only 3 bytes (E2 98 80); utf8 is sufficient for characters whose UTF-8 encoding is at most 3 bytes.
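The byte counts can be checked in Python. The sketch below (utf8_hex and max_char_bytes are hypothetical helper names) prints the UTF-8 bytes of the sun and of the shortcake. The shortcake's bytes match the \xF0\x9F\x8D\xB0 in the #1366 warning, and the 'F09F8D' in the #1300 warning appears to be its first three bytes only.

```python
def utf8_hex(text):
    """Return the UTF-8 bytes of text as an uppercase hex string."""
    return text.encode("utf-8").hex().upper()

def max_char_bytes(text):
    """Length in bytes of the longest UTF-8 encoded character in text."""
    return max(len(ch.encode("utf-8")) for ch in text)

sun = "\u2600\ufe0f"       # ☀️ = BLACK SUN WITH RAYS + VARIATION SELECTOR-16
shortcake = "\U0001f370"   # 🍰

print(utf8_hex(sun), max_char_bytes(sun))              # E29880EFB88F 3
print(utf8_hex(shortcake), max_char_bytes(shortcake))  # F09F8DB0 4
```

Both code points of the sun emoji fit in 3 bytes each, which is why utf8 (utf8mb3) accepted it.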
For me, it turned out that the problem lay in the mysql client.
The character set requested by the mysql client overrode the character set configured in the server's my.cnf, which resulted in an unintended character set.
So all I needed to do was add character-set-client-handshake = FALSE.
This makes the server ignore the character set sent by the client, so the client no longer overrides my setting.
my.cnf would look like this:
[mysqld]
character-set-client-handshake = FALSE
character-set-server = utf8mb4
...
Hope it helps.
It is likely that your service/application is connecting with "utf8" instead of "utf8mb4" for the client character set. That's up to the client application.
For a PHP application see http://php.net/manual/en/function.mysql-set-charset.php or http://php.net/manual/en/mysqli.set-charset.php
For a Python application see https://github.com/PyMySQL/PyMySQL#example or http://docs.sqlalchemy.org/en/latest/dialects/mysql.html#mysql-unicode
Also, check that your columns really are utf8mb4. One direct way is like this:
mysql> SELECT character_set_name FROM information_schema.`COLUMNS` WHERE table_name = "user" AND column_name = "displayname";
+--------------------+
| character_set_name |
+--------------------+
| utf8mb4 |
+--------------------+
1 row in set (0.00 sec)
Symfony 5 answer
Although this is not what was asked, people may land here after searching the web for the same problem in Symfony.
1. Configure MySQL properly
See the top answers here.
2. Change your Doctrine configuration
/config/packages/doctrine.yaml
doctrine:
    dbal:
        ...
        charset: utf8mb4
I'm not proud of this answer, because it uses brute force to clean the input. It's brutal, but it works.
function cleanWord($string, $debug = false) {
    $new_string = "";
    for ($i = 0; $i < strlen($string); $i++) {
        $letter = substr($string, $i, 1);
        if ($debug) {
            echo "Letter: " . $letter . "<BR>";
            echo "Code: " . ord($letter) . "<BR><BR>";
        }
        $blnSkip = false;
        if (ord($letter) == 146) {
            $letter = "´";
            $blnSkip = true;
        }
        if (ord($letter) == 233) {
            $letter = "é";
            $blnSkip = true;
        }
        if (ord($letter) == 147 || ord($letter) == 148) {
            $letter = '"';
            $blnSkip = true;
        }
        if (ord($letter) == 151) {
            $letter = "–";
            $blnSkip = true;
        }
        if ($blnSkip) {
            $new_string .= $letter;
            continue; // next byte ("break" would stop cleaning at the first match)
        }
        // Any other non-ASCII byte becomes an HTML numeric entity such as &#201;
        if (ord($letter) > 127) {
            $letter = "&#" . ord($letter) . ";";
        }
        $new_string .= $letter;
    }
    if ($new_string != "") {
        $string = $new_string;
    }
    //optional
    $string = str_replace("\r\n", "<BR>", $string);
    return $string;
}
//clean up the input
$message = cleanWord($message);
//now you can insert it as part of SQL statement
$sql = "INSERT INTO tbl_message (`message`)
        VALUES ('" . addslashes($message) . "')";
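A less brutal alternative, assuming the incoming bytes really are Windows-1252 (the codes 146 to 151 handled above are Windows-1252 punctuation): decode them as cp1252, then turn every non-ASCII character into an HTML numeric entity. This is a different technique from the answer's, sketched in Python rather than PHP; to_ascii_entities is a hypothetical name. Unlike the PHP function, the entities here carry Unicode code points (for example &#8220; for a curly opening quote) rather than raw byte values.

```python
def to_ascii_entities(raw: bytes) -> str:
    """Decode Windows-1252 bytes and escape non-ASCII characters as &#NNNN; entities."""
    text = raw.decode("cp1252")
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(to_ascii_entities(b"caf\xe9 \x93hi\x94 \x96 ok"))
# caf&#233; &#8220;hi&#8221; &#8211; ok
```

A few byte values (such as 0x81 and 0x8D) are undefined in cp1252, so decode() raises UnicodeDecodeError on them instead of silently guessing.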
ALTER TABLE table_name CHANGE column_name column_name
VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL
DEFAULT NULL;
Example query:
ALTER TABLE `reactions` CHANGE `emoji` `emoji` VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL;
After that, I was able to store emojis in the table.
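Consider adding
init_connect = 'SET NAMES utf8mb4'
to the my.cnf of each of your database servers.
(Clients can still override it, and many will. Also, init_connect is not executed for accounts with the SUPER or CONNECTION_ADMIN privilege, such as root.)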
I was importing data via command:
LOAD DATA LOCAL INFILE 'abc.csv' INTO TABLE abc
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES
(col1, col2, col3, col4, col5...);
This didn't work for me:
SET NAMES utf8mb4;
I had to add CHARACTER SET to make it work:
LOAD DATA LOCAL INFILE
'E:\\wamp\\tmp\\customer.csv' INTO TABLE `customer`
CHARACTER SET 'utf8mb4'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
Note that the target column must also be utf8mb4, not utf8; otherwise the import will store question marks such as "?????" (without raising any errors).
For CodeIgniter users: make sure the character set and collation settings in database.php are set correctly. This worked for me:
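CHARACTER SET 'utf8mb4' only helps if the file really is UTF-8. A minimal Python sketch for checking that before the import (first_invalid_utf8_offset is a hypothetical helper; the commented usage reuses the abc.csv name from the answer):

```python
def first_invalid_utf8_offset(data: bytes):
    """Return None if data is valid UTF-8, else the offset of the first bad byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None

# Usage: check the file before running LOAD DATA ... CHARACTER SET 'utf8mb4'.
# with open("abc.csv", "rb") as f:
#     print(first_invalid_utf8_offset(f.read()))
print(first_invalid_utf8_offset("café".encode("utf-8")))    # None
print(first_invalid_utf8_offset("café".encode("latin-1")))  # 3
```

A non-None result usually means the file was saved in a legacy encoding (for example latin1 or a Windows code page) rather than UTF-8.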
$db['default']['char_set'] = 'utf8mb4';
$db['default']['dbcollat'] = 'utf8mb4_unicode_ci';
I'm on Ruby 2.2.0 and Rails 4.2.0.
For a project I have a table called 'Character' where each record is a character. When I search for a record with 'where', the framework confuses similar characters.
For example:
Basic.where(:character => 'Í')
returns every record with an I-like character: character: "Ï", character: "I", character: "i", character: "í", character: "ì",...
My DB uses the utf8_general_ci collation, and when I put my data into the DB I use the 'iso-8859-1:utf-8' encoding.
utf8_general_ci has an issue where it ignores accents (combining characters) when comparing. In short, use utf8_unicode_ci instead, which uses the Unicode Collation Algorithm. This has already been answered very well in What are the differences between utf8_general_ci and utf8_unicode_ci?
EDIT:
It actually seems like not even utf8_unicode_ci handles this correctly.
Here's the code I used to test this:
SET collation_connection = 'utf8_bin';
SELECT 'Ï' = 'I'; -- 0
SET collation_connection = 'utf8_unicode_ci';
SELECT 'Ï' = 'I'; -- 1
SET collation_connection = 'utf8_general_ci';
SELECT 'Ï' = 'I'; -- 1
SET collation_connection = 'utf8mb4_bin';
SELECT 'Ï' = 'I'; -- 0
SET collation_connection = 'utf8mb4_unicode_ci';
SELECT 'Ï' = 'I'; -- 1
SET collation_connection = 'utf8mb4_general_ci';
SELECT 'Ï' = 'I'; -- 1
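The results above follow from the _ci collations being accent- and case-insensitive, while _bin compares code points. A rough Python analogy: decompose each character, drop the combining marks and casefold. fold is a hypothetical helper, and this only approximates a *_ci collation; MySQL's real collations use their own weight tables.

```python
import unicodedata

def fold(text):
    """Rough accent- and case-insensitive comparison key (not MySQL's algorithm)."""
    decomposed = unicodedata.normalize("NFD", text)  # "Ï" -> "I" + combining diaeresis
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.casefold()

print("Ï" == "I")              # False, like utf8mb4_bin
print(fold("Ï") == fold("I"))  # True, like utf8mb4_unicode_ci / utf8mb4_general_ci
print(fold("Í") == fold("ì"))  # True
```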
EDIT2:
It looks like Postgres handles this better: http://sqlfiddle.com/#!15/9eecb/797. If you can control the choice of DB, I would suggest using Postgres instead.
I have a Rails application running in production mode, and all of a sudden this error came up today when a user tried to save a record:
Mysql2::Error: Incorrect string value
More details (from production log):
Parameters: {"utf8"=>"â<9c><93>" ...
Mysql2::Error: Incorrect string value: '\xC5\x99\xC3\xA1k
Mysql2::Error: Incorrect string value: '\xC5\x99\xC3\xA1k
Now I saw some solutions that required dropping the databases and recreating it, but I cannot do that.
Now mysql shows this:
mysql> show variables like 'char%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.04 sec)
What is wrong and how can I change it so I do not have any problems with any characters?
Also: can this problem be solved with JavaScript, by converting the text before sending it?
Thanks
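The bytes quoted in the error can be decoded to see what the user actually typed. A small Python sketch (decode_error_bytes is a hypothetical helper) shows they are valid UTF-8 for "řák", and that "ř" has no code in latin1, which would fit the latin1 database and server defaults shown above.

```python
def decode_error_bytes(escaped):
    r"""Turn text like \xC5\x99 from an 'Incorrect string value' error into a str.

    The \xNN escapes are first turned back into raw bytes, which are then
    decoded as UTF-8.
    """
    raw = escaped.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode("utf-8")

value = decode_error_bytes(r"\xC5\x99\xC3\xA1k")
print(value)                                   # řák
print([ch for ch in value if ord(ch) > 0xFF])  # ['ř'] has no latin1 code
```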
The problem is caused by the character set on your MySQL server side. You can change it manually like this:
ALTER TABLE your_database_name.your_table CONVERT TO CHARACTER SET utf8
or drop the database and recreate it:
rake db:drop
rake db:create
rake db:migrate
References:
https://stackoverflow.com/a/18498210/2034097
https://stackoverflow.com/a/16934647/2034097
UPDATE
The first command only affects the specified table. To change the default for the whole database (used by tables created afterwards; existing tables keep their character set until converted), you can do:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_general_ci;
Reference:
https://stackoverflow.com/a/6115705/2034097
I managed to store emojis (which take up 4 bytes) by following this blog post:
Rails 4, MySQL, and Emoji (Mysql2::Error: Incorrect string value error.)
You might think that you’re safe inserting most utf8 data into mysql when you’ve specified that the charset is utf-8. Sadly, however, you’d be wrong. The problem is that the utf8 character set takes up 3 bytes when stored in a VARCHAR column. Emoji characters, on the other hand, take up 4 bytes.
The solution is in 2 parts:
Change the encoding of your table and fields:
ALTER TABLE `[table]`
CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin,
MODIFY [column] VARCHAR(250)
CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
Tell the mysql2 adapter about it:
development:
  adapter: mysql2
  database: db
  username:
  password:
  encoding: utf8mb4
  collation: utf8mb4_unicode_ci
Then I had to restart my app, and it worked.
Hope this helps someone!
Please note that some emojis will work without this fix, while some won't:
➡️ Did work
🔵 Did not work until I applied the fix described above.
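The split between the two emojis is predictable: characters at U+10000 and above are exactly the ones whose UTF-8 encoding takes 4 bytes, and a utf8 (utf8mb3) column cannot store them. ➡️ is U+27A1 plus U+FE0F, both below that limit, while 🔵 is U+1F535. A minimal Python sketch for finding such characters before saving (needs_utf8mb4 is a hypothetical name):

```python
def needs_utf8mb4(text):
    """List the characters in text that take 4 bytes in UTF-8 (U+10000 and above)."""
    return [ch for ch in text if ord(ch) > 0xFFFF]

print(needs_utf8mb4("\u27a1\ufe0f"))  # ➡️ -> []
print(needs_utf8mb4("\U0001f535"))    # 🔵 -> ['🔵']
```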
You can use a migration like this to convert your tables to utf8:
class ConvertTablesToUtf8 < ActiveRecord::Migration
  def change_encoding(encoding, collation)
    connection = ActiveRecord::Base.connection
    tables = connection.tables
    dbname = connection.current_database
    execute <<-SQL
      ALTER DATABASE #{dbname} CHARACTER SET #{encoding} COLLATE #{collation};
    SQL
    tables.each do |tablename|
      execute <<-SQL
        ALTER TABLE #{dbname}.#{tablename} CONVERT TO CHARACTER SET #{encoding} COLLATE #{collation};
      SQL
    end
  end

  def change
    reversible do |dir|
      dir.up do
        change_encoding('utf8', 'utf8_general_ci')
      end
      dir.down do
        change_encoding('latin1', 'latin1_swedish_ci')
      end
    end
  end
end
If you want to store emoji, you need to do the following:
Create a migration (thanks #mfazekas)
class ConvertTablesToUtf8 < ActiveRecord::Migration
  def change_encoding(encoding, collation)
    connection = ActiveRecord::Base.connection
    tables = connection.tables
    dbname = connection.current_database
    execute <<-SQL
      ALTER DATABASE #{dbname} CHARACTER SET #{encoding} COLLATE #{collation};
    SQL
    tables.each do |tablename|
      execute <<-SQL
        ALTER TABLE #{dbname}.#{tablename} CONVERT TO CHARACTER SET #{encoding} COLLATE #{collation};
      SQL
    end
  end

  def change
    reversible do |dir|
      dir.up do
        change_encoding('utf8mb4', 'utf8mb4_bin')
      end
      dir.down do
        change_encoding('latin1', 'latin1_swedish_ci')
      end
    end
  end
end
Change rails charset to utf8mb4 (thanks #selvamani-p)
production:
  encoding: utf8mb4
References:
https://stackoverflow.com/a/39465494/1058096
https://stackoverflow.com/a/26273185/1058096
You need to change the CHARACTER SET and COLLATE of the already created database:
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Or create the database with these parameters set from the start:
CREATE DATABASE databasename CHARACTER SET utf8 COLLATE utf8_general_ci;
It seems like an encoding problem when getting data from the database. Try adding the following to your database.yml file:
encoding: utf8
Hope this solves your issue
Also, if you don't want to change your database structure, you could opt for serializing the field in question.
class MyModel < ActiveRecord::Base
  serialize :content
  attr_accessible :content, :title
end
This may look like a duplicate but I've been searching for hours and none of the suggested fixes for similar problems are working:
I have text in an xls file that was converted to CSV. It contains Polish characters. I've confirmed that I saved it as UTF-8. I don't have access to phpMyAdmin on this server, so I uploaded this UTF-8 encoded CSV file to the server.
I then use a UTF-8 encoded PHP file to load the database:
mb_language('uni');
mb_internal_encoding('UTF-8');
setlocale(LC_ALL, "pl_PL.UTF-8");
require_once('config.php');
mysql_set_charset('utf8');
$f=fopen('questions-final2.csv','r');
$questions=array();
while (($data = fgetcsv($f, 1000, ",")) !== FALSE) {
    //$num = count($data);
    //echo "<p> $num fields in line $row: <br /></p>\n";
    print_r($data);
    $questions[]=$data;
    //mysql_query('INSERT INTO questions(question_id,text,answer_time,difficulty,mode) VALUES '.implode(',',$inserts));
    //echo $data;
}
//exit();
// import of questions
$prev_index=0;
foreach($questions as $index=>$question){
    if($index>0)
        if($question[0]==$questions[$prev_index][0])
            unset($questions[$index]);
        else
            $prev_index=$index;
}
mysql_query('SET CHARACTER SET utf8');
mysql_query('SET NAME utf8');
$res=mysql_query('SELECT * FROM questions');
$inserts=array();
foreach($questions as $question)
    $inserts[]='("'.$question[5].'","'.addslashes($question[1]).'","'.$question[7].'","'.$question[0].'","'.$question[4].'")';
mysql_query('INSERT IGNORE INTO questions(question_id,text,answer_time,difficulty,mode) VALUES '.implode(',',$inserts));
var_dump(mysql_error());
fclose($f);
Now, here is what the database says:
mysql> show variables like 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | latin1 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
I can't get that latin1 part to go away. My my.cnf looks like this:
[client]
default-character-set=utf8
[mysql]
default-character-set=utf8
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
user=mysql
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
collation-server = utf8_general_ci
init-connect='SET NAMES utf8'
character-set-server = utf8
default-character-set = utf8
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
I'm using PuTTY and have confirmed that it is set to UTF-8 as well. This is the output:
mysql> select text from questions limit 1;
+-------------------------------------------+
| text |
+-------------------------------------------+
| ?wi?to Unii Europejskiej obchodzone jest: |
+-------------------------------------------+
1 row in set (0.00 sec)
This is the original text as it should appear:
Święto Unii Europejskiej obchodzone jest:
I have also tried:
alter table questions modify column text TEXT character set utf8 collate utf8_unicode_ci;
and
alter table questions convert to character set utf8 collate utf8_unicode_ci;
Both before and after importing data, to no avail. What am I missing here?
mysql_query('SET NAME utf8');
This query should trigger an error:
SQL Error (1193): Unknown system variable 'NAME'
... but you don't see it because you don't test whether mysql_query() succeeds. The correct variable is NAMES.
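There may be a second effect in the script. According to the MySQL manual, SET NAMES x sets character_set_client, character_set_connection and character_set_results to x, while SET CHARACTER SET x sets client and results to x but sets character_set_connection to the value of character_set_database. A toy Python model of just those two rules (set_names and set_character_set are hypothetical names; collation variables are ignored), fed with the latin1 database default from the question:

```python
def set_names(session, charset):
    # SET NAMES: all three connection-related variables get the same value.
    for var in ("character_set_client", "character_set_connection", "character_set_results"):
        session[var] = charset

def set_character_set(session, charset):
    # SET CHARACTER SET: client and results follow the argument ...
    session["character_set_client"] = charset
    session["character_set_results"] = charset
    # ... but the connection charset falls back to the database default.
    session["character_set_connection"] = session["character_set_database"]

session = {"character_set_database": "latin1"}
set_names(session, "utf8")          # roughly what mysql_set_charset('utf8') requests
set_character_set(session, "utf8")  # mysql_query('SET CHARACTER SET utf8')
# mysql_query('SET NAME utf8') fails, so nothing changes after this point.
print(session["character_set_connection"])  # latin1
```

If this model matches what happened, the Polish letters were converted to a latin1 connection character set and replaced with "?" before they reached the table.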