I am working with a Debian Server with xampp 1.8.3-2 and mysqlserver version 5.6.14 installed. In the old database, latin1_german2_ci was used as character collation for the database and the tables because of the german characters like ä,ö,ü,ß etc.. Now, I have to convert the collation of the tables into utf8_unicode_ci(collate of database is still in latin1_german2_ci). But after that, queries like this don't produce the correct answers anymore. i.e. it not only give everything with kö but with ko as well. How can I fix this?
SELECT * FROM users Where lastname like "%kö%"
edit: Just found one solution which uses COLLATE:
SELECT * FROM users Where lastname like "%kö%" COLLATE utf8mb4_german2_ci
However, this has to be adjusted depend which server connection collation is used, so if the server connection collation is utf8_unicode_ci, the query has to be changed into
SELECT * FROM users Where lastname like "%kö%" COLLATE utf8_german2_ci
So my question now: is there a better/more elegant way to solve my problem? Is there any option in the database to prevent this?
try to set the table's charset to UTF-8 and the collation to utf8_* and make sure there is no _ci and the end (_ci is for "case insensitive") otherwise MySQL will both perform case and accent-insensitive searches.
Related
I am aware about MySQL being case insensitive by default.
I also read about using the collation utf8_general_cs to enable case sensitivity. But I get an error saying the collation is not identified. Also when I query the collation for charset utf8, the resultset shows ci related collations only. So question number 1 would be, do we need to configure cs related collations? If so, then I would like some guidance over it. Or is it dependent on some particular database engine?
Also I read about using utf8_bin collation for making MySQL queries search case sensitive. I did so. Set the schema collation as utf8_bin. But it didn't work. Restarted MySQL services as well to ensure that collation has been updated. But yet, when I do a
Select * from table where name like 'el%';
It gives name starting from 'EL' as well.
Note: I am preferably looking for options to set the collation at the database level.
MySQL server version 5.6.x
Column collation has precedence over database and table collations. If you've been making changes, it's possible that your column is currently using the value that was the default when the table was created. You should be able to spot it with a proper SQL tool or by running:
SELECT table_schema, column_name, collation_name
FROM information_schema.columns
WHERE table_schema = 'your database name'
AND table_name = 'your table name'
If you aren't willing to change the column collation, you can set it at expression level:
SELECT *
FROM foo
WHERE bar LIKE 'el%' COLLATE utf8mb4_0900_as_cs;
(demo)
Collation affects sorting and character comparison so you'll have to read some docs to figure out which one suits your needs best (it isn't straightforward if you aren't a Unicode geek).
My projects are all in Spanish so I tend to use utf8mb4_spanish_ci a lot ;-)
I thought that normally worked?!
Anyway if this is a localised problem there are a few solutions:
Primarily one could use a SELECT * FROM users WHERE name REGEXP '^[E][P]*$' - regular expression.
The alternative
Surely you must be accessing this through another languages wrapper for either automation or handling rather than the MySQL console right? I would suggest sorting it using the language you’re using for a wrapper this can easily be implemented. Still however only deal with those beginning in el as this will reduce the number of things to check.
I started working on a wordpress on my dev machine. mysql version is 5.6, and worpdress is 4.7 so its already using the utf8mb4_unicode_520_ci encoding if it detects its possible.
My problem is that on my hosting (mysql 5.5) utf8mb4_unicode_520_ci is not recognized as a valid encoding. So I'm trying to target utf8mb4_unicode_ci encoding as my hosting knows about this one, and if I understand correctly, this would - in opposition to going to utf8 - allow me to keep the 4 bytes.
I tried several different combinaison of encoding and collation set up for the db, but nothing successful (from here How to convert an entire MySQL database characterset and collation to UTF-8?).
I tried several combination of encoding and collation in the wp-config, but nothing.
Everything that is coming from the database (like post titles and post contents displays badly encoded char for all diatrics, anything else is displayed appropriately )
menu label from the database display incorrectly, where the hardcoded/translated label display correctly
I think I need to convert the actual content of the database, changing charset and collation does not seems to be enough.
I found this but it does not address my problem directly, or if it does I missed it.
Any help would be appreciated
————————————————————————————————
UPDATE :
here is the precise procedure I went through:
Initial situation:
I installed a wordpress (4.6.1) locally (on my dev machine, mysql 5.6.28).
I worked on the theme and plugin locally
(at this moment I have, locally, a database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci
Problem:
I want to deploy my wordpress on my hosting (mysql: 5.5 - db collation seems to be utf8mb4_unicode_ci).
I mysqldump the db locally, then try to import it on my hostings' phpmyadmin.
This gives error :
Unknown collation: 'utf8mb4_unicode_520_ci'
solution 1 change the tables charset to utf8mb4_unicode_ci:
On my hosting sql server, utf8mb4_unicode_520_ci is not available and I can't get a more recent version of mysql.
utf8mb4_unicode_ci seems like the closest and is available on my hosting sql server.
from various so question, I adapt a bash script to change charset and collation of my tables
for tbl in wp_sij2017_commentmeta wp_sij2017_comments wp_sij2017_cwa wp_sij2017_links wp_sij2017_options wp_sij2017_postmeta wp_sij2017_posts wp_sij2017_term_relationships wp_sij2017_term_taxonomy wp_sij2017_termmeta wp_sij2017_terms wp_sij2017_usermeta wp_sij2017_users wp_sij2017_woocommerce_api_keys wp_sij2017_woocommerce_attribute_taxonomies wp_sij2017_woocommerce_downloadable_product_permissions wp_sij2017_woocommerce_order_itemmeta wp_sij2017_woocommerce_order_items wp_sij2017_woocommerce_payment_tokenmeta wp_sij2017_woocommerce_payment_tokens wp_sij2017_woocommerce_sessions wp_sij2017_woocommerce_shipping_zone_locations wp_sij2017_woocommerce_shipping_zone_methods wp_sij2017_woocommerce_shipping_zones wp_sij2017_woocommerce_tax_rate_locations wp_sij2017_woocommerce_tax_rates; do
mysql --execute="ALTER TABLE wp_sij_2017_original_copy.${tbl} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;"
done
I run this script on the local db
I now have all my tables set to collation utf8mb4_unicode_ci
My db collation is still utf8
I mysqldump the db, then import it to my hosting and...
Import is successful.
I search and replace siteurl in the db.
I then visit the online website, I got SOME diatrics that renders a "question mark char"
Any text coming from the db has decoding issue AT SOME POINT
The source/html markup also has those "question mark char"
I have no idea where to look or what to do next
Clarification: CHARACTER SETs utf8 and utf8mb4 specify how characters are encoded into bytes. COLLATIONs *_unicode_*, etc, specify how those character compare.
The encoding for utf8mb4_unicode_ci and utf8mb4_unicode_520_ci are the same because they are encoded in the character set utf8mb4.
"database that is utf8_general_ci and tables that are utf8mb4_unicode_520_ci" -- that probably means that new tables in that database, unless specifically stated, will be CHARACTER SET utf8 COLLATION utf8_general_ci. That is the database setting is just a default for CREATE TABLE. Since your tables are already CHARACTER SET utf8mb4 COLLATION utf8mb4_unicode_520_ci, the database default is not relevant to them.
As long as the CHARACTER SET stays utf8mb4, no Emoji, Chinese, etc will be lost or otherwise mangled.
Do not use mysql40; it did not know about any CHARACTER SETs. Do not use CONVERT or CAST. Etc.
I assume the 520 is coming from the output of mysqldump? Do you have an editor that can handle a file that big? If so, simply edit it to change utf8mb4_unicode_520_ci to utf8mb4_unicode_ci throughout. Then load the dump. Problem solved?
Your fix
You did ALTER ... CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci on your local machine. That is probably an even better way -- since it will put your dev and prod machine in line with each other. That should have worked. Don't worry about what the "database" claims.
I'm find 'utf8mb4_unicode_520_ci' and replace with 'utf8mb4_unicode_ci' in .sql file.
Its simplest why to solve this.
I have a mysql database with a charset utf8 of all the tables.
I am using SQLDeveloper to access and query the database with the latest JConnector JDBC driver.
When executing a simple query such as SELECT 'Варна'; equivalent to SELECT 'Варна' from DUAL;, which contains Bulgarian language, SQLDeveloper returns '?????'. This makes selects from the database in which I have used Bulgarian language return NULL, because their where clauses (containing Bulgarian language) mismatch the uft8 Bulgarian characters in the database. (When the select doesn't use Bulgarian language at all SQLDeveloper returns completely correct values and displays the Bulgarian language returned as a result of the query correctly.)
The Preferences -> Environment -> Encoding in SQLDeveloper is set currently to UTF-8, but I have tried virtually every applicable encoding listed in there and even the simplest query SELECT 'Варна' from DUAL; still does not return back the correct value Варна.
I have looked into setting the variable NLS_LANG, thinking this may be the cause but to no avail. (Perhaps it is the key after all but I am unable to actually configure it properly).
Edit: In order to reproduce the problem and visualise it (as I realise I may have explained it poorly) just go in SQLDeveloper and connect to a mysql database and execute the query SELECT 'Варна' from DUAL;.
Edit2: Clarifications.
Edit3: As shown by the comment left by #tenhouse it appears that this may be a bug.
Edit4: As stated below as a comment, the above query SELECT 'Варна' from DUAL; works perfectly fine without any modifications and/or settings fiddling on MySQL Workbench.
Edit5: Please, feel free to correct the title and/or tags if you feel that something can be improved as there is still no answer to the problem.
Edit6: By now can I assume that it really is a bug? Could anyone advise me where exactly to report it - is it a JConnector or SQLDeveloper related bug. I would think that I have to report it as a SQLDeveloper bug but I'd rather get a confirmation before possibly wasting their time.
Edit7: Tried to clarify it even further in my hopes for an answer.
Edit8: (Important!) My current database is hosted on linux (Ubuntu 12.04, MySQL 5.5.28) server. If, however, I install MySQL on a fresh Windows machine and create a utf8 db there, querying through SQLDeveloper works as it is supposed to, SELECT 'Варна' from DUAL; actually returns Варна. Could someone please confirm this?
So I didn't know this myself till having this problem a few months back, but MySQL actually offers the ability for different encodings for clients, databases, and connections. MySQL will convert (or collate) the requests/responses from/to a client to different encodings as specified by the client or in his config file. So even though the database is storing stuff as utf8, if the client is set to latin1, your going to see latin1 as your result encoding. The easiest way to check this is to spin up a connection to MySQL and run the following query:
SHOW VARIABLES LIKE "%char%";
You should see a whole bunch of encodings for different connections/sources. From your description, I imagine most of these will not be utf8. Here's mysql's doc on what each of these mean. You can test if this in fact the problem by doing a SET NAMES 'utf8'; or charset utf8; (can't remember which one) and running your queries again to see if that fixes the problem.
A summary of what each of these guys does (since the docs leave some stuff out):
character_set_client: specifies how data is encoded when sending from client to server. Anything connecting through MySQL's API is not a client (ex. php's mysqli, most C/C++ wrapper libs)
character_set_database: specifies the encoding for data stored in the database
character_set_filesystem: not really sure, but I believe how data is written to disk?
character_set_results: the encoding that MySQL returns query results
character_set_server: server's default set (not really sure cases where this is used)
character_set_system: not sure on this one either
character_sets_dir: where your collation/encoding definitions are located
Most of these guys can be specified by editing your my.cnf and sticking your defaults in there.
I'm not exactly sure how JConnector works, but I imagine it uses MySQL's C API, in which case you'll need to do something like the following somewhere in the code. Maybe JConnector has a way for you to set this through him. I'm not sure, but here's the syntax for the MySQL API:
mysql_options( myLink, MYSQL_SET_CHARSET_NAME, "utf8" );
EDIT: For MySQL 5.5
You can try a command like this ::
ALTER DATABASE CHARACTER SET WE8ISO8859P5;
Please restart the DB after changing the characterset.
More details refer this link where it explains about the encoding required for different languages
http://www.csee.umbc.edu/portal/help/oracle8/server.815/a67789/ch3.htm
after you connect with a mysql_connect:
$dbcnx = mysql_connect($dbhost, $dbuser, $dbpass)
you do this query:
mysql_query("SET
character_set_results = 'utf8',
character_set_client = 'utf8',
character_set_connection = 'utf8',
character_set_database = 'utf8',
character_set_server = 'utf8'",
$dbcnx);
Now this will set the encoding for what is returned, what happens on the server - so all of it has the same encoding.
In your following query's, you specify this connection to be used
Export
Add [?characterEncoding=utf8]
<StringRefAddr addrType="customUrl">
<Contents>jdbc:mysql://instance_host_name:3306/database_name?characterEncoding=utf8</Contents>
</StringRefAddr>
Import
On our development server, we have a database server with collation: COLLATE SQL_Latin1_General_CP1_CI_AS.
After deploying our solution on a server and that database server has collation: COLLATE SQL_Latin1_General_CI_AS
That means if we have a query:
SELECT ('text' + 'abc') AS 'result'
I got this problem:
Implicit conversion of varchar value to varchar cannot be performed because the collation of the value is unresolved due to a collation conflict.
So I tried this: ALTER DATABASE [mydb] COLLATE SQL_Latin1_General_CP1_CI_AS
then I check the property of the myDB, the collate is changed to: SQL_Latin1_General_CI_CS_AS but I still get the same error.
Other topics suggest to reinstall the database. But that is not the case we will lose all of the data.
Any suggestion is very appreciated!
Thanks in advance.
In a nutshell, it's not enough to alter the database, as that only affects new objects that are created going forward. You also have to change all of the existing columns. This Microsoft support article should have all the details you'll need.
How to transfer a database from one collation to another collation in SQL Server
I'm trying to execute this concat query in mysql
SELECT CONCAT(if(fName,fName,''),Name)
From Student
Error:
#1271 - Illegal mix of collations for operation 'concat'
This is due to collections difference, you can solve by converting the two strings or columns
to one collection say UTF8
CONCAT(CAST(fName AS CHAR CHARACTER SET utf8),CAST('' AS CHAR CHARACTER SET utf8))
This will solve :)
you can check more about casting in MySQL here MySQL Casting
The charsets and/or collations you use in your connection do not match the charset/collation in your table.
There are 4 solutions:
1- Change the charset in your connection:
//find out the charset used in your table.
SHOW TABLES LIKE 'student'
//set the server charset to match
SET NAMES 'charset_name' [COLLATE 'collation_name']
2- Change the charset used in your table to match the server charset:
//find out the charset used in the server
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';
//Change the charset used in the table
ALTER TABLE student ......
3- Change the default charset settings and restart MySQL
Edit My.ini and replace the character_set_* options, so they match your tables.
4- Change the charset settings for your connection
You client can override the charset and collation settings.
If it does not option 1 or 3 should fix your issue, but if the connection overrides these settings, you need to check the connection-string and edit the charset/collation settings to match your database.
Some advice:
Find a charset. I recommend UTF8 and a collation: I recommend utf8_general_ci. And use those consistantly everywhere.
A concatenation can only work if the collation of all used values matches OR you use a collation that all collations are a subset of (from a logical standpoint).
If you want to concatenate text, each text should be the same collation. Take a look at the collation the database uses, then take a look at the collation that your connection uses:
show variables like '%coll%'
The collation_connection should match the collation of the table you try to concatenate. If it doesn't, the error message will arise.
You can then change the connection collation to match the one of the table.
Look like you have a miss use on the if statement there because it will resulting an undefined data type so the concat operation will fail as it different in data type. Try change the query by use ifnull instead.
Try this query instead:
SELECT concat(ifnull(fName,''),Name) From Student
see the demo here http://www.sqlize.com/kfy85j8f1e
for another reference read also http://forums.mysql.com/read.php?10,225982,225982#msg-225982
It can also be an error with your client library being too old for the mysql server.
We had a similar problem with LIKE and the character "ő" and using PHP MySQL library version 5.1.52 but MySQL server version 5.5.22.
The problem has gone away upon upgrading the client library.