MySQL: why does a search for Õ also return O?

I have multiple special characters (Õ) in a table column.
When I search for Õ, the results also include rows containing O.
select * from table where column like '%Õ%'
I want to replace Õ with a single question mark in my table.
Example: the text is being saved as below
ItÕs going to be just one of the factors that will be the cause of the resistance
so figure out which one it is and focus on that one.

If you are using utf8 currently:
mysql> SELECT REPLACE('O-o-Õ-Õ-x', 'Õ', '?') COLLATE utf8_bin;
+----------------------------------------------------+
| REPLACE('O-o-Õ-Õ-x', 'Õ', '?') COLLATE utf8_bin |
+----------------------------------------------------+
| O-o-?-?-x |
+----------------------------------------------------+
Notice how it replaced only the Õ characters.
If you are using utf8mb4, then change to COLLATE utf8mb4_bin.
Caution -- Your problem is very unusual. If you have left out some aspects of the problem, this solution may do more harm than good.
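One possible explanation for where the Õ comes from, offered as an assumption that the thread does not confirm: the byte 0xD5 is the right single quotation mark (’) in the Mac OS Roman encoding and Õ in latin1. If the text was written as Mac OS Roman and later read as latin1, "It’s" would show up as "ItÕs". The Python sketch below (the function name is hypothetical) reverses that misreading. Try it on a copy of the data before relying on it.

```python
def undo_macroman_read_as_latin1(text: str) -> str:
    """Reverse a Mac OS Roman -> latin1 misdecoding (assumed cause, unconfirmed)."""
    # Recover the original bytes, then decode them with the encoding
    # they were (by assumption) written in.
    return text.encode("latin-1").decode("mac_roman")


damaged = "ItÕs going to be just one of the factors"
print(undo_macroman_read_as_latin1(damaged))
```

If this guess matches your data, repairing the encoding keeps the apostrophe, whereas REPLACE(..., 'Õ', '?') discards it.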

Related

Can I convert user input language to default collation of database?

I want to search user input in my database. database collation is latin1_swedish_ci. I don't want to change that, instead can I change user input utf-8 to latin1_swedish_ci?
Edit:
I tried two methods.
Method 1: I imported the data and used the default collation latin1_swedish_ci and character set latin1. Then I have
Here I can query like SELECT * FROM dict WHERE english_word = '$_value' and I get all the values of the column, including malayalam_definition, in the browser as desired. But the problem is that I can't query like SELECT * FROM dict WHERE malayalam_definition = '$_value'. It returns no result.
Method 2: I changed the collation to utf8_unicode_ci and the character set to utf8. Then in mysql I get the desired values like
Here, when I query like SELECT * FROM dict WHERE english_word = '$_value' in the browser, I get question marks in the malayalam_definition values like
Result of SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
Do I need to change character_set_server? If so, how do I do it?
First of all, the "database collation" is only a default. The real question is what is the CHARACTER SET of the columns that you are interested in.
Then, what are the bytes in your client? Are they encoded as latin1? Or utf8? In either case, tell MySQL that that is what is coming at it. This is preferably done in the connection parameters. (What is your client language?) Alternatively, use SET NAMES latin1 or SET NAMES utf8, according to the client encoding.
Now, what MySQL will do on INSERT and SELECT... It will convert the encoding from the client's encoding to the column's encoding as you do an INSERT. No further action is needed to achieve this.
Similarly, MySQL will convert the other way during a SELECT.
(Of course, if the column and the client are talking the same encoding, no "convert" is needed.)
Your question mentions "collation". So far, I have only talked about CHARACTER SETs, also known as "encoding". In contrast, the sorting and comparing of two strings is COLLATION.
For the CHARACTER SET latin1, the default COLLATION is latin1_swedish_ci.
For the CHARACTER SET utf8, the default COLLATION is utf8_general_ci.
There are several different "collations" to handle the quirks of German or Turkish or Spanish or (etc) orderings.
Please explain why you are trying to do what you stated. There are many ways you can do it wrong, so I do not want to give you an ALTER statement -- it may just make things worse for the real goal.
It is better to use utf8mb4 instead of utf8. The outside world refers to UTF-8; this is equivalent to MySQL's utf8mb4.
Edit (after OP's Edit)
The first screenshot shows "Mojibake". Another screenshot shows question marks. The causes of each are covered in Trouble with UTF-8 characters; what I see is not what I stored
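The two symptoms can be reproduced outside MySQL. The Python sketch below is only an illustration of the general mechanism, not a diagnosis of this particular database: Mojibake appears when UTF-8 bytes are interpreted as latin1, and question marks appear when text is converted to a character set that cannot represent it. The sample word is an assumption chosen to match the question.

```python
word = "മലയാളം"  # "Malayalam" written in Malayalam script (sample data)

utf8_bytes = word.encode("utf-8")

# Mojibake: the UTF-8 bytes are read as if each byte were one latin1 character.
mojibake = utf8_bytes.decode("latin-1")

# Question marks: the text is converted to latin1, which has no Malayalam
# letters, so every character is replaced by "?".
question_marks = word.encode("latin-1", errors="replace").decode("latin-1")

print(len(word), len(mojibake))  # the Mojibake string is longer than the original
print(question_marks)

# Mojibake is reversible as long as the bytes survived; question marks are not.
print(mojibake.encode("latin-1").decode("utf-8") == word)
```

This is why the usual advice is to fix the connection encoding before inserting more data: rows that already contain question marks have lost the original text.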

Use accent sensitive primary key in MySQL

Desired result :
Have an accent sensitive primary key in MySQL.
I have a table of unique words, so I use the word itself as a primary key (by the way if someone can give me an advice about it, I have no idea if it's a good design/practice or not).
I need that field to be accent (and why not case) sensitive, because it must distinguish between, for instance, 'demandé' and 'demande', two different inflexions of the French verb "demander". I do not have any problem storing accented words in the database. I just can't insert two strings that become identical once their accents are removed.
Error :
When trying to create the 'demandé' row with the following query:
INSERT INTO `corpus`.`token` (`name_token`) VALUES ('demandé');
I got this error :
ERROR 1062: 1062: Duplicate entry 'demandé' for key 'PRIMARY'
Questions :
Where in the process should I make a modification in order to have two different unique primary keys for "demande" and "demandé" in that table?
SOLUTION using 'collate utf8_bin' in table declaration
How can i make accent sensitive queries ? Is the following the right way :
SELECT * FROM corpus.token WHERE name_token = 'demandé' COLLATE utf8_bin
SOLUTION using 'collate utf8_bin' with WHERE statement
I found that I can achieve this by using the BINARY keyword (see this sqlFiddle). What is the difference between COLLATE and BINARY?
Can I preserve other tables from any changes ? (I'll have to rebuild that table anyway, because it's kind of messy)
I'm not very comfortable with encoding in MySQL. I don't have any problem yet with encoding in that database (and I'm kind of lucky because my data might not always use the same encoding... and there is not much I can do about it). I have a feeling that any modification regarding to that "accent sensitive" issue might create some encoding issue with other queries or data integrity. Am I right to be concerned?
Step by step :
Database creation :
CREATE DATABASE corpus DEFAULT CHARACTER SET utf8;
Table of unique words :
CREATE TABLE token (name_token VARCHAR(50), freq INTEGER, CONSTRAINT pk_token PRIMARY KEY (name_token))
Queries
SELECT * FROM corpus.token WHERE name_token = 'demande';
SELECT * FROM corpus.token WHERE name_token = 'demandé';
Both return the same row:
demande
Collations. You have two choices, not three:
utf8_bin treats all of these as different: demandé and demande and Demandé.
utf8_..._ci (typically utf8_general_ci or utf8_unicode_ci) treats all of these as the same: demandé and demande and Demandé.
If you want only case sensitivity (demandé = demande, but neither match Demandé), you are out of luck.
If you want only accent sensitivity (demandé = Demandé, but neither match demande), you are out of luck.
Declaration. The best way to do whatever you pick:
CREATE TABLE (
name VARCHAR(...) CHARACTER SET utf8 COLLATE utf8_... NOT NULL,
...
PRIMARY KEY(name)
)
Don't change collation on the fly. A query like the following won't use the index (that is, it will be slow) if the collation in the query differs from the one declared for the column name:
WHERE name = ... COLLATE ...
BINARY. The datatypes BINARY, VARBINARY and BLOB are very much like CHAR, VARCHAR, and TEXT with COLLATE ..._bin. Perhaps the only difference is that text is checked for valid utf8 when it is stored into a VARCHAR ... COLLATE ..._bin column, but it is not checked when it is stored into VARBINARY.... Comparisons (WHERE, ORDER BY, etc) will be the same; that is, simply compare the bits, don't do case folding or accent stripping, etc.
Maybe you need this:
_ci in a collation name means case-insensitive.
If your searches on that field are always going to be case-sensitive, then declare the collation of the field as utf8_bin... that'll compare for equality the utf8-encoded bytes.
col_name varchar(10) collate utf8_bin
If searches are normally case-insensitive, but you want to make an exception for this search, try:
WHERE col_name = 'demandé' collate utf8_bin
More here
Try this
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1
-> (c1 CHAR(1) CHARACTER SET UTF8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t1 VALUES ('a'),('A'),('À'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT c1, HEX(c1), HEX(WEIGHT_STRING(c1)) FROM t1;
+------+---------+------------------------+
| c1 | HEX(c1) | HEX(WEIGHT_STRING(c1)) |
+------+---------+------------------------+
| a | 61 | 0041 |
| A | 41 | 0041 |
| À | C380 | 0041 |
| á | C3A1 | 0041 |
+------+---------+------------------------+
4 rows in set (0.00 sec)
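The idea behind the WEIGHT_STRING column can be mimicked with a toy model. The Python sketch below is an approximation under stated assumptions, not MySQL's actual weight tables: it strips combining accents and uppercases the base letter, which happens to reproduce the four rows above. The function name toy_weight is hypothetical.

```python
import unicodedata


def toy_weight(ch: str) -> str:
    """Toy stand-in for HEX(WEIGHT_STRING(c)) under a _general_ci-style collation."""
    # Decompose the character, drop the combining marks (the accents),
    # then uppercase what is left.
    decomposed = unicodedata.normalize("NFD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(f"{ord(c):04X}" for c in base.upper())


for ch in ["a", "A", "À", "á"]:
    # Stored bytes differ, but the comparison weight is the same.
    print(ch, ch.encode("utf-8").hex().upper(), toy_weight(ch))
```

Two strings compare equal under such a collation when their weights are equal, which is why a unique or primary key rejects 'demandé' once 'demande' exists.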

MySQL string comparison with special characters

I have created an autocomplete that matches against a list of names in a database.
The database that I'm working contains a ton of names with special characters, but the end users are most likely going to search with the English equivalent of those names, e.g. Bela Bartok for Béla Bartók and Dvorak for Dvořák, etc. Currently, doing the English searches returns no results.
I have come across threads saying that the way to solve this is to change your MySQL collation to utf8 (which I have done to no avail).
I think that this may be because I used utf8_unicode_ci, but the one that would get the results that I want is utf8_general_ci. The problem with the latter though is that all the comments say to no longer use it.
Does anyone know how I can solve this problem?
If you know the list of special characters and what the equivalents in plain English are, then you can do the following:
1. Lowercase the string.
2. Replace the characters with their lowercase equivalents.
3. Search against that "plain English" column.
You will need to use the full text searching of MySQL in order to search against the text or come up with a home grown solution for how you're going to handle that.
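The lowercase-and-replace steps could be done in the application before the value is written to the extra column. A minimal Python sketch, assuming a hand-maintained equivalence table (PLAIN_EQUIVALENTS and plain_key are hypothetical names, and the table only covers the characters in the examples):

```python
import unicodedata

# Hypothetical equivalence table; extend it with the characters in your data.
PLAIN_EQUIVALENTS = str.maketrans({
    "á": "a", "é": "e", "í": "i", "ó": "o", "ú": "u",
    "ř": "r", "š": "s", "č": "c", "ž": "z",
})


def plain_key(name: str) -> str:
    """Build the value for a separate "plain English" search column."""
    # NFC makes sure accented letters are single code points so the table matches.
    composed = unicodedata.normalize("NFC", name)
    return composed.lower().translate(PLAIN_EQUIVALENTS)


print(plain_key("Dvořák"))       # dvorak
print(plain_key("Béla Bartók"))  # bela bartok
```

The user's search term would be passed through the same function before it is compared with the column.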
Just tested with both utf8_general_ci and utf8_unicode_ci collations and it worked like a charm in both cases.
Here is the MySQL code I used to run my test:
CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`text` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
INSERT INTO `test` (`id`, `text`) VALUES (NULL, 'Dvořák'), (NULL, 'Béla Bartók');
SELECT * FROM `test` WHERE `text` LIKE '%dvorak%';
The above SELECT statement returns:
id text
--------------
1 Dvořák
Note: During my test I set all the collations to the desired one. The database collation, the table collation and the column collation as well.
Could it be that there's a bug in your PHP application?
I found the solution to my problem. Changing the collation to utf8_unicode_ci works perfectly fine. My problem was that I needed to use REGEXP in my query instead of LIKE, but REGEXP obviously doesn't work in this situation!
So in short, changing your collation to utf8_unicode_ci will allow you to compare Dvorak and Dvořák using = or LIKE, but not one of the REGEXP equivalents.
First, let's see if the data is stored correctly. Do
SELECT name, HEX(name) FROM ... WHERE ...;
Béla may come out (ignoring the spaces)
42 C3A9 6C 61 -- if correctly encoded with utf8 (é = C3A9)
42 E9 6C 61 -- if encoded with latin1 (é = E9)
The "Collation" (utf8_general_ci or utf8_unicode_ci) makes no difference for the examples you gave. Both treat é = e. See extensive list of equivalences for utf8 collations.
After you determine the encoding, we can proceed to prescribe a cure.
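The two hex strings above can be reproduced, and a value returned by HEX() classified, with a few lines of Python. This is a rough heuristic sketch (guess_encoding is a hypothetical helper): text that was encoded twice is still valid utf8, so a "valid utf8" answer does not prove the data is correct.

```python
name = "Béla"
print(name.encode("utf-8").hex().upper())    # 42C3A96C61
print(name.encode("latin-1").hex().upper())  # 42E96C61


def guess_encoding(hex_string: str) -> str:
    """Classify the output of SELECT HEX(col); spaces in the input are ignored."""
    raw = bytes.fromhex(hex_string)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid utf8 (possibly latin1)"
    return "ascii only" if raw.isascii() else "valid utf8"


print(guess_encoding("42 C3A9 6C 61"))
print(guess_encoding("42 E9 6C 61"))
```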
Taking a hint from Rick James, using:
SELECT * FROM `test` WHERE HEX(`column`) = HEX('Dvořák');
Should work. If you need a case insensitive query, then you'll need to lower/upper both sides in addition to the HEX check.
A more up to date collation is utf8mb4_unicode_520_ci.
Note, it does NOT work for utf8mb4_unicode_ci. See the comparison here: https://stackoverflow.com/a/59805600/857113

MySQL view - Illegal mix of collations

I'll be very clear: how can I create views in MySQL without getting the damned Illegal mix of collations error?
My SQL code is like this (it has some portuguese words), and my database default collation is latin1_swedish_ci:
CREATE VIEW v_veiculos AS
SELECT
v.id,
v.marca_id,
v.modelo,
v.placa,
v.cor,
CASE v.combustivel
WHEN 'A' THEN 'Álcool'
WHEN 'O' THEN 'Óleo Diesel'
WHEN 'G' THEN 'Gasolina'
ELSE 'Não Informado'
END AS combustivel,
marcas.marca,
/* I think the CONCAT and COALESCE below cause this error; when I remove the next line the view works fine */
CONCAT(marca, ' ', v.modelo, ' - Placa: ', v.placa, ' - Combustível: ', COALESCE(v.combustivel, 'Não informado')) AS info_completa
FROM veiculos v
LEFT JOIN
marcas on(marcas.id = v.marca_id);
I think the error is caused by my use of COALESCE and/or CONCAT, as the full error description says: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'coalesce'
You may also use CAST() to convert a string to a different character set. The syntax is:
CAST(character_string AS character_data_type CHARACTER SET charset_name)
eg:
SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8);
Alternative: use CONVERT(expr USING transcoding_name)
This is kind of old, but well,
I had this same error.
As far as I know, views do not have a collation; tables do.
So if you get the "illegal mix..." error, it is because your view is linking (comparing, whatever) two tables with different collations.
The thing is, if you create a table you can specify the collation, for instance
CREATE TABLE IF NOT EXISTS `vwHotelCode_Terminal` (
`HOTELCODE` varchar(8)
,`TERMINALCODE` varchar(5)
,`DISTKM` varchar(6)
,`DISTMIN` varchar(3)
,`TERMINALNAME` varchar(50)
)ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_ci ;
But if you don't, the default collation will be applied. For me the default collation is utf8_unicode_ci, so tables are created with that collation, and I ended up with some tables in utf8_spanish_ci and the ones where I did not specify it in utf8_unicode_ci.
If you are exporting from one server to another one and the default collation is different, you are probably going to get the "illegal mix" message.
If you have views, phpMyAdmin likes to create placeholder tables for all the views first and then the views themselves. Those tables are created without an explicit collation, so they take the default one. As a result, the view often ends up mixing different collations when it is created.
That is actually a bug in MySQL.
Maybe you can update to the latest version of MySQL?
After searching around for a while and taking information from this answer, I found a hack that could be useful.
Simply check the default character sets of your server and database with the command below:
SHOW VARIABLES LIKE "char%";
You'll see something like this:
mysql> SHOW VARIABLES LIKE "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 | <--
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I just changed the server's default character set. (The arrow above points at character_set_system, but that variable is read-only; the command below changes character_set_server.) I then copied the CREATE code of the view and created a new view, and that's all.
What happens here is that the new view you create uses the new default character set that you defined for the server, which resolved the issue for me.
Just use below command to set the default character set
SET character_set_server = 'latin2';
This worked in my case.
NOTE: Alternatively you can change the character set of that view. That would also do the trick but I wasn't able to find the solution so I used this hack.
REFERENCE: Read more on Illegal Collation Mix on MariaDB.
A CITATION FROM Illegal Collation Mix on MariaDB:
If you encounter this issue, set the character set in the view to force it to the value you want.
Read more about Collation and Character Sets here.

How to search in mysql so that accented character is same as non-accented?

I'd like to have:
piščanec = piscanec in mysql. I mean, I'd like to search for piscanec to find piščanec also.
So the č and c would be same, š and s etc...
I know it can be done using regexp, but this is slow :-( Any other way with LIKE? I am also using full text searches a lot.
UPDATE:
select CONVERT('čšćžđ' USING ascii) as text
does not work. Produces: ?????
Declare the column with the collation utf8_general_ci. This collation considers š equal to s and č equal to c:
create temporary table t (t varchar(100) collate utf8_general_ci);
insert into t set t = 'piščanec';
insert into t set t = 'piscanec';
select * from t where t='piscanec';
+------------+
| t |
+------------+
| piščanec |
| piscanec |
+------------+
If you don't want to or can't use the utf8_general_ci collation for the column (maybe you have a unique index on the column and want to consider piščanec and piscanec distinct?), you can apply the collation in the query only:
create temporary table t (t varchar(100) collate utf8_bin);
insert into t set t = 'piščanec';
insert into t set t = 'piscanec';
select * from t where t='piscanec';
+------------+
| t |
+------------+
| piscanec |
+------------+
select * from t where t='piscanec' collate utf8_general_ci;
+------------+
| t |
+------------+
| piščanec |
| piscanec |
+------------+
The FULLTEXT index is supposed to use the column collation directly; you don't need to define a new collation. Apparently the fulltext index can only be in the column's storage collation, so if you want to use utf8_general_ci for searches and utf8_slovenian_ci for sorting, you have to use COLLATE in the ORDER BY:
select * from tab order by col collate utf8_slovenian_ci;
It's not straightforward, but you're probably best off creating your own collation for your fulltext searches. Here is an example:
http://dev.mysql.com/doc/refman/5.5/en/full-text-adding-collation.html
with more info here:
http://dev.mysql.com/doc/refman/5.5/en/adding-collation.html
That way, you have your collation logic completely independent of your SQL and business logic, and you're not having to do any heavy-lifting yourself with SQL-workarounds.
EDIT: since collations are used for all string-matching operations, this may not be the best way to go: you will end up obfuscating differences between characters that are linguistically discrete.
If you want to suppress these differences for specific operations, then you might consider writing a function that takes a string and replaces, in a targeted way, the characters which, for the purposes of the current operation, are to be considered identical.
You could define one table holding your base characters (š, č etc.) and another holding the equivalences. Then run a REPLACE over your string.
Another way is just to CONVERT your string to ASCII. Be aware that this does not map č to c: every non-ASCII character is replaced with ?.
e.g.
SELECT CONVERT('<your text here>' USING ascii) as as_ascii
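The difference between converting to ASCII and replacing characters in a targeted way can be seen in a short Python sketch. It is an application-side illustration under assumptions, not a MySQL feature: fold_accents and the EXTRA table are hypothetical, and EXTRA only covers đ, which has no accent that could be stripped by decomposition.

```python
import unicodedata

text = "čšćžđ"

# Straight conversion to ASCII keeps nothing, like CONVERT(... USING ascii).
print(text.encode("ascii", errors="replace").decode("ascii"))  # ?????

# Letters such as đ do not decompose into base letter + accent,
# so they need an explicit entry.
EXTRA = str.maketrans({"đ": "d", "Đ": "D"})


def fold_accents(s: str) -> str:
    """Replace accented letters with their unaccented base letters."""
    decomposed = unicodedata.normalize("NFD", s.translate(EXTRA))
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(fold_accents("piščanec"))  # piscanec
print(fold_accents(text))        # csczd
```

A folded copy of the text could be stored in its own column and indexed (including with FULLTEXT), leaving the original column and its collation untouched.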