Can I convert user input language to default collation of database? - mysql

I want to search for user input in my database. The database collation is latin1_swedish_ci. I don't want to change that; instead, can I convert the user input from utf-8 to latin1_swedish_ci?
Edit:
I tried two methods.
Method 1: I imported and used default collation latin1_swedish_ci and character set latin1. Then I have
Here I can query like SELECT * FROM dict WHERE english_word = '$_value' and I get all the column values, including malayalam_definition, in the browser as desired. But the problem is that I can't query like SELECT * FROM dict WHERE malayalam_definition = '$_value'. It returns no result.
Method 2: I changed collation to utf8_unicode_ci and character set to utf8. Then in mysql I get desired values like
Here, when I query like SELECT * FROM dict WHERE english_word = '$_value' in the browser, I get question marks in the malayalam_definition values like
Result of SHOW VARIABLES LIKE 'character\_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | latin1 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
Do I need to change character_set_server? If so, how do I do it?

First of all, the "database collation" is only a default. The real question is what is the CHARACTER SET of the columns that you are interested in.
Then, what are the bytes in your client? Are they encoded as latin1? Or utf8? In either case, tell MySQL that that is what is coming at it. This is preferably done in the connection parameters. (What is your client language?) Alternatively, use SET NAMES latin1 or SET NAMES utf8, according to the client encoding.
Now, what MySQL will do on INSERT and SELECT... It will convert the encoding from the client's encoding to the column's encoding as you do an INSERT. No further action is needed to achieve this.
Similarly, MySQL will convert the other way during a SELECT.
(Of course, if the column and the client are talking the same encoding, no "convert" is needed.)
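As a rough illustration of that round trip, here is a minimal Python sketch that uses Python's codecs as a stand-in for MySQL's transcoding. It assumes a client that sends utf8 bytes and a column declared as latin1; cp1252 is used on the assumption that it matches MySQL's latin1, and the sample word is arbitrary.

```python
# Stand-in for INSERT: client bytes (utf8) are converted to column bytes (latin1).
client_bytes = "caf\u00e9".encode("utf-8")
column_bytes = client_bytes.decode("utf-8").encode("cp1252")

# Stand-in for SELECT: column bytes (latin1) are converted back to client bytes (utf8).
returned_bytes = column_bytes.decode("cp1252").encode("utf-8")

# The accented letter takes 2 bytes in utf8 but 1 byte in latin1.
print(len(client_bytes), len(column_bytes))
# The client gets back exactly what it sent.
print(returned_bytes == client_bytes)
```

The point is only that the stored bytes differ from the client's bytes while the text survives, provided every character exists in the column's character set.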
Your question mentions "collation". So far, I have only talked about CHARACTER SETs, also known as "encodings". In contrast, COLLATION governs how two strings are sorted and compared.
For the CHARACTER SET latin1, the default COLLATION is latin1_swedish_ci.
For the CHARACTER SET utf8, the default COLLATION is utf8_general_ci.
There are several different "collations" to handle the quirks of German or Turkish or Spanish or (etc) orderings.
Please explain why you are trying to do what you stated. There are many ways you can do it wrong, so I do not want to give you an ALTER statement -- it may just make things worse for the real goal.
It is better to use utf8mb4 instead of utf8. The outside world refers to UTF-8; this is equivalent to MySQL's utf8mb4.
Edit (after OP's Edit)
The first screenshot shows "Mojibake". Another screenshot shows question marks. The causes of each are covered in Trouble with UTF-8 characters; what I see is not what I stored
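For readers who have not met the two symptoms before, the following Python sketch reproduces each one outside MySQL. It is only an analogy: Python's codecs stand in for the server's conversions, cp1252 stands in for latin1, and the sample characters are arbitrary.

```python
original = "\u00e9"  # e with acute accent

# Mojibake: utf8 bytes are misread as latin1, so one character shows up as two.
mojibake = original.encode("utf-8").decode("cp1252")

# Question marks: a character with no latin1 equivalent is replaced during conversion.
malayalam_letter = "\u0d2e"  # a letter from the Malayalam block
question_mark = malayalam_letter.encode("cp1252", errors="replace").decode("cp1252")

print(mojibake)
print(question_mark)
```

Mojibake usually means the bytes are intact but mislabeled, whereas a question mark means the original character has already been lost.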

Related

Charset and collation priority in MySQL

In MySQL I can configure the charset and collation at the Server, Database and Table levels.
Is this the same level of priority? From the least to the most specific?
I didn't manage to find it in the DOCS.
e.g.:
SERVER level
SHOW VARIABLES LIKE "char%";
RESULTS IN:
character_set_client utf8
character_set_connection utf8
character_set_database latin1
character_set_filesystem binary
character_set_results utf8
character_set_server latin1
character_set_system utf8
DB level
SELECT * FROM information_schema.SCHEMATA;
RESULTS IN:
name character_set_name collation_name
my_database latin1 latin1_swedish_ci
TABLE level
SELECT T.table_name, CCSA.character_set_name
FROM information_schema.`TABLES` T,
information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` CCSA
WHERE CCSA.collation_name = T.table_collation
AND T.table_schema = "my_database";
RESULTS IN:
table_name character_set_name
my_table1 latin1
my_table2 utf8
The Database setting is the default when creating a table.
The Table setting is the default when creating a column.
The Column setting is the only thing that matters.
Well, that is not completely true. You need to specify what encoding (CHARACTER SET) and collation is used by the client. That's where character_set_client/connection/results comes into play. Those 3 are normally set the same as each other, but they do not need to match the column's CHARACTER SET.
If the column does not match those settings, MySQL will transcode the bytes on the fly as they go between client and server. Note that this lets you have different charsets for different columns in the same table.
The previous paragraph says nothing about the Table and Database settings -- because they are irrelevant. (Because they are only defaults.) Once a table has been CREATEd, each column's charset has been 'set in stone'.
You'll find the details of the different levels (server, database, table and column) in the manual, in section 10.3 Specifying Character Sets and Collations and the pages that follow it. For the database, table and column levels it states that the character set and collation from the previous level are used if you don't set them explicitly, which matches your notion of going from the least to the most specific.
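The fallback order described in these answers can be sketched in a few lines of Python. The function name effective_charset is hypothetical, and the sketch glosses over one point made above: MySQL applies the defaults once, when the database, table or column is created, rather than looking them up at query time.

```python
from typing import Optional


def effective_charset(server: str,
                      database: Optional[str] = None,
                      table: Optional[str] = None,
                      column: Optional[str] = None) -> str:
    """Return the most specific charset that was set explicitly."""
    # Check from the most specific level to the least specific one.
    for level in (column, table, database):
        if level is not None:
            return level
    return server


# Nothing set below the server: the server default applies.
print(effective_charset("latin1"))
# A table-level setting overrides the database and server defaults.
print(effective_charset("latin1", database="latin1", table="utf8"))
# A column-level setting overrides everything else.
print(effective_charset("latin1", table="utf8", column="latin1"))
```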

MySQL Database why do Õ come back as O when Search

I have multiple special characters (Õ) in my table column.
When I search for Õ, the search results also include O.
select * from table where column like '%Õ%'
I want to replace Õ with a single question mark in my table.
Example : its saving as below
ItÕs going to be just one of the factors that will be the cause of the resistance
so figure out which one it is and focus on that one.
If you are using utf8 currently:
mysql> SELECT REPLACE('O-o-Õ-Õ-x', 'Õ', '?') COLLATE utf8_bin;
+----------------------------------------------------+
| REPLACE('O-o-Õ-Õ-x', 'Õ', '?') COLLATE utf8_bin |
+----------------------------------------------------+
| O-o-?-?-x |
+----------------------------------------------------+
Notice how it replaced only the Õ characters.
If you are using utf8mb4, then change to COLLATE utf8mb4_bin.
Caution -- Your problem is very unusual. If you have left out some aspects of the problem, this solution may do more harm than good.
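To see why a binary collation separates the two letters, here is a small Python analogy. It assumes nothing about the asker's table; it only shows that a comparison on encoded values treats Õ and O as different, which is what a _bin collation does.

```python
text = "O-o-\u00d5-\u00d5-x"  # \u00d5 is the letter O with tilde

# Under a binary comparison the two letters differ because their encodings differ.
same_bytes = "\u00d5".encode("utf-8") == "O".encode("utf-8")
print(same_bytes)

# str.replace matches exact code points, so the plain O and o are left alone.
cleaned = text.replace("\u00d5", "?")
print(cleaned)
```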

Illegal mix of collations: latin1_swedish_ci and utf8_general_ci [duplicate]

I am getting the error below when trying to do a select through a stored procedure in MySQL.
Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '='
Any idea on what might be going wrong here?
The collation of the table is latin1_general_ci and that of the column in the where clause is latin1_general_cs.
This is generally caused by comparing two strings of incompatible collation or by attempting to select data of different collation into a combined column.
The clause COLLATE allows you to specify the collation used in the query.
For example, the following WHERE clause will always give the error you posted:
WHERE 'A' COLLATE latin1_general_ci = 'A' COLLATE latin1_general_cs
Your solution is to specify a shared collation for the two columns within the query. Here is an example that uses the COLLATE clause:
SELECT * FROM `table` ORDER BY `key` COLLATE latin1_general_ci;
Another option is to use the BINARY operator:
BINARY str is the shorthand for CAST(str AS BINARY).
Your solution might look something like this:
SELECT * FROM `table` WHERE BINARY a = BINARY b;
or,
SELECT * FROM `table` ORDER BY BINARY a;
Please keep in mind that, as pointed out by Jacob Stamm in the comments, "casting columns to compare them will cause any indexing on that column to be ignored".
For much greater detail about this collation business, I highly recommend eggyal's excellent answer to this same question.
TL;DR
Either change the collation of one (or both) of the strings so that they match, or else add a COLLATE clause to your expression.
What is this "collation" stuff anyway?
As documented under Character Sets and Collations in General:
A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.
Suppose that we have an alphabet with four letters: “A”, “B”, “a”, “b”. We give each letter a number: “A” = 0, “B” = 1, “a” = 2, “b” = 3. The letter “A” is a symbol, the number 0 is the encoding for “A”, and the combination of all four letters and their encodings is a character set.
Suppose that we want to compare two string values, “A” and “B”. The simplest way to do this is to look at the encodings: 0 for “A” and 1 for “B”. Because 0 is less than 1, we say “A” is less than “B”. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binary collation.
But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters “a” and “b” as equivalent to “A” and “B”; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.
In real life, most character sets have many characters: not just “A” and “B” but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “accent” is a mark attached to a character as in German “Ö”), and for multiple-character mappings (such as the rule that “Ö” = “OE” in one of the two German collations).
Further examples are given under Examples of the Effect of Collation.
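The imaginary four-letter character set from the quotation is small enough to model directly. The Python sketch below is an illustration only; the names ENCODING, binary_key and ci_key are made up for this example.

```python
# The imaginary character set: each symbol and its encoding.
ENCODING = {"A": 0, "B": 1, "a": 2, "b": 3}


def binary_key(s):
    """Binary collation: compare the encodings as they are."""
    return [ENCODING[ch] for ch in s]


def ci_key(s):
    """Case-insensitive collation: treat a and b as A and B, then compare encodings."""
    return [ENCODING[ch.upper()] for ch in s]


words = ["b", "A", "a", "B"]
binary_sorted = sorted(words, key=binary_key)
ci_sorted = sorted(words, key=ci_key)

print(binary_sorted)  # uppercase letters sort before lowercase ones
print(ci_sorted)      # letters that differ only in case compare equal
print(ci_key("a") == ci_key("A"), binary_key("a") == binary_key("A"))
```

The same strings therefore sort and compare differently depending on which set of rules is applied, even though the character set is unchanged.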
Okay, but how does MySQL decide which collation to use for a given expression?
As documented under Collation of Expressions:
In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column x:
SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;
However, with multiple operands, there can be ambiguity. For example:
SELECT x FROM T WHERE x = 'Y';
Should the comparison use the collation of the column x, or of the string literal 'Y'? Both x and 'Y' have collations, so which collation takes precedence?
Standard SQL resolves such questions using what used to be called “coercibility” rules.
[ deletia ]
MySQL uses coercibility values with the following rules to resolve ambiguities:
Use the collation with the lowest coercibility value.
If both sides have the same coercibility, then:
If both sides are Unicode, or both sides are not Unicode, it is an error.
If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement does not return an error:
SELECT CONCAT(utf8_column, latin1_column) FROM t1;
It returns a result that has a character set of utf8 and the same collation as utf8_column. Values of latin1_column are automatically converted to utf8 before concatenating.
For an operation with operands from the same character set but that mix a _bin collation and a _ci or _cs collation, the _bin collation is used. This is similar to how operations that mix nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations rather than data types.
So what is an "illegal mix of collations"?
An "illegal mix of collations" occurs when an expression compares two strings of different collations but of equal coercibility and the coercibility rules cannot help to resolve the conflict. It is the situation described under the third bullet-point in the above quotation.
The particular error given in the question, Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '=', tells us that there was an equality comparison between two non-Unicode strings of equal coercibility. It furthermore tells us that the collations were not given explicitly in the statement but rather were implied from the strings' sources (such as column metadata).
That's all very well, but how does one resolve such errors?
As the manual extracts quoted above suggest, this problem can be resolved in a number of ways, of which two are sensible and to be recommended:
Change the collation of one (or both) of the strings so that they match and there is no longer any ambiguity.
How this can be done depends upon from where the string has come: Literal expressions take the collation specified in the collation_connection system variable; values from tables take the collation specified in their column metadata.
Force one string to not be coercible.
I omitted the following quote from the above:
MySQL assigns coercibility values as follows:
An explicit COLLATE clause has a coercibility of 0. (Not coercible at all.)
The concatenation of two strings with different collations has a coercibility of 1.
The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
A “system constant” (the string returned by functions such as USER() or VERSION()) has a coercibility of 3.
The collation of a literal has a coercibility of 4.
NULL or an expression that is derived from NULL has a coercibility of 5.
Thus simply adding a COLLATE clause to one of the strings used in the comparison will force use of that collation.
Whilst the others would be terribly bad practice if they were deployed merely to resolve this error:
Force one (or both) of the strings to have some other coercibility value so that one takes precedence.
Use of CONCAT() or CONCAT_WS() would result in a string with a coercibility of 1; and (if in a stored routine) use of parameters/local variables would result in strings with a coercibility of 2.
Change the encodings of one (or both) of the strings so that one is Unicode and the other is not.
This could be done via transcoding with CONVERT(expr USING transcoding_name); or via changing the underlying character set of the data (e.g. modifying the column, changing character_set_connection for literal values, or sending them from the client in a different encoding and changing character_set_client / adding a character set introducer). Note that changing encoding will lead to other problems if some desired characters cannot be encoded in the new character set.
Change the encodings of one (or both) of the strings so that they are both the same and change one string to use the relevant _bin collation.
Methods for changing encodings and collations have been detailed above. This approach would be of little use if one actually needs to apply more advanced collation rules than are offered by the _bin collation.
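The resolution rules quoted in this answer can be sketched as a small Python function. This is a simplified model, not MySQL's implementation: the Operand class, the resolve function and the list of Unicode character sets are assumptions made for illustration, and the coercibility numbers are the ones from the manual extract above.

```python
from dataclasses import dataclass

# Character sets treated as Unicode in this sketch (an assumption, not an exhaustive list).
UNICODE_CHARSETS = {"utf8", "utf8mb4", "ucs2", "utf16", "utf32"}


class IllegalMixOfCollations(Exception):
    """Raised when the rules cannot pick a collation."""


@dataclass(frozen=True)
class Operand:
    charset: str
    collation: str
    coercibility: int  # 0 = explicit COLLATE, 2 = column, 4 = literal


def resolve(a: Operand, b: Operand) -> str:
    """Return the collation used to compare a and b under the quoted rules."""
    if a.collation == b.collation:
        return a.collation
    # The collation with the lowest coercibility value wins.
    if a.coercibility != b.coercibility:
        return min(a, b, key=lambda o: o.coercibility).collation
    # Same coercibility: a Unicode side wins over a non-Unicode side.
    a_unicode = a.charset in UNICODE_CHARSETS
    b_unicode = b.charset in UNICODE_CHARSETS
    if a_unicode != b_unicode:
        return a.collation if a_unicode else b.collation
    # Same character set: a _bin collation wins over _ci or _cs.
    if a.charset == b.charset:
        a_bin = a.collation.endswith("_bin")
        b_bin = b.collation.endswith("_bin")
        if a_bin != b_bin:
            return a.collation if a_bin else b.collation
    raise IllegalMixOfCollations(f"({a.collation}) and ({b.collation})")


cs_column = Operand("latin1", "latin1_general_cs", 2)
ci_column = Operand("latin1", "latin1_general_ci", 2)
explicit = Operand("latin1", "latin1_general_ci", 0)  # operand with a COLLATE clause

# An explicit COLLATE clause takes precedence over a column's implicit collation.
print(resolve(cs_column, explicit))
```

Calling resolve(cs_column, ci_column) raises IllegalMixOfCollations, which mirrors the error in the question: two non-Unicode operands with equal coercibility and different collations.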
Adding my 2c to the discussion for future googlers.
I was investigating a similar issue where I got the following error when using custom functions that received a varchar parameter:
Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and
(utf8_general_ci,IMPLICIT) for operation '='
Using the following query:
mysql> show variables like "collation_database";
+--------------------+-----------------+
| Variable_name | Value |
+--------------------+-----------------+
| collation_database | utf8_general_ci |
+--------------------+-----------------+
I was able to tell that the DB was using utf8_general_ci, while the tables were defined using utf8_unicode_ci:
mysql> show table status;
+--------------+-----------------+
| Name | Collation |
+--------------+-----------------+
| my_view | NULL |
| my_table | utf8_unicode_ci |
...
Notice that the views have NULL collation. It appears that views and functions have collation definitions even though this query shows null for one view. The collation used is the DB collation that was defined when the view/function were created.
The sad solution was to both change the db collation and recreate the views/functions to force them to use the current collation.
Changing the db's collation:
ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;
Changing the table collation:
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
I hope this will help someone.
Sometimes it can be dangerous to convert charsets, especially on databases with huge amounts of data. I think the best option is to use the "binary" operator:
e.g : WHERE binary table1.column1 = binary table2.column1
I had a similar problem when trying to use the FIND_IN_SET function with a string variable.
SET @my_var = 'string1,string2';
SELECT * from my_table WHERE FIND_IN_SET(column_name,@my_var);
and was receiving the error
Error Code: 1267. Illegal mix of collations (utf8_unicode_ci,IMPLICIT)
and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'
Short answer:
No need to change any collation_YYYY variables, just add the correct collation next to your variable declaration, i.e.
SET @my_var = 'string1,string2' COLLATE utf8_unicode_ci;
SELECT * from my_table WHERE FIND_IN_SET(column_name,@my_var);
Long answer:
I first checked the collation variables:
mysql> SHOW VARIABLES LIKE 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
Then I checked the table collation:
mysql> SHOW CREATE TABLE my_table;
CREATE TABLE `my_table` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`column_name` varchar(40) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=125 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
This means that my variable was configured with the default collation of utf8_general_ci while my table was configured as utf8_unicode_ci.
By adding the COLLATE clause to the variable assignment, the variable's collation matched the collation configured for the table.
Below solution worked for me.
CONVERT( Table1.FromColumn USING utf8) = CONVERT(Table2.ToColumn USING utf8)
Solution if literals are involved.
I am using Pentaho Data Integration and don't get to specify the SQL syntax.
Using a very simple DB lookup gave the error
"Illegal mix of collations (cp850_general_ci,COERCIBLE) and (latin1_swedish_ci,COERCIBLE) for operation '='"
The generated code was
"SELECT DATA_DATE AS latest_DATA_DATE FROM hr_cc_normalised_data_date_v WHERE PSEUDO_KEY = ?"
Cutting the story short the lookup was to a view and when I issued
mysql> show full columns from hr_cc_normalised_data_date_v;
+------------+------------+-------------------+------+-----+
| Field | Type | Collation | Null | Key |
+------------+------------+-------------------+------+-----+
| PSEUDO_KEY | varchar(1) | cp850_general_ci | NO | |
| DATA_DATE | varchar(8) | latin1_general_cs | YES | |
+------------+------------+-------------------+------+-----+
which explains where the 'cp850_general_ci' comes from.
The view was simply created with 'SELECT 'X',......'
According to the manual, literals like this should inherit their character set and collation from the server settings, which were correctly defined as 'latin1' and 'latin1_general_cs'.
As this clearly did not happen, I forced it in the creation of the view:
CREATE OR REPLACE VIEW hr_cc_normalised_data_date_v AS
SELECT convert('X' using latin1) COLLATE latin1_general_cs AS PSEUDO_KEY
, DATA_DATE
FROM HR_COSTCENTRE_NORMALISED_mV
LIMIT 1;
now it shows latin1_general_cs for both columns and the error has gone away. :)
If the columns that you are having trouble with are "hashes", then consider the following...
If the "hash" is a binary string, you should really use BINARY(...) datatype.
If the "hash" is a hex string, you do not need utf8, and should avoid such because of character checks, etc. For example, MySQL's MD5(...) yields a fixed-length 32-byte hex string. SHA1(...) gives a 40-byte hex string. This could be stored into CHAR(32) CHARACTER SET ascii (or 40 for sha1).
Or, better yet, store UNHEX(MD5(...)) into BINARY(16). This cuts in half the size of the column. (It does, however, make it rather unprintable.) SELECT HEX(hash) ... if you want it readable.
Comparing two BINARY columns has no collation issues.
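The size difference between the hex form and the binary form is easy to check outside the database. This Python sketch uses hashlib as a stand-in for MySQL's MD5() and SHA1(), with bytes.fromhex and bytes.hex playing the roles of UNHEX and HEX; the input value is arbitrary.

```python
import hashlib

# Hex digests: 32 characters for MD5 and 40 for SHA1.
md5_hex = hashlib.md5(b"example").hexdigest()
sha1_hex = hashlib.sha1(b"example").hexdigest()

# Converting the hex string to raw bytes halves the size, like UNHEX(MD5(...)).
md5_raw = bytes.fromhex(md5_hex)

# Converting back gives a printable form again, like HEX(hash).
md5_readable = md5_raw.hex()

print(len(md5_hex), len(sha1_hex), len(md5_raw))
print(md5_readable == md5_hex)
```

One difference to keep in mind: Python's hex() returns lowercase digits, whereas MySQL's HEX() returns uppercase ones.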
Very interesting... Now, be ready. I looked at all of the "add collate" solutions and to me, those are band aid fixes. The reality is the database design was "bad". Yes, standard changes and new things gets added, blah blah, but it does not change the bad database design fact. I refuse to go with the route of adding "collate" all over the SQL statements just to get my query to work. The only solution that works for me and will virtually eliminate the need to tweak my code in the future is to re-design the database/tables to match the character set that I will live with and embrace for the long term future. In this case, I choose to go with the character set "utf8mb4".
So the solution here when you encounter that "illegal" error message is to re-design your database and tables. It is much easier and quicker than it sounds. Exporting your data and re-importing it from a CSV may not even be required. Change the character set of the database and make sure the character sets of all your tables match.
Use these commands to guide you:
SHOW VARIABLES LIKE "collation_database";
SHOW TABLE STATUS;
Now, if you enjoy adding "collate" here and there and beefing up your code with forceful "overrides", be my guest.
MySQL really dislikes mixing collations unless it can coerce them to the same one (which clearly is not feasible in your case). Can't you just force the same collation to be used via a COLLATE clause? (or the simpler BINARY shortcut if applicable...).
A possible solution is to convert the entire database to UTF8 (see also this question).
I used ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;, but it didn't work.
In this query:
Select * from table1, table2 where table1.field = date_format(table2.field,'%H');
This worked for me:
Select * from table1, table2 where concat(table1.field) = date_format(table2.field,'%H');
Yes, only a concat.
Another source of collation issues is the mysql.proc table. Check the collations of your stored procedures and functions:
SELECT
p.db, p.db_collation, p.type, COUNT(*) cnt
FROM mysql.proc p
GROUP BY p.db, p.db_collation, p.type;
Also pay attention to mysql.proc.collation_connection and mysql.proc.character_set_client columns.
If you have phpMyAdmin installed, you can follow the instructions given in the following link: https://mediatemple.net/community/products/dv/204403914/default-mysql-character-set-and-collation You have to match the collate of the database with that of all the tables, as well as the fields of the tables and then recompile all the stored procedures and functions. With that everything should work again.
I personally had this problem in a procedure.
If you don't want to alter the table, you can try to convert your parameter inside the procedure.
I tried several uses of COLLATE (with a SET into the SELECT) but none worked for me.
CONVERT(my_param USING utf32) did the trick.
In my case the default return type of a function was the type/collation from database (utf8mb4_general_ci) but database column was ascii.
WHERE ascii_col = md5(concat_ws(',', a,b,c))
Quick fix was
WHERE ascii_col = BINARY md5(concat_ws(',', a,b,c))
This code needs to be put inside Run SQL query/queries on database
SQL QUERY WINDOW
ALTER TABLE `table_name` CHANGE `column_name` `column_name` VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL;
Please replace table_name and column_name with appropriate name.

MySQL view - Illegal mix of collations

I'll be very clear: what's the solution for creating views in MySQL without getting the damned Illegal mix of collations error?
My SQL code is like this (it has some portuguese words), and my database default collation is latin1_swedish_ci:
CREATE VIEW v_veiculos AS
SELECT
v.id,
v.marca_id,
v.modelo,
v.placa,
v.cor,
CASE v.combustivel
WHEN 'A' THEN 'Álcool'
WHEN 'O' THEN 'Óleo Diesel'
WHEN 'G' THEN 'Gasolina'
ELSE 'Não Informado'
END AS combustivel,
marcas.marca,
/* I think that the CONCAT and COALESCE below cause this error; without the next line the view works fine */
CONCAT(marca, ' ', v.modelo, ' - Placa: ', v.placa, ' - Combustível: ', COALESCE(v.combustivel, 'Não informado')) AS info_completa
FROM veiculos v
LEFT JOIN
marcas on(marcas.id = v.marca_id);
I think that the error cause is because I'm using coalesce and/or concat as the full error's description tells me: Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'coalesce'
You may also use CAST() to convert a string to a different character set. The syntax is:
CAST(character_string AS character_data_type CHARACTER SET charset_name)
eg:
SELECT CAST(_latin1'test' AS CHAR CHARACTER SET utf8);
alternative : use CONVERT(expr USING transcoding_name)
This is kind of old, but well
I had this same error,
As far as I know, views do not have a collation; the tables do.
So, if you get the "illegal mix..." error, it is because your view is linking (comparing, whatever) 2 tables with different collations.
The thing is, if you create a table you can specify the collation, for instance
CREATE TABLE IF NOT EXISTS `vwHotelCode_Terminal` (
`HOTELCODE` varchar(8)
,`TERMINALCODE` varchar(5)
,`DISTKM` varchar(6)
,`DISTMIN` varchar(3)
,`TERMINALNAME` varchar(50)
)ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_spanish_ci ;
But if you don't, the default collation will be applied. For me the default collation is utf8_unicode_ci, so my tables are created with this collation, and I ended up having some tables with utf8_spanish_ci and the ones I did not specify with utf8_unicode_ci.
If you are exporting from one server to another one and the default collation is different, you are probably going to get the "illegal mix" message.
If you have views, phpMyAdmin likes to create tables for all the views first and then the views. The tables are created without a collation, so they take the default one. Then, many times, when the view is created it uses a different collation.
That is actually a bug in MySQL.
Maybe you can update to the latest version of MySQL?
After searching around for a while and taking information from this answer, I found a hack that could be useful.
Simply check the character set variables of your server with the command below:
SHOW VARIABLES LIKE "char%";
You'll see something like this:
mysql> SHOW VARIABLES LIKE "char%";
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 | <--
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I just set character_set_server, which is the server's default character set (character_set_system is read-only and cannot be changed). Then I copied the CREATE code of the view and created a new view, and that's all.
What happens here is that the new view will use the new default character set that you defined for the server, which resolves the issue.
Just use the command below to set the default character set:
SET character_set_server = 'latin2';
This worked in my case.
NOTE: Alternatively you can change the character set of that view. That would also do the trick but I wasn't able to find the solution so I used this hack.
REFERENCE: Read more on Illegal Collation Mix on MariaDB.
A CITATION FROM Illegal Collation Mix on MariaDB:
If you encounter this issue, set the character set in the view to force it to the value you want.
Read more about Collation and Character Sets here.

How to search in mysql so that accented character is same as non-accented?

I'd like to have:
piščanec = piscanec in mysql. I mean, I'd like to search for piscanec to find piščanec also.
So the č and c would be same, š and s etc...
I know it can be done using regexp, but this is slow :-( Any other way with LIKE? I am also using full text searches a lot.
UPDATE:
select CONVERT('čšćžđ' USING ascii) as text
does not work. Produces: ?????
Declare the column with the collation utf8_general_ci. This collation considers š equal to s and č equal to c:
create temporary table t (t varchar(100) collate utf8_general_ci);
insert into t set t = 'piščanec';
insert into t set t = 'piscanec';
select * from t where t='piscanec';
+------------+
| t |
+------------+
| piščanec |
| piscanec |
+------------+
If you don't want to or can't use the utf8_general_ci collation for the column (maybe you have a unique index on the column and want to consider piščanec and piscanec distinct?), you can apply the collation in the query only:
create temporary table t (t varchar(100) collate utf8_bin);
insert into t set t = 'piščanec';
insert into t set t = 'piscanec';
select * from t where t='piscanec';
+------------+
| t |
+------------+
| piscanec |
+------------+
select * from t where t='piscanec' collate utf8_general_ci;
+------------+
| t |
+------------+
| piščanec |
| piscanec |
+------------+
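The effect of an accent-insensitive, case-insensitive comparison can be approximated outside MySQL. The Python sketch below is only an approximation and does not reproduce the weight tables of utf8_general_ci: it decomposes each character with Unicode normalization and drops the combining marks, so letters that have no Unicode decomposition are left unchanged. The helper name fold and the sample rows are made up for this example.

```python
import unicodedata


def fold(s: str) -> str:
    """Strip combining accents and case so that accented letters match plain ones."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# The first row is the accented spelling from the question, written with escapes.
rows = ["pi\u0161\u010danec", "piscanec", "riba"]

# Keep every row that equals the search term once accents and case are ignored.
matches = [row for row in rows if fold(row) == fold("piscanec")]
print(matches)
```

Both spellings are returned, as with the utf8_general_ci query above, while a plain equality test would return only the unaccented row.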
The FULLTEXT index is supposed to use the column collation directly; you don't need to define a new collation. Apparently the fulltext index can only be in the column's storage collation, so if you want to use utf8_general_ci for searches and utf8_slovenian_ci for sorting, you have to use COLLATE in the ORDER BY:
select * from tab order by col collate utf8_slovenian_ci;
It's not straightforward, but you're probably best off creating your own collation for your fulltext searches. Here is an example:
http://dev.mysql.com/doc/refman/5.5/en/full-text-adding-collation.html
with more info here:
http://dev.mysql.com/doc/refman/5.5/en/adding-collation.html
That way, you have your collation logic completely independent of your SQL and business logic, and you're not having to do any heavy-lifting yourself with SQL-workarounds.
EDIT: since collations are used for all string-matching operations, this may not be the best way to go: you will end up obfuscating differences between characters that are linguistically discrete.
If you want to suppress these differences for specific operations, then you might consider writing a function that takes a string and replaces, in a targeted way, characters which, for the purposes of the current operation, are to be considered identical.
You could define one table holding your base characters (š, č etc.) and another holding the equivalences. Then run a REPLACE over your string.
Another way is just to CAST your string to ASCII, thereby suppressing all non-ASCII characters.
e.g.
SELECT CONVERT('<your text here>' USING ascii) as as_ascii