MySQL column name is a weird character - how do I change it?

I'm examining a table in MySQL that has a weird column name. I want to change the name of the column to not be weird. I can't figure out how to do so.
First, if I do
SET NAMES utf8;
DESC `tblName`;
I get
| Ԫ | varchar(255) | YES | MUL | NULL | |
Instead, doing
SET NAMES latin1;
DESC `tblName`;
Results in
| ? | varchar(255) | YES | MUL | NULL | |
Fair enough - this makes me think the column name is simply a latin1 question mark. But this statement doesn't work:
mysql> ALTER TABLE `tblName` CHANGE COLUMN `?` `newName` VARCHAR(255);
ERROR 1054 (42S22): Unknown column '?' in 'tblName'
So I went to the information_schema table for some info:
mysql> SELECT column_name, HEX(column_name), ordinal_position FROM information_schema.columns WHERE table_schema = 'myschema' AND table_name = 'tblName' ;
| ? | D4AA | 48 |
I looked up this hex value, and assuming I looked it up correctly (which may not be true), I determined this character is "풪", the "hangul syllable pweoj". So I tried that in an ALTER TABLE statement, to no avail:
ALTER TABLE `tblName` change column `풪` `newName` VARCHAR(255);
So that's where I'm stuck.

I figured out a way to do this (but I wonder if there's a better solution?)
I did a SHOW CREATE statement:
mysql> SHOW CREATE TABLE `tblName`;
...
`Ԫ` varchar(255) DEFAULT NULL,
I looked for the column in question, which was printed strangely (what you see above doesn't quite match it). The closing backtick wasn't visible. But I highlighted what was visible and pasted that into my ALTER TABLE and that finally fixed the issue.

I believe that Ԫ (question mark in a box) is shown because your system does not have a glyph at that code point. From your `HEX(column_name)` we can see that the value is 0xD4AA, which is the UTF-8 encoding. This translates to Unicode code point U+052A, for which I don't have the font either on my Windows box.
Setting the character set to latin1 simply meant that MySQL was unable to translate that character to a latin1/cp1252 value, so it replaced it with "?". (0xD4AA could easily be translated to two cp1252 characters, "Ôª". For some reason MySQL chose not to. Perhaps it knew the original encoding?)
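The byte arithmetic above can be checked outside MySQL. A small Python sketch decoding 0xD4AA under both interpretations shows where the mistaken Hangul guess came from:

```python
# Decode the bytes reported by HEX(column_name) under two assumptions.
raw = bytes.fromhex("D4AA")

# Interpreted as UTF-8, the two bytes form a single character: U+052A (Ԫ).
as_utf8 = raw.decode("utf-8")
print(as_utf8, hex(ord(as_utf8)))    # Ԫ 0x52a

# Interpreted as a UTF-16 code unit, 0xD4AA is the Hangul syllable 풪,
# which is how the asker arrived at the wrong character.
as_utf16 = raw.decode("utf-16-be")
print(as_utf16, hex(ord(as_utf16)))  # 풪 0xd4aa
```

The same two bytes name completely different characters depending on the encoding assumed, which is why looking up "D4AA" directly in a Unicode chart led astray.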
Now, how to rename the column? It should be as simple as you say, with ALTER TABLE ... CHANGE COLUMN etc. However, it seems that the MySQL console doesn't play nicely with non-ASCII characters, especially the variable-length characters found in UTF-8.
The solution was to pass the SQL as an argument to mysql from Bash instead. For example (ensure the terminal encoding is UTF-8 before pasting Ԫ):
mysql --default-character-set=utf8 -e "ALTER TABLE test change column Ԫ test varchar(255);" test

Related

MySQL utf8mb4 keeps showing ???? characters

I would like to save extra-rare Chinese characters in a database using MySQL,
but they keep showing as ???? after the text is inserted into the table.
I previously searched for how to insert 4-byte Chinese characters into MySQL and followed the process ("Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC?), but still no luck. Is there something I missed?
FYI, I have updated MySQL to version 5.7.18, and I am trying to insert the extra-rare Chinese characters from phpMyAdmin directly.
Thanks in advance!
As per the CREATE TABLE Syntax:
CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name
(create_definition,...)
[table_options]
[partition_options]
[...]
table_option:
ENGINE [=] engine_name
| AUTO_INCREMENT [=] value
| AVG_ROW_LENGTH [=] value
| [DEFAULT] CHARACTER SET [=] charset_name
In other words:
CREATE TABLE test (
some_column VARCHAR(100)
)
CHARACTER SET = utf8mb4;
It won't hurt either to pick a specific collation.
You must also change the connection to use utf8mb4. See http://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored for a discussion of "question marks".
Note: The data failed to be stored correctly; it cannot be recovered without re-inserting the data.

MySQL database: why does Õ come back as O when searching?

I have multiple special characters (Õ) in my table column.
When I search for Õ, it also shows O in the search results:
select * from table where column like '%Õ%'
I want to replace Õ with a single question mark in my table.
Example: it's saved as below:
ItÕs going to be just one of the factors that will be the cause of the resistance
so figure out which one it is and focus on that one.
If you are using utf8 currently:
mysql> SELECT REPLACE('O-o-Õ-Õ-x', 'Õ', '?') COLLATE utf8_bin;
+----------------------------------------------------+
| REPLACE('O-o-Õ-Õ-x', 'Õ', '?') COLLATE utf8_bin |
+----------------------------------------------------+
| O-o-?-?-x |
+----------------------------------------------------+
Notice how it replaced only the Õ characters.
If you are using utf8mb4, then change to COLLATE utf8mb4_bin.
Caution -- Your problem is very unusual. If you have left out some aspects of the problem, this solution may do more harm than good.
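Why does the search match both characters in the first place? Under utf8_general_ci, Õ and O compare equal because the collation folds away accents and case before comparing, whereas COLLATE utf8_bin compares the raw bytes. A rough Python sketch of that distinction (an approximation, not MySQL's actual collation algorithm):

```python
import unicodedata

def ci_ai_key(s):
    # Roughly mimic a case- and accent-insensitive collation:
    # decompose, drop combining marks, lowercase.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()

# Folded comparison (like utf8_general_ci): Õ and O are "equal".
print(ci_ai_key("Õ") == ci_ai_key("O"))  # True

# Byte/codepoint comparison (like COLLATE utf8_bin): they differ.
print("Õ" == "O")                        # False
```

This is why forcing the _bin collation in the REPLACE above touches only the literal Õ characters.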

Use an accent-sensitive primary key in MySQL

Desired result :
Have an accent sensitive primary key in MySQL.
I have a table of unique words, so I use the word itself as the primary key (by the way, if someone can give me advice about this, I have no idea whether it's good design/practice or not).
I need that field to be accent (and, why not, case) sensitive, because it must distinguish between, for instance, 'demandé' and 'demande', two different inflections of the French verb "demander". I do not have any problem storing accented words in the database. I just can't insert two accented strings that are identical when unaccented.
Error :
When trying to create the 'demandé' row with the following query:
INSERT INTO `corpus`.`token` (`name_token`) VALUES ('demandé');
I got this error :
ERROR 1062: 1062: Duplicate entry 'demandé' for key 'PRIMARY'
Questions :
Where in the process should I make a modification in order to have two different unique primary keys for "demande" and "demandé" in that table?
SOLUTION using 'collate utf8_general_ci' in table declaration
How can I make accent-sensitive queries? Is the following the right way:
SELECT * FROM corpus.token WHERE name_token = 'demandé' COLLATE utf8_bin
SOLUTION using 'collate utf8_bin' with WHERE statement
I found that I can achieve this by using the BINARY keyword (see this sqlFiddle). What is the difference between COLLATE and BINARY?
Can I shield other tables from any changes? (I'll have to rebuild that table anyway, because it's kind of messy.)
I'm not very comfortable with encoding in MySQL. I don't have any encoding problems yet in this database (and I'm kind of lucky, because my data might not always use the same encoding... and there is not much I can do about it). I have a feeling that any modification regarding this "accent sensitive" issue might create encoding issues with other queries or with data integrity. Am I right to be concerned?
Step by step :
Database creation :
CREATE DATABASE corpus DEFAULT CHARACTER SET utf8;
Table of unique words :
CREATE TABLE token (name_token VARCHAR(50), freq INTEGER, CONSTRAINT pk_token PRIMARY KEY (name_token))
Queries
SELECT * FROM corpus.token WHERE name_token = 'demande';
SELECT * FROM corpus.token WHERE name_token = 'demandé';
both returns the same row:
demande
Collations. You have two choices, not three:
utf8_bin treats all of these as different: demandé and demande and Demandé.
utf8_..._ci (typically utf8_general_ci or utf8_unicode_ci) treats all of these as the same: demandé and demande and Demandé.
If you want only case sensitivity (demandé = demande, but neither match Demandé), you are out of luck.
If you want only accent sensitivity (demandé = Demandé, but neither match demande), you are out of luck.
Declaration. The best way to do whatever you pick:
CREATE TABLE tbl_name (
    name VARCHAR(...) CHARACTER SET utf8 COLLATE utf8_... NOT NULL,
    ...
    PRIMARY KEY(name)
)
Don't change the collation on the fly. This won't use the index (that is, it will be slow) if the collation given differs from the collation of name:
WHERE name = ... COLLATE ...
BINARY. The datatypes BINARY, VARBINARY and BLOB are very much like CHAR, VARCHAR, and TEXT with COLLATE ..._bin. Perhaps the only difference is that text will be checked for valid utf8 when stored into a VARCHAR ... COLLATE ..._bin, but will not be checked when stored into VARBINARY.... Comparisons (WHERE, ORDER BY, etc.) will be the same; that is, they simply compare the bits, without case folding or accent stripping.
Maybe you need this:
_ci in a collation name = case-insensitive
If your searches on that field are always going to be case-sensitive, then declare the collation of the field as utf8_bin; that will compare the utf8-encoded bytes for equality.
col_name varchar(10) collate utf8_bin
If searches are normally case-insensitive, but you want to make an exception for this search, try:
WHERE col_name = 'demandé' collate utf8_bin
More here
Try this
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1
-> (c1 CHAR(1) CHARACTER SET UTF8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t1 VALUES ('a'),('A'),('À'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT c1, HEX(c1), HEX(WEIGHT_STRING(c1)) FROM t1;
+------+---------+------------------------+
| c1 | HEX(c1) | HEX(WEIGHT_STRING(c1)) |
+------+---------+------------------------+
| a | 61 | 0041 |
| A | 41 | 0041 |
| À | C380 | 0041 |
| á | C3A1 | 0041 |
+------+---------+------------------------+
4 rows in set (0.00 sec)
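The WEIGHT_STRING output above shows all four characters folding to the same sort weight, 0041 (that is, 'A'), which is why a case- and accent-insensitive collation treats them as duplicates. A rough Python approximation of that folding (a sketch, not MySQL's actual weight algorithm):

```python
import unicodedata

def general_ci_key(s):
    # Approximate utf8_general_ci's folding: decompose, drop combining
    # accent marks, then uppercase.
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()

# All four inputs collapse to the single key 'A', matching weight 0041 above.
print({general_ci_key(c) for c in ["a", "A", "À", "á"]})  # {'A'}
```

Under such a collation a PRIMARY KEY on the column necessarily rejects 'demandé' once 'demande' exists, since both reduce to the same key.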

How to search in mysql so that accented character is same as non-accented?

I'd like to have:
piščanec = piscanec in MySQL. I mean, I'd like a search for piscanec to also find piščanec.
So the č and c would be same, š and s etc...
I know it can be done using regexp, but that is slow :-( Is there any other way, with LIKE? I am also using full-text searches a lot.
UPDATE:
select CONVERT('čšćžđ' USING ascii) as text
does not work. Produces: ?????
Declare the column with the collation utf8_general_ci. This collation considers š equal to s and č equal to c:
create temporary table t (t varchar(100) collate utf8_general_ci);
insert into t set t = 'piščanec';
insert into t set t = 'piscanec';
select * from t where t='piscanec';
+------------+
| t |
+------------+
| piščanec |
| piscanec |
+------------+
If you don't want to or can't use the utf8_general_ci collation for the column (maybe you have a unique index on the column and want to consider piščanec and piscanec distinct?), you can use the collation in the query only:
create temporary table t (t varchar(100) collate utf8_bin);
insert into t set t = 'piščanec';
insert into t set t = 'piscanec';
select * from t where t='piscanec';
+------------+
| t |
+------------+
| piscanec |
+------------+
select * from t where t='piscanec' collate utf8_general_ci;
+------------+
| t |
+------------+
| piščanec |
| piscanec |
+------------+
The FULLTEXT index is supposed to use the column collation directly; you don't need to define a new collation. Apparently the fulltext index can only use the column's storage collation, so if you want to use utf8_general_ci for searches and utf8_slovenian_ci for sorting, you have to use COLLATE in the ORDER BY:
select * from tab order by col collate utf8_slovenian_ci;
It's not straightforward, but you'll probably be best off creating your own collation for your fulltext searches. Here is an example:
http://dev.mysql.com/doc/refman/5.5/en/full-text-adding-collation.html
with more info here:
http://dev.mysql.com/doc/refman/5.5/en/adding-collation.html
That way, your collation logic is completely independent of your SQL and business logic, and you're not doing any heavy lifting yourself with SQL workarounds.
EDIT: since collations are used for all string-matching operations, this may not be the best way to go: you will end up obscuring differences between characters that are linguistically distinct.
If you want to suppress these differences for specific operations only, then you might consider writing a function that takes a string and replaces, in a targeted way, characters which, for the purposes of the current operation, are to be considered identical.
You could define one table holding your base characters (š, č etc.) and another holding the equivalences. Then run a REPLACE over your string.
Another way is just to CAST your string to ASCII, thereby suppressing all non-ASCII characters.
e.g.
SELECT CONVERT('<your text here>' USING ascii) as as_ascii
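As the question's UPDATE notes, converting straight to ascii just produces ????? because the conversion is lossy. If you need this kind of folding outside the database, the replace-base-characters idea can be sketched in Python with Unicode decomposition (note that đ, a d with stroke, has no decomposition, so it survives unchanged):

```python
import unicodedata

def strip_accents(s):
    # NFD-decompose, then drop combining marks: š -> s, č -> c, ž -> z.
    # Unlike a lossy charset conversion, base letters are kept, not
    # replaced with '?'.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("piščanec"))  # piscanec
print(strip_accents("čšćžđ"))     # csczđ  (đ is not a base+accent pair)
```

Characters like đ would still need an explicit equivalence table, which is exactly what the answer above suggests.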

MySQL: char_length(), wrong value for Russian

I am using CHAR_LENGTH() to measure the size of "Русский": strangely, instead of telling me that it's 7 characters, it tells me there are 14. Interestingly, if the query is simply...
SELECT CHAR_LENGTH('Русский')
...the answer is correct. However, if I query the DB instead, the answer is 14:
SELECT CHAR_LENGTH(text) FROM locales WHERE lang = 'ru-RU' AND name = 'lang_name'
Anybody got any ideas what I might be doing wrong? I can confirm that the collation is utf8_general_ci and the table is MyISAM.
Thanks,
Adrien
EDIT: My end objective is to be able to measure the lengths of records in a table containing single- and double-byte characters (e.g. English and Russian, but not limited to these two languages).
Because two bytes are used for each Cyrillic character in UTF-8.
See http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_char-length
mysql> set names utf8;
mysql> SELECT CHAR_LENGTH('Русский'); result - 7
mysql> SELECT CHAR_LENGTH('test'); result - 4
create table test123 (
text VARCHAR(255) NOT NULL DEFAULT '',
text_text TEXT) Engine=Innodb default charset=UTF8;
insert into test123 VALUES('русский','test русский');
SELECT CHAR_LENGTH(text),CHAR_LENGTH(text_text) from test123; result - 7 and 12
I also tested with `set names koi8r;`, then CREATE TABLE and so on, and got an invalid result.
So the solution is to recreate the table and re-insert all the data after running `set names utf8`.
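The mismatch is easy to reproduce outside MySQL by counting characters versus bytes in Python:

```python
s = "Русский"
print(len(s))                  # 7 characters
print(len(s.encode("utf-8")))  # 14 bytes (2 per Cyrillic letter)

# If the UTF-8 bytes end up in a column MySQL believes is single-byte
# (e.g. latin1), MySQL counts 14 one-byte "characters" -- exactly the 14
# that CHAR_LENGTH reported for the stored data.
mojibake = s.encode("utf-8").decode("latin1")
print(len(mojibake))           # 14
```

A literal in the query is interpreted with the connection charset (hence the correct 7), while the stored column carries the mis-labeled bytes (hence 14), which matches the koi8r/recreate-the-table diagnosis above.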
The function returns its answer guided by the nearest applicable charset:
in the case of a column, the column definition
in the case of a literal, the connection default
Review the column's charset with:
SELECT CHARACTER_SET_NAME FROM information_schema.`COLUMNS`
where table_name = 'locales'
and column_name = 'text'
Be careful: it is not filtered by table_schema.