MySQL SELECT with LIKE clause is not working with Chinese characters - mysql

I have data stored in a single column, in both English and Chinese.
The data is delimited by separators, e.g.
for Chinese
<!--:zh-->日本<!--:-->
for English
<!--:en-->English Characters<!--:-->
I want to show the content according to the user's selected language.
I wrote a query like this
SELECT * FROM table WHERE content LIKE '<!--:zh-->%<!--:-->'
The query above runs without error but returns an empty result set.
The collation of the content column is utf8_general_ci.
I have also tried using the CONVERT function, like below
SELECT * FROM table WHERE CONVERT(content USING utf8)
LIKE CONVERT('<!--:zh-->%<!--:-->' USING utf8)
But this does not work either.
I also tried running SET NAMES UTF8 first, but it still does not work.
I am running the queries in phpMyAdmin, if that matters.
qTranslate does not change the database used by WordPress. Translation data is stored in the original fields, so each field contains all translations for that field, and the data looks like this
<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->
http://wpml.org/documentation/related-projects/qtranslate-importer/

Test table data for content
<!--:zh-->日本<!--:--><!--:en-->English Characters<!--:-->
<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->
<!--:zh-->日本<!--:-->
<!--:en-->English Characters<!--:-->
Since, as you say, the single column holds both English and Chinese, your select should look like this
SELECT * FROM tab
WHERE content LIKE '%<!--:zh-->%<!--:-->%'
SQL Fiddle DEMO (which also shows how to extract the part for one language from content)
SET @PRE = '<!--:zh-->', @SUF = '<!--:-->';
SELECT
content,
SUBSTR(
content,
LOCATE( @PRE, content ) + LENGTH( @PRE ),
LOCATE( @SUF, content, LOCATE( @PRE, content ) ) - LOCATE( @PRE, content ) - LENGTH( @PRE )
) langcontent
FROM tab
WHERE content LIKE CONCAT( '%', @PRE, '%', @SUF, '%' );
This follows the pattern-matching example in the MySQL documentation:
SELECT 'David!' LIKE '%D%v%';
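The difference between the anchored and the unanchored pattern can be checked outside MySQL as well. The following is only a sketch: it assumes Python with its bundled sqlite3 module as a stand-in for MySQL (both treat % in LIKE as "any sequence of characters"), and the helper names matching_ids and extract_lang are made up for illustration. It loads the four test rows above and shows which ones each pattern matches.

```python
import sqlite3

# The four test rows from the answer above; ids are assigned 1..4 in this order.
ROWS = [
    "<!--:zh-->日本<!--:--><!--:en-->English Characters<!--:-->",
    "<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->",
    "<!--:zh-->日本<!--:-->",
    "<!--:en-->English Characters<!--:-->",
]


def matching_ids(pattern):
    """Return the ids of the rows whose content matches the LIKE pattern."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (id INTEGER PRIMARY KEY, content TEXT)")
    conn.executemany("INSERT INTO tab (content) VALUES (?)", [(r,) for r in ROWS])
    cursor = conn.execute(
        "SELECT id FROM tab WHERE content LIKE ? ORDER BY id", (pattern,)
    )
    ids = [row[0] for row in cursor]
    conn.close()
    return ids


def extract_lang(content, lang):
    """Python counterpart of the SUBSTR/LOCATE expression: text between the markers."""
    prefix = f"<!--:{lang}-->"
    suffix = "<!--:-->"
    start = content.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    end = content.find(suffix, start)
    return content[start:end] if end != -1 else None


# Without a leading %, the value must START with the zh marker.
anchored = matching_ids("<!--:zh-->%<!--:-->")
# With % on both sides, the zh block may sit anywhere in the value.
unanchored = matching_ids("%<!--:zh-->%<!--:-->%")
print(anchored, unanchored)
print(extract_lang(ROWS[1], "zh"))
```

Row 2 has the English block first, so only the unanchored pattern finds it. If every stored value starts with the English block, the anchored pattern returns nothing, which is consistent with the empty result set described in the question.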

As others have pointed out, your queries seem to be fine, so I'd look elsewhere. Here is something you can try:
I'm not sure about Chinese input, but in Japanese many symbols have full-width and half-width variants. For example, "ｈｅｌｌｏ" and "hello" look similar, but the code points of their characters are different, so they won't compare as equal. It's very easy to mistype something in full-width and very difficult to detect it, especially for whitespace. Compare "　" and " ".
You may be storing your data in half-width and querying it in full-width. If even one character differs (spaces are especially hard to spot), the query will not find your data.
There are many ways to detect this. For instance, copy the data and the query verbatim into text files and view them in a hex editor. If there is even a single-bit difference in the relevant parts, you may be dealing with this problem.
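As an alternative to a hex editor, a few lines of script can point at the offending characters. This is a sketch only, assuming Python and its standard unicodedata module; suspicious_chars is a hypothetical helper name. It flags every character that Unicode NFKC normalization would change, which covers full-width Latin letters and the ideographic space but leaves ordinary CJK ideographs alone.

```python
import unicodedata


def suspicious_chars(text):
    """Return (index, char, code point) for each character NFKC would rewrite."""
    found = []
    for i, ch in enumerate(text):
        if unicodedata.normalize("NFKC", ch) != ch:
            found.append((i, ch, f"U+{ord(ch):04X}"))
    return found


# Hypothetical sample: a full-width 'h' (U+FF48) and an ideographic space (U+3000).
query = "\uff48ello\u3000world"
stored = "hello world"

print(suspicious_chars(query))
# After normalization the two strings compare equal.
print(unicodedata.normalize("NFKC", query) == stored)
```

Running the same check over the stored value and over the literal used in the query shows whether the two differ in width variants.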

Assuming you're using MySQL, you can use wildcards in LIKE:
% matches any number of characters, including zero characters.
_ matches exactly one character
Here's an example search for values containing the character 日 in the content column of your table:
SELECT * FROM table WHERE `content` LIKE '%日%'

The search fails because of the way you store the data.
You are using the utf8_general_ci collation, which is tailored for fast searching in some European languages. It is not even perfect for some of those. People tend to use it just because it is fast and they don't care about some search inaccuracy in, say, Scandinavian languages.
Change this to big5_chinese_ci or some other Chinese-tuned collation.
UPD.
Another thing.
I see you use a kind of markup in your DB records.
<!--:zh-->日本<!--:-->
<!--:en-->English Characters<!--:-->
So, if you're searching for Chinese, you may just use
SELECT * FROM table WHERE content LIKE '<!--:zh-->%'
instead of
SELECT * FROM table WHERE content LIKE '<!--:zh-->%<!--:-->'

I have tried to reproduce the problem. The query is OK; I got the result even when using SET NAMES latin1.
Check the content of the field: there may be leading or trailing whitespace. Remove it first, or try this query -
SELECT * FROM table
WHERE TRIM(content) LIKE '<!--:zh-->%<!--:-->'
Example with your string -
CREATE TABLE table1(
column1 VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_general_ci
);
INSERT INTO table1 VALUES
('<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->');
SELECT * FROM table1 WHERE column1 LIKE '%<!--:zh-->%<!--:-->';
=> <!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->

Can I ask what version of MySQL you're using? From what I see your code seems fine, which gets me thinking you're not running the most up to date version of MySQL.

Related

mysql MATCH AGAINST weird characters query

I have a table where the field "company_name" has weird characters, like "à","ö","¬","©","¬","†", etc. I want to return all "company_name"s that contain these characters anywhere within the string. My current query looks like this:
SELECT * FROM table WHERE
MATCH (company_name) AGAINST ('"Ä","à","ö","¬","©","¬","†"' in natural language mode);
But I keep getting no data from the query. I know this can't be the case, as there are definitely examples of them I can find manually. To be clear, the query itself isn't throwing any errors, just not returning any data.
The minimum word length for full-text search is 3 (InnoDB) or 4 (MyISAM) characters by default.
You can change it; see the manual:
https://dev.mysql.com/doc/refman/8.0/en/fulltext-fine-tuning.html
Or use regular expressions:
SELECT * FROM table WHERE
company_name REGEXP '[Äàö¬©¬†]+';
SELECT *
FROM table
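WHERE company_name LIKE '%[^0-9a-zA-Z !"#$%&''()*+,\-./:;<=>?@\[\^_`{|}~\]\\]%' ESCAPE '\'
This will find any wacky stuff, including wide characters, 'euro-ASCII' or emoji. Note that bracketed character classes inside LIKE are SQL Server syntax; MySQL's LIKE only understands % and _, so in MySQL the same character class has to go into a REGEXP pattern instead.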
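The character class in that pattern amounts to "anything that is not printable ASCII". As a sketch of the same test outside the database (assuming Python; the sample company names are invented for illustration), a regular expression over the range U+0020 to U+007E does the job:

```python
import re

# Anything outside the printable ASCII range (space through tilde).
NON_PRINTABLE_ASCII = re.compile(r"[^\x20-\x7e]")


def has_unusual_chars(name):
    """True if the name contains at least one character outside printable ASCII."""
    return NON_PRINTABLE_ASCII.search(name) is not None


# Hypothetical sample data, not taken from the question.
names = ["Acme Ltd", "Caf\u00e9 Mu\u00f1oz", "M\u00fcller \u00a9 GmbH"]
flagged = [n for n in names if has_unusual_chars(n)]
print(flagged)
```

This is useful for spot-checking an exported column before deciding which characters the SQL pattern really needs to list.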

Removing Cyrillic text from MySQL Select string

Recently I got stuck on a problem with MySQL queries. I have a table that contains records in multiple languages. For example, the columns are ID and Description, with data like this: 1 test with Кирилица; 2 Test without Cyrillic. I need to remove all Cyrillic characters in the SELECT output, so the result should look like 1 test with; 2 Test without Cyrillic. It seems I need a SELECT with REPLACE, but is there a faster way than replacing all 66 characters (upper-case and lower-case letters) in the query?
I have tried something like the following, but of course it isn't working. Hoping for help from MySQL gurus. Thank you for your attention.
SELECT id,SUBSTRING_INDEX(title, REGEXP "[а-яА-Я]", 1)
FROM Test
AFAIK there is no faster method than
SELECT id, REGEXP_REPLACE(title, '[Ѐ-ӿ]+', '') AS title FROM test;
Fiddle
("Ѐ" and "ӿ" are the first and last characters, respectively, of the Unicode Cyrillic block. If you go with [а-яА-Я], you can miss Cyrillic letters of languages other than Russian, and even the Russian Ё.)
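The point about the two ranges can be verified with a short script. This is a sketch assuming Python's re module rather than MySQL's REGEXP_REPLACE (which needs MySQL 8.0 or later); strip_cyrillic is a hypothetical helper name. It removes the whole Cyrillic block U+0400 to U+04FF and shows what the narrower Russian-only range leaves behind.

```python
import re

# The full Unicode Cyrillic block, U+0400 ("Ѐ") through U+04FF ("ӿ").
CYRILLIC_BLOCK = re.compile("[\u0400-\u04ff]+")
# The narrower range from the question, for comparison.
RUSSIAN_ONLY = re.compile("[а-яА-Я]+")


def strip_cyrillic(text):
    """Remove every run of characters from the Cyrillic block."""
    return CYRILLIC_BLOCK.sub("", text)


print(repr(strip_cyrillic("test with Кирилица")))
# "Ё" (U+0401) lies outside А-Я, so the narrow range does not remove it.
print(repr(RUSSIAN_ONLY.sub("", "Ёлка")))
print(repr(strip_cyrillic("Ёлка")))
```

Note that the space before the removed word stays in place, so a TRIM() in SQL (or .strip() here) is still needed to get exactly "test with".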

mysql select query ignoring inner spaces

Banging my head against the wall with this one.
I have a table containing postcodes and street names, and another table where houses are listed for sale (where the street name is missing), and I am trying to get the street name for each postcode.
The problem is that table 1 stores the postcode without the space, and table 2, which I am trying to update, stores the postcode with the space.
So in table 1 the postcode is stored as "l249pb" and in table 2 it is stored as "l24 9pb".
Now if the postcodes were both stored in exactly the same format, i.e. without the space, I would expect this query to work:
UPDATE Table1
INNER JOIN Table2 ON ( Table1.PostCode = Table2.PostCode )
SET Table1.StreetName = Table2.StreetName
I have tried this, but it won't work:
UPDATE Table1
INNER JOIN Table2 ON ( Table1.PostCode = REPLACE(Table2.PostCode,' ',''))
SET Table1.StreetName = Table2.StreetName
Can anyone tell me how to check for a match ignoring spaces (like a trim, but removing every space)?
Many thanks for any help you can offer.
With the data you've given your UPDATE runs just fine. Probably the whitespaces you see are not actually spaces, but something else, e.g. non-breaking spaces, tabs etc.
After normal SPACE, the next most common white spaces (which are not line breaks) are CHARACTER TABULATION (ie. horizontal tab) and NO-BREAK SPACE. You could use CHAR(9) and CHAR(160), respectively, to reference them in your query.
It also might be possible that your table viewer application shows line breaks as a space for brevity, so if replacing space, tab and nbsp isn't enough, try replacing those, too.
If you really need to replace all white space characters… Unfortunately there is no "white space wildcard" to use in MySQL. Technically, you could make a monster REPLACE(REPLACE(REPLACE(REPLACE… call which, in the end, would replace all whitespace characters with ''. For example, to replace every THREE-PER-EM SPACE, first look up its Unicode code point (U+2004); then you can replace its occurrences with, e.g.:
REPLACE(PostCode, CHAR(0x2004 using ucs2), '')
There is a hackish shortcut: if you are sure that your data should contain only Latin-1 characters and no ? (question mark), you could CONVERT() the string to latin1 first, which replaces every character that does not fit into Latin-1 with ?, and then replace every ? with '':
REPLACE(CONVERT(PostCode using latin1), '?', '')
This can be useful in one-off, manual queries, but for ongoing use it is better to replace the characters explicitly.
But first you should check your data input sanitizer/validator, so future records won't be such a mess. Perhaps you could consider running a bulk replace to normalize the data on PostCode column(s), if possible, before even trying to do your join query. Legacy systems with legacy data only get worse over time.
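Before writing nested REPLACE calls, it helps to know which whitespace characters are actually in the data. The following sketch assumes Python and a few exported PostCode values; describe_whitespace and strip_all_whitespace are hypothetical helper names. It lists every whitespace character other than a plain space, with its code point and Unicode name, and shows a normalization that drops all of them.

```python
import unicodedata


def describe_whitespace(text):
    """List each whitespace character that is not a plain U+0020 space."""
    return [
        # Control characters such as TAB have no Unicode name, hence the default.
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(text)
        if ch.isspace() and ch != " "
    ]


def strip_all_whitespace(text):
    """Remove every Unicode whitespace character, not just U+0020."""
    return "".join(text.split())


# Hypothetical value that looks like "l24 9pb" but contains a NO-BREAK SPACE.
postcode = "l24\u00a09pb"
print(describe_whitespace(postcode))
print(strip_all_whitespace(postcode))
```

The code points it reports are the ones to feed into CHAR(...) inside the REPLACE calls, or to clean up once in a bulk normalization of the column.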

How to convert latin characters to their corresponding unicode escape sequences in MySql?

First let me tell you that the character set and collation I'm using is utf8_general_ci.
I have two tables, A and B, for the sake of example.
Table A has a column (let's call it 'columnX'). In any row of table A columnX might have a value that contains latin characters, e.g., 'niño' (means 'boy' in english).
Table B has a column (let's call it 'columnY') that I know might contain 'niño' as part of its value, e.g., 'es un niño bueno' ('he is a good boy' in english), but the 'ñ' will be escaped, since I know columnY contains JSON strings so that string literal would be encoded as 'es un ni\u00f1o bueno'.
I need to find all rows of table A whose column A.columnX is contained in any B.columnY. I need a function that converts every A.columnX value to its corresponding escaped version. Something along the lines of the following code:
SELECT * FROM A
INNER JOIN B
ON B.columnY LIKE CONCAT('%', escapeUtf8(A.columnX) ,'%')
I have tried using QUOTE, CONVERT and CAST, and also googled a lot, but all I've found is the opposite of what I need (the posts explain how to convert escaped sequences to something readable by humans).
Thanks in advance,
Adrián
You should be storing ñ as hex C3B1 in a utf8 column. Check by doing SELECT HEX(col) .... If it is &...; (there are about 3 possible ways to represent the "html entity"), then you have a mess that should be fixed when INSERTing the text.
escapeUtf8() only complicates the situation; no function should be needed there.
I found a workaround
What I did was to use the STRINGDECODE function that I found here (look at Joni's answer): MySQL decode Unicode to UTF-8 function and I used it to "reverse" the query.
Instead of escaping the column with "ñ" (or accented letters, etc.) from table A, I unescaped the escaped columns in table B with STRINGDECODE.
Also, since I needed to use the decoded column values several times, I stored B.columnY in a temporary table (after applying STRINGDECODE, of course) for later use. This would be the sample code:
CREATE TEMPORARY TABLE sanitizedB
SELECT STRINGDECODE(columnY) as sanitizedColumnY, columnZ
FROM B;
SELECT * FROM A
INNER JOIN sanitizedB SB
ON SB.sanitizedColumnY LIKE CONCAT('%', A.columnX ,'%')
Of course, I added other columns that I needed too for later use.
It's worth saying that the B table had fewer than 10,000 records, so the creation of the temporary table was quite fast. It is possibly not the best solution, but it worked well for me.
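Both directions, the escaping the question asks for and the unescaping this workaround relies on, are easy to illustrate at the application level. This is a sketch assuming Python's standard json module, not a MySQL function; escape_json_text and unescape_json_text are made-up names.

```python
import json


def escape_json_text(text):
    """Escape text the way a JSON encoder does by default, without the quotes."""
    # json.dumps escapes non-ASCII characters as \uXXXX because ensure_ascii
    # defaults to True; the slice drops the surrounding double quotes.
    return json.dumps(text)[1:-1]


def unescape_json_text(escaped):
    """Reverse direction: turn \\uXXXX sequences back into real characters."""
    return json.loads(f'"{escaped}"')


needle = escape_json_text("niño")
haystack = "es un ni\\u00f1o bueno"   # what columnY holds, as raw text
print(needle)
print(needle in haystack)
print(unescape_json_text(haystack))
```

One caveat with the escaping direction: other JSON encoders may write the hex digits in upper case (\u00F1), so comparing unescaped text, as the workaround does, is the more robust of the two.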

MYSQL full text search numbers and character being ignored?

I'm trying to do a full-text search. The database I'm querying has a lot of LCD screen sizes, e.g. 32".
I need a full-text search because a search phrase can be complex; we started with a LIKE comparison, but it didn't cut it.
Here's what we've got.
SELECT stock_items.name AS si_name,
stock_makes.name AS sm_name,
stock_items.id AS si_id
FROM stock_items
LEFT JOIN stock_makes ON stock_items.make = stock_makes.id
WHERE MATCH (stock_items.name,
stock_makes.name) AGAINST ('+32"' IN BOOLEAN MODE)
AND stock_items.deleted != 1
With this search we get 0 results, although 32" appears multiple times in the fields.
We have modified the MySQL config to allow us to search for words of 3 characters or more (instead of the default four), and searches like +NEC work fine.
My guess is that full-text search in MySQL ignores either the " character or maybe the numbers.
I don't have any control over the database data unfortunately or I'd replace the double quotes with something else.
Any other solutions ?
MySQL ignores certain characters when indexing; " is one of them, I presume.
There are a few ways to change the default character settings, as described here:
Modify the MySQL source: In myisam/ftdefs.h, see the true_word_char() and misc_word_char() macros. Add '-' to one of those macros and recompile MySQL.
Modify a character set file: This requires no recompilation. The true_word_char() macro uses a “character type” table to distinguish letters and numbers from other characters. You can edit the contents of the array in one of the character set XML files to specify that '-' is a “letter.” Then use the given character set for your FULLTEXT indexes.
Add a new collation for the character set used by the indexed columns, and alter the columns to use that collation.
I used the second approach, and that worked for us. The linked page has a detailed example in the comments at the end (comment by John Navratil).
In any case, you need to rebuild the index after changing the settings:
REPAIR TABLE tbl_name QUICK;