Mysql german accents not-sensitive search in full-text searches - mysql

Let`s have a example hotels table:
CREATE TABLE `hotels` (
`HotelNo` varchar(4) character set latin1 NOT NULL default '0000',
`Hotel` varchar(80) character set latin1 NOT NULL default '',
`City` varchar(100) character set latin1 default NULL,
`CityFR` varchar(100) character set latin1 default NULL,
`Region` varchar(50) character set latin1 default NULL,
`RegionFR` varchar(100) character set latin1 default NULL,
`Country` varchar(50) character set latin1 default NULL,
`CountryFR` varchar(50) character set latin1 default NULL,
`HotelText` text character set latin1,
`HotelTextFR` text character set latin1,
`tagsforsearch` text character set latin1,
`tagsforsearchFR` text character set latin1,
PRIMARY KEY (`HotelNo`),
FULLTEXT KEY `fulltextHotelSearch` (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`,`HotelText`,`HotelTextFR`,`tagsforsearch`,`tagsforsearchFR`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
In this table for example we have only one hotel with Region name = "Graubünden" (please note umlaut ü character)
And now I want to achieve same search match for phrases:
'graubunden' and
'graubünden'
This is simple with use of MySql built in
collations in regular searches as follows:
SELECT *
FROM `hotels`
WHERE `Region` LIKE CONVERT(_utf8 '%graubunden%' USING latin1)
COLLATE latin1_german1_ci
This works fine for 'graubunden' and 'graubünden' and
as a result I receive proper result, but problem is
when we make MySQL full text search
Whats wrong with this SQL statement?:
SELECT
*
FROM
hotels
WHERE
MATCH (`HotelNo`,`Hotel`,`Address`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`, `HotelText`, `HotelTextFR`, `tagsforsearch`, `tagsforsearchFR`)
AGAINST( CONVERT('+graubunden' USING latin1) COLLATE latin1_german1_ci IN BOOLEAN MODE)
ORDER BY Country ASC, Region ASC, City ASC
This doesn`t return any result.
Any ideas where the dog is buried ?

When you define individual CHARACTER SETS for your columns, you override the collation you set default on table level.
Each of your columns has default latin1 collation (which is latin1_swedish_ci). You can see it by running SHOW CREATE TABLE.
In FULLTEXT queries, indexed columns have COERCIBILITY of 0, that is all fulltext queries are converted to the collation used in the index, not vice versa.
You need to remove CHARACTER SET definitions from your columns or explicitly set all columns to latin1_german_ci:
CREATE TABLE `hotels` (
`HotelNo` varchar(4) NOT NULL default '0000',
`Hotel` varchar(80) NOT NULL default '',
`City` varchar(100) default NULL,
`CityFR` varchar(100) default NULL,
`Region` varchar(50) default NULL,
`RegionFR` varchar(100) default NULL,
`Country` varchar(50) default NULL,
`CountryFR` varchar(50) default NULL,
`HotelText` text,
`HotelTextFR` text,
`tagsforsearch` text,
`tagsforsearchFR` text,
PRIMARY KEY (`HotelNo`),
FULLTEXT KEY `fulltextHotelSearch` (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`,`HotelText`,`HotelTextFR`,`tagsforsearch`,`tagsforsearchFR`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
INSERT
INTO hotels (hotelText, HotelTextFR, tagsforsearch, tagsforsearchFR)
VALUES ('text', 'text', 'graubünden', 'tags');
SELECT *
FROM hotels
WHERE MATCH (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`, `HotelText`, `HotelTextFR`, `tagsforsearch`, `tagsforsearchFR`)
AGAINST (CONVERT('+graubunden' USING latin1) COLLATE latin1_german1_ci IN BOOLEAN MODE)
ORDER BY
Country ASC, Region ASC, City ASC;

Related

Mysql search interprets european character as non-european character

When I do a search on a string containing the letter "a", that letter also matches words containing the letter "ä" and "å".
So if I search for "anna" it matches a text containing "känna". If I search for "manda" it matches "månda".
The database character set is: latin1
The table character set: utf8mb4
I can test it by searching a particular row and a particular column:
SELECT * FROM events AS e
WHERE e.content LIKE '%anna%' AND e.id = 1230
That would return that row (1230). If I remove the line containing "känna" from the "content" column of that row, there will be no match. So it interprets all european characters based on "a" as also being "ä", or "å".
How do I correct this?
edit:
okay, here is a fiddle:
fiddle
Show create table events output: (I removed some columns):
CREATE TABLE `events` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`content` longtext COLLATE utf8mb4_unicode_ci,
`event_at` timestamp NULL DEFAULT NULL,
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1242 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
Try to specify the character set and collate for your literal:
SELECT * FROM events AS e
WHERE e.content LIKE _utf8mb4 '%anna%' COLLATE utf8mb4_unicode_ci AND e.id = 1230

Not inserting Turkish characters in SQL table. Displays 'xxx??xxx' when selected

I am trying to add Turkish names on my table but then when displayed it gives me ? instead of any of them. Any help what I am missing here? This is my table:
CREATE TABLE IF NOT EXISTS `offerings` (
`dep` varchar(5) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`grade` varchar(4) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`section` varchar(3) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`name` varchar(100) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`teacher` varchar(50) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`quota` varchar(2) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`lec1` varchar(35) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`lec2` varchar(35) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`lec3` varchar(35) DEFAULT NULL,
`lec4` varchar(35) DEFAULT NULL,
`lec5` varchar(35) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf16;
As suggested from the answer I choose here is the solution to the problem for whoever googles this topic. Special thanks to all who contributed in the solution of my problem.
CREATE TABLE IF NOT EXISTS `offerings` (
`dep` varchar(5) NOT NULL,
`grade` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`section` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`teacher` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`quota` varchar(2) COLLATE utf8_unicode_ci,
`lec1` varchar(35) DEFAULT NULL,
`lec2` varchar(35) DEFAULT NULL,
`lec3` varchar(35) DEFAULT NULL,
`lec4` varchar(35) DEFAULT NULL,
`lec5` varchar(35) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
(Beginnings of an answer...)
Please don't use utf16; there is virtually no reason for such in a MySQL table.
So, assuming you switch to utf8, let's see if we can get rid of the ? problems.
utf8 needs to be established in about 4 places.
The column(s) in the database -- Use SHOW CREATE TABLE to verify that they are explicitly set to utf8, or defaulted from the table definition. (It is not enough to change the database default.)
The connection between the client and the server. See SET NAMES utf8.
The bytes you have. (This is probably the case.)
If you are displaying the text in a web page, check the <meta> tag.
What probably happened:
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
Since the CHARACTER SET disagrees with what you have shown, the problem is possibly more complex. Please provide
SELECT col, HEX(col) FROM tbl WHERE ...
for some simple cell with Turkish characters. With this, I may be able to figure out what happened.
Also, Reference notes on encodings.
VARCHARs are character strings, while NVARCHARS are Unicode character strings. NVARCHARS require more bits per character to store, but have a greater range. Try updating your data types. This should fix your problem.
EDIT This answer is wrong. The OP clearly asked for a MySQL solution, but the above applies only to SQL Server.

MySQL character set for multiple European languages + mathematic symbols

I need some help getting a MySQL table to store and output characters from the following languages:
English
French
Russian
Turkish
German
These are the languages I know of in the data. It also uses mathematical symbols such as this:
b ∈ A. Define s(A):= supn≥0 r A (n) for each A ⊆ ? ∪ {0}.
I'm using htmlentities to encode the text. The ? above is meant to display as a ℕ.
It displays this way when I look at the data in PhpMyAdmin. The other characters are encoded as expected.
The table is set to utf8_unicode_ci and all aspects of the website have been set to UTF-8 (including via the .htaccess file, a PHP header and a meta tag).
Please help?
Additional info:
Hosting environment:
Linux, Apache
Mysql 5.5.38
PHP Version 5.4.4-14
Connection string :
ini_set('default_charset', 'UTF-8');
$mysqli = new mysqli($DB_host , $DB_username, $DB_password);
$mysqli->set_charset("utf8");
$mysqli->select_db($DB_name);
SHOW CREATE TABLE mydatabase.mytable output:
CREATE TABLE `tablename` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created` datetime NOT NULL,
`updated` datetime NOT NULL,
`product` int(11) NOT NULL,
`ppub` tinytext COLLATE utf8_unicode_ci NOT NULL,
`pubdate` date NOT NULL,
`numerous_other_tinytext_cols` tinytext COLLATE utf8_unicode_ci NOT NULL,
`numerous_other_tinytext_cols` tinytext COLLATE utf8_unicode_ci NOT NULL,
`text` text COLLATE utf8_unicode_ci NOT NULL,
`keywords` tinytext COLLATE utf8_unicode_ci NOT NULL,
`active` int(11) NOT NULL DEFAULT '1',
`orderid` int(11) NOT NULL,
`src` tinytext CHARACTER SET latin1 NOT NULL,
`views` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=17780 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
SELECT DEFAULT_CHARACTER_SET_NAME FROM information_schema.SCHEMATA output:
DEFAULT_CHARACTER_SET_NAME
utf8 [->UTF-8 Unicode]
utf8mb4 [->UTF-8 Unicode]
Fonts used:
Arial
Sample of text in the database:
Let <em>A</em> be a subset of the set of nonnegative integers ℕ ∪ {0}, and let <em>r</em><sub><em>A</em></sub> (<em>n</em>) be the number of representations of <em>n</em> ≥ 0 by the sum <em>a</em> + <em>b</em> with <em>a, b</em> ∈ <em>A</em>.
Output on web page:
Let <em>A</em> be a subset of the set of nonnegative integers ? ∪ {0}, and let <em>r</em><sub><em>A</em></sub> (<em>n</em>) be the number of representations of <em>n</em> ≥ 0 by the sum <em>a</em> + <em>b</em> with <em>a, b</em> ∈ <em>A</em>.
Which becomes
Let A be a subset of the set of nonnegative integers ? ∪ {0}, and let rA (n) be the number of representations of n ≥ 0 by the sum a + b with a, b ∈ A.
While your database and table are configured to use UTF-8, one of your columns still isn't:
CREATE TABLE `tablename` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created` datetime NOT NULL,
`updated` datetime NOT NULL,
`product` int(11) NOT NULL,
`ppub` tinytext COLLATE utf8_unicode_ci NOT NULL,
`pubdate` date NOT NULL,
`numerous_other_tinytext_cols` tinytext COLLATE utf8_unicode_ci NOT NULL,
`numerous_other_tinytext_cols` tinytext COLLATE utf8_unicode_ci NOT NULL,
`text` text COLLATE utf8_unicode_ci NOT NULL,
`keywords` tinytext COLLATE utf8_unicode_ci NOT NULL,
`active` int(11) NOT NULL DEFAULT '1',
`orderid` int(11) NOT NULL,
`src` tinytext CHARACTER SET latin1 NOT NULL, <--------- This one
`views` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=17780 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
Since all the other symbols got HTML-encoded, they will survive all charsets, but not ℕ, which doesn't have a named entity reference.
You need to convert your column:
ALTER TABLE tablename MODIFY src TINYTEXT CHARACTER SET utf8;
NOTE: I noticed you like mathematical symbols. Some of them are outside of the Basic Multilingual Plane, ie. have codepoints > 0xFFFF, for example mathematical letter variants (fraktur, double-struck, semantic italic etc.).
If you want to support them, you need to switch encoding in MySQL everywhere (table, columns, connection) to utf8mb4, which is the true UTF-8 (utf8 in MySQL means the subset of UTF-8 with BMP only), with utf8mb4_unicode_ci collation. Here is how to do the migration.
Also, I have noticed that you are HTML-encoding HTML. Maybe you have a reason, but in my opinion storing this doesn't make sense:
<em>A</em>
If you want to put it into an HTML document, now you need to HTML-decode it at least once, sometimes twice. I'd rather store what almost everybody else does:
<em>A</em>
This way, you will store Unicode characters natively, in optimal way.

View Japanese Characters in MySQL Workbench

I already tried This solution which says
ALTER TABLE title
CHARACTER SET utf8
COLLATE utf8_unicode_ci;
Ok here are some screen shots which might help you.
Update
here's what happens when i insert Japanese characters.
Update 2
Show create table gives this
CREATE TABLE `productInfo` (
`pID` int(11) NOT NULL AUTO_INCREMENT,
`pOperation` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
`year` year(4) DEFAULT NULL,
`season` varchar(10) CHARACTER SET latin1 DEFAULT NULL,
`pName` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
`category` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
`margin1` text CHARACTER SET latin1,
`margin2` text CHARACTER SET latin1,
PRIMARY KEY (`pID`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
just see that
DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
But now see that the query
SELECT character_set_name, collation_name
FROM information_schema.columns
WHERE table_schema = 'trac_data'
AND table_name = 'productInfo'
AND column_name = 'pOperation';
gives
character_set_name collation_name
'latin1' 'latin1_swedish_ci'
Thats weird !
Update 3
SELECT hex(pOperation),pOperation FROM trac_data.productInfo;
gave 3F3F3F3F3F which is hex code for actual '?' and not any japanese character so that means no japanese characters are being stored
You have a mix of charsets in your table structure. The table itself uses utf8, but the column in question uses latin 1. You have it defined that way. As long as you have an own charset for your column you can change the table's or the schema's column a thousand times. It won't have any effect on your column. So, instead change the column's charset to either default (to use that of the table) or make it using utf8 explicitely.
When you alter the column's charset existing data will be converted (if possible). Your wrong input however stays wrong, so you have to fill the data again.
Ok i found the cause
CREATE TABLE `productInfo` (
`pID` int(11) NOT NULL AUTO_INCREMENT,
`pOperation` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
`year` year(4) DEFAULT NULL,
`season` varchar(10) CHARACTER SET latin1 DEFAULT NULL,
`pName` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
`category` varchar(40) CHARACTER SET latin1 DEFAULT NULL,
`margin1` text CHARACTER SET latin1,
`margin2` text CHARACTER SET latin1,
PRIMARY KEY (`pID`)
) ENGINE=InnoDB AUTO_INCREMENT=12 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I noticed how in front of each column SET latin1 was present.
So I Just changed to sjis and problem solved.
You have to set the database collation to UTF-8, not only the table collation :
Here is the SQL script result :

Mysql FullText Search Case-insensitive

I have table products_discription from OpenCart.
I created new search engine. Everything is okey, except that is case sensitive.
How I can make it insensitive.
I readed in Mysql Documentation I must change utf8_bin to utf8_general_ci.
But how to make it, without deleting all indexes.
Its not only one table. I'm looking for at 4 tables. Every table has around 4 -5 indexes.
The site brings non-stop information. Loss of information is simply not acceptable.
I was wondering if there is a way to extract keys to delete, and change the encoding. Then add them again with just one application. As such, I think that there will be no data loss.
CREATE TABLE IF NOT EXISTS `product_description` (
`product_id` int(11) NOT NULL,
`language_id` int(11) NOT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`short_description` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`description` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`meta_description` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`meta_keyword` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`tag` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`custom_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '',
PRIMARY KEY (`product_id`,`language_id`),
FULLTEXT KEY `description` (`description`),
FULLTEXT KEY `tag` (`tag`),
FULLTEXT KEY `ft_namerel` (`name`,`description`),
FULLTEXT KEY `name` (`name`,`short_description`,`description`,`meta_description`,`meta_keyword`,`tag`,`custom_title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
have you tried searching in boolean mode?
I deleted all index keys and change encoding, after that I set new index keys.