I have the product_description table from OpenCart.
I created a new search engine. Everything works, except that it is case-sensitive.
How can I make it case-insensitive?
I read in the MySQL documentation that I must change utf8_bin to utf8_general_ci.
But how can I do that without deleting all the indexes?
It's not only one table: I'm searching across 4 tables, and each table has around 4-5 indexes.
The site receives new data non-stop. Losing information is simply not acceptable.
I was wondering whether there is a way to extract the keys, drop them, change the encoding, and then add them again, all in a single operation. That way, I think there would be no data loss.
CREATE TABLE IF NOT EXISTS `product_description` (
`product_id` int(11) NOT NULL,
`language_id` int(11) NOT NULL,
`name` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`short_description` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`description` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`meta_description` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`meta_keyword` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`tag` text CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
`custom_title` varchar(255) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT '',
PRIMARY KEY (`product_id`,`language_id`),
FULLTEXT KEY `description` (`description`),
FULLTEXT KEY `tag` (`tag`),
FULLTEXT KEY `ft_namerel` (`name`,`description`),
FULLTEXT KEY `name` (`name`,`short_description`,`description`,`meta_description`,`meta_keyword`,`tag`,`custom_title`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Have you tried searching in boolean mode?
I dropped all the index keys, changed the encoding, and then recreated the index keys.
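A minimal sketch of the "single operation" idea from the question, assuming the goal is just to emit one ALTER TABLE per table. MySQL rebuilds a table's indexes as part of ALTER TABLE, so one statement with a MODIFY clause per text column should change the collation without dropping and re-adding the FULLTEXT keys by hand; ALTER TABLE ... CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci is a shorter alternative that changes every character column at once. The helper build_collation_alter and its (name, type, tail) tuple format are hypothetical. Try it on a copy first, and expect writes to the table to wait while it is rebuilt.

```python
def build_collation_alter(table, columns, charset="utf8", collation="utf8_general_ci"):
    """Build one ALTER TABLE statement that re-declares each text column with a
    case-insensitive collation. `columns` is a list of (name, type, tail) tuples,
    where `tail` is the rest of the definition, e.g. "NOT NULL" or "DEFAULT ''".
    """
    clauses = []
    for name, col_type, tail in columns:
        clause = f"MODIFY `{name}` {col_type} CHARACTER SET {charset} COLLATE {collation}"
        if tail:
            clause += f" {tail}"
        clauses.append(clause)
    # One statement: the table is copied and its indexes rebuilt once, not per column
    return f"ALTER TABLE `{table}`\n  " + ",\n  ".join(clauses) + ";"


if __name__ == "__main__":
    print(build_collation_alter("product_description", [
        ("name", "varchar(255)", "NOT NULL"),
        ("short_description", "text", "NOT NULL"),
        ("description", "text", "NOT NULL"),
        ("meta_description", "varchar(255)", "NOT NULL"),
        ("meta_keyword", "varchar(255)", "NOT NULL"),
        ("tag", "text", "NOT NULL"),
        ("custom_title", "varchar(255)", "DEFAULT ''"),
    ]))
```

Running it once per table (with that table's column list) gives the four statements to apply.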
Out of ignorance, I altered a few tables without specifying a collation.
As a result, the altered columns, which used to be latin1, were changed to utf8mb4.
This caused a HUGE performance loss in joins. And when I say HUGE, I mean a fraction of a second became an hour or more!
So I issued another ALTER to convert them back to latin1.
And here comes the problem. A mere 60k-row table with ONE utf8mb4 column of 64 characters took 10 hours to complete. No, that is not a typo. TEN hours. And my even bigger problem is that I have other tables with millions of rows, giving me an ETA measured in years!
So now I wonder what my options are, because I can't afford to have these tables read-only for longer than one day.
I know that MySQL's ALTER TABLE creates a copy of the table. That makes sense here because this is a column size change, so I doubt I can use ALGORITHM=INPLACE.
If I cannot use INPLACE, then I cannot use the LOCK=NONE option either.
Why in the world would a utf8mb4 -> latin1 conversion have such a big impact?
Note that the converted column is indexed, and this may be a reason for the impact!
ANY suggestion or link would be greatly appreciated!
Maybe the solution would be to drop the index (to avoid funky multibyte issues in the index conversion), do a fast ALTER, and then add the index back?
Thanks in advance for any serious suggestions, though I suspect I may not find much help because this situation is so unusual.
EDIT
jobs | CREATE TABLE `jobs` (
`auto_inc_key` int(11) NOT NULL AUTO_INCREMENT,
`request_entered_timestamp` datetime NOT NULL,
`hash_id` char(64) CHARACTER SET latin1 NOT NULL,
`name` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`host` char(20) CHARACTER SET latin1 NOT NULL,
`user_id` int(11) NOT NULL,
`start_date` datetime NOT NULL,
`end_date` datetime NOT NULL,
`state` char(12) CHARACTER SET latin1 NOT NULL,
`location` varchar(50) NOT NULL,
`value` int(10) NOT NULL DEFAULT '0',
`aggregation_job_id` char(64) CHARACTER SET latin1 DEFAULT NULL,
`aggregation_job_order` int(11) DEFAULT NULL,
PRIMARY KEY (`auto_inc_key`),
KEY `host` (`host`),
KEY `hash_id` (`hash_id`),
KEY `user_id` (`user_id`,`request_entered_timestamp`),
KEY `request_entered_timestamp_idx` (`request_entered_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=9068466 DEFAULT CHARSET=utf8mb4
jobs_archive | CREATE TABLE `jobs_archive` (
`auto_inc_key` int(11) NOT NULL AUTO_INCREMENT,
`request_entered_timestamp` datetime NOT NULL,
`hash_id` char(64) CHARACTER SET latin1 NOT NULL,
`name` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`host` char(20) CHARACTER SET latin1 NOT NULL,
`user_id` int(11) NOT NULL,
`start_date` datetime NOT NULL,
`end_date` datetime NOT NULL,
`state` char(12) CHARACTER SET latin1 NOT NULL,
`value` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`auto_inc_key`),
KEY `host` (`host`),
KEY `hash_id` (`hash_id`),
KEY `user_id` (`user_id`,`request_entered_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=239432 DEFAULT CHARSET=utf8mb4
(Taken from a stored procedure, but you get the drift...)
INSERT INTO jobs_archive (SELECT * FROM jobs WHERE (TIMESTAMPDIFF(DAY, request_entered_timestamp, starttime) > days));
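Part of the question, why the character set of an indexed column matters, can be sketched with simple arithmetic. MySQL sizes CHAR/VARCHAR keys by the maximum bytes per character of the column's character set (1 for latin1, 3 for utf8/utf8mb3, 4 for utf8mb4), so a utf8mb4 CHAR(64) key can need up to 256 bytes instead of 64. The hash_id values look like hex digests, which are pure ASCII and encode to the same bytes in both character sets, so converting such a column to latin1 should not lose data. The helper names are hypothetical. Not modeled here, but a plausible cause of the original join slowdown: when the two sides of a join condition use different character sets, MySQL has to convert one side, which can stop it from using the index on that side.

```python
# Maximum bytes per character MySQL reserves for each character set
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8": 3, "utf8mb4": 4}


def max_key_bytes(char_length: int, charset: str) -> int:
    """Upper bound on the bytes one CHAR(n)/VARCHAR(n) key value can occupy."""
    return char_length * MAX_BYTES_PER_CHAR[charset]


def ascii_safe_for_latin1(value: str) -> bool:
    """True if the value encodes to the same bytes in UTF-8 and latin1 (pure ASCII),
    so converting it between the two character sets leaves the stored bytes unchanged."""
    try:
        return value.encode("utf-8") == value.encode("latin-1")
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for cs in ("latin1", "utf8mb4"):
        print(cs, max_key_bytes(64, cs))  # CHAR(64): 64 bytes for latin1, 256 for utf8mb4
    sample_hash = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
    print(len(sample_hash), ascii_safe_for_latin1(sample_hash))
```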
The purpose of the query is very simple: find the last entry for each value of a foreign key column.
In pseudocode:
select vehicleid , last_journey_point , last_journey_time from journeyTable.
Here is my SQL statement:
-- loconumber is an indexed column
-- journeyserla is an auto-increment int(11) primary key
-- the table locojourney contains 400,000 records
-- the block of code below executes in 19 secs
with LocomotiveLastRun AS(
-- this block of code runs in 0.016 sec
SELECT locojourney.loconumber , MAX(locojourney.journeyserla) as lastrunid
FROM locojourney GROUP BY loconumber)
SELECT locojourney.CurrentCombiners , locojourney.JourneySerla ,
locojourney.From_RunPoint , locojourney.NEXT_RunPoint
FROM LocomotiveLastRun FORCE INDEX(lastrunid)
JOIN locojourney FORCE INDEX(PRIMARY) ON LocomotiveLastRun.lastrunid = locojourney.journeyserla
WHERE locojourney.ishoc = 'n'
EXPLAIN shows a derived table that uses no index, with type ALL and "Using where".
This is the table definition:
-- SHOW CREATE TABLE locojourney
CREATE TABLE `locojourney` (
`trainID` smallint(5) NOT NULL,
`LocoNumber` varchar(5) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
`CurrentLocoBase` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
`CurrentDuedate` date DEFAULT NULL,
`LocoConsist` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
`CurrentLocoDomain` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
`DomainChange` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
`FEDR` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
`LADR` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
`ISBANKER` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
`TrainName` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
`WithOutLoad` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'N',
`runRoute` varchar(50) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
`From_RunPoint` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
`From_RunTime` datetime NOT NULL,
`NEXT_RunPoint` varchar(10) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
`NEXT_RunTime` datetime NOT NULL,
`Affects_Outage` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
`Affects_Mileage` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
`GroundDistance` double(5,2) DEFAULT '0.00',
`SHGallowance` int(11) DEFAULT '0',
`Outage` double(5,4) DEFAULT '0.0000',
`UnderServiceType` enum('FHT','CHG','DEP','MIX','DETN') CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'FHT',
`SubServiceHead` varchar(25) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'RUN',
`IShoc` enum('N','Y') CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT 'N',
`CurrentCombiners` varchar(28) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
`RunSetSerla` varchar(25) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
`JourneySerla` int(11) NOT NULL AUTO_INCREMENT,
`NominationSerla` varchar(50) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
`Traction` enum('DSL','AC') CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'DSL',
`Trainload` smallint(4) NOT NULL DEFAULT '0',
`LeadAssist` enum('Y','N') CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL DEFAULT 'N',
`DEO` varchar(100) CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL,
`DEOtime` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`JourneySerla`),
KEY `trainID` (`trainID`) USING BTREE,
KEY `routesection_idx` (`runRoute`) USING BTREE,
KEY `loconumber_idx` (`LocoNumber`) USING BTREE,
KEY `runsetserla_idx` (`RunSetSerla`) USING BTREE,
KEY `subservicehead_idx` (`SubServiceHead`) USING BTREE,
CONSTRAINT `locojourney_ibfk_1` FOREIGN KEY (`SubServiceHead`) REFERENCES `ineffective` (`IneffectiveHead`) ON UPDATE CASCADE,
CONSTRAINT `locojourney_ibfk_3` FOREIGN KEY (`runRoute`) REFERENCES `routesections` (`Sectionname`) ON DELETE RESTRICT ON UPDATE CASCADE,
CONSTRAINT `loconumber_fk` FOREIGN KEY (`LocoNumber`) REFERENCES `lococontainer` (`LocoNumber`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=345719 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
with LocomotiveLastRun AS(
-- this block of code runs in 0.016 sec
SELECT locojourney.loconumber , MAX(locojourney.journeyserla) as lastrunid
FROM locojourney
GROUP BY loconumber)
Why is this CTE subquery fast? Because your table already has an index on (loconumber, journeyserla). (InnoDB automatically appends the primary key to every secondary index.) This query can be satisfied with a loose index scan on that index, and those are fast.
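A rough Python model of why a loose index scan is fast, assuming the secondary index can be pictured as a sorted list of (loconumber, journeyserla) pairs: instead of reading every entry, the scan seeks straight to the end of each loconumber group, so the work grows with the number of distinct locomotives rather than the number of rows. This illustrates the idea only; it is not how InnoDB is implemented, and the function name is hypothetical.

```python
from bisect import bisect_left


def last_per_group(index):
    """index: (loconumber, journeyserla) pairs sorted ascending, like the secondary
    index with the primary key appended. Returns {loconumber: max journeyserla}."""
    result = {}
    i = 0
    while i < len(index):
        loco = index[i][0]
        # Seek just past this group: (loco, inf) sorts after every (loco, id)
        end = bisect_left(index, (loco, float("inf")), i)
        result[loco] = index[end - 1][1]  # last entry of the group is the MAX
        i = end                           # jump to the first entry of the next group
    return result


if __name__ == "__main__":
    idx = sorted([("L1", 1), ("L2", 2), ("L1", 3), ("L2", 4), ("L3", 5)])
    print(last_per_group(idx))
```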
Now for your main query:
Get rid of FORCE INDEX(). Don't even dream of using that unless you have at least a decade of SQL experience or you have read the source code for the InnoDB indexing stuff in MySQL. Notably, it's completely useless on the CTE because CTEs don't have indexes.
For clarity put your main (detail) table first and your CTE second.
For clarity recast the JOIN as a WHERE...IN...
Those three suggestions give us this:
WITH LocomotiveLastRun AS (...)
SELECT locojourney.CurrentCombiners , locojourney.JourneySerla ,
locojourney.From_RunPoint , locojourney.NEXT_RunPoint
FROM locojourney
WHERE journeyserla IN (SELECT lastrunid FROM LocomotiveLastRun)
AND locojourney.ishoc = 'n'
Now, it's plain what index can help this query.
An index on (ishoc) will help a bit. (Because InnoDB appends the primary key to every secondary index, it is effectively an index on (ishoc, journeyserla), so it helps with both WHERE conditions.) The query planner uses BTREE random access to find the first index row with the ishoc value 'n', then scans the primary key values stored in that index to match them against the IN clause.
Instead of that index, a compound index that covers the query will help even more. Such a covering index helps especially because each row of your table is large, with many columns. That index mentions the columns in the WHERE clause and those you want to select, like this:
(ishoc, journeyserla, CurrentCombiners, From_RunPoint, NEXT_RunPoint)
The query planner can satisfy your query entirely from the index, which saves disk reading time. If you run this query a lot, this index is a good idea. But it does consume disk space and slows down INSERT and UPDATE operations a bit.
Read https://use-the-index-luke.com/
Give this a try:
SELECT lj.CurrentCombiners , lj.JourneySerla , lj.From_RunPoint , lj.NEXT_RunPoint
FROM ( SELECT MAX(journeyserla) as lastrunid
FROM locojourney
GROUP BY loconumber
) AS llr
JOIN locojourney AS lj ON llr.lastrunid = lj.journeyserla
WHERE lj.ishoc = 'n'
(Time it and provide the EXPLAIN output for it.)
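To check that the two rewrites above return the same rows, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The table keeps only the columns the query touches, the rows are made up, an ORDER BY is added so the results compare deterministically, and IShoc is declared COLLATE NOCASE to mimic the case-insensitive latin1_swedish_ci comparison that lets ishoc = 'n' match 'N' in MySQL. SQLite timings and plans say nothing about MySQL, so this verifies only the logic; the covering index from the first answer is created to show its column order.

```python
import sqlite3

SCHEMA = """
CREATE TABLE locojourney (
    JourneySerla     INTEGER PRIMARY KEY AUTOINCREMENT,
    LocoNumber       TEXT NOT NULL,
    IShoc            TEXT COLLATE NOCASE DEFAULT 'N',  -- mimics latin1_swedish_ci
    CurrentCombiners TEXT,
    From_RunPoint    TEXT NOT NULL,
    NEXT_RunPoint    TEXT NOT NULL
);
CREATE INDEX loconumber_idx ON locojourney (LocoNumber);
-- Covering index suggested in the first answer
CREATE INDEX ishoc_cover_idx
    ON locojourney (IShoc, JourneySerla, CurrentCombiners, From_RunPoint, NEXT_RunPoint);
"""

# WHERE ... IN form from the first answer
IN_FORM = """
WITH LocomotiveLastRun AS (
    SELECT loconumber, MAX(journeyserla) AS lastrunid
    FROM locojourney GROUP BY loconumber)
SELECT CurrentCombiners, JourneySerla, From_RunPoint, NEXT_RunPoint
FROM locojourney
WHERE journeyserla IN (SELECT lastrunid FROM LocomotiveLastRun)
  AND ishoc = 'n'
ORDER BY JourneySerla
"""

# Derived-table JOIN form from the second answer
JOIN_FORM = """
SELECT lj.CurrentCombiners, lj.JourneySerla, lj.From_RunPoint, lj.NEXT_RunPoint
FROM (SELECT MAX(journeyserla) AS lastrunid
      FROM locojourney GROUP BY loconumber) AS llr
JOIN locojourney AS lj ON llr.lastrunid = lj.journeyserla
WHERE lj.ishoc = 'n'
ORDER BY lj.JourneySerla
"""


def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO locojourney (LocoNumber, IShoc, CurrentCombiners, From_RunPoint, NEXT_RunPoint)"
        " VALUES (?, ?, ?, ?, ?)",
        [("L1", "N", "c1", "A", "B"),
         ("L2", "N", "c2", "B", "C"),
         ("L1", "N", "c3", "B", "C"),
         ("L2", "Y", "c4", "C", "D"),  # L2's last run has IShoc = 'Y', so it is filtered out
         ("L3", "N", "c5", "D", "E")],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(IN_FORM).fetchall())
    print(conn.execute(JOIN_FORM).fetchall())
```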
I am trying to add Turkish names to my table, but when they are displayed I get ? instead of the Turkish characters. What am I missing? This is my table:
CREATE TABLE IF NOT EXISTS `offerings` (
`dep` varchar(5) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`grade` varchar(4) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`section` varchar(3) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`name` varchar(100) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`teacher` varchar(50) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`quota` varchar(2) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`lec1` varchar(35) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`lec2` varchar(35) CHARACTER SET utf16 COLLATE utf16_turkish_ci DEFAULT NULL,
`lec3` varchar(35) DEFAULT NULL,
`lec4` varchar(35) DEFAULT NULL,
`lec5` varchar(35) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf16;
As suggested by the answer I accepted, here is the solution to the problem for whoever googles this topic. Special thanks to everyone who contributed to solving my problem.
CREATE TABLE IF NOT EXISTS `offerings` (
`dep` varchar(5) NOT NULL,
`grade` varchar(4) COLLATE utf8_unicode_ci NOT NULL,
`section` varchar(3) COLLATE utf8_unicode_ci NOT NULL,
`name` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`teacher` varchar(50) COLLATE utf8_unicode_ci NOT NULL,
`quota` varchar(2) COLLATE utf8_unicode_ci,
`lec1` varchar(35) DEFAULT NULL,
`lec2` varchar(35) DEFAULT NULL,
`lec3` varchar(35) DEFAULT NULL,
`lec4` varchar(35) DEFAULT NULL,
`lec5` varchar(35) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
(Beginnings of an answer...)
Please don't use utf16; there is virtually no reason for such in a MySQL table.
So, assuming you switch to utf8, let's see if we can get rid of the ? problems.
utf8 needs to be established in about 4 places.
The column(s) in the database -- Use SHOW CREATE TABLE to verify that they are explicitly set to utf8, or defaulted from the table definition. (It is not enough to change the database default.)
The connection between the client and the server. See SET NAMES utf8.
The bytes your client sends must actually be utf8-encoded. (This is probably already the case.)
If you are displaying the text in a web page, check the <meta> tag.
What probably happened:
you had utf8-encoded data (good)
SET NAMES latin1 was in effect (default, but wrong)
the column was declared CHARACTER SET latin1 (default, but wrong)
Since the CHARACTER SET disagrees with what you have shown, the problem is possibly more complex. Please provide
SELECT col, HEX(col) FROM tbl WHERE ...
for some simple cell with Turkish characters. With this, I may be able to figure out what happened.
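When reading the HEX(col) output, it helps to know what each failure mode looks like. The sketch below computes the expected hex for a piece of text in three cases: stored correctly as utf8, passed through latin1 (Turkish letters such as ş, ğ, ı and İ do not exist there, so they become ?), and double-encoded (utf8 bytes mistaken for latin1 and encoded to utf8 again). It uses Python's cp1252 codec because MySQL's latin1 is documented as the Windows cp1252 character set. The function name is hypothetical.

```python
def hex_signatures(text: str) -> dict:
    """Expected HEX(col) output for `text` under three common scenarios.
    Note: Python's cp1252 codec rejects a few bytes (e.g. 0x81) that MySQL's latin1
    accepts, so the double-encoded case may raise for some non-Turkish text."""
    utf8 = text.encode("utf-8")
    return {
        # Correct: the column holds the real utf8 bytes
        "utf8": utf8.hex().upper(),
        # Through latin1: characters latin1 lacks become '?' (3F)
        "via_latin1": text.encode("cp1252", errors="replace").hex().upper(),
        # Double-encoded: utf8 bytes were read as latin1, then converted to utf8 again
        "double_encoded": utf8.decode("cp1252").encode("utf-8").hex().upper(),
    }


if __name__ == "__main__":
    for name, value in hex_signatures("ş").items():
        print(name, value)
```

If HEX(col) shows 3F where a Turkish letter should be, the character was already lost on the way in and has to be re-inserted; a doubled pattern points to a connection/column character set mismatch instead.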
Also, Reference notes on encodings.
VARCHARs are character strings, while NVARCHARs are Unicode character strings. NVARCHARs require more bits per character to store, but have a greater range. Try updating your data types. This should fix your problem.
EDIT This answer is wrong. The OP clearly asked for a MySQL solution, but the above applies only to SQL Server.
I get an error when I try to use
SELECT views, keywords, title, url, thumbnail,
MATCH(keywords,title) AGAINST ('%$search_value%') AS relevance
FROM straight
WHERE MATCH (keywords,title) AGAINST ('%$search_value%')
ORDER BY relevance DESC
This is because I don't have FULLTEXT search enabled, but I can't seem to enable it. When I run the SQL below:
ALTER TABLE straight ADD FULLTEXT(keywords, title)
I get this response:
MySQL returned an empty result set (i.e. zero rows). (Query took 3.8022 sec)
Then, when I run the first query again, I get the error
#1191 - Can't find FULLTEXT index matching the column list
I can't tell why it's not registering. Any help would be great.
Thanks!
Edit:
My table:
CREATE TABLE `straight` (
`url` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
`title` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
`keywords` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
`production` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
`categories` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
`views` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
`likes` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
`length` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
`thumbnail` varchar(200) COLLATE utf8_unicode_ci DEFAULT NULL,
`date` varchar(12) COLLATE utf8_unicode_ci DEFAULT NULL,
UNIQUE KEY `url` (`url`),
FULLTEXT KEY `url_2` (`url`,`title`,`keywords`,`production`,
`categories`,`views`,`likes`,`length`,`thumbnail`,`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
You need a FULLTEXT index that matches, exactly, the columns upon which you want to search. The FULLTEXT index you have has more columns than you need.
Try adding the one you mentioned.
ALTER TABLE straight ADD FULLTEXT(keywords, title)
Then look at the table definition and make sure it's there.
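The rule in this answer can be written as a small check. It assumes, as far as we know matches MySQL's behavior, that column order and letter case do not matter but the set of columns must be identical to that of some FULLTEXT index. The index dictionary is hand-copied from the SHOW CREATE TABLE output; an index added with ADD FULLTEXT(keywords, title) and no name is named after its first column, keywords. The function name is hypothetical.

```python
def find_matching_fulltext(match_columns, fulltext_indexes):
    """Return the name of a FULLTEXT index whose columns are exactly
    `match_columns` (ignoring order and case), or None if there is none."""
    wanted = sorted(c.lower() for c in match_columns)
    for name, columns in fulltext_indexes.items():
        if sorted(c.lower() for c in columns) == wanted:
            return name
    return None


if __name__ == "__main__":
    indexes = {
        "url_2": ["url", "title", "keywords", "production", "categories",
                  "views", "likes", "length", "thumbnail", "date"],
    }
    print(find_matching_fulltext(["keywords", "title"], indexes))  # None: error #1191
    indexes["keywords"] = ["keywords", "title"]  # after ADD FULLTEXT(keywords, title)
    print(find_matching_fulltext(["keywords", "title"], indexes))  # keywords
```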
Let's take an example hotels table:
CREATE TABLE `hotels` (
`HotelNo` varchar(4) character set latin1 NOT NULL default '0000',
`Hotel` varchar(80) character set latin1 NOT NULL default '',
`City` varchar(100) character set latin1 default NULL,
`CityFR` varchar(100) character set latin1 default NULL,
`Region` varchar(50) character set latin1 default NULL,
`RegionFR` varchar(100) character set latin1 default NULL,
`Country` varchar(50) character set latin1 default NULL,
`CountryFR` varchar(50) character set latin1 default NULL,
`HotelText` text character set latin1,
`HotelTextFR` text character set latin1,
`tagsforsearch` text character set latin1,
`tagsforsearchFR` text character set latin1,
PRIMARY KEY (`HotelNo`),
FULLTEXT KEY `fulltextHotelSearch` (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`,`HotelText`,`HotelTextFR`,`tagsforsearch`,`tagsforsearchFR`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
In this table, for example, there is only one hotel with Region = "Graubünden" (note the umlaut ü character).
Now I want the same search to match both of these phrases:
'graubunden' and
'graubünden'
This is simple with MySQL's built-in collations in regular searches, as follows:
SELECT *
FROM `hotels`
WHERE `Region` LIKE CONVERT(_utf8 '%graubunden%' USING latin1)
COLLATE latin1_german1_ci
This works fine for both 'graubunden' and 'graubünden', and I get the proper result. The problem is when we use MySQL full-text search.
What's wrong with this SQL statement?
SELECT
*
FROM
hotels
WHERE
MATCH (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`, `HotelText`, `HotelTextFR`, `tagsforsearch`, `tagsforsearchFR`)
AGAINST( CONVERT('+graubunden' USING latin1) COLLATE latin1_german1_ci IN BOOLEAN MODE)
ORDER BY Country ASC, Region ASC, City ASC
This doesn't return any result.
Any ideas where the problem lies?
When you define individual CHARACTER SETs for your columns, you override the default collation you set at table level.
Each of your columns has the default latin1 collation (which is latin1_swedish_ci). You can see it by running SHOW CREATE TABLE.
In FULLTEXT queries, indexed columns have a COERCIBILITY of 0; that is, the search string is converted to the collation used by the index, not vice versa.
You need to remove the CHARACTER SET definitions from your columns or explicitly set all columns to latin1_german1_ci:
CREATE TABLE `hotels` (
`HotelNo` varchar(4) NOT NULL default '0000',
`Hotel` varchar(80) NOT NULL default '',
`City` varchar(100) default NULL,
`CityFR` varchar(100) default NULL,
`Region` varchar(50) default NULL,
`RegionFR` varchar(100) default NULL,
`Country` varchar(50) default NULL,
`CountryFR` varchar(50) default NULL,
`HotelText` text,
`HotelTextFR` text,
`tagsforsearch` text,
`tagsforsearchFR` text,
PRIMARY KEY (`HotelNo`),
FULLTEXT KEY `fulltextHotelSearch` (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`,`HotelText`,`HotelTextFR`,`tagsforsearch`,`tagsforsearchFR`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci;
INSERT
INTO hotels (hotelText, HotelTextFR, tagsforsearch, tagsforsearchFR)
VALUES ('text', 'text', 'graubünden', 'tags');
SELECT *
FROM hotels
WHERE MATCH (`HotelNo`,`Hotel`,`City`,`CityFR`,`Region`,`RegionFR`,`Country`,`CountryFR`, `HotelText`, `HotelTextFR`, `tagsforsearch`, `tagsforsearchFR`)
AGAINST (CONVERT('+graubunden' USING latin1) COLLATE latin1_german1_ci IN BOOLEAN MODE)
ORDER BY
Country ASC, Region ASC, City ASC;
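To see why the collation decides whether 'graubunden' finds 'Graubünden', here is a toy model of the umlaut rules the MySQL manual lists for the two German latin1 collations: latin1_german1_ci (dictionary order) treats Ä, Ö, Ü as A, O, U and ß as s, while latin1_german2_ci (phone-book order) expands them to AE, OE, UE and ss. It models only those letters plus lower-casing, not the full collation, and the names are hypothetical. Under the german1 rules both spellings produce the same key, which is why the columns need to use latin1_german1_ci.

```python
GERMAN1 = {"ä": "a", "ö": "o", "ü": "u", "ß": "s"}      # latin1_german1_ci (dictionary)
GERMAN2 = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}  # latin1_german2_ci (phone book)


def collation_key(text: str, rules: dict) -> str:
    """Lower-case the text (str.lower keeps ß), then apply the umlaut substitutions."""
    return "".join(rules.get(ch, ch) for ch in text.lower())


if __name__ == "__main__":
    for word in ("Graubünden", "graubunden"):
        print(word, collation_key(word, GERMAN1), collation_key(word, GERMAN2))
```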