full text indexing only returns 2 rows - mysql

I have a table for food and hotels
like
CREATE TABLE `food_master` (
`id` int(6) unsigned NOT NULL auto_increment,
`caption` varchar(255) default NULL,
`category` varchar(10) default NULL,
`subcategory` varchar(10) default NULL,
`hotel` varchar(10) default NULL,
`description` text,
`status` varchar(10) default NULL,
`created_date` date default NULL,
`modified_date` date default NULL,
`chosen_mark` varchar(10) default 'no',
PRIMARY KEY (`id`),
FULLTEXT KEY `description` (`description`,`caption`)
) ENGINE=MyISAM AUTO_INCREMENT=15 DEFAULT CHARSET=latin1
And I have data in it. I use full text indexing in this table. I use the query
SELECT * FROM food_master am
WHERE MATCH(description, caption) AGAINST ('Chicken')
This query works fine when i have 2 'Chicken' in the field 'caption'. but when i put third one it doesnt return a row.

try with IN BOOLEAN MODE as
MySQL can perform boolean full-text searches using the IN BOOLEAN MODE
modifier. With this modifier, certain characters have special meaning
at the beginning or end of words in the search string.
SELECT * FROM food_master
WHERE MATCH(description, caption) AGAINST ('Chicken' IN BOOLEAN MODE)
Demo

Related

Long running Mysql Query on Indexes and sort by clause

I have a very long running MySql query. The query simply joins two tables which are very huge
bizevents - Nearly 34 Million rows
bizevents_actions - Nearly 17 million rows
Here is the query:
select
bizevent0_.id as id1_37_,
bizevent0_.json as json2_37_,
bizevent0_.account_id as account_3_37_,
bizevent0_.createdBy as createdB4_37_,
bizevent0_.createdOn as createdO5_37_,
bizevent0_.description as descript6_37_,
bizevent0_.iconCss as iconCss7_37_,
bizevent0_.modifiedBy as modified8_37_,
bizevent0_.modifiedOn as modified9_37_,
bizevent0_.name as name10_37_,
bizevent0_.version as version11_37_,
bizevent0_.fired as fired12_37_,
bizevent0_.preCreateFired as preCrea13_37_,
bizevent0_.entityRefClazz as entityR14_37_,
bizevent0_.entityRefIdAsStr as entityR15_37_,
bizevent0_.entityRefIdType as entityR16_37_,
bizevent0_.entityRefName as entityR17_37_,
bizevent0_.entityRefType as entityR18_37_,
bizevent0_.entityRefVersion as entityR19_37_
from
BizEvent bizevent0_
left outer join BizEvent_actions actions1_ on
bizevent0_.id = actions1_.BizEvent_id
where
bizevent0_.createdOn >= '1969-12-31 19:00:01.0'
and (actions1_.action <> 'SoftLock'
and actions1_.targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.targetRefIdAsStr = '1'
or actions1_.action <> 'SoftLock'
and actions1_.objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.objectRefIdAsStr = '1')
order by
bizevent0_.createdOn;
Below are the table definitions -- As you see i have defined the indexes well enough on these two tables on all the search columns plus the sort column. But still my queries are running for very very long time. Appreciate any more ideas either with respective indexing.
-- bizevent definition
CREATE TABLE `bizevent` (
`id` bigint(20) NOT NULL,
`json` longtext,
`account_id` int(11) DEFAULT NULL,
`createdBy` varchar(50) NOT NULL,
`createdon` datetime(3) DEFAULT NULL,
`description` varchar(255) DEFAULT NULL,
`iconCss` varchar(50) DEFAULT NULL,
`modifiedBy` varchar(50) NOT NULL,
`modifiedon` datetime(3) DEFAULT NULL,
`name` varchar(255) NOT NULL,
`version` int(11) NOT NULL,
`fired` bit(1) NOT NULL,
`preCreateFired` bit(1) NOT NULL,
`entityRefClazz` varchar(255) DEFAULT NULL,
`entityRefIdAsStr` varchar(255) DEFAULT NULL,
`entityRefIdType` varchar(25) DEFAULT NULL,
`entityRefName` varchar(255) DEFAULT NULL,
`entityRefType` varchar(50) DEFAULT NULL,
`entityRefVersion` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `IDXk9kxuuprilygwfwddr67xt1pw` (`createdon`),
KEY `IDXsf3ufmeg5t9ok7qkypppuey7y` (`entityRefIdAsStr`),
KEY `IDX5bxv4g72wxmjqshb770lvjcto` (`entityRefClazz`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
-- bizevent_actions definition
CREATE TABLE `bizevent_actions` (
`BizEvent_id` bigint(20) NOT NULL,
`action` varchar(255) DEFAULT NULL,
`objectBizType` varchar(255) DEFAULT NULL,
`objectName` varchar(255) DEFAULT NULL,
`objectRefClazz` varchar(255) DEFAULT NULL,
`objectRefIdAsStr` varchar(255) DEFAULT NULL,
`objectRefIdType` int(11) DEFAULT NULL,
`objectRefVersion` int(11) DEFAULT NULL,
`targetBizType` varchar(255) DEFAULT NULL,
`targetName` varchar(255) DEFAULT NULL,
`targetRefClazz` varchar(255) DEFAULT NULL,
`targetRefIdAsStr` varchar(255) DEFAULT NULL,
`targetRefIdType` int(11) DEFAULT NULL,
`targetRefVersion` int(11) DEFAULT NULL,
`embedJson` longtext,
`actions_ORDER` int(11) NOT NULL,
PRIMARY KEY (`BizEvent_id`,`actions_ORDER`),
KEY `IDXa21hhagjogn3lar1bn5obl48gll` (`action`),
KEY `IDX7agsatk8u8qvtj37vhotja0ce77` (`targetRefClazz`),
KEY `IDXa7tktl678kqu3tk8mmkt1mo8lbo` (`targetRefIdAsStr`),
KEY `IDXa22eevu7m820jeb2uekkt42pqeu` (`objectRefClazz`),
KEY `IDXa33ba772tpkl9ig8ptkfhk18ig6` (`objectRefIdAsStr`),
CONSTRAINT `FKr9qjs61id11n48tdn1cdp3wot` FOREIGN KEY (`BizEvent_id`) REFERENCES `bizevent` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;>
By the way we are using Amazon RDS 5.7.33 MySql version. 16 GB RAM and 4 vCPU.
I also did a Explain Extended on the query and below is what it shows. Appreciate any help.
Initially the search of the bizevent_actions didn;t have the indexes defined. I have defined the indexes for them and tried the query but of no use.
One technique that worked for me in a similar situation was abandoning the idea of JOIN completely and switching to queries by PK. More detailed: find out which table in join would give less rows on average if you use only that table and related filter to query; get the primary keys from that table and then query the other one using WHERE pk IN ().
In your case one example would be:
SELECT
bizevent0_.id as id1_37_,
bizevent0_.json as json2_37_,
bizevent0_.account_id as account_3_37_,
...
FROM BizEvent bizevent0_
WHERE
bizevent0_.createdOn >= '1969-12-31 19:00:01.0'
AND bizevent0_.id IN (
SELECT BizEvent_id
FROM BizEvent_actions actions1_
WHERE
actions1_.action <> 'SoftLock'
and actions1_.targetRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.targetRefIdAsStr = '1'
or actions1_.action <> 'SoftLock'
and actions1_.objectRefClazz = 'com.biznuvo.core.orm.domain.org.EmployeeGroup'
and actions1_.objectRefIdAsStr = '1')
ORDER BY
bizevent0_.createdOn;
This assumes that you're not actually willing to select 33+ Mio rows from BizEvent though - your code with LEFT OUTER JOIN would have done exactly this.

Mysql JOIN query apparently slow

I have 2 tables. The first, called stazioni, where I store live weather data from some weather station, and the second called archivio2, where are stored archived day data. The two tables have in common the ID station data (ID on stazioni, IDStazione on archvio2).
stazioni (1,743 rows)
CREATE TABLE `stazioni` (
`ID` int(10) NOT NULL,
`user` varchar(100) NOT NULL,
`nome` varchar(100) NOT NULL,
`email` varchar(50) NOT NULL,
`localita` varchar(100) NOT NULL,
`provincia` varchar(50) NOT NULL,
`regione` varchar(50) NOT NULL,
`altitudine` int(10) NOT NULL,
`stazione` varchar(100) NOT NULL,
`schermo` varchar(50) NOT NULL,
`installazione` varchar(50) NOT NULL,
`ubicazione` varchar(50) NOT NULL,
`immagine` varchar(100) NOT NULL,
`lat` double NOT NULL,
`longi` double NOT NULL,
`file` varchar(255) NOT NULL,
`url` varchar(255) NOT NULL,
`temperatura` decimal(10,1) DEFAULT NULL,
`umidita` decimal(10,1) DEFAULT NULL,
`pressione` decimal(10,1) DEFAULT NULL,
`vento` decimal(10,1) DEFAULT NULL,
`vento_direzione` decimal(10,1) DEFAULT NULL,
`raffica` decimal(10,1) DEFAULT NULL,
`pioggia` decimal(10,1) DEFAULT NULL,
`rate` decimal(10,1) DEFAULT NULL,
`minima` decimal(10,1) DEFAULT NULL,
`massima` decimal(10,1) DEFAULT NULL,
`orario` varchar(16) DEFAULT NULL,
`online` int(1) NOT NULL DEFAULT '0',
`tipo` int(1) NOT NULL DEFAULT '0',
`webcam` varchar(255) DEFAULT NULL,
`webcam2` varchar(255) DEFAULT NULL,
`condizioni` varchar(255) DEFAULT NULL,
`Data2` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
archivio2 (2,127,347 rows)
CREATE TABLE `archivio2` (
`ID` int(10) NOT NULL,
`IDStazione` int(4) NOT NULL DEFAULT '0',
`localita` varchar(100) NOT NULL,
`temp_media` decimal(10,1) DEFAULT NULL,
`temp_minima` decimal(10,1) DEFAULT NULL,
`temp_massima` decimal(10,1) DEFAULT NULL,
`pioggia` decimal(10,1) DEFAULT NULL,
`pressione` decimal(10,1) DEFAULT NULL,
`vento` decimal(10,1) DEFAULT NULL,
`raffica` decimal(10,1) DEFAULT NULL,
`records` int(10) DEFAULT NULL,
`Data2` datetime DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
The indexes that I set
-- Indexes for table `archivio2`
--
ALTER TABLE `archivio2`
ADD PRIMARY KEY (`ID`),
ADD KEY `IDStazione` (`IDStazione`),
ADD KEY `Data2` (`Data2`);
-- Indexes for table `stazioni`
--
ALTER TABLE `stazioni`
ADD PRIMARY KEY (`ID`),
ADD KEY `Tipo` (`Tipo`);
ALTER TABLE `stazioni` ADD FULLTEXT KEY `localita` (`localita`);
On a map, I call by a calendar the date to search data on archive2 table, by this INNER JOIN query (I put an example date):
SELECT *, c.pioggia AS rain, c.raffica AS raff, c.vento AS wind, c.pressione AS press
FROM stazioni as o
INNER JOIN archivio2 as c ON o.ID = c.IDStazione
WHERE c.Data2 LIKE '2019-01-01%'
All works fine, but the time needed to show result are really slow (4/5 seconds), even if the query execution time seems to be ok (about 0.5s/1.0s).
I tried to execute the query on PHPMyadmin, and the results are the same. Execution time quickly, but time to show result extremely slow.
EXPLAIN query result
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE o ALL PRIMARY,ID NULL NULL NULL 1743 NULL
1 SIMPLE c ref IDStazione,Data2 IDStazione 4 sccavzuq_rete.o.ID 1141 Using where
UPDATE: the query goes fine if I remove the index from 'IDStazione'. But in this way I lost all advantages and speed on other queries... why only that query become slow if I put index on that field?
In your WHERE clause
WHERE c.Data2 LIKE '2019-01-01%'
the value of Data2 must be casted to a string. No index can be used for that condition.
Change it to
WHERE c.Data2 >= '2019-01-01' AND c.Data2 < '2019-01-01' + INTERVAL 1 DAY
This way the engine should be able to use the index on (Data2).
Now check the EXPLAIN result. I would expect, that the table order is swapped and the key column will show Data2 (for c) and ID (for o).
(Fixing the DATE is the main performance solution; here is a less critical issue.)
The tables are much bigger than necessary. Size impacts disk space and, to some extent, speed.
You have 1743 stations, yet the datatype is a 32-bit (4-byte) number (INT). SMALLINT UNSIGNED would allow for 64K stations and use only 2 bytes.
Does it get really, really, hot there? Like 999999999.9 degrees? DECIMAL(10.1) takes 5 bytes; DECIMAL(4,1) takes only 3 and allows up to 999.9 degrees. DECIMAL(3,1) has a max of 99.9 and takes only 2 bytes.
What is "localita varchar(100)" doing in the big table? Seems like you could JOIN to the stations table when you need it? Removing that might cut the table size in half.

How to optimize mysql query even it already used index

query is simple, as below:
select count(1) from ec_account a join ec_card b on a.id = b.AccountId
there are 2.5 million rows in either ec_account and ec_card.(InnoDB)
here is the execution plan:
execution plan
as you see,
it already added index and used it, but the query still costed almost 60 seconds, is there any way could optimize it except changing database(mariadb has no such choke point as far as i know).
here is table DDL,ec_ccount:
CREATE TABLE `ec_account` (
`Id` varchar(64) NOT NULL,
`AccountType` varchar(32) NOT NULL,
`Name` varchar(32) NOT NULL,
`Status` tinyint(3) unsigned NOT NULL,
`IDCardType` varchar(32) DEFAULT NULL,
`IDCardNo` varchar(64) DEFAULT NULL,
`Password` varchar(256) DEFAULT NULL,
`PasswordHalt` varchar(128) DEFAULT NULL,
`Sex` varchar(8) DEFAULT NULL,
`BirthDay` datetime NOT NULL,
`Mobile` varchar(16) DEFAULT NULL,
`Address` varchar(64) DEFAULT NULL,
`Linkman` varchar(32) DEFAULT NULL,
`LinkmanRelation` varchar(16) DEFAULT NULL,
`LinkmanTel` varchar(16) DEFAULT NULL,
`Remark` varchar(128) DEFAULT NULL,
`Nationality` varchar(32) DEFAULT NULL,
`Nation` varchar(32) DEFAULT NULL,
`MaritalStatus` varchar(8) DEFAULT NULL,
`NativePlace` varchar(64) DEFAULT NULL,
`Occupation` varchar(32) DEFAULT NULL,
`BloodType` varchar(8) DEFAULT NULL,
`Education` varchar(8) DEFAULT NULL,
`LinkmanAddress` varchar(64) DEFAULT NULL,
`HomeAddress` varchar(128) DEFAULT NULL,
`Email` varchar(64) DEFAULT NULL,
`CompanyName` varchar(64) DEFAULT NULL,
`CompanyAddress` varchar(128) DEFAULT NULL,
`CompanyTel` varchar(16) DEFAULT NULL,
`Creator` char(36) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`CreateTime` datetime NOT NULL,
`LastModifier` char(36) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL,
`LastModifyTime` datetime DEFAULT NULL,
`Avatar` longblob,
PRIMARY KEY (`Id`),
KEY `IX_Name` (`Name`) USING HASH,
KEY `Idx_IDCard_Account` (`IDCardType`,`IDCardNo`) USING HASH,
KEY `Idx_Mobile` (`Mobile`) USING HASH,
KEY `Idx_CreateTime` (`CreateTime`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
and ec_card :
CREATE TABLE `ec_card` (
`Id` char(36) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '',
`AccountId` varchar(64) NOT NULL,
`CardType` varchar(32) NOT NULL,
`CardNo` varchar(32) NOT NULL,
`Status` tinyint(3) unsigned NOT NULL,
`IsPasswordAuth` tinyint(1) NOT NULL,
PRIMARY KEY (`Id`),
UNIQUE KEY `Idx_Unique_AccountId_CardType` (`AccountId`,`CardType`) USING HASH,
UNIQUE KEY `Idx_Unique_CardType_CardNo` (`CardType`,`CardNo`) USING HASH,
KEY `Idx_Uniques_AccountId` (`AccountId`) USING BTREE,
CONSTRAINT `FK_ec_card_ec_account_AccountId` FOREIGN KEY (`AccountId`) REFERENCES `ec_account` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Not without fundamentally changing the query.
There are no conditions on your query! It selects all 2.5 million rows from ec_card, as well as every matching row from ec_account. Reading all this data from disk and sending it over the network is the bottleneck; there is no way to change that without changing what the query does.
Here is a workaround for you. I think it would run much faster, and get the same result.
Calculate the total count of ec_account:
SELECT count(1) AS total_count FROM ec_account;
Calculate the amount of records those existed in ec_account but not existed in ec_card:
SELECT count(1) AS missing_count
FROM ec_account a LEFT JOIN ec_card b on a.id = b.AccountId
WHERE b.AccountId IS NULL;
Matched count = total_count - missing_count
The core problem here is that you combined two large table together, it requires a lot of memory and it apparently needs a lot of time to finish.
try it using correlated subquery. This might help:
select count(1) from ec_account a where exists (select * from ec_card b
where b.AccountId=a.id)
Also, other than indexing following strategies generally help:
- Denormalization
- Caching results
- Using a NoSQL database

Mysql word boundary for diacritic strings

I am facing an issue to match words with word boundaries in diacritic strings
CREATE TABLE `authors` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`first_name` varchar(250) DEFAULT NULL,
`last_name` varchar(250) DEFAULT NULL,
`initials` varchar(250) DEFAULT NULL,
`affiliation_available` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `unique_names` (`first_name`,`last_name`,`initials`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
INSERT INTO authors VALUES (1,'Juan José','Palacios Gutiérrez','JJ',1);
I want to get the word boundaries for diacritic strings also
select * from authors where ( preg_rlike('/\bJosé\b/i',CONVERT(authors.first_name USING utf8)));
Which failes, then I tried
select * from authors where ( preg_rlike('/(?<!\w)José(?!\w)/i',CONVERT(authors.first_name USING utf8)));
It is working. But
select * from authors where ( preg_rlike('/(?<!\w)Jos(?!\w)/i',CONVERT(authors.first_name USING utf8)));
This query should return false ideally.Because there is no words with Jos.But here it returns the wrong data
Can you please help me

MySQL - ordering by column with many different values in big table

Probably through poor database design, the following really simple query is taking ~1.5 minutes to run.
SELECT s.title, t.name AS team_name
FROM stories AS s
JOIN teams AS t ON s.team_id = t.id
WHERE s.pubdate >= "1970-01-01 00:00"
ORDER BY s.hits /* <-- here's the problem */
LIMIT 3 OFFSET 0
The problem is the stories table is fairly big, with ~1.5m rows, and there's a ton of unique values for hits (this column logs the hits to each story.)
Take out the order clause and it resolves almost instantly.
Question: what can I do to optimise for queries like this? Presumably I shouldn't apply an index to hits since direct no look-ups take place on that column.
[UPDATE]
SHOW CREATE TABLE for all tables concerned:
CREATE TABLE stories (
`id` varchar(11) NOT NULL,
`link` text NOT NULL,
`title` varchar(255) CHARACTER SET utf8 NOT NULL,
`description` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`pubdate` datetime NOT NULL,
`source_id` varchar(11) NOT NULL,
`team_id` varchar(11) NOT NULL,
`hits` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `Unique combo (title + date)` (`title`,`pubdate`),
KEY `team (FK)` (`team_id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
CREATE TABLE teams (
`id` varchar(11) NOT NULL,
`is_live` enum('y') DEFAULT NULL,
`name` varchar(50) NOT NULL,
`short_name` varchar(12) DEFAULT NULL,
`server` varchar(11) DEFAULT NULL,
`url_token` varchar(255) NOT NULL,
`league` varchar(11) NOT NULL,
`away_game_id` varchar(255) DEFAULT NULL,
`digest_list_id` varchar(25) DEFAULT NULL,
`twitter_handle` varchar(255) DEFAULT NULL,
`no_official_news` enum('y') DEFAULT NULL,
`alt_names` varchar(255) DEFAULT NULL,
`no_use_nickname` enum('y') DEFAULT NULL,
`official_hashtag` varchar(30) DEFAULT NULL,
`merge_news_and_fans` enum('y') DEFAULT NULL,
`colour_1` varchar(6) NOT NULL,
`colour_2` varchar(6) DEFAULT NULL,
`colour_3` varchar(6) DEFAULT NULL,
`link_colour_modifier` float DEFAULT NULL,
`alt_link_colour_modifier` float DEFAULT NULL,
`title_shade` enum('dark','light') NOT NULL,
`shirt_style` enum('vert_stripes','horiz_stripes','vert_stripes_thin','horiz_stripes_thin','vert_split','horiz_split') DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `URL token` (`url_token`),
KEY `league (FK)` (`league`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Consider removing the filter on pubdate if the user does not need it. It confuses the optimizer.
INDEX(hits, pubdate, title)
will probably help the query the most. It is "covering".
The reason why removing ORDER BY runs fast: Without it, it gives you any 3 rows. With it, and without a useful index, it needs to sort the 1.5M rows to discover the 3 with the least number of hits.
Perhaps you wanted ORDER BY s.hits DESC? -- to get those with the most hits.