MySQL fulltext index with AND condition across multiple tables is slow - mysql

I have two huge tables (55M rows) with the following structure:
CREATE TABLE `chapters` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`toc` varchar(5000) COLLATE utf8mb4_unicode_ci NOT NULL,
`author` varchar(5000) COLLATE utf8mb4_unicode_ci NOT NULL,
`ari_id` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
KEY `ari_id` (`ari_id`),
FULLTEXT KEY `toc` (`toc`),
FULLTEXT KEY `author` (`author`)
) ENGINE=InnoDB AUTO_INCREMENT=52251463 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
CREATE TABLE `books` (
`ID` int(15) unsigned NOT NULL AUTO_INCREMENT,
`Title` varchar(2000) COLLATE utf8mb4_unicode_ci DEFAULT '',
`Author` varchar(2000) COLLATE utf8mb4_unicode_ci DEFAULT '',
`isOpenAccess` tinyint(1) NOT NULL,
`ari_id` bigint(20) NOT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `ari_id` (`ari_id`),
FULLTEXT KEY `Title` (`Title`),
FULLTEXT KEY `Author` (`Author`)
) ENGINE=InnoDB AUTO_INCREMENT=2627161 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
I am using the following query for searching:
SELECT b.ari_id, b.Title, b.Author, t.toc, t.author
FROM books b
INNER JOIN chapters t
ON b.ari_id = t.ari_id
WHERE MATCH(t.toc) AGAINST('power*' IN BOOLEAN MODE)
AND b.isOpenAccess = 1
LIMIT 300
It is returning the results in about 12 seconds. Is there any chance that I can speed up the response time?
Second, when I try to search using two fulltext indexes combined with the AND operator, it takes forever to respond (146 seconds). The query I am running is as follows:
SELECT toc, author
FROM chapters
WHERE MATCH(toc) AGAINST('high*' IN BOOLEAN MODE)
AND MATCH(author) AGAINST('max*' IN BOOLEAN MODE)
LIMIT 300

In books, ari_id could be the PRIMARY KEY and you could get rid of id. (This may or may not help performance.)
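A sketch of that change, if you decide to try it (test on a copy first; this assumes nothing else depends on books.ID):
ALTER TABLE books
  DROP PRIMARY KEY,          -- remove the surrogate key
  DROP COLUMN ID,            -- the AUTO_INCREMENT column goes with it
  DROP KEY ari_id,           -- the separate UNIQUE key becomes redundant
  ADD PRIMARY KEY (ari_id);  -- ari_id is already unique and NOT NULL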
MySQL likes to run the FULLTEXT part of the WHERE first, then AND with the other tests.
This:
WHERE MATCH(toc) AGAINST('high*' IN BOOLEAN MODE)
AND MATCH(author) AGAINST('max*' IN BOOLEAN MODE)
can be sped up with FULLTEXT(toc, author) and
WHERE MATCH(toc, author) AGAINST('+high* +max*' IN BOOLEAN MODE)
But it will find extra rows. (E.g., the toc has both high and max, but the author has neither.)
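A minimal sketch of that idea, with a cheap per-column re-check to weed out the extra rows (the LIKE tests are only a crude approximation of the word-prefix semantics, so treat this as a starting point):
ALTER TABLE chapters ADD FULLTEXT KEY toc_author (toc, author);
SELECT toc, author
FROM chapters
WHERE MATCH(toc, author) AGAINST('+high* +max*' IN BOOLEAN MODE)  -- one index-driven filter
  AND toc LIKE '%high%'       -- re-check each column cheaply
  AND author LIKE '%max%'     -- (substring, not word-prefix, match)
LIMIT 300;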
A similar trick is not possible when doing FT queries against each of two JOINed tables.
OTOH, a 3rd table containing all of the searchable columns, plus ari_id, would let you combine the tests into a single MATCH, then follow up with further refinement.
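A sketch of that third table; the name and exact layout are illustrative, and it would have to be kept in sync with books and chapters (e.g. by triggers or by the loading job):
CREATE TABLE search_all (
  chapter_id int unsigned NOT NULL,     -- chapters.id
  ari_id bigint NOT NULL,               -- join key back to books and chapters
  title varchar(2000),                  -- copy of books.Title
  book_author varchar(2000),            -- copy of books.Author
  toc varchar(5000),                    -- copy of chapters.toc
  chapter_author varchar(5000),         -- copy of chapters.author
  PRIMARY KEY (chapter_id),
  KEY (ari_id),
  FULLTEXT KEY ft_all (title, book_author, toc, chapter_author)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
-- One MATCH does the index work; join back for the remaining tests and columns.
SELECT b.ari_id, b.Title, b.Author, c.toc, c.author
FROM search_all s
JOIN books b ON b.ari_id = s.ari_id
JOIN chapters c ON c.id = s.chapter_id
WHERE MATCH(s.title, s.book_author, s.toc, s.chapter_author)
      AGAINST('+power*' IN BOOLEAN MODE)
  AND b.isOpenAccess = 1
LIMIT 300;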

Related

MySQL Partitioning Query Performance

I have created partitions on the pricing table. Below is the ALTER statement:
ALTER TABLE `price_tbl`
PARTITION BY HASH(man_code)
PARTITIONS 87;
One partition consists of 435,510 records; the total number of records in price_tbl is 6 million.
EXPLAIN shows that only one partition is used for the query. Still, the query takes 3-4 seconds to execute. Below is the query:
EXPLAIN SELECT vrimg.image_cap_id,vm.man_name,vr.range_code,vr.range_name,vr.range_url, MIN(`finance_rental`) AS from_price, vd.der_id AS vehicle_id FROM `range_tbl` vr
LEFT JOIN `image_tbl` vrimg ON vr.man_code = vrimg.man_code AND vr.type_id = vrimg.type_id AND vr.range_code = vrimg.range_code
LEFT JOIN `manufacturer_tbl` vm ON vr.man_code = vm.man_code AND vr.type_id = vm.type_id
LEFT JOIN `derivative_tbl` vd ON vd.man_code=vm.man_code AND vd.type_id = vr.type_id AND vd.range_code=vr.range_code
LEFT JOIN `price_tbl` vp ON vp.vehicle_id = vd.der_id AND vd.type_id = vp.type_id AND vp.product_type_id=1 AND vp.maintenance_flag='N' AND vp.man_code=164
AND vp.initial_rentals_id =(SELECT rental_id FROM `rentals_tbl` WHERE rental_months='9')
AND vp.annual_mileage_id =(SELECT annual_mileage_id FROM `mileage_tbl` WHERE annual_mileage='8000')
WHERE vr.type_id = 1 AND vm.man_url = 'audi' AND vd.type_id IS NOT NULL GROUP BY vd.der_id
Result of EXPLAIN.
Same query without partitioning takes 3-4 sec.
Query with partitioning takes 2-3 sec.
How can we increase query performance? It is still too slow.
The CREATE TABLE structures are attached.
price table - This consists of 6 million records
CREATE TABLE `price_tbl` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`lender_id` bigint(20) DEFAULT NULL,
`type_id` bigint(20) NOT NULL,
`man_code` bigint(20) NOT NULL,
`vehicle_id` bigint(20) DEFAULT NULL,
`product_type_id` bigint(20) DEFAULT NULL,
`initial_rentals_id` bigint(20) DEFAULT NULL,
`term_id` bigint(20) DEFAULT NULL,
`annual_mileage_id` bigint(20) DEFAULT NULL,
`ref` varchar(255) DEFAULT NULL,
`maintenance_flag` enum('Y','N') DEFAULT NULL,
`finance_rental` decimal(20,2) DEFAULT NULL,
`monthly_rental` decimal(20,2) DEFAULT NULL,
`maintenance_payment` decimal(20,2) DEFAULT NULL,
`initial_payment` decimal(20,2) DEFAULT NULL,
`doc_fee` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`,`type_id`,`man_code`),
KEY `type_id` (`type_id`),
KEY `vehicle_id` (`vehicle_id`),
KEY `term_id` (`term_id`),
KEY `product_type_id` (`product_type_id`),
KEY `finance_rental` (`finance_rental`),
KEY `type_id_2` (`type_id`,`vehicle_id`),
KEY `maintenanace_idx` (`maintenance_flag`),
KEY `lender_idx` (`lender_id`),
KEY `initial_idx` (`initial_rentals_id`),
KEY `man_code_idx` (`man_code`)
) ENGINE=InnoDB AUTO_INCREMENT=5830708 DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (man_code)
PARTITIONS 87 */
derivative table - This consists of 18k records.
CREATE TABLE `derivative_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`der_cap_code` varchar(20) DEFAULT NULL,
`der_id` bigint(20) DEFAULT NULL,
`body_style_id` bigint(20) DEFAULT NULL,
`fuel_type_id` bigint(20) DEFAULT NULL,
`trans_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`model_code` bigint(20) DEFAULT NULL,
`der_name` varchar(255) DEFAULT NULL,
`der_url` varchar(255) DEFAULT NULL,
`der_intro_year` date DEFAULT NULL,
`der_disc_year` date DEFAULT NULL,
`der_last_spec_date` date DEFAULT NULL,
KEY `der_id` (`der_id`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`),
KEY `range_code` (`range_code`),
KEY `model_code` (`model_code`),
KEY `body_idx` (`body_style_id`),
KEY `capcodeidx` (`der_cap_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
range table - This consists of 1k records
CREATE TABLE `range_tbl` (
`type_id` bigint(20) DEFAULT NULL,
`man_code` bigint(20) DEFAULT NULL,
`range_code` bigint(20) DEFAULT NULL,
`range_name` varchar(255) DEFAULT NULL,
`range_url` varchar(255) DEFAULT NULL,
KEY `range_code` (`range_code`),
KEY `type_id` (`type_id`),
KEY `man_code` (`man_code`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
PARTITION BY HASH is essentially useless if you are hoping for improved performance. BY RANGE is useful in a few use cases.
In most situations, improving the indexes does as much good as trying to use partitioning.
Some likely problems:
No explicit PRIMARY KEY on the InnoDB tables. Add a natural PK if applicable, else an AUTO_INCREMENT (see the sketch after this list).
No "composite" indexes -- they often provide a performance boost. Example: The LEFT JOIN between vr and vrimg involves 3 columns; a composite index on those 3 columns in the 'right' table will probably help performance.
Blind use of BIGINT when smaller datatypes would work. (This is an I/O issue when the table is big.)
Blind use of 255 in VARCHAR.
Consider whether most of the columns should be NOT NULL.
That query may be a victim of the "explode-implode" syndrome. This is where you do JOIN(s), which create a big intermediate table, followed by a GROUP BY to bring the row-count back down.
Don't use LEFT unless the 'right' table really is optional. (I see LEFT JOIN vd ... vd.type_id IS NOT NULL.)
Don't normalize "continuous" values (annual_mileage and rental_months). It is not really beneficial for "=" tests, and it severely hurts performance for "range" tests.
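For the explicit-PK point above, a sketch of the two usual options (assuming der_id really is unique and never NULL; otherwise fall back to a surrogate key):
-- Option 1: promote a natural key
ALTER TABLE derivative_tbl ADD PRIMARY KEY (der_id);
-- Option 2: add a surrogate AUTO_INCREMENT key
ALTER TABLE range_tbl
  ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;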
Same query without partitioning takes 3-4 sec. Query with partitioning takes 2-3 sec.
The indexes almost always need changing when switching between partitioning and non-partitioning. With the optimal indexes for each case, I predict that performance will be close to the same.
Indexes
These should help performance whether or not it is partitioned:
vm: (man_url)
vr: (man_code, type_id) -- either order
vd: (man_code, type_id, range_code, der_id)
-- `der_id` 4th, else in any order (covering)
vrimg: (man_code, type_id, range_code, image_cap_id)
-- `image_cap_id` 4th, else in any order (covering)
vp: (type_id, vehicle_id, product_type_id, maintenance_flag,
initial_rentals_id, annual_mileage_id, man_code)
-- any order (covering)
A "covering" index is an extra boost, in that it can do all the work just in the index's BTree, without touching the data's BTree.
Implement a bunch of what I recommend, then come back (in another Question) for further tweaking.
Usually the "partition key" should be last in a composite index.

Selecting columns messes up order of rows

I have 3 tables blog_articles, blog_tags and blog_articles_tags. Pretty basic stuff, a blog where articles can have tags.
CREATE TABLE `blog_articles` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(255) NOT NULL,
`body` text NOT NULL,
`datetime` datetime NOT NULL,
`author` int(10) unsigned DEFAULT NULL,
`published` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `author` (`author`),
FULLTEXT KEY `title` (`title`,`body`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;
CREATE TABLE `blog_articles_tags` (
`article` int(10) unsigned NOT NULL,
`tag` int(10) unsigned NOT NULL,
PRIMARY KEY (`article`,`tag`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `blog_tags` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(255) DEFAULT NULL,
`description` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8;
Here is a working example, selecting blog posts with their tags and author
But if I switch the sort from descending to ascending, I get messed-up results.
However, if I remove the columns of the blog_tags table from the list of columns to select, the order is correct.
There are two questions I would like to ask:
Why is the sequence of the rows altered by the columns that are selected?
How can I prevent this without modifying the SQL statement anywhere outside of the inner-most query?
I cannot modify the SQL statement because it is automatically generated, and I cannot determine (easily, if at all) what the sort will be, or whether any further columns added to the SELECT clause will alter the results even more.
Queries in SQL are unordered unless you specify an ORDER BY. Any ordering you happen to get is an artifact of the implementation and cannot be relied upon.
You have an ORDER BY in one of your virtual tables, but nothing for the outer query. Since it includes several joins you can't even count on MySQL coincidentally preserving the order of the virtual table.
There's little reason to put an ORDER BY on a sub-query unless you're doing something advanced like adding a row-number. So the whole sub-query can be dropped and just join on blog_articles. Then you can ORDER BY blog_articles.id.
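The generated statement isn't shown here, so this is only a sketch of the shape it could take, with the ORDER BY moved to the outer query (the published filter is illustrative):
SELECT a.id, a.title, a.author, t.name AS tag
FROM blog_articles a
LEFT JOIN blog_articles_tags at ON at.article = a.id
LEFT JOIN blog_tags t ON t.id = at.tag
WHERE a.published = 1
ORDER BY a.id DESC;  -- or ASC; only this outer ORDER BY determines the final row order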

MySQL fulltext selection performance

I have 1.5M rows in a table. Following is the table create code:
CREATE TABLE `jobs` (
`id` INT(8) NOT NULL AUTO_INCREMENT,
`job_id` VARCHAR(50) NOT NULL DEFAULT '',
`title` VARCHAR(255) NOT NULL DEFAULT '',
`company` VARCHAR(255) NOT NULL DEFAULT '',
`city` VARCHAR(50) NOT NULL DEFAULT '',
`state` VARCHAR(50) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE INDEX `job_id` (`job_id`),
FULLTEXT INDEX `search` (`title`, `company`, `city`, `state`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM
The query below takes about 0.5 seconds, which is very high.
SELECT id, title, company, state, city FROM `jobs` WHERE MATCH (title, company, state, city) AGAINST ('software engineer in san fransisco california') LIMIT 0,10
How can I decrease execution time and still provide relevant results? Any suggestions?
So far I have tried the following, but there is no improvement at all:
- Searching in a single field that contains the data of all 4 fields, but it did not matter.
- Using IN BOOLEAN MODE with >1 or >2, but then it gives me unrelated results.
- Repairing the table, increasing key_buffer_size from 16MB to 1GB, changing the table type to InnoDB, and changing the character set from utf8 to latin1.
- Setting ft_max_word_len=1 and ft_stopword_file='' instead of the default values.
- I have searched online for many hours, but no luck so far.
"Explain select..." output:
id  select_type  table  type      possible_keys  key     key_len  ref   rows  Extra
1   SIMPLE       jobs   fulltext  search         search  0        NULL  1     Using where
Edit: Thank you for your suggestions, but there is no improvement at all.

MySQL optimize count query

I've got a question about MySQL performance.
These are my tables:
(about 140,000 records)
CREATE TABLE IF NOT EXISTS `article` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`label` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`title` varchar(256) COLLATE utf8_unicode_ci NOT NULL,
`intro` text COLLATE utf8_unicode_ci NOT NULL,
`content` text COLLATE utf8_unicode_ci NOT NULL,
`date` int(11) NOT NULL,
`active` int(1) NOT NULL,
`language_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`indexed` int(1) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=132911 ;
(about 400,000 records)
CREATE TABLE IF NOT EXISTS `article_category` (
`article_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
RUNNING THIS COUNT QUERY:
SELECT SQL_NO_CACHE COUNT(id) as total
FROM (`article`)
LEFT JOIN `article_category` ON `article_category`.`article_id` = `article`.`id`
WHERE `article`.`language_id` = 1
AND `article_category`.`category_id` = '<catid>'
This query takes a lot of resources, so I am wondering how to optimize it.
After executing, it is being cached, so after the first run I am fine.
RUNNING THE EXPLAIN FUNCTION:
AFTER CREATING AN INDEX:
ALTER TABLE `article_category` ADD INDEX ( `article_id` , `category_id` ) ;
After adding indexes and changing LEFT JOIN to JOIN, the query runs a lot faster!
Thanks for these fast replies :)
QUERY I USE NOW (I removed the language_id because it was not that necessary):
SELECT COUNT(id) as total
FROM (`article`)
JOIN `article_category` ON `article_category`.`article_id` = `article`.`id`
AND `article_category`.`category_id` = '<catid>'
I've read something about forcing an index, but I think that's not necessary anymore because the tables are already indexed, right?
Thanks a lot!
Martijn
You haven't created the necessary indexes on the tables:
Table article_category - Create a compound index on (article_id, category_id)
Table article - Create a compound index on (id, language_id)
If this doesn't help, post the EXPLAIN output.
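In DDL form, that amounts to something like this (index names are illustrative):
ALTER TABLE article_category ADD INDEX idx_article_category (article_id, category_id);
ALTER TABLE article ADD INDEX idx_id_language (id, language_id);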
The columns used in a JOIN condition should have an index, so you need to index article_id.

optimize query (2 simple left joins)

SELECT fcat.id,fcat.title,fcat.description,
count(DISTINCT ftopic.id) as number_topics,
count(DISTINCT fpost.id) as number_posts FROM fcat
LEFT JOIN ftopic ON fcat.id=ftopic.cat_id
LEFT JOIN fpost ON ftopic.id=fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
Indexes on ftopic.cat_id, fpost.topic_id, fcat.ord
EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE fcat ALL PRIMARY NULL NULL NULL 11 Using temporary; Using filesort
1 SIMPLE ftopic ref PRIMARY,cat_id_2 cat_id_2 4 bloki.fcat.id 72
1 SIMPLE fpost ref topic_id_2 topic_id_2 4 bloki.ftopic.id 245
fcat - 11 rows,
ftopic - 1,106 rows,
fpost - 363,000 rows
The query takes 4.2 sec.
TABLES:
CREATE TABLE IF NOT EXISTS `fcat` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(250) collate utf8_unicode_ci NOT NULL,
`description` varchar(250) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`ord` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ord` (`ord`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=12 ;
CREATE TABLE IF NOT EXISTS `ftopic` (
`id` int(11) NOT NULL auto_increment,
`cat_id` int(11) NOT NULL,
`title` varchar(100) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`updated` timestamp NOT NULL default CURRENT_TIMESTAMP,
`lastname` varchar(200) collate utf8_unicode_ci NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`closed` tinyint(4) NOT NULL default '0',
`views` int(11) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `cat_id_2` (`cat_id`,`updated`,`visible`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1116 ;
CREATE TABLE IF NOT EXISTS `fpost` (
`id` int(11) NOT NULL auto_increment,
`topic_id` int(11) NOT NULL,
`pet_id` int(11) NOT NULL,
`content` text collate utf8_unicode_ci NOT NULL,
`imageName` varchar(300) collate utf8_unicode_ci NOT NULL,
`created` datetime NOT NULL,
`reply_id` int(11) NOT NULL,
`visible` tinyint(4) NOT NULL default '1',
`md5` varchar(100) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `md5` (`md5`),
KEY `topic_id_2` (`topic_id`,`created`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=390971 ;
Thanks,
hamlet
You need to create a key on both fcat.id and fcat.ord (see the sketch below).
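Something like this (the index name is illustrative):
ALTER TABLE fcat ADD INDEX idx_id_ord (id, ord);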
Bold rewrite
This code is not functionally identical, but...
Because you want to count distinct ftopic.id and fpost.id values, I'm going to be bold and suggest two INNER JOINs instead of LEFT JOINs.
Then, because the two IDs are auto-incrementing, they will no longer repeat, so you can drop the DISTINCT.
SELECT
fcat.id
, fcat.title
, fcat.description
, count(ftopic.id) as number_topics
, count(fpost.id) as number_posts
FROM fcat
INNER JOIN ftopic ON fcat.id = ftopic.cat_id
INNER JOIN fpost ON ftopic.id = fpost.topic_id
GROUP BY fcat.id
ORDER BY fcat.ord
LIMIT 100;
It depends on your data if this is what you are looking for, but I'm guessing it will be faster.
All your indexes seem to be in order though.
MySQL does not use indexes for small sample sizes!
Note that the EXPLAIN shows MySQL only has 11 rows to consider for fcat. This is not enough for MySQL to really start worrying about indexes, so it doesn't, because going to the index for such small row counts actually slows things down.
MySQL is trying to speed things up, so it chooses not to use the index. This confuses a lot of people because we are trained so hard to rely on indexes. Small sample sizes don't give good EXPLAINs!
Increase the size of the test data so MySQL has more rows to consider, and you should start seeing the index being used.
Common misconceptions about force index
Force index does not force MySQL to use an index as such.
It hints at MySQL to use a different index from the one it might naturally use and it pushes MySQL into using an index by setting a very high cost on a table scan.
(In your case MySQL is not using a table scan, so force index has no effect)
MySQL (like most other DBMSs on the planet) has a very strong urge to use indexes, so if it doesn't use any, that's because using no index at all is faster.
How does MySQL know which index to use
One of the parameters the query optimizer uses is the stored cardinality of the indexes.
Over time these values change... But studying the table takes time, so MySQL doesn't do that unless you tell it to.
Another parameter that affects index selection is the predicted disk-seek-times that MySQL expects to encounter when performing the query.
Tips to improve index usage
ANALYZE TABLE will instruct MySQL to re-evaluate the indexes and update its key distribution (cardinality); consider running it daily or weekly in a cron job (see the sketch after these tips).
SHOW INDEX FROM table will display the key distribution.
MyISAM tables and indexes fragment over time. Use OPTIMIZE TABLE to defragment the tables and recreate the indexes.
FORCE/USE/IGNORE INDEX limits the options MySQL's query optimizer has to perform your query. Only consider it on complex queries.
Time the effect of your meddling with indexes on a regular basis. A forced index that speeds up your query today might slow it down tomorrow because the underlying data has changed.
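For reference, the maintenance statements from those tips, applied to the tables above:
ANALYZE TABLE fcat, ftopic, fpost;  -- refresh the stored key distribution (cardinality)
SHOW INDEX FROM fpost;              -- inspect the cardinality of each index
OPTIMIZE TABLE fpost;               -- defragment the MyISAM data file and rebuild its indexes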