Huge innodb tables with SELECT performance issue - mysql

I have two huge InnoDB tables (pages: 40M+ rows, 30 GB+; stat: 45M+ rows, 10 GB+). I have a query that selects rows from the join of these two tables, and it used to take about a second to execute. Recently the exact same query has been taking more than 20 seconds (sometimes up to a few minutes) to complete. I suspected that, with lots of inserts and updates, the tables might need optimizing. I ran OPTIMIZE TABLE on the table using phpMyAdmin, but it made no difference. I've Googled a lot but couldn't find anything that helps in this situation.
The query I mentioned earlier looks like this:
SELECT `c`.`unique`, `c`.`pub`
FROM `pages` `c`
LEFT JOIN `stat` `s` ON `c`.`unique`=`s`.`unique`
WHERE `s`.`isc`='1'
AND `s`.`haa`='0'
AND (`pubID`='24')
ORDER BY `eid` ASC LIMIT 0, 10
These are the tables structure:
CREATE TABLE `pages` (
`eid` int(10) UNSIGNED NOT NULL,
`ti` text COLLATE utf8_persian_ci NOT NULL,
`fat` text COLLATE utf8_persian_ci NOT NULL,
`de` text COLLATE utf8_persian_ci NOT NULL,
`fad` text COLLATE utf8_persian_ci NOT NULL,
`pub` varchar(100) COLLATE utf8_persian_ci NOT NULL,
`pubID` int(10) UNSIGNED NOT NULL,
`pubn` text COLLATE utf8_persian_ci NOT NULL,
`unique` tinytext COLLATE utf8_persian_ci NOT NULL,
`pi` tinytext COLLATE utf8_persian_ci NOT NULL,
`kw` text COLLATE utf8_persian_ci NOT NULL,
`fak` text COLLATE utf8_persian_ci NOT NULL,
`te` text COLLATE utf8_persian_ci NOT NULL,
`fae` text COLLATE utf8_persian_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;
ALTER TABLE `pages`
ADD PRIMARY KEY (`eid`),
ADD UNIQUE KEY `UNIQ` (`unique`(128)),
ADD KEY `pub` (`pub`),
ADD KEY `unique` (`unique`(128)),
ADD KEY `pubID` (`pubID`) USING BTREE;
ALTER TABLE `pages` ADD FULLTEXT KEY `faT` (`fat`);
ALTER TABLE `pages` ADD FULLTEXT KEY `faA` (`fad`,`fae`);
ALTER TABLE `pages` ADD FULLTEXT KEY `faK` (`fak`);
ALTER TABLE `pages` ADD FULLTEXT KEY `pubn` (`pubn`);
ALTER TABLE `pages` ADD FULLTEXT KEY `faTAK` (`fat`,`fad`,`fak`,`fae`);
ALTER TABLE `pages` ADD FULLTEXT KEY `ab` (`de`,`te`);
ALTER TABLE `pages` ADD FULLTEXT KEY `Ti` (`ti`);
ALTER TABLE `pages` ADD FULLTEXT KEY `Kw` (`kw`);
ALTER TABLE `pages` ADD FULLTEXT KEY `TAK` (`ti`,`de`,`kw`,`te`);
ALTER TABLE `pages`
MODIFY `eid` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;
CREATE TABLE `stat` (
`sid` int(10) UNSIGNED NOT NULL,
`unique` tinytext COLLATE utf8_persian_ci NOT NULL,
`haa` tinyint(1) UNSIGNED NOT NULL,
`isc` tinyint(1) NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;
ALTER TABLE `stat`
ADD PRIMARY KEY (`sid`),
ADD UNIQUE KEY `Unique` (`unique`(128)),
ADD KEY `isc` (`isc`),
ADD KEY `haa` (`haa`);
ALTER TABLE `stat`
MODIFY `sid` int(10) UNSIGNED NOT NULL AUTO_INCREMENT;
The following query took only 0.0126 seconds with 38685601 total results, according to phpMyAdmin:
SELECT `sid` FROM `stat` `s` WHERE `s`.`isc`='1' AND `s`.`haa`='0'
and this one took 0.0005 seconds with 5159484 total results
SELECT `eid`, `unique`, `pubn`, `pi` FROM `pages` WHERE `pubID`='24'
Am I missing something? Can anybody help?

The slowdown is probably due to scanning so many rows, which is now more than can fit in the cache (the InnoDB buffer pool). So, let's try to improve the query.
Replace INDEX(pubID) with INDEX(pubID, eid) -- This may allow both the WHERE and ORDER BY to be handled by the index, thereby avoiding a sort.
Replace TINYTEXT with VARCHAR(255) or some smaller limit. This may speed up tmp tables.
Don't use a prefix index on eid -- it's an INT!
Don't say UNIQUE with a prefix -- UNIQUE(x(128)) only checks the uniqueness of the first 128 characters!
Once you change to VARCHAR(255) (or less), you can apply UNIQUE to the entire column.
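To make the prefix problem concrete, here is a small sketch using Python's standard sqlite3 module. SQLite has no prefix indexes, so a UNIQUE index on the expression substr(u, 1, 128) stands in for MySQL's UNIQUE KEY (`unique`(128)); the table and column names are hypothetical. It shows the practical consequence: two different values that share their first 128 characters are rejected as duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stat (sid INTEGER PRIMARY KEY, u TEXT NOT NULL)")
# Stand-in for MySQL's UNIQUE KEY (`unique`(128)): only the first 128 characters are compared.
conn.execute("CREATE UNIQUE INDEX uniq_prefix ON stat (substr(u, 1, 128))")

def try_insert(value):
    """Return True if the row was accepted, False if the prefix index rejected it."""
    try:
        conn.execute("INSERT INTO stat (u) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False

prefix = "x" * 128
first_ok = try_insert(prefix + "-first")
second_ok = try_insert(prefix + "-second")  # different value, same first 128 characters
third_ok = try_insert("y" + prefix)         # differs inside the first 128 characters
rows = conn.execute("SELECT COUNT(*) FROM stat").fetchone()[0]

print(first_ok, second_ok, third_ok, rows)  # True False True 2
```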
The biggest performance issue is filtering on two tables -- can you move the status flags into the main table?
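Change LEFT JOIN to JOIN. The WHERE clause tests columns of s, so rows without a match in stat are discarded anyway; a plain JOIN returns the same rows and leaves the optimizer free to choose which table to read first.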
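A quick way to check that the LEFT JOIN buys nothing here is to run both forms on toy data. This sketch uses Python's sqlite3 module in place of MySQL (the join semantics involved are standard SQL); the rows, and the column name uniq used instead of the reserved word unique, are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (eid INTEGER PRIMARY KEY, uniq TEXT, pubID INTEGER);
CREATE TABLE stat  (sid INTEGER PRIMARY KEY, uniq TEXT, isc INTEGER, haa INTEGER);
INSERT INTO pages VALUES (1, 'a', 24), (2, 'b', 24), (3, 'c', 24), (4, 'd', 7);
-- 'c' has no stat row at all, so a LEFT JOIN pairs it with NULLs.
INSERT INTO stat  VALUES (1, 'a', 1, 0), (2, 'b', 0, 0), (3, 'd', 1, 0);
""")

FILTER = " WHERE s.isc = 1 AND s.haa = 0 AND c.pubID = 24 ORDER BY c.eid"
left = conn.execute(
    "SELECT c.eid, c.uniq FROM pages c LEFT JOIN stat s ON c.uniq = s.uniq" + FILTER).fetchall()
inner = conn.execute(
    "SELECT c.eid, c.uniq FROM pages c JOIN stat s ON c.uniq = s.uniq" + FILTER).fetchall()

# Only a condition that accepts NULLs (such as s.sid IS NULL) ever sees the unmatched row.
unmatched = conn.execute(
    "SELECT c.eid FROM pages c LEFT JOIN stat s ON c.uniq = s.uniq WHERE s.sid IS NULL").fetchall()

print(left, inner, unmatched)  # [(1, 'a')] [(1, 'a')] [(3,)]
```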
What does unique look like? If it is a "UUID", that could further explain the trouble.
If that is a UUID of 36 characters, the string can be converted to a 16-byte column for further space savings (and speedup). Let's discuss this further if necessary.
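A sketch of that conversion, assuming the values really are canonical 36-character UUIDs (the question does not confirm this). Python's uuid module packs the text into the 16 raw bytes a BINARY(16) column would hold; on the MySQL side the usual equivalent is UNHEX(REPLACE(col, '-', '')), or UUID_TO_BIN() on MySQL 8.0.

```python
import uuid

def uuid_to_bin16(text):
    """Pack a canonical textual UUID into the 16 raw bytes a BINARY(16) column would store."""
    return uuid.UUID(text).bytes

def bin16_to_uuid(raw):
    """Turn 16 raw bytes back into the canonical lower-case text form."""
    return str(uuid.UUID(bytes=raw))

sample = "123e4567-e89b-12d3-a456-426614174000"  # made-up example value
packed = uuid_to_bin16(sample)
print(len(sample), len(packed))          # 36 16
print(bin16_to_uuid(packed) == sample)   # True
```

The packed bytes are simply the 32 hex digits without dashes, which is why UNHEX(REPLACE(...)) produces the same value.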
5 million results in 0.5ms is bogus -- it was fetching from the Query cache. Either turn off the QC or run with SELECT SQL_NO_CACHE...

+1 to @RickJames's answer; following it, I ran a test.
I would also recommend you do not use the name unique for a column name, because it's an SQL reserved word.
ALTER TABLE pages
CHANGE `unique` objectId VARCHAR(128) NOT NULL COMMENT 'Document Object Identifier',
DROP KEY pubId,
ADD KEY bktest1 (pubId, eid, objectId, pub);
ALTER TABLE stat
CHANGE `unique` objectId VARCHAR(128) NOT NULL COMMENT 'Document Object Identifier',
DROP KEY `unique`,
ADD UNIQUE KEY bktest2 (objectId, isc, haa);
mysql> explain SELECT `c`.`objectId`, `c`.`pub` FROM `pages` `c` JOIN `stat` `s` ON `c`.`objectId`=`s`.`objectId` WHERE `s`.`isc`='1' AND `s`.`haa`='0' AND (`pubID`='24') ORDER BY `eid` ASC LIMIT 0, 10;
+----+-------------+-------+------------+--------+-------------------------+---------+---------+-----------------------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-------------------------+---------+---------+-----------------------------+------+----------+--------------------------+
| 1 | SIMPLE | c | NULL | ref | unique,unique_2,bktest1 | bktest1 | 4 | const | 1 | 100.00 | Using where; Using index |
| 1 | SIMPLE | s | NULL | eq_ref | bktest2,haa,isc | bktest2 | 388 | test.c.objectId,const,const | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+-------------------------+---------+---------+-----------------------------+------+----------+--------------------------+
Because the multi-column indexes contain every column the query needs, they are covering indexes, and you see "Using index" in the EXPLAIN report.
It's important to put eid second in the bktest1 index, so you avoid a filesort.
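The "avoid a filesort" point can be checked without a 40M-row table. This sketch uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN, so the wording differs: SQLite reports a sort as "USE TEMP B-TREE FOR ORDER BY" and a covering index as "COVERING INDEX". The table is cut down to the columns the query touches, and eid is declared as an ordinary column rather than SQLite's rowid alias, because SQLite, like InnoDB, appends the row's key to every secondary index, and that could blur the comparison.

```python
import sqlite3

QUERY = "SELECT objectId, pub FROM pages WHERE pubID = 24 ORDER BY eid LIMIT 10"

def build(index_ddl):
    """Create an empty, cut-down pages table plus a single secondary index."""
    conn = sqlite3.connect(":memory:")
    # eid is a plain column here, so the index's implicit row key cannot supply eid order.
    conn.execute("""CREATE TABLE pages (
        eid INTEGER NOT NULL, pubID INTEGER NOT NULL,
        objectId TEXT NOT NULL, pub TEXT NOT NULL, ti TEXT)""")
    conn.execute(index_ddl)
    return conn

def plan(conn, sql):
    """Return the human-readable lines of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

single = plan(build("CREATE INDEX idx_pubID ON pages (pubID)"), QUERY)
composite = plan(build("CREATE INDEX bktest1 ON pages (pubID, eid, objectId, pub)"), QUERY)

for label, lines in (("INDEX(pubID)", single), ("bktest1", composite)):
    for line in lines:
        print(f"{label:13} {line}")
```

With only INDEX(pubID), SQLite needs a temporary B-tree for the ORDER BY (the counterpart of MySQL's "Using filesort"); with the bktest1 column order, matching rows come out of the index already sorted by eid and the table itself is never read.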
This is about the best you can hope for with this query without denormalizing or partitioning the tables.
Next you should make sure your buffer pool is large enough to hold all the requested data.

Related

MySQL query with GROUP BY on a full text index is very slow

I'm building an online tool for collecting feedback.
Right now I'm building a visual summary of all answers per question, with each answer's occurrence count next to it. I use this query:
SELECT
feedback_answer,
feedback_qtype,
COUNT(feedback_answer) as occurence
FROM acc_data_1005
WHERE (feedback_qtype=5 or feedback_qtype=4 or feedback_qtype=12 or feedback_qtype=13 or feedback_qtype=1 or feedback_qtype=2)
and survey_id=205283
GROUP BY feedback_answer ORDER BY feedback_qtype DESC, COUNT(feedback_answer) DESC
DB table:
CREATE TABLE `acc_data_1005` (
`id` int UNSIGNED NOT NULL,
`survey_id` int UNSIGNED NOT NULL,
`feedback_id` int UNSIGNED NOT NULL,
`date_registered` date NOT NULL,
`feedback_qid` int UNSIGNED NOT NULL,
`feedback_question` varchar(140) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`feedback_qtype` tinyint UNSIGNED NOT NULL COMMENT 'nps, text, input etc',
`data_type` tinyint UNSIGNED NOT NULL COMMENT '0 till 10 are sensitive data options (first name, last name, email etc.)',
`feedback_answer` varchar(1500) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
`additional_data` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;
ALTER TABLE `acc_data_1005`
ADD PRIMARY KEY (`id`),
ADD KEY `date_registered` (`date_registered`),
ADD KEY `feedback_qid` (`feedback_qid`,`feedback_question`) USING BTREE,
ADD KEY `feedback_id` (`feedback_id`),
ADD KEY `survey_id` (`survey_id`);
ALTER TABLE `acc_data_1005` ADD FULLTEXT KEY `feedback_answer` (`feedback_answer`);
ALTER TABLE `acc_data_1005`
MODIFY `id` int UNSIGNED NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=2020001;
COMMIT;
The table has around 2 million rows and for this test, they all have the same survey_id.
Profiling says executing takes up 96% of the time. EXPLAIN result:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE acc_data_1005 NULL ref survey_id,feedback_answer survey_id 4 const 998375 46.86 Using where; Using temporary; Using filesort
This query takes around 22-30 seconds for just 11 rows.
If I remove the survey_id (which is important), the query takes around 2-4 seconds (still way too much).
I've been at it for hours but can't find why this query is so slow.
If it helps I can dump the rows in a SQL file (around 400-600MB).
The GROUP BY is slow because it has to scan about 2 million rows and group them on feedback_answer, a long VARCHAR(1500) column; the FULLTEXT index on feedback_answer does not help with grouping.
I created another table, "analytic_stats", and a cron job that runs this query every month (for only that month's data) and stores the result in the stats table.
When the customer wants the data for a full year (2 million+ rows, which is too slow), I just read a few rows from the stats table and run the GROUP BY query only for the current month. That only has to group around 10,000-20,000 rows instead of 2 million, which is instant.
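A minimal sketch of that summary-table approach, using Python's sqlite3 module in place of MySQL and a cut-down version of acc_data_1005 with made-up rows. The monthly job stores one count per (survey, month, question type, answer) in analytic_stats; the report then adds the stored counts for closed months to a live GROUP BY over the current month only. Grouping by feedback_qtype as well as feedback_answer, and the month column, are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acc_data (survey_id INTEGER, date_registered TEXT,
                       feedback_qtype INTEGER, feedback_answer TEXT);
CREATE TABLE analytic_stats (survey_id INTEGER, month TEXT, feedback_qtype INTEGER,
                             feedback_answer TEXT, occurence INTEGER,
                             PRIMARY KEY (survey_id, month, feedback_qtype, feedback_answer));
""")
conn.executemany("INSERT INTO acc_data VALUES (?, ?, ?, ?)", [
    (205283, "2024-01-05", 5, "Yes"), (205283, "2024-01-20", 5, "No"),
    (205283, "2024-01-28", 5, "Yes"), (205283, "2024-02-11", 5, "Yes"),
    (205283, "2024-03-02", 5, "No"),  # current month: not summarised yet
])

def summarise_month(month):
    """Monthly cron job: collapse one month of raw answers into analytic_stats."""
    conn.execute("""
        INSERT INTO analytic_stats
        SELECT survey_id, ?, feedback_qtype, feedback_answer, COUNT(*)
        FROM acc_data
        WHERE strftime('%Y-%m', date_registered) = ?
        GROUP BY survey_id, feedback_qtype, feedback_answer""", (month, month))

def period_report(survey_id, first_month, current_month):
    """Closed months come from analytic_stats; only the current month is grouped live."""
    return conn.execute("""
        SELECT feedback_qtype, feedback_answer, SUM(n) AS occurence
        FROM (
            SELECT feedback_qtype, feedback_answer, occurence AS n
            FROM analytic_stats
            WHERE survey_id = ? AND month >= ? AND month < ?
            UNION ALL
            SELECT feedback_qtype, feedback_answer, COUNT(*)
            FROM acc_data
            WHERE survey_id = ? AND strftime('%Y-%m', date_registered) = ?
            GROUP BY feedback_qtype, feedback_answer
        )
        GROUP BY feedback_qtype, feedback_answer
        ORDER BY feedback_qtype DESC, occurence DESC""",
        (survey_id, first_month, current_month, survey_id, current_month)).fetchall()

for m in ("2024-01", "2024-02"):
    summarise_month(m)

print(period_report(205283, "2024-01", "2024-03"))  # [(5, 'Yes', 3), (5, 'No', 2)]
```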
Maybe not the most efficient way, but it works for me ;)
Hope it might help someone with a similar problem.

MySQL key index not working, search all rows using where

So basically I created a table:
CREATE TABLE IF NOT EXISTS `student` (
`id` int(4) unsigned NOT NULL AUTO_INCREMENT,
`campus` enum('CAMPUS1', 'CAMPUS2') NOT NULL,
`fullname` char(32) NOT NULL,
`gender` enum('MALE', 'FEMALE') NOT NULL,
`birthday` char(16) NOT NULL,
`phone` char(32) NOT NULL,
`emergency` char(32) NOT NULL,
`address` char(128) NOT NULL,
PRIMARY KEY (`id`),
KEY `key_student` (`campus`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
I have like 20 rows with only 12 in CAMPUS1
But when I query it with SELECT * FROM student WHERE campus='CAMPUS1'; the EXPLAIN is this:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE student ALL key_student NULL NULL NULL 20 Using where
I am new to this. How does a KEY really work? I read the documentation, but I can't understand much of it.
MySQL is trying to be smart (with varying success) when deciding which index to use for a query.
There are cases where it is faster to read the entire table instead of using the index. E.g., if your table has 500 records for CAMPUS1 and 100 records for CAMPUS2, it is faster to do a full scan (600 records) when looking for campus='CAMPUS1'.
When you have only 20 rows you run into the edge cases of the algorithm. Try adding some more rows, and see what happens.
Also, it seems this index will have a very low cardinality (an even split between only 2 values). It will probably not be very useful.

Validate fields from one table to another in MySQL

The problem:
I have a table of approx. 5,000 rows called imported_cities.
I have a table of approx. 800,000 rows called postal_codes, containing postal codes and their cities.
I need to validate each distinct city from imported_cities against the cities in the postal codes table, based on city name and province. See the table structures below.
If they match exactly (yes, exactly; the remaining cities are validated manually), I have to update a column in imported_cities and enter both the city from imported_cities and the city from postal_codes (side by side) into a third table called imported_cities_equiv.
What I have tried:
I added indexes to the tables and ran the query below. It takes forever... :(
explain SELECT DISTINCT ic.destinationCity, pc.city FROM (imported_cities ic, postalcodes pc)
WHERE LOWER(ic.destinationCity) = LOWER(pc.city)
the result
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE ip index NULL company_city 478 NULL 4221 Using index; Using temporary
1 SIMPLE pc index NULL city_prov 160 NULL 765407 Using where; Using index; Using join buffer (Block...
--
-- Table structure for table postalcodes
CREATE TABLE IF NOT EXISTS `postalcodes` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`code` varchar(11) NOT NULL,
`city` varchar(50) NOT NULL,
`province` varchar(50) NOT NULL,
`provinceISO` varchar(2) NOT NULL,
`latitude` decimal(17,13) NOT NULL,
`longitude` decimal(17,13) NOT NULL,
PRIMARY KEY (`id`),
KEY `code` (`code`),
KEY `city_prov` (`city`,`provinceISO`)
);
--
-- Table structure for table imported_cities
CREATE TABLE IF NOT EXISTS `imported_cities` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`companyName` varchar(30) CHARACTER SET utf8 NOT NULL,
`destinationCity` varchar(128) CHARACTER SET utf8 NOT NULL,
`destinationProvince` varchar(20) CHARACTER SET utf8 NOT NULL,
`equivCity` varchar(128) CHARACTER SET utf8 DEFAULT NULL,
`minAmount` decimal(6,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `company_city` (`companyName`,`destinationCity`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=7933 ;
--
-- Table structure for table imported_cities_equiv
CREATE TABLE IF NOT EXISTS `imported_cities_equiv` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`imported_city` varchar(128) CHARACTER SET utf8 NOT NULL,
`pc_city` varchar(128) CHARACTER SET utf8 NOT NULL,
`province` varchar(20) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=149 ;
Any help or suggestion is appreciated. Thank you.
The query you want to get your information is:
SELECT ip.*, (pc.city is not null) as exact_match
FROM imported_cities ip left join
postalcodes pc
on LOWER(ip.destinationCity) = LOWER(pc.city) and
lower(ip.destinationProvince) = lower(pc.province);
However, this will have really bad performance. Getting rid of the lower() would help:
SELECT ip.*, (pc.city is not null) as exact_match
FROM imported_cities ip left join
postalcodes pc
on ip.destinationCity = pc.city and
ip.destinationProvince = pc.province;
Because then an index on postalcodes(city, province) can be used for the join.
If you cannot remove lower(), then alter the tables to add new columns and put the lower-case values in those columns. Then build an index on the new columns and use them in the join.
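To see why wrapping the columns in LOWER() hurts, compare the query plans before and after adding indexed lower-case copies of the columns. This is a sketch with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN standing in for MySQL; the columns city_lc and province_lc and the rows are hypothetical. Note that in MySQL a column with a case-insensitive (_ci) collation already compares case-insensitively with =, so LOWER() may not be needed there at all.

```python
import sqlite3

def plan(conn, sql):
    """Return the human-readable lines of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE postalcodes (id INTEGER PRIMARY KEY, city TEXT NOT NULL, province TEXT NOT NULL);
CREATE INDEX city_prov ON postalcodes (city, province);
INSERT INTO postalcodes (city, province) VALUES ('Laval', 'QC'), ('Gatineau', 'QC');
""")

# lower() around the indexed columns hides them from the index: every entry has to be read.
wrapped = plan(conn, "SELECT id FROM postalcodes WHERE lower(city) = 'laval' AND lower(province) = 'qc'")

# Hypothetical fix: store lower-case copies once, index them, and compare them directly.
conn.executescript("""
ALTER TABLE postalcodes ADD COLUMN city_lc TEXT;
ALTER TABLE postalcodes ADD COLUMN province_lc TEXT;
UPDATE postalcodes SET city_lc = lower(city), province_lc = lower(province);
CREATE INDEX city_prov_lc ON postalcodes (city_lc, province_lc);
""")
LOOKUP = "SELECT id FROM postalcodes WHERE city_lc = 'laval' AND province_lc = 'qc'"
direct = plan(conn, LOOKUP)

print(wrapped)  # expect a line starting with SCAN
print(direct)   # expect a line starting with SEARCH that names city_prov_lc
```

The copies have to be kept in sync when rows are inserted or updated; this sketch fills them only once.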
Thank you all for pointing me on the right direction.
Some changes have been made following your advice:
added indexes on the imported_cities table on the destinationCity and destinationProvince columns
added indexes on the postalcodes table on the city and provinceISO columns
the JOIN clause applies upper() to only one side, since ic.destinationCity is already in uppercase
limited the query by province in the WHERE clause for performance
The final SQL is:
SELECT DISTINCT pc.city, pc.provinceISO
FROM postalcodes pc
LEFT JOIN imported_cities ic
ON upper(pc.city) = ic.destinationCity AND
pc.provinceISO = ic.destinationProvince
WHERE ic.destinationProvince = 'QC';
And the EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pc ref province province 8 const 278115 Using index condition; Using temporary
1 SIMPLE ip ref destinationCity,destinationProvince destinationCity 386 func 1 Using index condition; Using where; Distinct
Going forward, I can now build the INSERT query in PHP and run a single INSERT to put all equivalent cities into the third table. Thank you all.

How to create indexes efficiently

I'd like to know how to create indexes in my database according to my data structure. Most of my queries fetch data by ID and by name, joining two or three tables, with pagination. Please advise how to create indexes for the queries below.
Query:1
SELECT DISTINCT topic, type FROM books where type like 'Tutor-Books' order by topic
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE books range faith faith 102 NULL 132 Using index condition; Using temporary; Using filesort
Query:2
SELECT books.name, books.name2, books.id, books.image, books.faith,
books.topic, books.downloaded, books.viewed, books.language,
books.size, books.author as author_id, authors.name as author_name,
authors.aid
from books
LEFT JOIN authors ON books.author = authors.aid
WHERE books.id = '".$id."'
AND status = 1
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE books const PRIMARY PRIMARY 4 const 1 NULL
1 SIMPLE authors const aid aid 4 const 1 NULL
Can I use indexes for OFFSET-based pagination, where the same query also returns the total:
SELECT SQL_CALC_FOUND_ROWS books.name, books.name2, books.id,
books.image, books.topic, books.author as author_id,
authors.name as author_name, authors.aid
from books
LEFT JOIN authors ON books.author = authors.aid
WHERE books.author = '$pid'
AND status = 1
ORDER BY books.name
LIMIT $limit OFFSET $offset
Do I need to update my queries after creating indexes? Please also suggest what the table format should be.
SHOW CREATE TABLE books:
Table Create Table
books CREATE TABLE `books` (
`name` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`name2` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`author` int(100) NOT NULL,
`translator` int(120) NOT NULL,
`publisher` int(100) NOT NULL,
`pages` int(50) NOT NULL,
`date` varchar(50) CHARACTER SET latin1 NOT NULL,
`downloaded` int(100) NOT NULL,
`alt_lnk` text NOT NULL,
`viewed` int(100) NOT NULL,
`language` varchar(100) CHARACTER SET latin1 NOT NULL,
`image` varchar(200) CHARACTER SET latin1 NOT NULL,
`faith` varchar(100) CHARACTER SET latin1 NOT NULL,
`id` int(100) NOT NULL AUTO_INCREMENT,
`sid` varchar(1200) CHARACTER SET latin1 DEFAULT NULL,
`topic` varchar(100) CHARACTER SET latin1 NOT NULL,
`last_viewed` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`size` double NOT NULL,
`status` int(2) NOT NULL DEFAULT '0',
`is_scroll` int(2) NOT NULL,
`is_downloaded` int(2) NOT NULL,
`pdf_not_found` int(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `downloaded` (`downloaded`),
KEY `name2` (`name2`),
KEY `topic` (`topic`),
KEY `faith` (`faith`)
) ENGINE=InnoDB AUTO_INCREMENT=12962 DEFAULT CHARSET=utf8
where type like 'Tutor-Books' order by topic (or:)
where type = 'Tutor-Books' order by topic
--> INDEX(type, topic)
where type like '%Tutor-Books' order by topic
--> INDEX(topic) -- the leading % prevents indexing
LEFT JOIN authors ON books.author = authors.aid
--> PRIMARY KEY(aid)
Do you really need LEFT JOIN? If you can change it to JOIN, the optimizer might be able to start with authors. If it does, then
--> INDEX(author) -- in `books`
My cookbook for building indexes.
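Why the leading % matters can be sketched with a sorted Python list standing in for the B-tree: a prefix pattern can jump straight to the first candidate with binary search and stop at the first non-match, while a suffix pattern has to examine every key. This is a conceptual model only (real MySQL comparisons also follow the column's collation), and the type values are invented.

```python
import bisect

# A B-tree index keeps its keys in sorted order, like this list.
keys = sorted(["Kids-Books", "Tutor-Books", "Tutor-Guides", "Old-Tutor-Books", "Science"])

def prefix_search(sorted_keys, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, stop at the first non-match."""
    start = bisect.bisect_left(sorted_keys, prefix)
    out = []
    for k in sorted_keys[start:]:
        if not k.startswith(prefix):
            break
        out.append(k)
    return out

def suffix_search(sorted_keys, suffix):
    """LIKE '%suffix': sorting does not group matching keys, so every key is examined."""
    return [k for k in sorted_keys if k.endswith(suffix)]

print(prefix_search(keys, "Tutor-"))       # ['Tutor-Books', 'Tutor-Guides']
print(suffix_search(keys, "Tutor-Books"))  # ['Old-Tutor-Books', 'Tutor-Books']
```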
Other tips:
INT(100) and INT(2) are identical -- each is a 4-byte signed integer. Read about TINYINT UNSIGNED for numbers 0..255, etc. Use that for your flags (status, is_scroll, etc)
DATE is a datatype; using a VARCHAR is problematic if you ever want to compare or order.
Learn about composite indexes, such as my first example.
Your display widths are a little funky, but that won't cause a problem.
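Two of these tips are easy to demonstrate in a few lines of Python. The first part sorts hypothetical day/month/year strings the way ORDER BY roughly sorts a VARCHAR column (character by character) and compares that with chronological order; the second computes integer ranges from storage size alone, which is why the N in INT(N) does not change what the column can hold.

```python
from datetime import date

# Hypothetical `date` values stored as VARCHAR in day/month/year form.
as_text = ["05/03/2014", "21/11/2013", "02/01/2015"]

def parse_dmy(s):
    """Parse 'DD/MM/YYYY' into a real date."""
    day, month, year = (int(p) for p in s.split("/"))
    return date(year, month, day)

text_order = sorted(as_text)                 # character by character, like the VARCHAR column
date_order = sorted(as_text, key=parse_dmy)  # chronological, like a DATE column

def int_range(nbytes, unsigned):
    """Value range of an integer column of the given storage size (display width is irrelevant)."""
    bits = 8 * nbytes
    return (0, 2**bits - 1) if unsigned else (-2**(bits - 1), 2**(bits - 1) - 1)

print(text_order)          # ['02/01/2015', '05/03/2014', '21/11/2013']
print(date_order)          # ['21/11/2013', '05/03/2014', '02/01/2015']
print(int_range(1, True))  # (0, 255): TINYINT UNSIGNED
print(int_range(4, False)) # (-2147483648, 2147483647): INT, whatever its display width
```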
Query 1:
You're using the LIKE operator without a % wildcard. You can likely swap it for the = operator.
I don't see the column type in your SHOW CREATE TABLE -- but it seems you don't have an index here, unless you renamed it to faith.
Do you need type to be a string? Could it be abstracted into a types table and joined using an integer? Or, if you have a fixed set of types that's unlikely to change, could you use an ENUM?
Query 2:
You don't need to quote the id, since it's an integer; also, that's probably vulnerable to SQL injection. Use WHERE books.id = ".intval($id)." instead.
Make sure you have an index on authors.aid and that books.author and authors.aid are of the same type.
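intval() works for a numeric id; the more general fix is to bind values as parameters (prepared statements in PDO or mysqli) instead of concatenating them into the SQL. The sketch below shows the difference with Python's sqlite3 module standing in for PHP and MySQL; the books rows and the hostile input are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, status INTEGER);
INSERT INTO books VALUES (1, 'First', 1), (2, 'Second', 1), (3, 'Unpublished', 0);
""")

hostile_id = "0' OR '1'='1' --"  # what an attacker could send as $id

# Unsafe: mirrors WHERE books.id = '".$id."' AND status = 1; the value becomes SQL text.
unsafe_sql = "SELECT id FROM books WHERE id = '" + hostile_id + "' AND status = 1"
unsafe = sorted(row[0] for row in conn.execute(unsafe_sql))

# Safe: the value is bound as data, so it can only ever be compared with id.
safe = conn.execute("SELECT id FROM books WHERE id = ? AND status = 1", (hostile_id,)).fetchall()

print(unsafe)  # [1, 2, 3]: the injected comment removed the status filter
print(safe)    # []
```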

Need help optimizing mysql query to get it to sort quickly by index

Someone helped me come up with this query, but it's still too slow. The ORDER BY is slowing it down, and I don't think it's using my index.
I'm hoping someone can fix it for me :D Yes I read the manual page but I can't understand it.
Query:
EXPLAIN SELECT u.id, u.url, u.title, u.numsaves
FROM urls u
JOIN tags t ON t.url_id = u.id
AND t.tag = 'osx'
ORDER BY u.numsaves DESC
LIMIT 20
Showing rows 20 - 19 ( 20 total, Query took 1.5395 sec) [numsaves: 6130 - 2107]
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t ref tag_id tag_id 767 const 49432 Using where; Using index; Using temporary; Using filesort
1 SIMPLE u eq_ref PRIMARY,id_numsaves_IX PRIMARY 4 jcooper_whatrethebest_urls.t.url_id 1
Database:
CREATE TABLE `urls` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` text NOT NULL,
`domain` text,
`title` text NOT NULL,
`description` text,
`numsaves` int(11) NOT NULL,
`firstsaved` varchar(256) DEFAULT NULL,
`md5` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `md5` (`md5`),
KEY `id_numsaves_IX` (`id`,`numsaves`)
) ENGINE=InnoDB AUTO_INCREMENT=2958560 DEFAULT CHARSET=utf8
CREATE TABLE `tags` (
`url_id` int(11) DEFAULT NULL,
`hash` varchar(255) NOT NULL,
`tag` varchar(255) NOT NULL,
UNIQUE KEY `tag_id` (`tag`,`url_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I think the main problem with your query is your choice of indexes.
1) tags has a compound UNIQUE KEY on tag and url_id but no PRIMARY KEY.
If nothing else, you should make it the PRIMARY KEY (url_id would have to be NOT NULL for that); this may help a bit with performance. Also, you might want to take a close look at whether VARCHAR(255) is really necessary for your tags. It makes the index quite big.
2) add a separate index on numsaves since you're ordering by that. The compound index on id and numsaves is not going to help here.
3) EXPLAIN says that 49432 rows in tags match "osx", so the string 'osx' is stored tens of thousands of times, which is quite redundant. You may want to split your tags table into two: one containing the tag text, the other containing the N:M link to urls.
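A sketch of the split suggested in point 3, using Python's sqlite3 module in place of MySQL: each tag string is stored once in a hypothetical tag_names table, and the link table url_tags holds only two integers. In the question's EXPLAIN, key_len 767 is 255 × 3 + 2 bytes (a utf8 VARCHAR(255) plus its length prefix), whereas an INT tag_id is 4 bytes, so the link table's key is far smaller. Table names and rows are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT NOT NULL, numsaves INTEGER NOT NULL);
CREATE INDEX numsaves_ix ON urls (numsaves);
-- Each tag string is stored exactly once...
CREATE TABLE tag_names (tag_id INTEGER PRIMARY KEY, tag TEXT NOT NULL UNIQUE);
-- ...and the N:M link table holds only two integers per row.
CREATE TABLE url_tags (
    tag_id INTEGER NOT NULL REFERENCES tag_names (tag_id),
    url_id INTEGER NOT NULL REFERENCES urls (id),
    PRIMARY KEY (tag_id, url_id)
) WITHOUT ROWID;
INSERT INTO urls VALUES (1, 'a.example', 10), (2, 'b.example', 50), (3, 'c.example', 30);
INSERT INTO tag_names VALUES (1, 'osx'), (2, 'linux');
INSERT INTO url_tags VALUES (1, 1), (1, 2), (1, 3), (2, 3);
""")

top = conn.execute("""
    SELECT u.id, u.url, u.numsaves
    FROM tag_names t
    JOIN url_tags ut ON ut.tag_id = t.tag_id
    JOIN urls u ON u.id = ut.url_id
    WHERE t.tag = ?
    ORDER BY u.numsaves DESC
    LIMIT 20""", ("osx",)).fetchall()

print(top)  # [(2, 'b.example', 50), (3, 'c.example', 30), (1, 'a.example', 10)]
```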