Why does this query show "Using temporary; Using filesort" on `table`?
EXPLAIN
SELECT `table`.*, COUNT(table.id) AS `count`
FROM `table`
LEFT JOIN `table2` ON table.id = table2.foreign_id
GROUP BY `table2`.`foreign_id`
ORDER BY table.`title` ASC
1 SIMPLE table ALL NULL NULL NULL NULL 305 Using temporary; Using filesort
1 SIMPLE table2 ref table table 5 table.id 343 Using index
According to the docs, it should be possible to run this without those slow steps.
EDIT:
The tables are as simple as can be.
CREATE TABLE `table` (
`id` int(11) NOT NULL,
`title` varchar(100) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE `table2` (
`id` int(11) NOT NULL,
`foreign_id` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
`table`.* -- This implies that you want all the rows (and all columns)
COUNT(table.id) -- This implies that you just want a single number
Although MySQL lets you run the query, it is not a proper query. (And with ONLY_FULL_GROUP_BY enabled, which is the default from 5.7.5 onward, including 8.0, MySQL will reject it.)
If id is the PRIMARY KEY, then GROUP BY table.id further complicates the issue. That says to produce one row for each id. But since id is unique, there is only one row for each id. So the GROUP BY is redundant.
Please describe what the query is supposed to do; we can help you correctly formulate it.
Meanwhile, use InnoDB instead of MyISAM.
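If the goal is to list every row of `table` together with the number of matching `table2` rows (an assumption, since the question does not say), one possible formulation groups by the primary key and counts the joined rows. The sketch below runs against SQLite through Python's sqlite3 module only so that it is self-contained; the sample rows and the index name `table2_foreign_id` are made up, and MySQL's EXPLAIN output for the same statement will differ.

```python
import sqlite3

def count_children():
    """Return (id, title, number of table2 rows) per row of "table", by title."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "table"  (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE "table2" (id INTEGER PRIMARY KEY, foreign_id INTEGER NOT NULL);
        -- hypothetical index name; the question's table2 index is not shown
        CREATE INDEX table2_foreign_id ON "table2" (foreign_id);
        INSERT INTO "table"  VALUES (1, 'beta'), (2, 'alpha'), (3, 'gamma');
        INSERT INTO "table2" VALUES (10, 1), (11, 1), (12, 2);
    """)
    rows = conn.execute("""
        SELECT t.id, t.title, COUNT(t2.id) AS cnt  -- counts only matched rows
        FROM "table" AS t
        LEFT JOIN "table2" AS t2 ON t2.foreign_id = t.id
        GROUP BY t.id, t.title                     -- group by the parent row
        ORDER BY t.title ASC
    """).fetchall()
    conn.close()
    return rows

print(count_children())
```

Counting `t2.id` rather than `t.id` means a parent row with no children reports 0 instead of 1.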
Related
What is the proper indexing for this query.
I tried different combinations of indexes for this query, but EXPLAIN still shows Using temporary, Using filesort, etc.
Total table data - 7,60,346
product= 'Dresses' - Total rows = 122 554
CREATE TABLE IF NOT EXISTS `product_data` (
`table_id` int(11) NOT NULL AUTO_INCREMENT,
`id` int(11) NOT NULL,
`price` int(11) NOT NULL,
`store` varchar(255) NOT NULL,
`brand` varchar(255) DEFAULT NULL,
`product` varchar(255) NOT NULL,
`model` varchar(255) NOT NULL,
`size` varchar(50) NOT NULL,
`discount` varchar(255) NOT NULL,
`gender_id` int(11) NOT NULL,
`availability` int(11) NOT NULL,
PRIMARY KEY (`table_id`),
UNIQUE KEY `table_id` (`table_id`),
KEY `id` (`id`),
KEY `discount` (`discount`),
KEY `step_one` (`product`,`availability`),
KEY `step_two` (`product`,`availability`,`brand`,`store`),
KEY `step_three` (`product`,`availability`,`brand`,`store`,`id`),
KEY `step_four` (`brand`,`store`),
KEY `step_five` (`brand`,`store`,`id`)
) ENGINE=InnoDB ;
Query :
SELECT id ,store,brand FROM `product_data` WHERE product='dresses' and
availability='1' group by brand,store order by store limit 10;
Execution time :- (10 total, Query took 1.0941 sec)
EXPLAIN PLAN :
possible_keys :- step_one, step_two, step_three, step_four, step_five
key :- step_two
ref :- const,const
rows :- 229438
Extra :-Using where; Using temporary; Using filesort
I tried these indexes
Key step_one (product,availability)
Key step_two (product,availability,brand,store)
Key step_three (product,availability,brand,store,id)
Key step_four (brand,store)
Key step_five (brand,store,id)
The real problem is not the index, but the mismatch between GROUP BY and ORDER BY preventing taking advantage of LIMIT.
This
INDEX(product, availability, store, brand, id)
will be "covering" and in the right order. But note that I have swapped store and brand...
Change the query to
SELECT id ,store,brand
FROM `product_data`
WHERE product='dresses'
and availability='1'
GROUP BY store, brand -- change
ORDER BY store, brand -- change
limit 10;
That changes the GROUP BY to start with store, to reflect the ORDER BY ordering; this avoids an extra sort. And it changes the ORDER BY to be identical to the GROUP BY so that the two can be combined.
Given those changes, the INDEX can now go all the way through to the LIMIT, thereby allowing the processing to look at only 10 rows, not a much larger set.
Anything less than all these changes will not be as efficient.
Further discussion:
INDEX(product, availability, -- these two can be in either order
store, brand, -- must match both `GROUP BY` and `ORDER BY`
id) -- tacked on (on the end) to make it "covering"
"Covering" means that all the columns for the SELECT are found in the INDEX, so no need to reach over into the data.
But... The whole query does not make sense because of the inclusion of id in the SELECT. If you want to find what stores have available dresses, then get rid of id. If you want to list all the available dresses, then change id to GROUP_CONCAT(id).
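A minimal sketch of the GROUP_CONCAT(id) variant, using the reordered GROUP BY / ORDER BY from above. It runs against SQLite via Python's sqlite3 module purely for self-containment; the sample rows and the index name `step_six` are hypothetical, the table is cut down to the columns the query touches, and SQLite's planner is not MySQL's, so this shows the shape of the result rather than the execution plan.

```python
import sqlite3

def dresses_by_store_brand(limit=10):
    """List available dresses per (store, brand) with all their ids."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_data (
            table_id INTEGER PRIMARY KEY,
            id INTEGER NOT NULL,
            store TEXT NOT NULL,
            brand TEXT,
            product TEXT NOT NULL,
            availability INTEGER NOT NULL
        );
        -- hypothetical name for the suggested covering index
        CREATE INDEX step_six
            ON product_data (product, availability, store, brand, id);
        INSERT INTO product_data (id, store, brand, product, availability) VALUES
            (1, 'Zara', 'Zara',   'dresses', 1),
            (2, 'Asos', 'Nike',   'dresses', 1),
            (3, 'Asos', 'Nike',   'dresses', 1),
            (4, 'Asos', 'Adidas', 'dresses', 1),
            (5, 'Asos', 'Nike',   'shoes',   1),
            (6, 'Zara', 'Zara',   'dresses', 0);
    """)
    rows = conn.execute("""
        SELECT store, brand, GROUP_CONCAT(id) AS ids
        FROM product_data
        WHERE product = 'dresses' AND availability = 1
        GROUP BY store, brand
        ORDER BY store, brand
        LIMIT ?
    """, (limit,)).fetchall()
    conn.close()
    # GROUP_CONCAT returns a comma-separated string; its internal order is
    # not guaranteed, so sort the ids after splitting.
    return [(s, b, sorted(int(i) for i in ids.split(","))) for s, b, ids in rows]

print(dresses_by_store_brand())
```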
Of the existing indexes, step_two is the best choice. The product column is used in the WHERE clause and has more variation than the availability column.
Couple of notes about the query:
availability='1' should be availability=1 so that a needless int->varchar conversion is avoided.
"group by brand" should not be used, as GROUP BY should only be used when you select aggregate functions. What was it that you were trying to achieve with the group by?
Your group by clause doesn't really make sense without an aggregate function.
If you can re-write the query to
SELECT id ,store
FROM `product_data`
WHERE product='dresses'
and availability='1'
order by store limit 10;
Then an index on (product,availability,store) will remove all filesorts.
See SQLFiddle: http://sqlfiddle.com/#!9/60f33d/2
UPDATE:
The SQLFiddle makes your intention clear - you're using GROUP BY to simulate DISTINCT. I don't think you can get rid of the filesort and temporary table steps in your query if this is the case - but I also don't think those steps should be hugely expensive.
I've read multiple questions here, but none has helped me so far. This is the same query and table structure as in my previous [unanswered] question, Optimizing a SELECT … UNION … query with ORDER and LIMIT on a table with 5M+ rows. Despite having all the indexes defined, the query is still logged as "not using index".
SELECT `id`, `title`, `title_fa`
FROM
( SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`,
`p`.`unique` AS `unique`, `p`.`date` AS `date`
FROM `articles` `p`
LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique`
WHERE 1
AND MATCH (`p`.`title`) AGAINST ('"heat"' IN BOOLEAN MODE)
UNION
SELECT `p`.`id` AS `id`, `p`.`title` AS `title`, `p`.`title_fa` AS `title_fa`,
`p`.`unique` AS `unique`, `p`.`date` AS `date`
FROM `articles` `p`
LEFT JOIN `authors` `a` ON `p`.`unique` = `a`.`unique`
WHERE 1
AND MATCH (`p`.`title_fa`) AGAINST ('"گرما"' IN BOOLEAN MODE)
) AS `subQuery`
GROUP BY `unique`
ORDER BY `date` DESC
LIMIT 0,10;
I don't know how I should use an index in the outer SELECT, where it groups the two SELECTs combined by UNION.
Thanks
Update
This is the structure of the article table:
CREATE TABLE `articles` (
`id` int(10) unsigned NOT NULL,
`title` text COLLATE utf8_persian_ci NOT NULL,
`title_fa` text COLLATE utf8_persian_ci NOT NULL,
`description` text COLLATE utf8_persian_ci NOT NULL,
`description_fa` text COLLATE utf8_persian_ci NOT NULL,
`date` date NOT NULL,
`unique` tinytext COLLATE utf8_persian_ci NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_persian_ci;
ALTER TABLE `articles`
ADD PRIMARY KEY (`id`),
ADD KEY `unique` (`unique`(128)),
ADD FULLTEXT KEY `TtlDesc` (`title`,`description`),
ADD FULLTEXT KEY `Title` (`title`),
ADD FULLTEXT KEY `faTtlDesc` (`title_fa`,`description_fa`),
ADD FULLTEXT KEY `faTitle` (`title_fa`);
ALTER TABLE `articles`
MODIFY `id` int(10) unsigned NOT NULL AUTO_INCREMENT;
UPDATE 2:
Here is the output of EXPLAIN SELECT (I didn't know how to get it from phpMyAdmin any better! Sorry if it doesn't look good):
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 4 Using temporary; Using filesort
2 DERIVED p fulltext title title 0 NULL 1 Using where
3 UNION p fulltext title_fa title_fa 0 NULL 1 Using where
NULL UNION RESULT <union2,3> ALL NULL NULL NULL NULL NULL Using temporary
) AS `subQuery`
It is a subquery, a derived table, and it is manifested coming out of a temporary table. It has no chance of index use.
As I wrote in this answer:
The document Derived Tables in MySQL 5.7 describes it well for
versions 5.6 and 5.7, where the latter will provide no penalty due to
the change in materialized derived table output being incorporated
into the outer query. In prior versions, substantial overhead was
endured with temporary tables with the derived.
When there is a MATCH clause, only a FULLTEXT index will be used.
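To make the point about the derived table concrete, here is a rough Python sketch of what the outer SELECT does once the two FULLTEXT lookups have produced their rows: it works on an in-memory intermediate result, so there is nothing left for an index to do. The function name `merge_matches` and the sample rows are hypothetical, and keeping the first row seen per `unique` value is an assumption (MySQL may pick any row of the group).

```python
from datetime import date

def merge_matches(title_hits, title_fa_hits, limit=10):
    """Combine two match lists roughly the way the outer query does:
    one row per `unique` value, newest first, at most `limit` rows."""
    by_unique = {}
    for row in list(title_hits) + list(title_fa_hits):
        # GROUP BY `unique`: keep one row per key (first seen, by assumption)
        by_unique.setdefault(row["unique"], row)
    # ORDER BY `date` DESC: a sort over the temporary result ("filesort")
    ordered = sorted(by_unique.values(), key=lambda r: r["date"], reverse=True)
    return ordered[:limit]  # LIMIT 0,10

title_hits = [
    {"unique": "a", "date": date(2015, 1, 1)},
    {"unique": "b", "date": date(2016, 1, 1)},
]
title_fa_hits = [
    {"unique": "a", "date": date(2015, 1, 1)},
    {"unique": "c", "date": date(2014, 6, 1)},
]
print([r["unique"] for r in merge_matches(title_hits, title_fa_hits)])
```

With only a handful of matching rows, as in the EXPLAIN above, this grouping and sorting step is cheap.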
Meanwhile, tips on syntax and pagination:
The usual pattern:
( SELECT ...
GROUP BY ... ORDER BY ... -- apply to result of inner SELECT
)
UNION ALL
( SELECT ...
GROUP BY ... ORDER BY ... -- apply to result of inner SELECT
)
GROUP BY ... ORDER BY ... -- apply to result of UNION
(If you need pagination, see my blog.)
Addenda
In the EXPLAIN... The 1st and 4th lines say ALL and NULL -- this indicates that no index was used in any way. In those cases, we are talking about 4 rows, and all 4 rows are needed. So, do not worry that no INDEX was used.
In the 2nd and 3rd lines, a FULLTEXT index was used.
The phrase Using index (which does not show in your EXPLAIN) does not mean "using some index", it means "using only the index". To elaborate... The data for a table is in one place, the index is in another. When all the necessary columns are in the index, the query does not need to reach over into the data. This is labeled as Using index, and it is termed a "covering index". This particular situation is not relevant for your query.
A similar phrase, Using index condition, means something else. It refers to Index Condition Pushdown: the parts of the WHERE clause that can be checked from the index alone are evaluated by the storage engine before it reads the full row. Let's simply say that it is an optimization making things run a little faster.
Bottom line: Your query is well written, and your indexes are fine for this query.
Maybe no UNION?
Try getting rid of the UNION and simply search for both strings at the same time:
FULLTEXT(title, title_fa)
MATCH (title, title_fa) AGAINST ('"heat" "گرما"' IN BOOLEAN MODE)
If that does not work, then explain what goes wrong.
I have a table that contains two bigint columns: beginNumber, endNumber, defined as UNIQUE. The ID is the Primary Key.
ID | beginNumber | endNumber | Name | Criteria
The second table contains a number. I want to retrieve the record from table1 when the Number from table2 falls between its two numbers. This is the query:
select distinct t1.Name, t1.Country
from t1
where t2.Number
BETWEEN t1.beginIpNum AND t1.endNumber
The query is taking too much time as I have so many records. I don't have experience with databases. But I read that indexing the table will improve the search, so that MySQL does not have to pass through every row looking for my Number, and that this can be done by, for example, having UNIQUE values. I made beginNumber and endNumber in table1 UNIQUE. Is this all I can do? Is there any other way to improve the time? Please provide detailed answers.
EDIT:
table1:
CREATE TABLE `t1` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`beginNumber` bigint(20) DEFAULT NULL,
`endNumber` bigint(20) DEFAULT NULL,
`Name` varchar(255) DEFAULT NULL,
`Criteria` varchar(455) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `beginNumber_UNIQUE` (`beginNumber`),
UNIQUE KEY `endNumber_UNIQUE` (`endNumber`)
) ENGINE=InnoDB AUTO_INCREMENT=327 DEFAULT CHARSET=utf8
table2:
CREATE TABLE `t2` (
`id2` int(11) NOT NULL AUTO_INCREMENT,
`description` varchar(255) DEFAULT NULL,
`Number` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id2`),
UNIQUE KEY `description_UNIQUE` (`description`)
) ENGINE=InnoDB AUTO_INCREMENT=433 DEFAULT CHARSET=utf8
This is a toy example of the tables but it shows the concerned part.
I'd suggest an index on t2.Number like this:
ALTER TABLE t2 ADD INDEX numindex(Number);
Your query won't work as written because it won't know which t2 to use. Try this:
SELECT DISTINCT t1.Name, t1.Criteria
FROM t1
WHERE EXISTS (SELECT * FROM t2 WHERE t2.Number BETWEEN t1.beginNumber AND t1.endNumber);
Without the t2.Number index EXPLAIN gives this query plan:
1 PRIMARY t1 ALL 1 Using where; Using temporary
2 DEPENDENT SUBQUERY t2 ALL 1 Using where
With an index on t2.Number, you get this plan:
PRIMARY t1 ALL 1 Using where; Using temporary
DEPENDENT SUBQUERY t2 index numindex numindex 9 1 Using where; Using index
The important part to understand is that an ALL comparison is slower than an index comparison.
This is a good place for a B-tree index. (For InnoDB and MyISAM tables, B-tree is already the default index type; only MEMORY tables default to hash indexes.) B-tree indexes are best when you often sort or use BETWEEN on a column.
CREATE INDEX index_name
ON table_name (column_name)
USING BTREE
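To see why a sorted index helps with range tests, here is a small Python sketch that looks up the range containing a number by binary search instead of scanning every row. It is only an analogy for what an ordered index makes possible, not MySQL's implementation; the function name `find_range` and the sample data are hypothetical, and it assumes the ranges do not overlap, which the question does not state.

```python
from bisect import bisect_right

def find_range(ranges, number):
    """ranges: list of (begin, end, name) sorted by begin and assumed
    non-overlapping. Returns the name of the range holding `number`."""
    begins = [r[0] for r in ranges]          # an index keeps this list sorted already
    i = bisect_right(begins, number) - 1     # last range whose begin <= number
    if i >= 0 and ranges[i][0] <= number <= ranges[i][1]:
        return ranges[i][2]
    return None                              # number falls in a gap or before all ranges

ranges = [(1, 10, "A"), (20, 30, "B"), (31, 40, "C")]
print(find_range(ranges, 25))
print(find_range(ranges, 15))
```

The lookup touches O(log n) entries rather than all n, which is the difference between the ALL and index access types in the plans above.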
To start out here is a simplified version of the tables involved.
tbl_map has approx 4,000,000 rows, tbl_1 has approx 120 rows, tbl_2 contains approx 5,000,000 rows. I know the data shouldn't be considered that large given that Google, Yahoo!, etc. use much larger datasets. So I'm just assuming that I'm missing something.
CREATE TABLE `tbl_map` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`tbl_1_id` bigint(20) DEFAULT '-1',
`tbl_2_id` bigint(20) DEFAULT '-1',
`rating` decimal(3,3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tbl_1_id` (`tbl_1_id`),
KEY `tbl_2_id` (`tbl_2_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_2` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`data` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The query of interest is below; instead of ORDER BY RAND(), it now uses ORDER BY t.id DESC. The query takes as much as 5~10 seconds and causes a considerable wait when users view this page.
EXPLAIN SELECT t.data, t.id , tm.rating
FROM tbl_2 AS t
JOIN tbl_map AS tm
ON t.id = tm.tbl_2_id
WHERE tm.tbl_1_id =94
AND tm.rating IS NOT NULL
ORDER BY t.id DESC
LIMIT 200
1 SIMPLE tm ref tbl_1_id, tbl_2_id tbl_1_id 9 const 703438 Using where; Using temporary; Using filesort
1 SIMPLE t eq_ref PRIMARY PRIMARY 8 tm.tbl_2_id 1
I would just like to speed up the query, ensure that I have proper indexes, etc.
I appreciate any advice from DB Gurus out there! Thanks.
SUGGESTION : Index the table as follows:
ALTER TABLE tbl_map ADD INDEX (tbl_1_id,rating,tbl_2_id);
As per Rolando, yes, you definitely need an index on the map table, but I would expand it to ALSO include tbl_2_id, which serves your ORDER BY on table 2's ID (the same value is stored in the map table, so just use that index). Also, since the index now holds all 3 fields, and is based on the ID of the key search and the NULL (or not) criteria of rating, the 3rd element has them already in order for your ORDER BY clause.
INDEX (tbl_1_id,rating, tbl_2_id);
Then, I would just have the query as
SELECT STRAIGHT_JOIN
t.data,
t.id ,
tm.rating
FROM
tbl_map tm
join tbl_2 t
on tm.tbl_2_id = t.id
WHERE
tm.tbl_1_id = 94
AND tm.rating IS NOT NULL
ORDER BY
tm.tbl_2_id DESC
LIMIT 200
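One caveat worth checking with EXPLAIN: in a composite index, a later column is only sorted within equal values of the columns before it. Because rating IS NOT NULL matches many different ratings, the tbl_2_id values read from (tbl_1_id, rating, tbl_2_id) may not come out in ORDER BY order, so a sort step can remain even though the index covers the map-table columns. The Python sketch below models index entries as sorted tuples; the helper `index_scan` and the sample values are hypothetical.

```python
def index_scan(entries, tbl_1_id):
    """Return tbl_2_id values in the order a scan of an index on
    (tbl_1_id, rating, tbl_2_id) would visit them for one tbl_1_id."""
    ordered = sorted(entries)  # tuples sort column by column, like index entries
    return [t2 for (t1, rating, t2) in ordered if t1 == tbl_1_id]

entries = [
    (94, 0.5, 8),
    (94, 0.1, 9),
    (94, 0.5, 3),
    (94, 0.1, 7),
    (12, 0.9, 1),
]
scan = index_scan(entries, 94)
print(scan)                  # grouped by rating first, then tbl_2_id
print(scan == sorted(scan))  # False: not globally ordered by tbl_2_id
```

If EXPLAIN still reports a filesort with this index, that is the likely reason; an index on (tbl_1_id, tbl_2_id) is an alternative to try, at the cost of checking rating for each row.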
Here's my SHOW CREATE TABLE tbl:
CREATE TABLE IF NOT EXISTS `msc_pagestats` (
`id` int(10) unsigned NOT NULL auto_increment,
`domain` varchar(250) NOT NULL,
`file` varchar(200) NOT NULL,
`simbol` varchar(100) NOT NULL,
`request_time` timestamp NULL default CURRENT_TIMESTAMP,
`querystring` mediumtext NOT NULL,
`host` varchar(100) NOT NULL,
PRIMARY KEY (`id`),
KEY `myindex` (`simbol`,`request_time`,`file`,`domain`)
) ENGINE=MyISAM DEFAULT CHARSET=latin2 AUTO_INCREMENT=248008 ;
So basically this table keeps track of which symbols have been most accessed, most viewed, and most searched within the site, based on a query string.
My query is:
SELECT `simbol`, count(*) AS requests
FROM msc_pagestats
WHERE 1=1 AND request_time > '20100504000000'
AND simbol NOT LIKE ''
GROUP BY `simbol`
ORDER BY requests DESC
LIMIT 0, 15;
This query EXPLAINED:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE msc_pagestats index NULL myindex 561 NULL 24961 Using where; Using index; Using temporary; Using filesort
So the query tries to get the most accessed symbols in the latest hour or today.
Here's what I've tried doing in order to get rid of Using temporary and Using filesort:
Adding an ID as primary key and using COUNT(id) AS requests instead of COUNT(*) AS requests;
Removing the WHERE 1=1 and the simbol NOT LIKE '' conditions; it doesn't make a big difference, though;
Adding a multi-column index instead of the regular indexes; previously there were indexes on each column, e.g. (KEY(request_time), KEY(file), KEY(domain), KEY(simbol)).
I'm not that good at optimization, so I've run out of options.
Here's a dump of my 'mysq_slow_query' file:
Query_time: 3 Lock_time: 0 Rows_sent: 15 Rows_examined: 220297
use kmarket;
SELECT `simbol`, count(*) AS requests
FROM msc_pagestats
WHERE 1=1 AND request_time > '20100504000000'
AND simbol NOT LIKE ''
GROUP BY `simbol`
ORDER BY requests DESC
LIMIT 0, 15;
Any help would be appreciated, thanks :)
Not much point in adding an index to a field calculated at run time, it would still need to be sorted/indexed on each run.
An index on (request_time,simbol) may allow the optimiser to exclude a lot of rows more quickly and also reduce the key length.
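A minimal sketch of the suggested (request_time, simbol) index with the same query shape, run against SQLite through Python's sqlite3 module so that it is self-contained. The index name `time_simbol`, the sample rows, and the text timestamp format are assumptions; the table is cut down to the columns the query uses, and SQLite's planner is not MySQL's, so treat it as an illustration of the result rather than of the plan.

```python
import sqlite3

def top_symbols(since, limit=15):
    """Most requested symbols after `since` ('YYYY-MM-DD HH:MM:SS')."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE msc_pagestats (
            id INTEGER PRIMARY KEY,
            simbol TEXT NOT NULL,
            request_time TEXT NOT NULL
        );
        -- hypothetical name; the range column comes first to narrow the rows
        CREATE INDEX time_simbol ON msc_pagestats (request_time, simbol);
        INSERT INTO msc_pagestats (simbol, request_time) VALUES
            ('AAA', '2010-05-03 23:59:59'),
            ('AAA', '2010-05-04 08:00:00'),
            ('BBB', '2010-05-04 09:00:00'),
            ('BBB', '2010-05-04 10:00:00'),
            ('',    '2010-05-04 11:00:00');
    """)
    rows = conn.execute("""
        SELECT simbol, COUNT(*) AS requests
        FROM msc_pagestats
        WHERE request_time > ? AND simbol <> ''
        GROUP BY simbol
        ORDER BY requests DESC   -- computed per run, so it still needs a sort
        LIMIT ?
    """, (since, limit)).fetchall()
    conn.close()
    return rows

print(top_symbols('2010-05-04 00:00:00'))
```

The sort on the computed count remains, but it now runs over one row per symbol in the time window rather than over the whole table.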