MySQL self-join takes too long—can I streamline this query? - mysql

I'm hoping (and pretty sure) that someone out there is much better at MySQL queries than myself.
I have a query which checks a table that contains information on:
- a search term
- title and price results from various sites using this search term
For the sake of streamlining, I've inserted the data already converted to lowercase, with spaces removed, and with the whole thing trimmed to 11 characters, to help reduce the load on the MySQL server.
The query is designed to find the maximum cost and minimum cost of likely equal titles and determine a price difference if it exists.
Having read some similar questions here, I've also prepended EXPLAIN EXTENDED to the query to see if that would help and I'm including the results along with the query.
The query as is:
SELECT
a.pricesrch11,
b.pricesrch11,
a.pricegroup11,
b.pricegroup11,
a.priceamt - b.priceamt AS pricediff
FROM ebssavings a
LEFT JOIN ebssavings b ON ( a.pricesrch11 = b.pricesrch11 )
AND (a.pricegroup11 = a.pricesrch11)
AND (b.pricegroup11 = a.pricesrch11)
WHERE a.priceamt - b.priceamt >0
GROUP BY a.pricesrch11
The results of the EXPLAIN:
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
1 | SIMPLE | a | ALL | pricesrch11,pricegroup11 | NULL | NULL | NULL | 8816 | Using where; Using temporary; Using filesort
1 | SIMPLE | b | ALL | pricesrch11,pricegroup11 | NULL | NULL | NULL | 6612 | Using where
ADDENDUM:
I just ran this query and got the following result:
Showing rows 0 - 4 ( 5 total, Query took 66.8119 sec)
CREATE TABLE IF NOT EXISTS ebssavings
( priceid int(44) NOT NULL auto_increment,
priceamt decimal(10,2) NOT NULL,
pricesrch11 varchar(11) character set utf8 collate utf8_unicode_ci NOT NULL,
pricegroup11 varchar(11) character set utf8 collate utf8_unicode_ci NOT NULL,
pricedate timestamp NOT NULL default CURRENT_TIMESTAMP,
PRIMARY KEY (priceid),
KEY priceamt (priceamt),
KEY pricesrch11 (pricesrch11),
KEY pricegroup11 (pricegroup11) )
ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=8817
MORE INFO ON THE NEW INDEXES (removed pricegroup11, and made a composite index called srchandtitle from pricesrch11 and pricegroup11):
Keyname       Type   Unique  Packed  Column        Cardinality  Collation
PRIMARY       BTREE  Yes     No      priceid       169          A
priceamt      BTREE  No      No      priceamt      56           A
pricesrch11   BTREE  No      No      pricesrch11   12           A
srchandtitle  BTREE  No      No      pricesrch11   12           A
                                     pricegroup11  169          A

Create two indexes:
- pricesrch11
- a clustered index on pricesrch11, pricegroup11

Remove the Key on pricegroup11 and add a composite clustered key on pricesrch11,pricegroup11.
Also move the table to InnoDB.
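A rough SQL sketch of those suggestions (the srchandtitle name is taken from the addendum above; note that InnoDB only clusters on the primary key, so unless the primary key itself changes, this composite key behaves as an ordinary secondary index):
ALTER TABLE ebssavings DROP KEY pricegroup11;
ALTER TABLE ebssavings ADD KEY srchandtitle (pricesrch11, pricegroup11);
ALTER TABLE ebssavings ENGINE=InnoDB;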

It seems that things have sped up now with the changes made to the table and the indexes.
I've emptied the table and am beginning again.
Thank you all for your help.
-A

Related

MySQL update query not using indexed columns

I tried the following SQL query to update the table INDEXED_MERCHANT, which has 10000 records. I indexed both "NAME" and "A" as keys to improve the update query performance. By executing SHOW CREATE TABLE INDEXED_MERCHANT; I get the following output:
INDEXED_MERCHANT | CREATE TABLE `INDEXED_MERCHANT` (
`ID` varchar(50) NOT NULL,
`NAME` varchar(200) DEFAULT NULL,
`ONLINE_STATUS` varchar(10) NOT NULL,
`A` varchar(100) DEFAULT NULL,
`B` varchar(200) DEFAULT NULL,
PRIMARY KEY (`ID`),
KEY `NAME` (`NAME`,`A`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
This means both of my columns are recognized as indexed keys. But when I execute the following command, the Extra column suggests the query isn't using the index. How should I achieve my goal?
Executed query: EXPLAIN EXTENDED UPDATE INDEXED_MERCHANT SET ONLINE_STATUS = '0' WHERE NAME = 'A 205' AND A = 'P 205';
Result:
+----+-------------+------------------+-------+---------------+------+---------+-------------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------+-------+---------------+------+---------+-------------+------+----------+-------------+
| 1 | SIMPLE | INDEXED_MERCHANT | range | NAME | NAME | 906 | const,const | 1 | 100.00 | Using where |
+----+-------------+------------------+-------+---------------+------+---------+-------------+------+----------+-------------+
Your query is hitting the NAME index, as is evident from the key column of the EXPLAIN output.
Here is the explanation of the key and Extra columns from the MySQL documentation:
key
The key column indicates the key (index) that MySQL actually decided to use. If MySQL decides to use one of the possible_keys indexes to look up rows, that index is listed as the key value.
Using where
A WHERE clause is used to restrict which rows to match against the next table or send to the client. Unless you specifically intend to fetch or examine all rows from the table, you may have something wrong in your query if the Extra value is not Using where and the table join type is ALL or index. Even if you are using an index for all parts of a WHERE clause, you may see Using where if the column can be NULL.
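Based on the last sentence of that quote, one possible (untested) tweak, assuming these columns never actually need to hold NULL, is to declare them NOT NULL; the index is already being used either way, this would only affect the "Using where" note:
ALTER TABLE INDEXED_MERCHANT
  MODIFY NAME varchar(200) NOT NULL,
  MODIFY A varchar(100) NOT NULL;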

Optimize COUNT(*) with MATCH ... AGAINST

I am using COUNT(*) with MATCH() ... AGAINST(). My specific query is as follows:
SELECT COUNT(*) FROM `source_code` WHERE MATCH(`html`) AGAINST ('title');
I get results after a few seconds:
+----------+
| count(*) |
+----------+
| 17346 |
+----------+
1 row in set (16.30 sec)
After running the query multiple times, the query always takes around 16 seconds to complete.
Is there any way to speed up this query? Why isn't query cache caching the results of this query?
In case it's helpful, here is the EXPLAIN and CREATE TABLE statements:
EXPLAIN SELECT COUNT(*) FROM `source_code` WHERE MATCH(`html`) AGAINST ('title');
+----+-------------+-------------+----------+---------------+--------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+----------+---------------+--------+---------+------+------+-------------+
| 1 | SIMPLE | source_code | fulltext | html | html | 0 | | 1 | Using where |
+----+-------------+-------------+----------+---------------+--------+---------+------+------+-------------+
Looks like the index is being used. (Maybe the overhead is that the query is still Using where? Is it normal for key_len to be 0?)
SHOW CREATE TABLE `source_code`;
CREATE TABLE `source_code` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`url` varchar(255) NOT NULL,
`domain` varchar(255) DEFAULT NULL,
`title` varchar(255) DEFAULT NULL,
`html` longtext,
`crawled` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `url` (`url`),
KEY `crawled` (`crawled`),
KEY `domain` (`domain`),
FULLTEXT KEY `html` (`html`)
) ENGINE=MyISAM AUTO_INCREMENT=78707 DEFAULT CHARSET=latin1
Nothing too crazy in the CREATE TABLE statement.
Unlike many other databases, MySQL is very good at handling SELECT COUNT(*) queries when there is an index that covers the entire table. In your case you do have an index that covers the whole table, but it's different from a normal primary key since it's a full-text index.
You can see that the query analyzer tries to use that index (possible_keys) but it's actually unable to use it.
The key_len column indicates the length of the key that MySQL decided
to use. The length is NULL if the key column says NULL. Note that the
value of key_len enables you to determine how many parts of a
multiple-part key MySQL actually uses
It's most unusual for key_len to be 0 instead of NULL, but what it means is that 0 parts of your index were used for the query.
As for how to optimize this: the answer is that it's very difficult. The only things I can think of are to create a stopword list and to set the minimum word length. Both of these go into your my.cnf file.
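A minimal my.cnf sketch of those two ideas for a MyISAM full-text index (the stopword file path is just an example, and after changing either setting the full-text index has to be rebuilt, e.g. with REPAIR TABLE source_code QUICK):
[mysqld]
ft_min_word_len = 4                              # raise to skip shorter words
ft_stopword_file = /etc/mysql/ft_stopwords.txt   # one stopword per line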

mysql file sort happens even with indexes -- How can I fix

I have a simple query below that I want to make sure runs fast as the table grows. I did an EXPLAIN on the query and it says Using where; Using filesort. Is there a way to get rid of the filesort? The table has only about 25 rows in it now, but it could end up with 300 or more.
mysql> show create table phppos_categories;
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| phppos_categories | CREATE TABLE `phppos_categories` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`parent_id` int(11) DEFAULT NULL,
`name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`),
KEY `phppos_categories_ibfk_1` (`parent_id`),
KEY `name` (`name`),
KEY `parent_id` (`parent_id`),
CONSTRAINT `phppos_categories_ibfk_1` FOREIGN KEY (`parent_id`) REFERENCES `phppos_categories` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT * FROM (`phppos_categories`) WHERE `parent_id` = 5 ORDER BY `name` asc;
+----+-----------+-------------------+
| id | parent_id | name |
+----+-----------+-------------------+
| 3 | 5 | Basketball Shoes |
| 7 | 5 | Basketball Shorts |
+----+-----------+-------------------+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM (`phppos_categories`) WHERE `parent_id` = 5 ORDER BY `name` asc;
+----+-------------+-------------------+------+------------------------------------+--------------------------+---------+-------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+------+------------------------------------+--------------------------+---------+-------+------+-----------------------------+
| 1 | SIMPLE | phppos_categories | ref | phppos_categories_ibfk_1,parent_id | phppos_categories_ibfk_1 | 5 | const | 2 | Using where; Using filesort |
+----+-------------+-------------------+------+------------------------------------+--------------------------+---------+-------+------+-----------------------------+
1 row in set (0.00 sec)
mysql>
You can potentially remove the filesort here by adding a multi-column index on (parent_id, name).
ALTER TABLE phppos_categories ADD INDEX (parent_id,name);
Generally speaking, MySQL will only use a single index per table when it has to sort with one of them (if you are simply using two indexes to filter the table, it MAY use both via an index merge, but often that is not the case anyway). The solution is to create a single index covering all of the columns you need to query.
Second to this, MySQL can only do a "range" search or "sort" with the last column in the index that it uses. Any columns before that must be an exact equality match.
On that basis we can create an index with parent_id first, which has an exact equality match (=5) and then on name which is your order constraint.
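To make that rule concrete, here is a small sketch against this table, assuming the (parent_id, name) index above has been added (the second query is hypothetical, only there to show the contrast):
-- Equality on the leading column: the index delivers rows already ordered by name, so no filesort is expected.
SELECT * FROM phppos_categories WHERE parent_id = 5 ORDER BY name ASC;
-- Range on the leading column: the name values are no longer in one contiguous index order, so a sort is still needed.
SELECT * FROM phppos_categories WHERE parent_id > 5 ORDER BY name ASC;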
The main thing to be aware of is that you don't want to add more indexes than necessary; an index to suit every single possible query may not be sensible, especially given the additional storage space and the work required to keep each index up to date. As part of that trade-off, you also need to consider how often the table is updated: if it is seldom updated, then extra indexes are less of an issue.
See here for some more information:
https://dev.mysql.com/doc/refman/5.6/en/order-by-optimization.html
You are seeing 'using filesort' because you are ordering by the name column which is a varchar(255) field.
ORDER BY `name` asc;
Explain reports that the number of records examined is only 2, and the index on the parent_id is being used. There are 2 rows in your result set. Therefore, MySQL did not do any extra or unnecessary work.
Personally, I would not worry about trying to get rid of the 'using filesort' in this case. Even if you had 300 rows in the table, you are still limiting the result set with where on an indexed field (parent_id), and the number of examined rows will equal the number in your result set.

Simple ordered MySQL query running very slow on a large table

I have a table with ~ 1.500.000 records:
CREATE TABLE `item_locale` (
`item_id` bigint(20) NOT NULL,
`language` int(11) NOT NULL,
`name` varchar(256) COLLATE utf8_czech_ci NOT NULL,
`text` text COLLATE utf8_czech_ci,
PRIMARY KEY (`item_id`,`language`),
KEY `name` (`name`(255))
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_czech_ci;
With item_id, language as the primary key and an index on name with a prefix length of 255.
With following query:
select item_id, name from item_locale order by name limit 50;
The select takes around 3 seconds even though only 50 rows were required.
What can I do to speed up such query?
EDIT: Some of you suggested adding an INDEX. As I mentioned above, the name column is already indexed with a prefix length of 255.
I ran EXPLAIN on the query:
+----+-------------+---------------+------+---------------+------+---------+------+---------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+------+---------------+------+---------+------+---------+----------------+
| 1 | SIMPLE | item_locale | ALL | NULL | NULL | NULL | NULL | 1558653 | Using filesort |
+----+-------------+---------------+------+---------------+------+---------+------+---------+----------------+
The strange thing is that it seems not to use any index...
Retrieving 50 records is heavier too; limit it to 10, since you are also using ORDER BY.
Try using a query hint:
select item_id, name
from item_locale USE INDEX FOR ORDER BY (name)
order by name limit 50;
Also try to use:
select item_id, name
from item_locale FORCE INDEX (name)
order by name limit 50;
In the end, there was some kind of problem with the indexes - I dropped them all and recreated them, and it finally works. Thanks.
Apply an index on the name field, which might speed it up a bit.

How to optimize a query that's using group by on a large number of rows

The table looks like this:
CREATE TABLE `tweet_tweet` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`text` varchar(256) NOT NULL,
`created_at` datetime NOT NULL,
`created_date` date NOT NULL,
...
`positive_sentiment` decimal(5,2) DEFAULT NULL,
`negative_sentiment` decimal(5,2) DEFAULT NULL,
`entity_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tweet_tweet_entity_created` (`entity_id`,`created_at`)
) ENGINE=MyISAM AUTO_INCREMENT=1097134 DEFAULT CHARSET=utf8
The explain on the query looks like this:
mysql> explain SELECT `tweet_tweet`.`entity_id`,
STDDEV_POP(`tweet_tweet`.`positive_sentiment`) AS `sentiment_stddev`,
AVG(`tweet_tweet`.`positive_sentiment`) AS `sentiment_avg`,
COUNT(`tweet_tweet`.`id`) AS `tweet_count`
FROM `tweet_tweet`
WHERE `tweet_tweet`.`created_at` > '2010-10-06 16:24:43'
GROUP BY `tweet_tweet`.`entity_id` ORDER BY `tweet_tweet`.`entity_id` ASC;
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | tweet_tweet | ALL | NULL | NULL | NULL | NULL | 1097452 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
1 row in set (0.00 sec)
About 300k rows are added to the table every day. The query runs in about 4 seconds right now, but I want to get it down to around 1 second, and I'm afraid it will take exponentially longer as the days go on. The total number of rows in tweet_tweet is currently only a little over 1M, but it will be growing fast.
Any thoughts on optimizing this? Do I need any more indexes? Should I be using something like Cassandra instead of MySQL? =)
You may try reordering the fields in the index (i.e. KEY tweet_tweet_entity_created (created_at, entity_id)). That will allow MySQL to use the index to reduce the number of actual rows that need to be grouped and ordered.
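A sketch of that change (two separate statements, untested):
ALTER TABLE tweet_tweet DROP KEY tweet_tweet_entity_created;
ALTER TABLE tweet_tweet ADD KEY tweet_tweet_entity_created (created_at, entity_id);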
You're not using the index tweet_tweet_entity_created. Change your query to:
explain SELECT `tweet_tweet`.`entity_id`,
STDDEV_POP(`tweet_tweet`.`positive_sentiment`) AS `sentiment_stddev`,
AVG(`tweet_tweet`.`positive_sentiment`) AS `sentiment_avg`,
COUNT(`tweet_tweet`.`id`) AS `tweet_count`
FROM `tweet_tweet` FORCE INDEX (tweet_tweet_entity_created)
WHERE `tweet_tweet`.`created_at` > '2010-10-06 16:24:43'
GROUP BY `tweet_tweet`.`entity_id` ORDER BY `tweet_tweet`.`entity_id` ASC;
You can read more about index hints in the MySQL manual http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Sometimes MySQL's query optimizer needs a little help.
MySQL has a dirty little secret: a multi-column index can only be used when the query constrains a leftmost prefix of its columns, starting with the first one. I've made tables that used unique keys and foreign keys, and I often had to add a separate index for one or more of the columns.
I suggest adding an extra index to just created_at at a minimum. I do not know if adding indexes to the aggregate columns will also speed things up.
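A minimal sketch of that suggestion (the index name idx_created_at is my own):
ALTER TABLE tweet_tweet ADD INDEX idx_created_at (created_at);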
If your MySQL version is 5.1 or higher, you can consider the partitioning option for large tables.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
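For illustration, a hedged sketch of what RANGE partitioning by date might look like here (partition names and boundaries are examples only; MySQL requires every unique key, including the primary key, to contain the partitioning column, so the primary key would have to be widened first):
ALTER TABLE tweet_tweet
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (id, created_date);
ALTER TABLE tweet_tweet
  PARTITION BY RANGE (TO_DAYS(created_date)) (
    PARTITION p201009 VALUES LESS THAN (TO_DAYS('2010-10-01')),
    PARTITION p201010 VALUES LESS THAN (TO_DAYS('2010-11-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
  );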