Optimizing a MySql query: Query only having 'using where' in explain extra - mysql

I have table with following schema:
+----------------------+--------------+--------------+-----+---------+-----------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+--------------+--------------+-----+---------+-----------+
| request_id | bigint(20) | NO | PRI | | |
| marketplace_id | int(11) | NO | PRI | | |
| feed_attribute_name | varchar(256) | NO | PRI | | |
| full_update_count | int(11) | NO | | | |
| partial_update_count | int(11) | NO | | | |
| ptd | varchar(256) | NO | PRI | | |
| processed_date | datetime | NO | PRI | | |
+----------------------+--------------+--------------+-----+---------+-----------+
and I am querying it like this:
EXPLAIN SELECT SUM(full_update_count) as total FROM
x.attribute_usage_information WHERE marketplace_id=6
AND ptd='Y' AND processed_date>2013-12-31 AND
feed_attribute_name='abc'
The query plan is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE X ALL 1913668816 Using where
I am new to query optimization so my inferences can be wrong.
I am surprised that it is not using index which can be a reason for its slow execultion(around an hour). The table size is of order of 10^10. Can this query be rewritten so that it uses index because where clause is part a subset of the primary key set for the table?
EDIT: SHOW INDEX result
+---------------------------+------------+------------+--------------+----------------+------
|Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation Cardinality Sub_part Packed Null Index_type Comment
|attribute_usage_information | 0 | PRIMARY | 1 | request_id | A 2901956 BTREE
|attribute_usage_information | 0 | PRIMARY | 2 | marketplace_id | A 2901956 BTREE
|attribute_usage_information | 0 | PRIMARY | 3 | | feed_attribute_name A 273613033 BTREE
|attribute_usage_information | 0 | PRIMARY | 4 | ptd | A 1915291236 BTREE
|attribute_usage_information | 0 | PRIMARY | 5 | processed_date | A 1915291236 BTREE
EDIT 2: SHOW GRANT RESULT
GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, RELOAD, PROCESS, REFERENCES, INDEX, ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, LOCK TABLES, EXECUTE, REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, CREATE ROUTINE, ALTER ROUTINE, CREATE USER, EVENT, TRIGGER ON *.* TO 'data_usage_rw'#'%' IDENTIFIED BY PASSWORD *** WITH GRANT OPTION

Your query:
SELECT SUM(full_update_count) as total
FROM x.attribute_usage_information
WHERE marketplace_id=6 AND ptd='Y' AND processed_date>2013-12-31 AND
feed_attribute_name='abc';
The "using where" is saying that MySQL is doing a full table scan. This is a simple query, so the only optimization approach is to create an index that reduces the number of rows being processed. he best index for this query is x.attribute_usage_information(marketplace_id, ptd, feed_attribute_name, processed_date, full_update_count).
You can create it as:
create index attribute_usage_information_idx on x.attribute_usage_information(marketplace_id, ptd, feed_attribute_name, processed_date, full_update_count);
By including full_update_count, this is a covering index. That further speeds the query because all columns used in the query are in the index. The execution engine does not need to look up values on the original data pages.

Cover your WHERE conditions with a composite index(marketplace_id,ptd,processed_date,feed_attribute_name)
ALTER TABLE `tablename` ADD INDEX (marketplace_id,ptd,processed_date,feed_attribute_name)
Be patient,it will take a while.

Related

Drop Column from an index mysql

i'am having these indexes on my table:
mysql> SHOW INDEXES from sous_categories;
+-----------------+------------+----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-----------------+------------+----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| sous_categories | 0 | PRIMARY | 1 | id_sous_categorie | A | 16 | NULL | NULL | | BTREE | | | YES | NULL |
| sous_categories | 0 | PRIMARY | 2 | categorie_id | A | 16 | NULL | NULL | | BTREE | | | YES | NULL |
+-----------------+------------+----------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.01 sec)
and i want to remove the second column, by checking MySQL INDEX STATMENT, i figured out that i can drop only by index_name, but in my case both has the same key_name.
How can i do that please ? (I know it's possible because i can do it with phpmyadmin but i want to know how to do it with command line)
ALTER TABLE sous_categories
DROP PRIMAR KEY,
ADD PRIMARY KEY(id_sous_categorie);
However, to be the PK, id_sous_categorie must be Unique (no dups) and NOT NULL.
The only effects will be
Time spent rebuilding the table. (Any change to the PK requires a rebuild.)
A fatal error if id_sous_categorie is not already unique. (Previously the pair of columns was constrained to be unique, but not each column.)

Why would cardinality of an index in a restored table be different from cardinality in the original table?

I'm testing a proprietary tool that dumps a table in a MySQL RDS to the parquet format, and then restores it into another MySQL RDS.
Both tables have the same amount of rows:
mysql> SELECT COUNT(*) FROM fox_owners;
+----------+
| COUNT(*) |
+----------+
| 118950 |
+----------+
The table itself is configured the same way in both cases:
mysql> SHOW CREATE TABLE fox_owners;
+------------+-------------------------------------------------------+
| Table | Create Table |
+------------+-------------------------------------------------------+
| fox_owners | CREATE TABLE `fox_owners` (
`name` mediumtext,
`owner_id` bigint NOT NULL,
PRIMARY KEY (`owner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+------------+-------------------------------------------------------+
So far so good, right?
However, the table sizes are different.
The original:
+----------+----------------------+------------+
| Database | Table | Size in MB |
+----------+----------------------+------------+
| stam_db | fox_owners | 5582.52 |
The restored one:
+----------+----------------------+------------+
| Database | Table | Size in MB |
+----------+----------------------+------------+
| stam_db | fox_owners | 5584.52 |
The restored one is 2MB bigger!
However, what's really bugging me, is the change in cardinality of the indexes between the 2 tables.
Original:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 118728 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Restored:
mysql> SHOW INDEX FROM fox_owners;
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| fox_owners | 0 | PRIMARY | 1 | owner_id | A | 117518 | NULL | NULL | | BTREE | | | YES | NULL |
+------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
Why would cardinality drop from 118728 to 117518?
If the restored table is less unique than the original - isn't this a clear sign that this table is different? How can I verify that these 2 tables in separate RDS databases have identical content?
And shouldn't the cardinality be 118950 in both of them anyway, since for a table with a single primary key column, the cardinality must be equal to the number of rows in the table?
I've ran ANALYZE TABLE on both tables, the values didn't change.
No Problem.
The "cardinality" is determined by making a small number of 'random' probes into the table. This leads to estimates. Sometimes the estimates are off by a factor of two or even more. 118728 and 117518 are unusually close to each other.
When loading/copying/altering a table, the BTrees are rebuilt. This leads to a likely variation of how the blocks of the BTree are laid out. So, it is normal to see the size (on disk) of a table change. A change of a factor of 2 is rare for this.

Composite index not used in mysql

According to MySQL docs a composite index will still be used if the leftmost fields are part of the criteria. However, this table will not join correctly with the primary key; I had to add another index of the left two fields which is then used.
One of the tables is memory, and I know that by default memory uses a hash index which can't be used for group/order. However I'm using all rows of the memory table and not the index, so I don't think that relates to the problem.
What am I missing?
mysql> show create table pr_temp;
| pr_temp | CREATE TEMPORARY TABLE `pr_temp` (
`player_id` int(10) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`) USING BTREE,
KEY `insert_date` (`insert_date`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |
mysql> show create table player_game_record;
| player_tank_record | CREATE TABLE `player_game_record` (
`player_id` int(10) unsigned NOT NULL,
`game_id` smallint(5) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`,`game_id`),
KEY `insert_date` (`insert_date`),
KEY `player_date` (`player_id`,`insert_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 DATA DIRECTORY='...' INDEX DIRECTORY='...' |
mysql> explain select pgr.* from player_game_record pgr inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY,insert_date,player_date | player_date | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 21 | |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
2 rows in set (0.00 sec)
mysql> explain select pgr.* from player_game_record pgr force index (primary) inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY | PRIMARY | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 2873031 | |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
2 rows in set (0.00 sec)
I think the primary key should work, with the two left columns (player_id, insert_date) being used. However it will use the player_date index by default, and if I force it to use the primary index it looks like it's only using one field rather than both.
Update2: Mysql version 5.5.27-log
Update3:
(note this is after removing the player_date index while trying some other tests)
mysql> show indexes in player_game_record;
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_game_record | 0 | PRIMARY | 1 | player_id | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 2 | insert_date | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 3 | game_id | A | 576276246 | NULL | NULL | | BTREE | | |
| player_game_record | 1 | insert_date | 1 | insert_date | A | 33304 | NULL | NULL | | BTREE | | |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (1.08 sec)
mysql> select count(*) from player_game_record;
+-----------+
| count(*) |
+-----------+
| 576276246 |
+-----------+
1 row in set (0.00 sec)
I agree that your use of the MEMORY storage engine for one of the tables should not at all be an issue here, since we're talking about the other table.
I also agree that the leftmost prefix of an index can be used exactly how you are trying to use it, and I cannot think of any reason why the primary key could not be used in exactly the same way as any other index.
This has been a head-scratcher. The new index you created "should" be the same as the left side of the primary key, so why don't they behave the same way? I have two thoughts, both of which lead me to the same recommendation, even though I am not as familiar with the internals of MyISAM as I am with InnoDB. (As an aside, I'd recommend InnoDB over MyISAM.)
The index on your primary key was presumably on the table when you began inserting data, while the new index was added while most or all of the data was already there. This suggests that your new index is nice and cleanly-organized internally, while your primary key index may be highly fragmented, having been built as the data was loaded.
The row count the optimizer shows is based on index statistics, which may be inaccurate on your primary key due to the insert order.
The fragmentation theory may explain why querying with the primary key as your index is not as fast; the index statistics theory may explain why the optimizer comes up with such a different row count and it may explain why the optimizer might have been choosing a full table scan instead of using that index (which is only a guess, since we don't have the explain available).
The thing I would suggest based on these two thoughts is running OPTIMIZE TABLE on your table. If it took 12 hours to build that new index, then optimizing the table may very possibly take that long or longer.
Possibly helpful: http://www.dbasquare.com/2012/07/09/data-fragmentation-problem-in-mysql-myisam/

How can a 'WHERE column LIKE "%expression%" ' perform better than a MATCH(column) AGAINST("expression") in MySQL?

I've run into a serious MySQL performance bottleneck which I'm unable to understand and resolve. Here are the table structures, indexes and record counts (bear with me, it's only two tables):
mysql> desc elggobjects_entity;
+-------------+---------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------------+------+-----+---------+-------+
| guid | bigint(20) unsigned | NO | PRI | NULL | |
| title | text | NO | MUL | NULL | |
| description | text | NO | | NULL | |
+-------------+---------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)
mysql> show index from elggobjects_entity;
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| elggobjects_entity | 0 | PRIMARY | 1 | guid | A | 613637 | NULL | NULL | | BTREE | |
| elggobjects_entity | 1 | title | 1 | title | NULL | 131 | NULL | NULL | | FULLTEXT | |
| elggobjects_entity | 1 | title | 2 | description | NULL | 131 | NULL | NULL | | FULLTEXT | |
+--------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
3 rows in set (0.00 sec)
mysql> select count(*) from elggobjects_entity;
+----------+
| count(*) |
+----------+
| 613637 |
+----------+
1 row in set (0.00 sec)
mysql> desc elggentity_relationships;
+--------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| guid_one | bigint(20) unsigned | NO | MUL | NULL | |
| relationship | varchar(50) | NO | MUL | NULL | |
| guid_two | bigint(20) unsigned | NO | MUL | NULL | |
| time_created | int(11) | NO | | NULL | |
+--------------+---------------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
mysql> show index from elggentity_relationships;
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
| elggentity_relationships | 0 | PRIMARY | 1 | id | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 1 | guid_one | A | NULL | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 2 | relationship | A | NULL | NULL | NULL | | BTREE | |
| elggentity_relationships | 0 | guid_one | 3 | guid_two | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 1 | relationship | 1 | relationship | A | 11408236 | NULL | NULL | | BTREE | |
| elggentity_relationships | 1 | guid_two | 1 | guid_two | A | 11408236 | NULL | NULL | | BTREE | |
+--------------------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+
6 rows in set (0.00 sec)
mysql> select count(*) from elggentity_relationships;
+----------+
| count(*) |
+----------+
| 11408236 |
+----------+
1 row in set (0.00 sec)
Now I'd like to use an INNER JOIN on those two tables and perform a full text search.
Query:
SELECT
count(DISTINCT o.guid) as total
FROM
elggobjects_entity o
INNER JOIN
elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
((MATCH (o.title, o.description) AGAINST ('scelerisque' )))
This gave me a 6 minute (!) response time.
On the other hand this one
SELECT
count(DISTINCT o.guid) as total
FROM
elggobjects_entity o
INNER JOIN
elggentity_relationships r on (r.relationship="image" AND r.guid_one = o.guid)
WHERE
((o.title like "%scelerisque%") OR (o.description like "%scelerisque%"))
returned the same count value in 0.02 seconds.
How is that possible? What am I missing here?
(MySQL info: mysql Ver 14.14 Distrib 5.1.49, for debian-linux-gnu (x86_64) using readline 6.1)
EDIT
EXPLAINing the first query (using match .. against) gives:
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
| 1 | SIMPLE | r | ref | guid_one,relationship | relationship | 152 | const | 6145 | Using where |
| 1 | SIMPLE | o | fulltext | PRIMARY,title | title | 0 | | 1 | Using where |
+----+-------------+-------+----------+-----------------------+--------------+---------+-------+------+-------------+
while the second query (using LIKE "%..%"):
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
| 1 | SIMPLE | r | ref | guid_one,relationship | relationship | 152 | const | 6145 | Using where |
| 1 | SIMPLE | o | eq_ref | PRIMARY | PRIMARY | 8 | elgg1710.r.guid_one | 1 | Using where |
+----+-------------+-------+--------+-----------------------+--------------+---------+---------------------+------+-------------+
By combining your experience and EXPLAIN's results, it seems that fulltext index is not as useful as you expect in this particular case. This depends on particular data in your database, on database structure or/and particular query.
Usually database engines use no more than one index per table. So when the table has more than one index, query optimizer tries to use the better one. But optimizer is not always clever enough.
EXPLAIN's output shows that database query optimizer decided to use indexes for relationship and title. The relationship filter reduces table elggentity_relationships to 6145 rows. And the title filter reduces the table elggobjects_entity to 72697 rows. Then MySQL needs to join those tables (6145 x 72697 = 446723065 filtering operations) without using any index because indexes have already been used for filtering. In this case this can be too much. MySQL can even make a decision to keep intermediate calculations in the hard drive by trying to keep enough free space in memory.
Now let's take a look into another query. It uses relationship and PRIMARY KEY (of table elggobjects_entity) as its indexes. The relationship filter reduces table elggentity_relationships to 6145 rows. By joining those tables on PRIMARY KEY index, the result gets only 3957 rows. This is not much for the last filter (i.e. LIKE "%scelerisque%"), even if index is NOT used for this purpose at all.
As you can see the speed much depends on indexes selected for a query. So, in this particular case the PRIMARY KEY index is much more useful than fulltext title index, because PRIMARY KEY has bigger impact for result reduction than title.
MySQL is not always clever to set the right indexes. We can do this manually, by using clauses like IGNORE INDEX (index_name), FORCE INDEX (index_name), etc.
But in your case the problem is that if we use MATCH() AGAINST() in a query then the fulltext index is required, because MATCH() AGAINST() doesn't work without fulltext index at all. So this is the main reason why MySQL has chosen incorrect indexes for the query.
UPDATE
OK, I did some investigation.
Firstly, you may try to force MySQL to use guid_one index instead of relationship on table elggentity_relationships: USE INDEX (guid_one).
But for even better performance I think you can try to create one index for the composition of two columns (guid_one, membership). Current index guid_one is very similar, but for 3 columns, not for 2. In this query there are only 2 columns used. In my opinion after index creation MySQL should automatically use the right index. If not, force MySQL to use it.
Note: After index creation don't forget to remove old USE INDEX instruction from your query, because this may prevent query from using the newly created index. :)

Is it a problem that I somehow have managed to get two indexes with the same name in a MySQL table?

Somehow I managed to get two indexes named user_id, as shown below. Should I drop, rename and rebuild one of them, or is this no problem?
SHOW INDEXES FROM core_item;
+-----------+------------+-----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+-----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
| core_item | 0 | PRIMARY | 1 | id | A | 593642 | NULL | NULL | | BTREE | |
| core_item | 0 | user_id | 1 | user_id | A | 11416 | NULL | NULL | | BTREE | |
| core_item | 0 | user_id | 2 | product_id | A | 593642 | NULL | NULL | | BTREE | |
+-----------+------------+-----------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
It's a single composite index covering 2 columns.
I think that output of SHOW CREATE TABLE core_item is easier to understand.
You have just one index, but it is across two fields - it is a composite key on user_id and product_id. It would be created like this:
ALTER TABLE core_item ADD INDEX `user_id` (`user_id`, `product_id`);
It might be worth renaming it to something else to save any future confusion, but only if the rename will not affect any existing queries that specify indexes directly.