I am having troubles with a particular query being slow. Although everything is heavily indexed, some similar queries working fine and the indexes are used, the query still is slow as hell. I cannot understand why, so maybe anybody can help.
Just for the prerequisites: the write speed of the underlying table does not matter. The table contains ~3.5 million entries but I guess MySQL should handle that just fine.
The query that is being slow takes about 2s
SELECT DISTINCT t.`tag_3` FROM `image_tags` t
WHERE t.`type` = 1 AND t.`category` LIKE "00%" AND tag_1 = "0"
--- DESCRIBE OUTPUT
--- The used index thirdtag is just an index defined as (type, category, tag_1, tag_3)
--- The actual result is 201 rows
+----+-------------+-------+- -----------------------+----------+---------+------+---------+-------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-----------------+----------+---------+------+---------+-------------------------------------------+
| 1 | SIMPLE | t | range | [... A LOT ...] | thirdtag | 31 | NULL | 1652861 | Using where; Using index; Using temporary |
+----+-------------+-------+-------+-----------------+----------+---------+------+---------+-------------------------------------------+
The only thing that's standing out is the enormous amount of rows involved. If you compare with the 2 fast queries I attached to the end of this question it is literally the only thing different (at least from the first one). So most probably that's the problem. But that's how the data is given to me so I need to work with that. I thought if involved in the index mysql could handle the data just fine.
Does anybody have a suggestion how to optimize the query? Any suggestions if i could use different indexes that suit more to the query?
For comparison these 2 similar queries work blazing fast
--- just a longer category string resulting in fewer results
SELECT DISTINCT t.`tag_3` FROM `image_tags` t
WHERE t.`type` = 1 AND t.`category` LIKE "0000%" AND tag_1 = "0"
--- and additional where clause
SELECT DISTINCT t.`tag_3` FROM `image_tags` t
WHERE t.`type` = 1 AND t.`category` LIKE "00%" AND tag_1 = "0" and tag_2 = ""
The table (it has a lot of indexes probably too long to paste).
+----------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| image | char(8) | NO | MUL | NULL | |
| category | varchar(6) | YES | MUL | NULL | |
| type | tinyint(1) | NO | MUL | NULL | |
| tag_1 | char(3) | NO | MUL | NULL | |
| tag_2 | char(3) | NO | MUL | NULL | |
| tag_3 | char(3) | NO | MUL | NULL | |
| tag_4 | char(3) | NO | MUL | NULL | |
| tag_5 | char(3) | NO | MUL | NULL | |
| tag_6 | char(3) | NO | MUL | NULL | |
+----------+------------------+------+-----+---------+----------------+
Please provide SHOW CREATE TABLE, it is more descriptive than DESCRIBE! In particular, I cannot see what indexes you have.
As My index cookbook explains, start the index with any fields that are '=', then you get one chance to add a 'range' comparison. Your category is a range, so
WHERE t.`type` = 1 AND t.`category` LIKE "00%" AND tag_1 = "0"
does not get past category in
INDEX(type, category, tag_1, tag_3)
For your 3 queries, these are the best indexes:
INDEX(type, tag_1, category)
INDEX(type, tag_1, category)
INDEX(type, tag_1, tag_2, category)
category should be last; the other columns can be in any order. Perhaps some one of your indexes handled the 3rd case?
it has a lot of indexes probably too long to paste
Probably most of them are unused. Keep in mind that INDEX(a) is unnecessary if you also have INDEX(a,b).
Related
I need to make this update query more efficient.
UPDATE #table_name# SET #column_name2# = 1 WHERE #column_name1# in (A list of data)
Right now it takes more than 2 minute to finish the job when my list of data is quite large. Here is the result of explain of this query:
+----+-------------+--------------+-------+---------------+---------+---------+------+--------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+-------+---------------+---------+---------+------+--------+------------------------------+
| 1 | SIMPLE | #table_name# | index | NULL | PRIMARY | 38 | NULL | 763719 | Using where; Using temporary |
+----+-------------+--------------+-------+---------------+---------+---------+------+--------+------------------------------+
In class, I was told that an OK query should at least have a type of range and is better to reach ref. Right now mine is index, which is the second slowest I think. I'm wondering if there's a way to optimize that.
Here is the table format:
+--------------------+-------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+-------------+------+-----+-------------------+-------+
| #column_name1# | varchar(12) | NO | PRI | | |
| #column_name2# | tinyint(4) | NO | | 0 | |
| #column_name3# | tinyint(4) | NO | | 0 | |
| ENTRY_TIME | datetime | NO | | CURRENT_TIMESTAMP | |
+--------------------+-------------+------+-----+-------------------+-------+
My friend suggested me that using exists rather than in clause may help. However, it looks like I cannot use exists like exists (A list of data)
For this query:
UPDATE #table_name#
SET #column_name2# = 1
WHERE #column_name1# in (A list of data);
You want an index on #table_name#(#column_name1#).
Do note that the number of records being updated has a very big impact on performance. If the "list of data" is really a subquery, then other methods are likely to be more helpful for improving performance.
I have the original table of 4 columns, described as follows:
+----------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+-------------+------+-----+---------+-------+
| FieldID | varchar(10) | NO | MUL | NULL | |
| PaperID | varchar(10) | NO | | NULL | |
| RefID | varchar(10) | NO | | NULL | |
| FieldID2 | varchar(10) | NO | MUL | NULL | |
+----------+-------------+------+-----+---------+-------+
I want to run a query with COUNT(*) and GROUP BY :
select FieldID, FieldID2, count(*) from nFPRF75_1 GROUP BY FieldID, FieldID2
I've created indexes on both column FieldID and column FieldID2, however, they seem to be ineffective. I have also tried OPTIMIZE table_name and created redundant indexes on these two columns (as is indicated by other optimization questions), unfortunately it didn't work out either.
Here is what I get from EXPLAIN:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+----------+---------------------------------+
| 1 | SIMPLE | nFPRF75_1 | ALL | NULL | NULL | NULL | NULL | 90412507 | Using temporary; Using filesort |
I wonder if there's anyway that I can use indexes in this query, or any other way to optimize it. Now it's of very low efficiency since there's lots of lines.
Thanks a lot for the help!
You should create a multi-column index of (FieldID, FieldID2).
Create an index of FieldID, FieldID2 if you are grouping by them. That must improve the speed.
Also, I recommend you change count(*) to count('myIntColumn') which improve the speed too.
I have a big table, with 670k rows and I'm running a SELECT with a lot of WHEREs to search and filter useful results, the thing is sometimes there are NO results with the selected filters, and the query just goes all over the table and takes a lot of time, I'd like to stop the query if there are no results found in, say, 30 seconds.
This is my query:
SELECT date, s.name, l.id, l.title,ratingsum,numvotes,keyword,tag
from news_links l
LEFT JOIN sources s on s.id = l.source
WHERE
l.date BETWEEN STR_TO_DATE(?,'%Y-%m-%d')
AND STR_TO_DATE(?,'%Y-%m-%d')
AND s.name like ?
AND ((numvotes-1) *?) <= l.ratingsum
AND numvotes > ?
AND matches = 1
AND tag >= ?
AND tag <= ?
AND (l.title like ? or l.keyword like ?)
AND category >= ?
AND category <= ?
order by date desc
limit ?,15
I tried running a sub-query instead of joining but it didn't speed up the query.
News table(640k rows)
-----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | UNI | NULL | auto_increment |
| link | varchar(450) | NO | PRI | NULL | |
| date | datetime | NO | MUL | NULL | |
| title | varchar(145) | NO | MUL | NULL | |
| source | int(11) | NO | MUL | NULL | |
| text | mediumtext | YES | | NULL | |
| numvotes | int(3) | NO | MUL | 0 | |
| ratingsum | int(3) | NO | | 0 | |
| matches | int(1) | NO | | 0 | |
| keyword | varchar(45) | YES | | NULL | |
| tag | int(1) | NO | | 0 | |
+-----------+--------------+------+-----+---------+----------------+
I have indexes set up on date,title,source,numvotes as well as the primary key on link
670k rows should run VERY fast in MySQL. You should have a closer look at your indices. Start adding a combined HASH index on news_links.source and news_links.matches:
ALTER TABLE news_links ADD INDEX myIdx1 USING HASH (source, matches)
What does EXPLAIN SELECT ... gives you with that?
After that you can try to improve the Performance further by including more Information in your index (Note that MySQL will use only one index per table). Add a BTREE index:
ALTER TABLE news_links ADD INDEX myIdx2 USING BTREE (source, matches, `date`)
BTREE will be good for range-queries (eg with a BETWEEN in it). HASH is good for equal/unequal conditions. If you want to index several columns with mixed conditions (range an equal) use BTREE
What does EXPLAIN SELECT ... gives you now?
Is there anyway to get better performance out of this.
select * from p_all where sec='0P00009S33' order by date desc
Query took 0.1578 sec.
Table structure is shown below. There are more than 100 Millions records in this table.
+------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------+---------------+------+-----+---------+-------+
| sec | varchar(10) | NO | PRI | NULL | |
| date | date | NO | PRI | NULL | |
| open | decimal(13,3) | NO | | NULL | |
| high | decimal(13,3) | NO | | NULL | |
| low | decimal(13,3) | NO | | NULL | |
| close | decimal(13,3) | NO | | NULL | |
| volume | decimal(13,3) | NO | | NULL | |
| unadjusted_close | decimal(13,3) | NO | | NULL | |
+------------------+---------------+------+-----+---------+-------+
EXPLAIN result
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
| 1 | SIMPLE | price_all | ref | PRIMARY | PRIMARY | 12 | const | 1731 | Using where |
+----+-------------+-----------+------+---------------+---------+---------+-------+------+-------------+
How can i speed up this query?
In your example, you do a SELECT *, but you only have an INDEX that contains the columns sec and date.
In result, MySQLs execution plan roughly looks like the following:
Find all rows that have sec = 0P00009S33 in the INDEX. This is fast.
Sort all returned rows by date. This is also possibly fast, depending on the size of your MySQL buffer. Here is possibly room for improvement by optimizing the sort_buffer_size.
Fetch all columns (= full row) for each returned row from the previous INDEX query. This is slow! see (1)
You can optimize this drastically by reducing the SELECTed fields to the minimum. Example: If you only need the open price, do only a SELECT sec, date, open instead of SELECT *.
When you identified the minimum columns you need to query, add a combined INDEX that contains exactly these colums (all columns involved - in the WHERE, SELECT or ORDER BY clause)
This way you can completely skip the slow part of this query, (3) in my example above. When the INDEX already contains all necessary columns, MySQLs optimizer can avoid looking up the full columns and serve your query directly from the INDEX.
Disclaimer: I'm unsure in which order MySQL executes the steps, possibly i ordered (2) and (3) the wrong way round. But this is not important to answer this question, though.
I have a question about, how to analyze a query to know performance of its (good or bad).
I searched a lot and got something like below:
SELECT count(*) FROM users; => Many experts said it's bad.
SELECT count(id) FROM users; => Many experts said it's good.
Please see the table:
+---------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-------------+------+-----+---------+----------------+
| userId | int(11) | NO | PRI | NULL | auto_increment |
| f_name | varchar(50) | YES | | NULL | |
| l_name | varchar(50) | YES | | NULL | |
| user_name | varchar(50) | NO | | NULL | |
| password | varchar(50) | YES | | NULL | |
| email | varchar(50) | YES | | NULL | |
| active | char(1) | NO | | Y | |
| groupId | smallint(4) | YES | MUL | NULL | |
| created_date | datetime | YES | | NULL | |
| modified_date | datetime | YES | | NULL | |
+---------------+-------------+------+-----+---------+----------------+
But when I try to using EXPLAIN command for that, I got the results:
EXPLAIN SELECT count(*) FROM `user`;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | user | index | NULL | groupId | 3 | NULL | 83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
EXPLAIN SELECT count(userId) FROM user;
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
| 1 | SIMPLE | user | index | NULL | groupId | 3 | NULL | 83 | Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)
So, the first thing for me:
Can I understand it's the same performance?
P/S: MySQL version is 5.5.8.
No, you cannot. Explain doesn't reflect all the work done by mysql, it just gives you a plan of how it will be performed.
What about specifically count(*) vs count(id). The first one is always not slower than the second, and in some cases it is faster.
count(col) semantic is amount of not null values, while count(*) is - amount of rows.
Probably mysql can optimize count(col) by rewriting into count(*) as well as id is a PK thus cannot be NULL (if not - it looks up for NULLS, which is not fast), but I still propose you to use COUNT(*) in such cases.
Also - the internall processes depend on used storage engine. For myisam the precalculated number of rows returned in both cases (as long as you don't use WHERE).
In the example you give the performance is identical.
The execution plan shows you that the optimiser is clever enough to know that it should use the Primary key to find the total number of records when you use count(*).
There is not significant difference when it comes on counting. The reason is that most optimizers will figure out the best way to count rows by themselves.
The performance difference comes to searching for values and lack of indexing. So if you search for a field that has no index assigned {f_name,l_name} and a field that has{userID(mysql automatically use index on primary keys),groupID(seems like foraign key)} then you will see the difference in performance.