I've got a huge table that looks like this:
CREATE TABLE `images` (
`image_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`data` mediumblob,
PRIMARY KEY (`user_id`,`image_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
on which I have to run a query that compresses the blob field. Will MySQL be able to use the index with the following query:
UPDATE images SET data = COMPRESS(data) WHERE (user_id = ? AND image_id = ?) OR (user_id = ? AND image_id = ?) OR (...) OR (...);
I have to do it like this, since there's no way I can update the whole table in a single query, and I can't update using only user_id.
EDIT: EXPLAIN doesn't work on an UPDATE, you guys know that, right? (Older MySQL versions only support EXPLAIN on SELECT, which is why the answers below rewrite the UPDATE as a SELECT.)
Yes, your update will use the index on the table, since the only columns referenced in the WHERE clause are the ones your primary key consists of.
If you are unsure of what a query will use, feel free to use the EXPLAIN command:
http://dev.mysql.com/doc/refman/5.0/en/explain.html
Use
EXPLAIN SELECT data FROM images
WHERE (user_id = ? AND image_id = ?)
OR (user_id = ? AND image_id = ?)
OR (...)
OR (...);
In your example, I'd expect MySQL to use the primary key. (With InnoDB the primary key is the clustered index: it stores all columns and is the only copy of the data on disk. Your table is MyISAM, where the primary key is a separate structure pointing into the data file, but it is still usable here.)
So yes, it will use the index. With a condition made up of ORs, I'd expect MySQL to scan the index rather than seek.
This is better checked with EXPLAIN than speculated about.
Before even running EXPLAIN, issue ANALYZE TABLE so the query optimizer has fresh statistics and the best chance of finding an optimal query plan.
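For this table, that is a one-liner (a minimal sketch, using the table name from the question):
-- Refresh the index statistics the optimizer relies on:
ANALYZE TABLE images;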
Yes, it will most probably use the index (unless the indexed columns have really low cardinality).
MySQL also supports this syntax:
(user_id, image_id) IN ((user1, image1), (user2, image2), (user3, image3))
but this one will not use the index (an implementation flaw in older versions; MySQL 5.7 improved the optimizer's handling of row constructors).
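Applied to the question's UPDATE, that syntax would look like this (the id values are illustrative):
UPDATE images
SET data = COMPRESS(data)
WHERE (user_id, image_id) IN ((1, 10), (1, 11), (2, 20));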
You can also use this query:
UPDATE (
SELECT user1 AS user_id, image1 AS image_id
UNION ALL
SELECT user2 AS user_id, image2 AS image_id
UNION ALL
SELECT user3 AS user_id, image3 AS image_id
) q
JOIN images i
ON (i.user_id, i.image_id) = (q.user_id, q.image_id)
SET i.data = COMPRESS(i.data)
which will also use the index.
Related
I have a many-to-many relationship table in a MySQL database
And this Query:
SELECT main_id FROM posts_tag
WHERE post_id IN ('134','140','187')
GROUP BY main_id
HAVING COUNT(DISTINCT post_id) = 3
There are ~5,300,000 rows in this table, and the query takes around 5 seconds (slower still if I add more ids to the search).
I want to ask if there is any way to make it faster.
EXPLAIN shows this: [the EXPLAIN output was posted as an image and is not reproduced here]
By the way, I later want to add more conditions such as NOT IN, and possibly JOIN new tables that have the same structure but different data. That can wait, though; first I want to know if there is any way to make this simple query faster.
Any advice would be helpful, even another method, or structure etc.
PS: Hardware is an Intel Core i9 3.6GHz, 64GB RAM, 480GB SSD, so I don't think the server specs are the problem.
Use a "composite" and "covering" index:
INDEX(post_id, main_id)
And get rid of INDEX(post_id) since it will then be redundant.
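A sketch of the change (assuming your existing single-column index is named post_id; check SHOW CREATE TABLE for the actual name):
ALTER TABLE posts_tag
    ADD INDEX post_main (post_id, main_id),
    DROP INDEX post_id;  -- assumed name of the old single-column index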
"Covering" helps speed up a query.
Assuming this is a normal "many-to-many" table, then:
CREATE TABLE post_main (
    post_id INT UNSIGNED NOT NULL,  -- similar to `id` in table `posts`; match its type
    main_id INT UNSIGNED NOT NULL,  -- similar to `id` in table `main`; match its type
    PRIMARY KEY(post_id, main_id),
    INDEX(main_id, post_id)
) ENGINE=InnoDB;
There is no need for AUTO_INCREMENT anywhere in a many-to-many table.
(You could add FK constraints, but I say 'why bother'.)
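If you do want them, a minimal sketch (assuming `id` is the primary key of both referenced tables, as the comments above suggest):
ALTER TABLE post_main
    ADD FOREIGN KEY (post_id) REFERENCES posts(id),
    ADD FOREIGN KEY (main_id) REFERENCES main(id);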
More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
And NOT IN
This gets a bit tricky. I think this is one way; there may be others.
SELECT main_id
FROM post_main AS x
WHERE post_id IN (244,229,193,93,61)
GROUP BY main_id
HAVING COUNT(*) = 5
   AND NOT EXISTS ( SELECT 1
        FROM post_main
        WHERE main_id = x.main_id
          AND post_id IN (92,10,234) );
Alexfsk, the query on the second line of your question has the IN values surrounded by single quotes. When the column is defined as INT, MEDIUMINT, or any other integer datatype, quoting the values causes a datatype conversion on every row considered and delays completion of your query.
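The fix is simply to drop the quotes so the literals match the column type:
SELECT main_id FROM posts_tag
WHERE post_id IN (134, 140, 187)
GROUP BY main_id
HAVING COUNT(DISTINCT post_id) = 3;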
I've got simple query with big IN clause:
SELECT test_id FROM sample WHERE id IN (99 000 of ids);
The explain gives me this result:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE sample range PRIMARY PRIMARY 4 NULL 40 Using where
The id field is the primary key of the sample table (~320,000 rows) and test_id is a foreign key to the test table; both are MySQL InnoDB tables. The query takes over 2000 secs! I tried joining the tables instead, but that took a similar time. After some research I found this topic, but the accepted answer there only described what the problem might be (which I don't understand, to be honest :/ ) and offered no solution other than
If these are in cache, the query should run fast
How can I speed up this query?
Please be as precise as possible, because as I've found out, I'm an optimization novice.
EDIT 1:
SHOW CREATE TABLE sample
CREATE TABLE `sample` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`test_id` int(11) NOT NULL,
...
PRIMARY KEY (`id`),
KEY `sample_FI_1` (`test_id`),
... other keys ...,
CONSTRAINT `sample_FK_1` FOREIGN KEY (`test_id`) REFERENCES `test` (`id`),
... other foreign keys ...
) ENGINE=InnoDB AUTO_INCREMENT=315607 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci
The join I tried was something like this:
SELECT t.* FROM test t JOIN sample s ON t.id = s.test_id JOIN sample_x x ON s.id = x.sample_id WHERE x.field_id = '321' AND x.value LIKE '%smth%';
innodb_buffer_pool_size:
SELECT @@innodb_buffer_pool_size /1024/1024/1024;
-- result: 24.000000000000
Statuses:
SHOW TABLE STATUS FROM zert1442
Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment
...
sample InnoDB 10 Compact 357323 592 211632128 0 54837248 7340032 315647 2017-02-15 10:22:03 NULL NULL utf8_general_ci NULL
test InnoDB 10 Compact 174915 519 90865664 0 33947648 4194304 147167 2017-02-15 10:22:03 NULL NULL utf8_general_ci NULL
...
Here is your query:
SELECT t.*
FROM test t
JOIN sample s ON t.id = s.test_id
JOIN sample_x x ON s.id = x.sample_id
WHERE x.field_id = '321'
AND x.value LIKE '%smth%'
Unfortunately you didn't provide the SHOW CREATE TABLE output for the test or sample_x tables.
Regardless, add this index if it doesn't already exist:
ALTER TABLE sample_x
ADD INDEX `wr1` (`sample_id`,`field_id`,`value`)
This should improve things a bit. However, using LIKE with a wildcard at the start of the string cannot be improved (at least not without fulltext indexes.)
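A hedged sketch of the fulltext route (requires MySQL 5.6+ for InnoDB FULLTEXT, and note that MATCH ... AGAINST does word matching, not arbitrary substring matching, so it is not a drop-in replacement for LIKE '%smth%'):
ALTER TABLE sample_x ADD FULLTEXT INDEX ft_value (`value`);

SELECT t.*
FROM test t
JOIN sample s ON t.id = s.test_id
JOIN sample_x x ON s.id = x.sample_id
WHERE x.field_id = '321'
  AND MATCH(x.value) AGAINST ('smth' IN NATURAL LANGUAGE MODE);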
You can also try this index:
ALTER TABLE sample_x
ADD INDEX `wr2` (`field_id`,`value`,`sample_id`)
This will allow the optimizer to start at the sample_x table and then work backwards towards the test table. Which it prefers to do will depend on a lot of factors.
You can remove either of these indexes with the following:
ALTER TABLE sample_x
DROP INDEX `wr1`
Or
ALTER TABLE sample_x
DROP INDEX `wr2`
Experiment to see which index helps your query the most, if either does. When measuring performance, always run the query twice and throw out the first result: the first run populates the buffer cache and so takes longer, which makes it an inaccurate measure of the real improvement.
First things first: what do you consider 'slow'? You're getting 99k records out of a table; that is bound to take some time!
In my experience, if your IN() list contains 99,000 values, I'd REALLY consider putting them into a (temporary) table first, adding a unique index (or PK if you prefer) on that table, and then JOINing it against your sample table; see the sketch below. That said, I'm not sure what the fastest way would be to get 99k ids into that (temporary) table; in .NET/MSSQL I'd use the SqlBulkCopy object, but I'm not sure what you are using there.
PS: You can of course simply stick to rather verbose INSERT statements, but I fear that inserting 99k values like that will take even more time than what you're seeing now. Where are these values coming from in the first place? The more I think about it, the more I suspect your underlying approach might be 'off'.
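A minimal sketch of the temp-table approach (the table and column names are illustrative):
CREATE TEMPORARY TABLE wanted_ids (
    id INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Bulk-load the ~99k ids here (LOAD DATA INFILE, batched INSERTs, ...).

SELECT s.test_id
FROM sample s
JOIN wanted_ids w ON w.id = s.id;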
I need to optimize a table (It's INNODB) where I'm going to do a query using IN in two columns.
This is the query:
SELECT `emails`.* FROM `emails`
WHERE (`from` IN ('some@email.com', 'other@email.com') OR `to` IN ('some@email.com', 'other@email.com'))
the from and to fields are VARCHAR(255)
How can I create an index to help speed it up? Or, if I should change my query strategy, please let me know.
I'm not sure if I should create one index for each column, or a single index with the two columns. I'm also not sure if the IN clause will make the index work or not.
question 1 - which index to create
Just create one index for each column.
After that, MySQL can combine the results of the two index scans into one set (an "index merge").
More on the subject
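Creating them is a one-liner each (the index names are illustrative):
CREATE INDEX idx_from ON emails (`from`);
CREATE INDEX idx_to ON emails (`to`);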
question 2 - in clauses
You can check it yourself via EXPLAIN after you create the indexes. It should work. If for some reason your MySQL version doesn't use the indexes for IN queries, you can rewrite the query using OR, because it is equivalent to
`from` = 'some@email.com' OR `from` = 'other@email.com' OR `to` = 'some@email.com' OR `to` = 'other@email.com'
In case "index merge" does not kick in, "turn OR into UNION":
SELECT *
FROM `emails`
WHERE `from` IN ('some@email.com', 'other@email.com')
UNION
SELECT *
FROM `emails`
WHERE `to` IN ('some@email.com', 'other@email.com')
And have
INDEX(`from`),
INDEX(`to`)
I've done a lot of reading and Googling on this and I cannot find any satisfactory answer so I'd appreciate any help. Most answers I find come close to my situation but do not address it (and attempting to follow the solutions has not done me any good).
See Edit #2 below for the best example
[This was the original question but is not a great representation of what I'm asking.]
Say I have 2 tables, each with 4 columns:
key (int, auto increment)
c1 (a date)
c2 (a varchar of length 3)
c3 (also a varchar of length 3)
And I want to perform the following query:
SELECT t.c1, t.c2, COUNT(*)
FROM test1 t
LEFT JOIN test2 t2 ON t2.key = t.key
GROUP BY t.c1, t.c2
Both key fields are indexed as primary keys. I want to get the number of rows returned in each grouping of c1, c2.
When I EXPLAIN this query I get "Using temporary; Using filesort". The actual table I'm running this query on has over 500,000 rows, so that makes it a time-consuming query.
So my question is (assuming I'm not doing anything wrong in the query): is there a way to index this table to eliminate the temporary/filesort usage?
Thanks in advance for any help.
Edit
Here is the table definition (in this example both tables are identical - in reality they're not but I'm not sure it makes a difference at this point):
CREATE TABLE `test1` (
`key` int(11) NOT NULL auto_increment,
`c1` date NOT NULL,
`c2` varchar(3) NOT NULL,
`c3` varchar(3) NOT NULL,
PRIMARY KEY (`key`),
UNIQUE KEY `c1` (`c1`,`c2`),
UNIQUE KEY `c2_2` (`c2`,`c1`),
KEY `c2` (`c2`,`c3`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8
Full EXPLAIN statement:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE t ALL NULL NULL NULL NULL 2 Using temporary; Using filesort
1 SIMPLE t2 eq_ref PRIMARY PRIMARY 4 tracking.t.key 1 Using index
This is just for my sample tables. In my real tables the rows value for t is 500,000+ (every row in the table, though that could be related to something else).
Edit #2
Here is a more concrete example to better explain my situation.
Let's say I have data on Little League baseball games. I have two tables. One holds data on the games:
CREATE TABLE `ex_games` (
`game_id` int(11) NOT NULL auto_increment,
`home_team` int(11) NOT NULL,
`date` date NOT NULL,
PRIMARY KEY (`game_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
The other holds data on the at bats in each game:
CREATE TABLE `ex_atbats` (
`ab_id` int(11) NOT NULL auto_increment,
`game` int(11) NOT NULL,
`team` int(11) NOT NULL,
`player` int(11) NOT NULL,
`result` tinyint(1) NOT NULL,
PRIMARY KEY (`ab_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
So I have two questions. Let's start with the simple version: I want to return a list of games with a count of how many at bats are in each game. So I think I would do something like this:
SELECT date, home_team, COUNT(h.ab_id) FROM `ex_atbats` h
LEFT JOIN ex_games g ON g.game_id = h.game
GROUP BY g.game_id
This query uses filesort/temporary. Is there a better way to structure this or to index the tables to get rid of that?
Then, the trickier part: say I now want to include not only a count of the at bats, but also a count of the at bats that were preceded by an at bat with the same result by the same team. I assume that would be something like:
SELECT g.date, g.home_team, COUNT(ab.ab_id), COUNT(ab2.ab_id) FROM `ex_atbats` ab
LEFT JOIN ex_games g ON g.game_id = ab.game
LEFT JOIN ex_atbats ab2 ON ab2.ab_id = ab.ab_id - 1 AND ab2.result = ab.result
GROUP BY g.game_id
Is that the correct way to structure that query? This also uses filesort/temporary.
So what is the optimal way to go about accomplishing these tasks?
Thanks again.
The phrases Using temporary and Using filesort are usually not related to the indexes used in the JOIN operation. There are numerous examples where all the indexes are in place (they show up in the key and key_len columns of EXPLAIN) but you still get Using temporary and Using filesort.
Check out what the manual says about Using temporary and Using filesort:
How MySQL Uses Internal Temporary Tables
ORDER BY Optimization
Having a combined index covering all columns used in the GROUP BY clause may help get rid of Using filesort in certain circumstances. If you also issue an ORDER BY, you may need more complex indexes.
If you have a huge dataset, consider partitioning it by some criterion like date or timestamp, either with actual partitioning or with a simple WHERE clause.
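The WHERE-clause variant is the simplest; a sketch using the question's tables and an illustrative date cutoff:
SELECT t.c1, t.c2, COUNT(t2.key)
FROM test1 t
LEFT JOIN test2 t2 ON t2.key = t.key
WHERE t.c1 >= '2012-01-01'  -- illustrative cutoff; c1 is a DATE
GROUP BY t.c1, t.c2;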
First of all, the tables' definitions do matter. It's one thing to join using two primary keys, another to join using a primary key on one side and a non-unique key on the other, etc. The storage engine also matters, since InnoDB treats primary keys differently than MyISAM does.
What I notice though is that on table test1, the (c1,c2) combination is Unique and the fields are not nullable. This allows your query to be rewritten as:
SELECT t.c1, t.c2, COUNT(*)
FROM test1 t
LEFT JOIN test2 t2 ON t2.key = t.key
GROUP BY t.key
It will give the same results while using the same column for both the JOIN and the GROUP BY. Note that MySQL allows you to put columns in the SELECT list that are not in the GROUP BY list, without applying aggregate functions to them. This is not allowed in most other systems and is seen as a bug by some, but here it is a very nice feature: every row can be identified by either (key) or (c1,c2), so it shouldn't matter which of the two is used for the grouping.
Another thing to note: with a LEFT JOIN, it's common to count the joining column from the right side, COUNT(t2.key), rather than COUNT(*). Your original query gives 1 in that column for records in test1 that do not match any record in test2, because it counts rows, whereas you probably want to count the related records in test2 and show 0 in those cases.
So, try this query and post the EXPLAIN:
SELECT t.c1, t.c2, COUNT(t2.key)
FROM test1 t
LEFT JOIN test2 t2 ON t2.key = t.key
GROUP BY t.key
The indexes help with the join, but you still need to do a full sort in order to do the group by. Essentially, it still has to process every record in the set.
Adding a where clause and limiting the set would run faster, of course. It just won't get you the results you want.
There may be other options than doing a group by on the entire table. I notice you're doing a SELECT * - What are you trying to get out of the query?
SELECT DISTINCT c1, c2
FROM test t
LEFT JOIN test2 t2 ON t2.key = t.key
may run faster, for instance. (I realize this was just a sample query, but understand that it's hard to optimize when you don't know what the end goal is!)
EDIT - In doing some reading (http://dev.mysql.com/doc/refman/5.0/en/group-by-optimization.html), I learned that, under the correct circumstances, indexes can help significantly with the group by.
What I'm seeing is that it needs to be a sorted index (like BTREE), not a HASH. Perhaps:
CREATE INDEX c1c2 ON test1 (c1, c2) USING BTREE;
might help.
For InnoDB it will work, as a secondary index carries your primary key by default. For MyISAM you have to make `key` the last column of your index. That gives the optimizer all keys in the same order, so it can skip the sort. You cannot do any range queries on the index prefix then, or you're right back into filesort. I'm currently struggling with a similar problem.
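A sketch of that MyISAM variant for the earlier example table (the index name is illustrative):
CREATE INDEX c1c2_key ON test1 (c1, c2, `key`) USING BTREE;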
If I have the following table:
CREATE TABLE `mytable` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(64) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `name_first_letter` (`name`(1)),
KEY `name_all` (`name`)
)
Will MySQL ever choose to use the name_first_letter index over the name_all index? If so, under what conditions would this happen?
I have done some quick tests, and MySQL doesn't seem to choose the name_first_letter index even when using index hints:
-- This uses name_all
EXPLAIN SELECT name FROM mytable
WHERE SUBSTRING(name FROM 1 FOR 1) = 'T';
-- This uses no index at all
EXPLAIN SELECT name FROM mytable USE INDEX (name_first_letter)
WHERE SUBSTRING(name FROM 1 FOR 1) = 'T';
Can any MySQL gurus shed some light on this? Is there even a point to having name_first_letter on this column?
Edit: Question title wasn't quite right.
It does not make sense to use the index for your query, because you are selecting the full name column, which means MySQL cannot satisfy the query from the index alone.
Further, I believe that MySQL cannot understand that the SUBSTRING(name FROM 1 FOR 1) expression is equivalent to the index.
MySQL might, however, use the index if the index alone can satisfy the query. For example:
select count(*)
from mytable
where name like 'T%';
But even that depends on your statistics (hinting should work, though).
MySQL's prefix-index feature is intended to save space. It usually does not make sense to have both the prefix index and the full-column index; you would typically drop the shorter one. There might be a rare case where keeping both makes sense, but it doesn't make sense in general.
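Dropping the redundant prefix index is then a one-liner:
ALTER TABLE mytable DROP INDEX name_first_letter;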