I have a table with a composite PK.
CREATE TABLE `tag_value_copy` (
`tag_id` INT(11) NOT NULL,
`created_at` INT(11) NOT NULL,
`value` FLOAT NULL DEFAULT NULL,
PRIMARY KEY (`tag_id`, `created_at`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB
ROW_FORMAT=COMPACT;
When I execute the following query
DELETE FROM tag_value_copy WHERE (tag_id, created_at) IN ((1,2), (2,3), ..., (5,6))
MySQL does not use the index and goes through all rows. But why?
EXPLAIN SELECT * FROM tag_value_copy WHERE (tag_id, created_at) IN ((1,1518136666), (2,1518154836)) does not use an index either.
UPD 1
show index from tag_value_copy
UPD 2
explain delete from tag_value_copy where (tag_id=1 and created_at=1518103037) or (tag_id=2 and created_at=1518103038)
The Why -- MySQL's optimizer does nothing toward optimizing (a, b) IN ((1,2), ...).
The Workaround -- Create a table with the pairs to delete. Then JOIN using an AND between each of the 2 columns.
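A minimal sketch of that workaround (the throwaway table name pairs_to_delete is just illustrative):

CREATE TEMPORARY TABLE pairs_to_delete (
  tag_id     INT NOT NULL,
  created_at INT NOT NULL,
  PRIMARY KEY (tag_id, created_at)
);

INSERT INTO pairs_to_delete (tag_id, created_at)
VALUES (1, 1518136666), (2, 1518154836);

-- multi-table DELETE: both PK columns are matched with AND, so the composite index can be used
DELETE tvc
FROM tag_value_copy AS tvc
JOIN pairs_to_delete AS p
  ON  tvc.tag_id     = p.tag_id
  AND tvc.created_at = p.created_at;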
None of these help: OR, FORCE INDEX.
Why the heck do you have PRIMARY KEY (tag_id, created_at) ? Are you allowing the same tag to be entered multiple times?
Related
Given the following table:
CREATE TABLE `test` (
`a` int(255) unsigned NOT NULL AUTO_INCREMENT,
`b` varchar(64) NOT NULL DEFAULT '' ,
`c` varchar(32) NOT NULL DEFAULT '' ,
`d` varchar(32) NOT NULL DEFAULT '' ,
PRIMARY KEY (`a`),
KEY `b` (`b`) USING BTREE,
KEY `c` (`c`(19)) USING BTREE,
KEY `d` (`d`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='test';
insert test(a,b,c,d) values('1','1','1','1');
insert test(a,b,c,d) values('2','2','2','1');
insert test(a,b,c,d) values('3','3','3','1');
insert test(a,b,c,d) values('4','4','4','1');
I don't know which index the following SQL uses, but I know that the InnoDB engine typically uses only one index per table for a query.
explain select * from test where b='2' and c='2' and d='2';
I executed the above SQL against my database, and the statement used the index on b. Are there any rules here? What rules does the optimizer follow, and how are they applied in this case?
In theory, MySQL would choose the index on the column that is most restrictive -- that is, the one that chooses the fewest rows.
But for your query, you want an index that has all three columns, b, c, and d in any order.
For
where b='2' and c='2' and d='2';
have
INDEX(b,c,d) -- with the columns in any order
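For example (the index name is just illustrative):

ALTER TABLE test ADD INDEX idx_b_c_d (b, c, d);
EXPLAIN SELECT * FROM test WHERE b='2' AND c='2' AND d='2';  -- should now report key = idx_b_c_d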
Another issue. Don't use "prefix" indexing without a good reason:
KEY `c` (`c`(19)) USING BTREE,
It is mostly useless.
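If you do want c indexed, a full-column index is usually better; c is only varchar(32), so it fits comfortably within the index length limit:

ALTER TABLE test DROP INDEX c, ADD INDEX c (c);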
Here are some guidelines: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
I'm running MySQL 5.5 and found behaviour I didn't know of before.
Given this create:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_UQ` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
With these inserts:
insert into test (name) values ('b');
insert into test (name) values ('a');
And this select:
select * from test;
MySQL does something I wasn't aware of:
2 a
1 b
It sorts automatically.
Given a table with one extra, non-unique column:
CREATE TABLE `test` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(128) DEFAULT NULL,
`other_column` varchar(128) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `name_UQ` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
And the same inserts (see above), the select (see above) gives this result:
1 b NULL
2 a NULL
Which is kind of expected.
Where is the behaviour of the first query (SQL Fiddle) documented? I'd like to see more of these peculiar things.
MySQL does not sort result sets automatically. The ordering of a result set is indeterminate unless the query specifies an order by clause.
You should never rely on any sort of "implicit" ordering just because you see it in one (or a hundred) queries. In fact, without an ORDER BY, the same query can return results in different orders on subsequent runs (although I'll admit that while this regularly occurs in other databases, it is unlikely in MySQL).
Instead, add the ORDER BY. Ordering by a primary key is remarkably efficient, so you don't have to worry about performance.
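For example:

SELECT * FROM test ORDER BY id;  -- deterministic: returns 1 'b' before 2 'a' every time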
I'm having some problems optimizing a certain query in SQL (using MariaDB). To give you some context: I have a system with "events" (think of them as log entries) that can occur on tickets, but also on some other objects besides tickets (which is why I separated the event and ticket_event tables). I want to get all ticket_events sorted by display_time. The event table has ~20M rows right now.
CREATE TABLE IF NOT EXISTS `event` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`type` varchar(255) DEFAULT NULL,
`data` text,
`display_time` datetime DEFAULT NULL,
`created_time` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `index_for_display_time_and_id` (`id`,`display_time`),
KEY `index_for_display_time` (`display_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE IF NOT EXISTS `ticket_event` (
`id` int(11) NOT NULL,
`ticket_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `ticket_id` (`ticket_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `ticket_event`
ADD CONSTRAINT `ticket_event_ibfk_1` FOREIGN KEY (`id`) REFERENCES `event` (`id`),
ADD CONSTRAINT `ticket_event_ibfk_2` FOREIGN KEY (`ticket_id`) REFERENCES `ticket` (`id`);
As you can see, I already played around with some keys (I also made one for (id, ticket_id) that doesn't show up here because I removed it again). The query I execute:
SELECT * FROM ticket_event
INNER JOIN event ON event.id = ticket_event.id
ORDER BY display_time DESC
LIMIT 25
That query takes quite a while to execute (~30s if I filter on a specific ticket_id; I can't even complete it reliably without filtering on it). If I run an EXPLAIN on the query, it shows a filesort + temporary:
I played around with FORCE INDEX etc. a bit, but that doesn't seem to solve anything, or I did it wrong.
Does anyone see what I did wrong or what I can optimize here? I would very much prefer not to make "event" a wide table by adding ticket_id/host_id etc. as columns and just making them NULL if they don't apply.
Thanks in advance!
EDIT: Extra image of EXPLAIN with actual rows in the table:
OK what if you try to force the index?
SELECT * FROM ticket_event
INNER JOIN event
FORCE INDEX (index_for_display_time)
ON event.id = ticket_event.id
ORDER BY display_time DESC
LIMIT 25;
Your query selects every column from every row, even if you use a LIMIT. Have you tried to select one specific row by id?
KEY `index_for_display_time_and_id` (`id`,`display_time`),
is useless; DROP it. It is useless because you are using InnoDB, which stores the data "clustered" on the PK (id).
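That is:

ALTER TABLE event DROP INDEX index_for_display_time_and_id;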
Please change ticket_event.id to event_id. id is confusing because it feels like the PK of the mapping table, which it is. But wait, that does not make sense: there is only one ticket for each event? Then why does ticket_event exist at all? Why not put ticket_id in event?
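If you do fold it in, a rough sketch (the column and index names are just illustrative):

ALTER TABLE event
  ADD COLUMN ticket_id INT NULL,
  ADD KEY idx_ticket_id (ticket_id);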
For a many-to-many table, do
CREATE TABLE IF NOT EXISTS `ticket_event` (
`event_id` int(11) NOT NULL,
`ticket_id` int(11) NOT NULL,
PRIMARY KEY (`event_id`, `ticket_id`), -- for lookups in one direction
KEY (`ticket_id`, `event_id`) -- for the other direction
) ENGINE=InnoDB;
Maybe you will achieve better performance by trying this:
SELECT *
FROM ticket_event
INNER JOIN (select * from event ORDER BY display_time DESC limit 25) as b
ON b.id = ticket_event.id;
I have a database with the following three tables:
matches table has 200,000 matches...
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
heroes table has ~100 heroes...
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
matches_heroes table has 2,000,000 relationships (10 random heroes per match)...
CREATE TABLE `matches_heroes` (
`relation_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`relation_id`),
KEY `match_id` (`match_id`),
KEY `hero_id` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`)
REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`)
REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=3689891 DEFAULT CHARSET=utf8
The following query takes over 1 second, which seems pretty slow to me for something so simple:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
Removing only the WHERE clause doesn't help, but if I take out the INNER JOIN also, like so:
SELECT SQL_NO_CACHE COUNT(*) AS match_count FROM matches
...it only takes 0.05 seconds. It seems that INNER JOIN is very costly. I don't have much experience with joins. Is this normal or am I doing something wrong?
UPDATE #1: Here's the EXPLAIN result.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE matches_heroes ref match_id,hero_id,match_id_hero_id hero_id 2 const 34742
1 SIMPLE matches eq_ref PRIMARY PRIMARY 8 mydatabase.matches_heroes.match_id 1 Using index
UPDATE #2: After listening to you guys, I think it's working properly and this is simply as fast as it gets. Please let me know if you disagree. Thanks for all the help. I really appreciate it.
Use COUNT(matches.match_id) instead of COUNT(*); when using joins it's best not to use *, as it does extra computation. Using columns from the join is the best way to ensure you are not requesting any other operations. (Not a problem with MySQL's INNER JOIN, my bad.)
Also, you should verify that all keys are defragmented and that there is enough free RAM for the indexes to be loaded into memory.
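For instance, a rough way to check the index sizes (the schema name mydatabase is just a placeholder) and to rebuild/defragment the table:

SELECT table_name, ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.tables
WHERE table_schema = 'mydatabase'
  AND table_name IN ('matches', 'matches_heroes');

OPTIMIZE TABLE matches_heroes;  -- InnoDB implements this as a table rebuild, which also defragments the indexes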
Update 1:
Try adding a composite index on (match_id, hero_id), as it should give better performance:
ALTER TABLE `matches_heroes` ADD KEY `match_id_hero_id` (`match_id`,`hero_id`)
Update 2:
I wasn't satisfied with the accepted answer, namely that MySQL is just that slow for 2 million records, so I ran benchmarks on my Ubuntu PC (i7 processor, with a standard HDD).
-- pre-requirements
CREATE TABLE seq_numbers (
number INT NOT NULL
) ENGINE = MYISAM;
DELIMITER $$
CREATE PROCEDURE InsertSeq(IN MinVal INT, IN MaxVal INT)
BEGIN
DECLARE i INT;
SET i = MinVal;
START TRANSACTION;
WHILE i <= MaxVal DO
INSERT INTO seq_numbers VALUES (i);
SET i = i + 1;
END WHILE;
COMMIT;
END$$
DELIMITER ;
CALL InsertSeq(1,200000)
;
ALTER TABLE seq_numbers ADD PRIMARY KEY (number)
;
-- create tables
-- DROP TABLE IF EXISTS `matches`
CREATE TABLE `matches` (
`match_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`start_time` int(10) unsigned NOT NULL,
PRIMARY KEY (`match_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `heroes` (
`hero_id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`name` char(40) NOT NULL,
PRIMARY KEY (`hero_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
CREATE TABLE `matches_heroes` (
`match_id` bigint(20) unsigned NOT NULL,
`hero_id` smallint(6) unsigned NOT NULL,
PRIMARY KEY (`match_id`,`hero_id`),
KEY (match_id),
KEY (hero_id),
CONSTRAINT `matches_heroes_ibfk_2` FOREIGN KEY (`hero_id`) REFERENCES `heroes` (`hero_id`),
CONSTRAINT `matches_heroes_ibfk_1` FOREIGN KEY (`match_id`) REFERENCES `matches` (`match_id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=MyISAM DEFAULT CHARSET=utf8
;
-- insert DATA
-- 100
INSERT INTO heroes(name)
SELECT SUBSTR(CONCAT(char(RAND()*25+65),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97),char(RAND()*25+97)),1,RAND()*9+4) as RandomName
FROM seq_numbers WHERE number <= 100
;
-- 200000
INSERT INTO matches(start_time)
SELECT rand()*1000000
FROM seq_numbers WHERE number <= 200000
;
-- 2000000
INSERT INTO matches_heroes(hero_id,match_id)
SELECT a.hero_id, b.match_id
FROM heroes as a
INNER JOIN matches as b ON 1=1
LIMIT 2000000
;
-- warm-up database, load INDEXes in ram (optional, works only for MyISAM tables)
LOAD INDEX INTO CACHE matches_heroes, matches, heroes
;
-- get random hero_id
SET @randHeroId = (SELECT hero_id FROM matches_heroes ORDER BY rand() LIMIT 1);
-- test 1
SELECT SQL_NO_CACHE @randHeroId, COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
WHERE b.hero_id = @randHeroId
; -- Time: 0.039s
-- test 2: adding some complexity
SET @randName = (SELECT `name` FROM heroes WHERE hero_id = @randHeroId LIMIT 1);
SELECT SQL_NO_CACHE @randName, COUNT(*) AS match_count
FROM matches as a
INNER JOIN matches_heroes as b ON a.match_id = b.match_id
INNER JOIN heroes as c ON b.hero_id = c.hero_id
WHERE c.name = @randName
; -- Time: 0.037s
Conclusion: my test results are about 20x faster. My server load was about 80% before testing, as it's not a dedicated MySQL server and had other CPU-intensive tasks running, so if you run the whole script (from above) and get slower results, it can be because:
you have a shared host and the load is too big. In this case there isn't much you can do: complain to your current host, pay for a better host/VM, or try another host
your configured key_buffer_size (for MyISAM) or innodb_buffer_pool_size (for InnoDB) is too small; the optimum size would be over 150 MB
your available RAM is not enough; you would need about 100-150 MB of RAM for the indexes to be loaded into memory. Solution: free up some RAM or buy more of it
Note that because the test script generates fresh data, index fragmentation is ruled out as a possible problem.
Hope this helps, and ask if you have issues in testing this.
Observation:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches INNER JOIN matches_heroes ON matches.match_id = matches_heroes.match_id
WHERE hero_id = 5
is the equivalent to:
SELECT SQL_NO_CACHE COUNT(*) AS match_count
FROM matches_heroes
WHERE hero_id = 5
So you wouldn't require a join, if that's the count you need, but I'm guessing that was just an example.
So you're saying that reading a table of 200,000 records is faster than reading a table of 2,000,000 records, finding the desired ones, and then taking them all to look up matching records in the 200,000-record table?
And this surprises you? It's simply a lot more work for the DBMS. (It can even be, by the way, that the DBMS decides not to use the hero_id index when it considers a full table scan to be faster.)
So in my opinion there is nothing wrong with what is happening here.
If I create a table with the following syntax,
CREATE TABLE IF NOT EXISTS `hashes` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`hash` binary(20) NOT NULL,
PRIMARY KEY (`id`,`hash`),
UNIQUE KEY (`hash`)
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE = 4 AUTO_INCREMENT=1
PARTITION BY KEY(`hash`)
PARTITIONS 10;
And I insert rows with queries of the following syntax
INSERT INTO hashes (hash) VALUES ($value) ON DUPLICATE KEY UPDATE hash = hash
Then the auto-increment column works as expected whether the row is inserted or updated.
However, if I create the table without the partitioning, as below, and insert with the same query, the auto-increment value increases by 1 on every update or insert, leaving the AUTO_INCREMENT column all over the place: the query could do 10 updates and then 1 insert, causing the column value to jump 10 places.
CREATE TABLE IF NOT EXISTS `hashes` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`hash` binary(20) NOT NULL,
PRIMARY KEY (`id`,`hash`),
UNIQUE KEY (`hash`)
) ENGINE=InnoDB AUTO_INCREMENT=1;
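For concreteness, a minimal sequence that shows the jump on the non-partitioned table might look like this (with the default innodb_autoinc_lock_mode; the inserted values are just illustrative):

INSERT INTO hashes (hash) VALUES (UNHEX(SHA1('a'))) ON DUPLICATE KEY UPDATE hash = hash;  -- new row, id = 1
INSERT INTO hashes (hash) VALUES (UNHEX(SHA1('a'))) ON DUPLICATE KEY UPDATE hash = hash;  -- duplicate: no new row, but the counter still advances
INSERT INTO hashes (hash) VALUES (UNHEX(SHA1('b'))) ON DUPLICATE KEY UPDATE hash = hash;  -- new row gets id = 3, not 2
SELECT id, HEX(hash) FROM hashes;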
I understand why the value increases on an update with InnoDB, but I do not understand why it doesn't when the table is partitioned.
You cannot change that, but you can try something like this:
mysql> set @a := (select max(id) + 2 from hashes);
mysql> insert into hashes (hash) values ($value) on duplicate key update id = @a;
NOTE: partitioning behaves a little differently after MySQL 5.6; which version do you have?