I have these tables.
CREATE TABLE `movements` (
`movementId` mediumint(8) UNSIGNED NOT NULL,
`movementType` tinyint(3) UNSIGNED NOT NULL,
`deleted` tinyint(1) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `movements`
ADD PRIMARY KEY (`movementId`),
ADD KEY `movementType` (`movementType`) USING BTREE,
ADD KEY `deleted` (`deleted`),
ADD KEY `movementId` (`movementId`,`deleted`);
CREATE TABLE `movements_items` (
`movementId` mediumint(8) UNSIGNED NOT NULL,
`itemId` mediumint(8) UNSIGNED NOT NULL,
`qty` decimal(10,3) UNSIGNED NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `movements_items`
ADD KEY `movementId` (`movementId`),
ADD KEY `itemId` (`itemId`),
ADD KEY `movementId_2` (`movementId`,`itemId`);
and this view called "movements_items_view".
SELECT
movements_items.itemId, movements_items.qty,
movements.movementId, movements.movementType
FROM movements_items
JOIN movements ON (movements.movementId=movements_items.movementId
AND movements.deleted=0)
The first table has 5913 rows, the second one has 144992.
The view is very fast, it loads 20 result in PhpMyAdmin in 0.0011s but as soon as I ask for a GROUP BY on it (I need it to do statistics with SUM()) es:
SELECT * FROM movements_items_view GROUP BY itemId LIMIT 0,20
time jumps to 0.2s or more and it causes "Using where; Using temporary; Using filesort" on movements join.
Any help appreciated, thanks.
EDIT:
I also run via phpMyAdmin this query to try to not use the view:
SELECT movements.movementId, movements.movementType, movements_items.qty
FROM movements_items
JOIN movements ON movements.movementId=movements_items.movementId
GROUP BY itemId LIMIT 0,20
And the performance is the same.
Edit. Here is the EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE movements index PRIMARY,movementid movement_type 1 NULL 5913 Using index; Using temporary; Using filesort
1 SIMPLE movements_items ref movementId,itemId,movementId_2 movementId_2 3 movements.movementId 12 Using index
Turn it inside out. See if this works:
SELECT movementId, m.movementType, mi.qty
FROM
( SELECT movementId, qty
FROM movements_items
GROUP BY itemId
ORDER BY itemId
LIMIT 20
) AS mi
JOIN movements AS m USING(movementId)
The trick is to do the LIMIT sooner. The original way had all the data being hauled around, not just 20 rows.
In movements_items, is no column or combination of columns 'unique'? If so, make it the PRIMARY KEY.
In movement, KEY movementId (movementId,deleted) is redundant and should be dropped.
In movement_items, KEY movementId (movementId) is redundant and should be dropped.
Related
I have posts and websites (and connecting post_websites). Each post can be on multiple websites, and some websites share the content, so I am trying to access the posts which are attached to particular website IDs.
Most of the cases WHERE IN works fine, but not for all websites, some of them are laggy, and I can't understand a difference.
SELECT *
FROM `posts`
WHERE `posts`.`id` IN (
SELECT `post_websites`.`post_id`
FROM `post_websites`
WHERE `website_id` IN (
12054,
19829,
2258,
253
)
) AND
`status` = 1 AND
`posts`.`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6
Explain
select_type
table
type
key
key_len
ref
rows
Extra
SIMPLE
post_websites
range
post_websites_website_id_index
4
NULL
440
Using index condition; Using temporary; Using filesort; Start temporary
SIMPLE
posts
eq_ref
PRIMARY
4
post_websites.post_id
1
Using where; End temporary
Other version with EXISTS
SELECT *
FROM `posts`
WHERE EXISTS (
SELECT `post_websites`.`post_id`
FROM `post_websites`
WHERE `website_id` IN (
12054,
19829,
2258,
253
) AND
`posts`.`id` = `post_websites`.`post_id`
) AND
`status` = 1 AND
`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6
EXPLAIN:
select_type
table
type
key
key_len
ref
rows
Extra
PRIMARY
posts
index
post_date_index
5
NULL
12
Using where
DEPENDENT SUBQUERY
post_websites
ref
post_id_website_id_unique
4
post.id
1
Using where; Using index
Long story short: based on different amounts of posts on each site and amount of websites sharing content the results are different from 20ms to 50s!
Based on the EXPLAIN the EXISTS works better, but on practice when the amount of data in subquery is lower, it can be very slow.
Is there a query I am missing that could work like a charm for all cases? Or should I check something before querying and choose the method of doing so dynamically?
migrations:
CREATE TABLE `posts` (
`id` int(10) UNSIGNED NOT NULL,
`title` varchar(225) COLLATE utf8_unicode_ci NOT NULL,
`description` varchar(500) COLLATE utf8_unicode_ci NOT NULL,
`post_date` timestamp NULL DEFAULT NULL,
`status` tinyint(4) NOT NULL DEFAULT '1',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`deleted_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `posts`
ADD PRIMARY KEY (`id`),
ADD KEY `created_at_index` (`created_at`) USING BTREE,
ADD KEY `status_deleted_at_index` (`status`,`deleted_at`) USING BTREE,
ADD KEY `post_date_index` (`post_date`) USING BTREE,
ADD KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;
CREATE TABLE `post_websites` (
`post_id` int(10) UNSIGNED NOT NULL,
`website_id` int(10) UNSIGNED NOT NULL,
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `post_websites`
ADD PRIMARY KEY (`website_id`, `post_id`),
ADD UNIQUE KEY `post_id_website_id_unique` (`post_id`,`website_id`),
ADD KEY `website_id_index` (`website_id`),
ADD KEY `post_id_index` (`post_id`);
eloquent:
$news = Post::select(['title', 'description'])
->where('status', 1)
->whereExists(
function ($query) use ($sites) {
$query->select('post_websites.post_id')
->from('post_websites')
->whereIn('websites_id', $sites)
->whereRaw('post_websites.post_id = posts.id');
})
->orderBy('post_date', 'desc');
->limit(6)
->get();
or
$q->whereIn('posts.id',
function ($query) use ($sites) {
$query->select('post_websites.post_id')
->from('post_websites')
->whereIn('website_id', $sites);
});
Thanks.
Many:many table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
That says to get rid if id (because it slows things down), promote that UNIQUE to be the PK, and add an INDEX in the opposite direction.
Don't use IN ( SELECT ... ). A simple JOIN is probably the best alternative here.
Did some 3rd party package provide those 3 TIMESTAMPs for each table? Are they ever used? Get rid of them.
KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;
is mostly backward. Some rules:
Don't start an index with the PRIMARY KEY column(s).
Do start an index with = tests: status,deleted_at
I have the following MySQL query which takes about 40 seconds on a linux VM:
SELECT
* FROM `clients_event_log`
WHERE
`ex_long` = 1475461 AND
`type` in (2, 1) AND NOT
(
(category=1 AND error=-2147212542) OR
(category=7 AND error=67)
)
ORDER BY `ev_time` DESC LIMIT 100
The table has around 7 million rows, aprox. 800 MB in size and it has indexes on all the fields used in the WHERE and ORDER BY clauses.
Now if I change the query in such a way that the ordering is done in an outer SELECT, everything works much faster (around 100ms):
SELECT res.* FROM
(
SELECT * FROM `clients_event_log`
WHERE
`ex_long` = 1475461 AND
`type` in (2, 1) AND NOT
(
(category=1 AND error=-2147212542) OR
(category=7 AND error=67)
)
) AS res
ORDER BY res.ev_time DESC LIMIT 0, 100
Do you have any idea why the first query takes such a long time? Thank you.
Later Update:
1st Query EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE clients_event_log index category,ex_long,type,error,categ_error ev_time 4 NULL 5636 Using where
2nd Query EXPLAIN:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> system NULL NULL NULL NULL 1
2 DERIVED clients_event_log ref category,ex_long,type,error,categ_error ex_long 5 131264 Using where
Table definition:
CREATE TABLE `clients_event_log` (
`ev_id` int(11) NOT NULL,
`type` int(6) NOT NULL,
`ev_time` int(11) NOT NULL,
`category` smallint(6) NOT NULL,
`error` int(11) NOT NULL,
`ev_text` varchar(1024) DEFAULT NULL,
`userid` varchar(20) DEFAULT NULL,
`ex_long` int(11) DEFAULT NULL,
`client_ex_long` int(11) DEFAULT NULL,
`ex_text` varchar(1024) DEFAULT NULL,
PRIMARY KEY (`ev_id`),
KEY `category` (`category`),
KEY `ex_long` (`ex_long`),
KEY `type` (`type`),
KEY `ev_time` (`ev_time`),
KEY `error` (`error`),
KEY `categ_error` (`category`,`error`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
I ended up using the second query (inner SELECT) because the MySQL optimiser decided to always use the ev_time index even if I tried multiple versions of a composite index containing the columns in the WHERE and ORDER BY clauses.
Using force index (ex_long) also worked.
The MySQL version was 5.5.38
Thank you.
Add these
INDEX(ev_long, ev_time),
INDEX(ev_long, type)
and use the first format of the query and let the optimizer decide which is better based on the statistics.
I'm struggling to understand if I've indexed this query properly, it's somewhat slow and I feel it could use optimization. MySQL 5.1.70
select snaps.id, snaps.userid, snaps.ins_time, usr.gender
from usersnaps as snaps
join user as usr on usr.id = snaps.userid
left join user_convert as conv on snaps.userid = conv.userid
where (conv.level is null or conv.level = 4) and snaps.active = 'N'
and (usr.status = "unfilled" or usr.status = "unapproved") and usr.active = 1
order by snaps.ins_time asc
usersnaps table (irrelevant deta removed, size about 250k records) :
CREATE TABLE IF NOT EXISTS `usersnaps` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userid` int(11) unsigned NOT NULL DEFAULT '0',
`picture` varchar(250) NOT NULL,
`active` enum('N','Y') NOT NULL DEFAULT 'N',
`ins_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`,`userid`),
KEY `userid` (`userid`,`active`),
KEY `ins_time` (`ins_time`),
KEY `active` (`active`)
) ENGINE=InnoDB;
user table (irrelevant deta removed, size about 300k records) :
CREATE TABLE IF NOT EXISTS `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`active` tinyint(1) NOT NULL DEFAULT '1',
`status` enum('15','active','approval','suspended','unapproved','unfilled','rejected','suspended_auto','incomplete') NOT NULL DEFAULT 'approval',
PRIMARY KEY (`id`),
KEY `status` (`status`,`active`)
) ENGINE=InnoDB;
user_convert table (size about : 60k records) :
CREATE TABLE IF NOT EXISTS `user_convert` (
`userid` int(10) unsigned NOT NULL,
`level` tinyint(4) NOT NULL,
UNIQUE KEY `userid` (`userid`),
KEY `level` (`level`)
) ENGINE=InnoDB;
Explain extended returns :
id select_type table type possible_keys key key_len ref rows filtered Extra
1 SIMPLE snaps ref userid,default_pic,active active 1 const 65248 100.00 Using where; Using filesort
1 SIMPLE usr eq_ref PRIMARY,active,status PRIMARY 4 snaps.userid 1 100.00 Using where
1 SIMPLE conv eq_ref userid userid 4s snaps.userid 1 100.00 Using where
Using filesort is probably your performance killer.
You need the records from usersnaps where active = 'N' and you need them sorted by ins_time.
ALTER TABLE usersnaps ADD KEY active_ins_time (active,ins_time);
Indexes are stored in sorted order, and read in sorted order... so if the optimizer chooses that index, it will go for the records with active = 'N' and -- hey, look at that -- they're already sorted by ins_time -- because of that index. So as it reads the rows referenced by the index, the result-set internally is already in the order you want it to ORDER BY, and the optimizer should realize this... no filesort required.
I would recommend changing the userid index (assuming you're not using it right now) to have active first and userid later.
That should make it more useful for this query.
To start out here is a simplified version of the tables involved.
tbl_map has approx 4,000,000 rows, tbl_1 has approx 120 rows, tbl_2 contains approx 5,000,000 rows. I know the data shouldn't be consider that large given that Google, Yahoo!, etc use much larger datasets. So I'm just assuming that I'm missing something.
CREATE TABLE `tbl_map` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`tbl_1_id` bigint(20) DEFAULT '-1',
`tbl_2_id` bigint(20) DEFAULT '-1',
`rating` decimal(3,3) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tbl_1_id` (`tbl_1_id`),
KEY `tbl_2_id` (`tbl_2_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_1` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
CREATE TABLE `tbl_2` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`data` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The Query in interest: also, instead of ORDER BY RAND(), ORDERY BY t.id DESC. The query is taking as much as 5~10 seconds and causes a considerable wait when users view this page.
EXPLAIN SELECT t.data, t.id , tm.rating
FROM tbl_2 AS t
JOIN tbl_map AS tm
ON t.id = tm.tbl_2_id
WHERE tm.tbl_1_id =94
AND tm.rating IS NOT NULL
ORDER BY t.id DESC
LIMIT 200
1 SIMPLE tm ref tbl_1_id, tbl_2_id tbl_1_id 9 const 703438 Using where; Using temporary; Using filesort
1 SIMPLE t eq_ref PRIMARY PRIMARY 8 tm.tbl_2_id 1
I would just liked to speed up the query, ensure that I have proper indexes, etc.
I appreciate any advice from DB Gurus out there! Thanks.
SUGGESTION : Index the table as follows:
ALTER TABLE tbl_map ADD INDEX (tbl_1_id,rating,tbl_2_id);
As per Rolando, yes, you definitely need an index on the map table but I would expand to ALSO include the tbl_2_id which is for your ORDER BY clause of Table 2's ID (which is in the same table as the map, so just use that index. Also, since the index now holds all 3 fields, and is based on the ID of the key search and criteria of null (or not) of rating, the 3rd element has them already in order for your ORDER BY clause.
INDEX (tbl_1_id,rating, tbl_2_id);
Then, I would just have the query as
SELECT STRAIGHT_JOIN
t.data,
t.id ,
tm.rating
FROM
tbl_map tm
join tbl_2 t
on tm.tbl_2_id = t.id
WHERE
tm.tbl_1_id = 94
AND tm.rating IS NOT NULL
ORDER BY
tm.tbl_2_id DESC
LIMIT 200
This query:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
correctly uses the Donation.order_line_id and Lineitem.id indexes, shown in this EXPLAIN output:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ref order_line_id order_line_id 4 Lineitem.id 2 Using index
However, this query, which simply includes another field:
explain
SELECT `Lineitem`.`id`, `Donation`.`id`, `Donation`.`npo_id`,
`Donation`.`order_line_id`
FROM `order_line` AS `Lineitem`
LEFT JOIN `donations` AS `Donation`
ON (`Donation`.`order_line_id` = `Lineitem`.`id`)
WHERE `Lineitem`.`session_id` = '1'
Shows that the Donation table does not use an index:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE Lineitem ref session_id session_id 97 const 1 Using where; Using index
1 SIMPLE Donation ALL order_line_id NULL NULL NULL 3
All of the _id fields in the tables are indexed, but I can't figure out how adding this field into the list of selected fields causes the index to be dropped.
As requested by James C, here are the table definitions:
CREATE TABLE `donations` (
`id` int(10) unsigned NOT NULL auto_increment,
`npo_id` int(10) unsigned NOT NULL,
`order_line_detail_id` int(10) unsigned NOT NULL default '0',
`order_line_id` int(10) unsigned NOT NULL default '0',
`created` datetime default NULL,
`modified` datetime default NULL,
PRIMARY KEY (`id`),
KEY `npo_id` (`npo_id`),
KEY `order_line_id` (`order_line_id`),
KEY `order_line_detail_id` (`order_line_detail_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8
CREATE TABLE `order_line` (
`id` bigint(20) unsigned NOT NULL auto_increment,
`order_id` bigint(20) NOT NULL,
`npo_id` bigint(20) NOT NULL default '0',
`session_id` varchar(32) collate utf8_unicode_ci default NULL,
`created` datetime default NULL,
PRIMARY KEY (`id`),
KEY `order_id` (`order_id`),
KEY `npo_id` (`npo_id`),
KEY `session_id` (`session_id`)
) ENGINE=InnoDB AUTO_INCREMENT=23 DEFAULT CHARSET=utf8
I also did some reading about cardinality, and it looks like both the Donations.npo_id and Donations.order_line_id have a cardinality of 2. Hopefully this suggests something useful?
I'm thinking that a USE INDEX might solve the problem, but I'm using an ORM that makes this a bit tricky, and I don't understand why it wouldn't grab the correct index when the JOIN specifically names indexed fields?!?
Thanks for your brainpower!
The first explain has "uses index" at the end. This means that it was able to find the rows and return the result for the query by just looking at the index and not having to fetch/analyse any row data.
In the second query you add a row that's likely not indexed. This means that MySQL has to look at the data of the table. I'm not sure why the optimiser chose to do a table scan but I think it's likely that if the table is fairly small it's easier for it to just read everything than trying to pick out details for individual rows.
edit: I think adding the following indexes will improve things even more and let all of the join use indexes only:
ALTER TABLE order_line ADD INDEX(session_id, id);
ALTER TABLE donations ADD INDEX(order_line_id, npo_id, id)
This will allow order_line to to find the rows using session_id and then return id and also allow donations to join onto order_line_id and then return the other two columns.
Looking at the auto_increment values can I assume that there's not much data in there. It's worth noting that the amount of data in the tables will have an effect on the query plan and it's good practice to put some sample data in there to test things out. For more detail have a look in this blog post I made some time back: http://webmonkeyuk.wordpress.com/2010/09/27/what-makes-a-good-mysql-index-part-2-cardinality/