MySQL doesn't use indexes in JOIN query

I have a main table with 500000+ rows.
CREATE TABLE `esc_questions`(
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`esc_id` INT(11) NOT NULL,
`question_text` LONGTEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_1` TEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_2` TEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_3` TEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_4` TEXT COLLATE utf8_unicode_ci NOT NULL,
`answer_5` TEXT COLLATE utf8_unicode_ci NOT NULL,
`right_answer` VARCHAR(255) COLLATE utf8_unicode_ci NOT NULL,
`disciplinas_id` INT(11) UNSIGNED NOT NULL,
`assunto_id` INT(11) UNSIGNED NOT NULL,
`orgao_id` INT(11) UNSIGNED NOT NULL,
`cargo_id` INT(11) UNSIGNED NOT NULL,
`ano` INT(11) NOT NULL,
`banca_id` INT(11) UNSIGNED NOT NULL,
`question_type` TINYINT(4) NOT NULL,
`url` TEXT COLLATE utf8_unicode_ci NOT NULL,
`created_at` TIMESTAMP NULL DEFAULT NULL,
`updated_at` TIMESTAMP NULL DEFAULT NULL,
PRIMARY KEY(`id`),
KEY `idx_ano`(`ano`) USING BTREE,
KEY `idx_question_type`(`question_type`) USING BTREE,
KEY `idx_cargo_id`(`cargo_id`) USING BTREE,
KEY `idx_orgao_id`(`orgao_id`) USING BTREE,
KEY `idx_banca_id`(`banca_id`) USING BTREE,
KEY `idx_question_id`(`id`) USING BTREE,
KEY `idx_assunto_id`(`assunto_id`) USING BTREE,
KEY `idx_disciplinas_id`(`disciplinas_id`) USING BTREE,
CONSTRAINT `fk_assunto_id` FOREIGN KEY(`assunto_id`) REFERENCES `esc_assunto`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_banca_id` FOREIGN KEY(`banca_id`) REFERENCES `esc_bancas`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_cargo_id` FOREIGN KEY(`cargo_id`) REFERENCES `esc_cargo`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_disciplinas_id` FOREIGN KEY(`disciplinas_id`) REFERENCES `esc_disciplinas`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_orgao_id` FOREIGN KEY(`orgao_id`) REFERENCES `esc_orgao`(`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE = INNODB AUTO_INCREMENT = 516157 DEFAULT CHARSET = utf8 COLLATE = utf8_unicode_ci
Related data is stored to five additional tables, very similar to this one:
CREATE TABLE `esc_assunto`(
`id` INT(11) UNSIGNED NOT NULL AUTO_INCREMENT,
`name` VARCHAR(255) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY(`id`),
KEY `idx_assunto_id`(`id`) USING BTREE,
KEY `idx_assunto_name`(`name`(30)),
CONSTRAINT `fk_assunto` FOREIGN KEY(`id`) REFERENCES `esc_questions`(`assunto_id`) ON DELETE NO ACTION ON UPDATE NO ACTION) ENGINE = INNODB AUTO_INCREMENT = 3618 DEFAULT CHARSET = utf8 COLLATE = utf8_unicode_ci
I have pagination on my website. When I request the last pages, the time the query takes keeps rising.
Here is my SELECT for this task:
SELECT
f.*,
d.name disciplinas,
o.name orgao,
c.name cargo,
b.name banca,
a.name assunto
FROM
`esc_questions` f
INNER JOIN
`esc_bancas` b
ON
f.banca_id = b.id
INNER JOIN
`esc_disciplinas` d
ON
f.disciplinas_id = d.id
INNER JOIN
`esc_assunto` a
ON
f.assunto_id = a.id
INNER JOIN
`esc_orgao` o
ON
f.orgao_id = o.id
INNER JOIN
`esc_cargo` c
ON
f.cargo_id = c.id
LIMIT 400020, 20
This query spends a long time in the Sending Data stage, as shown in the query profiler.
Sending Data 17.6 s 99.99% 1 17.6 s
EXPLAIN shows the following:
1 SIMPLE d ALL PRIMARY,idx_disciplinas_id 247
1 SIMPLE f ref idx_cargo_id,idx_orgao_id,idx_banca_id,idx_assunto_id,idx_disciplinas_id idx_disciplinas_id 4 concursos.d.id 1116
1 SIMPLE o eq_ref PRIMARY,idx_orgao_id PRIMARY 4 concursos.f.orgao_id 1
1 SIMPLE c eq_ref PRIMARY,idx_cargo_id PRIMARY 4 concursos.f.cargo_id 1
1 SIMPLE a eq_ref PRIMARY,idx_assunto_id PRIMARY 4 concursos.f.assunto_id 1
1 SIMPLE b eq_ref PRIMARY,idx_bancas_id PRIMARY 4 concursos.f.banca_id 1
I spent all day trying to make this fast, without success.
Can somebody tell me what's wrong with my SELECT query, or why MySQL doesn't use the indexes?
Any help appreciated.

You have the wrong approach in several ways. First, your query has no order by clause. A query is not guaranteed to return the results in the same order on multiple executions (although in practice queries usually do, debugging such a problem could be really hard).
So, you should add an order by, probably on the primary key of esc_questions and whatever secondary keys are necessary.
Second, the offset of 400020 is rather large. MySQL is going to generate 400,020 rows and discard them, before finding the 400,021st row.
My suggestion is to find the "id" used in the sort and then include a where clause:
where ?? > $last_id
. . .
order by ??
limit 20
This may not (or may) speed up the load the first time, but it should speed subsequent loads.
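The where/order by/limit pattern above is often called keyset pagination. Here is a minimal sketch of it, assuming the sort key is the primary key id. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the trimmed questions table and both helper functions are hypothetical.

```python
import sqlite3

# In-memory stand-in for the questions table (hypothetical, heavily trimmed).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO questions (id, body) VALUES (?, ?)",
    [(i, f"question {i}") for i in range(1, 1001)],
)

PAGE_SIZE = 20


def page_by_offset(offset):
    # Offset paging: the engine has to walk past `offset` rows before returning any.
    return conn.execute(
        "SELECT id, body FROM questions ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def page_after(last_id):
    # Keyset paging: seek directly to the first id after the previous page.
    return conn.execute(
        "SELECT id, body FROM questions WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, PAGE_SIZE),
    ).fetchall()


first = page_after(0)
second = page_after(first[-1][0])  # the caller remembers the last id it saw
```

Both helpers return the same rows for a given page. The difference is that the keyset version never has to produce the skipped rows, at the cost of only supporting "next page" navigation rather than jumping to an arbitrary page number.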

I found a solution myself. I need to avoid LIMIT with an offset in my JOIN query.
In order to do this I need to do some preparation:
1. Get only the ids from my main table, without any joins, at the needed offset. This query took 0.0856 seconds:
SELECT id FROM `esc_questions` WHERE 1 LIMIT 489980, 20
2. Create a composite index whose column order matches the way you will then query. In my case I use the following index:
...
KEY `idx_filter_search` (`id`,`disciplinas_id`,`assunto_id`,`orgao_id`,`cargo_id`,`banca_id`) USING BTREE,
...
3. Finally make your query. Query took 0.0040 seconds:
SELECT SQL_NO_CACHE
f.*,
d.name disciplinas,
o.name orgao,
c.name cargo,
b.name banca,
a.name assunto
FROM
`esc_questions` f FORCE INDEX(idx_filter_search),
`esc_disciplinas` d,
`esc_assunto` a,
`esc_orgao` o,
`esc_cargo` c,
`esc_bancas` b
WHERE
f.id IN(
497442,
497444,
497445,
497447,
497449,
497450,
497452,
497453,
497454,
497456,
497458,
497459,
497461,
497462,
497464,
497465,
497467,
497468,
497470,
497471
) AND f.disciplinas_id = d.id AND f.assunto_id = a.id AND f.orgao_id = o.id AND f.cargo_id = c.id AND f.banca_id = b.id
ORDER BY
id
Running EXPLAIN on this query shows that it uses my newly created index:
1 | SIMPLE | f | range | idx_filter_search | idx_filter_search | 4 | NULL | 20 | Using where
Hope this helps someone.
Thanks @GordonLinoff for pointing me in the right direction.
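The two-step approach in this answer (page over ids only, then join just those rows) could be sketched roughly as follows. This is not the author's code: it uses Python's sqlite3 as a stand-in for MySQL, the tables are cut down to one lookup table, and fetch_page is a hypothetical helper. It also adds an ORDER BY to the id query, which the original step 1 omits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bancas (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE questions (
        id INTEGER PRIMARY KEY,
        question_text TEXT,
        banca_id INTEGER REFERENCES bancas(id)
    );
""")
conn.executemany("INSERT INTO bancas VALUES (?, ?)",
                 [(i, f"banca {i}") for i in range(1, 6)])
conn.executemany("INSERT INTO questions VALUES (?, ?, ?)",
                 [(i, f"text {i}", i % 5 + 1) for i in range(1, 501)])


def fetch_page(offset, size=20):
    # Step 1: page over the narrow primary key only, with no joins.
    ids = [row[0] for row in conn.execute(
        "SELECT id FROM questions ORDER BY id LIMIT ? OFFSET ?", (size, offset))]
    if not ids:
        return []
    # Step 2: join only those rows; one placeholder per id keeps it parameterized.
    marks = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT f.id, f.question_text, b.name AS banca "
        f"FROM questions f JOIN bancas b ON f.banca_id = b.id "
        f"WHERE f.id IN ({marks}) ORDER BY f.id", ids).fetchall()


page = fetch_page(480)
```

The offset is still paid in step 1, but only against the id column, so the wide question rows and the lookup joins are touched for just the 20 rows that are actually returned.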

Related

Mysql EXISTS vs IN slow performance

I have posts and websites (and connecting post_websites). Each post can be on multiple websites, and some websites share the content, so I am trying to access the posts which are attached to particular website IDs.
In most cases WHERE IN works fine, but not for all websites; some of them are slow, and I can't see what makes the difference.
SELECT *
FROM `posts`
WHERE `posts`.`id` IN (
SELECT `post_websites`.`post_id`
FROM `post_websites`
WHERE `website_id` IN (
12054,
19829,
2258,
253
)
) AND
`status` = 1 AND
`posts`.`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6
Explain
select_type | table | type | key | key_len | ref | rows | Extra
SIMPLE | post_websites | range | post_websites_website_id_index | 4 | NULL | 440 | Using index condition; Using temporary; Using filesort; Start temporary
SIMPLE | posts | eq_ref | PRIMARY | 4 | post_websites.post_id | 1 | Using where; End temporary
Other version with EXISTS
SELECT *
FROM `posts`
WHERE EXISTS (
SELECT `post_websites`.`post_id`
FROM `post_websites`
WHERE `website_id` IN (
12054,
19829,
2258,
253
) AND
`posts`.`id` = `post_websites`.`post_id`
) AND
`status` = 1 AND
`deleted_at` IS NULL
ORDER BY `post_date` DESC
LIMIT 6
EXPLAIN:
select_type | table | type | key | key_len | ref | rows | Extra
PRIMARY | posts | index | post_date_index | 5 | NULL | 12 | Using where
DEPENDENT SUBQUERY | post_websites | ref | post_id_website_id_unique | 4 | post.id | 1 | Using where; Using index
Long story short: depending on the number of posts on each site and the number of websites sharing content, the query time varies from 20 ms to 50 s!
Based on the EXPLAIN, the EXISTS version looks better, but in practice, when the amount of data in the subquery is lower, it can be very slow.
Is there a query I am missing that could work like a charm for all cases? Or should I check something before querying and choose the method of doing so dynamically?
migrations:
CREATE TABLE `posts` (
`id` int(10) UNSIGNED NOT NULL,
`title` varchar(225) COLLATE utf8_unicode_ci NOT NULL,
`description` varchar(500) COLLATE utf8_unicode_ci NOT NULL,
`post_date` timestamp NULL DEFAULT NULL,
`status` tinyint(4) NOT NULL DEFAULT '1',
`created_at` timestamp NULL DEFAULT NULL,
`updated_at` timestamp NULL DEFAULT NULL,
`deleted_at` timestamp NULL DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `posts`
ADD PRIMARY KEY (`id`),
ADD KEY `created_at_index` (`created_at`) USING BTREE,
ADD KEY `status_deleted_at_index` (`status`,`deleted_at`) USING BTREE,
ADD KEY `post_date_index` (`post_date`) USING BTREE,
ADD KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;
CREATE TABLE `post_websites` (
`post_id` int(10) UNSIGNED NOT NULL,
`website_id` int(10) UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
ALTER TABLE `post_websites`
ADD PRIMARY KEY (`website_id`, `post_id`),
ADD UNIQUE KEY `post_id_website_id_unique` (`post_id`,`website_id`),
ADD KEY `website_id_index` (`website_id`),
ADD KEY `post_id_index` (`post_id`);
eloquent:
$news = Post::select(['title', 'description'])
->where('status', 1)
->whereExists(
function ($query) use ($sites) {
$query->select('post_websites.post_id')
->from('post_websites')
->whereIn('website_id', $sites)
->whereRaw('post_websites.post_id = posts.id');
})
->orderBy('post_date', 'desc')
->limit(6)
->get();
or
$q->whereIn('posts.id',
function ($query) use ($sites) {
$query->select('post_websites.post_id')
->from('post_websites')
->whereIn('website_id', $sites);
});
Thanks.
Many:many table: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
That says to get rid of id (because it slows things down), promote that UNIQUE to be the PK, and add an INDEX in the opposite direction.
Don't use IN ( SELECT ... ). A simple JOIN is probably the best alternative here.
Did some 3rd party package provide those 3 TIMESTAMPs for each table? Are they ever used? Get rid of them.
KEY `id_post_date_status_deleted_at` (`id`,`post_date`,`status`,`deleted_at`) USING BTREE;
is mostly backward. Some rules:
Don't start an index with the PRIMARY KEY column(s).
Do start an index with = tests: status,deleted_at
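As an illustration of the "use a JOIN instead of IN ( SELECT ... )" suggestion, here is a small sketch that runs both forms on toy data and checks that they agree. It uses Python's sqlite3 as a stand-in for MySQL, with assumed, trimmed-down versions of the posts and post_websites tables. Since a post can sit on several of the requested websites, the JOIN form needs DISTINCT (or a GROUP BY) to avoid returning it twice. The sketch says nothing about speed; that still has to be measured with EXPLAIN on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY, title TEXT, post_date TEXT,
        status INTEGER, deleted_at TEXT
    );
    CREATE TABLE post_websites (
        website_id INTEGER, post_id INTEGER,
        PRIMARY KEY (website_id, post_id)
    );
    CREATE INDEX post_id_website_id ON post_websites (post_id, website_id);
""")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?, 1, NULL)",
                 [(i, f"post {i}", f"2020-01-{i:02d}") for i in range(1, 11)])
# Posts 9 and 10 each appear on two of the requested sites.
conn.executemany("INSERT INTO post_websites VALUES (?, ?)",
                 [(253, 9), (2258, 9), (253, 10), (12054, 10), (253, 3), (999, 8)])

sites = [12054, 19829, 2258, 253]
marks = ",".join("?" * len(sites))

# Original shape: IN (SELECT ...).
in_rows = conn.execute(f"""
    SELECT p.id FROM posts p
    WHERE p.id IN (SELECT post_id FROM post_websites
                   WHERE website_id IN ({marks}))
      AND p.status = 1 AND p.deleted_at IS NULL
    ORDER BY p.post_date DESC LIMIT 6""", sites).fetchall()

# JOIN shape: DISTINCT removes the duplicates caused by shared posts.
join_rows = conn.execute(f"""
    SELECT DISTINCT p.id, p.post_date
    FROM posts p
    JOIN post_websites pw ON pw.post_id = p.id
    WHERE pw.website_id IN ({marks})
      AND p.status = 1 AND p.deleted_at IS NULL
    ORDER BY p.post_date DESC LIMIT 6""", sites).fetchall()
```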

How to create indexes efficiently

I would like to know how to create indexes in my database to suit my data structure. Most of my queries fetch data by ID or by name, joining two or three tables, with pagination. Please advise how to build indexes for the queries below.
Query:1
SELECT DISTINCT topic, type FROM books where type like 'Tutor-Books' order by topic
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE books range faith faith 102 NULL 132 Using index condition; Using temporary; Using filesort
Query:2
SELECT books.name, books.name2, books.id, books.image, books.faith,
books.topic, books.downloaded, books.viewed, books.language,
books.size, books.author as author_id, authors.name as author_name,
authors.aid
from books
LEFT JOIN authors ON books.author = authors.aid
WHERE books.id = '".$id."'
AND status = 1
Explain:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE books const PRIMARY PRIMARY 4 const 1 NULL
1 SIMPLE authors const aid aid 4 const 1 NULL
Can I use indexes for pagination in the OFFSET case, where the same query also returns the total:
SELECT SQL_CALC_FOUND_ROWS books.name, books.name2, books.id,
books.image, books.topic, books.author as author_id,
authors.name as author_name, authors.aid
from books
LEFT JOIN authors ON books.author = authors.aid
WHERE books.author = '$pid'
AND status = 1
ORDER BY books.name
LIMIT $limit OFFSET $offset
Do I need to update my queries after creating the indexes? Please also suggest what the table format should be.
SHOW CREATE TABLE books:
Table Create Table
books CREATE TABLE `books` (
`name` varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`name2` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
`author` int(100) NOT NULL,
`translator` int(120) NOT NULL,
`publisher` int(100) NOT NULL,
`pages` int(50) NOT NULL,
`date` varchar(50) CHARACTER SET latin1 NOT NULL,
`downloaded` int(100) NOT NULL,
`alt_lnk` text NOT NULL,
`viewed` int(100) NOT NULL,
`language` varchar(100) CHARACTER SET latin1 NOT NULL,
`image` varchar(200) CHARACTER SET latin1 NOT NULL,
`faith` varchar(100) CHARACTER SET latin1 NOT NULL,
`id` int(100) NOT NULL AUTO_INCREMENT,
`sid` varchar(1200) CHARACTER SET latin1 DEFAULT NULL,
`topic` varchar(100) CHARACTER SET latin1 NOT NULL,
`last_viewed` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`size` double NOT NULL,
`status` int(2) NOT NULL DEFAULT '0',
`is_scroll` int(2) NOT NULL,
`is_downloaded` int(2) NOT NULL,
`pdf_not_found` int(2) NOT NULL,
PRIMARY KEY (`id`),
KEY `name` (`name`),
KEY `downloaded` (`downloaded`),
KEY `name2` (`name2`),
KEY `topic` (`topic`),
KEY `faith` (`faith`)
) ENGINE=InnoDB AUTO_INCREMENT=12962 DEFAULT CHARSET=utf8
where type like 'Tutor-Books' order by topic (or:)
where type = 'Tutor-Books' order by topic
--> INDEX(type, topic)
where type like '%Tutor-Books' order by topic
--> INDEX(topic) -- the leading % prevents indexing
LEFT JOIN authors ON books.author = authors.aid
--> PRIMARY KEY(aid)
Do you really need LEFT JOIN? If you can change it to JOIN, the optimizer might be able to start with authors. If it does, then
--> INDEX(author) -- in `books`
My cookbook for building indexes.
Other tips:
INT(100) and INT(2) are identical -- each is a 4-byte signed integer. Read about TINYINT UNSIGNED for numbers 0..255, etc. Use that for your flags (status, is_scroll, etc)
DATE is a datatype; using a VARCHAR is problematic if you ever want to compare or order.
Learn about composite indexes, such as my first example.
Your display widths are a little funky, but that won't cause a problem.
Query 1:
You're using the LIKE operator without a wildcard search %. You can likely swap this with an = operator.
I don't see the column type in your SHOW CREATE TABLE -- but it seems you don't have an index here, unless you renamed it to faith.
Do you need type to be a string? Could it be abstracted to a types table and then joined against using an integer? Or, if you have a fixed number of types that's unlikely to change, could you use an enum?
Query 2:
You don't need to quote integer values; also, interpolating $id like that is probably vulnerable to SQL injection. Use ='.intval($id).' instead.
Make sure you have an index on authors.aid and that they're of the same type.
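Besides intval(), the injection concern raised for Query 2 is commonly handled with bound parameters, where the value never becomes part of the SQL text. A minimal sketch of that idea follows. It uses Python's sqlite3 as a stand-in for PHP and MySQL, and the trimmed books and authors tables and the find_book helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (aid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (
        id INTEGER PRIMARY KEY, name TEXT, author INTEGER, status INTEGER
    );
    INSERT INTO authors VALUES (1, 'Author One');
    INSERT INTO books VALUES (10, 'First Book', 1, 1), (11, 'Hidden Book', 1, 0);
""")


def find_book(book_id):
    # The driver binds book_id as data, so it cannot change the SQL text.
    return conn.execute(
        """SELECT books.id, books.name, authors.name
           FROM books LEFT JOIN authors ON books.author = authors.aid
           WHERE books.id = ? AND books.status = 1""",
        (book_id,),
    ).fetchall()


found = find_book(10)
# A hostile value is compared as one single value; the OR is never parsed as SQL.
injected = find_book("10 OR 1=1")
```

How a non-numeric string is compared to an integer column differs between engines, so the exact rows returned for the hostile value may vary. What matters is that the OR clause is never executed as SQL.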

Never ending MySQL query during data import

I'm working on a data import routine from a set of CSV files into my main database and am stuck with this particular set of data. I've used LOAD DATA LOCAL INFILE to dump the CSV data into my table, feed_hcp_leasenote:
CREATE TABLE `feed_hcp_leasenote` (
`BLDGID` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`LEASID` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`NOTEDATE` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`REF1` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`REF2` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`LASTDATE` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`USERID` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
`NOTETEXT` varchar(1000) COLLATE utf8_unicode_ci DEFAULT NULL,
`tempid` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`tempid`),
KEY `BLDGID` (`BLDGID`),
KEY `LEASID` (`LEASID`),
KEY `REF1` (`REF1`),
KEY `NOTEDATE` (`NOTEDATE`)
) ENGINE=MyISAM AUTO_INCREMENT=65002 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I'm trying to import this data into two tables, lease_notes and customfield_data. lease_notes only stores a unique ID value, the note itself, and the lid which links it to the lease table. customfield_data stores a variety of data for system- and user-created fields, with each record linked to another table via the linkid field. Here's the lease_notes table:
CREATE TABLE `lease_notes` (
`lnid` int(11) NOT NULL AUTO_INCREMENT,
`notetext` longtext COLLATE utf8_unicode_ci NOT NULL,
`lid` int(11) NOT NULL COMMENT 'Lease ID',
PRIMARY KEY (`lnid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
And the customfield_data table:
CREATE TABLE `customfield_data` (
`cfdid` int(11) NOT NULL AUTO_INCREMENT,
`data_int` int(11) DEFAULT NULL,
`data_date` datetime DEFAULT NULL,
`data_smtext` varchar(1000) COLLATE utf8_unicode_ci DEFAULT NULL,
`data_lgtext` longtext COLLATE utf8_unicode_ci,
`data_numeric` decimal(20,2) DEFAULT NULL,
`linkid` int(11) DEFAULT NULL COMMENT 'ID value of specific item',
`cfid` int(11) NOT NULL COMMENT 'Custom field ID',
PRIMARY KEY (`cfdid`),
KEY `data_smtext` (`data_smtext`(333)),
KEY `linkid` (`linkid`),
KEY `cfid` (`cfid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The query that is getting stuck is as follows:
SELECT NOTEDATE, REF1, REF2, LASTDATE, USERID, feed_hcp_leasenote.NOTETEXT, leases.lid, lease_notes.lnid
FROM feed_hcp_leasenote
JOIN customfield_data mrileaseid ON feed_hcp_leasenote.LEASID = mrileaseid.data_smtext AND mrileaseid.cfid = ?
JOIN leases ON mrileaseid.linkid = leases.lid
JOIN suites ON leases.sid = suites.sid
JOIN floors ON suites.fid = floors.fid
JOIN customfield_data coid ON floors.bid = coid.linkid AND coid.cfid = ? AND coid.data_smtext = feed_hcp_leasenote.BLDGID
JOIN customfield_data status ON leases.lid = status.linkid AND status.cfid = ? AND status.data_smtext <> ?
LEFT JOIN lease_notes ON leases.lid = lease_notes.lid
LEFT JOIN customfield_data notedate ON lease_notes.lnid = notedate.linkid AND notedate.data_date = feed_hcp_leasenote.NOTEDATE AND notedate.cfid = ?
LEFT JOIN customfield_data ref1 ON lease_notes.lnid = ref1.linkid AND ref1.data_smtext = feed_hcp_leasenote.REF1 AND ref1.cfid = ?
My goal with this is to return all records in feed_hcp_leasenote and, depending on whether or not lease_notes.lnid is null, insert or update the records as needed (nulls would be inserts, not nulls would be updates.) The problem is that the provided data uses a combination of 4 fields to determine uniqueness: BLDGID, LEASID, NOTEDATE, and REF1. A note will not exist without a proper BLDGID and LEASID (translated in my query to a valid lid). It can match an existing record with a valid lid, NOTEDATE, and REF1, but if those don't match then I can assume it's a new record.
If I chop off all of the LEFT JOINs and the lease_notes.lnid from the SELECT, it executes properly and gives me all records. Since I couldn't get my original query to work I played with the idea of cycling all results and performing another SELECT to see if the notedate and ref1 matched. If not, I INSERTed, otherwise UPDATE. While this approach works it can only process about 20 records per second which is a problem when I'm dealing with 30,000 at a crack.
Since I got asked about it in a previous question, here's an EXPLAIN of my query:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE status ref data_smtext,linkid,cfid cfid 4 const 934 Using where
1 SIMPLE mrileaseid ref data_smtext,linkid,cfid linkid 5 rl_hpsi.status.linkid 19 Using where
1 SIMPLE leases eq_ref PRIMARY,sid PRIMARY 4 rl_hpsi.mrileaseid.linkid 1 Using where
1 SIMPLE suites eq_ref PRIMARY,fid PRIMARY 4 rl_hpsi.leases.sid 1
1 SIMPLE floors eq_ref PRIMARY,bid PRIMARY 4 rl_hpsi.suites.fid 1
1 SIMPLE feed_hcp_leasenote ref BLDGID,LEASID LEASID 153 rl_hpsi.mrileaseid.data_smtext 19 Using where
1 SIMPLE coid ref data_smtext,linkid,cfid data_smtext 1002 rl_hpsi.feed_hcp_leasenote.BLDGID 10 Using where
1 SIMPLE lease_notes ALL NULL NULL NULL NULL 15000
1 SIMPLE notedate ref linkid,cfid linkid 5 rl_hpsi.lease_notes.lnid 24
1 SIMPLE ref1 ref data_smtext,linkid,cfid data_smtext 1002 rl_hpsi.feed_hcp_leasenote.REF1 10
Can anyone point me in the right direction? Thanks!
From our comments:
The answer is to add the columns that make an entry unique to your destination table and create a compound unique key on them. Then when inserting to that table use INSERT ON DUPLICATE KEY UPDATE to prevent duplicate data. When the insert is complete you can drop those columns if they are no longer necessary, to prevent storing data in multiple tables.
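A rough sketch of that idea: put the identifying columns on the destination table, add a compound unique key over them, and let the database decide between insert and update. MySQL spells this INSERT ... ON DUPLICATE KEY UPDATE. The runnable stand-in below uses Python's sqlite3, whose equivalent syntax is ON CONFLICT ... DO UPDATE (available in SQLite 3.24 and later). The choice of (lid, notedate, ref1) as the unique key and the sample rows are assumptions based on the question, not a tested schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE lease_notes (
        lnid INTEGER PRIMARY KEY AUTOINCREMENT,
        lid INTEGER NOT NULL,
        notedate TEXT NOT NULL,
        ref1 TEXT NOT NULL,
        notetext TEXT NOT NULL,
        UNIQUE (lid, notedate, ref1)   -- assumed compound unique key
    )
""")

# SQLite upsert; the MySQL counterpart is INSERT ... ON DUPLICATE KEY UPDATE.
UPSERT = """
    INSERT INTO lease_notes (lid, notedate, ref1, notetext)
    VALUES (?, ?, ?, ?)
    ON CONFLICT (lid, notedate, ref1)
    DO UPDATE SET notetext = excluded.notetext
"""

feed = [
    (7, "2012-01-05", "A1", "first version"),
    (7, "2012-01-06", "A1", "another note"),
    (7, "2012-01-05", "A1", "corrected version"),  # same key as the first row
]
conn.executemany(UPSERT, feed)

rows = conn.execute(
    "SELECT lid, notedate, ref1, notetext FROM lease_notes ORDER BY notedate"
).fetchall()
```

The third feed row updates the first instead of creating a duplicate, so no per-row SELECT is needed to decide between INSERT and UPDATE.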

Mysql query optimisation, EXPLAIN and slow execution

Having some real issues with a few queries, this one in particular. Info below.
tgmp_games, about 20k rows
CREATE TABLE IF NOT EXISTS `tgmp_games` (
`g_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`g_name` varchar(255) NOT NULL,
`g_link` varchar(255) NOT NULL,
`g_url` varchar(255) NOT NULL,
`g_platforms` varchar(128) NOT NULL,
`g_added` datetime NOT NULL,
`g_cover` varchar(255) NOT NULL,
`g_impressions` int(8) NOT NULL,
PRIMARY KEY (`g_id`),
KEY `g_platforms` (`g_platforms`),
KEY `site_id` (`site_id`),
KEY `g_link` (`g_link`),
KEY `g_release` (`g_release`),
KEY `g_genre` (`g_genre`),
KEY `g_name` (`g_name`),
KEY `g_impressions` (`g_impressions`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
tgmp_reviews - about 200k rows
CREATE TABLE IF NOT EXISTS `tgmp_reviews` (
`r_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`r_source` varchar(128) NOT NULL,
`r_date` date NOT NULL,
`r_score` int(3) NOT NULL,
`r_copy` text NOT NULL,
`r_link` text NOT NULL,
`r_int_link` text NOT NULL,
`r_parent` int(8) NOT NULL,
`r_platform` varchar(12) NOT NULL,
`r_impressions` int(8) NOT NULL,
PRIMARY KEY (`r_id`),
KEY `site_id` (`site_id`),
KEY `r_parent` (`r_parent`),
KEY `r_platform` (`r_platform`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Here is the query, takes 3 seconds ish
SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL r_parent NULL NULL NULL 201133 Using temporary; Using filesort
1 SIMPLE g eq_ref PRIMARY,site_id PRIMARY 4 engine_comp.r.r_parent 1 Using where
I am just trying to grab the 15 most viewed games, then grab a single review (doesn't really matter which, I guess highest rated would be ideal, r_score) for each game.
Can someone help me figure out why this is so horribly inefficient?
I don't understand the purpose of the GROUP BY g_name in your query, but it makes MySQL aggregate over the selected columns, which here means all columns from both tables. Please try removing it and check whether that helps.
Also, RIGHT JOIN makes the database query tgmp_reviews first, which is not what you want. I suppose LEFT JOIN is a better choice here, so please try changing the join type.
If neither of these helps, you need to redesign your query. Since you need the 15 most viewed games for the site, the query will be:
SELECT g_id
FROM tgmp_games g
WHERE site_id = 34
ORDER BY g_impressions DESC
LIMIT 15;
This is the very first part that should be executed by the database, as it provides the best selectivity. Then you can get the desired reviews for the games:
SELECT r_parent, max(r_score)
FROM tgmp_reviews r
WHERE r_parent IN (/*1st query*/)
GROUP BY r_parent;
Such a construct will force the database to execute the first query first (sorry for the tautology) and will give you the maximum score for each of the wanted games. I hope you will be able to use the obtained results for your purpose.
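The two queries above can also be combined into one statement. MySQL has historically rejected LIMIT inside an IN (...) subquery, so this sketch joins against a derived table instead. It uses Python's sqlite3 as a stand-in for MySQL; the shortened table names, the trimmed columns and the two indexes are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE games (g_id INTEGER PRIMARY KEY, site_id INTEGER,
                        g_name TEXT, g_impressions INTEGER);
    CREATE TABLE reviews (r_id INTEGER PRIMARY KEY, r_parent INTEGER,
                          r_score INTEGER);
    CREATE INDEX games_site_impressions ON games (site_id, g_impressions);
    CREATE INDEX reviews_parent ON reviews (r_parent);
""")
conn.executemany("INSERT INTO games VALUES (?, 34, ?, ?)",
                 [(i, f"game {i}", i * 10) for i in range(1, 41)])
# Two reviews per game, scored 50 + g_id and 60 + g_id.
conn.executemany("INSERT INTO reviews (r_parent, r_score) VALUES (?, ?)",
                 [(g, s) for g in range(1, 41) for s in (50 + g, 60 + g)])

top_games = conn.execute("""
    SELECT t.g_id, t.g_name, t.g_impressions, MAX(r.r_score) AS best_score
    FROM (SELECT g_id, g_name, g_impressions      -- pick the 15 games first
          FROM games
          WHERE site_id = 34
          ORDER BY g_impressions DESC
          LIMIT 15) AS t
    LEFT JOIN reviews r ON r.r_parent = t.g_id    -- then touch only their reviews
    GROUP BY t.g_id, t.g_name, t.g_impressions
    ORDER BY t.g_impressions DESC
""").fetchall()
```

Only 15 games ever reach the join, so the reviews table is probed through its r_parent index instead of being scanned in full.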
Your MyISAM table is small, so you can try converting it to InnoDB to see if that resolves the issue. Do you have a reason for using MyISAM instead of InnoDB for that table?
You can also try running an analyze on each table to update the statistics to see if the optimizer chooses something different.

How to optimize slow query with many joins [closed]

My situation:
the query searches around 90,000 vehicles
the query takes long each time
I already have indexes on all the fields being JOINed.
How can I optimise it?
Here is the query:
SELECT vehicles.make_id,
vehicles.fuel_id,
vehicles.body_id,
vehicles.transmission_id,
vehicles.colour_id,
vehicles.mileage,
vehicles.vehicle_year,
vehicles.engine_size,
vehicles.trade_or_private,
vehicles.doors,
vehicles.model_id,
Round(3959 * Acos(Cos(Radians(51.465436)) *
Cos(Radians(vehicles.gps_lat)) *
Cos(
Radians(vehicles.gps_lon) - Radians(
-0.296482)) +
Sin(
Radians(51.465436)) * Sin(
Radians(vehicles.gps_lat)))) AS distance
FROM vehicles
INNER JOIN vehicles_makes
ON vehicles.make_id = vehicles_makes.id
LEFT JOIN vehicles_models
ON vehicles.model_id = vehicles_models.id
LEFT JOIN vehicles_fuel
ON vehicles.fuel_id = vehicles_fuel.id
LEFT JOIN vehicles_transmissions
ON vehicles.transmission_id = vehicles_transmissions.id
LEFT JOIN vehicles_axles
ON vehicles.axle_id = vehicles_axles.id
LEFT JOIN vehicles_sub_years
ON vehicles.sub_year_id = vehicles_sub_years.id
INNER JOIN members
ON vehicles.member_id = members.id
LEFT JOIN vehicles_categories
ON vehicles.category_id = vehicles_categories.id
WHERE vehicles.status = 1
AND vehicles.date_from < 1330349235
AND vehicles.date_to > 1330349235
AND vehicles.type_id = 1
AND ( vehicles.price >= 0
AND vehicles.price <= 1000000 )
Here is the vehicle table schema:
CREATE TABLE IF NOT EXISTS `vehicles` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`number_plate` varchar(100) NOT NULL,
`type_id` int(11) NOT NULL,
`make_id` int(11) NOT NULL,
`model_id` int(11) NOT NULL,
`model_sub_type` varchar(250) NOT NULL,
`engine_size` decimal(12,1) NOT NULL,
`vehicle_year` int(11) NOT NULL,
`sub_year_id` int(11) NOT NULL,
`mileage` int(11) NOT NULL,
`fuel_id` int(11) NOT NULL,
`transmission_id` int(11) NOT NULL,
`price` decimal(12,2) NOT NULL,
`trade_or_private` tinyint(4) NOT NULL,
`postcode` varchar(25) NOT NULL,
`gps_lat` varchar(50) NOT NULL,
`gps_lon` varchar(50) NOT NULL,
`img1` varchar(100) NOT NULL,
`img2` varchar(100) NOT NULL,
`img3` varchar(100) NOT NULL,
`img4` varchar(100) NOT NULL,
`img5` varchar(100) NOT NULL,
`img6` varchar(100) NOT NULL,
`img7` varchar(100) NOT NULL,
`img8` varchar(100) NOT NULL,
`img9` varchar(100) NOT NULL,
`img10` varchar(100) NOT NULL,
`is_featured` tinyint(4) NOT NULL,
`body_id` int(11) NOT NULL,
`colour_id` int(11) NOT NULL,
`doors` tinyint(4) NOT NULL,
`axle_id` int(11) NOT NULL,
`category_id` int(11) NOT NULL,
`contents` text NOT NULL,
`date_created` int(11) NOT NULL,
`date_edited` int(11) NOT NULL,
`date_from` int(11) NOT NULL,
`date_to` int(11) NOT NULL,
`member_id` int(11) NOT NULL,
`inactive_id` int(11) NOT NULL,
`status` tinyint(4) NOT NULL,
PRIMARY KEY (`id`),
KEY `type_id` (`type_id`),
KEY `make_id` (`make_id`),
KEY `model_id` (`model_id`),
KEY `fuel_id` (`fuel_id`),
KEY `transmission_id` (`transmission_id`),
KEY `body_id` (`body_id`),
KEY `colour_id` (`colour_id`),
KEY `axle_id` (`axle_id`),
KEY `category_id` (`category_id`),
KEY `vehicle_year` (`vehicle_year`),
KEY `mileage` (`mileage`),
KEY `status` (`status`),
KEY `date_from` (`date_from`),
KEY `date_to` (`date_to`),
KEY `trade_or_private` (`trade_or_private`),
KEY `doors` (`doors`),
KEY `price` (`price`),
KEY `engine_size` (`engine_size`),
KEY `sub_year_id` (`sub_year_id`),
KEY `member_id` (`member_id`),
KEY `date_created` (`date_created`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=136237 ;
The EXPLAIN:
1 SIMPLE vehicles ref type_id,make_id,status,date_from,date_to,price,mem... type_id 4 const 85695 Using where
1 SIMPLE members index PRIMARY PRIMARY 4 NULL 3 Using where; Using index; Using join buffer
1 SIMPLE vehicles_makes eq_ref PRIMARY PRIMARY 4 tvs.vehicles.make_id 1 Using index
1 SIMPLE vehicles_models eq_ref PRIMARY PRIMARY 4 tvs.vehicles.model_id 1 Using index
1 SIMPLE vehicles_fuel eq_ref PRIMARY PRIMARY 4 tvs.vehicles.fuel_id 1 Using index
1 SIMPLE vehicles_transmissions eq_ref PRIMARY PRIMARY 4 tvs.vehicles.transmission_id 1 Using index
1 SIMPLE vehicles_axles eq_ref PRIMARY PRIMARY 4 tvs.vehicles.axle_id 1 Using index
1 SIMPLE vehicles_sub_years eq_ref PRIMARY PRIMARY 4 tvs.vehicles.sub_year_id 1 Using index
1 SIMPLE vehicles_categories eq_ref PRIMARY PRIMARY 4 tvs.vehicles.category_id 1 Using index
Improving the WHERE clause
Your EXPLAIN shows that MySQL is only utilizing one index (type_id) for selecting the rows that match the WHERE clause, even though you have multiple criteria in the clause.
To be able to utilize an index for all of the criteria in the WHERE clause, and to reduce the size of the result set as quickly as possible, add a multi-column index on the following columns on the vehicles table:
(status, date_from, date_to, type_id, price)
The columns should be in order of highest cardinality to least.
For example, vehicles.date_from is likely to have more distinct values than status, so put the date_from column before status, like this:
(date_from, date_to, price, type_id, status)
This should reduce the rows returned in the first part of the query execution, and should be demonstrated with a lower row count on the first line of the EXPLAIN result.
You will also notice that MySQL will use the multi-column index for the WHERE in the EXPLAIN result. If, by chance, it doesn't, you should hint or force the multi-column index.
Removing the unnecessary JOINs
It doesn't appear that you are using any fields in any of the joined tables, so remove the joins. This will remove all of the additional work of the query, and get you down to one, simple execution plan (one line in the EXPLAIN result).
Each JOINed table causes an additional lookup per row of the result set. So, if the WHERE clause selects 5,000 rows from vehicles, since you have 8 joins to vehicles, you will have 5,000 * 8 = 40,000 lookups. That's a lot to ask from your database server.
Instead of an expensive calculation of the precise distance for all of the rows, use a bounding box and calculate the exact distance only for rows inside the box.
The simplest possible example is to calculate the min/max longitude and latitude that interest you and add them to the WHERE clause. This way the distance will be calculated only for a subset of rows.
WHERE
vehicles.gps_lat > min_lat AND vehicles.gps_lat < max_lat AND
vehicles.gps_lon > min_lon AND vehicles.gps_lon < max_lon
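How min_lat, max_lat, min_lon and max_lon might be derived is sketched below in plain Python, under the assumption of a spherical Earth with the same 3959-mile radius used in the question. The function names, the sample vehicles and the 10-mile radius are all hypothetical. The box is only a cheap pre-filter; the exact distance is still computed for rows that fall inside it.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # same constant as the query in the question


def bounding_box(lat, lon, radius_miles):
    """Return (min_lat, max_lat, min_lon, max_lon) enclosing the search circle."""
    lat_delta = math.degrees(radius_miles / EARTH_RADIUS_MILES)
    # Meridians converge toward the poles, so widen the longitude span by 1/cos(lat).
    lon_delta = math.degrees(
        radius_miles / (EARTH_RADIUS_MILES * math.cos(math.radians(lat))))
    return lat - lat_delta, lat + lat_delta, lon - lon_delta, lon + lon_delta


def distance_miles(lat1, lon1, lat2, lon2):
    """Spherical law of cosines, the same formula as the question's SELECT."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    cos_angle = (math.cos(p1) * math.cos(p2) * math.cos(math.radians(lon2 - lon1))
                 + math.sin(p1) * math.sin(p2))
    # Clamp to [-1, 1] to guard acos against floating-point overshoot.
    return EARTH_RADIUS_MILES * math.acos(max(-1.0, min(1.0, cos_angle)))


def nearby(origin, vehicles, radius_miles):
    lat, lon = origin
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_miles)
    hits = []
    for vid, vlat, vlon in vehicles:
        # Cheap rectangle test first; in SQL this is the part an index can serve.
        if not (min_lat < vlat < max_lat and min_lon < vlon < max_lon):
            continue
        d = distance_miles(lat, lon, vlat, vlon)
        if d <= radius_miles:
            hits.append((vid, round(d)))
    return hits


origin = (51.465436, -0.296482)
vehicles = [(1, 51.47, -0.30), (2, 53.48, -2.24), (3, 51.50, -0.12)]
result = nearby(origin, vehicles, 10)
```

In the schema shown in the question, gps_lat and gps_lon are varchar(50), so for the range comparison to work numerically and use an index they would presumably need to be stored as a numeric type. This simple box also ignores the poles and the 180° meridian.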
For more complex solutions see:
MySQL spatial extensions
How to use MySQL spatial extensions
https://stackoverflow.com/a/5237509/342473
Is your SQL faster without this?
Round(3959 * Acos(Cos(Radians(51.465436)) *
Cos(Radians(vehicles.gps_lat)) *
Cos(Radians(vehicles.gps_lon) -
Radians(-0.296482)) +
Sin(Radians(51.465436)) *
Sin(Radians(vehicles.gps_lat)))) AS distance
Evaluating this math expression for every row is very expensive.
Maybe you should consider a materialized view that pre-calculates your distance, so that you can select from that view. Depending on how dynamic your data is, you may not have to refresh it too often.
To be a little more specific than @Randy about indexes, I believe his intention was to have a COMPOUND index to take advantage of your querying criteria... One index that is built on a MINIMUM of ...
( status, type_id, date_from )
but could be extended to include the date_to and price too, but don't know how much the index at that granular level might actually help
( status, type_id, date_from, date_to, price )
EDIT per Comments
You shouldn't need all those individual indexes... Yes, the Primary Key by itself. However, for the others, you should have compound indexes based on what your common querying criteria might be and remove the others... the engine might get confused on which might be best suited for the query. If you know you are always looking for a certain status, type and date (assuming vehicle searches), make that as one index. If the query is looking for such information, but also prices within that criteria it will already be very close on the few indexed records that qualify and fly through the price as just an extra criteria.
If you offer querying like Only Automatic vs Manual transmission regardless of year/make, then yes, that could be an index of its own. However, if you would TYPICALLY have some other "common" criteria, tack that on as a secondary that MAY be utilized in the query. Ex: if you look for Manual Transmissions that are 2-door vs 4-door, have your index on (transmission_id, category_id).
Again, you want whatever will help narrow down the field of criteria based on some "minimum" condition. If you tack on an extra column to the index that might "commonly" be applied, that should only help the performance.
To clarify this as an answer: if you do not already have these indexes, you should consider adding them
Do you also have indexes on these:
vehicles.status
vehicles.date_from
vehicles.date_to
vehicles.type_id
vehicles.price