How to optimise this slow MySQL query - late row lookups?

I'm converting a site over to use XenForo as forum software, however this site has millions of thread rows in the MySQL table. If I try to browse a paginated listing of threads, it slows to a crawl the further I go. Once I'm at page 10,000 it takes almost 30s.
My aim is to improve the query below, perhaps by using late row lookups so that I can make this query run faster:
SELECT thread.*,
       user.*,
       IF(user.username IS NULL, thread.username, user.username) AS username,
       NULL AS thread_read_date,
       0 AS thread_is_watched,
       0 AS user_post_count
FROM xf_thread AS thread
LEFT JOIN xf_user AS user ON (user.user_id = thread.user_id)
WHERE (thread.node_id = 152)
  AND (thread.sticky = 0)
  AND (thread.discussion_state IN ('visible'))
ORDER BY thread.last_post_date DESC
LIMIT 20 OFFSET 238340
Run Time: 4.383607
select_type | table  | type   | possible_keys                                          | key                    | key_len | ref                     | rows   | Extra
SIMPLE      | thread | ref    | node_id_last_post_date,node_id_sticky_state_last_post  | node_id_last_post_date | 4       | const                   | 552480 | Using where
SIMPLE      | user   | eq_ref | PRIMARY                                                | PRIMARY                | 4       | sitename.thread.user_id | 1      |
Schema:
CREATE TABLE `xf_thread` (
`thread_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`node_id` INT(10) UNSIGNED NOT NULL,
`title` VARCHAR(150) NOT NULL,
`reply_count` INT(10) UNSIGNED NOT NULL DEFAULT '0',
`view_count` INT(10) UNSIGNED NOT NULL DEFAULT '0',
`user_id` INT(10) UNSIGNED NOT NULL,
`username` VARCHAR(50) NOT NULL,
`post_date` INT(10) UNSIGNED NOT NULL,
`sticky` TINYINT(3) UNSIGNED NOT NULL DEFAULT '0',
`discussion_state` ENUM('visible','moderated','deleted') NOT NULL DEFAULT 'visible',
`discussion_open` TINYINT(3) UNSIGNED NOT NULL DEFAULT '1',
`discussion_type` VARCHAR(25) NOT NULL DEFAULT '',
`first_post_id` INT(10) UNSIGNED NOT NULL,
`first_post_likes` INT(10) UNSIGNED NOT NULL DEFAULT '0',
`last_post_date` INT(10) UNSIGNED NOT NULL,
`last_post_id` INT(10) UNSIGNED NOT NULL,
`last_post_user_id` INT(10) UNSIGNED NOT NULL,
`last_post_username` VARCHAR(50) NOT NULL,
`prefix_id` INT(10) UNSIGNED NOT NULL DEFAULT '0',
`sonnb_xengallery_import` TINYINT(3) DEFAULT '0',
PRIMARY KEY (`thread_id`),
KEY `node_id_last_post_date` (`node_id`,`last_post_date`),
KEY `node_id_sticky_state_last_post` (`node_id`,`sticky`,`discussion_state`,`last_post_date`),
KEY `last_post_date` (`last_post_date`),
KEY `post_date` (`post_date`),
KEY `user_id` (`user_id`)
) ENGINE=INNODB AUTO_INCREMENT=2977 DEFAULT CHARSET=utf8
Can anyone help me improve the speed of this query? I'm a real MySQL novice, but I am running the same dataset on other forum software and it is much faster - so I'm sure there is a way somehow. This table is INNODB and I'd consider the server well optimised.

This might help: http://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/
The idea is to page and order over just the indexed columns first, then join that small list of IDs back to the table to fetch the remaining columns you want.
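Applied to the query above, a late-row-lookup ("deferred join") version might look like the following sketch. The inner query pages over the narrow index only; the outer joins then fetch the wide rows for just the 20 matching thread IDs. (Untested against this exact schema - treat it as an illustration of the technique, not a drop-in fix.)

```sql
SELECT thread.*,
       user.*,
       IF(user.username IS NULL, thread.username, user.username) AS username,
       NULL AS thread_read_date,
       0 AS thread_is_watched,
       0 AS user_post_count
FROM (
    -- Paginate over the (node_id, sticky, discussion_state, last_post_date)
    -- index; InnoDB secondary indexes carry the primary key, so this
    -- subquery never has to read full rows.
    SELECT thread_id
    FROM xf_thread
    WHERE node_id = 152
      AND sticky = 0
      AND discussion_state = 'visible'
    ORDER BY last_post_date DESC
    LIMIT 20 OFFSET 238340
) AS page
JOIN xf_thread AS thread ON thread.thread_id = page.thread_id
LEFT JOIN xf_user AS user ON user.user_id = thread.user_id
ORDER BY thread.last_post_date DESC;
```

The deep OFFSET still has to walk 238,360 index entries, but walking narrow index entries is far cheaper than fetching 238,360 full rows and user joins.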

Your user table is already indexed by user ID... good.
For your thread table, I would have a compound index on it with the key
( node_id, sticky, discussion_state, last_post_date )
This way, the index covers all parts of the WHERE clause... AND since it ends with last_post_date, it can also be used by the ORDER BY clause. ORDER BY clauses are notorious for killing query performance.

Related

MySQL Query Optimization that touches three tables via a union of two of them

I have a query that returns results from a single table based on the provided ID existing in a column in one of two, or both, tables. The DB schema for the relevant tables is provided below, as well as the initial query and then what was later recommended to me by a peer. I go into some detail below about why this query works, but I need to optimize it further for larger datasets and pagination.
CREATE TABLE `killmails` (
`id` BIGINT(20) UNSIGNED NOT NULL,
`hash` VARCHAR(255) NOT NULL,
`moon_id` BIGINT(20) NULL DEFAULT NULL,
`solar_system_id` BIGINT(20) UNSIGNED NOT NULL,
`war_id` BIGINT(20) NULL DEFAULT NULL,
`is_npc` TINYINT(1) NOT NULL DEFAULT '0',
`is_awox` TINYINT(1) NOT NULL DEFAULT '0',
`is_solo` TINYINT(1) NOT NULL DEFAULT '0',
`dropped_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`destroyed_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`fitted_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`total_value` DECIMAL(18,4) UNSIGNED NOT NULL DEFAULT '0.0000',
`killmail_time` DATETIME NOT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`, `hash`),
INDEX `total_value` (`total_value`),
INDEX `killmail_time` (`killmail_time`),
INDEX `solar_system_id` (`solar_system_id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_attackers` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_done` BIGINT(20) UNSIGNED NOT NULL,
`final_blow` TINYINT(1) NOT NULL DEFAULT '0',
`security_status` DECIMAL(17,15) NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`weapon_type_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `weapon_type_id` (`weapon_type_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_attackers_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
CREATE TABLE `killmail_victim` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`killmail_id` BIGINT(20) UNSIGNED NOT NULL,
`alliance_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`character_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`corporation_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`faction_id` BIGINT(20) UNSIGNED NULL DEFAULT NULL,
`damage_taken` BIGINT(20) UNSIGNED NOT NULL,
`ship_type_id` BIGINT(20) UNSIGNED NOT NULL,
`ship_value` DECIMAL(18,4) NOT NULL DEFAULT '0.0000',
`pos_x` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_y` DECIMAL(30,10) NULL DEFAULT NULL,
`pos_z` DECIMAL(30,10) NULL DEFAULT NULL,
`created_at` DATETIME NOT NULL,
`updated_at` DATETIME NOT NULL,
PRIMARY KEY (`id`),
INDEX `corporation_id` (`corporation_id`),
INDEX `alliance_id` (`alliance_id`),
INDEX `ship_type_id` (`ship_type_id`),
INDEX `killmail_id_character_id` (`killmail_id`, `character_id`),
CONSTRAINT `killmail_victim_killmail_id_killmails_id_foreign_key` FOREIGN KEY (`killmail_id`) REFERENCES `killmails` (`id`) ON UPDATE CASCADE ON DELETE CASCADE
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;
This first query is where the problem started:
SELECT
    *
FROM
    killmails k
LEFT JOIN killmail_attackers ka ON k.id = ka.killmail_id
LEFT JOIN killmail_victim kv ON k.id = kv.killmail_id
WHERE
    ka.character_id = ?
    OR kv.character_id = ?
ORDER BY k.killmail_time DESC
LIMIT ? OFFSET ?
This worked okay, but with long query times. We optimized it to this:
SELECT
    killmails.*
FROM (
    SELECT killmail_victim.killmail_id FROM killmail_victim
    WHERE killmail_victim.corporation_id = ?
    UNION
    SELECT killmail_attackers.killmail_id FROM killmail_attackers
    WHERE killmail_attackers.corporation_id = ?
) SELECTED_KMS
LEFT JOIN killmails ON killmails.id = SELECTED_KMS.killmail_id
ORDER BY killmails.killmail_time DESC
LIMIT ? OFFSET ?
I saw a huge improvement in query times when looking up killmails for characters. However, when I started querying larger datasets such as corporation and alliance killmails, the query slows down again. I believe this is because the queries that are UNIONed together can return very large sets, and reading all of that into memory to build the SELECTED_KMS derived table is what takes so much time. Most of the time with alliances, my application's connection to the database times out. One alliance returned 900K killmail IDs from one of the UNIONed tables; I'm not sure what the other returned.
I can easily add limit statements to the internal queries, but this will introduce a lot of complications when I get to paginating the data or when I introduce a feature to search for KMs by date for example.
I am looking for suggestions on how this query can be optimized and still allow for easy pagination in the near future.
Thank You
Change INDEX(corporation_id) in both tables to INDEX(corporation_id, killmail_id) so that the inner queries will be "covering".
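Concretely, that change might look like the following (index names here are arbitrary; pick whatever fits your conventions):

```sql
-- Replace the single-column indexes so that the UNION subqueries can be
-- satisfied entirely from the index ("covering") without touching rows.
ALTER TABLE killmail_victim
    DROP INDEX corporation_id,
    ADD INDEX corporation_id_killmail_id (corporation_id, killmail_id);

ALTER TABLE killmail_attackers
    DROP INDEX corporation_id,
    ADD INDEX corporation_id_killmail_id (corporation_id, killmail_id);
```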
In general, INDEX(a) is useless when you also have INDEX(a,b). Any query that needs just a can use either of those indexes. (This rule applies only to the "leftmost" column(s); it does not extend to b.)
Where does killmails.id come from? It's not AUTO_INCREMENT; it is not alone in the PRIMARY KEY, so there is no specified "uniqueness" constraint. Is it unique by some other design? Is it computed somewhere else in the code? (I ask because I need a feel for its uniqueness and other characteristics.)
Add INDEX(id, killmail_time).
What version are you using?
Perhaps UNION ALL gives the same results? It would be faster because it would not need to de-dup.
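As a sketch, the de-duplicating UNION could be swapped for UNION ALL like so. This is only valid if the same killmail_id cannot appear in both branches for the same corporation (for example, a corporation appearing as both victim and attacker on one killmail would produce duplicates), which depends on your data:

```sql
SELECT killmails.*
FROM (
    SELECT killmail_victim.killmail_id FROM killmail_victim
    WHERE killmail_victim.corporation_id = ?
    UNION ALL   -- skips the sort/de-duplication pass that plain UNION requires
    SELECT killmail_attackers.killmail_id FROM killmail_attackers
    WHERE killmail_attackers.corporation_id = ?
) SELECTED_KMS
LEFT JOIN killmails ON killmails.id = SELECTED_KMS.killmail_id
ORDER BY killmails.killmail_time DESC
LIMIT ? OFFSET ?;
```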
How much RAM do you have? What is the value of innodb_buffer_pool_size?
Do you really need 8-byte BIGINTs? Even if your application is using longlong (or whatever it calls it), you can probably change the schema without changing the app.
Do you need this much precision and range? DECIMAL(30,10) -- it takes 14 bytes each. DOUBLE would give you about 16 significant digits in 8 bytes, with a wider range of values (up to about 10^308). What "units" are you using? (Overkill for light-years or parsecs; inadequate for miles or km. Perhaps AUs? Then the bottom digit would be a precision of a few meters?)
The last few questions are aimed at shrinking the table and seeing if we can avoid it being as I/O-bound as it apparently is now.
Important
innodb_buffer_pool_size = 128M is terribly small, especially for a 32GB machine, and especially if your dataset is much bigger than 128MB. If there are not any other apps running on the server, bump that setting up to 20G.

MySQL query is not giving good performance

I have a fairly simple query in MySQL but it is taking around 170 minutes to execute.
Can anyone help me here? I am tired of applying indexes on various keys but no benefit.
UPDATE H20_AUDIENCE_ADDRESS_LOG L
JOIN TEMP_V_3064446579 T
    USING (ZS_AUDIENCE_ID, ZS_SOURCE_OBJECT_ID, ZS_ADDRESS_TYPE_ID)
SET
    ZS_ACTIVE_PERIOD_END_DT = '2015-08-14 15:05:48',
    ZS_IS_ACTIVE_PERIOD = False
WHERE
    ZS_IS_ACTIVE_PERIOD = True
    AND L.ZS_ADDRESS_ID <> T.ZS_ADDRESS_ID
    AND T.ZS_SOURCE_TIMESTAMP > L.ZS_SOURCE_TIMESTAMP;
Creates:
CREATE TABLE `H20_AUDIENCE_ADDRESS_LOG` (
`ZS_AUDIENCE_ADDRESS_LOG_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`ZS_AUDIENCE_ID` bigint(20) unsigned NOT NULL,
`ZS_SOURCE_OBJECT_ID` int(10) unsigned NOT NULL,
`ZS_INSERT_DT` datetime NOT NULL,
`ZS_ADDRESS_TYPE_ID` tinyint(3) unsigned NOT NULL,
`ZS_ADDRESS_ID` bigint(20) unsigned NOT NULL,
`ZS_SOURCE_TIMESTAMP` datetime NOT NULL,
`ZS_ACTIVE_PERIOD_START_DT` datetime DEFAULT NULL,
`ZS_ACTIVE_PERIOD_END_DT` datetime DEFAULT NULL,
`ZS_IS_ACTIVE_PERIOD` bit(1) DEFAULT NULL,
`ZS_ACTIVE_PRIORITY_PERIOD_START_DT` datetime DEFAULT NULL,
`ZS_ACTIVE_PRIORITY_PERIOD_END_DT` datetime DEFAULT NULL,
`ZS_IS_ACTIVE_PRIORITY_PERIOD` bit(1) DEFAULT NULL,
PRIMARY KEY (`ZS_AUDIENCE_ADDRESS_LOG_ID`),
KEY `IX_H20_AUDIENCE_ADDRESS_LOG` (`ZS_AUDIENCE_ID`,`ZS_SOURCE_OBJECT_ID`,`ZS_ADDRESS_TYPE_ID`,`ZS_ADDRESS_ID`),
KEY `IX_ADDRESS_ID` (`ZS_ADDRESS_ID`,`ZS_IS_ACTIVE_PERIOD`)
) ENGINE=InnoDB AUTO_INCREMENT=22920801 DEFAULT CHARSET=utf8;
CREATE TABLE `TEMP_V_3064446579` (
`ZS_AUDIENCE_ID` bigint(20) unsigned NOT NULL,
`ZS_SOURCE_OBJECT_ID` int(10) unsigned NOT NULL,
`ZS_ADDRESS_TYPE_ID` tinyint(3) unsigned NOT NULL,
`ZS_ADDRESS_ID` bigint(20) unsigned NOT NULL,
`ZS_SOURCE_TIMESTAMP` datetime NOT NULL,
UNIQUE KEY `IX_TEMP_V_3064446579` (`ZS_AUDIENCE_ID`,`ZS_SOURCE_OBJECT_ID`,`ZS_ADDRESS_TYPE_ID`,`ZS_ADDRESS_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Both tables circa 3m records
Something like this should work:
UPDATE
    `H20_AUDIENCE_ADDRESS_LOG` `L`
SET
    `ZS_ACTIVE_PERIOD_END_DT` = '2015-08-14 15:05:48',
    `ZS_IS_ACTIVE_PERIOD` = False
WHERE
    `ZS_IS_ACTIVE_PERIOD` = True AND
    EXISTS (
        SELECT
            1
        FROM
            `TEMP_V_3064446579` `T`
        WHERE
            -- correlate on the columns the original USING clause joined on
            `T`.`ZS_AUDIENCE_ID` = `L`.`ZS_AUDIENCE_ID` AND
            `T`.`ZS_SOURCE_OBJECT_ID` = `L`.`ZS_SOURCE_OBJECT_ID` AND
            `T`.`ZS_ADDRESS_TYPE_ID` = `L`.`ZS_ADDRESS_TYPE_ID` AND
            `L`.`ZS_ADDRESS_ID` <> `T`.`ZS_ADDRESS_ID` AND
            `T`.`ZS_SOURCE_TIMESTAMP` > `L`.`ZS_SOURCE_TIMESTAMP`
        LIMIT 1
    );
(The ZS_ makes the SQL hard to read; suggest removing it.)
In TEMP_V_3064446579, change UNIQUE to PRIMARY.
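For instance, that change could be made like this (a sketch; assumes no duplicate keys exist, which the UNIQUE constraint already guarantees):

```sql
-- Promote the unique key to the clustered primary key so InnoDB
-- stores the rows in lookup order and drops the redundant index.
ALTER TABLE TEMP_V_3064446579
    DROP INDEX IX_TEMP_V_3064446579,
    ADD PRIMARY KEY (ZS_AUDIENCE_ID, ZS_SOURCE_OBJECT_ID,
                     ZS_ADDRESS_TYPE_ID, ZS_ADDRESS_ID);
```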
Change
KEY `IX_H20_AUDIENCE_ADDRESS_LOG` (`ZS_AUDIENCE_ID`,`ZS_SOURCE_OBJECT_ID`,
`ZS_ADDRESS_TYPE_ID`,`ZS_ADDRESS_ID`)
to
KEY `IX_H20_AUDIENCE_ADDRESS_LOG` (`ZS_AUDIENCE_ID`,`ZS_SOURCE_OBJECT_ID`,
`ZS_ADDRESS_TYPE_ID`,`ZS_ADDRESS_ID`,
`ZS_SOURCE_TIMESTAMP`)
If you have a new enough version, please provide EXPLAIN UPDATE .... If not, please provide EXPLAIN SELECT ... where the SELECT is derived from the UPDATE, but without the SET.

Find rows that have a duplicate field, where the field type is blob

I have a table with many, many duplicated rows. I cannot create a unique index on the blob field, because it is too large.
How can I find and delete the duplicate rows where the blob field (answer) is duplicated?
This is the table structure :
CREATE TABLE `answers` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`id_question` int(11) NOT NULL,
`id_user` int(11) NOT NULL,
`answer` blob NOT NULL,
`language` varchar(2) NOT NULL,
`datetime` datetime NOT NULL,
`enabled` int(11) NOT NULL DEFAULT '0',
`deleted` int(11) NOT NULL DEFAULT '0',
`spam` int(11) NOT NULL DEFAULT '0',
`correct` int(11) NOT NULL DEFAULT '0',
`notification_send` int(11) NOT NULL DEFAULT '0',
`correct_notification` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `id_question` (`id_question`),
KEY `id_user` (`id_user`),
KEY `enabled` (`enabled`)
) ENGINE=InnoDB AUTO_INCREMENT=1488 DEFAULT CHARSET=utf8mb4
You can probably use a prefix of the column, via SUBSTR() or LEFT(), and compare on that. How large a prefix you need depends on your data distribution - that is, on the prefix uniqueness of the column data.
To check uniqueness, you can run this query:
select count(distinct left(answer, 128))/count(*), count(distinct left(answer, 256))/count(*) from answers;
This shows the selectivity (data distribution) of each prefix length. Suppose 128 gives you an answer of 1, i.e. the first 128 bytes are unique across all rows: then you can work with that amount of data from each row. Hope it helps.
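Building on that, one possible sketch for actually deleting the duplicates is to self-join on a hash of the blob and keep the lowest id in each group. The full-column comparison guards against the (very unlikely) chance of an MD5 collision; on a large table this can be slow, since the hashes cannot use an index:

```sql
-- Keep the row with the smallest id among identical answers;
-- delete every later duplicate.
DELETE a2
FROM answers a1
JOIN answers a2
  ON  MD5(a1.answer) = MD5(a2.answer)
  AND a1.answer = a2.answer      -- confirm true equality, not just hash equality
  AND a2.id > a1.id;
```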

MySQL index help - which is faster?

What I'm dealing with:
I have a project which uses ActiveCollab 2, and the database structure is new to me - practically everything gets stored to a project_objects table and has a recursively hierarchical relationship:
Record 1234 might be type "Ticket" with parent_id of 123
Record 123 might be type "Category" with parent_id of 12
Record 12 might be type "Milestone" and so on.
Currently there are upwards of 450,000 records in this table and many of the queries in the code reference the name field which does NOT have an index on it. An example value might be Design or Development.
This might be an example query:
SELECT * FROM project_objects WHERE type = "Ticket" and name = "Design"
My problem:
I have a query that is taking upwards of 12-15 seconds, and I have a feeling it's because the
name column lacks an index, forcing a full table scan. My understanding of indexes is that adding one to the name field will speed up reads, but slow down inserts and updates. Does the index need to be rebuilt completely every time a record is added or updated, or is it just altered/appended? I don't want to optimize this query with an index if it means drastically slowing down other parts of the code base which depend on faster writes.
My question:
Assume 100 reads and 100 writes per day, which is more likely to be a faster process for MySQL - executing the above query on the above table without the index or having to rebuild the index every time a record is added?
I don't have the knowledge or authority to start running benchmarks, but I would like to offer a suggestion to the client without sounding completely novice. Thanks!
EDIT: Here is the table:
CREATE TABLE `project_objects` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`source` varchar(50) DEFAULT NULL,
`type` varchar(30) NOT NULL DEFAULT 'ProjectObject',
`module` varchar(30) NOT NULL DEFAULT 'system',
`project_id` int(10) unsigned NOT NULL DEFAULT '0',
`milestone_id` int(10) unsigned DEFAULT NULL,
`parent_id` int(10) unsigned DEFAULT NULL,
`parent_type` varchar(30) DEFAULT NULL,
`name` varchar(150) DEFAULT NULL,
`body` longtext,
`tags` text,
`state` tinyint(4) NOT NULL DEFAULT '0',
`visibility` tinyint(4) NOT NULL DEFAULT '0',
`priority` tinyint(4) DEFAULT NULL,
`created_on` datetime DEFAULT NULL,
`created_by_id` smallint(5) unsigned NOT NULL DEFAULT '0',
`created_by_name` varchar(100) DEFAULT NULL,
`created_by_email` varchar(100) DEFAULT NULL,
`updated_on` datetime DEFAULT NULL,
`updated_by_id` smallint(5) unsigned DEFAULT NULL,
`updated_by_name` varchar(100) DEFAULT NULL,
`updated_by_email` varchar(100) DEFAULT NULL,
`due_on` date DEFAULT NULL,
`completed_on` datetime DEFAULT NULL,
`completed_by_id` smallint(5) unsigned DEFAULT NULL,
`completed_by_name` varchar(100) DEFAULT NULL,
`completed_by_email` varchar(100) DEFAULT NULL,
`comments_count` smallint(5) unsigned DEFAULT NULL,
`has_time` tinyint(1) unsigned NOT NULL DEFAULT '0',
`is_locked` tinyint(3) unsigned DEFAULT NULL,
`estimate` float(9,2) DEFAULT NULL,
`start_on` date DEFAULT NULL,
`start_on_text` varchar(50) DEFAULT NULL,
`due_on_text` varchar(50) DEFAULT NULL,
`workflow_status` int(4) DEFAULT NULL,
`varchar_field_1` varchar(255) DEFAULT NULL,
`varchar_field_2` varchar(255) DEFAULT NULL,
`integer_field_1` int(11) DEFAULT NULL,
`integer_field_2` int(11) DEFAULT NULL,
`float_field_1` double(10,2) DEFAULT NULL,
`float_field_2` double(10,2) DEFAULT NULL,
`text_field_1` longtext,
`text_field_2` longtext,
`date_field_1` date DEFAULT NULL,
`date_field_2` date DEFAULT NULL,
`datetime_field_1` datetime DEFAULT NULL,
`datetime_field_2` datetime DEFAULT NULL,
`boolean_field_1` tinyint(1) unsigned DEFAULT NULL,
`boolean_field_2` tinyint(1) unsigned DEFAULT NULL,
`position` int(10) unsigned DEFAULT NULL,
`version` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `type` (`type`),
KEY `module` (`module`),
KEY `project_id` (`project_id`),
KEY `parent_id` (`parent_id`),
KEY `created_on` (`created_on`),
KEY `due_on` (`due_on`),
KEY `milestone_id` (`milestone_id`)
) ENGINE=InnoDB AUTO_INCREMENT=993109 DEFAULT CHARSET=utf8
As @Ray points out, indexes do not have to be rebuilt on every Insert, Update or Delete operation. So, if you only want to improve the efficiency of this (or similar) queries, add either an index on (name, type) or on (type, name).
Since you already have an index on (type) alone, I would add the first one:
ALTER TABLE project_objects
ADD INDEX name_type_IDX
(name, type) ;
It may take a few seconds on a busy server but it has to be done once and then all the queries with conditions like yours will benefit. It may also improve efficiency of several other types of queries that involve name only or name and type:
WHERE name = 'Design' AND type = 'Ticket' --- your query
WHERE name = 'Design' --- condition on `name` only
GROUP BY name --- group by `name`
WHERE name LIKE 'Design%' --- range condition on `name` only
WHERE name = 'Design' --- equality condition on `name`
AND type LIKE 'Ticket%' --- and range condition on `type`
WHERE name = 'Design' --- equality condition on `name`
GROUP BY type --- and group by `type`
GROUP BY name --- group by `name`
, type --- and `type`
The insert cost of adding a single-column index on the name column is most likely negligible - it will probably amount to a constant-time increase of no more than a few milliseconds per write. You will eat up some extra disk space, but that's usually not a concern. Nothing like the multiple seconds you're experiencing on select performance.
Add the index, enjoy the performance improvement.
BTW: Indexes aren't 'rebuilt' on every insert. They're usually implemented in B-Trees and unless you're deleting frequently, should require very little re-balancing once you get larger than a few levels (and rebalancing with little depth is pretty cheap).

mysql join not using index for 'between' operator

So basically I have three tables:
CREATE TABLE `cdIPAddressToLocation` (
`IPADDR_FROM` int(10) unsigned NOT NULL COMMENT 'Low end of the IP Address block',
`IPADDR_TO` int(10) unsigned NOT NULL COMMENT 'High end of the IP Address block',
`IPLOCID` int(10) unsigned NOT NULL COMMENT 'The Location ID for the IP Address range',
PRIMARY KEY (`IPADDR_TO`),
KEY `Index_2` USING BTREE (`IPLOCID`),
KEY `Index_3` USING BTREE (`IPADDR_FROM`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
CREATE TABLE `cdIPLocation` (
`IPLOCID` int(10) unsigned NOT NULL default '0',
`Country` varchar(4) default NULL,
`Region` int(10) unsigned default NULL,
`City` varchar(90) default NULL,
`PostalCode` varchar(10) default NULL,
`Latitude` float NOT NULL,
`Longitude` float NOT NULL,
`MetroCode` varchar(4) default NULL,
`AreaCode` varchar(4) default NULL,
`State` varchar(45) default NULL,
`Continent` varchar(10) default NULL,
PRIMARY KEY (`IPLOCID`)
) ENGINE=MyISAM AUTO_INCREMENT=218611 DEFAULT CHARSET=latin1;
and
CREATE TABLE `data` (
`IP` varchar(50),
`SCORE` int
);
My task is to join these three tables and find the location data for given IP address.
My query is as follows:
select
t.ip,
l.Country,
l.State,
l.City,
l.PostalCode,
l.Latitude,
l.Longitude,
t.score
from
(select
ip, inet_aton(ip) ipv, score
from
data
order by score desc
limit 5) t
join
cdIPAddressToLocation a ON t.ipv between a.IPADDR_FROM and a.IPADDR_TO
join
cdIPLocation l ON l.IPLOCID = a.IPLOCID
While this query works, it's very very slow, it took about 100 seconds to return the result on my dev box.
I'm using mysql 5.1, the cdIPAddressToLocation has 5.9 million rows and cdIPLocation table has about 0.3 million rows.
When I check the execution plan, I found it's not using any index in the table 'cdIPAddressToLocation', so for each row in the 'data' table it would do a full table scan against table 'cdIPAddressToLocation'.
It is very weird to me. I mean, since there are already two indexes on columns 'IPADDR_FROM' and 'IPADDR_TO' in table 'cdIPAddressToLocation', the execution plan should exploit them to improve performance - so why didn't it use them?
Or was there something wrong with my query?
Please help, thanks a lot.
Have you tried using a composite index on the columns cdIPAddressToLocation.IPADDR_FROM and cdIPAddressToLocation.IPADDR_TO?
Multiple-Column Indexes
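A composite index alone may not help much here: the optimizer cannot turn a BETWEEN over two different columns into a tight index range, because it has no way to know the ranges don't overlap. A common workaround for IP-geolocation tables (a sketch, assuming the blocks are non-overlapping, and shown for a single literal address rather than the join over the top-5 scores) is to seek to the nearest range end and verify the low bound afterwards:

```sql
-- Find the first block whose high end covers the IP (a single seek on the
-- PRIMARY KEY over IPADDR_TO), then confirm the low end also covers it.
SELECT l.Country, l.State, l.City, l.PostalCode, l.Latitude, l.Longitude
FROM (
    SELECT IPLOCID, IPADDR_FROM
    FROM cdIPAddressToLocation
    WHERE IPADDR_TO >= INET_ATON('1.2.3.4')
    ORDER BY IPADDR_TO
    LIMIT 1
) a
JOIN cdIPLocation l ON l.IPLOCID = a.IPLOCID
WHERE a.IPADDR_FROM <= INET_ATON('1.2.3.4');
```

For the 5-address case, this lookup can be run once per address (or wrapped in a correlated subquery), which is usually far cheaper than the full scans the BETWEEN join was causing.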