I am creating a face recognition system, but the search is very slow. Can you share how to speed up the search?
It takes about 6 seconds for 100,000 data items.
MySQL
mysql> SHOW VARIABLES LIKE '%version%';
+--------------------------+------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------+
| version | 8.0.29 |
| version_comment | MySQL Community Server - GPL |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+--------------------------+------------------------------+
Table
CREATE TABLE `face_feature` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`f1` decimal(9,8) NOT NULL,
`f2` decimal(9,8) NOT NULL,
...
...
`f127` decimal(9,8) NOT NULL,
`f128` decimal(9,8) NOT NULL,
PRIMARY KEY (id)
);
Data
mysql> SELECT count(*) FROM face_feature;
+----------+
| count(*) |
+----------+
| 110004 |
+----------+
mysql> SELECT * FROM face_feature LIMIT 1\G;
id: 1
f1: -0.07603023
f2: 0.13605964
...
f127: 0.09608927
f128: 0.00082345
SQL
SELECT
id,
sqrt(
power(f1 - (-0.09077361), 2) +
power(f2 - (0.10373443), 2) +
...
...
power(f127 - (0.0778369), 2) +
power(f128 - (0.00951046), 2)
) as distance
FROM
face_feature
ORDER BY
distance
LIMIT
1
;
Result
+----+--------------------+
| id | distance |
+----+--------------------+
| 1 | 0.3376853491771237 |
+----+--------------------+
1 row in set (6.18 sec)
Update 1:
Changed from decimal(9,8) to float(9,8)
Then, the query time improved from approximately 4 sec to 3.26 sec
mysql> desc face_feature;
+-------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| f1 | float(9,8) | NO | | NULL | |
| f2 | float(9,8) | NO | | NULL | |
..
| f127 | float(9,8) | NO | | NULL | |
| f128 | float(9,8) | NO | | NULL | |
+-------+------------+------+-----+---------+----------------+
Update 2:
Changed from POWER(z, 2) to z*z
Then, the query time went from 3.26 sec to 4.65 sec
SELECT
id,
sqrt(
((f1 - (-0.09077361)) * (f1 - (-0.09077361))) +
((f2 - (0.10373443)) * (f2 - (0.10373443))) +
((f3 - (0.00798536)) * (f3 - (0.00798536))) +
...
...
((f126 - (0.07803915)) * (f126 - (0.07803915))) +
((f127 - (0.0778369)) * (f127 - (0.0778369))) +
((f128 - (0.00951046)) * (f128 - (0.00951046)))
) as distance
FROM
face_feature
ORDER BY
distance
LIMIT
1
;
Update 3
I am looking into the usage of MySQL GIS.
How can I migrate from "float" to "points" in MySQL?
Update 4
I'm also looking at PostgreSQL because I can't find a way to handle 128 dimensions in MySQL.
DECIMAL(9,8) -- that's a lot of significant digits. Do you need that much precision?
FLOAT -- about 7 significant digits; faster arithmetic.
POWER(z, 2) -- probably a lot slower than z*z. (This may be the slowest part.)
SQRT -- In many situations, you can simply work with the squares. In this case:
SELECT SQRT(closest)
FROM ( SELECT -- leave out SQRT
... ORDER BY .. LIMIT 1 )
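Spelled out against the face_feature table from the question, a minimal sketch of that rewrite might look like the following (only the f1, f2 and f128 terms are written out; the elided columns follow the same pattern, and the alias names are arbitrary):
SELECT id, SQRT(closest) AS distance
FROM (
    SELECT
        id,
        ((f1 - (-0.09077361)) * (f1 - (-0.09077361))) +
        ((f2 - (0.10373443)) * (f2 - (0.10373443))) +
        -- ... remaining terms for f3 through f127 follow the same pattern ...
        ((f128 - (0.00951046)) * (f128 - (0.00951046))) AS closest
    FROM face_feature
    ORDER BY closest
    LIMIT 1
) AS best;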
Here are some other thoughts. They are not necessarily relevant to the query being discussed:
Precise testing -- Beware of comparing for 'equal'. Roundoff error is likely to make things unequal unexpectedly. Imprecise measurements add to the issue. If I measure something twice, I might get 1.23456789 one time and 1.23456788 the next time. (Especially at that level of "precision".)
Trade complexity vs speed -- Use ABS(a - b) as the distance formula; find the 10 items closest in that way, then use the Euclidean distance to get the 'right' distance. (A sketch follows after this list.)
Break the face into regions. Find which region the item is in, then check only the subset of the 128 points that are in that region. (Being near a boundary -- put some points in two regions.)
Think out of the box -- I'm not familiar with your facial recognition, so I have run out of mathematical tricks.
Switch to POINTs and a SPATIAL index. It may be possible to make your task orders of magnitude faster. (This is probably not practical for 128-dimensional space.)
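As a rough sketch of the "trade complexity vs speed" idea above, using the probe values from the question (again only the first two and last terms written out; the 10-candidate cutoff is just an example): order by the cheaper sum of absolute differences first, then compute the exact Euclidean distance only for those few candidates.
SELECT id,
    SQRT(
        POWER(f1 - (-0.09077361), 2) +
        POWER(f2 - (0.10373443), 2) +
        -- ... remaining terms for f3 through f127 ...
        POWER(f128 - (0.00951046), 2)
    ) AS distance
FROM (
    SELECT *
    FROM face_feature
    ORDER BY
        ABS(f1 - (-0.09077361)) +
        ABS(f2 - (0.10373443)) +
        -- ... remaining terms for f3 through f127 ...
        ABS(f128 - (0.00951046))
    LIMIT 10
) AS candidates
ORDER BY distance
LIMIT 1;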
Related
I'm looking for a faster way to calculate Euclidean distances in SQL.
Problem I want to solve
The following "Euclidean distance calculation" is slow.
SELECT
id,
sqrt(
power(f1 - (-0.09077361), 2) +
power(f2 - (0.10373443), 2) +
...
...
power(f127 - (0.0778369), 2) +
power(f128 - (0.00951046), 2)
) as distance
FROM
face_feature
ORDER BY
distance
LIMIT
1
;
What I want to know
Can you share how to migrate from "float" to "points"?
I received the following advice, but I don't understand how.
Switch to POINTs and a SPATIAL index. It may be possible to make your task orders of magnitude faster.
MySQL
mysql> SHOW VARIABLES LIKE '%version%';
+--------------------------+------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------+
| version | 8.0.29 |
| version_comment | MySQL Community Server - GPL |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+--------------------------+------------------------------+
Table
mysql> desc face_feature;
+-------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| f1 | float(9,8) | NO | | NULL | |
| f2 | float(9,8) | NO | | NULL | |
..
| f127 | float(9,8) | NO | | NULL | |
| f128 | float(9,8) | NO | | NULL | |
+-------+------------+------+-----+---------+----------------+
Data
mysql> SELECT count(*) FROM face_feature;
+----------+
| count(*) |
+----------+
| 100003 |
+----------+
mysql> SELECT * FROM face_feature LIMIT 1\G;
id: 1
f1: -0.07603023
f2: 0.13605964
...
f127: 0.09608927
f128: 0.00082345
Reference (My other question)
How can I make "euclidean distance calculation" faster in MySQL?
Don't use FLOAT(M,N); it adds an extra rounding that only hurts various operations.
FLOAT(9,8), if the numbers are near "1.0" will lose some precision. This is because there are only 24 bits of precision in any FLOAT.
(m,n) on FLOAT and DOUBLE has been deprecated (as useless and misleading) in newer versions of MySQL.
There are helper functions to convert numeric strings to POINT values. Internally, a POINT contains two DOUBLEs. Hence the original DECIMAL(9,8) loses only a round-from-decimal-to-binary at the 53rd significant bit.
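For illustration only, a small sketch of those helper functions in MySQL 8.0, using a hypothetical two-column table (this covers only 2 of the 128 dimensions; the values are the ones shown in the question):
-- A POINT column holds two DOUBLEs; SRID 0 gives a flat (Cartesian) space.
CREATE TABLE face_feature_2d (
    id INT NOT NULL AUTO_INCREMENT,
    pos POINT NOT NULL SRID 0,
    PRIMARY KEY (id),
    SPATIAL INDEX (pos)
);

-- Two equivalent ways of building POINT values.
INSERT INTO face_feature_2d (pos)
VALUES (ST_GeomFromText('POINT(-0.07603023 0.13605964)')),
       (POINT(-0.09077361, 0.10373443));

-- Reading the coordinates back out.
SELECT id, ST_X(pos) AS x, ST_Y(pos) AS y FROM face_feature_2d;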
But the real question is about using SPATIAL indexing when the universe has 128 dimensions. I don't think it will work. (I have not even heard of using SPATIAL for 3 dimensions, though it should be practical.)
I am currently building a single (but extremely important in its context) query, which seems like it is working (qualitatively ok), but which I think/hope/wish could run faster.
I am running tests on MySQL 5.7.29, until a box running OmnisciDB in GPU mode can become available (which should be relatively soon). While I am hoping the switch to that different DB backend will improve performance, I am also aware it might require some tweaking in the table structures, querying techniques used, etc. But that is for later.
A little context:
Data
Is summed up as an extremely simple table:
CREATE TABLE `entities_for_perception` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`pos` POINT NOT NULL,
`perception` INT(11) NOT NULL DEFAULT '0',
`stealth` INT(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
SPATIAL INDEX `pos` (`pos`),
INDEX `perception` (`perception`),
INDEX `stealth` (`stealth`)
)
COLLATE='utf8mb4_bin'
ENGINE=InnoDB
AUTO_INCREMENT=10001
;
Which then contains values like (obvious but helps visualise :-) ):
| id | pos | perception | stealth |
| 1 | ... | 10 | 3 |
| 2 | ... | 6 | 5 |
| 3 | ... | 5 | 5 |
| 4 | ... | 7 | 7 |
etc..
Now I have this query (see below) whose intent is the following: in one pass, fetch all the ids of the "entities" that see other entities and return the list of "who sees who".
[The "in one pass" is obvious and is to limit roundtrips.]
Let's assume POINT() is in a cartesian system.
The query is the following:
SHOW WARNINGS;
SET @automatic_perception_distance := 10;
SELECT
*
FROM (
SELECT
e1.id AS oid,
e1.perception AS operception,
@max_perception_distance := e1.perception * 5 AS 'max_perception_distance',
@dist := ST_DISTANCE(e1.pos, e2.pos) AS 'dist',
# minimum 0
@dist_from_auto := GREATEST(@dist - @automatic_perception_distance, 0) AS 'dist_from_auto',
@effective_perception := (
@origin_perception - (
@dist_from_auto
/ (@max_perception_distance - @automatic_perception_distance)
* @origin_perception
)
) AS 'effective_perception',
e2.id AS tid,
e2.stealth AS tstealth
FROM
entities_for_perception e1
INNER JOIN entities_for_perception e2 ON
e1.id != e2.id
ORDER BY
oid,
dist
) AS subquery
WHERE
effective_perception >= tstealth
;
What it does is list "who sees whom" by applying the following criteria/filters:
determining a maximum distance beyond which perception is not possible
determining a minimal distance below which perception is automatic (not implemented yet)
determining an effective perception value varying (and regressing) with distance
...and comparing the effective perception of the "spotter" versus the stealth of the "target".
This works, but runs somewhat slowly (laptop + virtualbox + centos7) on a table with very few rows (~1,000). The query time seems to fluctuate between 0.2 and 0.29 seconds. This is however orders of magnitude faster than it would be with one query per "spotter", which would not scale with 1,000+ spotters. Heh. :-)
Example of output:
| oid | operception | max_perception_distance | dist | dist_from_auto | effective_perception | tid | tstealth |
| 1 | 9 | 45 | 1.4142135623730951 | 0 | 9 | 156 | 5 |
| 1 | 9 | 45 | 11.045361017187261 | 1.0453610171872612 | 8.731192881294705 | 164 | 2 |
| 1 | 9 | 45 | 13.341664064126334 | 3.341664064126334 | 8.140714954938943 | 163 | 8 |
| 1 | 9 | 45 | 16.97056274847714 | 6.970562748477139 | 7.207569578963021 | 125 | 7 |
| 1 | 9 | 45 | 25.019992006393608 | 15.019992006393608 | 5.137716341213072 | 152 | 3 |
| 1 | 9 | 45 | 25.079872407968907 | 15.079872407968907 | 5.122318523665138 | 191 | 5 |
etc.
Could the reason for what I believe is a slow response be:
the subquery?
the variables or the arithmetic applied to them?
the join?
something else I am not aware of?
Thank you for any insight!
An index would probably help: CREATE INDEX idx_ID ON entities_for_perception (id);
If you were to upgrade to MySQL version 8, you could take advantage of a Common Table Expression as follows:
with pairs as (
SELECT
e1.id AS oid,
e1.perception AS operception,
@max_perception_distance := e1.perception * 5 AS 'max_perception_distance',
@dist := ST_DISTANCE(e1.pos, e2.pos) AS 'dist',
# minimum 0
@dist_from_auto := GREATEST(@dist - @automatic_perception_distance, 0) AS 'dist_from_auto',
@effective_perception := (
@origin_perception - (
@dist_from_auto
/ (@max_perception_distance - @automatic_perception_distance)
* @origin_perception
)
) AS 'effective_perception',
e2.id AS tid,
e2.stealth AS tstealth
FROM
entities_for_perception e1
INNER JOIN entities_for_perception e2 ON
e1.id != e2.id
)
SELECT *
FROM pairs
WHERE
effective_perception >= tstealth
ORDER BY
oid,
dist
;
In my rails 3 application, there is a model called Book,
Book(id: integer, link_index: integer, publish_status: integer, link_page: integer, created_at: datetime, updated_at: datetime)
link_index is allowed to be NULL, and the other columns are not. When I query like this:
Book.where(link_page: 1).published.order('link_index DESC').limit(5).pluck(:id)
it returns [518, 331, 486, 488, 493].
but when I use map instead of pluck,
Book.where(link_page: 1).published.order('link_index DESC').limit(5).map(&:id)
it returns [518, 512, 516, 534, 566].
All we know is that only the row where id = 518 has link_index = 4; every other row's link_index IS NULL. So that part of the result is right: 518 is returned as the first element.
But why does the order among the NULL rows differ between the two approaches?
UPDATED:
Maybe it's not about map and pluck: when I run the SQL directly in the mysql shell, I see the same issue.
SELECT id FROM `books` WHERE `books`.`link_page` = 1 AND `books`.`publish_status` = 4 ORDER BY link_index DESC LIMIT 5;
returns:
+-----+
| id |
+-----+
| 518 |
| 331 |
| 486 |
| 488 |
| 493 |
+-----+
But
SELECT * FROM `books` WHERE `books`.`link_page` = 1 AND `books`.`publish_status` = 4 ORDER BY link_index DESC LIMIT 5;
returns:
+-----+------------+----------------+-----------+
| id | link_index | publish_status | link_page |
+-----+------------+----------------+-----------+
| 518 | 4 | 4 | 1 |
| 512 | NULL | 4 | 1 |
| 516 | NULL | 4 | 1 |
| 534 | NULL | 4 | 1 |
| 566 | NULL | 4 | 1 |
+-----+------------+----------------+-----------+
WHY?
map and pluck are completely different functions. map runs at the collection level, whereas pluck runs at the database level.
http://guides.rubyonrails.org/active_record_querying.html#pluck
I suggest you check the MySQL EXPLAIN output of those two queries. The difference will be in the index used to retrieve the data or in the use of a temporary table. The first query returns only the id, so an index scan or index merge on suitable indexes can be used to fetch those ids, and their order then depends on the BTREE ordering of those indexes. For the second query the plan will be different, perhaps using a different set of indexes or a different order of them, so it picks rows in another order. And if a lot of values are NULL, then no "right" order exists (you did not define a second column to be used in case of duplicate link_index), and MySQL is free to pick whatever it finds best (the least costly plan, and other details hidden in there).
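A minimal sketch of both suggestions, using the table and filters from the question (the extra ORDER BY column is just one possible tie-breaker):
-- Compare the plans MySQL chooses for the two statements.
EXPLAIN SELECT id FROM `books`
WHERE `books`.`link_page` = 1 AND `books`.`publish_status` = 4
ORDER BY link_index DESC LIMIT 5;

EXPLAIN SELECT * FROM `books`
WHERE `books`.`link_page` = 1 AND `books`.`publish_status` = 4
ORDER BY link_index DESC LIMIT 5;

-- A deterministic order: break ties among the NULL link_index rows on id,
-- so both query shapes return the same five rows.
SELECT id FROM `books`
WHERE `books`.`link_page` = 1 AND `books`.`publish_status` = 4
ORDER BY link_index DESC, id DESC
LIMIT 5;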
I need some help optimizing this procedure:
DELIMITER $$
CREATE DEFINER=`ryan`@`%` PROCEDURE `GetCitiesInRadius`(
cityID numeric (15),
`range` numeric (15)
)
BEGIN
DECLARE lat1 decimal (5,2);
DECLARE long1 decimal (5,2);
DECLARE rangeFactor decimal (7,6);
SET rangeFactor = 0.014457;
SELECT `latitude`,`longitude` into lat1,long1
FROM world_cities as wc WHERE city_id = cityID;
SELECT
wc.city_id,
wc.accent_city as city,
s.state_name as state,
c.short_name as country,
GetDistance(lat1, long1, wc.`latitude`, wc.`longitude`) as dist
FROM world_cities as wc
left join states s on wc.state_id = s.state_id
left join countries c on wc.country_id = c.country_id
WHERE
wc.`latitude` BETWEEN lat1 -(`range` * rangeFactor) AND lat1 + (`range` * rangeFactor)
AND wc.`longitude` BETWEEN long1 - (`range` * rangeFactor) AND long1 + (`range` * rangeFactor)
AND GetDistance(lat1, long1, wc.`latitude`, wc.`longitude`) <= `range`
ORDER BY dist limit 6;
END
Here is my explain on the main portion of the query:
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
| 1 | SIMPLE | B | range | idx_lat_long | idx_lat_long | 12 | NULL | 7619 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | s | eq_ref | PRIMARY | PRIMARY | 4 | civilipedia.B.state_id | 1 | |
| 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 1 | civilipedia.B.country_id | 1 | Using where |
+----+-------------+-------+--------+---------------+--------------+---------+--------------------------+------+----------------------------------------------+
3 rows in set (0.00 sec)
Here are the indexes:
mysql> show indexes from world_cities;
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| world_cities | 0 | PRIMARY | 1 | city_id | A | 3173958 | NULL | NULL | | BTREE | |
| world_cities | 1 | country_id | 1 | country_id | A | 23510 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | city | 1 | city | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | accent_city | 1 | accent_city | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_pop | 1 | population | A | 28854 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_lat_long | 1 | latitude | A | 1057986 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | idx_lat_long | 2 | longitude | A | 3173958 | NULL | NULL | YES | BTREE | |
| world_cities | 1 | accent_city_2 | 1 | accent_city | NULL | 1586979 | NULL | NULL | YES | FULLTEXT | |
+--------------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (0.01 sec)
I wouldn't think the function you see in the query would cause the slowdown, but here is the function:
CREATE DEFINER=`ryan`@`%` FUNCTION `GetDistance`(lat1 numeric (9,6),
lon1 numeric (9,6),
lat2 numeric (9,6),
lon2 numeric (9,6) ) RETURNS decimal(10,5)
BEGIN
DECLARE x decimal (20,10);
DECLARE pi decimal (21,20);
SET pi = 3.14159265358979323846;
SET x = sin( lat1 * pi/180 ) * sin( lat2 * pi/180 ) + cos(
lat1 *pi/180 ) * cos( lat2 * pi/180 ) * cos( (lon2 * pi/180) -
(lon1 *pi/180)
);
SET x = atan( ( sqrt( 1- power( x, 2 ) ) ) / x );
RETURN ( 1.852 * 60.0 * ((x/pi)*180) ) / 1.609344;
END
As far as I can tell there is nothing directly wrong with your logic that would make this slow, so the problem ends up being that you can't use any indexes effectively with this query.
MySQL needs to do a full table scan and apply the functions of your WHERE clause to each row to determine if it passed the conditions. Currently there's 1 index used: idx_lat_long.
It's a bit of a bad index: the longitude portion will never be used, because the latitude portion is a float and is therefore always filtered with a range rather than an equality. But at the very least you managed to filter out all rows that are outside the latitude range. These are likely still a lot of rows, though.
You'd actually get slightly better results on the longitude, because humans only really live in the middle 30% of the earth. We're very much spread out horizontally, but not really vertically.
Regardless, the best way to narrow the search further is to filter out as many records as possible in the general area. Right now the filter covers a full strip of the earth; try to make it a bounding box.
You could naively dice up the earth in say, 10x10 segments. This would in a best case make sure the query is limited to 10% of the earth ;).
But as soon as your bounding box spans two separate segments, only the first coordinate (lat or lng) can be used in the index and you end up with the same problem.
So when I thought about this problem I started thinking about it differently. Instead, I divided the earth up into 4 segments (let's say north east, north west, south east, south west on a map). This gives me coordinates like:
0,0
0,1
1,0
1,1
Instead of putting the x and y values in 2 separate fields, I treat them as one bit field and store both at once.
Then I divided every one of the 4 boxes up again, which gives us 2 sets of coordinates: the outer and the inner ones. I'm still encoding this in the same field, which means we now use 4 bits for our 4x4 coordinate system.
How far can we go? If we assume a 64 bit integer field, it means that 32bit can be used for each of the 2 coordinates. This gives us a grid system of 4294967295 x 4294967295 all encoded into one database field.
The beauty of this field is that you can index it. This is sometimes called (I believe) a quad-tree. If you need to select a big area in your database, you just calculate the 64-bit top-left coordinate (in the 4294967295 x 4294967295 grid system) and the bottom-right one, and it's guaranteed that anything that lies in that box will also lie between those two numbers.
How do you get to those numbers? Let's be lazy and assume that both our x and y coordinates range from -180 to 180 degrees. (The y coordinate is of course only half that, but we're lazy.)
First we make it positive:
// assuming x and y are our long and lat.
x += 180;
y += 180;
So the max for those is 360 now, and 4294967295 / 360 is around 11930464.
So to convert to our new grid system, we just do:
x *= 11930464;
y *= 11930464;
Now we have two distinct numbers, and we need to turn them into one number: first bit 1 of x, then bit 1 of y, then bit 2 of x, bit 2 of y, and so on.
// The 'morton number'
morton = 0
// The current bit we're interleaving
bit = 1
// The position of the bit we're interleaving
position = 0
while(bit <= latitude or bit <= longitude) {
if (bit & latitude) morton = morton | 1 << (2*position+1)
if (bit & longitude) morton = morton | 1 << (2*position)
position += 1
bit = 1 << position
}
I'm calling the final variable 'morton', after the guy who came up with it in 1966.
So this leaves us finally with the following:
For each row in your database, calculate the morton number and store it.
Whenever you do a query, first determine the maximum bounding box (as the morton number) and filter on that.
This will greatly reduce the number of records you need to check.
Here's a stored procedure I wrote that will do the calculation for you:
CREATE FUNCTION getGeoMorton(lat DOUBLE, lng DOUBLE) RETURNS BIGINT UNSIGNED DETERMINISTIC
BEGIN
-- 11930464 is round(maximum value of a 32bit integer / 360 degrees)
DECLARE bit, morton, pos BIGINT UNSIGNED DEFAULT 0;
SET @lat = CAST((lat + 90) * 11930464 AS UNSIGNED);
SET @lng = CAST((lng + 180) * 11930464 AS UNSIGNED);
SET bit = 1;
WHILE bit <= @lat || bit <= @lng DO
IF(bit & @lat) THEN SET morton = morton | ( 1 << (2 * pos + 1)); END IF;
IF(bit & @lng) THEN SET morton = morton | ( 1 << (2 * pos)); END IF;
SET pos = pos + 1;
SET bit = 1 << pos;
END WHILE;
RETURN morton;
END;
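A sketch of how the function might be wired in, following the steps above. The morton column, index name, bounding box and 50-mile range are all made-up example values, and the morton BETWEEN filter is only the rough filter, so the exact bounding-box and distance checks still follow:
-- 1. Add and populate a morton column, then index it.
ALTER TABLE world_cities ADD COLUMN morton BIGINT UNSIGNED;
UPDATE world_cities SET morton = getGeoMorton(latitude, longitude);
CREATE INDEX idx_morton ON world_cities (morton);

-- 2. Rough-filter on the morton range of a bounding box, then apply the
--    exact checks (example box: lat 40..41, long -75..-73).
SELECT city_id, accent_city,
       GetDistance(40.5, -74.0, latitude, longitude) AS dist
FROM world_cities
WHERE morton BETWEEN getGeoMorton(40, -75) AND getGeoMorton(41, -73)
  AND latitude  BETWEEN 40  AND 41
  AND longitude BETWEEN -75 AND -73
  AND GetDistance(40.5, -74.0, latitude, longitude) <= 50
ORDER BY dist
LIMIT 6;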
A few caveats:
The absolute worst case scenario will still scan 50% of your entire table. The chance of that is extremely low though, and I've seen significant performance increases for most real-world queries.
The bounding box in this case assumes a Euclidean space, meaning a flat surface. In reality your bounding boxes are not exact squares, and they warp heavily when getting closer to the poles. By just making the boxes a bit larger (depending on how exact you want to be) you can get quite far. Most real-world data is also often not close to the poles ;). Remember that this filter is just a 'rough filter' to get most of the likely unwanted rows out.
This is based on a so-called Z-order curve. To get even better performance, if you're feeling adventurous, you could try the Hilbert curve instead. This curve rotates in an odd way, which ensures that in a worst case scenario you will only scan about 25% of the table. Magic! In general it will also filter out many more unwanted rows.
Source for all this: I wrote 3 blog posts about this topic when I ran into the same problems and tried to creatively get to a solution. I got much better performance with this compared to MySQL's GEO indexes.
http://www.rooftopsolutions.nl/blog/229
http://www.rooftopsolutions.nl/blog/230
http://www.rooftopsolutions.nl/blog/231
Given the following table:
desc exchange_rates;
+------------------+----------------+------+-----+---------+----------------+
| Field            | Type           | Null | Key | Default | Extra          |
+------------------+----------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| time | datetime | NO | MUL | NULL | |
| base_currency | varchar(3) | NO | MUL | NULL | |
| counter_currency | varchar(3) | NO | MUL | NULL | |
| rate | decimal(32,16) | NO | | NULL | |
+------------------+----------------+------+-----+---------+----------------+
I have added indexes on time, base_currency and counter_currency, as well as a composite index on (time, base_currency, counter_currency), but I'm seeing a big performance difference when I perform a SELECT using <= against using <.
The first SELECT is:
ExchangeRate Load (95.5ms)
SELECT * FROM `exchange_rates` WHERE (time <= '2009-12-30 14:42:02' and base_currency = 'GBP' and counter_currency = 'USD') LIMIT 1
As you can see this is taking 95ms.
If I change the query such that I compare time using < rather than <= I see this:
ExchangeRate Load (0.8ms)
SELECT * FROM `exchange_rates` WHERE (time < '2009-12-30 14:42:02' and base_currency = 'GBP' and counter_currency = 'USD') LIMIT 1
Now it takes less than 1 millisecond, which sounds right to me. Is there a rational explanation for this behaviour?
The output from EXPLAIN provides further details, but I'm not 100% sure how to interpret this:
-- Output from the first, slow, select
id: 1   select_type: SIMPLE   table: exchange_rates   type: index_merge
possible_keys: index_exchange_rates_on_time,index_exchange_rates_on_base_currency,index_exchange_rates_on_counter_currency,time_and_currency
key: index_exchange_rates_on_counter_currency,index_exchange_rates_on_base_currency   key_len: 5,5   ref: NULL   rows: 813
Extra: Using intersect(index_exchange_rates_on_counter_currency,index_exchange_rates_on_base_currency); Using where
-- Output from the second, fast, select
id: 1   select_type: SIMPLE   table: exchange_rates   type: ref
possible_keys: index_exchange_rates_on_time,index_exchange_rates_on_base_currency,index_exchange_rates_on_counter_currency,time_and_currency
key: index_exchange_rates_on_counter_currency   key_len: 5   ref: const   rows: 4988
Extra: Using where
(Note: I'm producing these queries through ActiveRecord (in a Rails app) but these are ultimately the queries which are being executed)
In the first case, MySQL tries to combine results from all indexes. It fetches all records from both indexes and joins them on the value of the row pointer (table offset in MyISAM, PRIMARY KEY in InnoDB).
In the second case, it just uses a single index, which, considering LIMIT 1, is the best decision.
You need to create a composite index on (base_currency, counter_currency, time) (in this order) for this query to work as fast as possible.
The engine will use the index for filtering on the leading columns (base_currency, counter_currency) and for ordering on the trailing column (time).
It also seems you want to add something like ORDER BY time DESC to your query to get the last exchange rate.
In general, any LIMIT without an ORDER BY should ring an alarm bell.
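A minimal sketch of that suggestion against the exchange_rates table (the index name is arbitrary):
CREATE INDEX idx_currency_pair_time
    ON exchange_rates (base_currency, counter_currency, time);

-- The index covers the two equality filters, the range on time, and the
-- ORDER BY, so with LIMIT 1 only a single row needs to be read.
SELECT *
FROM exchange_rates
WHERE base_currency = 'GBP'
  AND counter_currency = 'USD'
  AND time <= '2009-12-30 14:42:02'
ORDER BY time DESC
LIMIT 1;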