I am currently building a single (but extremely important in its context) query, which seems like it is working (qualitatively ok), but which I think/hope/wish could run faster.
I am running tests on MySQL 5.7.29 until a box running OmnisciDB in GPU mode becomes available (which should be relatively soon). While I am hoping the switch to that different DB backend will improve performance, I am also aware it might require some tweaking of the table structures, querying techniques used, etc. But that is for later.
A little context:
Data
The data boils down to an extremely simple table:
CREATE TABLE `entities_for_perception` (
`id` INT(11) NOT NULL AUTO_INCREMENT,
`pos` POINT NOT NULL,
`perception` INT(11) NOT NULL DEFAULT '0',
`stealth` INT(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
SPATIAL INDEX `pos` (`pos`),
INDEX `perception` (`perception`),
INDEX `stealth` (`stealth`)
)
COLLATE='utf8mb4_bin'
ENGINE=InnoDB
AUTO_INCREMENT=10001
;
Which then contains values like (obvious but helps visualise :-) ):
| id | pos | perception | stealth |
| 1 | ... | 10 | 3 |
| 2 | ... | 6 | 5 |
| 3 | ... | 5 | 5 |
| 4 | ... | 7 | 7 |
etc..
Now I have this query (see below) whose intent is the following: in one pass, fetch all the ids of the "entities" that see other entities and return the list of "who sees who".
[The "in one pass" is obvious and is to limit roundtrips.]
Let's assume POINT() is in a cartesian system.
The query is the following:
SHOW WARNINGS;
SET @automatic_perception_distance := 10;
SELECT
*
FROM (
SELECT
e1.id AS oid,
@origin_perception := e1.perception AS operception,
@max_perception_distance := e1.perception * 5 AS 'max_perception_distance',
@dist := ST_DISTANCE(e1.pos, e2.pos) AS 'dist',
# minimum 0
@dist_from_auto := GREATEST(@dist - @automatic_perception_distance, 0) AS 'dist_from_auto',
@effective_perception := (
@origin_perception - (
@dist_from_auto
/ (@max_perception_distance - @automatic_perception_distance)
* @origin_perception
)
) AS 'effective_perception',
e2.id AS tid,
e2.stealth AS tstealth
FROM
entities_for_perception e1
INNER JOIN entities_for_perception e2 ON
e1.id != e2.id
ORDER BY
oid,
dist
) AS subquery
WHERE
effective_perception >= tstealth
;
What it does is list "who sees whom" by applying the following criteria/filters:
determining a maximum distance beyond which perception is not possible
determining a minimal distance below which perception is automatic (not implemented yet)
determining an effective perception value varying (and regressing) with distance
...and comparing the effective perception of the "spotter" versus the stealth of the "target".
This works, but runs somewhat slowly (laptop + virtualbox + centos7) on a table with very few rows (~1,000). The query time seems to fluctuate between 0.2 and 0.29 seconds. This is however orders of magnitude faster than it would be with one query per "spotter", which would not scale with 1,000+ spotters. Heh. :-)
Example of output:
| oid | operception | max_perception_distance | dist | dist_from_auto | effective_perception | tid | tstealth |
| 1 | 9 | 45 | 1.4142135623730951 | 0 | 9 | 156 | 5 |
| 1 | 9 | 45 | 11.045361017187261 | 1.0453610171872612 | 8.731192881294705 | 164 | 2 |
| 1 | 9 | 45 | 13.341664064126334 | 3.341664064126334 | 8.140714954938943 | 163 | 8 |
| 1 | 9 | 45 | 16.97056274847714 | 6.970562748477139 | 7.207569578963021 | 125 | 7 |
| 1 | 9 | 45 | 25.019992006393608 | 15.019992006393608 | 5.137716341213072 | 152 | 3 |
| 1 | 9 | 45 | 25.079872407968907 | 15.079872407968907 | 5.122318523665138 | 191 | 5 |
etc.
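To make the arithmetic concrete, here is the oid=1 / tid=152 row above recomputed by hand with the same formula the query uses (operception 9, max_perception_distance 45, @automatic_perception_distance 10):
SELECT 9 - (GREATEST(25.019992006393608 - 10, 0) / (45 - 10)) * 9 AS effective_perception;
-- returns ~5.1377, matching the effective_perception column above; since 5.1377 >= tstealth (3), entity 1 sees entity 152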
Could the reason for what I believe is a slow response be:
the subquery?
the variables or the arithmetic applied to them?
the join?
something else I am not aware of?
Thank you for any insight!
An index would probably help: CREATE INDEX idx_ID ON entities_for_perception (id);
If you were to upgrade to MySQL version 8, you could take advantage of a Common Table Expression as follows:
with e1 as (
SELECT
e1.id AS oid,
@origin_perception := e1.perception AS operception,
@max_perception_distance := e1.perception * 5 AS 'max_perception_distance',
@dist := ST_DISTANCE(e1.pos, e2.pos) AS 'dist',
# minimum 0
@dist_from_auto := GREATEST(@dist - @automatic_perception_distance, 0) AS 'dist_from_auto',
@effective_perception := (
@origin_perception - (
@dist_from_auto
/ (@max_perception_distance - @automatic_perception_distance)
* @origin_perception
)
) AS 'effective_perception',
e2.id AS tid,
e2.stealth AS tstealth
FROM
entities_for_perception e1
INNER JOIN entities_for_perception e2 ON
e1.id != e2.id
)
SELECT *
FROM e1
WHERE
effective_perception >= tstealth
ORDER BY
oid,
dist
;
I am creating a face recognition system, but the search is very slow. Can you share how to speed up the search?
It takes about 6 seconds for 100,000 data items.
MySQL
mysql> SHOW VARIABLES LIKE '%version%';
+--------------------------+------------------------------+
| Variable_name | Value |
+--------------------------+------------------------------+
| version | 8.0.29 |
| version_comment | MySQL Community Server - GPL |
| version_compile_machine | x86_64 |
| version_compile_os | Linux |
+--------------------------+------------------------------+
Table
CREATE TABLE `face_feature` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`f1` decimal(9,8) NOT NULL,
`f2` decimal(9,8) NOT NULL,
...
...
`f127` decimal(9,8) NOT NULL,
`f128` decimal(9,8) NOT NULL,
PRIMARY KEY (id)
);
Data
mysql> SELECT count(*) FROM face_feature;
+----------+
| count(*) |
+----------+
| 110004 |
+----------+
mysql> SELECT * FROM face_feature LIMIT 1\G;
id: 1
f1: -0.07603023
f2: 0.13605964
...
f127: 0.09608927
f128: 0.00082345
SQL
SELECT
id,
sqrt(
power(f1 - (-0.09077361), 2) +
power(f2 - (0.10373443), 2) +
...
...
power(f127 - (0.0778369), 2) +
power(f128 - (0.00951046), 2)
) as distance
FROM
face_feature
ORDER BY
distance
LIMIT
1
;
Result
+----+--------------------+
| id | distance |
+----+--------------------+
| 1 | 0.3376853491771237 |
+----+--------------------+
1 row in set (6.18 sec)
Update 1:
Changed from decimal(9,8) to float(9,8)
Then, the query time improved from approximately 4 sec to 3.26 sec
mysql> desc face_feature;
+-------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+----------------+
| id | int | NO | PRI | NULL | auto_increment |
| f1 | float(9,8) | NO | | NULL | |
| f2 | float(9,8) | NO | | NULL | |
..
| f127 | float(9,8) | NO | | NULL | |
| f128 | float(9,8) | NO | | NULL | |
+-------+------------+------+-----+---------+----------------+
Update 2:
Changed from POWER(z, 2) to z*z
Then, the time changed from 3.26 sec to 4.65 sec
SELECT
id,
sqrt(
((f1 - (-0.09077361)) * (f1 - (-0.09077361))) +
((f2 - (0.10373443)) * (f2 - (0.10373443))) +
((f3 - (0.00798536)) * (f3 - (0.00798536))) +
...
...
((f126 - (0.07803915)) * (f126 - (0.07803915))) +
((f127 - (0.0778369)) * (f127 - (0.0778369))) +
((f128 - (0.00951046)) * (f128 - (0.00951046)))
) as distance
FROM
face_feature
ORDER BY
distance
LIMIT
1
;
Update 3
I am looking into the usage of MySQL GIS.
How can I migrate from "float" to "points" in MySQL?
Update 4
I'm also looking at PostgreSQL because I can't find a way to handle 128 dimensions in MySQL.
DECIMAL(9,8) -- that's a lot of significant digits. Do you need that much precision?
FLOAT -- about 7 significant digits; faster arithmetic.
POWER(z, 2) -- probably a lot slower than z*z. (This may be the slowest part.)
SQRT -- In many situations, you can simply work with the squares. In this case:
SELECT SQRT(closest)
FROM ( SELECT -- leave out SQRT
... ORDER BY .. LIMIT 1 )
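Applied to the query above, that skeleton might look roughly like this (the middle feature columns are elided just as in the original post):
SELECT id, SQRT(closest) AS distance
FROM (
SELECT
id,
(f1 - (-0.09077361)) * (f1 - (-0.09077361)) +
(f2 - (0.10373443)) * (f2 - (0.10373443)) +
-- ... f3 through f126 elided ...
(f127 - (0.0778369)) * (f127 - (0.0778369)) +
(f128 - (0.00951046)) * (f128 - (0.00951046)) AS closest
FROM face_feature
ORDER BY closest
LIMIT 1
) AS t;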
Here are some other thoughts. They are not necessarily relevant to the query being discussed:
Precise testing -- Beware of comparing for 'equal'. Roundoff error is likely to make things unequal unexpectedly. Imprecise measurements add to the issue: if I measure something twice, I might get 1.23456789 one time and 1.23456788 the next time (especially at that level of "precision").
Trade complexity vs speed -- Use ABS(a - b) as the distance formula; find the 10 items closest in that way, then use the Euclidean distance to get the 'right' distance. (See the sketch after this list.)
Break the face into regions. Find which region the item is in, then check only the subset of the 128 points that are in that region. (Being near a boundary -- put some points in two regions.)
Think out of the box -- I'm not familiar with your facial recognition, so I have run out of mathematical tricks.
Switch to POINTs and a SPATIAL index. It may be possible to make your task run orders of magnitude faster. (This is probably not practical for 128-dimensional space.)
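For the "trade complexity vs speed" idea, a rough two-stage sketch (the candidate limit of 10 and the probe values are only placeholders):
-- Stage 1: cheap ABS-based prefilter down to ~10 candidates
-- Stage 2: exact Euclidean distance on those candidates only
SELECT id,
SQRT(
POWER(f1 - (-0.09077361), 2) +
POWER(f2 - (0.10373443), 2)
-- ... remaining 126 terms elided ...
) AS distance
FROM (
SELECT id, f1, f2 -- ... plus the remaining feature columns ...
FROM face_feature
ORDER BY ABS(f1 - (-0.09077361)) + ABS(f2 - (0.10373443)) -- + ... remaining terms ...
LIMIT 10
) AS candidates
ORDER BY distance
LIMIT 1;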
I want to check which range/level a number falls in. I have a table of buy and pay values. The only thing I can think of is BETWEEN, but here it is different because there is more than one min/max pair of columns.
pay_level
| id | type | buy1 | pay1 | buy2 | pay2 | buy3 | pay3 |
|----|------|------|------|------|------|------|------|
| 1 | p1 | 10 | 100 | 20 | 80 | 30 | 70 |
|----|------|------|------|------|------|------|------|
| 2 | p2 | 10 | 100 | 20 | 80 | 30 | 70 |
|----|------|------|------|------|------|------|------|
| 3 | p3 | 5 | 500 | 10 | 400 | 30 | 300 |
|----|------|------|------|------|------|------|------|
Ok, according to the table above, my goal is to work out how much an incoming order costs.
For example:
A orders p1 for 12 units, so the price per unit is 100, because the quantity is between buy1 and buy2.
B orders p1 for 15 units, so he also gets 100 per unit, like A.
C orders p1 for 25 units. He gets 70 because it's in between pay2 and pay3.
The only thing I can think of is to compare two columns and check whether the order falls between them. So my code is:
select * from pay_level where order between buy1 and buy2 and type='p1'
But the problem occurs when the order is more than 20 (the buy2 value). I know my English is not good enough to explain this clearly. Hope you understand.
First normalise your schema design...
DROP TABLE IF EXISTS wilf;
CREATE TABLE wilf
(id INT AUTO_INCREMENT PRIMARY KEY
,type INT NOT NULL
,x INT NOT NULL
,buy INT NOT NULL
,pay INT NOT NULL
);
INSERT INTO wilf VALUES
(1,1,1,10,100),
(2,2,1,10,100),
(3,3,1, 5,500),
(4,1,2,20, 80),
(5,2,2,20, 80),
(6,3,2,10,400),
(7,1,3,30, 70),
(8,2,3,30, 70),
(9,3,3,30,300);
...and then your queries become trivial...
SELECT pay FROM wilf WHERE type = 1 AND buy < 12 ORDER BY id DESC LIMIT 1;
+-----+
| pay |
+-----+
| 100 |
+-----+
(And C should have got 80)
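For example, C's order of 25 units of p1 with the same pattern now gives the expected 80:
SELECT pay FROM wilf WHERE type = 1 AND buy < 25 ORDER BY id DESC LIMIT 1;
+-----+
| pay |
+-----+
|  80 |
+-----+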
You'll need a CASE expression to navigate this one, since you can't dynamically refer to a database object (table, column, etc.) in your SQL.
I think something like the following would get you in the ballpark:
SELECT
CASE WHEN order BETWEEN buy1 and buy2 THEN pay1
WHEN order BETWEEN buy2 and buy3 THEN pay2
WHEN order > buy3 THEN pay3 END as cost
FROM pay_level
WHERE type = 'p1'
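For instance, plugging in B's order of 15 units (using a hypothetical @qty variable in place of order, which is a reserved word and would otherwise need quoting):
SET @qty := 15;
SELECT
CASE WHEN @qty BETWEEN buy1 AND buy2 THEN pay1
WHEN @qty BETWEEN buy2 AND buy3 THEN pay2
WHEN @qty > buy3 THEN pay3
END AS cost
FROM pay_level
WHERE type = 'p1';
-- returns 100, matching example B above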
In my projects I often need to store the result of a SELECT in another table (we call this a "resultset"). The reason is to dynamically display a large number of rows in a web application while loading only small chunks as necessary.
Typically, this is done by queries such as this one:
SET @counter := 0;
INSERT INTO resultsetdata
SELECT "12345", @counter:=@counter+1, a.ID
FROM sometable a
JOIN bigtable b
WHERE (a.foo = b.bar)
ORDER BY a.whatever DESC;
The fixed "12345" value is just a value to identify the "resultset" as a whole and changes for each query. The second column is a incrementing index counter that is meant to allow direct access to a specific row in the result and the ID column references the specific row in the source data table.
When the application needs a certain range of the result I just join resultsetdata with the source table to get the detailed data - which is quick as opposed to the resultsetdata query above which may take 2-3 seconds to complete (which explains why I need this intermediary table).
The SELECT query itself is not relevant for this question.
resultsetdata has the following structure:
CREATE TABLE `resultsetdata` (
`ID` int(11) NOT NULL,
`ContIdx` int(11) NOT NULL,
`Value` int(11) NOT NULL,
PRIMARY KEY (`ID`,`ContIdx`)
) ENGINE=InnoDB;
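For illustration, the chunked retrieval described above might look roughly like this (sometable and the 12345 resultset id are taken from the example query; the row range is made up):
SELECT s.*
FROM resultsetdata r
JOIN sometable s ON s.ID = r.Value
WHERE r.ID = 12345
AND r.ContIdx BETWEEN 101 AND 150
ORDER BY r.ContIdx;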
This usually works like a charm but lately we noticed that in some cases the ORDER of the result is not correct. This depends on the query itself (for example, adding DISTINCT is a typical cause), the server version and the data contained in the source tables, so I guess one can say that the row order is unpredictable with this method. Probably it depends on internal optimizations.
However, the problem is now that I can't think of any alternative solution that gives me the expected result.
Since the resultset can get several thousands of rows, loading all data in memory and then manually INSERTing it is not feasible.
Any suggestions?
EDIT: For further clarification, have a look at these queries:
DROP TABLE IF EXISTS test;
CREATE TABLE test (ID INT NOT NULL, PRIMARY KEY(ID)) ENGINE=InnoDB;
INSERT INTO test (ID) VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
SET @counter:=0;
SELECT "12345", @counter:=@counter+1, ID
FROM test
ORDER BY ID DESC;
This produces the following result as "expected":
+-------+----------------------+----+
| 12345 | @counter:=@counter+1 | ID |
+-------+----------------------+----+
| 12345 | 1 | 10 |
| 12345 | 2 | 9 |
| 12345 | 3 | 8 |
| 12345 | 4 | 7 |
| 12345 | 5 | 6 |
| 12345 | 6 | 5 |
| 12345 | 7 | 4 |
| 12345 | 8 | 3 |
| 12345 | 9 | 2 |
| 12345 | 10 | 1 |
+-------+----------------------+----+
10 rows in set (0.00 sec)
As said, in some cases (I can't provide a testcase here, sorry), this may lead to a result similar to this:
+-------+----------------------+----+
| 12345 | @counter:=@counter+1 | ID |
+-------+----------------------+----+
| 12345 | 10 | 10 |
| 12345 | 9 | 9 |
| 12345 | 8 | 8 |
| 12345 | 7 | 7 |
| 12345 | 6 | 6 |
| 12345 | 5 | 5 |
| 12345 | 4 | 4 |
| 12345 | 3 | 3 |
| 12345 | 2 | 2 |
| 12345 | 1 | 1 |
+-------+----------------------+----+
I'm not saying this is a MySQL bug and I fully understand that my method currently provides unpredictable results. Still, I don't know how to tweak this to get predictable results.
This is because the order that records are sorted when they are inserted is unrelated to the order when you retrieve them.
When you retrieve them a query plan will be created. If no ORDER BY is specified in your SELECT statement then the order will depend on the query plan produced. This is why it is unpredictable and adding DISTINCT can change the order.
The solution is to store enough data that you can retrieve them in the correct order using an ORDER BY clause. In your case you have ordered your data by a.whatever. Can a.whatever be stored in resultsetdata? If so then you can read the records out in the correct order.
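A minimal sketch of that idea, assuming a.whatever is a DATETIME and adding a SortKey column (both the type and the column name are assumptions):
ALTER TABLE resultsetdata ADD COLUMN SortKey DATETIME NULL;

SET @counter := 0;
INSERT INTO resultsetdata
SELECT "12345", @counter := @counter + 1, a.ID, a.whatever
FROM sometable a
JOIN bigtable b ON a.foo = b.bar
ORDER BY a.whatever DESC;

-- On retrieval, the stored key guarantees the order regardless of how the counter was assigned:
SELECT * FROM resultsetdata WHERE ID = 12345 ORDER BY SortKey DESC;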
Maybe you could wrap the select into another select:
SET @counter := 0;
INSERT INTO resultsetdata
SELECT *, @counter := @counter + 1
FROM (
SELECT "12345", a.ID
FROM sometable a
JOIN bigtable b
WHERE a.foo = b.bar
ORDER BY a.whatever DESC
) AS tmp
... but you are still at the mercy of the dumbness of MySQL's optimizer.
That's all I found about this topic, but I couldn't find a hard guarantee:
Pure-SQL Technique for Auto-Numbering Rows in Result Set
http://www.xaprb.com/blog/2006/12/02/how-to-number-rows-in-mysql/
http://www.xaprb.com/blog/2005/09/27/simulating-the-sql-row_number-function/
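For completeness: on MySQL 8+ a window function gives a deterministic numbering without user variables at all (a sketch using the same placeholder tables):
INSERT INTO resultsetdata
SELECT "12345",
ROW_NUMBER() OVER (ORDER BY a.whatever DESC),
a.ID
FROM sometable a
JOIN bigtable b ON a.foo = b.bar;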
I am using MySQL version 5.5 on Ubuntu.
My database tables are setup as follows:
DDLs:
CREATE TABLE `asx` (
`code` char(3) NOT NULL,
`high` decimal(9,3),
`low` decimal(9,3),
`close` decimal(9,3),
`histID` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`histID`),
UNIQUE KEY `code` (`code`)
)
CREATE TABLE `asxhist` (
`date` date NOT NULL,
`average` decimal(9,3),
`histID` int(11) NOT NULL,
PRIMARY KEY (`date`,`histID`),
KEY `histID` (`histID`),
CONSTRAINT `asxhist_ibfk_1` FOREIGN KEY (`histID`) REFERENCES `asx` (`histID`)
ON UPDATE CASCADE
)
t1:
| code | high | low | close | histID (primary key)|
| asx | 10.000 | 9.500 | 9.800 | 1
| nab | 42.000 | 41.250 | 41.350 | 2
t2:
| date | average | histID (foreign key) |
| 2013-01-01| 10.000 | 1 |
| 2013-01-01| 39.000 | 2 |
| 2013-01-02| 9.000 | 1 |
| 2013-01-02| 38.000 | 2 |
| 2013-01-03| 9.500 | 1 |
| 2013-01-03| 39.500 | 2 |
| 2013-01-04| 11.000 | 1 |
| 2013-01-04| 38.500 | 2 |
I am attempting to complete a select query that produces this as a result:
| code | high | low | close | asxhist.average |
| asx | 10.000 | 9.500 | 9.800 | 11.000,9.500 |
| nab | 42.000 | 41.250 | 41.350 | 38.500,39.500 |
Where the most recent information in table 2 is returned with table 1 in a csv format.
I have managed to get this far:
SELECT code, high, low, close,
(SELECT GROUP_CONCAT(DISTINCT t2.average ORDER BY date DESC SEPARATOR ',') FROM t2
WHERE t2.histID = t1.histID)
FROM t1;
Unfortunately this returns all values associated with histID. I'm taking a look at xaprb.com's firstleastmax-row-per-group-in-sql solution, but I have been banging my head all day and the slight wooziness seems to be dimming my ability to comprehend how I should use it to my benefit. How can I limit the results to the 5 most recent values and, considering the tables will eventually be megabytes in size, try to remain in O(n^2) or less? (Or can I?)
Temporary workaround using SUBSTRING_INDEX (not a feasible solution for huge data):
SELECT code, high, low, close,
(SELECT SUBSTRING_INDEX(GROUP_CONCAT(asxhist.average ORDER BY asxhist.date DESC), ',', 3)
FROM asxhist
WHERE asxhist.histID = asx.histID)
FROM asx;
From what I gather, a LIMIT option for GROUP_CONCAT is still a feature request.
Also on Stack Overflow: hack MySQL GROUP_CONCAT
I have a transitional table that I temporarily fill with some values before querying it and destroying it.
CREATE TABLE SearchListA(
`pTime` int unsigned NOT NULL ,
`STD` double unsigned NOT NULL,
`STD_Pos` int unsigned NOT NULL,
`SearchEnd` int unsigned NOT NULL,
UNIQUE INDEX (`pTime`,`STD` ASC) USING BTREE
) ENGINE = MEMORY;
It looks as such:
+------------+------------+---------+------------+
| pTime | STD | STD_Pos | SearchEnd |
+------------+------------+---------+------------+
| 1105715400 | 1.58474499 | 0 | 1105723200 |
| 1106297700 | 2.5997839 | 0 | 1106544000 |
| 1107440400 | 2.04860375 | 0 | 1107440700 |
| 1107440700 | 1.58864998 | 0 | 1107467400 |
| 1107467400 | 1.55207218 | 0 | 1107790500 |
| 1107790500 | 2.04239417 | 0 | 1108022100 |
| 1108022100 | 1.61385678 | 0 | 1108128000 |
| 1108771500 | 1.58835083 | 0 | 1108771800 |
| 1108771800 | 1.65734727 | 0 | 1108772100 |
| 1108772100 | 2.09378189 | 0 | 1109027700 |
+------------+------------+---------+------------+
Only columns pTime and SearchEnd are relevant to my problem.
My intention is to use this table to speed up searching through a much larger, static table.
The first column, pTime, is where the search should start
The fourth column, SearchEnd, is where the search should end
The larger table is similar; it looks like this:
CREATE TABLE `b50d1_abs` (
`pTime` int(10) unsigned NOT NULL,
`Slope` double NOT NULL,
`STD` double NOT NULL,
`Slope_Pos` int(11) NOT NULL,
`STD_Pos` int(11) NOT NULL,
PRIMARY KEY (`pTime`),
KEY `Slope` (`Slope`) USING BTREE,
KEY `STD` (`STD`),
KEY `ID1` (`pTime`,`STD`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=latin1 MIN_ROWS=339331 MAX_ROWS=539331 PACK_KEYS=1 ROW_FORMAT=FIXED;
+------------+-------------+------------+-----------+---------+
| pTime | Slope | STD | Slope_Pos | STD_Pos |
+------------+-------------+------------+-----------+---------+
| 1107309300 | 1.63257919 | 1.39241698 | 0 | 1 |
| 1107314400 | 6.8959276 | 0.22425643 | 1 | 1 |
| 1107323100 | 18.19909502 | 1.46854808 | 1 | 0 |
| 1107335400 | 2.50135747 | 0.4736305 | 0 | 0 |
| 1107362100 | 4.28778281 | 0.85576985 | 0 | 1 |
| 1107363300 | 6.96289593 | 1.41299044 | 0 | 0 |
| 1107363900 | 8.10316742 | 0.2859726 | 0 | 0 |
| 1107367500 | 16.62443439 | 0.61587645 | 0 | 0 |
| 1107368400 | 19.37918552 | 1.18746968 | 0 | 0 |
| 1107369300 | 21.94570136 | 0.94261744 | 0 | 0 |
| 1107371400 | 25.85701357 | 0.2741292 | 0 | 1 |
| 1107375300 | 21.98914027 | 1.59521158 | 0 | 1 |
| 1107375600 | 20.80542986 | 1.59231289 | 0 | 1 |
| 1107375900 | 19.62714932 | 1.50661679 | 0 | 1 |
| 1107381900 | 8.23167421 | 0.98048205 | 1 | 1 |
| 1107383400 | 10.68778281 | 1.41607579 | 1 | 0 |
+------------+-------------+------------+-----------+---------+
...etc (439340 rows)
Here, the columns pTime, STD, and STD_Pos are relevant to my problem.
For every element in the smaller table (SearchListA), I need to search the specified range within the larger table (b50d1_abs()) and return the row with the lowest b50d1_abs.pTime that is higher than the current SearchListA.pTime and that also matches the following conditions:
SearchListA.STD < b50d1_abs.STD AND SearchListA.STD_Pos <> b50d1_abs.STD_Pos
AND
b50d1_abs.pTime < SearchListA.SearchEnd
The latter condition is simply to reduce the length of the search.
This seems to me like a pretty straightforward query that should be able to use indexes; especially since all values are unsigned numbers - But I cannot get it to execute nearly fast enough! I think it is because it rebuilds the entire table each time instead of just omitting values from it.
I would be extremely grateful if someone takes a look at my code and figures out a more efficient way to go about this:
SELECT
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
mu.pTime AS CloseTime
FROM
SearchListA m
JOIN b50d1_abs mu ON mu.pTime =(
SELECT
md.pTime
FROM
b50d1_abs as md
WHERE
md.pTime > m.pTime
AND md.pTime <=m.SearchEnd
AND m.STD < md.STD AND m.STD_Pos <> md.STD_Pos
LIMIT 1
);
Here is my EXPLAIN EXTENDED statement:
+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+
| 1 | PRIMARY | m | ALL | NULL | NULL | NULL | NULL | 365 | 100.00 | |
| 1 | PRIMARY | mu | eq_ref | PRIMARY,ID1 | PRIMARY | 4 | func | 1 | 100.00 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | md | ALL | PRIMARY,STD,ID1 | NULL | NULL | NULL | 439340 | 100.00 | Using where |
+----+--------------------+-------+--------+-----------------+---------+---------+------+--------+----------+--------------------------+
It looks like the lengthiest query (#2) doesn't use indexes at all!
If I try FORCE INDEX then it will list it under possible_keys, but still list NULL under Key and still take an extremely long time (over 80 seconds).
I need to get this query under 10 seconds, and even 10 is too long.
Your subquery is a dependent subquery, so the best case is that it's going to be evaluated once for every row in table m. Since m contains few rows, that would be OK.
But if you put that subquery in a JOIN condition, it is going to be executed (rows in m)*(rows in mu) times, no matter what.
Note that your results may be incorrect, since you asked to:
return the row with the lowest b50d1_abs.pTime
but you don't specify that ordering anywhere in your query.
Try this query :
SELECT
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
(
SELECT min( big.pTime )
FROM b50d1_abs as big
WHERE big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
) AS CloseTime
FROM SearchListA m
or this one :
SELECT
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
min( big.pTime )
FROM
SearchListA m
JOIN b50d1_abs as big ON (
big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
)
GROUP BY m.pTime
(if you also want rows where the search was unsuccessful, make that a LEFT JOIN).
SELECT
m.pTime as OpenTime,
m.STD,
m.STD_Pos,
(
SELECT big.pTime
FROM b50d1_abs as big
WHERE big.pTime > m.pTime
AND big.pTime <= m.SearchEnd
AND m.STD < big.STD AND m.STD_Pos <> big.STD_Pos
ORDER BY big.pTime LIMIT 1
) AS CloseTime
FROM SearchListA m
(Try an index on b50d1_abs(pTime, STD, STD_Pos).)
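Spelled out, that suggestion would be something like (the index name is arbitrary):
ALTER TABLE b50d1_abs ADD INDEX idx_ptime_std_stdpos (pTime, STD, STD_Pos);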
FYI here are some tests using Postgres on a test data set that should look like yours (maybe remotely, lol)
CREATE TABLE small (
pTime INT PRIMARY KEY,
STD FLOAT NOT NULL,
STD_POS BOOL NOT NULL,
SearchEnd INT NOT NULL
);
CREATE TABLE big(
pTime INTEGER PRIMARY KEY,
Slope FLOAT NOT NULL,
STD FLOAT NOT NULL,
Slope_Pos BOOL NOT NULL,
STD_POS BOOL NOT NULL
);
INSERT INTO small SELECT
n*100000,
random(),
random()<0.1,
n*100000+random()*50000
FROM generate_series( 1, 365 ) n;
INSERT INTO big SELECT
n*100,
random(),
random(),
random() > 0.5,
random() > 0.5
FROM generate_series( 1, 500000 ) n;
Query 1 : 6.90 ms (yes milliseconds)
Query 2 : 48.20 ms
Query 3 : 6.46 ms
I'll start a new answer cause it starts to look like a mess ;)
With your data I get, using MySQL 5.1.41
Query 1 : takes forever, Ctrl-C
Query 2 : 520 ms
Query 3 : takes forever, Ctrl-C
Explain for 2 looks good :
+----+-------------+-------+------+---------------------+------+---------+------+--------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------------+------+---------+------+--------+------------------------------------------------+
| 1 | SIMPLE | m | ALL | PRIMARY,STD,ID1,ID2 | NULL | NULL | NULL | 743 | Using temporary; Using filesort |
| 1 | SIMPLE | big | ALL | PRIMARY,ID1,ID2 | NULL | NULL | NULL | 439340 | Range checked for each record (index map: 0x7) |
+----+-------------+-------+------+---------------------+------+---------+------+--------+------------------------------------------------+
So, I loaded your data into postgres...
Query 1 : 14.8 ms
Query 2 : 100 ms
Query 3 : 14.8 ms (same plan as 1)
In fact rewriting 2 as query 1 (or 3) fixes a little optimizer shortcoming and finds the optimal query plan for this scenario.
Would you recommend using Postgres over MySQL for this scenario?
Speed is extremely important to me.
Well, I don't know why mysql barfs so much on queries 1 and 3 (which are pretty simple and easy), in fact it should even beat postgres (using an index only scan) but apparently not, eh. You should ask a mysql specialist !
I'm more used to postgres... got fed up with mysql a long time ago ! If you need complex queries postgres usually wins big time (but you'll need to re-learn how to optimize and tune your new database)...