Faster to do a 'WHERE IN (SELECT)' or 'WHERE x = (SELECT)' - mysql

Which select statement is better?
SELECT *
FROM aTable
WHERE aField IN (
    SELECT xField
    FROM bTable
    WHERE yField > 5
);
OR
SELECT *
FROM aTable
WHERE (
    SELECT yField
    FROM bTable
    WHERE aTable.aField = bTable.xField
) > 5;

They produce very similar execution plans (on my test tables, which are tiny; YMMV, always profile real data), and there's a third alternative you may want to consider instead:
The first:
EXPLAIN SELECT * FROM aTable WHERE aField in (SELECT xField FROM bTable WHERE yField > 5);
+----+--------------------+--------+-------+---------------+---------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+-------+---------------+---------------+---------+------+------+-------------+
| 1 | PRIMARY | aTable | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
| 2 | DEPENDENT SUBQUERY | bTable | range | bTable_yField | bTable_yField | 5 | NULL | 2 | Using where |
+----+--------------------+--------+-------+---------------+---------------+---------+------+------+-------------+
The second:
EXPLAIN SELECT * FROM aTable WHERE (SELECT yField FROM bTable WHERE aTable.aField = bTable.xField) > 5;
+----+--------------------+--------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+--------+------+---------------+------+---------+------+------+-------------+
| 1 | PRIMARY | aTable | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
| 2 | DEPENDENT SUBQUERY | bTable | ALL | NULL | NULL | NULL | NULL | 4 | Using where |
+----+--------------------+--------+------+---------------+------+---------+------+------+-------------+
Both result in a dependent subquery; on my example tables, the first one gets the benefit of the index (I assume bTable.yField is indexed) while the second doesn't.
You can avoid the dependent subquery and get better up-front filtering using a JOIN:
The third alternative:
EXPLAIN SELECT * FROM aTable INNER JOIN bTable On aTable.aField = bTable.xField WHERE bTable.yField > 5;
+----+-------------+--------+-------+---------------+---------------+---------+------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+---------------+---------+------+------+--------------------------------+
| 1 | SIMPLE | bTable | range | bTable_yField | bTable_yField | 5 | NULL | 2 | Using where |
| 1 | SIMPLE | aTable | ALL | NULL | NULL | NULL | NULL | 4 | Using where; Using join buffer |
+----+-------------+--------+-------+---------------+---------------+---------+------+------+--------------------------------+
Again, though, you really have to profile with your schema and your representative real-world data, as the optimizer may make different decisions.
You can find more comparisons of these sorts of techniques in this excellent article by quassnoi.
For reference, here is how I created aTable and bTable (as you didn't provide definitions) and tested your queries:
mysql> CREATE TABLE aTable (aField INT, aMore VARCHAR(200));
Query OK, 0 rows affected (0.01 sec)
mysql> CREATE TABLE bTable (xField INT, yField INT);
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO aTable (aField, aMore) VALUES (1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> INSERT INTO bTable (xField, yField) VALUES (1, 10), (2, 2), (3, 20), (4, 4);
Query OK, 4 rows affected (0.02 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> CREATE INDEX bTable_yField ON bTable(yField);
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM aTable WHERE aField in (SELECT xField FROM bTable WHERE yField > 5);
+--------+-------+
| aField | aMore |
+--------+-------+
| 1 | One |
| 3 | Three |
+--------+-------+
2 rows in set (0.00 sec)
mysql> SELECT * FROM aTable WHERE (SELECT yField FROM bTable WHERE aTable.aField = bTable.xField) > 5;
+--------+-------+
| aField | aMore |
+--------+-------+
| 1 | One |
| 3 | Three |
+--------+-------+
2 rows in set (0.00 sec)

I think the second one translates to correlated sub-query semantics and so is costly, compared to the first one. The best would be to just JOIN the two tables, as follows:
SELECT
    a.*
FROM
    aTable a
    JOIN bTable b
        ON a.aField = b.xField
WHERE
    b.yField > 5
This saves you from having a large number of values in the IN clause, as in the first query, which slows query execution and can even cause errors (SQL Server, for instance, used to have a limit of 32,767 values in an IN clause, after which it threw an overflow error).

A lot depends on the indexing of the tables and whether indexed columns are used within the join condition. The combination of these factors goes some way towards determining how the SQL engine 'decides' to construct the query internally, and will ultimately affect query performance. I'm not too sure about MySQL, but SQL Server will certainly let you create an execution plan, which shows potential bottlenecks.
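For what it's worth, MySQL's closest equivalents to SQL Server's execution plan are EXPLAIN, EXPLAIN FORMAT=JSON, and (from MySQL 8.0.18) EXPLAIN ANALYZE, which actually runs the statement and reports measured row counts and timings. A small sketch against the tables above:

EXPLAIN FORMAT=JSON
SELECT * FROM aTable a JOIN bTable b ON a.aField = b.xField WHERE b.yField > 5;

-- MySQL 8.0.18+ only: executes the query and shows actual rows and timings per step
EXPLAIN ANALYZE
SELECT * FROM aTable a JOIN bTable b ON a.aField = b.xField WHERE b.yField > 5;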

Related

How to uniquely order a MySQL table with a multi column primary key

Good day all
I have a strange query.
Let's say I have a table with a composite primary key (2 columns).
CREATE TABLE `testtable` (
    `ifk1` INT(10) NOT NULL,
    `ifk2` INT(10) NOT NULL,
    `data1` VARCHAR(10) DEFAULT NULL,
    PRIMARY KEY (`ifk1`,`ifk2`),
    UNIQUE KEY `keyName` (`data1`)
) ENGINE=INNODB DEFAULT CHARSET=utf8mb4
Let's add some basic data
INSERT INTO testtable(ifk1 , ifk2 , data1)
VALUES (1 , 2 , 'a') , (5 , 2 , 'b') , (2 , 4 , 'c') , (5 , 8 , 'd') , (2 , 2 , 'e') , (2 , 5 , 'f');
Let's do a simple SELECT to see what order the data comes out in:
+------+------+-------+
| ifk1 | ifk2 | data1 |
+------+------+-------+
|    1 |    2 | a     |
|    2 |    2 | e     |
|    2 |    4 | c     |
|    2 |    5 | f     |
|    5 |    2 | b     |
|    5 |    8 | d     |
+------+------+-------+
Now, what if I want to write some code to iterate through the table, grabbing X number of records at a time.
With a small set of data, this is simple:
SELECT * FROM testtable LIMIT 0 , 2;
SELECT * FROM testtable LIMIT 2 , 2;
SELECT * FROM testtable LIMIT 4 , 2;
This is going to run into some problems as the table gets bigger, as it's not using a WHERE clause and so not using an INDEX.
How do I use a WHERE clause to replicate the above SELECTS?
SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 2; -- this will work
The first one is easy, but what about the others?
Is there a way to do that?
A LIMIT clause without an ORDER BY clause is arbitrary. All three queries you are showing:
SELECT * FROM testtable LIMIT 0 , 2;
SELECT * FROM testtable LIMIT 2 , 2;
SELECT * FROM testtable LIMIT 4 , 2;
could return the exact same two rows. So, you must add an ORDER BY clause to make this work reliably: ORDER BY ifk1, ifk2.
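For example (a sketch using the sample rows above), the paging becomes deterministic once the ORDER BY is added:

SELECT * FROM testtable ORDER BY ifk1, ifk2 LIMIT 0, 2;  -- (1, 2, 'a'), (2, 2, 'e')
SELECT * FROM testtable ORDER BY ifk1, ifk2 LIMIT 2, 2;  -- (2, 4, 'c'), (2, 5, 'f')
SELECT * FROM testtable ORDER BY ifk1, ifk2 LIMIT 4, 2;  -- (5, 2, 'b'), (5, 8, 'd')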
But, yes, having to sort the data again and again for every access can take a lot of time. This is why we try to avoid using offsets and work with a key instead:
SELECT *
FROM testtable
WHERE ifk1 > #last_ifk1 OR (ifk1 = #last_ifk1 AND ifk2 > #last_ifk2)
ORDER BY ifk1, ifk2
LIMIT 2;
Paging via offsets is almost always quite slow, but this access method can use the primary key's index on (ifk1, ifk2) and reach the next two rows very quickly. How fast it is depends on the MySQL implementation and version.
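To make that concrete with the sample data: assuming the previous page ended at the row (2, 2, 'e'), the next page would be fetched as

SELECT *
FROM testtable
WHERE ifk1 > 2 OR (ifk1 = 2 AND ifk2 > 2)
ORDER BY ifk1, ifk2
LIMIT 2;
-- returns (2, 4, 'c') and (2, 5, 'f'); save (ifk1, ifk2) = (2, 5) as the cursor for the page after that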
I am not sure I understand the index part of your question.
But generally, if you want to iterate over a bigger result set you can use a cursor as described here:
https://www.mysqltutorial.org/mysql-cursor/
This would be for a stored procedure but db drivers for other languages will expose similar functionality.
If your goal is to make the query use an index, the logic is the same:
mysql> SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 2,2;
+------+------+-------+
| ifk1 | ifk2 | data1 |
+------+------+-------+
| 2 | 4 | c |
| 5 | 8 | d |
+------+------+-------+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 2,2;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testtable | NULL | index | PRIMARY | keyName | 43 | NULL | 6 | 33.33 | Using where; Using index |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
mysql> SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 4,2;
+------+------+-------+
| ifk1 | ifk2 | data1 |
+------+------+-------+
| 2 | 2 | e |
| 2 | 5 | f |
+------+------+-------+
2 rows in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 4,2;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
| 1 | SIMPLE | testtable | NULL | index | PRIMARY | keyName | 43 | NULL | 6 | 33.33 | Using where; Using index |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
The above uses the keyName index. You may want to deactivate or drop the keyName index and let the query use the composite primary key instead. Follow these steps to achieve it:
Drop the keyName index first:
mysql> ALTER TABLE testtable
-> DROP INDEX keyName;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0
Then run the query again to see that the composite primary key is used, which I think makes the query faster:
mysql> EXPLAIN SELECT * FROM testtable WHERE ifk1 > 0 AND ifk2 > 0 LIMIT 2,2;
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| 1 | SIMPLE | testtable | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 6 | 33.33 | Using where |
+----+-------------+-----------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

How to build index and query with time range and id sort?

Here is the table data:
+----+----------+--------+
| id | time     | amount |
+----+----------+--------+
|  1 | 20221104 |     15 |
|  2 | 20221104 |     10 |
|  3 | 20221105 |      7 |
|  4 | 20221105 |     19 |
|  5 | 20221106 |     10 |
+----+----------+--------+
Both id and time are ascending, but time values can repeat.
The table is very large, so we don't want to use the limit/offset paging method; we want to paginate with a cursor on id instead.
first query:
select * from t where time > xxx and time < yyy order by id asc limit 10;
get the biggest id zzz, then
next query:
select * from t where time > xxx and time < yyy and id > zzz order by id asc limit 10;
How should I build the index?
If I index only id, the time range will cause a huge scan when the time bounds are far away.
And if I index only time, seeking by id will not be effective.
The following index should be enough for both queries:
alter table t add index `time_id` (`time`,`id`);
Note: use proper DATE/DATETIME data types; it will save a lot of pain in the future.
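If time is currently stored as an integer such as 20221104, one possible migration (just a sketch, assuming MySQL 8.0 for RENAME COLUMN and that nothing else depends on the column) is:

-- Add a real DATE column, backfill it from the YYYYMMDD integers, then swap it in
ALTER TABLE t ADD COLUMN time_d DATE;
UPDATE t SET time_d = STR_TO_DATE(CAST(`time` AS CHAR), '%Y%m%d');
ALTER TABLE t DROP COLUMN `time`;
ALTER TABLE t RENAME COLUMN time_d TO `time`;
ALTER TABLE t ADD INDEX time_id (`time`, id);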
The key point is the composite index and the leftmost-prefix principle. But both queries here start with a range expression, so I suppose that simply creating an index on (a, b) cannot optimize them effectively, because index use stops after the range condition. It is enough to create an index like this:
CREATE INDEX index_time ON t (`time`)
More can be referenced here:
https://www.ibm.com/docs/en/informix-servers/12.10?topic=indexes-use-composite
https://orangematter.solarwinds.com/2019/02/05/the-left-prefix-index-rule/
First, I agree with ErgestBasha's suggestion. Following the general performance rules:
CREATE TABLE t (id INT, time DATE, amount DEC(3,1));
Query OK, 0 rows affected (0.02 sec)
mysql> INSERT INTO t VALUES
-> (1, '2022-11-04', 15),
-> (2, '2022-11-04', 10),
-> (3, '2022-11-05', 7),
-> (4, '2022-11-05', 19),
-> (5, '2022-11-06', 10);
Query OK, 5 rows affected (0.00 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM t;
+------+------------+--------+
| id | time | amount |
+------+------------+--------+
| 1 | 2022-11-04 | 15.0 |
| 2 | 2022-11-04 | 10.0 |
| 3 | 2022-11-05 | 7.0 |
| 4 | 2022-11-05 | 19.0 |
| 5 | 2022-11-06 | 10.0 |
+------+------------+--------+
5 rows in set (0.00 sec)
mysql> ALTER TABLE t ADD INDEX idx_time_id (time,id);
Query OK, 0 rows affected (0.02 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> SHOW INDEXES FROM t;
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| t | 1 | idx_time_id | 1 | time | A | 3 | NULL | NULL | YES | BTREE | | | YES | NULL |
| t | 1 | idx_time_id | 2 | id | A | 5 | NULL | NULL | YES | BTREE | | | YES | NULL |
+-------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
2 rows in set (0.01 sec)
mysql> SELECT * FROM t WHERE time > '2022-11-04' AND time < '2022-11-06' AND id > 3 ORDER BY id ASC LIMIT 10;
+------+------------+--------+
| id | time | amount |
+------+------------+--------+
| 4 | 2022-11-05 | 19.0 |
+------+------------+--------+
1 row in set (0.00 sec)
mysql> EXPLAIN SELECT * FROM t WHERE time > '2022-11-04' AND time < '2022-11-06' AND id > 3 ORDER BY id ASC LIMIT 10;
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+---------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+---------------------------------------+
| 1 | SIMPLE | t | NULL | range | idx_time_id | idx_time_id | 4 | NULL | 2 | 33.33 | Using index condition; Using filesort |
+----+-------------+-------+------------+-------+---------------+-------------+---------+------+------+----------+---------------------------------------+
1 row in set, 1 warning (0.00 sec)
As you can see, it uses the index defined on (time, id) with the range scan access method. In the Extra column you can also see that the index is used during the operation.
See this for iterating through a compound key:
http://mysql.rjweb.org/doc.php/deletebig#iterating_through_a_compound_key
It cannot be done with two ANDs; it needs one AND and one OR.
See this for why OFFSET should be avoided when paginating.
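A sketch of what "one AND and one OR" looks like here, walking the (time, id) index in keyset fashion (last_time and last_id are placeholders for the values saved from the last row of the previous page, and yyy is the upper bound from the question):

SELECT *
FROM t
WHERE (time > last_time OR (time = last_time AND id > last_id))
  AND time < yyy
ORDER BY time, id
LIMIT 10;
-- note the ORDER BY follows the (time, id) index rather than id alone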

MySQL JOINing a TABLE to itself without a primary key

I created two tables in mysql. Each contains an integer idx and a string name. In one, the idx was the primary key.
CREATE TABLE table_indexed (
idx INTEGER,
name VARCHAR(24),
PRIMARY KEY(idx)
);
CREATE TABLE table_not_indexed (
idx INTEGER,
name VARCHAR(24)
);
I then added the same data to both tables: 3 million distinct values for idx (1-3_000_000, randomly arranged) and 3 million random strings of 8 lowercase characters for name.
Then I ran a query where I joined each table to itself. The table without the primary key runs almost 3 times as fast.
mysql> SELECT COUNT(*)
-> FROM table_indexed t1 JOIN table_indexed t2
-> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
| 3000000 |
+----------+
1 row in set (11.80 sec)
mysql> SELECT COUNT(*)
-> FROM table_not_indexed t1 JOIN table_not_indexed t2
-> ON t1.idx = t2.idx;
+----------+
| COUNT(*) |
+----------+
| 3000000 |
+----------+
1 row in set (4.12 sec)
EDIT: Asked MySQL to EXPLAIN the query.
mysql> EXPLAIN SELECT COUNT(*)
-> FROM table_indexed t1 JOIN table_indexed t2
-> ON t1.idx = t2.idx;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
| 1 | SIMPLE | t1 | NULL | index | PRIMARY | PRIMARY | 4 | NULL | 3171970 | 100.00 | Using index |
| 1 | SIMPLE | t2 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | index_test3000000.t1.idx | 1 | 100.00 | Using index |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------------------+---------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT COUNT(*)
-> FROM table_not_indexed t1 JOIN table_not_indexed t2
-> ON t1.idx = t2.idx;
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2993208 | 100.00 | NULL |
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 2993208 | 10.00 | Using where; Using join buffer (hash join) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+--------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
mysql>
In both cases it does a table scan of t1, then looks for the matching row in t2.
In this case USING INDEX is equivalent to using the PK when the PK is involved. (EXPLAIN is a bit sloppy and inconsistent in this area.)
Sometimes you can get more details with EXPLAIN FORMAT=JSON SELECT .... (Might not be anything useful in this case.)
"rows" is just an estimate.
The non-indexed case reads t2 entirely into memory and builds a Hash index on it. With too small a value for join_buffer_size, you can experience the alternative -- repeated full table scans of t2.
Your experiment is a good example of when the "join buffer" is good, but not as good as an appropriate index.
Your experiment would probably come out the same with two separate tables instead of a "self-join".
"3 times as fast" -- I would expect a lot of variation in the "3" for different test cases.
For more on join_buffer_size, BNL, and BKA (Block Nested-Loop or Batched Key Access), see https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_join_buffer_size
It is potentially unsafe to set join_buffer_size bigger than 1% of RAM.
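If you want to experiment with that, a quick sketch of checking and raising it for the current session only (the 64 MB figure is purely illustrative, not a recommendation):

SHOW VARIABLES LIKE 'join_buffer_size';
SET SESSION join_buffer_size = 64 * 1024 * 1024;  -- affects only this connection
SELECT COUNT(*)
FROM table_not_indexed t1 JOIN table_not_indexed t2
ON t1.idx = t2.idx;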

Find value within a range in database table

I need the SQL equivalent of this.
I have a table like this
ID MN MX
-- -- --
A 0 3
B 4 6
C 7 9
Given a number, say 5, I want to find the ID of the row where MN and MX contain that number, in this case that would be B.
Obviously,
SELECT ID FROM T WHERE ? BETWEEN MN AND MX
would do, but I have 9 million rows and I want this to run as fast as possible. In particular, I know that there can be only one matching row, I know that the MN-MX ranges cover the space completely, and so on. With all these constraints on the possible answers, there should be some optimizations I can make. Shouldn't there be?
All I have so far is indexing MN and using the following
SELECT ID FROM T WHERE ? BETWEEN MN AND MX ORDER BY MN LIMIT 1
but that is weak.
If you have an index spanning MN and MX it should be pretty fast, even with 9M rows.
alter table T add index mn_mx (mn, mx);
Edit
I just tried a test w/ a 1M row table
mysql> select count(*) from T;
+----------+
| count(*) |
+----------+
| 1000001 |
+----------+
1 row in set (0.17 sec)
mysql> show create table T\G
*************************** 1. row ***************************
Table: T
Create Table: CREATE TABLE `T` (
`id` int(10) NOT NULL AUTO_INCREMENT,
`mn` int(10) DEFAULT NULL,
`mx` int(10) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `mn_mx` (`mn`,`mx`)
) ENGINE=InnoDB AUTO_INCREMENT=1048561 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
mysql> select * from T order by rand() limit 1;
+--------+-----------+-----------+
| id | mn | mx |
+--------+-----------+-----------+
| 112940 | 948004986 | 948004989 |
+--------+-----------+-----------+
1 row in set (0.65 sec)
mysql> explain select id from T where 948004987 between mn and mx;
+----+-------------+-------+-------+---------------+-------+---------+------+--------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+--------+--------------------------+
| 1 | SIMPLE | T | range | mn_mx | mn_mx | 5 | NULL | 239000 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+--------+--------------------------+
1 row in set (0.00 sec)
mysql> select id from T where 948004987 between mn and mx;
+--------+
| id |
+--------+
| 112938 |
| 112939 |
| 112940 |
| 112941 |
+--------+
4 rows in set (0.03 sec)
In my example I just had an incrementing range of mn values and set each mx to mn + 3, which is why I got more than one match, but the same approach should apply to you.
Edit 2
Reworking your query will definitely be better
mysql> explain select id from T where mn<=947892055 and mx>=947892055;
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
| 1 | SIMPLE | T | range | mn_mx | mn_mx | 5 | NULL | 9 | Using where; Using index |
+----+-------------+-------+-------+---------------+-------+---------+------+------+--------------------------+
It's worth noting that even though the first EXPLAIN reported many more rows to be scanned, I had the InnoDB buffer pool set large enough to keep the entire table in RAM after creating it, so it was still pretty fast.
If there are no gaps in your set, a simple comparison against MN alone will work, provided you take the largest MN that does not exceed the value:
SELECT ID FROM T WHERE ? >= MN ORDER BY MN DESC LIMIT 1
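Against the sample table, this largest-MN-not-exceeding-the-value lookup returns B for 5:

SELECT ID FROM T WHERE 5 >= MN ORDER BY MN DESC LIMIT 1;  -- B (MN = 4, MX = 6)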

Getting max value from many tables

There are two ways that I can think of to obtain similar results from multiple tables. One is UNION and the other is JOIN. The similar questions on SO have all been answered with a UNION. Here's the code I just found:
SELECT max(up.id) AS up, max(sc.id) AS sc, max(cl.id) AS cl
FROM updates up, chat_staff sc, change_log cl
explain:
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
| 1 | SIMPLE | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------------------+
My question is -- Is this better than the following?
SELECT "up.id" AS K, max(id) AS V FROM updates
UNION
SELECT "sc.id" AS K, max(id) AS V FROM chat_staff
UNION
SELECT "cl.id" AS K, max(id) AS V FROM change_log
explain:
+----+--------------+--------------+------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+--------------+------+---------------+------+---------+------+------+------------------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 2 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 3 | UNION | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| NULL | UNION RESULT | <union1,2,3> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+------+---------------+------+---------+------+------+------------------------------+
Both of those methods are just fine. In fact, I have another method:
SELECT
IFNULL(maxidup,0) max_id_up,
IFNULL(maxidsc,0) max_sc_up,
IFNULL(maxidcl,0) max_cl_up
FROM
(SELECT max(id) maxidup FROM updates) up,
(SELECT max(id) maxidsc FROM chat_staff) sc,
(SELECT max(id) maxidcl FROM change_log) cl
;
This method presents the three values side by side like your first example. It also shows 0 in the event one of the tables are empty.
mysql> DROP DATABASE IF EXISTS junk;
Query OK, 3 rows affected (0.11 sec)
mysql> CREATE DATABASE junk;
Query OK, 1 row affected (0.00 sec)
mysql> use junk
Database changed
mysql> CREATE TABLE updates (id int not null auto_increment primary key,x int);
Query OK, 0 rows affected (0.07 sec)
mysql> CREATE TABLE chat_staff LIKE updates;
Query OK, 0 rows affected (0.07 sec)
mysql> CREATE TABLE change_log LIKE updates;
Query OK, 0 rows affected (0.06 sec)
mysql> INSERT INTO updates (x) VALUES (37),(84),(12);
Query OK, 3 rows affected (0.06 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> INSERT INTO change_log (x) VALUES (37),(84),(12),(14),(35);
Query OK, 5 rows affected (0.09 sec)
Records: 5 Duplicates: 0 Warnings: 0
mysql> SELECT
-> IFNULL(maxidup,0) max_id_up,
-> IFNULL(maxidsc,0) max_sc_up,
-> IFNULL(maxidcl,0) max_cl_up
-> FROM
-> (SELECT max(id) maxidup FROM updates) up,
-> (SELECT max(id) maxidsc FROM chat_staff) sc,
-> (SELECT max(id) maxidcl FROM change_log) cl
-> ;
+-----------+-----------+-----------+
| max_id_up | max_sc_up | max_cl_up |
+-----------+-----------+-----------+
| 3 | 0 | 5 |
+-----------+-----------+-----------+
1 row in set (0.00 sec)
mysql> explain SELECT IFNULL(maxidup,0) max_id_up, IFNULL(maxidsc,0) max_sc_up, IFNULL(maxidcl,0) max_cl_up FROM (SELECT max(id) maxidup FROM updates) up, (SELECT max(id) maxidsc FROM chat_staff) sc, (SELECT max(id) maxidcl FROM change_log) cl;
+----+-------------+------------+--------+---------------+------+---------+------+------+------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+---------------+------+---------+------+------+------------------------------+
| 1 | PRIMARY | <derived2> | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | <derived3> | system | NULL | NULL | NULL | NULL | 1 | |
| 1 | PRIMARY | <derived4> | system | NULL | NULL | NULL | NULL | 1 | |
| 4 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No matching min/max row |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away |
+----+-------------+------------+--------+---------------+------+---------+------+------+------------------------------+
6 rows in set (0.02 sec)
In my EXPLAIN plan, it has Select tables optimized away just like yours. Why ?
Since id is indexed in all the tables, the index is used to retrieve the max(id) rather than the table. Thus, Select tables optimized away is the correct response.
Six of one, half dozen of the other. How you present data from there is strictly your personal preference.
UPDATE 2011-10-20 15:32 EDT
You commented : Do you know how table locking would compromise this? Let's say one of the tables in question is locked. Would this query lock the other two and keep 'em locked until the first one was freed up?
This would depend on the storage engine. If all tables in question are MyISAM, it's a definite possibility, since MyISAM performs a full table lock on INSERT, UPDATE, and DELETE. If the three tables are InnoDB, you have the benefit of MVCC to provide transaction isolation, which gives everyone their own point-in-time view of the data. Aside from DDL and an explicit LOCK TABLES against InnoDB, your query should not be blocked.
Actually, while they're similar, there's a subtle difference. The first gives you a one-row, three-column table (with the values going "across") and the second gives you a three-row, two-column table (with the values going "down").
Provided you're happy processing or viewing that data in either form, it's probably going to come down to performance.
In my experience (and this is nothing to do specifically with MySQL), the latter query will probably be better. That's because the DBMSs I work with are able to run queries like that in parallel for efficiency, combining the results once all of them complete. The fact that they're on different tables means that lock contention between them will be zero.
It may be that the query analysis engine of a DBMS could do a similar optimisation for the first query but it would require a lot more intelligence than I've seen from most of them.
One quick point: if you use UNION ALL instead of just UNION, you tell the database not to remove duplicate rows. You won't get any duplicates in this case because the K column differs across all three sub-queries.
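For instance, the UNION ALL variant (a trivial rewrite of the query above; the result is identical here because K differs in every branch):

SELECT 'up.id' AS K, max(id) AS V FROM updates
UNION ALL
SELECT 'sc.id' AS K, max(id) AS V FROM chat_staff
UNION ALL
SELECT 'cl.id' AS K, max(id) AS V FROM change_log;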
But, as with all optimisations, measure, don't guess! Certainly don't take as gospel the rants of random internet roamers (yes, even me).
Put together various candidate tables with the properties you're likely to have in production, and compare the performance of each.