MySQL JOIN subquery ON FALSE: will the subquery execute?

Consider this MySQL query
SELECT <cols> FROM <tbl> INNER|LEFT JOIN (SELECT ...) AS sub_query ON FALSE
For an INNER JOIN this will always yield an empty result set, no matter what's in tbl or sub_query, because the ON condition is always false. (With a LEFT JOIN the result is not empty: every row of tbl comes back with NULLs in the sub_query columns.)
In my situation, the only part I can control is the ON clause.
My question is: is MySQL smart enough to detect that, skip running the subquery altogether, and return the empty result set immediately?
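The INNER vs LEFT behaviour described above can be checked with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it shows only the join semantics and says nothing about MySQL's optimizer. The two-row table tbl is hypothetical, and ON 0 plays the role of ON FALSE.

```python
import sqlite3


def join_on_false(kind):
    """Run 'tbl <kind> JOIN (SELECT 1) AS sub_query ON <false>' against a 2-row table."""
    assert kind in ("INNER", "LEFT")  # only the two join types from the question
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER)")
    con.executemany("INSERT INTO tbl VALUES (?)", [(1,), (2,)])
    # ON 0 is a constant-false join condition
    sql = (f"SELECT tbl.id, sub_query.x FROM tbl {kind} JOIN "
           "(SELECT 1 AS x) AS sub_query ON 0 ORDER BY tbl.id")
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


print(join_on_false("INNER"))  # []
print(join_on_false("LEFT"))   # [(1, None), (2, None)]
```

Only the INNER form is guaranteed to be empty. With the LEFT form, the rows of tbl still have to be read and returned.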

We could just test this out:
mysql> explain select * from mdl_user join (select 1) q on false;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
2 rows in set (0.02 sec)
mysql> select * from mdl_user join (select 1) q on false;
Empty set (0.00 sec)
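That would seem to indicate that yes, MySQL detects the ON FALSE condition ("Impossible WHERE") and doesn't bother with the join. The derived table still appears in the plan (id 2), although here it is a trivial SELECT 1 that reads no tables.
(Sorry, a Moodle database was what I was buried in at the time.)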

This is too long for a comment.
I would be surprised if MySQL were smart enough in practice to do this on a real query. You should test on your own data.
However, the documentation suggests that it is a possibility:
For non-EXPLAIN queries, delay of materialization may result in not having to do it at all. Consider a query that joins the result of a subquery in the FROM clause to another table: If the optimizer processes that other table first and finds that it returns no rows, the join need not be carried out further and the optimizer can completely skip materializing the subquery.
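The delayed materialization described in that passage can be pictured with a toy Python model. It is not MySQL's implementation. The derived table runs its subquery only on first access, and the join reads the other table first, so an empty outer table means the subquery never runs. The class and function names are hypothetical. Note that this models the documented case (the other table returns no rows) and does not model constant-folding of ON FALSE.

```python
class LazyDerivedTable:
    """Toy model of a FROM-clause subquery whose materialization is delayed until first use."""

    def __init__(self, compute):
        self._compute = compute      # callable that produces the subquery's rows
        self._rows = None
        self.materializations = 0    # how many times the subquery actually ran

    def rows(self):
        if self._rows is None:       # materialize on first access only
            self._rows = list(self._compute())
            self.materializations += 1
        return self._rows


def nested_loop_join(outer_rows, derived, on):
    """Read the other table first; touch the derived table only once an outer row exists."""
    result = []
    for o in outer_rows:
        for d in derived.rows():
            if on(o, d):
                result.append((o, d))
    return result


derived = LazyDerivedTable(lambda: [(1,)])
print(nested_loop_join([], derived, lambda o, d: True), derived.materializations)  # [] 0
```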

Related

MySQL: get the average of a joined column over millions of records

SELECT AVG(table1.column1) as a,
table2.column2
FROM table1
LEFT OUTER JOIN table2
ON table2.column2 = table1.column2
GROUP BY table2.column2 ORDER BY a DESC LIMIT 10
This is MySQL code. I have 1.5 million rows in table1 and 200,000 rows in table2.
I am still waiting for the query to finish.
Does anybody know a way to make it run faster?
There are a lot of comments in the same vein, but I thought I'd give a thorough answer. I'm going to use one of my own tables here to explain. Let's take this query:
SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%"
This query returns about 851 rows and takes 0.5 seconds. If we put EXPLAIN in front of the query, MySQL tells us how it is going to execute it.
mysql> EXPLAIN SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%";
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 1183 | Using where |
| 1 | SIMPLE | B | ALL | NULL | NULL | NULL | NULL | 6594 | |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
2 rows in set (0.00 sec)
The important column to look at here is rows: it is MySQL's estimate of how many rows it has to examine. For both tables A and B it has to read every row (type ALL), even though only about 851 rows match the condition. With no usable index on B, each row of A is compared against all rows of B. This is how queries get out of hand quickly: this table has only 6594 rows to search through, but left alone it could easily reach your 1.5 million.
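To see why the missing index hurts, here is a rough Python model of the two plans. It counts how many rows of B are examined. The sample data is made up. The dictionary stands in for idx_asin, and its one-time build cost is left out, as it is paid once by ALTER TABLE.

```python
def left_join_scan(a_rows, b_rows):
    """LEFT JOIN with no index on B: every A row is compared with every B row."""
    examined, out = 0, []
    for a_id, a_asin in a_rows:
        matched = False
        for (b_asin,) in b_rows:
            examined += 1
            if b_asin == a_asin:
                out.append((a_id, b_asin))
                matched = True
        if not matched:
            out.append((a_id, None))  # LEFT JOIN keeps unmatched A rows
    return out, examined


def left_join_indexed(a_rows, b_rows):
    """Same join, but B is first grouped by asin (a stand-in for idx_asin)."""
    index = {}
    for (b_asin,) in b_rows:
        index.setdefault(b_asin, []).append(b_asin)
    examined, out = 0, []
    for a_id, a_asin in a_rows:
        matches = index.get(a_asin, [])
        examined += len(matches)      # only the matching B rows are read
        out.extend((a_id, m) for m in matches)
        if not matches:
            out.append((a_id, None))
    return out, examined


A = [(1, "C1"), (2, "X9")]
B = [("C1",), ("C1",), ("Z",)]
print(left_join_scan(A, B)[1], left_join_indexed(A, B)[1])  # 6 2
```

Both plans return the same rows. Without the index, the work grows with len(A) × len(B). With it, the work grows with the number of matches.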
We can cut this down by adding an index on the join column of table B. That lets MySQL find the matching rows by asin without scanning the whole table.
ALTER TABLE AmazonWishlistItemPrices ADD INDEX idx_asin (asin)
This simply says: create an index called idx_asin on the column asin. If we re-run our EXPLAIN...
mysql> EXPLAIN SELECT A.id, B.asin FROM AmazonWishlistItems A LEFT JOIN AmazonWishlistItemPrices B ON (B.asin = A.asin) WHERE A.asin LIKE "%C%";
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
| 1 | SIMPLE | A | ALL | NULL | NULL | NULL | NULL | 1183 | Using where |
| 1 | SIMPLE | B | ref | idx_asin | idx_asin | 12 | mah_database.A.asin | 6 | Using index |
+----+-------------+-------+------+---------------+----------+---------+---------------------+------+-------------+
2 rows in set (0.00 sec)
We're down to about six rows of B per row of A (type ref), and possible_keys and key show that MySQL found and used our index. Table A is still fully scanned, because a LIKE pattern with a leading wildcard, such as "%C%", cannot use a B-tree index. You may also find that with certain joins and WHERE clauses your indexes are ignored. That's simply MySQL saying "I'm going to have to read all the data anyway" because of the conditions you've provided.
It's generally best to use numeric keys for indexing. You can get away with some VARCHARs, but they make the index larger. You should have a PRIMARY KEY on each table where possible, so look at your database structure and consider adding some indexes.
Finally, to check which indexes a table has, use SHOW CREATE TABLE followed by the table name (SHOW INDEX FROM tbl_name also works).

Very Strange MySQL Results

Can someone explain this... before I have myself committed? The first result set should have two results, same as the second, no?
mysql> SELECT * FROM kuru_footwear_2.customer_address_entity_varchar
-> WHERE attribute_id=31 AND entity_id=324134;
+----------+----------------+--------------+-----------+-------+
| value_id | entity_type_id | attribute_id | entity_id | value |
+----------+----------------+--------------+-----------+-------+
| 885263 | 2 | 31 | 324134 | NULL |
+----------+----------------+--------------+-----------+-------+
1 row in set (0.00 sec)
mysql> SELECT * FROM kuru_footwear_2.customer_address_entity_varchar
-> WHERE value_id=885263 OR value_id=950181;
+----------+----------------+--------------+-----------+-------+
| value_id | entity_type_id | attribute_id | entity_id | value |
+----------+----------------+--------------+-----------+-------+
| 885263 | 2 | 31 | 324134 | NULL |
| 950181 | 2 | 31 | 324134 | NULL |
+----------+----------------+--------------+-----------+-------+
2 rows in set (0.00 sec)
attribute_id is a SMALLINT(5)
entity_id is an INT(10)
The problem is that you have a unique index on (entity_id,attribute_id). The query optimizer notices this when you write a query whose WHERE clause is covered by the index, and only returns 1 row since the uniqueness of the index implies that there's at most one matching row.
I'm not sure how you got those duplicates in the first place; it seems like something is corrupted in the table. Normally a unique index can't coexist with duplicates. A plain ALTER TABLE ... ADD UNIQUE fails with a duplicate-key error, while the older ALTER IGNORE TABLE form (removed in MySQL 5.7) silently drops the duplicates instead. In fact, the latter is often suggested as a way to get rid of duplicates in a table; see How do I delete all the duplicate records in a MySQL table without temp tables.
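Both rows in the question satisfy the AND condition, and a unique index cannot normally be added over such duplicates. Here is a small sketch of both points using Python's sqlite3 as a stand-in for MySQL. It rebuilds only the two rows shown, and the index name ux_entity_attr is hypothetical. SQLite, like a plain MySQL ALTER TABLE ... ADD UNIQUE, refuses to build the index.

```python
import sqlite3


def make_table():
    """Recreate the two duplicate rows from the question in an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute("""CREATE TABLE customer_address_entity_varchar (
                       value_id INTEGER PRIMARY KEY, entity_type_id INTEGER,
                       attribute_id INTEGER, entity_id INTEGER, value TEXT)""")
    con.executemany("INSERT INTO customer_address_entity_varchar VALUES (?, ?, ?, ?, ?)",
                    [(885263, 2, 31, 324134, None), (950181, 2, 31, 324134, None)])
    return con


def and_query(con):
    # Same predicate as the first query in the question
    return [r[0] for r in con.execute(
        "SELECT value_id FROM customer_address_entity_varchar "
        "WHERE attribute_id = 31 AND entity_id = 324134 ORDER BY value_id")]


def add_unique_index(con):
    """Try to add a unique index on (entity_id, attribute_id); False if duplicates block it."""
    try:
        con.execute("CREATE UNIQUE INDEX ux_entity_attr "
                    "ON customer_address_entity_varchar (entity_id, attribute_id)")
        return True
    except sqlite3.IntegrityError:
        return False


con = make_table()
print(and_query(con))         # [885263, 950181]
print(add_unique_index(con))  # False
```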
In the first statement's selection (the WHERE clause), you are using AND; in the second statement, you are using OR. This boils down to the definitions of these logical operators. MySQL's official documentation doesn't say much about them beyond describing AND and OR as the usual logical operators. If this is confusing, you may want to read up on basic Boolean algebra.

Improve SQL statement in mysql to run faster. I want to merge field but merge on different rows

Why does this code take a long time to run?
SELECT
concat((select Sname from member order by rand() limit 1),
Ssurname) as tee
FROM member
But this code runs very fast:
SELECT
concat(Sname,
Ssurname) as tee
FROM member
For every row returned by your first query, MySQL must run the subquery again to pick a random Sname to CONCAT() with Ssurname. Because of ORDER BY RAND(), each run reads the whole table and orders it by random values, so for an N-row table the work grows roughly with N × N. That is very expensive. The subquery's result also cannot be cached, because RAND() gives a different answer each time.
In the second example, a simple rowset is returned. Sname and Ssurname are concatenated from columns in the same row.
I ran EXPLAIN on a similar query that concatenates an indexed column with a non-indexed subquery. MySQL evaluates the subquery using a temporary table and a filesort, and marks it as an UNCACHEABLE SUBQUERY:
mysql> EXPLAIN SELECT CONCAT(g_userName, (SELECT g_fullName FROM g2_User ORDER BY RAND() LIMIT 1)) FROM g2_User;
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
| 1 | PRIMARY | g2_User | index | NULL | g_userName | 98 | NULL | 5 | Using index |
| 2 | UNCACHEABLE SUBQUERY | g2_User | ALL | NULL | NULL | NULL | NULL | 5 | Using temporary; Using filesort |
+----+----------------------+---------+-------+---------------+------------+---------+------+------+---------------------------------+
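The cost difference can be pictured with a Python model that counts rows read. The member data is made up. shuffle_once is one possible alternative, not taken from the answers: it shuffles the names once and pairs them in order. It also behaves differently, because each first name is used exactly once, whereas the per-row version can repeat names.

```python
import random


def per_row_random(members, rng):
    """Model of the slow query: per output row, rerun 'ORDER BY RAND() LIMIT 1'."""
    examined, out = 0, []
    for _, surname in members:
        keyed = []
        for name, _ in members:            # the subquery reads every row...
            examined += 1
            keyed.append((rng.random(), name))
        keyed.sort()                       # ...orders them by a random key...
        out.append(keyed[0][1] + surname)  # ...and keeps the first one
    return out, examined


def shuffle_once(members, rng):
    """Alternative: shuffle the first names once, then pair them with surnames in order."""
    names = [name for name, _ in members]
    rng.shuffle(names)
    out = [name + surname for name, (_, surname) in zip(names, members)]
    return out, len(members)


members = [("Ann", "Lee"), ("Bob", "Kim"), ("Cy", "Ng"), ("Di", "Wu")]
print(per_row_random(members, random.Random(0))[1])  # 16 rows read for 4 output rows
print(shuffle_once(members, random.Random(0))[1])    # 4
```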

MySQL: Inner join vs Where [duplicate]

This question already has answers here: Explicit vs implicit SQL joins
Is there a difference in performance (in MySQL) between
Select * from Table1 T1
Inner Join Table2 T2 On T1.ID = T2.ID
And
Select * from Table1 T1, Table2 T2
Where T1.ID = T2.ID
?
As pulled from the accepted answer in question 44917:
Performance wise, they are exactly the same (at least in SQL Server), but be aware that they are deprecating the implicit outer join syntax.
In MySQL the results are the same.
I would personally stick with joining tables explicitly... that is the "socially acceptable" way of doing it.
They are the same. This can be seen by running the EXPLAIN command:
mysql> explain Select * from Table1 T1
-> Inner Join Table2 T2 On T1.ID = T2.ID;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
| 1 | SIMPLE | T1 | index | PRIMARY | PRIMARY | 4 | NULL | 4 | Using index |
| 1 | SIMPLE | T2 | index | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where; Using index; Using join buffer |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
2 rows in set (0.00 sec)
mysql> explain Select * from Table1 T1, Table2 T2
-> Where T1.ID = T2.ID;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
| 1 | SIMPLE | T1 | index | PRIMARY | PRIMARY | 4 | NULL | 4 | Using index |
| 1 | SIMPLE | T2 | index | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where; Using index; Using join buffer |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
2 rows in set (0.00 sec)
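Here is a late answer from me: I am analyzing the performance of an older application that uses comma-based joins instead of INNER JOIN clauses.
Here are two tables that are joined (each has more than 100,000 rows). The query with the comma-based join takes much longer than the INNER JOIN version.
When I analyzed the EXPLAIN output, I found that the comma-join query scans all of a (type ALL) using the join buffer, whereas the INNER JOIN query looks up a by its primary key (type eq_ref, "Using where").
These queries are also significantly different, as the rows column shows (134472 vs 1 for table a). The reason is operator precedence. AND binds more tightly than OR, so in the comma version the condition a.mb_no = b.mb_no belongs only to the AND group. Rows that match just OR b.grade_class IN (4,7) are then combined with every row of a. In the INNER JOIN version, the join condition is in ON and applies to every row.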
These are my queries and their respective explain results.
explain select count(*) FROM mbst a , his_moneypv2 b
WHERE b.yymm IN ('200802','200811','201001','201002','201003')
AND a.tel3 != ''
AND a.mb_no = b.mb_no
AND b.true_grade_class IN (3,6)
OR b.grade_class IN (4,7);
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+------+--------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+------+--------+---------------------------------------------------------------------+
| 1 | SIMPLE | b | index_merge | PRIMARY,mb_no,yymm,yymm_2,idx_true_grade_class,idx_grade_class | idx_true_grade_class,idx_grade_class | 5,9 | NULL | 16924 | Using sort_union(idx_true_grade_class,idx_grade_class); Using where |
| 1 | SIMPLE | a | ALL | PRIMARY | NULL | NULL | NULL | 134472 | Using where; Using join buffer |
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+------+--------+---------------------------------------------------------------------+
versus
explain select count(*) FROM mbst a inner join his_moneypv2 b
on a.mb_no = b.mb_no
WHERE b.yymm IN ('200802','200811','201001','201002','201003')
AND a.tel3 != ''
AND b.true_grade_class IN (3,6)
OR b.grade_class IN (4,7);
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+--------------------+-------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+--------------------+-------+---------------------------------------------------------------------+
| 1 | SIMPLE | b | index_merge | PRIMARY,mb_no,yymm,yymm_2,idx_true_grade_class,idx_grade_class | idx_true_grade_class,idx_grade_class | 5,9 | NULL | 16924 | Using sort_union(idx_true_grade_class,idx_grade_class); Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 62 | shaklee_my.b.mb_no | 1 | Using where |
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+--------------------+-------+---------------------------------------------------------------------+
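The two queries above are not equivalent, because AND binds more tightly than OR (MySQL and SQLite agree on this). In the comma version, a.mb_no = b.mb_no sits inside the AND group, and rows matching only OR b.grade_class IN (4,7) are paired with every row of a. Here is a sketch using Python's sqlite3 and tiny hypothetical copies of the tables that makes the difference countable. The third query, with explicit parentheses, is one way to make the comma join match the INNER JOIN.

```python
import sqlite3

YYMM = "('200802','200811','201001','201002','201003')"

COMMA_JOIN = f"""SELECT COUNT(*) FROM mbst a, his_moneypv2 b
    WHERE b.yymm IN {YYMM}
      AND a.tel3 != ''
      AND a.mb_no = b.mb_no
      AND b.true_grade_class IN (3,6)
      OR b.grade_class IN (4,7)"""

INNER_JOIN = f"""SELECT COUNT(*) FROM mbst a INNER JOIN his_moneypv2 b
    ON a.mb_no = b.mb_no
    WHERE b.yymm IN {YYMM}
      AND a.tel3 != ''
      AND b.true_grade_class IN (3,6)
      OR b.grade_class IN (4,7)"""

# Comma join with the join condition kept outside the OR by explicit parentheses
COMMA_JOIN_FIXED = f"""SELECT COUNT(*) FROM mbst a, his_moneypv2 b
    WHERE a.mb_no = b.mb_no
      AND ((b.yymm IN {YYMM} AND a.tel3 != '' AND b.true_grade_class IN (3,6))
           OR b.grade_class IN (4,7))"""


def build_db():
    """Hypothetical miniature versions of the two tables from the answer."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE mbst (mb_no INTEGER PRIMARY KEY, tel3 TEXT);
        CREATE TABLE his_moneypv2 (mb_no INTEGER, yymm TEXT,
                                   true_grade_class INTEGER, grade_class INTEGER);
        INSERT INTO mbst VALUES (1, 'x'), (2, 'y'), (3, 'z');
        -- row 1 matches the AND group; row 2 matches only the OR branch
        INSERT INTO his_moneypv2 VALUES (1, '200802', 3, 0), (2, '200811', 0, 4);
    """)
    return con


def count(con, sql):
    return con.execute(sql).fetchone()[0]


con = build_db()
print(count(con, COMMA_JOIN))        # 4: the OR branch pairs b's row 2 with all 3 rows of a
print(count(con, INNER_JOIN))        # 2
print(count(con, COMMA_JOIN_FIXED))  # 2
```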
Actually they are virtually the same. JOIN ... ON is the newer ANSI syntax (SQL-92), and the comma with WHERE is the older ANSI syntax. Both are recognized by query engines.
The comma in a FROM clause is a CROSS JOIN. We can imagine that the SQL server has a SELECT execution procedure that looks roughly like this:
1. iterate through every table
2. find rows that meet the join predicate and put them into a result table
3. from the result table, get only those rows that meet the WHERE condition.
If it really looked like that, then using a CROSS JOIN on tables with a few thousand rows could allocate a lot of memory, because every row would be combined with every other row before the WHERE condition is examined. Your SQL server could be quite busy then.
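The naive model above can be contrasted with applying the join condition during the join. Here is a Python sketch that counts intermediate pairs. The hash lookup is just one illustrative way of doing that and is not a claim about the plan MySQL picks here. MySQL 8.0.18 and later can use hash joins, while older versions use nested loops with index lookups. Both functions return the same rows.

```python
def cross_then_filter(t1, t2, pred):
    """The naive model from the answer: build every combination, then filter."""
    product = [(a, b) for a in t1 for b in t2]
    return [p for p in product if pred(*p)], len(product)


def join_on_key(t1, t2, key1, key2):
    """Apply the join condition while joining (here via a hash lookup on the key)."""
    buckets = {}
    for b in t2:
        buckets.setdefault(key2(b), []).append(b)
    out = [(a, b) for a in t1 for b in buckets.get(key1(a), [])]
    return out, len(out)


t1 = [{"ID": i} for i in range(100)]
t2 = [{"ID": i} for i in range(100)]
print(cross_then_filter(t1, t2, lambda a, b: a["ID"] == b["ID"])[1])  # 10000 pairs built
print(join_on_key(t1, t2, lambda a: a["ID"], lambda b: b["ID"])[1])   # 100 pairs built
```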
I would think so, because the first example explicitly tells MySQL which columns to join and how to join them, whereas with the second one MySQL has to figure out how you want to join.
The second query is just another notation for an inner join. If there is a difference in performance, it's only because one query can be parsed faster than the other, and that difference, if it exists, will be so tiny that you won't notice it.
For more information, take a look at this question (and use the search on SO next time before asking a question that has already been answered).
The first query is easier for MySQL to understand, so it is likely that the execution plan will be better and that the query will run faster.
Without the WHERE clause, the second query is a cross join. If MySQL understands the WHERE clause well enough, it will do its best to avoid cross joining all the rows, but nothing guarantees that.
In a case as simple as yours, the performance will be strictly identical.
Performance wise, the first query will always be better or identical to the second one. And from my point of view it is also a lot easier to understand when rereading.

Why does MySQL not use an index when executing this query?

mysql> desc users;
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| email | varchar(128) | NO | UNI | | |
| password | varchar(32) | NO | | | |
| screen_name | varchar(64) | YES | UNI | NULL | |
| reputation | int(10) unsigned | NO | | 0 | |
| imtype | varchar(1) | YES | MUL | 0 | |
| last_check | datetime | YES | MUL | NULL | |
| robotno | int(10) unsigned | YES | | NULL | |
+-------------+------------------+------+-----+---------+----------------+
8 rows in set (0.00 sec)
mysql> create index i_users_imtype_robotno on users(imtype,robotno);
Query OK, 24 rows affected (0.25 sec)
Records: 24 Duplicates: 0 Warnings: 0
mysql> explain select * from users where imtype!='0' and robotno is null;
+----+-------------+-------+------+------------------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+------------------------+------+---------+------+------+-------------+
| 1 | SIMPLE | users | ALL | i_users_imtype_robotno | NULL | NULL | NULL | 24 | Using where |
+----+-------------+-------+------+------------------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
But written this way, it is used:
mysql> explain select * from users where imtype in ('1','2') and robotno is null;
+----+-------------+-------+-------+------------------------+------------------------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+------------------------+------------------------+---------+------+------+-------------+
| 1 | SIMPLE | users | range | i_users_imtype_robotno | i_users_imtype_robotno | 11 | NULL | 3 | Using where |
+----+-------------+-------+-------+------------------------+------------------------+---------+------+------+-------------+
1 row in set (0.01 sec)
Also, this one did not use the index:
mysql> explain select id,email,imtype from users where robotno=1;
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 24 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
SELECT *
FROM users
WHERE imtype != '0' and robotno is null
This condition is not satisfied by a single contiguous range of (imtype, robotno).
Suppose you have records like these, ordered by (imtype, robotno):
imtype  robotno
$       NULL
$       1
0       NULL
0       1
1       NULL
1       1
2       NULL
2       1
Then records 1, 5 and 7 would be returned, while the others wouldn't, so the matches are scattered across the index instead of forming one range.
You'll need to create this index to satisfy the condition:
CREATE INDEX ix_users_ri ON users (robotno, imtype)
and rewrite your query a little:
SELECT *
FROM users
WHERE (
robotno IS NULL
AND imtype < '0'
)
OR
(
robotno IS NULL
AND imtype > '0'
)
, which will result in two contiguous blocks:
robotno imtype
--- first block start
NULL $
--- first block end
NULL 0
--- second block start
NULL 1
NULL 2
--- second block end
1 $
1 0
1 1
1 2
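The contiguity argument can be checked mechanically. The Python sketch below sorts the eight example records by each column order and counts how many separate runs of matching rows result. NULL is modelled as None and sorted first, as in an ascending MySQL index. The helper names are hypothetical.

```python
def sort_key(value):
    """Order NULL (None) before any value, as in an ascending MySQL index."""
    return (0,) if value is None else (1, value)


def count_runs(rows, order, predicate):
    """Sort rows by the given column order and count contiguous runs of matching rows."""
    ordered = sorted(rows, key=lambda r: tuple(sort_key(r[c]) for c in order))
    hits = [i for i, r in enumerate(ordered) if predicate(r)]
    return sum(1 for j, i in enumerate(hits) if j == 0 or hits[j - 1] != i - 1)


def wanted(r):
    # imtype != '0' AND robotno IS NULL (no imtype is NULL in this sample)
    return r["imtype"] != "0" and r["robotno"] is None


records = [{"imtype": t, "robotno": n}
           for t in ("$", "0", "1", "2") for n in (None, 1)]

print(count_runs(records, ("imtype", "robotno"), wanted))  # 3 separate runs
print(count_runs(records, ("robotno", "imtype"), wanted))  # 2 blocks
```

The same helper shows why WHERE robotno = 1 benefits too: under (imtype, robotno) its matches form 4 runs, and under (robotno, imtype) they form 1.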
This index will also serve this query:
SELECT id, email, imtype
FROM users
WHERE robotno = 1
, which is not served now by any index for the same reason.
Actually, the index for this query:
SELECT *
FROM users
WHERE imtype in ('1', '2')
AND robotno is null
is used only for coarse filtering on imtype (note "Using where" in the Extra column); it doesn't use the index to range over robotno.
For the third query (WHERE robotno=1), you need an index that has robotno as the first column. Your existing index is (imtype, robotno), and since imtype is not in that WHERE clause, MySQL can't use that index.
An index on (robotno,imtype) could be used for queries with just robotno in the where clause, and also for queries with both imtype and robotno in the where clause (but not imtype by itself).
Check out the docs on how MySQL uses indexes, and look for the parts that talk about multi-column indexes and "leftmost prefix".
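The leftmost-prefix rule can be expressed as a small Python function. It is a simplified sketch that considers only equality predicates, so range conditions and the skip-scan access method available in MySQL 8.0.13 and later are ignored. The function name is hypothetical.

```python
def usable_prefix(index_columns, equality_columns):
    """Leading index columns that equality predicates can use (leftmost-prefix rule)."""
    prefix = []
    for column in index_columns:
        if column not in equality_columns:
            break  # the first missing column ends the usable prefix
        prefix.append(column)
    return prefix


print(usable_prefix(("imtype", "robotno"), {"robotno"}))            # []
print(usable_prefix(("robotno", "imtype"), {"robotno"}))            # ['robotno']
print(usable_prefix(("robotno", "imtype"), {"robotno", "imtype"}))  # ['robotno', 'imtype']
print(usable_prefix(("robotno", "imtype"), {"imtype"}))             # []
```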
BTW, if you think you know better than the optimizer, which is often the case, you can force MySQL to use a specific index by appending FORCE INDEX (index_name) after FROM users.
It's because 'robotno' is potentially a primary key, and it uses that instead of the index.
A database system's query planner determines whether to do an index scan or not by analyzing the selectivity of the query's WHERE clause relative to the index. (Indexes are also used to join tables together, but you only have users here.)
The first query has where imtype != '0'. This would select nearly all of the rows in users, assuming you have a large number of distinct values of imtype. The inequality operator is inherently unselective. So the MySQL query planner is betting here that reading through the index won't help and that it may as well just do a sequential scan through the whole table, since it probably would have to do that anyway.
On the other hand, had you said where imtype ='0', equality is a highly selective operator, and MySQL would bet that by reading just a few index blocks it could avoid reading nearly all of the blocks of the users table itself. So it would pick the index.
In your second example, where imtype in ('1','2'), MySQL knows that the index will be highly selective (though only half as selective as where imtype = '0'), and it will again bet that using the index will lead to a big payoff, as you discovered.
In your third example, where robotno=1, MySQL probably can't effectively use the index on users(imtype,robotno) since it would need to read in all the index blocks to find the robotno=1 record numbers: the index is sorted by imtype first, then robotno. If you had another index on users(robotno), MySQL would eagerly use it though.
As a footnote, if you had two indexes, one on users(imtype), and the other on users(imtype,robotno), and your query was on where imtype = '0', either index would make your query fast, but MySQL would probably select users(imtype) simply because it's more compact and fewer blocks would need to be read from it.
I'm being very simplistic here. Early database systems would just look at imtype's datatype and make a very rough guess at the selectivity of your query. People quickly realized that giving the query planner facts such as the total size of the table and the number of distinct values in each column would let it make much smarter decisions. For instance, if you had a users table where imtype was only ever '0' or '1', and most rows were '0', the query planner might choose the index, since in that case where imtype != '0' is more selective.
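Here is a toy version of the selectivity reasoning in this answer. It estimates the fraction of rows a predicate matches from value counts and picks the index below a cutoff. The 0.3 threshold and both data sets are hypothetical, and MySQL's real cost model is far more detailed. The sketch only shows how the same predicate can flip between index and full scan depending on the data distribution.

```python
from collections import Counter


def selectivity(values, predicate):
    """Fraction of rows a predicate is expected to match, from exact value counts."""
    counts = Counter(values)
    return sum(n for v, n in counts.items() if predicate(v)) / len(values)


def choose_access_path(values, predicate, threshold=0.3):
    """Pick 'index' for selective predicates, else 'full scan' (threshold is hypothetical)."""
    return "index" if selectivity(values, predicate) < threshold else "full scan"


many_values = ["0", "1", "2", "3", "4", "5"] * 4  # 24 rows, evenly spread
mostly_zero = ["0"] * 20 + ["1"] * 4              # 24 rows, only '0' or '1'

print(choose_access_path(many_values, lambda v: v != "0"))  # full scan (20/24 match)
print(choose_access_path(many_values, lambda v: v == "0"))  # index (4/24 match)
print(choose_access_path(mostly_zero, lambda v: v != "0"))  # index (4/24 match)
```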
Take a look at MySQL's ANALYZE TABLE statement (the rough counterpart of SQL Server's UPDATE STATISTICS), which refreshes the key distribution statistics the planner relies on, and you'll see that its query planner must be sophisticated. For that reason I'd hesitate a great deal before using FORCE INDEX to dictate a query plan to it. Instead, run ANALYZE TABLE to give the query planner better information to base its decisions on.
Your index is over users(imtype,robotno). In order to use this index, either imtype or imtype and robotno must be used to qualify the rows. You are just using robotno in your query, thus it can't use this index.