Two questions on SO seem to be asking about different behaviour for equivalent MySQL queries. In both cases a join is being performed on tables having identical column names. This poster is asking how to eliminate duplicated columns having the same name from the result and this poster is asking how to achieve the duplication of columns having the same name in the result.
To test this I created toy tables:
mysql> describe table_1;
+----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| col_name | varchar(255) | YES | | NULL | |
+----------+--------------+------+-----+---------+-------+
mysql> describe table_2;
+----------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| col_name | varchar(255) | YES | | NULL | |
+----------+--------------+------+-----+---------+-------+
I inserted the value 'value' into both tables and executed these joins, where JOIN_OP is either "join" or "left join":
mysql> select * from table_1 as t1 JOIN_OP table_2 as t2 on t1.col_name = t2.col_name;
+----------+----------+
| col_name | col_name |
+----------+----------+
| value | value |
+----------+----------+
This result conforms to the results in the first post. What is the difference between the two queries and the two results? Why is the second poster not seeing any duplication?
By default MySQL returns all columns from all joined tables if you use *. Two columns are returned: one from table_1 and the other from table_2. To get only the columns you want, list the column names explicitly in your query, for example:
select t1.col_name from table_1 as t1 JOIN_OP table_2 as t2 on t1.col_name = t2.col_name;
SQL:2003 rules say that if you use the USING() join condition, the redundant column is eliminated from the select-list, because it's totally clear that the columns have the same name and the same value.
mysql> select * from table_1 as t1 JOIN table_2 as t2 USING (col_name);
+----------+
| col_name |
+----------+
| value |
+----------+
Whereas if you use the more verbose JOIN ON syntax, and you use select *, the select-list retains a separate column from each of the joined tables, even though in this case we can tell that the value is bound to be identical.
But apparently SQL isn't smart enough to make that inference, because a condition in the ON clause could be something other than equality, e.g. it could be ON t1.col_name <> t2.col_name, therefore both columns should be retained in the select-list because they'll have different values.
MySQL 5.0.12 and later supports this standard behavior (several fixes were made to join semantics in MySQL 5.0.12, to be more standard-compliant).
You can see more discussion here: http://bugs.mysql.com/bug.php?id=6489
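The difference between the two select-lists can be reproduced without a MySQL server. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in (an assumption, SQLite is not MySQL, although it follows the same rule for USING), and star_columns is a hypothetical helper name.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the example runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (col_name VARCHAR(255));
    CREATE TABLE table_2 (col_name VARCHAR(255));
    INSERT INTO table_1 VALUES ('value');
    INSERT INTO table_2 VALUES ('value');
""")

def star_columns(sql):
    """Return the column names that SELECT * expands to for this query."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# JOIN ... ON keeps one col_name per joined table.
on_cols = star_columns(
    "SELECT * FROM table_1 AS t1 JOIN table_2 AS t2 "
    "ON t1.col_name = t2.col_name")

# JOIN ... USING merges the shared column into a single one.
using_cols = star_columns(
    "SELECT * FROM table_1 AS t1 JOIN table_2 AS t2 USING (col_name)")

print(len(on_cols), on_cols)
print(len(using_cols), using_cols)
```

On MySQL 5.0.12 or later the same two SELECT statements should show the same column counts as the console output above.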
A "select distinct col1,col2 from table1" where col1 and col2 are of type TEXT and table1 has about 65K rows works fine with MySQL 5.5.58. Now that I've upgraded to MySQL 5.7.20 it takes almost an hour! Does anyone know of any changes to MySQL that may be causing this? Does anyone have any suggestions how col1 and col2 should be optimally indexed for this query, or what other settings I should check to make this query run faster? I don't get the feeling that indexes are even being used since EXPLAIN says it's using a temporary table and no keys:
mysql> explain SELECT DISTINCT author,sort_author from itemsbyauthor;
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
| 1 | SIMPLE | itemsbyauthor | NULL | ALL | NULL | NULL | NULL | NULL | 64727 | 100.00 | Using temporary |
+----+-------------+---------------+------------+------+---------------+------+---------+------+-------+----------+-----------------+
1 row in set, 1 warning (0.00 sec)
In many cases, MySQL doesn't use prefix indexes properly and it seems this is one of these cases.
Do you really need the column type to be TEXT?
From the column names, it looks like the columns hold author names, which are relatively short strings (say, up to 50 or 100 characters).
I would reconsider the column type and try altering it to VARCHAR with an explicit maximum length instead of TEXT.
Then add a compound index that includes both columns.
Can someone explain this... before I have myself committed? The first result set should have two results, same as the second, no?
mysql> SELECT * FROM kuru_footwear_2.customer_address_entity_varchar
-> WHERE attribute_id=31 AND entity_id=324134;
+----------+----------------+--------------+-----------+-------+
| value_id | entity_type_id | attribute_id | entity_id | value |
+----------+----------------+--------------+-----------+-------+
| 885263 | 2 | 31 | 324134 | NULL |
+----------+----------------+--------------+-----------+-------+
1 row in set (0.00 sec)
mysql> SELECT * FROM kuru_footwear_2.customer_address_entity_varchar
-> WHERE value_id=885263 OR value_id=950181;
+----------+----------------+--------------+-----------+-------+
| value_id | entity_type_id | attribute_id | entity_id | value |
+----------+----------------+--------------+-----------+-------+
| 885263 | 2 | 31 | 324134 | NULL |
| 950181 | 2 | 31 | 324134 | NULL |
+----------+----------------+--------------+-----------+-------+
2 rows in set (0.00 sec)
attribute_id is a SMALLINT(5)
entity_id is a INT(10)
The problem is that you have a unique index on (entity_id,attribute_id). The query optimizer notices this when you write a query whose WHERE clause is covered by the index, and only returns 1 row since the uniqueness of the index implies that there's at most one matching row.
I'm not sure how you can have those duplicates in the first place; it seems like something is corrupted in the table. A unique index normally cannot coexist with duplicates: a plain ALTER TABLE that adds the index fails when duplicates exist, and the older ALTER IGNORE TABLE form (removed in MySQL 5.7) silently dropped them, which is why it was often suggested as a way to get rid of duplicates in a table; see How do I delete all the duplicate records in a MySQL table without temp tables.
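To illustrate why two rows with the same (entity_id, attribute_id) point to corruption, here is a small sketch of what a healthy unique index does with a second such row. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption), and the table name addr and index name uq_entity_attr are made up for the example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; addr and uq_entity_attr are
# hypothetical names, not taken from the question's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addr ("
    " value_id INTEGER PRIMARY KEY,"
    " attribute_id INT,"
    " entity_id INT)")
conn.execute(
    "CREATE UNIQUE INDEX uq_entity_attr ON addr (entity_id, attribute_id)")

# First row from the question.
conn.execute("INSERT INTO addr VALUES (885263, 31, 324134)")

# Second row has the same (entity_id, attribute_id) pair.
try:
    conn.execute("INSERT INTO addr VALUES (950181, 31, 324134)")
    second_insert = "accepted"
except sqlite3.IntegrityError:
    second_insert = "rejected"

row_count = conn.execute("SELECT COUNT(*) FROM addr").fetchone()[0]
print(second_insert, row_count)
```

With a working unique index the second insert is refused, so the optimizer's assumption of at most one matching row is normally safe.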
In the first statement's selection (the 'WHERE' clause), you are using AND; in the second statement, you are using OR. This boils down to the definition of these logical operators. MySQL's official documentation doesn't say much about these other than that AND and OR are their own natural logical operators. If this is confusing, you may want to read up on basic Boolean Algebra.
I am trying to query for rows that are not in another set of rows. However, the other set of rows may contain strings that include strings from the first table.
I'm confusing myself trying to explain so I'll use the following example tables:
mysql> DESCRIBE tablea;
+------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| name | char(40) | NO | PRI | | |
+------------+----------+------+-----+---------+-------+
mysql> DESCRIBE tableb;
+------------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+----------+------+-----+---------+-------+
| nametag | char(40) | NO | PRI | | |
+------------+----------+------+-----+---------+-------+
mysql> SELECT name FROM tablea;
+------------+
| name |
+------------+
| cat |
| dog |
| cow |
+------------+
mysql> SELECT nametag FROM tableb;
+------------+
| nametag |
+------------+
| wolf |
| dog |
| browncow |
+------------+
I am trying to find a method similar to the NOT IN operation, however because cow is "in" browncow, I also want to exclude this value.
mysql> SELECT name FROM tablea WHERE name NOT IN ( SELECT nametag FROM tableb );
+------------+
| name |
+------------+
| cat |
| cow |
+------------+
# I am looking for something that would only return "cat" for this example.
Is there any operation where I can search for rows that aren't contained in another set with additional modifiers?
You could use an anti-join pattern, with a LIKE predicate to do the matching. (The anti-join is an outer join that returns all the rows from one table, along with any matches from another table, followed by a predicate that excludes the rows that had a match.)
SELECT a.name
FROM tablea a
LEFT JOIN tableb b
ON b.nametag LIKE CONCAT('%',a.name,'%')
WHERE b.nametag IS NULL
(Any rows from a that had a matching row from b... the row from b will have a non-NULL value. Or, to put it another way... rows from a that didn't have a matching row in b will have a NULL value for the columns from b.)
If there's a row in a that has name='cow', and a row from b that has nametag='browncow', those rows will match.
The row from a with name='cat' will only be returned if the string 'cat' doesn't appear in any values of b.nametag.
NOTE: The percent and underscore characters are wildcards in the LIKE predicate. If you want to match those characters literally, you'd need to escape them with a backslash. There are similar issues with a REGEXP match, but with many more potentially mischievous characters.
There are other query patterns that will return an equivalent result.
For example:
SELECT a.name
FROM tablea a
WHERE NOT EXISTS
( SELECT 1
FROM tableb b
WHERE b.nametag LIKE CONCAT('%',a.name,'%')
)
Personally, I prefer the anti-join pattern.
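Both patterns can be checked against the example data from the question. This is a sketch only, run on SQLite through Python's sqlite3 module instead of MySQL (an assumption); the one syntax change is that SQLite's || operator replaces MySQL's CONCAT().

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. '%' || a.name || '%' plays the
# role of CONCAT('%', a.name, '%') in the MySQL queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (name CHAR(40) PRIMARY KEY);
    CREATE TABLE tableb (nametag CHAR(40) PRIMARY KEY);
    INSERT INTO tablea VALUES ('cat'), ('dog'), ('cow');
    INSERT INTO tableb VALUES ('wolf'), ('dog'), ('browncow');
""")

# Anti-join: outer join, then keep only rows that found no match.
anti_join = conn.execute("""
    SELECT a.name
      FROM tablea a
      LEFT JOIN tableb b
        ON b.nametag LIKE '%' || a.name || '%'
     WHERE b.nametag IS NULL
""").fetchall()

# Equivalent NOT EXISTS form.
not_exists = conn.execute("""
    SELECT a.name
      FROM tablea a
     WHERE NOT EXISTS
           ( SELECT 1
               FROM tableb b
              WHERE b.nametag LIKE '%' || a.name || '%'
           )
""").fetchall()

print(anti_join)    # [('cat',)]
print(not_exists)   # [('cat',)]
```

The outer join matters here: with a plain inner JOIN no row can have a NULL nametag, so the IS NULL filter would remove everything.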
Use "NOT EXISTS" along with INSTR. Note that INSTR(str, substr) searches for substr inside str, so the nametag goes first:
select *
from tablea a
where not exists(select 1 from tableb b where INSTR(b.nametag, a.name) > 0)
;
To exclude empty strings:
select *
from tablea a
where not exists(select 1 from tableb b where INSTR(b.nametag, a.name) > 0)
and length(a.name) > 0
;
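Argument order matters with INSTR: the first argument is the string being searched and the second is the substring to look for, so finding 'cow' inside 'browncow' requires the nametag first. A quick sketch of this, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption; both define INSTR with this argument order):

```python
import sqlite3

# Assumption: SQLite stands in for MySQL for this check.
conn = sqlite3.connect(":memory:")

# INSTR(haystack, needle): position of needle in haystack, 0 if absent.
haystack_first = conn.execute(
    "SELECT INSTR('browncow', 'cow')").fetchone()[0]
needle_first = conn.execute(
    "SELECT INSTR('cow', 'browncow')").fetchone()[0]
print(haystack_first, needle_first)   # 6 0

conn.executescript("""
    CREATE TABLE tablea (name CHAR(40) PRIMARY KEY);
    CREATE TABLE tableb (nametag CHAR(40) PRIMARY KEY);
    INSERT INTO tablea VALUES ('cat'), ('dog'), ('cow');
    INSERT INTO tableb VALUES ('wolf'), ('dog'), ('browncow');
""")

# Names that do not occur inside any nametag.
names = [row[0] for row in conn.execute("""
    SELECT a.name
      FROM tablea a
     WHERE NOT EXISTS
           (SELECT 1 FROM tableb b WHERE INSTR(b.nametag, a.name) > 0)
""")]
print(names)   # ['cat']
```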
Consider this MySQL query
SELECT <cols> FROM <tbl> INNER|LEFT JOIN (SELECT ...) AS sub_query ON FALSE
Of course this will yield an empty result set no matter what's in tbl or sub_query, because the ON condition is always false.
In my situation the only part I can control is the ON condition.
My question is: is MySQL smart enough to detect that, skip running the subquery altogether, and return the empty result set immediately?
We could just test this out:
mysql> explain select * from mdl_user join (select 1) q on false;
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+-------+------+---------------+------+---------+------+------+------------------+
2 rows in set (0.02 sec)
mysql> select * from mdl_user join (select 1) q on false;
Empty set (0.00 sec)
That would seem to indicate that yes, MySQL detects the on false condition, and doesn't bother with the entire join.
(sorry, a moodle database was what I was buried in at the time)
This is too long for a comment.
I would be surprised if MySQL were smart enough in practice to do this on a real query. You should test on your own data.
However, the documentation suggests that it is a possibility:
For non-EXPLAIN queries, delay of materialization may result in not
having to do it at all. Consider a query that joins the result of a
subquery in the FROM clause to another table: If the optimizer
processes that other table first and finds that it returns no rows,
the join need not be carried out further and the optimizer can
completely skip materializing the subquery.
Is there a difference in performance (in mysql) between
Select * from Table1 T1
Inner Join Table2 T2 On T1.ID = T2.ID
And
Select * from Table1 T1, Table2 T2
Where T1.ID = T2.ID
?
As pulled from the accepted answer in question 44917:
Performance wise, they are exactly the
same (at least in SQL Server) but be
aware that they are deprecating the
implicit outer join syntax.
In MySql the results are the same.
I would personally stick with joining tables explicitly... that is the "socially acceptable" way of doing it.
They are the same. This can be seen by running the EXPLAIN command:
mysql> explain Select * from Table1 T1
-> Inner Join Table2 T2 On T1.ID = T2.ID;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
| 1 | SIMPLE | T1 | index | PRIMARY | PRIMARY | 4 | NULL | 4 | Using index |
| 1 | SIMPLE | T2 | index | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where; Using index; Using join buffer |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
2 rows in set (0.00 sec)
mysql> explain Select * from Table1 T1, Table2 T2
-> Where T1.ID = T2.ID;
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
| 1 | SIMPLE | T1 | index | PRIMARY | PRIMARY | 4 | NULL | 4 | Using index |
| 1 | SIMPLE | T2 | index | PRIMARY | PRIMARY | 4 | NULL | 4 | Using where; Using index; Using join buffer |
+----+-------------+-------+-------+---------------+---------+---------+------+------+---------------------------------------------+
2 rows in set (0.00 sec)
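The same comparison can be repeated on a scratch database. The sketch below is an illustration under stated assumptions: it uses SQLite through Python's sqlite3 module rather than MySQL, so the plan text will not look like the EXPLAIN tables above, and the sample IDs are invented. It checks that both spellings return the same rows and prints each plan so they can be compared side by side.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the ID values are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ID INTEGER PRIMARY KEY);
    CREATE TABLE Table2 (ID INTEGER PRIMARY KEY);
    INSERT INTO Table1 VALUES (1), (2), (3), (4);
    INSERT INTO Table2 VALUES (2), (3), (4), (5);
""")

explicit_sql = "SELECT * FROM Table1 T1 INNER JOIN Table2 T2 ON T1.ID = T2.ID"
implicit_sql = "SELECT * FROM Table1 T1, Table2 T2 WHERE T1.ID = T2.ID"

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

explicit_rows = sorted(conn.execute(explicit_sql).fetchall())
implicit_rows = sorted(conn.execute(implicit_sql).fetchall())

print(explicit_rows == implicit_rows)
print(plan(explicit_sql))
print(plan(implicit_sql))
```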
A late answer from me, as I am analyzing the performance of an older application that uses comma-based joins instead of the INNER JOIN clause.
Here are two tables being joined (both have more than 1 lakh, i.e. 100,000, records). The query with the comma-based join takes a lot longer than the INNER JOIN version.
When I analyzed the EXPLAIN output, I found that the comma-join query was using the join buffer, whereas the INNER JOIN query showed only 'Using where'.
The plans also differ significantly, as shown in the rows column of the EXPLAIN output.
These are my queries and their respective explain results.
explain select count(*) FROM mbst a , his_moneypv2 b
WHERE b.yymm IN ('200802','200811','201001','201002','201003')
AND a.tel3 != ''
AND a.mb_no = b.mb_no
AND b.true_grade_class IN (3,6)
OR b.grade_class IN (4,7);
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+------+--------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+------+--------+---------------------------------------------------------------------+
| 1 | SIMPLE | b | index_merge | PRIMARY,mb_no,yymm,yymm_2,idx_true_grade_class,idx_grade_class | idx_true_grade_class,idx_grade_class | 5,9 | NULL | 16924 | Using sort_union(idx_true_grade_class,idx_grade_class); Using where |
| 1 | SIMPLE | a | ALL | PRIMARY | NULL | NULL | NULL | 134472 | Using where; Using join buffer |
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+------+--------+---------------------------------------------------------------------+
v/s
explain select count(*) FROM mbst a inner join his_moneypv2 b
on a.mb_no = b.mb_no
WHERE b.yymm IN ('200802','200811','201001','201002','201003')
AND a.tel3 != ''
AND b.true_grade_class IN (3,6)
OR b.grade_class IN (4,7);
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+--------------------+-------+---------------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+--------------------+-------+---------------------------------------------------------------------+
| 1 | SIMPLE | b | index_merge | PRIMARY,mb_no,yymm,yymm_2,idx_true_grade_class,idx_grade_class | idx_true_grade_class,idx_grade_class | 5,9 | NULL | 16924 | Using sort_union(idx_true_grade_class,idx_grade_class); Using where |
| 1 | SIMPLE | a | eq_ref | PRIMARY | PRIMARY | 62 | shaklee_my.b.mb_no | 1 | Using where |
+----+-------------+-------+-------------+----------------------------------------------------------------+--------------------------------------+---------+--------------------+-------+---------------------------------------------------------------------+
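One possible reason for the different plans, which the answer does not mention, is that the two queries are not logically equivalent. AND binds more tightly than OR, so in the comma version the condition a.mb_no = b.mb_no belongs only to the AND group, and rows reached through the OR branch are not joined on mb_no at all; in the INNER JOIN version the ON condition always applies. The sketch below shows the effect on invented toy data, using SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption; the precedence rule is the same in both).

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. Table and column names follow the
# queries above, but the rows are invented toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (mb_no INTEGER PRIMARY KEY);
    CREATE TABLE b (mb_no INTEGER PRIMARY KEY,
                    true_grade_class INT,
                    grade_class INT);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (1, 3, 0), (2, 0, 4), (3, 0, 0);
""")

def count(sql):
    return conn.execute(sql).fetchone()[0]

# Parsed as (join AND true_grade_class ...) OR grade_class ...
comma_count = count("""
    SELECT COUNT(*) FROM a, b
     WHERE a.mb_no = b.mb_no
       AND b.true_grade_class IN (3,6)
        OR b.grade_class IN (4,7)
""")

# The ON condition is applied to every row pair before the WHERE filter.
join_count = count("""
    SELECT COUNT(*) FROM a INNER JOIN b ON a.mb_no = b.mb_no
     WHERE b.true_grade_class IN (3,6)
        OR b.grade_class IN (4,7)
""")

# Parentheses make the comma form equivalent to the INNER JOIN form.
fixed_count = count("""
    SELECT COUNT(*) FROM a, b
     WHERE a.mb_no = b.mb_no
       AND (b.true_grade_class IN (3,6) OR b.grade_class IN (4,7))
""")

print(comma_count, join_count, fixed_count)   # 4 2 2
```

If this applies to the real queries, adding parentheses around the OR group would be worth trying before concluding that the join syntax itself changes performance.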
Actually, they are virtually the same. JOIN / ON is the newer ANSI syntax; WHERE is the older ANSI syntax. Both are recognized by query engines.
The comma in a FROM clause is a CROSS JOIN. We can imagine that the SQL server has a SELECT execution procedure that looks something like this:
1. iterate through every table
2. find rows that meet the join predicate and put them into a result table.
3. from the result table, keep only those rows that meet the WHERE condition.
If it really works like that, then using a CROSS JOIN on a table that has a few thousand rows could allocate a lot of memory, since every row is combined with every other row before the WHERE condition is examined. Your SQL server could be quite busy then.
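The imagined procedure can be written out in a few lines to see where the memory would go. This is a sketch of the hypothetical model described above, not of how MySQL actually executes a join; naive_comma_join and the sample rows are invented for illustration.

```python
from itertools import product

# Invented sample data: two tables with a single ID column each.
table1 = [{"ID": i} for i in range(1, 5)]   # IDs 1..4
table2 = [{"ID": i} for i in range(3, 7)]   # IDs 3..6

def naive_comma_join(left, right, where):
    """Hypothetical model: build the full cross product, then filter it."""
    combined = list(product(left, right))              # every row pair
    matched = [(l, r) for l, r in combined if where(l, r)]
    return combined, matched

combined, matched = naive_comma_join(
    table1, table2, lambda l, r: l["ID"] == r["ID"])

print(len(combined))   # 16 row pairs held before filtering
print(len(matched))    # 2 pairs survive the WHERE condition
```

The intermediate list grows with the product of the table sizes, which is the cost this answer is worried about; whether a real optimizer ever builds it is a separate question.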
I would think so, because the first example explicitly tells MySQL which columns to join and how to join them, whereas in the second one MySQL has to figure out where you want to join.
The second query is just another notation for an inner join, so if there is a difference in performance, it's only because one query can be parsed faster than the other, and that difference, if it exists, will be so tiny that you won't notice it.
For more information you could take a look at this question (and use the search on SO next time before asking a question that has already been answered).
The first query is easier to understand for MySQL so it is likely that the execution plan will be better and that the query will run faster.
The second query, without the WHERE clause, is a cross join. If MySQL is able to understand the WHERE clause well enough, it will do its best to avoid cross joining all the rows, but nothing guarantees that.
In a case as simple as yours, the performance will be strictly identical.
Performance wise, the first query will always be better or identical to the second one. And from my point of view it is also a lot easier to understand when rereading.