I'm having some issues with a query that joins two tables. It runs through the tables far more times than I expect, and I can't seem to find out why.
My query is:
SELECT *
FROM indexAddress
LEFT JOIN indexTx ON indexTx.address_id = indexAddress.id
WHERE indexAddress.walletId = '2'
indexTx contains the transaction rows, with a field holding the address ID (address_id).
indexAddress contains the address data, with the ID as primary key.
+----+-------------+--------------+------------+------+---------------+------------+---------+------------------+------+----------+-------+
| id | select_type | table        | partitions | type | possible_keys | key        | key_len | ref              | rows | filtered | Extra |
+----+-------------+--------------+------------+------+---------------+------------+---------+------------------+------+----------+-------+
|  1 | SIMPLE      | indexAddress | NULL       | ref  | Wallet ID     | Wallet ID  | 4       | const            |  121 |   100.00 |       |
|  1 | SIMPLE      | indexTx      | NULL       | ref  | Address ID    | Address ID | 4       | indexAddress.id  |   23 |   100.00 |       |
+----+-------------+--------------+------------+------+---------------+------------+---------+------------------+------+----------+-------+
My question is: why does the join run over indexTx 23 times and not just once? 121 rows is expected, since that's the number of rows for the given wallet ID, but the 23 is confusing me.
If one of the tables has more than one record matching walletId = 2, the rows will be duplicated in the result. You need to filter the data further or use DISTINCT. (Note that the rows value of 23 for indexTx is the optimizer's estimate of how many matching rows it will examine per row from indexAddress, not a count of separate passes over the table.)
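For example, a minimal sketch of the DISTINCT approach (assuming you only need each address once, no matter how many transactions match it):

SELECT DISTINCT indexAddress.*
FROM indexAddress
LEFT JOIN indexTx ON indexTx.address_id = indexAddress.id
WHERE indexAddress.walletId = '2';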
I have two tables, ny_clean (3454602 entries) and pickup_0_ids_temp_table (2739268 entries), which both have an id CHAR(11) column that is the primary key with a BTREE index on it (MySQL 5.7).
The "id" column in pickup_0_ids_temp_table is a subset of ny_clean and I want to get a result which is ny_clean without the id values from pickup_0_ids_temp_table.
Option 1:
EXPLAIN
SELECT *
FROM pickup_0_ids_temp_table as t
JOIN ny_clean as n
ON n.id != t.id;
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
| 1 | SIMPLE | t | NULL | index | NULL | PRIMARY | 11 | NULL | 2734512 | 100.00 | Using index |
| 1 | SIMPLE | ny_clean | NULL | index | NULL | btree_pk_ny_clean | 11 | NULL | 3445904 | 90.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+----------+------------+-------+---------------+-------------------+---------+------+---------+----------+-----------------------------------------------------------------+
Option 2:
EXPLAIN
SELECT *
FROM ny_clean as n
WHERE n.id NOT IN (
SELECT id
FROM pickup_0_ids_temp_table);
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
| 1 | PRIMARY | n | NULL | ALL | NULL | NULL | NULL | NULL | 3445904 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | pickup_0_ids_temp_table | NULL | unique_subquery | PRIMARY,btree_pickup_0 | PRIMARY | 11 | func | 1 | 100.00 | Using index |
+----+--------------------+-------------------------+------------+-----------------+------------------------+---------+---------+------+---------+----------+-------------+
I then use one of the options inside this larger query:
EXPLAIN
INSERT INTO y
SELECT id, pickup_longitude, pickup_latitude
FROM x
JOIN
(OPTION 1 OR 2) as z
ON z.id = x.id;
When I used Option 1 inside the larger query, it ran for two days without finishing. Option 2, on the other hand, did the job in less than 30 minutes.
My question: why is that?
Following the MySQL documentation (https://dev.mysql.com/doc/refman/5.7/en/subquery-materialization.html), I would suspect that it is due to materialization of the subquery, but how would I check this?
And am I interpreting the EXPLAIN output wrong? Judging from it, I would expect Option 1 to be faster, since it uses an index on both tables.
Or does it have to do with the larger query?
Thanks in advance
Your Option 1 doesn't do what you think it will do.
If you have two tables
n.id t.id
1 1
2 2
3 3
ON n.id != t.id;
You get:
1,2
1,3
2,1
2,3
3,1
3,2
That is almost a Cartesian product, so roughly 3.4 million × 2.7 million ≈ 9.2 trillion rows.
Then you try to JOIN against that, and because the materialized table doesn't have an index, it takes a very long time.
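What you most likely want instead is an anti-join. A minimal sketch of the LEFT JOIN / IS NULL form, using the tables from the question:

SELECT n.*
FROM ny_clean AS n
LEFT JOIN pickup_0_ids_temp_table AS t ON t.id = n.id
WHERE t.id IS NULL;

Because id is the primary key on both sides, each probe is a single index lookup, so this stays fast just like the NOT IN version.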
We have two tables: the first is relatively big (contact table, 250k rows) and the second is small (user table, < 10 rows). On MySQL 5.6 I get the following EXPLAIN result:
EXPLAIN SELECT
o0_.id AS id_0,
o8_.first_name,
o8_.last_name
FROM
contact o0_
LEFT JOIN user o8_ ON o0_.user_owner_id = o8_.id
LIMIT
25 OFFSET 100
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 253030 | Using index |
| 1 | SIMPLE | o8_ | ALL | PRIMARY | NULL | NULL | NULL | 5 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+-------+---------------+----------------------+---------+------+--------+----------------------------------------------------+
2 rows in set (0.00 sec)
When I use FORCE INDEX FOR JOIN:
EXPLAIN SELECT
o0_.id AS id_0,
o8_.first_name,
o8_.last_name
FROM
contact o0_
LEFT JOIN user o8_ force index for join(`PRIMARY`) ON o0_.user_owner_id = o8_.id
LIMIT
25 OFFSET 100
or add an index on the fields that appear in the select clause (first_name, last_name) of the user table:
alter table user add index(first_name, last_name);
the EXPLAIN result changes to this:
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 253030 | Using index |
| 1 | SIMPLE | o8_ | eq_ref | PRIMARY | PRIMARY | 4 | o0_.user_owner_id | 1 | NULL |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
2 rows in set (0.00 sec)
On MySQL 5.5 I get the same EXPLAIN result without any additional indexes:
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
| 1 | SIMPLE | o0_ | index | NULL | IDX_403263ED9EB185F9 | 5 | NULL | 255706 | Using index |
| 1 | SIMPLE | o8_ | eq_ref | PRIMARY | PRIMARY | 4 | o0_.user_owner_id | 1 | |
+----+-------------+-------+--------+---------------+----------------------+---------+-------------------------+--------+-------------+
2 rows in set (0.00 sec)
Why do I need to force the PRIMARY index or add extra indexes on MySQL 5.6?
The same behavior occurs with other selects when joining small tables.
If a table has so few rows, it may actually be faster to do a full table scan than to go to the index, locate the records, and then go back to the table. If there are other fields in the user table apart from the three in the query, you may consider adding a covering index, but frankly, I don't think any of this would have a significant effect on the speed of the query.
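If you do want to try a covering index for this exact query, a sketch might look like this (the index name is made up; the idea is that the join lookup and the two selected columns can all be served from the index alone):

ALTER TABLE user ADD INDEX idx_user_cover (id, first_name, last_name);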
I don't understand MySQL's EXPLAIN output for the following two queries.
In the first query, MySQL has to select 1238264 records first:
explain select
count(distinct utc.id)
from
user_to_company utc
inner join
users u
on utc.user_id=u.id
where
u.is_removed=false
order by
utc.user_id asc limit 20;
+----+-------------+-------+------+----------------------------+---------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys              | key     | key_len | ref   | rows    | Extra       |
+----+-------------+-------+------+----------------------------+---------+---------+-------+---------+-------------+
| 1  | SIMPLE      | u     | ALL  | PRIMARY                    | NULL    | NULL    | NULL  | 1238264 | Using where |
| 1  | SIMPLE      | utc   | ref  | user_id,FKF513E0271C2D1677 | user_id | 8       | u.id  | 1       | Using index |
+----+-------------+-------+------+----------------------------+---------+---------+-------+---------+-------------+
In the second query, a GROUP BY was added, which makes MySQL select only 20 records:
explain select
count(distinct utc.id)
from
user_to_company utc
inner join
users u
on utc.user_id=u.id
where
u.is_removed=false
group by
utc.user_id
order by
utc.user_id asc limit 20;
+----+-------------+--------+--------+----------------------------+--------------------+---------+-------------------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+--------+----------------------------+--------------------+---------+-------------------------+------+-------------+
| 1 | SIMPLE | utc | index | user_id,FKF513E0271C2D1677 | FKF513E0271C2D1677 | 8 | NULL | 20 | Using index |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 8 | utc.user_id | 1 | Using where |
+----+-------------+--------+--------+----------------------------+--------------------+---------+-------------------------+------+-------------+
For more info, there are 1333194 records in the users table and 1327768 records in the user_to_company table.
How does adding the GROUP BY make mysql select only 20 records in the first pass?
The first query has to read all the data to find all the values of utc.id. It returns only one row, which is a summary for the whole table. So, it has to generate all the data.
The second query is producing a separate total for each utc.user_id. You have a limit clause and an index on utc.user_id. MySQL is, apparently, smart enough to recognize that it can go to the index to get the first 20 values of utc.user_id. It uses these to generate the counts.
I am surprised that MySQL is smart enough to do this (although the logic is documented pretty well here). But it makes perfect sense that the second query can be optimized this way where the first one cannot be.
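If you want to confirm what the optimizer actually decided, the optimizer trace (available since MySQL 5.6) records its reasoning; a minimal sketch:

SET optimizer_trace = 'enabled=on';
-- run the query of interest here, then inspect the trace
SELECT TRACE FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';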
I have a table artists which has an index like this
combo1 => by_id, origin, genre
I run the queries
SELECT * FROM artists
WHERE by_id = '324'
AND genre = 'rock'
AND origin = 'Australia'
and
SELECT * FROM artists
WHERE by_id = '324'
AND origin = 'Australia'
AND genre = 'rock'
Clearly, in the second query the columns are mentioned in the same order as in the index. When I run EXPLAIN on these, it says it is using the index. But I am a bit confused whether the second one would be faster than the first one. Is it?
There is no difference, as the optimizer takes care of the order as long as the columns are included.
WHERE by_id = '324' AND origin = 'Australia' AND genre = 'rock'
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
-------------------------------------------------------------------------------------------------------------------
| 1 | SIMPLE | artists | const | PRIMARY | PRIMARY | 80 | const,const,const | 1 | Using index |
WHERE by_id = '324' AND genre = 'rock' AND origin = 'Australia'
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
-------------------------------------------------------------------------------------------------------------------
| 1 | SIMPLE | artists | const | PRIMARY | PRIMARY | 80 | const,const,const | 1 | Using index |
If you had happened to leave out by_id, the index would not have been used (indexes work left to right):
WHERE genre = 'rock'
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
---------------------------------------------------------------------------------------------------------------------
| 1 | SIMPLE | artists | index | (null) | PRIMARY | 80 | (null) | 1 | Using where; Using index |
WHERE origin='Australia' AND genre = 'rock'
| ID | SELECT_TYPE | TABLE | TYPE | POSSIBLE_KEYS | KEY | KEY_LEN | REF | ROWS | EXTRA |
---------------------------------------------------------------------------------------------------------------------
| 1 | SIMPLE | artists | index | (null) | PRIMARY | 80 | (null) | 1 | Using where; Using index |
The second query should be faster than the first.
The order in which columns are listed in the index definition is important. It is possible to retrieve a set of row identifiers using only the first indexed column. However, it is not possible or efficient (on most databases) to retrieve the set of row identifiers using only the second or greater indexed column.
For example, imagine a phone book that is organized by city first, then by last name, and then by first name. If you are given the city, you can easily extract the list of all phone numbers for that city. However, in this phone book it would be very tedious to find all the phone numbers for a given last name. You would have to look within each city's section for the entries with that last name. Some databases can do this, others just won’t use the index.
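So if you regularly need to filter by genre alone, you would add a separate index whose leftmost column is genre; a sketch (the index name is illustrative):

ALTER TABLE artists ADD INDEX idx_genre (genre);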
On the web page I'm working on, I need to show some statistics based on various user details that live in three tables. So I have the following query, which I then join to more tables:
SELECT *
FROM `user` `u`
LEFT JOIN `subscriptions` `s` ON `u`.`user_id` = `s`.`user_id`
LEFT JOIN `devices` `ud` ON `u`.`user_id` = `ud`.`user_id`
GROUP BY `u`.`user_id`
When I execute the query with LIMIT 1000 it takes about 0.05 seconds, and since I'm using the data from all three tables in a lot of queries, I decided to put it inside a VIEW:
CREATE VIEW `user_details` AS ( the same query from above )
And now when I run:
SELECT * FROM user_details LIMIT 1000
it takes about 7-10 seconds.
So my question is: can I do something to optimize the view, since the underlying query seems to be pretty quick, or should I use the whole query instead of the view?
Edit: this is what EXPLAIN SELECT * FROM user_details returns
+----+-------------+------------+--------+----------------+----------------+---------+------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+--------+----------------+----------------+---------+------------------------+--------+-------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 322666 | |
| 2 | DERIVED | u | index | NULL | PRIMARY | 4 | NULL | 372587 | |
| 2 | DERIVED | s | eq_ref | PRIMARY | PRIMARY | 4 | db_users.u.user_id | 1 | |
| 2 | DERIVED | ud | ref | device_id_name | device_id_name | 4 | db_users.u.user_id | 1 | |
+----+-------------+------------+--------+----------------+----------------+---------+------------------------+--------+-------+
4 rows in set (8.67 sec)
and this is what EXPLAIN returns for the query itself:
+----+-------------+-------+--------+----------------+----------------+---------+------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+----------------+----------------+---------+------------------------+--------+-------+
| 1 | SIMPLE | u | index | NULL | PRIMARY | 4 | NULL | 372587 | |
| 1 | SIMPLE | s | eq_ref | PRIMARY | PRIMARY | 4 | db_users.u.user_id | 1 | |
| 1 | SIMPLE | ud | ref | device_id_name | device_id_name | 4 | db_users.u.user_id | 1 | |
+----+-------------+-------+--------+----------------+----------------+---------+------------------------+--------+-------+
3 rows in set (0.00 sec)
Views and joins are extremely bad when it comes to performance. This is more or less true for all relational database management systems. It sounds strange, since that is what those systems are designed for, but it is true nevertheless.
Try to avoid the joins if this query is in heavy use on your page: instead, create a real table (not a view) that is filled from the three tables. You can automate that process using triggers, so that each time an entry is inserted into one of the original tables, the trigger takes care of propagating the data to the physical user_details table, as sketched below.
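As a rough illustration, a trigger that seeds the physical table when a user is created might look like this (the user_details column list is an assumption, since the original query selects *; the triggers on subscriptions and devices would UPDATE the same row):

DELIMITER //
CREATE TRIGGER user_after_insert
AFTER INSERT ON `user`
FOR EACH ROW
BEGIN
  -- seed a row for the new user in the denormalized table
  INSERT INTO user_details (user_id) VALUES (NEW.user_id);
END//
DELIMITER ;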
This strategy certainly means a one-time setup investment, but you will definitely get much better performance.