Slow left join using MySQL

Here is the SQL query in question:
select * from company1
left join company2 on company2.model
LIKE CONCAT(company1.model,'%')
where company1.manufacturer = company2.manufacturer
company1 contains 2000 rows while company2 contains 9000 rows.
The query takes around 25 seconds to complete.
I have company1.model and company2.model indexed.
Any idea how I can speed this up? Thanks!
+----+-------------+-----------+------+---------------+------+---------+------+------+--------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+------+---------------+------+---------+------+------+--------------------------------+
| 1 | SIMPLE | company1 | ALL | NULL | NULL | NULL | NULL | 2853 | |
| 1 | SIMPLE | company2 | ALL | NULL | NULL | NULL | NULL | 8986 | Using where; Using join buffer |
+----+-------------+-----------+------+---------------+------+---------+------+------+--------------------------------+

This query is not written the same way as yours, but I am fairly sure it returns the same result. It joins on manufacturer and moves the LIKE into the WHERE clause:
select
*
from
company1 inner join company2
on company1.manufacturer = company2.manufacturer
where
company2.model LIKE CONCAT(company1.model,'%')
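EDIT: I also replaced your left join with an inner join. If the join finds no match, every company2 column is NULL, and neither NULL LIKE 'Something%' nor a comparison against a NULL manufacturer can ever be true, so those rows are dropped by the WHERE clause anyway.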

One way to speed this up is to remove the LIKE CONCAT() from the join condition.
MySQL cannot use an index for a LIKE whose pattern is built from another column rather than a constant, so your query results in a full table scan.

Your EXPLAIN shows that you have no indexes that can be used.
Appropriate indexes on both tables would help. Either a single index on (manufacturer) or a composite (manufacturer, model):
ALTER TABLE company1
ADD INDEX manufacturer_model_IDX -- this is just a name (of your choice)
(manufacturer, model) ; -- for the index
ALTER TABLE company2
ADD INDEX manufacturer_model_IDX
(manufacturer, model) ;
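To check whether the new indexes are actually picked up, one option is to re-run EXPLAIN on the inner-join form of the query suggested earlier. This is only a sketch: it assumes the indexes were created under the name manufacturer_model_IDX as above, and the optimizer may still choose a different plan depending on the data.

```sql
-- Re-check the plan after adding the composite indexes.
-- If an index is used, the "key" column for at least one table
-- should show manufacturer_model_IDX instead of NULL.
EXPLAIN
SELECT *
FROM company1
INNER JOIN company2
  ON company1.manufacturer = company2.manufacturer
WHERE company2.model LIKE CONCAT(company1.model, '%');
```

With the equality on manufacturer as the join condition, the LIKE only has to be evaluated against company2 rows of the same manufacturer rather than against all of company2.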

Related

Why is my MySQL query so slow?

I'm trying to figure out why this query is so slow (it takes about 6 seconds to return a result):
SELECT DISTINCT
c.id
FROM
z1
INNER JOIN
c ON (z1.id = c.id)
INNER JOIN
i ON (c.member_id = i.member_id)
WHERE
c.id NOT IN (... big list of ids which should be excluded)
This is the execution plan:
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+--------------------------+
| 1 | SIMPLE | z1 | index | PRIMARY | PRIMARY | 4 | NULL | 318563 | 99.85 | Using where; Using index; Using temporary |
| 1 | SIMPLE | c | eq_ref | PRIMARY,member_id | PRIMARY | 4 | z1.id | 1 | 100.00 | |
| 1 | SIMPLE | i | eq_ref | PRIMARY | PRIMARY | 4 | c.member_id | 1 | 100.00 | Using index |
+----+-------------+-------+--------+-------------------+---------+---------+--------------------+--------+----------+--------------------------+
Is it because MySQL has to read almost the whole first table? Can that be improved?
You can try replacing c with a subquery:
SELECT DISTINCT
c.id
FROM
z1
INNER JOIN
(select c.id, c.member_id
from c
WHERE
c.id NOT IN (... big list of ids which should be excluded)) c ON (z1.id = c.id)
INNER JOIN
i ON (c.member_id = i.member_id)
This leaves only the necessary ids before the joins are performed.
It is impossible to say from the information you've provided whether there is a faster way to obtain the same data (we would need to know about data distributions and which foreign keys are mandatory). However, assuming that this is a hierarchical data set, the plan is probably not optimal: the only predicate that reduces the number of rows is c.id NOT IN.....
The first question to ask yourself when optimizing any query is "Do I need all the rows?" How many rows is this returning?
I'm struggling to see any utility in a query which returns a list of 'id' values (implying a set of autoincrement integers).
MySQL generally cannot make good use of an index for a NOT IN (or <>) predicate, hence the most efficient solution is probably to start with a full table scan on 'c', which should be the outcome of StanislavL's query.
Since you don't use any values from i and z1, the joins could be replaced with 'exists', which may help performance.
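The EXISTS idea could look roughly like the sketch below. It assumes c.id is the primary key of c (as the EXPLAIN output suggests), so each id appears once and DISTINCT is no longer needed. The literal list (1, 2, 3) is only a placeholder for the real exclusion list.

```sql
-- Sketch: the joins to z1 and i only test for existence,
-- so they can be written as EXISTS subqueries.
SELECT c.id
FROM c
WHERE c.id NOT IN (1, 2, 3)   -- placeholder for the big list of excluded ids
  AND EXISTS (SELECT 1 FROM z1 WHERE z1.id = c.id)
  AND EXISTS (SELECT 1 FROM i WHERE i.member_id = c.member_id);
```

Whether this is faster than the join form depends on the MySQL version and the data, so it is worth comparing both plans with EXPLAIN.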
I would consider creating a compound index for c(id, member_id). This way the query should work at index level only without scanning any rows in tables.
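A minimal sketch of that compound index follows. The index name id_member_id_IDX is made up for illustration.

```sql
-- Hypothetical index name; covers both columns the query reads from c.
ALTER TABLE c
  ADD INDEX id_member_id_IDX (id, member_id);
```

If the optimizer uses it, the Extra column for c in EXPLAIN should show "Using index", meaning the table rows themselves are not read.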

joining table in mysql not using index properly?

I have four tables that I am trying to join and output the result to a new table. My code looks like this:
create table tbl
select a.dte, a.permno, (ret - rf) f0_xs_ret, (xs_ret - (betav*xs_mkt)) f0_resid, mkt_cap last_year_mkt_cap, betav beta_value
from a inner join b using (dte)
inner join c on (year(a.dte) = c.yr and a.permno = c.permno)
inner join d on (a.permno = d.permno and year(a.dte)-1 = year(d.dte));
All of the tables have multiple indices. For table a, (dte, permno) identifies a unique record; for table b, dte identifies a unique record; for table c, (yr, permno) identifies a unique record; and for table d, (dte, permno) identifies a unique record. The EXPLAIN output for the SELECT part of the query is:
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
| 1 | SIMPLE | d | ALL | idx1 | NULL | NULL | NULL | 264129 | |
| 1 | SIMPLE | c | ref | idx2 | idx2 | 4 | achernya.d.permno | 16 | |
| 1 | SIMPLE | b | ALL | PRIMARY,idx2 | NULL | NULL | NULL | 12336 | Using join buffer |
| 1 | SIMPLE | a | eq_ref | PRIMARY,idx1,idx2 | PRIMARY | 7 | achernya.b.dte,achernya.d.permno | 1 | Using where |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
Why does MySQL have to read so many rows to process this? If I am reading this correctly, it has to read (264129*16*12336) rows, which should take a good month.
Could someone please explain what's going on here?
MySQL has to read the rows because you're using functions as your join conditions. An index on dte will not help resolve YEAR(dte) in a query. If you want to make this fast, then put the year in its own column to use in joins and move the index to that column, even if that means some denormalization.
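As a rough sketch of that denormalization: the column name dte_yr and the index names below are invented for illustration, and the SELECT list is cut down to two columns to keep the example short. The new columns would also have to be kept in sync whenever rows are inserted or dte changes.

```sql
-- Hypothetical column and index names; adjust to the real schema.
ALTER TABLE a ADD COLUMN dte_yr SMALLINT;
UPDATE a SET dte_yr = YEAR(dte);
ALTER TABLE a ADD INDEX a_permno_yr_IDX (permno, dte_yr);

ALTER TABLE d ADD COLUMN dte_yr SMALLINT;
UPDATE d SET dte_yr = YEAR(dte);
ALTER TABLE d ADD INDEX d_permno_yr_IDX (permno, dte_yr);

-- The join conditions now compare plain columns,
-- so the indexes above can be considered by the optimizer.
SELECT a.dte, a.permno
FROM a
INNER JOIN b USING (dte)
INNER JOIN c ON (a.dte_yr = c.yr AND a.permno = c.permno)
INNER JOIN d ON (a.permno = d.permno AND d.dte_yr = a.dte_yr - 1);
```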
As for the other columns in your index that you don't apply functions to, they may not be used if the index won't provide much benefit, or they aren't the leftmost column in the index and you don't use the leftmost prefix of that index in your join condition.
Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

SQL Query Optimization

I am trying to speed up this Django app (note: I didn't design this... just stuck maintaining it) and the biggest bottleneck seems to be these queries that are being generated by the admin. We have a content class that 4-5 other sub-classes inherit from and anytime the master list is pulled up in the admin a query like this is generated:
SELECT `content_content`.`id`,
`content_content`.`issue_id`,
`content_content`.`slug`,
`content_content`.`section_id`,
`content_content`.`priority`,
`content_content`.`group_id`,
`content_content`.`rotatable`,
`content_content`.`pub_status`,
`content_content`.`created_on`,
`content_content`.`modified_on`,
`content_content`.`old_pk`,
`content_content`.`content_type_id`,
`content_image`.`content_ptr_id`,
`content_image`.`caption`,
`content_image`.`kicker`,
`content_image`.`pic`,
`content_image`.`crop_x`,
`content_image`.`crop_y`,
`content_image`.`crop_side`,
`content_issue`.`id`,
`content_issue`.`special_issue_name`,
`content_issue`.`web_publish_date`,
`content_issue`.`issue_date`,
`content_issue`.`fm_name`,
`content_issue`.`arts_name`,
`content_issue`.`comments`,
`content_section`.`id`,
`content_section`.`name`,
`content_section`.`audiodizer_id`
FROM `content_image`
INNER JOIN `content_content`
ON `content_image`.`content_ptr_id` = `content_content`.`id`
INNER JOIN `content_issue`
ON `content_content`.`issue_id` = `content_issue`.`id`
INNER JOIN `content_section`
ON `content_content`.`section_id` = `content_section`.`id`
WHERE NOT ( `content_content`.`pub_status` = -1 )
ORDER BY `content_issue`.`issue_date` DESC LIMIT 30
I ran an EXPLAIN on this and got the following:
+----+-------------+-----------------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------------------+-------+---------------------------------+
| 1 | SIMPLE | content_image | ALL | PRIMARY | NULL | NULL | NULL | 40499 | Using temporary; Using filesort |
| 1 | SIMPLE | content_content | eq_ref | PRIMARY,issue_id,content_content_issue_id,content_content_section_id,content_content_pub_status | PRIMARY | 4 | content_image.content_ptr_id | 1 | Using where |
| 1 | SIMPLE | content_section | eq_ref | PRIMARY | PRIMARY | 4 | content_content.section_id | 1 | |
| 1 | SIMPLE | content_issue | eq_ref | PRIMARY | PRIMARY | 4 | content_content.issue_id | 1 | |
+----+-------------+-----------------+--------+-------------------------------------------------------------------------------------------------+---------+---------+--------------------------------------+-------+---------------------------------+
Now, from what I've read, I need to somehow figure out how to make the query to content_image not be terrible; however, I'm drawing a blank on where to start.
Currently, judging by the execution plan, MySQL is starting with content_image, retrieving all rows, and only thereafter using primary keys on the other tables: content_image has a foreign key to content_content, and content_content has foreign keys to content_issue and content_section. Also, only after all the joins are complete can it make much use of the ORDER BY content_issue.issue_date DESC LIMIT 30, since it can't tell which of these joins might fail, and therefore, how many records from content_issue will really be needed before it can get the first thirty rows of output.
So, I would try the following:
Change JOIN content_issue to JOIN (SELECT * FROM content_issue ORDER BY issue_date DESC LIMIT 30) content_issue. This will allow MySQL, if it starts with content_issue and works its way to the other tables, to grab a very small subset of content_issue.
Note: properly speaking, this changes the semantics of the query: it means that only records from at most the last 30 content_issues will be retrieved, and therefore that if some of those issues don't have published contents with images, then fewer than 30 records will be retrieved. I don't have enough information about your data to gauge whether this change of semantics would actually change the results you get.
Also note: I'm not suggesting to remove the ORDER BY content_issue.issue_date DESC LIMIT 30 from the end of the query. I think you want it in both places.
Add an index on content_issue.issue_date, to optimize the above subquery.
Add an index on content_image.content_ptr_id, so MySQL can work its way from content_content to content_image without doing a full table scan.
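Put together, the first two suggestions might look like the sketch below. The index name content_issue_issue_date_idx is made up, and the SELECT list is shortened to two columns for readability. As noted above, the derived table changes the semantics: only rows belonging to the 30 most recent issues can be returned.

```sql
-- Hypothetical index name; supports the ORDER BY ... LIMIT in the derived table.
CREATE INDEX content_issue_issue_date_idx ON content_issue (issue_date);

SELECT content_content.id,
       content_issue.issue_date
FROM content_image
INNER JOIN content_content
  ON content_image.content_ptr_id = content_content.id
INNER JOIN (SELECT *
            FROM content_issue
            ORDER BY issue_date DESC
            LIMIT 30) AS content_issue
  ON content_content.issue_id = content_issue.id
INNER JOIN content_section
  ON content_content.section_id = content_section.id
WHERE NOT (content_content.pub_status = -1)
ORDER BY content_issue.issue_date DESC
LIMIT 30;
```

Since the original query is generated by the Django admin, applying this would mean overriding how the admin builds its queryset, which is outside the scope of this sketch.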

Fullscan on other table LEFT JOIN

I have 2 database tables: Companies (uses the InnoDB engine) and Company_financial_figures (uses the MyISAM engine). Table Companies has about 300 000 records, and Company_financial_figures has 600 000 records. Table Company_financial_figures also has a flag that is used in the LEFT JOIN.
The idea of the query is to select the actual balance for every company (a company may have no balance data, but it must still be selected, so I have to use LEFT JOIN). It seems to me that it should select about 300k records from the Company_financial_figures table rather than do a full table scan of 600k records. The performance of this query is very slow.
Query is something like this:
SELECT DISTINCT comp.id, comp.name, comp.surname, cff.balance FROM companies comp LEFT JOIN company_financial_figures cff ON (cff.company_id = comp.id AND cff.actual = 1)
+----+-------------+-------------------+-------+---------------+----------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------------+-------+---------------+----------+---------+------+--------+-------------+
| 1 | SIMPLE | companies | index | NULL | comp_i_i | 2 | NULL | 346908 | Using index |
| 1 | SIMPLE | company_finan.. | ALL | NULL | NULL | NULL | NULL | 610364 | |
+----+-------------+-------------------+-------+---------------+----------+---------+------+--------+-------------+
I have index on company_id column, but it doesn't help.
Any suggestions?
Why do you need the DISTINCT keyword? DISTINCT always slows down your querying because every record has to be considered explicitly for DISTINCT comparison. I don't know how MySQL handles this in detail, but in Oracle, you can get horrible execution plans if one of your DISTINCT fields is nullable.
But you shouldn't need it: your records are probably distinct anyway, because you select comp.id (which I'm guessing is unique?). Wouldn't it be a lot faster if you removed DISTINCT?
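One way to check whether DISTINCT is really needed is sketched below, using only the table and column names from the question. It assumes comp.id is unique in companies.

```sql
-- Any row returned here means some company has more than one
-- "actual" balance, in which case dropping DISTINCT could change the result.
SELECT company_id, COUNT(*) AS actual_rows
FROM company_financial_figures
WHERE actual = 1
GROUP BY company_id
HAVING COUNT(*) > 1;

-- If the check returns nothing, the same query without DISTINCT:
SELECT comp.id, comp.name, comp.surname, cff.balance
FROM companies comp
LEFT JOIN company_financial_figures cff
  ON (cff.company_id = comp.id AND cff.actual = 1);
```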

Cascading WHERE clause inside a VIEW with a UNION

This isn't solved, but I found out why: MySQL View containing UNION does not optimize well...In other words SLOW!
Original post:
I'm working with a database for a game. There are two identical tables equipment and safety_dep_box. To check if a player has a piece of equipment I'd like to check both tables.
Instead of doing two queries, I want to take advantage of the UNION functionality in MySQL. I've recently learned that I can create a VIEW. Here's my view:
CREATE VIEW vAllEquip AS SELECT * FROM equipment UNION SELECT * FROM safety_dep_box;
The view was created just fine. However, when I run
SELECT * FROM vAllEquip WHERE owner=<id>
the query takes forever, while the independent SELECT queries are quick. I think I know why, but I don't know how to fix it.
Thanks!
P.S. with Additional Information:
The two tables are identical in structure, but split because they are multi-100-million row tables.
The structure includes a primary key on int id and a non-unique index on int owner.
What I don't understand is the speed difference between the following:
SELECT COUNT(*) FROM (SELECT * FROM equipment WHERE owner=1 UNION ALL SELECT * FROM safety_dep_box WHERE owner=1) AS uES;
0.42 sec
SELECT COUNT(*) FROM (SELECT * FROM equipment WHERE owner=1 UNION SELECT * FROM safety_dep_box WHERE owner=1) AS uES;
0.37 sec
SELECT COUNT(*) FROM vAllEquip WHERE owner=1;
aborted after 60 seconds
Version: 5.1.51
mysql> explain SELECT * FROM equipment UNION SELECT * FROM safety_dep_box;
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
| 1 | PRIMARY | equipment | ALL | NULL | NULL | NULL | NULL | 1499148 | |
| 2 | UNION | safety_dep_box | ALL | NULL | NULL | NULL | NULL | 867321 | |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+----------------+------+---------------+------+---------+------+---------+-------+
with a WHERE clause
mysql> explain SELECT * FROM equipment WHERE owner=1 UNION ALL SELECT * FROM safety_dep_box WHERE owner=1
-> ;
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
| 1 | PRIMARY | equipment | ref | owner,owner_2,owner_3 | owner | 4 | const | 1 | |
| 2 | UNION | safety_dep_box | ref | owner,owner_3 | owner | 4 | const | 1 | |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+----------------+------+-----------------------+-------+---------+-------+------+-------+
First off, you should probably be using UNION ALL instead of plain UNION. With plain UNION, the engine will try to de-duplicate your result set. That is likely the source of your problem.
Secondly, you'll need indexes on owner in both tables, not just one. And, ideally, they'll be integer columns.
Thirdly, Randolph is right that you should not be using "*" in your SELECT statement. List out all the columns you want included. That is especially important in a UNION because the columns must match up exactly and, if there's a disagreement in the column order in your two tables you may be forcing some type conversion to go on that is costing you some time.
Finally, the phrase "There are two identical tables" is almost always a tip-off that your database is not optimally designed. These should probably be a single table. To indicate ownership of an item, your safety_dep_box table should contain only the ownerID and itemID of the item (to relate equipment and players), and possibly an additional autonumbered integer key column.
First off, don't use SELECT * in views ever. It's lazy code. Secondly, without knowing what the base tables look like, we're even less likely to be able to help you.
The reason it takes forever is because it has to build the full result and then filter it. You'll want indexes on your owner fields, whatever they may be.
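A sketch of the alternative is shown below: apply the filter inside each branch so that each table can use its owner index, and use UNION ALL with an explicit column list. Only id and owner are known from the question, so they stand in for the full column list here.

```sql
-- Filter each branch first, then combine; no de-duplication pass.
SELECT id, owner
FROM equipment
WHERE owner = 1
UNION ALL
SELECT id, owner
FROM safety_dep_box
WHERE owner = 1;
```

This matches the shape of the fast queries in the question. A view defined over a UNION is materialized into a temporary table before the outer WHERE is applied, so switching the view itself to UNION ALL would not by itself avoid reading both tables in full.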