I have a MySQL query with several inner joins and one left join, running against a lot of data, and it's quite slow. This is roughly my query:
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN
second_table ON (main_table.id = second_table.ref_id AND second_table.type = 'foo' AND second_table.bar IS NULL)
WHERE
second_table.id IS NULL
;
An entry from main_table may have one or more referenced entries in second_table. I want to get all results from main_table that either have no entries in second_table, or only have irrelevant data in the second table (type 'foo' or bar is NULL).
Taking a look at the EXPLAIN, MySQL checks bar IS NULL first, followed by type = 'foo', which would still leave many thousands of results, whereas checking ref_id first would leave only very few rows to check the other conditions on.
I only have an index on ref_id, not on type or bar, and I don't feel the need to index them if I could just get the query to check ref_id first.
EDIT: I noticed that the copy of the database (the one with the actual data, which runs slow) also has individual indexes on type and bar, so that's probably why MySQL prefers bar over the other keys. I'm considering a key spanning multiple fields.
Does anybody have an idea how to optimize this kind of query? Is it possible to force MySQL to use a certain order for the ON conditions?
"Solution": I added an index spanning all the relevant fields.
I don't consider this a real solution, because I believe it would also have been faster if the JOIN had been done on the indexed ref_id first. It probably did so when that was the only index, but my colleague had added separate indexes on the other fields as well, probably because they are needed somewhere else in our application.
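For reference, here is roughly what that looks like, as a sketch against the simplified query above (the index names idx_ref_type_bar and idx_ref_id are made up, and the other inner joins are omitted):
-- Composite index covering every column used in the ON clause:
ALTER TABLE second_table ADD INDEX idx_ref_type_bar (ref_id, type, bar);
-- Alternatively, an index hint can steer MySQL back to the existing ref_id index
-- (idx_ref_id is an assumed name for that single-column index):
SELECT main_table.*
FROM main_table
LEFT JOIN second_table FORCE INDEX FOR JOIN (idx_ref_id)
    ON main_table.id = second_table.ref_id
    AND second_table.type = 'foo'
    AND second_table.bar IS NULL
WHERE second_table.id IS NULL;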
What happens if you move the "irrelevant" conditions to the WHERE part?
It seems to me the DB should have an easier time joining the tables, and will use the index.
Something like:
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN
second_table ON main_table.id = second_table.ref_id
WHERE
second_table.id IS NULL OR
(second_table.type = 'foo' AND second_table.bar IS NULL)
In MySQL a JOIN is faster than a LEFT JOIN, so you can write your query like this:
SELECT
main_table.*
FROM
main_table
INNER JOIN
...
LEFT JOIN (SELECT main_table.id AS main_id
    FROM main_table
    JOIN second_table ON main_table.id = second_table.ref_id
        AND second_table.type = 'foo'
        AND second_table.bar IS NULL) AS main_table2
    ON main_table2.main_id = main_table.id
WHERE
main_table2.main_id IS NULL;
Query (reformatted for readability):
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
LEFT JOIN sale_data
ON sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
LEFT JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE invoice_date BETWEEN "2017-04-01" and "2017-11-01"
AND item_master.id_category = 1
GROUP BY distributor_list.address_coordinates
Description:
This query takes 7.5 seconds to run. My application contains 3-4 such queries, so the loading time approaches 1 minute on the server.
My sale_data table contains 450K records, distributor_list contains 970 records, item_master contains 7,774 records, and sale_data_temp contains 324 records.
I am using indexes, but they are not being used for the sale_data table.
All 400K records are scanned, as is evident from the EXPLAIN output.
If I reduce the range of the BETWEEN clause, the sale_data table uses the date index; otherwise it scans all 400K rows.
There are 84,000 rows between 2017-04-01 and 2017-11-01, but it still scans 400K rows.
MySQL EXPLAIN:
I have modified the query twice, with no success.
Modification 1:
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
LEFT JOIN sale_data
ON sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
AND invoice_date BETWEEN "2017-04-01" and "2017-11-01"
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
LEFT JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE item_master.id_category = 1
GROUP BY distributor_list.address_coordinates
Modification 2:
SELECT SQL_NO_CACHE SUM(sd.total_sale) AS totalsale,
sale_data_temp.customer_type_cy AS customer_type,
distributor_list.customer_status
FROM distributor_list
LEFT JOIN (SELECT * FROM sale_data
WHERE invoice_date BETWEEN "2017-04-01" AND "2017-11-01") sd
ON sd.depo_code = distributor_list.depo_code
AND sd.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
LEFT JOIN item_master
ON sd.item_code = item_master.item_code
WHERE item_master.id_category = 1
GROUP BY distributor_list.address_coordinates
Here are my indexes on the sale_data table:
Look at the key column of the EXPLAIN results: no key is being used at the moment, so MySQL is not using any of your indexes to filter out rows and is scanning the whole table on every query. This is why it is taking so long.
I have taken a look at your first query in relation to your sale_data indices. It looks like you will need to create a new composite index on this table that contains the following columns only:
depo_code, customer_code, item_code, invoice_date, total_sale
I recommend that you name this index test1 and experiment with the ordering of the columns, testing with EXPLAIN EXTENDED after each change until a key is selected - you want to see the index test1 in the key column.
See this answer, which has helped me with this before; it will help you understand the importance of correctly ordering your composite indices.
Looking at the cardinality of the single field indices, here is my best attempt at giving you the correct index to apply:
ALTER TABLE `sale_data` ADD INDEX `test1` (`item_code`, `customer_code`, `invoice_date`, `depo_code`, `total_sale`);
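For example, after creating the index you can re-run the query from the question under EXPLAIN EXTENDED (on MySQL 5.7+ plain EXPLAIN already includes the extended information) and check what the optimizer picked:
EXPLAIN EXTENDED
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
LEFT JOIN sale_data
ON sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
LEFT JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE invoice_date BETWEEN "2017-04-01" and "2017-11-01"
AND item_master.id_category = 1
GROUP BY distributor_list.address_coordinates;
-- In the output row for sale_data, the key column should now read test1.
-- If it does not, drop the index, reorder its columns, and test again:
-- ALTER TABLE sale_data DROP INDEX test1;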
Good luck with your mission!
A few things to notice about your query.
You are misusing the notorious MySQL extension to GROUP BY. Read this, then mention the same columns in your GROUP BY clause as you mention in your SELECT clause.
Your LEFT JOIN sale_data and LEFT JOIN item_master operations are actually ordinary JOIN operations. Why? You mention columns from those tables in your WHERE clause.
Your best bet for speedup is doing a date-range scan on an index on sale_data.invoice_date. For some reason known only to the MySQL query planner's feverish machinations, you're not getting it.
Try refactoring your query. Here's one suggestion:
SELECT SUM(sale_data.total_sale) as totalsale,
sale_data_temp.customer_type_cy as customer_type,
distributor_list.customer_status
FROM distributor_list
JOIN sale_data
ON sale_data.invoice_date BETWEEN "2017-04-01" and "2017-11-01"
and sale_data.depo_code = distributor_list.depo_code
and sale_data.customer_code = distributor_list.customer_code
LEFT JOIN sale_data_temp
ON distributor_list.address_coordinates = sale_data_temp.address_coordinates
JOIN item_master
ON sale_data.item_code = item_master.item_code
WHERE item_master.id_category = 1
GROUP BY sale_data_temp.customer_type_cy, distributor_list.customer_status
Try creating a covering index on sale_data for this query. You'll have to mess around a bit to get this right, but this is a starting point. (invoice_date, item_code, depo_code, customer_code, total_sale). The point of a covering index is to allow the query to be satisfied entirely from the index without having to refer back to the table's data. That's why I included total_sale in the index.
Please note that the index I suggested makes your index on invoice_date redundant. You can drop that index.
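A sketch of both steps (the index names here are assumptions; use whatever your existing invoice_date index is actually called):
-- Covering index so the date-range scan can satisfy the query from the index alone:
ALTER TABLE sale_data
ADD INDEX idx_sale_data_covering (invoice_date, item_code, depo_code, customer_code, total_sale);
-- The single-column invoice_date index is now redundant (assumed name idx_invoice_date):
ALTER TABLE sale_data DROP INDEX idx_invoice_date;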
I recently migrated a server and ran into this "error". I had MySQL as the DB, and what I wanted (I'm not an expert in SQL) was to join two tables related 1:N. As an example:
Table 1: Badges_Person
Table 2: Badges
Badges is the table of badges, and Badges_Person contains a relation like (id_badge, id_person). Easy, right?
Well, this SQL query always seemed to work fine:
SELECT id, nombre, descripcion, insignias.time, obtained
FROM insignias LEFT OUTER JOIN
(SELECT *, '1' as obtained
FROM insignias_user
WHERE insignias_user.username = 'Octal'
) as insignias_user_seleccionado
ON insignias.id = insignias_user_seleccionado.id_insignia;
The output of this query was the list of badges with an 'obtained' column (0 or 1) saying whether the user 'Octal' has that badge or not.
So... now I have MariaDB as the DB, and it returns a different output, where all the rows are marked with 'obtained' = 1.
I came here because, as far as I have tried, I have ruled out all the silly possible errors.
I cannot speak to why the query is not working. That would seem to be a data issue -- all the rows match.
But, there is a better way to write the query:
SELECT i.id, i.nombre, i.descripcion, i.time, ius.obtained
FROM insignias i LEFT OUTER JOIN
insignias_user ius
ON i.id = ius.id_insignia AND ius.username = 'Octal';
This is much more efficient because the intermediate table does not need to be materialized and the database can make use of appropriate indexes on insignias_user.
Also note: I changed the column references to qualified column names. The table alias may not be correct.
Since obtained is not a real column in insignias_user, derive it instead:
SELECT i.id, i.nombre, i.descripcion, i.time, IF(ius.id_insignia IS NULL, 0, 1) AS obtained
FROM insignias i LEFT OUTER JOIN insignias_user ius
ON i.id = ius.id_insignia AND ius.username = 'Octal';
Ok, it works again, thank you.
This query displays the correct result, but when doing an EXPLAIN, it is listed as a "DEPENDENT SUBQUERY", which I'm led to believe is bad?
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
WHERE CompetitionID NOT
IN (
SELECT CompetitionID
FROM PicksPoints
WHERE UserID =1
)
I tried changing the query to this:
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
LEFT JOIN PicksPoints ON Competition.CompetitionID = PicksPoints.CompetitionID
WHERE UserID =1
and PicksPoints.PicksPointsID is null
but it displays 0 rows. What is wrong with the above compared to the first query that actually does work?
The second query cannot produce rows; it states:
WHERE UserID =1
and PicksPoints.PicksPointsID is null
To clarify, I rewrite it as follows:
WHERE PicksPoints.UserID =1
and PicksPoints.PicksPointsID is null
So, on one hand, you are asking for rows in PicksPoints where UserID = 1, but at the same time you expect the row not to exist in the first place. Can you see the contradiction?
Outer joins are tricky that way! Usually you filter using columns from the "outer" table, in this case Competition. But that is not what you want here; you want to filter on the left-joined table. Try rewriting it as follows:
SELECT Competition.CompetitionID, Competition.CompetitionName, Competition.CompetitionStartDate
FROM Competition
LEFT JOIN PicksPoints ON (Competition.CompetitionID = PicksPoints.CompetitionID AND UserID = 1)
WHERE
PicksPoints.PicksPointsID is null
For more on this, read this nice post.
But, as an additional note, performance-wise you're in some trouble with either the subquery or the LEFT JOIN.
With the subquery you're in trouble because, up to 5.6 (where some good work has been done), MySQL is very bad at optimizing inner queries, and your subquery is expected to execute multiple times.
With the LEFT JOIN you're in trouble because a LEFT JOIN dictates the order of the join from left to right, yet your filter is on the right table, which means you will not be able to use an index for the UserID = 1 condition (or you could, but you would lose the index for the join).
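If you go with the LEFT JOIN form anyway, one way to soften that problem (assuming PicksPoints does not already have such an index; the name is made up) is a composite index that serves both the join column and the filter column:
-- Lets the join look up by CompetitionID and check UserID = 1 from the same index entry:
ALTER TABLE PicksPoints ADD INDEX idx_competition_user (CompetitionID, UserID);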
These are two different queries. The first query looks for competitions that are not associated with user id 1 (via the PicksPoints table), while the second looks for rows that are associated with user id 1 and in addition have a NULL PicksPointsID.
The second query is coming out empty because you are joining against a table called PicksPoints and you are looking for rows in the join result that have a NULL PicksPointsID. This can only happen if:
1. the second table has a row with a NULL PicksPointsID and a competition id that matches a competition id in the first table, or
2. all the columns in the second table's contribution to the join are NULL, because there is a competition id in the first table that does not appear in the second.
Since PicksPointsID really sounds like a primary key, it's case 2 that is showing up. So all the columns from PicksPoints are NULL, your WHERE clause (UserID = 1 and PicksPoints.PicksPointsID is null) will always be false, and your result will be empty.
A plain left join should work for you
select c.CompetitionID, c.CompetitionName, c.CompetitionStartDate
from Competition c
left join PicksPoints p
on (c.CompetitionID = p.CompetitionID)
where p.UserID <> 1
Replacing the final where with an and (making a complex join clause) might also work. I'll leave it to you to analyze the plans for each query. :)
I'm not personally convinced of the need for the is null test. The article linked to by Shlomi Noach is excellent and you may find some tips in there to help you with this.
I need to gather posts from two MySQL tables that have different columns and apply a WHERE clause to each table. I appreciate the help, thanks in advance.
This is what I have tried...
SELECT
blabbing.id,
blabbing.mem_id,
blabbing.the_blab,
blabbing.blab_date,
blabbing.blab_type,
blabbing.device,
blabbing.fromid,
team_blabbing.team_id
FROM
blabbing
LEFT OUTER JOIN
team_blabbing
ON team_blabbing.id = blabbing.id
WHERE
team_id IN ($team_array) ||
mem_id='$id' ||
fromid='$logOptions_id'
ORDER BY
blab_date DESC
LIMIT 20
I know that this is messy, but I'll admit I am no MySQL veteran. I'm a beginner at best... Any suggestions?
You could put the where-clauses in subqueries:
select
*
from
(select * from ... where ...) as alias1 -- this is a subquery
left outer join
(select * from ... where ...) as alias2 -- this is also a subquery
on
....
order by
....
Note that you can't use subqueries like this in a view definition.
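Applied to your tables, the subquery version might look something like the sketch below. Note that splitting the OR this way filters each table before the join, so a blabbing row that only matches via team_id would be dropped; double-check that this is actually what you want.
SELECT
b.id,
b.mem_id,
b.the_blab,
b.blab_date,
b.blab_type,
b.device,
b.fromid,
tb.team_id
FROM
(SELECT * FROM blabbing
WHERE mem_id = '$id' OR fromid = '$logOptions_id') AS b
LEFT OUTER JOIN
(SELECT * FROM team_blabbing
WHERE team_id IN ($team_array)) AS tb
ON tb.id = b.id
ORDER BY
b.blab_date DESC
LIMIT 20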
You could also combine the where-clauses, as in your example. Use table aliases to distinguish between columns of different tables (it's a good idea to use aliases even when you don't have to, just because it makes things easier to read). Example:
select
*
from
<table> as alias1
left outer join
<othertable> as alias2
on
....
where
alias1.id = ... and alias2.id = ... -- aliases distinguish between ids!!
order by
....
Two suggestions for you, since you're a relative newbie in SQL. Use "aliases" for your tables to help reduce SuperLongTableNameReferencesForColumns, and always qualify the column names in a query. It makes your life easier, and helps anyone AFTER you to know which columns come from which table, especially when the same column name exists in different tables; it prevents ambiguity in the query. Your left join, I think, from the sample, may be ambiguous - can you confirm the join of B.ID to TB.ID? Typically a "Team_ID" would appear once in a teams table, and each blabbing entry would have the "Team_ID" that the posting was from, in addition to its OWN "ID" as the blabbing table's unique key.
SELECT
B.id,
B.mem_id,
B.the_blab,
B.blab_date,
B.blab_type,
B.device,
B.fromid,
TB.team_id
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
WHERE
TB.Team_ID IN ( you can't do a direct $team_array here )
OR B.mem_id = SomeParameter
OR b.FromID = AnotherParameter
ORDER BY
B.blab_date DESC
LIMIT 20
Where you were trying the $team_array, you would have to build out the full list explicitly, such as:
TB.Team_ID IN ( 1, 4, 18, 23, 58 )
Also, note that it's SQL "OR", not the logical "||" operator.
EDIT -- per your comment
This could be done in a variety of ways, such as building and executing dynamic SQL, calling the query multiple times (once for each ID) and merging the results, or, additionally, joining to yet another temp table that gets cleaned out, say, daily.
If you have another table such as "TeamJoins" with, say, 3 columns (a date, a session id and a team_id), you could purge anything more than a day old each day, and/or clear the previous entries each time the same session ID (as it appears coming from PHP) runs a new query. Give it two indexes: one on the date (to simplify the daily purging), and one on (sessionID, team_id) for the join.
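A rough sketch of such a table (every name and type here is hypothetical):
CREATE TABLE TeamJoins (
created_dt DATE NOT NULL, -- used only for the daily purge
session_id VARCHAR(64) NOT NULL, -- the PHP session the list belongs to
team_id INT NOT NULL,
KEY idx_created_dt (created_dt), -- simplifies purging anything a day old
KEY idx_session_team (session_id, team_id) -- used by the join
);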
Then loop through and insert the identified team IDs into the "TeamJoins" table.
THEN, instead of a hard-coded list IN, you could change that part to
...
FROM
blabbing B
LEFT JOIN team_blabbing TB
ON B.ID = TB.ID
LEFT JOIN TeamJoins TJ
on TB.Team_ID = TJ.Team_ID
WHERE
TJ.Team_ID IS NOT NULL
OR B.mem_id ... rest of query
What I ended up doing:
I added an extra column to my blabbing table called team_id and set it to NULL, as well as another field in my team_blabbing table called mem_id.
Then I changed the insert script to also insert a value for mem_id in team_blabbing.
After doing this I did a simple UNION ALL in the query:
SELECT
*
FROM
blabbing
WHERE
mem_id='$id' OR
fromid='$logOptions_id'
UNION ALL
SELECT
*
FROM
team_blabbing
WHERE
team_id
IN
($team_array)
ORDER BY
blab_date DESC
LIMIT 20
I am open to any thoughts on what I did. Try not to be too harsh, though :) Thanks again for all the info.
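One thing worth double-checking with that UNION ALL: both SELECTs must return the same number of columns in the same order, so explicit column lists are safer than SELECT * when the two tables don't have identical layouts. A sketch (the column lists are guesses based on the columns mentioned above):
SELECT id, mem_id, team_id, blab_date
FROM blabbing
WHERE mem_id = '$id' OR fromid = '$logOptions_id'
UNION ALL
SELECT id, mem_id, team_id, blab_date
FROM team_blabbing
WHERE team_id IN ($team_array)
ORDER BY blab_date DESC
LIMIT 20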
I get an out-of-memory exception in my application when the condition for IN or NOT IN is very large. I would like to know what the limitation is for that.
Perhaps you would be better off with another way to accomplish your query?
I suggest you load your match values into a single-column table, and then inner-join the column being queried to the single column in the new table.
Rather than
SELECT a, b, c FROM t1 WHERE d in (d1, d2, d3, d4, ...)
build a temp table with 1 column, call it "dval"
dval
----
d1
d2
d3
SELECT a, b, c FROM t1
INNER JOIN temptbl ON t1.d = temptbl.dval
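A minimal sketch of that approach (table and column names are placeholders):
-- Stage the match values in a temporary table with an index on the lookup column...
CREATE TEMPORARY TABLE temptbl (dval INT NOT NULL, PRIMARY KEY (dval));
INSERT INTO temptbl (dval) VALUES (1), (2), (3); -- d1, d2, d3, ...
-- ...then join against it instead of using a huge IN list:
SELECT t1.a, t1.b, t1.c
FROM t1
INNER JOIN temptbl ON t1.d = temptbl.dval;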
Having to ask about limits when writing a SQL query or designing a database is a good indicator that you're doing it wrong.
I only ever use IN and NOT IN when the condition is very small (under 100 rows or so); it performs well in those scenarios. I use an OUTER JOIN when the condition is large, since the query doesn't have to look up the IN condition for every tuple - you just check the table that you want all rows to come from.
For "IN" the join condition IS NOT NULL
For "NOT IN" the join condition IS NULL
e.g.
/* Get purchase orders that have never been rejected */
SELECT po.*
FROM PurchaseOrder po LEFT OUTER JOIN
(/* Get po's that have been rejected */
SELECT po.PurchaseOrderID
FROM PurchaseOrder po INNER JOIN
PurchaseOrderStatus pos ON po.PurchaseOrderID = pos.PurchaseOrderID
WHERE pos.Status = 'REJECTED'
) por ON po.PurchaseOrderID = por.PurchaseOrderID
WHERE por.PurchaseOrderID IS NULL /* We want NOT IN */
I'm having a similar issue, but only passing 100 three-digit ids in my IN clause. When I look at the stack trace, it actually cuts off the comma-separated values in the IN clause. I don't get an error; I just don't get all the results back. Has anyone had an issue like this before? If it's relevant, I'm using the Symfony framework... I'm checking to see if it's a Propel issue, but just wanted to see if it could be SQL.
I have used IN with quite large lists of IDs - I suspect that the memory problem is not in the query itself. How are you retrieving the results?
This query, for example is from a live site:
SELECT DISTINCT c.id, c.name FROM categories c
LEFT JOIN product_categories pc ON c.id = pc.category_id
LEFT JOIN products p ON p.id = pc.product_id
WHERE p.location_id IN (
955,891,901,877,736,918,900,836,846,914,771,773,833,
893,782,742,860,849,850,812,945,775,784,746,1036,863,
750,763,871,817,749,838,986,794,867,758,923,804,733,
949,808,837,741,747,954,939,865,857,787,820,783,760,
911,745,928,818,887,847,978,852
) ORDER BY c.name ASC
My first pass at the code is terribly naive and there are about 10 of these queries on a single page and the database doesn't blink.
You could, of course, be running a list of 100k values which would be a different story altogether.
I don't know what the limit is, but I've run into this problem before as well. I had to rewrite my query something like this:
select * from foo
where id in (select distinct foo_id from bar where ...)