Convert a MySQL NOT EXISTS into an INNER JOIN - mysql

I'm a little new to joins, so I'm not even sure if this is possible. I've been Googling and trying a few things..
What I need:
Select data.id for every row in data that has no matching user2data row with user2data.user_id = 'X'
Exciting right? :D
What works:
SELECT * FROM data WHERE NOT EXISTS (SELECT * FROM user2data WHERE user2data.user_id=1 AND user2data.data_id=data.id) LIMIT 100;
However, it's slow, even though all 3 columns are indexed. I tried an OUTER JOIN for this purpose from another SO answer, but it's EVEN SLOWER than the above. What I need is an INNER JOIN.
Please let me know if this is actually possible, or if there is an alternative that takes advantage of the indexes.
Thanks and best

You could use a LEFT JOIN and keep only the rows where no match was found:
SELECT *
FROM data
LEFT JOIN user2data ON ( user2data.user_id=1 AND user2data.data_id=data.id )
WHERE user2data.data_id IS NULL
LIMIT 100;
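To check that the LEFT JOIN ... IS NULL form really returns the same rows as the NOT EXISTS query, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it says nothing about MySQL timings. The label column, the sample rows, and the composite index name idx_user_data are assumptions made up for the illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE user2data (user_id INTEGER, data_id INTEGER);
    -- Hypothetical composite index covering both columns used in the lookup
    CREATE INDEX idx_user_data ON user2data (user_id, data_id);
    INSERT INTO data (id, label) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO user2data (user_id, data_id) VALUES (1, 1), (1, 3), (2, 2);
""")

# Original form from the question: anti-join written with NOT EXISTS.
not_exists_sql = """
    SELECT data.id FROM data
    WHERE NOT EXISTS (
        SELECT 1 FROM user2data
        WHERE user2data.user_id = 1 AND user2data.data_id = data.id
    )
    ORDER BY data.id LIMIT 100
"""

# Form from the answer: LEFT JOIN, then keep only the unmatched rows.
left_join_sql = """
    SELECT data.id FROM data
    LEFT JOIN user2data
        ON user2data.user_id = 1 AND user2data.data_id = data.id
    WHERE user2data.data_id IS NULL
    ORDER BY data.id LIMIT 100
"""

not_exists_ids = [row[0] for row in conn.execute(not_exists_sql)]
left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
print(not_exists_ids, left_join_ids)  # [2, 4] [2, 4]
```

User 1 is linked to data rows 1 and 3, so both queries return rows 2 and 4. Note that an INNER JOIN alone cannot express "no matching row", which is why the outer join (or NOT EXISTS) is needed here.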

Related

Is there any difference, performance wise, with these two queries? (Repeating the where clause inside the sub-query) MYSQL

I have a query that goes something like this.
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
As you can see, the outer query has a WHERE clause.
But on the inside, we also have an inner join against the subquery SELECT INNER_E.* FROM Equipment INNER_E. This inner join means we only retrieve the fault codes whose equipment exists in the Equipment table (correct me if I'm wrong).
I am trying to optimize this query.
My question is, does it make any difference to do this
Select *
FROM FaultCode FC
JOIN (
SELECT INNER_E.* FROM Equipment INNER_E
WHERE INNER_E.id_organization = 100057 AND INNER_E.equipment_status = 'ACTIVE'
) E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN'
So repeating the where clause inside the inner sub query, to further limit it before it joins. Or does the optimizer know to do this automatically?
I tried implementing that line in code, and it seemed to only make my query slower strangely enough. Is there any way I can optimize that query above, or since it's pretty simple, is that the best it's going to get without indexes?
I tried running the Explain Select statement, but I have a hard time parsing what it's telling me. Are there any good resources I can look into to learn some tips or techniques to optimize my query?
I don't have any aggregate functions in my Select fields. So is the only real answer Indexes?
Why is the first subquery needed? Perhaps simply
Select *
FROM FaultCode FC
JOIN Equipment AS E USING(EquipmentID)
LEFT JOIN AssetType AT ON AT.id_asset_type = E.id_asset_type
AND AT.id_language = 'en-us'
LEFT JOIN Project P ON E.current_id_project = P.id_project
WHERE E.id_organization = 100057
AND E.equipment_status = 'ACTIVE'
AND FC.code_status = 'OPEN';
Likely Indexes:
FC: INDEX(code_status, EquipmentID)
E: INDEX(id_organization, equipment_status, EquipmentID)
Probably unwise to do SELECT * -- It will give you all the columns of all 4 tables. (Without further details, I cannot suggest any "covering" indexes, which seems likely for AT.)
With my version of the query, your question about repeating the WHERE vanishes. With your version, it is likely to help. I don't think the Optimizer is smart enough to catch on to what you are doing.
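As a rough check that dropping the derived table does not change the result, the sketch below runs both shapes of the query on a tiny data set. It uses Python's sqlite3 module rather than MySQL, the FaultCodeID column and the sample rows are invented for the illustration, and the AssetType and Project joins are left out to keep it short. The index names are also assumptions.

```python
import sqlite3

# SQLite stands in for MySQL here; this checks results, not performance.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Equipment (
        EquipmentID INTEGER PRIMARY KEY,
        id_organization INTEGER,
        equipment_status TEXT
    );
    CREATE TABLE FaultCode (
        FaultCodeID INTEGER PRIMARY KEY,   -- hypothetical key column
        EquipmentID INTEGER,
        code_status TEXT
    );
    -- Composite indexes along the lines suggested above (names are made up)
    CREATE INDEX fc_status_equip ON FaultCode (code_status, EquipmentID);
    CREATE INDEX e_org_status
        ON Equipment (id_organization, equipment_status, EquipmentID);
    INSERT INTO Equipment VALUES
        (1, 100057, 'ACTIVE'), (2, 100057, 'RETIRED'), (3, 42, 'ACTIVE');
    INSERT INTO FaultCode VALUES
        (10, 1, 'OPEN'), (11, 1, 'CLOSED'), (12, 2, 'OPEN'), (13, 3, 'OPEN');
""")

# Shape from the question: join against a filtered derived table.
derived_sql = """
    SELECT FC.FaultCodeID
    FROM FaultCode FC
    JOIN (
        SELECT INNER_E.* FROM Equipment INNER_E
        WHERE INNER_E.id_organization = 100057
          AND INNER_E.equipment_status = 'ACTIVE'
    ) E USING (EquipmentID)
    WHERE E.id_organization = 100057 AND E.equipment_status = 'ACTIVE'
      AND FC.code_status = 'OPEN'
    ORDER BY FC.FaultCodeID
"""

# Shape from the answer: join the base table directly, filter once.
direct_sql = """
    SELECT FC.FaultCodeID
    FROM FaultCode FC
    JOIN Equipment AS E USING (EquipmentID)
    WHERE E.id_organization = 100057
      AND E.equipment_status = 'ACTIVE'
      AND FC.code_status = 'OPEN'
    ORDER BY FC.FaultCodeID
"""

derived_ids = [row[0] for row in conn.execute(derived_sql)]
direct_ids = [row[0] for row in conn.execute(direct_sql)]
print(derived_ids, direct_ids)  # [10] [10]
```

Only fault code 10 is OPEN and attached to ACTIVE equipment of organization 100057, and both query shapes agree on that.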
Show us the EXPLAINs. We can help some with what the cryptic stuff is saying. (And what it is not saying.)
"the best it's going to get without indexes" -- Are you saying you have no indexes??! Not even a PRIMARY KEY for each table? "So is the only real answer Indexes?" Every time you write a query against a non-tiny table, you should ask "do the table(s) have adequate indexes for this query?"

MySQL Join AND EXISTS in combination

Case
I got the following query:
SELECT * FROM `parking_parking`
JOIN `parking_address` ON `parking_parking`.`parking_address` = `parking_address`.`address_id`
WHERE `parking_id` = 3
This query selects information about a parking (address and data about the parking itself)
And in general.. IT WORKS!
Problem
There is a small problem though..
Whenever the address has been deleted from the database but the parking itself still exists, the entire query returns 0 rows, simply because it requires A and B to be linked: if one of them is not found, the other is not returned either.
Now there is a solution..
EXISTS
However I do not know how to use it.
I tried:
EXISTS JOIN
JOIN EXISTS
JOIN `parking_address` ON EXISTS
But to no avail.
I hope (and guess) I have overlooked a small thing.
Note
!! I do not use this in real life! !!
SELECT * FROM
I did this one when I was still at the veeeery basics and I found out the hard way that even the simplest pages took ages to load.
Solution by : GolezTrol
SELECT * FROM `parking_parking`
LEFT JOIN `parking_address` ON `parking_parking`.`parking_address` = `parking_address`.`address_id`
WHERE `parking_id` = 3
Change join (which is short for inner join) to left join (= left outer join). This returns the parking even when its address is missing, and just returns null for the address fields if there is no matching address:
SELECT * FROM `parking_parking`
LEFT JOIN `parking_address`
ON `parking_parking`.`parking_address` = `parking_address`.`address_id`
WHERE `parking_id` = 3
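Here is a small sketch of the difference, run through Python's sqlite3 module instead of MySQL. The street column and the sample rows are assumptions for the illustration; parking 3 deliberately points at an address that no longer exists.

```python
import sqlite3

# SQLite stands in for MySQL; table names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parking_address (address_id INTEGER PRIMARY KEY, street TEXT);
    CREATE TABLE parking_parking (
        parking_id INTEGER PRIMARY KEY,
        parking_address INTEGER
    );
    INSERT INTO parking_address VALUES (1, 'Main Street 1');
    -- Parking 3 references address 99, which has been deleted
    INSERT INTO parking_parking VALUES (2, 1), (3, 99);
""")

# Same query twice; only the join type changes.
query = """
    SELECT parking_parking.parking_id, parking_address.street
    FROM parking_parking
    {join} parking_address
        ON parking_parking.parking_address = parking_address.address_id
    WHERE parking_id = 3
"""

inner_rows = conn.execute(query.format(join="JOIN")).fetchall()
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # []           -> the inner join drops the parking
print(left_rows)   # [(3, None)]  -> the left join keeps it, address is NULL
```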

Join table very slow how can I fix it?

I have a problem with my query because it joins a lot of tables. I would like to find another way to write it that returns the same result. My code is below:
SELECT
tblReturn.CountryID,
tblReturn.ShopLocationId,
tblReturn.ReturnID,
tblVoucherDetail.VoucherDetailID,
tblVoucherDetail.VoucherNo,
tblVoucherDetail.BarcodeVoucher,
tblReturn.DateTimeStamp AS DateTimeReturn,
tblClient.ClientNoString,
tblSaleDetail.BarCode,
tblReturnType.Description,
tblReturn.Reason,
tblSale.DateTimeStamp AS DateTimeSale,tblReturn.Status
FROM
(
tblSale
INNER JOIN
(
(
(
tblReturn
INNER JOIN
(
tblVoucherDetail
INNER JOIN
tblVoucher ON tblVoucherDetail.VoucherID = tblVoucher.VoucherID
) ON tblReturn.ReturnID = tblVoucher.ReturnID
)
INNER JOIN
tblClient
ON
tblVoucherDetail.ClientNo = tblClient.ClientNo
)
INNER JOIN
tblSaleDetail
ON
tblReturn.SaleDetailIDOrigin = tblSaleDetail.SaleDetailID)
ON
(tblSale.SaleID = tblSaleDetail.SaleID)
AND
(tblSale.ShopLocationID = tblSaleDetail.ShopLocationID))
INNER JOIN
tblReturnType
ON
tblReturn.ReturnTypeID = tblReturnType.ReturnTypeID
WHERE
tblReturn.CountryID = 7
AND
tblReturn.ShopLocationID = 4
ORDER BY
tblVoucherDetail.VoucherDetailID DESC
How can I adjust it?
Use EXPLAIN and indexes: http://dev.mysql.com/doc/refman/5.0/en/explain.html. Using indexes will enable the database to optimize the search.
While optimizing your query is good (indexes are your friend), there is another approach you might consider with this many joins. "Denormalize" your database and create a table that has exactly the data you want. This involves updating your data both in the original table and in the new "report" table you create. There are obvious disadvantages to this -- (1) more space used, (2) you have to either write code to keep your data in sync, or you have to rebuild your "report" table on a periodic basis, (3) subtle bugs can happen if your data does get out of sync.
However, sometimes denormalizing is the simplest and best solution to nasty join performance problems.
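A minimal sketch of the "report table" idea, assuming the periodic-rebuild variant. It runs on Python's sqlite3 module rather than MySQL, uses only two of the tables from the question with invented sample rows, and the names tblReturnReport, idx_report_country_shop and rebuild_report are hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; only two of the joined tables are modeled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblReturnType (ReturnTypeID INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE tblReturn (
        ReturnID INTEGER PRIMARY KEY,
        CountryID INTEGER,
        ShopLocationID INTEGER,
        ReturnTypeID INTEGER,
        Reason TEXT
    );
    INSERT INTO tblReturnType VALUES (1, 'Refund'), (2, 'Exchange');
    INSERT INTO tblReturn VALUES
        (100, 7, 4, 1, 'damaged'),
        (101, 7, 4, 2, 'wrong size'),
        (102, 7, 5, 1, 'late'),
        (103, 8, 4, 1, 'other');
""")


def rebuild_report(connection):
    """Rebuild the denormalized report table from scratch (hypothetical helper)."""
    connection.executescript("""
        DROP TABLE IF EXISTS tblReturnReport;
        -- Pay the join cost once, at rebuild time
        CREATE TABLE tblReturnReport AS
            SELECT r.ReturnID, r.CountryID, r.ShopLocationID,
                   t.Description, r.Reason
            FROM tblReturn r
            INNER JOIN tblReturnType t ON r.ReturnTypeID = t.ReturnTypeID;
        CREATE INDEX idx_report_country_shop
            ON tblReturnReport (CountryID, ShopLocationID);
    """)


rebuild_report(conn)

# The reporting query now reads one table and needs no joins.
report_rows = conn.execute("""
    SELECT ReturnID, Description FROM tblReturnReport
    WHERE CountryID = 7 AND ShopLocationID = 4
    ORDER BY ReturnID DESC
""").fetchall()
print(report_rows)  # [(101, 'Exchange'), (100, 'Refund')]
```

The trade-off is the one listed above: the report is only as fresh as the last call to rebuild_report, so something has to schedule it or keep the table in sync.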

Need help speeding up a MySQL query

I need a query that quickly shows the articles within a particular module (a subset of articles) that a user has NOT uploaded a PDF for. The query I am using below takes about 37 seconds, given there are 300,000 articles in the Article table, and 6,000 articles in the Module.
SELECT *
FROM article a
INNER JOIN article_module_map amm ON amm.article=a.id
WHERE amm.module = 2 AND
a.id NOT IN (
SELECT afm.article
FROM article_file_map afm
INNER JOIN article_module_map amm ON amm.article = afm.article
WHERE afm.organization = 4 AND
amm.module = 2
)
What I am doing in the above query is first truncating the list of articles to the selected module, and then further truncating that list to the articles that are not in the subquery. The subquery is generating a list of the articles that an organization has already uploaded PDF's for. Hence, the end result is a list of articles that an organization has not yet uploaded PDF's for.
Help would be hugely appreciated, thanks in advance!
EDIT 2012/10/25
With @fthiella's help, the below query ran in an astonishing 1.02 seconds, down from 37+ seconds!
SELECT a.* FROM (
SELECT article.* FROM article
INNER JOIN article_module_map
ON article.id = article_module_map.article
WHERE article_module_map.module = 2
) AS a
LEFT JOIN article_file_map
ON a.id = article_file_map.article
AND article_file_map.organization=4
WHERE article_file_map.id IS NULL
I am not sure I understand the logic and the structure of the tables correctly, but this is my query:
SELECT
article.id
FROM
article
INNER JOIN
article_module_map
ON article.id = article_module_map.article
AND article_module_map.module=2
LEFT JOIN
article_file_map
ON article.id = article_file_map.article
AND article_file_map.organization=4
WHERE
article_file_map.id IS NULL
I extract all of the articles that have a module 2. I then select those that organization 4 didn't provide a file.
I used a LEFT JOIN instead of a subquery. In some circumstances this could be faster.
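To see that the LEFT JOIN version selects the same articles as the NOT IN version, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows are invented, the second alias amm2 is added only to keep the subquery's alias distinct, and nothing here says which form is faster on a real MySQL server.

```python
import sqlite3

# SQLite stands in for MySQL; this compares results, not speed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE article_module_map (id INTEGER PRIMARY KEY, article INTEGER, module INTEGER);
    CREATE TABLE article_file_map (id INTEGER PRIMARY KEY, article INTEGER, organization INTEGER);
    INSERT INTO article VALUES (1), (2), (3), (4), (5);
    -- Articles 1, 2, 3 belong to module 2; article 4 belongs to module 1
    INSERT INTO article_module_map VALUES (1, 1, 2), (2, 2, 2), (3, 3, 2), (4, 4, 1);
    -- Organization 4 uploaded files for articles 1 and 4; organization 5 for article 2
    INSERT INTO article_file_map VALUES (1, 1, 4), (2, 2, 5), (3, 4, 4);
""")

# Form from the question: NOT IN with a subquery.
not_in_sql = """
    SELECT a.id
    FROM article a
    INNER JOIN article_module_map amm ON amm.article = a.id
    WHERE amm.module = 2 AND
          a.id NOT IN (
              SELECT afm.article
              FROM article_file_map afm
              INNER JOIN article_module_map amm2 ON amm2.article = afm.article
              WHERE afm.organization = 4 AND amm2.module = 2
          )
    ORDER BY a.id
"""

# Form from the answer: LEFT JOIN, keep rows with no matching file.
left_join_sql = """
    SELECT article.id
    FROM article
    INNER JOIN article_module_map
        ON article.id = article_module_map.article
        AND article_module_map.module = 2
    LEFT JOIN article_file_map
        ON article.id = article_file_map.article
        AND article_file_map.organization = 4
    WHERE article_file_map.id IS NULL
    ORDER BY article.id
"""

not_in_ids = [row[0] for row in conn.execute(not_in_sql)]
left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
print(not_in_ids, left_join_ids)  # [2, 3] [2, 3]
```

Article 2 stays in the result because its only file belongs to organization 5; the organization filter sits in the ON clause, so that row simply does not match and the file columns come back NULL.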
EDIT Thank you for your comment. I wasn't sure it would run faster, but it surprises me that it is so much slower! Anyway, it was worth a try!
Now, out of curiosity, I would like to try all the combinations of LEFT/INNER JOIN and subquery, to see which one runs faster, eg:
SELECT *
FROM
(SELECT *
FROM
article INNER JOIN article_module_map
ON article.id = article_module_map.article
WHERE
article_module_map.module=2)
LEFT JOIN
etc.
Maybe also remove the *, and I would like to see what changes between putting the condition in the WHERE clause and in the ON clause. Anyway, I don't think that will help much; you should concentrate on indexes now.
Indexes on keys/foreign key should be okay already, but what if you add an index on article_module_map.module and/or article_file_map.organization ?
When optimizing queries I usually check the following points:
First: I would avoid using * in the SELECT clause; instead, name the specific fields you want. This can speed things up considerably (I had one query that took 7 seconds with *, and naming the fields brought it down to 0.1s).
Second: As @Adder says, add indexes to your tables.
Third: Try using INNER JOIN instead of WHERE amm.module = 2 AND a.id NOT IN ( ... ). I think I read (I don't remember it well, so take it with caution) that MySQL usually optimizes INNER JOINs well, and as your subquery is a filter, maybe using three INNER JOINs plus a WHERE would be faster.

Do you have to join tables "ON" fields or can you just equate them in there where clause?

We have been doing queries a bunch of different ways and queries have been working when we do a
SELECT t.thing FROM table1 t JOIN table2 s WHERE t.something = s.somethingelse AND t.something = 1
and it worked with all queries except one. This one query hung forever and crashed our server, but it apparently works if we do it like:
SELECT t.thing FROM table1 t JOIN table2 s ON t.something = s.somethingelse WHERE t.something = 1
We are trying to figure out if the problem is due to the query structure or due to some corruption in the account we are trying to query.
Is the first syntax correct? Thanks.
You need to use the ON clause. Though you can also join with commas, e.g.: SELECT * FROM table1, table2;
Hope that helps!
There are different ANSI formats.. you can use
Select ...
from tbl1 join tbl2 on tbl1.fld = tbl2.fld
OR
select ...
from tbl1, tbl2
where tbl1.fld = tbl2.fld...
The explicit join is the more common format where you are explicitly showing developers after yourself how the tables are related without respect to filtering criteria.
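For what it's worth, here is a quick sketch showing the different spellings returning the same rows. It runs on Python's sqlite3 module, not MySQL, so it only illustrates the result set and not how either server plans the query; the sample rows are made up. SQLite, like MySQL, accepts a JOIN with no ON clause and treats it as a cross join filtered by the WHERE clause.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (thing TEXT, something INTEGER);
    CREATE TABLE table2 (somethingelse INTEGER);
    INSERT INTO table1 VALUES ('a', 1), ('b', 2), ('c', 1);
    INSERT INTO table2 VALUES (1), (3);
""")

queries = {
    # Explicit join: the relationship lives in ON, the filter in WHERE.
    "join_on": """
        SELECT t.thing FROM table1 t
        JOIN table2 s ON t.something = s.somethingelse
        WHERE t.something = 1 ORDER BY t.thing
    """,
    # JOIN without ON: a cross join narrowed down by WHERE.
    "join_where": """
        SELECT t.thing FROM table1 t
        JOIN table2 s
        WHERE t.something = s.somethingelse AND t.something = 1
        ORDER BY t.thing
    """,
    # Comma join: the older spelling of the same thing.
    "comma_where": """
        SELECT t.thing FROM table1 t, table2 s
        WHERE t.something = s.somethingelse AND t.something = 1
        ORDER BY t.thing
    """,
}

results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
print(results["join_on"])  # ['a', 'c']
```

All three entries in results hold the same list, so the choice between them is about readability and about making it hard to forget the join condition by accident.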
Your first syntax is missing the ON clause. Standard SQL requires an inner join to say ON which fields the join is happening; MySQL accepts a JOIN without ON, but then it behaves as a cross join that is only narrowed down by the WHERE clause.
I would recommend using JOIN ON over WHERE to do your joins.
1) your where clause will be easier to read since it will not be polluted by join conditions.
2) your join section is easier to read and understand.
We all agree both methods work, but the JOIN one is better due to these points.
my 2 cents