MySQL optimizer join order rule

I read about MySQL join optimization and found that MySQL decides which table is joined first. I want to know exactly what criterion MySQL uses to make this decision.
I followed the MySQL example and created these tables:
create table a(
col int default null,
index a_index(col)
);
create table b(
col int default null,
index a_index(col)
);
create table c(
col int default null,
index a_index(col)
);
create table d(
col int default null,
index a_index(col)
);
Then I ran its example query:
explain SELECT *
FROM a JOIN b LEFT JOIN c ON (c.col=a.col)
LEFT JOIN d ON (d.col=a.col)
WHERE b.col=d.col;
but the join order EXPLAIN shows is the same as if I had written the query like this:
explain SELECT *
FROM b JOIN a LEFT JOIN c ON (c.col=a.col)
LEFT JOIN d ON (d.col=a.col)
WHERE b.col=d.col;
Both EXPLAIN outputs use the same indexes and show the same Extra values, and the table order is b, a, d, c.
Why does MySQL join d before c, and not c before d?

Inner joins are commutative
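This means the order of evaluation doesn't matter: A JOIN B gives the same result as B JOIN A, so the optimizer can swap the order in which the tables are joined if it thinks that gives better performance.
Outer joins are not commutative.
A LEFT JOIN B is not the same as B LEFT JOIN A; their results can differ.
In this case the optimizer cannot freely change the order of evaluation: it must always read A first, and only then can it evaluate B.
That is also why d comes before c in your EXPLAIN output. The condition WHERE b.col=d.col can never be true for a NULL-complemented row of d, so MySQL converts LEFT JOIN d into an inner join (this is called outer join simplification). After that, only c is still an outer join that has to follow a, and d is free to be placed ahead of it.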
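A minimal sketch of the commutativity point, using Python's built-in sqlite3 module as a stand-in for MySQL (join semantics are the same here; the table contents are made up). It shows that swapping the operands of an inner join returns the same pairs, swapping the operands of a LEFT JOIN does not, and a NULL-rejecting WHERE condition on the inner table makes the LEFT JOIN return exactly the inner-join rows.

```python
import sqlite3

# Made-up data: a.col = 2 has no partner in b, and b.col = 3 has no partner in a.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (col INTEGER);
    CREATE TABLE b (col INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (3);
""")


def rows(sql):
    """Run a query and return its rows in a fixed order.

    Sorting by repr avoids comparing None with int in Python 3.
    """
    return sorted(conn.execute(sql).fetchall(), key=repr)


# Inner join: swapping the operands yields the same (a.col, b.col) pairs.
inner_ab = rows("SELECT a.col, b.col FROM a JOIN b ON a.col = b.col")
inner_ba = rows("SELECT a.col, b.col FROM b JOIN a ON a.col = b.col")

# Outer join: the preserved table changes, so the results differ.
left_ab = rows("SELECT a.col, b.col FROM a LEFT JOIN b ON a.col = b.col")
left_ba = rows("SELECT a.col, b.col FROM b LEFT JOIN a ON a.col = b.col")

# A NULL-rejecting WHERE on b removes the NULL-extended rows,
# so this LEFT JOIN returns exactly what the inner join returns.
left_filtered = rows(
    "SELECT a.col, b.col FROM a LEFT JOIN b ON a.col = b.col WHERE b.col = a.col"
)

print(inner_ab == inner_ba)       # True
print(left_ab)                    # [(1, 1), (2, None)]
print(left_ba)                    # [(1, 1), (None, 3)]
print(left_filtered == inner_ab)  # True
```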

Related

fetch rows where left join subquery is null (not found)

How to fetch rows where a joined subquery is null?
SELECT *
FROM bank_recon b
LEFT JOIN (
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1
LIMIT 1
) s ON s.bank_recon_id=b.id
WHERE s IS NULL
Update:
When I run the subquery on its own, a row is returned or not depending on whether is_ocr_verified is set:
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1 && o.bank_recon_id=320062
But with this query, the row is always returned, no matter what!?
SELECT b.txt, b.amount
FROM bank_recon b
LEFT JOIN (
SELECT o.bank_recon_id
FROM data_voucher_ocr_bank o
LEFT JOIN data_voucher v ON v.id=o.data_voucher_id
WHERE v.is_ocr_verified=1
LIMIT 1
) s ON s.bank_recon_id=b.id
WHERE b.id=320062 && s.bank_recon_id IS NULL
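Specify a column of the derived table in your WHERE clause, not just the derived table's alias:
WHERE s.bank_recon_id IS NULL
Also remove LIMIT 1 from the derived table. With it, s contains only a single, arbitrary bank_recon_id, so almost every row of bank_recon finds no match and is returned; that is why your second query returns row 320062 whether or not it is verified.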
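A small sketch of the derived-table anti-join, again using Python's sqlite3 as a stand-in for MySQL. The tables mirror the question's names, but the rows are made up (ids 1 and 4 are verified). It runs the same query with and without LIMIT 1 inside the derived table: without it, only the unverified rows come back; with it, one of the verified rows also leaks into the result.

```python
import sqlite3

# Hypothetical miniature of the question's tables; bank_recon ids 1 and 4 are verified.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bank_recon (id INTEGER PRIMARY KEY, txt TEXT);
    CREATE TABLE data_voucher (id INTEGER PRIMARY KEY, is_ocr_verified INTEGER);
    CREATE TABLE data_voucher_ocr_bank (bank_recon_id INTEGER, data_voucher_id INTEGER);
    INSERT INTO bank_recon VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
    INSERT INTO data_voucher VALUES (10, 1), (20, 0), (40, 1);
    INSERT INTO data_voucher_ocr_bank VALUES (1, 10), (2, 20), (4, 40);
""")

# Anti-join: keep bank_recon rows that have no verified voucher.
ANTI_JOIN = """
    SELECT b.id
    FROM bank_recon b
    LEFT JOIN (
        SELECT o.bank_recon_id
        FROM data_voucher_ocr_bank o
        LEFT JOIN data_voucher v ON v.id = o.data_voucher_id
        WHERE v.is_ocr_verified = 1
        {limit}
    ) s ON s.bank_recon_id = b.id
    WHERE s.bank_recon_id IS NULL
    ORDER BY b.id
"""

without_limit = [r[0] for r in conn.execute(ANTI_JOIN.format(limit=""))]
# LIMIT 1 keeps only one (arbitrary) verified id in s, so the other
# verified row finds no match and is returned as well.
with_limit = [r[0] for r in conn.execute(ANTI_JOIN.format(limit="LIMIT 1"))]

print(without_limit)  # [2, 3]
print(with_limit)
```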
An anti-join (which is what you are trying to write here) is a technique we use when the straightforward NOT IN or NOT EXISTS has performance problems in a given DBMS.
Provided data_voucher_ocr_bank.bank_recon_id cannot be null, we can use:
SELECT txt, amount
FROM bank_recon
WHERE id NOT IN
(
SELECT bank_recon_id
FROM data_voucher_ocr_bank
WHERE data_voucher_id IN (SELECT id FROM data_voucher WHERE is_ocr_verified = 1)
);
(Otherwise we'd add AND bank_recon_id IS NOT NULL or use NOT EXISTS instead.)
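The reason for that caveat, sketched with Python's sqlite3 (standard NULL semantics, same as MySQL). The `matched` table is a hypothetical stand-in for the subquery's result. A single NULL in the NOT IN list makes `id NOT IN (...)` UNKNOWN for every unmatched id, so no row qualifies. Filtering out the NULLs or switching to NOT EXISTS fixes it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bank_recon (id INTEGER PRIMARY KEY);
    -- hypothetical stand-in for the subquery's bank_recon_id values
    CREATE TABLE matched (bank_recon_id INTEGER);
    INSERT INTO bank_recon VALUES (1), (2), (3);
    INSERT INTO matched VALUES (1), (NULL);
""")

# NOT IN: "2 NOT IN (1, NULL)" is UNKNOWN rather than TRUE, so no row qualifies.
not_in = conn.execute(
    "SELECT id FROM bank_recon WHERE id NOT IN "
    "(SELECT bank_recon_id FROM matched) ORDER BY id"
).fetchall()

# NOT IN with the NULLs filtered out behaves as intended.
not_in_filtered = conn.execute(
    "SELECT id FROM bank_recon WHERE id NOT IN "
    "(SELECT bank_recon_id FROM matched WHERE bank_recon_id IS NOT NULL) ORDER BY id"
).fetchall()

# NOT EXISTS is not affected by NULLs in the subquery.
not_exists = conn.execute(
    "SELECT id FROM bank_recon b WHERE NOT EXISTS "
    "(SELECT 1 FROM matched m WHERE m.bank_recon_id = b.id) ORDER BY id"
).fetchall()

print(not_in)           # []
print(not_in_filtered)  # [(2,), (3,)]
print(not_exists)       # [(2,), (3,)]
```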

MINUS operator in MySQL query

I am trying to perform a MINUS operation in MySQL. I have three tables:
one with service details
one table with states that a service is offered in
another table (based on zipcode and state) that shows where the service is not offered.
I can get the output of the two SELECT queries separately, but I need a single statement that gives the output of
'SELECT query_1 - SELECT query_2'.
Service_Details table: Service_Code (PK), Service Name
Servicing_States table: Service_Code (FK), State, Country; PK(Service_Code, State, Country)
Exception table: Service_Code (FK), Zipcode, State; PK(Service_Code, Zipcode, State)
MySQL does not recognise MINUS, and versions before 8.0.31 do not support INTERSECT (or EXCEPT) either; MINUS is Oracle's name for the standard EXCEPT operator. In MySQL you can use NOT IN in place of MINUS (there are other solutions too, but I like this one a lot).
Example:
select a.id
from table1 as a
where <condition>
AND a.id NOT IN (select b.id
from table2 as b
where <condition>);
MySQL does not support MINUS, and before 8.0.31 it did not support EXCEPT either. You can use NOT EXISTS, LEFT JOIN ... IS NULL, or NOT IN instead.
Here's my two cents: a complex query I just got working, originally written with MINUS and translated for MySQL.
With MINUS:
select distinct oi.`productOfferingId`,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid
minus
select
distinct r.name,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid
join t_m_rfs r
on r.name = oi.productOfferingId
join t_m_attr a
on a.attr2rfs = r.oid and f.name = a.name;
With NOT EXISTS
select distinct oi.`productOfferingId`,f.name
from t_m_prod_action_oitem_fld f
join t_m_prod_action_oitem oi
on f.fld2prod_action_oitem = oi.oid
where not exists (
select
r.name,f.name
from t_m_rfs r
join t_m_attr a
on a.attr2rfs = r.oid
where r.name = oi.productOfferingId and f.name = a.name
);
With EXCEPT, both queries have to return the same columns, but I think you can achieve what you are looking for with EXCEPT... except that EXCEPT only works in standard SQL! Here's how to do it in MySQL:
SELECT * FROM Servicing_States ss WHERE NOT EXISTS
( SELECT * FROM Exception e WHERE ss.Service_Code = e.Service_Code);
http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/
Standard SQL
SELECT * FROM Servicing_States
EXCEPT
SELECT * FROM Exception;
An anti-join pattern is the approach I typically use: an outer join that returns all rows from query_1 along with the matching rows from query_2, followed by a filter that removes the rows that had a match, leaving only the rows from query_1 that didn't. For example:
SELECT q1.*
FROM ( query_1 ) q1
LEFT
JOIN ( query_2 ) q2
ON q2.id = q1.id
WHERE q2.id IS NULL
To emulate the MINUS set operator, we'd need the join predicate to compare all columns returned by q1 and q2, including matching NULL values to each other (MySQL's null-safe <=> operator does this):
ON q1.col1 <=> q2.col1
AND q1.col2 <=> q2.col2
AND q1.col3 <=> q2.col3
AND ...
Also, to fully emulate the MINUS operation, we'd need to remove duplicate rows returned by q1; adding the DISTINCT keyword is sufficient for that.
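A minimal sketch of the full MINUS emulation, using Python's sqlite3 so the result can be checked against a native EXCEPT. Two assumptions to note: SQLite spells null-safe equality `IS` where MySQL uses `<=>`, and q1/q2 are small made-up tables. The join tests a constant `matched` column from q2 rather than a data column, because a data column could itself be NULL. The naive `=` version is included to show that it wrongly keeps the (2, NULL) row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE q1 (col1 INTEGER, col2 TEXT);
    CREATE TABLE q2 (col1 INTEGER, col2 TEXT);
    INSERT INTO q1 VALUES (1, 'x'), (1, 'x'), (2, NULL), (3, 'z');
    INSERT INTO q2 VALUES (2, NULL), (4, 'w');
""")


def minus(eq):
    """Anti-join q1 against q2, comparing every column with the operator `eq`."""
    return conn.execute(f"""
        SELECT DISTINCT q1.col1, q1.col2              -- DISTINCT removes q1 duplicates
        FROM q1
        LEFT JOIN (SELECT col1, col2, 1 AS matched FROM q2) m
          ON q1.col1 {eq} m.col1
         AND q1.col2 {eq} m.col2
        WHERE m.matched IS NULL                       -- no matching row in q2
        ORDER BY q1.col1
    """).fetchall()


null_safe = minus("IS")  # SQLite's null-safe equality (MySQL: <=>)
naive = minus("=")       # NULL = NULL is UNKNOWN, so (2, NULL) never matches

native = conn.execute(
    "SELECT col1, col2 FROM q1 EXCEPT SELECT col1, col2 FROM q2 ORDER BY col1"
).fetchall()

print(null_safe)            # [(1, 'x'), (3, 'z')]
print(null_safe == native)  # True
print(naive)                # [(1, 'x'), (2, None), (3, 'z')]
```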
If the tables are huge and similar, one option is to copy their primary keys into new tables and compare based only on the PK. If you know that, say, the first half is identical, add a WHERE clause to check only rows after a specific value or date.
create table _temp_old ( id int NOT NULL PRIMARY KEY );
create table _temp_new ( id int NOT NULL PRIMARY KEY );
### will take some time
insert into _temp_old ( id )
select id from _real_table_old;
### will take some time
insert into _temp_new ( id )
select id from _real_table_new;
### this version should be much faster
select id from _temp_old t_old where not exists ( select id from _temp_new t_new where t_old.id = t_new.id );
### this should be much slower
select id from _real_table_old rto where not exists ( select id from _real_table_new rtn where rto.id = rtn.id );

How to speed up left join queries by indexing?

Some MySQL queries in my application are slow at the moment, and I want to speed them up. Unfortunately I'm not quite sure what the correct way to do it is.
I have the following (fictitious) tables: Book, Page and Word.
Word is a child of Page via word_page_id
Page is a child of Book via page_book_id
I already have individual indexes on page_book_id, word_page_id, book_user_id and book_flag_delete.
SELECT `book`.*, COUNT(word_id) AS `word_amount` FROM `book`
LEFT JOIN `page` ON page_book_id = book_id
LEFT JOIN `word` ON word_page_id = paragraph_id
WHERE (book_user_id = 1) AND (book_flag_delete IS NULL)
GROUP BY `book_id`
ORDER BY `book_id` ASC LIMIT 100
SELECT COUNT(DISTINCT `book_id`) AS `book_row_count` FROM `book`
LEFT JOIN `page` ON page_book_id = book_id
LEFT JOIN `word` ON word_page_id = page_id
WHERE (book_user_id = 59) AND (book_flag_delete IS NULL)
Any ideas on how to speed up queries like these?
Do I need additional indexes?
Create indexes on the columns you use for joining.
Also make sure that the joined columns have the same data type, character set, and collation; otherwise the index may not be used.
mysql> EXPLAIN <query> shows you the index that is actually used (the key column in the output) and the candidate indexes (the possible_keys column).
For this query:
SELECT b.*, COUNT(w.word_id) AS `word_amount`
FROM `book` b LEFT JOIN
`page` p
ON p.page_book_id = b.book_id LEFT JOIN
`word` w
ON w.word_page_id = p.paragraph_id
WHERE (b.book_user_id = 1) AND (b.book_flag_delete IS NULL)
GROUP BY b.`book_id`
ORDER BY b.`book_id` ASC
LIMIT 100;
The best indexes are: book(book_user_id, book_flag_delete, book_id), page(page_book_id, paragraph_id), and word(word_page_id, word_id).
However, the overall GROUP BY might be expensive. You might try writing the query as:
SELECT b.*,
(SELECT COUNT(w.word_id)
FROM `page` p JOIN
`word` w
ON w.word_page_id = p.paragraph_id
WHERE p.page_book_id = b.book_id
) AS `word_amount`
FROM `book` b
WHERE (b.book_user_id = 1) AND (b.book_flag_delete IS NULL)
ORDER BY b.`book_id` ASC
LIMIT 100;
The same indexes work here. This query should avoid a GROUP BY over all the data at once; instead, it uses the indexes for each book's aggregation.
The optimal schema for a many-to-many mapping table is:
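A sketch that checks the rewrite returns the same counts as the original LEFT JOIN / GROUP BY query, using Python's sqlite3 as a stand-in for MySQL. The page key is assumed to be named page_id (the question's paragraph_id), the indexes follow the answer's suggestion, and the rows are made up: book 2 has a page but no words, book 3 is flagged deleted, and book 4 belongs to another user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (book_id INTEGER PRIMARY KEY, book_user_id INTEGER, book_flag_delete INTEGER);
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_book_id INTEGER);
    CREATE TABLE word (word_id INTEGER PRIMARY KEY, word_page_id INTEGER);
    CREATE INDEX book_user_idx ON book (book_user_id, book_flag_delete, book_id);
    CREATE INDEX page_book_idx ON page (page_book_id, page_id);
    CREATE INDEX word_page_idx ON word (word_page_id, word_id);
    INSERT INTO book VALUES (1, 1, NULL), (2, 1, NULL), (3, 1, 1), (4, 2, NULL);
    INSERT INTO page VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO word VALUES (100, 10), (101, 10), (102, 11);
""")

# Original shape: join everything, then GROUP BY book.
group_by = conn.execute("""
    SELECT b.book_id, COUNT(w.word_id) AS word_amount
    FROM book b
    LEFT JOIN page p ON p.page_book_id = b.book_id
    LEFT JOIN word w ON w.word_page_id = p.page_id
    WHERE b.book_user_id = 1 AND b.book_flag_delete IS NULL
    GROUP BY b.book_id
    ORDER BY b.book_id
    LIMIT 100
""").fetchall()

# Rewrite: one correlated COUNT per selected book, no outer GROUP BY.
correlated = conn.execute("""
    SELECT b.book_id,
           (SELECT COUNT(w.word_id)
              FROM page p
              JOIN word w ON w.word_page_id = p.page_id
             WHERE p.page_book_id = b.book_id) AS word_amount
    FROM book b
    WHERE b.book_user_id = 1 AND b.book_flag_delete IS NULL
    ORDER BY b.book_id
    LIMIT 100
""").fetchall()

print(group_by)    # [(1, 3), (2, 0)]
print(correlated)  # [(1, 3), (2, 0)]
```

Both forms report 0 for a book with no words: COUNT(w.word_id) ignores the NULL-extended rows in the join version, and the correlated COUNT over an empty set is 0.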
CREATE TABLE XtoY (
# No surrogate id for this table
x_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to one table
y_id MEDIUMINT UNSIGNED NOT NULL, -- For JOINing to the other table
# Include other fields specific to the 'relation'
PRIMARY KEY(x_id, y_id), -- When starting with X
INDEX (y_id, x_id) -- When starting with Y
) ENGINE=InnoDB;
The details on 'why' are in my index cookbook
In your SELECT, you're going to want to avoid using the wildcard "*" to grab columns. Plus, ALWAYS use aliases! This will keep your DB from having to create a "virtual" alias.
select book1.column1, book1.column2, page1.column1
from book book1
left join page page1
on page1.page_book_id = book1.book_id
..... blah

MySQL Query Times out - Need to speed it up

I whipped up a query that retrieves rows that do not match a join (as suggested by this SO question).
SELECT cf.f_id
FROM comments_following AS cf
INNER JOIN comments AS c ON cf.c_id = c.id
WHERE NOT EXISTS (
SELECT 1 FROM follows WHERE f_id = cf.f_id
)
Any ideas on how to speed this up? It looks through anywhere from 30k to 200k rows and appears to be using indexes, but the query times out.
EXPLAIN/DESCRIBE info:
id | select_type        | table     | type  | possible_keys | key     | key_len | ref     | rows  | Extra
1  | PRIMARY            | c         | ALL   | PRIMARY       | NULL    | NULL    | NULL    | 39119 |
1  | PRIMARY            | cf        | ref   | c_id, c_id_2  | c_id    | 8       | ...c.id | 11    | Using where; Using index
2  | DEPENDENT SUBQUERY | following | index | NULL          | PRIMARY | 8       | NULL    | 35612 | Using where; Using index
The comments table isn't used explicitly in the query. Is it being used for filtering? If not, try:
SELECT cf.f_id
FROM comments_following cf
WHERE NOT EXISTS (
SELECT 1 FROM follows WHERE follows.f_id = cf.f_id
)
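By the way, if this produces an error (because follows.f_id does not exist), then that is the problem. Without the table qualifier, an unknown f_id silently resolves to the outer cf.f_id, so you would think you have a correlated subquery, but it does not really correlate follows with comments_following at all.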
Or the left outer join version:
SELECT cf.f_id
FROM comments_following cf left outer join
follows f
on f.f_id = cf.f_id
where f.f_id is null
Having an index on follows(f_id) should make both these versions run faster.
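A small sketch of the unqualified-column pitfall mentioned above, using Python's sqlite3 (SQLite resolves unqualified names in a subquery outward the same way MySQL does). The schema is hypothetical: follows deliberately has no f_id column. The unqualified query runs without error but silently returns nothing, while the qualified one fails loudly.

```python
import sqlite3

# Hypothetical schema in which follows has NO f_id column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments_following (f_id INTEGER, c_id INTEGER);
    CREATE TABLE follows (followed_id INTEGER);
    INSERT INTO comments_following VALUES (1, 100), (2, 200);
    INSERT INTO follows VALUES (1);
""")

# Unqualified f_id is not a column of follows, so it binds to the outer cf.f_id.
# The subquery then tests cf.f_id = cf.f_id, which is true for every non-NULL
# f_id as long as follows has any row, so NOT EXISTS filters out everything.
unqualified = conn.execute("""
    SELECT cf.f_id FROM comments_following AS cf
    WHERE NOT EXISTS (SELECT 1 FROM follows WHERE f_id = cf.f_id)
""").fetchall()
print(unqualified)  # []

# Qualifying the column turns the silent mistake into an error.
try:
    conn.execute("""
        SELECT cf.f_id FROM comments_following AS cf
        WHERE NOT EXISTS (SELECT 1 FROM follows WHERE follows.f_id = cf.f_id)
    """)
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)
print(qualified_error)
```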
A LEFT JOIN is sometimes faster than a WHERE NOT EXISTS subquery; try:
SELECT cf.f_id
FROM comments_following AS cf
INNER JOIN comments AS c ON cf.c_id = c.id
LEFT JOIN follows AS f ON f.f_id = cf.f_id
WHERE f.f_id IS NULL
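A sketch comparing the two anti-join forms on a simplified version of these tables, using Python's sqlite3 as a stand-in for MySQL. The comments join is left out, the data is made up (follows contains f_id 3 twice to show that duplicates on the inner side do not leak into the result), and an index on follows(f_id) is created as the answers suggest. It prints SQLite's EXPLAIN QUERY PLAN for each form so you can check whether follows_f_id_idx is picked up. SQLite's plan format differs from MySQL's EXPLAIN, so treat it only as a rough analogue.

```python
import sqlite3

# Simplified, hypothetical tables: the comments join from the question is left out.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comments_following (f_id INTEGER, c_id INTEGER);
    CREATE TABLE follows (id INTEGER PRIMARY KEY, f_id INTEGER);
    CREATE INDEX follows_f_id_idx ON follows (f_id);
    INSERT INTO comments_following VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO follows (f_id) VALUES (1), (3), (3);
""")

QUERIES = {
    "NOT EXISTS": """
        SELECT cf.f_id FROM comments_following cf
        WHERE NOT EXISTS (SELECT 1 FROM follows f WHERE f.f_id = cf.f_id)
    """,
    "LEFT JOIN": """
        SELECT cf.f_id FROM comments_following cf
        LEFT JOIN follows f ON f.f_id = cf.f_id
        WHERE f.f_id IS NULL
    """,
}

results = {}
for name, sql in QUERIES.items():
    results[name] = sorted(conn.execute(sql).fetchall())
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]
    print(name, results[name], plan)
```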
The answer to this problem was to place a second index on follows.f_id.

SQL database index design for inner join keyword search

I have this query
SELECT a.*
FROM entries a
INNER JOIN entries_keywords b ON a.id = b.entry_id
INNER JOIN keywords c ON b.keyword_id = c.id
WHERE c.key IN ('wake', 'up')
GROUP BY a.id
HAVING COUNT(*) = 2
but it's slow. How do I design indexes optimally to speed things up?
EDIT
This is the current schema
CREATE TABLE `entries` (`id` integer PRIMARY KEY AUTOINCREMENT, `sha` text);
CREATE TABLE `entries_keywords` (`id` integer PRIMARY KEY AUTOINCREMENT, `entry_id` integer REFERENCES `entries`, `keyword_id` integer REFERENCES `keywords`);
CREATE TABLE `keywords` (`id` integer PRIMARY KEY AUTOINCREMENT, `key` string);
CREATE INDEX `entries_keywords_entry_id_index` ON `entries_keywords` (`entry_id`);
CREATE INDEX `entries_keywords_entry_id_keyword_id_index` ON `entries_keywords` (`entry_id`, `keyword_id`);
CREATE INDEX `entries_keywords_keyword_id_index` ON `entries_keywords` (`keyword_id`);
CREATE INDEX `keywords_key_index` ON `keywords` (`key`);
I'm using SQLite 3; the query doesn't fail, but it is slow.
Right now I'm using a query like this (one subquery per keyword):
select *
from (
select *
from (entries) e
inner join entries_keywords ek on e.id = ek.entry_id
inner join keywords k on ek.keyword_id = k.id
where k.key = 'wake') e
inner join entries_keywords ek on e.id = ek.entry_id
inner join keywords k on ek.keyword_id = k.id
where k.key = 'up';
This is way faster but doesn't feel right since it's going to get ugly if I have a lot of keywords.
The key indexes required for that query are:
keywords(key)
entries_keywords(keyword_id,entry_id)
entries(id)
You must be using MySQL (or another permissive engine such as SQLite), because SELECT a.* would fail in many other databases.
EDIT, after the second comment about this statement: SELECT a.* can fail here because of the GROUP BY, since a.* includes columns that are neither grouped nor aggregated.
To explain: because the filter (WHERE) is on c.key, that column needs to be indexed.
That then drives the JOIN on b.keyword_id. We create an index that also includes b.entry_id, so the query never has to look up the table itself; the index alone covers the required columns.
Finally, a.id=b.entry_id joins back to the entries table, so we index the id of that table.
It is quite likely that entries(id) is already the primary key, but you may have entries_keywords indexed the other way around, as (entry_id, keyword_id); that index won't satisfy this join.
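Since the asker is on SQLite 3, here is a sketch run directly with Python's sqlite3. It uses the question's schema, the indexes recommended above, and made-up rows (entry 1 has both keywords, entry 2 only "wake", entry 3 "up" and "now"). It selects a.id instead of a.* and prints the query plan so you can see which indexes the planner picks. The plan is printed but not assumed, because it can vary between SQLite versions and with table statistics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Schema from the question (SQLite syntax).
    CREATE TABLE entries (id INTEGER PRIMARY KEY AUTOINCREMENT, sha TEXT);
    CREATE TABLE keywords (id INTEGER PRIMARY KEY AUTOINCREMENT, "key" TEXT);
    CREATE TABLE entries_keywords (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        entry_id INTEGER REFERENCES entries,
        keyword_id INTEGER REFERENCES keywords
    );
    -- Indexes recommended in the answer; entries(id) is already the primary key.
    CREATE INDEX keywords_key_index ON keywords ("key");
    CREATE INDEX entries_keywords_keyword_id_entry_id_index
        ON entries_keywords (keyword_id, entry_id);
    -- Made-up data: entry 1 has wake+up, entry 2 only wake, entry 3 up+now.
    INSERT INTO entries (sha) VALUES ('e1'), ('e2'), ('e3');
    INSERT INTO keywords ("key") VALUES ('wake'), ('up'), ('now');
    INSERT INTO entries_keywords (entry_id, keyword_id)
        VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3);
""")

QUERY = """
    SELECT a.id
    FROM entries a
    INNER JOIN entries_keywords b ON a.id = b.entry_id
    INNER JOIN keywords c ON b.keyword_id = c.id
    WHERE c."key" IN ('wake', 'up')
    GROUP BY a.id
    HAVING COUNT(*) = 2
"""

# Show which indexes SQLite's planner picks for the query.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row[-1])

matches = conn.execute(QUERY).fetchall()
print(matches)  # [(1,)]
```

One design note: the schema does not prevent the same (entry_id, keyword_id) pair from appearing twice, in which case COUNT(*) = 2 could match an entry that has only one of the keywords. COUNT(DISTINCT c."key") = 2 avoids that.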