MySQL optimization in combining 3 tables and search - mysql

Can anyone tell me what's wrong with this MySQL query?
select distinct(a.productId)
from product a
left join product_keyword b
on b.productId = a.productId
left join keywords c
on c.keywordId = b.keywordId
where a.productName LIKE '%truck%' OR c.value LIKE '%truck%'
limit 100;
Actually I need to join 3 tables (product, product_keyword and keywords) and search based on user input. One product can have multiple keywords, and I store that mapping in product_keyword (using the keywordId from the keywords table).
Can anyone help me please?

When a LIKE pattern starts with the % wildcard, MySQL isn't able to use an ordinary index for the search. Instead, MySQL must scan all of the rows.
You should at least have indexes on the join columns (productID and keywordID) so that MySQL is able to more quickly perform the join operations. However, if the result set is too large, MySQL will perform a scan for the JOINs as well.
Most likely, MySQL is scanning each row in product, then performing the JOIN to product_keyword, then performing the join to keywords. Then, it checks to see if it can exclude the row based on the WHERE clause. Once it returns 100 rows, it stops.
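As a rough sketch of the indexing advice above, the statements below add indexes on the join columns and then ask MySQL for the query plan. The index names are made up for illustration, and a column that is already covered by a primary key or an existing index does not need a second one.

```sql
-- Hypothetical index names; skip any column that is already indexed
-- (for example, as part of a primary key).
CREATE INDEX idx_pk_product ON product_keyword (productId);
CREATE INDEX idx_pk_keyword ON product_keyword (keywordId);

-- EXPLAIN shows the plan instead of running the query.
-- A row with type = ALL means that table is read with a full scan.
EXPLAIN
SELECT DISTINCT a.productId
FROM product a
LEFT JOIN product_keyword b ON b.productId = a.productId
LEFT JOIN keywords c ON c.keywordId = b.keywordId
WHERE a.productName LIKE '%truck%' OR c.value LIKE '%truck%'
LIMIT 100;
```

The indexes can speed up the joins, but the two LIKE '%truck%' conditions still have to be checked row by row.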

If your tables are large, this will be a very expensive query. Using a leading wildcard on a LIKE query will usually be very slow. If you need that sort of search capability, it is probably better to do it externally in Lucene or something similar, rather than in the database.

Related

What is a "point-in-select" in MySQL?

I was given this query to update a report, and it was taking a long time to run on my computer.
select
c.category_type, t.categoryid, t.date, t.clicks
from transactions t
join category c
on c.category_id = t.categoryid
I asked the DBA if there were any issues with the query, and the DBA optimized the query in this manner:
select
(select category_type
from category c where c.category_id = t.categoryid) category_type,
categoryid,
date, clicks
from transactions t
He described the first subquery as a "point-in-select". I have never heard of this before. Can someone explain this concept?
I want to note that the two queries are not the same, unless both of the following are true:
transactions.categoryid is always present in category.
category has no duplicate values of category_id.
In practice, these would be true (in most databases). The first query should be using a left join version for closer equivalence:
select c.category_type, t.categoryid, t.date, t.clicks
from transactions t left join
category c
on c.category_id = t.categoryid;
Still not exactly the same, but more similar.
Finally, both versions should make use of an index on category(category_id), and I would expect the performance to be very similar in MySQL.
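To make the difference concrete, here is a small sketch on scratch tables. The demo_ table names and the sample rows are invented for illustration and are not from the question.

```sql
-- Scratch tables with invented data: category 2 has no row in demo_category.
CREATE TABLE demo_category (category_id INT, category_type VARCHAR(20));
CREATE TABLE demo_transactions (categoryid INT, date DATE, clicks INT);
INSERT INTO demo_category VALUES (1, 'books');
INSERT INTO demo_transactions VALUES (1, '2020-01-01', 10), (2, '2020-01-02', 5);

-- Inner join: only the transaction with categoryid = 1 is returned.
SELECT c.category_type, t.categoryid, t.date, t.clicks
FROM demo_transactions t
JOIN demo_category c ON c.category_id = t.categoryid;

-- Scalar subquery: both transactions are returned;
-- category_type is NULL for categoryid = 2.
SELECT (SELECT c.category_type
        FROM demo_category c
        WHERE c.category_id = t.categoryid) AS category_type,
       t.categoryid, t.date, t.clicks
FROM demo_transactions t;
```

If demo_category contained two rows with the same category_id, the inner join would return the matching transaction twice, while the scalar subquery would fail with a "Subquery returns more than 1 row" error.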
Your DBA's query is not the same, as others noted: the scalar subquery is valid SQL, but it behaves like an outer join rather than an inner join. Yours is much preferable just for its simplicity alone.
It's usually not advantageous to re-write queries for performance. It can help sometimes, but the DBMS is supposed to execute logically equivalent queries equivalently. Failure to do so is a flaw in the query planner.
Performance issues are often a function of physical design. In your case, I would look for indexes on the category and transactions tables that contain the category id as the first column. If neither exists, your join is O(mn) because the category table must be scanned for each transaction row.
Not being a MySQL user, I can only advise you to get query planner output and look for indexing opportunities.
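In MySQL, that check could look roughly like the following. The index name in the last statement is hypothetical, and the index is only worth adding if SHOW INDEX reveals that nothing covers the column yet.

```sql
-- List the existing indexes on both tables.
SHOW INDEX FROM category;
SHOW INDEX FROM transactions;

-- Ask for the plan of the original query.
-- type = ALL on category would indicate a scan for the lookup.
EXPLAIN
SELECT c.category_type, t.categoryid, t.date, t.clicks
FROM transactions t
JOIN category c ON c.category_id = t.categoryid;

-- Only if category_id is not already indexed (hypothetical index name):
CREATE INDEX idx_category_id ON category (category_id);
```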

INNER JOIN with condition on a column - Efficient way

I have 2 tables:
Service_BD:
LOB:
I have a requirement now to drop the redundant columns in LOB table like industryId etc. and use Service_BD table to fetch the LOBs for industryId and then get the details of the particular LOB using LOB table.
I am trying to get a single SQL query using Inner Joins but the results are odd.
When I run a simple SQL query like this:
SELECT industryId, LobId
FROM Service_BD
WHERE industryId = 'I01'
GROUP BY lobId
The results are 9 rows:
Now, I would like to join rest of the LOB columns (minus the dropped ones of course) to get the LOB details out of it. So I use the below query:
SELECT *
FROM LOB
INNER JOIN Service_BD ON Service_BD.lobId = LOB.lobId
WHERE Service_BD.industryId = 'I01'
GROUP BY Service_BD.lobID
I am getting the desired results, but I am not sure whether this is the most efficient way. Both the Service_BD and LOB tables hold a huge amount of data, and I have a feeling that performing GROUP BY Service_BD.lobID first would reduce the cost of the WHERE condition.
Just wanted to know if this is the right way to write the query or are there any better ways to do the same.
You haven't mentioned which DB engine you are using, so I'll assume MySQL. In most cases the GROUP BY is applied only to the rows that meet the WHERE condition. So the GROUP BY is performed only on the result of both the INNER JOIN and the WHERE clause.
I don't think
SELECT *
FROM LOB
INNER JOIN Service_BD ON Service_BD.lobId = LOB.lobId
WHERE Service_BD.industryId = 'I01'
GROUP BY Service_BD.lobID
improves the performance of your query but it certainly eliminates duplicate lobID from your result. Also, I don't see any other better way to eliminate duplicates except introducing the HAVING clause but I don't think it's going to improve the performance of your query.
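For comparison, one alternative that is sometimes used for this kind of "details for the matching ids" lookup is an EXISTS subquery. This is only a sketch: it assumes lobId is unique in LOB, the index name is hypothetical, and whether it is faster than the join has to be checked with EXPLAIN on the real data.

```sql
-- Hypothetical composite index so the industryId filter and the
-- lobId lookup can both be answered from the index.
CREATE INDEX idx_service_bd_industry_lob ON Service_BD (industryId, lobId);

-- Each LOB row is returned at most once, so no GROUP BY is needed.
SELECT l.*
FROM LOB AS l
WHERE EXISTS (
    SELECT 1
    FROM Service_BD AS s
    WHERE s.lobId = l.lobId
      AND s.industryId = 'I01'
);
```

This form returns only LOB columns. It also avoids combining SELECT * with GROUP BY, which MySQL versions that enable ONLY_FULL_GROUP_BY by default may reject.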

How can I make a query faster?

I have a complex query that fetches data from the database based on search keywords. I have written two queries to fetch data based on a keyword by joining two tables, and each table contains more than 5 million records. The problem is that this query takes 5-7 seconds to run, so the page takes longer to load. The queries are:
SELECT DISTINCT( `general_info`.`company_name` ),
general_info.*
FROM general_info
INNER JOIN `financial_info`
ON `financial_info`.`reg_code` = `general_info`.`reg_code`
WHERE ( `financial_info`.`type_of_activity` LIKE '%siveco%'
OR `general_info`.`company_name` LIKE '%siveco%'
OR `general_info`.`reg_code` LIKE '%siveco%' )
The parentheses around distinct don't make a difference. distinct is not a function. So your query is equivalent to:
SELECT DISTINCT gi.*
FROM general_info gi INNER JOIN
`financial_info` fi
ON fi.`reg_code` = gi.`reg_code`
WHERE fi.`type_of_activity` LIKE '%siveco%' OR
gi.`company_name` LIKE '%siveco%' OR
gi.`reg_code` LIKE '%siveco%';
For the join, you should have indexes on general_info(reg_code) and financial_info(reg_code). You may already have these indexes.
The real problem is probably the where clause. Because you are using wildcards at the beginning of the pattern, you cannot optimize this with a regular index. You may be able to do what you want using a full-text index, along with the MATCH ... AGAINST syntax. Such indexes are described in the MySQL documentation on full-text search. This will work particularly well if you are looking for complete words in the various names.
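A minimal sketch of the full-text approach might look like this. The index names are invented, and it assumes a storage engine and MySQL version that support FULLTEXT indexes on these tables (MyISAM, or InnoDB from MySQL 5.6 on). Each MATCH() refers to columns of a single table, so the two searches are combined with UNION.

```sql
-- Hypothetical index names.
ALTER TABLE general_info ADD FULLTEXT INDEX ft_company_name (company_name);
ALTER TABLE financial_info ADD FULLTEXT INDEX ft_type_of_activity (type_of_activity);

-- Companies whose name contains the word 'siveco' ...
SELECT gi.*
FROM general_info gi
WHERE MATCH(gi.company_name) AGAINST ('siveco' IN NATURAL LANGUAGE MODE)
UNION
-- ... plus companies with a matching activity; UNION removes duplicates.
SELECT gi.*
FROM general_info gi
INNER JOIN financial_info fi ON fi.reg_code = gi.reg_code
WHERE MATCH(fi.type_of_activity) AGAINST ('siveco' IN NATURAL LANGUAGE MODE);
```

Full-text search matches words rather than arbitrary substrings, so it is not a drop-in replacement for LIKE '%siveco%'. The reg_code condition from the original query is left out of this sketch.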

Where is better to put 'on' conditions in multiple joins? (mysql)

I have multiple joins including left joins in mysql. There are two ways to do that.
I can put "ON" conditions right after each join:
select * from A join B ON(A.bid=B.ID) join C ON(B.cid=C.ID) join D ON(c.did=D.ID)
I can put them all in one "ON" clause:
select * from A join B join C join D ON(A.bid=B.ID AND B.cid=C.ID AND c.did=D.ID)
Which way is better?
Is it different if I need Left join or Right join in my query?
For simple uses MySQL will almost inevitably execute them in the same manner, so it is a matter of preference and readability (which is a great subject of debate).
However, with more complex queries, particularly aggregate queries with OUTER JOINs that have the potential to become disk and I/O bound, there may be performance and other unseen implications in not using a WHERE clause with OUTER JOIN queries.
The difference between a query that runs for 8 minutes and one that runs for 0.8 seconds may ultimately depend on the WHERE clause, particularly as it relates to indexes (How MySQL uses Indexes): The WHERE clause is a core part of providing the query optimizer the information it needs to do its job and tell the engine how to execute the query in the most efficient way.
From How MySQL Optimizes Queries using WHERE:
"This section discusses optimizations that can be made for processing
WHERE clauses...The best join combination for joining the tables is
found by trying all possibilities. If all columns in ORDER BY and
GROUP BY clauses come from the same table, that table is preferred
first when joining."
"For each table in a join, a simpler WHERE is constructed to get a fast
WHERE evaluation for the table and also to skip rows as soon as
possible."
Some examples:
Full table scans (type = ALL) with no Using where in Extra:
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id AND cr.role = cr2.role AND cr.util = 1000
[Err] Out of memory
Uses where to optimize results, with index (Using where,Using index):
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id
WHERE cr.role = cr2.role
AND cr.util = 1000
515661 rows in set (0.124s)
Combination of ON/WHERE (same result, same plan in EXPLAIN):
[SQL] SELECT cr.id,cr2.role FROM CReportsAL cr
LEFT JOIN CReportsCA cr2
ON cr.id = cr2.id
AND cr.role = cr2.role
WHERE cr.util = 1000
515661 rows in set (0.121s)
MySQL is typically smart enough to figure out simple queries like the above and will execute them similarly, but in certain cases it will not.
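One caveat when moving conditions between ON and WHERE in a LEFT JOIN: a WHERE condition on a column of the right-hand table also removes the rows that had no match, so the two forms are not always interchangeable. The sketch below uses invented scratch tables (demo_al, demo_ca) rather than the tables above.

```sql
CREATE TABLE demo_al (id INT, role VARCHAR(10));
CREATE TABLE demo_ca (id INT, role VARCHAR(10));
INSERT INTO demo_al VALUES (1, 'admin'), (2, 'user');
INSERT INTO demo_ca VALUES (1, 'admin'), (2, 'guest');

-- Condition in ON: both demo_al rows are kept;
-- id 2 has no matching role, so cr2.role is NULL for it.
SELECT cr.id, cr2.role
FROM demo_al cr
LEFT JOIN demo_ca cr2 ON cr.id = cr2.id AND cr.role = cr2.role;

-- Condition in WHERE: the row for id 2 is filtered out,
-- so only id 1 is returned.
SELECT cr.id, cr2.role
FROM demo_al cr
LEFT JOIN demo_ca cr2 ON cr.id = cr2.id
WHERE cr.role = cr2.role;
```

When every left-hand row has a full match, as may be the case in the timings above, both forms return the same rows.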
Outer Join Query Performance:
As both LEFT JOIN and RIGHT JOIN are OUTER JOINs, the issue of the Cartesian product arises. Table scans must be avoided, so that as many rows as possible that are not needed for the query are eliminated as fast as possible.
WHERE, indexes and the query optimizer used together may completely eliminate the problems posed by Cartesian products when used carefully with aggregation such as AVG, SUM, GROUP BY and DISTINCT. Proper indexing by the user and use of the WHERE clause can reduce run time by orders of magnitude.
Finally
Again, for the majority of queries, the query optimizer will execute these in the same manner, making it a matter of preference. But when query optimization becomes important, WHERE is a very important tool. I have seen some performance increase in certain cases with INNER JOIN by specifying an indexed column as an additional AND condition in the ON clause, but I could not tell you why.
Put the ON clause with the JOIN it applies to.
The reasons are:
readability: others can easily see how the tables are joined
performance: if you leave the conditions until later in the query, you'll get way more joins happening than needed - it's like putting the conditions in the where clause
convention: by following normal style, your code will be more portable and less likely to encounter problems that may occur with unusual syntax - do what works
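For the LEFT JOIN part of the question, the layout above could look like this, using the table names from the question. MySQL's join syntax expects an ON (or USING) for each LEFT or RIGHT JOIN, so collecting every condition in one trailing ON clause is not an option once outer joins are involved.

```sql
-- One ON clause per join, written directly after the table it applies to.
SELECT *
FROM A
JOIN B ON A.bid = B.ID
LEFT JOIN C ON B.cid = C.ID
LEFT JOIN D ON C.did = D.ID;
```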

which query is better and efficient - mysql

I came across different ways of writing the same query, as shown below.
Type-I
SELECT JS.JobseekerID
, JS.FirstName
, JS.LastName
, JS.Currency
, JS.AccountRegDate
, JS.LastUpdated
, JS.NoticePeriod
, JS.Availability
, C.CountryName
, S.SalaryAmount
, DD.DisciplineName
, DT.DegreeLevel
FROM Jobseekers JS
INNER
JOIN Countries C
ON JS.CountryID = C.CountryID
INNER
JOIN SalaryBracket S
ON JS.MinSalaryID = S.SalaryID
INNER
JOIN DegreeDisciplines DD
ON JS.DegreeDisciplineID = DD.DisciplineID
INNER
JOIN DegreeType DT
ON JS.DegreeTypeID = DT.DegreeTypeID
WHERE
JS.ShowCV = 'Yes'
Type-II
SELECT JS.JobseekerID
, JS.FirstName
, JS.LastName
, JS.Currency
, JS.AccountRegDate
, JS.LastUpdated
, JS.NoticePeriod
, JS.Availability
, C.CountryName
, S.SalaryAmount
, DD.DisciplineName
, DT.DegreeLevel
FROM Jobseekers JS, Countries C, SalaryBracket S, DegreeDisciplines DD
, DegreeType DT
WHERE
JS.CountryID = C.CountryID
AND JS.MinSalaryID = S.SalaryID
AND JS.DegreeDisciplineID = DD.DisciplineID
AND JS.DegreeTypeID = DT.DegreeTypeID
AND JS.ShowCV = 'Yes'
I am using a MySQL database.
Both work really well, but I am wondering:
Which is the best practice to use in every situation?
Performance-wise, which one is better (say the database has millions of records)?
Are there any advantages of one over the other?
Is there any tool where I can check which query is better?
Thanks in advance
1- It's a no-brainer: use Type I.
2- Type II joins are also called 'implicit joins', whereas Type I joins are called 'explicit joins'. With a modern DBMS, you will not have any performance problem with a normal query. But I think that with some big, complex multi-join queries, the DBMS could have issues with the implicit join. Using only explicit joins could improve your explain plan, and so give a faster result!
3- So performance could be an issue, but maybe most important, readability is improved for further maintenance. An explicit join states exactly what you want to join and on which field, whereas an implicit join doesn't show whether you are making a join or a filter. The WHERE clause is for filtering, not for joining!
And a big, big point for explicit joins: outer joins are really annoying with implicit joins. It is so hard to read when you want multiple joins with outer joins that explicit joins are THE solution.
4- Execution plans are what you need (see the doc).
Some duplicates :
Explicit vs implicit SQL joins
SQL join: where clause vs. on clause
INNER JOIN ON vs WHERE clause
In most of the code I've seen, those queries are written like your Type-II, but I think Type-I is better because of readability (and it is more logical: a join is a join, so you should write it as a join, although the second one is just another writing style for inner joins).
In performance, there shouldn't be a difference (if there is one, I think Type-I would be a bit faster).
Look at the "EXPLAIN" syntax:
http://dev.mysql.com/doc/refman/5.1/en/explain.html
My suggestion.
Populate all your tables with a reasonable number of records. Open the MySQL console and run both SQL commands one by one. You can see the execution time in the console.
For the two queries you mentioned (each with only inner joins) any modern database's query optimizer should produce exactly the same query plan, and thus the same performance.
For MySQL, if you prefix the query with EXPLAIN, it will spit out information about the query plan (instead of running the query). If the information from both queries is the same, then the query plan is the same, and the performance will be identical. From the MySQL Reference Manual:
"EXPLAIN returns a row of information for each table used in the SELECT statement. The tables are listed in the output in the order that MySQL would read them while processing the query. MySQL resolves all joins using a nested-loop join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.
When the EXTENDED keyword is used, EXPLAIN produces extra information that can be viewed by issuing a SHOW WARNINGS statement following the EXPLAIN statement. This information displays how the optimizer qualifies table and column names in the SELECT statement, what the SELECT looks like after the application of rewriting and optimization rules, and possibly other notes about the optimization process."
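The comparison described above can be sketched as follows. To keep it short, this uses a cut-down version of the two queries from the question (only Jobseekers and Countries); the full queries can be prefixed with EXPLAIN in the same way.

```sql
-- Explicit join (Type-I style).
EXPLAIN
SELECT JS.JobseekerID, C.CountryName
FROM Jobseekers JS
INNER JOIN Countries C ON JS.CountryID = C.CountryID
WHERE JS.ShowCV = 'Yes';

-- Implicit join (Type-II style).
EXPLAIN
SELECT JS.JobseekerID, C.CountryName
FROM Jobseekers JS, Countries C
WHERE JS.CountryID = C.CountryID
  AND JS.ShowCV = 'Yes';

-- Run directly after an EXPLAIN to see the statement as rewritten by the
-- optimizer (older versions need EXPLAIN EXTENDED for this, as quoted above).
SHOW WARNINGS;
```

If the two EXPLAIN outputs list the same tables in the same order with the same access types and keys, the optimizer is treating both forms identically.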
As to which syntax is better? That's up to you, but once you move beyond inner joins to outer joins, you'll need to use the newer syntax, since there's no standard for describing outer joins using the older implicit join syntax.