inner join and where in() clause performance? - mysql

I get the same result from both of these queries, but which one is faster and more efficient:
WHERE IN() or INNER JOIN?
SELECT `stats`.`userid`,`stats`.`sumpoint`
FROM `stats`
INNER JOIN users
ON `stats`.`userid` = `users`.`userid`
WHERE `users`.`nick` = '$nick'
ORDER BY `stats`.`sumpoint` DESC limit 0,10
and
SELECT `stats`.`userid`,`stats`.`sumpoint`
FROM `stats`
WHERE userid
IN (
SELECT userid
FROM `users`
WHERE `users`.`nick` = '$nick'
)
ORDER BY `stats`.`sumpoint` DESC limit 0,10

Depends on your SQL engine. Newer SQL systems that have reasonable query optimizers will most likely rewrite both queries to the same plan. Typically, a sub-query (your second query) is rewritten using a join (the first query).
In simple SQL engines that may not have great query optimizers, the join should be faster because they may run sub-queries into a temporary in-memory table before running the outer query.
In some SQL engines that have limited memory footprint, however, the sub-query may be faster because it doesn't require joining -- which produces more data.
So, in summary, it depends.
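A quick way to convince yourself that the two forms are interchangeable is to run both against the same data and compare the rows and the plans. The sketch below uses Python's sqlite3 with an in-memory database as a stand-in for MySQL, so the schema, the sample rows and the plan output are assumptions and not what your server will print. It also binds the nick as a parameter instead of pasting '$nick' into the SQL string. Note that the two queries only return the same rows when users.userid is unique, which the primary key guarantees here.

```python
import sqlite3

# SQLite in-memory stand-in for the MySQL tables; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, nick TEXT);
    CREATE TABLE stats (userid INTEGER, sumpoint INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO stats VALUES (1, 50), (1, 70), (2, 90);
""")

JOIN_SQL = """
    SELECT stats.userid, stats.sumpoint
    FROM stats
    INNER JOIN users ON stats.userid = users.userid
    WHERE users.nick = ?
    ORDER BY stats.sumpoint DESC LIMIT 10
"""
IN_SQL = """
    SELECT stats.userid, stats.sumpoint
    FROM stats
    WHERE userid IN (SELECT userid FROM users WHERE users.nick = ?)
    ORDER BY stats.sumpoint DESC LIMIT 10
"""

# The nick is passed as a bound parameter, never interpolated into the SQL.
join_rows = conn.execute(JOIN_SQL, ("alice",)).fetchall()
in_rows = conn.execute(IN_SQL, ("alice",)).fetchall()
print("same rows:", join_rows == in_rows)

# SQLite's counterpart of EXPLAIN; the last column holds the plan step.
for label, sql in (("JOIN", JOIN_SQL), ("IN", IN_SQL)):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("alice",)):
        print(label, "->", row[-1])
```

On MySQL itself the equivalent check is to prefix each query with EXPLAIN and compare the two outputs.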

To check the performance, run both queries with EXPLAIN SELECT ...
AFAIK, INNER JOIN is faster than IN.
By the way, which table engine are you using, MyISAM or InnoDB?

There is also another option, EXISTS. I'm a T-SQL guy, so the syntax below is SQL Server's:
SELECT s.[userid], s.[sumpoint]
FROM stats AS s
WHERE
EXISTS (
SELECT 1
FROM users AS u
WHERE
u.[userID] = s.[userID]
AND u.[nick] = '$nick'
)
ORDER BY s.[sumpoint] DESC
I think EXISTS is available in most engines. It's generally pretty fast.
In SQL Server at least (2005+), there is no performance difference at all between IN and EXISTS when the column in question is not nullable.
probably irrelevant but hey.....
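Why nullability comes into it at all: under SQL's three-valued logic, IN and EXISTS agree in their positive form, but the negated forms stop agreeing as soon as the subquery column can hold a NULL, so an engine can only treat them as the same thing when the column is not nullable. A minimal sketch in Python's sqlite3 (a stand-in engine, with made-up tables in which users.userid is deliberately nullable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- users.userid is nullable here on purpose (hypothetical data)
    CREATE TABLE users (userid INTEGER, nick TEXT);
    CREATE TABLE stats (userid INTEGER, sumpoint INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (NULL, 'ghost');
    INSERT INTO stats VALUES (1, 50), (2, 90);
""")

def userids(where_clause):
    """Return stats.userid values that satisfy the given WHERE clause."""
    sql = ("SELECT s.userid FROM stats AS s WHERE " + where_clause +
           " ORDER BY s.userid")
    return [row[0] for row in conn.execute(sql)]

in_ids = userids("s.userid IN (SELECT u.userid FROM users AS u)")
exists_ids = userids(
    "EXISTS (SELECT 1 FROM users AS u WHERE u.userid = s.userid)")
not_in_ids = userids("s.userid NOT IN (SELECT u.userid FROM users AS u)")
not_exists_ids = userids(
    "NOT EXISTS (SELECT 1 FROM users AS u WHERE u.userid = s.userid)")

print("IN        :", in_ids)
print("EXISTS    :", exists_ids)
print("NOT IN    :", not_in_ids)      # empty: 2 NOT IN (1, NULL) is unknown
print("NOT EXISTS:", not_exists_ids)
```

The positive forms return the same user, while NOT IN returns nothing because the comparison against NULL is unknown rather than true.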

Related

setting LIMIT on subquery when amount of returned rows doesnt matter

I'm using a framework ORM to run my queries. Here is an example of the ORM's query output; it is a relational query:
select * from `clients` where exists
(select * from `transactions` where `clients`.`id` = `transactions`.`client_id` )
order by `id` desc limit 20 offset 02
I was wondering whether putting a limit on the subquery has any performance benefit in this query, since it doesn't matter how many rows it returns,
like
(select * from `transactions` where `clients`.`id` = `transactions`.`client_id` LIMIT 1 )
Or is performance in this scenario not dependent on how many rows the subquery selects?
where exists
(select * from `transactions` where `clients`.`id` = `transactions`.`client_id` )
The subquery is used as an argument for an EXISTS condition. In English, that would translate as: this customer has at least one transaction.
When processing this type of condition, MySQL generally optimizes the process to just check that at least one record is returned by the subquery. Using LIMIT in this context is useless, your RDBMS knows better.
First of all, using LIMIT without ORDER BY is fairly meaningless, because your suggested subquery does not tell MySQL which single record you want to retain.
Next, limiting the EXISTS subquery gains you nothing. The LIMIT is applied only after the correlated WHERE has filtered the rows, so LIMIT 1 cannot change whether a match is found; it only clutters the query.
A positive EXISTS clause is already optimized in the sense that MySQL will stop as soon as it finds a single match.
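If you want to see for yourself that the LIMIT makes no difference to the result, a small experiment is enough. This is a sketch in Python's sqlite3 (not MySQL), with made-up clients and transactions rows; it only compares the rows returned and says nothing about timing on your server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, client_id INTEGER);
    INSERT INTO clients VALUES (1, 'a'), (2, 'b'), (3, 'c');
    -- client 1 has three transactions, client 2 none, client 3 one
    INSERT INTO transactions (client_id) VALUES (1), (1), (1), (3);
""")

# Same query shape as the ORM output; {limit} is either empty or "LIMIT 1".
BASE = """
    SELECT id FROM clients
    WHERE EXISTS (SELECT * FROM transactions
                  WHERE clients.id = transactions.client_id {limit})
    ORDER BY id DESC LIMIT 20 OFFSET 0
"""

plain = conn.execute(BASE.format(limit="")).fetchall()
limited = conn.execute(BASE.format(limit="LIMIT 1")).fetchall()
print("without LIMIT:", plain)
print("with LIMIT 1 :", limited)
```

Both variants return the clients that have at least one transaction, no matter how many transactions each of them has.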

Does MySQL database optimize joins itself or do we have to specify the order to join?

I have a query like the following
select f.number,SUM(ii.qty),SUM(c.g1),SUM(c.g2),SUM(c.g3),SUM(c.g4),SUM(c.g5),SUM(c.g6) from
farmer as f
inner join
issue as i on f.number = i.farmerno
inner join
item_issue as ii on i.recno = (ii.recno+0)
inner join
crop as c on c.farmerno = f.number
where ii.item like 'S%'
group by f.number
Do we have to think about the order of joins that would be optimal for the query or does MySQL figure out the best way of doing it?
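MySQL chooses the join order for inner joins itself. You can view your query as the MySQL optimizer rewrote it by running
EXPLAIN EXTENDED SELECT ...
followed by SHOW WARNINGS. The warning contains the query as it is really processed (what the optimizer has done with it), and the row order of the EXPLAIN output shows the real join order. From MySQL 5.7 on, plain EXPLAIN already produces the extended information, and the EXTENDED keyword was removed in MySQL 8.0.
As for performance, make sure there is a B-tree index on your ii.item column, since a prefix LIKE such as 'S%' can use one. B-tree is the default index type for InnoDB and MyISAM tables; only MEMORY tables default to HASH indexes, which cannot serve a LIKE range.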

Two inner joins cause slow execution

I have two inner joins in my SQL query:
SELECT `M`.`msg_id`,
`U`.`username`,
`U`.`seo_username`
FROM `newdb2`.`users` AS `U`
INNER JOIN (SELECT subscriber_to_id
FROM subscriptions
WHERE subscriber_id = 434) AS subscriber
ON id = subscriber_to_id
INNER JOIN `newdb2`.`messages` AS `M`
ON (`M`.`uid_fk` = `U`.`id`)
ORDER BY id DESC LIMIT 10
When I execute this query, I see that it is really slow.
How can I modify this query to make it faster?
A quick fix for things like this is adding indexes, which allow your database server to quickly look up the columns you are searching on. For more info on how to add indexes to columns, see the manual.
In this query, those columns are:
subscriptions.subscriber_id
subscriptions.subscriber_to_id
users.id
messages.uid_fk
The ORDER BY id should be OK, as I assume your id column already has a primary key index on it, but sorting the result still adds some cost.
Subselect queries can also slow the query down. In this particular query the alias subscriber (the result of your subquery) is not referenced in the select list, but the inner join on it still restricts users to those that subscriber 434 follows, so it cannot simply be dropped. Join the subscriptions table directly instead of going through a derived table.
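To see what an index on the subscriptions columns changes, you can compare the plan before and after creating it. The sketch below does that in Python's sqlite3 as a stand-in for MySQL; the index name idx_sub is hypothetical and the plan text is SQLite's, not what MySQL's EXPLAIN prints. The CREATE INDEX statement itself is also valid MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subscriptions (subscriber_id INTEGER, subscriber_to_id INTEGER)")

# The lookup that the derived table in the question performs.
QUERY = "SELECT subscriber_to_id FROM subscriptions WHERE subscriber_id = 434"

def plan(sql):
    """Return SQLite's plan steps for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[-1] for row in rows)

before = plan(QUERY)

# Composite index: the filter column first, then the column that is read,
# so the lookup can be answered from the index alone.
conn.execute(
    "CREATE INDEX idx_sub ON subscriptions (subscriber_id, subscriber_to_id)")

after = plan(QUERY)
print("before:", before)
print("after :", after)
```

Before the index exists the plan is a scan of the whole table; afterwards it names idx_sub.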

Improve JOIN query speed

I have this simple join that works great but is HORRIBLY slow, I think because the tech table is very large. There are many instances of each uid, as the table tracks a timestamp per uid, hence the DISTINCT. What is the best way to speed this query up?
SELECT DISTINCT tech.uid,
listing.empno,
listing.firstname,
listing.lastname
FROM tech,
listing
WHERE tech.uid = listing.empno
ORDER BY listing.empno ASC
First add an Index to tech.UID and listing.EmpNo on their respective tables.
After you are sure there are indexes you can try to re-write your query like this:
SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
If it's still not fast enough, put the word EXPLAIN before the query to get some hints about the execution plan of the query.
EXPLAIN SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
Post the EXPLAIN results so we can get better insight.
Hope it helps,
This is a very simple query. The only thing you can do in SQL is add indexes on the fields used in the JOIN/WHERE and ORDER BY clauses (tech.uid, listing.empno), if they are not indexed already.
Rows with NULL in the join fields never match in an equality join, because NULL = NULL is not true, so you can filter them out in the WHERE clause (WHERE tech.uid IS NOT NULL AND listing.empno IS NOT NULL) without changing the result. What inflates the row count is repeated values: every tech row for a uid is paired with every listing row for the same empno, and DISTINCT then has to collapse all of those rows.
You can also change the MySQL configuration. There are many options useful for performance tuning, like key_buffer_size, sort_buffer_size, tmp_table_size, max_heap_table_size, read_buffer_size etc.
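One more thing that may be worth trying, sketched here in Python's sqlite3 rather than MySQL: since tech is only used to check that an employee has at least one row, an EXISTS test can replace the join plus DISTINCT. This assumes listing has one row per empno (made the primary key below); the sample data and the index name idx_tech_uid are made up. It is an alternative formulation, not something the answers above proposed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listing (empno INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE tech (uid INTEGER, ts TEXT);
    CREATE INDEX idx_tech_uid ON tech (uid);
    INSERT INTO listing VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Diaz'), (3, 'Cy', 'Moss');
    -- several timestamped rows per uid, as in the question
    INSERT INTO tech VALUES (1, 't1'), (1, 't2'), (1, 't3'), (3, 't1'), (3, 't2');
""")

# Rows the join produces before DISTINCT collapses them.
joined_count = conn.execute(
    "SELECT COUNT(*) FROM listing INNER JOIN tech ON tech.uid = listing.empno"
).fetchone()[0]

distinct_rows = conn.execute("""
    SELECT DISTINCT tech.uid, listing.empno, listing.firstname, listing.lastname
    FROM listing INNER JOIN tech ON tech.uid = listing.empno
    ORDER BY listing.empno ASC
""").fetchall()

# Semi-join: each listing row is emitted at most once, so no DISTINCT is needed.
# tech.uid always equals listing.empno in the join, so empno stands in for it.
exists_rows = conn.execute("""
    SELECT listing.empno AS uid, listing.empno, listing.firstname, listing.lastname
    FROM listing
    WHERE EXISTS (SELECT 1 FROM tech WHERE tech.uid = listing.empno)
    ORDER BY listing.empno ASC
""").fetchall()

print("rows joined before DISTINCT:", joined_count)
print("same result:", distinct_rows == exists_rows)
```

Whether this is actually faster on your tables is something only EXPLAIN and a timing run on the real data can tell.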

MySQL query optimization and/or tweaks

I have the following query, in which both tables are huge. The query is very slow, and I need your ideas for optimizing it. Or do you have any other solution?
SELECT c.EstablishmentID,
(SELECT COUNT(ID)
FROM cleanpoi
WHERE EstablishmentID=c.EstablishmentID OR EstablishmentID
IN (SELECT ChildEstablishmentID
FROM crawlerchildren
WHERE ParentEstablishmentID=c.EstablishmentID)
) POI
FROM crawler c
GROUP BY c.EstablishmentID
BTW, I have the appropriate indexes applied.
UPDATE:
Okay, I have attached the explain result.
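Try it by using JOIN
-- COUNT(DISTINCT ...) is needed because a POI of the establishment itself
-- is matched once per child row; this assumes cleanpoi.ID is unique.
SELECT c.EstablishmentID, COUNT(DISTINCT d.ID) AS POI
FROM crawler c
LEFT JOIN crawlerchildren e
ON e.ParentEstablishmentID = c.EstablishmentID
LEFT JOIN cleanpoi d
ON d.EstablishmentID = c.EstablishmentID
OR d.EstablishmentID = e.ChildEstablishmentID
GROUP BY c.EstablishmentID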
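Before swapping a correlated subquery for a join on huge tables, it is worth checking on a small sample that both return the same counts. A minimal sketch in Python's sqlite3 (a stand-in for MySQL), with made-up rows and assuming cleanpoi.ID is unique; the join form shown is one possible formulation and uses COUNT(DISTINCT ...) so that POIs matched through several child rows are counted once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crawler (EstablishmentID INTEGER);
    CREATE TABLE crawlerchildren (ParentEstablishmentID INTEGER,
                                  ChildEstablishmentID INTEGER);
    CREATE TABLE cleanpoi (ID INTEGER PRIMARY KEY, EstablishmentID INTEGER);
    INSERT INTO crawler VALUES (1), (2), (3);
    INSERT INTO crawlerchildren VALUES (1, 10), (1, 11), (2, 20);
    INSERT INTO cleanpoi VALUES (1, 1), (2, 1), (3, 10), (4, 11), (5, 20), (6, 99);
""")

# Shape of the original query: one correlated subquery per establishment.
subquery_rows = conn.execute("""
    SELECT c.EstablishmentID,
           (SELECT COUNT(p.ID)
            FROM cleanpoi p
            WHERE p.EstablishmentID = c.EstablishmentID
               OR p.EstablishmentID IN (SELECT ch.ChildEstablishmentID
                                        FROM crawlerchildren ch
                                        WHERE ch.ParentEstablishmentID = c.EstablishmentID)
           ) AS POI
    FROM crawler c
    GROUP BY c.EstablishmentID
    ORDER BY c.EstablishmentID
""").fetchall()

# Join form: children first, then POIs of the establishment or of a child.
join_rows = conn.execute("""
    SELECT c.EstablishmentID, COUNT(DISTINCT d.ID) AS POI
    FROM crawler c
    LEFT JOIN crawlerchildren e
           ON e.ParentEstablishmentID = c.EstablishmentID
    LEFT JOIN cleanpoi d
           ON d.EstablishmentID = c.EstablishmentID
           OR d.EstablishmentID = e.ChildEstablishmentID
    GROUP BY c.EstablishmentID
    ORDER BY c.EstablishmentID
""").fetchall()

print(subquery_rows)
print(join_rows)
```

Establishment 1 has two POIs of its own plus one for each of its two children, establishment 2 only has one through its child, and establishment 3 has none, so both forms should agree on 4, 1 and 0.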
IN() and NOT IN() subqueries are poorly optimized:
MySQL executes the subquery as a dependent subquery for each row in the outer query. This is a frequent cause of serious performance problems in MySQL 5.5 and older versions. The query probably should be rewritten as a JOIN or a LEFT OUTER JOIN, respectively.
Non-deterministic GROUP BY:
The SQL retrieves columns that are neither in an aggregate function nor the GROUP BY expression, so these values will be non-deterministic in the result.
GROUP BY or ORDER BY on different tables:
This will force the use of a temporary table and filesort, which can be a huge performance problem and can consume large amounts of memory and temporary space on disk.