I have the following query, in which both tables are huge. The query is very slow and I need ideas for optimizing it. Do you have any other solution?
SELECT c.EstablishmentID,
(SELECT COUNT(ID)
FROM cleanpoi
WHERE EstablishmentID=c.EstablishmentID OR EstablishmentID
IN (SELECT ChildEstablishmentID
FROM crawlerchildren
WHERE ParentEstablishmentID=c.EstablishmentID)
) POI
FROM crawler c
GROUP BY c.EstablishmentID
BTW, I have the appropriate indexes applied.
UPDATE:
Okay, I have attached the explain result.
Try it using a JOIN:
SELECT c.EstablishmentID, COUNT(DISTINCT d.ID) AS POI
FROM crawler c
LEFT JOIN crawlerchildren e
-- the parent column must be available for the join condition
ON e.ParentEstablishmentID = c.EstablishmentID
LEFT JOIN cleanpoi d
-- count POIs of the establishment itself and of its children
ON d.EstablishmentID = c.EstablishmentID
OR d.EstablishmentID = e.ChildEstablishmentID
GROUP BY c.EstablishmentID
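Before swapping one form for the other, it may be worth checking on a small data set that a JOIN rewrite returns the same counts as the original correlated subquery. The following is a minimal sketch using Python's sqlite3 module rather than MySQL. The table contents are made up for illustration, and it assumes cleanpoi.ID is unique, which is why COUNT(DISTINCT d.ID) is safe. It checks results only and says nothing about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data: establishment 1 has children 10 and 11, 2 has child 20.
conn.executescript("""
CREATE TABLE crawler (EstablishmentID INTEGER);
CREATE TABLE crawlerchildren (ParentEstablishmentID INTEGER,
                              ChildEstablishmentID INTEGER);
CREATE TABLE cleanpoi (ID INTEGER PRIMARY KEY, EstablishmentID INTEGER);
INSERT INTO crawler VALUES (1), (2), (3);
INSERT INTO crawlerchildren VALUES (1, 10), (1, 11), (2, 20);
INSERT INTO cleanpoi (EstablishmentID) VALUES (1), (10), (10), (11), (2), (99);
""")

# The original query from the question (ORDER BY added for a stable comparison).
subquery_sql = """
SELECT c.EstablishmentID,
       (SELECT COUNT(ID)
          FROM cleanpoi
         WHERE EstablishmentID = c.EstablishmentID
            OR EstablishmentID IN (SELECT ChildEstablishmentID
                                     FROM crawlerchildren
                                    WHERE ParentEstablishmentID = c.EstablishmentID)
       ) AS POI
  FROM crawler c
 GROUP BY c.EstablishmentID
 ORDER BY c.EstablishmentID
"""

# A JOIN form; DISTINCT avoids double counting when a parent has several children.
join_sql = """
SELECT c.EstablishmentID, COUNT(DISTINCT d.ID) AS POI
  FROM crawler c
  LEFT JOIN crawlerchildren e
         ON e.ParentEstablishmentID = c.EstablishmentID
  LEFT JOIN cleanpoi d
         ON d.EstablishmentID = c.EstablishmentID
         OR d.EstablishmentID = e.ChildEstablishmentID
 GROUP BY c.EstablishmentID
 ORDER BY c.EstablishmentID
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(subquery_rows)
print(join_rows)
```

On real data, run EXPLAIN on both forms in MySQL itself, since the plans there will differ from SQLite's.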
IN() and NOT IN() subqueries are poorly optimized:
MySQL executes the subquery as a dependent subquery for each row in the outer query. This is a frequent cause of serious performance problems in MySQL 5.5 and older versions. The query probably should be rewritten as a JOIN or a LEFT OUTER JOIN, respectively.
Non-deterministic GROUP BY:
The SQL retrieves columns that are neither in an aggregate function nor the GROUP BY expression, so these values will be non-deterministic in the result.
GROUP BY or ORDER BY on different tables:
This will force the use of a temporary table and filesort, which can be a huge performance problem and can consume large amounts of memory and temporary space on disk.
Related
I have simplified the query I am firing for brevity as follows
SELECT
1 AS mae
FROM
(SELECT
t.id
FROM transaction t) a
LEFT OUTER JOIN
(SELECT
track_id
FROM attendee) AS b ON a.id = b.track_id
HAVING mae > 0;
Here there is no aggregation. However, I still have to use HAVING. If I use WHERE, MySQL is unable to recognize the column mae.
Why is this so?
In general, aliases defined in the SELECT clause are not available for re-use in the same SELECT, nor in the WHERE, nor in the ON clauses. This is true of standard SQL and of most SQL dialects. Aliases are allowed in the ORDER BY, on the other hand.
MySQL recognizes column aliases in the HAVING clause. This is so convenient that MySQL has extended the HAVING clause for use with non-aggregation queries. So, your query is using this extension.
One nice feature of this extension is that it allows the reference without using a subquery -- the normal way around this. Because MySQL materializes (almost) all derived tables, this saves overhead in the processing.
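For comparison, the portable subquery workaround mentioned above can be sketched as follows: define the alias in a derived table and filter on it in the outer WHERE. This is a minimal sketch run against SQLite through Python's sqlite3 module, not MySQL. The rows and the derived-table alias x are invented for illustration, and the table name is quoted only because TRANSACTION is a keyword in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data: transaction 1 has two attendees, transaction 2 has none.
conn.executescript("""
CREATE TABLE "transaction" (id INTEGER PRIMARY KEY);
CREATE TABLE attendee (track_id INTEGER);
INSERT INTO "transaction" (id) VALUES (1), (2);
INSERT INTO attendee VALUES (1), (1);
""")

# The alias mae is defined inside the derived table x, so the outer WHERE
# can refer to it without relying on MySQL's HAVING extension.
rows = conn.execute("""
SELECT x.mae
  FROM (SELECT 1 AS mae
          FROM "transaction" t
          LEFT OUTER JOIN attendee b ON t.id = b.track_id) AS x
 WHERE x.mae > 0
""").fetchall()
print(rows)
```

The trade-off is the one described above: on MySQL versions that materialize derived tables, this form costs more than the HAVING shortcut.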
Are you just trying to do a regular join?
SELECT *
FROM transaction t
JOIN attendee AS b
ON t.id = b.track_id
I have a MySQL query of this form:
SELECT
employee.name,
totalpayments.totalpaid
FROM
employee
JOIN (
SELECT
paychecks.employee_id,
SUM(paychecks.amount) totalpaid
FROM
paychecks
GROUP BY
paychecks.employee_id
) totalpayments on totalpayments.employee_id = employee.id
I've recently found that this returns MUCH faster in this form:
SELECT
employee.name,
(
SELECT
SUM(paychecks.amount)
FROM
paychecks
WHERE
paychecks.employee_id = employee.id
) totalpaid
FROM
employee
It surprises me that there would be a difference in speed, and that the lower query would be faster. I prefer the upper form for development, because I can run the subquery independently.
Is there a way to get the "best of both worlds": fast results AND being able to run the subquery in isolation?
Likely, the correlated subquery is able to make effective use of an index, which is why it's fast, even though that subquery has to be executed multiple times.
The first query, with the inline view, causes MySQL to create a derived table, and for large sets, that's effectively a MyISAM table.
In MySQL 5.6.x and later, the optimizer may choose to add an index on the derived table, if that would allow a ref operation and the estimated cost of the ref operation is lower than the nested loops scan.
I recommend you try using EXPLAIN to see the access plan. (Based on your report of performance, I suspect you are running on MySQL version 5.5 or earlier.)
The two statements are not entirely equivalent in the case where there are rows in employee for which there are no matching rows in paychecks.
An equivalent result could be obtained entirely avoiding a subquery:
SELECT e.name
, SUM(p.amount) AS total_paid
FROM employee e
JOIN paychecks p
ON p.employee_id = e.id
GROUP BY e.id
(Use an inner join to get a result equivalent to the first query, use a LEFT outer join to be equivalent to the second query. Wrap the SUM() aggregate in an IFNULL function if you want to return a zero rather than a NULL value when no matching row with a non-null value of amount is found in paychecks.)
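The difference between the three forms shows up only for an employee with no paychecks. Here is a minimal sketch using Python's sqlite3 module instead of MySQL. The two employees and their amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data: Ann has two paychecks, Bob has none.
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE paychecks (employee_id INTEGER, amount INTEGER);
INSERT INTO employee VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO paychecks VALUES (1, 100), (1, 250);
""")

# Inner join: employees without paychecks drop out (like the derived-table query).
inner_rows = conn.execute("""
SELECT e.name, SUM(p.amount) AS total_paid
  FROM employee e
  JOIN paychecks p ON p.employee_id = e.id
 GROUP BY e.id
 ORDER BY e.id
""").fetchall()

# Left join with IFNULL: every employee is kept, a missing total becomes 0.
left_rows = conn.execute("""
SELECT e.name, IFNULL(SUM(p.amount), 0) AS total_paid
  FROM employee e
  LEFT JOIN paychecks p ON p.employee_id = e.id
 GROUP BY e.id
 ORDER BY e.id
""").fetchall()

# Correlated subquery: every employee is kept, a missing total stays NULL.
correlated_rows = conn.execute("""
SELECT e.name,
       (SELECT SUM(p.amount) FROM paychecks p WHERE p.employee_id = e.id) AS total_paid
  FROM employee e
 ORDER BY e.id
""").fetchall()

print(inner_rows)
print(left_rows)
print(correlated_rows)
```

Bob is missing from the inner join, has 0 in the LEFT JOIN with IFNULL, and has NULL (None in Python) in the correlated subquery.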
Logically, a join is a Cartesian product: every record of table A is combined with every record of table B. The output is
number of records in table A * number of records in table B = rows in the new table
10 * 10 = 100
and out of those 100 records, only the ones that match the join condition and the filters are returned by the query. In practice, the engine uses indexes and join algorithms rather than building all 100 rows first.
In a nested query, the result of the inner query, whatever its size, is the input to the outer query, which is why a nested query can sometimes be faster than a join.
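The row arithmetic above can be checked with a small sketch. It uses Python's sqlite3 module and two invented tables, a and b, with 10 rows each. It shows only the logical result sizes, not how any engine actually executes a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables a and b, 10 rows each; b.a_id points at a.id.
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY);
CREATE TABLE b (a_id INTEGER);
""")
conn.executemany("INSERT INTO a VALUES (?)", [(i,) for i in range(10)])
conn.executemany("INSERT INTO b VALUES (?)", [(i,) for i in range(10)])

# No join condition: every row of a is paired with every row of b.
cross_count = conn.execute("SELECT COUNT(*) FROM a CROSS JOIN b").fetchone()[0]

# With a join condition, only the matching pairs remain.
join_count = conn.execute(
    "SELECT COUNT(*) FROM a JOIN b ON b.a_id = a.id"
).fetchone()[0]

print(cross_count, join_count)
```

The unconditioned product has 10 * 10 = 100 rows, while the keyed join returns 10.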
I have a query like the following
select f.number,SUM(ii.qty),SUM(c.g1),SUM(c.g2),SUM(c.g3),SUM(c.g4),SUM(c.g5),SUM(c.g6) from
farmer as f
inner join
issue as i on f.number = i.farmerno
inner join
item_issue as ii on i.recno = (ii.recno+0)
inner join
crop as c on c.farmerno = f.number
where ii.item like 'S%'
group by f.number
Do we have to think about the order of joins that would be optimal for the query or does MySQL figure out the best way of doing it?
You can see what the MySQL optimizer did with your query by running
EXPLAIN EXTENDED SELECT ...
and then SHOW WARNINGS. The warning contains the query as it is really processed (what the optimizer has done with it). That way you can review the real join order.
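As for performance, make sure there is a B-tree index on your ii.item column, since a prefix LIKE such as 'S%' can use one. B-tree is the default index type for InnoDB and MyISAM tables; HASH is the default only for MEMORY tables, and a hash index cannot be used for a LIKE prefix match.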
I have two inner joins in my SQL query:
SELECT `M`.`msg_id`,
`U`.`username`,
`U`.`seo_username`
FROM `newdb2`.`users` AS `U`
INNER JOIN (SELECT subscriber_to_id
FROM subscriptions
WHERE subscriber_id = 434) AS subscriber
ON id = subscriber_to_id
INNER JOIN `newdb2`.`messages` AS `M`
ON (`M`.`uid_fk` = `U`.`id`)
ORDER BY id DESC LIMIT 10
When I execute this query, I see that it is really slow.
How can I modify this query to make it faster?
A quick fix for things like this is adding indexes, which allow your database server to quickly look up the columns you are searching on. For more info on how to add indexes to columns, see the manual.
In this query, those columns are:
subscriptions.subscriber_id
subscriptions.subscriber_to_id
users.id
messages.uid_fk
The ORDER BY id should be OK as I assume your id column has a primary key index on it already, but ordering queries will slow it down too.
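Subselect queries can also slow the query down. In this particular query, the alias subscriber (containing the results of your subquery, which is inner joined on) is not referenced in the SELECT list, but the inner join still restricts the result to the users matched by the subquery, so removing it would change the result. If you do not need that filter, remove the join completely; otherwise, join subscriptions directly instead of going through a subquery.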
I can get same result for these queries, but which one is the fastest, and most efficient?
where in() or inner join?
SELECT `stats`.`userid`,`stats`.`sumpoint`
FROM `stats`
INNER JOIN users
ON `stats`.`userid` = `users`.`userid`
WHERE `users`.`nick` = '$nick'
ORDER BY `stats`.`sumpoint` DESC limit 0,10
and
SELECT `stats`.`userid`,`stats`.`sumpoint`
FROM `stats`
WHERE userid
IN (
SELECT userid
FROM `users`
WHERE `users`.`nick` = '$nick'
)
ORDER BY `stats`.`sumpoint` DESC limit 0,10
Depends on your SQL engine. Newer SQL systems that have reasonable query optimizers will most likely rewrite both queries to the same plan. Typically, a sub-query (your second query) is rewritten using a join (the first query).
In simple SQL engines that may not have great query optimizers, the join should be faster because they may run sub-queries into a temporary in-memory table before running the outer query.
In some SQL engines that have limited memory footprint, however, the sub-query may be faster because it doesn't require joining -- which produces more data.
So, in summary, it depends.
To check the performance, run both queries with EXPLAIN SELECT ...
AFAIK, INNER JOIN is faster than IN.
BTW, which table engine are you using, MyISAM or InnoDB?
There is also another option, EXISTS. I'm a T-SQL guy, so...
SELECT s.[userid], s.[sumpoint]
FROM stats AS s
WHERE
EXISTS (
SELECT 1
FROM users AS u
WHERE
u.[userID] = s.[userID]
AND u.[nick] = '$nick'
)
ORDER BY s.[sumpoint] DESC
I think EXISTS is available in most engines. It's generally pretty fast.
In SQL Server at least (2005+), there is no performance difference at all between IN and EXISTS for cases where the column in question is not nullable.
probably irrelevant but hey.....
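Whichever form turns out faster on a given engine, it helps to confirm first that the JOIN, IN and EXISTS variants return the same rows. The following is a minimal sketch with Python's sqlite3 module and invented sample rows. It passes the nick as a bound parameter instead of interpolating '$nick' into the SQL string. It does not measure performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy data: alice has two stats rows, bob has one.
conn.executescript("""
CREATE TABLE users (userid INTEGER PRIMARY KEY, nick TEXT);
CREATE TABLE stats (userid INTEGER, sumpoint INTEGER);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO stats VALUES (1, 50), (1, 70), (2, 10);
""")

queries = {
    "join": """
        SELECT s.userid, s.sumpoint
          FROM stats s
          INNER JOIN users u ON s.userid = u.userid
         WHERE u.nick = ?
         ORDER BY s.sumpoint DESC LIMIT 10""",
    "in": """
        SELECT s.userid, s.sumpoint
          FROM stats s
         WHERE s.userid IN (SELECT userid FROM users WHERE nick = ?)
         ORDER BY s.sumpoint DESC LIMIT 10""",
    "exists": """
        SELECT s.userid, s.sumpoint
          FROM stats s
         WHERE EXISTS (SELECT 1 FROM users u
                        WHERE u.userid = s.userid AND u.nick = ?)
         ORDER BY s.sumpoint DESC LIMIT 10""",
}

# The ? placeholder is filled by the driver, so the nick is never spliced into SQL.
results = {name: conn.execute(sql, ("alice",)).fetchall()
           for name, sql in queries.items()}
print(results)
```

With the results confirmed equal, EXPLAIN on the target engine is the way to compare the plans.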