Hi everyone, I am looking for ways to optimize this SQL script for better performance. Can anyone please help?
With accounts As
(
select account_id, creation_date
from account
where program_distributor = 'brinks'
and channel = 'online'
and creation_year = 2017
),
Form_opens as
(
select session_id, log_time
from web_action_log
where web_action = 'open_dd_form'
),
Mapping as
(
select session_id, account_id
from web_link
)
Select
trunc (acc.creation_date),
count(distinct acc.account_id),
count (distinct case when fo.session_id is not null then acc.account_id end) -- form_opens exposes no account_id column
from accounts acc
left outer join mapping mp
on acc.account_id = mp.account_id
Left outer join form_opens fo
on mp.session_id = fo.session_id and
acc.creation_date > fo.log_time
Group by trunc(acc.creation_date)
Order by 1;
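Try the query below. You can skip the WITH clause because you are not querying the WITH clause outputs more than once. Putting the web_action filter in the join condition, rather than in the WHERE clause, keeps the LEFT JOIN from turning into an inner join and may also improve performance.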
select
trunc (acc.creation_date),
count(distinct acc.account_id),
count (distinct fo.account_id)
from account acc
left outer join web_link mp
on acc.account_id = mp.account_id
left outer join web_action_log fo
on mp.session_id = fo.session_id
and acc.creation_date > fo.log_time
and fo.web_action = 'open_dd_form'
where acc.program_distributor = 'brinks'
and acc.channel = 'online'
and acc.creation_year = 2017
group by trunc(acc.creation_date)
order by 1;
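To see why the position of the filter matters with a LEFT JOIN, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up, and for brevity it assumes web_action_log carries an account_id directly (the original goes through web_link), so treat it as an illustration rather than the poster's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (account_id INTEGER PRIMARY KEY);
    -- Assumption: account_id stored directly on the log table to keep the demo short.
    CREATE TABLE web_action_log (account_id INTEGER, web_action TEXT);
    -- Hypothetical data: account 1 opened the form, account 2 only logged in.
    INSERT INTO account VALUES (1), (2);
    INSERT INTO web_action_log VALUES (1, 'open_dd_form'), (2, 'login');
""")

# Filter inside ON: accounts without a matching log row survive with NULLs.
filter_in_on = conn.execute("""
    SELECT COUNT(DISTINCT acc.account_id), COUNT(DISTINCT fo.account_id)
    FROM account acc
    LEFT JOIN web_action_log fo
           ON fo.account_id = acc.account_id
          AND fo.web_action = 'open_dd_form'
""").fetchone()

# Filter in WHERE: rows whose web_action is NULL or different are discarded,
# so the LEFT JOIN effectively behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT COUNT(DISTINCT acc.account_id), COUNT(DISTINCT fo.account_id)
    FROM account acc
    LEFT JOIN web_action_log fo
           ON fo.account_id = acc.account_id
    WHERE fo.web_action = 'open_dd_form'
""").fetchone()

print(filter_in_on)     # (2, 1)
print(filter_in_where)  # (1, 1)
```

With the filter in ON, both accounts are still counted; with the filter in WHERE, the account that never opened the form disappears from the total.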
I have this query to calculate the success total in each district. The query works, but it takes up to 2 minutes to output data, and I have 15k rows in orders.
SELECT
nsf.id,
nsf.province,
nsf.city,
nsf.district,
nsf.shipping_fee,
IFNULL((SELECT COUNT(orders.id) FROM orders
JOIN users ON orders.customer_id = users.id
JOIN addresses ON addresses.user_id = users.id
JOIN subdistricts ON subdistricts.id = addresses.subdistrict_id
WHERE orders.status_tracking IN ("Completed","Successful Delivery")
AND subdistricts.ninja_fee_id = nsf.id
AND orders.transfer_to = "cod"),0) as success_total
from ninja_shipping_fees nsf
GROUP BY nsf.id
ORDER BY nsf.province;
the output should be like this
Can you help me to improve the performance? Thanks
Try performing the grouping/calculation in a joined "derived table" instead of a "correlated subquery"
SELECT
nsf.id
, nsf.province
, nsf.city
, nsf.district
, nsf.shipping_fee
, COALESCE( g.order_count, 0 ) AS success_total
FROM ninja_shipping_fees nsf
LEFT JOIN (
SELECT
subdistricts.ninja_fee_id
, COUNT( orders.id ) AS order_count
FROM orders
JOIN users ON orders.customer_id = users.id
JOIN addresses ON addresses.user_id = users.id
JOIN subdistricts ON subdistricts.id = addresses.subdistrict_id
WHERE orders.status_tracking IN ('Completed', 'Successful Delivery')
AND orders.transfer_to = 'cod'
GROUP BY subdistricts.ninja_fee_id
) AS g ON g.ninja_fee_id = nsf.id
ORDER BY nsf.province;
"Correlated subqueries" are often a source of poor performance.
Other notes: I prefer to use COALESCE() because it is ANSI standard and available in most SQL implementations now. Single quotes are more typically used to denote string literals.
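As a rough check that the two shapes return the same totals, here is a small sketch with Python's sqlite3 module. The schema is deliberately simplified: it assumes orders carries ninja_fee_id directly instead of reaching it through users, addresses and subdistricts, and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ninja_shipping_fees (id INTEGER PRIMARY KEY, district TEXT);
    -- Assumption: ninja_fee_id placed on orders to skip the intermediate joins.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, ninja_fee_id INTEGER,
                         status_tracking TEXT, transfer_to TEXT);
    INSERT INTO ninja_shipping_fees VALUES (1, 'North'), (2, 'South'), (3, 'East');
    INSERT INTO orders VALUES
        (1, 1, 'Completed', 'cod'),
        (2, 1, 'Successful Delivery', 'cod'),
        (3, 1, 'Cancelled', 'cod'),
        (4, 2, 'Completed', 'bank'),
        (5, 2, 'Completed', 'cod');
""")

# Correlated subquery: conceptually evaluated once per ninja_shipping_fees row.
correlated = conn.execute("""
    SELECT nsf.id,
           (SELECT COUNT(o.id) FROM orders o
             WHERE o.ninja_fee_id = nsf.id
               AND o.status_tracking IN ('Completed', 'Successful Delivery')
               AND o.transfer_to = 'cod') AS success_total
    FROM ninja_shipping_fees nsf
    ORDER BY nsf.id
""").fetchall()

# Derived table: orders are grouped once, then joined to the fee rows.
derived = conn.execute("""
    SELECT nsf.id, COALESCE(g.order_count, 0) AS success_total
    FROM ninja_shipping_fees nsf
    LEFT JOIN (
        SELECT o.ninja_fee_id, COUNT(o.id) AS order_count
        FROM orders o
        WHERE o.status_tracking IN ('Completed', 'Successful Delivery')
          AND o.transfer_to = 'cod'
        GROUP BY o.ninja_fee_id
    ) AS g ON g.ninja_fee_id = nsf.id
    ORDER BY nsf.id
""").fetchall()

print(correlated)  # [(1, 2), (2, 1), (3, 0)]
print(derived)     # [(1, 2), (2, 1), (3, 0)]
```

The LEFT JOIN plus COALESCE is what keeps district 3, which has no qualifying orders, in the output with a total of 0.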
Inner query:
select up.user_id, up.id as utility_pro_id from utility_pro as up
join utility_pro_zip_code as upz ON upz.utility_pro_id = up.id and upz.zip_code_id=1
where up.available_for_survey=1 and up.user_id not in (select bjr.user_id from book_job_request as bjr where
((1583821800000 between bjr.start_time and bjr.end_time) and (1583825400000 between bjr.start_time and bjr.end_time)))
Divided in two queries:
select up.user_id, up.id as utility_pro_id from utility_pro as up
join utility_pro_zip_code as upz ON upz.utility_pro_id = up.id and upz.zip_code_id=1
Select bjr.user_id as userId from book_job_request as bjr where bjr.user_id in :userIds and (:startTime between bjr.start_time and bjr.end_time) and (:endTime between bjr.start_time and bjr.end_time)
Note:
As per my understanding, when a single query with the inner query is executed, it will scan all the data of book_job_request, whereas with multiple queries only rows with the specified user ids will be checked.
Any other better option for the same operation other than these two is also appreciated.
I expect that the query is supposed to be more like this:
SELECT up.user_id
, up.id utility_pro_id
FROM utility_pro up
JOIN utility_pro_zip_code upz
ON upz.utility_pro_id = up.id
LEFT
JOIN book_job_request bjr
ON bjr.user_id = up.user_id
AND bjr.end_time >= 1583821800000
AND bjr.start_time <= 1583825400000
WHERE up.available_for_survey = 1
AND upz.zip_code_id = 1
AND bjr.user_id IS NULL
For further help with optimisation (i.e. which indexes to provide) we'd need SHOW CREATE TABLE statements for all relevant tables as well as the EXPLAIN for the above
Another possibility:
SELECT up.user_id , up.id utility_pro_id
FROM utility_pro up
JOIN utility_pro_zip_code upz ON upz.utility_pro_id = up.id
WHERE up.available_for_survey = 1
AND upz.zip_code_id = 1
AND NOT EXISTS( SELECT 1 FROM book_job_request
WHERE user_id = up.user_id
AND end_time >= 1583821800000
AND start_time <= 1583825400000 )
Recommended indexes (for my NOT EXISTS and for Strawberry's LEFT JOIN):
book_job_request: (user_id, start_time, end_time)
upz: (zip_code_id, utility_pro_id)
up: (available_for_survey, user_id, id)
The column order given is important. And, no, the single-column indexes you currently have are not as good.
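The two anti-join forms and the composite indexes can be tried out with a short sketch in Python's sqlite3 module. The times are small made-up integers instead of epoch milliseconds, the index names and the `slot` parameters are hypothetical, and SQLite's planner is not MySQL's, so this only illustrates that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE utility_pro (id INTEGER PRIMARY KEY, user_id INTEGER,
                              available_for_survey INTEGER);
    CREATE TABLE utility_pro_zip_code (utility_pro_id INTEGER, zip_code_id INTEGER);
    CREATE TABLE book_job_request (user_id INTEGER, start_time INTEGER, end_time INTEGER);

    -- Composite indexes in the recommended column order (index names are made up).
    CREATE INDEX bjr_idx ON book_job_request (user_id, start_time, end_time);
    CREATE INDEX upz_idx ON utility_pro_zip_code (zip_code_id, utility_pro_id);
    CREATE INDEX up_idx  ON utility_pro (available_for_survey, user_id, id);

    -- Hypothetical data: user 10 is booked 100-200, user 20 has no bookings,
    -- user 30 is booked 500-600, outside the requested slot.
    INSERT INTO utility_pro VALUES (1, 10, 1), (2, 20, 1), (3, 30, 1);
    INSERT INTO utility_pro_zip_code VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO book_job_request VALUES (10, 100, 200), (30, 500, 600);
""")

slot = {"slot_start": 150, "slot_end": 180}

# Anti-join via LEFT JOIN ... IS NULL.
left_join = conn.execute("""
    SELECT up.user_id, up.id AS utility_pro_id
    FROM utility_pro up
    JOIN utility_pro_zip_code upz ON upz.utility_pro_id = up.id
    LEFT JOIN book_job_request bjr
           ON bjr.user_id = up.user_id
          AND bjr.end_time >= :slot_start
          AND bjr.start_time <= :slot_end
    WHERE up.available_for_survey = 1
      AND upz.zip_code_id = 1
      AND bjr.user_id IS NULL
    ORDER BY up.user_id
""", slot).fetchall()

# Anti-join via NOT EXISTS.
not_exists = conn.execute("""
    SELECT up.user_id, up.id AS utility_pro_id
    FROM utility_pro up
    JOIN utility_pro_zip_code upz ON upz.utility_pro_id = up.id
    WHERE up.available_for_survey = 1
      AND upz.zip_code_id = 1
      AND NOT EXISTS (SELECT 1 FROM book_job_request
                       WHERE user_id = up.user_id
                         AND end_time >= :slot_start
                         AND start_time <= :slot_end)
    ORDER BY up.user_id
""", slot).fetchall()

print(left_join)   # [(20, 2), (30, 3)]
print(not_exists)  # [(20, 2), (30, 3)]
```

On a real MySQL server, running EXPLAIN on each form is still the way to confirm which indexes are actually used.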
I have written an SQL statement that, besides all the other columns, should return the number of comments and the number of likes of a certain post. It works perfectly when I don't try to get the number of times it has been shared too. When I also try to get the number of times it was shared, it returns a wrong number of likes that seems to be the number of shares and likes combined, or something like that. Here is the code:
SELECT
[...],
count(CS.commentId) as shares,
count(CL.commentId) as numberOfLikes
FROM
(SELECT *
FROM accountSpecifics
WHERE institutionId= '{$keyword['id']}') `AS`
INNER JOIN
account A ON A.id = `AS`.accountId
INNER JOIN
comment C ON C.accountId = A.id
LEFT JOIN
commentLikes CL ON C.commentId = CL.commentId
LEFT JOIN
commentShares CS ON C.commentId = CS.commentId
GROUP BY
C.time
ORDER BY
year, month, hour, month
Could you also tell me if you think this is an efficient SQL statement or if you would do it differently? thank you!
Do this instead:
SELECT
[...],
(select count(*) from commentShares CS where C.commentId = CS.commentId) as shares,
(select count(*) from commentLikes CL where C.commentId = CL.commentId) as numberOfLikes
FROM
(SELECT *
FROM accountSpecifics
WHERE institutionId= '{$keyword['id']}') `AS`
INNER JOIN account A ON A.id = `AS`.accountId
INNER JOIN comment C ON C.accountId = A.id
GROUP BY C.time
ORDER BY year, month, hour, month
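If you use JOINs, you get back one result set in which every like row is paired with every share row of the same comment. COUNT(some_column) simply counts the rows where that column is not NULL, so both counts come out the same (likes multiplied by shares when both exist), which is the wrong thing in this case. Subqueries are what you need here. Good luck!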
EDIT: as posted below, count(distinct something) can also work, but it's making the database do more work than necessary for the answer you want to end up with.
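The row multiplication is easy to reproduce. The following is a minimal sketch using Python's sqlite3 module with made-up data (one comment, three likes, two shares) and only the commentId columns, since the rest of the poster's schema is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE comment (commentId INTEGER PRIMARY KEY);
    CREATE TABLE commentLikes (commentId INTEGER);
    CREATE TABLE commentShares (commentId INTEGER);
    -- Hypothetical data: one comment with 3 likes and 2 shares.
    INSERT INTO comment VALUES (1);
    INSERT INTO commentLikes VALUES (1), (1), (1);
    INSERT INTO commentShares VALUES (1), (1);
""")

# Joining both child tables pairs every like with every share: 3 x 2 = 6 rows.
joined = conn.execute("""
    SELECT COUNT(CS.commentId) AS shares, COUNT(CL.commentId) AS numberOfLikes
    FROM comment C
    LEFT JOIN commentLikes CL ON C.commentId = CL.commentId
    LEFT JOIN commentShares CS ON C.commentId = CS.commentId
    GROUP BY C.commentId
""").fetchone()

# Scalar subqueries count each child table independently.
subqueries = conn.execute("""
    SELECT (SELECT COUNT(*) FROM commentShares CS
             WHERE CS.commentId = C.commentId) AS shares,
           (SELECT COUNT(*) FROM commentLikes CL
             WHERE CL.commentId = C.commentId) AS numberOfLikes
    FROM comment C
""").fetchone()

print(joined)      # (6, 6)
print(subqueries)  # (2, 3)
```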
Quick fix:
SELECT
[...],
count(DISTINCT CS.commentId) as shares,
count(DISTINCT CL.commentId) as numberOfLikes
Better approach:
SELECT [...]
, Coalesce(shares.numberOfShares, 0) As numberOfShares
, Coalesce(likes.numberOfLikes , 0) As numberOfLikes
FROM [...]
LEFT
JOIN (
SELECT commentId
, Count(*) As numberOfShares
FROM commentShares
GROUP
BY commentId
) As shares
ON shares.commentId = c.commentId
LEFT
JOIN (
SELECT commentId
, Count(*) As numberOfLikes
FROM commentLikes
GROUP
BY commentId
) As likes
ON likes.commentId = c.commentId
I've created sqlfiddle to try and get my head around this http://sqlfiddle.com/#!2/21e72/1
In the query, I have put a max() on the compiled_date column but the recommendation column is still coming through incorrect - I'm assuming that a select statement will need to be inserted on line 3 somehow?
I've tried the examples provided by the commenters below but I think I just need to understand this from a basic query to begin with.
As others have pointed out, the issue is that some of the select columns are neither aggregated nor used in the group by clause. Most DBMSs won't allow this at all, but MySQL is a little relaxed on some of the standards...
So, you need to first find the max(compiled_date) for each case, then find the recommendation that goes with it.
select r.case_number, r.compiled_date, r.recommendation
from reporting r
join (
SELECT case_number, max(compiled_date) as lastDate
from reporting
group by case_number
) s on r.case_number=s.case_number
and r.compiled_date=s.lastDate
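Here is a small runnable sketch of that join-back-to-the-maximum pattern, using Python's sqlite3 module. The rows are invented, and dates are stored as ISO-8601 text so that MAX() orders them correctly; the real fiddle data may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reporting (case_number INTEGER, compiled_date TEXT,
                            recommendation TEXT);
    -- Hypothetical data: case 1 has two reports, case 2 has one.
    INSERT INTO reporting VALUES
        (1, '2013-01-10', 'zebra'),
        (1, '2013-03-05', 'apple'),
        (2, '2013-02-01', 'close case');
""")

# Step 1 (subquery s): latest compiled_date per case.
# Step 2 (join): fetch the full row that carries that date.
latest = conn.execute("""
    SELECT r.case_number, r.compiled_date, r.recommendation
    FROM reporting r
    JOIN (
        SELECT case_number, MAX(compiled_date) AS lastDate
        FROM reporting
        GROUP BY case_number
    ) s ON r.case_number = s.case_number
       AND r.compiled_date = s.lastDate
    ORDER BY r.case_number
""").fetchall()

print(latest)  # [(1, '2013-03-05', 'apple'), (2, '2013-02-01', 'close case')]
```

If two reports for the same case share the same latest date, this pattern returns both rows, so a tie-breaker column would be needed in that situation.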
Thank you for providing the SQL Fiddle, but only reporting data is given. We would highly appreciate sample data for all the tables.
Anyway, could you try this?
SELECT
`case`.number,
staff.staff_name AS `case owner`,
client.client_name,
`case`.address,
x.mx_date,
report.recommendation
FROM
`case` INNER JOIN (
SELECT case_number, MAX(compiled_date) as mx_date
FROM report
GROUP BY case_number
) x ON x.case_number = `case`.number
INNER JOIN report ON x.case_number = report.case_number AND report.compiled_date = x.mx_date
INNER JOIN client ON `case`.client_number = client.client_number
INNER JOIN staff ON `case`.staff_number = staff.staff_number
WHERE
`case`.active = 1
AND staff.staff_name = 'bob'
ORDER BY
`case`.number ASC;
Check below query:
SELECT c.number, s.staff_name AS `case owner`, cl.client_name,
c.address, MAX(r.compiled_date), r.recommendation
FROM `case` c
INNER JOIN (SELECT r.case_number, r.compiled_date, r.recommendation
FROM report r ORDER BY r.case_number, r.compiled_date DESC
) r ON r.case_number = c.number
INNER JOIN client cl ON c.client_number = cl.client_number
INNER JOIN staff s ON c.staff_number = s.staff_number
WHERE c.active = 1 AND s.staff_name = 'bob'
GROUP BY c.number
ORDER BY c.number ASC
SELECT
`case`.number,
staff.staff_name AS `case owner`,
client.client_name,
`case`.address,
(select MAX(compiled_date) from report where case_number = `case`.number),
report.recommendation
FROM
`case`
INNER JOIN report ON report.case_number = `case`.number
INNER JOIN client ON `case`.client_number = client.client_number
INNER JOIN staff ON `case`.staff_number = staff.staff_number
WHERE
`case`.active = 1 AND
staff.staff_name = 'bob'
GROUP BY
`case`.number
ORDER BY
`case`.number ASC
try this
My host is saying that the following query is taking lots of Server CPU. Please tell me how can I optimize it.
SELECT COUNT(*) FROM (SELECT COUNT(*) AS tot,wallpapers.*,resolutions.res_height,resolutions.res_width FROM wallpapers
INNER JOIN analytics ON analytics.`wall_id` = wallpapers.`wall_id`
INNER JOIN resolutions ON resolutions.`res_id` = wallpapers.`res_id`
WHERE analytics.ana_date >= '2013-09-01 16:36:56' AND wallpapers.wall_status = 'public'
GROUP BY analytics.`wall_id`) as Q
Please note that the analytics table contains the records for all the pageviews and clicks. So it is very very large.
As far as I can tell, your query just counts distinct wall_id values after filtering via the joins and the WHERE clause. Something like this should be close:
SELECT COUNT(DISTINCT analytics.wall_id)
FROM wallpapers
INNER JOIN analytics ON analytics.wall_id = wallpapers.wall_id
INNER JOIN resolutions ON resolutions.res_id = wallpapers.res_id
WHERE analytics.ana_date >= '2013-09-01 16:36:56'
AND wallpapers.wall_status = 'public'
This is your query:
SELECT COUNT(*)
FROM (SELECT COUNT(*) AS tot, wallpapers.*, resolutions.res_height, resolutions.res_width
FROM wallpapers INNER JOIN
analytics
ON analytics.`wall_id` = wallpapers.`wall_id` INNER JOIN
resolutions
ON resolutions.`res_id` = wallpapers.`res_id`
WHERE analytics.ana_date >= '2013-09-01 16:36:56' AND
wallpapers.wall_status = 'public'
GROUP BY analytics.`wall_id`
) as Q
The subquery requires extra effort as does the group by. You can replace this with:
SELECT COUNT(distinct analytics.wall_id)
FROM wallpapers INNER JOIN
analytics
ON analytics.`wall_id` = wallpapers.`wall_id` INNER JOIN
resolutions
ON resolutions.`res_id` = wallpapers.`res_id`
WHERE analytics.ana_date >= '2013-09-01 16:36:56' AND
wallpapers.wall_status = 'public';
You might then be able to do further optimizations using indexes, but it would be helpful to see an explain of this query and the current indexes on the tables.
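As a sanity check that the rewrite counts the same thing, here is a sketch in Python's sqlite3 module. It uses made-up rows and leaves out the resolutions table (assuming every wallpaper has a matching resolution, so that join does not change the count).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wallpapers (wall_id INTEGER PRIMARY KEY, wall_status TEXT);
    CREATE TABLE analytics (wall_id INTEGER, ana_date TEXT);
    -- Hypothetical data: wallpaper 3 is private, wallpaper 2 has only old views.
    INSERT INTO wallpapers VALUES (1, 'public'), (2, 'public'),
                                  (3, 'private'), (4, 'public');
    INSERT INTO analytics VALUES
        (1, '2013-09-02 10:00:00'),
        (1, '2013-09-03 11:00:00'),
        (2, '2013-08-01 09:00:00'),
        (3, '2013-09-05 12:00:00'),
        (4, '2013-09-10 08:00:00');
""")

# Original shape: group inside a subquery, then count the groups.
grouped = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT COUNT(*) AS tot, wallpapers.wall_id
        FROM wallpapers
        INNER JOIN analytics ON analytics.wall_id = wallpapers.wall_id
        WHERE analytics.ana_date >= '2013-09-01 16:36:56'
          AND wallpapers.wall_status = 'public'
        GROUP BY analytics.wall_id
    ) AS Q
""").fetchone()[0]

# Rewritten shape: count the distinct wall_id values directly.
distinct = conn.execute("""
    SELECT COUNT(DISTINCT analytics.wall_id)
    FROM wallpapers
    INNER JOIN analytics ON analytics.wall_id = wallpapers.wall_id
    WHERE analytics.ana_date >= '2013-09-01 16:36:56'
      AND wallpapers.wall_status = 'public'
""").fetchone()[0]

print(grouped, distinct)  # 2 2
```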