Is there a way to optimize this query - mysql

I have written a query, but it takes a long time to run. Is there any way to optimize it without creating a temp table in MySQL? In particular, can the subquery part be optimized? AccessLog2019 is huge, so the query takes forever.
Here is my query
SELECT distinct l.ListingID,l.City,l.ListingStatus,l.Price,l.Bedrooms,l.FullBathrooms, gc.Latitude,gc.Longitude , count(distinct s.AccessLogID) AS access_count, s.LBID , lb.CurrentListingID
from lockbox.Listings l
JOIN lockbox.GeoCoordinates gc ON l.ListingID = gc.ID
LEFT JOIN lockbox.LockBox lb ON l.ListingID = lb.CurrentListingID
LEFT JOIN
(SELECT * FROM lockbox.AccessLog2019 ac where ac.AccessType not in('1DayCodeGen','BluCodeGen','SmartMACGen') AND DATEDIFF(NOW(), ac.UTCAccessedDT ) < 1 ) s
ON lb.LBID = s.LBID
WHERE l.AssocID = 'AS00000000CC' AND (gc.Confidence <> '5 - Unmatchable' OR gc.Confidence IS NULL OR gc.Confidence = ' ')
group BY l.ListingID
Thanks

If you can avoid the outer group by, that is a big win. I am thinking:
SELECT l.ListingID, l.City, l.ListingStatus, l.Price, l.Bedrooms, l.FullBathrooms,
gc.Latitude, gc.Longitude,
(select count(*)
from lockbox.LockBox lb join
lockbox.AccessLog2019 ac
on lb.LBID = ac.LBID
where l.ListingID = lb.CurrentListingID and
ac.AccessType not in ('1DayCodeGen', 'BluCodeGen', 'SmartMACGen') and
DATEDIFF(NOW(), ac.UTCAccessedDT) < 1
) as cnt
from lockbox.Listings l JOIN
lockbox.GeoCoordinates gc
ON l.ListingID = gc.ID
WHERE l.AssocID = 'AS00000000CC' AND
(gc.Confidence <> '5 - Unmatchable' OR
gc.Confidence IS NULL OR
gc.Confidence = ' '
)
Note: This does not select s.LBID or lb.CurrentListingID because these don't make sense in your query. If I understand correctly, these could have different values on different rows.
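To check the idea on a small scale, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table and column names are simplified and the data is invented for illustration; it only shows that the correlated COUNT(*) subquery returns the same per-listing counts as the LEFT JOIN plus GROUP BY version, not how either performs on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE listings   (listing_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE lockbox    (lbid INTEGER PRIMARY KEY, current_listing_id INTEGER);
    CREATE TABLE access_log (access_log_id INTEGER PRIMARY KEY, lbid INTEGER);
    INSERT INTO listings   VALUES (1, 'Austin'), (2, 'Boise'), (3, 'Chico');
    INSERT INTO lockbox    VALUES (10, 1), (20, 2);
    INSERT INTO access_log VALUES (100, 10), (101, 10), (102, 20);
""")

# Original shape: join everything, then collapse with GROUP BY.
grouped = conn.execute("""
    SELECT l.listing_id, COUNT(DISTINCT a.access_log_id)
    FROM listings l
    LEFT JOIN lockbox lb   ON lb.current_listing_id = l.listing_id
    LEFT JOIN access_log a ON a.lbid = lb.lbid
    GROUP BY l.listing_id
    ORDER BY l.listing_id
""").fetchall()

# Suggested shape: one correlated subquery per listing, no outer GROUP BY.
correlated = conn.execute("""
    SELECT l.listing_id,
           (SELECT COUNT(*)
            FROM lockbox lb
            JOIN access_log a ON a.lbid = lb.lbid
            WHERE lb.current_listing_id = l.listing_id) AS cnt
    FROM listings l
    ORDER BY l.listing_id
""").fetchall()

print(grouped)
print(correlated)
```

Listing 3 has no lockbox, and both forms report a count of 0 for it.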

You could try breaking out the subquery to the JOIN clause.
It might give a hint to the optimizer that it can use the LBID field first, and then test the AccessType later (in case the optimizer doesn't figure that out when you have the sub-select).
SELECT distinct l.ListingID,l.City,l.ListingStatus,l.Price,l.Bedrooms,l.FullBathrooms, gc.Latitude,gc.Longitude , count(distinct s.AccessLogID) AS access_count, s.LBID , lb.CurrentListingID
from lockbox.Listings l
JOIN lockbox.GeoCoordinates gc ON l.ListingID = gc.ID
LEFT JOIN lockbox.LockBox lb ON l.ListingID = lb.CurrentListingID
LEFT JOIN AccessLog2019 s
ON lb.LBID = s.LBID
AND s.AccessType not in('1DayCodeGen','BluCodeGen','SmartMACGen')
AND DATEDIFF(NOW(), s.UTCAccessedDT ) < 1
WHERE l.AssocID = 'AS00000000CC' AND (gc.Confidence <> '5 - Unmatchable' OR gc.Confidence IS NULL OR gc.Confidence = ' ')
group BY l.ListingID
Note that this is one of those cases where putting conditions in the JOIN clause behaves differently from putting them in a WHERE clause. If the JOIN had only lb.LBID = s.LBID and the other conditions were in the WHERE of the outer query, the results would be different: rows with no matching AccessLog2019 record have NULLs in the s columns, fail those conditions, and are excluded, so the LEFT JOIN effectively becomes an inner join. In the JOIN clause, the conditions are part of the outer join, so listings without matching access records are still returned.
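The difference between filtering in ON and filtering in WHERE can be seen on a toy example. This is a sketch using Python's sqlite3 module with made-up tables and rows (the names are simplified, not the real schema); the same outer-join semantics apply in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lockbox    (lbid INTEGER PRIMARY KEY);
    CREATE TABLE access_log (id INTEGER PRIMARY KEY, lbid INTEGER, access_type TEXT);
    INSERT INTO lockbox VALUES (1), (2), (3);
    -- lockbox 1: one real access; lockbox 2: only an excluded type; lockbox 3: no rows
    INSERT INTO access_log VALUES (1, 1, 'Open'), (2, 2, 'BluCodeGen');
""")

# Filter inside the ON clause: every lockbox survives, unmatched ones count 0.
filter_in_on = conn.execute("""
    SELECT lb.lbid, COUNT(s.id)
    FROM lockbox lb
    LEFT JOIN access_log s
           ON s.lbid = lb.lbid AND s.access_type <> 'BluCodeGen'
    GROUP BY lb.lbid
    ORDER BY lb.lbid
""").fetchall()

# Same filter in WHERE: NULL-extended rows fail the test and disappear.
filter_in_where = conn.execute("""
    SELECT lb.lbid, COUNT(s.id)
    FROM lockbox lb
    LEFT JOIN access_log s ON s.lbid = lb.lbid
    WHERE s.access_type <> 'BluCodeGen'
    GROUP BY lb.lbid
    ORDER BY lb.lbid
""").fetchall()

print(filter_in_on)
print(filter_in_where)
```

With the filter in ON, all three lockboxes come back; with the filter in WHERE, only lockbox 1 remains.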

SELECT * --> Select only the columns needed.
SELECT DISTINCT ... GROUP BY -- Do one or the other, not both.
Need composite INDEX(AssocID, ListingID) on Listings (in that order).
DATEDIFF(NOW(), ac.UTCAccessedDT) < 1 --> ac.UTCAccessedDT > NOW() - INTERVAL 1 DAY (or whatever your intent was; ac.UTCAccessedDT >= CURDATE() is the exact equivalent of the DATEDIFF test). Then add INDEX(UTCAccessedDT).
OR is hard to optimize; consider cleansing the data so that Confidence does not have 3 values that mean the same thing.
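The point about DATEDIFF is that wrapping the indexed column in a function hides it from the index. A rough illustration, using Python's sqlite3 module rather than MySQL (so the date functions and the plan text are SQLite's, and the table is a hypothetical miniature of AccessLog2019):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE access_log (id INTEGER PRIMARY KEY, accessed_at TEXT);
    CREATE INDEX idx_accessed_at ON access_log (accessed_at);
""")

def plan(sql):
    """Return the EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Column buried inside a function call: the index cannot narrow the search.
wrapped = plan(
    "SELECT id FROM access_log "
    "WHERE julianday('now') - julianday(accessed_at) < 1"
)

# Bare column compared with a computed constant: a range search on the index.
bare = plan(
    "SELECT id FROM access_log "
    "WHERE accessed_at > datetime('now', '-1 day')"
)

print(wrapped)
print(bare)
```

The first plan reports a SCAN and the second a SEARCH on idx_accessed_at. In MySQL the equivalent check is to run EXPLAIN on both forms of the query.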

Related

How to show the repeated value as NULL in sql?

I have a query which gives result as below, how to replace duplicate values with NULL
Query:
SELECT
word.lemma,
synset.definition,
synset.pos,
sampletable.sample
FROM
word
LEFT JOIN
sense ON word.wordid = sense.wordid
LEFT JOIN
synset ON sense.synsetid = synset.synsetid
LEFT JOIN
sampletable ON synset.synsetid = sampletable.synsetid
WHERE
word.lemma = 'good'
Result:
Required Result: all the greyed out results as NULL
First, this is the type of transformation that is generally better done at the application level. The reason is that it presupposes that the result set is in a particular order -- and you seem to be assuming this even with no order by clause.
Second, it is often simpler in the application.
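As a sketch of the application-level approach, assuming the rows arrive already sorted so that equal definitions are adjacent: the function name blank_repeats and the sample rows are made up for illustration.

```python
def blank_repeats(rows, key_index):
    """Replace a column's value with None when it repeats the previous row's value."""
    out = []
    previous = object()  # sentinel that never equals real data
    for row in rows:
        row = list(row)
        current = row[key_index]
        if current == previous:
            row[key_index] = None
        previous = current
        out.append(tuple(row))
    return out

# Hypothetical result set: (lemma, definition, pos, sample), sorted by definition.
rows = [
    ("good", "having desirable qualities", "a", "good news"),
    ("good", "having desirable qualities", "a", "a good report card"),
    ("good", "morally admirable", "a", "a good person"),
]

result = blank_repeats(rows, key_index=1)
for row in result:
    print(row)
```

Only the first row of each run keeps its definition, so the query itself still needs an ORDER BY that groups equal definitions together.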
However, in MySQL 8+, it is not that hard. You can do:
SELECT w.lemma,
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY w.lemma, ss.definition ORDER BY st.sample) = 1
THEN ss.definition
END) as definition,
ss.pos,
st.sample
FROM word w LEFT JOIN
sense s
ON w.wordid = s.wordid LEFT JOIN
synset ss
ON s.synsetid = ss.synsetid LEFT JOIN
sampletable st
ON ss.synsetid = st.synsetid
WHERE w.lemma = 'good'
ORDER BY w.lemma, ss.definition, st.sample;
For this to work reliably, the outer ORDER BY clause needs to be compatible with the ORDER BY for the window function.
If you are using MySQL 8, try RANK(). I didn't have your tables or data, so I couldn't test this query.
SELECT
t1.lemma
,CASE WHEN t1.r = 1 THEN t1.definition ELSE NULL END AS definition
,t1.pos
,t1.sample
FROM
(
SELECT
t.lemma
,t.definition
,t.pos
,t.sample
-- rank by sample so that only the first row of each definition gets r = 1
,RANK() OVER (PARTITION BY t.definition ORDER BY t.sample) r
FROM
(
SELECT
word.lemma,
synset.definition,
synset.pos,
sampletable.sample
FROM
word
LEFT JOIN
sense ON word.wordid = sense.wordid
LEFT JOIN
synset ON sense.synsetid = synset.synsetid
LEFT JOIN
sampletable ON synset.synsetid = sampletable.synsetid
WHERE
word.lemma = 'good'
) t
) t1
ORDER BY t1.lemma, t1.definition, t1.sample;

How to get rid of OR condition in inner join [duplicate]

I have a complex query which requires fields from a total of 4 tables. I have one inner join statement which has an OR clause, and this is slowing the query down drastically.
This is my query:
SELECT
pending_corrections.sightinguid AS 'pending_corrections_sightinguid',
vehicle_ownership.id AS 'fk_vehicle_owner',
#bill_id AS 'fk_bills',
#nullValue AS 'fk_final_sightings_sightinguid',
TRIM(pending_corrections.corrected_plate) AS 'vrn',
pending_corrections.seenDate AS 'seen_date',
cameras.in_out AS 'in_out',
vehicle_vrn.fk_sysno AS 'fk_sysno',
cameras.zone AS 'fk_zones',
'0' AS 'auto_generated'
FROM
(pending_corrections
INNER JOIN cameras ON pending_corrections.camerauid = cameras.camera_id)
INNER JOIN
vehicle_vrn ON (pending_corrections.corrected_plate = vehicle_vrn.vrn500
OR pending_corrections.corrected_plate = vehicle_vrn.vrnno)
INNER JOIN
vehicle_ownership ON vehicle_vrn.fk_sysno = vehicle_ownership.fk_sysno
WHERE
pending_corrections.corrected_plate <> ''
AND pending_corrections.corrected_plate IS NOT NULL
AND pending_corrections.unable_to_correct <> '1'
AND pending_corrections.seenDate >= #dateFrom
AND pending_corrections.seenDate <= #dateTo
AND (cameras.in_out = 1 OR cameras.in_out = 0)
AND cameras.zone IN (SELECT
zone_number
FROM
zones
WHERE
fk_site = #siteId)
AND seenDate >= vehicle_vrn.vrn_start_date
AND (seenDate <= vehicle_vrn.vrn_end_date
OR vehicle_vrn.vrn_end_date IS NULL
OR vehicle_vrn.vrn_end_date = '0001-01-01 00:00:00')
AND seenDate >= vehicle_ownership.ownership_start_date
AND (seenDate <= vehicle_ownership.ownership_end_date
OR vehicle_ownership.ownership_end_date IS NULL
OR vehicle_ownership.ownership_end_date = '0001-01-01 00:00:00')
ORDER BY pending_corrections.corrected_plate , pending_corrections.seenDate ASC;
How can I achieve the same effect but without the OR in one of the joins? The reason for the OR clause is because the pending_corrections.corrected_plate value has to match either the vrn500 or vrnno columns in the vehicle_vrn table.
Instead of using two equality expressions with an OR, you could use an IN expression such as:
FROM
(pending_corrections
INNER JOIN cameras ON pending_corrections.camerauid = cameras.camera_id)
INNER JOIN
vehicle_vrn ON pending_corrections.corrected_plate IN(vehicle_vrn.vrn500, vehicle_vrn.vrnno)
INNER JOIN
vehicle_ownership ON vehicle_vrn.fk_sysno = vehicle_ownership.fk_sysno
You can replace the first OR using IN(), as Scott mentioned
pending_corrections.corrected_plate IN (vehicle_vrn.vrn500, vehicle_vrn.vrnno)
If you can UPDATE the vrn_end_date and ownership_end_date columns from '0001-01-01 00:00:00' to NULL, the other OR conditions can be simplified to
AND (seenDate <= IFNULL(vehicle_vrn.vrn_end_date, seenDate))
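What does the execution plan look like? The optimizer probably can't turn this join into a hash or merge join, so it has to fall back to table scans.
Sometimes with ORs, a UNION works well. It's more code, but it can run faster because each branch is a plain equality join that the optimizer can handle better. (The version below uses SQL Server syntax: CROSS APPLY and ISNULL.)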
; WITH PendingCorrectionsCte AS
(
SELECT pc.corrected_plate,
pc.seenDate,
c.in_out,
pc.sightinguid AS 'pending_corrections_sightinguid'
FROM pending_corrections pc
INNER JOIN cameras c ON pc.camerauid = c.camera_id
WHERE NULLIF(LTRIM(pc.corrected_plate), '') IS NOT NULL
AND pc.unable_to_correct <> '1'
AND pc.seenDate >= #dateFrom
AND pc.seenDate <= #dateTo
AND c.in_out IN (0, 1)
AND c.zone IN (SELECT zone_number FROM zones WHERE fk_site = #siteId)
),
VrnCte AS
(
-- Branch 1: plate matches vrn500
SELECT pc.corrected_plate,
pc.seenDate,
vrn.fk_sysno,
pc.in_out,
pc.pending_corrections_sightinguid
FROM PendingCorrectionsCte pc
INNER JOIN vehicle_vrn vrn ON pc.corrected_plate = vrn.vrn500
-- Could also do this inline in the where clause, but chaining isnull and nullif could get hard to read
CROSS APPLY (SELECT NULLIF(vrn.vrn_end_date, '0001-01-01 00:00:00') AS value) vrn_end_date
WHERE pc.seenDate BETWEEN vrn.vrn_start_date AND ISNULL(vrn_end_date.value, pc.seenDate)
UNION
-- Branch 2: plate matches vrnno
SELECT pc.corrected_plate,
pc.seenDate,
vrn.fk_sysno,
pc.in_out,
pc.pending_corrections_sightinguid
FROM PendingCorrectionsCte pc
INNER JOIN vehicle_vrn vrn ON pc.corrected_plate = vrn.vrnno
-- Could also do this inline in the where clause, but chaining isnull and nullif could get hard to read
CROSS APPLY (SELECT NULLIF(vrn.vrn_end_date, '0001-01-01 00:00:00') AS value) vrn_end_date
WHERE pc.seenDate BETWEEN vrn.vrn_start_date AND ISNULL(vrn_end_date.value, pc.seenDate)
)
SELECT VrnCte.pending_corrections_sightinguid,
vo.id AS 'fk_vehicle_owner',
#bill_id AS 'fk_bills',
#nullValue AS 'fk_final_sightings_sightinguid',
TRIM(VrnCte.corrected_plate) AS 'vrn',
VrnCte.seenDate AS 'seen_date',
VrnCte.in_out AS 'in_out',
VrnCte.fk_sysno AS 'fk_sysno',
'0' AS 'auto_generated'
FROM VrnCte
INNER JOIN vehicle_ownership vo ON VrnCte.fk_sysno = vo.fk_sysno
CROSS APPLY (SELECT NULLIF(vo.ownership_end_date, '0001-01-01 00:00:00') AS value) ownership_end_date
WHERE VrnCte.seenDate BETWEEN vo.ownership_start_date AND ISNULL(ownership_end_date.value, VrnCte.seenDate)
ORDER BY
VrnCte.corrected_plate,
VrnCte.seenDate ASC
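To see that splitting an OR join into a UNION of two equality joins returns the same rows, here is a small sketch using Python's sqlite3 module. The two tables are cut down to the columns involved in the join and the rows are invented; it checks result equivalence only and says nothing about speed on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pending_corrections (sightinguid INTEGER PRIMARY KEY, corrected_plate TEXT);
    CREATE TABLE vehicle_vrn (fk_sysno INTEGER, vrn500 TEXT, vrnno TEXT);
    CREATE INDEX idx_vrn500 ON vehicle_vrn (vrn500);
    CREATE INDEX idx_vrnno  ON vehicle_vrn (vrnno);
    INSERT INTO pending_corrections VALUES
        (1, 'ABC123'), (2, 'XYZ789'), (3, 'NOPE00'), (4, 'SAME01');
    INSERT INTO vehicle_vrn VALUES
        (7, 'ABC123', 'OLD111'),   -- matches on vrn500
        (8, 'OLD222', 'XYZ789'),   -- matches on vrnno
        (9, 'SAME01', 'SAME01');   -- matches on both columns
""")

# Original shape: one join with an OR between two columns.
with_or = conn.execute("""
    SELECT pc.sightinguid, v.fk_sysno
    FROM pending_corrections pc
    JOIN vehicle_vrn v
      ON pc.corrected_plate = v.vrn500 OR pc.corrected_plate = v.vrnno
    ORDER BY 1, 2
""").fetchall()

# Rewritten shape: two equality joins, merged with UNION (which removes
# the duplicate produced when a plate matches both columns).
with_union = conn.execute("""
    SELECT pc.sightinguid, v.fk_sysno
    FROM pending_corrections pc
    JOIN vehicle_vrn v ON pc.corrected_plate = v.vrn500
    UNION
    SELECT pc.sightinguid, v.fk_sysno
    FROM pending_corrections pc
    JOIN vehicle_vrn v ON pc.corrected_plate = v.vrnno
    ORDER BY 1, 2
""").fetchall()

print(with_or)
print(with_union)
```

UNION rather than UNION ALL matters here: sighting 4 matches vehicle 9 on both columns and would otherwise appear twice.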

Join between sub-queries in SQLAlchemy

In relation to the answer I accepted for this post, SQL Group By and Limit issue, I need to figure out how to create that query using SQLAlchemy. For reference, the query I need to run is:
SELECT t.id, t.creation_time, c.id, c.creation_time
FROM (SELECT id, creation_time
FROM thread
ORDER BY creation_time DESC
LIMIT 5
) t
LEFT OUTER JOIN comment c ON c.thread_id = t.id
WHERE 3 >= (SELECT COUNT(1)
FROM comment c2
WHERE c.thread_id = c2.thread_id
AND c.creation_time <= c2.creation_time
)
I have the first half of the query, but I am struggling with the syntax for the WHERE clause and how to combine it with the JOIN. Does anyone have any suggestions?
Thanks!
EDIT: First attempt seems to mess up around the .filter() call:
c = aliased(Comment)
c2 = aliased(Comment)
subq = db.session.query(Thread.id).filter_by(topic_id=122098).order_by(Thread.creation_time.desc()).limit(2).offset(2).subquery('t')
subq2 = db.session.query(func.count(1).label("count")).filter(c.id==c2.id).subquery('z')
q = db.session.query(subq.c.id, c.id).outerjoin(c, c.thread_id==subq.c.id).filter(3 >= subq2.c.count)
this generates the following SQL:
SELECT t.id AS t_id, comment_1.id AS comment_1_id
FROM (SELECT count(1) AS count
FROM comment AS comment_1, comment AS comment_2
WHERE comment_1.id = comment_2.id) AS z, (SELECT thread.id AS id
FROM thread
WHERE thread.topic_id = :topic_id ORDER BY thread.creation_time DESC
LIMIT 2 OFFSET 2) AS t LEFT OUTER JOIN comment AS comment_1 ON comment_1.thread_id = t.id
WHERE z.count <= 3
Notice the sub-query ordering is incorrect, and subq2 somehow is selecting from comment twice. Manually fixing that gives the right results, I am just unsure of how to get SQLAlchemy to get it right.
Try this:
c = db.aliased(Comment, name='c')
c2 = db.aliased(Comment, name='c2')
sq = (db.session
.query(Thread.id, Thread.creation_time)
.order_by(Thread.creation_time.desc())
.limit(5)
).subquery(name='t')
sq2 = (
db.session.query(db.func.count(1))
.select_from(c2)
.filter(c.thread_id == c2.thread_id)
.filter(c.creation_time <= c2.creation_time)
.correlate(c)
.as_scalar()  # renamed to .scalar_subquery() in SQLAlchemy 1.4+
)
q = (db.session
.query(
sq.c.id, sq.c.creation_time,
c.id, c.creation_time,
)
.outerjoin(c, c.thread_id == sq.c.id)
.filter(3 >= sq2)
)
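To sanity-check what the target SQL returns, the raw query from the question can be run against a throwaway SQLite database with Python's sqlite3 module. This is only a sketch: the two tables carry just the columns the query touches, creation_time is an integer for brevity, and the rows are invented. It does not exercise SQLAlchemy itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thread  (id INTEGER PRIMARY KEY, creation_time INTEGER);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, thread_id INTEGER, creation_time INTEGER);
    INSERT INTO thread VALUES (1, 100), (2, 200);
    -- thread 1 has five comments, thread 2 has none
    INSERT INTO comment VALUES
        (1, 1, 101), (2, 1, 102), (3, 1, 103), (4, 1, 104), (5, 1, 105);
""")

rows = conn.execute("""
    SELECT t.id, c.id
    FROM (SELECT id, creation_time
          FROM thread
          ORDER BY creation_time DESC
          LIMIT 5) t
    LEFT OUTER JOIN comment c ON c.thread_id = t.id
    WHERE 3 >= (SELECT COUNT(1)
                FROM comment c2
                WHERE c.thread_id = c2.thread_id
                  AND c.creation_time <= c2.creation_time)
    ORDER BY t.id, c.id
""").fetchall()

print(rows)
```

The correlated count keeps a comment only if at most three comments in its thread are as new or newer, which leaves the three latest comments of thread 1. Thread 2 survives with a NULL comment because its count is 0.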

Slow MySQL query with subquery from table

I am trying to bring back a string based on an IF statement but it is extremely slow.
It has something to do with the first subquery but I am unsure of how to rearrange this as to bring back the same results but faster.
Here is my SQL:
SELECT IF
(
(
SELECT COUNT(*)
FROM
(
SELECT DISTINCT enquiryId, type
FROM parts_enquiries, parts_service_types AS pst
WHERE parts_enquiries.serviceTypeId = pst.id
) AS parts
WHERE parts.enquiryId = enquiries.id
) > 1, 'Mixed',
(
SELECT DISTINCT type
FROM parts_enquiries, parts_service_types AS pst
WHERE parts_enquiries.serviceTypeId = pst.id AND enquiryId = enquiries.id
)
) AS partTypes
FROM enquiries,
entities
WHERE enquiries.entityId = entities.id
How can I make it faster?
I have modified my original query below, but I am getting the error that subquery returns more than one row:
SELECT
(SELECT
CASE WHEN COUNT(DISTINCT type) > 1 THEN 'Mixed' ELSE `type` END AS type
FROM parts_enquiries
INNER JOIN parts_service_types AS pst ON parts_enquiries.serviceTypeId = pst.id
INNER JOIN enquiries ON parts_enquiries.enquiryId = enquiries.id
INNER JOIN entities ON enquiries.entityId = entities.id
GROUP BY enquiryId) AS partTypes
FROM enquiries,
entities
WHERE enquiries.entityId = entities.id
Please have a look if this query yields the same results:
SELECT
enquiryId,
CASE WHEN COUNT(DISTINCT type) > 1 THEN 'Mixed' ELSE MIN(`type`) END AS type
FROM parts_enquiries
INNER JOIN parts_service_types AS pst ON parts_enquiries.serviceTypeId = pst.id
INNER JOIN enquiries ON parts_enquiries.enquiryId = enquiries.id
INNER JOIN entities ON enquiries.entityId = entities.id
GROUP BY enquiryId
But N.B.'s comment is still valid. To see whether an index is used, and other information, we need to see the EXPLAIN output and the table definitions.
This should get you what you want.
I would first pre-query your parts enquiries and parts service types, looking for both the distinct count and the MINIMUM of the part 'type', grouped by the enquiry ID.
Then, run your IF() against that result. If the distinct count is > 1, then 'Mixed'. If there is only one, MIN() simply returns the description of that one value, which is what you want anyhow.
SELECT
E.ID,
IF ( PreQuery.DistTypes > 1, 'Mixed', PreQuery.FirstType ) as PartType
from
Enquiries E
JOIN ( SELECT
PE.EnquiryID,
COUNT( DISTINCT PE.ServiceTypeID ) as DistTypes,
MIN( PST.Type ) as FirstType
from
Parts_Enquiries PE
JOIN Parts_Service_Types PST
ON PE.ServiceTypeID = PST.ID
group by
PE.EnquiryID ) as PreQuery
ON E.ID = PreQuery.EnquiryID
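A small runnable check of the pre-aggregation idea, using Python's sqlite3 module as a stand-in for MySQL. The tables hold only the columns the query needs, the rows are invented, and CASE replaces MySQL's IF() so the statement runs on SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enquiries (id INTEGER PRIMARY KEY);
    CREATE TABLE parts_service_types (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE parts_enquiries (enquiryId INTEGER, serviceTypeId INTEGER);
    INSERT INTO enquiries VALUES (1), (2), (3);
    INSERT INTO parts_service_types VALUES (1, 'Repair'), (2, 'Service');
    -- enquiry 1 mixes two types, enquiry 2 uses one type twice, enquiry 3 has no parts
    INSERT INTO parts_enquiries VALUES (1, 1), (1, 2), (2, 2), (2, 2);
""")

rows = conn.execute("""
    SELECT e.id,
           CASE WHEN pre.dist_types > 1 THEN 'Mixed'
                ELSE pre.first_type END AS part_type
    FROM enquiries e
    JOIN (SELECT pe.enquiryId,
                 COUNT(DISTINCT pe.serviceTypeId) AS dist_types,
                 MIN(pst.type) AS first_type
          FROM parts_enquiries pe
          JOIN parts_service_types pst ON pe.serviceTypeId = pst.id
          GROUP BY pe.enquiryId) AS pre
      ON e.id = pre.enquiryId
    ORDER BY e.id
""").fetchall()

print(rows)
```

The derived table is computed once for all enquiries instead of once per row. Because this is an inner join, enquiry 3 (no parts) is dropped; a LEFT JOIN would keep it with a NULL part type.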

MySQL evaluating results

I have a funny MySQL query that needs to pull a subquery from another table. I'm wondering whether it is even possible to get MySQL to evaluate the subquery.
Example:
(I had to replace some brackets with 'gte' & 'lte' because they were screwing up the post format.)
select a.id,a.alloyname,a.label,a.symbol, g.grade,
if(a.id = 1,(
(((select avg(cost/2204.6) as averageCost from nas_cost where cost != '0' and `date` gte '2011-03-01' and `date` lte '2011-03-31') - t.value) * (astm.astm/100) * 1.2)
),(a.formulae)) as thisValue
from nas_alloys a
left join nas_triggers t on t.alloyid = a.id
left join nas_astm astm on astm.alloyid = a.id
left join nas_estimatedprice ep on ep.alloyid = a.id
left join nas_grades g on g.id = astm.gradeid
where a.id = '1' or a.id = '2'
order by g.grade;
So when a.id is not 1, the IF returns a.formulae, the value stored in the nas_alloys table, which is:
((ep.estPrice - t.value) * (astm.astm/100) * 0.012)
Basically I want this query to run as:
select a.id,a.alloyname,a.label,a.symbol, g.grade,
if(a.id = 1,(
(((select avg(cost/2204.6) as averageCost from nas_cost where cost != '0' and `date` gte '2011-03-01' and `date` lte '2011-03-31') - t.value) * (astm.astm/100) * 1.2)
),((ep.estPrice - t.value) * (astm.astm/100) * 0.012)) as thisValue
from nas_alloys a
left join nas_triggers t on t.alloyid = a.id
left join nas_astm astm on astm.alloyid = a.id
left join nas_estimatedprice ep on ep.alloyid = a.id
left join nas_grades g on g.id = astm.gradeid
where a.id = '1' or a.id = '2'
order by g.grade;
Btw, when a.id != '1' there are about 30 different possibilities for a.formulae, and they change frequently, so hard-coding multiple IF statements is not really an option. [Redesigning the business logic is more likely than that!]
Anyway, any thoughts? Will this even work?
-thanks
-sean
Create a Stored Function to compute that value for you, and pass the params you will decide later on. When your business logic changes, you just have to update the Stored Function.
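The idea of moving the formula into a function can be sketched with Python's sqlite3 module, where a user-defined SQL function stands in for a MySQL stored function. Everything here is an assumption made for illustration: the function name alloy_value, its parameters, the cut-down alloys table and the numbers. Only the two formulas come from the question.

```python
import sqlite3

def alloy_value(alloy_id, est_price, trigger_value, astm, avg_cost):
    """Hypothetical per-alloy pricing rule, mirroring the two formulas in the question."""
    if alloy_id == 1:
        return (avg_cost - trigger_value) * (astm / 100) * 1.2
    return (est_price - trigger_value) * (astm / 100) * 0.012

conn = sqlite3.connect(":memory:")
# Register the Python function so SQL statements can call it by name.
conn.create_function("alloy_value", 5, alloy_value)
conn.executescript("""
    CREATE TABLE alloys (id INTEGER PRIMARY KEY, est_price REAL,
                         trigger_value REAL, astm REAL);
    INSERT INTO alloys VALUES (1, 0.0, 2.0, 50.0), (2, 102.0, 2.0, 50.0);
""")

# The average cost is passed in as a parameter instead of a nested subquery.
rows = conn.execute(
    "SELECT id, alloy_value(id, est_price, trigger_value, astm, ?) "
    "FROM alloys ORDER BY id",
    (12.0,),
).fetchall()

print(rows)
```

The query text stays the same when a formula changes; only the function body is edited, which is the same benefit the stored function gives in MySQL.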