Optimizing a where statement MYSQL - mysql

Im writing this complex query to return a large dataset, which is about 100,000 records. The query runs fine until i add in this OR statement to the WHERE clause:
AND (responses.StrategyFk = strategies.Id Or responses.StrategyFk IS
Null)
Now i understand that by putting the or statement in there it adds a lot of overhead.
Without that statement and just:
AND responses.StrategyFk = strategies.Id
The query runs within 15 seconds, but doesn't return any records that didn't have a fk linking a strategie.
Although i would like these records as well. Is there an easier way to find both records with a simple where statement? I can't just add another AND statement for null records because that will break the previous statement. Kind of unsure of where to go from here.
Heres the lower half of my query.
FROM
responses, subtestinstances, students, schools, items,
strategies, subtests
WHERE
subtestinstances.Id = responses.SubtestInstanceFk
AND subtestinstances.StudentFk = students.Id
AND students.SchoolFk = schools.Id
AND responses.ItemFk = items.Id
AND (responses.StrategyFk = strategies.Id Or responses.StrategyFk IS Null)
AND subtests.Id = subtestinstances.SubtestFk

try:
SELECT ... FROM
responses
JOIN subtestinstances ON subtestinstances.Id = responses.SubtestInstanceFk
JOIN students ON subtestinstances.StudentFk = students.Id
JOIN schools ON students.SchoolFk = schools.Id
JOIN items ON responses.ItemFk = items.Id
JOIN subtests ON subtests.Id = subtestinstances.SubtestFk
LEFT JOIN strategies ON responses.StrategyFk = strategies.Id
That's it. No OR condition is really needed, because that's what a LEFT JOIN does in this case. Anywhere responses.StrategyFk IS NULL will result in no match to the strategies table, and it wil return a row for that.
See this link for a simple explanation of joins: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
After that, if you're still having performance issues then you can start looking at the EXPLAIN SELECT ... ; output and looking for indexes that may need to be added. Optimizing Queries With Explain -- MySQL Manual

Try using explicit JOINs:
...
FROM responses a
INNER JOIN subtestinstances b
ON b.id = a.subtestinstancefk
INNER JOIN students c
ON c.id = b.studentfk
INNER JOIN schools d
ON d.id = c.schoolfk
INNER JOIN items e
ON e.id = a.itemfk
INNER JOIN subtests f
ON f.id = b.subtestfk
LEFT JOIN strategies g
ON g.id = a.strategyfk

Related

SQL query optimization for speed

So I was working on the problem of optimizing the following query I have already optimized this to the fullest from my side can this be further optimized?
select distinct name ad_type
from dim_ad_type x where exists ( select 1
from sum_adserver_dimensions sum
left join dim_ad_tag_map on dim_ad_tag_map.id=sum.ad_tag_map_id and dim_ad_tag_map.client_id=sum.client_id
left join dim_site on dim_site.id = dim_ad_tag_map.site_id
left join dim_geo on dim_geo.id = sum.geo_id
left join dim_region on dim_region.id=dim_geo.region_id
left join dim_device_category on dim_device_category.id=sum.device_category_id
left join dim_ad_unit on dim_ad_unit.id=dim_ad_tag_map.ad_unit_id
left join dim_monetization_channel on dim_monetization_channel.id=dim_ad_tag_map.monetization_channel_id
left join dim_os on dim_os.id = sum.os_id
left join dim_ad_type on dim_ad_type.id = dim_ad_tag_map.ad_type_id
left join dim_integration_type on dim_integration_type.id = dim_ad_tag_map.integration_type_id
where sum.client_id = 50
and dim_ad_type.id=x.id
)
order by 1
Your query although joined ok, is an overall bloat. You are using the dim_ad_type table on the outside, just to make sure it exists on the inside as well. You have all those left-joins that have NO bearing on the final outcome, why are they even there. I would simplify by reversing the logic. By tracing your INNER query for the same dim_ad_type table, I find the following is the direct line. sum -> dim_ad_tag_map -> dim_ad_type. Just run that.
select distinct
dat.name Ad_Type
from
sum_adserver_dimensions sum
join dim_ad_tag_map tm
on sum.ad_tag_map_id = tm.id
and sum.client_id = tm.client_id
join dim_ad_type dat
on tm.ad_type_id = dat.id
where
sum.client_id = 50
order by
1
Your query was running ALL dim_ad_types, then finding all the sums just to find those that matched. Run it direct starting with the one client, then direct with JOINs.

sql inner and left join + performance

I have 3 tables:
room
room location
room_storys
If anyone creates a room, everytime comes a row automaticlly to room location. room_storys table can be empty.
Now I want to inner join the tables.
If I make this I get no results:
SELECT
r.name,
r.date,
rl.city,
rl.street,
rl.number,
rl.name,
rs.source,
rs.date
FROM room r
INNER JOIN room_location rl
ON rl.room_id = 67
INNER JOIN room_storys rs
ON rs.room_id = 67
LIMIT 1;
If I make this:
INNER JOIN room_storys rs
ON rs.room_id = 67
to this:
LEFT JOIN room_storys rs
ON rs.room_id = 67
```
then it works. But I heard that left join has no good performance, how you would perform this query above? Or is that okey?
Every JOIN type has its pros and cons but, for a query like yours, the β€œcost” is negligible. One recommendation would be to have the tables reference each other rather than a specific id as part of the ON clause. Specificity can be crucial with OUTER JOINs in some situations. Your query can be written like this:
SELECT r.`name`,
r.`date`,
rl.`city`,
rl.`street`,
rl.`number`,
rl.`name`,
rs.`source`,
rs.`date`
FROM `room` r INNER JOIN `room_location` rl ON r.`id` = rl.`room_id`
LEFT OUTER JOIN `room_storys` rs ON rl.`room_id` = rs.`room_id`
WHERE r.`id` = 67;
This change solves the problem of returning everything in room, which was mitigated by the LIMIT 1. It also ensures a record is returned even if there is nothing in room_stories for the room_id.
Hope this gives you something to think about with future queries πŸ‘πŸ»
Think of it this way:
ON says how the tables are related, such as
ON room.id = room_story.room_id
WHERE is used for filtering, such as
WHERE room.id = 67
Also, JOIN (or INNER JOIN requires the matching rows in each of the 2 tables to both exist. LEFT JOIN says that the matching row in the right table is optionally missing.
Putting those together, I think this is what you needed:
FROM room r
INNER JOIN room_location rl ON rl.room_id = r.id
INNER JOIN room_storys rs ON rs.room_id = r.id
WHERE r.id = 67
This becomes irrelevent (I think): LIMIT 1

Replacing Subqueries with Joins in MySQL

I have the following query:
SELECT PKID, QuestionText, Type
FROM Questions
WHERE PKID IN (
SELECT FirstQuestion
FROM Batch
WHERE BatchNumber IN (
SELECT BatchNumber
FROM User
WHERE RandomString = '$key'
)
)
I've heard that sub-queries are inefficient and that joins are preferred. I can't find anything explaining how to convert a 3+ tier sub-query to join notation, however, and can't get my head around it.
Can anyone explain how to do it?
SELECT DISTINCT a.*
FROM Questions a
INNER JOIN Batch b
ON a.PKID = b.FirstQuestion
INNER JOIN User c
ON b.BatchNumber = c.BatchNumber
WHERE c.RandomString = '$key'
The reason why DISTINCT was specified is because there might be rows that matches to multiple rows on the other tables causing duplicate record on the result. But since you are only interested on records on table Questions, a DISTINCT keyword will suffice.
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
Try :
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q
INNER JOIN Batch b ON q.PKID = b.FirstQuestion
INNER JOIN User u ON u.BatchNumber = q.BatchNumber
WHERE u.RandomString = '$key'
select
q.pkid,
q.questiontext,
q.type
from user u
join batch b
on u.batchnumber = b.batchnumber
join questions q
on b.firstquestion = q.pkid
where u.randomstring = '$key'
Since your WHERE clause filters on the USER table, start with that in the FROM clause. Next, apply your joins backwards.
In order to do this correctly, you need distinct in the subquery. Otherwise, you might multiply rows in the join version:
SELECT q.PKID, q.QuestionText, q.Type
FROM Questions q join
(select distinct FirstQuestion
from Batch b join user u
on b.batchnumber = u.batchnumber and
u.RandomString = '$key'
) fq
on q.pkid = fq.FirstQuestion
As to whether the in or join version is better . . . that depends. In some cases, particularly if the fields are indexed, the in version might be fine.

Getting Repeated values in SQL

I was desperately trying harder and harder to get this thing done but didn`t yet succeed. I am getting repeated values when i run this query.
select
tbl_ShipmentStatus.ShipmentID
,Tbl_Contract.ContractID,
Tbl_Contract.KeyWinCountNumber,
Tbl_Item.ItemName,
Tbl_CountryFrom.CountryFromName,
Tbl_CountryTo.CountryToName,
Tbl_Brand.BrandName,
Tbl_Count.CountName,
Tbl_Seller.SellerName,
Tbl_Buyer.BuyerName,
Tbl_Contract.ContractNumber,
Tbl_Contract.ContractDate,
tbl_CountDetail.TotalQty,
tbl_CostUnit.CostUnitName,
tbl_Comission.Payment,
tbl_Port.PortName,
Tbl_Contract.Vans,
tbl_Comission.ComissionPay,
tbl_Comission.ComissionRcv,
tbl_CountDetail.UnitPrice,
tbl_Comission.ComissionRemarks,
tbl_CountDetail.Amount,
tbl_LCStatus.LCNumber,
tbl_ShipmentStatus.InvoiceNumber,
tbl_ShipmentStatus.InvoiceDate,
tbl_ShipmentStatus.BLNumber,
tbl_ShipmentStatus.BLDate,
tbl_ShipmentStatus.VesselName,
tbl_ShipmentStatus.DueDate
from tbl_ShipmentStatus
inner join tbl_LCStatus
on
tbl_LCStatus.LCID = tbl_ShipmentStatus.LCStatusID
inner join Tbl_Contract
on
tbl_LCStatus.ContractID = Tbl_Contract.ContractID
inner join Tbl_CountDetail
on Tbl_Contract.ContractID = Tbl_CountDetail.ContractId
inner join tbl_Comission
on
tbl_Comission.ContractID = Tbl_Contract.ContractID
inner join Tbl_Item
on
Tbl_Item.ItemID = Tbl_Contract.ItemID
inner join Tbl_Brand
on Tbl_Brand.BrandID = Tbl_Contract.BrandID
inner join Tbl_Buyer
on Tbl_Buyer.BuyerID = Tbl_Contract.BuyerID
inner join Tbl_Seller
on Tbl_Seller.SellerID = Tbl_Contract.SellerID
inner join Tbl_CountryFrom
on Tbl_CountryFrom.CountryFromID = Tbl_Contract.CountryFromID
inner join Tbl_CountryTo
on
Tbl_CountryTo.CountryToID = Tbl_Contract.CountryToID
inner join Tbl_Count
on
Tbl_Count.CountID = Tbl_CountDetail.CountId
inner join tbl_CostUnit
on tbl_Comission.CostUnitID = tbl_CostUnit.CostUnitID
inner join tbl_Port
on tbl_Port.PortID = tbl_Comission.PortID
where tbl_LCStatus.isDeleted = 0
and tbl_ShipmentStatus.isDeleted =0
and tbl_LCStatus.isDeleted = 0
and Tbl_CountDetail.isDeleted = 0
and Tbl_Contract.isDeleted = 0
and tbl_ShipmentStatus.LCStatusID = 5
I have also attached a picture of my result set of rows.
Any suggestions why this is happening would really be appreciable.
Result Set
Typically this happens when you have an implicit partial cross join (Cartesian product) between two of your tables. That's what it looks like to me here.
This happens most often when you have a many-to-many relationship. For example, if a single Album allows both multiple Artists and multiple Songs and the only relationship between Artists and Songs is Album, then there's essentially a many-to-many relationship between Artists and Songs. If you select from all three tables at once you're going to implicitly cross join Artists and Songs, and this may not be what you want.
Looking at your query, I see many-to-many between Tbl_CountDetail and tbl_Comission through Tbl_Contract. Try eliminating one of those joins to test to see if the behavior disappears.
Try using the DISTINCT keyword. It should solve your issue
Select DISTINCT ....
Wait as far as I can see your records are not duplicates.
HOWEVER
Notice the CountName column and Shipment ID column
The combination is unique for every row. Hence the values are unique as far as I can see. Try not selecting CountName.
Well if you have distinct rows its not a duplication problem. The issue is during the join a combination is occurring you don't want it to duplicating the results.
Either don't select CountName or you have a mistake in your data.
Only one of those rows should be true either 6 with Count2 or 6 with Count1. Likewise for 7. The fact that your getting both when your not supposed to indicates a logic mistake

How to optimize simple MySQL query with lots of INNER JOINS

I've found info on how to optimize MySQL queries, but most of the tips seem to suggest avoiding things MySQL isn't built for (e.g., calculations, validation, etc.) My query on the other hand is very straight forward but joins a lot of tables together.
Is there an approach to speeding up simple queries with many INNER JOINS? How would I fix my query below?
SELECT t_one.id FROM table_one t_one
INNER JOIN entr_to_state st
INNER JOIN entr_to_country ct
INNER JOIN entr_to_domain dm
INNER JOIN entr_timing t
INNER JOIN entr_to_weather a2w
INNER JOIN entr_to_imp_num a2i
INNER JOIN entr_collection c
WHERE t_one.type='normal'
AND t_one.campaign_id = c.id
AND t_one.status='running'
AND c.status='running'
AND (c.opt_schedule = 'continuous' OR (c.opt_schedule = 'schedulebydate'
AND (c.start_date <= '2011-03-06 14:25:52' AND c.end_date >= '2011-03-06 14:25:52')))
AND t.entr_id = t_one.id AND ct.entr_id = t_one.id
AND st.entr_id = t_one.id AND a2w.entr_id = t_one.id
AND (t_one.targeted_gender = 'male' OR t_one.targeted_gender = 'both')
AND t_one.targeted_min_age <= 23.1 AND t_one.targeted_max_age > 23.1
AND (ct.abbreviation = 'US' OR ct.abbreviation = 'any')
AND (st.abbreviation = 'CO' OR st.abbreviation = 'any')
AND t.sun = 1 AND t.hour_14 = 1
AND (a2w.weather_category_id = 1 OR a2w.weather_category_id = 0)
AND t_one.targeted_min_temp <= 46
AND t_one.targeted_max_temp > 46 GROUP BY t_one.id
Index all relevant fields, of course, which I'm sure you have
Then find which joins are the most costly ones by running EXPLAIN SELECT...
Consider splitting them off into a seperate query i.e. narrow down the record(s) you're looking for, then perform the joins on those records rather than all the records
i.e.
SELECT c.*, ....
FROM (SELECT x, y, z .... ) AS c
You would need to EXPLAIN SELECT the query, check which parts of the query are not using in indices, and then attempt to index those. If possible, break the query down into smaller parts as well.
If you really cannot in any way optimize the underlying DB or your query, you could resort to a flat table that has the data you need for fast access. Then just hook up the main query to update the flat table to run as often as needed.