Optimize QUERY JOIN - mysql

I'm working on a transport program that requires a long query to search for possible routes, and I need to optimize that query as much as possible. It calculates stopovers between bus stops: passengers get off at one stop and board at another within a given radius in kilometers. I also have to make sure the route is not blocked by certain rules.
How can I optimize the query?
The problem is the relationship between t2 and t3 and the relationship between t4 and t5, which are joined by a radius.
SELECT '3' AS type, s1.id_sott AS id_sott1,s2.id_sott AS id_sott2,s3.id_sott AS id_sott3,s4.id_sott AS id_sott4, s5.id_sott AS id_sott5,s6.id_sott AS id_sott6, '0' AS id_sott7, '0' AS id_sott8, ch1.changeid as changeid1, ch2.changeid as changeid2, '0' AS changeid3,
ABS((s2.distance - s1.distance)) as dist1, ABS((s4.distance - s3.distance)) as dist2, ABS((s6.distance - s5.distance)) as dist3,'0' AS dist4, (ABS((s2.distance - s1.distance)) + ABS((s4.distance - s3.distance)) + ABS((s6.distance - s5.distance)) ) AS km,
s1.id_corsa AS id_corsa1,s3.id_corsa AS id_corsa2,s5.id_corsa AS id_corsa3,'0' AS id_corsa4, s1.orario AS orariostart1,s2.orario AS orariostop1, s3.orario AS orariostart2, s4.orario AS orariostop2,s5.orario AS orariostart3, s6.orario AS orariostop3,'0' AS orariostart4,
IFNULL(
(SELECT GROUP_CONCAT(corse)
FROM regole_linee
WHERE ('2023-02-24' BETWEEN da AND a )
AND FIND_IN_SET( (DAYOFWEEK( '2023-02-24' ) -1 ) , giorni_sett)
AND id_az= 28 AND stato=1)
, '0'
) AS rl,
111.111 * DEGREES(ACOS(LEAST(1.0, COS(RADIANS(t3.lat)) * COS(RADIANS(t2.lat)) * COS(RADIANS(t3.lon - t2.lon)) + SIN(RADIANS(t3.lat)) * SIN(RADIANS(t2.lat))))) AS dist_frompart1,
111.111 * DEGREES(ACOS(LEAST(1.0, COS(RADIANS(t5.lat)) * COS(RADIANS(t4.lat)) * COS(RADIANS(t5.lon - t4.lon)) + SIN(RADIANS(t5.lat)) * SIN(RADIANS(t4.lat))))) AS dist_frompart2,
'0' AS dist_frompart3
FROM corse_fermate AS s1
INNER JOIN corse_fermate AS s2 ON s1.id_corsa = s2.id_corsa
INNER JOIN corse_fermate AS s3
INNER JOIN corse_fermate AS s4 ON s3.id_corsa = s4.id_corsa
INNER JOIN corse_fermate AS s5
INNER JOIN corse_fermate AS s6 ON s5.id_corsa = s6.id_corsa
INNER JOIN tratte_sottoc AS t ON t.id_sott=s1.id_sott
INNER JOIN tratte_sottoc AS t2 ON t2.id_sott=s2.id_sott
INNER JOIN tratte_sottoc AS t3 ON t3.id_sott=s3.id_sott
INNER JOIN tratte_sottoc AS t4 ON t4.id_sott=s4.id_sott
INNER JOIN tratte_sottoc AS t5 ON t5.id_sott=s5.id_sott
INNER JOIN tratte_sottoc AS t6 ON t6.id_sott=s6.id_sott
/*
INNER JOIN tratte_sottoc_tratte AS tt1 ON (s1.id_sott=tt1.id_sott1 AND s2.id_sott=tt1.id_sott2)
INNER JOIN tratte_sottoc_tratte AS tt2 ON (s3.id_sott=tt2.id_sott1 AND s4.id_sott=tt2.id_sott2)
INNER JOIN tratte_sottoc_tratte AS tt3 ON (s5.id_sott=tt3.id_sott1 AND s6.id_sott=tt3.id_sott2)
*/
INNER JOIN changeover AS ch1 ON s2.id_sott=ch1.changeid
INNER JOIN changeover AS ch2 ON s4.id_sott=ch2.changeid
WHERE s1.id_sott = 3
AND s6.id_sott = 85
AND s2.ordine > s1.ordine AND s4.ordine > s3.ordine AND s6.ordine > s5.ordine
AND s1.id_corsa != s3.id_corsa AND s1.id_corsa != s5.id_corsa AND s3.id_corsa != s5.id_corsa
AND s1.id_sott != s2.id_sott AND s6.id_sott != s4.id_sott AND s2.id_sott != s4.id_sott
AND s1.stato=1 AND s3.stato=1 AND s5.stato=1
AND TIMESTAMPDIFF(MINUTE, s2.orario, s3.orario) >= 0
AND TIMESTAMPDIFF(MINUTE, s2.orario, s3.orario) <= 180
AND TIMESTAMPDIFF(MINUTE, s4.orario, s5.orario) >= 0
AND TIMESTAMPDIFF(MINUTE, s4.orario, s5.orario) <= 180
/*AND s1.id_az=1 AND s2.id_az=1 AND s3.id_az=1 AND s4.id_az=1 AND s5.id_az=1 AND s6.id_az=1 AND ch1.id_az=1 AND ch2.id_az=1 */
GROUP BY s1.id_sott,s2.id_sott,s3.id_sott,s4.id_sott,s5.id_sott,s6.id_sott,s1.id_corsa,s3.id_corsa,s5.id_corsa
HAVING dist_frompart1 < 5
AND dist_frompart2 < 5
AND find_in_set(s1.id_corsa,rl) = 0
AND find_in_set(s3.id_corsa,rl) = 0
AND find_in_set(s5.id_corsa,rl) = 0
ORDER BY km ASC LIMIT 5
[Images: corse_fermate, regole_linee, final path]

Instead of using a subquery, you can LEFT JOIN the regole_linee table; you will get the same result with less effort.
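A minimal sketch of that idea, reduced to the first leg (s1) only so it stays readable. It assumes each corse value is a comma-separated list of id_corsa values, as the FIND_IN_SET calls in the question suggest, and it uses r.id_az IS NULL as the "no matching rule" test because id_az is known to be non-NULL on any matched row. The same join condition would have to be repeated (or OR-ed) for s3 and s5.

```sql
-- Anti-join: keep only trips that no active rule excludes on the given date.
SELECT s1.id_corsa, s1.id_sott, s1.orario
FROM corse_fermate AS s1
LEFT JOIN regole_linee AS r
       ON r.id_az = 28
      AND r.stato = 1
      AND '2023-02-24' BETWEEN r.da AND r.a
      AND FIND_IN_SET(DAYOFWEEK('2023-02-24') - 1, r.giorni_sett)
      AND FIND_IN_SET(s1.id_corsa, r.corse)
WHERE s1.id_sott = 3
  AND s1.stato = 1
  AND r.id_az IS NULL;   -- no rule matched this trip
```

Filtering in the join this way lets the rows be discarded before GROUP BY, instead of carrying the concatenated list through to HAVING.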

These indexes may help performance:
corse_fermate: INDEX(stato, id_sott, ordine, id_corsa, distance, orario)
tratte_sottoc: INDEX(id_sott, lat, lon)
Does regole_linee have an index starting with id_az?
DATE_FORMAT( '2023-02-24 00:00:00', '%Y-%m-%d %H:%i:%s' ) can be simplified to simply '2023-02-24'.
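The index suggestions above could be created along these lines. This is only a sketch: the index names idx_cf_route and idx_ts_pos are made up for illustration, and it assumes the longitude column is named lon, as it is in the query.

```sql
-- Hypothetical index names; column lists follow the suggestions above.
ALTER TABLE corse_fermate
  ADD INDEX idx_cf_route (stato, id_sott, ordine, id_corsa, distance, orario);

ALTER TABLE tratte_sottoc
  ADD INDEX idx_ts_pos (id_sott, lat, lon);

-- Check whether the optimizer picks the new index for the first leg.
EXPLAIN
SELECT s1.id_corsa, s1.ordine, s1.orario
FROM corse_fermate AS s1
WHERE s1.stato = 1
  AND s1.id_sott = 3;
```

Running EXPLAIN on the full query before and after adding the indexes shows whether they are actually used.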

Related

Having clause not filter some rows

I have a query that calculates average scores by user, and I want to add a condition to show only scores in a given range:
Select byUserAndQ.email, byUserAndQ.userName,
round( avg(byUserAndQ.innerScore) ) as score,
round( avg (byUserAndQ.innerScore) ) >=50 as inRange
from (
SELECT tu.email, tu.name as userName,
sum(ta.score)/q.max_score*100 as innerScore
FROM test t
INNER JOIN test_user tu ON tu.test_id = t.id
LEFT JOIN test_action ta on ta.test_id = t.id and ta.email = tu.email
LEFT JOIN question q ON q.id = ta.question_id
WHERE t.id = 144
AND tu.email IS NOT NULL
GROUP BY tu.email, q.id ) as byUserAndQ
group by byUserAndQ.email
having score>= 50
It returns all rows instead of only those with score >= 50:
user1#example.com, User1,56,1
user2#example.com, User2,28,0
user3#example.com,User3,78,1
If I remove byUserAndQ.userName from the outer SELECT, it works fine, which seems very strange to me. I thought HAVING was applied to the final data, so it should not matter which other fields are in the result.
I also tried grouping by name as well, without success:
Select byUserAndQ.email, byUserAndQ.userName,
round( avg(byUserAndQ.innerScore) ) as score,
round( avg (byUserAndQ.innerScore) ) >=50 as inRange
from (SELECT tu.email, tu.name as userName,
sum(ta.score)/q.max_score*100 as innerScore
FROM test t
INNER JOIN test_user tu ON tu.test_id = t.id
LEFT JOIN test_action ta on ta.test_id = t.id and ta.email = tu.email
LEFT JOIN question q ON q.id = ta.question_id
WHERE t.id = 144
AND tu.email IS NOT NULL
GROUP BY tu.email, tu.name, q.id
) as byUserAndQ
group by byUserAndQ.email, byUserAndQ.userName
having score>= 50
Wrapping it in another subquery with WHERE works fine as well:
select * from (Select byUser.email, byUser.userName, round( avg(byUser.innerScore) ) as score, round( avg (byUser.innerScore) ) >=50 as inRange
from (SELECT
tu.email, tu.name as userName,
sum(ta.score)/q.max_score*100 as innerScore
FROM test t
INNER JOIN test_user tu ON tu.test_id = t.id
LEFT JOIN test_action ta on ta.test_id = t.id and ta.email = tu.email
LEFT JOIN question q ON q.id = ta.question_id
WHERE t.id = 144
AND tu.email IS NOT NULL
GROUP BY tu.email, q.id ) as byUser
group by byUser.email ) as final
where score>= 50
Why does it work this way?
MySQL version: 5.7.12 (AWS Serverless RDS)

Improving query performance from mysql slow log query

We use PrestaShop as our e-commerce application. The mobile application uses the same database and PrestaShop architecture, with almost the same queries as core PrestaShop.
Some of the queries used in our Node.js API push RDS (MySQL) CPU up to 100% when traffic spikes.
RDS configuration: db.m4.xlarge, 4 vCPU, 16 GB RAM
Slow query (it only appears in the slow query log when traffic is high):
EXPLAIN SELECT t1.id_product,
t1.position,
t1.price,
t1.quantity,
t1.reserve_stock,
t1.name,
t1.link_rewrite hyphen_name,
t1.id_category_default,
t1.id_sub_category,
t1.is_back_in_stock,
(SELECT link_rewrite FROM pml_category_lang WHERE id_category = 36 AND id_shop = 1 AND id_lang = 1 LIMIT 1) category_link_rewrite,
CONCAT('[', t1.images, ']') images,
t3.reduction,
t3.reduction_type,
IF(t3.reduction,
IF(t3.reduction_type = 'percentage', ROUND((t1.price - (t3.reduction * t1.price))),
ROUND(t1.price - t3.reduction)),
t1.price) discounted_price
FROM
(SELECT ps.id_product, ps.id_category_default, ps.is_back_in_stock, cp.position, ROUND ((7/100) * ps.price * 1 + ps.price * 1) price,
sa.quantity, sa.reserve_stock, pl.name, p.date_add, pl.link_rewrite,
(SELECT pml_category.id_category FROM pml_category
LEFT JOIN pml_category_product ON pml_category_product.id_category = pml_category.id_category
AND pml_category_product.id_shop = 1
WHERE id_parent = 36 AND pml_category_product.id_category IS NOT NULL AND pml_category_product.id_product = cp.id_product LIMIT 1) AS id_sub_category,
GROUP_CONCAT(DISTINCT CONCAT('{"id_image":', i.id_image, ',', '"position":', i.position, ',', '"cover":', ims.cover, '}')) images
FROM pml_category_product cp
JOIN pml_product_shop ps
ON cp.id_product = ps.id_product AND cp.id_shop = ps.id_shop
JOIN pml_product p
ON p.id_product = cp.id_product
AND p.id_category_default = 36
JOIN pml_stock_available sa
ON cp.id_product = sa.id_product AND cp.id_shop = sa.id_shop AND sa.id_product_attribute = 0 AND sa.quantity > 0
JOIN pml_product_lang pl
ON cp.id_product = pl.id_product AND cp.id_shop = pl.id_shop AND pl.id_lang = 1
JOIN pml_image i
ON cp.id_product = i.id_product
JOIN pml_image_shop ims
ON ims.id_image = i.id_image AND ims.id_shop = 1
WHERE ps.id_shop = 1
AND ps.active = 1 AND i.smartly !=1
AND ps.visibility = 'both'
GROUP BY cp.id_product) t1 join pml_pomelo_rank pr on t1.id_product = pr.id_product and pr.id_shop = 1 and (pr.alltime_regular_cr > 2 or pr.alltime_qty_sold > 100) and pr.id_product != 14930 and pr.id_product not in (select id_product_2 from pml_accessory where id_product_1 = 14930)
LEFT JOIN
(SELECT t2.* FROM
(SELECT id_product, id_specific_price_rule, reduction, reduction_type, `from`, `to`
FROM pml_specific_price
WHERE id_shop IN (0, 1) and id_currency IN (0,1)
AND ((`from` = '0000-00-00 00:00:00' OR '2017-08-15 13:15:33' >= `from`) AND (`to` = '0000-00-00 00:00:00' OR '2017-08-15 13:15:33' <= `to`))
ORDER BY id_product ASC, id_specific_price_rule ASC) t2
GROUP BY t2.id_product) t3
ON t1.id_product = t3.id_product ORDER BY rand() LIMIT 0, 4;
Currently we are trying to find out whether we should increase the RDS instance size or whether queries like this need improvement.
Note: this is exactly the same query that PrestaShop core uses for the category page.
Any suggestion or help regarding query optimization or RDS configuration for such queries would be appreciated.
[Image: RDS graph]

Sum doesn't work for multiply

SELECT Sum((pvc.cpt_amount) * (pvc.unit)) AS billed_amount,
pvc.cpt_amount,
pvc.unit
FROM payment AS pay
LEFT JOIN patient_insurances AS pi
ON pay.who_paid = pi.id
LEFT JOIN patient_visit_cpt AS pvc
ON (
pay.encounter_id = pvc.id
AND pay.claim_id = pvc.claim_id )
LEFT JOIN patient_visit AS pv
ON pvc.visit_id = pv.id
WHERE
and pi.insurance_id = 761
AND pay.created_at <= '2016-04-01 23:00:00'
AND pay.created_at >= '2016-03-31 00:00:00'
GROUP BY pay.check_number,
pv.attending_provider_id
For unit = 1 and cpt_amount = 145, the output comes out multiplied by 4: 1 * 145 gives 580 instead of 145. Can anyone suggest a solution?
Your GROUP BY might be wrong.
Also remove the keyword AND directly after the WHERE clause.
Try the query below:
SELECT Sum((pvc.cpt_amount) * (pvc.unit)) AS billed_amount,
pvc.cpt_amount,pvc.unit
FROM payment AS pay
LEFT JOIN patient_insurances AS pi ON pay.who_paid = pi.id
LEFT JOIN patient_visit_cpt AS pvc ON(pay.encounter_id = pvc.id AND pay.claim_id = pvc.claim_id)
LEFT JOIN patient_visit AS pv ON pvc.visit_id = pv.id
WHERE pi.insurance_id = 761 AND pay.created_at <= '2016-04-01 23:00:00'
AND pay.created_at >= '2016-03-31 00:00:00'
GROUP BY pvc.cpt_amount,pvc.unit
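A result that is exactly 4 times too large usually means the same patient_visit_cpt row was joined to several payment rows. A hedged diagnostic sketch, using only the tables and columns from the question (the alias joined_rows is introduced here for illustration), lists the rows that are counted more than once:

```sql
-- Show each patient_visit_cpt row that is matched by more than one payment.
SELECT pvc.id,
       pvc.claim_id,
       pvc.cpt_amount,
       pvc.unit,
       COUNT(*) AS joined_rows   -- how many times SUM() would count this row
FROM payment AS pay
LEFT JOIN patient_insurances AS pi
       ON pay.who_paid = pi.id
LEFT JOIN patient_visit_cpt AS pvc
       ON pay.encounter_id = pvc.id
      AND pay.claim_id = pvc.claim_id
WHERE pi.insurance_id = 761
  AND pay.created_at >= '2016-03-31 00:00:00'
  AND pay.created_at <= '2016-04-01 23:00:00'
GROUP BY pvc.id, pvc.claim_id, pvc.cpt_amount, pvc.unit
HAVING COUNT(*) > 1;
```

If this returns rows with joined_rows = 4, the multiplication comes from the join rather than from SUM itself.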

Use results returned from select subquery in update query

I have a query whose results I need to use to update a table.
This is my SELECT query:
SELECT so.fk_customer,
IF (((sum( lgg.fk_catalog_attribute_option_global_gender = 1 )/count( lgg.fk_catalog_attribute_option_global_gender )) *100 > 60) AND (c.gender='male'), 'Men',
IF (((sum( lgg.fk_catalog_attribute_option_global_gender = 2 )/count( lgg.fk_catalog_attribute_option_global_gender )) *100 > 60) AND (c.gender='female'), 'Women', '') ) as calculatedGender
FROM catalog_attribute_link_global_gender AS lgg
INNER JOIN catalog_simple AS cs ON cs.fk_catalog_config = lgg.fk_catalog_config
INNER JOIN sales_order_item AS soi ON soi.sku = cs.sku
INNER JOIN sales_order AS so ON soi.fk_sales_order = so.id_sales_order
INNER JOIN customer as c ON c.id_customer = so.fk_customer
WHERE lgg.fk_catalog_attribute_option_global_gender IN (1,2)
AND so.created_at BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY) AND NOW()
GROUP BY so.fk_customer
HAVING count( lgg.fk_catalog_attribute_option_global_gender ) > 2
I have another table with a column fk_customer and a column gender, and I need to update that table with the results of the query above. I need to do it in the same query. The query above gives me the correct results.
You will have to merge the UPDATE statement with this query, as follows:
UPDATE anotherTable A JOIN
(SELECT so.fk_customer,
IF (((sum( lgg.fk_catalog_attribute_option_global_gender = 1 ) / count(lgg.fk_catalog_attribute_option_global_gender )) *100 > 60)
AND (c.gender='male'), 'Men',
IF (((sum( lgg.fk_catalog_attribute_option_global_gender = 2 ) / count( lgg.fk_catalog_attribute_option_global_gender )) *100 > 60)
AND (c.gender='female'), 'Women', '') ) as calculatedGender,
'ID'
FROM catalog_attribute_link_global_gender AS lgg
INNER JOIN catalog_simple AS cs ON cs.fk_catalog_config = lgg.fk_catalog_config
INNER JOIN sales_order_item AS soi ON soi.sku = cs.sku
INNER JOIN sales_order AS so ON soi.fk_sales_order = so.id_sales_order
INNER JOIN customer as c ON c.id_customer = so.fk_customer
WHERE lgg.fk_catalog_attribute_option_global_gender IN (1,2)
AND so.created_at BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY) AND NOW()
GROUP BY so.fk_customer
HAVING count( lgg.fk_catalog_attribute_option_global_gender ) > 2) B
ON A.'ID' = B.'ID'
SET A.fk_customer = B.fk_customer,
A.gender = B.calculatedGender;
This should give you the desired result, provided you identify the 'ID' column on which to join tables A and B.
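If the target table can be matched on fk_customer, the merged statement might look like the sketch below. That join column is an assumption based on the question's description of the target table; the answer leaves it open. anotherTable is the placeholder name used above.

```sql
-- Assumption: anotherTable.fk_customer identifies the customer to update.
UPDATE anotherTable AS A
JOIN (
    SELECT so.fk_customer,
           IF((SUM(lgg.fk_catalog_attribute_option_global_gender = 1)
                 / COUNT(lgg.fk_catalog_attribute_option_global_gender)) * 100 > 60
              AND c.gender = 'male', 'Men',
           IF((SUM(lgg.fk_catalog_attribute_option_global_gender = 2)
                 / COUNT(lgg.fk_catalog_attribute_option_global_gender)) * 100 > 60
              AND c.gender = 'female', 'Women', '')) AS calculatedGender
    FROM catalog_attribute_link_global_gender AS lgg
    INNER JOIN catalog_simple AS cs ON cs.fk_catalog_config = lgg.fk_catalog_config
    INNER JOIN sales_order_item AS soi ON soi.sku = cs.sku
    INNER JOIN sales_order AS so ON soi.fk_sales_order = so.id_sales_order
    INNER JOIN customer AS c ON c.id_customer = so.fk_customer
    WHERE lgg.fk_catalog_attribute_option_global_gender IN (1, 2)
      AND so.created_at BETWEEN DATE_SUB(NOW(), INTERVAL 30 DAY) AND NOW()
    GROUP BY so.fk_customer
    HAVING COUNT(lgg.fk_catalog_attribute_option_global_gender) > 2
) AS B ON A.fk_customer = B.fk_customer
SET A.gender = B.calculatedGender;
```

With fk_customer as the join key there is no need to set A.fk_customer, since the two values are already equal on every joined row.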

How to optimize for speed a sql multiple select with SUM

I have a really long SELECT from my database with many joins. The problem is the SUM: without it the query takes about 3 s, but with it the query takes about 15 s.
Is it possible to optimize my SELECT to get a shorter execution time?
Here is my code:
SELECT
accomodation.id,
accomodation.aid,
accomodation.title_en,
accomodation.title_url_en,
accomodation.address,
accomodation.zip,
accomodation.stars,
accomodation.picture,
accomodation.valid_from,
accomodation.valid_to,
accomodation.latitude,
accomodation.longitude,
accomodation.city_id AS
accomodation_city_id,
db_cities.id AS city_id,
db_cities.title_en AS city,
db_cities.title_url AS city_url,
db_countries.title_en AS country_title,
db_countries.title_url_en AS country_url,
accomodation_type.class AS accomodation_type_class,
accomodation_review_value_total.value AS review_total,
MIN(accomodation_price.price) AS price_from,
accomodation_rooms.total_persons
FROM
(SELECT aid, MAX(info_date_add) AS max_info_date_add FROM accomodation GROUP BY aid) accomodation_max
INNER JOIN accomodation
ON
accomodation_max.aid = accomodation.aid AND
accomodation_max.max_info_date_add = accomodation.info_date_add
LEFT JOIN db_cities
ON (
db_cities.id = accomodation.city_id OR
(((acos(sin((db_cities.latitude*pi()/180)) * sin((accomodation.latitude*pi()/180)) + cos((db_cities.latitude*pi()/180)) * cos((accomodation.latitude*pi()/180)) * cos(((db_cities.longitude - accomodation.longitude)*pi()/180))))*180/pi())*60*1.1515*1.609344) < '20')
JOIN db_countries
ON db_countries.id = accomodation.country_id
LEFT JOIN accomodation_review_value_total
ON accomodation_review_value_total.accomodation_aid = accomodation.aid
LEFT JOIN accomodation_type_value
ON accomodation_type_value.accomodation_id = accomodation.id
LEFT JOIN accomodation_type
ON accomodation_type.id = accomodation_type_value.accomodation_type_id
JOIN accomodation_season
ON (
accomodation_season.accomodation_aid = accomodation.aid AND
( '2013-11-04' BETWEEN accomodation_season.start_date AND accomodation_season.end_date OR '2013-11-05' BETWEEN accomodation_season.start_date AND accomodation_season.end_date ) )
JOIN accomodation_price
ON
accomodation_price.accomodation_aid = accomodation.aid AND
accomodation_price.accomodation_price_type_id = '1' AND
accomodation_price.accomodation_price_cat_id = '1' AND
accomodation_price.price BETWEEN '20' AND '250' AND
accomodation_price.accomodation_season_id = accomodation_season.id
JOIN accomodation_theme_value
ON accomodation_theme_value.accomodation_id = accomodation.id
INNER JOIN
(SELECT
accomodation_id,
SUM(accomodation_rooms.rooms) AS total_rooms,
SUM(accomodation_rooms.beds * accomodation_rooms.rooms) AS total_persons
FROM accomodation_rooms
GROUP BY accomodation_id) accomodation_rooms
ON
accomodation_rooms.accomodation_id = accomodation.id AND
accomodation_rooms.total_persons >= '4'
WHERE
db_countries.title_url_en LIKE '%spain%' AND
db_cities.title_url LIKE '%barcelona%' AND
accomodation_type_value.accomodation_type_id IN (5,10) AND
total_rooms >= '2' AND
accomodation_theme_value.accomodation_theme_id IN (11,12,13) AND
accomodation.stars IN (3,4,5) AND
( accomodation_review_value_total.value >= '4.5' ) AND
db_cities.id = '2416'
GROUP BY accomodation.aid
ORDER BY
CASE
WHEN accomodation.valid_to>=NOW() AND accomodation.valid_from<=NOW() AND MIN(accomodation_price.price) IS NOT NULL THEN 0
WHEN NOW()>accomodation.valid_to AND accomodation.valid_to>'0000-00-00' AND MIN(accomodation_price.price) IS NOT NULL THEN 1
WHEN accomodation.valid_to>=NOW() AND accomodation.valid_from<=NOW() THEN 2
WHEN NOW()>accomodation.valid_to AND accomodation.valid_to>'0000-00-00' THEN 3
ELSE 4 END,
review_total DESC,
accomodation.title_en
LIMIT 10