SQL optimization (MySQL) - mysql

I know it´s difficult to answer without knowing the model, but I have next heavy query that takes around 10 secs to complete in my MySQL database. I guess it can be optimized, but I´m not that skilled.
SELECT DISTINCT
b . *
FROM
boats b,
states s,
boat_people bp,
countries c,
provinces pr,
cities ct1,
cities ct2,
ports p,
addresses a,
translations t,
element_types et
WHERE
s.name = 'Confirmed' AND bp.id = '2'
AND b.state_id = s.id
AND b.id NOT IN (SELECT
bc.boat_id
FROM
boat_calendars bc
WHERE
(date(bc.since) <= '2015-02-09 09:23:00 +0100'
AND date(bc.until) >= '2015-02-09 09:23:00 +0100')
OR (date(bc.since) <= '2015-02-10 09:23:00 +0100'
AND date(bc.until) >= '2015-02-10 09:23:00 +0100'))
AND b.people_capacity_id >= bp.id
AND c.id = (SELECT DISTINCT
t.element_id
FROM
translations t,
element_types et
WHERE
t.element_translation = 'Spain'
AND et.name = 'Country'
AND t.element_type_id = et.id)
AND pr.country_id = c.id
AND pr.id = (SELECT DISTINCT
t.element_id
FROM
translations t,
element_types et
WHERE
t.element_translation = 'Mallorca'
AND et.name = 'Province'
AND t.element_type_id = et.id)
AND ((ct1.province_id = pr.id AND p.city_id = ct1.id AND b.port_id = p.id)
OR (ct2.province_id = pr.id AND a.city_id = ct2.id AND b.address_id = a.id));
Basically, it tries to get all the boats, that are not already booked in Confirmed state and that are in a province and a country ie. Mallorca, Spain.
Please, let me know if you need some more details about de purpose of the query or the model.

remove * from select clause. instead give column names in select clause. it will increase some
performance. Its one of the way to optimize

Instead of having a sub query, use LEFT JOIN NULL (just google for it) and it will help a lot.

All your answers are good. But, according to #POHH suggestions, I increased the performance magically by just replacing the b.* for b.somecolumnsnames.
From 10 to 1 or 2 secs.

Related

Group by not getting the expected results in mysql

I have the next query:
SELECT DISTINCT
bt.name, b.id
FROM
ports po,
cities c,
provinces p,
countries co,
states s,
translations t,
element_types et,
languages l,
boat_models bm,
boat_types bt,
boats b
JOIN
boat_prices bprf ON b.id = bprf.boat_id
AND bprf.checkin_date IS NULL
AND bprf.duration_id IS NULL
WHERE
t.element_translation = 'España'
AND et.name = 'Country'
AND s.name = 'confirmed'
AND s.id = b.state_id
AND l.locale = 'es'
AND t.language_id = l.id
AND t.element_type_id = et.id
AND t.element_id = p.country_id
AND c.province_id = p.id
AND po.city_id = c.id
AND b.port_id = po.id
AND bm.id = b.boat_model_id
AND bt.id = bm.boat_type_id
That is working perfectly and returning 9 rows:
'BOAT_TYPE_CATAMARAN','13707'
'BOAT_TYPE_SAILBOAT','13700'
'BOAT_TYPE_SAILBOAT','13701'
'BOAT_TYPE_SAILBOAT','13702'
'BOAT_TYPE_SAILBOAT','13703'
'BOAT_TYPE_SAILBOAT','13704'
'BOAT_TYPE_SAILBOAT','13705'
'BOAT_TYPE_SAILBOAT','13706'
'BOAT_TYPE_SAILBOAT','13708'
I want to group the results by boat type and get the number of boats per type.
However, when I do:
SELECT DISTINCT
bt.name, COUNT(b.id) AS num_boats
FROM
ports po,
cities c,
provinces p,
countries co,
states s,
translations t,
element_types et,
languages l,
boat_models bm,
boat_types bt,
boats b
JOIN
boat_prices bprf ON b.id = bprf.boat_id
AND bprf.checkin_date IS NULL
AND bprf.duration_id IS NULL
WHERE
t.element_translation = 'España'
AND et.name = 'Country'
AND s.name = 'confirmed'
AND s.id = b.state_id
AND l.locale = 'es'
AND t.language_id = l.id
AND t.element_type_id = et.id
AND t.element_id = p.country_id
AND c.province_id = p.id
AND po.city_id = c.id
AND b.port_id = po.id
AND bm.id = b.boat_model_id
AND bt.id = bm.boat_type_id
GROUP BY bt.name
ORDER BY bt.name
I´m getting:
'BOAT_TYPE_CATAMARAN','241'
'BOAT_TYPE_SAILBOAT','1928'
but according to the first query, I´m expecting
'BOAT_TYPE_CATAMARAN','1'
'BOAT_TYPE_SAILBOAT','8'
What am I missing?
I suspect that you want:
SELECT bt.name, COUNT(DISTINCT b.id) AS num_boats
FROM ...
WHERE ...
GROUP BY bt.name
ORDER BY bt.name
That is: move the DISTINCT within the COUNT() rather than directly in the SELECT.
Generally speaking, DISTINCT and GROUP BY do not go along well together; DISTINCT is already aggregation in essence, so mixing both is usually not relevant.
Note that your syntax uses old-school, implicit joins (with a comma in the FROM clause): you should be using standard joins (with the ON keyword), whose syntax has been state-of-the-art for decades.
You are doing a distinct in your first query so you are 'hiding' a lot if rows that gets doubled because of your join.

Issues with duplicated Ids in aggregated subquery

I have this query: (apologies for complexity, I'm not certain what I can remove without impacting the question)
SELECT COUNT(*) AS total,
SUM(o.total) AS total_loss,
SUM((SELECT SUM(cost_price) FROM `orders_items` WHERE orders_id = o.orders_id)) AS cost_total ,
SUM((SELECT COUNT(*) FROM refunds AS r1 WHERE r1.order_id = r.order_id AND NOT r.reason IS NULL)) AS refund_count ,
SUM((SELECT COUNT(*) FROM exchanges AS e1 WHERE e1.order_id = e.order_id AND e.type = :countResend AND NOT e.reason IS NULL)) AS resend_count ,
SUM((SELECT COUNT(*) FROM exchanges AS e2 WHERE e2.order_id = e.order_id AND e.type = :countExchange AND NOT e.reason IS NULL)) AS exchange_count
FROM orders AS o
JOIN sales_channel_config AS s ON o.sales_channel = s.sales_channel AND o.sub_sales_channel = s.sub_sales_channel
JOIN courier_service AS cs ON o.courier_service = cs.code
LEFT JOIN refunds AS r ON o.orders_id = r.order_id
JOIN orders_items AS oi ON o.orders_id = oi.orders_id
JOIN third_party_config AS tc ON SUBSTRING(oi.product_id_new, 3, 2) = tc.code
LEFT JOIN exchanges AS e ON o.orders_id = e.order_id
WHERE 1 = 1
AND o.tracking_num NOT IN (:cancelStatus)
AND (o.order_date >= :startDate AND o.order_date <= :endDate)
AND o.courier_service = :courier
AND SUBSTRING(oi.product_id_new, 3, 2) = :supplier
AND (NOT r.reason IS NULL OR NOT e.reason IS NULL)
The problem I'm having is that the various SUM((query)) clauses are counting duplicate orders, which is proving difficult to resolve. For example:
SUM((SELECT COUNT(DISTINCT r1.order_id) FROM refunds AS r1 WHERE r1.order_id = r.order_id AND NOT r.reason IS NULL)) AS refund_count ,
And
SUM((SELECT COUNT(*) FROM refunds AS r1 WHERE r1.order_id = r.order_id AND NOT r.reason IS NULL GROUP BY r1.order_id)) AS refund_count ,
Do not lower the resulting SUM at all. I have confirmed that the data returned will contain duplicates via another structurally identical query that returns rows from the parent query. When the other query is run without duplicate filtering, the counts match correctly so I'm confident that my problem query is accurate aside from including duplicated order ids.
So can anyone suggest another approach I might try?
For anyone who might benefit:
I removed most of the select logic and grouped on orders_id, which gives me an entirely accurate list of relevant orders:
SELECT o.orders_id AS order_id, r.id AS refund_id, e.id AS exchange_id, e.type AS exchange_type
FROM orders AS o
JOIN sales_channel_config AS s ON o.sales_channel = s.sales_channel AND o.sub_sales_channel = s.sub_sales_channel
JOIN courier_service AS cs ON o.courier_service = cs.code
LEFT JOIN refunds AS r ON o.orders_id = r.order_id
JOIN orders_items AS oi ON o.orders_id = oi.orders_id
JOIN third_party_config AS tc ON SUBSTRING(oi.product_id_new, 3, 2) = tc.code
LEFT JOIN exchanges AS e ON o.orders_id = e.order_id
WHERE 1 = 1
AND o.tracking_num NOT IN (:cancelStatus)
AND (o.order_date >= :startDate
AND o.order_date <= :endDate)
AND o.courier_service = :courier
AND SUBSTRING(oi.product_id_new, 3, 2) = :supplier
AND (NOT r.reason IS NULL OR NOT e.reason IS NULL)
GROUP BY (o.orders_id)
I've bitten the bullet here. I'm going to do some post processing to get the counts themselves, which is at least possible for me now.
Still don't understand why getting distinct values in the sub selects failed though.

how to simplify my sql query

I have this query, but it takes about 15 seconds to finish.. how can i simplyfy it to get same result in less time? my problem is that i need all of this data at ones.
SELECT * FROM (
SELECT c.client_id, c.client_name, c.client_bpm,
c.client_su_name, c.client_maxbpm, s.bpm,
s.timestamp, m.mesure_id, ms.currentT
FROM tbl_clients c, tbl_meting m, tbl_sensor_meting s,
tbl_magsens_meting ms
WHERE c.client_id = m.client_id
AND (m.mesure_id = s.id_mesure
OR m.mesure_id = ms.id_mesure)
AND m.live =1
ORDER BY s.timestamp DESC
) AS mesure
GROUP BY mesure.client_id
I think the problem may be the OR condition from your WHERE clause? You seem to be trying to join to one table or another, which you can't do. So I've replaced it with a LEFT JOIN, so in the event no related records exist nothing will be returned.
I also took out your GROUP BY, as I don't think it was required.
SELECT c.client_id, c.client_name, c.client_bpm,
c.client_su_name, c.client_maxbpm, s.bpm,
s.timestamp, m.mesure_id, ms.currentT
FROM tbl_clients c
JOIN tbl_meting m ON m.client_id = c.client_id
LEFT JOIN tbl_sensor_meting s ON s.id_mesure = m.mesure_id
LEFT JOIN tbl_magsens_meting ms ON ms.id_mesure = m.mesure_id
WHERE m.live = 1
ORDER BY s.timestamp DESC

Speedier alternative to SUBSTRING() in MySQL?

I have a query that uses SUBSTRING() as a criteria:
SELECT p.name p_name,
pa.line1 p_line1,
pa.zip p_zip,
c.name c_name,
ca.line1 c_line1,
ca.zip c_zip
FROM bank b
JOIN import_bundle ib ON ib.bank_id = b.id
JOIN generic_import gi ON gi.import_bundle_id = ib.id
JOIN account_import ai ON ai.generic_import_id = gi.id
JOIN account a ON a.account_import_id = ai.id
JOIN account_address aa ON aa.account_id = a.id
JOIN address ca ON aa.address_id = ca.id
JOIN address pa ON pa.zip = ca.zip OR (pa.zip = ca.zip AND pa.line1 = ca.line1)
JOIN prospect p ON p.address_id = pa.id
JOIN customer c ON a.customer_id = c.id
WHERE b.name = 'M'
AND ib.active = 1
AND gi.active = 1
AND SUBSTRING(p.name, 1, 12) = SUBSTRING(c.name, 1, 12)
LIMIT 100
As you can see, it's just comparing the first 12 characters of p.name and c.name. Unfortunately, adding this query to the WHERE clause makes my query unbearably slow. Are there any tricks out there to do this same comparison, or is my best bet to add another column to each table that contains the first 12 characters of the customer's name? I hope it's not the latter because that would be a lot of work and I'll ultimately be doing several comparisons like this.
Add the extra columns and set up an update trigger to populate them automatically. Be sure to create indexes on the new columns, of course.

MySQL evaluating results

I have a funny MySQL query that needs to pull a subquery from another table, I'm wondering if this is even possible to get mysql to evaluate the subquery.
example:
(I had to replace some brackets with 'gte' & 'lte' cause they were screwing up the post format)
select a.id,a.alloyname,a.label,a.symbol, g.grade,
if(a.id = 1,(
(((select avg(cost/2204.6) as averageCost from nas_cost where cost != '0' and `date` lte '2011-03-01' and `date` gte '2011-03-31') - t.value) * (astm.astm/100) * 1.2)
),(a.formulae)) as thisValue
from nas_alloys a
left join nas_triggers t on t.alloyid = a.id
left join nas_astm astm on astm.alloyid = a.id
left join nas_estimatedprice ep on ep.alloyid = a.id
left join nas_grades g on g.id = astm.gradeid
where a.id = '1' or a.id = '2'
order by g.grade;
So when the IF statement is not = '1' then the (a.formulae) is the value in the nas_alloys table which is:
((ep.estPrice - t.value) * (astm.astm/100) * 0.012)
Basically I want this query to run as:
select a.id,a.alloyname,a.label,a.symbol, g.grade,
if(a.id = 1,(
(((select avg(cost/2204.6) as averageCost from nas_cost where cost != '0' and `date` gte '2011-03-01' and `date` lte '2011-03-31') - t.value) * (astm.astm/100) * 1.2)
),((ep.estPrice - t.value) * (astm.astm/100) * 0.012)) as thisValue
from nas_alloys a
left join nas_triggers t on t.alloyid = a.id
left join nas_astm astm on astm.alloyid = a.id
left join nas_estimatedprice ep on ep.alloyid = a.id
left join nas_grades g on g.id = astm.gradeid
where a.id = '1' or a.id = '2'
order by g.grade;
When a.id != '1', btw, there are about 30 different possibilities for a.formulae, and they change frequently, so hard banging in multiple if statements is not really an option. [redesigning the business logic is more likely than that!]
Anyway, any thoughts? Will this even work?
-thanks
-sean
Create a Stored Function to compute that value for you, and pass the params you will decide later on. When your business logic changes, you just have to update the Stored Function.