Optimizing SQL query with sub queries - mysql

I have a SQL query that I have been trying to optimize. Through various means I reduced the time from over 5 seconds to about 1.3 seconds, but no further. I was wondering if anyone would be able to suggest further improvements.
The Explain diagram shows a full scan:
[screenshot: explain diagram]
The Explain table will give you more details:
[screenshot: explain table]
The simplified query is shown below. Just for reference, I'm using MySQL 5.6.
select * from (
    select
        @row_num := if(@yacht_id = yacht_id and @charter_type = charter_type and @start_base_id = start_base_id and @end_base_id = end_base_id, @row_num + 1, 1) as row_number,
        @yacht_id := yacht_id as yacht_id,
        @charter_type := charter_type as charter_type,
        @start_base_id := start_base_id as start_base_id,
        @end_base_id := end_base_id as end_base_id,
        model, offer_type, instant, rating, reviews, loa, berths, cabins, currency, list_price, list_price_per_day,
        discount, client_price, client_price_per_day, days, date_from, date_to, start_base_city, end_base_city, start_base_country, end_base_country,
        service_binary, product_id, ext_yacht_id, main_image_url
    from (
        select
            offer.yacht_id, offer.charter_type, yacht.model, offer.offer_type, offer.instant, yacht.rating, yacht.reviews, yacht.loa,
            yacht.berths, yacht.cabins, offer.currency, offer.list_price, offer.list_price_per_day,
            offer.discount, offer.client_price, offer.client_price_per_day, offer.days, date_from, date_to,
            offer.start_base_city, offer.end_base_city, offer.start_base_country, offer.end_base_country,
            offer.service_binary, offer.product_id, offer.start_base_id, offer.end_base_id,
            yacht.ext_yacht_id, yacht.main_image_url
        from website_offer as offer
        join website_yacht as yacht
            on offer.yacht_id = yacht.yacht_id,
        (select @yacht_id := '') as init
        where date_from > CURDATE()
            and date_to <= CURDATE() + INTERVAL 3 MONTH
            and days = 7
        order by offer.yacht_id, charter_type, start_base_id, end_base_id, list_price_per_day asc, discount desc
    ) as filtered_offers
) as offers
where row_number = 1;
Thanks,
goppi
UPDATE
I had to abandon some performance improvements and replaced the original select with the new one. The select query is actually built dynamically by the backend based on which filter criteria are set, so the where clause of the innermost select can expand quite a lot. However, this is the default select when no filter is set, and it is the version that takes significantly longer than 1 sec.
EXPLAIN in text form (it doesn't come out pretty, as I couldn't figure out how to format a table, but here it is):
1 PRIMARY ref <auto_key0> <auto_key0> 9 const 10
2 DERIVED ALL 385967
3 DERIVED system 1 Using filesort
3 DERIVED offer ref idx_yachtid,idx_search,idx_dates idx_dates 5 const 385967 Using index condition; Using where
3 DERIVED yacht eq_ref PRIMARY,id_UNIQUE PRIMARY 4 yachtcharter.offer.yacht_id 1
4 DERIVED No tables used

Sub-selects are never great.
You should sign up here: https://www.eversql.com/
Run your query through that and it will give you all the right indexes and optimisations you need for it.

There's still some optimization you can apply. Considering the subquery returns only 5000 rows, you could use an index for it.
First rephrase the predicate as:
select *
from website_offer
where date_from >= CURDATE() + INTERVAL 1 DAY -- rephrased here
and date(date_to) <= CURDATE() + INTERVAL 3 MONTH
and days = 7
order by yacht_id, charter_type, list_price_per_day asc, discount desc
limit 5000
Then, if you add the following index the performance could improve:
create index ix1 on website_offer (days, date_from, date_to);


How to speed up a query containing HAVING?

I have a table with close to a billion records and need to query it with HAVING. It's very slow (about 15 minutes on decent hardware). How can I speed it up?
SELECT ((mean - 3.0E-4)/(stddev/sqrt(N))) as t, ttest.strategyid, mean, stddev, N,
kurtosis, strategies.strategyId
FROM ttest,strategies
WHERE ttest.strategyid=strategies.id AND dataset=3 AND patternclassid="1"
AND exitclassid="1" AND N>= 300 HAVING t>=1.8
I think the problem is t cannot be indexed because it needs to be computed. I cannot add it as a column because the '3.0E-4' will vary per query.
Table:
create table ttest (
strategyid bigint,
patternclassid integer not null,
exitclassid integer not null,
dataset integer not null,
N integer,
mean double,
stddev double,
skewness double,
kurtosis double,
primary key (strategyid, dataset)
);
create index ti3 on ttest (mean);
create index ti4 on ttest (dataset,patternclassid,exitclassid,N);
create table strategies (
id bigint ,
strategyId varchar(500),
primary key(id),
unique key(strategyId)
);
EXPLAIN SELECT ...:
id  select_type  table       partitions  type    possible_keys  key      key_len  ref                              rows     filtered  Extra
1   SIMPLE       ttest       NULL        range   PRIMARY,ti4    ti4      17       NULL                             1910344  100.00    Using index condition; Using MRR
1   SIMPLE       strategies  NULL        eq_ref  PRIMARY        PRIMARY  8        Jellyfish_test.ttest.strategyid  1        100.00    Using where
The query needs to be reformulated and an index needs to be added.
Plan A:
SELECT ((tt.mean - 3.0E-4)/(tt.stddev/sqrt(tt.N))) as t,
tt.strategyid, tt.mean, tt.stddev, tt.N, tt.kurtosis,
s.strategyId
FROM ttest AS tt
JOIN strategies AS s ON tt.strategyid = s.id
WHERE tt.dataset = 3
AND tt.patternclassid = 1
AND tt.exitclassid = 1
AND tt.N >= 300
AND ((tt.mean - 3.0E-4)/(tt.stddev/sqrt(tt.N))) >= 1.8
and a 'composite' and 'covering' index on ttest. Replace your ti4 with this (to make it 'covering'):
INDEX(dataset, patternclassid, exitclassid, -- any order
N, strategyid) -- in this order
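As a concrete sketch, the swap could be done in one statement (the index name ix_ttest_cover is illustrative; pick whatever fits your naming convention):
ALTER TABLE ttest
    DROP INDEX ti4,
    ADD INDEX ix_ttest_cover (dataset, patternclassid, exitclassid,  -- equality predicates, any order
                              N,                                     -- range predicate next
                              strategyid);                           -- join key last, so the lookup stays in the index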
Plan B:
SELECT ((tt.mean - 3.0E-4)/(tt.stddev/sqrt(tt.N))) as t,
tt.strategyid, tt.mean, tt.stddev, tt.N, tt.kurtosis,
( SELECT s.strategyId
FROM strategies AS s
WHERE s.id = tt.strategyid
) AS strategyId
FROM ttest AS tt
WHERE tt.dataset = 3
AND tt.patternclassid = 1
AND tt.exitclassid = 1
AND tt.N >= 300
AND ((tt.mean - 3.0E-4)/(tt.stddev/sqrt(tt.N))) >= 1.8
With the same index.
Unfortunately the expression for t needs to be repeated. Moving it from HAVING to WHERE avoids gathering unwanted rows only to end up throwing them away. Maybe the optimizer will do that automatically; please provide EXPLAIN SELECT ... so we can see.
Also, it is unclear whether one of the two formulations will run faster than the other.
To be honest, I've never seen HAVING used like this; for 20+ years I've assumed it could only be used in GROUP BY situations!
Anyway, IMHO you don't need it here; as Rick James points out, you can put it all in the WHERE.
Rewriting it a bit, I end up with:
SELECT ((t.mean - 3.0E-4)/(t.stddev/sqrt(t.N))) as t,
t.strategyid,
t.mean,
t.stddev,
t.N,
t.kurtosis,
s.strategyId
FROM ttest t
JOIN strategies s
ON s.id = t.strategyid
WHERE t.dataset=3
AND t.patternclassid="1"
AND t.exitclassid="1"
AND t.N>= 300
AND ((t.mean - 3.0E-4)/(t.stddev/sqrt(t.N))) >= 1.8
For most of that we can indeed foresee a reasonable index. The problem remains with the last calculation:
AND ((t.mean - 3.0E-4)/(t.stddev/sqrt(t.N))) >= 1.8
However, before we go to that: how many rows are there if you ignore this 'formula'? 100? 200? If so, indexing as foreseen in Rick James' answer should be sufficient IMHO.
If it's in the 1000's or many more, then the question becomes: how many of those are thrown out by the formula? 1%? 50%? 99%? If it's on the low side then again, indexing as proposed by Rick James will do. If however you only need to keep a few, you may want to optimize this further and index accordingly.
From your explanation I understand that the 3.0E-4 is variable, so we can't include it in the index; we'll need to extract the parts we can.
If my algebra isn't failing me, you can play with the formula like this (the multiplication step only preserves the inequality because t.stddev / sqrt(t.N) is positive):
AND ((t.mean - 3.0E-4) / (t.stddev / sqrt(t.N))) >= 1.8
AND (t.mean - 3.0E-4) >= 1.8 * (t.stddev / sqrt(t.N))
AND t.mean - 3.0E-4 >= (1.8 * (t.stddev / sqrt(t.N)))
AND - 3.0E-4 >= (1.8 * (t.stddev / sqrt(t.N))) - t.mean
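A quick numeric check that the rearrangement is faithful, with made-up values mean = 0.001, stddev = 0.01 and N = 400: the original form gives t = (0.001 - 3.0E-4) / (0.01 / 20) = 1.4, which fails t >= 1.8; the rearranged form gives 1.8 * (0.01 / 20) - 0.001 = -1.0E-4, and -3.0E-4 >= -1.0E-4 is likewise false, so both forms reject the row.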
So the query becomes:
SELECT ((t.mean - 3.0E-4)/(t.stddev/sqrt(t.N))) as t,
t.strategyid,
t.mean,
t.stddev,
t.N,
t.kurtosis,
s.strategyId
FROM ttest t
JOIN strategies s
ON s.id = t.strategyid
WHERE t.dataset=3
AND t.patternclassid="1"
AND t.exitclassid="1"
AND t.N>= 300
AND (1.8 * (t.stddev / sqrt(t.N))) - t.mean <= -3.0E-4
I'm not familiar with MySQL, but glancing at the documentation it should be possible to include 'generated columns' in an index. So we'll do exactly that with (1.8 * (t.stddev / sqrt(t.N))) - t.mean.
Your indexed fields thus become:
dataset, patternclassid, exitclassid, N, (1.8 * (t.stddev / sqrt(t.N))) - t.mean
Note that the system will have to calculate this value for each and every row you insert into (and possibly update in) the table. However, once it's there (and indexed) it should make the query quite a bit faster.
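If you want to try that route, here is a minimal sketch, assuming MySQL 5.7+ (which introduced generated columns). The column and index names are illustrative, and note the factor 1.8 gets baked into the column, so this only works while that factor stays fixed:
ALTER TABLE ttest
    ADD COLUMN t_threshold DOUBLE
        GENERATED ALWAYS AS (1.8 * (stddev / SQRT(N)) - mean) STORED;

CREATE INDEX ix_ttest_threshold
    ON ttest (dataset, patternclassid, exitclassid, N, t_threshold);

-- the last predicate of the query then becomes simply:
--   AND t_threshold <= -3.0E-4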

The SQL query below is taking too much time. How can I make it faster?

SELECT
call_id
,call_date
,call_no
,call_amountdue
,rechargesamount
,call_penalty
,callpayment_received
,calldiscount
FROM `call`
WHERE calltype = 'Regular'
AND callcode = 98
AND call_connect = 1
AND call_date < '2018-01-01'
ORDER BY
`call_date` DESC
,`call_id` DESC
limit 1
Indexes already exist on call_date, callcode, calltype, and call_connect.
The table has 10 million records and the query takes 2 minutes.
How can I get results within 3 seconds?
INDEX (calltype, callcode, call_connect, -- in any order
call_date, -- next
call_id) -- last
This will make it possible to find the one row that is desired without having to step over other rows.
Since you seem to have INDEX(calltype), drop it; it will be in the way and is redundant anyway. The rest of the indexes you mentioned will be ignored.
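As a sketch, assuming the existing single-column index really is named calltype (adjust the names to your actual schema):
ALTER TABLE `call`
    DROP INDEX calltype,
    ADD INDEX ix_call_lookup (calltype, callcode, call_connect,  -- the three = predicates, any order
                              call_date,                         -- the range / ORDER BY column next
                              call_id);                          -- the ORDER BY tiebreaker last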
More discussion in Index Cookbook

Optimizing Parameterized MySQL Queries

I have a query with a number of parameters which, if I run it from MySQLWorkbench, takes around a second to run.
If I take this query, get rid of the parameters, and substitute the values into the query, it takes about 22 seconds to run; the same happens if I convert the query to a parameterized stored procedure and run it (it then also takes about 22 seconds).
I've enabled profiling on MySQL and I can see a few things there. For example, it shows the number of rows examined, and there's an order-of-magnitude difference (20,000 vs 400,000) which I assume is the reason for the 20x increase in processing time.
The other difference in the profile is that the parameterized query sent from MySQLWorkbench still has the parameters in it (e.g. where limit < @lim) while in the sproc the values have been set (where limit < 300).
I've tried this a number of different ways: I'm using JetBrains's DataGrip (as well as MySQLWorkbench), which behaves like MySQLWorkbench (sends through the @ parameters). I've tried executing the queries and the sproc from MySQLWorkbench, DataGrip, Java (JDBC) and .NET. I've also tried prepared statements in Java, but I can't get anywhere near the performance of sending the 'raw' SQL to MySQL.
I feel like I'm missing something obvious here but I don't know what it is.
The query is relatively complex: it has a CTE, a couple of sub-selects and a couple of joins, but as I said it runs quickly straight from MySQL.
My main question is why the query is 20x faster in one format than another.
Does the way the query is sent to MySQL have anything to do with this (the '@' values being sent through), and can I replicate this in a stored procedure?
Updated 1st Jan
Thanks for the comments. I didn't post the query originally as I'm more interested in the general concepts around the use of variables/parameters and how I could take advantage of them (or not).
Here is the original query:
with tmp_bat as (select bd.MatchId,
bd.matchtype,
bd.playerid,
bd.teamid,
bd.opponentsid,
bd.inningsnumber,
bd.dismissal,
bd.dismissaltype,
bd.bowlerid,
bd.fielderid,
bd.score,
bd.position,
bd.notout,
bd.balls,
bd.minutes,
bd.fours,
bd.sixes,
bd.hundred,
bd.fifty,
bd.duck,
bd.captain,
bd.wicketkeeper,
m.hometeamid,
m.awayteamid,
m.matchdesignator,
m.matchtitle,
m.location,
m.tossteamid,
m.resultstring,
m.whowonid,
m.howmuch,
m.victorytype,
m.duration,
m.ballsperover,
m.daynight,
m.LocationId
from (select *
from battingdetails
where matchid in
(select id
from matches
where id in (select matchid from battingdetails)
and matchtype = @match_type
)) as bd
join matches m on m.id = bd.matchid
join extramatchdetails emd1
on emd1.MatchId = m.Id
and emd1.TeamId = bd.TeamId
join extramatchdetails emd2
on emd2.MatchId = m.Id
and emd2.TeamId = bd.TeamId
)
select players.fullname name,
teams.teams team,
'' opponents,
players.sortnamepart,
innings.matches,
innings.innings,
innings.notouts,
innings.runs,
HS.score highestscore,
HS.NotOut,
CAST(TRUNCATE(innings.runs / (CAST((Innings.Innings - innings.notOuts) AS DECIMAL)),
2) AS DECIMAL(7, 2)) 'Avg',
innings.hundreds,
innings.fifties,
innings.ducks,
innings.fours,
innings.sixes,
innings.balls,
CONCAT(grounds.CountryName, ' - ', grounds.KnownAs) Ground,
'' Year,
'' CountryName
from (select count(case when inningsnumber = 1 then 1 end) matches,
count(case when dismissaltype != 11 and dismissaltype != 14 then 1 end) innings,
LocationId,
playerid,
MatchType,
SUM(score) runs,
SUM(notout) notouts,
SUM(hundred) Hundreds,
SUM(fifty) Fifties,
SUM(duck) Ducks,
SUM(fours) Fours,
SUM(sixes) Sixes,
SUM(balls) Balls
from tmp_bat
group by MatchType, playerid, LocationId) as innings
JOIN players ON players.id = innings.playerid
join grounds on Grounds.GroundId = LocationId and grounds.MatchType = innings.MatchType
join
(select pt.playerid, t.matchtype, GROUP_CONCAT(t.name SEPARATOR ', ') as teams
from playersteams pt
join teams t on pt.teamid = t.id
group by pt.playerid, t.matchtype)
as teams on teams.playerid = innings.playerid and teams.matchtype = innings.MatchType
JOIN
(SELECT playerid,
LocationId,
MAX(Score) Score,
MAX(NotOut) NotOut
FROM (SELECT battingdetails.playerid,
battingdetails.score,
battingdetails.notout,
battingdetails.LocationId
FROM tmp_bat as battingdetails
JOIN (SELECT battingdetails.playerid,
battingdetails.LocationId,
MAX(battingdetails.Score) AS score
FROM tmp_bat as battingdetails
GROUP BY battingdetails.playerid,
battingdetails.LocationId,
battingdetails.playerid) AS maxscore
ON battingdetails.score = maxscore.score
AND battingdetails.playerid = maxscore.playerid
AND battingdetails.LocationId = maxscore.LocationId ) AS internal
GROUP BY internal.playerid, internal.LocationId) AS HS
ON HS.playerid = innings.playerid and hs.LocationId = innings.LocationId
where innings.runs >= @runs_limit
order by runs desc, KnownAs, SortNamePart
limit 0, 300;
Wherever you see '@match_type' I substitute a value ('t'). This query takes ~1.1 secs to run. The query with the hardcoded values rather than the variables takes ~3.5 secs (see the other note below). The EXPLAIN for this query gives this:
1,PRIMARY,<derived7>,,ALL,,,,,219291,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,teams.playerid,1,100,
1,PRIMARY,<derived2>,,ref,<auto_key3>,<auto_key3>,26,"teams.playerid,teams.matchtype",11,100,Using where
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,innings.LocationId,1,10,Using where
1,PRIMARY,<derived8>,,ref,<auto_key0>,<auto_key0>,8,"teams.playerid,innings.LocationId",169,100,
8,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
8,DERIVED,<derived14>,,ref,<auto_key0>,<auto_key0>,13,"battingdetails.PlayerId,battingdetails.LocationId,battingdetails.Score",10,100,Using index
14,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
3,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where
3,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
3,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
3,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
3,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
3,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
and the EXPLAIN for the query with the hardcoded values looks like this:
1,PRIMARY,<derived8>,,ALL,,,,,20097,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,HS.PlayerId,1,100,
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,HS.LocationId,1,100,Using where
1,PRIMARY,<derived2>,,ref,<auto_key0>,<auto_key0>,30,"HS.LocationId,HS.PlayerId,grounds.MatchType",17,100,Using where
1,PRIMARY,<derived7>,,ref,<auto_key0>,<auto_key0>,46,"HS.PlayerId,innings.MatchType",10,100,Using where
8,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
8,DERIVED,m,,eq_ref,"PRIMARY,LocationId",PRIMARY,4,matches.Id,1,100,
8,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
8,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
8,DERIVED,<derived14>,,ref,<auto_key2>,<auto_key2>,4,m.LocationId,17,100,
8,DERIVED,battingdetails,,ref,"PlayerId,TeamId,Score,MatchId,match_team",MatchId,8,"matches.Id,maxscore.PlayerId",1,3.56,Using where
8,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
14,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
14,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
14,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
14,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
14,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
14,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
2,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
2,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
2,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
2,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
2,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
Pointers as to ways to improve my SQL are always welcome (I'm definitely not a database person), but I'd still like to understand whether I can use the SQL with the variables from code, and why that improves the performance by so much.
Update 2 1st Jan
AAArrrggghhh. My machine rebooted overnight and now the queries are generally running much quicker. It's still 1 sec vs 3 secs, but the 20-times slowdown does seem to have disappeared.
In your WITH construct, aren't you overthinking your select in (select in (select in ...))? It could just be simplified to the "innings" CTE I have in my solution.
Also, you were joining to the extraMatchDetails table TWICE, on the same match and team conditions, but never utilized either of those joins in the WITH CTE, rendering that component useless, doesn't it? However, the MATCH table has homeTeamID and awayTeamID, which is what I THINK your actual intent was.
Also, your WITH CTE is pulling many columns that are not needed or used in the subsequent return, such as Captain and WicketKeeper.
So, I have restructured... pre-query the batting details once up front and summarized, then you should be able to join off that.
Hopefully this MIGHT be a better fit, function and performance for your needs.
with innings as
(
select
bd.matchId,
bd.matchtype,
bd.playerid,
m.locationId,
count(case when bd.inningsnumber = 1 then 1 end) matches,
count(case when bd.dismissaltype not in (11, 14) then 1 end) innings,
SUM(bd.score) runs,
SUM(bd.notout) notouts,
SUM(bd.hundred) Hundreds,
SUM(bd.fifty) Fifties,
SUM(bd.duck) Ducks,
SUM(bd.fours) Fours,
SUM(bd.sixes) Sixes,
SUM(bd.balls) Balls
from
battingDetails bd
join matches m
on bd.matchid = m.id
where
matchtype = @match_type
group by
bd.matchId,
bd.matchType,
bd.playerid,
m.locationId
)
select
p.fullname playerFullName,
p.sortnamepart,
CONCAT(g.CountryName, ' - ', g.KnownAs) Ground,
t.team,
i.matches,
i.innings,
i.runs,
i.notouts,
i.hundreds,
i.fifties,
i.ducks,
i.fours,
i.sixes,
i.balls,
CAST( TRUNCATE( i.runs / (CAST((i.Innings - i.notOuts) AS DECIMAL)), 2) AS DECIMAL(7, 2)) 'Avg',
hs.maxScore,
hs.maxNotOut,
'' opponents,
'' Year,
'' CountryName
from
innings i
JOIN players p
ON i.playerid = p.id
join grounds g
on i.locationId = g.GroundId
and i.matchType = g.matchType
join
(select
pt.playerid,
t.matchtype,
GROUP_CONCAT(t.name SEPARATOR ', ') team
from
playersteams pt
join teams t
on pt.teamid = t.id
group by
pt.playerid,
t.matchtype) as t
on i.playerid = t.playerid
and i.MatchType = t.matchtype
join
( select
i2.playerid,
i2.locationid,
max( i2.score ) maxScore,
max( i2.notOut ) maxNotOut
from
innings i2
group by
i2.playerid,
i2.LocationId ) HS
on i.playerid = HS.playerid
AND i.locationid = HS.locationid
where
i.runs >= @runs_limit
order by
i.runs desc,
g.KnownAs,
p.SortNamePart
limit
0, 300;
Now, I know that you stated that after the server reboot performance is better, but really, what you DO have appears to be a really overbloated query.
Not sure this is the correct answer but I thought I'd post this in case other people have the same issue.
The issue seems to be the use of CTEs in a stored procedure. I have a query that creates a CTE and then uses it 8 times. If I run this query using interpolated variables it takes about 0.8 sec; if I turn it into a stored procedure and use the stored procedure parameters, it takes about a minute (between 45 and 63 seconds) to run!
I've found a couple of ways of fixing this. One is to use multiple temporary tables (8 in this case), as MySQL cannot re-use a temp table in a query. This gets the query time right down but just doesn't feel like a maintainable or scalable solution. The other fix is to leave the variables in place and assign them from the stored procedure parameters; this also has no real performance issues. So my sproc looks like this:
create procedure bowling_individual_career_records_by_year_for_team_vs_opponent(IN team_id INT,
IN opponents_id INT)
begin
set @team_id = team_id;
set @opponents_id = opponents_id;
# use these variables in the SQL below
...
end
Not sure this is the best solution but it works for me and keeps the structure of the SQL the same as it was previously.
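For completeness, the call site doesn't change; e.g. (the IDs here are made up for illustration):
CALL bowling_individual_career_records_by_year_for_team_vs_opponent(1, 2);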

mariadb slow count on console but very fast on dbeaver

I have this SQL query that takes a long time when executed from the MariaDB console or from PHP (Laravel):
SELECT Count(*) AS aggregate
FROM (SELECT paquetes.id,
paquetes.codigoseguimiento,
paquetes.direccionentrega,
paquetes.telefono1,
paquetes.nombrerecibe,
paquetes.nombrequienenvia,
paquetes.estado,
paquetes.marcadevolucion,
tipos_paquetes.nombre AS tipo,
paquetes.created_at,
paquetes.devolucion,
ciudades.nombre AS ciudadEntrega
FROM paquetes
LEFT JOIN tipos_paquetes
ON tipos_paquetes.id = paquetes.tipo
LEFT JOIN ciudades
ON ciudades.id = paquetes.ciudadentrega
WHERE paquetes.created_at BETWEEN
"2021-10-17 00:00:00" AND "2021-11-17 23:59:59"
AND paquetes.estado != 0
ORDER BY paquetes.created_at DESC) count_row_table
Result:
+-----------+
| aggregate |
+-----------+
| 763141 |
+-----------+
1 row in set (20.631 sec)
But if I run the same SQL from DBeaver (always clearing the query cache), it only takes 1.458 seconds.
What I discovered is that adding LIMIT 1 to the SQL makes it run at the same speed as DBeaver, in both the MariaDB console and PHP:
SELECT Count(*) AS aggregate
FROM (SELECT paquetes.id,
paquetes.codigoseguimiento,
paquetes.direccionentrega,
paquetes.telefono1,
paquetes.nombrerecibe,
paquetes.nombrequienenvia,
paquetes.estado,
paquetes.marcadevolucion,
tipos_paquetes.nombre AS tipo,
paquetes.created_at,
paquetes.devolucion,
ciudades.nombre AS ciudadEntrega
FROM paquetes
LEFT JOIN tipos_paquetes
ON tipos_paquetes.id = paquetes.tipo
LEFT JOIN ciudades
ON ciudades.id = paquetes.ciudadentrega
WHERE paquetes.created_at BETWEEN
"2021-10-17 00:00:00" AND "2021-11-17 23:59:59"
AND paquetes.estado != 0
ORDER BY paquetes.created_at DESC) count_row_table
LIMIT 1;
Result:
+-----------+
| aggregate |
+-----------+
| 763141 |
+-----------+
1 row in set (1.339 sec)
The SQL query is generated automatically by the DataTables library for Laravel, and the EXPLAIN shows: [screenshot: EXPLAIN output]
But I'm not sure if this is correct or why it happens; there are currently about 8 million records in the table.
Edit:
The execution plan for the query without LIMIT 1: [screenshot: execution plan]
It seems that it is going through the whole table and not using the index.
The execution plan for the query with LIMIT 1: [screenshot: execution plan]
Apparently it is using the index and does not go through the whole table.
Table indexes: [screenshot: table indexes]
This will probably run faster in both places:
SELECT COUNT(*) AS aggregate
FROM paquetes
WHERE paquetes.created_at >= "2021-10-17"
AND paquetes.created_at < "2021-10-17" + INTERVAL 1 MONTH + INTERVAL 1 DAY
AND paquetes.estado != 0;
The + INTERVAL 1 DAY was derived from your range, and may be a mistake.
More
The Optimizer, in general, has two ways to perform a query -- Use an index, or scan the table. As a simplified rule, if more than 20% of the rows are needed, it will prefer to simply scan the table; Explain will say "All". Otherwise, it will tell you which Index it picked.
When using an index, it will bounce between the Index's BTree and the Data's BTree. Since the Optimizer cannot precisely predict whether this "bouncing" will be faster or slower than simply reading all rows ('table scan') and ignoring the ones that don't meet the Where clause, it will sometimes pick the 'wrong' way to perform the query.
You are probably seeing the results of this "which way?" quandary when you change the date range.
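One way to tilt the Optimizer away from the table scan in a query like this is to give it a covering index, so the whole count can be answered from the index BTree with no bouncing to the data BTree at all. A sketch (the index name is illustrative):
CREATE INDEX ix_paquetes_created_estado
    ON paquetes (created_at, estado);
-- created_at first, since it carries the range predicate;
-- estado is included so the index covers the whole WHERE clause,
-- which should let EXPLAIN report "Using index" for the COUNT(*).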

How can I optimize my query (rank query)?

For the last two days, I have been asking questions about rank queries in MySQL. So far, I have working queries to:
query all the rows from a table and order them by their rank.
query ONLY one row with its rank.
Here is a link to my question from last night:
How to get a row rank?
As you might notice, btilly's query is pretty fast.
Here is a query for getting ONLY one row with its rank that I made based on btilly's query.
set @points = -1;
set @num = 0;
select * from (
    SELECT id
         , points
         , @num := if(@points = points, @num, @num + 1) as point_rank
         , @points := points as dummy
    FROM points
    ORDER BY points desc, id asc
) as test where test.id = 3
The above query uses a subquery, so I am worried about the performance.
Are there any other, faster queries that I can use?
Table points
id points
1 50
2 50
3 40
4 30
5 30
6 20
Don't get into a panic about subqueries. Subqueries aren't always slow, only in some situations. The problem with your query is that it requires a full scan.
Here's an alternative that should be faster:
SELECT COUNT(DISTINCT points) + 1
FROM points
WHERE points > (SELECT points FROM points WHERE id = 3)
Add an index on id (I'm guessing you probably want a primary key here) and another index on points to make this query perform efficiently.
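A minimal sketch of those indexes, assuming id is not already the primary key (the secondary index name is illustrative):
ALTER TABLE points ADD PRIMARY KEY (id);     -- fast lookup of the target row's points
CREATE INDEX idx_points ON points (points);  -- lets COUNT(DISTINCT points) use a range scan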