MySQL query optimisation

I'm struggling to optimise this SQL query; can someone please help? Ideally I need the data output exactly the same, to avoid any PHP recoding.
Basically, I want to group tariff_freemins into a set of ranges. The code below works, but it takes over 2 seconds to run, which isn't really quick enough for a website. Is there an optimised version of this code? The table has an index on tariff_freemins.
SELECT 'Free Mins' as category,
       'tariff_freemins' as fieldname,
       t.range as `element`,
       count(*) as phone_count
from (
    select deal_count,
           case
               when tariff_freemins between 0 and 200 then 'a0TO200'
               when tariff_freemins between 200 and 400 then 'b200TO400'
               when tariff_freemins between 400 and 600 then 'c400TO600'
               when tariff_freemins between 600 and 1000 then 'd600TO1000'
               when tariff_freemins between 1000 and 2000 then 'e1000TO2000'
               when tariff_freemins > 2001 then 'hUnlimited'
           end as `range`
    from options
) t
group by t.range
The output should be something like the below:
"category" "fieldname" "element" "phone_count"
"Data Allowance" "tariff_dataallowance" "a0TO1" "289716"
"Data Allowance" "tariff_dataallowance" "b1TO2" "64472"
"Data Allowance" "tariff_dataallowance" "c2TO5" "114685"
"Data Allowance" "tariff_dataallowance" "d5TO11" "33305"
"Data Allowance" "tariff_dataallowance" "e11TO20" "36798"
"Data Allowance" "tariff_dataallowance" "f20TO50" "5839"
"Data Allowance" "tariff_dataallowance" "hUnlimited" "51114"
UPDATE:
Secondly, I'm using many queries like the one above to generate a master table. Each has its own GROUP BY, and they are joined with UNION ALL (see below). As you can see, each one repeats the filter WHERE type = 'Contract'. I can only assume there is a way to reuse the same filtered options table across all of the GROUP BY queries? How would this look? Is there a better way to optimise this? Thanks!
SELECT 'Phone Cost' as category, 'offer_phonecost' as fieldname,
       offer_phonecost_range as `element`, count(*) as phone_count
from options
WHERE type = 'Contract'
group by offer_phonecost_range
UNION ALL
SELECT 'Monthly Cost' as category, 'offer_offerental' as fieldname,
       offer_offerental_range as `element`, count(*) as phone_count
from options
WHERE type = 'Contract'
GROUP BY offer_offerental_range
UNION ALL
SELECT 'Data Allowance' as category, 'tariff_dataallowance' as fieldname,
       tariff_dataallowance_range as `element`, count(*) as phone_count
from options
WHERE type = 'Contract'
GROUP BY tariff_dataallowance_range
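For illustration (this sketch is not from the original thread): one way to state the shared filter just once is a common table expression, assuming MySQL 8.0+, where a CTE referenced several times can be materialized once:
-- Hedged sketch (assumes MySQL 8.0+): filter options once, reuse it three times
WITH filtered AS (
    SELECT * FROM options WHERE type = 'Contract'
)
SELECT 'Phone Cost' AS category, 'offer_phonecost' AS fieldname,
       offer_phonecost_range AS `element`, COUNT(*) AS phone_count
FROM filtered
GROUP BY offer_phonecost_range
UNION ALL
SELECT 'Monthly Cost', 'offer_offerental', offer_offerental_range, COUNT(*)
FROM filtered
GROUP BY offer_offerental_range
UNION ALL
SELECT 'Data Allowance', 'tariff_dataallowance', tariff_dataallowance_range, COUNT(*)
FROM filtered
GROUP BY tariff_dataallowance_range;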

Create this index:
create index opt_tariff on options( tariff_freemins );
and rewrite the query to this:
select 'Free Mins' as category,
       'tariff_freemins' as fieldname,
       case
           when tariff_freemins between 0 and 200 then 'a0TO200'
           when tariff_freemins between 200 and 400 then 'b200TO400'
           when tariff_freemins between 400 and 600 then 'c400TO600'
           when tariff_freemins between 600 and 1000 then 'd600TO1000'
           when tariff_freemins between 1000 and 2000 then 'e1000TO2000'
           when tariff_freemins > 2001 then 'hUnlimited'
       end as `element`,
       sum(cnt) as phone_count
from (
    select tariff_freemins, count(*) as cnt
    from options
    group by tariff_freemins
) x
group by element;
The inner subquery uses the index to optimize the GROUP BY, so it is fast.
The outer query cannot be optimized; at least not in MySQL, since MySQL does not support expression-based indexes.
However, the outer query performs its aggregation on a much smaller set of data that has already been pre-aggregated by the inner subquery, so the whole query should be faster than your version.
Queries like this can easily be optimized in other RDBMSs such as Oracle, PostgreSQL and SQL Server, using expression-based indexes.
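As a quick check (my addition, not part of the original answer), EXPLAIN on the inner query should report a loose index scan when the index is serving the grouping:
-- Sketch: verify the index serves the inner GROUP BY; the exact plan
-- depends on your data and MySQL version.
EXPLAIN
SELECT tariff_freemins, COUNT(*) AS cnt
FROM options
GROUP BY tariff_freemins;
-- Look for key = opt_tariff and Extra = "Using index for group-by"
-- (or at least "Using index") in the output.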

Optimizing Parameterized MySQL Queries

I have a query with a number of parameters which, if I run it from MySQL Workbench, takes around a second to run.
If I take this query and get rid of the parameters, substituting the values into the query instead, it takes about 22 seconds to run; the same happens if I convert the query to a parameterized stored procedure and run that (it then also takes about 22 seconds).
I've enabled profiling on MySQL and I can see a few things there. For example, it shows the number of rows examined, and there's an order-of-magnitude difference (20,000 vs 400,000), which I assume is the reason for the 20x increase in processing time.
The other difference in the profile is that the parameterized query sent from MySQL Workbench still has the parameters in it (e.g. where limit < @lim), while in the sproc the values have been substituted (where limit < 300).
I've tried this a number of different ways. I'm using JetBrains' DataGrip (as well as MySQL Workbench), and it behaves like MySQL Workbench (it sends the @ parameters through). I've tried executing the queries and the sproc from MySQL Workbench, DataGrip, Java (JDBC) and .NET. I've also tried prepared statements in Java, but I can't get anywhere near the performance of sending the 'raw' SQL to MySQL.
I feel like I'm missing something obvious here, but I don't know what it is.
The query is relatively complex: it has a CTE, a couple of sub-selects and a couple of joins, but as I said it runs quickly straight from MySQL Workbench.
My main question is why the query is 20x faster in one format than in the other.
Does the way the query is sent to MySQL have anything to do with this (the '@' values being sent through), and can I replicate it in a stored procedure?
Updated 1st Jan
Thanks for the comments. I didn't post the query originally as I'm more interested in the general concepts around the use of variables/parameters and how I could take advantage of them (or not).
Here is the original query:
with tmp_bat as (select bd.MatchId,
bd.matchtype,
bd.playerid,
bd.teamid,
bd.opponentsid,
bd.inningsnumber,
bd.dismissal,
bd.dismissaltype,
bd.bowlerid,
bd.fielderid,
bd.score,
bd.position,
bd.notout,
bd.balls,
bd.minutes,
bd.fours,
bd.sixes,
bd.hundred,
bd.fifty,
bd.duck,
bd.captain,
bd.wicketkeeper,
m.hometeamid,
m.awayteamid,
m.matchdesignator,
m.matchtitle,
m.location,
m.tossteamid,
m.resultstring,
m.whowonid,
m.howmuch,
m.victorytype,
m.duration,
m.ballsperover,
m.daynight,
m.LocationId
from (select *
from battingdetails
where matchid in
(select id
from matches
where id in (select matchid from battingdetails)
and matchtype = @match_type
)) as bd
join matches m on m.id = bd.matchid
join extramatchdetails emd1
on emd1.MatchId = m.Id
and emd1.TeamId = bd.TeamId
join extramatchdetails emd2
on emd2.MatchId = m.Id
and emd2.TeamId = bd.TeamId
)
select players.fullname name,
teams.teams team,
'' opponents,
players.sortnamepart,
innings.matches,
innings.innings,
innings.notouts,
innings.runs,
HS.score highestscore,
HS.NotOut,
CAST(TRUNCATE(innings.runs / (CAST((Innings.Innings - innings.notOuts) AS DECIMAL)),
2) AS DECIMAL(7, 2)) 'Avg',
innings.hundreds,
innings.fifties,
innings.ducks,
innings.fours,
innings.sixes,
innings.balls,
CONCAT(grounds.CountryName, ' - ', grounds.KnownAs) Ground,
'' Year,
'' CountryName
from (select count(case when inningsnumber = 1 then 1 end) matches,
count(case when dismissaltype != 11 and dismissaltype != 14 then 1 end) innings,
LocationId,
playerid,
MatchType,
SUM(score) runs,
SUM(notout) notouts,
SUM(hundred) Hundreds,
SUM(fifty) Fifties,
SUM(duck) Ducks,
SUM(fours) Fours,
SUM(sixes) Sixes,
SUM(balls) Balls
from tmp_bat
group by MatchType, playerid, LocationId) as innings
JOIN players ON players.id = innings.playerid
join grounds on Grounds.GroundId = LocationId and grounds.MatchType = innings.MatchType
join
(select pt.playerid, t.matchtype, GROUP_CONCAT(t.name SEPARATOR ', ') as teams
from playersteams pt
join teams t on pt.teamid = t.id
group by pt.playerid, t.matchtype)
as teams on teams.playerid = innings.playerid and teams.matchtype = innings.MatchType
JOIN
(SELECT playerid,
LocationId,
MAX(Score) Score,
MAX(NotOut) NotOut
FROM (SELECT battingdetails.playerid,
battingdetails.score,
battingdetails.notout,
battingdetails.LocationId
FROM tmp_bat as battingdetails
JOIN (SELECT battingdetails.playerid,
battingdetails.LocationId,
MAX(battingdetails.Score) AS score
FROM tmp_bat as battingdetails
GROUP BY battingdetails.playerid,
battingdetails.LocationId,
battingdetails.playerid) AS maxscore
ON battingdetails.score = maxscore.score
AND battingdetails.playerid = maxscore.playerid
AND battingdetails.LocationId = maxscore.LocationId ) AS internal
GROUP BY internal.playerid, internal.LocationId) AS HS
ON HS.playerid = innings.playerid and hs.LocationId = innings.LocationId
where innings.runs >= @runs_limit
order by runs desc, KnownAs, SortNamePart
limit 0, 300;
Wherever you see '@match_type' I substitute a value ('t'). This query takes ~1.1 secs to run. The query with the hard-coded values rather than the variables takes ~3.5 secs (see the other note below). The EXPLAIN for this query gives this:
1,PRIMARY,<derived7>,,ALL,,,,,219291,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,teams.playerid,1,100,
1,PRIMARY,<derived2>,,ref,<auto_key3>,<auto_key3>,26,"teams.playerid,teams.matchtype",11,100,Using where
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,innings.LocationId,1,10,Using where
1,PRIMARY,<derived8>,,ref,<auto_key0>,<auto_key0>,8,"teams.playerid,innings.LocationId",169,100,
8,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
8,DERIVED,<derived14>,,ref,<auto_key0>,<auto_key0>,13,"battingdetails.PlayerId,battingdetails.LocationId,battingdetails.Score",10,100,Using index
14,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,<derived3>,,ALL,,,,,349893,100,Using temporary
3,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where
3,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
3,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
3,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
3,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
3,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
and the EXPLAIN for the query with the hardcoded values looks like this:
1,PRIMARY,<derived8>,,ALL,,,,,20097,100,Using temporary; Using filesort
1,PRIMARY,players,,eq_ref,PRIMARY,PRIMARY,4,HS.PlayerId,1,100,
1,PRIMARY,grounds,,ref,GroundId,GroundId,4,HS.LocationId,1,100,Using where
1,PRIMARY,<derived2>,,ref,<auto_key0>,<auto_key0>,30,"HS.LocationId,HS.PlayerId,grounds.MatchType",17,100,Using where
1,PRIMARY,<derived7>,,ref,<auto_key0>,<auto_key0>,46,"HS.PlayerId,innings.MatchType",10,100,Using where
8,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
8,DERIVED,m,,eq_ref,"PRIMARY,LocationId",PRIMARY,4,matches.Id,1,100,
8,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
8,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
8,DERIVED,<derived14>,,ref,<auto_key2>,<auto_key2>,4,m.LocationId,17,100,
8,DERIVED,battingdetails,,ref,"PlayerId,TeamId,Score,MatchId,match_team",MatchId,8,"matches.Id,maxscore.PlayerId",1,3.56,Using where
8,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
14,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
14,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
14,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
14,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
14,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
14,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
7,DERIVED,t,,ALL,PRIMARY,,,,3323,100,Using temporary; Using filesort
7,DERIVED,pt,,ref,TeamId,TeamId,4,t.Id,65,100,
2,DERIVED,matches,,ALL,PRIMARY,,,,114162,10,Using where; Using temporary
2,DERIVED,m,,eq_ref,PRIMARY,PRIMARY,4,matches.Id,1,100,
2,DERIVED,emd1,,ref,"PRIMARY,TeamId",PRIMARY,4,matches.Id,1,100,Using index
2,DERIVED,emd2,,eq_ref,"PRIMARY,TeamId",PRIMARY,8,"matches.Id,emd1.TeamId",1,100,Using index
2,DERIVED,battingdetails,,ref,"TeamId,MatchId,match_team",match_team,8,"emd1.TeamId,matches.Id",15,100,
2,DERIVED,battingdetails,,ref,MatchId,MatchId,4,matches.Id,31,100,Using index; FirstMatch(battingdetails)
Pointers as to ways to improve my SQL are always welcome (I'm definitely not a database person), but I'd still like to understand whether I can use the SQL with the variables from code, and why that improves the performance by so much.
Update 2 1st Jan
AAArrrggghhh. My machine rebooted overnight and now the queries are generally running much quicker. It's still 1 sec vs 3 secs, but the 20x slowdown does seem to have disappeared.
In your WITH construct, aren't you overthinking things with your select in (select in (select in ...))? That could be simplified to the with innings I have in my solution.
Also, you were joining to extraMatchDetails TWICE on the same match and team conditions, but never utilized either of those tables in the WITH CTE, rendering that component useless, doesn't it? However, the MATCHES table has homeTeamID and awayTeamID, which is what I THINK your actual intent was.
Also, your WITH CTE is pulling many columns that are not needed or used in the subsequent results, such as Captain and WicketKeeper.
So, I have restructured: pre-query the batting details once up front, summarized, and then you should be able to join off that.
Hopefully this MIGHT be a better fit in function and performance for your needs.
with innings as
(
select
bd.matchId,
bd.matchtype,
bd.playerid,
m.locationId,
count(case when bd.inningsnumber = 1 then 1 end) matches,
count(case when bd.dismissaltype not in ( 11, 14 ) then 1 end) innings,
SUM(bd.score) runs,
SUM(bd.notout) notouts,
SUM(bd.hundred) Hundreds,
SUM(bd.fifty) Fifties,
SUM(bd.duck) Ducks,
SUM(bd.fours) Fours,
SUM(bd.sixes) Sixes,
SUM(bd.balls) Balls
from
battingDetails bd
join matches m
on bd.matchid = m.id
where
matchtype = @match_type
group by
bd.matchId,
bd.matchType,
bd.playerid,
m.locationId
)
select
p.fullname playerFullName,
p.sortnamepart,
CONCAT(g.CountryName, ' - ', g.KnownAs) Ground,
t.team,
i.matches,
i.innings,
i.runs,
i.notouts,
i.hundreds,
i.fifties,
i.ducks,
i.fours,
i.sixes,
i.balls,
CAST( TRUNCATE( i.runs / (CAST((i.Innings - i.notOuts) AS DECIMAL)), 2) AS DECIMAL(7, 2)) 'Avg',
hs.maxScore,
hs.maxNotOut,
'' opponents,
'' Year,
'' CountryName
from
innings i
JOIN players p
ON i.playerid = p.id
join grounds g
on i.locationId = g.GroundId
and i.matchType = g.matchType
join
(select
pt.playerid,
t.matchtype,
GROUP_CONCAT(t.name SEPARATOR ', ') team
from
playersteams pt
join teams t
on pt.teamid = t.id
group by
pt.playerid,
t.matchtype) as t
on i.playerid = t.playerid
and i.MatchType = t.matchtype
join
( select
i2.playerid,
i2.locationid,
max( i2.score ) maxScore,
max( i2.notOut ) maxNotOut
from
innings i2
group by
i2.playerid,
i2.LocationId ) HS
on i.playerid = HS.playerid
AND i.locationid = HS.locationid
where
i.runs >= @runs_limit
order by
i.runs desc,
g.KnownAs,
p.SortNamePart
limit
0, 300;
Now, I know that you stated that performance is better after the server reboot, but really, what you DO have appears to be a really overbloated query.
I'm not sure this is the correct answer, but I thought I'd post it in case other people have the same issue.
The issue seems to be the use of CTEs in a stored procedure. I have a query that creates a CTE and then uses it 8 times. If I run this query using interpolated variables it takes about 0.8 secs; if I turn it into a stored procedure and use the stored procedure parameters, it takes about a minute (between 45 and 63 seconds) to run!
I've found a couple of ways of fixing this. One is to use multiple temporary tables (8 in this case), as MySQL cannot reuse a temp table within a query; this gets the query time right down, but it just doesn't feel like a maintainable or scalable solution. The other fix is to leave the variables in place and assign them from the stored procedure parameters; this also has no real performance issues. So my sproc looks like this:
create procedure bowling_individual_career_records_by_year_for_team_vs_opponent(IN team_id INT,
                                                                                IN opponents_id INT)
begin
    set @team_id = team_id;
    set @opponents_id = opponents_id;
    -- use these variables in the SQL below
    ...
end
I'm not sure this is the best solution, but it works for me and keeps the structure of the SQL the same as it was previously.
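Calling the procedure is then unchanged; a hypothetical invocation (the IDs are placeholders):
-- 12 and 34 stand in for real team/opponent IDs
CALL bowling_individual_career_records_by_year_for_team_vs_opponent(12, 34);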

Why is my SQL query slow?

I'm trying to create a view that joins 4 tables (tb_user has 200 rows, tb_transaction 250,000 rows, tb_transaction_detail 250,000 rows, tb_ms_location 50 rows).
When I render it with DataTables server-side, it takes 13 seconds, even when I filter it.
I don't know why it takes so long...
Here is my SQL query:
CREATE VIEW `vw_cashback` AS
SELECT
`tb_user`.`nik` AS `nik`,
`tb_user`.`full_name` AS `nama`,
`tb_ms_location`.`location_name` AS `lokasi`,
`tb_transaction`.`date_transaction` AS `tanggal_setor`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=1 THEN 1 ELSE 0 END) AS `mobil`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=2 THEN 1 ELSE 0 END) AS `motor`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=3 THEN 1 ELSE 0 END) AS `truck`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=4 THEN 1 ELSE 0 END) AS `speda`,
sum(`tb_transaction_detail`.`total`) AS `total_global`,
(sum(`tb_transaction_detail`.`total`) * 0.8) AS `total_user`,
(sum(`tb_transaction_detail`.`total`) * 0.2) AS `total_tgr`,
((sum(`tb_transaction_detail`.`total`) * 0.2) / 2) AS `total_cashback`,
(curdate() - cast(`tb_user`.`created_at` AS date)) AS `status`
FROM `tb_user`
JOIN `tb_transaction` ON `tb_user`.`id` = `tb_transaction`.`user_id`
JOIN `tb_transaction_detail` ON `tb_transaction`.`id` = `tb_transaction_detail`.`transaction_id`
JOIN `tb_ms_location` ON `tb_ms_location`.`id` = `tb_transaction`.`location_id`
GROUP BY
`tb_user`.`id`,
`tb_transaction`.`date_transaction`,
`tb_user`.`nik`,
`tb_user`.`full_name`,
`tb_user`.`created_at`,
`tb_ms_location`.`location_name`
thanks
The unfiltered query must be slow, because it takes all records from all tables, joins and aggregates them.
But you say the view is still slow when you filter. The question is: how do you filter? As you are aggregating by user, location and transaction date, it should be one of these. However, you don't have the user ID or the transaction ID in your result list. This doesn't feel natural, and I'd suggest you add them, so that a query like
select * from vw_cashback where user_id = 5
or
select * from vw_cashback where transaction_id = 12345
would be possible.
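For illustration, here is a sketch of the amended view (my reading of that suggestion, not the original poster's code). user_id is already in the GROUP BY, so exposing it does not change the aggregation; transaction_id would change the grain of the grouping, so it is left out here:
-- Sketch only: the original view plus a user_id column for filtering
CREATE OR REPLACE VIEW `vw_cashback` AS
SELECT
`tb_user`.`id` AS `user_id`,
`tb_user`.`nik` AS `nik`,
`tb_user`.`full_name` AS `nama`,
`tb_ms_location`.`location_name` AS `lokasi`,
`tb_transaction`.`date_transaction` AS `tanggal_setor`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=1 THEN 1 ELSE 0 END) AS `mobil`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=2 THEN 1 ELSE 0 END) AS `motor`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=3 THEN 1 ELSE 0 END) AS `truck`,
sum(CASE WHEN `tb_transaction_detail`.`vehicle_type`=4 THEN 1 ELSE 0 END) AS `speda`,
sum(`tb_transaction_detail`.`total`) AS `total_global`,
(sum(`tb_transaction_detail`.`total`) * 0.8) AS `total_user`,
(sum(`tb_transaction_detail`.`total`) * 0.2) AS `total_tgr`,
((sum(`tb_transaction_detail`.`total`) * 0.2) / 2) AS `total_cashback`,
(curdate() - cast(`tb_user`.`created_at` AS date)) AS `status`
FROM `tb_user`
JOIN `tb_transaction` ON `tb_user`.`id` = `tb_transaction`.`user_id`
JOIN `tb_transaction_detail` ON `tb_transaction`.`id` = `tb_transaction_detail`.`transaction_id`
JOIN `tb_ms_location` ON `tb_ms_location`.`id` = `tb_transaction`.`location_id`
GROUP BY
`tb_user`.`id`,
`tb_transaction`.`date_transaction`,
`tb_user`.`nik`,
`tb_user`.`full_name`,
`tb_user`.`created_at`,
`tb_ms_location`.`location_name`;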
As is, you'd have to filter by location name or by user nik / name. If you want it that way, then create indexes for the lookups:
CREATE INDEX idx_location_name ON tb_ms_location(location_name, id);
CREATE INDEX idx_user_name ON tb_user(full_name, id);
CREATE INDEX idx_user_nik ON tb_user(nik, id);
The latter two can even be turned into covering indexes (i.e. indexes containing all columns used in the query), which may speed up the process further:
CREATE INDEX idx_user_nik ON tb_user(nik, id, full_name, created_at);
CREATE INDEX idx_user_name ON tb_user(full_name, id, nik, created_at);
If you access the view via the IDs instead, you may also want covering indexes:
CREATE INDEX idx_location_id ON tb_ms_location(id, location_name);
CREATE INDEX idx_user_id ON tb_user(id, nik, full_name, created_at);

Optimize MySQL search query

I have the following SQL, but its execution is very slow, taking about 45 seconds. The table has 15 million records. How can I improve it?
SELECT A.*, B.ESPECIE
FROM
(
SELECT
A.CODIGO_DOCUMENTO,
A.DOC_SERIE,
A.DATA_EMISSAO,
A.DOC_NUMERO,
A.CF_NOME,
A.CF_SRF,
A.TOTAL_DOCUMENTO,
A.DOC_MODELO
FROM MOVIMENTO A
WHERE
A.CODIGO_EMPRESA = 1
AND A.CODIGO_FILIAL = 5
AND A.DOC_TIPO_MOVIMENTO = 1
AND A.DOC_MODELO IN ('65','55')
AND (A.CF_NOME LIKE '%TEXT_SEARCH%'
OR A.CF_CODIGO LIKE 'TEXT_SEARCH%'
OR A.CF_SRF LIKE 'TEXT_SEARCH%'
OR A.DOC_SERIE LIKE 'TEXT_SEARCH%'
OR A.DOC_NUMERO LIKE 'TEXT_SEARCH%')
ORDER BY A.DATA_EMISSAO DESC , A.CODIGO_DOCUMENTO DESC
LIMIT 0, 100
) A
LEFT JOIN MODELODOCUMENTOFISCAL B ON A.DOC_MODELO = B.CODMODELO
For this query, I would start with indexes on MOVIMENTO(CODIGO_EMPRESA, CODIGO_FILIAL, DOC_MODELO) and MODELODOCUMENTOFISCAL(CODMODELO).
That should speed up the query.
If it doesn't, you may need to consider full-text search to handle the LIKE clauses. I do note that you only have a wildcard at the beginning of one of the patterns. Is that intentional?
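Spelled out as DDL (the index names here are my own placeholders), plus a hedged sketch of the full-text option for CF_NOME, the only column searched with a leading wildcard:
-- Composite index suggested above
CREATE INDEX idx_movimento_filtro ON MOVIMENTO (CODIGO_EMPRESA, CODIGO_FILIAL, DOC_MODELO);
CREATE INDEX idx_modelo ON MODELODOCUMENTOFISCAL (CODMODELO);
-- Full-text alternative (MySQL 5.6+ for InnoDB tables)
ALTER TABLE MOVIMENTO ADD FULLTEXT INDEX ft_cf_nome (CF_NOME);
-- ... then use: MATCH (CF_NOME) AGAINST ('TEXT_SEARCH' IN BOOLEAN MODE)
-- instead of:   CF_NOME LIKE '%TEXT_SEARCH%'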

What is a better way to process data?

I have a table with around 10 million rows and 47 columns in Oracle. I do some processing on them before converting the data into JSON and transporting it to the view layer. The processing is mostly SELECTs grouping by various columns; this is done 5 times, each time with differently grouped columns. It is taking a lot of time. Is there any way to speed up the process?
I was thinking about pumping the data from the table into a CSV file, processing that, and then converting the data into JSON to send it. Am I thinking in the right direction? Please help.
The 5 queries I use are below for better understanding.
select sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end) / count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime');

select column2, sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end) / count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime') group by column2;

select column2, column3, sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end) / count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime') group by column2, column3;

select column4, column3, sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end) / count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime') group by column4, column3;

select column5, column4, column3, sum(case when LOWER(column1) LIKE 'succeeded' then 1 else 0 end) / count(*)
from tablename where (TIME_STAMP between 'startTime' and 'endTime') group by column5, column4, column3;
The result sets are combined with the help of JSON and sent to the view layer.
EDIT1: There are going to be multiple connections (5-20) to this database, each executing these same queries.
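As an aside (my sketch, not from the original thread): Oracle's GROUPING SETS can compute all five aggregation levels in a single scan of the table, which is usually much cheaper than running five separate queries:
-- Hedged sketch, assuming Oracle; column and table names are taken from the queries above
SELECT column2, column3, column4, column5,
       SUM(CASE WHEN LOWER(column1) LIKE 'succeeded' THEN 1 ELSE 0 END) / COUNT(*) AS success_ratio
FROM tablename
WHERE TIME_STAMP BETWEEN :startTime AND :endTime
GROUP BY GROUPING SETS (
    (),                        -- overall ratio
    (column2),
    (column2, column3),
    (column4, column3),
    (column5, column4, column3)
);
Rows belonging to the different levels can be told apart with the GROUPING() function.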

Query optimization for MySQL

I have the following query, which takes about 28 seconds on my machine. I would like to optimize it, and to know if there is any way to make it faster by creating some indexes.
select rr1.person_id as person_id, rr1.t1_value, rr2.t0_value
from (select r1.person_id, avg(r1.avg_normalized_value1) as t1_value
from (select ma1.person_id, mn1.store_name, avg(mn1.normalized_value) as avg_normalized_value1
from matrix_report1 ma1, matrix_normalized_notes mn1
where ma1.final_value = 1
and (mn1.normalized_value != 0.2
and mn1.normalized_value != 0.0 )
and ma1.user_id = mn1.user_id
and ma1.request_id = mn1.request_id
and ma1.request_id = 4 group by ma1.person_id, mn1.store_name) r1
group by r1.person_id) rr1
,(select r2.person_id, avg(r2.avg_normalized_value) as t0_value
from (select ma.person_id, mn.store_name, avg(mn.normalized_value) as avg_normalized_value
from matrix_report1 ma, matrix_normalized_notes mn
where ma.final_value = 0 and (mn.normalized_value != 0.2 and mn.normalized_value != 0.0 )
and ma.user_id = mn.user_id
and ma.request_id = mn.request_id
and ma.request_id = 4
group by ma.person_id, mn.store_name) r2
group by r2.person_id) rr2
where rr1.person_id = rr2.person_id
Basically, it aggregates data depending on request_id and final_value (0 or 1). Is there a way to simplify it for optimization? And it would be nice to know which columns should be indexed. I created an index on user_id and request_id, but it doesn't help much.
There are about 4,907,424 rows in matrix_report1 and 335,740 rows in the matrix_normalized_notes table. These tables will grow as we get more requests.
First, the others are right about learning to format your samples better. Trying to explain in plain language what you are trying to do is also a benefit, and sample data with sample result expectations is even better.
That said, I think the query can be significantly simplified. Your two subqueries are almost completely identical, with the exception of the one field final_value = 1 or 0 respectively. Since each query results in 1 record per person_id, you can just do the average based on a CASE/WHEN and remove the rest.
To help optimize the query, your matrix_report1 table should have an index on (request_id, final_value, user_id), and your matrix_normalized_notes table an index on (request_id, user_id, store_name, normalized_value); spelled out below.
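As DDL (the index names are my own placeholders):
-- Suggested composite indexes for the two tables
CREATE INDEX idx_report_req_final_user
    ON matrix_report1 (request_id, final_value, user_id);
CREATE INDEX idx_notes_req_user_store_value
    ON matrix_normalized_notes (request_id, user_id, store_name, normalized_value);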
Since your outer query averages the per-store averages, you do need to keep it nested. The following should help.
SELECT
r1.person_id,
avg(r1.ANV1) as t1_value,
avg(r1.ANV0) as t0_value
from
( select
ma1.person_id,
mn1.store_name,
avg( case when ma1.final_value = 1
then mn1.normalized_value end ) as ANV1,
avg( case when ma1.final_value = 0
then mn1.normalized_value end ) as ANV0
from
matrix_report1 ma1
JOIN matrix_normalized_notes mn1
ON ma1.request_id = mn1.request_id
AND ma1.user_id = mn1.user_id
AND NOT mn1.normalized_value in ( 0.0, 0.2 )
where
ma1.request_id = 4
AND ma1.final_Value in ( 0, 1 )
group by
ma1.person_id,
mn1.store_name) r1
group by
r1.person_id
Notice that the inner query pulls all transactions where the final value is either a zero OR a one, but the AVG is based on a CASE/WHEN of the respective value for the normalized value. When the condition is NOT the 1 or 0 respectively, the result is NULL and is thus not considered when the average is computed.
So at this point, it is grouped on a per-person basis already with each store and Avg1 and Avg0 already set. Now, roll these values up directly per person regardless of the store. Again, NULL values should not be considered as part of the average computation. So, if Store "A" doesn't have a value in the Avg1, it should not skew the results. Similarly if Store "B" doesnt have a value in Avg0 result.