How to optimize a query using sub-queries in a LEFT JOIN - MySQL

Tables:
Please take a look at my earlier question to see the tables: How to query counting specific wins of team and find the winner of the series
Questions:
How can I optimize this query?
How can I reduce redundancy in the query?
How can I make this query faster?
Summary
As you can see in the example query, this part is used many times:
WHERE leagueid = 2096
AND start_time >= 1415938900
AND ((matches.radiant_team_id= 1848158 AND matches.dire_team_id= 15)
OR (matches.radiant_team_id= 15 AND matches.dire_team_id= 1848158))
SELECT matches.radiant_team_id,
matches.dire_team_id,
matches.radiant_name,
matches.dire_name,
TA.Count AS teamA,
TB.Count AS teamB,
TA.Count + TB.Count AS total_matches,
SUM(TA.wins),
SUM(TB.wins),
(CASE
WHEN series_type = 0 THEN 1
WHEN series_type = 1 THEN 2
WHEN series_type = 2 THEN 3
END) AS wins_goal
FROM matches
LEFT JOIN
(SELECT radiant_team_id,
COUNT(id) AS COUNT,
CASE
WHEN matches.radiant_team_id = radiant_team_id && radiant_win = 1 THEN 1
END AS wins
FROM matches
WHERE leagueid = 2096
AND start_time >= 1415938900
AND ((matches.radiant_team_id= 1848158
AND matches.dire_team_id= 15)
OR (matches.radiant_team_id= 15
AND matches.dire_team_id= 1848158))
GROUP BY radiant_team_id) AS TA ON TA.radiant_team_id = matches.radiant_team_id
LEFT JOIN
(SELECT dire_team_id,
COUNT(id) AS COUNT,
CASE
WHEN matches.dire_team_id = dire_team_id && radiant_win = 0 THEN 1
END AS wins
FROM matches
WHERE leagueid = 2096
AND start_time >= 1415938900
AND ((matches.radiant_team_id= 1848158
AND matches.dire_team_id= 15)
OR (matches.radiant_team_id= 15
AND matches.dire_team_id= 1848158))
GROUP BY dire_team_id) AS TB ON TB.dire_team_id = matches.dire_team_id
WHERE leagueid = 2096
AND start_time >= 1415938900
AND ((matches.radiant_team_id= 1848158
AND matches.dire_team_id= 15)
OR (matches.radiant_team_id= 15
AND matches.dire_team_id= 1848158))
GROUP BY series_id
Scheduled Matches
ID | leagueid | team_a_id | team_b_id | starttime
1  | 2096     | 1848158   | 15        | 1415938900

I believe it can be done without subqueries.
I made the following match table,
and used the following query to group the results, one line per series:
SELECT
matches.leagueid,
matches.series_id,
matches.series_type,
COUNT(id) as matches,
IF(radiant_team_id=1848158,radiant_name, dire_name) AS teamA,
IF(radiant_team_id=1848158,dire_name, radiant_name) AS teamB,
SUM(CASE
WHEN radiant_team_id=1848158 AND radiant_win=1 THEN 1
WHEN dire_team_id=1848158 AND radiant_win=0 THEN 1
ELSE 0 END) AS teamAwin,
SUM(CASE
WHEN radiant_team_id=15 AND radiant_win=1 THEN 1
WHEN dire_team_id=15 AND radiant_win=0 THEN 1
ELSE 0 END) AS teamBwin
FROM `matches`
WHERE leagueid = 2096
AND start_time >= 1415938900
AND dire_team_id IN (15, 1848158)
AND radiant_team_id IN (15, 1848158)
group by leagueid,series_id,series_type,teamA,teamB
which yields the following result
Please note that, when grouping the results of one series, there isn't such thing as radiant team or dire team. The radiant and dire roles might be switched several times during the same series, so I only addressed the teams as teamA and teamB.
Now, looking at your prior question, I see that you need to determine the series winner based on the series type and each team's victories. You would need to wrap the former query and use it as a subquery, such as:
SELECT matchresults.*,
CASE series_type
WHEN 0 then IF(teamAwin>=1, teamA,teamB)
WHEN 1 then IF(teamAwin>=2, teamA,teamB)
ELSE IF(teamAwin>=3, teamA,teamB)
END as winner
from ( THE_MAIN_QUERY) as matchresults
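To sanity-check the approach, here is a minimal runnable sketch using SQLite, with CASE in place of MySQL's IF() and invented sample rows; the conditional-aggregation part is the same idea:

```python
import sqlite3

# Hypothetical minimal schema; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (
    id INTEGER PRIMARY KEY,
    leagueid INTEGER, series_id INTEGER, series_type INTEGER,
    start_time INTEGER,
    radiant_team_id INTEGER, dire_team_id INTEGER,
    radiant_name TEXT, dire_name TEXT,
    radiant_win INTEGER
);
INSERT INTO matches VALUES
  (1, 2096, 7, 1, 1415938901, 1848158, 15, 'Alpha', 'Beta', 1),
  (2, 2096, 7, 1, 1415938902, 15, 1848158, 'Beta', 'Alpha', 0),
  (3, 2096, 7, 1, 1415938903, 1848158, 15, 'Alpha', 'Beta', 0);
""")

# One pass over matches: conditional aggregation per series, no subqueries.
row = conn.execute("""
SELECT series_id,
       series_type,
       COUNT(id) AS matches,
       SUM(CASE WHEN radiant_team_id = 1848158 AND radiant_win = 1 THEN 1
                WHEN dire_team_id    = 1848158 AND radiant_win = 0 THEN 1
                ELSE 0 END) AS teamAwin,
       SUM(CASE WHEN radiant_team_id = 15 AND radiant_win = 1 THEN 1
                WHEN dire_team_id    = 15 AND radiant_win = 0 THEN 1
                ELSE 0 END) AS teamBwin
FROM matches
WHERE leagueid = 2096
  AND start_time >= 1415938900
  AND dire_team_id    IN (15, 1848158)
  AND radiant_team_id IN (15, 1848158)
GROUP BY series_id, series_type
""").fetchone()

print(row)  # (7, 1, 3, 2, 1): 3 matches, team A won 2, team B won 1
```

Note that team A's wins are counted whether it played radiant or dire, which is exactly why the single-pass form works where the radiant/dire subqueries struggled.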

There may be more efficient ways to get the results you want. But, to make this query more efficient, you can add indexes. This is the repeated where clause:
WHERE leagueid = 2096 AND
start_time >= 1415938900 AND
((matches.radiant_team_id= 1848158 AND matches.dire_team_id= 15) OR
(matches.radiant_team_id= 15 AND matches.dire_team_id= 1848158))
Conditions with OR are hard for the optimizer. The following index will be helpful: matches(leagueid, start_time). A covering index (for the WHERE conditions, at least) is matches(leagueid, start_time, radiant_team_id, dire_team_id). I would start with this latter index and see if it improves performance sufficiently for your purposes.
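As a rough illustration of the covering-index idea (sketched in SQLite rather than MySQL, since its planner reports the chosen index via EXPLAIN QUERY PLAN; the index name is arbitrary):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE matches (
    id INTEGER PRIMARY KEY, leagueid INTEGER, start_time INTEGER,
    radiant_team_id INTEGER, dire_team_id INTEGER)""")

# The covering index suggested above: leading columns serve the
# equality/range predicates, trailing columns cover the SELECT list.
conn.execute("""CREATE INDEX idx_league_time_teams
    ON matches (leagueid, start_time, radiant_team_id, dire_team_id)""")

plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT radiant_team_id, dire_team_id
FROM matches
WHERE leagueid = 2096 AND start_time >= 1415938900
""").fetchall()

# SQLite reports the index it chose, e.g.
# 'SEARCH matches USING COVERING INDEX idx_league_time_teams ...'
print(plan[0][-1])
```

MySQL's EXPLAIN gives the equivalent information (look for "Using index" in the Extra column).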

Related

How to Group Counted Data

I have 3 categories (Below SLA, Near SLA, Over SLA) with different conditions. I tried to count the data, but the result is not summarized by category.
This is my query:
SELECT
B.province AS 'PROVINCE',
CASE
WHEN TIMEDIFF(A.deli_time, A.create_time) < '20:00:00' THEN COUNT(TIMEDIFF(A.deli_time, A.create_time))
END AS 'Below SLA',
CASE
WHEN (TIMEDIFF(A.deli_time, A.create_time) > '20:00:00') AND (TIMEDIFF(A.deli_time, A.create_time) < '24:00:00') THEN COUNT(TIMEDIFF(A.deli_time, A.create_time))
END AS 'NEAR SLA',
CASE
WHEN TIMEDIFF(A.deli_time, A.create_time) > '24:00:00' THEN COUNT(TIMEDIFF(A.deli_time, A.create_time))
END AS 'OVER SLA'
FROM
deli_order A
INNER JOIN
deli_order_delivery B on A.id = B.order_id
WHERE
(DATE(A.plat_create_time) BETWEEN '2019-03-30' AND'2019-04-07') AND (TIMEDIFF(A.deli_time, A.create_time) IS NOT NULL)
GROUP BY B.province;
and this is the result that I got:
Province | Below SLA | Near SLA | Over SLA
------------------------------------------
Bali     | 30        | Null     | Null
30 is the total of all the records for 'Bali', but it's actually divided into 19 Below SLA, 5 Near SLA, and 6 Over SLA.
What should I change in my query?
SELECT
B.province AS 'PROVINCE',
SUM(CASE
WHEN TIMEDIFF(A.deli_time, A.create_time) < '20:00:00' THEN 1
END) AS 'Below SLA',
Put an aggregate function OUTSIDE each CASE expression. I did it for just one case; the others are all the same.
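A minimal runnable sketch of the SUM(CASE ...) pattern, using SQLite with the TIMEDIFF() comparison replaced by an assumed precomputed elapsed_hours column and made-up sample rows (boundaries adjusted with >= so the buckets are exhaustive):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (province TEXT, elapsed_hours REAL);
INSERT INTO orders VALUES
  ('Bali', 12.0), ('Bali', 19.5),   -- below SLA (< 20h)
  ('Bali', 21.0),                   -- near SLA  (20h-24h)
  ('Bali', 30.0), ('Bali', 48.0);   -- over SLA  (>= 24h)
""")

# The CASE decides the bucket per row; SUM (an aggregate, outside the
# CASE) then counts rows per bucket within each province group.
row = conn.execute("""
SELECT province,
       SUM(CASE WHEN elapsed_hours < 20 THEN 1 ELSE 0 END) AS below_sla,
       SUM(CASE WHEN elapsed_hours >= 20 AND elapsed_hours < 24
                THEN 1 ELSE 0 END) AS near_sla,
       SUM(CASE WHEN elapsed_hours >= 24 THEN 1 ELSE 0 END) AS over_sla
FROM orders
GROUP BY province
""").fetchone()

print(row)  # ('Bali', 2, 1, 2)
```

The original query had it the other way around (CASE outside COUNT), so MySQL evaluated the CASE once per group against an arbitrary row, which is why one bucket got the whole count.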

How to get only latest record from different ranges?

I am looking at a case in which we have a number of tanks filled with liquid. The amount of liquid is measured and information is stored in a database. This update is done every 5 minutes. Here the following information is stored:
tankId
FillLevel
TimeStamp
Each tank is categorized in one of the following 'fill-level' ranges:
Range A: 0 - 40%
Range B: 40 - 75%
Range C: 75 - 100%
Per range I count the number of events per tankId.
SELECT sum(
CASE
WHEN filllevel>=0 and filllevel<40
THEN 1
ELSE 0
END) AS 'Range A',
sum(
CASE
WHEN filllevel>=40 and filllevel<=79
THEN 1
ELSE 0
END) AS 'Range B',
sum(
CASE
WHEN filllevel>79 and filllevel<=100
THEN 1
ELSE 0
END) AS 'Range C'
FROM TEST ;
The challenge is to ONLY count the latest record for each tank. So for each tankId there is only one count (and that must be the record with the latest time stamp).
For the following data:
insert into tank_db1.`TEST` (ts, tankId, fill_level) values
('2017-08-11 03:31:18', 'tank1', 10),
('2017-08-11 03:41:18', 'tank1', 45),
('2017-08-11 03:51:18', 'tank1', 95),
('2017-08-11 03:31:18', 'tank2', 20),
('2017-08-11 03:41:18', 'tank2', 30),
('2017-08-11 03:51:18', 'tank2', 80),
('2017-08-11 03:31:18', 'tank3', 30),
('2017-08-11 03:41:18', 'tank3', 45),
('2017-08-11 03:51:18', 'tank4', 55);
I would expect the outcome to be (only the records with the latest timestamp per tankId are counted):
- RANGE A: 0
- RANGE B: 1 (tankId 3)
- RANGE C: 2 (tankId 1 and tankId 2)
Probably easy if you are an expert, but for me it is really hard to see what the options are.
Thanks
You can use the following query to get the latest per group timestamp value:
select tankId, max(ts) as max_ts
from test
group by tankId;
Output:
tankId | max_ts
-------|--------------------
tank1  | 11.08.2017 03:51:18
tank2  | 11.08.2017 03:51:18
tank3  | 11.08.2017 03:41:18
tank4  | 11.08.2017 03:51:18
Using the above query as a derived table you can extract the latest per group fill_level value. This way you can apply the logic that computes each range level:
select sum(
CASE
WHEN t1.fill_level>=0 and t1.fill_level<40
THEN 1
ELSE 0
END) AS 'Range A',
sum(
CASE
WHEN t1.fill_level>=40 and t1.fill_level<=79
THEN 1
ELSE 0
END) AS 'Range B',
sum(
CASE
WHEN t1.fill_level>79 and t1.fill_level<=100
THEN 1
ELSE 0
END) AS 'Range C'
from test as t1
join (
select tankId, max(ts) as max_ts
from test
group by tankId
) as t2 on t1.tankId = t2.tankId and t1.ts = t2.max_ts
Output:
Range A | Range B | Range C
--------|---------|--------
0       | 2       | 2
Demo here
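The latest-row-per-group join can be verified end to end against the sample data from the question; a SQLite sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (ts TEXT, tankId TEXT, fill_level INTEGER);
INSERT INTO test VALUES
  ('2017-08-11 03:31:18', 'tank1', 10),
  ('2017-08-11 03:41:18', 'tank1', 45),
  ('2017-08-11 03:51:18', 'tank1', 95),
  ('2017-08-11 03:31:18', 'tank2', 20),
  ('2017-08-11 03:41:18', 'tank2', 30),
  ('2017-08-11 03:51:18', 'tank2', 80),
  ('2017-08-11 03:31:18', 'tank3', 30),
  ('2017-08-11 03:41:18', 'tank3', 45),
  ('2017-08-11 03:51:18', 'tank4', 55);
""")

# Join each row to its tank's max timestamp, so only the latest
# reading per tank survives before the range buckets are summed.
row = conn.execute("""
SELECT SUM(CASE WHEN t1.fill_level >= 0  AND t1.fill_level < 40
               THEN 1 ELSE 0 END) AS range_a,
       SUM(CASE WHEN t1.fill_level >= 40 AND t1.fill_level <= 79
               THEN 1 ELSE 0 END) AS range_b,
       SUM(CASE WHEN t1.fill_level > 79  AND t1.fill_level <= 100
               THEN 1 ELSE 0 END) AS range_c
FROM test AS t1
JOIN (SELECT tankId, MAX(ts) AS max_ts
      FROM test
      GROUP BY tankId) AS t2
  ON t1.tankId = t2.tankId AND t1.ts = t2.max_ts
""").fetchone()

print(row)  # (0, 2, 2): tank3 and tank4 in B, tank1 and tank2 in C
```

This confirms the answers' count of two tanks in Range B (the question's expectation overlooked tank4 at fill level 55).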
I get a different result (oh, well, same result as GB):
SELECT GROUP_CONCAT(CASE WHEN fill_level < 40 THEN x.tankid END) range_a
, GROUP_CONCAT(CASE WHEN fill_level BETWEEN 40 AND 75 THEN x.tankid END) range_b
, GROUP_CONCAT(CASE WHEN fill_level > 75 THEN x.tankid END) range_c
FROM test x
JOIN (SELECT tankid,MAX(ts) ts FROM test GROUP BY tankid) y
ON y.tankid = x.tankid AND y.ts = x.ts;
+---------+-------------+-------------+
| range_a | range_b | range_c |
+---------+-------------+-------------+
| NULL | tank3,tank4 | tank1,tank2 |
+---------+-------------+-------------+
EDIT:
If I were solving this problem and wanted to include the tank names in the result, then I'd probably execute the following...
SELECT x.*
FROM test x
JOIN
( SELECT tankid,MAX(ts) ts FROM test GROUP BY tankid) y
ON y.tankid = x.tankid
AND y.ts = x.ts
...and handle all the other problems, concerning counts, ranges, and missing/'0' values in application code.

MYSQL - SUM values of last 10 days

I have this table in MySQL:
| player1 | player2 | date       | fs_1 | fs_2 |
| Jack    | Tom     | 2015-03-02 | 10   | 2    |
| Mark    | Riddley | 2015-05-02 | 3    | 1    |
...
I need to know how many aces (fs_1) player 1 has hit BEFORE the match reported in date_g (in the 10 days before, for example).
This is what I tried, without success:
OPTION 1
SELECT
players_atp.name_p AS 'PLAYER 1',
P.name_p AS 'PLAYER 2',
DATE(date_g) AS 'DATE',
result_g AS 'RESULT',
FS_1,
FS_2,
SUM(IF(date_sub(date_g, interval 10 day)< date_g, FS_1, 0)) AS 'last 10 days'
FROM
stat_atp stat_atp
JOIN
backup3.players_atp ON ID1 = id_P
JOIN
backup3.players_atp P ON P.id_p = id2
JOIN
backup3.games_atp ON id1_g = id1 AND id2_g = id2
AND id_t_g = id_t
AND id_r_g = id_r
WHERE
date_g > '2015-01-01'
GROUP BY ID1;
OPTION 2
SELECT
players_atp.name_p AS 'PLAYER 1',
P.name_p AS 'PLAYER 2',
DATE(date_g) AS 'DATE',
result_g AS 'RESULT',
FS_1,
FS_2,
SUM(CASE WHEN date_g between date_g and date_sub(date_g, interval 10 day) then fs_1 else 0 end) AS 'last 10 days'
FROM
stat_atp stat_atp
JOIN
backup3.players_atp ON ID1 = id_P
JOIN
backup3.players_atp P ON P.id_p = id2
JOIN
backup3.games_atp ON id1_g = id1 AND id2_g = id2
AND id_t_g = id_t
AND id_r_g = id_r
WHERE
date_g > '2015-01-01'
GROUP BY ID1;
I have edited the code; now it is easier to read and understand.
SELECT
id1 AS 'PLAYER 1',
id2 AS 'PLAYER 2',
DATE(date_g) AS 'DATE',
result_g AS 'RESULT',
FS_1,
FS_2,
SUM(CASE
WHEN date_g BETWEEN date_g AND DATE_SUB(date_g, INTERVAL 10 DAY) THEN fs_1
END) AS 'last 20 days' FROM
stat_atp stat_atp
JOIN
backup3.games_atp ON id1_g = id1 AND id2_g = id2
AND id_t_g = id_t
AND id_r_g = id_r GROUP BY ID1;
Thanks in advance.
Maybe this could help you:
SELECT
id1,
SUM(fs_1)
FROM
stat_atp
WHERE
date_g <= DATE_SUB('2015-03-02', INTERVAL 1 DAY) AND date_g >= DATE_SUB('2015-03-02', INTERVAL 10 DAY)
AND
id1='Jack'
GROUP BY id1;
Remember that an RDBMS is used to build rigorous data sets that are linked to each other by clear IDs (keys, in SQL terms). It is easier if you respect the first three normal forms. That's why you should use keys to identify the match itself; that way you could use subqueries (subsets) to achieve your goal.
Then, keep in mind that SQL is STRUCTURED. That is both its strength and its weakness: you cannot use it as a Turing-complete programming language with loops and conditions, but in any situation you will be able to find the same structure for a query. However, you can process a SQL result set in another language and use loops and conditions on it. That's up to you.
Anyway, you may want to read about the MySQL GROUP BY clause, which differs from the ISO SQL form: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
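One way to apply this advice, sketched in SQLite with an invented minimal table: a correlated subquery sums each player's fs_1 over the 10 days before each match. In MySQL you would write DATE_SUB(s.date_g, INTERVAL 10 DAY) where SQLite uses date(s.date_g, '-10 day'):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stat (player1 TEXT, date_g TEXT, fs_1 INTEGER);
INSERT INTO stat VALUES
  ('Jack', '2015-02-25', 4),
  ('Jack', '2015-02-27', 6),
  ('Jack', '2015-03-02', 10),   -- the match we look back from
  ('Mark', '2015-02-28', 3);
""")

# For each match row s, sum the same player's aces in the 10 days
# strictly before that match's own date.
rows = conn.execute("""
SELECT s.player1, s.date_g, s.fs_1,
       (SELECT COALESCE(SUM(p.fs_1), 0)
        FROM stat AS p
        WHERE p.player1 = s.player1
          AND p.date_g < s.date_g
          AND p.date_g >= date(s.date_g, '-10 day')) AS aces_prior_10d
FROM stat AS s
ORDER BY s.player1, s.date_g
""").fetchall()

for r in rows:
    print(r)
# Jack's 2015-03-02 match looks back over 02-25 (4) and 02-27 (6) -> 10
```

The key difference from the failing attempts above: the subquery compares a *different* row's date to the current row's date, whereas `date_g BETWEEN date_g AND ...` compares each row's date to itself and can never define a look-back window.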

Count with aggregate function in SQL

Hi, my actual code is below:
Select M.TicketID,M.CreatedMoment, Max(L.StatusChangeMoment)AS StatusChangeTime,
Elapsed_time_in_Hours_Minutes = CONVERT(NUMERIC(18,2),DATEDIFF(minute, M.createdmoment, MAX(L.statuschangemoment))/60+(DATEDIFF(minute, M.createdmoment, MAX(L.statuschangemoment)) % 60/100.0))
From XX_MASTER_TICKETS AS M Left Join XX_DETAIL_TICKET_STATUS_LOG AS L
On M.RowID = L.TicketRowID
Where M.CreatedMoment between '08-23-2014' And '08-26-2014'
Group by M.TicketID,M.CreatedMoment
Order by M.TicketID asc
And the Partial Results is this:
TicketID CreatedMoment StatusChangeTime Elapsed_time_in_Hours_Minutes
201408231 8/23/14 8:05 AM 8/25/14 11:47 AM 51.42
2014082310 8/23/14 8:19 AM 8/23/14 12:43 PM 4.24
20140823100 8/23/14 8:38 AM 8/24/14 11:15 AM 26.37
20140823101 8/23/14 8:38 AM 8/23/14 11:58 AM 3.2
20140823102 8/23/14 8:38 AM 8/24/14 10:33 AM 25.55
Basically, statuschangetime comes from an aggregate function, and the last column is the difference between the 2nd and 3rd columns.
I want to modify the query so the results will look like this:
Date below24Hrs above24hours
2014-8-23 2 3
My problem is that I'm getting an error when running this code:
Select
[below24hrs] = COUNT (Case WHEN (CONVERT(NUMERIC(18,2),DATEDIFF(minute, TM.createdmoment, MAX(LG.statuschangemoment))/60+(DATEDIFF(minute, TM.createdmoment, MAX(LG.statuschangemoment)) % 60/100.0))) < 24 THEN 1 END)
From XX_MASTER_TICKETS AS M Left Join XX_DETAIL_TICKET_STATUS_LOG AS L
On M.RowID = L.TicketRowID
Where M.CreatedMoment between '08-23-2014' And '08-26-2014'
Group by M.TicketID,M.CreatedMoment
Order by M.TicketID asc
It says it cannot COUNT over the MAX aggregate function inside the query.
You need to use a subquery and sum over its results.
select cast(sqry.CreatedMoment as date) as CreatedMoment
, sum(case when Elapsed_time_in_Hours_Minutes < 24 then 1 else 0 end) as below24Hrs
, sum(case when Elapsed_time_in_Hours_Minutes > 24 then 1 else 0 end) as above24Hrs
, sum(case when Elapsed_time_in_Hours_Minutes = 24 then 1 else 0 end) as At24Hrs
from
(
Select M.TicketID,M.CreatedMoment, Max(L.StatusChangeMoment)AS StatusChangeTime,
Elapsed_time_in_Hours_Minutes = CONVERT(NUMERIC(18,2),DATEDIFF(minute, M.createdmoment, MAX(L.statuschangemoment))/60+(DATEDIFF(minute, M.createdmoment, MAX(L.statuschangemoment)) % 60/100.0))
From XX_MASTER_TICKETS AS M Left Join XX_DETAIL_TICKET_STATUS_LOG AS L
On M.RowID = L.TicketRowID
Where M.CreatedMoment between '08-23-2014' And '08-26-2014'
Group by M.TicketID,M.CreatedMoment
) sqry
group by cast(CreatedMoment as date)
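The same shape can be sketched end to end in SQLite (the T-SQL CONVERT/DATEDIFF elapsed-time expression is approximated with julianday(); the table and values are made up): the inner query computes elapsed hours per ticket, and only the outer query aggregates over that result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets (TicketID TEXT, CreatedMoment TEXT,
                      StatusChangeMoment TEXT);
INSERT INTO tickets VALUES
  ('T1', '2014-08-23 08:05', '2014-08-25 11:47'),  -- ~51.7h
  ('T2', '2014-08-23 08:19', '2014-08-23 12:43'),  -- ~4.4h
  ('T3', '2014-08-23 08:38', '2014-08-24 11:15');  -- ~26.6h
""")

row = conn.execute("""
SELECT date(CreatedMoment) AS day,
       SUM(CASE WHEN elapsed_hours < 24 THEN 1 ELSE 0 END) AS below24,
       SUM(CASE WHEN elapsed_hours >= 24 THEN 1 ELSE 0 END) AS above24
FROM (SELECT TicketID, CreatedMoment,
             -- elapsed time in hours between creation and last status change
             (julianday(MAX(StatusChangeMoment))
              - julianday(CreatedMoment)) * 24 AS elapsed_hours
      FROM tickets
      GROUP BY TicketID, CreatedMoment) AS sub
GROUP BY date(CreatedMoment)
""").fetchone()

print(row)  # ('2014-08-23', 1, 2)
```

Nesting the aggregates this way is what the error message was complaining about: MAX runs in the inner level, the bucket counting in the outer, never both in one expression.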

SQL - Using sum but optionally using default value for row

Given tables
asset
col - id
date_sequence
col - date
daily_history
col - date
col - num_error_seconds
col - asset_id
historical_event
col - start_date
col - end_date
col - asset_id
I'm trying to count up all the daily num_error_seconds for all assets in a given time range, in order to display a "percentage NOT in error" by day. The catch: if there is a historical_event involving an asset whose end_date is beyond the SQL query range, then daily_history should be ignored and a default value of 86400 seconds (one day of error seconds) should be used for that asset.
The query I have that does not use the historical_event is:
select ds.date,
IF(count(dh.time) = 0,
100,
100 - (100*sum(dh.num_error_seconds) / (86400 * count(*)))
) percent
from date_sequence ds
join asset a
left join daily_history dh on dh.date = ds.date and dh.asset_id=a.asset_id
where ds.date >= in_start_time and ds.date <= in_end_time
group by ds.date;
To build on this is beyond my SQL knowledge. Because of the aggregate function, I cannot simply inject 86400 seconds for each asset that is associated with an event that has an end_date beyond the in_end_time.
Sample Data
Asset
1
2
Date Sequence
2013-09-01
2013-09-02
2013-09-03
2013-09-04
Daily History
2013-09-01, 1400, 1
2013-09-02, 1501, 1
2013-09-03, 1420, 1
2013-09-04, 0, 1
2013-09-01, 10000, 2
2013-09-02, 20000, 2
2013-09-03, 30000, 2
2013-09-04, 40000, 2
Historical Event
start_date, end_date, asset_id
2013-09-03 12:01:03, 2014-01-01 00:00:00, 1
What I would expect to see with this sample data is a % of time these assets are in error
2013-09-01 => 100 - (100*(1400 + 10000))/(86400*2)
2013-09-02 => 100 - (100*(1501 + 20000))/(86400*2)
2013-09-03 => 100 - (100*(1420 + 30000))/(86400*2)
2013-09-04 => 100 - (100*(0 + 40000))/(86400*2)
Except: there was a historical event, which should take precedence. It happened on 9/3 and is open-ended (it has an end date in the future), so the calculations change to:
2013-09-01 => 100 - (100*(1400 + 10000))/(86400*2)
2013-09-02 => 100 - (100*(1501 + 20000))/(86400*2)
2013-09-03 => 100 - (100*(86400 + 30000))/(86400*2)
2013-09-04 => 100 - (100*(86400 + 40000))/(86400*2)
Asset 1's num_error_seconds gets overwritten with a full day of error seconds because there is a historical event with a start_date before 'in_end_time' and an end_date after 'in_end_time'.
Can this be accomplished in one query? Or do I need to stage data with an initial query?
I think you're after something like this:
Select
ds.date,
100 - 100 * Sum(
case
when he.asset_id is not null then 86400 -- have a historical_event
when dh.num_error_seconds is null then 0 -- no daily_history record
else dh.num_error_seconds
end
) / 86400 / count(a.id) as percent -- need to divide by number of assets
From
date_sequence ds
cross join
asset a
left outer join
daily_history dh
on a.id = dh.asset_id and
ds.date = dh.date
left outer join (
select distinct -- avoid counting multiple he records
asset_id
from
historical_event he
Where
he.end_date > in_end_time
) he
on a.id = he.asset_id
Where
ds.date >= in_start_time and
ds.date <= in_end_time -- I'd prefer < here
Group By
ds.date
Example Fiddle
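Here is a runnable SQLite sketch of this query against the sample data, with one labeled adjustment: a check against the event's start_date, so the 86400-second override only applies from the day the event begins, which reproduces the expected figures above (the answer as written would override every day in range for the affected asset):

```python
import sqlite3

IN_START, IN_END = '2013-09-01', '2013-09-04'

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset (id INTEGER);
INSERT INTO asset VALUES (1), (2);
CREATE TABLE date_sequence (date TEXT);
INSERT INTO date_sequence VALUES
  ('2013-09-01'), ('2013-09-02'), ('2013-09-03'), ('2013-09-04');
CREATE TABLE daily_history (date TEXT, num_error_seconds INTEGER,
                            asset_id INTEGER);
INSERT INTO daily_history VALUES
  ('2013-09-01', 1400, 1),  ('2013-09-02', 1501, 1),
  ('2013-09-03', 1420, 1),  ('2013-09-04', 0, 1),
  ('2013-09-01', 10000, 2), ('2013-09-02', 20000, 2),
  ('2013-09-03', 30000, 2), ('2013-09-04', 40000, 2);
CREATE TABLE historical_event (start_date TEXT, end_date TEXT,
                               asset_id INTEGER);
INSERT INTO historical_event VALUES
  ('2013-09-03 12:01:03', '2014-01-01 00:00:00', 1);
""")

rows = conn.execute("""
SELECT ds.date,
       100 - 100.0 * SUM(
           CASE
               WHEN he.asset_id IS NOT NULL
                    AND ds.date >= date(he.start_date)
                   THEN 86400                  -- open event: full day in error
               WHEN dh.num_error_seconds IS NULL THEN 0
               ELSE dh.num_error_seconds
           END
       ) / 86400.0 / COUNT(a.id) AS pct_ok
FROM date_sequence ds
CROSS JOIN asset a
LEFT JOIN daily_history dh ON a.id = dh.asset_id AND ds.date = dh.date
LEFT JOIN (SELECT asset_id, MIN(start_date) AS start_date
           FROM historical_event
           WHERE end_date > ?         -- events still open past the range
           GROUP BY asset_id) he ON a.id = he.asset_id
WHERE ds.date BETWEEN ? AND ?
GROUP BY ds.date
ORDER BY ds.date
""", (IN_END, IN_START, IN_END)).fetchall()

for r in rows:
    print(r)
```

On 2013-09-03 and 09-04 asset 1 contributes 86400 seconds instead of its daily_history values, matching the 100 - (100*(86400 + 30000))/(86400*2) line above.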