How can I revise the following query with subqueries composed of functions and join queries in functions. I want to append extra values to my main query that relies on main tables and two primary joins (risks, users) on several occasions.
Creating a MCVRE (Minimal Complete Verifiable Reproduceable Example) proved to be somewhat challenging because of request sent to SQL Fiddle has too many rows (too many text characters) After removing nearly all rows on main two tables ( users, risks ) I ended up with a running query.
The Fiddle (http://www.sqlfiddle.com/#!9/1d52a0/17) create functions and insertion of data commands have reduced rows from actual example on my local pc due to character count of 8000 being exceeded for request payload for SQLFiddle to understand.
Actual table has about 100 rows for risks, and 20 or so rows for users and takes about 3 seconds to run
What can I do to speed up query, via staving desired function results in table, or by revision, index insertion, movement of joins to outer main query, or even using stored procedure, or rewriting query structure, to reduce execution time to possibly half the time or less optimistically. SQL fiddle does not take all rows needed so I pasted a very limited subset, Even SQLFiddle query (see total select query below) does not run, due to Stack Overflow (pun partially intended).
http://www.sqlfiddle.com/#!9/1d52a0/17
Base Queries that do run on the fiddle (see fiddle)
select * from users;
select * from risks;
select * from riskevents;
select * from riskmatrixthresholds;
select * from risklevels;
#significantly minimized result set but still query does not run due to stack overflow issue on sql fiddle - see fiddle result (on bottom most portion of fiddle query output)
SELECT r.RiskID,
r.CreatorID,
r.OwnerID,
r.ApproverID,
r.RiskTitle,
r.RiskStatement,
r.ClosureCriteria,
r.RiskState,
r.Context AS 'Context',
GetRiskUserLastOrFirstName(GetRiskUserID('Creator', r.RiskID,0),r.RiskID, 'Last','') AS 'creator.lastname',
GetRiskUserLastOrFirstName(GetRiskUserID('Creator', r.RiskID,0),r.RiskID, 'First','') AS 'creator.firstname',
GetRiskUserLastOrFirstName(GetRiskUserID('Owner', r.RiskID,0),r.RiskID, 'Last','') AS 'owner.lastname',
GetRiskUserLastOrFirstName(GetRiskUserID('Owner', r.RiskID,0),r.RiskID, 'First','') AS 'owner.firstname',
GetRiskUserLastOrFirstName(GetRiskUserID('Approver',r.RiskID,0),r.RiskID, 'Last','') AS 'approver.lastname',
GetRiskUserLastOrFirstName(GetRiskUserID('Approver',r.RiskID,0),r.RiskID, 'First','') AS 'approver.firstname',
r.Likelihood AS 'OriginalLikelihood',
r.Technical AS 'OriginalTechnical',
r.Schedule AS 'OriginalSchedule',
r.Cost AS 'OriginalCost',
GREATEST(r.Technical, r.Schedule, r.Cost) AS 'OriginalConsequence',
RiskValue(r.Likelihood, GREATEST(r.Technical, r.Schedule, r.Cost),0) AS 'OriginalValue',
RiskLevel(RiskValue(r.Likelihood, GREATEST(r.Technical, r.Schedule, r.Cost),0),'') AS 'OriginalLevel',
LatestEventDate(r.RiskID, r.AssessmentDate,'') AS 'LatestEventDate',
r.AssessmentDate AS 'AssessmentDate',
(SELECT CurrentLikelihood(r.RiskID,0)) AS 'CurrentLikelihood',
(SELECT CurrentConsequence(r.RiskID,0)) AS 'CurrentConsequence',
(SELECT CurrentRiskValue(r.RiskID,0)) AS 'CurrentValue',
(SELECT RiskLevel(CurrentRiskValue(r.RiskID,0),'')) AS 'CurrentLevel'
FROM risks r;
Create Function Script
CREATE TABLE `riskevents` (
`ID` int NOT NULL AUTO_INCREMENT,
`EventID` int ,
`RiskID` int ,
`EventTitle` text,
`EventStatus` varchar(10) ,
`EventOwnerID` int ,
`ActualDate` date ,
`ScheduleDate` date ,
`BaselineDate` date ,
`ActualLikelihood` int ,
`ActualTechnical` int ,
`ActualSchedule` int ,
`ActualCost` int ,
`ScheduledLikelihood` int ,
`ScheduledTechnical` int ,
`ScheduledSchedule` int ,
`ScheduledCost` int ,
`BaselineLikelihood` int ,
`BaselineTechnical` int ,
`BaselineSchedule` int ,
`BaselineCost` int ,
PRIMARY KEY (`ID`)
)
CREATE TABLE `risklevels` (
`ID` int NOT NULL AUTO_INCREMENT,
`RiskLevelID` int ,
`RiskMaximum` float ,
`RiskHigh` float ,
`RiskMedium` float ,
`RiskMinimum` float ,
PRIMARY KEY (`ID`)
)
CREATE TABLE `riskmatrixthresholds` (
`ID` int NOT NULL AUTO_INCREMENT,
`CellID` int ,
`Likelihood` int ,
`Consequence` int ,
`Level` decimal(2,2) ,
PRIMARY KEY (`ID`)
)
CREATE TABLE `risks` (
`ID` int NOT NULL AUTO_INCREMENT,
`RiskState` varchar(10) ,
`RiskID` int ,
`RiskTitle` text CHARACTER SET latin1,
`RiskStatement` text CHARACTER SET latin1,
`ApproverID` int ,
`OwnerID` int ,
`CreatorID` int ,
`Likelihood` int ,
`Technical` int ,
`Schedule` int ,
`Cost` int ,
`ClosureCriteria` text CHARACTER SET latin1,
`CategoryID` int ,
`AssessmentDate` date ,
`CompletionDate` date ,
`ClosureDate` date ,
`Context` text,
PRIMARY KEY (`ID`),
UNIQUE KEY `risk_index` (`RiskID`)
)
CREATE TABLE `users` (
`ID` int NOT NULL AUTO_INCREMENT,
`UserID` int NOT NULL,
`LastName` char(25) ,
`FirstName` char(15) ,
`Title` char(20) ,
`Email` varchar(30) ,
`Phone` char(12) ,
`Extension` char(4) ,
`Department` char(25) ,
PRIMARY KEY (`ID`),
UNIQUE KEY `user_index` (`UserID`),
KEY `SURROGATE` (`UserID`)
)
insert into `riskevents`(`ID`,`EventID`,`RiskID`,`EventTitle`,`EventStatus`,`EventOwnerID`,`ActualDate`,`ScheduleDate`,`BaselineDate`,`ActualLikelihood`,`ActualTechnical`,`ActualSchedule`,`ActualCost`,`ScheduledLikelihood`,`ScheduledTechnical`,`ScheduledSchedule`,`ScheduledCost`,`BaselineLikelihood`,`BaselineTechnical`,`BaselineSchedule`,`BaselineCost`) values
(171,0,1,'Risk','Complete',5,'2019-06-14',NULL,'2019-06-14',5,2,2,5,NULL,NULL,NULL,NULL,5,2,2,5),
(184,0,10,'Risk','Complete',21,'2019-10-07',NULL,'2019-10-07',5,4,5,4,NULL,NULL,NULL,NULL,5,4,5,4));
insert into `risklevels`(`ID`,`RiskLevelID`,`RiskMaximum`,`RiskHigh`,`RiskMedium`,`RiskMinimum`) values
(1,1,1,0.55,0.3,0);
insert into `riskmatrixthresholds`(`ID`,`CellID`,`Likelihood`,`Consequence`,`Level`) values
(1,1,1,1,0.09),
(2,2,1,2,0.12),
(3,3,1,3,0.16),
(4,4,1,4,0.19),
(5,5,1,5,0.23),
(6,6,2,1,0.12),
(7,7,2,2,0.19),
(8,8,2,3,0.27),
(9,9,2,4,0.34),
(10,10,2,5,0.41),
(11,11,3,1,0.16),
(12,12,3,2,0.27),
(13,13,3,3,0.37),
(14,14,3,4,0.48),
(15,15,3,5,0.59),
(16,16,4,1,0.19),
(17,17,4,2,0.34),
(18,18,4,3,0.48),
(19,19,4,4,0.63),
(20,20,4,5,0.77),
(21,21,5,1,0.23),
(22,22,5,2,0.41),
(23,23,5,3,0.59),
(24,24,5,4,0.77),
(25,25,5,5,0.95);
insert into `risks`(`ID`,`RiskState`,`RiskID`,`RiskTitle`,`RiskStatement`,`ApproverID`,`OwnerID`,`CreatorID`,`Likelihood`,`Technical`,`Schedule`,`Cost`,`ClosureCriteria`,`CategoryID`,`AssessmentDate`,`CompletionDate`,`ClosureDate`,`Context`) values
(1,'Completed',1,'t','t',1,5,1,5,2,2,5,'t',NULL,'2019-06-14','2020-09-26',NULL,'t'),
(2,'Completed',2,'t','t',2,1,1,5,3,4,2,'test',NULL,'2019-05-14',NULL,NULL,'t'),
insert into `users`(`ID`,`UserID`,`LastName`,`FirstName`,`Title`,`Email`,`Phone`,`Extension`,`Department`) values
(1,1,'Admin','','Admin','a#yz.com','17890','1234',''),
(2,2,'Last','First','Engineer','a#yz.com','123890','1234','Supplier');
CREATE FUNCTION Consequence(technical int, sched int, cost int, consequence int) RETURNS int
BEGIN
select GREATEST(technical, sched, cost) into consequence;
return consequence;
END;
CREATE FUNCTION CurrentRiskEventID(riskidentifier int, eid int) RETURNS int
BEGIN
select MAX(e.EventID) into eid
FROM riskevents e
WHERE e.eventstatus not in('Open')
AND e.riskid = riskidentifier;
return riskeventid;
END;
CREATE FUNCTION CurrentConsequence(riskidentifier int, currentconsequence int) RETURNS int
BEGIN
SELECT coalesce(
(SELECT GREATEST(actualtechnical, actualschedule, actualcost)
FROM riskevents
WHERE id = CurrentRiskEventID(riskidentifier, 0)
and actualtechnical is not null
ANDactualschedule is not null
andactualschedule is not null),
(SELECT greatest(technical, schedule, cost)
from risks
Where riskid = riskidentifier)
) into currentconsequence;
return currentconsequence;
END;
CREATE FUNCTION CurrentLikelihood(riskidentifier int, currentlikelihood int) RETURNS int
BEGIN
SELECT coalesce(
(SELECT actuallikelihood
FROM riskevents
WHERE id = CurrentRiskEventID(riskidentifier, 0)),
(SELECT r.likelihood
FROM risks r
WHERE r.riskid = riskidentifier)) into currentlikelihood;
return currentlikelihood;
END;
CREATE FUNCTION CurrentRiskLevel(riskidentifier int, currentrisklevel int) RETURNS int
BEGIN
select RiskLevel(CurrentRiskValue(riskidentifier, 0), '') into currentrisklevel;
return currentrisklevel;
END;
CREATE FUNCTION CurrentRiskValue(riskidentifier int, currentriskvalue int) RETURNS int
BEGIN
SELECT RiskValue(CurrentLikelihood(riskidentifier, 0), CurrentConsequence(riskidentifier, 0), 0) into currentriskvalue;
return currentriskvalue;
END;
CREATE FUNCTION GetRiskUserID(riskusertype VARCHAR(25), riskidentifier int, riskuserid int) RETURNS int
BEGIN
SELECT COALESCE(userres.userid, 0) into riskuserid FROM
(
SELECT r.creatorid, r.ownerid, r.approverid, u.userid
FROM risks r, users u
WHERE r.riskid = (select riskidentifier) and
(
((select riskusertype) = 'Creator' AND u.userid = r.creatorid) OR
((select riskusertype) = 'Approver' AND u.userid = r.approverid) OR
((select riskusertype) = 'Owner' AND u.userid = r.ownerid)
)
) userres;
RETURN riskuserid;
END;
CREATE FUNCTION GetRiskUserLastOrFirstName(riskuserid int, riskid int, whichname char(25), firstorlastname char(25)) RETURNS char(25) CHARSET utf8 COLLATE utf8_unicode_ci
BEGIN
SELECT (case
when whichname = 'Last' then u.LastName
WHEN whichname = 'First' THEN u.FirstName
end)
into firstorlastname
FROM users u,risks r
WHERE u.UserID = riskuserid
AND r.RiskID = riskid;
return firstorlastname;
END;
CREATE FUNCTION LatestEventDate(riskidentifier int, riskassessmentdate date, latestdate date) RETURNS date
BEGIN
SELECT COALESCE(
(SELECT ActualDate FROM riskevents evt WHERE evt.eventid = CurrentRiskEventID(riskidentifier, 0) and evt.riskid = riskidentifier),
(SELECT riskassessmentdate)) into latestdate;
return latestdate;
END;
CREATE FUNCTION RiskLevel(riskvalue int, risklevel varchar(4)) RETURNS varchar(4)
begin
SELECT
CASE
WHEN riskvalue >= levels.riskhigh*100 THEN 'High'
WHEN riskvalue >= levels.riskmedium*100 THEN 'Med'
ELSE 'Low'
ENd as cat into risklevel
FROM risklevels levels;
return risklevel;
END;
CREATE FUNCTION RiskValue(likelihood int, consequence int, riskvalue int) RETURNS int
BEGIN
SELECT m.level*100 INTO riskvalue FROM riskmatrixthresholds m WHERE m.likelihood = likelihood AND m.consequence = consequence;
RETURN riskvalue;
END;
http://www.sqlfiddle.com/#!9/1d52a0/17
Note: SQL is a declarative language, not a procedural language. You tell it what you want, not how to get it. Your use of functions and so forth is procedural.
How can you make this application faster?
First, use the latest version of MySQL (8+, or MariaDB 10.4+). Later versions get faster.
Second, you have stated a requirement to use "subqueries composed of functions". That means you probably can't do much about the performance.
Why not? The subqueries buried in functions are so-called dependent subqueries. Those don't perform well. And because they're buried MySQL's query planner can't do anything useful to optimize them.
Refactoring your query to avoid using functions with SELECT operations will give the query planner visibility into your overall query. That will give it a chance to optimize things. You might replace them with views.
And don't use SELECT tablea, tableb syntax. That's been obsolete since 1992. Use
SELECT tablea JOIN tableb ON tablea.joincolumn = tableb.joincolumn.
I'd offer you more advice but I can't figure out your intent.
Application of the following changes in CurrentLikelihood() and CurrentConsequence() reduced total query execution time to exec 0.070 sec, total 0.082 sec.
Old Current Likelihood Query (producing slow and incorrect output)
SELECT coalesce(
(SELECT actuallikelihood
FROM riskevents
WHERE id = CurrentRiskEventID(riskidentifier, 0)),
(SELECT r.likelihood
FROM risks r
WHERE r.riskid = riskidentifier)) into currentlikelihood;
return currentlikelihood;
Working CurrentLikelihood Query
SELECT actuallikelihood INTO currentlikelihood
FROM riskevents
WHERE eventid = CurrentRiskEventID(riskidentifier)
AND riskid = riskidentifier;
Old CurrentConsequence Query (producing slow and incorrect output)
SELECT coalesce(
(SELECT GREATEST(actualtechnical, actualschedule, actualcost)
FROM riskevents
WHERE id = CurrentRiskEventID(riskidentifier, 0)
and actualtechnical is not null
and actualschedule is not null),
(SELECT greatest(technical, schedule, cost)
from risks
Where riskid = riskidentifier)
) into currentconsequence;
Working CurrentConsequence Query
SELECT GREATEST(actualtechnical, actualschedule, actualcost) INTO currentconsequence
FROM riskevents
WHERE eventid = CurrentRiskEventID(riskidentifier)
AND riskid = riskidentifier;
Old CurrentRiskEventID() Query
select MAX(e.EventID) into currentriskeventid
FROM riskevents e
WHERE e.eventstatus not in('Open')
AND e.riskid = riskidentifier;
Modified GetRiskEventID() function
SELECT MAX(e.EventID) INTO currentriskeventid
FROM riskevents e
WHERE e.riskid = riskidentifier AND
(e.eventstatus != 'Open'
OR
(e.EventID = 0 AND e.eventstatus = 'Open'));
I have the table below. I'm looking for the both the max value in each column AND it's matching username (all values of NULL to be ignored).
A bunch of mad googling has lead me to believe I need to find the max values and then use a second query to find the matching username?
But is there a query that can return this in one go?
ID username Vale Jorge Andrea
-------------------------------------------
01 John 2 6 NULL
02 Ted NULL 0 0
03 Marcy NULL 2 1
Output would be...
John Jorge 6
John Vale 2
Marcy Andrea 1
There's different ways of looking at it, here's a table that gives a row for each username that has a matching max value:
SELECT
username
, IF (max_vale = t.vale, max_vale, NULL) AS for_vale
, IF (max_jorge = t.jorge, max_jorge, NULL) AS for_jorge
, IF (max_andrea = t.andrea, max_andrea, NULL) AS for_andrea
FROM (
SELECT
MAX(vale) AS max_vale
, MAX(jorge) AS max_jorge
, MAX(andrea) AS max_andrea
FROM t
) y
JOIN t ON (
t.vale = max_vale
OR t.jorge = max_jorge
OR t.andrea = max_andrea
)
http://sqlfiddle.com/#!9/58e37d/5
This gives:
username for_vale for_jorge for_andrea
----------------------------------------------
John 2 6 (null)
Marty (null) (null) 1
Basically, all I'm doing is selecting the specific column max values, then using that query as the source for another query that just looks at the MAX generated columns, and filters (IF()) based on the matches found.
This is one reasonably simple way of doing it... make a table of all the maximum values and join it to the usernames on matching the value with the max. Uses the fact that NULL isn't equal to anything. I haven't attempted to order the results but that is easy enough with adding an ORDER BY clause.
select username, name, COALESCE(mv.vale, mv.jorge, mv.Andrea) as value
from table1
join
(select 'Vale' as name, max(vale) as vale, NULL as jorge, NULL as andrea from table1
union ALL
select 'Jorge', NULL, max(jorge), NULL from table1
union all
select 'Andrea', NULL, NULL, max(andrea) from table1) mv
on table1.vale = mv.vale or table1.jorge = mv.jorge or table1.andrea = mv.andrea
Output
username name value
John Vale 2
John Jorge 6
Marcy Andrea 1
To extrapolate this to more columns is reasonably straightforward (if somewhat painful) e.g. to add a column called fred you would use (changes inside **):
select username, name, COALESCE(mv.vale, mv.jorge, mv.Andrea**, mf.fred**) as value
from table1
join
(select 'Vale' as name, max(vale) as vale, NULL as jorge, NULL as andrea**, NULL as fred** from table1
union ALL
select 'Jorge', NULL, max(jorge), NULL**, NULL** from table1
union all
select 'Andrea', NULL, NULL, max(andrea)**, NULL** from table1) mv
**union all
select 'Fred', NULL, NULL, max(fred), NULL from table1) mf**
on table1.vale = mv.vale or table1.jorge = mv.jorge or table1.andrea = mv.andrea **or table1.fred = mf.fred**
If you have access to stored procedures, you can also do it like this (in a much more flexible way in terms of columns)
DROP PROCEDURE IF EXISTS list_maxes;
DELIMITER //
CREATE PROCEDURE list_maxes(tname VARCHAR(20), column_list VARCHAR(1000))
BEGIN
DECLARE maxv INT DEFAULT 0;
DECLARE cpos INT;
DECLARE colname VARCHAR(20);
-- loop through the column names
WHILE (LENGTH(column_list) > 0)
DO
SET cpos = LOCATE(',', column_list);
IF (cpos > 0) THEN
SET colname = LEFT(column_list, cpos - 1);
SET column_list = SUBSTRING(column_list, cpos + 1);
ELSE
SET colname = column_list;
SET column_list = '';
END IF;
-- find the maximum value of this column
SET #getmax = CONCAT('SELECT MAX(', colname, ') INTO #maxv FROM Table1');
PREPARE s1 FROM #getmax;
EXECUTE s1;
DEALLOCATE PREPARE s1;
-- now find the user with the maximum value
SET #finduser = CONCAT("SELECT username, '", colname, "' AS name, ", colname, ' AS value FROM ', tname,' WHERE ', colname, ' = ', #maxv);
PREPARE s2 FROM #finduser;
EXECUTE s2;
DEALLOCATE PREPARE s2;
END WHILE;
END//
DELIMITER ;
CALL list_maxes('table1', 'Vale,Jorge,Andrea')
Output
John Vale 2
John Jorge 6
Marcy Andrea 1
This is quite long but working. I combine all columns into one table using UNION ALL. Then get the max value per surname. Join this to the original table using the surname and value. Order by value in descending order.
select tv.*
from( select surname, max(val) as maxval
from (
select username,'vale' as surname,vale as val
from tbl
union all
select username,'jorge' as surname,jorge
from tbl
union all
select username,'andrea' as surname,andrea
from tbl) tab
group by surname) tt
join (
select username,'vale' as surname,vale as val
from tbl
union all
select username,'jorge' as surname,jorge
from tbl
union all
select username,'andrea' as surname,andrea
from tbl) tv
on tt.surname=tv.surname and tt.maxval=tv.val
order by tv.val desc;
As a whole other way to resolve the request, we may look at the model, e.g., the fact that what appears to be domain content is represented as columns instead of rows. This is typical of a source aggregate query (pivot or rollup to get aggregated totals or groupings, for instance), but if the underlying data spreads out, should perhaps be based on the transactional integrity of that data source (the "spread out" underlying source).
Basically, I wonder why there are Vale, Jorge and Andrea columns at all in the database. This implies it's already been summarized.
So we may look at an alternate model that is notably easier to navigate for these purposes:
CREATE TABLE IF NOT EXISTS `user` (
`id` MEDIUMINT NOT NULL AUTO_INCREMENT,
`username` varchar(20) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `prospect` (
`id` MEDIUMINT NOT NULL AUTO_INCREMENT,
`name` varchar(20) NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `aggregate` (
`id` MEDIUMINT NOT NULL AUTO_INCREMENT,
`user_id` int(6) unsigned NOT NULL,
`prospect_id` int(6) unsigned NOT NULL,
`total` int(6) unsigned NOT NULL,
PRIMARY KEY (`id`)
) DEFAULT CHARSET=utf8;
SELECT
user.username
, prospect.name
, MAX(aggregate.total) AS max_aggregate
FROM aggregate
JOIN user ON user_id = user.id
JOIN prospect ON prospect_id = prospect.id
GROUP BY username
This produces:
John Andrea 6
Marty Jorge 2
Ted Jorge 5
http://sqlfiddle.com/#!9/07ba0c/1
This may not be useful to you now, but as your experience grows and your experience with advanced querying evolves, this will make more sense. The main difficulty may be that the core data is already turned, making the querying more difficult because what you want is a different dimension than what you may have already derived.