SQL Server (T-SQL): Avoid calling a scalar function multiple times - sql-server-2008

I have a stored procedure that runs a SELECT to return some rows. Within the SELECT, I need to check a condition to return the correct value for some columns. The condition calls a scalar function, and every call passes the same parameter for the row being processed, as shown below:
SELECT
Id,
Name,
Surname,
CASE WHEN (dbo.GetNumTravels(Id) >= 50)
THEN 10
ELSE 99
END as Factor1,
CASE WHEN (dbo.GetNumTravels(Id) >= 50)
THEN 15
ELSE -1
END as Factor2,
CASE WHEN (dbo.GetNumTravels(Id) >= 50)
THEN 30
ELSE 70
END as Factor3
FROM
Employees
WHERE
DepartmentId = 100
I am worried about performance: I do not want to call the scalar function dbo.GetNumTravels multiple times per row. How can I call it only once and reuse the result wherever I need it?

Scalar user defined functions are infamous for poor performance. If you can convert it to an inline table-valued function you can expect to see performance gains.
If you convert your scalar function to an inline table-valued function you can call it once for each row using cross apply() like so:
select
Id,
Name,
Surname,
case when x.NumTravels >= 50
then 10
else 99
end as Factor1,
case when x.NumTravels >= 50
then 15
else -1
end as Factor2,
case when x.NumTravels >= 50
then 30
else 70
end as Factor3
from Employees e
cross apply dbo.GetNumTravels_itvf(e.Id) x
where DepartmentId = 100
Reference:
When is a sql function not a function? "If it’s not inline, it’s rubbish." - Rob Farley
Inline Scalar Functions - Itzik Ben-Gan
Scalar functions, inlining, and performance: An entertaining title for a boring post - Adam Machanic
tsql User-Defined Functions: Ten Questions You Were Too Shy To Ask - Robert Sheldon

You can achieve this with a derived table: the inner query calls dbo.GetNumTravels(Id) once per row, and the outer query reuses its output. This may improve performance to some degree by avoiding repeated calls to the same function.
SELECT
Id,
Name,
Surname,
CASE WHEN (NumTravelsID >= 50) THEN 10 ELSE 99 END as Factor1,
CASE WHEN (NumTravelsID >= 50) THEN 15 ELSE -1 END as Factor2,
CASE WHEN (NumTravelsID >= 50) THEN 30 ELSE 70 END as Factor3
FROM (
SELECT
Id,
Name,
Surname,
dbo.GetNumTravels(Id) as NumTravelsID
FROM Employees
WHERE DepartmentId = 100
)M
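As a rough, self-contained illustration of the derived-table shape, the sketch below runs against SQLite through Python's sqlite3 module rather than SQL Server. The sample rows and the Python stand-in for dbo.GetNumTravels are made up for the demo. It only checks that the derived-table form returns the same rows as the repeated-call form; whether an engine really evaluates the function once per row depends on its optimizer, so measure on your own server.

```python
import sqlite3

# Hypothetical travel counts per employee Id, standing in for dbo.GetNumTravels.
TRAVELS = {1: 72, 2: 12, 3: 50}

conn = sqlite3.connect(":memory:")
conn.create_function("GetNumTravels", 1, lambda emp_id: TRAVELS.get(emp_id, 0))
conn.execute(
    "CREATE TABLE Employees (Id INTEGER, Name TEXT, Surname TEXT, DepartmentId INTEGER)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Ann", "Lee", 100), (2, "Bob", "Ray", 100),
     (3, "Cy", "Orr", 100), (4, "Di", "Fox", 200)],
)

# Original shape: the function appears in every CASE expression.
repeated = conn.execute("""
    SELECT Id,
           CASE WHEN GetNumTravels(Id) >= 50 THEN 10 ELSE 99 END AS Factor1,
           CASE WHEN GetNumTravels(Id) >= 50 THEN 15 ELSE -1 END AS Factor2,
           CASE WHEN GetNumTravels(Id) >= 50 THEN 30 ELSE 70 END AS Factor3
    FROM Employees
    WHERE DepartmentId = 100
    ORDER BY Id
""").fetchall()

# Derived-table shape: the function is written once and its value is reused.
derived = conn.execute("""
    SELECT Id,
           CASE WHEN NumTravels >= 50 THEN 10 ELSE 99 END AS Factor1,
           CASE WHEN NumTravels >= 50 THEN 15 ELSE -1 END AS Factor2,
           CASE WHEN NumTravels >= 50 THEN 30 ELSE 70 END AS Factor3
    FROM (SELECT Id, GetNumTravels(Id) AS NumTravels
          FROM Employees
          WHERE DepartmentId = 100) AS M
    ORDER BY Id
""").fetchall()

print(derived)
```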

I am not sure about performance (run your own tests and compare with the other answers), but I would like to suggest this variant, which reduces both the function calls and the CASE expressions. Please let me know how it performs.
SELECT A.*
, 10*F0+99*~F0 AS FACTOR1
, 15*F0-1*~F0 AS FACTOR2
, 30*F0+70*~F0 AS FACTOR3
FROM (
SELECT
Id,
Name,
Surname,
CAST(CASE WHEN (dbo.GetNumTravels(Id) >= 50) THEN 1 ELSE 0 END AS BIT) AS F0
FROM Employees
WHERE DepartmentId = 100
) A

Related

How to calculate AVG, MAX and MIN number of rows in a column

I try to collect general statistics on the depth of correspondence: average, maximum and minimum number of messages of each type per one request. Have 2 tables:
First:
ticketId,ticketQueueId,ticketCreatedDate
Second:
articleId,articleCreatedDt,articleType (can be IN or OUT - support responses), ticketId
I reasoned like this:
SELECT AVG(COUNT(articleType='IN')) AS AT_IN, AVG(COUNT(articleType='OUT')) AS AT_OUT
FROM tickets.tickets JOIN tickets.articles
ON tickets.ticketId=articles.ticketId;
GROUP BY tickets.ticketId
but it doesn't work.
Error Code: 1111. Invalid use of group function
You can't nest aggregate functions (AVG(COUNT(...))). Use a subquery instead and apply the aggregate functions step by step.
Your use of COUNT is also incorrect.
COUNT counts every row where its argument is not NULL, and the comparison articleType='IN' (or articleType='OUT') returns 0 or 1 rather than NULL, so every row is counted.
select AVG(T_IN), AVG(T_OUT)
from (
SELECT sum(case when articleType='IN' then 1 else 0 END) AS T_IN, sum(case when articleType='OUT' then 1 else 0 END) AS T_OUT
FROM tickets.tickets
JOIN tickets.articles ON tickets.ticketId=articles.ticketId
GROUP BY tickets.ticketId
) t
(You also have a misplaced semicolon after the ON clause.)
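The two-step aggregation can be tried on a toy data set. This is a minimal sketch using SQLite through Python's sqlite3 module instead of MySQL; the rows are invented for the demo. The inner query produces one row of IN/OUT counts per ticket, and the outer query takes AVG, MAX and MIN over those per-ticket counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tickets (ticketId INTEGER PRIMARY KEY);
CREATE TABLE articles (
    articleId INTEGER PRIMARY KEY,
    articleType TEXT,
    ticketId INTEGER
);
-- Invented sample data: ticket 1 has 2 IN / 1 OUT, ticket 2 has 1 IN / 3 OUT.
INSERT INTO tickets VALUES (1), (2);
INSERT INTO articles (articleType, ticketId) VALUES
    ('IN', 1), ('IN', 1), ('OUT', 1),
    ('IN', 2), ('OUT', 2), ('OUT', 2), ('OUT', 2);
""")

stats = conn.execute("""
    SELECT AVG(t_in), MAX(t_in), MIN(t_in),
           AVG(t_out), MAX(t_out), MIN(t_out)
    FROM (
        -- Step 1: one row per ticket with its IN and OUT counts.
        SELECT SUM(CASE WHEN a.articleType = 'IN'  THEN 1 ELSE 0 END) AS t_in,
               SUM(CASE WHEN a.articleType = 'OUT' THEN 1 ELSE 0 END) AS t_out
        FROM tickets t
        JOIN articles a ON t.ticketId = a.ticketId
        GROUP BY t.ticketId
    ) AS per_ticket
""").fetchone()

print(stats)
```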

use case when to show column conditional value in MySQL

I would like to add a column and show customer rewards status based on their points.
However, MySQL states there is an error in the syntax which I cannot figure out why.
My code is as below, I got the error msg that SELECT is not valid at this position.
SELECT customer_id, first_name,
CASE points
WHEN > 3000 THEN 'Gold'
WHEN BETWEEN 2000 to 3000 THEN 'Silver'
ELSE 'Bronze'
END AS rewards_status
FROM customers
The short syntax for case, which you are using here (like: case <expr> when <val> then ... end) only supports equality condition; this does not fit your use case, which requires inequality conditions. You need to use the long case syntax (like: case when <condition> then ... end).
Also, there is no need for BETWEEN in the second condition. Branches of a CASE expression are evaluated sequentially, and evaluation stops as soon as a condition is fulfilled, so you can do:
select
customer_id,
first_name,
case
when points > 3000 then 'Gold'
when points > 2000 then 'Silver'
else 'Bronze'
end as reward_status
from customers
You may use the alternative syntax for CASE expressions here:
SELECT
customer_id,
first_name,
CASE WHEN points > 3000 THEN 'Gold'
WHEN points BETWEEN 2000 AND 3000 THEN 'Silver'
ELSE 'Bronze' END AS rewards_status
FROM customers;
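For a quick check of both searched-CASE variants, here is a small sketch using SQLite through Python's sqlite3 module (not MySQL), with invented customers. One detail worth noticing: BETWEEN is inclusive, so a customer with exactly 2000 points is 'Silver' under the BETWEEN variant but 'Bronze' under the "points > 2000" variant. Pick whichever boundary you actually want.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, first_name TEXT, points INTEGER)"
)
# Invented sample rows, including the boundary value 2000.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ann", 3500), (2, "Bob", 2500), (3, "Cy", 2000), (4, "Di", 100)],
)

tiers = conn.execute("""
    SELECT first_name,
           -- Sequential branches: the first condition that matches wins.
           CASE WHEN points > 3000 THEN 'Gold'
                WHEN points > 2000 THEN 'Silver'
                ELSE 'Bronze' END AS sequential_status,
           -- Same idea with an explicit, inclusive range.
           CASE WHEN points > 3000 THEN 'Gold'
                WHEN points BETWEEN 2000 AND 3000 THEN 'Silver'
                ELSE 'Bronze' END AS between_status
    FROM customers
    ORDER BY customer_id
""").fetchall()

for row in tiers:
    print(row)
```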

Is there any way in SQL or function in MYSQL that sums up all the increments in a column?

I want to find a way to sum up all the increments in the value of a column.
We provide delivery services to our customers. A customer can pay as he goes, but if he pays an upfront fee, he gets a better deal. There is a table that holds the customer's balance over time. So I want to sum all the increments to the balance. I can't change the way the payment is recorded.
I have already coded a stored procedure that works, but it is rather slow, so I'm looking for alternatives. I think a single SQL statement might outperform my stored procedure, which uses loops.
My stored procedure selects the customer's rows in a given date range and inserts the result into a temp table X. It then pops rows from X one at a time, compares the balance in each row against the previous row, and checks for an increment. If there is no increment, it pops the next row and repeats; if there is one, it calculates the difference between that row and the previous one and inserts the result into another temp table Y.
When there are no rows left, the stored procedure performs a SUM in the temp table Y, and thus, you can know how much the customer has "refilled" its balance.
This is an example of the table X, and the expected result:
DATE BALANCE
---- -------
2019-02-01 200
2019-02-02 195 //from 200 to 195 there is a decrement, so it doesn't matter
2019-02-03 180
2019-02-04 150
2019-02-05 175 //there is an increment from 150 to 175, it's 25 that must be inserted in the temp table
2019-02-06 140
2019-02-07 180 //there is another increment, from 140 to 180, it's 40
So the resulting temp table Y must be something like this:
REFILL
------
25
40
The expected result is 65. My stored procedure returns this value but, as I said, it is rather slow (it takes about 22 seconds to process 3900 rows, equivalent to approximately 3 days), I think because of the loops. I would like to explore other alternatives. Because of some details that I don't mention here, a single customer can have 1300 rows per day (the example is given in days, but I have rows by the minute). My tables are indexed, properly I think. I can't post my stored procedure, but it works as described (I know that "the devil is in the details"). Any suggestion will be appreciated.
Use a user-defined variable to hold the balance from the previous row, and then subtract it from the current row's balance.
SELECT SUM(refill) AS total_refill
FROM (
SELECT GREATEST(0, balance - @prev_balance) AS refill, @prev_balance := balance
FROM (
SELECT balance
FROM tableX
ORDER BY date) AS t
CROSS JOIN (SELECT @prev_balance := NULL) AS ars
) AS t
There is a quite well-known mechanism to deal with these: Use a variable inside a field.
SELECT @result:=0;
SELECT @lastbalance:=9999999999; -- any value that is sure to be higher than any real balance
SELECT SUM(increments) AS total FROM (
SELECT
IF(balance>@lastbalance, balance-@lastbalance, 0) AS increments,
@lastbalance:=balance AS ignored
FROM X -- insert real table name here
WHERE
-- insert selector here
ORDER BY
-- insert real chronological sorter here
) AS baseview;
Use lag() in MySQL 8+:
select sum(balance - prev_balance) as refills
from (select t.*, lag(balance) over (order by date) prev_balance
from t
) t
where balance > prev_balance;
In older versions of MySQL this is tricky. If the values are continuous dates, then a simple JOIN works:
select sum(t.balance - tprev.balance) as refills
from t join
t tprev
on tprev.date = t.date - interval 1 day
where t.balance > tprev.balance;
This may not be the case. Then the next best method is variables. But you have to be very careful. MySQL does not declare the order of evaluation of expressions in a SELECT. As the documentation explains:
The order of evaluation for expressions involving user variables is undefined. For example, there is no guarantee that SELECT @a, @a:=@a+1 evaluates @a first and then performs the assignment.
The variables need to be assigned and used in the same expression:
select sum(balance - prev_balance) as refills
from (select t.*,
(case when (@temp_prevb := @prevb) = NULL -- intentionally false
then -1
when (@prevb := balance)
then @temp_prevb
end) as prev_balance
from (select t.* from t order by date) t cross join
(select @prevb := NULL) params
) t
where balance > prev_balance;
And the final method is a correlated subquery:
select sum(balance - prev_balance) as refills
from (select t.*,
(select t2.balance
from t t2
where t2.date < t.date
order by t2.date desc
limit 1
) as prev_balance
from t
) t
where balance > prev_balance;
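The correlated-subquery approach can be checked against the sample data from the question. This is a minimal sketch using SQLite through Python's sqlite3 module rather than MySQL; the table name balances is an assumption made for the demo. The subquery fetches the balance of the latest earlier row, and only rows whose balance exceeds that previous balance contribute to the sum.

```python
import sqlite3

# Sample rows from the question: (date, balance).
ROWS = [
    ("2019-02-01", 200), ("2019-02-02", 195), ("2019-02-03", 180),
    ("2019-02-04", 150), ("2019-02-05", 175), ("2019-02-06", 140),
    ("2019-02-07", 180),
]

conn = sqlite3.connect(":memory:")
# "balances" is a hypothetical table name used only for this demo.
conn.execute("CREATE TABLE balances (date TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO balances VALUES (?, ?)", ROWS)

total_refill = conn.execute("""
    SELECT SUM(balance - prev_balance)
    FROM (
        SELECT b.balance,
               -- Balance of the most recent earlier row (NULL for the first row).
               (SELECT p.balance
                FROM balances p
                WHERE p.date < b.date
                ORDER BY p.date DESC
                LIMIT 1) AS prev_balance
        FROM balances b
    ) AS with_prev
    WHERE balance > prev_balance
""").fetchone()[0]

# Cross-check in plain Python: sum of positive differences between neighbours.
values = [balance for _, balance in ROWS]
expected = sum(max(0, cur - prev) for prev, cur in zip(values, values[1:]))

print(total_refill, expected)
```

The first row has no predecessor, so its prev_balance is NULL and the WHERE clause drops it.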

Count consecutive row occurrences

I have a MySQL table with three columns: takenOn (datetime - primary key), sleepDay (date), and type (int). This table contains my sleep data from when I go to bed to when I get up (at a minute interval).
As an example, if I go to bed on Oct 29th at 11:00pm and get up on Oct 30th at 6:00am, I will have 420 records (7 hours * 60 minutes). takenOn will range from 2016-10-29 23:00:00 to 2016-10-30 06:00:00. sleepDay will be 2016-10-30 for all 420 records. type is the "quality" of my sleep (1=asleep, 2=restless, 3=awake). I'm trying to get how many times I was restless/awake, which can be calculated by counting how many times I see type=2 (or type=3) consecutively.
So far, I have the following query, which works for one day only. Is this the correct/"efficient" way of doing this (as this method requires that I have the data without any "gaps" in takenOn)? Also, how can I expand it to calculate for all possible sleepDays?
SELECT
sleepDay,
SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) AS TimesRestless,
SUM(CASE WHEN type = 3 THEN 1 ELSE 0 END) AS TimesAwake
FROM
(SELECT s1.sleepDay, s1.type
FROM sleep s1
LEFT JOIN sleep s2
ON s2.takenOn = ADDTIME(s1.takenOn, '00:01:00')
WHERE
(s2.type <> s1.type OR s2.takenOn IS NULL)
AND s1.sleepDay = '2016-10-30'
ORDER BY s1.takenOn) a
I have created an SQL Fiddle - http://sqlfiddle.com/#!9/b33b4/3
Thank you!
Your own solution is quite alright, given the assumptions you are aware of.
I present here an alternative solution, that will deal well with gaps in the series, and can be used for more than one day at a time.
The downside is that it relies more heavily on non-standard MySql features (inline use of variables):
select sleepDay,
sum(type = 2) TimesRestless,
sum(type = 3) TimesAwake
from (
select @lagDay as lagDay,
@lagType as lagType,
@lagDay := sleepDay as sleepDay,
@lagType := type as type
from (select * from sleep order by takenOn) s1,
(select @lagDay := '',
@lagType := '') init
) s2
where lagDay <> sleepDay
or lagType <> type
group by sleepDay
To see how it works it can help to select the second select statement on its own. The inner-most select must have the order by clause to make sure the middle query will process the records in that order, which is important for the variable assignments that happen there.
See your updated SQL fiddle.
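The underlying idea (count a row only when it starts a new run, that is, when its type differs from the previous row of the same sleepDay) can also be sketched without user variables. The example below is an illustration only: it uses SQLite through Python's sqlite3 module, invented sample data, and a correlated subquery in place of the variable technique shown above.

```python
import sqlite3
from datetime import datetime, timedelta


def minute_rows(start, sleep_day, types):
    """Build (takenOn, sleepDay, type) rows at one-minute intervals."""
    t0 = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    return [
        ((t0 + timedelta(minutes=i)).strftime("%Y-%m-%d %H:%M:%S"), sleep_day, t)
        for i, t in enumerate(types)
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sleep (takenOn TEXT PRIMARY KEY, sleepDay TEXT, type INTEGER)")
# Invented data: two nights, types 1=asleep, 2=restless, 3=awake.
rows = minute_rows("2016-10-29 23:58:00", "2016-10-30", [1, 1, 2, 2, 1, 3, 3, 3, 2, 1])
rows += minute_rows("2016-10-30 23:00:00", "2016-10-31", [2, 1, 2])
conn.executemany("INSERT INTO sleep VALUES (?, ?, ?)", rows)

runs = conn.execute("""
    SELECT sleepDay,
           SUM(CASE WHEN type = 2 THEN 1 ELSE 0 END) AS TimesRestless,
           SUM(CASE WHEN type = 3 THEN 1 ELSE 0 END) AS TimesAwake
    FROM (
        SELECT s.sleepDay, s.type,
               -- Type of the previous reading on the same sleepDay, if any.
               (SELECT p.type
                FROM sleep p
                WHERE p.sleepDay = s.sleepDay AND p.takenOn < s.takenOn
                ORDER BY p.takenOn DESC
                LIMIT 1) AS prevType
        FROM sleep s
    ) AS with_prev
    -- Keep only rows that start a new run.
    WHERE prevType IS NULL OR prevType <> type
    GROUP BY sleepDay
    ORDER BY sleepDay
""").fetchall()

print(runs)
```

Because the previous row is found by ordering rather than by adding one minute, gaps in takenOn do not break the count.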

AVG or SUM in SQL where the values are being calculated on the fly

I have an existing SQL query that gets call stats from a Zultys MX250 phone system: -
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS '#Calls'
FROM
session s
JOIN mxuser u ON
s.ExtensionID1 = u.ExtensionId
OR s.ExtensionID2 = u.ExtensionId
WHERE
s.ServiceExtension1 IS NULL
AND s.connecttimestamp >= CURRENT_DATE
AND BINARY u.userprofilename = BINARY 'DBAM'
GROUP BY
u.firstname,
u.lastname
ORDER BY
'#Calls' DESC,
Duration DESC;
Output is as follows: -
Name Duration #Calls
TH 01:19:10 30
AS 00:44:59 28
EW 00:51:13 22
SH 00:21:20 13
MG 00:12:04 8
TS 00:42:02 5
DS 00:00:12 1
I am trying to generate a 4th column that shows the average call time for each user, but am struggling to figure out how.
Mathematically it's just "'Duration' / '#Calls'" but after looking at some similar questions on StackOverflow, the example queries are too simple to help me relate to my one above.
Right now, I'm not even sure that it's going to be possible to divide the time column by the number of calls.
UPDATE: I was so close in my testing but got all confused & overcomplicated things. Here's the latest SQL (thanks to @McAdam331 & my buddy Jim from work): -
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS '#Calls',
sec_to_time(SUM(time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)) / COUNT(*)) AS Average
FROM
session s
JOIN mxuser u ON
s.ExtensionID1 = u.ExtensionId
OR s.ExtensionID2 = u.ExtensionId
WHERE
s.ServiceExtension1 IS NULL
AND s.connecttimestamp >= CURRENT_DATE
AND BINARY u.userprofilename = BINARY 'DBAM'
GROUP BY
u.firstname,
u.lastname
ORDER BY
Average DESC;
Output is as follows: -
Name Duration #Calls Average
DS 00:14:25 4 00:03:36
MG 00:17:23 11 00:01:34
TS 00:33:38 22 00:01:31
EW 01:04:31 43 00:01:30
AS 00:49:23 33 00:01:29
TH 00:43:57 35 00:01:15
SH 00:13:51 12 00:01:09
Well, you are able to get the number of total seconds, as you do before converting it to time. Why not take the number of total seconds, divide that by the number of calls, and then convert that back to time?
SELECT sec_to_time(
SUM(time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)) / COUNT(*))
AS averageDuration
If I understand correctly, you can just replace sum() with avg():
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS `#Calls`,
sec_to_time(AVG(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS AvgDuration
Seems like all you need is another expression in the SELECT list. The SUM() aggregate (from the second expression) divided by COUNT aggregate (the third expr). Then wrap that in a sec_to_time function. (Unless I'm totally missing the question.)
Personally, I'd use the TIMESTAMPDIFF function to get a difference in times.
SEC_TO_TIME(
SUM(TIMESTAMPDIFF(SECOND,s.connecttimestamp,s.disconnecttimestamp))
/ COUNT(*)
) AS avg_duration
If what you are asking is whether there's a way to reference other expressions in the SELECT list by their alias, the answer is that unfortunately there's not.
With a performance penalty, you could use your existing query as an inline view, then in the outer query, the alias names assigned to the expressions are available...
SELECT t.Name
, SEC_TO_TIME(t.TotalDur) AS Duration
, t.`#Calls`
, SEC_TO_TIME(t.TotalDur/t.`#Calls`) AS avgDuration
FROM (
SELECT CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name
, SUM(TIMESTAMPDIFF(SECOND,s.connecttimestamp,s.disconnecttimestamp)) AS TotalDur
, COUNT(1) AS `#Calls`
FROM session s
-- the rest of your query
) t
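To see that dividing the summed duration by the call count gives the same value as AVG over the per-call durations, here is a small sketch using SQLite through Python's sqlite3 module. The calls table, its columns, and the rows are invented for the demo, and durations are kept as plain seconds because SQLite has no SEC_TO_TIME.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical simplified table: one row per call with its duration in seconds.
conn.execute("CREATE TABLE calls (name TEXT, duration_sec INTEGER)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("TH", 60), ("TH", 120), ("TH", 180), ("AS", 30), ("AS", 45)],
)

per_user = conn.execute("""
    SELECT name,
           SUM(duration_sec)                  AS total_sec,
           COUNT(*)                           AS num_calls,
           -- Multiply by 1.0 so SQLite does not use integer division.
           SUM(duration_sec) * 1.0 / COUNT(*) AS avg_by_division,
           AVG(duration_sec)                  AS avg_by_function
    FROM calls
    GROUP BY name
    ORDER BY name
""").fetchall()

for row in per_user:
    print(row)
```

In MySQL, either average could then be wrapped in SEC_TO_TIME, as in the answers above.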