mysql reference not supported to group function - mysql

I have a MySQL procedure that takes two parameters, par_DateFrom and par_DateTo.
I'm getting a nasty error. I'm pretty sure that reusing the alias TotalDaysOut to calculate TotalIncome is the culprit. How can I fix this elegantly?
Error 1247:
Reference 'TotalDaysOut' not supported (reference to group function)
BEGIN
SELECT t.LicencePlate
,f.Make
,f.Model
,f.Year
,COUNT(t.LicencePlate) AS TotalTrx
,SUM(DATEDIFF(IF(checkedIn >par_DateTo, par_DateTo, checkedIn) ,IF(checkedOut <par_DateFrom, par_DateFrom, checkedOut))) AS TotalDaysOut
,SUM(t.Price* (SELECT TotalDaysOut)) AS TotalIncome
FROM TRANSACTIONS t
INNER JOIN FLEET f
ON t.LicencePlate = f.LicencePlate
WHERE t.CheckedOut < par_DateTo AND t.CheckedIn > par_DateFrom
GROUP BY t.LicencePlate
,f.Make
,f.Model
,f.Year;
END

Since your predicates are already verifying that all of the columns referenced in the expression are not null:
checkedIn
par_DateTo
checkedOut
par_DateFrom
(The predicates in the WHERE clause require all of those to be non-NULL), you could simplify the expression a bit, to reference each column once, rather than twice:
DATEDIFF(LEAST(t.checkedIn, par_DateTo),GREATEST(t.checkedOut, par_DateFrom))
And (as Gordon already suggested) just repeat that expression where the result is needed.
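To see why the LEAST/GREATEST form can stand in for the two IF expressions, here is a minimal Python sketch (not MySQL) that computes the clamped day count both ways. The function names and sample dates are made up for illustration, and it assumes no NULL inputs, which is what the WHERE clause guarantees.

```python
from datetime import date


def days_out_if(checked_out, checked_in, date_from, date_to):
    """Mirror of the original IF(...) expressions."""
    end = date_to if checked_in > date_to else checked_in
    start = date_from if checked_out < date_from else checked_out
    return (end - start).days


def days_out_clamped(checked_out, checked_in, date_from, date_to):
    """Mirror of DATEDIFF(LEAST(checkedIn, to), GREATEST(checkedOut, from))."""
    return (min(checked_in, date_to) - max(checked_out, date_from)).days


window = (date(2020, 1, 1), date(2020, 1, 31))  # (par_DateFrom, par_DateTo)
rentals = [
    (date(2019, 12, 28), date(2020, 1, 5)),   # starts before the window
    (date(2020, 1, 10), date(2020, 1, 12)),   # entirely inside the window
    (date(2020, 1, 29), date(2020, 2, 3)),    # ends after the window
]

for checked_out, checked_in in rentals:
    a = days_out_if(checked_out, checked_in, *window)
    b = days_out_clamped(checked_out, checked_in, *window)
    assert a == b  # both forms agree on every sample row
    print(checked_out, checked_in, a)
```

Each column is read once in the clamped version, which is the simplification suggested above.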
When we absolutely, positively have to reference an alias from the same query, the only real option in MySQL is to use an inline view (a derived table), though this approach can have significant performance consequences for large sets.
SELECT v.LicencePlate
, f.Make
, f.Model
, f.Year
, COUNT(v.LicencePlate) AS TotalTrx
, SUM(v.DaysOut) AS TotalDaysOut
, SUM(v.DaysOut * v.Price) AS TotalIncome
FROM ( SELECT t.LicencePlate
, t.Price
, DATEDIFF(
LEAST(t.checkedIn, par_DateTo),
GREATEST(t.checkedOut, par_DateFrom)
) AS DaysOut
FROM TRANSACTIONS t
WHERE t.CheckedOut < par_DateTo
AND t.CheckedIn > par_DateFrom
) v
JOIN FLEET f
ON f.LicencePlate = v.LicencePlate
GROUP
BY v.LicencePlate
, f.Make
, f.Model
, f.Year
That's less performant, and less elegant, than just simplifying and repeating the expression.

You can't do that. You cannot reference a column alias at the same level of the select as where it is defined. I'm not sure what the exact error is but the (select TotalDaysOut) doesn't make sense.
So, repeat the expression with the additional multiplication:
SUM(DATEDIFF(IF(checkedIn >par_DateTo, par_DateTo, checkedIn) ,IF(checkedOut <par_DateFrom, par_DateFrom, checkedOut))) AS TotalDaysOut,
SUM(t.Price * DATEDIFF(IF(checkedIn >par_DateTo, par_DateTo, checkedIn) ,IF(checkedOut <par_DateFrom, par_DateFrom, checkedOut))) AS TotalIncome
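As a rough illustration of the repeated-expression approach, the sketch below runs the same shape of query against an in-memory SQLite database from Python. This is an assumption-laden stand-in, not MySQL: the table `trx`, its columns, and the integer "day numbers" are invented for the example, and SQLite's two-argument MIN/MAX play the role of LEAST/GREATEST.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trx (plate TEXT, price REAL, checked_out INTEGER, checked_in INTEGER)"
)
conn.executemany(
    "INSERT INTO trx VALUES (?, ?, ?, ?)",
    [
        ("AAA", 10.0, 1, 4),   # 3 days, fully inside the window
        ("AAA", 20.0, 8, 15),  # clipped at day 10, so 2 days
        ("BBB", 5.0, 2, 6),    # 4 days
    ],
)

# The day-count expression is written out twice; no alias is referenced.
sql = """
SELECT plate,
       COUNT(*) AS total_trx,
       SUM(MIN(checked_in, :date_to) - MAX(checked_out, :date_from)) AS total_days_out,
       SUM(price * (MIN(checked_in, :date_to) - MAX(checked_out, :date_from))) AS total_income
FROM trx
WHERE checked_out < :date_to AND checked_in > :date_from
GROUP BY plate
ORDER BY plate
"""
rows = conn.execute(sql, {"date_from": 0, "date_to": 10}).fetchall()
print(rows)
```

The multiplication by price happens per row, inside the second SUM, so each transaction's own price is applied to its own day count.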

Related

SUM inside SUM SQL Invalid use of group function

Hi I want to perform a calculation inside a SUM with my sql, but there is one SUM field that consist of other SUM fields. I get the General error: 1111 Invalid use of group function. What is the proper way of summing other sum fields in SQL?
I can't use the alias of other sum fields to perform the calculation because it says that the alias is unidentified.
This part is my problem
SUM((SUM(transactions.payable) + SUM(transactions.discount) ) - SUM(deliveries.delivery_fee) ) AS raw_sales
Thank you
Here is my SQL.
SELECT
MONTHNAME(transactions.date_transac) AS MONTH,
SUM(transactions.payable) AS total,
SUM(transactions.discount) AS discount,
SUM(deliveries.delivery_fee) AS delivery,
SUM(
(
SUM(transactions.payable) + SUM(transactions.discount)
) - SUM(deliveries.delivery_fee)
) AS raw_sales,
MONTH(transactions.date_transac) AS monthnum
FROM
`transactions`
LEFT JOIN `requisitions` ON `transactions`.`requisition_id` = `requisitions`.`id`
LEFT JOIN `transactions` AS `ct`
ON
`transactions`.`code` = `ct`.`charge_transaction_code`
LEFT JOIN `deliveries` ON `transactions`.`delivery_id` = `deliveries`.`id`
WHERE
`transactions`.`transaction_type` = Sale AND YEAR(`transactions`.`date_transac`) = 2020
GROUP BY
`month`
ORDER BY
`monthnum` ASC
You can't nest aggregate functions. Here, I suspect that you could move the arithmetic inside a single aggregate function rather than attempting to nest:
SUM(
transactions.payable
+ transactions.discount
- COALESCE(deliveries.delivery_fee, 0)
) AS raw_sales
delivery_fee comes from a left join table so it could be null, hence we use coalesce().
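The effect of the COALESCE can be checked with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL, and the two tiny tables and their rows are invented for the example; only the shape of the SUM expression comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deliveries (id INTEGER PRIMARY KEY, delivery_fee REAL);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY, payable REAL, discount REAL, delivery_id INTEGER
);
INSERT INTO deliveries VALUES (1, 20);
INSERT INTO transactions VALUES (1, 100, 10, 1);     -- has a delivery
INSERT INTO transactions VALUES (2, 50, 5, NULL);    -- no delivery row
""")

sql = """
SELECT SUM(t.payable + t.discount - COALESCE(d.delivery_fee, 0)) AS with_coalesce,
       SUM(t.payable + t.discount - d.delivery_fee)              AS without_coalesce
FROM transactions t
LEFT JOIN deliveries d ON t.delivery_id = d.id
"""
with_coalesce, without_coalesce = conn.execute(sql).fetchone()
print(with_coalesce, without_coalesce)
```

Without COALESCE, the whole expression is NULL for the transaction that has no delivery, and SUM silently skips that row, so its payable and discount are lost from the total.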
That said, I am quite suspicious about the logic of your query. I am wondering, for example, why transactions appears twice in the FROM clause. There are also missing quotes around the string literal "Sale" in the WHERE clause. If you were to ask a legitimate question, including sample data, desired results, and an explanation of the purpose of the query, one might be able to suggest optimizations.
The query just worked. I hadn't realized that it isn't necessary to sum fields that are already being summed; I just removed the outer SUM.

Optimize derived table in select

I have sql query:
SELECT tsc.Id
FROM TEST.Services tsc,
(
select * from DICT.Change sp
) spc
where tsc.serviceId = spc.service_id
and tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
and tsc.startDate > GREATEST(spc.StartTime, spc.startDate)
group by tsc.Id;
This query is very, very slow.
Explain:
Can this be optimized? How to rewrite this subquery for another?
What is the point of this query? Why the CROSS JOIN operation? Why do we need to return multiple copies of id column from Services table? And what are we doing with the millions of rows being returned?
Absent a specification, an actual set of requirements for the resultset, we're just guessing at it.
To answer your questions:
Yes, the query could be "optimized" by rewriting it to the resultset that is actually required, and do it much more efficiently than the monstrously hideous SQL in the question.
Some suggestions: ditch the old-school comma syntax for the join operation, and use the JOIN keyword instead.
(With no join predicates, it's a "cross" join: every row from one side is matched to every row from the other side.) I recommend including the CROSS keyword as an indication to future readers that the absence of an ON clause (or of join predicates in the WHERE clause) is intentional, and not an oversight.
I'd also avoid an inline view, unless there is a specific reason for one.
UPDATE
The query in the question is updated to include some predicates. Based on the updated query, I would write it like this:
SELECT tsc.id
FROM TEST.Services tsc
JOIN DICT.Change spc
ON tsc.serviceid = spc.service_id
AND tsc.startdate > spc.starttime
AND tsc.startdate > spc.startdate
AND ( tsc.planid = spc.plan_id
OR ( tsc.planid IS NOT NULL AND spc.plan_id = -1 )
)
Ensure that the query is making use of suitable index by looking at the output of EXPLAIN to see the execution plan, in particular, which indexes are being used.
Some notes:
If there are multiple rows from spc that "match" a row from tsc, the query will return duplicate values of tsc.id. (It's not clear why, or whether, we need to return duplicate values. If we need to count the number of copies of each tsc.id, we could do that in the query, returning distinct values of tsc.id along with a count. If we don't need duplicates, we could return just a distinct list.)
GREATEST function will return NULL if any of the arguments are null. If the condition we need is "a > GREATEST(b,c)", we can specify "a > b AND a > c".
Also, this condition:
tsc.PlanId = if(spc.plan_id = -1, tsc.PlanId, spc.plan_id)
can be re-written to return an equivalent result (I'm suspicious about the actual specification, and whether this original condition actually satisfies that adequately. Without example data and sample of expected output, we have to rely on the SQL as the specification, so we honor that in the rewrite.)
If we don't need to return duplicate values of tsc.id, assuming id is unique in TEST.Services, we could also write
SELECT tsc.id
FROM TEST.Services tsc
WHERE EXISTS
( SELECT 1
FROM DICT.Change spc
WHERE spc.service_id = tsc.serviceid
AND spc.starttime < tsc.startdate
AND spc.startdate < tsc.startdate
AND ( ( spc.plan_id = tsc.planid )
OR ( spc.plan_id = -1 AND tsc.planid IS NOT NULL )
)
)
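The note about GREATEST and NULL can be tried out with a small sketch. It runs on SQLite through Python's sqlite3 module, which is an assumption made for runnability: SQLite has no GREATEST, but its multi-argument MAX likewise returns NULL when any argument is NULL. The table `chg` and its integer values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chg (id INTEGER PRIMARY KEY, start_time INTEGER, start_date INTEGER)"
)
conn.executemany(
    "INSERT INTO chg (start_time, start_date) VALUES (?, ?)",
    [(1, 2), (1, None), (None, None), (7, 2), (4, 6), (3, 4)],
)

a = 5  # stands in for tsc.startdate

# "a > GREATEST(b, c)", written with SQLite's multi-argument MAX.
with_greatest = conn.execute(
    "SELECT id FROM chg WHERE ? > MAX(start_time, start_date) ORDER BY id", (a,)
).fetchall()

# The same condition split into two plain comparisons.
split_up = conn.execute(
    "SELECT id FROM chg WHERE ? > start_time AND ? > start_date ORDER BY id", (a, a)
).fetchall()

print(with_greatest, split_up)
```

Both predicates keep the same rows, including when one or both columns are NULL, but the split form leaves each column as a bare comparison for the optimizer to work with.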

Adding subqueries when one is null [duplicate]

In MySQL, is there a way to set the "total" fields to zero if they are NULL?
Here is what I have:
SELECT uo.order_id, uo.order_total, uo.order_status,
(SELECT SUM(uop.price * uop.qty)
FROM uc_order_products uop
WHERE uo.order_id = uop.order_id
) AS products_subtotal,
(SELECT SUM(upr.amount)
FROM uc_payment_receipts upr
WHERE uo.order_id = upr.order_id
) AS payment_received,
(SELECT SUM(uoli.amount)
FROM uc_order_line_items uoli
WHERE uo.order_id = uoli.order_id
) AS line_item_subtotal
FROM uc_orders uo
WHERE uo.order_status NOT IN ("future", "canceled")
AND uo.uid = 4172;
The data comes out fine, except the NULL fields should be 0.
How can I return 0 for NULL in MySQL?
Use IFNULL:
IFNULL(expr1, 0)
From the documentation:
If expr1 is not NULL, IFNULL() returns expr1; otherwise it returns expr2. IFNULL() returns a numeric or string value, depending on the context in which it is used.
You can use coalesce(column_name,0) instead of just column_name. The coalesce function returns the first non-NULL value in the list.
I should mention that per-row functions like this are usually problematic for scalability. If you think your database may get to be a decent size, it's often better to use extra columns and triggers to move the cost from the select to the insert/update.
This amortises the cost assuming your database is read more often than written (and most of them are).
None of the above answers were complete for me.
If your field is named field, the select expression should be:
IFNULL(`field`,0) AS field
For example in a SELECT query:
SELECT IFNULL(`field`,0) AS field, `otherfield` FROM `mytable`
Hope this can help someone to not waste time.
You can try something like this
IFNULL(NULLIF(X, '' ), 0)
NULLIF(X, '') returns NULL when X is an empty string, so the outer IFNULL turns both NULL and empty-string values into 0. Any other value is returned unchanged.
Anyway, just to give another way to do that.
Yes, the IFNULL function will achieve your desired result.
SELECT uo.order_id, uo.order_total, uo.order_status,
(SELECT IFNULL(SUM(uop.price * uop.qty),0)
FROM uc_order_products uop
WHERE uo.order_id = uop.order_id
) AS products_subtotal,
(SELECT IFNULL(SUM(upr.amount),0)
FROM uc_payment_receipts upr
WHERE uo.order_id = upr.order_id
) AS payment_received,
(SELECT IFNULL(SUM(uoli.amount),0)
FROM uc_order_line_items uoli
WHERE uo.order_id = uoli.order_id
) AS line_item_subtotal
FROM uc_orders uo
WHERE uo.order_status NOT IN ("future", "canceled")
AND uo.uid = 4172;
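A small sketch of why the wrapper is needed: SUM over zero rows yields NULL, and IFNULL replaces it with 0. The example uses SQLite via Python's sqlite3 module as a stand-in for MySQL (IFNULL and NULLIF exist in both), and the simplified `orders` and `order_products` tables are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
CREATE TABLE order_products (order_id INTEGER, price REAL, qty INTEGER);
INSERT INTO orders VALUES (1), (2);
INSERT INTO order_products VALUES (1, 2.5, 4);   -- order 2 has no products
""")

sql = """
SELECT o.order_id,
       (SELECT SUM(p.price * p.qty) FROM order_products p
         WHERE p.order_id = o.order_id) AS raw_subtotal,
       (SELECT IFNULL(SUM(p.price * p.qty), 0) FROM order_products p
         WHERE p.order_id = o.order_id) AS products_subtotal
FROM orders o
ORDER BY o.order_id
"""
rows = conn.execute(sql).fetchall()
print(rows)

# The NULLIF variant additionally maps an empty string to 0.
empty_as_zero = conn.execute("SELECT IFNULL(NULLIF('', ''), 0)").fetchone()[0]
print(empty_as_zero)
```

COALESCE(SUM(...), 0) would behave the same way here; IFNULL is simply the two-argument form.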

MySQL String Position and Substring sort

Example data to sort:
xy3abc
y3bbc
z3bd
Sort order must be abc, bbc, bd regardless of what is before the numeral.
I tried:
SELECT
*,
LEAST(
if (Locate('0',fcccall) >0,Locate('0',fcccall),99),
if (Locate('1',fcccall) >0,Locate('1',fcccall),99),
if (Locate('2',fcccall) >0,Locate('2',fcccall),99),
if (Locate('3',fcccall) >0,Locate('3',fcccall),99),
if (Locate('4',fcccall) >0,Locate('4',fcccall),99),
if (Locate('5',fcccall) >0,Locate('5',fcccall),99),
if (Locate('6',fcccall) >0,Locate('6',fcccall),99),
if (Locate('7',fcccall) >0,Locate('7',fcccall),99),
if (Locate('8',fcccall) >0,Locate('8',fcccall),99),
if (Locate('9',fcccall) >0,Locate('9',fcccall),99)
) as locationPos,
SUBSTRING(fcccall,locationPos,3) as fccsuffix
FROM memberlist
ORDER BY locationPos, fccsuffix
but locationPos gives me an error on the substring function call
It's not possible to reference that expression by its alias locationPos, within another expression in the same SELECT list.
Replicating the entire expression would be the SQL way to do it. (Yes, it is ugly repeating that entire expression.)
Another (less performant) approach is to use your query (minus the fccsuffix expression) as an inline view. The outer query can reference the assigned locationPos alias as a column name.
As a simple example:
SELECT v.locationPos
FROM ( SELECT 'my really big expression' AS locationPos
FROM ...
) v
This approach of using an inline view ("derived table") can have some serious performance implications with large sets.
But for raw performance, repeating the expression is the way to go:
SELECT *
, LEAST(
if (Locate('0',fcccall) >0,Locate('0',fcccall),99),
if (Locate('1',fcccall) >0,Locate('1',fcccall),99),
if (Locate('2',fcccall) >0,Locate('2',fcccall),99),
if (Locate('3',fcccall) >0,Locate('3',fcccall),99),
if (Locate('4',fcccall) >0,Locate('4',fcccall),99),
if (Locate('5',fcccall) >0,Locate('5',fcccall),99),
if (Locate('6',fcccall) >0,Locate('6',fcccall),99),
if (Locate('7',fcccall) >0,Locate('7',fcccall),99),
if (Locate('8',fcccall) >0,Locate('8',fcccall),99),
if (Locate('9',fcccall) >0,Locate('9',fcccall),99)
) AS locationPos
, SUBSTRING(fcccall
, LEAST(
if (Locate('0',fcccall) >0,Locate('0',fcccall),99),
if (Locate('1',fcccall) >0,Locate('1',fcccall),99),
if (Locate('2',fcccall) >0,Locate('2',fcccall),99),
if (Locate('3',fcccall) >0,Locate('3',fcccall),99),
if (Locate('4',fcccall) >0,Locate('4',fcccall),99),
if (Locate('5',fcccall) >0,Locate('5',fcccall),99),
if (Locate('6',fcccall) >0,Locate('6',fcccall),99),
if (Locate('7',fcccall) >0,Locate('7',fcccall),99),
if (Locate('8',fcccall) >0,Locate('8',fcccall),99),
if (Locate('9',fcccall) >0,Locate('9',fcccall),99)
),3
) AS fccsuffix
FROM memberlist
ORDER BY locationPos, fccsuffix
Unfortunately, with MySQL, it's not possible to reference the result of the locationPos column within an expression in the same SELECT list.
For only one numeral I like:
SELECT *
FROM memberlist
ORDER BY SUBSTRING(fcccall,
LOCATE('0',fcccall)+
LOCATE('1',fcccall)+
LOCATE('2',fcccall)+
LOCATE('3',fcccall)+
LOCATE('4',fcccall)+
LOCATE('5',fcccall)+
LOCATE('6',fcccall)+
LOCATE('7',fcccall)+
LOCATE('8',fcccall)+
LOCATE('9',fcccall),3)
But the sensible approach is not to store two separate bits of information in one field.
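The reason the sum of LOCATE calls works for a single numeral is that LOCATE returns 0 for every digit that is absent, so only the digit that is present contributes its position. A minimal Python sketch of that idea follows (plain Python, not MySQL; the function names are made up, and it assumes at most one digit per value, as the answer does):

```python
def digit_position(call):
    """1-based position of the single digit, or 0 if there is none.

    str.find returns -1 when the digit is absent, so find(d) + 1
    behaves like LOCATE(d, call) for single-character needles.
    """
    return sum(call.find(d) + 1 for d in "0123456789")


def sort_key(call):
    pos = digit_position(call)
    # SUBSTRING(call, 0, 3) is an empty string in MySQL, so mirror that here.
    return call[pos - 1:pos + 2] if pos > 0 else ""


calls = ["z3bd", "xy3abc", "y3bbc"]
ordered = sorted(calls, key=sort_key)
print(ordered)
```

As in the SQL, the three-character key starts at the digit itself, so values with different digits would sort by digit first.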

Dev Code - Understanding What I Am Seeing

This is where I start by saying I am not a developer and this is not my code. As the DBA, though, it has shown up on my plate from a performance perspective. The execution plan shows me that there are CI scans for Table2 aliased as D and Table2 aliased as E. Focusing on Table2 aliased as E: the scan is coming from the subquery in the WHERE clause for E.SEQ_NBR =
I am also seeing far more executions than need be. I know it depends on the exact index structure on the table, but at a high level, is it likely that what I am seeing is a CI scan resulting from the aggregate (MIN) for every match it finds? Basically, is it walking the table for the MIN SEQ_NBR for each match on EMPLID and the other fields?
If likely, is it more a result of the manner in which it is written (I would think incorporating a CTE with some ROW_NUMBER logic would help) or lack of indexing? I am trying to avoid throwing an index at it "just because". I am getting hung up on that sub query in the where clause.
SELECT
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
,SUM(D.TL_QUANTITY) 'YTD_TL_QUANTITY'
FROM
Table1 B
,Table2 D
,Table2 E
WHERE
D.TRC = B.TRC
AND B.TL_ERNCD IN ( #0, #1, #2, #3, #4, #5, #6 )
AND D.EMPLID = E.EMPLID
AND D.EMPL_RCD = E.EMPL_RCD
AND D.DUR < = E.DUR
AND D.DUR > = '1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR)
AND E.SEQ_NBR =
( SELECT
MIN(EX.SEQ_NBR)
FROM
Table2 EX
WHERE
E.EMPLID = EX.EMPLID
AND E.EMPL_RCD = EX.EMPL_RCD
AND E.DUR = EX.DUR
)
AND B.EFFDT = ( SELECT
MAX(B_ED.EFFDT)
FROM
Table1 B_ED
WHERE
B.TRC = B_ED.TRC
AND B_ED.EFFDT < = GETDATE()
)
GROUP BY
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
The MIN operation has nothing to do with the CI scan. A MIN or MAX is calculated using a sort. The problem is most likely the number of times the subquery is being executed. It has to loop through the subquery for every record returned in the parent query. A CTE may be helpful here depending on the size of Table2, but I don't think you need to worry about finding a replacement for the MIN() ... at least not yet.
Correlated subqueries are performance killers. Remove them and replace them with CTEs and JOINs or derived tables.
Try something like this (not tested)
SELECT
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
,SUM(D.TL_QUANTITY) 'YTD_TL_QUANTITY'
FROM Table1 B
JOIN (SELECT TRC, MAX(EFFDT) AS EFFDT
FROM Table1
WHERE EFFDT <= GETDATE()
GROUP BY TRC) B_ED
ON B.TRC = B_ED.TRC AND B.EFFDT = B_ED.EFFDT
JOIN Table2 D
ON D.TRC = B.TRC
JOIN Table2 E
ON D.EMPLID = E.EMPLID AND D.EMPL_RCD = E.EMPL_RCD AND D.DUR <= E.DUR
JOIN (SELECT EMPLID, EMPL_RCD, DUR, MIN(SEQ_NBR) AS SEQ_NBR
FROM Table2
GROUP BY EMPLID, EMPL_RCD, DUR) EX
ON E.EMPLID = EX.EMPLID
AND E.EMPL_RCD = EX.EMPL_RCD
AND E.DUR = EX.DUR
AND E.SEQ_NBR = EX.SEQ_NBR
WHERE B.TL_ERNCD IN ( #0, #1, #2, #3, #4, #5, #6 )
AND D.DUR >= '1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR)
GROUP BY
D.EMPLID
,D.JOBCODE
,D.DEPTID
,E.DUR
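To make the "replace the correlated subquery with a derived table" idea concrete, here is a small sketch that checks the two forms return the same rows. It runs on SQLite through Python's sqlite3 module, not SQL Server, so it says nothing about the execution plan or performance; the cut-down `table2` and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table2 (emplid INTEGER, empl_rcd INTEGER, dur TEXT, seq_nbr INTEGER)"
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [
        (1, 0, "2020-01-01", 1),
        (1, 0, "2020-01-01", 2),
        (1, 0, "2020-01-02", 5),
        (2, 0, "2020-01-01", 3),
        (2, 0, "2020-01-01", 7),
    ],
)

# Original shape: MIN(seq_nbr) looked up per outer row.
correlated = conn.execute("""
SELECT e.emplid, e.dur, e.seq_nbr
FROM table2 e
WHERE e.seq_nbr = (SELECT MIN(ex.seq_nbr) FROM table2 ex
                   WHERE ex.emplid = e.emplid
                     AND ex.empl_rcd = e.empl_rcd
                     AND ex.dur = e.dur)
ORDER BY e.emplid, e.dur
""").fetchall()

# Rewritten shape: MIN(seq_nbr) computed once per group, then joined back.
derived = conn.execute("""
SELECT e.emplid, e.dur, e.seq_nbr
FROM table2 e
JOIN (SELECT emplid, empl_rcd, dur, MIN(seq_nbr) AS seq_nbr
      FROM table2
      GROUP BY emplid, empl_rcd, dur) ex
  ON ex.emplid = e.emplid
 AND ex.empl_rcd = e.empl_rcd
 AND ex.dur = e.dur
 AND ex.seq_nbr = e.seq_nbr
ORDER BY e.emplid, e.dur
""").fetchall()

print(correlated == derived, derived)
```

The derived table has to expose the grouping columns and the aggregate under an alias so the outer query can join on all of them.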
As far as the implicit join syntax, do not allow anyone to ever do this again. It is a poor programming technique. As a DBA you can say what you will and will not allow in the database. Code review what is coming in and do not pass it until they remove the implicit syntax.
Why is it bad? In the first place, you get accidental cross joins. Further, from a maintenance perspective, you can't tell if the cross join was accidental (and thus the query incorrect) or on purpose. This means a query with a cross join in it is unmaintainable.
Next, if you have to change some of the joins later to outer joins and do not fix all the implicit ones at the same time, you can get incorrect results (which may not be noticed by an inexperienced developer). In SQL Server 2008 you cannot use the implicit syntax for an outer join, but it shouldn't have been used even as far back as SQL Server 2000, because Books Online (for SQL Server 2000) states that there are cases where it is misinterpreted. In other words, the syntax is unreliable for outer joins. There is no excuse ever for using an implicit join; you gain nothing from them over using an explicit join, and they can create more problems.
You need to educate your developers and tell them that this code (which has been obsolete since 1992!) is no longer acceptable.
This is a quick one, but this expression, CAST('1/1/' + CAST(DATEPART(YEAR, E.DUR) AS CHAR) AS DATETIME), is likely causing a table scan on Table2 E, because the function likely has to be evaluated against each row.