Using timestamp in where with count in having clause - mysql

I am trying to use a DATEDIFF function in my WHERE clause and a COUNT function in my HAVING clause, but I need to combine the two conditions with OR, i.e. select loans from clients in the past 2 months, or clients whose count of loans is more than 50.
I tried using a UNION, with the DATEDIFF condition in one part of the query and the COUNT condition in the other, but I am getting duplicate values because of the COUNT function.
where FundedBit = 1
and SellerStatusTypeId = 1 -- Approved
and lu.DelegatedUnderwritingBit = 1
and cml.LastFileDeliveryCompletedDtTm is not Null
group by
p.PrimaryEmailAddrTxt,
s.SellerSalesExecPersonId,
s.SellerPartyId,
s.SellerNum,
s.PartyName
having (count(lc.LoanId) >= 50 or (cml.LastFileDeliveryCompletedDtTm <= convert(date,DATEADD(month, -2, GETDATE())))) -- I need output if either of these two conditions is true
I am getting an error like "Column 'cml.LastFileDeliveryCompletedDtTm' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause"
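HAVING can only reference GROUP BY columns or aggregates. One hedged way around this is to move the date test into an aggregate: keep a group if its loan count is high enough OR its most recent delivery, MAX(...), falls inside the window. In the question's T-SQL that would read roughly having count(lc.LoanId) >= 50 or max(cml.LastFileDeliveryCompletedDtTm) >= convert(date, DATEADD(month, -2, GETDATE())). Note the >= rather than <=: the <= in the question selects deliveries older than two months, not within the past two. With a single HAVING like this, the UNION (and its duplicates) is no longer needed. The sketch below uses Python's sqlite3 as a stand-in. The loans table, its columns, the sample rows and the function names are all hypothetical, and SQLite's date(?, '-2 months') replaces DATEADD.

```python
import sqlite3


def build_demo():
    """Create an in-memory loans table (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE loans (seller_id INTEGER, loan_id INTEGER, delivered TEXT)")
    rows = []
    # Seller 1: many loans, all delivered long ago.
    rows += [(1, i, "2023-01-01") for i in range(60)]
    # Seller 2: few loans, but one delivered recently.
    rows += [(2, 100, "2024-01-10"), (2, 101, "2024-05-15")]
    # Seller 3: few loans, none recent.
    rows += [(3, 200, "2024-01-10")]
    conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", rows)
    return conn


def matching_sellers(conn, as_of, min_loans=50, months=2):
    # The per-row date is wrapped in MAX() so HAVING may use it:
    # "the latest delivery is within `months` months before as_of".
    sql = """
        SELECT seller_id
        FROM loans
        GROUP BY seller_id
        HAVING COUNT(loan_id) >= ?
            OR MAX(delivered) >= date(?, ?)
        ORDER BY seller_id
    """
    cur = conn.execute(sql, (min_loans, as_of, f"-{months} months"))
    return [row[0] for row in cur]


conn = build_demo()
print(matching_sellers(conn, "2024-06-30"))  # [1, 2]
```

Seller 1 qualifies on the count alone, and seller 2 qualifies on its recent delivery. Seller 3 fails both tests, so it is dropped.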

Related

Retrieve rows from DB if Date-Difference condition applies

I need to retrieve rows from an SQL database where a time-difference condition applies.
Basically, I need to retrieve rows whose difference from NOW() is less than x minutes.
The datetime column has this format: 2022-12-05 15:01:43
I tried
SELECT * FROM copytrade WHERE DATEDIFF(minute, datetime, GETDATE() AS DateDiff) < 10
With this query I get the error "Incorrect parameter count in the call to native function 'DATEDIFF'".
Is it possible to achieve this in a single SQL statement?
Thanks
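In MySQL, DATEDIFF takes exactly two arguments and returns a difference in days, so the three-argument SQL Server form fails. The AS DateDiff inside the call is also not valid there. A minute-level check can be written as TIMESTAMPDIFF(MINUTE, `datetime`, NOW()) < 10. Another option keeps the column bare, so an index on it can be used: `datetime` > NOW() - INTERVAL 10 MINUTE. The sketch below runs that second form against SQLite via Python's sqlite3. SQLite's datetime(?, '-10 minutes') stands in for NOW() - INTERVAL 10 MINUTE, and a fixed reference time replaces NOW() so the result is reproducible. The sample rows are made up.

```python
import sqlite3


def recent_rows(conn, as_of, minutes=10):
    # Compare the bare column with a computed bound (index-friendly):
    # keep rows that are less than `minutes` minutes older than as_of.
    sql = 'SELECT id FROM copytrade WHERE "datetime" > datetime(?, ?) ORDER BY id'
    return [r[0] for r in conn.execute(sql, (as_of, f"-{minutes} minutes"))]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE copytrade (id INTEGER, "datetime" TEXT)')
conn.executemany(
    "INSERT INTO copytrade VALUES (?, ?)",
    [(1, "2022-12-05 15:01:43"), (2, "2022-12-05 14:50:00"), (3, "2022-12-05 15:09:59")],
)
print(recent_rows(conn, "2022-12-05 15:10:00"))  # [1, 3]
```

The string comparison works here because every value uses the same YYYY-MM-DD HH:MM:SS layout. In MySQL, a DATETIME column is compared as a date value directly.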

How to calculate AVG, MAX and MIN number of rows in a column

I'm trying to collect general statistics on the depth of correspondence: the average, maximum and minimum number of messages of each type per request. I have 2 tables:
First:
ticketId,ticketQueueId,ticketCreatedDate
Second:
articleId,articleCreatedDt,articleType (can be IN or OUT - support responses), ticketId
I reasoned like this:
SELECT AVG(COUNT(articleType='IN')) AS AT_IN, AVG(COUNT(articleType='OUT')) AS AT_OUT
FROM tickets.tickets JOIN tickets.articles
ON tickets.ticketId=articles.ticketId;
GROUP BY tickets.ticketId
but it doesn't work.
Error Code: 1111. Invalid use of group function
You can't nest aggregate functions (AVG(COUNT())). Instead, compute the per-ticket counts in a subquery and apply the outer aggregate function to the subquery's result.
Your use of COUNT is also incorrect.
COUNT counts every row where its argument is not NULL. In your case the expression articleType='IN' (or articleType='OUT') returns 0 or 1, which is never NULL, so every row is counted.
select AVG(T_IN), MAX(T_IN), MIN(T_IN), AVG(T_OUT), MAX(T_OUT), MIN(T_OUT)
from (
SELECT sum(case when articleType='IN' then 1 else 0 END) AS T_IN, sum(case when articleType='OUT' then 1 else 0 END) AS T_OUT
FROM tickets.tickets
JOIN tickets.articles ON tickets.ticketId=articles.ticketId
GROUP BY tickets.ticketId
) t
(You also have a misplaced semicolon: the one after the ON condition ends the statement before the GROUP BY.)
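Here is a runnable version of the same idea: per-ticket counts in a derived table, then AVG/MAX/MIN over it. It uses Python's sqlite3 as a stand-in for MySQL. The table and column names follow the question, but the sample rows are invented. One design point: the inner JOIN drops tickets that have no articles at all. A LEFT JOIN from tickets would keep them, with zero counts.

```python
import sqlite3

QUERY = """
SELECT AVG(t_in), MAX(t_in), MIN(t_in), AVG(t_out), MAX(t_out), MIN(t_out)
FROM (
    -- One row per ticket: how many IN and OUT articles it has.
    SELECT SUM(CASE WHEN a.articleType = 'IN' THEN 1 ELSE 0 END) AS t_in,
           SUM(CASE WHEN a.articleType = 'OUT' THEN 1 ELSE 0 END) AS t_out
    FROM tickets t
    JOIN articles a ON t.ticketId = a.ticketId
    GROUP BY t.ticketId
) per_ticket
"""


def depth_stats(conn):
    return conn.execute(QUERY).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (ticketId INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE articles (articleId INTEGER PRIMARY KEY, articleType TEXT, ticketId INTEGER)"
)
conn.executemany("INSERT INTO tickets VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO articles (articleType, ticketId) VALUES (?, ?)",
    [("IN", 1), ("IN", 1), ("OUT", 1),
     ("IN", 2), ("OUT", 2), ("OUT", 2), ("OUT", 2),
     ("IN", 3)],
)
# IN per ticket: 2, 1, 1; OUT per ticket: 1, 3, 0.
print(depth_stats(conn))
```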

How do I aggregate with GROUP BY in MySQL without setting sql_mode?

This query gets the output I want, but in order to run it I first have to run
SET sql_mode = '';
because otherwise I get this error:
SELECT list is not in GROUP BY clause and contains nonaggregated column 'knownLoss.t1.loss' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
SELECT
t1.klDate AS LDate
, t1.Category
, t1.reason AS Reason
, sum(t1.loss) AS Loss
, round(t1.loss / t2.loss,2) AS Percentage
FROM
deptkl t1
JOIN (
SELECT
Store
, sum(loss) AS loss
FROM
deptkl
WHERE
klDate >= date_sub(SUBDATE(curdate(), WEEKDAY(curdate())), interval 7 day)
AND
klDate < SUBDATE(curdate(), WEEKDAY(curdate()))
AND
Store = 19
AND
Department = 80
) t2 ON t1.Store = t2.Store
WHERE
klDate >= date_sub(SUBDATE(curdate(), WEEKDAY(curdate())), interval 7 day)
AND
klDate < SUBDATE(curdate(), WEEKDAY(curdate()))
AND
t1.Store = 19
AND
Department = 80
GROUP BY
klDate
, Category
, reason
When I place this into the Dataset and Query Dialog of Jasper Studio, I get the same error, and I am unable to use the SET sql_mode = ''; command there. Any thoughts? Is there a way to achieve this without using SET sql_mode = '';?
I'm guessing the error comes from this line in your SELECT list:
round(t1.loss / t2.loss,2) AS Percentage
Since your GROUP BY clause does not include these columns, it's somewhat of a coin toss which t1.loss and t2.loss values will be used. In some cases, given your criteria, those values always happen to be the same, so you get the correct results regardless. The database still complains, though, because it's being asked to return somewhat arbitrary values for those columns. One way to deal with this is to apply an aggregate function to the columns in question, like this:
round(min(t1.loss) / min(t2.loss),2) AS Percentage
or...
round(avg(t1.loss) / avg(t2.loss),2) AS Percentage
I think you want to do this:
round(sum(t1.loss / t2.loss)/count(*),2) AS Percentage
This sums the ratio t1.loss / t2.loss over every record in the group, then divides by the number of records in the group. In other words, it's an average of the per-record ratios (the same as AVG(t1.loss / t2.loss) when none of the ratios is NULL).
EDIT:
Sorry, I had made a syntax error; it should now give the wanted result. The error occurs because you are using a column that is neither in the GROUP BY clause nor wrapped in an aggregate function.
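To make the difference between the suggested aggregates concrete, here is a small sketch using Python's sqlite3. SQLite does not enforce only_full_group_by, so this only illustrates the arithmetic. The deptkl columns are simplified (no dates or departments) and the numbers are invented. The sketch compares two forms:

- SUM(t1.loss) / MAX(t2.loss): the group's share of the store's total loss. This variant is not one of the answers above; it is shown only for comparison.
- SUM(t1.loss / t2.loss) / COUNT(*): the average-of-ratios form.

Which one matches the intended Percentage is for the asker to decide. In MySQL 5.7 and later, ANY_VALUE(t2.loss) is another way to tell the server that any value from the group is acceptable.

```python
import sqlite3

SHARE_SQL = """
SELECT t1.Category, t1.reason,
       SUM(t1.loss) AS Loss,
       -- Group total divided by the store total (t2.loss is constant per store).
       ROUND(SUM(t1.loss) / MAX(t2.loss), 2) AS share_of_total,
       -- Average of the per-row ratios, as in SUM(t1.loss / t2.loss) / COUNT(*).
       ROUND(SUM(t1.loss / t2.loss) / COUNT(*), 2) AS avg_of_ratios
FROM deptkl t1
JOIN (SELECT Store, SUM(loss) AS loss FROM deptkl GROUP BY Store) t2
  ON t1.Store = t2.Store
GROUP BY t1.Category, t1.reason
ORDER BY t1.Category, t1.reason
"""

conn = sqlite3.connect(":memory:")
# REAL column so SQLite does floating-point (not integer) division.
conn.execute("CREATE TABLE deptkl (Store INTEGER, Category TEXT, reason TEXT, loss REAL)")
conn.executemany(
    "INSERT INTO deptkl VALUES (?, ?, ?, ?)",
    [(19, "A", "r1", 30.0), (19, "A", "r1", 10.0), (19, "B", "r2", 60.0)],
)
rows = conn.execute(SHARE_SQL).fetchall()
for row in rows:
    print(row)
```

For group A/r1 (30 + 10 out of a store total of 100), the share is 0.4, but the average of the two ratios is 0.2. The two forms only agree when a group has a single row.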

SQL query to select values grouped by hour(col) and weekday(row) based on the timestamp

I have searched SO for this question and found slightly similar posts, but I was unable to adapt them to my needs.
I have a database of server requests going back a long time, each one with a timestamp, and I'm trying to come up with a query that lets me create a heat-matrix chart (CCC HeatGrid).
The SQL query result must represent the server load grouped by each hour of each weekday.
Like this: (example table; image not included)
I just need the SQL query; I know how to create the chart.
Thank you,
Those look like "counts" of rows.
One of the issues is "sparse" data; we can address that later.
To get the day of the week ('Sunday', 'Monday', etc.) returned, you can use the DATE_FORMAT function. To get the days in order, we need to include an integer value 0 through 6, or 1 through 7. We can then use an ORDER BY clause on that expression to get the rows returned in the order we want.
To get the "hour" across the top, we can use expressions in the SELECT list that conditionally increment the count.
Assuming your timestamp column is named ts, and assuming you want to pull all rows from the year 2014, we start with something like this:
SELECT DAYOFWEEK(t.ts)
, DATE_FORMAT(t.ts,'%W')
FROM mytable t
WHERE t.ts >= '2014-01-01'
AND t.ts < '2015-01-01'
GROUP BY DAYOFWEEK(t.ts), DATE_FORMAT(t.ts,'%W') -- both SELECT expressions, for ONLY_FULL_GROUP_BY
ORDER BY DAYOFWEEK(t.ts)
(WEEKDAY and DAYOFWEEK are very similar, but we want the one that returns the lowest value for Sunday and the highest for Saturday. That's DAYOFWEEK, which returns 1 for Sunday through 7 for Saturday; WEEKDAY returns 0 for Monday through 6 for Sunday.)
The "trick" now is the columns across the top.
We can extract the "hour" from timestamp using the DATE_FORMAT() function, the HOUR() function, or an EXTRACT() function... take your pick.
The expressions we want return 1 if the timestamp is in the specified hour, and 0 otherwise. Then we can use a SUM() aggregate to add up the 1s. In MySQL, a boolean expression returns 1 for TRUE and 0 for FALSE.
, SUM( HOUR(t.ts)=0 ) AS `h0`
, SUM( HOUR(t.ts)=1 ) AS `h1`
, SUM( HOUR(t.ts)=2 ) AS `h2`
, '...'
, SUM( HOUR(t.ts)=22 ) AS `h22`
, SUM( HOUR(t.ts)=23 ) AS `h23`
A boolean expression can also evaluate to NULL. Since we have a predicate (i.e. a condition in the WHERE clause) that guarantees ts can't be NULL, that won't be an issue.
The other issue we can encounter (as I mentioned earlier) is "sparse" data. To illustrate, consider what happens with our query if there are no rows with a ts value on a Monday: we don't get a row in the resultset for Monday at all. If a row is "missing" for Monday (or any other day of the week), we do know that all of the hourly counts across that "missing" row would be zero.
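Putting the pieces together, a sketch of the whole query might look like the following. It runs on SQLite through Python's sqlite3, so it can be tried without a MySQL server:

- strftime('%w', t.ts) (0 = Sunday through 6 = Saturday) stands in for DAYOFWEEK(t.ts) (1 = Sunday through 7 = Saturday).
- CAST(strftime('%H', t.ts) AS INTEGER) stands in for HOUR(t.ts).
- The 24 h0 to h23 columns are generated in Python rather than typed out.

The sample timestamps are invented.

```python
import sqlite3

DOW = "CAST(strftime('%w', t.ts) AS INTEGER)"   # 0 = Sunday ... 6 = Saturday
HOUR = "CAST(strftime('%H', t.ts) AS INTEGER)"  # 0 ... 23


def heatmap(conn, start, end):
    # SUM(<comparison>) counts the rows for which the comparison is true (1).
    hour_cols = ", ".join(f"SUM({HOUR} = {h}) AS h{h}" for h in range(24))
    sql = f"""
        SELECT {DOW} AS dow, {hour_cols}
        FROM mytable t
        WHERE t.ts >= ? AND t.ts < ?
        GROUP BY {DOW}
        ORDER BY {DOW}
    """
    return conn.execute(sql, (start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("2014-01-05 09:15:00",), ("2014-01-05 09:45:00",),  # Sunday, 9am hour
     ("2014-01-05 23:10:00",),                          # Sunday, 11pm hour
     ("2014-01-06 00:05:00",),                          # Monday, midnight hour
     ("2015-02-01 10:00:00",)],                         # outside the range
)
for row in heatmap(conn, "2014-01-01", "2015-01-01"):
    print(row)
```

Each result row is the day number followed by 24 hourly counts. Days with no requests (Tuesday to Saturday here) are simply absent, which is the "sparse" data problem.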
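One common way to fill in the missing days is to start from a fixed list of all seven days and LEFT JOIN the aggregated counts to it. COALESCE then turns the NULLs of a missing day into zeros. The sketch runs on SQLite through Python's sqlite3, with strftime('%w', ts) standing in for DAYOFWEEK and strftime('%H', ts) for HOUR. In MySQL, the day list could be a small permanent table or a UNION ALL of seven SELECTs. The CTE built from VALUES used here is SQLite syntax, and the sample rows are invented.

```python
import sqlite3

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def heatmap_all_days(conn, start, end):
    hour = "CAST(strftime('%H', ts) AS INTEGER)"
    dow = "CAST(strftime('%w', ts) AS INTEGER)"  # 0 = Sunday
    count_cols = ", ".join(f"SUM({hour} = {h}) AS h{h}" for h in range(24))
    filled = ", ".join(f"COALESCE(c.h{h}, 0) AS h{h}" for h in range(24))
    day_rows = ", ".join(f"({i}, '{name}')" for i, name in enumerate(DAYS))
    sql = f"""
        WITH days(dow, name) AS (VALUES {day_rows}),
        counts AS (
            SELECT {dow} AS dow, {count_cols}
            FROM mytable
            WHERE ts >= ? AND ts < ?
            GROUP BY {dow}
        )
        SELECT d.dow, d.name, {filled}
        FROM days d
        LEFT JOIN counts c ON c.dow = d.dow  -- keeps days with no requests
        ORDER BY d.dow
    """
    return conn.execute(sql, (start, end)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ts TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?)",
    [("2014-01-05 09:15:00",), ("2014-01-05 09:45:00",)],  # both on a Sunday
)
for row in heatmap_all_days(conn, "2014-01-01", "2015-01-01"):
    print(row[:2], sum(row[2:]))
```

All seven days now come back, and Monday through Saturday show zero in every hour column.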

Retrieve Timediff between current row and next row in subquery

Why am I getting more than 24 hours? I am trying to get the time difference between each row and the next in the subquery, keep it if it is greater than 10 minutes, and then sum the result per day.
My goal is to find, for each user, the total of every break longer than 10 minutes, and list that alongside the number of calls on that particular day.
SELECT DATE_FORMAT(last_call, '%d, %W') AS DAY
, COUNT(call_id) AS calls
, ( SELECT SEC_TO_TIME(SUM((
SELECT timestampdiff(SECOND, c.last_call, c2.last_call)
FROM calls c2
WHERE c2.calling_agent = c.calling_agent
AND c2.last_call > c.last_call
AND timestampdiff(SECOND, c.last_call, c2.last_call) > 600
ORDER BY c2.last_call LIMIT 1
)))
FROM calls AS c
WHERE EXTRACT(DAY FROM c.last_call) = EXTRACT(DAY FROM calls.last_call)
) AS `brakes`
FROM calls
WHERE 9 IN (calls.reg_calling_agent)
AND last_call > DATE_SUB(now() , INTERVAL 12 MONTH)
GROUP BY EXTRACT(DAY FROM last_call)
ORDER BY EXTRACT(DAY FROM last_call) DESC
You're getting more than 24 hours because:
1) The row retrieved from c2 could be from a different day. There's no guarantee that the next call (10 minutes after the previous call) isn't the first call made/received by an agent after a week-long vacation.
2) The same "gap" of over 10 minutes is going to be reported more than once. It's reported for the last call the agent made/received before the gap, but also for the call immediately before that one, and the one before that. The subquery isn't looking for the next call; it's looking for the first subsequent call that is more than 10 minutes later. There's no provision to exclude the calls that DID have a subsequent call within 10 minutes.
3) You are getting an aggregate total (SUM) of all of those gaps in a given day, regardless of the agent; all the gaps for all agents are being totaled.
4) The outer query is getting a year's worth of calls (for all agents?), but is grouping by day of month (1 through 31). So you're getting back one row for the 5th of the month, but there will be multiple agents and multiple "days" (Jan 5, Feb 5, March 5, etc.), and multiple values of 'brakes'. Only one of those values is going to be included in the result, and it's indeterminate which one will be returned. (Other RDBMSs would balk at this construct, a non-aggregate expression in the SELECT list that is not included in the GROUP BY. MySQL allows it when ONLY_FULL_GROUP_BY is not enabled, which was the default before MySQL 5.7.5.)
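Point 2 is easy to reproduce on a toy data set. The sketch below uses Python's sqlite3. The helper secs() is hypothetical: it converts a timestamp to Unix seconds with strftime('%s', ...), standing in for MySQL's TIMESTAMPDIFF. The sample data has calls at 09:00, 09:05 and 09:30, so the only real break is the 25 minutes from 09:05 to 09:30. The question's pattern looks for the first later call more than 10 minutes away, so it also charges a 30-minute gap to the 09:00 call. Finding the next call first, and only then testing the gap, counts the break once.

```python
import sqlite3


def secs(col):
    # Hypothetical helper: SQL expression for a timestamp as Unix seconds.
    return f"CAST(strftime('%s', {col}) AS INTEGER)"


# Pattern from the question: the first later call that is > 600 s away.
ORIGINAL = f"""
SELECT SUM((
    SELECT {secs('c2.last_call')} - {secs('c.last_call')}
    FROM calls c2
    WHERE c2.calling_agent = c.calling_agent
      AND c2.last_call > c.last_call
      AND {secs('c2.last_call')} - {secs('c.last_call')} > 600
    ORDER BY c2.last_call LIMIT 1
)) FROM calls c
"""

# The next call first, then keep the gap only if it exceeds 600 s.
NEXT_CALL = f"""
SELECT SUM(CASE WHEN gap > 600 THEN gap END)
FROM (
    SELECT (
        SELECT {secs('c2.last_call')}
        FROM calls c2
        WHERE c2.calling_agent = c.calling_agent
          AND c2.last_call > c.last_call
        ORDER BY c2.last_call LIMIT 1
    ) - {secs('c.last_call')} AS gap
    FROM calls c
) AS g
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INTEGER, calling_agent INTEGER, last_call TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [(1, 9, "2024-03-04 09:00:00"), (2, 9, "2024-03-04 09:05:00"),
     (3, 9, "2024-03-04 09:30:00")],
)
print(conn.execute(ORIGINAL).fetchone()[0])   # 3300 (1800 + 1500)
print(conn.execute(NEXT_CALL).fetchone()[0])  # 1500
```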
--
FOLLOWUP
Q: could you please post the corrected query?
A: I don't have the table schema, or sample data, or a specification, so it's impossible for me to provide a "corrected" query.
For example, it's not at all clear why there's a predicate on reg_calling_agent in the outermost query, but the subqueries don't have any reference to that column, or any other column from the table in the outer query, except for the last_call column. The query to find a subsequent call is relying on the calling_agent column, not reg_calling_agent, but that's being performed for ALL calls in a given day of month.
I can take a shot at a query that may be closer to what you are looking for, but there is absolutely no guarantee that this is "correct" in terms of matching the schema, the datatypes, the actual data, or the expected output. A query that returns unexpected results is not an adequate specification.
SELECT a.calling_agent
, DATE_FORMAT(a.last_call,'%d, %W') AS `day`
, COUNT(a.call_id) AS `calls`
, SEC_TO_TIME(
SUM(
(SELECT IF(TIMESTAMPDIFF(SECOND, a.last_call, c.last_call) > 600
,TIMESTAMPDIFF(SECOND, a.last_call, c.last_call)
,NULL
) AS `gap`
FROM calls c
WHERE c.calling_agent = a.calling_agent
AND c.last_call > a.last_call
AND c.last_call < DATE(a.last_call)+INTERVAL 1 DAY
ORDER BY c.last_call
LIMIT 1)
)
) AS `breaks`
FROM calls a
WHERE a.reg_calling_agent = 9
AND a.last_call > DATE(NOW()) - INTERVAL 12 MONTH
GROUP BY a.calling_agent, DATE_FORMAT(a.last_call,'%d, %W')
ORDER BY a.calling_agent, DATE_FORMAT(a.last_call,'%d, %W') DESC
UNPACKING THE QUERY
I thought I'd provide some insight into the design of this query and what it's intended to do. I retained the FROM and WHERE clauses from the original outer query. I just gave an alias to the calls table, and rewrote the predicates in a form that I think is simpler, and that I'm more used to using.
For the GROUP BY, I added calling_agent, since it doesn't seem to make sense that we would want to lump all of the agents together. (It's really up to you to decide whether that matches the spec or not.) I did this because calling_agent is NOT referenced in the WHERE clause. (There's an equality predicate on reg_calling_agent, but that's a different column.)
I replaced the EXTRACT(DAY FROM ...) expression, since it returns only an integer value between 1 and 31, and it just doesn't seem to make sense to lump together the "4th day" of all months. I chose to use the expression that's in the SELECT list, because that's the normative pattern: return the expressions used in the GROUP BY clause in the SELECT list, so the client can tell which group identifier each row in the result belongs to.
I also qualified all column references with a table alias, as an aid to the future reader. We're used to following that pattern in complex queries, and it's natural to extend it to simpler queries, even when it's not required.
The big change is to the derived breaks column. (I renamed it from 'brakes', because it seems like what this query is doing is finding out when calling_agents weren't making/receiving calls, i.e. when workers were "taking a break". That's entirely a guess on my part.)
There's a SEC_TO_TIME function; all it does is reformat the total number of seconds as a TIME value.
There's a SUM() aggregate, which totals up the values for all the rows from a in each "group".
The real "meat" is the correlated subquery. For each row returned by the outer query (i.e. every row from calls that satisfies the WHERE clause on the outer query), we run another SELECT. That SELECT looks for the very "next" call made/received by the same calling_agent. To do that, the calling_agent on the "next" call needs to match the value from the row from the outer query...
WHERE c.calling_agent = a.calling_agent
Also, the datetime/timestamp of the subsequent "call" needs to be anytime after the datetime/timestamp of the row from the outer query...
AND c.last_call > a.last_call
And, we only want to look for calls that are on the same calendar date (year, month, day) as the previous call. (This prevents us from considering a call made four days later as a "subsequent" call.)
AND c.last_call < DATE(a.last_call)+INTERVAL 1 DAY
And, out of all those potential subsequent calls, we only want the first one, so we order them by datetime/timestamp, and then take just the first one.
ORDER BY c.last_call
LIMIT 1
If we don't get a row, the subquery will return a NULL. If we do get a row, the next thing we want to do is check if the datetime/timestamp on this call is more than 10 minutes after the previous call. We use the same TIMESTAMPDIFF expression from the original query, to derive the number of seconds between the calls, and we compare that to 10 minutes. If the gap is greater than 10 minutes, we consider this as a "break", and we return the difference as number of seconds. Otherwise, we just return a NULL, as if we hadn't found a "next" row.
IF(TIMESTAMPDIFF(SECOND, a.last_call, c.last_call) > 600
,TIMESTAMPDIFF(SECOND, a.last_call, c.last_call)
,NULL
) AS `gap`
That's MySQL-specific shorthand for the ANSI-standard form:
CASE
WHEN TIMESTAMPDIFF(SECOND, a.last_call, c.last_call) > 600
THEN TIMESTAMPDIFF(SECOND, a.last_call, c.last_call)
ELSE NULL
END AS `gap`
(NOTE: the ELSE NULL could be omitted, that would be functionally equivalent because NULL is the default when ELSE is omitted. I include it here for completeness, and for comparison to the MySQL IF() function.)
Finally, we include all of the expressions in the GROUP BY clause in the SELECT list. This isn't required, but it's the usual pattern. If those expressions are omitted, there should be a pretty obvious reason why. For example, if the outer query had an equality predicate on calling_agent, e.g.
AND a.calling_agent = 86
Then we'd know that any row returned by the query would have a value of 86 returned for calling_agent, so we could omit the expression from the SELECT list. But if we omit an equality predicate, or change it so that more than one calling_agent could be returned, something like:
AND (a.calling_agent = 86 OR a.calling_agent = 99)
then without calling_agent in the SELECT list, we won't be able to tell which rows are for which calling_agent. If we're going to the bother of doing a GROUP BY on the expression, we usually want to include the expression in the SELECT list; that's the normal pattern.
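MySQL 8.0 and later support window functions, so a different technique, LEAD(), can replace the correlated subquery. LEAD() returns the next row's last_call within a partition, so partitioning by agent and calendar date gives the same "next call on the same day" rule described above. The sketch below runs on SQLite (3.25 or later, for window functions) through Python's sqlite3. The secs() helper, the sample rows and the break_seconds alias are illustrative assumptions, and the reg_calling_agent filter is left out.

The sketch also groups by the calendar date rather than DATE_FORMAT(a.last_call,'%d, %W'). That formatted value still merges days that share a day-of-month and weekday, for example two different Mondays that both fall on the 5th within the 12-month window.

```python
import sqlite3


def secs(col):
    # Hypothetical helper: SQL expression for a timestamp as Unix seconds.
    return f"CAST(strftime('%s', {col}) AS INTEGER)"


GAP = f"{secs('next_call')} - {secs('last_call')}"

BREAKS_SQL = f"""
WITH ordered AS (
    SELECT calling_agent, call_id, last_call,
           date(last_call) AS call_date,
           -- Next call by the same agent on the same calendar date (NULL if none).
           LEAD(last_call) OVER (
               PARTITION BY calling_agent, date(last_call)
               ORDER BY last_call
           ) AS next_call
    FROM calls
)
SELECT calling_agent, call_date, COUNT(call_id) AS calls,
       COALESCE(SUM(CASE WHEN {GAP} > 600 THEN {GAP} END), 0) AS break_seconds
FROM ordered
GROUP BY calling_agent, call_date
ORDER BY calling_agent, call_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INTEGER, calling_agent INTEGER, last_call TEXT)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [(1, 9, "2024-03-04 09:00:00"), (2, 9, "2024-03-04 09:05:00"),
     (3, 9, "2024-03-04 09:30:00"), (4, 9, "2024-03-05 10:00:00"),
     (5, 7, "2024-03-04 08:00:00"), (6, 7, "2024-03-04 08:20:00")],
)
for row in conn.execute(BREAKS_SQL):
    print(row)
```

Agent 9 gets one 1500-second break on 2024-03-04 and none on 2024-03-05: the lone call that day has no "next" call on the same date. Agent 7 gets a single 1200-second break.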