Exclude zero from average result

I want the 12-week average of sales by day name (i.e. the last 12 Mondays, the last 12 Tuesdays, and so on),
but excluding the zero values that occur when a store is closed.
I wrote a script to try to exclude the zeros:
select
de.[Restaurant Name]
,dd.[DayNameOfWeek]
,dd.[FinancialWC]
,AVG([Net]) OVER (PARTITION BY dw.[RedCatID], dd.DayNameOfWeek
ORDER BY dw.[RedCatID], dd.[FullDate] ROWS 11 PRECEDING ) as [12 Week]
from Daily_Sales_Summary dw
inner join Restaurant_View de on de.RedCatID = dw.[RedCatID]
inner join DimDate dd on dd.[DateKey] = dw.[DateKey]
where [net] <> '0'
group by de.[Restaurant Name]
,dw.[RedCatID]
,dd.[DayNameOfWeek]
,dd.[FullDate]
,dd.[FinancialWC],[Net]
order by [Restaurant Name] asc
This gave me the averages I wanted, but it returned no rows at all for the days with a zero value.
How would I go about doing this without losing the weeks that contain zeros?
I am happy for the window to go back an extra week when a zero is present, instead of excluding it, if that is a simpler solution.

You need to be a little creative here to keep the rows in the result but not count them in the average. Try the code below and see if it works; I haven't tested it.
Basically, it turns each zero into NULL with NULLIF. AVG ignores NULLs, so the denominator of the calculation becomes the count of rows where [Net] <> 0.
EDIT: Found a simpler solution than what I originally posted.
select
de.[Restaurant Name]
,dd.[DayNameOfWeek]
,dd.[FinancialWC]
,avg(nullif([Net],0)) OVER (PARTITION BY dw.[RedCatID], dd.DayNameOfWeek
ORDER BY dw.[RedCatID], dd.[FullDate] ROWS 11 PRECEDING )
as [12 Week]
from Daily_Sales_Summary dw
inner join Restaurant_View de on de.RedCatID = dw.[RedCatID]
inner join DimDate dd on dd.[DateKey] = dw.[DateKey]
group by de.[Restaurant Name]
,dw.[RedCatID]
,dd.[DayNameOfWeek]
,dd.[FullDate]
,dd.[FinancialWC],[Net]
order by [Restaurant Name] asc
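To see why NULLIF keeps every row while dropping the zeros from the average, here is a minimal sketch. It runs against SQLite (3.25 or later, for window functions) from Python rather than against SQL Server. The sales table, its columns, and the 3-row window are made up for illustration; the real query would keep ROWS 11 PRECEDING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per store per Monday.
conn.execute("CREATE TABLE sales (store_id INTEGER, sale_date TEXT, net REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        (1, "2019-01-07", 100.0),
        (1, "2019-01-14", 0.0),  # store closed that day
        (1, "2019-01-21", 200.0),
        (1, "2019-01-28", 300.0),
    ],
)

rows = conn.execute(
    """
    SELECT sale_date,
           AVG(net)            OVER w AS avg_with_zeros,
           AVG(NULLIF(net, 0)) OVER w AS avg_without_zeros
    FROM sales
    WINDOW w AS (PARTITION BY store_id ORDER BY sale_date ROWS 2 PRECEDING)
    ORDER BY sale_date
    """
).fetchall()

for row in rows:
    print(row)
```

The closed day still appears in the output, but its zero no longer pulls the rolling average down.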

Related

MySQL - get users who placed 25th order during period

I have users and orders tables with this structure (simplified for question):
USERS
userid
registered(date)
ORDERS
id
date (order placed date)
user_id
I need to get array of users (array of userid) who placed their 25th order during specified period (for example in May 2019), date of 25th order for each user, number of days to place 25th order (difference between registration date for user and date of 25th order placed).
For example if user registered in April 2018, then placed 20 orders in 2018, and then placed 21-30th orders in Jan-May 2019 - this user should be in this array, if he placed 25th (overall for his account) order in May 2019.
How I can do this with MySQL request?
Sample data and structure: http://www.sqlfiddle.com/#!9/998358 (for testing you can get 3rd order as ex., not 25th, to not add a lot of sample data records).
One request is not required - if this can't be done in one request, few is possible and allowed.

You can use a correlated subquery to get the count of orders placed before the current one by a user. If that's 24 the current order is the 25th. Then check if the date is in the desired range.
SELECT o1.user_id,
o1.date,
datediff(o1.date, u1.registered)
FROM orders o1
INNER JOIN users u1
ON u1.userid = o1.user_id
WHERE (SELECT count(*)
FROM orders o2
WHERE o2.user_id = o1.user_id
AND (o2.date < o1.date
OR (o2.date = o1.date
AND o2.id < o1.id))) = 24
AND o1.date >= '2019-01-01'
AND o1.date < '2019-06-01';
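A small runnable sketch of the same correlated-subquery idea, using SQLite from Python instead of MySQL. It looks for the 3rd order rather than the 25th to keep the sample data short, and it replaces MySQL's datediff with a julianday subtraction. The sample rows are invented. Note the parentheses around the date/id comparison: without them, the OR branch would also count other users' orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userid INTEGER PRIMARY KEY, registered TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, date TEXT, user_id INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "2019-04-01"), (2, "2019-01-01")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2019-04-10", 1), (2, "2019-05-02", 1),
        (3, "2019-05-02", 1),  # same day as order 2: the id breaks the tie
        (4, "2019-05-20", 1),
        (5, "2019-02-01", 2), (6, "2019-03-01", 2),
        (7, "2019-04-15", 2),  # user 2's 3rd order falls outside May
        (8, "2019-05-05", 2),
    ],
)

n = 3  # which order to look for (25 in the question)
rows = conn.execute(
    """
    SELECT o1.user_id,
           o1.date,
           CAST(julianday(o1.date) - julianday(u1.registered) AS INTEGER) AS days
    FROM orders o1
    JOIN users u1 ON u1.userid = o1.user_id
    WHERE (SELECT COUNT(*)
           FROM orders o2
           WHERE o2.user_id = o1.user_id
             AND (o2.date < o1.date
                  OR (o2.date = o1.date AND o2.id < o1.id))) = ?
      AND o1.date >= ?
      AND o1.date < ?
    """,
    (n - 1, "2019-05-01", "2019-06-01"),
).fetchall()

print(rows)
```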

The basic inefficient way of doing this would be to get the user_id for every row in ORDERS where the date is in your target range AND the count of rows in ORDERS with the same user_id and a lower date is exactly 24.
This can get very ugly, very quickly, though.
If you're calling this from code you control, can't you do it from the code?
If not, there should be a way to assign to each row an index describing its rank among orders for its specific user_id, and select from this all user_id from rows with an index of 25 and a correct date. This will give you a select from select from select, but it should be much faster. The difficulty here is to control the order of the rows, so here are the selects I envision:
1. Select all rows, ordered by user_id asc, date asc, union-ed with nothing from a table made of the two vars you'll initialize at 0.
2. From this, select all while updating a var to know whether a row's user_id is the same as the previous row's, and add a field that reports it (so for each user_id the first row in order has a specific value like 0, while the other rows for the same user_id have a 1).
3. From this, select all plus a field that equals itself plus one when the first added field is 1, and restarts at 1 otherwise.
4. From this, select the user_id from the rows where the second added field is 25 and the date is in range.
The union thingy is only necessary if you need to do it all in one request (you have to initialize them in a lower select than the one they're used in).
Edit: Well if you need the date too you can just select it along with the user_id, but calculating the number of days in sql will be a pain. Just join the result table to the users table and get both the date of 25th order and their date of registration, you'll surely be able to do the difference in code.
I'll try building an actual request, however if you want to truly understand what you need to make this you gotta read up on mysql variables, unions, and conditional statements.
"Looks too complicated. I am sure that this can be done with current DB structure and 1-2 requests." Well, yeah. Use the COUNT request, it will be easy, and slow as hell.
For the complex answer, see http://www.sqlfiddle.com/#!9/998358/21
Since you can use multiple requests, you can just initialize the vars first.
It isn't actually THAT complicated; you just have to understand how to concretely express what you mean by "a user's 25th order" to a SQL engine.
See http://www.sqlfiddle.com/#!9/998358/24 for the difference in days, turns out there's a method for that.
Edit 5: seems you're going with the COUNT method. I'll pray your DB is small.
Edit 6: For posterity:
The count method will take years on very large databases. Since OP didn't come back, I'm assuming his is small enough to overlook query speed. If that's not your case and let's say it's 10 years from now and the sqlfiddle links are dead; here's the two-queries solution:
SET @PREV_USR:=0;
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT orders.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
orders
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
Just change RANK = ? and the conditions to fit your needs. If you want to fully understand it, start with the innermost SELECT and work your way outward; this version fuses points 1 & 2 of my explanation.
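If the variable juggling is hard to follow, the same logic can be traced in ordinary code. The sketch below is plain Python, not MySQL: prev_user and rank play the roles of the PREV_USR and RANK_USR variables in the query above, and the sample orders (with a 3rd-order target instead of the 25th) are invented.

```python
# Each tuple is (id, date_, user_id), mirroring the columns used in the query.
orders = [
    (1, "2019-04-10", 1), (2, "2019-05-02", 1), (3, "2019-05-02", 1),
    (4, "2019-05-20", 1), (5, "2019-02-01", 2), (6, "2019-03-01", 2),
    (7, "2019-04-15", 2), (8, "2019-05-05", 2),
]


def nth_orders(rows, n):
    """Return (user_id, date_) of every user's n-th order."""
    prev_user = None  # role of PREV_USR
    rank = 0          # role of RANK_USR
    result = []
    # Same ordering as the innermost SELECT: user, then date, then id.
    for order_id, date_, user_id in sorted(rows, key=lambda r: (r[2], r[1], r[0])):
        same_user = user_id == prev_user         # the SAME_USR column
        rank = rank + 1 if same_user else 1      # restart at 1 for a new user
        prev_user = user_id
        if rank == n:
            result.append((user_id, date_))
    return result


third = nth_orders(orders, 3)
# Apply the date filter last, as the outermost WHERE does.
in_may = [(u, d) for u, d in third if "2019-05-01" <= d < "2019-06-01"]
print(third, in_may)
```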
Now sometimes you will have to use an API or something that won't let you keep variable values in memory between statements, or that has some other restriction, and you'll need to do it in one query. To do that, you put the initialization one level lower and make sure it does not affect the higher statements. IMO the best way to do this is a UNION with a fake table whose only row is excluded. You'll avoid the hassle of a JOIN and it's just better overall.
SELECT user_id, date_ FROM (
SELECT user_id, date_, SAME_USR AS IGNORE_SMUSR,
@RANK_USR:=(CASE SAME_USR WHEN 0 THEN 1 ELSE @RANK_USR+1 END) AS RANK FROM (
SELECT DERIVED_4.*, CASE WHEN @PREV_USR = user_id THEN 1 ELSE 0 END AS SAME_USR,
@PREV_USR:=user_id AS IGNORE_USR FROM
(SELECT * FROM orders
UNION
SELECT * FROM (
SELECT (@PREV_USR:=0) AS INIT_PREV_USR, 0 AS COL_2, 0 AS COL_3
) AS DERIVED_3
WHERE INIT_PREV_USR <> 0
) AS DERIVED_4
ORDER BY user_id ASC, date_ ASC, id ASC
) AS DERIVED_1
) AS DERIVED_2
WHERE RANK = 25 AND YEAR(date_) = 2019 AND MONTH(date_) = 4 ;
With that method, the thing to watch for is the number and types of the columns in your base table. Here the first field of orders is an int, so I put INIT_PREV_USR first; there are two more fields, so I just add two zeroes with names and call it a day. Most types work, since the union doesn't actually add any rows, but I wouldn't try this when your first field is a blob (if worst comes to worst, you can use a JOIN).
You'll note this is derived from a pagination method in MySQL. If you want to apply this to other engines, just check out their best pagination calls and you should be able to work things out.
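On engines that have window functions (MySQL 8.0 and later, or SQLite 3.25 and later as used here), the per-user rank that the variables emulate can be written with ROW_NUMBER() instead. This is a different technique from the one in this answer, shown only as a sketch: it is run through Python's sqlite3 module, the sample rows are invented, and it targets the 3rd order rather than the 25th.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, date_ TEXT, user_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2019-04-10", 1), (2, "2019-05-02", 1), (3, "2019-05-02", 1),
        (4, "2019-05-20", 1), (5, "2019-02-01", 2), (6, "2019-03-01", 2),
        (7, "2019-04-15", 2), (8, "2019-05-05", 2),
    ],
)

rows = conn.execute(
    """
    SELECT user_id, date_
    FROM (
        SELECT user_id,
               date_,
               -- Number each user's orders from oldest to newest; id breaks ties.
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY date_, id) AS order_rank
        FROM orders
    ) AS ranked
    WHERE order_rank = ?
      AND date_ >= ?
      AND date_ < ?
    ORDER BY user_id
    """,
    (3, "2019-05-01", "2019-06-01"),
).fetchall()

print(rows)
```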

MYSQL join most recent record and multiply

I have a stock table and a stock history table, and I am basically trying to write a MySQL statement which will get the value of the stock on a particular day (in this case on the 31st of March), which can only be found by multiplying the cost per unit against what the balance for each item was on the particular day
So far I have :
SELECT
SUM(tbl_stock.cost_per_unit * tbl_stock_history.quantity_balance) as total
FROM
tbl_stock
LEFT JOIN
tbl_stock_history ON tbl_stock.part_ID = tbl_stock_history.part_ID
WHERE
tbl_stock_history.date_of_entry <= '20180331'
and tbl_stock.department = 1
AND tbl_stock.qty > 0
Unfortunately, this code sums ALL of the quantity_balance values found in each part's history instead of just the most recent one relative to the booking_date parameter.
I have tried all the solutions I could find with sub select queries but none of them were playing ball and I feel like I am missing something super obvious!
Any help is greatly appreciated!

I think that this is what you are looking for:
SELECT
SUM(tbl_stock.cost_per_unit * t.quantity_balance) as total
FROM
tbl_stock
LEFT JOIN
(
SELECT * FROM tbl_stock_history
WHERE date_of_entry <= '20180331' ORDER BY date_of_entry DESC limit 1
)
t on tbl_stock.part_ID = t.part_ID
WHERE tbl_stock.department = 1
AND tbl_stock.qty > 0
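One caveat: a derived table that ends in LIMIT 1 yields a single history row overall, so only one part can match the join. A possible per-part variant is sketched below: find each part's latest date_of_entry on or before the cut-off, then join back to pick up that row's balance. It runs on SQLite via Python rather than MySQL, uses ISO-formatted date strings, assumes at most one history row per part per date, and the sample figures are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_stock (part_ID INTEGER, cost_per_unit REAL, department INTEGER, qty INTEGER)"
)
conn.execute(
    "CREATE TABLE tbl_stock_history (part_ID INTEGER, date_of_entry TEXT, quantity_balance INTEGER)"
)
conn.executemany("INSERT INTO tbl_stock VALUES (?, ?, ?, ?)",
                 [(1, 2.0, 1, 5), (2, 5.0, 1, 8)])
conn.executemany(
    "INSERT INTO tbl_stock_history VALUES (?, ?, ?)",
    [
        (1, "2018-03-01", 10), (1, "2018-03-20", 7),
        (1, "2018-04-05", 3),   # after the cut-off, must be ignored
        (2, "2018-02-15", 4), (2, "2018-03-30", 6),
    ],
)

total = conn.execute(
    """
    SELECT SUM(s.cost_per_unit * h.quantity_balance) AS total
    FROM tbl_stock AS s
    JOIN (
        -- Latest history date per part, on or before the cut-off.
        SELECT part_ID, MAX(date_of_entry) AS latest
        FROM tbl_stock_history
        WHERE date_of_entry <= ?
        GROUP BY part_ID
    ) AS latest_entry ON latest_entry.part_ID = s.part_ID
    JOIN tbl_stock_history AS h
      ON h.part_ID = latest_entry.part_ID
     AND h.date_of_entry = latest_entry.latest
    WHERE s.department = 1
      AND s.qty > 0
    """,
    ("2018-03-31",),
).fetchone()[0]

print(total)  # 2.0 * 7 + 5.0 * 6
```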

Generating complex sql tables

I currently have an employee logging sql table that has 3 columns
fromState: String,
toState: String,
timestamp: DateTime
fromState is either In or Out. In means employee came in and Out means employee went out. Each row can only transition from In to Out or Out to In.
I'd like to generate a temporary table in sql to keep track during a given hour (hour by hour), how many employees are there in the company. Aka, resulting table has columns HourBucket, NumEmployees.
In non-SQL code I can do this by initializing the numEmployees as 0 and go through the table row by row (sorted by timestamp) and add (employee came in) or subtract (went out) to numEmployees (bucketed by timestamp hour).
I'm clueless as how to do this in SQL. Any clues?

Use a COUNT ... GROUP BY query. I can't see what you're using toState for from your description, though! I'm also assuming you have an employeeID field.
E.g.
SELECT fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable
INNER JOIN (SELECT employeeID AS 'empID', MAX(timestamp) AS 'latest' FROM StaffinBuildingTable GROUP BY employeeID) AS LastEntry
ON StaffinBuildingTable.employeeID = LastEntry.empID
AND StaffinBuildingTable.timestamp = LastEntry.latest
GROUP BY fromState
The LastEntry subquery produces one row per employeeID with that employee's last timestamp.
The INNER JOIN limits the main table to the rows that match both the employeeID and that last timestamp.
The outer GROUP BY produces the count.
For an hour-by-hour breakdown of a single day:
SELECT HOUR(SBT.timestamp) AS 'Hour', SBT.fromState AS 'Status', COUNT(*) AS 'Number'
FROM StaffinBuildingTable AS SBT
INNER JOIN (
SELECT SBIJ.employeeID AS 'empID', MAX(timestamp) AS 'latest'
FROM StaffinBuildingTable AS SBIJ
WHERE DATE(SBIJ.timestamp) = CURDATE()
GROUP BY SBIJ.employeeID) AS LastEntry ON SBT.employeeID = LastEntry.empID AND SBT.timestamp = LastEntry.latest
GROUP BY SBT.fromState, HOUR(SBT.timestamp)
Replace CURDATE() with whatever date you are interested in.
Note this is non-optimal, as it calculates HOUR twice: once for the data and once for the group.
Again, the INNER JOIN limits the number of returned rows, this time to each employee's last timestamp on the given day.

To me, your description of FromState and ToState seems the wrong way round; I'd expect to be doing this based on ToState. But assuming I'm wrong on that, the following should point you in the right direction:
First, I create a "Numbers" table containing 24 rows one for each hour of the day:
create table tblHours
(Number int);
insert into tblHours values
(0),(1),(2),(3),(4),(5),(6),(7),
(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23);
Then for each date in your employee logging table, I create a row in another new table to contain your counts:
create table tblDailyHours
(
HourBucket datetime,
NumEmployees int
);
insert into tblDailyHours (HourBucket, NumEmployees)
select distinct
date_add(date(t.timeStamp), interval h.Number HOUR) as HourBucket,
0 as NumEmployees
from
tblEmployeeLogging t
CROSS JOIN tblHours h;
Then I update this table to contain all the relevant counts:
update tblDailyHours h
join
(select
h2.HourBucket,
sum(case when el.fromState = 'In' then 1 else -1 end) as cnt
from
tblDailyHours h2
join tblEmployeeLogging el on
h2.HourBucket >= el.timeStamp
group by h2.HourBucket
) cnt ON
h.HourBucket = cnt.HourBucket
set NumEmployees = cnt.cnt;
You can now retrieve the counts with
select *
from tblDailyHours
order by HourBucket;
The counts give the number on site at each of the times displayed; if you want the number present during the hour in question, we'd need to tweak this a little.
There is a working version of this code (using not very realistic data in the logging table) here: rextester.com/DYOR23344
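In case that link goes dead, here is a compact sketch of the same idea for a single day: a running +1/-1 sum over all log rows up to the start of each hour. It uses SQLite from Python rather than MySQL, folds the three steps into one SELECT, and, like the code above, treats fromState = 'In' as an arrival. The table names, the date, and the log rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hours (n INTEGER)")
conn.executemany("INSERT INTO hours VALUES (?)", [(h,) for h in range(24)])
conn.execute("CREATE TABLE employee_log (fromState TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO employee_log VALUES (?, ?)",
    [
        ("In", "2019-05-01 08:15:00"),
        ("In", "2019-05-01 08:45:00"),
        ("Out", "2019-05-01 12:30:00"),
        ("Out", "2019-05-01 17:05:00"),
    ],
)

rows = conn.execute(
    """
    SELECT h.n AS hour_of_day,
           -- Unmatched hours produce a NULL fromState, which counts as 0.
           SUM(CASE el.fromState WHEN 'In' THEN 1 WHEN 'Out' THEN -1 ELSE 0 END)
               AS num_employees
    FROM hours AS h
    LEFT JOIN employee_log AS el
           ON el.ts <= printf('2019-05-01 %02d:00:00', h.n)
    GROUP BY h.n
    ORDER BY h.n
    """
).fetchall()

for hour_of_day, num_employees in rows:
    print(hour_of_day, num_employees)
```

Each row is the head count at the start of that hour, which matches the "number on site at each of the times displayed" reading rather than a count of everyone present at some point during the hour.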
Original Answer (based on a single overall count)
If you're happy to search over all rows and want the current "head count", you can use this:
select
sum(case when t.FromState = 'In' then 1 else -1 end) as Heads
from
MyTable t
But if you know that there will always be no-one there at midnight, you can add a where clause to prevent it looking at more rows than it needs to:
where
date(t.timestamp) = curdate()
Again, on the assumption that the head count reaches zero at midnight, you can generalise that method to get a headcount at any time as follows:
where
date(t.timestamp) = "CENSUS DATE" AND
t.timestamp <= "CENSUS DATETIME"
Obviously you'd need to replace my quoted strings with code which returned the date and datetime of interest. If the headcount doesn't return to zero at midnight, you can achieve the same by removing the first line of the where clause.

AVG or SUM in SQL where the values are being calculated on the fly

I have an existing SQL query that gets call stats from a Zultys MX250 phone system: -
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS '#Calls'
FROM
session s
JOIN mxuser u ON
s.ExtensionID1 = u.ExtensionId
OR s.ExtensionID2 = u.ExtensionId
WHERE
s.ServiceExtension1 IS NULL
AND s.connecttimestamp >= CURRENT_DATE
AND BINARY u.userprofilename = BINARY 'DBAM'
GROUP BY
u.firstname,
u.lastname
ORDER BY
'#Calls' DESC,
Duration DESC;
Output is as follows: -
Name Duration #Calls
TH 01:19:10 30
AS 00:44:59 28
EW 00:51:13 22
SH 00:21:20 13
MG 00:12:04 8
TS 00:42:02 5
DS 00:00:12 1
I am trying to generate a 4th column that shows the average call time for each user, but am struggling to figure out how.
Mathematically it's just "'Duration' / '#Calls'" but after looking at some similar questions on StackOverflow, the example queries are too simple to help me relate to my one above.
Right now, I'm not even sure that it's going to be possible to divide the time column by the number of calls.
UPDATE: I was so close in my testing but got all confused and overcomplicated things. Here's the latest SQL (thanks to @McAdam331 & my buddy Jim from work): -
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS '#Calls',
sec_to_time(SUM(time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)) / COUNT(*)) AS Average
FROM
session s
JOIN mxuser u ON
s.ExtensionID1 = u.ExtensionId
OR s.ExtensionID2 = u.ExtensionId
WHERE
s.ServiceExtension1 IS NULL
AND s.connecttimestamp >= CURRENT_DATE
AND BINARY u.userprofilename = BINARY 'DBAM'
GROUP BY
u.firstname,
u.lastname
ORDER BY
Average DESC;
Output is as follows: -
Name Duration #Calls Average
DS 00:14:25 4 00:03:36
MG 00:17:23 11 00:01:34
TS 00:33:38 22 00:01:31
EW 01:04:31 43 00:01:30
AS 00:49:23 33 00:01:29
TH 00:43:57 35 00:01:15
SH 00:13:51 12 00:01:09

Well, you are able to get the number of total seconds, as you do before converting it to time. Why not take the number of total seconds, divide that by the number of calls, and then convert that back to time?
SELECT sec_to_time(
SUM(time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)) / COUNT(*))
AS averageDuration

If I understand correctly, you can just replace sum() with avg():
SELECT
CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name,
sec_to_time(SUM(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS Duration,
COUNT(*) AS `#Calls`,
sec_to_time(AVG(
time_to_sec(s.disconnecttimestamp) - time_to_sec(s.connecttimestamp)
)) AS AvgDuration
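A quick way to convince yourself that SUM / COUNT and AVG agree is to run both on a few rows. The sketch below uses SQLite from Python, not MySQL: strftime('%s', ...) stands in for the seconds conversion, and the small sec_to_time helper is a hypothetical Python stand-in for MySQL's SEC_TO_TIME. The three sample calls are invented.

```python
import sqlite3


def sec_to_time(seconds):
    """Format a number of seconds as HH:MM:SS (stand-in for SEC_TO_TIME)."""
    hours, rest = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (connecttimestamp TEXT, disconnecttimestamp TEXT)")
conn.executemany(
    "INSERT INTO session VALUES (?, ?)",
    [
        ("2019-05-01 09:00:00", "2019-05-01 09:01:00"),  # 60 s
        ("2019-05-01 10:00:00", "2019-05-01 10:01:30"),  # 90 s
        ("2019-05-01 23:59:00", "2019-05-02 00:01:30"),  # 150 s, crosses midnight
    ],
)

total, calls, average = conn.execute(
    """
    SELECT SUM(dur), COUNT(*), AVG(dur)
    FROM (
        SELECT CAST(strftime('%s', disconnecttimestamp) AS INTEGER)
             - CAST(strftime('%s', connecttimestamp) AS INTEGER) AS dur
        FROM session
    )
    """
).fetchone()

print(sec_to_time(total), calls, sec_to_time(total / calls), sec_to_time(average))
```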

Seems like all you need is another expression in the SELECT list. The SUM() aggregate (from the second expression) divided by COUNT aggregate (the third expr). Then wrap that in a sec_to_time function. (Unless I'm totally missing the question.)
Personally, I'd use the TIMESTAMPDIFF function to get a difference in times.
SEC_TO_TIME(
SUM(TIMESTAMPDIFF(SECOND,s.connecttimestamp,s.disconnecttimestamp))
/ COUNT(*)
) AS avg_duration
If what you are asking is there's a way to reference other expressions in the SELECT list by the alias... the answer is unfortunately, there's not.
With a performance penalty, you could use your existing query as an inline view, then in the outer query, the alias names assigned to the expressions are available...
SELECT t.Name
, SEC_TO_TIME(t.TotalDur) AS Duration
, t.`#Calls`
, SEC_TO_TIME(t.TotalDur/t.`#Calls`) AS avgDuration
FROM (
SELECT CONCAT(LEFT(u.firstname,1),LEFT(u.lastname,1)) AS Name
, SUM(TIMESTAMPDIFF(SECOND,s.connecttimestamp,s.disconnecttimestamp)) AS TotalDur
, COUNT(1) AS `#Calls`
FROM session s
-- the rest of your query
) t

MySQL cumulative sum grouped by date

I know there have been a few posts related to this, but my case is a little bit different and I wanted to get some help on this.
I need to pull some data out of the database: a cumulative count of interactions by day. Currently this is what I have:
SELECT
e.Date AS e_date,
count(e.ID) AS num_interactions
FROM example AS e
JOIN example e1 ON e1.Date <= e.Date
GROUP BY e.Date;
The output of this is close to what I want but not exactly what I need.
The problem I'm having is that the dates are stored with the hour, minute, and second at which the interaction happened, so the GROUP BY is not grouping days together.
On 12-23 there are 5 interactions, but they are not grouped because the timestamps differ. So I need to find a way to ignore the time and just look at the day.
If I try GROUP BY DAY(e.Date), it groups the data by the day of the month only (i.e. everything that happened on the 1st of any month is grouped into one row), and the output is not what I want at all.
GROUP BY DAY(e.Date), MONTH(e.Date) splits it up by month and day of the month, but again the count is off.
I'm not a MySQL expert at all, so I'm puzzled about what I'm missing.

New Answer
At first, I didn't understand you were trying to do a running total. Here is how that would look:
SET @runningTotal = 0;
SELECT
e_date,
num_interactions,
@runningTotal := @runningTotal + totals.num_interactions AS runningTotal
FROM
(SELECT
DATE(e.Date) AS e_date,
COUNT(*) AS num_interactions
FROM example AS e
GROUP BY DATE(e.Date)) totals
ORDER BY e_date;
Original Answer
You could be getting duplicates because of your join. Maybe e1 has more than one match for some rows which is inflating your count. Either that or the comparison in your join is also comparing the seconds, which is not what you expect.
Anyhow, instead of chopping the datetime field into days and months, just strip the time from it. Here is how you do that.
SELECT
DATE(e.Date) AS e_date,
count(e.ID) AS num_interactions
FROM example AS e
JOIN example e1 ON DATE(e1.Date) <= DATE(e.Date)
GROUP BY DATE(e.Date);

I figured out what I needed to do last night... but since I'm new to this I couldn't post it then... what I did that worked was this:
SELECT
DATE(e.Date) AS e_date,
count(e.ID) AS num_daily_interactions,
(
SELECT
COUNT(id)
FROM example
WHERE DATE(Date) <= e_date
) as total_interactions_per_day
FROM example AS e
GROUP BY e_date;
Would that be less efficient than your query? I may just do the calculation in Python after pulling out the count per day if that's more efficient, because this will be on the scale of thousands to hundreds of thousands of rows returned.
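For what it's worth, the "count per day in SQL, running total in Python" route might look like the sketch below. It uses an in-memory SQLite database as a stand-in for the real MySQL connection, and the sample timestamps are invented (including the five interactions on 12-23).

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (ID INTEGER PRIMARY KEY, Date TEXT)")
timestamps = [
    "2014-12-22 09:00:00", "2014-12-22 11:30:00", "2014-12-22 17:45:00",
    "2014-12-23 08:00:00", "2014-12-23 08:05:00", "2014-12-23 12:00:00",
    "2014-12-23 15:20:00", "2014-12-23 22:10:00",
    "2014-12-24 10:00:00", "2014-12-24 10:01:00",
]
conn.executemany("INSERT INTO example (Date) VALUES (?)", [(t,) for t in timestamps])

# Step 1: one row per calendar day, with the time part stripped by DATE().
daily = conn.execute(
    """
    SELECT DATE(Date) AS e_date, COUNT(ID) AS num_interactions
    FROM example
    GROUP BY DATE(Date)
    ORDER BY e_date
    """
).fetchall()

# Step 2: a single pass over the daily counts builds the cumulative total.
totals = accumulate(count for _, count in daily)
running = [(day, count, total) for (day, count), total in zip(daily, totals)]

print(running)
```

This way the database only groups the rows once and the client does one linear pass over the days, instead of one subquery per day.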
