Count number of days an employee missed work - mysql

I have a table which stores information about employees and one of those fields is Date. I want to make a query that returns a count of the number of days they have missed, not including weekends. Date format is '2018-1-1' for example, consecutive days would be '2018-1-2', '2018-1-3', and if next record is '2018-1-5', then count would increase by 1 because 2018-1-4 was a Thursday and they should have a record for that day.
Any ideas on how to best do this?
What I have so far:
SELECT * FROM `time` where name like 'John' AND DayOfWeek(Date) not like 7
and dayofweek(Date) not like 1
ORDER BY `time`.`Date` ASC
This is giving me all of the records for John excluding Saturdays and Sundays. What I want to do now is somehow find gaps between the dates that the records have for workdays. For example, consecutive days would be '2018-1-2', '2018-1-3', and if next record is '2018-1-5', then count would increase by 1 because 2018-1-4 was a Thursday

To make this possible you'll need a helper table, which can be useful also for many other purposes: a table with one column that has natural numbers starting from 0 up to some large n. You could create it like this:
create table nums (i int);
insert into nums values (0), (1), (2), (3);
insert into nums select i+4 from nums;
insert into nums select i+8 from nums;
insert into nums select i+16 from nums;
insert into nums select i+32 from nums;
insert into nums select i+64 from nums;
insert into nums select i+128 from nums;
insert into nums select i+256 from nums;
You can see how you double the number of records by adding a similar insert statement, but this will generate 512 records, which would be enough for your purposes.
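If you want to convince yourself that the doubling inserts really produce 0 through 511 with no gaps or duplicates, here is a minimal sketch. It runs the same statements through Python's built-in sqlite3 module, which is my assumption purely to keep the example self-contained; the loop stands in for the seven hand-written insert statements.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table nums (i int)")
conn.execute("insert into nums values (0), (1), (2), (3)")

# Each pass copies every existing row shifted by the current row count,
# so the table doubles: 4 -> 8 -> 16 -> ... -> 512.
offset = 4
while offset <= 256:
    conn.execute("insert into nums select i + ? from nums", (offset,))
    offset *= 2

count, lowest, highest, distinct = conn.execute(
    "select count(*), min(i), max(i), count(distinct i) from nums"
).fetchone()
print(count, lowest, highest, distinct)
```

One more pass with an offset of 512 would take the table to 1024 rows, and so on.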
Then you can use this query to answer your question:
SELECT ref_date
FROM (
SELECT date_add('2018-01-01', interval i day) ref_date
FROM nums
) calendar
WHERE ref_date <= curdate()
AND dayofweek(ref_date) not in (1, 7)
AND ref_date NOT IN (
SELECT Date
FROM `time`
WHERE name = 'John'
)
See also SQLfiddle
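The query above lists the missed dates; since the question asks for a count, wrapping the same idea in COUNT(*) should be enough. Below is a rough sketch of that, run through Python's sqlite3 module so it is self-contained. The sample rows, the fixed end date (in place of curdate()) and the parameter names are made up for illustration, and `date(...)` and `strftime('%w', ...)` are SQLite stand-ins for MySQL's `date_add` and `dayofweek`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table nums (i int)")
conn.executemany("insert into nums values (?)", [(i,) for i in range(31)])

# Hypothetical sample data: John has no record for Jan 4, 10 and 11.
conn.execute('create table "time" (name text, "Date" text)')
present = ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05",
           "2018-01-08", "2018-01-09", "2018-01-12"]
conn.executemany('insert into "time" values (?, ?)',
                 [("John", d) for d in present])

missed_sql = """
SELECT COUNT(*)
FROM (
    SELECT date(:first_day, '+' || i || ' days') AS ref_date
    FROM nums
) calendar
WHERE ref_date <= :last_day
  AND strftime('%w', ref_date) NOT IN ('0', '6')  -- 0 = Sunday, 6 = Saturday
  AND ref_date NOT IN (SELECT "Date" FROM "time" WHERE name = :name)
"""
missed = conn.execute(
    missed_sql,
    {"first_day": "2018-01-01", "last_day": "2018-01-12", "name": "John"},
).fetchone()[0]
print(missed)
```

In MySQL the equivalent change would be to replace `SELECT ref_date` with `SELECT COUNT(*)` in the outer query and leave the rest as it is.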

Related

SQL Find the average of 3 day closest

I have an SQL structure like this:
Create Table Transactions (
Id integer primary key not null auto_increment,
ResourceId varchar(255),
Price Integer,
TransactionTime date
);
I would like to get the time (TransactionTime) along with the average of 3 days price. For example, the 3 day average of the 22nd will be the average of the 20th, 21st, and 22nd.
Thanks so much.
Presumably, you want this information on each row and for a given resource. If so:
select t.*,
avg(price) over (partition by resourceid
order by transactiontime
range between interval 2 day preceding and current row
) as avg_3
from transactions t;
For SQL server:
SELECT AVG(Price), MAX(TransactionTime) FROM Transactions GROUP BY FLOOR(DATEDIFF(DAY, GETDATE(), TransactionTime) / 3);
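You can use a nested select:
select t.TransactionTime,
-- dividing by 3 assumes there is one row for each of the three days
(select sum(t1.Price) / 3
from Transactions t1
where t1.TransactionTime between t.TransactionTime - interval 2 day and t.TransactionTime) as avg3
from Transactions t;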

How to get relative counts/frequency in mySQL with a single query

I want to get relative counts/frequency of values (can be many) in the column.
From this toy table numbers:
num
1
2
3
1
1
2
1
0
This one:
num | count
0 | 0.125
1 | 0.5
2 | 0.25
3 | 0.125
I can do this with a variable and two queries:
SET @total = (SELECT COUNT(*) FROM numbers);
SELECT num, ROUND(COUNT(*) / @total, 3) AS count
FROM numbers
GROUP BY num
ORDER BY num ASC
But how I can get the results in one query (without listing all the possible values of num)?
If I am querying joins of several tables, then even getting a total number of rows becomes quite long and ugly.
EDIT: I misread the question; this was tested in MS SQL Server, not MySQL.
You can try this:
--DROP TABLE numbers
CREATE TABLE numbers(num decimal(16,3))
INSERT INTO numbers VALUES(1)
INSERT INTO numbers VALUES(2)
INSERT INTO numbers VALUES(3)
INSERT INTO numbers VALUES(1)
INSERT INTO numbers VALUES(1)
INSERT INTO numbers VALUES(2)
INSERT INTO numbers VALUES(1)
INSERT INTO numbers VALUES(0)
SELECT
num,
CAST(numCount as DECIMAL(16,2)) / CAST(sum(numCount) over() AS decimal(16,2)) frequency
FROM (
SELECT
num,
count(num) numCount
FROM
numbers
GROUP BY
num
) numbers
num frequency
0.000 0.1250000000000000000
1.000 0.5000000000000000000
2.000 0.2500000000000000000
3.000 0.1250000000000000000
You can use windowing functions -
SELECT DISTINCT num,
ROUND(CAST(COUNT(1) OVER (Partition by num) AS DECIMAL) / CAST(COUNT(1)OVER() AS DECIMAL),3) AS [count]
FROM numbers
ORDER BY num ASC
COUNT(num) would give the same results. It's a personal preference of mine to count a supplied value per row rather than the value in the rows; the partitioning handles which rows are included in the count.
Note the counts need to be cast as decimal, otherwise your division will be integer division, giving you wrong numbers.
Using DISTINCT instead of GROUP lets your windowing function apply to the whole table, not just each group within that table, and still only returns one result per num.
SQLFiddle
This is about the same number of keystrokes, and about the same performance, but it is only one statement:
SELECT n.num, ROUND(COUNT(*) / t.total, 3) AS count
FROM ( SELECT COUNT(*) AS total FROM numbers ) AS t
JOIN numbers AS n
GROUP BY n.num
ORDER BY n.num ASC
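For what it's worth, here is a quick sanity check of the single-statement version against the toy table from the question. It is only a sketch, run through Python's sqlite3 module to keep it self-contained. One difference to be aware of: SQLite does integer division on integers, so I multiply by 1.0, which MySQL's `/` does not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numbers (num INTEGER)")
conn.executemany("INSERT INTO numbers VALUES (?)",
                 [(n,) for n in (1, 2, 3, 1, 1, 2, 1, 0)])

# The one-row derived table t carries the total; joining it without a
# condition attaches that total to every row of numbers.
rows = conn.execute("""
    SELECT n.num, ROUND(COUNT(*) * 1.0 / t.total, 3) AS frequency
    FROM (SELECT COUNT(*) AS total FROM numbers) AS t
    JOIN numbers AS n
    GROUP BY n.num
    ORDER BY n.num ASC
""").fetchall()
for num, frequency in rows:
    print(num, frequency)
```

The frequencies come out as 0.125, 0.5, 0.25 and 0.125 for 0 through 3, matching the table in the question.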

Cumulative SQL to work out no of payers

Currently trying to create a query that shows how many accounts have paid month on month but on a cumulative basis (penetration). So as an example I have a table with Month paid and account number, which shows what month that account paid.
Month | AccountNo
Jan-14 | 123456
Feb-14 | 321654
So using the above the result set would show
Month | Payers
Jan-14 | 1
Feb-14 | 2
This is because one account paid in Jan and one in Feb, so by the end of Feb there have been 2 payments overall, but only 1 by the end of Jan. I tried a few inner joins back onto the table itself with t1.Month >= t2.Month, as I would for a normal cumulative query, but the result is always wrong.
Any questions please ask, unsure if the above will be clear to anyone but me.
If you have date in the table then you can try the following query.
SELECT [Month]
,(SELECT COUNT(AccountNo)
FROM theTable i
-- This is to make sure to add until the last day of the current month.
WHERE i.[Date] <= DATEADD(s,-1,DATEADD(mm, DATEDIFF(m,0,o.[Date])+1,0))) AS CumulativeCount
FROM theTable o
Ok, several things. You need to have an actual date field, as you can't order by the month column you have.
You need to consider there may be gaps in the months - i.e. some months where there is no payment (not sure if that is true or not)
I'd recommend a recursive common table expression to do the actual aggregation
Here's how it works out:
-- setup
DECLARE @t TABLE ([Month] NCHAR(6), AccountNo INT)
INSERT @t ( [Month], AccountNo )
VALUES ( 'Jan-14',123456),('Feb-14',456789),('Apr-14',567890)
-- assume no payments in march
; WITH
t2 AS -- get a date column we can sort on
(
SELECT [Month],
CONVERT(DATETIME, '01 ' + REPLACE([Month], '-',' '), 6) AS MonthStart,
AccountNo
FROM @t
),
t3 AS -- group by to get the number of payments in each month
(
SELECT [Month], MonthStart, COUNT(1) AS PaymentCount FROM t2
GROUP BY t2.[Month], t2.MonthStart
),
t4 AS -- get a row number column to order by (accounting for gaps)
(
SELECT [Month], MonthStart, PaymentCount,
ROW_NUMBER() OVER (ORDER BY MonthStart) AS rn FROM t3
),
t5 AS -- recursive common table expression to aggregate subsequent rows
(
SELECT [Month], MonthStart, PaymentCount AS CumulativePaymentCount, rn
FROM t4 WHERE rn = 1
UNION ALL
SELECT t4.[Month], t4.MonthStart,
t4.PaymentCount + t5.CumulativePaymentCount AS CumulativePaymentCount, t4.rn
FROM t5 JOIN t4 ON t5.rn + 1 = t4.rn
)
SELECT [Month], CumulativePaymentCount FROM t5 -- select desired results
and the results...
Month CumulativePaymentCount
Jan-14 1
Feb-14 2
Apr-14 3
If your month column is a date type then it's easy to work with; otherwise you need some additional conversion. Here is the query:
create table example (
MONTHS datetime,
AccountNo INT
)
GO
insert into example values ('01/Jan/2009',300345)
insert into example values ('01/Feb/2009',300346)
insert into example values ('01/Feb/2009',300347)
insert into example values ('01/Mar/2009',300348)
insert into example values ('01/Feb/2009',300349)
insert into example values ('01/Mar/2009',300350)
SELECT distinct datepart (m,months),
(SELECT count(accountno)
FROM example b
WHERE datepart (m,b.MONTHS) <= datepart (m,a.MONTHS)) AS Total FROM example a
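As a point of comparison, here is a small sketch of the "join back onto the table" idea from the question, grouped on the distinct months so each month gets one cumulative row. It compares full dates rather than month numbers, so it should keep working across a year boundary. It is run through Python's sqlite3 module just to be self-contained; the `month_start` column name and the ISO date strings are my assumptions, with the same six sample rows as above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (month_start TEXT, AccountNo INTEGER)")
conn.executemany("INSERT INTO example VALUES (?, ?)", [
    ("2009-01-01", 300345), ("2009-02-01", 300346), ("2009-02-01", 300347),
    ("2009-03-01", 300348), ("2009-02-01", 300349), ("2009-03-01", 300350),
])

# For every distinct month m, count all payments made in m or earlier.
cumulative = conn.execute("""
    SELECT m.month_start, COUNT(*) AS Payers
    FROM (SELECT DISTINCT month_start FROM example) AS m
    JOIN example AS p ON p.month_start <= m.month_start
    GROUP BY m.month_start
    ORDER BY m.month_start
""").fetchall()
for month_start, payers in cumulative:
    print(month_start, payers)
```

With one payment in January, three in February and two in March, the cumulative counts are 1, 4 and 6.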

Running total over date range - fill in the missing dates

I have the following table.
DATE | AMT
10/10 | 300
12/10 | 300
01/11 | 200
03/11 | 100
How do I get the monthly total? A result like -
DATE | TOT
1010 | 300
1110 | 300
1210 | 600
0111 | 800
0211 | 800
0311 | 900
A sql statement like
SELECT SUM(AMT) FROM TABLE1 WHERE DATE BETWEEN '1010' AND '0111'
would result in the 800 for 0111 but...
NOTE There is not a date restriction. which is my dilemma. How do I populate this column without doing a loop for all dates and have the missing months displayed as well?
To cater for missing months, create a template table to join against.
Think of it as caching. Rather than looping through and filling gaps, just have a calendar cached in your database.
You can even combine multiple calendars (start of month, start of week, bank holidays, working day, etc) all into one table, with a bunch of search flags and indexes.
You end up with something like...
SELECT
calendar.date,
SUM(data.amt)
FROM
calendar
LEFT JOIN
data
ON data.date >= calendar.date
AND data.date < calendar.date + INTERVAL 1 MONTH
WHERE
calendar.date >= '20110101'
AND calendar.date < '20120101'
GROUP BY
calendar.date
EDIT
I just noticed that the OP wants a running total.
This -is- possible in SQL but it is extremely inefficient. The reason being that the result from one month isn't used to calculate the following month. Instead the whole running-total has to be calculated again.
For this reason it is normally strongly recommended that you calculate the monthly total as above, then use your application to iterate through and build the running total values.
If you really must do it in SQL, it would be something like...
SELECT
calendar.date,
SUM(data.amt)
FROM
calendar
LEFT JOIN
data
ON data.date >= @yourFirstDate
AND data.date < calendar.date + INTERVAL 1 MONTH
WHERE
calendar.date >= @yourFirstDate
AND calendar.date < @yourLastDate
GROUP BY
calendar.date
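To make the calendar-table idea concrete, here is a minimal sketch using the four amounts from the question. It runs through Python's sqlite3 module so it is self-contained. The table layout is an assumption of mine: the calendar stores each month's start plus the start of the following month, which avoids dialect-specific date arithmetic in the join, and the exact days of the payments are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calendar (month_start TEXT, next_month_start TEXT)")
conn.executemany("INSERT INTO calendar VALUES (?, ?)", [
    ("2010-10-01", "2010-11-01"), ("2010-11-01", "2010-12-01"),
    ("2010-12-01", "2011-01-01"), ("2011-01-01", "2011-02-01"),
    ("2011-02-01", "2011-03-01"), ("2011-03-01", "2011-04-01"),
])
conn.execute("CREATE TABLE data (date TEXT, amt INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", [
    ("2010-10-10", 300), ("2010-12-12", 300),
    ("2011-01-01", 200), ("2011-03-03", 100),
])

# Every calendar month picks up all payments dated before the next month
# starts, so months without payments repeat the previous running total.
running = conn.execute("""
    SELECT c.month_start, COALESCE(SUM(d.amt), 0) AS tot
    FROM calendar AS c
    LEFT JOIN data AS d ON d.date < c.next_month_start
    GROUP BY c.month_start
    ORDER BY c.month_start
""").fetchall()
for month_start, tot in running:
    print(month_start, tot)
```

The totals come out as 300, 300, 600, 800, 800, 900, which is the result asked for, including the two months with no payments.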
The main problem is the "and have the missing months displayed as well?" part.
I don't see how to do it without an auxiliary table containing the month/year combinations to be displayed:
create table table1(
date datetime,
amt int
)
insert into table1 values ('10/10/2010',100)
insert into table1 values ('12/12/2010',200)
insert into table1 values ('01/01/2011',50)
insert into table1 values ('03/03/2011',500)
truncate table #dates
create table #dates(
_month int,
_year int
)
insert into #dates values(10,2010)
insert into #dates values(11,2010) --missing month
insert into #dates values(12,2010)
insert into #dates values(01,2011)
insert into #dates values(02,2011)--missing month
insert into #dates values(03,2011)
select D._month, D._year, sum(amt)
from #dates D left join TABLE1 T on D._month=month(T.date) and D._year=year(T.date)
group by D._month, D._year
You can also generate a range on the fly, pass its value as the interval to DATE_ADD, and basically project a sequence of month values.
As @Dems said, you need to have a correlated subquery calculate the running total, which will be very inefficient, because it will run a nested loop internally.
To see how to generate the sequence, check my post here:
How to generate a range of numbers in Mysql
The end query should look something like this: (Incidentally, you should have a date column, not this varchar mess).
/*NOTE: This assumes a derived table (inline view) containing the sequence of date values and their corresponding TOT value*/
SELECT
DATEVALUES.DateValue,
(
SELECT SUM(TABLE1.AMT) FROM TABLE1 WHERE TABLE1.DateValue <= DATEVALUES.DateValue
) AS RunningSubTotal
FROM
DATEVALUES
Or something like that.
select sum(AMT) from TABLE1 group by Date

SQL Work out the average time difference between total rows

I've searched around SO and can't seem to find a question with an answer that works fine for me. I have a table with almost 2 million rows in, and each row has a MySQL Date formatted field.
I'd like to work out (in seconds) how often a row was inserted, so work out the average difference between the dates of all the rows with a SQL query.
Any ideas?
-- EDIT --
Here's what my table looks like
id, name, date (datetime), age, gender
If you want to know how often (on average) a row was inserted, I don't think you need to calculate all the differences. You only need to sum up the differences between adjacent rows (adjacent based on the timestamp) and divide the result by the number of the summands.
The formula
((T1 - T0) + (T2 - T1) + … + (TN - T(N-1))) / N
can obviously be simplified to merely
(TN-T0) / N
So, the query would be something like this:
SELECT TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / (COUNT(*) - 1)
FROM atable
Make sure the number of rows is more than 1, or you'll get the Division By Zero error. Still, if you like, you can prevent the error with a simple trick:
SELECT
IFNULL(TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / NULLIF(COUNT(*) - 1, 0), 0)
FROM atable
Now you can safely run the query against a table with a single row.
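If you'd like to check the simplification on a small sample, here is a rough sketch that compares the shortcut query with the mean of the adjacent differences computed the long way. It uses Python's sqlite3 module to stay self-contained, so `strftime('%s', ...)` stands in for MySQL's TIMESTAMPDIFF; the four timestamps and the helper name `average_gap` are made up for illustration.

```python
import sqlite3
from datetime import datetime

stamps = ["2020-01-01 00:00:00", "2020-01-01 00:00:10",
          "2020-01-01 00:00:40", "2020-01-01 00:01:40"]

AVG_GAP_SQL = """
SELECT IFNULL(
    (strftime('%s', MAX(date)) - strftime('%s', MIN(date))) * 1.0
        / NULLIF(COUNT(*) - 1, 0),
    0)
FROM atable
"""

def average_gap(rows):
    """Load the rows into a scratch table and run the shortcut query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE atable (date TEXT)")
    conn.executemany("INSERT INTO atable VALUES (?)", [(r,) for r in rows])
    return conn.execute(AVG_GAP_SQL).fetchone()[0]

# The long way: average the gaps between adjacent timestamps.
parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]
gaps = [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]
pairwise_mean = sum(gaps) / len(gaps)

shortcut = average_gap(stamps)
single_row = average_gap(stamps[:1])  # exercises the NULLIF guard
print(pairwise_mean, shortcut, single_row)
```

Both approaches agree on the sample, and the single-row call returns 0 instead of failing on a division by zero.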
Give this a shot:
select AVG(theDelay) from (
select TIMESTAMPDIFF(SECOND,a.date, b.date) as theDelay
from myTable a
join myTable b on b.date = (select MIN(x.date)
from myTable x
where x.date > a.date)
) p
The inner query joins each row with the next row (by date) and returns the number of seconds between them. That query is then encapsulated and is queried for the average number of seconds.
EDIT: If your ID column is auto-incrementing and they are in date order, you can speed it up a bit by joining to the next ID row rather than the MIN next date.
select AVG(theDelay) from (
select TIMESTAMPDIFF(SECOND,a.date, b.date) as theDelay
from myTable a
join myTable b on b.id = (select MIN(x.id)
from myTable x
where x.id > a.id)
) p
EDIT2: As brilliantly commented by Mikael Eriksson, you may be able to just do:
select TIMESTAMPDIFF(SECOND, MIN(date), MAX(date)) / COUNT(*) from myTable
There's a lot you can do with this to eliminate off-peak hours or big spans without a new record, using the join syntax in my first example.
Try this:
select avg(diff) as AverageSecondsBetweenDates
from (
select TIMESTAMPDIFF(SECOND, t1.MyDate, min(t2.MyDate)) as diff
from MyTable t1
inner join MyTable t2 on t2.MyDate > t1.MyDate
group by t1.MyDate
) a