Best way to plot by hour by day by month in R

Currently I have created the following data frame in R, but I am having trouble with my visualisation.
The Dataframe looks as follows:
date weekday dayhour amount
2017-06 0 1 100
2017-06 0 2 200
2017-06 0 3 150
2017-06 0 4 600
2017-06 0 5 75
....
2018-06 6 21 60
2018-06 6 22 90
2018-06 6 23 150
2018-06 6 24 110
The amount is the average for that weekday and hour in that month. So, for example, in June 2017 the first hour of each Monday has an average amount of 100.
Now the idea is to plot my data in R in several graphs that show the data by hour and weekday for a given month: 12 plots, each with the amount on the y-axis and the hour + weekday on the x-axis.
I have tried several approaches, such as looping through the months and plotting them with par(mfrow = c(2,6)), and plotting them one by one. However, I am still a rookie with R and I can't find any good documentation or tutorial on how to do this. So far I have only been able to stack the data points in one loop by weekday, not by hour, by doing the following on the dataset (without hours included yet):
increase = 7
for (i in (length(occupancy_by_day)/7)) {
  data = head(occupancy_by_day, increase:increase+increase)
  plot(average_occupancy ~ Weekday, data=data)
  increase = increase + 7
}
My closest guess at the correct answer at this moment is something like this:
par(mfrow = c(2,6))
increase = 06
for (i in (length(occupancy_by_day)/30,5)) {
  data = occupancy_by_day[occupancy_by_day$date == paste(c('2017-',increase)), ]
  plot(amount ~ weekday, data=data)
  increase = increase + 1
}
This gives me the error:
Error in plot.window(...) : need finite 'xlim' values
Does anyone know a good solution to plotting the data in R?
Thanks in advance for any help/comments!
EDIT:
The priority in this post is how to plot the data by hour and weekday. I could iterate through the months manually, however I would still need to plot them; a loop over the months would be an added bonus. Right now I have the following:
data = occupancy_by_day[occupancy_by_day$date == '2017-06', ]
plot(Amount ~ weekday+dayhour, data=data)
This sadly only plots the data by dayhour.
ADDED DRAWING OF CONCEPT:
https://imgur.com/qKFbbmJ
ANSWER:
Eventually I did a little workaround and plotted them with:
ggplot(data = data[data$date == '2017-12', ], aes(plotstamp, Amount, group=Weekday, col=Weekday)) +
geom_line() +
geom_point() +
ggtitle("December 2017")
The plotstamp is an extra column/index I added to my data frame, which allowed me to plot the values continuously. Then I just plotted them separately per month.
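The plotstamp column itself never appears in the post; here is a minimal sketch of how such a column could be built, assuming the lowercase column names (weekday, dayhour, amount) from the example table at the top, plus an optional facet_wrap() call as one way to get every month in a single plot:
library(ggplot2)

# Hypothetical sketch: the post does not show how plotstamp was built.
# Assuming weekday runs 0-6 and dayhour runs 1-24 as in the example data,
# a continuous hour-of-week index could look like this:
data$plotstamp <- data$weekday * 24 + data$dayhour

# With such a column, facet_wrap() is one option for drawing all the
# monthly panels in one call instead of plotting each month separately:
ggplot(data, aes(plotstamp, amount, group = weekday, colour = weekday)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ date)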

Make similar data
I think this is the partial solution you asked for in your edit (if I understand your task correctly), and I believe you can loop through the months in the same way.
The only way I could think of was to transform your dates to the date class. I used some prepared date data, but you can fix yours using the strptime() and paste() commands to match mine. Also, the data I made covers only two days.
date1 <- c(rep("2017-06-1",24),rep("2017-06-2",24))
weekday <- c(rep(0,24),rep(1,24))
dayhour <- c(1:24,1:24)
# Add dayhour to date
date <- paste(date1, dayhour, sep = " ")
date <- strptime(date, "%Y-%m-%d %H")
amount <- c(1:24,(48:25)*2)
dat <- data.frame(date,weekday,dayhour,amount)
View(dat)
plot(x=dat$date, y=dat$amount)
This is what my created data looks like:
date weekday dayhour amount
1 2017-06-01 01:00:00 0 1 1
2 2017-06-01 02:00:00 0 2 2
3 2017-06-01 03:00:00 0 3 3
4 2017-06-01 04:00:00 0 4 4
....
46 2017-06-02 22:00:00 1 22 54
47 2017-06-02 23:00:00 1 23 52
48 2017-06-03 00:00:00 1 24 50
Loop for the plot.
If you write this in an R Markdown document you will get a nice page for each plot, so you don't have to use par(mfrow = c(1,2)). You will probably also need to adjust the loop arguments to fit your data.
par(mfrow = c(1, 2))
start <- 0
end <- 23
step <- 1
for (i in 1:(length(dat$date)/24)) {
  data <- dat[(start + step):(end + step), ] # the parentheses around (start + step) and (end + step) are important!
  plot(x = data$date, y = data$amount)
  step <- step + 24 # advance a full day (24 rows) each iteration
}
I hope this helps you.
P.S. This is the first answer I have written, so feel free to edit or improve it.

Related

MySQL: How to perform a custom month difference between two dates in MySQL?

My requirement is to compute the whole months and the broken months separately between two dates (i.e. the first date comes from the table and the second date is the current date). If the broken months' total day count is > 15, count it as one month of experience; if it is 15 or less, don't count it as a month of experience.
Assume I have a date in the table of 25/11/2018 and the current date is 06/01/2019.
The full month in between is December, so that is 1 month of experience. The broken months are November and January, so I have to count the days: 6 days in November and 6 days in January, which is 12 days and is <= 15, so the total experience is rounded to 1 month.
I referred to multiple Stack Overflow questions related to calculating date differences in MySQL, but couldn't find any suitable options. The built-in MySQL functions TIMESTAMPDIFF, TIMEDIFF, PERIOD_DIFF, and DATEDIFF do not give the required result, as their algorithms differ from my calculation requirement.
Any clue on how to perform this calculation in MySQL and arrive at the result as part of the SQL statement would be helpful. Once this value is obtained, it will be validated, in the same SQL, to be within a given value range.
Including sample table structure & value:
table_name = "user"
id | name | join_date
---------------------
 1 | Sam  | 25-11-2017
 2 | Moe  | 03-04-2017
 3 | Tim  | 04-07-2018
 4 | Sal  | 30-01-2017
 5 | Joe  | 13-08-2018
I want to find the users from the above table whose experience, in months, is calculated based on the aforementioned logic. If that number of months falls within either of the following ranges, those users are fetched for further processing.
table_name: "allowed_exp_range"
starting_exp_months | end_exp_months
-------------------------------------
0 | 6
9 | 24
For example: Sam's experience to date (10-12-2018), based on my calculation, is 12 + 1 = 13 months. Since 13 is between 9 and 24, Sam's record is part of the expected output.
I think this query will do what you want. It uses
(YEAR(CURDATE())*12+MONTH(CURDATE()))
- (YEAR(STR_TO_DATE(join_date, '%d-%m-%Y'))*12+MONTH(STR_TO_DATE(join_date, '%d-%m-%Y')))
- 1
to get the number of whole months of experience for the user,
DAY(LAST_DAY(STR_TO_DATE(join_date, '%d-%m-%Y')))
- DAY(STR_TO_DATE(join_date, '%d-%m-%Y'))
+ 1
to get the number of days in the first month, and
DAY(CURDATE())
to get the number of days in the current month. The two day counts are summed and if the total is > 15, 1 is added to the number of whole months e.g.
SELECT id
, name
, (YEAR(CURDATE())*12+MONTH(CURDATE())) - (YEAR(STR_TO_DATE(join_date, '%d-%m-%Y'))*12+MONTH(STR_TO_DATE(join_date, '%d-%m-%Y'))) - 1 -- whole months
+ CASE WHEN DAY(LAST_DAY(STR_TO_DATE(join_date, '%d-%m-%Y'))) - DAY(STR_TO_DATE(join_date, '%d-%m-%Y')) + 1 + DAY(CURDATE()) > 15 THEN 1 ELSE 0 END -- broken month
AS months
FROM user
We can use this expression as a JOIN condition between user and allowed_exp_range to find all users who have experience within a given range:
SELECT u.id
, u.name
, a.starting_exp_months
, a.end_exp_months
FROM user u
JOIN allowed_exp_range a
ON (YEAR(CURDATE())*12+MONTH(CURDATE())) - (YEAR(STR_TO_DATE(u.join_date, '%d-%m-%Y'))*12+MONTH(STR_TO_DATE(u.join_date, '%d-%m-%Y'))) - 1
+ CASE WHEN DAY(LAST_DAY(STR_TO_DATE(u.join_date, '%d-%m-%Y'))) - DAY(STR_TO_DATE(u.join_date, '%d-%m-%Y')) + 1 + DAY(CURDATE()) > 15 THEN 1 ELSE 0 END
BETWEEN a.starting_exp_months AND a.end_exp_months
Output (for your sample data, includes all users as they all fit into one of the experience ranges):
id name starting_exp_months end_exp_months
1 Sam 9 24
2 Moe 9 24
3 Tim 0 6
4 Sal 9 24
5 Joe 0 6
I've created a small demo on dbfiddle which demonstrates the steps in arriving at the result.

Post-increment date field in MySQL query using R

I am trying to query a table in our MySQL database using the DBI R package. However, I need to pull the fields from the table by changing the date field on a monthly basis and limiting it to 1.
I'm having trouble with the looping and the SQL query text. I would like to create a loop that changes the date (monthly) and inserts it into a database query, which will then pull all the data that matches the monthly conditions.
This is my code so far:
for (i in seq(0,12,1)){
  results <- dbGetQuery(myDB, paste("SELECT * FROM cost_and_price_period WHERE start_date <=", '01-[[i]]-2019'))
}
The main issue is that R doesn't have post-increment operators like ++. I know I could just make 12 individual queries and then rbind the results, but I would prefer one efficient query. Does anyone have any ideas?
The solution below could give you an idea of how to proceed with your problem.
DummyTable
id names dob
1 1 aa 2018-01-01
2 2 bb 2018-02-01
3 3 cc 2018-03-01
4 4 dd 2018-04-01
5 5 ee 2018-05-01
6 6 ff 2018-06-01
7 7 gg 2018-07-01
8 8 hh 2018-08-01
9 9 ii 2018-09-01
10 10 jj 2018-10-01
11 11 kk 2018-11-01
12 12 ll 2018-12-01
13 13 ll 2018-12-01
Imagine we have the above table in MySQL. We need to access the data for the 1st day of every month and store the matching records in a data frame.
### Using a for loop, as in your question
n <- 12
df <- vector("list", n)
for (i in seq_len(n)) {
  # in each iteration, `i` corresponds to the month number
  df[[i]] <- data.frame(dbGetQuery(pool, paste0("SELECT * FROM dummyTable WHERE dob = '2018-", i, "-01';")))
}
df <- do.call(rbind, df)
### Using lapply (the preferred way)
n <- 1:12
df <- lapply(n, function(x) {
  dbGetQuery(pool, paste0("SELECT * FROM dummyTable WHERE dob = '2018-", x, "-01';"))
})
df <- do.call(rbind, df)
The resulting df data frame will contain the matched records from MySQL.
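Since the question asked for a single efficient query rather than twelve, one hedged alternative (sticking with the same hypothetical dummyTable and open pool connection used above) is to build the twelve first-of-month dates in R and send them in one IN (...) clause:
library(DBI)

# Build the twelve first-of-month dates for 2018 and fetch them in one round trip.
# Assumes the same dummyTable and `pool` connection as in the answer above.
dates <- format(seq(as.Date("2018-01-01"), by = "month", length.out = 12), "%Y-%m-%d")
sql   <- paste0("SELECT * FROM dummyTable WHERE dob IN ('",
                paste(dates, collapse = "', '"), "');")
df <- dbGetQuery(pool, sql)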

Count number of rows when using dplyr to access an SQL table/query

What would be an efficient way to count the number of rows when using dplyr to access an SQL table? The MWE below uses SQLite, but I use PostgreSQL and have the same issue. Basically, dim() is not very consistent. I used
dim()
This works for a schema in the database (first case), but is not very consistent when I create a tbl from an SQL query for the same schema (second case). My number of rows is in the millions, but I see this even with a small table of 1000 rows. I get NA or ??. Is there anything I am missing?
# MWE
library(dplyr)
test_db <- src_sqlite("test_db.sqlite3", create = T)
library(nycflights13)
flights_sqlite <- copy_to(test_db, flights, temporary = FALSE, indexes = list(
c("year", "month", "day"), "carrier", "tailnum"))
flights_postgres <- tbl(test_db, "flights")
First case (table from direct schema)
flights_postgres
> flights_postgres
Source: postgres 9.3.5 []
From: flights [336,776 x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227 1400 5 17
2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227 1416 5 33
#using dim()
> dim(flights_postgres)
[1] 336776 16
The above works and gets the count of the number of rows.
Second case (table from SQL query)
## use the flights schema above but can also be used to create other variables (like lag, lead) in run time
flight_postgres_2 <- tbl(test_db, sql("SELECT * FROM flights"))
>flight_postgres_2
Source: postgres 9.3.5 []
From: <derived table> [?? x 16]
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227 1400 5 17
2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227 1416 5 33
>
> dim(flight_postgres_2)
[1] NA 16
As you can see, it prints either ?? or NA, so it's not very helpful.
I got around this by either using collect() or converting the output to a data frame with as.data.frame() to check the dimensions. But these two methods may not be ideal, given the time they can take for a larger number of rows.
I think the answer is what @alistaire suggests: do it in the database.
> flight_postgres_2 %>% summarize(n())
Source: sqlite 3.8.6 [test_db.sqlite3]
From: <derived table> [?? x 1]
n()
(int)
1 336776
.. ...
Asking dim to do this would be having your cake (lazy evaluation of SQL with dplyr, keeping data in the database) and eating it too (having full access to the data in R).
Note that this is doing @alistaire's approach underneath:
> flight_postgres_2 %>% summarize(n()) %>% explain()
<SQL>
SELECT "n()"
FROM (SELECT COUNT() AS "n()"
FROM (SELECT * FROM flights) AS "zzz11") AS "zzz13"
<PLAN>
selectid order from detail
1 0 0 0 SCAN TABLE flights USING COVERING INDEX flights_year_month_day
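As a side note, depending on the dplyr version installed, tally() and collect()/pull() give the same database-side count with a little less typing; a rough sketch:
library(dplyr)

# tally() builds the same kind of COUNT query on the database side
flight_postgres_2 %>% tally()

# collect() brings the one-row result into R; pull() extracts the number itself
n_rows <- flight_postgres_2 %>%
  summarise(n = n()) %>%
  collect() %>%
  pull(n)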

5 point average in SSRS

I am trying to put a 5-point average in my chart. I added a trendline, but it looks like this:
Then I created a new series to calculate the average there, and it looks like this:
But I would like to show this as a 5-point average. How can I do this?
This answer is based on my experience with Excel, not reporting services, but it is probably the same problem.
Your chart is probably a scatter plot rather than a line chart (note: this is Excel terminology). A scatter plot does not have an intrinsic ordering in the data. A line chart does.
The solution (for a scatter plot) is simply to sort the data by the x-values. The same will probably work for you. If you are pulling the data from a database, then order by can accomplish this. Otherwise, you can sort the data in the application.
Using this post as a starting point, you can see that it is possible to calculate a moving average for a chart using the SQL query that pulls the data from the database.
For example, using this table in my database called mySalesTable:
myDate sales myDate sales myDate sales
---------- ------ ---------- ------ ---------- ------
01/01/2015 456 16/01/2015 546 31/01/2015 658
02/01/2015 487 17/01/2015 12 01/02/2015 121
03/01/2015 245 18/01/2015 62 02/02/2015 654
04/01/2015 812 19/01/2015 516 03/02/2015 261
05/01/2015 333 20/01/2015 1 04/02/2015 892
06/01/2015 449 21/01/2015 65 05/02/2015 982
07/01/2015 827 22/01/2015 15 06/02/2015 218
08/01/2015 569 23/01/2015 656 07/02/2015 212
09/01/2015 538 24/01/2015 25 08/02/2015 312
10/01/2015 455 25/01/2015 549 09/02/2015 21
11/01/2015 458 26/01/2015 261
12/01/2015 542 27/01/2015 21
13/01/2015 549 28/01/2015 21
14/01/2015 432 29/01/2015 61
15/01/2015 685 30/01/2015 321
You can pull out this data and create a moving average based on the last 5 dates by using the following query for your dataset:
SELECT mst.myDate, mst.sales, avg(mst_past.sales) AS moving_average
FROM mySalesTable mst
JOIN mySalesTable as mst_past
ON mst_past.myDate
BETWEEN DATEADD(D, -4, mst.myDate) AND mst.myDate
GROUP BY mst.myDate, mst.sales
ORDER BY mst.myDate ASC
This effectively joins a sub-table to each row consisting of the previous 4 dates and the current date, finds the average over these dates, and outputs it as the column moving_average.
You can then chart both of these fields as normal to give the following output (with the data table so you can see the actual calculated moving average).
Hopefully this will help you. Please let me know if you require further assistance.

Google spreadsheet calculation

How to calculate this:
0-10 hours = 100 per hour
11-20 hours = 80 per hour
21+ hours = 70 per hour
If, for example, I have 23 hours:
10 hours - 100 per hour
10 hours - 80 per hour
3 hours - 70 per hour
Is it possible to write a function in a Google spreadsheet to calculate the total amount if I only have the total hours?
You can express this as a formula. Consider that there are three cases:
If more than 20 hours, the total is (10 x 100 + 10 x 80) + (hours - 20) x 70
If less than 21 hours but more than 10, the total is (10 x 100) + (hours - 10) x 80
If less than 11 hours, the total is hours x 100
If we have our hours in cell A1, here's the formula:
=if(A1>20,(A1-20)*70+1800,if(A1>10,(A1-10)*80+1000,A1*100))
I've been writing gsheet formulas for 4+ years.
IMHO you need to create an additional table, "rates":
HoursFrom | HoursTo | Rate
0 | 10 | 100
11 | 20 | 80
21 | 9999 | 70
then you can write a flexible formula that will work everywhere, returning the rate whose hour band contains the value in A1, along the lines of:
= filter(rates!C:C, rates!A:A<=A1, rates!B:B>=A1)
It may seem a bit more complex in the beginning, but later you will easily read your formula, and you will be able to change your data in a single place (your "rates" table) without searching for all the formulas that refer to this data.
Yes, it's possible to write a function for this. Here is some sample code:
function amount(x) {
  if (!x) {
    return "no hours provided";
  } else {
    if (x <= 10) {
      return x*100;
    } else if (x <= 20) {
      return 10*100 + (x-10)*80;
    } else {
      return 10*100 + 10*80 + (x-20)*70;
    }
  }
}
You can use this anywhere in the spreadsheet as a custom function, e.g. if
A1 = 23;
//You can put below line in any cell. A2 is just being given as an example
A2 = amount(A1) //answer will be 2010