I am receiving JSON, which I convert to a DataFrame df. One of the columns contains dates in this format:
/Date(950842800000)/, /Date(1000436400000)/, ...
The problem is that one of these dates has 12 digits and the others have 13 digits. The ones with 13 digits are converted fine, but the one with 12 digits causes a problem. This is how I am converting them:
df["Data"] = df["Date"].apply(lambda x: datetime.fromtimestamp(int(x[6:-2][:10])) if len(x) > 12 else datetime.fromtimestamp(int(x[6:-2][:11])))
It doesn't work for 12 digits.
Thank you for your help.
Short:
lambda x: datetime.fromtimestamp(int(x[6:-2][:-3]))
Long:
If you have such input data:
"/Date(950842800000)/",
"/Date(1000436400000)/"
Then a few modifications will make the script work correctly:
from datetime import datetime
dates = [
"/Date(950842800000)/",
"/Date(1000436400000)/"
]
for d in dates:
    # a 12-digit value gives a 20-character string: keep 9 digits of seconds, else 10
    l = lambda x: datetime.fromtimestamp(int(x[6:-2][:9])) if len(x) < 21 else datetime.fromtimestamp(
        int(x[6:-2][:10]))
    print(l(d))
This produces:
2000-02-18 04:00:00
2001-09-14 05:00:00
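which is what you expect (fromtimestamp returns local time, so the hours shown depend on your time zone).
But for simplicity, you can just drop the last three digits (the milliseconds), whatever the length: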
from datetime import datetime
dates = [
"/Date(950842800000)/",
"/Date(1000436400000)/"
]
for d in dates:
    l = lambda x: datetime.fromtimestamp(int(x[6:-2][:-3]))
    print(l(d))
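Below is a minimal sketch of a sturdier alternative to character slicing. It takes the number between the parentheses and treats it as milliseconds, so the digit count no longer matters. It also returns UTC rather than local time. The helper name `parse_ms_json_date` is hypothetical. The sketch assumes the values never carry a time-zone suffix such as `+0100` inside the parentheses; the regex rejects those.

```python
import re
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware UTC datetime.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Matches "/Date(<milliseconds>)/" (assumption: no offset suffix inside the parentheses).
MS_DATE = re.compile(r"^/Date\((-?\d+)\)/$")


def parse_ms_json_date(value: str) -> datetime:
    """Convert '/Date(ms)/' to a UTC datetime, whatever the number of digits."""
    match = MS_DATE.match(value)
    if match is None:
        raise ValueError(f"not a /Date(...)/ value: {value!r}")
    millis = int(match.group(1))
    # Adding a timedelta avoids both string slicing and float rounding.
    return EPOCH + timedelta(milliseconds=millis)


for d in ["/Date(950842800000)/", "/Date(1000436400000)/"]:
    print(parse_ms_json_date(d))
```

With pandas, the function can be applied the same way as in the question: `df["Data"] = df["Date"].apply(parse_ms_json_date)`.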
I need to convert a month number to a month name.
I have a datetime value of date type - 2009-01-01 00:00:00.000
I also have a 4-byte integer data type - 1
How do I convert this 1 to "January", for example?
I think you are in the data flow.
It is really easy to get the month name from a date in a Script Component:
Add a varchar column to your data flow.
Mark your date column for read access.
Enter the following script:
Row.[NewColumnName] = Row.[Your Date Column].ToString("MMMM");
Here is a good reference for formatting any date part as a string:
// create date time 2008-03-09 16:05:07.123
DateTime dt = new DateTime(2008, 3, 9, 16, 5, 7, 123);
String.Format("{0:y yy yyy yyyy}", dt); // "8 08 008 2008" year
String.Format("{0:M MM MMM MMMM}", dt); // "3 03 Mar March" month
String.Format("{0:d dd ddd dddd}", dt); // "9 09 Sun Sunday" day
String.Format("{0:h hh H HH}", dt); // "4 04 16 16" hour 12/24
String.Format("{0:m mm}", dt); // "5 05" minute
String.Format("{0:s ss}", dt); // "7 07" second
String.Format("{0:f ff fff ffff}", dt); // "1 12 123 1230" sec.fraction
String.Format("{0:F FF FFF FFFF}", dt); // "1 12 123 123" without zeroes
String.Format("{0:t tt}", dt); // "P PM" A.M. or P.M.
String.Format("{0:z zz zzz}", dt); // "-6 -06 -06:00" time zone
Furthermore, you asked about quarters. I don't think it is as easy, but here is something I stole from another answer.
Build DateTime extensions:
Normal Quarter:
public static int GetQuarter(this DateTime date)
{
return (date.Month + 2)/3;
}
Financial Year Quarter (This case is for quarters that start on April 1):
public static int GetFinancialQuarter(this DateTime date)
{
return (date.AddMonths(-3).Month + 2)/3;
}
Integer division truncates the decimals, giving you an integer result. Place the methods in a static class and they become extension methods that can be used as follows:
Row.calendarQuarter = Row.[your Date Column].GetQuarter();
Row.fiscalQuarter = Row.[your Date Column].GetFinancialQuarter();
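The quarter formulas above are plain integer arithmetic, so they can be checked outside SSIS. The sketch below is a hypothetical Python translation of `GetQuarter` and `GetFinancialQuarter`. It assumes, as the answer does, a financial year starting on April 1. `(month - 4) % 12 + 1` stands in for `AddMonths(-3).Month`, including the wrap-around from January back to October.

```python
def calendar_quarter(month: int) -> int:
    # Same trick as GetQuarter: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
    return (month + 2) // 3


def financial_quarter(month: int) -> int:
    # Mirrors GetFinancialQuarter: shift back three months, wrapping around the
    # year (April -> 1, January -> 10), then apply the calendar formula.
    shifted = (month - 4) % 12 + 1
    return calendar_quarter(shifted)


for m in range(1, 13):
    print(m, calendar_quarter(m), financial_quarter(m))
```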
In SQL Server, one method is:
select datename(month, datefromparts(2000, 1, 1))
The first "1" is where the month number (or your month column) goes. The year is arbitrary.
Another option uses the following steps:
Create a variable with the datetime data type and assign it a value.
Use the MONTH function in SSIS to extract the month number and assign it to a new variable of integer data type: @[User::newdata] = MONTH(@[User::dbdate])
Finally, use an if/else condition that manually compares all 12 months.
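A 12-branch if/else comparison can usually be replaced by a lookup table indexed by the month number. The Python sketch below shows the idea; `MONTH_NAMES` and `month_name` are hypothetical names. The English names are hard-coded so the result does not depend on the machine's locale. Python's `calendar.month_name` gives locale-aware names instead.

```python
# Index 0 is unused so that month number 1 maps directly to "January".
MONTH_NAMES = ("", "January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December")


def month_name(month_number: int) -> str:
    """Return the English month name for a month number from 1 to 12."""
    if not 1 <= month_number <= 12:
        raise ValueError(f"month number out of range: {month_number}")
    return MONTH_NAMES[month_number]


print(month_name(1))
```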
I'm trying to sort a CSV by date first, then by time. With Pandas, it was easy using df = df.sort_values(by=['Date', 'Time_UTC']). With the csv library, the code is:
with open ('eqph_csv_29May2020_noF_5lines.csv') as file:
reader = csv.DictReader(file, delimiter=',')
date_sorted = sorted(reader, key=lambda Date: datetime.strptime('Date', '%Y-%m-%d'))
print(date_sorted)
The datetime documentation clearly says these format codes are right. Here's a sample of the CSV (shown without the comma delimiters):
Date Time_UTC Latitude Longitude
2020-05-28 05:17:31 16.63 120.43
2020-05-23 02:10:27 15.55 121.72
2020-05-20 12:45:07 5.27 126.11
2020-05-09 19:18:12 14.04 120.55
2020-04-10 18:45:49 5.65 126.54
csv.DictReader returns an iterator that yields a dict for each row in the CSV file. Your key function ignores its argument and tries to parse the literal string 'Date', which fails. To sort on a column from each row, you need to reference that column in the key function:
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'], '%Y-%m-%d'))
To sort on both Date and Time_UTC, you could combine them into one string and convert that to a datetime:
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'] + ' ' + row['Time_UTC'], '%Y-%m-%d %H:%M:%S'))
Nick's answer worked, and I used it to revise mine. I used csv.reader() instead:
import csv
from datetime import datetime
lat, lon = [], []
with open('eqph_csv_29May2020_noF_20lines.csv') as file:
    reader = csv.reader(file, delimiter=',')
    next(reader)  # skip the header row
    date_sorted = sorted(reader, key=lambda row: datetime.strptime(
        row[0] + ' ' + row[1], '%Y-%m-%d %H:%M:%S'))
for row in date_sorted:
    lat.append(float(row[2]))  # column 2 is Latitude
    lon.append(float(row[3]))  # column 3 is Longitude
for i in zip(lat, lon):
    print(i)
Result:
(6.14, 126.2)
(14.09, 121.36)
(13.74, 120.9)
(6.65, 125.42)
(6.61, 125.26)
(5.49, 126.57)
(5.65, 125.61)
(11.33, 124.64)
(11.49, 124.42)
(15.0, 119.79) # 2020-03-19 06:33:00
(14.94, 120.17) # 2020-03-19 06:49:00
(6.7, 125.18)
(5.76, 125.14)
(9.22, 124.01)
(20.45, 122.12)
(5.65, 126.54)
(14.04, 120.55)
(5.27, 126.11)
(15.55, 121.72)
(16.63, 120.43)
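Here is a self-contained version of Nick's DictReader approach. It runs on the five sample rows from the question with the commas restored, read through `io.StringIO` in place of the real file. The key combines Date and Time_UTC into one datetime.

```python
import csv
import io
from datetime import datetime

# Sample rows from the question, with the comma delimiters restored.
SAMPLE = """Date,Time_UTC,Latitude,Longitude
2020-05-28,05:17:31,16.63,120.43
2020-05-23,02:10:27,15.55,121.72
2020-05-20,12:45:07,5.27,126.11
2020-05-09,19:18:12,14.04,120.55
2020-04-10,18:45:49,5.65,126.54
"""


def sort_rows(fileobj):
    reader = csv.DictReader(fileobj)
    # The key receives each row dict and parses that row's own Date and Time_UTC.
    return sorted(reader, key=lambda row: datetime.strptime(
        row["Date"] + " " + row["Time_UTC"], "%Y-%m-%d %H:%M:%S"))


rows = sort_rows(io.StringIO(SAMPLE))
for row in rows:
    print(row["Date"], row["Time_UTC"], row["Latitude"], row["Longitude"])
```

Because the dates and times are zero-padded ISO 8601 strings, a plain tuple key `(row["Date"], row["Time_UTC"])` would sort them in the same order without parsing. Parsing has one advantage: it fails loudly on a malformed value.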
To save time I would like to iterate through a vector of month start and month end dates and make an API request each time and store the output from each request.
Say we start with a dataframe called dateTable holding the first and last day of the month for the date range:
firstDOM lastDOM
2016-05-01 2016-05-31
2016-06-01 2016-06-30
2016-07-01 2016-07-31
2016-08-01 2016-08-31
2016-09-01 2016-09-30
2016-10-01 2016-10-31
2016-11-01 2016-11-30
2016-12-01 2016-12-31
2017-01-01 2017-01-31
2017-02-01 2017-02-28
2017-03-01 2017-03-31
2017-04-01 2017-04-30
2017-05-01 2017-05-31
2017-06-01 2017-06-30
2017-07-01 2017-07-31
2017-08-01 2017-08-31
I would like to iterate through each row and paste the startDate and endDate into the following REST API request. However, I keep getting the following error when running this piece of code, and I am not sure what's causing it:
for (i in 1:nrow(dateTable)) {
startDate <- dateTable$firstDOM
endDate <- dateTable$lastDOM
#Obtain the Volume of Mentions by Day using declared specs from above
qryMen <- GET(paste("https://newapi.brandwatch.com/projects/", projId, dataSpec
, "?queryId=", queryId, "&startDate=", startDate, "&endDate=", endDate
, '&pageSize=', pageSize, "&access_token=", accessToken$access_token, sep = ""))
}
#Error
Error: length(url) == 1 is not TRUE
Any help would be greatly appreciated!
Currently, on each iteration of your for loop you pass the entire vector instead of indexing it by the loop variable i:
for (i in 1:nrow(dateTable)) {
startDate <- dateTable$firstDOM[[i]]
endDate <- dateTable$lastDOM[[i]]
...
}
Alternatively, consider Map (or the equivalent mapply(..., SIMPLIFY=FALSE)) to iterate elementwise through the two columns. With this approach you get back a list of objects (whatever your query returns) with one element per row of dateTable, which you can then use for further operations.
api_fct <- function(startDate, endDate) {
qryMen <- GET(paste0("https://newapi.brandwatch.com/projects/", projId, dataSpec
, "?queryId=", queryId, "&startDate=", startDate, "&endDate=", endDate
, '&pageSize=', pageSize, "&access_token=", accessToken$access_token))
}
api_list <- Map(api_fct, dateTable$firstDOM, dateTable$lastDOM)
# api_list <- mapply(api_fct, dateTable$firstDOM, dateTable$lastDOM, SIMPLIFY=FALSE)
A couple of things: your for loop isn't actually doing anything with i. You say for (i in ...) but you never reference i again. Also, there's no reason to put the startDate and endDate assignments inside the loop. It would also help if you posted some sample data so that we can attempt to recreate what you are doing.
Anyway, the error is telling you what is wrong: you can't pass a vector of URLs to GET. Take the paste(...) expression you passed to GET() and run it in the console: you'll get back n URLs, n being the number of rows in your dateTable.
I'm assuming the R objects you pass to GET (other than startDate and endDate) don't change? If that's the case, and you want to use a loop, you can preallocate a list with one slot per request, then loop through your startDate and endDate, passing them into GET() and storing each response in your qryMen list.
startDate <- dateTable$firstDOM
endDate <- dateTable$lastDOM
qryMen <- vector(mode = "list", length = nrow(dateTable))
for (i in 1:nrow(dateTable)) {
  qryMen[[i]] <- GET(paste("https://newapi.brandwatch.com/projects/", projId,
                           dataSpec, "?queryId=", queryId,
                           "&startDate=", startDate[i],
                           "&endDate=", endDate[i],
                           "&pageSize=", pageSize,
                           "&access_token=", accessToken$access_token, sep = ""))
}
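The core fix in both answers is to build exactly one URL per (startDate, endDate) pair. The sketch below shows the same idea in Python, along with generating the first/last-day-of-month table with `calendar.monthrange`. It is only an illustration, not the Brandwatch API. `BASE_URL` and the values in `PARAMS` are hypothetical placeholders for `projId`, `dataSpec`, `queryId`, `pageSize` and the access token. The actual HTTP call is left out.

```python
import calendar
from datetime import date
from urllib.parse import urlencode


def month_ranges(first: date, last: date):
    """Yield (first_day, last_day) for every month from first's month to last's month."""
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        days = calendar.monthrange(year, month)[1]  # number of days in this month
        yield date(year, month, 1), date(year, month, days)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def build_urls(base_url: str, params: dict, ranges):
    """Build one request URL per (start, end) pair, never a vector of URLs at once."""
    urls = []
    for start, end in ranges:
        query = dict(params, startDate=start.isoformat(), endDate=end.isoformat())
        urls.append(f"{base_url}?{urlencode(query)}")
    return urls


# Hypothetical placeholders, not real project values.
BASE_URL = "https://newapi.brandwatch.com/projects/PROJECT_ID/DATA_SPEC"
PARAMS = {"queryId": "QUERY_ID", "pageSize": 100, "access_token": "TOKEN"}

urls = build_urls(BASE_URL, PARAMS, month_ranges(date(2016, 5, 1), date(2017, 8, 31)))
print(len(urls))
print(urls[0])
```

Each URL would then be requested on its own, just as each `qryMen[[i]]` above holds the response for one row.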
I am using Rails 4. I've read two records (b1 and b2) from the database. Both of them have a column called build_start_time, which is defined as a datetime type in MySQL. The build_start_time values of the b1 and b2 records are:
2.0.0-p643 :021 > b1.build_start_time
=> Tue, 12 Aug 2014 18:23:31 UTC +00:00
2.0.0-p643 :012 > b2.build_start_time
=> Fri, 15 Aug 2014 10:07:18 UTC +00:00
How do I calculate the duration between them in Rails? Does anybody have an idea?
The result should be something like:
b2.build_start_time - b1.build_start_time = 2 days 15 hours 43 minutes 47 seconds
Is this possible?
Assuming you can tolerate your answer being off by less than a second, you could try using the to_i and ago methods:
def datetime_diff(datetime1, datetime2)
  res = datetime1 <=> datetime2
  if res == 0
    # order doesn't matter in this case
    min = datetime1
    max = datetime2
  elsif res < 0
    min = datetime1
    max = datetime2
  else
    min = datetime2
    max = datetime1
  end
  max.ago(min.to_i) # min.to_i returns min in seconds since the epoch
end
You figure out which time came first, then return the later of the two times x seconds ago, where x is the earlier time in seconds since the epoch. The result is a time whose offset from the epoch equals the duration between the two records. See http://api.rubyonrails.org/classes/DateTime.html#method-i-3C-3D-3E
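Another route is to take the difference in seconds and split it into days, hours, minutes and seconds with repeated `divmod`. The sketch below is in Python, using the two timestamps from the question. In Rails the same arithmetic applies to `(b2.build_start_time - b1.build_start_time).to_i`, because subtracting two times gives the difference in seconds, and Ruby integers also have `divmod`. The helper name `duration_parts` is hypothetical.

```python
from datetime import datetime, timezone


def duration_parts(t1: datetime, t2: datetime):
    """Return (days, hours, minutes, seconds) between two datetimes, order-independent."""
    total = int(abs((t2 - t1).total_seconds()))
    days, rem = divmod(total, 86400)      # 86400 seconds per day
    hours, rem = divmod(rem, 3600)        # 3600 seconds per hour
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


b1 = datetime(2014, 8, 12, 18, 23, 31, tzinfo=timezone.utc)
b2 = datetime(2014, 8, 15, 10, 7, 18, tzinfo=timezone.utc)
d, h, m, s = duration_parts(b1, b2)
print(f"{d} days {h} hours {m} minutes {s} seconds")  # 2 days 15 hours 43 minutes 47 seconds
```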
I'm trying to calculate various time period returns (monthly, quarterly, yearly etc.) for each unique member (identified by Code in the example below) of a data set. The data set will contain monthly pricing information for a 20 year period for approximately 500 stocks. An example of the data is below:
Date Code Price Dividend
1 2005-01-31 xyz 1000.00 20.0
2 2005-01-31 abc 1.00 0.1
3 2005-02-28 xyz 1030.00 20.0
4 2005-02-28 abc 1.01 0.1
5 2005-03-31 xyz 1071.20 20.0
6 2005-03-31 abc 1.03 0.1
7 2005-04-30 xyz 1124.76 20.0
I am fairly new to R, but thought that there would be a more efficient solution than looping through each Code and then each Date as shown here:
uniqueDates <- unique(data$Date)
uniqueCodes <- unique(data$Code)
for (date in uniqueDates) {
for (code in uniqueCodes) {
nextDate <- seq.Date(from=stock_data$Date[i], by="3 months",length.out=2)[2]
curPrice <- data$Price[data$Date == date]
futPrice <- data$Price[data$Date == nextDate]
data$ret[(data$Date == date) & (data$Code == code)] <- (futPrice/curPrice)-1
}
}
This method in itself has an issue in that seq.Date does not always return the final day in the month.
Unfortunately the data is not uniform (the number of companies/codes varies over time) so using a simple row offset won't work. The calculation must match the Code and Date with the desired date offset.
I had initially tried selecting the future dates by using the seq.Date function:
data$ret = (data[(data$Date == (seq.Date(from = data$Date, by="3 month", length.out=2)[2])), "Price"] / data$Price) - 1
But this generated an error, as seq.Date requires a single starting date:
> Error in seq.Date(from = stock_data$Date, by = "3 month", length.out = 2) : 'from' must be of length 1
I thought that R would be well suited to this type of calculation, but perhaps not. Since all the data is in a MySQL database, I am now thinking that it might be faster/easier to do this calculation directly in the database.
Any suggestions would be greatly appreciated.
Load data:
tc='
Date Code Price Dividend
2005-01-31 xyz 1000.00 20.0
2005-01-31 abc 1.00 0.1
2005-02-28 xyz 1030.00 20.0
2005-02-28 abc 1.01 0.1
2005-03-31 xyz 1071.20 20.0
2005-03-31 abc 1.03 0.1
2005-04-30 xyz 1124.76 20.0'
df = read.table(text=tc,header=T)
df$Date=as.Date(df$Date,"%Y-%m-%d")
First I would organize the data by date:
library(plyr)
pp1=reshape(df,timevar='Code',idvar='Date',direction='wide')
Then you would like to obtain monthly, quarterly, yearly, etc. returns.
There are several options for that; one could be:
Make the data a zoo or xts object, e.g.:
library(xts)
pp1[2:ncol(pp1)] = as.xts(pp1[2:ncol(pp1)],order.by=pp1$Date)
#let's create a function for calculating returns.
rets<-function(x,lag=1){
return(diff(log(x),lag))
}
Since this data is monthly, the lags for the returns will be:
monthly = 1, quarterly = 3, yearly = 12. For instance, let's calculate the monthly return for xyz.
lagged=1 #for monthly
This calculates monthly returns for xyz:
pp1$returns_xyz= c(NA,rets(pp1$Price.xyz,lagged))
To get all the returns:
#create matrix of returns
pricelist= ls(pp1)[grep('Price',ls(pp1))]
returnsmatrix = data.frame(matrix(rep(0,(nrow(pp1)-1)*length(pricelist)),ncol=length(pricelist)))
j=1
for(i in pricelist){
n = which(names(pp1) == i)
returnsmatrix[,j] = rets(pp1[,n],1)
j=j+1
}
#column names
codename= gsub("Price.", "", pricelist, fixed = TRUE)
names(returnsmatrix)=paste('ret',codename,sep='.')
returnsmatrix
You can do this very easily with the quantmod and xts packages. Using the data in AndresT's answer:
library(quantmod) # loads xts too
pp1 <- reshape(df,timevar='Code',idvar='Date',direction='wide')
# create an xts object
x <- xts(pp1[,-1], pp1[,1])
# only get the "Price.*" columns
p <- getPrice(x)
# run the periodReturn function on each column
r <- apply(p, 2, periodReturn, period="monthly", type="log")
# merge prior result into a multi-column object
r <- do.call(merge, r)
# rename columns
names(r) <- paste("monthly.return",
sapply(strsplit(names(p),"\\."), "[", 2), sep=".")
Which leaves you with an r xts object containing:
monthly.return.xyz monthly.return.abc
2005-01-31 0.00000000 0.000000000
2005-02-28 0.02955880 0.009950331
2005-03-31 0.03922071 0.019608471
2005-04-30 0.04879016 NA
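The question asks for returns that match on both Code and an offset month-end Date, even though the set of codes changes over time. A dictionary keyed by `(code, date)` does this directly. The Python sketch below also avoids the `seq.Date` month-end problem: it computes the last day of the target month with `calendar.monthrange`. It assumes every Date is a month-end, as in the sample data, and it returns the simple return `future / current - 1` from the question's loop, not the log returns used in the answers. `month_end_shift` and `period_returns` are hypothetical names.

```python
import calendar
from datetime import date


def month_end_shift(d: date, months: int) -> date:
    """Return the last day of the month lying `months` months after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, last_day)


def period_returns(rows, months):
    """Simple returns matched on Code and the shifted month-end Date.

    rows: iterable of (date, code, price). Returns {(code, date): return or None},
    with None where no price exists for that code at the target date.
    """
    prices = {(code, d): price for d, code, price in rows}
    result = {}
    for (code, d), price in prices.items():
        future = prices.get((code, month_end_shift(d, months)))
        result[(code, d)] = None if future is None else future / price - 1
    return result


# Sample data from the question (Dividend column omitted).
rows = [
    (date(2005, 1, 31), "xyz", 1000.00),
    (date(2005, 1, 31), "abc", 1.00),
    (date(2005, 2, 28), "xyz", 1030.00),
    (date(2005, 2, 28), "abc", 1.01),
    (date(2005, 3, 31), "xyz", 1071.20),
    (date(2005, 3, 31), "abc", 1.03),
    (date(2005, 4, 30), "xyz", 1124.76),
]
quarterly = period_returns(rows, 3)
for key in sorted(quarterly):
    print(key, quarterly[key])
```

Monthly or yearly returns only change the `months` argument (1 or 12). For log returns, replace `future / price - 1` with `math.log(future / price)`.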