Uniting lists out of the googleway package for Google Directions? - json

I'm working on looping through longitude and latitude points for the googleway API. I've come up with two ways to do that, in an effort to access the points sections shown in the following link:
https://cran.r-project.org/web/packages/googleway/vignettes/googleway-vignette.html
Unfortunately, since this uses a unique key, I can't provide a reproducible example, but below are my attempts: one using mapply and the other with a loop. Both produce a response in list format; however, I am not sure how to unpack it to pull out the points route as you would when passing only one point:
df$routes$overview_polyline$points
Any suggestions?
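For reference, with a single query the full extraction might look like this minimal sketch (assuming googleway is loaded and key holds a valid API key; the coordinates are rounded from my data above):
res <- google_directions(origin = c(40.7193, -73.9932),
                         destination = c(40.7096, -73.9498),
                         key = key)
pl <- res$routes$overview_polyline$points  ## the encoded polyline string
decode_pl(pl)                              ## data.frame of lat/lon points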
library(googleway)
dir_results = mapply(
  myfunction,
  origin = feed$origin,
  destination = feed$destination,
  departure = feed$departure
)
OR
empty_df = NULL
for (i in 1:nrow(feed)) {
  print(i)
  output = google_directions(feed[i, "origin"],
                             feed[i, "destination"],
                             mode = c("driving"),
                             departure_time = feed[i, "departure"],
                             arrival_time = NULL,
                             waypoints = NULL, alternatives = FALSE, avoid = NULL,
                             units = c("metric"), key = chi_directions, simplify = T)
  empty_df = rbind(empty_df, output)
}
EDIT:
The intended output would be a data frame like the one below, where "id" represents the original trip fed in.
lat lon id
1 40.71938 -73.99323 40.7193908691406+-73.9932174682617 40.7096214294434+-73.9497909545898
2 40.71992 -73.99292 40.7193908691406+-73.9932174682617 40.7096214294434+-73.9497909545898
3 40.71984 -73.99266 40.7193908691406+-73.9932174682617 40.7096214294434+-73.9497909545898
4 40.71932 -73.99095 40.7193908691406+-73.9932174682617 40.7096214294434+-73.9497909545898
5 40.71896 -73.98981 40.7193908691406+-73.9932174682617 40.7096214294434+-73.9497909545898
6 40.71824 -73.98745 40.7193908691406+-73.9932174682617 40.7096214294434+-73.9497909545898
7 40.71799 -73.98674 40.7193908691406+-73.9932174682617 40.7096214294434+-73.9497909545898
8 40.71763 -73.98582 40.7193908691406+-73.9932174682617 40.7096214294434+-73.9497909545898
EDIT:
dput output provided in response to the question about the data frame of origin/destination pairs:
structure(list(origin = c("40.7193908691406 -73.9932174682617",
"40.7641792297363 -73.9734268188477", "40.7507591247559 -73.9739990234375"
), destination = c("40.7096214294434-73.9497909545898", "40.7707366943359-73.9031448364258",
"40.7711143493652-73.9871368408203")), .Names = c("origin", "destination"
), row.names = c(NA, 3L), class = "data.frame")
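To turn those strings into numeric coordinate columns, one possible sketch (parse_pt and pairs_df are illustrative names; the destination strings have no space before the negative longitude, so a regex extracts the two signed numbers):
parse_pt <- function(s) as.numeric(regmatches(s, gregexpr("-?[0-9.]+", s))[[1]])
## pairs_df is the data.frame recreated from the dput above
feed <- setNames(data.frame(t(sapply(pairs_df$origin, parse_pt)),
                            t(sapply(pairs_df$destination, parse_pt))),
                 c("px", "py", "dx", "dy"))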
The SQL code is basic and looks like this:
feed = sqlQuery(con, paste("select top 10
                            longitude as px,
                            latitude as py,
                            dlongitude as dx,
                            dlatitude as dy
                            from mydb"))
Before feeding it in, my data frame feed looks like this (you can ignore departure; I was using that for the Distance API):
origin destination departure
1 40.7439613342285 -73.9958724975586 40.716911315918-74.0121383666992 2017-03-03 01:00:32
2 40.7990493774414 -73.9685516357422 40.8066520690918-73.9610137939453 2017-03-03 01:00:33
3 40.7406234741211 -74.0055618286133 40.7496566772461-73.9834671020508 2017-03-03 01:00:33
4 40.7172813415527 -73.9953765869141 40.7503852844238-73.9811019897461 2017-03-03 01:00:33
5 40.7603607177734 -73.9817123413086 40.7416114807129-73.9795761108398 2017-03-03 01:00:34

As you know, the API query returns a list, and if you're making multiple calls to the API you'll get back multiple lists.
So to extract the data of interest you have to do standard operations on lists. In this example it can be done with a couple of *apply calls.
Using the data.frame feed, where each row consists of an origin lat/lon (px/py) and a destination lat/lon (dx/dy):
feed <- data.frame(px = c(40.7193, 40.7641),
                   py = c(-73.993, -73.973),
                   dx = c(40.7096, 40.7707),
                   dy = c(-73.949, -73.903))
You can use apply() to query google_directions() for each row of the data.frame, and within the same apply() you can do whatever you want with the result to extract and format it how you want.
lst <- apply(feed, 1, function(x){
  ## query the Google Directions API
  res <- google_directions(key = key,
                           origin = c(x[['px']], x[['py']]),
                           destination = c(x[['dx']], x[['dy']]))
  ## decode the polyline
  df_route <- decode_pl(res$routes$overview_polyline$points)
  ## append the original coordinates as an 'id' column
  df_route[, "id"] <- paste0(paste(x[['px']], x[['py']], sep = "+"),
                             " ",
                             paste(x[['dx']], x[['dy']], sep = "+"),
                             collapse = " ")
  ## store the result of the query, the decoded polyline,
  ## and the original query coordinates in a list
  lst_result <- list(route = df_route,
                     full_result = res,
                     origin = c(x[['px']], x[['py']]),
                     destination = c(x[['dx']], x[['dy']]))
  return(lst_result)
})
So now lst is a list that contains the result of each query, plus the decoded polyline as a data.frame. To get all the decoded polylines as a single data.frame you can do another lapply, and then rbind it all together
## do what we want with the result, for example bind all the route coordinates into one data.frame
df <- do.call(rbind, lapply(lst, function(x) x[['route']]))
head(df)
lat lon id
1 40.71938 -73.99323 40.7193+-73.993 40.7096+-73.949
2 40.71992 -73.99292 40.7193+-73.993 40.7096+-73.949
3 40.71984 -73.99266 40.7193+-73.993 40.7096+-73.949
4 40.71932 -73.99095 40.7193+-73.993 40.7096+-73.949
5 40.71896 -73.98981 40.7193+-73.993 40.7096+-73.949
6 40.71824 -73.98745 40.7193+-73.993 40.7096+-73.949
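The same extraction also drops into the asker's original mapply approach; a minimal sketch (same feed and key as above; SIMPLIFY = FALSE keeps one list element per query):
lst <- mapply(function(px, py, dx, dy) {
  ## query and decode, as in the apply() version above
  res <- google_directions(key = key,
                           origin = c(px, py),
                           destination = c(dx, dy))
  df_route <- decode_pl(res$routes$overview_polyline$points)
  df_route$id <- paste(paste(px, py, sep = "+"), paste(dx, dy, sep = "+"))
  df_route
}, feed$px, feed$py, feed$dx, feed$dy, SIMPLIFY = FALSE)
df <- do.call(rbind, lst)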

Related

R - specifying interaction contrasts for aov

How to specify the contrasts (point estimates, 95% CIs, and p-values) for the between-group differences of the within-group delta changes?
In the example below, I would be interested in the between-group difference (group = 1 minus group = 2) of the delta changes (time = 3 minus time = 1).
df and model:
demo3 <- read.csv("https://stats.idre.ucla.edu/stat/data/demo3.csv")
## Convert variables to factor
demo3 <- within(demo3, {
  group <- factor(group)
  time <- factor(time)
  id <- factor(id)
})
par(cex = .6)
demo3$time <- as.factor(demo3$time)
demo3.aov <- aov(pulse ~ group * time + Error(id), data = demo3)
summary(demo3.aov)
Neither of these chunks of code achieves my goal, correct?
m2 <- emmeans(demo3.aov, "group", by = "time")
pairs(m2)
m22 <- emmeans(demo3.aov, c("group", "time") )
pairs(m22)
Look at the documentation for emmeans::contrast and in particular the argument interaction. If I understand your question correctly, you might want
summary(contrast(m22, interaction = c("pairwise", "dunnett")),
        infer = c(TRUE, TRUE))
which would compute Dunnett-style contrasts for time (each time vs. time1), and compare those for group1 - group2. The summary(..., infer = c(TRUE, TRUE)) part overrides the default whereby tests, but not CIs, are shown.
You could also do this in stages:
time.con <- contrast(m22, "dunnett", by = "group", name = "timediff")
summary(pairs(time.con, by = NULL), infer = c(TRUE, TRUE))
If you truly want just time 3 - time 1, then replace time.con with
time.con1 <- contrast(m22, list(`time3-time1` = c(-1, 0, 1, 0, 0)), by = "group")
(I don't know how many time points you have; I assumed 5 in the above.)
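If you'd rather not hard-code the vector, a sketch (my illustration, not part of the original answer) derives it from the factor levels, so it adapts to however many time points the data actually has:
lev  <- levels(demo3$time)
cvec <- as.numeric(lev == "3") - as.numeric(lev == "1")  ## time3 - time1
time.con1 <- contrast(m22, list(`time3-time1` = cvec), by = "group")
summary(pairs(time.con1, by = NULL), infer = c(TRUE, TRUE))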

Retrieve data in sets Pandas

I'm retrieving data from the OpenWeatherMap API. I have the following code, where I'm extracting the current weather for more than 500 cities, and I want the log it prints to separate the data into sets of 50 each.
I did it in an inefficient way that I would really like to improve!
Many many thanks!
x = 1
for index, row in df.iterrows():
    base_url = "http://api.openweathermap.org/data/2.5/weather?"
    units = "imperial"
    query_url = f"{base_url}appid={api_key}&units={units}&q="
    city = row['Name']  # this comes from a df
    response = requests.get(query_url + city).json()
    try:
        df.loc[index, "Max Temp"] = response["main"]["temp_max"]
        if index < 50:
            print(f"Processing Record {index} of Set {x} | {city}")
        elif index < 100:
            x = 2
            print(f"Processing Record {index} of Set {x} | {city}")
        elif index < 150:
            x = 3
            print(f"Processing Record {index} of Set {x} | {city}")
    except (KeyError, IndexError):
        print("City not found. Skipping...")

Function on each row of pandas DataFrame but not generating a new column

I have a data frame in pandas as follows:
A B C D
3 4 3 1
5 2 2 2
2 1 4 3
My final goal is to produce some constraints for an optimization problem using the information in each row of this data frame, so I don't want to generate an output and add it to the data frame. The way that I have done that is shown below:
def Computation(row):
    App = pd.Series(row['A'])
    App = App.tolist()
    PT = [row['B']] * len(App)
    CS = [row['C']] * len(App)
    DS = [row['D']] * len(App)
    File3 = tuplelist(zip(PT, CS, DS, App))
    return m.addConstr(quicksum(y[r, c, d, a] for r, c, d, a in File3) == 1)
But it does not work when I call:
df.apply(Computation, axis = 1)
Could you please let me know if there is any way to do this?
.apply will attempt to convert the value returned by the function to a pandas Series or DataFrame. So, if that is not your goal, you are better off using .iterrows:
# In pseudocode:
for _, row in df.iterrows():
    constrained = Computation(row)
Also, your Computation can be expressed as:
def Computation(row):
    App = list(row['A'])  # will work as long as row['A'] is iterable
    # For the next 3 lines, see the note below.
    PT = [row['B']] * len(App)
    CS = [row['C']] * len(App)
    DS = [row['D']] * len(App)
    File3 = tuplelist(zip(PT, CS, DS, App))
    return m.addConstr(quicksum(y[r, c, d, a] for r, c, d, a in File3) == 1)
Note: [<list>] * n will create n pointers or references to the same <list>, not n independent lists. Changes made through one reference will be visible through all of them. If that is not what you want, use a function. See this question and its answers for details, specifically this answer.

How to write a JSON object from R dataframe with grouping

In general I feel there is a need to make JSON objects by folding multiple columns. There is no direct way to do this as far as I know; please point it out if there is.
I have data of this form:
A B C
1 a x
1 a y
1 c z
2 d p
2 f q
2 f r
How do I write a JSON object which looks like
{'query':'1', 'type':[{'name':'a', 'values':[{'value':'x'}, {'value':'y'}]}, {'name':'c', 'values':[{'value':'z'}]}]}
and similarly for 'query':'2'
I am looking to output them in the mongo import/export individual-JSON-lines format.
Any pointers are also appreciated.
You've got a little "non-standard" thing going with two keys of "value" (I don't know if this is legal json), as you can see here:
(js <- jsonlite::fromJSON('{"query":"1", "type":[{"name":"a", "values":[{"value":"x"}, {"value":"y"}]}, {"name":"c", "values":[{"value":"z"}]}]}'))
## $query
## [1] "1"
##
## $type
## name values
## 1 a x, y
## 2 c z
... with a data.frame cell containing a list of data.frames:
js$type$values[[1]]
## value
## 1 x
## 2 y
class(js$type$values[[1]])
## [1] "data.frame"
If you can accept your "type" variable containing a vector instead of a named-list, then perhaps the following code will suffice:
jsonlite::toJSON(lapply(unique(dat[, 'A']), function(a1) {
  list(query = a1,
       type = lapply(unique(dat[dat$A == a1, 'B']), function(b2) {
         list(name = b2,
              values = dat[(dat$A == a1) & (dat$B == b2), 'C'])
       }))
}))
## [{"query":[1],"type":[{"name":["a"],"values":["x","y"]},{"name":["c"],"values":["z"]}]},{"query":[2],"type":[{"name":["d"],"values":["p"]},{"name":["f"],"values":["q","r"]}]}]
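If the target really is mongoimport's one-document-per-line format, a hedged variation of the same idea writes each query as its own JSON line (auto_unbox = TRUE collapses the length-one arrays seen above into scalars; queries.json is an illustrative file name):
json_lines <- vapply(unique(dat[, 'A']), function(a1) {
  as.character(jsonlite::toJSON(list(
    query = a1,
    type = lapply(unique(dat[dat$A == a1, 'B']), function(b2) {
      list(name = b2,
           values = dat[(dat$A == a1) & (dat$B == b2), 'C'])
    })
  ), auto_unbox = TRUE))
}, character(1))
writeLines(json_lines, "queries.json")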

Assign each aggregate value to a separate variable in R and display it in HTML

I am using the following R script to calculate a monthly CpK number:
mydf <- read.csv('file.csv', header = TRUE, sep=",")
date <- strptime(mydf$PDATETIME, "%Y/%m/%d %H:%M:%S")
plot(date,mydf$MEAS_AVG,xlab='Date',ylab='MEAS_AVG',main='year')
abline(h=mydf$TARG_MIN,col=3,lty=1)
abline(h=mydf$TARG_MAX,col=3,lty=1)
grid(NULL,NULL,col="black")
legend("topright", legend = c(" ", " "), text.width = strwidth("1,000,000"), lty = 1:2, xjust = 1, yjust = 1, title = "Data")
myavg <-mean(mydf$MEAS_AVG, na.rm=TRUE)
newds <- (mydf$MEAS_AVG - myavg)^2
newsum <- sum(newds, na.rm=TRUE)
N <- length(mydf$MEAS_AVG) - 1
newN <- 1/N
total <- newN*newsum
sigma <- total^(1/2)
USL <- mean(mydf$TARG_MAX, na.rm=TRUE)
LSL <- mean(mydf$TARG_MIN, na.rm=TRUE)
cpk <- min(((USL-myavg)/(3*sigma)),((myavg-LSL)/(3*sigma)))
cpkmonthly <- aggregate(mydf$MEAS_AVG,
                        list(month = months(as.Date(mydf$PDATETIME))),
                        mean, na.rm = TRUE)
monthlycpk <- by(mydf$MEAS_AVG,
                 list(month = months(as.Date(mydf$PDATETIME))),
                 mean, na.rm = TRUE)
cpk        # variable to store the entire year's CpK number
cpkmonthly # variable to store each month's mean CpK number
So far, the above script runs through all the code and assigns values to the cpkmonthly and cpk variables. Their outputs are as follows:
> cpk
[1] 0.5892231
> cpkmonthly
month x
1 April 0.2456467
2 August 0.2415564
3 July 0.2456895
4 June 0.2541071
5 March 0.1234333
6 May 0.4321418
Question: How do I break apart the aggregated "cpkmonthly" variable and assign a separate variable for each entry? Ideally, I would like each to go into an array, because I would like the final output variable to be an HTML display string.
Pseudocode:
cpkmonth[1] = April
cpkvalue[1] = .245...
cpkmonth[2] = August
cpkvalue[2] = .2415...
...
I would like the final table in HTML to look like the rows below, so the final output variable would need to be in this format:
<tr><td>"Total Cpk"</td><tdcpkmonth[0]</td><td>cpkmonth[1]</td><td>...</td></tr>
<tr><td>"cpk"</td><tdcpkvalue[0]</td><td>cpkvalue[1]</td><td>...</td></tr>
For the HTML, I have tried using toJSON/RJSON, R2HTML, HTMLUtil, and a few others, but I am simply looking for one output variable. Is this possible?
You should be able to access both of these columns using the $ syntax:
cpkmonth = cpkmonthly$month
cpkvalue = cpkmonthly$x  # aggregate() named the column of means 'x'
You can also use [:
cpkmonth = cpkmonthly['month']
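From there, assembling the single HTML output variable described in the question is a paste0() exercise; a minimal sketch, mirroring the two-row layout above (note that aggregate() named the mean column x):
cpkmonth <- cpkmonthly$month
cpkvalue <- round(cpkmonthly$x, 4)
html_out <- paste0(
  "<tr><td>Total Cpk</td>",
  paste0("<td>", cpkmonth, "</td>", collapse = ""), "</tr>\n",
  "<tr><td>cpk</td>",
  paste0("<td>", cpkvalue, "</td>", collapse = ""), "</tr>"
)
cat(html_out)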