R - Vectorize a JSON call - json

Working with Mapquest directions API to plot thousands of routes using ggplot2 in R.
Basic code theory: Have a list of end locations and a single start location. For each end location, a call to fromJSON returns routing coordinates from Mapquest. From there, have already vectorized the assignment of coordinates (read as lists in lists) to the geom_path geom of ggplot2.
Right now, running this on a location set of ~ 1200 records takes ~ 4 minutes. Would love to get that down. Any thoughts on how to vectorize the call to fromJSON (which returns a list of lists)?
Windows 7, 64-bit, R 2.14.2
libraries: plyr, ggplot2, rjson, mapproj, XML
k = 0
start_loc = "263+NORTH+CENTER+ST.,+MESA+ARIZ."
end_loc = funder_trunc[,length(funder_trunc)]
route_urls = paste(mapquest_baseurl, "&from=", start_loc, "&to=", end_loc, "&ambiguities=ignore", sep="")
for (n in route_urls) {
  route_legs = fromJSON(file = url(n))$route$legs[[1]]$maneuvers
  lats = unlist(lapply(route_legs, function(x) return(x$startPoint[[2]])))
  lngs = unlist(lapply(route_legs, function(x) return(x$startPoint[[1]])))
  frame = data.frame(cbind(lngs, lats))
  path_added = geom_path(aes(lngs, lats), data = frame)
  p = p + path_added
  k = k + 1
  print(paste("Processed ", k, " of ", nrow(funder_trunc), " records in set.", sep=""))
}

Going out on a limb here since I don't use rjson or mapproj, but it seems like calling the server thousands of times is the real culprit. If the mapquest server doesn't have an API that allows you to send multiple requests in one go, you are in trouble. If it does, then you need to find out how to use/modify rjson and/or mapproj to call it...
As @Chase said, you might be able to call it in parallel, but the server won't like getting too many parallel requests from the same client and might ban you. By the way, it might not like getting thousands of serial requests in rapid succession from the same client either, but apparently your current code works, so I guess it doesn't mind.
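To make the "parallel, but not too parallel" idea concrete, here is a minimal sketch in Python rather than R. It only shows the pattern of a small, bounded worker pool; `fetch_all` and `fake_fetch` are hypothetical names, the URLs are placeholders, and no real Mapquest call is made. A real fetch function would do the HTTP request and JSON parsing, and the pool size would have to respect whatever rate limits the server imposes.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Apply fetch to every URL using a small, bounded pool of threads.

    Results are returned in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request plus JSON parsing.
    return {"url": url, "n_chars": len(url)}


# Placeholder URLs; a real run would use the route URLs built earlier.
urls = ["http://example.invalid/route?to=%d" % i for i in range(5)]
results = fetch_all(urls, fake_fetch, max_workers=2)
```

Keeping `max_workers` small is the design choice that matters here: the requests overlap in time, but the server never sees more than a handful at once.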


How to get dataset into array

I have worked all the tutorials and searched for "load csv tensorflow" but just can't get the logic of it all. I'm not a total beginner, but I don't have much time to complete this, and I've been suddenly thrown into Tensorflow, which is unexpectedly difficult.
Let me lay it out:
Very simple CSV file of 184 columns that are all float numbers. A row is simply today's price, three buy signals, and the previous 180 days prices
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
This article: https://www.tensorflow.org/guide/datasets
It covers how to load pretty well. It even has a section on changing to numpy arrays, which is what I need to train and test the 'net. However, as the author says in the article leading to this Web page, it is pretty complex. It seems like everything is geared toward doing data manipulation, where we have already normalized our data (nothing has really changed in AI since 1983 in terms of inputs, outputs, and layers).
Here is a way to load it, but not into NumPy, and with no example that leaves the data unmanipulated.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    with open('/BTC1.csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        line_count = 0
        for row in csv_reader:
            ?????????
            line_count += 1
I need to know how to get the CSV file into the
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
so that I can follow the tutorials to train and test the net.
Your question isn't entirely clear to me. You seem to be asking, tell me if I'm wrong, how to feed data into your model. There are several ways to do so.
Use placeholders with feed_dict during the session. This is the most basic and easiest one, but it often suffers from training performance issues. For further explanation, check this post.
Use queues. They are hard to implement and badly documented. I don't recommend them, because they have been superseded by the third method.
Use the tf.data API.
...
So to answer your question by the first method:
# get your array outside the session
import csv
import numpy as np

with open('/BTC1.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    # csv.reader yields strings, so convert to float explicitly
    dataset = np.asarray([row for row in csv_reader], dtype=float)
close_col = dataset[:, 0]
signal_cols = dataset[:, 1:4]    # the three buy signals
previous_cols = dataset[:, 4:]   # the previous 180 days' prices
# let's say you load 100 rows each time for training
batch_size = 100
# define placeholders like yours, but with a leading batch dimension,
# e.g. shape=[None], shape=[None, 3] and shape=[None, 180]
...
with tf.Session() as sess:
    ...
    for i in range(number_iter):
        start = i * batch_size
        end = (i + 1) * batch_size
        sess.run(train_operation, feed_dict={close: close_col[start:end],
                                             signals: signal_cols[start:end],
                                             previous: previous_cols[start:end]})
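The column split can be checked without TensorFlow. The sketch below assumes the layout described in the question (column 0 is the close, columns 1 to 3 are the three signals, the remaining 180 columns are the previous prices) and uses synthetic data written to an in-memory buffer instead of the real /BTC1.csv. `split_columns` is a hypothetical helper name.

```python
import csv
import io

import numpy as np


def split_columns(rows):
    """Split parsed CSV rows into close, signals and previous arrays."""
    # csv.reader yields strings, so the float conversion is explicit.
    data = np.asarray(rows, dtype=float)
    close = data[:, 0]
    signals = data[:, 1:4].astype(bool)
    previous = data[:, 4:]
    return close, signals, previous


# Synthetic stand-in for the real file: 5 rows of 184 columns.
rng = np.random.default_rng(0)
fake = rng.random((5, 184))
fake[:, 1:4] = rng.integers(0, 2, size=(5, 3))

buf = io.StringIO()
csv.writer(buf).writerows(fake.tolist())
buf.seek(0)

close, signals, previous = split_columns(list(csv.reader(buf)))
```

With the real file, the `io.StringIO` buffer would simply be replaced by the opened CSV file.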
By the third method:
# retrieve your columns like before
...
# let's say you load 100 rows each time for training
batch_size = 100
# construct your input pipeline
c_col, s_col, p_col = wrapper(filename)
batch = tf.data.Dataset.from_tensor_slices((c_col, s_col, p_col))
batch = batch.shuffle(c_col.shape[0]).batch(batch_size)  # mix data --> assemble batches
iterator = batch.make_initializable_iterator()
iter_init_operation = iterator.initializer
c_it, s_it, p_it = iterator.get_next()  # op that yields the next batch each time it is run in the session
# replace your close, signal, previous placeholders in your model by c_it, s_it, p_it when you define your model
...
with tf.Session() as sess:
    # you need to initialize the variables and the iterator
    sess.run([tf.global_variables_initializer(), iter_init_operation])
    ...
    for i in range(number_iter):
        # no manual slicing needed: each run pulls the next batch from the iterator
        sess.run(train_operation)
Good luck!
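For intuition about what the shuffle-then-batch step produces, here is a plain NumPy sketch. It is not the tf.data implementation, only an illustration of the same idea under the assumption that the shuffle buffer covers the whole dataset: draw one random permutation, then hand out aligned slices of every array. `shuffled_batches` is a hypothetical helper name and the arrays are synthetic.

```python
import numpy as np


def shuffled_batches(arrays, batch_size, seed=None):
    """Yield aligned mini-batches from several arrays after one full shuffle."""
    n = arrays[0].shape[0]
    order = np.random.default_rng(seed).permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        # The same row indices are applied to every array, so rows stay paired.
        yield tuple(a[idx] for a in arrays)


# Synthetic data: row i of `previous` starts with the value 180 * i.
close = np.arange(10, dtype=float)
previous = np.arange(10 * 180, dtype=float).reshape(10, 180)

batches = list(shuffled_batches((close, previous), batch_size=4, seed=0))
```

The last batch is smaller than `batch_size` when the row count is not a multiple of it, which is also what an input pipeline does unless the remainder is explicitly dropped.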

Parsing data requests from google flights using google flights package

I'm working on interacting with the google flights api (qpx). I am using the following link and working with the following experimental package to feed in information for a request:
https://github.com/rweyant/googleflights
Below is the code I have thus far for anyone interested in replicating my results:
#call library and data-------------------------------------------------------------------
library(googleflights)
library(MUCflights) #to access airport codes
data("airports")
#codes for countries i'm interested in------------------------------------------
code_list = airports
#later interface for updating codes
my_destinations = matrix(c("San Juan", "Amsterdam", "Berlin",
"San Diego", "Lima", "Cali", "Havana"))
my_home = matrix(c("LGA", "JFK"))
#loop extract
code_list = airports
code_bucket = NULL
for (i in my_destinations) {
  print(i)
  drop = code_list[code_list$City == i,c("City","IATA")]
  drop = as.data.frame(drop)
  print(drop)
  code_bucket = rbind(code_bucket, drop)
  code_bucket = as.data.frame(code_bucket)
}
#clean my code bucket---------------------------------------------------------------
code_bucket = na.omit(code_bucket)
code_bucket = code_bucket[code_bucket$IATA != "",]
code_bucket
#feed in codes into function---------------------------------------------------------
#each ping to QPX will combine NYC to x
#data i want
# pricing
# times
key = "(key is here)"
set_apikey(key)
result_flights = search(my_home[1], code_bucket[2,2], "2016-11-27", "2016-11-28")
I've been looking through the package details to understand the functionality and noticed that the request comes back as a list as opposed to a JSON, which seems to be for the application of a "summarise_segment" function that isn't working for me. Here is the link to the function I'm referencing:
https://github.com/rweyant/googleflights/blob/master/R/unpack.R
I'm wondering if anyone has any luck or ideas for parsing out the request that returns? The resulting list is large and I'm reaching the limits of my knowledge on dealing with these structures. Any help in pointing me in the right direction would be appreciated!

How to assign different colors to different sections of a route on a leaflet map? [R Studio]

I have a JSON file of a long route. The file contains the lat and long of this route.
I'm trying to mark different sections of this route based on a set of criteria (which I've compiled in a dataframe). However, I'm facing two problems:
1) How do I break up this long set of lat and longs into segments? (can't do this manually because I have many route variations)
2) How do I assign a variable color to each segment?
I intend to use leaflet map (for its interactivity), but I'm open to better suggestions.
When working with spatial data, it helps to know spatial classes! I am assuming you know how to read your JSON file as a data frame into R.
Here's a reproducible example:
library(mapview)
library(sp)
### create some random data with coordinates from (data("breweries91", package = "mapview"))
set.seed(123)
dat <- data.frame(val = as.integer(rnorm(32, 10, 2)),
lon = coordinates(breweries91)[, 1],
lat = coordinates(breweries91)[, 2])
### state condition for creation of lines
cond <- c(8, 9, 10)
### loop through conditions and create a SpatialLines object for each condition
lns <- lapply(seq(cond), function(i) {
  ind <- dat$val == cond[i]
  sub_dat <- dat[ind, ]
  coords <- cbind(sub_dat$lon, sub_dat$lat)
  ln <- coords2Lines(coords, ID = as.character(cond[i]))
  proj4string(ln) <- "+init=epsg:4326"
  return(ln)
})
### view lines with mapview
mapview(lns[[1]], col = "darkred") +
mapview(lns[[2]], col = "forestgreen") +
mapview(lns[[3]], col = "cornflowerblue")
Essentially, what we are doing here is creating a valid sp::SpatialLines object for each condition we specify. Then we plot those using mapview, given that you mentioned interactivity. Plotting of spatial objects can be achieved in many ways (base, lattice, ggplot2, leaflet, ...), so there are many options to choose from. Have a look at sp Gallery for a nice tutorial.
Note: This answer is only valid for non-projected geographic coordinates (i.e. latitude/longitude)!
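The example above groups points by value, wherever they occur along the route. If the sections should instead be contiguous stretches, the splitting step can be sketched as follows, in Python rather than R and independent of any mapping library. It assumes one label per route point; `split_route`, the labels and the color mapping are hypothetical.

```python
from itertools import groupby


def split_route(points, labels):
    """Split a route into contiguous segments that share the same label.

    Every segment after the first starts with the last point of the
    previous segment, so the drawn lines join without gaps.
    """
    segments = []
    i = 0
    for label, run in groupby(labels):
        n = len(list(run))
        start = max(i - 1, 0)
        segments.append((label, points[start:i + n]))
        i += n
    return segments


# Hypothetical (lon, lat) points with one criterion label per point.
points = [(11.0, 49.0), (11.1, 49.1), (11.2, 49.2), (11.3, 49.3), (11.4, 49.4)]
labels = ["slow", "slow", "fast", "fast", "slow"]
colors = {"slow": "darkred", "fast": "forestgreen"}

segments = [(colors[label], pts) for label, pts in split_route(points, labels)]
```

Each `(color, points)` pair can then be drawn as its own line, which is the same one-line-per-segment structure the SpatialLines example builds.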

qmplot error Error in unlist(.all_aesthetics[1:42]) : object '.all_aesthetics' not found

I'm trying to plot points on a map using ggmap, ggplot2 libraries. I'm successful using get_map to prepare the map, then ggmap to plot it...although I'm only able to plot ~80 coordinate points before I get an error that I'm exceeding the google map api limit of 2048 chars. Does this limit seem correct/expected?
Moving on, I tried the qmap and qmplot commands to (hopefully) overcome this constraint.
I'm successful with the qmap command; I'm using:
qmap("austin", zoom = 11, source="google", maptype = "roadmap", scale = 2) to create the map.
NOT successful with the qmplot command. I'm using:
qmplot(coord$lon, coord$lat, data = coord)
coord is a df with lat/lon pairs.
I get the error: Error in unlist(.all_aesthetics[1:42]) :
object '.all_aesthetics' not found
I haven't been able to find (via Google) anything about this error.
To check myself, I tried running the example code from pages 47 and 48:
https://cran.r-project.org/web/packages/ggmap/ggmap.pdf, example top of page 47
violent_crimes <- subset(crime,
offense != "auto theft" &
offense != "theft" &
offense != "burglary"
)
qmplot(lon, lat, data = violent_crimes, colour = offense,
size = I(3.5), alpha = I(.6), legend = "topleft")
Preparing violent_crimes (using a built-in dataset) works fine. The qmplot command results in the same error message that I'm getting with my code.
It's a bug that was addressed. See here
devtools::install_github("dkahle/ggmap")

Octave and multiple Bode plots

I'm teaching myself Octave and as a motivational exercise am attempting to create some Bode plots. I'd like to create a plot that has multiple curves for different values of a parameter in a transfer function, for example the time constant of a simple RC filter. I'm trying to do it as follows:
tau = [1,2,3]
for i = tau
g(i) = tf(1,[tau(i),1])
endfor
bode(g(1),g(2),g(3))
But it doesn't work, I get the error
error: octave_base_value::imag (): wrong type argument `struct'
However, it works fine if there are not multiple arguments to the bode command and the last line is simply:
bode(g(1))
Any advice as to where I've gone wrong would be appreciated - is there a better way to do what I want to do?
I was able to do it with the following sequence (with octave 3.2.4 on debian):
bode(g(1))
set (findobj (gcf, "type", "axes"), "nextplot", "add")
bode(g(2))
bode(g(3))
The second command is similar to hold on but it works when there are subplots; I found it here.
Using your own code:
subplot(211), hold on
subplot(212), hold on
tau = [1,2,3]
for i = 1:length(tau),
  g(i) = tf(1,[tau(i),1]);
  bode(g(i))
endfor
The problem with this solution is that you cannot identify a specific plot, because you cannot access figure properties directly through the bode() function.
Here, then, is a plausible solution that gives you colorful plots:
colorsplot = ["b","m","g"]
tau = [1,2,3]
g = tf(1,[tau(1),1]);
[mag, ph, w] = bode(g);
% magnitude in dB uses the base-10 logarithm
subplot(211), semilogx(w,20*log10(mag),colorsplot(1)), hold on
subplot(212), semilogx(w,ph,colorsplot(1)), hold on
for i = 2:length(tau),
  g = tf(1,[tau(i),1]);
  [mag, ph, waux] = bode(g,w);
  subplot(211), semilogx(w,20*log10(mag),colorsplot(i))
  subplot(212), semilogx(w,ph,colorsplot(i))
endfor
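As a cross-check on the numbers being plotted, the magnitude and phase of the RC filter 1/(tau*s + 1) can be computed directly. This is a sketch in Python with NumPy rather than Octave, and `first_order_bode` is a hypothetical helper, not part of any control package. It also shows why the decibel conversion needs the base-10 logarithm: at the corner frequency w = 1/tau the magnitude should come out at about -3 dB and the phase at -45 degrees.

```python
import numpy as np


def first_order_bode(tau, w):
    """Magnitude (dB) and phase (degrees) of H(s) = 1 / (tau*s + 1) at s = j*w."""
    h = 1.0 / (1j * w * tau + 1.0)
    mag_db = 20.0 * np.log10(np.abs(h))
    phase_deg = np.degrees(np.angle(h))
    return mag_db, phase_deg


# One curve per time constant, all on a shared frequency grid.
w = np.logspace(-2, 2, 201)
curves = {tau: first_order_bode(tau, w) for tau in (1.0, 2.0, 3.0)}

# Spot check at the corner frequency of the tau = 2 filter (w = 0.5 rad/s).
corner_db, corner_deg = first_order_bode(2.0, np.array([0.5]))
```

Evaluating every filter on the same frequency grid mirrors the `bode(g,w)` call in the loop above, which is what lets the curves share one set of axes.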