GraphHopper - Travel Times Between All 30,000 Visible Zip Codes?

I'd like to calculate the matrix of travel times between US zip codes. There are about 30,000 visible zip codes, so that is roughly 900 million calculations (or 450 million if travel time is assumed to be the same in both directions).
I haven't used GraphHopper before, but it seems suited to the task. My questions are:
What's the best way of doing it?
Will this overload the graphhopper servers?
How long will it take?
I can supply latitude and longitude for each pair of zip codes.
Thanks - Steve

I haven't tested GraphHopper with this large a number of points yet, but it should be possible.
What's the best way of doing it?
It would probably be faster to avoid the HTTP overhead and use the Java library directly, as in this example. Be sure to assign enough RAM, as the matrix alone is already roughly 2 GB (30,000 × 30,000 × 2 bytes ≈ 1.8 GB) even if you store only a short value for the distance or time. See also this question.
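For illustration, a minimal sketch of that approach (assuming the GraphHopper ~5.x Java API; the OSM file name, the zip-code loaders and the car/fastest profile are placeholders, and method names may differ in your GraphHopper version):

```java
import com.graphhopper.GHRequest;
import com.graphhopper.GHResponse;
import com.graphhopper.GraphHopper;
import com.graphhopper.config.Profile;

public class ZipMatrix {
    public static void main(String[] args) {
        // Build (or load) a local routing graph from an OSM extract of the US.
        GraphHopper hopper = new GraphHopper();
        hopper.setOSMFile("us-latest.osm.pbf");           // placeholder file name
        hopper.setGraphHopperLocation("./graph-cache");
        hopper.setProfiles(new Profile("car").setVehicle("car").setWeighting("fastest"));
        hopper.importOrLoad();

        double[] lat = loadZipLatitudes();                // hypothetical loaders for the
        double[] lon = loadZipLongitudes();               // ~30k zip-code centroids
        int n = lat.length;
        short[][] minutes = new short[n][n];              // ~1.8 GB of shorts for n = 30,000

        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                GHResponse rsp = hopper.route(
                        new GHRequest(lat[i], lon[i], lat[j], lon[j]).setProfile("car"));
                if (!rsp.hasErrors()) {
                    short t = (short) (rsp.getBest().getTime() / 60000); // ms -> minutes
                    minutes[i][j] = t;
                    minutes[j][i] = t;                    // assumes symmetric travel times
                }
            }
        }
    }

    // Placeholders: read centroid coordinates from whatever source holds your zip codes.
    static double[] loadZipLatitudes() { return new double[0]; }
    static double[] loadZipLongitudes() { return new double[0]; }
}
```

Even at a few milliseconds per route, roughly 450 million pairs adds up to days of computation, which is where the estimate further down comes from.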
Will this overload the graphhopper servers?
The API may not be used without an API key, which you can grab here. Alternatively, set up your own GraphHopper server.
How long will it take?
It will probably take several days, though.
Warning - enterprisey note: we provide support for setting up your servers or for your use case, and we also sell a matrix add-on that makes these calculations at least 10 times faster.

Related

Defining observation space and reward for traffic signal phase optimization for reinforcement learning

I am trying to use reinforcement learning for traffic signal phase optimization to improve traffic flow at intersections.
I am aware that in practice we won't be able to get information about all the vehicles in each of the lanes.
If we use a camera to get information about the queue length, then we can get accurate data only up to, say, 200 meters.
Should I take this into consideration while defining my observation space, or can I directly use the data from SUMO?
Furthermore, what would be the ideal observation space for such a task?
sumo_rl allows using various metrics for reward calculation, such as the pressure metric, queue length metric, etc. What would be a good choice of reward for my use case, and what factors should I consider while defining my reward?
I have tried getting metrics such as throughput, lane delay and queue length from the E2 detector's output file. For the agent, however, I might not be able to use them (as the traci/SUMO wrappers offer better implementations?). So how do I use traci to get this modified information?
Yes, you should try to match your observation space as closely to the real world as possible. SUMO can also filter the data directly (for instance with an E3 detector).
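As a rough illustration (a toy sketch not tied to sumo_rl or traci; the per-lane distance inputs and the constants are assumptions), one way to mirror the 200 m camera limit in the observation itself:

```java
import java.util.List;

public class ObservationBuilder {
    static final double CAMERA_RANGE_M = 200.0;   // sensor limit from the question
    static final double MAX_QUEUE_M = 200.0;      // used to normalise to [0, 1]

    /** distancesFromStopLine: per-lane distances (m) of detected vehicles from the stop line. */
    static double[] build(List<double[]> distancesFromStopLine) {
        double[] obs = new double[distancesFromStopLine.size()];
        for (int lane = 0; lane < obs.length; lane++) {
            double visibleQueue = 0.0;
            for (double d : distancesFromStopLine.get(lane)) {
                if (d <= CAMERA_RANGE_M) {
                    // Farthest visible vehicle serves as a crude queue-length proxy;
                    // vehicles beyond camera range are simply not observed.
                    visibleQueue = Math.max(visibleQueue, d);
                }
            }
            obs[lane] = Math.min(visibleQueue / MAX_QUEUE_M, 1.0);
        }
        return obs;
    }
}
```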
If you want to maximize flow, then the reward should also include the flow metric (throughput). It's quite easy to get via traci (as you already noticed), but I cannot tell how it integrates with your framework since you did not give details about it.

Should I use MySQL Geo-Spatial data types for vector graphics

I am working on a project where I need to store and do computations on SVG paths and points (preferably in MySQL). I need to be able to quickly query whether a point lies within a path. MySQL's geospatial features seem to support this kind of query with the ST_Within function.
However, I have found two opposing claims about whether MySQL's geospatial functionality takes the 'curvature of the earth' into account: "I understand spatial will factor in the curvature of the earth" versus "all calculations are performed assuming Euclidean (planar) geometry as opposed to the geocentric system (coordinates on the Earth's surface)". So my question is which of these claims is true, and whether/how it affects me.
Also, any general advice on whether I should be taking this approach of storing SVG objects as MySQL Geo-spatial data types is welcome.
Upon further research, it seems that the second claim is true. That is, all computations in MySQL are done without regard to the curvature of the earth and simply assume a flat plane. References:
https://www.percona.com/blog/2013/10/21/using-the-new-mysql-spatial-functions-5-6-for-geo-enabled-applications/
http://www.programering.com/a/MTNwQjMwATI.html
http://blog.karmona.com/index.php/2010/11/01/the-geospatial-cloud/
General advice on whether I should be taking this approach of storing SVG objects as MySQL Geo-spatial data types is still very much welcome.
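For what it's worth, a minimal sketch of the kind of point-in-path query ST_Within supports, via JDBC (the `svg_paths` table, its `outline` POLYGON column and the connection details are hypothetical; note the coordinates are treated as plain planar values, per the conclusion above):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PointInPath {
    public static void main(String[] args) throws Exception {
        // Hypothetical schema: svg_paths(id INT, outline POLYGON NOT NULL, SPATIAL INDEX(outline))
        // Requires MySQL Connector/J on the classpath.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/svgdb", "user", "password")) {
            String sql = "SELECT id FROM svg_paths " +
                         "WHERE ST_Within(ST_GeomFromText(?), outline)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, "POINT(120.5 43.25)");    // plain planar SVG coordinates
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        System.out.println("point falls inside path " + rs.getInt("id"));
                    }
                }
            }
        }
    }
}
```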

GoogleMapApi loses precision depending on the country you execute from

I'm using http://code.google.com/p/php-google-map-api/. I made an application to get the latitude and longitude of different street names, but when I execute this script from outside my country, precision is lost and I can't geolocate all the streets.
I think that Google keeps a different index depending on the country you are in. How can I change the country (or locale) of my API?
We once ran an experiment in mobile network development. We used Google Maps as the basic geolocation tool for mapping, locating and measuring base station characteristics. As a result, we got into trouble very quickly.
We needed rather precise data (about 5 meters maximum deviation), and what do you think? A street that was 2 km long (as measured after the experiment with the required accuracy) was calculated as 1.7 km in Google Maps.
Moreover, many of the patches (ground photos) shown on the map overlap each other in different ways. It depends on the country and on the precision of the imagery, because some countries are mapped in much more detail than others.
Speaking about streets, the deviation is too considerable to call the data precise. Google Maps should not be treated as a precise geolocation tool in any case, especially if high precision is required (street level is already above-normal precision).
So I suggest not taking this data too seriously. Otherwise Google Maps would be a very nice security breach for all of us: imagine you had a nuclear bomb or missile and already knew where to direct it with an accuracy of several meters, sitting somewhere in the middle of the Sahara. Here you are ...

When referring to 'number crunching', how intensive is 'intensive'?

I am currently reading about / learning Erlang, and it is often noted that it is not (really) suitable for 'heavy number crunching'. I often come across this phrase or something similar, but never really know what 'heavy' actually means.
How does one decide if an operation is computationally intensive? Can it be quantified before testing?
Edit:
Is there a difference between the quantity of calculations, the complexity of the algorithm, and the size of the input values?
For example, 1000 computations of 28303 / 4 vs. 100 computations of 239847982628763482 / 238742.
When you are talking about Erlang specifically, I doubt that you generally want to develop applications that require intensive number crunching with it. That is, you don't learn Erlang to code a physics engine in it. So don't worry about Erlang being too slow for you.
Moving from Erlang to the question in general, these things almost always come down to relativity. Let's ignore number crunching and ask a general question about programming: How fast is fast enough?
Well, fast enough depends on:
what you want to do with the application
how often you want to do it
how fast your users expect it to happen
If reading a file in some program takes 1ms or 1000ms - is 1000 ms to be considered "too slow"?
If ten files have to be read in quick succession - yes, probably way too slow. Imagine an XML parser that takes 1 second to simply read an XML file from disk - horrible!
If, on the other hand, a file only has to be read when a user manually clicks a button every 15 minutes or so, then it's not a problem, e.g. in Microsoft Word.
The reason nobody says exactly what "too slow" is, is that it doesn't really matter. The same goes for your specific question. A language should rarely, if ever, be shunned for being "slow".
And last but not least, if you develop some monstrous project in Erlang and, down the road, realise that dagnabbit! you really need to crunch those numbers - then you do your research, find good libraries and implement algorithms in the language best suited for it, and then interop with that small library.
With this sort of thing you'll know it when you see it! Usually this refers to situations where it matters whether you pick an int, float, double, etc.: things like physical simulations or Monte Carlo methods, where you want to do millions of calculations.
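To make "millions of calculations" concrete, here is a toy Monte Carlo estimate of pi (purely illustrative, in Java rather than a language you'd necessarily pick for this):

```java
import java.util.Random;

public class MonteCarloPi {
    public static void main(String[] args) {
        final long samples = 100_000_000L;  // "millions of calculations"
        final Random rnd = new Random(42);
        long inside = 0;

        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble();
            double y = rnd.nextDouble();
            if (x * x + y * y <= 1.0) {     // point falls inside the quarter circle
                inside++;
            }
        }
        // The fraction of points inside the quarter circle approximates pi / 4.
        System.out.println("pi ~= " + 4.0 * inside / samples);
    }
}
```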
To be honest, in reality you just write those bits in C and use your favourite other language to run them.
I once asked a question about number crunching in CouchDB map/reduce: CouchDB Views: How much processing is acceptable in map reduce?
What's interesting in one of the answers is this:
Suppose you had 10,000 documents and they take 1 second each to process (which is way higher than I have ever seen). That is 10,000 seconds or 2.8 hours to completely build the view. However, once the view is complete, querying any row (?key=...) or row slice (?startkey=...&endkey=...) takes the same time as querying for documents directly. Lookup time is O(log n) for the document count.
In other words, even if it takes 1 second per document to execute the map, it will take a few milliseconds to fetch the result. (Of course, the view must build first, since it is actually an index.)
I think that if you consider your current question in those terms, it's an interesting angle to look at it from. On the topic of the language's speed / optimization:
How does one decide if an operation is computationally intensive?
Facebook asked this question about PHP and ended up writing HipHop to solve the problem; it compiles PHP into C++. They said the reason PHP is much slower than C++ is that the PHP language is all dynamic lookup, and therefore a lot of processing is required to do anything with variables, arrays, dynamic typing (which is a source of slowdown), etc.
So, a question you can ask is: does Erlang use dynamic lookup? Static typing? Is it compiled?
Is there a difference between the quantity of calculations, the complexity of the algorithm or the size of the input values? For example 1000 computations of 28303 / 4 vs 100 computations of 239847982628763482 / 238742
So, with that said, the fact that you can even assign specific types to numbers of different kinds means you SHOULD be using the right types, and that will definitely improve performance.
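As a rough illustration of that point (a toy micro-benchmark using the numbers from the question; JIT warm-up and other effects make Java micro-benchmarks imprecise, so treat the output as indicative only):

```java
import java.math.BigInteger;

public class DivisionBench {
    public static void main(String[] args) {
        final int runs = 10_000_000;

        // Fixed-width primitive division: cheap machine arithmetic.
        long sink = 0;
        long t0 = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += (239847982628763482L + i) / 238742L;  // vary the dividend so the JIT can't constant-fold it
        }
        long primitiveNanos = System.nanoTime() - t0;

        // Arbitrary-precision division: object allocation plus multi-word arithmetic.
        final BigInteger divisor = BigInteger.valueOf(238742L);
        BigInteger acc = BigInteger.ZERO;
        long t1 = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            acc = acc.add(BigInteger.valueOf(239847982628763482L + i).divide(divisor));
        }
        long bigNanos = System.nanoTime() - t1;

        System.out.printf("long division:       %d ms (sink=%d)%n", primitiveNanos / 1_000_000, sink);
        System.out.printf("BigInteger division: %d ms (acc bits=%d)%n", bigNanos / 1_000_000, acc.bitLength());
    }
}
```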
Suitability for number crunching depends on library support and the inherent nature of the language. For example, a pure functional language will not allow any mutable variables, which makes it extremely interesting to implement any equation-solving type problems. Erlang probably falls into this category.

TCP Slow Start, Congestion Avoidance & Determining Bandwidth

Is there a formula somewhere that can be used to determine the minimum number of segments / bytes that need to be transferred across a TCP connection to determine its bandwidth, one that takes Slow Start and Congestion Avoidance into account? I'm aware of the pathrate tool, but if possible I want something a bit simpler that I can incorporate into an app to get a decent ballpark figure. One example of usage would be downloading some data from a web server in order to determine the optimal number of threads for downloading a bunch of small files automatically. This is related to a previous question I posted: TCP, HTTP and the Multi-Threading Sweet Spot
You can fire up scholar.google.com and search for "TCP chirp". However, that requires high-resolution timers, and unless you write a kernel TCP congestion control algorithm, you'd have to reimplement TCP in userspace. That by itself will probably not give good results (general-purpose OSes are not very good at real-time, high-resolution timer work running in userspace).
In theory, using TCP chirp you need as few as 4-5 segments (typically, you'd get better resolution with a longer train of segments) to determine the "optimal" bandwidth.
In any case, since you cannot know which path is used (i.e. a satellite link or TV broadcast in the forward direction), you may need a considerable amount of data (10+ MB, perhaps even 1 GB) to get a decent measurement over arbitrary paths. Satellites can have many dozens of MB/s of bandwidth, but also latencies in the 1000-3000 ms range, and TCP takes a couple of round-trip times to open up cwnd (I'd say around 10 RTTs before a measurement should be started).
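A back-of-the-envelope sketch of that ramp-up (assuming an initial window of 10 segments, a 1460-byte MSS, a 600 ms RTT and a window that doubles every RTT, which real stacks only approximate):

```java
public class SlowStartEstimate {
    public static void main(String[] args) {
        final int mss = 1460;            // bytes per segment (assumption)
        final int initCwndSegments = 10; // common initial window (assumption)
        final double rttSeconds = 0.6;   // e.g. a satellite-ish path

        long cwndBytes = (long) initCwndSegments * mss;
        long totalBytes = 0;

        // During slow start the window roughly doubles every RTT,
        // so the bytes delivered before round k grow geometrically.
        for (int rtt = 1; rtt <= 10; rtt++) {
            totalBytes += cwndBytes;
            System.out.printf("after %2d RTTs (%.1f s): cwnd=%,d bytes, delivered=%,d bytes%n",
                    rtt, rtt * rttSeconds, cwndBytes, totalBytes);
            cwndBytes *= 2;
        }
        // On a 600 ms RTT path, 10 RTTs is ~6 s and ~15 MB just to ramp up,
        // which is in line with the 10+ MB figure above.
    }
}
```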
I do not think there is a fixed number of bytes that must be sent to determine the bandwidth; it can depend on the network type and speed.
Bandwidth is a measure of data transferred over a time interval. To get real numbers you need to measure it. Here are some hints on how to do that.