I'm following this example:
https://developer.here.com/api-explorer/rest/routing/time-based-isoline-start-as-center
but it seems to always return a different result depending on when I call it. I would like to specify the departure time at which the isochrone is calculated, but I don't see any parameter for it like the one in the simple routing example.
Can I specify this parameter, or do I just have to make my program run automatically at different times of day? (I'm proposing that solution just as a joke.)
The fact that you get a different result depending on when you call the API is because the routing algorithm takes the traffic speed and incidents into account when calculating the route.
You can set a departure time in the request with the departure query parameter, using either a specific time or the special value now:
departure
Time when travel is expected to start. Traffic speed and incidents are taken into account when calculating the route. You can use now to specify the current time. It can be used only if parameter start is also used. Type: xs:dateTime.
departure=2013-07-04T17:00:00+02
Note: When the optional timezone offset is not specified, the departure value is assumed to be in local time.
Source: Calculate Isoline API Reference
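A minimal sketch of building such a request with a fixed departure time. The base URL and the start value format are placeholders (assumptions, not taken from the reference); only the start and departure parameter names and the start requirement come from the documentation quoted above.

```javascript
// Build an isoline request URL with an explicit departure time.
// baseUrl and the "geo!lat,lon" start format are assumptions; check the
// Calculate Isoline API Reference for the exact endpoint and value formats.
function buildIsolineUrl(baseUrl, params) {
  // The reference says departure can only be used together with start.
  if (params.departure !== undefined && params.start === undefined) {
    throw new Error("departure requires the start parameter");
  }
  const url = new URL(baseUrl);
  for (const [key, value] of Object.entries(params)) {
    // searchParams.set percent-encodes ":" and "+" in the timestamp.
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// A fixed departure time with an explicit +02 offset, so repeated calls
// should be computed for the same moment regardless of when they are sent.
const url = buildIsolineUrl("https://isoline.example.invalid/calculateisoline.json", {
  start: "geo!52.5,13.4", // hypothetical coordinates and format
  departure: "2013-07-04T17:00:00+02",
});
console.log(url);
```

Passing departure: "now" instead asks for the current time explicitly, which reproduces the time-dependent behaviour described in the question.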
I need to measure the response times of my API endpoints using Graphite, but I'm unsure about the namespace.
Suppose I have one endpoint to measure, called in two ways:
/results/
/results/?organization_id=100&limit=100
Assuming I have tons of results, the first request will be much slower than the second one (and I need to measure both).
So how should I create the namespace for this in Graphite? Is it common to include query params in the namespace (e.g. project.results.get.organization_id=100&limit=100 and project.results.get.all)? One of my concerns about including the query params is that the namespace will blow up if the query params keep changing (different organization, different limit, different field, etc.).
I believe the best metric naming scheme (namespaces) depends on what you want to use these metrics for.
For example, assuming that in your case what you want to check is the average, min and max time for the /results/ endpoint whatever the query parameters are, you would use a single metric name like stats.results.timer. This is useful to monitor how long requests are taking and make sure they are fast enough.
On the other hand, if querying /results/ with and without querystring parameters gives very different timings and you want to track them separately, you could create individual metrics for each combination of parameter names, like stats.results.organization_id__limit, stats.results.organization_id, etc. I would never include the parameter values in the metric name, since that would indeed potentially blow up the metrics namespace.
I would definitely go for the first option, but again it depends a lot on your concrete use case.
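If you do go with the second option, the important part is to derive the metric name from the set of parameter names, never from their values. A minimal sketch; the stats prefix, the no_params suffix and the alphabetical ordering (which yields limit__organization_id rather than organization_id__limit) are illustrative choices, not a Graphite convention:

```javascript
// Replace anything that is not safe inside a Graphite path node
// (dots would create extra nodes) with an underscore.
const sanitize = (s) => s.replace(/[^A-Za-z0-9_-]/g, "_");

// One metric per combination of parameter NAMES; values are ignored, so the
// namespace stays bounded no matter which organization or limit is requested.
function metricName(prefix, requestUrl) {
  const url = new URL(requestUrl, "http://localhost"); // base only resolves relative paths
  const pathSegments = url.pathname.split("/").filter(Boolean).map(sanitize);
  // Sorting makes "?limit=..&organization_id=.." and the reverse order map to
  // the same metric; the Set collapses repeated names like ?a=1&a=2.
  const names = [...new Set(url.searchParams.keys())].sort().map(sanitize);
  const suffix = names.length > 0 ? names.join("__") : "no_params";
  return [prefix, ...pathSegments, suffix].join(".");
}

// Graphite plaintext protocol line: "<metric path> <value> <unix timestamp>".
function plaintextLine(metric, value, timestampSeconds) {
  return `${metric} ${value} ${timestampSeconds}\n`;
}
```

Any new organization_id or limit value then reuses stats.results.limit__organization_id instead of creating a new series.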
Here's the scenario. I'm using a MySQL/NodeJS/Sequelize stack on the server, and I have a request I want to perform.
There are anywhere from 1000-2000 entries that are retrieved from the request, but I don't need the full list of entries. I want the summary of the entries after they have been grouped by the day the entry was made, which condenses down to about 5-10 objects in an array.
If I group by day on the server, it may group them by a different time zone than the client's. If I send them to the client instead, I have to send 1-2k entries for analysis, but they will be grouped correctly.
How would you handle this scenario?
Use UTC timestamps for everything. For example open the Chrome console and enter this:
// Milliseconds since 1/1/1970
new Date().getTime()
1438263135084
// Human friendly without timezone
new Date().toISOString()
"2015-07-30T13:32:15.715Z"
// Human friendly with timezone
new Date().toString()
"Thu Jul 30 2015 09:32:21 GMT-0400 (EDT)"
All of these represent the same instant (time since 1/1/1970), just formatted differently. Using either of the first two lets you send dates that can be interpreted and formatted correctly regardless of timezone.
Interesting reading: https://en.wikipedia.org/wiki/Coordinated_Universal_Time
Perhaps you could convert the dates to the user's timezone (see "Django MySQL group by day with timezone"), but that of course requires you to know the user's timezone, and caching will be harder.
In Sequelize, it would be expressed something like:
sequelize.fn('CONVERT_TZ', sequelize.col('date'), 'UTC', user_tz)
First off, thanks all for your suggestions on how to handle this. Ultimately, I resolved this issue by doing the following.
First, in the request from the client, I used the Moment.js method .utcOffset() to get the client's UTC offset (in minutes).
Then, once I had the offset value, I included it in the API query as ?offset=(value here).
Finally, on the server side, I used this value to interpret the client's local timezone offset, grouped all the data there, and sent the formatted data back to the client. It resulted in a much faster, much, much smaller response.
It was a fairly simple solution; I'm bugged it took me so long to come up with it.
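A minimal sketch of the server-side grouping described above, assuming each entry has a UTC timestamp in a hypothetical createdAt field and that offsetMinutes is the value of moment().utcOffset() sent by the client (positive east of UTC, e.g. -240 for EDT; note that the built-in Date.getTimezoneOffset() uses the opposite sign):

```javascript
// Group UTC timestamps by the client's calendar day using a fixed UTC offset.
function groupByLocalDay(entries, offsetMinutes) {
  const counts = new Map();
  for (const entry of entries) {
    const utcMs = new Date(entry.createdAt).getTime();
    // Shift into the client's wall-clock time, then read the date back in UTC,
    // so the server's own time zone never enters the calculation.
    const localDay = new Date(utcMs + offsetMinutes * 60000).toISOString().slice(0, 10);
    counts.set(localDay, (counts.get(localDay) || 0) + 1);
  }
  // Return the small summary array, sorted by day ("YYYY-MM-DD" sorts lexically).
  return [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([day, count]) => ({ day, count }));
}
```

Because a single offset is applied to every entry, results can be off by an hour for entries on the other side of a daylight-saving change within the queried range.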
I am trying to use the following formula and a key I obtained from mapquest (for the free service) to initiate their location services and calculate the distances between one variable location and 20 of my plant locations.
=importXML("http://mapquestapi.com/directions/v2/route?key=*****************&outFormat=xml&from=" & $B$3 & "&to=" & 56244,"//response/route/distance")
This works flawlessly, except that after using it for a short period of time I received an email stating I have used 80% of my allotment (15000 transactions) for the month.
The variable location has only been changed around 20-25 times this month, so I don't see how I could have used that many transactions. Can someone explain what exactly this formula is doing and how I could make it more efficient, if possible? I feel like it has to be using transactions that are unnecessary. Keep in mind I do not need the actual directions; all I need is the driving mileage.
Thanks in advance.
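A quick sanity check on the numbers: 20 plants × 20-25 location changes is at most 500 route lookups, while 80% of 15000 is 12000 transactions, which suggests the cells are being fetched far more often than the location changes. If the lookup is moved out of the formula into a script, caching each from/to pair means each pair costs at most one transaction. A minimal sketch, assuming a hypothetical fetchDistance(from, to) function that performs exactly one MapQuest call:

```javascript
// Wrap a (hypothetical) fetchDistance(from, to) so that each from/to pair
// triggers at most one API transaction; repeated lookups come from memory.
function makeCachedDistance(fetchDistance) {
  const cache = new Map();
  return function cachedDistance(from, to) {
    const key = `${from}|${to}`;
    if (!cache.has(key)) {
      cache.set(key, fetchDistance(from, to));
    }
    return cache.get(key);
  };
}
```

The cache here only lives as long as the script runs; to survive across runs it would have to be persisted somewhere (for example, written back into the sheet).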
Intro
I'm using a slightly modified GTFS database.
I have a first-step algorithm that, given two geographical locations, provides:
the list of stops around departure and arrival
the list of routes that connects those list of stops
The second-step algorithm finds the best journeys matching those stops and routes.
This is working well on direct journeys as well as journeys using one connection.
My problem arises when trying to find the best journey using 2 connections (so there are 3 trips to be searched).
Database
The GTFS format has the following tables (each table has a foreign key to the previous/next table in this list):
stops: stop information (geolocation, name, etc)
stop_times: timetable
trips: itinerary taken by a vehicle (bus, metro, etc)
routes: family of trips that roughly take the same path (e.g. standard and express trips on the same route, but different stops taken)
I have added the following tables
stop_connections: stop-to-stop connections (around 1 to 20 per stop)
stops_routes: lists the available routes at every stop
Here's the table row count in a city where I get slow results (Paris, France):
stops: 28k
stop_times: 12M
trips: 513k
routes: 1k
stop_connections: 365k
stops_routes: 227k
Algorithm
The first step of my algo takes two latitude/longitude points as input, and provides:
the list of stops at each location
the routes that can be used to connect those stops (with up to two connections)
The second step takes each start stop, and analyses the available journeys that use only the routes selected by the first step.
This is the part that I'm trying to optimize. Here's how I'm querying the database:
My search terms (green in the picture):
one departure stop
several arrival stops (1 to 20)
allowed routes at departure, at first connection and on last trip
service ID (not relevant here, can be ignored)
Here's what I do now:
Start from a stop => get timetable => get trips => get routes; filter on allowed routes.
Connect the arrival stops of the first trip to a list of possible stops using stop_connections
Repeat from step 1 two times so that I have 3 trips/2 connections
The problem
This is working fine in some cases, but it can be very slow in others. Usually, as soon as I join the timetable or the stop connections, the number of returned rows increases tenfold. Since I'm joining these tables 8 times, there are potentially 10^8 rows to be searched by the engine.
Now I'm sure that I can get this to be more efficient.
My problem is that the number of rows increases at every join, and the arrival stop selection is made at the very end.
In other words, I get all the possible journeys from a given stop at a given departure time (there can be millions of combinations), and only when my search reaches the last trip can I filter on the ~20 allowed arrival stops.
It could be much faster if I could somehow 'know' soon enough that a route isn't worth searching.
Optimizations
Here's what I tried/thought of:
1. Inner join stops_routes when joining stop_connections
Only select stops at a connection that lead to the allowed routes at next trip.
This is sometimes efficient when there are a lot of connections and not all the connected stops are interesting (some connected stops might only be used by routes we don't want to take).
However this inner join can increase the number of rows if there are not many connected stops and a lot of allowed routes.
2. Partition the stop_times table
Create a smaller copy of the stop_times table that contains only the timetable of the next two hours or so. Indeed, having the database engine search the timetable (up to 10pm, for example) when my trip starts at 8am is useless. Keeping only 8am-10am is enough and much faster.
This is very efficient, because it dramatically decreases the number of rows to be searched.
I have implemented this with success, it decreased the search time by a factor of about 10x or even 100x.
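The filter that fills such a smaller table can be sketched as follows. This is a minimal in-memory illustration assuming rows shaped like GTFS stop_times (departure_time as "HH:MM:SS"); in the setup described here, the same condition would be the WHERE clause used to populate the copy.

```javascript
// GTFS times are "HH:MM:SS" and may exceed 24:00:00 for trips running past
// midnight, so parse them as minutes after midnight rather than as Dates.
function toMinutes(hms) {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 60 + m + s / 60;
}

// Keep only rows departing within [departureMin, departureMin + windowMinutes].
function windowStopTimes(stopTimes, departureMin, windowMinutes = 120) {
  const end = departureMin + windowMinutes;
  return stopTimes.filter((row) => {
    const t = toMinutes(row.departure_time);
    return t >= departureMin && t <= end;
  });
}
```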
3. Identify 'good' and 'bad' routes
There are usually, in a metropolitan area, large routes that are very useful for travelling long distances. But these routes aren't the best option for short distances. A person who knows their own city's public transportation system will quickly tell you that, from this neighborhood to that one, the best option is to take a specific route.
However this is very difficult to do, and requires a customization on every city.
I plan to make this algo completely independent of the city, so I'm not really willing to go down that road.
4. Use crowdsourcing to identify paths that work well
The first search is slow, but the information taken from it can be used to serve fast result to the next person with a similar journey.
However there are so many combinations of departure and arrival stops that the information taken from one query might not be very useful.
I don't know if this is a good idea. I haven't implemented it.
Next
I'm running out of ideas. I know this is not a programming question, but rather a request of ideas on an algorithm. I hope this falls into the SO scope.
Having it on a network makes things a little bit interesting, but fundamentally, you're doing pathfinding, which is a slow process. You're running into the exponential nature of the problem, and doing so with only 3 connections.
I have a couple suggestions that you can perhaps use while doing this with mysql, and a couple that are likely not implementable within it.
Rather than partitioning the timetable, only take the next time for any given route. If you're leaving at 8 AM, you're correct, only looking at routes from 8-10 is better than looking at them all. However, if there's a route from A-B that leaves at 8:20, 8:40, 9:00, 9:15, 9:25, 9:45... there is zero reason to take them all: just take the first arrival time for any given route, since it's strictly better than the rest.
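A minimal sketch of keeping only the first usable departure per route, assuming hypothetical rows with routeId and departureMin (minutes after midnight) fields. The "strictly better" argument holds only if vehicles on the same route never overtake each other; since the routes table in the question groups standard and express trips together, passing a key that identifies the stop pattern instead of the route may be safer.

```javascript
// For each key (by default the route), keep the earliest departure at or after
// departureMin; later vehicles with the same key are dominated and dropped.
function firstDeparturePerRoute(rows, departureMin, keyOf = (row) => row.routeId) {
  const best = new Map();
  for (const row of rows) {
    if (row.departureMin < departureMin) continue; // already left
    const current = best.get(keyOf(row));
    if (current === undefined || row.departureMin < current.departureMin) {
      best.set(keyOf(row), row);
    }
  }
  return [...best.values()];
}
```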
I presume you are pruning any routes that return to an already-visited location? If not, you perhaps should be: they're not useful for you. This may be somewhat difficult to do within the SQL framework.
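Outside SQL, this pruning is straightforward in a depth-first enumeration that tracks the stops already on the current path. A minimal sketch over a hypothetical adjacency Map (stop to reachable stops), limited to maxLegs legs:

```javascript
// Enumerate paths from start to any target stop using at most maxLegs legs,
// pruning every branch that would return to a stop already on the path.
function enumeratePaths(adjacency, start, targets, maxLegs) {
  const results = [];
  const path = [start];
  const onPath = new Set([start]);
  function extend() {
    const last = path[path.length - 1];
    if (path.length > 1 && targets.has(last)) {
      results.push([...path]); // reached an arrival stop: record and stop here
      return;
    }
    if (path.length - 1 === maxLegs) return; // leg budget used up
    for (const next of adjacency.get(last) || []) {
      if (onPath.has(next)) continue; // would revisit a stop: prune
      path.push(next);
      onPath.add(next);
      extend();
      path.pop();
      onPath.delete(next);
    }
  }
  extend();
  return results;
}
```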
Depending on its coverage, you could perhaps find a path using the (much smaller) routes table, and then find the best implementation of the top working paths from the trips table.
This is likely impossible within the framework of SQL, but the thing that makes most decent pathfinding algorithms fast is that they use a heuristic to search. Your search goes down every possible route -- it would be a lot faster to first look down the route that leads in the right direction. If it doesn't pan out, less likely directions are picked. The key here is that as soon as you have a result, you return it -- you effectively pruned every route you didn't yet search by the time you returned an answer.
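For illustration, here is A* as one standard example of heuristic search (a swapped-in technique, not something the answer prescribes). It is a minimal sketch that assumes planar stop coordinates in km, static edge travel times in minutes (it ignores timetables), and a maxSpeed upper bound so that the straight-line heuristic never overestimates:

```javascript
// A* search: expand the node with the lowest (cost so far + estimated cost to
// goal) and return as soon as the goal is popped.
function aStar(nodes, edges, start, goal, maxSpeedKmPerMin) {
  // Straight-line distance / max speed: never overestimates the remaining time.
  const h = (id) => {
    const a = nodes[id], b = nodes[goal];
    return Math.hypot(a.x - b.x, a.y - b.y) / maxSpeedKmPerMin;
  };
  const g = new Map([[start, 0]]);
  const cameFrom = new Map();
  const open = [{ id: start, f: h(start) }];
  const closed = new Set();
  while (open.length > 0) {
    // Linear scan for the lowest f; a binary heap would scale better.
    let best = 0;
    for (let i = 1; i < open.length; i++) if (open[i].f < open[best].f) best = i;
    const { id } = open.splice(best, 1)[0];
    if (id === goal) {
      // Everything still in the open list is effectively pruned.
      const path = [id];
      while (cameFrom.has(path[0])) path.unshift(cameFrom.get(path[0]));
      return { cost: g.get(goal), path };
    }
    if (closed.has(id)) continue; // stale duplicate entry
    closed.add(id);
    for (const [next, w] of edges[id] || []) {
      const tentative = g.get(id) + w;
      if (tentative < (g.has(next) ? g.get(next) : Infinity)) {
        g.set(next, tentative);
        cameFrom.set(next, id);
        open.push({ id: next, f: tentative + h(next) });
      }
    }
  }
  return null; // goal unreachable
}
```

In this small example, the expensive detour through D is never expanded because the route through B reaches C first.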
Pre-calculated preferred routes: you suggest this would require human intervention, but I counter that you could do it computationally. Spend the time properly searching for routes from various points to various other points, and check the statistics of how the routes worked. I would expect that you will find things allowing you to make an "anywhere over here to anywhere over there is going to use this intermediate path" table; your problem is reduced from "find a path from A to B" to "find a path from A to C, followed by a path from D to B". This can cause you to find sub-optimal routes (as you are making an assumption from the precalculated statistics), but it may let you find that sub-optimal route much faster. On a mesh layout it will not work well at all; on a hub layout it will work excellently.
Thanks to zebediah49, I have implemented the following algorithm:
0. Lookup tables
First, I have created an ID on the trips table that identifies the sequence of stops a trip takes. So two trips with the same ID are guaranteed to take exactly the same route.
I called this ID trip_type.
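One way to derive such a key, sketched in memory from rows shaped like GTFS stop_times (trip_id, stop_id, stop_sequence). Numbering the distinct stop sequences is an illustrative assumption; the post does not say how the ID itself is generated.

```javascript
// Assign a trip_type to every trip: trips that serve exactly the same stops in
// the same order share a trip_type.
function assignTripTypes(stopTimes) {
  // Sorting by stop_sequence puts each trip's stops in travel order.
  const sorted = [...stopTimes].sort((a, b) => a.stop_sequence - b.stop_sequence);
  const stopsByTrip = new Map();
  for (const st of sorted) {
    if (!stopsByTrip.has(st.trip_id)) stopsByTrip.set(st.trip_id, []);
    stopsByTrip.get(st.trip_id).push(st.stop_id);
  }
  const typeIds = new Map(); // stop-sequence signature -> small integer id
  const tripType = new Map(); // trip_id -> trip_type
  for (const [tripId, stops] of stopsByTrip) {
    const signature = JSON.stringify(stops); // unambiguous, whatever the stop IDs contain
    if (!typeIds.has(signature)) typeIds.set(signature, typeIds.size + 1);
    tripType.set(tripId, typeIds.get(signature));
  }
  return tripType;
}
```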
I have improved my stop_connections table so that it includes a cost. This is used to select the best connection when two 'from' stops are connected to the same 'to' stop.
1. Get trips running from the departure stop(s)
Limit those trips to only 1 per trip type (group by trip_type)
2. Get arrival stops from these trips
Select only the best trip if there are two trips reaching the same stop
3. Get connected stops from these arrival stops
Select only the best connection if there are >1 stops that are connected to the same stop
4. Repeat from step 1
I have split this into several subqueries and temporary tables, because that makes it easy to group and filter the best stops/trips at each step. This keeps the number of searches sent to the SQL server to a minimum.
I have stored this algorithm in an SQL procedure, so that it runs from a single SQL statement:
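The "keep only the best" step that is repeated in steps 2 and 3 can be sketched in memory as follows. This is a minimal illustration, not the stored procedure itself; it assumes arrivals shaped as { stop, time } and connections as { from, to, cost }, with the connection cost expressed in minutes and added to the arrival time (the post only says the cost is used to pick the best connection).

```javascript
// For each key, keep the row with the lowest cost (a GROUP BY ... MIN).
function bestBy(rows, keyOf, costOf) {
  const best = new Map();
  for (const row of rows) {
    const key = keyOf(row);
    const current = best.get(key);
    if (current === undefined || costOf(row) < costOf(current)) best.set(key, row);
  }
  return [...best.values()];
}

// Steps 2-3 once: keep the earliest arrival per stop, follow the connections,
// and keep only the cheapest way to reach each connected stop.
function expandConnections(arrivals, connections) {
  const bestArrivals = bestBy(arrivals, (a) => a.stop, (a) => a.time);
  const candidates = [];
  for (const a of bestArrivals) {
    for (const c of connections) {
      if (c.from === a.stop) candidates.push({ stop: c.to, time: a.time + c.cost, via: a.stop });
    }
  }
  return bestBy(candidates, (x) => x.stop, (x) => x.time);
}
```

The nested loop keeps the sketch short; indexing connections by from would avoid rescanning them for every stop.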
call Get2CJourneys(dt, sd, sa, r1, r2, r3)
Where:
dt: departure time
sd: stops at departure point
sa: stops at arrival point
r1, r2, r3: allowed routes for the 1st, 2nd and 3rd trips
The procedure call returns interesting results in <600ms where my previous algorithm returned the same results in several minutes.
Expanding on zebediah49's fourth point, you can precompute the vector traveled by a route, e.g. a route going due north has a vector of 0, due west = 90, due south = 180, due east = 270. Then only return routes whose vectors are within, say, +/- 15 degrees (modulo 360) of the as-the-crow-flies direction (or +/- 30 if the +/- 15 query doesn't return any hits).
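A minimal sketch of that filter, assuming each route already carries a precomputed bearing field (for example from its first stop to its last stop; the answer does not say how to compute it). It uses the usual compass convention, clockwise from north (east = 90), rather than the anticlockwise one above; either works as long as routes and queries use the same one.

```javascript
const toRad = (deg) => (deg * Math.PI) / 180;

// Initial great-circle bearing from a to b, in degrees clockwise from north.
function bearing(a, b) {
  const phi1 = toRad(a.lat), phi2 = toRad(b.lat);
  const dLon = toRad(b.lon - a.lon);
  const y = Math.sin(dLon) * Math.cos(phi2);
  const x = Math.cos(phi1) * Math.sin(phi2) - Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLon);
  return ((Math.atan2(y, x) * 180) / Math.PI + 360) % 360;
}

// Smallest difference between two bearings, in [0, 180] (handles 350 vs 10).
function angleDiff(a, b) {
  return Math.abs(((a - b + 540) % 360) - 180);
}

// Keep routes heading within `tolerance` degrees of the crow-flies direction,
// widening once to `fallback` degrees if nothing matches.
function routesInDirection(routes, from, to, tolerance = 15, fallback = 30) {
  const target = bearing(from, to);
  const pick = (tol) => routes.filter((r) => angleDiff(r.bearing, target) <= tol);
  const narrow = pick(tolerance);
  return narrow.length > 0 ? narrow : pick(fallback);
}
```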
We are trying to get the last N changes for a user, and currently do so by getting the largestChangeId, then subtracting a constant from that and getting more changes.
As an example, we typically are making API calls with the changestamp = largestChangeId - 300, with maxResults set to 300.
We've seen anywhere from half a dozen to 180 changes come back across our userbase with these parameters.
One issue that we're running into is that the number of changes we get back is rather unpredictable, with huge jumps in change stamps for some users, so we have to choose between two rather unpalatable scenarios to get the last N changes:
Request lots of changes, which can lead to slow API calls simply because there are lots of changes.
Request a small set of changes and seek back progressively in smaller batches, which is also slow because it requires multiple API calls.
Our goal is to get the last ~30 or so changes for a user as fast as possible.
As a workaround, we are currently maintaining per user state in our application to tune the max number of changes we request up or down based on the results we got for a user the last time around. However, this is somewhat fragile due to how the rate of changes incrementing for users can vary over time.
So my question is as follows:
Is there a way to efficiently get the last N changes for a user, specifically in one API call?
ID generation is very complex; it's impossible to calculate the ID of the user's nth latest change :) The changes list actually has no features that would be appropriate for your use case. In my personal opinion, the changes list should be in reverse chronological order; I'm going to discuss it with the rest of the team.
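Without a single-call option, the progressive seek-back from the question remains a fallback. A minimal sketch, assuming a hypothetical async listChanges(startChangeId, maxResults) wrapper that makes one changes-list call and resolves to changes in ascending ID order, and assuming one call can return as many results as requested. Doubling the window keeps the number of calls logarithmic in how far back the last N changes lie, and the returned windowSize can be stored per user as the starting point for next time, as in the workaround described above.

```javascript
// Fetch the last n changes by doubling how far back the window starts until
// enough changes come back or the window reaches the first change ID.
async function lastNChanges(listChanges, largestChangeId, n, initialWindow = 300) {
  let windowSize = initialWindow;
  for (;;) {
    const start = Math.max(1, largestChangeId - windowSize);
    // Every change ID in [start, largestChangeId] fits in one page of this size.
    const changes = await listChanges(start, largestChangeId - start + 1);
    if (changes.length >= n || start === 1) {
      return { changes: changes.slice(-n), windowSize };
    }
    windowSize *= 2; // too few changes in this range: look further back
  }
}
```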