I'm trying to find all points that are on (or near < 10 m) from the lines in the below example.
These are two separate vector layers, I want to create a third layer, which is a subset of only the points on or near the lines i.e. removing the outliers.
In QGis I have been trying the following but have not been successful:
Vector > Geoprocessing Tools > Intersection
Vector > Research tools > Select by location
Vector > Data Management Tools > Join attributes by location
In the dialog boxes I've tried adjusting for intersection, and touching at different precisions.
None of these solutions gives the desired effect.
Any tips
This was what I did in the end, was a little convoluted but works:
1) Create buffer around road network and dissolve into a single polygon:
**Vector > Geoprocessing Tools > Fixed distance buffer **
input: Nnes
distance: 0.0001
segments = 100
dissolve = true
rename layer: buffer_lines
2) Create buffer around points:
**Vector > Geoprocessing Tools > Fixed distance buffer **
input: points
distance: 0.00001
segments = 100
dissolve = true
rename layer: buffer_points
3) Select buffer_points fully contained by buffer_lines.
**Vector > Research Tools > Select by location **
from: buffer_points
in: buffer_lines
within
4) Save selected features as new layer, by right clicking layer, and tick selected features only.
Create a buffer around the points. This buffer should be the distance from the line within which you wish to pick up points - in your case 10 metres. It will come in handy later on if you give each point a unique ID before this step (if not already done).
Take the intersections between the buffer and point layer. This will give you the sections of line which sit in these 10 metre buffers. The attributes table will tell you which point the buffer belongs to and which line it is intersected by.
Process in Excel to use the unique IDs to obtain which points sit within 10 metres of the line. You may wish to use a VLOOKUP() or INDEX(MATCH()) formula to obtain the point geometries from the original points layer.
My solution is using "join atributtes by nearest neighbor".
The first layer should be your point layer.
The second should be your lines layer.
IMPORTANT: The (optional) maximum distance would be 10m (in your case).
IMPORTANT: Check the "discard non matching" - this will discard all that are farther than the provided distance
You may or may not actually join atributtes but only the points that fit your maximum distance criteria will be exported to a newly created layer.
Related
I have 2 shapefiles that represent roads, let's call them shapes A and B. Each road is represented as line segments. File B is almost a superset of the other, with just a few roads of A not represented. File A with one segment selected (in red):
In this superset file (B), the segments are smaller. I can say that for every segment in A there are one or more segments in B. I believe there isn't a segment in B that corresponds to more than one segment in A. Here is shapefile B with one segment selected (in red):
The line coordinates aren't exact, just very near each other. Here are the coordinates of the leftmost dot of the selected line:
Dot in file A: -42.92896076999995 , -22.77139965999993
Dot in file B: -43.217942900516830, -22.888565009926047
I'm using geopandas.
How would I cross-reference the two datasets? For each line segment in file B find the associated segment in file A (if it exists)?
The question seems to depend on what you are using as a standard for cross-references. For example, you must first decide whether to assume a case where each segmentation intersects or whether to define it as the minimum distance between each segmentation.
Anyway, using geopandas or shapely for both is not a difficult task. After dividing all the segmentation of A into individual linestrings, you can use the overlay function of geopandas to find the occurrence of even a slight intersection with the segmentation of B.
You will have to decide whether to find the shortest distance orthogonal or the shortest distance between the start point and end point of each segmentation. You can use from shapely.ops import nearest_points etc. You can use all the features of shapely to target the geometry of geopandas.
I have a table for segments storages on table as polygons.
Then I want to get all segments that are touched by another polygon for example a square, or a circle.
On image : http://img.acianetmedia.com/GJ3
I represent small gray boxes as segments and big_BOX .
With this query:
SELECT id, position, ASTEXT( value )
FROM segment
WHERE MBRCONTAINS( GEOMFROMTEXT( 'POLYGON(( 20.617202597319 -103.40838420263,20.617202597319 -103.3795955521,20.590250599403 -103.3795955521,20.590250599403 -103.40838420263,20.617202597319 -103.40838420263))' ) , value )
I got 4 segments that are 100% inside big_BOX,
but how to get ALL segments that are touched by big_BOX ?
result has to be 16 segments.
A simple solution:
Instead of MBRContains, you should use MBRIntersects which will return any results that either fully or partially cross space with your big box.
A caution and full solution:
Dependent on your data, and the rest of your solution (especially on how big box is formed), it may be possible that you return more than 16 segments due the number of decimal places your coordinates use. Whilst this is however quite unlikely and would only ever be possible under extreme circumstances its just a possibility to consider.
At 7 decimal places, you're at 1.1cm accuracy (at the equator). If your big box looked to exactly line up with a 4x4 segment set, it is possible (at an absolute maximum degree) that you actually get a result set of 36 (6x6) due to the coordinates overlapping into the next segment on all sides by even the most minute measurement. Any multiple of 4 between 16 and 36 inclusive could be possible.
Again, this is largely unlikely, but if you wanted to always ensure a result set of 16 you could use a combination of methods such as Area(Intersection(#geom1, #geom2)) to calculate the intersection geography between your big box and Intersecting segments, order on that column descending and take the first 16 results.
Whilst this would guarantee the most appropriate 16 segments, it will add additional overhead to all queries just to cater for the most extreme scenarios.
The choice is yours. Hope it helps.
What is the difference between quantization and simplification?
Is quantization another way of doing simplification?
Is it better to use quantization in certain situations?
Or should i be using a combination of both?
The total size of your geometry is controlled by two factors: the number of points and the number of digits (the precision) of each coordinate.
Say you have a large geometry with 1,000,000 points, where each two-dimensional point is represented as longitude in ±180° and latitude in ±90°:
[-90.07231180399987,29.501753271000098],[-90.06635619599979,29.499494248000133],…
Real numbers can have arbitrary precision (in JSON; in JavaScript they are limited by the precision of IEEE 754) and thus an infinite number of digits. But in practice the above is pretty typical, so say each coordinate has 18 digits. Including extra symbols ([, ] and ,), each point takes at most 1 + 18 + 1 + 18 + 1 = 39 bytes to encode in JSON, and the entire geometry is about 39 * 1,000,000 ≈ 39MB.
Now say we convert these real numbers to integers: both longitude and latitude are reduced to integers x and y where 0 ≤ x ≤ 99 and 0 ≤ y ≤ 99. A simple mapping between real-number points ⟨λ,φ⟩ and integer coordinates ⟨x,y⟩ is:
x = floor((λ + 180) / 360 * 100);
y = floor((φ + 90) / 180 * 100);
Since each coordinate now takes at most 2 digits to encode, each point takes at most 1 + 2 + 1 + 2 + 1 = 7 bytes to encode in JSON, and the entire geometry is about 7MB; we reduced the total size by 82%.
Of course, nothing comes for free: if you remove too much information, you will no longer be able to display the geometry accurately. The rule of thumb is that the size of your grid should be at least twice as big as the largest expected display size for the entire map. For example, if you’re displaying a world map in a 960×500 pixel space, then the default 10,000×10,000 (-q 1e4) is a reasonable choice.
So, quantization removes information by reducing the precision of each coordinate, effectively snapping each point to a regular grid. This reduces the size of the generated TopoJSON file because each coordinate is represented as an integer (such as between 0 and 9,999) with fewer digits.
In contrast, simplification removes information by removing points, applying a heuristic that tries to measure the visual salience of each point and removing the least-noticeable points. There are many different methods of simplification, but the Visvalingam method used by the TopoJSON reference implementation is described in my Line Simplification article so I won’t repeat myself here.
While quantization and simplification address these two different types of information mostly independently, there’s an additional complication: quantization is applied before the topology is constructed, whereas simplification is necessarily applied after to preserve the topology. Since quantization frequently introduces coincident points ([24,62],[24,62],[24,62]…), and coincident points are removed, quantization can also remove points.
The reason that quantization is applied before the topology is constructed is that geometric inputs are often not topologically valid. For example, if you takes a shapefile of Nevada counties and combine it with a shapefile of Nevada’s state border, the coordinates in one shapefile might not exactly match the coordinates in the other shapefile. By quantizing the coordinates before constructing the topology, you snap the coordinates to a regular grid and can get a cleaner topology with fewer arcs, hopefully correctly identifying all shared arcs. (Of course, if you over-quantize, then you can cause too many coincident points and get self-intersecting arcs, which causes other problems.)
In a future release, maybe 1.5.0, TopoJSON will allow you to control the quantization before the topology is constructed independently from the quantization of the output TopoJSON file. Thus, you could use a finer grid (or no grid at all!) to compute the topology, then simplify, then use a coarser grid appropriate for a low-resolution screen display. For now, these are tied together, so I recommend using a finer grid (e.g., -q 1e6) that produces a clean topology, at the expense of a slightly larger file. Since TopoJSON also uses delta-encoded coordinates, you rarely pay the full price for all the digits anyway!
The two are related, but have different purposes and results.
I believe quantization collapses nearby points based on the parameter (which you tune to the expected resolution of the view) - no point in having a resolution higher than the pixels that will be drawing the map. But it doesn't go out of the way to analyze the path to determine the optimal number of points needed to represent the shape.
Simplification is an algorithm that will analyze the polygon and reduce the number of points in an optimal manner such that the overall deformation of the polygon is minimized. Basically, it can be used to dramamatically reduce the number of points (and thus file size) without noticeable impact to the quality of the path.
As a parallel case study, consider a straight line made up of 10 points. Quantization will reduce the number of points (collapsing nearby or coincident points) based on the value you use. Simplification will analyze the line and realize that 8 out of the ten points can be removed without significantly changing the polygon's overall shape, and reduce the line to two points (because there is no deformation of the path by removing points on a line).
See also:
Topojson reference: https://github.com/mbostock/topojson/wiki/Command-Line-Reference
M. Bostock's Simplification article: http://bost.ocks.org/mike/simplify/
Both should be used in combination: quatization to reduce the map to a right sized grid, simplification to optimize the paths.
The problem is as follows,
I would be given a set of x and y coordinates(an coordinate array of around 30 to 40 thousand) of a long rope. The rope is lying on the ground and can be in any shape.
Now I would be given a start point(essentially x and y coordinate) and an ending point.
What is the efficient way to determine the set of x and y coordinates from the above mentioned coordinate array lie between the start and end points.
Exhaustive searching ie looping 40k times is not an acceptable solution (mentioned on the question paper)
A little bit margin for error is acceptable
We need to find the start point in the array, then the end point. For each, we can think of the rope as describing a function of distance from that point, and we're looking for the lowest point on that distance graph. If one point is a long way away and another is pretty close, we can do some kind of interpolation guess of where to search next.
distance
| /---\
|-- \ /\ -
| -- ------- -- ------ ---------- -
| \ / \---/ \--/
+-----------------------X--------------------------- array index
In the representation above, we want to find "X"... we look at the distances at a few points, get an impression of the slope of the distance curve, possibly even the rate of change of that slope, to help guide our next bit of probing....
To refine the basic approach of doing binary- or interpolated- searches in areas where we know the distance values are low, we may be able to use the following:
if we happen to be given the rope length and know the coordinate samples are equidistant along the rope, then we can calculate a maximum change in distance from our target point per sample.
if we know the rope has a stiffness ensuring it can't loop in a trivially small diameter, then
there's a known limit to how fast the slope of the curve can change
distance curve converges to vertical on both sides of the 0 point
you could potentially cross-reference/combine distance with, or use instead, the direction of each point from the target: only at the target would the direction instantly change ~180 degrees (how well the data points capture this still depends on the distance between adjacent samples and any stiffness of the rope).
Otherwise, there's always risk the target point may weirdly be encased by two very distance points, frustrating our whole searching algorithm (that must be what they mean about some margin for error - every now and then this search would have to revert to a O(N) brute-force search because any trend analysis fails).
For a one-time search, sometimes linear traversal is the simplest, fastest solution. Maybe that's the case for this problem.
Iterate through the ordered list of points until finding the start or end, and then collect points until hitting the other endpoint.
Now, if we expected to repeat the search, we could build an index to the points.
Edit: This presumes no additional constraints beyond those mentioned by #koool. Constraining the distance between the points would allow the hill-climbing approach described in #Tony's answer.
I don't think you can solve it accurately using anything other than exhaustive search. Say for cases where the rope is folded into half and the resulting double rope forms a spiral with the two ends on the centre.
However if we assume that long portions of the rope are in straight line, then we can eliminate a lot of points based on the slope check:
if (abs(slope(x[i],y[i],x[i+1],y[i+1])
-slope(x[i+1],y[i+1],x[i+2],y[i+2]))<tolerance)
eliminate (x[i+1],y[i+1]);
This will reduce the search time significantly if large portions of the rope are in straight line. But will be linear WRT number of remaining points.
So basically, you've got a sorted list of the points that comprise the entire rope and you're given two arbitrary points from within that list, and tasked with returning the sublist that exists between those two points.
I'm going to make the assumption that the start and end points that are provided are guaranteed to coincide exactly with points within the sorted list (otherwise it introduces a host of issues, particularly if the rope may be arbitrarily thin and passes by the start/end points multiple times).
That means all you're really looking for are the indices of the two provided coordinates. Or the index of one, and the answer to "is the second coordinate to the right or to the left?".
A simple O(n) solution to that would be:
For each index in array
coord = array[index]
if (coord == point1)
startIndex = index
if (coord == point2)
endIndex = index
if (endIndex < startIndex)
swap(startIndex, endIndex)
return array.sublist(startIndex, endIndex)
Or, if you wanted to optimize for repeated queries, I'd suggest a hashing based approach where you map each cooordinate to its index in the array. Something like:
//build the map (do this once, at init)
map = {}
For each index in array
coord = array[index]
map[coord] = index
//find a sublist (do this for each set of start/end points)
startIndex = map[point1]
endIndex = map[point2]
if (endIndex < startIndex)
swap(startIndex, endIndex)
return array.sublist(startIndex, endIndex)
That's O(n) to build the map, but once it's built you can determine the sublist between any two points in O(1). Assuming an efficient hashmap, of course.
Note that if my assumption doesn't hold, then the same solutions are still usable, provided that as a first step you take the provided start and end points and locate the points in the array that best correspond to each one. As noted, unless you are given some constraints regarding the thickness of the rope then interpolating from an arbitrary coordinate to one that's actually part of the rope can only be guesswork at best.
I am trying to get the current traffic conditions at a particular location. The GTrafficOverlay object mentioned here only provides an overlay on an existing map.
Does anyone know how I can get this data from Google using their API?
It is only theorical, but there is perhaps a way to extract those data using the distancematrix api.
Method
1)
Make a topological road network, with node and edge, something like this:
Each edge will have four attributes: [EDGE_NUMBER;EDGE_SPEED;EDGE_TIME,EDGE_LENGTH]
You can use the openstreetmap data to create this network.
At the begining each edge will have the same road speed, for example 50km/h.
You need to use only the drivelink and delete the other edges. Take also into account that some roads are oneway.
2)
Randomly chose two nodes that are not closer than 5 or 10km
Use the dijsktra shortest path algorithm to calculate the shortest path between this two nodes (the cost = EDGE_TIME). Use your topological network to do that. The output will look like:
NODE = [NODE_23,NODE_44] PATH = [EDGE_3,EDGE_130,EDGE_49,EDGE_39]
Calculate the time needed to drive between the two nodes with the distance matrix api.
Preallocate a matrix A of size N X number_of_edge filled with zero value
Preallocate a matrix B of size 1 X number_of_edge filled with zero value
In the first row of matrix A fill each column (corresponding to each edge) with the length of the edge if the corresponding edge is in the path.
[col_1,col_2,col_3,...,col_39,...,col_49,...,col_130]
[0, 0, len_3,...,len_39,...,len_49,...,len_130] %row 1
In the first row of matrix B put the time calculated with the distance matrix api.
Then select two news node that were not used in the first path and repeat the operation until that there is no node left. (so you will fill the row 2, the row 3...)
Now you can solve the linear equation system: Ax = B where speed = 1/x
Assign the new calculated speed to each edge.
3)
Iterate the point 2) until your calculated speed start to converge.
Comment
I'm not sure that the calculated speed will converge, it will be interesting to test the method.I will try to do that if I got some time.
The distance matrix api don't provide a traveling time more precise than 1 minute, that's why the distance between the pair of node need to be at least 5 or 10 or more km.
Also this method fails to respect the Google's terms of service.
Google does not make available public API for this data.
Yahoo has a feed (example) with traffic conditions -- construction, accidents, and such. A write-up on how to access it is here.
If you want actual road speeds, you will probably need to work with a commercial provider.