How to cross reference 2 shapefiles of road segments? - gis

I have 2 shapefiles that represent roads, let's call them shapes A and B. Each road is represented as line segments. File B is almost a superset of the other, with just a few roads of A not represented. File A with one segment selected (in red):
In this superset file (B), the segments are smaller. I can say that for every segment in A there are one or more segments in B. I believe there isn't a segment in B that corresponds to more than one segment in A. Here is shapefile B with one segment selected (in red):
The line coordinates aren't exact, just very near each other. Here are the coordinates of the leftmost dot of the selected line:
Dot in file A: -42.92896076999995 , -22.77139965999993
Dot in file B: -43.217942900516830, -22.888565009926047
I'm using geopandas.
How would I cross-reference the two datasets? For each line segment in file B find the associated segment in file A (if it exists)?

The answer depends on what criterion you use to decide that two segments correspond. You first have to decide whether two segments match when they intersect, or when the distance between them is smallest.
Either way, this is not a difficult task with geopandas or shapely. After splitting every road in A into individual linestrings, you can use the overlay function of geopandas to find even slight intersections with the segments of B.
For a distance criterion, you will have to decide whether to use the shortest perpendicular distance or the shortest distance between the start and end points of each segment. You can use from shapely.ops import nearest_points and similar helpers; any shapely operation can be applied to the geometries held by geopandas.
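To make the minimum-distance criterion concrete, here is a dependency-free Python sketch. It is only an illustration under assumptions: the function names, the dictionary layout, and the tolerance value are hypothetical, and the coordinates are treated as planar (for real data you would likely reproject to a metric CRS first). For each segment of B it measures how far B's vertices stray from each line of A and keeps the closest line within the tolerance. With geopandas itself, sjoin_nearest may be a more direct route to a similar result.

```python
import math


def point_segment_distance(p, a, b):
    """Planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_line_distance(p, line):
    """Distance from point p to a polyline given as a list of (x, y)."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def match_segments(lines_a, lines_b, tolerance):
    """Map each id in lines_b to the id of the closest line in lines_a.

    The distance of a B segment to an A line is the largest distance of
    any B vertex to that A line. Segments with no A line within
    `tolerance` map to None.
    """
    matches = {}
    for b_id, b_line in lines_b.items():
        best_id, best_dist = None, tolerance
        for a_id, a_line in lines_a.items():
            dist = max(point_line_distance(p, a_line) for p in b_line)
            if dist <= best_dist:
                best_id, best_dist = a_id, dist
        matches[b_id] = best_id
    return matches


# Hypothetical toy data: two long roads in A, four shorter pieces in B.
lines_a = {"a1": [(0, 0), (10, 0)], "a2": [(0, 5), (10, 5)]}
lines_b = {
    "b1": [(0, 0.1), (4, 0.1)],
    "b2": [(4, 0.1), (10, -0.1)],
    "b3": [(0, 5.05), (10, 5.05)],
    "b4": [(0, 50), (10, 50)],  # a road that A does not contain
}
matches = match_segments(lines_a, lines_b, tolerance=0.5)
```

Measuring from B's vertices to A (rather than the other way round) fits the situation described in the question, where B's segments are the shorter ones.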

Related

How to convert a directed graph to its most minimal form?

I'm dealing with rooted, directed, potentially cyclic graphs. Each vertex in the graph has a label, which might or might not be unique. Edges do not have labels. The graph has a designated root vertex from which every vertex is reachable. The order of the edges outgoing from a vertex is relevant.
For my purposes, a vertex is equal to another vertex if they share the same label, and if their outgoing edges are also considered equal (and are in the same order). Two edges are equal if they have the same direction and if the vertices at their corresponding ends are equal.
Because of the equality rules above, a graph can contain multiple "sections" that are effectively equal. For example, in the graph below, there are two isomorphic sections containing vertices with labels {1, 2, 3, 4}. The root of the graph is vertex 0.
(source: graphonline.ru)
I need to be able to identify sections that are identical, and then remove all duplication, without changing the "meaning" of the graph (with regard to the equality rules above). Using the above example as input, I need to produce this:
(source: graphonline.ru)
Is there a known way of doing this within polynomial time?
The solution that ended up working was to essentially run the recursive equality check against every pair of vertices with the same label.
Let S = all pairs of vertices with the same label
For each pair s in S:
    Compare the two vertices a and b in s by recursively comparing their children
    If they compare as equal, take all edges in the graph pointing to b, and point them to a instead
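The steps above can be sketched in Python as follows. The adjacency-list representation (`children`, `labels`) and the function names are assumptions made for illustration. Pairs that are already being compared are treated as equal so that the recursion terminates on cycles; that is one common way to handle cyclic structures and not necessarily what the original solution did.

```python
def vertices_equal(children, labels, a, b, assumed=None):
    """Recursive equality check on two vertices.

    `assumed` holds pairs currently under comparison; meeting one of them
    again means we went around a cycle, so it is taken as equal.
    """
    if assumed is None:
        assumed = set()
    if a == b or (a, b) in assumed:
        return True
    if labels[a] != labels[b] or len(children[a]) != len(children[b]):
        return False
    assumed.add((a, b))
    # Outgoing edges are ordered, so compare children position by position.
    return all(
        vertices_equal(children, labels, ca, cb, assumed)
        for ca, cb in zip(children[a], children[b])
    )


def minimize(children, labels, root):
    """Return a copy of the graph with equal vertices merged."""
    children = {v: list(out) for v, out in children.items()}
    vertices = list(children)
    for i, a in enumerate(vertices):
        for b in vertices[i + 1:]:
            if a not in children or b not in children:
                continue  # one of them was merged away earlier
            if labels[a] != labels[b]:
                continue
            if not vertices_equal(children, labels, a, b):
                continue
            # Keep a, unless b is the root (the root must survive).
            keep, drop = (a, b) if b != root else (b, a)
            for v in children:
                children[v] = [keep if c == drop else c for c in children[v]]
            del children[drop]
    return children


# Hypothetical example: root 0 points at two identical 2-cycles.
labels = {0: "0", 1: "1", 2: "2", 5: "1", 6: "2"}
children = {0: [1, 5], 1: [2], 2: [1], 5: [6], 6: [5]}
minimized = minimize(children, labels, root=0)
```

This runs in polynomial time (a quadratic number of pairs, each with a bounded recursive check), although deep graphs may hit Python's recursion limit, in which case an explicit stack would be needed.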

QGIS find points that are on or near a line

I'm trying to find all points that are on, or within 10 m of, the lines in the example below.
These are two separate vector layers. I want to create a third layer that is a subset containing only the points on or near the lines, i.e. with the outliers removed.
In QGIS I have tried the following, without success:
Vector > Geoprocessing Tools > Intersection
Vector > Research tools > Select by location
Vector > Data Management Tools > Join attributes by location
In the dialog boxes I've tried adjusting for intersection, and touching at different precisions.
None of these solutions gives the desired effect.
Any tips?
This is what I did in the end. It is a little convoluted, but it works:
1) Create buffer around road network and dissolve into a single polygon:
Vector > Geoprocessing Tools > Fixed distance buffer
input: lines
distance: 0.0001
segments = 100
dissolve = true
rename layer: buffer_lines
2) Create buffer around points:
Vector > Geoprocessing Tools > Fixed distance buffer
input: points
distance: 0.00001
segments = 100
dissolve = true
rename layer: buffer_points
3) Select buffer_points fully contained by buffer_lines.
Vector > Research Tools > Select by location
from: buffer_points
in: buffer_lines
within
4) Save selected features as new layer, by right clicking layer, and tick selected features only.
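The buffer distances in these steps are plain numbers, so if the layers use a geographic CRS they are presumably degrees rather than metres. As a rough check of how 10 m translates, here is a small Python sketch. The figure of 111,320 m per degree and the helper name `meters_to_degrees` are assumptions for illustration only; reprojecting both layers to a metric CRS is likely the more robust option.

```python
import math

# Rough length of one degree of latitude in metres (an approximation).
METERS_PER_DEGREE_LAT = 111_320.0


def meters_to_degrees(distance_m, latitude_deg):
    """Approximate a ground distance as (degrees latitude, degrees longitude).

    A degree of longitude shrinks with the cosine of the latitude, so the
    east-west figure grows as you move away from the equator.
    """
    deg_lat = distance_m / METERS_PER_DEGREE_LAT
    deg_lon = distance_m / (
        METERS_PER_DEGREE_LAT * math.cos(math.radians(latitude_deg))
    )
    return deg_lat, deg_lon


# 10 m expressed in degrees at the equator and at 60 degrees latitude.
at_equator = meters_to_degrees(10.0, 0.0)
at_60_north = meters_to_degrees(10.0, 60.0)
```

By this estimate a buffer of 0.0001 degrees is on the order of 11 m north to south, which is close to the 10 m asked for, but the east-west extent depends on latitude.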
Create a buffer around the points. This buffer should be the distance from the line within which you wish to pick up points - in your case 10 metres. It will come in handy later on if you give each point a unique ID before this step (if not already done).
Take the intersection between the buffer layer and the line layer. This will give you the sections of line which sit in these 10 metre buffers. The attribute table will tell you which point each buffer belongs to and which line it is intersected by.
Process in Excel to use the unique IDs to obtain which points sit within 10 metres of the line. You may wish to use a VLOOKUP() or INDEX(MATCH()) formula to obtain the point geometries from the original points layer.
My solution uses "join attributes by nearest neighbor".
The first layer should be your point layer.
The second should be your lines layer.
IMPORTANT: The (optional) maximum distance would be 10m (in your case).
IMPORTANT: Check the "discard non matching" - this will discard all that are farther than the provided distance
You may or may not actually join attributes, but only the points that meet your maximum distance criterion will be exported to the newly created layer.

What's the difference between LineString and Multipoint in GeoJSON

What's the difference between LineString and MultiPoint in GeoJSON?
To me the examples given are identical. http://geojson.org/geojson-spec.html#id3.
I'm planning some things in GeoJSON and if something as basic as this is confusing I'm in trouble.
A LineString is a polyline or, from Wikipedia, "a curve specified by the sequence of points". So if you would like to draw a track or route based on latitude and longitude for a map application, use LineString.
MultiPoint is simply a collection of points without lines between them. Say a group of people. So the use cases are very different.
Specification of LineString requires at least two positions.
Other than that there is only a difference in intent. LineString defines a line through the points in given order. MultiPoint defines a finite collection of points.
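A short Python sketch of that difference in intent. The two geometries below carry exactly the same coordinates and differ only in their "type"; the `describe` helper is a hypothetical function written for this illustration, not part of any GeoJSON library.

```python
import json

# Identical positions, written as [longitude, latitude].
coordinates = [[100.0, 0.0], [101.0, 1.0]]

line_string = {"type": "LineString", "coordinates": coordinates}
multi_point = {"type": "MultiPoint", "coordinates": coordinates}


def describe(geometry):
    """Say how a consumer would interpret the coordinate array."""
    coords = geometry["coordinates"]
    if geometry["type"] == "LineString":
        if len(coords) < 2:
            raise ValueError("a LineString needs at least two positions")
        return f"path with {len(coords) - 1} connected segment(s)"
    if geometry["type"] == "MultiPoint":
        return f"{len(coords)} unconnected point(s)"
    raise ValueError(f"unhandled geometry type: {geometry['type']}")


# Serialized, the two objects differ only in the "type" member.
line_json = json.dumps(line_string)
point_json = json.dumps(multi_point)
```

A renderer would draw the first object as a line through the positions in order and the second as separate markers.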

Please explain ST_GeomFromText parameters

I am having trouble understanding ST_GeomFromText. It looks like there are 3 sets of 2 numbers. Why is that? Wouldn't coordinates just consist of a latitude and longitude?
Here is an example from http://postgis.net/docs/ST_GeomFromText.html:
SELECT ST_GeomFromText('LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932)');
ST_GeomFromText() takes a WKT expression of a geometry object and
"Constructs a PostGIS ST_Geometry object from the OGC Well-Known text representation."
The WKT expression in the example is a LINESTRING, which is
"a one-dimensional object representing a sequence of points and the line segments connecting them."
You might think a linestring would be two-dimensional, but it's not, because a line has no width or height. (Points are 0-dimensional, polygons are 2-dimensional).
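So, by definition, a linestring has more than one pair of coordinates: each of the three pairs in the example is one vertex of the line, written as x y (here longitude, then latitude). A single pair of coordinates would be a POINT, not a linestring, and would look something like this, in conjunction with the function in question: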
ST_GeomFromText('POINT (30 10)');
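To see how the WKT text breaks down, here is a small Python sketch that splits the two examples into vertices. It is a toy parser written for illustration (the function name is hypothetical, and it handles only simple 2D POINT and LINESTRING text); it is not how PostGIS parses WKT.

```python
import re


def parse_wkt_coordinates(wkt):
    """Return (geometry type, list of (x, y) vertices) for simple WKT."""
    match = re.fullmatch(
        r"\s*(POINT|LINESTRING)\s*\((.*)\)\s*", wkt, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError("only simple POINT and LINESTRING WKT is handled")
    geometry_type = match.group(1).upper()
    vertices = []
    # Vertices are separated by commas; x and y by whitespace.
    for pair in match.group(2).split(","):
        x, y = (float(number) for number in pair.split())
        vertices.append((x, y))
    return geometry_type, vertices


line_type, line_vertices = parse_wkt_coordinates(
    "LINESTRING(-71.160281 42.258729,-71.160837 42.259113,-71.161144 42.25932)"
)
point_type, point_vertices = parse_wkt_coordinates("POINT (30 10)")
```

The linestring yields three vertices of two numbers each, which is where the "3 sets of 2 numbers" in the question come from, while the point yields a single vertex.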
You may want to read up on some GIS fundamentals:
http://www.cise.ufl.edu/~mschneid/Service/Tutorials/TutorialSDT.pdf - excellent tutorial
http://www.opengeospatial.org/standards/orm - OGC Reference Model

Multidimensional interpolation

I have a dataset of samples in a multidimensional space (in my case a 4D space). There are samples on all the corners of the 4D cube and a substantial number of samples inside the cube, but not on a neat grid. Each sample has an output value alongside its 4D coordinate. The cube has coordinates [0,0,0,0]..[1,1,1,1].
Given a new 4D coordinate, how can I come up with the best interpolated value from these samples? That is, how do I choose the samples to start with, and how do I interpolate?
As a first guess I would guess that this can be done with a two step process:
Find the smallest convex pentachoron (the 4D equivalent of the 3D tetrahedron or the 2D triangle) around the coordinate we need to interpolate.
Interpolate within this pentachoron.
Especially step 1 seems quite complex and slow.
Here's the first approach I'd try.
Step 1
Find the point's 5 nearest neighbors by Euclidean distance. A simplex in 4D has 5 vertices, and these 5 points must be affinely independent (not all in one 3D hyperplane) because they are used next to create a Barycentric coordinate system. Those 5 points become the vertices of your pentachoron (aka 4-simplex).
If nearest-neighbor checks are too slow, try structuring your data into a spatial lookup tree that works in 4D.
Step 2
Now we need to associate a value with the interpolation point X. Start by deriving X's representation in this new Barycentric coordinate system. This Barycentric coordinate consists of 5 numbers, one weight per vertex, which collectively describe where the interpolation point sits relative to the 4-simplex's vertices.
Normalize the Barycentric coordinate so its components sum to 1.
Each of those 5 simplex vertices is a data point and has an output value. Combine those 5 output values into a vector.
Finally, interpolate by calculating the dot product of the normalized coordinate with the vector of output values.
Source: This idea is really just a 4D extension of this gem in the middle of the Barycentric coordinate system page on Wikipedia.
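Step 2 can be sketched in plain Python as follows, assuming the simplex has already been chosen and has five affinely independent vertices (a simplex in 4D needs five). The function names are hypothetical. The weights are found by solving a small linear system whose last row forces them to sum to 1, so no separate normalization step is needed; this is one way to compute them, not necessarily what the answer had in mind.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("simplex vertices are not affinely independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def barycentric_weights(vertices, point):
    """Weights w with sum(w) == 1 and sum(w[i] * vertices[i]) == point."""
    dim = len(point)
    # One equation per coordinate axis, plus the sum-to-one constraint.
    matrix = [[v[axis] for v in vertices] for axis in range(dim)]
    matrix.append([1.0] * len(vertices))
    return solve_linear_system(matrix, list(point) + [1.0])


def interpolate(vertices, values, point):
    """Dot product of the barycentric weights with the vertex values."""
    weights = barycentric_weights(vertices, point)
    return sum(w * value for w, value in zip(weights, values))


# Hypothetical simplex: the origin plus the four unit vectors, with values
# sampled from f(x) = 1 + x0 + 2*x1 + 3*x2 + 4*x3.
vertices = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
query = (0.1, 0.2, 0.3, 0.2)
result = interpolate(vertices, values, query)
```

If the query point lies outside the chosen simplex, some weights come out negative and the result is an extrapolation, which is worth checking for when the vertices are simply the nearest neighbors.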