ST_Polygonize/Shapely cannot polygonize when overlapping with shared node/point

ST_Polygonize/Shapely cannot polygonize when overlapping with shared node/point - gis

I'm trying to use both polygonize_full (Shapely, python) and ST_Polygonize (postgis) to get all polygons from a set of line strings.
There is one case where it fails: when there is an "overlapping polygon" with a shared "node".
Pictured here:
The result set is empty, but shapely provides a 'dangles' that provides the entire "failed" linestring,
Has anyone encountered this? Is there a way to gracefully solve this? I couldn't find anything in the docs.
I would expect these two be two separate polygons that overlap eachother, despite their shared node.
Here's the WKT i'm using for example.
MULTILINESTRING((-122.43682090001732 37.767652910517015,-122.4366733785215 37.767652910517015),(-122.4366733785215 37.767652910517015,-122.4366733785215 37.76759990327466),(-122.4366733785215 37.76759990327466,-122.43682090001732 37.76759990327466),(-122.43682090001732 37.76759990327466,-122.43682090001732 37.767652910517015),(-122.43682090001732 37.767652910517015,-122.43675920920998 37.767717579352684),(-122.43675920920998 37.767717579352684,-122.43675920920998 37.767626406895836),(-122.43675920920998 37.767626406895836,-122.43682090001732 37.767652910517015))

Related

Minor bug in pararrayfun? Is there a workaround (threads vs processes)?

I am using Octave version 4.2.2, but I think the question applies to previous versions as well.
I want to know if the following behavior is well-known and caused by my ignorance or if it is a bug that should be addressed, and also if there is a workaround. Note that I am focusing on parallelization in situations where vectorization is not possible, and where copy by values are not an option.
Basically, my problem is that functions such as pararrayfun or parcellfun seem to violate the principle that the properties of handles are passed by reference.
As an example, suppose we define
classdef data_class < handle
properties
data
end
end
and that we want to take each element from the .data property of one object input_arr from this class, apply some stochastic function fancy_func to them, and copy the result to the corresponding index of the .data property of another object arr of this class. The .data property is just a matrix of size 12*1, and we want to have 4 processes that each process 3 elements.
elem_per_process=3;
num_processes=4;
start_indexes={1,4,7,10};
function []=fill_arr(start_idx,num_elems,in_arr_handle,arr_to_fill_handle)
for i=1:num_elems
arr_to_fill_handle.data(start_idx+i-1,:)=fancy_func(in_arr_handle.data(start_idx+i-1,:));
end
end
filler=#(start_idx)fill_arr(start_idx,elem_per_process,input_arr,arr);
cellfun(filler,start_indexes);%works fine
parcellfun(num_processes,filler,start_indexes);% PROBLEM! Nothing is copied
So, this is actually worse that what I thought:
parcellfun copies object handle properties by value
It breaks code that works with cellfun !
A quick look at
<octave_dir>/packages/parallel-3.1.1/parcellfun.m
seems to indicate that the parallelization relies on processes instead of threads (also note pararrayfun uses parcellfun at its core), so this is expected. What bothers me is that it does so without a single warning, while going against the fundamental properties that many Octave users expect to hold.
So, to summarize:
Is this as a (minor) bug?
Is there any way to parallelize using threads instead of processes (for situations where vectorization is not possible, and where copy by value is unacceptable)?

Extract aligned sections of FASTA to new file

I've already looked here and in other forums, but couldn't find the answer to my question. I want to design baits for a target enrichment Sequencing approach and have the output of a MarkerMiner search for orthologous loci from four different genomes with A. thaliana as a Reference as. These output alignments are separate Fasta-Files for each A. thaliana annotated gene with the sequences from my datasets aligned to it.
I have already run a script to filter out those loci supported to be orthologous by at least two of my four input datasets.
However, now, I'm stumped.
My alignments are gappy, since the input data is mostly RNAseq whereas the Reference contains the introns as well. So it looks like this :
AT01G1234567
ATCGATCGATGCGCGCTAGCTGAATCGATCGGATCGCGGTAGCTGGAGCTAGSTCGGATCGC
MyData1
CGATGCGCGC-----------CGGATCGCGG---------------CGGATCGC
MyData2
CGCTGCGCGC------------GGATAGCGG---------------CGGATCCC
To effectively design baits I now need to extract all the aligned parts from the file, so that I will end up with separate files; or separate alignments within the file; for the parts that are aligned between MyData and the Reference sequence with all the gappy parts excluded. There are about 1300 of these fasta files, so doing it manually is no option.
I have a bit of programming experience in python and with Linux command line tools, however I am completely lost on how to go about this. I would appreciate a hint, on what kind of tools are out there I could use or what kind of algorithm I need to come up with.
Thank you.
Cheers

MySQL: Spatial Query to find whether a latitude/longitude point is located within a given boundary

I'm working on google map search functionality. The vision of this is to find whether a (geolocation) point is located within a polygon or not as shown in the illustration below?
I'm using mysql 5.6.20 with Spatial extension, I know it has useful geometry functions built-in, so I will be allowed to query geocoded locations from a database directly.
My aim was to familiarize myself with geospatial functions, so I wrote up an experimental sql.
given a point geolocation: POINT(100.52438735961914 13.748889613522605)
and a polygon or boundary to be tested with was:
POLYGON(100.49503326416016 13.766897133254545,100.55940628051758 13.746555203977,100.56266784667969 13.72170897580617,100.48885345458984 13.739051587150175)
here is my sql example:
SELECT ST_Within(
ST_GEOMFROMTEXT('POINT(100.52438735961914 13.748889613522605)'),
ST_GEOMFROMTEXT('POLYGON(100.49503326416016 13.766897133254545,
100.55940628051758 13.746555203977,100.56266784667969 13.72170897580617,
100.48885345458984 13.739051587150175)'))
As geoFenceStatus
but after issuing the command, what I got in return seemed to be as follows:
geoFenceStatus
===============
null
I'm so unsure why it returned me 'null' value. since it was indicated in function documentation that this should return '1' in case a point resides within a given polygon
any advice would be appreciated, how to get my sql right.

The error message isn't very clear but your issue is that you have an invalid Polygon. There are two problems with it:
You have to repeat the first point at the end -- this is to differentiate it from a LINESTRING, essentially, ie, it is closed.
POLYGONS start with and end with double parenthesis, ie, POLYGON((x1 y1, x2 y2.......xn yn, x1 y1)). This is to allow for inner rings to be delineated with single parenthesis sets inside the polygon.
See the wikipedia WKT article for more information.
You should find that if you write you query as:
SELECT ST_Within(ST_GEOMFROMTEXT('POINT(100.52438735961914 13.748889613522605)'),
ST_GEOMFROMTEXT('POLYGON((100.49503326416016 13.766897133254545, 100.55940628051758 13.746555203977,100.56266784667969 13.72170897580617, 100.48885345458984 13.739051587150175,
100.49503326416016 13.766897133254545))'))
As geoFenceStatus
then is should work.

Complex Gremlin queries to output nodes/edges

I am trying to implement a query and graph visualisation framework that allows a user to enter a Gremlin query, returning a D3 graph of results. The D3 graph is built using a JSON - this is created using separate vertices and edges outputs from the Gremlin query. For simple queries such as:
g.V.filter{it.attr_a == "foo"}
this works fine. However, when I try to perform a more complicated query such as the following:
g.E.filter{it.attr_a == 'foo'}.groupBy{it.attr_b}{it.outV.value}.cap.next().findAll{k,e->e.size()<=3}
- Find all instances of *value*
- Grouped by unique *attr_b*
- Where *attr_a* = foo
- And *attr_b* is paired with no more than 2 other instances of *value*
Instead, the output is of the following form:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
I would like to know if there is a way for Gremlin to output the results as a list of nodes and edges so I can display the results as a graph. I am aware that I could edit my D3 code to take in this new output but there are currently no restrictions to the type/complexity of the query, so the key/value pairs will no necessarily be the same every time.
Thanks.

You've hit what I consider one of the key problems with visualizing Gremlin results. They can be anything. Gremlin results might not just be a list of vertices and edges. There is no way to really control this that I can think of. At the end of the day, you can really only visualize results that match a pattern that D3 expects. I'd start by trying to detect that pattern and visualize only in those cases (simply display non-recognized patterns as JSON perhaps).
Thinking of your specific example that results like this:
attr_b1: {value1, value2, value3}
attr_b2: {value4}
attr_b3: {value6, value7}
What would you want D3 to visualize there? The vertices/edges that were traversed over to get that result? If so, you might be stuck. Gremlin doesn't give you a way to introspect the pipeline to see what's passing through it. In other words, unless the user explicitly gathers vertices and edges within the pipeline that were touched you won't have access to them. It would be nice to be able to "spy" on a pipeline in that way, but at the moment it doesn't do that. There's been internal discussion within TinkerPop to create a new kind of pipeline implementation that would help with that, but at the moment, it doesn't exist.
So, without the "spying" capability, I think your only workarounds would be to:
detect vertex/edge list on your client side and only render those with d3. this would force users to always write gremlin that returned data in such a format, if they wanted visualization. put it in the users hands.
perhaps supply server-side bindings for a list of vertices/edges that a user could explicitly side-effect their vertices/edges into if their results did not conform to those expected by your visualization engine. again, this would force users to write their gremlin appropriately for your needs if they want visualization.

GeomFromText function documentation

I see that the following syntax is used in examples:
GeomFromText('Polygon((1 1, 2 2, 3 3))');
The double parenthesis caused a bit of a trouble so I decided to look it up in the official documentation. At my non-small surprise the search mysql polygon did not give me the documentation of this function. A search to mysql geomfromtext also did not give the definition of the function GeomFromText.
So I'm still looking for the official documentation of these functions.

I see that the MySQL Reference Manual for GeomFromText() doesn't even give a typical function definition either, but it does describe how to use it. GeomFromText() converts from "well-known text" (abbreviated as WKT) to MySQL's internal format. WKT is simply a textual representation of a geometry object, which can be a polygon as your example, or any of the other geometry types. A key point to understand is that Polygon(...) is the WKT format for a polygon; it isn't a MySQL function call, even though it sort of looks like one.
Polygons can contain holes. When defining a polygon, you may optionally supply one or more interior boundaries to define such holes. The WKT for polygons use inner parentheses to distinguish these boundaries from one other. Even if you don't want to define holes, the inner parentheses are still required. Wikipedia provides some simple examples of polygon WKTs together with pictures of the resulting polygons.

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008