How can I remove water from OpenStreetMap ways?

I'd like to get the outline of San Francisco from OpenStreetMap. There's a relation for SF, but it includes large amounts of water which would make it unrecognizable to a resident of the city:
Is there a better polygon for San Francisco in OSM? Are there ways for the coastlines and, if so, how might I find them and subtract them from the administrative boundary?

I was able to do this using osmtogeojson and ogr2ogr. First I grabbed the full OSM XML for the San Francisco relation and converted it to GeoJSON:
$ curl 'http://www.openstreetmap.org/api/0.6/relation/111968/full' > sf.osm
$ osmtogeojson sf.osm > sf.json
Then I removed the non-polygon features using this script:
$ only_polygons.py sf.json > sf.polygons.json
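The only_polygons.py script isn't reproduced here; it simply drops every feature whose geometry is not a Polygon or MultiPolygon. If you have jq installed, a roughly equivalent filter (a sketch, not the original script) would be:
$ jq '.features |= map(select(.geometry.type == "Polygon" or .geometry.type == "MultiPolygon"))' sf.json > sf.polygons.json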
Then I downloaded the land polygons file that Tordanik referenced and ran:
$ ogr2ogr land_polygons.sfbbox.shp -clipsrc -122.56 37.68 -122.27 37.939 land-polygons-complete-4326/land_polygons.shp
$ ogr2ogr -f GeoJSON sf-land.json -clipsrc sf.polygons.json land_polygons.sfbbox.shp
This produces a GeoJSON file (sf-land.json) with the land boundaries of San Francisco:
I'm not sure where the Farallon Islands went, but this basically seems to work!
Every step of this is fast except for the first ogr2ogr command, which clips the land polygons to a bounding box containing San Francisco. That step took ~2 minutes on my MacBook Pro, but it was still dramatically faster than passing sf.polygons.json directly as the -clipsrc argument to that first command.

Yes, there are ways for the coastline. For example this way, which is also part of the relation, but also this way, which is obviously not part of the relation. I therefore think it is very challenging, though not impossible, to mechanically remove all the water and smaller islands. I guess you would have to retrieve all map data inside the polygon and run some spatial queries.
The relation you are looking at defines the administrative border of San Francisco. The relation you are looking for would be San Francisco's landmass without the water (and probably without the smaller islands belonging to San Francisco). Such relations usually don't exist in OSM.

Related

How do I convert one CRS (Coordinate Reference System) value into another using OSGeo4W Shell?

I have QGIS 2.18 (the latest version) installed on Windows (I'm a new user), and it came with the OSGeo4W Shell. Using this shell, I want to convert a specific value from one CRS into another. For example, if I know coordinates in WGS84 (say, 91.7362, 26.1445, just to give an example), I would like to know how to convert them to Indian 1954 / UTM Zone 46N (which is in meters) using the OSGeo4W Shell.
PS: I know there is a way because I once found it successfully. I had copied the syntax of the command, but I deleted the file by mistake and I can't find it on the net again, even after long searches. It was barely a two-line, simple command.
I think the command is:
osgeo4w
gdaltransform -s_srs EPSG:4326 -t_srs EPSG:XXXX < input.csv > output.txt
Here the EPSG codes identify the CRS (4326 is WGS84). You have to find out the EPSG code for your target CRS, and then you can perform the transformation.
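For the example point in the question it would look something like this (Indian 1954 / UTM zone 46N appears to be EPSG:23946, but please verify the code for your exact datum; with the GDAL version shipped alongside QGIS 2.18 the input is read as x y, i.e. longitude first):
echo 91.7362 26.1445 | gdaltransform -s_srs EPSG:4326 -t_srs EPSG:23946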

Converting the output of MediaWiki to plain text

Using the MediaWiki API, the following request gives me the output below for the search term Tiger:
https://simple.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Tiger&format=json&exintro=1
Response:
{"batchcomplete":"","query":{"pages":{"9796":{"pageid":9796,"ns":0,"title":"Tiger","extract":"<p>The <b>tiger</b> (<i>Panthera tigris</i>) is a carnivorous mammal. It is the largest living member of the cat family, the Felidae. It lives in Asia, mainly India, Bhutan, China and Siberia.</p>\n<p></p>"}}}}
How do I get an output like this instead:
The tiger (Panthera tigris) is a carnivorous mammal. It is the largest living member of the cat family, the Felidae. It lives in Asia, mainly India, Bhutan, China and Siberia.
Can someone please also tell me how to store everything in a text file? I'm a beginner here, so please be nice. I need this for a project I'm doing in Bash, on a Raspberry Pi 2 running Raspbian.
It's usually recommended to use a JSON parser for handling JSON; one that I like is jq:
% jq -r '.query.pages[].extract' file
<p>The <b>tiger</b> (<i>Panthera tigris</i>) is a carnivorous mammal. It is the largest living member of the cat family, the Felidae. It lives in Asia, mainly India, Bhutan, China and Siberia.</p>
<p></p>
To remove the HTML tags you can do something like:
... | sed 's/<[^>]*>//g'
This will remove HTML tags as long as they do not span multiple lines:
% jq -r '.query.pages[].extract' file | sed 's/<[^>]*>//g'
The tiger (Panthera tigris) is a carnivorous mammal. It is the largest living member of the cat family, the Felidae. It lives in Asia, mainly India, Bhutan, China and Siberia.
file is the file the JSON is stored in, e.g.:
curl -so - 'https://simple.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Tiger&format=json&exintro=1' > file
jq '...' file
or
jq '...' <(curl -so - 'https://simple.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Tiger&format=json&exintro=1')
You can install jq with:
sudo apt-get install jq
For your example input you can also use grep with -P (PCRE), but using a proper JSON parser as above is recommended:
grep -oP '(?<=extract":").*?(?=(?<!\\)")' file
<p>The <b>tiger</b> (<i>Panthera tigris</i>) is a carnivorous mammal. It is the largest living member of the cat family, the Felidae. It lives in Asia, mainly India, Bhutan, China and Siberia.</p>\n<p></p>
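To store the plain-text result in a file for your Bash project, you can chain the pieces above (the output filename is just an example):
curl -so - 'https://simple.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Tiger&format=json&exintro=1' | jq -r '.query.pages[].extract' | sed 's/<[^>]*>//g' > tiger.txt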
If you're using PHP, you can do it fairly easily, as shown below.
Accessing the text
We know that the text is stored inside the extract property, so we need to access that.
The easiest way to do this would be to parse the string from the API into an object, which is done with the json_decode function in PHP. You can then access the extract property from that object, and this will give you your string. The code would be something like this:
//Get the string from the API, however you've already done it
$JSONString = getFromAPI();
//Use the inbuilt method to create a JSON object
$JSONObject = json_decode($JSONString);
//Follow the structure to get the pages property
$Pages = $JSONObject->query->pages;
//Here, we don't know what the Page ID is (because the MediaWiki API returns a different number, depending on the page)
//Therefore we need to simply get the first key, and within it should be our desired 'extract' key
$Extract = "";
foreach ($Pages as $value) {
    $Extract = $value->extract;
    break;
}
//$Extract now contains our desired text
Writing it to a file
Now we need to write the contents of $Extract to a file, as you mentioned. This can be done as follows, utilizing the file_put_contents function.
//Can be anything you want
$file = 'APIResult.txt';
// Write the contents to the file,
// using the LOCK_EX flag to prevent anyone else writing to the file at the same time
file_put_contents($file, $Extract, LOCK_EX);
Aaand we're done!
Documentation
The documentation for these functions (json_decode and file_put_contents) can be found at:
http://php.net/manual/en/function.json-decode.php
http://php.net/manual/en/function.file-put-contents.php
You may find pandoc helpful, from http://pandoc.org/ - it understands a number of file formats on input, including MediaWiki, and also has a bunch of output formats, including plain text. It's more the "Swiss army knife" approach, and since MediaWiki is arbitrarily complicated to parse, you'll want to use something like this that's been through a big test suite.
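For the API response above, that would mean feeding pandoc the HTML from the extract field; a minimal sketch combining it with the jq call from the earlier answer (the output filename is just an example):
jq -r '.query.pages[].extract' file | pandoc -f html -t plain -o tiger.txt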

flatten/dissolve/merge entire shapefile

I've been using ogr2ogr to do most of what I need with shapefiles (including dissolving them). However, I find that for big ones, it takes a REALLY long time.
Here's an example of what I'm doing:
ogr2ogr new.shp old.shp -dialect sqlite -sql "SELECT ST_Union(geometry) FROM old"
In certain instances, one might want to dissolve common neighboring shapes (which is what I think the above command is doing). However, in my case I simply want to flatten the entire file and every shape in it, regardless of the values (I've already isolated the shapes I need).
Is there a faster way to do this when you don't need to care about the values and just want a shape that outlines the array of shapes in the file?
If you have isolated the shapes, and they don't have any shared boundaries, they can be easily collected into a single MULTIPOLYGON using ST_Collect. This should be really fast and simple to do:
ogr2ogr gcol.shp old.shp -dialect sqlite -sql "SELECT ST_Collect(geometry) FROM old"
If the geometries overlap and the boundaries need to be "dissolved", then ST_Union must be used. Faster spatial unions are done with a cascaded union technique, described here for PostGIS. It is supported by OGR as well, but there doesn't seem to be an elegant way to invoke it.
Here is a two step SQL query. First make a MULTIPOLYGON of everything with ST_Collect (this is fast), then do a self-union which should trigger a UnionCascaded() call.
ogr2ogr new.shp old.shp -dialect sqlite -sql "SELECT ST_Union(gcol, gcol) FROM (SELECT ST_Collect(geometry) AS gcol FROM old) AS f"
Or to better view the actual SQL statement:
SELECT ST_Union(gcol, gcol)
FROM (
SELECT ST_Collect(geometry) AS gcol
FROM old
) AS f
I've had better success (i.e. faster) by converting it to raster and then back to vector. For example:
# convert the vector file old.shp to a raster file new.tif using a pixel size of XRES/YRES
gdal_rasterize -tr XRES YRES -burn 255 -ot Byte -co COMPRESS=DEFLATE old.shp new.tif
# convert the raster file new.tif to a vector file new.shp, using the same raster as a -mask speeds up the processing
gdal_polygonize.py -f 'ESRI Shapefile' -mask new.tif new.tif new.shp
# removes the DN attribute created by gdal_polygonize.py
ogrinfo new.shp -sql "ALTER TABLE new DROP COLUMN DN"
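For instance, assuming old.shp is in a metre-based projected CRS, a hypothetical 10 m pixel size would make the first step:
gdal_rasterize -tr 10 10 -burn 255 -ot Byte -co COMPRESS=DEFLATE old.shp new.tif
Smaller pixels give a more faithful outline at the cost of a slower rasterize/polygonize round trip.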

Visualize text file with location info and intensity

I have an ASCII text file containing location data (column 9 = latitude, column 10 = longitude) and intensity (column 20):
200010 207 020311 40658.5 406593 52 344927.31 7100203.50 -26.2078720 127.4491855 345060.64 7100369.14 26.4 650.3 628.0 55471.293 20.168 55648.817 55637.523 -146.062
The text file has many (10k+) lines.
I am trying to visualize this using GDAL, but I'm not sure how to proceed.
Ideas?
Try QGIS. It is free software for making maps with data.
GDAL is for doing sophisticated data transformations.
If your file is named viz.txt, then you can extract and plot the data using the following commands:
$ awk '{print $9, $10, $20}' < viz.txt > viz2.txt
$ gnuplot
...
gnuplot> plot "viz2.txt" with points palette
This will give you a chart, nicely coloured by intensity.
If you want a more interactive solution, or to overlay the data on a map, then you will have to use GIS software such as ArcView, MapInfo, or the free tools Generic Mapping Tools (GMT) or QGIS.
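If you would still like to go through GDAL/OGR, one option (a sketch with made-up file and column names, assuming the coordinates are WGS84) is to add a header to the extracted columns and let the OGR CSV driver read them as points, which ogr2ogr or QGIS can then use directly:
# build a small CSV with named columns (longitude, latitude, intensity)
( echo "lon,lat,intensity"; awk 'BEGIN{OFS=","}{print $10, $9, $20}' viz.txt ) > viz2.csv
# the X_POSSIBLE_NAMES/Y_POSSIBLE_NAMES open options tell the CSV driver which columns hold the coordinates
ogr2ogr -f "ESRI Shapefile" viz_points.shp viz2.csv -oo X_POSSIBLE_NAMES=lon -oo Y_POSSIBLE_NAMES=lat -a_srs EPSG:4326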

Reversed Latitude/Longitude US TIGER/Line Shapefile to MySQL w/ OGR2OGR

I've downloaded the latest set (2010) of TIGER edge shapefiles (ESRI shapefile format) from the US Census website and am loading them into MySQL using the GDAL ogr2ogr utility. A new table (geotest) does get created with a SHAPE column that has the geometry defined as a LINESTRING. However, the latitude and longitude values appear to be reversed when running the following command:
ogr2ogr -f "MySQL" MySQL:"geo,user=myuser,host=localhost,password=mypwd" -nln geotest -nlt LINESTRING -append -a_srs "EPSG:4326" -lco engine=MYISAM tl_2010_01021_edges.shp
Mapping the latitude/longitude (after reversing them, of course) they appear to be spot on, so I suspect there is just something I am doing wrong, or a flag I am missing, which is causing the latitude and longitude to be transposed.
When I select the SHAPE column using astext() I get the following result:
LINESTRING(-86.69863 32.973164,-86.69853 32.97302,-86.69856 32.97287,-86.698613 32.972825,-86.6988 32.972825,-86.6989 32.972892,-86.6989 32.973002,-86.69874 32.97316,-86.69864 32.97318,-86.69863 32.973164)
Any ideas what I am doing wrong?