How to find intersections from OpenStreetMap?

How can I extract intersections from OpenStreetMap? I need the longitude and latitude of each intersection. Thanks!

There has been a similar question here. There is no direct API call to retrieve intersections. But you can query all ways in a given bounding box (for example directly via the API or via the Overpass API) and look for nodes shared by two or more ways as explained in the other answer.
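A minimal Python sketch of the shared-node approach. It assumes you have already downloaded OSM XML that contains the highway ways and all of the nodes they reference (as the API's map call returns). The function name `find_intersections` is only illustrative. It counts how many distinct highway ways use each node and keeps the nodes used by two or more:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_intersections(osm_xml):
    """Return {node_id: (lat, lon)} for nodes shared by two or more highway ways."""
    root = ET.fromstring(osm_xml)
    # Coordinates of every node contained in the extract.
    coords = {
        node.get("id"): (float(node.get("lat")), float(node.get("lon")))
        for node in root.iter("node")
    }
    usage = Counter()
    for way in root.iter("way"):
        tags = {tag.get("k"): tag.get("v") for tag in way.iter("tag")}
        if "highway" not in tags:
            continue  # only roads count towards intersections
        # Count each node once per way, so a closed way (first node == last node)
        # does not make its start node look like an intersection.
        for ref in {nd.get("ref") for nd in way.iter("nd")}:
            usage[ref] += 1
    return {ref: coords[ref] for ref, n in usage.items() if n >= 2 and ref in coords}
```

Note that a node where a single road is merely split into two ways is reported too, so expect some false positives.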

Try the GeoNames findNearestIntersectionOSM API:
http://api.geonames.org/findNearestIntersectionOSMJSON?lat=37.451&lng=-122.18&username=demo
The input is the latitude and longitude of a location, and the response contains the latitude and longitude of the nearest intersection:
{"intersection":{...,"lng":"-122.1808293","lat":"37.4506505"}}
It is provided by GeoNames but seems to be based on OpenStreetMap data.
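If you want to call it from code, here is a small Python sketch using only the standard library. It assumes you register your own GeoNames username to replace `demo` (the shared demo account is heavily rate-limited); the helper names are hypothetical. The service returns the coordinates as strings, so they are converted to floats. Error handling (for example when no intersection is found) is omitted:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "http://api.geonames.org/findNearestIntersectionOSMJSON"


def build_url(lat, lng, username):
    """Query URL for the nearest OSM intersection to (lat, lng)."""
    return API + "?" + urlencode({"lat": lat, "lng": lng, "username": username})


def parse_intersection(body):
    """Extract (lat, lng) as floats from the JSON response body."""
    point = json.loads(body)["intersection"]
    return float(point["lat"]), float(point["lng"])


def nearest_intersection(lat, lng, username):
    """Call the web service and return the nearest intersection's coordinates."""
    with urlopen(build_url(lat, lng, username), timeout=30) as resp:
        return parse_intersection(resp.read().decode("utf-8"))
```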

As @scai perfectly explained, you have to process the raw OSM data yourself to find the intersection nodes of ways.
Instead of writing your own program, you can first try out different algorithms using the Overpass API.
Depending on which kinds of ways you are interested in, add the types of ways that should not count towards intersections to the regv attribute (in both script sections). The available way types are listed here: highway tags.
The bounding box ({{bbox}}) is the part of the map you are currently viewing on the Overpass Turbo website.
The script is based on a script linked in another question's reply, but I rewrote it to make it more readable and added comments (both here and in the original reply).
Sample Script:
<!-- Only select the type of ways you are interested in -->
<query type="way" into="relevant_ways">
  <has-kv k="highway"/>
  <has-kv k="highway" modv="not" regv="footway|cycleway|path|service|track"/>
  <bbox-query {{bbox}}/>
</query>
<!-- Now find all intersection nodes for each way independently -->
<foreach from="relevant_ways" into="this_way">
  <!-- Get all ways which are linked to this way -->
  <recurse from="this_way" type="way-node" into="this_ways_nodes"/>
  <recurse from="this_ways_nodes" type="node-way" into="linked_ways"/>
  <!-- Again, only select the ways you are interested in, see beginning -->
  <query type="way" into="linked_ways">
    <item set="linked_ways"/>
    <has-kv k="highway"/>
    <has-kv k="highway" modv="not" regv="footway|cycleway|path|service|track"/>
  </query>
  <!-- Get all linked ways without the current way -->
  <difference into="linked_ways_only">
    <item set="linked_ways"/>
    <item set="this_way"/>
  </difference>
  <recurse from="linked_ways_only" type="way-node" into="linked_ways_only_nodes"/>
  <!-- Return all intersection nodes -->
  <query type="node">
    <item set="linked_ways_only_nodes"/>
    <item set="this_ways_nodes"/>
  </query>
  <print/>
</foreach>
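To run the script outside Overpass Turbo, a Python sketch like the following could work. It assumes the public endpoint `https://overpass-api.de/api/interpreter` (check its usage policy) and replaces `{{bbox}}`, which is an Overpass Turbo shortcut, with explicit `s`, `w`, `n`, `e` attributes on `bbox-query`. Because `<print/>` runs inside the `foreach`, an intersection node is printed once for every way that passes through it, so the parsed output is deduplicated by node ID. All function names are hypothetical:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def fill_bbox(script, south, west, north, east):
    """Replace Overpass Turbo's {{bbox}} shortcut with explicit bbox-query attributes."""
    return script.replace("{{bbox}}", f's="{south}" w="{west}" n="{north}" e="{east}"')


def wrap_script(script):
    """Wrap the statements in an <osm-script> root element if they lack one."""
    if "<osm-script" in script:
        return script
    return "<osm-script>" + script + "</osm-script>"


def run_script(script):
    """POST the XML query to the interpreter and return the raw XML response."""
    body = urlencode({"data": wrap_script(script)}).encode("utf-8")
    with urlopen(OVERPASS_URL, data=body, timeout=180) as resp:
        return resp.read().decode("utf-8")


def unique_nodes(osm_xml):
    """Collapse repeated <node> entries into {node_id: (lat, lon)}."""
    root = ET.fromstring(osm_xml)
    return {n.get("id"): (float(n.get("lat")), float(n.get("lon"))) for n in root.iter("node")}
```

Usage would look like `unique_nodes(run_script(fill_bbox(script, south, west, north, east)))`, where `script` holds the XML above.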

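If you have imported OSM data into PostgreSQL/PostGIS with osm2pgsql, you can query intersections like this (the example finds where service roads cross footways). Note that this finds geometric crossings, so bridges and tunnels are reported too: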
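SELECT
  -- osm2pgsql stores geometries in EPSG:3857 by default; transform to EPSG:4326
  -- to get longitude/latitude (a no-op if the data was imported as 4326)
  ST_AsText(ST_Transform(ST_Intersection(a.way, b.way), 4326)) AS way
FROM
  (SELECT * FROM planet_osm_line WHERE highway = 'service') AS a,
  (SELECT * FROM planet_osm_line WHERE highway = 'footway') AS b
WHERE
  ST_Intersects(a.way, b.way)
  AND a.osm_id != b.osm_id;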

Related

How to identify period end contexts in xbrl filings?

I am trying to find only the current period concepts and facts for the three main financial statements. The goal is to be able to iterate through filings of different companies in different periods.
I'm using eBay's 2017 10-K as an example.
For concepts that capture YoY change, like those in the income statement and statement of cash flows, I can use the context found in any of the dei tags, for example:
<dei:DocumentFiscalYearFocus contextRef="FD2017Q4YTD" id="Fact-2E3E1FD4D81352F693510AE035FDC862-wk-Fact-2E3E1FD4D81352F693510AE035FDC862">2017</dei:DocumentFiscalYearFocus>
The dei:DocumentFiscalYearFocus tag is required, and its context "FD2017Q4YTD" is also used by all IS and SCF period-end concepts, so that's easy.
However, balance sheet concepts use a different context:
<us-gaap:CashAndCashEquivalentsAtCarryingValue contextRef="FI2017Q4" decimals="-6" id="d15135667e874-wk-Fact-3E4A0A2B272B59DE9DAF004097ECF968" unitRef="usd">2120000000</us-gaap:CashAndCashEquivalentsAtCarryingValue>
Any idea how to identify the "FI2017Q4" context (or otherwise find current period balance sheet concepts)?
The XBRL instance document you are viewing contains one or more schemaRef elements, each of which loads an XBRL taxonomy, or data dictionary, for the instance. The taxonomy defines the concepts, but the contexts themselves are defined in the instance document, as <context> elements alongside the facts. Each definition will look something like this:
<context id="CONTEXT_ID_NAME">
<!-- ... child elements appear here ... -->
</context>
If you can find the <context> element whose id attribute matches the contextRef you're interested in, then you've found what you're looking for. In your case, you're looking in the instance document for <context id="FD2017Q4YTD"> and <context id="FI2017Q4">.
The <period> child element of the <context> element describes the dates for the context. There are three types of XBRL periods:
instant, which specifies a single date
duration, which specifies a start date and an end date
forever, which has no dates
The child element of <period> tells you which type is being used.
All of this can be done manually, but it is probably best to use XBRL processing software, which will do this work for you.
The value of the contextRef attribute is purely an identifier that references a context definition elsewhere in the document. Using the eBay example, you'll find this context definition:
<context id="FI2017Q4">
<entity>
<identifier scheme="http://www.sec.gov/CIK">0001065088</identifier>
</entity>
<period>
<instant>2017-12-31</instant>
</period>
</context>
The value of the instant element tells you the date to which facts associated with this context relate.
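Following that idea, here is a hedged Python sketch of one heuristic for the original question: take the end date of the duration context you already know from dei:DocumentFiscalYearFocus (FD2017Q4YTD in the eBay filing) and look for instant contexts on that same date that carry no dimensions (no segment or scenario). It assumes a plain XBRL 2.1 instance in the standard `http://www.xbrl.org/2003/instance` namespace and that the fiscal period ends on the balance sheet date; the function names are illustrative, and a real XBRL processor remains the more robust option:

```python
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"


def read_contexts(root):
    """Map each context id to (kind, date, has_dimensions)."""
    contexts = {}
    for ctx in root.iter(XBRLI + "context"):
        period = ctx.find(XBRLI + "period")
        instant = period.find(XBRLI + "instant")
        end = period.find(XBRLI + "endDate")
        # Dimensions live in entity/segment or in scenario.
        has_dims = (ctx.find(f"{XBRLI}entity/{XBRLI}segment") is not None
                    or ctx.find(XBRLI + "scenario") is not None)
        if instant is not None:
            contexts[ctx.get("id")] = ("instant", instant.text.strip(), has_dims)
        elif end is not None:
            contexts[ctx.get("id")] = ("duration", end.text.strip(), has_dims)
        else:
            contexts[ctx.get("id")] = ("forever", None, has_dims)
    return contexts


def balance_sheet_contexts(instance_xml, duration_id):
    """Dimensionless instant contexts dated on the end date of duration_id."""
    contexts = read_contexts(ET.fromstring(instance_xml))
    kind, end_date, _ = contexts[duration_id]
    if kind != "duration":
        raise ValueError(f"{duration_id} is not a duration context")
    return sorted(cid for cid, (k, d, dims) in contexts.items()
                  if k == "instant" and d == end_date and not dims)
```

With the eBay instance loaded as text, `balance_sheet_contexts(text, "FD2017Q4YTD")` should include `FI2017Q4` if that duration context ends on the 2017-12-31 instant shown above.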
In order to properly understand the XBRL facts, you need to fully understand the associated contexts. There may be other information, such as additional dimensions, defined here.
I'd strongly recommend working with an existing XBRL processor that will resolve the contextual information for you, such as the open source Arelle processor, or the API provided by XBRL US.
One possible approach to working with XBRL data is to use a processor that converts data to the newer xBRL-JSON format, which provides fact objects with all contextual information fully resolved.

Wikipedia API: get the word count of a page

I'm a bit lost among all the options the Wikipedia API has.
My goal is to get the number of words in a Wikipedia page.
I have the URL of the page.
The search option does return this value:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srsearch=camera&srlimit=1
This returns:
<api>
<query-continue>
<search sroffset="1"/>
</query-continue>
<query>
<searchinfo totalhits="68658"/>
<search>
<p ns="0" title="Camera" snippet="A &lt;span class='searchmatch'&gt;camera&lt;/span&gt; is an optical instrument that records image s that can be stored directly, transmitted to another location, or both. &lt;b&gt;...&lt;/b&gt; " size="43246" wordcount="6348" timestamp="2014-04-29T15:48:07Z"/>
</search>
</query>
</api>
(Scroll a bit to the right and you'll find wordcount.)
But this query performs a search and shows only the top result, and when I search for the page name taken from the URL, that page isn't always the first result.
So is there a way to get the word count of a specific Wikipedia page?
No other APIs provide this information, so the kludge with list=search is the only way. If you know the exact title you can get better results by appending &srwhat=nearmatch to the query (it will always return 1 result though). See the docs and try the sandbox to learn more.
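A Python sketch of that approach, assuming the English Wikipedia endpoint and JSON output (`format=json`). It derives the title from the page URL, builds the `list=search` query with `srwhat=nearmatch`, and only accepts a hit whose title matches, since a redirect can resolve to a different page title. The helper names are hypothetical, and the HTTP request itself is left out:

```python
import json
from urllib.parse import unquote, urlencode, urlparse

API = "https://en.wikipedia.org/w/api.php"


def title_from_url(url):
    """'https://en.wikipedia.org/wiki/Digital_camera' -> 'Digital camera'."""
    path = urlparse(url).path
    return unquote(path.split("/wiki/", 1)[1]).replace("_", " ")


def search_url(title):
    """list=search query restricted to a near match on the title."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": title,
        "srwhat": "nearmatch",
        "srlimit": 1,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def word_count(body, title):
    """Return the wordcount of the hit whose title matches, else None."""
    for hit in json.loads(body)["query"]["search"]:
        if hit["title"] == title:
            return hit["wordcount"]
    return None
```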
Note that word counts are not stored in the database, so the API has to go to Lucene/Elasticsearch for this information, which is not exactly fast. If you need this information en masse, you should download a dump instead.

Get list of all intersections in a city

What is the best source and way to get a list of all intersections in a major city?
If one doesn't mind a few false positives, the following Overpass API script gets road intersections out of OpenStreetMap data pretty easily:
http://overpass-turbo.eu/s/QD
(The script can't detect false intersections, though: places where only two ways meet, e.g. when a single road is represented by multiple way objects in the OSM data.)
In case the link goes offline, here is a more readable version of the script:
Depending on which kinds of ways you are interested in, add the types of ways that should not count towards intersections to the regv attribute (in both script sections). The available way types are listed here: highway tags.
The bounding box ({{bbox}}) is the part of the map you are currently viewing on the Overpass Turbo website.
Sample Script:
<!-- Only select the type of ways you are interested in -->
<query type="way" into="relevant_ways">
  <has-kv k="highway"/>
  <has-kv k="highway" modv="not" regv="footway|cycleway|path|service|track"/>
  <bbox-query {{bbox}}/>
</query>
<!-- Now find all intersection nodes for each way independently -->
<foreach from="relevant_ways" into="this_way">
  <!-- Get all ways which are linked to this way -->
  <recurse from="this_way" type="way-node" into="this_ways_nodes"/>
  <recurse from="this_ways_nodes" type="node-way" into="linked_ways"/>
  <!-- Again, only select the ways you are interested in, see beginning -->
  <query type="way" into="linked_ways">
    <item set="linked_ways"/>
    <has-kv k="highway"/>
    <has-kv k="highway" modv="not" regv="footway|cycleway|path|service|track"/>
  </query>
  <!-- Get all linked ways without the current way -->
  <difference into="linked_ways_only">
    <item set="linked_ways"/>
    <item set="this_way"/>
  </difference>
  <recurse from="linked_ways_only" type="way-node" into="linked_ways_only_nodes"/>
  <!-- Return all intersection nodes -->
  <query type="node">
    <item set="linked_ways_only_nodes"/>
    <item set="this_ways_nodes"/>
  </query>
  <print/>
</foreach>
You can do it using OpenStreetMap data.
1. Download the data for the city (use the export link: http://www.openstreetmap.org/export or get the data from here: http://metro.teczno.com/; there are other sources but this isn't the place to list them).
2. Find all the ways with appropriate values for the 'highway' tag (http://wiki.openstreetmap.org/wiki/Key:highway).
3. For each such way, get the node IDs that make it up.
4. Create an array with one entry for every node of every such way, each entry holding the node ID and the highway information (name, etc.).
5. Sort the array on node IDs. That groups the entries by node, so a set of entries sharing the same node represents an intersection.
6. Traverse the array and, for each group with more than one entry, add a new entry to your list of intersections. At this point you can extract the highway information so that each intersection can be characterised by the highways that meet there.
This is a short summary, I know. But I know it works, because it is the system I use in my map rendering library to identify intersections when creating routing data.
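The sort-and-group steps above can be sketched in Python as follows. It assumes steps 1 to 3 are already done and each kept way is given as a `(name, node_ids)` pair with the name as a string; the function name is hypothetical. Each node is counted once per way so that a closed way (whose first and last node are the same) does not look like an intersection with itself:

```python
from itertools import groupby
from operator import itemgetter


def intersections(ways):
    """ways: iterable of (way_name, [node_id, ...]) for the highway ways you kept.

    Returns {node_id: sorted list of the names of the ways meeting there}.
    """
    # One entry per (node, way) pair.
    entries = [(node, name) for name, nodes in ways for node in set(nodes)]
    # Sorting on node ID brings all entries for the same node next to each other.
    entries.sort(key=itemgetter(0))
    result = {}
    for node, group in groupby(entries, key=itemgetter(0)):
        names = [name for _, name in group]
        if len(names) > 1:  # more than one way uses this node
            result[node] = sorted(names)
    return result
```

A road that is split into two ways also shows up here, with its name listed twice.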

Scrape hyperlinks from an html page

I am trying to extract the latitudes and longitudes for the places listed on the right side of this page. I want to create a table like the following:
Place Latitude Longitude
Agarda 23.12604 87.19869
Ahanda 23.13099 87.18501
.....
.....
West-Sanabandh 23.24876 86.99941
Is it possible to do this in R without calling up the individual hyperlinks for "Agarda", "Ahanda", etc. one at a time?
The data appears on different pages. You can't get that data without requesting each page.
If R supports threads then you can call them up in parallel rather than one at a time.
It's possible to use RCurl to scrape each page in some type of loop or sapply. If you combine it with some regex and/or readHTMLTable (to identify the hyperlinks) then it's a relatively straightforward function.
Within RCurl, it's possible to create a multicurl which will do this in parallel, although given the number of queries involved, it might be just as easy to serialise it and put a small system sleep between queries.
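The two options from these answers (serialised requests with a short sleep, or parallel requests) are sketched below in Python rather than R. The `fetch` argument is a placeholder for whatever downloads one page, for example a function wrapping `urllib.request.urlopen`. Extracting the coordinates from each page is left out because it depends on the page's HTML:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_serial(urls, fetch, delay=1.0):
    """Fetch pages one after another, sleeping between requests to be polite."""
    pages = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay)
        pages.append(fetch(url))
    return pages


def fetch_parallel(urls, fetch, workers=4):
    """Fetch pages concurrently; results come back in the same order as urls."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```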

Google geocoding service returns response for fake address

I'm using the google geocoding service to validate that a city name (plus region and country) that has been entered in our system exists, and to get the lat/long.
However, I'm finding that it seems to 'guess' if you make a typo, and returns a response even if you made an error.
For instance, a request for "Beverton, Ontario, Canada" returns the lat/long for Beaverton, with no indication that you provided the wrong city name.
I'm using the CSV response type, and am getting the 200 response code.
Can I either prevent the service from doing this, or, better yet, find out if it has?
Edit: to clarify ... Google is correcting the input (when I would expect it to just fail) and I need to know if it has done this.
There isn't any way for the geocoder to let you know if it thinks you had a typo. I agree with Saul's answer, that your best bet is to check your query against the response.
I just wanted to point out that you'll have to check several elements of your input against several of the response values, in order to find the elements that should match up. In this case, "Beaverton" was found inside of "DependentLocalityName".
<?xml version="1.0" encoding="UTF-8" ?>
<kml xmlns="http://earth.google.com/kml/2.0"><Response>
<name>Beverton, Ontario, Canada</name>
<Status>
<code>200</code>
<request>geocode</request>
</Status>
<Placemark id="p1">
<address>Beaverton, Brock, ON, Canada</address>
<AddressDetails Accuracy="4" xmlns="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0"><Country><CountryNameCode>CA</CountryNameCode><CountryName>Canada</CountryName><AdministrativeArea><AdministrativeAreaName>ON</AdministrativeAreaName><SubAdministrativeArea><SubAdministrativeAreaName>Durham Regional Municipality</SubAdministrativeAreaName><Locality><LocalityName>Brock</LocalityName><DependentLocality><DependentLocalityName>Beaverton</DependentLocalityName></DependentLocality></Locality></SubAdministrativeArea></AdministrativeArea></Country></AddressDetails>
<ExtendedData>
<LatLonBox north="44.4502166" south="44.4183470" east="-79.1199562" west="-79.1839858" />
</ExtendedData>
<Point><coordinates>-79.1519710,44.4342840,0</coordinates></Point>
</Placemark>
</Response></kml>
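One way to do that check for the city part only, sketched in Python: collect the text of every `LocalityName` and `DependentLocalityName` element (ignoring XML namespaces, since `AddressDetails` uses its own) and compare it case-insensitively with the city that was entered. The choice of elements is an assumption based on this response, and the function names are hypothetical:

```python
import xml.etree.ElementTree as ET

LOCALITY_TAGS = {"LocalityName", "DependentLocalityName"}


def local_name(tag):
    """Strip the {namespace} prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def locality_names(kml_text):
    """All locality-level names found anywhere in the KML response."""
    root = ET.fromstring(kml_text)
    return {el.text.strip() for el in root.iter()
            if local_name(el.tag) in LOCALITY_TAGS and el.text}


def city_matches(kml_text, city):
    """True if the entered city appears verbatim (ignoring case) as a locality."""
    wanted = city.strip().casefold()
    return any(name.casefold() == wanted for name in locality_names(kml_text))
```

For the response above, the check fails for "Beverton" and succeeds for "Beaverton".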
Update:
This may be impossible to actually implement. If your input is "Beverton, Ontario, Canada", how do you know which of those three words to check for? Two of them will match up just fine. What if they're entered in a different order?
Is it responding with a "200 Success Code"? There's a chance that it might be giving you a different status code.
Here are the different status codes google returns: http://code.google.com/apis/maps/documentation/reference.html#GGeoStatusCode
NOTE: See the answer by @Chris B above. As Chris points out, this may be impossible to implement.
Do you have to use the CSV response type? If not, the other response types such as KML provide enough details to determine the location that the coordinates refer to. You could verify your input against the response's LocalityName element.
<kml xmlns="http://earth.google.com/kml/2.0">
<Response>
<name>1600 amphitheatre mountain view ca</name>
<Status>
<code>200</code>
<request>geocode</request>
</Status>
<Placemark>
<address>
1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA
</address>
<AddressDetails Accuracy="8">
<Country>
<CountryNameCode>US</CountryNameCode>
<AdministrativeArea>
<AdministrativeAreaName>CA</AdministrativeAreaName>
<SubAdministrativeArea>
<SubAdministrativeAreaName>Santa Clara</SubAdministrativeAreaName>
<Locality>
<LocalityName>Mountain View</LocalityName>
<Thoroughfare>
<ThoroughfareName>1600 Amphitheatre Pkwy</ThoroughfareName>
</Thoroughfare>
<PostalCode>
<PostalCodeNumber>94043</PostalCodeNumber>
</PostalCode>
</Locality>
</SubAdministrativeArea>
</AdministrativeArea>
</Country>
</AddressDetails>
<Point>
<coordinates>-122.083739,37.423021,0</coordinates>
</Point>
</Placemark>
</Response>
</kml>