SPARQL Find Multiple Search Terms at Once

Sorry, but I have absolutely no idea what terminology to use here, which also makes it impossible to search for what I want to do. I only found out about SPARQL about an hour ago.
Basically, I have 475 cities that I want to know the areas of. In the course of searching various things looking for a pre-existing list (or even a very basic GIS guide) to find this, one of my results pointed out that Wikipedia has the areas for all of those cities. Unfortunately, I couldn't figure out how to get the areas for multiple cities at once.
What I can do is very, very basic. Based on the second Google Search result (I closed the other tab ages ago), I can make very basic changes to Jan Drewniak's answer. So, in principle, I can go to query.wikidata.org and find the area of each city individually by changing "Paris" in:
SELECT ?town ?area ?population ?coordinate ?country WHERE {
  ?town ?label "Paris"#en;
        wdt:P2046 ?area;
        wdt:P625 ?coordinate;
        wdt:P1082 ?population;
        wdt:P17 ?country.
}
And having done that, I can download the result and then change Paris again to one of the other 474 cities, download that result and so on until I've done this for all 475 cities. Then I can combine all 475 .csv files. That would work.
Obviously, I'd rather not do that. Tomorrow's Sunday, so I could but it would take ages. What I'd like to be able to do is:
run a single query that includes all 475 cities, is that possible?
get the country to report in terms that aren't wd:Q30, is that possible?
be able to tell if the results I'm getting for area are all the same unit, ideally sqkm but conversions aren't an issue, is that possible?
if it is possible to do all 475 at once, would I be able to reference the names in a .csv file?
I should also note that query.wikidata.org/ is the only place where I know to run this.
If there's some other list made by someone else of the areas of the cities in the UN's World Urbanization Prospects data of cities over 300,000 (which is where my 475 cities were culled from), then that would also work. (On a related note, Demographia has some PDF lists of Urban Areas over 1,000,000 that do have area information... if I try to copy and paste that, it just comes out as a single line, not a table. If I were to give up and find which of my 475 cities are in that list, how would I proceed?)
I've tried the following:
SELECT ?town ?area ?population ?coordinate ?country WHERE {
  ?town ?label "Paris"#en;
        wdt:P2046 ?area;
        wdt:P625 ?coordinate;
        wdt:P1082 ?population;
        wdt:P17 ?country.
}
SELECT ?town ?area ?population ?coordinate ?country WHERE {
  ?town ?label "London"#en;
        wdt:P2046 ?area;
        wdt:P625 ?coordinate;
        wdt:P1082 ?population;
        wdt:P17 ?country.
}
But query.wikidata.org gave me an error. I also tried variations like "Paris" | "London"#en and "Paris"#en | "London"#en, by analogy to R, with no luck.
As for the tags, I've just copied those from the question where I got the above code, plus added the gis and SPARQL ones.
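One thing worth knowing: in SPARQL, # starts a comment, so "Paris"#en is read as just "Paris"; a language tag is written @en. For the multi-city part, SPARQL has a VALUES clause that lets one query list every name at once, and the Wikidata label service can turn wd:Q30-style IDs into readable names. A sketch (the three city names are placeholders for the full 475; query.wikidata.org can't read a local file, but the VALUES list can be generated from the .csv in a spreadsheet or text editor and pasted in):
SELECT ?townLabel ?area ?areaUnitLabel ?population ?coordinate ?countryLabel WHERE {
  # paste all 475 names here
  VALUES ?townName { "Paris"@en "London"@en "Berlin"@en }
  ?town rdfs:label ?townName;
        wdt:P625 ?coordinate;
        wdt:P1082 ?population;
        wdt:P17 ?country.
  # go through the full area statement so the unit is visible,
  # instead of the truthy wdt:P2046 value
  ?town p:P2046/psv:P2046 ?areaNode.
  ?areaNode wikibase:quantityAmount ?area;
            wikibase:quantityUnit ?areaUnit.
  # turns ?country and ?areaUnit into English names instead of Q-IDs
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
Be aware that matching by label will also pick up other places with the same name (Paris, Texas, and so on), so the output may need de-duplication against the coordinates or countries.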

Related

overpass-turbo.eu find all cities on maps

Maybe someone can help me with an overpass-turbo.eu query.
I'd like to highlight the centre of every city in a country or region (or the current map view).
Is there a "simple" example on the web somewhere?
(Google hasn't been much help with this particular request yet, but I'm sure someone must have tried searching for this already.)
Many thanks for any ideas.
Here is an example for finding all cities, towns, villages and hamlets in the country Andorra:
[out:json][timeout:25];
// fetch area “Andorra” to search in
{{geocodeArea:Andorra}}->.searchArea;
// gather results
(
  node[place~"city|town|village|hamlet"](area.searchArea);
);
// print results
out body;
>;
out skel qt;
You can view the result at overpass-turbo.eu after clicking the run button.
Note: When running this query for larger countries you might need to increase the timeout value. Also, rendering the result in the browser might not be possible for performance reasons. In this case use the export button and download the raw data instead.
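For example, the same pattern for a larger country might look like this (the country name, timeout value, and reduced place-type list are illustrative choices, not requirements):
[out:json][timeout:180];
// larger countries need a longer timeout
{{geocodeArea:Germany}}->.searchArea;
(
  // limiting the place types also keeps the result small enough to render
  node[place~"city|town"](area.searchArea);
);
out body;
>;
out skel qt;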

Understanding openaddresses data format

I have downloaded us-west geolocation data (postal addresses) from openaddresses.io. Some of the addresses in the datasets are incomplete, i.e., some of them don't have info like zip_code. Is there a way to retrieve it, or is the data incomplete?
I have searched the other files hoping to find any related info, but the complete dataset doesn't contain it. The City of Mesa, AZ has multiple zip codes, so it is hard to assign one to an address. Is there any way to address this problem?
This is what the data looks like (City of Mesa, AZ):
LON,LAT,NUMBER,STREET,UNIT,CITY,DISTRICT,REGION,POSTCODE,ID,HASH
-111.8747353,33.456605,790,N DOBSON RD,,SRPMIC,,,,,dc0c53196298eb8d
-111.8886227,33.4295194,2630,W RIO SALADO PKWY,,MESA,,,,,c38b700309e1e9ce
-111.8867018,33.4290795,2401,E RIO SALADO PKWY,,TEMPE,,,,,9b912eb2b1300a27
-111.8832045,33.4232903,700,S EVERGREEN RD,,TEMPE,,,,,3435b99ab3f4f828
-111.8761202,33.4296416,2100,W RIO SALADO PKWY,,MESA,,,,,b74349c833f7ee18
-111.8775844,33.4347782,1102,N RIVERVIEW,,MESA,,,,,17d0cf1542c66083
Short answer: the data is incomplete.
The data in OpenAddresses.io is only as complete as the data sources it pulls from. OpenAddresses is just an aggregation of publicly available datasets, and there's no real consistency between the government agencies that make their data available. As a result, other sections of the OpenAddresses dataset might have city names or zip codes, but there's often something missing.
If you're looking to fill in the missing data, take a look at how projects like Pelias use multiple data sources to augment missing data.
Personally, I always end up going back to OpenStreetMap (OSM). One could argue that OpenAddresses is better quality because it comes from official sources and doesn't try to fill in data using approximations, but the large gaps of missing data make it far less useful, at least on its own.
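If only a handful of rows need backfilling, one possibility (a sketch, not anything OpenAddresses itself provides) is to reverse-geocode the LON/LAT columns against OSM's Nominatim service, which often knows the postcode. Note that Nominatim's usage policy limits you to roughly one request per second, so this won't scale to the whole extract:
library(jsonlite)

# Look up the postcode for a coordinate via Nominatim's reverse endpoint;
# returns NULL when Nominatim has no postcode for the location either.
lookup_postcode <- function(lat, lon) {
  url <- sprintf(
    "https://nominatim.openstreetmap.org/reverse?format=json&lat=%.7f&lon=%.7f",
    lat, lon)
  fromJSON(url)$address$postcode
}

# First row of the Mesa, AZ sample above:
lookup_postcode(33.456605, -111.8747353)
Sys.sleep(1)  # stay within the usage policy when looping over rows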

Google Maps: Norwegian postcodes not returning any results

Putting it simply, we have some Norwegian postcodes and are using the API to get their addresses and lat/long. Nothing too highbrow, but for around 10% of the postcodes the API returns no results. Here's an example:
Success for postal_code=1151:
http://maps.googleapis.com/maps/api/geocode/json?components=country:NO%7Cpostal_code:1151&sensor=false
Fail for postal_code=2066:
http://maps.googleapis.com/maps/api/geocode/json?components=country:NO%7Cpostal_code:2066&sensor=false
I have noticed that the majority appear to be for Postboks addresses (presumably the equivalent of PO boxes in the UK).
However, it's not true for all of them.
Does anyone have similar experience, or perhaps better knowledge of Norwegian postcodes?
Thanks
I tried your request and found that postal_code=2066 gives ZERO_RESULTS. If you are looking for the name Jessheim, am I right? Then I think you should use postal_code=2069 instead; this will give you the result that you want.
Here is the request that I used.
maps.googleapis.com/maps/api/geocode/json?components=country:NO|postal_code:2069&sensor=false
Also, I think you should first check that the postal code you are using actually exists. I tried the request more generally, without setting any country as a filter.
First, I used postal_code=1151, and as you can see, the result includes the address 1151 Oslo, Norway, which means postal code 1151 exists in Norway.
maps.googleapis.com/maps/api/geocode/json?components=country:|postal_code:1151
Second, I used postal_code=2066, and no address in the result has Norway as its country.
maps.googleapis.com/maps/api/geocode/json?components=country:|postal_code:2066
One additional note: if you use two component values in your request, you need to separate them with a pipe (|). I hope this helps :)
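To scan a whole list of postcodes for this problem, a small loop over the same endpoint works. A sketch in R (the Geocoding API now requires a key, assumed here as the placeholder "your_key"; %%7C in sprintf produces %7C, the URL-encoded pipe):
library(jsonlite)

# Returns the Geocoding API status for one Norwegian postcode:
# "OK" if it resolves, "ZERO_RESULTS" if not.
check_postcode <- function(code, key) {
  url <- sprintf(
    "https://maps.googleapis.com/maps/api/geocode/json?components=country:NO%%7Cpostal_code:%s&key=%s",
    code, key)
  fromJSON(url)$status
}

codes <- c("1151", "2066", "2069")
sapply(codes, check_postcode, key = "your_key")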
KENDi - thanks a lot for your help and answer. I found out that Norway has two types of postcode: one for street addresses, and one for postboxes (PO boxes in the UK) that don't have a geographical address.
Here's an example:
http://adressesok.posten.no/en/postal_codes/search?utf8=%E2%9C%93&q=Molde

Scrape html Twitter followers using R

I have a continuous task that I think can be automated using R.
Using the twitteR package I have extracted a list of tweets. Those have been categorized into positive (and neutral) and negative tweets. This has been a manual task, but I am looking into doing some machine learning on it.
My problem is the reach part. I want to know not only the number of positive and negative tweets but also the number of people who have potentially been exposed to the tweet.
There is a way to do this using the twitteR package, but it is slow, as it requires the machine to sleep between each and every search. And with thousands of tweets this is not practical for me.
My thought was therefore: is it possible to extract the number of followers from Twitter's HTML source code, fetching the page with webpage <- getURL("http://www.twitter.com/AngelHaze") and extracting the follower count from there?
Also, on top of this, I want to be able to do this using a vector of URLs ("http://www.twitter.com/AngelHaze") and then combine the results into a dataframe with the ScreenName (AngelHaze) and the number of followers. I am from Denmark, so the source code containing the number of followers looks like this:
a class="ProfileNav-stat ProfileNav-stat--link u-borderUserColor u-textCenter js-tooltip js-nav u-textUserColor" title="196.262 følgere" data-nav="followers"
href="/AngelHaze/followers""
Where "196.262 følgere" is the relevant part.
Is this possible? And if yes, can anyone help me get going?
Best, Sander Ehmsen.
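No guarantees — Twitter changes its markup regularly and its terms restrict scraping — but based on the snippet quoted above, a sketch of the getURL() idea could look like this (the regex pattern is tied to that exact markup and may need updating):
library(RCurl)

# Fetch a profile page and pull the number out of the
# title="196.262 følgere" attribute on the followers link.
get_followers <- function(screen_name) {
  page <- getURL(sprintf("https://twitter.com/%s", screen_name))
  m <- regmatches(page,
         regexpr('title="[0-9.,]+[^"]*"\\s+data-nav="followers"', page))
  if (length(m) == 0) return(NA)
  # keep only the digits ("196.262" uses . as a thousands separator)
  as.numeric(gsub("[^0-9]", "", sub(" .*", "", m)))
}

screen_names <- c("AngelHaze")
data.frame(ScreenName = screen_names,
           Followers  = sapply(screen_names, get_followers))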

Extract adjacent word? (Names, streets, creeks, rivers)

Hi, I am looking for a function that I can run over a massive list of paragraphs to extract the word preceding ‘creek’, such that the creek names can be isolated.
For example a given paragraph might read:
“The site was located up stream three miles from the bridge along Clark Creek.”
The ideal output would be simply:
Clark Creek
It would have to be something that looks up the word ‘creek’ as the criterion and extracts the preceding word; even just ‘Clark’ would work for me.
I have been playing around with the RSQLite package & gsub, but no luck so far… I am sure this is a common procedure.
If you're extracting actual addresses, there are services which do this intelligently and can even verify the results: http://smartystreets.com/products/liveaddress-api/extract (To be fair, you should know I helped develop that, although I no longer work there.)
For place names, assuming the place is just one word, you could try a simple regex:
/(?<=\s)(\S+\s+(Creek|Street|River))/ig
Granted, I've never used RSQLite or gsub, but I imagine something like this would do the trick.
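In R itself, base regmatches()/gregexpr() can apply essentially that pattern; a minimal sketch using the example sentence from the question:
txt <- "The site was located up stream three miles from the bridge along Clark Creek."

# capture "<word> Creek" (or Street/River), case-insensitively
m <- gregexpr("\\S+\\s+(Creek|Street|River)", txt,
              ignore.case = TRUE, perl = TRUE)
unlist(regmatches(txt, m))
# [1] "Clark Creek"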