Plot cities in tableau from longitude and latitude - json

I had JSON files which I cleaned and exported as CSV files to be used in Tableau. The files have Longitude and Latitude fields, but I want to group them in Tableau by city. How can I do so, either directly in Tableau or using Python?

Use geopy!!
https://geopy.readthedocs.io/en/stable/#module-geopy.geocoders
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="specify_your_app_name_here")
location = geolocator.reverse("52.509669, 13.376294")
print(location.address)
Potsdamer Platz, Mitte, Berlin, 10117, Deutschland, European Union
print((location.latitude, location.longitude))
(52.5094982, 13.3765983)
print(location.raw)
{'place_id': '654513', 'osm_type': 'node', ...}
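To turn that into something Tableau can group on, one option is to reverse-geocode each row of the exported CSV and write the city name into a new column. Below is a rough sketch of that idea, not from the original answer: the column names "Latitude"/"Longitude" and the file names are assumptions, and Nominatim's usage policy asks for roughly one request per second, hence the rate limiter.
import pandas as pd
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

geolocator = Nominatim(user_agent="specify_your_app_name_here")
# Throttle to ~1 request per second to stay within Nominatim's usage policy.
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)

def city_of(row):
    # Column names "Latitude"/"Longitude" are assumptions; adjust to the real CSV.
    location = reverse((row["Latitude"], row["Longitude"]))
    address = location.raw.get("address", {}) if location else {}
    # Nominatim uses different keys depending on the kind of place.
    return address.get("city") or address.get("town") or address.get("village")

df = pd.read_csv("points.csv")            # hypothetical name for the exported CSV
df["City"] = df.apply(city_of, axis=1)    # new column Tableau can group by
df.to_csv("points_with_city.csv", index=False)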

Related

issue with connecting data in databricks from data lake and reading JSON into Folium

I'm working on something based on this blog post:
https://python-visualization.github.io/folium/quickstart.html#Getting-Started
specifically part 13, using Choropleth maps.
The piece of code they use is the following:
import folium
import pandas as pd
url = (
"https://raw.githubusercontent.com/python-visualization/folium/master/examples/data"
)
state_geo = f"{url}/us-states.json"
state_unemployment = f"{url}/US_Unemployment_Oct2012.csv"
state_data = pd.read_csv(state_unemployment)
m = folium.Map(location=[48, -102], zoom_start=3)
folium.Choropleth(
    geo_data=state_geo,
    name="choropleth",
    data=state_data,
    columns=["State", "Unemployment"],
    key_on="feature.id",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Unemployment Rate (%)",
).add_to(m)
folium.LayerControl().add_to(m)
m
If I use this, I get the expected map.
Now I am trying to do the same with my own data. I work in Databricks, so I have a JSON file with the GeoJSON data (source_file1) and a CSV file (source_file2) with the data that needs to be plotted on the map.
source_file1 = "dbfs:/mnt/sandbox/MAARTEN/TOPO/Belgie_GEOJSON.JSON"
state_geo = spark.read.json(source_file1,multiLine=True)
source_file2 = "dbfs:/mnt/sandbox/MAARTEN/TOPO/DATASVZ.csv"
df_2 = spark.read.format("CSV").option("inferSchema", "true").option("header", "true").option("delimiter",";").load(source_file2)
state_data = df_2.toPandas()
When I adjust the code as follows:
m = folium.Map(location=[48, -102], zoom_start=3)
folium.Choropleth(
    geo_data=state_geo,
    name="choropleth",
    data=state_data,
    columns=["State", "Unemployment"],
    key_on="feature.properties.name_nl",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="% Marktaandeel CC",
).add_to(m)
folium.LayerControl().add_to(m)
m
So when I pass the geo_data parameter as a Spark DataFrame, I get the following error:
ValueError: Cannot render objects with any missing geometries: DataFrame[features: array<struct<geometry:struct<coordinates:array<array<array<string>>>,type:string>,properties:struct<arr_fr:string,arr_nis:bigint,arr_nl:string,fill:string,fill-opacity:double,name_fr:string,name_nl:string,nis:bigint,population:bigint,prov_fr:string,prov_nis:bigint,prov_nl:string,reg_fr:string,reg_nis:string,reg_nl:string,stroke:string,stroke-opacity:bigint,stroke-width:bigint>,type:string>>, type: string]
I think something goes wrong with the format when the data is transformed from the blob in the Azure data lake into the Spark DataFrame. I tested this in a Jupyter notebook on my desktop, reading the data straight from the file into folium, and it all works.
If I load it directly from the source, like the example does with its web URL, and adjust the 'geo_data' parameter of the folium function:
m = folium.Map(location=[48, -102], zoom_start=3)
folium.Choropleth(
    geo_data=source_file1,  # this gets adjusted directly to the data lake
    name="choropleth",
    data=state_data,
    columns=["State", "Unemployment"],
    key_on="feature.properties.name_nl",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="% Marktaandeel CC",
).add_to(m)
folium.LayerControl().add_to(m)
m
I get an error: the function expects a local file path, and the error is caused by passing a path prefixed with "dbfs:" (the fix is to use "/dbfs" instead).
So I started wondering what the difference is between my JSON file and the one from the blog post. The only thing I can imagine is that the Azure data lake doesn't store my JSON as JSON but as a block blob, and for some reason I am not converting it properly so that folium can read it.
(Screenshot: Azure blob storage / data lake)
So can someone with folium knowledge let me know:
A. Is it not possible to load the geo_data directly from a data lake?
B. In what format do I need to upload the data?
Any thoughts on this would be helpful!
Thanks in advance!
Solved this issue: I just had to replace "dbfs:" with "/dbfs". I had tried that a lot of times, but I used "/dbfs:" and got another error.
Can't believe I'm this stupid :-)
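For completeness, here is a minimal sketch of the working setup under that fix. It is an assumption-laden illustration rather than the exact notebook: the map centre is roughly Belgium (my guess), state_data is the pandas DataFrame created above with df_2.toPandas(), and the column names are carried over from the snippets in the question.
import folium

# Local-filesystem view of the same mount: "/dbfs/...", not "dbfs:/..."
source_file1 = "/dbfs/mnt/sandbox/MAARTEN/TOPO/Belgie_GEOJSON.JSON"

m = folium.Map(location=[50.5, 4.5], zoom_start=8)  # roughly centred on Belgium (an assumption)
folium.Choropleth(
    geo_data=source_file1,              # a plain file path folium can open itself
    name="choropleth",
    data=state_data,                    # pandas DataFrame from df_2.toPandas()
    columns=["State", "Unemployment"],  # key column and value column of the CSV
    key_on="feature.properties.name_nl",
    fill_color="YlGn",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="% Marktaandeel CC",
).add_to(m)
folium.LayerControl().add_to(m)
m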

Importing string arrays as integers from CSV to Neo4j using Cypher

I'm new to Neo4j and Cypher, and I'm trying to import some data from a CSV that includes an array of IDs. I have the query below working, but as Cypher imports everything as strings by default, I've been unable to find the best way to convert the array of placeIDs to integers.
LOAD CSV WITH HEADERS FROM 'http://localhost:11001/project-ca45d786-e360-4e3b-b4b4-eb8fe62a7b55/People-Gridv2.csv' AS row
CREATE (:People {peopleID: toInteger(row.peopleID), nickname: row.nickname, firstName: row.firstName, lastName: row.lastName, relationship: row.relationship, firstMemory: row.firstMemory, lastMemory: row.lastMemory, placeID: split(row.placeID,";")})
I hoped that I'd be able to do something like the following, but it doesn't work:
placeID: toInteger(split(row.placeID,";"))
Can anyone point me in the right direction?
That would probably be something like
placeID: REDUCE(array = [], s IN split(row.placeID, ";") | array + [toInteger(s)])
to get an array of integers.
Example:
WITH '123;456' AS placeID
RETURN REDUCE(array = [], s IN split(placeID, ";") | array + [toInteger(s)])
will return
[123,456]
And even shorter :)
WITH '123;456' AS placeID
RETURN [s IN split(placeID, ";") | toInteger(s)]

Shapefile to csv with a WKT multipolygon column

My data is from census maps. I am not familiar with shapefiles or WKT files, but I managed to find this solution, which I used as the basis for my own code.
import csv
from osgeo import ogr  # on older GDAL installs this is plain "import ogr"

# Open the output CSV (text mode for Python 3's csv module) and the input shapefile
csvfile = open("states_wkt.csv", 'w', newline='')
ds = ogr.Open("cb_2015_us_state_20m.shp")
lyr = ds.GetLayer()

# Get field names from the layer definition
dfn = lyr.GetLayerDefn()
nfields = dfn.GetFieldCount()
fields = []
for i in range(nfields):
    fields.append(dfn.GetFieldDefn(i).GetName())
fields.append('kmlgeometry')
csvwriter = csv.DictWriter(csvfile, fields)
While this works, I get geometry results looking like:
""kmlgeometry"":""<MultiGeometry>
<Polygon><outerBoundaryIs><LinearRing><coordinates>-118.593969,33.467198
-118.484785,33.487483 -118.370323,33.409285 -118.286261
</coordinates></LinearRing></outerBoundaryIs></Polygon>
<Polygon><outerBoundaryIs><LinearRing><coordinates>-118.594033,33.035951
-118.540069,32.980933 -118.446771,32.895424 -118.353504,32.821962 -118.425634
</coordinates></LinearRing></outerBoundaryIs></Polygon>
</MultiGeometry>
In my specific case I would like to return the geometry data in the form of a multipolygon like this:
MULTIPOLYGON (((-71.6062550000000044 42.0133709999999994,
-71.5276060000000058 42.0149979999999985, -71.5169060000000059
42.0155979999999971, -71.4999080000000049 42.0171989999999980,
-71.3814009999999968 42.0187979999999968, -71.3815050000000042
42.0000110000000006, -71.3812010000000043 41.9811979999999991)))
How can I achieve that?
I managed to find a simple command using GDAL's ogr2ogr:
ogr2ogr -f CSV multipolygon_states.csv cb_2015_us_state_20m.shp -nlt MULTIPOLYGON -lco GEOMETRY=AS_WKT
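If you prefer to stay in Python, here is a sketch of the same idea with the OGR bindings (assuming GDAL's Python bindings are installed, and not taken from the original post): export each feature's geometry with ExportToWkt() instead of KML, so the CSV column holds MULTIPOLYGON (...) text.
import csv
from osgeo import ogr

ds = ogr.Open("cb_2015_us_state_20m.shp")
lyr = ds.GetLayer()
dfn = lyr.GetLayerDefn()
fields = [dfn.GetFieldDefn(i).GetName() for i in range(dfn.GetFieldCount())]
fields.append('WKT')

with open("multipolygon_states.csv", 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fields)
    writer.writeheader()
    for feat in lyr:
        # copy the attribute fields, then add the geometry as WKT text
        row = {name: feat.GetField(name) for name in fields[:-1]}
        row['WKT'] = feat.GetGeometryRef().ExportToWkt()  # e.g. MULTIPOLYGON (((...)))
        writer.writerow(row)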

How to convert data from R to JSON so it can be read as GeoJSON?

I have retrieved data from JSON into R and edited it to dissolve the boundaries. The data from the JSON consists of a list of Daerah and their coordinates (longitude and latitude). I have dissolved the boundaries into Wilayah, where each Wilayah consists of many Daerah, and I have several Wilayah.
ditu <- readOGR('mys2.json', 'OGRGeoJSON')
lookup <- read.csv("json/aaa")
soa1 <- merge(ditu, lookup, by.x="Name", by.y = "Daerah", all.x = TRUE)
slsoa1 <- gUnaryUnion(soa1, id = soa1$Wilayah)
plot(slsoa1)
My main problem is that I want to save the data in JSON format so that it is readable as GeoJSON. Any help would be appreciated.
library(geojsonio)
mapwilayah <- geojson_json(slsoa1)     # already a GeoJSON string, no further encoding needed
write(mapwilayah, "mapwilayah.json")   # write it straight to file

R, GeoJSON and Leaflet

I recently learned about leafletjs.com from an R-Bloggers.com post. One tutorial I would like to reproduce is the interactive choropleth map with Leaflet (http://leafletjs.com/examples/choropleth.html). I have been using the rjson package for R to create the data.js file to be read by Leaflet. Although I have had success using the provided shapefile as a readable JSON file in Leaflet, I am unable to repeat the process when trying to merge additional properties from the data frame ("data.csv") into the JSON file; in this case, I have done GIS work in R to attach data on the number of cans in each school listed in the data frame. What I would like to achieve is a choropleth map in Leaflet that displays high school districts (as identified by the NAME variable) and the sum of "cans". The issue, I believe, is that writeOGR exports the information as points rather than polygons.
{
"type": "Feature",
"properties": {
"name": "Alabama",
"density": 94.65
},
"geometry": ...
...
}
###load R scripts from dropbox
dropbox.eval <- function(x, noeval=F) {
require(RCurl)
intext <- getURL(paste0("https://dl.dropboxusercontent.com/",x), ssl.verifypeer = FALSE)
intext <- gsub("\r","", intext)
if (!noeval) eval(parse(text = intext), envir= .GlobalEnv)
return(intext)
}
##pull scripts from dropbox
dropbox.eval("s/wgb3vtd9qfc9br9/pkg.load.r")
dropbox.eval("s/tf4ni48hf6oh2ou/dropbox.r")
##load packages
pkg.load(c(ggplot2,plyr,gdata,sp,maptools,rgdal,reshape2,rjson))
###setup data frames
dl_from_dropbox("data.csv","dx3qrcexmi9kagx")
data<-read.csv(file='data.csv',header=TRUE)
###prepare GIS shape and data for plotting
dropbox.eval("s/y2jsx3dditjucxu/dlshape.r")
temp <- tempfile()
dlshape(shploc="http://files.hawaii.gov/dbedt/op/gis/data/highdist_n83.shp.zip", temp)
shape<- readOGR(".","highdist_n83") #HDOE high school districts
shape@proj4string
shape2<- spTransform(shape, CRS("+proj=longlat +datum=NAD83"))
data.2<-ddply(data, .(year, schoolcode, longitude, latitude,NAME,HDist,SDist), summarise,
total = sum(total),
cans= sum(cans))
###merging back shape properties and data frame
coordinates(data.2) <-~longitude + latitude
shape2@data$id <- rownames(shape2@data)
sh.df <- as.data.frame(shape2)
sh.fort <- fortify(shape2 , region = "id" )
sh.line<- join(sh.fort, sh.df , by = "id" )
mapdf <- merge( sh.line , data.2 , by.x= "NAME", by.y="NAME" , all=TRUE)
mapdf <- mapdf[ order( mapdf$order ) , ]
###exporting merged data frame as JSON
mapdf.sp <- mapdf
coordinates(mapdf.sp) <- c("long", "lat")
writeOGR(mapdf.sp, "hssra.geojson","mapdf", driver = "GeoJSON")
However, it appears that my features repeat themselves constantly. How can I aggregate the feature information so that it looks more like the following:
var statesData = {"type":"FeatureCollection","features":[
{"type":"Feature","id":"01","properties":{"name":"Alabama","density":94.65},
"geometry":{"type":"Polygon","coordinates":[[[-87.359296,35.00118],
[-85.606675,34.984749],[-85.431413,34.124869],[-85.184951,32.859696],
[-85.069935,32.580372],[-84.960397,32.421541],[-85.004212,32.322956],
[-84.889196,32.262709],[-85.058981,32.13674],[-85.053504,32.01077],[-85.141136,31.840985],
[-85.042551,31.539753],[-85.113751,31.27686],[-85.004212,31.003013],[-85.497137,30.997536],
[-87.600282,30.997536],[-87.633143,30.86609],[-87.408589,30.674397],[-87.446927,30.510088],
[-87.37025,30.427934],[-87.518128,30.280057],[-87.655051,30.247195],[-87.90699,30.411504],
[-87.934375,30.657966],[-88.011052,30.685351],[-88.10416,30.499135],[-88.137022,30.318396],
[-88.394438,30.367688],[-88.471115,31.895754],[-88.241084,33.796253],
[-88.098683,34.891641],[-88.202745,34.995703],[-87.359296,35.00118]]]}},
{"type":"Feature","id":"02","properties":{"name":"Alaska","density":1.264},
"geometry":{"type":"MultiPolygon","coordinates":[[[[-131.602021,55.117982],
[-131.569159,55.28229],[-131.355558,55.183705],[-131.38842,55.01392],
[-131.645836,55.035827],[-131.602021,55.117982]]],[[[-131.832052,55.42469],
[-131.645836,55.304197],[-131.749898,55.128935],[-131.832052,55.189182],
[-131.832052,55.42469]]],[[[-132.976733,56.437924],[-132.735747,56.459832],
[-132.631685,56.421493],[-132.664547,56.273616],[-132.878148,56.240754],
[-133.069841,56.333862],[-132.976733,56.437924]]],[[[-133.595627,56.350293],
I ended up solving this question.
What I basically did was join the data.2 data frame to the shapefile's attribute table:
shape2@data <- join(shape2@data, data.2)
and then use the rgdal package's writeOGR to write it out in GeoJSON format (using the GeoJSON driver) with the *.js extension.
I hope this helps others.