R, GeoJSON and Leaflet - json
I recently learned about leafletjs.com from an R-Bloggers.com post. One tutorial I would like to reproduce is the interactive choropleth map in Leaflet (http://leafletjs.com/examples/choropleth.html). I have been using the rjson package in R to create the data.js file that Leaflet reads. Although I have had success converting the provided shapefile into JSON that Leaflet can read, I am unable to repeat the process when merging additional properties from a data frame ("data.csv") into the JSON file; in this case, I have done GIS work to attach data on the number of cans collected at each school listed in the data frame. What I would like to achieve is a choropleth map in Leaflet that displays each high school district (identified by the NAME variable) and its sum of "cans". The issue, I believe, is that writeOGR exports the information as points rather than polygons? For reference, each feature in the Leaflet example carries its properties like this:
{
"type": "Feature",
"properties": {
"name": "Alabama",
"density": 94.65
},
"geometry": ...
...
}
###load R scripts from dropbox
dropbox.eval <- function(x, noeval = FALSE) {
  require(RCurl)
  intext <- getURL(paste0("https://dl.dropboxusercontent.com/", x),
                   ssl.verifypeer = FALSE)
  intext <- gsub("\r", "", intext)
  if (!noeval) eval(parse(text = intext), envir = .GlobalEnv)
  return(intext)
}
##pull scripts from dropbox
dropbox.eval("s/wgb3vtd9qfc9br9/pkg.load.r")
dropbox.eval("s/tf4ni48hf6oh2ou/dropbox.r")
##load packages (names quoted so c() builds a character vector instead of looking for objects)
pkg.load(c("ggplot2", "plyr", "gdata", "sp", "maptools", "rgdal", "reshape2", "rjson"))
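(pkg.load is a custom helper fetched from Dropbox above; its source isn't shown here, but presumably it installs any missing packages and then attaches them, roughly along these lines — an assumed reconstruction, not the actual script:)
pkg.load <- function(pkgs) {
  for (p in pkgs) {
    # hypothetical sketch: install if absent, then attach by name
    if (!requireNamespace(p, quietly = TRUE)) install.packages(p)
    library(p, character.only = TRUE)
  }
}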
###setup data frames
dl_from_dropbox("data.csv", "dx3qrcexmi9kagx")
data <- read.csv(file = "data.csv", header = TRUE)
###prepare GIS shape and data for plotting
dropbox.eval("s/y2jsx3dditjucxu/dlshape.r")
temp <- tempfile()
dlshape(shploc = "http://files.hawaii.gov/dbedt/op/gis/data/highdist_n83.shp.zip", temp)
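(dlshape is another helper pulled from Dropbox, source not shown; presumably it downloads the zip to the tempfile and unzips the shapefile parts into the working directory, roughly — an assumption:)
dlshape <- function(shploc, dest) {
  download.file(shploc, dest, mode = "wb")  # binary mode for the zip
  unzip(dest)  # drops .shp/.dbf/.shx/.prj into getwd() so readOGR(".", ...) finds them
}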
shape <- readOGR(".", "highdist_n83")  # HDOE high school districts
shape@proj4string                      # inspect the current projection
shape2 <- spTransform(shape, CRS("+proj=longlat +datum=NAD83"))
data.2 <- ddply(data, .(year, schoolcode, longitude, latitude, NAME, HDist, SDist), summarise,
                total = sum(total),
                cans  = sum(cans))
###merging back shape properties and data frame
coordinates(data.2) <- ~longitude + latitude
shape2@data$id <- rownames(shape2@data)
sh.df <- as.data.frame(shape2)
sh.fort <- fortify(shape2, region = "id")
sh.line <- join(sh.fort, sh.df, by = "id")
mapdf <- merge(sh.line, data.2, by.x = "NAME", by.y = "NAME", all = TRUE)
mapdf <- mapdf[order(mapdf$order), ]
###exporting merged data frame as JSON
mapdf.sp <- mapdf
coordinates(mapdf.sp) <- c("long", "lat")
writeOGR(mapdf.sp, "hssra.geojson", "mapdf", driver = "GeoJSON")
However, it appears that my features repeat themselves constantly. How can I aggregate the feature information so that the output looks more like the following:
var statesData = {"type":"FeatureCollection","features":[
{"type":"Feature","id":"01","properties":{"name":"Alabama","density":94.65},
"geometry":{"type":"Polygon","coordinates":[[[-87.359296,35.00118],
[-85.606675,34.984749],[-85.431413,34.124869],[-85.184951,32.859696],
[-85.069935,32.580372],[-84.960397,32.421541],[-85.004212,32.322956],
[-84.889196,32.262709],[-85.058981,32.13674],[-85.053504,32.01077],[-85.141136,31.840985],
[-85.042551,31.539753],[-85.113751,31.27686],[-85.004212,31.003013],[-85.497137,30.997536],
[-87.600282,30.997536],[-87.633143,30.86609],[-87.408589,30.674397],[-87.446927,30.510088],
[-87.37025,30.427934],[-87.518128,30.280057],[-87.655051,30.247195],[-87.90699,30.411504],
[-87.934375,30.657966],[-88.011052,30.685351],[-88.10416,30.499135],[-88.137022,30.318396],
[-88.394438,30.367688],[-88.471115,31.895754],[-88.241084,33.796253],
[-88.098683,34.891641],[-88.202745,34.995703],[-87.359296,35.00118]]]}},
{"type":"Feature","id":"02","properties":{"name":"Alaska","density":1.264},
"geometry":{"type":"MultiPolygon","coordinates":[[[[-131.602021,55.117982],
[-131.569159,55.28229],[-131.355558,55.183705],[-131.38842,55.01392],
[-131.645836,55.035827],[-131.602021,55.117982]]],[[[-131.832052,55.42469],
[-131.645836,55.304197],[-131.749898,55.128935],[-131.832052,55.189182],
[-131.832052,55.42469]]],[[[-132.976733,56.437924],[-132.735747,56.459832],
[-132.631685,56.421493],[-132.664547,56.273616],[-132.878148,56.240754],
[-133.069841,56.333862],[-132.976733,56.437924]]],[[[-133.595627,56.350293],
...
I ended up solving this question.
What I did was join the data.2 data frame to the shapefile's attribute table:
shape2@data <- join(shape2@data, data.2)
and then use the rgdal package's writeOGR (with the GeoJSON driver) to write the result, giving the file a *.js extension.
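For completeness, a minimal sketch of that export (assuming data.2 was kept as a plain data frame with a NAME column matching shape2@data$NAME; the file and layer names are placeholders):
library(plyr)   # join()
library(rgdal)  # writeOGR()
shape2@data <- join(shape2@data, data.2, by = "NAME")  # join() preserves the polygon row order
writeOGR(shape2, dsn = "highdist.js", layer = "highdist", driver = "GeoJSON")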
I hope this helps others.
Related
Scraping escaped JSON data within a <script type="text/javascript"> in R
I am currently trying to scrape the data from the two graphs on the following html page (Forsmark and Ringhals): https://group.vattenfall.com/se/var-verksamhet/vara-energislag/karnkraft/aktuell-karnkraftsproduktion

The data originate from script tags like this (fragment):

<script type="text/javascript">
/*<![CDATA[*/
productionData = JSON.parse("{\"timestamp\":1582642616000,\"powerPlant\":\"Ringhals\", // etc
</script>

I would like to get two dataframes that look like these:

F1 F2 F3
number number number

and

R1 R2 R3
number number number

I tried to use XML and xpath to parse the html page but did not get anywhere with that. Do you have any ideas? Thanks!
Those charts are <iframe>s that load from https://gvp.vattenfall.com/sweden/produced-power/iframe/forsmark and https://gvp.vattenfall.com/sweden/produced-power/iframe/ringhals, so you should scrape those two pages directly. This was an interesting challenge. It becomes not too hard with rvest and jsonlite, which you will have to install if you don't already have them. Both require rtools. Try this:

library(rvest)
library(jsonlite)

# Load the URL (do the same for the other iframe)
url <- 'https://gvp.vattenfall.com/sweden/produced-power/iframe/forsmark'

# Parse it
webpage <- read_html(url)

# Extract the script element that holds the JSON data.
# You can find its CSS selector in your browser's DevTools by finding the
# script element and right-clicking, choosing Copy > CSS Path/Selector
script_element <- html_nodes(webpage, 'body > section:nth-child(2) > script:nth-child(2)')

# Extract its string content
json <- html_text(script_element)

# Clean it up
json <- gsub("\n /*<![CDATA[*/\n productionData = JSON.parse(", "", json, fixed = TRUE)
json <- gsub(");\n /*]]>*/\n ", "", json, fixed = TRUE)
json <- gsub("\"{", "{\"", json, fixed = TRUE)
json <- gsub("}\"", "}", json, fixed = TRUE)
json <- gsub("{\"\\\"", "{\\\"", json, fixed = TRUE)

# Extract data
data <- jsonlite::fromJSON(gsub("\\\"", "\"", json, fixed = TRUE))

Caveat: I'm not really an R expert; there is likely a more elegant way of doing this (particularly the data cleaning portion), but it works. For historical preservation, that takes this DOM node (the text content of the <script> tag):

"\n /*<![CDATA[*/\n productionData = JSON.parse(\"{\\\"timestamp\\\":1582643336000,\\\"powerPlant\\\":\\\"Forsmark\\\",\\\"blockProductionDataList\\\":[{\\\"name\\\":\\\"F1\\\",\\\"production\\\":998.86194,\\\"percent\\\":99.88619},{\\\"name\\\":\\\"F2\\\",\\\"production\\\":1120.434,\\\"percent\\\":97.8545},{\\\"name\\\":\\\"F3\\\",\\\"production\\\":1189.7126,\\\"percent\\\":99.55754}]}\");\n /*]]>*/\n "

and will result in data of this format:

> data
$timestamp
[1] 1.582647e+12

$powerPlant
[1] "Forsmark"

$blockProductionDataList
  name production  percent
1   F1   997.7902 99.77902
2   F2  1131.6150 98.83100
3   F3  1190.0520 99.58594
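Since the answer flags the cleaning as inelegant: a single regex can pull out the JSON.parse argument instead. A sketch under the same assumptions about the page structure (the CSS selector is copied from the answer above):

library(rvest)
library(jsonlite)

page   <- read_html("https://gvp.vattenfall.com/sweden/produced-power/iframe/forsmark")
script <- html_text(html_nodes(page, "body > section:nth-child(2) > script:nth-child(2)"))

# Capture the string literal handed to JSON.parse(...), then unescape \" to "
raw  <- sub('.*JSON\\.parse\\("(.*)"\\).*', '\\1', script)
data <- fromJSON(gsub('\\"', '"', raw, fixed = TRUE))
data$blockProductionDataList  # name / production / percent per reactor block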
Create a loop within a function so that URLs return a dataframe
I was provided with a list of identifiers (in this case the identifier is called an NPI). These identifiers can be copied and pasted into this website (https://npiregistry.cms.hhs.gov/registry/?). I want to return the NPI number, the physician's name, address, phone number, and specialty. I have over 3,000 identifiers, so copying and pasting is not efficient and not easily repeatable for future use. If possible, I would like to create a list of URLs, pass them into a function, and receive a dataframe that provides me with the variables mentioned above (NPI, NAME, ADDRESS, PHONE, SPECIALTY). I was able to write a function that produces the URLs needed. Here are some NPI numbers for reference: 1417024746, 1386790517, 1518101096, 1255500625.

This is my code for reading in the NPIs:

npiList <- c("1417024746", "1386790517", "1518101096", "1255500625")
npiList <- as.list(npiList)
npiList <- unlist(npiList, use.names = FALSE)

This is the function to return the list of URLs:

npiaddress <- function(x){
  url <- paste("https://npiregistry.cms.hhs.gov/registry/search-results-table?number=",
               x, "&addressType=ANY", sep = "")
  return(url)
}

I saved the list to a variable, and perhaps this is my downfall:

npi_urls <- npiaddress(npiList)

From here I wrote a function that accepts a single URL, retrieves the data I want, and turns it into a dataframe. My issue is that I cannot pass multiple URLs:

npiLookup <- function(x){
  url <- x
  webpage <- read_html(url)
  npi_html <- html_nodes(webpage, "td")
  npi <- html_text(npi_html)
  npi[4] <- gsub("\r?\n|\r", " ", npi[4])
  npi[4] <- gsub("\r?\t|\r", " ", npi[4])
  npiFinal <- npi[c(1:2, 4:6)]
  npiFinal <- as.data.frame(npiFinal)
  npiFinal <- t(npiFinal)
  npiFinal <- as.data.frame(npiFinal)
  names(npiFinal) <- c("NPI", "NAME", "ADDRESS", "PHONE", "SPECIALTY")
  return(npiFinal)
}

For example, if I want a dataframe for the identifier 1417024746, this works:

x <- npiLookup("https://npiregistry.cms.hhs.gov/registry/search-results-table?number=1417024746&addressType=ANY")
View(x)

The output for the example returns the NPI, NAME, ADDRESS, PHONE, and SPECIALTY as desired, but again, I need to do this for several thousand NPI identifiers. I feel like I need a loop within npiLookup. I've also tried to put npi_urls into the npiLookup function, but it does not work. Thank you for any help and for taking the time to read.
You're most of the way there. The final step uses this useful R idiom:

do.call(rbind, lapply(npiList, function(npi) {
  url <- npiaddress(npi)
  npiLookup(url)
}))

do.call is a base R function that applies a function (in this case rbind) to the list produced by lapply. That list is the result of running your npiLookup function on the URL produced by npiaddress for each element of npiList.

A few further comments for future reference, should anyone else come upon this question:

(1) I don't know why you're doing the as.list/unlist sequence at the beginning; it's redundant and probably unnecessary.
(2) The NPI registry provides a programming interface (API) that avoids the need to scrape data from the HTML pages; this might be more robust in the long run.
(3) The NPI registry provides the entire dataset as a downloadable file; this might have been an easier way to go.
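On point (2), a rough sketch of what the API route might look like. The endpoint and response shape here are assumptions based on the public NPPES API; verify them against its documentation before relying on this:

library(jsonlite)

# Hypothetical helper: query the NPPES API for one NPI (endpoint assumed)
npi_lookup_api <- function(npi) {
  res <- fromJSON(paste0("https://npiregistry.cms.hhs.gov/api/?version=2.1&number=", npi))
  res$results  # one row per match; nested address/taxonomy columns may need flattening
}

do.call(rbind, lapply(npiList, npi_lookup_api))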
SDMX to dataframe with RSDMX in R
I'm trying to get data from the Lithuanian Statistics Department. They offer an SDMX API with either XML or JSON (LSD). The example XML shown is https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217, which downloads the XML file. I tried the following:

devtools::install_github("opensdmx/rsdmx")
library(rsdmx)
string <- "https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217"
medianage <- readSDMX(string)

which results in the error:

<simpleError in doTryCatch(return(expr), name, parentenv, handler): Invalid SDMX-ML file>

I also tried simply reading in the manually downloaded file:

medianage <- readSDMX(file = "rest_data_M3010217_20180116163251.xml", isURL = FALSE)
medianage <- as.data.frame(medianage)

which results in medianage being NULL (empty). Maybe someone has an idea how I could solve downloading/transforming the data from the LSD by using either:

https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217
https://osp-rs.stat.gov.lt/rest_json/data/S3R629_M3010217

Thanks a lot!
In order to use rsdmx for this datasource, some enhancements have been added (see details at https://github.com/opensdmx/rsdmx/issues/141). You will need to re-install rsdmx from GitHub (version 0.5-11). You can then use the URL of the SDMX-ML file:

library(rsdmx)
url <- "https://osp-rs.stat.gov.lt/rest_xml/data/S3R629_M3010217"
medianage <- readSDMX(url)
df <- as.data.frame(medianage)

A connector has also been added in rsdmx to facilitate data queries on the LSD (Lithuanian Statistics Department) SDMX endpoint. See below an example of how to use it:

sdmx <- readSDMX(providerId = "LSD", resource = "data",
                 flowRef = "S3R629_M3010217", dsd = TRUE)
df <- as.data.frame(sdmx, labels = TRUE)

The above example shows how to enrich the data.frame with code labels extracted from the SDMX Data Structure Definition (DSD). For this, specify dsd = TRUE with readSDMX; this then allows labels = TRUE when converting to data.frame. For filtering data with readSDMX (e.g. startPeriod, endPeriod, code filters), check this page: https://github.com/opensdmx/rsdmx/wiki#readsdmx-as-helper-function
Separate multiple JSON data in R
I am a newbie to R and am working on the JSON file below (a snippet of the head and the relevant parts):

{"mdsDat":{"x":[0.098453,-0.19334,-0.23836,-0.28512,0.010195,0.14132,-0.026636,-0.17141,
0.082936,-0.030503,0.22893,0.097832,0.19978,0.048286,0.050141,0.026101,-0.10637,0.040702,
0.013531,0.013531],"y":[-0.21144,-0.25048,0.14525,-0.06405,0.16668,-0.066238,-0.23403,
0.17033,-0.037128,-0.019674,0.0089501,0.0069049,0.10143,-0.14445,0.052727,0.15911,0.049328,
0.074852,0.045969,0.045969],"topics":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
"Freq":[16.358,13.397,12.979,10.383,7.5134,7.16,6.1765,4.9584,4.6035,3.4624,3.4249,3.0709,
1.8512,1.8512,1.4977,0.90723,0.23895,0.16034,0.0031352,0.0031352],
"cluster":[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]},
"tinfo":{"Term":["equation","equations","mathematics","beauty","mathematical","people",
"beautiful","explain","world","time","understand","science","things","meaning","language",
"symbols","simple","life","nature","interesting","art","agree","movie","find","numbers",
"explore","mass","relationship","video","scientists","agree","scientists","amazing","learn",
"apply","artistic","common","fear","beautiful","mathematics","study","mathematical","science",
"meaning","physics","gravity","exchange","math","world","future","explained","sense",
"process","words","equations","experience","move","faster","eyes","fall","nature","power",
"human","exam","things","answer","people","world","ways","truth","equations","video",
"balance","painting","space"
...
"token.table":{"Term":["0","1","2","2","abstract","abstract","addition","admire","agree",
"amazing","answer","answer","apple","application","applied","applied","apply","art","artist",
"artist","artistic","arts","balance","balance","balance","beautiful","beautiful","beautiful",
"beautiful","beauty","beauty","bring","bring","bunch","bunch","calculate","calculation",
"collings","collings","collings","common","complex","complex","complex","contact","curiosity",
"curiosity","daily","difficult","discover","documentary","documentary","documentary","earth",
"earth","einstein","energy","energy","english","english","enjoy","enjoy","enjoy","equation",
"equation","equation","equation","equation","equations","equations","equations","equations",
"exam","exam","examination","examination","exchange","exchange","exchange","experience",
"experience","explain","explained","explained","explore","eyes","eyes","fact","fall","famous",
"famous","faster","faster","fear","feel","film","film","find","find","force","formula","formula","found",
...
"work","world","world","world","worlds","years","years"],"Topic":[8,5,11,13,8,10,5,4,1,1,2,
15,9,10,9,12,1,3,4,9,1,5,2,4,10,1,4,7,14,2,6,3,15,10,15,
... ,16,3,7,2,14,2,5,1,8,4,9,10,15,1,2,14,9,11,13],"Freq":[0.97036,0.9702,0.75081,0.25027,0.22141,
0.77494,0.97584,0.96609,0.99493,0.98083,0.73954,0.24651,0.99013,0. ...

For the project, I have created a getJSONfield variable as below:

getJSONfield <- json %>%
  spread_values(jsonList = jstring("token.table")) %>%
  select(jsonList)

It returns a JSON list something like this:

jsonNodes
1 list(Term = list("0", "1", "1", "2", "Data of JSON"), Topic = list(9, 1, 10, "Data of JSON"), Freq = list(0.99834, "Data of JSON"))

And I have to separate the multiple variables (i.e. Term, Topic and Freq) into the heads and edges of a network diagram.
Something like what I would like to do with the JSON data:

jsonNode <- lapply(json$topic, header = T, as.is = T)
jsonTermsLinkS <- lapply(json$term, header = T, as.is = T)
jsonTermsLinkE <- lapply(json$freq, header = T, as.is = T)

But first, I need to separate or call them successfully. Does anyone have any idea or advice on this? Great thanks if someone can help me!
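For what it's worth, a minimal sketch of one way to split those parallel arrays apart, assuming the file is parsed with jsonlite rather than tidyjson (the file name is a placeholder):

library(jsonlite)

json <- fromJSON("topics.json")  # placeholder file name

# token.table holds three parallel arrays; jsonlite returns them as plain
# vectors, so they can be bound straight into one data frame
tokens <- data.frame(Term  = json$token.table$Term,
                     Topic = json$token.table$Topic,
                     Freq  = json$token.table$Freq)
head(tokens)  # one row per token: Term / Topic / Freq, ready for an edge list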
How to convert data from R to JSON so it can be read as GeoJSON?
I have retrieved data from JSON into R and edited the data to dissolve the boundaries. The data from JSON consist of a list of Daerah and the coordinates (longitude and latitude). I have dissolved the boundaries to Wilayah, where each Wilayah consists of many Daerah, and I have several Wilayah.

ditu <- readOGR('mys2.json', 'OGRGeoJSON')
lookup <- read.csv("json/aaa")
soa1 <- merge(ditu, lookup, by.x = "Name", by.y = "Daerah", all.x = TRUE)
slsoa1 <- gUnaryUnion(soa1, id = soa1$Wilayah)
plot(slsoa1)

My main problem is that I want to save the data in JSON format so it can be read as GeoJSON. Any help would be appreciated.
library(geojsonio)
mapwilayah <- geojson_json(slsoa1)    # geojson_json() already returns GeoJSON text
write(mapwilayah, "mapwilayah.json")  # re-encoding with toJSON would double-escape it
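Alternatively, geojsonio can write the file in one step. A sketch, assuming slsoa1 is still the Spatial object produced by gUnaryUnion above:

library(geojsonio)
geojson_write(slsoa1, file = "mapwilayah.geojson")  # GeoJSON straight to disk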