JSON Processing Error in R

I'm trying to reverse geocode a dataframe of lat/long coordinates. For some reason, the code spits out a single FIPS code, and stops with error messages. I'm not sure what is going on -- could it be the server rate-limiting queries?
dput(latlon)
structure(list(lat = c(38.6536995, 28.5959782, 39.2349128, 40.6988037,
36.7276906, 35.0481824), lon = c(-121.3526261, -81.4514073, -76.6117247,
-73.9183688, -119.803458, -106.4910219)), .Names = c("lat", "lon"
), row.names = c(NA, -6L), class = "data.frame")
#Reverse-Geocoding Function to get county
latlong2fips <- function(latitude, longitude) {
url <- "http://data.fcc.gov/api/block/find?format=json&latitude=%f&longitude=%f"
url <- sprintf(url, latitude, longitude)
json <- RCurl::getURL(url)
json <- RJSONIO::fromJSON(json)
return(as.character(json$County['FIPS']))
}
latlong2fips(latlon$lat, latlon$lon)
[1] "06067"
Warning messages:
1: In if (is.na(encoding)) return(0L) :
the condition has length > 1 and only the first element will be used
2: In if (is.na(i)) { :
the condition has length > 1 and only the first element will be used

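The problem is that fromJSON doesn't accept a vector: RCurl::getURL returns one response per URL, but fromJSON only parses the first one, which is why you get a single FIPS code plus the warnings. So you need to apply fromJSON to each element of the vector.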
I also use jsonlite as my go-to JSON parser in R.
latitude <- latlon$lat
longitude <- latlon$lon
url <- "http://data.fcc.gov/api/block/find?format=json&latitude=%f&longitude=%f"
url <- sprintf(url, latitude, longitude)
json <- RCurl::getURL(url)
## up to here gives you a vector of results, so you now need to extract the 'FIPS' for each vector element
lapply(json, function(x){
jsonlite::fromJSON(x)$County$FIPS
})
$`http://data.fcc.gov/api/block/find?format=json&latitude=38.653700&longitude=-121.352626`
[1] "06067"
$`http://data.fcc.gov/api/block/find?format=json&latitude=28.595978&longitude=-81.451407`
[1] "12095"
$`http://data.fcc.gov/api/block/find?format=json&latitude=39.234913&longitude=-76.611725`
[1] "24510"
$`http://data.fcc.gov/api/block/find?format=json&latitude=40.698804&longitude=-73.918369`
[1] "36047"
$`http://data.fcc.gov/api/block/find?format=json&latitude=36.727691&longitude=-119.803458`
[1] "06019"
$`http://data.fcc.gov/api/block/find?format=json&latitude=35.048182&longitude=-106.491022`
[1] "35001"
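If you want to keep the question's latlong2fips() interface, one option is to loop inside the function so each response is parsed on its own and the function returns one FIPS code per row. A minimal sketch, assuming jsonlite is installed; the `fetch` argument and the `fcc_urls()` helper are hypothetical additions of this sketch, there only so the parsing can be checked without calling the FCC server.

```r
library(jsonlite)

# Build one FCC block-API URL per coordinate pair; sprintf() is vectorised,
# so this returns a character vector as long as the inputs.
fcc_urls <- function(latitude, longitude) {
  sprintf("http://data.fcc.gov/api/block/find?format=json&latitude=%f&longitude=%f",
          latitude, longitude)
}

# Parse each response separately and return one FIPS code per row.
# `fetch` defaults to RCurl::getURL; exposing it as a parameter is an
# assumption of this sketch so the parsing can be tested offline.
latlong2fips <- function(latitude, longitude,
                         fetch = function(u) RCurl::getURL(u)) {
  vapply(fcc_urls(latitude, longitude), function(u) {
    res <- jsonlite::fromJSON(fetch(u))
    as.character(res$County$FIPS)
  }, character(1), USE.NAMES = FALSE)
}

# Usage with the question's data frame (needs network access):
# latlon$fips <- latlong2fips(latlon$lat, latlon$lon)
```

vapply() stops with an error if a response has no County$FIPS, which is arguably preferable to silently returning a shorter vector that no longer lines up with the rows.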

Related

R: Selecting certain from a JSON file

I've imported a JSON file into R from ( http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json ) and I'm trying to select only counties in Kansas.
Right now I have all the data into one variable and I'm trying to make subdata of this that is just counties of Kansas. I'm not sure how to go about this.
What you have there is GeoJSON, which sf can read directly into an sf object. An sf object is also a data.frame, so you can use the usual data.frame subsetting operations:
library(sf)
sf <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json")
sf[sf$NAME == "Kansas", ]
# Simple feature collection with 1 feature and 5 fields
# geometry type: MULTIPOLYGON
# dimension: XY
# bbox: xmin: -102.0517 ymin: 36.99308 xmax: -94.58993 ymax: 40.00316
# epsg (SRID): 4326
# proj4string: +proj=longlat +datum=WGS84 +no_defs
# GEO_ID STATE NAME LSAD CENSUSAREA geometry
# 30 0400000US20 20 Kansas 81758.72 MULTIPOLYGON(((-99.541116 3...
And seeing as you want the individual counties, you need to use the counties data set
sf_counties <- sf::read_sf("http://eric.clst.org/wupl/Stuff/gz_2010_us_050_00_500k.json")
sf_counties[sf_counties$STATE == 20, ]
To stay with a JSON workflow, you can try jqr:
library(jqr)
url <- 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json'
download.file(url, (f <- tempfile(fileext = ".json")))
res <- paste0(readLines(f), collapse = " ")
out <- jq(res, '.features[] | select(.properties.NAME == "Kansas")')
You can then map the result easily with leaflet:
library(leaflet)
leaflet() %>%
addTiles() %>%
addGeoJSON(out) %>%
setView(-98, 38, 6)
Alternatively, using rjson:
library(rjson)
lst=fromJSON(file = 'http://eric.clst.org/wupl/Stuff/gz_2010_us_040_00_20m.json')
index = which(sapply(lapply(lst$features,"[[",'properties'),'[[','NAME')=='Kansas')
subdata = lst$features[[index]]
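With the list-based route, a logical index is a little safer than which() followed by [[index]], because [[ ]] fails when there are zero or several matches (several is exactly the case for counties). A minimal sketch using jsonlite with simplifyVector = FALSE, which gives a nested-list shape similar to rjson's; the three-feature string is hypothetical test data standing in for the counties file, and filtering on STATE == "20" follows the sf answer above.

```r
library(jsonlite)

# Hypothetical stand-in for the counties GeoJSON; the real file has the same
# features[].properties layout but many more fields.
geo <- '{"type":"FeatureCollection","features":[
  {"type":"Feature","properties":{"STATE":"20","NAME":"Allen"},"geometry":null},
  {"type":"Feature","properties":{"STATE":"29","NAME":"Adair"},"geometry":null},
  {"type":"Feature","properties":{"STATE":"20","NAME":"Anderson"},"geometry":null}]}'

lst <- fromJSON(geo, simplifyVector = FALSE)

# as.character() copes with STATE stored as a string or a number (not checked
# against the real file); isTRUE() treats a missing STATE as "no match".
in_kansas <- vapply(lst$features,
                    function(f) isTRUE(as.character(f$properties$STATE) == "20"),
                    logical(1))

# A logical index keeps every match and gives an empty list if none match
kansas_counties <- lst$features[in_kansas]
vapply(kansas_counties, function(f) f$properties$NAME, character(1))
```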

Convert JSON into CSV in R programming

I have JSON of the form:
{"abc":
{
"123":[45600],
"378":[78689],
"343":[23456]
}
}
I need to convert above format JSON to CSV file in R.
CSV format :
ds y
123 45600
378 78689
343 23456
I'm using R library rjson to do so. I'm doing something like this:
jsonFile <- fromJSON(file=fileName)
json_data_frame <- as.data.frame(jsonFile)
but it's not doing the way I need it.
You can use jsonlite::fromJSON to read the data into a list, though you'll need to pull it apart to assemble it into a data.frame:
abc <- jsonlite::fromJSON('{"abc":
{
"123":[45600],
"378":[78689],
"343":[23456]
}
}')
abc <- data.frame(ds = names(abc[[1]]),
y = unlist(abc[[1]]), stringsAsFactors = FALSE)
abc
#> ds y
#> 123 123 45600
#> 378 378 78689
#> 343 343 23456
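To finish the job and write the CSV the question asks for, a sketch along the same lines (assuming jsonlite; the output goes to a temporary file here, so swap in your real path). Passing use.names = FALSE to unlist() stops the keys from also turning into row names.

```r
library(jsonlite)

txt <- '{"abc": {"123":[45600], "378":[78689], "343":[23456]}}'
abc <- fromJSON(txt)[["abc"]]   # named list: `123` = 45600, ...

# Keys become the ds column; use.names = FALSE avoids duplicate row names
df <- data.frame(ds = names(abc),
                 y  = unlist(abc, use.names = FALSE),
                 stringsAsFactors = FALSE)

out <- tempfile(fileext = ".csv")   # replace with the real output path
write.csv(df, out, row.names = FALSE)
readLines(out)
```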
I believe you have the right JSON reader (fromJSON); what's missing is binding the parsed list into rows, for example:
df <- data.frame( do.call(rbind, rjson::fromJSON( '{"a":true, "b":false, "c":null}' )) )
The code below reads Google's Location History (JSON) archive from https://takeout.google.com. This applies if you have enabled Timeline (location tracking) in Google Maps on your phone. Credit to http://rpubs.com/jsmanij/131030 for the original code. Note that JSON files like this can be quite large, and I found plyr::llply much more efficient than lapply for parsing a list of this size. data.table provides the more efficient rbindlist to turn the list into a data.table. Google logs between 350 and 800 GPS readings a day for me! A multi-year location history becomes quite a sizeable list after fromJSON:
format(object.size(doc1),units="MB")
[1] "962.5 Mb"
I found do.call(rbind, ...) slow. The timestamp, lat, and long fields needed some work to be useful in Google Earth Pro, but I am getting carried away. At the end, I use write.csv to write the data.table to CSV, which is all the original OP wanted here.
ts lat long latitude longitude
1: 1416680531900 487716717 -1224893214 48.77167 -122.4893
2: 1416680591911 487716757 -1224892938 48.77168 -122.4893
3: 1416680668812 487716933 -1224893231 48.77169 -122.4893
4: 1416680728947 487716468 -1224893275 48.77165 -122.4893
5: 1416680791884 487716554 -1224893232 48.77166 -122.4893
library(data.table)
library(rjson)
library(plyr)
doc1 <- fromJSON(file="LocationHistory.json", method="C")
object.size(doc1)
timestamp <- function(x) {as.list(x$timestampMs)}
timestamps <- as.list(plyr::llply(doc1$locations,timestamp))
timestamps <- rbindlist(timestamps)
latitude <- function(x) {as.list(x$latitudeE7)}
latitudes <- as.list(plyr::llply(doc1$locations,latitude))
latitudes <- rbindlist(latitudes)
longitude <- function(x) {as.list(x$longitudeE7)}
longitudes <- as.list(plyr::llply(doc1$locations,longitude))
longitudes <- rbindlist(longitudes)
datageoms <- setnames(cbind(timestamps,latitudes,longitudes),c("ts","lat","long")) [order(ts)]
write.csv(datageoms,"datageoms.csv",row.names=FALSE)
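For comparison, the same extraction can be written with vapply() pulling each field straight into a numeric vector, which skips the per-record lists and the rbindlist() calls. This is a swapped-in technique, not the original author's code. It assumes jsonlite and uses a hypothetical two-record sample in the locations/timestampMs/latitudeE7/longitudeE7 layout the code above reads; the E7 values are divided by 1e7 to match the latitude/longitude columns in the output shown, and timestampMs is treated as milliseconds since the Unix epoch.

```r
library(jsonlite)

# Hypothetical two-record sample in the layout the code above reads
txt <- '{"locations":[
  {"timestampMs":"1416680591911","latitudeE7":487716757,"longitudeE7":-1224892938},
  {"timestampMs":"1416680531900","latitudeE7":487716717,"longitudeE7":-1224893214}]}'

locs <- fromJSON(txt, simplifyVector = FALSE)$locations

# Pull each field straight into an atomic vector; as.numeric() covers a
# timestampMs stored either as text or as a number.
geo <- data.frame(
  ts        = vapply(locs, function(x) as.numeric(x$timestampMs), numeric(1)),
  latitude  = vapply(locs, function(x) x$latitudeE7 / 1e7, numeric(1)),
  longitude = vapply(locs, function(x) x$longitudeE7 / 1e7, numeric(1))
)

# Milliseconds since the epoch -> POSIXct, then sort chronologically
geo$time <- as.POSIXct(geo$ts / 1000, origin = "1970-01-01", tz = "UTC")
geo <- geo[order(geo$ts), ]
write.csv(geo, tempfile(fileext = ".csv"), row.names = FALSE)
```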

Substring in Data Frame R

I have data from a GPS log like this (each record is a string in a data frame column):
{"mAccuracy":20.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":21677339000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"wifi","noGPSLocation":{"mAccuracy":20.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":21677339000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811956,"mLongitude":126.9104909,"mProvider":"network","mSpeed":0.0,"mTime":1402801381486},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811956,"mLongitude":126.9104909,"mProvider":"network","mSpeed":0.0,"mTime":1402801381486,"timestamp":1402801665.512}
The problem is that I only need the latitude and longitude values, so I thought I could use substr and sapply to apply it to all the data in the data frame.
But I am not sure this approach is elegant, because with substr (e.g. substr("abcdef", 2, 4)) I need to count how many characters there are from the beginning up to "mLatitude". Can anybody suggest a fast way to process it?
Thank you to @mnel for answering the question; it works, but I still have a problem.
From mnel answer I've created function like this :
fgps <- function(x) {
out <- fromJSON(x)
c(out$mExtras$noGPSLocation$mLatitude,
out$mExtras$noGPSLocation$mLongitude)
}
and then this is my data :
gpsdata <- head(dfallgps[,4],2)
[1] "{\"mAccuracy\":23.128,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":76437488000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"wifi\",\"noGPSLocation\":{\"mAccuracy\":23.128,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":76437488000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1779956,\"mLongitude\":126.9089661,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894224187},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1779956,\"mLongitude\":126.9089661,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894224187,\"timestamp\":1402894517.425}"
[2] "{\"mAccuracy\":1625.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":77069916000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"cell\",\"noGPSLocation\":{\"mAccuracy\":1625.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":77069916000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811881,\"mLongitude\":126.9084072,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894857416},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811881,\"mLongitude\":126.9084072,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402894857416,\"timestamp\":1402894857.519}"
When I run sapply, why does the input data still appear in the result, not just the result values?
sapply(gpsdata, function(gpsdata) fgps(gpsdata))
{"mAccuracy":23.128,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":76437488000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"wifi","noGPSLocation":{"mAccuracy":23.128,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":76437488000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1779956,"mLongitude":126.9089661,"mProvider":"network","mSpeed":0.0,"mTime":1402894224187},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1779956,"mLongitude":126.9089661,"mProvider":"network","mSpeed":0.0,"mTime":1402894224187,"timestamp":1402894517.425}
[1,] 35.178
[2,] 126.909
{"mAccuracy":1625.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":77069916000000,"mExtras":{"networkLocationSource":"cached","networkLocationType":"cell","noGPSLocation":{"mAccuracy":1625.0,"mAltitude":0.0,"mBearing":0.0,"mElapsedRealtimeNanos":77069916000000,"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811881,"mLongitude":126.9084072,"mProvider":"network","mSpeed":0.0,"mTime":1402894857416},"travelState":"stationary"},"mHasAccuracy":true,"mHasAltitude":false,"mHasBearing":false,"mHasSpeed":false,"mIsFromMockProvider":false,"mLatitude":35.1811881,"mLongitude":126.9084072,"mProvider":"network","mSpeed":0.0,"mTime":1402894857416,"timestamp":1402894857.519}
[1,] 35.18119
[2,] 126.90841
I want the result looks like :
[1] 35.178 126.909
[2] 35.18119 126.90841
Thank you
It would appear that your data is in JSON format. Therefore, use RJSONIO::fromJSON to parse it.
E.g.:
txt <- "{\"mAccuracy\":20.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":21677339000000,\"mExtras\":{\"networkLocationSource\":\"cached\",\"networkLocationType\":\"wifi\",\"noGPSLocation\":{\"mAccuracy\":20.0,\"mAltitude\":0.0,\"mBearing\":0.0,\"mElapsedRealtimeNanos\":21677339000000,\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811956,\"mLongitude\":126.9104909,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402801381486},\"travelState\":\"stationary\"},\"mHasAccuracy\":true,\"mHasAltitude\":false,\"mHasBearing\":false,\"mHasSpeed\":false,\"mIsFromMockProvider\":false,\"mLatitude\":35.1811956,\"mLongitude\":126.9104909,\"mProvider\":\"network\",\"mSpeed\":0.0,\"mTime\":1402801381486,\"timestamp\":1402801665.512}"
Then process:
library(RJSONIO)
out <- fromJSON(txt)
out$mLongitude
#[1] 126.9105
out$mLatitude
#[1] 35.1812
# to process multiple values
tt <- rep(txt,2)
myData <- lapply(tt, fromJSON)
latlong <- do.call(rbind,lapply(myData, `[` ,c('mLatitude','mLongitude')))
# or using rbind list
library(data.table)
latlong <- rbindlist(lapply(myData, `[` ,c('mLatitude','mLongitude')))
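On the follow-up question: sapply() uses a character input vector as the names of its result (USE.NAMES = TRUE by default), so the printed matrix is labelled with the JSON strings themselves, with latitude and longitude as its two rows. A sketch that returns one unlabelled row per record instead, assuming jsonlite in place of RJSONIO and using trimmed copies of the two log rows from the question:

```r
library(jsonlite)

# Same extraction as the question's fgps(), but returning a named vector
fgps <- function(x) {
  out <- fromJSON(x)
  c(lat = out$mExtras$noGPSLocation$mLatitude,
    lon = out$mExtras$noGPSLocation$mLongitude)
}

# Trimmed versions of the two log rows from the question
gpsdata <- c(
  '{"mExtras":{"noGPSLocation":{"mLatitude":35.1779956,"mLongitude":126.9089661}}}',
  '{"mExtras":{"noGPSLocation":{"mLatitude":35.1811881,"mLongitude":126.9084072}}}'
)

# USE.NAMES = FALSE drops the JSON strings as labels; vapply() returns a
# 2 x n matrix (rows lat, lon) and t() turns it into one row per record.
latlon <- t(vapply(gpsdata, fgps, numeric(2), USE.NAMES = FALSE))
latlon
```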

jsonlite not unpacking JSON from Postgres appropriately

I am pulling data directly from a Postgres database into R, where one of the columns in the Postgres table contains rows of JSON objects. I am trying to unpack the JSON objects and have them flatten into columns in an R dataframe, but so far, I'm getting mangled results.
Here's my code:
library(RPostgreSQL)
library(jsonlite)
drv <- dbDriver("PostgreSQL")
con <- dbConnect(drv, host="xxx", dbname="xxx", user="xxx", password="xxx")
query="select column1, column2, json from dummy_table limit 2"
resultSet <- dbSendQuery(con, query)
rawData<-fetch(resultSet,n=-1)
postgresqlCloseConnection(con)
rawData$json
[1]"{\"id\":{\"publisherName\":\"pub1\",\"visitorId\":\"visitor1\",\"timestamp\":1234},\"startAt\":4567,\"endAt\":8910}"
[2]"{\"id\":{\"publisherName\":\"pub2\",\"visitorId\":\"visitor2\",\"timestamp\":2345},\"startAt\":678,\"endAt\":91011}"
unpacked<-fromJSON(rawData$json, simplifyDataFrame=FALSE)
unpacked
$id
$id$publisherName
[1] "pub1"
$id$visitorId
[1] "visitor1"
$id$timestamp
[1] 1234
$startAt
[1] 4567
$endAt
[1] 8910
As you can see, it only unpacked the first JSON object, and it left things quasi-nested (which is fine, but ideally I would want all the data to live at one level in a data frame).
I would want the data to look like this:
unpacked
id.publisherName id.visitorId id.timestamp startAt endAt
pub1 visitor1 1234 4567 8910
pub2 visitor2 2345 678 91011
EDIT: Adding the rawData dataframe:
rawData<-structure(list(
column1 = c("abcd", "efgh"
),
column2 = structure(c(123, 456), class = c("POSIXct",
"POSIXt"), tzone = ""),
json = c("{\"id\":{\"publisherName\":\"pub1\",\"visitorId\":\"visitor1\",\"timestamp\":1234},\"startAt\":4567,\"endAt\":8910}",
"{\"id\":{\"publisherName\":\"pub2\",\"visitorId\":\"visitor2\",\"timestamp\":2345},\"startAt\":678,\"endAt\":91011}"
))
, .Names = c("column1", "column2", "json"),
row.names = 1:2, class = "data.frame")
Here's what happens with the paste function.
rawJSON <- paste("[", paste(rawData$json, collapse=","), "]")
rawJSON <- fromJSON(rawJSON, simplifyDataFrame=FALSE)
rawJSON
[[1]]
[[1]]$id
[[1]]$id$publisherName
[1] "pub1"
[[1]]$id$visitorId
[1] "visitor1"
[[1]]$id$timestamp
[1] 1234
[[1]]$startAt
[1] 4567
[[1]]$endAt
[1] 8910
[[2]]
[[2]]$id
[[2]]$id$publisherName
[1] "pub2"
[[2]]$id$visitorId
[1] "visitor2"
[[2]]$id$timestamp
[1] 2345
[[2]]$startAt
[1] 678
[[2]]$endAt
[1] 91011
The fromJSON function assumes that you are feeding it a single complete JSON string; a character vector is collapsed into a single string. Your data contains multiple separate JSON objects, so you either need to convert them individually:
lapply(rawData$json, fromJSON)
Or, to get the result that you're after, use stream_in to parse them as ndjson.
mydata <- jsonlite::stream_in(textConnection(rawData$json))
See the jsonlite ?stream_in manual page for more details.
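stream_in() returns the nested id object as a data-frame column; jsonlite's flatten() expands it into the id.publisherName-style columns the question asks for. A minimal sketch on the two strings from the question (verbose = FALSE only silences the progress messages; the final reordering is just to match the layout requested):

```r
library(jsonlite)

# The two JSON strings from the question's rawData$json column
json <- c(
  '{"id":{"publisherName":"pub1","visitorId":"visitor1","timestamp":1234},"startAt":4567,"endAt":8910}',
  '{"id":{"publisherName":"pub2","visitorId":"visitor2","timestamp":2345},"startAt":678,"endAt":91011}'
)

# One NDJSON record per element; flatten() turns the nested "id" data frame
# into id.publisherName, id.visitorId and id.timestamp columns.
unpacked <- flatten(stream_in(textConnection(json), verbose = FALSE))

# Reorder to the layout asked for in the question
unpacked <- unpacked[c("id.publisherName", "id.visitorId", "id.timestamp",
                       "startAt", "endAt")]
unpacked

# To keep the other query columns alongside:
# cbind(rawData[c("column1", "column2")], unpacked)
```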

Unable to convert JSON to dataframe

I want to convert a json-file into a dataframe in R. With the following code:
link <- 'https://www.dropbox.com/s/ckfn1fpkcix1ccu/bevingenbag.json'
document <- fromJSON(file = link, method = 'C')
bev <- do.call("cbind", document)
I'm getting this:
type features
1 FeatureCollection list(type = "Feature", geometry = list(type = "Point", coordinates = c(6.54800000288927, 52.9920000044505)), properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751"))
which is the first row of a matrix. All the other rows have the same structure. I'm interested in the properties = list(gid = "1496600", yymmdd = "19861226", lat = "52.992", lon = "6.548", mag = "2.8", depth = "1.0", knmilocatie = "Assen", baglocatie = "Assen", tijd = "74751") part, which should be converted into a dataframe with the columns gid, yymmdd, lat, lon, mag, depth, knmilocatie, baglocatie, tijd.
I searched for and tried several solutions, but none of them worked. I used the rjson package for this. I also tried the RJSONIO and jsonlite packages, but was unable to extract the desired information.
Does anyone have an idea how to solve this problem?
Here's a way to obtain the data frame:
library(rjson)
document <- fromJSON(file = "bevingenbag.json", method = 'C')
dat <- do.call(rbind, lapply(document$features,
function(x) data.frame(x$properties)))
Edit: How to replace empty values with NA:
dat$baglocatie[dat$baglocatie == ""] <- NA
The result:
head(dat)
gid yymmdd lat lon mag depth knmilocatie baglocatie tijd
1 1496600 19861226 52.992 6.548 2.8 1.0 Assen Assen 74751
2 1496601 19871214 52.928 6.552 2.5 1.5 Hooghalen Hooghalen 204951
3 1496602 19891201 52.529 4.971 2.7 1.2 Purmerend Kwadijk 200914
4 1496603 19910215 52.771 6.914 2.2 3.0 Emmen Emmen 21116
5 1496604 19910425 52.952 6.575 2.6 3.0 Geelbroek Ekehaar 102631
6 1496605 19910808 52.965 6.573 2.7 3.0 Eleveld Assen 40114
This is just another, quite similar, approach.
@SvenHohenstein's approach creates a data frame at each step, which is an expensive process. It's much faster to create vectors and re-type the whole result at the end. Also, in R versions before 4.0, Sven's approach makes each column a factor, which might or might not be what you want. In the benchmark below, the approach here runs roughly 170 times faster (comparing medians), which can be important if you intend to do this repeatedly. Finally, you will need to convert the lon, lat, mag, and depth columns to numeric.
library(microbenchmark)
library(rjson)
document <- fromJSON(file = "bevingenbag.json", method = 'C')
json2df.1 <- function(json){ # @SvenHohenstein's approach
df <- do.call(rbind, lapply(json$features,
function(x) data.frame(x$properties, stringsAsFactors=F)))
return(df)
}
json2df.2 <- function(json){
df <- do.call(rbind,lapply(json[["features"]],function(x){c(x$properties)}))
df <- data.frame(apply(df,2,as.character), stringsAsFactors=F) # convert the list-matrix to character columns
return(df)
}
microbenchmark(x<-json2df.1(document), y<-json2df.2(document), times=10)
# Unit: milliseconds
# expr min lq median uq max neval
# x <- json2df.1(document) 2304.34378 2654.95927 2822.73224 2977.75666 3227.30996 10
# y <- json2df.2(document) 13.44385 15.27091 16.78201 18.53474 19.70797 10
identical(x,y)
# [1] TRUE
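Since the question also mentions jsonlite: with its default simplification, the features array already comes back as a data frame whose properties column is itself a data frame, so the properties can be taken in one step. A sketch on a hypothetical two-feature excerpt (geometry omitted) in the layout shown above, including the numeric conversion and empty-string-to-NA steps from the answers:

```r
library(jsonlite)

# Hypothetical trimmed excerpt of bevingenbag.json: two features, no geometry
txt <- '{"type":"FeatureCollection","features":[
 {"type":"Feature","properties":{"gid":"1496600","lat":"52.992","lon":"6.548","mag":"2.8","depth":"1.0","baglocatie":"Assen"}},
 {"type":"Feature","properties":{"gid":"1496601","lat":"52.928","lon":"6.552","mag":"2.5","depth":"1.5","baglocatie":""}}]}'

# Default simplification: features -> data frame, properties -> nested data frame
dat <- fromJSON(txt)$features$properties

# Every property is a JSON string, so convert the numeric ones explicitly
num_cols <- c("lat", "lon", "mag", "depth")
dat[num_cols] <- lapply(dat[num_cols], as.numeric)

# Empty strings to NA, as in the first answer
dat$baglocatie[dat$baglocatie == ""] <- NA
str(dat)
```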