R: Un-escape JSON string and build JSON object

I am just learning to use the jsonlite library to fetch json data from a server. However in the received json response (whose structure I have no control over), there seems to be a node that I can only describe as a chunk of 'escaped' JSON data, right in the middle of the json object. How do I build a JSON object out of it? I am able to extract each such value OK but then I can't use it as is without turning it into a true JSON object.
example:
library(jsonlite)
myFakeJSON <- '"{"country": "UK","ranking": "45"}"'
json <- toJSON(myFakeJSON)
but:
> json
[1] "\"{\"country\": \"UK\",\"ranking\": \"45\"}\""
The result is not a JSON object. What am I doing wrong? How do I escape (or un-escape?) the received data? It seems like something obvious, but not to me :(

I think you are making two errors. First: too many quotes. Second: the wrong test for JSON-hood. If you wanted to use toJSON, you would give it an R object to convert rather than an attempt at a JSON string. To parse a JSON string, use fromJSON:
> myFakeJSON <- '{"country": "UK", "ranking": "45"}'
> fromJSON(myFakeJSON)
$country
[1] "UK"
$ranking
[1] "45"
If you just need to remove the extra double-quotes on the "outside" of the curly braces, then this regex replacement succeeds on this small example:
> json <- fromJSON(gsub("\\}\\\"", "}", gsub("\\\"\\{","{", myFakeJSON))); json
$country
[1] "UK"
$ranking
[1] "45"
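When the escaped chunk arrives as a string value inside a larger JSON document, a common alternative to regex surgery is to decode twice: once for the outer document and once for the string it contains. A minimal sketch in Python (the thread itself uses R's jsonlite; the field name payload and the sample document are made up for illustration):

```python
import json

# Hypothetical server response: the "payload" value is itself JSON,
# but it arrives as an escaped string rather than a nested object.
raw = '{"id": 7, "payload": "{\\"country\\": \\"UK\\", \\"ranking\\": \\"45\\"}"}'

outer = json.loads(raw)               # first pass: payload is still a str
inner = json.loads(outer["payload"])  # second pass: payload becomes a dict

# Put the decoded object back so the whole structure is ordinary JSON data.
outer["payload"] = inner
```

The same two-step idea should carry over to jsonlite: call fromJSON on the response, then call fromJSON again on the extracted string.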

Related

I need to flatten a JSON web response with nested arrays [[ ]] into a DataFrame

I'm trying to convert an http JSON response into a DataFrame, then out to CSV file.
I'm struggling with the JSON into DF.
http line:
http://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1440
JSON response (part of - 720 records in arrays):
[formatted using a JSON site does not post here apparently]
{
"error": [],
"result": {
"XXBTZEUR": [
[1486252800, "959.7", "959.7", "935.0", "943.6", "945.6", "4423.72544809", 5961],
[1486339200, "943.8", "959.7", "940.0", "952.9", "953.5", "4464.48492401", 7678],
[1486425600, "953.6", "990.0", "952.7", "988.5", "977.3", "8123.94462701", 10964],
[1486512000, "988.4", "1000.1", "963.3", "987.5", "983.7", "10989.31074845", 16741],
[1486598400, "987.4", "1007.4", "847.9", "926.4", "934.5", "22530.11626076", 52668],
[1486684800, "926.4", "949.0", "886.0", "939.7", "916.7", "11173.53504917", 12588],
],
"last": 1548288000
}
}
I get
KeyError: 'XXBTZEUR'
on the json_normalize line. This seems to indicate that json_normalize is trying to build the DF from the "XXBTZEUR" level, not lower down at the record level. How do I get json_normalize to read the records instead, i.e. how do I get it to reference deep enough?
I have read several other posts on this site without understanding what I'm doing wrong.
One post mentions that json.loads() must be used. Is json_string.json() also loading the JSON object or do I need the json.loads() instead?
Also tried variations of json_normalize:
BTCEUR_Daily_Table = json_normalize(json_data[[]])
TypeError: unhashable type: 'list'
Can normalize not load an array into a DF line?
code so far:
BTCEUR_Daily_URL = 'http://api.kraken.com/0/public/OHLC?pair=XXBTZEUR&interval=1440'
json_string = requests.get(BTCEUR_Daily_URL)
json_data = json_string.json()
BTCEUR_Daily_Table = json_normalize(json_data, record_path=["XXBTZEUR"])
What I need in result:
In my DF, I just want the arrayed records shown in the "body" of the JSON structure. None of the header & footer are needed.
The solution I found was:
BTCEUR_Daily_Table = json_normalize(data=json_data, record_path=[['result','XXBTZEUR']])
The 2nd parameter specifies the full "path" to the parent label of the records.
Apparently double brackets are needed to specify a full path, otherwise the 2 labels are taken to mean 2 top level names.
Without another post here, I would never have found the solution.
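For comparison, here is a sketch that skips json_normalize entirely and walks the same path by hand, using only the standard library. It assumes the response shape shown in the question; the column names are an assumption (the question does not name the fields), and an in-memory buffer stands in for the CSV file.

```python
import csv
import io
import json

# Trimmed copy of the response shown in the question.
response_text = """
{
  "error": [],
  "result": {
    "XXBTZEUR": [
      [1486252800, "959.7", "959.7", "935.0", "943.6", "945.6", "4423.72544809", 5961],
      [1486339200, "943.8", "959.7", "940.0", "952.9", "953.5", "4464.48492401", 7678]
    ],
    "last": 1548288000
  }
}
"""

# Assumed column names; adjust them to whatever the API documents.
COLUMNS = ["time", "open", "high", "low", "close", "vwap", "volume", "count"]

json_data = json.loads(response_text)
records = json_data["result"]["XXBTZEUR"]  # full path down to the rows

buffer = io.StringIO()  # stands in for an open CSV file
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(records)
csv_text = buffer.getvalue()
```

Because the rows are already plain lists, the same records value can also be handed to pandas directly, e.g. pd.DataFrame(records, columns=COLUMNS), if a DataFrame is needed before writing the CSV.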

How can I encode an R vector of length 1 as a single value in json using the jsonlite R package?

I am trying to encode R lists into json using the jsonlite package and the toJSON function. I have a simple item like:
list(op='abc')
I'd like that to become:
{
"op" : "abc"
}
Instead, I get:
{
"op" : ["abc"]
}
The API to which I am trying to feed this json chokes on the latter and requires the former. Any suggestions on how to get the former behavior from jsonlite (or another R json package)?
The auto_unbox argument does the trick with the jsonlite package:
toJSON(list(op='abc'),auto_unbox=TRUE)
yields:
{"op":"abc"}
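Update: based on a comment, wrapping individual values in unbox() is probably safer than auto_unbox, because it only affects the fields you name and fails loudly on longer vectors. An example of why: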
> jsonlite::toJSON(list(x=unbox(1),y=c(1,2)))
{"x":1,"y":[1,2]}
> jsonlite::toJSON(list(x=unbox(1),y=unbox(c(1,2)))) # expect error here.
Error: Tried to unbox a vector of length 2
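To see why blanket unboxing can surprise an API consumer, here is a small Python sketch that imitates the idea. The helper auto_unbox below is hypothetical and written only for illustration; it is not jsonlite's implementation. It collapses every length-1 list of scalars, so a field that is meant to be an array changes shape depending on how many elements it happens to hold.

```python
import json

def auto_unbox(value):
    """Recursively collapse length-1 lists of scalars to the bare scalar."""
    if isinstance(value, dict):
        return {key: auto_unbox(item) for key, item in value.items()}
    if isinstance(value, list):
        items = [auto_unbox(item) for item in value]
        if len(items) == 1 and not isinstance(items[0], (dict, list)):
            return items[0]
        return items
    return value

# R-style data: every value starts life as a vector (here, a list).
one_tag = {"op": ["abc"], "tags": ["x"]}
two_tags = {"op": ["abc"], "tags": ["x", "y"]}

print(json.dumps(auto_unbox(one_tag)))   # {"op": "abc", "tags": "x"}
print(json.dumps(auto_unbox(two_tags)))  # {"op": "abc", "tags": ["x", "y"]}
```

The "tags" field comes out as a string in one case and an array in the other, which is the kind of inconsistency that marking only the intended scalars avoids.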

Using apply on an array of objects in a json

I have a json which I am reading in R and converting to a list object. For a key "metrics", there is an array of multiple objects of the same type.
Json structure:
{"metrics":[{"metricName":"abc",
"metricType":"def"
},
{"metricName":"ghi",
"metricType":"jkl"
}]
}
This is what my list object looks like:
$metrics
$metrics[[1]]
$metrics[[1]]$metricName
[1] "abc"
$metrics[[1]]$metricType
[1] "def"
$metrics[[2]]
$metrics[[2]]$metricName
[1] "ghi"
$metrics[[2]]$metricType
[1] "jkl"
I want to apply a function (someFunc) to each object of the array. $metrics[[1]],$metrics[[2]]. How can this be done using apply family of functions?
someFunc <- function(x) { return(list(x$metricName, x$metricType)) }
I tried concatenating like this:
lapply(lapply(metrics,"["),someFunc)
This does not throw an error but gives empty lists as output. The someFunc expects x$metricName,x$metricType objects to process. But using "[" does not render that kind of object I guess. Can this be handled using the apply functions?
Can you explain what your function someFunc is doing (and what its parameters are)?
This is working for me
someFunc=function(metricName,metricType){
return(paste(metricName,metricType))
}
metrics=list(list(metricName="abc",metricType="m"),
list(metricName="gg",metricType="L"))
lapply(metrics,FUN=function(el){someFunc(el$metricName,el$metricType)})
(Use sapply instead if you want the result as a vector.)

Parsing nodes on JSON with Scala

I've been asked to parse a JSON file to get all the buses that are over a speed specified by the user.
The JSON file can be downloaded here
It's like this:
{
"COLUMNS": [
"DATAHORA",
"ORDEM",
"LINHA",
"LATITUDE",
"LONGITUDE",
"VELOCIDADE"
],
"DATA": [
[
"04-16-2015 00:00:55",
"B63099",
"",
-22.7931,
-43.2943,
0
],
[
"04-16-2015 00:01:02",
"C44503",
781,
-22.853649,
-43.37616,
25
],
[
"04-16-2015 00:11:40",
"B63067",
"",
-22.7925,
-43.2945,
0
],
]
}
The thing is: I'm really new to Scala and I have never worked with JSON before (shame on me). What I need is to get the "Ordem", "Linha" and "Velocidade" from the DATA node.
I created a case class to hold the data so that I can later look for the buses that are over the specified speed.
case class Bus(ordem: String, linha: Int, velocidade: Int)
I did this by reading the file as a text file and splitting it. This way, though, I need to know the content of the file in advance in order to skip to the lines after the DATA node.
I want to know how to do this using a JSON parser. I've tried many solutions, but I couldn't adapt them to my problem, because I need to extract all the rows from the DATA node rather than nodes nested inside one node.
Can anyone help me?
PS: Sorry for my English, I'm not a native speaker.
First of all, you need to understand the different JSON data types. The basic types in JSON are numbers, strings, booleans, null, arrays, and objects. The data returned in your example is an object with two keys: COLUMNS and DATA. The COLUMNS key has a value that is an array of strings. The DATA key has a value that is an array of arrays, each of which mixes strings and numbers.
You can use a library like PlayJSON to work with this type of data:
import play.api.libs.json._
// x is the JSON document as a String
val js = Json.parse(x).as[JsObject]
val keys = (js \ "COLUMNS").as[List[String]]
val values = (js \ "DATA").as[List[List[JsValue]]]
val busses = values.map(valueList => {
val keyValues = (keys zip valueList).toMap
for {
ordem <- keyValues("ORDEM").asOpt[String]
linha <- keyValues("LINHA").asOpt[Int]
velocidade <- keyValues("VELOCIDADE").asOpt[Int]
} yield Bus(ordem, linha, velocidade)
})
Note the use of asOpt when converting the properties to the expected types. This method converts the value to the requested type if possible (wrapped in Some), and returns None otherwise. So, if you want to provide a default value instead of dropping rows that fail to convert, you could use keyValues("LINHA").asOpt[Int].getOrElse(0), for example. Inside the for comprehension, bind that expression with = rather than <-, because getOrElse returns a plain Int rather than an Option.
You can read more about the Play JSON methods used here, like \ and as, and asOpt in their docs.
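The core idea above (pair each column name with the value at the same position, then pick out the fields you need) is not specific to Play JSON. Here is a minimal sketch of it in Python, using the sample rows from the question. The Bus dataclass mirrors the question's case class, except that linha is kept as text because the feed mixes "" and numbers; the over_speed helper and the limit of 20 are made up for illustration.

```python
import json
from dataclasses import dataclass

@dataclass
class Bus:
    ordem: str
    linha: str       # kept as text: the feed mixes "" and numbers such as 781
    velocidade: int

raw = """
{
  "COLUMNS": ["DATAHORA", "ORDEM", "LINHA", "LATITUDE", "LONGITUDE", "VELOCIDADE"],
  "DATA": [
    ["04-16-2015 00:00:55", "B63099", "", -22.7931, -43.2943, 0],
    ["04-16-2015 00:01:02", "C44503", 781, -22.853649, -43.37616, 25],
    ["04-16-2015 00:11:40", "B63067", "", -22.7925, -43.2945, 0]
  ]
}
"""

doc = json.loads(raw)
columns = doc["COLUMNS"]

buses = []
for row in doc["DATA"]:
    record = dict(zip(columns, row))  # pair each column name with its value
    buses.append(Bus(record["ORDEM"], str(record["LINHA"]), int(record["VELOCIDADE"])))

def over_speed(items, limit):
    """Return the buses whose speed is strictly above limit."""
    return [bus for bus in items if bus.velocidade > limit]

fast = over_speed(buses, 20)
```

Looking fields up by column name rather than by position means the code keeps working if the feed reorders its columns.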
You can use Spark SQL to achieve this. See the section on JSON Datasets here.
In essence, use the Spark APIs to load the JSON and register it as a temp table.
You can then run your SQL queries on that table.
As seen in @Ben Reich's answer, that code works great. Thank you very much.
However, my JSON had some type problems in "Linha". As can be seen in the JSON example that I put in the question, it contains both "" and numbers, e.g. 781.
When trying to do keyValues("LINHA").asOpt[Int].getOrElse(0), it produced an error saying that value flatMap is not a member of Int.
So I had to change some things:
import play.api.libs.json._
import scala.io.Source.fromFile
case class BusContainer(ordem: String, linha: String, velocidade: Int)
val jsonString = fromFile("./project/rj_onibus_gps.json").getLines.mkString
val js = Json.parse(jsonString).as[JsObject]
val keys = (js \ "COLUMNS").as[List[String]]
val values = (js \ "DATA").as[List[List[JsValue]]]
val buses = values.map(valueList => {
val keyValues = (keys zip valueList).toMap
println(keyValues("ORDEM"),keyValues("LINHA"),keyValues("VELOCIDADE"))
for {
ordem <- keyValues("ORDEM").asOpt[String]
linha <- keyValues("LINHA").asOpt[Int].orElse(keyValues("LINHA").asOpt[String])
velocidade <- keyValues("VELOCIDADE").asOpt[Int]
} yield BusContainer(ordem, linha.toString, velocidade)
})
Thanks for the help!

How do I POST a JSON formatted request to GET JSON data from a URL in R into the data.frame in a less verbose manner?

I have written the following code in R to start using a data request API. It's a normal web service JSON API.
library(RJSONIO)
library(RCurl)
library(httr)
r <- POST("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/PR/PR0101/PR0101A/KPIFastM2",
body = '{ "query": [], "response": { "format": "json" } }')
stop_for_status(r)
a<-content(r, "text", "application/json", encoding="UTF-8")
cat(a, file = "test.json")
x<-fromJSON(file("test.json", "r"))
mydf<-do.call(rbind, lapply(x$data, data.frame))
colnames(mydf)<-c("YearMonth", "CPI")
Basically, it sends a POST request to the URL using httr and then converts the resulting JSON data to an R structure via fromJSON. The JSON request body looks like this:
{ "query": [], "response": { "format": "json" } }
Indeed my code gets the data into a data.frame like I wanted it to, but it is painfully verbose and I refuse to believe that all of these lines are necessary to achieve the wanted result. The wanted result is the mydf data.frame of course.
So to my question: What is the shortest and most correct way to get the data from the web service into the data.frame?
Cheers,
Michael
There are two problems. One is that you are not using jsonlite :-) The other is that your JSON source seems to prefix the blob with a U+FEFF byte order mark character, which makes the JSON invalid. RFC 7159 says:
Implementations MUST NOT add a byte order mark to the beginning of a JSON text. In the interests of interoperability, implementations that parse JSON texts MAY ignore the presence of a byte order mark rather than treating it as an error.
So scb.se is not formatting their JSON correctly. Either way, try this:
library(jsonlite)
library(httr)
req <- POST("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/PR/PR0101/PR0101A/KPIFastM2",
body = '{ "query": [], "response": { "format": "json" } }')
stop_for_status(req)
json <- content(req, "text")
# JSON starts with an invalid character:
validate(json)
json <- substring(json, 2)
validate(json)
# Now we can parse
object <- jsonlite::fromJSON(json)
print(object)
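The same byte order mark problem shows up in other strict parsers. As a hedged illustration in Python (the body string below is a made-up stand-in for the real response, not actual scb.se output), the sketch checks for a leading U+FEFF and strips it only when it is present:

```python
import json

# Stand-in for the response body: valid JSON preceded by U+FEFF.
body = "\ufeff" + '{"columns": [], "data": []}'

try:
    json.loads(body)
    parsed_with_bom = True
except json.JSONDecodeError:
    parsed_with_bom = False  # the strict parser rejects the leading BOM

# Strip a leading BOM only if one is present, then parse.
clean = body[1:] if body.startswith("\ufeff") else body
obj = json.loads(clean)
```

Checking before stripping is a little more defensive than always dropping the first character: if the server later stops sending the mark, an unconditional strip would remove the opening brace instead.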