Parse JSON with R

I am fairly new to R, but the more I use it, the more I see how powerful it really is compared to SAS or SPSS. One of the major benefits, as I see it, is the ability to get and analyze data from the web. I imagine this is possible (and maybe even straightforward), but I am looking to parse JSON data that is publicly available on the web. I am not a programmer by any stretch, so any help and instruction you can provide will be greatly appreciated. Even if you point me to a basic working example, I can probably work through it.

RJSONIO from Omegahat is another package that provides facilities for reading and writing data in JSON format.
rjson does not use S4/S3 methods and so is not readily extensible, but it is still useful. Unfortunately, it does not use vectorized operations and so is too slow for non-trivial data. Similarly, for reading JSON data into R, it is somewhat slow and so does not scale to large data, should this be an issue.
Update (new package, 2013-12-03):
jsonlite: This package is a fork of the RJSONIO package. It builds on the parser from RJSONIO but implements a different mapping between R objects and JSON strings. The C code in this package is mostly from the RJSONIO Package, the R code has been rewritten from scratch. In addition to drop-in replacements for fromJSON and toJSON, the package has functions to serialize objects. Furthermore, the package contains a lot of unit tests to make sure that all edge cases are encoded and decoded consistently for use with dynamic data in systems and applications.
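A quick round-trip showing the drop-in fromJSON/toJSON replacements (the data frame here is invented for illustration):
library(jsonlite)
df <- data.frame(id = 1:2, name = c("a", "b"))
json <- toJSON(df)   # a data frame becomes a JSON array of objects
json
# [{"id":1,"name":"a"},{"id":2,"name":"b"}]
fromJSON(json)       # ...and back to an equivalent data frame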

The jsonlite package is easy to use and tries to convert JSON into data frames.
Example:
library(jsonlite)
# URL returning JSON from the Stack Exchange API (badges, ordered by rank)
url <- 'https://api.stackexchange.com/2.2/badges?order=desc&sort=rank&site=stackoverflow'
# read the URL and convert the response to a data.frame
document <- fromJSON(txt = url)
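The parsed result is a list; for this Stack Exchange endpoint the records sit under an items element that jsonlite has already simplified to a data frame. A quick look (the field names are whatever the API returns and may change):
str(document, max.level = 1)   # top-level keys: items, has_more, quota_max, ...
badges <- document$items       # the per-badge records as a data frame
head(badges[, c("rank", "name", "award_count")])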

Here is the missing example, using rjson:
library(rjson)
url <- 'http://someurl/data.json'
document <- fromJSON(file = url, method = 'C')

The fromJSON() functions in RJSONIO, rjson and jsonlite don't return a simple 2D data.frame for complex nested JSON objects.
To overcome this you can use tidyjson. It takes JSON and always returns a data.frame. It was originally not available on CRAN; you could only get it from here: https://github.com/sailthru/tidyjson
Update: tidyjson is now available on CRAN; you can install it directly with install.packages("tidyjson").
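A minimal sketch of the tidyjson pipeline (the JSON string is invented for illustration):
library(tidyjson)
library(dplyr)
# an invented nested JSON array
people <- '[{"name": "Ann", "age": 30}, {"name": "Bob", "age": 25}]'
people %>%
  gather_array() %>%    # one row per element of the top-level array
  spread_values(        # pull named fields out into columns
    name = jstring("name"),
    age  = jnumber("age")
  )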

For the record, rjson and RJSONIO do convert the JSON into R objects, but they don't really flatten it per se. For instance, I receive ugly MongoDB data in JSON format, convert it with rjson or RJSONIO, and then use unlist and a ton of manual cleanup to actually parse it into a usable matrix.
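For illustration, a sketch of that kind of manual flattening (the document structure is invented):
library(rjson)
# an invented stand-in for an ugly MongoDB export
doc  <- fromJSON('{"user": {"id": 7, "tags": ["a", "b"]}}')
flat <- unlist(doc)   # collapses the nested list into a named character vector
matrix(flat, nrow = 1, dimnames = list(NULL, names(flat)))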

Try the code below, using RJSONIO, in the console:
library(RJSONIO)
library(RCurl)
# download the raw JSON text, then parse it
json_file <- getURL("https://raw.githubusercontent.com/isrini/SI_IS607/master/books.json")
json_file2 <- RJSONIO::fromJSON(json_file)
head(json_file2)

Related

Efficient JSON (de)serialization from/to millions of small files

I have a list containing millions of small records as dicts. Instead of serialising the entire thing to a single file as JSON, I would like to write each record to a separate file. Later I need to reconstitute the list from JSON deserialised from the files.
My goal isn't really minimising I/O so much as finding a general strategy for serialising individual collection elements to separate files concurrently or asynchronously. What's the most efficient way to accomplish this in Python 3.x or a similar high-level language?
For those looking for a modern Python-based solution supporting async/await, I found this neat package which does exactly what I'm looking for: https://pypi.org/project/aiofiles/. Specifically, I can do
import json
from typing import AsyncIterator, Iterable

import aiofiles

async def json_reader(files: Iterable) -> AsyncIterator:
    """A generator that reads and parses JSON from a list of files asynchronously."""
    for file in files:  # plain for: `files` is an ordinary iterable
        async with aiofiles.open(file) as f:
            contents = await f.read()  # read the whole file as a single string
        yield json.loads(contents)
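For the writing half of the question, a hedged sketch along the same lines; the file-naming scheme and record shape are assumptions:
import asyncio
import json

import aiofiles

async def write_record(record: dict, path: str) -> None:
    # serialise one record to its own file
    async with aiofiles.open(path, "w") as f:
        await f.write(json.dumps(record))

async def write_all(records: list) -> None:
    # one file per record, written concurrently; the names are illustrative
    await asyncio.gather(
        *(write_record(r, f"record_{i}.json") for i, r in enumerate(records))
    )

# asyncio.run(write_all([{"id": 1}, {"id": 2}]))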

Is it possible to output my data in JSON instead of XML in Oracle 11g

I currently have an API built with PL/SQL that uses Oracle 11g. It currently outputs the data in XML, and I have been tasked with converting this output to JSON. Is this even possible with Oracle 11g? I have been researching the web and I see that JSON support did not arrive until Oracle 12c. Is there a way I can convert the output of this API from XML to JSON? Any help is appreciated. Thanks.
Here is the current XML output I have below:
<?xml version="1.0"?>
<items>
  <CAGE_INFO>
    <CAGE_CODE>21356</CAGE_CODE>
    <ORG_NAME_ABBR>NASAJSC</ORG_NAME_ABBR>
    <ORGANIZATION_NAME>NASA/ LYNDON B JOHNSON SPACE CENTER</ORGANIZATION_NAME>
  </CAGE_INFO>
</items>
I need this JSON output using Oracle 11g:
{
  "items": {
    "CAGE_INFO": {
      "CAGE_CODE": "21356",
      "ORG_NAME_ABBR": "NASAJSC",
      "ORGANIZATION_NAME": "NASA/ LYNDON B JOHNSON SPACE CENTER"
    }
  }
}
I guess it depends on what you mean by "convert".
If you literally mean convert and are looking for a tool that takes arbitrary XML and returns JSON, well, writing that would be a lot of work. Someone may have done that already, I don't know.
If you just need this output in JSON, you could find wherever your XML is generated (I assume it's backed by one or more SQL queries), rewrite it, and call a PL/SQL package that generates JSON. My first stop would be the plsql-utils library and its JSON_UTIL_PKG.
Or, take the function that generates your XML and rewrite it to construct JSON via string operations. JSON is just formatted text, after all. I've done this before and it might be the quickest way if your needs are simple.
A direct conversion might be difficult. Instead you can use XMLTYPE to first parse the XML and then convert it to JSON. The conversion to JSON can be either a custom piece of code or, if you have APEX installed on the DB instance, you can look at the APEX_JSON package.
Check this out for a description of XMLTYPE in Oracle.
https://docs.oracle.com/cd/A97630_01/appdev.920/a96616/arxml24.htm
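As a concrete illustration of the string-building route, a hedged sketch that works on 11g; the cage_info table and its columns are invented, and the escaping shown is deliberately simplistic (JSON is just formatted text, but quoting and special characters are your responsibility on 11g):
-- hypothetical table and columns; real escaping needs more care than this
CREATE OR REPLACE FUNCTION cage_info_json(p_cage_code IN VARCHAR2)
  RETURN CLOB
IS
  l_json CLOB;
BEGIN
  SELECT '{"items":{"CAGE_INFO":{'
         || '"CAGE_CODE":"'         || cage_code     || '",'
         || '"ORG_NAME_ABBR":"'     || org_name_abbr || '",'
         || '"ORGANIZATION_NAME":"' || REPLACE(organization_name, '"', '\"') || '"'
         || '}}}'
    INTO l_json
    FROM cage_info
   WHERE cage_code = p_cage_code;
  RETURN l_json;
END;
/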

How to handle JSON body using Tornado w/ python3

Tornado 4.5.2 under Python 3 represents the request body as a bytes object instead of a native dictionary. This presents a problem for methods like RequestHandler.get_body_argument(), which will not access the field correctly.
My question is how to have Tornado correctly parse these bodies into more usable dictionaries so the standard library will work. I've looked throughout Tornado's documentation and there's next to nothing on even the existence of this problem.
Am I missing something here or will I need to re-implement those methods myself?
Tornado never automatically parses JSON; it only automatically parses HTML-standard form encoding (the data models of form encoding and JSON are different, so it wouldn't make sense to use the same family of get_argument/get_arguments methods in the less-ambiguous JSON format). If you want to handle JSON requests, it's one line to parse it yourself:
args = tornado.escape.json_decode(self.request.body)
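Wrapped in a handler, that one-liner might look like this (the handler and field names are illustrative):
import tornado.escape
import tornado.web

class JSONHandler(tornado.web.RequestHandler):
    def post(self):
        # parse the raw bytes of the body into a plain dict
        args = tornado.escape.json_decode(self.request.body)
        name = args.get("name")  # ordinary dict access from here on
        self.write({"hello": name})  # write() serialises a dict back to JSON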

Apache Spark Read One Complex JSON File Per Record RDD or DF

I have an HDFS directory full of the following JSON file format:
https://www.hl7.org/fhir/bundle-transaction.json.html
What I am hoping to do is find an approach to flatten each individual file into one DataFrame record or RDD tuple. I have tried everything I could think of using read.json(), wholeTextFiles(), etc.
If anyone has any best practices advice or pointers, it would be sincerely appreciated.
Load via wholeTextFiles, something like this:
sc.wholeTextFiles(...)      // RDD[(FileName, JSON)]
  .map(...processJSON...)   // RDD[JsonObject]
Then you can simply call the .toDF method so that it will infer from your JsonObject.
As far as the processJSON method, you could just use something like the Play json parser
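A hedged sketch of that pipeline with play-json; the path and the case class fields are invented (a real FHIR bundle has far more fields than this):
import org.apache.spark.sql.SparkSession
import play.api.libs.json._

// invented minimal record type for illustration
case class BundleRecord(id: String, resourceType: String)

val spark = SparkSession.builder().appName("json-per-file").getOrCreate()
import spark.implicits._

val df = spark.sparkContext
  .wholeTextFiles("hdfs:///path/to/bundles/")   // RDD[(path, wholeFileText)]
  .map { case (_, text) =>
    val js = Json.parse(text)                   // one JSON document per file
    BundleRecord(
      (js \ "id").as[String],
      (js \ "resourceType").as[String]
    )
  }
  .toDF()                                       // one row per input file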
mapPartitions is used when you have to deal with data that is structured so that different elements can be on different lines. I've worked with both JSON and XML using mapPartitions.
mapPartitions works on an entire block of data at a time, as opposed to a single element. While you should be able to use the DataFrameReader API with JSON, mapPartitions can definitely do what you'd like. I don't have the exact code to flatten a JSON file, but I'm sure you can figure it out. Just remember the output must be an iterable type.
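A hedged sketch of the mapPartitions shape being described; the path is illustrative and the record-splitting logic is deliberately naive, since how records span lines depends entirely on your data:
import org.apache.spark.sql.SparkSession
import play.api.libs.json._

val spark = SparkSession.builder().appName("map-partitions-json").getOrCreate()

val parsed = spark.sparkContext
  .textFile("hdfs:///path/to/data/")
  .mapPartitions { lines =>
    // reassemble the partition's lines, then parse; the output of the
    // partition function must be an iterator
    val text = lines.mkString("\n")
    if (text.trim.isEmpty) Iterator.empty
    else Iterator(Json.parse(text))
  }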

export R list into Julia via JSON

Suppose I have this list in R:
x = list(a=1:3,b=8:20)
and I write it to a JSON file on disk with
library(jsonlite)
cat(toJSON(x),file="f.json")
How can I use the Julia JSON package to read that? Can I?
# Julia
using JSON
JSON.parse("/Users/florianoswald/f.json")
gives an error; I guess it expects a JSON string.
Any alternatives? I would benefit from being able to pass a list (i.e. a nested structure) rather than tabular data. thanks!
If you want to do this with the current version of JSON you can use Julia's readall method to get a string from a file.
Pkg.clone("JSON") will get you the latest development version of JSON.jl (as opposed to the latest released version) – it seems parsefile is not released yet.
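Putting the two answers together, a minimal sketch (readall is the spelling from the Julia of this answer's era; parsefile does the same in one call once you have a version that ships it):
using JSON

# read the file into a string, then parse it
x = JSON.parse(readall("/Users/florianoswald/f.json"))

# or, with a JSON.jl version that includes parsefile:
x = JSON.parsefile("/Users/florianoswald/f.json")

x["a"]   # => [1,2,3]; the R list comes back as a Dict of arrays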