How to parse JSON with fromJSON on a dataframe column?

I have the following data.frame with one column called "json" and two rows of JSON data:
df <- data.frame(json = c('{"client":"ABC Company","totalUSD":7110.0000,"durationDays":731,"familySize":4,"assignmentType":"Long Term","homeLocation":"Australia","hostLocation":"United States","serviceName":"Service ABC","homeLocationGeoLat":-25.274398,"homeLocationGeoLng":133.775136,"hostLocationGeoLat":37.09024,"hostLocationGeoLng":-95.712891}', '{"client":"ABC Company","totalUSD":7110.0000,"durationDays":731,"familySize":4,"assignmentType":"Long Term","homeLocation":"Australia","hostLocation":"United States","serviceName":"Service XYZ","homeLocationGeoLat":-25.274398,"homeLocationGeoLng":133.775136,"hostLocationGeoLat":37.09024,"hostLocationGeoLng":-95.712891}'))
I am trying to parse the JSON into a data.frame using fromJSON from the rjson package.
I cast the column as character type and then attempt to parse:
> df$json <- as.character(df$json)
> final <- fromJSON(json_str = df$json)
However, it only seems to give me the first row of JSON, whereas I expect 2 rows.
How can I parse the JSON into a data.frame from df$json?

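rjson::fromJSON parses a single JSON string per call, which is why you only get the first row back. You probably want a data frame as the result, so apply it to each element and bind the pieces together: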
do.call(rbind.data.frame, lapply(df$json, rjson::fromJSON))
## client totalUSD durationDays familySize assignmentType homeLocation hostLocation serviceName homeLocationGeoLat
## 2 ABC Company 7110 731 4 Long Term Australia United States Service ABC -25.2744
## 21 ABC Company 7110 731 4 Long Term Australia United States Service XYZ -25.2744
## homeLocationGeoLng hostLocationGeoLat hostLocationGeoLng
## 2 133.7751 37.09024 -95.71289
## 21 133.7751 37.09024 -95.71289
The exact same results will come from:
do.call(rbind.data.frame, lapply(df$json, jsonlite::fromJSON))
do.call(rbind.data.frame, lapply(df$json, RJSONIO::fromJSON))
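As an alternative to parsing row by row, the strings can be joined into one JSON array and parsed in a single call. This is only a minimal sketch: it assumes every element of df$json is a valid JSON object with the same flat structure, and it uses a shortened version of the sample data.

```r
library(jsonlite)

# Shortened stand-in for the data frame in the question
df <- data.frame(
  json = c('{"client":"ABC Company","serviceName":"Service ABC","totalUSD":7110}',
           '{"client":"ABC Company","serviceName":"Service XYZ","totalUSD":7110}'),
  stringsAsFactors = FALSE
)

# Join the objects with commas and wrap them in [ ] to form one JSON array
json_array <- paste0("[", paste(df$json, collapse = ","), "]")

# jsonlite simplifies an array of flat objects into a data frame
final <- fromJSON(json_array)
final
```

With many rows this avoids one fromJSON call per element, at the cost of failing as a whole if any single string is malformed.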

Related

From list of json files to data.table: partial variable list

I have a list of more than 100,000 json files from which I want to get a data.table with only a few variables. Unfortunately the files are complex. The content of each json file looks like:
Sample 1
$id
[1] "10.1"
$title
$title$value
[1] "Why this item"
$itemsource
$itemsource$id
[1] "AA"
$date
[1] "1992-01-01"
$itemType
[1] "art"
$creators
list()
Sample 2
$id
[1] "10.2"
$title
$title$value
[1] "We need this item"
$itemsource
$itemsource$id
[1] "AY"
$date
[1] "1999-01-01"
$itemType
[1] "art"
$creators
type name firstname surname affiliationIds
1 Person Frank W. Cornell. Frank W. Cornell. a1
2 Person David A. Chen. David A. Chen. a1
$affiliations
id name
1 a1 Foreign Affairs Desk, New York Times
What I need from this set of files is a table with creator names, item ids and dates. For the two sample files above:
id date name firstname lastname creatortype
"10.1" "1992-01-01" NA NA NA NA
"10.2" "1999-01-01" Frank W. Cornell. Frank W. Cornell. Person
"10.2" "1999-01-01" David A. Chen. David A. Chen. Person
What I have done so far:
library(parallel)
library(data.table)
library(jsonlite)
library(dplyr)
filelist = list.files(pattern="*.json",recursive=TRUE,include.dirs =TRUE)
parsed = mclapply(filelist, function(x) fromJSON(x),mc.cores=24)
data = rbindlist(mclapply(1:length(parsed), function(x) {
a = data.table(item = parsed[[x]]$id, date = list(list(parsed[[x]]$date)), name = list(list(parsed[[x]]$name)), creatortype = list(list(parsed[[x]]$creatortype))) #ignoring the firstname/lastname fields here for convenience
b = data.table(id = a$item, date = unlist(a$date), name=unlist(a$name), creatortype=unlist(a$creatortype))
return(b)
},mc.cores=24))
However, on the last step, I get this error:
"Error in rbindlist(mclapply(1:length(parsed), function(x){:
Item 1 of list is not a data.frame, data.table or list"
Thanks in advance for your suggestions.
Related questions include:
Extract data from list of lists [R]
R convert json to list to data.table
I want to convert JSON file into data.table in r
How can read files from directory using R?
Convert R data table column from JSON to data table
From the error message, I suppose this means that one of the results from mclapply() is empty (either NULL or a data.table with 0 rows), or that an error occurred inside the parallel processing.
What you could do is:
add more checks inside the mclapply() call, such as try() with a test for "try-error", or check the class of b and nrow(b) to see whether b is empty
when you use rbindlist, add the argument fill = TRUE
Hope this solves your problem.
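The checks suggested above could look roughly like the following. This is a sketch, not a tested solution for the real files: the parsed list is a hand-built stand-in for two parsed JSON files, extract_one is a hypothetical helper name, and it assumes the creator names live in the creators element (as the printed samples suggest) rather than at the top level.

```r
library(data.table)

# Hand-built stand-ins for two parsed JSON files (structure assumed from the samples)
parsed <- list(
  list(id = "10.1", date = "1992-01-01", creators = list()),
  list(id = "10.2", date = "1999-01-01",
       creators = data.frame(type = c("Person", "Person"),
                             name = c("Frank W. Cornell.", "David A. Chen."),
                             stringsAsFactors = FALSE))
)

# Hypothetical helper: always returns a data.table, even when creators is empty
extract_one <- function(p) {
  cr <- p$creators
  if (is.data.frame(cr) && nrow(cr) > 0) {
    data.table(id = p$id, date = p$date, name = cr$name, creatortype = cr$type)
  } else {
    data.table(id = p$id, date = p$date,
               name = NA_character_, creatortype = NA_character_)
  }
}

# fill = TRUE tolerates items whose columns differ
result <- rbindlist(lapply(parsed, extract_one), fill = TRUE)
result
```

Once this works serially, lapply can be swapped for mclapply; testing without parallelism first makes errors inside the function visible.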

JSON data to dataframe in R

I have a json file from which i am importing the data
myList = rjson::fromJSON(file = "JsData.json")
myList
[[1]]
[[1]]$key
[1] "type1|new york, ny|NYC|hit"
[[1]]$doc_count
[1] 12
[[2]]
[[2]]$key
[1] "type1|omaha, ne|Omaha|hit"
[[2]]$doc_count
[1] 8
But when I try to convert it to a data frame with the call below,
do.call(rbind, lapply(myList, data.frame))
I get an error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0
I need to parse this data so that it can be exported to CSV and used in Excel. I looked at the solution for "Getting imported json data into a data frame in R", but the output does not come out in a format that is usable in Excel.
And the JsData.json sample data looks like this:
[{"key":"type1|new york, ny|NYC|hit","doc_count":12},
{"key":"type1|omaha, ne|Omaha|hit","doc_count":8},
{"key":"type2|yuba city, ca|Yuba|hit","doc_count":9}]
You can try:
require(jsonlite)
s ='[{"key":"type1|new york, ny|NYC|hit","doc_count":12},
.......
"key":"type2|yuba city, ca|Yuba|hit","doc_count":9}]'
df <- fromJSON(s)
df
key doc_count
1 type1|new york, ny|NYC|hit 12
2 type1|omaha, ne|Omaha|hit 8
3 type2|yuba city, ca|Yuba|hit 9
I don't know how you want to deal with your key, though.
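If the key should end up as separate spreadsheet columns, one option is to split it on the "|" character. The sketch below uses base R only; the column names type, location, label and status are made up for illustration, since the question does not say what the four parts mean.

```r
# Data frame as returned by fromJSON in the answer above
df <- data.frame(
  key = c("type1|new york, ny|NYC|hit",
          "type1|omaha, ne|Omaha|hit",
          "type2|yuba city, ca|Yuba|hit"),
  doc_count = c(12, 8, 9),
  stringsAsFactors = FALSE
)

# fixed = TRUE treats "|" literally instead of as regex alternation
parts <- do.call(rbind, strsplit(df$key, "|", fixed = TRUE))

# Hypothetical column names; replace with whatever the parts actually represent
colnames(parts) <- c("type", "location", "label", "status")

out <- cbind(as.data.frame(parts, stringsAsFactors = FALSE),
             doc_count = df$doc_count)
out
```

This assumes every key has exactly four parts; rows with a different number of separators would need handling before the rbind.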

Importing Data from a json file in R

I have a JSON data file that I want to import into R. I tried searching for similar posts, but they either get the data from URLs or use syntax that gave me errors.
Let's say the name of the JSON file is "Jsdata.json".
How can I get the data from Jsdata.json into R and convert it to Excel/CSV format to get a better picture of it?
To confirm, this is the output using rjson package. The file parameter has to be explicitly specified here, otherwise the function will treat it as a json string and throw an error.
myList = rjson::fromJSON(file = "JsData.json")
myList
# [[1]]
# [[1]]$key
# [1] "type1|new york, ny|NYC|hit"
#
# [[1]]$doc_count
# [1] 12
# [[2]]
# [[2]]$key
# [1] "type1|omaha, ne|Omaha|hit"
# [[2]]$doc_count
# [1] 8
# [[3]]
# [[3]]$key
# [1] "type2|yuba city, ca|Yuba|hit"
# [[3]]$doc_count
# [1] 9
In order to convert this to data frame, you can do:
do.call(rbind, lapply(myList, data.frame))
# key doc_count
# 1 type1|new york, ny|NYC|hit 12
# 2 type1|omaha, ne|Omaha|hit 8
# 3 type2|yuba city, ca|Yuba|hit 9
Write the data frame to a file with write.table(..., sep = "\t") (write.csv always uses a comma and ignores sep with a warning), and configure Excel so that its delimiter matches the sep you chose.
And the JsData.json data looks like this:
[{"key":"type1|new york, ny|NYC|hit","doc_count":12},
{"key":"type1|omaha, ne|Omaha|hit","doc_count":8},
{"key":"type2|yuba city, ca|Yuba|hit","doc_count":9}]
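For the export step, a small sketch of both variants follows. It writes to temporary files so that it can be run anywhere; the data frame is typed in by hand to match the sample, and the file names are placeholders.

```r
# Data frame equivalent to the converted sample data
df <- data.frame(
  key = c("type1|new york, ny|NYC|hit",
          "type1|omaha, ne|Omaha|hit",
          "type2|yuba city, ca|Yuba|hit"),
  doc_count = c(12, 8, 9),
  stringsAsFactors = FALSE
)

csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")

# Comma-separated: strings are quoted, so the commas inside key are safe
write.csv(df, csv_file, row.names = FALSE)

# Tab-separated: sep is only honoured by write.table, not by write.csv
write.table(df, tsv_file, sep = "\t", row.names = FALSE)

# Read the CSV back to confirm it round-trips
back <- read.csv(csv_file, stringsAsFactors = FALSE)
back
```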

Importing/Conditioning a file.txt with a "kind" of json structure in R

I want to import a .txt file into R, but the format is unusual: it looks like JSON, and I don't know how to import it. Here is an example of my data:
{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}
To deal with this I used this code:
test1 <- read.csv("C:/Users/test1.txt", header=FALSE)
## Imported as 5 observations (the 5th is all empty) of 1700 variables
## (in fact 40 observations of 11 variables). When I import the .txt
## file, it has one empty line (the 5th observation) and 4 lines of data,
## with the records of 11 variables placed next to each other on each line.
# Get the different lines
part1=test1[1:10]
part2=test1[11:20]
part3=test1[21:30]
part4=test1[31:40]
...
## Remove the empty line (there was an empty line after each)
part1=part1[-5,]
part2=part2[-5,]
part3=part3[-5,]
...
## Rename the columns
names(part1)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part2)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part3)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
...
## Assemble data to have one dataset
data=rbind(part1,part2,part3,part4,part5,part6,part7,part8,part9,part10)
## Format Date Time
times <- as.POSIXct(data$`Date Time`, format='{datetime:%Y-%m-%d %H:%M:%S')
data$`Date Time` <- times
## Keep only the Date
data$Date <- as.Date(times)
## Format data - Remove text
data$Subject <- gsub("subject:", "", data$Subject)
data$Sscore <- gsub("sscore:", "", data$Sscore)
...
So my code works to reconstruct the data, but it is complicated and long. I know there are better ways to do it, so if you could help me with that I would be very grateful.
There are many packages that read JSON, e.g. rjson, jsonlite, RJSONIO (they will turn up in a Google search). Just pick one and give it a go.
e.g.
library(jsonlite)
json.text <- '{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}'
x <- fromJSON(paste0('[', json.text, ']'))
datetime subject sscore smean svscore sdispersion svolume sbuzz lastclose companyname
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000 3M Company
2 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000 3M Company
3 2015-07-06 09:10:00 MMM -1.0057 0.2579 -1.3796 1.000 1 0.4531 155.380000000 3M Company
I paste the '[' and ']' around your JSON because you have multiple JSON elements (the rows in the dataframe above) and for this to be well-formed JSON it needs to be an array, i.e. [ {...}, {...}, {...} ] rather than {...}, {...}, {...}.
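Note that every value in this file is a quoted string, so the parsed columns come back as character. A possible follow-up, sketched here on a shortened two-record version of the sample, converts the types afterwards. The tz = "UTC" setting is an assumption; the data does not say which time zone the timestamps are in.

```r
library(jsonlite)

# Shortened stand-in for the file contents; for the real file this could be
# something like paste(readLines("C:/Users/test1.txt"), collapse = "")
json.text <- '{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977"}'

x <- fromJSON(paste0("[", json.text, "]"))

# Convert the string columns to proper types
x$datetime <- as.POSIXct(x$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
x$sscore   <- as.numeric(x$sscore)

# Keep only the date in a separate column, as in the question
x$date <- as.Date(x$datetime)

str(x)
```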

Is it possible to write a table to a file in JSON format in R?

I'm making word frequency tables with R and the preferred output format would be a JSON file, something like
{
"word" : "dog",
"frequency" : 12
}
Is there any way to save the table directly in this format? I've been using the write.csv() function and then converting the output to JSON, but this is very complicated and time-consuming.
set.seed(1)
( tbl <- table(round(runif(100, 1, 5))) )
## 1 2 3 4 5
## 9 24 30 23 14
library(rjson)
sink("json.txt")
cat(toJSON(tbl))
sink()
file.show("json.txt")
## {"1":9,"2":24,"3":30,"4":23,"5":14}
or even better:
set.seed(1)
( tab <- table(letters[round(runif(100, 1, 26))]) )
## a b c d e f g h i j k l m n o p q r s t u v w x y z
## 1 2 4 3 2 5 4 3 5 3 9 4 7 2 2 2 5 5 5 6 5 3 7 3 2 1
sink("lets.txt")
cat(toJSON(tab))
sink()
file.show("lets.txt")
## {"a":1,"b":2,"c":4,"d":3,"e":2,"f":5,"g":4,"h":3,"i":5,"j":3,"k":9,"l":4,"m":7,"n":2,"o":2,"p":2,"q":5,"r":5,"s":5,"t":6,"u":5,"v":3,"w":7,"x":3,"y":2,"z":1}
Then validate it with http://www.jsonlint.com/ to get pretty formatting. If you have a multidimensional table, you'll have to work it out a bit...
EDIT:
Oh, now I see, you want the dataset characteristics sink-ed to a JSON file. No problem, just give us some sample data, and I'll work on the code a bit. Practically, you need to get the data into the desired format and then convert it to JSON. A list should suffice. Give me a sec, I'll update my answer.
EDIT #2:
Well, time is relative... it's common knowledge... Here you go:
( dtf <- structure(list(word = structure(1:3, .Label = c("cat", "dog",
"mouse"), class = "factor"), frequency = c(12, 32, 18)), .Names = c("word",
"frequency"), row.names = c(NA, -3L), class = "data.frame") )
## word frequency
## 1 cat 12
## 2 dog 32
## 3 mouse 18
If dtf is a simple data frame, fine; if it's not, coerce it to one. Long story short, you can do:
toJSON(as.data.frame(t(dtf)))
## [1] "{\"V1\":{\"word\":\"cat\",\"frequency\":\"12\"},\"V2\":{\"word\":\"dog\",\"frequency\":\"32\"},\"V3\":{\"word\":\"mouse\",\"frequency\":\"18\"}}"
I thought I'd need some melt with this one, but a simple t did the trick. Now you only need to deal with the column names after transposing the data.frame. t coerces a data.frame to a matrix, so you need to convert it back to a data.frame. I used as.data.frame, but you can also use toJSON(data.frame(t(dtf))), in which case you'll get X instead of V in the variable names. Alternatively, you can use a regexp to clean the JSON file (if needed), but that is lousy practice; try to work it out by preparing the data.frame.
I hope this helped a bit...
These days I would typically use the jsonlite package.
library("jsonlite")
toJSON(mydatatable, pretty = TRUE)
This turns the data table into a JSON array of key/value pair objects directly.
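To go from a frequency table to a file in the format asked for, the table can first be turned into a two-column data frame. The sketch below makes up a tiny word vector for illustration and writes to a temporary file; write_json is the jsonlite helper that serialises straight to a path.

```r
library(jsonlite)

# Made-up input; in practice this would be the tokenised text
words <- c("dog", "cat", "dog", "mouse", "dog", "cat")
tbl <- table(words)

# One row per word, with the column names the question asks for
freq <- data.frame(word = names(tbl),
                   frequency = as.integer(tbl),
                   stringsAsFactors = FALSE)

# Write a JSON array of {"word": ..., "frequency": ...} objects
out_file <- tempfile(fileext = ".json")
write_json(freq, out_file, pretty = TRUE)

# Read it back to check the round trip
back <- fromJSON(out_file)
back
```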
RJSONIO is a package "that allows conversion to and from data in Javascript object notation (JSON) format". You can use it to export your object as a JSON file.
library(RJSONIO)
writeLines(toJSON(anobject), "afile.JSON")