JSON data to dataframe in R - json

I have a JSON file from which I am importing the data:
myList = rjson::fromJSON(file = "JsData.json")
myList
[[1]]
[[1]]$key
[1] "type1|new york, ny|NYC|hit"
[[1]]$doc_count
[1] 12
[[2]]
[[2]]$key
[2] "type1|omaha, ne|Omaha|hit"
[[2]]$doc_count
[2] 8
But when I try to convert it to a data frame with the function below,
do.call(rbind, lapply(myList, data.frame))
I get an error:
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0
I need to parse this data so that it can be used as CSV in Excel. I looked at the solution for "Getting imported json data into a data frame in R", but the output does not come out in a format that is usable in Excel.
And the JsData.json sample data looks like this:
[{"key":"type1|new york, ny|NYC|hit","doc_count":12},
{"key":"type1|omaha, ne|Omaha|hit","doc_count":8},
{"key":"type2|yuba city, ca|Yuba|hit","doc_count":9}]

You can try:
require(jsonlite)
s ='[{"key":"type1|new york, ny|NYC|hit","doc_count":12},
.......
"key":"type2|yuba city, ca|Yuba|hit","doc_count":9}]'
df <- fromJSON(s)
df
key doc_count
1 type1|new york, ny|NYC|hit 12
2 type1|omaha, ne|Omaha|hit 8
3 type2|yuba city, ca|Yuba|hit 9
I don't know how you want to deal with your key ...
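If you also want to split the pipe-delimited key into separate columns before exporting to Excel, here is a minimal sketch; the column names type/location/label/status are only guesses at what the four fields mean:
library(jsonlite)
df <- fromJSON('[{"key":"type1|new york, ny|NYC|hit","doc_count":12},
                 {"key":"type1|omaha, ne|Omaha|hit","doc_count":8},
                 {"key":"type2|yuba city, ca|Yuba|hit","doc_count":9}]')
# split "key" on "|" into four columns; the names are placeholders
parts <- do.call(rbind, strsplit(df$key, "|", fixed = TRUE))
colnames(parts) <- c("type", "location", "label", "status")
df <- cbind(as.data.frame(parts, stringsAsFactors = FALSE), doc_count = df$doc_count)
write.csv(df, "JsData.csv", row.names = FALSE)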

Related

From list of json files to data.table: partial variable list

I have a list of more than 100,000 json files from which I want to get a data.table with only a few variables. Unfortunately the files are complex. The content of each json file looks like:
Sample 1
$id
[1] "10.1"
$title
$title$value
[1] "Why this item"
$itemsource
$itemsource$id
[1] "AA"
$date
[1] "1992-01-01"
$itemType
[1] "art"
$creators
list()
Sample 2
$id
[1] "10.2"
$title
$title$value
[1] "We need this item"
$itemsource
$itemsource$id
[1] "AY"
$date
[1] "1999-01-01"
$itemType
[1] "art"
$creators
type name firstname surname affiliationIds
1 Person Frank W. Cornell. Frank W. Cornell. a1
2 Person David A. Chen. David A. Chen. a1
$affiliations
id name
1 a1 Foreign Affairs Desk, New York Times
What I need from this set of files is a table with creator names, item ids and dates. For the two sample files above:
id date name firstname lastname creatortype
"10.1" "1992-01-01" NA NA NA NA
"10.2" "1999-01-01" Frank W. Cornell. Frank W. Cornell. Person
"10.2" "1999-01-01" David A. Chen. David A. Chen. Person
What I have done so far:
library(parallel)
library(data.table)
library(jsonlite)
library(dplyr)
filelist = list.files(pattern="*.json",recursive=TRUE,include.dirs =TRUE)
parsed = mclapply(filelist, function(x) fromJSON(x),mc.cores=24)
data = rbindlist(mclapply(1:length(parsed), function(x) {
  a = data.table(item = parsed[[x]]$id,
                 date = list(list(parsed[[x]]$date)),
                 name = list(list(parsed[[x]]$name)),
                 creatortype = list(list(parsed[[x]]$creatortype))) # ignoring the firstname/lastname fields here for convenience
  b = data.table(id = a$item, date = unlist(a$date), name = unlist(a$name), creatortype = unlist(a$creatortype))
  return(b)
}, mc.cores = 24))
However, on the last step, I get this error:
"Error in rbindlist(mclapply(1:length(parsed), function(x){:
Item 1 of list is not a data.frame, data.table or list"
Thanks in advance for your suggestions.
Related questions include:
Extract data from list of lists [R]
R convert json to list to data.table
I want to convert JSON file into data.table in r
How can read files from directory using R?
Convert R data table column from JSON to data table
From the error message, I suppose this basically means that one of the results from mclapply() is empty; by empty I mean either NULL, a data.table with 0 rows, or something that simply hit an error inside the parallel processing.
What you could do is:
add more checks inside mclapply(), e.g. wrap the parsing in try()/tryCatch(), or check the class of b and nrow(b) to see whether b is empty or not;
when you use rbindlist(), add the argument fill = TRUE.
Hope this solves your problem.
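For illustration, a minimal sketch of both suggestions, assuming the creator fields are named as in the sample output above (type, name, firstname, surname):
library(parallel)
library(data.table)
library(jsonlite)
filelist <- list.files(pattern = "*.json", recursive = TRUE)
# return NULL for files that fail to parse instead of crashing the worker
parsed <- mclapply(filelist, function(x) tryCatch(fromJSON(x), error = function(e) NULL),
                   mc.cores = 24)
data <- rbindlist(mclapply(parsed, function(p) {
  if (is.null(p)) return(NULL)          # file could not be parsed
  if (length(p$creators) == 0) {
    # no creators: keep id/date only; fill = TRUE pads the rest with NA
    data.table(id = p$id, date = p$date)
  } else {
    data.table(id = p$id, date = p$date,
               name = p$creators$name,
               firstname = p$creators$firstname,
               lastname = p$creators$surname,
               creatortype = p$creators$type)
  }
}, mc.cores = 24), fill = TRUE)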

Error while trying to parse json into R

I have recently started using R and have a task that requires parsing JSON in R into a non-JSON format. For this, I am using the fromJSON() function. I have tried to parse the JSON as a text file. It runs successfully when I do it with just a single row entry, but when I try it with multiple row entries, I get the following errors:
fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: invalid char in json text.
[{'CategoryType':'dining','City':
(right here) ------^
> fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: trailing garbage
"mumbai","Location":"all"}] [{"JourneyType":"Return","Origi
(right here) ------^
> fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: after array element, I expect ',' or ']'
:"mumbai","Location":"all"} {"JourneyType":"Return","Origin
(right here) ------^
The above errors come from three different formats in which I tried to write out the JSON text, but the result was the same; only the location flagged in the error message changed.
Please help me identify the cause of this error, or suggest a more efficient way of performing the task.
The original file that I have is an Excel sheet with multiple columns, one of which consists of JSON text. What I have tried so far is extracting just the JSON column, converting it to tab-separated text, and then parsing it with:
fromJSON("D:/Eclairs/Printing/test3.txt")
Please also suggest whether this can be done more efficiently. I need to map all the columns in the Excel sheet to the non-JSON text as well.
Example:
[{"CategoryType":"dining","City":"mumbai","Location":"all"}]
[{"CategoryType":"reserve-a-table","City":"pune","Location":"Kothrud,West Pune"}]
[{"Destination":"Mumbai","CheckInDate":"14-Oct-2016","CheckOutDate":"15-Oct-2016","Rooms":"1","NoOfPax":"3","NoOfAdult":"3","NoOfChildren":"0"}]
Consider reading in the text line by line with readLines(), iteratively saving the JSON dataframes to a growing list:
library(jsonlite)
con <- file("C:/Path/To/Jsons.txt", open="r")
jsonlist <- list()
while (length(line <- readLines(con, n=1, warn = FALSE)) > 0) {
jsonlist <- append(jsonlist, list(fromJSON(line)))
}
close(con)
jsonlist
# [[1]]
# CategoryType City Location
# 1 dining mumbai all
# [[2]]
# CategoryType City Location
# 1 reserve-a-table pune Kothrud,West Pune
# [[3]]
# Destination CheckInDate CheckOutDate Rooms NoOfPax NoOfAdult NoOfChildren
# 1 Mumbai 14-Oct-2016 15-Oct-2016 1 3 3 0
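If you then need one flat table (for example to write out as CSV), one option is to bind the list with fill = TRUE, since the rows have different columns; this is a sketch, assuming data.table is acceptable:
library(data.table)
combined <- rbindlist(jsonlist, fill = TRUE)   # missing columns are filled with NA
write.csv(combined, "C:/Path/To/Jsons.csv", row.names = FALSE)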

Importing Data from a json file in R

I have a JSON data file that I want to import into R. I tried searching for similar blog posts, but they either get their data from URLs or the syntax gave errors.
Let's say the name of the json file is "Jsdata.json"
How can I get the data from Jsdata.json into R and convert it into Excel/CSV format for a better picture?
To confirm, this is the output using the rjson package. The file argument has to be specified explicitly here; otherwise the function treats its input as a JSON string and throws an error.
myList = rjson::fromJSON(file = "JsData.json")
myList
# [[1]]
# [[1]]$key
# [1] "type1|new york, ny|NYC|hit"
#
# [[1]]$doc_count
# [1] 12
# [[2]]
# [[2]]$key
# [1] "type1|omaha, ne|Omaha|hit"
# [[2]]$doc_count
# [1] 8
# [[3]]
# [[3]]$key
# [1] "type2|yuba city, ca|Yuba|hit"
# [[3]]$doc_count
# [1] 9
In order to convert this to data frame, you can do:
do.call(rbind, lapply(myList, data.frame))
# key doc_count
# 1 type1|new york, ny|NYC|hit 12
# 2 type1|omaha, ne|Omaha|hit 8
# 3 type2|yuba city, ca|Yuba|hit 9
Write the data frame out with write.csv(df, "JsData.csv", row.names = FALSE) and Excel will open it directly; if you would rather have a tab-separated file, use write.table(..., sep = "\t") and make sure the delimiter configured in Excel matches the sep you chose.
And the JsData.json data looks like this:
[{"key":"type1|new york, ny|NYC|hit","doc_count":12},
{"key":"type1|omaha, ne|Omaha|hit","doc_count":8},
{"key":"type2|yuba city, ca|Yuba|hit","doc_count":9}]

How to parse JSON with fromJSON on a dataframe column?

I have the following data.frame with one column called "json" and two rows of JSON data:
df <- data.frame(json = c('{"client":"ABC Company","totalUSD":7110.0000,"durationDays":731,"familySize":4,"assignmentType":"Long Term","homeLocation":"Australia","hostLocation":"United States","serviceName":"Service ABC","homeLocationGeoLat":-25.274398,"homeLocationGeoLng":133.775136,"hostLocationGeoLat":37.09024,"hostLocationGeoLng":-95.712891}', '{"client":"ABC Company","totalUSD":7110.0000,"durationDays":731,"familySize":4,"assignmentType":"Long Term","homeLocation":"Australia","hostLocation":"United States","serviceName":"Service XYZ","homeLocationGeoLat":-25.274398,"homeLocationGeoLng":133.775136,"hostLocationGeoLat":37.09024,"hostLocationGeoLng":-95.712891}'))
I am trying to parse the JSON into a data.frame using fromJSON from the rjson package.
I cast the column as character type and then attempt to parse:
> df$json <- as.character(df$json)
> final <- fromJSON(json_str = df$json)
However, it only seems to give me the first row of JSON, whereas I expect 2 rows.
How can I parse the JSON into a data.frame from df$json?
You probably want a resultant data frame from this exercise, so:
do.call(rbind.data.frame, lapply(df$json, rjson::fromJSON))
## client totalUSD durationDays familySize assignmentType homeLocation hostLocation serviceName homeLocationGeoLat
## 2 ABC Company 7110 731 4 Long Term Australia United States Service ABC -25.2744
## 21 ABC Company 7110 731 4 Long Term Australia United States Service XYZ -25.2744
## homeLocationGeoLng hostLocationGeoLat hostLocationGeoLng
## 2 133.7751 37.09024 -95.71289
## 21 133.7751 37.09024 -95.71289
The exact same results will come from:
do.call(rbind.data.frame, lapply(df$json, jsonlite::fromJSON))
do.call(rbind.data.frame, lapply(df$json, RJSONIO::fromJSON))
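If you would rather parse once instead of looping over the rows, one alternative sketch is to collapse the column into a single JSON array first (this assumes every cell holds a standalone JSON object):
library(jsonlite)
# wrap the rows into one JSON array and parse it in a single call
final <- fromJSON(paste0("[", paste(df$json, collapse = ","), "]"))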

Importing/Conditioning a file.txt with a "kind" of json structure in R

I wanted to import a .txt file into R, but the format is really special: it looks like JSON, and I don't know how to import it. Here is an example of my data:
{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}
To deal with this I used this code:
test1 <- read.csv("C:/Users/test1.txt", header=FALSE)
## Imported as 5 observations (the 5th is all empty) of 1700 variables
# (in fact it should be 40 observations of 11 variables). When I imported the
# .txt file it came in with one empty line (the 5th obs) and 4 lines of data,
# with the records of 11 variables placed next to each other on each line.
# Get the different lines
part1=test1[1:10]
part2=test1[11:20]
part3=test1[21:30]
part4=test1[31:40]
...
## Remove the empty line (the 5th row is empty in each part)
part1=part1[-5,]
part2=part2[-5,]
part3=part3[-5,]
...
## Rename the columns
names(part1)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part2)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part3)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
...
## Assemble data to have one dataset
data=rbind(part1,part2,part3,part4,part5,part6,part7,part8,part9,part10)
## Format Date Time
times <- as.POSIXct(data$`Date Time`, format='{datetime:%Y-%m-%d %H:%M:%S')
data$`Date Time` <- times
## Keep only the Date
data$Date <- as.Date(times)
## Format data - remove text
data$Subject <- gsub("subject:", "", data$Subject)
data$Sscore <- gsub("sscore:", "", data$Sscore)
...
So my code does recover the data, but it is clumsy and long; I know there are better ways to do it, so if you could help me with that I would be very grateful.
There are many packages that read JSON, e.g. rjson, jsonlite, RJSONIO (they will turn up in a Google search) - just pick one and give it a go.
e.g.
library(jsonlite)
json.text <- '{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}'
x <- fromJSON(paste0('[', json.text, ']'))
datetime subject sscore smean svscore sdispersion svolume sbuzz lastclose companyname
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000 3M Company
2 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000 3M Company
3 2015-07-06 09:10:00 MMM -1.0057 0.2579 -1.3796 1.000 1 0.4531 155.380000000 3M Company
I pasted the '[' and ']' around your JSON because you have multiple JSON elements (the rows in the dataframe above), and for this to be well-formed JSON it needs to be an array, i.e. [ {...}, {...}, {...} ] rather than {...}, {...}, {...}.
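Applied to your file, a minimal sketch might look like this (it assumes the objects in the .txt are separated by commas, as in the sample above):
library(jsonlite)
# read the whole file as one string, wrap it in [ ] and parse
raw <- paste(readLines("C:/Users/test1.txt", warn = FALSE), collapse = "")
data <- fromJSON(paste0("[", raw, "]"))
# datetime comes in as character; convert it if you need POSIXct
data$datetime <- as.POSIXct(data$datetime, format = "%Y-%m-%d %H:%M:%S")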