Importing Data from a json file in R - json

I have a json data file from which I want to import in R. I tried searching for similar blogs but either they are getting data from URLs or the syntax gave errors.
Let's say the name of the json file is "Jsdata.json"
How can i get the data from Jsdata.json to R and convert it into the excel/csv format for a better picture.

To confirm, this is the output using rjson package. The file parameter has to be explicitly specified here, otherwise the function will treat it as a json string and throw an error.
myList = rjson::fromJSON(file = "JsData.json")
myList
# [[1]]
# [[1]]$key
# [1] "type1|new york, ny|NYC|hit"
#
# [[1]]$doc_count
# [1] 12
# [[2]]
# [[2]]$key
# [1] "type1|omaha, ne|Omaha|hit"
# [[2]]$doc_count
# [1] 8
# [[3]]
# [[3]]$key
# [1] "type2|yuba city, ca|Yuba|hit"
# [[3]]$doc_count
# [1] 9
In order to convert this to data frame, you can do:
do.call(rbind, lapply(myList, data.frame))
# key doc_count
# 1 type1|new york, ny|NYC|hit 12
# 2 type1|omaha, ne|Omaha|hit 8
# 3 type2|yuba city, ca|Yuba|hit 9
Write the data frame as csv using write.csv(..., sep = "\t") and configure your excel so that the delimiter matches your sep here should work.
And the JsData.json data looks like this:
[{"key":"type1|new york, ny|NYC|hit","doc_count":12},
{"key":"type1|omaha, ne|Omaha|hit","doc_count":8},
{"key":"type2|yuba city, ca|Yuba|hit","doc_count":9}]

Related

parsing nested structures in R

I have a json-like string that represents a nested structure. it is not a real json in that the names and values are not quoted. I want to parse it to a nested structure, e.g. list of lists.
#example:
x_string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
and the result should be like this:
x_list = list(a=1,b=2,c=c(1,2,3),d=list(e="something"))
is there any convenient function that I don't know that does this kind of parsing?
Thanks.
If all of your data is consistent, there is a simple solution involving regex and jsonlite package. The code is:
if(!require(jsonlite, quiet=TRUE)){
#if library is not installed: installs it and loads it into the R session for use.
install.packages("jsonlite",repos="https://ftp.heanet.ie/mirrors/cran.r-project.org")
library(jsonlite)
}
x_string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
json_x_string = "{\"a\":1, \"b\":2, \"c\":[1,2,3], \"d\":{\"e\":\"something\"}}"
fromJSON(json_x_string)
s <- gsub( "([A-Za-z]+)", "\"\\1\"", gsub( "([A-Za-z]*)=", "\\1:", x_string ) )
fromJSON( s )
The first section checks if the package is installed. If it is it loads it, otherwise it installs it and then loads it. I usually include this in any R code I'm writing to make it simpler to transfer between pcs/people.
Your string is x_string, we want it to look like json_x_string which gives the desired output when we call fromJSON().
The regex is split into two parts because it's been a while - I'm pretty sure this could be made more elegant. Then again, this depends on if your data is consistent so I'll leave it like this for now. First it changes "=" to ":", then it adds quotation marks around all groups of letters. Calling fromJSON(s) gives the output:
fromJSON(s)
$a
[1] 1
$b
[1] 2
$c
[1] 1 2 3
$d
$d$e
[1] "something"
I would rather avoid using JSON's parsing for the lack of extendibility and flexibility, and stick to a solution of regex + recursion.
And here is an extendable base code that parses your input string as desired
The main recursion function:
# Parse string
parse.string = function(.string){
regex = "^((.*)=)??\\{(.*)\\}"
# Recursion termination: element parsing
if(iselement(.string)){
return(parse.element(.string))
}
# Extract components
elements.str = gsub(regex, "\\3", .string)
elements.vector = get.subelements(elements.str)
# Recursively parse each element
parsed.elements = list(sapply(elements.vector, parse.string, USE.NAMES = F))
# Extract list's name and return
name = gsub(regex, "\\2", .string)
names(parsed.elements) = name
return(parsed.elements)
}
.
Helping functions:
library(stringr)
# Test if the string is a base element
iselement = function(.string){
grepl("^[^[:punct:]]+=[^\\{\\}]+$", .string)
}
# Parse element
parse.element = function(element.string){
splits = strsplit(element.string, "=")[[1]]
element = splits[2]
# Parse numeric elements
if(!is.na(as.numeric(element))){
element = as.numeric(element)
}
# TODO: Extend here to include vectors
# Reformat and return
element = list(element)
names(element) = splits[1]
return(element)
}
# Get subelements from a string
get.subelements = function(.string){
# Regex of allowed elements - Extend here to include more types
elements.regex = c("[^, ]+?=\\{.+?\\}", #Sublist
"[^, ]+?=\\[.+?\\]", #Vector
"[^, ]+?=[^=,]+") #Base element
str_extract_all(.string, pattern = paste(elements.regex, collapse = "|"))[[1]]
}
.
Parsing results:
string = "{a=1, b=2, c=[1,2,3], d={e=something}}"
string_2 = "{a=1, b=2, c=[1,2,3], d=somthing}"
named_string = "xyz={a=1, b=2, c=[1,2,3], d={e=something, f=22}}"
named_string_2 = "xyz={d={e=something, f=22}}"
parse.string(string)
# [[1]]
# [[1]]$a
# [1] 1
#
# [[1]]$b
# [1] 2
#
# [[1]]$c
# [1] "[1,2,3]"
#
# [[1]]$d
# [[1]]$d$e
# [1] "something"

Error while trying to parse json into R

I have recently started using R and have a task regarding parsing json in R to get a non-json format. For this, i am using the "fromJSON()" function. I have tried to parse json as a text file. It runs successfully when i do it with just a single row entry. But when I try it with multiple row entries, i get the following error:
fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: invalid char in json text.
[{'CategoryType':'dining','City':
(right here) ------^
> fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: trailing garbage
"mumbai","Location":"all"}] [{"JourneyType":"Return","Origi
(right here) ------^
> fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: after array element, I expect ',' or ']'
:"mumbai","Location":"all"} {"JourneyType":"Return","Origin
(right here) ------^
The above errors are due to three different formats in which i tried to parse the json text, but the result was the same, only the location suggested by changed.
Please help me to identify the cause of this error or if there is a more efficient way o performing the task.
The original file that i have is an excel sheet with multiple columns and one of those columns consists of json text. The way i tried right now is by extracting just the json column and converting it to a tab separated text and then parsing it as:
fromJSON("D:/Eclairs/Printing/test3.txt")
Please also suggest if this can be done more efficiently. I need to map all the columns in the excel to the non-json text as well.
Example:
[{"CategoryType":"dining","City":"mumbai","Location":"all"}]
[{"CategoryType":"reserve-a-table","City":"pune","Location":"Kothrud,West Pune"}]
[{"Destination":"Mumbai","CheckInDate":"14-Oct-2016","CheckOutDate":"15-Oct-2016","Rooms":"1","NoOfPax":"3","NoOfAdult":"3","NoOfChildren":"0"}]
Consider reading in the text line by line with readLines(), iteratively saving the JSON dataframes to a growing list:
library(jsonlite)
con <- file("C:/Path/To/Jsons.txt", open="r")
jsonlist <- list()
while (length(line <- readLines(con, n=1, warn = FALSE)) > 0) {
jsonlist <- append(jsonlist, list(fromJSON(line)))
}
close(con)
jsonlist
# [[1]]
# CategoryType City Location
# 1 dining mumbai all
# [[2]]
# CategoryType City Location
# 1 reserve-a-table pune Kothrud,West Pune
# [[3]]
# Destination CheckInDate CheckOutDate Rooms NoOfPax NoOfAdult NoOfChildren
# 1 Mumbai 14-Oct-2016 15-Oct-2016 1 3 3 0

Data left out when reading json online to R

I try to read online json data to R through the below codes in R:
library('jsonlite')
address<-'https://data.cityofchicago.org/resource/qnmj-8ku6.json'
sample<-fromJSON(address)
The codes did run and have results in right format of a table. But only produced 1000 observations while the original city portal database has more than 200,000 observations. I am not sure what to be fixed to download the whole dataset. Please help.
You're using the wrong link to get the data. You can see the correct link by going to 'Export'
library(jsonlite)
address <- "https://data.cityofchicago.org/api/views/qnmj-8ku6/rows.json?accessType=DOWNLOAD"
sample <- fromJSON(address)
length(sample)
# [1]
length(sample[[2]])
# [1] 274228
Although, you may want to get it as a .csv to make it easier to work with straight away?
address <- "https://data.cityofchicago.org/api/views/qnmj-8ku6/rows.csv?accessType=DOWNLOAD"
sample_csv <- read.csv(address)
nrow(sample_csv)
# [1] 274228
str(sample_csv)
# 'data.frame': 274228 obs. of 22 variables:
# $ ID : int 10512552 10517063 10517120 10518590 10518648
# $ Case.Number : Factor w/ 274219 levels "HA107183","HA156050",..
# $ Date : Factor w/ 112977 levels "01/01/2014 01:00:00 AM",..
# $ Block : Factor w/ 27499 levels "0000X E 100TH PL",..
# $ IUCR : Factor w/ 331 levels "0110","0141",..
# $ Primary.Type : Factor w/ 33 levels "ARSON","ASSAULT",..
# $ Description : Factor w/ 310 levels "$500 AND UNDER",..
# ... etc

How to parse JSON with fromJSON on a dataframe column?

I have the following data.frame with one column called "json" and two rows of JSON data:
df <- data.frame(json = c('{"client":"ABC Company","totalUSD":7110.0000,"durationDays":731,"familySize":4,"assignmentType":"Long Term","homeLocation":"Australia","hostLocation":"United States","serviceName":"Service ABC","homeLocationGeoLat":-25.274398,"homeLocationGeoLng":133.775136,"hostLocationGeoLat":37.09024,"hostLocationGeoLng":-95.712891}', '{"client":"ABC Company","totalUSD":7110.0000,"durationDays":731,"familySize":4,"assignmentType":"Long Term","homeLocation":"Australia","hostLocation":"United States","serviceName":"Service XYZ","homeLocationGeoLat":-25.274398,"homeLocationGeoLng":133.775136,"hostLocationGeoLat":37.09024,"hostLocationGeoLng":-95.712891}'))
I am trying to parse the JSON into a data.frame using fromJSON from the rjson package.
I cast the column as character type and then attempt to parse:
> df$json <- as.character(df$json)
> final <- fromJSON(json_str = df$json)
However, it only seems to give me the first row of JSON, whereas I expect 2 rows.
How can I parse the JSON into a data.frame from df$json?
You probably want a resultant data frame from this exercise, so:
do.call(rbind.data.frame, lapply(df$json, rjson::fromJSON))
## client totalUSD durationDays familySize assignmentType homeLocation hostLocation serviceName homeLocationGeoLat
## 2 ABC Company 7110 731 4 Long Term Australia United States Service ABC -25.2744
## 21 ABC Company 7110 731 4 Long Term Australia United States Service XYZ -25.2744
## homeLocationGeoLng hostLocationGeoLat hostLocationGeoLng
## 2 133.7751 37.09024 -95.71289
## 21 133.7751 37.09024 -95.71289
The exact same results will come from:
do.call(rbind.data.frame, lapply(df$json, jsonlite::fromJSON))
do.call(rbind.data.frame, lapply(df$json, RJSONIO::fromJSON))

Importing/Conditioning a file.txt with a "kind" of json structure in R

I wanted to import a .txt file in R but the format is really special and it's looks like a json format but I don't know how to import it. There is an example of my data:
{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}
To deal with this is used this code:
test1 <- read.csv("C:/Users/test1.txt", header=FALSE)
## Import as 5 observations (5th is all empty) of 1700 variables
#(in fact 40 observations of 11 variables). In fact when I imported the
#.txt file, it's having one line (5th obs) empty, and 4 lines of data and
#placed next to each other 4 lines of data of 11 variables.
# Get the different lines
part1=test1[1:10]
part2=test1[11:20]
part3=test1[21:30]
part4=test1[31:40]
...
## Remove the empty line (there were an empty line after each)
part1=part1[-5,]
part2=part2[-5,]
part3=part3[-5,]
...
## Rename the columns
names(part1)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part2)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
names(part3)=c("Date Time","Subject","Sscore","Smean","Svscore","Sdispersion","Svolume","Sbuzz","Last close","Company name")
...
## Assemble data to have one dataset
data=rbind(part1,part2,part3,part4,part5,part6,part7,part8,part9,part10)
## Formate Date Time
times <- as.POSIXct(data$`Date Time`, format='{datetime:%Y-%m-%d %H:%M:%S')
data$`Date Time` <- times
## Keep only the Date
data$Date <- as.Date(times)
## Formate data - Remove text
data$Subject <- gsub("subject:", "", data$Subject)
data$Sscore <- gsub("sscore:", "", data$Sscore)
...
So My code is working to reinstate the data but it's maybe very difficult and more long I know there is better ways to do it, so if you could help me with that I would be very grateful.
There are many packages that read JSON, e.g. rjson, jsonlite, RJSONIO (they will turn in up a google search) - just pick one and give it a go.
e.g.
library(jsonlite)
json.text <- '{"datetime":"2015-07-08 09:10:00","subject":"MMM","sscore":"-0.2280","smean":"0.2593","svscore":"-0.2795","sdispersion":"0.375","svolume":"8","sbuzz":"0.6026","lastclose":"155.430000000","companyname":"3M Company"},{"datetime":"2015-07-07 09:10:00","subject":"MMM","sscore":"0.2977","smean":"0.2713","svscore":"-0.7436","sdispersion":"0.400","svolume":"5","sbuzz":"0.4895","lastclose":"155.080000000","companyname":"3M Company"},{"datetime":"2015-07-06 09:10:00","subject":"MMM","sscore":"-1.0057","smean":"0.2579","svscore":"-1.3796","sdispersion":"1.000","svolume":"1","sbuzz":"0.4531","lastclose":"155.380000000","companyname":"3M Company"}'
x <- fromJSON(paste0('[', json.text, ']'))
datetime subject sscore smean svscore sdispersion svolume sbuzz lastclose companyname
1 2015-07-08 09:10:00 MMM -0.2280 0.2593 -0.2795 0.375 8 0.6026 155.430000000 3M Company
2 2015-07-07 09:10:00 MMM 0.2977 0.2713 -0.7436 0.400 5 0.4895 155.080000000 3M Company
3 2015-07-06 09:10:00 MMM -1.0057 0.2579 -1.3796 1.000 1 0.4531 155.380000000 3M Company
I paste the '[' and ']' around your JSON because you have multiple JSON elements (the rows in the dataframe above) and for this to be well-formed JSON it needs to be an array, i.e. [ {...}, {...}, {...} ] rather than {...}, {...}, {...}.