I have recently started using R and have a task that requires parsing JSON in R to get a non-JSON format. For this, I am using the fromJSON() function. I have tried to parse the JSON as a text file. It runs successfully when I do it with just a single row entry, but when I try it with multiple row entries, I get the following error:
fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: invalid char in json text.
[{'CategoryType':'dining','City':
(right here) ------^
> fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: trailing garbage
"mumbai","Location":"all"}] [{"JourneyType":"Return","Origi
(right here) ------^
> fromJSON("D:/Eclairs/Printing/test3.txt")
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: after array element, I expect ',' or ']'
:"mumbai","Location":"all"} {"JourneyType":"Return","Origin
(right here) ------^
The above errors come from three different formats in which I tried to parse the JSON text, but the result was the same; only the location the error pointed to changed.
Please help me identify the cause of this error, or suggest a more efficient way of performing the task.
The original file that I have is an Excel sheet with multiple columns, one of which consists of JSON text. What I tried just now was extracting just the JSON column, converting it to a tab-separated text file, and then parsing it as:
fromJSON("D:/Eclairs/Printing/test3.txt")
Please also suggest if this can be done more efficiently. I need to map all the columns in the Excel sheet to the non-JSON text as well.
Example:
[{"CategoryType":"dining","City":"mumbai","Location":"all"}]
[{"CategoryType":"reserve-a-table","City":"pune","Location":"Kothrud,West Pune"}]
[{"Destination":"Mumbai","CheckInDate":"14-Oct-2016","CheckOutDate":"15-Oct-2016","Rooms":"1","NoOfPax":"3","NoOfAdult":"3","NoOfChildren":"0"}]
Consider reading the text in line by line with readLines(), appending each parsed JSON data frame to a growing list:
library(jsonlite)
con <- file("C:/Path/To/Jsons.txt", open="r")
jsonlist <- list()
while (length(line <- readLines(con, n = 1, warn = FALSE)) > 0) {
  jsonlist <- append(jsonlist, list(fromJSON(line)))
}
close(con)
jsonlist
# [[1]]
# CategoryType City Location
# 1 dining mumbai all
# [[2]]
# CategoryType City Location
# 1 reserve-a-table pune Kothrud,West Pune
# [[3]]
# Destination CheckInDate CheckOutDate Rooms NoOfPax NoOfAdult NoOfChildren
# 1 Mumbai 14-Oct-2016 15-Oct-2016 1 3 3 0
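Since you also need to map the other Excel columns onto the parsed rows, here is a minimal sketch of one way to do that, assuming the readxl package, a hypothetical workbook path, and a JSON column named json (and assuming each cell holds a single-object array, as in your examples, so each parse yields one row):
library(readxl)    # assumed Excel reader; any other works too
library(jsonlite)
library(dplyr)

xl <- read_excel("D:/Eclairs/Printing/test3.xlsx")  # hypothetical file
parsed <- lapply(xl$json, fromJSON)                 # one 1-row data frame per cell
flat <- bind_rows(parsed)                           # fills absent columns with NA
result <- cbind(xl[setdiff(names(xl), "json")], flat)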
Related
I have the following JSON file:
{"id":1140854908,"name":"'Amran"}
{"id":1140852651,"name":"'Asir"}
{"id":1140855190,"name":"'Eua"}
{"id":1140851307,"name":"A Coruna"}
{"id":1140854170,"name":"A`Ana"}
I used the jsonlite package, but I get a parsing error:
library(jsonlite)
try <- fromJSON("states.txt",simplifyDataFrame = T)
# Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
# parse error: trailing garbage
# :1140854908,"name":"'Amran"} {"id":1140852651,"name":"'Asir"
# (right here) ------^
Try changing your data file to the following:
[
{"id":1140854908,"name":"'Amran"}
,{"id":1140852651,"name":"'Asir"}
,{"id":1140855190,"name":"'Eua"}
,{"id":1140851307,"name":"A Coruna"}
,{"id":1140854170,"name":"A`Ana"}
]
The same code then worked for me; fromJSON() is looking for a single JSON value, such as an array.
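If you'd rather not edit the data file by hand, a small sketch that builds the same array in code (same states.txt as above):
library(jsonlite)

lines <- readLines("states.txt")
# Join the one-object-per-line records into a single JSON array
states <- fromJSON(paste0("[", paste(lines, collapse = ","), "]"))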
Your file is newline-delimited JSON (http://ndjson.org/). You can read it with jsonlite like this:
try <- stream_in(file("states.txt"))
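stream_in() parses the records incrementally and simplifies them into a single data frame, so no manual looping is needed; a quick check of the result (shape inferred from the five sample records above):
library(jsonlite)

states <- stream_in(file("states.txt"), verbose = FALSE)
nrow(states)   # 5
names(states)  # "id" "name"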
I have a JSON file I'm working with that contains multiple JSON objects. R is unable to read the file as a whole, but since each object occurs at regular intervals, I would like to iteratively read a fixed number of lines into R.
There are a number of SO questions on reading single lines into R, but I have been unable to extend those solutions to a fixed number of lines. For my problem I need to read 16 lines into R at a time (e.g. 1-16, 17-32, etc.).
I have tried using a loop but can't seem to get the syntax right:
## File
file <- "results.json"
## Create connection
con <- file(description=file, open="r")
## Loop over a file connection
for(i in 1:1000) {
tmp <- scan(file=con, nlines=16, quiet=TRUE)
data[i] <- fromJSON(tmp)
}
The file contains over 1000 objects of this form:
{
"object": [
[
"a",
0
],
[
"b",
2
],
[
"c",
2
]
]
}
With @tomtom's inspiration I was able to find a solution.
## File
file <- "results.json"
## Loop over a file
for (i in 1:1000) {
  tmp <- paste(scan(file = file, what = "character", sep = "\n",
                    nlines = 16, skip = (i - 1) * 16, quiet = TRUE),
               collapse = " ")
  assign(x = paste("data", i, sep = "_"), value = fromJSON(tmp))
}
I couldn't use a connection: each time I tried, the connection would close before the file had been completely read, so I got rid of that step.
I had to include the what="character" argument, as scan() expects numeric input by default.
I included sep="\n", paste() and collapse=" " to create a single string rather than the character vector that scan() returns by default.
Finally, I changed the last assignment to use assign(), to have a bit more control over the names of the output.
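If a thousand separate data_i variables are awkward to work with downstream, here is a variant of the same approach that collects everything in a list instead of using assign() (a sketch, keeping the question's file name and object count):
library(jsonlite)

file <- "results.json"
results <- vector("list", 1000)
for (i in seq_along(results)) {
  tmp <- paste(scan(file = file, what = "character", sep = "\n",
                    nlines = 16, skip = (i - 1) * 16, quiet = TRUE),
               collapse = " ")
  results[[i]] <- fromJSON(tmp)
}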
This might help:
EDITED to make it use a list and Reduce the results into one object
## Loop over a file connection
library(jsonlite)
con <- file("results.json", open = "r")
data <- list()
for (i in 1:1000) {
  # scan() needs what = "character" for JSON text; the open connection
  # advances 16 lines per iteration, so no skip is required
  tmp <- paste(scan(file = con, what = "character", sep = "\n",
                    nlines = 16, quiet = TRUE), collapse = " ")
  data[[i]] <- fromJSON(tmp)
}
close(con)
df <- Reduce(function(x, y) paste(x, y, collapse = " "), data)
You would have to make sure that you don't read past the end of the file, though ;-)
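One way to avoid that is to stop as soon as scan() returns nothing; a sketch under the same assumptions (16 lines per object, file results.json):
library(jsonlite)

con <- file("results.json", open = "r")
data <- list()
repeat {
  chunk <- scan(file = con, what = "character", sep = "\n",
                nlines = 16, quiet = TRUE)
  if (length(chunk) == 0) break  # reached end of file
  data[[length(data) + 1]] <- fromJSON(paste(chunk, collapse = " "))
}
close(con)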
I am trying to access the JSON response from an API call in my R script. The API call is successful, and I can view the JSON response in the console. However, I am unable to access any data from it.
A sample code segment is:
require(httr)
target <- '#trump'
sentence<- 'Donald trump has a wonderful toupe, it really is quite stunning that a man can be so refined and elegant'
query <- url_encode(sentence)
target <- gsub('#', '', target)
endpoint <- "https://alchemy.p.mashape.com/text/TextGetTargetedSentiment?outputMode=json&target="
apiCall <- paste(endpoint, target, '&text=', query, sep = '')
resp <-GET(apiCall, add_headers("X-Mashape-Key" = sentimentKey, "Accept" = "application/json"))
stop_for_status(resp)
headers(resp)
str(content(resp))
content(resp, "text")
I followed the examples in the httr quickstart guide from CRAN, as well as a related Stack Overflow question.
Unfortunately, I keep getting either "unused parameters 'text' in content()" or "no definition exists for content() accepting a class of 'response'". Does anyone have any advice? P.S. The headers will print, and resp$content will print the raw byte stream.
Expanding on the comment, you need to set the content type explicitly in the call to content(...). Since your code is not reproducible, here is an example using the Census Bureau's geocoder (which returns a JSON response).
library(httr)
url <- "http://geocoding.geo.census.gov/geocoder/locations/onelineaddress"
resp <- GET(url, query = list(address = "1600 Pennsylvania Avenue, Washington DC",
                              benchmark = 9,
                              format = "json"))
json <- content(resp, type="application/json")
json$result$addressMatches[[1]]$coordinates
# $x
# [1] -77.038025
#
# $y
# [1] 38.898735
Assuming you are actually getting a JSON response, and that it is well-formed, simply using content(resp, type="application/json") should work.
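Alternatively, you can take the body as text and parse it yourself; a short sketch assuming the same resp object from the question:
library(httr)
library(jsonlite)

txt <- content(resp, as = "text", encoding = "UTF-8")  # raw JSON string
json <- fromJSON(txt)                                  # parsed into R structures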
I am trying to read a very huge JSON file using R, and I am using the rjson library with this command: json_data <- fromJSON(paste(readLines("myfile.json"), collapse=""))
The problem is that I am getting this error message
Error in paste(readLines("myfile.json"), collapse = "") :
could not allocate memory (2383 Mb) in C function 'R_AllocStringBuffer'
Can anyone help me with this issue?
Well, just sharing my experience of reading JSON files. Reading 52.8 MB, 19.7 MB, 1.3 GB, 93.9 MB, and 158.5 MB JSON files cost me 30 minutes and ultimately auto-restarted my R session. After that I tried to apply parallel computing so I could see the progress, but it failed:
https://github.com/hadley/plyr/issues/265
And then I tried adding the parameter pagesize = 10000; it works and is more efficient than ever. We only need to read the file once and can later save it in RData/Rda/Rds format via saveRDS().
> suppressPackageStartupMessages(library('BBmisc'))
> suppressAll(library('jsonlite'))
> suppressAll(library('plyr'))
> suppressAll(library('dplyr'))
> suppressAll(library('stringr'))
> suppressAll(library('doParallel'))
>
> registerDoParallel(cores=16)
>
> ## https://www.kaggle.com/c/yelp-recsys-2013/forums/t/4465/reading-json-files-with-r-how-to
> ## https://class.coursera.org/dsscapstone-005/forum/thread?thread_id=12
> fnames <- c('business','checkin','review','tip','user')
> jfile <- paste0(getwd(),'/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_',fnames,'.json')
> dat <- llply(as.list(jfile), function(x) stream_in(file(x),pagesize = 10000),.parallel=TRUE)
> dat
list()
> jfile
[1] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_business.json"
[2] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_checkin.json"
[3] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json"
[4] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_tip.json"
[5] "/home/ryoeng/Coursera-Data-Science-Capstone/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_user.json"
> dat <- llply(as.list(jfile), function(x) stream_in(file(x),pagesize = 10000),.progress='=')
opening file input connection.
Imported 61184 records. Simplifying into dataframe...
closing file input connection.
opening file input connection.
Imported 45166 records. Simplifying into dataframe...
closing file input connection.
opening file input connection.
Found 470000 records...
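The read-once-then-cache pattern mentioned above might look like this (the .rds file name is a placeholder):
library(jsonlite)

dat <- stream_in(file("yelp_academic_dataset_review.json"), pagesize = 10000)
saveRDS(dat, "review.rds")    # cache the parsed data frame once
dat <- readRDS("review.rds")  # later sessions reload it in seconds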
I got the same problem while working with huge datasets in R. I used the jsonlite package to read the JSON, with the following code:
library(jsonlite)
get_tweets <- stream_in(file("tweets.json"), pagesize = 10000)
Here tweets.json is my file name (including its location), and pagesize represents how many lines are read in one iteration. Hope it helps.
For some reason the above solutions all caused R to terminate or worse.
This solution worked for me, with the same data set:
library(jsonlite)
file_name <- 'C:/Users/Downloads/yelp_dataset/yelp_dataset~/dataset/business.JSON'
business <- jsonlite::stream_in(textConnection(readLines(file_name, n = 100000)), verbose = FALSE)
Took about 15 minutes
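If the file holds more than the n = 100000 lines read above, one could also page through the whole file with stream_in()'s handler argument and stitch the pages together with rbind_pages() (both are jsonlite functions; this is just a sketch):
library(jsonlite)

pages <- list()
stream_in(file(file_name), handler = function(df) {
  pages[[length(pages) + 1]] <<- df  # collect each page of records
}, pagesize = 10000, verbose = FALSE)
business <- rbind_pages(pages)       # combine pages into one data frame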
I have several functions that I am trying to implement in R (RStudio). I will show the simplest one. I am trying to append names onto a vector for later use as column names.
# Initialize
headerA <- vector(mode="character",length=20)
headerA[1]="source";headerA[2]="matches"
# Function - add on new name
h <- function(df, compareA, compareB) {
  new_header <- paste(compareA, "Vs", compareB, sep = "_")
  data.frame(df, new_header)
}
# Comparison 1:
compareA <-"AA"
compareB <-"BB"
headers <- (headerA, compareA, compareB)
But I am getting this error and it is very puzzling. I have googled it but the search is too vague/broad.
When run I get:
headers <- (headerA, compareA, compareB)
Error: unexpected ',' in "headers <- (headerA,"
The second error for the other function is similar...
It looks like you're missing a call to your function h and just have an open ( instead:
headers <- h(headerA, compareA, compareB)
Results in:
df new_header
1 source AA_Vs_BB
2 matches AA_Vs_BB
3 AA_Vs_BB
4 AA_Vs_BB
...
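If the goal really is a growing character vector of header names (rather than a data frame), a small sketch of that variant:
# Append a new comparison name onto an existing vector of headers
h <- function(headers, compareA, compareB) {
  c(headers, paste(compareA, "Vs", compareB, sep = "_"))
}

headerA <- c("source", "matches")
headerA <- h(headerA, "AA", "BB")
headerA
# [1] "source"   "matches"  "AA_Vs_BB"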