I've been trying to import a JSON file into R for a while using both the rjson and RJSONIO packages, but I can't get it to work. One variation of the code I used:
json_file <- "http://toolserver.org/~emw/index.php?c=rawdata&m=get_traffic_data&p1=USA&project1=en&from=12/10/2007&to=4/1/2011"
json_data <- fromJSON(paste(readLines(json_file), collapse=""))
This results in an error message:
Error in fromJSON(paste(readLines(json_file), collapse = "")) :
unexpected character '<'
I think the problem is in the first line of code, because json_file seems to contain the HTML source of the page, not the actual JSON content. I've tried getURL() and getURLContent(), but without any success. Any help would be much appreciated!
Edit: as pointed out by Martin Morgan, the problem seems to be with the URL, not the code!
library(rjson)
fromJSON(readLines('http://toolserver.org/~emw/index.php?c=rawdata&m=get_traffic_data&p1=USA&project1=en&from=12/10/2007&to=4/1/2011')[1])
works for me, with a warning
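For comparison, jsonlite can fetch and parse the endpoint in one step (a minimal sketch, assuming the URL still serves JSON):
library(jsonlite)
# fromJSON() accepts a URL and reads from it directly
json_data <- fromJSON("http://toolserver.org/~emw/index.php?c=rawdata&m=get_traffic_data&p1=USA&project1=en&from=12/10/2007&to=4/1/2011")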
So, I'm using the following code to get pandas to read my JSON text file:
import json
import pandas as pd

f = open('C:/Users/stans/WFH Project/data.json')
data = json.load(f)
df = pd.DataFrame(data, index=[0])
f.close()
Once I execute the cell, I get
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position
1535: character maps to <undefined>
I used the above code for a smaller sample of JSON data and it worked. But since I updated the file to include a much larger sample, I get that error.
I verified that the JSON format is correct, and I also tried the following in the open statement:
encoding='utf-8'
and
errors='ignore'
Both produced value errors. Any ideas? Thanks in advance for your help!
I am using the Factiva package ‘tm.plugin.factiva’ to import html files containing a Factiva search. It has worked beautifully so far, but now I have a problem with importing data and constructing a corpus from several html files (350 in total). I cannot figure out how to write a loop to iterate the simple step-by-step import code I have used before.
Earlier, with a smaller sample, I managed to import the html files in a step-by-step process:
library(R.temis)
library(tm)
library(tm.plugin.factiva)
# Import corpus
source1 <- FactivaSource("Factiva1.html")
source2 <- FactivaSource("Factiva2.html")
source3 <- FactivaSource("Factiva3.html")
corp_source1 <- Corpus(source1, list(language=NA))
corp_source2 <- Corpus(source2, list(language=NA))
corp_source3 <- Corpus(source3, list(language=NA))
full_corpus <- c(corp_source1, corp_source2, corp_source3)
However, this is obviously not an option for the 350 html files. I have tried writing a loop for the import:
# Import corpus
files <- list.files(my_path)
for (i in files){
source <- FactivaSource(i)
}
tech_corpus <- Corpus(source, list(language=NA))
And:
htmlFiles <- Sys.glob("Factiva*.html")
for (k in 1:lengths(htmlFiles[[k]])){
source <- FactivaSource(htmlFiles[[k]])
}
But both of these only read the first html file into the source, not the rest.
I have also tried:
for (k in seq_along(htmlFiles)){
source <- FactivaSource(htmlFiles[1:k], encoding = "UTF-8", format = c("HTML"))
}
But then I get the error:
Error: x must be a string of length 1
I have tried manipulating htmlFiles into a list (html_list <- as.list(htmlFiles)), but with no change in the result.
The first two loops did run, but only read the first html file. I got the same result when I tried looping the corpus construction as well.
for (m in 1:lengths(htmlFiles)){
corp_source <- Corpus(htmlFiles[[m]], list(language=NA))
}
This worked, but again only for the first html file, and it produced the warning:
In 1:lengths(htmlFiles) :
numerical expression has 5 elements: only the first used
I would highly appreciate any help to understand how to get around this issue. Ideally, a loop to repeat the step-by-step process I did in the beginning would be super, as it seems to me that neither FactivaSource() nor Corpus() likes the complications I have made here - but I am far from an expert. Any help will be highly appreciated!
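One way to write the loop (a minimal sketch, assuming all the Factiva html files sit in the working directory) is to build one corpus per file with lapply() and then concatenate them, mirroring the step-by-step code above:
library(tm)
library(tm.plugin.factiva)
htmlFiles <- Sys.glob("Factiva*.html")
# One source and one corpus per file; collecting the results in a list
# avoids overwriting `source` on every pass, which is why the for-loops
# above ended up with a single file
corpora <- lapply(htmlFiles, function(f) Corpus(FactivaSource(f), list(language=NA)))
# Concatenate the individual corpora, as c() did in the step-by-step version
full_corpus <- do.call(c, corpora)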
I am trying to import a JSON file in R. I have installed the necessary packages and libraries, but I keep getting an error:
Error: lexical error: invalid string in json text.
temp.jsonl
(right here) ------^
My code is below:
install.packages("rjson")
install.packages("RJSONIO")
library(rjson)
library(jsonlite)
library(RJSONIO)
json_data_raw <- fromJSON("temp.jsonl")
Thanks
To read lines of JSON you can use the jsonlite::stream_in function.
df <- jsonlite::stream_in(file("temp.jsonl"))
Reference
The jsonlite stream_in and stream_out functions implement line-by-line processing of JSON data over a connection, such as a socket, url, file or pipe
JSON streaming in R
# readLines() gets one JSON object per line; paste() joins them with commas,
# and sprintf() wraps the result in brackets to form a single JSON array
dat <- fromJSON(sprintf("[%s]", paste(readLines(filepath), collapse=",")))
Was able to fix the issue and create the data frame in R. Thanks, guys.
I have a large .json file and I only want to read in a part of it.
I tried the following solutions but they didn't work:
yelp <- stream_in(file("yelp_academic_dataset_review.json"), paigesize = 500)
yelp <- stream_in(file("yelp_academic_dataset_review.json"), nrows = 500)
Anyone know how it works?
First off, it's always helpful to provide the packages you are using; in your case, jsonlite.
One solution is parsing the data file (as a .txt file) prior to streaming it in.
yelp <- readLines("yelp_academic_dataset_review.json")[1:500]
yelp <- stream_in(textConnection(gsub("\\n", "", yelp)))
I'm assuming your file is local?
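A small variant of the same idea (a sketch, assuming a local, newline-delimited file): passing n to readLines() stops after 500 lines, so the whole file never has to be read just to keep the first part.
library(jsonlite)
con <- file("yelp_academic_dataset_review.json", "r")
first500 <- readLines(con, n = 500)  # stop reading after 500 lines
close(con)
yelp <- stream_in(textConnection(first500))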
I have had success with actual piping/streaming of JSON in the past, i.e., from the command line:
cat x.json | parse_json.py
Then you write your Python script:
import json
import sys

for line in sys.stdin:
    try:
        js_line = json.loads(line.rstrip())
        # do something with js_line['x']['y']
    except ValueError:
        # skip lines that fail to parse
        pass
I'm not sure why you want to use stream_in, but this somewhat manual approach can be effective.
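A rough R analogue of the same line-by-line idea (a sketch, assuming one JSON object per line arrives on stdin, e.g. cat x.json | Rscript parse_json.R, where parse_json.R is a hypothetical script name):
library(jsonlite)
con <- file("stdin", "r")
while (length(line <- readLines(con, n = 1)) > 0) {
  # parse one line at a time; skip lines that fail to parse
  js_line <- tryCatch(fromJSON(line), error = function(e) NULL)
  if (!is.null(js_line)) {
    # do something with js_line
  }
}
close(con)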
I use this code for extracting lines 1400001 to 1450000 of the yelp data:
setwd("d:/yelp_dataset")
rm(list = ls())
library(jsonlite)
rev <- 'd:/yelp_dataset/review.JSON'
# read the file, keep only the wanted slice of lines, then stream them in
revu <- jsonlite::stream_in(textConnection(readLines(rev)[1400001:1450000]), verbose = FALSE)
For my PhD project I need to read a JSONL file into R (the extension isn't json, it's jsonl) and transform it into a csv.
I tried to use this code based on jsonlite but it gives me an error:
library(jsonlite)
data <- "/Users/chiarap/tweets.jsonl"
dat <- fromJSON(sprintf("[%s]", paste(readLines(data), collapse=",")))
Error: parse error: unallowed token at this point in JSON text
EFEF","notifications":null}},,,{"in_reply_to_status_id_str":
(right here) ------^
Thanks
If you have a large file, pasting all of the rows together may result in errors. You can process each line separately and then combine them into a data frame.
library(jsonlite)
library(dplyr)
lines <- readLines("myfile.jsonl")   # one JSON object per line
lines <- lapply(lines, fromJSON)     # parse each line separately
lines <- lapply(lines, unlist)       # flatten nested fields into named vectors
x <- bind_rows(lines)                # combine the rows into one data frame
Then x is a data frame that you can continue to work with or write to file.
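Since the goal is a csv, a one-line follow-up (the file name here is just an example) writes the combined data frame out:
write.csv(x, "tweets.csv", row.names = FALSE)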