Error with jsonlite package in R - html

Has anyone ever received this error when trying to web scrape a site:
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: invalid char in json text.
<!doctype html><html xmlns="htt
(right here) ------^
I do not understand why I am receiving this error, since I scraped the first page of the site with no problem and that page has the same declaration. On the second page, though, I get this error. Is there a way around this?
This works fine:
jsonlite::fromJSON("https://www.reddit.com/r/BestOfStreamingVideo/.json", flatten = TRUE)
Get the error here:
jsonlite::fromJSON("https://www.reddit.com/r/BestOfStreamingVideo/?count=25&after=t3_5fvgls/.json", flatten = TRUE)

The latter one does not return JSON. It returns HTML. Enter both URLs in the browser and you'll see the difference.
I guess the URL you are looking for is:
https://www.reddit.com/r/BestOfStreamingVideo/.json?count=25&after=t3_5fvgls/
You need to put the /.json first, at the end of the path, and append the URL parameters after it.
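A minimal sketch of that ordering, assuming a hypothetical helper named reddit_json_url (not part of jsonlite or any Reddit client). It only builds the string; the network call is left commented out.

```r
# Hypothetical helper: put ".json" at the end of the path, then add the
# query string after it.
reddit_json_url <- function(subreddit, params = list()) {
  base <- paste0("https://www.reddit.com/r/", subreddit, "/.json")
  if (length(params) == 0) {
    return(base)
  }
  # Join name=value pairs with "&"
  query <- paste0(names(params), "=", unlist(params), collapse = "&")
  paste0(base, "?", query)
}

url <- reddit_json_url("BestOfStreamingVideo",
                       list(count = 25, after = "t3_5fvgls"))
print(url)

# page2 <- jsonlite::fromJSON(url, flatten = TRUE)  # needs network access
```

If the response is still HTML, opening the same URL in a browser is a quick way to confirm what the server actually returns.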

Related

Problems importing JSON data with R

I'm using the jsonlite library to try to get some JSON data and turn it into a data frame that I could use, but so far it hasn't worked. I've tried three methods so far:
testData = fromJSON("<json file location>")
Which outputs:
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
parse error: trailing garbage
n_reply_to_status_id":null} {"contributors":null,"text":"Te
(right here) ------^
So I figured that if the quotes caused error, I'd just have to remove them:
singleString <- paste(readLines("<json file location>"), collapse=" ")
singleString = gsub('"', "",singleString)
singleString = fromJSON(singleString)
Which outputs:
Error: lexical error: invalid char in json text.
{contributors:null,text:A colour
(right here) ------^
It seems to be pointing to an 'o' in the text. If I delete that 'o' it just points to the 'n' that comes after it.
For my last attempt, I used something I read in a forum post by someone with a similar problem, which was supposed to solve it:
stream = stream_in(file("<json file location>"))
I was pretty happy to see that this did not return an error and that a data frame named "stream" had been added to memory, but when I tried to view that data frame...
> View(stream)
Error in View : 'names' attribute [1] must be the same length as the vector [0]
I'm not exactly sure what this means or what I could do about it in this situation.
If you have an idea of how I could get this data loaded correctly, I would deeply appreciate it. Thanks!
EDIT: Here is the JSON data in question: https://gist.github.com/geocachecs/f68d769aeed8e019a26cc230559bbf7f

R - JSON Returned from RCurl::getURL has Special Characters that make it invalid with fromJSON

How can I make sure that the result of my getURL() call is properly formatted to be parsed using fromJSON()?
Details
If I take the string for the API URL and paste that into Chrome, then copy and paste out the resulting JSON, RJSONIO::fromJSON() will parse it. However, if I pass the variable test, as in my code below, to fromJSON(), I get this error:
Error in fromJSON(content, handler, default.size, depth, allowComments, :
invalid JSON input
In going through the differences between the two, I found some issues encoding escaped character sequences such as "\\\"\\\\\\\"" which I am able to search for and replace. However there are some other things, where for example, the broken JSON will show " " while the working JSON will show "\u00A0".
library(RJSONIO)
library(RCurl)
apiURL <- "<API URL>" # sorry, I can't post the actual URL due to company security policies
test<-getURL(apiURL,userpwd="myusername:mypassword",httpauth=1L)
transcripts1 <- fromJSON(test)

JSON parsing error, invalid character

I am using fromJSON from the jsonlite package in [R] to call GetPlayerSummaries from the Steam API (https://developer.valvesoftware.com/wiki/Steam_Web_API) to get access to a user's data. For most calls it's working fine, but at some point I get an error:
Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
lexical error: invalid bytes in UTF8 string.
publicâ„¢ II: The Sith Lordsâ", "gameid": "208580" },
(right here) ------^
When I access the call in my browser I find a � on the spot where it is probably giving the error. I could Try-Catch but I'd really like to get this data. How to get around this?
For my purpose, reading with readLines and then parsing the result seemed to work:
readlines <- readLines(link, warn = FALSE)
parse <- fromJSON(readlines)
I have no idea why or how this works, so it may not be the cleanest solution, but it seems to be robust for my purposes.
You have to use jsonlite's streaming function:
json_file <- stream_in(file("abc.json"))
It has been answered in Stack Overflow here:
Error parsing JSON file with the jsonlite package
and here:
Export JSON from Spark and input into R
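The streaming approach above applies when a file holds one JSON record per line, which is also what produces the "trailing garbage" error in fromJSON(). A minimal sketch, assuming made-up sample records written to a temporary file:

```r
library(jsonlite)

# Two JSON records, one per line; the values are invented for illustration.
ndjson <- c('{"id": 1, "text": "first"}',
            '{"id": 2, "text": "second"}')

path <- tempfile(fileext = ".json")
writeLines(ndjson, path)

# stream_in() parses each line as its own record and binds the records
# into a data frame. It opens and closes the file connection itself.
records <- stream_in(file(path), verbose = FALSE)
unlink(path)

print(records)
```

fromJSON() expects a single JSON document, so it stops at the start of the second record, whereas stream_in() treats every line as a separate document.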

how to prevent \r\n and the likes in JSON string

I am getting text from MySQL via a Laravel model and converting the content to JSON. Unfortunately the text contains new lines and carriage returns. Is there any way to get the contents properly escaped?
Article::find(1)->toJSON();
Gives me an Ajax/JSON error in my view so I was wondering where the problem is. Am I either storing the content the wrong way or am I retrieving it the wrong way?
Thanks.
This is the JSON string I am getting for a test article:
{"id":22,"short_title":"Another test article","long_title":"Longer title for the test article mention in the short title","description":"This article describes a computer classroom where strange things happen and stuf","content":"This is a test article and I think this will generate the error I am looking for. <\/p>\r\n\r\n Maybe or maybe not. <\/p>\r\n","deleted_at":null,"created_at":"2014-04-25 09:10:45","updated_at":"2014-04-25 09:10:45","category_id":1,"zenra_link":"","published_on":null,"published":1,"source_url":"http:\/\/blog.livedoor.jp\/morisitakurumi\/archives\/51856280.html","source_title":"Some source article","source_date":null,"slug":"another-test-article","view_count":0}
The content portion is generated by a textarea that's running CKEditor and it gets saved to a MySQL MEDIUMTEXT field.
Next thing I do is that I inject this into my view to populate my backbone views like this:
<script>var initialData = JSON.parse('{{ $cJson }}');</script>
And that's where the console tells me: Uncaught SyntaxError: Unexpected Token. As suggested I tested the above string on jsonlint.com and it comes back as valid.
I figured out the problem:
<script>var initialData = JSON.parse('{{ $cJson }}');</script>
This is where the error happens. JSON.parse freaks out on the string I supply. If I remove JSON.parse it works like this:
<script>var initialData = {{ $cJson }};</script>
No more unexpected token errors. I was on the wrong track with the problem because I read in so many places that carriage returns and new lines produce errors in the JSON format.
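One likely explanation, offered here as an assumption and not something the post confirms: inside a single-quoted JavaScript string literal, the \r\n escapes are turned into real carriage-return and line-feed characters before JSON.parse sees them, and raw control characters are not allowed inside JSON strings. The same JSON rule can be checked in R with jsonlite; the sample strings below are made up.

```r
library(jsonlite)

# In R source, "\\r\\n" is a backslash escape that reaches the parser as
# the JSON text \r\n, while "\r\n" is a real carriage return + line feed.
escaped <- '{"content":"line one\\r\\nline two"}'
raw     <- '{"content":"line one\r\nline two"}'

escaped_ok <- validate(escaped)  # JSON escapes inside the string
raw_ok     <- validate(raw)      # raw control characters inside the string

print(as.logical(escaped_ok))
print(as.logical(raw_ok))
```

The escaped form is what toJSON() produced in the question, which is why jsonlint accepted it; the raw form is what the string becomes after a string literal has processed the escapes.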

Parsing Facebook Json results

I'm trying to use the following code to parse a response from the Facebook API, and I get a strange error... below is the code and error... Any thoughts are greatly appreciated!
Link to code: How to get most popular Facebook post in R
The error I get is:
Error in myposts[[1]]$paging$"next" : $ operator is invalid for atomic vectors
Here is what the data looks like:
tail(myposts[[1]])
results in:
$paging
previous
"https://graph.facebook.com/74133697733/posts?access_token=###token###&limit=25&since=1361736602&__previous=1"
next
"https://graph.facebook.com/74133697733/posts?access_token=###token###&limit=25&until=1359727199"
It's because the paging element is a named vector, not a list, so you can't use $ to get sub-elements.
The following should work:
myposts[[1]]$paging["next"]
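A small sketch of the difference, using a stand-in object named post with placeholder URLs in place of the real Graph API response:

```r
# Stand-in for one element of the response: `paging` is a named character
# vector, not a list. The URLs are placeholders.
post <- list(paging = c("previous" = "https://example.com/prev",
                        "next"     = "https://example.com/next"))

# `$` only works on lists and other recursive objects, so this fails.
dollar_result <- try(post$paging$"next", silent = TRUE)

next_named <- post$paging["next"]    # length-1 vector that keeps its name
next_url   <- post$paging[["next"]]  # bare string with the name dropped

print(next_named)
print(next_url)
```

Single brackets are enough for passing the URL on to another request; double brackets are handy when the name attribute would get in the way.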