R JSON UTF-8 parsing

I have a problem parsing a JSON file with Russian (Cyrillic) text in R. The file looks like this:
[{"text": "Валера!", "type": "status"}, {"text": "когда выйдет", "type": "status"}, {"text": "КАК ДЕЛА?!)", "type": "status"}]
and it is saved in UTF-8 encoding. I tried the libraries rjson, RJSONIO and jsonlite to parse it, but none of them work:
library(jsonlite)
allFiles <- fromJSON(txt="ru_json_example_short.txt")
gives me the error
Error in feed_push_parser(buf) :
lexical error: invalid char in json text.
[{"text": "Валера!", "
(right here) ------^
When I save the file in ANSI encoding, parsing works, but the Russian letters are turned into question marks, so the output is unusable.
Does anyone know how to parse such JSON file in R, please?
Edit: The above applies to the UTF-8 file saved in Windows Notepad. When I save it in PSPad and then parse it, the result looks like this:
text type
1 <U+0412><U+0430><U+043B><U+0435><U+0440><U+0430>! status
2 <U+043A><U+043E><U+0433><U+0434><U+0430> <U+0432><U+044B><U+0439><U+0434><U+0435><U+0442> status
3 <U+041A><U+0410><U+041A> <U+0414><U+0415><U+041B><U+0410>?!) status

Try the following:
dat <- fromJSON(sprintf("[%s]",
                        paste(readLines("./ru_json_example_short.txt"),
                              collapse = ",")))
dat
[[1]]
text type
1 Валера! status
2 когда выйдет status
3 КАК ДЕЛА?!) status
ref: Error parsing JSON file with the jsonlite package
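One possible cause of the "invalid char" error, which the question does not confirm, is the byte order mark (BOM, bytes EF BB BF) that older versions of Windows Notepad write at the start of files saved as UTF-8. The minimal Python sketch below uses made-up sample bytes and illustrative helper names; it shows that a JSON parser rejects a leading BOM and that decoding with the utf-8-sig codec strips it:

```python
import json

# Sample data (assumption): a UTF-8 BOM followed by a Cyrillic JSON array,
# mimicking a file saved as "UTF-8" by an older Windows Notepad.
raw = b"\xef\xbb\xbf" + '[{"text": "Валера!", "type": "status"}]'.encode("utf-8")

def parse_with_plain_utf8(data: bytes):
    """Decode as plain UTF-8: the BOM survives as U+FEFF and json.loads rejects it."""
    try:
        return json.loads(data.decode("utf-8"))
    except json.JSONDecodeError as err:
        return f"error: {err.msg}"

def parse_with_utf8_sig(data: bytes):
    """Decode with utf-8-sig, which drops a leading BOM if one is present."""
    return json.loads(data.decode("utf-8-sig"))

print(parse_with_plain_utf8(raw))
print(parse_with_utf8_sig(raw))
```

In R, the analogous step would be opening the file connection with encoding = "UTF-8-BOM" (documented under ?file), which removes a BOM when reading.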

Related

LuaLaTeX using fontspec package and luacode reading JSON file

I have been using LaTeX for years, but I'm new to embedded luacode (with LuaLaTeX). Below is a simplified example:
\begin{filecontents*}{data.json}
[
{"firstName":"Max", "lastName":"Möller"},
{"firstName":"Anna", "lastName":"Smith"}
];
\end{filecontents*}
\documentclass[11pt]{article}
\usepackage{fontspec}
%\setmainfont{Carlito}
\usepackage{tikz}
\usepackage{luacode}
\begin{document}
\begin{luacode}
require("lualibs.lua")
local file = io.open('data.json','rb')
local jsonstring = file:read('*a')
file:close()
local jsondata = utilities.json.tolua(jsonstring)
tex.print('\\begin{tabular}{cc}')
for key, value in pairs(jsondata) do
tex.print(value["firstName"] .. ' & ' .. value["lastName"] .. '\\\\')
end
tex.print('\\hline\\end{tabular}')
\end{luacode}
\end{document}
When executing Lualatex following error occurs:
LuaTeX error [\directlua]:6: attempt to index field 'json' (a nil value) [\directlua]:6: in main chunk. \end{luacode}
When I comment out the line \usepackage{fontspec}, the output is produced. Alternatively, the error can be avoided by commenting out utilities.json.tolua(jsonstring) and all following Lua lines.
So the question is: how can I use both the fontspec package and JSON data without getting an error? I also have a second question: how can I get German umlauts in the output of luacode (see the first "lastName" in the example: Möller)?
I'm using TeX Live 2015/Debian on Ubuntu 16.04.
Thank you,
Jerome

JSON (using jsonlite) parsing error in R

I have the following JSON file:
{"id":1140854908,"name":"'Amran"}
{"id":1140852651,"name":"'Asir"}
{"id":1140855190,"name":"'Eua"}
{"id":1140851307,"name":"A Coruna"}
{"id":1140854170,"name":"A`Ana"}
I used the jsonlite package, but I get a parse error:
library(jsonlite)
try <- fromJSON("states.txt",simplifyDataFrame = T)
# Error in feed_push_parser(readBin(con, raw(), n), reset = TRUE) :
# parse error: trailing garbage
# :1140854908,"name":"'Amran"} {"id":1140852651,"name":"'Asir"
# (right here) ------^
Try changing your data file to the following:
[
{"id":1140854908,"name":"'Amran"}
,{"id":1140852651,"name":"'Asir"}
,{"id":1140855190,"name":"'Eua"}
,{"id":1140851307,"name":"A Coruna"}
,{"id":1140854170,"name":"A`Ana"}
]
With that change, the same code worked for me: fromJSON() expects a single JSON value, such as one array, not several objects in a row.
Your file is newline-delimited JSON (NDJSON, http://ndjson.org/). You can read it with jsonlite like this:
try <- stream_in(file("states.txt"))
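Both answers rest on the same point: a JSON parser expects exactly one value per document, while NDJSON holds one value per line. The minimal Python sketch below (sample data copied from the question, helper names illustrative) contrasts the two workarounds: parsing each line on its own, which is the idea behind stream_in(), or joining the lines into one array, as the first answer does by editing the file:

```python
import json

# Newline-delimited JSON: one complete object per line (from the question).
ndjson_text = """{"id":1140854908,"name":"'Amran"}
{"id":1140852651,"name":"'Asir"}
{"id":1140855190,"name":"'Eua"}
{"id":1140851307,"name":"A Coruna"}
{"id":1140854170,"name":"A`Ana"}"""

def parse_per_line(text: str) -> list:
    """Parse each non-empty line as its own JSON value."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def parse_as_array(text: str) -> list:
    """Join the lines with commas and wrap them in [...] to form one JSON array."""
    lines = [line for line in text.splitlines() if line.strip()]
    return json.loads("[" + ",".join(lines) + "]")

def parse_whole(text: str):
    """Parse the whole text as a single value; fails at the second object."""
    return json.loads(text)

records = parse_per_line(ndjson_text)
assert records == parse_as_array(ndjson_text)
print(len(records), records[0]["name"])
```

Calling parse_whole(ndjson_text) raises json.JSONDecodeError with the message "Extra data", Python's counterpart to jsonlite's "trailing garbage".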

Convert tweets into BSON using twitteR and rmongo library

Since the streamR connection to the Twitter API no longer works, I am trying to convert the output of the searchTwitter function (from twitteR) into BSON before inserting it into a MongoDB database.
test.tweets = searchTwitter("mongodb", n=10, lang="en")
class(test.tweets)
test.text=laply(test.tweets,function(t) t$getText())
class(toJSON(test.text))
bson <- mongo.bson.from.JSON(test.text)
R returns an error: "Error in mongo.bson.from.JSON(test.text) : Not a valid JSON content:..."
How can I fix this conversion, or is there another solution?
Thank you
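This works: mongo.bson.from.JSON() expects JSON text, so serialise the character vector with toJSON() first and pass that result instead of the raw vector: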
library(rmongodb)
library(jsonlite)
test.text <- c("A tweet", "Another tweet")
(bson <- mongo.bson.from.JSON(toJSON(test.text)))
# 1 : 2 A tweet
# 2 : 2 Another tweet
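The underlying distinction is between plain text and JSON text: a tweet string such as "A tweet" is not a JSON document until it has been serialised. A short Python sketch of that distinction, with json.dumps standing in for jsonlite's toJSON() as an analogy only (this is not the rmongodb API, and is_valid_json is an illustrative helper):

```python
import json

tweets = ["A tweet", "Another tweet"]  # stand-ins for the tweet texts

def is_valid_json(text: str) -> bool:
    """Return True if text parses as a single JSON value."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

serialised = json.dumps(tweets)  # '["A tweet", "Another tweet"]'
print(is_valid_json(tweets[0]))   # False: raw text is not JSON
print(is_valid_json(serialised))  # True: the serialised array is
```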

JSONDecodeError: Expecting value: line 1 column 1

I am receiving this error in Python 3.5.1.
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Here is my code:
import json
import urllib.request
connection = urllib.request.urlopen('http://python-data.dr-chuck.net/comments_220996.json')
js = connection.read()
print(js)
info = json.loads(str(js))
If you look at the output you receive from print() and also in your Traceback, you'll see the value you get back is not a string, it's a bytes object (prefixed by b):
b'{\n "note":"This file .....
If you fetch the URL using a tool such as curl -v, you will see that the content type is
Content-Type: application/json; charset=utf-8
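So it is JSON encoded as UTF-8, and read() returns it as a bytes object rather than a str. Calling str(js) does not decode those bytes: it returns their printable representation, which starts with b', and b is not a valid first character for JSON, so parsing fails at char 0. To parse the data, decode it into a string first.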
Change the last line of code to this:
info = json.loads(js.decode("utf-8"))
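A minimal offline sketch of both versions, using a hard-coded bytes value instead of the URL (the payload and helper names are made up for illustration). Note that from Python 3.6 onward json.loads also accepts bytes directly, but on Python 3.5 the explicit decode is required:

```python
import json

# Made-up payload standing in for connection.read(); urlopen returns bytes.
js = b'{"note": "This file contains sample data", "comments": []}'

def parse_via_str(data: bytes):
    """The original approach: str() yields the representation "b'...'", not the text."""
    return json.loads(str(data))

def parse_via_decode(data: bytes):
    """Decode the UTF-8 bytes to str first, then parse."""
    return json.loads(data.decode("utf-8"))

print(str(js)[:4])                   # b'{"  (the representation starts with b')
print(parse_via_decode(js)["note"])
```

Calling parse_via_str(js) raises the same "Expecting value: line 1 column 1 (char 0)" error as in the question.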
In my case, characters such as " , : ' { } [ ] inside the data corrupted the JSON format, so wrap json.loads() in a try/except block to check your input.
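A short sketch of that suggestion: catch json.JSONDecodeError and report where parsing stopped, using the exception's msg, lineno and colno attributes. The helper name try_parse and the sample strings are illustrative only:

```python
import json

def try_parse(text: str):
    """Return (value, None) on success or (None, message) describing the failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as err:
        # err.msg, err.lineno and err.colno locate the first invalid character.
        return None, f"{err.msg} at line {err.lineno}, column {err.colno}"

good, _ = try_parse('{"name": "ok"}')
_, problem = try_parse("{'name': 'single quotes are not JSON'}")
print(good)
print(problem)
```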

Getting data from JSON file in R

Let's say that I have the following JSON file:
{
"id": "000018ac-04ef-4270-81e6-9e3cb8274d31",
"currentCompany": "",
"currentTitle": "--",
"currentPosition": ""
}
I use the following code:
Usersfile <- 'trial.json'  # trial.json contains the JSON above
library('rjson')
c <- file(Usersfile,'r')
l <- readLines(c,-71L)
json <- lapply(X=l,fromJSON)
and I get the following error:
Error: parse error: premature EOF
{
(right here) ------^
But when I open the JSON file (with Notepad) and put the data on one line:
{"id": "000018ac-04ef-4270-81e6-9e3cb8274d31","currentCompany": "","currentTitle": "--","currentPosition": ""}
the code works fine. (In reality the file is too big to do this manually for each record.) Why is this happening, and how can I overcome it?
This one doesn't work either:
{ "id": "000018ac-04ef-4270-81e6-9e3cb8274d31","currentCompany": "","currentTitle": "--","currentPosition": ""
}
EDIT: With the following code I could read only the first value:
library('rjson')
c <- file.path(Usersfile)
data <- fromJSON(file=c)
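Surprised this was never answered! The error happens because readLines() splits the pretty-printed file into lines and lapply() then calls fromJSON() on each line separately, so the parser receives a lone { and reaches the end of its input too early. Using the jsonlite package, you can instead collapse the lines into one character element with paste(x, collapse = "") so that the whole document is parsed at once and imported into an R data frame. I, too, faced pretty-printed JSON with exactly this error: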
library(jsonlite)
json <- do.call(rbind,
                lapply(paste(readLines(Usersfile, warn = FALSE),
                             collapse = ""),
                       jsonlite::fromJSON))
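The same failure can be reproduced in Python, which may help explain it: parsing the pretty-printed file one line at a time hands the parser a lone {, while joining the lines first gives it the whole object. A minimal sketch using the JSON from the question (helper names are illustrative, chosen to mirror the R calls):

```python
import json

# Pretty-printed JSON from the question.
pretty = """{
"id": "000018ac-04ef-4270-81e6-9e3cb8274d31",
"currentCompany": "",
"currentTitle": "--",
"currentPosition": ""
}"""

lines = pretty.splitlines()

def parse_each_line(lines):
    """Mimics lapply(l, fromJSON): every line must be a complete JSON value."""
    return [json.loads(line) for line in lines]

def parse_joined(lines):
    """Mimics paste(l, collapse = ""): join the lines, then parse once."""
    return json.loads("".join(lines))

try:
    parse_each_line(lines)
except json.JSONDecodeError as err:
    print("per-line parse failed:", err.msg)

record = parse_joined(lines)
print(record["currentTitle"])
```

This only covers a file holding one JSON value; a file with several pretty-printed objects one after another would still need them wrapped in an array before parsing.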