R list toJSON without slash symbols - json

I have a list:
[[1]]$period
[1] "DAY"
[[1]]$dates
[1] 1.361743e+12 1.362348e+12 1.362953e+12 1.363558e+12 1.364162e+12 1.364764e+12 1.365368e+12 1.365973e+12 1.366578e+12
I want to put this list to json:
toJSON(my_list)
answer:
[
{
\"period\": \"DAY\",
\"dates\": [
1361743200000,
1362348000000,
1362952800000,
1363557600000,
1364162400000,
1364763600000,
1365368400000,
1365973200000,
1366578000000
]
}
]
The output contains backslashes ("\").
How do I get rid of them? Should I use a different function to convert my_list to JSON?

The backslash is just R's escape character. Used in this context, it allows a quotation mark inside a string without closing the string. Although it appears in R console output, it doesn't appear when writing out to a file, and the backslash together with the character it escapes counts as a single character:
x <- "ab\"c"
x
[1] "ab\"c"
writeLines(x)
ab"c
nchar(x)
[1] 4


LISP: how to properly encode a slash ("/") with cl-json?

I have code that uses the cl-json library to add a line, {"main" : "build/electron.js"} to a package.json file:
(let ((package-json-pathname (merge-pathnames *app-pathname* "package.json")))
(let
((new-json (with-open-file (package-json package-json-pathname :direction :input :if-does-not-exist :error)
(let ((decoded-package (json:decode-json package-json)))
(let ((main-entry (assoc :main decoded-package)))
(if (null main-entry)
(push '(:main . "build/electron.js") decoded-package)
(setf (cdr main-entry) "build/electron.js"))
decoded-package)))))
(with-open-file (package-json package-json-pathname :direction :output :if-exists :supersede)
(json:encode-json new-json package-json))
)
)
The code works, but the result has an escaped slash:
"main":"build\/electron.js"
I'm sure this is a simple thing, but no matter which inputs I try -- "//", "/", "#//" -- I still get the escaped slash.
How do I just get a normal slash in my output?
Also, I'm not sure if there's a trivial way for me to get pretty-printed output, or if I need to write a function that does this; right now the output prints the entire package.json file to a single line.
Special characters
The JSON spec says that "Any character may be escaped", but some of them MUST be escaped: "quotation mark, reverse solidus, and the control characters". That section is followed by a grammar that shows "solidus" (/) in the list of characters that can be written as an escape. I don't think it really matters in practice (typically it need not be escaped), but that may explain why the library escapes this character.
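To check that the escaped form is harmless, here is a small sketch in Python (used only for illustration; its standard json module stands in for any conforming decoder, and the variable names are made up for this example). It shows that "\/" and "/" decode to the same string:

```python
import json

# Two JSON texts for the same value: one with the optional "\/" escape, one without.
escaped_text = r'{"main":"build\/electron.js"}'
plain_text = '{"main":"build/electron.js"}'

decoded_escaped = json.loads(escaped_text)
decoded_plain = json.loads(plain_text)

# Both decode to the same dict, so the escape changes nothing for consumers.
print(decoded_escaped == decoded_plain)
print(decoded_escaped["main"])
```

So a package.json containing "build\/electron.js" should be read the same way by any tool that parses it as JSON.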
How to avoid escaping
cl-json relies on an internal list of escaped characters named +json-lisp-escaped-chars+, namely:
(defparameter +json-lisp-escaped-chars+
'((#\" . #\")
(#\\ . #\\)
(#\/ . #\/)
(#\b . #\Backspace)
(#\f . #\)
(#\n . #\Newline)
(#\r . #\Return)
(#\t . #\Tab)
(#\u . (4 . 16)))
"Mapping between JSON String escape sequences and Lisp chars.")
The symbol is not exported, but you can still refer to it externally with ::. You can dynamically rebind the parameter around the code that needs to use a different list of escaped characters; for example, you can do as follows:
(let ((cl-json::+json-lisp-escaped-chars+
(remove #\/ cl-json::+json-lisp-escaped-chars+ :key #'car)))
(cl-json:encode-json-plist '("x" "1/5")))
This prints:
{"x":"1/5"}

Can't decode byte list to string list

In my users variable I store a list of bytes objects that I get from a local LDAP server, and I convert it to a list of strings in my for loop.
I must return this list with jsonify.
If I don't pass the encoding argument, I get output that differs from the original but is still encoded.
The problem is that I can't find anywhere to call the decode method.
Any help?
users = ldap.get_group_members('ship_crew')
user_list = []
for user in users:
    user_list.append(str(user, encoding='utf-8').split(",")[0].split("=")[1])
return jsonify(user_list)
original list from users variable:
[
"cn=Philip J. Fry,ou=people,dc=planetexpress,dc=com",
"cn=Turanga Leela,ou=people,dc=planetexpress,dc=com",
"cn=Bender Bending Rodr\u00edguez,ou=people,dc=planetexpress,dc=com"
]
for loop with encoded output:
[
"Philip J. Fry",
"Turanga Leela",
"Bender Bending Rodr\u00edguez"
]
expected:
[
"Philip J. Fry",
"Turanga Leela",
"Bender Bending Rodríguez"
]
I would use a regex to extract the names:
import re
l = [
"cn=Philip J. Fry,ou=people,dc=planetexpress,dc=com",
"cn=Turanga Leela,ou=people,dc=planetexpress,dc=com",
"cn=Bender Bending Rodr\u00edguez,ou=people,dc=planetexpress,dc=com"
]
NAME_PATTERN = re.compile(r'cn=(.*?),')
result = [NAME_PATTERN.match(s).group(1) for s in l]
print(result)
Output:
['Philip J. Fry', 'Turanga Leela', 'Bender Bending Rodríguez']
Note that when you dump it to JSON, the í is escaped: by default json.dumps uses ensure_ascii=True, so every non-ASCII character is written as a \uXXXX escape (here \u00ed, for code point U+00ED):
import json
print(json.dumps(result, indent=2))
Output:
[
"Philip J. Fry",
"Turanga Leela",
"Bender Bending Rodr\u00edguez"
]
You can get around this by setting ensure_ascii=False if you want, though if you are using this in an API I would be careful and stick with ASCII output and \u escapes:
print(json.dumps(result, indent=2, ensure_ascii=False))
Output:
[
"Philip J. Fry",
"Turanga Leela",
"Bender Bending Rodríguez"
]
Your output is correct JSON. Unicode code point U+00ED is í, and in JSON any character can be escaped using its Unicode code point. "\u00ed", as in your JSON output, is a valid way to write that character.
It would also be correct JSON to have that character without escaping it, but apparently jsonify chooses to escape it.
Any competent JSON decoder will then turn it back into an í.
If using the standard library's json.dumps you can use ensure_ascii=False to prevent this behaviour if you don't want it, but I don't know what "jsonify" is.
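A quick round-trip check with the standard library makes this concrete. This is only a sketch: the names list is copied from the question, and json.dumps stands in for whatever jsonify does internally. Both forms decode to the same list:

```python
import json

names = ["Philip J. Fry", "Turanga Leela", "Bender Bending Rodríguez"]

ascii_json = json.dumps(names)                        # default: ensure_ascii=True
unicode_json = json.dumps(names, ensure_ascii=False)  # keeps the í as-is

print(ascii_json)
print(unicode_json)

# Decoding either text gives back the original strings, í included.
print(json.loads(ascii_json) == names)
print(json.loads(unicode_json) == names)
```

The escaped and unescaped texts differ only in how they are written, not in the data they carry.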

Unexpected behavior of gsub in R

Excuse me for not being more specific in the title, but I don't know how to explain this without an example.
I have a .html file that looks like this:
<TR><TD>log p-value:</TD><TD>-2.797e+02</TD></TR>
<TR><TD>Information Content per bp:</TD><TD>1.736</TD></TR>
<TR><TD>Number of Target Sequences with motif</TD><TD>894.0</TD></TR>
<TR><TD>Percentage of Target Sequences with motif</TD><TD>47.58%</TD></TR>
<TR><TD>Number of Background Sequences with motif</TD><TD>10864.6</TD></TR>
<TR><TD>Percentage of Background Sequences with motif</TD><TD>22.81%</TD></TR>
<TR><TD>Average Position of motif in Targets</TD><TD>402.4 +/- 261.2bp</TD></TR>
<TR><TD>Average Position of motif in Background</TD><TD>400.6 +/- 246.8bp</TD></TR>
<TR><TD>Strand Bias (log2 ratio + to - strand density)</TD><TD>-0.0</TD></TR>
<TR><TD>Multiplicity (# of sites on avg that occur together)</TD><TD>1.48</TD></TR>
I read it in:
html = readLines("file.html")
I am interested in whatever is between </TD><TD> and </TD></TR>. When I run the following, I get the result I want:
mypattern = '<TR><TD>log p-value:</TD><TD>([^<]*)</TD></TR>'
gsub(mypattern,'\\1',grep(mypattern,html,value=TRUE))
[1] "-2.797e+02"
It works well for almost all lines I want to match, but when I do the same thing for the last two lines, it does not extract anything.
mypattern = '<TR><TD>Strand Bias (log2 ratio + to - strand density)</TD><TD>([^<]*)</TD></TR>'
gsub(mypattern,'\\1',grep(mypattern,html,value=TRUE))
character(0)
mypattern = '<TR><TD>Multiplicity (# of sites on avg that occur together)</TD><TD>([^<]*)</TD></TR>'
gsub(mypattern,'\\1',grep(mypattern,html,value=TRUE))
character(0)
Why is this happening?
Thank you for your help.
If your data really is structured like this, you have an HTML file of key/value pairs, so I assume it is easier to make use of that structure:
library(xml2)
xd <- read_xml("file.html", as_html = TRUE)
key_values <- xml_text(xml_find_all(xd, "//td"))
is_key <- as.logical(seq_along(key_values) %% 2)
setNames(key_values[!is_key], key_values[is_key])
First, I'll say that I would actually solve this problem like this:
gsub(".+>([^<]+)</TD></TR>", "\\1", html)
#> [1] "-2.797e+02" "1.736" "894.0"
#> [4] "47.58%" "10864.6" "22.81%"
#> [7] "402.4 +/- 261.2bp" "400.6 +/- 246.8bp" "-0.0"
#> [10] "1.48"
But, to answer the question of why your way didn't work, we need to check out the help file for R regular expressions (help("regex")):
Any metacharacter with special meaning may be quoted by preceding it with a backslash. The metacharacters in extended regular expressions are . \ | ( ) [ { ^ $ * + ? ...
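The patterns that you had trouble with included parentheses, which you needed to escape (note the double backslash, since backslashes themselves need to be escaped in an R string). The same goes for the + in the Strand Bias pattern, which must be written as \\+. For the Multiplicity pattern: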
mypattern = '<TR><TD>Multiplicity \\(# of sites on avg that occur together\\)</TD><TD>([^<]*)</TD></TR>'
gsub(mypattern,'\\1',grep(mypattern,html,value=TRUE))
# [1] "1.48"
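The same pitfall can be reproduced outside R. Here is a small sketch in Python (not R; re.escape is Python's helper for quoting metacharacters, and the line and label variables are names chosen for this example) using the Strand Bias row from the question:

```python
import re

line = "<TR><TD>Strand Bias (log2 ratio + to - strand density)</TD><TD>-0.0</TD></TR>"
label = "Strand Bias (log2 ratio + to - strand density)"

# Unescaped: "(", ")" and "+" are treated as regex syntax, so the literal text never matches.
unescaped = re.compile("<TR><TD>" + label + "</TD><TD>([^<]*)</TD></TR>")

# Escaped: re.escape quotes every metacharacter in the label.
escaped = re.compile("<TR><TD>" + re.escape(label) + "</TD><TD>([^<]*)</TD></TR>")

no_match = unescaped.search(line)
value = escaped.search(line).group(1)

print(no_match)
print(value)
```

Only the literal label is escaped here; the capture group ([^<]*) is left as real regex syntax so it can still extract the value.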

Json Files parsing

So I am trying to open some JSON files to look for a publication year and sort them accordingly. But before doing this, I decided to experiment on a single file. I am having trouble, though: although I can get the files and the strings, when I try to print one word, it starts printing characters instead.
For example:
print data2[1] #prints
THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine. #results
but now
print data2[1][0] #should print THE
T #prints T
This is my code right now:
json_data =open(path)
data = json.load(json_data)
i=0
data2 = []
for x in range(0,len(data)):
    data2.append(data[x]['section'])
    if len(data[x]['content']) > 0:
        for i in range(0,len(data[x]['content'])):
            data2.append(data[x]['content'][i])
I probably need to look at your json file to be absolutely sure, but it seems to me that the data2 list is a list of strings. Thus, data2[1] is a string. When you do data2[1][0], the expected result is what you are getting - the character at the 0th index in the string.
>>> data2[1]
'THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine.'
>>> data2[1][0]
'T'
To get the first word, naively, you can split the string by spaces
>>> data2[1].split()
['THE', 'BRIDES', 'ORNAMENTS,', 'Viz.', 'Fiue', 'MEDITATIONS,', 'Morall', 'and', 'Diuine.']
>>> data2[1].split()[0]
'THE'
However, this will cause issues with punctuation, so you probably need to tokenize the text. This link should help - http://www.nltk.org/_modules/nltk/tokenize.html
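If pulling in NLTK is more than you need, a rough sketch with the standard re module can strip punctuation while splitting. The \w+ pattern is an assumption about what counts as a word (it splits on apostrophes and hyphens, for example), so treat it as a starting point rather than a real tokenizer:

```python
import re

text = "THE BRIDES ORNAMENTS, Viz. Fiue MEDITATIONS, Morall and Diuine."

# \w+ matches runs of letters, digits and underscores, so commas and periods are dropped.
words = re.findall(r"\w+", text)

print(words[0])
print(words[2])
```

Here words[2] comes out as ORNAMENTS without the trailing comma that split() leaves attached.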

create empty object (empty brackets) with toJSON

I need to create a JSON string from R using toJSON. My issue is that part of the JSON should contain an empty JSON object {}. I thought list() would do it for me:
> fromJSON("{}")
list()
> toJSON(list())
[1] "[]"
[Scratches head]
Anybody know how to get a {} using toJSON? I am using a lib that does the encoding, so answers that do not use toJSON will not help me.
Thanks!
There are a number of packages that have toJSON and fromJSON functions.
Using rjson::fromJSON, '{}' is read in as a list of length 0, whereas RJSONIO::fromJSON reads in '{}' as a named list of length 0.
In either package, calling toJSON on a named list will do what you want.
RJSONIO therefore already behaves the way you want:
RJSONIO::toJSON(RJSONIO::fromJSON('{}'))
## [1] "{}"
rjson::toJSON(rjson::fromJSON('{}'))
## [1] "[]"
If you use rjson then you will have to manually set the names of the list of length 0
rjson::toJSON(setNames(rjson::fromJSON('{}'), character(0)))
## [1] "{}"