Unescaping Characters in a JSON response string - json

I made a JSON request that returns a string containing Unicode escape sequences, like this:
s = "\u003Cp\u003E"
And I want to convert it to:
s = "<p>"
What's the best way to do this in Python?
Note: this is the same question as this one, only in Python instead of Ruby. I am also using the Posterous API.

>>> "\\u003Cp\\u003E".decode('unicode-escape')
u'<p>'

If the data came from JSON, the json module should already have decoded these escapes for you:
>>> import json
>>> json.loads('"\u003Cp\u003E"')
u'<p>'
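The snippets above are Python 2 (note the u'...' output). In Python 3, str has no .decode method, so a minimal sketch of the same two approaches might look like the following. The variable names are only for illustration, and the second approach assumes the fragment is pure ASCII.

```python
import json

# JSON text as it arrives on the wire: quotes included, backslashes literal.
raw = r'"\u003Cp\u003E"'
decoded = json.loads(raw)
print(decoded)

# If only the escaped fragment is available (no surrounding quotes),
# round-trip through bytes and the unicode_escape codec instead.
fragment = r"\u003Cp\u003E"
unescaped = fragment.encode("ascii").decode("unicode_escape")
print(unescaped)
```

Prefer json.loads whenever the input really is JSON, since it also handles the other JSON escapes and surrogate pairs.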

EDIT: The original question, "Unescaping Characters in a String with Python", did not make clear whether the string was to be written or read (the words "JSON response" were added later to clarify that the intention was to read).
So I answered the opposite question: how to write JSON-serialized data to an unescaped string (rather than loading data from a string).
My use case was producing a JSON file from my own data dictionary, but the file contained escaped non-ASCII characters. So I did it like this:
with open(filename, 'w', encoding='utf-8') as jsonfile:
    jsonstr = json.dumps(myDictionary, ensure_ascii=False)
    print(jsonstr) # to screen
    jsonfile.write(jsonstr) # to file
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
Taken from here: https://docs.python.org/3/library/json.html
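To make the difference concrete, here is a small sketch comparing the two settings on an in-memory string. The dictionary contents are made up for the example.

```python
import json

data = {"name": "caf\u00e9"}  # hypothetical sample value containing one non-ASCII character

escaped = json.dumps(data)                    # default: ensure_ascii=True
plain = json.dumps(data, ensure_ascii=False)  # non-ASCII characters kept as-is

print(escaped)
print(plain)

# Both forms are valid JSON and decode to the same object.
same = json.loads(escaped) == json.loads(plain) == data
print(same)
```

Either output is safe to load back; the choice only affects how readable the serialized text is and which encoding the file needs.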

Related

option for \u instead of Unicode replacement

If I run this Go code:
package main
import (
    "encoding/json"
    "os"
)
func main() {
    json.NewEncoder(os.Stdout).Encode("\xa1") // "\ufffd"
}
I lose data, since once the Unicode replacement is done, I can no longer get
back the original value. Compare with this Python code:
import json
a = '\xa1'
b = json.dumps(a) # "\u00a1"
print(json.loads(b) == a) # True
Here no replacement is done, so no data is lost. In addition, the resulting JSON is
still valid. Does Go have some way to encode a JSON string with escaping
instead of replacement?
This example is a false equivalence. The '\xa1' is a valid Unicode string in Python, it's just one possible representation like '\u00a1' or '\U000000a1' or chr(0xa1) or '\N{INVERTED EXCLAMATION MARK}' or '¡' or ...
The equivalent in Python code would be:
>>> print(json.dumps(b'\xa1'.decode(errors='replace')))
"\ufffd"
Which is also printing an ascii representation of the coerced REPLACEMENT CHARACTER on stdout, the same as in Go.
This is because "\xa1" is not a valid Unicode string. It contains the byte 0xa1, which is not valid UTF-8 by itself. The invalid byte gets replaced with U+FFFD, the “replacement character”, which is used when the input is invalid.
If you want to encode the Unicode character U+00A1, write it as "\u00a1". If you want to make arbitrary data go round-trip through JSON, you will have to represent it a different way (like base64 encoding it, for example).
Python just works differently—in Python, the \xa1 escape sequence is U+00A1. Again, in Go, \xa1 is the byte 0xa1, which is not a valid Unicode string by itself and cannot be encoded as a JSON string.
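The base64 suggestion can be sketched in Python as follows. This is only an illustration of the idea, not something from the Go standard library; the variable names are assumptions.

```python
import base64
import json

payload = b"\xa1"  # an arbitrary byte string that is not valid UTF-8 on its own

# Encode the bytes as base64 text, which is plain ASCII and always a valid JSON string.
encoded = json.dumps(base64.b64encode(payload).decode("ascii"))
print(encoded)

# The receiver reverses both steps to recover the original bytes exactly.
restored = base64.b64decode(json.loads(encoded))
print(restored == payload)
```

The cost is that the receiver has to know the field is base64-encoded, since JSON itself has no binary type.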

Dump Chinese data into a json file

I am running into a problem while dumping Chinese data (non-Latin text) into a JSON file.
I am trying to store a list in a JSON file with the following code:
with open("file_name.json","w",encoding="utf8") as file:
    json.dump(edits,file)
It is dumped without any errors.
When I view the file, it looks like this:
[{sentence: \u5979\u7d30\u5c0f\u8072\u5c0d\u6211\u8aaa\uff1a\u300c\u6211\u501f\u4f60\u4e00\u679d\u925b\u7b46\u3002\u300d}...]
I also tried it without the encoding option:
with open("file_name.json","w") as file:
    json.dump(edits,file)
My question is: why does my JSON file look like this, and how can I dump it with the Chinese characters instead of Unicode escapes?
Any help would be appreciated. Thanks : )
Check out the docs for json.dump.
Specifically, it has a switch ensure_ascii that if set to False should make the function not escape the characters.
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
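Applied to the code from the question, that could look like the sketch below. The sample list and the temporary directory are assumptions so the example runs on its own; the real edits list would come from your program.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the question's list; the value is the first
# four characters of the sentence shown in the question.
edits = [{"sentence": "\u5979\u7d30\u5c0f\u8072"}]

path = os.path.join(tempfile.mkdtemp(), "file_name.json")

# Keep encoding="utf8": with ensure_ascii=False the file really contains non-ASCII text.
with open(path, "w", encoding="utf8") as file:
    json.dump(edits, file, ensure_ascii=False)

with open(path, encoding="utf8") as file:
    text = file.read()

print(text)  # the Chinese characters appear directly, with no \uXXXX escapes
```

Note that the escaped file from the question was not wrong: json.load would have given back the same Chinese strings. ensure_ascii=False only changes how the file looks on disk.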

json.dumps() problem with non-English characters

I'm trying to serialize a Python dict to JSON format for storing to a database.
obj = json.dumps({
'name':'کوروش'
}, sort_keys=True, indent=-1, ensure_ascii=False).encode('utf-8')
but it's not encoded to my native language. I changed the ensure_ascii to false but that did not work. What am I doing wrong? I searched in Stack Overflow questions about non-English characters but they all said you must set ensure_ascii=False.

How to write a list of string as double quotes, so that json can load?

Consider the following code, where json can't load the value back because, after the manipulation, the double quotes become single quotes. How can I write the list to the file with double quotes so that json can load it back?
import configparser
import json
config = configparser.ConfigParser()
config.read("config.ini")
l = json.loads(config.get('Basic', 'simple_list'))
new_config = configparser.ConfigParser()
new_config.add_section("Basic")
new_config.set('Basic', 'simple_list', str(l))
with open("config1.ini", 'w') as f:
    new_config.write(f)
config = configparser.ConfigParser()
config.read("config1.ini")
l = json.loads(config.get('Basic', 'simple_list'))
The config.ini file content is like this:
[Basic]
simple_list = ["a", "b"]
As already mentioned by @L3viathan, the purely technical answer is "use json.dumps() instead of str()" (and yes, it works for dicts too).
BUT: storing JSON in an ini file is a very bad idea. "ini" is a file format of its own (even if not as strictly specified as JSON or YAML), and it has been designed to be user-editable with just any text editor. FWIW, the simple canonical way to store "lists" in an ini file is to store them as comma-separated values, i.e.:
[Basic]
simple_list = a,b
and parse this back when reading the config as
values = config.get('Basic', 'simple_list').split(",")
With regard to "storing dicts": an ini file IS already a (kind of) dict, since it's based on key:value pairs. It's restricted to two levels (sections and keys), but here again that's by design: it's a format designed for end users, not for programmers.
Now if the ini format doesn't suit your needs, nothing prevents you from just using a JSON (or YAML) file instead for the whole config.
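For reference, here is a minimal sketch of both options from this answer. It uses read_string and an in-memory buffer instead of real files so that it runs on its own; the section and key names come from the question, everything else is illustrative.

```python
import configparser
import io
import json

# Option 1: keep JSON in the ini value, but serialize with json.dumps, not str().
config = configparser.ConfigParser()
config.read_string('[Basic]\nsimple_list = ["a", "b"]\n')
items = json.loads(config.get("Basic", "simple_list"))

new_config = configparser.ConfigParser()
new_config.add_section("Basic")
new_config.set("Basic", "simple_list", json.dumps(items))  # double quotes preserved

buffer = io.StringIO()  # stands in for config1.ini
new_config.write(buffer)

reloaded = configparser.ConfigParser()
reloaded.read_string(buffer.getvalue())
round_tripped = json.loads(reloaded.get("Basic", "simple_list"))
print(round_tripped)

# Option 2: plain comma-separated values, no JSON involved.
csv_config = configparser.ConfigParser()
csv_config.read_string("[Basic]\nsimple_list = a,b\n")
values = [v.strip() for v in csv_config.get("Basic", "simple_list").split(",")]
print(values)
```

str(l) fails because it produces Python's repr, ['a', 'b'], and single-quoted strings are not valid JSON.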

Change single backslash in R character string to valid JSON string

I have a string in R which escapes quotation marks:
my_text = {\"stim\":[\"platery\",\"denial\",\"generic\"]}
When using cat() I get:
{"stim":["platery","denial","generic"]}
Now my whole string is a JSON string that needs to be parsed, but JSONLint evaluates it as invalid. If I copy and paste the cat() version, it is valid JSON, so I think I am just missing some pre-processing here.
I saw this SO post here, and this one, and this really good one, so I tried to replace the single quotation marks with double quotation marks for the JSON parser:
gsub("\\\\", "\\\\\\\\", my_text, fixed=TRUE)
but it didn't change my string as I wanted. How can I change the string to become valid JSON?
As Wiktor said, your gsub didn't work because you are attempting to replace backslashes, but there aren't any backslashes in your string. R is just using the backslashes as a way to write and display the double quotes. The third SO post you link does a good job explaining R's string literals, which addresses this. A literal backslash in an R string is written as a double backslash.
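The same distinction between how a string literal is written and what the string actually contains exists in other languages. As an analogy only (this is Python, not R, and the quoting of my_text is an assumption about how the string was defined), the sketch below shows that the escaped quotes do not put any backslashes into the stored string:

```python
import json

# The backslashes are part of the literal's syntax, not of the string itself.
my_text = "{\"stim\":[\"platery\",\"denial\",\"generic\"]}"

print(my_text)          # prints the string without any backslashes
print("\\" in my_text)  # checks whether the string contains a backslash character

# Because the stored text is already valid JSON, it parses directly.
parsed = json.loads(my_text)
print(parsed["stim"])
```

So the string handed to a JSON parser is already valid; only the escaped display form pasted into a validator is not.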
My first piece of advice would be to use the R package jsonlite to construct your JSON from an R object as opposed to a string if possible (here's the vignette).
Example:
myJSON <- jsonlite::toJSON(list(stim=c("platery","denial","generic")))
# {"stim":["platery","denial","generic"]}
Second, (as the third SO post again does a good job of explaining) copying/pasting the print method of the string may not be the best way to validate the JSON. I'm not sure of the use case, but R storing the double quotes with escape characters is probably not a bad thing.
If you want to get a prettier print method you can use numerous tricks in R (noquote(), cat(), print(quote=F)) but this won't change that R stores the double quotes with backslashes:
Additionally, in some cases constructing the JSON isn't necessary. I have an API built using the plumber package that returns a list as JSON without any modifications (recJSON <- list(message=messages,recommendations=list(name=names, link=URLs)))