Decoding string of bytes back to bytes - json

I need to save a byte string in a JSON file and get it back as a byte string.
To be able to dump it into the JSON, I had to convert the bytes to a regular string. The problem is that once I read the JSON back and try to encode the converted byte string, the backslashes are doubled, so the two values aren't the same.
How could you do it properly? :(
Input:
salt = b'\xd5KS\xe4\x1b\xbd'
Output = b'\xd5KS\xe4\x1b\xbd'
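A minimal sketch of two round trips that avoid the doubled backslashes. It assumes you control both the writing and the reading side; the "salt" key name is only for illustration. Calling str() on a bytes object stores its printed form (escapes and all), so the idea here is to convert the bytes to text with a reversible encoding instead.

```python
import base64
import json

salt = b'\xd5KS\xe4\x1b\xbd'

# Option 1: base64 turns arbitrary bytes into plain ASCII text.
encoded = base64.b64encode(salt).decode('ascii')
payload = json.dumps({"salt": encoded})            # "salt" is a hypothetical key
restored = base64.b64decode(json.loads(payload)["salt"])

# Option 2: latin-1 maps each byte 0..255 to the code point with the same number.
payload_latin1 = json.dumps({"salt": salt.decode('latin-1')})
restored_latin1 = json.loads(payload_latin1)["salt"].encode('latin-1')

print(restored == salt, restored_latin1 == salt)
```

base64 is usually the safer choice when the file is read by other tools, since the stored value contains only printable ASCII.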

Related

Strip backslashes from encoded JSON response

I am building a JSON response with Erlang. First I construct the data as terms and then use jsx to convert it to JSON:
Response = jsx:term_to_json(MealsListResponse),
The response actually is valid JSON according to the validators I have used:
The problem comes when parsing the response in the front end. Is there a way to strip the backslashes on the Erlang side, so that they will not appear in the response payload?
The backslashes are not actually part of the string. They're just used when the string is printed as a term - that is, in the same way you'd write it in an Erlang source file. This works in the same way as character escapes in strings in C and similar languages: inside double quotes, double quotes that should be part of the string need to be escaped with backslashes, but the backslashes don't actually make it into the string.
To print the string without character escapes, you can use the ~s directive of io:format:
io:format("~s~n", [Response]).
If you're sending the response over a TCP socket, all you need to do is convert the string to a binary with an appropriate Unicode conversion. Most of the time you'll want UTF-8, which you can get with:
gen_tcp:send(MySocket, unicode:characters_to_binary(Response)).
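As a rough analogy only (this is Python, not Erlang, and the sample string is made up), the same split between a string and its printed source-code form can be seen with repr(): the escape backslash appears in the printed form but is not a character of the string itself.

```python
# Hypothetical JSON text containing both kinds of quote characters.
response = '{"meal": "chef\'s soup"}'

as_source = repr(response)   # how the string would be written in source code
as_text = str(response)      # the actual characters of the string

print(as_source)
print(as_text)
```

Only the repr() form contains a backslash; the characters that would be sent over the wire do not.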

Python 2.7 writing strings elements (character) to a binary file

I am using Python 2.7 to access an API that returns JSON with a single key="ringtone_file" and an associated value that is an mp3 file encoded for transport via HTTP. I created a bogus mp3 file consisting of 256 bytes in order from 0x00 through 0xff and the returned file appears below.
{"ringtone_file":"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\b\t\n\u000b\f\r\u000e\u000f\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001a\u001b\u001c\u001d\u001e\u001f !\"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"}
I accessed the API using the following code (exception handling omitted):
import requests
response = requests.get(url)
dict = response.json()
print dict
This yields the following output
{u'ringtone_file': u'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'}
What I desire to do is write each character or hex value of this string to a file in binary format. I desire the result to be a file of size 256 bytes where the first byte in the file has value 0 and the last byte has value 255. I can't change the API. Can someone suggest a reasonable way of accomplishing this with Python 2.7.
I attempted to do what was obvious to me which was to open a file for writing in binary mode and then writing the unicode string to the file. The error message from the codec indicates I can't write values between and including 128 and 255.
Since the string value is Unicode, you have to encode the string to write it to a file. The latin1 codec directly maps to the first 256 Unicode characters, so use .encode('latin1') on the string.
Example:
>>> s=u'\x00\x01\x02\xfd\xfe\xff'
>>> s
u'\x00\x01\x02\xfd\xfe\xff' # Unicode string
>>> s.encode('latin1')
'\x00\x01\x02\xfd\xfe\xff' # Now a byte string.
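A minimal sketch of the full write, shown in Python 3 syntax so it can be run today; in Python 2.7 the same .encode('latin1') call applies to the unicode object returned by response.json(). The text value is built locally instead of fetched, and the output path is a hypothetical temporary file.

```python
import os
import tempfile

# Stand-in for response.json()['ringtone_file']: code points 0..255 as text.
text = ''.join(chr(i) for i in range(256))

# latin-1 maps code point N to the single byte N for N in 0..255.
data = text.encode('latin-1')

path = os.path.join(tempfile.mkdtemp(), 'ringtone.mp3')  # hypothetical path
with open(path, 'wb') as f:   # binary mode: bytes are written unchanged
    f.write(data)

with open(path, 'rb') as f:
    written = f.read()

print(len(written), written[0], written[-1])
```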

How to decode base64 unicode string using T-SQL

I can't decode Turkish characters in a base64 string.
Base64 string = "xJ/DvGnFn8Onw7bDlsOHxLDEnsOcw5w="
When decoded, it should look like this: 'ğüişçöÖÇİĞÜÜ'
I tried to decode it like this:
SELECT CAST(
CAST(N'' AS XML).value('xs:base64Binary("xJ/DvGnFn8Onw7bDlsOHxLDEnsOcw5w=")' , 'VARBINARY(MAX)')
AS NVARCHAR(MAX)
) UnicodeEncoding ;
This is based on this answer: Base64 encoding in SQL Server 2005 T-SQL
But I get a response like this: '鿄볃앩쎟쎧쎶쎖쒇쒰쎞쎜'
The base64 string is correct, because it decodes properly on Base64decode.org.
Is there any way to decode Turkish characters?
Your base-64 encoded data contains a UTF-8 string. MS SQL doesn't support UTF-8, only UTF-16, so it fails for any characters outside of ASCII.
The solution is to either send the data as nvarchar right away, or to encode the string as UTF-16 (and send it as varbinary or base-64, as needed).
Based on the Erlang documentation, this can be done with the unicode module, which ships with stdlib: http://www.erlang.org/doc/apps/stdlib/unicode_usage.html
Basically, the default seems to be UTF-8, so you need to specify UTF-16 manually. UTF-16 support seems a bit clunky, but it should be quite doable.
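To make the mismatch concrete, here is a small Python sketch (not T-SQL or Erlang) that decodes the payload from the question both ways and then builds a UTF-16 payload. It assumes NVARCHAR expects little-endian UTF-16; the variable names are illustrative.

```python
import base64

b64_utf8 = "xJ/DvGnFn8Onw7bDlsOHxLDEnsOcw5w="
raw = base64.b64decode(b64_utf8)

# Correct reading: the bytes are UTF-8.
text = raw.decode("utf-8")

# Wrong reading: treating the same bytes as UTF-16 pairs up unrelated bytes.
# The payload has an odd number of bytes, so the trailing byte is dropped first.
even = raw[: len(raw) // 2 * 2]
garbled = even.decode("utf-16-le")

# What the sender could transmit instead so that a cast to NVARCHAR lines up.
b64_utf16 = base64.b64encode(text.encode("utf-16-le")).decode("ascii")

print(text)
print(garbled)
print(b64_utf16)
```

The garbled value starts with the same character as the wrong output in the question, which is consistent with the bytes being read as UTF-16.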

Problem with decode UTF character from JSON data

I take data posted by the user, convert it to JSON, and store it in the database.
To avoid problems with escape characters, I used
$jsonData = json_encode($array_json_data,JSON_HEX_APOS|JSON_HEX_QUOT);
and it converted the escaped characters to UXXXX sequences.
Now I am having a problem decoding this data.
For example, how can I print a quote from U0027?
Use html_entity_decode to convert the quotes to their actual string representation.
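For comparison, a small Python sketch (not PHP) of what a JSON parser does with such a sequence, assuming the stored value is still valid JSON with its backslash intact; the sample document is made up. In PHP the corresponding call would be json_decode.

```python
import json

# Hypothetical stored JSON text; \\u0027 here is a literal backslash-u sequence.
stored = '{"quote": "it\\u0027s"}'

decoded = json.loads(stored)   # the parser turns \u0027 back into an apostrophe
print(decoded["quote"])
```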

Unescaping Characters in a JSON response string

I made a JSON request that gives me a string that uses Unicode character codes that looks like:
s = "\u003Cp\u003E"
And I want to convert it to:
s = "<p>"
What's the best way to do this in Python?
Note, this is the same question as this one, only in Python instead of Ruby. I am also using the Posterous API.
>>> "\\u003Cp\\u003E".decode('unicode-escape')
u'<p>'
If the data came from JSON, the json module should already have decoded these escapes for you:
>>> import json
>>> json.loads('"\u003Cp\u003E"')
u'<p>'
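The snippets above are Python 2. A rough Python 3 equivalent, as a sketch: str has no .decode() there, so the text is first turned into bytes (this assumes the input is pure ASCII apart from the escapes).

```python
import json

raw = "\\u003Cp\\u003E"   # literal backslashes, as received from the API

# Route 1: go through bytes and the unicode_escape codec.
via_codec = raw.encode("ascii").decode("unicode_escape")

# Route 2: wrap the text in quotes and let the JSON parser handle the escapes.
via_json = json.loads('"' + raw + '"')

print(via_codec, via_json)
```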
EDIT: The original question "Unescaping Characters in a String with Python" did not clarify whether the string was to be written or read (the words "JSON response" were added later to clarify that the intention was to read).
So I answered the opposite question: how to write JSON-serialized data to an unescaped string (rather than loading data from a string).
My use case was producing a JSON file from my own data dictionary, but the file contained escaped non-ASCII characters. So I did it like this:
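import json

# Python 3: name the file encoding explicitly, since the output is no longer pure ASCII
with open(filename, 'w', encoding='utf-8') as jsonfile:
    jsonstr = json.dumps(myDictionary, ensure_ascii=False)
    print(jsonstr)  # to screen
    jsonfile.write(jsonstr)  # to file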
If ensure_ascii is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If ensure_ascii is false, these characters will be output as-is.
Taken from here: https://docs.python.org/3/library/json.html
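A quick sketch of the difference, using a made-up dictionary:

```python
import json

data = {"city": "\u0130stanbul"}   # hypothetical data; \u0130 is the letter İ

escaped = json.dumps(data)                        # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)   # non-ASCII kept as-is

print(escaped)
print(readable)
```

Both strings load back to the same dictionary; only the textual form in the file differs.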