Postgres psql output text wrapping when converting to JSON

I'm seeing strange behavior in psql where long base64-encoded text gets broken with a newline when converted to a JSON string.
I encode my text in base64 as follows:
db=> select encode('-------------------------------------------------------------------'::bytea, 'base64');
encode
------------------------------------------------------------------------------
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t+
LS0tLS0tLS0tLQ==
Here the output text is wrapped; that's not actually a problem.
But if I then convert this base64-encoded text to a JSON string using to_json():
db=> select to_json(encode('-------------------------------------------------------------------'::bytea, 'base64')::text);
to_json
--------------------------------------------------------------------------------------------------
"LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t\nLS0tLS0tLS0tLQ=="
Here the base64-encoded text in the JSON has a newline character (\n) near the end, which completely breaks the base64 when decoding (thanks Laurenz Albe for the correction!). The newline character gives me trouble later in my program, and I'm looking for a way to fix it in psql.
I've tried using the \pset format command and setting PAGER="less -SF" psql ... (from another Stack Overflow question: Disable wrapping in Psql output), but without success.
The only solution I've found (and a very dirty one) is to do:
db=> select to_json(regexp_replace((select to_json(encode('----------------------------------------------------------'::bytea, 'base64')::text))::text, '(\\n|")', '', 'g'));
to_json
------------------------------------------------------------------------------------
"LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLQ=="
Here I convert to a JSON string (with to_json()), then remove the JSON string quotes and the newline characters (with regexp_replace()), and then convert to JSON again to get the expected result.
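A possibly cleaner sketch: the line break doesn't come from psql's display settings at all, but from encode() itself, which wraps its base64 output with a newline every 76 characters (MIME-style), so no \pset or pager setting will remove it. Stripping those newlines before calling to_json() should give the same result without the double conversion:
db=> select to_json(replace(encode('-------------------------------------------------------------------'::bytea, 'base64'), E'\n', ''));
The replace() removes the embedded newline(s) from the base64 text, and to_json() then produces a single-line JSON string.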

Related

How do I convince Splunk that a backslash inside a CSV field is not an escape character?

I have the following row in a CSV file that I am ingesting into a Splunk index:
"field1","field2","field3\","field4"
Excel and the default Python CSV reader both correctly parse that as 4 separate fields. Splunk does not. It seems to be treating the backslash as an escape character and interpreting field3","field4 as a single mangled field. It is my understanding that the standard escape character for double quotes inside a quoted CSV field is another double quote, according to RFC-4180:
"If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote."
Why is Splunk treating the backslash as an escape character, and is there any way to change that configuration via props.conf or any other way? I have set:
INDEXED_EXTRACTIONS = csv
KV_MODE = none
for this sourcetype in props.conf, and it is working fine for rows without backslashes in them.
UPDATE: Yeah, so Splunk's CSV parsing is indeed not RFC 4180 compliant, and there isn't really any workaround that I could find. In the end I changed the upstream data pipeline to output JSON instead of CSV for ingestion by Splunk, and now it works fine. Let this be a cautionary tale for anyone who stumbles across this question while trying to parse CSVs in Splunk!

How to escape single quotes in json string? JSON::ParserError Ruby

I'm getting
/json/common.rb:156:in `parse': 783: unexpected token at '' (JSON::ParserError)
while trying to parse a JSON file in Ruby. The problem seemed to be some single quotes in one of the strings:
parsed = JSON.parse("{
\"key1\":\"value1\",
\"key2\":\"value2\",
\"key3\":12345,
\"key4\":\"''value4''\",
}")
Is there a way to escape the single quotes in the strings without affecting words like don't? The JSON is read from a file using JSON.parse(file.get_input_stream.read); the backslashes only appear above because the example is written out as a Ruby string literal.
The single quotes aren't your problem; your problem is that you have a stray trailing comma:
parsed = JSON.parse("{
\"key1\":\"value1\",
\"key2\":\"value2\",
\"key3\":12345,
\"key4\":\"''value4''\",
}") #--------------------^ This should not be there.
JSON doesn't allow that trailing comma, so you don't actually have a JSON file.
You should figure out where the file came from and fix that tool to produce real JSON rather than the "looks mostly like JSON" that is currently being written to the file.

How to get PostgreSQL to escape text from jsonb_array_element?

I'm loading some JSON from Postgres 13 into Elasticsearch using Logstash and ran into some errors caused by text not being escaped with reverse solidus. I tracked my problem down to this behavior:
SELECT
json_build_object(
'literal_text', 'abc\ndef'::text,
'literal_text_type', pg_typeof('abc\ndef'::text),
'text_from_jsonb_array_element', a->>0,
'jsonb_array_element_type', pg_typeof(a->>0)
)
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
{
"literal_text": "abc\\ndef",
"literal_text_type": "text",
"text_from_jsonb_array_element": "abc\ndef",
"jsonb_array_element_type":"text"
}
json_build_object encodes the literal text as expected (turning \n into \\n); however, it doesn't encode the text retrieved via jsonb_array_element even though both are text.
Why is the text extracted from jsonb_array_element being treated differently (not getting escaped by jsonb_build_object)? I've tried casting, using jsonb_array_elements_text (though my actual use case involves an array of arrays, so I need to split to a set of jsonb), and various escaping/encoding/formatting functions, but haven't found a solution yet.
Is there a trick to cast text pulled from jsonb_array_element so it will get properly encoded by jsonb_build_object?
Thanks for any hints or solutions.
Those strings look awfully similar, but they're actually different. When you create a string literal like '\n', that's a backslash character followed by an "n" character. So when you put that into json_build_object, it needs to add a backslash to escape the backslash you're giving it.
On the other hand, when you call jsonb_array_elements('["abc\ndef"]'), you're saying that the JSON has precisely a \n encoded in it with no second backslash, and therefore when it's converted to text, that \n is interpreted as a newline character, not two separate characters. You can see this easily by running the following:
SELECT a->>0 FROM jsonb_array_elements('["abc\ndef"]') a;
?column?
----------
abc +
def
(1 row)
On encoding that back into JSON, you get a single backslash again, because it's once again encoding a newline character.
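Here is a small side-by-side sketch (assuming standard_conforming_strings is on, which is the default) comparing a real newline, the two-character literal, and the value pulled out of the jsonb array:
SELECT
json_build_object(
'real_newline', E'abc\ndef',
'two_char_literal', 'abc\ndef',
'from_jsonb', a->>0
)
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
Only the two-character literal comes back with a doubled backslash ("abc\\ndef"); the other two values contain a real newline and are encoded as a single \n.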
If you want to escape it with an extra backslash, I suggest a simple replace:
SELECT
json_build_object(
'text_from_jsonb_with_replace', replace(a->>0, E'\n', '\n')
)
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
json_build_object
------------------------------------------------
{"text_from_jsonb_with_replace" : "abc\\ndef"}

Strip backslashes from encoded JSON response

Building a JSON response with Erlang. First I construct the data as terms and then use jsx to convert it to JSON:
Response = jsx:term_to_json(MealsListResponse),
The response actually is valid JSON according to the validators I have used.
The problem is when parsing the response on the front end. Is there a way to strip the backslashes on the Erlang side, so that they will not appear in the response payload?
The backslashes are not actually part of the string. They're just used when the string is printed as a term - that is, in the same way you'd write it in an Erlang source file. This works in the same way as character escapes in strings in C and similar languages: inside double quotes, double quotes that should be part of the string need to be escaped with backslashes, but the backslashes don't actually make it into the string.
To print the string without character escapes, you can use the ~s directive of io:format:
io:format("~s~n", [Response]).
If you're sending the response over a TCP socket, all you need to do is convert the string to a binary with an appropriate Unicode conversion. Most of the time you'll want UTF-8, which you can get with:
gen_tcp:send(MySocket, unicode:characters_to_binary(Response)).

Json parsing with unicode characters

I have a JSON file with Unicode characters, and I'm having trouble parsing it. I've tried the JSON library in Flash CS5, and I have tried it at http://json.parser.online.fr/, and I always get "unexpected token - eval fails".
I'm sorry, there really was a problem with the syntax; it came this way from the client.
Can someone please help me? Thanks
Quoth the RFC:
JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
So a correctly encoded Unicode character should not be a problem. Which leads me to believe that it's not correctly encoded (maybe it uses latin-1 instead of UTF-8). How did you create the file? In a text editor?
There might be an obscure Unicode whitespace character hidden in your string.
This URL contains more detail:
http://timelessrepo.com/json-isnt-a-javascript-subset
In ASP.NET you would think you could use System.Text.Encoding to convert a string like "Paul\u0027s" back to a string like "Paul's", but I tried for hours and found nothing that worked.
The trouble is that hardcoding a string as shown above already decodes it, as you will see if you put a breakpoint on it, so in the end I wrote a loop that converts the hex escape (e.g. \u0027) into a decimal HTML entity (e.g. &#39;), so that I ended up with HTML encoding, and then decoded that.
// Needs System.Globalization (for NumberStyles) and System.Web in scope.
string Padding = "000";
for (int f = 1; f <= 256; f++)
{
    // Build the literal escape text, e.g. f = 27 gives "\u0027".
    string Hex = "\\u" + Padding.Substring(0, 4 - f.ToString().Length) + f;
    // Re-read those same digits as hex and emit an HTML numeric entity, e.g. "&#39;".
    string Dec = "&#" + Int32.Parse(f.ToString(), NumberStyles.HexNumber) + ";";
    HTML = HTML.Replace(Hex, Dec);
}
// Finally decode the HTML entities back to plain characters.
HTML = System.Web.HttpUtility.HtmlDecode(HTML);
Ugly as sin, I know, but without using the latest framework (not available on the ISP's server) it was the best I could do, and someone must know a better solution.
I had the same problem, and I just changed the file encoding from Mac-Roman/Windows-1252 to UTF-8, and it worked.
I had the same problem with Twitter JSON files. I was parsing them in Python with json.loads(tweet), but it failed for half of the records.
I changed to Python 3 and it works well now.
If you seem to have trouble with the encoding of a JSON file generated by Python with json.dumps() (i.e. escaped codes such as \u00fc aren't displayed correctly regardless of your editor's encoding setting): it outputs ASCII by default and escapes the Unicode characters! See python json unicode - how do I eval using javascript (and python: json.dumps can't handle utf-8? and Why does json.dumps escape non-ascii characters with "\uxxxx").