Hive and get_json_object strange behavior

I am running a Hive query that uses get_json_object to read JSON strings from files in HDFS,
and I bumped into some strange behavior.
If the JSON is as follows:
{"data":{"oneSlash":"aaa\bbb","twoSlashes":"ccc\\ddd","threeSlashes":"eee\\\fff"}}
The result of the query is:
{"oneSlash":"aaabbb","twoSlashes":"ccc\\ddd","threeSlashes":"eee\\fff"}
I understand the 'oneSlash' and 'threeSlashes' results, but why doesn't 'twoSlashes' equal "ccc\ddd"?
After all, '\\' should be unescaped to '\'.
BTW, the query is:
SELECT get_json_object(escaping_test.data, '$.data') FROM escaping_test

It's because \b and \f are valid escape characters, whereas \d is not. There's a post about this in more detail: Where can I find a list of escape characters required for my JSON ajax return type?
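For reference, the JSON spec only defines a small set of escape sequences, and any strict parser will show the distinction. A quick sketch in Python (used here just as a convenient checker, not as Hive's actual parser):

import json

# \\ and \b are valid JSON escapes; \d is not.
print(json.loads(r'"ccc\\ddd"'))          # ccc\ddd -- \\ decodes to one backslash
print(repr(json.loads(r'"aaa\bbb"')))     # 'aaa\x08bb' -- \b decodes to a backspace
try:
    json.loads(r'"xxx\dyyy"')
except json.JSONDecodeError as e:
    print(e)                              # strict parsers reject the invalid escape \d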

Related

How do I convince Splunk that a backslash inside a CSV field is not an escape character?

I have the following row in a CSV file that I am ingesting into a Splunk index:
"field1","field2","field3\","field4"
Excel and the default Python CSV reader both correctly parse that as 4 separate fields. Splunk does not. It seems to be treating the backslash as an escape character and interpreting field3","field4 as a single mangled field. It is my understanding that the standard escape character for double quotes inside a quoted CSV field is another double quote, according to RFC-4180:
"If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote."
Why is Splunk treating the backslash as an escape character, and is there any way to change that configuration via props.conf or any other way? I have set:
INDEXED_EXTRACTIONS = csv
KV_MODE = none
for this sourcetype in props.conf, and it is working fine for rows without backslashes in them.
UPDATE: Yeah so Splunk's CSV parsing is indeed not RFC-4180 compliant, and there's not really any workaround that I could find. In the end I changed the upstream data pipeline to output JSON instead of CSVs for ingestion by Splunk. Now it works fine. Let this be a cautionary tale if anyone stumbles across this question while trying to parse CSVs in Splunk!
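In case it helps anyone making the same switch, the CSV-to-JSON step can be as simple as the sketch below; the in.csv/out.json paths and column names are placeholders:

import csv, json

# Emit one JSON object per line (JSON Lines); json.dumps escapes backslashes
# unambiguously, so the ingesting system no longer misparses them.
columns = ["field1", "field2", "field3", "field4"]
with open("in.csv", newline="") as src, open("out.json", "w") as dst:
    for row in csv.reader(src):
        dst.write(json.dumps(dict(zip(columns, row))) + "\n")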

How to escape single quotes in json string? JSON::ParserError Ruby

I'm getting
/json/common.rb:156:in `parse': 783: unexpected token at '' (JSON::ParserError)
while trying to parse a JSON file in Ruby. I thought the problem was that there were some single quotes in one of the strings:
parsed = JSON.parse("{
  \"key1\":\"value1\",
  \"key2\":\"value2\",
  \"key3\":12345,
  \"key4\":\"''value4''\",
}")
Is there a way to escape the single quotes in the strings without affecting words like don't? The JSON is read from a file using JSON.parse(file.get_input_stream.read); that's why there are \" escapes in the snippet above.
The single quotes aren't your problem; your problem is that you have a stray trailing comma:
parsed = JSON.parse("{
  \"key1\":\"value1\",
  \"key2\":\"value2\",
  \"key3\":12345,
  \"key4\":\"''value4''\",
}") #--------------------^ This should not be there.
JSON doesn't allow that comma, so you don't actually have a JSON file.
You should figure out where the file came from and fix that tool to produce real JSON, rather than the "looks mostly like JSON" that is currently being written to the file.
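The failure is easy to reproduce in any strict JSON parser, for example in Python (same behavior as Ruby's JSON::ParserError):

import json

# The trailing comma is rejected...
try:
    json.loads('{"key4": "\'\'value4\'\'",}')
except json.JSONDecodeError as e:
    print(e)
# ...but without it, the single quotes parse fine; they need no escaping in JSON.
print(json.loads('{"key4": "\'\'value4\'\'"}'))   # {'key4': "''value4''"}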

How to get PostgreSQL to escape text from jsonb_array_element?

I'm loading some JSON from Postgres 13 into Elasticsearch using Logstash and ran into some errors caused by text not being escaped with reverse solidus. I tracked my problem down to this behavior:
SELECT
  json_build_object(
    'literal_text', 'abc\ndef'::text,
    'literal_text_type', pg_typeof('abc\ndef'::text),
    'text_from_jsonb_array_element', a->>0,
    'jsonb_array_element_type', pg_typeof(a->>0)
  )
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
{
  "literal_text": "abc\\ndef",
  "literal_text_type": "text",
  "text_from_jsonb_array_element": "abc\ndef",
  "jsonb_array_element_type": "text"
}
db-fiddle
json_build_object encodes the literal text as expected (turning \n into \\n); however, it doesn't encode the text retrieved via jsonb_array_element even though both are text.
Why is the text extracted from jsonb_array_element being treated differently (not getting escaped by json_build_object)? I've tried casting, using jsonb_array_elements_text (though my actual use case involves an array of arrays, so I need to split into a set of jsonb), and various escaping/encoding/formatting functions, but haven't found a solution yet.
Is there a trick to cast text pulled from jsonb_array_element so it will get properly encoded by json_build_object?
Thanks for any hints or solutions.
Those strings look awfully similar, but they're actually different. When you create a string literal like '\n', that's a backslash character followed by an "n" character. So when you put that into json_build_object, it needs to add a backslash to escape the backslash you're giving it.
On the other hand, when you call jsonb_array_elements('["abc\ndef"]'), you're saying that the JSON has precisely a \n encoded in it with no second backslash, and therefore when it's converted to text, that \n is interpreted as a newline character, not two separate characters. You can see this easily by running the following:
SELECT a->>0 FROM jsonb_array_elements('["abc\ndef"]') a;
?column?
----------
abc +
def
(1 row)
On encoding that back into a JSON, you get a single backslash again, because it's once again encoding a newline character.
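The same round trip happens in any JSON implementation, not just Postgres. For example, in Python:

import json

s = json.loads('"abc\\ndef"')   # the JSON escape \n decodes to a real newline
print(repr(s))                  # 'abc\ndef' -- one newline character, no backslash
print(json.dumps(s))            # "abc\ndef" -- re-encoded with a single backslash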
If you want to escape it with an extra backslash, I suggest a simple replace:
SELECT
  json_build_object(
    'text_from_jsonb_with_replace', replace(a->>0, E'\n', '\n')
  )
FROM jsonb_array_elements('["abc\ndef"]') jae (a);
json_build_object
------------------------------------------------
{"text_from_jsonb_with_replace" : "abc\\ndef"}

Removing backslashes in Mule 4 DataWeave transformation

I am fetching data from a SQL Server database and transforming it into JSON in Mule 4. My input has a single backslash, which gets converted to double backslashes in the output. I only need a single backslash in my output.
Input example:
abchd\kdgf
Output is:
"abchd\\kdgf"
It should be:
"abchd\kdgf"
Can anyone help with this DataWeave transformation?
In JSON strings the backslash character is the escape character, and it has to be escaped itself to represent a single backslash. That's how JSON works; it is not a Mule issue.
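You can see that the doubled backslash exists only in the serialized JSON, not in the value itself. A quick check in Python:

import json

value = "abchd\\kdgf"                   # the in-memory value holds ONE backslash
print(json.dumps(value))                # "abchd\\kdgf" -- two characters on the wire
print(json.loads(json.dumps(value)))    # abchd\kdgf -- one again after parsing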
Here the single backslash is treated internally as a double backslash. Try a DataWeave expression like the one below:
payload replace /([\\])/ with ("")
Hope it helps.

Spark CSV writer - escape string without using quotes

I am trying to escape a delimiter character that appears inside the data. Is there a way to do it by passing option parameters? I can do it from a UDF, but I am hoping it is possible using options.
val df = Seq((8, "test,me\nand your", "other")).toDF("number", "test", "t")
df.coalesce(1).write.mode("overwrite").format("csv").option("quote", "\u0000").option("delimiter", ",").option("escape", "\\").save("testcsv1")
But the escape is not working. The output file is written as:
8,test,me
and your,other
I want the output file to be written as:
8,test\,me\\nand your,other
I'm not certain, but I think if you had your sequence as
Seq((8, "test\\,me\\\\nand your", "other"))
and did not specify a custom escape character, it would behave as you are expecting and give you 8,test\,me\\nand your,other as the output. This is because \\ acts simply as the character '\' rather than an escape, so the backslashes are printed where you want them and the n immediately after is not interpreted as part of a newline escape.
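The string-literal point is the same in most languages; here is a quick illustration in Python rather than Scala:

s = "test\\,me\\\\nand your"
print(s)   # test\,me\\nand your -- each \\ in the literal is one real backslash,
           # so the n after them is plain text, not a newline escape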