NiFi CSVRecordSetWriter problems - csv

I am trying to transform JSON to CSV using the UpdateRecord processor.
My input JSON can contain fields with | and ". In the output CSV file I want to use | as the field delimiter and " for quoting all fields.
So, example for input json file:
{"field1": "value1",
"id": "11112",
"desc1": "description Text \""
}
I get the following result:
"value1"|"11112"|"description Text ""
Expected result:
"value1"|"11112"|"description Text \""
Properties of the CSVRecordSetWriter:
What can I do to add a \ before the quotes in the values?
I also have a problem when there is a \ in a field (such as "example str\"): I expect example str\\" in the output, but that is not what I get.
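As a side note on the observed behavior: doubling the quote character, as in the actual result above, is the standard CSV escape per RFC 4180, and it is what most CSV writers produce by default. jq's @csv builtin, for example, escapes the same way; a quick check (this is just an illustration of the convention, not a NiFi fix):

```shell
# jq's @csv escapes an embedded quote by doubling it, the RFC 4180 convention:
printf '%s\n' '{"desc1": "description Text \""}' | jq -r '[.desc1] | @csv'
# prints: "description Text """
```

Getting backslash-escaped quotes instead would require a writer configured for backslash escaping rather than the default quote doubling.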

Related

How to prettify json with jq given a string with escaped double quotes

I would like to pretty print a json string I copied from an API call which contains escaped double quotes.
Similar to this:
"{\"name\":\"Hans\", \"Hobbies\":[\"Car\",\"Swimming\"]}"
However, when I execute pbpaste | jq "." the result is not changed and is still on one line.
I guess the problem is the escaped double quotes, but I don't know how to tell jq to remove them if possible, or any other workaround.
What you have is a JSON string which happens to contain a JSON object. You can decode the contents of the string with the fromjson function.
$ pbpaste | jq "."
"{\"name\":\"Hans\", \"Hobbies\":[\"Car\",\"Swimming\"]}"
$ pbpaste | jq "fromjson"
{
"name": "Hans",
"Hobbies": [
"Car",
"Swimming"
]
}
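To reproduce this without the macOS clipboard, the same string can be piped in with printf (the single quotes keep the backslashes intact); -c is used here only so the result fits on one line:

```shell
# fromjson decodes the JSON value embedded in the string:
printf '%s\n' '"{\"name\":\"Hans\", \"Hobbies\":[\"Car\",\"Swimming\"]}"' | jq -c 'fromjson'
# prints: {"name":"Hans","Hobbies":["Car","Swimming"]}
```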

Convert a JSON representation of CSV data to actual CSV data

This self-answered question is about transforming a JSON representation of CSV data into actual CSV data.[1]
The following JSON contains separate properties that describe the column names (columns) and the arrays of corresponding row values (rows), respectively:
{
"columns": [
{
"name": "ColumnName1",
"type": "Number"
},
{
"name": "ColumnName2",
"type": "String"
},
{
"name": "ColumnName3",
"type": "String"
}
],
"rows": [
[
11111,
"ResourceType1",
"String1"
],
[
22222,
"ResourceType2",
"String2"
],
[
33333,
"ResourceType3",
"String3"
]
]
}
How can I convert this JSON input to the CSV data it represents?
[1] The question duplicates this closed question, which was closed presumably due to lack of effort, even though what it asks for is reasonably well-defined.
Note that CSV files have no concept of data types - all values are strings -
so the data-type information (from the .columns[].type properties) is lost, unless you choose to incorporate it
in some way, as a convention that the consumer of the CSV would have to be aware of (the code below does not do that).
Assume that the JSON in the question is saved in file file.json; it can be read as text with Get-Content and parsed into a ([pscustomobject]) object graph with ConvertFrom-Json:
# Convert the JSON text into a [pscustomobject] object graph.
$fromJson = ConvertFrom-Json (Get-Content -Raw file.json)
# Process the array of column names and the arrays of row values by
# enclosing the array elements in "..." and joining them with ","
(, $fromJson.Columns.Name + $fromJson.Rows).ForEach({
$_.ForEach({ '"{0}"' -f ($_ -replace '"', '""') }) -join ','
})
Note that the above encloses the column names and values in "..." so as to also support
names and values with embedded , characters; additionally, any embedded " characters are properly escaped by doubling them.
If you know that the input data neither contains values with embedded , nor
", you can simply omit the inner .ForEach() array method
call above, which will result in unquoted values.
The above outputs:
"ColumnName1","ColumnName2","ColumnName3"
"11111","ResourceType1","String1"
"22222","ResourceType2","String2"
"33333","ResourceType3","String3"
To convert the above, in memory, to ([pscustomobject]) objects representing the CSV data, use ConvertFrom-Csv (... represents the command above):
... | ConvertFrom-Csv
To save the above to a CSV file, use Set-Content:
... | Set-Content -Encoding utf8 out.csv
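If PowerShell isn't available, the same transformation can be sketched in jq (assuming the sample above is saved in file.json); every field is quoted and embedded " characters are doubled, matching the PowerShell output:

```shell
# Header row from .columns[].name, then each row; quote every field and
# double any embedded " characters (RFC 4180-style escaping).
jq -r '([.columns[].name], .rows[])
       | map(tostring | "\"" + gsub("\""; "\"\"") + "\"")
       | join(",")' file.json
```

tostring is what lets the numeric row values be quoted like the strings.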

Print empty string for a missing key and converting the result to CSV

I am trying to convert the JSON below to CSV using the jq command, but the final CSV does not place the deviceName field properly, as it is missing from some of the JSON lines.
{
"id": "ABC",
"deviceName": "",
"total": 100,
"master": 20
}
{
"id": "ABC",
"total": 100,
"master": 20
}
How can I make sure an empty value is used when the key is missing?
I tried the command below to generate the CSV:
./jq -r '[.[]] | @csv' > final.csv
But it gives CSV like the one below; as you can see, when the deviceName key is missing from the JSON, its cell shifts to the left:
"ABC","",100,20
"ABC",100,20
I want output something like below which adds empty value if deviceName is missing.
"ABC","",100,20
"ABC","",100,20
In jq you can use the alternative operator //, which returns a default value: e.g. .foo // 1 evaluates to 1 if there is no .foo element in the input.
Using that, and appending an empty string "" if the key is not present, you can do
jq -r '[.id // "", .deviceName // "", .total // "", .master // ""] | @csv'
Note: the alternative operator .foo // 1 also evaluates to 1 when the value of .foo is null or false, not only when it is missing. You may wish to modify the above program if your data contains legitimate null or false values.
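If the data can legitimately contain null or false values, one variant (just a sketch) is to test for key presence explicitly with has() instead of relying on //:

```shell
# Two records, the second one missing deviceName; has() yields "" only for
# genuinely absent keys, so an explicit false value, for example, survives.
printf '%s\n' \
  '{"id":"ABC","deviceName":"","total":100,"master":20}' \
  '{"id":"ABC","total":100,"master":20}' |
jq -r '["id","deviceName","total","master"] as $k
       | [ $k[] as $key | if has($key) then .[$key] else "" end ]
       | @csv'
# prints:
# "ABC","",100,20
# "ABC","",100,20
```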
You can take the first object as a reference for a full record, like
keys_unsorted as $k | (., inputs) | [.[$k[]]] | @csv
For your sample this produces the following:
"ABC","",100,20
"ABC",,100,20

How to load a JSON that has a literal unicode escape char "\\uNo" into Snowflake

I have the following JSON:
{
"name": "foo \\uNo bar"
}
I'm trying to load this into Snowflake using a STAGE on S3. This is in a CSV file like:
{"name": "foo \\uNo bar"}
However, when I try to load it, Snowflake breaks with an Error parsing JSON message. If I try to load it directly on Snowflake console, as SELECT PARSE_JSON('{"name": "foo \\uNo bar"}'), I get:
Error parsing JSON: hex digit is expected in \u???? escape sequence, pos 17
The problem is that Snowflake is parsing the string, checking for an unicode digit \uNo (which doesn't exist). How can I disable this?
The default FILE FORMAT for parsing CSVs in Snowflake interprets the double backslash in the string '{"name": "foo \\uNo bar"}' as an escape sequence for the single character \. This means the character sequence \uNo is what gets passed to PARSE_JSON, which then fails because \uNo is not a valid escape sequence in a JSON string. You can prevent this by overriding the FILE FORMAT escape-sequence settings.
Given this CSV file:
JSON
'{"name": "foo \\uNo bar"}'
And the following CREATE TABLE and COPY INTO statements:
CREATE OR REPLACE TABLE JSON_TEST (JSON TEXT);
COPY INTO JSON_TEST
FROM @my_db.public.my_s3_stage/json.csv
FILE_FORMAT = (TYPE = CSV
SKIP_HEADER = 1
FIELD_OPTIONALLY_ENCLOSED_BY = '\''
ESCAPE = NONE
ESCAPE_UNENCLOSED_FIELD = NONE);
I am able to parse the result as JSON:
SELECT PARSE_JSON(JSON) FROM JSON_TEST;
Which returns
+-----------------------------+
| JSON                        |
|-----------------------------|
| { "name": "foo \\uNo bar" } |
+-----------------------------+

How to get a formatted json string from a json object?

I'm storing the output of cat ~/path/to/file/blah | jq tojson in a variable to be used later in a curl POST with JSON content. It works well, but it removes all line breaks. I understand literal line breaks are not allowed inside JSON strings, but I'd like them to be replaced with \n characters so that when the data is used it isn't all on one line.
Is there a way to do this?
Example:
{
"test": {
"name": "test",
"description": "blah"
},
"test2": {
"name": "test2",
"description": "blah2"
}
}
becomes
"{\"test\":{\"name\":\"test\",\"description\":\"blah\"},\"test2\":{\"name\":\"test2\",\"description\":\"blah2\"}}"
but I'd like it to look like
{\n \"test\": {\n \"name\": \"test\",\n \"description\": \"blah\"\n },\n \"test2\": {\n \"name\": \"test2\",\n \"description\": \"blah2\" \n }\n}
I'm actually only converting it to a JSON string so it can be posted as part of another JSON. When it is posted, I'd like it to have the format it had originally, which can be achieved if there are \n characters.
I can do this manually by doing
cat file | sed -E ':a;N;$!ba;s/\r{0,1}\n/\\n/g' | sed 's/\"/\\"/g'
but this is not ideal.
tojson (and the other JSON-outputting filters) will not format the JSON; it produces the usual compact form. There is a feature request out there for this, so look out for it in a future version.
You could take advantage of jq's regular formatted output, but you'll need to stringify it. You can simulate stringifying by slurping the formatted output back in as raw input. This reads all of the input as a single string, and since that input was just a JSON object, it produces a string representation of that object.
If you don't mind the extra jq calls, you could do this:
$ var=$(jq '.' input.json | jq -sR '.')
$ echo "$var"
"{\n \"test\": {\n \"name\": \"test\",\n \"description\": \"blah\"\n },\n \"test2\": {\n \"name\": \"test2\",\n \"description\": \"blah2\"\n }\n}\n"
Then of course if your input is already formatted, you could leave out the first jq call.
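For instance, with a hypothetical already-pretty-printed input.json, a single slurp-raw call does the stringifying:

```shell
# -s slurps all input, -R reads it raw; jq then emits it as one JSON string.
var=$(jq -sR '.' input.json)
```

echo "$var" then prints the escaped one-line string, in the same shape as the output shown above.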
If your input contains only one JSON value, then jq isn't really buying you much here: all you need is to escape the few characters that are valid in JSON but that don't represent themselves in JSON strings, and you can easily do that using command-line utilities for general-purpose string processing.
For example:
perl -wpe '
s/\\/\\<backslash>/g;  # protect literal backslashes behind a sentinel
s/\t/\\t/g;            # escape tabs
s/\n/\\n/g;            # escape newlines
s/\r/\\r/g;            # escape carriage returns
s/"/\\"/g;             # escape double quotes
s/\\<backslash>/\\\\/g # finally turn the sentinel back into \\
' ~/path/to/file/blah