Create .jsonl files from .csv

I want to use AutoML, specifically the Entity extraction, however, I'm asked to upload a .jsonl file.
I don't know what a .jsonl file is or how to create one. I only have a .csv file.
So, how can I create a .jsonl file from a .csv file? And if that is not possible, how can I create a .jsonl file?

.jsonl is the JSON Lines format: http://jsonlines.org/
You can create it with Miller (https://github.com/johnkerl/miller). For example, if your input CSV is
fieldOne,FieldTwo
1,lorem
2,ipsum
you can run
mlr --c2j cat input.csv > output.jsonl
to have
{ "fieldOne": 1, "FieldTwo": "lorem" }
{ "fieldOne": 2, "FieldTwo": "ipsum" }
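This output is JSON Lines (one valid JSON object per row). If you want a single JSON array instead, add the --jlistwrap flag. Note that this is the Miller 5 behaviour; in Miller 6 and later, --c2j wraps the records in an array by default, and JSON Lines output is requested with --ojsonl (or the --c2l shorthand).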
mlr --c2j --jlistwrap cat input.csv
to have
[
{ "fieldOne": 1, "FieldTwo": "lorem" }
,{ "fieldOne": 2, "FieldTwo": "ipsum" }
]
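If Miller is not available, the same conversion can be sketched with Python's standard library. This is only an illustration: the function name csv_to_jsonl is made up here, and unlike Miller it does no type inference, so every value comes out as a JSON string ("1" rather than 1).

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text with a header row into JSON Lines text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # One JSON object per CSV row, each terminated by a newline.
    return "".join(json.dumps(row) + "\n" for row in reader)


if __name__ == "__main__":
    sample = "fieldOne,FieldTwo\n1,lorem\n2,ipsum\n"
    print(csv_to_jsonl(sample), end="")
```

To convert a file, read it with open(path, newline="", encoding="utf-8") and write the returned text to a file ending in .jsonl.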

Related

How to add commas in between JSON objects using Linux Shell and SnowSQL?

While there are several posts about this topic on Stack Overflow, none match my exact use case. I am using a Linux shell script to run SnowSQL to generate a json file.
My json file needs to have a comma between json objects.
This:
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
}
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
...needs to look like this:
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
},
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
Here is my complete ksh script:
#!/usr/bin/ksh
. /appl/.snf_logon
export SNOW_PKEY_FILE=$(mktemp ./pkey-XXXXXX)
trap "rm -f ${SNOW_PKEY_FILE}" EXIT
LibGetSnowCred
{
outFile=JSON_FILE_TYPE_TEST.json
inDir=/testing
outFileNm=#my_db.my_schema.my_file_stage/${outFile}
snowsql \
--private-key-path $SNOW_PKEY_FILE \
-o exit_on_error=true \
-o friendly=false \
-o timing=false \
-o log_level=ERROR \
-o echo=true <<!
COPY INTO ${outFileNm}
FROM (SELECT object_construct(
'UUID',UUID
,'CAMPAIGN',CAMPAIGN)
FROM my_db.my_schema.JSON_Test_Table
LIMIT 2)
FILE_FORMAT=(
TYPE=JSON
COMPRESSION=NONE
)
OVERWRITE=True
HEADER=False
SINGLE=True
MAX_FILE_SIZE=4900000000
;
get ${outFileNm} file://${inDir}/;
rm ${outFileNm};
!
if [ $? -eq 0 ]; then
echo "Export successful"
else
echo "ERROR in export"
fi
}
Is it best practice to add the comma in the SELECT or after the file is generated, and how would I do that?
With or without that comma, the file is not a valid JSON document; it only looks like one. You export several rows, each as an independent object. To produce valid JSON you need to gather all of these objects into an array.
A JSON that encodes an array of rows looks like this:
[
{
"CAMPAIGN": "Welcome_New",
"UUID": "fe881781-bdc2-41b2-95f2-e0e8c19dc597"
},
{
"CAMPAIGN": "Welcome_Existing",
"UUID": "77a41c02-beb9-48bf-ada4-b2074c1a78cb"
}
]
The easiest way to produce this output is to let the database do it, if it supports wrapping all the records into one list before generating the JSON instead of exporting each record as a separate JSON.
If that is not possible, you end up with a file that contains multiple JSONs. You can use jq to convert these individual JSONs into a single JSON like the one above (an array of objects).
It is as simple as this:
jq --slurp '.' input_file > output_file
The --slurp option tells jq to read all the JSONs from input_file into memory, parse them, and put them into one array. That array is the program's input.
'.' is the jq program. It means "output the current value" and does no processing of the input data. Here the current value is the array.
After it executes the program (which in this case changes nothing), jq writes the result (as JSON, of course) to standard output (by default, the screen).
The > output_file part redirects this output to a file (named output_file) instead of showing it on screen.
You can see how it works on the jq playground.
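For readers without jq, the slurp step can be approximated in Python. This is a rough sketch, not a replacement for jq: the function name slurp is invented for this example, and it relies on json.JSONDecoder.raw_decode to parse one value at a time from a string that holds several concatenated JSON values.

```python
import json


def slurp(text: str) -> list:
    """Parse a string of concatenated JSON values into a list of values."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


if __name__ == "__main__":
    sample = '{"CAMPAIGN": "Welcome_New"}\n{"CAMPAIGN": "Welcome_Existing"}\n'
    # Prints one JSON array holding both objects.
    print(json.dumps(slurp(sample), indent=2))
```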

Merge multiple JSON files and include filename of each file in the resulting object

I have hundreds of files named [guid].json, all with a structure similar to this:
{
"Active": true,
"CaseType": "CaseType",
"CustomerGroup": ["Core", "Extended"]
}
First I need to append a new key-value pair to all files with "CaseId": "[filename]" and then merge them all into one big array and save it as a new json manifest file.
I would like one file with the following structure from a jq command:
[
{
"Active": true,
"CaseType": "CaseType",
"CustomerGroup": ["Core", "Extended"],
"CaseId": "43d47f66-5a0a-4b86-88d6-1f1f893098d2"
},
{
"Active": true,
"CaseType": "CaseType",
"CustomerGroup": ["Core", "Extended"],
"CaseId": "e3x47f66-5a0a-4b86-88d6-1f1f893098d2"
}
]
You're looking for input_filename.
jq -n '[ inputs | .CaseId = input_filename ]' *.json
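Alternatively, use reduce to add one input object at a time. input_filename returns the name of the file currently being read, which becomes the CaseId of each record. Note that the name includes the .json extension; pipe it through rtrimstr(".json") if only the GUID is wanted.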
jq -n 'reduce inputs as $d (null; . + [ $d + { CaseId: input_filename } ] )' *.json
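The same merge can be sketched in Python when jq is not an option. The function name merge_with_case_id is hypothetical, and the sketch assumes every *.json file in the directory holds a single JSON object. It uses the file name without its extension as the CaseId.

```python
import json
import tempfile
from pathlib import Path


def merge_with_case_id(directory: str) -> list:
    """Load every *.json file in directory and tag each object with its file stem."""
    merged = []
    for path in sorted(Path(directory).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        record["CaseId"] = path.stem  # file name without the .json suffix
        merged.append(record)
    return merged


if __name__ == "__main__":
    # Small demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "43d47f66.json").write_text('{"Active": true}', encoding="utf-8")
        print(json.dumps(merge_with_case_id(tmp), indent=2))
```

Write the manifest outside the scanned directory (or with a different extension), otherwise a second run will pick it up as an input.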

jq bash Adding a json field to a json file

I'm stuck on a jq input problem. I have a JSON file that looks like this:
{
"main_object": {
"child1": ["banana", "apple", "orange"]
}
}
I need to add another child object and rewrite this file. The problem is that the child object has to be generated dynamically, so I'm doing this:
added_string=$(printf '.main_object += {%s: %s}' "$child_name" "$fruits")
Then I wrote this line, which worked well on my mac shell:
edited_json=$(cat $json_variable_file | jq $added_string)
When I tried to run all of this from a bash script, I got this error:
jq: error: Could not open file +=: No such file or directory
jq: error: Could not open file {"child2":: No such file or directory
jq: error: Could not open file ["orange","potato","watermelon"]}: No such file or directory
I have tried many things so far, and most of them give me the same error. I also tried this:
edited_json=$(cat $json_variable_file | jq <<< $added_string)
The error I got is this:
parse error: Invalid numeric literal at line 1, column 23
I'd really appreciate your time. The odd thing is that it works completely fine in zsh, generating the needed JSON, but it does not work in bash.
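Quote the variable. Bash splits an unquoted $added_string on whitespace and passes the pieces to jq as separate arguments, so jq treats everything after the first piece as file names. zsh does not split unquoted variables by default, which is why the command worked there. This works with both bash and zsh: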
child_name="child2"
fruits='["orange","potato","watermelon"]'
added_string=$(printf '.main_object += {%s: %s}' "$child_name" "$fruits")
cat file | jq "$added_string" # quotes are important
Output:
{
"main_object": {
"child1": [
"banana",
"apple",
"orange"
],
"child2": [
"orange",
"potato",
"watermelon"
]
}
}
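If assembling the jq filter as a string becomes fragile, the same update can be done on parsed data, which sidesteps shell quoting entirely. Below is a minimal Python sketch; the function name add_child is made up for illustration, and it assumes the document has a main_object key as in the question.

```python
import json


def add_child(document: dict, child_name: str, fruits: list) -> dict:
    """Return a copy of document with main_object[child_name] set to fruits."""
    updated = dict(document)
    updated["main_object"] = {**document["main_object"], child_name: fruits}
    return updated


if __name__ == "__main__":
    doc = {"main_object": {"child1": ["banana", "apple", "orange"]}}
    result = add_child(doc, "child2", ["orange", "potato", "watermelon"])
    print(json.dumps(result, indent=2))
```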

Need help! - Unable to load JSON using COPY command

Need your expertise here!
I am trying to load a JSON file (generated by JSON dumps) into Redshift using the COPY command. The file is in the following format:
[
{
"cookieId": "cb2278",
"environment": "STAGE",
"errorMessages": [
"70460"
]
}
,
{
"cookieId": "cb2271",
"environment": "STG",
"errorMessages": [
"70460"
]
}
]
We ran into the error "Invalid JSONPath format: Member is not an object."
When I removed the square brackets [] and the comma separators between the JSON dicts, the file loaded perfectly fine:
{
"cookieId": "cb2278",
"environment": "STAGE",
"errorMessages": [
"70460"
]
}
{
"cookieId": "cb2271",
"environment": "STG",
"errorMessages": [
"70460"
]
}
But in reality, most JSON files from APIs have the array formatting.
I could use string replacement or a regex to get rid of the commas and the [], but I am wondering whether there is a better way to load the file into Redshift seamlessly, without modifying it.
One way to convert a JSON array into a stream of the array's elements is to pipe the former into jq '.[]'. The output is sent to stdout.
If the JSON array is in a file named input.json, then the following command will produce a stream of the array's elements on stdout:
$ jq ".[]" input.json
If you want the output in JSON Lines format, with each element on a single line, add the -c switch (i.e. jq -c ".[]" input.json).
For more on jq, see https://stedolan.github.io/jq
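If the file is produced by a Python script in the first place, the array can be flattened before it is written. The sketch below mirrors jq -c '.[]' using only the standard library; the function name array_to_jsonl is an assumption of this example, not part of any Redshift or jq tooling.

```python
import json


def array_to_jsonl(array_text: str) -> str:
    """Turn the text of a JSON array into one compact JSON value per line."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # Compact separators keep each element on a single line without padding.
    return "".join(json.dumps(r, separators=(",", ":")) + "\n" for r in records)


if __name__ == "__main__":
    sample = '[{"cookieId": "cb2278", "errorMessages": ["70460"]}, {"cookieId": "cb2271"}]'
    print(array_to_jsonl(sample), end="")
```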

Create a nested object json file using bash

I have a small bash script that walks a directory and its subdirectories (media/) and writes the output to a JSON file.
The line that outputs the json is as follows:
printf '{"id":%s,"foldername":"%s","path":"%s","date":"%s","filename":"%s"},\n' $num "$folder" "/media/$file2" "$fullDate" "$filename" >> /media/files.json
The json file looks like this:
{"id":1,"foldername":"5101","path":"/media/5101/Musicali10.mp3","date":"2015-08-09:13:16","filename":"Musicali10"},
{"id":2,"foldername":"5101","path":"/media/5101/RumCollora.mp3","date":"2015-08-09:13:16","filename":"RumCollora"}
I would like it to group all the files in a folder and output something like this:
[ {
"id":1,
"foldername":"5101",
"files":[
{
"path":"/media/5101/Musicali10.mp3",
"date":"2015-08-09:13:16",
"filename":"Musicali10"
},
{
"path":"/media/5101/RumCollora.mp3",
"date":"2015-08-09:13:16",
"filename":"RumCollora"
}
] },
{
"id":2,
"foldername":"3120",
"files":[
{
"path":"/media/3120/Marimba4.mp3",
"date":"2015-08-04:10:15",
"filename":"Marimba4"
},
{
"path":"/media/3120/Rumbidzaishe6.mp3",
"date":"2015-08-04:09:10",
"filename":"Rumbidzaishe6"
}
]
}
]
My question is how to create a JSON file that has nested "files" objects. I want each "foldername" to have a nested list of files. So far I am only able to output each file as a separate object using the printf statement above.
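One possible approach, sketched here in Python rather than bash, is to keep collecting the flat per-file records and group them afterwards. The function name group_by_folder is hypothetical. The sketch assumes each record has the foldername, path, date and filename keys shown above, and it numbers folders in order of first appearance, which may differ from the ids the original script assigns.

```python
import json


def group_by_folder(records: list) -> list:
    """Group flat file records into one entry per foldername with a nested files list."""
    groups = {}  # foldername -> group entry; dicts preserve insertion order
    for rec in records:
        group = groups.get(rec["foldername"])
        if group is None:
            group = {"id": len(groups) + 1, "foldername": rec["foldername"], "files": []}
            groups[rec["foldername"]] = group
        group["files"].append(
            {"path": rec["path"], "date": rec["date"], "filename": rec["filename"]}
        )
    return list(groups.values())


if __name__ == "__main__":
    flat = [
        {"id": 1, "foldername": "5101", "path": "/media/5101/Musicali10.mp3",
         "date": "2015-08-09:13:16", "filename": "Musicali10"},
        {"id": 2, "foldername": "5101", "path": "/media/5101/RumCollora.mp3",
         "date": "2015-08-09:13:16", "filename": "RumCollora"},
    ]
    print(json.dumps(group_by_folder(flat), indent=2))
```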