I'm trying to improve a bash script I wrote using jq (Python version), but can't quite get the conditional nature of the task at hand to work.
The task: insert an array from one JSON object ("lookup") into another ("target"), but only if the key in the "lookup" matches a particular "higher-level" value in the "target". Assume that the two JSON objects are in lookup.json and target.json, respectively.
A minimal example to make this clearer:
"Lookup" JSON:
{
"table_one": [
"a_col_1",
"a_col_2"
],
"table_two": [
"b_col_1",
"b_col_2",
"b_col_3"
]
}
"Target" JSON:
{
"top_level": [
{
"name": "table_one",
"tests": [
{
"test_1": {
"param_1": "some_param"
}
},
{
"test_2": {
"param_1": "another_param"
}
}
]
},
{
"name": "table_two",
"tests": [
{
"test_1": {
"param_1": "some_param"
}
},
{
"test_2": {
"param_1": "another_param"
}
}
]
}
]
}
I want the output to be:
{
  "top_level": [
    {
      "name": "table_one",
      "tests": [
        {
          "test_1": {
            "param_1": "some_param"
          }
        },
        {
          "test_2": {
            "param_1": "another_param",
            "param_2": [
              "a_col_1",
              "a_col_2"
            ]
          }
        }
      ]
    },
    {
      "name": "table_two",
      "tests": [
        {
          "test_1": {
            "param_1": "some_param"
          }
        },
        {
          "test_2": {
            "param_1": "another_param",
            "param_2": [
              "b_col_1",
              "b_col_2",
              "b_col_3"
            ]
          }
        }
      ]
    }
  ]
}
Hopefully, that makes sense. Early attempts slurped both JSON blobs and assigned them to two variables. I'm trying to select for a match on (roughly) ($lookup | keys[]) == $target.top_level.name, but I can't quite get this match or the subsequent array insert working.
Any advice is well-received!
Assuming the JSON samples have been corrected, and that the following program is in the file "target.jq", the invocation:
jq --argfile lookup lookup.json -f target.jq target.json
produces the expected result.
target.jq
.top_level |= map(
$lookup[.name] as $value
| .tests |= map(
if has("test_2")
then .test_2.param_2 = $value
else . end) )
Caveat
Since --argfile is officially deprecated, you might wish to choose an alternative method of passing in the contents of lookup.json, but --argfile is supported by all extant versions of jq as of this writing.
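One possible replacement for the deprecated --argfile (a sketch, not part of the original answer) is --argjson with command substitution, which leaves target.jq untouched. The cut-down sample files below are illustrative only:

```shell
# Minimal reproduction: one table, one test, then the --argjson invocation.
cat > lookup.json <<'EOF'
{"table_one": ["a_col_1", "a_col_2"]}
EOF
cat > target.json <<'EOF'
{"top_level": [{"name": "table_one", "tests": [{"test_2": {"param_1": "another_param"}}]}]}
EOF
cat > target.jq <<'EOF'
.top_level |= map(
  $lookup[.name] as $value
  | .tests |= map(
      if has("test_2")
      then .test_2.param_2 = $value
      else . end) )
EOF

# --argjson parses its argument as JSON, so $lookup is the object itself,
# exactly as with --argfile; -c is only for compact one-line output here.
result=$(jq -c --argjson lookup "$(cat lookup.json)" -f target.jq target.json)
echo "$result"
```

jq's --slurpfile is another non-deprecated option, but it wraps the file's contents in an array, so the program would have to index $lookup[0] instead.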
The jq answer has already been given, but the question itself is fascinating: it requires a cross-lookup from a source file into the file being modified, so I couldn't help also providing an alternative solution using the jtc utility:
<target.json jtc -w'<name>l:<N>v[-1][tests][-1:][0]' \
-i file.json -i'<N>t:' -T'{"param_2":{{}}}'
A brief overview of the options used:
-w'<name>l:<N>v[-1][tests][-1:][0]' - selects the points of insertion in the source (target.json): it finds the name keys to be looked up in the inserted file and memorizes them in namespace N, then rolls back one level up the JSON tree, selects the tests label, then the last entry in it, and finally addresses the first element of that last entry
-i file.json - makes an insertion from the file
-i'<N>t:' - this walk over file.json recursively finds a tag (label) preserved in namespace N by the respective -w walk (without this insert option with a walk argument, the whole file would be inserted at the insertion points from -w)
-T'{"param_2":{{}}}' - finally, a template operation is applied to the insertion result, transforming the entry found in file.json into one with the right label
PS. I'm the developer of jtc - a multithreaded JSON processing utility for Unix.
PPS. The disclaimer is required by SO.
I'm going to preface by saying that "no, find a different way to do it" is an acceptable answer here.
Is there a reliable way to store a short bit of JSON in a bash variable for use in an AWS CLI command running from the same script?
I'll be running a job from Jenkins that's updating an AWS Route53 record, which requires UPSERTing a JSON file with the change in records. Because it's running from Jenkins, there's no local storage where I can keep this file, and I'd really like to avoid needing to do a git checkout every time this project will run (which will be once an hour).
Ideally, storing the data in a variable ($foo) and calling it as part of the change-resource-record-sets command would be most convenient given the Jenkins setup, but I'm unfamiliar with exactly how to quote/store JSON inside bash - can it be done safely?
The specific JSON in this case is the following;
{"Comment":"Update DNSName.","Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"alex.","Type":"A","AliasTarget":{"HostedZoneId":"######","DNSName":"$bar","EvaluateTargetHealth":false}}}]}
As an added complication the DNSName value - $bar - needs to be expanded.
You could use a here-doc:
foo=$(cat <<EOF
{"Comment":"Update DNSName.","Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"alex.","Type":"A","AliasTarget":{"HostedZoneId":"######","DNSName":"$bar","EvaluateTargetHealth":false}}}]}
EOF
)
By leaving EOF in the first line unquoted, the contents of the here-doc will be subject to parameter expansion, so your $bar expands to whatever you put in there.
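A minimal sketch of that expansion behavior, reduced to a single field (the values here are made up):

```shell
# With an unquoted EOF, $bar inside the here-doc is expanded by the shell.
bar=baz
foo=$(cat <<EOF
{"DNSName":"$bar"}
EOF
)
echo "$foo"
```

Had the delimiter been quoted ('EOF'), the here-doc body would have been taken literally and $bar would not expand.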
If you can have linebreaks in your JSON, you can make it a little more readable:
foo=$(cat <<EOF
{
"Comment": "Update DNSName.",
"Changes": [
{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "alex.",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "######",
"DNSName": "$bar",
"EvaluateTargetHealth": false
}
}
}
]
}
EOF
)
or even (first indent on each line must be a tab)
foo=$(cat <<-EOF
{
"Comment": "Update DNSName.",
"Changes": [
{
"Action": "UPSERT",
"ResourceRecordSet": {
"Name": "alex.",
"Type": "A",
"AliasTarget": {
"HostedZoneId": "######",
"DNSName": "$bar",
"EvaluateTargetHealth": false
}
}
}
]
}
EOF
)
and to show how that is stored, including quoting (assuming that bar=baz):
$ declare -p foo
declare -- foo="{
\"Comment\": \"Update DNSName.\",
\"Changes\": [
{
\"Action\": \"UPSERT\",
\"ResourceRecordSet\": {
\"Name\": \"alex.\",
\"Type\": \"A\",
\"AliasTarget\": {
\"HostedZoneId\": \"######\",
\"DNSName\": \"baz\",
\"EvaluateTargetHealth\": false
}
}
}
]
}"
Because this expands some shell metacharacters, you could run into trouble if your JSON contains something like `, so alternatively, you could assign directly, but be careful about quoting around $bar:
foo='{"Comment":"Update DNSName.","Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"alex.","Type":"A","AliasTarget":{"HostedZoneId":"######","DNSName":"'"$bar"'","EvaluateTargetHealth":false}}}]}'
Notice the quoting for $bar: it's
"'"$bar"'"
│││ │││
│││ ││└ literal double quote
│││ │└ opening syntactical single quote
│││ └ closing syntactical double quote
││└ opening syntactical double quote
│└ closing syntactical single quote
└ literal double quote
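Reduced to just the field in question, the same pattern can be sanity-checked like this (a minimal sketch with made-up values):

```shell
# single-quoted literal, then "$bar" safely double-quoted, then back to literal
bar=baz
foo='{"DNSName":"'"$bar"'"}'
echo "$foo"
```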
It can be stored safely; generating it is a different matter, since the contents of $bar may need to be encoded. Let a tool like jq handle creating the JSON.
var=$(jq -n --arg b "$bar" '{
Comment: "Update DNSName.",
Changes: [
{
Action: "UPSERT",
ResourceRecordSet: {
Name: "alex.",
Type: "A",
AliasTarget: {
HostedZoneId: "######",
DNSName: $b,
EvaluateTargetHealth: false
}
}
}
]
}')
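The payoff of letting jq build the document is that the contents of $bar are properly escaped; a quick sketch with a deliberately awkward value (not from the question):

```shell
# A value containing a double quote and a newline, which would corrupt
# naively concatenated JSON; jq --arg escapes it correctly.
bar='has "quotes" and a
newline'
json=$(jq -cn --arg b "$bar" '{DNSName: $b}')
echo "$json"
```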
I have a preference for shell commands to get things done. I have a very, very big file -- about 2.8 GB and the content is that of JSON. Everything is on one line, and I was told there are at least 1.5 million records in there.
I must prepare the file for consumption. Each record must be on its own line. Sample:
{"RomanCharacters":{"Alphabet":[{"RecordId":"1",...]},{"RecordId":"2",...},{"RecordId":"3",...},{"RecordId":"4",...},{"RecordId":"5",...} }}
Or, use the following...
{"Accounts":{"Customer":[{"AccountHolderId":"9c585258-c94c-442b-a2f0-1ebbcc274795","Title":"Mrs","Forename":"Tina","Surname":"Wright","DateofBirth":"1988-01-01","Contact":[{"Contact_Info":"9168777943","TypeId":"Mobile Number","PrimaryFlag":"No","Index":"1","Superseded":"No" },{"Contact_Info":"9503588153","TypeId":"Home Telephone","PrimaryFlag":"Yes","Index":"2","Superseded":"Yes" },{"Contact_Info":"acne.pimple#microchimerism.com","TypeId":"Email Address","PrimaryFlag":"No","Index":"3","Superseded":"No" },{"Contact_Info":"swati.singh#microchimerism.com","TypeId":"Email Address","PrimaryFlag":"Yes","Index":"4","Superseded":"Yes" }, {"Contact_Info":"christian.bale#hollywood.com","TypeId":"Email Address","PrimaryFlag":"No","Index":"5","Superseded":"NO" },{"Contact_Info":"15482475584","TypeId":"Mobile_Phone","PrimaryFlag":"No","Index":"6","Superseded":"No" }],"Address":[{"AddressPtr":"5","Line1":"Flat No.14","Line2":"Surya Estate","Line3":"Baner","Line4":"Pune ","Line5":"new","Addres_City":"pune","Country":"India","PostCode":"AB100KP","PrimaryFlag":"No","Superseded":"No"},{"AddressPtr":"6","Line1":"A-602","Line2":"Viva Vadegiri","Line3":"Virar","Line4":"new","Line5":"banglow","Addres_City":"Mumbai","Country":"India","PostCode":"AB10V6T","PrimaryFlag":"Yes","Superseded":"Yes"}],"Account":[{"Field_A":"6884133655531279","Field_B":"887.07","Field_C":"A Loan Product",...,"FieldY_":"2015-09-18","Field_Z":"24275627"}]},{"AccountHolderId":"92a5788f-cd8f-423d-ae5f-4eb0ceb457fd","_Title":"Dr","_Forename":"Christopher","_Surname":"Carroll","_DateofBirth":"1977-02-02","Contact":[{"Contact_Info":"9168777943","TypeId":"Mobile Number","PrimaryFlag":"No","Index":"7","Superseded":"No" },{"Contact_Info":"9503588153","TypeId":"Home Telephone","PrimaryFlag":"Yes","Index":"8","Superseded":"Yes" },{"Contact_Info":"acne.pimple#microchimerism.com","TypeId":"Email Address","PrimaryFlag":"No","Index":"9","Superseded":"No" },{"Contact_Info":"swati.singh#microchimerism.com","TypeId":"Email 
Address","PrimaryFlag":"Yes","Index":"10","Superseded":"Yes" }],"Address":[{"AddressPtr":"11","Line1":"Flat No.14","Line2":"Surya Estate","Line3":"Baner","Line4":"Pune ","Line5":"new","Addres_City":"pune","Country":"India","PostCode":"AB11TXF","PrimaryFlag":"No","Superseded":"No"},{"AddressPtr":"12","Line1":"A-602","Line2":"Viva Vadegiri","Line3":"Virar","Line4":"new","Line5":"banglow","Addres_City":"Mumbai","Country":"India","PostCode":"AB11O8W","PrimaryFlag":"Yes","Superseded":"Yes"}],"Account":[{"Field_A":"4121879819185553","Field_B":"887.07","Field_C":"A Loan Product",...,"Field_X":"2015-09-18","Field_Z":"25679434"}]},{"AccountHolderId":"4aa10284-d9aa-4dc0-9652-70f01d22b19e","_Title":"Dr","_Forename":"Cheryl","_Surname":"Ortiz","_DateofBirth":"1977-03-03","Contact":[{"Contact_Info":"9168777943","TypeId":"Mobile Number","PrimaryFlag":"No","Index":"13","Superseded":"No" },{"Contact_Info":"9503588153","TypeId":"Home Telephone","PrimaryFlag":"Yes","Index":"14","Superseded":"Yes" },{"Contact_Info":"acne.pimple#microchimerism.com","TypeId":"Email Address","PrimaryFlag":"No","Index":"15","Superseded":"No" },{"Contact_Info":"swati.singh#microchimerism.com","TypeId":"Email Address","PrimaryFlag":"Yes","Index":"16","Superseded":"Yes" }],"Address":[{"AddressPtr":"17","Line1":"Flat No.14","Line2":"Surya Estate","Line3":"Baner","Line4":"Pune ","Line5":"new","Addres_City":"pune","Country":"India","PostCode":"AB12SQR","PrimaryFlag":"No","Superseded":"No"},{"AddressPtr":"18","Line1":"A-602","Line2":"Viva Vadegiri","Line3":"Virar","Line4":"new","Line5":"banglow","Addres_City":"Mumbai","Country":"India","PostCode":"AB12BAQ","PrimaryFlag":"Yes","Superseded":"Yes"}],"Account":[{"Field_A":"3288214945919484","Field_B":"887.07","Field_C":"A Loan Product",...,"Field_Y":"2015-09-18","Field_Z":"66264768"}]}]}}
Final outcome should be:
{"RomanCharacters":{"Alphabet":[{"RecordId":"1",...]},
{"RecordId":"2",...},
{"RecordId":"3",...},
{"RecordId":"4",...},
{"RecordId":"5",...} }}
Attempted commands:
sed -e 's/,{"RecordId"/}]},\n{"RecordId"/g' sample.dat
awk '{gsub(",{\"RecordId\"",",\n{\"RecordId\"",$0); print $0}' sample.dat
The attempted commands work perfectly fine for small files, but not for the 2.8 GB file I must manipulate. sed quit midway after 10 minutes for no apparent reason, having written nothing. awk errored with a segmentation fault (core dump) after many hours. I tried Perl's search and replace and got an "Out of memory" error.
Any help/ ideas would be great!
Additional info on my machine:
More than 105 GB disk space available.
8 GB memory
4 cores CPU
Running Ubuntu 14.04
Since you've tagged your question with sed, awk AND perl, I gather that what you really need is a recommendation for a tool. While that's kind of off-topic, I believe that jq is something you could use for this. It will be better than sed or awk because it actually understands JSON. Everything shown here with jq could also be done in perl with a bit of programming.
Assuming content like the following (based on your sample):
{"RomanCharacters":{"Alphabet": [ {"RecordId":"1","data":"data"},{"RecordId":"2","data":"data"},{"RecordId":"3","data":"data"},{"RecordId":"4","data":"data"},{"RecordId":"5","data":"data"} ] }}
You can easily reformat this to "prettify" it:
$ jq '.' < data.json
{
"RomanCharacters": {
"Alphabet": [
{
"RecordId": "1",
"data": "data"
},
{
"RecordId": "2",
"data": "data"
},
{
"RecordId": "3",
"data": "data"
},
{
"RecordId": "4",
"data": "data"
},
{
"RecordId": "5",
"data": "data"
}
]
}
}
And we can dig in to the data to retrieve only the records you're interested in (regardless of what they're wrapped in):
$ jq '.[][][]' < data.json
{
"RecordId": "1",
"data": "data"
}
{
"RecordId": "2",
"data": "data"
}
{
"RecordId": "3",
"data": "data"
}
{
"RecordId": "4",
"data": "data"
}
{
"RecordId": "5",
"data": "data"
}
This is much more readable, both by humans and by tools like awk which process content line-by-line. If you want to join your lines for processing per your question, the awk becomes much more simple:
$ jq '.[][][]' < data.json | awk '{printf("%s ",$0)} /}/{printf("\n")}'
{ "RecordId": "1", "data": "data" }
{ "RecordId": "2", "data": "data" }
{ "RecordId": "3", "data": "data" }
{ "RecordId": "4", "data": "data" }
{ "RecordId": "5", "data": "data" }
Or, as @peak suggested in the comments, eliminate the awk portion entirely by using jq's -c (compact output) option:
$ jq -c '.[][][]' < data.json
{"RecordId":"1","data":"data"}
{"RecordId":"2","data":"data"}
{"RecordId":"3","data":"data"}
{"RecordId":"4","data":"data"}
{"RecordId":"5","data":"data"}
Regarding perl: Try setting the input line separator $/ to }, like this:
#!/usr/bin/perl
$/ = "},";
while (<>) {
    print "$_\n";
}
or, as a one-liner:
$ perl -e '$/="},";while(<>){print "$_\n"}' sample.dat
Try using } as the record separator, e.g. in Perl:
perl -l -0175 -ne 'print $_, $/' < input
You might need to glue back lines containing only }.
This avoids the memory problem by not treating the data as a single record, but may go too far the other way performance-wise (it processes a single character at a time). Also note that it requires gawk for the built-in RT variable (the value of the current record separator):
$ cat j.awk
BEGIN { RS="[[:print:]]" }
RT == "{" { bal++}
RT == "}" { bal-- }
{ printf "%s", RT }
RT == "," && bal == 2 { print "" }
END { print "" }
$ gawk -f j.awk j.txt
{"RomanCharacters":{"Alphabet":[{"RecordId":"1",...]},
{"RecordId":"2",...},
{"RecordId":"3",...},
{"RecordId":"4",...},
{"RecordId":"5",...} }}
Using the sample data provided here (the one that begins with {"Accounts":{"Customer"... ), the solution reads the file while counting the number of delimiters defined in $/. For every 10,000 delimiters counted, it writes out a new file, and it appends a newline after each delimiter found. Here is what the script looks like:
#!/usr/bin/perl
use strict;
use warnings;

$/ = "}]},";               # delimiter to find and insert a newline after
my $n = 0;
my $match = "";
my $filecount = 0;
my $recsPerFile = 10000;   # set number of records in a file

sub write_part {
    my $newfile = "partfile" . $recsPerFile . "-" . $filecount . ".dat";
    open( my $out, '>', $newfile ) or die "Cannot open $newfile: $!";
    print $out $match;
    close $out;
    $match = "";
    $filecount++;
    $n = 0;
    print "Wrote file " . $newfile . "\n";
}

while (<>) {
    $match .= $_ . "\n";
    $n++;
    print ".";             # This is so that we'd know it has done something
    write_part() if $n >= $recsPerFile;
}
write_part() if length $match;   # flush the remaining records
print "Finished\n\n";
I've used this script against the big 2.8 GB file, whose content is unformatted single-line JSON. The resulting output files are missing the correct JSON headers and footers, but that can be easily fixed.
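As a sketch of that fix-up, each part file can be re-wrapped with a header and footer. The strings, file name, and the sample record below are assumptions modeled on the samples above, not taken from the actual data, and details such as the trailing delimiter on the last record are ignored:

```shell
# Demo input: one part file of the kind the splitter above would produce
# (the file name pattern and record content are assumptions).
printf '%s\n' '{"AccountHolderId":"9c585258","Contact":[]}]},' > partfile10000-0.dat

# Assumed header/footer, modeled on the {"Accounts":{"Customer":[ ... ]}} sample.
header='{"Accounts":{"Customer":['
footer=']}}'
for f in partfile10000-*.dat; do
    { printf '%s\n' "$header"; cat "$f"; printf '%s\n' "$footer"; } > "${f%.dat}.json"
done
```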
Thank you so much guys for contributing!