I want to convert text file data to JSON using jq

I have data in a file which looks like this:
test,test
test1,test1
I want to convert it into:
{"test":"test","test1":"test1"}
I have tried jq for this purpose: jq -R -s -c 'split("\n")'
But it outputs in the format ["test,test","test1,test1",""]

jq 1.5 has inputs, which allows a simple and efficient solution:
jq -R -n -c '[inputs|split(",")|{(.[0]):.[1]}] | add' input.txt
Important: don't forget the -n (--null-input) option, otherwise you'll lose the first line.
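Given the two-line input above, this produces exactly the requested object:
$ jq -R -n -c '[inputs|split(",")|{(.[0]):.[1]}] | add' input.txt
{"test":"test","test1":"test1"}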
Alternative
If your jq does not have inputs, then it's time to upgrade if at all possible. Otherwise:
jq -R -s '
split("\n")
| map(if index(",") then split(",")|{(.[0]):.[1]}
else empty end)
| add' input.txt

As @peak indicates, use inputs with the split function. But to merge the key/value pairs into one single object, use reduce:
jq -Rn '[inputs|split(",")| {(.[0]): .[1]}] | reduce .[] as $obj ({}; . + $obj) ' input.csv
The reduce method reduces each item in the array into a single item. In this case, we indicate that each item should be assigned to the $obj variable, and that we start out with the empty {} object. The second argument to the reduce method indicates how to "reduce" things down to a single item. In this case, we are adding/merging the $obj we assigned with the {} object we started out with and then returning the resulting object to be used in the next iteration. After all the iterations have completed, the final item (in this case, the combined object) is returned.
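To see reduce in isolation:
$ jq -cn '[{"a":1},{"b":2},{"c":3}] | reduce .[] as $obj ({}; . + $obj)'
{"a":1,"b":2,"c":3}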

What you ask is possible to achieve with just standard Unix shell utilities (assuming your input is in file.txt):
bash $ echo { \"$(<file.txt sed 's/,/":"/g' | paste -s -d, - | sed 's/,/","/g')\" }
{ "test":"test","test1":"test1" }
bash $
The resulting output is valid JSON, though note that nothing here escapes double quotes or other characters that are special in JSON, so this only works for simple values like the sample data.


Parse multiple json files and output the match/hits against the regex with associated file names

Currently, the cat command piped to jq helps me parse multiple JSON files in my working directory and screen them against a regex matching the email IDs present in the files. However, I'm keen to also identify the file name in which the regex pattern is hit/matched.
cat *.json | jq '. as $data | [path(..| select(scalars and (tostring | test("^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9.-]+$", "ixn")))) ] | map({ (.|join(".")): (. as $path | .=$data | getpath($path)) }) | reduce .[] as $item ({}; . * $item)'
Request your kind help tweaking the command to also print the file name. Thanks!
input_filename evaluates to the input file name of the file currently being read (after it has been opened). For STDIN, it evaluates to "<stdin>":
jq 'input_filename, input_filename' <<< 1
"<stdin>"
"<stdin>"
It works with the -n command-line option, but only after an input or inputs function has been called:
jq -n 'input_filename, (input | input_filename)' <<< 1
null
"<stdin>"
For a jq-internal solution, use input_filename as @peak suggested. Here's an external solution which iterates over your input files and passes the file name as a variable into jq. This approach, however, calls jq once for each input file (as opposed to your cat *.json | jq ... approach, which has just one call), so you might run into performance issues when applying it to a larger number of input files.
for f in *.json
do jq --arg f "$f" '. as $data | ... (use $f here) ...' "$f"
done
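For illustration, here is a minimal jq-internal sketch along those lines (my sketch, not from the original answers), reusing the question's email regex to tag each matching string with the file it came from:
jq -n '[inputs
        | input_filename as $f
        | .. | strings
        | select(test("^[a-zA-Z0-9+_.-]+@[a-zA-Z0-9.-]+$"))
        | {file: $f, match: .}]' *.json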

create json from bash variable and associative array [duplicate]

Let's say I have the following declared in bash:
mcD="had_a_farm"
eei="eeieeio"
declare -A animals=( ["duck"]="quack_quack" ["cow"]="moo_moo" ["pig"]="oink_oink" )
and I want the following json:
{
"oldMcD": "had a farm",
"eei": "eeieeio",
"onThisFarm":[
{
"duck": "quack_quack",
"cow": "moo_moo",
"pig": "oink_oink"
}
]
}
Now I know I could do this with echo, printf, or by assigning text to a variable, but let's assume animals is actually very large and it would be onerous to do so. I could also loop through my variables and the associative array, building up a string as I go. I could write either of these solutions, but both seem like the "wrong way". Not to mention it's obnoxious to deal with the last item in animals, after which I do not want a ",".
I'm thinking the right solution uses jq, but I'm having a hard time finding documentation and examples on how to use this tool to write JSON (especially nested JSON) rather than parse it.
Here is what I came up with:
jq -n --arg mcD "$mcD" --arg eei "$eei" --arg duck "${animals['duck']}" --arg cow "${animals['cow']}" --arg pig "${animals['pig']}" '{onThisFarm:[ { pig: $pig, cow: $cow, duck: $duck } ], eei: $eei, oldMcD: $mcD }'
Produces the desired result. In reality I don't care about the order of the keys in the JSON, but it's still annoying that the input to jq has to go backwards to get them in the desired order. Regardless, this solution is clunky and was no easier to write than simply declaring a string variable that looks like JSON (and it would be impossible with larger associative arrays). How can I build JSON like this in an efficient, logical manner?
Thanks!
Assuming that none of the keys or values in the "animals" array contains newline characters:
for i in "${!animals[#]}"
do
printf "%s\n%s\n" "${i}" "${animals[$i]}"
done | jq -nR --arg oldMcD "$mcD" --arg eei "$eei" '
def to_o:
. as $in
| reduce range(0;length;2) as $i ({};
.[$in[$i]]= $in[$i+1]);
{$oldMcD,
$eei,
onthisfarm: [inputs] | to_o}
'
Notice the trick whereby {$x} in effect expands to {"x": $x}.
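A quick way to see this shorthand in action:
$ jq -cn --arg x hello '{$x}'
{"x":"hello"}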
Using "\u0000" as the separator
If any of the keys or values contains a newline character, you could tweak the above so that "\u0000" is used as the separator:
for i in "${!animals[#]}"
do
printf "%s\0%s\0" "${i}" "${animals[$i]}"
done | jq -sR --arg oldMcD "$mcD" --arg eei "$eei" '
def to_o:
. as $in
| reduce range(0;length;2) as $i ({};
.[$in[$i]]= $in[$i+1]);
{$oldMcD,
$eei,
onthisfarm: split("\u0000") | to_o }
'
Note: The above assumes jq version 1.5 or later.
You can iterate over the associative array with a for loop and pipe the key/value pairs into jq:
for i in "${!animals[#]}"; do
echo "$i"
echo "${animals[$i]}"
done |
jq -n -R --arg mcD "$mcD" --arg eei "$eei" 'reduce inputs as $i ({onThisFarm: [], mcD: $mcD, eei: $eei}; .onThisFarm[0] += {($i): (input | tonumber ? // .)})'
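With the sample data above, this prints an object equivalent to the following (jq pretty-prints by default; shown compact here, and the animal order depends on bash's associative-array iteration order):
{"onThisFarm":[{"duck":"quack_quack","cow":"moo_moo","pig":"oink_oink"}],"mcD":"had_a_farm","eei":"eeieeio"}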

Reading and Looping Through A JSON File in BASH

I've got a JSON file (see below) called department_groups.json.
Essentially if I gave an argument of commercial I'd like it to return:
commercial-team@domain.com
commercial-updates@domain.com
Can anyone guide/help me with doing this?
{
"legal": {
"google_groups":[
["Legal", "legal#domain.com"],
["Legal Team", "legal-team#domain.com"],
["Compliance Checks", "compliance#domain.com"]
],
"samba_groups": ""
},
"commercial":{
"google_groups":[
["Commercial Team", "commercial-team#domain.com"],
["Commercial Updates", "commercial-updates#domain.com"]
],
"samba_groups": ""
},
"technology":{
"google_groups":[
["Technology", "technology#domain.com"],
["Incidents", "incidents#domain.com"]
],
"samba_groups": ""
}
}
This returns the second element in each array in the google_groups property of the commercial property:
jq --arg key commercial '.[$key].google_groups | .[] | .[1]' file
Use jq -r to output in "raw" format (lose the double quotes).
$ key=commercial
$ jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' file
commercial-team@domain.com
commercial-updates@domain.com
I used --arg in these examples to show how it is used, optionally with a shell variable. If, on the other hand, commercial was just a fixed string, then you could simplify:
jq -r '.commercial.google_groups | .[] | .[1]' file
To process each line of the output, you can just use a shell while read loop:
key=commercial
while read -r email; do
echo "$email"
# process each email individually here
done < <(jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' file)
Here I am using a process substitution <(), which acts like a file that can be processed by the shell. One advantage of doing this, over using a pipe, is that no subshell is created. Among other things, this means that the variables used within the loop remain in scope after the while block, so you can use them later.
If you prefer to use a pipe, just remove the part after done and move the command up to the first line:
jq ... | while read -r email; do # etc.
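For example, a counter incremented inside the loop is still visible afterwards with the process-substitution form, but would be lost with a pipe, because the pipe runs the loop in a subshell:
count=0
while read -r email; do
  count=$((count + 1))
done < <(jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' file)
echo "$count"   # prints 2 with the sample file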
As @TomFenech noted, the requirements are somewhat unclear, but if it's the email addresses you want, the following variant of his answer may be of interest:
key=commercial
$ jq -r --arg key "$key" '.[$key].google_groups[][] | select(test("@"))' department_groups.json
commercial-team@domain.com
commercial-updates@domain.com

Optimize JSON denormalization using JQ - "cartesian product" from 1:N

I have a JSON database change log, output of wal2json. It looks like this:
{"xid":1190,"timestamp":"2018-07-19 17:18:02.905354+02","change":[
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
{"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
...
Each top level entry (xid) is a transaction, each item in change is, well, a change. One row may change multiple times.
To import into an OLAP system with a limited feature set, I need the order explicitly stated, so I need to add an sn (sequence number) to each change in a transaction.
Also, each change must be a top level entry - the OLAP can't iterate sub-items within one entry.
{"xid":1190, "sn":1, "kind":"update", "data":{"id":401,"name":"Update AA","age":20} }
{"xid":1190, "sn":2, "kind":"update", "data":{"id":401,"name":"Update BB","age":20} }
{"xid":1191, "sn":1, "kind":"insert", "data":{"id":625,"name":"Inserted","age":20} }
{"xid":1191, "sn":2, "kind":"delete", "data":{"id":625} }
(The reason is that the OLAP has limited ability to transform the data during import, and also doesn't have the order as a parameter.)
So, I do this using jq:
function transformJsonDataStructure {
## First let's reformat it to XML, then transform using XPATH, then back to JSON.
## Example input:
# {"xid":1074,"timestamp":"2018-07-18 17:49:54.719475+02","change":[
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update AA",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}},
# {"kind":"update","table":"mytable2","columnnames":["id","name","age"],"columnvalues":[401,"Update BB",20],"oldkeys":{"keynames":["id"],"keyvalues":[401]}}]}
cat "$1" | while read -r LINE ; do
XID=`echo "$LINE" | jq -c '.xid'`;
export SN=0;
#serr "{xid: $XID, changes: $CHANGES}";
echo "$LINE" | jq -c '.change[]' | while read -r CHANGE ; do
SN=$((SN+=1))
KIND=`echo "$CHANGE" | jq -c --raw-output .kind`;
TABLE=`echo "$CHANGE" | jq -c --raw-output .table`;
DEST_FILE="$TARGET_PATH-$TABLE.json";
case "$KIND" in
update|insert)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".columnnames")" "$(echo "$CHANGE" | jq -c ".columnvalues")") ;;
delete)
MAP=$(convertTwoArraysToMap "$(echo "$CHANGE" | jq -c ".oldkeys.keynames")" "$(echo "$CHANGE" | jq -c ".oldkeys.keyvalues")") ;;
esac
#echo "{\"xid\":$XID, \"table\":\"$TABLE\", \"kind\":\"$KIND\", \"data\":$MAP }" >> "$DEST_FILE"; ;;
echo "{\"xid\":$XID, \"sn\":$SN, \"kind\":\"$KIND\", \"data\":$MAP }" | tee --append "$DEST_FILE";
done;
done;
return;
}
The problem is performance: I am calling jq a few times per entry, which is quite slow, around 1000× slower than without the transformation.
How can I perform the transformation above in just one pass? (jq is not a must; another tool can be used too, but it should be available in CentOS packages. I want to avoid coding an extra tool for this.)
From man jq it seems it is capable of processing the whole file (one JSON entry per row) in one go. I could do it in XSLT, but I can't wrap my head around jq, especially the iteration over the change array and combining columnnames and columnvalues into a map.
For the iteration, I think map or map_values could be used.
For turning the two arrays into a map, I see the from_entries and with_entries functions, but can't get them to work.
Any jq master around to advise?
The following helper function converts the incoming array into an object using headers as the keys:
def objectify(headers):
[headers, .] | transpose | map({(.[0]): .[1]}) | add;
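For example:
$ jq -nc 'def objectify(headers): [headers, .] | transpose | map({(.[0]): .[1]}) | add; [401,"Update AA",20] | objectify(["id","name","age"])'
{"id":401,"name":"Update AA","age":20}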
The trick now is to use range(0;length) to generate .sn:
{xid} +
(.change
| range(0;length) as $i
| .[$i]
| .columnnames as $header
| {sn: ($i + 1),
kind,
data: (.columnvalues|objectify($header)) } )
Output
For the given log entry, the output would be:
{"xid":1190,"sn":1,"kind":"update","data":{"id":401,"name":"Update AA","age":20}}
{"xid":1190,"sn":2,"kind":"update","data":{"id":401,"name":"Update BB","age":20}}
Moral
If a solution looks too complicated, it probably is.

extract 2 values from JSON object and use as variables in loop using jq and bash

I am new to jq. I am trying to write a simple script that loops through a JSON file, gets two values within each object, and assigns them to two separate variables I can use in a curl REST call. I see both values as output when I echo $i, but how can I get value and addr as separate variables?
for i in `cat /Users/egraham/Downloads/test2 | jq .[] | jq ."value,.addr"`; do
You can do this:
jq -rc '.populator.value + " " + .populator.addr' file.json |
while read -r value addr; do
echo do something with "$value" and "$addr"
done
If spaces or tabs or other special characters make using 'read -r' problematic, and if your shell has "readarray", then it could be used:
$ readarray -t v < <(jq -rc '.populator | (.value,.addr)' file.json)
The values would then be available as ${v[0]} and ${v[1]}
This approach is especially useful if there are more than two values of interest, or if the number of values is variable or not known beforehand.
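For instance, with a hypothetical file.json containing {"populator": {"value": "v1", "addr": "a1"}}:
$ readarray -t v < <(jq -r '.populator | (.value, .addr)' file.json)
$ echo "value=${v[0]} addr=${v[1]}"
value=v1 addr=a1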
If your shell does not have readarray, then you can still use the array-oriented approach, e.g. along the lines of:
i=-1; while read -r a ; do i=$((i+1)); v[$i]="$a" ; done
First:
for i in `cat /Users/egraham/Downloads/test2 | jq .[] | jq .value`; do echo $i; done
Second:
for i in `cat /Users/egraham/Downloads/test2 | jq .[] | jq .addr`; do echo $i; done
I don't know any way to get it without running the commands separately. I don't know AWK, but maybe it's something worth considering.