How do I merge thousands of json files with jq?

I have thousands of individually named JSON files in a single Windows directory. I'm trying to use jq to merge them all into a single file I can then import into a Jupyter notebook.
I keep getting a permission denied error when I try to run the following command:
jq --slurp 'map(.[])' bill
I've tried editing the directory (bill) permissions. My file path looks like this:
\downloadedJSONfiles\AK\2019-2020_31st_Legislature\bill
I downloaded jq through Chocolatey. I'm using Cmder.

You are getting permission denied because you provided a directory where a path to an ordinary file is expected.
I'm going to start with the "unix" approach because jq has unix roots. The following is the command you want to use:
jq --slurp 'map(.[])' bill/a.json bill/b.json bill/c.json ...
The shell will expand the following command into the above:
jq --slurp 'map(.[])' bill/*.json
The problem is that this can easily result in a command that's too long. So you really want one of the following:
# A
(
jq '.[]' bill/a.json
jq '.[]' bill/b.json
jq '.[]' bill/c.json
jq '.[]' bill/d.json
jq '.[]' bill/e.json
jq '.[]' bill/f.json
) | jq --slurp .
# B
(
cat bill/a.json
cat bill/b.json
cat bill/c.json
cat bill/d.json
cat bill/e.json
cat bill/f.json
) | jq --slurp 'map(.[])'
# C
(
jq '.[]' bill/a.json bill/b.json bill/c.json
jq '.[]' bill/d.json bill/e.json bill/f.json
) | jq --slurp .
# D
(
cat bill/a.json bill/b.json bill/c.json
cat bill/d.json bill/e.json bill/f.json
) | jq --slurp 'map(.[])'
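If it helps to see why A and C finish with a plain jq --slurp . while B and D need map(.[]), here is a minimal illustration with two hypothetical documents (not the asker's bill files):
# B/D style: slurping the raw file contents gives an array of arrays,
# so map(.[]) is needed to splice each inner array open.
printf '%s\n' '[{"id":1}]' '[{"id":2}]' | jq --slurp 'map(.[])'
# => [{"id":1},{"id":2}]
# A/C style: the per-file jq '.[]' already emitted the inner elements,
# so the final slurp only collects a flat stream of objects.
printf '%s\n' '{"id":1}' '{"id":2}' | jq --slurp .
# => [{"id":1},{"id":2}]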
Something equivalent can be achieved using any of the following:
# Portable. Equivalent to A
find bill -mindepth 1 -maxdepth 1 -name '*.json' -exec jq '.[]' {} \; | jq --slurp .
# Portable. Equivalent to B
find bill -mindepth 1 -maxdepth 1 -name '*.json' -exec cat {} \; | jq --slurp 'map(.[])'
# Possibly portable with tweaks. Similar to "C"
find bill -mindepth 1 -maxdepth 1 -name '*.json' -print0 |
xargs -r0 jq '.[]' |
jq --slurp .
# Possibly portable with tweaks. Similar to "D"
find bill -mindepth 1 -maxdepth 1 -name '*.json' -print0 |
xargs -r0 cat |
jq --slurp 'map(.[])'
# GNU-specific. Equivalent to "C"
find bill -mindepth 1 -maxdepth 1 -name '*.json' -exec jq '.[]' {} + | jq --slurp .
# GNU-specific. Equivalent to "D"
find bill -mindepth 1 -maxdepth 1 -name '*.json' -exec cat {} + | jq --slurp 'map(.[])'
But you asked about Windows. In the Windows world it's up to programs to perform their own wildcard expansion. So you'd expect to be able to do
jq --slurp 'map(.[])' bill\*.json
However, jq wasn't properly ported, and instead of expanding the wildcard it fails with an assertion error:
Assertion failed!
Program: c:\bin\jq.exe
File: src/main.c, Line 256
Expression: wargc == argc
So like in unix, you have to pass all the files you want to process to jq as separate arguments.
Using cmd, you could use either of the following:
:: Not efficient
copy /y nul bill.jsonl
for %q in (bill\*.json) do jq ".[]" %q >>bill.jsonl
jq --slurp . bill.jsonl
del bill.jsonl
:: More efficient
copy /y nul+bill\*.json bill.jsonl
jq --slurp "map(.[])" bill.jsonl
del bill.jsonl
PowerShell is a far more advanced shell than cmd. With PowerShell, you could use
jq --slurp 'map(.[])' ( Get-Item bill\*.json )
But just like the simple unix version, the above can easily result in a command line that's too long. To avoid that, we can use the following:
# Not efficient
Get-Item bill\*.json | %{ jq '.[]' $_ } | jq --slurp .
# More efficient
%{ Get-Content bill\*.json } | jq --slurp 'map(.[])'
(%{...} is a shorthand for ForEach-Object {...}.)
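A hedged alternative (assuming PowerShell 3+ with jq on PATH, and that each file holds a JSON array as above) is to stream the file contents through the pipeline instead of expanding paths into arguments, which sidesteps the length limit entirely:
# Sketch only: merged.json is an arbitrary output name.
Get-ChildItem bill\*.json | Get-Content | jq --slurp 'map(.[])' > merged.json
Note that in Windows PowerShell 5.1 the > redirection writes UTF-16; pipe to Out-File -Encoding utf8 (or use PowerShell 7+) if the downstream consumer needs UTF-8.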
Finally, I'm not familiar with Cmder.

Related

Problem filtering json with jq with a bash script variable [duplicate]

Using parameter in jq 'any' does not result in right answer [duplicate]

Difficulty using bash to pass the contents of `top` into a JSON file

I want to use a bash script to capture the output of the top command and write it to a JSON file, but I'm having difficulty escaping the slashes/control sequences/line breaks so that the file contains a valid JSON object.
Here's what I tried:
#!/bin/bash
message1=$(top -n 1 -o %CPU)
message2=$(top -n 1 -o %CPU | jq -aRs .)
message3=$(top -n 1 -o %CPU | jq -Rs .)
message4=${message1//\\/\\\\/}
echo "{\"message\":\"${message2}\"}" > file.json
But when I look at file.json, it looks something like this:
{"message":""\u001b[?1h\u001b=\u001b[?25l\u001b[H\u001b[2J\u001b(B\u001b[mtop - 21:34:53 up 55 days, 5:14, 2 users, load average: 0.17, 0.09, 0.03\u001b(B\u001b[m\u001b[39;49m\u001b(B\u001b[m\u001b[39;49m\u001b[K\nTasks:\u001b(B\u001b[m\u001b[39;49m\u001b[1m 129 \u001b(B\u001b[m\u001b[39;49mtotal,\u001b(B\u001b[m\u001b[39;49m\u001b[1m 1 \u001b(B\u001b[m\u001b[39;49mrunning,\u001b(B\u001b[m\u001b[39;49m\u001b[1m 128 \u001b(B\u001b[m\u001b[39;49msleeping,\u001b(B\u001b[m
The other attempts with message1 through message4 all result in various JSON syntax issues.
Can anyone suggest what I should try next?
You don't need all the bells and whistles of echo and multiple jq invocations. Note the -b (batch mode) flag on top, which stops it from emitting the terminal escape sequences you are seeing:
top -b -n 1 -o %CPU | jq -aRs '{"message": .}' >file.json
Or pass the output of the top command to jq as a named argument, using --arg:
jq -an --arg msg "$(top -b -n 1 -o %CPU)" '{"message": $msg}' >file.json
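Either way, a quick sanity check that the result is valid JSON and contains what you expect (assuming the file.json written above):
jq -r '.message' file.json | head -n 5   # re-extract the captured text; head just trims the output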

Passing bash variable to jq

I have written a script to retrieve a certain value from file.json. It works if I provide the value to jq select directly, but using a shell variable doesn't seem to work (or I don't know how to use it).
#!/bin/sh
#this works ***
projectID=$(cat file.json | jq -r '.resource[] | select(.username=="myemail#hotmail.com") | .id')
echo "$projectID"
EMAILID=myemail#hotmail.com
#this does not work *** no value is printed
projectID=$(cat file.json | jq -r '.resource[] | select(.username=="$EMAILID") | .id')
echo "$projectID"
Consider also passing in the shell variable (EMAILID) as a jq variable (here also EMAILID, for the sake of illustration):
projectID=$(jq -r --arg EMAILID "$EMAILID" '
.resource[]
| select(.username==$EMAILID)
| .id' file.json)
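For a self-contained illustration of the --arg approach, here is a tiny hypothetical file.json (not the asker's real data) and the lookup against it:
cat > file.json <<'EOF'
{"resource": [{"username": "alice@example.com", "id": "p-1"},
              {"username": "bob@example.com",   "id": "p-2"}]}
EOF
EMAILID="bob@example.com"
jq -r --arg EMAILID "$EMAILID" '.resource[] | select(.username==$EMAILID) | .id' file.json
# prints: p-2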
Postscript
For the record, another possibility would be to use jq's env function for accessing environment variables. For example, consider this sequence of bash commands:
EMAILID=foo#bar.com # not exported
EMAILID="$EMAILID" jq -n 'env.EMAILID'
The output is a JSON string:
"foo#bar.com"
I resolved this issue by escaping the inner double quotes:
projectID=$(cat file.json | jq -r ".resource[] | select(.username==\"$EMAILID\") | .id")
A little unrelated, but I will still put it here.
For other practical purposes, shell variables can be used as follows:
value=10
jq '."key" = "'"$value"'"' file.json
Posting it here as it might help others. When the value is a string, it may be necessary to pass the quotes through to jq. To do the following with jq:
.items[] | select(.name=="string")
in bash you could do
EMAILID=$1
projectID=$(cat file.json | jq -r '.resource[] | select(.username=='\"$EMAILID\"') | .id')
Essentially, this escapes the quotes and passes them on to jq.
It's a quoting issue; you need:
projectID=$(
cat file.json | jq -r ".resource[] | select(.username=='$EMAILID') | .id"
)
If you put single quotes to delimit the main string, the shell takes $EMAILID literally.
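You can see the difference with a plain echo (illustration only; the value is made up):
EMAILID="me@example.com"
echo '$EMAILID'    # single quotes: prints the literal text $EMAILID
echo "$EMAILID"    # double quotes: prints me@example.com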
"Double quote" every literal that contains spaces/metacharacters and every expansion: "$var", "$(command "$var")", "${array[#]}", "a & b". Use 'single quotes' for code or literal $'s: 'Costs $5 US', ssh host 'echo "$HOSTNAME"'. See
http://mywiki.wooledge.org/Quotes
http://mywiki.wooledge.org/Arguments
http://wiki.bash-hackers.org/syntax/words
jq now has a better way to access environment variables: you can use env.EMAILID:
projectID=$(cat file.json | jq -r ".resource[] | select(.username==env.EMAILID) | .id")
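Note that env only sees variables that are actually in jq's environment, so a plain shell variable has to be exported (or prefixed to the command, as in the postscript above) first; a minimal sketch:
export EMAILID="myemail@example.com"
jq -r '.resource[] | select(.username==env.EMAILID) | .id' file.json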
Another way to accomplish this is with the jq "--arg" flag.
Using the original example:
#!/bin/sh
#this works ***
projectID=$(cat file.json | jq -r '.resource[] |
select(.username=="myemail#hotmail.com") | .id')
echo "$projectID"
EMAILID=myemail#hotmail.com
# Use --arg to pass the variable to jq. This should work:
projectID=$(cat file.json | jq --arg EMAILID "$EMAILID" -r '.resource[]
| select(.username==$EMAILID) | .id')
echo "$projectID"
See here, which is where I found this solution:
https://github.com/stedolan/jq/issues/626
I know it's a bit late to reply, sorry, but this works for me.
export K8S_public_load_balancer_url="$(kubectl get services -n ${TENANT}-production -o wide | grep "ingress-nginx-internal$" | awk '{print $4}')"
And now I am able to fetch the value and pass the contents of the variable to jq:
export TF_VAR_public_load_balancer_url="$(aws elbv2 describe-load-balancers --region eu-west-1 | jq -r '.LoadBalancers[] | select (.DNSName == "'$K8S_public_load_balancer_url'") | .LoadBalancerArn')"
In my case I needed to use both double quotes and single quotes to access the variable value.
Cheers.
I also faced the same issue of variable substitution with jq. I found that --arg has to be combined with the square-bracket index form .[$var]; otherwise it did not work for me. Here is a sample example:
RUNNER_TOKEN=$(aws secretsmanager get-secret-value --secret-id $SECRET_ID | jq '.SecretString|fromjson' | jq --arg kt $SECRET_KEY -r '.[$kt]' | tr -d '"')
In cases where we want to append some string to the variable value while using escaped double quotes, for example appending .crt to a variable CERT_TYPE, the following should work:
$ CERT_TYPE=client.reader
$ cat certs.json | jq -r ".\"${CERT_TYPE}\".crt" #### This will *not* work #####
$ cat certs.json | jq -r ".\"${CERT_TYPE}.crt\""
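The same lookup can also be written without any escaping by passing the complete key as a jq variable (a sketch, assuming the same certs.json layout):
CERT_TYPE=client.reader
jq -r --arg k "$CERT_TYPE.crt" '.[$k]' certs.json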

Using jq to combine json files, getting file list length too long error

Using jq to concat json files in a directory.
The directory contains a few hundred thousand files.
jq -s '.' *.json > output.json
returns an error that the argument list is too long. Is there a way to write this so that it can take in more files?
If jq -s . *.json > output.json produces "argument list too long", you could fix it using zargs in zsh:
$ zargs *.json -- cat | jq -s . > output.json
You could emulate that using find, as shown in @chepner's answer:
$ find -maxdepth 1 -name \*.json -exec cat {} + | jq -s . > output.json
"Data in jq is represented as streams of JSON values ... This is a cat-friendly format - you can just join two JSON streams together and get a valid JSON stream.":
$ echo '{"a":1}{"b":2}' | jq -s .
[
  {
    "a": 1
  },
  {
    "b": 2
  }
]
The problem is that the length of a command line is limited, and *.json produces too many argument for one command line. One workaround is to expand the pattern in a for loop, which does not have the same limits as a command line, because bash can iterate over the result internally rather than having to construct an argument list for an external command:
for f in *.json; do
    cat "$f"
done | jq -s '.' > output.json
This is rather inefficient, though, since it requires running cat once for each file. A more efficient solution is to use find to call cat with as many files as possible each time.
find . -name '*.json' -exec cat '{}' + | jq -s '.' > output.json
(You may be able to simply use
find . -name '*.json' -exec jq -s '.' '{}' + > output.json
as well; it may depend on what is in the files and how multiple calls to jq using the -s option compares to a single call.)
[EDITED to use find]
One obvious thing to consider would be to process one file at a time, and then "slurp" them:
$ while IFS= read -r f ; do cat "$f" ; done < <(find . -maxdepth 1 -name "*.json") | jq -s .
This however would presumably require a lot of memory. Thus the following may be closer to what you need:
#!/bin/bash
# "slurp" a bunch of files
# Requires a version of jq with 'inputs'.
echo "["
while IFS= read -r f
do
  jq -nr 'inputs | (., ",")' "$f"
done < <(find . -maxdepth 1 -name "*.json") | sed '$d'
echo "]"