I have a log file called log.json that's formatted like this:
{"msg": "Service starting up!"}
{"msg": "Running a job!"}
{"msg": "Error detected!"}
And another file called messages.json, which looks like this:
{"msg": "Service starting up!", "out": "The service has started"}
{"msg": "Error detected!", "out": "Uh oh, there was an error!"}
{"msg": "Service stopped", "out": "The service has stopped"}
I'm trying to write a function using jq that reads in both files, and whenever it finds a msg in log.json that matches a msg in messages.json, print the value of out in the corresponding line in messages.json. So, in this case, I'm hoping to get this as output:
"The service has started"
"Uh oh, there was an error!"
The closest that I've been able to get so far is the following:
jq --argfile a log.json --argfile b messages.json -n 'if ($a[].msg == $b[].msg) then $b[].out else empty end'
This successfully performs all of the comparisons that I'm hoping to make. However, rather than printing the specific out that I'm looking for, it instead prints every out whenever the if statement returns true (which makes sense: $b[].out was never constrained to the matching element, so it emits each of them). So, this statement outputs:
"The service has started"
"Uh oh, there was an error!"
"The service has stopped"
"The service has started"
"Uh oh, there was an error!"
"The service has stopped"
So at this point, I need some way to ask for $b[current_index].out, and just print that. Is there a way for me to do this (or an entirely separate approach that I can use)?
messages.json effectively defines a dictionary, so let's begin by creating a JSON dictionary which we can lookup easily. This can be done conveniently using INDEX/2 which (in case your jq does not have it) is defined as follows:
def INDEX(stream; idx_expr):
  reduce stream as $row ({};
    .[$row|idx_expr|
      if type != "string" then tojson
      else .
      end] |= $row);
A first-cut solution is now straightforward:
INDEX($messages[]; .msg) as $dict
| inputs
| $dict[.msg]
| .out
Assuming this is in program.jq, an appropriate invocation would be as follows (note especially the -n option):
jq -n --slurpfile messages messages.json -f program.jq log.json
The above will print null if the .msg in the log file is not in the dictionary. To filter out these nulls, you could (for example) add select(.) to the pipeline.
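Putting it all together, here is a minimal, self-contained sketch of the whole pipeline with the select(.) null-filter added, recreating the sample files from the question in a temporary directory (the INDEX definition is included inline in case your jq predates the builtin):

```shell
tmp=$(mktemp -d) && cd "$tmp"
cat > log.json <<'EOF'
{"msg": "Service starting up!"}
{"msg": "Running a job!"}
{"msg": "Error detected!"}
EOF
cat > messages.json <<'EOF'
{"msg": "Service starting up!", "out": "The service has started"}
{"msg": "Error detected!", "out": "Uh oh, there was an error!"}
{"msg": "Service stopped", "out": "The service has stopped"}
EOF

result=$(jq -n --slurpfile messages messages.json '
  # inline definition, shadowed by the builtin in jq 1.6+
  def INDEX(stream; idx_expr):
    reduce stream as $row ({};
      .[$row|idx_expr| if type != "string" then tojson else . end] |= $row);
  INDEX($messages[]; .msg) as $dict
  | inputs
  | $dict[.msg]
  | select(.)            # drop log messages with no dictionary entry
  | .out
' log.json)
echo "$result"
```

This prints the two matched out values and silently skips "Running a job!", which has no entry in the dictionary.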
Another possibility would be to use the original .msg, as in this variation:
INDEX($messages[]; .msg) as $dict
| inputs
| . as $in
| $dict[.msg]
| .out // $in.msg
I want to parse a JSON file and extract some values, while also discarding or skipping certain entries if they contain substrings from another list passed in as an argument. The purpose is to exclude objects containing miscellaneous human-readable keywords from a master list.
input.json
{
"entities": [
{
"id": 600,
"name": "foo-001"
},
{
"id": 601,
"name": "foo-002"
},
{
"id": 602,
"name": "foobar-001"
}
]
}
args.json (list of keywords)
"foobar-"
"BANANA"
The output must definitely contain the foo-* entries (but not the excluded foobar- entries), but it can also contain any other names, provided they don't contain foobar- or BANANA. The exclusions are to be based on substrings, not exact matches.
I'm looking for a more performant way of doing this, because currently I just do my normal filters:
jq '[.[].entities[] | select(.name != "")] | walk(if type == "string" then gsub ("\t";"") else . end)' > file
(the input file has some erroneous tab escapes and null fields in it that are preprocessed)
At this stage, the file has only been minimally prepared. Then I iterate through this file line by line in shell and invoke grep -vf with a long list of invalid patterns from the keywords file. This gives a "master list" that is sanitized for later parsing by other applications. This seems intuitively wrong, though.
It seems like this should be done in one fell swoop on the first pass with jq instead of brute forcing it in a loop later.
I tried various invocations of INDEX and --slurpfile, but I seem to be missing something:
jq '.entities | INDEX(.name)[inputs]' input.json args.json
The above is a simplistic way of indexing the input args that at least seems to demonstrate that the patterns in the file can be matched verbatim, but it doesn't account for substrings (i.e., a contains-style match).
jq '.[] | walk(if type == "object" and (.name | contains($args[]))then empty else . end)' --slurpfile args args.json input.json
This looks to be getting closer to the idea, but something is screwy here. It seems to regurgitate the entire input file for each keyword in args.json, returning N copies of the input for N keywords, rather than actually emptying the matching objects; it just dumbly checks the entire file for the presence of a single keyword and then starts over.
It seems like I need to unwrap the $args[] and map it here somehow so that the input file only gets iterated through once, with each keyword being checked for each record, rather than the entire file over and over again.
I found some conflicting information about whether a slurpfile is strictly necessary and can't determine what's the optimal approach here.
Thanks.
You could use all/2 as follows:
< input.json jq --slurpfile blacklist args.json '
.entities
| map(select(.name as $n
| all( $blacklist[]; . as $b | $n | index($b) | not) ))
'
or more concisely (but perhaps less obviously correct):
.entities | map( select( all(.name; index( $blacklist[]) | not) ))
You might wish to write .entities |= map( ... ) instead if you want to retain the original structure.
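For completeness, here is a self-contained sketch that recreates the question's input.json and args.json and runs the all/2 solution (sample data copied from the question; -c used only to keep the output on one line):

```shell
tmp=$(mktemp -d) && cd "$tmp"
cat > input.json <<'EOF'
{"entities":[{"id":600,"name":"foo-001"},{"id":601,"name":"foo-002"},{"id":602,"name":"foobar-001"}]}
EOF
cat > args.json <<'EOF'
"foobar-"
"BANANA"
EOF

# keep only entities whose .name contains none of the blacklisted substrings
result=$(jq -c --slurpfile blacklist args.json '
  .entities
  | map(select(.name as $n
      | all($blacklist[]; . as $b | $n | index($b) | not)))
' input.json)
echo "$result"
```

The foobar-001 entry is dropped because index("foobar-") finds a substring match; the two foo-* entries survive.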
I have a JSON which goes like this:
{
"results":[
{
"uri":"www.xxx.com"
}
]
}
EDIT
When uri is not present, JSON looks like this:
{
"results":[
]
}
In some cases, uri is present and in some cases, it is not.
Now, I want to use jq to return boolean value if uri is present or not.
This is what I wrote so far but despite uri being present, it gives null.
${search_query_response} contains the JSON
file_status=$(jq -r '.uri' <<< ${search_query_response})
Can anyone guide me?
Since you use jq, it means you are working within a shell script context.
If the boolean result is to be handled by the shell script, you can make jq set its EXIT_CODE depending on the JSON request success or failure status, with jq -e
Example shell script using the EXIT_CODE from jq:
if uri=$(jq -je '.results[].uri' <<<"$search_query_response")
then
printf 'Search results contains an URI: %s.\n' "$uri"
else
echo 'No URI in search results.'
fi
See man jq:
-e / --exit-status:
Sets the exit status of jq to 0 if the last output value was neither false nor null, 1 if the last output value was either false or null, or 4 if no valid result was ever produced. Normally jq exits with 2 if there was any usage problem or system error, 3 if there was a jq program compile error, or 0 if the jq program ran.
Another way to set the exit status is with the halt_error builtin function.
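A quick sketch of the -e exit statuses for the two shapes of JSON in the question (the `&& ... || ...` form just captures the status without aborting a `set -e` script):

```shell
# uri present: jq emits a non-null value, so -e exits 0
echo '{"results":[{"uri":"www.xxx.com"}]}' | jq -e '.results[].uri' >/dev/null \
  && present=$? || present=$?

# results empty: .results[] produces no output at all, so -e exits 4
echo '{"results":[]}' | jq -e '.results[].uri' >/dev/null \
  && absent=$? || absent=$?

echo "present=$present absent=$absent"
```

Note that the empty-results case yields 4 ("no valid result was ever produced"), not 1, so test for non-zero rather than for 1 specifically.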
The has function does the job:
jq '.results|map(has("uri"))|.[]'
map the has function on .results.
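Here is what that filter prints for both shapes of input. One caveat worth knowing: with an empty results array, map produces an empty array and .[] emits nothing at all, so if you need a single guaranteed boolean, any/1 (a standard jq builtin) may be a better fit:

```shell
# uri present: map(has("uri")) yields [true], .[] unwraps it
present=$(echo '{"results":[{"uri":"www.xxx.com"}]}' | jq '.results|map(has("uri"))|.[]')

# results empty: nothing is emitted at all
absent=$(echo '{"results":[]}' | jq '.results|map(has("uri"))|.[]')

# any/1 collapses to exactly one boolean, false for the empty case
single=$(echo '{"results":[]}' | jq '.results | any(has("uri"))')

echo "$present / $absent / $single"
```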
Summary
Starting from a JSON array, I'd like to replace an attribute of each item by the contents of a text file.
Example:
We have this initial JSON array:
[
{ "name": "step-a", "message": "step-a message placeholder" },
{ "name": "step-b", "message": "step-b message placeholder" }
]
And two text files matching the name attribute values (with an added .txt):
.
├── step-a.txt # contains the text: "Step A error logs ..."
└── step-b.txt # contains the text: "Step B error logs ..."
The goal is to perform the replacement and end up with this JSON array:
[
{ "name": "step-a", "message": "Step A error logs ..." },
{ "name": "step-b", "message": "Step B error logs ..." }
]
Attempt
I tried something like this:
# the variable $INITIAL contains the initial JSON array
echo $INITIAL | jq -c '.[] | .message = "$(cat .displayName+".txt")"'
Is there a way to perform an operation like this with jq or is using extra bash logic necessary?
Thank you.
You can use input_filename to achieve the goal very simply and efficiently.
For example, with the following in program.jq
# program.jq
(reduce inputs as $step ( {}; .[input_filename | rtrimstr(".txt")] = $step )) as $dict
| $a
| map(. + {message: $dict[.name] })
the following invocation would yield the expected result:
jq -n -f program.jq --argfile a array.json step-*.txt
An alternative
Depending on your requirements, you might like to replace the last line of program.jq as given above by:
| map(.message = ($dict[.name] // .message))
Alternatives to --argfile
If you prefer not to use --argfile, feel free to use --argjson or even --slurpfile, with appropriate adjustments to the above.
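Here is a self-contained sketch of the input_filename approach using --slurpfile (hence the extra $a[0], since --slurpfile wraps the file in an array). One assumption worth flagging: inputs parses each .txt file as JSON, so this sketch writes each file's contents as a JSON-encoded string, quotes included; for genuinely plain text files you would need raw input (-R) and some extra handling.

```shell
tmp=$(mktemp -d) && cd "$tmp"
printf '%s' '"Step A error logs ..."' > step-a.txt
printf '%s' '"Step B error logs ..."' > step-b.txt
cat > array.json <<'EOF'
[
  { "name": "step-a", "message": "step-a message placeholder" },
  { "name": "step-b", "message": "step-b message placeholder" }
]
EOF

result=$(jq -nc --slurpfile a array.json '
  # build {"step-a": <contents>, "step-b": <contents>} keyed by filename
  (reduce inputs as $step ({}; .[input_filename | rtrimstr(".txt")] = $step)) as $dict
  | $a[0]
  | map(. + {message: $dict[.name]})
' step-a.txt step-b.txt)
echo "$result"
```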
You can pass variables/arguments to jq using:
jq -r --arg first $1
$1 being your variable (which you can set with, e.g., _fileContents="$( cat step-a.txt )");
first is the name of the variable you want to create, which you can then use inside the jq filter much like a shell variable ($first);
An example from one of my scripts:
export techUser=$(jq -r --arg first $1 '.Environment[].Servers[] | select (.Address | contains($first)) | ."Technical User"' $serverlist)
I hope this has helped you finding a way to make it work ;)
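A minimal sketch of that idea applied to the question (the file contents here are hard-coded as a stand-in; note that --arg always passes its value as a JSON string, so no quoting gymnastics are needed):

```shell
# hypothetical stand-in for _fileContents="$( cat step-a.txt )"
_fileContents="Step A error logs ..."
result=$(jq -nc --arg first "$_fileContents" '{message: $first}')
echo "$result"
```

This scales poorly to many files compared with the input_filename approach above, since it requires one jq invocation (and one cat) per file, but it is the simplest way to inject a single external value.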
I've got a tool that outputs a JSON record on each line, and I'd like to process it with jq.
The output looks something like this:
{"ts":"2017-08-15T21:20:47.029Z","id":"123","elapsed_ms":10}
{"ts":"2017-08-15T21:20:47.044Z","id":"456","elapsed_ms":13}
When I pass this to jq as follows:
./tool | jq 'group_by(.id)'
...it outputs an error:
jq: error (at <stdin>:1): Cannot index string with string "id"
How do I get jq to handle JSON-record-per-line data?
Use the --slurp (or -s) switch:
./tool | jq --slurp 'group_by(.id)'
It outputs the following:
[
[
{
"ts": "2017-08-15T21:20:47.029Z",
"id": "123",
"elapsed_ms": 10
}
],
[
{
"ts": "2017-08-15T21:20:47.044Z",
"id": "456",
"elapsed_ms": 13
}
]
]
...which you can then process further. For example:
./tool | jq -s 'group_by(.id) | map({id: .[0].id, count: length})'
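To make that concrete, here is the -s pipeline run against the sample records (a third record is added here so the counts are non-trivial; -c just compacts the output):

```shell
result=$(jq -sc 'group_by(.id) | map({id: .[0].id, count: length})' <<'EOF'
{"ts":"2017-08-15T21:20:47.029Z","id":"123","elapsed_ms":10}
{"ts":"2017-08-15T21:20:47.044Z","id":"456","elapsed_ms":13}
{"ts":"2017-08-15T21:20:47.050Z","id":"123","elapsed_ms":9}
EOF
)
echo "$result"
```

Note that group_by sorts by the grouping key, so the output order may differ from the input order.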
As @JeffMercado pointed out, jq handles streams of JSON just fine, but if you use group_by, then you'd have to ensure its input is an array. That could be done in this case using the -s command-line option; if your jq has the inputs filter, then it can also be done using that filter in conjunction with the -n option.
If you have a version of jq with inputs (which is available in jq 1.5), however, then a better approach would be to use the following streaming variant of group_by:
# sort-free stream-oriented variant of group_by/1
# f should always evaluate to a string.
# Output: a stream of arrays, one array per group
def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x] ) | .[] ;
Usage example: GROUPS_BY(inputs; .id)
Note that you will want to use this with the -n command line option.
Such a streaming variant has two main advantages:
it generally requires less memory in that it does not require a copy of the entire input stream to be kept in memory while it is being processed;
it is potentially faster because it does not require any sort operation, unlike group_by/1.
Please note that the above definition of GROUPS_BY/2 follows the convention for such streaming filters in that it produces a stream. Other variants are of course possible.
Handling a large amount of data
The following illustrates how to economize on memory. Suppose the task is to produce a frequency count of .id values. The humdrum solution would be:
GROUPS_BY(inputs; .id) | [(.[0]|.id), length]
A more economical and indeed far better solution would be:
GROUPS_BY(inputs|.id; .) | [.[0], length]
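A self-contained run of the economical variant against the same kind of records (the outer [...] is added here only so the streamed results can be checked as one value; in practice you would let the pairs stream out):

```shell
result=$(jq -nc '
  def GROUPS_BY(stream; f): reduce stream as $x ({}; .[$x|f] += [$x]) | .[];
  [GROUPS_BY(inputs|.id; .) | [.[0], length]]
' <<'EOF'
{"id":"123","elapsed_ms":10}
{"id":"456","elapsed_ms":13}
{"id":"123","elapsed_ms":9}
EOF
)
echo "$result"
```

Because only the .id strings are fed into the reduce, the accumulated object never holds the full records, which is where the memory saving comes from.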
I have multiple files in the following format with different categories like:
{
"id": 1,
"flags": ["a", "b", "c"],
"name": "test",
"category": "video",
"notes": ""
}
Now I want to append all the files flags whose category is video with string d. So my final file should look like the file below:
{
"id": 1,
"flags": ["a", "b", "c", "d"],
"name": "test",
"category": "video",
"notes": ""
}
Now, using the following command, I am able to find the files of interest, but I'm stuck on the editing part, which I'm unable to do by hand as there are hundreds of files to edit, e.g.
find . -name "*" | xargs grep "\"category\": \"video\"" | awk '{print $1}' | sed 's/://g'
You can do this
find . -type f | xargs grep -l '"category": "video"' | xargs sed -i -e '/flags/ s/]/, "d"]/'
This will find all the filenames which contain a line with "category": "video", and then add the "d" flag.
Details:
find . -type f
=> Will get all the filenames in your directory
xargs grep -l '"category": "video"'
=> Will get those filenames which contain the line "category": "video"
xargs sed -i -e '/flags/ s/]/, "d"]/'
=> Will add the "d" letter to the flags line.
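A quick check of just the sed step, on a copy of the sample file (writing to out.json here rather than editing in place; the s command without a g flag rewrites only the first ] on each matching line, which is what makes this safe for the flags array):

```shell
tmp=$(mktemp -d) && cd "$tmp"
cat > test.json <<'EOF'
{
  "id": 1,
  "flags": ["a", "b", "c"],
  "name": "test",
  "category": "video",
  "notes": ""
}
EOF
sed -e '/flags/ s/]/, "d"]/' test.json > out.json
result=$(grep flags out.json)
echo "$result"
```

Bear in mind this is a purely textual edit: it assumes the flags array sits on its own line and will misfire on other formattings, which is the limitation the jq-based answers below avoid.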
"TWEET!!" ... (yellow flag thrown to the ground) ... Time Out!
What you have, here, is "a JSON file." You also have, at your #!shebang command, your choice of(!) full-featured programming languages ... with intimate and thoroughly-knowledgeable support for JSON ... with which you can very-speedily write your command-file.
Even if it is "theoretically possible" to do this using "bash scripts," this is roughly equivalent to "putting a beautiful stone archway over the front-entrance to a supermarket." Therefore, "waste ye no time" in such an utterly-profitless pursuit. Write a script, using a language that "honest-to-goodness knows about(!) JSON," to decode the contents of the file, then manipulate it (as a data-structure), then re-encode it again.
Here is a more appropriate approach using PHP in shell:
FILE=foo2.json php -r '$file = $_SERVER["FILE"]; $arr = json_decode(file_get_contents($file)); if ($arr->category == "video") { $arr->flags[] = "d"; file_put_contents($file,json_encode($arr)); }'
Which will load the file, decode into array, add "d" into flags property only when category is video, then write back to the file in JSON format.
To run this for every json file, you can use find command, e.g.
find . -name "*.json" -print0 | while IFS= read -r -d '' file; do
FILE="$file" php -r '$file = $_SERVER["FILE"]; $arr = json_decode(file_get_contents($file)); if ($arr->category == "video") { $arr->flags[] = "d"; file_put_contents($file, json_encode($arr)); }'
done
If the files are in the same format, this command may help (version for a single file):
ex +':/category.*video/norm kkf]i, "d"' -scwq file1.json
or:
ex +':/flags/,/category/s/"c"/"c", "d"/' -scwq file1.json
which is basically using Ex editor (now part of Vim).
Explanation:
+ - executes Vim command (man ex)
:/pattern_or_range/cmd - find pattern, if successful execute another Vim commands (:h :/)
norm kkf]i - executes keystrokes in normal mode
kk - move cursor up twice
f] - find ]
i, "d" - insert , "d"
-s - silent mode
-cwq - executes wq (write & quit)
For multiple files, use find and -execdir or extend above ex command to:
ex +'bufdo!:/category.*video/norm kkf]i, "d"' -scxa *.json
Where bufdo! executes the command for every file, and -cxa (write all and exit) saves every file. Add -V1 for extra verbose messages.
If the flags line is not 2 lines above, then you may perform a backward search instead, or use an approach similar to @sps's, replacing ] with , "d"].
See also: How to change previous line when the pattern is found? at Vim.SE.
Using jq:
find . -type f | xargs cat | jq 'select(.category=="video") | .flags |= . + ["d"]'
Explanation:
jq 'select(.category=="video") | .flags |= . + ["d"]'
# select(.category=="video") => filters by category field
# .flags |= . + ["d"] => Updates the flags array
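One caveat: piping everything through cat only prints the transformed objects to stdout, and select drops non-video objects entirely, so nothing is written back to the files. Since jq has no in-place flag, a per-file loop that writes to a temporary file and moves it back is one way to finish the job. A minimal sketch with two hypothetical files (note the if/else keeps non-video files unchanged instead of discarding them):

```shell
tmp=$(mktemp -d) && cd "$tmp"
cat > one.json <<'EOF'
{"id":1,"flags":["a","b","c"],"name":"test","category":"video","notes":""}
EOF
cat > two.json <<'EOF'
{"id":2,"flags":["a"],"name":"other","category":"audio","notes":""}
EOF

# rewrite each file in place via a temp file; non-video files pass through as-is
for f in *.json; do
  jq 'if .category == "video" then .flags += ["d"] else . end' "$f" > "$f.tmp" \
    && mv "$f.tmp" "$f"
done

result=$(jq -c .flags one.json && jq -c .flags two.json)
echo "$result"
```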