Separate JSON and non-JSON logs with jq?

I have some log files that contain a mix of JSON and non-JSON logs. I'd like to separate them into two files: one containing only the JSON logs and the other containing only the non-JSON logs. I found some ideas for extracting JSON logs with jq. Here is what I have tried, using tee to split the log into two files and jq to extract the logs:
cat $logfile | tee >(jq -R -c 'fromjson? | select(type == "object") | not' > $plain_log_file) >(jq -R -c 'fromjson? | select(type == "object")' > $json_log_file)
This extracts JSON logs correctly but returns false for each non-JSON log instead of the log content itself.
cat $logfile | tee >(jq -R -c 'try fromjson catch .' > $plain_log_file) >(jq -R -c 'fromjson? | select(type == "object")' > $json_log_file)
This fails with a jq syntax error at "catch .".
I do this so I can view the logs in lnav (an excellent log view/navigation tool).
Any suggestion on how to achieve this? Appreciate your help!
sample input:
{ "name": "joe"}
text line, this can be multi-line too
{ "xyz": 123 }

Assuming each JSON log item occurs on a separate line:
For the JSON logs:
jq -nR -c 'inputs|fromjson?'
For the others, you could use:
jq -nRr 'inputs | . as $in | try (fromjson|empty) catch $in'
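Putting the two filters together, a minimal sketch might look like the following. The function name split_logs, its argument order, and the file names are hypothetical, and one log entry per line is assumed. Note that fromjson? also accepts lines that are bare JSON scalars such as 123, so append select(type == "object") to the first filter if only objects should count as JSON.

```shell
# Sketch: split a mixed log into JSON and non-JSON parts (one entry per line).
# split_logs and its argument order are illustrative, not part of jq.
split_logs() {
  # $1 = input log, $2 = output for JSON entries, $3 = output for everything else
  jq -nR -c 'inputs | fromjson?' "$1" > "$2"
  jq -nRr 'inputs | . as $in | try (fromjson | empty) catch $in' "$1" > "$3"
}

# Example run on the sample input from the question.
demo_dir=$(mktemp -d)
printf '%s\n' '{ "name": "joe"}' 'text line, this can be multi-line too' '{ "xyz": 123 }' > "$demo_dir/mixed.log"
split_logs "$demo_dir/mixed.log" "$demo_dir/json.log" "$demo_dir/plain.log"
```

Reading the file twice keeps each command simple; for very large logs the tee approach from the question avoids the second pass.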

If you only want to separate the input line by line into different files, go with @peak's solution. But if you want to process the lines further based on conditions, you could turn them into an array using -Rn and [inputs], and go from there. For instance, if you need the corresponding line numbers (e.g. to feed them into another tool such as sed), use to_entries, which for arrays provides the zero-based index in the .key field:
jq -Rn 'reduce ([inputs] | to_entries[]) as $in ({};
.[($in.value | fromjson? | "json") // "plain"] += [$in.key]
)'
{
"json": [
0,
2
],
"plain": [
1
]
}

If each JSON log entry can be spread over multiple lines, then some assumptions about the non-JSON log entries must be made. Here is an example based on reasonable assumptions about the non-JSON entries. A bash or bash-like environment is also assumed for the sake of convenience.
function log {
cat<<EOF
{ "name":
"joe"}
text line, this can be
multi-line too
{
"xyz": 123 }
EOF
}
log | sed '/^[^"{[ ]/ { s/"/\\"/g ; s/^/"/; s/$/"/;}' |
tee >(jq -rc 'select(type == "string")' > strings.log) |
jq -rc 'select(type != "string")' > json.log


get files from directory in bash and build JSON object using jq

I am trying to build a list of JSON objects from the files in a particular directory. I am looping through the files and creating the expected output object as a string. I am sure there is a better way of doing this using jq.
Can someone please help me out here?
# input
files=($( ls * ))
prefix="myawesomeprefix"
# expected output
{
"listoffiles": [
{"file":"myawesomeprefix/file1.txt"},
{"file":"myawesomeprefix/file2.txt"},
{"file":"myawesomeprefix/file3.txt"}
]
}
If you don't have any "problematic" file names, e.g. ones that have new lines as part of their name, the following should work:
ls -1 | jq -Rn '{ listoffiles: [inputs | { file: "prefix/\(.)" }] }'
With -R, each line is read as a raw string; the inputs filter (which must be combined with -n, the null-input option) then feeds those strings into the object construction.
$ cat <<LS | jq -Rn '{ listoffiles: [inputs | {file:"prefix/\(.)"}] }'
file1
file2
file with spaces
LS
{
"listoffiles": [
{
"file": "prefix/file1"
},
{
"file": "prefix/file2"
},
{
"file": "prefix/file with spaces"
}
]
}
You could use for with a glob which should handle new lines in file names as well. But it requires you to chain 2 jq commands:
for f in *; do
printf '%s' "$f" | jq -Rs '{file:"prefix/\(.)"}';
done | jq -s '{listoffiles:.}'
To specify the prefix as variable from the outside, use --arg, e.g.
jq --arg prefix "yourprefixvalue" '$prefix + .'
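Combining --arg with the inputs approach shown earlier, a sketch could look like this. The helper name list_files_json is hypothetical, and the function assumes no file name contains a newline, since names are passed to jq one per line.

```shell
# Sketch: build the listoffiles object from a list of names, passing the prefix via --arg.
# list_files_json is a hypothetical helper name.
list_files_json() {
  prefix=$1
  shift
  # One file name per line; names containing newlines are NOT handled here.
  printf '%s\n' "$@" |
    jq -Rn --arg prefix "$prefix" '{listoffiles: [inputs | {file: "\($prefix)/\(.)"}]}'
}

# Usage: list_files_json myawesomeprefix *
```

Using a glob as the argument list avoids parsing ls output. If the glob matches nothing, the shell passes the pattern through literally unless an option such as bash's nullglob is set, so the result may need a check for that case.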
You can try the nice little command line tool jc:
ls | jc --ls
It converts the output of many shell commands to JSON. For reference, see its GitHub page: https://github.com/kellyjonbrazil/jc
Then you can transform the result using jq:
ls | jc --ls | jq "{ listoffiles: [.[] | { file: (\"$prefix/\" + .filename) }] }"
You shouldn't parse the output of ls. If installed, you could use tree with the -J option to produce a JSON listing, which you can transform to your needs using jq:
tree -aJL 1 | jq '
{listoffiles: first.contents | map({file: ("myawesomeprefix/" + .name)})}
'
Or more comfortably using --arg:
tree -aJL 1 | jq --arg prefix myawesomeprefix '
{listoffiles: first.contents | map({file: "\($prefix)/\(.name)"})}
'
This is another alternative:
jq -n --arg prefix "myawesomeprefix" \
'.listoffiles = ($ARGS.positional |
map({file:($prefix+"/"+.)}))' \
--args *

How to ignore particular keys inside .properties files while converting to json

I have a .properties file which I'm trying to convert to a JSON file using bash command(s), and I want to exclude particular keys from the JSON output. Below are the properties in the file; I want to exclude properties 4 and 5 (user and pass) from the conversion:
app.database.address=127.0.0.70
app.database.host=database.myapp.com
app.database.port=5432
app.database.user=dev-user-name
app.database.pass=dev-password
app.database.main=dev-database
Here's my bash command used for converting to json but it converts all the properties to json
cat fle.properties | jq -R -s 'split("\n") | map(split("=")) | map({(.[0]): .[1]}) | add' > zppprop.json
Is there any way to specify properties that should be excluded from the JSON conversion?
With xidel:
XPath + JSONiq solution
$ xidel -s fle.properties -e '
{|
x:lines($raw)[not(position() = (4,5))] ! {
substring-before(.,"="):substring-after(.,"=")
}
|}
'
{
"app.database.address": "127.0.0.70",
"app.database.host": "database.myapp.com",
"app.database.port": "5432",
"app.database.main": "dev-database"
}
x:lines($raw) is a shorthand for tokenize($raw,'\r\n?|\n') and turns $raw, the raw input, into a sequence where every new line is another item.
Use [not(position() = (4,5))] if it's always the 4th and 5th line you want to exclude. Otherwise, use [not(contains(.,"user") or contains(.,"pass"))] as seen below.
XQuery solution
$ xidel -s --xquery '
map:merge(
for $x in file:read-text-lines("fle.properties")[not(contains(.,"user") or contains(.,"pass"))]
let $kv:=tokenize($x,"=")
return
{$kv[1]:$kv[2]}
)
'
{
"app.database.address": "127.0.0.70",
"app.database.host": "database.myapp.com",
"app.database.port": "5432",
"app.database.main": "dev-database"
}
You can use file:read-text-lines() to do everything "in-query".
You may filter out unneeded lines with grep:
cat fle.properties | grep -v -E "user|pass" | jq -R -s 'split("\n") | map(select(length > 0)) | map(split("=")) | map({(.[0]): .[1]}) | add'
The empty string at the end of the array returned by split also has to be removed; this is what map(select(length > 0)) does.
You can do the exclusion within the jq script:
properties2json
#!/usr/bin/env -S jq -sRf
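split("\n") |
# drop empty lines (e.g. the one left by the trailing newline), which would make test fail on null
map(select(length > 0)) |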
map(split("=")) |
map(
if .[0] | test(".*\\.(user|pass)";"i")
then
{}
else
{(.[0]): .[1]}
end
) |
add
# Make it executable
chmod +x properties2json
# Run it
./properties2json file.properties >file.json
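One limitation of split("=") followed by .[1] is that a value which itself contains "=" gets cut off at the second "=". As a hedged alternative, the sketch below uses capture to split on the first "=" only. The helper name props_to_json is hypothetical, and the exclusion pattern (keys ending in .user or .pass) is an assumption based on the sample file.

```shell
# Sketch: convert key=value lines to JSON, skipping keys that end in .user or .pass.
# props_to_json is a hypothetical helper; capture splits on the first "=" only,
# so values that themselves contain "=" are kept whole.
props_to_json() {
  jq -Rn '
    [ inputs
      | capture("^(?<key>[^=]+)=(?<value>.*)$")   # lines without "=" yield nothing
      | select(.key | test("\\.(user|pass)$") | not)
      | {(.key): .value}
    ] | add
  ' "$1"
}

# Usage: props_to_json fle.properties > zppprop.json
```

Because non-matching lines produce no output from capture, blank lines and the trailing newline need no special handling here.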

jq raw json output carriage return?

Feel free to edit the title; not sure how to word it. I'm trying to turn shell output into JSON data for a reporting system I'm writing for work. No matter what I do, when I take raw input in slurp mode and output the JSON, the last item in the array is an empty string (""). I feel like this is some sort of rookie jq issue, but I can't figure out how to describe it. This seems to happen no matter what command I run in the shell and pipe to jq:
# rpm -qa | grep kernel | jq -R -s 'split("\n")'
[
"kernel-2.6.32-504.8.1.el6.x86_64",
"kernel-firmware-2.6.32-696.20.1.el6.noarch",
"kernel-headers-2.6.32-696.20.1.el6.x86_64",
"dracut-kernel-004-409.el6_8.2.noarch",
"abrt-addon-kerneloops-2.0.8-43.el6.x86_64",
"kernel-devel-2.6.32-358.11.1.el6.x86_64",
"kernel-2.6.32-131.4.1.el6.x86_64",
"kernel-devel-2.6.32-696.20.1.el6.x86_64",
"kernel-2.6.32-696.20.1.el6.x86_64",
"kernel-devel-2.6.32-504.8.1.el6.x86_64",
"libreport-plugin-kerneloops-2.0.9-33.el6.x86_64",
""
]
Any help is appreciated.
Every line ends with a newline. Either remove the final newline, or omit the empty element at the end of the array.
vnix$ printf 'foo\nbar\n' |
> jq -R -s '.[:-1] | split("\n")'
[
"foo",
"bar"
]
vnix$ printf 'foo\nbar\n' |
> jq -R -s 'split("\n")[:-1]'
[
"foo",
"bar"
]
The notation x[:-1] yields the string or array x without its last character or element, respectively. This is called "slice notation".
Just to spell this out, if you take the string "foo\n" and split on newline, you get "foo" from before the newline and "" after it.
To make this really robust, maybe trim the last character only if it really is a newline.
vnix$ printf 'foo\nbar\n' |
> jq -R -s 'sub("\n$";"") | split("\n")'
[
"foo",
"bar"
]
vnix$ printf 'foo\nbar' |
> # notice, no final ^ newline
> jq -R -s 'sub("\n$";"") | split("\n")'
[
"foo",
"bar"
]
Assuming you have access to jq 1.5 or later, you can circumvent the problem entirely and economically using inputs:
jq -nR '[inputs]'
Just be sure to include the -n option, otherwise the first line will go missing.
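As a small sketch of this approach, the wrapper below (lines_to_json is a hypothetical name) shows that the result does not depend on whether the input ends with a newline:

```shell
# Sketch: turn the lines of stdin into a JSON array of strings.
# lines_to_json is a hypothetical helper name; requires jq 1.5+ for inputs.
lines_to_json() {
  jq -cnR '[inputs]'
}

printf 'foo\nbar\n' | lines_to_json   # with a trailing newline
printf 'foo\nbar' | lines_to_json     # without one
```

Both calls print the same two-element array, because -R hands jq one string per line instead of one big string that has to be split afterwards.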
You can also use
rpm -qa | grep kernel | jq -R . | jq -s .
to get the desired result.
Please see https://github.com/stedolan/jq/issues/563

Reading and Looping Through A JSON File in BASH

I've got a JSON file (see below) called department_groups.json.
Essentially if I gave an argument of commercial I'd like it to return:
commercial-team#domain.com
commercial-updates#domain.com
Can anyone guide/help me with doing this?
{
"legal": {
"google_groups":[
["Legal", "legal#domain.com"],
["Legal Team", "legal-team#domain.com"],
["Compliance Checks", "compliance#domain.com"]
],
"samba_groups": ""
},
"commercial":{
"google_groups":[
["Commercial Team", "commercial-team#domain.com"],
["Commercial Updates", "commercial-updates#domain.com"]
],
"samba_groups": ""
},
"technology":{
"google_groups":[
["Technology", "technology#domain.com"],
["Incidents", "incidents#domain.com"]
],
"samba_groups": ""
}
}
This returns the second element in each array in the google_groups property of the commercial property:
jq --arg key commercial '.[$key].google_groups | .[] | .[1]' file
Use jq -r to output in "raw" format (lose the double quotes).
$ key=commercial
$ jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' file
commercial-team#domain.com
commercial-updates#domain.com
I used --arg in these examples to show how it is used, optionally with a shell variable. If, on the other hand, commercial was just a fixed string, then you could simplify:
jq -r '.commercial.google_groups | .[] | .[1]' file
To process each line of the output, you can just use a shell while read loop:
key=commercial
while read -r email; do
echo "$email"
# process each email individually here
done < <(jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' file)
Here I am using a process substitution <(), which acts like a file that can be processed by the shell. One advantage of doing this, over using a pipe, is that no subshell is created. Among other things, this means that the variables used within the loop remain in scope after the while block, so you can use them later.
If you prefer to use a pipe, just remove the part after done and move the command up to the first line:
jq ... | while read -r email; do # etc.
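To illustrate the scoping point, here is a minimal bash sketch that updates a counter inside the loop and reads it afterwards. The inline JSON is a cut-down stand-in for department_groups.json, and the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: count addresses inside the loop and use the count afterwards.
# The inline JSON is a cut-down stand-in for department_groups.json.
json='{"commercial":{"google_groups":[["Commercial Team","commercial-team#domain.com"],["Commercial Updates","commercial-updates#domain.com"]]}}'
key=commercial

count=0
while read -r email; do
  count=$((count + 1))
done < <(jq -r --arg key "$key" '.[$key].google_groups | .[] | .[1]' <<<"$json")

echo "processed $count addresses"   # count is still visible here
```

With the pipe form, bash by default runs the loop in a subshell, so count would still be 0 after the loop.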
As @TomFenech noted, the requirements are somewhat unclear, but if it's the email addresses you want, the following variant of his answer may be of interest:
$ key=commercial
$ jq -r --arg key "$key" '.[$key].google_groups[][] | select(test("#"))' department_groups.json
commercial-team#domain.com
commercial-updates#domain.com

How to get newline on every iteration in jq

I have the following file
[
{
"id": 1,
"name": "Arthur",
"age": "21"
},
{
"id": 2,
"name": "Richard",
"age": "32"
}
]
To display the names, I am using the following command
$ jq '.[] | .name' test
"Arthur"
"Richard"
But when I put it in a shell script and try to assign it to a variable then the whole output is displayed on a single line like below
#!/bin/bash
names=$(jq '.[] | .name' test)
echo $names
$ ./script.sh
"Arthur" "Richard"
I want to break at every iteration similar to how it works on the command line.
There are a couple of issues with the information you have provided. The jq filter .[] | .login, .id will not produce the output you claimed on jq-1.5. For your original JSON
{
"login":"dmaxfield",
"id":7449977
}
{
"login":"stackfield",
"id":2342323
}
it will produce four lines of output:
jq -r '.login, .id' < json
dmaxfield
7449977
stackfield
2342323
If you are interested in storing them side by side, you need to use string interpolation:
jq -r '"\(.login), \(.id)"' < json
dmaxfield, 7449977
stackfield, 2342323
And if the output stored in a variable does not look right, it is probably because the variable was not double-quoted when you printed it in the shell.
jqOutput=$(jq -r '"\(.login), \(.id)"' < json)
printf "%s\n" "$jqOutput"
dmaxfield, 7449977
stackfield, 2342323
This way the embedded new lines in the command output are not swallowed by the shell.
For your updated JSON (a totally new one compared to the old one), all you need to do is
jqOutput=$(jq -r '.[] | .name' < json)
printf "%s\n" "$jqOutput"
Arthur
Richard
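The effect of the missing quotes can be reproduced without jq. In this sketch, printf stands in for the jq call (an assumption made so that the snippet runs without an input file), and the default IFS is assumed:

```shell
# Sketch: why quoting matters when printing multi-line command output.
# printf stands in for the jq call so the snippet runs without any input file.
names=$(printf 'Arthur\nRichard\n')

unquoted=$(echo $names)            # word splitting joins the lines with spaces
quoted=$(printf '%s\n' "$names")   # the embedded newline is preserved

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

The unquoted expansion is split into separate words that echo joins with single spaces, which is exactly the one-line output seen in the question.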
In case the .login or .id contains embedded spaces or other characters that might cause problems, a more robust approach is to ensure each JSON value is on a separate line. Consider, for example:
jq -c .login,.id input.json | while read -r login ; do read -r id; echo login="$login" and id="$id" ; done
login="dmaxfield" and id=7449977
login="stackfield" and id=2342323