List directory to json format using jq

I've been trying to get a Linux listing of all my files and directories under a specified path into JSON format using ls and jq.
Desired output:
This is all I have so far:
ls | jq -R '[.]' | jq -s -c 'add'
Is it possible to build output like the one in the picture above?

The following only handles vanilla files and is neither portable nor robust but should be sufficient to get you on your way.
The JSON structure that is emitted is very similar to the output of the tree program (shown below); in particular, it uses directory components as object keys, since that produces an economical
hierarchy allowing queries such as .a.b to view details about the directory ‘./a/b’. To provide jq with the necessary data, we use find . -ls.
jqtree
#!/bin/bash
find . -ls | jq -nR '
  # Return an object with useful information
  def gather:
    [splits(" +")] as $in
    | { pathname: $in[-1], entrytype: $in[2][0:1], size: ($in[6] | tonumber) };

  reduce (inputs | gather) as $entry ({};
    ($entry.pathname | split("/")) as $names
    | if ($entry | .entrytype == "-") then
        ($names[0:-1] + ["items"]) as $p
        | setpath($p; getpath($p) + [{name: $names[-1], size: $entry.size}])
      else . end)'
Demo
$ tree
.
|-- a
|   `-- b
|       `-- foo
|-- big
|-- foo
`-- so
$ ~/bin/jqtree
{
  ".": {
    "items": [
      {
        "name": "big",
        "size": 1025
      },
      {
        "name": "so",
        "size": 667
      },
      {
        "name": "foo",
        "size": 0
      }
    ],
    "a": {
      "b": {
        "items": [
          {
            "name": "foo",
            "size": 0
          }
        ]
      }
    }
  }
}
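For example, the details for the directory ./a/b can then be pulled back out of that object with one more jq step (a small illustration against the demo output above):

$ ~/bin/jqtree | jq '.["."].a.b'
{
  "items": [
    {
      "name": "foo",
      "size": 0
    }
  ]
}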

This Linux system runs on MIPS. It does not have find . -ls (that option is not available), and tree is not available either. Maybe someone can compile the tree package. The BusyBox find usage is shown below (a possible workaround is sketched after it).
BusyBox v1.22.1 (2017-06-29 11:15:20 CST) multi-call binary.
Usage: find [-HL] [PATH]... [OPTIONS] [ACTIONS]
Search for files and perform actions on them.
First failed action stops processing of current file.
Defaults: PATH is current directory, action is '-print'
-L,-follow Follow symlinks
-H ...on command line only
Actions:
ACT1 [-a] ACT2 If ACT1 fails, stop, else do ACT2
ACT1 -o ACT2 If ACT1 succeeds, stop, else do ACT2
Note: -a has higher priority than -o
-name PATTERN Match file name (w/o directory name) to PATTERN
-iname PATTERN Case insensitive -name
If none of the following actions is specified, -print is assumed
-print Print file name
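One possible workaround for such a BusyBox system, where find only supports -print: emulate find . -ls with plain find plus ls -ld, and adjust the field indices in gather. This is only a sketch, untested on MIPS/BusyBox, and carries the same "vanilla files only" caveats as the original script.

#!/bin/sh
# Sketch: BusyBox-friendly jqtree variant.
# Assumes `ls -ld` columns: perms links user group size month day time name,
# so the entry type is field 0 and the size is field 4.
find . | while read -r p; do ls -ld "$p"; done | jq -nR '
  def gather:
    [splits(" +")] as $in
    | { pathname: $in[-1], entrytype: $in[0][0:1], size: ($in[4] | tonumber) };

  reduce (inputs | gather) as $entry ({};
    ($entry.pathname | split("/")) as $names
    | if $entry.entrytype == "-" then
        ($names[0:-1] + ["items"]) as $p
        | setpath($p; getpath($p) + [{name: $names[-1], size: $entry.size}])
      else . end)'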

get files from directory in bash and build JSON object using jq

I am trying to build a list of JSON objects from the files in a particular directory. Currently I am looping through the files and creating the expected output object as a string. I am sure there is a better way of doing this using jq.
Can someone please help me out here?
# input
files=($( ls * ))
prefix="myawesomeprefix"

# expected output
{
  "listoffiles": [
    {"file": "myawesomeprefix/file1.txt"},
    {"file": "myawesomeprefix/file2.txt"},
    {"file": "myawesomeprefix/file3.txt"}
  ]
}
If you don't have any "problematic" file names, e.g. ones that have new lines as part of their name, the following should work:
ls -1 | jq -Rn '{ listoffiles: [inputs | { file: "prefix/\(.)" }] }'
It reads each line as a string and feeds them through the inputs filter (which must be combined with -n / null input). It then builds your object.
$ cat <<LS | jq -Rn '{ listoffiles: [inputs | {file:"prefix/\(.)"}] }'
file1
file2
file with spaces
LS
{
  "listoffiles": [
    {
      "file": "prefix/file1"
    },
    {
      "file": "prefix/file2"
    },
    {
      "file": "prefix/file with spaces"
    }
  ]
}
You could use for with a glob which should handle new lines in file names as well. But it requires you to chain 2 jq commands:
for f in *; do
printf '%s' "$f" | jq -Rs '{file:"prefix/\(.)"}';
done | jq -s '{listoffiles:.}'
To specify the prefix as a variable from the outside, use --arg, e.g.
jq --arg prefix "yourprefixvalue" '$prefix + .'
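For instance, combined with the ls -1 pipeline above, that might look like this (a sketch; "myawesomeprefix" is just the value from the question):

ls -1 | jq -Rn --arg prefix "myawesomeprefix" '{ listoffiles: [inputs | { file: "\($prefix)/\(.)" }] }'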
You can try the nice little command line tool jc:
ls | jc --ls
It converts the output of many shell commands to JSON. For reference, have a look at it on GitHub: https://github.com/kellyjonbrazil/jc
Then you can transform the result using jq:
ls | jc --ls | jq "{ listoffiles: [.[] | { file: (\"$prefix/\" + .filename) }] }"
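If you prefer to pass the prefix with --arg instead of interpolating a shell variable into the jq program, the same idea might be written as (a sketch):

ls | jc --ls | jq --arg prefix "$prefix" '{ listoffiles: [.[] | { file: ($prefix + "/" + .filename) }] }'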
You shouldn't parse the output of ls. If installed, you could use tree with the -J option to produce a JSON listing, which you can transform to your needs using jq:
tree -aJL 1 | jq '
{listoffiles: first.contents | map({file: ("myawesomeprefix/" + .name)})}
'
Or more comfortably using --arg:
tree -aJL 1 | jq --arg prefix myawesomeprefix '
{listoffiles: first.contents | map({file: "\($prefix)/\(.name)"})}
'
This is another alternative:
jq -n --arg prefix "myawesomeprefix" \
  '.listoffiles = ($ARGS.positional |
    map({file: ($prefix + "/" + .)}))' \
  --args *

jq: from one json input, construct multiple rows of tsv using an expression against the keys?

Using jq I can extract the data in this simple way as follows:
find . -name '*.jsonl' | xargs -I {} jq '[.data.Item_A_Foo.value, .data.Item_A_Bar.value] | @tsv' {} >> foobar.tsv
find . -name '*.jsonl' | xargs -I {} jq '[.data.Item_B_Foo.value, .data.Item_B_Bar.value] | @tsv' {} >> foobar.tsv
find . -name '*.jsonl' | xargs -I {} jq '[.data.Item_C_Foo.value, .data.Item_C_Bar.value] | @tsv' {} >> foobar.tsv
...
# and so on
But this seems pretty wasteful. Is there a more advanced way to use JQ, and perhaps:
Filter for .data.Item_*_Foo.value, .data.Item_*_Bar.value
OR chain these rows in a single jq expression (reasonably readable, compact)
# Here is a made-up JSON file that can motivate this question.
# Imagine there are 100,000 of these and they are larger.
{
  "data": {
    "Item_A_Foo": {
      "adj": "wild",
      "adv": "unruly",
      "value": "unknown"
    },
    "Item_A_Bar": {
      "adj": "rotund",
      "quality": "mighty",
      "value": "swing"
    },
    "Item_B_Foo": {
      "adj": "nice",
      "adv": "heroically",
      "value": "medium"
    },
    ... etc. for many Foo's and Bar's of A, B, C, ..., Z types
    "Not_an_Item": {
      "value": "doesn't matter"
    }
  }
}
And the goal is:
unknown, swing # data.Item_A_Foo.value, data.Item_A_Bar.value
medium, hit # data.Item_B_Foo.value, data.Item_B_Bar.value
whatever, etc. # data.Item_C_Foo.value, data.Item_C_Bar.value
The details of your requirements are unclear, but you could proceed along the lines suggested by this jq filter:
.data
| (keys_unsorted | map(select(test("^Item_[^_]*_Foo$")))) as $foos
| ($foos | map(sub("_Foo$"; "_Bar"))) as $bars
| [.[$foos[]].value, .[$bars[]].value]
| @tsv
The idea is to determine dynamically which keys to select.
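To run a filter like that over all the .jsonl files in a single jq invocation, rather than one process per file, something along these lines should work (a sketch; extract.jq is a hypothetical file holding the filter above, and -r makes @tsv emit raw tab-separated rows):

# extract.jq is a hypothetical file containing the filter above
find . -name '*.jsonl' -exec jq -r -f extract.jq {} + > foobar.tsv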

Output paths to all keys named "id" where the type of value is "string"

Given a huge (15 GB), deeply nested (12+ object layers) JSON file, how can I find the paths to all the keys named id whose values are of type string?
A massively simplified example file:
{
  "a": [
    {
      "id": 3,
      "foo": "red"
    }
  ],
  "b": [
    {
      "id": "7",
      "bar": "orange",
      "baz": {
        "id": 13
      },
      "bax": {
        "id": "12"
      }
    }
  ]
}
Looking for a less ugly solution where I don't run out of RAM and have to punt to grep at the end (sigh). (I failed to figure out how to chain to_entries into this usefully. If that's even something I should be trying to do.)
Ugly solution 1:
$ cat huge.json | jq 'path(..|select(type=="string")) | join(".")' | grep -E '\.id"$'
"b.0.id"
"b.0.bax.id"
Ugly solution 2:
$ cat huge.json | jq --stream -c . | grep -E '"id"],"'
[["b",0,"id"],"7"]
[["b",0,"bax","id"],"12"]
Something like this should do that.
jq --stream 'select(.[0][-1] == "id" and (.[1] | strings)) | .[0]' file
And by the way, your first ugly solution can be simplified to this:
jq 'path(.. .id? | strings)' file
Stream the input in, as you started doing with your second solution, but add some filtering. You do not want to read the entire contents into memory. And also... UUOC (useless use of cat).
$ jq --stream '
select(.[0][-1] == "id" and (.[1]|type) == "string")[0]
| join(".")
' huge.json
Thank you both oguz and Jeff! Beautiful! This runs in 6.5 minutes (on my old laptop), never uses more than 21MB of RAM, and gives me exactly what I need. <3
$ jq --stream -c 'select(.[0][-1] == "id" and (.[1]|type) == "string")' huge.json

Building new JSON with JQ and bash

I am trying to create JSON from scratch using bash.
The final structure needs to be like:
{
  "hosts": {
    "a_hostname": {
      "ips": [
        1,
        2,
        3
      ]
    },
    {...}
  }
}
First I'm creating an input file with the format:
hostname ["1.1.1.1","2.2.2.2"]
host-name2 ["3.3.3.3","4.4.4.4"]
This is being created by:
for host in $( ansible -i hosts all --list-hosts ) ; \
do echo -n "${host} " ; \
ansible -i hosts $host -m setup | sed '1c {' | jq -r -c '.ansible_facts.ansible_all_ipv4_addresses' ; \
done > hosts.txt
The key point here is that the IP list/array is coming from a JSON file and being extracted by jq. This extraction outputs an already valid, quoted JSON array, but as a string in a txt file.
Next I'm using jq to parse the whole text file into the desired JSON:
jq -Rn '
  { "hosts": [inputs |
      split("\\s+"; "g") |
      select(length > 0 and .[0] != "") |
      {(.[0]):
        {ips: .[1]}
      }
    ] | add }
' < ~/hosts.txt
This is almost correct; everything works except for the IPs value, which is treated as a string and quoted, leading to:
{
  "hosts": {
    "hostname1": {
      "ips": "[\"1.1.1.1\",\"2.2.2.2\"]"
    },
    "host-name2": {
      "ips": "[\"3.3.3.3\",\"4.4.4.4\"]"
    }
  }
}
I'm now stuck at this final hurdle - how to insert the IPs without causing them to be quoted again.
Edit: the quoting was solved by using {ips: (.[1] | fromjson)} instead of {ips: .[1]}.
However, this was rendered moot by @CharlesDuffy's suggestion below of converting to TSV.
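For reference, a sketch of that fromjson variant combined with the parsing filter above (assuming, as in the sample hosts.txt, that each JSON array contains no whitespace):

jq -Rn '
  { "hosts": [inputs |
      split("\\s+"; "g") |
      select(length > 0 and .[0] != "") |
      {(.[0]): {ips: (.[1] | fromjson)}}
    ] | add }
' < ~/hosts.txt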
Original Q body:
So far I've got to
jq -n {hosts:{}} | \
for host in $( ansible -i hosts all --list-hosts ) ; \
do jq ".hosts += {$host:{}}" | \
jq ".hosts.$host += {ips:[1,2,3]}" ; \
done ;
([1,2,3] is actually coming from a subshell but including it seemed unnecessary as that part works, and made it harder to read)
This sort of works, but there seem to be 2 problems.
1) The final output only has a single host in it, containing data from the first host in the list (this persists even if the second problem is bypassed):
{
  "hosts": {
    "host_1": {
      "ips": [
        1,
        2,
        3
      ]
    }
  }
}
2) One of the hostnames has a - in it, which causes syntax and compiler errors from jq. I'm stuck going around quote hell trying to get it to be interpreted but also quoted. Help!
Thanks for any input.
Let's say your input format is:
host_1 1 2 3
host_2 2 3 4
host-with-dashes 3 4 5
host-with-no-addresses
(Re: the edit specifying a different format: add @tsv onto the jq command producing the existing format to generate this one instead; a sketch follows.)
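One way to do that, reusing the loop from the question (a sketch, untested against real ansible output; the field names come from the question's own command):

for host in $( ansible -i hosts all --list-hosts ); do
  ansible -i hosts "$host" -m setup | sed '1c {' \
    | jq -r --arg host "$host" '[$host] + .ansible_facts.ansible_all_ipv4_addresses | @tsv'
done > hosts.txt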
If you want to transform that to the format in question, it might look like:
jq -Rn '
  { "hosts": [inputs |
      split("\\s+"; "g") |
      select(length > 0 and .[0] != "") |
      {(.[0]): .[1:]}
    ] | add
  }' <input.txt
Which yields as output:
{
  "hosts": {
    "host_1": [
      "1",
      "2",
      "3"
    ],
    "host_2": [
      "2",
      "3",
      "4"
    ],
    "host-with-dashes": [
      "3",
      "4",
      "5"
    ],
    "host-with-no-addresses": []
  }
}

How to use jq to find all paths to a certain key

In a very large nested JSON structure I'm trying to find all of the paths that end in a certain key.
ex:
{
  "A": {
    "A1": {
      "foo": {
        "_": "_"
      }
    },
    "A2": {
      "_": "_"
    }
  },
  "B": {
    "B1": {}
  },
  "foo": {
    "_": "_"
  }
}
would print something along the lines of:
["A","A1","foo"], ["foo"]
Unfortunately I don't know at what level of nesting the keys will appear, so I haven't been able to figure it out with a simple select. I've gotten close with jq '[paths] | .[] | select(contains(["foo"]))', but the output contains all the permutations of any tree that contains foo.
output: ["A", "A1", "foo"] ["A", "A1", "foo", "_"] ["foo"] ["foo", "_"]
Bonus points if I could keep the original data structure format but simply filter out all paths that don't contain the key (in this case the sub trees under "foo" wouldn't need to be hidden).
With your input:
$ jq -c 'paths | select(.[-1] == "foo")'
["A","A1","foo"]
["foo"]
Bonus points:
(1) If your jq has tostream:
$ jq 'fromstream(tostream| select(.[0]|index("foo")))'
Or better yet, since your input is large, you can use the streaming parser (jq -n --stream) with this filter:
fromstream( inputs|select( (.[0]|index("foo"))))
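That is, the whole invocation would look something like this (input.json stands in for your large file):

jq -n --stream 'fromstream( inputs | select(.[0] | index("foo")) )' input.json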
(2) Whether or not your jq has tostream:
. as $in
| reduce (paths(scalars) | select(index("foo"))) as $p
(null; setpath($p; $in|getpath($p)))
In all three cases, the output is:
{
  "A": {
    "A1": {
      "foo": {
        "_": "_"
      }
    }
  },
  "foo": {
    "_": "_"
  }
}
I had the same fundamental problem.
With (yaml) input like:
developer:
  android:
    members:
      - alice
      - bob
    oncall:
      - bob
hr:
  members:
    - charlie
    - doug
this:
  is:
    really:
      deep:
        nesting:
          members:
            - example deep nesting
I wanted to find all arbitrarily nested groups and get their members.
Using this:
yq . |                                      # convert yaml to json using python-yq
jq '
  . as $input |                             # Save the input for later
  . | paths |                               # Get the list of paths
  select(.[-1] | tostring | test("^(members|oncall|priv)$"; "ix")) |   # Only find paths which end with members, oncall, and priv
  . as $path |                              # save each path in the $path variable
  ( $input | getpath($path) ) as $members | # Get the value of each path from the original input
  {
    "key": ( $path | join("-") ),           # The key is the join of all path keys
    "value": $members                       # The value is the list of members
  }
' |
jq -s 'from_entries' |                      # collect kv pairs into a full object using slurp
yq --sort-keys -y .                         # Convert back to yaml using python-yq
I get output like this:
developer-android-members:
  - alice
  - bob
developer-android-oncall:
  - bob
hr-members:
  - charlie
  - doug
this-is-really-deep-nesting-members:
  - example deep nesting
- example deep nesting