Lets say we have this file:
{
"persons": [
{
"friends": 4,
"phoneNumber": 123456,
"personID": 11111
},
{
"friends": 2057,
"phoneNumber": 432100,
"personID": 22222
},
{
"friends": 50,
"phoneNumber": 147258,
"personID": 55555
}
]
}
I now want to extract the phone numbers of the persons 11111, 22222, 33333, 44444 and 55555 as a semicolon-separated string:
123456;432100;;;147258
While running
cat persons.txt | jq ".persons[] | select(.personID==<ID>) | .phoneNumber"
once for each <ID> and glueing the results together with the ; afterwards works, this is terribly slow, because it has to reload the file for each of the IDs (and other fields I want to extract).
Concatenating it in a single query:
cat persons.txt | jq "(.persons[] | select(.personID==11111) | .phoneNumber), (.persons[] | select(.personID==22222) | .phoneNumber), (.persons[] | select(.personID==33333) | .phoneNumber), (.persons[] | select(.personID==44444) | .phoneNumber), (.persons[] | select(.personID==55555) | .phoneNumber)"
This also works, but it gives
123456
432100
147258
so I do not know which of the fields are missing and how many ; I have to insert.
With your sample input in input.json, and using jq 1.6 (or a jq with INDEX/2), the following invocation of jq produces the desired output:
jq -r --argjson ids '[11111, 22222, 33333, 44444, 55555]' -f tossv.jq input.json
assuming tossv.jq contains the program:
INDEX(.persons[]; .personID) as $dict
| $ids
| map( $dict[tostring] | .phoneNumber)
| join(";")
Program notes
INDEX/2 produces a JSON object that serves as a dictionary. Since JSON keys must be strings, tostring must be used in line 3 above.
When using join(";"), null values effectively become empty strings.
If your jq does not have INDEX/2, then now might be a good time to upgrade. Otherwise you can snarf its definition by googling: jq "def INDEX" builtin.jq
Unfortunately I couldn't test if peak's answer works since I only have jq 1.5. Here's what I came up with yesterday evening:
For each semicolon, add the following query
(\";\" as \$a | \$a)
Resulting command (abstract):
cat persons.txt | jq "(<1's phone number>), (\";\" as \$a | \$a),
(<2's phone number>), (\";\" as \$a | \$a), ..."
Resulting command (concrete):
cat persons.txt | jq "(.persons[] | select(.personID==11111) | .phoneNumber), (\";\" as \$a | \$a),
(.persons[] | select(.personID==22222) | .phoneNumber), (\";\" as \$a | \$a),
(.persons[] | select(.personID==33333) | .phoneNumber), (\";\" as \$a | \$a),
(.persons[] | select(.personID==44444) | .phoneNumber), (\";\" as \$a | \$a),
(.persons[] | select(.personID==55555) | .phoneNumber)"
Result:
123456
";"
432100
";"
";"
";"
147258
Delete the newlines and ":
<commandAsAbove> | tr --delete "\n\""
Result:
123456;432100;;;147258
Do not get me wrong, this is far uglier than peak's answer, but it worked for me yesterday.
Without jq solution:
for i in $(seq 11111 11111 55555)
do
string=$(grep -B1 "$i" persons.txt | head -1 | sed 's/.* \(.*\),/\1/g')
echo "$string;" >> output
done
cat output | tr -d '\n' | rev | cut -d';' -f2- | rev > tmp && mv tmp output
This little script will yield the result you want and you can adapt it quickly if the input data varies
cat output
123456;432100;;;147258
Related
Not quite getting it. I can produce multiple lines but cannot get multiple entries to combine. Looking to take Source JSON and output to CSV as shown:
Source JSON:
[{"State": "NewYork","Drivers": [
{"Car": "Jetta","Users": [{"Name": "Steve","Details": {"Location": "Home","Time": "9a-7p"}}]},
{"Car": "Jetta","Users": [{"Name": "Roger","Details": {"Location": "Office","Time": "3p-6p"}}]},
{"Car": "Ford","Users": [{"Name": "John","Details": {"Location": "Home","Time": "12p-5p"}}]}
]}]
Desired CSV:
"NewYork","Jetta","Steve;Roger","Home;Office","9a-7p;3p-6p"
"NewYork","Ford","John","Home","12p-5p"
JQ code that does not work:
.\[\] | .Drivers\[\] | .Car as $car |
.Users\[\] |
\[$car, .Name\] | #csv
You're looking for something like this:
.[] | [.State] + (
.Drivers | group_by(.Car)[] | [.[0].Car] + (
map(.Users) | add | [
map(.Name),
map(.Details.Location),
map(.Details.Time)
] | map(join(";"))
)
) | #csv
$ jq -r -f tst.jq file
"NewYork","Ford","John","Home","12p-5p"
"NewYork","Jetta","Steve;Roger","Home;Office","9a-7p;3p-6p"
$
Not quite optimised, but I though't I'd share the general idea:
jq -r 'map(.State as $s |
(.Drivers | group_by(.Car))[]
| [
$s,
(map(.Users[].Name) | join(";")),
(map(.Users[].Details.Location) | join(";")),
(map(.Users[].Details.Time) | join(";"))
])
[] | #csv' b
map() over each state, remember the name (map(.State as $s | )
group_by(.Car)
Create an array containing all your fields that is passed to #csv
Use map() and join() to create the fields for Name, Location and Time
This part could be improved so you don't need that duplicated part
Output (with --raw-output:
"NewYork","John","Home","12p-5p"
"NewYork","Steve;Roger","Home;Office","9a-7p;3p-6p"
JqPlay seems down, so I'm still searching for an other way of sharing a public demo
Far from perfect, but it builds the result incrementally so it should be easily debuggable and extensible:
map({State} + (.Drivers[] | {Car} + (.Users[] | {Name} + (.Details | {Location, Time}))))
| group_by(.Car)
| map(reduce .[] as $item (
{State:null,Car:null,Name:[],Location:[],Time:[]};
. + ($item | {State,Car}) | .Name += [$item.Name] | .Location += [$item.Location] | .Time += [$item.Time]))
| .[]
| [.State, .Car, (.Name,.Location,.Time|join(","))]
| #csv
I save the IO graph statistics as CSV file containing the bits per second using the wireshark GUI. Is there a way to generate this CSV file with command line tshark? I can generate the statistics on command line as bytes per second as follows
tshark -nr test.pcap -q -z io,stat,1,BYTES
How do I generate bits/second and save it to a CSV file?
Any help is appreciated.
I don't know a way to do that using only tshark, but you can easily parse the output from tshark into a CSV file:
tshark -nr tmp.pcap -q -z io,stat,1,BYTES | grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" | awk -F '[ |]+' '{print $2","($5*8)}'
Explanations
grep -P "\d+\s+<>\s+\d+\s*\|\s+\d+" selects only the raw from the tshark output with the actual data (i.e., second <> second | transmitted bytes).
awk -F '[ |]+' '{print $2","($5*8)}' splits that data into 5 blocks with [ |]+ as the separator and display blocks 2 (the second at which starts the interval) and 5 (the transmitted bytes) with a comma between them.
Another thing that may be good to know:
If you change the interval from 1 second to 0.5 seconds, then you have to allow . in the grep part by adding \. between two digits \d .
Otherwise the result will be an empty *.csv file.
grep -P "\d{1,2}\.{1}\d{1,2}\s+<>\s+\d{1,2}\.{1}\d{1,2}\s*\|\s+\d+"
The answers in this thread gave me the keys to solving a similar problem with tshark io stats and I wanted to share the results and how it works. In my case, the task was to convert multiple columns of tshark io stat records with potential decimals in the data. This answer converts multiple data columns to csv, adds rudimentary headers, accounts for decimals in fields and variable numbers of spaces.
Complete command string
tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10" \
| grep -P "\d+\.?\d*\s+<>\s+|Interval +\|" \
| tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g' > somefile.csv
Explanation
The command string has 3 major parts.
tshark creates the report with the data in columns
Extract the desired lines with grep
Use tr and sed to convert the records grep matched into a csv delimited file.
Part 1: tshark creates the report with the data in columns
tshark is run with -z io,stat at a 30 second interval, counting frames and bytes with various filters.
tshark -r capture.pcapng -q -z io,stat,30,,FRAMES,BYTES,"FRAMES()ip.src == 10.10.10.10","BYTES()ip.src == 10.10.10.10","FRAMES()ip.dst == 10.10.10.10","BYTES()ip.dst == 10.10.10.10"
Here is the output when run against my test pcap file:
=================================================================================================
| IO Statistics |
| |
| Duration: 179.179180 secs |
| Interval: 30 secs |
| |
| Col 1: Frames and bytes |
| 2: FRAMES |
| 3: BYTES |
| 4: FRAMES()ip.src == 10.10.10.10 |
| 5: BYTES()ip.src == 10.10.10.10 |
| 6: FRAMES()ip.dst == 10.10.10.10 |
| 7: BYTES()ip.dst == 10.10.10.10 |
|-----------------------------------------------------------------------------------------------|
| |1 |2 |3 |4 |5 |6 |7 |
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
|-----------------------------------------------------------------------------------------------|
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
| 30 <> 60 | 122437 | 124508575 | 122437 | 124508575 | 49331 | 17080888 | 73017 | 107422509 |
| 60 <> 90 | 138999 | 135488315 | 138999 | 135488315 | 54829 | 22130920 | 84029 | 113348686 |
| 90 <> 120 | 158241 | 217781653 | 158241 | 217781653 | 42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 | 43709 | 18800647 | 67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 | 50754 | 22053280 | 72786 | 120574520 |
=================================================================================================
Considerations
Looking at this output, we can see several items to consider:
Rows with data have a unique sequence in the Interval column of "space<>space", which we will can use for matching.
We want the header line, so we will use the word "Interval" followed by spaces and then a "|" character.
The number of spaces in a column are variable depending on the number of digits per measurement.
The Interval column gives both the time from 0 and the from the first measurement. Either can be used, so we will keep both and let the user decide.
When using milliseconds there will be decimals in the Interval field
Depending on the statistic requested, there may be decimals in the data columns
The use of "|" as delimiters will require escaping in any regex statement that covers them.
Part 2: Extract the desired lines with grep
Once tshark produces output, we use grep with regex to extract the lines we want to save.
grep -P "\d+\.?\d*\s+<>\s+|Interval +\|""
grep will use the "Digit(s)Space(s)<>Space(s)" character sequence in the Interval column to match the lines with data. It also uses an OR to grab the header by matching the characters "Interval |".
grep -P # The "-P" flag turns on PCRE regex matching, which is not the same as egrep. With egrep, you will need to change the escaping.
"\d+ # Match on 1 or more Digits. This is the 1st set of numbers in the Interval column.
\.? # 0 or 1 Periods. We need this to handle possible fractional seconds.
\d* # 0 or more Digits. To handle possible fractional seconds.
\s+<>\s+ # 1 or more Spaces followed by the Characters "<>", then 1 or more Spaces.
| # Since this is not escaped, it is a regex OR
Interval\s+\|" # Match the String "Interval" followed by 1 or more Spaces and a literal "|".
From the tshark output, grep matched these lines:
| Interval | Frames | Bytes | FRAMES | BYTES | FRAMES | BYTES | FRAMES | BYTES |
| 0 <> 30 | 107813 | 120111352 | 107813 | 120111352 | 26682 | 15294257 | 80994 | 104808983 |
| 30 <> 60 | 122437 | 124508575 | 122437 | 124508575 | 49331 | 17080888 | 73017 | 107422509 |
| 60 <> 90 | 138999 | 135488315 | 138999 | 135488315 | 54829 | 22130920 | 84029 | 113348686 |
| 90 <> 120 | 158241 | 217781653 | 158241 | 217781653 | 42103 | 15870237 | 115971 | 201901201 |
| 120 <> 150 | 111708 | 131890800 | 111708 | 131890800 | 43709 | 18800647 | 67871 | 113082296 |
| 150 <> Dur | 123736 | 142639416 | 123736 | 142639416 | 50754 | 22053280 | 72786 | 120574520 |
Part 3: Use tr and sed to convert the records grep matched into a csv delimited file.
tr and sed are used for converting the lines grep matched into csv. tr does the bulk work of removing spaces and changing the "|" to ",". This is simpler and faster then using sed. However, sed is used for some cleanup work
tr -d " " | tr "|" "," | sed -E 's/<>/,/; s/(^,|,$)//g; s/Interval/Start,Stop/g'
Here is how these commands perform the conversion. The first trick is to get rid of all of the spaces. This means we dont have to account for them in any regex sequences, making the rest of the work simpler
| tr -d " " # Spaces are in the way, so delete them.
| tr "|" "," # Change all "|" Characters to ",".
| sed -E 's/<>/,/; # Change "<>" to "," splitting the Interval column.
s/(^,|,$)//g; # Delete leading and/or trailing "," on each line.
s/Interval/Start,Stop/g' # Each of the "Interval" columns needs a header, so change the text "Interval" into two words with a , separating them.
> somefile.csv # Pipe the output into somefile.csv
Final result
Once through this process, we have a csv output that can now be imported into your favorite csv tool, spreadsheet, or fed to a graphing program like gnuplot.
$cat somefile.csv
Start,Stop,Frames,Bytes,FRAMES,BYTES,FRAMES,BYTES,FRAMES,BYTES
0,30,107813,120111352,107813,120111352,26682,15294257,80994,104808983
30,60,122437,124508575,122437,124508575,49331,17080888,73017,107422509
60,90,138999,135488315,138999,135488315,54829,22130920,84029,113348686
90,120,158241,217781653,158241,217781653,42103,15870237,115971,201901201
120,150,111708,131890800,111708,131890800,43709,18800647,67871,113082296
150,Dur,123736,142639416,123736,142639416,50754,22053280,72786,120574520
I want print "/" separator inside output title.
curl -s http://cd0a4a.ethosdistro.com/?json=yes \
| jq -c '.rigs|."0d6b27",."50dc35"|[.version,.driver,.miner,"\(.gpus)\(.miner_instance)"]|#csv' \
| sed 's/\\//g;s/\"//g' \
| gawk 'BEGIN{print "version" "," "GPU_driver" "," "miner" "," "gpu"} {print $0}' \
| csvlook -I
The output is like this :
| version | GPU_driver | miner | gpu |
| ------- | ---------- | -------- | --- |
| 1.2.3 | nvidia | ethminer | 22 |
| 1.2.4 | amdgpu | ethminer | 11 |
But I want separator in between the numbers inside gpu title like this :
| version | GPU_driver | miner | gpu |
| ------- | ---------- | -------- | ---- |
| 1.2.3 | nvidia | ethminer | 2/2 |
| 1.2.4 | amdgpu | ethminer | 1/1 |
You're doing a lot of unnecessary calls just to process the data. Your commands could be drastically simplified.
You don't need to explicitly key into the .rigs object to get their values, you could just access them using [].
You don't need the sed call to strip the quotes, just use the raw output -r.
You don't need the awk call to add the header, you could just output an additional row from jq.
So your command turns into this instead:
$ curl -s http://cd0a4a.ethosdistro.com/?json=yes \
| jq -r '["version", "GPU_driver", "miner", "gpu"],
(.rigs[] | [.version, .driver, .miner, "\(.gpus)/\(.miner_instance)"])
| #csv' \
| csvlook -I
Since you already use string interpolation for that specific field, simply include the character you need (slash /) inside the string, like this:
curl ... | jq -c '... [.version,.driver,.miner,"\(.gpus)/\(.miner_instance)"] ...'
In your case (the complete line):
curl -s http://cd0a4a.ethosdistro.com/?json=yes | jq -c '.rigs|."0d6b27",."50dc35"|[.version,.driver,.miner,"\(.gpus)/\(.miner_instance)"]|#csv' | sed 's/\\//g;s/\"//g' | gawk 'BEGIN{print "version" "," "GPU_driver" "," "miner" "," "gpu"} {print $0}' | csvlook -I
Here are some suggestions for simplification:
use the --raw-output option to jq to remove extraneous back-slashes
there is no need to remove the quotes, csvlook does it for you
no need for awk to add a title line, use a sub-shell
no need to specify rigs implicitly, use .[]
Here is an example:
(
echo version,GPU_driver,miner,gpu
curl -s 'http://cd0a4a.ethosdistro.com/?json=yes' |
jq -r '
.rigs | .[] |
[ .version, .driver , .miner , "\(.gpus)/\(.miner_instance)" ] |
#csv
'
) |
csvlook
Output:
|----------+------------+----------+------|
| version | GPU_driver | miner | gpu |
|----------+------------+----------+------|
| 1.2.3 | nvidia | ethminer | 2/2 |
| 1.2.4 | amdgpu | ethminer | 1/1 |
|----------+------------+----------+------|
I want print "/" separator inside output title.
curl -s http://cd0a4a.ethosdistro.com/?json=yes \
| jq -c '.rigs|."0d6b27",."50dc35"|[.version,.driver,.miner,"\(.gpus)\(.miner_instance)"]|#csv' \
| sed 's/\\//g;s/\"//g' \
| gawk 'BEGIN{print "version" "," "GPU_driver" "," "miner" "," "gpu"} {print $0}' \
| csvlook -I
The output is like this :
| version | GPU_driver | miner | gpu |
| ------- | ---------- | -------- | --- |
| 1.2.3 | nvidia | ethminer | 22 |
| 1.2.4 | amdgpu | ethminer | 11 |
But I want separator in between the numbers inside gpu title like this :
| version | GPU_driver | miner | gpu |
| ------- | ---------- | -------- | ---- |
| 1.2.3 | nvidia | ethminer | 2/2 |
| 1.2.4 | amdgpu | ethminer | 1/1 |
You're doing a lot of unnecessary calls just to process the data. Your commands could be drastically simplified.
You don't need to explicitly key into the .rigs object to get their values, you could just access them using [].
You don't need the sed call to strip the quotes, just use the raw output -r.
You don't need the awk call to add the header, you could just output an additional row from jq.
So your command turns into this instead:
$ curl -s http://cd0a4a.ethosdistro.com/?json=yes \
| jq -r '["version", "GPU_driver", "miner", "gpu"],
(.rigs[] | [.version, .driver, .miner, "\(.gpus)/\(.miner_instance)"])
| #csv' \
| csvlook -I
Since you already use string interpolation for that specific field, simply include the character you need (slash /) inside the string, like this:
curl ... | jq -c '... [.version,.driver,.miner,"\(.gpus)/\(.miner_instance)"] ...'
In your case (the complete line):
curl -s http://cd0a4a.ethosdistro.com/?json=yes | jq -c '.rigs|."0d6b27",."50dc35"|[.version,.driver,.miner,"\(.gpus)/\(.miner_instance)"]|#csv' | sed 's/\\//g;s/\"//g' | gawk 'BEGIN{print "version" "," "GPU_driver" "," "miner" "," "gpu"} {print $0}' | csvlook -I
Here are some suggestions for simplification:
use the --raw-output option to jq to remove extraneous back-slashes
there is no need to remove the quotes, csvlook does it for you
no need for awk to add a title line, use a sub-shell
no need to specify rigs implicitly, use .[]
Here is an example:
(
echo version,GPU_driver,miner,gpu
curl -s 'http://cd0a4a.ethosdistro.com/?json=yes' |
jq -r '
.rigs | .[] |
[ .version, .driver , .miner , "\(.gpus)/\(.miner_instance)" ] |
#csv
'
) |
csvlook
Output:
|----------+------------+----------+------|
| version | GPU_driver | miner | gpu |
|----------+------------+----------+------|
| 1.2.3 | nvidia | ethminer | 2/2 |
| 1.2.4 | amdgpu | ethminer | 1/1 |
|----------+------------+----------+------|
I have an aws query that I want to filter in jq.
I want to filter all the imageTags that don't end with "latest"
So far I did this but it filters things containing "latest" while I want to filter things not containing "latest" (or not ending with "latest")
aws ecr describe-images --repository-name <repo> --output json | jq '.[]' | jq '.[]' | jq "select ((.imagePushedAt < 14893094695) and (.imageTags[] | contains(\"latest\")))"
Thanks
You can use not to reverse the logic
(.imageTags[] | contains(\"latest\") | not)
Also, I'd imagine you can simplify your pipeline into a single jq call.
All you have to do is | not within your jq
A useful example, in particular for mac brew users:
List all bottled formulae
by querying the JSON and parsing the output
brew info --json=v1 --installed | jq -r 'map(
select(.installed[].poured_from_bottle)|.name) | unique | .[]' | tr '\n' ' '
List all non-bottled formulae
by querying the JSON and parsing the output and using | not
brew info --json=v1 --installed | jq -r 'map(
select(.installed[].poured_from_bottle | not) | .name) | unique | .[]'
In this case contains() doesn't work properly, is better use the not of index() function
select(.imageTags | index("latest") | not)
This .[] | .[] can be shorten to .[][] e.g.,
$ jq --null-input '[[1,2],[3,4]] | .[] | .[]'
1
2
3
4
$ jq --null-input '[[1,2],[3,4]] | .[][]'
1
2
3
4
To check whether a string does not contain another string, you can combine contains and not e.g.,
$ jq --null-input '"foobar" | contains("foo") | not'
false
$ jq --null-input '"barbaz" | contains("foo") | not'
true
You can do something similar with an array of strings with either any or all e.g.,
$ jq --null-input '["foobar","barbaz"] | any(.[]; contains("foo"))'
true
$ jq --null-input '["foobar","barbaz"] | any(.[]; contains("qux"))'
false
$ jq --null-input '["foobar","barbaz"] | all(.[]; contains("ba"))'
true
$ jq --null-input '["foobar","barbaz"] | all(.[]; contains("qux"))'
false
Say you had file.json:
[ [["foo", "foo"],["foo", "bat"]]
, [["foo", "bar"],["foo", "bat"]]
, [["foo", "baz"],["foo", "bat"]]
]
And you only want to keep the nested arrays that don't have any strings with "ba":
$ jq --compact-output '.[][] | select(all(.[]; contains("bat") | not))' file.json
["foo","foo"]
["foo","bar"]
["foo","baz"]