jq - sorting value by a related value - json

Basically I'm just trying to make a list of NCAA March Madness teams sorted by their respective seeds.
I'm using the JSON file from http://data.ncaa.com/jsonp/scoreboard/basketball-men/d1/2017/03/17/scoreboard.html. It's actually JSONP, but I convert it to JSON before parsing through it using:
jq -s -R '.[1+index("("): rindex(")")] | fromjson'
Piping that into the following command I can generate a nice list of the teams:
jq -r '.scoreboard[].games[] | select(.bracketRound=="First Round" and .bracketRegion=="EAST") | .home,.away | .nameRaw'
...but I want them to be in order of their seed. I've tried using sort and sort_by in various ways to no avail. I'm out of ideas.

Given your data, the following filter:
[ .scoreboard[].games[]
| select(.bracketRound=="First Round" and .bracketRegion=="EAST")
| (.home, .away) ]
| sort_by(.teamSeed | tonumber)
| .[]
| [.teamSeed, .nameRaw ]
produces:
["2","Duke"]
["3","Baylor"]
["6","SMU"]
["7","South Carolina"]
["10","Marquette"]
["11","USC"]
["14","New Mexico St."]
["15","Troy"]
If you just want the "nameRaw" values, replace the last line of the filter with: | .nameRaw
Note that tonumber is required here as the seed values are given as strings.
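To see why the conversion matters, here is a minimal sketch using made-up seed strings rather than the NCAA data. Sorting the strings directly compares them character by character, so "10" lands before "2", whereas sort_by(tonumber) orders them numerically.

```shell
# Hypothetical seed values, given as strings like the teamSeed field.
seeds='["10","2","3","15"]'

# Plain sort compares strings, so "10" and "15" come before "2".
as_strings=$(printf '%s' "$seeds" | jq -c 'sort')

# Converting each value with tonumber gives the numeric order.
as_numbers=$(printf '%s' "$seeds" | jq -c 'sort_by(tonumber)')

printf '%s\n%s\n' "$as_strings" "$as_numbers"
```

The first line printed is the string ordering and the second is the numeric one.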
Handling multiple top-level objects
In a comment, the OP gave a pastebin (https://pastebin.com/1eTAX4y3) consisting of two top-level objects each with a "scoreboard". Let us therefore consider the case of an arbitrary number of such objects.
For clarity, we begin by defining a function for selecting the home/away objects from a JSON object with "scoreboard":
def games:
[.scoreboard[].games[]
| select(.bracketRound=="First Round" and .bracketRegion=="EAST")
| (.home, .away) ] ;
Using the -s command-line option, we can ensure the JSON input is an array of objects. The arrays produced by games can be combined using add:
map(games)
| add
| sort_by(.teamSeed | tonumber)
| .[]
| [.teamSeed, .nameRaw ]
Given the pastebin data, invoking jq with the -s and -c command-line options produces:
["1","Villanova"]
["2","Duke"]
["3","Baylor"]
["4","Florida"]
["5","Virginia"]
["6","SMU"]
["7","South Carolina"]
["8","Wisconsin"]
["9","Virginia Tech"]
["10","Marquette"]
["11","USC"]
["12","UNCW"]
["13","East Tenn. St."]
["14","New Mexico St."]
["15","Troy"]
["16","Mt. St. Mary's"]
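The slurp-and-add approach can be tried without the pastebin. The sketch below uses a trimmed stand-in for the data: two top-level objects carrying only the fields the filter touches. Which team is listed as home or away is arbitrary here, and the WEST game with "Team A" and "Team B" is made up to show that select drops it.

```shell
# Two top-level objects in one stream (illustrative, heavily trimmed).
data='
{"scoreboard":[{"games":[{"bracketRound":"First Round","bracketRegion":"EAST","home":{"teamSeed":"2","nameRaw":"Duke"},"away":{"teamSeed":"15","nameRaw":"Troy"}}]}]}
{"scoreboard":[{"games":[{"bracketRound":"First Round","bracketRegion":"EAST","home":{"teamSeed":"8","nameRaw":"Wisconsin"},"away":{"teamSeed":"9","nameRaw":"Virginia Tech"}},{"bracketRound":"First Round","bracketRegion":"WEST","home":{"teamSeed":"1","nameRaw":"Team A"},"away":{"teamSeed":"16","nameRaw":"Team B"}}]}]}
'

# -s slurps both objects into one array; map(games) | add merges the
# per-object arrays before sorting by numeric seed.
sorted=$(printf '%s\n' "$data" | jq -s -c '
  def games:
    [ .scoreboard[].games[]
      | select(.bracketRound=="First Round" and .bracketRegion=="EAST")
      | (.home, .away) ];
  map(games)
  | add
  | sort_by(.teamSeed | tonumber)
  | .[]
  | [.teamSeed, .nameRaw]')

printf '%s\n' "$sorted"
```

Without -s, jq would run the filter once per top-level object and map(games) would fail, since each input would be an object rather than an array of objects.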

Does this do what you want?
jq -r '
def NameAndSeed(f): f | {nameRaw, "teamSeed" : (.teamSeed | tonumber)};
[
.scoreboard[].games[]
| select(.bracketRound=="First Round" and .bracketRegion=="EAST")
| NameAndSeed(.home), NameAndSeed(.away)
]
| sort_by(.teamSeed)
| .[].nameRaw'
To get sort_by to do what I think you want, I put the objects in an array and converted the teamSeed values to numbers.

Related

Mapping over a JSON array of objects and processing values using JQ

Just started playing around with jq and cannot for the life of me come to terms with how I should approach this in a cleaner way. I have some data from AWS SSM Parameter Store that I receive as JSON, that I want to process.
The data is structured in the following way
[
{
"Name": "/path/to/key_value",
"Value": "foo"
},
{
"Name": "/path/to/key_value_2",
"Value": "bar"
},
...
]
I want it output in the following way: key_value=foo key_value_2=bar. My first thought was to process it as follows: map([.Name | split("/") | last, .Value] | join("=")) | join(" ") but then I get the following error: jq: error (at <stdin>:9): Cannot index array with string "Value". It's as if the reference to the Value value is lost after piping the value for the Name parameter.
Of course I could just solve it like this, but it's just plain ugly: map([.Value, .Name | split("/") | last] | reverse | join("=")) | join(" "). How do I process the value for Name without losing reference to Value?
map((.Name | split("/") | last) + "=" + .Value) | join(" ")
Will output:
"key_value=foo key_value_2=bar"
The 'trick' is to wrap .Name | split("/") | last in parentheses so that .Value is still evaluated against the original object rather than against the array produced by split.
If you prefer string interpolation (\(...)) over concatenation with +, you can rewrite it as:
map("\(.Name | split("/") | last)=\(.Value)") | join(" ")
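Both forms can be checked from the shell. This is a small sketch that assumes the two-element sample array from the question (without the elided entries) and runs each filter with -r so the result prints without surrounding quotes.

```shell
# Sample input shaped like the SSM Parameter Store data in the question.
params='[{"Name":"/path/to/key_value","Value":"foo"},{"Name":"/path/to/key_value_2","Value":"bar"}]'

# Parenthesised pipeline plus string concatenation.
concat=$(printf '%s' "$params" |
  jq -r 'map((.Name | split("/") | last) + "=" + .Value) | join(" ")')

# The same result via string interpolation.
interp=$(printf '%s' "$params" |
  jq -r 'map("\(.Name | split("/") | last)=\(.Value)") | join(" ")')

printf '%s\n%s\n' "$concat" "$interp"
```

In both filters the pipeline on .Name is confined to its own parentheses (or interpolation), which is what keeps .Value bound to the object being mapped.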

jq string manipulation on domain names and dns records

I am attempting to learn some jq but am running into trouble.
I am working with a dataset of dns records like {"timestamp":"1592145252","name":"0.127.9.109.rev.sfr.net","type":"a","value":"109.9.127.0"}
I cannot figure out how to:
strip the subdomain details out of the name field (in this example I just want sfr.net)
print the name backwards, e.g. 0.127.9.109.rev.sfr.net would become ten.rfs.ver.901.9.721.0
My end goal is to print lines like this:
0.127.9.109.rev.sfr.net,ten.rfs.ver.901.9.721.0,a,sfr.net
Thanks SO!
To extract the "domain" part, you could use simple string manipulation methods to select it. Assuming anything after the .rev. part is the domain, you could do this:
split(".rev.")[1]
jq has no operation for reversing a string directly, but it does have a function for reversing arrays. So you can split the string into an array of characters, reverse that, then join it back together.
split("") | reverse | join("")
To put it all together for your input:
.name | [
.,
(split("") | reverse | join("")),
(split(".rev.")[1])
] | join(",")
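The combined filter above emits the name, its reverse, and the domain, but not the type field that the question's target line includes. One possible variant, sketched here against the sample record from the question, keeps the whole object as input so that .type is still reachable. It carries over the same assumption that the domain is whatever follows ".rev.".

```shell
# The sample record from the question.
record='{"timestamp":"1592145252","name":"0.127.9.109.rev.sfr.net","type":"a","value":"109.9.127.0"}'

line=$(printf '%s' "$record" | jq -r '
  [ .name,
    (.name | split("") | reverse | join("")),  # name reversed character by character
    .type,
    (.name | split(".rev.")[1])                # assumes ".rev." precedes the domain
  ] | join(",")')

printf '%s\n' "$line"
```

Names that do not contain ".rev." would yield null for the last field, so real data would likely need a more general rule for picking out the domain.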
Here's one approach using reverse and capture:
jq -r '
.type as $type
| .name
| "\(.),\(explode|reverse|implode),\($type),"
+ capture("(?<subdomain>[^.]+[.][^.]+)$").subdomain'
Like this :
$ jq -r '.name' file.json | grep -oE '\w+\.\w+$'
sfr.net
$ jq -r '.name' file.json | rev
ten.rfs.ver.901.9.721.0

Use jq to select specific item from array [duplicate]

I have this json string :
{
"head": {
"url": "foobar;myid=E50DAA932C22739F92BB250C14365440"
}
}
With jq on the shell I get the content of url as an array:
jq -r '.head.url | split(";")[] '
This returns:
foobar
myid=E50DAA932C22739F92BB250C14365440
My goal is to get the id (E50DA...) after = only. I could simply use [1] to get the second element and then use a regex to get the part after the =.
But the order of the elements is not guaranteed, and I'm sure there's a better way with jq that I don't know of. Maybe create a map of the elements and use myid as a key to get the value (E50...)?
Thank you for your input!
Do you have to do it with jq only? You could further process the output with grep and cut:
jq -r '.head.url | split(";")[]' | grep '^myid=' | cut -d= -f2
That said, it is easy to do in jq alone: first build an object from the key-value pairs, then look up the value for the key in question:
.head.url
| split(";")
| map(split("=") | { key: .[0], value: .[1] })
| from_entries
| .myid
equivalent to:
.head.url
| split(";")
| map(split("=") | { key: .[0], value: .[1] })
| from_entries["myid"]
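Because the lookup goes by key rather than by position, it should not matter where myid appears in the string. Here is a small sketch to check that. The first input is the one from the question; the second, with the pairs reordered and an extra=1 pair added, is made up for illustration.

```shell
# The object-building filter, stored once so it can be reused.
lookup='.head.url
  | split(";")
  | map(split("=") | { key: .[0], value: .[1] })
  | from_entries
  | .myid'

# Input from the question: myid is the second element.
first=$(printf '%s' '{"head":{"url":"foobar;myid=E50DAA932C22739F92BB250C14365440"}}' |
  jq -r "$lookup")

# Hypothetical reordered input: myid is now the first element.
second=$(printf '%s' '{"head":{"url":"myid=E50DAA932C22739F92BB250C14365440;extra=1;foobar"}}' |
  jq -r "$lookup")

printf '%s\n%s\n' "$first" "$second"
```

Elements without an "=" (such as foobar) simply become keys with a null value, so they do not interfere with the lookup.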
Or without building an object, simply by selecting the first array item with matching key, then outputting its value:
.head.url | split(";")[] | split("=") | select(first == "myid")[1]
NB. x | split(y) can be expressed as x/y, e.g. .head.url/";".
Using jq's match() with positive lookbehind to output what's after myid=:
$ jq -r '.head.url | split(";")[] | match("(?<=myid=).*";"g").string' file
E50DAA932C22739F92BB250C14365440
Or drop the split() and have match() capture everything after myid= up to the next ; or the end of the string:
$ jq -r '.head.url | match("(?<=myid=)[^;]*";"g").string' file
E50DAA932C22739F92BB250C14365440

Bash with JQ grouping

I have a file with a stream of JSON objects as follows:
{"id":4496,"status":"Analyze","severity":"Critical","severityCode":1,"state":"New","code":"RNPD.DEREF","title":"Suspicious dereference of pointer before NULL check","message":"Suspicious dereference of pointer \u0027peer-\u003esctSapCb\u0027 before NULL check at line 516","file":"/home/build/branches/mmm/file1","method":"CzUiCztGpReq","owner":"unowned","taxonomyName":"C and C++","dateOriginated":1473991086512,"url":"http://xxx/yyy","issueIds":[4494]}
{"id":4497,"status":"Analyze","severity":"Critical","severityCode":1,"state":"New","code":"NPD.GEN.CALL.MIGHT","title":"Null pointer may be passed to function that may dereference it","message":"Null pointer \u0027tmpEncodedPdu\u0027 that comes from line 346 may be passed to function and can be dereferenced there by passing argument 1 to function \u0027SCpyMsgMsgF\u0027 at line 537.","file":"/home/build/branches/mmm/file1","method":"CzUiCztGpReq","owner":"unowned","taxonomyName":"C and C++","dateOriginated":1473991086512,"url":"http://xxx/yyy/zzz","issueIds":[4495]}
{"id":4498,"status":"Analyze","severity":"Critical","severityCode":1,"state":"New","code":"NPD.GEN.CALL.MIGHT","title":"Null pointer may be passed to function that may dereference it","message":"Null pointer \u0027tmpEncodedPdu\u0027 that comes from line 346 may be passed to function and can be dereferenced there by passing argument 1 to function \u0027SCpyMsgMsgF\u0027 at line 537.","file":"/home/build/branches/mmm/otherfile.c","method":"CzUiCztGpReq","owner":"unowned","taxonomyName":"C and C++","dateOriginated":1473991086512,"url":"http://xxx/yyy/zzz","issueIds":[4495]}
I would like to get with JQ (or in some other way), three lines, one each for the ids, the URLs, and the file name:
This is what I have so far:
cat /tmp/file.json | ~/bin_compciv/jq --raw-output '.id,.url,.file'
Result:
4496
http://xxx/yyy
/home/build/branches/mmm/file1
.
.
.
BUT - I would like to group them by file name, so that I will get comma-separated lists of urls and ids on the same line, like this:
4496,4497
http://xxx/yyy,http://xxx/yyy/zzz
/home/build/branches/mmm/file1
With one minor exception, you can readily achieve the stated goals using jq as follows:
jq -scr 'map({id,url,file})
| group_by(.file)
| .[]
| ((map(.id) | @csv) , (map(.url) | @csv), (.[0] | .file))'
Given your input, the output would be:
4496,4497
"http://xxx/yyy","http://xxx/yyy/zzz"
/home/build/branches/mmm/file1
4498
"http://xxx/yyy/zzz"
/home/build/branches/mmm/otherfile.c
You could then eliminate the quotation marks using a text-editing tool such as sed; using another invocation of jq; or as described below. However, this might not be such a great idea if there's ever any chance that any of the URLs contains a comma.
Here's the filter for eliminating the quotation marks with just one invocation of jq:
map({id,url,file})
| group_by(.file)
| .[]
| ((map(.id) | @csv),
([map(.url) | join(",")] | @csv | .[1:-1]),
(.[0] | .file))
Here is a solution which uses group_by and the -r, -s jq options:
group_by(.file)[]
| ([ "\(.[].id)" ] | join(",")),
([ .[].url ] | join(",")),
.[0].file
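For a quick check of the grouping behaviour, here is a sketch on a cut-down stream. The three objects below are hypothetical and keep only the id, url and file fields. The ids are converted with tostring before join, which avoids relying on join accepting numbers.

```shell
# Hypothetical issue stream: two issues in a.c, one in b.c.
issues='
{"id":1,"url":"http://example.test/1","file":"a.c"}
{"id":2,"url":"http://example.test/2","file":"a.c"}
{"id":3,"url":"http://example.test/3","file":"b.c"}
'

# -s slurps the stream into an array; -r prints the joined strings raw.
grouped=$(printf '%s\n' "$issues" | jq -rs '
  group_by(.file)[]
  | ([ .[].id | tostring ] | join(",")),
    ([ .[].url ] | join(",")),
    .[0].file')

printf '%s\n' "$grouped"
```

Each group contributes three output lines: the ids, the URLs, and the shared file name.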

Convert JSON to CSV

I'm working on storing around 200 000 Json objects into a CSV file. But the problem is that any 2 JSON Objects might be different (having different key names).
I thought about creating a HashSet and traverse through all objects once so as to get column names for my CSV file. But this process is apparently taking too much time.
Is there another way to add columns to a CSV file dynamically?
One approach would be to use jq ("Json Query"):
def tocsv:
if length == 0 then empty
else
(.[0] | keys_unsorted) as $keys
| (map(keys) | add | unique) as $allkeys
| ($keys + ($allkeys - $keys)) as $cols
| ($cols, (.[] as $row | $cols | map($row[.])))
| @csv
end ;
tocsv
For example, assuming the above is in a file named json2csv.jq and that the input is in in.json:
jq -r -f json2csv.jq in.json
The above program constructs the header line by starting with the key names of the first object (in the order in which they appear there), and then extends the header line as required.
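As an illustration of how the header grows, here is a sketch that inlines the same program and feeds it a made-up two-object array in which the second object introduces a key ("c") that the first lacks. Missing values come out as empty CSV fields.

```shell
# Hypothetical input: the objects do not share the same keys.
rows='[{"a":1,"b":2},{"b":3,"c":"x"}]'

csv=$(printf '%s' "$rows" | jq -r '
  def tocsv:
    if length == 0 then empty
    else
      (.[0] | keys_unsorted) as $keys            # keys of the first object, in order
      | (map(keys) | add | unique) as $allkeys   # every key seen in any object
      | ($keys + ($allkeys - $keys)) as $cols    # first-object keys, then the rest
      | ($cols, (.[] as $row | $cols | map($row[.])))
      | @csv
    end;
  tocsv')

printf '%s\n' "$csv"
```

Note that the program expects the whole input as a single array, so a file holding a stream of separate objects would need jq's -s option to slurp them first.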
For more about jq, see https://stedolan.github.io/jq
Another approach would be to use in2csv, part of the csvkit toolkit -- see https://csvkit.readthedocs.org