jq: grab specific repeating key-value from first N elements - json

I am trying to parse https://api.weather.gov/gridpoints/PHI/47,91/forecast/hourly, with mild success.
{
"number": 1,
"name": "",
"startTime": "2020-12-16T13:00:00-05:00",
"endTime": "2020-12-16T14:00:00-05:00",
"isDaytime": true,
"temperature": 30,
"temperatureUnit": "F",
"temperatureTrend": null,
"windSpeed": "15 mph",
"windDirection": "NE",
"icon": "https://api.weather.gov/icons/land/day/snow,40?size=small",
"shortForecast": "Chance Light Snow",
"detailedForecast": ""
}
I started with jq '.properties.periods[0]' to grab the first element, worked with jq '.properties.periods[0].shortForecast' and I figured out that jq '.properties.periods[0,1,2,3]' gets me the first 4 elements in the array.
However, I run into a syntax error if I try jq '.properties.periods[:3]'
jq: error: syntax error, unexpected '['
which I thought would be a shorthand for 0-3.
Additionally, I only want (the same, repeating) specific K/V pairs from each element (e.g. shortForecast, temperature, etc.), but I have not been able to figure out how to combine it all into one jq statement.
So how do I grab specific values from the first X elements of an array? (I don't really need the keys, just the values.)
Bonus: would be great to have all values from each element on a single line.
Sample:
"2020-12-16T14:00:00-05:00" 30 "Chance Light Snow"
"2020-12-16T15:00:00-05:00" 30 "Snow"
"2020-12-16T16:00:00-05:00" 29 "Heavy Snow"

.properties.periods[:3] evaluates to a single array of the first three items, whereas .properties.periods[0,1,2] produces a stream of the three items themselves. So the abbreviation of the latter would be:
.properties.periods[:3][]
Selection
There are numerous possibilities, e.g. to get a specific set of key-value pairs on a single line:
jq -c '.properties.periods[:3][]
| {shortForecast, temperature}' input.json
To select just the values as CSV:
.properties.periods[:3][]
| {shortForecast, temperature}
| [.[]]
| @csv
You might like to use @tsv instead, or join(" "), or ....
Bonus
To get all the values in the order in which they are given, you could simply omit the selection line: | {....}
However, that would not be so robust. The following would be safer:
.properties.periods[:3]
| (.[0] | keys_unsorted) as $keys
| .[]
| [.[$keys[]]]
| @tsv
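A minimal sketch of the slice-then-iterate pattern, run against a small hypothetical stand-in for the API response (the four periods below are made up to mirror the sample in the question, not fetched from api.weather.gov). It assumes jq is on the PATH and picks the fields explicitly, so the column order does not depend on key order:

```shell
# Hypothetical stand-in for the API response; only the fields used below are included.
json='{"properties":{"periods":[
{"endTime":"2020-12-16T14:00:00-05:00","temperature":30,"shortForecast":"Chance Light Snow"},
{"endTime":"2020-12-16T15:00:00-05:00","temperature":30,"shortForecast":"Snow"},
{"endTime":"2020-12-16T16:00:00-05:00","temperature":29,"shortForecast":"Heavy Snow"},
{"endTime":"2020-12-16T17:00:00-05:00","temperature":28,"shortForecast":"Heavy Snow"}
]}}'

# [:3] slices the first three periods, [] turns the slice into a stream,
# and the array constructor fixes the column order before @tsv formats each row.
out=$(printf '%s' "$json" | jq -r '.properties.periods[:3][] | [.endTime, .temperature, .shortForecast] | @tsv')
printf '%s\n' "$out"
```

With -r, each selected period is printed as one tab-separated line; swapping @tsv for @csv or join(" ") changes only the separator and quoting.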

Related

How to sum up string fields that contain percentages with 'jq'?

I have a JSON file that's keeping track of column widths for a table as percentages. So the input file, columns.json, looks something like this:
[
{
"name": "Column A",
"width": "33%"
},
{
"name": "Column B",
"width": "33%"
},
{
"name": "Column C",
"width": "33%"
},
{
"name": "Column D",
"visible": false
}
]
Some columns are not displayed and therefore don't have widths (jq '.[].width' will return nulls for these), and then there's also the issue of the percent signs. Otherwise I might've used munge | munge | paste -sd+ | bc, which is usually what I use for summing things up in the shell, but that seems stupid here because jq ought to be able to do this by itself.
So using only jq, how can I sum up the width fields from this file, e.g., to make sure they don't exceed 100%?
Things I have tried (that didn't work)
I use select(.) here to filter out records that don't have a .width, then get rid of the percent sign:
jq '[.[].width | select(.) | sub("%"; "")] | add' columns.json
…but that just concatenates the strings and returns "333333".
I didn't see any mention of the word "typecast" in the jq man page, so I thought maybe it would do type inference, treating a string that looks like a number as a number in the right context:
jq '[.[].width | select(.) | sub("%"; "") | .+0] | add' columns.json
…but that just yields an error message like:
jq: error (at columns.json:18): string ("33") and number (0) cannot be added
A shorter alternative:
map(.width[:-1] | tonumber?) | add
This SO answer gave me the clue that there was a tostring function, so a more thorough search of the manual page revealed that the analogous function for numbers is tonumber.
Well, duh. I guess I was expecting it to be named something else, like toint, which is why I didn't find it while string-searching through the man page.
Here is the solution I ended up with:
jq 'map(.width | sub("%"; "")? | tonumber) | add' columns.json
Instead of select(.) to filter out the nulls from objects with no .width field, I just silently ignore the error in sub (with ?), which drops those records.
Note that map(.width) is just another way of saying [.[].width].
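A variant sketch that drops the missing widths explicitly instead of suppressing errors, so a malformed width (say, "wide") would still raise an error rather than silently vanish from the sum. It assumes jq 1.5 or later for the rtrimstr builtin; the inline JSON mirrors columns.json from the question:

```shell
# Same data as columns.json in the question, inlined for a self-contained run.
json='[{"name":"Column A","width":"33%"},{"name":"Column B","width":"33%"},{"name":"Column C","width":"33%"},{"name":"Column D","visible":false}]'

# Skip objects without a width, strip the trailing percent sign, convert, then sum.
total=$(printf '%s' "$json" | jq 'map(.width | select(. != null) | rtrimstr("%") | tonumber) | add')
printf 'total width: %s%%\n' "$total"
```

Appending `| . <= 100` to the filter and running jq with -e would turn the same expression into a pass/fail check via the exit status.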

Can I extract all key values inside second-level curly braces while ignoring values in the first-level of the curly braces

With jq I am trying to read/extract the values present in the second level of the curly braces (.phone, .termination, .duration) while ignoring the values present in the first level of the curly braces (12345, 67891 and 78912), which are already repeated inside the second-level braces. Can this be done?
{
"12345":[{
"phone": "12345",
"termination": "picked-up",
"duration": 5
}],
"67891":[{
"phone": "67891",
"termination": "picked-up",
"duration": 10
}],
"78912":[{
"phone": "78912",
"termination": "busy",
"duration": 0
}]
}
I've tried filtering values by defining the keyword filters that interest me, but I am clearly missing additional first steps. There are the square brackets to consider as well. Ty
cat test.json | jq [.phone, .termination, .duration] | less
I'd like to have a line containing comma separated values for the three filters described.
12345, picked-up, 5,
67891, picked-up, 10,
78912, busy, 0
If the innermost keys are always in the same order, you could use the filter:
.[] | [.[][]] | @csv
Otherwise:
.[][]
| [.phone, .termination, .duration]
| @csv
In either case, you'd probably want to use the -r command-line option to yield:
"12345","picked-up",5
"67891","picked-up",10
"78912","busy",0
Unquoted
If you want to suppress the quotation marks, you could use the following, again with the -r option:
.[] | [.[][]] | join(", ")
or:
.[][] | "\(.phone), \(.termination), \(.duration)"
With the -r command-line option, these last two yield:
12345, picked-up, 5
67891, picked-up, 10
78912, busy, 0

Select entries based on multiple values in jq

I'm working with JQ and I absolutely love it so far. I'm running into an issue I've yet to find a solution to anywhere else, though, and wanted to see if the community had a way to do this.
Let's presume we have a JSON file that looks like so:
{"author": "Gary", "text": "Blah"}
{"author": "Larry", "text": "More Blah"}
{"author": "Jerry", "text": "Yet more Blah"}
{"author": "Barry", "text": "Even more Blah"}
{"author": "Teri", "text": "Text on text on text"}
{"author": "Bob", "text": "Another thing to say"}
Now, we want to select rows where the value of author is equal to either "Gary" OR "Larry", but no other case. In reality, I have several thousand names I'm checking against, so simply stating the direct or conditional (e.g. cat blah.json | jq -r 'select(.author == "Gary" or .author == "Larry")') isn't sufficient. I'm trying to do this via the inside function like so, but I get an error:
cat blah.json | jq -r 'select(.author | inside(["Gary", "Larry"]))'
jq: error (at <stdin>:1): array (["Gary","La...) and string ("Gary") cannot have their containment checked
What would be the best method for doing something like this?
inside and contains are a bit weird. Here are some more straightforward solutions:
index/1
select( .author as $a | ["Gary", "Larry"] | index($a) )
any/2
["Gary", "Larry"] as $whitelist
| select( .author as $a | any( $whitelist[]; . == $a) )
Using a dictionary
If performance is an issue and if "author" is always a string, then a solution along the lines suggested by @JeffMercado should be considered. Here is a variant (to be used with the -n command-line option):
["Gary", "Larry"] as $whitelist
| ($whitelist | map( {(.): true} ) | add) as $dictionary
| inputs
| select($dictionary[.author])
IRC user gnomon answered this on the jq channel as follows:
jq 'select([.author] | inside(["Larry", "Garry", "Jerry"]))'
The intuition behind this approach, as stated by the user was: "Literally your idea, only wrapping .author as [.author] to coerce it into being a single-item array so inside() will work on it." This answer produces the desired result of filtering for a series of names provided in a list as the original question desired.
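One caveat worth checking before relying on the array-wrapping trick: inside and contains compare strings by substring, so a partial name also counts as a match. A quick check, assuming jq is on the PATH (the author value "Gar" is a made-up example, not from the question's data):

```shell
# contains/inside treat strings as substrings, so "Gar" is found "inside" "Gary".
loose=$(jq -cn '["Gar"] | inside(["Gary", "Larry"])')

# index/1 on an array compares whole elements, so the partial name yields null.
strict=$(jq -cn '["Gary", "Larry"] | index("Gar")')

printf 'inside: %s, index: %s\n' "$loose" "$strict"
```

If exact matching matters, the index/1, any/2 and dictionary approaches avoid this, since they compare complete strings.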
You can use objects as if they're sets to test for membership. Methods operating on arrays will be inefficient, especially if the array may be huge.
You can build up a set of values prior to reading your input, then use the set to filter your inputs.
$ jq -n --argjson names '["Larry","Garry","Jerry"]' '
(reduce $names[] as $name ({}; .[$name] = true)) as $set
| inputs | select($set[.author])
' blah.json

How to use JQ to unroll a list of objects into denormalized objects?

I have the following JSON lines example:
{"toplevel_key": "top value 1", "list": [{"key1": "value 1", "key2": "value 2"},{"key1": "value 3", "key2": "value 4"}]}
{"toplevel_key": "top value 2", "list": [{"key1": "value 5", "key2": "value 6"}]}
I want to convert it using JQ, unrolling the list to a fixed number of "columns", ending up with a list of flat JSON objects, with the following format:
{
"top-level-key": "top value 1",
"list_0_key1": "value 1",
"list_0_key2": "value 2",
"list_1_key1": "value 3",
"list_1_key2": "value 4"
}
{
"top-level-key": "top value 2",
"list_0_key1": "value 5",
"list_0_key2": "value 6",
"list_1_key1": "",
"list_1_key2": ""
}
Note: I actually want them one per line, formatted here for legibility.
The only way I was able to get the output I want was by writing out all the columns in my JQ expression:
$ cat example.jsonl | jq -c '{toplevel_key, list_0_key1: .list[0].key1, list_0_key2: .list[0].key2, list_1_key1: .list[1].key1, list_1_key2: .list[1].key2}'
This gets me the result that I want, but I have to write manually ALL the fixed "columns" (and in production it will be a lot more than that).
I know I could use a script to generate that JQ code, but I'm NOT interested in a solution like that -- it won't solve my problem, because this is for an application that accepts only JQ.
Is there a way to do it in pure JQ?
This is what I was able to get so far:
$ cat example.jsonl | jq -c '(.list | to_entries | map({("list_" + (.key | tostring)): .value})) | add'
{"list_0":{"key1":"value 1","key2":"value 2"},"list_1":{"key1":"value 3","key2":"value 4"}}
{"list_0":{"key1":"value 5","key2":"value 6"}}
As long as you know the names of the specific keys, Jeff's answer is great. Here's an answer that doesn't hardcode the specific key names, that is, it works with objects of any structure and levels of nesting:
[leaf_paths as $path | {
"key": $path | map(tostring) | join("_"),
"value": getpath($path)
}] | from_entries
An explanation: paths is a builtin function that outputs an array representing the position of each element of the input you pass to it, recursively: the elements in said array are the ordered key names and indexes that lead to the requested array element. leaf_paths is a version of it that only gets the paths to the "leaf" elements, that is, elements that do not contain other elements.
To clarify, given the input [[1, 2]], paths will output [0], [0, 0], [0, 1] (that is, the paths to [1, 2], 1 and 2, respectively) while leaf_paths will only output [0, 0], [0, 1].
That's the hardest part. After that, we get each of the paths as $path (of the form ["list", 1, "key2"]) convert each of its elements to its string representation with map(tostring) (which gives us ["list", "1", "key2"]) and join them with underscores. We keep this as the key of the "entry" in the object we want to create: as value, we get the value of the original object at the $path given.
Lastly, we use from_entries to turn an array of key-value pairs into a JSON object. This will give us an output similar to the one on Jeff's answer: that is, one in which only keys with values appear.
However, your original question requested values appearing on any of the input objects to appear in all of the outputs, with the corresponding values set to empty strings when missing on the input. Here's a jq program that does this: as Jeff says in his answer, you need to slurp (-s) all the input values for it to be possible:
(map(leaf_paths) | unique) as $paths |
map([$paths[] as $path | {
"key": $path | map(tostring) | join("_"),
"value": (getpath($path) // "")
}] | from_entries)[]
You'll notice that it's pretty similar to the first program: the main difference is that we get all unique paths in the slurped object as $paths, and for each object we go through those instead of going through the paths of that object. We also use the alternative operator (//) to set missing values to empty strings.
Hope this helps!
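For reference, a sketch of the padded program run on the two JSON lines from the question, assuming jq is on the PATH. The only changes from the program above are the added parentheses around the key expression and the -s/-c flags; note that unique sorts the paths, so the list_* keys come before toplevel_key:

```shell
# The two JSON lines from the question, fed to jq on stdin.
lines='{"toplevel_key": "top value 1", "list": [{"key1": "value 1", "key2": "value 2"},{"key1": "value 3", "key2": "value 4"}]}
{"toplevel_key": "top value 2", "list": [{"key1": "value 5", "key2": "value 6"}]}'

# -s slurps both lines into one array so every object sees the full set of paths;
# -c prints one flat object per line.
out=$(printf '%s\n' "$lines" | jq -s -c '
  (map(leaf_paths) | unique) as $paths
  | map([$paths[] as $path | {
      "key": ($path | map(tostring) | join("_")),
      "value": (getpath($path) // "")
    }] | from_entries)[]')
printf '%s\n' "$out"
```

The second output line carries empty strings for list_1_key1 and list_1_key2, which is the padding the question asked for.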
Here's how you can build that up:
{ "top-level-key": .toplevel_key } + ([
range(.list|length) as $i
| .list[$i]
| to_entries[]
| .key = "list_\($i)_\(.key)"
] | from_entries)
This produces one key per field of every list entry, so each input object yields:
{
"top-level-key": "top value 1",
"list_0_key1": "value 1",
"list_0_key2": "value 2",
"list_1_key1": "value 3",
"list_1_key2": "value 4"
}
{
"top-level-key": "top value 2",
"list_0_key1": "value 5",
"list_0_key2": "value 6"
}
If you need to pad it out, you'll have to slurp up the results to determine how much is actually needed and add the padding. But I'd leave it as this for now.
If you want to join the toplevel_key with the list as a string on separate lines, you can use the following:
jq -r '"\(.toplevel_key) - " as $i | [.list | to_entries[] | "\(.value | .key1), \(.value | .key2)"] | join(", ") as $j | $i + $j' toplevel.json
This will provide the below result:
top value 1 - value 1, value 2, value 3, value 4
top value 2 - value 5, value 6

How to convert arbitrary simple JSON to CSV using jq?

Using jq, how can arbitrary JSON encoding an array of shallow objects be converted to CSV?
There are plenty of Q&As on this site that cover specific data models which hard-code the fields, but answers to this question should work given any JSON, with the only restriction that it's an array of objects with scalar properties (no deep/complex/sub-objects, as flattening these is another question). The result should contain a header row giving the field names. Preference will be given to answers that preserve the field order of the first object, but it's not a requirement. Results may enclose all cells with double-quotes, or only enclose those that require quoting (e.g. 'a,b').
Examples
Input:
[
{"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
{"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
{"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
{"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]
Possible output:
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US
Possible output:
"code","name","level","country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
Input:
[
{"name": "bang", "value": "!", "level": 0},
{"name": "letters", "value": "a,b,c", "level": 0},
{"name": "letters", "value": "x,y,z", "level": 1},
{"name": "bang", "value": "\"!\"", "level": 1}
]
Possible output:
name,value,level
bang,!,0
letters,"a,b,c",0
letters,"x,y,z",1
bang,"""!""",1
Possible output:
"name","value","level"
"bang","!","0"
"letters","a,b,c","0"
"letters","x,y,z","1"
"bang","""!""","1"
First, obtain an array containing all the different object property names in your object array input. Those will be the columns of your CSV:
(map(keys) | add | unique) as $cols
Then, for each object in the object array input, map the column names you obtained to the corresponding properties in the object. Those will be the rows of your CSV.
map(. as $row | $cols | map($row[.])) as $rows
Finally, put the column names before the rows, as a header for the CSV, and pass the resulting row stream to the @csv filter.
$cols, $rows[] | @csv
All together now. Remember to use the -r flag to get the result as a raw string:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv'
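To see what this one-liner does with awkward values, here is a sketch that runs it on the second example input from the question (assuming jq is on the PATH). Two things to observe: keys returns sorted names, so the columns come out alphabetically, and @csv quotes strings (doubling embedded quotes) while leaving numbers bare:

```shell
# Second example input from the question, including a comma and embedded quotes.
json='[
{"name": "bang", "value": "!", "level": 0},
{"name": "letters", "value": "a,b,c", "level": 0},
{"name": "letters", "value": "x,y,z", "level": 1},
{"name": "bang", "value": "\"!\"", "level": 1}
]'

# Header row first, then one row per object, all formatted by @csv.
out=$(printf '%s' "$json" | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv')
printf '%s\n' "$out"
```

The header line is "level","name","value" rather than the original name,value,level order, which is the trade-off of collecting keys from every object.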
The Skinny
jq -r '(.[0] | keys_unsorted) as $keys | $keys, map([.[ $keys[] ]])[] | @csv'
or:
jq -r '(.[0] | keys_unsorted) as $keys | ([$keys] + map([.[ $keys[] ]])) [] | @csv'
The Details
Aside
Describing the details is tricky because jq is stream-oriented, meaning it operates on a sequence of JSON data, rather than a single value. The input JSON stream gets converted to some internal type which is passed through the filters, then encoded in an output stream at program's end. The internal type isn't modeled by JSON, and doesn't exist as a named type. It's most easily demonstrated by examining the output of a bare index (.[]) or the comma operator (examining it directly could be done with a debugger, but that would be in terms of jq's internal data types, rather than the conceptual data types behind JSON).
$ jq -c '.[]' <<<'["a", "b"]'
"a"
"b"
$ jq -cn '"a", "b"'
"a"
"b"
Note that the output isn't an array (which would be ["a", "b"]). Compact output (the -c option) shows that each array element (or operand of the , operator) becomes a separate value in the output (each is on a separate line).
A stream is like a JSON-seq, but uses newlines rather than RS as an output separator when encoded. Consequently, this internal type is referred to by the generic term "sequence" in this answer, with "stream" being reserved for the encoded input and output.
Constructing the Filter
The first object's keys can be extracted with:
.[0] | keys_unsorted
Keys will generally be kept in their original order, but preserving the exact order isn't guaranteed. Consequently, they will need to be used to index the objects to get the values in the same order. This will also prevent values being in the wrong columns if some objects have a different key order.
To both output the keys as the first row and make them available for indexing, they're stored in a variable. The next stage of the pipeline then references this variable and uses the comma operator to prepend the header to the output stream.
(.[0] | keys_unsorted) as $keys | $keys, ...
The expression after the comma is a little involved. The index operator on an object can take a sequence of strings (e.g. "name", "value"), returning a sequence of property values for those strings. $keys is an array, not a sequence, so [] is applied to convert it to a sequence,
$keys[]
which can then be passed to .[]
.[ $keys[] ]
This, too, produces a sequence, so the array constructor is used to convert it to an array.
[.[ $keys[] ]]
This expression is to be applied to a single object. map() is used to apply it to all objects in the outer array:
map([.[ $keys[] ]])
Lastly for this stage, this is converted to a sequence so each item becomes a separate row in the output.
map([.[ $keys[] ]])[]
Why bundle the sequence into an array within the map only to unbundle it outside? map produces an array; .[ $keys[] ] produces a sequence. Applying map to the sequence from .[ $keys[] ] would produce an array of sequences of values, but since sequences aren't a JSON type, you instead get a flattened array containing all the values.
["NSW","AU","state","New South Wales","AB","CA","province","Alberta","ABD","GB","council area","Aberdeenshire","AK","US","state","Alaska"]
The values from each object need to be kept separate, so that they become separate rows in the final output.
Finally, the sequence is passed through the @csv formatter.
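The difference the inner array constructor makes can be checked directly. A small sketch, assuming jq is on the PATH and using a two-object, two-key excerpt of the sample data with $keys hard-coded for brevity:

```shell
# Two-object excerpt of the sample data; $keys is hard-coded for illustration.
json='[{"code":"NSW","country":"AU"},{"code":"AB","country":"CA"}]'

# Without the inner array constructor, every value lands in one flat array.
flat=$(printf '%s' "$json" | jq -c '["code","country"] as $keys | map(.[ $keys[] ])')

# With it, each object keeps its own row.
rows=$(printf '%s' "$json" | jq -c '["code","country"] as $keys | map([.[ $keys[] ]])')

printf 'flat: %s\nrows: %s\n' "$flat" "$rows"
```

Only the second form can be unbundled with [] into one array per row, which is what the CSV formatter needs.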
Alternate
The items can be separated late, rather than early. Instead of using the comma operator to get a sequence (passing a sequence as the right operand), the header sequence ($keys) can be wrapped in an array, and + used to append the array of values. This still needs to be converted to a sequence before being passed to @csv.
The following filter is slightly different in that it will ensure every value is converted to a string. (jq 1.5+)
# For an array of many objects
jq -f filter.jq [file]
# For many objects (not within array)
jq -s -f filter.jq [file]
Filter: filter.jq
def tocsv:
(map(keys)
|add
|unique
|sort
) as $cols
|map(. as $row
|$cols
|map($row[.]|tostring)
) as $rows
|$cols,$rows[]
| @csv;
tocsv
$ cat test.json
[
{"code": "NSW", "name": "New South Wales", "level":"state", "country": "AU"},
{"code": "AB", "name": "Alberta", "level":"province", "country": "CA"},
{"code": "ABD", "name": "Aberdeenshire", "level":"council area", "country": "GB"},
{"code": "AK", "name": "Alaska", "level":"state", "country": "US"}
]
$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | @tsv ' test.json
Code Name Level Country
NSW New South Wales state AU
AB Alberta province CA
ABD Aberdeenshire council area GB
AK Alaska state US
$ jq -r '["Code", "Name", "Level", "Country"], (.[] | [.code, .name, .level, .country]) | @csv ' test.json
"Code","Name","Level","Country"
"NSW","New South Wales","state","AU"
"AB","Alberta","province","CA"
"ABD","Aberdeenshire","council area","GB"
"AK","Alaska","state","US"
I created a function that outputs an array of objects or arrays to csv with headers. The columns would be in the order of the headers.
def to_csv($headers):
def _object_to_csv:
($headers | @csv),
(.[] | [.[$headers[]]] | @csv);
def _array_to_csv:
($headers | @csv),
(.[][:$headers|length] | @csv);
if .[0]|type == "object"
then _object_to_csv
else _array_to_csv
end;
So you could use it like so:
to_csv([ "code", "name", "level", "country" ])
This variant of Santiago's program is also safe but ensures that the key names in
the first object are used as the first column headers, in the same order as they
appear in that object:
def tocsv:
if length == 0 then empty
else
(.[0] | keys_unsorted) as $firstkeys
| (map(keys) | add | unique) as $allkeys
| ($firstkeys + ($allkeys - $firstkeys)) as $cols
| ($cols, (.[] as $row | $cols | map($row[.])))
| @csv
end ;
tocsv
If you're open to using other Unix tools, csvkit has an in2csv tool:
in2csv example.json
Using your sample data:
> in2csv example.json
code,name,level,country
NSW,New South Wales,state,AU
AB,Alberta,province,CA
ABD,Aberdeenshire,council area,GB
AK,Alaska,state,US
I like the pipe approach for piping directly from jq:
cat example.json | in2csv -f json -
A simple way is to just use string concatenation. If your input is a proper array:
# filename.txt
[
{"field1":"value1", "field2":"value2"},
{"field1":"value1", "field2":"value2"},
{"field1":"value1", "field2":"value2"}
]
then index with .[]:
cat filename.txt | jq -r '.[] | .field1 + ", " + .field2'
or if it's just line by line objects:
# filename.txt
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
{"field1":"value1", "field2":"value2"}
just do this:
cat filename.txt | jq -r '.field1 + ", " + .field2'
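Plain concatenation is fine for simple values, but it does no CSV quoting, so a comma inside a field shifts the columns. A short comparison sketch, assuming jq is on the PATH (the field values "a,b" and "c" are made up for illustration):

```shell
# Hypothetical object whose first field contains a comma.
json='{"field1":"a,b","field2":"c"}'

# Concatenation emits the comma as-is, so the row now looks like three fields.
naive=$(printf '%s' "$json" | jq -r '.field1 + ", " + .field2')

# @csv quotes each string, keeping the embedded comma inside one field.
quoted=$(printf '%s' "$json" | jq -r '[.field1, .field2] | @csv')

printf 'naive:  %s\nquoted: %s\n' "$naive" "$quoted"
```

Concatenation with + also requires every operand to be a string, so numeric fields would need tostring first, whereas @csv accepts numbers directly.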