How to select items in jq based on a value in an array

I have a file with lines like this:
{"items":["blue","green"]}
{"items":["yellow","green"]}
{"items":["blue","pink"]}
How can I use jq to select and show only the JSON values that have "blue" in their "items" array?
So the output would be:
{"items":["blue","green"]}
{"items":["blue","pink"]}

Found out the answer
jq 'select(.items | index("blue"))'
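For example, with the sample lines saved in a file (called data.json here purely for illustration; -c keeps each output object on one line):
$ jq -c 'select(.items | index("blue"))' data.json
{"items":["blue","green"]}
{"items":["blue","pink"]}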

On Jan 30, 2017, a builtin named IN was added for efficiently testing whether a JSON entity is contained in a stream. It can also be used for efficiently testing membership in an array. In the present case, the relevant usage would be:
select( .items as $items | "blue" | IN($items[]) )
If your jq does not have IN/1, then so long as your jq has first/1, you can use this equivalent definition:
def IN(s): . as $in | first(if (s == $in) then true else empty end) // false;
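The definition can simply be pasted in front of the filter, e.g. (using the illustrative data.json again):
jq 'def IN(s): . as $in | first(if (s == $in) then true else empty end) // false;
    select(.items as $items | "blue" | IN($items[]))' data.json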
any/0
Using any/0 here is relatively inefficient, e.g. compared to using any/2:
select( any( .items[]; . == "blue" ))
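On the same sample file, this selects the same two objects:
$ jq -c 'select(any(.items[]; . == "blue"))' data.json
{"items":["blue","green"]}
{"items":["blue","pink"]}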
(In practice, index/1 is usually fast enough, but its implementation in jq 1.5, and in versions through at least July 2017, is suboptimal.)

While what you have certainly works, it would be more correct to use contains. I would avoid using index here since it can lead to confusion: in the first sample line, index("blue") is 0, which one wouldn't normally consider a truthy value, so one might expect that line to be excluded from the results. (It works only because in jq everything other than false and null is truthy.)
Consider using this filter instead:
select(.items | contains(["blue"]))
This has the added benefit that it would work if you wanted items with more than one match by simply adding more to the array.
As Will pointed out in the comments, this isn't quite correct. Strings are compared using substring matching here (contains is applied recursively), so for example an object with "items":["blueish"] would also be selected.
In retrospect, contains didn't work out as I thought it would. Using index works, but personally I wouldn't use it. There's something about determining whether an item is in a collection by looking for its index that feels wrong to me. Using contains makes more sense to me, but in light of this information, it isn't ideal in this case.
Here's an alternative that should work correctly:
select([.items[] == "blue"] | any)
Or for a more scalable way if you wanted to be able to match more values:
select(.items as $values | ["blue", "yellow"] | map([$values[] == .] | any) | all)
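For instance, requiring both "blue" and "green" (again with the illustrative data.json) keeps only the first sample line:
$ jq -c 'select(.items as $values | ["blue", "green"] | map([$values[] == .] | any) | all)' data.json
{"items":["blue","green"]}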

I needed to use a regex for a similar situation with these objects (in a different context, of course). I'm adding the code because I did not find a regex-based solution on this page; it may be useful to someone.
For example, to match the blue color using a regular expression:
jq 'select(.items[]|test("bl.*"))' yourfile.json
(A runnable demo is on jqplay.org.)

Related

Get key name based on subkey value

I'm trying to set up some monitoring. As part of that I need to parse some gnarly JSON output to retrieve a node ID, which changes each time the node is rebooted or the service restarts. I always know the node name but not the "id". The JSON looks something like this:
{
  "cluster_name": "cluster1",
  "nodes": {
    "generatednodeid1": {"name": "node01"},
    "generatednodeid2": {"name": "node2"}
  }
}
Doing .nodes | keys gives me ["generatednodeid1","generatednodeid2"] as I'd expect.
I've tried .nodes[] | select(.name=="node2") but that only outputs {"name":"node2"}
What I really need to happen is if .name=="node2" then it gives me generatednodeid2
I've been beating my head against a wall. I can't for the life of me figure out what I'm missing. This seems so simple (probably is and I've looked at it too long). Any ideas?
In this situation, the "to_entries" family of filters is helpful, e.g.:
.nodes
| to_entries[]
| select(.value.name == "node2")
| .key
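For example, with the JSON above saved as nodes.json (an illustrative name), using -r to emit the key as a raw string:
$ jq -r '.nodes | to_entries[] | select(.value.name == "node2") | .key' nodes.json
generatednodeid2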

jq: groupby and nested json arrays

Let's say I have: [[1,2], [3,9], [4,2], [], []]
I would like to know the scripts to get:
The number of nested lists which are/are not non-empty, i.e. I want to get: [3,2]
The number of nested lists which do/do not contain the number 3, i.e. I want to get: [1,4]
The number of nested lists for which the sum of the elements is/isn't less than 4, i.e. I want to get: [3,2]
i.e. basic examples of partitioning nested data.
Since stackoverflow.com is not a coding service, I'll confine this response to the first question, with the hope that it will convince you that learning jq is worth the effort.
Let's begin by refining the question about the counts of the lists
"which are/are not empty" to emphasize that the first number in the answer should correspond to the number of empty lists (2), and the second number to the rest (3). That is, the required answer should be [2,3].
Solution using built-in filters
The next step might be to ask whether group_by can be used. If the ordering did not matter, we could simply write:
group_by(length==0) | map(length)
This returns [3,2], which is not quite what we want. It's now worth checking the documentation about what group_by is supposed to do. On checking the details at https://stedolan.github.io/jq/manual/#Builtinoperatorsandfunctions,
we see that by design group_by does indeed sort by the grouping value.
Since in jq, false < true, we could fix our first attempt by writing:
group_by(length > 0) | map(length)
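You can see why this works by inspecting the groups themselves; since false sorts before true, the group of empty lists comes first:
$ echo '[[1,2], [3,9], [4,2], [], []]' | jq -c 'group_by(length > 0)'
[[[],[]],[[1,2],[3,9],[4,2]]]
$ echo '[[1,2], [3,9], [4,2], [], []]' | jq -c 'group_by(length > 0) | map(length)'
[2,3]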
That's nice, but since group_by is doing so much work when all we really need is a way to count, it's clear we should be able to come up with a more efficient (and hopefully less opaque) solution.
An efficient solution
At its core the problem boils down to counting, so let's define a generic tabulate filter for producing the counts of distinct string values. Here's a def that will suffice for present purposes:
# Produce a JSON object recording the counts of distinct
# values in the given stream, which is assumed to consist
# solely of strings.
def tabulate(stream):
  reduce stream as $s ({}; .[$s] += 1);
An efficient solution can now be written down in just two lines:
tabulate(.[] | length==0 | tostring )
| [.["true", "false"]]
QED
p.s.
The function named tabulate above is sometimes called bow (for "bag of words"). In some ways, that would be a better name, especially as it would make sense to reserve the name tabulate for similar functionality that would work for arbitrary streams.

Extract tokens from grammar

I have been working through the Advent of Code problems in Perl6 this year and was attempting to use a grammar to parse the Day 3's input.
Given input in this form: #1 @ 1,3: 4x4 and this grammar that I created:
grammar Claim {
    token TOP {
        '#' <id> \s* '@' \s* <coordinates> ':' \s* <dimensions>
    }
    token digits {
        <digit>+
    }
    token id {
        <digits>
    }
    token coordinates {
        <digits> ',' <digits>
    }
    token dimensions {
        <digits> 'x' <digits>
    }
}
say Claim.parse('#1 @ 1,3: 4x4');
I am interested in extracting the actual tokens that were matched i.e. id, x + y from coordinates, and height + width from the dimensions from the resulting parse. I understand that I can pull them from the resulting Match object of Claim.parse(<input>), but I have to dig down through each grammar production to get the value I need e.g.
say $match<id>.hash<digits>.<digit>;
this seems a little messy, is there a better way?
For the particular challenge you're solving, using a grammar is like using a sledgehammer to crack a nut.
Like @Scimon says, a single regex would be fine. You can keep it nicely readable by laying it out appropriately. You can name the captures and keep them all at the top level:
/ ^
  '#' $<id>=(\d+) ' '
  '@ ' $<x>=(\d+) ',' $<y>=(\d+)
  ': ' $<w>=(\d+) x $<d>=(\d+)
$ /;
say ~$<id x y w d>; # 1 1 3 4 4
(The prefix ~ calls .Str on the value on its right hand side. Called on a Match object it stringifies to the matched strings.)
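As a self-contained sketch of the above (the smartmatch against a variable is my own framing; ~~ sets $/, so the named captures are available afterwards):
my $input = '#1 @ 1,3: 4x4';
$input ~~ / ^
    '#' $<id>=(\d+) ' '
    '@ ' $<x>=(\d+) ',' $<y>=(\d+)
    ': ' $<w>=(\d+) x $<d>=(\d+)
$ /;
say ~$<id x y w d>; # 1 1 3 4 4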
With that out the way, your question remains perfectly cromulent as it is because it's important to know how P6 scales in this regard from simple regexes like the one above to the largest and most complex parsing tasks. So that's what the rest of this answer covers, using your example as the starting point.
Digging less messily
say $match<id>.hash<digits>.<digit>; # [「1」]
this seems a little messy, is there a better way?
Your say includes unnecessary code and output nesting. You could just simplify to something like:
say ~$match<id> # 1
Digging a little deeper less messily
I am interested in extracting the actual tokens that were matched i.e. id, x + y from coordinates, and height + width from the dimensions from the resulting parse.
For matches of multiple tokens you no longer have the luxury of relying on Perl 6 guessing which one you mean. (When there's only one, guess which one it guesses you mean. :))
One way to write your say to get the y coordinate:
say ~$match<coordinates><digits>[1] # 3
If you want to drop the <digits> you can mark which parts of a pattern should be stored in a list of numbered captures. One way to do so is to put parentheses around those parts:
token coordinates { (<digits>) ',' (<digits>) }
Now you've eliminated the need to mention <digits>:
say ~$match<coordinates>[1] # 3
You could also name the new parenthesized captures:
token coordinates { $<x>=(<digits>) ',' $<y>=(<digits>) }
say ~$match<coordinates><y> # 3
Pre-digging
I have to dig down through each grammar production to get the value I need
The above techniques still all dig down into the automatically generated parse tree which by default precisely corresponds to the tree implicit in the grammar's hierarchy of rule calls. The above techniques just make the way you dig into it seem a little shallower.
Another step is to do the digging work as part of the parsing process so that the say is simple.
You could inline some code right into the TOP token to store just the interesting data you've made. Just insert a {...} block in the appropriate spot (for this sort of thing that means the end of the token given that you need the token pattern to have already done its matching work):
my $made;
grammar Claim {
    token TOP {
        '#' <id> \s* '@' \s* <coordinates> ':' \s* <dimensions>
        { $made = ~($<id>, $<coordinates><x y>, $<dimensions><digits>[0,1]) }
    }
    ...
Now you can write just:
say $made # 1 1 3 4 4
This illustrates that you can just write arbitrary code at any point in any rule -- something that's not possible with most parsing formalisms and their related tools -- and the code can access the parse state as it is at that point.
Pre-digging less messily
Inlining code is quick and dirty. So is using a variable.
The normal thing to do for storing data is to instead use the make function. This hangs data off the match object that's being constructed corresponding to a given rule. This can then be retrieved using the .made method. So instead of $made = you'd have:
{ make ~($<id>, $<coordinates><x y>, $<dimensions><digits>[0,1]) }
And now you can write:
say $match.made # 1 1 3 4 4
That's much tidier. But there's more.
A sparse subtree of a parse tree
.oO ( 🎶 On the first day of an imagined 2019 Perl 6 Christmas Advent calendar 🎶 a StackOverflow title said to me ... )
In the above example I constructed a .made payload for just the TOP node. For larger grammars it's common to form a sparse subtree (a term I coined for this because I couldn't find a standard existing term).
This sparse subtree consists of the .made payload for the TOP that's a data structure referring to .made payloads of lower level rules which in turn refer to lower level rules and so on, skipping uninteresting intermediate rules.
The canonical use case for this is to form an Abstract Syntax Tree after parsing some programming code.
In fact there's an alias for .made, namely .ast:
say $match.ast # 1 1 3 4 4
While this is trivial to use, it's also fully general. P6 uses a P6 grammar to parse P6 code -- and then builds an AST using this mechanism.
Making it all elegant
For maintainability and reusability you can, and typically should, avoid inserting code inline at the end of rules and instead use Action objects.
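Here is a minimal sketch of what that might look like for this grammar (the class name and the payload's shape are my own choices; it assumes the coordinates token has the named x/y captures added earlier):
class ClaimActions {
    method TOP($/) {
        # Build the whole payload at the top from the named captures.
        make {
            id => +$<id>,
            x  => +$<coordinates><x>,
            y  => +$<coordinates><y>,
            w  => +$<dimensions><digits>[0],
            h  => +$<dimensions><digits>[1],
        };
    }
}
say Claim.parse('#1 @ 1,3: 4x4', :actions(ClaimActions.new)).made;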
In summary
There are a range of general mechanisms that scale from simple to complex scenarios and can be combined as best fits any given use case.
Add parentheses as I explained above, naming the capture that those parentheses zero in on, if that is a nice simplification for digging into the parse tree.
Inline any action you wish to take during parsing of a rule. You get full access to the parse state at that point. This is great for making it easy to extract just the data you want from a parse because you can use the make convenience function. And you can abstract all actions that are to be taken at the end of successfully matching rules out of a grammar, ensuring this is a clean solution code-wise and that a single grammar remains reusable for multiple actions.
One final thing. You may wish to prune the parse tree to omit unnecessary leaf detail (to reduce memory consumption and/or simplify parse tree displays). To do so, write <.foo>, with a dot preceding the rule name, to switch the default automatic capturing off for that rule.
You can refer to each of your named portions directly. So to get the coordinates you can access:
say $match.<coordinates>.<digits>
This will return the Array of digits matches. If you just want the values, the easiest way is probably:
say $match.<coordinates>.<digits>.map( *.Int )
or say $match.<coordinates>.<digits>>>.Int or even say $match.<coordinates>.<digits>».Int
to cast them to Ints.
For the id field it's even easier: you can just cast the <id> match to an Int:
say $match.<id>.Int

Powershell: json and dot including value variable access

I have a problem accessing JSON objects of predictable structure but unknown depth in PowerShell. The JSON objects contain information that can be connected by "and" and "or", and those connections can be nested over several levels. As an example:
$ab=#"
{
"cond": "one",
"and": [
{"cond": "two"},
{"cond": "three"},
{"or": [{"cond": "four"},
{"cond": "five"}
]
}
]
}
"# | ConvertFrom-Json
I need to be able to read/test something like
$test="and.or"
$ab.$test.cond
where $test is a combination of several "and"s and "or"s like and.or.or.and .
The problem is that I can't figure out how my idea of $ab.$test.cond is to be written in PowerShell to work. In theory I could test all possible combinations to a given depth by hand, but I'd prefer not to. Does anyone have an idea how this could work? Thanks a lot!
(PowerShell version 5)
I think you should define a proper set of classes for your conditional engine/descriptors, either using PowerShell classes or using C# to create an assembly so you can use the types within PowerShell.
But for a quick and dirty PowerShell solution, you could do this:
"`$ab.$test.cond" | Invoke-Expression
# or
'$ab.{0}.cond' -f $test | Invoke-Expression
This has no error checking of course. Any other solution is likely going to be a separate recursive function if you want real checking and such, but it will be more fragile than using a well-defined set of objects.
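As a sketch of that alternative, here is a small function (the name Get-ByPath is my own invention) that walks the dotted path segment by segment instead of building a string for Invoke-Expression:
function Get-ByPath {
    param($Object, [string]$Path)
    $current = $Object
    foreach ($segment in $Path -split '\.') {
        if ($null -eq $current) { return $null }   # bail out on a missing segment
        $current = $current.$segment               # member enumeration also walks arrays
    }
    $current
}

$test = 'and.or'
(Get-ByPath $ab $test).cond   # four, five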

What is the difference in purpose of TO and MAKE, and where are they documented?

I feel like I understand MAKE as being a constructor for a datatype. It takes two arguments... the first the target datatype, and the second a "spec".
In the case of objects it's fairly obvious that a block of Rebol data can be used as the "spec" to get back a value of type object!
>> foo: make object! [x: 10 y: 20 z: func [value] [print x + y + value] ]
== make object! [
x: 10
y: 20
]
>> print foo/x
10
>> foo/z 1
31
I know that if you pass an integer when you create a block, it will preallocate enough underlying memory to hold a block of that length, despite being empty:
>> foo: make block! 10
== []
That makes some sense. If you pass a string in, then you get the string parsed into Rebol tokens...
>> foo: make block! "some-set-word: {String in braces} some-word 12-Dec-2012"
== [some-set-word: "String in braces" some-word 12-Dec-2012]
Not all types are accepted, and again I'll say so far... so good.
>> foo: make block! 12-Dec-2012
** Script error: invalid argument: 12-Dec-2012
** Where: make
** Near: make block! 12-Dec-2012
By contrast, the TO operation is defined very similarly, except it is for "conversion" instead of "construction". It also takes a target type as a first parameter, and then a "spec". It acts differently on values:
>> foo: to block! 10
== [10]
>> foo: to block! 12-Dec-2012
== [12-Dec-2012]
That seems reasonable. If it received a non-series value, it wrapped it in a block. If you try an any-block! value with it, I'd imagine it would give you a block! series with the same values inside:
>> foo: to block! quote (a + b)
== [a + b]
So I'd expect a string to be wrapped in a block, but it just does the same thing MAKE does:
>> foo: to block! "some-set-word: {String in braces} some-word 12-Dec-2012"
== [some-set-word: "String in braces" some-word 12-Dec-2012]
Why is TO being so redundant with MAKE, and what is the logic behind their distinction? Passing integers into to block! gets the number inside a block (instead of having the special construction mode), and dates go into to block! making the date in a block instead of an error as with MAKE. So why wouldn't one want a to block! of a string to put that string inside a block?
Also: beyond reading the C sources for the interpreter, where is the comprehensive list of specs accepted by MAKE and TO for each target type?
MAKE is a constructor, TO is a converter. The reason that we have both is that for many types that operation is different. If they weren't different, we could get by with one operation.
MAKE takes a spec that is supposed to be a description of the value you're constructing. This is why you can pass MAKE a block and get values like objects or functions that aren't block-like at all. You can even pass an integer to MAKE and have it be treated like an allocation directive.
TO takes a value that is intended to be more directly converted to the target type (this value being called "spec" is just an unfortunate naming mishap). This is why the values in the input more directly correspond to the values in the output. Whenever there is a sensible default conversion, TO does it. That is why many types don't have TO conversions defined between them, the types are too different conceptually. We have fairly comprehensive conversions for some types where this is appropriate, such as to strings and blocks, but have carefully restricted some other conversions that are more useful to prohibit, such as from none to most types.
In some cases of simple types, there really isn't a complex way to describe the type. For them, it doesn't hurt to have the constructors just take self-describing values as their specs. Coincidentally, this ends up being the same behavior as TO for the same type and values. This doesn't hurt, so it's not useful to trigger an error in this case.
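A concrete way to see the constructor/converter distinction is with string! (behavior as in Rebol 2 and, to my knowledge, the same in Rebol 3):
>> make string! 10   ; the integer spec is an allocation directive
== ""
>> to string! 10     ; the integer value is converted to its text form
== "10"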
There are no comprehensive docs for the behavior of MAKE and TO because in Rebol 3 their behavior isn't completely finalized. There is still some debate in some cases about what the proper behavior should be. We're trying to make things more balanced, without losing any valuable functionality. We've already done a lot of work improving none and binary conversions, for instance. Once they are more finalized, and once we have a place to put them, we'll have more docs. In the meanwhile most of the Rebol 2 behavior is documented, and most of the changes so far for Rebol 3 are in CureCode.
Also: beyond reading the C sources for the interpreter, where is the comprehensive list of specs accepted by MAKE and TO for each target type?
These may not be that useful, since they are Red-specific:
comparison-matrix
conversion-matrix
But they do at least mention whether the behaviour is similar to or different from Rebol's.