Select objects based on value of variable in object using jq
That shows how to return values directly above the selection criteria but how would I get another object that was adjacent to a value above my selection criteria?
Given the data below, what jq invocation would return the French name of planets whose moon(s) have been spoiled? (this is a structural reproduction of the live data with which I am working -- which actually uses the word "value" in this way, so that's not helping)
{"kind":"solarsystem","name":"Sol",
"Planets": [
{ "kind":"habitable",
"names": { "english":"Earth","french":"Terre"},
"satellites" : [
{"name":"The Moon",
"parameters": [
{"name":"diameter", "intValue":"3476"},
{"name":"diameter_units", "value":"km"},
{"name":"unspoiled","value":"no"}]}]},
{"kind":"uninhabitable",
"names": {"english":"Mars","french":"Mars"},
"satellites" : [
{"name":"Phobos",
"parameters": [
{"name":"diameter", "intValue":"2200"},
{"name":"diameter_units", "value":"m"},
{"name":"unspoiled","value":"yes"}]},
{"name":"Deimos",
"parameters": [
{"name":"diameter", "intValue":"1200"},
{"name":"diameter_units", "value":"m"},
{"name":"unspoiled","value":"yes"}]}]}]}
The program below selects planets whose moons have all been spoiled. As each parameter is a name-value pair, we can use from_entries to transform the array of parameters into an object and retrieve the unspoiled status with just .unspoiled, and thus avoid another select to find the parameter we're interested in.
.Planets[] | select(.satellites | all(.parameters | from_entries .unspoiled == "no")) .names.french
If a single spoiled moon is enough, change all to any.
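For readers who want to check the selection logic outside jq, here is a minimal Python sketch of the same idea. It is not part of the original answer: the helper names `params_to_dict` and `french_names` are made up for illustration, and the data is a trimmed copy of the question's sample. It mirrors `from_entries` with a dict comprehension and switches between the built-in `all` and `any`.

```python
import json

# Trimmed copy of the question's sample data (assumption: only the
# "unspoiled" parameter and one diameter entry are kept for brevity).
DATA = json.loads("""
{"Planets": [
  {"names": {"english": "Earth", "french": "Terre"},
   "satellites": [
     {"name": "The Moon",
      "parameters": [{"name": "diameter", "intValue": "3476"},
                     {"name": "unspoiled", "value": "no"}]}]},
  {"names": {"english": "Mars", "french": "Mars"},
   "satellites": [
     {"name": "Phobos",
      "parameters": [{"name": "unspoiled", "value": "yes"}]},
     {"name": "Deimos",
      "parameters": [{"name": "unspoiled", "value": "yes"}]}]}]}
""")


def params_to_dict(parameters):
    """Rough analogue of jq's from_entries: name/value pairs become a dict.

    Entries without a "value" key (e.g. "intValue") map to None.
    """
    return {p["name"]: p.get("value") for p in parameters}


def french_names(data, quantifier=all):
    """French names of planets whose moons are spoiled.

    Pass quantifier=all (every moon) or quantifier=any (at least one moon).
    """
    return [
        planet["names"]["french"]
        for planet in data["Planets"]
        if quantifier(
            params_to_dict(moon["parameters"]).get("unspoiled") == "no"
            for moon in planet["satellites"]
        )
    ]


print(french_names(DATA))       # every moon spoiled
print(french_names(DATA, any))  # at least one moon spoiled
```

As with jq's `all`, a planet with no satellites at all would pass the `all` test, which may or may not be what you want.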
And here, also a solution for the same JSON query using an alternative tool (jtc):
In the simplest form, the following will do:
bash $ <file.json jtc -w'[value]:<no>:[-5][names][french]'
"Terre"
However, that solution returns the planet's French name once for each matching moon; e.g., for unspoiled moons it would give this:
bash $ <file.json jtc -w'[value]:<yes>:[-5][names][french]'
"Mars"
"Mars"
bash $
For the case when there are multiple moons but the name is required only once, strengthen the query like this (showcasing unspoiled moons here):
bash $ <file.json jtc -w'<satellites>l:[value]:<yes>[-5][names][french]'
"Mars"
bash $
PS. I'm a developer of the jtc Unix JSON processor.
PPS. the above disclaimer is required by SO.
Update:
The answer was updated, based on a discussion in the comments with @oguzismail, to tighten the structural relationship between the value and french labels so that other (irrelevant) value matches won't trigger false positives.
If, by chance, the structural relation [-5][names] is not enough, the query can be tightened further by inserting <unspoiled>[-1] before the [value]... lexeme.
Suppose I have the following json input to the jq command:
[
{"type":"dog", "set":"foo"},
{"type":"bird", "set":"bar"},
{"type":"cat", "set":"blaz"},
{"type":"fish", "set":"mor"}
]
I know that there is an element in this array whose type is "bird", in this case, the second element. But I want its next (or previous) sibling, that is, the element after (before) it, in this case, the third (first) element. How can I get it in jq?
Also, I have another question: if the matched element (that is, the element whose type value I know) is the last one in the array, I want to get the first one as its next sibling (that is, cycle through the array) instead of getting nothing. The same applies when the matched element is the first one (then I want to get the last element).
For the sake of specificity, let's suppose you want to extract the (before, focus, after) triple as an array, where before and after wrap around as described. For simplicity, let's also suppose the source array is of length at least 2.
Next, for ease of exposition and understanding, we will define a helper function for extracting the three items:
# $i is assumed to be a valid index into the input array,
# which is assumed to be of length at least 2
def triple($i):
if $i == 0 then [last] + .[0:2]
elif $i == (length-1) then .[$i-1: $i+2] + [first]
else .[$i-1: $i+2]
end;
Now we have only to find the index, and use it:
(map(.type) | index("bird")) as $i
| if $i then triple($i) else empty end
Using this approach, other variants can easily be obtained.
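As a cross-check of the wrap-around behaviour, here is a small Python sketch of the same triple extraction. It is an illustration rather than a translation of the jq program: it uses modular indexing instead of the three-way branch, and the names `triple` and `neighbours` are chosen here for the example.

```python
ANIMALS = [
    {"type": "dog", "set": "foo"},
    {"type": "bird", "set": "bar"},
    {"type": "cat", "set": "blaz"},
    {"type": "fish", "set": "mor"},
]


def triple(items, i):
    """Return [before, focus, after] for index i, wrapping at both ends."""
    n = len(items)
    # The modulo maps index -1 to the last element and index n to the first.
    return [items[(i - 1) % n], items[i], items[(i + 1) % n]]


def neighbours(items, wanted_type):
    """Triple around the first record of the given type, or None if absent."""
    for i, item in enumerate(items):
        if item["type"] == wanted_type:
            return triple(items, i)
    return None


print(neighbours(ANIMALS, "bird"))
print(neighbours(ANIMALS, "fish"))
```

Because of the modulo, this version also tolerates arrays shorter than 2, in which case the same element appears in more than one slot.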
Let me also offer an alternative solution based on jtc, a walk-path Unix tool for JSON, where you "encode" your query logic right into the path.
For example, to find a "type":"bird" record and then its preceding sibling (in the parent's array):
bash $ <file.json jtc -w'[type]:<bird> [-1]<idx>k [-1]>idx<t-1' -r
{ "set": "foo", "type": "dog" }
Let me break it down:
[type]:<bird> - will find recursively a record "type":"bird"
[-1]<idx>k - will step up 1 tier in JSON tree (select parent, effectively select the entire record {"type":"bird", "set":"bar"}) and will memorize its array index into the namespace idx
[-1]>idx<t-1 - will again step up 1 level in JSON (selecting the top array) and will search (non-recursively) for the entry with index (stored in idx) offset by -1
Equally, one can select the next sibling:
bash $ <file.json jtc -w'[type]:<bird>[-1]<idx>k[-1]>idx<t1'
{ "set": "blaz", "type": "cat" }
Or, select the first entry (based on the last match):
bash $ <file.json jtc -w'[type]:<fish>[-1]<idx>k[-1]>idx<t-1000' -r
{ "set": "foo", "type": "dog" }
(just put some surely low value as the relative quantifier - it'll get normalized to the first entry)
PS. Disclosure: I'm the creator of the jtc tool.
On the jq manual page there are a few examples of output formatting, particularly some shortcuts for when you want to just echo exactly what was in the input JSON.
What if I want to echo exactly what was in the input, but only for keys that match a certain pattern?
For example, given input like so ...
[
{"Name":"Widgets","Size":10,"SymUS":"Widg","SymCN":"Zyin","SymJP":"Kono"},
{"Name":"Blodgets","Size":400,"SymUS":"Blodg","SymAU":"Blod","SymJP":"Kado"},
{"Name":"Fonzes","Size":11,"SymRU":"Fyet","SymBR":"Foao"}
]
Say I want to select all objects where the Name ends in "ets" and then display the Name and all attributes of the form Sym*. All I know about those attributes is that there will be one or more per JSON object, and the names have the format Sym followed by a two-letter ISO country code.
I would like to just do this:
jq '.[] | select(.Name | endswith("ets")) | {Name, Sym*}'
but that's not a thing.
Is this just not something jq is designed to handle in a single operation? Should I do a first pass through the file to collect all the possible keys and then list them all explicitly via a slurpfile?
The key to a simple solution to the problem is to_entries, as described in the online manual. With your example data, the following filter produces the output shown below, in accordance with what I understand to be the expectations:
.[]
| select(.Name | test("ets$"))
| {Name} + (to_entries | map(select(.key|test("^Sym"))) | from_entries)
You might want to refine the regex tests, and/or make other minor adjustments.
Output:
{
"Name": "Widgets",
"SymUS": "Widg",
"SymCN": "Zyin",
"SymJP": "Kono"
}
{
"Name": "Blodgets",
"SymUS": "Blodg",
"SymAU": "Blod",
"SymJP": "Kado"
}
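The same "fixed key plus pattern-matched keys" idea can be sketched in Python, which may help when checking what the jq filter is expected to produce. This is only an illustration: the function name `name_and_symbols` and its `suffix` and `prefix` parameters are assumptions made for the example.

```python
ITEMS = [
    {"Name": "Widgets", "Size": 10, "SymUS": "Widg", "SymCN": "Zyin", "SymJP": "Kono"},
    {"Name": "Blodgets", "Size": 400, "SymUS": "Blodg", "SymAU": "Blod", "SymJP": "Kado"},
    {"Name": "Fonzes", "Size": 11, "SymRU": "Fyet", "SymBR": "Foao"},
]


def name_and_symbols(items, suffix="ets", prefix="Sym"):
    """Keep Name plus every key starting with prefix, for names ending in suffix."""
    return [
        {
            "Name": item["Name"],
            # Analogue of to_entries | map(select(.key | test("^Sym"))) | from_entries
            **{key: value for key, value in item.items() if key.startswith(prefix)},
        }
        for item in items
        if item["Name"].endswith(suffix)
    ]


for record in name_and_symbols(ITEMS):
    print(record)
```

A stricter key test, such as a regular expression for "Sym" followed by exactly two capital letters, could replace `startswith` if other keys might share the prefix.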
All my university notes are in JSON format and when I get a set of practical questions from a pdf it is formatted like this:
1. Download and compile the code. Run the example to get an understanding of how it works. (Note that both
threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this
is an interface issue, not of concern in this course.)
2. Explore the classes SumTask and StringTask as well as the abstract class Task.
3. Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is
called.
4. Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have
to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.)
Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger
than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer
for a discussion.
5. Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times,
but “pop()”s off only the first task in the queue and executes it.
6. Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as
the following to the SumTask class definition:
private static final String taskType = "SumTask";
Investigate what “static” and “final” mean.
7. More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they
implement this interface. Here’s an example interface:
What I would like to do is copy it into vim and execute a find and replace to convert it into this:
"1": {
"Task": "Download and compile the code. Run the example to get an understanding of how it works. (Note that both threads write to the standard output, and so there is some mixing up of the two conceptual streams, but this is an interface issue, not of concern in this course.)",
"Solution": ""
},
"2": {
"Task": "Explore the classes SumTask and StringTask as well as the abstract class Task.",
"Solution": ""
},
"3": {
"Task": "Modify StringTask.java so that it also writes out “Executing a StringTask task” when the execute() method is called.",
"Solution": ""
},
"4": {
"Task": "Create a new subclass of Task called ProdTask that prints out the product of a small array of int. (You will have to add another option in TaskGenerationThread.java to allow the user to generate a ProdTask for the queue.) Note: you might notice strange behaviour with a naïve implementation of this and an array of int that is larger than 7 items with numbers varying between 0 (inclusive) and 20 (exclusive); see ProdTask.java in the answer for a discussion.",
"Solution": ""
},
"5": {
"Task": "Play with the behaviour of the processing thread so that it polls more frequently and a larger number of times, but “pop()”s off only the first task in the queue and executes it.",
"Solution": ""
},
"6": {
"Task": "Remove the “taskType” member variable definition from the abstract Task class. Then add statements such as the following to the SumTask class definition: private static final String taskType = 'SumTask'; Investigate what “static” and “final” mean.",
"Solution": ""
},
"7": {
"Task": "More challenging: write an interface and modify the SumTask, StringTask and ProdTask classes so that they implement this interface. Here’s an example interface:",
"Solution": ""
}
After trying to figure this out during the practical (instead of actually doing the practical) this is the closest I got:
%s/\([1-9][1-9]*\)\. \(\_.\{-}\)--end--/"\1": {\r "Task": "\2",\r"Solution": "" \r},/g
The three problems with this are:
1. I have to add --end-- to the end of each question. I would like it to know when the question ends by looking ahead to a line which starts with [1-9][1-9]*. Unfortunately, when I search for that, it also replaces that part.
2. This keeps all the newlines within the question (which is invalid in JSON). I would like it to remove the newlines.
3. The last entry should not contain a "," after the input, because that would also be invalid JSON. (Note: I don't mind this very much, as it is easy to remove the last "," manually.)
Please keep in mind I am very bad at regular expressions and one of the reasons I am doing this is to learn more about regex so please explain any regex you post as a solution.
In two steps:
%s/\n/\ /g
to solve problem 2, and then:
%s/\([1-9][1-9]*\)\. \(\_.\{-}\([1-9][1-9]*\. \|\%$\)\@=\)/"\1": {\r "Task": "\2",\r"Solution": "" \r},\r/g
to solve problem 1.
You can solve problem 3 with another replace round. Also, my solution inserts an unwanted extra space at the end of the task entries. Try to remove it yourself.
Short explanation of what I have added:
\|: or;
\%$: end of file;
\@=: zero-width look-ahead; the preceding group must match at this point, but it is not included in the match.
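Since the question asks for the regex to be explained, it may help to see the look-ahead idea in another regex dialect. The sketch below uses Python's `re` module, where `(?=...)` plays the role of Vim's `\@=`, `.*?` with `re.S` plays the role of `\_.\{-}`, and `\Z` stands in for `\%$`. One deliberate difference, which is an assumption of this sketch and not part of the answer above: question numbers are only recognised at the start of a line (`^` with `re.M`). The sample text is shortened from the question.

```python
import re

# Shortened sample of the question's input.
TEXT = (
    "1. Download and compile the code. Run the example\n"
    "to get an understanding of how it works.\n"
    "2. Explore the classes SumTask and StringTask.\n"
    "3. Modify StringTask.java.\n"
)

# (\d+)\.        the question number followed by a literal dot and a space
# (.*?)          as little text as possible (re.S lets "." cross newlines)
# (?=^\d+\. |\Z) stop right before the next "N. " at a line start, or at the
#                end of the text; the look-ahead consumes nothing
PATTERN = re.compile(r"^(\d+)\. (.*?)(?=^\d+\. |\Z)", re.S | re.M)


def split_questions(text):
    """Return (number, task) pairs with inner newlines collapsed to spaces."""
    return [(num, " ".join(body.split())) for num, body in PATTERN.findall(text)]


for number, task in split_questions(TEXT):
    print(number, "->", task)
```

Because the look-ahead does not consume the next number, each question can be matched in turn without a marker such as --end--.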
If each item sits on a single line, I would transform the text with a macro; it is shorter and more straightforward than the :s command:
I"<esc>f.s": {<enter>"Task": "<esc>A"<enter>"Solution": ""<enter>},<esc>+
Record this macro in a register, such as q; then you can replay it with, for example, 100@q to do the transformation.
Note that
the result will leave a trailing comma , at the end; just remove it.
You can also add indentation during your macro recording, so that your JSON is "pretty printed". Or you can pretty-print it later with another tool.
You could probably do this with one large regular expression, but that quickly becomes unmaintainable. I would break the task up into 3 steps instead:
1. Separate each numbered step into its own paragraph.
2. Put each paragraph on its own line.
3. Generate the JSON.
Taken together:
%s/^[0-9]\+\./\r&/
%s/\(\S\)\n\(\S\)/\1 \2/
%s/^\([0-9]\+\)\. *\(.*\)/"\1": {\r "Task": "\2",\r "Solution": ""\r},/
This solution also leaves a comma after the last element. This can be removed with:
$s/,//
Explanation
%s/^[0-9]\+\./\r&/ this matches a line starting with a number followed by a dot, e.g. 1., 8., 13., 131., etc., and replaces it with a newline (\r) followed by the match (&).
%s/\(\S\)\n\(\S\)/\1 \2/ this removes any newline that is flanked by non-white-space characters (\S).
%s/^\([0-9]\+\)\. *\(.*\) ... capture the number and text in \1 and \2.
... /"\1": {\r "Task": "\2",\r "Solution": ""\r},/ format text appropriately.
Alternative way using sed, awk and jq
You can perform steps one and two from above straightforwardly with sed and awk:
sed 's/^[0-9]\+\./\n&/' infile
awk '$1=$1; { print "\n" }' RS= ORS=' '
Using jq for the third step ensures that the output is valid JSON:
jq -R 'match("([0-9]+)\\. *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'
Here as one command line:
sed 's/^[0-9]\+\./\n&/' infile |
awk '$1=$1; { print "\n" }' RS= ORS=' ' |
jq -R 'match("([0-9]+)\\. *(.*)") | .captures | {(.[0].string): { "Task": (.[1].string), "Solution": "" } }'
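For comparison, the same three steps (group lines by question number, join each group onto one line, emit JSON) can be sketched in a few lines of Python. This is an alternative illustration, not the method of the answer above; the function name `questions_to_json` is made up here. Letting a JSON library do the serialisation means that quotes inside the task text are escaped and no trailing comma is produced.

```python
import json
import re


def questions_to_json(text):
    """Turn "N. text" paragraphs into {"N": {"Task": ..., "Solution": ""}} JSON."""
    tasks = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"(\d+)\.\s*(.*)", line)
        if match:
            # Step 1: a line starting with "N." opens a new question.
            current = match.group(1)
            tasks[current] = [match.group(2)]
        elif current is not None and line.strip():
            # Step 2: continuation lines belong to the current question.
            tasks[current].append(line.strip())
    # Step 3: build the structure and let json.dumps handle all escaping.
    result = {
        number: {"Task": " ".join(part for part in parts if part), "Solution": ""}
        for number, parts in tasks.items()
    }
    return json.dumps(result, indent=2, ensure_ascii=False)


SAMPLE = '1. Say "hi" to the\nother thread.\n2. Explore the classes.\n'
print(questions_to_json(SAMPLE))
```

Like the Vim substitutions, this assumes that no continuation line happens to start with a number followed by a dot.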
Say I have the following JSON, stored in my variable jsonVariable.
{
"id": 1,
"details": {
"username": "jamesbrown",
"name": "James Brown"
}
}
I parse this JSON with jq using the following:
echo "$jsonVariable" | jq -r '.details.name | select(. == "James Brown")'
This would give me the output
James Brown
But what if I want to get the id of this person as well? Now, I'm aware this is a rough and simple example - the program I'm working with at the moment is 5 or 6 levels deep with many different JQ functions other than select. I need a way to select a parent's field when I am already 5 or 6 layers deep after carrying out various methods of filtering.
Can anyone help? Is there any way of 'going in reverse', back up to the parent? (Not sure if I'm making sense!)
For a more generic approach, save the value of the "parent" element at the detail level you want, then pipe it at the end of your filter:
jq '. as $parent | .details.name | select(. == "James Brown") | $parent'
Of course, for the trivial case you describe, you could omit this entirely:
jq 'select(.details.name == "James Brown")'
Also, consider that if your selecting filters return many matches for a single parent object, you will receive a copy of the parent object for each match. You may wish to make sure your select filters only return one element at the parent level by wrapping all matches below parent level into an array, or to deduplicate the final result with unique.
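The duplication caveat can be illustrated with a short Python sketch. It is not a translation of the jq filters above: the function names are invented for the example, and the `aliases` field is a hypothetical addition to the question's data, included only so that one parent contains two matching values. Testing each parent with `any` returns it at most once, however many nested values match.

```python
RECORDS = [
    {"id": 1, "details": {"username": "jamesbrown", "name": "James Brown",
                          "aliases": ["James Brown"]}},  # hypothetical field
    {"id": 2, "details": {"username": "nsimone", "name": "Nina Simone"}},
]


def leaf_values(node):
    """Yield every scalar value nested anywhere inside dicts and lists."""
    if isinstance(node, dict):
        for value in node.values():
            yield from leaf_values(value)
    elif isinstance(node, list):
        for value in node:
            yield from leaf_values(value)
    else:
        yield node


def parents_matching(records, predicate):
    """Return each top-level record once if any nested value satisfies predicate."""
    return [r for r in records if any(predicate(v) for v in leaf_values(r))]


matches = parents_matching(RECORDS, lambda value: value == "James Brown")
print([record["id"] for record in matches])
```

Iterating over the matches first and returning the parent for each one, as the `$parent` filter does, would instead yield record 1 twice for this data.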
Give this a shot:
echo "$jsonVariable" | jq '{Name: .details.name, Id: .id} | select(.Name == "James Brown")'
Rather than querying up to the value you're testing for, query up to the root object that contains the value you're querying on and the values you wish to select.
You need the object that contains both the id and the name.
$ jq --arg name 'James Brown' 'select(.details.name == $name).id' input.json
I have a JSON array like the one below, and I want to parse all of its data into a bash array,
so that I can access the first addressLineOne as ${bashaddr[0]}, and so on.
[
{
"id":"f0c546d5-0ce4-55ee-e043-516e0f0afdc1",
"cardType":"WMUSGESTORECARD",
"lastFour":"1682",
"cardExpiryDate":"2012-01-16",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Apt venue",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"534534",
"isDefault":false
},
{
"id":"f0c546d5-0ce0-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"2731",
"cardExpiryDate":"2009-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"119 maple ave.",
"addressLineTwo":"",
"city":"uncasville",
"state":"CT",
"postalCode":"06382",
"phone":"7676456",
"isDefault":false
},
{
"id":"f0c546d5-0ce2-55ee-e043-516e0f0afdc1",
"cardType":"MASTERCARD",
"lastFour":"6025",
"cardExpiryDate":"2011-08-31",
"firstName":"robert",
"lastName":"robishaw",
"addressLineOne":"Angeline Street",
"addressLineTwo":"",
"city":"oakdale",
"state":"CT",
"postalCode":"06370",
"phone":"7867876",
"isDefault":false
}
]
I have tried like this:
#!/bin/bash
addressLineOne="$(echo $card | jsawk 'return this.addressLineOne')"
but it gives me the entire address:
["address 1","address 2","address 3"]
Thank you.
I wrote the answer below before reading the comments, but it is exactly the same answer that @4ae1e1 provided, except that I don't use the -r flag, in case you want the values to remain quoted (e.g. when passing them as an argument somewhere else).
I know this is not jsawk, but do consider jq:
jq '.[].addressLineOne' yourfile.txt
And to access specific values you can put record number in the square brackets (starting with 0 for the first address and so on). For example to get the address for the third record:
jq '.[2].addressLineOne' yourfile.txt
For learning more about jq and advanced uses, check: http://jqplay.org
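If neither jsawk nor jq is available, the same extraction can be sketched with Python's standard `json` module. This is an aside rather than part of the answer above; the function name `address_lines` is made up, and the data is cut down to two fields per record from the question's sample.

```python
import json

# Cut-down copy of the question's data (assumption: other fields omitted).
CARDS_JSON = """
[
  {"cardType": "WMUSGESTORECARD", "addressLineOne": "Apt venue"},
  {"cardType": "MASTERCARD", "addressLineOne": "119 maple ave."},
  {"cardType": "MASTERCARD", "addressLineOne": "Angeline Street"}
]
"""


def address_lines(text, field="addressLineOne"):
    """Return the given field of every record, in input order."""
    return [card[field] for card in json.loads(text)]


lines = address_lines(CARDS_JSON)
print(lines[0])  # first address, like jq '.[0].addressLineOne' with -r
print(lines[2])  # third address
```

Printing one value per line is what makes the result easy to read into a bash array afterwards.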
What you need to do is make use of the -a switch to apply some post processing and filter the output array like this:
jsawk 'return this.addressLineOne' -a 'return this[0]'
From the documentation:
-b <script> | -a <script>
Run the specified snippet of JavaScript before (-b) or after (-a)
processing JSON input. The `this` object is set to the whole JSON
array or object. This is used to preprocess (-b) or postprocess
(-a) the JSON array before or after the main script is applied.
This option can be specified multiple times to define multiple
before/after scripts, which will be applied in the order they
appeared on the command line.