I've been trying to use the jq parser to extract information from JSON files.
Here is an example snippet:
{
"main_attribute": {
"name": {
"display_name": "abc"
},
"address": {
"unit": "1",
"street": "Dundas",
"suburb": "Syd",
"state": "NSW"
},
"financial_debt": {
"bank_loan": true
}
},
"secondary_attr": {
"income": {
"pretax": 100000
},
"automobile": {
"make": "Citroen",
"model": 2015,
"new": true
},
"property": {
"property_owned": 1,
"owned_since": 2000,
"first_sale": true
},
"education": {
"degree": "MS",
"graduated": 1990,
"financial_debt": {
"bank_loan": false
}
}
}
}
I need to find the blocks where "financial_debt" is true. This field could be either in the main_attribute (as a global value) or in the secondary attribute.
Expected output:
financial_debt: bank_loan on "automobile" and "property"
Can you please advise how to go about doing this search using jq?
This is by no means the most efficient way, but it is functional. It returns a boolean value specifying whether or not there is a true boolean value under the financial_debt property.
jq '[recurse | .financial_debt? | select(. != null) | recurse | booleans] | any'
tostream can be used to find paths containing "financial_debt" as follows:
tostream
| select(length==2)
| select(.[0] | contains(["financial_debt"]))
with this filter in filter.jq and data in data.json
$ jq -M -c -f filter.jq data.json
produces
[["main_attribute","financial_debt","bank_loan"],true]
[["secondary_attr","education","financial_debt","bank_loan"],false]
This intermediate result can be used along with reduce, setpath, getpath and a filter such as
. as $d
| reduce ( tostream
| select(length==2)
| select(.[0] | contains(["financial_debt"]))) as [$p,$v] (
{}
; setpath($p[:-1]; $d | getpath($p[:-1]))
)
to produce
{
"main_attribute": {
"financial_debt": {
"bank_loan": true
}
},
"secondary_attr": {
"education": {
"financial_debt": {
"bank_loan": false
}
}
}
}
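If the goal is only to list where a true value sits under a financial_debt key, paths gives a shorter variant. This is a minimal sketch, not a drop-in answer: the inline sample is a trimmed copy of the question's data, and the dotted output format is my own choice. It compares path components exactly, whereas contains(["financial_debt"]) on an array of strings also matches substrings (for example a key named "financial_debt_total").

```shell
# Trimmed copy of the sample document from the question.
json='{"main_attribute":{"financial_debt":{"bank_loan":true}},"secondary_attr":{"automobile":{"new":true},"education":{"financial_debt":{"bank_loan":false}}}}'

# paths(f) emits the path of every value for which f holds; keep only the
# paths that pass through a key named exactly "financial_debt".
result=$(printf '%s' "$json" | jq -r '
  paths(. == true)
  | select(any(.[]; . == "financial_debt"))
  | join(".")
')

echo "$result"
```

With this sample only the main_attribute entry is printed, because automobile.new is true but has no financial_debt key on its path.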
Related
I'm trying to use JQ to create the following paths:
/staging-data-0/cassandra/cassandra-client-port
/staging-data-0/cassandra/cassandra-gossip-port
from the following blob of JSON (I've stripped unnecessary bits out):
{
"DebugConfig": {
"ServerPort": 8300,
"Services": [
{
"Checks": [
{
"CheckID": "cassandra-client-port",
"Timeout": "1s"
},
{
"CheckID": "cassandra-gossip-port",
"Timeout": "1s"
}
],
"Name": "cassandra"
},
{
"Checks": [
{
"CheckID": "cockroachdb-tcp",
"Timeout": "1s"
}
],
"Name": "cockroachdb"
}
]
},
"Member": {
"Name": "staging-data-0"
},
"Meta": {
"consul-network-segment": ""
}
}
I'm struggling to work out from the jq manual how to generate the paths. So far I can only pull out the last part, with
jq '.DebugConfig.Services | map(select(.Name=="cassandra")) | map(.Checks[].CheckID)'
The final path should be /{.Member.Name}/{.DebugConfig.Services.Name}/{.DebugConfig.Services.Checks.CheckID}
Only cassandra
jq -r '{a:.Member.Name, b:.DebugConfig.Services[]} | select(.b.Name=="cassandra") | {a:.a, b:.b.Name, c:.b.Checks[].CheckID} | [.a, .b, .c] | join("/")'
staging-data-0/cassandra/cassandra-client-port
staging-data-0/cassandra/cassandra-gossip-port
Both
jq -r '{a:.Member.Name, b:.DebugConfig.Services[]} | {a:.a, b:.b.Name, c:.b.Checks[].CheckID} | [.a, .b, .c] | join("/")'
staging-data-0/cassandra/cassandra-client-port
staging-data-0/cassandra/cassandra-gossip-port
staging-data-0/cockroachdb/cockroachdb-tcp
With your input, the jq filter:
.DebugConfig.Services[] as $s
| "/\(.Member.Name)/\($s.Name)/\($s.Checks[].CheckID)"
produces:
"/staging-data-0/cassandra/cassandra-client-port"
"/staging-data-0/cassandra/cassandra-gossip-port"
"/staging-data-0/cockroachdb/cockroachdb-tcp"
Since you only want the "cassandra" strings, you just need to interject a "select" filter:
.DebugConfig.Services[] as $s
| "/\(.Member.Name)/\($s.Name)/" +
($s
| select(.Name == "cassandra")
| .Checks[].CheckID)
but it's worth noting how easy it is to process all the "Checks" items.
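To avoid hard-coding the service name, the same filter can take it from the shell with --arg. This is a sketch under assumptions: the variable names svc and member are mine, and the inline JSON is a trimmed copy of the question's input.

```shell
json='{"DebugConfig":{"Services":[{"Checks":[{"CheckID":"cassandra-client-port"},{"CheckID":"cassandra-gossip-port"}],"Name":"cassandra"},{"Checks":[{"CheckID":"cockroachdb-tcp"}],"Name":"cockroachdb"}]},"Member":{"Name":"staging-data-0"}}'

# --arg binds the shell value to the jq variable $svc (always as a string).
check_paths=$(printf '%s' "$json" | jq -r --arg svc cassandra '
  .Member.Name as $member
  | .DebugConfig.Services[]
  | select(.Name == $svc)
  | "/\($member)/\(.Name)/\(.Checks[].CheckID)"
')

echo "$check_paths"
```

Because .Checks[].CheckID is a generator inside the string interpolation, one path is emitted per check.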
I have this output from a pipe
{
"pipelineName": "pipelineName-AAAAQ6UFM",
"pipelineVersion": 2,
"stageStates": [
{
"stageName": "Approval",
"inboundTransitionState": {
"enabled": true
},
"actionStates": [
{
"actionName": "Approval",
"latestExecution": {
"status": "InProgress",
"token": "aaaa-aaaa-4316-a95f-2efc51d05761"
}
}
],
"latestExecution": {
"pipelineExecutionId": "fc73f4cb-c5a9-44a8-8fc1-d7e50259f485",
"status": "InProgress"
}
}
]
}
I am trying to produce JSON like this:
{
"pipelineName": "pipelineName-AAAAQ6UFM",
"stageName": "Approval",
"actionName": "Approval",
"token": "aaaa-aaaa-4316-a95f-2efc51d05761",
"result": {
"status": "Approved",
"summary": ""
}
}
I could maybe set two variables from the pipe output with the read command, but I don't know how to set both of them.
token
jq -r ' .stageStates[] | select(.stageName == "Approval") | .actionStates[0].latestExecution.token'
pipelineName
jq -r '.pipelineName'
Then I might be able to write the json with the jq command.
What would be the best way to do this?
Based on your select(.stageName == "Approval"), it would appear that you are attempting to parameterize by the "stageName", so the following might be close to what you're looking for:
"Approval" as $stage
| { pipelineName, stageName: $stage, actionName: $stage }
+ (.stageStates[]
| select(.stageName == $stage).actionStates[]
| select(.actionName == $stage)
| {token: .latestExecution.token, result: {status: "Approved", summary: ""}})
You can use just jq to create the json:
jq ' .stageName = .stageStates[0].stageName
| .actionName = .stageStates[0].actionStates[0].actionName
| .token = .stageStates[0].actionStates[0].latestExecution.token
| .result = { "status": "Approved", "summary": "" }
| del(.stageStates, .pipelineVersion)
' file.json
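If you do want the two values in shell variables first, as the question suggests, one jq call can emit both on a tab-separated line for read to split. This is a rough sketch that assumes neither value contains whitespace. The variable names are mine, and the inline JSON is a trimmed copy of the pipe output.

```shell
json='{"pipelineName":"pipelineName-AAAAQ6UFM","pipelineVersion":2,"stageStates":[{"stageName":"Approval","actionStates":[{"actionName":"Approval","latestExecution":{"status":"InProgress","token":"aaaa-aaaa-4316-a95f-2efc51d05761"}}]}]}'

# Emit both values on one tab-separated line.
line=$(printf '%s' "$json" | jq -r '
  [ .pipelineName,
    (.stageStates[] | select(.stageName == "Approval")
     | .actionStates[0].latestExecution.token) ]
  | @tsv')

# read splits the line on whitespace (the tab) into the two variables.
read -r pipeline_name token <<EOF
$line
EOF

# Rebuild a JSON document from the shell variables; --arg handles quoting.
approval=$(jq -n -c --arg name "$pipeline_name" --arg token "$token" '
  { pipelineName: $name, stageName: "Approval", actionName: "Approval",
    token: $token, result: { status: "Approved", summary: "" } }')

echo "$approval"
```

Doing everything in a single jq filter, as in the answers above, avoids the round trip through the shell. The variables are mainly useful when other commands need them too.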
I am trying to get the "en" values from a JSON structure using jq on the Linux command line.
find . -name "*.json" -exec jq -r \
'(input_filename | gsub("^\\./|\\.json$";"")) as $fname | (map(.tags) | .[] | .[] | .tag.en ) as $tags | "\($fname)&\($tags)"' '{}' +
I have more than 5000 files, named 0001.json, 0002.json, ... 5000.json.
This is a sample file, 0001.json:
{
"result": {
"tags": [
{ "confidence": 100, "tag": { "en": "turbine" } },
{ "confidence": 64.8014373779297, "tag": { "en": "wind" } },
{ "confidence": 63.3033409118652, "tag": { "en": "generator" } },
{ "confidence": 7.27894926071167, "tag": { "en": "device" } },
{ "confidence": 7.01708889007568, "tag": { "en": "line" } }
]
},
"status": { "text": "", "type": "success" }
}
I get this result:
0001&turbine
0001&wind
0001&generator
0001&device
0001&line
jq: error (at ./0001.json:0): Cannot iterate over null (null)
Output..
jq: error (at ./0002.json:0): Cannot iterate over null (null)
Output..
jq: error (at ./0003.json:0): Cannot iterate over null (null)
My desired output, with the results from all the JSON files collected in one file:
filename&enValue:confidenceValue
0001&turbine:100,wind:64,generator:63,device:7,line:7
0002&...
0003&...
0004&...
The jq filter you want can be written as follows:
(input_filename | gsub("^\\./|\\.json$";"")) as $fname
| ( [ .result.tags[] | [.tag.en, (.confidence | floor)] | join(":") ]
| join(",") ) as $tags
| "\($fname)&\($tags)"
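Here is one way the filter above could be run over a directory of files. Treat it as a sketch: the temporary directory, the file name filter.jq and the second sample file are made up for illustration, and the first file is a shortened copy of 0001.json.

```shell
tmp=$(mktemp -d)

# The filter from the answer, saved to a file.
cat > "$tmp/filter.jq" <<'EOF'
(input_filename | gsub("^\\./|\\.json$";"")) as $fname
| ( [ .result.tags[] | [.tag.en, (.confidence | floor)] | join(":") ]
    | join(",") ) as $tags
| "\($fname)&\($tags)"
EOF

# Two small stand-in files; the second one is invented for illustration.
cat > "$tmp/0001.json" <<'EOF'
{"result":{"tags":[{"confidence":100,"tag":{"en":"turbine"}},{"confidence":64.8014373779297,"tag":{"en":"wind"}}]}}
EOF
cat > "$tmp/0002.json" <<'EOF'
{"result":{"tags":[{"confidence":12.5,"tag":{"en":"sky"}}]}}
EOF

# find hands the files to jq in batches; sort makes the order predictable.
result=$(cd "$tmp" && find . -name '*.json' -exec jq -r -f filter.jq {} + | sort)

echo "$result"
rm -rf "$tmp"
```

The original attempt failed because map(.tags) looks for tags directly under the top-level keys, so it hits null for "status". Going through .result.tags[] avoids that. To collect everything in one file, redirect the output of the find command.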
I have an API that returns JSON - big blocks of it. Some of the key value pairs have more blocks of JSON as the value associated with a key. jq does a great job of parsing the main JSON levels. But I can't find a way to get it to 'recurse' into the values associated with the keys and pretty print them as well.
Here is the start of one of the JSON returns. Note it is only a small percent of the full return:
{
"code": 200,
"status": "OK",
"data": {
"PlayFabId": "xxxxxxx",
"InfoResultPayload": {
"AccountInfo": {
"PlayFabId": "xxxxxxxx",
"Created": "2018-03-22T19:23:29.018Z",
"TitleInfo": {
"Origination": "IOS",
"Created": "2018-03-22T19:23:29.033Z",
"LastLogin": "2018-03-22T19:23:29.033Z",
"FirstLogin": "2018-03-22T19:23:29.033Z",
"isBanned": false
},
"PrivateInfo": {},
"IosDeviceInfo": {
"IosDeviceId": "xxxxxxxxx"
}
},
"UserVirtualCurrency": {
"GT": 10,
"MB": 70
},
"UserVirtualCurrencyRechargeTimes": {},
"UserData": {},
"UserDataVersion": 15,
"UserReadOnlyData": {
"DataVersion": {
"Value": "6",
"LastUpdated": "2018-03-22T19:48:59.543Z",
"Permission": "Public"
},
"achievements": {
"Value": "[{\"id\":0,\"gamePack\":\"GAME.PACK.0.KK\",\"marblesAmount\":50,\"achievements\":[{\"id\":2,\"name\":\"Correct Round 4\",\"description\":\"Round 4 answered correctly\",\"maxValue\":10,\"increment\":1,\"currentValue\":3,\"valueUnit\":\"unit\",\"awardOnIncrement\":true,\"marbles\":10,\"image\":\"https://www.jamandcandy.com/kissinkuzzins/achievements/icons/sphinx\",\"SuccessKey\":[\"0_3_4_0\",\"0_5_4_0\",\"0_6_4_0\",\"0_7_4_0\",\"0_8_4_0\",\"0_9_4_0\",\"0_10_4_0\"],\"event\":\"Player_answered_round\",\"achieved\":false},{\"id\":0,\"name\":\"Complete
This was parsed using jq but as you can see when you get to the
"achievements": { "Vales": "[{\"id\":0,\"gamePack\":\"GAME.PACK.0.KK\",\"marblesAmount\":50,\
jq does not parse the value any further, even though it is also JSON.
Is there a filter I am missing to get it to parse the values as well as the higher level structure?
Is there a filter I am missing ...?
The filter you'll need is fromjson, but it should only be applied to the stringified JSON; consider therefore using |= as illustrated using your fragment:
echo '{"achievements": { "Vales": "[{\"id\":0,\"gamePack\":\"GAME.PACK.0.KK\",\"marblesAmount\":50}]"}}' |
jq '.achievements.Vales |= fromjson'
{
"achievements": {
"Vales": [
{
"id": 0,
"gamePack": "GAME.PACK.0.KK",
"marblesAmount": 50
}
]
}
}
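Applied to the structure in the question, the update would need the full path to the string. The following is a sketch on a cut-down copy of the response: it assumes the key is "Value", as in the posted JSON, and that the string under it is complete, valid JSON.

```shell
json='{"data":{"InfoResultPayload":{"UserReadOnlyData":{"achievements":{"Value":"[{\"id\":0,\"gamePack\":\"GAME.PACK.0.KK\",\"marblesAmount\":50}]"}}}}}'

# |= replaces the string at that path with the result of parsing it.
parsed=$(printf '%s' "$json" | jq -c '
  .data.InfoResultPayload.UserReadOnlyData.achievements.Value |= fromjson
')

echo "$parsed"
```

If the string is not valid JSON (a truncated response, for instance), fromjson raises an error instead of leaving the value alone.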
recursively/1
If you want to apply fromjson recursively wherever possible, then recursively is your friend:
def recursively(f):
. as $in
| if type == "object" then
reduce keys[] as $key
( {}; . + { ($key): ($in[$key] | recursively(f) )} )
elif type == "array" then map( recursively(f) )
else try (f as $f | if $f == . then . else ($f | recursively(f)) end) catch $in
end;
This would be applied as follows:
recursively(fromjson)
Example
{a: ({b: "xyzzy"}) | tojson} | tojson
| recursively(fromjson)
yields:
{
"a": {
"b": "xyzzy"
}
}
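One thing to be aware of before applying this to the whole response: every string that happens to be valid JSON gets decoded, not just the embedded arrays. A small sketch on a cut-down fragment of the question's data (the fragment is shortened, otherwise nothing is assumed beyond the definition above):

```shell
json='{"DataVersion":{"Value":"6"},"achievements":{"Value":"[{\"id\":0,\"marblesAmount\":50}]"}}'

decoded=$(printf '%s' "$json" | jq -c '
  def recursively(f):
    . as $in
    | if type == "object" then
        reduce keys[] as $key
          ( {}; . + { ($key): ($in[$key] | recursively(f)) } )
      elif type == "array" then map( recursively(f) )
      else try (f as $f | if $f == . then . else ($f | recursively(f)) end) catch $in
      end;
  recursively(fromjson)
')

echo "$decoded"
```

Here the achievements string becomes an array as intended, but the DataVersion value "6" also turns into the number 6. If that matters, the targeted |= form is the safer choice.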
(It was hard to come up with a title that summarizes the issue, so feel free to improve it).
I have a JSON file with the following content:
{
"Items": [
{
"ID": {
"S": "ID_Complete"
},
"oldProperties": {
"L": [
{
"S": "[property_A : value_A_old]"
},
{
"S": "[property_B : value_B_old]"
}
]
},
"newProperties": {
"L": [
{
"S": "[property_A : value_A_new]"
},
{
"S": "[property_B : value_B_new]"
}
]
}
},
{
"ID": {
"S": "ID_Incomplete"
},
"oldProperties": {
"L": [
{
"S": "[property_B : value_B_old]"
}
]
},
"newProperties": {
"L": [
{
"S": "[property_A : value_A_new]"
},
{
"S": "[property_B : value_B_new]"
}
]
}
}
]
}
I would like to manipulate the data using jq in such a way that for each item in Items[] that has a new value for property_A (under newProperties list) generate an output with the corresponding id, old and new (see desired output below) fields regardless of the value that property has in the oldProperties list. Moreover, if property_A does not exist in the oldProperties, I still need the old field to be populated with a null (or any fixed string for what it's worth).
Desired output:
{
"id": "ID_Complete",
"old": "[property_A : value_A_old]",
"new": "[property_A : value_A_new]"
}
{
"id": "ID_Incomplete",
"old": null,
"new": "[property_A : value_A_new]"
}
Note: Even though property_A doesn't exist in the oldProperties list, other properties may (and will) exist.
The problem I am facing is that I am not able to get an output when the desired property does not exist in the oldProperties list. My current jq command looks like this:
jq -r '.Items[] |
{ id:.ID.S,
old:.oldProperties.L[].S | select(. | contains("property_A")),
new:.newProperties.L[].S | select(. | contains("property_A")) }'
Which renders only the ID_Complete case, while I need the other as well.
Is there any way to achieve this using this tool?
Thanks in advance.
Your list of properties appears to hold the values of some object. You could map each list into an object, diff the two objects, and then report on the results.
You could do something like this:
def make_object_from_properties:
[.L[].S | capture("\\[(?<key>\\w+) : (?<value>\\w+)\\]")]
| from_entries
;
def diff_objects($old; $new):
def _prop($key): select(has($key))[$key];
([($old | keys[]), ($new | keys[])] | unique) as $keys
| [ $keys[] as $k
| ({ value: $old | _prop($k) } // { none: true }) as $o
| ({ value: $new | _prop($k) } // { none: true }) as $n
| (if $o.none then "add"
elif $n.none then "remove"
elif $o.value != $n.value then "change"
else "same"
end) as $s
| { key: $k, status: $s, old: $o.value, new: $n.value }
]
;
def diff_properties:
(.oldProperties | make_object_from_properties) as $old
| (.newProperties | make_object_from_properties) as $new
| diff_objects($old; $new) as $diff
| foreach $diff[] as $d ({ id: .ID.S };
select($d.status != "same")
| .old = ((select(any("remove", "change"; . == $d.status)) | "[\($d.key) : \($d.old)]") // null)
| .new = ((select(any("add", "change"; . == $d.status)) | "[\($d.key) : \($d.new)]") // null)
)
;
[.Items[] | diff_properties]
This yields the following output:
[
{
"id": "ID_Complete",
"old": "[property_A : value_A_old]",
"new": "[property_A : value_A_new]"
},
{
"id": "ID_Complete",
"old": "[property_B : value_B_old]",
"new": "[property_B : value_B_new]"
},
{
"id": "ID_Incomplete",
"old": null,
"new": "[property_A : value_A_new]"
},
{
"id": "ID_Incomplete",
"old": "[property_B : value_B_old]",
"new": "[property_B : value_B_new]"
}
]
It seems like your data is in some kind of encoded format too. For a more robust solution, you should consider defining some functions to decode them. Consider approaches found here on how you could do that.
This filter produces the desired output.
def parse: capture("(?<key>\\w+)\\s*:\\s*(?<value>\\w+)") ;
def print: "[\(.key) : \(.value)]";
def norm: [.[][][] | parse | select(.key=="property_A") | print][0];
.Items
| map({id:.ID.S, old:.oldProperties|norm, new:.newProperties|norm})[]
Sample Run (assumes filter in filter.jq and data in data.json)
$ jq -M -f filter.jq data.json
{
"id": "ID_Complete",
"old": "[property_A : value_A_old]",
"new": "[property_A : value_A_new]"
}
{
"id": "ID_Incomplete",
"old": null,
"new": "[property_A : value_A_new]"
}
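As for why the command in the question drops ID_Incomplete: when a generator inside an object construction produces no output, the whole object produces no output. Collecting the matches into an array and taking [0] gives null instead. Below is a sketch of that idea with the property name passed in from the shell. The helper name pick and the startswith test are my own choices, and the test assumes every entry has the form "[name : value]".

```shell
json='{"Items":[{"ID":{"S":"ID_Complete"},"oldProperties":{"L":[{"S":"[property_A : value_A_old]"},{"S":"[property_B : value_B_old]"}]},"newProperties":{"L":[{"S":"[property_A : value_A_new]"},{"S":"[property_B : value_B_new]"}]}},{"ID":{"S":"ID_Incomplete"},"oldProperties":{"L":[{"S":"[property_B : value_B_old]"}]},"newProperties":{"L":[{"S":"[property_A : value_A_new]"},{"S":"[property_B : value_B_new]"}]}}]}'

result=$(printf '%s' "$json" | jq -c --arg prop property_A '
  # First matching entry, or null when the list has none.
  def pick: [ .L[].S | select(startswith("[" + $prop + " ")) ][0];
  .Items[]
  | { id: .ID.S, old: (.oldProperties | pick), new: (.newProperties | pick) }
  | select(.new != null)
')

echo "$result"
```

The final select keeps only items that actually have a new value for the property, which is the condition stated in the question.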