Remove key if the value matches pattern - json

I have the following JSON and want to remove keys whose "value" matches a specific pattern. The names of the keys are not fixed or predetermined.
{
  "resources": [{
    "tags": null,
    "properties": {
      "customerId": "1234-cbd9-42bc-9193-f6432a6ef0d4",
      "provisioningState": "Succeeded",
      "sku": {
        "maxCapacityReservationLevel": 3000,
        "lastSkuUpdate": "Fri, 19 Mar 2021 16:38:12 GMT"
      },
      "createdDate": "Fri, 19 Mar 2021 16:38:12 GMT",
      "modifiedDate": "Fri, 19 Mar 2021 17:27:54 GMT",
      "status": {
        "events": [{
          "count": 1,
          "firstTimestamp": "2021-03-19T16:40:59Z",
          "lastTimestamp": "2021-03-19T16:40:59Z",
          "name": "Pulling",
          "type": "Normal"
        }]
      }
    }
  }]
}
Expected output
After removing the following keys, whose values match a timestamp format:
lastSkuUpdate
createdDate
modifiedDate
firstTimestamp
lastTimestamp
{
  "resources": [{
    "tags": null,
    "properties": {
      "customerId": "1234-cbd9-42bc-9193-f6432a6ef0d4",
      "provisioningState": "Succeeded",
      "sku": {
        "maxCapacityReservationLevel": 3000
      },
      "status": {
        "events": [{
          "count": 1,
          "name": "Pulling",
          "type": "Normal"
        }]
      }
    }
  }]
}

"(?x:
^
(?: \\d{4}-\\d{2}-\\d{2}T
| \\w{3},[ ][\\d ]\\d[ ]\\w{3}[ ]\\d{4}
)
)" as $date_pattern |
( .. | select(type == "object") ) |= del(.[
. as $o |
keys_unsorted[] |
select( $o[.] | type == "string" and test($date_pattern) )
])
jqplay
Adjust the pattern to your liking.
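For a quick sanity check, the same technique can be run on a minimal made-up document (the pattern here is shortened to just the ISO-timestamp branch):

```shell
# Minimal sketch of the del-by-matching-value technique (input is illustrative)
printf '%s' '{"a":{"t":"2021-03-19T16:40:59Z","n":1}}' | jq -c '
  "^\\d{4}-\\d{2}-\\d{2}T" as $date_pattern |
  ( .. | select(type == "object") ) |= del(.[
    . as $o | keys_unsorted[] |
    select( $o[.] | type == "string" and test($date_pattern) )
  ])'
# expected: {"a":{"n":1}}
```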

A straightforward way to delete key-value pairs from a JSON object is using with_entries, e.g. like so:
def matches:
type == "string"
and test("^[1-9]\\d{3}-\\d+-\\d|[1-9]\\d{3} \\d{2}:\\d{2}:\\d{2}");
walk(if type=="object"
then with_entries(if .value|matches then empty else . end)
else . end)
You may of course wish to adjust def matches: according to your requirements, or to shorten the filter-argument of with_entries to:
select(.value|matches|not)
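For illustration, here is the walk/with_entries approach (with the shortened select form) run on a tiny made-up document:

```shell
# walk + with_entries: drop string values that look like dates (sample input is made up)
printf '%s' '{"a":"2021-03-19 16:38:12","b":{"c":"keep","d":"1999-12-31 23:59:59"}}' | jq -c '
  def matches:
    type == "string"
    and test("^[1-9]\\d{3}-\\d+-\\d|[1-9]\\d{3} \\d{2}:\\d{2}:\\d{2}");
  walk(if type=="object"
       then with_entries(select(.value|matches|not))
       else . end)'
# expected: {"b":{"c":"keep"}}
```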

Here's a superficially complicated but very efficient and actually quite simple solution. It uses jq's streaming parser and would thus be especially suitable for very large JSON input(s). Of course the matches filter as defined here should be taken as illustrative rather than definitive.
jq --stream -n '
def matches:
type == "string"
and test("^[1-9]\\d{3}-\\d+-\\d|[1-9]\\d{3} \\d{2}:\\d{2}:\\d{2}");
fromstream(inputs
| select((length==2
and (.[0][-1]|type)=="string"
and (.[-1]|matches))
| not) )
' input.json
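To see what the select is filtering, it helps to know that jq --stream turns the input into [path, leaf-value] events (length 2) plus closing events (length 1); the filter above drops exactly the length-2 events whose key is a string and whose value matches:

```shell
# jq --stream emits [path, leaf] pairs (length 2) and closing events (length 1)
printf '%s' '{"a":1,"b":"x"}' | jq -cn --stream 'inputs'
# [["a"],1]
# [["b"],"x"]
# [["b"]]
```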

Related

How to parse nested json to csv using command line

I want to parse a nested json to csv. The data looks similar to this.
{"tables":[{"name":"PrimaryResult","columns":[{"name":"name","type":"string"},{"name":"id","type":"string"},{"name":"custom","type":"dynamic"}]"rows":[["Alpha","1","{\"age\":\"23\",\"number\":\"xyz\"}]]]}
I want csv file as:
name id age number
alpha 1 23 xyz
I tried:
jq -r ".tables | .[] | .columns | map(.name)|@csv" demo.json > demo.csv
jq -r ".tables | .[] | .rows |.[]|@csv" demo.json >> demo.csv
But I am not getting expected result.
Output:
name id custom
alpha 1 {"age":"23","number":"xyz"}
Expected:
name id age number
alpha 1 23 xyz
Assuming valid JSON input:
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{ "name": "name", "type": "string" },
{ "name": "id", "type": "string" },
{ "name": "custom", "type": "dynamic" }
],
"rows": [
"Alpha",
"1",
"{\"age\":\"23\",\"number\":\"xyz\"}"
]
}
]
}
And assuming fixed headers:
jq -r '["name", "id", "age", "number"],
(.tables[].rows | [.[0,1], (.[2] | fromjson | .age, .number)])
| @csv' input.json
Output:
"name","id","age","number"
"Alpha","1","23","xyz"
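The key step above is fromjson, which decodes a JSON-encoded string into a value; a quick standalone check:

```shell
# fromjson parses the JSON text contained in a string value
printf '%s' '"{\"age\":\"23\"}"' | jq -r 'fromjson | .age'
# expected output: 23
```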
If any of these assumptions is wrong, you need to clarify your requirements, e.g.
How are column names determined?
What happens if the input contains multiple tables?
Is the "dynamic" object always of the same shape? Or can it sometimes contain fewer, more, or different columns?
Assume that the .rows array is a 2D array of rows and fields, and that a column of type "dynamic" always expects a JSON-encoded object whose fields represent further columns, which may or may not be present in every row.
Then you could transpose the headers array and the rows array in order to process each column by its type, collecting all keys from the "dynamic" columns on the fly, and then transpose back to get the row-based CSV output.
Input (I have added another row for illustration):
{
"tables": [
{
"name": "PrimaryResult",
"columns": [
{
"name": "name",
"type": "string"
},
{
"name": "id",
"type": "string"
},
{
"name": "custom",
"type": "dynamic"
}
],
"rows": [
[
"Alpha",
"1",
"{\"age\":\"23\",\"number\":\"123\"}"
],
[
"Beta",
"2",
"{\"age\":\"45\",\"word\":\"xyz\"}"
]
]
}
]
}
Filter:
jq -r '
.tables[] | [.columns, .rows[]] | transpose | map(
if first.type == "string" then first |= .name
elif first.type == "dynamic" then
.[1:] | map(fromjson)
| (map(keys[]) | unique) as $keys
| [$keys, (.[] | [.[$keys[]]])] | transpose[]
else empty end
)
| transpose[] | @csv
'
Output:
"name","id","age","number","word"
"Alpha","1","23","123",
"Beta","2","45",,"xyz"
Demo
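The heavy lifting above is done by transpose, which turns an array of rows into an array of columns (and, applied again, back); a minimal illustration:

```shell
# transpose flips rows and columns of a 2D array
printf '%s' '[[1,2,3],[4,5,6]]' | jq -c 'transpose'
# expected: [[1,4],[2,5],[3,6]]
```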

JQ- print specific key value pair

I have this JSON:
{
"time": "2022-02-28T22:00:55.196Z",
"severity": "INFO",
"params": [
{"key": "state", "value": "pending"},
{"key": "options", "value": "request"},
{"key": "description", "value": "[FILTERED]"}
],
"content_length": "231"
}
I want to print the key-value pairs where the key matches state or options, along with the time and its value. I am able to print the time and all key-value pairs using the command below, but I am not sure how to extract specific key-value pairs.
jq '"time:\(.time)" ,[.params[] | "key:\(.key)" ,"value:\(.value)"]' test.json
This gives the output:
"time:2022-02-28T22:00:55.196Z"
[
"key:state",
"value:pending",
"key:options",
"value:request",
"key:description",
"value:[FILTERED]"
]
But my desired output is:
"time:2022-02-28T22:00:55.196Z"
"key:state",
"value:pending",
"key:options",
"value:request"
One solution to the stated problem would be:
< test.json jq '
"time:\(.time)",
[.params[] | select(.key|IN("state","options"))
| "key:\(.key)" ,"value:\(.value)"]
' | sed '/^[][]$/d'
However, it would almost certainly be better to modify the requirements slightly so that the output format is less idiosyncratic. This should also make it easier to formulate a cleaner (e.g. only-jq) solution.
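The IN filter used above tests whether the current value equals any value in its argument stream; a standalone sketch:

```shell
# IN(...) is a membership test against its argument stream
printf '%s' '["state","options","description"]' | jq -c 'map(select(IN("state","options")))'
# expected: ["state","options"]
```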
You can use @csv (comma-separated values).
Filter
"time:\(.time)",
(.params |
[map(select(.key=="state" or .key=="options"))[]
| "key:\(.key)", "value:\(.value)"]
| @csv)
Input
{
"time": "2022-02-28T22:00:55.196Z",
"severity": "INFO",
"params": [
{"key": "state", "value": "pending"},
{"key": "options", "value": "request"},
{"key": "description", "value": "[FILTERED]"}
],
"content_length": "231"
}
Output
time:2022-02-28T22:00:55.196Z
"key:state","value:pending","key:options","value:request"
Demo
https://jqplay.org/s/F_3QP6-EvK

how to output all the keys and values from json using jq?

I am trying to output all the data from my JSON file that matches the value "data10=true". It does that, but it only grabs the names. How can I make it output everything in my JSON file that matches "data10=true"?
This is what I've got: data=$(jq -c 'to_entries[] | select (.value.data10 == "true")| [.key, .value.name]' data.json )
This is in my YAML template, by the way, running it as a pipeline in DevOps.
The detailed requirements are unclear, but hopefully you'll be able to use the following jq program as a guide:
..
| objects
| select( .data10 == "true" )
| to_entries[]
| select(.key != "data10")
| [.key, .value]
This will recursively (thanks to the initial ..) examine all the JSON objects in the input.
p.s.
If you want to make the selection based on whether .data10 is "true" or true, you could change the criterion to .data10 | . == true or . == "true".
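A minimal run of that recursive selection on a made-up nested document (names are illustrative):

```shell
# .. walks every subvalue; objects keeps only the JSON objects among them
printf '%s' '{"x":{"data10":"true","name":"n"},"y":[{"data10":"false"}]}' |
  jq -c '.. | objects | select(.data10 == "true")'
# expected: {"data10":"true","name":"n"}
```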
jq 'to_entries | map(select(.value.data10=="true")) | from_entries' data.json
Input data.json, with one "false" value:
{
"FOO": {
"data10": "false",
"name": "Donald",
"location": "Stockholm"
},
"BAR": {
"data10": "true",
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"data10": "true",
"name": "Jack",
"location": "Whereever"
}
}
output:
{
"BAR": {
"data10": "true",
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"data10": "true",
"name": "Jack",
"location": "Whereever"
}
}
based on: https://stackoverflow.com/a/37843822/983325
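The same to_entries/from_entries round-trip in miniature (toy input):

```shell
# to_entries -> filter by value -> from_entries rebuilds the object
printf '%s' '{"a":{"k":1},"b":{"k":2}}' |
  jq -c 'to_entries | map(select(.value.k == 2)) | from_entries'
# expected: {"b":{"k":2}}
```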

How to select an element with jq in a nested JSON

I have input like this:
{
  "data": [{
    "id": 111585,
    "name": "Inverter",
    "batList": [{
      "name": "Battery1",
      "dataDict": [{
        "key": "b1_1",
        "name": "Battery V.",
        "value": 57.63,
        "unit": "V"
      }, {
        "key": "b1_2",
        "name": "Battery I.",
        "value": -0.10,
        "unit": "A"
      }, {
        "key": "b1_3",
        "name": "Battery P.",
        "value": -6,
        "unit": "W"
      }, {
        "key": "b1_4",
        "name": "Inner T.",
        "value": 25,
        "unit": "℃"
      }, {
        "key": "b1_5",
        "name": "Remaining Capacity % ",
        "value": 99,
        "unit": "%"
      }]
    }]
  }]
}
from which I want to extract the 'value' property (i.e. 99) for "Remaining Capacity % ".
My best amateurish but well-researched attempt is
jq --arg instance "Remaining Capacity % " '.data | .[] | select(.name == $instance) | .value'
but I get an empty result. Any help with this nested intransigence would be much appreciated.
Your idea is about right, but you left out the intermediate paths after .data[]; it should have been
jq --arg instance "Remaining Capacity % " \
'.data[].batList[].dataDict[] | select(.name == $instance ).value' json
An alternative to Inian's answer is to use the select-contains recipe, e.g.:
jq -r '.data[].batList[].dataDict[] | select(.name|contains("Remaining")).value' file
While no better in this example, it is handy to remember especially if you need to find a bunch of values, e.g. contains("Battery") would return three results.
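For instance, widening the match as suggested (sample name values taken from the dataDict above):

```shell
# contains on strings is a substring test; "Battery" matches multiple names here
printf '%s' '["Battery V.","Inner T.","Battery P."]' |
  jq -c 'map(select(contains("Battery")))'
# expected: ["Battery V.","Battery P."]
```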

date/numeric filter with jq

I have JSON that looks like this (it's the result of a simple jq filter):
[{
"name": "Corrine",
"firstname": "Odile",
"uid": "PA2685",
"roles": [{
"role_name": "GQ",
"start": "2012-06-20"
},
{
"role_name": "HOUSE",
"start": "2012-06-26"
},
{
"role_name": "HOUSE",
"start": "2017-06-28"
}
]
},
{
"name": "Blanche",
"firstname": "Matthieu",
"uid": "PA2685",
"roles": [{
"role_name": "SENATE",
"start": "2014-06-20"
},
{
"role_name": "SENATE",
"start": "2012-06-26"
},
{
"role_name": "SENATE",
"start": "2012-06-28"
}
]
}
]
I would like to filter in two ways:
select only the first-level objects that have at least one role_name (inside the roles array) whose value is HOUSE;
and from this group select only the ones that have at least one start whose date is in 2017 or after.
In the json above "Corrine Odile" would be the only one selected.
I tried some with_entries(select(.value ...)) expressions, but I'm confused about how to deal with the dates as well as the "at least one" requirement.
Requirements of the form "at least one" can usually be satisfied efficiently using any/2, e.g.
any(.roles[]; .role_name == "HOUSE")
The check for the year can (apparently) be accomplished by:
.start | .[:4] | tonumber >= 2017
Solution
To produce an array of objects satisfying the two conditions:
map(select(
any(.roles[]; .role_name == "HOUSE") and
any(.roles[]; .start[:4] | tonumber >= 2017) ))
.start[:4] is short for .start|.[0:4]
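Putting it together on a stripped-down two-person sample (abridged from the question's data):

```shell
# Keep only people with at least one HOUSE role and at least one start year >= 2017
printf '%s' '[{"name":"A","roles":[{"role_name":"HOUSE","start":"2017-06-28"}]},
              {"name":"B","roles":[{"role_name":"SENATE","start":"2018-01-01"}]}]' |
  jq -c 'map(select(
    any(.roles[]; .role_name == "HOUSE") and
    any(.roles[]; .start[:4] | tonumber >= 2017) ))'
# expected: [{"name":"A","roles":[{"role_name":"HOUSE","start":"2017-06-28"}]}]
```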