Format nested JSON input using jq?

I am trying to convert the sample input below into the output below using jq:
Input JSON
"elements": [
{
"type": "CustomObjectData",
"id": "2185",
"fieldValues": [
{
"type": "FieldValue",
"id": "169",
"value": "9/6/2017 12:00:00 AM"
},
{
"type": "FieldValue",
"id": "190",
"value": "ABC"
}
]
},
{
"type": "CustomObjectData",
"id": "2186",
"contactId": "13",
"fieldValues": [
{
"type": "FieldValue",
"id": "169",
"value": "8/31/2017 12:00:00 AM"
},
{
"type": "FieldValue",
"id": "190",
"value": "DEF"
}
]
}
]
Desired Output (group by id)
Essentially, I am trying to extract the "value" field from each "fieldValues" object and group them by "id":
{
  "id": "2185",
  "value": "9/6/2017 12:00:00 AM",
  "value": "ABC"
},
{
  "id": "2186",
  "value": "8/31/2017 12:00:00 AM",
  "value": "DEF"
}
What jq syntax should I use to achieve this? Thanks very much!

Note that the desired output shown in the question is not valid JSON (an object cannot contain two "value" keys), so the following filter produces a stream of valid JSON values that is similar to it. If a single array is desired, one possibility would be to wrap the program in square brackets, as shown after the output below.
program.jq
.elements[]
| {id, values: [.fieldValues[].value]}
Output
{
  "id": "2185",
  "values": [
    "9/6/2017 12:00:00 AM",
    "ABC"
  ]
}
{
  "id": "2186",
  "values": [
    "8/31/2017 12:00:00 AM",
    "DEF"
  ]
}
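For instance, wrapping the program in square brackets yields one array instead of a stream:
[ .elements[] | {id, values: [.fieldValues[].value]} ]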
Producing CSV
One of many possibilities:
.elements[]
| [.id] + [.fieldValues[].value]
| @csv
With the -r command-line option, this produces the following CSV:
"2185","9/6/2017 12:00:00 AM","ABC"
"2186","8/31/2017 12:00:00 AM","DEF"

Related

How can I clean up empty fields when converting CSV to JSON with Miller?

I have several CSV files of item data for a game I'm messing around with that I need to convert to JSON for consumption. The data can be quite irregular with several empty fields per record, which makes for sort of ugly JSON output.
Example with dummy values:
Id,Name,Value,Type,Properties/1,Properties/2,Properties/3,Properties/4
01:Foo:13,Foo,13,ACME,CanExplode,IsRocket,,
02:Bar:42,Bar,42,,IsRocket,,,
03:Baz:37,Baz,37,BlackMesa,CanExplode,IsAlive,IsHungry,
Converted output:
[
  {
    "Id": "01:Foo:13",
    "Name": "Foo",
    "Value": 13,
    "Type": "ACME",
    "Properties": ["CanExplode", "IsRocket", ""]
  },
  {
    "Id": "02:Bar:42",
    "Name": "Bar",
    "Value": 42,
    "Type": "",
    "Properties": ["IsRocket", "", ""]
  },
  {
    "Id": "03:Baz:37",
    "Name": "Baz",
    "Value": 37,
    "Type": "BlackMesa",
    "Properties": ["CanExplode", "IsAlive", "IsHungry"]
  }
]
So far I've been quite successful using Miller. I've managed to remove completely empty columns from the CSV as well as aggregate the Properties/X columns into a single array.
But now I'd like to do two more things to improve the output format to make consuming the JSON easier:
remove empty strings "" from the Properties array
replace the other empty strings "" (e.g. Type of the second record) with null
Desired output:
[
  {
    "Id": "01:Foo:13",
    "Name": "Foo",
    "Value": 13,
    "Type": "ACME",
    "Properties": ["CanExplode", "IsRocket"]
  },
  {
    "Id": "02:Bar:42",
    "Name": "Bar",
    "Value": 42,
    "Type": null,
    "Properties": ["IsRocket"]
  },
  {
    "Id": "03:Baz:37",
    "Name": "Baz",
    "Value": 37,
    "Type": "BlackMesa",
    "Properties": ["CanExplode", "IsAlive", "IsHungry"]
  }
]
Is there a way to achieve that with Miller?
My current commands are:
mlr -I --csv remove-empty-columns file.csv to clean up the columns
mlr --icsv --ojson --jflatsep '/' --jlistwrap cat file.csv > file.json for the conversion
This is probably not the way you want to do it, since it also uses jq. Running
mlr --c2j --jflatsep '/' --jlistwrap remove-empty-columns then cat input.csv | \
  jq '.[].Properties |= map(select(length > 0))' | \
  jq '.[].Type |= (if . == "" then null else . end)'
you will have
[
  {
    "Id": "01:Foo:13",
    "Name": "Foo",
    "Value": 13,
    "Type": "ACME",
    "Properties": [
      "CanExplode",
      "IsRocket"
    ]
  },
  {
    "Id": "02:Bar:42",
    "Name": "Bar",
    "Value": 42,
    "Type": null,
    "Properties": [
      "IsRocket"
    ]
  },
  {
    "Id": "03:Baz:37",
    "Name": "Baz",
    "Value": 37,
    "Type": "BlackMesa",
    "Properties": [
      "CanExplode",
      "IsAlive",
      "IsHungry"
    ]
  }
]
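The two jq passes can also be merged into a single invocation; a minimal sketch:
mlr --c2j --jflatsep '/' --jlistwrap remove-empty-columns then cat input.csv | \
  jq 'map(.Properties |= map(select(length > 0))
        | .Type |= (if . == "" then null else . end))'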
Using Miller, you can "filter out" the empty fields from each record with:
mlr --c2j --jflatsep '/' --jlistwrap put '
  $* = select($*, func(k,v) { return v != "" })
' file.csv
Remark: we are actually building a new record containing the non-empty fields rather than deleting the empty fields from the record, but the final result is equivalent. Note, though, that this drops the empty "Type" field entirely instead of replacing it with null:
[
  {
    "Id": "01:Foo:13",
    "Name": "Foo",
    "Value": 13,
    "Type": "ACME",
    "Properties": ["CanExplode", "IsRocket"]
  },
  {
    "Id": "02:Bar:42",
    "Name": "Bar",
    "Value": 42,
    "Properties": ["IsRocket"]
  },
  {
    "Id": "03:Baz:37",
    "Name": "Baz",
    "Value": 37,
    "Type": "BlackMesa",
    "Properties": ["CanExplode", "IsAlive", "IsHungry"]
  }
]
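If "Type" should appear as null rather than be dropped, one possibility (a sketch) is a jq post-processing step; .Type //= null re-adds the key with a null value wherever it is missing:
mlr --c2j --jflatsep '/' --jlistwrap put '
  $* = select($*, func(k,v) { return v != "" })
' file.csv | jq 'map(.Type //= null)'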

Using jq to convert object to key with values

I have been playing around with jq to format a JSON file but I am having some issues trying to solve a particular transformation. Given a test.json file in this format:
[
  {
    "name": "A",   // This would be the first key
    "number": 1,
    "type": "apple",
    "city": "NYC"  // This would be the second key
  },
  {
    "name": "A",
    "number": "5",
    "type": "apple",
    "city": "LA"
  },
  {
    "name": "A",
    "number": 2,
    "type": "apple",
    "city": "NYC"
  },
  {
    "name": "B",
    "number": 3,
    "type": "apple",
    "city": "NYC"
  }
]
I was wondering, how can I format it this way using jq?
[
  {
    "key": "A",
    "values": [
      {
        "key": "NYC",
        "values": [
          {
            "number": 1,
            "type": "a"
          },
          {
            "number": 2,
            "type": "b"
          }
        ]
      },
      {
        "key": "LA",
        "values": [
          {
            "number": 5,
            "type": "b"
          }
        ]
      }
    ]
  },
  {
    "key": "B",
    "values": [
      {
        "key": "NYC",
        "values": [
          {
            "number": 3,
            "type": "apple"
          }
        ]
      }
    ]
  }
]
I have followed this thread, Using jq, convert array of name/value pairs to object with named keys, and tried to group the JSON using this expression:
jq '. | group_by(.name) | group_by(.city)' ./test.json
but I have not been able to add the keys in the output.
You'll want to group the items at the different levels, building out your result objects as you go.
group_by(.name) | map({
  key: .[0].name,
  values: (group_by(.city) | map({
    key: .[0].city,
    values: map({number, type})
  }))
})
Just keep in mind that group_by/1 yields groups in sorted order. If you need to preserve the original input order instead, you'll want an implementation like this:
def group_by_unsorted(key_selector):
  reduce .[] as $i ({};
    .["\($i | key_selector)"] += [$i]
  ) | [.[]];
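With that definition in place, substituting it for group_by in the filter above preserves the order in which names and cities first appear:
group_by_unsorted(.name) | map({
  key: .[0].name,
  values: (group_by_unsorted(.city) | map({
    key: .[0].city,
    values: map({number, type})
  }))
})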

How to filter missing inner key by using jq

Got a JSON input like this:
[
  {
    "dimensions": "helloworld",
    "metrics": "sum(is_error)",
    "values": {
      "timestamp": 1558322460000,
      "value": "0.0"
    }
  },
  {
    "dimensions": "helloworld",
    "metrics": "sum(is_error)",
    "values": {
      "timestamp": 1558322160000,
      "value": "0.0"
    }
  },
  {
    "dimensions": "helloworld",
    "metrics": "sum(is_error)",
    "values": "3423.25"
  }
]
The third object does not have a timestamp in it. How can I return only the objects that have a timestamp, like the following:
[
  {
    "dimensions": "helloworld",
    "metrics": "sum(is_error)",
    "values": {
      "timestamp": 1558322460000,
      "value": "0.0"
    }
  },
  {
    "dimensions": "helloworld",
    "metrics": "sum(is_error)",
    "values": {
      "timestamp": 1558322160000,
      "value": "0.0"
    }
  }
]
Many thanks in advance.
Cheers,
Vincent
map(select(.values | has("timestamp")?))
The trailing ? suppresses the error that has would otherwise raise when .values is a string rather than an object.
and here's an alternative solution, using a walk-path unix tool for JSON: jtc:
bash $ <file.json jtc -w'<timestamp>l:[-2]' -j
[
  {
    "dimensions": "helloworld",
    "metrics": "sum(is_error)",
    "values": {
      "timestamp": 1558322460000,
      "value": "0.0"
    }
  },
  {
    "dimensions": "helloworld",
    "metrics": "sum(is_error)",
    "values": {
      "timestamp": 1558322160000,
      "value": "0.0"
    }
  }
]
bash $
It finds every label "timestamp", then goes 2 levels up from each match and prints the resulting JSON element; -j wraps all printed walks back into an array.
PS. Disclosure: I'm the creator of the jtc tool.
Working example:
[ .[] | select(.values | has("timestamp")?) ]
https://jqplay.org/s/n5jsRsPMhW
Or alternative:
[ .[] | select(.values.timestamp?) ]
https://jqplay.org/s/HRWV44YgUp
(Note that this variant would also drop entries whose timestamp is null or false.)
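An explicit type check avoids the need for the error-suppressing ? altogether; a sketch:
[ .[] | select((.values | type) == "object" and (.values | has("timestamp"))) ]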
P.S. An earlier version of this answer was incorrect: after .[] you are working with each item separately, not with the array, so the map function is unnecessary there.

Convert JSON to CSV - string manipulation (jq, bash, awk, sed, etc.)

I'm in dire need of help with a script to convert JSON text to CSV text, in an attempt to copy users from one AWS Cognito user pool to another.
The export JSON looks like this:
{
  "Users": [
    {
      "Username": "user.name",
      "Attributes": [
        {
          "Name": "sub",
          "Value": "some-value"
        },
        {
          "Name": "email_verified",
          "Value": "true"
        },
        {
          "Name": "custom:jobtitle",
          "Value": "Director"
        },
        {
          "Name": "custom:user_id",
          "Value": "38"
        },
        {
          "Name": "email",
          "Value": "foo.bar#email.com"
        }
      ],
      "UserCreateDate": some-value,
      "UserLastModifiedDate": some-value,
      "Enabled": some-value,
      "UserStatus": "some-value"
    },
    [more lines down here]...
  ]
}
Then the CSV file would contain these lines:
,,,,,,,,,foo.bar#email.com,TRUE,,,,,,FALSE,,,Director,,38,FALSE,foo.bar
[more lines down here]...
So, the variables would be like this for JSON:
{
  "Users": [
    {
      "Username": "%USERNAME%",
      "Attributes": [
        {
          "Name": "sub",
          "Value": "some-value"
        },
        {
          "Name": "email_verified",
          "Value": "true"
        },
        {
          "Name": "custom:jobtitle",
          "Value": "%JOB_TITLE%"
        },
        {
          "Name": "custom:user_id",
          "Value": "%USER_ID%"
        },
        {
          "Name": "email",
          "Value": "%EMAIL%"
        }
      ],
      "UserCreateDate": some-value,
      "UserLastModifiedDate": some-value,
      "Enabled": some-value,
      "UserStatus": "some-value"
    }
    ...
  ]
}
And like this for CSV:
,,,,,,,,,%EMAIL%,TRUE,,,,,,FALSE,,,%JOB_TITLE%,,%USER_ID%,FALSE,%USERNAME%
where %EMAIL%, %JOB_TITLE%, %USER_ID%, and %USERNAME% are variables, everything else should be just string.
Appreciate your help in advance, guys.
Consider first this filter:
.Users[].Attributes
| map(select(.Name | . == "custom:jobtitle" or . == "custom:user_id" or . == "email"))
| from_entries
| [.email, .["custom:jobtitle"], .["custom:user_id"]]
| @csv
The trick here is the use of from_entries to convert the array of Name/Value pairs into an object with the Names as keys.
Assuming valid JSON input along the lines shown in the Q, invoking jq with the -r option would yield:
"foo.bar#email.com","Director","38"
Unfortunately the precise requirements are not so clear to me, but you should be able to adapt the above in accordance with your needs.
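That said, if the exact 24-column layout shown in the question is required, here is a rough sketch; the TRUE/FALSE literals and column positions are copied from the sample row above, and join(",") leaves fields unquoted, so it will break if any value contains a comma:

.Users[]
| (.Attributes
   | map(select(.Name == "custom:jobtitle" or .Name == "custom:user_id" or .Name == "email"))
   | from_entries) as $a
| [ "", "", "", "", "", "", "", "", "",   # columns 1-9 (empty)
    $a.email, "TRUE",
    "", "", "", "", "",                   # columns 12-16 (empty)
    "FALSE", "", "",
    $a["custom:jobtitle"], "",
    $a["custom:user_id"], "FALSE",
    .Username ]
| join(",")

Invoked with jq -r, this emits one line per user in the shape of the desired CSV.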

jq add capturing group result outside

For example,
Input:
{
  "id": "abc",
  "name": "name-middlenane-lastname-1"
},
{
  "id": "123",
  "name": "fname-flast-2"
}
Response:
{
  "id": "abc",
  "name": "name-middlenane-lastname-1",
  "newkey": "name-middlenane-lastname"
},
{
  "id": "123",
  "name": "fname-flast-2",
  "newkey": "fname-flast"
}
The field "name" in each object is a string of words separated by hyphens, ending with a number. I need the complete string from the beginning up to, but not including, that trailing number. Then I want to add a new field with key "newkey" whose value is the extracted string without the number. The output should contain the old fields as well as the new one.
jq solution:
Sample input.json:
[
  {
    "id": "abc",
    "name": "name-middlenane-lastname-1"
  },
  {
    "id": "123",
    "name": "fname-flast-2"
  }
]
jq 'map(. + (.name | capture("(?<newkey>.+)-[0-9]+")) )' input.json
capture produces an object whose keys are the named capture groups (here, {"newkey": ...}), which . + then merges into each element. The output:
[
  {
    "id": "abc",
    "name": "name-middlenane-lastname-1",
    "newkey": "name-middlenane-lastname"
  },
  {
    "id": "123",
    "name": "fname-flast-2",
    "newkey": "fname-flast"
  }
]
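If some "name" values might lack a trailing number, capture yields an empty stream for them and map would drop those elements entirely; a guarded variant (a sketch) keeps them unchanged instead:
jq 'map(. + ((.name | capture("(?<newkey>.+)-[0-9]+")) // {}))' input.json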