Convert JSON to CSV - string manipulation (jq, bash, awk, sed, etc.) - json

I'm in dire need of help with a script that converts JSON text to CSV text, in an attempt to copy users from one AWS Cognito user pool to another.
The export JSON looks like this:
{
"Users": [
{
"Username": "user.name",
"Attributes": [
{
"Name": "sub",
"Value": "some-value"
},
{
"Name": "email_verified",
"Value": "true"
},
{
"Name": "custom:jobtitle",
"Value": "Director"
},
{
"Name": "custom:user_id",
"Value": "38"
},
{
"Name": "email",
"Value": "foo.bar#email.com"
}
],
"UserCreateDate": some-value,
"UserLastModifiedDate": some-value,
"Enabled": some-value,
"UserStatus": "some-value"
}
[more lines down here]...
] }
Then the CSV file would contain these lines:
,,,,,,,,,foo.bar#email.com,TRUE,,,,,,FALSE,,,Director,,38,FALSE,foo.bar
[more lines down here]...
So, the variables would be like this for JSON:
{
"Users": [
{
"Username": "%USERNAME%",
"Attributes": [
{
"Name": "sub",
"Value": "some-value"
},
{
"Name": "email_verified",
"Value": "true"
},
{
"Name": "custom:jobtitle",
"Value": "%JOB_TITLE%"
},
{
"Name": "custom:user_id",
"Value": "%USER_ID%"
},
{
"Name": "email",
"Value": "%EMAIL%"
}
],
"UserCreateDate": some-value,
"UserLastModifiedDate": some-value,
"Enabled": some-value,
"UserStatus": "some-value"
}
...
]
}
And like this for CSV:
,,,,,,,,,%EMAIL%,TRUE,,,,,,FALSE,,,%JOB_TITLE%,,%USER_ID%,FALSE,%USERNAME%
where %EMAIL%, %JOB_TITLE%, %USER_ID%, and %USERNAME% are variables; everything else should be a literal string.
I appreciate your help in advance.

Consider first this filter:
.Users[].Attributes
| map(select(.Name | . == "custom:jobtitle" or . == "custom:user_id" or . == "email") )
| from_entries
| [ .email, .["custom:jobtitle"], .["custom:user_id"] ]
| @csv
The trick used here is the use of from_entries to convert the array of Name/Value pairs to an object with the Names as keys.
Assuming valid JSON input along the lines shown in the question, invoking jq with the -r option would yield:
"foo.bar#email.com","Director","38"
Unfortunately the precise requirements are not so clear to me, but you should be able to adapt the above in accordance with your needs.
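As a rough sketch of how the same from_entries trick could be stretched to the full row layout from the question: the column positions below are simply copied from the sample CSV line, the variable names json and row are made up for illustration, and join(",") is used instead of @csv so the fields stay unquoted (which assumes no value contains a comma or a quote).

```shell
# Hypothetical sample input, trimmed from the export shown in the question.
json='{"Users":[{"Username":"user.name","Attributes":[
  {"Name":"sub","Value":"some-value"},
  {"Name":"email_verified","Value":"true"},
  {"Name":"custom:jobtitle","Value":"Director"},
  {"Name":"custom:user_id","Value":"38"},
  {"Name":"email","Value":"foo.bar#email.com"}]}]}'

# Turn the Name/Value pairs into an object ($a), then lay the fields out
# in the 24 column positions taken from the question's sample row.
row=$(printf '%s' "$json" | jq -r '
  .Users[]
  | (.Attributes | from_entries) as $a
  | [ "", "", "", "", "", "", "", "", "",
      $a.email, "TRUE",
      "", "", "", "", "",
      "FALSE", "", "",
      $a["custom:jobtitle"], "",
      $a["custom:user_id"], "FALSE",
      .Username ]
  | join(",")
')
echo "$row"
```

If the values may contain commas or quotes, swapping join(",") for @csv would be the safer choice, at the cost of every string field being quoted.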

Related

How to extract a particular key from the JSON

I am trying to extract values from JSON that I obtained using the curl command for API testing. My JSON looks like the sample below. How can I extract the value 20456 from it?
{
"meta": {
"status": "OK",
"timestamp": "2022-09-16T14:45:55.076+0000"
},
"links": {},
"data": {
"id": 24843,
"username": "abcd",
"firstName": "abc",
"lastName": "xyz",
"email": "abc#abc.com",
"phone": "",
"title": "",
"location": "",
"licenseType": "FLOATING",
"active": true,
"uid": "u24843",
"type": "users"
}
}
{
"meta": {
"status": "OK",
"timestamp": "2022-09-16T14:45:55.282+0000",
"pageInfo": {
"startIndex": 0,
"resultCount": 1,
"totalResults": 1
}
},
"links": {
"data.createdBy": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.createdBy}"
},
"data.fields.user1": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.fields.user1}"
},
"data.modifiedBy": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.modifiedBy}"
},
"data.fields.projectManager": {
"type": "users",
"href": "https://abc#abc.com/rest/v1/users/{data.fields.projectManager}"
},
"data.parent": {
"type": "projects",
"href": "https://abc#abc.com/rest/v1/projects/{data.parent}"
}
},
"data": [
{
"id": 20456,
"projectKey": "Stratus",
"parent": 20303,
"isFolder": false,
"createdDate": "2018-03-12T23:46:59.000+0000",
"modifiedDate": "2020-04-28T22:14:35.000+0000",
"createdBy": 18994,
"modifiedBy": 18865,
"fields": {
"projectManager": 18373,
"user1": 18628,
"projectKey": "Stratus",
"text1": "",
"name": "Stratus",
"description": "",
"date2": "2019-03-12",
"date1": "2018-03-12"
},
"type": "projects"
}
]
}
I have tried the following, but I end up getting an error:
▶ cat jqTrial.txt | jq '.data[].id'
jq: error (at <stdin>:21): Cannot index number with string "id"
20456
Also tried this but I get strings outside the object that I am not sure how to remove:
cat jqTrial.txt | jq '.data[]'
Assuming you want the project id not the user id:
jq '
.data
| if type == "object" then . else .[] end
| select(.type == "projects")
| .id
' file.json
There's probably a better way to write the second expression.
Indeed, thanks to @pmf:
.data | objects // arrays[] | select(.type == "projects").id
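To see the type handling in action, here is a small self-contained sketch. The two documents are a trimmed, hypothetical version of the input from the question, and the variable names docs and project_id are only for illustration.

```shell
# Two JSON documents in one stream (hypothetical, trimmed from the question).
docs='{"data":{"id":24843,"type":"users"}}
{"data":[{"id":20456,"type":"projects"}]}'

# .data is an object in the first document and an array in the second,
# so take the object itself or iterate over the array before reading .id.
project_id=$(printf '%s\n' "$docs" \
  | jq '.data | objects // arrays[] | select(.type == "projects").id')
echo "$project_id"
```

The original '.data[].id' fails on the first document because iterating over an object yields its values (numbers and strings), which cannot be indexed with "id".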
Your input consists of two JSON documents; both have a data field at the top level. In the first one, .data is itself an object with an .id field, while in the second one, .data is an array containing one object, which also has an .id field.
To retrieve both, you could use the --slurp (or -s) option which wraps both top-level objects into an array, then you can address them separately by index:
jq --slurp '.[0].data.id, .[1].data[].id' jqTrial.txt
24843
20456

Delete json block with jq command

I have a JSON file with multiple domains, formatted as shown below. How can I delete whole blocks for a given domain? For example, what if I want to delete the whole block for the domain domain.tld?
I tried this, but the output is an error:
jq '."http-01"."domain"[]."main"="domain.tld"' acme.json
jq: error (at acme.json:11483): Cannot iterate over null (null)
Example of the file format:
{
"http-01": {
"Account": {
"Email": "mail#placeholder.tld",
"Registration": {
"body": {
"status": "valid",
"contact": [
"mailto:mail#placeholder.tld"
]
},
"uri": "https://acme-v02.api.letsencrypt.org/acme/acct/110801506"
},
"PrivateKey": "main_priv_key_string",
"KeyType": "4096"
},
"Certificates": [
{
"domain": {
"main": "www.some_domain.tld"
},
"certificate": "cert_string",
"key": "key_string",
"Store": "default"
},
{
"domain": {
"main": "some_domain.tld"
},
"certificate": "cert_string",
"key": "key_string",
"Store": "default"
},
{
"domain": {
"main": "www.some_domain2.tld"
},
"certificate": "cert_string",
"key": "key_string",
"Store": "default"
},
{
"domain": {
"main": "some_domain2.tld"
},
"certificate": "cert_string",
"key": "key_string",
"Store": "default"
}
]
}
}
To delete domain block "www.some_domain.tld" :
jq '."http-01".Certificates |= map(select(.domain.main != "www.some_domain.tld"))' input.json
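A quick way to check this on a small input, as a sketch only: the acme variable below holds a trimmed, hypothetical version of the file with two certificate entries, and the last stage of the filter just lists the remaining domains so the result is easy to read.

```shell
# Trimmed, hypothetical version of acme.json with two certificate entries.
acme='{"http-01":{"Certificates":[
  {"domain":{"main":"www.some_domain.tld"},"Store":"default"},
  {"domain":{"main":"some_domain.tld"},"Store":"default"}]}}'

# Keep every certificate whose main domain is NOT the one being deleted,
# then print the domains that are left.
remaining=$(printf '%s' "$acme" | jq -c '
  ."http-01".Certificates |= map(select(.domain.main != "www.some_domain.tld"))
  | [."http-01".Certificates[].domain.main]
')
echo "$remaining"
```

Note that jq does not edit the file in place; to persist the change you would write the output to a new file and move it over the original.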
Your question is quite broad. What is a "block"?
Let's assume you want to delete from within the object under http-01 each field that is of type array and has at index 0 an object satisfying .domain.main == "domain.tld". Then first navigate to where you want to delete from, and update it (|=) using del and select which performs the filtered deletion.
jq '
."http-01" |= del(
.[] | select(arrays[0] | objects.domain.main == "domain.tld")
)
' acme.json
{
"http-01": {
"Account": {
"Email": "email#domain.tld",
"Registration": {
"body": {
"status": "valid",
"contact": [
"mailto:email#domain.tld"
]
},
"uri": "https://acme-v02.api.letsencrypt.org/acme/acct/110801506"
},
"PrivateKey": "long_key_string",
"KeyType": "4096"
}
}
}
If your "block" is deeper, go deeper before updating. If it is higher, the whole document for instance, there's no need to update, just start with del.

Use jq to output a flat array of JSON objects nested anywhere within source document

I'd like to select/identity-output all objects in arrays under "emp" keys into a flat array of those objects.
[
{
"eng": {
"dev": {
"dir": {
"name": "Mickey"
},
"emp": [
{
"name": "Goofy",
"job": "laugh",
"start": "today"
},
{
"name": "Minnie",
"job": "laugh"
}
]
}
}
},
{
"mgmt": {
"dir": {
"name": "Donald"
},
"emp": [
{
"name": "Woody",
"job": "smile"
},
{
"name": "Buzz",
"job": "smile"
}
]
}
}
]
I'm looking for a flat array of arbitrary objects found in arbitrary locations within the document (in this example, under "emp" parent/keys).
In this example, it would look like
[
{
"name": "Goofy",
"job": "laugh",
"start": "today"
},
{
"name": "Minnie",
"job": "laugh"
},
{
"name": "Woody",
"job": "smile"
},
{
"name": "Buzz",
"job": "smile"
}
]
I've looked through a lot of documentation and am able to do this if I know in advance precisely where these 'emp' keys are in the document, but not if they're distributed through the document at a priori unknown locations/paths.
Use recurse to walk the structure. From all the substructures, select the objects that have an emp key. Output the corresponding values and merge the resulting arrays.
jq '[recurse | select (type == "object" and .emp) | .emp ] | add' file.json
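Here is a self-contained sketch of that filter on a shortened, hypothetical copy of the input (only the name fields are kept, and the variable names org and names are made up). The extra map(.name) at the end is there just to make the merged result compact.

```shell
# Shortened, hypothetical copy of the document from the question.
org='[{"eng":{"dev":{"dir":{"name":"Mickey"},"emp":[{"name":"Goofy"},{"name":"Minnie"}]}}},
      {"mgmt":{"dir":{"name":"Donald"},"emp":[{"name":"Woody"},{"name":"Buzz"}]}}]'

# Visit every value, keep objects with a truthy .emp, collect the emp arrays,
# merge them with add, and finally list the names.
names=$(printf '%s' "$org" | jq -c '
  [recurse | select(type == "object" and .emp) | .emp] | add | map(.name)
')
echo "$names"
```

One caveat: if no emp key exists anywhere, add over the empty array returns null rather than [].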

Selecting multiple conditionals in JQ

I've just started using the jq JSON parser. Is there any way to select on multiple conditions?
I have this:
cat file | jq -r '.Instances[] | {ip: .PrivateIpAddress, name: .Tags[]}
| select(.name.Key == "Name")'
And I need to also include the .name.Key == "Type"
This is the JSON:
{
"Instances": [
{
"PrivateIpAddress": "1.1.1.1",
"Tags": [
{
"Value": "Daily",
"Key": "Backup"
},
{
"Value": "System",
"Key": "Name"
},
{
"Value": "YES",
"Key": "Is_in_Domain"
},
{
"Value": "PROD",
"Key": "Type"
}
]
}
]
}
And this is the current output:
{
"ip": "1.1.1.1",
"name": "System"
}
{
"ip": "2.2.2.2",
"name": "host"
}
{
"ip": "3.3.3.3",
"name": "slog"
}
Desired output:
{
"ip": "1.1.1.1",
"name": "System",
"type": "PROD"
}
{
"ip": "2.2.2.2",
"name": "host",
"type": "PROD"
}
{
"ip": "3.3.3.3",
"name": "slog",
"type": "PROD"
}
What is the right way to do it? Thanks.
There's no "right" way to do it, but there are approaches to take that can make things easier for you.
The tags are already in a format that makes converting to objects simple (they're object entries). Convert the tags to an object for easy access to the properties.
$ jq '.Instances[]
| .Tags |= from_entries
| {
ip: .PrivateIpAddress,
name: .Tags.Name,
type: .Tags.Type
}' file
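For a quick check, the sketch below runs that filter against the single instance from the question, inlined in a shell variable (the names instances and summary are only for illustration).

```shell
# The sample instance from the question, inlined for a self-contained run.
instances='{"Instances":[{"PrivateIpAddress":"1.1.1.1","Tags":[
  {"Value":"Daily","Key":"Backup"},
  {"Value":"System","Key":"Name"},
  {"Value":"YES","Key":"Is_in_Domain"},
  {"Value":"PROD","Key":"Type"}]}]}'

# Convert the Key/Value tag list into an object, then pick the fields by name.
summary=$(printf '%s' "$instances" | jq -c '
  .Instances[]
  | .Tags |= from_entries
  | {ip: .PrivateIpAddress, name: .Tags.Name, type: .Tags.Type}
')
echo "$summary"
```

A tag that is missing on an instance simply comes out as null in the corresponding field.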

JQ delete property based on another property's value

I'm trying to write a jq filter that lets me selectively remove object properties based on another of the object's values.
For example, given following input
{
"object1": {
"Type": "Type1",
"Properties": {
"Property1": "blablabla",
"Property2": [
{
"Key": "Name",
"Value": "xxx"
},
{
"Key": "Surname",
"Value": "yyy"
}
],
"Property3": "xxx"
}
},
"object2": {
"Type": "Type2",
"Properties": {
"Property1": "blablabla",
"Property2": [
{
"Key": "Name",
"Value": "xxx"
},
{
"Key": "Surname",
"Value": "yyy"
}
],
"Property3": "xxx"
}
}
}
I would like to construct a filter, that based upon the object type, say "Type2", deletes or clears a property of that object, say Property2.
The resulting output would then be:
{
"object1": {
"Type": "Type1",
"Properties": {
"Property1": "blablabla",
"Property2": [
{
"Key": "Name",
"Value": "xxx"
},
{
"Key": "Surname",
"Value": "yyy"
}
],
"Property3": "xxx"
}
},
"object2": {
"Type": "Type2",
"Properties": {
"Property1": "blablabla",
"Property3": "xxx"
}
}
}
Any help greatly appreciated. Thanks in advance.
Pretty simple. Find the objects that you want to update, then update them.
Look through the values of your root object, filter them based on your condition, then update the Properties property by deleting the property you want.
(.[] | select(.Type == "Type2")).Properties |= del(.Property2)
Using .[] on an object yields all property values of an object. Also worth mentioning, when you update a value using assignments, the result of the expression just returns the input (in other words, it doesn't change the context).
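The following sketch shows the effect on a shortened, hypothetical version of the input (the property values are replaced by short placeholders, and the variable names input_json and prop_keys are made up). The trailing map_values stage only lists the remaining property names per object so the difference is easy to spot.

```shell
# Shortened, hypothetical version of the input from the question.
input_json='{"object1":{"Type":"Type1","Properties":{"Property1":"a","Property2":[1],"Property3":"x"}},
             "object2":{"Type":"Type2","Properties":{"Property1":"a","Property2":[1],"Property3":"x"}}}'

# Delete Property2 only where Type is "Type2"; the update returns the whole
# document, so a further stage can inspect every object.
prop_keys=$(printf '%s' "$input_json" | jq -c '
  (.[] | select(.Type == "Type2")).Properties |= del(.Property2)
  | map_values(.Properties | keys)
')
echo "$prop_keys"
```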
A direct approach:
.[] |= (if .Type == "Type2" then delpaths([["Properties", "Property2"]]) else . end)