Using jq to produce a specific CSV format

I have a json that looks like this:
[
{
"auth": 1,
"status": "Active",
"userCustomAttributes": [
{
"customAttributeName": "Attribute 1",
"customAttributeValue": "Value 1"
},
{
"customAttributeName": "Attribute 2",
"customAttributeValue": "Value 2"
},
{
"customAttributeName": "Attribute 3",
"customAttributeValue": "Value 3"
}
],
},
{
"auth": 1,
"status": "Active",
"userCustomAttributes": [
{
"customAttributeName": "Attribute 1",
"customAttributeValue": "Value 1"
},
{
"customAttributeName": "Attribute 2",
"customAttributeValue": "Value 2"
},
{
"customAttributeName": "Attribute 3",
"customAttributeValue": "Value 3"
},
{
"customAttributeName": "Attribute 4",
"customAttributeValue": "Value 4"
}
],
}
]
I would like to parse this and produce CSV output that looks something like this:
authType, status, attribute 1, attribute 2, attribute 3, attribute 4
"1", "active", "value1", "value2", "value3",""
"1", "active", "value1", "value2", "value3","value 4"
The json has over 180k records in the array so it would need to loop through all of them. Some records don't have all the attributes. Some have all 4 yet some only have 1. I am hoping to show a null value in the csv for the records that don't have the attribute.

With your sample input, the following program, which does not depend on the ordering of the "attribute" keys:
jq -r '
["Attribute 1", "Attribute 2", "Attribute 3", "Attribute 4"] as $attributes
# Header row
| ["authType", "status"]
+ ($attributes | map( (.[:1] | ascii_upcase) + .[1:])),
# Data rows:
(.[]
| (INDEX(.userCustomAttributes[]; .customAttributeName)
| map_values(.customAttributeValue)) as $dict
| [.auth, .status] + [ $dict[ $attributes[] ] ]
)
| @csv
'
produces the following CSV:
"authType","status","Attribute 1","Attribute 2","Attribute 3","Attribute 4"
1,"Active","Value 1","Value 2","Value 3",
1,"Active","Value 1","Value 2","Value 3","Value 4"
You can easily modify this to emit a literal string of your choice in place of a JSON null value.
Explanation
$dict[ $attributes[] ] produces the stream of values:
$dict[ $attributes[0] ]
$dict[ $attributes[1] ]
...
This is used to ensure the columns are produced in the correct order, independently of the ordering or even presence of the keys.
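To make the remark about replacing null concrete, here is a minimal sketch that prints a placeholder instead of an empty field. The shortened two-attribute list, the reduced sample records, the "N/A" placeholder and the shell function name csv_rows are assumptions chosen for illustration; INDEX requires jq 1.6 or later.

```shell
# Reduced sample input (made up): the first record lacks "Attribute 2",
# and the second lists its attributes in a different order.
input='[
  {"auth": 1, "status": "Active", "userCustomAttributes": [
    {"customAttributeName": "Attribute 1", "customAttributeValue": "Value 1"}]},
  {"auth": 1, "status": "Active", "userCustomAttributes": [
    {"customAttributeName": "Attribute 2", "customAttributeValue": "Value 2"},
    {"customAttributeName": "Attribute 1", "customAttributeValue": "Value 1"}]}
]'

# csv_rows is a hypothetical helper name; it emits data rows only.
csv_rows() {
  jq -r '
    ["Attribute 1", "Attribute 2"] as $attributes
    | .[]
    | (INDEX(.userCustomAttributes[]; .customAttributeName)
       | map_values(.customAttributeValue)) as $dict
    # Replace each missing (null) value separately.
    # Note that // would also replace a literal false.
    | [.auth, .status] + [ $dict[ $attributes[] ] | . // "N/A" ]
    | @csv
  '
}

printf '%s\n' "$input" | csv_rows
# 1,"Active","Value 1","N/A"
# 1,"Active","Value 1","Value 2"
```

The placeholder is applied per value (`| . // "N/A"`) rather than to the whole stream, because `//` applied directly to a stream discards null outputs instead of replacing them, which would shift the columns.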

Append JSON file after specific array index by using shell script

I want to append some content by using shell script.
I have a JSON file test.json as below.
{
"reference": "Json Test",
"title": {
"a": "Json Test"
},
"components": [
{
"reference": "Json Test",
"type": "panel",
"content": [
{
"link": "abc/123",
"label": {
"a": "for test 123 - a",
"b": "for test 123 - b"
}
},
{
"link": "abc/456",
"label": {
"a": "for test 456 - a",
"b": "for test 456 - b"
}
},
{
"link": "abc/789",
"label": {
"a": "for test 789 - a",
"b": "for test 789 - b"
}
}
]
}
]
}
I want to insert the content and produce the following output using a shell script (*.sh). How can I achieve this?
{
"reference": "Json Test",
"title": {
"a": "Json Test"
},
"components": [
{
"reference": "Json Test",
"type": "panel",
"content": [
{
"link": "abc/123",
"label": {
"a": "for test 123 - a",
"b": "for test 123 - b"
}
},
{
"link": "abc/101112",
"label": {
"a": "for test 101112 - a",
"b": "for test 101112 - b"
}
},
{
"link": "abc/456",
"label": {
"a": "for test 456 - a",
"b": "for test 456 - b"
}
},
{
"link": "abc/789",
"label": {
"a": "for test 789 - a",
"b": "for test 789 - b"
}
}
]
}
]
}
I tried to access the index and add a test string, but the command below replaces the original data:
jq '.components[].content[1] + { "link" : "test" } ' test.json
You can use the slice filter to extract the head and the tail of the array, then use + to concatenate head + the new object + the tail. Finally, use update-assignment |= to modify the array:
.components[].content |= .[0:1] + [{ link: "test" }] + .[1:]
If you are planning on using this more often, consider defining a reusable function:
def splice($at; $obj): .[0:$at] + [$obj] + .[$at:];
.components[].content |= splice(1; {link: "test"})
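As a quick check of how the slices behave at the edges, the sketch below applies the same splice function to a small array. The sample array, the inserted string "new" and the shell wrapper splice_demo are made up for illustration; the position is passed in with --argjson.

```shell
# splice_demo POS: insert the string "new" at index POS of the array on stdin.
splice_demo() {
  jq -c --argjson pos "$1" '
    def splice($at; $obj): .[0:$at] + [$obj] + .[$at:];
    splice($pos; "new")'
}

printf '%s' '["a","b","c"]' | splice_demo 0    # ["new","a","b","c"]
printf '%s' '["a","b","c"]' | splice_demo 1    # ["a","new","b","c"]
printf '%s' '["a","b","c"]' | splice_demo 10   # ["a","b","c","new"]
```

A position past the end of the array simply appends, because the head slice then covers the whole array and the tail slice is empty.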
Alternatively, assign to the empty sub-array at position 1. You can select it either by start and end position (.[1:1]) or by start position and length (.[1:][:0]). Because the target is an array slice, the assigned value must itself be an array, here the single-element array [{"link": "test"}]; put more items in it to insert several at once. This looks almost like your original attempt:
jq '.components[].content[1:1] = [{"link": "test"}]' test.json
For convenience, you can also turn this into an insertAt function:
def insertAt($pos; $val): .[$pos:$pos] = [$val];
.components[].content |= insertAt(1; {"link": "test"})
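Since the question asks for a shell script that updates the file, here is a minimal sketch of the write-back step using the slice assignment. jq has no in-place editing option, so the result goes to a temporary file that then replaces the original. The throwaway directory, the shortened file contents and the inserted link value are assumptions for illustration.

```shell
# Work on a throwaway copy so the sketch is safe to run anywhere.
workdir=$(mktemp -d)
file="$workdir/test.json"
printf '%s\n' '{"components":[{"content":[{"link":"abc/123"},{"link":"abc/456"}]}]}' > "$file"

# Insert the new object at index 1, then replace the original file
# only if jq succeeded.
jq '.components[].content[1:1] = [{"link": "abc/101112"}]' "$file" > "$file.tmp" \
  && mv "$file.tmp" "$file"

# Show the resulting order of links.
jq -c '[.components[].content[].link]' "$file"
# ["abc/123","abc/101112","abc/456"]
```

Redirecting straight back into the input file (jq ... test.json > test.json) would truncate it before jq reads it, which is why the temporary file is used.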

Conditionally merging two separate JSON objects in JQ

This is how my input looks:
{
"text" : "Some text here"
}
{
"usage": {
"text_units": 1,
"text_characters": 101,
"features": 1
},
"language": "en",
"categories": [
{
"score": 0.655041,
"label": "/technology law, govt and politics/espionage and intelligence/surveillance"
},
{
"score": 0.639809,
"label": "/technology and computing/computer security/network security"
},
{
"score": 0.624533,
"label": "/business and industrial/business operations"
}
]
}
Using jq, if the label of the first element of the categories array in the second object contains /technology, I want to add a new field named relevant with 1 as its value (which I managed), and copy the text field from the first object.
So, the expected output is:
{
"usage": {
"text_units": 1,
"text_characters": 101,
"features": 1
},
"language": "en",
"categories": [
{
"score": 0.655041,
"label": "/technology law, govt and politics/espionage and intelligence/surveillance"
},
{
"score": 0.639809,
"label": "/technology and computing/computer security/network security"
},
{
"score": 0.624533,
"label": "/business and industrial/business operations"
}
],
"relevant": 1,
"text": "Some text here"
}
And this is what I have done so far:
if .categories[0].label | test("/technology"; "i") then . |=( . + {"relevant": 1} + {"text": .text}) else . |= . + {"relevant": 0} end
Your input consists of two separate objects. In order to be able to access the first while processing the second, you could save the first into a variable.
. as {$text} | input | if .categories[0].label | test("/technology"; "i") then . + {relevant: 1, $text} else . + {relevant: 0} end
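The sketch below runs that filter on two shortened pairs of objects, one for each branch. The trimmed sample objects and the shell function name merge_pair are made up for illustration. jq must be called without -n here, so that . is the first object and input reads the second.

```shell
# merge_pair reads two JSON objects from stdin: the first supplies .text,
# the second is the one that gets annotated.
merge_pair() {
  jq -c '
    . as {$text}
    | input
    | if .categories[0].label | test("/technology"; "i")
      then . + {relevant: 1, $text}
      else . + {relevant: 0}
      end'
}

printf '%s\n' '{"text": "Some text here"}' \
  '{"categories": [{"label": "/technology and computing/computer security"}]}' | merge_pair
# {"categories":[{"label":"/technology and computing/computer security"}],"relevant":1,"text":"Some text here"}

printf '%s\n' '{"text": "Other text"}' \
  '{"categories": [{"label": "/business and industrial"}]}' | merge_pair
# {"categories":[{"label":"/business and industrial"}],"relevant":0}
```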

Split a string and trim a known prefix from each part in a complex JSON structure

I'm dealing with a fairly complex JSON-structure in which a single entry needs to be edited in several places. For example:
[
{
"name": "test 1",
"stuff": {
"properties": {
"id": 0,
"stuff_list": [
{
"entryId": 1,
"description": "- item 1\n- item 2\n- item 3"
},
{
"entryId": 2,
"description": "- item 1\n- item 2\n- item 3"
}
]
}
}
},
{
"name": "test 2",
"stuff": {
"properties": {
"id": 1,
"stuff_list": [
{
"entryId": 1,
"description": null
},
{
"entryId": 2,
"description": "- item 1\n- item 2\n- item 3"
}
]
}
}
}
]
Here I would like to edit each "description"-element: The string needs to be split at each \n and the substrings "^\n?-\s" of each resulting array element need to be removed. So it should result in:
{
"entryId": 1,
"description": ["item 1", "item 2", "item 3"]
}
My first approach is:
jq '.[].stuff.properties.stuff_list[].description | split("\n")' the_file.json
but that doesn't work in the first place because of the null values that can occur in some places. So now I wonder: how can I achieve what I want?
An alternative is to split each description on \n and strip the leading "- " from every part with ltrimstr, leaving null values untouched:
.[].stuff.properties.stuff_list[].description |=
if . != null then
split("\n") | map(ltrimstr("- "))
else
.
end
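A minimal run of that update on a cut-down version of the structure, with one string description and one null, might look like this. The shortened input and the shell function name split_descriptions are assumptions for illustration.

```shell
# split_descriptions turns each non-null description into an array of items.
split_descriptions() {
  jq -c '
    .[].stuff.properties.stuff_list[].description |=
      if . != null
      then split("\n") | map(ltrimstr("- "))
      else .
      end'
}

printf '%s' '[{"stuff":{"properties":{"stuff_list":[{"entryId":1,"description":"- item 1\n- item 2"},{"entryId":2,"description":null}]}}}]' \
  | split_descriptions
# [{"stuff":{"properties":{"stuff_list":[{"entryId":1,"description":["item 1","item 2"]},{"entryId":2,"description":null}]}}}]
```

ltrimstr only removes the prefix when it is present, so lines that do not start with "- " pass through unchanged.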

Remove parent elements with certain key-value pairs using JQ

I need to remove elements from a json file based on certain key values. Here is the file I am trying to process.
{
"element1": "Test Element 1",
"element2": {
"tags": "internal",
"data": {
"data1": "Test Data 1",
"data2": "Test Data 2"
}
},
"element3": {
"function1": {
"tags": [
"new",
"internal"
]
},
"data3": "Test Data 3",
"data4": "Test Data 4"
},
"element4": {
"function2": {
"tags": "new"
},
"data5": "Test Data 5"
}
}
I want to remove all elements that have a "tags" key with the value "internal". So the result should look like this:
{
"element1": "Test Element 1",
"element4": {
"function2": {
"tags": "new"
},
"data5": "Test Data 5"
}
}
I tried various approaches but I just don't get it done using jq. Any ideas? Thanks.
Just to add some more complexity. Let's assume the json is:
{
"element1": "Test Element 1",
"element2": {
"tags": "internal",
"data": {
"data1": "Test Data 1",
"data2": "Test Data 2"
}
},
"element3": {
"function1": {
"tags": [
"new",
"internal"
]
},
"data3": "Test Data 3",
"data4": "Test Data 4"
},
"element4": {
"function2": {
"tags": "new"
},
"data5": "Test Data 5"
},
"structure1" : {
"substructure1": {
"element5": "Test Element 5",
"element6": {
"tags": "internal",
"data6": "Test Data 6"
}
}
}
}
and I want to get
{
"element1": "Test Element 1",
"element4": {
"function2": {
"tags": "new"
},
"data5": "Test Data 5"
},
"structure1" : {
"substructure1": {
"element5": "Test Element 5"
}
}
}
This is not easy. Reliably finding elements that have a tags key somewhere whose value is either the string internal or an array containing the string internal requires a complex boolean expression such as the one below.
Once found, deleting them can be done using the del built-in.
del(.[] | first(select(recurse
| objects
| has("tags") and (.tags
| . == "internal" or (
type == "array" and index("internal")
)
)
)))
I think I figured out how to also solve the more complex case. I am now running:
walk(if type == "object" and has("tags") and (.tags | . == "internal" or (type == "array" and index("internal"))) then del(.) else . end) | delpaths([paths as $path | select(getpath($path) == null) | $path])
This will remove all elements that contain 'internal' as 'tag'.
The following solution is written with a helper function for clarity. The helper function uses any for efficiency and is defined so as to add a dash of generality.
To understand the solution, it will be helpful to know about with_entries and the infix // operator, both of which are explained in the jq manual.
# Does the incoming JSON value contain an object which has a .tags
# value that is equal to $value or to an array containing $value ?
def hasTag($value):
any(.. | select(type=="object") | .tags;
. == $value or (type == "array" and index($value)));
Assuming the top-level JSON entity is a JSON object, we can now simply write:
with_entries( select( .value | hasTag("internal") | not) )
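Put together and run on a shortened version of the first sample, the helper-based solution might be exercised as follows. The trimmed input and the shell function name drop_internal are made up for illustration.

```shell
# drop_internal removes every top-level entry whose value contains,
# at any depth, an object tagged "internal".
drop_internal() {
  jq -c '
    def hasTag($value):
      any(.. | select(type=="object") | .tags;
          . == $value or (type == "array" and index($value)));
    with_entries(select(.value | hasTag("internal") | not))'
}

printf '%s' '{"element1":"Test Element 1","element2":{"tags":"internal"},"element3":{"function1":{"tags":["new","internal"]}},"element4":{"function2":{"tags":"new"}}}' \
  | drop_internal
# {"element1":"Test Element 1","element4":{"function2":{"tags":"new"}}}
```

Because the test is applied to top-level entries only, a tag nested deep inside a value removes that whole top-level entry. For the extended input in the question this would drop structure1 entirely rather than just element6.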

How to check for null or empty in jq and substitute for empty string in jq transformation

How can I check for a null or empty value in jq and substitute a default string during a jq transformation?
Given the JSON below, this is my jq filter so far:
JQ:
.amazon.items[] | select(.name | contains ("shoes")) as $item |
{
activeItem: .amazon.activeitem,
item : {
id : $item.id,
state : $item.state,
status : if [[ $item.status = "" or $item.status = null ]];
then 'IN PROCESS' ; else $item.status end
}
}
JSON:
{
"amazon": {
"activeitem": 2,
"items": [
{
"id": 1,
"name": "harry potter",
"state": "sold"
},
{
"id": 2,
"name": "adidas shoes",
"state": "in inventory"
},
{
"id": 3,
"name": "watch",
"state": "returned"
},{
"id": 4,
"name": "Nike shoes",
"state": "in inventory"
}
]
}
}
I want to add a default string "In Process" if the status is empty or Null.
Based on the item condition, I use the query below and take the first object from the filtered results:
.amazon.items[] | select(.name | contains ("shoes"))
Expected Output:
{
"activeitem": 2,
"item": {
"id": 2,
"name": "adidas shoes",
"state": "in inventory",
"status": "IN PROCESS"
}
}
The key here is to use |=:
.amazon.item.status |=
if . == null or . == ""
then "IN PROCESS"
else .
end
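One way to combine this update with the selection from the question is sketched below. It assumes the goal is the expected output shown in the question: the first item whose name contains "shoes", wrapped together with activeitem. The variable name $amazon, the shell function name first_shoes and the shortened input are made up for illustration.

```shell
# first_shoes picks the first matching item and fills in a default status.
first_shoes() {
  jq -c '
    .amazon as $amazon
    | first($amazon.items[] | select(.name | contains("shoes")))
    # A missing key reads as null, so the default also covers absent status.
    | .status |= (if . == null or . == "" then "IN PROCESS" else . end)
    | {activeitem: $amazon.activeitem, item: .}'
}

printf '%s' '{"amazon":{"activeitem":2,"items":[{"id":1,"name":"harry potter","state":"sold"},{"id":2,"name":"adidas shoes","state":"in inventory"},{"id":4,"name":"Nike shoes","state":"in inventory"}]}}' \
  | first_shoes
# {"activeitem":2,"item":{"id":2,"name":"adidas shoes","state":"in inventory","status":"IN PROCESS"}}
```

contains is case-sensitive, so an item named "Shoes" with a capital S would not match this selection.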