Creating a CSV from JSON using jq, based on elements in an array

I have the following JSON that I need to convert to CSV:
[{
"name": "joe",
"age": 21,
"skills": [{
"lang": "spanish",
"grade": "47",
"school": {
"name": "my school",
"url": "example.com/sp-school"
}
}, {
"lang": "english",
"grade": "87"
}]
},
{
"name": "sarah",
"age": 34,
"skills": [{
"lang": "french",
"grade": "47",
"school": {
"name": "my school",
"url": "example.com/sp-school"
}
}, {
"lang": "english",
"grade": "87"
}]
}, {
"name": "jim",
"age": 26,
"skills": [{
"lang": "spanish",
"grade": "60"
}, {
"lang": "english",
"grade": "66",
"school": {
"name": "eg school",
"url": "eg-school.com"
}
}]
}
]
The desired CSV output:
name,age,grade,school,url,file,line_number
joe,21,47,"my school","example.com/sp-school",sample.json,1
jim,26,60,"","",sample.json,3
That is: take the top-level fields, plus the object from the skills array where lang is "spanish", plus that object's school hash if it exists.
I'd also like to add the file and line number each record came from.
I'd like to use jq for the job, but I can't figure out the syntax. Can anyone help me out?

With your data in input.json, and the following jq program in tocsv.jq:
.[]
| [.name, .age] +
(.skills[]
| select(.lang == "spanish")
| [.grade, .school.name, .school.url, input_filename, input_line_number] )
| @csv
the invocation:
jq -r -f tocsv.jq input.json
yields:
"joe",21,"47","my school","example.com/sp-school","input.json",51
"jim",26,"60",,,"input.json",51
If you want the number-valued strings converted to numbers, you could use the "tonumber" filter. If you want the null-valued fields replaced by strings, use e.g. .school.name // ""
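Putting both of those together, a variant of tocsv.jq along those lines might read (a sketch; it drops the file/line columns for brevity):

```shell
# Sketch: grades converted to numbers, missing school fields as empty strings
jq -r '.[]
  | [.name, .age] +
    (.skills[]
     | select(.lang == "spanish")
     | [(.grade | tonumber), (.school.name // ""), (.school.url // "")])
  | @csv' input.json
```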
Of course this approach doesn't yield a very useful line number. One approach that would yield higher granularity would be to stream the individual objects into jq, but then you'd lose the filename. To recover the filename you could pass it in as an argument. So you would have a pipeline like so:
jq -c '.[]' input.json | jq -r --arg file input.json -f tocsv2.jq
where tocsv2.jq would be like tocsv.jq above, but without the initial .[] |, and with $file instead of input_filename.
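Under that arrangement, the pipeline might look like this, with the body of tocsv2.jq inlined (a sketch; since each object now arrives on its own line, input_line_number reflects the record's position in the stream):

```shell
# Stage 1 puts one compact object per line; stage 2 formats each as a CSV row
jq -c '.[]' input.json |
jq -r --arg file input.json '
  [.name, .age] +
  (.skills[]
   | select(.lang == "spanish")
   | [.grade, .school.name, .school.url, $file, input_line_number])
  | @csv'
```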
Finally, please also consider using the TSV format (@tsv) rather than the messy CSV format (@csv).


How to dynamically update one JSON object and put it back into the original JSON objects variable?
I have one variable with the following JSON data in it.
test='[
{
"Name": "James",
"Mobile": 12345678,
"Gender": "Male",
"Boolean": true,
"Pet": "cat"
},
{
"Name": "John",
"Mobile": 1234567875,
"Gender": "Male",
"Boolean": true,
"Pet": "rat"
},
{
"Name": "Jennifer",
"Mobile": 1234567890,
"Gender": "Female",
"Boolean": true,
"Pet": "Dog"
},
{
"Name": "Julia",
"Mobile": 1234567890,
"Gender": "Female",
"Boolean": true,
"Pet": "Dog"
},
{
"Name": "Jeff",
"Mobile": 9871234567890,
"Gender": "Male",
"Boolean": true,
"Pet": "Fish"
},
{
"Name": "Jones",
"Mobile": 79871234567890,
"Gender": "Female",
"Boolean": true,
"Pet": "Parrot"
}
]'
items=$(echo "$test" | jq -c -r '.[]')
for item in ${items[@]}; do
uName=$(echo $item | jq -r '.Name')
if [ "$uName" == "John" ]; then
echo "$item"
echo " "
modifiedTest=$(echo "$item" | jq '.Name = "Tom"')
modifiedTest=$(echo "$modifiedTest" | jq '.Pet = "rabbit"')
echo "$modifiedTest"
fi
done
Now let's say we have picked the second JSON object below from the above JSON objects:
{
"Name": "John",
"Mobile": 1234567875,
"Gender": "Male",
"Boolean": true,
"Pet": "rat"
}
We have updated the picked JSON object's fields as below:
{
"Name": "Tom",
"Mobile": 1234567875,
"Gender": "Male",
"Boolean": true,
"Pet": "rabbit"
}
Now, how can we add/update the modified JSON object back into the original list variable 'test' at the exact position (2nd, in this case), but selecting it dynamically with a filter of Name=John, since we don't know the object's exact index in advance, using bash scripting?
The tool jq can be used for JSON-manipulation:
jq '.[1].Name = "Tom" | .[1].Pet = "rabbit"' data.json
This will output the modified file on the console.
Note that in general jq [filter] data.json > data.json will not work and even when it seems to, overwriting the input file in this way should be avoided. One option would be to use a shell variable:
json_data=$(jq '.[1].Name = "Tom" | .[1].Pet = "rabbit"' data.json)
echo "$json_data" > data.json
Another option would be to use a temporary file; still another would be to use a utility such as sponge in moreutils.
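The temporary-file route might be sketched like so: write to a scratch file, then rename it over the original only if jq succeeded.

```shell
# Edit data.json in place via a temporary file; mv only runs if jq exited 0
tmp=$(mktemp) &&
jq '.[1].Name = "Tom" | .[1].Pet = "rabbit"' data.json > "$tmp" &&
mv "$tmp" data.json
```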
Note that your shown file is not valid JSON and so jq will not be able to read it as JSON. To fix it, I have surrounded everything by [ and ] and removed the extra comma in the John object.
What if we don't know the exact index position of this object and want to use a filter of 'Name=John'?
< data.json jq '
(map(.Name)| index("John")) as $ix
| (select($ix)
| .[$ix] |= (.Name = "Tom" | .Pet = "rabbit")) // .
' | sponge data.json
But you might want to backup data.json first.

Finding highest value in a specific JSON field in bash

I am writing a bash script that curls POST an API. The response from the post has values returned in the following format:
{
"other": "irrelevant-fields",
"results": [
{
"datapoints": [
{"timestamp": 1555977600, "value": 0},
{"timestamp": 1555984800, "value": 15},
{"timestamp": 1555992000, "value": 5}
]
}
]
}
I want to extract the highest figure from the "value" fields, but I am having problems writing this code in bash. I am a beginner with JSON, and there are no real references I can use to filter out the strings and values I don't need: each array entry is the same except for the timestamp, and I don't care about the timestamp, just the highest value returned.
My current code is just a generic way to extract the largest number from a file in bash:
grep -Eo '[[:digit:]]+' | sort -n | tail -n 1
...but instead of 15, that returns 1555992000.
echo '
{
"other": "irrelevant-fields",
"results": [
{
"datapoints": [
{"timestamp": 1555977600, "value": 0},
{"timestamp": 1555984800, "value": 15},
{"timestamp": 1555992000, "value": 5}
]
}
]
}
' | jq '.results[].datapoints | max_by(.value)'
The output will be like this:
{
"timestamp": 1555984800,
"value": 15
}
For more information, see the program's home page at https://stedolan.github.io/jq/
Please process JSON with a proper JSON interpreter/parser, like Xidel.
$ cat <<EOF | xidel -s - -e '$json/max((.//datapoints)()/value)'
{
"other": "irrelevant-fields",
"results": [
{
"datapoints": [
{"timestamp": 1555977600, "value": 0},
{"timestamp": 1555984800, "value": 15},
{"timestamp": 1555992000, "value": 5}
]
}
]
}
EOF
This returns 15.
(or in full: -e '$json/max((results)()/(datapoints)()/value)')

Use jq to Convert json File to csv

I am using curl to pull Alien Vault OTX pulses from their API. The initial output I receive is in JSON, and I need to convert this JSON into CSV so it can be read by some other software. I aim to use jq, as many others have recommended it.
{ "count": 1210, "next": "https://otx.alienvault.com/api/v1/pulses/subscribed?page=2", "results": [
{
"industries": [],
"tlp": "white",
"description": "Tropic Trooper (also known as KeyBoy) levels its campaigns against Taiwanese, Philippine, and Hong Kong targets, focusing on their government, healthcare, transportation, and high-tech industries. Its operators are believed to be very organized and develop their own cyberespionage tools that they fine-tuned in their recent campaigns. Many of the tools they use now feature new behaviors, including a change in the way they maintain a foothold in the targeted network.",
"created": "2018-03-14T17:24:48.014000",
"tags": [
"china",
"keyboy",
"tropic trooper"
],
"modified": "2018-03-14T17:24:48.014000",
"author_name": "AlienVault",
"public": 1,
"extract_source": [],
"references": [
"https://blog.trendmicro.com/trendlabs-security-intelligence/tropic-trooper-new-strategy/"
],
"targeted_countries": [],
"indicators": [
{
"indicator": "CVE-2018-0802",
"description": "",
"created": "2018-03-14T17:25:03",
"title": "",
"content": "",
"type": "CVE",
"id": 406248965
},
{
"indicator": "fb9c9cbf6925de8c7b6ce8e7a8d5290e628be0b82a58f3e968426c0f734f38f6",
"description": "",
"created": "2018-03-14T17:25:03",
"title": "",
"content": "",
"type": "FileHash-SHA256",
"id": 438581959
}
],
"more_indicators": false,
"revision": 1,
"adversary": "Tropic Trooper",
"id": "5aa95ae02781860367e354e4",
"name": "Tropic Troopers New Strategy"
}
I am looking to use jq to extract certain fields and convert to csv. My expected output would look something like:
"CVE-2018-0802","CVE"
"tibetnews.today","domain"
"02281e26e89b61d84e2df66a0eeb729c5babd94607b1422505cd388843dd5456","FileHash-SHA256"
So far I have tried:
<AV.json jq -r '.results.indicators[] | [.indicator, .type] | @csv' AV.csv
Any help is greatly appreciated.
Cheers,
George
.results is an array so you'll have to expand it too. This can be done either by:
.results[] | .indicators[] | [.indicator, .type] | @csv
or more compactly:
.results[].indicators[] | [.indicator, .type] | @csv
You'll also have to direct the output to the designated file, e.g.:
jq -r -f program.jq < AV.json > AV.csv
Output
"CVE-2018-0802","CVE"
"fb9c9cbf6925de8c7b6ce8e7a8d5290e628be0b82a58f3e968426c0f734f38f6","FileHash-SHA256"

reshape json data using jq

I'm trying to reshape a JSON document and I assumed it would be easy to do using jq, but I have been trying for several hours now with no success ...
(Please note that I'm not a jq jedi and the doc did not help)
I want to go from this :
{
"results": [
{
"profile": {
"birthYear": 1900,
"locale": "en_EN",
"city": "Somewhere, Around",
"timezone": "2",
"age": 52,
"gender": "m"
},
"UID": "SQSQSQerl7XSQSqSsqSQ"
}
]
}
to this :
{
"birthYear": 1900,
"locale": "en_EN",
"city": "Somewhere, Around",
"timezone": "2",
"age": 52,
"gender": "m",
"UID": "SQSQSQerl7XSQSqSsqSQ"
}
I got the output below using this filter: .results[].profile , .results[].UID
{
"birthYear": 1900,
"locale": "en_EN",
"city": "Somewhere, Around",
"timezone": "2",
"age": 52,
"gender": "m"
}
"UID": "SQSQSQerl7XSQSqSsqSQ"
Thanks in advance for your help.
You can combine two objects with the addition operator.
jq '.results[] | .profile + {UID}'
.profile is already an object.
The other object is created with {}. {UID} is shorthand for {"UID" : .UID}
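If results could contain more than one entry, map keeps the combined objects together in a single array (a sketch of the same idea):

```shell
# One merged object per "results" entry, wrapped in an array
echo '{"results":[{"profile":{"age":52,"gender":"m"},"UID":"SQSQSQerl7XSQSqSsqSQ"}]}' |
jq '.results | map(.profile + {UID})'
```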
There are probably better ways, but here you go:
jq '.results[0].profile * .results[0] | del(.profile)'
Explanation:
Merge the container recursively with the nested container by means of A * B, then pipe to del() to remove the nested container.

Select or exclude multiples object with an array of IDs

I have the following JSON :
[
{
"id": "1",
"foo": "bar-a",
"hello": "world-a"
},
{
"id": "2",
"foo": "bar-b",
"hello": "world-b"
},
{
"id": "10",
"foo": "bar-c",
"hello": "world-c"
},
{
"id": "42",
"foo": "bar-d",
"hello": "world-d"
}
]
And I have the following array stored in a variable: ["1", "2", "56", "1337"] (note the IDs are strings, and may contain any regular character).
So, thanks to this SO answer, I found a way to filter my original data. jq '[.[] | select(.id == ("1", "2", "56", "1337"))]' ./data.json (note the array is surrounded by parentheses and not brackets) produces:
[
{
"id": "1",
"foo": "bar-a",
"hello": "world-a"
},
{
"id": "2",
"foo": "bar-b",
"hello": "world-b"
}
]
But I would also like to do the opposite (basically excluding IDs instead of selecting them). Using select(.id != ("1", "2", "56", "1337")) doesn't work, and jq '[. - [.[] | select(.id == ("1", "2", "56", "1337"))]]' ./data.json seems very ugly, and it doesn't work with my actual data (an output of aws ec2 describe-instances).
So have you any idea to do that? Thank you!
To include them, you need to verify that the id is any of the values in the keep set.
$ jq --argjson include '["1", "2", "56", "1337"]' 'map(select(.id == $include[]))' ...
To exclude them, you need to verify that all values are not in your excluded set. But it might just be easier to take the original set and remove the items that are in the excluded set.
$ jq --argjson exclude '["1", "2", "56", "1337"]' '. - map(select(.id == $exclude[]))' ...
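An alternative sketch uses index for an exact membership test; unlike contains/inside, index compares whole values, so an id like "13" cannot be mistaken for a substring of "1337":

```shell
# Keep only objects whose id is NOT in the excluded list (exact matches only)
jq --argjson exclude '["1", "2", "56", "1337"]' \
   'map(select(.id as $id | $exclude | index($id) | not))' ./data.json
```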
Here is a solution that uses inside. Assuming you run jq as
jq -M --argjson IDS '["1","2","56","1337"]' -f filter.jq data.json
This filter.jq
map( select([.id] | inside($IDS)) )
produces the ids from data.json that are in the $IDS array:
[
{
"id": "1",
"foo": "bar-a",
"hello": "world-a"
},
{
"id": "2",
"foo": "bar-b",
"hello": "world-b"
}
]
and this filter.jq
map( select([.id] | inside($IDS) | not) )
produces the ids from data.json that are not in the $IDS array:
[
{
"id": "10",
"foo": "bar-c",
"hello": "world-c"
},
{
"id": "42",
"foo": "bar-d",
"hello": "world-d"
}
]