Use jq to Convert json File to csv

I am using curl to pull Alien Vault OTX pulses from their API. The initial output I receive is in JSON format, and I need to convert it to CSV so that it can be read by some other software. I aim to use jq, as many others have recommended it.
{ "count": 1210, "next": "https://otx.alienvault.com/api/v1/pulses/subscribed?page=2", "results": [
{
"industries": [],
"tlp": "white",
"description": "Tropic Trooper (also known as KeyBoy) levels its campaigns against Taiwanese, Philippine, and Hong Kong targets, focusing on their government, healthcare, transportation, and high-tech industries. Its operators are believed to be very organized and develop their own cyberespionage tools that they fine-tuned in their recent campaigns. Many of the tools they use now feature new behaviors, including a change in the way they maintain a foothold in the targeted network.",
"created": "2018-03-14T17:24:48.014000",
"tags": [
"china",
"keyboy",
"tropic trooper"
],
"modified": "2018-03-14T17:24:48.014000",
"author_name": "AlienVault",
"public": 1,
"extract_source": [],
"references": [
"https://blog.trendmicro.com/trendlabs-security-intelligence/tropic-trooper-new-strategy/"
],
"targeted_countries": [],
"indicators": [
{
"indicator": "CVE-2018-0802",
"description": "",
"created": "2018-03-14T17:25:03",
"title": "",
"content": "",
"type": "CVE",
"id": 406248965
},
{
"indicator": "fb9c9cbf6925de8c7b6ce8e7a8d5290e628be0b82a58f3e968426c0f734f38f6",
"description": "",
"created": "2018-03-14T17:25:03",
"title": "",
"content": "",
"type": "FileHash-SHA256",
"id": 438581959
}
],
"more_indicators": false,
"revision": 1,
"adversary": "Tropic Trooper",
"id": "5aa95ae02781860367e354e4",
"name": "Tropic Troopers New Strategy"
}
I am looking to use jq to extract certain fields and convert to csv. My expected output would look something like:
"CVE-2018-0802","CVE"
"tibetnews.today","domain"
"02281e26e89b61d84e2df66a0eeb729c5babd94607b1422505cd388843dd5456","FileHash-SHA256"
So far I have tried:
<AV.json jq -r '.results.indicators[] | [.indicator, .type] | @csv' AV.csv
Any help is greatly appreciated.
Cheers,
George

.results is an array so you'll have to expand it too. This can be done either by:
.results[] | .indicators[] | [.indicator, .type] | @csv
or more compactly:
.results[].indicators[] | [.indicator, .type] | @csv
You'll also have to direct the output to the designated file, e.g.:
jq -r -f program.jq < AV.json > AV.csv
Output
"CVE-2018-0802","CVE"
"fb9c9cbf6925de8c7b6ce8e7a8d5290e628be0b82a58f3e968426c0f734f38f6","FileHash-SHA256"

Related

Read json values using sed or awk. I am not allowed to use jq

For the following json data, I need to retrieve the value of the status. I tried to look for examples online and adopt the same, but couldn't do it successfully as this json has arrays. Can you please help me retrieving the "status" in the following json?
This is how the jq version looks: echo $JSON | jq -r .data.affected_items[].status
I need the same using sed or awk.
{
"data": {
"affected_items": [
{
"os": {
"arch": "x86_64",
"major": "2",
"name": "Amazon Linux",
"platform": "amzn",
"uname": "Linux |ip-10-179-120-6.vpc.internal |4.14.256-197.484.amzn2.x86_64 |#1 SMP Tue Nov 30 00:17:50 UTC 2021 |x86_64",
"version": "2"
},
"manager": "wazuh-manager-worker-0",
"dateAdd": "2022-02-24T08:42:52Z",
"lastKeepAlive": "2022-03-08T04:33:44Z",
"group": [
"default"
],
"name": "ec2_us-west-2_279976188247_i-030ccd7d70b84f0ee",
"ip": "10.179.120.6",
"configSum": "ab73af41699f13fdd81903b5f23d8d00",
"node_name": "wazuh-manager-worker-0",
"status": "active",
"version": "Wazuh v4.1.5",
"mergedSum": "56dfa0edef630b932284df2f81bf4a1c",
"id": "006",
"registerIP": "any"
}
],
"total_affected_items": 1,
"total_failed_items": 0,
"failed_items": []
},
"message": "All selected agents information was returned",
"error": 0
}
If this isn't all you need:
$ sed -n 's/.*"status": \("[^"]*"\).*/\1/p' file
"active"
then edit your question to contain a better explanation of your requirements and more truly representative sample input/output that the above doesn't work for.
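If Python happens to be available where jq is not (an assumption; the question only rules out jq), a real JSON parser avoids the fragility of matching text with sed. A minimal sketch, with the helper name statuses and the trimmed sample made up for illustration:

```python
import json


def statuses(text):
    """Return the status of every entry in data.affected_items."""
    doc = json.loads(text)
    return [item["status"] for item in doc["data"]["affected_items"]]


# Trimmed version of the sample from the question.
sample = """
{
  "data": {
    "affected_items": [
      {"name": "ec2_us-west-2_279976188247_i-030ccd7d70b84f0ee",
       "status": "active",
       "version": "Wazuh v4.1.5"}
    ],
    "total_affected_items": 1
  },
  "error": 0
}
"""

for status in statuses(sample):
    print(status)
```

Unlike the sed one-liner, this does not depend on the exact spacing after "status": or on each key sitting on its own line.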

Combine files in jq based on similar ID object and reform data

Preface: If the following is not possible with jq, then I completely accept that as an answer and will try to force this with bash.
I have two files that contain some IDs that, with some massaging, should be able to be combined into a single file. I have some content that I'll add to that as well (as seen in output). Essentially "mitre_test" should get compared to "sys_id". When compared, the "mitreid" from in2.json becomes technique_ID in the output (and is generally the unifying field of each output object).
Caveats:
There are some junk "desc" values placed in the in1.json that are there to make sure this is as programmatic as possible, and there are actually numerous junk inputs on the true input file I am using.
Some of the mitre_test values are comma-separated pairs rather than real arrays. I can split on those and break them out, but then I find myself losing the other information from in1.json.
Notice that the "metadata" in the output contains the "number" values from in1.json, stored in a weird way (but the way that the receiving tool requires).
in1.json
[
{
"test": "Execution",
"mitreid": "T1204.001",
"mitre_test": "90b"
},
{
"test": "Defense Evasion",
"mitreid": "T1070.001",
"mitre_test": "afa"
},
{
"test": "Credential Access",
"mitreid": "T1556.004",
"mitre_test": "14b"
},
{
"test": "Initial Access",
"mitreid": "T1200",
"mitre_test": "f22"
},
{
"test": "Impact",
"mitreid": "T1489",
"mitre_test": "fa2"
}
]
in2.json
[
{
"number": "REL0001346",
"desc": "apple",
"mitre_test": "afa"
},
{
"number": "REL0001343",
"desc": "pear",
"mitre_test": "90b"
},
{
"number": "REL0001366",
"desc": "orange",
"mitre_test": "14b,f22"
},
{
"number": "REL0001378",
"desc": "pineapple",
"mitre_test": "90b"
}
]
The output:
[{
"techniqueID": "T1070.001",
"tactic": "defense-evasion",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001346"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1204.001",
"tactic": "execution",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001343"
},
{
"name": "DET_ID",
"value": "REL0001378"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1556.004",
"tactic": "credential-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
},
{
"techniqueID": "T1200",
"tactic": "initial-access",
"score": 1,
"color": "",
"comment": "",
"enabled": true,
"metadata": [{
"name": "DET_ID",
"value": "REL0001366"
}],
"showSubtechniques": true
}
]
I'm assuming I have some splitting to do on mitre_test with something like .mitre_test |= split(","), and I assume there are some joins involved, but doing so causes data loss or mixes up the data. You'll notice the static data in the output as well, but that is likely easy to add and as such isn't as much of an issue.
Edit: reduced some of the match IDs so that it is easier to look at while analyzing the in1 and in2 files. Also simplified the two inputs to have a similar structure so that the answer is easier to understand later.
The requirements are somewhat opaque but it's fairly clear that if the task can be done by computer, it can be done using jq.
From the description, it would appear that one of the unusual aspects of the problem is that the "dictionary" defined by in1.json must be derived by splitting the key names that are CSV (comma-separated values). Here therefore is a jq def that will do that:
# Input: a JSON dictionary for which some keys are CSV,
# Output: a JSON dictionary with the CSV keys split on the commas
def refine:
. as $in
| reduce keys_unsorted[] as $k ({};
if ($k|index(","))
then ($k/",") as $keys
| . + ($keys | map( {(.): $in[$k]}) | add)
else .[$k] = $in[$k]
end );
You can see how this works by running:
INDEX($mitre.records[]; .mitre_test) | refine
using an invocation of jq such as:
jq --argfile mitre in1.json -f program.jq in2.json
For the joining part of the problem, there are many relevant Q&As on SO, e.g.
How to join JSON objects on particular fields using jq?
There is probably a much more elegant way to do this, but I ended up manually walking around things and piping to new output.
Explanation:
Read in both files, pull the fields I need.
Break out the mitre_test values that were previously just a comma separated set of values with map and try.
Store the unchanging fields as a variable and then manipulate mitre_test to become an appropriately split array, removing nulls.
Group by mitre_test values, since they are the common thing that the output is based on.
Cleanup more nulls.
Sort output to look like I want it.
jq . in1.json in2.json | \
jq '.[] |{number: .number, test: .test, mitreid: .mitreid, mitre_test: .mitre_test}' |\
jq -s '[. |map(try(.mitre_test |= split(",")) // .)|
.[] | [.number,.test,.mitreid] as $h | .mitre_test[] |$h + [.] |
{DET_ID: .[0], tactic: .[1], techniqueID: .[2], mitre_test: .[3]}] |
del(.[][] | nulls)' |jq '[group_by(.mitre_test)[]|{mitre_test: .[0].mitre_test, techniqueID: [.[].techniqueID],tactic: [.[].tactic], DET_ID: [.[].DET_ID]}]|
del(.[].techniqueID[] | nulls) | del(.[].tactic[] | nulls) | del(.[].DET_ID[] | nulls)' | \
jq '.[]| [{techniqueID: .techniqueID[0],tactic: .tactic[0], metadata: [{name: "DET_ID",value: .DET_ID[]}]}] | .[] |
select((.metadata|length)>0)'
It was a long line, so I split it among some of the basic ideas.
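For comparison, the split-then-join logic described in the steps above can be sketched in Python. This is only an illustration under stated assumptions: the names build_layer, techniques and detections are made up here, the two lists are typed in from the sample files shown in the question (in the real task they would be loaded with json.load), and the tactic is derived by lower-casing the "test" value and replacing spaces with hyphens, which matches the sample output but is not confirmed as a general rule.

```python
import json

# Sample data from the question: one list carries the technique IDs,
# the other carries the detection numbers.
techniques = [
    {"test": "Execution", "mitreid": "T1204.001", "mitre_test": "90b"},
    {"test": "Defense Evasion", "mitreid": "T1070.001", "mitre_test": "afa"},
    {"test": "Credential Access", "mitreid": "T1556.004", "mitre_test": "14b"},
    {"test": "Initial Access", "mitreid": "T1200", "mitre_test": "f22"},
    {"test": "Impact", "mitreid": "T1489", "mitre_test": "fa2"},
]
detections = [
    {"number": "REL0001346", "desc": "apple", "mitre_test": "afa"},
    {"number": "REL0001343", "desc": "pear", "mitre_test": "90b"},
    {"number": "REL0001366", "desc": "orange", "mitre_test": "14b,f22"},
    {"number": "REL0001378", "desc": "pineapple", "mitre_test": "90b"},
]


def build_layer(techniques, detections):
    """Join the two lists on mitre_test and emit one object per technique."""
    # Step 1: split comma-separated IDs and collect detection numbers per ID.
    numbers_by_test = {}
    for det in detections:
        for test_id in det["mitre_test"].split(","):
            numbers_by_test.setdefault(test_id.strip(), []).append(det["number"])

    # Step 2: attach the collected numbers to each technique that has any.
    layer = []
    for tech in techniques:
        numbers = numbers_by_test.get(tech["mitre_test"], [])
        if not numbers:
            continue  # no detection references this technique
        layer.append({
            "techniqueID": tech["mitreid"],
            "tactic": tech["test"].lower().replace(" ", "-"),
            "score": 1,
            "color": "",
            "comment": "",
            "enabled": True,
            "metadata": [{"name": "DET_ID", "value": n} for n in numbers],
            "showSubtechniques": True,
        })
    return layer


print(json.dumps(build_layer(techniques, detections), indent=2))
```

Building the lookup table first means the comma-separated values are split exactly once, and nothing from either record is lost when the two sides are combined. The objects come out in the order of the technique list rather than the order shown in the question.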

How can I parse nested JSON in PowerShell?

I'm trying to parse the results of a cURL command and the information I need is in a structure.
I tried getting to the data unsuccessfully and tried converting to PS Object but not sure how to access the structure as I'm new to PS.
Below is a sample of our cURL response.
I have a git commit hash ('c64a568399a572e82c223d55cb650b87ea1c22b8' matches latestCommit in fromRef for entry id 1101) and I need to find the corresponding displayId ('develop' in toRef)
I've done this in Linux using jq but need to replicate this in PS.
jq '.values | map(select(.fromRef.latestCommit=="'"$HASH"'")) | .[0].toRef.displayId'
I'm having 2 issues.
I can get to fromRef but it looks like @{id=refs/heads/feature/add-support; displayId=feature/add-support; latestCommit=c64a568399a572e82c223d55cb650b87ea1c22b8; repository=} and I cannot figure out how to parse it
I'm not sure how to get the id so I can find the correct corresponding toRef
Any help would be greatly appreciated.
{
"size": 15,
"limit": 20,
"isLastPage": true,
"values": [
{
"id": 1101,
"version": 0,
"title": "Added header",
"description": "Added notes in header",
"state": "OPEN",
"open": true,
"closed": false,
"createdDate": 1595161367863,
"updatedDate": 1595161367863,
"fromRef": "#{id=refs/heads/feature/add-support; displayId=feature/add-support; latestCommit=c64a568399a572e82c223d55cb650b87ea1c22b8; repository=}",
"toRef": "#{id=refs/heads/develop; displayId=develop; latestCommit=58b3e3482bb35f3a735048849c2474cc676fbd9b; repository=}",
"locked": false,
"author": "#{user=; role=AUTHOR; approved=False; status=UNAPPROVED}",
"reviewers": " ",
"participants": "",
"properties": "#{mergeResult=; resolvedTaskCount=0; openTaskCount=0}",
"links": "#{self=System.Object[]}"
},
{
"id": 1053,
"version": 4,
"title": "Help with checking,",
"description": "fixed up code.",
"state": "OPEN",
"open": true,
"closed": false,
"createdDate": 1591826401310,
"updatedDate": 1595018917357,
"fromRef": "#{id=refs/heads/bugfix/checking-2.7; displayId=bugfix/checking-2.7; latestCommit=cf7d8860262c6a46b0b65ef5b6d66ae8cd698b75; repository=}",
"toRef": "#{id=refs/heads/hotfix/2.7_Improvements; displayId=hotfix/2.7_Improvements; latestCommit=01f1100c559ba41ec317421399c3bfb9a0aea91f; repository=}",
"locked": false,
"author": "#{user=; role=AUTHOR; approved=False; status=UNAPPROVED}",
"reviewers": " ",
"participants": "",
"properties": "#{mergeResult=; resolvedTaskCount=0; commentCount=4; openTaskCount=0}",
"links": "#{self=System.Object[]}"
}
],
"start": 0
}
Once you have converted the raw response with ConvertFrom-Json, you can get the values of the returned object quite easily in PowerShell. (The "@{...}" strings in your sample are what ConvertTo-Json emits when it serializes an object without a sufficient -Depth value.)
Let's say you have used something like $json = $curlResult | ConvertFrom-Json, then finding the displayId from the corresponding toRef can be done like this:
# this is the known hashvalue of the `fromRef` value to look for
$latestCommitHash = "c64a568399a572e82c223d55cb650b87ea1c22b8"
# get the value item. from here you can get all other properties belonging to that item
$valueItem = $json.values | Where-Object { $_.fromRef.latestCommit -eq $latestCommitHash }
# get the displayId value of the corresponding 'toRef' element:
$displayId = $valueItem.toRef.displayId
Returns
develop
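The lookup itself is the same select-then-project step as in the jq one-liner from the question. As a language-neutral cross-check, here is a minimal Python sketch; the function name target_branch is hypothetical, and the sample assumes the response has been parsed so that fromRef and toRef are nested objects rather than "@{...}" strings.

```python
def target_branch(doc, commit_hash):
    """Return toRef.displayId of the first entry whose fromRef.latestCommit matches."""
    for pull_request in doc["values"]:
        if pull_request["fromRef"]["latestCommit"] == commit_hash:
            return pull_request["toRef"]["displayId"]
    return None  # no entry was built from this commit


# Trimmed sample with properly nested fromRef/toRef objects.
doc = {
    "values": [
        {
            "id": 1101,
            "fromRef": {
                "displayId": "feature/add-support",
                "latestCommit": "c64a568399a572e82c223d55cb650b87ea1c22b8",
            },
            "toRef": {
                "displayId": "develop",
                "latestCommit": "58b3e3482bb35f3a735048849c2474cc676fbd9b",
            },
        },
        {
            "id": 1053,
            "fromRef": {
                "displayId": "bugfix/checking-2.7",
                "latestCommit": "cf7d8860262c6a46b0b65ef5b6d66ae8cd698b75",
            },
            "toRef": {
                "displayId": "hotfix/2.7_Improvements",
                "latestCommit": "01f1100c559ba41ec317421399c3bfb9a0aea91f",
            },
        },
    ]
}

print(target_branch(doc, "c64a568399a572e82c223d55cb650b87ea1c22b8"))
```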

Parse JSON from Raw format to CSV for a total novice

I'm looking for a way to parse RAW JSON into CSV and I'm a total novice with anything related to coding, programming, etc. I've found a site https://json-csv.com/ that does exactly what I need but the data sets I'm parsing are bigger than their free amount so I basically pay $10 a month for something I believe could be done by way of macro or something I could figure out.
I'm essentially looking for a quick way to parse this below chunk into a structured, column based detail. The columns would be: Key, Value, Context_Geography, Context_CompanyID, Context_ProductID, Description, Created by, Updated by, updated date.
{"policies":[{"key":"viaPayEnabledRates","value":"","context":{"geography":"","companyID":"","productID":""},"created_by":"0","updated_by":"0","updated_date":"2014-03-24T21:22:25.420+0000"},{"key":"viaPayEnabledRates","value":"[\"WSPNConsortia\",\"WSPNNegotiated\",\"WSPNPublished\"]","context":{"geography":"","companyID":"*","productID":"60003"},"description":"Central Payment Pilot","created_by":"10130590","updated_by":"10130590","updated_date":"2016-04-05T07:51:29.043+0000"}
Here is a solution using jq
If the file filter.jq contains
def headers:
[
"Key", "Value", "Context_Geography", "Context_CompanyID", "Context_ProductID",
"Description", "Created by", "Updated by", "updated date"
]
;
def fields:
[
.key, .value, .context.geography, .context.companyID, .context.productID,
.description, .created_by, .updated_by, .updated_date
]
;
headers, (.policies[] | fields)
| @csv
and the file data.json contains your sample data
{
"policies": [
{
"key": "viaPayEnabledRates",
"value": "",
"context": {
"geography": "",
"companyID": "",
"productID": ""
},
"created_by": "0",
"updated_by": "0",
"updated_date": "2014-03-24T21:22:25.420+0000"
},
{
"key": "viaPayEnabledRates",
"value": "[\"WSPNConsortia\",\"WSPNNegotiated\",\"WSPNPublished\"]",
"context": {
"geography": "",
"companyID": "*",
"productID": "60003"
},
"description": "Central Payment Pilot",
"created_by": "10130590",
"updated_by": "10130590",
"updated_date": "2016-04-05T07:51:29.043+0000"
}
]
}
then running jq as
jq -M -r -f filter.jq data.json
produces the output
"Key","Value","Context_Geography","Context_CompanyID","Context_ProductID","Description","Created by","Updated by","updated date"
"viaPayEnabledRates","","","","",,"0","0","2014-03-24T21:22:25.420+0000"
"viaPayEnabledRates","[""WSPNConsortia"",""WSPNNegotiated"",""WSPNPublished""]","","*","60003","Central Payment Pilot","10130590","10130590","2016-04-05T07:51:29.043+0000"

Creating a CSV from json using jq, based on elements in array

I have the following json format that I need to convert to CSV
[{
"name": "joe",
"age": 21,
"skills": [{
"lang": "spanish",
"grade": "47",
"school": {
"name": "my school",
"url": "example.com/sp-school"
}
}, {
"lang": "english",
"grade": "87"
}]
},
{
"name": "sarah",
"age": 34,
"skills": [{
"lang": "french",
"grade": "47",
"school": {
"name": "my school",
"url": "example.com/sp-school"
}
}, {
"lang": "english",
"grade": "87"
}]
}, {
"name": "jim",
"age": 26,
"skills": [{
"lang": "spanish",
"grade": "60"
}, {
"lang": "english",
"grade": "66",
"school": {
"name": "eg school",
"url": "eg-school.com"
}
}]
}
]
to convert to csv
name,age,grade,school,url,file,line_number
joe,21,47,"my school","example.com/sp-school",sample.json,1
jim,26,60,"","",sample.json,3
In other words: output the top-level fields plus the entry from the skills array whose lang is spanish, along with that entry's school object if it exists.
I'd also like to add the file and line number it came from.
I would like to use jq for the job, but can't figure out the syntax. Can anyone help me out?
With your data in input.json, and the following jq program in tocsv.jq:
.[]
| [.name, .age] +
(.skills[]
| select(.lang == "spanish")
| [.grade, .school.name, .school.url, input_filename, input_line_number] )
| @csv
the invocation:
jq -r -f tocsv.jq input.json
yields:
"joe",21,"47","my school","example.com/sp-school","input.json",51
"jim",26,"60",,,"input.json",51
If you want the number-valued strings converted to numbers, you could use the "tonumber" filter. If you want the null-valued fields replaced by strings, use e.g. .school.name // ""
Of course this approach doesn't yield a very useful line number. One approach that would yield higher granularity would be to stream the individual objects into jq, but then you'd lose the filename. To recover the filename you could pass it in as an argument. So you would have a pipeline like so:
jq -c '.[]' input.json | jq -r --arg file input.json -f tocsv2.jq
where tocsv2.jq would be like tocsv.jq above but without the initial .[] |, and with $file instead of input_filename.
Finally, please also consider using the TSV format (@tsv) rather than the rather messy CSV format (@csv).
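To make the per-record numbering idea concrete, here is a minimal Python sketch of the same selection. It is an illustration only: spanish_rows and rows_to_csv are made-up names, the sample is trimmed from the question, and the last column is the 1-based position of the record in the top-level array (matching the 1 and 3 in the desired output), not a physical line number in the file.

```python
import csv
import io


def spanish_rows(people, filename):
    """Yield one row per person who has a skill with lang == "spanish"."""
    for record_number, person in enumerate(people, start=1):
        for skill in person.get("skills", []):
            if skill.get("lang") != "spanish":
                continue
            school = skill.get("school", {})  # may be absent
            yield [
                person["name"], person["age"], skill["grade"],
                school.get("name", ""), school.get("url", ""),
                filename, record_number,
            ]


def rows_to_csv(rows):
    """Serialize rows with a header line, quoting only where needed."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "age", "grade", "school", "url", "file", "line_number"])
    writer.writerows(rows)
    return buf.getvalue()


# Trimmed sample from the question.
people = [
    {"name": "joe", "age": 21, "skills": [
        {"lang": "spanish", "grade": "47",
         "school": {"name": "my school", "url": "example.com/sp-school"}},
        {"lang": "english", "grade": "87"}]},
    {"name": "sarah", "age": 34, "skills": [
        {"lang": "french", "grade": "47",
         "school": {"name": "my school", "url": "example.com/sp-school"}},
        {"lang": "english", "grade": "87"}]},
    {"name": "jim", "age": 26, "skills": [
        {"lang": "spanish", "grade": "60"},
        {"lang": "english", "grade": "66",
         "school": {"name": "eg school", "url": "eg-school.com"}}]},
]

print(rows_to_csv(spanish_rows(people, "sample.json")), end="")
```

Passing delimiter="\t" to csv.writer would produce tab-separated output instead, in the spirit of the TSV suggestion above.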