Columnar CSV Output from nested JSON

[
{
"name": "Metadata:MER-2.0-ver AGYW_PREV-Results (Semi Annual)",
"id": "XOPEXepA7zg",
"categoryOptions.name": [
"0 -2 month",
">2months-<1 year",
"< 1 year",
"(1 - 4) Years",
"(1-9) Years"
],
"categoryOptions.id": [
"wfvXckoyaE9",
"Yi2K2FUDa3B",
"kKt6hryCX75",
"A0B8w6HoZvV",
"upbvx1IvICR"
]
},
{
"name": "Metadata:MER-2.0-ver KP-Results (Semi Annual)",
"id": "k9p3Ghbi6eW",
"categoryOptions.name": [
"Sex Workers",
"People in prisons and other enclosed settings (Incarcerated Population) ",
"PWID..",
"MSM",
"Transgender"
],
"categoryOptions.id": [
"mwTwhESK21T",
"eQjIwsDqbPy",
"zYaPQA3uTiH",
"vu0dG7psM5W",
"Jyo9XWumVtZ"
]
},
{
"name": "Metadata:MER-2.0-ver PP-Results (Semi Annual)",
"id": "rkExsSSc3yI",
"categoryOptions.name": [
"Adolescents (10-24)",
"Clients of Sex Workers",
"Displaced Persons",
"Fishing communities",
"Military and other Uniform Services"
],
"categoryOptions.id": [
"yWwp6xnt0pw",
"jlKwW6DC023",
"wF42hb47Z7J",
"qkIUghy30Vl",
"Vcuw6LkdAkk"
]
},
{
"name": "Metadata:MER-2.0-ver PREP_CURR-and-TX_ML (Semi Annual)",
"id": "ZYdO3FqQgo1",
"categoryOptions.name": [
"Adolescents (10-24)",
"Clients of Sex Workers",
"Displaced Persons",
"Fishing communities",
"Military and other Uniform Services"
],
"categoryOptions.id": [
"yWwp6xnt0pw",
"jlKwW6DC023",
"wF42hb47Z7J",
"qkIUghy30Vl",
"Vcuw6LkdAkk"
]
},
{
"name": "Metadata:MER-2.0-ver SupplyChain-Results (Semi Annual)",
"id": "Cub0DEVWs3P",
"categoryOptions.name": [
"TLD 30-count bottles",
"TLD 90-count bottles",
"TLD 180-count bottles",
"TLE/400 30-count bottles",
"TLE/400 90-count bottles"
],
"categoryOptions.id": [
"dtmTsLvH2dk",
"sOLj1z1XRxh",
"SnkZTF4kThV",
"sNnXSKiPvb5",
"t3iPChPFIcd"
]
}
]
The expected output is CSV in the following format:
key,name,id,"categoryOptions.name","categoryOptions.id"
0,Metadata:MER-2.0-ver AGYW_PREV-Results (Semi Annual),XOPEXepA7zg,0 -2 month,wfvXckoyaE9
0,Metadata:MER-2.0-ver AGYW_PREV-Results (Semi Annual),XOPEXepA7zg,>2months-<1 year,Yi2K2FUDa3B
1,Metadata:MER-2.0-ver KP-Results (Semi Annual),k9p3Ghbi6eW,Sex Workers,mwTwhESK21T
1,Metadata:MER-2.0-ver KP-Results (Semi Annual),k9p3Ghbi6eW,People in prisons and other enclosed settings (Incarcerated Population),eQjIwsDqbPy
2,Metadata:MER-2.0-ver PP-Results (Semi Annual),rkExsSSc3yI,Adolescents (10-24),yWwp6xnt0pw
2,Metadata:MER-2.0-ver PP-Results (Semi Annual),rkExsSSc3yI,Clients of Sex Workers,jlKwW6DC023
and so on, up to key 4.
The input JSON above was produced by the following command:
cat /home/fred/Downloads/metadata/multiple-dataset-metadata.json
| jq '[.dataSets[]
| {name: .name,id: .id,"categoryOptions.name": [.dataSetElements[].dataElement.categoryCombo.categories[].categoryOptions
[].name],"categoryOptions.id": [.dataSetElements[].dataElement.categoryCombo.categories[].categoryOptions[].id]}]'

Here is one solution to the problem as I understand it:
range(0;length) as $i
| .[$i]
| [$i, .name, .id] +
( range(0; .["categoryOptions.name"]|length) as $j
| [ .["categoryOptions.name"][$j], .["categoryOptions.id"][$j] ] )
| @csv
This produces everything except the header row, the production of which is left as an exercise.
Invocation
... would be along the lines of:
jq -r -f program.jq input.json

To add to @peak's solution:
The final invocation (with a CSV header) may look like this:
jq -r -f program.jq input.json > output.csv && sed -i '1i "key","name","id","categoryOptions.name","categoryOptions.id"' output.csv
The sed solution is picked from here
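For cross-checking the jq output, the same flattening can be sketched in Python. This is only an illustration under assumptions: the function name flatten_to_csv and the shortened sample record are hypothetical, and zip silently stops at the shorter of the two arrays, so it relies on categoryOptions.name and categoryOptions.id having equal length.

```python
import csv
import io


def flatten_to_csv(datasets):
    """Return CSV text with one row per (dataset, category option) pair."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Header row, which the jq program leaves to the caller.
    writer.writerow(["key", "name", "id", "categoryOptions.name", "categoryOptions.id"])
    for key, ds in enumerate(datasets):
        # Pair the i-th option name with the i-th option id.
        pairs = zip(ds["categoryOptions.name"], ds["categoryOptions.id"])
        for opt_name, opt_id in pairs:
            writer.writerow([key, ds["name"], ds["id"], opt_name, opt_id])
    return out.getvalue()


# Shortened, illustrative record in the shape of the input shown in the question.
sample = [
    {
        "name": "Metadata:MER-2.0-ver KP-Results (Semi Annual)",
        "id": "k9p3Ghbi6eW",
        "categoryOptions.name": ["Sex Workers", "MSM"],
        "categoryOptions.id": ["mwTwhESK21T", "vu0dG7psM5W"],
    }
]

print(flatten_to_csv(sample), end="")
```

Unlike @csv, Python's csv module quotes a field only when it contains a comma, quote, or newline, so the two outputs may differ in quoting while carrying the same values.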

Related

How to combine jq array value into same key in json file in shell script?

I have a json file with this content:
[
{
"id": "one",
"msg": [
"test"
],
"FilePath": [
"JsonSerializer.cs",
"ChatClient.cs",
"MiniJSON.cs"
],
"line": [
358,
1241,
382
]
},
{
"id": "two",
"msg": [
"secondtest"
],
"FilePath": [
"Utilities.cs",
"PhotonPing.cs"
],
"line": [
88,
36
]
}
]
I want output in which each FilePath value is combined with its corresponding line value, like this:
one
[
"test"
]
[
"JsonSerializer.cs",358
"ChatClient.cs",1241
"MiniJSON.cs",382
]
two
[
"secondtest"
]
[
"Utilities.cs",88
"PhotonPing.cs",36
]
I have tried this cat stack.json |jq -r '.[]|.id,.msg,.FilePath,.line'
which gave output as
one
[
"test"
]
[
"JsonSerializer.cs",
"ChatClient.cs",
"MiniJSON.cs"
]
[
358,
1241,
382
]
two
[
"secondtest"
]
[
"Utilities.cs",
"PhotonPing.cs"
]
[
88,
36
]
Kindly help me resolve this; I have tried a lot to debug it but cannot get through. Also, FilePath and line always have the same number of values in each object. For example, if FilePath has 3 values, line also has 3.
You're looking for transpose.
.[] | .id, .msg, ([.FilePath, .line] | transpose | add), ""
<stack.json jq -r '.[] | .id, .msg, ([.FilePath, .line]|transpose|add)'
gives the output you asked for.
transpose turns [[1,2,3],[4,5,6]] into [[1,4],[2,5],[3,6]] and add collects all array elements into a single array.
If you want the file path and line number on a single line, I suggest formatting them as strings separated by a colon:
.[] | .id, .msg, ([.FilePath, .line]|transpose|map(join(":")))
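To make the transpose step concrete, here is a rough Python analogue. The helper names transpose and pair_paths_with_lines are made up for this sketch, and zip truncates to the shortest list, which is only safe here because the question states that FilePath and line always have the same length.

```python
def transpose(rows):
    """Turn a list of rows into a list of columns, e.g. [[1,2,3],[4,5,6]] -> [[1,4],[2,5],[3,6]]."""
    return [list(col) for col in zip(*rows)]


def pair_paths_with_lines(record):
    """Format each (path, line) pair as 'path:line', like transpose | map(join(":"))."""
    pairs = transpose([record["FilePath"], record["line"]])
    return [f"{path}:{line}" for path, line in pairs]


# Second record from the question.
record = {
    "id": "two",
    "msg": ["secondtest"],
    "FilePath": ["Utilities.cs", "PhotonPing.cs"],
    "line": [88, 36],
}

print(transpose([[1, 2, 3], [4, 5, 6]]))
print(pair_paths_with_lines(record))
```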

Print only one property of an object that is within an array attribute as well as a property that is a sibling to the array property in jq

I have a json file that looks like so:
[
{
"code": "1234",
"files": [
{
"fileType": "pdf",
"url": "http://.../a.pdf"
},
{
"fileType": "video",
"url": "http://.../b.mp4"
}
]
},
{
"code": "4321",
"files": [
{
"fileType": "pdf",
"url": "http://.../c.pdf"
},
{
"fileType": "video",
"url": "http://.../d.mp4"
}
]
},
{
"code": "9999",
"files": [
{
"fileType": "pdf",
"url": "http://.../e.pdf"
}
]
}
]
I would like to print out only the files that are of fileType == video in the files array such that I end up with output that looks like so:
1234, "http://.../b.mp4"
4321, "http://.../d.mp4"
So far I am only able to output something that looks like this:
1234, "http://.../a.pdf", "http://.../b.mp4",
4321, "http://.../c.pdf", "http://.../d.mp4"
Using the following:
jq -r '.[] | select(.files[]?.fileType == "video") | [.code, .files[].url] | @csv'
I was wondering how I can filter the .files[] based on the fileType as I am outputting them?
The following pipeline makes the solution fairly self-explanatory, assuming one understands the basic syntax and the -r command-line option:
< input.json jq -r '
.[]
| .code as $code
| .files[]
| select(.fileType == "video")
| "\($code), \"\(.url)\""
'
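The key idea is to remember the parent's code before descending into files, then filter the individual file objects rather than the parent. A small Python sketch of the same logic, with a hypothetical function name video_rows and a shortened sample, might look like this:

```python
def video_rows(items):
    """Return one 'code, "url"' string per file whose fileType is video."""
    rows = []
    for item in items:
        code = item["code"]                     # like `.code as $code`
        for f in item.get("files", []):         # like `.files[]`
            if f.get("fileType") == "video":    # like `select(.fileType == "video")`
                rows.append(f'{code}, "{f["url"]}"')
    return rows


# Shortened sample in the shape of the question's input.
items = [
    {"code": "1234", "files": [
        {"fileType": "pdf", "url": "http://.../a.pdf"},
        {"fileType": "video", "url": "http://.../b.mp4"},
    ]},
    {"code": "9999", "files": [
        {"fileType": "pdf", "url": "http://.../e.pdf"},
    ]},
]

for row in video_rows(items):
    print(row)
```

The original attempt selected whole objects that contain a video and then printed every URL in them, which is why the PDF links showed up as well.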

Print key and value for different entries in an object

I need to print some results from a JSON document using jq.
This is an example:
{
"data": [
{
"time": 20201606,
"event": {
"ip": "127.0.1",
"hostname": "srv1",
"locations": [
"UK",
"site1"
],
"num": 1
}
},
{
"time": 202016034,
"event": {
"ip": "127.0.2",
"hostname": "srv2",
"locations": [
"UK",
"site2"
],
"num": 3
}
}
]
}
I would like to generate this output ("num, hostname, ip, locations"):
1, srv1, 127.0.1, UK,site1
2, srv2, 127.0.2, HK,site2
3, srv3, 127.0.3, LO,site3
How can I print this via jq?
Join locations by a comma, and put the result into an array with other fields. Then join again by a comma followed by a space to get the desired output format. E.g.:
.data[].event | [
.num,
.hostname,
.ip,
(.locations | join(",")) ?
] | join(", ")
Use --raw-output/-r option in the command line invocation to get raw strings instead of JSON strings.
At its core, you want to build an array consisting of the values you want:
$ jq '.data[].event | [.num, .hostname, .ip, .locations]' tmp.json
[
1,
"srv1",
"127.0.1",
[
"UK",
"site1"
]
]
[
3,
"srv2",
"127.0.2",
[
"UK",
"site2"
]
]
From there, it's a matter of formatting. First, let's turn the list of locations into a single string:
$ jq '.data[].event | [.num, .hostname, .ip, (.locations|join(","))]' tmp.json
[
1,
"srv1",
"127.0.1",
"UK,site1"
]
[
3,
"srv2",
"127.0.2",
"UK,site2"
]
Next, let's join those strings into a ", "-separated string.
$ jq '.data[].event | [.num, .hostname, .ip, (.locations|join(","))] | join(", ")' tmp.json
"1, srv1, 127.0.1, UK,site1"
"3, srv2, 127.0.2, UK,site2"
Finally, you can use the -r flag to output raw text rather than a JSON string value.
$ jq -r '.data[].event | [.num, .hostname, .ip, (.locations|join(","))] | join(", ")' tmp.json
1, srv1, 127.0.1, UK,site1
3, srv2, 127.0.2, UK,site2
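The same three steps (collect the fields, collapse locations into one string, join everything with ", ") can be sketched in Python for comparison. The function name event_rows is hypothetical, and the sample is the document from the question.

```python
def event_rows(doc):
    """Return one 'num, hostname, ip, locations' line per event."""
    rows = []
    for entry in doc["data"]:
        ev = entry["event"]
        # Step 1 and 2: gather the fields, joining locations with a bare comma.
        fields = [ev["num"], ev["hostname"], ev["ip"], ",".join(ev["locations"])]
        # Step 3: join all fields with a comma followed by a space.
        rows.append(", ".join(str(f) for f in fields))
    return rows


doc = {
    "data": [
        {"time": 20201606, "event": {
            "ip": "127.0.1", "hostname": "srv1", "locations": ["UK", "site1"], "num": 1}},
        {"time": 202016034, "event": {
            "ip": "127.0.2", "hostname": "srv2", "locations": ["UK", "site2"], "num": 3}},
    ]
}

for row in event_rows(doc):
    print(row)
```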

jq: merge two JSON files on a value

I have two JSON files structured like this:
file 1
[
{
"id": 25422,
"location": "Hotel X",
"suppliers": [
12
]
},
{
"id": 25423,
"location": "Hotel Y",
"suppliers": [
13
]
}]
file 2
[
{
"id": 12,
"vatNumber": "0000000000"
},
{
"id": 14,
"vatNumber": "0000000001"
}]
and I'd like a result like this:
[
{
"id": 25422,
"location": "Hotel X",
"suppliers": [
12
],
"vatNumber": "0000000000"
},
{
"id": 25423,
"location": "Hotel Y",
"suppliers": [
13
]
}]
The important thing to me is that the matching vatNumber values are added to the entries from the first file. The suppliers arrays are no longer needed after the merge, if dropping them simplifies the job.
Also, jq is not essential, but I need something I can run from the terminal in a script.
Thank you in advance.
Here's one of many possible solutions. If your jq does not have INDEX/2, then either upgrade your jq or include its def (available e.g. from https://github.com/stedolan/jq/blob/master/src/builtin.jq):
Invocation:
jq -n --argfile f1 file1.json --argfile f2 file2.json -f merge.jq
merge.jq:
INDEX($f2[] ; .id) as $dict
| $f1
| map( ($dict[.suppliers[0]|tostring]|.vatNumber) as $vn
| if $vn then .vatNumber = $vn else . end)
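The approach is a lookup-table join: index the second file by id, then look up each location's first supplier. Since the question says jq is not essential, here is a minimal Python sketch of that idea. The function name merge_vat is hypothetical, and like the jq filter it only considers the first entry of suppliers.

```python
def merge_vat(locations, suppliers):
    """Copy vatNumber from the matching supplier onto each location, if any."""
    # Build a dictionary keyed by supplier id, comparable to INDEX($f2[]; .id).
    by_id = {str(s["id"]): s for s in suppliers}
    merged = []
    for loc in locations:
        row = dict(loc)  # shallow copy so the input is left untouched
        supplier_ids = loc.get("suppliers") or []
        match = by_id.get(str(supplier_ids[0])) if supplier_ids else None
        if match and match.get("vatNumber"):
            row["vatNumber"] = match["vatNumber"]
        merged.append(row)
    return merged


file1 = [
    {"id": 25422, "location": "Hotel X", "suppliers": [12]},
    {"id": 25423, "location": "Hotel Y", "suppliers": [13]},
]
file2 = [
    {"id": 12, "vatNumber": "0000000000"},
    {"id": 14, "vatNumber": "0000000001"},
]

for row in merge_vat(file1, file2):
    print(row)
```

In a real script the two lists would come from json.load on the two files; they are inlined here to keep the sketch self-contained.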

CSV to JSON using BASH

I am trying to convert the CSV below into JSON format.
Africa,Kenya,NAI,281
Africa,Kenya,NAA,281
Asia,India,NSI,100
Asia,India,BSE,160
Asia,Pakistan,ISE,100
Asia,Pakistan,ANO,100
European Union,United Kingdom,LSE,100
This is the desired JSON format, and I just cannot manage to create it. I will post my work in progress below. Any help or direction would be appreciated.
{"name":"Africa",
"children":[
{"name":"Kenya",
"children":[
{"name":"NAI","size":"109"},
{"name":"NAA","size":"160"}]}]},
{"name":"Asia",
"children":[
{"name":"India",
"children":[
{"name":"NSI","size":"100"},
{"name":"BSE","size":"60"}]},
{"name":"Pakistan",
"children":[
{"name":"ISE","size":"120"},
{"name":"ANO","size":"433"}]}]},
{"name":"European Union",
"children":[
{"name":"United Kingdom",
"children":[
{"name":"LSE","size":"550"},
{"name":"PLU","size":"123"}]}]}
Work in Progress.
$1 is the file with the csv values pasted above.
#!/bin/bash
pcountry=$(head -1 $1 | cut -d, -f2)
cat $1 | while read line ; do
region=$(echo $line|cut -d, -f1)
country=$(echo $line|cut -d, -f2)
code=$(echo $line|cut -d, -f3-)
size=$(echo $line|cut -d, -f4)
if test "$pcountry" == "$country" ;
then
echo -e {\"name\":\"$region\", '\n' \"children\": [ '\n'{\"name\":\"$country\",'\n'\"children\": [ '\n' \{\"name\":\"NAI\",\"size\":\"$size\"\}
else
if test "$pregion" == "$region"
then :
else
echo -e ,'\n'{\"name\":\""$region\", '\n' \"children\": [ '\n'{\"name\":\"$country\",'\n'\"children\": [ '\n' \{\"name\":\"NAI\",\"size\":\"$size\"\},
pcountry=$country
pregion=$region
fi ; done
The problem is that I cannot find a way to tell when a country's values end.
As a number of the commenters have said, using the shell for this kind of conversion is a horrible idea. It would be nigh impossible to do with bash builtins alone, and shell scripts mostly exist to combine standard Unix commands like sed, awk, and cut anyway. You should choose a language better suited to this kind of iterative parsing and processing.
However, because it's late and I've had too much coffee, I threw together a bash script (with a few bits of sed thrown in for parsing help) that takes the example .csv data you have and outputs the JSON in the format you noted. Here's the script:
#! /bin/bash
# Initial input file format:
#
# Africa,Kenya,NAI,281
# Africa,Kenya,NAA,281
# Asia,India,NSI,100
# Asia,India,BSE,160
# Asia,Pakistan,ISE,100
# Asia,Pakistan,ANO,100
# European Union,United Kingdom,LSE,100
#
# Intermediate file format for parsing to JSON:
#
# Africa|Kenya:NAI=281&NAA=281
# Asia|India:NSI=100&BSE=160|Pakistan:ISE=100&ANO=100
# European Union|United Kingdom:LSE=100
#
# Call as:
#
# $ ./script INPUTFILE.csv >OUTPUTFILE.json
#
# temporary files for output/parsing
TMP="./tmp.dat"
TMP2="./tmp2.dat"
>$TMP
>$TMP2
# read through initial file and output intermediate format
while read line
do
region=$(echo $line | cut -d, -f1)
country=$(echo $line | cut -d, -f2)
code=$(echo $line | cut -d, -f3)
size=$(echo $line | cut -d, -f4)
# region record already started
if grep "^$region" $TMP 2>&1 >/dev/null ;then
>$TMP2
while read rec
do
if echo $rec | grep "^$region" 2>&1 >/dev/null
then
# -F matches the literal "|"; in GNU grep's basic regex syntax "\|" means alternation
if echo "$rec" | grep -qF "|$country:"
then
echo "$rec" | sed -e 's/\('"$country"':[^\|][^\|]*\)/\1\&'"$code"'='"$size"'/' >>$TMP2
else
echo "$rec|$country:$code=$size" >>$TMP2
fi
else
echo $rec >>$TMP2
fi
done < $TMP
mv $TMP2 $TMP
else
# new region
echo "$region|$country:$code=$size" >>$TMP
fi
done < $1
# Parse through our intermediary format and output JSON to standard out
echo "["
country_count=$(cat $TMP | wc -l)
while read line
do
country=$(echo $line | cut -d\| -f1)
echo "{ \"name\": \"$country\", "
echo " \"children\": ["
region_count=$(echo $line | cut -d\| -f2- | sed -e 's/|/\n/g' | wc -l)
echo $line | cut -d\| -f2- | sed -e 's/|/\n/g' |
while read region
do
name=$(echo $region | cut -d: -f1)
echo " { \"name\": \"$name\", "
echo " \"children\": ["
code_count=$(echo $region | sed -e 's/^'"$name"'://' -e 's/&/\n/g' | wc -l)
echo $region | sed -e 's/^'"$name"'://' -e 's/&/\n/g' |
while read code_size
do
code=$(echo $code_size | cut -d= -f1)
size=$(echo $code_size | cut -d= -f2)
code_count=$((code_count - 1))
COMMA=""
if [ $code_count -gt 0 ]; then
COMMA=","
fi
echo " { \"name\": \"$code\", \"size\": \"$size\" }$COMMA "
done
echo " ]"
region_count=$((region_count - 1))
if [ $region_count -gt 0 ]; then
echo " },"
else
echo " }"
fi
done
echo " ]"
country_count=$((country_count - 1))
COMMA=""
if [ $country_count -gt 0 ]; then
COMMA=","
fi
echo "}$COMMA"
done < $TMP
echo "]"
exit 0
And, here's the resulting output from the above script:
[
{ "name": "Africa",
"children": [
{ "name": "Kenya",
"children": [
{ "name": "NAI", "size": "281" },
{ "name": "NAA", "size": "281" }
]
}
]
},
{ "name": "Asia",
"children": [
{ "name": "India",
"children": [
{ "name": "NSI", "size": "100" },
{ "name": "BSE", "size": "160" }
]
},
{ "name": "Pakistan",
"children": [
{ "name": "ISE", "size": "100" },
{ "name": "ANO", "size": "100" }
]
}
]
},
{ "name": "European Union",
"children": [
{ "name": "United Kingdom",
"children": [
{ "name": "LSE", "size": "100" }
]
}
]
}
]
Please don't use code like the above in any production environment.
Here is a solution using jq.
If filter.jq contains the following filter
reduce (
split("\n")[] # split string into lines
| split(",") # split data
| select(length>0) # eliminate blanks
) as [$c1,$c2,$c3,$c4] ( # convert to object
{} # e.g. "Africa": { "Kenya": {
; setpath([$c1,$c2,"name"];$c3) # "name": "NAI",
| setpath([$c1,$c2,"size"];$c4) # "size": "281"
) # }, }
| [ # then build final array of objects format:
keys[] as $k1 # [ {
| {name: $k1, children: ( # "name": "Africa",
.[$k1] # "children": {
| keys[] as $k2 # "name": "Kenya",
| {name: $k2, children:.[$k2]} # "children": { "name": "NAI", "size": "281" }
)} # ...
]
and data contains the sample data then the command
$ jq -M -Rsr -f filter.jq data
produces
[
{
"name": "Africa",
"children": {
"name": "Kenya",
"children": {
"name": "NAI",
"size": "281"
}
}
},
{
"name": "Asia",
"children": {
"name": "India",
"children": {
"name": "BSE",
"size": "160"
}
}
},
{
"name": "Asia",
"children": {
"name": "Pakistan",
"children": {
"name": "ANO",
"size": "100"
}
}
},
{
"name": "European Union",
"children": {
"name": "United Kingdom",
"children": {
"name": "LSE",
"size": "100"
}
}
}
]
You'd be much better off using a tool like xidel that can manipulate csv / raw text and understands JSON:
I'm going to assume so_24300508.csv :
Africa,Kenya,NAI,109
Africa,Kenya,NAA,160
Asia,India,NSI,100
Asia,India,BSE,60
Asia,Pakistan,ISE,120
Asia,Pakistan,ANO,433
European Union,United Kingdom,LSE,550
European Union,United Kingdom,PLU,123
(this is extracted from your JSON sample instead of the CSV sample you provided)
xidel -s so_24300508.csv --json-mode=deprecated --xquery '
[
let $csv:=x:lines($raw)
for $region in distinct-values($csv ! tokenize(.,",")[1])
return {
"name":$region,
"children":[
for $country in distinct-values($csv[starts-with(.,$region)] ! tokenize(.,",")[2]) return {
"name":$country,
"children":for $data in $csv[starts-with(.,$region) and contains(.,$country)]
let $value:=tokenize($data,",")
return {
"name":$value[3],
"size":$value[4]
}
}
]
}
]
'
(without --json-mode=deprecated replace [ ] with array{ })
See this code snippet for intermediate steps leading to this query.
Also see this online xidelcgi demo.
Output:
[
{
"name": "Africa",
"children": [
{
"name": "Kenya",
"children": [
{
"name": "NAI",
"size": "109"
},
{
"name": "NAA",
"size": "160"
}
]
}
]
},
{
"name": "Asia",
"children": [
{
"name": "India",
"children": [
{
"name": "NSI",
"size": "100"
},
{
"name": "BSE",
"size": "60"
}
]
},
{
"name": "Pakistan",
"children": [
{
"name": "ISE",
"size": "120"
},
{
"name": "ANO",
"size": "433"
}
]
}
]
},
{
"name": "European Union",
"children": [
{
"name": "United Kingdom",
"children": [
{
"name": "LSE",
"size": "550"
},
{
"name": "PLU",
"size": "123"
}
]
}
]
}
]
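For comparison, the region, country, code grouping can also be sketched with Python's standard library only. This is an illustration, not a replacement for the answers above: the function name csv_to_tree is made up, the sample is the CSV from the bash answer's header comment, and the sketch relies on dictionaries preserving insertion order (Python 3.7+) to keep regions and countries in the order they first appear.

```python
import csv
import io
import json


def csv_to_tree(text):
    """Group 'region,country,code,size' rows into nested name/children objects."""
    tree = {}  # region -> country -> list of {"name": code, "size": size}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        region, country, code, size = row
        leaves = tree.setdefault(region, {}).setdefault(country, [])
        leaves.append({"name": code, "size": size})
    return [
        {
            "name": region,
            "children": [
                {"name": country, "children": leaves}
                for country, leaves in countries.items()
            ],
        }
        for region, countries in tree.items()
    ]


sample_csv = """Africa,Kenya,NAI,281
Africa,Kenya,NAA,281
Asia,India,NSI,100
Asia,India,BSE,160
Asia,Pakistan,ISE,100
Asia,Pakistan,ANO,100
European Union,United Kingdom,LSE,100
"""

print(json.dumps(csv_to_tree(sample_csv), indent=2))
```

Because every leaf is appended to a list, countries with several codes keep all of them, and children is always an array, which matches the structure requested in the question.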