How to format a csv file using json data?

I have a json file that I need to convert to a csv file, but I am a little wary of trusting a json-to-csv converter site as the outputted data seems to be incorrect... so I was hoping to get some help here!
I have the following json file structure:
{
"GroupName": "GrpName13",
"Number": 3,
"Notes": "Test Group ",
"Units": [
{
"UnitNumber": "TestUnit13",
"DataSource": "Factory",
"ContractNumber": "TestContract13",
"CarNumber": "2",
"ControllerTypeMessageId" : 4,
"NumberOfLandings": 4,
"CreatedBy": "user1",
"CommissionModeMessageId": 2,
"Details": [
{
"DetailName": "TestFloor13",
"DetailNumber": "5"
}
],
"UnitDevices": [
{
"DeviceTypeMessageId": 1,
"CreatedBy": "user1"
}
]
}
]
}
The issue I think I'm seeing is that the converters can't handle the many nested values. I think the converters are wrong because when I convert their output back to JSON with the same tools, I don't get the original structure back.
Does anyone know how to manually format this JSON as CSV, or know of a reliable converter that can handle nested values?

Try
www.json-buddy.com/convert-json-csv-xml.htm
If that doesn't work for you, you can try this tool:
http://download.cnet.com/JSON-to-CSV/3000-2383_4-76680683.html
It should be helpful!

I tried your JSON with this converter:
http://www.convertcsv.com/json-to-csv.htm
As a result:
UnitNumber,DataSource,ContractNumber,CarNumber,ControllerTypeMessageId,NumberOfLandings,CreatedBy,CommissionModeMessageId,Details/0/DetailName,Details/0/DetailNumber,UnitDevices/0/DeviceTypeMessageId,UnitDevices/0/CreatedBy
TestUnit13,Factory,TestContract13,2,4,4,user1,2,TestFloor13,5,1,user1
The converter keeps the full path of each key in the column name. For example, 'DeviceTypeMessageId' inside the list 'UnitDevices' becomes the column 'UnitDevices/0/DeviceTypeMessageId'. This avoids clashes between keys that share a name (such as 'CreatedBy'), and you can work out where each column came from by following the converter's naming rule.
Hope that helps.
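The path-based naming rule described above can be sketched in a few lines of Python. This is only an illustration, not the converter's actual code: the function name flatten and the trimmed-down sample document are assumptions made for the example, and empty lists or objects simply produce no column.

```python
import csv
import io
import json


def flatten(value, prefix=""):
    """Return a flat dict mapping slash-separated paths to scalar values."""
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flat.update(flatten(child, f"{prefix}{key}/"))
    elif isinstance(value, list):
        # list elements are addressed by their index, e.g. UnitDevices/0/...
        for index, child in enumerate(value):
            flat.update(flatten(child, f"{prefix}{index}/"))
    else:
        flat[prefix.rstrip("/")] = value
    return flat


# Trimmed-down version of one unit from the question (assumed sample).
doc = json.loads("""
{"UnitNumber": "TestUnit13",
 "Details": [{"DetailName": "TestFloor13", "DetailNumber": "5"}],
 "UnitDevices": [{"DeviceTypeMessageId": 1, "CreatedBy": "user1"}]}
""")

row = flatten(doc)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
csv_text = out.getvalue()
print(csv_text)
```

Because the column name records the whole path, the nesting can in principle be rebuilt from the header, which is what makes a round trip back to JSON possible.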

Here is a solution using jq
If the file filter.jq contains
def denormalize:
def headers($p):
keys_unsorted[] as $k
| if .[$k]|type == "array" then (.[$k]|first|headers("\($p)\($k)_"))
else "\($p)\($k)"
end
;
def setup:
[
keys_unsorted[] as $k
| if .[$k]|type == "array" then [ .[$k][]| setup ]
else .[$k]
end
]
;
def iter:
if length == 0 then []
elif .[0]|type != "array" then
[.[0]] + (.[1:] | iter)
else
(.[0][] | iter) as $x
| (.[1:] | iter) as $y
| [$x[]] + $y
end
;
[ headers("") ], (setup | iter)
;
denormalize | @csv
and data.json contains (note extra samples added)
{
"GroupName": "GrpName13",
"Notes": "Test Group ",
"Number": 3,
"Units": [
{
"CarNumber": "2",
"CommissionModeMessageId": 2,
"ContractNumber": "TestContract13",
"ControllerTypeMessageId": 4,
"CreatedBy": "user1",
"DataSource": "Factory",
"Details": [
{
"DetailName": "TestFloor13",
"DetailNumber": "5"
}
],
"NumberOfLandings": 4,
"UnitDevices": [
{
"CreatedBy": "user1",
"DeviceTypeMessageId": 1
},
{
"CreatedBy": "user10",
"DeviceTypeMessageId": 10
}
],
"UnitNumber": "TestUnit13"
},
{
"CarNumber": "99",
"CommissionModeMessageId": 99,
"ContractNumber": "Contract99",
"ControllerTypeMessageId": 99,
"CreatedBy": "user99",
"DataSource": "Another Factory",
"Details": [
{
"DetailName": "TestFloor99",
"DetailNumber": "99"
}
],
"NumberOfLandings": 99,
"UnitDevices": [
{
"CreatedBy": "user99",
"DeviceTypeMessageId": 99
}
],
"UnitNumber": "Unit99"
}
]
}
then the command
jq -M -r -f filter.jq data.json
will produce
"GroupName","Notes","Number","Units_CarNumber","Units_CommissionModeMessageId","Units_ContractNumber","Units_ControllerTypeMessageId","Units_CreatedBy","Units_DataSource","Units_Details_DetailName","Units_Details_DetailNumber","Units_NumberOfLandings","Units_UnitDevices_CreatedBy","Units_UnitDevices_DeviceTypeMessageId","Units_UnitNumber"
"GrpName13","Test Group ",3,"2",2,"TestContract13",4,"user1","Factory","TestFloor13","5",4,"user1",1,"TestUnit13"
"GrpName13","Test Group ",3,"2",2,"TestContract13",4,"user1","Factory","TestFloor13","5",4,"user10",10,"TestUnit13"
"GrpName13","Test Group ",3,"99",99,"Contract99",99,"user99","Another Factory","TestFloor99","99",99,"user99",99,"Unit99"

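For comparison, the same denormalization idea (one CSV row per combination of nested array elements) can be sketched in Python. This is a rough equivalent under stated assumptions, not a translation of the jq filter: the function names headers and rows are made up for the example, every nested array is assumed to be non-empty and to hold objects with the same keys, and the sample data is a trimmed-down version of data.json.

```python
import csv
import itertools
import sys


def headers(obj, prefix=""):
    """Column names; a nested array contributes the keys of its first element."""
    names = []
    for key, value in obj.items():
        if isinstance(value, list):
            names += headers(value[0], f"{prefix}{key}_")
        else:
            names.append(f"{prefix}{key}")
    return names


def rows(obj):
    """Yield one flat row per combination of nested array elements."""
    parts = []
    for value in obj.values():
        if isinstance(value, list):
            # every element of the array expands into its own set of partial rows
            parts.append([r for item in value for r in rows(item)])
        else:
            parts.append([[value]])
    for combo in itertools.product(*parts):
        yield [cell for part in combo for cell in part]


# Assumed sample: a reduced form of data.json from the answer above.
data = {
    "GroupName": "GrpName13",
    "Units": [
        {"UnitNumber": "TestUnit13",
         "UnitDevices": [{"CreatedBy": "user1"}, {"CreatedBy": "user10"}]},
        {"UnitNumber": "Unit99",
         "UnitDevices": [{"CreatedBy": "user99"}]},
    ],
}

header = headers(data)
table = list(rows(data))

# QUOTE_NONNUMERIC quotes strings and leaves numbers bare.
writer = csv.writer(sys.stdout, quoting=csv.QUOTE_NONNUMERIC)
writer.writerow(header)
writer.writerows(table)
```

Note that parent values such as GroupName are repeated on every row, so this format is easy to load into a spreadsheet but does not round-trip back to the nested JSON without extra grouping logic.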
Related

Use JQ to create new object where the key comes from one object and the value comes from another

I have the following input:
{
"Columns": [
{
"email": 123,
"name": 456,
"firstName": 789,
"lastName": 450,
"admin": 900,
"licensedSheetCreator": 617,
"groupAdmin": 354,
"resourceViewer": 804,
"id": 730,
"status": 523,
"sheetCount": 298
}
]
}
{
"Users": [
{
"email": "abc@def.com",
"name": "Abc Def",
"firstName": "Abc",
"lastName": "Def",
"admin": false,
"licensedSheetCreator": true,
"groupAdmin": false,
"resourceViewer": true,
"id": 521,
"status": "ACTIVE",
"sheetCount": 0
},
{
"email": "aaa@bbb.com",
"name": "Aaa Bob",
"firstName": "Aaa",
"lastName": "Bob",
"admin": false,
"licensedSheetCreator": true,
"groupAdmin": false,
"resourceViewer": false,
"id": 352,
"status": "ACTIVE",
"sheetCount": 0
}
]
}
I need to change the key for all key value pairs in users to match the value in Columns, like so:
{
"Columns": [
{
"email": 123,
"name": 456,
"firstName": 789,
"lastName": 450,
"admin": 900,
"licensedSheetCreator": 617,
"groupAdmin": 354,
"resourceViewer": 804,
"id": 730,
"status": 523,
"sheetCount": 298
}
]
}
{
"Users": [
{
123: "abc@def.com",
456: "Abc Def",
789: "Abc",
450: "Def",
900: false,
617: true,
354: false,
804: true,
730: 521,
523: "ACTIVE",
298: 0
},
{
123: "aaa@bbb.com",
456: "Aaa Bob",
789: "Aaa",
450: "Bob",
900: false,
617: true,
354: false,
804: false,
730: 352,
523: "ACTIVE",
298: 0
}
]
}
I don't mind if I update the Users array or create a new array of objects.
I have tried several combinations of with_entries, to_entries and from_entries, and tried to look up keys using variables, but the more I dive into it, the more confused I get.
Elements of a stream are processed independently. So we have to change the input.
We could group the stream elements into an array. For an input stream, this can be achieved using --slurp/-s.[1]
jq -s '
( .[0].Columns[0] | map_values( tostring ) ) as $map |
(
.[0],
(
.[1:][] |
.Users[] |= with_entries(
.key = $map[ .key ]
)
)
)
'
Alternatively, we could use --null-input/-n in conjunction with input and/or inputs to read the input.
jq -n '
input |
( .Columns[0] | map_values( tostring ) ) as $map |
(
.,
(
inputs |
.Users[] |= with_entries(
.key = $map[ .key ]
)
)
)
'
Note that your desired output isn't valid JSON. Object keys must be strings. So the above produces a slightly different document than requested.
Note that I assumed that .Columns is always an array of exactly one element. This is a nonsense assumption, but it's the only way the question makes sense.
[1] For a stream the code generates, you could place the stream generator in an array constructor ([]). reduce can also be used to collect from a stream. For example, map( ... ) can be written as [ .[] | ... ] and as reduce .[] as $_ ( []; . + [ $_ | ... ] ).
The following has the merit of simplicity, though it does not sort the keys.
It assumes jq is invoked with the -n option and of course produces a stream of valid JSON objects:
input
| . as $Columns
| .Columns[0] as $dict
| input # Users
| .Users[] |= with_entries(.key |= ($dict[.]|tostring))
| $Columns, .
If having the keys sorted is important, then you could easily add suitable code to do that; alternatively, if you don't mind having the keys of all objects sorted, you could use the -S command-line option.
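To make the key-renaming step concrete outside of jq, here is a minimal Python sketch of the same idea. The function name rename_keys and the shortened sample documents are assumptions for illustration; like the jq answers, it assumes Columns holds exactly one object and that every key in a user object also appears there (a missing key raises KeyError).

```python
import json

# Assumed, shortened versions of the two input documents.
columns_doc = {"Columns": [{"name": 456, "id": 730, "status": 523}]}
users_doc = {"Users": [
    {"name": "Abc Def", "id": 521, "status": "ACTIVE"},
    {"name": "Aaa Bob", "id": 352, "status": "ACTIVE"},
]}


def rename_keys(users_doc, columns_doc):
    """Replace each user key with the matching column id, as a string."""
    # JSON object keys must be strings, so the numeric ids go through str()
    mapping = {name: str(col_id)
               for name, col_id in columns_doc["Columns"][0].items()}
    return {
        "Users": [
            {mapping[key]: value for key, value in user.items()}
            for user in users_doc["Users"]
        ]
    }


renamed = rename_keys(users_doc, columns_doc)
print(json.dumps(renamed, indent=2))
```

The original documents are left untouched; a new Users document is built, which matches the "create a new array of objects" option mentioned in the question.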

How to count occurrences of a key-value pair per individual object in JQ?

I could not find how to count occurrences of "title" grouped by "member_id"...
The json file is:
[
{
"member_id": 123,
"loans":[
{
"date": "123",
"media": [
{ "title": "foo" },
{ "title": "bar" }
]
},
{
"date": "456",
"media": [
{ "title": "foo" }
]
}
]
},
{
"member_id": 456,
"loans":[
{
"date": "789",
"media": [
{ "title": "foo"}
]
}
]
}
]
With this query I get loan entries for users with "title==foo"
jq '.[] | (.member_id) as $m | .loans[].media[] | select(.title=="foo") | {id: $m, title: .title}' member.json
{
"id": 123,
"title": "foo"
}
{
"id": 123,
"title": "foo"
}
{
"id": 456,
"title": "foo"
}
But I could not find how to get count by user (group by) for a title, to get a result like:
{
"id": 123,
"title": "foo",
"count": 2
}
{
"id": 456,
"title": "foo",
"count": 1
}
I got errors like jq: error (at member.json:31): object ({"title":"f...) and array ([[123]]) cannot be sorted, as they are not both arrays or similar...
When the main goal is to count, it is usually more efficient to avoid constructing an array if determining its length is the only reason for doing so. In the present case you could, for example, write:
# start from 0 so that an empty stream counts as 0 rather than null
def count(s): reduce s as $x (0; .+1);
"foo" as $title | .[] | {
id: .member_id,
$title,
count: count(.loans[].media[] | select(.title == $title))
}
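The same "count without building an array" idea can be sketched in Python with a generator expression. This is only an illustration: the function name count_title is an assumption, and the sample data is copied from the question.

```python
members = [
    {"member_id": 123, "loans": [
        {"date": "123", "media": [{"title": "foo"}, {"title": "bar"}]},
        {"date": "456", "media": [{"title": "foo"}]},
    ]},
    {"member_id": 456, "loans": [
        {"date": "789", "media": [{"title": "foo"}]},
    ]},
]


def count_title(members, title):
    """One result per member; matches are summed without an intermediate list."""
    return [
        {
            "id": member["member_id"],
            "title": title,
            "count": sum(
                1
                for loan in member["loans"]
                for media in loan["media"]
                if media["title"] == title
            ),
        }
        for member in members
    ]


counts = count_title(members, "foo")
print(counts)
```

A member with no matching title still appears in the result, with a count of 0.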
group_by has its uses, but it is well to be aware that it is inefficient even for grouping, because its implementation involves a sort, which is not strictly necessary if the goal is to "group by" some criterion. A completely generic sort-free "group by" function is a bit tricky to implement, but often a simple but non-generic version is sufficient, such as:
# sort-free variant of group_by/1
# f must always evaluate to an integer or always to a string, which
# could be achieved by using `tostring`.
# Output: an array in the former case, or an object in the latter case
def GROUP_BY(f): reduce .[] as $x (null; .[$x|f] += [$x] );
Using group_by:
jq 'map(
(.member_id) as $m
| .loans[].media[]
| select(.title=="foo")
| {id: $m, title: .title}
)
|group_by(.id)[]
|.[0] + { count: length }
' input-file
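For readers who want to check the grouping logic outside of jq, here is a small Python sketch that groups with a dictionary (so no sort is involved) and then attaches the group size as count. The helper name group_by and the flat sample list are assumptions for the example; the list stands in for the {id, title} objects produced by the first part of the filter.

```python
def group_by(items, key):
    """Group items by key(item) using a dict; insertion order is preserved."""
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups


# Assumed stand-in for the {id, title} objects selected for title "foo".
matches = [
    {"id": 123, "title": "foo"},
    {"id": 456, "title": "foo"},
    {"id": 123, "title": "foo"},
]

grouped = group_by(matches, lambda m: m["id"])

# Take the first object of each group and add the group length as "count".
summary = [dict(group[0], count=len(group)) for group in grouped.values()]
print(summary)
```

Unlike jq's group_by, the groups come out in first-seen order rather than sorted by key.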

jq to filter inner array elements but return the whole JSON

TL;DR
How can I return the whole JSON after filtering inner array elements of a top-level key?
Detailed explanation
I have a JSON describing the COCO image database and it is formatted as follows (irrelevant elements truncated as ...).
{
"info": {
"description": "COCO 2017 Dataset",
...
},
"licenses": [
{
"url": "http://creativecommons.org/licenses/by-nc-sa/2.0/",
...
},
...
],
"images": [
{
"license": 4,
...
},
"annotations": [
{
"segmentation": [
[
510.66,
...
]
],
"area": 702.1057499999998,
"iscrowd": 0,
"image_id": 289343,
"bbox": [
473.07,
395.93,
38.65,
28.67
],
"category_id": 18,
"id": 1768
},
"categories": [
{
"supercategory": "person",
...
},
]
}
I need to filter annotations where category_id has one of several values, for example 1, 2.
I can successfully filter such category_ids with
jq -C ' .annotations[] | select( .category_id == 1 or .category_id == 2 ) ' instances_val2017.json | less -R
However, what is returned are only the annotations element of the total JSON as below.
{
"segmentation": [
[
162.72,
...
]
],
"area": 426.9120499999995,
"iscrowd": 0,
"image_id": 45596,
"bbox": [
161.52,
507.18,
46.45,
19.16
],
"category_id": 2,
"id": 124742
}
{
...
{
I know it's possible to return these elements as an array by wrapping the expression in [] but how can I return the entire original JSON after filtering the specified category ids?
Okay I spent 3 hours trying to solve this yesterday then this morning I posted this question and subsequently figured it out!
Here is the solution which uses the |= operator which modifies an element in place.
jq '.annotations |= map(select(.category_id | contains(1,2)))' instances_val2017.json
As per the suggestion of @peak, here is the command with == instead of contains.
jq '.annotations |= map(select(.category_id == (1,2)))' instances_val2017.json
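The effect of the update operator here (replace one key, keep everything else) can be illustrated with a short Python sketch. The function name keep_categories and the tiny sample document are assumptions; the annotation with id 1 is invented for the example.

```python
def keep_categories(doc, wanted):
    """Return a copy of doc whose annotations are limited to the wanted category ids."""
    result = dict(doc)  # shallow copy: the other top-level keys are shared, not duplicated
    result["annotations"] = [
        annotation
        for annotation in doc["annotations"]
        if annotation["category_id"] in wanted
    ]
    return result


# Assumed miniature of the COCO file; only the fields needed here are kept.
coco = {
    "info": {"description": "COCO 2017 Dataset"},
    "annotations": [
        {"id": 1768, "category_id": 18},
        {"id": 124742, "category_id": 2},
        {"id": 1, "category_id": 1},  # hypothetical entry
    ],
}

filtered = keep_categories(coco, {1, 2})
print([a["id"] for a in filtered["annotations"]])
```

The key point is the same as with |= in jq: the filter is applied to the annotations list only, and the surrounding document is returned whole.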

How do I update a single value in a nested array of objects in a json document using jq?

I have a JSON document that looks like the following. Note this is a simplified example of the real JSON, which is included at bottom of question:
{
"some_array": [
{
"k1": "A",
"k2": "XXX"
},
{
"k1": "B",
"k2": "YYY"
}
]
}
I would like to change the value of all the k2 keys in the some_array array where the value of the k1 key is "B".
Is this possible using jq ?
For reference this is the actual JSON document, which is an environment variable file for use in postman / newman tool. I am attempting this conversion using JQ because the tool does not yet support command line overrides of specific environment variables
Actual JSON
{
"name": "Local-Stack-Env-Config",
"values": [
{
"enabled": true,
"key": "KC_master_host",
"type": "text",
"value": "http://localhost:8087"
},
{
"enabled": true,
"key": "KC_user_guid",
"type": "text",
"value": "11111111-1111-1111-1111-11111111111"
}
],
"timestamp": 1502768145037,
"_postman_variable_scope": "environment",
"_postman_exported_at": "2017-08-15T03:36:41.474Z",
"_postman_exported_using": "Postman/5.1.3"
}
Here is a slightly simpler version of zayquan's filter:
.some_array |= map(if .k1=="B" then .k2="changed" else . end)
Here's another solution.
jq '(.some_array[] | select(.k1 == "B") | .k2) |= "new_value"'
Output
{
"some_array": [
{
"k1": "A",
"k2": "XXX"
},
{
"k1": "B",
"k2": "new_value"
}
]
}
Here is a viable solution:
cat some.json | jq '.some_array = (.some_array | map(if .k1 == "B" then . + {"k2":"changed"} else . end))'
produces the output:
{
"some_array": [
{
"k1": "A",
"k2": "XXX"
},
{
"k1": "B",
"k2": "changed"
}
]
}
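If jq is not available, the same conditional update is easy to sketch in Python. This is an illustration only: the function name set_k2 is an assumption, and unlike the jq filters it modifies the document in place.

```python
import json


def set_k2(doc, k1_value, new_k2):
    """Set k2 on every element of some_array whose k1 equals k1_value."""
    for item in doc["some_array"]:
        if item["k1"] == k1_value:
            item["k2"] = new_k2
    return doc


doc = {
    "some_array": [
        {"k1": "A", "k2": "XXX"},
        {"k1": "B", "k2": "YYY"},
    ]
}

updated = set_k2(doc, "B", "changed")
print(json.dumps(updated, indent=2))
```

For the Postman environment file the same pattern should apply with values, key and value in place of some_array, k1 and k2.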

jq get the value of x based on y in a complex json file

jq strikes again. Trying to get the value of DATABASES_DEFAULT based on the name in a json file that has a whole lot of names and I'm completely lost.
My file looks like the following (output of an aws ecs describe-task-definition) only much more complex; I've stripped this to the most basic example I can where the structure is still intact.
{
"taskDefinition": {
"status": "bar",
"family": "bar2",
"volumes": [],
"taskDefinitionArn": "bar3",
"containerDefinitions": [
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo"
}
],
"name": "baz",
"links": []
},
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo2"
}
],
"name": "boo",
"links": []
}
],
"revision": 1
}
}
I need the value of DATABASES_DEFAULT where the name is baz. Note that there are a lot of keypairs with name, I'm specifically talking about the one outside of environment.
I've been tinkering with this but only got this far before realizing that I don't understand how to access nested values.
jq '.[] | select(.name==DATABASES_DEFAULT) | .value'
which is returning
jq: error: DATABASES_DEFAULT/0 is not defined at <top-level>, line 1:
.[] | select(.name==DATABASES_DEFAULT) | .value
jq: 1 compile error
Obviously this a) doesn't work, and b) even if it did, it's independent of the name value. My thought was to return all the db defaults and then identify the one with baz, but I don't know if that's the right approach.
I like to think of it as digging down into the structure, so first you open the outer layers:
.taskDefinition.containerDefinitions[]
Now select the one you want:
select(.name =="baz")
Open the inner structure:
.environment[]
Select the desired object:
select(.name == "DATABASES_DEFAULT")
Choose the key you want:
.value
Taken together:
parse.jq
.taskDefinition.containerDefinitions[] |
select(.name =="baz") |
.environment[] |
select(.name == "DATABASES_DEFAULT") |
.value
Run it like this:
<infile jq -f parse.jq
Output:
"foo"
The following seems to work:
.taskDefinition.containerDefinitions[] |
select(
select(
.environment[] | .name == "DATABASES_DEFAULT"
).name == "baz"
)
The output is the object with the name key mapped to "baz".
$ jq '.taskDefinition.containerDefinitions[] | select(select(.environment[]|.name == "DATABASES_DEFAULT").name=="baz")' tmp.json
{
"dnsSearchDomains": [],
"environment": [
{
"name": "bar4",
"value": "bar5"
},
{
"name": "bar6",
"value": "bar7"
},
{
"name": "DATABASES_DEFAULT",
"value": "foo"
}
],
"name": "baz",
"links": []
}
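The "dig down layer by layer" approach from the first answer maps directly onto nested loops, which may make the order of the steps easier to see. The sketch below is an illustration in Python; the function name env_value is an assumption, and the sample is a reduced form of the task definition in the question.

```python
def env_value(doc, container_name, var_name):
    """Return the value of var_name in the named container, or None if absent."""
    for container in doc["taskDefinition"]["containerDefinitions"]:
        if container["name"] != container_name:
            continue  # select the container first (the outer "name")
        for var in container["environment"]:
            if var["name"] == var_name:  # then the environment entry (the inner "name")
                return var["value"]
    return None


task = {
    "taskDefinition": {
        "containerDefinitions": [
            {"name": "baz", "environment": [
                {"name": "bar4", "value": "bar5"},
                {"name": "DATABASES_DEFAULT", "value": "foo"},
            ]},
            {"name": "boo", "environment": [
                {"name": "bar4", "value": "bar5"},
                {"name": "DATABASES_DEFAULT", "value": "foo2"},
            ]},
        ]
    }
}

print(env_value(task, "baz", "DATABASES_DEFAULT"))
```

Selecting the container before opening its environment is what keeps the two different name keys from being confused.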