Combine Pandas DataFrames into a labelled JSON file

I have two separate DataFrames, train_df and val_df (training and validation), that both look like the one below (with varying values in the lists, ints and strings):
F1 F2 F3
[0,0,0] 1 'string1'
[1,2,1] 2 'string2'
... ... ...
And I'd like to save them into a single JSON file with the following format:
{
"training":[
{
"F1": [0,0,0],
"F2": 1,
"F3": "string1"
},
{
"F1": [1,2,1],
"F2": 2,
"F3": "string2"
}
],
"validation":[
{...},
{...}
]
}
where the validation section has the same structure as the training section (F1, F2, F3) but different values.
The closest I've gotten is train_df.to_json(orient="records"), which produces the right substructure, but I can't figure out how to insert the top-level training or validation key. One option I've considered is to save both DataFrames separately, read them back in as strings using .dumps, and then insert the text in the required places, but that seems overly convoluted and I'm sure there must be a more Pythonic way of doing this.

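Here's one way to do it: convert each DataFrame to a list of records with to_dict('records'), put both lists in one dict under their own keys, and write that dict out with json.dump.
import json

# train_df and val_df are the two DataFrames from the question
final_json = {'training': train_df.to_dict('records'),
              'validation': val_df.to_dict('records')}

# write the combined structure to a file ('data.json' is just an example name)
with open('data.json', 'w') as f:
    json.dump(final_json, f, indent=4)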
'''
{
"training": [
{
"F1": "[0,0,0]",
"F2": 1,
"F3": "'string1'"
},
{
"F1": "[1,2,1]",
"F2": 2,
"F3": "'string2'"
}
],
"validation": [
{
"F1": "[0,0,0]",
"F2": 1,
"F3": "'string1'"
},
{
"F1": "[1,2,1]",
"F2": 2,
"F3": "'string2'"
}
]
}
'''
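A self-contained sketch of the same idea, this time reusing to_json(orient="records") from the question: each split's records are parsed back with json.loads and nested under its own key. The sample frames and the combine_splits name are made up for illustration.

```python
import json

import pandas as pd


def combine_splits(train_df: pd.DataFrame, val_df: pd.DataFrame) -> str:
    """Return one JSON document with a top-level key per split."""
    payload = {
        # to_json(orient="records") yields the per-row list of objects;
        # json.loads turns it back into Python objects so it can be nested.
        "training": json.loads(train_df.to_json(orient="records")),
        "validation": json.loads(val_df.to_json(orient="records")),
    }
    return json.dumps(payload, indent=4)


# Illustrative sample frames shaped like the ones in the question.
train_df = pd.DataFrame({"F1": [[0, 0, 0], [1, 2, 1]],
                         "F2": [1, 2],
                         "F3": ["string1", "string2"]})
val_df = pd.DataFrame({"F1": [[3, 1, 0]], "F2": [3], "F3": ["string3"]})

text = combine_splits(train_df, val_df)
print(text)
```

Round-tripping through to_json lets pandas handle missing values (NaN becomes null) and datetime columns before the standard json module sees them. With to_dict alone, json would write the non-standard NaN or reject Timestamp objects. To get a file, write the returned string out with open(path, "w").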

Related

How to use jq to produce multiple JSON objects?

How would one transform a JSON object into several derived JSON objects with jq?
Example input:
[
{
"id": 1,
"a": "value-in-a",
"b": "value-in-b"
},
{
"id": 2,
"c": "value-in-c"
}
]
Expected output:
[
{
"id": "1",
"value": "value-in-a"
},
{
"id": "1",
"value": "value-in-b"
},
{
"id": "2",
"value": "value-in-c"
}
]
The output is an array with three elements: the first two are derived from the first element of the input array, and the third from the second element.
I assume this will take several steps:
a) Construct two objects from a single input object. I assume this can be done using variables: first assign the input object to a variable, then construct one object with value a and another with value b. I'm not sure how to make jq return several constructed objects.
b) Conditionals will be needed so that no object is produced when a, b, or c is missing. This can probably be done with the alternative operator (//) or if-then-else.
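You can iterate over the remaining keys using del and keys_unsorted. del(.id) drops the id key, and .[keys_unsorted[]] emits each remaining value in its original order. Because the value expression produces several results, the object constructor yields one object per value, so keys that are absent simply produce nothing: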
jq 'map({id, value: (del(.id) | .[keys_unsorted[]])})'
[
{
"id": 1,
"value": "value-in-a"
},
{
"id": 1,
"value": "value-in-b"
},
{
"id": 2,
"value": "value-in-c"
}
]
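The ids stay numeric here; write id: (.id | tostring) in place of id if you need them as strings, as in the expected output.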
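To make the generator behaviour concrete, here is a plain-Python mirror of the jq filter above. This is Python, not jq, and explode is a hypothetical name. The inner loop plays the role of .[keys_unsorted[]], producing one output object per remaining key, so an entry without a, b, or c contributes nothing for that key.

```python
def explode(records):
    """Emit one {"id", "value"} object per non-id key of each record."""
    out = []
    for rec in records:
        # dict iteration order matches keys_unsorted (insertion order)
        for key, value in rec.items():
            if key != "id":  # equivalent of del(.id)
                out.append({"id": rec["id"], "value": value})
    return out


records = [
    {"id": 1, "a": "value-in-a", "b": "value-in-b"},
    {"id": 2, "c": "value-in-c"},
]
print(explode(records))
```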

Getting first level with JMESPath

I have this JSON document:
{
"1": {
"a": "G1"
},
"2": {
"a": "GM1"
}
}
My expected result is:
1,G1
2,GM1
With *.a I get:
[
"G1",
"GM1"
]
but I'm stuck on the rest.
Sadly, there isn't much you can do that both fully matches your use case and scales properly.
This is because JMESPath has no way to reference a parent node (this has been requested before); such a feature would allow something like:
*.[join(',', [keys($), a])]
You can, however, extract a list of keys and a list of values, thanks to the keys function:
@.{keys: keys(@), values: *.a}
That gives
{
"keys": [
"1",
"2"
],
"values": [
"G1",
"GM1"
]
}
But then you are back in the same situation as this other question, because keys gives you a list of keys.
You can also end up with a list of lists:
@.[keys(@), *.a]
Will give you:
[
[
"1",
"2"
],
[
"G1",
"GM1"
]
]
And you can even go further and flatten it if needed:
@.[keys(@), *.a][]
Gives:
[
"1",
"2",
"G1",
"GM1"
]
With all this, if you happen to have exactly two items, one solution is a combination of join and slice:
@.[join(',',[keys(@),*.a][] | [::2]), join(',',[keys(@),*.a][] | [1::2])]
That would give the expected:
[
"1,G1",
"2,GM1"
]
But as soon as you have more than two items, you end up with an incorrect result:
[
"1,3,G1,GM3",
"2,4,GM1,GM4"
]
That is what you get for this data set:
{
"1": {
"a": "G1"
},
"2": {
"a": "GM1"
},
"3": {
"a": "GM3"
},
"4": {
"a": "GM4"
}
}
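To see why the slice trick breaks, here is a plain-Python mirror of the flatten-and-slice expression above, next to a version that pairs each key with its own value. This is Python, not JMESPath. The function names are hypothetical, and Python's [::2] and [1::2] slices pick the same elements as the JMESPath slices. Because the flattened list is all keys followed by all values, every-other-element slicing only lines keys up with their values when there are exactly two items.

```python
def slice_trick(doc):
    """Mirror of joining [keys, values] flattened and sliced with [::2] / [1::2]."""
    flat = list(doc.keys()) + [v["a"] for v in doc.values()]
    return [",".join(flat[::2]), ",".join(flat[1::2])]


def pair_rows(doc):
    """Pair each top-level key with its own "a" value; scales to any size."""
    return [f"{key},{value['a']}" for key, value in doc.items()]


small = {"1": {"a": "G1"}, "2": {"a": "GM1"}}
big = {"1": {"a": "G1"}, "2": {"a": "GM1"}, "3": {"a": "GM3"}, "4": {"a": "GM4"}}

print(slice_trick(small))   # correct only because there are two items
print(slice_trick(big))     # keys and values no longer line up
print(pair_rows(big))
print("\n".join(pair_rows(small)))  # single newline-separated string
```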
Of course, the same can be achieved by hardcoding indexes:
@.[join(',', [keys(@)[0], *.a | [0]]), join(',', [keys(@)[1], *.a | [1]])]
That also gives the expected:
[
"1,G1",
"2,GM1"
]
But this only works if you know in advance how many rows will be returned.
And if you want a single string, and the place where you feed the data accepts \n as a newline, you can join the whole array again:
@.[join(',', [keys(@)[0], *.a | [0]]), join(',', [keys(@)[1], *.a | [1]])].join(`\n`,@)
Will give:
"1,G1\n2,GM1"
Finally, this expression worked for me:
[{key1:keys(@)[0],a:*.a| [0]},{key1:keys(@)[1],a:*.a| [1]}]

Jq convert an object into an array

I have the following file, "Pokemon.json": a stripped-down list of Pokémon giving each one's Pokédex ID, name and an array of type objects.
[{
"name": "onix",
"id": 95,
"types": [{
"slot": 2,
"type": {
"name": "ground"
}
},
{
"slot": 1,
"type": {
"name": "rock"
}
}
]
}, {
"name": "drowzee",
"id": 96,
"types": [{
"slot": 1,
"type": {
"name": "psychic"
}
}]
}]
What I'm trying to achieve is to extract the name value of each type object and put those names into an array.
I can easily get an array of all the types with
jq -r '.pokemon[].types[].type.name' pokemon.json
But I'm missing the key step: collecting the name fields into their own array, like this:
[ {
"name": "onix",
"id": 95,
"types": [ "rock", "ground" ]
}, {
"name": "drowzee",
"id": 96,
"types": [ "psychic" ]
} ]
Any help appreciated, thank you!
The manual describes map, which walks over each element and returns something for it (in our case, the same data, constructed differently).
So for each entry you create a new object and put the values you want inside it.
Note that you need a second map inside, over .types, to turn each type object into its name while still producing one object per Pokémon. Applying map to the whole list (instead of iterating with .pokemon[]) gives an array rather than a stream of separate objects, and sort_by(.slot) puts the types in slot order, matching the expected [ "rock", "ground" ].
So the solution might look like this:
jq '.pokemon | map({name, id, types: (.types | sort_by(.slot) | map(.type.name))})' pokemon.json
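For comparison, here is the same reshaping in plain Python (not jq), for example to check the logic against the sample file. It assumes the file holds an object with a "pokemon" key, as the question's .pokemon[] filter implies. It sorts each Pokémon's types by slot so the order matches the expected [ "rock", "ground" ]. flatten_types is a hypothetical name.

```python
def flatten_types(pokemon):
    """Replace each Pokémon's list of type objects with a list of type names."""
    return [
        {
            "name": p["name"],
            "id": p["id"],
            # order the type objects by slot, then keep only type.name
            "types": [t["type"]["name"]
                      for t in sorted(p["types"], key=lambda t: t["slot"])],
        }
        for p in pokemon
    ]


data = {"pokemon": [
    {"name": "onix", "id": 95, "types": [
        {"slot": 2, "type": {"name": "ground"}},
        {"slot": 1, "type": {"name": "rock"}}]},
    {"name": "drowzee", "id": 96, "types": [
        {"slot": 1, "type": {"name": "psychic"}}]},
]}
print(flatten_types(data["pokemon"]))
```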

Collect JSON objects with same attribute value, and create new key/value pairs

Here is a simplified sample of the JSON data I'm working with:
[
{ "certname": "one.example.com",
"name": "fact1",
"value": "value1"
},
{ "certname": "one.example.com",
"name": "fact2",
"value": 42
},
{ "certname": "two.example.com",
"name": "fact1",
"value": "value3"
},
{ "certname": "two.example.com",
"name": "fact2",
"value": 10000
},
{ "certname": "two.example.com",
"name": "fact3",
"value": { "anotherkey": "anothervalue" }
}
]
The result I want to achieve, using jq preferably, is the following:
[
{
"certname": "one.example.com",
"fact1": "value1",
"fact2": 42
},
{
"certname": "two.example.com",
"fact1": "value3",
"fact2": 10000,
"fact3": { "anotherkey": "anothervalue" }
}
]
It's worth pointing out that the elements by no means all have the same name/value pairs. Also, values are often complex objects in their own right.
If I were doing this in Python, it wouldn't be a big deal (and yes, I can hear the chorus of "do it in Python" ringing in my ears now). I would like to understand how to do this in jq, and it's escaping me at the moment.
... using jq preferably ...
That's the spirit! And in that spirit, here's a concise solution:
map( {certname, (.name): .value} )
| group_by(.certname)
| map(add)
Of course there are other reasonable solutions. If the above is at first puzzling, you might like to add a debug statement here or there, or you might like to explore the pipeline by executing the first line by itself, etc.
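Since the question mentions Python, here is a step-by-step Python mirror of the jq pipeline, which may help when exploring what each stage does. It is not jq, and collect_facts is a hypothetical name. Like jq's group_by, it sorts by certname before grouping. Like add on objects, it merges each group so that later keys overwrite earlier ones.

```python
from itertools import groupby
from operator import itemgetter


def collect_facts(rows):
    # Stage 1, map({certname, (.name): .value}): one small object per fact
    pairs = [{"certname": r["certname"], r["name"]: r["value"]} for r in rows]
    # Stage 2, group_by(.certname): stable sort by key, then group adjacent items
    pairs.sort(key=itemgetter("certname"))
    groups = [list(g) for _, g in groupby(pairs, key=itemgetter("certname"))]
    # Stage 3, map(add): merge the objects in each group
    result = []
    for group in groups:
        merged = {}
        for obj in group:
            merged.update(obj)
        result.append(merged)
    return result


rows = [
    {"certname": "one.example.com", "name": "fact1", "value": "value1"},
    {"certname": "one.example.com", "name": "fact2", "value": 42},
    {"certname": "two.example.com", "name": "fact1", "value": "value3"},
    {"certname": "two.example.com", "name": "fact2", "value": 10000},
    {"certname": "two.example.com", "name": "fact3",
     "value": {"anotherkey": "anothervalue"}},
]
print(collect_facts(rows))
```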

Parsing JIRA Insights API JSON using jq

I have JSON output from the JIRA Insights API and, after some digging around, found jq for parsing it. I'm struggling to work out how to parse the following so that it only returns values for the objectTypeAttributeIds I'm interested in.
For example, I'm only interested in the value of objectTypeAttributeId 887 provided that objectTypeAttributeId 911's status name is Active, but then I would like to return the name value of another objectTypeAttributeId.
Can this be achieved using jq only? Or should I be using something else?
I can filter down to this level, the 'attributes' section of the JSON output, and print each value, but I'm struggling to find an example that fits my situation.
{
"id": 137127,
"objectTypeAttributeId": 887,
"objectAttributeValues": [
{
"value": "false"
}
],
"objectId": 9036,
"position": 16
},
{
"id": 137128,
"objectTypeAttributeId": 888,
"objectAttributeValues": [
{
"value": "false"
}
],
"objectId": 9036,
"position": 17
},
{
"id": 137296,
"objectTypeAttributeId": 911,
"objectAttributeValues": [
{
"status": {
"id": 1,
"name": "Active",
"category": 1
}
}
],
"objectId": 9036,
"position": 18
},
Can this be achieved using jq only?
Yes, jq was designed precisely for this kind of query. In your case, you could use any, select and if ... then ... else ... end, along the lines of:
if any(.[]; .objectTypeAttributeId == 911 and
any(.objectAttributeValues[]; .status.name == "Active"))
then map(select(.objectTypeAttributeId == 887))
else "whatever"
end
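The same logic in plain Python, for readers not tied to jq. This is a sketch that assumes the attributes are the elements of a JSON array (the fragment in the question is a list of such objects). The function name active_887 is hypothetical, and "whatever" stands in for your else branch just as in the jq answer.

```python
def active_887(attributes):
    """If attribute 911 has status name "Active", return the 887 attributes."""
    # any(.[]; .objectTypeAttributeId == 911 and any(...; .status.name == "Active"))
    is_active = any(
        attr.get("objectTypeAttributeId") == 911
        and any((v.get("status") or {}).get("name") == "Active"
                for v in attr.get("objectAttributeValues", []))
        for attr in attributes
    )
    if is_active:
        # map(select(.objectTypeAttributeId == 887))
        return [a for a in attributes if a.get("objectTypeAttributeId") == 887]
    return "whatever"  # placeholder else-branch, as in the jq answer


attributes = [
    {"id": 137127, "objectTypeAttributeId": 887,
     "objectAttributeValues": [{"value": "false"}], "objectId": 9036, "position": 16},
    {"id": 137128, "objectTypeAttributeId": 888,
     "objectAttributeValues": [{"value": "false"}], "objectId": 9036, "position": 17},
    {"id": 137296, "objectTypeAttributeId": 911,
     "objectAttributeValues": [{"status": {"id": 1, "name": "Active", "category": 1}}],
     "objectId": 9036, "position": 18},
]
print(active_887(attributes))
```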