How do I simplify a JSON object using jq?

I've got a huge JSON object and I want to filter it down to a small percentage of the available fields. I've looked at some similar questions, but those deal with an array of objects. I have a JSON object that looks something like:
{
  "timestamp": 1455408955250999808,
  "client": {
    "ip": "76.72.172.208",
    "srcPort": 0,
    "country": "us",
    "deviceType": "desktop"
  },
  "clientRequest": {
    "bytes": 410,
    "bodyBytes": 0
  }
}
What I'm trying to do is create a new JSON object that looks like:
{
  "timestamp": 1455408955250999808,
  "client": {
    "ip": "76.72.172.208"
  },
  "clientRequest": {
    "bytes": 410
  }
}
So, effectively, I want to filter down the data. I've tried:
| jq 'map({client.ip: .client.ip, timestamp: .timestamp})' and I continue to get:
jq: error (at <stdin>:0): Cannot index number with string "client"
Even the simplest | jq 'map({timestamp: .timestamp})' shows the same error.
I thought I could access the key/value pairs and use the map function as the person in the linked question did for their array. Any help is much appreciated.

Huzzah. Simple enough really :)
cat LogSample.txt | jq '. | {Id: .Id, client: {ip: .client.ip}}'
Basically define the object yourself :)

It looks like it will be simplest if you construct the object you want. Based on your example, you could do so using the following filter:
{ timestamp,
  client: { ip: .client.ip },
  clientRequest: { bytes: .clientRequest.bytes }
}
By contrast, map expects its input to be an array, whereas your input is a JSON object.
Please also note that jq provides direct ways to remove keys as well, e.g. using del/1.
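For example, a sketch of the del-based alternative for this specific input (the field names come from the sample above, and LogSample.txt is just an illustrative filename) could be:
jq 'del(.client.srcPort, .client.country, .client.deviceType, .clientRequest.bodyBytes)' LogSample.txt
Constructing the object explicitly tends to be safer when the input has many fields you don't want; del/1 is handier when you only need to drop a few.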

Related

jq get values from complex object

I have an object that looks like this:
{
"my_list": [
{
"name": "an_image",
"image": "an_image.com"
},
{
"name": "another_image",
"image": "another_image.com"
},
...<more objects with image property>
],
"foo": {
"image": "foobar.io"
},
"bar": {
"image": "bar_image.io"
},
...<more objects with image property>
}
I'd like to get all of the image properties from each of the objects, and from each object in my_list and other lists that have objects that include an image property. So in this example I'd like to get
"an_image.com"
"another_image.com"
"foobar.io"
"bar_image.io"
We don't know the keys of any of these objects at runtime, so we can't reference my_list, foo, or bar in this example.
Previously we didn't have my_list in the object and jq '.[].image' worked, but now that results in jq: error (at bar.json:18): Cannot index array with string "image".
The problem is that we don't know the names of the objects that contain image properties, so we can't reference them explicitly; and now that we've added another element that's a list, we're running into type errors that I'm not sure how to solve.
I've tried various combinations of .[].image, but they all seem to run into type issues.
If you don't mind the terseness, you could perhaps go with:
jq '..|select(.image?).image'
You could select by objects and items of arrays:
jq '.[] | ., arrays[] | objects.image'
"an_image.com"
"another_image.com"
"foobar.io"
"bar_image.io"
Using recursive descent .. is more elegant:
jq '.. | .image? // empty'
If the input is large, you might want to consider streaming the data in:
$ jq --stream -r 'select(.[0][-1] == "image")[1] // empty' input.json
an_image.com
another_image.com
foobar.io
bar_image.io
When streamed, your input will be processed as path/value pairs for the most part. Filter the paths you want, then return the value.
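For reference, here is a rough sketch of the kind of events --stream produces for the sample above (ignoring the elided entries): every leaf arrives as a [path, value] pair, while closing events carry only a path, which is why the // empty is needed to discard them.
$ jq -c --stream . input.json
[["my_list",0,"name"],"an_image"]
[["my_list",0,"image"],"an_image.com"]
[["my_list",0,"image"]]
[["my_list",1,"name"],"another_image"]
[["my_list",1,"image"],"another_image.com"]
...
[["foo","image"],"foobar.io"]
[["foo","image"]]
[["bar","image"],"bar_image.io"]
...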

Is there a way to delete the same key from a list of objects within a nested field?

I'm setting up a DevOps pipeline so that certain data profiles stored in JSON format can be shifted across different servers. While downloading them from the current server I need to clean up all the protected keys and unique identifiers. I'm looking for the cleanest way to do the following in jq.
Input:
{
"TopKey1":{
"some_key":"some_value"
},
"TopKey2":{
"some_key2":"some_value2"
},
"KeytoSearch":[
{
"_id":"sdf",
"non_relevant_key1":"val"
},
{
"_id":"sdfdsdf",
"non_relevant_key2":"val"
},
{
"_id":"sgf",
"non_relevant_key3":"val"
}
]
}
Output:
{
"TopKey1":{
"some_key":"some_value"
},
"TopKey2":{
"some_key2":"some_value2"
},
"KeytoSearch":[
{
"non_relevant_key1":"val"
},
{
"non_relevant_key2":"val"
},
{
"non_relevant_key3":"val"
}
]
}
In Python terms, if this were a dictionary:
for json_object in dictionary["KeytoSearch"]:
    json_object.pop("_id")
I've tried combinations of map and del but can't seem to figure out the nested indexing. The error messages I get are along the lines of jq: error (at <stdin>:277): Cannot index string with string "_id", which tells me I haven't fundamentally understood how jq works or how it is meant to be used. Still, this is the route I need to go, because using a Python script to clean up the JSON objects is something I'd rather avoid.
Going with your input JSON, and assuming there are other properties in the KeytoSearch objects along with the _id fields, you could just do the following:
jq 'del(.KeytoSearch[]._id)'
The quotes around a property key containing _ are not needed, as confirmed in the comments. Some meta-characters do need to be quoted (e.g. a . in a property key has to be accessed with quotes, as ".id"), but _ is clearly not one of them.
I've tried combinations of map and del
Good! You were probably just missing the '|=' magic ingredient:
.KeytoSearch |= map( del(._id) )
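For instance, the full invocation (assuming the input lives in a file; input.json here is just a placeholder name) would look like:
jq '.KeytoSearch |= map(del(._id))' input.json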
Alternatively, you could use jtc, a walk-path Unix tool for JSON, and apply the changes right into the source JSON file (-f):
bash $ jtc -fpw'[KeytoSearch]<_id>l:' file.json
bash $ jtc file.json
{
"KeytoSearch": [
{
"non_relevant_key1": "val"
},
{
"non_relevant_key2": "val"
},
{
"non_relevant_key3": "val"
}
],
"TopKey1": {
"some_key": "some_value"
},
"TopKey2": {
"some_key2": "some_value2"
}
}
bash $
If the given JSON snippet is part of a larger JSON (and [KeytoSearch] is not addressable from the root), then replace that lexeme with the search lexeme <KeytoSearch>l.
PS> Disclosure: I'm the creator of the jtc tool

How can I fix the JSON structure so Spark reads it properly? Different types for the same key

I'm receiving JSON and I don't know which keys the problem will appear on. When Spark sees different types for the same key, it puts the value into a string, but I need the data as an array type. I'm using Spark 2.4 with the JSON library, so I read the JSON as:
spark.read.json("jsonfile")
I'm flattening my JSON schema to this kind of format, where the column names are:
B__C
B__somedifferentColname
A sample JSON looks like this:
{
"A":[
{
"B":{
"C":"Hello There"
}
},
{
"B":[
{
"C":"Hello"
},
{
"C":"Hi"
}
]
}
]
}
and I would like to have this JSON in a format like this:
{
"A":[
{
"B":[{
"C":"Hello There"
}]
},
{
"B":[
{
"C":"Hello"
},
{
"C":"Hi"
}
]
}
]
}
So, as you can see, what I have changed is adding square brackets around the B value in the first object.
But when one value is a struct type and another is a list, Spark casts the column to a string, so the column value ends up looking like:
"[{"C":"Hello"},{"C":"Hi"}]"
whereas it should look like this:
B__C
Hello
Hi
Hello There
Is anyone able to help me with a trick I can use to resolve this issue?
The team that delivers the JSON to us said it's not possible to fix this on their side, so we have to resolve it on ours.
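Not a Spark-side fix, but since the desired change boils down to wrapping a lone B object in an array, one possible pre-processing sketch with jq (input.json is a hypothetical filename) would be:
jq '.A |= map(.B |= (if type == "array" then . else [.] end))' input.json
This leaves B untouched when it is already an array and wraps it in square brackets otherwise, which matches the target structure shown above.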

Flatten nested JSON with jq

I'm trying to flatten some nested JSON with jq. My first attempt was looping over the JSON in bash with base64, as per this article. That turned out to perform very slowly, so I'm trying to figure out an alternative with just jq.
I have some JSON like this:
[
{
"id":117739,
"officers": "[{\"name\":\"Alice\"},{\"name\":\"Bob\"}]"
},
{
"id":117740,
"officers":"[{\"name\":\"Charlie\"}]"
}
]
The officers field holds a string which is JSON too. I'd like to reduce this to:
[
{ "id":117739, "name":"Alice" },
{ "id":117739, "name":"Bob" },
{ "id":117740, "name":"Charlie" }
]
Well, the data you're attempting to flatten is itself JSON, so you have to parse it using fromjson. Once it's parsed, you can generate the new objects.
map({id} + (.officers | fromjson[]))
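Assuming the array shown above is saved in a file (called input.json here purely for illustration), the invocation would be:
jq 'map({id} + (.officers | fromjson[]))' input.json
Here {id} is shorthand for {id: .id}, and fromjson[] parses the embedded string and then streams out each officer object so it can be merged with the id, yielding the flattened array shown above.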

JSON path - extract all maps

I am trying to write a JSON path expression to extract all maps and submaps from a JSON structure. Considering the JSON:
{
"k1":"v1",
"arr": ["1","2","3" ,["7","8"] ],
"submap":
{
"a":"b",
"c":"d"
},
"submap_2":
{
"a_2":"b",
"c_2":"d",
"nested": { "x":"y" }
}
}
I would want to extract the elements "submap", "submap_2", "nested".
I've tried JSONPath expressions like:
$..*[?(@.length()>0 && @.*[0] empty true)]
This returns the structures I want, but also returns [ "7","8" ]. Is there any way to do this with JSONPath or is this better done in code?
(A neat JSONPath testing tool is here: http://jsonpath.herokuapp.com/)
(The specific implementation that I'm using is this one: https://github.com/jayway/JsonPath )
jq queries are often very similar to JSONPath queries, and I would strongly recommend that if at all possible you consider using jq.
Assuming the example data is in a file named example.json, the following invocation of jq produces the result you requested:
$ jq 'path(.. | select(type=="object")) | .[-1] | select(.)' example.json
"submap"
"submap_2"
"nested"
The output of the first filter (path(....)) consists of the full path expressions of the paths to all the JSON objects, including the top-level object itself. The remaining filters are needed to produce the exact output you requested. In practice, though, the full path expressions are probably more useful, so it might be helpful for you to see the output produced by the first filter:
$ jq -c 'path(.. | select(type=="object"))' example.json
[]
["submap"]
["submap_2"]
["submap_2","nested"]