In the Json string given below, I want to find all elements in which category = m AND the "middle" array contains elements which match this condition - the element's "middle" array has objects whose itemType = Executable.
I would like to use jsonpath to get the desired objects. I prefer to not use jmespath because it can be too complex for my purpose. But, I am new to jsonpath and I am not able to figure out the json query from online tutorials which are too trivial or basic. I wonder if its better to use a programming language instead to get the data I need. Please advise.
So far, I was able to only extract elements in which category = m by using this jsonpath query $.[?(#.category=="m")]. How do I do the remaining part ?
Json :
Overview - Every object has a "content" object. Each content object generally has a start, middle and end array besides other fields. Middle arrays can have multiple content objects inside them and so on. Some of the content objects have only a middle array. I am interested in locating items in such content objects as mentioned above.
Note that this is not the actual json which I have to process. It is an imitation which has been sanitized for SO.
{
"id": "123",
"contents": {
"title": "B1",
"start": [],
"middle": [
{
"level": "1",
"contents": {
"title": "C1",
"category": "c",
"start": [],
"middle": [
{
"level": "2",
"contents": {
"title": "M1",
"category": "m",
"start": [],
"middle": [
{
"level": "3",
"contents": {
"title": "MAT1",
"middle": [
{
"itemType": "Data"
}
]
}
},
{
"level": "3",
"contents": {
"title": "MAT2",
"middle": [
{
"itemType": "Executable",
"id": "exec1"
}
]
}
},
{
"level": "3",
"contents": {
"title": "MAT3",
"middle": [
{
"itemType": "Data"
}
]
}
}
],
"end": []
}
},
{
"level": "2",
"contents": {
"title": "M2",
"category": "m",
"start": [],
"middle": [
{
"level": "3",
"contents": {
"title": "MAT1",
"middle": [
{
"itemType": "Data"
}
]
}
},
{
"level": "3",
"contents": {
"title": "MAT2",
"middle": [
{
"itemType": "Executable",
"id": "exec2"
}
]
}
}
],
"end": []
}
}
],
"end": []
}
},
{
"level": "1",
"contents": {
"title": "C2",
"category": "c",
"start": [],
"middle": [
{
"level": "2",
"contents": {
"title": "M1",
"category": "m",
"start": [],
"middle": [
{
"level": "3",
"contents": {
"title": "MAT1",
"middle": [
{
"itemType": "Data"
}
]
}
},
{
"level": "3",
"contents": {
"title": "MAT2",
"middle": [
{
"itemType": "Executable",
"id": "exec3"
}
]
}
},
{
"level": "3",
"contents": {
"title": "MAT3",
"middle": [
{
"itemType": "Data"
}
]
}
}
],
"end": []
}
},
{
"level": "2",
"contents": {
"title": "M2",
"category": "m",
"start": [],
"middle": [
{
"level": "3",
"contents": {
"title": "MAT1",
"middle": [
{
"itemType": "Data"
}
]
}
},
{
"level": "3",
"contents": {
"title": "MAT2",
"middle": [
{
"itemType": "Executable",
"id": "exec4"
}
]
}
},
{
"level": "3",
"contents": {
"title": "MAT3",
"middle": [
{
"itemType": "Data"
}
]
}
}
],
"end": []
}
}
],
"end": []
}
}
],
"end": []
}
}
Context
json with nested objects1
jsonpath expression language
choosing between jsonpath and jmespath (or other JSON expression engine)
Problem
DeveMasterJoe2 wants to extract some values from nested JSON
Discussion
There are lots of implementations of jsonpath out there, and they do not all support the same features
The structure and normalization of the source JSON is going to influence how easily this can be done with pure jsonpath
In choosing a JSON expression engine, one has to weigh multiple factors
how consistent are the implementations across languages?
how many choices are there within a given language?
how clear is the specification?
how many examples, unit-tests or tutorials are available?
who is supporting it?
Example solution using Python and jsonpath-ng
Here is an example solution using python 3.7 and jsonpath-ng
This example uses a mix of jsonpath and python instead of just pure jsonpath, because of the heavily-nested JSON
I will leave it for someone else to provide an answer that relies on pure jsonpath
Note that the source JSON arguably could stand to be cleaned up a bit
(for example, why is there no id field attached to itemType==Data elements?)
(for example, why is category not found on all contents elements?)
(for example, if you expressly specify level why complicate things with heavily nested objects when you can determine depth by level ?)
This example:
## import libraries
import codecs
import json
import jsonpath_ng
from jsonpath_ng.ext import parse
##;;
## init vars
href="path/to/my/jsonfile/nested_dict.json"
json_string = codecs.open(href, 'rb', encoding='utf8').read()
json_dataroot = json.loads(json_string)
final_result = []
##;;
## init jsonpath outer-query
match = parse('$..contents.middle[*]').find(json_dataroot)
##;;
## iterate through outer-query and gather subelements
for ijj,item in enumerate(match):
## restrict to desired category == 'm'
if(match[ijj].value.get('contents',{}).get('category','') == 'm'):
## extract out desired subelements
json_datafrag001 = [item.get('contents',{}).get('middle',{})[0]
for item in match[ijj].value.get('contents',{}).get('middle',{})
]
match001 = parse("$[?(#.itemType=='Executable')]").find(json_datafrag001)
final_result.extend(list(match001[ikk].value for ikk,item in enumerate(match001)))
pass
##;;
## show final result
vout = json.dumps(final_result, sort_keys=True,indent=4, separators=(',', ': '))
print(vout)
##;;
... produces this result ...
[
{
"id": "exec1",
"itemType": "Executable"
},
{
"id": "exec2",
"itemType": "Executable"
},
{
"id": "exec3",
"itemType": "Executable"
},
{
"id": "exec4",
"itemType": "Executable"
}
]
1 (aka dictionary, associative-array, hash)
Related
I have a requirement to roll a collection of nodes that uses the current node name (within the collection) and for the value take each child nodes value (single node) into a string array, then use the parents key as the key.
Given.
{
"client": {
"addresses": [
{
"id": "27ef465ef60d2705",
"type": "RegisteredOfficeAddress"
},
{
"id": "b7affb035be3f984",
"type": "PlaceOfBusiness"
},
{
"id": "a8a3bef166141206",
"type": "EmailAddress"
}
],
"links": [
{
"id": "29a9de859e70799e",
"type": "Director",
"name": "Bob the Builder"
},
{
"id": "22493ad4c4fd8ac5",
"type": "Secretary",
"name": "Jennifer"
}
],
"Names": [
{
"id": "53977967eadfffcd",
"type": "EntityName",
"name": "Banjo"
}
]
}
}
from this the output needs to be
{
"client": {
"addresses": [
"RegisteredOfficeAddress",
"PlaceOfBusiness",
"EmailAddress"
],
"links": [
"Director",
"Secretary"
],
"Names": [
"EntityName"
]
}
}
What is the best way to achieve this? Any pointers to what/how to do this would be greatly appreciated.
Ron.
You can iterate over entries of your client object first with the help of the $each function, then get types for each of them, and combine via $merge:
{
"client": client
~> $each(function($list, $key) {{ $key: $list.type }})
~> $merge
}
Live playground: https://stedi.link/OpuRdE9
I'd like to select/identity-output all objects in arrays under "emp" keys into a flat array of those objects.
[
{
"eng": {
"dev": {
"dir": {
"name": "Mickey"
},
"emp": [
{
"name": "Goofy",
"job": "laugh",
"start": "today"
},
{
"name": "Minnie",
"job": "laugh"
}
]
}
}
},
{
"mgmt": {
"dir": {
"name": "Donald"
},
"emp": [
{
"name": "Woody",
"job": "smile"
},
{
"name": "Buzz",
"job": "smile"
}
]
}
}
]
I'm looking for a flat array of arbitrary objects found in arbitrary locations within the document (in this example, under "emp" parent/keys).
In this example, it would look like
[
{
"name": "Goofy",
"job": "laugh",
"start": "today"
},
{
"name": "Minnie",
"job": "laugh"
},
{
"name": "Woody",
"job": "smile"
},
{
"name": "Buzz",
"job": "smile"
}
]
I've looked through a lot of documentation and am able to do this if I know in advance precisely where these 'emp' keys are in the document, but not if they're distributed through the document at a priori unknown locations/paths.
Use recurse to walk the structure. From all the substrucures, select objects with the emp key. Output the corresponding values and merge the resulting arrays.
jq '[recurse | select (type == "object" and .emp) | .emp ] | add' file.json
The problem is: I have 2 cucumber test reports in JSON format
I need to remove redundant key-value pairs from those reports and compare them, but I can't understand how to remove the unnecessary data from those 2 jsons because of their structure after JSON.parse (array or hash with many nested arrays/hashes). Please advice if there are some gems or known solutions to do this
JSON structure is e.g. :
[
{
"uri": "features/home_screen.feature",
"id": "as-a-user-i-want-to-explore-home-screen",
"keyword": "Feature",
"name": "As a user I want to explore home screen",
"description": "",
"line": 2,
"tags": [
{
"name": "#home_screen",
"line": 1
}
],
"elements": [
{
"keyword": "Background",
"name": "",
"description": "",
"line": 3,
"type": "background",
"before": [
{
"match": {
"location": "features/step_definitions/support/hooks.rb:1"
},
"result": {
"status": "passed",
"duration": 505329000
}
}
],
"steps": [
{
"keyword": "Given ",
"name": "I click OK button in popup",
"line": 4,
"match": {
"location": "features/step_definitions/registration_steps.rb:91"
},
"result": {
"status": "passed",
"duration": 2329140000
}
},
{
"keyword": "And ",
"name": "I click Allow button in popup",
"line": 5,
"match": {
"location": "features/step_definitions/registration_steps.rb:96"
},
"result": {
"status": "passed",
"duration": 1861776000
}
}
]
},
Since you are asking for a gem, you might try iteraptor I have created exactly for this kind of tasks.
It allows iterating, mapping and reducing the deeply nested structures. For instance, to filter out all the keys called "name" on all levels, you might do:
input.iteraptor.reject(/name/)
The more detailed description might be found on the github page linked above.
What is the best way to serialize a dictionary when key is a complex type. For example consider this invalid json:
[
{ParentId:1, ParentName:'X'}:
[{'ChildId':'1', 'ChildName':'a'}, {'ChildId':'2', 'ChildName':'b'}],
{ParentId:2, ParentName:'Y'}:
[{'ChildId':"3", 'ChildName':'c'}]}
]
Is there any way to correctly serialize this?
You can wrap the json with keys as strings and values as your complex object.
In addition, your keys of the complex type should be strings as well.
For example:
[
{
"name": {
"ParentId": 1,
"ParentName": "X"
},
"value": [
{
"ChildId": "1",
"ChildName": "a"
},
{
"ChildId": "2",
"ChildName": "b"
}
]
},
{
"name": {
"ParentId": 2,
"ParentName": "Y"
},
"value": [
{
"ChildId": "3",
"ChildName": "c"
}
]
}
]
Unfortunately this is not a valid json, so the answer would be no :-( keys have to be strings. You could reorganize your data structure to wrap your children list into a "parent" object which has an id and a name !
Basically it would look like :
[
{
"ParentId": 1,
"ParentName": "X",
"children": [
{
"ChildId": 1,
"ChildName": "a"
},
{
"ChildId": 2,
"ChildName": "b"
}
]
},
{
"ParentId": 2,
"ParentName": "Y",
"children": [
{
"ChildId": 3,
"ChildName": "c"
}
]
}
]
You can find the specifications of JSON here : https://www.rfc-editor.org/rfc/rfc4627
I want to index & search nested json in solr. Here is my json code
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
When I try to Index, I'm getting the error "Error parsing JSON field value. Unexpected OBJECT_START"
When we tried to use Multivalued Field & index, we couldn't able to search using the multivalued field? Its returning "Undefined Field"
Also Please advice if I need to do any changes in schema.xml file?
You are nesting child documents within your document. You need to use the proper syntax for nested child documents in JSON:
[
{
"id": "1",
"title": "Solr adds block join support",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "2",
"comments": "SolrCloud supports it too!"
}
]
},
{
"id": "3",
"title": "Lucene and Solr 4.5 is out",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "4",
"comments": "Lots of new features"
}
]
}
]
Have a look at this article which describes JSON child documents and block joins.
Using the format mentioned by #qux you will face "Expected: OBJECT_START but got ARRAY_START at [16]",
"code": 400
as when JSON starting with [....] will parsed as a JSON array
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
The above format is correct.
Regarding searching. Kindly use the index to search for the elements of the JSON array.
The workaround for this can be keeping the whole JSON object inside other JSON object and the indexing it
I was suggesting to keep the whole data inside another JSON object. You can try the following way
{
"data": [
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
]
}
see the syntax in http://yonik.com/solr-nested-objects/
$ curl http://localhost:8983/solr/demo/update?commitWithin=3000 -d '
[
{id : book1, type_s:book, title_t : "The Way of Kings", author_s : "Brandon Sanderson",
cat_s:fantasy, pubyear_i:2010, publisher_s:Tor,
_childDocuments_ : [
{ id: book1_c1, type_s:review, review_dt:"2015-01-03T14:30:00Z",
stars_i:5, author_s:yonik,
comment_t:"A great start to what looks like an epic series!"
}
,
{ id: book1_c2, type_s:review, review_dt:"2014-03-15T12:00:00Z",
stars_i:3, author_s:dan,
comment_t:"This book was too long."
}
]
}
]'
supported from solr 5.3