How to Index & Search Nested Json in Solr 4.9.0 - json

I want to index & search nested json in solr. Here is my json code
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
When I try to Index, I'm getting the error "Error parsing JSON field value. Unexpected OBJECT_START"
When we tried to use Multivalued Field & index, we couldn't able to search using the multivalued field? Its returning "Undefined Field"
Also Please advice if I need to do any changes in schema.xml file?

You are nesting child documents within your document. You need to use the proper syntax for nested child documents in JSON:
[
{
"id": "1",
"title": "Solr adds block join support",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "2",
"comments": "SolrCloud supports it too!"
}
]
},
{
"id": "3",
"title": "Lucene and Solr 4.5 is out",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "4",
"comments": "Lots of new features"
}
]
}
]
Have a look at this article which describes JSON child documents and block joins.

Using the format mentioned by #qux you will face "Expected: OBJECT_START but got ARRAY_START at [16]",
"code": 400
as when JSON starting with [....] will parsed as a JSON array
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
The above format is correct.
Regarding searching. Kindly use the index to search for the elements of the JSON array.
The workaround for this can be keeping the whole JSON object inside other JSON object and the indexing it
I was suggesting to keep the whole data inside another JSON object. You can try the following way
{
"data": [
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
]
}

see the syntax in http://yonik.com/solr-nested-objects/
$ curl http://localhost:8983/solr/demo/update?commitWithin=3000 -d '
[
{id : book1, type_s:book, title_t : "The Way of Kings", author_s : "Brandon Sanderson",
cat_s:fantasy, pubyear_i:2010, publisher_s:Tor,
_childDocuments_ : [
{ id: book1_c1, type_s:review, review_dt:"2015-01-03T14:30:00Z",
stars_i:5, author_s:yonik,
comment_t:"A great start to what looks like an epic series!"
}
,
{ id: book1_c2, type_s:review, review_dt:"2014-03-15T12:00:00Z",
stars_i:3, author_s:dan,
comment_t:"This book was too long."
}
]
}
]'
supported from solr 5.3

Related

PayPal JSON format updating order

I know I am close on this, the error messages are getting nicer. Currently, I can call a similar call to update the seller's email no issue via Postman currently, working on updating the amount and associated objects. Something in my request format is off.
Is my breakdown section in the correct location? The amount_breakdown documentation looks like it is on same level as value and currency_code, so does it need to move into that section.
Here's my request JSON via Postman:
[
{
"op": "replace",
"path": "/purchase_units/#reference_id=='default'/amount",
"value": {
"currency_code": "CAD",
"value": "2",
"amount": {
"currency_code": "CAD",
"value": "2",
"breakdown": {
"item_total": {
"currency_code": "CAD",
"value": "2"
},
"tax_total": {
"value": "0",
"currency_code": "CAD"
}
}
},
"items": [
{
"name": "First Product Name",
"description": "Optional descriptive text..",
"unit_amount": {
"currency_code": "CAD",
"value": "2"
},
"tax": {
"value": "0",
"currency_code": "CAD"
},
"quantity": "1"
}
]
}
}
]
RESPONSE:
{
"name": "UNPROCESSABLE_ENTITY",
"details": [
{
"field": "/purchase_units/#reference_id=='default'/amount/breakdown/item_total",
"location": "body",
"issue": "ITEM_TOTAL_REQUIRED",
"description": "If item details are specified (items.unit_amount and items.quantity) corresponding amount.breakdown.item_total is required."
}
],
"message": "The requested action could not be performed, semantically incorrect, or failed business validation.",
"debug_id": "acecd3643c994",
"links": [
{
"href": "https://developer.paypal.com/docs/api/orders/v2/#error-ITEM_TOTAL_REQUIRED",
"rel": "information_link",
"method": "GET"
}
]
}
Thanks for any help!
Different variations of objects.
I can get the other PATCH operation working no issue but it is much simpler in object structure
There should be no amount key under the /amount path, and the items array does not belong at that /amount path either.

Json Path Read from a Kafka Message

I have a kafka message like below, where im trying to read the data from the json path. However im having a challenge when reading some of the attributes from the json path. here is the sample message.
sample1:
{
"header": {
"bu": "google",
"id": "12345",
"bum": "google",
"originTimestamp": "2021-10-09T15:17:09.842+00:00",
"batchSize": "0",
"jobType": "Batch"
},
"payload": {
"derivationdetails": {
"Id": "6783jhvvh897u31y283y",
"itemid": "1234567",
"batchid": 107,
"attributes": {
"itemid": "1234567",
"lineNbr": "1498",
"cat": "5929",
"Id": "6783jhvvh897u31y283y",
"indicator": "false",
"subcat": "3514"
},
"Exception": {
"values": [
{
"type": "PICK",
"value": "blocked",
"Reason": [
"RULE"
],
"rules": [
"439"
]
}
],
"rulesBagInfo": [
{
"Idtype": "XXXX",
"uniqueid": "7889423rbhevfhjaufdyeuiryeukjbdafvjd",
"rulesMatch": [
"439"
]
}
]
}
}
}
}
sample 2: Same message but see the difference in "Payload"
{
"header": {
"bu": "google",
"id": "12345",
"bum": "google",
"originTimestamp": "2021-10-09T15:17:09.842+00:00",
"batchSize": "0",
"jobType": "Batch"
},
"payload": {
"Id": "6783jhvvh897u31y283y",
"itemid": "1234567",
"batchid": 107,
"attributes": {
"itemid": "1234567",
"lineNbr": "1498",
"cat": "5929",
"Id": "6783jhvvh897u31y283y",
"indicator": "false",
"subcat": "3514"
},
"Exception": {
"values": [
{
"type": "PICK",
"value": "blocked",
"Reason": [
"RULE"
],
"rules": [
"439"
]
}
],
"rulesBagInfo": [
{
"Idtype": "XXXX",
"uniqueid": "7889423rbhevfhjaufdyeuiryeukjbdafvjd",
"rulesMatch": [
"439"
]
}
]
}
}
}
If you observe, sometimes the message has "derivationdetails", and sometimes it doesn't. But irrespective of its existence, i need to read the values of id,itemid and batchid. I tried using
$.payload[*].id
$.payload[*].itemid
$.payload[*].batchid
But i see that for batchid is returning null even though it has a value in the message, and the attributes under "attributes" return null if im using the above. For fields under "attributes" im using this(example):
$.payload.attributes.itemId
And, completely blank on how to read the below part.
"Exception": {
"values": [
{
"type": "PICK",
"value": "blocked",
"Reason": [
"RULE"
],
"rules": [
"439"
]
}
],
"rulesBagInfo": [
{
"Idtype": "XXXX",
"uniqueid": "7889423rbhevfhjaufdyeuiryeukjbdafvjd",
"rulesMatch": [
"439"
]
Im new to this and need some suggestions on how to read the attributes properly. Any help would be much appreciated.Thanks
Use ..(recursive descent, Deep scan. JSONPath borrows this syntax from E4X.) to get the values. But It will return a list if there are multiple entries with same key nested in deep.
Below jsonpath expressions will return a list with one item each for both sample1 and sample2
$.payload..attributes.Id
$.payload..attributes.itemid
$.payload..batchid
$.payload..Exception

Compare 2 cucumber JSON reports with ruby

The problem is: I have 2 cucumber test reports in JSON format
I need to remove redundant key-value pairs from those reports and compare them, but I can't understand how to remove the unnecessary data from those 2 jsons because of their structure after JSON.parse (array or hash with many nested arrays/hashes). Please advice if there are some gems or known solutions to do this
JSON structure is e.g. :
[
{
"uri": "features/home_screen.feature",
"id": "as-a-user-i-want-to-explore-home-screen",
"keyword": "Feature",
"name": "As a user I want to explore home screen",
"description": "",
"line": 2,
"tags": [
{
"name": "#home_screen",
"line": 1
}
],
"elements": [
{
"keyword": "Background",
"name": "",
"description": "",
"line": 3,
"type": "background",
"before": [
{
"match": {
"location": "features/step_definitions/support/hooks.rb:1"
},
"result": {
"status": "passed",
"duration": 505329000
}
}
],
"steps": [
{
"keyword": "Given ",
"name": "I click OK button in popup",
"line": 4,
"match": {
"location": "features/step_definitions/registration_steps.rb:91"
},
"result": {
"status": "passed",
"duration": 2329140000
}
},
{
"keyword": "And ",
"name": "I click Allow button in popup",
"line": 5,
"match": {
"location": "features/step_definitions/registration_steps.rb:96"
},
"result": {
"status": "passed",
"duration": 1861776000
}
}
]
},
Since you are asking for a gem, you might try iteraptor I have created exactly for this kind of tasks.
It allows iterating, mapping and reducing the deeply nested structures. For instance, to filter out all the keys called "name" on all levels, you might do:
input.iteraptor.reject(/name/)
The more detailed description might be found on the github page linked above.

How to parse a JSON key that has a hash sign # in it with the jq command?

I have a JSON data:
{
"module": {
"data": {
"deliverySummary_200648721592191#address": {
"fields": {
"address": "MyAddress",
"consignee": "MyName",
"phone": "MyPhone",
"postCode": "",
"title": "Alamat Pengiriman \\r\\n"
},
"id": "200648721592191#address",
"tag": "deliverySummary",
"type": "biz"
}
}
}
}
And I want to extract this part:
{
"fields": {
"address": "MyAddress",
"consignee": "MyName",
"phone": "MyPhone",
"postCode": "",
"title": "Alamat Pengiriman \\r\\n"
},
"id": "200648721592191#address",
"tag": "deliverySummary",
"type": "biz"
}
I have tried jq '.module.data.deliverySummary_200648721592191#address' but it just returns null instead of the part that I want above, how do I fix it ?
You should add double-quotation-marks around the problematic property key like '.module.data."deliverySummary_200648721592191#address"' for your case.
See the playground result here.

JQ JSON parser, concatenate a certain child of an array

I have the following JSON input.
The array is called cars.
So if I do .cars I get what you see below.
Now I have to concatenate all .name elements for each item of the array.
I want the output to be
Audi
VW
Audi,Honda,Chevy
Can you please help me construct the filter to output this concatenation of .name?
Sometimes .name can be empty, not null, nothing just empty. So I do need a // "null" added to the filter as well.
Thank you in advance.
[
{
"self": "link",
"id": "18900",
"name": "Audi",
"releaseDate": "2015-12-11"
}
]
[
{
"self": "link",
"id": "18900",
"name": "VW",
"releaseDate": "2015-12-11"
}
]
[
{
"self": "link",
"id": "19400",
"name": "Audi",
"releaseDate": "2015-11-20"
},
{
"self": "link",
"id": "18900",
"name": "Honda",
"releaseDate": "2015-12-11"
},
{
"self": "link",
"id": "19201",
"name": "Chevy",
"releaseDate": "2016-01-08"
}
]
Does this solve your problem?
.cars | map(.name? // empty) | join(",")