Deeply nested JSON documents in Apache Solr

I have a deeply nested document (pseudo-structure shown below):
[{
  "id": "1",
  "company_id": "1",
  "company_name": "company_1",
  "departments": [{
    "dep1": [{
      "id": 40,
      "name": "xyz"
    },
    {
      "id": 41,
      "name": "xyr"
    }],
    "dep2": [{
    }]
  }],
  "employeePrograms": [{
  }]
}]
How can I index this type of document in Apache Solr?
The documentation only gives the idea of immediate child documents.

Unfortunately I don't have much experience with this technology, but I want to help. Here is some official documentation that might be useful: official doc
more specific
If you have some uncommon issue, tell us about it, maybe an error message or whatever, and I will try my best to help.
Upd1:
Solr can only maintain a 'flat' representation of the data, so what you were trying to do is not really possible. There are a number of workarounds, such as using dynamic fields or using a Solr join to link multiple data sets.
Speaking of deep nesting, I've found the following example of a workaround.
If you had something like this:
"docs": [
  {
    "name": "Product Name",
    "categories": [
      {
        "name": "Category 1",
        "priority": 8
      },
      {
        "name": "Category 2",
        "priority": 6
      }
      ...
    ]
  },
You would have to modify it like this to make it not deeply nested:
"docs": [
  {
    "name": "Product Name",
    "categories": [
      {
        "priority_category": "8_Category 1"
      },
      {
        "priority_category": "6_Category 2"
      }
      ...
    ]
  },
So, if you've done something similar, check whether there are any errors anywhere.
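To make that workaround concrete, here is a minimal Python sketch that collapses the nested categories into the combined field (the flatten_categories name and the "priority_category" naming are only illustrative, taken from the example above rather than from any Solr API):
import json

def flatten_categories(doc):
    # Collapse each nested {"name", "priority"} object into a single
    # "priority_category" string field, as in the workaround above.
    doc["categories"] = [
        {"priority_category": f"{c['priority']}_{c['name']}"}
        for c in doc["categories"]
    ]
    return doc

nested = {"name": "Product Name",
          "categories": [{"name": "Category 1", "priority": 8},
                         {"name": "Category 2", "priority": 6}]}
print(json.dumps(flatten_categories(nested), indent=2))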

Related

JSON PATCH / Remove: how to remove specific value from an object?

How do you remove a value from an object in an http patch request?
For example:
{
  "types": [
    {
      "id": 12,
      "name": "xyz"
    },
    {
      "id": 13,
      "name": "ABC"
    }
  ]
}
How do you remove where type/id=13?
Is it like this?
[{
  "op": "remove",
  "path": "types/13"
}]
Or like this?
[{
  "op": "remove",
  "path": "types",
  "value": [{"id": 13}]
}]
Also, I'm not looking for the "delete by array position" solution!
Thanks :)
One solution I found with the third party I was using was to "update" with null, and that deletes the specific entry. (I'm not sure if it works universally.)
[{
  "op": "replace",
  "path": "types",
  "value": [{
    "id": 13,
    "value": null
  }]
}]
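For what it's worth, RFC 6902 JSON Patch addresses array elements by zero-based index, not by value, so a standards-compliant remove has to target the element's position (e.g. "/types/1" for the second entry). A small Python sketch of the value-based alternative, filtering the parsed document client-side instead of patching:
import json

doc = json.loads('{"types": [{"id": 12, "name": "xyz"}, {"id": 13, "name": "ABC"}]}')

# Standard JSON Patch would be {"op": "remove", "path": "/types/1"}.
# To remove by value (id == 13) without knowing the index, filter directly:
doc["types"] = [t for t in doc["types"] if t["id"] != 13]
print(json.dumps(doc, indent=2))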

replace "key" name in whole JSON python for bulk data in efficient way

I am pushing data to another system, but before pushing it I have to change the keys in the whole JSON. The JSON may contain 200, 10,000, or 250,000 records.
sample JSON:
{
  "insert": "table",
  "contacts": [
    {
      "testName": "testname",
      "ContactID": 212121
    },
    {
      "testName": "testname",
      "ContactID": 2146354564
    },
    {
      "testName": "testname",
      "ContactID": 12312
    },
    {
      "testName": "testname",
      "ContactID": 211221
    },
    {
      "testName": "testname",
      "ContactID": 10218550
    }
  ]
}
I need to change the keys in the contacts array. These contacts may come in bulk, so I need to handle this efficiently, with minimal complexity.
The above JSON should be converted to the following:
{
  "insert": "table",
  "contacts": [
    {
      "name": "testname",
      "phone": 212121
    },
    {
      "name": "testname",
      "phone": 2146354564
    },
    {
      "name": "testname",
      "phone": 12312
    },
    {
      "name": "testname",
      "phone": 211221
    },
    {
      "name": "testname",
      "phone": 10218550
    }
  ]
}
Here is my attempt using a loop:
ini_dict = request.data
contact_data = ini_dict['contacts']
for i in contact_data:
    i['name'] = i.pop('testName')
    i['phone'] = i.pop('ContactID')  # both keys need renaming
print(contact_data)
Please suggest how I can change the key names efficiently for bulk data, i.e. 50,000 entries in contacts. A for loop seems like it will lead to a performance issue, so please let me know an efficient way to achieve this.
I don't know how fast you need it to be, nor how you are choosing to store your JSON. One simple solution is to store it as a string and then replace all instances of the attribute names:
# Something like this using a jsonstring (str.replace returns a new string,
# so the result has to be reassigned; JSON keys use double quotes)
jsonstring = jsonstring.replace('"testName":', '"name":')
jsonstring = jsonstring.replace('"ContactID":', '"phone":')
If you want to do this in bulk, you may need to create some batch process to fetch multiple existing records and make changes at once. I have done this before with the Java equivalent of https://pypi.org/project/JayDeBeApi/, but that was more for modifying existing records in a database.
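For completeness, here is a minimal sketch of the loop approach with both keys mapped in one pass (the KEYMAP name is just illustrative); a single O(n) traversal like this is usually not the bottleneck, even for 50,000 contacts:
import json

KEYMAP = {"testName": "name", "ContactID": "phone"}

def rename_keys(payload):
    # One linear pass over the contacts; each dict is rebuilt with new keys.
    payload["contacts"] = [
        {KEYMAP.get(k, k): v for k, v in contact.items()}
        for contact in payload["contacts"]
    ]
    return payload

data = {"insert": "table",
        "contacts": [{"testName": "testname", "ContactID": 212121}]}
print(json.dumps(rename_keys(data)))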

JSON Slurper Offsets

I have a large JSON file that I'm trying to parse with JSON Slurper. The JSON file consists of information about bugs so it has things like issue keys, descriptions, and comments. Not every issue has a comment though. For example, here is a sample of what the JSON input looks like:
{
  "projects": [
    {
      "name": "Test Project",
      "key": "TEST",
      "issues": [
        {
          "key": "BUG-1",
          "priority": "Major",
          "comments": [
            {
              "author": "a1",
              "created": "d1",
              "body": "comment 1"
            },
            {
              "author": "a2",
              "created": "d2",
              "body": "comment 2"
            }
          ]
        },
        {
          "key": "BUG-2",
          "priority": "Major"
        },
        {
          "key": "BUG-3",
          "priority": "Major",
          "comments": [
            {
              "author": "a3",
              "created": "d3",
              "body": "comment 3"
            }
          ]
        }
      ]
    }
  ]
}
I have a method that creates Issue objects based on the parsed JSON. Everything works well when every issue has at least one comment, but once an issue comes up that has no comments, the rest of the issues get the wrong comments. I am currently looping through the JSON file based on the total number of issues, and then looking up comments by how far along in the issues I have gotten. So, for example,
parsedData.issues.comments.body[0][0][0]
returns "comment 1". However,
parsedData.issues.comments.body[0][1][0]
returns "comment 3", which is incorrect. Is there a way I can see if a particular issue has any comments? I'd rather not have to edit the JSON file to add empty comment fields, but would that even help?
You can do this:
parsedData.issues.comments.collect { it?.body ?: [] }
So it checks for a body and, if none exists, returns an empty list.
UPDATE
Based on the update to the question, you can do:
parsedData.projects.collectMany { it.issues.comments.collect { it?.body ?: [] } }
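If it helps to see the same null-safe idea outside Groovy, here is a rough Python equivalent (the sample string abbreviates the JSON above); dict.get with a default plays the role of the ?. / ?: combination:
import json

sample = """{"projects": [{"issues": [
  {"key": "BUG-1", "comments": [{"body": "comment 1"}, {"body": "comment 2"}]},
  {"key": "BUG-2"},
  {"key": "BUG-3", "comments": [{"body": "comment 3"}]}
]}]}"""

parsed = json.loads(sample)
# issue.get("comments", []) stands in for Groovy's `it?.comments ?: []`
bodies = [[c["body"] for c in issue.get("comments", [])]
          for project in parsed["projects"]
          for issue in project["issues"]]
print(bodies)  # [['comment 1', 'comment 2'], [], ['comment 3']]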

How to update multiple documents in Solr with JSON?

How to update multiple documents in Solr 4.5.1 with JSON? I tried this but it does not work:
POST /solr/mycore/update/json:
{
  "commit": {},
  "add": {
    "overwrite": true,
    "doc": [{
      "thumbnail": "/images/404.png",
      "url": "/404.html?1",
      "id": "demo:/404.html?1",
      "channel": "demo",
      "display_name": "One entry",
      "description": "One entry is not enough."
    }, {
      "thumbnail": "/images/404.png",
      "url": "/404.html?2",
      "id": "demo:/404.html?2",
      "channel": "demo",
      "display_name": "Another entry",
      "description": "Another entry is required."
    }]
  }
}
Solr expects one "add" key in the JSON structure for each document (which might seem weird if you think about the original meaning of the key in the object), since it maps directly to the XML format used when indexing; this way, each document can carry its own metadata.
{
  "commit": {},
  "add": {
    "doc": {
      "id": "321321",
      "name": "barfoo"
    }
  },
  "add": {
    "doc": {
      "id": "123123",
      "name": "Foobar"
    }
  }
}
.. works. I think allowing an array as the element referenced by "add" would make more sense, but I haven't dug into the source and don't know the reasoning behind this.
I understand that this has been fixed as of (at least) Solr 4.0. See http://wiki.apache.org/solr/UpdateJSON.
In ./exampledocs/books.json there is an example of a JSON file with multiple documents.
[
  {
    "id" : "978-0641723445",
    "cat" : ["book","hardcover"],
    "name" : "The Lightning Thief",
    "author" : "Rick Riordan",
    "series_t" : "Percy Jackson and the Olympians",
    "sequence_i" : 1,
    "genre_s" : "fantasy",
    "inStock" : true,
    "price" : 12.50,
    "pages_i" : 384
  },
  {
    "id" : "978-1423103349",
    "cat" : ["book","paperback"],
    "name" : "The Sea of Monsters",
    "author" : "Rick Riordan",
    "series_t" : "Percy Jackson and the Olympians",
    "sequence_i" : 2,
    "genre_s" : "fantasy",
    "inStock" : true,
    "price" : 6.49,
    "pages_i" : 304
  },
  ...
]
While #fiskfisk's answer is still valid JSON, it is not easy to serialize from a data structure. This one is.
elachell is correct that the array format will work if you are just adding documents with the default settings. Unfortunately, that won't work if, for instance, you need to add a custom boost to some of the documents or change the overwrite setting. You then have to use the full object structure with an "add" key for each of them, which, as they pointed out, makes this frustratingly annoying to serialize from most languages, since they don't allow the same key more than once in an object:
{
  "commit": {},
  "add": {
    "doc": {
      "id": "321321",
      "name": "barfoo"
    },
    "boost": 2.0
  },
  "add": {
    "doc": {
      "id": "123123",
      "name": "Foobar"
    },
    "boost": 1.5,
    "overwrite": false
  }
}
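Since most JSON libraries refuse to emit duplicate keys, one workaround (a sketch, not an official recipe) is to serialize each "add" fragment separately and join them by hand:
import json

docs = [({"id": "321321", "name": "barfoo"}, 2.0),
        ({"id": "123123", "name": "Foobar"}, 1.5)]

# Build the body manually so "add" can appear more than once.
parts = ['"commit": {}']
parts += ['"add": ' + json.dumps({"doc": doc, "boost": boost})
          for doc, boost in docs]
body = "{" + ", ".join(parts) + "}"
print(body)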
Update for SOLR 8.8 (and maybe lower).
The following JSON works for /update/json:
{
  "add": [
    {"id": "123", "field1": "foo"},
    {"id": "124", "field1": "foo"}
  ],
  "delete": ["111", "106"]
}
Another option, if you are on Solr 4.10 or later, is to use a custom JSON structure and tell Solr how to index it (not sure how to add boosts with this method either, but it's a nice option if you already have your data in JSON and don't want to convert it to Solr's format). Here's the Solr documentation on this option:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingCustomJSON
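As a concrete illustration of the plain-array format, here is a minimal Python sketch (the localhost URL and the core name mycore are placeholders) that posts two documents and commits:
import json
import urllib.request

docs = [{"id": "321321", "name": "barfoo"},
        {"id": "123123", "name": "Foobar"}]

req = urllib.request.Request(
    "http://localhost:8983/solr/mycore/update?commit=true",  # placeholder core
    data=json.dumps(docs).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode("utf-8"))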

JSON format with gzip compression

My current project sends a lot of data to the browser in JSON via ajax requests.
I've been trying to decide which format I should use. The two I have in mind are
[
  {
    "colname1": "content",
    "colname2": "content"
  },
  {
    "colname1": "content",
    "colname2": "content"
  },
  ...
]
and
{
  "columns": [
    "column name 1",
    "column name 2"
  ],
  "rows": [
    [
      "content",
      "content"
    ],
    [
      "content",
      "content"
    ]
    ...
  ]
}
The first format is better because it is easier to work with: I just have to convert it to an object once received. The second needs some post-processing to turn it into something more like the first, so that it is easier to work with in JavaScript.
The second is better because it is less verbose and therefore takes less bandwidth and downloads more quickly. Before compression it is usually between 75% and 85% of the size of the first format.
GZip compression complicates things further, bringing the difference in file size nearer to 85%-95%.
Which format should I go with and why?
I'd suggest using RJSON:
RJSON (Recursive JSON) converts any JSON data collection into a more compact recursive form. Compressed data is still JSON and can be parsed with JSON.parse. RJSON can compress not only homogeneous collections, but any data sets with free structure.
Example:
JSON:
{
  "id": 7,
  "tags": ["programming", "javascript"],
  "users": [
    {"first": "Homer", "last": "Simpson"},
    {"first": "Hank", "last": "Hill"},
    {"first": "Peter", "last": "Griffin"}
  ],
  "books": [
    {"title": "JavaScript", "author": "Flanagan", "year": 2006},
    {"title": "Cascading Style Sheets", "author": "Meyer", "year": 2004}
  ]
}
RJSON:
{
  "id": 7,
  "tags": ["programming", "javascript"],
  "users": [
    {"first": "Homer", "last": "Simpson"},
    [2, "Hank", "Hill", "Peter", "Griffin"]
  ],
  "books": [
    {"title": "JavaScript", "author": "Flanagan", "year": 2006},
    [3, "Cascading Style Sheets", "Meyer", 2004]
  ]
}
Shouldn't the second bit of example 1 be "rowname1", etc.? I don't really get example 2, so I guess I would steer you towards 1. There is much to be said for having data immediately workable without pre-processing it first. Justification: I once spent too long optimizing an array system that turned out to work perfectly, but it's hell to update now.
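To ground the size comparison, here is a minimal sketch (with made-up dummy rows, so the exact ratios are only illustrative) that measures both formats before and after gzip:
import gzip
import json

rows = [{"colname1": "content", "colname2": "content"} for _ in range(1000)]

fmt1 = json.dumps(rows).encode("utf-8")  # verbose: keys repeated in every row
fmt2 = json.dumps({
    "columns": ["colname1", "colname2"],
    "rows": [[r["colname1"], r["colname2"]] for r in rows],
}).encode("utf-8")  # compact: keys listed once

for name, blob in (("verbose", fmt1), ("compact", fmt2)):
    print(name, len(blob), "bytes ->", len(gzip.compress(blob)), "bytes gzipped")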