How to update multiple documents in Solr with JSON? - json

How to update multiple documents in Solr 4.5.1 with JSON? I tried this but it does not work:
POST /solr/mycore/update/json:
{
"commit": {},
"add": {
"overwrite": true,
"doc": [{
"thumbnail": "/images/404.png",
"url": "/404.html?1",
"id": "demo:/404.html?1",
"channel": "demo",
"display_name": "One entry",
"description": "One entry is not enough."
}, {
"thumbnail": "/images/404.png",
"url": "/404.html?2",
"id": "demo:/404.html?2",
"channel": "demo",
"display_name": "Another entry",
"description": "Another entry is required."
}
]
}
}

Solr expects one "add" key in the JSON structure for each document. That might seem odd if you think about how keys normally work in a JSON object, but it maps directly to the XML update format, and it lets each document carry its own metadata.
{
"commit": {},
"add": {
"doc": {
"id": "321321",
"name": "barfoo"
}
},
"add": {
"doc": {
"id": "123123",
"name": "Foobar"
}
}
}
This works. I think allowing an array as the value of "add" would make more sense, but I haven't dug into the source or found the reasoning behind this.

As I understand it, this has been fixed from Solr 4.0 onwards (at least). See http://wiki.apache.org/solr/UpdateJSON.
The file ./exampledocs/books.json contains an example of a JSON file with multiple documents:
[
{
"id" : "978-0641723445",
"cat" : ["book","hardcover"],
"name" : "The Lightning Thief",
"author" : "Rick Riordan",
"series_t" : "Percy Jackson and the Olympians",
"sequence_i" : 1,
"genre_s" : "fantasy",
"inStock" : true,
"price" : 12.50,
"pages_i" : 384
}
,
{
"id" : "978-1423103349",
"cat" : ["book","paperback"],
"name" : "The Sea of Monsters",
"author" : "Rick Riordan",
"series_t" : "Percy Jackson and the Olympians",
"sequence_i" : 2,
"genre_s" : "fantasy",
"inStock" : true,
"price" : 6.49,
"pages_i" : 304
},
...
]
While fiskfisk's answer is still valid JSON, it is not easy to serialize from a data structure. This one is.
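Because the array format is a plain list of objects, any standard JSON serializer can produce it. A minimal Python sketch, assuming a local Solr at a hypothetical `http://localhost:8983` with the `mycore` core from the question; it builds the POST request but does not send it:

```python
import json
import urllib.request

# Hypothetical Solr location: adjust host, port and core name to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update/json?commit=true"


def build_update_request(docs, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST whose body is a plain JSON array of documents."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


docs = [
    {"id": "demo:/404.html?1", "channel": "demo", "display_name": "One entry"},
    {"id": "demo:/404.html?2", "channel": "demo", "display_name": "Another entry"},
]
request = build_update_request(docs)
# urllib.request.urlopen(request) would send it to a running Solr instance.
```

The `commit=true` request parameter replaces the `"commit": {}` command, which the bare array format has no place for.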

elachell is correct that the array format will work if you are just adding documents with the default settings. Unfortunately, it won't work if, for instance, you need to add a custom boost to some of the documents or change the overwrite setting. You then have to use the full object structure with an "add" key for each document, which, as they pointed out, is frustratingly hard to serialize from most languages, since they don't allow the same key more than once in an object:
{
"commit": {},
"add": {
"doc": {
"id": "321321",
"name": "barfoo"
},
"boost": 2.0
},
"add": {
"doc": {
"id": "123123",
"name": "Foobar"
},
"boost": 1.5,
"overwrite": false
}
}
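One workaround for the duplicate-key problem is to serialize each command separately and join the fragments by hand. A minimal Python sketch (the function name and the `(doc, options)` input shape are hypothetical); it places the commit last, assuming Solr applies the commands in document order:

```python
import json


def build_add_commands(entries, commit=True):
    """Serialize (doc, options) pairs as one JSON object with a repeated "add" key.

    A Python dict cannot hold the same key twice, so each command is encoded
    separately and the fragments are joined into one object by hand.
    """
    parts = []
    for doc, options in entries:
        command = {"doc": doc, **options}  # e.g. options = {"boost": 2.0}
        parts.append('"add": ' + json.dumps(command))
    if commit:
        # Assumption: commands run in order, so the commit goes after the adds.
        parts.append('"commit": {}')
    return "{" + ", ".join(parts) + "}"


body = build_add_commands([
    ({"id": "321321", "name": "barfoo"}, {"boost": 2.0}),
    ({"id": "123123", "name": "Foobar"}, {"boost": 1.5, "overwrite": False}),
])
```

Parsing the result back with `json.loads(body, object_pairs_hook=lambda p: p)` keeps both "add" entries, which is a convenient way to check the output.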

Update for Solr 8.8 (and possibly earlier versions).
The following JSON works for /update/json:
{
"add": [
{"id": "123", "field1": "foo"},
{"id": "124", "field1": "foo"}
],
"delete": ["111", "106"]
}
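Since every key appears only once in this format, it can be produced directly from an ordinary dictionary. A minimal Python sketch (the helper name is hypothetical):

```python
import json


def build_batch_update(docs, delete_ids):
    """Batch body: "add" holds an array of documents, "delete" an array of ids."""
    # Each key appears once, so a plain dict and json.dumps are enough.
    return json.dumps({"add": docs, "delete": delete_ids})


payload = build_batch_update(
    [{"id": "123", "field1": "foo"}, {"id": "124", "field1": "foo"}],
    ["111", "106"],
)
```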

Another option, if you are on Solr 4.10 or later, is to use a custom JSON structure and tell Solr how to index it. I'm not sure how to add boosts with this method either, but it's a nice option if you already have a data structure in JSON and don't want to convert it to Solr's format. Here's the Solr documentation on this option:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-TransformingandIndexingCustomJSON
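With this approach the mapping is passed as request parameters on the `/update/json/docs` endpoint: `split` says where the input is split into documents, and repeated `f` parameters map JSON paths to Solr fields. A Python sketch that only builds such a URL; the core URL, split path and field names are hypothetical, so check the linked page for the exact parameter semantics in your version:

```python
from urllib.parse import urlencode


def custom_json_update_url(base_url, split, field_mappings, commit=True):
    """Build an /update/json/docs URL for indexing arbitrary JSON.

    split:          JSON path at which the input is split into Solr documents
    field_mappings: (solr_field, json_path) pairs, sent as repeated f= params
    """
    params = [("split", split)]
    params += [("f", f"{name}:{path}") for name, path in field_mappings]
    if commit:
        params.append(("commit", "true"))
    return f"{base_url}/update/json/docs?{urlencode(params)}"


url = custom_json_update_url(
    "http://localhost:8983/solr/mycore",  # hypothetical core URL
    "/exams",
    [("first", "/first"), ("subject", "/exams/subject"), ("marks", "/exams/marks")],
)
```

Your existing JSON would then be POSTed unchanged to that URL.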

Related

Are JSON nested objects necessary?

I have a question about the way JSON is written. Which is better in terms of usability? I am trying to use a single JSON file for decoding in PHP and Swift, and was wondering which would be the better way for the JSON to be written. Nested objects or no nested objects, essentially.
Option 1
[
{
"title": "Test",
"image": "image",
"imageCard": "imae2",
"count": 20,
"section": "personal",
"description": "Description",
"color": "#F17B08",
"content": {
"video": true,
"text": true,
"updated": false
},
"homepage": {
"featured": false,
"popular": false,
"new": false
},
"levels": "7-12"
}
]
Option 2
[
{
"title": "Test",
"image": "image",
"imageCard": "imae2",
"count": 20,
"section": "personal",
"description": "Description",
"color": "#F17B08",
"video": true,
"text": true,
"updated": false,
"featured": false,
"popular": false,
"new": false,
"levels": "7-12"
}
]
Thanks!
A JSON structure is usually mapped onto object-oriented classes. The closer the JSON structure is to your program's model (all the classes, and how each relates to the others through one-to-one or one-to-many relationships), the easier it is to deserialize the response from the API.
So JSON nesting is needed to the same extent that your classes use associations, compositions and aggregations.
Let's say you have a Student that has a name, a unique ID and a Teacher who teaches many Courses.
Having a nested JSON such as:
[
{
"Name":"John Doe",
"ID":"948AFF",
"Teacher":{
"Name":"John Nash",
"Classes":[
{...},
{...},
{...}
]
}
},
{
"Name":"William Smooth",
"ID":"123LMLG",
"Teacher":{
"Name":"Ulrich Stokes",
"Classes":[
{...},
{...},
{...}
]
}
}
]
is an intuitive, clean and flexible way to structure the JSON, because it maps easily to your model: each element of the array is a Student instance, which has a Teacher instance, which has a list of Courses.
Whereas a flat structure like the one you suggest:
[
{
"Name":"John Doe",
"ID":"948AFF",
"TeacherName":"John Nash",
"ClassesOfTheTeacher":[
{...},
{...},
{...}
]
},
{
"Name":"William Smooth",
"ID":"123LMLG",
"TeacherName":"Ulrich Stokes",
"ClassesOfTheTeacher":[
{...},
{...},
{...}
]
}
]
forces you to map each field to an attribute of one of your classes by hand. It doesn't fit your model naturally, so you have to do some manual transformation to map it onto your model.
Furthermore, what if you suddenly wanted to have three Teachers for each student? Would you end up with TeacherName1, TeacherName2, TeacherName3, and ClassesOfTheTeacher1, ClassesOfTheTeacher2, etc.? Probably not.
Nested JSON is necessary if you have association/composition/aggregation in your classes.
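To make the difference concrete, here is a small Python sketch (the question targets PHP and Swift, so this is only an illustration). It assumes the nested teacher object stores its name under a "Name" key; the class and function names are hypothetical. The nested reader turns one sub-object into one Teacher, while the flat reader has to know which top-level keys belong to the teacher:

```python
from dataclasses import dataclass, field


@dataclass
class Teacher:
    name: str
    classes: list = field(default_factory=list)


@dataclass
class Student:
    name: str
    id: str
    teacher: Teacher


def student_from_nested(obj):
    # The "Teacher" sub-object lines up one-to-one with the Teacher class.
    t = obj["Teacher"]
    return Student(obj["Name"], obj["ID"], Teacher(t["Name"], t.get("Classes", [])))


def student_from_flat(obj):
    # The Teacher has to be reassembled from specially named top-level keys.
    teacher = Teacher(obj["TeacherName"], obj.get("ClassesOfTheTeacher", []))
    return Student(obj["Name"], obj["ID"], teacher)
```

Both readers produce equal objects for the same student, but only the nested shape extends naturally to a list of teachers (a "Teachers" array) without inventing numbered keys.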

Fiware STH: row data API not exposing metadata

I am using Cygnus with MongoDB and the STH sink to retrieve historical data.
In the current implementation of the Cygnus Mongo sink, attribute metadata is not stored in the database, so I updated Cygnus to store it.
But when I use STH-Comet to retrieve the history, the API apparently does not support retrieving attribute metadata.
Am I missing some configuration, or does the API not support attribute metadata? The response I get from STH-Comet is:
{
"contextResponses": [
{
"contextElement": {
"attributes": [
{
"name": "humidity",
"values": [
{
"recvTime": "2017-03-08T08:06:11.463Z",
"attrType": "Number",
"attrValue": "999"
},
{
"recvTime": "2017-03-08T08:10:54.199Z",
"attrType": "Number",
"attrValue": "3.06"
}
]
}
],
"id": "Room1",
"isPattern": false,
"type": "Room"
},
"statusCode": {
"code": "200",
"reasonPhrase": "OK"
}
}
]
}
In the MongoDB database I have this content:
{ "_id" : ObjectId("58bfbb7c973c5c22d258cffc"), "recvTime" : ISODate("2017-03-08T08:06:11.463Z"), "attrName" : "humidity", "attrType" : "Number", "attrValue" : "999", "attrMetadata" : [ ] }
{ "_id" : ObjectId("58bfbc93973c5c22d258cffd"), "recvTime" : ISODate("2017-03-08T08:10:54.199Z"), "attrName" : "humidity", "attrType" : "Number", "attrValue" : "3.06", "attrMetadata" : [ { "name" : "unit", "type" : "Text", "value" : "voltage" } ] }
If the API does not support retrieving attribute metadata, can this feature be added?
Thanks & Best regards.
STH and Cygnus are aligned with regard to the information stored in MongoDB, both raw and aggregated. Because Cygnus originally did not support attribute metadata in NGSIMongoSink (the sink in charge of storing information in raw format), STH does not support attribute metadata in its raw API either.
Since you have extended Cygnus for this purpose, you'll have to extend the STH API as well.
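STH-Comet itself is a Node.js application, so the following is not a patch for it; it is a Python sketch of the transformation an extended raw API would need. It turns stored documents like the ones above into "values" entries and, as a hypothetical extension, copies the stored `attrMetadata` array into each entry (the response field name is an assumption that simply mirrors the stored field):

```python
def format_raw_values(stored_docs, include_metadata=True):
    """Turn raw-format MongoDB documents into STH-style "values" entries.

    Hypothetical extension: adds an "attrMetadata" key that the stock
    raw API does not return.
    """
    values = []
    for doc in stored_docs:
        value = {
            "recvTime": doc["recvTime"],
            "attrType": doc["attrType"],
            "attrValue": doc["attrValue"],
        }
        if include_metadata:
            # Documents written before the Cygnus change may lack the field.
            value["attrMetadata"] = doc.get("attrMetadata", [])
        values.append(value)
    return values


stored = [
    {"recvTime": "2017-03-08T08:06:11.463Z", "attrName": "humidity",
     "attrType": "Number", "attrValue": "999", "attrMetadata": []},
    {"recvTime": "2017-03-08T08:10:54.199Z", "attrName": "humidity",
     "attrType": "Number", "attrValue": "3.06",
     "attrMetadata": [{"name": "unit", "type": "Text", "value": "voltage"}]},
]
values = format_raw_values(stored)
```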

Deeply nested JSON documents in Apache Solr

I have a deeply nested document (pseudo-structure shown below):
[{
"id": "1",
"company_id": "1",
"company_name": "company_1",
"departments":[{
"dep1" : [{
"id" : 40,
"name" : "xyz"
},
{
"id" : 41,
"name" : "xyr"
}],
"dep2": [{
}]
}],
"employeePrograms" :[{
}]
}]
How can I index this type of document in Apache Solr?
The documentation only covers immediate child documents.
Unfortunately I don't have much experience with this technology, but I want to help. Here is some official documentation that might be useful: official doc, and something more specific.
If you run into a specific issue, such as an error, describe it and I'll try my best to help.
Update 1:
Solr can only maintain a 'flat' representation of the data. What you were trying to do is not really possible. There are a number of workarounds, such as using dynamic fields and using a Solr join to link multiple data sets.
As for deep nesting, I've found an example of such a workaround.
If you had something like this:
"docs": [
{
"name": "Product Name",
"categories": [
{
"name": "Category 1",
"priority": 8
},
{
"name": "Category 2",
"priority": 6
}
...
]
},
you would have to modify it like this to remove the deep nesting:
"docs": [
{
"name": "Sample Product",
"categories": [
{
"priority_category": "8_Category 1"
},
{
"priority_category": "6_Category 2"
}
...
]
},
So, if you've done something similar, check it for errors.
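The conversion above can be done before sending documents to Solr. A minimal Python sketch that goes one small step further and emits the encoded strings as a single list, assuming a hypothetical multivalued string field named `priority_category` in the schema:

```python
def flatten_categories(product):
    """Collapse each {"name", "priority"} category into one "<priority>_<name>" string."""
    return {
        "name": product["name"],
        # Hypothetical multivalued string field holding the encoded pairs.
        "priority_category": [
            f'{category["priority"]}_{category["name"]}'
            for category in product.get("categories", [])
        ],
    }


flat = flatten_categories({
    "name": "Product Name",
    "categories": [
        {"name": "Category 1", "priority": 8},
        {"name": "Category 2", "priority": 6},
    ],
})
```

The trade-off is that the priority and name are no longer separate fields, so queries have to match or parse the combined string.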

Talend: parse JSON string to multiple output

I'm aware of this question, but I don't believe there is no solution using standard components. I'm using Talend ESB Studio 5.4.
I have to parse a JSON string from a REST web service into multiple outputs and add them to a database.
Database has two tables:
User (user_id, name, card, card_id, points)
Action (user_id, action_id, description, used_point)
My JSON structure is something like this:
{
"users": [
{
"name": "foo",
"user_id": 1,
"card": {
"card_id": "AAA",
"points": 10
},
"actions": [
{
"action_id": 1,
"description": "buy",
"used_points": 2
},
{
"action_id": 3,
"description": "buy",
"used_points": 1
}
]
},
{
"name": "bar",
"user_id": 2,
"card": {
"card_id": "BBB",
"points": -1
},
"actions": [
{
"id": 2,
"description": "sell",
"used_point": 5
}
]
}
]
}
I have tried to add a JSON Schema Metadata, but it is not clear to me how to flatten the JSON. I have looked at tXMLMap and tExtractJSONFields, but no luck so far.
I also had a look at tJavaRow, but I don't understand how to define a schema for it.
It's a pity, because so far I love Talend! Any advice?
You can save the JSON to a file on disk, then create a new JSON file entry in the Talend Studio metadata; the wizard retrieves the schema for you. After saving, you can copy the schema into a generic schema in the metadata, and then use that generic schema wherever you want, for example in the tRestClient component.
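Whatever components you use, the flattening itself amounts to one row per user plus one row per action, with the parent `user_id` copied down into each action row. A Python sketch of that logic (not Talend code; the function name is hypothetical). It leaves out the separate "card" column of the User table because its meaning is unclear from the question, and it tolerates the sample's mix of "action_id"/"id" and "used_points"/"used_point":

```python
def split_users(payload):
    """Split the nested users payload into rows for the User and Action tables."""
    user_rows, action_rows = [], []
    for user in payload.get("users", []):
        card = user.get("card", {})
        user_rows.append({
            "user_id": user["user_id"],
            "name": user["name"],
            "card_id": card.get("card_id"),
            "points": card.get("points"),
        })
        for action in user.get("actions", []):
            action_rows.append({
                "user_id": user["user_id"],  # foreign key copied from the parent user
                # The sample uses both "action_id"/"id" and "used_points"/"used_point".
                "action_id": action.get("action_id", action.get("id")),
                "description": action.get("description"),
                "used_point": action.get("used_points", action.get("used_point")),
            })
    return user_rows, action_rows


payload = {"users": [
    {"name": "foo", "user_id": 1, "card": {"card_id": "AAA", "points": 10},
     "actions": [{"action_id": 1, "description": "buy", "used_points": 2},
                 {"action_id": 3, "description": "buy", "used_points": 1}]},
    {"name": "bar", "user_id": 2, "card": {"card_id": "BBB", "points": -1},
     "actions": [{"id": 2, "description": "sell", "used_point": 5}]},
]}
users, actions = split_users(payload)
```

In Talend terms this corresponds to two extraction loops over the same input: one looping on the users array and one on the nested actions array.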

JSON Slurper Offsets

I have a large JSON file that I'm trying to parse with JSON Slurper. The JSON file consists of information about bugs so it has things like issue keys, descriptions, and comments. Not every issue has a comment though. For example, here is a sample of what the JSON input looks like:
{
"projects": [
{
"name": "Test Project",
"key": "TEST",
"issues": [
{
"key": "BUG-1",
"priority": "Major",
"comments": [
{
"author": "a1",
"created": "d1",
"body": "comment 1"
},
{
"author": "a2",
"created": "d2",
"body": "comment 2"
}
]
},
{
"key": "BUG-2",
"priority": "Major"
},
{
"key": "BUG-3",
"priority": "Major",
"comments": [
{
"author": "a3",
"created": "d3",
"body": "comment 3"
}
]
}
]
}
]
}
I have a method that creates Issue objects based on the JSON parse. Everything works well when every issue has at least one comment, but once an issue with no comments comes up, the remaining issues get the wrong comments. I am currently looping over the issues by their total count and then looking up comments using the index of the current issue. So, for example,
parsedData.issues.comments.body[0][0][0]
returns "comment 1". However,
parsedData.issues.comments.body[0][1][0]
returns "comment 3", which is incorrect. Is there a way I can see if a particular issue has any comments? I'd rather not have to edit the JSON file to add empty comment fields, but would that even help?
You can do this:
parsedData.issues.comments.collect { it?.body ?: [] }
This checks for a body and, if none exists, returns an empty list.
UPDATE
Based on the update to the question, you can do:
parsedData.projects.collectMany { it.issues.comments.collect { it?.body ?: [] } }
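The same idea in Python, for comparison (a translation for illustration, not the answerer's code): every issue contributes exactly one list, empty when it has no "comments" key, so position i always belongs to issue i and the comments can no longer shift onto the wrong issue.

```python
def comment_bodies_per_issue(parsed):
    """One list of comment bodies per issue, in order; issues without comments give []."""
    return [
        [comment["body"] for comment in issue.get("comments", [])]
        for project in parsed.get("projects", [])
        for issue in project.get("issues", [])
    ]


parsed = {"projects": [{"name": "Test Project", "key": "TEST", "issues": [
    {"key": "BUG-1", "priority": "Major", "comments": [
        {"author": "a1", "created": "d1", "body": "comment 1"},
        {"author": "a2", "created": "d2", "body": "comment 2"}]},
    {"key": "BUG-2", "priority": "Major"},
    {"key": "BUG-3", "priority": "Major", "comments": [
        {"author": "a3", "created": "d3", "body": "comment 3"}]},
]}]}
bodies = comment_bodies_per_issue(parsed)
```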