ElasticSearch Nested Array Partial Update - json

I have an object that contains an array, my_array:
"description": "My Object Description",
"my_array": [
{
"id": 1000,
"name": "abc",
"url" : "abc.html",
"content": "somebig content"
},
{
"id": 1001,
"name": "def",
"url" : "def.html",
"content": "somebig content"
},
{
"id": 1002,
"name": "xyz",
"url" : "xyz.html",
"content": "somebig content"
} ]
Each element in the array contains a url. Whenever this object changes, I have a task that hits the url for each element of the array, gets the html content for that element, and creates a request document which can be indexed into elasticsearch.
Let's say the url for id = 1001 is not accessible, so the content for this element cannot be fetched. I still want to go ahead and process the changes for elements 1000 and 1002. In that case my update would look like this:
"description": "My New Object Description",
"my_array": [
{
"id": 1000,
"name": "abc",
"url" : "abc-new-url.html",
"content": "some modified content"
},
{
"id": 1002,
"name": "xyz",
"url" : "xyz-new-url.html",
"content": "some modified content"
} ]
If I send this partial update to elasticsearch, the collection gets updated but element 1001 is removed from the collection.
My problem is: how can I selectively update elements 1000 and 1002 without touching 1001? The index being stale for 1001 is ok for me. One obvious choice is to fetch the existing doc from elasticsearch and do the merging manually before doing the update. Is there any other way this partial update can be performed?
Another question: is there any way to send just the url to elasticsearch, and write a plugin to fetch the html content at index time, rather than doing it beforehand?

I think you could solve this using scripting in an update query; see the answers here:
remove objects from array elastic search

You can't do such an update with the partial-document update in Elasticsearch's native API, because it replaces an array as a whole rather than merging its elements. However, if you don't want to merge the updated content manually at the application level, a possible solution is to store each element of the array as its own document, in the same index as your original document but with a different type.
Then do the update for each of these elements (which in this case become documents) separately.
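The manual merge mentioned in the question can be sketched roughly as follows. This is only an illustration in Python, under the assumption that the existing my_array has already been fetched and that elements are matched on their "id" field; the function name merge_by_id is hypothetical, and the calls that read from and write back to elasticsearch are left out.

```python
from typing import Any


def merge_by_id(existing: list[dict[str, Any]],
                updates: list[dict[str, Any]],
                key: str = "id") -> list[dict[str, Any]]:
    """Overlay `updates` on `existing`, matching elements on `key` (assumed unique)."""
    updated = {item[key]: item for item in updates}
    seen = set()
    merged = []
    for item in existing:
        replacement = updated.get(item[key])
        # Keep the stale element untouched when no update arrived for it.
        merged.append({**item, **replacement} if replacement else dict(item))
        seen.add(item[key])
    # Append elements that appear only in the update.
    merged.extend(dict(item) for item in updates if item[key] not in seen)
    return merged


existing = [
    {"id": 1000, "name": "abc", "url": "abc.html", "content": "somebig content"},
    {"id": 1001, "name": "def", "url": "def.html", "content": "somebig content"},
    {"id": 1002, "name": "xyz", "url": "xyz.html", "content": "somebig content"},
]
updates = [
    {"id": 1000, "name": "abc", "url": "abc-new-url.html", "content": "some modified content"},
    {"id": 1002, "name": "xyz", "url": "xyz-new-url.html", "content": "some modified content"},
]
merged = merge_by_id(existing, updates)
```

The merged list still contains element 1001 in its old state and can then be sent as the full my_array value in the update.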

Related

How to ensure a JSON file "id" field always stays unique in VS Code?

I am working on a JSON file in VS Code that feeds a big app with multiple layers. Every layer has an "id" field that needs to be unique, or the app breaks. Multiple people are working on the file, and with a large number of layers it is hard to keep track. Is there a way to show an error or a warning to the user if there are two identical IDs, or to write a script that checks for this?
Sample code:
"layers":[
{
"id": "firstID",
"name": "name1"
},
{
"id": "secondID",
"name": "name2"
},
{
"id": "thirdID",
"name": "name3"
},
{
"id": "firstID",
"name": "name4"
}]
As seen in this code, is it possible for VS Code to mark the last object, or raise an error for it, because it has the same "id" as the first?
Thanks in advance
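For the "script that checks for this" part, a minimal sketch in Python could look like the following. It assumes the file has a top-level "layers" array as in the sample; the function name duplicate_ids is hypothetical, and how the script gets wired into the editor or a commit hook is left open.

```python
import json
from collections import Counter


def duplicate_ids(layers, key="id"):
    """Return the sorted values of `key` that occur more than once in `layers`."""
    counts = Counter(layer[key] for layer in layers if key in layer)
    return sorted(value for value, n in counts.items() if n > 1)


# Inline copy of the sample from the question; a real check would read the file instead.
sample = json.loads("""
{"layers": [
  {"id": "firstID", "name": "name1"},
  {"id": "secondID", "name": "name2"},
  {"id": "thirdID", "name": "name3"},
  {"id": "firstID", "name": "name4"}
]}
""")

dupes = duplicate_ids(sample["layers"])
if dupes:
    print("Duplicate ids found:", ", ".join(dupes))
```

A non-empty result could be turned into a failing exit code so that the check blocks a commit or a build.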

Pentaho Kettle: How to dynamically fetch JSON file columns

Background: I work for a company that basically sells passes. Every order that is placed by the customer will contain N number of passes.
Issue: I have JSON event-transaction files coming into an S3 bucket on a daily basis from DocumentDB (MongoDB). Each JSON file is associated with the relevant type of event (insert, modify or delete) for a document key (which is an order in my case). The example below illustrates an "insert" event that came through to the S3 bucket:
{
"_id": {
"_data": "11111111111111"
},
"operationType": "insert",
"clusterTime": {
"$timestamp": {
"t": 11111111,
"i": 1
}
},
"ns": {
"db": "abc",
"coll": "abc"
},
"documentKey": {
"_id": {
"$uuid": "abcabcabcabcabcabc"
}
},
"fullDocument": {
"_id": {
"$uuid": "abcabcabcabcabcabc"
},
"orderNumber": "1234567",
"externalOrderId": "12345678",
"orderDateTime": "2020-09-11T08:06:26Z[UTC]",
"attraction": "abc",
"entryDate": {
"$date": 2020-09-13
},
"entryTime": {
"$date": 04000000
},
"requestId": "abc",
"ticketUrl": "abc",
"tickets": [
{
"passId": "1111111",
"externalTicketId": "1234567"
},
{
"passId": "222222222",
"externalTicketId": "122442492"
}
],
"_class": "abc"
}
}
As we see above, every JSON file might contain N passes, and every pass is in turn associated with an external ticket id, which is a different column. I want to use Pentaho Kettle to read these JSON files and load the data into the DW. I am aware of the Json input step and the Row Normalizer, which could transpose the "PassID 1", "PassID 2", "PassID 3"..."PassID N" columns into one column "Pass", and I would have to apply similar logic to the other column, "External ticket id". The problem with that approach is that it is quite static: I need to "tell" Pentaho in advance, in the Json input step, how many passes are coming. But what if tomorrow I have an order with 10 different passes? How can I do this dynamically to ensure the job will not break?
If you want tabular output like this:
TicketUrl   Pass           ExternalTicketID
----------  -------------  ----------------
abc         PassID1Value1  ExTicketIDvalue1
abc         PassID1Value2  ExTicketIDvalue2
abc         PassID1Value3  ExTicketIDvalue3
with the incoming values read dynamically from the JSON input file, then you can download this transformation (Updated Link).
In my testing, the JSON input step handled everything dynamically.
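To make the target shape concrete outside of Kettle, here is a small Python sketch that produces one row per ticket regardless of how many tickets an order contains. It is not a Kettle transformation; the function name ticket_rows and the trimmed-down event are assumptions made only for illustration, with field names taken from the sample event in the question.

```python
def ticket_rows(event):
    """Return one flat row per entry in fullDocument.tickets, however many there are."""
    doc = event.get("fullDocument", {})
    return [
        {
            "TicketUrl": doc.get("ticketUrl"),
            "Pass": ticket.get("passId"),
            "ExternalTicketID": ticket.get("externalTicketId"),
        }
        for ticket in doc.get("tickets", [])
    ]


# Trimmed-down version of the insert event from the question.
event = {
    "operationType": "insert",
    "fullDocument": {
        "orderNumber": "1234567",
        "ticketUrl": "abc",
        "tickets": [
            {"passId": "1111111", "externalTicketId": "1234567"},
            {"passId": "222222222", "externalTicketId": "122442492"},
        ],
    },
}
rows = ticket_rows(event)
for row in rows:
    print(row["TicketUrl"], row["Pass"], row["ExternalTicketID"])
```

Because the rows are driven by the length of the tickets array, an order with 10 passes simply yields 10 rows.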

Update MongoDB document using JSON

Is there a way to update a complex MongoDB document from C# using JSON? For example, suppose I have the following document:
{
"name": "John Smith",
"age": 35,
"readingList":
[{
"title": "Title1",
"ISBN": 6246246426724,
"author":
{
"name": "James Johnson",
"age": 40
}
},
{
"title": "Title2",
"ISBN": 3513531513551,
"author":
{
"name": "Sam Hill",
"age": 20
}
}]
}
Now I want to update the age of the second book's author (Sam Hill) from 20 to 21. Suppose I have the following JSON representation:
{
"readingList":
[
{
"title": "Title2",
"author":
{
"age": 21
}
}]
}
Basically the second JSON string is like the first one, minus all the fields and array elements that don't change, except for one field in any array being looked at that uniquely identifies that index. In this case, the "age" field is included since it is being updated with the given value. The "title" field is given to locate the right array element while searching for the field to update. There may also be even more subdocuments and arrays to go through, and the format is not static (it may change at a later time). This is just a simplified example.
Is it possible to pass in something like this to some function and update the correct field that way? Is there something at least similar to this, so I can just pass in some JSON to do the update?
The reason I am looking to do it this way, rather than through simpler means, is because I want to keep track of a history of changes to documents, and if I want to backtrack to an earlier version, I want an easy way to do so that can handle this level of complexity.
UPDATE:
I have some clarifications to make. In this particular scenario I have no way to predict what kinds of changes would need to be made. A change could be made to any field at any time, and that field could be anywhere in the document, possibly at the top level, or within multiple nested subdocuments/arrays. The data we're dealing with is for a separate party that may use it and modify it at will, so we have no control over what they choose to do with it. In addition, there is no fixed schema. The other party could add new fields, including new subdocuments or arrays, or delete them.
The reason I'm asking this question is because I would like to store a history of changes to documents in such a way that I could revert to an older snapshot of the document by applying the changes in reverse. In this case, changing the age from 20 to 21 would revert the document to an older state (assuming that someone messed with the age beforehand and made it 20, and I wanted to fix it back to 21). Since somebody could make any change they wanted to the system, including to the underlying structure of the data itself, I can't just come up with my own schema, or hardcode a solution that changes specific fields using this specific schema.
In this example, the change in age from 20 to 21 would be from a record in the history whose structure I couldn't predict beforehand. So I am looking for an efficient solution to apply an unpredictable update to a document given a simplified JSON representation of the change to be made.
I am also open to alternatives that don't involve JSON if they are fairly efficient. I brought up JSON because I figured that, given MongoDB's usage of JSON to structure documents, it would make the most sense, and perhaps be superior to something like string manipulation. Another alternative I considered would involve storing the change using some kind of custom dot notation, like this: "readingList[ISBN:3513531513551].author.age=21"
This would require me to create a custom function to interpret the string and turn it into something useful though, so it doesn't sound like the best solution.
I used the JSON document below:
{
"_id" : ObjectId("56a99c121f25cc3a3c709151"),
"name" : "John Smith",
"age" : 35,
"readingList" : [
{
"title" : "Title1",
"ISBN" : NumberLong(6246246426724),
"author" : {
"name" : "James Johnson",
"age" : 40
}
},
{
"title" : "Title2",
"ISBN" : NumberLong(3513531513551),
"author" : {
"name" : "Sam Hill",
"age" : "25"
}
}
]
}
I matched on the author name "Sam Hill" and executed the query below in C#, and it works:
IMongoQuery query = Query.And(Query.EQ("name", "John Smith"), Query.EQ("readingList.author.name", "Sam Hill"));
var result = collection.Update(query,
    MongoDB.Driver.Builders.Update.Set("readingList.$.author.age", "21"));
You can query your main document. Let's assume your main collection is named "books" and this is the structure:
{
"id":"123",
"name": "John Smith",
"age": 35,
"readingList":
[{
"title": "Title1",
"ISBN": 6246246426724,
"author":
{
"name": "James Johnson",
"age": 40
}
},
{
"title": "Title2",
"ISBN": 3513531513551,
"author":
{
"name": "Sam Hill",
"age": 20
}
}]
}
// You need a query that returns the main document, for example by id. The query must also match the array element you want to change (for example on readingList.title), otherwise the positional operator "$" has nothing to resolve to. Once you have the main document, look up the element you want to modify in the list and assign it to a variable, say readItem. Make the modifications you need, then update only that array element using Set and "$", something like:
readItem.title = "some new title";
readItem.author.age++; // age lives on the nested author object
var update = MongoDB.Driver.Builders.Update.Set("readingList.$", BsonDocumentWrapper.Create(readItem));
Update<Book>(query, update);
Actually I would not advise you to choose this kind of data model because in my experience it will get pretty messy. Still, you might have some very specific requirements which might force you to have this and only this data model.
I would create two collections: persons and readinglists.
persons would look like:
{
"id":"123",
"name": "John Smith",
"age": 35
}
and readinglists would look like (note that it has a compound natural id):
{
"_id": { "personid":"123", "title": "Title1"},
"ISBN": 6246246426724,
"author":
{
"name": "James Johnson",
"age": 40
}
}
Then you can easily update the readinglist:
var query = Query.EQ("_id", new BsonDocument(new BsonElement[]{ new BsonElement("personid", "123"), new BsonElement("title", "Title1") }));
readingListCollection.Update(query, Update.Set("author.age", 22));
In your data model you need to know the array index of the second document. It is better to model the readingList attribute as a map. In the following example I used the ISBN as the map key:
{
"id":"123",
"name":"John Smith",
"age":35,
"readingList":{
"6246246426724":{
"title":"Title1",
"ISBN":6246246426724,
"author":{
"name":"James Johnson",
"age":40
}
},
"3513531513551":{
"title":"Title2",
"ISBN":3513531513551,
"author":{
"name":"Sam Hill",
"age":20
}
}
}
}
In this data model you can access the second book directly, for instance by dot notation:
db.authors.update(
{ id: "123" },
{ $set: { "readingList.3513531513551.author.age": 22 } }
)
Unfortunately I do not know the C# notation for that, but it should be straightforward.
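With the map-based model, a nested JSON change of unknown shape can be turned into dot-notation paths for $set mechanically. The following Python sketch shows the idea; the function name to_set_paths is hypothetical, and it assumes that every nested object in the change is a map and that lists, if present, are meant to be replaced wholesale.

```python
def to_set_paths(change, prefix=""):
    """Flatten a nested dict into {"a.b.c": value} pairs suitable for a $set update."""
    paths = {}
    for key, value in change.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            # Descend into nested objects to build the dotted path.
            paths.update(to_set_paths(value, path))
        else:
            # Scalars, lists and empty objects are written as-is at this path.
            paths[path] = value
    return paths


# The change record only names the fields that differ.
change = {"readingList": {"3513531513551": {"author": {"age": 22}}}}
update = {"$set": to_set_paths(change)}
print(update)
```

The resulting update document has the same shape as the second argument of the shell command above, so it could be handed to whichever driver is in use.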

Representing child objects in JSON for a mobile API

I am designing a JSON API for a mobile app and have to decide how to show child objects from the server to the client. Typically the request from the client to the server will be a single request to sync and the response will include all the objects that need updating. What is the best way to show the objects?
Option A - Nested Children:
{ "articles": [
{ "id" : 1,
"title": "This is the first article",
"comments": [
{"id": "1",
"article_id" : "1",
"title": "A comment on the first article"
}]
},
{ "id" : 2,
"title": "This is the second article",
"comments": [
{"id": "2",
"article_id" : "2",
"title": "A comment on the second article"
}]
} ]}
Option B - All Objects on Their Own:
{ "articles": [
{ "id" : 1,
"title": "This is the first article"
},
{ "id" : 2,
"title": "This is the second article"
}],
"comments": [
{"id": "1",
"article_id" : "1",
"title": "A comment on the first article"
},
{"id": "2",
"article_id" : "2",
"title": "A comment on the second article"
}]}
On the client side I can handle either format and build the relationship based on the article_id field, so I am not too sure why I would nest the children, other than that it looks nice. However, when I think about writing tests for the client side, especially the mapping of JSON to objects, it seems easier to show and map each object on its own. I am a beginner here so any thoughts would be helpful.
PS. I am building the server using Rails/Grape and the clients with RestKit/Coredata (iOS) and probably RoboSpice/ORMLite (Android).
That's very subjective. There isn't one correct answer for that. It really depends on whatever approach is more suited to your task and data. You say this is a request used to sync data. How is the data represented and stored on the client side? If flat, like a relational database, then the flat output is probably easier to use. On the other hand, if the client will use the relationships a lot, it's probably better to use the nested structure.
From an API design standpoint, I'd have the endpoint for the articles collection accept a query parameter like expand, with a level number or named entities, and it would add the nested children accordingly. So, for instance, GET /api/articles?expand=comments would generate output with nested comments, and GET /api/articles?expand=1 would generate output with all immediate children. That way, clients can easily get the nested output if they need it, or they can query the endpoints for articles and comments separately and concatenate the output if they need the flat data.
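To illustrate that the two formats carry the same information, here is a small Python sketch that turns the flat output (Option B) into the nested one (Option A), which is roughly what an expand=comments handler would have to do. The function name expand_comments is hypothetical; ids are compared as strings because the sample mixes numeric article ids with string article_id values.

```python
from collections import defaultdict


def expand_comments(payload):
    """Nest each comment under its article, matching comment.article_id to article.id."""
    by_article = defaultdict(list)
    for comment in payload.get("comments", []):
        by_article[str(comment["article_id"])].append(comment)
    return {
        "articles": [
            {**article, "comments": by_article.get(str(article["id"]), [])}
            for article in payload.get("articles", [])
        ]
    }


flat = {
    "articles": [
        {"id": 1, "title": "This is the first article"},
        {"id": 2, "title": "This is the second article"},
    ],
    "comments": [
        {"id": "1", "article_id": "1", "title": "A comment on the first article"},
        {"id": "2", "article_id": "2", "title": "A comment on the second article"},
    ],
}
nested = expand_comments(flat)
```

Going the other way (nested to flat) is just as mechanical, which supports the point that the choice mostly depends on how the client stores and uses the data.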

How should a JSON response be formatted?

I have a REST service that returns a list of objects. Each object contains objectcode and objectname.
This is my first time building a REST service, so I'm not sure how to format the response.
Should it be:
{
"objects": {
"count": 2,
"object": [
{
"objectcode": "1",
"objectname": "foo"
},
{
"objectcode": "2",
"objectname": "bar"
},
...more objects
]
}
}
OR
[
{
"objectcode": "1",
"objectname": "foo"
},
{
"objectcode": "2",
"objectname": "bar"
},
...more objects
]
I realize this might be a little subjective, but which would be easier to consume? I would also need to support XML formatted response later.
They are the same to consume, as a library handles both just fine. The first one has an advantage over the second though: You will be able to expand the response to include other information additional to the objects (for example, categories) without breaking existing code.
Something like
{
"objects": {
"count": 2,
"object": [
{
"objectcode": "1",
"objectname": "foo"
},
{
"objectcode": "2",
"objectname": "bar"
},
...more objects
]
},
"categories": {
"count": 2,
"category" : [
{ "name": "some category"}
]
}
}
Additionally, the JSON shouldn't be pretty-printed on the wire, so remove whitespace, line breaks etc. Also, the count isn't really necessary, as the client can derive it from the array once the objects are parsed.
I often see the first one. Sometimes it's easier to manipulate the data when it comes with meta-data. For example, the Google API uses the first one: http://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&sensor=true
It's not only a question of personal preference; it's also a question of your requirements. For example, if I were in the same situation and needed the object count on the client side, I'd go with the first approach; otherwise I would choose the second one.
Also please note that a "classic" REST server mostly works in a slightly different way. If a REST function is to return a list of objects, then it should return only a list of URLs to those objects. The URLs should point to detail endpoints, so that by querying each endpoint you get the details of a single specific object.
As a client I would prefer the second format. If the first format only includes the number of "objects", this is redundant information.