Getting specific fields of nested elasticsearch document - json

We have access to an API of all companies in Denmark. There is a lot of data, necessary and unnecessary. We need to get a list of business owners and directors.
The relevant list of owners is nested four fields deep in the document. We loop over every 'organization', and loop over all the 'attributes' of the organization. If the attribute's 'type' field is the string 'function' AND the 'value' field of the attribute's first value ('values' is also a list, but we only want the first entry) is on a whitelist (one of five strings), then we want the value of the organization's 'name' field along with some other fields (the whole organization object would also be acceptable).
We currently take all these steps in a Python loop, with 4 nested for loops for everything. This is a major PITA. I want to turn this into an Elasticsearch query (the API we query is already ES), but I have no major experience with the query DSL. Does anyone know how this would be done?
Here is a gist of the 'path' we take to a given field: https://gist.github.com/mrcpj1998/e65c6988cf8aea9fcea4c4fb6c007c6f I also have a copy of the whole document in JSON here
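For reference, the client-side filter described above can be written without four levels of nesting. This is only a sketch of the rule as stated in the question, not an Elasticsearch query. The field layout ('attributes', 'type', 'values', 'value', 'name') is my reading of the description, and the whitelist strings are made-up placeholders.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical whitelist; the question only says it holds five strings.
WHITELIST: Set[str] = {"OWNER", "DIRECTOR"}


def has_whitelisted_function(org: Dict[str, Any], whitelist: Set[str]) -> bool:
    """True if any 'function' attribute's first value is on the whitelist."""
    for attr in org.get("attributes", []):
        if attr.get("type") != "function":
            continue
        values = attr.get("values") or []
        # Only the first entry of 'values' is considered, as in the question.
        if values and values[0].get("value") in whitelist:
            return True
    return False


def matching_organizations(
    orgs: Iterable[Dict[str, Any]], whitelist: Set[str] = WHITELIST
) -> List[Dict[str, Any]]:
    """Return the whole organization objects that pass the filter."""
    return [org for org in orgs if has_whitelisted_function(org, whitelist)]
```

Returning the whole organization object keeps the helper independent of which extra fields are needed later.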

Related

Freemarker: find specific object in array of arrays

I have a complex many-to-many relationship defined. The cross-reference table is an entity, so I have Contact with a One-To-Many to ContactList, and List with a One-To-Many to Contact List. Contact List contains listID, contactID, and a few Booleans. The relationships seem to work well and on the backend I can get a list of contacts on a review list using the Spring-Data-Jpa findByContactListsIn(Set).
However, I am trying to build a list of contacts in Freemarker, and show whether they were in the current list.
Before I made an Entity out of ContactList, I had a standard Many-To-Many relationship between them, and I was able to do something like this in my .ftl:
<#if list.contacts?seq_contains(contact)>
But I needed to add some data to ContactList specifically, so I needed it to be more complicated. How can I do something similar now? I tried:
<#if list.contactLists?seq_contains(contact)>
But of course that always returns false, because it is comparing two different entity types. Is there a way to find if a contact is in one of the contactList objects?
I suppose I could do some back-end trickery, but I am looking for a front-end solution to this.
Don't use ?seq_contains to find generic objects at all. It doesn't call Object.equals; instead it works like the == operator of the template language, which only allows comparing strings, numbers, booleans and dates/times, and otherwise gives you an error. Unfortunately it won't fail in your case, because POJOs are also strings (their string value is what toString() returns). This is an unfortunate legacy of the stock ObjectWrapper (scheduled to be fixed in FM3), not even a quirk of the template language. Ideally you would get an error there. Instead, it now silently compares the return values of toString().
Your data-model should already contain what the template should actually display. FTL is not a programming language, so if you try to extract that from the data-model in it, it will be a pain. But, that the data-model contains that data can also mean that some objects in the data-model have methods that extract the data you need. As a last resort, you can add objects that just contain helper methods.
Update: Returning to ?seq_contains, if you need the Java semantics and list is a Java Collection, you can just use the Java API: list?api.contains(contact).

Using Jolt, How do I remove a field "last_update" from everywhere in a JSON string?

I have a JSON array of objects, and in many of the objects, at various points, there's a "last_update" field. (A "Person" object may have a "Jobs" array, and each Job in the Jobs array may have a last_update, as may the parent Person, each Address in the "Addresses" object, etc.) The "last_update" field is not always at the same depth for various objects, and in some objects it may appear in multiple places.
I want to remove any mention of "last_update" no matter where in the JSON tree it lands.
If I was editing the JSON in Vim, I'd probably try using something like s/last_updated.*?//g.
Jolt does not have out-of-the-box support for that.
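If doing this outside Jolt is an option, a short recursive pass over the parsed JSON is enough. This is a plain Python sketch, not a Jolt spec. The function name strip_key is made up for illustration, and the key name defaults to "last_update" as in the question.

```python
from typing import Any


def strip_key(node: Any, key: str = "last_update") -> Any:
    """Return a copy of a parsed JSON value with `key` removed at every depth."""
    if isinstance(node, dict):
        # Drop the key here, then recurse into the remaining values.
        return {k: strip_key(v, key) for k, v in node.items() if k != key}
    if isinstance(node, list):
        return [strip_key(item, key) for item in node]
    # Strings, numbers, booleans and None are returned unchanged.
    return node
```

Typical usage would be json.loads on the input string, strip_key on the result, then json.dumps to get a string back.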

MongoDB structure and JSON, are they not the same?

Let's say I have a json object:
Object = {
param1: '',
param2: '',
param3: '',
param4: {
paramA: '',
paramB: '',
paramC: '',
paramD: [AnotherJsonObject1,AnotherJsonObject2]
}
}
Will my MongoDB structure not be similar? Would this type of structuring make the data (or some of it) less searchable?
Edit 1:
By less searchable I mean: if the top level entities have sub entities which themselves have sub-entities and so on. Will I be able to reach the lowest level entities with the same efficiency of those in the top level?
I currently depend heavily on JSON files in my website. Those files need not be indexed to be searchable, BUT they would fit in the DB logically.
For example: I have a director, the director has the list of movies he created, every movie in this list has itself a list of actors who play in it, and every actor has a bio.
The bio in this example doesn't need to be indexed. I can just include a link to the file that contains the actor's bio, but I am wondering whether I can just add it to the DB, because that way it will all fit in logically, or whether 'unnecessary' data will harm the DB's ability to perform efficiently.
MongoDB stores documents in the BSON format, which looks similar to a JSON structure.
The structure you explained seems to be a proper use case for nested documents.
You can query nested fields using dot notation (the . operator).
Would this type of structuring make the data (or some of it) less searchable?
It depends on your nested data structure and the kind of queries you run on those fields. Queries on nested documents can have some limitations or be a bit more complicated. As far as the searchability of your nested docs is concerned, it depends entirely on your use case.
For example:
director:[movies:[{movieName:"movie1", actors:[{firstName:"will", lastName:"smith"}, {firstName:"bruce", lastName:"willis"}]}]]
In the above scenario, searching for a director where any of the directed movies has an actor with firstName will and lastName smith may turn out to be a bit more complex.
A simple query like
{"director.movies.actors.firstName":"will", "director.movies.actors.lastName":"smith"}
may return false positives.
The doc: director:[movies:[{actors:[{firstName:"will", lastName:"willis"}, {firstName:"bruce", lastName:"smith"}]}]]
will also turn out to be a positive match, because the two conditions can be satisfied by different actors.
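Negation queries need similar care: a condition like firstName != "bruce" is evaluated against the array as a whole rather than per actor, so the result may not be what you expect for documents that contain both a matching and a non-matching actor.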
You may want to go through the MongoDB docs on this.
For the first case, refer to the $elemMatch operator.
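To illustrate the difference, here is a small pure-Python emulation of the two matching behaviours on the two example documents above. It does not talk to MongoDB. The helper names are made up, and the field path in ELEM_MATCH_QUERY just follows the example structure in this answer.

```python
from typing import Any, Dict, List

# The $elemMatch form of the query, written as a Python dict.
# The path "director.movies.actors" is an assumption based on the example above.
ELEM_MATCH_QUERY = {
    "director.movies.actors": {
        "$elemMatch": {"firstName": "will", "lastName": "smith"}
    }
}


def all_actors(doc: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten every actor of every movie in the document."""
    return [actor for movie in doc["movies"] for actor in movie["actors"]]


def dotted_match(doc: Dict[str, Any], first: str, last: str) -> bool:
    """Emulates two separate dotted conditions: each may hit a different actor."""
    actors = all_actors(doc)
    return any(a["firstName"] == first for a in actors) and any(
        a["lastName"] == last for a in actors
    )


def elem_match(doc: Dict[str, Any], first: str, last: str) -> bool:
    """Emulates $elemMatch: one single actor must satisfy both conditions."""
    return any(
        a["firstName"] == first and a["lastName"] == last for a in all_actors(doc)
    )
```

With the first document both functions report a match. With the second one (will willis, bruce smith) only dotted_match does, which is the false positive described above.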

Store multiple authors in to couchbase database

I am a newbie to Couchbase Server. What I am looking for is to store 10 author names in Couchbase documents, one after another. Could someone please tell me whether the structure should be a single "author" document with multiple values
{ id : 1, name : Author 1}, { id : 2, name : Author 2}
OR whether I should store Author 1 in one document and Author 2 in another.
If so, how can I increment the id automatically before the "insert" command?
You can store all authors in a single document:
{ "doctype" : "Authors",
  "AuthorNames" : [
    {
      "id" : 1,
      "Name" : "author1"
    },
    {
      "id" : 2,
      "Name" : "author2"
    }
    and so on
  ]
}
If you want an incrementing ID, one option is to store one author per document, but then the ID will be randomly generated and will not be in incremental order.
In Couchbase, think more about how your application will be using the data than about how you want to store it. For example, will your application need to get all of the 10 authors all of the time? If so, then one document might be worthwhile. Perhaps your application only ever needs to read/write one of the authors at a time. Then you might want to put each in their own document, but with an object key pattern that lets you get the object really fast. Objects that are used often are kept in the managed cache; other objects that are not used often may fall out of the managed cache... and that is OK.
The other factor is what your reads to writes ratio is on this data.
So like I said, it depends on how your application will be reading and writing your data. Use this as the guidance for how your data should be stored.
The single JSON document is pretty straightforward. The more advanced schema design, where each author is in its own document and you access them via object key, might be a bit more complicated, but is ultimately faster and more scalable, depending on what I already pointed out. I will lay out an example schema and some possibilities.
For the authors, I might create each author JSON document with an object key like this:
authors::ID
Where ID is a value I keep in a special incrementer object that I will call authors::incrementer. Think of that object as a key-value pair holding only an integer that happens to be the upper bound of an array. Couchbase SDKs include a special function to increment just such an integer object. With this, my application can put together that object key very quickly. If I want to go after the 5th author, I do a read by object key for "authors::5". If I need to get 10, I do a parallelized BulkGet and get authors::1 through authors::10. If I want to get all the authors, I get the incrementer object, read that integer, and then do a parallelized bulk get. This way I can get them in order, or in whatever order I feel like, and I am accessing them by object key, which is VERY fast in Couchbase.
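Roughly, the key pattern looks like the sketch below. To keep it runnable anywhere, it uses a tiny in-memory class as a stand-in for a bucket rather than the Couchbase SDK. InMemoryBucket and its method names are hypothetical, so map them onto whatever counter, upsert and bulk-get calls your SDK version provides.

```python
from typing import Any, Dict, List


class InMemoryBucket:
    """Hypothetical stand-in for a key-value bucket; NOT the Couchbase SDK."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def increment(self, key: str, delta: int = 1, initial: int = 1) -> int:
        # Create the counter on first use, otherwise add delta to it.
        if key in self._data:
            self._data[key] += delta
        else:
            self._data[key] = initial
        return self._data[key]

    def upsert(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str) -> Any:
        return self._data[key]

    def get_multi(self, keys: List[str]) -> Dict[str, Any]:
        return {k: self._data[k] for k in keys}


COUNTER_KEY = "authors::incrementer"


def add_author(bucket: InMemoryBucket, name: str) -> str:
    """Reserve the next ID from the counter and store the author under it."""
    author_id = bucket.increment(COUNTER_KEY)
    key = f"authors::{author_id}"
    bucket.upsert(key, {"id": author_id, "name": name})
    return key


def all_authors(bucket: InMemoryBucket) -> List[Dict[str, Any]]:
    """Read the counter, build every key, and fetch them in one bulk get."""
    upper = bucket.get(COUNTER_KEY)
    keys = [f"authors::{i}" for i in range(1, upper + 1)]
    found = bucket.get_multi(keys)
    return [found[k] for k in keys]
```

The point of the design is that the application can compute every key itself, so no query is needed to find an author.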
All this being said, I could use a view to query this data or the upcoming "SQL for Documents" in Couchbase 4.0 or I can mix and match when I query and when I get objects by their key. Key access will ALWAYS be faster. It is the difference between asking a question then going and getting the object and simply knowing the answer and getting it immediately.

Is {"0":{"id":1,...},"1":{"id":2,...}} another representation of a JSON list like [{"id":1,...},{"id":2,...}]?

I have a little dilemma. I have a backend/frontend application that communicates through a JSON-based REST API.
The backend is written in PHP (Symfony/jmsserializer) and the frontend in Dart.
The communication between these two has a little problem.
For most list data the backend responds with JSON like
[{"id":1,...},{"id":2,...}]
But for some it responds with
{"0":{"id":1,...}, "1":{"id":2,...}}
Now my question is: should the backend respond with the latter at all, or only with the first?
Problem
You usually have a list of objects. You sometimes get an object with sub-objects as properties.
Underlying issue
JS/JSON lists are indexed from 0 upwards, which means that if you have a PHP array that does not respect this rule, json_encode will output a JS/JSON object instead, using the numeric indices as keys.
PHP arrays are ordered maps, which have more features than JSON lists. Whenever you're using those extra features you won't be able to translate directly into JSON lists without losing some information (ordering, keys, skipped indices, etc.).
PHP arrays and JSON objects, on the other hand, are more or less equivalent in terms of features and can be correctly translated between each other without any loss of information.
Occurrence
This happens if you have an initial PHP array of values which respects the JS/JSON list rules, but the keys in the list of objects are modified somehow. For example, if you have a custom indexing order {"3":{}, "0":{}, "1":{}, "2":{}} or if you have (any) keys that are strings (i.e. not numeric).
This always happens if you want to use the numeric id of the object as the numeric index of the list, {"123":{"id": 123, "name": "obj"}}, even if the numeric ids are in ascending order. As long as they do not start from 0 and count upwards without gaps, it's not a JSON list, it's a JSON object.
Specific case
So my guess is that the PHP code in the backend is fetching a list of objects but modifying something about it, such as inserting the objects into the array by (string) keys, inserting them in a specific order, or removing some of them.
Resolution
The backend can easily fix this by using array_values($listOfObjects) before using json_encode which will reindex the entire list by numeric indices of ascending value.
Arrays and dictionaries are two separate types in JSON ("array" and "object" respectively), but PHP combines this functionality in a single array type.
PHP's json_encode deals with this as follows: an array whose keys are exactly the sequential integers 0, 1, 2, ... ($array = ['cat', 'dog']) is serialized as a JSON array, while any other array, for example an associative array with non-numeric keys ($array = ['cat' => 'meow', 'dog' => 'woof']) or one with gaps in its numeric keys, is serialized as a JSON object, including the array's keys in the output.
If you end up with an associative array in PHP, but want to serialize it as a plain array in JSON, just use this to convert it to a numerical array before JSON encoding it: $array = array_values($array);
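To make the rule concrete, here is a rough Python emulation of the decision json_encode makes, using an ordered dict to play the role of a PHP array. The functions php_json_encode and array_values are illustrative stand-ins named after the PHP functions, not real library calls, and they only cover the list-versus-object choice.

```python
import json
from typing import Any, Dict


def php_json_encode(php_array: Dict[Any, Any]) -> str:
    """Emit a JSON array only if the keys are exactly 0, 1, 2, ... in order."""
    keys = list(php_array.keys())
    if keys == list(range(len(keys))):
        return json.dumps(list(php_array.values()))
    # Any other key set keeps its keys and becomes a JSON object.
    return json.dumps({str(k): v for k, v in php_array.items()})


def array_values(php_array: Dict[Any, Any]) -> Dict[int, Any]:
    """Reindex the values from 0 upwards, discarding the original keys."""
    return dict(enumerate(php_array.values()))


# Keys start at 1 (e.g. database ids), so the result is a JSON object.
by_id = {1: {"id": 1}, 2: {"id": 2}}
as_object = php_json_encode(by_id)
# After reindexing, the same data is emitted as a JSON array.
as_list = php_json_encode(array_values(by_id))
```

Here as_object is {"1": {"id": 1}, "2": {"id": 2}} and as_list is [{"id": 1}, {"id": 2}], which mirrors the two response shapes in the question.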