How to aggregate an ObjectSet after SearchAround - palantir-foundry

I have two Phonograph objects, each with millions of rows, which I have linked using the Search Around methods.
In the example below, I filter an Object Set of Flights based on the departure code, then I Search Around to the Passengers on those flights, and then I filter again on an attribute of the Passengers object.
const passengersDepartingFromAirport = Objects.search()
.flights()
.filter(flight => flight.departureAirportCode.exactMatch(airportCode))
.searchAroundPassengers()
.filter(passenger => passenger.passengerAttribute.exactMatch(value));
The result of the above code is:
LOG [2022-04-19T14:25:58.182Z] { osp: {},
objectSet:
{ objectSetProvider: '[Circular]',
objectSet: { type: 'FILTERED', filter: [Object], objectSet: [Object] } },
objectTypeIds: [ 'passengers' ],
emptyOrderByStep:
{ objectSet: '[Circular]',
orderableProperties:
{ attributeA: [Object],
attributeB: [Object],
...
Now, when I try to use take() or takeAsync(), or to aggregate the result using groupBy(), I receive the error below:
RemoteError: INVALID_ARGUMENT ObjectSet:ObjectSetTooLargeForSearchAround with instance ID xxx.
Error Parameters: {
"RemoteError.type": "STATUS",
"objectSetSize": "2160870",
"maxAllowedSize": "100000",
"relationSide": "TARGET",
"relationId": "flights-passengers"
}
SafeError: RemoteError: INVALID_ARGUMENT ObjectSet:ObjectSetTooLargeForSearchAround with instance ID xxx
How can I aggregate or reduce the result of the above ObjectSet?

The current object storage infrastructure limits the size of the "left side", or "starting object set", of a search around to 100,000 objects.
You can define an object set that uses a search around, which is what you're seeing as the result when you execute the Function before attempting any further manipulation.
Using take() or groupBy() "forces" the resolution of the object set definition: you no longer need just the pointer to the objects, you need to actually materialize some data from each individual object to perform that operation.
It's in this materialization step that the limit comes into play - the object sets are resolved and, if the object set at the search around step is larger than 100,000 objects, the request will fail with the above message.
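For example, a hedged sketch reusing the calls from the question (the limit argument to takeAsync() is an assumption here): building the definition succeeds on its own, and it's the terminal call that resolves the set and trips the size check.

const definition = Objects.search()
    .flights()
    .filter(flight => flight.departureAirportCode.exactMatch(airportCode))
    .searchAroundPassengers(); // still just a definition; nothing is resolved yet

const sample = await definition.takeAsync(10); // resolution happens here, so this is where the limit is enforced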
There is ongoing work on Object Storage v2, which will eventually support much larger search-around requests, but for now it's necessary to create a query pattern that results in fewer than 100,000 objects before performing a search around.
In some cases it's possible to create an "intermediate" object type that represents a different level of granularity in your data, or to invert the direction of your search around, to find a way to work within these limits.
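For example, if the filtered Passengers set is expected to stay below the limit, inverting the direction might look like the following hedged sketch; it assumes your ontology also exposes the link in the other direction as a searchAroundFlights() method on Passengers, which may not match your setup.

const flightsForPassengers = Objects.search()
    .passengers()
    .filter(passenger => passenger.passengerAttribute.exactMatch(value)) // this side must resolve to fewer than 100,000 objects
    .searchAroundFlights() // assumed inverse link; the name depends on your ontology
    .filter(flight => flight.departureAirportCode.exactMatch(airportCode));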


Any way to build up a dictionary / json object of sensor values in esphome?

I am trying to build up a dictionary / JSON object of sensor values in ESPHome. I have a sensor that sends me key/value pairs (e.g. one sensor reading could be { "temperature": 25.1 }, another one could be { "speed": 50.1 }, and so forth) at very high frequency (milliseconds). What I would like to do is collect the data for these key/value pairs for a certain time span, for simplicity say ten seconds, and only then take the dictionary and post it to a web service. It should also somehow combine the readings for the same key if it is sent multiple times within the ten-second time span, for example by averaging them out, using a filter, or whatever. The final dictionary to be posted to the web service would then look like:
{
    "temperature": 26.3,
    "speed": 52.5,
    …
}
How could I achieve this - any idea / proposal?
Thanks and best regards
Dear Stack Overflow community,
I found a solution to this issue. I am now using a global variable in ESPHome, which can be defined as follows:
globals:
  - id: "my_dict"
    type: std::map<std::string, std::string>
With this, I have a global map which I can use to store the key/value pairs. Adding a new key/value pair via a lambda is as simple as the following (where, in this example, the key is stored in the variable key and the value in the variable value):
lambda: |-
  id(my_dict)[key] = value;
Every ten seconds, I post the dictionary content to the web service and then clear the dictionary again:
interval:
  - interval: 10s
    then:
      - http_request.post:
          url: "https://<URL>"
          json: |-
            for ( auto item : id(my_dict) ) {
              root[item.first] = item.second;
            }
      - lambda: |-
          id(my_dict).clear();

How can I query for multiple values after a wildcard?

I have a json object like so:
{
_id: "12345",
identifier: [
{
value: "1",
system: "system1",
text: "text!"
},
{
value: "2",
system: "system1"
}
]
}
How can I use the XDevAPI SearchConditionStr to look for the specific combination of value + system in the identifier array? Something like this, but this doesn't seem to work...
collection.find("'${identifier.value}' IN identifier[*].value && '${identifier.system}' IN identifier[*].system")
By using the IN operator, what happens underneath the covers is basically a call to JSON_CONTAINS().
So, if you call:
collection.find(":v IN identifier[*].value && :s IN identifier[*].system")
.bind('v', '1')
.bind('s', 'system1')
.execute()
What gets executed, in the end, is (simplified):
JSON_CONTAINS('["1", "2"]', '"1"') AND JSON_CONTAINS('["system1", "system1"]', '"system1"')
In this case, both those conditions are true, and the document will be returned.
The atomic unit is the document (not a slice of that document). So, in your case, regardless of the value of value and/or system, you are still looking for the same document (the one whose _id is '12345'). With such a statement, the document is returned if all the search values are part of it, and it is not returned if any of them is not.
For instance, the following would not yield any results:
collection.find(":v IN identifier[*].value && :s IN identifier[*].system")
.bind('v', '1')
.bind('s', 'system2')
.execute()
EDIT: Potential workaround
I don't think the CRUD API will allow you to perform this kind of "cherry-picking", but you can always use SQL. In that case, one strategy that comes to mind is to use JSON_SEARCH() to retrieve an array of paths corresponding to each value in the scope of identifier[*].value and identifier[*].system (i.e. the array indexes) and then use JSON_OVERLAPS() to check that the two sets of paths overlap, i.e. that some array index matches in both.
session.sql(`select * from collection WHERE json_overlaps(json_search(json_extract(doc, '$.identifier[*].value'), 'all', ?), json_search(json_extract(doc, '$.identifier[*].system'), 'all', ?))`)
.bind('2', 'system1')
.execute()
In this case, the result set will only include documents where the identifier array contains at least one JSON object element where value is equal to '2' and system is equal to system1. The filter is effectively applied over individual array items and not in aggregate, like on a basic IN operation.
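For completeness, here is a hedged sketch of consuming that result with the Node.js connector (@mysql/xdevapi); it assumes session is an open X DevAPI session and that the collection's backing table stores documents in the usual doc column.

const res = await session
    .sql("select doc from collection where json_overlaps(json_search(json_extract(doc, '$.identifier[*].value'), 'all', ?), json_search(json_extract(doc, '$.identifier[*].system'), 'all', ?))")
    .bind('2', 'system1')
    .execute();

// fetchAll() drains the result set; each row is an array of column values
const docs = res.fetchAll().map(row => row[0]);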
Disclaimer: I'm the lead developer of the MySQL X DevAPI Connector for Node.js

Creating nodes and relations from JSON (dynamically)

I've got a couple hundred JSONs in a structure like the following example:
{
    "JsonExport": [
        {
            "entities": [
                {
                    "identity": "ENTITY_001",
                    "surname": "SMIT",
                    "entityLocationRelation": [
                        {
                            "parentIdentification": "PARENT_ENTITY_001",
                            "typeRelation": "SEEN_AT",
                            "locationIdentity": "LOCATION_001"
                        },
                        {
                            "parentIdentification": "PARENT_ENTITY_001",
                            "typeRelation": "SEEN_AT",
                            "locationIdentity": "LOCATION_002"
                        }
                    ],
                    "entityEntityRelation": [
                        {
                            "parentIdentification": "PARENT_ENTITY_001",
                            "typeRelation": "FRIENDS_WITH",
                            "childIdentification": "ENTITY_002"
                        }
                    ]
                },
                {
                    "identity": "ENTITY_002",
                    "surname": "JACKSON",
                    "entityLocationRelation": [
                        {
                            "parentIdentification": "PARENT_ENTITY_002",
                            "typeRelation": "SEEN_AT",
                            "locationIdentity": "LOCATION_001"
                        }
                    ]
                },
                {
                    "identity": "ENTITY_003",
                    "surname": "JOHNSON"
                }
            ],
            "identification": "REGISTRATION_001",
            "locations": [
                {
                    "city": "LONDON",
                    "identity": "LOCATION_001"
                },
                {
                    "city": "PARIS",
                    "identity": "LOCATION_002"
                }
            ]
        }
    ]
}
With these JSONs, I want to make a graph consisting of the following nodes: Registration, Entity, and Location. This part I've figured out, producing the following:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
MERGE(r:Registration {id:data.identification})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..locations.*" ) YIELD value AS locations
MERGE(l:Locations{identity:locations.identity, name:locations.city})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..entities.*" ) YIELD value AS entities
MERGE(e:Entities {name:entities.surname, identity:entities.identity})
All the entities and locations should have a relation with the registration. I thought I could do this by using the following code:
MERGE (e)-[:REGISTERED_ON]->(r)
MERGE (l)-[:REGISTERED_ON]->(r)
However, this code doesn't give the desired output. It creates extra "empty" nodes and doesn't connect to the registration node. So the first question is: how do I connect the location and entity nodes to the registration node? And, in light of the other JSONs, the entities and locations should only be linked to their specific registration.
Furthermore, I would like to create the entity -> location and entity -> entity relationships, using the given type of relation (SEEN_AT or FRIENDS_WITH) as the relationship type. How can this be done? I'm kind of lost at this point and don't see how to solve this. If someone could guide me in the right direction, I would be much obliged.
Variable names (like e and r) are not stored in the DB, and are bound to values only within individual queries. MERGE on a pattern with an unbound variable will just create the entire pattern (including creating an empty node for unbound node variables).
When you MERGE a node, you should only specify the unique identifying property for that node, to avoid duplicates. Any other properties you want to set at the time of creation should be set using ON CREATE SET.
It is inefficient to parse through the JSON data 3 times to get different areas of the data. And it is especially inefficient the way your query was doing it, since each subsequent CALL/MERGE group of clauses would be executed multiple times (every previous CALL produces multiple rows, so the number of rows increases multiplicatively). You can use aggregation to get around that, but it is unnecessary in your case, since you can do the entire query in a single pass through the JSON data.
This may work for you:
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file, "$.JsonExport.*") YIELD value AS data
MERGE (r:Registration {id: data.identification})
FOREACH (ent IN data.entities |
  MERGE (e:Entities {identity: ent.identity})
  ON CREATE SET e.name = ent.surname
  MERGE (e)-[:REGISTERED_ON]->(r)
  FOREACH (loc1 IN ent.entityLocationRelation |
    MERGE (l1:Locations {identity: loc1.locationIdentity})
    MERGE (e)-[:SEEN_AT]->(l1))
  FOREACH (ent2 IN ent.entityEntityRelation |
    MERGE (e2:Entities {identity: ent2.childIdentification})
    MERGE (e)-[:FRIENDS_WITH]->(e2))
)
FOREACH (loc IN data.locations |
  MERGE (l:Locations {identity: loc.identity})
  ON CREATE SET l.name = loc.city
  MERGE (l)-[:REGISTERED_ON]->(r)
)
For simplicity, it hard-codes the FRIENDS_WITH and REGISTERED_ON relationship types, as MERGE only supports hard-coded relationship types.
So, playing around with neo4j/Cypher, I've learned some new things and came to another solution for the problem. Based on the given example data, the following creates the nodes and edges dynamically.
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
CALL apoc.merge.node(['Registration'], {id:data.identification}, {},{}) YIELD node AS vReg
UNWIND data.entities AS ent
CALL apoc.merge.node(['Person'], {id:ent.identity}, {}, {id:ent.identity, surname:ent.surname}) YIELD node AS vPer1
UNWIND ent.entityEntityRelation AS entRel
CALL apoc.merge.node(['Person'],{id:entRel.childIdentification},{id:entRel.childIdentification},{}) YIELD node AS vPer2
CALL apoc.merge.relationship(vPer1, entRel.typeRelation, {},{},vPer2) YIELD rel AS ePer
UNWIND data.locations AS loc
CALL apoc.merge.node(['Location'], {id:loc.identity}, {name:loc.city}) YIELD node AS vLoc
UNWIND ent.entityLocationRelation AS locRel
CALL apoc.merge.relationship(vPer1, locRel.typeRelation, {},{},vLoc) YIELD rel AS eLoc
CALL apoc.merge.relationship(vLoc, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg1
CALL apoc.merge.relationship(vPer1, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg2
CALL apoc.merge.relationship(vPer2, "REGISTERED_ON", {},{},vReg) YIELD rel AS eReg3
RETURN vPer1,vPer2, vReg, vLoc, eLoc, eReg1, eReg2, eReg3

DynamoDB BatchGetItem API behavior for non-existent keys

What is the behavior of the DynamoDB BatchGetItem API if none of the keys exist in DynamoDB?
Does it return an empty list or throw an exception?
I am not sure about this after reading their doc: link
but I may be missing something.
BatchGetItem will not throw an exception. The results for those items will not be present in the Responses map in the response. This is also stated in the BatchGetItem documentation:
If a requested item does not exist, it is not returned in the result. Requests for nonexistent items consume the minimum read capacity units according to the type of read. For more information, see Capacity Units Calculations in the Amazon DynamoDB Developer Guide.
This behavior is also easy to verify. This is for a table with a hash key attribute named customer_id (the full example I am using is here):
dynamoDB.batchGetItem(new BatchGetItemSpec()
        .withTableKeyAndAttributes(new TableKeysAndAttributes(EXAMPLE_TABLE_NAME)
            .withHashOnlyKeys("customer_id", "ABCD", "EFGH")
            .withConsistentRead(true)))
    .getTableItems()
    .entrySet()
    .stream()
    .forEach(System.out::println);

dynamoDB.batchGetItem(new BatchGetItemSpec()
        .withTableKeyAndAttributes(new TableKeysAndAttributes(EXAMPLE_TABLE_NAME)
            .withHashOnlyKeys("customer_id", "TTTT", "XYZ")
            .withConsistentRead(true)))
    .getTableItems()
    .entrySet()
    .stream()
    .forEach(System.out::println);
Output:
example_table=[{ Item: {customer_email=jim#gmail.com, customer_name=Jim, customer_id=ABCD} }, { Item: {customer_email=garret#gmail.com, customer_name=Garret, customer_id=EFGH} }]
example_table=[]
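For reference, the same check in TypeScript, as a hedged sketch with AWS SDK v3; the table and key names are carried over from the Java example above and are assumptions for your setup.

import { DynamoDBClient, BatchGetItemCommand } from "@aws-sdk/client-dynamodb";

const client = new DynamoDBClient({});

async function lookup(ids: string[]): Promise<void> {
    const res = await client.send(new BatchGetItemCommand({
        RequestItems: {
            example_table: {
                Keys: ids.map(id => ({ customer_id: { S: id } })),
                ConsistentRead: true,
            },
        },
    }));
    // Non-existent keys are simply absent from Responses; no exception is thrown.
    console.log(res.Responses?.example_table ?? []);
}

lookup(["TTTT", "XYZ"]); // prints [] when neither key exists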

Iterating through couchbase keys without a view

In Couchbase, I was wondering if there was a way - WITHOUT using a view - to iterate through database keys. The admin interface appears to do this, but maybe it's doing something special. What I'd like to do is make a call like this to retrieve an array of keys:
$result = $cb->get("KEY_ALBERT", "KEY_FRED");
with the result being an array like [KEY_ALEX, KEY_BOB, KEY_DOGBERT].
Again, I don't want to use a view unless there's no alternative. It doesn't look like it's possible, but since the "view documents" page in the admin appears to do this, I thought I'd double-check. I'm using the PHP interface, if that matters.
Based on your comments, the only way is to create a simple view that emits only the id as part of the key:
function(doc, meta) {
  emit( meta.id );
}
With this view you will be able to create queries with the various options you need: pagination, ranges, etc.
Note: you mention the Administration Console; the console uses an "internal view" that is similar to what I have written above (but not optimized).
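To illustrate the querying side, here is a hedged sketch using the 2.x-era Couchbase Node.js SDK instead of PHP; it assumes the view above is published as all_keys in a design document named keys, and the bucket name is an assumption too.

import * as couchbase from 'couchbase'; // 2.x-era SDK API shown here

const cluster = new couchbase.Cluster('couchbase://localhost');
const bucket = cluster.openBucket('default');

// range(startKey, endKey, inclusiveEnd) walks keys in order; limit() gives paging
const query = couchbase.ViewQuery.from('keys', 'all_keys')
    .range('KEY_ALBERT', 'KEY_FRED', true)
    .limit(100);

bucket.query(query, (err, rows) => {
    if (err) throw err;
    console.log(rows.map(row => row.id)); // e.g. [ 'KEY_ALEX', 'KEY_BOB', 'KEY_DOGBERT' ]
});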
I don't know how the Couchbase admin works, but there are two options. The first option is to store your docs as a linked list, where each doc has a property (key) that points to another doc.
docs = [
    {
        id: "doc_C",
        data: "somedata",
        prev: "doc_B",
        next: "doc_D"
    },
    {
        id: "doc_D",
        data: "somedata",
        prev: "doc_C",
        next: "doc_E"
    }
]
The second approach is to use sequential ids. You keep one doc that contains the sequence counter and increment it on each add. It would be something like this:
docs = [
    {
        id: "doc_1",
        data: "somedata"
    },
    {
        id: "doc_2",
        data: "somedata"
    }
    ...
]
In this way you can do "range requests". To do this, you form an array of keys on the server side ([doc_1, doc_2, ..., doc_N]) and execute a multi-get query.
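Here is a hedged sketch of that pattern, again with the 2.x-era Couchbase Node.js SDK (the PHP getMulti mentioned below behaves analogously); the bucket name and the fixed range of 100 keys are assumptions.

import * as couchbase from 'couchbase'; // 2.x-era SDK API shown here

const cluster = new couchbase.Cluster('couchbase://localhost');
const bucket = cluster.openBucket('default');

// Build the key range doc_1 .. doc_N on the application side...
const keys: string[] = [];
for (let i = 1; i <= 100; i++) {
    keys.push(`doc_${i}`);
}

// ...and fetch them in one round trip. Keys that don't exist come back as
// per-key errors in the result map instead of failing the whole call.
bucket.getMulti(keys, (err, results) => {
    for (const key of Object.keys(results)) {
        if (!results[key].error) {
            console.log(key, results[key].value);
        }
    }
});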
The Couchbase PHP SDK does support multi-get requests. For a list of keys it will return an array of documents.
getMulti(array $ids, array $cas, int $flags) : array
http://www.couchbase.com/autodocs/couchbase-php-client-1.1.5/classes/Couchbase.html#method_getMulti