Phonograph2: how to use nextPageToken - palantir-foundry

I am syncing a data set with about 300k rows to Phonograph2 and need to make those records available via REST (endpoint: /phonograph2/api/search/tables).
My request looks as follows (retrieving records after a certain timestamp):
{
  "tableRids": [
    "ri.phonograph2.main.table.xxxxxxxxxxxxxxx"
  ],
  "filter": {
    "type": "range",
    "range": {
      "field": "reco_timestamp",
      "gte": "1634408219000"
    }
  }
}
The response ends with:
"nextPageToken": "xxxxxxx"
This leads me to the following questions:
How do I use the "nextPageToken" to retrieve the next set of results? (A generic pagination sketch follows below.)
Can the consumer get a list/array of pages to consume?
Can the page size, i.e. the number of hits returned before a nextPageToken is issued, be configured?
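For what it's worth, token-paginated search APIs are normally consumed in a loop that feeds each response's nextPageToken back into the next request. A minimal sketch in Python, assuming a pageToken request field and a hits result field (neither field name is confirmed by the Phonograph2 docs, and the host and auth header are placeholders):

import requests

URL = "https://<stack-host>/phonograph2/api/search/tables"  # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}               # hypothetical auth

body = {
    "tableRids": ["ri.phonograph2.main.table.xxxxxxxxxxxxxxx"],
    "filter": {
        "type": "range",
        "range": {"field": "reco_timestamp", "gte": "1634408219000"},
    },
}

rows, token = [], None
while True:
    if token:
        body["pageToken"] = token          # assumed request field name
    page = requests.post(URL, json=body, headers=HEADERS).json()
    rows.extend(page.get("hits", []))      # assumed response field name
    token = page.get("nextPageToken")
    if not token:                          # the last page carries no token
        break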

We discussed this with our Palantir project support and will use Objects Gateway, as suggested. Thanks for pointing us in this direction.


Mix DSL and URI query in Elasticsearch

I'm trying to write a Python class that takes a properly formatted Elasticsearch JSON query as a parameter. So far, so good. However, as part of that class I'd also like to take "to" and "from" parameters to limit the date range the query runs over.
Is there a way to combine the JSON for the DSL query along with URI parameters to pass in the date and time constraint?
I know I can limit the time using a range parameter like this:
GET /my-awesome-index*/_search
{
  "query": {
    "bool": {
      "must": [{"match_all": {}}],
      "filter": [
        {"range": {"date_time": {"gte": "now-24h", "lte": "now"}}},
        {"match_phrase": {"super_cool_field": "foo"}},
        {
          "bool": {
            "should": [
              {"match_phrase": {"somewhat_cool_field_1": "bar"}},
              {"match_phrase": {"somewhat_cool_field_2": "boo-ta"}}
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
and that's all well and good, but I want to craft my class so the timeframe is a variable. I also know I can limit the timeframe by submitting the URL like this:
GET /my-awesome-index*/_search?q=date_time:[1611732033412+TO+1611777796000]
{
  "query": {
    "bool": {
      "must": [{"match_all": {}}],
      "filter": [
        {"match_phrase": {"super_cool_field": "foo"}},
        {
          "bool": {
            "should": [
              {"match_phrase": {"somewhat_cool_field_1": "bar"}},
              {"match_phrase": {"somewhat_cool_field_2": "boo-ta"}}
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
However, when I submit that, Elasticsearch only seems to consider the timeframe from the URI and ignores the DSL JSON entirely.
Is there a way to get Elastic to consider / concatenate the two queries into one?
I'm considering programmatically building the range query with something like this:
range_part = '{{"range":{{"{}":{{"gte":"{}","lte":"{}"}}}}}}'.format(field, start, end)
and then dynamically inserting it into whatever JSON the class receives, but that seems cumbersome: there are so many valid query shapes that figuring out where to put the string gets messy.
Your suspicion regarding q is right. Quoting from the docs:
The q parameter overrides the query parameter in the request body. If both parameters are specified, documents matching the query request body parameter are not returned.
So q and query cannot be combined.
As to "programatically making the range query" -- since you don't know in what format you'll receive the queries, the standard approach would be to traverse the query json, find/create the correct bool-must/bool-filter and set the range query there.
But let's take a step back -- maybe your class shouldn't be expecting a pre-baked JSON query in the first place. Why not pass a query config array like
[
  {
    "type": "match_phrase",
    "field": "super_cool_field",
    "value": "foo"
  },
  ...
]
and build the query yourself? That way you have full control over what gets passed downstream to ES. Plus, adding a date range query would be a piece of cake.
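A minimal sketch of that idea (the config keys "type", "field" and "value" are just the ones from the array above; extend with more clause types as needed):

def build_query(clauses, field, start, end):
    # Start with the date range, then translate each config entry into a clause.
    filters = [{"range": {field: {"gte": start, "lte": end}}}]
    for c in clauses:
        if c["type"] == "match_phrase":
            filters.append({"match_phrase": {c["field"]: c["value"]}})
    return {"query": {"bool": {"must": [{"match_all": {}}], "filter": filters}}}

For example, build_query([{"type": "match_phrase", "field": "super_cool_field", "value": "foo"}], "date_time", "now-24h", "now") reproduces the core of the first query above.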

Room object in Revit files

I followed the instruction in the link below to extract Room objects from Revit models:
https://forge.autodesk.com/blog/new-rvt-svf-model-derivative-parameter-generates-additional-content-including-rooms-and-spaces
I made the changes as instructed and tested the sample Revit file (rac_basic_sample_project.rvt). But I still don't see the rooms or the viewables (phases). Below is the request I post. Am I missing anything?
{
  "input": {
    "urn": "dXJuOmFkc2sub2JqZWN0czpvcy5vYmplY3Q6YzQ4ZDUxNDNhMDRiNDAxNmI3ODYxY2NlMzQ2ZDkyNjdfZmFjaWxpdHlfOTUvZWIyYzMzNDgtNDAxYS00ZjQ3LTgwM2EtMjM1OGYwYmI0YjY2LnJ2dA"
  },
  "output": {
    "destination": {
      "region": "us"
    },
    "formats": [
      {
        "type": "svf",
        "views": ["3d"],
        "advanced": {
          "generateMasterViews": true
        }
      }
    ]
  }
}
I just tested the feature and I can see the room data.
The JSON payload seems ok, so try checking the following things:
Make sure you use the x-ads-force header (explained in the blog post you linked to); if you had already processed your Revit model before, triggering a new Model Derivative job would not do anything unless you force the translation (see the request sketch below)
Try using another design (and from a newer version of Revit if possible); in my screenshot I'm using one of the official samples for Revit 2020, although I remember being able to get the room data from older samples as well
The room data is only available in certain "viewables", so make sure you're looking at the right one; for my sample project, for example, the room data is not available in the "{3D}" viewable but it is available in the "Working Drawings" viewable
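For reference, a sketch of re-submitting the translation job with the x-ads-force header using Python requests (the access token and URN are placeholders; the endpoint is the standard Model Derivative job endpoint):

import requests

job = {
    "input": {"urn": "<base64-encoded-urn>"},  # placeholder for your URN
    "output": {
        "destination": {"region": "us"},
        "formats": [{
            "type": "svf",
            "views": ["3d"],
            "advanced": {"generateMasterViews": True},
        }],
    },
}
resp = requests.post(
    "https://developer.api.autodesk.com/modelderivative/v2/designdata/job",
    json=job,
    headers={
        "Authorization": "Bearer <access-token>",  # placeholder for your token
        "x-ads-force": "true",  # force re-translation of an already-processed model
    },
)
resp.raise_for_status()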

Set variable from activity response Azure Data Factory

I have a REST call in a Copy data activity which gives me a JSON response.
My goal is to fetch the "hasNextPage" value and put it into the hasNext variable.
I want to set it as a value in a "Set variable" activity that is connected to the "Copy data" activity, where I expected to access the output with something like this: #activity('Timesheets').output.data.timesheets.pageinfo.hasNext
I also want to be able to fetch the value of "cursor" from the last element in the "edges" array.
I couldn't find any documentation on how to do this.
JSON response that I get from the Timesheets activity:
[
  {
    "data": {
      "timesheets": {
        "pageInfo": {
          "hasNextPage": true
        },
        "edges": [
          {
            "cursor": "81836000243260.81836000243275.",
            "node": {
              "parameter1": "2019-11-04",
              "parameter2": "81836000243260"
            }
          },
          {
            "cursor": "81836000243252.81836000243260.81836000243275",
            "node": {
              "parameter1": "2019-11-04",
              "parameter2": "81836000243260"
            }
          }
        ]
      }
    }
  }
]
According to the documentation linked below, the output of a Copy data activity doesn't have a data property you can access.
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview
The Copy activity is made for copying large amounts of data, and it doesn't copy all rows in one go.
So it would not make sense to have an output dataset for a Copy activity.
If the response from your REST service contains a limited number of elements, you can use a Web activity to consume the REST service instead.
That does have an output you can access.
Follow it with a ForEach activity to iterate over the data set. Remember to consider parallel vs. sequential iteration of your data set in the ForEach activity.
Note that in your service response you get an array of "data" objects, so you need to address the first "data" element, as in the expression sketch below.
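Assuming a Web activity named Timesheets whose output is the array shown in the question, the two expressions could look roughly like this (the exact path may differ depending on how ADF wraps the response body):

@activity('Timesheets').output[0].data.timesheets.pageInfo.hasNextPage
@last(activity('Timesheets').output[0].data.timesheets.edges).cursor

last() is a built-in ADF expression function that returns the last item of a collection, which avoids hard-coding the array index for the cursor.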

Microsoft Academic API, Knowledge graph search -- ReferenceIDs always empty

I'm using the graph search method of the Microsoft Academic API to retrieve citation IDs and reference IDs for a paper. However, while retrieving citation IDs works, the reference IDs field is always empty, even for papers which should have linked references. For example, retrieving this publication through the API:
POST https://westus.api.cognitive.microsoft.com/academic/v1.0/graph/search?mode=json
Content-Type: application/json
Host: westus.api.cognitive.microsoft.com
Ocp-Apim-Subscription-Key: my-api-key
{
  "path": "/paper",
  "paper": {
    "select": [
      "OriginalTitle",
      "CitationIDs",
      "ReferenceIDs"
    ],
    "type": "Paper",
    "id": [2059999322]
  }
}
yields this response (I shortened the CitationIDs list for the sake of legibility):
{
  "Results": [
    [
      {
        "CellID": 2059999322,
        "CitationIDs": "[630584464,2053566310,2239657960,...]",
        "OriginalTitle": "Biodistribution of colloidal gold nanoparticles after intravenous administration: Effect of particle size",
        "ReferenceIDs": ""
      }
    ]
  ]
}
One thing I've noticed is that the graph schema provided here (at the bottom of the page) doesn't match the schema shown here (some of the attributes were renamed, e.g. NormalizedPaperTitle -> NormalizedTitle), so I thought the field was perhaps renamed to something else.
What is the correct query to get reference IDs through the API?
It should be ReferencesIDs, not ReferenceIDs.
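So the select list in the request above becomes:

"select": [
  "OriginalTitle",
  "CitationIDs",
  "ReferencesIDs"
]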

Query sync gateway buckets using N1QL

I wanted to know if it's possible to query Sync Gateway buckets using N1QL. Does such a bucket behave like a normal Couchbase bucket, or, because of the metadata that Sync Gateway adds, can it only be queried through the REST APIs?
Currently I have a webhooks handler which keeps a replica of the documents residing under the Sync Gateway buckets. I need to do some aggregations which then need to be pushed back to clients. So, can I do all this heavy lifting directly through N1QL on the Sync Gateway bucket, or is the right option to use webhooks that do the aggregations and simply push the updated docs to Sync Gateway?
PS: The webhooks + REST APIs option works perfectly for me currently. I just wanted to understand whether this hop is necessary or not.
Yes, it is possible to query the Sync Gateway bucket using N1QL - you just can't change the data (update/delete/insert), as that would break the revisions' metadata.
You need to ignore the documents with IDs starting with _sync: and the _sync property of each document, which contains internal metadata. The remaining attributes are your usual document.
Example:
select db.* from db where meta().id not like '_sync:%'
Result:
[
  {
    "_sync": {
      "history": {
        "channels": [
          null,
          null
        ],
        "parents": [
          -1,
          0
        ],
        "revs": [
          "1-b7a15ec4afbb8c4d95e2e897d0ec0a2e",
          "2-919b17d3f418100df7298a12ef2a84bb"
        ]
      },
      "recent_sequences": [
        6,
        7
      ],
      "rev": "2-919b17d3f418100df7298a12ef2a84bb",
      "sequence": 7,
      "time_saved": "2016-05-04T18:54:26.952202911Z"
    },
    "name": "Document with two revisions"
  }
]
Ignoring the _sync attribute:
select name from db where meta().id not like '_sync:%'
Result:
[
  {
    "name": "Document with two revisions"
  }
]
In Couchbase 4.5 (BETA as of today) we can use the object_remove function, although I'd avoid it in favor of the previous, more explicit syntax.
select object_remove(db, '_sync') from db where meta().id not like '_sync:%'
Result:
[
  {
    "$1": {
      "name": "Document with two revisions"
    }
  }
]
I don't know what your setup currently is, but AFAIK it's perfectly fine to keep querying the bucket through N1QL while using the REST API for the data changes.