How to define which time_index attribute is used when notifying QuantumLeap? - fiware

I am using FIWARE for some time-series data of heat-pumps. I use Orion 2.5.2 and Quantumleap 0.7.6.
My entities have a lot of attributes which are reported in batches. Those data batches have an individual time-stamp for each attribute, so the exact time of a measurement is known (this is also rather important). I use a little Python tool to split these batches and send them to the IoT Agent separately via HTTP, using the time-stamp parameter.
I end up with an entity like this:
...
"attrs": {
"temp_outdoor": {
"value": "-6.6",
"type": "Number",
"md": {
"TimeInstant": {
"type": "DateTime",
"value": 1613148707.7509995
}
},
"mdNames": [
"TimeInstant"
],
"creDate": 1612780352.3855166,
"modDate": 1613148716.1449544
},
"temp_return_flow": {
"value": "40.8",
"type": "Number",
"md": {
"TimeInstant": {
"type": "DateTime",
"value": 1613149016.394001
}
},
"mdNames": [
"TimeInstant"
],
"creDate": 1612780352.3855166,
"modDate": 1613149021.5991328
},
"TimeInstant": {
"value": 1613149101.1790009,
"type": "DateTime",
"mdNames": [],
"creDate": 1612780352.3855166,
"modDate": 1613149102.5100079
},
...
I don't really care about the creDate and modDate, but I do care about the TimeInstant in "md" of each attribute. Also, the bottom "TimeInstant" attribute is just the value of the last data point, I think? I would like to use the "md" TimeInstant to create the time_index in CrateDB. Hence, the reported time has to be the custom-metadata one. I tried some different values while subscribing to QuantumLeap but can't get it right.
Can someone tell me how to specify md->TimeInstant as the value for time_index?
I find the documentation rather inconclusive on that topic and hope that someone has already solved that mystery and might let me in on it :)
Thanks!

Looking at your payload, it is not clear which NGSI model is used, and that is the information needed to help you. Anyhow, as reported by the documentation:
A fundamental element in the time-series database is the time index. You may be wondering... where is it stored? QuantumLeap will persist the time index for each notification in a special column called time_index.
The value that is used for the time index of a received notification is defined according to the following policy, which chooses the first present and valid time value from the following ordered list of options.
Custom time index. The value of the Fiware-TimeIndex-Attribute http header. Note that for a notification to contain such header, the corresponding subscription has to be created with an httpCustom block, as detailed in the Subscriptions and Custom Notifications section of the NGSI spec. This is the way you can instruct QL to use custom attributes of the notification payload to be taken as time index indicators.
Custom time index metadata. The most recent custom time index (the value of the Fiware-TimeIndex-Attribute) attribute value found in any of the attribute metadata sections in the notification. See the previous option about the details regarding subscriptions.
TimeInstant attribute. As specified in the FIWARE IoT agent documentation.
TimeInstant metadata. The most recent TimeInstant attribute value found in any of the attribute metadata sections in the notification. (Again, refer to the FIWARE IoT agent documentation.)
timestamp attribute.
timestamp metadata. The most recent timestamp attribute value found in any of the attribute metadata sections in the notification. As specified in the FIWARE data models documentation.
dateModified attribute. If you paid attention in the Orion Subscription section, this is the "dateModified" value notified by Orion.
dateModified metadata. The most recent dateModified attribute value found in any of the attribute metadata sections in the notification.
Finally, QL will use the Current Time (the time of notification reception) if none of the above options is present or none of the values found can actually be converted to a datetime.
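As a rough illustration of the first option in the list, a subscription that makes Orion send the Fiware-TimeIndex-Attribute header could be built as below. This is only a sketch: the helper name build_subscription, the catch-all entity pattern and the QuantumLeap URL http://quantumleap:8668/v2/notify are assumptions that need adapting to your deployment. The resulting body would be POSTed to Orion's /v2/subscriptions endpoint.

```python
import json


def build_subscription(time_index_attr,
                       notify_url="http://quantumleap:8668/v2/notify"):
    """Return an NGSI v2 subscription body that uses an httpCustom block.

    notify_url is a placeholder (assumption). time_index_attr names the
    attribute or metadata that QuantumLeap should use as the time index.
    """
    return {
        "description": "Notify QuantumLeap with a custom time index",
        # Assumption: subscribe to every entity; narrow this in practice.
        "subject": {"entities": [{"idPattern": ".*"}]},
        "notification": {
            # httpCustom (instead of http) is what allows custom headers.
            "httpCustom": {
                "url": notify_url,
                "headers": {"Fiware-TimeIndex-Attribute": time_index_attr},
            },
            # Ask Orion to include these metadata in each notification.
            "metadata": ["dateCreated", "dateModified", "TimeInstant"],
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_subscription("TimeInstant"), indent=2))
```

The function only builds the request body, so it can be inspected before anything is sent to Orion.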
This means (and if you understand the NGSI model, the documentation is quite clear) that, with the following payload:
{
"id": "Room1",
"type": "Room",
"temperature": {
"value": 24.2,
"type": "Number",
"metadata": {
"myTime": {
"type": "DateTime",
"value": "2020-12-16T17:13:46.00Z"
}
}
},
"pressure": {
"value": 720,
"type": "Number",
"metadata": {
"TimeInstant": {
"type": "DateTime",
"value": "2020-12-16T17:13:46.00Z"
}
}
},
"dateObserved": "2021-02-02T00:00:00.00Z",
"dateCreated": "2019-09-24T12:49:02.00Z",
"dateModified": "2021-02-02T23:00:50.00Z",
"TimeInstant": {
"type": "DateTime",
"value": "2020-12-16T17:13:46.00Z"
}
}
If you set the custom header Fiware-TimeIndex-Attribute=dateObserved in the notification, time_index will be the value of dateObserved. If you set Fiware-TimeIndex-Attribute=myTime, it will be the value of the myTime metadata attached to temperature. If no Fiware-TimeIndex-Attribute header is passed, the list above applies from its third option onwards, so the TimeInstant attribute is picked first. If you remove the TimeInstant attribute from the payload above, the most recent TimeInstant metadata value is picked. If that metadata is removed as well, the dateModified value is picked. If that attribute is not received either, the current time is used.
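The ordered list quoted from the documentation can be sketched in a few lines of Python. This is not QuantumLeap's actual code, only a minimal model of the documented order; the function names and the handling of bare (non-dict) attribute values are assumptions made for the illustration.

```python
from datetime import datetime, timezone

# Accepted timestamp layouts (with and without fractional seconds).
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S%z")


def _parse(value):
    """Parse an ISO 8601 string; return None if it is not a valid time."""
    if not isinstance(value, str):
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def _attribute(entity, name):
    """Time value of the attribute called name, if present and valid."""
    attr = entity.get(name)
    # An attribute may be {"type": ..., "value": ...} or a bare value.
    return _parse(attr.get("value") if isinstance(attr, dict) else attr)


def _latest_metadata(entity, name):
    """Most recent valid value of metadata called name on any attribute."""
    found = []
    for attr in entity.values():
        metadata = attr.get("metadata") if isinstance(attr, dict) else None
        item = metadata.get(name) if isinstance(metadata, dict) else None
        parsed = _parse(item.get("value")) if isinstance(item, dict) else None
        if parsed is not None:
            found.append(parsed)
    return max(found) if found else None


def select_time_index(entity, custom_name=None):
    """Walk the documented options in order; fall back to the current time."""
    names = [custom_name] if custom_name else []
    names += ["TimeInstant", "timestamp", "dateModified"]
    for name in names:
        # For each name the attribute is tried before the metadata.
        for lookup in (_attribute, _latest_metadata):
            value = lookup(entity, name)
            if value is not None:
                return value
    return datetime.now(timezone.utc)
```

Calling select_time_index(entity, "myTime") corresponds to sending the header Fiware-TimeIndex-Attribute=myTime, while calling it without a second argument corresponds to a notification without that header.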

Related

FIWARE, NGSI-LD - Understand the @context

I am creating a data model for a particular application and I did not start from any base model. Given that, the context below is sufficient, correct?
"@context": [
"https://schema.lab.fiware.org/ld/context",
"https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.3.jsonld"
]
My data model is not complicated, with just these properties and entity being more "complex":
"address": {
"type": "Property",
"value": {
"streetAddress": "",
"postalCode": "",
"addressLocality": "",
"addressCountry": ""
}
},
"location": {
"type": "Point",
"coordinates": [
,
]
},
{
"id": "urn:ngsi-ld:MeasurementSensor:",
"type": "MeasurementSensor",
"measurementVariable": {
"type": "Property",
"value": "Temperature"
},
"measurementValue": {
"type": "Property",
"value": 32.0,
"unitCode": "ºC",
"observedAt": "2022-05-10T11:09:00.000Z"
},
"refX": {
"type": "Relationship",
"object": "urn:ngsi-ld:"
},
"@context": [
"https://schema.lab.fiware.org/ld/context",
"https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.3.jsonld"
]
}
If you are using your own custom vocabulary, you should declare your types and properties in your own LD @context. For instance,
{
"@context": [
{
"MeasurementSensor": "https://example.org/my-types/MeasurementSensor"
},
"https://schema.lab.fiware.org/ld/context",
"https://uri.etsi.org/ngsi-ld/v1/ngsi-ld-core-context-v1.3.jsonld"
]
}
It also seems you are not using URNs properly; you should check that. unitCode seems to be broken as well, as it must follow the UN/CEFACT unit codes.
Nonetheless, I would not recommend defining your own vocabulary for sensors, given that there are existing vocabularies such as SAREF or W3C SOSA that can and should be reused.
I'm not a data model expert but I do know a thing or two about NGSI-LD and NGSI-LD brokers.
The @context you use is an array of "https://schema.lab.fiware.org/ld/context" and v1.3 of the core context.
"https://schema.lab.fiware.org/ld/context" in turn is an array of "https://fiware.github.io/data-models/context.jsonld" and v1.1 of the core context ...
And "https://fiware.github.io/data-models/context.jsonld" doesn't define any of the three terms you are using, so there is no need to supply any context for them. The terms will be expanded using the default URL of the core context (the value of the @vocab member of the core context defines the default URL).
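To make the expansion rule concrete, here is a deliberately simplified sketch. Real JSON-LD expansion is far more involved; the function expand_term is hypothetical, and the default vocabulary URL is assumed to be the @vocab value of the NGSI-LD core context.

```python
# Assumption: this is the @vocab value declared by the NGSI-LD core context.
CORE_VOCAB = "https://uri.etsi.org/ngsi-ld/default-context/"


def expand_term(term, user_context=None, vocab=CORE_VOCAB):
    """Expand a short name to a full IRI (simplified model).

    A term defined in the user context maps to the IRI given there;
    any other term is appended to the default vocabulary URL.
    """
    user_context = user_context or {}
    if term in user_context:
        return user_context[term]
    return vocab + term


if __name__ == "__main__":
    own = {"MeasurementSensor": "https://example.org/my-types/MeasurementSensor"}
    print(expand_term("MeasurementSensor"))       # default vocabulary
    print(expand_term("MeasurementSensor", own))  # user-defined mapping
```

With no user context, every custom term ends up under the default URL, which is why the two context entries in the question add nothing for those terms.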
An NGSI-LD broker has the core context built-in, you don't need to pass it, so do yourself a favor, and get faster responses by not passing the core context to the broker. No need.
And, if you need a user context, pass it in the HTTP Header "Link" instead.
Host it somewhere (an NGSI-LD broker offers that service), so you don't force the poor broker to parse the @context in each and every request.
Lastly, do follow Jose Manuel's recommendations and use standard names for your attributes (and a standard value for unitCode).
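Passing the user context in the Link header could look roughly like this. The context URL is a placeholder and the helper name is made up for the example; the rel and type values are the ones used for JSON-LD context links.

```python
def context_link_header(context_url):
    """Build the value of an HTTP Link header pointing at a hosted @context."""
    return (
        f"<{context_url}>; "
        'rel="http://www.w3.org/ns/json-ld#context"; '
        'type="application/ld+json"'
    )


def ngsi_ld_headers(context_url):
    """Headers for a request whose body carries no @context member.

    With a Link header the payload is plain JSON, hence application/json.
    """
    return {
        "Content-Type": "application/json",
        "Link": context_link_header(context_url),
    }


if __name__ == "__main__":
    # Placeholder URL (assumption): replace with where your context is hosted.
    print(ngsi_ld_headers("https://example.org/my-context.jsonld"))
```

The entity body sent with these headers would then omit the @context array entirely.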

Request Body in BigQuery

Good Day,
I am testing a post method in another system using BigQuery as its data source.
I am currently testing the call method on BigQuery's live data to see if the API request gets a response.
What I want to know is: is the insertId meant to be the column I want to target and, in this case, the Client ID and the JSON object should have all the data within that Column ID?
{
"kind": "bigquery#tableDataInsertAllRequest",
"skipInvalidRows": false,
"ignoreUnknownValues": false,
"rows": [
{
"insertId": "ClientID",
"json": {
"ClientID": "55415",
"Client": "LANGA BRANCH",
"Project": "Customer Visits",
"Developer": "Bryan",
"Hours": "300"
}
}
]
}
The insertId is an optional field. It can (and probably should) be omitted entirely, as it's used on a best effort basis for deduplication. Omitting it yields higher throughput: https://cloud.google.com/bigquery/quotas#streaming_inserts_without_insertid_fields
The REST reference for insertAll is here:
https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll
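To show the shape of the request, here is a small sketch that builds an insertAll body. The helper build_insert_all_body is hypothetical. It treats insertId as an optional per-row deduplication token rather than a column name, and leaves it out unless IDs are explicitly supplied.

```python
import json


def build_insert_all_body(rows, insert_ids=None):
    """Build a tabledata.insertAll request body.

    rows: list of dicts mapping column names to values (one dict per row).
    insert_ids: optional list of per-row deduplication tokens; when None,
    the insertId field is omitted from every row.
    """
    if insert_ids is not None and len(insert_ids) != len(rows):
        raise ValueError("insert_ids must have one entry per row")
    entries = []
    for index, row in enumerate(rows):
        entry = {"json": row}
        if insert_ids is not None:
            entry["insertId"] = insert_ids[index]
        entries.append(entry)
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "skipInvalidRows": False,
        "ignoreUnknownValues": False,
        "rows": entries,
    }


if __name__ == "__main__":
    row = {
        "ClientID": "55415",
        "Client": "LANGA BRANCH",
        "Project": "Customer Visits",
        "Developer": "Bryan",
        "Hours": "300",
    }
    print(json.dumps(build_insert_all_body([row]), indent=2))
```

All column values, including ClientID, live inside the json object of each row.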

Elasticsearch dynamic mapping for object within attribute

Wondering if I can create a "dynamic mapping" within an elasticsearch index. The problem I am trying to solve is the following: I have a schema that has an attribute that contains an object that can differ greatly between records. I would like to mirror this data within elasticsearch if possible but believe that automatic mapping may get in the way.
Imagine a scenario where I have a schema like the following:
{
name: string
origin: string
payload: object // can be of any type / schema
}
Is it possible to create a mapping that supports this? I do not need to query the records by this payload attribute, but it would be great if I can.
Note that I have checked the documentation but am unsure whether what Elastic calls dynamic mapping is what I am looking for.
It's certainly possible to specify which queryable fields you expect the payload to contain and what those fields' mappings should be.
Let's say each doc will include the fields payload.livemode and payload.created_at. If these are the only two fields you'll want to perform queries on, and you'd like to disable dynamic, index-time mappings autogenerated by Elasticsearch for the rest of the fields, you can use dynamic templates like so:
PUT my-payload-index
{
"mappings": {
"dynamic_templates": [
{
"variable_payload": {
"path_match": "payload",
"mapping": {
"type": "object",
"dynamic": false,
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"livemode": {
"type": "boolean"
}
}
}
}
}
],
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"origin": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
Then, as you ingest your docs:
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": false,
"abc":"def"
}
}
POST my-payload-index/_doc
{
"name": "abc",
"origin": "web.dev",
"payload": {
"created_at": "2021-04-05 08:00:00",
"livemode": true,
"modified_at": "2021-04-05 09:00:00"
}
}
and verify with
GET my-payload-index/_mapping
no new mappings will be generated for the fields payload.abc nor payload.modified_at.
Not only that — the new fields will also be ignored, as per the documentation:
These fields will not be indexed or searchable, but will still appear in the _source field of returned hits.
Side note: if fields are neither stored nor searchable, they're effectively the opposite of enabled.
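If the set of queryable payload fields changes over time, the index body above can be generated instead of hand-written. The sketch below only builds the same request body as a Python dict; the function name is an assumption, and sending it to Elasticsearch is left out.

```python
import json


def payload_index_body(queryable_fields):
    """Build the index-creation body shown above for given payload fields.

    queryable_fields maps a payload field name to its mapping, e.g.
    {"livemode": {"type": "boolean"}}. Other payload fields get no mapping.
    """
    text_with_keyword = {
        "type": "text",
        "fields": {"keyword": {"type": "keyword"}},
    }
    return {
        "mappings": {
            "dynamic_templates": [
                {
                    "variable_payload": {
                        "path_match": "payload",
                        "mapping": {
                            "type": "object",
                            # Do not auto-generate mappings for new sub-fields.
                            "dynamic": False,
                            "properties": dict(queryable_fields),
                        },
                    }
                }
            ],
            "properties": {
                "name": dict(text_with_keyword),
                "origin": dict(text_with_keyword),
            },
        }
    }


if __name__ == "__main__":
    fields = {
        "created_at": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
        "livemode": {"type": "boolean"},
    }
    print(json.dumps(payload_index_body(fields), indent=2))
```

The printed JSON can be used as the body of the PUT my-payload-index request.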
The Big Picture
Working with variable contents of a single, top-level object is quite standard. Take for instance the stripe event object — each event has an id, an api_version and a few other shared params. Then there's the data object that's analogous to your payload field.
Now, all is fine, until you need to aggregate on the contents of your payload. See, since the content is variable, so are the data paths / accessors. But wildcards in aggregation paths don't work in Elasticsearch. Scripts do but are onerous to maintain.
Back to stripe. They partially solved it through what they call polymorphic, typed hashes — as discussed in their blog on API design:
A pretty neat approach that's worth emulating.
P.S. I discuss dynamic templates in more detail in the chapter "Mapping Automation" of my ES Handbook.

Is it possible to have a dynamically sized array in a JSON Schema whose size is based on another value

I wish to store some building data for a calculator (for an existing game) in JSON. The thing is, some buildings can be upgraded while others cannot, although they are the same type of building. Is there a way to dynamically set the size of the array based on the value of the maximum level, or am I expecting too much from JSON? I intend to open-source the tool and would like to have a schema that validates any JSON file someone adds to it. With the code below, Visual Studio Code gives me a warning that maxItems expects an integer.
{
"$schema": "http://json-schema.org/draft-06/schema#",
"properties": {
"$schema": {
"type":"string"
},
"maxLevels": {
"description": "The maximum level that this building can be upgraded to",
"type":"integer",
"enum": [
1,
5
]
},
"goldCapacity": {
"description": "The maximum amount of gold that the building can hold.",
"type":"array",
"minItems": 1,
"maxItems": {"$ref": "#/properties/maxLevels"},
"items": {
"type":"integer",
"uniqueItems": true
}
}
}
}
There is a proposal for a $data reference that allows values from the data to be used as the values of certain schema keywords. Using a $data reference, you can write:
{
"$schema": "http://json-schema.org/draft-06/schema#",
"properties": {
"maxLevel": {
"description": "The maximum level that this building can be upgraded to",
"type":"integer",
"minimum": 1,
"maximum": 5
},
"goldCapacity": {
"description": "The maximum amount of gold that the building can hold.",
"type":"array",
"minItems": 1,
"maxItems": {"$data": "1/maxLevel"},
"items": {
"type":"integer",
"uniqueItems": true
}
}
}
}
In this way, the value of property "maxLevel" (that should be >= 1 and <=5) determines the maximum number of items the array "goldCapacity" can hold.
Currently only ajv (JavaScript) implements $data reference, as far as I know, and it is being considered for the inclusion in the next versions of the specification (feel free to vote for it).
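Where a validator without $data support is used, the same cross-field rule can be checked in application code after ordinary schema validation. The following is a minimal sketch using only the standard library; check_building is a hypothetical helper and it uses the property name maxLevel from the schema above.

```python
def check_building(building):
    """Return a list of error messages for the maxLevel/goldCapacity rule."""
    errors = []
    max_level = building.get("maxLevel")
    capacity = building.get("goldCapacity")

    # bool is a subclass of int in Python, so exclude it explicitly.
    if (not isinstance(max_level, int) or isinstance(max_level, bool)
            or not 1 <= max_level <= 5):
        errors.append("maxLevel must be an integer between 1 and 5")
        return errors

    if not isinstance(capacity, list) or len(capacity) < 1:
        errors.append("goldCapacity must be an array with at least 1 item")
    elif len(capacity) > max_level:
        errors.append(
            f"goldCapacity has {len(capacity)} items but maxLevel is {max_level}"
        )
    return errors


if __name__ == "__main__":
    print(check_building({"maxLevel": 5, "goldCapacity": [500, 1000, 1500]}))
    print(check_building({"maxLevel": 1, "goldCapacity": [500, 1000]}))
```

An empty list means the document satisfies the rule that the array may hold at most maxLevel items.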
JSON (and JSON Schema) is basically a set of key/value pairs, so JSON alone has no real support for what you want to do.
To accomplish what you want, construct the JSON with a default value for maxItems (e.g. 0), obtain a reference to your JSON object and then update the value once you have your dynamic value, using JavaScript:
jsonObj['maxItems'] = yourCalculatedValue;
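The same idea, patching the schema before validating, might look like this in Python. It is a sketch under assumptions: BASE_SCHEMA is a trimmed copy of the schema from the question with a placeholder maxItems, and schema_for is a made-up helper name.

```python
import copy

# Trimmed schema; the maxItems value here is only a placeholder.
BASE_SCHEMA = {
    "$schema": "http://json-schema.org/draft-06/schema#",
    "properties": {
        "maxLevel": {"type": "integer", "minimum": 1, "maximum": 5},
        "goldCapacity": {
            "type": "array",
            "minItems": 1,
            "maxItems": 1,
            "items": {"type": "integer"},
        },
    },
}


def schema_for(building):
    """Return a copy of BASE_SCHEMA whose maxItems equals building's maxLevel."""
    schema = copy.deepcopy(BASE_SCHEMA)  # leave the shared template untouched
    schema["properties"]["goldCapacity"]["maxItems"] = building["maxLevel"]
    return schema


if __name__ == "__main__":
    patched = schema_for({"maxLevel": 3, "goldCapacity": [100, 200, 300]})
    print(patched["properties"]["goldCapacity"]["maxItems"])
```

The patched schema can then be handed to whichever JSON Schema validator the project uses.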

How can I define a single, unique key-value pair in JSON Schema?

The schema should allow only the following constellation: {"status":"nok"}.
The key must always be "status" and the value should allow "ok", "nok" or "inProgress".
No different or additional objects should be allowed.
I have tried this:
{
"description": "blabla",
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": [
"ok",
"inProgress",
"nok"
],
"required": true,
"additionalItems": false
}
},
"required": true,
"additionalProperties": false
}
This works, but this schema still allows me to send the same key/value pair twice, like {"status":"nok","status":"nok"}.
I would also be happy if it worked without the "object" container that I'm using, to reduce overhead.
Maybe someone knows a solution. Thanks!
There is a more fundamental issue with that input:
{"status":"nok","status":"nok"}
namely, that input is not valid JSON. RFC 4627, section 2.2, explicitly states that "The names within an object SHOULD be unique". And in your case, they are not.
Which means the JSON parser you use can do whatever it wants to with such an input. Some JSON APIs will grab whatever value they come upon first, other parsers will grab the last value they read, others will even coalesce values -- none of this is illegal as per the RFC.
In essence: given such input, you cannot guarantee what the output of the JSON parser will be; and as such, you cannot guarantee JSON Schema validation of such an input either.
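Since the schema validator only ever sees what the parser hands over, duplicate names have to be caught at parse time. As an illustration, Python's standard json module exposes an object_pairs_hook that receives every name/value pair, which makes such a check possible; the helper names below are made up for the example.

```python
import json

ALLOWED = ("ok", "nok", "inProgress")


def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated name within one object."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


def parse_status(text):
    """Parse text and return the status, or raise ValueError if not allowed."""
    doc = json.loads(text, object_pairs_hook=reject_duplicates)
    if not isinstance(doc, dict) or set(doc) != {"status"}:
        raise ValueError("expected exactly one key named 'status'")
    if doc["status"] not in ALLOWED:
        raise ValueError(f"unexpected status: {doc['status']!r}")
    return doc["status"]


if __name__ == "__main__":
    print(parse_status('{"status":"nok"}'))
    # Without the hook, Python's parser silently keeps the last value.
    print(json.loads('{"status":"ok","status":"nok"}'))
```

Other parsers offer different hooks or none at all, so whether this check is available depends on the JSON library in use.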