How do I add a time key to properties in GeoJSON during a SELECT from Postgres?

I have a PHP script that SELECTs data from Postgres in GeoJSON format. That works fine. This is the SQL:
SELECT jsonb_build_object(
    'type', 'FeatureCollection',
    'features', json_agg(features.feature)
)
FROM (
    SELECT jsonb_build_object(
        'type', 'Feature',
        'id', id,
        'geometry', st_AsGeojson(
            st_SetSrid(
                st_MakePoint(
                    split_part(to_jsonb(row)->'data'->'location'->>'value', ',', 2)::double precision,
                    split_part(to_jsonb(row)->'data'->'location'->>'value', ',', 1)::double precision
                ),
                4326
            )
        )::json,
        'properties', to_jsonb(row) - 'id'
    ) AS feature
    FROM (SELECT * FROM cs_fiets_json WHERE last_updated > '$sel_f') row
) features;
I need to include a time key with a value in the properties (to be able to use TimeDimension in Leaflet), and I am stuck on that.
The time value I need is nested deep inside the row.
I have played with json_set and json_insert, but I don't know if, how, or where to use them.
So I need "time": "2022-03-02T14:32:37.00Z" directly under "properties".
Thanks in advance!
{
"id": 1581083302,
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
5.0545584,
52.1574455
]
},
"properties": {
"data": {
"id": "359215101322999",
"voc": {
"type": "Number",
"value": 42,
"metadata": {
"dateCreated": {
"type": "DateTime",
"value": "2021-11-25T10:49:10.00Z"
},
"dateModified": {
"type": "DateTime",
"value": "2022-03-02T14:32:37.00Z"
}
}
},
"pm10": {
"type": "Number",
"value": 14,
"metadata": {
"dateCreated": {
"type": "DateTime",
"value": "2021-11-25T10:49:10.00Z"
},
"dateModified": {
"type": "DateTime",
"value": "2022-03-02T14:32:37.00Z"
}
}
},
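One possible way to do that (a sketch, untested against your data) is to merge an extra key into the properties object in the inner query with the || operator. This assumes the timestamp you want is the one at data -> voc -> metadata -> dateModified -> value in the sample above; adjust the path if it should come from another field:

SELECT jsonb_build_object(
    'type', 'Feature',
    'id', id,
    'geometry', st_AsGeojson(st_SetSrid(st_MakePoint(
        split_part(to_jsonb(row)->'data'->'location'->>'value', ',', 2)::double precision,
        split_part(to_jsonb(row)->'data'->'location'->>'value', ',', 1)::double precision), 4326))::json,
    'properties', (to_jsonb(row) - 'id')
        || jsonb_build_object('time', to_jsonb(row)->'data'->'voc'->'metadata'->'dateModified'->>'value')
) AS feature
FROM (SELECT * FROM cs_fiets_json WHERE last_updated > '$sel_f') row

jsonb_set(to_jsonb(row) - 'id', '{time}', to_jsonb(row)->'data'->'voc'->'metadata'->'dateModified'->'value') should produce the same result if you prefer that form.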

Related

postgres to get match count when json value matches condition

I have a column in my db named supers with a JSON array inside.
Below is its structure:
[
{
"name": "Trample",
"category": "evergreen"
},
{
"name": "Spell",
"category": "keyword"
},
{
"name": "Trample token",
"category": "keyword"
},
{
"name": "Cast",
"category": "keyword"
},
{
"name": "Cost",
"category": "keyword"
},
{
"name": "Total",
"category": "keyword"
},
{
"name": "Power",
"category": "keyword"
},
{
"name": "Creature token",
"category": "keyword"
},
{
"name": "Control",
"category": "keyword"
},
{
"name": "Less",
"category": "keyword"
},
{
"name": "Elder",
"category": "cardType"
},
{
"name": "Dinosaur",
"category": "cardType"
},
{
"name": "Creature",
"category": "cardType"
},
{
"name": "Legendary",
"category": "cardType"
},
{
"name": "Cost x Less",
"category": "super"
}
]
I have the following query to get all rows that have similar values:
select name
from all_cards
where exists (select 1
from jsonb_array_elements(supers) f(x)
where x->>'category' = 'keyword'
and x->>'name' in ('Spell', 'Trample token', 'Cast', 'Cost', 'Total', 'Power', 'Creature token', 'Control', 'Less')
having count(*)>=8);
This query works to get all the names where count(*) >= 8, but how do I get that count(*) out as a value, so I can sort the results based on that number?
I've tried jsonb_array_length, count, and sum; nothing seems to give me the answer I'm looking for.
You can do a cross join to get the count:
select ac.name, k.num_keywords
from all_cards ac
cross join lateral (
select count(*) as num_keywords
from jsonb_array_elements(ac.supers) f(x)
where x->>'category' = 'keyword'
and x->>'name' in ('Spell', 'Trample token', 'Cast', 'Cost', 'Total', 'Power', 'Creature token', 'Control', 'Less')
) k
where k.num_keywords >= 8
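Since num_keywords is now an ordinary column in the result, sorting is just a matter of appending, for example:

order by k.num_keywords desc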

Import JSON objects as nested fields into Elasticsearch

I have a log with thousands of records of aggregated data in JSON:
{
"count": 25,
"domain": "domain.tld",
"geoips": {
"AU": 5,
"NZ": 20
},
"ips": {
"1.2.3.4": 5,
"1.2.3.5": 1,
"1.2.3.6": 1,
"1.2.3.7": 1,
"1.2.3.8": 1,
"1.2.3.9": 9,
"1.2.3.10": 7
},
"subdomains": {
"a.domain.tld": 1,
"b.domain.tld": 1,
"c.domain.tld": 1,
"domain.tld": 22
},
"tld": "tld",
"types": {
"1": 3,
"43": 22
}
}
and I have this mapping in ES:
"mappings": {
"properties": {
"count": {
"type": "long"
},
"domain": {
"type": "keyword"
},
"ips": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"geoips": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"subdomains": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
},
"tld": {
"type": "keyword"
},
"types": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"val": {
"type": "long"
}
}
}
}
}
Is there any simple way to import these lines into ES as nested objects? If I use a bulk insert without modification, ES will modify the mapping by adding a new field for each IP/subdomain/GeoIP instead of adding it as a simple key/val object.
Or is the only way to regenerate the JSON into key/val nested fields?
Your mapping is already very good, but the data doesn't fit it, since the nested data type expects an array of objects, not a single object. So you'll need to transform your nested objects into arrays of key/value pairs, like so:
...
"ips": [
{
"key": "1.2.3.4",
"val": 5
},
{
"key": "1.2.3.5",
"val": 1
},
...
],
"subdomains": [
{
"key": "a.domain.tld",
"val": 1
},
{
"key": "b.domain.tld",
"val": 1
},
...
]
...
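Once the records are reshaped like that, they can be indexed with the usual bulk format (an action line followed by the whole document on one NDJSON line); the index name here is only a placeholder:

POST _bulk
{ "index": { "_index": "domain-stats" } }
{ "count": 25, "domain": "domain.tld", "tld": "tld", "geoips": [ { "key": "AU", "val": 5 }, { "key": "NZ", "val": 20 } ], "ips": [ ... ], "subdomains": [ ... ], "types": [ ... ] }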

Azure Data Factory ForEach Copy activity is not iterating but instead pulls all files in blob storage. Why?

I have a pipeline in DF2 that has to look at a folder in blob storage and process each of the 145 files sequentially into a database table. After each file has been loaded into the table, a stored procedure should be triggered that will check each record and either insert it, or update an existing record, into a master table.
Looking online, I feel as though I have tried every combination of "Get MetaData", "For Each", "LookUp" and "Assign Variable" activities that has been suggested, but for some reason my Copy Data STILL picks up all files at the same time and runs 145 times.
I recently found a blog online that I followed to use "Assign Variable", as it will be useful for multiple file locations, but it does not work for me. I need to read the files as CSVs into tables and not as binary objects, so I think this is my issue.
{
"name": "BulkLoadPipeline",
"properties": {
"activities": [
{
"name": "GetFileNames",
"type": "GetMetadata",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"dataset": {
"referenceName": "DelimitedText1",
"type": "DatasetReference",
"parameters": {
"fileName": "#item()"
}
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobStorageReadSetting"
},
"formatSettings": {
"type": "DelimitedTextReadSetting"
}
}
},
{
"name": "CopyDataRunDeltaCheck",
"type": "ForEach",
"dependsOn": [
{
"activity": "BuildList",
"dependencyConditions": [
"Succeeded"
]
}
],
"typeProperties": {
"items": {
"value": "#variables('fileList')",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "WriteToTables",
"type": "Copy",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSetting",
"wildcardFileName": "*.*"
},
"formatSettings": {
"type": "DelimitedTextReadSetting"
}
},
"sink": {
"type": "AzureSqlSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "myID",
"type": "String"
},
"sink": {
"name": "myID",
"type": "String"
}
},
{
"source": {
"name": "Col1",
"type": "String"
},
"sink": {
"name": "Col1",
"type": "String"
}
},
{
"source": {
"name": "Col2",
"type": "String"
},
"sink": {
"name": "Col2",
"type": "String"
}
},
{
"source": {
"name": "Col3",
"type": "String"
},
"sink": {
"name": "Col3",
"type": "String"
}
},
{
"source": {
"name": "Col4",
"type": "String"
},
"sink": {
"name": "Col4",
"type": "String"
}
},
{
"source": {
"name": "DW Date Created",
"type": "String"
},
"sink": {
"name": "DW_Date_Created",
"type": "String"
}
},
{
"source": {
"name": "DW Date Updated",
"type": "String"
},
"sink": {
"name": "DW_Date_Updated",
"type": "String"
}
}
]
}
},
"inputs": [
{
"referenceName": "DelimitedText1",
"type": "DatasetReference",
"parameters": {
"fileName": "#item()"
}
}
],
"outputs": [
{
"referenceName": "myTable",
"type": "DatasetReference"
}
]
},
{
"name": "CheckDeltas",
"type": "SqlServerStoredProcedure",
"dependsOn": [
{
"activity": "WriteToTables",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"storedProcedureName": "[TL].[uspMyCheck]"
},
"linkedServiceName": {
"referenceName": "myService",
"type": "LinkedServiceReference"
}
}
]
}
},
{
"name": "BuildList",
"type": "ForEach",
"dependsOn": [
{
"activity": "GetFileNames",
"dependencyConditions": [
"Succeeded"
]
}
],
"typeProperties": {
"items": {
"value": "#activity('GetFileNames').output.childItems",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Create list from variables",
"type": "AppendVariable",
"typeProperties": {
"variableName": "fileList",
"value": "#item().name"
}
}
]
}
}
],
"variables": {
"fileList": {
"type": "Array"
}
}
}
}
The Details screen of the pipeline output shows the pipeline loops for the number of items in the blob, but each time the Copy Data and Stored Procedure are run for every file in the list at once, as opposed to one at a time.
I feel like I am close to the answer but missing one vital part. Any help or suggestions are GREATLY appreciated.
Your payload is not correct.
The GetMetadata activity should not use the same dataset as the Copy activity.
The GetMetadata activity should reference a dataset that points at a folder, where the folder contains all the files you want to deal with; but your dataset has a 'fileName' parameter.
Use the output of the GetMetadata activity as the input of the ForEach activity.
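A rough sketch of that wiring (the folder-level dataset name here is a placeholder, and note that expressions in the pipeline JSON are prefixed with @): GetMetadata reads a dataset with no fileName parameter, and the ForEach iterates its childItems directly, which also makes the BuildList / Append Variable step unnecessary:

{
    "name": "GetFileNames",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {
            "referenceName": "DelimitedTextFolder",
            "type": "DatasetReference"
        },
        "fieldList": [ "childItems" ],
        "storeSettings": {
            "type": "AzureBlobStorageReadSetting"
        }
    }
},
{
    "name": "CopyDataRunDeltaCheck",
    "type": "ForEach",
    "dependsOn": [ { "activity": "GetFileNames", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
        "items": {
            "value": "@activity('GetFileNames').output.childItems",
            "type": "Expression"
        },
        "isSequential": true,
        "activities": [ ... ]
    }
}

Inside the ForEach, keep your existing Copy and stored procedure activities, but set the Copy input dataset parameter to "fileName": "@item().name" and remove "wildcardFileName": "*.*" from the source storeSettings; with a wildcard there, every iteration would read all matching files regardless of the per-item file name, which would explain what you are seeing.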

Azure Data Factory Copy Activity

I have been working on this for a couple of days and cannot get past this error. I have 2 activities in this pipeline. The first activity copies data from an ODBC connection to an Azure database, which is successful. The 2nd activity transfers the data from one Azure table to another Azure table and keeps failing.
The error message is:
Copy activity met invalid parameters: 'UnknownParameterName', Detailed message: An item with the same key has already been added..
I do not see any invalid parameters or unknown parameter names. I have rewritten this multiple times using their add-activity code template and by myself, but I do not receive any errors when deploying, only when it is running. Below is the JSON pipeline code.
Only the 2nd activity is receiving an error.
Thanks.
Source Data set
{
"name": "AnalyticsDB-SHIPUPS_06shp-01src_AZ-915PM",
"properties": {
"structure": [
{
"name": "UPSD_BOL",
"type": "String"
},
{
"name": "UPSD_ORDN",
"type": "String"
}
],
"published": false,
"type": "AzureSqlTable",
"linkedServiceName": "Source-SQLAzure",
"typeProperties": {},
"availability": {
"frequency": "Day",
"interval": 1,
"offset": "04:15:00"
},
"external": true,
"policy": {}
}
}
Destination Data set
{
"name": "AnalyticsDB-SHIPUPS_06shp-02dst_AZ-915PM",
"properties": {
"structure": [
{
"name": "SHIP_SYS_TRACK_NUM",
"type": "String"
},
{
"name": "SHIP_TRACK_NUM",
"type": "String"
}
],
"published": false,
"type": "AzureSqlTable",
"linkedServiceName": "Destination-Azure-AnalyticsDB",
"typeProperties": {
"tableName": "[olcm].[SHIP_Tracking]"
},
"availability": {
"frequency": "Day",
"interval": 1,
"offset": "04:15:00"
},
"external": false,
"policy": {}
}
}
Pipeline
{
"name": "SHIPUPS_FC_COPY-915PM",
"properties": {
"description": "copy shipments ",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "RelationalSource",
"query": "$$Text.Format('SELECT COMPANY, UPSD_ORDN, UPSD_BOL FROM \"orupsd - UPS interface Dtl\" WHERE COMPANY = \\'01\\'', WindowStart, WindowEnd)"
},
"sink": {
"type": "SqlSink",
"sqlWriterCleanupScript": "$$Text.Format('delete imp_fc.SHIP_UPS_IntDtl_Tracking', WindowStart, WindowEnd)",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "COMPANY:COMPANY, UPSD_ORDN:UPSD_ORDN, UPSD_BOL:UPSD_BOL"
}
},
"inputs": [
{
"name": "AnalyticsDB-SHIPUPS_03shp-01src_FC-915PM"
}
],
"outputs": [
{
"name": "AnalyticsDB-SHIPUPS_03shp-02dst_AZ-915PM"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Day",
"interval": 1,
"offset": "04:15:00"
},
"name": "915PM-SHIPUPS-fc-copy->[imp_fc]_[SHIP_UPS_IntDtl_Tracking]"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "$$Text.Format('select distinct ups.UPSD_BOL, ups.UPSD_BOL from imp_fc.SHIP_UPS_IntDtl_Tracking ups LEFT JOIN olcm.SHIP_Tracking st ON ups.UPSD_BOL = st.SHIP_SYS_TRACK_NUM WHERE st.SHIP_SYS_TRACK_NUM IS NULL', WindowStart, WindowEnd)"
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "UPSD_BOL:SHIP_SYS_TRACK_NUM, UPSD_BOL:SHIP_TRACK_NUM"
}
},
"inputs": [
{
"name": "AnalyticsDB-SHIPUPS_06shp-01src_AZ-915PM"
}
],
"outputs": [
{
"name": "AnalyticsDB-SHIPUPS_06shp-02dst_AZ-915PM"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Day",
"interval": 1,
"offset": "04:15:00"
},
"name": "915PM-SHIPUPS-AZ-update->[olcm]_[SHIP_Tracking]"
}
],
"start": "2017-08-22T03:00:00Z",
"end": "2099-12-31T08:00:00Z",
"isPaused": false,
"hubName": "adf-tm-prod-01_hub",
"pipelineMode": "Scheduled"
}
}
Have you seen this link?
They get the same error message and suggest using AzureTableSink instead of SqlSink:
"sink": {
"type": "AzureTableSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
}
It would make sense for you too, since your 2nd copy activity is Azure to Azure.
It could be a red herring, but I'm pretty sure "tableName" is a required entry in the typeProperties for a sqlSource. Yours is missing this for the input dataset. I appreciate you have a join in the sqlReaderQuery, so it's probably best to put a dummy (but real) table name in there.
By the way, it's not clear why you are using $$Text.Format and WindowStart/WindowEnd on your queries if you're not transposing these values into the query; you could just put the query between double quotes.
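For example, the otherwise empty typeProperties of the source dataset could name the table the reader query actually selects from (purely as a placeholder):

"typeProperties": {
    "tableName": "[imp_fc].[SHIP_UPS_IntDtl_Tracking]"
}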

JSON-schema: value based on oneOf

I have the following JSON schema, which defines 3 types of toys, to be used with this JSON GUI builder (GitHub):
{
"id": "http://some.site.somewhere/entry-schema#",
"$schema": "http://json-schema.org/draft-04/schema#",
"description": "schema for toys in game",
"type": "object",
"required": [ "type" ],
"properties": {
"sawObj": {
"type": "object",
"oneOf": [
{ "$ref": "#/definitions/rect" },
{ "$ref": "#/definitions/circle" },
{ "$ref": "#/definitions/img" }
]
}
},
"definitions": {
"rect": {
"properties": {
"width": { "type": "integer" },
"height": { "type": "integer" },
"weight": { "type": "integer" }
},
"required": [ "width", "height", "weight" ],
"additionalProperties": false
},
"circle": {
"properties": {
"radius": { "type": "integer" },
"weight": { "type": "integer" }
},
"required": [ "radius", "weight" ],
"additionalProperties": false
},
"img": {
"properties": {
"path": { "type": "string" },
"width": { "type": "integer" },
"height": { "type": "integer" },
"weight": { "type": "integer" }
},
"required": [ "path", "width", "height", "weight" ],
"additionalProperties": false
}
}
}
If I pick the circle object, for example, I get this output:
{
"sawObj": {
"radius": 0,
"weight": 0
}
}
I want to add a value "type" which would always be constrained to reflect the user's chosen type. So instead, something like this:
{
"sawObj": {
"type": "circle",
"radius": 0,
"weight": 0
}
}
Where the type is automatically determined by the user's choice from the oneOf properties section.
How can I do this with json-schema?
I was able to do this with the enum value, only allowing a single value representing the type. I also set the type value as required, so it is always automatically set to 'circle'.
"circle": {
"properties": {
"radius": {
"type": "integer"
},
"weight": {
"type": "integer"
},
"type": {
"type": "string",
"enum": ["circle"]
}
},
"required": ["radius", "weight", "type"],
"additionalProperties": false
}
Note: I want to point out this solution is not ideal. I'm hoping to find a better way to do this.
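One small follow-up: if the GUI builder ever supports a draft newer than draft-04, the same constraint can be expressed with const (added in draft 6), which reads a bit more clearly than a single-value enum; for example:

"type": {
    "type": "string",
    "const": "circle"
}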