I've got a process that reads a mapping configuration
stored in smartsheet.
This is where the process admins can control the data flow.
Ultimately this will be stored in a snowflake table to be used by other Talend flows.
I've brought it to this point which includes getting updated names formapped columns and sheets.
My objective now is to create JSON to add the new rows to Smartsheet and also update existing rows.
The only difference in the two calls is the inclusion of the row id in Smartsheet.
For this example, I am focusing on new rows.
I'm confident I can adapt any solution that addresses new rows to updating existing rows as it only involves one more JSON attribute.
I'm having a little trouble wrapping my head around Smartsheet's unique way of storing rows and columns in JSON responses.
Each Row is a collection of cells.
Each cell is a collection of attributes which includes the column id the cell belongs to.
Here is what I have at this point
Data to be converted to JSON in a table read from cache memory (tHashOutput and tHashInput via several steps addressing other requirements)
LOADED_DATE_TIME_STR
SSHEET_NAME
SSHEET_ID
SSHEET_ROW_ID
SSHEET_COL_ID
SSHEET_COL_NAME
DB_TBL_NAME
DB_COL_NAME
20220221232059
sheet_name_1_str
sheet_id_1_int
null
xxxxxxxxxxxxxxx1
sheet_1_col_name_a_str
null
null
20220221232059
sheet_name_2_str
sheet_id_2_int
null
xxxxxxxxxxxxxxx2
sheet_2_col_name_b_str
null
null
20220221232059
sheet_name_2_str
sheet_id_2_int
null
xxxxxxxxxxxxxxx3
sheet_2_col_name_c_str
null
null
The mapping configuration sheet has 5 important columns
that I have not yet incorported their mapping and ids into the process.
I will do this once I have an idea of where this part of the flow is headed
LOADED_DATE_TIME_STR = col_id_1_int
SSHEET_NAME = col_id_2_int
SSHEET_ID = col_id_3_int
SSHEET_COL_ID = col_id_4_int
SSHEET_COL_NAME = col_id_6_int
Output json format: (this will be a sub-element of a larger JSON tree).
Specifically, each array of cells defines a row and will be a sub-element
of the row within the Smartsheet API structure
{
"cells": [
{
"columnId": col_id_1_int,
"value": "20220221232059"
},
{
"columnId": col_id_2_int,
"value": "sheet_name_1_str"
},
{
"columnId": col_id_3_int,
"value": "sheet_id_1_int"
}
{
"columnId": col_id_4_int,
"value": "xxxxxxxxxxxxxxx1"
}
{
"columnId": col_id_6_int,
"value": "sheet_1_col_name_a_str"
}
]
},
{
"cells": [
{
"columnId": col_id_1_int,
"value": "20220221232059"
},
{
"columnId": col_id_2_int,
"value": "sheet_name_2_str"
},
{
"columnId": col_id_3_int,
"value": "sheet_id_2_int"
}
{
"columnId": col_id_4_int,
"value": "xxxxxxxxxxxxxxx2"
}
{
"columnId": col_id_6_int,
"value": "sheet_2_col_name_b_str"
}
]
},
...
Related
IN ADF Data Flow how can I flatten JSON into rows rather than columns?
{
"header": [
{
"main": {
"id": 1
},
"sub": [
{
"type": "a",
"id": 2
},
{
"type": "b",
"id": 3
}
]}]}
In ADF I'm using the flatten task and get the below result:
However the result I'm trying to achieve is merging the two id columns into one column like below:
Since both main_id and sub_id belong in the same column, instead of using 1 flatten to flatten all the data, flatten both main and sub separately.
I have taken the following JSON as source for my dataflow.
{
"header":[
{
"main":{
"id":1
},
"sub":[
{
"type":"a",
"id":2
},
{
"type":"b",
"id":3
}
]
},
{
"main":{
"id":4
},
"sub":[
{
"type":"c",
"id":5
},
{
"type":"d",
"id":6
}
]
}
]
}
I have taken 2 flatten transformations flattenMain and flattenSub instead of 1 which use the same source.
For flattenMain, I have unrolled by header and selected unroll root as header. Then created an additional column selecting source column header.main.id.
The data preview for flattenMain would be:
For flattenSub, I have unrolled by header.sub and selected unroll root as header.sub. Then created 2 additional column selecting source column header.sub.id as id column and header.sub.type as type column.
The data preview for flattenSub transformation would be:
Now I have applied union transformation on both flattenMain and flattenSub. I have applied union by using Name.
The final data preview for this Union transformation will give the desired result.
NOTE: All the highlighted rows in output images indicate the result that would be achieved when we use the JSON sample provided in the question.
Background: I work for a company that basically sells passes. Every order that is placed by the customer will contain N number of passes.
Issue: I have these JSON event-transaction files coming into a S3 bucket on a daily basis from DocumentDB (MongoDB). This JSON file is associated to the relevant type of event (insert, modify or delete) for every document key (which is an order in my case). The example below illustrates a "Insert" type of event that came through to the S3 bucket:
{
"_id": {
"_data": "11111111111111"
},
"operationType": "insert",
"clusterTime": {
"$timestamp": {
"t": 11111111,
"i": 1
}
},
"ns": {
"db": "abc",
"coll": "abc"
},
"documentKey": {
"_id": {
"$uuid": "abcabcabcabcabcabc"
}
},
"fullDocument": {
"_id": {
"$uuid": "abcabcabcabcabcabc"
},
"orderNumber": "1234567",
"externalOrderId": "12345678",
"orderDateTime": "2020-09-11T08:06:26Z[UTC]",
"attraction": "abc",
"entryDate": {
"$date": 2020-09-13
},
"entryTime": {
"$date": 04000000
},
"requestId": "abc",
"ticketUrl": "abc",
"tickets": [
{
"passId": "1111111",
"externalTicketId": "1234567"
},
{
"passId": "222222222",
"externalTicketId": "122442492"
}
],
"_class": "abc"
}
}
As we see above, every JSON file might contain N number of passes and every pass is - in turn - is associated to an external ticket id, which is a different column (as seen above). I want to use Pentaho Kettle to read these JSON files and load the data into the DW. I am aware of the Json input step and Row Normalizer that could then transpose "PassID 1", "PassID 2", "PassID 3"..."PassID N" columns into 1 unique column "Pass" and I would have to have to apply a similar logic to the other column "External ticket id". The problem with that approach is that it is quite static, as in, I need to "tell" Pentaho how many Passes are coming in advance in the Json input step. However what if tomorrow I have an order with 10 different passes? How can I do this dynamically to ensure the job will not break?
If you want a tabular output like
TicketUrl Pass ExternalTicketID
---------- ------ ----------------
abc PassID1Value1 ExTicketIDvalue1
abc PassID1Value2 ExTicketIDvalue2
abc PassID1Value3 ExTicketIDvalue3
And make incoming value dynamic based on JSON input file values, then you can download this transformation Updated Link
I found everything work dynamic in JSON input.
I am reverse engineering an app that sends queries to
SOMESERVERNAME.analysis.windows.net/public/reports/querydata via an HTTP POST of an JSON-structured query.
Some initial lines of a sample query are at the end of this message.
I can't find any documentation on this anywhere. I don't know if this is some secret API or what. I ultimately would like to just ignore the aggregations altogether and just dump the raw data, which seems to sit in some flat-file type container on the back-end, but without some API documentation I'm stuck with just re-running the super basic handful of queries I've been able to intercept.
Note: this app is an embedded analytics page created with PowerBI, but the only REST API I can find for PowerBI has nothing to do with querying, but just basic object management.
Thanks!
{
"version": "1.0.0",
"queries": [
{
"Query": {
"Commands": [
{
"SemanticQueryDataShapeCommand": {
"Query": {
"Version": 2,
"From": [
{
"Name": "s",
"Entity": "Sheet1"
}
],
"Select": [
{
"Aggregation": {
"Expression": {
"Column": {
"Expression": {
"SourceRef": {
"Source": "s"
}
},
"Property": "Total"
}
},
"Function": 0
},
"Name": "Sum(Sheet1.Total)"
}
],
"Where": [
{
"Condition": {
"In": {
"Expressions": [
{
"Column": {
"Expression": {
"SourceRef": {
"Source": "s"
}
},
"Property": "Year"
}
}
],
"Values": [
[
{
"Literal": {
"Value": "'2018'"
}
}
]
]
}
}
},
............
I have built a client that scrapes data off a specific Power BI report using the same API, but probably you'll be able to adapt it to your use case. Maybe we can even abstract the code into a more generalized Power BI client!
Having tinkered with the API for two days, I realised that there are many ways the data can be formatted:
"nested"/multidimensional data can be unflattened, flattened by 1 degree, etc.
a primary "table" of a result dataset (in data.PH) can reference others (in data.SH)
The basics are as follows:
A dataset is structured like a multidimensional table, with cells containing values.
In a set of cells, the first always has a field S that contains the schema of its and all subsequent cells.
The schema maps a field of each cell's object with a selection from your query, e.g. the G0 field with the queried column age.
My client seems to work only with a specific type of query (SemanticQueryDataShapeCommand), a specific nr of dimensions and a specific column marked as primary (via Binding.Primary). But maybe that helps! https://github.com/derhuerst/fetch-bvg-occupancy/blob/1ebb864b1ff7130f9d2f0ab031c6d78bcabdd633/lib/parse-dataset.js
The only documented way to use this API is through the ADOMD.NET or OleDb provider.
If you want to send a DAX/MDX query and retrieve data programmatically, there's a sample of how to front-end the service with a simple REST API here.
In Athena, I am reading some nested JSON files into a table. The field that actually contains the nested JSON has an inconsistent number of fields within it across the different files in the raw data.
Sometimes the data looks something like this:
{
"id": "9f1e07b4",
"date": "05/20/2018 02:30:53.110 AM",
"data": {
"a": "asd",
"b": "adf",
"body": {
"sid": {
"uif": "yes",
"sidd": "no",
"state": "idle"
}
},
"category": "scene"
}
}
Other times the data looks something like this:
{
"id": "9f1e07b4",
"date": "05/20/2018 02:30:45.436 AM",
"data": {
"a": "event",
"b": "state",
"body": {
"persona": {
"one": {
"movement": "idle"
}
}
},
"category": "scene"
}
}
Other times the "body" field contains both the "sid" struct and the "persona" struct.
As you can see the fields given within "body" are not always consistent. I tried to add all of the possible fields and their structures within my CREATE EXTERNAL TABLE query. However, the "data" column that contains the "body" field still does not fill and remains blank when I "preview table" in Athena.
In the CREATE TABLE DDL, is there a way to indicate that I want to fill all of columns that aren't present in the nested JSON of each file with null values?
Furthermore, the 'names' given to the fields in the query do not have to correspond to the key values in the raw JSON. It seems Athena is simply reading the structure and nothing else. Is there a way to indicate which JSON key corresponds to which Athena field name directly? So that if some fields are missing from the "body" of one file, Athena can know which one is missing and fill it in as null?
I'm Working on AngularJS.
In this part of the project my goal is to obtain a JSON structure after filling a form with some particulars values.
Here's the fiddle of my simple form: Fiddle
With the form I will do a query to KairosDB, that is my NoSql Database, I will query data from it by a JSON object. The form is structured in this way:
a Name
a certain Number of Tags, with Tag Id ("ch" for example) and tag value ("932" for example)
a certain Number of Aggregators to manipulate data coming from DB
Start Timestamp and End Timestamp (now they are static and only included in the final JSON Object)
After filling this form, with my code I'll obtain for example this JSON object:
{
"metrics": [
{
"tags": [
{
"id": "ch",
"value": "932"
},
{
"id": "ch",
"value": "931"
}
],
"aggregators": {
"name": "sum",
"sampling": [
{
"value": "1",
"unit": "milliseconds",
"type": "SUM"
}
]
}
}
],
"cache_time": 0,
"start_absolute": 123,
"end_absolute": 1234
}
Unfortunately, KairosDB accepts a different structure, and as you could see, Tag id "ch" doesn't hase an "id" string before, or for example, Tag values coming from the same tag id are grouped together
{
"metrics": [
{
"tags": {
"ch": [
"932",
"931"
]
},
"name": "AIENR",
"aggregators": [
{
"name": "sum",
"sampling": {
"value": "1",
"unit": "milliseconds"
}
}
]
}
],
"cache_time": 0,
"start_absolute": 1367359200000,
"end_absolute": 1386025200000
}
My question is: Is there a way to obtain the JSON structure like the one accepted by Kairos DB with an Angular JS form?. Thanks to everyone.
I've seen this topic as the one more similar to mine but it isn't in AngularJS.
Personally, I'd do the refactoring work in the backend - Have what ever server interfaces sends and receives data do the manipulation - Otherwise you'll end up needing to refactor your data inside Angular anywhere you want to use that dataset.
Where as doing it in the backend would put it in a single access point.
Of course, you could do it in Angular, just replace userString in the submitData method with a copy of the array and replace the tags section with data in the new format, and likewise refactor the returned result to the correct format when you get a reply.