How do I create an output transform with a dynamic date and time in Code Repository? - palantir-foundry

Is it possible to create an output that depends on the date and time? For example, I would like to use a dynamic name for datasets that are built every day. This would allow me to keep track of the dataset, and the date would be displayed in the path. I attached an example below.
Ideally - the output will be named ".../datasets/dataset_2023_03_09_HH_MM"

Dynamic naming for transform outputs is not possible.
The technical reason is that the inputs/outputs/transforms are fixed at CI time. When you press "commit" in Authoring or merge a PR, a CI job is kicked off. In this CI job, all the relations between inputs and outputs are determined, including the links between unique identifiers and dataset names. Output datasets that don't exist yet are created, and a "jobspec" is added to them. A "jobspec" is a snippet of JSON that describes to Foundry how a particular dataset is generated.
Anytime you press the "build" button on a dataset (or build the dataset through a schedule or similar), the jobspec is consulted (not updated). It contains a reference to the repository, revision, source file and entry point of the function that builds this dataset. From there the build is orchestrated and kicks off, invoking your function to produce the final output.
Therefore, if the output name changed with the date, the jobspec would no longer be valid by the time a build is triggered and an error would be raised: the link between the unique identifier and the dataset name would be broken.
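To make that concrete, here is a sketch (with hypothetical paths) of what dynamic naming would have to look like in a Python transform, and why it cannot work: the Output path is evaluated once during the CI job and frozen into the jobspec, not re-evaluated on every build.

from datetime import datetime
from transforms.api import transform_df, Input, Output

# The f-string below is evaluated once, at CI time, when the jobspec is
# created - NOT each time the build runs - so the output is permanently
# bound to whatever the timestamp was when the commit was checked.
@transform_df(
    Output(f"/Project/datasets/dataset_{datetime.now():%Y_%m_%d_%H_%M}"),
    source_df=Input("/Project/datasets/source"),
)
def compute(source_df):
    return source_df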
To address your need to run further analysis based on the date of the day, the best way to proceed is to:
add a new column to your dataset that contains the date on which the build ran
build your dataset incrementally, specifying snapshot_inputs=["your_dataset"] on the @incremental decorator, so that all existing rows from your input dataset are appended each day (see the sketch below)
perform your analysis by filtering on the date column
Please find here the complete documentation for Incremental transforms and Snapshot Input.
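A minimal sketch of that approach, assuming the standard Python transforms API; the dataset paths and names are placeholders:

from pyspark.sql import functions as F
from transforms.api import transform_df, incremental, Input, Output

# Reads the full input every run (snapshot input) and appends it to the
# output together with the date the build ran, so downstream analysis can
# simply filter on build_date instead of relying on the dataset path.
@incremental(snapshot_inputs=["source_df"])
@transform_df(
    Output("/Project/datasets/dataset_with_build_date"),
    source_df=Input("/Project/datasets/source"),
)
def compute(source_df):
    return source_df.withColumn("build_date", F.current_date())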

Related

Foundry Scenarios edited data materialized as dataset

Is it possible to materialize the edits made as part of a scenario into a dataset in foundry?
I want for each scenario to write out the primary keys of the objects edited as part of the scenario.
The motivation is that I need to run multiple processes to compute metrics as part of the changed values for each scenario, at a scale and runtime that is not possible to do with Functions.
Edit with details:
The thing is that I am not making actual edits to the objects of the object type; I don't want to apply the scenario.
I tested out the "Action Log" and it does not seem to pick up "uncommitted" actions, meaning actions that are only run as part of a scenario. Also, there does not seem to be a link to the scenario an action was part of, even when the changes were committed.
The workflow is that I have Object Type A, and I define multiple scenarios S on a subset of the objects in A.
Each scenario might make something like 50k edits to a subset of A, through multiple Actions backed by a Function.
I save some of the scenarios. Now I am able to load these scenarios and "apply" them on A again in Workshop.
However, I need to get all the primary keys and the edited values of A materialized into a dataset (for each scenario), because I need to run some transformation logic to compute a metric for the changes in each scenario (at a scale and execution time not possible in Functions).
The Action Log did not seem to help a lot for this. How do I get the "edits" as part of a saved scenario into a dataset?
The only logic you can run BEFORE applying a scenario is Functions.
Not sure about your exact logic, but Functions' custom aggregations can be very powerful: Docs here.
This might not directly let you calculate the diff, but you could use the scenario compare widgets in Workshop to compare your aggregation across multiple scenarios.
e.g. you have a function that sums(total profit)
Your Workshop could show:
Current Data:
$10k
Scenario A:
$5k
Scenario B:
$13k
instead of like:
Scenario A:
-$5k
Scenario B:
+$3k
Afaik there's no first class way of doing this (yet).
"Applying" a scenario basically means you're submitting the actions queued on the scenario to the ontology. So neither the actions nor the ontology are aware that they came from a scenario.
What I've been doing to achieve what you're working on is using the "Action Log". It's still in Beta, so you might need to ask for it to be enabled. It allows you to define, on each action, a "log" object to be created that can track the PKs of your edited objects per action.
How I do the action log is:
My Action log has the "timestamp" of the action when they were run.
My Scenarios have the "timestamp" of when it was applied.
Since "Applying a Scenario" means -> actually running all actions on Ontology (underlying data) this gets me this structure if I sort everything by timestamp:
Action 1
Action 2
Scenario A applied
Action 3
Action 4
Scenario B applied
This allows you to do a mapping later on: Actions 1 and 2 must come from Scenario A, and Actions 3 and 4 must come from Scenario B (see the sketch after this answer).
EDIT: Apparently you might be able to use the Scenario RID directly in the Action Logs too (which is a recent addition I haven't adopted yet)
This still won't allow you to compute anything (in transforms, etc.) BEFORE applying a scenario, though.
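A minimal sketch of that timestamp mapping in pandas, with made-up column names (action_ts, applied_ts, etc.); in practice the action log objects and the scenario metadata would first be materialized as datasets:

import pandas as pd

# Hypothetical data: one row per logged action and one row per applied scenario.
actions = pd.DataFrame({
    "action_id": [1, 2, 3, 4],
    "edited_pk": ["obj-a", "obj-b", "obj-c", "obj-d"],
    "action_ts": pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:05",
        "2024-01-02 10:00", "2024-01-02 10:30",
    ]),
})
scenarios = pd.DataFrame({
    "scenario": ["Scenario A", "Scenario B"],
    "applied_ts": pd.to_datetime(["2024-01-01 12:00", "2024-01-02 18:00"]),
})

# Attribute each action to the first scenario applied at or after it,
# mirroring the "sort everything by timestamp" reasoning above.
mapped = pd.merge_asof(
    actions.sort_values("action_ts"),
    scenarios.sort_values("applied_ts"),
    left_on="action_ts",
    right_on="applied_ts",
    direction="forward",
)
print(mapped[["action_id", "edited_pk", "scenario"]])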

Fact table reconciliation or verify

In a data warehouse project, how do I verify that the fact table loaded into the warehouse database through an SSIS ETL load is consistent with my staging table, so that I don't end up with incorrect reporting later?
Good question; people build different systems for this. To set expectations, this is one of the more complex check/reconciliation processes that developers build. Below are three ways to do this. I would recommend the first one because it is the easiest and most efficient.
You can -
Post-load reports: create reports that reconcile the data after the load. Write SQL to compare source and target data: compare counts, amounts, null values, daily data, etc. If the comparison raises a flag/alert, there was an issue in the load (see the sketch after this list).
Check as you go: create a reusable function or mapping that compares incoming source data and target data (counts, amounts, null values, daily data, etc.) and stores the results in a table. A script keeps checking those values and notifies the support team if there is any issue.
Pre-process check: before starting any ETL, check the source data (counts, null values, daily counts, missing files, etc.) to verify its state.
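A minimal post-load sketch in Python, with hypothetical connection details, table and column names (stg.Sales, dw.FactSales, amount, ...); the same comparisons could equally be run as plain SQL reports:

import pyodbc

# Hypothetical connection string; adapt to your warehouse.
conn = pyodbc.connect("DSN=warehouse;Trusted_Connection=yes")
cursor = conn.cursor()

checks = {
    "row_count": (
        "SELECT COUNT(*) FROM stg.Sales",
        "SELECT COUNT(*) FROM dw.FactSales",
    ),
    "total_amount": (
        "SELECT COALESCE(SUM(amount), 0) FROM stg.Sales",
        "SELECT COALESCE(SUM(amount), 0) FROM dw.FactSales",
    ),
    "null_customer_keys": (
        "SELECT COUNT(*) FROM stg.Sales WHERE customer_id IS NULL",
        "SELECT COUNT(*) FROM dw.FactSales WHERE customer_key IS NULL",
    ),
}

for name, (staging_sql, fact_sql) in checks.items():
    staging_value = cursor.execute(staging_sql).fetchone()[0]
    fact_value = cursor.execute(fact_sql).fetchone()[0]
    status = "OK" if staging_value == fact_value else "MISMATCH"
    # In a real setup you would write this to a reconciliation table
    # and alert the support team on any MISMATCH.
    print(f"{name}: staging={staging_value}, fact={fact_value} -> {status}")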

Business Objects Complex prompt - how best to set up, using 4.0?

We're trying to create a template date prompt to be used across multiple universes, and also be used against multiple date fields (for instance, Transaction Date, Invoice Date, etc)
The prompt should display a list of values like the below (there's about 30 total):
Date Range START_DATE END_DATE
-------------------- ------------------------------ --------------
D: Yesterday 12/02/2015 12/03/2015
M: Month Before Last 10/01/2015 10/31/2015
M: Month to Date 12/01/2015 12/02/2015
Our initial attempt at this (creating a derived table, and then some aliases against the derived table, with one alias for each date type such as Transaction Date, Invoice Date, etc.) was a failure: the SQL generated is wrong, and includes the SQL that is only supposed to provide the list of values. I think we need to use a different approach entirely.
Thanks for reading so far. I would greatly appreciate any ideas! Feel free to ask questions and I'll edit my notes to answer.
EDIT - we're using UNV (legacy Universe Design tool)
I'm going to assume you have an existing (dimension) table that contains a record for each date and the necessary columns to hold the different representations. You can also create a derived table for this.
Here are the steps to achieve what you described (sorry, no screenshots, this is off the top of my head):
Create the required dimension objects (based on your date table) in a separate class in the universe (you can hide this class at the end; the end user shouldn't see them).
Take one of the date dimension objects (e.g. Transaction Date, Invoice Date, …), enable the LOV option and edit it (which should bring up the query panel).
In the query panel, select all the dimension objects, created in step 1, that you want to show in your LOV. Important: the object holding the value to be returned, should be placed first in the query panel. Run the query (nothing will appear though).
Make sure that you enable the option to Export the LOV, otherwise your customisations will be lost upon exporting the universe. Optionally, enable the option to refresh the LOV each time the user calls it.
As you can't really define a single, reusable LOV in UDT that you can reference in different dimension objects, you'll have to perform this for each dimension object that you would want to have this LOV.
One way around this annoyance may be to define the customised LOV once, note down the generated LOV name (about 8 alphanumeric characters long) and then replace the LOV name in the other dimensions with that name. I can't guarantee that this will work, though.
In contrast: with IDT you can define a customised LOV like this once (either in the Data Foundation Layer or the Business Layer), and then reference it as much as you want.

SSIS 2008 determine source column from destination column (programatically)

Not sure if there's any way to do this, but we're trying to programmatically determine dependencies in our ETL process, specifically whether modifying a column in our source data set will impact our ETLs and, if so, which ones. For example, given a package 'myPackage' containing a data flow task that draws from 'sourceTable' (including various columns such as 'column1') and ultimately loads 'destinationTable' with 'column1New', is there any way to query the SSIS package itself to determine that column1New is based on column1? (Does lineage provide anything of use here?)
Each column you use in a transformation in your package is assigned an ID. The next component the column is passed down to refers to that column through its lineage ID property, while giving it a new ID of its own.
You could query the XML of your package to trace the path a column takes by creating a map of these IDs. However, this might be difficult to implement in a stable way.
This might help you on your way:
http://blogs.msdn.com/b/helloworld/archive/2008/08/01/how-to-find-out-which-column-caused-ssis-to-fail.aspx
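As a rough sketch of the idea: the .dtsx file is plain XML, so you can walk it and collect every pipeline column that carries both a name and a lineage ID. The exact element and attribute names vary between SSIS versions, so treat this as a starting point rather than a finished lineage tool.

import xml.etree.ElementTree as ET

# Hypothetical package path.
tree = ET.parse("myPackage.dtsx")
root = tree.getroot()

# Build a lineageId -> column name map from any element that carries both
# attributes, ignoring XML namespaces on the attribute names.
columns = {}
for elem in root.iter():
    attrs = {key.split("}")[-1]: value for key, value in elem.attrib.items()}
    if "lineageId" in attrs and "name" in attrs:
        columns[attrs["lineageId"]] = attrs["name"]

# Tracing column1New back to column1 would then mean following the lineageId
# references from component to component through this map.
for lineage_id, name in sorted(columns.items()):
    print(lineage_id, name)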

SSIS Look up - ignore certain records

I am doing an SSIS lookup transformation, looking up in a voyages table; however, some of my records don't have voyages, so I get errors. Is there any way I can skip the lookup on those records?
To expand on unclepaul84's answer, you can configure your lookup component to perform one of three actions on a failed lookup.
Fail Component (the default, and the behaviour you currently have: it fails the job step, and possibly the entire package, when a row has no match in the lookup).
Ignore Failure (doesn't fail your job step; leaves a null in the field you brought in from the lookup, i.e. the voyage name).
Redirect Row (doesn't fail your job step; lets you direct rows with no voyage to a different processing flow for handling, e.g. if you want to put a default 'No Voyages' message in your Voyage Name field).
Alternatively, as John Saunders mentioned in his comment, you could test the VoyageID column and split your data flow into two paths depending upon if the VoyageID column is null. Since the Lookup component can handle this, I prefer using the single lookup rather than a conditional split followed by a lookup on one of the paths.
You could tell the lookup component to ignore lookup failures.