Best way to modify downstream references to a code workbook dataset to point to the new code repository dataset created using helper? - palantir-foundry

When using the "Export to Code Repository Helper" tool in an existing code workbook, what is the most efficient way to modify downstream dependencies to point to the newly created Code Repository dataset?
We want to modify all downstream dependencies, not a subset.

The most efficient way is to replace the logic of the original source dataset without changing the dataset itself (i.e., keeping the dataset with its existing RID). You can do this by:
1. Removing the job spec of the dataset (dataset view -> Details -> Job Spec -> Edit -> Delete).
2. Setting the output of your code repository transform to be the existing dataset (a sketch is shown below).
This way you do not have to modify any downstream dependencies.
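For reference, step 2 in a Python transforms repository might look roughly like the sketch below. This assumes the transforms-python API; the dataset paths are placeholders for your actual source and for the existing workbook output dataset:

from transforms.api import transform_df, Input, Output

# Output must point at the *existing* dataset (same RID) that the code
# workbook used to build; Input is whatever the migrated logic reads from.
# Both paths below are placeholders.
@transform_df(
    Output("/Org/Project/datasets/existing_workbook_output"),
    source_df=Input("/Org/Project/datasets/source_dataset"),
)
def compute(source_df):
    # Re-implemented workbook logic goes here; returning the DataFrame
    # unchanged is only a stand-in.
    return source_df

Because the output path (and therefore the RID) is unchanged, downstream consumers keep working once the old job spec has been removed.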

Related

How to import excel/csv with "File Import" widget in Foundry's Slate?

Context:
For a data pipeline we need to ingest Excel spreadsheets (arriving via email) directly into Foundry. To avoid any manual handling errors, we'd like to build a small Slate app that simply uploads an Excel sheet and automatically appends it to an existing dataset (given the schema, headers, etc.).
Unfortunately, there is very little documentation on the "File Import" widget or on the API that gets called when dragging and dropping a file into a folder.
Idea: Is there a way of uploading a file with Slate? Could this file then be added to a dataset, similar to the prompt that opens when dropping it into a folder?
You actually don't have to build a Slate app to do this! Datasets that are made up of underlying .csv files support adding new files directly.
Note: all of the following is done from the dataset preview page.
For example, I created a dataset from 4 .csv files, and I can click the Import button in the top right to add more files (with the same schema or not, depending on whether you want to strictly adhere to your applied schema).
If you have already applied a schema, you can also simply import new files on top of the dataset, but the schemas of those files must exactly match the schema already present; otherwise your dataset will fail when it is read.

SSIS empty Excel columns causing error

Using Microsoft Visual Studio Community 2015.
Goal of project:
- create a "*\temp\email" directory
- start a program to extract all emails that include .xls attachments into the previously created folder
- use a ForEach loop to cycle through each file in the folder, process it, and load it into a SQL table.
The problem I am running into is caused either by a blank Excel document (which is occasionally sent from a remote location) or by some of the original .xls reports containing only 5 columns instead of the 6 I have mapped. Is there any way to separate files that include the correct columns from those that do not match?
** As long as these two problems do not occur, I can run the SSIS package and everything runs without issue.
Control flow:
File System Task (creates directory) --> Execute Process Task (xls extraction) --> ForEach Loop (Data Flow Task "email2Sql")
Data flow:
Excel Source (the ExcelFilePath property is set via an expression from the user filepath variable), DelayValidation == true
(Columns are initially set to F1-F6 and are mapped to, for example, a, b, c, d, e, f. The older files that get mixed in only include a, b, c, d, e.) This is where I want to be able to separate the .xls files.
Conditional Split transformation (column names are not in row 1; this helps remove "null" values)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but since this is my first post I tried to include anything I thought might be relevant.
There are some tools out there that would allow you to open the Excel doc and read it (see the sketch below). However, I think the simplest thing to do would be to use SSIS out of the box:
1 - Add a File System Task after the data flow that reads the file.
2 - Make the precedence constraint from the data flow to the File System Task "Failure". This will cause it to fire only when the Data Flow Task fails.
3 - Set the File System Task to move the "bad" files to another folder.
This will allow you to loop through all the files and move the failed ones. Ultimately, the package will end in failure. If you don't want that behavior you can change the ForceExecutionResult property to Success. However, it might be good to know that there were problems with some files so that they can be addressed.
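If you did want to pre-screen the files (the "open the Excel doc and read it" option mentioned above), a small script outside of SSIS could sort them before the package runs. A minimal sketch, assuming Python with the xlrd package (which reads legacy .xls files) and placeholder folder paths:

import os
import shutil
import xlrd

SOURCE_DIR = r"C:\temp\email"       # placeholder paths
BAD_DIR = r"C:\temp\email_bad"
EXPECTED_COLUMNS = 6                # the 6 columns mapped in the data flow

for name in os.listdir(SOURCE_DIR):
    if not name.lower().endswith(".xls"):
        continue
    path = os.path.join(SOURCE_DIR, name)
    sheet = xlrd.open_workbook(path).sheet_by_index(0)
    # Blank workbooks or reports with fewer columns get moved aside.
    if sheet.nrows == 0 or sheet.ncols < EXPECTED_COLUMNS:
        shutil.move(path, os.path.join(BAD_DIR, name))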

How to do File System Task in SSIS depending on Result of Data Flow

I'm writing (what I thought to be) a simple SSIS package to import data from a CSV file into a SQL table.
On the Control Flow tab I have a Data Flow Task. In that Data Flow Task I have
a Flat File Source "step",
followed by a Data Conversion "step",
followed by an OLE DB Destination "step".
What I want to do is to move the source CSV file to a "Completed" folder or to a "Failed" folder based on the results of the Data Flow Task.
I see that I can't add a File System step inside the Data Flow Task, but I have to do it in the Control Flow tab.
My question is: how do I do a simple thing like assigning a value to a variable (I saw how to create variables and assign them values in the bottom pane of Data Tools 2012) depending on whether the "step" succeeds or fails?
Thanks!
(You can tell by my question that I'm an SSIS rookie - and don't assume I can write a C# script, please)
I have used VB or C# scripts to accomplish this myself. Since you do not want to use scripts, I would recommend using different precedence paths in the control flow: have your success path lead to moving the file to "Completed" and your failure path lead to moving the file to "Failed". This keeps it simple and accomplishes what you are looking for.

Temporarily disable MS Access data macros

I have several Access files with data from a group of users that I'm importing into one master file. The tables in the user files are each configured with a Before Change data macro that adds a timestamp each time the user edits the data.
("Data macros" are similar to triggers in SQL Server. They are different from UI macros. For more info, see this page.)
I'd like to import these timestamps into the master file, but since the master file is a clone of the user files, it also contains the same set of data macros. Thus, when I import the data, the timestamps get changed to the time of the import, which is unhelpful.
The only way I can find to edit data macros is by opening each table in Design View and then using the Ribbon to change the settings. There must be an easier way.
I'm using VBA code to perform the merge, and I'm wondering if I can also use it to temporarily disable the data macro feature until the merge has been completed. If there is another way to turn the data macros off for all files/tables at once, even on the users' files/tables, I'd be open to that too.
Disable the code? No. Bypass the code? Yes.
Use a table/field as a flag. Set the status before importing. Check the status of this flag in your event code, and decide if you want to skip the rest of the code. I.e.
If [tblSkipFlag].[SkipFlag] = false
{rest of data macros}
EndIf
Another answer here explains how you can use the (almost-)undocumented SaveAsText and LoadFromText methods with the acTableDataMacro argument to save and retrieve the Data Macros to a text file in XML format. If you were to save the Data Macro XML text for each table, replace ...
<DataMacro Event="BeforeChange"><Statements>
... with ...
<DataMacro Event="BeforeChange"><Statements><Action Name="StopMacro"/>
... and then write the updated macros back to the table, that would presumably have the effect of "short-circuiting" those macros.
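As a rough sketch of that approach driven from outside Access VBA (here via Python and COM automation, assuming pywin32 is installed, that SaveAsText/LoadFromText accept the acTableDataMacro argument as described, and placeholder paths and table names):

import win32com.client

AC_TABLE_DATA_MACRO = 12  # acTableDataMacro from the AcObjectType enum

def stop_before_change_macro(db_path, table_name, xml_path):
    # Export the table's data macros to XML, insert a StopMacro action at
    # the start of the BeforeChange event, and load the edited XML back in.
    app = win32com.client.Dispatch("Access.Application")
    app.OpenCurrentDatabase(db_path)
    try:
        app.SaveAsText(AC_TABLE_DATA_MACRO, table_name, xml_path)
        # The exported file's encoding can vary between Access versions;
        # adjust the encoding below if the text looks garbled.
        with open(xml_path, "r", encoding="utf-8", errors="replace") as f:
            xml = f.read()
        xml = xml.replace(
            '<DataMacro Event="BeforeChange"><Statements>',
            '<DataMacro Event="BeforeChange"><Statements>'
            '<Action Name="StopMacro"/>')
        with open(xml_path, "w", encoding="utf-8") as f:
            f.write(xml)
        app.LoadFromText(AC_TABLE_DATA_MACRO, table_name, xml_path)
    finally:
        app.CloseCurrentDatabase()
        app.Quit()

Keeping a copy of the original exported XML makes it easy to restore the macros once the merge is done.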

How do I load Excel files selectively?

I have an SSIS package that needs to look up two different types of Excel files, type A and type B, and load the data within into two different staging tables, tableA and tableB. The formats of these Excel sheets are different and match their respective tables.
I have thought of putting typeA.xls and typeB.xls in two different folders for simplicity (folder paths to be configurable). The required Excel files will then be put there by some other application or manually.
What I want is for my .dtsx package to scan the folder, pick the latest unprocessed file and load it while ignoring the others, and then suffix the file name with '-loaded' (typeAxxxxxx-loaded.xls). The '-loaded' in the filename is how I plan to differentiate between already-loaded files and the ones yet to be loaded.
I need advice on:
a) How to check the configured folder for the latest file (i.e., one without '-loaded' in the filename) and load it, and then, after loading, rename that same file in the configured folder with '-loaded' appended?
b) Is this the best approach, or is there a better way?
Thanks.
You can do it this way, but it might require several complex string expressions.
E.g., create a ForEach Loop over the .xls files; inside the loop add an empty Script Task, then a Data Flow Task to load the file. Connect them with a precedence constraint and make it conditional: the precedence constraint expression will check that the file name does not end with -loaded.xls. You can do this check either in the Script Task or purely with an SSIS expression on the precedence constraint. Finally, add a File System Task to rename the file; you may need to build the new file name with another expression.
It might be easier to create two folders: Incoming for new unprocessed files, and Loaded for the files you've processed, and simply move the .xls file to the Loaded folder after processing, without renaming it. This avoids the first conditional expression (and the dummy Script Task) and simplifies the configuration of the File System Task.
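Just to illustrate the selection-and-rename logic on its own (outside SSIS), a minimal sketch in Python with a placeholder folder path:

import glob
import os

FOLDER = r"C:\data\typeA"   # placeholder; one folder per file type

# Unprocessed files are the ones not yet suffixed with -loaded.
candidates = [f for f in glob.glob(os.path.join(FOLDER, "*.xls"))
              if not f.lower().endswith("-loaded.xls")]
if candidates:
    latest = max(candidates, key=os.path.getmtime)   # newest unprocessed file
    # ... load `latest` into the staging table here ...
    root, ext = os.path.splitext(latest)
    os.rename(latest, root + "-loaded" + ext)        # mark it as processed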
You can get the SQL File Watcher Task and add it to your SSIS toolbox. I think this is a cleaner way to do what you want.
SQL File Watcher