How are executors assigned in Code Workbooks in Palantir Foundry?

I have two code workbooks. If I run a computationally expensive PySpark transform in workbook A and then try to run something in workbook B, both jobs queue in perpetuity until the build in workbook A is stopped, at which point the one in workbook B runs immediately, as if it were waiting on the build in workbook A.
Are executors shared across all code workbooks for one user? What is going on?

For Foundry running on Palantir Cloud, executors are set by the Spark configuration settings and managed by Rubix. This guarantees execution time with lower variance than fixed resources in YARN (and adds Rubix security features such as containerization).
Because permissions in Foundry are set at the project level, if a user is running (in interactive mode) more than one code workbook in the same project with the same profile (the same set of libraries and Spark configurations), the SparkSession is shared between them to save on computational resources.
You can check the Spark session by running:
print(spark)
<pyspark.sql.session.SparkSession object at 0x7ffb605ef048>
If I have another workbook in the same project, I would get the same result:
print(spark)
<pyspark.sql.session.SparkSession object at 0x7ffb605ef048>
If I have another workbook in a different project using the same profile, I would get a different Spark session:
print(spark)
<pyspark.sql.session.SparkSession object at 0x7f45800df7f0>
If it's important for a workbook to run in a different SparkSession (and not share the executors), the user can make a slight modification to the packages in one of the workbooks, or create another pre-warmed Spark session profile (instead of the default one).
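One quick way to confirm whether two workbooks share a session is to compare their Spark application IDs (standard PySpark, not Foundry-specific):
# The same value in two workbooks means they share one SparkSession (and its executors).
print(spark.sparkContext.applicationId)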

Related

SSIS empty Excel columns causing error

Using Microsoft Visual Studio Community 2015.
Goal of project:
- create "*\temp\email" directory
- start program to extract all emails that include xls attachments to the previously created folder
- use a for each loop to cycle through each file in the folder, process it, and shift it to a SQL table.
The problem I am running into is caused either by a blank Excel document (which is occasionally sent from a remote location) or by some of the original xls reports containing only 5 columns instead of the 6 I have mapped now. Is there any way to separate files that include the correct columns from those that do not match?
As long as these two problems do not occur, I can run the SSIS package and everything runs without issue.
Control flow:
File System Task (creates directory) ---> Execute Process Task (xls extraction) --> ForEach Loop (Data Flow Task "email2Sql")
Data flow:
Excel Source (uses an expression to set ExcelFilePath from #user:filepath; DelayValidation == true)
(Columns are initially set to F1-F6 and are mapped to, for example, a, b, c, d, e, f. The older files that get mixed in only include a, b, c, d, e.) This is where I want to be able to separate the xls files.
Conditional Split transformation (column names are not in row 1; this helps remove "null" values)
OLE DB Destination (SQL table)
Sorry for the amount of reading, but for the first post I tried to include anything that I thought may be relevant.
There are some tools out there that would allow you to open the Excel doc and read it. However, I think the simplest thing to do would be to use SSIS out of the box:
1 - Add a File System Task after the data flow that reads the file.
2 - Make the precedence constraint from the data flow to the File System Task "Failure". This will cause it to fire only when the Data Flow Task fails.
3 - Set the File System Task to move the "bad" files to another folder.
This will allow you to loop through all the files and move the failed ones. Ultimately, the package will end in failure. If you don't want that behavior, you can change the ForceExecutionResult property to Success. However, it might be good to know that there were problems with some files so that they can be addressed.
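Alternatively, you could pre-screen the folder before the data flow ever touches it with a small script. A sketch in Python using pandas (the folder paths are placeholders standing in for the "*\temp\email" directory; the 6-column expectation comes from the question):
import glob
import os
import shutil
import pandas as pd

SRC = r"C:\temp\email"          # placeholder for the extract folder
BAD = os.path.join(SRC, "bad")  # hypothetical reject folder
os.makedirs(BAD, exist_ok=True)

for path in glob.glob(os.path.join(SRC, "*.xls")):
    try:
        df = pd.read_excel(path, header=None)  # reading .xls needs the xlrd engine
        # Keep only non-empty workbooks with the 6 mapped columns (F1-F6).
        ok = not df.empty and df.shape[1] == 6
    except Exception:
        ok = False  # unreadable/blank workbook counts as bad
    if not ok:
        shutil.move(path, BAD)
The ForEach Loop then only ever sees files that match the mapping, and the rejects sit in one place for review.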

SSIS package to upload Excel where user will be presented with a file picker window to choose the file to be uploaded

I have a requirement wherein I will be executing an SSIS package from a command button on an MS Access front end. The requirement is to present a file picker window that lets the user choose the file to be uploaded; once chosen, it should be inserted into a table in SQL Server.
Kindly let me know if this option can be made available in the package.
No, you cannot make this option available in an SSIS package.
The way I have handled this in the past is to write a .NET application that uses the file picker to let the user choose the file, stores the user's filename/path in a table, and launches a job that calls the SSIS package, which gets the filename/path from the table and then uploads that file.
I assume you can do the same with an MS Access front end, but I am not an Access expert.
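A sketch of that same pattern in Python instead of .NET (the DSN, table, and column names are placeholders):
import tkinter as tk
from tkinter import filedialog
import pyodbc

# Standard file picker dialog; the user chooses the Excel file to upload.
root = tk.Tk()
root.withdraw()  # hide the empty root window
path = filedialog.askopenfilename(filetypes=[("Excel files", "*.xls*")])

if path:
    # Store the chosen path in a table for the SSIS package to pick up.
    conn = pyodbc.connect("DSN=MyServer;UID=user;PWD=pass")  # placeholder connection
    conn.execute("INSERT INTO dbo.UploadQueue (FilePath) VALUES (?)", path)
    conn.commit()
    conn.close()
The SSIS package then reads the newest row from that table to find the file to load, exactly as described above.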

How can I update LibreOffice Calc cells in real-time from a MySQL database?

I'm looking to write a program to update cells in LibreOffice Calc in real-time (or at least at some fixed tick) with data pulled from a MySQL database. Ideally, when the values in the database are updated, the corresponding cells in the spreadsheet would be updated such that any formulas or calculations existing in Calc would continue to operate on the new values. So far, I have yet to find a way to dynamically and programmatically insert data in this manner. Is it possible?
The LibreOffice component Base is a database front-end that handles queries, forms, and reports. While by default it uses an embedded version of HyperSQL database to manage the tables, it comes with drivers for any number of other back-end programs, including MySQL.
I think the easiest way to approach this would be to create a Base file with your MySQL database as its back-end (note that Base will only be able to see tables and views from MySQL - it won't import queries, although you can save queries in the Base file if you want). Make sure to 'register' the Base file so the rest of LibreOffice can 'see' it. Once the file is registered, any open LibreOffice component can access the data from MySQL (the Base file can be closed).
Now you can import any tables or views (from the MySQL component) or queries (from the Base file) into Calc: [Tutorial] Using registered datasources in Calc
Refreshing the imported data can be done through an API call. Here is an example in StarBasic code:
Sub refresh_DBRanges
    Dim oDBRangesEnum
    Dim oNext
    ' Enumerate every database range (imported table/query/view) in this document.
    oDBRangesEnum = ThisComponent.DatabaseRanges.createEnumeration()
    While oDBRangesEnum.hasMoreElements()
        oNext = oDBRangesEnum.nextElement()
        oNext.refresh()   ' re-run the import so the cells show current MySQL data
    Wend
End Sub
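The same loop as a LibreOffice Python macro, if you prefer Python scripting (a direct translation of the Basic above; XSCRIPTCONTEXT is supplied by LibreOffice's Python script provider):
def refresh_db_ranges(*args):
    # Walk every database range in the document and refresh its import,
    # exactly as the Basic Sub above does.
    doc = XSCRIPTCONTEXT.getDocument()
    ranges = doc.DatabaseRanges.createEnumeration()
    while ranges.hasMoreElements():
        ranges.nextElement().refresh()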
Note that the second posting of the 'registered data sources' tutorial gives the API call to set the import ranges on a refresh timer.
Just a note that the Registered DataSources Tutorial is updated further down its page. It says the list of registered data sources can be accessed by pressing F4. That was true once but changed with version 5; it's now Ctrl+Shift+F4.

Triggering execution of an SSIS package when files arrive in a folder

I have a scenario in SSIS: a package that does a simple data movement from a flat file to a database.
I want to execute that package whenever a file arrives in a specific folder.
Step-By-Step using WMI Event Watcher Task
Create a WMI Connection Manager. Use Windows credentials when running locally (you must be an admin to access WMI event info), and enter credentials when running remotely (be sure to encrypt your packages!).
Add a new WMI Event Watcher Task. Configure the WQLQuerySource property with a WQL query to watch a specific folder for files.
WQL is SQL-like but slightly off. Here's the example I'm using to watch a folder:
SELECT * FROM __InstanceCreationEvent WITHIN 10
WHERE TargetInstance ISA "CIM_DirectoryContainsFile"
and TargetInstance.GroupComponent= "Win32_Directory.Name=\"c:\\\\WMIFileWatcher\""
Breaking down this query is out of scope, but note the directory name in the filter and the string escaping required to make it work.
Create a For Each Loop and attach it to the WMI Event Watcher Task. Give it a Foreach File Enumerator and point it at the folder you're watching.
In the Variable Mappings tab of the For Each Loop editor, assign the file name to a variable.
Use that variable name to perform actions on the file (for example, assign it to the ConnectionString property of a Flat File connection and use that connection in a Data Flow task) and then archive the file off somewhere else.
As configured, this package will run until a file has been added, process it, and then complete.
To make the package run in perpetuity, wrap those two tasks in a For Loop with the EvalExpression set to true == true.
You can also consider registering object events using PowerShell and kicking off your SSIS package when those events are triggered. This requires less continuous overhead than having your package constantly running, but it adds an extra dependency.
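As a sketch of that script-based route, here is the same idea in Python using the third-party watchdog library in place of PowerShell's object events (folder and package paths are placeholders):
import subprocess
import time
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

WATCH_DIR = r"C:\WMIFileWatcher"  # placeholder for the watched folder

class LaunchPackage(FileSystemEventHandler):
    def on_created(self, event):
        if not event.is_directory:
            # Kick off the SSIS package with dtexec when a file appears.
            subprocess.run(["dtexec", "/F", r"C:\packages\LoadFile.dtsx"])

observer = Observer()
observer.schedule(LaunchPackage(), WATCH_DIR, recursive=False)
observer.start()
try:
    while True:
        time.sleep(1)  # keep the watcher alive
finally:
    observer.stop()
    observer.join()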
The WMI solution is interesting, but the environment and setup requirements are a bit complex for my tastes. I prefer to resolve this using a ForEach Loop Container and an Execute SQL wait task, both inside a For Loop Container.
I configure the ForEach Loop Container to loop over the files in a directory, pointing it at the expected file name. The only task inside this Container is a Script Task that increments a Files_Found variable - this will only happen when files are found.
Following the ForEach Loop Container is an Execute SQL task to wait between checks, e.g. WAITFOR DELAY '00:05:00' for a 5 minute delay.
Both that ForEach Loop and Execute SQL task are contained in a For Loop, which initializes and tests the Files_Found variable.
This solution is a lot simpler to manage - it doesn't require any particular setup, environment, permissions, stored credentials or knowledge of WMI syntax.
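For comparison, the same poll-and-sleep pattern as a standalone Python script (the path and file pattern are placeholders):
import glob
import os
import time

WATCH_DIR = r"C:\incoming"     # placeholder for the watched folder
PATTERN = "expected_file.txt"  # placeholder for the expected file name

# Check for the file; if absent, sleep 5 minutes and check again,
# mirroring the ForEach-then-WAITFOR DELAY '00:05:00' loop above.
while not glob.glob(os.path.join(WATCH_DIR, PATTERN)):
    time.sleep(5 * 60)
# File found: continue with processing here.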

Execute SSIS task when a given list of files exists

I have a working SSIS task that executes once a month and runs through a given path, iterating through various XML files and inserting their contents into a SQL Server database.
At the last meeting with the area that determines the rules for that SSIS task, it was stated that the task can only run after all the expected files are present on the path. The files in question number 35. They are named TUPSTATUS_SXX.xml, where XX stands for a number from 01 to 35 (the 35 telecommunication sectors here in Brazil). Those files are generated by the telecom companies that operate each of those sectors.
So, my question is: how do I run an SSIS task only after all 35 files are present in our directory?
Instead of doing a busy wait with your SSIS process running continuously, why not set up a FileSystemWatcher, which triggers on file system conditions, and use it to invoke your SSIS package? You can implement this watcher in either a service (if it needs to run for a long time) or a simple application.
Another way is to have a simple application/script check the file count, invoked from the task scheduler.
You could use a Script Task that counts the files in the folder, and only continues further down the pipeline when 35 files exist. Although if the files are large and being transferred by FTP, the final file may exist but not have fully transferred at that point.
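A sketch of that count check in Python, with a size-stability test to guard against a file that exists but is still transferring (the folder and package paths are placeholders; the TUPSTATUS_S01-S35 names come from the question):
import os
import subprocess
import time

FOLDER = r"\\server\share\tupstatus"  # placeholder drop folder
NAMES = [f"TUPSTATUS_S{i:02d}.xml" for i in range(1, 36)]

def all_files_present():
    # All 35 files, TUPSTATUS_S01.xml .. TUPSTATUS_S35.xml, must exist.
    return all(os.path.exists(os.path.join(FOLDER, n)) for n in NAMES)

def files_stable(wait_seconds=30):
    # Require every file's size to be unchanged across two samples,
    # so a file still mid-FTP doesn't trigger the load prematurely.
    paths = [os.path.join(FOLDER, n) for n in NAMES]
    before = [os.path.getsize(p) for p in paths]
    time.sleep(wait_seconds)
    return before == [os.path.getsize(p) for p in paths]

if all_files_present() and files_stable():
    # Launch the package, e.g. via dtexec (path is a placeholder).
    subprocess.run(["dtexec", "/F", r"C:\packages\LoadTupstatus.dtsx"])
Run from the task scheduler, this covers both answers above: it counts the files and sidesteps the partially-transferred-file problem.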