How do I remove the latest transaction in a Foundry incremental build/transform? - palantir-foundry

I have an incremental dataset and would like to remove its latest transaction. Below I attached a screenshot with a border around the transaction I'd like removed. I want to remove it while preserving the dataset's "incrementality".

Yes, it is possible to delete one or several transactions from an incrementally built dataset without breaking its incrementality.
The only way to delete a transaction is through Foundry API calls. If you are not familiar with APIs, we strongly recommend trying these instructions on a test dataset first until you are comfortable with the process.
The options available depend on your downstream datasets:
SCENARIO 1: Your downstream datasets are running incrementally
You can roll your dataset back to the last good transaction by using the Catalog API's "updateBranch2" (branchesUpdate2) endpoint:
curl -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
"https://$HOSTNAME/foundry-catalog/api/catalog/datasets/$DATASET_RID/branchesUpdate2/master" \
-d '"TRANSACTION_RID"'
The result is that your downstream datasets will continue to run incrementally.
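Putting Scenario 1 together, a minimal sketch might look like the following. The hostname, dataset RID, and transaction RID are placeholders (anything named `replace-me` or filled with zeros is mine, not from Foundry), and the curl call only fires once a real token is substituted:

```shell
#!/bin/sh
# Scenario 1 sketch: roll the master branch back to a chosen transaction.
# All values below are placeholders -- substitute your own before running.
TOKEN="replace-me"
HOSTNAME="your-stack.palantirfoundry.com"
DATASET_RID="ri.foundry.main.dataset.00000000-0000-0000-0000-000000000000"
KEEP_TRANSACTION_RID="ri.foundry.main.transaction.00000000-0000-0000-0000-000000000000"

URL="https://$HOSTNAME/foundry-catalog/api/catalog/datasets/$DATASET_RID/branchesUpdate2/master"

# Guard: only call the API once real values are filled in.
if [ "$TOKEN" != "replace-me" ]; then
  # The request body is the JSON-quoted RID of the transaction to keep;
  # transactions committed after it are dropped from the branch's view.
  curl -X POST \
    -H "Authorization: Bearer $TOKEN" \
    -H "Content-Type: application/json" \
    "$URL" \
    -d "\"$KEEP_TRANSACTION_RID\""
fi
```

Try this on a test dataset first, as recommended above, before touching a production dataset.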
SCENARIO 2: If your downstream datasets are NOT running incrementally
You can remove specific files.
The lifecycle of a transaction is as follows:
Start a new transaction, setting the transaction type and the instructions for what you want the transaction to do
If you are not satisfied, you can abort the transaction. When you are happy with what it will do, you can commit the transaction (this is the point of no return)
Therefore, for deleting specific files, you will have to use the following steps:
Use create transaction with a transaction type of DELETE
curl -X POST \
-H "Content-type: application/json" \
-H "Authorization: Bearer $TOKEN" \
"https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions" \
-d '{"transactionType":"DELETE"}'
<DATASET_RID>: you can find the dataset RID in your browser's URL,
e.g. ri.foundry.main.dataset.c26f11c8-cdb3-4f44-9f5d-9816ea1c82da
Add files to the DELETE transaction by opening the logical paths of the files to delete
You can get the filepaths from the dataset Details tab under Files
ex: spark/part-00000-d5e90287-22bd-4840-a6a0-6eb1d98d0af3-c000.snappy.parquet
curl -X POST \
-H "Content-type: application/json" \
-H "Authorization: Bearer $TOKEN" \
"https://$HOSTNAME/foundry-catalog/api/catalog/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/files/open/$FILEPATH"
<TRANSACTION_RID>: this was returned in the response body of the first API call
Commit your transaction
curl -X POST \
-H "Content-type: application/json" \
-H "Authorization: Bearer $TOKEN" \
"https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/commit"
At any time, you can abortTransaction or get the files currently in your transaction with getFilesInTransactionPaged2.
Committing a DELETE transaction does not delete the underlying file from the backing file system—it simply removes the file reference from the dataset view.
DELETE transactions break incrementality. Therefore, if this dataset feeds downstream incremental datasets, this action will break the incrementality of their builds.
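The three steps above can be strung together in one sketch. The endpoint paths come from the commands above, but the assumption that the transaction RID appears as the `rid` field of the create-transaction response body is mine (jq is used to extract it), so verify it on a test dataset first:

```shell
#!/bin/sh
# Scenario 2 sketch: delete one file from a dataset via a DELETE transaction.
# All values are placeholders; the ".rid" response field is an assumption.
TOKEN="replace-me"
HOSTNAME="your-stack.palantirfoundry.com"
DATASET_RID="ri.foundry.main.dataset.00000000-0000-0000-0000-000000000000"
FILEPATH="spark/part-00000-d5e90287-22bd-4840-a6a0-6eb1d98d0af3-c000.snappy.parquet"

CREATE_URL="https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions"

# Guard: only call the API once real values are filled in.
if [ "$TOKEN" != "replace-me" ]; then
  # 1. Open a DELETE transaction and capture its RID from the response body.
  TRANSACTION_RID=$(curl -s -X POST \
    -H "Content-type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    "$CREATE_URL" \
    -d '{"transactionType":"DELETE"}' | jq -r '.rid')

  # 2. Mark the file for deletion inside that transaction.
  curl -X POST \
    -H "Content-type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    "https://$HOSTNAME/foundry-catalog/api/catalog/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/files/open/$FILEPATH"

  # 3. Commit -- the point of no return.
  curl -X POST \
    -H "Content-type: application/json" \
    -H "Authorization: Bearer $TOKEN" \
    "https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/commit"
fi
```

If anything looks wrong after step 2, abort the transaction instead of committing in step 3.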

Related

How to retrieve all the executions of a workflow by multiple statuses through the GitHub API?

According to the API documentation, to retrieve all executions of a workflow with a given status we can use this command:
curl \
-H "Accept: application/vnd.github+json" \
-H "Authorization: token <TOKEN>" \
https://api.github.com/repos/OWNER/REPO/actions/workflows/WORKFLOW_ID/runs?status=in_progress
... But is there a way to retrieve all the executions of a workflow filtered by multiple statuses with a single command, instead of launching one command per status?
I tried this command where there are multiple status parameters in the query string:
curl \
-H "Accept: application/vnd.github+json" \
-H "Authorization: token <TOKEN>" \
"https://api.github.com/repos/OWNER/REPO/actions/workflows/WORKFLOW_ID/runs?status=in_progress&status=queued&status=requested"
This doesn't return the expected result: the endpoint only honors a single status value (and note the URL must be quoted, since an unquoted & would background the command). So with this REST method you have to make a separate call for each status you need to retrieve.
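A common workaround (a sketch of mine, not something from the GitHub docs) is to loop over the statuses, make one call each, and merge the pages locally with jq; the token, repo, and workflow ID below are placeholders:

```shell
#!/bin/sh
# One request per status, merged client-side -- the runs endpoint only
# honors a single "status" value per call. Placeholders throughout.
TOKEN="replace-me"
REPO="OWNER/REPO"
WORKFLOW_ID="WORKFLOW_ID"
STATUSES="in_progress queued requested"

# Guard: only call the API once a real token is filled in.
if [ "$TOKEN" != "replace-me" ]; then
  for status in $STATUSES; do
    # Quote the URL so the shell does not treat "?" or "&" specially.
    curl -s \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: token $TOKEN" \
      "https://api.github.com/repos/$REPO/actions/workflows/$WORKFLOW_ID/runs?status=$status"
  done | jq -s '{workflow_runs: map(.workflow_runs) | add}'
fi
```

The `jq -s` slurps the per-status responses and concatenates their `workflow_runs` arrays into one object.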

How to update JUnit test results into Xray using Cypress (JavaScript)

I was looking at the documentation of the Xray plugin : https://docs.getxray.app/display/XRAY/Import+Execution+Results+-+REST#ImportExecutionResultsREST-JUnitXMLresultsMultipart
And what I found is a bit confusing, after a few attempts. If I'm trying to import executions using the multipart endpoint,
their examples show no way to send the test execution key for it to be updated. Which is strange, because by importing without multipart, I can set it.
Below is the curl command I am using to upload the test results:
curl -H "Content-Type: text/xml" -X POST -H "Authorization: Bearer $token" --data @"results/test-results.xml" "https://xray.cloud.xpand-it.com/api/v1/import/execution/junit?projectKey=####"
Anyone has any idea how to achieve this?

IBM Watson Tone Analyzer Invalid JSON Error

I must be missing something very simple here. I am following the Example Tutorial Instructions. I already created a free account and I have my API key and the URL. I copied the JSON file as instructed. Here is the command I issued:
curl -X POST -u "apikey:MY-API-KEY" \
--header "Content-Type: application/json" \
--data-binary PATH-TO-FILE \
"MY-URL"
Where MY-API-KEY equals my personal key specified on my Manage page.
Where PATH-TO-FILE equals the path to my local copy of tone.json
Where MY-URL equals the url specified on my Manage page.
Here is the error I am getting:
{"code":400,"sub_code":"C00012","error":"Invalid JSON input at line 1, column 2"}
I copied the JSON exactly from the directions:
{
"text": "Team, I know that times are tough! Product sales have been disappointing for the past three quarters. We have a competitive product, but we need to do a better job of selling it!"
}
I also attempted the following JSON and it received the same error:
{"text":"Hello world"}
What obvious thing am I missing here?
I knew it was going to be something silly.
The directions have this as an example:
curl -X POST -u "apikey:{apikey}" \
--header "Content-Type: application/json" \
--data-binary @{path_to_file}tone.json \
"{url}/v3/tone?version=2017-09-21"
For the path to the file, I had to keep the @ symbol in front. So, assuming the full path to the file is /home/joe/Desktop/tone.json, that line has to be:
--data-binary @/home/joe/Desktop/tone.json \
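For reference, the full corrected command might look like this sketch. The leading @ is curl's standard prefix for reading the request body from a file; the API key and service URL below are placeholders, and the call only fires once real values are substituted:

```shell
#!/bin/sh
# Corrected Tone Analyzer call, assuming tone.json sits in the current
# directory. APIKEY and URL are placeholders -- use the values from your
# IBM Cloud Manage page.
APIKEY="replace-me"
URL="https://api.us-south.tone-analyzer.watson.cloud.ibm.com/instances/REPLACE-ME"

# Guard: only call the service once real credentials are filled in.
if [ "$APIKEY" != "replace-me" ]; then
  curl -X POST -u "apikey:$APIKEY" \
    --header "Content-Type: application/json" \
    --data-binary @tone.json \
    "$URL/v3/tone?version=2017-09-21"
fi
```

Without the @, curl sends the literal path string as the request body, which is exactly what produces an "Invalid JSON input at line 1, column 2" error.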

Error while trying to run a MapReduce job on FIWARE-Cosmos using Tidoop REST API

I am following this guide on GitHub and I am not able to run the example MapReduce job mentioned in Step 5.
I am aware that this file no longer exists:
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
And I am aware that the same file can now be found here:
/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar
So I form my call as below:
curl -v -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/$user/jobs" -d '{"jar":"/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar","class_name":"WordCount","lib_jars":"/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar","input":"testdir","output":"testoutput"}' -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN"
The input directory exists in my hdfs user space and there is a file called testdata.txt inside it. The testoutput folder does not exist in my hdfs user space since I know it creates problems.
When I execute this curl command, the error I get is {"success":"false","error":1} which is not very descriptive. Is there something I am missing here?
This has just been tested with my user frb and a valid token for that user:
$ curl -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/frb/jobs" -d '{"jar":"/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar","class_name":"wordcount","lib_jars":"/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar","input":"testdir","output":"outputtest"}' -H "Content-Type: application/json" -H "X-Auth-Token: xxxxxxxxxxxxxxxxxxx"
{"success":"true","job_id": "job_1460639183882_0011"}
Please observe the fat jar with the MapReduce examples in the "new" cluster (computing.cosmos.lab.fiware.org) is at /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar, as detailed in the documentation. /usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar was the fat jar in the "old" cluster (cosmos.lab.fiware.org).
EDIT 1
Finally, it turned out the user had no account in the "new" pair of Cosmos clusters in FIWARE Lab (storage.cosmos.lab.fiware.org and computing.cosmos.lab.fiware.org), where Tidoop runs, but only in the "old" cluster (cosmos.lab.fiware.org). Thus, the issue was fixed by simply provisioning an account in the "new" ones.

Restrict shared links to collaborators only through box API

I am trying to setup a pre-configured folder for users in the Enterprise where the share options are limited to collaborators only.
This feature is available in the web interface in the folder properties form under the security tab: "Restrict shared links to collaborators only"
The Box Content API (v2) allows for the creation and modification of shared links, and this works as expected; but it is not clear whether/how we can restrict the shared link options.
The API docs for folder update (developers.box.com/docs/#folders-update-information-about-a-folder) appear to indicate that there is an access attribute on the folder in addition to the shared_link attribute:
access: Can be open or collaborators. Type: object
I am not sure what the object value would be if not the "collaborators" String.
I have tried:
curl https://api.box.com/2.0/folders/FOLDER_ID \
-H "Authorization: Bearer ACCESS_TOKEN" \
-H "As-User: USER_ID" \
-d '{"access": "collaborators"}' -X PUT
and
curl https://api.box.com/2.0/folders/FOLDER_ID \
-H "Authorization: Bearer ACCESS_TOKEN" \
-H "As-User: USER_ID" \
-d '{"access": {"access": "collaborators"}}' -X PUT
both return a status 200, though they do not appear to do anything.
The access field is actually a sub-field of the shared_link field, which is why it's slightly indented in the documentation (this is kind of hard to see). If you want to create a shared link to a folder and restrict the access to collaborators, you can do so with a request like:
curl https://api.box.com/2.0/folders/FOLDER_ID \
-H "Authorization: Bearer ACCESS_TOKEN" \
-H "As-User: USER_ID" \
-d '{"shared_link": {"access": "collaborators"}}' -X PUT