Error while trying to run a MapReduce job on FIWARE-Cosmos using Tidoop REST API - fiware

I am following this guide on GitHub and I am not able to run the example MapReduce job mentioned in Step 5.
I am aware that this file no longer exists:
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
And I am aware that the same file can now be found here:
/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar
So I form my call as below:
curl -v -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/$user/jobs" -d '{"jar":"/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar","class_name":"WordCount","lib_jars":"/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar","input":"testdir","output":"testoutput"}' -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN"
The input directory exists in my hdfs user space and there is a file called testdata.txt inside it. The testoutput folder does not exist in my hdfs user space since I know it creates problems.
When I execute this curl command, the error I get is {"success":"false","error":1} which is not very descriptive. Is there something I am missing here?

I have just tested this with my user frb and a valid token for that user:
$ curl -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/frb/jobs" -d '{"jar":"/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar","class_name":"wordcount","lib_jars":"/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar","input":"testdir","output":"outputtest"}' -H "Content-Type: application/json" -H "X-Auth-Token: xxxxxxxxxxxxxxxxxxx"
{"success":"true","job_id": "job_1460639183882_0011"}
Please observe that the fat jar with the MapReduce examples in the "new" cluster (computing.cosmos.lab.fiware.org) is at /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar, as detailed in the documentation. /usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar was the fat jar in the "old" cluster (cosmos.lab.fiware.org).
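If you are unsure which fat jar exists on the cluster you are targeting, a quick sanity check is to list both candidate paths (a sketch, assuming you have SSH access to the computing node):
ssh $user@computing.cosmos.lab.fiware.org \
  'ls -l /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar \
         /usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar'
ls will print the jar that exists and report an error for the one that does not.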
EDIT 1
Finally, it turned out the user had no account in the "new" pair of Cosmos clusters in FIWARE Lab (storage.cosmos.lab.fiware.org and computing.cosmos.lab.fiware.org), where Tidoop runs, but only in the "old" cluster (cosmos.lab.fiware.org). Thus, the issue was fixed by simply provisioning an account in the "new" ones.

Related

How do I remove the latest transaction in a Foundry incremental build/transform?

I have an incremental dataset and would like to remove the last transaction. Below I attached a screenshot and added a border around the transaction I would like removed. I want to remove it while preserving the dataset's "incrementality".
Yes, it is possible to delete one or several transactions from an incrementally built dataset without breaking its incrementality.
The only way to delete a transaction is to use Foundry API calls. If you are not familiar with APIs, please find the guidelines here; we strongly recommend trying the instructions on a test dataset first until you are comfortable with the process.
The options available depend on your downstream datasets:
SCENARIO 1: Your downstream datasets are running incrementally
You can roll back your dataset to the latest good transaction by using the "updateBranch2" (branchesUpdate2) call of Foundry's Catalog API; please find additional information in this StackOverflow thread:
curl -X POST \
-H "Authorization: Bearer $TOKEN" \
-H "Content-Type: application/json" \
"https://$HOSTNAME/foundry-catalog/api/catalog/datasets/$DATASET_RID/branchesUpdate2/master" \
-d '"TRANSACTION_RID"'
The result is that your downstream datasets will continue to run incrementally.
SCENARIO 2: Your downstream datasets are NOT running incrementally
You can remove specific files.
The lifecycle of a transaction is as follows:
Start a new transaction setting the transaction type and the instructions of what you want the transaction to do
If you are not satisfied, you can abort the transaction. When you are happy with what it will do, you can commit the transaction (this is the point of no return)
Therefore, for deleting specific files, you will have to use the following steps:
Use create transaction with a transaction type of DELETE
curl -X POST \
-H "Content-type: application/json" \
-H "Authorization: Bearer $TOKEN" \
"https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions" \
-d '{"transactionType":"DELETE"}'
<DATASET_RID>: you can find the dataset RID in your URL.
ex. ri.foundry.main.dataset.c26f11c8-cdb3-4f44-9f5d-9816ea1c82da
Add files to the DELETE transaction by listing and opening the logical paths of the files to delete
You can get the filepaths from the dataset Details tab under Files
ex: spark/part-00000-d5e90287-22bd-4840-a6a0-6eb1d98d0af3-c000.snappy.parquet
curl -X POST \
-H "Content-type: application/json" \
-H "Authorization: Bearer $TOKEN" \
"https://$HOSTNAME/foundry-catalog/api/catalog/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/files/open/$FILEPATH"
<TRANSACTION_RID>: this was sent back in the response body of the first API call
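If you script these steps, the transaction RID can be captured directly from the first call (a sketch, assuming jq is available and that the create-transaction response exposes the RID in a rid field):
TRANSACTION_RID=$(curl -s -X POST \
  -H "Content-type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  "https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions" \
  -d '{"transactionType":"DELETE"}' | jq -r '.rid')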
Commit your transaction
curl -X POST \
-H "Content-type: application/json" \
-H "Authorization: Bearer $TOKEN" \
"https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/commit"
At any time, you can abortTransaction or get the files currently in your transaction with getFilesInTransactionPaged2.
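For example, aborting should look like the commit call with abort in place of commit (a sketch; the abort path is assumed parallel to the commit endpoint shown above):
curl -X POST \
  -H "Authorization: Bearer $TOKEN" \
  "https://$HOSTNAME/api/v1/datasets/$DATASET_RID/transactions/$TRANSACTION_RID/abort"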
Committing a DELETE transaction does not delete the underlying file from the backing file system—it simply removes the file reference from the dataset view.
DELETE transactions break incrementality. Therefore, if this dataset is used by downstream incremental datasets, this action will break the incrementality of their builds.

Export contents of an OpenShift image to a file

I've been searching for this for a while. I don't have access to the binary items used to build the image because an artifactory migration ruined the repo. There is one particularly precious binary I would love to extract from the image. I know docker save would save me, but I don't have access to docker, only to the oc client.
EDIT:
After looking around a little, I thought the docker-registry API should be the way to go. Debugging the oc client and the logs of the docker-registry pods, I found that both v1 and v2 API versions seem to be used.
Somehow I cannot get any further than the version check.
Getting the auth token and registry url from oc:
TOKEN=`oc whoami -t`
URL="https://"`oc -n default get route docker-registry -o jsonpath="{.status.ingress[0].host}"
Then getting a correct response to:
curl -k -X GET -H "Authorization: Bearer $TOKEN" "$URL/v2/"
...
HTTP/1.1 200 OK
but:
curl -k -X GET -H "Authorization: Bearer $TOKEN" "$URL/v2/_catalog"
...
HTTP/1.1 400 Bad Request
You can log in to the internal image registry, if it is exposed, and then pull the image back down to your local system and do what you want with it. Instructions for logging in can be found in:
http://cookbook.openshift.org/image-registry-and-image-streams/how-do-i-push-an-image-to-the-internal-image-registry.html
That talks about doing a push, but you want to do a pull.
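For example, once you are logged in to the exposed registry from a machine where docker is available, a sketch of pulling the image and copying the binary out of it (the registry host, project, image and file paths below are placeholders for your setup):
docker login -u $(oc whoami) -p $(oc whoami -t) $REGISTRY
docker pull $REGISTRY/myproject/myimage:latest
# Copy a single file out without running the image:
docker create --name tmp $REGISTRY/myproject/myimage:latest
docker cp tmp:/path/to/precious-binary ./precious-binary
docker rm tmp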

hadoop + ambari cluster change configuration

I want to upload the new blueprint.json file to my Ambari cluster as follows:
curl -u admin:admin -H "X-Requested-By: ambari" -X GET http://10.14.5.40:8080/api/v1/clusters/HDP6?format=blueprint -o /tmp/1-HDP6_blueprint.json
When I run it, everything seems to be OK because I do not get any warning or error,
but when I read the parameters in the Ambari GUI I see that the new blueprint.json has not affected the cluster with the new configuration.
How can I debug this, or get feedback from the curl command about what actually happened?
Please note that the curl command you used is for downloading the existing cluster configuration in blueprint format (it is an -X GET).
You will have to use curl -X POST to register and upload the new blueprint to Ambari.
curl --verbose -H "X-Requested-By: ambari" -X POST -u admin:admin http://10.14.5.40:8080/api/v1/blueprints/HDP6new?validate_topology=false --data "@./blueprint.json"
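To confirm the registration took effect, you can fetch the blueprint back by name with a GET (same credentials; the blueprint name as registered above):
curl -u admin:admin -H "X-Requested-By: ambari" \
  -X GET http://10.14.5.40:8080/api/v1/blueprints/HDP6new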
Also note that uploading a modified blueprint is not the correct way to change an existing cluster's configuration. You may refer to this document for modifying configurations.

Publish to Chrome Webstore using the items.Publish API and cURL

We are developing a Chrome Extension and, as part of the release build, we want to publish it to the Chrome Webstore for testing.
We are using cURL to send the http requests.
Using the information in:
https://developer.chrome.com/webstore/using_webstore_api
we have successfully updated the store, but I am seeing an odd error when trying to publish it using the information in
"Publishing an item to trusted testers" in the above link.
The command line looks like this as suggested in the link above:
curl -H "Authorization: Bearer %refresh_token%" -H "x-goog-api-version: 2" -H "Content-Length: 0" -H "publishTarget: trustedTesters" -X POST -v https://www.googleapis.com/chromewebstore/v1.1/items/%app_id%/publish
When I run this I get an error back stating that the publish condition is not met. The error message states that we should set publish_to_trusted_testers=true, but I can find no documentation suggesting how or where I should set this.
Note that access tokens are working OK, and the PUT command to upload the new extension is also successful.
Any advice would be gratefully accepted.
Jon
https://developer.chrome.com/webstore/webstore_api/items/publish#parameters
The docs on https://developer.chrome.com/webstore/using_webstore_api don't currently point to the correct use of the API, but the publish docs are correct.
I tried the URL query parameter and it succeeded:
curl \
-H "Authorization: Bearer $ACCESS_TOKEN" \
-H "x-goog-api-version: 2" \
-H "Content-Length: 0" \
-X POST \
-v \
"https://www.googleapis.com/chromewebstore/v1.1/items/$APP_ID/publish?publishTarget=trustedTesters"
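In a release script it can also help to capture the JSON response and check its status array, which the publish docs describe (a sketch, assuming jq is available):
RESPONSE=$(curl -s \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  -H "x-goog-api-version: 2" \
  -H "Content-Length: 0" \
  -X POST \
  "https://www.googleapis.com/chromewebstore/v1.1/items/$APP_ID/publish?publishTarget=trustedTesters")
echo "$RESPONSE" | jq -r '.status[]'   # expect "OK" on success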

Use cURL to add a JSON web page's data in Solr

I see from the UpdateJSON page how to use a command prompt to index a standalone file stored locally. Using this example I was able to successfully make a .json file accessible through Solr:
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @books.json -H 'Content-type:application/json'
What I'm not able to find is the proper syntax to do the same for a webpage containing JSON data. I've tried with the @:
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary @[URL] -H 'Content-type:application/json'
and without:
curl 'http://localhost:8983/solr/update/json?commit=true' --data-binary [URL] -H 'Content-type:application/json'
Both ways lead to errors. How do I configure a command to prompt Solr to index the contents at [URL]?
According to the documentation (https://wiki.apache.org/solr/ContentStream), you should first ensure remote streaming is enabled (in solrconfig.xml, search for enableRemoteStreaming).
Then the command should be of the kind:
curl 'http://localhost:8983/solr/update/json?commit=true&stream.url=YOURURL' -H 'Content-type:application/json'
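If you cannot (or prefer not to) enable remote streaming, an alternative sketch is to fetch the JSON yourself and post it as a local file, exactly like the books.json example above:
curl -s [URL] -o remote.json
curl 'http://localhost:8983/solr/update/json?commit=true' \
  --data-binary @remote.json -H 'Content-type:application/json'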