Ambari cluster + service auto-start configuration via API (JSON)

Ambari services can be configured to start automatically on system boot. Each service can be configured to start all of its components, only masters and workers, or a custom selection.
So how can I enable all services in an Ambari cluster to start automatically on system boot via the API?
Note: by default, auto-start is disabled for all services.

You can use the auto-restart (recovery) API; refer to the following document: https://cwiki.apache.org/confluence/display/AMBARI/Recovery%3A+auto+start+components
Syntax. The syntax of the API is as follows:
curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://<ambari host>:<ambari port>/api/v1/clusters/<cluster_name>/components?ServiceComponentInfo/component_name.in(<component name>)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'
Example. To enable auto-restart for the App Timeline Server component of the YARN service, use a curl command as follows:
curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT 'http://localhost:8080/api/v1/clusters/HDPCL/components?ServiceComponentInfo/component_name.in(APP_TIMELINE_SERVER)' -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'
NOTE: You can find the list of components at http://<ambari host>:<ambari port>/api/v1/clusters/<cluster_name>/components
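To enable auto-start for every component in the cluster at once, you can loop over that component list. Below is a minimal sketch, assuming admin credentials, a cluster named HDPCL, Ambari on localhost:8080, and jq available for parsing the JSON; adjust these to your environment.
AMBARI="http://localhost:8080"
CLUSTER="HDPCL"
# list all component names, then enable recovery for each one
for COMP in $(curl -s -u admin:<password> "$AMBARI/api/v1/clusters/$CLUSTER/components" | jq -r '.items[].ServiceComponentInfo.component_name'); do
  curl -u admin:<password> -H "X-Requested-By: ambari" -X PUT \
    "$AMBARI/api/v1/clusters/$CLUSTER/components?ServiceComponentInfo/component_name.in($COMP)" \
    -d '{"ServiceComponentInfo" : {"recovery_enabled":"true"}}'
done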


How to make GCE instance stop when its deployed container finishes?

I have a Docker container that performs a single large computation. This computation requires lots of memory and takes about 12 hours to run.
I can create a Google Compute Engine VM of the appropriate size and use the "Deploy a container image to this VM instance" option to run this job perfectly. However, once the job is finished, the container quits but the VM keeps running (and charging).
How can I make the VM exit/stop/delete when the container exits?
When the VM is in this zombie state, only the Stackdriver containers are left running:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
bfa2feb03180 gcr.io/stackdriver-agents/stackdriver-logging-agent:0.2-1.5.33-1-1 "/entrypoint.sh /u..." 17 hours ago Up 17 hours stackdriver-logging-agent
161439a487c2 gcr.io/stackdriver-agents/stackdriver-metadata-agent:0.2-0.0.17-2 "/bin/sh -c /opt/s..." 17 hours ago Up 17 hours 8000/tcp stackdriver-metadata-agent
I create the VM like this:
gcloud beta compute --project=abc instances create-with-container vm-name \
--zone=us-central1-c --machine-type=custom-1-65536-ext \
--network=default --network-tier=PREMIUM --metadata=google-logging-enabled=true \
--maintenance-policy=MIGRATE \
--service-account=xyz \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--image=cos-stable-69-10895-71-0 --image-project=cos-cloud --boot-disk-size=10GB \
--boot-disk-type=pd-standard --boot-disk-device-name=vm-name \
--container-image=gcr.io/abc/my-image --container-restart-policy=on-failure \
--container-command=python3 \
--container-arg="a" --container-arg="b" --container-arg="c" \
--labels=container-vm=cos-stable-69-10895-71-0
When you create the VM, you'll need to give it write access to compute so you can delete the instance from within. You should also set container environment variables like gce_zone and gce_project_id at this time. You'll need them to delete the instance.
gcloud beta compute instances create-with-container {NAME} \
--container-env=gce_zone={ZONE},gce_project_id={PROJECT_ID} \
--service-account={SERVICE_ACCOUNT} \
--scopes=https://www.googleapis.com/auth/compute,...
...
Then, within the container, whenever you determine your task is finished:
Request an API token (I'm using curl for simplicity and the default GCE service account):
curl "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google"
This will respond with JSON that looks like:
{
  "access_token": "foobarbaz...",
  "expires_in": 1234,
  "token_type": "Bearer"
}
Take that access token and call the instances.delete API endpoint (notice the environment variables):
curl -XDELETE -H 'Authorization: Bearer {TOKEN}' https://www.googleapis.com/compute/v1/projects/$gce_project_id/zones/$gce_zone/instances/$HOSTNAME
Having grappled with the problem for some time, here's a full solution that works pretty well.
This solution doesn't use the "start machine with a container image" option. Instead it uses a startup script, which is more flexible. You still use a Container-Optimized OS instance.
Create a startup script:
#!/usr/bin/env bash
# get image name and container parameters from the metadata
IMAGE_NAME=$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/image_name -H "Metadata-Flavor: Google")
CONTAINER_PARAM=$(curl http://metadata.google.internal/computeMetadata/v1/instance/attributes/container_param -H "Metadata-Flavor: Google")
# This is needed if you are using a private image in GCP Container Registry
# (possibly also for the gcp log driver?)
sudo HOME=/home/root /usr/bin/docker-credential-gcr configure-docker
# Run! The logs will go to Stackdriver
sudo HOME=/home/root docker run --log-driver=gcplogs ${IMAGE_NAME} ${CONTAINER_PARAM}
# Get the zone
zoneMetadata=$(curl "http://metadata.google.internal/computeMetadata/v1/instance/zone" -H "Metadata-Flavor:Google")
# Split on / and get the 4th element to get the actual zone name
IFS=$'/'
zoneMetadataSplit=($zoneMetadata)
ZONE="${zoneMetadataSplit[3]}"
# Run compute delete on the current instance. Need to run in a container
# because COS machines don't come with gcloud installed
docker run --entrypoint "gcloud" google/cloud-sdk:alpine compute instances delete ${HOSTNAME} --delete-disks=all --zone=${ZONE}
Put the script somewhere public. For example, put it on Cloud Storage and create a public URL. You can't use a gs:// URI for a COS startup script.
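For instance, here is a hedged example of publishing the script with gsutil; the bucket and file names are placeholders, and you may prefer a narrower ACL than public-read:
# upload the startup script and make it publicly readable
gsutil cp startup.sh gs://my-bucket/startup.sh
gsutil acl ch -u AllUsers:R gs://my-bucket/startup.sh
# the resulting public URL is https://storage.googleapis.com/my-bucket/startup.sh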
Start an instance using a startup-script-url, and passing the image name and parameters, e.g.:
gcloud compute --project=PROJECT_NAME instances create INSTANCE_NAME \
--zone=ZONE --machine-type=TYPE \
--metadata=image_name=IMAGE_NAME,\
container_param="PARAM1 PARAM2 PARAM3",\
startup-script-url=PUBLIC_SCRIPT_URL \
--maintenance-policy=MIGRATE --service-account=SERVICE_ACCOUNT \
--scopes=https://www.googleapis.com/auth/cloud-platform --image-family=cos-stable \
--image-project=cos-cloud --boot-disk-size=10GB --boot-disk-device-name=DISK_NAME
(You probably want to limit the scopes; the example uses full access for simplicity.)
I wrote a self-contained Python function based on Vincent's answer.
def kill_vm():
    """
    If we are running inside a GCE VM, kill it.
    """
    # based on https://stackoverflow.com/q/52748332/321772
    import json
    import logging

    import requests

    # get the token
    r = json.loads(
        requests.get("http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token",
                     headers={"Metadata-Flavor": "Google"})
        .text)
    token = r["access_token"]

    # get instance metadata
    # based on https://cloud.google.com/compute/docs/storing-retrieving-metadata
    project_id = requests.get("http://metadata.google.internal/computeMetadata/v1/project/project-id",
                              headers={"Metadata-Flavor": "Google"}).text
    name = requests.get("http://metadata.google.internal/computeMetadata/v1/instance/name",
                        headers={"Metadata-Flavor": "Google"}).text
    zone_long = requests.get("http://metadata.google.internal/computeMetadata/v1/instance/zone",
                             headers={"Metadata-Flavor": "Google"}).text
    zone = zone_long.split("/")[-1]

    # shut ourselves down
    logging.info("Calling API to delete this VM, {zone}/{name}".format(zone=zone, name=name))
    requests.delete("https://www.googleapis.com/compute/v1/projects/{project_id}/zones/{zone}/instances/{name}"
                    .format(project_id=project_id, zone=zone, name=name),
                    headers={"Authorization": "Bearer {token}".format(token=token)})
A simple atexit hook gets me my desired behavior:
import atexit
atexit.register(kill_vm)
Another solution is to not use GCE and instead use AI Platform's custom job service, which automatically shuts down the VM after the Docker container exits.
gcloud ai-platform jobs submit training $JOB_NAME \
--region $REGION \
--master-image-uri $IMAGE_URI
You can specify --master-machine-type.
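For instance, a minimal sketch combining the custom scale tier with an explicit master machine type; the machine type shown is only an assumption, so pick one that fits your job's memory needs:
gcloud ai-platform jobs submit training $JOB_NAME \
  --region $REGION \
  --master-image-uri $IMAGE_URI \
  --scale-tier custom \
  --master-machine-type n1-highmem-8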
See the GCP documentation on custom containers.
The simplest way, from within the container, once it's finished:
ZONE=`gcloud compute instances list --filter="name=($HOSTNAME)" --format 'csv[no-heading](zone)'`
gcloud compute instances delete $HOSTNAME --zone=$ZONE -q
-q skips the interactive confirmation
$HOSTNAME is already exported
Just use curl and the local metadata server (no need for Python scripts or gcloud). Add the following to the end of your Docker entrypoint script so it runs when the container finishes:
# Note: inside the container the name is exposed as $HOSTNAME
INSTANCE_NAME=$(curl -sq "http://metadata.google.internal/computeMetadata/v1/instance/name" -H "Metadata-Flavor: Google")
INSTANCE_ZONE=$(curl -sq "http://metadata.google.internal/computeMetadata/v1/instance/zone" -H "Metadata-Flavor: Google")
echo "Terminating instance [${INSTANCE_NAME}] in zone [${INSTANCE_ZONE}}"
TOKEN=$(curl -sq "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google" | jq -r '.access_token')
curl -X DELETE -H "Authorization: Bearer ${TOKEN}" https://www.googleapis.com/compute/v1/$INSTANCE_ZONE/instances/$INSTANCE_NAME
For security's sake, and following the principle of least privilege, you can run the VM with a custom service account and give that service account a role with this permission (a custom role is best):
compute.instances.delete
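As a hedged sketch of that setup (the project, role, and account names below are placeholders): create a custom role containing only compute.instances.delete, create a dedicated service account, bind the role to it, and then launch the VM with that service account and the compute scope.
# custom role with just the delete permission
gcloud iam roles create instanceSelfDelete --project=my-project \
  --permissions=compute.instances.delete --stage=GA
# dedicated service account for the job VM
gcloud iam service-accounts create vm-self-delete
# bind the custom role to the service account
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:vm-self-delete@my-project.iam.gserviceaccount.com" \
  --role="projects/my-project/roles/instanceSelfDelete"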

Prometheus API returning HTML instead of JSON

I configured Prometheus with Kubernetes and am trying to execute queries using the API. I followed this document to configure and execute the API:
https://github.com/prometheus/prometheus/blob/master/docs/querying/api.md
I am executing the curl command below:
curl -k -X GET "https://127.0.0.1/api/v1/query?query=kubelet_volume_stats_available_bytes"
But I am getting output in HTML instead of JSON.
Is any additional configuration needed to get output in JSON format from Prometheus?
Per the Prometheus documentation, Prometheus "[does] not provide any server-side authentication, authorisation or encryption".
It would seem that you're hitting some proxy, so you need to figure out how to get past that proxy and through to Prometheus. Once you do that, you'll get the response you expect.
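For example, a minimal sketch that sidesteps any proxy by port-forwarding directly to the Prometheus service and querying its HTTP API; the namespace "monitoring" and service name "prometheus" are assumptions, so adjust them to your deployment:
kubectl -n monitoring port-forward svc/prometheus 9090:9090 &
curl "http://127.0.0.1:9090/api/v1/query?query=kubelet_volume_stats_available_bytes"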
When I run Prometheus on my local machine, it runs on port 9090 by default, per the Prometheus README.md:
* Install Docker
* Change the prometheus.yml section called targets, e.g.:
#static_configs: (example)
# - targets: ['172.16.129.33:8080']
The target IP should be your localhost IP; just providing localhost would also work.
* docker build -t prometheus_simple .
* docker run -p 9090:9090 prometheus_simple
* The Prometheus endpoint is http://localhost:9090
So if I put the port in your curl call, I have:
curl -k -X GET "https://127.0.0.1:9090/api/v1/query?query=kubelet_volume_stats_available_bytes"
And I get:
{"status":"success","data":{"resultType":"vector","result":[]}}

Export contents of an OpenShift image to a file

I've been searching for this for a while. I don't have access to the binary artifacts used to build the image because an Artifactory migration ruined the repo. There is one particularly precious binary I would love to extract from the image. I know docker save would save me, but I don't have access to docker, only to the oc client.
EDIT:
After looking around a little, I thought the docker-registry API should be the way to go. Debugging the oc client and the logs of the docker-registry pods, I found that both v1 and v2 API versions seem to be in use.
Somehow I cannot get any further than the version check.
Getting the auth token and registry url from oc:
TOKEN=`oc whoami -t`
URL="https://"`oc -n default get route docker-registry -o jsonpath="{.status.ingress[0].host}"
Then I get a correct response to:
curl -k -X GET -H "Authorization: Bearer $TOKEN" "$URL/v2/"
...
HTTP/1.1 200 OK
but:
curl -k -X GET -H "Authorization: Bearer $TOKEN" "$URL/v2/_catalog"
...
HTTP/1.1 400 Bad Request
You can log in to the internal image registry, if it is exposed, and then pull the image back down to your local system and do what you want with it. Instructions for logging in can be found at:
http://cookbook.openshift.org/image-registry-and-image-streams/how-do-i-push-an-image-to-the-internal-image-registry.html
That talks about doing a push, but you want to do a pull.
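A hedged sketch of that pull-and-extract flow, assuming Docker is available on your local machine; the project, image name, tag, and file path below are placeholders:
TOKEN=$(oc whoami -t)
REGISTRY=$(oc -n default get route docker-registry -o jsonpath='{.status.ingress[0].host}')
docker login -u "$(oc whoami)" -p "$TOKEN" "$REGISTRY"
docker pull "$REGISTRY/myproject/myimage:latest"
# create a stopped container and copy the binary out without ever running the image
CID=$(docker create "$REGISTRY/myproject/myimage:latest")
docker cp "$CID":/path/to/precious-binary ./precious-binary
docker rm "$CID"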

Hadoop + Ambari cluster change configuration

I want to upload the new blueprint.json file to my Ambari cluster as follows:
curl -u admin:admin -H "X-Requested-By: ambari" -X GET http://10.14.5.40:8080/api/v1/clusters/HDP6?format=blueprint -o /tmp/1-HDP6_blueprint.json
When I run it, everything seems OK because I do not get any warning or error,
but when I look at the parameters in the Ambari GUI, I see that the new blueprint.json has not changed the cluster configuration.
How can I debug this, or get feedback from the curl command about what happened?
Please note that the curl command you used downloads the existing cluster configuration in blueprint format (it is an -X GET).
You will have to use curl -X POST to register and upload a new blueprint to Ambari:
curl --verbose -H "X-Requested-By: ambari" -X POST -u admin:admin http://10.14.5.40:8080/api/v1/blueprints/HDP6new?validate_topology=false --data "@./blueprint.json"
Also note that uploading a modified blueprint is not the correct way to change an existing cluster's configuration. You may refer to this document for modifying configurations.
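As a hedged illustration of the usual alternative, Ambari ships a helper script on the Ambari server host (configs.sh on older releases, configs.py on newer ones) for changing a single property of a live cluster; the config type and property below are only examples:
# set one property in yarn-site for cluster HDP6 via the Ambari host on localhost
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin \
  set localhost HDP6 yarn-site yarn.nodemanager.resource.memory-mb 4096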

Error while trying to run a MapReduce job on FIWARE-Cosmos using Tidoop REST API

I am following this guide on GitHub and I am not able to run the example MapReduce job mentioned in Step 5.
I am aware that this file no longer exists:
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar
And I am aware that the same file can now be found here:
/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar
So I form my call as below:
curl -v -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/$user/jobs" -d '{"jar":"/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar","class_name":"WordCount","lib_jars":"/usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar","input":"testdir","output":"testoutput"}' -H "Content-Type: application/json" -H "X-Auth-Token: $TOKEN"
The input directory exists in my HDFS user space and there is a file called testdata.txt inside it. The testoutput folder does not exist in my HDFS user space, since I know that would cause problems.
When I execute this curl command, the error I get is {"success":"false","error":1}, which is not very descriptive. Is there something I am missing here?
I have just tested this with my user frb and a valid token for that user:
$ curl -X POST "http://computing.cosmos.lab.fiware.org:12000/tidoop/v1/user/frb/jobs" -d '{"jar":"/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar","class_name":"wordcount","lib_jars":"/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar","input":"testdir","output":"outputtest"}' -H "Content-Type: application/json" -H "X-Auth-Token: xxxxxxxxxxxxxxxxxxx"
{"success":"true","job_id": "job_1460639183882_0011"}
Please observe that the fat jar with the MapReduce examples in the "new" cluster (computing.cosmos.lab.fiware.org) is at /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar, as detailed in the documentation. /usr/lib/hadoop-0.20/hadoop-examples-0.20.2-cdh3u6.jar was the fat jar in the "old" cluster (cosmos.lab.fiware.org).
EDIT 1
Finally, it turned out the user had no account in the "new" pair of Cosmos clusters in FIWARE Lab (storage.cosmos.lab.fiware.org and computing.cosmos.lab.fiware.org), where Tidoop runs, but only in the "old" cluster (cosmos.lab.fiware.org). Thus, the issue was fixed by simply provisioning an account on the "new" ones.