How do I start instance on GCE again after power off.
Instance shows TERMINATED , but has PERSISTENT disk type.
if I use add instance with the same instance name it asks me for the
Select an new image with only choice of OS level, not my existing disk.
then fails with
ERROR: RESOURCE_ALREADY_EXISTS: The resource XXXX already exists
Is there way to start (or clone) copy of image once stopped?
Anything similar to AWS stop/start. I don't care about instance state or scratch to be saved, just start since I have boot disk stored and payed for.
Success, below is stop/start procedure, assuming that $PROJECT and $INSTANCE are set appropriately:
#--------- stop instance -----
#connect and shutdown
gcutil --project=$PROJECT ssh $INSTANCE
sudo shutdown -h now
# check
gcutil listinstances --project $PROJECT
#delete instance/keep boot disk , use -f to avoid confirmation
gcutil --project=$PROJECT deleteinstance $INSTANCE --nodelete_boot_pd
# check disks
gcutil listdisks --project=$PROJECT
#--------- start new instance -----
# launch instance using the existing disk (has to be in the same zone!)
gcutil --project=$PROJECT addinstance $INSTANCE --disk=$DISK,boot --zone=$ZONE --machine_type=n1-standard-1
#check that it's running
gcutil listinstances --project $PROJECT
You're on the right track. You just need to delete the existing TERMINATED instance before adding it again.
Even though the instance isn't running when it is TERMINATED, the resources (such as Persistent Disk) are still allocated to it.
Also, if this instance was created before December 5th, (when Compute Engine went GA), you'll need to add a kernel to the disk or it won't boot. See the transition guide for details.
(For a temporary work around to upgrading the kernel, see this Q/A: My Google Compute Engine instances hang during boot using the v1 API)
Related
How do I create a google compute engine template, named my-template, with a custom disk named my-disk, an external ip, that's preemptible, and with the tags necessary to open the http server ports?
Can I use the a managed template group to automatically restart these preemptible instances?
Something like the following command will work. Note that I set it up to use a highmem machine with 8 cores.
gcloud compute instance-templates create my-template \
--disk=boot=yes,auto-delete=no,name=my-disk \
--machine-type=n1-highmem-8 \
--preemptible \
--network-interface=address=35.238.XXX.YYY \
--tags=http-server,https-server
As of Nov 2018, the following link is where you can setup your external IP:
https://console.cloud.google.com/networking/addresses/list
Yes, you'll be able to use a managed instance group to automatically restart the preemptible instance once compute resources are available.
I want to attach disk to an instance on Google Compute Engine, through commands below.
gcloud compute instances create pg-disk-formatter --image ubuntu-1604-lts --custom-cpu 1 --custom-memory 1
gcloud compute disks create pg-data-disk --size 50GB
gcloud compute instances attach-disk pg-disk-formatter --disk pg-data-disk
However, even I logged into the machine and cd to /dev/disk/by-id/, it doesn't show up on the list.
mkouhei0910#pg-data-disk:~$ cd /dev/disk/by-id/
google-persistent-disk-0 scsi-0Google_PersistentDisk_persistent-disk-0
google-persistent-disk-0-part1 scsi-0Google_PersistentDisk_persistent-disk-0-part1
google-pg-data-disk2 scsi-0Google_PersistentDisk_pg-data-disk2
I noticed it shows up after I attached new disk image from Google Platform Console, but how can I achieve it purely gloud command line?
You first command is not correct. It should be:
gcloud compute instances create pg-disk-formatter --image-project ubuntu-os-cloud --image-family ubuntu-1604-lts --custom-cpu 1 --custom-memory 1
The second and third commands are good. They will create a disk and will attach it to the VM instance. The additional disk is listed in the output of ls command that your provided:
google-pg-data-disk2
If you want guest operating system sees a different name for attached disk, you can use --device-name flag with the command.
I'm using "oc cluster up" to start my Openshift Origin environment. I can see, however, that once I shutdown the cluster my projects aren't persisted at restart. Is there a way to make them persistent ?
Thanks
There are a couple ways to do this. oc cluster up doesn't have a primary use case of persisting resources.
There are couple ways to do it:
Leverage capturing etcd as described in the oc cluster up README
There is a wrapper tool, that makes it easy to do this.
There is now an example in the cluster up --help command, it is bound to stay up to date so check that first
oc cluster up --help
...
Examples:
# Start OpenShift on a new docker machine named 'openshift'
oc cluster up --create-machine
# Start OpenShift using a specific public host name
oc cluster up --public-hostname=my.address.example.com
# Start OpenShift and preserve data and config between restarts
oc cluster up --host-data-dir=/mydata --use-existing-config
So specifically in v1.3.2 use --host-data-dir and --use-existing-config
Assuming you are using docker machine with vm such as virtual box, the easiest way I found is taking a vm snapshot WHILE vm and openshift cluster are up and running. This snapshot will backup memory in addition to disk therefore you can restore entire cluster later on by restoring the vm snapshot, then run docker-machine start ...
btw, as of latest os image openshift/origin:v3.6.0-rc.0 and oc cli, --host-data-dir=/mydata as suggested in the other answer doesn't work for me.
I'm using:
VirtualBox 5.1.26
Kubernetes v1.5.2+43a9be4
openshift v1.5.0+031cbe4
Didn't work for me using --host-data-dir (and others) :
oc cluster up --logging=true --metrics=true --docker-machine=openshift --use-existing-config=true --host-data-dir=/vm/data --host-config-dir=/vm/config --host-pv-dir=/vm/pv --host-volumes-dir=/vm/volumes
With output:
-- Checking OpenShift client ... OK
-- Checking Docker client ...
Starting Docker machine 'openshift'
Started Docker machine 'openshift'
-- Checking Docker version ...
WARNING: Cannot verify Docker version
-- Checking for existing OpenShift container ... OK
-- Checking for openshift/origin:v1.5.0 image ... OK
-- Checking Docker daemon configuration ... OK
-- Checking for available ports ... OK
-- Checking type of volume mount ...
Using Docker shared volumes for OpenShift volumes
-- Creating host directories ... OK
-- Finding server IP ...
Using docker-machine IP 192.168.99.100 as the host IP
Using 192.168.99.100 as the server IP
-- Starting OpenShift container ...
Starting OpenShift using container 'origin'
FAIL
Error: could not start OpenShift container "origin"
Details:
Last 10 lines of "origin" container log:
github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc4202a1600, 0x42b94c0, 0x1f, 0xc4214d9f08, 0x2, 0x2)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/pkg/capnslog/pkg_logger.go:75 +0x16a
github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.newBackend(0xc4209f84c0, 0x33, 0x5f5e100, 0x2710, 0xc4214d9fa8)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:106 +0x341
github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend.NewDefaultBackend(0xc4209f84c0, 0x33, 0x461e51, 0xc421471200)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/mvcc/backend/backend.go:100 +0x4d
github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer.func1(0xc4204bf640, 0xc4209f84c0, 0x33, 0xc421079a40)
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:272 +0x39
created by github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver.NewServer
/go/src/github.com/openshift/origin/_output/local/go/src/github.com/openshift/origin/vendor/github.com/coreos/etcd/etcdserver/server.go:274 +0x345
Openshift writes to the directories /vm/... (also defined in VirtualBox) but successfully won't start.
See [https://github.com/openshift/origin/issues/12602][1]
Worked for me too, using Virtual Box Snapshots and restoring them.
To make it persistent after each shutdown you need to provide base-dir parameter.
$ mkdir ~/openshift-config
$ oc cluster up --base-dir=~/openshift-config
From help
$ oc cluster up --help
...
Options:
--base-dir='': Directory on Docker host for cluster up configuration
--enable=[*]: A list of components to enable. '*' enables all on-by-default components, 'foo' enables the component named 'foo', '-foo' disables the component named 'foo'.
--forward-ports=false: Use Docker port-forwarding to communicate with origin container. Requires 'socat' locally.
--http-proxy='': HTTP proxy to use for master and builds
--https-proxy='': HTTPS proxy to use for master and builds
--image='openshift/origin-${component}:${version}': Specify the images to use for OpenShift
--no-proxy=[]: List of hosts or subnets for which a proxy should not be used
--public-hostname='': Public hostname for OpenShift cluster
--routing-suffix='': Default suffix for server routes
--server-loglevel=0: Log level for OpenShift server
--skip-registry-check=false: Skip Docker daemon registry check
--write-config=false: Write the configuration files into host config dir
But you shouln't use it, because "cluster up" is removed in version 4.0.0. More here: https://github.com/openshift/origin/pull/21399
I have a Google compute engine instance(Cent-Os) which I could access using its external IP address till recently.
Now suddenly the instance cannot be accessed using its using its external IP address.
I logged in to the developer console and tried rebooting the instance but that did not help.
I also noticed that the CPU usage is almost at 100% continuously.
On further analysis of the Serial port output it appears the init module is not loading properly.
I am pasting below the last few lines from the serial port output of the virtual machine.
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2014-07-04 07:40:53 UTC (1404459653)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
Failed to execute /init
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
Pid: 1, comm: swapper Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? init_post+0xa8/0x100
[] ? kernel_init+0x2e6/0x2f7
[] ? child_rip+0xa/0x20
[] ? kernel_init+0x0/0x2f7
[] ? child_rip+0x0/0x20
Thanks in advance for any tips to resolve this issue.
Mathew
It looks like you might have an script or other program that is causing you to run out of Inodes.
You can delete the instance without deleting the persistent disk (PD) and create a new vm with a higher capacity using your PD, however if it's an script causing this, you will end up with the same issue. It's always recommended to backup your PD before making any changes.
Run this command to find more info about your instance:
gcutil --project= getserialportoutput
If the issue still continue, you can either
- Make a snapshot of your PD and make a PD's copy or
- Delete the instance without deleting the PD
Attach and mount the PD to another vm as a second disk, so you can access it to find what is causing this issue. Visit this link https://developers.google.com/compute/docs/disks#attach_disk for more information on how to do this.
Visit this page http://www.ivankuznetsov.com/2010/02/no-space-left-on-device-running-out-of-inodes.html for more information about inodes troubleshooting.
Make sure the Allow HTTP traffic setting on the vm is still enabled.
Then see which network firewall you are using and it's rules.
If your network is set up to use an ephemral IP, it will be periodically released back. This will cause your IP to change over time. Set it to static/reserved then (on networks page).
https://developers.google.com/compute/docs/instances-and-network#externaladdresses
I have managed to create an instance and ssh into it. However, I have couple of questions regarding the Google Compute Engine.
I understand that I will be charged for the time my instance is running. That is till I exit out of the instance. Is my understanding correct?
I wish to run some batch job (java program) on my instance. How do I make my instance stop automatically after the job is complete (so that I don't get charged for the additional time it may run)
If I start the job and disconnect my PC, will the job continue to run on the instance?
Regards,
Asim
Correct, instances are charged for the time they are running. (to the minute, minimum 10 minutes). Instances run from the time they are started via the API until they are stopped via the API. It doesn't matter if any user is logged in via SSH or not. For most automated use cases users never log in - programs are installed and started via start up scripts.
You can view your running instances via the Cloud Console, to confirm if any are currently running.
If you want to stop your instance from inside the instance, the easiest way is to start the instance with the compute-rw Service Account Scope and use gcutil.
For example, to start your instance from the command line with the compute-rw scope:
$ gcutil --project=<project-id> addinstance <instance name> --service_account_scopes=compute-rw
(this is the default when manually creating an instance via the Cloud Console)
Later, after your batch job completes, you can remove the instance from inside the instance:
$ gcutil deleteinstance -f <instance name>
You can put halt command at the end of your batch script (assuming that you output your results on persistent disk).
After halt the instance will have a state of TERMINATED and you will not be charged.
See https://developers.google.com/compute/docs/pricing
scroll downn to "instance uptime"
You can auto shutdown instance after model training. Just run few extra lines of code after the model training is complete.
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials
credentials = GoogleCredentials.get_application_default()
service = discovery.build('compute', 'v1', credentials=credentials)
# Project ID for this request.
project = 'xyz' # Project ID
# The name of the zone for this request.
zone = 'xyz' # Zone information
# Name of the instance resource to stop.
instance = 'xyz' # instance id
request = service.instances().stop(project=project, zone=zone, instance=instance)
response = request.execute()
add this to your model training script. When the training is complete GCP instance automatically shuts down.
More info on official website:
https://cloud.google.com/compute/docs/reference/rest/v1/instances/stop
If you want to stop the instance using the python script, you can follow this way:
from google.cloud.compute_v1.services.instances import InstancesClient
from google.oauth2 import service_account
instance_client = InstancesClient().from_service_account_file(<location-path>)
zone = <zone>
project = <project>
instance = <instance_id>
instance_client.stop(project=project, instance=instance, zone=zone)
In the above script, I have assumed you are using service-account for authentication. For documentation of libraries used you can go here:
https://googleapis.dev/python/compute/latest/compute_v1/instances.html