I would like to start a job on Compute engine instance, turn off my laptop and have the instance automatically deleted once the remote job is complete (without my laptop running)
Currently I have a script which runs on my laptop which has a delete instance command in the end. But I wish to turn off my laptop, since the remote job can even run for >24hours.
Is it possible to do this? Maybe I can poweroff or delete the instance at the end of the remote job FROM the remote instance itself?
Your assumption is correct you just need to trigger the shutdown command from within your GCE instance when the script has finished and not calling it from your laptop
Related
I'm trying to set up a small blog server on Google Cloud Platform using the free tier f1.micro instance. I'm using Ubuntu 20.04 LTS as the base image (Ubuntu is the only Linux distro that I'm at all familiar with), though I tried 20.10. Everything works normally until I install MySQL. This is the guide that I'm following. After each failure, I deleted the VM and started with a fresh one.
These are the VM settings:
In addition to the steps listed in the guide, I also tried adding ssh to ufw, just in case.
sudo ufw allow ssh
sudo ufw enable
I also tried running this prior to installing MySQL, based on this article after failing the first couple of times.
sudo apt-get purge mysql*
sudo apt-get autoremove
sudo apt-get autoclean
sudo apt-get dist-upgrade
Once I try installing mysql-server the ssh prompt hangs here:
I've tried reconnecting immediately and I've tried waiting overnight, but I always get stuck here when I try to connect again (it stays like this for a very long time before failing):
I experienced a similar issue with a MySQL Instance in GCP, the first issue was related with the type of the VM instance I used, I had a f1-micro machine type on this VM Instance and suddenly I wasn’t able to access the ssh. As this type of VM Instance has only 0.6GB of memory, it became out of memory soon, I changed it to a e2-medium that is value by default and it resolved my problems this time.
As the Instance was out of memory the services in the instance started to fail, it was the reason that I can't access my instance.
At another time I started again with similar issues, but this time, the problem was the disk, I only had 10 GB and there was a process filling my disk, when a partition was out of space, the instance started to fail again.
I only resized my disk, now my instance disk is 20GB and is working fine.
Having said that, I suggest increasing your resources per your convenience to enhance your performance, because to have the problems you described is a good indicator that your existing machine type is not a good fit for your workloads you run on that instance.
So, I suggest to change the machine type to adjust your memory and you can follow the next steps for these tasks please visit the following link to get further information about it.
Changing a machine type
1.- Go to the VM Instances page.
2.- In the Name column, click your instance.
From the instance details page, complete the following steps:
a) Click the Stop button to stop the instance, if you have not stopped it yet.
b) After the instance stops, click the Edit button at the top of the page.
c) Under the Machine configuration section, select the machine type you want to use, or create a custom machine type to increase only the Memory.
d) Save your changes and start again your VM Instance.
You can resize your disk following this guide or with the following command:
gcloud compute disks resize DISK_NAME --size DISK_SIZE
Or with the Console:
Go to the Disks page to see a list of zonal persistent disks in your project.
Click the name of the disk that you want to resize.
On the disk details page, click Edit.
In the Size field, enter the new size for your disk.
Click Save to apply your changes to the disk.
After you resize the disk, you must resize the file system so that the operating system can access the additional space.
Note: Do not resize boot disks beyond 2 TB because this is the limit.
As per the installation guide you need a server with at least 1GB of memory and your selected VM instance has 614MB of memory. If I understand correctly, when Mysql service is installed it has been occupied total memory and that might be the reason you got stuck on that point also not able to SSH the instance.
I have a small instance running in GCE, had some troubles with the MongoDb so after some tries decided to reset the instance. But... it didn't seem to come back online. So i stopped the instance and restarted it.
It is an Bitnami MEAN stack which starts apache and stuff at startup.
But... i can't reach the instance! No SCP, no SSH, no webservice running. When i try to connect via SSH (in GCE) it times out, cant make connection on port 22. In the information it says 'The instance is booting up and sshd is not running yet', which is possible of course.... But i cant reach the instance in no possible manner not even after an hour wait :) Not sure what's happening if i cant connect to it somehow :(
There is some activity in the console... some CPU usage, mostly 0%, some incomming traffic but no outgoing...
I hope someone can give me a hint here!
Update 1
After the helpfull tip form Serhii... if found this in the logs...
Booting from Hard Disk 0...
[ 0.872447] piix4_smbus 0000:00:01.3: SMBus base address uninitialized - upgrade BIOS or use force_addr=0xaddr
/dev/sda1 contains a file system with errors, check forced.
/dev/sda1: Inodes that were part of a corrupted orphan linked list found.
/dev/sda1: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
(i.e., without -a or -p options)
fsck exited with status code 4
The root filesystem on /dev/sda1 requires a manual fsck
Update 2...
So, i need to fsck the drive...
Created a snapshot, made a new disk from that snapshot, added the new disk as an extra disk to another instance. Now that instance wont boot with the same problem... removing the extra disk fixed it again. So adding the disk makes it crash even though it isn't the boot-disk?
First, have a look at the Compute Engine -> VM instances -> NAME_OF_YOUR_VM -> Logs -> Serial port 1 (console) and try to find errors and warnings that could be connected to lack of free space or SSH. It'll be helpful if you updated your post by providing this information. In case if your instance run out of free space follow this instructions.
You can try to connect to your VM via Serial console by following this guide, but keep in mind that:
The interactive serial console does not support IP-based access
restrictions such as IP whitelists. If you enable the interactive
serial console on an instance, clients can attempt to connect to that
instance from any IP address.
more details you can find in the documentation.
Have a look at the Troubleshooting SSH guide and Known issues for SSH in browser. In addition, Google provides a troubleshooting script for Compute Engine to identify issues with SSH login/accessibility of your Linux based instance.
If you still have a problem try to use your disk on a new instance.
EDIT It looks like your test VM is trying to boot from the disk that you created from the snapshot. Try to follow this guide.
If you still have a problem, you can try to recreate the boot disk from a snapshot to resize it.
I decided to play around with Google Could SQL and I setup a test sql instance, loaded it with some data and then setup replication on it in the google dev console. I did my testing and found out it all works great, the master/slave setup works as it should and my little POC was a success. So now I want to delete the POC sql instances but that's not going so well.
I deleted the replica instance fine (aka the 'slave') but for some reason the master instance still thinks there is a slave and therefore will not let me delete it. For example I run the following command in the gclound shell:
gcloud sql instances delete MY-INSTANCE-NAME
I get the following message:
ERROR: (gcloud.sql.instances.delete) The requested operation is not valid for a replication master instance.
This screenshot also shows that in the google dev console it clearly thinks there are no replicas attached to this instance (because I deleted them) but when I run:
gcloud sql instances describe MY-INSTANCE-NAME
It shows that there is a replica name still attached to the instance.
Any ideas on how to delete this for good? Kinda lame to keep on paying for this when it was just a POC that I want to delete (glad I didn't pick a high memory machine!)
Issue was on Google's side and they fixed it. Here were the sequence of events that led to the issue happening:
1) Change master's tier
2) Promote replica to master while the master tier change is in progress
Just had the same problem using GCloud. Deleting the failover replica first and then the master instance worked for me.
Several months ago, I followed http://aws.amazon.com/articles/1663 and got it all running. Then, my PC crashed and I lost the keypair (http://stackoverflow.com/questions/7949835/accessing-ec2-instance-after-losing-keypair) and could no longer access the instance.
I want to now launch a new instance and mount this MySQL/DB volume which is left over from before and see if I can get to the data on it. How can I go about doing that?
You outlined the correct approach to this problem already, and the author of the article you referenced, Eric Hammond, has written another one detailing this very process, see Fixing Files on the Root EBS Volume of an EC2 Instance - it boils down to:
start another EC2 instance
stop the EC2 instance you can't access anymore
detach the EBS volume from the stopped instance
attach the EBS volume to the running instance
SSH into the running instance
mount the EBS volume in the running instance
perform whatever fixes necessary, i.e. adjust the /var permissions in your case
Please see Eric's instructions for details on how to do this from the command line; obviously you can achieve all steps up to the SSH access via the AWS Management Console as well, removing the need to install the Amazon EC2 API Tools, in case they aren't readily available already.
I have a Jenkins (Hudson) server setup that runs tests on a variety of slave machines. What I want to do is reconfigure the slave (using remote APIs), reboot the slave so that he changes take effect, then continue with the rest of the test. There are two hurdles that I've encountered so far:
Once a Jenkins job begins to run on the slave, the slave cannot go down or break the network connection to the server otherwise Jenkins immediately fails the test. Normally, I would say this is completely desirable behavior. But in this case, I would like for Jenkins to accept the disruption until the slave comes back online and Jenkins can reconnect to it - or the slave reconnects to Jenkins.
In a job that has been attached to the slave, I need to run some build tasks on the Jenkins master - not on the slave.
Is this possible? So far, I haven't found a way to do this using Jenkins or any of its plugins.
EDIT - Further Explanation
I really, really like the Jenkins slave architecture. Combined with the plugins already available, it makes it very easy to get jobs to a slave, run, and the results pulled back. And the ability to pick any matching slave allows for automatic job/test distribution.
In our situation, we use virtualized (VMware) slave machines. It was easy enough to write a script that would cause Jenkins to use VMware PowerCLI to start the VM up when it needed to run on a slave, then ship the job to it and pull the results back. All good.
EXCEPT Part of the setup of each test is to slightly reconfigure the virtual machine in some fashion. Disable UAC, logon as a different user, have a different driver installed, etc - each of these changes requires that the test VM/slave be rebooted before the changes take affect. Although I can write slave on-demand scripts (Launch Method=Launch slave via execution of command on the master) that handle this reconfig and restart, it has to be done BEFORE the job is run. That's where the problem occurs - I cannot configure the slave that early because the type of configuration changes are dependent on the job being run, which occurs only after the slave is started.
Possible Solutions
1) Use multiple slave instances on a single VM. This wouldn't work - several of the configurations are mutually exclusive, but Jenkins doesn't know that. So it would try to start one slave configuration for one job, another slave for a different job - and both slaves would be on the same VM. Locks on the jobs don't prevent this since slave starting isn't part of the job.
2) (Optimal) A build step that allows a job to know that it's slave connection MIGHT be disrupted. The build step may have to include some options so that Jenkins knows how to reconnect the slave (will the slave reconnect automatically, will Jenkins have to run a script, will simple SSH suffice). The build step would handle the disconnect of the slave, ignore the usually job-failing disconnect, then perform the reconnect. Once the slave is back up and running, the next build step can occur. Perhaps a timeout to fail the job if the slave isn't reconnectable in a certain amount of time.
** Current Solution ** - less than optimal
Right now, I can't use the slave function of Jenkins. Instead, I use a series of build steps - run on the master - that use Windows and PowerShell scripts to power on the VM, make the configurations, and restart it. The VM has a SSH server running on it and I use that to upload test files to the test VM, then remote execute them. Then download the results back to Jenkins for handling by the job. This solution is functional - but a lot more work than the typical Jenkins slave approach. Also, the scripts are targeted towards a single VM; I can't easily use a pool of slaves.
Not sure if this will work for you, but you might try making the Jenkins agent node programmatically tell the master node that it's offline.
I had a situation where I needed to make a Jenkins job that performs these steps (all while running on the master node):
revert the Jenkins agent node VM to a powered-off snapshot
tell the master that the agent node is disconnected (since the master does not seem to automatically notice the agent is down, whenever I revert or hard power off my VMs)
power the agent node VM back on
as a "Post-build action", launch a separate job restricted to run on the agent node VM
I perform the agent disconnect step with a curl POST request, but there might be a cleaner way to do it:
curl -d "offlineMessage=&json=%7B%22offlineMessage%22%3A+%22%22%7D&Submit=Yes" http://JENKINS_HOST/computer/THE_NODE_TO_DISCONNECT/doDisconnect
Then when I boot the agent node, the agent launches and automatically connects, and the master notices the agent is back online (and will then send it jobs).
I was also able to toggle a node's availability on and off with this command (using 'toggleOffline' instead of 'doDisconnect'):
curl -d "offlineMessage=back_in_a_moment&json=%7B%22offlineMessage%22%3A+%22back_in_a_moment%22%7D&Submit=Mark+this+node+temporarily+offline" http://JENKINS_HOST/computer/NODE_TO_DISCONNECT/toggleOffline
(Running the same command again puts the node status back to normal.)
The above may not apply to you since it sounds like you want to do everything from one jenkins job running on the agent node. And I'm not sure what happens if an agent node disconnects or marks itself offline in the middle of running a job. :)
Still, you might poke around in this Remote Access API doc a bit to see what else is possible with this kind of approach.
Very easy. You create a Master job that runs on the Master, from the master job you call the client job as a build step (it's a new kind of build step and I love it). You need to check that the master job should wait for the client job to finish. Then you can run your script to reconfigure your client and run the second test on the client.
An even better strategy is to have two nodes running on your slave machines. You need to configure two nodes in Jenkins. I used that strategy successfully with a unix slave. The reason was that I needed different environment variables to be set up and I didn't wanted to push that into the jobs. I used ssh clients, so I don't know if it is possible with different client types. Than you might be able to run both tests at the same time or you chain the jobs or use the master strategy mentioned above.