Windows Server 2012 VM created from Snapshot fails to Boot on GCE - google-compute-engine

I am not a Windows expert and I am stuck recovering Windows Server 2012 from a Snapshot. I am trying to create a new VM instance on Google Compute Engine (GCE) from a Snapshot which I created from a Windows Server 2012 VM instance a couple of weeks ago. Whenever I create a new VM I am never able to RDP into it. After reading the GCE troubleshooting guide I determined that Windows may not be booting up properly. I was able to view the Serial Port output, which shows the following:
SeaBIOS (version 1.8.2-20161003_105447-google)
Total RAM Size = 0x00000003c0000000 = 15360 MiB
CPUs found: 4 Max CPUs supported: 4
found virtio-scsi at 0:3
virtio-scsi vendor='Google' product='PersistentDisk' rev='1' type=0 removable=0
virtio-scsi blksize=512 sectors=314572800 = 153600 MiB
drive 0x000f3120: PCHS=0/0/0 translation=lba LCHS=1024/255/63 s=314572800
Booting from Hard Disk 0...
It's stuck at "Booting from Hard Disk 0...".
I dug further into it and read the Serial Port 2 log, which shows the following:
Windows Boot Manager
Windows failed to start. A recent hardware or software change might be the cause. To fix the problem:
Insert your Windows installation disc and restart your computer.
Choose your language settings, and then click "Next."
Click "Repair your computer.
If you do not have this disc, contact your system administrator or computer manufacturer for assistance.
Status:
A required device isn't connected or can't be accessed.
I attached the disk to another machine and I can see all the files on the disk, but I can't modify anything because the disk is write-protected.
The original VM from which I created the Snapshot is still there, so I tried to attach the disk created from the snapshot to the original VM instance as the boot disk in order to have the same hardware configuration, but I can't detach the boot disk associated with that instance; it gives me the following error:
Hot-remove of the root disk is not supported.
I also tried creating a VM with the same machine type as before taking the Snapshot, but that failed as well.
Any suggestions on how I can create a new VM instance from the Snapshot and boot Windows properly?

After going through a number of articles, forums and user guides, I was finally able to spin up the Windows VM instance successfully from the Snapshot.
The issue was with the bootloader and the Boot Configuration Data (BCD). Ideally you would use an Image for the OS disk; in my case the OS and data were on the same disk and we only had a Snapshot. GCE allows creating a new VM instance from a Snapshot, but in my case the instance was not booting up.
Follow this step-by-step guide to recover your OS/data from a snapshot.
Summary:
Create a disk from the Snapshot and fix that disk's BCD using a new temporary VM instance.
Detailed Steps
Step 1: Create Recovery VM Instance and Start it
This is a temporary instance; you may delete it after you recover your OS/data.
From Google Cloud Console
Select Compute Engine > VM Instances and select CREATE INSTANCE
Make sure you select the same OS as the Snapshot. Once the instance has started, make sure you can connect over Remote Desktop and log in to the new VM instance.
Note down the instance name and the zone in which the instance is running.
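If you prefer the command line over the console, here is a rough gcloud sketch of this step; the bracketed values are placeholders and the image family shown is only an assumption matching the Windows Server 2012 OS in this question:
gcloud compute instances create [RECOVERY-INSTANCE-NAME] \
    --zone [ZONE] \
    --machine-type n1-standard-2 \
    --image-family windows-2012-r2 \
    --image-project windows-cloud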
Step 2: Create New Disk from Snapshot
From Google Cloud Console
Select Compute Engine > Disks and select CREATE DISK
Make sure you select the same disk size and disk type as at the time the snapshot was taken, otherwise Windows may throw a boot error. Also make sure the disk is in the same zone as your recovery instance; if it is not, you won't be able to attach it.
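The same step with gcloud, as a sketch (all bracketed values are placeholders; match the size and type to the original disk):
gcloud compute disks create [DISK-NAME] \
    --source-snapshot [SNAPSHOT-NAME] \
    --size [ORIGINAL-SIZE] \
    --type [ORIGINAL-DISK-TYPE] \
    --zone [ZONE]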
Step 3: Attach the Disk to Recovery Instance
In this step you attach the disk you created in Step 2 to the VM instance you created in Step 1.
Open Google Cloud Shell and type the following command
gcloud compute instances attach-disk [INSTANCE-NAME] --disk [DISK-NAME] --zone [ZONE]
Replace the placeholders with your instance name, disk name, and the zone in which the instance is running.
Step 4: Mount Disk and assign Drive letter in Windows
Go to Start > Search and type diskmgmt.msc to open the Disk Management tool. If the disk you just attached shows up as Offline, right-click it and select Online.
After making sure the disk is Online, verify that each volume on the disk has a drive letter assigned; the specific letters do not matter. If any volume does not have a drive letter, right-click the volume and select Change Drive Letter and Paths, then Add. Select Assign the following drive letter, let it choose the next available letter, then click OK.
Note down the drive letter. For me it's the D: drive.
Step 6: Remove write-protection from disk
Try creating a new folder on the attached drive. If the disk is write-protected and you are not able to create any file or folder on the drive, you need to turn off the write protection; otherwise you can skip this step.
Open Elevated Command Prompt (run as Administrator) and type
diskpart
and you will get a DISKPART> prompt.
Type:
list volume
The system will display all volumes with their numbers. Next, select the volume by typing:
select volume #
where # is the volume number. For me it's 1.
Then type the following commands to remove write-protection:
attr disk clear readonly
attr volume clear readonly
attr volume clear hidden
attr volume clear shadowcopy
Exit diskpart by typing exit or closing the Command Prompt window. Open the drive in Windows Explorer; you should be able to see all your data and Windows system files. Create a new folder on the drive to make sure the disk is no longer write-protected.
Step 7: Fix Boot Configuration Data (BCD)
If you are familiar with the Windows bcdedit command, then by all means use bcdedit, but I used EasyBCD to fix the Boot Configuration Data.
Download and install EasyBCD on your Recovery VM Instance from https://neosmart.net/EasyBCD
Once installed open EasyBCD and click on
File > Select BCD Store
and in the file selection dialog, under filename, enter D:\Boot\BCD (or whatever drive letter you assigned in Step 4). The system will show you the Boot Configuration Data for your drive.
Click on Edit Boot Menu button and select Skip the boot menu and click on Save settings.
Click on Advanced Settings button and under the Basic tab click on Drive: menu and select the Drive letter of the disk.
Please note: the drive letter should be the same as in Step 4.
Click on BCD Backup/Repair button and under the BCD management options select Re-create/repair boot files and click on Perform Action button.
Take the disk offline by opening Disk Management again, right-clicking the disk, and selecting Offline.
Now minimize your RDP window and, in Google Cloud Shell, type the following command to detach the disk from the recovery instance:
gcloud compute instances detach-disk [INSTANCE-NAME] --disk [DISK-NAME] --zone [ZONE]
You have now fixed the Boot Configuration Data of the disk created from the Snapshot.
We are now ready to spin up a VM instance and boot it from this disk. Let's create an instance from the disk.
Step 8: Create new VM Instance
From Google Cloud Console select Compute Engine > VM Instances and select CREATE INSTANCE
To avoid any problems on the first run, make sure you select the same machine type as before the snapshot was taken.
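A rough gcloud equivalent of this step, assuming the fixed disk has already been detached from the recovery instance (bracketed values are placeholders):
gcloud compute instances create [NEW-INSTANCE-NAME] \
    --zone [ZONE] \
    --machine-type [ORIGINAL-MACHINE-TYPE] \
    --disk name=[DISK-NAME],boot=yes,auto-delete=no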
At this point you should have a working VM instance and you should be able to log in over RDP.
In case you are still facing issues, have a look at your serial port log by clicking the VM instance in the Google Cloud Console and scrolling down to the bottom of the page, or type the following command in Google Cloud Shell:
gcloud compute instances get-serial-port-output [INSTANCE-NAME] --zone [ZONE]
Lesson Learned
For OS disks, with or without data, use images instead of snapshots.
Do not keep your data on the same disk as the OS, even if it's a test machine and you are only doing some temporary work.

Thanks for the detailed answer. The key issue here is that Windows marks the volume as read-only/snapshot when a VSS-initiated snapshot is taken. A GCE Persistent Disk snapshot preserves that flag. All the data is correct and consistent; all you need to do is remove those flags, as Steps 1-6 outline.
A small suggestion for Step 7: you can simply use the built-in tool bcdboot.exe to fix it.
bcdboot.exe D:\Windows /s D:
(The BCD hive is actually fine; however, the disk signature in the MBR is modified by Windows when you manually bring the disk online. That's why we need to run the command here, to bring it back into consistency with the boot database.)

@Shaheryar's answer is awesome! If you need to automate or script some of the steps, you can use the following:
Steps 4 and 6
The disk can be brought online and its read-only flags cleared by putting the following in a mount.txt file:
rem This script is meant to be run with: diskpart /s mount.txt
rem It mounts the snapshot volume as D: and makes it writable
list disk
rem The snapshot disk should be on disk #1
rem Use `list disk` to find it if it changed - also update unmount.txt
select disk 1
online disk
rem The partition should be on volume #2
rem Use `list volume` to find it if it changed - also update unmount.txt
select volume 2
assign letter=D
attr disk clear readonly
attr volume clear readonly
attr volume clear hidden
attr volume clear shadowcopy
exit
From an admin console/shell, run this file using
diskpart /s mount.txt
Step 7
The GUI steps to fix the BCD can be replaced with the following commands in an admin PowerShell session:
attrib -h -s D:\Boot\BCD
Remove-Item -Path D:\Boot\BCD
# Recreate BCD
bcdedit.exe /createstore D:\Boot\BCD
bcdedit.exe /store D:\Boot\BCD /create "{bootmgr}" /d "Windows Boot Manager"
bcdedit.exe /store D:\Boot\BCD /set "{bootmgr}" device partition=D:
bcdedit.exe /store D:\Boot\BCD /set "{bootmgr}" displaybootmenu No
bcdedit.exe /store D:\Boot\BCD /timeout 0
$out = bcdedit.exe /store D:\Boot\BCD /create /d "Microsoft Server" /application osloader
$id = Select-String -Input $out -Pattern '{.+}' -AllMatches | % { $_.Matches } | % { $_.Value }
# $id extracted above holds the GUID of the new osloader entry
bcdedit.exe /store D:\Boot\BCD /set "$id" device partition=D:
bcdedit.exe /store D:\Boot\BCD /set "$id" osdevice partition=D:
bcdedit.exe /store D:\Boot\BCD /set "$id" path \Windows\system32\winload.exe
bcdedit.exe /store D:\Boot\BCD /set "$id" systemroot \Windows
bcdedit.exe /store D:\Boot\BCD /set "$id" nx OptOut
bcdedit.exe /store D:\Boot\BCD /set "$id" allowedinmemorysettings 0x15000075
bcdedit.exe /store D:\Boot\BCD /set "$id" inherit "{6efb52bf-1766-41db-a6b3-0ee5eff72bd7}"
bcdedit.exe /store D:\Boot\BCD /set "$id" resumeobject "{45e743a4-934e-11e9-a872-8c17a054efa7}"
bcdedit.exe /store D:\Boot\BCD /displayorder "$id"
# Print result
bcdedit.exe /store D:\Boot\BCD /enum /v
These steps are adapted from the creators of EasyBCD. The missing steps were found by trial and error, to match the file that the EasyBCD GUI creates.
Missing unmount step
You might also want to unmount the volume before detaching it. This can be done by putting the following in an unmount.txt file:
rem This script unmounts the snapshot after its bootloader has been fixed
rem See mount.txt for why volume 2
select volume 2
remove
rem See mount.txt for why disk 1
select disk 1
offline disk
exit
Finally, run it from an admin console/shell:
diskpart /s unmount.txt

Related

Running task in the background?

If we submit a task to a Compute Engine instance through ssh from a host machine and then shut down the host machine, is there a way to get hold of the output of the submitted task later on, when we switch the host machine back on?
From the Linux point of view, 'ssh' and 'gcloud compute ssh' are commands like any other, so it is possible to redirect their output to a file while the command runs, using for example >> to redirect and append stdout to a file, or 2>> to store stderr.
For example if you run from the first instance 'name1':
$ gcloud compute ssh name2 --command='watch hostname' --zone=XXXX >> output.out
where 'name2' is the second instance. If at some point you shut down 'name1', you will find stored in output.out the output produced by the command up until the shutdown occurred.
Note that you can also create shutdown scripts, which in this scenario could be useful for uploading output.out to a bucket or performing any kind of clean-up operation.
To do so, you can run the following command
$ gcloud compute instances add-metadata example-instance --metadata-from-file shutdown-script=path/to/script_file
Where the content of the script could be something like
#! /bin/bash
gsutil cp path/output.out gs://yourbucketname
Always keep in mind that Compute Engine only executes shutdown scripts on a best-effort basis and does not guarantee that the shutdown script will be run in all cases.
More documentation about shutdown scripts is available if needed.

Service Fabric SDK 2.2.207 how to change data and log paths?

Since installing Service Fabric SDK 2.2.207 I'm not able to change the cluster data and log paths (with previous SDKs I could).
I tried:
Editing the registry keys in HKLM\Software\Microsoft\Service Fabric - they just revert back to C:\SfDevCluster\data and C:\SfDevCluster\log when the cluster is created.
Running PowerShell: & "C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\DevClusterSetup.ps1" -PathToClusterDataRoot d:\SfDevCluster\data -PathToClusterLogRoot d:\SfDevCluster\log - this works successfully, but upon changing the cluster mode to 1-node (a newly available configuration with this SDK), the cluster moves back to the C: drive.
Any help is appreciated!
Any time you switch the cluster mode on a local dev box, the existing cluster is removed and a new one is created. You can use DevClusterSetup.ps1 to switch from a 5-node to a 1-node cluster by passing -CreateOneNodeCluster, and pass the data and log root paths to it as well.

Add a new persistent disk to GCE which created with VirtualBox custom image

I followed the tutorial Google has up on YouTube for creating a custom image for Compute Engine using VirtualBox, at the following link:
https://www.youtube.com/watch?v=YlcR6ZLebTM
I have succeeded in creating a custom image and importing it to Google Compute Engine.
But when I try to follow this document to attach a new persistent disk:
https://cloud.google.com/compute/docs/disks/persistent-disks#attachdiskcreation
The document mentions a command-line tool:
/usr/share/google/safe_format_and_mount
but the folder /usr/share/google does not exist in my custom image.
How can I install it, or is there another way to mount a new persistent disk on a GCE instance?
The /usr/share/google/safe_format_and_mount command comes with the Google Compute Engine image packages. You can see the source code here.
You can either install the packages or run these commands:
1- Determine the device location of your new persistent disk: ls -l /dev/disk/by-id/google-*. Let's suppose it's /dev/sdb
2- sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 -F /dev/sdb
3- sudo mount -o discard,defaults /dev/sdb <destination_folder>
Run df -h or mount to check whether your disk is now mounted at the destination folder.
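To have the disk remounted automatically after a reboot, you could also add an /etc/fstab entry; this is only a sketch, with the UUID and mount point as placeholders (get the UUID with sudo blkid /dev/sdb):
UUID=[UUID_VALUE] [MOUNT_POINT] ext4 discard,defaults,nofail 0 2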

Google Compute Engine - Clone Instance

I have a GCE instance that I have customised and uploaded various applications to (such as PHP apps running under Apache). I now want to duplicate this instance - i.e. everything on it.
I originally thought clone might do this but I had a play around with it and it only seems to clone the instance config and not anything customised on it.
I've been googling it and it looks like what I need to do is create an image and use this image on a new instance or clone?
Is that correct?
If so, are there any good step-by-step guides out there on how to do this?
I had a look at the Google page on images and it talks about having to terminate the instance to do this. I'm a bit wary of this. Maybe it's just the language used in the docs, but I don't want to lose my existing instance.
Also, will everything be stored on the image?
So, for example, will the following all make it onto the image?
MySQL - config & databases schemas & data?
Apache - All installed apps under /var/www/html
PHP - php.ini, etc...
All other server configs/modifications?
You can create a snapshot of the source instance, then create a new instance selecting the source snapshot as the boot disk. It will replicate the server very quickly. For other attached disks, you have to create a new disk and copy the files over the network (scp, rsync, etc.).
In the web console, create a snapshot, then click on the snapshot and on the CREATE INSTANCE button; you can customize the settings and then click where it says:
Equivalent REST or command line
and copy the command line; this will be your template.
From this you can create a Bash script (clone_instance.sh). I did something like this:
#!/bin/bash -e
snapshot="my-snapshot-name"
gcloud_account="ACCOUNTNUMBER-compute@developer.gserviceaccount.com"
# clone 10 machines
for machine in 01 02 03 04 05 06 07 08 09 10
do
  # create a disk from the snapshot, named to match the instance that will boot from it
  gcloud compute --project "myProject" disks create "webscrape-${machine}" \
    --size "220" --zone "us-east1-d" --source-snapshot "${snapshot}" \
    --type "pd-standard"
  gcloud compute --project "myProject" instances create "webscrape-${machine}" \
    --zone "us-east1-d" --machine-type "n1-highmem-4" --network "default" \
    --maintenance-policy "MIGRATE" \
    --service-account "${gcloud_account}" \
    --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring.write","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append" \
    --tags "http-server","https-server" \
    --disk "name=webscrape-${machine},device-name=webscrape-${machine},mode=rw,boot=yes,auto-delete=yes"
done
Now, in your terminal, you can execute your script
sh clone_instance.sh
In case you have other disks attached, the best way to handle them without actually unmounting them is to change how they are mounted in /etc/fstab.
If you use UUIDs in fstab and use the same disks from snapshots (which will have the same UUIDs), then you can do the cloning without unmounting anything.
Just change each disk entry in fstab to use its UUID, like this:
UUID=[UUID_VALUE] [MNT_DIR] ext4 discard,defaults,[NOFAIL] 0 2
you can get the UUID from
sudo blkid /dev/[DEVICE_ID]
if you're unsure about your DEVICE_ID you can use
sudo lsblk
to get the list of device ids used by your system.
It's 2021 and this is now very simple:
Click the VM Instance you want to clone
Click "Create Machine Image" at the top
From Machine Images on the left, open your new image and click "Create VM Instance"
This will clone the machine specs and data.
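If you prefer the command line, a rough gcloud sketch of the same flow (the machine image name is just an example; instance names and zones are placeholders):
# Create a machine image from the existing VM
gcloud compute machine-images create clone-image \
    --source-instance [SOURCE-INSTANCE-NAME] \
    --source-instance-zone [ZONE]
# Create a new VM from that machine image
gcloud compute instances create [NEW-INSTANCE-NAME] \
    --zone [ZONE] \
    --source-machine-image clone-image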
As was mentioned, if the source instance has a secondary disk attached, it is not possible to ssh into the new instance.
I had to take a snapshot of a production instance, so I couldn't unmount the secondary disk without causing disruption.
I was able to fix the problem by creating a disk from the snapshot, mounting the disk on another instance, removing any reference to the secondary disk, i.e., removing the entry from /etc/fstab.
Once I had done that, I was able to use the disk as boot disk in a new instance, and ssh to it.
You can use the GCP Import VM option to import this machine back into the project.

How to solve jenkins 'Disk space is too low' issue?

I have deployed Jenkins on my CentOS machine. Jenkins was working well for 3 days, but yesterday I got a "Disk space is too low. Only 1.019GB left." problem.
How can I solve this problem? It has kept my master offline for hours.
You can easily change the threshold from the Jenkins UI (my version is 1.651.3), under the node disk-space monitoring settings.
Update: How to ensure high disk space
This feature is meant to prevent working on slaves with low free disk space. Lowering the threshold would not address the fact that some jobs do not properly clean up after they finish.
Depending on what you're building:
Make sure you understand what the disk output of your build is; if possible, restrict output to the job workspace only. Use the Workspace Cleanup plugin to clean up the workspace as a post-build step.
If the process must write some data to external folders, clean them up manually in post-build steps.
Alternative 1 - provision a new slave per job (use spot slaves; there are many plugins that integrate with different cloud providers to provision machines on the fly, on demand).
Alternative 2 - run the build inside a container. Everything will be discarded once the build is finished.
Besides the above solutions, there is a more "COMMON" way - directly delete the largest space consumers on the Linux machine. You can follow the steps below:
Log in to the Jenkins machine (e.g. via PuTTY)
cd to the Jenkins installation path
Use ls -lart to list hidden folders as well; normally the Jenkins installation is placed in the .jenkins/ folder
[xxxxx ~]$ ls -lart
drwxrwxr-x 12 xxxx 4096 Feb 8 02:08 .jenkins/
List the folder sizes
Use df -h to show disk space usage at a high level
du -sh ./*/ lists the total size of each subfolder in the current path
du -a /etc/ | sort -n -r | head -n 10 lists the top 10 directories eating disk space in /etc/
Delete old builds or other large folders
Normally the ./jobs/ folder or the ./workspace/ folder is the largest. Go inside and delete based on your needs (DO NOT delete the entire folder).
rm -rf theFolderToDelete
You can limit the growth of disk usage by discarding old builds. There's a checkbox for this in the project configuration.
This is actually a legitimate question, so I don't understand the downvotes; perhaps it belongs on Super User or Server Fault. This is a soft warning threshold, not a hard limit where the disk is out of space.
For Hudson, see "where to configure hudson node disk temp space thresholds" - this is talking about the host, not nodes.
Jenkins is the same. The conclusion is that for many small projects the system property hudson.diagnosis.HudsonHomeDiskUsageChecker.freeSpaceThreshold could be decreased.
That said, I haven't tested it, and there is a disclaimer:
No compatibility guarantee
In general, these switches are often experimental in nature, and subject to change without notice. If you find some of those useful, please file a ticket to promote it to the official feature.
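As an illustration only (not tested, per the disclaimer above): on a Debian-style package install the property could be passed to the Jenkins JVM via JAVA_ARGS; the file path and the assumption that the value is in bytes are mine, so verify them against your installation:
# /etc/default/jenkins (Debian-style install; adjust for your distro or service unit)
# Assumption: the threshold value is in bytes (2 GB shown here)
JAVA_ARGS="$JAVA_ARGS -Dhudson.diagnosis.HudsonHomeDiskUsageChecker.freeSpaceThreshold=2147483648"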
I got the same issue. My Jenkins version is 2.3 and its UI is slightly different; putting this here so that it may help someone. Increasing both disk space thresholds to 5GB fixed the issue.
I have a cleanup job with the following build steps. You can schedule it @daily or @weekly.
Execute system groovy script build step to clean up old jobs:
import jenkins.model.Jenkins
import hudson.model.Job
BUILDS_TO_KEEP = 5
for (job in Jenkins.instance.items) {
    println job.name
    def recent = job.builds.limit(BUILDS_TO_KEEP)
    for (build in job.builds) {
        if (!recent.contains(build)) {
            println "Preparing to delete: " + build
            build.delete()
        }
    }
}
You'd need to have Groovy plugin installed.
Execute shell build step to clean cache directories
rm -r ~/.gradle/
rm -r ~/.m2/
echo "Disk space"
du -h -s /
To check the free space as Jenkins Job:
Parameters
FREE_SPACE: Needed free space in GB.
Job
#!/usr/bin/env bash
free_space="$(df -Ph . | awk 'NR==2 {print $4}')"
if [[ "${free_space}" = *G* ]]; then
    # Strip the unit suffix to get the numeric value in GB
    free_space_gb=${free_space/[^0-9]*/}
    if [[ ${free_space_gb} -lt ${FREE_SPACE} ]]; then
        echo "Warning! Low space: ${free_space}"
        exit 2
    fi
else
    echo "Warning! Unknown: ${free_space}"
    exit 1
fi
echo "Free space: ${free_space}"
Plugins
Set build description
Post-Build Actions
Regular expression: Free space: (.*)
Description: Free space: \1
Regular expression for failed builds: Warning! (.*)
Description for failed builds: \1
For people who do not know where the configs are, download the tmpcleaner plugin from
https://updates.jenkins-ci.org/download/plugins/tmpcleaner/
You will get an .hpi file. Go to Manage Jenkins > Manage Plugins > Advanced, upload the .hpi file there, and restart Jenkins.
You can immediately see a difference if you go to Manage Nodes.
Since my Jenkins is installed on a Debian server, I did not understand most of the answers related to this, since I cannot find an /etc/default folder or jenkins file.
If someone knows where the /tmp folder is or how to configure it for Debian, do let me know in the comments.