Google Compute Engine: issues mounting a persistent disk - google-compute-engine

I am following this guide https://developers.google.com/compute/docs/troubleshooting#ssherrors specifically the section about recovering your persistent disk with another vm.
I am trying to follow this part:
mount /dev/disk/by-id/scsi-0Google_PersistentDisk_myinstance-debugging /mnt/myinstance
This is the error I get:
root@debugger:~# mount /dev/disk/by-id/scsi-0Google_PersistentDisk_marty-wll-debugging /mnt/marty-wll
mount: you must specify the filesystem type
I am unsure of the filesystem type because this is a Google Compute Engine disk, and the original instance has already been deleted and its disk attached to another machine, following the Google Developers guide I referenced above.
parted scsi-0Google_PersistentDisk_marty-wll-debugging -l
root@debugger:/dev/disk/by-id# parted scsi-0Google_PersistentDisk_marty-wll-debugging -l
Model: Google PersistentDisk (scsi)
Disk /dev/sda: 10.7GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  10.7GB  10.7GB  primary  ext4
Model: Google PersistentDisk (scsi)
Disk /dev/sdb: 10.7GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  10.7GB  10.7GB  primary  ext4
The parted output above told me that the filesystem is "ext4",
although when I issue the following command I still get an error:
root@debugger:~# mount -t ext4 /dev/disk/by-id/scsi-0Google_PersistentDisk_marty-wll-debugging /mnt/marty-wll
mount: wrong fs type, bad option, bad superblock on /dev/sdb,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
dmesg said:
[ 2452.205447] EXT4-fs (sdb): VFS: Can't find ext4 filesystem
Any ideas?

Thanks for pointing this out, I will update the docs. Try adding -part1 to the end of your device name. This will mount the partition, instead of the disk. For your specific case:
mount /dev/disk/by-id/scsi-0Google_PersistentDisk_myinstance-debugging-part1 /mnt/myinstance
Also, there are cleaner aliases, so this should work as well:
mount /dev/disk/by-id/google-myinstance-debugging-part1 /mnt/myinstance
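The naming pattern in this answer can be wrapped in a small helper. This is only a sketch: the function name pd_partition_path is made up for illustration, and it assumes the google-<disk-name>-part<N> alias mentioned above is present on your VM.

```shell
# Hypothetical helper: build the by-id path of a partition on an attached
# persistent disk, using the google-<name>-part<N> alias from the answer.
pd_partition_path() {
  disk_name="$1"         # name the disk was attached under
  part_number="${2:-1}"  # partition number, defaults to the first partition
  printf '/dev/disk/by-id/google-%s-part%s\n' "$disk_name" "$part_number"
}

# Usage (as root), assuming /mnt/myinstance already exists:
#   mount "$(pd_partition_path myinstance-debugging)" /mnt/myinstance
```

Mounting the -part1 path rather than the bare disk is the point: the ext4 filesystem lives on the partition, which is why the kernel could not find it on /dev/sdb itself.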

Related

import of file failed not enough space

I don't have a tech background, but decided to create a Google Compute Engine instance to be able to create a 1.6GB database. I use PostgreSQL (Pgadmin3) to access the database on GCE. It has been working fine so far, but now I am trying to import a new file and am getting the error message
"Panic: could not write to file "pg_xlog/xlogtemp.1293": No space left on the device"
I looked at the documentation in Google for resizing the disk. I did a df -h in SSH and got
/dev/sda1 9.8G 9.2G 16M 100% /
I also did
sudo lsblk --output NAME,size,state
NAME SIZE STATE
sda 20G running
└─sda1 10G
So then I did
sudo resize2fs /dev/disk/by-id/google-myddisk-part1
resize2fs 1.42.12 (29-Aug-2014)
The filesystem is already 2620928 (4k) blocks long. Nothing to do!
Anyone have any ideas?
As this is the root disk, you have to reboot the instance. It should then automatically resize the root partition during boot. Please check the details here: https://cloud.google.com/compute/docs/disks/create-root-persistent-disks#repartitionrootpd
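The lsblk output in the question already hints at the cause: the disk (sda) is 20G but the partition (sda1) is still 10G, so resize2fs has nothing to grow into. Below is a minimal sketch for spotting that gap. It assumes one partition per disk, as in the question, and input in the form produced by lsblk -b -r -n -o NAME,SIZE,TYPE; the function name and the 1 GiB slack are arbitrary choices for illustration.

```shell
# Hypothetical helper: read lines of "NAME SIZE TYPE" (sizes in bytes) on
# stdin and print each partition that is more than `slack` bytes smaller
# than the disk listed before it.
find_ungrown_partitions() {
  slack="${1:-1073741824}"  # ignore gaps below 1 GiB by default
  awk -v slack="$slack" '
    $3 == "disk" { disk_size = $2; next }
    $3 == "part" && disk_size - $2 > slack { print $1 }
  '
}

# Usage on the VM:
#   lsblk -b -r -n -o NAME,SIZE,TYPE | find_ungrown_partitions
```

If a partition is listed, the filesystem cannot grow until the partition does, which is what the reboot in the answer is meant to take care of.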

Google Compute instance won't mount persistent disk, maintains ~100% CPU

During some routine use of my web server (saving posts via WordPress), my instance suddenly jumped up to 400% CPU usage and wouldn't come back down below 100%. Restarting and stopping/starting the instance didn't change anything.
Looking at the last bit of my serial output:
[ 0.678602] md: Waiting for all devices to be available before autodetect
[ 0.679518] md: If you don't use raid, use raid=noautodetect
[ 0.680548] md: Autodetecting RAID arrays.
[ 0.681284] md: Scanned 0 and added 0 devices.
[ 0.682173] md: autorun ...
[ 0.682765] md: ... autorun DONE.
[ 0.683716] VFS: Cannot open root device "sda1" or unknown-block(0,0): error -6
[ 0.685298] Please append a correct "root=" boot option; here are the available partitions:
[ 0.686676] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 0.688489] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-30-generic #34~14.04.1-Ubuntu
[ 0.689287] Hardware name: Google Google, BIOS Google 01/01/2011
[ 0.689287] ffffea00008ae400 ffff880024ee7db8 ffffffff817af477 000000000000111e
[ 0.689287] ffffffff81a7c6c0 ffff880024ee7e38 ffffffff817a9338 ffff880024ee7dd8
[ 0.689287] ffffffff00000010 ffff880024ee7e48 ffff880024ee7de8 ffff880024ee7e38
[ 0.689287] Call Trace:
[ 0.689287] [<ffffffff817af477>] dump_stack+0x45/0x57
[ 0.689287] [<ffffffff817a9338>] panic+0xc1/0x1f5
[ 0.689287] [<ffffffff81d3e5f3>] mount_block_root+0x210/0x2a9
[ 0.689287] [<ffffffff81d3e822>] mount_root+0x54/0x58
[ 0.689287] [<ffffffff81d3e993>] prepare_namespace+0x16d/0x1a6
[ 0.689287] [<ffffffff81d3e304>] kernel_init_freeable+0x1f6/0x20b
[ 0.689287] [<ffffffff81d3d9a7>] ? initcall_blacklist+0xc0/0xc0
[ 0.689287] [<ffffffff8179fab0>] ? rest_init+0x80/0x80
[ 0.689287] [<ffffffff8179fabe>] kernel_init+0xe/0xf0
[ 0.689287] [<ffffffff817b6d98>] ret_from_fork+0x58/0x90
[ 0.689287] [<ffffffff8179fab0>] ? rest_init+0x80/0x80
[ 0.689287] Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 0.689287] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
(Not sure if it's obvious from that, but I'm using the standard Ubuntu 14.04 image)
I've tried taking snapshots and mounting them on new instances, and now I've even deleted the instance and mounted the disk on to a new one, still the same issue and exactly the same serial output.
I really hope my data has not been hopelessly corrupted. Not sure if anyone has any suggestions on recovering data from a persistent disk?
Note that the accepted answer for: Google Compute Engine VM instance: VFS: Unable to mount root fs on unknown-block did not work for me.
I posted this on another question, but this question is worded better, so I'll re-post it here.
What Causes This?
That is the million dollar question. After inspecting my GCE VM, I found out there were 14 different kernels installed, taking up several hundred MB of space. Most of the kernels didn't have a corresponding initrd.img file, and were therefore not bootable (including 3.19.0-39-generic).
I certainly never went around trying to install random kernels, and once removed, they no longer appear as available upgrades, so I'm not sure what happened. Seriously, what happened?
Edit: New response from Google Cloud Support.
I received another disconcerting response. This may explain the additional, errant kernels.
"On rare occasions, a VM needs to be migrated from one physical host to another. In such case, a kernel upgrade and security patches might be applied by Google."
How to recover your instance...
After several back-and-forth emails, I finally received a response from support that allowed me to resolve the issue. Be mindful, you will have to change things to match your unique VM.
Take a snapshot of the disk first in case we need to roll back any of the changes below.
Edit the properties of the broken instance to disable this option: "Delete boot disk when instance is deleted"
Delete the broken instance.
IMPORTANT: make sure you do not select the option to delete the boot disk. Otherwise, the disk will be removed permanently!
Start up a new temporary instance.
Attach the broken disk (this will appear as /dev/sdb1) to the temporary instance
Once the temporary instance has booted, run the following in it:
# Run fsck to fix any disk corruption issues
$ sudo fsck.ext4 -a /dev/sdb1
# Mount the disk from the broken vm
$ sudo mkdir /mnt/sdb
$ sudo mount /dev/sdb1 /mnt/sdb/ -t ext4
# Find out the UUID of the broken disk. In this case, the uuid of sdb1 is d9cae47b-328f-482a-a202-d0ba41926661
$ ls -alt /dev/disk/by-uuid/
lrwxrwxrwx. 1 root root 10 Jan 6 07:43 d9cae47b-328f-482a-a202-d0ba41926661 -> ../../sdb1
lrwxrwxrwx. 1 root root 10 Jan 6 05:39 a8cf6ab7-92fb-42c6-b95f-d437f94aaf98 -> ../../sda1
# Update the UUID in grub.cfg (if necessary)
$ sudo vim /mnt/sdb/boot/grub/grub.cfg
Note: This ^^^ is where I deviated from the support instructions.
Instead of modifying all the boot entries to set root=UUID=[uuid character string], I looked for all the entries that set root=/dev/sda1 and deleted them. I also deleted every entry that didn't set an initrd.img file. The top boot entry with correct parameters in my case ended up being 3.19.0-31-generic. But yours may be different.
# Flush all changes to disk
$ sudo sync
# Shut down the temporary instance
$ sudo shutdown -h now
Finally, detach the HDD from the temporary instance, and create a new instance based off of the fixed disk. It will hopefully boot.
Assuming it does boot, you have a lot of work to do. If you have half as many unused kernels as me, then you might want to purge the unused ones (especially since some are likely missing a corresponding initrd.img file).
I used the second answer (the terminal-based one) in this askubuntu question to purge the other kernels.
Note: Make sure you don't purge the kernel you booted in with!
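To see which kernels are missing an initrd before editing grub.cfg or purging anything, something like the following sketch may help. It assumes the Ubuntu naming convention of vmlinuz-<version> and initrd.img-<version> in the boot directory; the function name is hypothetical.

```shell
# Hypothetical helper: print the version of every kernel image in a boot
# directory that has no matching initrd.img-<version> next to it.
list_kernels_without_initrd() {
  boot_dir="${1:-/boot}"
  for kernel in "$boot_dir"/vmlinuz-*; do
    [ -e "$kernel" ] || continue           # glob matched nothing
    version="${kernel##*/vmlinuz-}"        # strip directory and prefix
    [ -e "$boot_dir/initrd.img-$version" ] || echo "$version"
  done
}

# Usage from the temporary instance, with the broken disk mounted at /mnt/sdb:
#   list_kernels_without_initrd /mnt/sdb/boot
```

The function only reads the directory, so it is safe to run against the mounted disk before making any changes.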
In order to recover your data, you need to create a brand new instance where you can ssh, and attach the corrupted disk to it as a secondary disk. More information can be found in this article. I would suggest taking a snapshot of the corrupted disk before attaching it, for backup purposes.

Mount LVM2 and recover lost data from failed HDD?

HDD failure occurred.
So, a new primary HDD was added in and the old HDD was added in as a secondary one.
I'm trying to mount my secondary HDD but there are errors occurring.
I made /media/qwe/.
I then went on Putty and used these SSH commands:
root@chicken [/]# mount /dev/sdb2 /media/qwe
mount: unknown filesystem type 'LVM2_member'
But, I got an error.
root@chicken [/]# vgscan
Reading all physical volumes. This may take a while...
Found volume group "VolGroup" using metadata type lvm2
Found volume group "VolGroup" using metadata type lvm2
root@chicken [/]# vgs
VG #PV #LV #SN Attr VSize VFree
VolGroup 1 3 0 wz--n- 1.82t 0
VolGroup 1 3 0 wz--n- 1.82t 0
I use cPanel and WHM.
I am trying to recover the MySQL databases that were lost. I managed to mount the sdb1 bit, but I think that's the boot partition. I don't need that. I need to access the other files!
Any help?
You don't need a filesystem to get your data back.
Start by taking an image of the failed disk.
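Separately from imaging, the vgscan and vgs output in the question shows two volume groups that are both named VolGroup, which would make the old one awkward to address by name. Here is a sketch for listing such duplicates together with their UUIDs, assuming input from vgs --noheadings -o vg_name,vg_uuid; the function name is hypothetical.

```shell
# Hypothetical helper: read "VG_NAME VG_UUID" lines on stdin and print each
# volume group name that occurs more than once, followed by its UUIDs.
find_duplicate_vgs() {
  awk '
    { count[$1]++; uuids[$1] = uuids[$1] " " $2 }
    END {
      for (name in count)
        if (count[name] > 1) print name ":" uuids[name]
    }
  '
}

# Usage (as root):
#   vgs --noheadings -o vg_name,vg_uuid | find_duplicate_vgs
```

With the UUID of the old disk's group in hand, vgrename accepts a UUID in place of the old name (vgrename <uuid> <new-name>), after which vgchange -ay <new-name> should activate its logical volumes so they can be mounted. Where possible, do this against the image or a copy rather than the failing disk.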

Unable to access Google Compute Engine instance using external IP address

I have a Google Compute Engine instance (CentOS) which I could access using its external IP address until recently.
Now suddenly the instance cannot be accessed using its external IP address.
I logged in to the developer console and tried rebooting the instance but that did not help.
I also noticed that the CPU usage is almost at 100% continuously.
On further analysis of the Serial port output it appears the init module is not loading properly.
I am pasting below the last few lines from the serial port output of the virtual machine.
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2014-07-04 07:40:53 UTC (1404459653)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
Failed to execute /init
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
Pid: 1, comm: swapper Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? init_post+0xa8/0x100
[] ? kernel_init+0x2e6/0x2f7
[] ? child_rip+0xa/0x20
[] ? kernel_init+0x0/0x2f7
[] ? child_rip+0x0/0x20
Thanks in advance for any tips to resolve this issue.
Mathew
It looks like you might have a script or other program that is causing you to run out of inodes.
You can delete the instance without deleting the persistent disk (PD) and create a new VM with a higher capacity using your PD; however, if a script is causing this, you will end up with the same issue. It's always recommended to back up your PD before making any changes.
Run this command to find more info about your instance:
gcutil --project= getserialportoutput
If the issue continues, you can either
- Make a snapshot of your PD and create a copy of the PD from it, or
- Delete the instance without deleting the PD
Attach and mount the PD to another vm as a second disk, so you can access it to find what is causing this issue. Visit this link https://developers.google.com/compute/docs/disks#attach_disk for more information on how to do this.
Visit this page http://www.ivankuznetsov.com/2010/02/no-space-left-on-device-running-out-of-inodes.html for more information about inodes troubleshooting.
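Once the disk is mounted on another VM, inode exhaustion can be checked with df -i. The sketch below filters that output for filesystems above a chosen usage percentage. It assumes POSIX-format output from df -P -i and mount points without spaces; the function name and the default threshold of 90 are illustrative choices.

```shell
# Hypothetical helper: read `df -P -i` output on stdin and print the mount
# point and IUse% of every filesystem at or above the given percentage.
inode_hogs() {
  threshold="${1:-90}"
  awk -v threshold="$threshold" '
    NR > 1 {                      # skip the header line
      use = $5
      sub(/%$/, "", use)          # "100%" -> "100"
      if (use != "-" && use + 0 >= threshold) print $6, $5
    }
  '
}

# Usage:
#   df -P -i | inode_hogs 90
```

A filesystem that shows plenty of free space in df -h but appears here is the "no space left on device" case the linked article describes.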
Make sure the Allow HTTP traffic setting on the vm is still enabled.
Then see which network firewall you are using and its rules.
If your network is set up to use an ephemeral IP, it will be periodically released back. This will cause your IP to change over time. In that case, set it to static/reserved (on the networks page).
https://developers.google.com/compute/docs/instances-and-network#externaladdresses

Starting instance again after power off

How do I start an instance on GCE again after power off?
The instance shows TERMINATED, but has a PERSISTENT disk type.
If I use "add instance" with the same instance name, it asks me to
select a new image, offering only OS images, not my existing disk.
It then fails with
ERROR: RESOURCE_ALREADY_EXISTS: The resource XXXX already exists
Is there a way to start (or clone) a copy of the image once it is stopped?
Anything similar to AWS stop/start? I don't care about saving instance state or scratch space; I just want to start it, since my boot disk is stored and paid for.
Success: below is the stop/start procedure, assuming that $PROJECT, $INSTANCE, $DISK and $ZONE are set appropriately:
#--------- stop instance -----
#connect and shutdown
gcutil --project=$PROJECT ssh $INSTANCE
sudo shutdown -h now
# check
gcutil listinstances --project $PROJECT
#delete instance/keep boot disk , use -f to avoid confirmation
gcutil --project=$PROJECT deleteinstance $INSTANCE --nodelete_boot_pd
# check disks
gcutil listdisks --project=$PROJECT
#--------- start new instance -----
# launch instance using the existing disk (has to be in the same zone!)
gcutil --project=$PROJECT addinstance $INSTANCE --disk=$DISK,boot --zone=$ZONE --machine_type=n1-standard-1
#check that it's running
gcutil listinstances --project $PROJECT
You're on the right track. You just need to delete the existing TERMINATED instance before adding it again.
Even though the instance isn't running when it is TERMINATED, the resources (such as Persistent Disk) are still allocated to it.
Also, if this instance was created before December 5th (when Compute Engine went GA), you'll need to add a kernel to the disk or it won't boot. See the transition guide for details.
(For a temporary work around to upgrading the kernel, see this Q/A: My Google Compute Engine instances hang during boot using the v1 API)