I was trying to add an IP# to my Google Compute Engine (RHEL7) instance, but I typed the invocation wrong:
sudo ifconfig eth0 1.2.3.4
The existing IP# on eth0 was 1.2.3.3, so that invocation changed my existing IP# to one that isn't known to anything else. And so I lost all connections (ssh, http, even ping) to the instance.
How do I recover from this mistake? Is there a gcloud or GCP Console method I can use, since I can't connect directly to the instance anymore.
Since the ifconfig was invoked from a shell, not reconfigured in any startup scripts (or anywhere else), just resetting the instance will reboot it and cause it to config its eth0 according to its startup scripts:
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
<instance-name> <instance-zone> <machine-type> <preemptible> <bad-internal-ip#> <external-ip#>
$ gcloud compute instances reset <instance-name>
For the following instance:
- [<instance-name>]
choose a zone:
[1] asia-east1-a
[2] asia-east1-b
[...]
Please enter your numeric choice: <N-of-instance-zone>
Updated [https://www.googleapis.com/compute/v1/projects/<project-name>/zones/<instance-zone>/instances/<instance-name].
$ gcloud compute instances list
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
<instance-name> <instance-zone> <machine-type> <preemptible> <default-internal-ip#> <external-ip#> RUNNING
After you enter your numeric zone it can take several seconds or a longer (but probably not more than 5 minutes) for the instance to restart.
Look around in the cloud platform console. You usually can change the external IP, then go the long way around - provided its instanced.
Related
I read all I could find, but documentation on this scenario is scant or unclear for podman. I have the following (contrived) ROOTLESS podman setup:
pod-1 name: pod1
Container names in pod1:
p1c1 -- This is also it's assigned hostname within pod1
p1c2 -- This is also it's assigned hostname within pod1
p1c3 -- This is also it's assigned hostname within pod1
pod-2 name: pod2
Container names in pod2:
p2c1 -- This is also it's assigned hostname within pod2
p2c2 -- This is also it's assigned hostname within pod2
p2c3 -- This is also it's assigned hostname within pod2
I keep certain containers in different pods specifically to avoid port conflict, and to manage containers as groups.
QUESTION:
Give the above topology, how do I communicate between, say, p1c1 and p2c1? In other words, step-by-step, what podman(1) commands do I issue to collect the necessary addressing information for pod1:p1c1 and pod2:p2c1, and then use that information to configure applications in them so they can communicate with one another?
Thank you in advance!
EDIT: For searchers, additional information can be found here.
Podman doesn't have anything like the "services" concept in Swarm or Kubernetes to provide for service discovery between pods. Your options boil down to:
Run both pods in the same network namespace, or
Expose the services by publishing them on host ports, and then access them via the host
For the first solution, we'd start by creating a network:
podman network create shared
And then creating both pods attached to the shared network:
podman pod create --name pod1 --network shared
podman pod create --name pod2 --network shared
With both pods running on the same network, containers can refer to
the other pod by name. E.g, if you were running a web service in
p1c1 on port 80, in p2c1 you could curl http://pod1.
For the second option, you would do something like:
podman pod create --name pod1 -p 1234:1234 ...
podman pod create --name pod2 ...
Now if p1c1 has a service listening on port 1234, you can access that from p2c1 at <some_host_address>:1234.
If I'm interpreting option 1 correctly, if the applications in p1c1 and p2c1 both use, say, port 8080; then there won't be any conflict anywhere (either within the pods and the outer host) IF I publish using something like this: 8080:8080 for app in p1c1 and 8081:8080 for app in p2c1? Is this interpretation correct?
That's correct. Each pod runs with its own network namespace
(effectively, it's own ip address), so services in different pods can
listen on the same port.
Can the network (not ports) of a pod be reassigned once running? REASON: I'm using podman-compose(1), which creates things for you in a pod, but I may need to change things (like the network assignment) after the fact. Can this be done?
In general you cannot change the configuration of a pod or a
container; you can only delete it and create a new one. Assuming that
podman-compose has relatively complete support for the
docker-compose.yaml format, you should be able to set up the network
correctly in your docker-compose.yaml file (you would create the
network manually, and then reference it as an external network in
your compose file).
Here is a link to the relevant Docker documentation. I haven't tried this myself with podman.
Accepted answer from #larsks will only work for rootful containers. In other words, run every podman commands with sudo prefix. (For instance when you connect postgres container from spring boot application container, you will get SocketTimeout exception)
If two containers will work on the same host, then get the ip address of the host, then <ipOfHost>:<port>. Example: 192.168.1.22:5432
For more information you can read this blog => https://www.redhat.com/sysadmin/container-networking-podman
Note: The above solution of creating networks, only works in rootful mode. You cannot do podman network create as a rootless user.
"Internal 500 server error" after VM runs for a day or two.
This is the second time it has happened, I start the instance, install littlest Jupyterhub
(see details below). I can login to the external ip, for a day, but then it stops
with internal 500 error. I cannot ssh or get into the instance, only alternate is to
create a new instance and re-do. What is the problem?
I have installed littlest jupyterhub using on this instance, using
#!/bin/bash
curl https://raw.githubusercontent.com/jupyterhub/the-littlest-jupyterhub/master/bootstrap/bootstrap.py | sudo python3 - --admin master
I would recommend you enable access on your instance to the serial console [1].
You will also need to setup a password for your user following this documentation [2].
With these two steps done, you should be able to reconnect to your instance once you are locked out like you mentioned by following this [3].
You should then be able to investigate what is going on in the instance.
Then try to verify if your application is still running, if the SSH server is still running etc.
Frederic
[1] https://cloud.google.com/compute/docs/instances/interacting-with-serial-console#enable_instance_access
[2] https://cloud.google.com/compute/docs/instances/interacting-with-serial-console#setting_up_a_local_password
[3] https://cloud.google.com/compute/docs/instances/interacting-with-serial-console#connectserialconsole
I have a few instances running in Google Compute engine:
NAME ZONE MACHINE_TYPE PREEMPTIBLE INTERNAL_IP EXTERNAL_IP STATUS
inst-dns2 xx-xxxx-x f1-micro 10.240.0.4 x.x.x.x RUNNING
inst-st0 xx-xxxx-x f1-micro 10.240.0.2 x.x.x.x RUNNING
inst-www0 xx-xxxx-x n1-standard-2 10.240.0.3 x.x.x.x RUNNING
inst-dns3 xx-xxxx-x f1-micro 10.240.0.5 x.x.x.x RUNNING
As you can see, they all have IPs in the same subnet. I have been using the built-in default network since I started.
NAME MODE IPV4_RANGE GATEWAY_IPV4
default legacy 10.240.0.0/16 10.240.0.1
I still have the default-allow-internal firewall rule:
NAME NETWORK SRC_RANGES RULES SRC_TAGS TARGET_TAGS
default-allow-internal default 10.240.0.0/16 tcp:1-65535,udp:1-65535,icmp
Yet, none of my instances can ping one another.
The docs state, "In the default network, by default, all virtual machine instances within a network can communicate with each other, because of the default firewall rules described above."
So, I am consfused. I thought I could do this without having to create custom networks and all that.
Any input would be much appreciated.
I resolved this by using official FreeBSD instance images instead of my own pre-built ones. Beware: the official images have their own issues, such as problems system date/time. Do your research and test before going into production.
Since google Compute engine does not provides internal DNS i created 2 centos bind machines which will do the resolving for the machines on GCE and forward the resolvings over vpn to my private cloud and vice versa.
as the google cloud help docs suggests you can have this kind of scenario. and edit the resolv.conf on each instance to do the resolving.
What i did was edit the ifcg-eth0 to disable the PEERDNS and in /etc/resolv.conf
i added the search domain and top 2 nameservrs my instances.
now after one instance gets rebooted..it wont start again because its searching for the metadata.google.internal domain
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
What is the best practice in this kind of scenarios?
ty
Also i need the internal DNS for to do the poor's man round-robin failover, since GCE does not provides internal balancers.
As mentioned at https://cloud.google.com/compute/docs/networking:
Each instance's metadata server acts as a DNS server. It stores the DNS entries for all network IP addresses in the local network and calls Google's public DNS server for entries outside the network. You cannot configure this DNS server, but you can set up your own DNS server if you like and configure your instances to use that server instead by editing the /etc/resolv.conf file.
So you should be able to just use 169.254.169.254 for your DNS server. If you need to define external DNS entries, you might like Cloud DNS. If you set up a domain with Cloud DNS, or any other DNS provider, the 169.254.169.254 resolver should find it.
If you need something more complex, such as customer internal DNS names, then your own BIND server might be the best solution. Just make sure that metadata.google.internal. resolves to 169.254.169.254.
OK, I just ran in to this.. but unfortunately there was no timeout after 30 minutes that got it working. Fortunatly nelasx had correctly diagnosed it, and given the fix. I'm adding this to give the steps I had to take based on his excellent question and commented answer. I've just pulled the info I had to gather together in one place, to get to a solution.
Symptoms: on startup of the google instance - getting connection refused
After inspecting serial console output, will see:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
You could try waiting, didn't work for me, and inspection of https://github.com/GoogleCloudPlatform/compute-image-packages/blob/master/google-startup-scripts/usr/share/google/onboot
# Failed to resolve host or connect to host. Retry indefinitely.
6|7) sleep 1.0
log "Waiting for metadata server, attempt ${count}"
Led me to believe that will not work.
So, the solution was to fiddle with the disk, to add in nelasx's solution:
"edit ifcfg-eth and change PEERDNS=no edit /etc/resolv.conf and put on top your nameservers + search domain edit /etc/hosts and add: 169.254.169.254 metadata.google.internal"
To do this,
Best to create a snapshot backup before you start in case it goes awry
Uncheck "Delete boot disk when instance is deleted" for your instance
Delete the instance
Create a micro instance
Mount the disk
sudo ls -l /dev/disk/by-id/* # this will give you the name of the instances
sudo mkdir /mnt/new
sudo mount /dev/disk/by-id/scsi-0Google_PersistentDisk_instance-1-part1 /mnt/new
where instance-1 will be changed as per your setup
Go in an edit as per nelasx's solution - idiot trap I fell for - use a relative path - don't just sudo vi /etc/hosts use /mnt/new/etc/hosts - that cost me 15 more minutes as I had to go through the: got depressed, scratched head, kicked myself cycle.
Delete the debug instance, ensuring your attached disk delete option is unchecked
Create a new instance matching your original with the edited disk as your boot disk and fire it up.
After creating the instance, I can login using gcutil or ssh. I tried copy/paste from the ssh link listed at the bottom of the instance and get the same error message.
The permission denied error probably indicates that SSH private key authentication has failed. Assuming that you're using an image derived from the Debian or Centos images recommended by gcutil, it's likely one of the following:
You don't have any ssh keys loaded into your ssh keychain, and you haven't specified a private ssh key with the -i option.
None of your ssh keys match the entries in .ssh/authorized_keys for the account you're attempting to log in to.
You're attempting to log into an account that doesn't exist on the machine, or attempting to log in as root. (The default images disable direct root login – most ssh brute-force attacks are against root or other well-known accounts with weak passwords.)
How to determine what accounts and keys are on the instance:
There's a script that runs every minute on the standard Compute Engine Centos and Debian images which fetches the 'sshKeys' metadata entry from the metadata server, and creates accounts (with sudoers access) as necessary. This script expects entries of the form "account:\n" in the sshKeys metadata, and can put several entries into authorized_keys for a single account. (or create multiple accounts if desired)
In recent versions of the image, this script sends its output to the serial port via syslog, as well as to the local logs on the machine. You can read the last 1MB of serial port output via gcutil getserialportoutput, which can be handy when the machine isn't responding via SSH.
How gcutil ssh works:
gcutil ssh does the following:
Looks for a key in $HOME/.ssh/google_compute_engine, and calls ssh-keygen to create one if not present.
Checks the current contents of the project metadata entry for sshKeys for an entry that looks like ${USER}:$(cat $HOME/.ssh/google_compute_engine.pub)
If no such entry exists, adds that entry to the project metadata, and waits for up to 5 minutes for the metadata change to propagate and for the script inside the VM to notice the new entry and create the new account.
Once the new entry is in place, (or immediately, if the user:key was already present) gcutil ssh invokes ssh with a few command-line arguments to connect to the VM.
A few ways this could break down, and what you might be able to do to fix them:
If you've removed or modified the scripts that read sshKeys, the console and command line tool won't realize that modifying sshKeys doesn't work, and a lot of the automatic magic above can get broken.
If you're trying to use raw ssh, it may not find your .ssh/google_compute_engine key. You can fix this by using gcutil ssh, or by copying your ssh public key (ends in .pub) and adding to the sshKeys entry for the project or instance in the console. (You'll also need to put in a username, probably the same as your local-machine account name.)
If you've never used gcutil ssh, you probably don't have a .ssh/google_compute_engine.pub file. You can either use ssh-keygen to create a new SSH public/private keypair and add it to sshKeys, as above, or use gcutil ssh to create them and manage sshKeys.
If you're mostly using the console, it's possible that the account name in the sshKeys entry doesn't match your local username, you may need to supply the -l argument to SSH.
Ensure that the permissions on your home directory and on the home directory of the user on the host you're connecting to are set to 700 ( owning user rwx only to prevent others seeing the .ssh subdirectory ).
Then ensure that the ~/.ssh directory is also 700 ( user rwx ) and that the authorized_keys is 600 ( user rw ) .
Private keys in your ~/.ssh directory should be 600 or 400 ( user rw or user r )
I was facing this issue for long time. Finally it was issue of ssh-add. Git ssh credentials were not taken into consideration.
Check following command might work for you:
ssh-add
I had the same problem and for some reason The sshKeys was not syncing up with my user on the instance.
I created another user by adding --ssh_user=anotheruser to gcutil command.
The gcutil looked like this
gcutil --service_version="v1" --project="project" --ssh_user=anotheruser ssh --zone="us-central1-a" "inst1"
I just experienced a similar message [ mine was "Permission denied (publickey)"] after connecting to a compute engine VM which I just created. After reading this post, I decided to try it again.
That time it worked. So i see 3 possible reasons for it working the second time,
connecting the second time resolves the problem (after the ssh key was created the first time), or
perhaps trying to connect to a compute engine immediately after it was created could also cause a problem which resolves itself after a while, or
merely reading this post resolves the problem
I suspect the last is unlikely :)
I found this error while connecting ec2 instance with ssh.
and it comes if i write wrong user name.
eg. for ubuntu I need to use ubuntu as user name
and for others I need to use ec2-user.
You haven't accepted an answer, so here's what worked for me in PuTTY:
Without allowing username changes, i got this question's subject as error on the gateway machine.
You need to follow this instructions
https://cloud.google.com/compute/docs/instances/connecting-to-instance#generatesshkeypair
If get "Permission denied (publickey)." with the follow command
ssh -i ~/.ssh/my-ssh-key [USERNAME]#[IP_ADDRESS]
you need to modify the /etc/ssh/sshd_config file and add the line
AllowUsers [USERNAME]
Then restart the ssh service with
service ssh restart
if you get the message "Could not load host key: /etc/ssh/ssh_host_ed25519_key" execute:
ssh-keygen -A
and finally restart the ssh service again.
service ssh restart
I followed everything from here:
https://cloud.google.com/compute/docs/instances/connecting-to-instance#generatesshkeypair
But still there was an error and SSH keys in my instance metadata wasn't getting recognized.
Solution: Check if your ssh key has any new-line. When I copied my public key using cat, it added into-lines into the key, thus breaking the key. Had to manually check any line-breaks and correct it.
The trick here is to use the -C (comment) parameter to specify your GCE userid. It looks like Google introduced this change last in 2018.
If the Google user who owns the GCE instance is myname#gmail.com (which you will use as your login userid), then generate the key pair with (for example)
ssh-keygen -b521 -t ecdsa -C myname -f mykeypair
When you paste mykeypair.pub into the instance's public key list, you should see "myname" appear as the userid of the key.
Setting this up will let you use ssh, scp, etc from your command line.
Add ssh public key to Google cloud
cat ~/.ssh/id_rsa.pub
go and click your VM instances
edit VM instances
add ssh public key(from id_rsa.pub) in SSH keys area
ssh login from Git bash on your computer
ssh -i ~/.ssh/id_rsa tiennt#x.y.z.120