"Internal 500 server error" after VM runs for a day or two.
This is the second time it has happened. I start the instance and install The Littlest JupyterHub
(see details below). I can log in via the external IP for a day, but then it stops
with an internal 500 error. I cannot SSH into the instance; the only alternative is to
create a new instance and redo everything. What is the problem?
I installed The Littlest JupyterHub on this instance using:
#!/bin/bash
curl https://raw.githubusercontent.com/jupyterhub/the-littlest-jupyterhub/master/bootstrap/bootstrap.py | sudo python3 - --admin master
I would recommend enabling serial console access on your instance [1].
You will also need to set up a local password for your user, following this documentation [2].
With these two steps done, you should be able to reconnect to your instance the next time you are locked out, by following [3].
You can then investigate what is going on inside the instance.
Start by verifying whether your application is still running, whether the SSH server is still running, and so on.
Frederic
[1] https://cloud.google.com/compute/docs/instances/interacting-with-serial-console#enable_instance_access
[2] https://cloud.google.com/compute/docs/instances/interacting-with-serial-console#setting_up_a_local_password
[3] https://cloud.google.com/compute/docs/instances/interacting-with-serial-console#connectserialconsole
I am using couchbase server 6.0.2 image from RedHat
https://access.redhat.com/containers/?tab=overview&get-method=registry-tokens#/registry.connect.redhat.com/couchbase/server
in openshift.
The Pod is running but does not react to http://localhost:8091. The Logs show the error shown below.
I have 3 questions:
Why is whoami failing in the entrypoint?
Why isn't the server responding on port 8091?
Does the couchbase server image require root permissions?
It seems the couchbase/server image is expecting to be run as root, then creates its own user couchbase and group couchbase.
At the end it runs an entrypoint script, which checks whether the user running the container is actually the user couchbase by executing the whoami command.
This is not the case if you just run it in OpenShift, as the container will be run as some "random" unprivileged user.
This leads to a set of consecutive failures:
Here you will find the evaluation that is done in entrypoint.sh.
Now the whoami command fails because there is no actual user, just the random UID mentioned above. That failure leaves the first part of the evaluation blank, which in turn results in an error.
This is a bug in the couchbase/server image, so if time allows you should contribute to a fix by opening an issue against that repo.
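To illustrate why the lookup fails, here is a minimal Python sketch. It is not the image's actual entrypoint logic. whoami resolves the effective UID through the passwd database, and an arbitrary UID with no passwd entry has no name to return. The helper names and the example UID 1000660000 are made up for illustration.

```python
import os
import pwd
from typing import Optional


def username_for_uid(uid: int) -> Optional[str]:
    """Return the passwd name for uid, or None if the UID has no passwd entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # Same situation in which whoami cannot find a name for the UID.
        return None


def runs_as(expected: str, uid: int) -> bool:
    """True only if uid resolves to the expected user name."""
    return username_for_uid(uid) == expected


if __name__ == "__main__":
    uid = os.geteuid()
    print(uid, username_for_uid(uid), runs_as("couchbase", uid))
```

Inside a container started with an arbitrary UID, the name comes back as None, so any comparison against "couchbase" is false even though nothing else is wrong with the process.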
I'm desperate for help here. I have a compute engine instance that hosts a lot of websites. These are the steps that I took:
Go to Compute Engine > Snapshots and take a snapshot of my instance
Click on the newly created snapshot and click Create Instance.
The new instance has all the configs of the current running instance
Then when I tried to access the new instance via SSH, it wouldn't work. Error message:
"Connection Failed
We are unable to connect to the VM on port 22. Learn more about possible causes of this issue."
Clicking on Learn more gets me to https://cloud.google.com/compute/docs/ssh-in-browser#ssherror
The instance is booting up and sshd is not yet running - Not sure how to check this
The instance is not running sshd - Not sure how to check this either
sshd is listening on a port other than the one you are connecting to - My current instance has SSH running on port 22, so I guess this is fine?
There is no firewall rule allowing SSH access on the port - Again, SSH works on my current instance, so I don't think it's the firewall, right?
The firewall rule allowing SSH access is enabled, but is not configured to allow connections from GCP Console services. - Same as above
The instance is shut down - Instance is still running.
Strange thing is if I create a fresh instance from scratch and then do the steps above to clone to a new instance then that new instance can be accessed normally via SSH.
Can anyone show me how to fix this, if possible? Or show me how to view the logs and check what went wrong? I tried to google it but got pretty confused by all the jargon and by where to find things. Sorry for the wall of text. Thanks
Edit #1: I got technical support from Google. The steps below might help someone else, but they didn't help me: when I reached step 7, I waited forever and never got to the login prompt.
1.) Go to the VM instances page and click on the Instance name of your VM.
2.) Click the Edit button at the top of the page.
3.) Under Custom metadata, click Add item.
4.) Set 'Key' to 'startup-script' and set 'Value' to this script:
#! /bin/bash
useradd -G sudo USERNAME
echo 'USERNAME:PASSWORD' | chpasswd
NOTE: change the value of USERNAME and PASSWORD to the name and password of your choice.
5.) Enable "Enable connecting to serial ports" by checking the box below the SSH button.
6.) Click Save and then click RESET on the top of the page. Wait for some time for the instance to reboot.
7.) Click on 'Connect to serial port' in the page. In the new window, you might need to wait a bit and press on Enter of your keyboard once; then, you should see the login prompt.
8.) Login using the USERNAME and PASSWORD you provided.
Note: Please do not share any of your password and username for your data security.
Since the steps above didn't help me, and the Google support representative looked at the log but didn't see anything wrong, she suggested debugging SSH by following this guide: https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh#use_your_disk_on_a_new_instance which I will do when I have time. I feel like I'm writing an essay. Will keep you posted.
The troubleshooting steps that you can follow are:
Use the serial console to view your instance logs and check whether the new instance you created from the snapshot failed to start to the appropriate run level where the ssh daemon would get started. If sshd was not started you would not have ssh access to your instance.
You can try restarting the instance if it doesn’t affect production and try to gain ssh access again. Might be that some issue prevented the instance from starting up properly and restarting it could fix it.
You can try creating another VM instance from the snapshot in case the previous instance wasn’t created properly.
If creating a new VM instance from the snapshot doesn’t fix the issue, it might be that the snapshot itself wasn’t created properly. You can read this documentation guide, section Understanding snapshot best practices, and try creating another snapshot and VM instances from it.
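Before going through the steps above, it can help to tell "nothing is listening on port 22" apart from "packets are being dropped". The sketch below is a rough probe in Python, not an official tool. The function name and the three-way classification are my own. The interpretation is only a heuristic: "refused" usually means the host is reachable but no sshd is listening on that port, while "timeout" more often points to a firewall rule or an instance that did not finish booting.

```python
import socket


def probe_port(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error"
```

You would call it as probe_port("EXTERNAL_IP"), replacing EXTERNAL_IP with the address of the instance created from the snapshot.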
I had the same problem and after a lot of searching, I found an answer from user Peripheral from ServerFault that worked for me.
I found the fix for me. A recent update has a known issue where it removes the default gateway from the iptables. To fix it, I have to go to the instance and select Edit. Scroll down, and under Custom Metadata put the following:
key: startup-script
value: route add default gw <gatewayIP> eth0
Save and restart the VM.
Source
All credits to him/her, just want to share to help others find their solution faster.
I had the same issue. I eventually figured out that it was because I had attached a persistent disk and added an entry to the /etc/fstab file. This entry is supposed to automatically mount the attached disk when the instance restarts.
However, when I created a snapshot of the boot disk, I didn't remove the /etc/fstab entry. So an instance created from this snapshot will always hit a boot error, because the system tries to mount a disk that is not attached.
This information is present in the documentation
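A commonly suggested mitigation is the nofail mount option, which lets the boot continue when the device is missing. As a rough sketch (the function name and the sample entries are hypothetical, and the parsing is deliberately simple), the following Python lists non-root /etc/fstab entries that lack nofail, so you can review them before taking a snapshot:

```python
from typing import List, Tuple


def entries_without_nofail(fstab_text: str) -> List[Tuple[str, str]]:
    """Return (device, mount_point) for non-root fstab entries lacking 'nofail'."""
    risky = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        device, mount_point, _fstype, options = fields[:4]
        if mount_point == "/":
            continue  # the boot disk itself is always present
        if "nofail" not in options.split(","):
            risky.append((device, mount_point))
    return risky
```

On the instance you would pass it the contents of /etc/fstab, for example entries_without_nofail(open("/etc/fstab").read()).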
I created an app called "world" following the instructions from:
https://blog.openshift.com/12-tips-for-hosting-wordpress-on-openshift/.
It's a hosted Wordpress blog, with PHP 5.4 scalable up to 1GB, with a Web Load Balancer and MySQL 5.5.
Every time I try to check the space used, I get the same error.
rhc show-app world --gears quota
Unable to connect to gear 54d48383fcf933f91f0000aa@54d48383fcf933f91f0000aa-laurapons.rhcloud.com
Unable to connect to gear 54d48383fcf933f91f0000a9@world-laurapons.rhcloud.com
Gear                     Cartridges          Used  Limit
------------------------ ------------------- ----- -----
54d48383fcf933f91f0000aa mysql-5.5           error 1 GB
54d48383fcf933f91f0000a9 haproxy-1.4 php-5.4 error 1 GB
I tried to restart the application (using restart and stop&start commands) but nothing seems to work.
I am also facing some other connection problems (probably related to the same issue):
I have the same problem when trying to clone the application with git clone:
ssh: connect to host world-laurapons.rhcloud.com port 22: Bad file number
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
And also with the rhc port-forward world
I copied the URL for git clone from the OpenShift Online dashboard, and I can open the WordPress blog and see all the information, but somehow I'm unable to access the data.
I have already created a default Public Key and 2 authorisations (one to access through the browser and the other to access through RHC)...
What should I try?
How can I get the usage?
Do I need to set up anything else?
I am stuck... any suggestion?
It sounds like your SSH key is not working properly. Make sure you have installed your keys and that they work. Try running 'rhc setup'. If that still doesn't work, try
ssh -vvv 54d48383fcf933f91f0000a9@world-laurapons.rhcloud.com
and look at the output.
You can also try
ssh -i /path/to/your/ssh.key 54d48383fcf933f91f0000a9@world-laurapons.rhcloud.com
and see if that works (the -i option specifies which SSH key to use).
With some Ruby versions, rhc has issues with Pageant (PuTTY). I closed Pageant, ran the rhc command again, and it worked.
I have searched all over Google for this issue but found no answer yet, so I thought I'd ask here.
I have a Google Compute Engine instance, and I had a PuTTY SSH connection that worked flawlessly. But after I formatted my PC, everything went wrong.
I installed gcloud and went through the whole SSH setup process again (config-ssh, adding the SSH key to the key list, and trying to connect). I was also trying to connect as my old user, after I realized that I had typed a name different from my Windows user name. Suddenly I got the "No supported authentication" message. So I thought something was wrong with the SSH keys, but then I realized that I cannot connect as my user even through the in-browser SSH window from Google; the connection is always stuck on trying to connect until it times out.
I would greatly appreciate any help :)
gcloud compute ssh currently has a known problem and might not work on Windows.
Here's a workaround until we fix it: run "gcloud compute ssh INSTANCE --dry-run". This will output the command it tries to execute.
Copy that command. You can either add -W flag to it and run it, or replace ssh.exe with ssh-term.exe and remove the -o flags.
If gcloud is installed in a place like Program Files, you might also need to wrap the path in double quotes.
First of all, run the following command (replacing the word in capital letters) which will ensure that your SSH key is created if it was not created before: gcloud compute ssh INSTANCE
Then, follow these steps to add your SSH key to your project and SSH into your instance:
1- Copy the content of C:\Users\<username>\.ssh\google_compute_engine.pub (the path might differ depending on the Windows version) into the project metadata (Developers Console -> PROJECT -> Compute -> Metadata -> SSH keys -> Edit -> Add key).
If you want to log in as a different user, you can do so by changing the last word of the pasted text: <username>@<hostname>
2- Configure PuTTY. Go to Connection -> SSH -> Auth -> Browse, select your PuTTY SSH key (which should be located at C:\Users\<username>\.ssh\google_compute_engine.ppk), and try to SSH into the instance.
3- If it doesn't work, remove the instance metadata because the instance metadata overrides the project metadata. To do that, go to Compute-> Compute Engine-> INSTANCE -> SSH keys -> Edit -> Click on every ‘x’ and save the changes.
Regarding your issue trying to access using the SSH button in the Developers Console, I’d reboot the instance if it’s not in production because there is a script that must be working properly in order to access from there: /usr/bin/python /usr/share/google/google_daemon/manage_accounts.py --daemon
I hope it helps.
I'm trying to execute the command in the Windows console:
C:\SphinxSearch\bin\indexer --all --config C:\SphinxSearch\sphinx.conf
But I get an error:
ERROR: index 'indexname': sql_connect: Can't create TCP/IP socket
(10093) (DSN=mysql://root:*@localhost:3306/test).
The data source is MySQL. Before the server restart, everything worked fine.
How can I fix it?
I'm having the same error 10093. It's a windows error code by the way. In my case it occurs when trying to run the indexer through the system account via a scheduled task. If I'm running it directly as administrator, there's not a problem.
According to the site above:
Either your application hasn't called WSAStartup(), or WSAStartup() failed, or--possibly--you are accessing a socket which the current active task does not own (i.e. you're trying to share a socket between tasks).
In my case I'm thinking it might be the last one, some security problem due to user SYSTEM being used in my scheduled task. I was able to solve it by using my admin user instead: in the scheduled task, I set to use my local admin account with the option to "Run when user is logged on or not" and "Do not store password". I've also checked "Run with highest privileges". This seems to have done the trick as now my indexes are rotating on schedule.