Google Cloud Compute Engine not Migrated Before Power off (Maintenance?) - google-compute-engine

I was under the impression, according to the docs here, that if your VM was set to "Migrate VM Instance" for "On host maintenance", there would be no downtime.
I have 2 issues:
1) I had a VM go down this weekend and here is what the auth.log shows:
Sep 10 20:28:07 comp_engine_name systemd-logind[2571]: Power key pressed.
Sep 10 20:28:07 comp_engine_name systemd-logind[2571]: Powering Off...
Sep 10 20:28:07 comp_engine_name systemd-logind[2571]: System is powering down.
Sep 10 20:28:07 comp_engine_name sshd[3291]: Received signal 15; terminating.
Sep 10 20:28:07 comp_engine_name systemd: pam_unix(systemd-user:session): session closed for user username
My Google Cloud Platform logs for Compute Engine show that at 20:28 my instance received a "GCE_API_CALL" event of type "compute.instances.stop". The user for this event was "", which is not very helpful.
2) The instance was not automatically restarted.
Here is a screenshot of my relevant settings for this VM.
I did not shut it down myself, and there are no scripts running on it that control power or restarts. I also do not see evidence that my security was compromised (although I admit I'm not an expert).
Am I wrong, or should neither of these things have happened?
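For reference, the instance's maintenance and restart settings, and the stop operation itself, can be inspected from the CLI; a minimal sketch, assuming gcloud is configured (INSTANCE_NAME and ZONE are placeholders):
gcloud compute instances describe INSTANCE_NAME --zone ZONE \
    --format="value(scheduling.onHostMaintenance,scheduling.automaticRestart)"
gcloud compute operations list --zones ZONE \
    --filter="operationType=stop AND targetLink:INSTANCE_NAME"
The operations listing should show the same compute.instances.stop event and, where available, which account issued it.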

Related

Cannot connect to localhost:3306 on MySQL Workbench on Mac

This is something that has recently started happening. I don't know if it is due to updating my macOS; I am on macOS Catalina 10.15.4.
When I try to test my connection to localhost on 3306 I get the following:
Failed to Connect to MySQL at localhost:3306 with user root
If I go into System Settings I see the following:
If I try to start it, the colored LEDs briefly go GREEN before going back to RED.
From the command line I have tried other solutions I have seen such as:
sudo /usr/local/mysql/support-files/mysql.server start
I get the following error:
Starting MySQL
. ERROR! The server quit without updating PID file (/usr/local/mysql/data/lester2s-Mini.pid).
Any thoughts on how I can get this working again?
Here is what I found at /var/log/system.log:
May 7 13:28:06 lester2s-Mini xpcproxy[3576]: libcoreservices: _dirhelper_userdir: 557: bootstrap_look_up returned (ipc/send) invalid destination port
May 7 13:28:06 lester2s-Mini com.apple.xpc.launchd[1] (com.oracle.oss.mysql.mysqld[3576]): Service exited with abnormal code: 2
May 7 13:28:06 lester2s-Mini com.apple.xpc.launchd[1] (com.oracle.oss.mysql.mysqld): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
May 7 13:28:12 lester2s-Mini com.apple.xpc.launchd[1] (com.oracle.oss.mysql.mysqld): This service is defined to be constantly running and is inherently inefficient.
May 7 13:28:12 lester2s-Mini xpcproxy[3582]: libcoreservices: _dirhelper_userdir: 557: bootstrap_look_up returned (ipc/send) invalid destination port
May 7 13:28:12 lester2s-Mini com.apple.xpc.launchd[1] (com.oracle.oss.mysql.mysqld[3582]): Service exited with abnormal code: 2
May 7 13:28:12 lester2s-Mini com.apple.xpc.launchd[1] (com.oracle.oss.mysql.mysqld): Service only ran for 0 seconds. Pushing respawn out by 10 seconds.
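One low-risk check before anything else is MySQL's own error log in the data directory, since "Service exited with abnormal code: 2" suggests mysqld is exiting on its own. A sketch, assuming the standard DMG install under /usr/local/mysql:
sudo ls -l /usr/local/mysql/data/            # the data directory is normally owned by _mysql
sudo sh -c 'tail -n 50 /usr/local/mysql/data/*.err'
# A common cause after an OS upgrade is wrong ownership of the data directory;
# if the error log points that way, this often fixes it:
sudo chown -R _mysql:_mysql /usr/local/mysql/data
sudo /usr/local/mysql/support-files/mysql.server start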

OpenShift Backup - Server Not Reachable

I had an OpenShift 2 Starter account where my application was running.
OpenShift 2 has been shut down, and now I have received an email asking me to migrate to OpenShift 3.
But I don't have a backup of the application.
I am getting the following errors.
When I run rhc save-snapshot myapp I get the following error:
Error in trying to save snapshot. You can try to save manually by
running: ssh 54f03dbd4382ec9101000159@myapp-myapps.rhcloud.com
'snapshot' > myapp.tar.gz
If I try to ssh to the application, the connection gets closed.
ssh 54f03dbd4382ec9101000159@myapp-myapps.rhcloud.com
Connection to myapp-myapps.rhcloud.com closed.
If I try to restart the application from the console, I get the following error:
could not open session
could not open session
could not open session
Failed to execute: 'control restart' for /var/lib/openshift/54f03dbd4382ec9101000159/mysql
Failed to execute: 'control restart' for /var/lib/openshift/54f03dbd4382ec9101000159/phpmyadmin
Failed to execute: 'control restart' for /var/lib/openshift/54f03dbd4382ec9101000159/php
EDIT: I get the following error in the browser when I try to open my site.
Proxy Error
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request GET /.
Reason: Error reading from remote server
Apache/2.2.15 (Red Hat) Server at www.mydomain.com Port 80
Need your suggestions. Thanks.
There is a new post on the OpenShift blog:
Updated October 3, 2017
We understand how important your data is, and
we have made a one-time exception to allow you to access your
OpenShift Online v2 data. You have until October 5, 2017 at 4:00 PM
UTC to perform a backup of your application. If you have not used it
before, you can download the rhc tool here.
So you can perform your backup until 2017/10/05.
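For reference, a minimal backup sequence with the rhc tool looks roughly like this (a sketch; it assumes rhc is already installed and configured via rhc setup, and that the app is named myapp):
rhc apps                   # list applications and their UUIDs
rhc snapshot save myapp    # saves a snapshot, by default to myapp.tar.gz in the current directory
# Manual fallback over ssh, as shown in the error message above:
# ssh <app-uuid>@myapp-myapps.rhcloud.com 'snapshot' > myapp.tar.gz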
Reading around (I don't remember exactly where), I found that paid accounts keep working until December 31st, so I upgraded to Bronze and was able to restart the service and back it up. I don't know whether that was because of the upgrade or because some issue was fixed.

Google Compute Engine: Internal DNS server and issues with resolving

Since Google Compute Engine does not provide an internal DNS service, I created two CentOS BIND machines that do the resolving for the machines on GCE and forward queries over VPN to my private cloud, and vice versa.
As the Google Cloud help docs suggest, you can have this kind of scenario and edit resolv.conf on each instance to do the resolving.
What I did was edit ifcfg-eth0 to disable PEERDNS, and in /etc/resolv.conf
I added the search domain and my two nameservers at the top.
Now, after one instance gets rebooted, it won't start again because it is searching for the metadata.google.internal domain:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
What is the best practice in this kind of scenario?
Thanks.
Also, I need the internal DNS to do poor man's round-robin failover, since GCE does not provide internal load balancers.
As mentioned at https://cloud.google.com/compute/docs/networking:
Each instance's metadata server acts as a DNS server. It stores the DNS entries for all network IP addresses in the local network and calls Google's public DNS server for entries outside the network. You cannot configure this DNS server, but you can set up your own DNS server if you like and configure your instances to use that server instead by editing the /etc/resolv.conf file.
So you should be able to just use 169.254.169.254 for your DNS server. If you need to define external DNS entries, you might like Cloud DNS. If you set up a domain with Cloud DNS, or any other DNS provider, the 169.254.169.254 resolver should find it.
If you need something more complex, such as custom internal DNS names, then your own BIND server might be the best solution. Just make sure that metadata.google.internal. resolves to 169.254.169.254.
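As an illustration, the client-side configuration could look roughly like this (a sketch only; 10.0.0.2/10.0.0.3 and example.internal are placeholders for your BIND servers and search domain):
echo '169.254.169.254 metadata.google.internal metadata' | sudo tee -a /etc/hosts
sudo tee /etc/resolv.conf >/dev/null <<'EOF'
search example.internal
nameserver 10.0.0.2
nameserver 10.0.0.3
EOF
# On CentOS, also set PEERDNS=no in /etc/sysconfig/network-scripts/ifcfg-eth0
# so the DHCP client does not overwrite /etc/resolv.conf on reboot.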
OK, I just ran into this, but unfortunately there was no timeout after 30 minutes that got it working. Fortunately nelasx had correctly diagnosed it and given the fix. I'm adding this to give the steps I had to take, based on his excellent question and commented answer. I've just pulled the info I had to gather together in one place to get to a solution.
Symptoms: on startup of the Google instance, connections are refused.
After inspecting the serial console output, you will see:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
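If SSH is refused, the serial console output can be pulled from the CLI; a small sketch (the zone is a placeholder):
gcloud compute instances get-serial-port-output instance-1 --zone ZONE | tail -n 50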
You could try waiting; that didn't work for me, and this snippet from https://github.com/GoogleCloudPlatform/compute-image-packages/blob/master/google-startup-scripts/usr/share/google/onboot
# Failed to resolve host or connect to host. Retry indefinitely.
6|7) sleep 1.0
log "Waiting for metadata server, attempt ${count}"
led me to believe it never will.
So, the solution was to fiddle with the disk and add in nelasx's fix:
edit ifcfg-eth0 and change PEERDNS=no
edit /etc/resolv.conf and put your nameservers + search domain on top
edit /etc/hosts and add: 169.254.169.254 metadata.google.internal
To do this,
Best to create a snapshot backup before you start in case it goes awry
Uncheck "Delete boot disk when instance is deleted" for your instance
Delete the instance
Create a micro instance
Attach the old boot disk to the micro instance and mount it
sudo ls -l /dev/disk/by-id/* # this will give you the device names of the attached disks
sudo mkdir /mnt/new
sudo mount /dev/disk/by-id/scsi-0Google_PersistentDisk_instance-1-part1 /mnt/new
where instance-1 will be changed as per your setup
Go in and edit as per nelasx's solution. An idiot trap I fell for: edit the files under the mount point, i.e. don't just sudo vi /etc/hosts, use sudo vi /mnt/new/etc/hosts. That cost me 15 more minutes as I had to go through the got-depressed, scratched-head, kicked-myself cycle.
Delete the debug instance, ensuring your attached disk delete option is unchecked
Create a new instance matching your original with the edited disk as your boot disk and fire it up.
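Put together, the edits on the mounted disk amount to something like this (a sketch; paths assume the CentOS image mounted at /mnt/new as above):
sudo sed -i 's/^PEERDNS=.*/PEERDNS=no/' /mnt/new/etc/sysconfig/network-scripts/ifcfg-eth0
echo '169.254.169.254 metadata.google.internal metadata' | sudo tee -a /mnt/new/etc/hosts
sudo vi /mnt/new/etc/resolv.conf    # put your nameservers and search domain at the top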

Intermittent 2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0" to CloudSQL

I set up an old Django app on a new GCE instance on Sunday and pointed it at a new CloudSQL instance with imported data there. This code and data have run successfully over the past few years on a variety of dedicated hosting setups, on EC2, and on EC2+RDS.
Since Sunday I have had intermittent reports of 2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0" from the app. In particular, today it happened in two bursts of 3, separated by about 7 hours.
I panicked during the earlier outages and restarted both the app and the CloudSQL instance, which did the trick. However, the later ones today righted themselves after a few minutes.
I've never encountered this error before while working with MySQL, and searching on the error gives results related to people who have general access problems to the DB.
On the GCE side the only difference I can think of from previous setups is that it is using Google's out-of-the-box Debian image instead of Ubuntu 12.04. On the MySQL side I have no idea as I've successfully run this on both MySQL 5.x and MariaDB.
Is there any way of figuring out why this is happening and fixing it?
Thanks.
Have you tried changing the keepalive settings for TCP connections? GCE has a firewall rule that drops idle TCP connections after 10 minutes:
https://cloud.google.com/compute/docs/troubleshooting#communicatewithinternet
You can check the current value of 'tcp_keepalive_time':
cat /proc/sys/net/ipv4/tcp_keepalive_time
And change it to 60 seconds:
vi /etc/sysctl.conf
# Add this line
net.ipv4.tcp_keepalive_time = 60
# Reload Sysctl interface
sudo /sbin/sysctl --load=/etc/sysctl.conf
You might need to restart the Django server to pick up the new keepalive settings.
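If you want to try the new value immediately without reloading the file, it can also be set at runtime; a quick sketch:
sudo sysctl -w net.ipv4.tcp_keepalive_time=60
cat /proc/sys/net/ipv4/tcp_keepalive_time    # should now print 60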
Note: If this problem was limited to yesterday (18/11/2014) and your Cloud SQL instances are located in EU, you might have been affected by this:
https://groups.google.com/forum/#!topic/google-cloud-sql-announce/k5raPT48hc0
It seems there is an issue with the firewall preventing incoming connections; just change the server's location to another one and things should go fine. It worked for me.

Best way to debug MySQL connections that are being closed on me after ~39 minutes?

I have Hibernate 3.3, c3p0, MySQL 5.1, and Spring.
The MySQL connections in my service calls are consistently being closed after ~39 minutes. The natural running time of my service call is on the order of ~5 hours.
I've tried changing various c3p0 configuration settings, etc., to avoid the 39-minute cap. No luck.
Is there a more direct, systematic way to log or troubleshoot this? i.e. can I find out why the connection is being closed, and by whom, at which layer?
Update: stack trace
24 Oct 2010 02:22:12,262 [WARN] 012e323c-df4b-11df-89ed-97e9a9c1ac19 (Foobar Endpoint : 3) org.hibernate.util.JDBCExceptionReporter: SQL Error: 0, SQLState: 08003
24 Oct 2010 02:22:12,264 [ERROR] 012e323c-df4b-11df-89ed-97e9a9c1ac19 (Foobar Endpoint : 3) org.hibernate.util.JDBCExceptionReporter: No operations allowed after connection closed.
24 Oct 2010 02:22:12,266 [ERROR] 012e323c-df4b-11df-89ed-97e9a9c1ac19 (Foobar Endpoint : 3) org.hibernate.event.def.AbstractFlushingEventListener: Could not synchronize database state with session
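Since SQLState 08003 means the connection no longer exists underneath the pool, one quick server-side check is MySQL's timeout variables and aborted-connection counters; a small sketch (DB_HOST is a placeholder):
mysql -h DB_HOST -u root -p -e "SHOW VARIABLES LIKE '%timeout%'; SHOW GLOBAL STATUS LIKE 'Aborted%';"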
I have Hibernate 3.3, c3p0, MySQL 5.1, and Spring. The MySQL connections in my service calls are consistently being closed after ~39 minutes. The natural running time of my service call is on the order of ~5 hours.
I'm not sure I understood. Do you have processes that are supposed to run for 5 hours but currently get aborted after ~39 minutes (or probably 2400 seconds)? Can you confirm? Was it previously working? Did you change anything?
Meanwhile, here are some ideas:
start with the database (see B.5.2.11. Communication Errors and Aborted Connections)
start the server with the --log-warnings option and check the logs for suspicious messages
see if you can reproduce the problem using a MySQL client from the db host
if it works, do the same thing from the app server machine
if it works, you'll know MySQL is OK
move at the app server level
activate logging (of Hibernate and C3P0) to get a full stack trace and/or more hints about the culprit
also please show your C3P0 configuration settings
And don't forget that C3P0's configuration when using Hibernate is very specific and some settings must go in a c3p0.properties file.
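For illustration, the connection-testing settings commonly used to survive server-side disconnects look roughly like this in c3p0.properties (a sketch; the values are assumptions to tune for your environment):
# test pooled connections so stale ones are evicted before they reach the application
c3p0.testConnectionOnCheckout=true
c3p0.idleConnectionTestPeriod=300
c3p0.preferredTestQuery=SELECT 1
c3p0.maxIdleTime=1800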
Auto reconnect configuration
http://hibernatedb.blogspot.com/2009/05/automatic-reconnect-from-hibernate-to.html