My servers' status suddenly changed to PROVISIONING in Google Compute Engine

After restarting some of my servers in Google Compute Engine and trying to connect to them via SSH, they have all been stuck in PROVISIONING status for more than 4 hours!
According to the Google documentation:
https://cloud.google.com/compute/docs/instances#checkmachinestatus
PROVISIONING - Resources are being reserved for the instance. The
instance isn't running yet.
Well, they had been working for more than a month.
I tried several times to turn them off via the gcloud command-line tool, but it didn't work.
I also checked for any problems on Google Cloud Status; nothing is mentioned there for today:
https://status.cloud.google.com
Any idea?

In cases where "PROVISIONING" takes too long, depending on the region/zone/time in which you try to create your instances, the issue is generally (but not limited to) one of the following; I'm just giving some ideas:
Limited resources in that zone at that time can cause instances to hang in the PROVISIONING status. Google usually takes care of this fairly quickly, typically within a few days; I don't have exact information, but they add more resources to the zone and the issue disappears. They also add new zones over time, so it could be worth moving to a different zone if resources are readily available there.
A temporary issue that should resolve itself.
A few questions I have for you:
Is this still happening? Considering the fact that you posted about a month ago, I assume you've gotten past this issue and everything is working as expected at this time. If so, I'd recommend posting an answer yourself with details on what happened or what you did to fix it. You can then accept your own answer so that others can see what fixed it.
Have you tried creating instances in different zones to see if you have the problem everywhere or just within a specific one?
All in all, this is usually a transient issue, based on my experience with the Google Cloud Platform. If this is still causing trouble for you, give us some more information on what is currently happening and the community might be able to help better.
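If it helps, here is a rough sketch of the gcloud commands I would use to check a stuck instance, force-stop it, and retry in a different zone (the instance names, zones and machine type below are placeholders, not taken from the question):

    # Check the current status of the stuck instance
    gcloud compute instances describe my-instance \
        --zone europe-west1-b \
        --format="value(status)"

    # Try to stop it from the CLI (this is what the question author attempted)
    gcloud compute instances stop my-instance --zone europe-west1-b

    # If the zone itself is short on resources, recreate the instance elsewhere
    gcloud compute instances create my-instance-copy \
        --zone europe-west1-c \
        --machine-type n1-standard-1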

Related

Is it possible to get GCP's ANY distribution for Kubernetes GKE node pool?

I have a GKE Kubernetes cluster running on GCP. This cluster has multiple node pools with autoscaling turned on, located in us-central1-f.
Today we started getting a lot of errors on these Node pools' Managed Instance Groups saying that us-central1-f had run out of resources. The specific error: ZONE_RESOURCE_POOL_EXHAUSTED_WITH_DETAILS
I've found another topic on Stack Overflow with a similar question, where the answer points to a discussion on Google Groups with more details. I know that one of the recommended ways of avoiding this is to use multiple zones and/or regions.
When I first faced this issue I wondered if there is a way to use multiple regions as a fallback system rather than a redundancy system. In that sense, I would have my VMs placed in whichever zone has available resources, prioritizing the ones closest to, let's say, us-central1-f.
Then, reading the discussion on the Google Group, I found a feature that caught my attention: the ANY distribution method for Managed Instance Groups. It seems that this feature does exactly what I need - the zone fallback.
So, my question: Does the ANY distribution method resolve my issue? Can I use it for GKE Node Pools? If not, is there any other solution other than using multiple zones?
It is possible to get a regional (i.e. multi-zonal) GKE deployment; however, this will use multiple zonal MIGs as the underlying compute layer. So, technically speaking, you will not use the ANY distribution method, but you should achieve pretty much the same result.
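For reference, a minimal sketch of what such a regional deployment could look like with gcloud (the cluster name, region, zones and node counts below are placeholders; --num-nodes is per zone):

    # Create a regional cluster whose nodes are spread across several zones
    gcloud container clusters create my-regional-cluster \
        --region us-central1 \
        --node-locations us-central1-a,us-central1-b,us-central1-f \
        --num-nodes 1 \
        --enable-autoscaling --min-nodes 1 --max-nodes 5

    # Additional node pools created in the same region get the same multi-zonal layout
    gcloud container node-pools create extra-pool \
        --cluster my-regional-cluster \
        --region us-central1 \
        --num-nodes 1

The idea is that if one zone runs out of capacity, nodes can still be added in the remaining zones, which approximates the fallback behaviour you are after.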

Google Cloud SQL Restart and Update

Randomly, out of nowhere, we stopped being able to connect to the Google Cloud SQL database at almost precisely 8:30 am ET this morning.
We then tried to restart the instance and have been stuck for more than an hour with a similar situation to this question. It seems that this sort of freak accident has happened before on Google Cloud SQL.
The problem is that the instance is completely unresponsive to any commands - either via the GUI or the command line.
To make matters worse, there's no way to call support unless you pay hundreds of dollars per month to join a plan. I'm hoping that someone from Google might be trolling the SO threads with these tags, or someone who has dealt with this before can offer some advice.
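For context, these are roughly the commands involved on the CLI side (the instance name is a placeholder); in our case they simply hung or had no visible effect:

    # Check the instance state
    gcloud sql instances describe my-sql-instance

    # Attempt a restart (the step that was stuck for over an hour)
    gcloud sql instances restart my-sql-instance

    # List recent operations to see whether anything is stuck in the queue
    gcloud sql operations list --instance my-sql-instance --limit 5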
Just providing an update to this, for anyone who comes across it in the future...
The issue was with the Google Cloud SQL instance itself. Someone from tech support had to go in at a lower level and restart the entire instance. Basically, there's nothing you can do if you encounter the exact same situation.
This issue happened again just a few days ago (twice in the same 4 weeks), and again, there was nothing we could do.
NOTE: When this happens, you CANNOT access your db backups. This is serious cause for concern.
This seems very strange for a hosted db product, and I've come across similar cases documented by others.
We're still waiting on a post-mortem that has taken almost 2 weeks, despite our upgrading to "gold level support" for $400 per month. In the meantime, we're migrating over to AWS, as we've never experienced issues with downtime on RDS.

Drupal site: mysql queries not closing and entry resource limit reached

I have a Drupal site (castlehillbasin.co.nz) with a small number of users. Over the last few days it has suddenly started hitting the "entry processes limit" continually.
My host provider has shown me that there are many open queries that are sleeping, so are not getting closed correctly. They have advised "to contact a web-developer and check the website codes to see why the databases queries are not properly closing. You will need to optimize the database and codes to resolve the issue". (their words)
I have not made any changes or updates prior to the problem starting. I also have a duplicate on my home server that does not have this issue. The host uses cPanel, and I cannot see these 'sleeping' processes through MySQL queries.
Searching is not turning up many good solutions, except raising the entry process limit (which is 20), and the host will not do that.
So I am a little stumped as to how to resolve the issue, any advice?
I think I have answered it myself. I got temporary SSH access and inspected the live queries.
It was the Flickr module and its getimagesize() call timing out (which takes 2 minutes). It turns out it only uses this call for non-square image requests, so I have just displayed square images for now.
In progress issue here: https://www.drupal.org/node/2547171
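For anyone else in the same spot, this is roughly how the sleeping connections can be inspected once you have SSH access (the user name and the 120-second threshold are just examples):

    # Show every current connection, including the ones stuck in "Sleep"
    mysql -u myuser -p -e "SHOW FULL PROCESSLIST;"

    # Count connections that have been idle for more than two minutes
    mysql -u myuser -p -e "SELECT COUNT(*)
                           FROM information_schema.PROCESSLIST
                           WHERE COMMAND = 'Sleep' AND TIME > 120;"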

Error accessing Cosmos through Hive

Literally from:
https://ask.fiware.org/question/84/cosmos-error-accessing-hive/
As the answer in the quoted FIWARE Q&A entry suggests, the problem is fixed by now; it is here: https://ask.fiware.org/question/79/cosmos-database-privacy/. However, it seems that other issues have arisen related to the solution, namely: over an SSH connection, typing the hive command results in the following error: https://cloud.githubusercontent.com/assets/13782883/9439517/0d24350a-4a68-11e5-9a46-9d8a24e016d4.png. The HiveQL queries work fine (through SSH) regardless of the error message.
When launching exactly the same HiveQL queries (each of which worked flawlessly two weeks ago) remotely, the request times out even with absurd time windows (10 minutes). The most basic commands ('use $username;', 'show tables;') also time out.
(The thrift client is: https://github.com/garamon/php-thrift-hive-client)
Since Cosmos usage is an integral part of our project, it is of the utmost importance to know whether this is a temporary issue caused by the fixes or a permanent change in remote availability (we could not identify any relevant changes in the documentation).
Apart from fixing the issue you mention, we moved to a HiveServer2 deployment instead of the old Hive server (or HiveServer1), which had several performance drawbacks due to, indeed, the usage of Thrift (in particular, only one connection could be served at a time). HiveServer2 now allows for parallel queries.
That being said, most probably the client you are using is not valid anymore, since it may have been specifically designed to work with a HiveServer1 instance. The good news is that there seem to be several other client implementations for HS2 using PHP, such as https://github.com/QwertyManiac/hive-hs2-php-thrift (the first entry I found when performing a Google search).
What is true is that this is not officially documented anywhere (it is only mentioned in that other SO question). So, nice catch! I'll add it to the documentation immediately.
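As a quick way to confirm that HiveServer2 itself is reachable before touching the PHP client, a Beeline invocation along these lines should work from the SSH session (the host, port and credentials below are placeholders; 10000 is only the usual default HS2 port):

    # Connect to HiveServer2 over JDBC rather than the old HiveServer1 Thrift port
    beeline -u "jdbc:hive2://cosmos.example.org:10000/default" \
            -n myuser -p mypassword \
            -e "SHOW TABLES;"

If that works but the remote PHP calls still time out, the problem is most likely the HiveServer1-only client rather than the cluster.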

Persistence & Performance of libapache2-mod-log-sql

I have been working on a requirement for our Apache2 logs to be recorded in a MySQL database instead of the usual text log file.
I had no difficulty with the setup and configuration, and it works as expected; however, there is some information that I cannot find (or it may very well be that I am searching for the wrong thing).
Is there anyone out there who uses (or even likes to use) libapache2-mod-log-sql who can tell me more about its connection to MySQL? Is it persistent? What kind of resource impact should I expect?
These two questions are central to my research, and yet information on them is hard to find.
Thanks in advance.
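Not an answer on the module's internals, but one way to observe its connection behaviour empirically from the MySQL side is sketched below (the credentials are placeholders). If the logging user's connection count stays flat while requests are being served, the module is holding a persistent connection per Apache worker rather than reconnecting on every hit:

    # Count current connections per MySQL user; run this repeatedly while
    # traffic is hitting the site and watch the logging user's count
    mysql -u root -p -e "SELECT USER, COUNT(*) AS conns
                         FROM information_schema.PROCESSLIST
                         GROUP BY USER;"

    # A 'Connections' counter that grows with every request while
    # 'Threads_connected' stays flat would suggest connect-per-request behaviour
    mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"
    mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Connections';"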