I have created an instance group for Windows in Google Compute Engine that autoscales on CPU utilization (60%), and I ran a simple while-loop program to push CPU usage above 60% on the first instance in the group.
As the CPU load increased on the first instance, the instance group automatically started a few more instances to handle the load. However, I don't see my program running on those other instances, yet their CPU utilization also shows an increase.
Can anyone explain how exactly the CPU load is shared between the instances in the group?
Thanks In Advance...
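For reference, the kind of CPU-based autoscaling policy described above is typically attached to a managed instance group along these lines; the group name, zone, and replica limits below are placeholders rather than the actual values used here:

    # Hypothetical example: attach a 60% target-CPU autoscaling policy to an
    # existing managed instance group (name and zone are placeholders).
    gcloud compute instance-groups managed set-autoscaling windows-group \
        --zone us-central1-a \
        --min-num-replicas 1 \
        --max-num-replicas 5 \
        --target-cpu-utilization 0.60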
We are using a private GCP account and we would like to process 30 GB of data and do NLP processing using SpaCy. We wanted to use more workers and decided to start with a maximum of 80 workers. We submitted our job and ran into issues with some of the GCP standard user quotas:
QUOTA_EXCEEDED: Quota 'IN_USE_ADDRESSES' exceeded. Limit: 8.0 in region XXX
So I decided to request a new quota of 50 for IN_USE_ADDRESSES in one region (it took me a few iterations to find a region that would accept the request). We submitted a new job and got new quota issues:
QUOTA_EXCEEDED: Quota 'CPUS' exceeded. Limit: 24.0 in region XXX
QUOTA_EXCEEDED: Quota 'CPUS_ALL_REGIONS' exceeded. Limit: 32.0 globally
My question is: if I want to use, for example, 50 workers in one region, which quotas do I need to change? The doc https://cloud.google.com/dataflow/quotas doesn't seem to be up to date, since it only says "To use 10 Compute Engine instances, you'll need 10 in-use IP addresses." As you can see above, this is not enough and other quotas need to be changed as well. Is there some doc, blog or other post where this is documented and explained? For one region alone there are 49 Compute Engine quotas that can be changed!
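For context, a submission along these lines is what drives the quotas above; the script name, project, bucket, and region are placeholders. Each worker consumes one in-use IP address (while public IPs are used) plus its machine type's vCPUs, counted against both CPUS and CPUS_ALL_REGIONS:

    # Hypothetical Dataflow submission for 50 workers; with n1-standard-1
    # workers this needs roughly 50 in-use addresses and 50 vCPUs of quota
    # in the chosen region.
    python nlp_pipeline.py \
        --runner DataflowRunner \
        --project my-project \
        --region us-central1 \
        --temp_location gs://my-bucket/tmp \
        --max_num_workers 50 \
        --worker_machine_type n1-standard-1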
I would suggest that you start using private IPs instead of public IP addresses. This would help you in two ways:
You can bypass some of the IP-address-related quotas, as they apply to public IP addresses.
You can reduce costs significantly by eliminating network egress charges, since the VMs would not be communicating with each other over the public internet. You can find more details in this excellent article [1].
To start using private IPs, please follow the instructions mentioned here [2].
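As a rough sketch (the subnet and region names are assumptions), enabling Private Google Access on the worker subnetwork and then launching the job without public IPs looks something like this:

    # Workers without external IPs still need to reach Google APIs, which
    # Private Google Access on the subnetwork provides (names are placeholders).
    gcloud compute networks subnets update my-subnet \
        --region us-central1 \
        --enable-private-ip-google-access

    # The job is then launched with the following Python SDK pipeline options:
    #   --no_use_public_ips \
    #   --subnetwork regions/us-central1/subnetworks/my-subnet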
Apart from this, you would need to take care of the following quotas:
CPUs
You can increase the quota for a given region by setting the CPUs quota under Compute Engine appropriately.
Persistent Disk
By default each VM needs 250 GB of storage, so for 100 instances that would be around 25 TB. Please check the disk size of the workers that you are using and set the Persistent Disk quota under Compute Engine appropriately.
The default disk size is 25 GB for Cloud Dataflow Shuffle batch pipelines.
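If the persistent disk quota turns out to be the bottleneck, the per-worker disk size can be lowered at submission time; the values below are illustrative assumptions, not part of the original answer:

    # Hypothetical: 50 workers x 250 GB default boot disk is ~12.5 TB of
    # persistent disk quota, whereas 50 GB disks bring that down to 2.5 TB.
    python nlp_pipeline.py \
        --runner DataflowRunner \
        --project my-project \
        --region us-central1 \
        --temp_location gs://my-bucket/tmp \
        --max_num_workers 50 \
        --disk_size_gb 50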
Managed Instance Groups
You would need to make sure that you have enough quota in the region, as Dataflow needs the following quotas:
One Instance Group per Cloud Dataflow job
One Managed Instance Group per Cloud Dataflow job
One Instance Template per Cloud Dataflow job
Once you review these quotas you should be all set for running the job.
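A quick way to review current limits and usage before resubmitting (the region name is a placeholder) is to dump the regional and project-wide Compute Engine quotas with gcloud:

    # Regional quotas (CPUS, DISKS_TOTAL_GB, IN_USE_ADDRESSES, instance group
    # limits, ...) for the region you plan to run in.
    gcloud compute regions describe us-central1

    # Project-wide quotas (e.g. CPUS_ALL_REGIONS).
    gcloud compute project-info describe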
1 - https://medium.com/#harshithdwivedi/how-disabling-external-ips-helped-us-cut-down-over-80-of-our-cloud-dataflow-costs-259d25aebe74
2 - https://cloud.google.com/dataflow/docs/guides/specifying-networks
I've been using autoscaling based on CPU usage. We used to set it up in a single zone, but to ensure instance availability we are now creating it with multi-zone enabled.
Now it seems to create many more instances than required according to CPU usage. I believe it has to do with the fact that instances are created across different zones and the total usage calculation is somehow not taking that into account.
According to the documentation, the regional autoscaler needs at least 3 instances, located in 3 different zones, even if your utilisation is lower and the load could be served from an instance in a single zone. This is to provide resiliency, because a whole region is less likely to go down than a single zone.
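For illustration (the names, region, and replica limits are placeholders), a regional multi-zone managed instance group with CPU-based autoscaling is set up along these lines, and per the behaviour described above you should expect it to keep at least 3 instances spread across 3 zones:

    # Hypothetical regional (multi-zone) group with CPU-based autoscaling.
    gcloud compute instance-groups managed create web-group \
        --region us-central1 \
        --template web-template \
        --size 3

    gcloud compute instance-groups managed set-autoscaling web-group \
        --region us-central1 \
        --min-num-replicas 3 \
        --max-num-replicas 10 \
        --target-cpu-utilization 0.6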
I've got the following setup:
Instance template for n1-standard-1 instances, HTTP(S) accessible, on SSD disks
Instance group with named ports for 80/443, autoscaling turned on with min/max=2/10 instances, target CPU=60%, cool-down=60s, and initial delay=600s
Group health check on port 80 every 10s with a threshold of 3 attempts
GCE HTTP(S) load balancer with above group as HTTP backend, max CPU=80%, health check identical to the one defined above for the group
Everything else is default. What I'm seeing from my graphs is that my 2 instances are regularly re-starting for no apparent reason. The instances both re-start every 6 hours, but staggered an hour apart so they're at least never down at the same time. The instance template is made from the disk of an instance that ran reliably (i.e. without regular, inexplicable re-starts) for months outside of an auto-scaling group. I've never seen one of my instances listed as unhealthy in the LB dashboard, but if I had to guess, I'd guess that my health checks are misconfigured somehow. Thanks.
Running "gcloud compute operations list" yields events of type "compute.instances.repair.recreateInstance" the correspond exactly to the periodic restarts. I have no idea why this is happening and haven't found any clues searching.
Your instances are being restarted because they are probably unhealthy. Please check whether BackendService.GetHealth(group) returns HEALTHY for all of the instances. If not, this might be an issue with your server, or a misconfiguration in your firewall rules for the range 130.211.0.0/22 (https://cloud.google.com/compute/docs/load-balancing/health-checks).
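As a concrete sketch of those two checks (the backend service name, network, and ports are assumptions based on the setup described in the question):

    # What does the load balancer itself report for the backends?
    gcloud compute backend-services get-health my-backend-service --global

    # Make sure the health-check source range mentioned above can actually
    # reach the instances on the health-checked ports.
    gcloud compute firewall-rules create allow-lb-health-checks \
        --network default \
        --allow tcp:80,tcp:443 \
        --source-ranges 130.211.0.0/22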
I have been trying to determine which instances I should choose for Compute Engine and Cloud SQL for when we launch our product.
Initially I'm aiming to handle a maximum of 500 users per day, with peak traffic likely to occur in the evenings. Users are expected to stay on the site, interacting constantly, for a lengthy period of time (10 min+).
So far my guesses lead me to the following:
Compute engine:
n1-standard-2 ->
2 virtual CPUs, 3.75 GB memory
Cloud SQL:
D2 ->
1 GB RAM, max 250 concurrent users
Am I in the right ballpark, or can I use smaller/larger instances?
I'd suggest using appropriate performance-testing tools to simulate the traffic that will hit your server and estimating the amount of resources you will need to handle the requests.
For the Compute Engine VM instance, you can go with a lighter machine type and take advantage of the GCE Autoscaler to automatically add more resources to your front end when traffic goes up.
I recommend watching this video.
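As a minimal example of such a simulation (the tool choice and URL are assumptions, and the concurrency level simply mirrors the 250 concurrent users mentioned above), something like ApacheBench can give a first estimate:

    # Hypothetical smoke test: 10,000 requests at ~250 concurrent clients
    # against a staging copy of the site.
    ab -n 10000 -c 250 https://staging.example.com/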
I have been using Google Compute Engine for my backend (debian-lamp). Suddenly the instance was deleted automatically, without any user interaction, and the operations log doesn't show which user performed the deletion of the VM instance. I have also attached an image of the Google Compute Engine operations for further study.
I want to know why this happened and what the options are for restoring the deleted instance.
Note: I am using the trial version of Google Compute Engine, and this was the second VM instance created in the current project.
It looks like the instance was deleted by the Instance Group Manager after you resized the instance group (most likely to zero). To learn about why this happened, visit the docs pages for Instance Groups and the Instance Group Manager.
If you resize the Instance Group back up to 1, the Instance Group Manager will create a new VM automatically.
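Resizing can be done from the console or, equivalently, with gcloud (the group name and zone below are placeholders):

    # Brings the managed instance group back up to one VM.
    gcloud compute instance-groups managed resize my-instance-group \
        --size 1 \
        --zone us-central1-a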