I am running a Tornado web server on Google Compute Engine. The server returns a very simple JSON response. When I test its throughput, it seems to be throttled at 20 req/s; I cannot get past that rate.
I know that there is a Google Compute Engine API rate limit of 20 req/s. Is there some sort of network or instance rate limit that prevents my server from fulfilling more than 20 req/s? How do I increase this limit?
The rate limit of 20 requests per second is not on the server; it applies to the GCE API, for example when you use gcloud to create instances (it calls the GCE API under the covers).
As documented here, the network bandwidth of a GCE VM is limited mainly by the software you run on it and, to some extent, by the size of the VM (VMs get up to 2 Gbps per core, up to 8 cores, for a maximum rate of 16 Gbps). Nothing in the VM subsystem knows anything about requests or responses; it's all just IP traffic to us.
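To make the per-core figure concrete, here is a minimal sketch of the bandwidth ceiling described above. The function name and defaults are only illustrative; the 2 Gbps per core and 8-core cap are the numbers quoted in this answer and may not match current limits.

```python
def max_bandwidth_gbps(cores: int, per_core_gbps: float = 2.0, core_cap: int = 8) -> float:
    """Approximate network ceiling for a VM, using the figures quoted above.

    Bandwidth scales with core count until the cap is reached,
    after which extra cores add nothing.
    """
    return min(cores, core_cap) * per_core_gbps


if __name__ == "__main__":
    for cores in (1, 2, 8, 16):
        print(f"{cores} cores -> {max_bandwidth_gbps(cores)} Gbps")
```

Even a single-core VM has far more headroom than a small JSON response at 20 req/s would need, which points back at the software on the VM or the load-testing client.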
Related
Does anyone know if there is a limit on network traffic among VMs in different data centers in Google Compute Engine?
Specifically, are there any performance limits if VMs in different DCs are frequently (every 5 ms) communicating with each other?
Thanks in advance.
I'm sure that there are some performance limits, but they should be fairly high if you're within the same region (>100 Mbps, possibly >1 Gbps). Between regions, bandwidth is likely to be somewhat more variable, but I'd expect it to be >100 Mbps on the same continent.
Note that there are also egress fees for traffic between VMs in different GCP zones, so you might want to pay attention to the total data transferred; 130Mbps would be around 1GB every minute, or $6/hour.
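The arithmetic behind that estimate can be sketched as follows. The `price_per_gb` value of 0.10 is only an assumption, chosen because it is the rate implied by the $6/hour figure above; it is not a quoted price, so check the current pricing page for your zones or regions.

```python
def gb_per_minute(mbps: float) -> float:
    """Decimal gigabytes transferred per minute at a sustained rate in Mbps."""
    megabytes_per_second = mbps / 8  # 8 bits per byte
    return megabytes_per_second * 60 / 1000


def cost_per_hour(mbps: float, price_per_gb: float) -> float:
    """Hourly egress cost in dollars; price_per_gb is an assumed placeholder."""
    return gb_per_minute(mbps) * 60 * price_per_gb


if __name__ == "__main__":
    print(f"{gb_per_minute(130):.3f} GB per minute")
    print(f"${cost_per_hour(130, price_per_gb=0.10):.2f} per hour (assumed price)")
```

At 130 Mbps this works out to just under 1 GB per minute and roughly 58.5 GB per hour, so the hourly cost scales directly with whatever per-GB rate applies.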
I have created a Compute Engine instance in the Google Cloud environment. The instance hosts a service. Using a script, I load tested the service by sending HTTP requests at different load levels. Now I want to visualize the resource utilization in R. Is there a simple API that I can use to fetch CPU utilization between times X and Y, where X and Y are the start and end of the load test?
Thanks
Use the Google Cloud Monitoring API.
The compute.googleapis.com/instance/cpu/utilization metric returns the percentage of the allocated CPU that is currently in use on the instance.
The compute.googleapis.com/instance/cpu/usage_time metric is a delta type, and you can specify the time interval for CPU usage. You can try these APIs at this link or at Google API Explorer.
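As a rough illustration, the sketch below builds the request URL for the Monitoring API v3 `timeSeries` list method for a given time window. It is written in Python, but the resulting URL can be fetched from R or any other HTTP client. The project ID, instance ID, and helper names are placeholders I made up, and the request still needs an OAuth 2.0 bearer token, which is not shown here.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

API_ROOT = "https://monitoring.googleapis.com/v3"
CPU_METRIC = "compute.googleapis.com/instance/cpu/utilization"


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def cpu_utilization_url(project_id, start, end, instance_id=None):
    """Build the timeSeries list URL for CPU utilization between start and end."""
    metric_filter = f'metric.type = "{CPU_METRIC}"'
    if instance_id:
        # Optional: restrict the query to a single VM.
        metric_filter += f' AND resource.labels.instance_id = "{instance_id}"'
    query = urlencode({
        "filter": metric_filter,
        "interval.startTime": rfc3339(start),
        "interval.endTime": rfc3339(end),
    })
    return f"{API_ROOT}/projects/{project_id}/timeSeries?{query}"


if __name__ == "__main__":
    # Placeholder project and load-test window.
    x = datetime(2015, 1, 1, 10, 0, tzinfo=timezone.utc)
    y = datetime(2015, 1, 1, 11, 0, tzinfo=timezone.utc)
    print(cpu_utilization_url("my-project", x, y))
```

The JSON response contains one time series per matching instance, each with a list of timestamped points that can be loaded into a data frame for plotting.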
I have two VMs, one with 2 CPUs and another with 1 CPU, but when I "click to deploy" MongoDB, I get the message: "Quota 'CPUS' exceeded. Limit: 8.0"
3 + 5 = 8 ...
What should I do? Is there a CPU count cache or something?
Are you on a free trial account? If so, you are limited to 8 vCPUs. From what you mentioned, you are already using 3 vCPUs, so you should have 5 available.
When launching the MongoDB click-to-deploy, make sure you are using the n1-standard-1 or n1-standard-2 machine type, and lower the number of MongoDB server nodes (the default is 2 from what I see).
Also, please be advised that while the deployment is in progress, another machine gets instantiated (a deployment coordinator), which takes an additional vCPU, so take that into account.
You can check your current CPU quota in Developers Console > Compute Engine > Quotas
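The quota arithmetic above can be checked with a small sketch. It assumes the deployment consists only of the server nodes plus a one-vCPU coordinator, as described in this answer; the real click-to-deploy template may create additional machines. The function and table names are illustrative.

```python
# vCPU counts for the machine types mentioned above.
VCPUS = {"n1-standard-1": 1, "n1-standard-2": 2, "n1-standard-4": 4}


def fits_quota(limit, in_use, nodes, machine_type, coordinator_vcpus=1):
    """Return (fits, requested) for a deployment of `nodes` server VMs.

    Assumes one temporary coordinator VM is counted against the quota
    while the deployment is in progress.
    """
    requested = nodes * VCPUS[machine_type] + coordinator_vcpus
    return in_use + requested <= limit, requested


if __name__ == "__main__":
    for machine_type in VCPUS:
        fits, requested = fits_quota(limit=8, in_use=3, nodes=2, machine_type=machine_type)
        print(f"{machine_type}: requests {requested} vCPUs, fits={fits}")
```

With 3 vCPUs already in use and a limit of 8, two n1-standard-2 nodes plus the coordinator use exactly the remaining 5, while two n1-standard-4 nodes would ask for 9 and exceed the quota.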
I've been using GCE for several weeks with free credits and have repeatedly found that the quota values keep changing. The default CPU quota is 24 per region, but depending on what other APIs I enable and in what order, that can silently change to 2 or 8. Today, when I went to use it, the CPU quota had again changed from 24 to 2, even though I hadn't changed which APIs were enabled. Disabling and then re-enabling Compute Engine put the quota back to 24, but that is not a very satisfactory solution. This seems to be a bug to me. Has anyone else had this problem and perhaps found a solution? I know about the quota increase request form, but it says that if I request an increase, then that is the end of my free credits.
The free trial in GCE has some limitations, including a cap of 2 concurrent cores at a time, so if for some reason you were able to change it to 24 cores, it's expected that it will go back to 2 cores.
I was planning to use Google Geocoding API. I was wondering what is the latency I should expect in getting the response back? I cannot find out these details on the website.
Is anyone aware of what the actual latency will be if I am using the Google Geocoding API?
Meaning how much time it will take to get the response back from the Geocoding API.
We have a live app in the Play Store and we get roughly 120-150 hits per hour. Our median latency is around 210 ms and our 98th-percentile latency is 510 ms.
We have an application 24x7 with ~2 requests per second.
Median: 197.08 ms
98th percentile (slowest 2%): 490.54 ms
This could be a significant bottleneck for your application. Some strategies that can help:
Memory cache
Secondary cache
Batch persistence
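As an illustration of the first strategy, here is a minimal in-memory cache sketch using Python's `functools.lru_cache`. The `geocode_remote` function is a hypothetical stand-in for the real HTTP call to the Geocoding API and returns placeholder coordinates; the cache size is an arbitrary choice.

```python
from functools import lru_cache


def geocode_remote(address: str) -> tuple:
    """Hypothetical stand-in for the real (slow) Geocoding API request."""
    # A real implementation would issue the HTTP request here.
    return (0.0, 0.0)  # placeholder coordinates


@lru_cache(maxsize=10_000)
def geocode(address: str) -> tuple:
    """Return coordinates for an address, hitting the remote API only on a cache miss."""
    return geocode_remote(address)


if __name__ == "__main__":
    geocode("1600 Amphitheatre Parkway")  # miss: goes to the remote call
    geocode("1600 Amphitheatre Parkway")  # hit: served from memory
    print(geocode.cache_info())
```

Repeated lookups of the same address then avoid the ~200 ms round trip entirely. Normalizing the address string (case, whitespace) before calling `geocode` would raise the hit rate.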