Is hosting my multiplayer HTML5 game on a free Heroku dyno hurting my network performance?

I've recently built a multiplayer game in HTML5, using the TCP-based WebSocket protocol for the networking. I've already taken steps in my code to minimize lag (interpolating positions, minimizing the number and size of messages sent), but I occasionally run into lag and choppiness that I believe comes from a combination of packet loss and TCP's in-order delivery guarantee.
To elaborate: my game sends frequent WebSocket messages to players to update them on the positions of the enemy players. If a packet gets dropped or delayed, my understanding is that it holds up delivery of every later packet (head-of-line blocking), which causes enemy players to appear frozen in the same spot and then zoom to the correct location once the delayed packet is finally received.
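For reference, here's roughly the kind of interpolation I'm doing. This is a simplified sketch with made-up names, not my actual code: the client buffers position snapshots and renders each enemy slightly in the past, blending between the two snapshots that bracket the render time.

    // Simplified sketch (made-up names): buffer snapshots, render ~100 ms in the past.
    interface Snapshot { t: number; x: number; y: number; }   // server time + position

    const INTERP_DELAY_MS = 100;      // how far behind "now" we render
    const buffer: Snapshot[] = [];    // snapshots for one enemy, oldest first

    function onPositionMessage(snap: Snapshot): void {
      buffer.push(snap);              // TCP guarantees these arrive in order
    }

    function samplePosition(nowMs: number): { x: number; y: number } | null {
      const renderT = nowMs - INTERP_DELAY_MS;
      // Find the two snapshots that bracket the render time and blend them.
      for (let i = buffer.length - 1; i > 0; i--) {
        const a = buffer[i - 1], b = buffer[i];
        if (a.t <= renderT && renderT <= b.t) {
          const f = (renderT - a.t) / (b.t - a.t);  // 0..1 blend factor
          return { x: a.x + (b.x - a.x) * f, y: a.y + (b.y - a.y) * f };
        }
      }
      return null;  // nothing to interpolate yet -- the "frozen enemy" case
    }

The freeze happens exactly when no snapshot newer than the render time has arrived, i.e. when a delayed packet is holding everything behind it up.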
I confess that my understanding of networking/bandwidth/congestion is quite weak. I've been wondering whether running my game on a single free Heroku dyno, which is essentially a container on someone else's virtualized hardware (Heroku dynos run on EC2 instances), could be exacerbating this problem. Do Heroku dynos, and multi-tenant servers in general, tend to have worse network performance due to noisy neighbors or other causes?

Yes. You don't get dedicated network performance from Heroku dynos. Some classes of EC2 instances in a VPC can have "Enhanced Networking" enabled, which is supposed to give you more consistent, dedicated performance.
Ultimately, though, the best thing to do before jumping to a new solution is benchmarking. Benchmark what level of throughput and latency you get from a Heroku dyno, then benchmark an EC2 instance to see what kind of difference it makes.
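A crude round-trip-time benchmark is easy to wire up with the browser WebSocket API. The URL here is a placeholder, and it assumes your server echoes each message straight back:

    // Send a timestamp every 250 ms; the server is assumed to echo it back.
    const ws = new WebSocket("wss://your-app.example.com");
    const samples: number[] = [];

    ws.onopen = () => setInterval(() => ws.send(String(performance.now())), 250);
    ws.onmessage = (ev) => {
      const rtt = performance.now() - Number(ev.data);   // round-trip time in ms
      samples.push(rtt);
      console.log(`rtt ${rtt.toFixed(1)} ms, worst ${Math.max(...samples).toFixed(1)} ms`);
    };

Pay attention to the worst-case samples rather than the average: head-of-line blocking shows up as occasional large spikes, not as a higher mean.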

Related

What kind of latency is expected of contract calls?

I'm developing a dapp and got it working well using web3 and testrpc.
My frontend is currently pretty "chatty" with contract calls (constant methods) and everything works super fast.
I was wondering what kind of latency I should expect on the real network for simple calls. Do I need to aggressively optimize my contract reads?
It depends. If your dApp is running against a local node (and it's fully synced), then constant functions will execute similarly to what you're seeing in your testing. If not, all bets are off: your latency will depend on the provider you're connecting to.
My best advice is, once you finish development, deploy to a testnet and run performance tests. Chances are that if you're not running a fully synced local node, and your app is as chatty as you say, you may be disappointed with the results. You would then want to look into optimizing your reads, moving some state data out of the contract (if possible), or turning your client into a light node.
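Timing individual calls is straightforward. The contract and method names below are hypothetical, using the web3 0.x callback-style API that a testrpc setup typically uses:

    // Time a single constant call against whatever provider web3 is connected to.
    const start = Date.now();
    myContract.getScore.call(someAddress, (err: Error, result: any) => {
      if (err) { console.error(err); return; }
      console.log(`call took ${Date.now() - start} ms:`, result.toString());
    });

Run it against testrpc first to get a baseline, then against your testnet provider, and compare.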

What is the benefit of running Kubernetes on bare metal, or in the cloud, with idle VMs or machines?

I want to know the high-level benefit of running Kubernetes on bare metal machines.
So let's say we have 100 bare metal machines ready, with the kubelet deployed on each. Doesn't that mean that when the application only runs on 10 machines, we are wasting the other 90, which just stand by without being used for anything?
In the cloud, does Kubernetes launch new VMs as needed, so that clients do not pay for idle machines?
How does Kubernetes handle the extra machines that are needed at the moment?
Yes, if you have 100 bare metal machines and use only 10, you are wasting money. You should only deploy the machines you need.
The Node Autoscaler works with certain cloud providers, such as AWS, GKE, and OpenStack-based infrastructures.
Now, the Node Autoscaler is useful if your load is not very predictable and/or swings up and down widely over short periods (think batch Jobs, or cyclic loads like a Netflix-type use case).
If you're running services that just need to scale gradually as your customer base grows, it is not so useful, since it's just as easy to add new nodes manually.
Kubernetes will also handle a fair amount of autoscaling within a fixed set of nodes (i.e. you can run many Pods on one node, so you would usually size your machines to run in a safe range while still handling spikes in traffic by spinning up more Pods on those nodes).
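For example, Pod-level autoscaling can be set up with a one-liner (the deployment name here is a placeholder):

    kubectl autoscale deployment my-app --min=2 --max=10 --cpu-percent=80

This keeps between 2 and 10 replicas running, scaling on CPU utilization, without touching the number of nodes.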
As a side note: with bare metal you typically gain performance, since you don't have the overhead of a VM/hypervisor, but you need to supply your own distributed storage, which a cloud provider would typically offer as a service.

Kubernetes on GCE / Preventing pods from being evicted with "The node was low on compute resources."

This was a painful investigation into aspects that so far aren't well covered by the documentation (at least from what I've googled).
My cluster's kube-proxy pods were evicted (experienced users can probably imagine the issues that followed). I searched a lot, but found no clues about how to get them running again.
Eventually, describing the affected pod gave a clear reason: "The node was low on compute resources."
I'm still not that experienced with balancing resources between pods/deployments and the "physical" compute underneath, so how would one prioritize (or take a similar approach) to make sure specific pods never end up in such a state?
The cluster was created with fairly low resources in order to get our hands dirty while keeping costs low, and eventually to witness exactly this kind of problem (gcloud container clusters create deemx --machine-type g1-small --enable-autoscaling --min-nodes=1 --max-nodes=5 --disk-size=30). Is using g1-small to be avoided?
If you are using the iptables-based kube-proxy (the current best practice), then kube-proxy being killed should not immediately break your network connectivity, but new services and updates to endpoints will stop propagating. Your apps should continue to work, just degrading slowly.
If you are using userspace kube-proxy, you might want to upgrade.
The error message sounds like it was due to memory pressure on the machine.
When there is memory pressure, Kubelet tries to terminate things in order of lowest to highest QoS level.
If your kube-proxy pod is not using Guaranteed resources, then you might want to change that.
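To get the Guaranteed QoS class, every container in the pod must set resource limits equal to its requests. A minimal sketch of the relevant part of a container spec (the values are illustrative, and note that on GKE kube-proxy typically runs as a static pod whose manifest lives on the node itself):

    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 100m        # limits equal to requests => Guaranteed QoS
        memory: 128Mi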
Other things to look at:
If kube-proxy suddenly used a lot more memory, it could be terminated. If you created a huge number of pods, services, or endpoints, that could cause it to use more memory.
If you started processes on the machine that are not under Kubernetes' control, the kubelet could make an incorrect decision about what to terminate. Avoid this.
It is possible that on a machine as small as a g1-small, the amount of node resources held back is insufficient, so that too much guaranteed work gets scheduled onto the machine; see allocatable vs. capacity. This might need tweaking.
See also the node OOM documentation.

Why did the CPU load drop in the last few days?

Does anybody have a hint? I didn't change anything on the machine (except for the security updates), and the sites hosted there didn't see a significant change in connections.
Maybe Google changed something in their infrastructure? Coincidentally, there was an issue with the Cloud DNS ManagedZone these days: they charged me $920 for half a month's usage, and it was an error (they counted thousands of weeks of usage), so they recently corrected it back to $0.28. Maybe there was some process that mistakenly used Cloud DNS and thus consumed CPU power, and they have corrected it now?
I'd like to hear from someone who knows what's going on inside Google Cloud. Thank you.
CPU utilization reporting is now more accurate from the VM guest's perspective, as it no longer includes virtualization-layer overhead. It has nothing to do with Cloud DNS.
See this issue for some extra context:
https://code.google.com/p/google-compute-engine/issues/detail?id=281

What happens during Stand-By and Hibernation?

It just hit me the other day: what actually happens when I tell the computer to go into Stand-By or to Hibernate?
More specifically, what implications, if any, does it have on code that is running? For example, if an application is compressing some files, encoding video, checking email, running a database query, generating reports, or just processing lots of data or doing complicated math, what happens? Can you end up with a bug in your video? Can the database query fail? Can the processed data end up containing errors?
I'm asking this both out of general curiosity and because I started to wonder whether this is something I should think about when programming.
You should remember that the OS scheduler already freezes your program a gazillion times each second, so your program must already function correctly while frozen. From your point of view, there isn't much difference between stand-by, hibernation, and an ordinary context switch.
What is different is that you'll be frozen for a long time. And this is the only thing you need to think about. In most cases, this shouldn't be a problem.
If you have a network connection, you'll probably need to re-establish it, and there are similar issues. But that just means checking for errors on all I/O operations, which I'm sure you're already doing... :-)
My initial thought is that as long as your program and its ecosystem are contained within the PC that is going into stand-by or hibernation, then upon resume your program should not be affected.
However, if you are, say, updating a record in a database hosted on a separate machine, then hibernation/stand-by will be treated as a timeout.
If your program needs to react to such a change in "power status", you can listen for the WM_POWERBROADCAST message, as described on MSDN.
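A minimal sketch of handling it in a Win32 window procedure (error handling and the rest of the message loop omitted):

    #include <windows.h>

    LRESULT CALLBACK WndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam) {
        switch (msg) {
        case WM_POWERBROADCAST:
            if (wParam == PBT_APMSUSPEND) {
                // System is about to suspend/hibernate:
                // flush buffers, checkpoint work, close network connections.
            } else if (wParam == PBT_APMRESUMESUSPEND ||
                       wParam == PBT_APMRESUMEAUTOMATIC) {
                // System has resumed:
                // re-establish connections, re-check removable media.
            }
            return TRUE;
        }
        return DefWindowProc(hwnd, msg, wParam, lParam);
    }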
Stand-By keeps your "state" alive by keeping it in RAM. As a consequence, if you lose power, you lose your stored "state". In exchange, entering and resuming are quicker.
Hibernation stores your "state" on the hard disk instead, so if you lose power you can still come back three days later. But it's slower to enter and resume.
I guess a limitation with Stand-By is how much RAM you've got, but I'm sure virtual memory must still be employed when it runs out of physical RAM. I'll look that up though and get back!
The Wikipedia article on ACPI contains the details about the different power savings modes which are present in modern PCs.
Here's the basic idea, as I understand things: keep the current state of the system persisted, so that when the machine is brought back into operation, it can resume at the state it was in before being put into sleep/standby/hibernation. Think of it as serialization for your PC.
In standby, the computer keeps feeding power to the RAM, because main memory is volatile and needs constant refreshing to hold its state. This means the hard drives, CPU, and other components can be turned off, as long as there is enough power to keep the DRAM refreshed so its contents don't disappear.
In hibernation, main memory is turned off as well, so its contents must be copied to permanent storage, such as a hard drive, before the system power is turned off. Other than that, the basic premise of hibernation is no different from standby: store the current state of the machine to restore at a later time.
With that in mind, it's probably not too likely that going into standby or hibernation will cause problems for tasks executing at that moment. However, it may not be a good idea to suspend in the middle of network activity: depending on the protocol, your network connection could time out and be unable to resume once the system returns to its running state.
Also, some machines simply have flaky power-saving drivers that can cause them to go into standby and never come back, but that's a completely different issue.
There are some implications for your code. Hibernation is more than just a context switch from the scheduler: network connections will be closed, and network drives or removable media might be disconnected during the hibernation.
I don't think your application can be notified of hibernation (but I might be wrong). What you should do is handle error scenarios (loss of network connectivity, for example) as gracefully as possible. And note that those error scenarios can occur during normal operation as well, not only when going into hibernation.